This package implements probabilistic context likelihood of relatedness (PCLRC) [1] to study differences between correlation networks [2]. It was developed using Python 3.10.
The package can be extremely slow for a relatively large number of samples or variables.
The following packages are required:
cython 3.X
numpy
matplotlib
tqdm
All can be easily installed using pip or conda.
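Before installing, it can help to confirm that the requirements listed above are importable. The following is a minimal sketch, not part of the package; the helper name `missing_packages` is hypothetical, and the import name `Cython` (capitalized) is the standard import name of the cython distribution.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above.
# The cython distribution is imported as "Cython".
REQUIRED = ["Cython", "numpy", "matplotlib", "tqdm"]


def missing_packages(names):
    """Return the names that the import system cannot locate (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```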
To install, download the package from the Releases page.
Uncompress the file and move its contents to the working directory. Open a command-line
tool such as Command Prompt (Windows) or Terminal (macOS) and change to
the working directory using the cd command. On Windows, run
python -m setup install
or on macOS, run
python3 -m setup install
The package will then be installed.
To run the package, import it in Python IDLE or another environment such as JupyterLab:
from PCLRC import PCLRC
- Create a PCLRC object with the specified arguments:
  pclrc = PCLRC(num_sampling: int = int(1e5), frac_sampling: float = 0.75, q: float = 0.3, corr_prob: float = 0.9, bootstrap: bool = False, num_perms: int = int(1e4), num_cores: Optional[int] = None)
  - Arguments:
    - num_sampling: Number of subsampling procedures. Defaults to 100000.
    - frac_sampling: Fraction of samples selected in each subsampling procedure to calculate Pearson correlation coefficients (PCCs). Defaults to 0.75.
    - q: In each subsample, the top q * 100% of PCCs are treated as valid associations and set to 1 in the binary adjacency matrix; all others are set to 0. Defaults to 0.3.
      - If q is set to 0., a hard threshold defined in Ref [3] is used.
    - corr_prob: Correlation probability threshold for identifying significant associations; the corresponding PCCs are used to calculate differences in connectivity. Defaults to 0.9.
    - bootstrap: Whether to use bootstrap sampling during the subsampling procedures. This is useful because some datasets have very few samples. Defaults to False.
    - num_perms: Number of permutations used in permutation tests. Defaults to 10000.
    - num_cores: Number of cores used to run the permutation tests in parallel. Defaults to None, which means no parallelization is used.
- Run PCLRC with data matrix x and sample labels in groups:
  pclrc.network_diffs(x: np.ndarray, groups: np.ndarray)
  - Arguments:
    - x: Data matrix of size n rows by p columns, where n is the number of samples and p is the number of variables.
    - groups: Group names as a 1-D array with n elements.

  [!NOTE]
  Only two groups are allowed for the analysis, i.e., len(set(groups)) == 2.
- In addition to network_diffs, a method is provided to obtain the correlation probability matrix:
  pclrc.corr_probs(x: np.ndarray, prog_bar: bool = True)
  - Arguments:
    - x: Data matrix of size n rows by p columns, where n is the number of samples and p is the number of variables.
    - prog_bar: Whether to show a progress bar during the calculation.
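The subsampling arguments described above (num_sampling, frac_sampling, q, bootstrap) can be illustrated with a short NumPy sketch. This is not the package's implementation: the function name `sketch_corr_probs` is hypothetical, it ranks absolute PCCs directly (the package may rank a transformed score instead), and it does not cover the q = 0 hard-threshold case from Ref [3].

```python
import numpy as np


def sketch_corr_probs(x, num_sampling=200, frac_sampling=0.75, q=0.3,
                      bootstrap=False, seed=0):
    """Estimate how often each variable pair falls in the top q fraction of |PCC|.

    Hypothetical illustration only; not the PCLRC package code.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("this sketch only handles 0 < q <= 1")
    rng = np.random.default_rng(seed)
    n, p = x.shape
    m = max(2, int(round(frac_sampling * n)))  # samples drawn per subsample
    iu = np.triu_indices(p, k=1)               # each variable pair once
    counts = np.zeros((p, p))
    for _ in range(num_sampling):
        # Draw without replacement, or with replacement when bootstrapping.
        idx = rng.choice(n, size=m, replace=bootstrap)
        r = np.abs(np.corrcoef(x[idx], rowvar=False))
        # Pairs at or above the (1 - q) quantile count as associations.
        cutoff = np.quantile(r[iu], 1.0 - q)
        adj = np.zeros((p, p))
        adj[iu] = r[iu] >= cutoff
        counts += adj + adj.T                  # mirror to keep symmetry
    probs = counts / num_sampling
    np.fill_diagonal(probs, 1.0)
    return probs


# Synthetic data: variable 1 closely tracks variable 0.
rng_demo = np.random.default_rng(1)
x_demo = rng_demo.normal(size=(40, 6))
x_demo[:, 1] = x_demo[:, 0] + 0.05 * rng_demo.normal(size=40)
probs_demo = sketch_corr_probs(x_demo)
print(np.round(probs_demo, 2))
```

Under this sketch, a pair would be called a significant association when its entry in the returned matrix reaches the corr_prob threshold.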
After running pclrc.network_diffs,
- to obtain the correlation probabilities:
  pclrc.pearson_corr_probs(label: Optional[Any] = None)
  - Arguments:
    - label: A group name from the groups input passed to pclrc.network_diffs. If set to None, the correlation probabilities for both groups are returned, with the first probability matrix corresponding to group groups[0] and the second to group groups[1].
- to obtain differences in connectivity:
  pclrc.diff_connectivity
- to obtain the significant differences in connectivity:
  pclrc.sig_diff_connectivity(fdr: float = 0.05)
  - Arguments:
    - fdr: The false discovery rate (FDR) used to select significant differences in connectivity, i.e., the adjusted p-value threshold after Benjamini-Hochberg (BH) correction.
- to obtain permuted probabilities for all variables:
  pclrc.perm_probs
- to obtain the labels/group names:
  pclrc.data_labels
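The fdr argument above refers to a threshold on BH-adjusted p-values. As a rough illustration of that correction, here is a minimal NumPy sketch; the function names `bh_adjust` and `significant` are hypothetical, and whether the package compares with `<=` or `<` is an assumption.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper, not package code)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def significant(pvals, fdr=0.05):
    """Boolean mask of tests whose adjusted p-value is at or below fdr (assumed rule)."""
    return bh_adjust(pvals) <= fdr


pvals_demo = [0.001, 0.5, 0.9]
print(bh_adjust(pvals_demo))
print(significant(pvals_demo))
```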
To demonstrate how to run the package, a Jupyter notebook is provided in the package; it is also available at pclrc.demo.ipynb. The dataset was downloaded from Metabolomics Workbench, study ST003751, in negative mode. The demonstration also includes the kinds of figures shown in the references, e.g., connections between variables and differences in connectivity.
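For a quick check without the demonstration dataset, the calls documented above can be strung together on synthetic data. This is a sketch under assumptions: the helper `make_two_group_data` and the group names are hypothetical, the small num_sampling and num_perms values are chosen only to keep the run short, and the import is guarded so the data preparation still runs where the package is not installed.

```python
import numpy as np

try:
    from PCLRC import PCLRC  # import path as given in this README
except ImportError:
    PCLRC = None  # the data preparation below still runs without the package


def make_two_group_data(n_per_group=20, p=8, seed=0):
    """Synthetic two-group data with one group-specific correlation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(2 * n_per_group, p))
    # In the second group only, make variable 1 track variable 0.
    x[n_per_group:, 1] = x[n_per_group:, 0] + 0.1 * rng.normal(size=n_per_group)
    groups = np.array(["control"] * n_per_group + ["case"] * n_per_group)
    return x, groups


x, groups = make_two_group_data()

# Input requirements stated above: one label per row, exactly two groups.
assert x.shape[0] == groups.shape[0]
assert len(set(groups.tolist())) == 2

if PCLRC is not None:
    # Keyword names follow the constructor signature documented above.
    pclrc = PCLRC(num_sampling=1000, num_perms=1000)
    pclrc.network_diffs(x, groups)
    print(pclrc.diff_connectivity)
    print(pclrc.sig_diff_connectivity(fdr=0.05))
```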
[1] Saccenti E, et al. J. Proteome Res. 2015, 14, 2, 1101–1111.
[2] Vignoli A, et al. J. Proteome Res. 2020, 19, 949–961.
[3] Suarez-Diez M, et al. J. Proteome Res. 2015, 14, 12, 5119–5130.