PyGenePlexus
PyGenePlexus is a Python package for running the [GenePlexus] model.
Warning
The documentation on the “latest” version of ReadTheDocs may not yet reflect code pushed to the main branch of the GitHub repository. However, the documentation for releases on PyPI will be correct on ReadTheDocs.
PyGenePlexus enables researchers to predict genes similar to an uploaded geneset of interest based on patterns of connectivity in genome-scale molecular interaction networks, with the ability to translate these findings across species.
Overview of PyGenePlexus
Given a list of input genes and a geneset collection (GSC) to help select negative examples, the package trains a logistic regression model using node embeddings as features and generates the following outputs, either in the same species as the input genes or translated to a model species.
1. Genome-wide prediction of how functionally similar a gene is to the input gene list. Evaluation of the model is provided by performing k-fold cross validation. The default is 3-fold cross validation when a minimum of 15 input genes are supplied. PyGenePlexus does not enforce a minimum or maximum number of genes (the minimum number of genes can be set in fit()), and we note that evaluations of the model were carried out for gene sets ranging between 15 and 500 genes. See fit() and predict()
2. (Optional) Interpretability of the model is provided by comparing the model trained on the user gene set to models pretrained on thousands of known gene sets from [GO] biological processes, [Monarch] phenotypes, and [Mondo] diseases. See make_sim_dfs()
3. (Optional) Interpretability of the top predicted genes is provided by returning their network connectivity. See make_small_edgelist()
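To make the modeling idea above concrete, here is a minimal, standard-library-only sketch of logistic regression on embedding-like features with stratified 3-fold cross validation. It is not the PyGenePlexus implementation: the synthetic "embeddings", the helper names (`train_logreg`, `stratified_folds`), the hyperparameters, and the use of accuracy as the metric are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Clip the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.1, epochs=200, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_w = [sum(e * xi[j] for e, xi in zip(errs, X)) / n
                  for j in range(len(w))]
        w = [wj - lr * (gj + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * sum(errs) / n
    return w, b


def stratified_folds(y, k, rng):
    """Split indices into k folds, keeping the class ratio similar per fold."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# Hypothetical data: 20 "input genes" (positives) and 40 negatives,
# each described by a 4-dimensional stand-in for a node embedding.
rng = random.Random(0)
dim = 4
X = [[rng.gauss(1.0, 0.5) for _ in range(dim)] for _ in range(20)]
X += [[rng.gauss(-1.0, 0.5) for _ in range(dim)] for _ in range(40)]
y = [1] * 20 + [0] * 40

folds = stratified_folds(y, 3, rng)
accuracies = []
for held_out in folds:
    held = set(held_out)
    train = [i for i in range(len(y)) if i not in held]
    w, b = train_logreg([X[i] for i in train], [y[i] for i in train])
    correct = sum((predict_proba(w, b, X[i]) >= 0.5) == (y[i] == 1)
                  for i in held_out)
    accuracies.append(correct / len(held_out))

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean held-out accuracy over {len(folds)} folds: {mean_accuracy:.2f}")
```

In practice the package handles feature loading, negative selection, training, and evaluation for you; the sketch only shows the general shape of the technique.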
Note
Links to other GenePlexus products
Quick start
PyGenePlexus comes with an easy-to-use command line interface (CLI) to run the full GenePlexus pipeline given an input gene list. To get started, install via pip and run a quick example as follows.
pip install geneplexus
geneplexus --input_file my_gene_list.txt --output_dir my_result
Note that you need to supply the my_gene_list.txt file, which is a
line-separated gene list text file (NCBI Entrez IDs, Symbols, or Ensembl IDs
are accepted). An example can be found on the
GitHub page under
example/input_genes.txt. More info can be found in PyGenePlexus CLI.
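If your genes currently live in a Python list, a line-separated input file like the one described above can be produced as follows. This is a small sketch: the helper name `write_gene_list` is hypothetical, and the whitespace and duplicate cleanup is a precaution rather than a documented requirement of the CLI.

```python
from pathlib import Path


def write_gene_list(genes, path):
    """Write one gene identifier per line, dropping blanks and duplicates."""
    seen, cleaned = set(), []
    for gene in genes:
        gene = gene.strip()
        if gene and gene not in seen:
            seen.add(gene)
            cleaned.append(gene)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# A few of the gene symbols used elsewhere on this page, with some noise.
genes = ["ARL6", "BBS1", " BBS10 ", "BBS1", "", "CEP290"]
written = write_gene_list(genes, "my_gene_list.txt")
print(written)
```

The resulting my_gene_list.txt can then be passed to the CLI via --input_file.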
Warning
All necessary files for a specific selection of parameters (network,
feature, species, and gene set collection) will be downloaded automatically and
saved under ~/.data/geneplexus. Users can also specify the location where
the data is saved using the --output_dir argument. The example
provided will download files that occupy ~4GB of space.
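Since the download is sizeable, it can be worth checking free disk space before the first run. The snippet below is a sketch using only the Python standard library; the helper name `has_enough_space` and the 4 GiB threshold (taken from the approximate figure above) are assumptions, not part of PyGenePlexus.

```python
import shutil
from pathlib import Path

# Approximate size of the example download mentioned above (~4GB).
REQUIRED_BYTES = 4 * 1024**3


def has_enough_space(path, required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has `required` bytes free."""
    p = Path(path).expanduser()
    # The data directory may not exist yet, so walk up to an existing parent.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free >= required


data_dir = Path("~/.data/geneplexus")
enough = has_enough_space(data_dir)
print(f"enough free space for the example data: {enough}")
```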
Using the API
The following is a quick example of generating predictions from an input gene list. More info can be found in PyGenePlexus API.
from geneplexus import GenePlexus

input_genes = ["ARL6", "BBS1", "BBS10", "BBS12", "BBS2", "BBS4",
               "BBS5", "BBS7", "BBS9", "CCDC28B", "CEP290", "KIF7",
               "MKKS", "MKS1", "TRIM32", "TTC8", "WDPCP"]

gp = GenePlexus(net_type="STRING", features="SixSpeciesN2V",
                sp_trn="Human", sp_res="Human",
                gsc_trn="Combined", gsc_res="Combined",
                input_genes=input_genes, auto_download=True,
                log_level="INFO")
gp.fit()
gp.predict()

df_probs = gp.model_info["All-Genes"].results["Human-Combined"].df_probs
print(df_probs.iloc[:10])
Note
v3 of PyGenePlexus is significantly different from v2 and v1. Documentation of older stable releases can be found at https://pygeneplexus.readthedocs.io/en/v2.0.4/ or https://pygeneplexus.readthedocs.io/en/v1.0.1/
Using PyGenePlexus
Package reference
Appendix