PyGenePlexus

PyGenePlexus is a Python package for running the [GenePlexus] model.

PyGenePlexus enables researchers to predict genes similar to an uploaded geneset of interest based on patterns of connectivity in genome-scale molecular interaction networks, with the ability to translate these findings across species.

Given a list of input genes and a geneset collection (GSC) to help select negative examples, the package trains a logistic regression model using node embeddings as features and generates the following outputs, either in the same species as the input genes or translated to a model species.

Genome-wide prediction of how functionally similar each gene is to the input gene list. The model is evaluated with k-fold cross validation; by default, 3-fold cross validation is run when at least 15 input genes are supplied. These parameters can be changed when accessing the Python class. PyGenePlexus does not enforce a minimum or maximum number of genes, but note that evaluations of the model were carried out for gene sets ranging between 5 and 500 genes. See fit_and_predict().
(Optional) Interpretability of the model is provided by comparing the model trained on the user gene set to models pretrained on thousands of known gene sets from [GO] biological processes, [Monarch] phenotypes and [Mondo] diseases. See make_sim_dfs().
(Optional) Interpretability of the top predicted genes is provided by returning their network connectivity. See make_small_edgelist().
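To make the cross validation default concrete, here is a minimal sketch of a 3-fold split that is only attempted when at least 15 positive genes are available. The helper names (kfold_indices, plan_cross_validation) are hypothetical and are not part of PyGenePlexus; the sketch also assumes cross validation is simply skipped below the threshold, and it does not attempt to reproduce how the package actually builds its folds.

```python
import random


def kfold_indices(n_items, n_folds=3, seed=0):
    """Shuffle range(n_items) and deal it into n_folds near-equal test folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Taking every n_folds-th element gives folds whose sizes differ by at most 1.
    return [idx[i::n_folds] for i in range(n_folds)]


def plan_cross_validation(n_positives, n_folds=3, min_positives=15):
    """Return test folds over the positive genes, or None if there are too few.

    Assumption for illustration: below min_positives no evaluation is run.
    """
    if n_positives < min_positives:
        return None
    return kfold_indices(n_positives, n_folds)


# 17 input genes, as in the API example on this page
folds = plan_cross_validation(17)
fold_sizes = [len(fold) for fold in folds]
```

Each gene index lands in exactly one test fold, so every positive is held out once across the three rounds.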

Note

Links to other GenePlexus products

Quick start

PyGenePlexus comes with an easy-to-use command line interface (CLI) to run the full GenePlexus pipeline given an input gene list. To get started, install via pip and run a quick example as follows.

pip install geneplexus
geneplexus --input_file my_gene_list.txt --output_dir my_result

Note that you need to supply the my_gene_list.txt file, which is a line-separated text file of genes (NCBI Entrez IDs, gene symbols, or Ensembl IDs are accepted). An example can be found on the GitHub page under example/input_genes.txt. More info can be found in PyGenePlexus CLI.
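If the gene list currently lives in a script or notebook, the input file can be produced with a few lines of standard-library Python. This is only a sketch: write_gene_list and read_gene_list are hypothetical helpers, not PyGenePlexus functions, and the demo writes to a temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path


def write_gene_list(genes, path):
    """Write one gene identifier per line."""
    Path(path).write_text("\n".join(genes) + "\n")


def read_gene_list(path):
    """Read a line-separated gene list, ignoring blank lines and stray whitespace."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    gene_file = Path(tmp) / "my_gene_list.txt"
    write_gene_list(["ARL6", "BBS1", "BBS10"], gene_file)
    genes = read_gene_list(gene_file)
```

Reading the file back is a quick way to confirm it contains exactly one identifier per line before passing it to --input_file.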

Warning

All necessary files for a specific selection of parameters (network, feature, species, and gene set collection) will be downloaded automatically and saved under ~/.data/geneplexus. Users can also specify where the data are saved using the --output_dir argument. The example provided will download files that occupy ~4GB of space.
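Because the first run downloads roughly 4GB, it can be worth checking free disk space beforehand. The sketch below uses only the standard library; has_free_space and DEFAULT_DATA_DIR are hypothetical names introduced here, and the 4 GB threshold is simply the approximate figure quoted above.

```python
import shutil
from pathlib import Path

# Default download location named in the warning above
DEFAULT_DATA_DIR = Path.home() / ".data" / "geneplexus"


def has_free_space(path, required_gb=4.0):
    """Return True if the filesystem that would hold `path` has enough free space."""
    probe = Path(path).expanduser()
    # The directory may not exist yet, so walk up to the nearest existing parent.
    while not probe.exists():
        probe = probe.parent
    free_gb = shutil.disk_usage(probe).free / 1e9
    return free_gb >= required_gb


enough = has_free_space(DEFAULT_DATA_DIR)
```

If enough is False, point the download at a location on a larger disk before running the example.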

Using the API

A quick example of generating predictions using an input gene list. More info can be found in PyGenePlexus API.

from geneplexus import GenePlexus
input_genes = ["ARL6", "BBS1", "BBS10", "BBS12", "BBS2", "BBS4",
               "BBS5", "BBS7", "BBS9", "CCDC28B", "CEP290", "KIF7",
               "MKKS", "MKS1", "TRIM32", "TTC8", "WDPCP"]
gp = GenePlexus(net_type="STRING", features="SixSpeciesN2V",
                sp_trn="Human", sp_res="Human",
                gsc_trn="Combined", gsc_res="Combined",
                input_genes=input_genes, auto_download=True,
                log_level="INFO")
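# Train the model and keep the second returned item, the per-gene prediction table
df_probs = gp.fit_and_predict()[1]
# Show the first 10 rows
print(df_probs.iloc[:10])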

Note

v2 of PyGenePlexus is significantly different from v1 and uses a different set of backend data; v1 included only human data. For information on v1, see https://pygeneplexus.readthedocs.io/en/v1.0.1/

