PyGenePlexus API
Download datasets
Manual download
Warning
PROCEED WITH CAUTION: The first example download below occupies about 4 GB of disk space. The second example (full download) occupies about 6.5 GB.
from geneplexus.download import download_select_data
download_select_data("my_data", species=["Human", "Mouse"])  # download only the Human and Mouse data
download_select_data("my_data") # download all data at once
See geneplexus.download.download_select_data() for more information.
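Given the sizes quoted in the warning, it can help to confirm that enough disk space is free before starting a download. The following is a minimal sketch using only the Python standard library. The helper name has_room and the size constants are illustrative assumptions (the constants simply restate the approximate figures from the warning); none of this is part of PyGenePlexus.

```python
import shutil
from pathlib import Path

# Approximate sizes restated from the warning above (decimal gigabytes).
SELECT_DOWNLOAD_GB = 4.0  # Human and Mouse data only
FULL_DOWNLOAD_GB = 6.5    # all data


def has_room(target_dir, required_gb, free_bytes=None):
    """Return True if at least required_gb gigabytes are free for target_dir.

    has_room is a hypothetical helper, not a PyGenePlexus function.
    free_bytes may be passed explicitly; otherwise it is read from the
    filesystem that holds the nearest existing parent of target_dir.
    """
    if free_bytes is None:
        probe = Path(target_dir).expanduser().resolve()
        # The target directory may not exist yet, so walk up until one does.
        while not probe.exists():
            probe = probe.parent
        free_bytes = shutil.disk_usage(probe).free
    return free_bytes >= required_gb * 1e9


# Report whether each of the two example downloads would fit in "my_data".
for label, size_gb in [("select", SELECT_DOWNLOAD_GB), ("full", FULL_DOWNLOAD_GB)]:
    print(label, "download fits:", has_room("my_data", size_gb))
```

A check like this only guards against the obvious failure. The real footprint may differ from the approximate figures, so leaving some headroom is sensible.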
Auto download
Optionally, set the auto_download keyword argument to True to automatically
download the necessary data when the GenePlexus object is initialized.
from geneplexus import GenePlexus
gp = GenePlexus(net_type="STRING", features="SixSpeciesN2V",
                sp_trn="Human", sp_res="Human",
                auto_download=True)
Note
The default data location is ~/.data/geneplexus/. You can change this by
setting the file_loc argument of GenePlexus.
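To see where the data will end up, or to check how much has already been downloaded, the location from the note can be resolved to an absolute path. This is a rough sketch using only the standard library. The helpers resolve_data_dir and dir_size_bytes are hypothetical names introduced here, and whether GenePlexus expands "~" itself is not stated above, so the sketch expands it explicitly.

```python
from pathlib import Path

# Default location as stated in the note above.
DEFAULT_DATA_DIR = "~/.data/geneplexus/"


def resolve_data_dir(file_loc=None):
    """Return an absolute Path for the data directory (hypothetical helper).

    Falls back to the documented default when file_loc is None.
    """
    chosen = DEFAULT_DATA_DIR if file_loc is None else file_loc
    return Path(chosen).expanduser().resolve()


def dir_size_bytes(path):
    """Sum the sizes of all files below path; 0 if path is not a directory."""
    path = Path(path)
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


data_dir = resolve_data_dir()
print(data_dir, "currently holds", dir_size_bytes(data_dir), "bytes")
```

The resolved path could then be passed as the file_loc argument mentioned in the note, for example file_loc=str(resolve_data_dir("my_data")).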
Run the PyGenePlexus pipeline
First, specify the input genes. The list may mix gene ID types, in any combination of Entrez IDs, gene symbols, and Ensembl IDs.
input_genes = ["6457", "7037", "3134", "TTC8", "BBS5", "BBS12", ...]
Alternatively, read the gene list from a file:
import geneplexus
input_genes = geneplexus.util.read_gene_list("my_gene_list.txt")
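Whichever way the list is built, stray whitespace or duplicate entries are easy to introduce when IDs are typed or pasted by hand. The sketch below tidies a list in plain Python before it is handed to the pipeline. The helper names are hypothetical, and the one-ID-per-line file layout is an assumption made for illustration; the format that read_gene_list expects is not described here.

```python
def clean_gene_ids(raw_ids):
    """Strip whitespace, drop empty entries, and remove duplicates.

    Hypothetical helper, not part of PyGenePlexus. First-seen order is kept.
    """
    seen = set()
    cleaned = []
    for gene in raw_ids:
        gene = str(gene).strip()
        if gene and gene not in seen:
            seen.add(gene)
            cleaned.append(gene)
    return cleaned


def read_ids_one_per_line(path):
    """Read gene IDs from a text file, assuming one ID per line."""
    with open(path, encoding="utf-8") as handle:
        return clean_gene_ids(handle)


# Mixed ID types are left untouched; only whitespace and repeats are removed.
print(clean_gene_ids(["6457", " BBS5", "BBS5", "", "TTC8\n"]))
```

The cleaned list can then be passed on in place of input_genes.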
Next, run the pipeline using the GenePlexus object.
# Instantiate GenePlexus class with default parameters
gp = geneplexus.GenePlexus()
# Load input genes and set up positives/negatives for training
gp.load_genes(input_genes)
# Train logistic regression model and get genome-wide gene predictions
mdl_weights, df_probs, avgps = gp.fit_and_predict()
# Optionally, compute model similarity to models pretrained on GO and DisGeNET gene sets
df_sim, weights_dict = gp.make_sim_dfs()
# Optionally, extract the subgraph induced by the top (50 by default) predicted genes
df_edge, isolated_genes, df_edge_sym, isolated_genes_sym = gp.make_small_edgelist()
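To make the last step more concrete, here is a toy illustration of what "the subgraph induced by the top predicted genes" means: keep the k highest-scoring genes, keep only the edges whose two endpoints are both in that set, and list the selected genes left without any edge. This is a conceptual sketch in plain Python, not the library's implementation. The gene names, scores, and edge weights are made up, and reading isolated_genes as "top genes with no remaining edges" is an assumption.

```python
def top_k_induced_subgraph(scores, edges, k=50):
    """Return (kept_edges, isolated) for the k highest-scoring genes.

    scores: dict mapping gene -> predicted probability
    edges:  iterable of (gene_u, gene_v, weight) tuples
    """
    # Rank genes by score, highest first, and keep the top k.
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    top_set = set(top)
    # An edge survives only if both endpoints are among the top genes.
    kept = [(u, v, w) for u, v, w in edges
            if u in top_set and v in top_set and u != v]
    # Top genes that appear in no surviving edge are isolated.
    connected = {g for u, v, _ in kept for g in (u, v)}
    isolated = [g for g in top if g not in connected]
    return kept, isolated


# Hypothetical toy data: four genes and three weighted edges.
toy_scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
toy_edges = [("A", "B", 0.5), ("B", "D", 0.9), ("C", "D", 0.4)]

kept_edges, isolated = top_k_induced_subgraph(toy_scores, toy_edges, k=3)
print(kept_edges)  # [('A', 'B', 0.5)]
print(isolated)    # ['C']
```

With k=3 the top genes are A, B, and C. Only the A-B edge has both endpoints in that set, so C ends up isolated even though it has an edge to D in the full network.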