PyGenePlexus Data
Preprocessed Data
PyGenePlexus comes with preprocessed data that can be downloaded
using geneplexus.download.download_select_data() or directly
from Zenodo.
All options:
Networks | STRING, IMP, BioGRID
Species | Human, Mouse, Zebrafish, Worm, Fly, Yeast
Features |
GSCs | GO, Monarch, Mondo, Combined
Detailed species info:
Species | Scientific Name | Taxon ID
Human | Homo sapiens | 9606
Mouse | Mus musculus | 10090
Fly | Drosophila melanogaster | 7227
Zebrafish | Danio rerio | 7955
Worm | Caenorhabditis elegans | 6239
Yeast | Saccharomyces cerevisiae | 4932
Due to the availability of the data, the following combinations are supported:
Network | Human | Mouse | Fly | Zebrafish | Worm | Yeast
STRING | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
IMP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
BioGRID | ✅ | ✅ | ✅ | ❌ | ✅ | ✅
GSC | Human | Mouse | Fly | Zebrafish | Worm | Yeast
GO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅
Monarch | ✅ | ✅ | ❌ | ✅ | ✅ | ✅
Mondo | ✅ | ❌ | ❌ | ❌ | ❌ | ❌
Note
The Combined option for the GSC selection utilizes all available GSC options for a given species.
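The support tables can also be written down as a small lookup, which is handy for validating a species/network/GSC choice before downloading anything. The following is a minimal sketch: the dictionaries are transcribed from the tables above, while the helper names `is_supported` and `combined_gscs` are hypothetical and not part of PyGenePlexus.

```python
# Column order shared by both support tables.
SPECIES = ["Human", "Mouse", "Fly", "Zebrafish", "Worm", "Yeast"]

# True = supported, transcribed from the tables above.
NETWORK_SUPPORT = {
    "STRING": [True, True, True, True, True, True],
    "IMP": [True, True, True, True, True, True],
    "BioGRID": [True, True, True, False, True, True],
}
GSC_SUPPORT = {
    "GO": [True, True, True, True, True, True],
    "Monarch": [True, True, False, True, True, True],
    "Mondo": [True, False, False, False, False, False],
}


def is_supported(table, name, species):
    """Return True if `name` (a network or GSC) is available for `species`."""
    return table[name][SPECIES.index(species)]


def combined_gscs(species):
    """List the GSCs that the Combined option would draw on for `species`."""
    return [gsc for gsc in GSC_SUPPORT if is_supported(GSC_SUPPORT, gsc, species)]
```

For example, `combined_gscs("Mouse")` lists GO and Monarch, since Mondo is only available for Human.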
Custom Data
PyGenePlexus uses a strict naming convention to read in the files. Users can supply their own data by generating files in the following formats:
NodeOrder__{Species}__{Network Name}.txt
Network node ordering
A text file with a single column containing all genes present in the network for a given species. The ordering of nodes in this file serves as the index map for the network feature data.
Data__{Species}__{Feature Name}__{Network Name}.npy
Data array
A NumPy array of the chosen network representation (rows are genes, ordered as in the NodeOrder file; columns are features).
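Because the rows of the data array are indexed by the NodeOrder file, the two files are easiest to write together. The sketch below assumes NumPy is installed and that the custom network and feature names (`MyNet`, `MyFeat` in the usage note) are placeholders. The helper names are hypothetical and not part of PyGenePlexus.

```python
import os
import tempfile

import numpy as np


def write_network_files(data_dir, species, network, feature, genes, matrix):
    """Write a NodeOrder file and a matching Data array into `data_dir`."""
    matrix = np.asarray(matrix)
    if matrix.shape[0] != len(genes):
        raise ValueError("matrix needs exactly one row per gene in the node order")
    node_path = os.path.join(data_dir, f"NodeOrder__{species}__{network}.txt")
    data_path = os.path.join(data_dir, f"Data__{species}__{feature}__{network}.npy")
    # One gene per line; the line number is the row index into the data array.
    with open(node_path, "w") as handle:
        handle.write("\n".join(genes) + "\n")
    np.save(data_path, matrix)
    return node_path, data_path


def load_gene_row(node_path, data_path, gene):
    """Look up the feature row for `gene` using the node order as the index map."""
    with open(node_path) as handle:
        genes = [line.strip() for line in handle if line.strip()]
    data = np.load(data_path)
    return data[genes.index(gene)]
```

Calling `write_network_files(tmp_dir, "Human", "MyNet", "MyFeat", genes, matrix)` produces `NodeOrder__Human__MyNet.txt` and `Data__Human__MyFeat__MyNet.npy`. Checking the row count up front avoids a silent mismatch between the two files.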
GSC__{Species}__{GSC Name}__{Network Name}.json
Filtered GSC for the network
A subsetted GSC in which only the genes present in the network are considered.
{
    "{Term ID}" # ID of the term
        : {
            "Name" : # returns string of term name
            "Genes" : # returns list of genes annotated to term
            "Task" : # returns type of GSC the term is from
        }
    "Universe" : # returns list of all genes in GSC
    "Term Order" : # returns list of all term IDs in GSC
}
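A filtered GSC in this layout could be produced from an unfiltered one roughly as follows. This is a sketch under several assumptions: the input `raw_gsc` maps each term ID to a `(name, genes)` pair, terms left with no genes are dropped, and `Universe` is taken to be the union of the retained genes. None of these choices is confirmed by the description above, and `filter_gsc` is a hypothetical helper.

```python
import json


def filter_gsc(raw_gsc, network_genes, task):
    """Restrict a GSC to genes in the network and lay it out in the GSC file format."""
    network_genes = set(network_genes)
    filtered = {}
    universe = set()
    term_order = []
    for term_id, (name, genes) in raw_gsc.items():
        kept = sorted(gene for gene in genes if gene in network_genes)
        if not kept:
            continue  # assumption: terms with no genes in the network are dropped
        filtered[term_id] = {"Name": name, "Genes": kept, "Task": task}
        universe.update(kept)
        term_order.append(term_id)
    filtered["Universe"] = sorted(universe)
    filtered["Term Order"] = term_order
    return filtered


def save_gsc(filtered, path):
    """Write the filtered GSC to `path` as JSON."""
    with open(path, "w") as handle:
        json.dump(filtered, handle, indent=2)
```

The result can be written with `save_gsc` to a path that follows the `GSC__{Species}__{GSC Name}__{Network Name}.json` pattern.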
PreTrainedWeights__{Species}__{GSC Name}__{Network Name}__{Feature Name}.json
Pretrained weights
Model weights for models trained on the terms in the selected GSC.
{
    "{Term ID}" # ID of the term
        : {
            "Name" : # returns string of term name
            "Weights" : # returns list of model weights
            "PosGenes" : # returns list of genes used as positives in the model
            "Task" : # returns type of GSC the term is from
        }
}
Edgelist__{Species}__{Network Name}.edg
Network edgelist
Edgelist for the given network.
IDconversion__{Species}__{ID Type}-to-{ID Type}.json
Gene ID conversions
These files convert from one gene ID convention to another.
{
    "{Gene ID}" : # returns list of how Gene ID is converted to other gene ID type
}
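Since each gene ID maps to a list, a conversion may return zero, one, or several target IDs. The sketch below shows one way to apply such a mapping once it has been loaded (for example with `json.load`). The helper `convert_ids` and the example IDs in the usage note are hypothetical.

```python
def convert_ids(genes, conversion):
    """Split `genes` into those with at least one converted ID and those without.

    `conversion` is the dictionary loaded from an IDconversion file, mapping
    each source gene ID to a list of target IDs.
    """
    converted = {}
    missing = []
    for gene in genes:
        targets = conversion.get(gene, [])
        if targets:
            converted[gene] = list(targets)
        else:
            missing.append(gene)
    return converted, missing
```

With `conversion = {"geneA": ["101"], "geneB": []}`, calling `convert_ids(["geneA", "geneB", "geneC"], conversion)` keeps `geneA` and reports `geneB` and `geneC` as unconverted. Tracking the unconverted IDs separately makes it easier to tell the user which inputs were lost.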
BioMart__{Species}__{Species}.json
One-to-one ortholog conversions
These files convert genes from one species into the corresponding one-to-one ortholog.
{
    "{Gene ID}" : # returns string of how gene is converted to its one-to-one ortholog
}
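Unlike the ID conversion files, each value here is a single string, so the mapping can be applied directly and, being one-to-one, reversed. A minimal sketch follows. It assumes the first `{Species}` in the file name is the source and the second is the target, which the description above does not state. The helper names are hypothetical.

```python
def ortholog_filename(source_species, target_species):
    """Build the file name, assuming the order is source species, then target."""
    return f"BioMart__{source_species}__{target_species}.json"


def map_orthologs(genes, ortholog_map):
    """Map each gene to its one-to-one ortholog, skipping genes without one."""
    return {gene: ortholog_map[gene] for gene in genes if gene in ortholog_map}


def invert_orthologs(ortholog_map):
    """Reverse a one-to-one mapping so it converts in the opposite direction."""
    inverted = {target: source for source, target in ortholog_map.items()}
    if len(inverted) != len(ortholog_map):
        raise ValueError("mapping is not one-to-one")
    return inverted
```

The length check in `invert_orthologs` guards against a file that accidentally maps two genes to the same ortholog.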