PyGenePlexus Data

Preprocessed Data

PyGenePlexus comes with pre-processed data that can be downloaded using geneplexus.download.download_select_data() or directly from Zenodo.

All options:

Networks

[BioGRID], [STRING], [IMP]

Species

Human, Mouse, Zebrafish, Worm, Fly, Yeast

Features

SixSpeciesN2V

GSCs

[GO], [Monarch], [Mondo], Combined

Detailed species info:

Specifc Name

Taxon Id

Human

Homo sapiens

9606

Mouse

Mus musculus

10090

Fly

Drosophila melanogaster

7227

Zebrafish

Danio rerio

7955

Worm

Caenorhabditis elegans

6239

Yeast

Saccharomyces cerevisiae

4932

Due to the availability of the data, the following combinations are supported:

Available Network Options

Human

Mouse

Fly

Zebrafish

Worm

Yeast

STRING

IMP

BioGRID

Available Gene Set Collection Options

Human

Mouse

Fly

Zebrafish

Worm

Yeast

GO

Monarch

Mondo

Note

The Combined option for the GSC selection utilizes all available GSC options for a given species.

Custom Data

PyGenePlexus uses a strict naming convention rto read in the files. Users can supply there own data by generating files in the following formats:

  1. NodeOrder__{Species}__{Network Name}.txt Network node ordering

    A text file with a single column containing all genes present in the network for a given species. The ordering of nodes in this file serves as the index map for thenetwork feature data.

  2. Data__{Species}__{Feature Name}__{Network Name}.npy Data array

    A numpy array of the chosen network representation (rows are genes ordered by NodeOrder file, columns are features).

  3. GSC__{Species}__{GSC Name}__{Network Name}.json Filtered GSC for the network

    A subsetted GSC where only the genes present in the network are considered.

    {
      "{Term ID}" # ID of the term : {
         "Name"  : # returns string of term name
         "Genes" : # returns list of genes annotated to term
         "Task"  : # returns type of GSC the term is from
         }
      "Universe" : # returns list of all genes in GSC
      "Term Order" : # returns list of all term IDs in GSC
    }
    
  4. PreTrainedWeights__{Species}__{GSC Name}__{Network Name}__{Feature Name}.json Pretrained weights

    Model weights for model trained on terms is selected GSC

    {
      "{Term ID}" # ID of the term : {
         "Name"  : # returns string of term name
         "Weights" : # returns list of model weights
         "PosGenes" : # returns list genes used as positives in the model
         "Task"  : # returns type of GSC the term is from
         }
    }
    
  5. Edgelist__{Species}__{Network Name}.edg Network edgelist

    Edgelist for the given network.

  6. IDconversion__{Species}__{ID Type}-to-{ID Type}.json Gene ID conversions

    These files convert from one gene ID convention to another

    {
      "{Gene ID}" : # returns list of how Gene ID is converted to other gene ID type
    }
    
  7. BioMart__{Species}__{Species}.json One-to-one ortholog conversions

    These files convert genes from one species into the corresponding one-to-one ortholog.

    {
      "{Gene ID}" : # returns string of how gene is converted to its one-to-one ortholog
    }