PyGenePlexus API
================

.. currentmodule:: geneplexus.geneplexus

Download datasets
-----------------

Manual download
^^^^^^^^^^^^^^^

.. warning::

   **PROCEED WITH CAUTION**
   The first example download below will occupy **~4GB** of space. The second example (full
   download) will occupy **~6.5GB** of space.

.. code-block:: python

   from geneplexus.download import download_select_data
   download_select_data("my_data", species = ["Human", "Mouse"]) # download just Human nd Mouse data
   download_select_data("my_data")  # download all data at once

See :meth:`geneplexus.download.download_select_data` for more information

Auto download
^^^^^^^^^^^^^

Optionally, set the ``auto_download`` key word argument to ``True`` to automatically
download necessary data at initialization of the :class:`GenePlexus` object.

.. code-block:: python

   from geneplexus import GenePlexus
   gp = GenePlexus(net_type="STRING", features="SixSpeciesN2V",
                   sp_trn = "Human", sp_res = "Human",
                   auto_download=True)

.. note::

   The default data location is ``~/.data/geneplexus/``. You can change this by
   setting the ``file_loc`` argument of :class:`GenePlexus`.

Run the PyGenePlexus pipeline
-----------------------------

First, specify the input genes (can have mixed gene ID types, i.e. have any combination of Entrez
IDs, Gene Symbols, or Ensembl IDs).

.. code-block:: python

   input_genes = ["6457", "7037", "3134", "TTC8"," BBS5", "BBS12", ...]

Alternatively, read the gene list from file

.. code-block:: python

   import geneplexus
   input_genes = geneplexus.util.read_gene_list("my_gene_list.txt")

Next, run the pipline using the :class:`GenePlexus` object.

.. code-block:: python

   # Instantiate GenePlexus class with default parameters
   gp = geneplexus.GenePlexus()

   # Load input genes and set up positives/negatives for training
   gp.load_genes(input_genes)

   # Train logistic regression model and get genome-wide gene predictions
   mdl_weights, df_probs, avgps = gp.fit_and_predict()

   # Optionally, compute model similarity to models pretrained on GO and DisGeNet gene sets
   df_sim, weights_dict = gp.make_sim_dfs()

   # Optionally, extract the subgraph induced by the top (50 by default) predicted genes
   df_edge, isolated_genes, df_edge_sym, isolated_genes_sym = gp.make_small_edgelist()