geneplexus.util

Utilities including file and path handling.

geneplexus.util.check_file(path)

Check existence of a file.

Parameters:

path (str) – Path to the file.

Raises:

FileNotFoundError – if the file does not exist.
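
The behavior described above can be sketched as follows. This is an illustrative stand-in, not the library's implementation: the name check_file_sketch and the error message are hypothetical, and using os.path.isfile for the existence test is an assumption.

```python
import os
import tempfile


def check_file_sketch(path: str) -> None:
    """Raise FileNotFoundError if `path` is not an existing file (hypothetical stand-in)."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"File not found: {path}")


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "genes.txt")
    with open(existing, "w") as f:
        f.write("1\n")
    check_file_sketch(existing)  # an existing file passes silently

    try:
        check_file_sketch(os.path.join(tmp, "missing.txt"))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```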

geneplexus.util.check_param(name, value, expected, /)

Check the specified parameter and raise ValueError if its value is not among the expected values.

Parameters:
  • name (str) –

  • value (Any) –

  • expected (List[Any]) –
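
A minimal sketch of this kind of check, assuming that expected lists the allowed values and that name is only used in the error message. The function name check_param_sketch and the message wording are hypothetical.

```python
from typing import Any, List


def check_param_sketch(name: str, value: Any, expected: List[Any], /) -> None:
    """Raise ValueError when `value` is not one of `expected` (hypothetical stand-in)."""
    if value not in expected:
        raise ValueError(
            f"Unexpected {name}: {value!r}, expected one of {expected!r}"
        )


networks = ["BioGRID", "STRING", "STRING-EXP", "GIANT-TN"]
check_param_sketch("net_type", "STRING", networks)  # a valid value passes silently

try:
    check_param_sketch("net_type", "Unknown", networks)
    invalid_raised = False
except ValueError:
    invalid_raised = True
```

The trailing / in the signature makes all three arguments positional-only, so they cannot be passed by keyword.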

geneplexus.util.format_choices(choices)

Convert list of str to choices format.

Return type:

str

Parameters:

choices (List[str]) –

geneplexus.util.get_all_filenames()

Iterate over filenames.

Return type:

Generator[str, None, None]

geneplexus.util.get_all_gscs(file_loc)

Return list of GSCs found in the data directory.

Return type:

List[str]

Parameters:

file_loc (str | None) –

Note

Only the full GSC files (names starting with GSCOriginal) are checked, not the network-specific ones (goodsets and universe).

geneplexus.util.get_all_net_types(file_loc)

Return list of networks found in the data directory.

Return type:

List[str]

Parameters:

file_loc (str | None) –

Note

Only the node ordering files (names starting with NodeOrder) are checked.

geneplexus.util.load_correction_mat(file_loc, gsc, target_set, net_type, features)

Load correction matrix.

Return type:

ndarray

Parameters:
  • file_loc (str) – Location of data files.

  • gsc (Literal['GO', 'DisGeNet']) – Gene set collection.

  • target_set (Literal['GO', 'DisGeNet']) – Target gene set collection.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

  • features (Literal['Adjacency', 'Embedding', 'Influence']) – Type of features used.

geneplexus.util.load_correction_order(file_loc, target_set, net_type)

Load correction matrix order.

Return type:

ndarray

Parameters:
  • file_loc (str) – Location of data files.

  • target_set (Literal['GO', 'DisGeNet']) – Target gene set collection.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

geneplexus.util.load_gene_features(file_loc, features, net_type)

Load gene features.

Return type:

ndarray

Parameters:
  • file_loc (str) – Location of data files.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

  • features (Literal['Adjacency', 'Embedding', 'Influence']) – Type of features used.

geneplexus.util.load_geneid_conversion(file_loc, src_id_type, dst_id_type, upper=False)

Load the gene ID conversion mapping.

Return type:

Dict[str, List[str]]

Parameters:
  • file_loc (str) – Directory containing the ID conversion file.

  • src_id_type (Literal['ENSG', 'ENSP', 'ENST', 'Entrez', 'Symbol']) – Source gene ID type.

  • dst_id_type (Literal['Entrez', 'ENSG', 'Name', 'Symbol']) – Destination gene ID type.

  • upper (bool) – If set to True, then convert all keys to upper case.
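
To illustrate the upper option, here is a sketch that upper-cases the keys of a mapping that has already been loaded. It does not read any GenePlexus data file. The helper name upper_keys is hypothetical, and merging the value lists of keys that collide after upper-casing is an assumption, since the source does not say how collisions are handled.

```python
from typing import Dict, List


def upper_keys(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of `mapping` with all keys upper-cased (hypothetical helper)."""
    converted: Dict[str, List[str]] = {}
    for key, values in mapping.items():
        # Assumption: keys that collide after upper-casing have their lists merged.
        converted.setdefault(key.upper(), []).extend(values)
    return converted


# Illustrative Symbol -> Entrez mapping with mixed-case keys.
symbol_to_entrez = {"brca1": ["672"], "Tp53": ["7157"]}
normalized = upper_keys(symbol_to_entrez)
```

Upper-casing the keys makes lookups insensitive to the case of the query, provided the query is upper-cased the same way.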

geneplexus.util.load_genes_universe(file_loc, gsc, net_type)

Load the gene universe for a given network and GSC.

Return type:

ndarray

Parameters:
  • file_loc (str) – Location of data files.

  • gsc (Literal['GO', 'DisGeNet']) – Gene set collection.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

geneplexus.util.load_gsc(file_loc, gsc, net_type)

Load gene set collection dictionary.

Return type:

Dict[str, Dict[Literal['Name', 'Genes'], Union[str, ndarray]]]

Parameters:
  • file_loc (str) – Location of data files.

  • gsc (Literal['GO', 'DisGeNet']) – Gene set collection.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

geneplexus.util.load_node_order(file_loc, net_type)

Load network genes.

Return type:

ndarray

Parameters:
  • file_loc (str) – Location of data files.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

geneplexus.util.load_pretrained_weights(file_loc, target_set, net_type, features)

Load pretrained model dictionary.

Return type:

Dict[str, Dict[Literal['Name', 'Weights', 'PosGenes'], Union[str, ndarray]]]

Parameters:
  • file_loc (str) – Location of data files.

  • target_set (Literal['GO', 'DisGeNet']) – Target gene set collection.

  • net_type (Literal['BioGRID', 'STRING', 'STRING-EXP', 'GIANT-TN']) – Network used.

  • features (Literal['Adjacency', 'Embedding', 'Influence']) – Type of features used.

geneplexus.util.mapgene(gene, entrez_to_other)

Map an Entrez gene ID to other representations.

Return type:

str

Parameters:
  • gene (str) – Entrez gene ID.

  • entrez_to_other (Dict[str, List[str]]) – Mapping from Entrez to list of other gene representations of interest.

Returns:

Gene representation corresponding to the gene Entrez ID.

Note

Mapping from a single Entrez to multiple representations is allowed and the representations will be separated by ‘/’.
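
The one-to-many behavior in the note can be sketched as below. The function name mapgene_sketch and the gene IDs are hypothetical placeholders. What the library returns for an Entrez ID that is absent from the mapping is not stated in the source, so this sketch leaves that case unhandled (it raises KeyError).

```python
from typing import Dict, List


def mapgene_sketch(gene: str, entrez_to_other: Dict[str, List[str]]) -> str:
    """Join all representations of `gene` with '/' (hypothetical stand-in)."""
    return "/".join(entrez_to_other[gene])


# Placeholder IDs for illustration only, not real gene identifiers.
entrez_to_symbol = {
    "1": ["GENE_A"],
    "2": ["GENE_B1", "GENE_B2"],
}

single = mapgene_sketch("1", entrez_to_symbol)    # "GENE_A"
multiple = mapgene_sketch("2", entrez_to_symbol)  # "GENE_B1/GENE_B2"
```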

geneplexus.util.normexpand(path, create=True)

Normalize then expand path and optionally create dir.

Return type:

str

Parameters:
  • path (str) –

  • create (bool) –
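
A rough sketch of "normalize then expand", assuming that normalization means os.path.normpath, that expansion means os.path.expanduser, and that create=True creates the directory with os.makedirs. The name normexpand_sketch is hypothetical, and the library may differ in any of these details.

```python
import os
import tempfile


def normexpand_sketch(path: str, create: bool = True) -> str:
    """Normalize, then expand `path`; optionally create it as a directory."""
    new_path = os.path.expanduser(os.path.normpath(path))
    if create:
        # exist_ok=True makes repeated calls safe.
        os.makedirs(new_path, exist_ok=True)
    return new_path


with tempfile.TemporaryDirectory() as tmp:
    # "a/.." collapses during normalization, so only "<tmp>/data" is created.
    created = normexpand_sketch(os.path.join(tmp, "a", "..", "data"))
    expected_path = os.path.normpath(os.path.join(tmp, "data"))
    created_is_dir = os.path.isdir(created)

    # With create=False the path is returned but nothing is created.
    skipped = normexpand_sketch(os.path.join(tmp, "other"), create=False)
    skipped_exists = os.path.exists(skipped)
```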

geneplexus.util.read_gene_list(path, sep='newline')

Read gene list from file.

Return type:

List[str]

Parameters:
  • path (str) – Path to the input gene list file.

  • sep (str | None) – Separator between genes (default: “newline”).
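
As an illustration, a gene list reader along these lines could look like the sketch below. Only the "newline" keyword is documented in the source. Treating any other string as a literal delimiter, treating None as "split on any whitespace", and dropping empty entries are all assumptions, and read_gene_list_sketch is a hypothetical name.

```python
import os
import tempfile
from typing import List, Optional


def read_gene_list_sketch(path: str, sep: Optional[str] = "newline") -> List[str]:
    """Read genes from `path`, split on `sep` (hypothetical stand-in)."""
    with open(path) as f:
        text = f.read()
    if sep is None:
        tokens = text.split()  # assumption: None means any whitespace
    else:
        delimiter = "\n" if sep == "newline" else sep
        tokens = text.split(delimiter)
    # Strip surrounding whitespace and drop empty entries.
    return [token.strip() for token in tokens if token.strip()]


with tempfile.TemporaryDirectory() as tmp:
    newline_file = os.path.join(tmp, "genes_newline.txt")
    with open(newline_file, "w") as f:
        f.write("672\n7157\n\n")
    genes_newline = read_gene_list_sketch(newline_file)

    comma_file = os.path.join(tmp, "genes_comma.txt")
    with open(comma_file, "w") as f:
        f.write("672, 7157\n")
    genes_comma = read_gene_list_sketch(comma_file, sep=",")
```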

geneplexus.util.timeout(timeout, msg='')

Timeout decorator using thread join timeout.

Parameters:
  • timeout (int) – Max function execution time in seconds.

  • msg (str) –
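
The general technique of a timeout decorator built on a thread join timeout can be sketched as follows. This is not the library's code: the name timeout_sketch is hypothetical, and raising the built-in TimeoutError with msg as its message is an assumption, since the source does not state which exception is raised or how msg is used.

```python
import functools
import threading
import time


def timeout_sketch(timeout: int, msg: str = ""):
    """Decorator: run the function in a thread and give up after `timeout` seconds."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outcome = {}

            def target():
                try:
                    outcome["value"] = func(*args, **kwargs)
                except BaseException as exc:  # re-raised in the caller below
                    outcome["error"] = exc

            # Daemon thread: it will not block interpreter exit if it overruns.
            thread = threading.Thread(target=target, daemon=True)
            thread.start()
            thread.join(timeout)  # wait at most `timeout` seconds
            if thread.is_alive():
                raise TimeoutError(msg or f"{func.__name__} exceeded {timeout}s")
            if "error" in outcome:
                raise outcome["error"]
            return outcome["value"]

        return wrapper

    return decorator


@timeout_sketch(1, msg="too slow")
def square(x):
    return x * x


@timeout_sketch(1, msg="too slow")
def sleepy():
    time.sleep(5)
```

Note that a join timeout only stops the caller from waiting. The worker thread itself is not interrupted and keeps running in the background until the function returns or the process exits.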