Ontology Class

class ddot.Ontology(hierarchy, mapping, edge_attr=None, node_attr=None, parent_child=False, add_root_name=None, propagate=None, ignore_orphan_terms=False, verbose=True, **kwargs)

A Python representation for constructing, analyzing, and manipulating the hierarchical structure of ontologies.

An Ontology object contains the following attributes for representing the hierarchical structure. Do not directly modify these attributes.

Attributes:
  • genes (list) – Names of genes
  • terms (list) – Names of terms
  • gene_2_term (dict) – gene_2_term[<gene>] –> list of terms connected to <gene>. Terms are represented as their 0-based index in self.terms.
  • term_2_gene (dict) – term_2_gene[<term>] –> list of genes connected to <term>. Genes are represented as their 0-based index in self.genes.
  • child_2_parent (dict) – child_2_parent[<child>] –> list of the parent terms of <child>
  • parent_2_child (dict) – parent_2_child[<parent>] –> list of the children terms of <parent>
  • term_sizes (list) –

    A list of every term’s size, i.e. the number of unique genes that it and its descendant terms contain. This list has the same order as self.terms. It holds that for every i,

    term_sizes[i] = len(self.term_2_gene[self.terms[i]])
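
The index-based layout of these attributes can be illustrated with plain Python containers. The sketch below is not ddot code: the gene and term names are made up, the hierarchy dictionaries are assumed to be keyed by term name, and the gene-term annotations are assumed to be already propagated, so that every term lists all genes beneath it.

```python
# Hypothetical toy data; all names are illustrative only.
genes = ["geneA", "geneB", "geneC"]
terms = ["term1", "term2", "root"]

# gene -> list of 0-based indices into `terms`
gene_2_term = {
    "geneA": [0, 2],
    "geneB": [0, 1, 2],
    "geneC": [1, 2],
}

# Invert the mapping: term -> list of 0-based indices into `genes`
term_2_gene = {term: [] for term in terms}
for gene_idx, gene in enumerate(genes):
    for term_idx in gene_2_term[gene]:
        term_2_gene[terms[term_idx]].append(gene_idx)

# Term hierarchy (assumed here to be keyed by term name)
child_2_parent = {"term1": ["root"], "term2": ["root"], "root": []}
parent_2_child = {term: [] for term in terms}
for child, parents in child_2_parent.items():
    for parent in parents:
        parent_2_child[parent].append(child)

# Same order as `terms`
term_sizes = [len(term_2_gene[term]) for term in terms]
print(term_sizes)
```

Because genes and terms are stored as positions rather than names, the two mappings stay compact, but any reordering of genes or terms would invalidate them, which is one reason the attributes should not be modified directly.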

Read/write Ontology objects

classmethod Ontology.from_table(table, parent=0, child=1, is_mapping=None, mapping=None, mapping_parent=0, mapping_child=1, header=0, propagate=False, verbose=False, clixo_format=False, clear_default_attr=True, **kwargs)

Ontology.to_table(output=None, term_2_term=True, gene_2_term=True, edge_attr=False, header=True, parent_child=True, clixo_format=False)

Convert Ontology to a table representation. Returns a pandas.DataFrame and, optionally, writes it to a tab-delimited file.

Parameters:
  • output (filepath or file-like) – File to write table. If None, then only return a pandas.DataFrame
  • term_2_term (bool) – Include (child term, parent term) pairs
  • gene_2_term (bool) – Include (gene, term) pairs
  • edge_attr (array-like or bool) – List of extra edge attributes to include. If True, then include all attributes. If False, then don’t include any attribute.
  • header (bool) – If True (default), then write the column names as the first row of the table.
  • parent_child (bool) – If True, then the first column is the parent term and the second column is the child term or gene. If False, then the columns are reversed.
  • clixo_format (bool) –

    If True, the table is in the same format used by the CLIXO C++ implementation. In particular, the table has three columns:

    Column 1) Parent Term
    Column 2) Child Term or Gene
    Column 3) The string “gene” if the row is a gene-term mapping, otherwise the string “default”.
Returns:

Contains at least three columns: (1) “Parent”, (2) “Child”, and (3) “EdgeType”.

Return type:

pandas.DataFrame
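
The three-column layout described for clixo_format can be sketched with the standard library alone. This is an illustration of the table shape, not ddot's implementation: the edges are hypothetical, and the header row simply reuses the column names listed under Returns.

```python
import csv
import io

# Hypothetical edges, written as (parent, child) pairs.
term_term = [("root", "term1"), ("root", "term2")]
gene_term = [("term1", "geneA"), ("term2", "geneB")]

# Third column: "default" for term-term rows, "gene" for gene-term rows.
rows = [(parent, child, "default") for parent, child in term_term]
rows += [(term, gene, "gene") for term, gene in gene_term]

# Write a tab-delimited table to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Parent", "Child", "EdgeType"])
writer.writerows(rows)

table_text = buffer.getvalue()
print(table_text)
```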

classmethod Ontology.read_pickle(file, compression='infer')

Loads an Ontology object from a pickled state.

Ontology.to_pickle(file, compression='infer')

Saves Ontology object with the Python pickle protocol.

Ontology.to_ndex(ndex_user, ndex_pass, ndex_server=None, name=None, description=None, network=None, main_feature=None, subnet_max_term_size=None, visible_term_attr=None, layout='bubble', propagate='reverse', style=None, node_alias='Original_Name', term_2_uuid=None, visibility='PUBLIC', verbose=False)

Upload an Ontology object to NDEx. The Ontology can be preformatted in several ways, including:

  1. Set a name and description of the Ontology
  2. Upload a supporting gene-gene subnetwork for every term in the Ontology
  3. Propagate gene-term annotations
  4. Lay out the nodes
  5. Apply a visual style, e.g. specifying node and edge colors
Parameters:
  • name (str) – Name of Ontology
  • description (str) – Description of Ontology
  • layout (str) – The name of the layout algorithm for laying out the Ontology as a graph. Node positions are stored in the node attributes ‘x_pos’ and ‘y_pos’. If None, then do not perform a layout.
  • style (ndex.networkn.NdexGraph) – The Cytoscape.js visual style on NDEx. Represented using CX and stored in an NdexGraph.
  • network (pandas.DataFrame) – DataFrame describing the gene-gene network from which to create subnetworks for every term. To be passed to Ontology.upload_subnets_ndex().
  • features (list of str) – Columns in the gene-gene network to upload. To be passed to Ontology.upload_subnets_ndex().
  • ndex_server (str) – URL of NDEx server
  • ndex_user (str) – NDEx username
  • ndex_pass (str) – NDEx password
  • public (bool) – Whether to make the Ontology public on NDEx
  • node_alias (str) –
  • visibility (str) –
Returns:

Return type:

ndex.networkn.NdexGraph

classmethod Ontology.from_ndex(ndex_uuid, ndex_user=None, ndex_pass=None, ndex_server=None, edgetype_attr=None, edgetype_value=None)

Reads an Ontology stored on NDEx. Genes and terms are distinguished by an edge attribute.

Parameters:
  • ndex_uuid (str) – NDEx UUID of ontology
  • edgetype_attr (str) – Name of the edge attribute that distinguishes a (gene, term) pair from a (child term, parent term) pair
  • edgetype_value (str) – Value of the edge attribute for (gene, term) pairs
Returns:

Return type:

ddot.Ontology.Ontology

Ontology.to_cx(output=None, name=None, description=None, term_2_uuid=None, spanning_tree=True, layout='bubble', style=None)

Formats an Ontology object into the CX file format.

Parameters:
  • output (str) – Filename or file-like object to write CX file. If None, then CX is returned as a JSON object, but not written to a file.
  • name (str) – Name of Ontology, as would appear if uploaded to NDEx.
  • description (str) – Description of Ontology, as would appear if uploaded to NDEx.
  • term_2_uuid (dict) –

    A dictionary mapping a term to the NDEx UUID of a gene-gene subnetwork of genes in that term. The UUID will be stored in the node attribute ‘ndex:internallink’. If uploaded to NDEx, then this attribute will provide a hyperlink to the gene-gene subnetwork when the term is clicked upon on the NDEx page for this ontology.

    This dictionary can be created using Ontology.upload_subnets_ndex(). Default: no dictionary.

  • layout (str) – Lay out the genes and terms in this Ontology. Node positions are stored in the node attributes ‘x_pos’ and ‘y_pos’. If None, then do not perform a layout.
Returns:

Return type:

CX representation as a JSON-like dictionary

Ontology.to_graphml(output, layout='bubble', spanning_tree=True)

Writes an Ontology object in GraphML format.

Parameters:
  • output (str) – Filename or file-like object to write the GraphML file.
  • layout (str) – Lay out the genes and terms in this Ontology. Node positions are stored in the node attributes ‘x_pos’ and ‘y_pos’. If None, then do not perform a layout.

NetworkX and igraph

Ontology.to_networkx(layout='bubble', spanning_tree=True, layout_params=None, verbose=False)

Converts Ontology into a NetworkX object.

Parameters:
  • node_attr (pandas.DataFrame) – Meta-data about genes and terms that will be included as node attributes in the NetworkX object.
  • edge_attr (pandas.DataFrame) – Meta-data about connections among genes and terms that will be included as edge attributes in the NetworkX object.
  • spanning_tree (bool) – If True, then identify a spanning tree of the DAG and include an edge attribute “Is_Tree_Edge” that indicates whether each edge belongs to the spanning tree.
  • layout (str) – The name of the layout algorithm for laying out the Ontology as a graph. Node positions are stored in the node attributes ‘x_pos’ and ‘y_pos’. If None, then do not perform a layout.
Returns:

Return type:

nx.DiGraph

classmethod Ontology.from_networkx(G, edgetype_attr=None, edgetype_value=None, clear_default_attr=True)

Converts a NetworkX object to an Ontology object. Genes and terms are distinguished by an edge attribute.

Parameters:
  • G (nx.DiGraph) –
  • edgetype_attr (str) – Name of the edge attribute that distinguishes a (gene, term) pair from a (child term, parent term) pair
  • edgetype_value (str) – Value of the edge attribute for (gene, term) pairs
  • clear_default_attr (bool) – If True (default), then remove the node and edge attributes that are created in a NetworkX graph using Ontology.to_networkx() or Ontology.to_ndex(). These attributes include ‘Label’, ‘Size’, ‘NodeType’, and ‘EdgeType’. They were created to make the NetworkX graph an equivalent representation of an Ontology object; however, they are no longer necessary after reconstructing the Ontology object.
Returns:

Return type:

ddot.Ontology.Ontology

classmethod Ontology.from_igraph(G, edgetype_attr=None, edgetype_value=None, verbose=False)

Converts an igraph Graph object to an Ontology object. Genes and terms are distinguished by an edge attribute.

Parameters:
  • G (igraph.Graph) –
  • edgetype_attr (str) – Name of the edge attribute that distinguishes a (gene, term) pair from a (child term, parent term) pair
  • edgetype_value (str) – Value of the edge attribute for (gene, term) pairs
Returns:

Return type:

ddot.Ontology.Ontology

Ontology.to_igraph(include_genes=True, spanning_tree=False)

Convert Ontology to an igraph.Graph object. Gene and term names are stored in the ‘name’ vertex attribute of the igraph object.
Parameters:
  • include_genes (bool) – Include genes as vertices in the igraph object.
  • spanning_tree (bool) – If True, then identify a spanning tree of the DAG and include an edge attribute “Is_Tree_Edge” that indicates whether each edge belongs to the spanning tree.
Returns:

Return type:

igraph.Graph

Inspecting structure

Ontology.connected(descendants=None, ancestors=None, sparse=False)

Calculate which genes or terms are descendants of other genes or terms.

Parameters:
  • descendants (list) – A list of genes and/or terms. Default: A list of all genes followed by a list of all terms, in the same order as self.genes and self.terms.
  • ancestors (list) – A list of genes and/or terms. Default: Same as the descendants parameter.
  • sparse (bool) – If True, return a scipy.sparse matrix. If False (default), return a NumPy array.
Returns:

d – A descendants-by-ancestors matrix. d[i,j] is 1 if term i is a descendant of term j, and 0 otherwise. Note that d[i,i]==1 for every i, and d[root,i]==0 for every i other than the root.

Return type:

np.ndarray or scipy.sparse.matrix
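
The shape of the descendants-by-ancestors matrix can be seen in a small standalone sketch. It uses plain nested lists rather than NumPy, a hypothetical four-node chain, and a simple graph search; it makes no claim about how ddot computes the matrix internally.

```python
# Hypothetical chain: geneA -> term1 -> term2 -> root (child -> parents).
parents = {
    "geneA": ["term1"],
    "term1": ["term2"],
    "term2": ["root"],
    "root": [],
}
nodes = ["geneA", "term1", "term2", "root"]


def ancestors_of(node):
    """Return the node itself plus everything reachable via parent links."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# d[i][j] == 1 when nodes[i] is a descendant of (or equal to) nodes[j]
d = [[1 if col in ancestors_of(row) else 0 for col in nodes] for row in nodes]
for name, row in zip(nodes, d):
    print(name, row)
```

The diagonal is all ones because every node counts as its own descendant, and the root's row is zero everywhere except on the diagonal.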

Ontology.get_best_ancestors(node_order=None, verbose=False, include_genes=True)

Compute the ‘best’ ancestor for every pair of terms. ‘Best’ is specified by a ranking of terms. For example, if terms are ranked by size, from smallest to largest, then the smallest common ancestor is calculated.

Parameters:
  • node_order (list) – A list of terms, ordered by their rank with the ‘best’ term at the beginning.
  • include_genes (bool) –
Returns:

  • ancestors (np.ndarray) – ancestors[a,b] = the best common ancestor of terms a and b, represented as a 0-based index of self.terms
  • nodes (list) – List of the row and column names. Rows and columns are the same.

Ontology.topological_sorting(top_down=True, include_genes=False)

Perform a topological sorting.

Parameters:
  • top_down (bool) – If True, then ancestral nodes (e.g. the root nodes) come before descendants in the sorting. If False, then reverse the sorting.
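
The effect of top_down can be illustrated with the standard-library graphlib module (Python 3.9+). This is a sketch on a hypothetical hierarchy; the helper function below merely borrows the method's name and is not ddot's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical hierarchy, child -> parents.
child_2_parent = {
    "term1": ["root"],
    "term2": ["root"],
    "term3": ["term1", "term2"],
    "root": [],
}


def topological_sorting(child_2_parent, top_down=True):
    # TopologicalSorter emits every node after its listed predecessors,
    # so treating parents as predecessors gives an ancestors-first order.
    order = list(TopologicalSorter(child_2_parent).static_order())
    return order if top_down else order[::-1]


top_down_order = topological_sorting(child_2_parent)
bottom_up_order = topological_sorting(child_2_parent, top_down=False)
print(top_down_order)
print(bottom_up_order)
```

The relative order of term1 and term2 is not constrained by the hierarchy, so only the positions of root (first when top_down is True) and term3 (last) are determined.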

Manipulating structure

Ontology.unfold(duplicate=None, genes_only=False, levels=None, tree_edges=None)

Traverses the ontology from the root to the leaves while duplicating nodes during the traversal to create a tree representation.

The ontology is traversed from the root nodes to the leaves in a breadth-first manner. Each time a node is traversed, a duplicate of it is created.

Parameters:
  • duplicate (list) – Nodes to duplicate for unfolding. Default: all genes and terms
  • genes_only (bool) – If True, then duplicate all of the genes and none of the terms. Default: False
  • levels

Ontology.delete(to_delete=None, to_keep=None, preserve_transitivity=True, inplace=False)

Delete genes and/or terms from the ontology.

Parameters:
  • to_delete (array-like (optional)) – Names of genes and/or terms to delete. Either to_delete or to_keep must be specified.
  • to_keep (array-like (optional)) – Names of genes and/or terms to keep; all other genes/terms are deleted. Only used if to_delete is not specified.
  • preserve_transitivity (bool) –

    If True, then maintain transitive relations when deleting terms. For example, if the hierarchical structure consists of

    geneA –> term1
    term1 –> term2
    term2 –> term3
    term2 –> term4

    then deleting term2 will result in the structure:

    geneA –> term1
    term1 –> term3
    term1 –> term4

    If False, then deleting term2 will result in a disconnected structure:

    geneA –> term1

  • inplace (bool) – If True, then modify the ontology. If False, then create and modify a copy.
Returns:

Return type:

ddot.Ontology.Ontology
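
The difference between the two preserve_transitivity settings can be reproduced on a plain child-to-parents dictionary. The function below is a hypothetical standalone helper written for this illustration, not the ddot method; it rewires each child of the deleted term to that term's parents.

```python
def delete_term(child_2_parent, term, preserve_transitivity=True):
    """Return a new child -> parents mapping with `term` removed."""
    removed_parents = child_2_parent.get(term, [])
    result = {}
    for child, parents in child_2_parent.items():
        if child == term:
            continue
        new_parents = []
        for parent in parents:
            if parent != term:
                new_parents.append(parent)
            elif preserve_transitivity:
                # Reconnect the child to the deleted term's own parents.
                new_parents.extend(removed_parents)
        # Drop duplicates while keeping the original order.
        result[child] = list(dict.fromkeys(new_parents))
    return result


hierarchy = {
    "geneA": ["term1"],
    "term1": ["term2"],
    "term2": ["term3", "term4"],
    "term3": [],
    "term4": [],
}

kept = delete_term(hierarchy, "term2")
cut = delete_term(hierarchy, "term2", preserve_transitivity=False)
print(kept["term1"])
print(cut["term1"])
```

With transitivity preserved, term1 inherits both of term2's parents; without it, term1 is left with no parents and the structure above it is disconnected.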

Ontology.focus(branches=None, genes=None, collapse=False, root=True, verbose=True)

Ontology.propagate(direction='forward', gene_term=True, term_term=False, verbose=False, inplace=False)

Propagate gene-term annotations through the ontology.

As an example, consider an ontology with one gene g, three terms t1, t2, t3 and the following connections:

t1-->t2
t2-->t3
g-->t1
g-->t2

In “forward” propagation, a new relation g-->t3 is added. In “reverse” propagation, the relation g-->t2 is deleted because it is an indirect relation inferred from g-->t1 and t1-->t2.

Parameters:
  • direction (str) – The direction of propagation. Either ‘forward’ or ‘reverse’
  • inplace (bool) – If True, then modify the ontology. If False, then create and modify a copy.
Returns:

Return type:

ddot.Ontology.Ontology
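
The example with g, t1, t2 and t3 can be worked through with sets. The functions below are a standalone sketch for gene-term propagation only; they are not the ddot implementation and ignore the gene_term, term_term and inplace options.

```python
# The example ontology: t1-->t2, t2-->t3, g-->t1, g-->t2
child_2_parent = {"t1": ["t2"], "t2": ["t3"], "t3": []}
gene_2_term = {"g": {"t1", "t2"}}


def term_ancestors(term):
    """Return all strict ancestors of a term."""
    found = set()
    stack = list(child_2_parent[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(child_2_parent[current])
    return found


def propagate(gene_2_term, direction="forward"):
    result = {}
    for gene, terms in gene_2_term.items():
        # Terms implied by the ancestors of the gene's annotated terms
        inherited = set()
        for term in terms:
            inherited |= term_ancestors(term)
        if direction == "forward":
            result[gene] = terms | inherited   # add implied relations
        elif direction == "reverse":
            result[gene] = terms - inherited   # drop implied relations
        else:
            raise ValueError("direction must be 'forward' or 'reverse'")
    return result


forward = propagate(gene_2_term, "forward")
reverse = propagate(gene_2_term, "reverse")
print(sorted(forward["g"]))
print(sorted(reverse["g"]))
```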

Inferring data-driven ontology

Ontology.flatten(include_genes=True, include_terms=False, similarity='Resnik')

Flatten the hierarchy into a node-node similarity matrix by calculating a similarity between each pair of genes in genes_subset. Currently, only the Resnik semantic similarity measure is implemented.

Parameters:
  • include_genes (bool) – If True, then calculate pairwise similarities between genes. If include_terms is also True, then also calculate similarities between genes and terms.
  • include_terms (bool) – If True, then calculate pairwise similarities between terms. If include_genes is also True, then also calculate similarities between genes and terms.
  • similarity (str) –

    Type of semantic similarity. (default: “Resnik”)

    The Resnik similarity s(g1,g2) is defined as \(-\log_2(|T_{sca}| / |T_{root}|)\) where \(|T|\) is the number of genes in genes_subset that are under term T. \(T_{sca}\) is the “smallest common ancestor”, the common ancestral term with the smallest term size. \(T_{root}\) is the root term of the ontology.

    Resnik, P. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95-130.

Returns:

A 2-tuple consisting of sim, a node-by-node NumPy array, and nodes, a NumPy array of the node names in sim.

Return type:

(sim, nodes)
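
The Resnik formula can be checked by hand on a tiny ontology. The sketch below is a standalone illustration with hypothetical gene and term names, assuming term-to-gene sets that are already propagated; it is not the routine used by flatten.

```python
import math

# Hypothetical propagated annotations: term -> set of genes under it.
term_2_gene = {
    "root": {"g1", "g2", "g3", "g4"},
    "termA": {"g1", "g2"},
    "termB": {"g3", "g4"},
}


def resnik(gene1, gene2, root="root"):
    # Common ancestral terms are those that contain both genes.
    common = [genes for genes in term_2_gene.values()
              if gene1 in genes and gene2 in genes]
    # Smallest common ancestor: the common term with the fewest genes.
    sca_size = len(min(common, key=len))
    root_size = len(term_2_gene[root])
    # log2(root / sca) is equal to -log2(sca / root).
    return math.log2(root_size / sca_size)


sim_same_term = resnik("g1", "g2")
sim_only_root = resnik("g1", "g3")
print(sim_same_term, sim_only_root)
```

Genes that share a small term score higher than genes whose only common ancestor is the root, for which the similarity is zero.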

classmethod Ontology.run_clixo(graph, alpha=0.0, beta=None, newman_modularity=None, miyauchi_modularity=None, stop_score=None, min_dt=-10000000, timeout=100000000, square=False, square_names=None, output=None, output_log=None, clixo_cmd=None, clixo_version=None, verbose=False, debug=False)

Runs the CLIXO algorithm and returns the result as an Ontology object.

Acts as a wrapper for the C++ packages for CLIXO v0.3 (https://mhk7.github.io/clixo_0.3/) and v1.0 (https://github.com/fanzheng10/CliXO).

Parameters:
  • graph (pandas.DataFrame) – Gene-gene similarities represented as a 3-column pandas.DataFrame (sparse) or square-shaped pandas.DataFrame (square format)
  • alpha (float) – CLIXO alpha parameter
  • beta (float) – CLIXO beta parameter
  • min_dt (float) – Minimum similarity score
  • timeout (int) – Maximum time (in seconds) allowed to run CLIXO
  • square (bool) – If True, then <graph> is interpreted as a square-shaped DataFrame. Otherwise, it is interpreted as a 3-column DataFrame.
  • square_names (array-like (optional)) – The names of the rows and columns in <graph>. Only used when square==True.
  • output (str) – Filename to write the resulting Ontology as a table. Default: don’t write to file
  • output_log (str) – Filename to write log information from CLIXO. Default: don’t write to file
Returns:

Return type:

ddot.Ontology.Ontology
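
The two accepted shapes of the graph argument carry the same information. The sketch below uses plain lists instead of pandas and hypothetical similarity scores to show how a square matrix corresponds to 3-column rows; keeping each unordered pair once and dropping the diagonal is an assumption made for this illustration.

```python
# Square format: a symmetric matrix plus the row/column names.
names = ["g1", "g2", "g3"]
square = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.4],
    [0.1, 0.4, 1.0],
]

# 3-column (sparse) format: one (gene, gene, similarity) row per pair,
# taken from the upper triangle of the square matrix.
sparse = [
    (names[i], names[j], square[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
]
for row in sparse:
    print(row)
```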

Aligning ontologies

Ontology.align(hier, iterations=100, threads=None, update_self=False, update_ref=False, align_label=None, calculateFDRs=None, mutual_collapse=True, output=None, verbose=False)

Identifies one-to-one matches between terms in this ontology and highly similar terms in another ontology.

This function wraps around the C++ code in the alignOntology package by Michael Kramer at https://github.com/mhk7/alignOntology

Reference:

Dutkowski, J., Kramer, M., Surma, M.A., Balakrishnan, R., Cherry, J.M., Krogan, N.J. and Ideker, T., 2013. “A gene ontology inferred from molecular networks.” Nature biotechnology, 31(1).

Parameters:
  • hier (ddot.Ontology.Ontology) – The ontology to align against.
  • iterations (int) – The number of null model randomizations used to compute the FDR score.
  • threads (int) – Number of CPU processes to run simultaneously. Used to parallelize the null model randomizations. Default: The number of CPU cores returned by multiprocessing.cpu_count()
  • update_self (bool) – If True, then import the node attributes from the reference hierarchy as attributes in this hierarchy
  • update_ref (bool) – If True, then import the node attributes from this hierarchy as attributes in the reference hierarchy
  • mutual_collapse (bool) – If True, then remove genes that are unique to either ontology, and then remove redundant terms in both ontologies using Ontology.collapse_ontology().
  • calculateFDRs (str) – Filename of the ‘calculateFDRs’ script in the alignOntology C++ package at https://github.com/mhk7/alignOntology. Default: use the ‘calculateFDRs’ script that comes built-in with ddot.
  • output (str) – Filename to write the results of the alignment as a tab-delimited file. Default: don’t write to a file
Returns:

DataFrame whose index contains the names of terms in this ontology. There are three columns: ‘Term’ (name of the aligned term), ‘Similarity’ (the similarity score for the alignment), ‘FDR’ (the FDR of this alignment given the null models).

Return type:

pandas.DataFrame