Utility Functions
-
ddot.utils.NdexGraph_to_nx(G)
Converts an NdexGraph object into a NetworkX graph.
Parameters: G (ndex.networkn.NdexGraph) – The NdexGraph to convert
Returns: The converted graph
Return type: networkx.classes.DiGraph
-
ddot.utils.bubble_layout_nx(G, xmin=-750, xmax=750, ymin=-750, ymax=750, verbose=False)
Bubble-tree layout using the Tulip library.
The input graph must be a tree. The layout is scaled so that it fits exactly within the bounding box defined by xmin, xmax, ymin, and ymax.
Grivet, S., Auber, D., Domenger, J. P., & Melancon, G. (2006). Bubble tree drawing algorithm. In Computer Vision and Graphics (pp. 633-641). Springer Netherlands.
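Parameters: - G (networkx.Graph) – Input graph, which must be a tree
- xmin, xmax, ymin, ymax (float, optional) – Bounds of the bounding box
Returns: Dictionary mapping nodes to 2D coordinates, i.e. pos[node_name] -> (x, y)
Return type: dict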
-
ddot.utils.color_gradient(ratio, min_col='#FFFFFF', max_col='#D65F5F', output_hex=True)
Calculate a proportional mix between two colors.
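A proportional mix between two colors can be computed by linearly interpolating each RGB channel. The following is a minimal pure-Python sketch, not ddot's implementation: the name mix_colors is hypothetical, and it assumes that ratio=0 yields min_col, ratio=1 yields max_col, out-of-range ratios are clamped, and output_hex=False returns an (r, g, b) tuple of integers.

```python
def mix_colors(ratio, min_col="#FFFFFF", max_col="#D65F5F", output_hex=True):
    """Hypothetical sketch: linearly interpolate between two '#RRGGBB' colors."""
    # Assumption: ratios outside [0, 1] are clamped
    ratio = min(max(ratio, 0.0), 1.0)
    # Parse the red, green, and blue channels from each hex string
    lo = [int(min_col[i:i + 2], 16) for i in (1, 3, 5)]
    hi = [int(max_col[i:i + 2], 16) for i in (1, 3, 5)]
    # Move each channel the fraction `ratio` of the way from lo to hi
    rgb = tuple(round(a + ratio * (b - a)) for a, b in zip(lo, hi))
    if output_hex:
        return "#{:02X}{:02X}{:02X}".format(*rgb)
    return rgb  # assumption: non-hex output is an (r, g, b) tuple of ints
```

For example, mix_colors(0.5, "#000000", "#FFFFFF") returns "#808080", the midpoint gray.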
-
ddot.utils.create_edgeMatrix(X, X_cols, X_rows, verbose=True, G=None, ndex2=True)
Converts a NumPy array into an NdexGraph with a special CX aspect called “edge_matrix”. The array is serialized using base64 encoding.
Parameters: - X (np.ndarray) – Array to serialize
- X_cols (list) – Column names
- X_rows (list) – Row names
Return type: ndex.networkn.NdexGraph
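The base64 serialization step can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions, not the “edge_matrix” aspect format itself: encode_matrix and decode_matrix are hypothetical names, the values are assumed to be 64-bit floats stored row by row in native byte order, and the shape is returned separately because the raw bytes do not record it.

```python
import array
import base64


def encode_matrix(rows):
    """Hypothetical sketch: base64-encode a 2D list of floats."""
    n_rows, n_cols = len(rows), len(rows[0])
    # Flatten row by row into 64-bit floats ('d'), in the machine's native byte order
    flat = array.array("d", (v for row in rows for v in row))
    encoded = base64.b64encode(flat.tobytes()).decode("ascii")
    return encoded, (n_rows, n_cols)  # raw bytes carry no shape, so return it separately


def decode_matrix(encoded, shape):
    """Inverse of encode_matrix: decode the bytes and rebuild the rows."""
    flat = array.array("d")
    flat.frombytes(base64.b64decode(encoded))
    n_rows, n_cols = shape
    return [flat[i * n_cols:(i + 1) * n_cols].tolist() for i in range(n_rows)]
```

The base64 text is safe to embed in JSON-based formats such as CX, which is the usual reason for encoding binary array data this way.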
-
ddot.utils.expand_seed(seed, sim, sim_names, agg='mean', min_sim=-inf, filter_perc=None, seed_perc=None, agg_perc=0.5, expand_size=None, include_seed=True, figure=False, verbose=False)
Identify genes that are most similar to a seed set of genes.
A gene is included in the expanded set only if it meets all of the specified criteria. If include_seed is True, then genes that are seeds will be included regardless of the criteria. At the same time, the number of genes returned is still limited by expand_size. One way to get n novel genes returned is therefore to set expand_size = n + |seed| and include_seed = True, and then to remove the seed list from expand.
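Parameters: - seed (list) – Seed genes
- sim (np.ndarray) – Gene-by-gene similarity array
- sim_names (list of str) – Names of genes as they appear in the rows and columns of <sim>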
- agg (str or function) – Aggregation method. Possible values are mean, min, max, perc.
- min_sim (float) – Minimum similarity to the seed set.
- filter_perc (float) – Filter based on a percentile of similarities between all genes and the seed set.
- seed_perc (float) – Filter based on a percentile of similarities between seed set to itself.
- agg_perc (float) – The <agg_perc> percentile of similarities to the seed set. For example, if a gene has similarities of (0, 0.2, 0.4, 0.6, 0.8) to five seed genes, then the 10% similarity is 0.2
- expand_size (int) – Maximum limit on the number of returned genes.
- include_seed (bool) – Include the seed genes even if they didn’t meet the criteria.
- figure (bool) – Generate a figure showing the average distances within the seed set and the average distances between the seed set and the background.
Returns: - expand – The list of expanded genes passing all filters.
- expand_idx – Indices of the ranking. I.e. expand_idx[0] is the index of the top gene, so you can get the name of the top gene with sim_names[expand_idx[0]] where sim_names is the input parameter.
- sim_2_seed – Array of the calculated similarities of the genes to the seed set. So sim_2_seed[0] is the similarity of the gene
- fig – The generated figure. Can be saved like this: plt.savefig('foo.pdf')
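The selection rules above (every criterion must pass, seeds are kept when include_seed is True, and expand_size caps the result including the seeds) can be sketched in plain Python. This is a hypothetical simplification, not ddot's implementation: expand_seed_sketch supports only agg='mean' and min_sim, omits the percentile filters and the figure, takes the similarity array as nested lists, and assumes that seeds are listed first when include_seed is True.

```python
def expand_seed_sketch(seed, sim, sim_names, min_sim=float("-inf"),
                       expand_size=None, include_seed=True):
    """Hypothetical sketch of seed expansion with agg='mean' and min_sim only."""
    index = {name: i for i, name in enumerate(sim_names)}
    seed_idx = [index[g] for g in seed]
    # Mean similarity of every gene to the seed genes (agg='mean')
    sim_2_seed = [sum(sim[i][j] for j in seed_idx) / len(seed_idx)
                  for i in range(len(sim_names))]
    # Rank all genes from most to least similar to the seed set
    ranked = sorted(range(len(sim_names)), key=lambda i: sim_2_seed[i], reverse=True)
    # A gene passes only if it meets every criterion (here just min_sim)
    passing = [sim_names[i] for i in ranked if sim_2_seed[i] >= min_sim]
    if include_seed:
        # Assumption: seeds come first regardless of criteria and count toward expand_size
        seed_set = set(seed)
        expand = list(seed) + [g for g in passing if g not in seed_set]
    else:
        expand = passing
    return expand if expand_size is None else expand[:expand_size]
```

Under these assumptions, the recipe for n novel genes described above becomes: call it with expand_size=n + len(seed) and include_seed=True, then drop the seed genes from the result.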
-
ddot.utils.gridify(parents, pos, G)
Relayout leaf nodes into a grid.
Nodes must be connected and already laid out in “star”-like topologies. In each “star”, a set of nodes is positioned in a circle and connected to a common parent node positioned at the circle’s center.
This function repositions the nodes in each star into a square grid inscribed in the circle.
Parameters: - pos (dict) – Dictionary mapping node names to (x,y) coordinates
Returns: Modifies <pos> in place
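The geometry of the repositioning step can be sketched as follows. This is a hypothetical helper, not ddot's gridify: given a star's center and circle radius, grid_in_circle places n nodes at the cell centers of a ceil(sqrt(n)) by ceil(sqrt(n)) grid that fills the square inscribed in the circle, whose half-side is radius / sqrt(2). Assigning the returned points to a star's child nodes in pos would mimic the in-place update described above.

```python
import math


def grid_in_circle(center, radius, n):
    """Hypothetical sketch: n positions on a square grid inscribed in a circle."""
    if n <= 0:
        return []
    cx, cy = center
    side = math.ceil(math.sqrt(n))       # the grid has side x side cells
    half = radius / math.sqrt(2)         # half the side length of the inscribed square
    step = 2 * half / side               # width of one grid cell
    points = []
    for k in range(n):
        row, col = divmod(k, side)
        # Use cell centers, filling rows left to right, top to bottom
        points.append((cx - half + (col + 0.5) * step,
                       cy + half - (row + 0.5) * step))
    return points
```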
-
ddot.utils.ig_edges_to_pandas(G, attr_list=None)
Create a pandas.DataFrame of edge attributes of an igraph.Graph object.
Parameters: - G (igraph.Graph) –
- attr_list (list, optional) – Names of edge attributes. Default: all edge attributes
Returns: DataFrame whose index is a MultiIndex with two levels (u,v) referring to edges and whose columns refer to edge attributes.
Return type: pandas.DataFrame
-
ddot.utils.ig_nodes_to_pandas(G, attr_list=None)
Create a pandas.DataFrame of node attributes of an igraph.Graph object.
Parameters: - G (igraph.Graph) –
- attr_list (list, optional) – Names of node attributes. Default: all node attributes
Returns: DataFrame where index is the names of nodes and the columns are node attributes.
Return type: pandas.DataFrame
-
ddot.utils.ig_unfold_tree_with_attr(g, sources, mode)
Call igraph.Graph.unfold_tree while preserving vertex and edge attributes.
-
ddot.utils.invert_dict(dic, sort=True, keymap={}, valmap={})
Inverts a dictionary of the form
key1 : [val1, val2]
key2 : [val1]
to a dictionary of the form
val1 : [key1, key2]
val2 : [key1]
Parameters: dic (dict) – Dictionary mapping each key to a list of values
Returns: Inverted dictionary mapping each value to a list of keys
Return type: dict
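The inversion amounts to appending each key to the list of every value it maps to. A minimal sketch with the hypothetical name invert_mapping, assuming sort=True sorts each resulting list of keys; the keymap and valmap options are not modeled.

```python
def invert_mapping(dic, sort=True):
    """Hypothetical sketch: invert a dict of key -> list of values."""
    inverted = {}
    for key, values in dic.items():
        for val in values:
            # Collect every key that maps to this value
            inverted.setdefault(val, []).append(key)
    if sort:
        # Assumption: 'sort' orders each list of keys
        inverted = {val: sorted(keys) for val, keys in inverted.items()}
    return inverted
```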
-
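ddot.utils.load_edgeMatrix(ndex_uuid, ndex_server, ndex_user, ndex_pass, ndex=None, json=None, verbose=True)
Loads a NumPy array from an NdexGraph with a special CX aspect called “edge_matrix”.
Parameters: - ndex_uuid (str) – UUID of the network on NDEx
- ndex_server (str) – URL of NDEx server
- ndex_user (str) – NDEx username
- ndex_pass (str) – NDEx password
Returns: - X (np.ndarray) – The loaded array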
- X_cols (list) – Column names
- X_rows (list) – Row names
-
ddot.utils.make_index(it)
Create a dictionary mapping each element of an iterable to its index position.
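In plain Python this mapping is a dict comprehension over enumerate. A minimal sketch with the hypothetical name make_index_sketch; it assumes that for repeated elements the last position wins, which may differ from ddot.

```python
def make_index_sketch(it):
    """Hypothetical sketch: map each element to its position in the iterable."""
    # For repeated elements, later positions overwrite earlier ones
    return {x: i for i, x in enumerate(it)}
```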
-
ddot.utils.make_seed_ontology(sim, sim_names, expand_kwargs={}, build_kwargs={}, align_kwargs={}, ndex_kwargs={}, node_attr=None, verbose=False, ndex=True)
Assembles and analyzes a data-driven ontology to study a process or disease.
Parameters: - sim (np.ndarray) – gene-by-gene similarity array
- sim_names (array-like) – Names of genes as they appear in the rows and columns of <sim>
- expand_kwargs (dict) – Parameters for ddot.expand_seed() to identify an expanded set of genes
- build_kwargs (dict) – Parameters for Ontology.build_from_network(…) to build a data-driven ontology.
- align_kwargs (dict) – Parameters for Ontology.align() to align against a reference ontology.
- ndex_kwargs (dict) – Parameters for Ontology.to_ndex() to upload ontology to NDEx.
- node_attr (pd.DataFrame) – A DataFrame of node attributes to assign to the ontology.
- ndex (bool) – If True, then upload ontology to NDEx using parameters <ndex_kwargs>
-
ddot.utils.melt_square(df, columns=['Gene1', 'Gene2'], similarity='similarity', empty_value=0, upper_triangle=True)
Melts a square dataframe into a sparse representation.
Parameters: - df (pandas.DataFrame) – Square-shaped dataframe where df[i,j] is the value of edge (i,j)
- columns (iterable) – Column names for nodes in the output dataframe
- similarity (string) – Name of the column for edge values in the output dataframe
- empty_value – Not yet supported
- upper_triangle (bool) – Only use the values in the upper-right triangle (including the diagonal) of the input square dataframe
Returns: 3-column dataframe that provides a sparse representation of the edges. Two of the columns indicate the node name, and the third column indicates the edge value
Return type: pandas.DataFrame
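The melting step can be sketched without pandas by walking the upper triangle (including the diagonal) of a square matrix. This hypothetical helper operates on a list of names plus nested lists of values and returns (node1, node2, value) tuples; it assumes that zero-valued entries are treated as missing edges and omitted from the sparse output.

```python
def melt_square_sketch(names, values, upper_triangle=True):
    """Hypothetical sketch: turn a square matrix into (node1, node2, value) rows."""
    rows = []
    for i, a in enumerate(names):
        # Start at the diagonal for the upper triangle, else scan the full row
        start = i if upper_triangle else 0
        for j in range(start, len(names)):
            if values[i][j] != 0:  # assumption: zeros are treated as missing edges
                rows.append((a, names[j], values[i][j]))
    return rows
```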
-
ddot.utils.ndex_to_sim_matrix(ndex_url, ndex_server=None, ndex_user=None, ndex_pass=None, similarity=None, input_fmt='cx_matrix', output_fmt='matrix', subset=None, verbose=True)
Read a similarity network from NDEx and return it as either a square np.ndarray (compact representation) or a pandas.DataFrame of the non-zero similarity values (sparse representation).
Parameters: - ndex_url (str) – NDEx URL (or UUID) of the network
- ndex_server (str) – URL of NDEx server
- ndex_user (str) – NDEx username
- ndex_pass (str) – NDEx password
- similarity (str) – Name of the edge attribute that represents the similarity/weight between two nodes. If None, the edge attribute in the output is named ‘similarity’ and all edges are assumed to have a similarity value of 1.
- input_fmt (str) – Format of the network on NDEx: either ‘cx’ (standard CX format) or ‘cx_matrix’ (custom edgeMatrix aspect)
- output_fmt (str) – If ‘matrix’, return a NumPy array. If ‘sparse’, return a pandas.DataFrame
- subset (optional) –
Returns: Similarity values in the representation selected by output_fmt
Return type: np.ndarray or pandas.DataFrame
-
ddot.utils.nx_edges_to_pandas(G, attr_list=None)
Create a pandas.DataFrame of edge attributes of a NetworkX graph.
Parameters: - G (networkx.Graph) –
- attr_list (list, optional) – Names of edge attributes. Default: all edge attributes
Returns: DataFrame whose index is a MultiIndex with two levels (u,v) referring to edges and whose columns refer to edge attributes. For multi(di)graphs, the MultiIndex has three levels of the form (u, v, key).
Return type: pandas.DataFrame
-
ddot.utils.nx_nodes_to_pandas(G, attr_list=None)
Create a pandas.DataFrame of node attributes of a NetworkX graph.
Parameters: - G (networkx.Graph) –
- attr_list (list, optional) – Names of node attributes. Default: all node attributes
Returns: DataFrame where index is the names of nodes and the columns are node attributes.
Return type: pandas.DataFrame
-
ddot.utils.nx_to_NdexGraph(G_nx, discard_null=True)
Converts a NetworkX graph into an NdexGraph object.
Parameters: G_nx (networkx.Graph) – The NetworkX graph to convert
Returns: The converted graph
Return type: ndex.networkn.NdexGraph
-
ddot.utils.parse_ndex_uuid(ndex_url)
Extracts the NDEx UUID from a URL.
Parameters: ndex_url (str) – URL for a network stored on NDEx
Returns: UUID of the network
Return type: str
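A minimal sketch of the extraction, assuming the UUID appears in the URL in the standard 8-4-4-4-12 hexadecimal form and that the last such match is the network's UUID. The helper name parse_uuid_sketch and the example URL are illustrative, not taken from ddot or NDEx.

```python
import re

# Standard textual UUID layout: 8-4-4-4-12 hexadecimal digits
UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def parse_uuid_sketch(ndex_url):
    """Hypothetical sketch: return the last UUID-shaped substring of a URL."""
    matches = UUID_RE.findall(ndex_url)
    if not matches:
        raise ValueError(f"no UUID found in {ndex_url!r}")
    return matches[-1]
```

A bare UUID passes through unchanged, which matches the documented behavior elsewhere on this page of accepting either an NDEx URL or a UUID.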
-
ddot.utils.pivot_square(df, index, columns, values, fill_value=0)
Convert a dataframe into a square compact representation.
Parameters: df (pandas.DataFrame) – DataFrame in long format where every row represents one gene pair
Returns: df – DataFrame with gene-by-gene dimensions
Return type: pandas.DataFrame
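The conversion from long format to a square matrix can be sketched in plain Python. This hypothetical helper takes (gene1, gene2, value) tuples instead of a DataFrame, uses the sorted union of all genes as both the row and column labels so the result is square, fills absent pairs with fill_value, and assumes no symmetrization (the pair (a, b) does not also set (b, a)).

```python
def pivot_square_sketch(pairs, fill_value=0):
    """Hypothetical sketch: (gene1, gene2, value) rows -> square matrix."""
    # Same labels on both axes so the result is square
    names = sorted({g for a, b, _ in pairs for g in (a, b)})
    pos = {g: i for i, g in enumerate(names)}
    matrix = [[fill_value] * len(names) for _ in names]
    for a, b, value in pairs:
        matrix[pos[a]][pos[b]] = value  # assumption: no symmetrization
    return names, matrix
```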
-
ddot.utils.set_edge_attributes_from_pandas(G, edge_attr)
Modify edge attributes according to a pandas.DataFrame.
Parameters: - G (networkx.Graph) –
- edge_attr (pandas.DataFrame) –
-
ddot.utils.set_node_attributes_from_pandas(G, node_attr)
Modify node attributes according to a pandas.DataFrame.
Parameters: - G (networkx.Graph) –
- node_attr (pandas.DataFrame) –
-
ddot.utils.sim_matrix_to_NdexGraph(sim, names, similarity, output_fmt, node_attr=None)
Convert a similarity matrix into an NdexGraph object.
Parameters: - sim (np.ndarray) – Square-shaped NumPy array representing similarities
- names (list) – Gene names, in the same order as the rows and columns of sim
- similarity (str) – Edge attribute name for similarities in the resulting NdexGraph object
- output_fmt (str) – Either ‘cx’ (Standard CX format), or ‘cx_matrix’ (custom edgeMatrix aspect)
- node_attr (pandas.DataFrame, optional) – Node attributes, as a pandas.DataFrame, to be set in NdexGraph object
Return type: ndex.networkn.NdexGraph
-
ddot.utils.transform_pos(pos, xmin=-250, xmax=250, ymin=-250, ymax=250)
Transforms coordinates to fit a bounding box.
Parameters: - pos (dict) – Dictionary mapping node names to (x,y) coordinates
- xmin (float, optional) – Minimum x-coordinate of the bounding box
- xmax (float, optional) – Maximum x-coordinate of the bounding box
- ymin (float, optional) – Minimum y-coordinate of the bounding box
- ymax (float, optional) – Maximum y-coordinate of the bounding box
Returns: New dictionary with transformed coordinates
Return type: dict
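One common way to fit coordinates into a bounding box is min-max scaling. A minimal sketch with the hypothetical name rescale_positions, assuming each axis is scaled independently (so the aspect ratio is not preserved) and that an axis with no spread is placed at the center of the box; ddot's transform_pos may differ in both respects. Like transform_pos, it returns a new dictionary rather than modifying pos.

```python
def rescale_positions(pos, xmin=-250, xmax=250, ymin=-250, ymax=250):
    """Hypothetical sketch: min-max scale (x, y) coordinates into a bounding box."""
    if not pos:
        return {}
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v, lo, hi, new_lo, new_hi):
        if hi == lo:  # assumption: a zero-width axis is placed at the box's center
            return (new_lo + new_hi) / 2
        return new_lo + (v - lo) * (new_hi - new_lo) / (hi - lo)

    # Each axis is scaled independently, so the aspect ratio may change
    return {node: (scale(x, x_lo, x_hi, xmin, xmax), scale(y, y_lo, y_hi, ymin, ymax))
            for node, (x, y) in pos.items()}
```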