API
-
ssmpy.calculate_information_content_intrinsic(df, max_freq)
Calculate the intrinsic information content of a DataFrame of entries.
- Parameters
df – pandas DataFrame
max_freq – maximum frequency in the ontology
- Returns
df with an extra column ‘IC’
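For illustration only, here is a minimal pandas sketch of this kind of computation. It assumes a hypothetical `freq` column holding each entry's frequency and the formula IC = -log((freq + 1) / (max_freq + 1)). ssmpy's actual column names and formula may differ.

```python
import numpy as np
import pandas as pd


def calculate_ic_intrinsic_sketch(df: pd.DataFrame, max_freq: int) -> pd.DataFrame:
    """Return a copy of df with an extra 'IC' column.

    Assumptions: df has a hypothetical 'freq' column, and IC is
    -log((freq + 1) / (max_freq + 1)). This is not necessarily ssmpy's formula.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out["IC"] = -np.log((out["freq"] + 1) / (max_freq + 1))
    return out


entries = pd.DataFrame({"id": [1, 2, 3], "freq": [8, 3, 1]})
print(calculate_ic_intrinsic_sketch(entries, max_freq=8))
```

Under this assumed formula, the most frequent concept (freq equal to max_freq) gets an IC of 0, and rarer concepts get higher values.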
-
ssmpy.common_ancestors(entry1, entry2)
Get the common ancestors of two semantic base entries.
- Parameters
entry1 (int) – first semantic base ID
entry2 (int) – second semantic base ID
- Returns
List of common ancestors
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.common_ancestors(gold, silver)
[6, 2, 10]
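Conceptually, the common ancestors are the intersection of the two entries' ancestor sets. The sketch below expresses this with the standard sqlite3 module and an `INTERSECT` query against a hypothetical `transitive(entry1, entry2)` table of (descendant, ancestor) pairs. The schema, the function name, and the toy data are assumptions, not ssmpy's internals.

```python
import sqlite3

# Hypothetical toy closure: (descendant, ancestor) pairs, indirect ones included.
PAIRS = [
    (101, 1), (101, 2), (101, 3),  # entry 101 has ancestors 1, 2, 3
    (102, 1), (102, 2), (102, 4),  # entry 102 has ancestors 1, 2, 4
]


def common_ancestors_sketch(conn, entry1, entry2):
    """Return the ancestors shared by entry1 and entry2 (hypothetical schema)."""
    query = """
        SELECT entry2 FROM transitive WHERE entry1 = ?
        INTERSECT
        SELECT entry2 FROM transitive WHERE entry1 = ?
    """
    return sorted(row[0] for row in conn.execute(query, (entry1, entry2)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transitive (entry1 INTEGER, entry2 INTEGER)")
conn.executemany("INSERT INTO transitive VALUES (?, ?)", PAIRS)
print(common_ancestors_sketch(conn, 101, 102))  # [1, 2]
```

The sketch sorts its result so the output is deterministic. The order returned by ssmpy.common_ancestors may differ, as in the example above.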
-
ssmpy.
create_connection
(db_file)¶ - create a database connection to the SQLite database
specified by db_file
- Parameters
db_file – database file
- Returns
Connection object or None
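Here is a minimal standard-library sketch of the documented contract: a connection on success, None on failure. The name create_connection_sketch is hypothetical, and this is not ssmpy's source.

```python
import sqlite3
from typing import Optional


def create_connection_sketch(db_file: str) -> Optional[sqlite3.Connection]:
    """Open an SQLite connection to db_file, or return None if that fails."""
    try:
        return sqlite3.connect(db_file)
    except sqlite3.Error as err:
        # For example, the parent directory of db_file does not exist.
        print(err)
        return None


conn = create_connection_sketch(":memory:")
print(conn is not None)  # True
```

Note that sqlite3.connect creates a new, empty database file when db_file does not exist. A mistyped path therefore usually yields an empty database rather than None. In this sketch, None is returned only for errors such as a missing parent directory.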
-
ssmpy.create_semantic_base(owl_file, sb_file, name_prefix, relation, annotation_file='')
Create an SQLite semantic base from an OWL file.
- Parameters
owl_file (string) – File name of the ontology in OWL format
sb_file (string) – File name of database where semantic base will be stored
name_prefix (string) – Prefix of the concepts to be extracted from the ontology
relation (string) – Type of relation to be extracted from the ontology
annotation_file (string) – File containing ontology concepts to use as annotations and calculate concept frequency. Empty string if this file is not available.
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/go.owl", "go.owl")[0]
'go.owl'
>>> ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
loading the ontology go.owl
calculating transitive closure at distance: 1
calculating transitive closure at distance: 2
calculating transitive closure at distance: 3
calculating transitive closure at distance: 4
calculating transitive closure at distance: 5
calculating transitive closure at distance: 6
calculating transitive closure at distance: 7
calculating transitive closure at distance: 8
calculating transitive closure at distance: 9
calculating transitive closure at distance: 10
calculating transitive closure at distance: 11
calculating transitive closure at distance: 12
calculating transitive closure at distance: 13
calculating transitive closure at distance: 14
calculating transitive closure at distance: 15
calculating transitive closure at distance: 16
calculating the descendents
calculating the hierarchical frequency
the end
>>> ssmpy.semantic_base("go.db")
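The log above reports the transitive closure being computed one distance at a time. The following is a small, self-contained illustration of that level-by-level idea, not ssmpy's implementation. Direct (child, parent) links form distance 1. Each round extends the pairs found at distance d by one more direct link to obtain distance d + 1, and the loop stops when no pairs remain.

```python
from collections import defaultdict


def transitive_closure_by_distance(edges):
    """Return {(child, ancestor, distance)} from direct (child, parent) edges.

    Assumes an acyclic graph; with a cycle the loop would never terminate.
    """
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    level = set(edges)  # pairs reachable at the current distance
    distance = 1
    closure = {(c, a, distance) for c, a in level}
    while level:
        distance += 1
        # Extend every pair found at the previous distance by one direct edge.
        level = {(c, gp) for c, p in level for gp in parents[p]}
        closure |= {(c, a, distance) for c, a in level}
    return closure


edges = [("a", "b"), ("b", "c"), ("a", "c")]
for row in sorted(transitive_closure_by_distance(edges)):
    print(row)
```

A concept can reach the same ancestor through paths of different lengths. Here, a reaches c both directly and through b, so the sketch keeps the distance in each tuple.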
-
ssmpy.db_select_entry(conn, entry_list)
Query all rows in the entry table whose name is in entry_list.
- Parameters
conn – the Connection object
entry_list – list of entity names
- Returns
pandas dataframe with all columns in entry
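The sketch below runs such a query with pandas and the standard sqlite3 module. It uses one `?` placeholder per name, so the values are bound as parameters rather than pasted into the SQL. It assumes an entry table with at least id and name columns, as in the run_query example. The toy rows and the function name are hypothetical. The same pattern applies to selecting by id or from the transitive table.

```python
import sqlite3

import pandas as pd


def db_select_entry_sketch(conn, entry_list):
    """Return all columns of the entry rows whose name is in entry_list."""
    placeholders = ", ".join("?" for _ in entry_list)  # one ? per name
    query = f"SELECT * FROM entry WHERE name IN ({placeholders})"
    return pd.read_sql_query(query, conn, params=list(entry_list))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)", [(1, "alpha"), (2, "beta"), (3, "gamma")]
)
print(db_select_entry_sketch(conn, ["alpha", "gamma"]))
```

For very long lists, SQLite's limit on the number of host parameters (SQLITE_MAX_VARIABLE_NUMBER) may require splitting entry_list into chunks.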
-
ssmpy.db_select_entry_by_id(conn, entry_list)
Query all rows in the entry table whose id is in entry_list.
- Parameters
conn – the Connection object
entry_list – list of entity ids
- Returns
pandas dataframe with all columns in entry
-
ssmpy.db_select_transitive(conn, ids_list)
Query all rows in the transitive table whose id is in ids_list.
- Parameters
conn – the Connection object
ids_list – list of entity ids
- Returns
pandas dataframe with all columns in transitive
-
ssmpy.fast_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculate the Jiang-Conrath (JC) similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_jc]
-
ssmpy.fast_lin(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculate Lin's similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_lin]
-
ssmpy.fast_resn_lin_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculate the Resnik, Lin and Jiang-Conrath (JC) similarities between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_resnik, sim_lin, sim_jc]
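To make the three measures concrete, here is a pure-Python sketch that works on plain dictionaries rather than the DataFrames these functions use. The MICA is the common ancestor with the highest IC. Resnik's similarity is IC(MICA), Lin's is 2·IC(MICA) / (IC(it1) + IC(it2)), and Jiang-Conrath's is 1 / (IC(it1) + IC(it2) - 2·IC(MICA)). In the get_ancestors example, gold's own ID appears in its ancestor list, so each toy ancestor set here includes the concept itself. Two conventions are assumed: JC returns 1.0 when the distance is zero, and Lin returns 0.0 when both ICs are zero.

```python
def mica_similarities(it1, it2, ancestors, ic):
    """Return [it1, it2, resnik, lin, jc] based on the most informative common ancestor.

    ancestors: dict mapping an entity to the set of its ancestors (itself included).
    ic: dict mapping an entity to its information content.
    """
    common = ancestors[it1] & ancestors[it2]
    # MICA: the common ancestor with the highest IC (0.0 if there is none).
    ic_mica = max((ic[a] for a in common), default=0.0)
    total = ic[it1] + ic[it2]
    resnik = ic_mica
    # Assumed convention: Lin is 0.0 when both concepts have zero IC.
    lin = 2 * ic_mica / total if total > 0 else 0.0
    distance = total - 2 * ic_mica
    # Assumed convention: JC is 1.0 when the IC distance is zero.
    jc = 1 / distance if distance > 0 else 1.0
    return [it1, it2, resnik, lin, jc]


# Hypothetical toy ontology: root <- mid <- {x, y}, and root <- z
ic = {"root": 0.0, "mid": 1.0, "x": 2.0, "y": 3.0, "z": 2.5}
ancestors = {
    "x": {"x", "mid", "root"},
    "y": {"y", "mid", "root"},
    "z": {"z", "root"},
}
print(mica_similarities("x", "y", ancestors, ic))  # ['x', 'y', 1.0, 0.4, 0.3333333333333333]
```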
-
ssmpy.fast_resnik(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)
Calculate Resnik's similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_resnik]
-
ssmpy.get_all_commom_ancestors(all_ancestors, it1, it2)
Get all common ancestors of it1 and it2.
- Parameters
all_ancestors – pandas DataFrame of all ancestors
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
pandas DataFrame of common ancestors, or zero if there are none
-
ssmpy.get_ancestors(entry)
Get the ancestors of a given semantic base entry.
- Parameters
entry (int) – semantic base ID
- Returns
List of ancestors
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.get_ancestors(gold)
[3, 6, 2, 10]
-
ssmpy.get_id(name)
Get the semantic base ID of an ontology concept from its original label (name).
- Parameters
name (string) – ontology label (depends on the ontology)
- Returns
semantic base ID or -1 if not found
- Return type
int
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_id("gold")
3
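The sketch below performs the lookup with the standard sqlite3 module. It reuses the query from the run_query example and returns -1 when no row matches. ssmpy.get_id uses the global connection set by semantic_base, whereas this hypothetical get_id_sketch takes the connection explicitly. The toy rows are made up.

```python
import sqlite3


def get_id_sketch(conn, name):
    """Return the id of the entry called `name`, or -1 when it is absent."""
    row = conn.execute("SELECT id FROM entry WHERE name = ?", (name,)).fetchone()
    return row[0] if row is not None else -1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO entry VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
print(get_id_sketch(conn, "beta"))   # 2
print(get_id_sketch(conn, "delta"))  # -1
```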
-
ssmpy.get_name(cid)
Get the ontology label (name) for a given semantic base ID.
- Parameters
cid (int) – semantic base ID
- Returns
ontology label (name)
- Return type
string
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_name(3)
'gold'
-
ssmpy.get_uniprot_annotations(protein_acc)
Retrieve the GO annotations of a UniProt protein ID using the UniProt API.
- Parameters
protein_acc (string) – UniProt protein ID
- Returns
list of GO terms
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> l = sorted(ssmpy.get_uniprot_annotations("Q12345"))
>>> l
[1746, 9044, 17053, 21566, 24341, 57621, 95359]
-
ssmpy.information_content(entry)
Get the information content of a semantic base entry, intrinsic or extrinsic according to the value of ssmpy.ssm.intrinsic.
- Parameters
entry (int) – semantic base ID
- Returns
information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.ssm.intrinsic = True
>>> ssmpy.information_content(gold)
1.5040773967762742
-
ssmpy.information_content_extrinsic(entry)
Get the extrinsic information content of a semantic base entry.
The values are precomputed when the semantic base is created, based on the annotation file provided.
- Parameters
entry (int) – semantic base ID
- Returns
extrinsic information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_extrinsic(gold)
1.2992829841302609
-
ssmpy.information_content_intrinsic(entry)
Get the intrinsic information content of a semantic base entry.
- Parameters
entry (int) – semantic base ID
- Returns
intrinsic information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_intrinsic(gold)
1.5040773967762742
-
ssmpy.light_similarity(conn, entry_ids_1, entry_ids_2, metric, cpu_cores)
Main function: calculate the semantic similarity between the entries of entry_ids_1 and those of entry_ids_2.
- Parameters
conn – database connection
entry_ids_1 – first list of entries
entry_ids_2 – second list of entries
metric – ‘lin’, ‘resnick’, ‘jc’ or ‘all’
cpu_cores – number of CPU cores to use
- Returns
list of results: [e1, e2, similarity], or [e1, e2, similarity Resnik, similarity Lin, similarity JC] when metric is ‘all’
- Example
>>> import ssmpy
>>> ssmpy.create_semantic_base('doid.owl', 'doid.db', "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
>>> conn = ssmpy.create_connection('doid.db')
>>> list1 = ['DOID_10587', 'DOID_2841']
>>> list2 = ['DOID_1927', 'DOID_1324']
>>> ssmpy.light_similarity(conn, list1, list2, 'all', 4)
[[['DOID_10587', 'DOID_1324', -0.0, -0.0, 0.068819810490695], ['DOID_10587', 'DOID_1927', 5.937536205082426, 0.8269561090992177, 0.28695173228265203]], [['DOID_2841', 'DOID_1324', 3.703943983575332, 0.659762410973656, 0.20745912457314464], ['DOID_2841', 'DOID_1927', -0.0, -0.0, 0.07658496040867407]]]
-
ssmpy.num_paths(entry1, ancestor)
Get the number of paths (edges) between two concepts.
- Parameters
entry1 (int) – Child concept
ancestor (int) – Parent concept
- Returns
number of edges between the two concepts
- Return type
int
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> metal = ssmpy.get_id("metal")
>>> ssmpy.num_paths(gold, metal)
5
-
ssmpy.run_query(query, params)
Run any query on the semantic base.
- Parameters
query (string) – query to run on the semantic base
params (tuple) – query parameters
- Returns
query result
- Return type
sqlite3.Cursor
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> query = "SELECT id FROM entry WHERE name = ?"
>>> ssmpy.run_query(query, ("gold",)).fetchone()
(3,)
-
ssmpy.semantic_base(sb_file, **kwargs)
Initialize the global connection object.
Other keyword arguments are passed on to the sqlite3.connect method, for example check_same_thread. After this function is called, the other functions operate on this semantic base.
- Parameters
sb_file (string) – SQLite database filename
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
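The global-connection pattern described above can be sketched with the standard library. Keyword arguments are forwarded unchanged to sqlite3.connect, and later helpers read the module-level connection. The names semantic_base_sketch and run_query_sketch are hypothetical; this is not ssmpy's source.

```python
import sqlite3

connection = None  # module-level connection shared by the helper functions


def semantic_base_sketch(sb_file, **kwargs):
    """Open sb_file and store the connection globally; kwargs go to sqlite3.connect."""
    global connection
    connection = sqlite3.connect(sb_file, **kwargs)


def run_query_sketch(query, params=()):
    """Run a query on the globally configured semantic base."""
    return connection.execute(query, params)


semantic_base_sketch(":memory:", check_same_thread=False)
print(run_query_sketch("SELECT 1 + 1").fetchone())  # (2,)
```

Passing check_same_thread=False allows the connection to be used from threads other than the one that created it. In that case, the caller has to serialize write operations.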
-
ssmpy.shared_ic(entry1, entry2)
Calculate the shared information content of two concepts according to the value of ssmpy.ssm.mica.
Previously computed values are stored in memory for faster computation.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm.mica = True
>>> ssmpy.shared_ic(gold, silver)
0.587786664902119
-
ssmpy.shared_ic_dca(entry1, entry2)
Calculate the shared information content of two concepts using disjunctive common ancestors (DCA).
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_dca(gold, silver)
0.587786664902119
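As we read the DiShIn approach, a common ancestor is disjunctive when it is the most informative among the common ancestors that share the same path difference, |num_paths(entry1, a) - num_paths(entry2, a)|. The shared IC is then the average IC of these ancestors. The sketch below follows that reading on plain dictionaries. The function name and inputs are hypothetical; in ssmpy, the path counts would come from num_paths.

```python
def shared_ic_dca_sketch(common, n_paths1, n_paths2, ic):
    """Average IC of the disjunctive common ancestors (DCA).

    common: common ancestors of the two concepts.
    n_paths1 / n_paths2: dict ancestor -> number of paths from each concept.
    ic: dict ancestor -> information content.
    """
    best_by_diff = {}
    for anc in common:
        diff = abs(n_paths1[anc] - n_paths2[anc])
        # Keep only the most informative ancestor for each path difference.
        best_by_diff[diff] = max(ic[anc], best_by_diff.get(diff, 0.0))
    if not best_by_diff:
        return 0.0
    return sum(best_by_diff.values()) / len(best_by_diff)


# Hypothetical toy data
common = {"r", "m1", "m2"}
n_paths1 = {"r": 2, "m1": 1, "m2": 1}
n_paths2 = {"r": 2, "m1": 1, "m2": 2}
ic = {"r": 0.0, "m1": 1.0, "m2": 1.5}
print(shared_ic_dca_sketch(common, n_paths1, n_paths2, ic))  # 1.25
```

In the toy data, the MICA (m2) has IC 1.5. The DCA average is 1.25 because m1 is the most informative ancestor for a second, distinct path difference.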
-
ssmpy.shared_ic_mica(entry1, entry2)
Calculate the shared information content of two concepts using the most informative common ancestor (MICA).
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_mica(gold, silver)
0.587786664902119
-
ssmpy.ssm_jiang_conrath(entry1, entry2)
Calculate Jiang and Conrath’s (JC) semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_jiang_conrath(gold, silver)
0.5456783339686456
-
ssmpy.ssm_lin(entry1, entry2)
Calculate Lin’s semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_lin(gold, silver)
0.39079549108439265
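The values in the ssm_jiang_conrath and ssm_lin examples can be checked against the usual formulas, Lin = 2·shared / (IC1 + IC2) and JC = 1 / (IC1 + IC2 - 2·shared). The sketch below plugs in gold's intrinsic IC from the information_content_intrinsic example and the shared IC from the shared_ic example. Giving silver the same IC is an assumption; both formulas depend only on the sum IC1 + IC2. The guards for a zero denominator are also assumed conventions. Under these assumptions, the sketch reproduces the documented values to within rounding.

```python
def lin(ic1, ic2, shared):
    """Lin: shared IC relative to the total IC of both concepts."""
    total = ic1 + ic2
    # Assumed convention: 0.0 when both concepts have zero IC.
    return 2 * shared / total if total > 0 else 0.0


def jiang_conrath(ic1, ic2, shared):
    """Jiang-Conrath: inverse of the IC distance between the two concepts."""
    distance = ic1 + ic2 - 2 * shared
    # Assumed convention: 1.0 when the distance is zero.
    return 1 / distance if distance > 0 else 1.0


ic_gold = 1.5040773967762742    # intrinsic IC of gold (documented above)
ic_silver = 1.5040773967762742  # ASSUMPTION: silver has the same IC as gold
shared = 0.587786664902119      # shared_ic(gold, silver) (documented above)
print(lin(ic_gold, ic_silver, shared))
print(jiang_conrath(ic_gold, ic_silver, shared))
```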
-
ssmpy.ssm_multiple(m, entry1_list, entry2_list)
Calculate the semantic similarity between two lists of concepts.
- Parameters
m – semantic similarity function
entry1_list – First concept list
entry2_list – Second concept list
- Returns
Aggregate similarity measure
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> e1 = ssmpy.get_uniprot_annotations("Q12345")
>>> e2 = ssmpy.get_uniprot_annotations("Q12346")
>>> ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
1.653493583942882
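This page does not specify how the pairwise scores are aggregated. One common choice in the semantic similarity literature is the best-match average (BMA). For each concept, take its best score against the other list; then average the results within each direction and average the two directions. The sketch below implements BMA with a pluggable pairwise function m. It illustrates that technique only and is not a statement of what ssm_multiple computes. Returning 0.0 for an empty list is an assumed convention.

```python
from typing import Callable, Hashable, Sequence


def best_match_average(
    m: Callable[[Hashable, Hashable], float],
    entry1_list: Sequence[Hashable],
    entry2_list: Sequence[Hashable],
) -> float:
    """Best-match average of the pairwise scores m(e1, e2) between two concept lists."""
    if not entry1_list or not entry2_list:
        return 0.0  # assumed convention for empty input
    # Best score of each concept in list 1 against list 2, and vice versa.
    best1 = [max(m(e1, e2) for e2 in entry2_list) for e1 in entry1_list]
    best2 = [max(m(e1, e2) for e1 in entry1_list) for e2 in entry2_list]
    return (sum(best1) / len(best1) + sum(best2) / len(best2)) / 2


def exact(a, b):
    """Toy pairwise measure: 1.0 for identical labels, otherwise 0.0."""
    return 1.0 if a == b else 0.0


print(best_match_average(exact, ["a", "b"], ["a", "c", "d"]))
```

With ssmpy, the pairwise function could be ssmpy.ssm_resnik, as in the example above.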
-
ssmpy.ssm_resnik(entry1, entry2)
Calculate Resnik’s semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_resnik(gold, silver)
0.587786664902119