DiShIn: Semantic Similarity Measures using Disjunctive Shared Information¶
This software package provides the basic functions to start using semantic similarity measures directly from an RDF or OWL file.
Getting started¶
Quick start¶
import ssmpy
Metals Example¶
To create the semantic base file (metals.db) from the metals.owl file:
ssmpy.create_semantic_base("metals.owl", "metals.db", "https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl#", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "metals.txt")
ssmpy.semantic_base("metals.db")
The metals.txt file contains a list of occurrences. For example, the following contents have one occurrence for each term, except gold and silver, which have two occurrences each.
gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
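To sanity-check an occurrence file before building a semantic base, the occurrences can be tallied with the standard library. This is only an illustrative sketch: the helper name count_occurrences is hypothetical and not part of ssmpy, and it assumes one term per line, as in the listing above.

```python
from collections import Counter


def count_occurrences(lines):
    """Count how many times each term appears, ignoring blank lines."""
    return Counter(line.strip() for line in lines if line.strip())


# Same contents as the metals.txt listing above.
metals_txt = """gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
"""

counts = count_occurrences(metals_txt.splitlines())
print(counts["gold"], counts["silver"], counts["copper"])
```

For a file on disk, passing the open file object (for example `count_occurrences(open("metals.txt"))`) would work the same way.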
Now to calculate the similarity between copper and gold execute:
e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
0.22599256187152864
0.1504595366201814
0.281527889373394
Options¶
We can choose to calculate the measures using either the extrinsic or intrinsic Information Content (IC), and using the Most Informative Common Ancestors (MICA) or Disjunctive Common Ancestors (DCA). By default, the measures are calculated using extrinsic IC and DCA.
ssmpy.ssm.mica = False # determines if it uses MICA or DCA
ssmpy.ssm.intrinsic = False # determines if it uses extrinsic or intrinsic IC
Now calculate the similarity between copper and gold using intrinsic IC and MICA:
ssmpy.ssm.mica = True
ssmpy.ssm.intrinsic = True
e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
0.587786664902119
0.39079549108439265
0.35303485982596094
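The three outputs above are consistent with the usual definitions of these measures in terms of information content (IC): Resnik is the shared IC, Lin is 2 × shared / (IC1 + IC2), and Jiang-Conrath is 1 / (IC1 + IC2 − 2 × shared + 1). The sketch below is not ssmpy code. It plugs in the intrinsic IC of gold reported in the API examples and assumes that copper has the same intrinsic IC, which is an inference from the Lin output rather than a documented value.

```python
def resnik(shared_ic):
    """Resnik similarity is the shared information content itself."""
    return shared_ic


def lin(ic1, ic2, shared_ic):
    """Lin similarity: shared IC normalised by the two concepts' IC."""
    total = ic1 + ic2
    return 2 * shared_ic / total if total > 0 else 0.0


def jiang_conrath(ic1, ic2, shared_ic):
    """Jiang-Conrath similarity derived from the IC distance."""
    distance = ic1 + ic2 - 2 * shared_ic
    return 1 / (distance + 1)


ic_gold = 1.5040773967762742    # intrinsic IC of gold, from the API examples
ic_copper = ic_gold             # assumption: same intrinsic IC as gold
shared = 0.587786664902119      # shared IC, equal to the Resnik output above

sim_resnik = resnik(shared)
sim_lin = lin(ic_gold, ic_copper, shared)
sim_jc = jiang_conrath(ic_gold, ic_copper, shared)
print(sim_resnik, sim_lin, sim_jc)
```

Under these assumptions the printed values agree with the three outputs listed above up to floating-point rounding.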
Other Examples¶
The following examples will assume the default options, i.e. the values shown are calculated using extrinsic IC and DCA.
Gene Ontology (GO) and UniProt proteins¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/go202104.db.gz
gunzip -N go202104.db.gz
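On systems without gunzip, the same decompression step can be done from Python. This is a minimal sketch using only the standard library; the helper name gunzip_to is hypothetical. Python's gzip module does not restore the stored file name the way `gunzip -N` does, so the destination name is passed explicitly (go.db is the name opened in the next example).

```python
import gzip
import shutil


def gunzip_to(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path and return dst_path."""
    with gzip.open(src_path, "rb") as f_in, open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst_path


# Usage, assuming go202104.db.gz is already in the current directory:
# gunzip_to("go202104.db.gz", "go.db")
```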
Now to calculate the similarity between maltose biosynthetic process and maltose catabolic process, first we need to obtain the semantic base IDs of those concepts:
ssmpy.semantic_base("go.db")
e1 = ssmpy.get_id("GO_0000023")
e2 = ssmpy.get_id("GO_0000025")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
4.315813746201754
0.38793452313030363
0.06840605034663635
Now to calculate the similarity between proteins Q12345 and Q12346, first we retrieve the GO terms associated with each one:
e1 = ssmpy.get_uniprot_annotations("Q12345")
e2 = ssmpy.get_uniprot_annotations("Q12346")
Next we use the ssm_multiple function to calculate the average maximum semantic similarity with each of the three measures:
ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
ssmpy.ssm_multiple(ssmpy.ssm_lin, e1, e2)
ssmpy.ssm_multiple(ssmpy.ssm_jiang_conrath, e1, e2)
Output:
0.6015115682274214
0.12201023476842265
0.09317326288224918
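One plausible reading of "average maximum" is: for each concept in the first list, take its best match in the second list, then average those best scores. The sketch below illustrates that reading only. It is not taken from the ssmpy source, the function names are hypothetical, and whether ssmpy.ssm_multiple is one-directional or symmetric is not stated here.

```python
def average_maximum(measure, list1, list2):
    """For each item in list1 take its best score against list2, then average.

    Both lists are assumed to be non-empty.
    """
    best = [max(measure(a, b) for b in list2) for a in list1]
    return sum(best) / len(best)


def toy_measure(a, b):
    # Hypothetical similarity: 1.0 for identical IDs, lower as the IDs move apart.
    return 1.0 / (1 + abs(a - b))


result = average_maximum(toy_measure, [1, 5], [1, 2, 9])
print(result)
```

Here the best match for 1 scores 1.0 and the best match for 5 scores 0.25, so the average is 0.625.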
To create an updated version of the database, download the ontology and annotations:
wget http://purl.obolibrary.org/obo/go.owl
wget http://geneontology.org/gene-associations/goa_uniprot_all_noiea.gaf.gz
gunzip goa_uniprot_all_noiea.gaf.gz
The annotations will be used to calculate the extrinsic information content.
Next create the semantic base:
ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "goa_uniprot_all_noiea.gaf")
The semantic base is stored as a SQLite database in the same directory as your project.
Chemical Entities of Biological Interest (ChEBI) Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/chebi202104.db.gz
gunzip -N chebi202104.db.gz
Now to calculate the similarity between aripiprazole and bithionol execute:
ssmpy.semantic_base("chebi.db")
e1 = ssmpy.get_id("CHEBI_31236")
e2 = ssmpy.get_id("CHEBI_3131")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
1.4393842298350599
0.12935491517581163
0.049077257018319796
To create an updated version of the database, download the ontology:
wget http://purl.obolibrary.org/obo/chebi/chebi_lite.owl
And then create the new database:
ssmpy.create_semantic_base("chebi_lite.owl", "chebi.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')
Human Phenotype (HP) Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/hp202104.db.gz
gunzip -N hp202104.db.gz
Now to calculate the similarity between Optic nerve coloboma and Optic nerve dysplasia execute:
ssmpy.semantic_base("hp.db")
e1 = ssmpy.get_id("HP_0000588")
e2 = ssmpy.get_id("HP_0001093")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
4.593979372426621
0.5118244533189668
0.10242304162282165
To create an updated version of the database, download the ontology:
wget http://purl.obolibrary.org/obo/hp.owl
And then create the new database:
ssmpy.create_semantic_base("hp.owl", "hp.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')
Human Disease Ontology (HDO) Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/doid202104.db.gz
gunzip -N doid202104.db.gz
Now to calculate the similarity between Asthma and Lung cancer execute:
ssmpy.semantic_base("doid.db")
e1 = ssmpy.get_id("DOID_2841")
e2 = ssmpy.get_id("DOID_1324")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
2.3627836143597176
0.4328907089097581
0.13906777879867938
To create an updated version of the database, download the ontology:
wget http://purl.obolibrary.org/obo/doid.owl
And then create the new database:
ssmpy.create_semantic_base("doid.owl", "doid.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')
Medical Subject Headings (MeSH) Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/mesh202104.db.gz
gunzip -N mesh202104.db.gz
Now to calculate the similarity between Malignant Hyperthermia and Fever execute:
ssmpy.semantic_base("mesh.db")
e1 = ssmpy.get_id("D008305")
e2 = ssmpy.get_id("D005334")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
1.2582571367910345
0.17390901691859173
0.07719755683816652
To create an updated version of the database, download the NT version from ftp://nlmpubs.nlm.nih.gov/online/mesh/rdf/mesh.nt.gz and unzip it:
wget ftp://nlmpubs.nlm.nih.gov/online/mesh/rdf/mesh.nt.gz
gunzip mesh.nt.gz
And then create the new database:
ssmpy.create_semantic_base("mesh.nt", "mesh.db", "http://id.nlm.nih.gov/mesh/", "http://id.nlm.nih.gov/mesh/vocab#broaderDescriptor", '')
Radiology Lexicon (RadLex) Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/radlex202104.db.gz
gunzip -N radlex202104.db.gz
Now to calculate the similarity between nervous system of right upper limb and nervous system of left upper limb execute:
ssmpy.semantic_base("radlex.db")
e1 = ssmpy.get_id("RID16139")
e2 = ssmpy.get_id("RID16140")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
9.366531825151093
0.9310964912333252
0.41905978419640516
To create an updated version of the database, download the RDF/XML version from http://bioportal.bioontology.org/ontologies/RADLEX and save it as radlex.rdf
And then create the new database:
ssmpy.create_semantic_base("radlex.rdf", "radlex.db", "http://www.radlex.org/RID/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')
WordNet Example¶
Download the latest version of the database we created:
wget http://labs.rd.ciencias.ulisboa.pt/dishin/wordnet202104.db.gz
gunzip -N wordnet202104.db.gz
Now to calculate the similarity between the nouns ambulance and motorcycle execute:
ssmpy.semantic_base("wordnet.db")
e1 = ssmpy.get_id("ambulance-noun-1")
e2 = ssmpy.get_id("motorcycle-noun-1")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)
Output:
6.331085809208157
0.6792379292396559
0.14327549414725688
To create an updated version of the database, download the ontology:
wget http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-hyponym.rdf
And then create the new database:
ssmpy.create_semantic_base("wordnet-hyponym.rdf", "wordnet.db", "http://www.w3.org/2006/03/wn/wn20/instances/synset-", "http://www.w3.org/2006/03/wn/wn20/schema/hyponymOf", '')
Data Sources¶
Gene Ontology (GO)¶
- Ontology: http://purl.obolibrary.org/obo/go.owl
- Annotation files (extrinsic): http://geneontology.org/gene-associations/goa_uniprot_all_noiea.gaf.gz
- SemanticBase: http://labs.rd.ciencias.ulisboa.pt/dishin/go202104.db.gz
ChEBI¶
Human Phenotype ontology (HPO)¶
- Ontology: http://purl.obolibrary.org/obo/hp.owl
- SemanticBase: http://labs.rd.ciencias.ulisboa.pt/dishin/hp202104.db.gz
Human Disease Ontology (DO)¶
- Ontology: http://purl.obolibrary.org/obo/doid.owl
- SemanticBase: http://labs.rd.ciencias.ulisboa.pt/dishin/doid202104.db.gz
Medical Subject Headings (MeSH) Example¶
RadLex¶
WordNet¶
API¶
-
ssmpy.
calculate_information_content_intrinsic
(df, max_freq)¶ Calculates the information content of a dataframe of entries
- Parameters
df – pandas DataFrame
max_freq – maximum frequency in the ontology
- Returns
df with extra column ‘IC’
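The parameter names suggest that the IC is derived from a frequency relative to max_freq. A common form is IC = −log(freq / max_freq) with the natural logarithm. The sketch below assumes that form, which is not confirmed here as the library's exact computation, and the function name intrinsic_ic is hypothetical. The ratio 2/9 is used only because it reproduces the intrinsic IC reported for gold in the examples; it is not read from the semantic base.

```python
import math


def intrinsic_ic(freq, max_freq):
    """Negative natural log of the relative frequency (zero when freq == max_freq)."""
    return -math.log(freq / max_freq)


print(intrinsic_ic(2, 9))   # a rarer entry carries more information
print(intrinsic_ic(9, 9))   # the most frequent entry carries none
```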
-
ssmpy.
common_ancestors
(entry1, entry2)¶ Get common ancestors between two semantic base entries
- Parameters
entry1 (int) – first semantic base ID
entry2 (int) – second semantic base ID
- Returns
List of common ancestors
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.common_ancestors(gold, silver)
[6, 2, 10]
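Conceptually, the common ancestors of two entries are the intersection of their ancestor sets. The sketch below shows this on a small hand-written hierarchy. The subclass links are an assumption made for illustration (they are not read from metals.owl), and the helper name ancestors is hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable from term by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Assumed subclass links for the metals example (illustrative only).
parents = {
    "gold": ["coinage", "precious"],
    "silver": ["coinage", "precious"],
    "copper": ["coinage"],
    "platinum": ["precious"],
    "palladium": ["precious"],
    "coinage": ["metal"],
    "precious": ["metal"],
}

common = ancestors("gold", parents) & ancestors("silver", parents)
print(sorted(common))
```

Unlike this helper, ssmpy.get_ancestors appears to include the entry itself in its result (ID 3 is gold in the get_ancestors example), so the two are not drop-in equivalents.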
-
ssmpy.
create_connection
(db_file)¶ Create a database connection to the SQLite database specified by db_file
- Parameters
db_file – database file
- Returns
Connection object or None
-
ssmpy.
create_semantic_base
(owl_file, sb_file, name_prefix, relation, annotation_file='')¶ Create sqlite3 semantic base using an OWL file.
- Parameters
owl_file (string) – File name of ontology in owl format
sb_file (string) – File name of database where semantic base will be stored
name_prefix (string) – Prefix of the concepts to be extracted from the ontology
relation (string) – Type of relation to be extracted from the ontology
annotation_file (string) – File containing ontology concepts to use as annotations and calculate concept frequency. Empty string if this file is not available.
- Example:
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/go.owl", "go.owl")[0]
'go.owl'
>>> ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
loading the ontology go.owl
calculating transitive closure at distance: 1
calculating transitive closure at distance: 2
calculating transitive closure at distance: 3
calculating transitive closure at distance: 4
calculating transitive closure at distance: 5
calculating transitive closure at distance: 6
calculating transitive closure at distance: 7
calculating transitive closure at distance: 8
calculating transitive closure at distance: 9
calculating transitive closure at distance: 10
calculating transitive closure at distance: 11
calculating transitive closure at distance: 12
calculating transitive closure at distance: 13
calculating transitive closure at distance: 14
calculating transitive closure at distance: 15
calculating transitive closure at distance: 16
calculating the descendents
calculating the hierarchical frequency
the end
>>> ssmpy.semantic_base("go.db")
-
ssmpy.
db_select_entry
(conn, entry_list)¶ Query all rows in the entry table, where name is in entry_list
- Parameters
conn – the Connection object
entry_list – list of entity names
- Returns
pandas dataframe with all columns in entry
-
ssmpy.
db_select_entry_by_id
(conn, entry_list)¶ Query all rows in the entry table, where id is in entry_list
- Parameters
conn – the Connection object
entry_list – list of entity ids
- Returns
pandas dataframe with all columns in entry
-
ssmpy.
db_select_transitive
(conn, ids_list)¶ Query all rows in the transitive table, where id is in ids_list
- Parameters
conn – the Connection object
ids_list – list of entity ids
- Returns
pandas dataframe with all columns in transitive
-
ssmpy.
fast_jc
(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)¶ Calculates the JC MICA INTRINSIC similarity between it1 and it2
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_jc]
-
ssmpy.
fast_lin
(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)¶ Calculates the LIN MICA INTRINSIC similarity between it1 and it2
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_lin]
-
ssmpy.
fast_resn_lin_jc
(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)¶ Calculates the RESNIK, LIN and JC MICA INTRINSIC similarity between it1 and it2
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_resnik, sim_lin, sim_jc]
-
ssmpy.
fast_resnik
(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)¶ Calculates the RESNIK MICA INTRINSIC similarity between it1 and it2
- Parameters
all_ancestors – pandas DataFrame of all ancestors (from table transitive)
df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC
df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
list: [e1, e2, sim_resnik]
-
ssmpy.
get_all_commom_ancestors
(all_ancestors, it1, it2)¶ Get all common ancestors for it1 and it2
- Parameters
all_ancestors – pandas DataFrame of all ancestors
it1 – entity 1 (id)
it2 – entity 2 (id)
- Returns
pandas DataFrame of common ancestors or zero
-
ssmpy.
get_ancestors
(entry)¶ Get ancestors of a given semantic base entry
- Parameters
entry (int) – semantic base ID
- Returns
List of ancestors
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.get_ancestors(gold)
[3, 6, 2, 10]
-
ssmpy.
get_id
(name)¶ Get semantic base ID of ontology concept by its original label (name).
- Parameters
name (string) – ontology label (depends on the ontology)
- Returns
semantic base ID or -1 if not found
- Return type
int
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_id("gold")
3
-
ssmpy.
get_name
(cid)¶ Get ontology label (name) for a given semantic base ID.
- Parameters
cid (int) – semantic base ID
- Returns
ontology label (name)
- Return type
string
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_name(3)
'gold'
-
ssmpy.
get_uniprot_annotations
(protein_acc)¶ Retrieve GO annotations for a UniProt ID using UniProt API
- Parameters
protein_acc (string) – UniProt protein ID
- Returns
list of GO terms
- Return type
list
- Example
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> l = sorted(ssmpy.get_uniprot_annotations("Q12345"))
>>> l
[1746, 9044, 17053, 21566, 24341, 57621, 95359]
-
ssmpy.
information_content
(entry)¶ Get information content of a semantic base entry according to the value of ssmpy.ssm.intrinsic.
- Parameters
entry (int) – semantic base ID
- Returns
information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.ssm.intrinsic = True
>>> ssmpy.information_content(gold)
1.5040773967762742
-
ssmpy.
information_content_extrinsic
(entry)¶ Get the extrinsic information content of a semantic base entry.
The values are precomputed when the semantic base is created, according to the annotations file provided.
- Parameters
entry (int) – semantic base ID
- Returns
extrinsic information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_extrinsic(gold)
1.2992829841302609
-
ssmpy.
information_content_intrinsic
(entry)¶ Get the intrinsic information content of a semantic base entry.
- Parameters
entry (int) – semantic base ID
- Returns
intrinsic information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_intrinsic(gold)
1.5040773967762742
-
ssmpy.
light_similarity
(conn, entry_ids_1, entry_ids_2, metric, cpu_cores)¶ Main function.
- Parameters
conn – db_connection
entry_ids_1 – list of entries 1
entry_ids_2 – list of entries 2
metric – ‘lin’, ‘resnick’, ‘jc’ or ‘all’
cpu_cores – number of cores to be used
- Returns
list with results ([e1, e2, similarity] or [e1, e2, similarity resnik, similarity lin, similarity jc])
- Example
>>> import ssmpy
>>> ssmpy.create_semantic_base('doid.owl', 'doid.db', "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
>>> conn = ssmpy.create_connection('doid.db')
>>> list1 = ['DOID_10587', 'DOID_2841']
>>> list2 = ['DOID_1927', 'DOID_1324']
>>> ssmpy.light_similarity(conn, list1, list2, 'all', 4)
[[['DOID_10587', 'DOID_1324', -0.0, -0.0, 0.068819810490695], ['DOID_10587', 'DOID_1927', 5.937536205082426, 0.8269561090992177, 0.28695173228265203]], [['DOID_2841', 'DOID_1324', 3.703943983575332, 0.659762410973656, 0.20745912457314464], ['DOID_2841', 'DOID_1927', -0.0, -0.0, 0.07658496040867407]]]
-
ssmpy.
num_paths
(entry1, ancestor)¶ Get number of paths (edges) between two concepts.
- Parameters
entry1 (int) – Child concept
ancestor (int) – Parent concept
- Returns
number of edges between the two concepts
- Return type
int
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> metal = ssmpy.get_id("metal")
>>> ssmpy.num_paths(gold, metal)
5
-
ssmpy.
run_query
(query, params)¶ Run any query on the semantic base.
- Parameters
query (string) – query to run on the semantic base
params (tuple) – query parameters
- Returns
query result
- Return type
sqlite3.Cursor
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> query = "SELECT id FROM entry WHERE name = ?"
>>> ssmpy.run_query(query, ("gold",)).fetchone()
(3,)
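The query and params arguments follow the usual sqlite3 pattern of `?` placeholders filled from a tuple. The sketch below reproduces that pattern against a throwaway in-memory database, without ssmpy. The two-column entry table is a simplified assumption (the real table presumably has more columns), and the silver row with ID 4 is made up for illustration.

```python
import sqlite3

# Throwaway in-memory database standing in for a semantic base.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO entry (id, name) VALUES (?, ?)",
    [(3, "gold"), (4, "silver")],  # ID 4 for silver is hypothetical
)

query = "SELECT id FROM entry WHERE name = ?"
row = conn.execute(query, ("gold",)).fetchone()
missing = conn.execute(query, ("tin",)).fetchone()
print(row, missing)
```

Passing values through the params tuple, rather than formatting them into the query string, lets sqlite3 handle quoting; fetchone() returns None when no row matches.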
-
ssmpy.
semantic_base
(sb_file, **kwargs)¶ Initialize global connection object.
You can also pass other arguments to be given to the sqlite3.connect method, for example check_same_thread. After this method is called, the other methods will be applied to the semantic base.
- Parameters
sb_file (string) – sqlite database filename
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
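To illustrate what forwarding check_same_thread to sqlite3.connect changes, the sketch below uses a hypothetical stand-in, open_base, that passes its keyword arguments straight through. It is not the ssmpy implementation, and the in-memory table is made up for the demonstration.

```python
import sqlite3
import threading


def open_base(path, **kwargs):
    """Hypothetical stand-in: forward extra keyword arguments to sqlite3.connect."""
    return sqlite3.connect(path, **kwargs)


conn = open_base(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO entry (id, name) VALUES (3, 'gold')")

results = []


def worker():
    # Using the connection from another thread is allowed only because
    # check_same_thread=False was passed when it was opened.
    results.append(conn.execute("SELECT name FROM entry WHERE id = 3").fetchone())


thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(results)
```

With the default check_same_thread=True, the query inside worker would raise sqlite3.ProgrammingError. When the check is disabled, the caller is responsible for serialising access to the shared connection.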
-
ssmpy.
shared_ic
(entry1, entry2)¶ Calculate the shared information content of two concepts according to the value of ssmpy.ssm.mica
Previously computed values are stored in memory for faster computation.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm.mica = True
>>> ssmpy.shared_ic(gold, silver)
0.587786664902119
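Keeping previously computed values in memory is a standard memoization pattern. The sketch below shows one way to do it with functools.lru_cache. It is not how ssmpy implements its cache; slow_shared_ic is a hypothetical placeholder for the real computation, and ordering the pair assumes the measure is symmetric.

```python
from functools import lru_cache

calls = []  # records every real (uncached) computation


def slow_shared_ic(entry1, entry2):
    """Hypothetical placeholder for an expensive shared-IC computation."""
    calls.append((entry1, entry2))
    return 1.0 / (1 + abs(entry1 - entry2))


@lru_cache(maxsize=None)
def _cached(entry1, entry2):
    return slow_shared_ic(entry1, entry2)


def shared_ic_cached(entry1, entry2):
    # Order the pair so (a, b) and (b, a) share one cache slot.
    return _cached(*sorted((entry1, entry2)))


first = shared_ic_cached(3, 7)
second = shared_ic_cached(7, 3)   # served from the cache
third = shared_ic_cached(3, 7)    # served from the cache
print(first, len(calls))
```

A cache like this must be cleared (here with `_cached.cache_clear()`) whenever the underlying data or options change, otherwise stale values are returned.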
-
ssmpy.
shared_ic_dca
(entry1, entry2)¶ Calculate the shared information content of two concepts using disjunctive common ancestors.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_dca(gold, silver)
0.587786664902119
-
ssmpy.
shared_ic_mica
(entry1, entry2)¶ Calculate the shared information content of two concepts using the most informative common ancestor.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Shared information content
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_mica(gold, silver)
0.587786664902119
-
ssmpy.
ssm_jiang_conrath
(entry1, entry2)¶ Calculate JC’s semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_jiang_conrath(gold, silver)
0.5456783339686456
-
ssmpy.
ssm_lin
(entry1, entry2)¶ Calculate Lin’s semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_lin(gold, silver)
0.39079549108439265
-
ssmpy.
ssm_multiple
(m, entry1_list, entry2_list)¶ Calculate semantic similarity over two lists of concepts.
- Parameters
m – semantic similarity function
entry1_list – First concept list
entry2_list – Second concept list
- Returns
Aggregate Similarity Measure
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> e1 = ssmpy.get_uniprot_annotations("Q12345")
>>> e2 = ssmpy.get_uniprot_annotations("Q12346")
>>> ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
1.653493583942882
-
ssmpy.
ssm_resnik
(entry1, entry2)¶ Calculate Resnik’s semantic similarity.
- Parameters
entry1 (int) – First concept
entry2 (int) – Second concept
- Returns
Semantic similarity
- Return type
float
- Example
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_resnik(gold, silver)
0.587786664902119
Reference:¶
- F. Couto and A. Lamurias, “Semantic similarity definition,” in Encyclopedia of Bioinformatics and Computational Biology (S. Ranganathan, K. Nakai, C. Schönbach, and M. Gribskov, eds.), vol. 1, pp. 870–876, Oxford: Elsevier, 2019 https://doi.org/10.1016/B978-0-12-809633-8.20401-9