DiShIn: Semantic Similarity Measures using Disjunctive Shared Information

This software package provides the basic functions needed to start using semantic similarity measures directly from an RDF or OWL file.

Getting started

Installation

Either clone this repository or install from PyPI:

pip install ssmpy

Quick start

import ssmpy

Metals Example

To create the semantic base file (metals.db) from the metals.owl file:

ssmpy.create_semantic_base("metals.owl", "metals.db", "https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl#", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "metals.txt")
ssmpy.semantic_base("metals.db")

The metals.txt file contains a list of occurrences. For example, the following contents have one occurrence of each term, except gold and silver, which have two occurrences each.

gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
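
As a quick sanity check on an occurrence file, the counts can be tallied with the standard library. This is a minimal sketch that assumes one term per line, as in the listing above. It only counts lines and is not meant to reproduce how ssmpy derives frequencies; the helper name count_occurrences is hypothetical.

```python
from collections import Counter


def count_occurrences(lines):
    """Count how many times each non-empty term appears (one term per line)."""
    return Counter(line.strip() for line in lines if line.strip())


# Contents of metals.txt as listed above.
metals_txt = """gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
"""

counts = count_occurrences(metals_txt.splitlines())
print(counts["gold"], counts["silver"], counts["copper"])
```

To run the same check on a real file, pass an open file object instead of the inline string, for example count_occurrences(open("metals.txt")).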

Now to calculate the similarity between copper and gold execute:

e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

0.22599256187152864
0.1504595366201814
0.281527889373394

Options

We can choose to calculate the measures using either the extrinsic or intrinsic Information Content (IC), and using the Most Informative Common Ancestors (MICA) or Disjunctive Common Ancestors (DCA). By default, the measures are calculated using extrinsic IC and DCA.

ssmpy.ssm.mica = False # determines if it uses MICA or DCA
ssmpy.ssm.intrinsic = False # determines if it uses extrinsic or intrinsic IC

Now calculate the similarity between copper and gold using intrinsic IC and MICA:

ssmpy.ssm.mica = True
ssmpy.ssm.intrinsic = True
e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

0.587786664902119
0.39079549108439265
0.35303485982596094
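
The three values in this output are consistent with the usual IC-based definitions of the measures: Resnik is the shared IC itself, Lin normalises twice the shared IC by the sum of the two ICs, and Jiang-Conrath inverts the IC distance plus one. The sketch below is an illustration of those formulas, not the library's code. It assumes that copper and gold both have the intrinsic IC 1.5040773967762742 (the value reported for gold in the API examples; the value for copper is an assumption), and the function names are hypothetical.

```python
def resnik(shared_ic):
    """Resnik similarity: the shared information content itself."""
    return shared_ic


def lin(ic1, ic2, shared_ic):
    """Lin similarity: twice the shared IC over the sum of both ICs."""
    total = ic1 + ic2
    return 2 * shared_ic / total if total > 0 else 0.0


def jiang_conrath(ic1, ic2, shared_ic):
    """Jiang-Conrath similarity: inverse of the IC distance plus one."""
    distance = ic1 + ic2 - 2 * shared_ic
    return 1 / (distance + 1)


ic_copper = 1.5040773967762742  # assumed equal to gold's intrinsic IC
ic_gold = 1.5040773967762742    # intrinsic IC of gold from the API examples
shared = 0.587786664902119      # Resnik value from the output above

print(resnik(shared))
print(lin(ic_copper, ic_gold, shared))
print(jiang_conrath(ic_copper, ic_gold, shared))
```

Under these assumptions the printed values agree with the output above to at least six decimal places.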

Other Examples

The following examples will assume the default options, i.e. the values shown are calculated using extrinsic IC and DCA.

Gene Ontology (GO) and UniProt proteins

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/go202104.db.gz
gunzip -N go202104.db.gz

Now to calculate the similarity between maltose biosynthetic process and maltose catabolic process, first we need to obtain the semantic base IDs of those concepts:

ssmpy.semantic_base("go.db")
e1 = ssmpy.get_id("GO_0000023")
e2 = ssmpy.get_id("GO_0000025")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

4.315813746201754
0.38793452313030363
0.06840605034663635

Now to calculate the similarity between proteins Q12345 and Q12346, first we retrieve the GO terms associated with each one:

e1 = ssmpy.get_uniprot_annotations("Q12345")
e2 = ssmpy.get_uniprot_annotations("Q12346")

Next we use ssm_multiple to calculate the average maximum semantic similarity with each of the three measures:

ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
ssmpy.ssm_multiple(ssmpy.ssm_lin, e1, e2)
ssmpy.ssm_multiple(ssmpy.ssm_jiang_conrath, e1, e2)

Output:

0.6015115682274214
0.12201023476842265
0.09317326288224918
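
One common reading of "average maximum" is: for each concept in the first list, take its best score against the second list, then average those best scores. The sketch below shows that reading with a toy measure. It is an assumption about the aggregation, not a copy of ssm_multiple, and it is one-directional; whether ssmpy also aggregates in the other direction is not stated here. The names average_maximum and toy_measure are hypothetical.

```python
def average_maximum(measure, entries1, entries2):
    """For each entry in entries1 take its best score against entries2, then average."""
    if not entries1 or not entries2:
        return 0.0
    best = [max(measure(a, b) for b in entries2) for a in entries1]
    return sum(best) / len(best)


def toy_measure(a, b):
    """Toy measure for illustration only: 1.0 for identical IDs, 0.5 otherwise."""
    return 1.0 if a == b else 0.5


print(average_maximum(toy_measure, [1, 2, 3], [2, 3, 4]))
```

Because the measure is passed in as a function, the same helper works with any pairwise similarity, which mirrors how ssm_multiple takes ssmpy.ssm_resnik, ssmpy.ssm_lin or ssmpy.ssm_jiang_conrath as its first argument.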

To create an updated version of the database, download the ontology and annotations:

wget http://purl.obolibrary.org/obo/go.owl
wget http://geneontology.org/gene-associations/goa_uniprot_all_noiea.gaf.gz
gunzip goa_uniprot_all_noiea.gaf.gz

The annotations will be used to calculate the extrinsic information content.

Next create the semantic base:

ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "goa_uniprot_all_noiea.gaf")

The semantic base is stored as a SQLite database in the same directory as your project.
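
Since the semantic base is an ordinary SQLite file, it can be inspected with Python's built-in sqlite3 module. The sketch below lists the tables in a database file. So that it runs without go.db, it first builds a small stand-in file; the table names entry and transitive are the ones mentioned in the API section, while the column layout of the stand-in is hypothetical.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the table names defined in a SQLite database file, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Stand-in database with a hypothetical schema; use "go.db" for the real file.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE transitive (entry1 INTEGER, entry2 INTEGER)")
conn.commit()
conn.close()

print(list_tables(demo_path))
```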

Chemical Entities of Biological Interest (ChEBI) Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/chebi202104.db.gz
gunzip -N chebi202104.db.gz

Now to calculate the similarity between aripiprazole and bithionol execute:

ssmpy.semantic_base("chebi.db")
e1 = ssmpy.get_id("CHEBI_31236")
e2 = ssmpy.get_id("CHEBI_3131")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

1.4393842298350599
0.12935491517581163
0.049077257018319796

To create an updated version of the database, download the ontology:

wget http://purl.obolibrary.org/obo/chebi/chebi_lite.owl

And then create the new database:

ssmpy.create_semantic_base("chebi_lite.owl", "chebi.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')

Human Phenotype (HP) Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/hp202104.db.gz
gunzip -N hp202104.db.gz

Now to calculate the similarity between Optic nerve coloboma and Optic nerve dysplasia execute:

ssmpy.semantic_base("hp.db")
e1 = ssmpy.get_id("HP_0000588")
e2 = ssmpy.get_id("HP_0001093")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

4.593979372426621
0.5118244533189668
0.10242304162282165

To create an updated version of the database, download the ontology:

wget http://purl.obolibrary.org/obo/hp.owl

And then create the new database:

ssmpy.create_semantic_base("hp.owl", "hp.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')

Human Disease Ontology (HDO) Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/doid202104.db.gz
gunzip -N doid202104.db.gz

Now to calculate the similarity between Asthma and Lung cancer execute:

ssmpy.semantic_base("doid.db")
e1 = ssmpy.get_id("DOID_2841")
e2 = ssmpy.get_id("DOID_1324")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

2.3627836143597176
0.4328907089097581
0.13906777879867938

To create an updated version of the database, download the ontology:

wget http://purl.obolibrary.org/obo/doid.owl

And then create the new database:

ssmpy.create_semantic_base("doid.owl", "doid.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')

Medical Subject Headings (MeSH) Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/mesh202104.db.gz
gunzip -N mesh202104.db.gz

Now to calculate the similarity between Malignant Hyperthermia and Fever execute:

ssmpy.semantic_base("mesh.db")
e1 = ssmpy.get_id("D008305")
e2 = ssmpy.get_id("D005334")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

1.2582571367910345
0.17390901691859173
0.07719755683816652

To create an updated version of the database, download the NT version from ftp://nlmpubs.nlm.nih.gov/online/mesh/rdf/mesh.nt.gz and unzip it:

wget ftp://nlmpubs.nlm.nih.gov/online/mesh/rdf/mesh.nt.gz
gunzip mesh.nt.gz

And then create the new database:

ssmpy.create_semantic_base("mesh.nt", "mesh.db", "http://id.nlm.nih.gov/mesh/", "http://id.nlm.nih.gov/mesh/vocab#broaderDescriptor", '')

Radiology Lexicon (RadLex) Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/radlex202104.db.gz
gunzip -N radlex202104.db.gz

Now to calculate the similarity between nervous system of right upper limb and nervous system of left upper limb execute:

ssmpy.semantic_base("radlex.db")
e1 = ssmpy.get_id("RID16139")
e2 = ssmpy.get_id("RID16140")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

9.366531825151093
0.9310964912333252
0.41905978419640516

To create an updated version of the database, download the RDF/XML version from http://bioportal.bioontology.org/ontologies/RADLEX and save it as radlex.rdf.

And then create the new database:

ssmpy.create_semantic_base("radlex.rdf", "radlex.db", "http://www.radlex.org/RID/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", '')

WordNet Example

Download the latest version of the database we created:

wget http://labs.rd.ciencias.ulisboa.pt/dishin/wordnet202104.db.gz
gunzip -N wordnet202104.db.gz

Now to calculate the similarity between the nouns ambulance and motorcycle execute:

ssmpy.semantic_base("wordnet.db")
e1 = ssmpy.get_id("ambulance-noun-1")
e2 = ssmpy.get_id("motorcycle-noun-1")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output:

6.331085809208157
0.6792379292396559
0.14327549414725688

To create an updated version of the database, download the ontology:

wget http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-hyponym.rdf

And then create the new database:

ssmpy.create_semantic_base("wordnet-hyponym.rdf", "wordnet.db", "http://www.w3.org/2006/03/wn/wn20/instances/synset-", "http://www.w3.org/2006/03/wn/wn20/schema/hyponymOf", '')

API

ssmpy.calculate_information_content_intrinsic(df, max_freq)

Calculates the information content of a dataframe of entries

Parameters
  • df – pandas DataFrame

  • max_freq – maximum frequency in the ontology

Returns

df with extra column ‘IC’

ssmpy.common_ancestors(entry1, entry2)

Get common ancestors between two semantic base entries

Parameters
  • entry1 (int) – first semantic base ID

  • entry2 (int) – second semantic base ID

Returns

List of common ancestors

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.common_ancestors(gold, silver)
[6, 2, 10]
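
Conceptually, the common ancestors of two entries are the ancestors they share. The sketch below shows this as an order-preserving intersection of two ancestor lists. It is an illustration only, not the library's query. The list for gold is taken from the get_ancestors example in this reference; the list for silver, including the ID 4, is hypothetical.

```python
def common(ancestors1, ancestors2):
    """Keep the ancestors of the first entry that also appear for the second."""
    seen = set(ancestors2)
    return [a for a in ancestors1 if a in seen]


gold_ancestors = [3, 6, 2, 10]    # from the get_ancestors example
silver_ancestors = [4, 6, 2, 10]  # hypothetical: ID 4 for silver is assumed

print(common(gold_ancestors, silver_ancestors))
```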

ssmpy.create_connection(db_file)

Create a database connection to the SQLite database specified by db_file.

Parameters

db_file – database file

Returns

Connection object or None

ssmpy.create_semantic_base(owl_file, sb_file, name_prefix, relation, annotation_file='')

Create a sqlite3 semantic base from an OWL file.

Parameters
  • owl_file (string) – File name of the ontology in OWL format

  • sb_file (string) – File name of database where semantic base will be stored

  • name_prefix (string) – Prefix of the concepts to be extracted from the ontology

  • relation (string) – Type of relation to be extracted from the ontology

  • annotation_file (string) – File containing ontology concepts to use as annotations and calculate concept frequency. Empty string if this file is not available.

Example:
>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/go.owl", "go.owl")[0]
'go.owl'
>>> ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
loading the ontology go.owl
calculating transitive closure at distance: 1
calculating transitive closure at distance: 2
calculating transitive closure at distance: 3
calculating transitive closure at distance: 4
calculating transitive closure at distance: 5
calculating transitive closure at distance: 6
calculating transitive closure at distance: 7
calculating transitive closure at distance: 8
calculating transitive closure at distance: 9
calculating transitive closure at distance: 10
calculating transitive closure at distance: 11
calculating transitive closure at distance: 12
calculating transitive closure at distance: 13
calculating transitive closure at distance: 14
calculating transitive closure at distance: 15
calculating transitive closure at distance: 16
calculating the descendents
calculating the hierarchical frequency
the end
>>> ssmpy.semantic_base("go.db")

ssmpy.db_select_entry(conn, entry_list)

Query all rows in the entry table, where name is in entry_list

Parameters
  • conn – the Connection object

  • entry_list – list of entity names

Returns

pandas dataframe with all columns in entry

ssmpy.db_select_entry_by_id(conn, entry_list)

Query all rows in the entry table, where id is in entry_list

Parameters
  • conn – the Connection object

  • entry_list – list of entity ids

Returns

pandas dataframe with all columns in entry

ssmpy.db_select_transitive(conn, ids_list)

Query all rows in the transitive table, where id is in ids_list

Parameters
  • conn – the Connection object

  • ids_list – list of entity ids

Returns

pandas dataframe with all columns in transitive
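
The three db_select_* helpers above all filter a table by a list of values. A minimal sketch of that pattern with the standard sqlite3 module is shown below: it builds one placeholder per value so the list is passed as parameters rather than pasted into the SQL string. This is an illustration under assumptions, not the library's code. It returns plain tuples instead of a pandas DataFrame, the helper name select_entries_by_name is hypothetical, and the IDs for copper and silver are made up (only gold = 3 comes from this document).

```python
import sqlite3


def select_entries_by_name(conn, names):
    """Return (id, name) rows from `entry` whose name is in `names`."""
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    query = f"SELECT id, name FROM entry WHERE name IN ({placeholders}) ORDER BY id"
    return conn.execute(query, tuple(names)).fetchall()


# In-memory stand-in for a semantic base; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO entry (id, name) VALUES (?, ?)",
    [(1, "copper"), (3, "gold"), (4, "silver")],
)

print(select_entries_by_name(conn, ["gold", "silver"]))
```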

ssmpy.fast_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the JC MICA INTRINSIC similarity between it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_jc]

ssmpy.fast_lin(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the LIN MICA INTRINSIC similarity between it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_lin]

ssmpy.fast_resn_lin_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the RESNIK, LIN and JC MICA INTRINSIC similarity between it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_resnik, sim_lin, sim_jc]

ssmpy.fast_resnik(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the RESNIK MICA INTRINSIC similarity between it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_resnik]

ssmpy.get_all_commom_ancestors(all_ancestors, it1, it2)

Get all common ancestors for it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

pandas DataFrame of common ancestors or zero

ssmpy.get_ancestors(entry)

Get ancestors of a given semantic base entry

Parameters

entry (int) – semantic base ID

Returns

List of ancestors

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.get_ancestors(gold)
[3, 6, 2, 10]

ssmpy.get_id(name)

Get semantic base ID of ontology concept by its original label (name).

Parameters

name (string) – ontology label (depends on the ontology)

Returns

semantic base ID or -1 if not found

Return type

int

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_id("gold")
3

ssmpy.get_name(cid)

Get ontology label (name) for a given semantic base ID.

Parameters

cid (int) – semantic base ID

Returns

ontology label (name)

Return type

string

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_name(3)
'gold'

ssmpy.get_uniprot_annotations(protein_acc)

Retrieve GO annotations for a UniProt ID using the UniProt API

Parameters

protein_acc (string) – UniProt protein ID

Returns

list of GO terms

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> l = sorted(ssmpy.get_uniprot_annotations("Q12345"))
>>> l
[1746, 9044, 17053, 21566, 24341, 57621, 95359]

ssmpy.information_content(entry)

Get the information content of a semantic base entry, intrinsic or extrinsic according to the value of ssmpy.ssm.intrinsic.

Parameters

entry (int) – semantic base ID

Returns

information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.ssm.intrinsic = True
>>> ssmpy.information_content(gold)
1.5040773967762742

ssmpy.information_content_extrinsic(entry)

Get the extrinsic information content of a semantic base entry.

The values are precomputed when the semantic base is created, according to the annotations file provided.

Parameters

entry (int) – semantic base ID

Returns

extrinsic information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_extrinsic(gold)
1.2992829841302609

ssmpy.information_content_intrinsic(entry)

Get the intrinsic information content of a semantic base entry.

Parameters

entry (int) – semantic base ID

Returns

intrinsic information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_intrinsic(gold)
1.5040773967762742

ssmpy.light_similarity(conn, entry_ids_1, entry_ids_2, metric, cpu_cores)

Main function

Parameters
  • conn – db_connection

  • entry_ids_1 – list of entries 1

  • entry_ids_2 – list of entries 2

  • metric – ‘lin’, ‘resnick’, ‘jc’ or ‘all’

  • cpu_cores – number of cores to be used

Returns

list with results ([e1, e2, similarity] or [e1, e2, similarity resnik, similarity lin, similarity jc])

Example

>>> import ssmpy
>>> ssmpy.create_semantic_base('doid.owl', 'doid.db', "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
>>> conn = ssmpy.create_connection('doid.db')
>>> list1 = ['DOID_10587', 'DOID_2841']
>>> list2 = ['DOID_1927', 'DOID_1324']
>>> ssmpy.light_similarity(conn, list1, list2, 'all', 4)
[[['DOID_10587', 'DOID_1324', -0.0, -0.0, 0.068819810490695],
['DOID_10587', 'DOID_1927', 5.937536205082426, 0.8269561090992177, 0.28695173228265203]],
[['DOID_2841', 'DOID_1324', 3.703943983575332, 0.659762410973656, 0.20745912457314464],
['DOID_2841', 'DOID_1927', -0.0, -0.0, 0.07658496040867407]]]

ssmpy.num_paths(entry1, ancestor)

Get number of paths (edges) between two concepts.

Parameters
  • entry1 (int) – Child concept

  • ancestor (int) – Parent concept

Returns

number of edges between the two concepts

Return type

int

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> metal = ssmpy.get_id("metal")
>>> ssmpy.num_paths(gold, metal)
5

ssmpy.run_query(query, params)

Run any query on the semantic base.

Parameters
  • query (string) – query to run on the semantic base

  • params (tuple) – query parameters

Returns

query result

Return type

sqlite3.Cursor

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> query = "SELECT id FROM entry WHERE name = ?"
>>> ssmpy.run_query(query, ("gold",)).fetchone()
(3,)

ssmpy.semantic_base(sb_file, **kwargs)

Initialize global connection object.

You can also pass other arguments to be given to the sqlite3.connect method, for example check_same_thread. After this method is called, the other methods will be applied to the semantic base.

Parameters

sb_file (string) – sqlite database filename

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
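
The extra keyword arguments matter mainly when the connection is shared across threads. The sketch below shows the forwarding pattern with plain sqlite3: a hypothetical wrapper open_connection passes its keyword arguments through to sqlite3.connect, and check_same_thread=False lets a second thread use the connection. It is an illustration of the idea, not the body of ssmpy.semantic_base, and the in-memory table is a stand-in.

```python
import sqlite3
import threading


def open_connection(sb_file, **kwargs):
    """Open a SQLite connection, forwarding extra keyword arguments to sqlite3.connect."""
    return sqlite3.connect(sb_file, **kwargs)


conn = open_connection(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO entry (id, name) VALUES (3, 'gold')")

result = []


def worker():
    # Allowed only because check_same_thread=False was passed; with the
    # default setting sqlite3 raises ProgrammingError in a second thread.
    row = conn.execute("SELECT id FROM entry WHERE name = ?", ("gold",)).fetchone()
    result.append(row)


thread = threading.Thread(target=worker)
thread.start()
thread.join()
print(result)
```

Disabling the thread check does not make concurrent writes safe; callers still need to serialise access to the shared connection themselves.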

ssmpy.shared_ic(entry1, entry2)

Calculate the shared information content of two concepts according to the value of ssmpy.ssm.mica

Previously computed values are stored in memory for faster computation.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm.mica = True
>>> ssmpy.shared_ic(gold, silver)
0.587786664902119

ssmpy.shared_ic_dca(entry1, entry2)

Calculate the shared information content of two concepts using disjunctive common ancestors.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_dca(gold, silver)
0.587786664902119

ssmpy.shared_ic_mica(entry1, entry2)

Calculate the shared information content of two concepts using the most informative common ancestor.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_mica(gold, silver)
0.587786664902119

ssmpy.ssm_jiang_conrath(entry1, entry2)

Calculate JC’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_jiang_conrath(gold, silver)
0.5456783339686456

ssmpy.ssm_lin(entry1, entry2)

Calculate Lin’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_lin(gold, silver)
0.39079549108439265

ssmpy.ssm_multiple(m, entry1_list, entry2_list)

Calculate semantic similarity over two lists of concepts.

Parameters
  • m – semantic similarity function

  • entry1_list – First concept list

  • entry2_list – Second concept list

Returns

Aggregate Similarity Measure

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> e1 = ssmpy.get_uniprot_annotations("Q12345")
>>> e2 = ssmpy.get_uniprot_annotations("Q12346")
>>> ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
1.653493583942882

ssmpy.ssm_resnik(entry1, entry2)

Calculate Resnik’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_resnik(gold, silver)
0.587786664902119

Reference:

  • F. Couto and A. Lamurias, “Semantic similarity definition,” in Encyclopedia of Bioinformatics and Computational Biology (S. Ranganathan, K. Nakai, C. Schönbach, and M. Gribskov, eds.), vol. 1, pp. 870–876, Oxford: Elsevier, 2019 https://doi.org/10.1016/B978-0-12-809633-8.20401-9
