API

ssmpy.calculate_information_content_intrinsic(df, max_freq)

Calculates the information content of a DataFrame of entries.

Parameters
  • df – pandas DataFrame

  • max_freq – maximum frequency in the ontology

Returns

df with extra column ‘IC’
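As a rough illustration of what adding an ‘IC’ column can involve, the sketch below assumes the intrinsic information content is -ln(freq / max_freq) and that each row carries a frequency under the key freq. Both the formula and the key name are assumptions made for this example, and plain dictionaries stand in for DataFrame rows so the snippet runs without pandas. The helper add_intrinsic_ic is hypothetical and not part of ssmpy.

```python
import math

def add_intrinsic_ic(rows, max_freq):
    """Return copies of the rows with an extra 'IC' key (hypothetical helper)."""
    result = []
    for row in rows:
        enriched = dict(row)
        # Assumed formula: the rarer the entry (lower freq), the higher its IC.
        enriched["IC"] = -math.log(row["freq"] / max_freq)
        result.append(enriched)
    return result

# Illustrative rows only; these frequencies do not come from a real ontology.
rows = [{"id": 1, "freq": 10}, {"id": 2, "freq": 5}]
with_ic = add_intrinsic_ic(rows, max_freq=10)
```

Under this assumed formula the most frequent entry gets an IC of zero, and halving the frequency adds ln(2) to the IC.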

ssmpy.common_ancestors(entry1, entry2)

Get common ancestors between two semantic base entries

Parameters
  • entry1 (int) – first semantic base ID

  • entry2 (int) – second semantic base ID

Returns

List of common ancestors

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.common_ancestors(gold, silver)
[6, 2, 10]
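Conceptually, the common ancestors of two entries are the intersection of their ancestor sets. The sketch below shows that idea on a toy hierarchy held in a dictionary. The concept names, the PARENTS mapping and both helper functions are illustrative assumptions, not ssmpy code, and each concept is treated as its own ancestor here.

```python
# Toy hierarchy (child -> direct parents); names are illustrative only.
PARENTS = {
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "copper": ["coinage"],
    "precious": ["metal"],
    "coinage": ["metal"],
    "metal": [],
}

def ancestors(concept, parents=PARENTS):
    """Collect the concept itself plus everything reachable via parent links."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen

def common_ancestors_sketch(a, b, parents=PARENTS):
    """Ancestors shared by both concepts, computed as a set intersection."""
    return ancestors(a, parents) & ancestors(b, parents)
```

With this toy data, common_ancestors_sketch("gold", "silver") yields the three shared concepts precious, coinage and metal.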
ssmpy.create_connection(db_file)

Create a database connection to the SQLite database specified by db_file.

Parameters

db_file – database file

Returns

Connection object or None
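A function with the "Connection object or None" contract can be sketched with the standard sqlite3 module as follows. The name open_connection is hypothetical and the error handling shown is an assumption about how such a helper might behave, not a copy of the library's code.

```python
import sqlite3

def open_connection(db_file):
    """Return a sqlite3.Connection, or None if the database cannot be opened."""
    try:
        return sqlite3.connect(db_file)
    except sqlite3.Error as err:
        # Report the problem and signal failure with None instead of raising.
        print(f"could not open {db_file}: {err}")
        return None

# ":memory:" is sqlite3's built-in in-memory database.
conn = open_connection(":memory:")
```

Callers of such a helper should check for None before using the result.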

ssmpy.create_semantic_base(owl_file, sb_file, name_prefix, relation, annotation_file='')

Create an SQLite3 semantic base from an OWL file.

Parameters
  • owl_file (string) – File name of ontology in OWL format

  • sb_file (string) – File name of database where semantic base will be stored

  • name_prefix (string) – Prefix of the concepts to be extracted from the ontology

  • relation (string) – Type of relation to be extracted from the ontology

  • annotation_file (string) – File containing ontology concepts to use as annotations and calculate concept frequency. Empty string if this file is not available.

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("http://purl.obolibrary.org/obo/go.owl", "go.owl")[0]
'go.owl'
>>> ssmpy.create_semantic_base("go.owl", "go.db", "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
loading the ontology go.owl
calculating transitive closure at distance: 1
calculating transitive closure at distance: 2
calculating transitive closure at distance: 3
calculating transitive closure at distance: 4
calculating transitive closure at distance: 5
calculating transitive closure at distance: 6
calculating transitive closure at distance: 7
calculating transitive closure at distance: 8
calculating transitive closure at distance: 9
calculating transitive closure at distance: 10
calculating transitive closure at distance: 11
calculating transitive closure at distance: 12
calculating transitive closure at distance: 13
calculating transitive closure at distance: 14
calculating transitive closure at distance: 15
calculating transitive closure at distance: 16
calculating the descendents
calculating the hierarchical frequency
the end
>>> ssmpy.semantic_base("go.db")
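The log lines above report a transitive closure being built one distance level at a time. The sketch below shows one way such a level-by-level closure can be computed over a list of (child, parent) edges. It is a simplified illustration under stated assumptions: it records only the shortest distance per pair, works on an in-memory list, and the function name closure_by_distance is hypothetical.

```python
def closure_by_distance(edges):
    """Map each (descendant, ancestor) pair to its shortest path length."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    closure = {}
    frontier = set(edges)  # pairs connected by a path of length 1
    distance = 1
    while frontier:
        for pair in frontier:
            closure[pair] = distance
        # Extend every path in the frontier by one more parent link and keep
        # only the pairs that were not already reached at a shorter distance.
        frontier = {
            (desc, grand)
            for desc, anc in frontier
            for grand in parents.get(anc, ())
            if (desc, grand) not in closure
        }
        distance += 1
    return closure

# Toy edges; the names are illustrative only.
closure = closure_by_distance([
    ("gold", "precious"), ("gold", "coinage"),
    ("precious", "metal"), ("coinage", "metal"),
])
```

The loop stops once a level adds no new pairs, which mirrors the way the log stops at the deepest distance found in the ontology.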
ssmpy.db_select_entry(conn, entry_list)

Query all rows in the entry table where name is in entry_list.

Parameters
  • conn – the Connection object

  • entry_list – list of entity names

Returns

pandas dataframe with all columns in entry
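Selecting rows whose name is in a list is usually done with a parameterized IN clause. The sketch below shows that pattern with sqlite3 on an in-memory database. The two-column entry table is an assumption (a real semantic base may have more columns), the helper select_entries_by_name is hypothetical, and it returns plain tuples rather than a pandas DataFrame.

```python
import sqlite3

def select_entries_by_name(conn, names):
    """Fetch all rows of `entry` whose name is in `names` (hypothetical helper)."""
    if not names:
        return []
    # One "?" placeholder per value keeps the query safely parameterized.
    placeholders = ", ".join("?" for _ in names)
    query = f"SELECT * FROM entry WHERE name IN ({placeholders})"
    return conn.execute(query, list(names)).fetchall()

# Assumed minimal schema and toy data for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO entry VALUES (?, ?)",
                 [(1, "metal"), (2, "gold"), (3, "silver")])
rows = select_entries_by_name(conn, ["gold", "silver"])
```

The same pattern applies to a lookup by id: only the column named in the WHERE clause changes.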

ssmpy.db_select_entry_by_id(conn, entry_list)

Query all rows in the entry table where id is in entry_list.

Parameters
  • conn – the Connection object

  • entry_list – list of entity ids

Returns

pandas dataframe with all columns in entry

ssmpy.db_select_transitive(conn, ids_list)

Query all rows in the transitive table where id is in ids_list.

Parameters
  • conn – the Connection object

  • ids_list – list of entity ids

Returns

pandas dataframe with all columns in transitive

ssmpy.fast_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the Jiang-Conrath (JC) similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_jc]

ssmpy.fast_lin(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the Lin similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_lin]

ssmpy.fast_resn_lin_jc(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the Resnik, Lin and Jiang-Conrath (JC) similarities between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_resnik, sim_lin, sim_jc]
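Once the information content (IC) of both entities and of their most informative common ancestor is known, the three measures reduce to simple arithmetic. The sketch below uses the usual definitions of Resnik and Lin. For JC it uses 1 / (distance + 1), which is an assumption about the exact variant, and the function name resnik_lin_jc is hypothetical.

```python
def resnik_lin_jc(ic1, ic2, ic_mica):
    """Return [resnik, lin, jc] from precomputed IC values (illustrative sketch)."""
    # Resnik: IC of the most informative common ancestor.
    resnik = ic_mica
    # Lin: shared IC relative to the IC of both entities.
    total = ic1 + ic2
    lin = 2 * ic_mica / total if total > 0 else 0.0
    # JC: inverse of the IC distance; the +1 is an assumed normalization
    # that keeps the result finite when the distance is zero.
    distance = total - 2 * ic_mica
    jc = 1 / (distance + 1)
    return [resnik, lin, jc]

scores = resnik_lin_jc(2.0, 3.0, 1.0)
```

With the illustrative inputs above, the distance is 3.0, so the three scores are 1.0, 0.4 and 0.25.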

ssmpy.fast_resnik(all_ancestors, df_entry_ancestors, df_entry_ic, it1, it2)

Calculates the Resnik similarity between it1 and it2, using the most informative common ancestor (MICA) and intrinsic information content.

Parameters
  • all_ancestors – pandas DataFrame of all ancestors (from table transitive)

  • df_entry_ancestors – pandas DataFrame of all ancestors (from table entry) with column IC

  • df_entry_ic – pandas DataFrame of all entities (from table entry) with column IC

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

list: [e1, e2, sim_resnik]

ssmpy.get_all_commom_ancestors(all_ancestors, it1, it2)

Get all common ancestors for it1 and it2

Parameters
  • all_ancestors – pandas DataFrame of all ancestors

  • it1 – entity 1 (id)

  • it2 – entity 2 (id)

Returns

pandas DataFrame of common ancestors or zero

ssmpy.get_ancestors(entry)

Get ancestors of a given semantic base entry

Parameters

entry (int) – semantic base ID

Returns

List of ancestors

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.get_ancestors(gold)
[3, 6, 2, 10]
ssmpy.get_id(name)

Get semantic base ID of ontology concept by its original label (name).

Parameters

name (string) – ontology label (depends on the ontology)

Returns

semantic base ID or -1 if not found

Return type

int

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_id("gold")
3
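The "ID or -1 if not found" behaviour can be sketched as a single parameterized lookup with a fallback. The example assumes an entry table with id and name columns, as used in the run_query example of this reference. The helper lookup_id and the toy data are hypothetical.

```python
import sqlite3

def lookup_id(conn, name):
    """Return the id stored for `name`, or -1 when no row matches."""
    row = conn.execute(
        "SELECT id FROM entry WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row is not None else -1

# Assumed minimal schema and toy data for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO entry VALUES (?, ?)", (3, "gold"))
```

Returning -1 instead of raising lets callers test the result with a simple comparison before passing it on.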
ssmpy.get_name(cid)

Get ontology label (name) for a given semantic base ID.

Parameters

cid (int) – semantic base ID

Returns

ontology label (name)

Return type

string

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> ssmpy.get_name(3)
'gold'
ssmpy.get_uniprot_annotations(protein_acc)

Retrieve GO annotations for a UniProt ID using UniProt API

Parameters

protein_acc (string) – UniProt protein ID

Returns

list of GO terms

Return type

list

Example

>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> l = sorted(ssmpy.get_uniprot_annotations("Q12345"))
>>> l
[1746, 9044, 17053, 21566, 24341, 57621, 95359]
ssmpy.information_content(entry)

Get the information content of a semantic base entry, intrinsic or extrinsic according to the value of ssmpy.ssm.intrinsic.

Parameters

entry (int) – semantic base ID

Returns

information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.ssm.intrinsic = True
>>> ssmpy.information_content(gold)
1.5040773967762742
ssmpy.information_content_extrinsic(entry)

Get the extrinsic information content of a semantic base entry.

The values are precomputed when the semantic base is created, according to the annotations file provided.

Parameters

entry (int) – semantic base ID

Returns

extrinsic information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_extrinsic(gold)
1.2992829841302609
ssmpy.information_content_intrinsic(entry)

Get the intrinsic information content of a semantic base entry.

Parameters

entry (int) – semantic base ID

Returns

intrinsic information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> ssmpy.information_content_intrinsic(gold)
1.5040773967762742
ssmpy.light_similarity(conn, entry_ids_1, entry_ids_2, metric, cpu_cores)

Main entry point for computing semantic similarity between two lists of entries.

Parameters
  • conn – database connection

  • entry_ids_1 – first list of entries

  • entry_ids_2 – second list of entries

  • metric – ‘lin’, ‘resnick’, ‘jc’ or ‘all’

  • cpu_cores – number of cores to be used

Returns

list with results ([e1, e2, similarity] or [e1, e2, similarity resnik, similarity lin, similarity jc])

Example

>>> import ssmpy
>>> ssmpy.create_semantic_base('doid.owl', 'doid.db', "http://purl.obolibrary.org/obo/", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "")
>>> conn = ssmpy.create_connection('doid.db')
>>> list1 = ['DOID_10587', 'DOID_2841']
>>> list2 = ['DOID_1927', 'DOID_1324']
>>> ssmpy.light_similarity(conn, list1, list2, 'all', 4)
[[['DOID_10587', 'DOID_1324', -0.0, -0.0, 0.068819810490695],
['DOID_10587', 'DOID_1927', 5.937536205082426, 0.8269561090992177, 0.28695173228265203]],
[['DOID_2841', 'DOID_1324', 3.703943983575332, 0.659762410973656, 0.20745912457314464],
['DOID_2841', 'DOID_1927', -0.0, -0.0, 0.07658496040867407]]]
ssmpy.num_paths(entry1, ancestor)

Get number of paths (edges) between two concepts.

Parameters
  • entry1 (int) – Child concept

  • ancestor (int) – Parent concept

Returns

number of edges between the two concepts

Return type

int

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> metal = ssmpy.get_id("metal")
>>> ssmpy.num_paths(gold, metal)
5
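One reading of "number of paths" is the count of distinct routes that lead from a concept up to an ancestor. The sketch below counts such routes in a toy acyclic hierarchy using memoized recursion. It illustrates that reading only; the data, the PARENTS mapping and the function count_paths are hypothetical and may not match how ssmpy derives its value.

```python
from functools import lru_cache

# Toy hierarchy (child -> direct parents); names are illustrative only.
PARENTS = {
    "gold": ["precious", "coinage"],
    "copper": ["coinage"],
    "precious": ["metal"],
    "coinage": ["metal"],
    "metal": [],
}

def count_paths(child, ancestor, parents=PARENTS):
    """Count distinct upward routes from child to ancestor in an acyclic graph."""
    @lru_cache(maxsize=None)
    def walk(node):
        if node == ancestor:
            return 1
        # Sum the routes available through each direct parent.
        return sum(walk(parent) for parent in parents.get(node, ()))
    return walk(child)
```

In this toy hierarchy gold reaches metal through two routes and copper through one.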
ssmpy.run_query(query, params)

Run any query on the semantic base.

Parameters
  • query (string) – query to run on the semantic base

  • params (tuple) – query parameters

Returns

query result

Return type

sqlite3.Cursor

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> query = "SELECT id FROM entry WHERE name = ?"
>>> ssmpy.run_query(query, ("gold",)).fetchone()
(3,)
ssmpy.semantic_base(sb_file, **kwargs)

Initialize global connection object.

You can also pass other arguments to be given to the sqlite3.connect method, for example check_same_thread. After this method is called, the other methods will be applied to the semantic base.

Parameters

sb_file (string) – sqlite database filename

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
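The pattern described here, a module-level connection that forwards extra keyword arguments to sqlite3.connect, can be sketched as follows. The names _connection, open_semantic_base and run are hypothetical stand-ins, and the snippet uses an in-memory database instead of a semantic base file.

```python
import sqlite3

_connection = None  # module-level handle shared by later calls

def open_semantic_base(sb_file, **kwargs):
    """Open the database and remember the connection for later queries."""
    global _connection
    # Extra keyword arguments go straight to sqlite3.connect.
    _connection = sqlite3.connect(sb_file, **kwargs)

def run(query, params=()):
    """Execute a query on the remembered connection and return the cursor."""
    return _connection.execute(query, params)

# check_same_thread=False lets other threads use the connection.
open_semantic_base(":memory:", check_same_thread=False)
```

Because the handle is global, calling the opener again simply switches every later query to the newly opened database.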
ssmpy.shared_ic(entry1, entry2)

Calculate the shared information content of two concepts according to the value of ssmpy.ssm.mica.

Previously computed values are stored in memory for faster computation.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm.mica = True
>>> ssmpy.shared_ic(gold, silver)
0.587786664902119
ssmpy.shared_ic_dca(entry1, entry2)

Calculate the shared information content of two concepts using disjunctive common ancestors.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_dca(gold, silver)
0.587786664902119
ssmpy.shared_ic_mica(entry1, entry2)

Calculate the shared information content of two concepts using the most informative common ancestor.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Shared information content

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.shared_ic_mica(gold, silver)
0.587786664902119
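Using the most informative common ancestor means taking the highest information content found among the common ancestors. The sketch below shows that selection step on illustrative values. The helper name shared_ic_mica_sketch, the IC numbers and the fallback of 0.0 for an empty set are assumptions for this example.

```python
def shared_ic_mica_sketch(common, ic):
    """Return the highest IC among the common ancestors (0.0 if there are none)."""
    return max((ic[concept] for concept in common), default=0.0)

# Illustrative IC values; not taken from a real semantic base.
ic = {"metal": 0.0, "precious": 0.6, "coinage": 0.4}
shared = shared_ic_mica_sketch({"metal", "precious", "coinage"}, ic)
```

Here the result is 0.6, the IC of the most specific shared concept.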
ssmpy.ssm_jiang_conrath(entry1, entry2)

Calculate Jiang and Conrath’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_jiang_conrath(gold, silver)
0.5456783339686456
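The value in the example above is consistent with a similarity of 1 / distance, where distance = IC(entry1) + IC(entry2) - 2 × shared IC. The sketch below reproduces it from the IC and shared IC figures shown elsewhere in this reference. The IC of silver is not stated in this reference and is assumed here to equal that of gold, and the guard for a zero distance is likewise an assumption. The function name jiang_conrath_sketch is hypothetical.

```python
def jiang_conrath_sketch(ic1, ic2, shared):
    """Inverse of the Jiang-Conrath distance (illustrative sketch)."""
    distance = ic1 + ic2 - 2 * shared
    # Assumed convention: identical concepts (distance 0) get similarity 1.0.
    return 1 / distance if distance > 0 else 1.0

ic_gold = 1.5040773967762742    # from the information_content example
ic_silver = 1.5040773967762742  # assumption: not stated in this reference
shared = 0.587786664902119      # from the shared_ic example
sim = jiang_conrath_sketch(ic_gold, ic_silver, shared)
```

With these inputs the sketch gives approximately 0.5457, in line with the example output.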
ssmpy.ssm_lin(entry1, entry2)

Calculate Lin’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_lin(gold, silver)
0.39079549108439265
ssmpy.ssm_multiple(m, entry1_list, entry2_list)

Calculate semantic similarity over two lists of concepts.

Parameters
  • m – semantic similarity function

  • entry1_list – First concept list

  • entry2_list – Second concept list

Returns

Aggregate Similarity Measure

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> import gzip
>>> import shutil
>>> urllib.request.urlretrieve("http://labs.rd.ciencias.ulisboa.pt/dishin/go201907.db.gz", "go.db.gz")[0]
'go.db.gz'
>>> with gzip.open('go.db.gz', 'rb') as f_in:
...     with open('go.db', 'wb') as f_out:
...         shutil.copyfileobj(f_in, f_out)
>>> ssmpy.semantic_base("go.db")
>>> e1 = ssmpy.get_uniprot_annotations("Q12345")
>>> e2 = ssmpy.get_uniprot_annotations("Q12346")
>>> ssmpy.ssm_multiple(ssmpy.ssm_resnik, e1, e2)
1.653493583942882
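One common way to aggregate pairwise similarities over two lists is the best-match average: for each concept in the first list, keep its best score against the second list, then average those best scores. Whether ssm_multiple uses exactly this aggregation is an assumption; the sketch below only illustrates the idea, and best_match_average is a hypothetical name.

```python
def best_match_average(measure, list1, list2):
    """Average, over list1, of each item's best score against list2."""
    if not list1 or not list2:
        return 0.0
    best = [max(measure(a, b) for b in list2) for a in list1]
    return sum(best) / len(best)

def exact_match(a, b):
    """Toy measure for demonstration: 1.0 for identical items, else 0.0."""
    return 1.0 if a == b else 0.0

score = best_match_average(exact_match, [1, 2], [2, 3])
```

Any function taking two concepts and returning a number can be passed as the measure, which is the same shape as the m parameter described above.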
ssmpy.ssm_resnik(entry1, entry2)

Calculate Resnik’s semantic similarity.

Parameters
  • entry1 (int) – First concept

  • entry2 (int) – Second concept

Returns

Semantic similarity

Return type

float

Example

>>> import ssmpy
>>> import urllib.request
>>> urllib.request.urlretrieve("https://github.com/lasigeBioTM/DiShIn/raw/master/metals.db", "metals.db")[0]
'metals.db'
>>> ssmpy.semantic_base("metals.db")
>>> gold = ssmpy.get_id("gold")
>>> silver = ssmpy.get_id("silver")
>>> ssmpy.ssm_resnik(gold, silver)
0.587786664902119