Getting started

Installation

Either clone this repository or install the package from PyPI:

pip install ssmpy

Quick start

import ssmpy

Metals Example

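To create the semantic base file (metals.db) from metals.owl and then load it, pass the ontology file, the output database, the namespace prefix of the ontology terms, the relation that defines the hierarchy (rdfs:subClassOf) and the file of term occurrences (metals.txt):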

ssmpy.create_semantic_base("metals.owl", "metals.db", "https://raw.githubusercontent.com/lasigeBioTM/ssm/master/metals.owl#", "http://www.w3.org/2000/01/rdf-schema#subClassOf", "metals.txt")
ssmpy.semantic_base("metals.db")

The metals.txt file contains a list of term occurrences, one per line. For example, the following content has one occurrence of each term, except gold and silver, which occur twice:

gold
silver
gold
silver
copper
platinum
palladium
metal
coinage
precious
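One common way to turn counts like these into extrinsic IC is to let a term's frequency include its own occurrences plus those of all its descendants, and then take IC(t) = -log(freq(t) / max_freq). The sketch below illustrates that idea. The parent map is hypothetical and only assumed to mirror metals.owl (gold and silver are both precious and coinage metals), and ssmpy may normalise IC differently, so these values are not expected to match the outputs shown in this README.

```python
import math
from collections import Counter

# Hypothetical child -> parents map, assumed to mirror metals.owl.
PARENTS = {
    "metal": [],
    "precious": ["metal"],
    "coinage": ["metal"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "platinum": ["precious"],
    "palladium": ["precious"],
    "copper": ["coinage"],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def extrinsic_ic(occurrences):
    """IC(t) = -log(freq(t) / max_freq) = log(max_freq / freq(t)),
    where freq(t) counts occurrences of t and of all its descendants."""
    freq = Counter()
    for term, n in Counter(occurrences).items():
        for anc in ancestors(term):
            freq[anc] += n
    max_freq = max(freq.values())
    return {term: math.log(max_freq / f) for term, f in freq.items()}


# Same contents as metals.txt; with the file on disk you could use
# open("metals.txt").read().split() instead.
occurrences = ["gold", "silver", "gold", "silver", "copper",
               "platinum", "palladium", "metal", "coinage", "precious"]
ic = extrinsic_ic(occurrences)
for term in sorted(ic, key=ic.get):
    print(f"{term:<10} {ic[term]:.4f}")
```

The root (metal) is counted for every occurrence, so its IC is 0, while rarely mentioned leaves such as copper get the highest IC.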

To calculate the similarity between copper and gold, run:

e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output (Resnik, Lin and Jiang-Conrath, respectively):

0.22599256187152864
0.1504595366201814
0.281527889373394
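The three measures combine the IC of each term with the IC the two terms share. The sketch below uses the textbook definitions with the shared IC taken from the most informative common ancestor: Resnik is the shared IC, Lin is 2 * shared / (IC(t1) + IC(t2)), and Jiang-Conrath is the distance IC(t1) + IC(t2) - 2 * shared. ssmpy reports Jiang-Conrath as a similarity, and the conversion it uses is not described here, so the sketch stops at the distance. The parent map and IC values are hypothetical (IC = log(max_freq / freq) over the metals.txt counts, assuming the hierarchy mirrors metals.owl) and will not reproduce the output above.

```python
import math

# Hypothetical child -> parents map, assumed to mirror metals.owl.
PARENTS = {
    "metal": [],
    "precious": ["metal"],
    "coinage": ["metal"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "platinum": ["precious"],
    "palladium": ["precious"],
    "copper": ["coinage"],
}

# Hypothetical extrinsic IC: log(max_freq / freq), where freq counts a term
# and its descendants in metals.txt (metal is counted 10 times in total).
IC = {
    "metal": math.log(10 / 10),
    "precious": math.log(10 / 7),
    "coinage": math.log(10 / 6),
    "gold": math.log(10 / 2),
    "silver": math.log(10 / 2),
    "platinum": math.log(10 / 1),
    "palladium": math.log(10 / 1),
    "copper": math.log(10 / 1),
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def shared_ic(t1, t2):
    """IC of the most informative common ancestor (MICA)."""
    return max(IC[a] for a in ancestors(t1) & ancestors(t2))


def resnik(t1, t2):
    return shared_ic(t1, t2)


def lin(t1, t2):
    total = IC[t1] + IC[t2]
    if total == 0:
        # Only happens when both terms are the root; treated as identical
        # here, which is an assumption of this sketch.
        return 1.0
    return 2 * shared_ic(t1, t2) / total


def jiang_conrath_distance(t1, t2):
    """Jiang-Conrath distance; 0 for identical terms."""
    return IC[t1] + IC[t2] - 2 * shared_ic(t1, t2)


for name, measure in [("Resnik", resnik), ("Lin", lin),
                      ("Jiang-Conrath distance", jiang_conrath_distance)]:
    print(name, measure("copper", "gold"))
```

For copper and gold the MICA in this toy hierarchy is coinage, so all three values are driven by IC(coinage).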

Options

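The measures can be calculated using either extrinsic or intrinsic Information Content (IC), and using either the Most Informative Common Ancestor (MICA) or the Disjunctive Common Ancestors (DCA). Extrinsic IC is derived from the term occurrences in the annotation file (here metals.txt), whereas intrinsic IC is derived from the ontology structure alone. By default, the measures use extrinsic IC and DCA, which corresponds to: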

ssmpy.ssm.mica = False       # True: use MICA; False (default): use DCA
ssmpy.ssm.intrinsic = False  # True: use intrinsic IC; False (default): use extrinsic IC
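The two options differ in how the shared IC of two terms is computed. MICA keeps the IC of the single most informative common ancestor. In the DCA formulation assumed here (based on the DiShIn method), each common ancestor a gets a path difference |paths(t1, a) - paths(t2, a)|, the most informative ancestor is kept for each distinct difference, and their ICs are averaged. The sketch below is our reading of that idea, not ssmpy's implementation; the parent map and IC values are hypothetical.

```python
import math
from functools import lru_cache

# Hypothetical child -> parents map, assumed to mirror metals.owl.
PARENTS = {
    "metal": [],
    "precious": ["metal"],
    "coinage": ["metal"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "platinum": ["precious"],
    "palladium": ["precious"],
    "copper": ["coinage"],
}

# Hypothetical extrinsic IC values: log(max_freq / freq) over metals.txt.
IC = {
    "metal": math.log(10 / 10), "precious": math.log(10 / 7),
    "coinage": math.log(10 / 6), "gold": math.log(10 / 2),
    "silver": math.log(10 / 2), "platinum": math.log(10 / 1),
    "palladium": math.log(10 / 1), "copper": math.log(10 / 1),
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


@lru_cache(maxsize=None)
def num_paths(term, ancestor):
    """Number of distinct upward paths from term to ancestor (0 if none)."""
    if term == ancestor:
        return 1
    return sum(num_paths(parent, ancestor) for parent in PARENTS[term])


def shared_ic_mica(t1, t2):
    return max(IC[a] for a in ancestors(t1) & ancestors(t2))


def shared_ic_dca(t1, t2):
    best = {}  # path difference -> highest IC among ancestors with that difference
    for a in ancestors(t1) & ancestors(t2):
        pd = abs(num_paths(t1, a) - num_paths(t2, a))
        best[pd] = max(IC[a], best.get(pd, 0.0))
    return sum(best.values()) / len(best)


print("MICA:", shared_ic_mica("copper", "gold"))
print("DCA: ", shared_ic_dca("copper", "gold"))
```

For copper and gold, coinage is reached by one path from each term, while metal is reached by one path from copper but two from gold (through precious and through coinage). Both ancestors are therefore kept as disjunctive, and DCA averages their IC instead of using only the IC of coinage.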

To calculate the similarity between copper and gold using intrinsic IC and MICA instead:

ssmpy.ssm.mica = True
ssmpy.ssm.intrinsic = True
e1 = ssmpy.get_id("copper")
e2 = ssmpy.get_id("gold")
ssmpy.ssm_resnik(e1,e2)
ssmpy.ssm_lin(e1,e2)
ssmpy.ssm_jiang_conrath(e1,e2)

Output (Resnik, Lin and Jiang-Conrath, respectively):

0.587786664902119
0.39079549108439265
0.35303485982596094
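Intrinsic IC ignores metals.txt and looks only at the ontology. The exact formula ssmpy uses is not stated here; as one well-known example, Seco et al. define IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of descendants of c and N is the number of concepts. The sketch below applies that formula to a hypothetical parent map assumed to mirror metals.owl, so leaves get IC 1 and the root gets IC 0.

```python
import math

# Hypothetical child -> parents map, assumed to mirror metals.owl.
PARENTS = {
    "metal": [],
    "precious": ["metal"],
    "coinage": ["metal"],
    "gold": ["precious", "coinage"],
    "silver": ["precious", "coinage"],
    "platinum": ["precious"],
    "palladium": ["precious"],
    "copper": ["coinage"],
}

# Invert the parent map into a parent -> children map.
CHILDREN = {term: [] for term in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def descendants(term):
    """All terms below `term`, each counted once (the term itself excluded)."""
    found = set()
    stack = [term]
    while stack:
        for child in CHILDREN[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def intrinsic_ic(term):
    """Seco et al.'s intrinsic IC: 1 - log(hypo + 1) / log(N)."""
    n_concepts = len(PARENTS)
    return 1 - math.log(len(descendants(term)) + 1) / math.log(n_concepts)


for term in PARENTS:
    print(f"{term:<10} {intrinsic_ic(term):.4f}")
```

Because gold and silver sit under both precious and coinage, the descendants are collected as a set so that each one is counted only once.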