Airtable - Table

eos7e3s

eos74km

eos8ub5

eos2db3

eos9gg2

eos3mk2

eos9p4a

eos39co

eos3wzy

eos3nn9

eos1pu1

eos39dp

eos6ru3

eos6ost

eos8aox

eos57bx

eos5guo

eos24ur

eos2401

eos5gge

eos7d58

eos694w

eos42ez

eos21q7

eos18ie

eos8bhe

eos5cl7

eos8aa5

eos3dq3

eos4djh

eos35g4

eos3ujl

eos9uqy

eos4f8y

eos30d7

eos69mr

eos8vud

eos9aqt

eos2fg2

eos5iy5

eos3nl8

eos1xje

eos1n4b

eos9ym3

eos30f3

eos5xng

eos69e6

eos4wt0

eos4x30

eos1ut3

eos9ivc

eos9zw0

eos633t

eos3kcw

eos1d7r

eos9ueu

eos4f95

eos2zmb

eos1noy

eos3le9

eos4rta

eos2l0q

eos3804

eos2hzy

eos8fma

eos1mxi

eos7yti

eos4qda

eos80ch

eos3ev6

eos7nno

eos5jz9

eos59rr

eos7kpb

eos2gw4

eos3cf4

eos3zur

eos9tyg

eos44zp

eos24jm

eos6aun

eos31ve

eos2fy6

eos2lqb

eos8fth

eos8lok

eos9yy1

eos22io

eos74bo

eos81ew

eos93h2

eos7qga

eos4avb

eos4cxk

eos8c0o

eos6hy3

eos5505

eos4se9

eos24ci

eos1086

eos5ecc

eos935d

eos4q1a

eos9taz

eos2rd8

eos9sa2

eos8a5g

eos238c

eos2v11

eos1579

eos6m4j

eos4zfy

eos2a9n

eos9c7k

eos7jlv

eos4b8j

eos3ae7

eos9be7

eos4tcc

eos5qfo

eos2mrz

eos2re5

eos30gr

eos526j

eos6pbf

eos2b6f

eos3xip

eos6o0z

eos85a3

eos8451

eos157v

eos481p

eos2mhp

eos6fza

eos5smc

eos9ei3

eos46ev

eos69p9

eos7a45

eos0t05

eos0t04

eos0t00

eos65rt

eos2hbd

eos97yu

eos6tg8

eos2gth

eos7pw8

eos8ioa

eos9yui

eos2r5a

eos6oli

eos6ao8

eos43at

eos1af5

eos2ta5

eos96ia

eos8d8a

eos7asg

eos2lm8

eos78ao

eos7jio

eos2thm

eos8a4x

eos8h6g

eos3b5e

eos5axz

eos3ae6

eos7w6n

eos4u6p

eos7a04

eos77w8

eos1amr

eos1vms

eos92sw

eos9f6t

eos0t03

eos0t02

eos0t01

eos4e40

Slug

Task

dili-pred

Archived

GitHub

Drug-induced liver injury prediction

Prediction of clinically relevant drug-induced-liver-injury (DILI), based solely on drug structure using binary classification methods. The authors collected a public dataset of 475 molecules with associated DILI outcomes, and built a model with an accuracy of 0.89. The model checkpoints have not been provided so Ersilia has used the provided data to retrain the model.

Retrained

Classification

Compound

Single

Probability

Float

Single

Probability that a drug causes DILI

Metabolism

Toxicity

https://github.com/ersilia-os/eos7e3s

https://pubmed.ncbi.nlm.nih.gov/30325042/

https://github.com/cptbern/QSAR_DILI_2019

None

leilayesufu

2/1/2024

https://github.com/leilayesufu

https://hub.docker.com/r/ersiliaos/eos7e3s

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7e3s.zip

2024

antimicrobial-kg-ml

Ready

GitHub

Antimicrobial class specificity prediction

Prediction of antimicrobial class specificity using simple machine learning methods applied to an antimicrobial knowledge graph. The knowledge graph is built on ChEMBL, Co-ADD and SPARK. Endpoints are broad terms such as activity against gram-positive or gram-negative bacteria. The best model according to the authors is a Random Forest with MHFP6 fingerprints.

Pretrained

Annotation

Compound

Single

Score

Float

List

Class probabilities for each antimicrobial class

Antimicrobial activity

https://github.com/ersilia-os/eos74km

https://www.biorxiv.org/content/10.1101/2024.12.02.626313v1.full

https://github.com/IMI-COMBINE/broad_spectrum_prediction

MIT

miquelduranfrigola

17/12/2024

https://github.com/miquelduranfrigola

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos74km.zip

Local

2024

chemical-space-projections-coconut

Ready

GitHub

Projections against Coconut

This tool performs PCA, UMAP and tSNE projections taking the Coconut natural products database as a chemical space of reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned.

In-house

Representation

Compound

Single

Value

Float

List

Coordinates of 2D projections, namely PCA, UMAP and tSNE.

Embedding

https://github.com/ersilia-os/eos8ub5

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00478-9

https://github.com/ersilia-os/compound-embedding

GPL-3.0-or-later

miquelduranfrigola

10/11/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8ub5

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8ub5.zip

Local

2024

chemical-space-projections-chemdiv

Ready

GitHub

Chemical space 2D projections against ChemDiv

This tool performs PCA, UMAP and tSNE projections taking a 100k ChemDiv diversity set as a chemical space of reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned.

In-house

Representation

Compound

Single

Value

Float

List

Coordinates of 2D projections, namely PCA, UMAP and tSNE.

Embedding

https://github.com/ersilia-os/eos2db3

https://www.chemdiv.com/catalog/diversity-libraries/representative-diversity-libraries-out-of-1-6m-stock/

https://github.com/ersilia-os/compound-embedding

GPL-3.0-or-later

miquelduranfrigola

9/11/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2db3

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2db3.zip

Local

2024

chemical-space-projections-drugbank

Ready

GitHub

Chemical space 2D projections against DrugBank

This tool performs PCA, UMAP and tSNE projections taking the DrugBank chemical space as a reference. The Ersilia Compound Embeddings are used as descriptors. Four PCA components and two UMAP and tSNE components are returned.

In-house

Representation

Compound

Single

Value

Float

List

Coordinates of 2D projections, namely PCA, UMAP and tSNE.

Embedding

https://github.com/ersilia-os/eos9gg2

https://academic.oup.com/nar/article/52/D1/D1265/7416367

https://github.com/ersilia-os/compound-embedding

GPL-3.0-or-later

miquelduranfrigola

9/11/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9gg2

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9gg2.zip

Local

2024

bbbp-marine-kinase-inhibitors

Ready

GitHub

BBBP model tested on marine-derived kinase inhibitors

A set of three binary classifiers (random forest, gradient boosting classifier, and logistic regression) to predict the Blood-Brain Barrier (BBB) permeability of small organic compounds. The best models were applied to natural products of marine origin, able to inhibit kinases associated with neurodegenerative disorders. The training set size was around 300 compounds.

Retrained

Annotation

Compound

Single

Score

Float

List

Classification score over three classifiers, namely random forest (rfc), gradient boosting classifier (gbc), and logistic regression (logreg).

Drug-likeness

Permeability

https://github.com/ersilia-os/eos3mk2

https://pubmed.ncbi.nlm.nih.gov/30699889/

https://github.com/plissonf/BBB-Models

MIT

miquelduranfrigola

23/10/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3mk2

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3mk2.zip

Local

2024

deep-dl

Ready

GitHub

Drug-likeness scoring based on unsupervised learning

This model evaluates drug-likeness using an unsupervised learning approach, eliminating the need for labeled data and avoiding biases from incomplete negative sets. It extracts features directly from known drug molecules, identifying common characteristics through a recurrent neural network (RNN) language model. By representing molecules as SMILES strings, the model learns the probability distribution of known drugs and assesses new molecules based on their likelihood of appearing in this space.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Higher score indicates higher drug likeness

Drug-likeness

https://github.com/ersilia-os/eos9p4a

https://pubs.rsc.org/en/content/articlehtml/2022/sc/d1sc05248a

https://github.com/SeonghwanSeo/DeepDL

GPL-3.0-or-later

https://eos9p4a-izpny.ondigitalocean.app/

miquelduranfrigola

4/9/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9p4a

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9p4a.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos9p4a

Online

2024

unimol-representation

Ready

GitHub

Uni-Mol molecular representation

Uni-Mol offers a simple and effective SE(3) equivariant transformer architecture for pre-training molecular representations that capture 3D information. The model is trained on >200M conformations. The current model outputs a representation embedding.

Pretrained

Representation

Compound

Single

Value

Float

List

Uni-Mol representation embedding

Fingerprint

https://github.com/ersilia-os/eos39co

https://openreview.net/forum?id=6K2RM6wVqKu

https://github.com/deepmodeling/Uni-Mol

GPL-3.0-only

miquelduranfrigola

22/7/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos39co

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos39co.zip

Local

2024

qupkake

Ready

GitHub

Predict micro-pKa of organic molecules

QupKake is an innovative approach that combines graph neural network (GNN) models with semiempirical quantum mechanical (QM) features to forecast the micro-pKa values of organic molecules. QM has a significant role in both identifying reaction sites and predicting micro-pKa values. Precisely predicting micro-pKa values is vital for comprehending and adjusting the acidity and basicity of organic compounds. This has significant applications in drug discovery, materials science, and environmental c

Pretrained

Annotation

Compound

Single

Value

Float

List

Up to 10 pKa values for the molecule

pKa

https://github.com/ersilia-os/eos3wzy

https://doi.org/10.1021/acs.jctc.4c00328

https://github.com/hutchisonlab/QupKake

BSD-3-Clause

LauraGomezjurado

17/7/2024

https://github.com/LauraGomezjurado

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3wzy.zip

Local

2024

mpro-covid19

Ready

GitHub

Predict bioactivity against Main Protease of SARS-CoV-2

MProPred predicts the efficacy of compounds against the main protease of SARS-CoV-2, which is a promising drug target since it processes polyproteins of SARS-CoV-2. This model uses PaDEL-Descriptor to calculate molecular descriptors of compounds. It is based on a dataset of 758 compounds that have inhibition efficacy against the Main Protease, as published in peer-reviewed journals between January, 2020 and August, 2021. Input compounds are compared to compounds in the dataset to measure molecul

Pretrained

Annotation

Compound

Single

Value

Float

Single

Gives the pIC50 values for each compound to compare their bioactivity against the main protease

COVID19

https://github.com/ersilia-os/eos3nn9

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10289339/

https://github.com/Nadimfrds/Mpropred

MIT

HarmonySosa

1/7/2024

https://github.com/HarmonySosa

https://hub.docker.com/r/ersiliaos/eos3nn9

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3nn9.zip

Local

2024

cardiotox-dictrank

Ready

GitHub

Cardiotoxicity Classifier

Prediction of drug-induced cardiotoxicity as a binary classification of cardiotoxicity risk. The probability score depicts risk of the compound being cardiotoxic. Classification is based on the chemical data such as SMILES representations of compounds and a variety of descriptors such as Morgan fingerprints and Mordred physicochemical descriptors that describe the molecular structure of the drug interactions. Biological data is also used including gene expression and cellular paintings after dru

Retrained

Annotation

Compound

Single

Score

Float

Single

The model provides a probability score indicating the likelihood of a compound being cardiotoxic

Cardiotoxicity

DrugBank

https://github.com/ersilia-os/eos1pu1

https://doi.org/10.1021/acs.jcim.3c01834

https://github.com/srijitseal/DICTrank

None

kurysauce

29/6/2024

https://github.com/kurysauce

https://hub.docker.com/r/ersiliaos/eos1pu1

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1pu1.zip

Local

2024

phakinpro

Ready

GitHub

Pharmacokinetics Profiler (PhaKinPro)

Pharmacokinetics Profiler (PhaKinPro) predicts the pharmacokinetic (PK) properties of drug candidates. It has been built using a manually curated database of 10,000 compounds with information for 12 PK endpoints. Each model provides a multi-classifier output for a single endpoint, along with a confidence estimate of the prediction and whether the query molecule is within the applicability domain of the model.

Pretrained

Annotation

Compound

Single

Score

String

List

A list of several ADME predictions

Microsomal stability

ADME

Metabolism

Half-life

Permeability

https://github.com/ersilia-os/eos39dp

https://pubs.acs.org/doi/10.1021/acs.jmedchem.3c02446

https://github.com/molecularmodelinglab/PhaKinPro

MIT

sucksido

3/5/2024

https://github.com/sucksido

https://hub.docker.com/r/ersiliaos/eos39dp

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos39dp.zip

Local

2024

whales-qmug

Ready

GitHub

WHALES similarity search on 600k molecules from Q-Mug

Search Q-Mug based on WHALES descriptors. Q-Mug is a subset of 600k bioactive molecules from ChEMBL. Three conformers are given for each molecule. WHALES is a simple descriptor useful for scaffold hopping.

Pretrained

Sampling

Compound

Single

Compound

String

List

The top 100 most similar molecules are returned, based on WHALES descriptors. 3D conformer generation is done internally.

Similarity

https://github.com/ersilia-os/eos6ru3

https://link.springer.com/protocol/10.1007/978-1-0716-1209-5_2

https://github.com/ETHmodlab/scaffold_hopping_whales

GPL-3.0

miquelduranfrigola

22/4/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos6ru3

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6ru3.zip

Local

2024

reinvent4-libinvent

Ready

GitHub

REINVENT 4 LibInvent

REINVENT 4 LibInvent creates new molecules by appending R groups to a given input. If the input SMILES string contains specified attachment points, it is directly processed by LibInvent to generate new molecules. If no attachment points are given, the model tries to find potential attachment points and iterates through different combinations of these points. It passes each combination to LibInvent to generate new molecules.

Pretrained

Sampling

Compound

Single

Compound

String

List

Model generates up to 1000 similar molecules per input molecule.

Similarity

https://github.com/ersilia-os/eos6ost

https://chemrxiv.org/engage/chemrxiv/article-details/65463cafc573f893f1cae33a

https://github.com/MolecularAI/REINVENT4

Apache-2.0

ankitskvmdam

18/4/2024

https://github.com/ankitskvmdam

https://hub.docker.com/r/ersiliaos/eos6ost

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6ost.zip

Local

2024

cc-signaturizer-3d

Ready

GitHub

Chemical Checker Signaturizer 3D

Building on the Chemical Checker bioactivity signatures (available as eos4u6p), the authors use the relation between stereoisomers and bioactivity of over 1M compounds to train stereochemically-aware signaturizers that better describe small molecule bioactivity properties. In this implementation we provide the A1, A2, A3, B1, B4 and C3 signatures.

Pretrained

Representation

Compound

Single

Value

Float

List

2D projection of bioactivity signatures

Descriptor

Bioactivity profile

Embedding

https://github.com/ersilia-os/eos8aox

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-024-00867-4

https://gitlabsbnb.irbbarcelona.org/packages/signaturizer3d

MIT

GemmaTuron

19/3/2024

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos8aox

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8aox.zip

Local

2024

reinvent4-mol2mol-scaffold

Ready

GitHub

REINVENT 4 Mol2MolScaffold

Mol2MolScaffold uses REINVENT4's mol2mol scaffold prior and mol2mol scaffold generic prior to generate around 500 new molecules similar to the provided molecules. The generated molecules will be relatively similar to the input molecules.

Pretrained

Sampling

Compound

Single

Compound

String

List

Model generates up to 500 similar molecules per input molecule.

Similarity

https://github.com/ersilia-os/eos57bx

https://chemrxiv.org/engage/chemrxiv/article-details/65463cafc573f893f1cae33a

https://github.com/MolecularAI/REINVENT4

Apache-2.0

ankitskvmdam

8/3/2024

https://github.com/ankitskvmdam

https://hub.docker.com/r/ersiliaos/eos57bx

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos57bx.zip

Local

2024

erg-fingerprints

Ready

GitHub

ErG 2D Descriptors

The Extended Reduced Graph (ErG) approach uses the description of pharmacophore nodes to encode molecular properties, with the goal of correctly describing pharmacophoric properties, size and shape of molecules. It was benchmarked against Daylight fingerprints and outperformed them in 10 out of 11 cases. ErG descriptors are well suited for scaffold hopping approaches.

Pretrained

Representation

Compound

Single

Value

Float

List

Vector representing ErG fingerprint values

Descriptor

Fingerprint

https://github.com/ersilia-os/eos5guo

https://pubs.acs.org/doi/10.1021/ci050457y

https://www.rdkit.org/docs/source/rdkit.Chem.rdReducedGraphs.html

BSD-3.0

GemmaTuron

6/3/2024

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos5guo

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5guo.zip

Local

2024

whales-scaled

Ready

GitHub

WHALES scaled

Scaled version of the WHALES descriptors (see eos3ae6). WHALES are holistic molecular descriptors useful for scaffold hopping, based on 3D structure to facilitate natural product featurization. The scaling uses sklearn's Robust Scaler trained on a random set of 100K molecules from ChEMBL.

Pretrained

Representation

Compound

Single

Value

Float

List

Scaled vector representation of a molecule

Natural product

Descriptor

https://github.com/ersilia-os/eos24ur

https://www.nature.com/articles/s42004-018-0043-x

https://github.com/grisoniFr/scaffold_hopping_whales

MIT

miquelduranfrigola

5/3/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos24ur

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos24ur.zip

Local

2024

scaffold-decoration

Ready

GitHub

Scaffold decoration

Sequential Attachment-based Fragment Embedding (SAFE) is a novel notation system that improves upon traditional molecular string representations like SMILES. SAFE reframes SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining compatibility with existing SMILES parsers. This streamlines complex molecular design tasks by facilitating autoregressive generation under various constraints. The effectiveness of SAFE is demonstrated by trai

Pretrained

Sampling

Compound

Single

Compound

String

List

Model generates up to 1000 new molecules from input molecule by replacing side chains of the scaffold

Compound generation

https://github.com/ersilia-os/eos2401

https://arxiv.org/pdf/2310.10773.pdf

https://github.com/datamol-io/safe/tree/main

Inyrkz

20/2/2024

https://github.com/Inyrkz

https://hub.docker.com/r/ersiliaos/eos2401

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2401.zip

Local

2024

dili-predictor

Ready

Early prediction of Drug-Induced Liver Injury

The DILI-Predictor predicts 10 features related to DILI toxicity, including in vivo, in vitro and physicochemical parameters. It has been developed by the Broad Institute using the DILIst dataset (1020 compounds) from the FDA and achieved a balanced accuracy of 70% on a test set of 255 compounds held out from the same dataset. The authors show how the model can correctly predict compounds that are not toxic in humans despite being toxic in mice.

Pretrained

Annotation

Compound

Single

Score

Float

List

Prediction of 10 DILI-related endpoints. The most important is the first, DILI. Threshold for DILI active is set at 0.16 by the authors.

Toxicity

Metabolism

https://github.com/ersilia-os/eos5gge

https://pubs.acs.org/doi/10.1021/acs.chemrestox.4c00015

https://github.com/Manas02/dili-pip

None

Zainab-ik

19/2/2024

https://github.com/Zainab-ik

https://hub.docker.com/r/ersiliaos/eos5gge

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5gge.zip

Local

2024

admet-ai-prediction

Ready

ADMET properties prediction

ADMET AI is a framework for carrying out fast batch predictions for ADMET properties. It is based on an ensemble of five Chemprop-RDKit models and has been trained on 41 tasks from the ADMET group in Therapeutics Data Commons (v0.4.1). Out of these 41 tasks, there are 31 classification tasks and 10 regression tasks. In addition, the output contains 8 physicochemical properties, namely, molecular weight, logP, hydrogen bond acceptors, hydrogen bond donors, Lipinski's Rule of 5, QED, stereo c

Pretrained

Annotation

Compound

Single

Score

Value

Float

List

ADMET outcomes, including physicochemical properties and classification tasks, as well as percentile normalizations based on the DrugBank chemical space.

ADME

Toxicity

https://github.com/ersilia-os/eos7d58

https://academic.oup.com/bioinformatics/article/40/7/btae416/7698030

https://github.com/swansonk14/admet_ai

MIT

https://eos7d58-awe6b.ondigitalocean.app/

DhanshreeA

7/2/2024

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos7d58

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7d58.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos7d58

Yes

Local

2024

reinvent4-mol2mol-medium-similarity

Ready

REINVENT 4 Mol2MolMediumSimilarity

The Mol2MolMediumSimilarity leverages REINVENT4's mol2mol medium similarity prior to generate up to 100 unique molecules. The generated molecules will be relatively similar to the input molecule.

Pretrained

Sampling

Compound

Single

Compound

String

List

Model generates up to 100 similar molecules per input molecule.

Similarity

https://github.com/ersilia-os/eos694w

https://chemrxiv.org/engage/chemrxiv/article-details/65463cafc573f893f1cae33a

https://github.com/MolecularAI/REINVENT4

Apache-2.0

ankitskvmdam

7/2/2024

https://github.com/ankitskvmdam

https://hub.docker.com/r/ersiliaos/eos694w

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos694w.zip

Local

2024

antibiotics-ai-cytotox

Ready

Human cytotoxicity endpoints

The authors tested the dataset of 39312 compounds used to train the antibiotics-ai model (eos18ie) against several cytotoxicity endpoints: human liver carcinoma cells (HepG2), human primary skeletal muscle cells (HSkMCs) and human lung fibroblast cells (IMR-90). Cellular viability was measured after 2–3 days of treatment with each compound at 10 μM and activities were binarized using a 90% cell viability cut-off. 341 (8.5%), 490 (3.8%) and 447 (8.8%) compounds classified as cytotoxic for HepG2

Pretrained

Annotation

Compound

Single

Score

Float

List

Predicting cytotoxicity in human liver carcinoma cells (HepG2), human primary skeletal muscle cells (HSkMCs) and human lung fibroblast cells (IMR-90)

Cytotoxicity

https://github.com/ersilia-os/eos42ez

https://www.nature.com/articles/s41586-023-06887-8

https://github.com/felixjwong/antibioticsai

MIT

Richiio

5/2/2024

https://github.com/Richiio

https://hub.docker.com/r/ersiliaos/eos42ez

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos42ez.zip

Yes

Local

2024

inter-dili

Ready

InterDILI: drug-induced injury prediction

This model has been trained on a publicly available collection of 5 datasets manually curated for drug-induced-liver-injury (DILI). DILI outcome has been binarised, and ECFP descriptors, together with physicochemical properties, have been used to train a random forest classifier which achieves AUROC > 0.9.

Retrained

Annotation

Compound

Single

Score

Float

Single

Probability of Drug-Induced Liver Injury (DILI); a higher score indicates higher risk

Toxicity

Human

Metabolism

https://github.com/ersilia-os/eos21q7

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00796-8

https://github.com/bmil-jnu/InterDILI

None

leilayesufu

30/1/2024

https://github.com/leilayesufu

https://hub.docker.com/r/ersiliaos/eos21q7

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos21q7.zip

Local

2024

antibiotics-ai-saureus

Ready

Antibiotic activity prediction against Staphylococcus aureus

The authors use a mid-size dataset (more than 30k compounds) to train an explainable graph-based model to identify potential antibiotics with low cytotoxicity. The model uses a substructure-based approach to explore the chemical space. Using this method, they were able to screen 283 compounds and identify a candidate active against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability of growth inhibition (80% cut-off at 50 μM)

Antimicrobial activity

ESKAPE

https://github.com/ersilia-os/eos18ie

https://www.nature.com/articles/s41586-023-06887-8

https://github.com/felixjwong/antibioticsai

MIT

Richiio

26/1/2024

https://github.com/Richiio

https://hub.docker.com/r/ersiliaos/eos18ie

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos18ie.zip

Yes

Local

2024

scaffold-morphing

Ready

Scaffold morphing

Pretrained

Sampling

Compound

Single

Compound

String

List

Model generates new molecules from input molecule by replacing core structures of input molecule.

Compound generation

https://github.com/ersilia-os/eos8bhe

https://arxiv.org/pdf/2310.10773.pdf

https://github.com/datamol-io/safe/tree/main

Inyrkz

12/1/2024

https://github.com/Inyrkz

https://hub.docker.com/r/ersiliaos/eos8bhe

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8bhe.zip

Local

2024

ngonorrhoeae-inhibition

Ready

Growth Inhibitors of Neisseria gonorrhoeae

The authors curated a dataset of 282 compounds from ChEMBL, of which 160 (56.7%) were labeled as active N. gonorrhoeae inhibitor compounds. They used this dataset to build a naïve Bayesian model and used it to screen a commercial library. With this method, they identified and validated two hits. We have used the dataset to build a model using LazyQSAR with Ersilia Compound Embeddings as molecular descriptors. LazyQSAR is an AutoML Ersilia-developed library.

Retrained

Annotation

Compound

Single

Score

Float

Single

Probability of activity for the inhibition of the pathogen N. gonorrhoeae

Antimicrobial activity

ChEMBL

N.gonorrhoeae

https://github.com/ersilia-os/eos5cl7

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274436/

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

Richiio

3/1/2024

https://github.com/Richiio

https://hub.docker.com/r/ersiliaos/eos5cl7

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5cl7.zip

Yes

Local

2024

kgpgt-embedding

In progress

Knowledge-guided pre-trained graph transformer

Neural fingerprints (embeddings) based on a knowledge-guided graph transformer. This model represents a novel self-supervised learning framework for the representation learning of molecular graphs, consisting of a novel graph transformer architecture, LiGhT, and a knowledge-guided pre-training strategy.

Pretrained

Representation

Compound

Single

Value

Float

List

Knowledge-driven embedding

Descriptor

https://github.com/ersilia-os/eos8aa5

https://www.nature.com/articles/s41467-023-43214-1

https://github.com/lihan97/KPGT

Apache-2.0

miquelduranfrigola

17/12/2024

https://github.com/miquelduranfrigola

Local

2024

mole-embeddings

In progress

MolE molecular embeddings

Representation

Compound

Single

https://github.com/ersilia-os/eos3dq3

https://www.nature.com/articles/s41467-024-53751-y

https://github.com/recursionpharma/mole_public

miquelduranfrigola

18/11/2024

https://github.com/miquelduranfrigola

Local

2024

datamol-basic-descriptors

In progress

Basic molecular descriptors from Datamol

Basic molecular descriptors calculated with the Datamol package, including molecular weight, lipophilicity (cLogP), hydrogen bond donors, hydrogen bond acceptors, etc. These descriptors are generally useful to annotate small molecule libraries. They are not recommended for QSAR modeling since they are probably too simple for most scenarios.

Pretrained

Representation

Compound

Single

Value

Float

List

Basic molecular descriptors. Some descriptors are floats and some are counts.

Descriptor

https://github.com/ersilia-os/eos4djh

https://github.com/datamol-io/datamol

https://docs.datamol.io/0.7.4/api/datamol.descriptors.html

Apache-2.0

miquelduranfrigola

9/11/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos4djh

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4djh.zip

Local

2024

drug-metabolites

In progress

Drug metabolites prediction

https://github.com/ersilia-os/eos35g4

https://pubs.rsc.org/en/content/articlelanding/2020/sc/d0sc02639e

https://github.com/KavrakiLab/MetaTrans

miquelduranfrigola

24/10/2024

https://github.com/miquelduranfrigola

Local

2024

mtb-permeability

In progress

Mtb cell wall permeability

This model predicts the probability of a compound passing the Mycobacterium tuberculosis cell wall membrane. The classifier (permeable vs not permeable) was trained on a dataset of 5368 molecules. It is a simple classifier (SVC) using Mordred descriptors.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability score of a compound passing the Mtb cell wall membrane

M.tuberculosis

Permeability

https://github.com/ersilia-os/eos3ujl

https://link.springer.com/article/10.1007/s11030-024-10952-3

https://github.com/PGlab-NIPER/MTB_Permeability

GPL-3.0-or-later

miquelduranfrigola

16/10/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3ujl

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3ujl.zip

Local

2024

cheese-sampler

In progress

CHEESE similarity search with multiple similarity measures and against various databases

CHEESE is a chemical embeddings search engine based on approximate nearest neighbors. It supports multiple similarity measures and can search against various databases, including ENAMINE REAL, ZINC, and others. Among the similarity measures, CHEESE supports the classical Morgan fingerprints as well as 3D shape and electrostatics similarities. The search engine is available online. This model from the Ersilia Model Hub is intended to be used as a sampler for the CHEESE search engine, where the user

Online

Sampling

Compound

Single

Compound

String

List

A list of up to 100 similar compounds to the input compound.

Similarity

https://github.com/ersilia-os/eos9uqy

https://chemrxiv.org/engage/chemrxiv/article-details/67250915f9980725cfcd1f6f

https://cheese.deepmedchem.com/

GPL-3.0-or-later

miquelduranfrigola

19/8/2024

https://github.com/miquelduranfrigola

Local

2024

one-molecule-mollib

In progress

One-molecule MolLib

MolLib is a low-resource generative model trained on ChEMBL data. It is able to generate drug-like and natural-product-like compounds. In this implementation, given an initial molecule, we first sample similar compounds and then we train the generator.

Pretrained

Sampling

Compound

Single

Compound

String

List

Compounds generated by mollib around the chemical space of the input compound

Similarity

https://github.com/ersilia-os/eos4f8y

https://www.nature.com/articles/s42256-020-0160-y

https://github.com/ETHmodlab/virtual_libraries

GPL-3.0-only

miquelduranfrigola

23/7/2024

https://github.com/miquelduranfrigola

Local

2024

unit-testing-compounds

In progress

Unit Testing Compounds Ersilia Pack

https://github.com/ersilia-os/eos30d7

https://ersilia.io

https://github.com/ersilia-os/ersilia

DhanshreeA

15/7/2024

https://github.com/DhanshreeA

Local

2024

reinvent4-linkinvent

In progress

REINVENT 4 LinkInvent

https://github.com/ersilia-os/eos69mr

https://chemrxiv.org/engage/chemrxiv/article-details/65463cafc573f893f1cae33a

https://github.com/MolecularAI/REINVENT4

ankitskvmdam

19/5/2024

https://github.com/ankitskvmdam

Local

2024

squid

In progress

SQUID 3D shape generation

Equivariant shape-conditioned generation of 3D molecules for ligand-based drug design. SQUID can generate chemically diverse molecules for arbitrary molecular shapes. Shape is defined by the input molecule.

Pretrained

Sampling

Compound

Single

Compound

String

List

Molecules matching the 3D shape of the input compound are suggested

Compound generation

https://github.com/ersilia-os/eos8vud

https://arxiv.org/abs/2210.04893

https://github.com/keiradams/SQUID

MIT

miquelduranfrigola

1/5/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8vud

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8vud.zip

Local

2024

delfta-qm

In progress

DelFTa quantum mechanical properties prediction

https://github.com/ersilia-os/eos9aqt

https://pubs.rsc.org/en/content/articlehtml/2022/cp/d2cp00834c

https://github.com/josejimenezluna/delfta

miquelduranfrigola

24/4/2024

https://github.com/miquelduranfrigola

Local

2024

opt-admet

To do

ADMET Properties Optimization

https://github.com/ersilia-os/eos2fg2

https://www.nature.com/articles/s41596-023-00942-4#code-availability

https://github.com/antwiser/OptADMET

Zainab-ik

13/2/2024

https://github.com/Zainab-ik

Local

2024

unit-test-compounds

Test

Unit test model for compounds

Given a SMILES string, the model counts the number of characters and other string metrics. This model is only for unit testing; it is not intended to be used in a real-world scenario. The Ersilia codebase will rely heavily on this model repository to test various functionalities of the CLI, such as fetching from GitHub and DockerHub, and the repository will function as the model fixture for Ersilia's integration tests.

In-house

Regression

Compound

Single

Value

Integer

List

Simple count of characters in a SMILES string

Fingerprint

https://github.com/ersilia-os/eos5iy5

https://ersilia.io

https://github.com/ersilia-os/ersilia

GPL-3.0

miquelduranfrigola

3/7/2024

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos5iy5

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5iy5.zip

Local

2024

covid-19-drug-repurposing

Archived

DRKG_COVID19

https://github.com/ersilia-os/eos3nl8

https://arxiv.org/abs/2007.10261v1

https://github.com/gnn4dr/DRKG

Inyrkz

5/12/2023

https://github.com/Inyrkz

2023

biogpt-embeddings

Archived

BioGPT embeddings

BioGPT is a pre-trained transformer for biomedical text. This domain-specific model has been trained on large-scale biomedical literature. In this implementation, we use BioGPT to generate numerical embeddings for bioassay and other biomedical texts.

Pretrained

Representation

Text

Single

Descriptor

Float

List

Biomedical text embedding

Embedding

Biomedical text

https://github.com/ersilia-os/eos1xje

https://academic.oup.com/bib/article/23/6/bbac409/6713511?guestAccessKey=a66d9b5d-4f83-4017-bb52-405815c907b9&login=false

https://github.com/microsoft/biogpt

MIT

miquelduranfrigola

30/8/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos1xje

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1xje.zip

Local

2023

hdac3-inhibition

Ready

Identifying HDAC3 inhibitors

The model predicts the inhibitory potential of small molecules against Histone deacetylase 3 (HDAC3), a relevant human target for cancer, inflammation, neurodegenerative diseases and diabetes. The authors have used a dataset of 1098 compounds from ChEMBL and validated the model using the benchmark MUBD-HDAC3.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability that the molecule is an HDAC3 inhibitor

Cancer

ChEMBL

https://github.com/ersilia-os/eos1n4b

https://onlinelibrary.wiley.com/doi/10.1002/minf.202000105

https://github.com/jwxia2014/HDAC3i-Finder

GPL-3.0

Richiio

14/12/2023

https://github.com/Richiio

https://hub.docker.com/r/ersiliaos/eos1n4b

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1n4b.zip

Local

2023

mrlogp

Ready

MRlogP: neural network-based logP prediction for druglike small molecules

The authors use a two-step approach to build a model that accurately predicts the lipophilicity (LogP) of small molecules. First, they train the model on a large number of low-accuracy predicted LogP values, and then they fine-tune the network using a small, accurate dataset of 244 druglike compounds. The model achieves an average root mean squared error of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively.

Pretrained

Annotation

Compound

Single

Value

Float

Single

Predicted LogP of small molecules

Lipophilicity

LogP

https://github.com/ersilia-os/eos9ym3

https://www.mdpi.com/2227-9717/9/11/2029/htm

https://github.com/JustinYKC/MRlogP

MIT

leilayesufu

12/12/2023

https://github.com/leilayesufu

https://hub.docker.com/r/ersiliaos/eos9ym3

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9ym3.zip

Local

2023

dmpnn-herg

Ready

Prediction of hERG channel blockers with directed message passing neural networks

This model leverages the ChemProp network (D-MPNN) to build a predictor of hERG-mediated cardiotoxicity. The model has been trained using a published dataset which contains 7889 molecules with several cut-offs for hERG blocking activity. The authors select a 10 uM cut-off. This implementation of the model does not use any specific featurizer, though the authors suggest the moe206 descriptors (closed-source) improve performance even further.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability of blocking hERG (cut-off: 10uM)

Cardiotoxicity

hERG

Toxicity

Descriptor

https://github.com/ersilia-os/eos30f3

https://pubs.rsc.org/en/content/articlehtml/2022/ra/d1ra07956e

https://github.com/AI-amateur/DMPNN-hERG

None

leilayesufu

4/12/2023

https://github.com/leilayesufu

https://hub.docker.com/r/ersiliaos/eos30f3

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos30f3.zip

Local

2023

chemprop-burkholderia

Ready

Burkholderia cenocepacia inhibition

Prediction of antimicrobial potential using a dataset of 29537 compounds screened against the antibiotic-resistant pathogen Burkholderia cenocepacia. The model uses the Chemprop Directed Message Passing Neural Network (D-MPNN) and has an AUC score of 0.823 on the test set. It has been used to virtually screen FDA-approved drugs as well as a natural product collection (>200k compounds), with hit rates of 26% and 12%, respectively.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability that a compound inhibits the drug-resistant bacterium Burkholderia cenocepacia. Scores range from 0 to 1, with 1 indicating the highest probability of growth inhibitory activity.

Antimicrobial activity

https://github.com/ersilia-os/eos5xng

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9624395/

https://github.com/cardonalab/Prediction-of-ATB-Activity

GPL-3.0

Richioo

3/12/2023

https://github.com/Richioo

https://hub.docker.com/r/ersiliaos/eos5xng

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5xng.zip

Yes

Local

2023

pgmg-pharmacophore

Ready

Pharmacophore-guided molecular generation

Based on a molecule's pharmacophore, this model generates new molecules de novo to match the pharmacophore. Internally, pharmacophore hypotheses are generated for a given ligand. A graph neural network encodes spatially distributed chemical features and a transformer decoder generates molecules.

Pretrained

Sampling

Compound

Single

Compound

String

List

The model generates new molecules from the input molecule by first creating pharmacophore hypotheses and then constraining generation to them.

Chemical graph model

Compound generation

https://github.com/ersilia-os/eos69e6

https://www.nature.com/articles/s41467-023-41454-9

https://github.com/CSUBioGroup/PGMG

MIT

miquelduranfrigola

1/12/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos69e6

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos69e6.zip

Local

2023

morgan-binary-fps

Ready

Morgan fingerprints in binary form (radius 3, 2048 dimensions)

Morgan fingerprints are one of the most widely used molecular representations. They are circular representations (starting from each atom, the neighboring atoms within a radius n are explored) and can have thousands of features. This implementation uses the RDKit package with radius 3 and 2048 dimensions, providing a binary vector as output. For Morgan counts, see model eos5axz.

Pretrained

Representation

Compound

Single

Value

Integer

List

Binary vector representing the SMILES

Descriptor

Fingerprint

https://github.com/ersilia-os/eos4wt0

https://pubmed.ncbi.nlm.nih.gov/20426451/

https://www.rdkit.org/docs

BSD-3.0

GemmaTuron

1/12/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos4wt0

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4wt0.zip

Local

2023

pmapper-3d

Ready

3D pharmacophore descriptor

The pharmacophore mapper (pmapper) identifies common 3D pharmacophores of active compounds against a specific target and uniquely encodes them with hashes suitable for fast identification of identical pharmacophores. The resulting signatures are amenable to downstream ML tasks.

Pretrained

Representation

Compound

Single

Value

Integer

List

Vector representation of pharmacophores

Descriptor

Fingerprint

https://github.com/ersilia-os/eos4x30

https://www.mdpi.com/1422-0067/20/23/5834

https://github.com/DrrDom/pmapper

BSD-3.0

GemmaTuron

28/11/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos4x30

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4x30.zip

Local

2023

molfeat-usrcat

Ready

USR descriptors with pharmacophoric constraints

USRCAT is a real-time ultrafast molecular shape recognition method with pharmacophoric constraints. It integrates atom types into the traditional Ultrafast Shape Recognition (USR) descriptor to improve the performance of shape-based virtual screening, making it possible to discriminate between compounds with similar shape but distinct pharmacophoric features.

Pretrained

Representation

Compound

Single

Value

Float

List

60 features based on USRCAT

Descriptor

Embedding

https://github.com/ersilia-os/eos1ut3

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-4-27

https://molfeat.datamol.io/featurizers/usrcat

Apache-2.0

GemmaTuron

28/11/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos1ut3

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1ut3.zip

Local

2023

antitb-seattle

Ready

Antituberculosis activity prediction

Prediction of the activity of small molecules against Mycobacterium tuberculosis. This model has been developed by Ersilia thanks to the data provided by Seattle Children's (Dr. Tanya Parish's research group). In vitro activity against M. tuberculosis was measured in a single-point inhibition assay (10000 molecules), and selected compounds (259) were assayed in MIC50 and MIC90 assays. Cut-offs have been determined according to the researcher's guidance.

In-house

Classification

Compound

Single

Compound

Float

List

Probability of inhibition of M.tb in vitro in the MIC50, MIC90 and whole cell assays at cut-offs 10 and 20 uM and 50%, respectively

M.tuberculosis

Antimicrobial activity

MIC90

Tuberculosis

https://github.com/ersilia-os/eos9ivc

https://pubmed.ncbi.nlm.nih.gov/30650074/

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

24/11/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos9ivc

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9ivc.zip

Local

2023

molpmofit

Ready

Molecular Prediction Model Fine-Tuning (MolPMoFiT) encodings

Using self-supervised learning, the authors pre-trained a large model using one million unlabelled molecules from ChEMBL. This model can subsequently be fine-tuned for various QSAR tasks. Here, we provide the encodings for the molecular structures using the pre-trained model, not the fine-tuned QSAR models.

Pretrained

Representation

Compound

Single

Value

Float

List

An embedding is obtained for each SMILES string and represented as a matrix in which each row is the embedding vector of one SMILES character, with a dimension of 400. The pretrained model is loaded using the fastai library.

Descriptor

Embedding

https://github.com/ersilia-os/eos9zw0

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00430-x

https://github.com/XinhaoLi74/MolPMoFiT

GemmaTuron

6/11/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos9zw0

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9zw0.zip

Local

2023

moler-enamine-blocks

Ready

Extending molecular scaffolds with building blocks

MoLeR is a graph-based generative model that combines fragment-based and atom-by-atom generation of new molecules with scaffold-constrained optimization. It does not depend on generation history and therefore MoLeR is able to complete arbitrary scaffolds. The model has been trained on the GuacaMol dataset. Here we sample the 300k building blocks library from Enamine.

Pretrained

Sampling

Compound

Single

Compound

String

List

1000 new molecules are sampled for each input molecule, preserving its scaffold.

Chemical graph model

Compound generation

https://github.com/ersilia-os/eos633t

https://arxiv.org/abs/2103.03864

https://github.com/microsoft/molecule-generation

MIT

miquelduranfrigola

3/11/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos633t

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos633t.zip

Local

2023

small-world-wuxi

Ready

Small World Wuxi search

Small World is an index of chemical space containing more than 230B molecular substructures. Here we use the Small World API to post a query to the SmallWorld server. We sample 100 molecules within a distance of 10 specifically for the Wuxi map, not the entire SmallWorld domain. Please check other small-world models available in our hub.

Online

Sampling

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

https://github.com/ersilia-os/eos3kcw

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606195/

https://pypi.org/project/smallworld-api/

MIT

miquelduranfrigola

2/11/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3kcw

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3kcw.zip

Local

2023

small-world-zinc

Ready

Small World Zinc search

Small World is an index of chemical space containing more than 230B molecular substructures. Here we use the Small World API to post a query to the SmallWorld server. We sample 100 molecules within a distance of 10 specifically for the ZINC map, not the entire SmallWorld domain. Please check other small-world models available in our hub.

Online

Sampling

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

https://github.com/ersilia-os/eos1d7r

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606195/

https://pypi.org/project/smallworld-api/

MIT

miquelduranfrigola

2/11/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos1d7r

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1d7r.zip

Local

2023

small-world-enamine-real

Ready

Small World Enamine REAL search

Small World is an index of chemical space containing more than 230B molecular substructures. Here we use the Small World API to post a query to the SmallWorld server. We sample 100 molecules within a distance of 10 specifically for the Enamine REAL map, not the entire SmallWorld domain. Please check other small-world models available in our hub.

Online

Sampling

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

https://github.com/ersilia-os/eos9ueu

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3606195/

https://pypi.org/project/smallworld-api/

MIT

miquelduranfrigola

1/11/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9ueu

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9ueu.zip

Local

2023

mycetos

Ready

Inhibition of Eumycetoma from MycetOS

This model predicts the growth of the fungus M. mycetomatis, the causal agent of mycetoma, in the presence of small-molecule drugs. It has been developed using the data from MycetOS, an open source initiative aiming to find new patent-free drugs. The model has been trained using the LazyQSAR package (MorganBinaryClassifier) from Ersilia.

In-house

Annotation

Compound

Single

Score

Float

Single

Probability of inhibition of M. mycetomatis (growth assay, cut-off at 20% growth)

Mycetoma

Antifungal activity

https://github.com/ersilia-os/eos4f95

https://www.ijidonline.com/article/S1201-9712(20)31735-5/fulltext

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

27/9/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos4f95

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4f95.zip

Local

2023

hdac1-inhibition

Ready

Inhibition of HDAC1

Prediction of the inhibition of human histone deacetylase 1 (HDAC1) to revert HIV latency. The dataset is composed of all available pIC50 values from ChEMBL target 325, and the model has been developed using Ersilia's LazyQSAR package (MorganBinaryClassifier).

In-house

Annotation

Compound

Single

Score

Float

List

Probability of inhibition of HDAC1 at cut-offs pIC50 7 (0.1uM) and 8 (10nM)

HIV

Human

HDAC1

https://github.com/ersilia-os/eos2zmb

https://www.ebi.ac.uk/chembl/target_report_card/CHEMBL325/

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

27/9/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos2zmb

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2zmb.zip

Local

2023
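The HDAC1 entry above states its activity cut-offs both as pIC50 values and as concentrations (7 as 0.1 uM, 8 as 10 nM). As a quick check of that correspondence, the sketch below applies the standard definition pIC50 = -log10(IC50 in mol/L). The function names are illustrative only and are not taken from the Ersilia or LazyQSAR code.

```python
import math


def pic50_to_molar(pic50: float) -> float:
    """Convert a pIC50 value to an IC50 in mol/L, using pIC50 = -log10(IC50)."""
    return 10.0 ** (-pic50)


def molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50. The concentration must be positive."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


# The two cut-offs quoted in the entry, expressed in the units used there.
cutoff_7_um = pic50_to_molar(7) * 1e6  # micromolar
cutoff_8_nm = pic50_to_molar(8) * 1e9  # nanomolar
```

Under this definition a higher pIC50 means a more potent compound, so the pIC50 8 cut-off is the stricter of the two.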

chembl-sampler

Ready

ChEMBL Molecular Sampler

A simple sampler of the ChEMBL database using their API. It looks for similar molecules to the input molecule and returns a list of 100 molecules by default. This model has been developed by Ersilia. It posts queries to an online server.

Pretrained

Sampling

Compound

Single

Compound

String

List

100 nearest molecules in ChEMBL

Similarity

https://github.com/ersilia-os/eos1noy

https://academic.oup.com/nar/article/40/D1/D1100/2903401

https://github.com/ersilia-os/chem-sampler/blob/main/chemsampler/samplers/chembl/sampler.py

GPL-3.0

GemmaTuron

4/9/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos1noy

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1noy.zip

Local

2023

hepg2-mmv

Ready

HepG2 Toxicity - MMV

This model predicts the toxicity of small molecules in HepG2 cells. It has been developed by Ersilia thanks to data provided by MMV. We have used two cut-offs to define activity (5 and 10 uM) with a dataset of 1335 molecules. 5-fold cross-validation showed AUROCs of 0.8 and 0.77, respectively.

In-house

Classification

Compound

Single

Probability

Float

List

Probability of toxicity in HepG2 cells. Cut-offs: 5 and 10 uM

Toxicity

Human

https://github.com/ersilia-os/eos3le9

https://ersilia.io

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

24/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos3le9

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3le9.zip

Local

2023

malaria-mmv

Ready

Antimalarial activity (MMV)

Prediction of the in vitro antimalarial potential of small molecules. This model has been developed by Ersilia thanks to experimental data provided by MMV. The model provides the probability of inhibition of the malaria parasite (NF54) measured both as percentage of inhibition (with luminescence and LDH) and IC50. 5-fold cross-validation shows AUROC > 0.75 for all models.

In-house

Classification

Compound

Single

Probability

Float

Single

Probability of inhibiting the malaria parasite (strain NF54) in IC50 (threshold 1uM) and percentage of inhibition (50%, measured by LDH and Lum)

Malaria

P.falciparum

IC50

https://github.com/ersilia-os/eos4rta

https://ersilia.io

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

24/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos4rta

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4rta.zip

Local

2023

schisto-swisstph

Ready

Anti-schistosomiasis activity

Prediction of the activity of small molecules against the Schistosoma parasite. This model has been developed by Ersilia thanks to the data provided by the Swiss TPH. In vitro activity against newly transformed schistosomula (NTS) and adult worms was measured (% of inhibition of activity and IC50, respectively).

In-house

Classification

Compound

Single

Probability

Float

List

The probabilities of the molecule being active against Schistosoma in the NTS stage (in a % of inhibition assay, with cut-offs of 70 and 90% inhibition at 10 uM) and in the adult stage (in an IC50 assay at cut-offs of 5 and 10 uM)

Neglected tropical disease

Schistosomiasis

IC50

https://github.com/ersilia-os/eos2l0q

https://pubmed.ncbi.nlm.nih.gov/30398059

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

24/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos2l0q

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2l0q.zip

Local

2023

chemprop-abaumannii

Ready

Inhibition of Acinetobacter baumannii growth

This model is a Chemprop neural network trained with a growth inhibition dataset. Authors screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. They discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii.

Pretrained

Annotation

Compound

Single

Score

Float

Single

Probability of growth inhibition of the bacterium A. baumannii (threshold > 80%)

A.baumannii

Antimicrobial activity

https://github.com/ersilia-os/eos3804

https://www.nature.com/articles/s41589-023-01349-8

https://github.com/GaryLiu152/chemprop_abaucin

None

https://eos3894-gz5nz.ondigitalocean.app/

miquelduranfrigola

23/8/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3804

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3804.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos3804

Yes

Online

2023

pubchem-sampler

Ready

PubChem Molecular Sampler

A simple sampler of the PubChem database using their API. It looks for similar molecules to the input molecule and returns a list of 100 molecules by default. This model has been developed by Ersilia and posts queries to an online server.

Pretrained

Similarity

Compound

Single

Compound

String

List

100 nearest molecules in PubChem

Similarity

https://github.com/ersilia-os/eos2hzy

https://academic.oup.com/nar/article/51/D1/D1373/6777787

https://github.com/ersilia-os/chem-sampler/blob/main/chemsampler/samplers/pubchem/sampler.py

GPL-3.0

GemmaTuron

10/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos2hzy

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2hzy.zip

Local

2023

stoned-sampler

Ready

Stoned Sampler

The STONED sampler uses small modifications to molecules represented as SELFIES to perform a search of the chemical space and generate new molecules. The use of string modifications in the SELFIES molecular representation bypasses the need for large amounts of data while maintaining a performance comparable to deep generative models.

Pretrained

Generative

Compound

Single

Compound

String

List

Up to 1000 derivatives of the input molecule

Compound generation

https://github.com/ersilia-os/eos8fma

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153210/

https://github.com/aspuru-guzik-group/stoned-selfies

Apache-2.0

GemmaTuron

8/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos8fma

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8fma.zip

Local

2023

smiles-pe

Ready

SmilesPE: tokenizer algorithm for SMILES, DeepSMILES, and SELFIES

The SMILES Pair Encoding method generates SMILES substring tokens based on high-frequency token pairs from large chemical datasets. This method is well suited to both QSAR modeling and generative models. The model provided here has been pretrained using ChEMBL.

Pretrained

Generative

Compound

Single

Compound

String

Flexible List

A data-driven tokenization method for SMILES-based deep learning models in cheminformatics, demonstrating high performance in molecular generation and QSAR prediction tasks compared to atom-level tokenization

Chemical language model

Chemical notation

ChEMBL

https://github.com/ersilia-os/eos1mxi

https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c01127

https://github.com/XinhaoLi74/SmilesPE

Apache-2.0

Richiio

2/8/2023

https://github.com/Richiio

https://hub.docker.com/r/ersiliaos/eos1mxi

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1mxi.zip

Local

2023

osm-series4

Ready

Antimalarial activity from OSM

This model predicts the antimalarial potential of small molecules in vitro. We have collected the data available from the Open Source Malaria Series 4 molecules and used two cut-offs to define activity, 1 uM and 2.5 uM. The training has been done with the LazyQSAR package (Morgan Binary Classifier) and shows an AUROC >0.8 in a 5-fold cross-validation on 20% of the data held out as test. These models have been used to generate new series 4 candidates by Ersilia.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of killing P.falciparum in vitro (IC50 < 1uM and 2.5uM, respectively)

Malaria

P.falciparum

IC50

https://github.com/ersilia-os/eos7yti

https://pubs.acs.org/doi/10.1021/acscentsci.6b00086

https://github.com/ersilia-os/lazy-qsar

GPL-3.0

GemmaTuron

2/8/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos7yti

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7yti.zip

Local

2023

fasmifra

Ready

FasmiFra molecule generator

FasmiFra is a molecular generator based on (deep)SMILES fragments. The authors use DeepSMILES to ensure the generated molecules are syntactically valid, and by working with string operations they are able to obtain high performance (>340,000 molecules/s). Here, we use 100k compounds from ChEMBL to sample fragments. Only assembled molecules containing one of the fragments of the input molecule are retained.

Pretrained

Generative

Compound

Single

Compound

String

List

1000 generated molecules per input

Compound generation

https://github.com/ersilia-os/eos4qda

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00566-4

https://github.com/UnixJunkie/FASMIFRA

GPL-3.0

miquelduranfrigola

1/8/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos4qda

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4qda.zip

Local

2023

malaria-mam

Ready

Antimalarial activity for sexual stage and asexual blood stage (ABS)

Prediction of the antimalarial potential of small molecules using data from various chemical libraries that were screened against the asexual and sexual (gametocyte) stages of the parasite. Several compounds' molecular fingerprints were used to train machine learning models to recognize stage-specific active and inactive compounds.

Pretrained

Annotation

Compound

Single

Score

Float

List

Probability of inhibition of the malaria parasite growth

Malaria

P.falciparum

https://github.com/ersilia-os/eos80ch

https://pubs.acs.org/doi/10.1021/acsomega.3c05664

https://github.com/M2PL

GPL-3.0

GemmaTuron

10/7/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos80ch

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos80ch.zip

Yes

Local

2023

ncats-cyp3a4

Ready

CYP3A4 metabolism

Analysis of metabolic stability, determining the inhibition of CYP3A4 activity and whether the compounds are a substrate for the CYP3A4 enzyme. The data used to build these models have been made publicly available on PubChem (AID1645840, AID1645841, AID1645842) by ADME@NCATS.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of inhibiting the enzyme and probability of being a substrate of the enzyme. Activity in both indicates the compound is a ligand of the enzyme.

CYP450

ADME

Metabolism

https://github.com/ersilia-os/eos3ev6

https://dmd.aspetjournals.org/content/49/9/822

https://github.com/ncats/ncats-adme

None

ZakiaYahya

6/7/2023

https://github.com/ZakiaYahya

https://hub.docker.com/r/ersiliaos/eos3ev6

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3ev6.zip

Yes

Local

2023

ncats-cyp2d6

Ready

CYP2D6 metabolism

Analysis of metabolic stability, determining the inhibition of CYP2D6 activity and whether the compounds are a substrate for the CYP2D6 enzyme. The data used to build these models have been made publicly available on PubChem (AID1645840, AID1645841, AID1645842) by ADME@NCATS.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of inhibiting the enzyme and probability of being a substrate of the enzyme. Activity in both indicates the compound is a ligand of the enzyme.

CYP450

ADME

Metabolism

https://github.com/ersilia-os/eos7nno

https://dmd.aspetjournals.org/content/49/9/822

https://github.com/ncats/ncats-adme

None

ZakiaYahya

6/7/2023

https://github.com/ZakiaYahya

https://hub.docker.com/r/ersiliaos/eos7nno

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7nno.zip

Yes

Local

2023

ncats-cyp2c9

Ready

CYP2C9 metabolism

Analysis of metabolic stability, determining the inhibition of CYP2C9 activity and whether the compounds are a substrate for the CYP2C9 enzyme. The data used to build these models have been made publicly available on PubChem (AID1645840, AID1645841, AID1645842) by ADME@NCATS.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of inhibiting the enzyme and probability of being a substrate of the enzyme. Activity in both indicates the compound is a ligand of the enzyme.

CYP450

ADME

Metabolism

https://github.com/ersilia-os/eos5jz9

https://dmd.aspetjournals.org/content/49/9/822

https://github.com/ncats/ncats-adme

None

ZakiaYahya

5/7/2023

https://github.com/ZakiaYahya

https://hub.docker.com/r/ersiliaos/eos5jz9

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5jz9.zip

Yes

Local

2023

bidd-molmap-fingerprint

Ready

Molecular fingerprint maps based on broadly learned knowledge-based representations

Molecular representation of small molecules via fingerprint-based molecular maps (images). Typically, the goal is to use these images as inputs for an image-based deep learning model such as a convolutional neural network. The authors have demonstrated high performance of MolMap out-of-the-box with a broad range of tasks from MoleculeNet.

Pretrained

Representation

Compound

Single

Image

Descriptor

Float

List

Image representation of a molecule. Each pixel represents a molecular feature (37 rows, 36 columns, flattened with reshape)

Fingerprint

https://github.com/ersilia-os/eos59rr

https://www.nature.com/articles/s42256-021-00301-6

https://github.com/shenwanxiang/bidd-molmap

GPL-3.0

samuelmaina

3/7/2023

https://github.com/samuelmaina

https://hub.docker.com/r/ersiliaos/eos59rr

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos59rr.zip

Local

2023

h3d-virtual-screening-cascade-light

Ready

H3D virtual screening cascade light

This panel of models provides predictions for the H3D virtual screening cascade. It leverages the Ersilia Compound Embedding and FLAML. The H3D virtual screening cascade contains models for Mycobacterium tuberculosis and Plasmodium falciparum IC50 predictions, as well as ADME, cytotoxicity and solubility assays

In-house

Classification

Compound

Single

Probability

Float

List

The raw scores are those produced by the FLAML model. Scores with the suffix _perc represent the percentile, on a 0-1 scale, over a ChEMBL dataset of 200k compounds.

Malaria

P.falciparum

Tuberculosis

M.tuberculosis

ADME

Cytotoxicity

Solubility

https://github.com/ersilia-os/eos7kpb

https://www.nature.com/articles/s41467-023-41512-2

https://github.com/ersilia-os/h3d-screening-cascade-models

GPL-3.0

miquelduranfrigola

9/5/2023

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7kpb

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7kpb.zip

Yes

Local

2023

ersilia-compound-embedding

Ready

Ersilia Compound Embeddings

Bioactivity-aware chemical embeddings for small molecules. Using transfer learning, we have created a fast network that produces embeddings of 1024 features condensing physicochemical as well as bioactivity information. The network was trained using the FS-Mol and ChEMBL datasets, and Grover, Mordred and ECFP descriptors.

In-house

Representation

Compound

Single

Descriptor

Float

List

Embedding of 1024 features representing a compound

Descriptor

Embedding

https://github.com/ersilia-os/eos2gw4

https://www.nature.com/articles/s41467-023-41512-2

https://github.com/ersilia-os/compound-embedding

GPL-3.0

GemmaTuron

13/4/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos2gw4

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2gw4.zip

Local

2023

molfeat-chemgpt

Ready

ChemGPT-4.7

ChemGPT (4.7M params) is a language-based transformer model for generative molecular modeling, which was pretrained on the PubChem10M dataset. Pre-trained ChemGPT models are also robust, self-supervised representation learners that generalize to previously unseen regions of chemical space and enable embedding-based nearest-neighbor search.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

128 features based on a chemical language model

Descriptor

Chemical language model

Chemical graph model

Embedding

https://github.com/ersilia-os/eos3cf4

https://chemrxiv.org/engage/chemrxiv/article-details/627bddd544bdd532395fb4b5

https://molfeat.datamol.io/featurizers/ChemGPT-4.7M

Apache-2.0

GemmaTuron

11/4/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos3cf4

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3cf4.zip

Local

2023

molfeat-estate

Ready

Estate Molecular Descriptors

Electrotopological state (Estate) indices are numerical values computed for each atom in a molecule, and which encode information about both the topological environment of that atom and the electronic interactions due to all other atoms in the molecule

Pretrained

Representation

Compound

Single

Descriptor

Float

List

79 Electrotopological features

Fingerprint

Descriptor

https://github.com/ersilia-os/eos3zur

https://link.springer.com/article/10.1023/A:1015952613760

https://molfeat.datamol.io/featurizers/estate

Apache-2.0

GemmaTuron

11/4/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos3zur

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3zur.zip

Local

2023

ncats-pampa74

Ready

Parallel Artificial Membrane Permeability Assay (PAMPA) 7.4

Parallel Artificial Membrane Permeability is an in vitro surrogate to determine the permeability of drugs across cellular membranes. PAMPA at pH 7.4 was experimentally determined in a dataset of 5,473 unique compounds by the NIH-NCATS. 50% of the dataset was used to train a classifier (SVM) to predict the permeability of new compounds, and validated on the remaining 50% of the data, rendering an AUC = 0.88. The Peff was converted to logarithmic, log Peff value lower than 2.0 were considered to h

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound being poorly permeable (logPeff < 1)

ADME

Permeability

LogP

https://github.com/ersilia-os/eos9tyg

https://slas-discovery.org/article/S2472-5552(22)06765-X/fulltext

https://github.com/ncats/ncats-adme

None

pauline-banye

7/4/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos9tyg

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9tyg.zip

Yes

Local

2023

ncats-cyp450

Ready

CYP450 metabolism

Analysis of metabolic stability, determining the inhibition of CYP450 activity and whether the compounds are a substrate for the CYP450 enzymes. The data used to build these models are publicly available at PubChem (AID1645840, AID1645841, AID1645842). The tested CYPs include CYP2C9, CYP2D6 and CYP3A4.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of inhibiting the enzyme and probability of being a substrate of the enzyme. Activity in both indicates the compound is a ligand of the enzyme.

CYP450

ADME

Metabolism

https://github.com/ersilia-os/eos44zp

https://dmd.aspetjournals.org/content/49/9/822

https://github.com/ncats/ncats-adme

None

GemmaTuron

6/4/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos44zp

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos44zp.zip

Yes

Local

2023

qcrb-tb

Ready

QcrB Inhibition (M. tuberculosis)

The cytochrome bcc complex (QcrB) is a subunit of the mycobacterial cyt-bcc-aa3 oxidoreductase in the electron transport chain (ETC), and it has been suggested as a good M.tb target due to the bacteria's dependence on oxidative phosphorylation for growth. The authors use a dataset of 352 molecules, of which 277 are classified as active (QIM < 1 uM), 58 as moderately active (1 < QIM < 20 uM) and 78 as inactive (QIM > 20 uM). QIM refers to quantification of intracellular mycobacteria.

Pretrained

Classification

Compound

Single

Other value

Integer

Single

Class 1: active (QIM < 1 uM), Class 2: moderately active (1 < QIM < 20 uM), Class 3: inactive (QIM > 20 uM)

M.tuberculosis

Antimicrobial activity

https://github.com/ersilia-os/eos24jm

https://pubs.acs.org/doi/full/10.1021/acsomega.2c01613

https://github.com/CoutinhoLab/Q-TB/

GemmaTuron

6/4/2023

https://github.com/GemmaTuron

https://hub.docker.com/r/ersiliaos/eos24jm

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos24jm.zip

Yes

Local

2023

rxn-fingerprint

Ready

RXNFP - chemical reaction fingerprints

RXNFP uses a pre-trained BERT language model to transform a reaction represented as SMILES into a fingerprint amenable to downstream applications. The authors show how the RXN-fps can be used to identify nearest neighbors on reaction datasets, or map the reaction space without knowing the reaction centers.

Pretrained

Representation

Compound

Single

Descriptor

Float

Matrix

Fingerprint of the reaction.

Fingerprint

Embedding

Chemical synthesis

https://github.com/ersilia-os/eos6aun

https://www.nature.com/articles/s42256-020-00284-w

https://github.com/rxn4chemistry/rxnfp/tree/master/

MIT

samuelmaina

28/3/2023

https://github.com/samuelmaina

https://hub.docker.com/r/ersiliaos/eos6aun

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6aun.zip

Local

2023

ncats-hlm

Ready

Human Liver Microsomal Stability

The Human Liver Microsomal assay takes into account liver-mediated drug metabolism to assess the stability of a compound in the human body. The NIH-NCATS group took a proprietary dataset of 4300 compounds with their associated HLM (in vitro half-life; unstable ≤ 30 min, stable > 30 min) and used it to train a classifier.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound being unstable in a HLM assay (half-life ≤ 30min)

Metabolism

ADME

Human

Microsomal stability

Half-life

https://github.com/ersilia-os/eos31ve

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00426-7

https://github.com/ncats/ncats-adme/tree/master

None

pauline-banye

27/3/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos31ve

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos31ve.zip

Yes

Local

2023

s2dv-hepg2-toxicity

Ready

S2DV HepG2 toxicity

The model uses Word2Vec, a natural language processing technique to represent SMILES strings. The model was trained on over <2000 small molecules with associated experimental HepG2 cytotoxicity data (IC50) to classify compounds as HepG2 toxic (IC50 <= 30 uM) or non-toxic. Data was gathered from the public repository ChEMBL.

Pretrained

Classification

Compound

Single

Experimental value

Float

Single

Probability of HepG2 Toxicity (IC50 < 30 uM)

ChEMBL

IC50

Toxicity

https://github.com/ersilia-os/eos2fy6

https://pubmed.ncbi.nlm.nih.gov/35062019/

https://github.com/NTU-MedAI/S2DV

Apache-2.0

emmakodes

27/3/2023

https://github.com/emmakodes

https://hub.docker.com/r/ersiliaos/eos2fy6

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2fy6.zip

Local

2023

hob-pre

Ready

Human oral bioavailability prediction

HobPre predicts the oral bioavailability of small molecules in humans. It has been trained using public data on ~1200 molecules (Falcón-Cano et al, 2020, complemented with other literature and ChEMBL compounds). The molecules were labeled according to two cut-offs: HOB > 20% and HOB > 50%, due to ongoing discussions as to which would be a more appropriate cut-off.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of a compound having high oral bioavailability (HOB >20% and HOB >50%)

ADME

Solubility

Human

https://github.com/ersilia-os/eos2lqb

https://doi.org/10.1186/s13321-021-00580-6

https://github.com/whymin/HOB

None

HellenNamulinda

27/3/2023

https://github.com/HellenNamulinda

https://hub.docker.com/r/ersiliaos/eos2lqb

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2lqb.zip

Yes

Local

2023

redial-2020

Ready

SARS-CoV-2 antiviral prediction: REDIAL-2020

Predictor of several endpoints related to SARS-CoV-2. It provides predictions for Live Virus Infectivity, Viral Entry, Viral Replication, In Vitro Infectivity and Human Cell Toxicity using a combination of three models. Consensus results are obtained by averaging the predictions of the three models for each activity and toxicity endpoint. The models have been built using NCATS COVID19 data. Further details on result interpretation can be found here: https://drugcentral.org/Redial

Pretrained

Classification

Compound

Single

Probability

Float

Single

The model returns the probability of 1 (active) in each assay. Good drugs are active in CPE, 3CL and are inactive in cytotox, hCYTOX and ACE2 and/or are active in at least one of the following: AlphaLISA, CoV-PPE, MERS-PPE, while inactive in the counter screen, respectively: TruHit, CoV-PPE_cs, MERS-PPE_cs.

Sars-CoV-2

COVID19

Antiviral activity

https://github.com/ersilia-os/eos8fth

https://www.nature.com/articles/s42256-021-00335-w#Sec9

https://github.com/sirimullalab/redial-2020/tree/v1.0

MIT

Pradnya2203

27/3/2023

https://github.com/Pradnya2203

https://hub.docker.com/r/ersiliaos/eos8fth

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8fth.zip

Yes

Local

2023

s2dv-hbv

Ready

Inhibition of Hepatitis B virus

The model uses Word2Vec, a natural language processing technique to represent SMILES strings. The model was trained on over <4000 small molecules with associated experimental HBV inhibition data (IC50) to classify compounds as HBV inhibitors (IC50 <= 1 uM) or non-inhibitors. Data was gathered from the public repository ChEMBL.

Pretrained

Classification

Compound

Single

Experimental value

Float

Single

Probability of inhibition of HBV (IC50 < 1uM)

Antiviral activity

IC50

HBV

ChEMBL

https://github.com/ersilia-os/eos8lok

https://pubmed.ncbi.nlm.nih.gov/35062019/

https://github.com/NTU-MedAI/S2DV

Apache-2.0

emmakodes

24/3/2023

https://github.com/emmakodes

https://hub.docker.com/r/ersiliaos/eos8lok

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8lok.zip

Yes

Local

2023

ncats-hlcs

Ready

Human Liver Cytosolic Stability

The human liver cytosol stability model is used for predicting the stability of a drug in the cytosol of human liver cells, which is beneficial for identifying potential drug candidates early during the drug discovery process. If a drug compound is quickly absorbed, it may not reach the intended target in the body or become toxic. On the other hand, if a drug compound is too stable, it could accumulate and cause detrimental effects. The authors use an NCATS dataset of 1450 compounds screened in

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound being unstable (half-life ≤ 30min) due to liver cells metabolism

ADME

Metabolism

Human

Half-life

https://github.com/ersilia-os/eos9yy1

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00426-7

https://github.com/ncats/ncats-adme

None

pauline-banye

1/3/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos9yy1

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9yy1.zip

Yes

Local

2023

idl-ppbopt

Ready

Human Plasma Protein Binding (PPB) of Compounds

IDL-PPB aims to obtain the plasma protein binding (PPB) values of a compound. Based on an interpretable deep learning model and using the algorithm fingerprinting (AFP) this model predicts the binding affinity of the plasma protein with the compound.

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

This model receives SMILES as input and returns as output the fraction PPB, which measures the affinity of the binding of the plasma protein. In their analysis of results, the authors distinguish high affinity (fraction of PPB > 80%), medium affinity (40% <= fraction of PPB <= 80%) and low affinity (fraction of PPB < 40%). Note: inorganics and salts are outside the applicability domain of the model, so for these compounds the output is Null.

Fraction bound

ADME

https://github.com/ersilia-os/eos22io

https://pubs.acs.org/doi/10.1021/acs.jcim.2c00297

https://github.com/Louchaofeng/IDL-PPBopt

GPL-3.0

carcablop

3/2/2023

https://github.com/carcablop

https://hub.docker.com/r/ersiliaos/eos22io

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos22io.zip

Local

2023

ncats-solubility

Ready

Aqueous Kinetic Solubility

Kinetic aqueous solubility (μg/mL) was experimentally determined using the same SOP in over 200 NCATS drug discovery projects. A final dataset of 11780 non-redundant molecules and their associated solubility was used to train an SVM classifier. Approximately half of the dataset has poor solubility (< 10 μg/mL), and two-thirds of these poorly soluble molecules report values of < 1 μg/mL. A subset of the data used is available at PubChem (AID 1645848).

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound having poor solubility (< 10 µg/mL)

ADME

Solubility

https://github.com/ersilia-os/eos74bo

https://slas-discovery.org/article/S2472-5552(22)06765-X/fulltext

https://github.com/ncats/ncats-adme

None

pauline-banye

31/1/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos74bo

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos74bo.zip

Yes

Local

2023

ncats-pampa5

Ready

Parallel Artificial Membrane Permeability Assay 5

Parallel Artificial Membrane Permeability is an in vitro surrogate to determine the permeability of drugs across cellular membranes. PAMPA at pH 5 was experimentally determined in a dataset of 5,473 unique compounds by the NIH-NCATS. 50% of the dataset was used to train a classifier (SVM) to predict the permeability of new compounds, and validated on the remaining 50% of the data, rendering an AUC = 0.88. The Peff was converted to logarithmic, log Peff value lower than 2.0 were considered to hav

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound being poorly permeable (logPeff < 1)

ADME

Permeability

LogP

https://github.com/ersilia-os/eos81ew

https://www.sciencedirect.com/science/article/pii/S0968089621005964

https://github.com/ncats/ncats-adme

None

pauline-banye

29/1/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos81ew

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos81ew.zip

Yes

Local

2023

image-mol-gpcr

Ready

imagemol-gpcr

ImageMol is a Representation Learning Framework that utilizes molecule images for encoding molecular inputs as machine readable vectors for downstream tasks such as bio-activity prediction, drug metabolism analysis, or drug toxicity prediction. The approach utilizes transfer learning, that is, pre-training the model on massive unlabeled datasets to help it in generalizing feature extraction and then fine tuning on specific tasks. This model is fine tuned on 10 GPCR assays with the largest number

Pretrained

Regression

Compound

Single

Score

Float

Single

Binding activity prediction (as a regression task) for the following GPCR assays: 5HT1A, 5HT2A, AA1R, AA2AR, AA3R, CNR2, DRD2, DRD3, HRH3, OPRM

Target identification

GPCR

https://github.com/ersilia-os/eos93h2

https://www.nature.com/articles/s42256-022-00557-6

https://github.com/HongxinXiang/ImageMol

MIT

DhanshreeA

25/1/2023

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos93h2

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos93h2.zip

Local

2023

datamol-smiles2canonical

Ready

Converter of SMILES to canonical SMILES, SELFIES, InChI and InChIKey forms

Using the Datamol package, the model receives a SMILES string as input, then sanitizes and standardizes the molecule to generate four outputs: Canonical SMILES, SELFIES, InChI and InChIKey

Pretrained

Representation

Compound

Single

Compound

String

Matrix

Compound represented in its canonical SMILES, SELFIES, InChI and InChIKey forms

Chemical notation

https://github.com/ersilia-os/eos7qga

https://doc.datamol.io/stable/tutorials/Preprocessing.html

https://github.com/datamol-org/datamol

Apache-2.0

carcablop

25/1/2023

https://github.com/carcablop

https://hub.docker.com/r/ersiliaos/eos7qga

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7qga.zip

Local

2023

image-mol-embeddings

Ready

Molecular representation learning

Representation Learning Framework that utilizes molecule images for encoding molecular inputs as machine readable vectors for downstream tasks such as bio-activity prediction, drug metabolism analysis, or drug toxicity prediction. The approach utilizes transfer learning, that is, pre-training the model on massive unlabeled datasets to help it in generalizing feature extraction and then fine tuning on specific tasks.

Pretrained

Representation

Compound

Single

Descriptor

Float

Matrix

ImageMol embeddings of shape [1512] reshaped as a Numpy 1D array before serializing. These embeddings can be used as the input features of a fully connected classification or regression layer in a neural network.

Embedding

https://github.com/ersilia-os/eos4avb

https://www.nature.com/articles/s42256-022-00557-6

https://github.com/HongxinXiang/ImageMol

MIT

DhanshreeA

25/1/2023

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos4avb

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4avb.zip

Local

2023

sars-cov-2-antiviral-screen

Ready

SARS-CoV-2 antiviral screening

Pretrained

Classification

Compound

Single

Boolean

Integer

List

The output is comprised of binary classification across thirteen assays that are as follows: 3C-like enzymatic activity (3CL), ACE2 enzymatic activity (ACE2), Human Embryonic Kidney 293 Cell line toxicity (HEK293), Human fibroblast toxicity (Human), MERS Pseudotyped particle entry (MERS_PPE), MERS Pseudotyped particle entry counterscreen (MERS_PPE_cs), SarsCov Pseudotyped particle entry (Cov_PPE), SarsCov Pseudotyped particle entry counterscreen (Cov_PPE_cs), SarsCov2 cytopathic effect (COV2_CPE), SarsCov2 cytopathic effect counterscreen (COV2_Cytotox), Spike ACE2 Protein-protein interaction (AlphaLISA), Spike ACE2 Protein-protein interaction counterscreen (TruHit), Transmembrane protease serine 2 enzymatic activity (TMPRSS2)

Sars-CoV-2

Antiviral activity

COVID19

https://github.com/ersilia-os/eos4cxk

https://www.nature.com/articles/s42256-022-00557-6

https://github.com/HongxinXiang/ImageMol

MIT

DhanshreeA

25/1/2023

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos4cxk

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4cxk.zip

Yes

Local

2023

image-mol-bace

Ready

ImageMol human beta-secretase-1 (BACE-1) inhibition

This model has been developed using ImageMol, a deep learning model pretrained on 10 million unlabelled small molecules and fine-tuned in a second step to predict the binding of inhibitors to the human beta secretase 1 (BACE-1) protein. The BACE-1 dataset from MoleculeNet contains 1522 compounds with their associated pIC50. A compound with pIC50 >= 7 is considered a BACE-1 inhibitor.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of BACE-1 inhibition (>0.5: Inhibitor). Compounds with pIC50 >= 7 are considered BACE-1 inhibitors

BACE

Chemical graph model

MoleculeNet

https://github.com/ersilia-os/eos8c0o

https://www.nature.com/articles/s42256-022-00557-6

https://github.com/ChengF-Lab/ImageMol

MIT

DhanshreeA

17/1/2023

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos8c0o

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8c0o.zip

Local

2023

image-mol-hiv

Ready

ImageMol HIV growth inhibition

This model has been developed using ImageMol, a deep learning model pretrained on 10 million unlabelled small molecules and fine-tuned in a second step to predict the inhibition of the human immunodeficiency virus (HIV). The HIV dataset is from MoleculeNet and contains 43850 small molecules and their in vitro activity against HIV (CA - Confirmed active, CM - Confirmed moderately active, CI - Confirmed inactive). The classification was based on EC50 values and expert knowledge.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of HIV inhibition. Active compounds are considered those classified as CA/CM.

HIV

Antiviral activity

MoleculeNet

https://github.com/ersilia-os/eos6hy3

https://www.nature.com/articles/s42256-022-00557-6

https://github.com/ChengF-Lab/ImageMol

MIT

DhanshreeA

17/1/2023

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos6hy3

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6hy3.zip

Yes

Local

2023

ncats-rlm

Ready

Rat liver microsomal stability

Hepatic metabolic stability is key to ensure the drug attains the desired concentration in the body. The Rat Liver Microsomal (RLM) stability is a good approximation of a compound’s stability in the human body, and NCATS has collected a proprietary dataset of 20216 compounds with its associated RLM (in vitro half-life; unstable ≤30 min, stable >30 min) and used it to train a classifier based on an ensemble of several ML approaches (random forest, deep neural networks, graph convolutional neural

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of a compound being unstable in RLM assay (half-life ≤ 30min)

Microsomal stability

Rat

ADME

Metabolism

Half-life

https://github.com/ersilia-os/eos5505

https://slas-discovery.org/article/S2472-5552(22)06765-X/fulltext

https://github.com/ncats/ncats-adme

None

pauline-banye

12/1/2023

https://github.com/pauline-banye

https://hub.docker.com/r/ersiliaos/eos5505

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5505.zip

Yes

Local

2023

smiles2iupac

Ready

STOUT: SMILES to IUPAC name translator

Small molecules are represented by a variety of machine-readable strings (SMILES, InChI, SMARTS, among others). In contrast, IUPAC (International Union of Pure and Applied Chemistry) names are devised for human readers. The authors trained a language translator model treating the SMILES and IUPAC as two different languages. 81 million SMILES were downloaded from PubChem and converted to SELFIES for model training. The corresponding IUPAC names for the 81 million SMILES were obtained with Che

Pretrained

Representation

Compound

Single

Text

String

Single

IUPAC name of a specific SMILES

Chemical notation

Chemical language model

https://github.com/ersilia-os/eos4se9

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00512-4

https://github.com/Kohulan/Smiles-TO-iUpac-Translator

MIT

carcablop

9/1/2023

https://github.com/carcablop

https://hub.docker.com/r/ersiliaos/eos4se9

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4se9.zip

Local

2023

drugtax

Ready

DrugTax: Drug taxonomy

DrugTax takes SMILES inputs and classifies each molecule according to its taxonomy, organic or inorganic kingdom and their subclasses, using a 0/1 binary classification for each one. It generates a vector of 163 features including the taxonomy classification and other key information such as number of carbons, nitrogens… These vectors can be used for subsequent molecular representation in chemoinformatic pipelines.

Pretrained

Representation

Compound

Single

Descriptor

Integer

List

A vector of 163 points, each one corresponding to a particular taxonomic or structural molecular feature

Fingerprint

Descriptor

https://github.com/ersilia-os/eos24ci

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00649-w

https://github.com/MoreiraLAB/DrugTax

GPL-3.0

Femme-js

3/1/2023

https://github.com/Femme-js

https://hub.docker.com/r/ersiliaos/eos24ci

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos24ci.zip

Local

2023

embeddings-extraction

To do

Text Embeddings Extraction using Pretrained Language Models

Syntactic relationships and intrinsic information carried in textual input data can be represented as text embeddings. These embeddings can be used for downstream tasks such as classification and regression. BioMed-RoBERTa-base is a transformer-based language model adapted from RoBERTa-base, pretrained on 2.68 million biomedical domain specific scientific papers (7.55B tokens and 47GB of data). The multi-layer structure of the transformer captures different levels of representat

Pretrained

Representation

Text

List

Descriptor

Float

List

A list of 768 floating-point values representing the textual input as a numerical vector.

Chemical language model

Embedding

https://github.com/ersilia-os/eos1086

https://aclanthology.org/2020.acl-main.740/

https://huggingface.co/allenai/biomed_roberta_base

Apache-2.0

Femme-js

25/1/2023

https://github.com/Femme-js

Local

2023

iupac2smiles

To do

STOUT: IUPAC name to SMILES translator

Pretrained

Representation

Text

Single

Compound

String

Single

SMILES of the molecule corresponding to the IUPAC name input

Chemical notation

Chemical language model

https://github.com/ersilia-os/eos5ecc

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00512-4

https://github.com/Kohulan/Smiles-TO-iUpac-Translator

MIT

carcablop

13/1/2023

https://github.com/carcablop

Local

2023

meta-trans

Ready

MetaTrans: human drug metabolites

Small molecules are metabolized by the liver in what is known as phase I and phase II reactions. These can lead to reduced drug efficacy and generation of toxic metabolites, causing serious side effects. This model predicts the human metabolites of small molecules using a molecular transformer pre-trained on general chemical reactions and fine-tuned to human metabolism. It provides up to 10 metabolites for each input molecule.

Pretrained

Generative

Compound

Single

Compound

String

List

A maximum of 10 human metabolites generated from the input molecule

Metabolism

https://github.com/ersilia-os/eos935d

https://pubs.rsc.org/en/content/articlelanding/2020/sc/d0sc02639e#fn1

https://github.com/KavrakiLab/MetaTrans

BSD-3.0

carcablop

20/12/2022

https://github.com/carcablop

https://hub.docker.com/r/ersiliaos/eos935d

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos935d.zip

Local

2022

crem-structure-generation

Ready

CReM fragment based structure generation

CReM (chemically reasonable mutations) is a fragment-based generative model that takes as input a small molecule, breaks it down into fragments and iteratively replaces them with other fragments from a database. It has three implementations: MUTATE (arbitrarily replaces one fragment with another one), GROW (arbitrarily replaces a hydrogen with another fragment) and LINK (replaces hydrogen atoms in two molecules to link them with a fragment). Here, we use a MUTATE and GROW approach, which prov

Pretrained

Generative

Compound

Single

Compound

String

List

Up to 100 newly generated molecules

Compound generation

https://github.com/ersilia-os/eos4q1a

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00431-w

https://github.com/DrrDom/crem

BSD-3.0

DhanshreeA

20/12/2022

https://github.com/DhanshreeA

https://hub.docker.com/r/ersiliaos/eos4q1a

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4q1a.zip

Local

2022

moler-enamine-fragments

Ready

Extending molecular scaffolds with fragments

Pretrained

Generative

Compound

Single

Compound

String

List

1000 new molecules are sampled for each input molecule, preserving its scaffold.

Chemical graph model

Compound generation

https://github.com/ersilia-os/eos9taz

https://arxiv.org/abs/2103.03864

https://github.com/microsoft/molecule-generation

MIT

anamika-yadav99

16/11/2022

https://github.com/anamika-yadav99

https://hub.docker.com/r/ersiliaos/eos9taz

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9taz.zip

Local

2022

molt5-smiles-to-caption

Ready

MolT5-Translation between Molecules and Natural Language

MolT5 (Molecular T5) is a self-supervised learning framework pretrained on unlabeled natural language text and molecule strings with two end goals: molecular captioning (given a molecule, generate its description) and text-based de novo molecular generation (given a description, propose a molecule that matches it). This implementation is focused on molecular captioning.

Pretrained

Representation

Compound

Single

Text

String

Single

Description of a molecule

Chemical language model

Chemical notation

https://github.com/ersilia-os/eos2rd8

https://arxiv.org/abs/2204.11817

https://github.com/blender-nlp/MolT5

None

Amna-28

14/11/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos2rd8

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2rd8.zip

Local

2022

bayesian-drug-likeness

Ready

Drug-likeness prediction with Bayesian neural networks

To define drug-likeness, a set of 2136 approved drugs from DrugBank was taken as drug-like, and three negative datasets were selected from ZINC15 (19M), the Network of Organic Chemistry (6M) and ligands from the Protein Data Bank (13k), respectively. The drug dataset was combined with an equal subsampling of the negative dataset for each experiment, using five different molecular representations (Mold2, RDKit, MCS, EXFP4, Mol2Vec). We have retrained the model following the authors' specifications.

Retrained

Classification

Compound

Single

Probability

Float

Single

Drug-likeness probability

Drug-likeness

https://github.com/ersilia-os/eos9sa2

https://www.nature.com/articles/s42256-020-0209-y

https://github.com/Nanotekton/drugability/tree/v0.1

Non-commercial

Amna-28

9/11/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos9sa2

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9sa2.zip

Local

2022

molbloom

Ready

MolBloom: molecule purchasability in ZINC20

This model uses a Bloom filter to query the ZINC20 database and identify whether a molecule is purchasable. A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is in a given set. Due to the nature of Bloom filters, false negatives are not possible (i.e., if the model returns False, the molecule is not purchasable). As stated by the author, if the model returns True, the molecule is purchasable with an error rate of 0.0003 (according to the ZINC20 catalog).
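The no-false-negatives property described above can be illustrated with a toy Bloom filter. This is a minimal sketch, not the molbloom implementation: the class name `BloomFilter`, the bit-array size, the number of hashes and the salted SHA-256 hashing scheme are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example simple; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive several positions by salting a single hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True only if every position is set. Items that were added always
        # pass this check, so a False answer is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


# A tiny stand-in "catalog" of SMILES strings (not real ZINC20 data).
catalog = BloomFilter()
for smiles in ["CCO", "c1ccccc1", "CC(=O)O"]:
    catalog.add(smiles)

print("CCO" in catalog)  # an added item is always reported as present
```

A True answer for a molecule that was never added is possible but rare; that is the false-positive error rate quoted in the description.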

Pretrained

Classification

Compound

Single

Boolean

String

Single

It returns a boolean (True/False) suggesting whether the molecule is commercially available or not.

ZINC

Compound generation

https://github.com/ersilia-os/eos8a5g

https://github.com/whitead/molbloom/blob/main/CITATION.cff

https://github.com/whitead/molbloom

MIT

Amna-28

2/11/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos8a5g

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8a5g.zip

Local

2022

mesh-therapeutic-use

Ready

MeSH therapeutic use based on chemical structure

Drug function, defined as Medical Subject Headings (MeSH) “therapeutic use”, is predicted based on the chemical structure. A total of 6955 non-redundant molecules, each pertaining to one of the twelve selected therapeutic use classes, were downloaded from PubChem and used to train a binary classifier. The model provides the probability that a molecule has one of the following therapeutic uses: antineoplastic, cardiovascular, central nervous system (CNS), anti-infective, gastrointestinal, anti-inflammatory, derma

In-house

Classification

Compound

Single

Probability

Float

List

Probability that the molecule belongs to each therapeutic use specified.

Therapeutic indication

https://github.com/ersilia-os/eos238c

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6819987/

https://github.com/jgmeyerucsd/drug-class

GPL-3.0

Amna-28

17/10/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos238c

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos238c.zip

Local

2022

admetlab-2

Ready

ADMETlab-2

ADMETLab2 is the improved version of ADMETLab, a suite of models for systematic evaluation of ADMET properties. ADMETLab2 provides predictions on 17 physicochemical properties, 13 medicinal chemistry properties, 23 ADME properties, 27 toxicity endpoints and 8 toxicophore rules. The code and training data are not released, so using this model posts queries to the ADMETLab2 online server. The Ersilia Model Hub also offers ADMETLab (v1) as a downloadable package for IP-sensitive queries.

Online

Regression

Compound

Single

Experimental value

Probability

Float

List

Predicted relevant ADMET properties, Tox21 outcomes, physicochemical properties and drug-likeness. Outputs are of mixed type, including classification (labels) and continuous values.

Toxicity

ADME

Lipophilicity

Solubility

Permeability

https://github.com/ersilia-os/eos2v11

https://academic.oup.com/nar/article/49/W1/W5/6249611?login=false

https://admetmesh.scbdd.com/

Proprietary

miquelduranfrigola

16/9/2022

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2v11

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2v11.zip

Local

2022

metabokiller

Ready

Carcinogenic potential of metabolites and small molecules

Carcinogenicity is a result of several potential effects on cells. This model predicts the carcinogenic potential of a small molecule based on its potential to induce cellular proliferation, genomic instability, oxidative stress, anti-apoptotic responses and epigenetic alterations. Metabokiller uses the Chemical Checker signaturizer to featurize the molecules, and the Lime package to provide interpretable results. Using Metabokiller, the authors screened a panel of human metabolites and exper

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability that the molecule has each of the specified carcinogenic properties

Toxicity

Cancer

Metabolism

https://github.com/ersilia-os/eos1579

https://doi.org/10.1038/s41589-022-01110-7

https://github.com/the-ahuja-lab/Metabokiller

Non-commercial

brosular

30/8/2022

https://github.com/brosular

https://hub.docker.com/r/ersiliaos/eos1579

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1579.zip

Local

2022

bidd-molmap-desc

Ready

Molecular maps based on broadly learned knowledge-based representations

Molecular representation of small molecules via descriptor-based molecular maps (images). The fingerprint-based molecular maps are available at eos59rr. These images can be used as inputs for an image-based deep learning model such as a convolutional neural network. The authors have demonstrated high performance of MolMap out-of-the-box with a broad range of tasks from MoleculeNet.

Pretrained

Generative

Compound

Single

Image

Descriptor

Float

Matrix

Image representation of a molecule. Each pixel represents a molecular feature

Descriptor

https://github.com/ersilia-os/eos6m4j

https://www.nature.com/articles/s42256-021-00301-6

https://github.com/shenwanxiang/bidd-molmap

GPL-3.0

miquelduranfrigola

25/8/2022

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos6m4j

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6m4j.zip

Local

2022

maip-malaria

Ready

MAIP: antimalarial activity prediction

Prediction of the antimalarial potential of small molecules. This model is an ensemble of smaller QSAR models trained on proprietary data from various sources, up to a total of more than 7M compounds. The training sets belong to Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library. The code and training data are not released, so using this model posts queries to the MAIP online server. The Ersilia Model Hub also offers MAIP-surrogate as a downloadable package for IP-sensitiv

Online

Classification

Compound

Single

Score

Float

Single

Higher score indicates higher antimalarial potential

P.falciparum

Malaria

Antimicrobial activity

https://github.com/ersilia-os/eos4zfy

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00487-2

https://www.ebi.ac.uk/chembl/maip/

None

Amna-28

18/8/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos4zfy

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4zfy.zip

Yes

Local

2022

chembl-similarity

Ready

Similarity search in ChEMBL

Given a molecule, this model looks for its 100 nearest neighbors in the ChEMBL database, according to ECFP4 Tanimoto similarity. Due to size constraints, the model redirects queries to the ChEMBL server, so queries are posted online when using this model.
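Tanimoto similarity and the nearest-neighbor ranking it drives can be sketched in a few lines. This is an illustrative sketch only, not the server's code: fingerprints are represented here as plain Python sets of "on" bit indices, the library contents are made up, and the function names `tanimoto` and `nearest_neighbors` are hypothetical. In practice ECFP4 fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_neighbors(query: set, library: dict, k: int = 100) -> list:
    """Return the k library entries most similar to the query."""
    scored = [(tanimoto(query, fp), name) for name, fp in library.items()]
    # Highest similarity first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Made-up fingerprints standing in for ECFP4 bit vectors.
library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 5},
    "mol_c": {7, 8},
}
query = {1, 2, 3, 4}

for name, score in nearest_neighbors(query, library, k=2):
    print(name, score)
```

A brute-force scan like this is linear in the library size, which is one reason large databases are typically queried through a dedicated search server rather than shipped with the model.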

Online

Similarity

Compound

Single

Compound

String

List

List of 100 nearest neighbors

ChEMBL

Similarity

https://github.com/ersilia-os/eos2a9n

https://www.frontiersin.org/articles/10.3389/fchem.2020.00046/full

http://130.92.106.217:8080/chemblMuti.v1/

None

Amna-28

18/8/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos2a9n

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2a9n.zip

Local

2022

medchem17-similarity

Ready

Similarity search in ChEMBL, DrugBank and UNPD

Given a molecule, this model looks for its 100 nearest neighbors, according to ECFP4 Tanimoto similarity, in the medicinal chemistry database ChEMBL17_DrugBank17_UNPD17. This combined database contains all the compounds from the three collections (DrugBank, ChEMBL22 and Universal Natural Product Directory (UNPD)) with up to 17 heavy atoms. It features a total of 128k compounds. The whole ChEMBL17_DrugBank17_UNPD17 database is not downloaded with the model; by using it you post queries to an online ser

Online

Similarity

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

ChEMBL

DrugBank

https://github.com/ersilia-os/eos9c7k

https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.201900031

https://gdb-medchem-simsearch.gdb.tools/

None

Amna-28

18/8/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos9c7k

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9c7k.zip

Local

2022

gdbmedchem-similarity

Ready

GDBMedChem similarity search

The model looks for the 100 nearest neighbors of a given molecule, according to ECFP4 Tanimoto similarity, in the GDBMedChem database. GDBMedChem is a 10M-molecule sample of GDB17, a database containing all the enumerated molecules of up to 17 heavy atoms (166.4B molecules). GDBMedChem compounds have reduced complexity and better synthetic accessibility than GDB17 but retain a high sp3 carbon fraction and natural product likeness, providing a database of diverse molecules for drug design. Th

Online

Similarity

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

ChEMBL

https://github.com/ersilia-os/eos7jlv

https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.201900031

https://gdb-medchem-simsearch.gdb.tools/

None

Amna-28

18/8/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos7jlv

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7jlv.zip

Local

2022

gdbchembl-similarity

Ready

GDBChEMBL similarity search

The model looks for the 100 nearest neighbors of a given molecule, according to ECFP4 Tanimoto similarity, in the GDBChEMBL database. GDBChEMBL is a 10M-molecule sample of GDB17, a database containing all the enumerated molecules of up to 17 heavy atoms (166.4B molecules). GDBChEMBL compounds were selected using a ChEMBL-likeness score, with the objective of having a collection with higher synthetic accessibility and high bioactivity while maintaining continuous coverage of the GDB17 chemi

Online

Similarity

Compound

Single

Compound

String

List

List of 100 nearest neighbors

Similarity

ChEMBL

https://github.com/ersilia-os/eos4b8j

https://www.frontiersin.org/articles/10.3389/fchem.2020.00046/full

https://gdb-chembl-simsearch.gdb.tools/

None

Amna-28

15/8/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos4b8j

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4b8j.zip

Local

2022

chemical-vae

Ready

Variational autoencoder for small molecule generation

This variational autoencoder (VAE) for chemistry uses an encoder-decoder-predictor framework to predict new small molecules. The input SMILES molecule is converted into a continuous vector, and the decoder converts this molecular representation back to a discrete SMILES. These continuous molecular representations allow for simple operations to generate new chemical matter. The decoder is constrained to produce valid molecules. In addition, a predictor estimates the chemical properties of the mol

Pretrained

Generative

Compound

Single

Compound

String

List

Compounds generated based on the input molecule

Compound generation

https://github.com/ersilia-os/eos3ae7

https://pubs.acs.org/doi/10.1021/acscentsci.7b00572

https://github.com/aspuru-guzik-group/chemical_vae

Apache-2.0

brosular

13/8/2022

https://github.com/brosular

https://hub.docker.com/r/ersiliaos/eos3ae7

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3ae7.zip

Local

2022

chemnet-distance

Ready

FCD: Fréchet ChemNet Distance to evaluate generative models

The Fréchet ChemNet distance is a metric to evaluate generative models. It unifies, in a single score, whether the generated molecules are valid according to chemical and biological properties as well as their diversity from the training set. The score is a Fréchet distance (as used in the Fréchet Inception Distance) between sets of molecules represented by the activations of ChemNet, a deep neural network trained to predict biological and chemical properties of small molecules.
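For reference, the Fréchet distance between two Gaussian distributions has a closed form, which is the usual way such scores are computed. The symbols below are notation assumed for this note and are not taken from the catalog entry: \mu_G, \Sigma_G are the mean and covariance of the ChemNet activations of the generated molecules, and \mu_R, \Sigma_R those of the reference molecules.

```latex
d^2\big((\mu_G, \Sigma_G), (\mu_R, \Sigma_R)\big)
  = \lVert \mu_G - \mu_R \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_G + \Sigma_R - 2\,(\Sigma_G \Sigma_R)^{1/2} \right)
```

The distance is zero when both sets have identical activation statistics and grows as the means or covariances diverge.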

Pretrained

Similarity

Compound

Pair of Lists

Distance

Float

Single

Fréchet ChemNet Distance (FCD). Higher FCD indicates higher difference to the training set

Similarity

Bioactivity profile

Compound generation

https://github.com/ersilia-os/eos9be7

https://pubs.acs.org/doi/10.1021/acs.jcim.8b00234

https://github.com/bioinf-jku/FCD

LGPL-3.0

brosular

12/8/2022

https://github.com/brosular

https://hub.docker.com/r/ersiliaos/eos9be7

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9be7.zip

Local

2022

bayesherg

Ready

BayeshERG: hERG channel blockade

BayeshERG is a predictor of small molecule-induced blockade of the hERG ion channel. To increase its predictive power, the authors pretrained a Bayesian graph neural network with 300,000 molecules as a transfer learning exercise. The pretraining set was obtained from Du et al., 2015, and the fine-tuning dataset is a collection of 14,322 molecules from public databases (8488 positives and 5834 negatives). The model was validated on external datasets and experimentally, from 12 selected compounds (

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of hERG channel blockade. The cut-off used in the training set to define hERG blockade was IC50 <= 10 μM

hERG

Toxicity

Cardiotoxicity

https://github.com/ersilia-os/eos4tcc

https://academic.oup.com/bib/article-abstract/23/4/bbac211/6609519

https://github.com/GIST-CSBL/BayeshERG

GPL-3.0

azycn

10/8/2022

https://github.com/azycn

https://hub.docker.com/r/ersiliaos/eos4tcc

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4tcc.zip

Local

2022

rexgen

Ready

Organic reaction outcome prediction

Utilizes a Weisfeiler-Lehman network (attentive mechanism) to predict the products of an organic reaction given the reactants. The model identifies the reaction centers (set of atoms/bonds that change from reactant to product) and obtains the products directly from a graph-based neural network.

Pretrained

Generative

Compound

List

Compound

String

Flexible List

Products of an organic reaction

Chemical synthesis

https://github.com/ersilia-os/eos5qfo

https://arxiv.org/pdf/1709.04555v3.pdf

https://github.com/connorcoley/rexgen_direct

GPL-3.0

svolk19-stanford

8/8/2022

https://github.com/svolk19-stanford

https://hub.docker.com/r/ersiliaos/eos5qfo

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5qfo.zip

Local

2022

deepsmiles

Ready

DeepSMILES, an alternate SMILES representation for deep learning

DeepSMILES converts a SMILES string to a more accurate syntax for molecule representation, taking into account both the branches (closing parentheses in the SMILES strings) and rings (using a single symbol at ring closure that also indicates ring size). This syntax is particularly suitable for generative models whose output is a SMILES string. With DeepSMILES, scientists can train a network using this new syntax, generate new molecules represented as DeepSMILES and then decode them back to nor

Pretrained

Representation

Compound

Single

Compound

String

Single

String representing a DeepSMILES

Chemical language model

Chemical notation

https://github.com/ersilia-os/eos2mrz

https://chemrxiv.org/engage/api-gateway/chemrxiv/assets/orp/resource/item/60c73ed6567dfe7e5fec388d/original/deep-smiles-an-adaptation-of-smiles-for-use-in-machine-learning-of-chemical-structures.pdf

https://github.com/baoilleach/deepsmiles

MIT

brosular

28/7/2022

https://github.com/brosular

https://hub.docker.com/r/ersiliaos/eos2mrz

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2mrz.zip

Local

2022

admetlab

Ready

ADMETlab models for evaluation of drug candidates

A series of models for the systematic ADMET evaluation of drug candidate molecules. Models include blood-brain barrier penetration; inhibition and substrate affinity for CYP1A2, CYP2C9, CYP2C19, CYP2D6, CYP3A4, and pgp; F 20% and F 30% bioavailability; human intestinal absorption; Ames mutagenicity; skin sensitization; plasma protein binding; volume distribution; LD50 of acute toxicity; human hepatotoxicity; hERG blocking; clearance; half-life; Papp (caco-2 permeability); LogD distribution coeff

Pretrained

Classification

Compound

Single

Experimental value

Float

List

Regression models provide a numerical result (LogS (log mol/L), LogP (distribution coefficient), Papp (Caco-2 permeability in cm/s), PPB (%)). Classifications provide the probability of activity according to ADMETlab thresholds.

ADME

Toxicity

Lipophilicity

Solubility

Permeability

https://github.com/ersilia-os/eos2re5

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0283-x

https://github.com/ifyoungnet/ADMETlab

GPL-3.0

svolk19-stanford

28/7/2022

https://github.com/svolk19-stanford

https://hub.docker.com/r/ersiliaos/eos2re5

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2re5.zip

Local

2022

deepherg

Ready

Classification of hERG blockers and nonblockers

This model uses a multitask deep neural network (DNN) to predict the probability that a molecule is a hERG blocker. It was trained using 7889 compounds with experimental data available (IC50). The checkpoints of the pretrained model were not available; therefore, we retrained the model using the same method but without mol2vec featurization. Molecule featurization was instead done with Morgan fingerprints. Six models were tested, with several thresholds for negative decoys (10, 20, 40, 60, 80 and

Retrained

Classification

Compound

Single

Probability

Float

Single

Probability of hERG blockade. Actives are defined as IC50<10, inactives are defined as IC50>80

Toxicity

hERG

Cardiotoxicity

https://github.com/ersilia-os/eos30gr

https://pubs.acs.org/doi/full/10.1021/acs.jcim.8b00769

https://github.com/ChengF-Lab/deephERG

None

azycn

22/7/2022

https://github.com/azycn

https://hub.docker.com/r/ersiliaos/eos30gr

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos30gr.zip

Local

2022

aizynthfinder

Ready

Retrosynthesis planning

A tool for planning retrosynthesis of a target molecule based on template reactions and a stock of precursors. The algorithm breaks down the input molecule into purchasable blocks until it has been completely solved.

Pretrained

Generative

Compound

Single

Score

String

Float

Flexible List

The fraction of solved precursors and the number of reactions required for synthesis. Close to 1.0 for a solved compound, less than 0.8 for unsolved.

Synthetic accessibility

Chemical synthesis

https://github.com/ersilia-os/eos526j

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00472-1

https://github.com/MolecularAI/aizynthfinder

MIT

svolk19-stanford

19/7/2022

https://github.com/svolk19-stanford

https://hub.docker.com/r/ersiliaos/eos526j

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos526j.zip

Local

2022

selfies

Ready

SELF-referencIng Embedded Strings

String representation of small molecules that is more robust than SMILES, since, by design, every SELFIES string corresponds to a valid molecule. It is particularly helpful in generative models, as every proposed SELFIES string decodes to a valid molecule. The authors also found that, in generative models, SELFIES produces more diverse molecules than SMILES.

Pretrained

Representation

Compound

Single

Compound

String

Single

String representation of a molecule (SELFIES)

Chemical notation

Chemical language model

Compound generation

https://github.com/ersilia-os/eos6pbf

https://arxiv.org/pdf/1905.13741

https://github.com/aspuru-guzik-group/selfies

Apache-2.0

brosular

14/7/2022

https://github.com/brosular

https://hub.docker.com/r/ersiliaos/eos6pbf

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6pbf.zip

Local

2022

pkasolver

Ready

Microstate pKa values

This model employs transfer learning with graph neural networks in order to predict micro-state pKa values of small molecules. The model enumerates the molecule's protonation states and predicts its pKa values. It was trained in two phases, first, using a large ChEMBL dataset and then fine-tuning the model for a small training set of molecules with available pKa values. The model in this repository is the pkasolver-light, which does not require an Epik license and is limited to monoprotic molecu

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Acidity of a molecule (lower pKa indicates stronger acid)

pKa

ADME

https://github.com/ersilia-os/eos2b6f

https://www.biorxiv.org/content/10.1101/2022.01.20.476787v1

https://github.com/mayrf/pkasolver

MIT

svolk19-stanford

13/7/2022

https://github.com/svolk19-stanford

https://hub.docker.com/r/ersiliaos/eos2b6f

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2b6f.zip

Local

2022

grover-qm8

Ready

Electronic spectra and excited state energy

Prediction of the electronic spectra and excited state energy of small molecules. The training set is the QM8 dataset from MoleculeNet, where the electronic properties have been calculated by multiple quantum mechanical methods. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Regression

Compound

Single

Other value

Float

List

Predicted electronic spectra and excited state energy

MoleculeNet

Chemical graph model

Quantum properties

https://github.com/ersilia-os/eos3xip

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos3xip

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3xip.zip

Yes

Local

2022

grover-qm7

Ready

Atomization energy of small molecules

The model predicts the atomization energy of a molecule. It has been trained using the QM7 dataset from MoleculeNet, a subset of GDB13 containing molecules of up to 23 atoms (including up to 7 heavy atoms: C, N, O and S). This dataset contains the computed atomization energy of 7165 molecules. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Regression

Compound

Single

Other value

Float

Single

Atomization energy of the molecule

MoleculeNet

Chemical graph model

Quantum properties

https://github.com/ersilia-os/eos6o0z

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos6o0z

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6o0z.zip

Yes

Local

2022

grover-lipo

Ready

Octanol/water distribution coefficient

Prediction of the octanol/water distribution coefficient (logD at pH 7.4), trained using the MoleculeNet Lipophilicity dataset. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Predicted logD at pH 7.4

MoleculeNet

Lipophilicity

ADME

LogD

Chemical graph model

https://github.com/ersilia-os/eos85a3

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos85a3

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos85a3.zip

Yes

Local

2022

grover-esol

Ready

Water solubility

Prediction of water solubility (log solubility in mol per litre) for common organic small molecules, trained using the MoleculeNet ESOL dataset.

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Log solubility (mol/L)

Solubility

MoleculeNet

ADME

LogS

Chemical graph model

https://github.com/ersilia-os/eos8451

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos8451

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8451.zip

Yes

Local

2022

grover-freesolv

Ready

Hydration free energy of small molecules in water

Model based on experimental and calculated hydration free energy of small molecules in water, the FreeSolv dataset from MoleculeNet. Hydration free energies are relevant to understanding the binding interaction of a molecule (in solution) with its binding site. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Regression

Compound

Single

Other value

Float

Single

Calculated Hydration Free energy in kcal/mol

MoleculeNet

Chemical graph model

Quantum properties

https://github.com/ersilia-os/eos157v

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos157v

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos157v.zip

Yes

Local

2022

grover-toxcast

Ready

ToxCast toxicity panel

Prediction across the ToxCast toxicity panel, containing hundreds of toxicity outcomes, as part of the MoleculeNet benchmark. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of toxicity against 617 biological targets

Toxicity

ToxCast

Chemical graph model

https://github.com/ersilia-os/eos481p

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos481p

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos481p.zip

Yes

Local

2022

grover-bace

Ready

BACE-1 inhibition

Prediction of beta-secretase 1 (BACE-1) inhibition. BACE-1 is expressed mainly in neurons and has been implicated in the development of Alzheimer's disease. This model has been trained on the BACE dataset from MoleculeNet using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability that the molecule is a BACE-1 inhibitor (using a 0.1 uM cut-off)

Alzheimer

BACE

MoleculeNet

Chemical graph model

https://github.com/ersilia-os/eos2mhp

https://arxiv.org/abs/2007.02835

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos2mhp

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2mhp.zip

Yes

Local

2022

grover-clintox

Ready

Toxicity at clinical trial stage

Using the MoleculeNet dataset ClinTox, the authors trained a classification model to predict the likelihood of failure in clinical trials due to toxicity. The dataset has been built using FDA-approved drugs (non-toxic) and a set of drugs that have failed at advanced clinical trial stages. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability that a molecule is approved by the FDA and probability that a molecule shows toxicity in clinical trials

Toxicity

MoleculeNet

Chemical graph model

Side effects

https://github.com/ersilia-os/eos6fza

https://arxiv.org/abs/2007.02835

https://github.com/tencent-ailab/grover

MIT

Amna-28

13/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos6fza

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6fza.zip

Yes

Local

2022

grover-tox21

Ready

Predicts activity of compounds across the Tox21 panel

Predicts activity of compounds in the Tox21 toxicity panel, comprising 12 toxicity pathways, as part of the MoleculeNet benchmark datasets. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

List

Toxicity measurements against 12 biological targets

Tox21

Toxicity

Chemical graph model

https://github.com/ersilia-os/eos5smc

https://papers.nips.cc/paper/2020/file/94aef38441efa3380a3bed3faf1f9d5d-Paper.pdf

https://github.com/tencent-ailab/grover

MIT

Amna-28

12/7/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos5smc

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5smc.zip

Yes

Local

2022

sa-score

Ready

Synthetic accessibility score

Estimation of the synthetic accessibility score (SAScore) of drug-like molecules based on molecular complexity and fragment contributions. The fragment contributions are based on a 1M sample from PubChem and the molecular complexity is based on the presence/absence of non-standard structural features. It has been validated by comparing the SAScore with the estimates of expert medicinal chemists for 40 molecules (r2 = 0.89). The SAScore has been contributed to the RDKit package.

Pretrained

Regression

Compound

Single

Score

Float

Single

Low scores indicate higher synthetic accessibility

Synthetic accessibility

Chemical synthesis

https://github.com/ersilia-os/eos9ei3

https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-1-8

https://github.com/rdkit/rdkit/tree/master/Contrib/SA_Score

BSD-3.0

https://eos9ei3-tkreo.ondigitalocean.app/

miquelduranfrigola

10/7/2022

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9ei3

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9ei3.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos9ei3

Online

2022

chemtb

Ready

Mycobacterium tuberculosis inhibitor prediction

Identification of active molecules against Mycobacterium tuberculosis using an ensemble of data from ChEMBL25 (Target IDs 360, 2111188 and 2366634). The final model is a stacking model integrating four algorithms, including support vector machine, random forest, extreme gradient boosting and deep neural networks.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of M.tb inhibition (measured as IC50 at cut-off 5 uM)

M.tuberculosis

IC50

Tuberculosis

Antimicrobial activity

https://github.com/ersilia-os/eos46ev

https://academic.oup.com/bib/article-abstract/22/5/bbab068/6209685

http://cadd.zju.edu.cn/chemtb/

None

Amna-28

28/6/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos46ev

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos46ev.zip

Yes

Local

2022

ssl-gcn-tox21

Ready

Toxicity prediction across the Tox21 panel with semi-supervised learning

Toxicity prediction across the Tox21 panel from MoleculeNet, comprising 12 toxicity pathways. The model uses the Mean Teacher Semi-Supervised Learning (MT-SSL) approach to overcome the low number of data points experimentally annotated for toxicity tasks. For the MT-SSL, Tox21 (7,831 compounds and 12 different endpoints) was used as labeled data and a selection of 50K compounds from other MoleculeNet datasets was used as unlabeled data.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability of toxicity across 12 tasks defined in Tox21

Tox21

Toxicity

MoleculeNet

https://github.com/ersilia-os/eos69p9

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00570-8

https://github.com/chen709847237/SSL-GCN

None

Amna-28

16/6/2022

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos69p9

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos69p9.zip

Local

2022
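
The Mean Teacher idea mentioned in the ssl-gcn-tox21 entry can be sketched as follows. This is a minimal illustration, not the SSL-GCN code: weights are plain dictionaries of floats, and the names `ema_update`, `consistency_loss` and the smoothing factor `alpha` are hypothetical. In Mean Teacher training, the student is optimized on labeled data plus a consistency term on unlabeled data, while the teacher's weights track an exponential moving average of the student's weights.

```python
# Minimal sketch of the Mean Teacher weight update (illustrative only).
# Weights are plain dicts of floats; `alpha` is a hypothetical smoothing factor.


def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights as an exponential moving average of student weights."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions.

    This term can be computed on unlabeled compounds because it needs no labels.
    """
    pairs = list(zip(student_probs, teacher_probs))
    return sum((s - t) ** 2 for s, t in pairs) / len(pairs)


# Toy example: one EMA step pulls the teacher slightly toward the student.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 0.0}
teacher = ema_update(teacher, student, alpha=0.9)
```

A larger `alpha` makes the teacher change more slowly, which is what gives it more stable predictions than the student.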

coprinet-molecule-price

Ready

Small molecule price prediction

CoPriNet has been trained on 2D graph representations of small molecules with their associated price in the Mcule catalog. The predicted price provides a better overview of the compound availability than standard synthetic accessibility scores or retrosynthesis tools. The Mcule catalog is proprietary but the trained model as well as the test dataset (100K) are publicly available.

Pretrained

Regression

Compound

Single

Other value

Float

Single

Price value prediction

Price

Compound generation

Chemical synthesis

https://github.com/ersilia-os/eos7a45

https://pubs.rsc.org/en/content/articlelanding/2023/dd/d2dd00071g

https://github.com/oxpig/CoPriNet

MIT

anamika-yadav99

28/3/2022

https://github.com/anamika-yadav99

https://hub.docker.com/r/ersiliaos/eos7a45

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7a45.zip

Local

2022

compound-test-5

Test

Test model 5

Dummy

Compound

List

Dummy

Dummy model

Dummy

GPL-3.0

miquelduranfrigola

19/8/2022

https://github.com/miquelduranfrigola

Local

2022

compound-test-4

Test

Test model 4

Dummy

Compound

Single

Dummy

Dummy model

Dummy

GPL-3.0

miquelduranfrigola

21/7/2022

https://github.com/miquelduranfrigola

Local

2022

eos-template-test

Test

Test for the eos-template

This is a vanilla test for the eos-template

Dummy

Compound

Single

Dummy

Dummy model

Dummy

https://github.com/ersilia-os/eost00

GPL-3.0

miquelduranfrigola

12/7/2022

https://github.com/miquelduranfrigola

Local

2022

deepfl-logp

Ready

Membrane permeability of fluorescent probes

A deep neural network was trained to predict the LogP value of small molecules and fluorescent probes using an experimentally annotated dataset of >13k molecules (OPERA). This dataset was complemented with fluorescent probes to improve the model accuracy in this space. Probes predicted impermeant to cell membranes consistently showed experimental LogP <1.

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

LogP values of > 1 indicate membrane permeability

Permeability

ADME

LogP

https://github.com/ersilia-os/eos65rt

https://www.nature.com/articles/s41598-021-86460-3.epdf?sharing_token=zmYZd6qpwnDwc8tCOYGGf9RgN0jAjWel9jnR3ZoTv0OXuXXr_ZS6VuKQMyMJiA3PeIcqAJZTcpcNZJHblyChkQ2eTpzGXq23YsIcFlG8ayuEptKCJ1DeyIRGrh9O2d5JvvGGB9qG8cXgAuy_k-e1ncAMkAzpTegmR0XUbnftjv0%3D

https://github.com/k-soliman/DeepFl-LogP

GPL-3.0

miquelduranfrigola

10/11/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos65rt

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos65rt.zip

Local

2021

passive-permeability

Ready

Passive permeability based on simulations

Using Coarse Grained (CG) models, where several atoms are aggregated into a single bead, the authors obtain a set of 500,000 compounds with their simulated permeability across a single-component DOPC lipid bilayer. With this approach, the authors are able to cover a large and representative portion of the chemical space. We have used the data generated in this publication to train a simple regression model to predict compound permeability.

In-house

Regression

Compound

Single

Experimental value

Float

Single

Permeability coefficient (P). Cut-off: 6

Permeability

ADME

Papp

https://github.com/ersilia-os/eos2hbd

https://pubs.acs.org/doi/full/10.1021/acscentsci.8b00718?ref=recommended

None

miquelduranfrigola

10/11/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2hbd

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2hbd.zip

Yes

Local

2021

pampa-permeability

Ready

PAMPA effective permeability

The authors provide a dataset of 200 small molecules and their experimentally measured permeability in a PAMPA assay. Using this data, we have trained a model that predicts the logarithm of the effective permeability coefficient.

In-house

Regression

Compound

Single

Experimental value

Float

Single

logPe

Permeability

ADME

LogP

https://github.com/ersilia-os/eos97yu

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6651837/

None

miquelduranfrigola

10/11/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos97yu

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos97yu.zip

Yes

Local

2021

natural-product-fingerprint

Ready

Natural product fingerprint

The model uses a combination of two multilayer perceptron networks (baseline and auxiliary) and an autoencoder-like network to extract natural-product-specific fingerprints that outperform traditional methods for molecular representation. The training sets correspond to the COCONUT database (NP) and the ZINC database (synthetic).

Pretrained

Representation

Compound

Single

Descriptor

String

List

Descriptor of a molecule

Natural product

Fingerprint

Descriptor

https://github.com/ersilia-os/eos6tg8

https://www.sciencedirect.com/science/article/pii/S2001037021003226?via%3Dihub#f0010

https://github.com/kochgroup/neural_npfp

None

miquelduranfrigola

3/11/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos6tg8

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6tg8.zip

Local

2021

maip-malaria-surrogate

Ready

MAIP distillation: antimalarial potential prediction

Prediction of the antimalarial potential of small molecules. This model was originally trained on proprietary data from various sources, totalling more than 7M compounds. The training sets belong to Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library. In this implementation, we have used a teacher-student approach to train a surrogate model based on ChEMBL data (2M molecules) to provide a lightweight, downloadable version of the original MAIP.

Retrained

Classification

Compound

Single

Score

Float

Single

Higher score indicates higher antimalarial potential

P.falciparum

Malaria

Antimicrobial activity

https://github.com/ersilia-os/eos2gth

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00487-2

https://www.ebi.ac.uk/chembl/maip/

None

miquelduranfrigola

2/11/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2gth

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2gth.zip

Local

2021

syba-synthetic-accessibility

Ready

Bayesian prediction of synthetic accessibility

SYBA uses a fragment-based approach to classify whether a molecule is easy or hard to synthesize, and it can also be used to analyze the contribution of individual fragments to the total synthetic accessibility. The easy-to-synthesize dataset is an extract of the ZINC purchasable compounds, and the hard-to-synthesize dataset is generated using the Nonpher approach (introducing small molecular perturbations to transform molecules into more complex compounds). The fragments are calculated with ECFP8.

Pretrained

Regression

Compound

Single

Score

Float

Single

Higher score indicates higher confidence that the molecule is synthetically accessible

Synthetic accessibility

Chemical synthesis

https://github.com/ersilia-os/eos7pw8

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00439-2

https://github.com/lich-uct/syba

GPL-3.0

miquelduranfrigola

25/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7pw8

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7pw8.zip

Local

2021

natural-product-score

Ready

Natural product score

A simple score to distinguish between natural products (or natural product-like compounds) and synthetic compounds. The score was calculated using an analysis of the structural features that distinguish natural products (NP) from synthetic molecules. NP structures were obtained from the CRC Dictionary of Natural Products and synthetic molecules belong to an in-house collection. This method has been contributed to the RDKit package; Ersilia simply implements the RDKit NP_Score.

Pretrained

Regression

Compound

Single

Score

Float

List

Higher score indicates higher natural product likeness

Natural product

Drug-likeness

https://github.com/ersilia-os/eos8ioa

http://pubs.acs.org/doi/abs/10.1021/ci700286x

https://github.com/rdkit/rdkit/tree/master/Contrib/NP_Score

BSD-3.0

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8ioa

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8ioa.zip

Local

2021

natural-product-likeness

Ready

Natural product likeness score

The model is a derivation of the natural product fingerprint (eos6tg8). In addition to generating specific natural product fingerprints, the activation value of the neuron that predicts whether a molecule is a natural product can be used as an NP-likeness score. The method outperforms the NP_Score implemented in RDKit.

Pretrained

Regression

Compound

Single

Score

Float

Single

Higher score indicates higher natural product likeness

Natural product

Drug-likeness

https://github.com/ersilia-os/eos9yui

https://www.sciencedirect.com/science/article/pii/S2001037021003226?

https://github.com/kochgroup/neural_npfp

None

https://eos9yui-7xpw3.ondigitalocean.app/

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9yui

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9yui.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos9yui

Online

2021

retrosynthetic-accessibility

Ready

Retrosynthetic accessibility score

Retrosynthetic accessibility score based on the computer-aided synthesis planning tool AiZynthFinder. The authors selected a ChEMBL subset of 200,000 molecules and checked whether AiZynthFinder could identify a synthetic route for each of them. These data were used to train a classifier that computes 4500 times faster than the underlying AiZynthFinder. For molecules outside the applicability domain, such as those in the GDB database, the model needs to be fine-tuned to the use case.

Pretrained

Regression

Compound

Single

Score

Float

Single

Higher score indicates easier retrosynthetic accessibility

Synthetic accessibility

Chemical synthesis

https://github.com/ersilia-os/eos2r5a

https://pubs.rsc.org/en/content/articlelanding/2021/sc/d0sc05401a

https://github.com/reymond-group/RAscore

MIT

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2r5a

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2r5a.zip

Local

2021

soltrannet-aqueous-solubility

Ready

Aqueous solubility prediction

Fast aqueous solubility prediction based on the Molecule Attention Transformer (MAT). The authors used AqSolDB to fine-tune the MAT network to solubility prediction, achieving competitive scores in the Second Challenge to Predict Aqueous Solubility (SC2).

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Predicted LogS (log of the solubility)

Solubility

ADME

LogS

https://github.com/ersilia-os/eos6oli

https://pubs.acs.org/doi/10.1021/acs.jcim.1c00331

https://github.com/gnina/SolTranNet

Apache-2.0

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos6oli

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6oli.zip

Yes

Local

2021

molgrad-ppb

Ready

Coloring molecules for plasma protein binding prediction

By combining a Message-Passing Graph Neural Network (MPGNN) and a Forward fully connected Neural Network (FNN) with an integrated gradients explainable artificial intelligence (XAI) method, the authors developed MolGrad and tested it on a number of ADME predictive tasks. MolGrad incorporates explainable features to facilitate interpretation of the predictions. In this model, they train MolGrad with data from a Plasma-protein binding assay (PPB) to predict the fraction bound in plasma of small mo

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Fraction (%) bound in plasma

ADME

Fraction bound

Chemical graph model

https://github.com/ersilia-os/eos6ao8

https://pubs.acs.org/doi/10.1021/acs.jcim.0c01344

https://github.com/josejimenezluna/molgrad/

AGPL-3.0

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos6ao8

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos6ao8.zip

Yes

Local

2021

molgrad-herg

Ready

Coloring molecules for hERG blockade

By combining a Message-Passing Graph Neural Network (MPGNN) and a Forward fully connected Neural Network (FNN) with an integrated gradients explainable artificial intelligence (XAI) method, the authors developed MolGrad and tested it on a number of ADME predictive tasks. MolGrad incorporates explainable features to facilitate interpretation of the predictions. In this model, they train MolGrad with a dataset of hERG channel blockers/non-blockers to predict the cardiotoxicity of small molecules (I

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

pIC50 of hERG inhibition

hERG

Toxicity

Cardiotoxicity

Chemical graph model

https://github.com/ersilia-os/eos43at

https://pubs.acs.org/doi/10.1021/acs.jcim.0c01344

https://github.com/josejimenezluna/molgrad/

AGPL-3.0

https://eos43at-zqx9x.ondigitalocean.app/

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos43at

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos43at.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos43at

Yes

Online

2021

molgrad-caco2

Ready

Coloring molecules for Caco-2 cell permeability

By combining a Message-Passing Graph Neural Network (MPGNN) and a Forward fully connected Neural Network (FNN) with an integrated gradients explainable artificial intelligence (XAI) method, the authors developed MolGrad and tested it on a number of ADME predictive tasks. MolGrad incorporates explainable features to facilitate interpretation of the predictions. This model has been trained using experimental data on the permeability of molecules across Caco-2 cell membranes (Papp, cm s-1).

Pretrained

Regression

Compound

Single

Experimental value

Float

Single

Log10 of the passive permeability in cm s-1

Permeability

ADME

Papp

Chemical graph model

https://github.com/ersilia-os/eos1af5

https://pubs.acs.org/doi/10.1021/acs.jcim.0c01344

https://github.com/josejimenezluna/molgrad/

AGPL-3.0

miquelduranfrigola

19/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos1af5

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1af5.zip

Yes

Local

2021

cardiotoxnet-herg

Ready

Ligand-based prediction of hERG blockade

A robust predictor for hERG channel blockade based on an ensemble of five deep learning models. The authors have collected a dataset from public sources, such as BindingDB and ChEMBL on hERG blockers and non-blockers. The cut-off for hERG blockade was set at IC50 < 10 uM for the classifier.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability that the compound inhibits hERG (IC50 < 10 uM)

hERG

Toxicity

Cardiotoxicity

https://github.com/ersilia-os/eos2ta5

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00541-z

https://github.com/Abdulk084/CardioTox

None

miquelduranfrigola

18/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2ta5

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2ta5.zip

Local

2021
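
The hERG entries above report activity either as a pIC50 (molgrad-herg) or against an IC50 cut-off of 10 uM (cardiotoxnet-herg). The two scales are related by pIC50 = -log10(IC50 in mol/L), so a 10 uM cut-off corresponds to a pIC50 of 5. The sketch below only illustrates this unit conversion; the function names are hypothetical and do not belong to either tool.

```python
import math


def ic50_um_to_pic50(ic50_um):
    """Convert an IC50 given in micromolar to pIC50.

    pIC50 = -log10(IC50 [mol/L]) = 6 - log10(IC50 [uM]).
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 6.0 - math.log10(ic50_um)


def is_herg_blocker(ic50_um, cutoff_um=10.0):
    """Label a compound as a blocker when its IC50 is below the cut-off (10 uM in the entry above)."""
    return ic50_um < cutoff_um


cutoff_pic50 = ic50_um_to_pic50(10.0)  # pIC50 value equivalent to the 10 uM cut-off
```

Because the scale is logarithmic and inverted, a more potent blocker has a lower IC50 and a higher pIC50.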

molgrad-cyp3a4

Ready

Coloring molecules for interaction with CYP3A4

By combining a Message-Passing Graph Neural Network (MPGNN) and a Forward fully connected Neural Network (FNN) with an integrated gradients explainable artificial intelligence (XAI) method, the authors developed MolGrad and tested it on a number of ADME predictive tasks. MolGrad incorporates explainable features to facilitate interpretation of the predictions. This model has been trained using a ChEMBL dataset of CYP450 3A4 inhibitors (0) and non-inhibitors (1).

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability that the molecule is metabolized by CYP3A4 (cut-off: 10 uM)

CYP450

ADME

Chemical graph model

https://github.com/ersilia-os/eos96ia

https://pubs.acs.org/doi/10.1021/acs.jcim.0c01344

https://github.com/josejimenezluna/molgrad/

GPL-3.0

miquelduranfrigola

18/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos96ia

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos96ia.zip

Yes

Local

2021

mycpermcheck

Ready

Membrane permeability in Mycobacterium tuberculosis

MycPermCheck predicts potential to permeate the Mycobacterium tuberculosis cell membrane based on physicochemical properties.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of permeability across the M.tb cell wall

Permeability

M.tuberculosis

ADME

Tuberculosis

https://github.com/ersilia-os/eos8d8a

https://academic.oup.com/bioinformatics/article/29/1/62/272745

https://www.mycpermcheck.aksotriffer.pharmazie.uni-wuerzburg.de/index.html

MIT

miquelduranfrigola

14/10/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8d8a

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8d8a.zip

Yes

Local

2021

padel

Ready

PaDEL small molecule descriptors

PaDEL is a commonly used molecular descriptor calculator. It calculates 1875 molecular descriptors (1444 1D and 2D descriptors, 431 3D descriptors) and 12 types of fingerprints for small molecule representation. Originally developed in Java, it is provided here through PaDELPy, its Python wrapper.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of a molecule

Descriptor

https://github.com/ersilia-os/eos7asg

https://onlinelibrary.wiley.com/doi/10.1002/jcc.21707

https://github.com/ecrl/padelpy

MIT

miquelduranfrigola

27/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7asg

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7asg.zip

Local

2021

smiles-transformer

Ready

SMILES transformer descriptor

Molecular embedding based on natural language processing. It converts SMILES into fingerprints using an unsupervised model pre-trained on a very large SMILES dataset from ChEMBL. The transformer is particularly well-suited for low-data drug discovery.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of small molecules

Chemical language model

Descriptor

Embedding

https://github.com/ersilia-os/eos2lm8

https://arxiv.org/abs/1911.04738

https://github.com/DSPsleeporg/smiles-transformer

MIT

miquelduranfrigola

22/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2lm8

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2lm8.zip

Local

2021

mordred

Ready

Mordred chemical descriptors

A set of ca. 1,800 chemical descriptors, including both RDKit and original modules. It is comparable to the well-known PaDEL-Descriptors (see eos7asg), but has shorter calculation times and can process larger molecules.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of a molecule

Descriptor

https://github.com/ersilia-os/eos78ao

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0258-y

https://github.com/mordred-descriptor/mordred

BSD-3.0

miquelduranfrigola

17/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos78ao

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos78ao.zip

Local

2021

rdkit-fingerprint

Ready

Path-based fingerprint

Path-based fingerprints calculated with the RDKit function Chem.RDKFingerprint. It is inspired by the Daylight fingerprint. As explained in the RDKit Book, the fingerprinting algorithm identifies all subgraphs in the molecule within a particular range of sizes, hashes each subgraph to generate a raw bit ID, mods that raw bit ID to fit in the assigned fingerprint size, and then sets the corresponding bit.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of small molecules

Fingerprint

Descriptor

https://github.com/ersilia-os/eos7jio

https://www.rdkit.org/docs/RDKit_Book.html

https://github.com/rdkit/rdkit

BSD-3.0

miquelduranfrigola

17/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7jio

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7jio.zip

Local

2021
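
The hashing scheme described in the rdkit-fingerprint entry (enumerate substructures, hash each one to a raw bit ID, reduce it modulo the fingerprint size, set the bit) can be sketched in plain Python. This is a simplified stand-in and not the RDKit implementation: it assumes a toy molecule given as atom symbols plus an adjacency dictionary, enumerates only linear paths, ignores bond orders, and uses SHA-1 as an arbitrary stable hash. All function names are hypothetical.

```python
import hashlib


def enumerate_paths(adjacency, min_bonds=1, max_bonds=3):
    """Yield every simple path (tuple of atom indices) with min_bonds..max_bonds bonds."""

    def extend(path):
        n_bonds = len(path) - 1
        if n_bonds >= min_bonds:
            yield tuple(path)
        if n_bonds == max_bonds:
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:  # keep paths simple (no revisited atoms)
                yield from extend(path + [neighbor])

    for start in adjacency:
        yield from extend([start])


def path_fingerprint(symbols, adjacency, n_bits=64, min_bonds=1, max_bonds=3):
    """Hash each path to a raw ID, fold it with modulo n_bits, and set that bit."""
    bits = [0] * n_bits
    for path in enumerate_paths(adjacency, min_bonds, max_bonds):
        forward = "-".join(symbols[i] for i in path)
        backward = "-".join(symbols[i] for i in reversed(path))
        label = min(forward, backward)  # same label whichever end we start from
        raw_id = int.from_bytes(hashlib.sha1(label.encode()).digest()[:4], "big")
        bits[raw_id % n_bits] = 1
    return bits


# Heavy atoms of ethanol, C0-C1-O2, as a toy graph.
symbols = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
fp = path_fingerprint(symbols, adjacency)
```

Folding with the modulo keeps the vector length fixed, at the cost of possible bit collisions between different paths.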

molbert

Ready

MolBERT chemical language transformer

Molecular representation using the BERT language Transformer. The model has been pre-trained on the GuacaMol dataset (~1.6M molecules from ChEMBL), and can be fine-tuned to the desired QSAR tasks. It has been benchmarked in MoleculeNet.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Embedding representation of a molecule

Chemical language model

Embedding

Descriptor

https://github.com/ersilia-os/eos2thm

https://arxiv.org/abs/2011.13230

https://github.com/BenevolentAI/MolBERT

MIT

miquelduranfrigola

17/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos2thm

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos2thm.zip

Local

2021

rdkit-descriptors

Ready

Physicochemical descriptors available from RDKit

A set of 200 physicochemical descriptors available from RDKit, including molecular weight, solubility and druggability parameters. We have used the Descriptastorus selection of RDKit descriptors for simplicity.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of small molecules

Descriptor

https://github.com/ersilia-os/eos8a4x

https://www.rdkit.org/docs/RDKit_Book.html

https://github.com/bp-kelley/descriptastorus

Proprietary

miquelduranfrigola

17/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8a4x

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8a4x.zip

Local

2021

avalon

Ready

Avalon fingerprint

Avalon is a path-based substructure key fingerprint (1024 bits), developed for substructure screen-out when searching. It is part of the Avalon Chemoinformatics Toolkit and has also been implemented as an external RDKit tool.

Pretrained

Representation

Compound

Single

Descriptor

Integer

List

Bitvector representation of a molecule

Fingerprint

https://github.com/ersilia-os/eos8h6g

https://pubs.acs.org/doi/full/10.1021/ci050413p

https://github.com/rdkit/rdkit/tree/master/External/AvalonTools

BSD-3.0

miquelduranfrigola

14/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos8h6g

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos8h6g.zip

Local

2021

molecular-weight

Ready

Molecular weight

The model is simply an implementation of the function Descriptors.MolWt of the chemoinformatics package RDKit. It takes as input a small molecule (SMILES) and calculates its molecular weight in g/mol.

Pretrained

Regression

Compound

Single

Other value

Float

Single

Calculated molecular weight (g/mol)

Molecular weight

https://github.com/ersilia-os/eos3b5e

https://www.rdkit.org/docs/RDKit_Book.html

https://github.com/rdkit/rdkit

BSD-3.0

miquelduranfrigola

13/9/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3b5e

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3b5e.zip

CPU

Local

2021

morgan-counts

Ready

Morgan counts fingerprints

Morgan fingerprints, also known as extended connectivity fingerprints (ECFP), are one of the most widely used molecular representations. They are circular representations (starting from each atom, the neighboring atoms within a radius n are considered) and can have thousands of features. This implementation uses the RDKit package with radius 3 and 2048 dimensions.

Pretrained

Representation

Compound

Single

Descriptor

Integer

List

Vector representation of a molecule

Fingerprint

Descriptor

https://github.com/ersilia-os/eos5axz

https://www.rdkit.org/docs/RDKit_Book.html

https://github.com/rdkit/rdkit

BSD-3.0

miquelduranfrigola

30/8/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos5axz

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos5axz.zip

Local

2021

whales-descriptor

Ready

Holistic molecular descriptors for scaffold hopping

Weighted Holistic Atom Localization and Entity Shape (WHALES) is a descriptor based on 3D structure to facilitate natural product featurization. It is aimed at scaffold hopping exercises from natural products to synthetic compounds.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Vector representation of a molecule

Natural product

Descriptor

https://github.com/ersilia-os/eos3ae6

https://www.nature.com/articles/s42004-018-0043-x

https://github.com/ETHmodlab/scaffold_hopping_whales

MIT

miquelduranfrigola

15/7/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos3ae6

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos3ae6.zip

Local

2021

grover-embedding

Ready

Large-scale graph transformer

GROVER is a self-supervised Graph Neural Network for molecular representation, pretrained on 10 million unlabelled molecules from ChEMBL and ZINC15. The model provided here is GROVERlarge. GROVER has then been fine-tuned to predict several activities from the MoleculeNet benchmark, consistently outperforming other state-of-the-art methods on several benchmark datasets.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Embedding representation of a molecule

Chemical graph model

Embedding

Descriptor

https://github.com/ersilia-os/eos7w6n

https://papers.nips.cc/paper/2020/file/94aef38441efa3380a3bed3faf1f9d5d-Paper.pdf

https://github.com/tencent-ailab/grover

MIT

miquelduranfrigola

2/7/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7w6n

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7w6n.zip

Yes

Local

2021

cc-signaturizer

Ready

Chemical Checker signaturizer

A set of 25 Chemical Checker bioactivity signatures (including 2D & 3D fingerprints, scaffold, binding, crystals, side effects, cell bioassays, etc) to capture properties of compounds beyond their structures. Each signature has a length of 128 dimensions. In total, there are 3200 dimensions. The signaturizer is periodically updated. We use the 2020-02 version of the signaturizer.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

2D projection of bioactivity signatures

Descriptor

Bioactivity profile

Embedding

https://github.com/ersilia-os/eos4u6p

https://www.nature.com/articles/s41467-021-24150-4

http://gitlabsbnb.irbbarcelona.org/packages/signaturizer

MIT

miquelduranfrigola

1/7/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos4u6p

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4u6p.zip

Local

2021
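
The cc-signaturizer entry describes an output of 25 signatures of 128 dimensions each, 3200 values in total. A flat output vector could be split back into individual signatures as sketched below. The A1 to E5 labels follow the Chemical Checker naming of its 25 spaces, but the assumption that the flat vector is ordered A1, A2, ..., E5 is ours and should be checked against the model's actual output header. The function name is hypothetical.

```python
N_SIGNATURES = 25
SIGNATURE_DIM = 128


def split_signatures(vector):
    """Split a flat 3200-dimensional vector into 25 named blocks of 128 values.

    Assumes (unverified) that blocks are concatenated in the order A1, A2, ..., E5.
    """
    expected = N_SIGNATURES * SIGNATURE_DIM
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    names = [f"{level}{sub}" for level in "ABCDE" for sub in range(1, 6)]
    return {
        name: vector[i * SIGNATURE_DIM:(i + 1) * SIGNATURE_DIM]
        for i, name in enumerate(names)
    }


# Toy vector standing in for a real model output.
flat = list(range(N_SIGNATURES * SIGNATURE_DIM))
signatures = split_signatures(flat)
```

Working with named blocks makes it easier to use only the signature types relevant to a given task.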

cdd-descriptor

Ready

Continuous and data-driven descriptors

Low-dimensional continuous descriptor based on a neural machine translation model. This model has been trained to translate an IUPAC molecular representation into its SMILES. The intermediate continuous vector representation produced by the encoder when reading the IUPAC name is a representation of the molecule, containing all the information needed to generate the output sequence (SMILES). This model has been pretrained on a large dataset combining ChEMBL and ZINC.

Pretrained

Representation

Compound

Single

Descriptor

Float

List

Embedding representation of a molecule

Descriptor

Chemical language model

https://github.com/ersilia-os/eos7a04

https://pubs.rsc.org/en/content/articlelanding/2019/sc/c8sc04175j

https://github.com/jrwnter/cddd

MIT

miquelduranfrigola

1/7/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos7a04

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos7a04.zip

Local

2021

grover-sider

Ready

Adverse Drug Reactions

The model predicts the putative adverse drug reactions (ADR) of a molecule, using the SIDER database (MoleculeNet) that contains pairs of marketed drugs and their described ADRs. This model has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

List

Predicted ADRs classified in 27 groups

Toxicity

MoleculeNet

Side effects

https://github.com/ersilia-os/eos77w8

https://arxiv.org/abs/2007.02835

https://github.com/tencent-ailab/grover

MIT

Amna-28

4/6/2021

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos77w8

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos77w8.zip

Yes

Local

2021

grover-bbbp

Ready

Blood-brain barrier penetration

This model predicts the Blood-Brain Barrier (BBB) penetration potential of small molecules, using as training data the curated MoleculeNet benchmark containing 2000 experimental data points. It has been trained using the GROVER transformer (see eos7w6n or grover-embedding for details of the molecular featurization step with GROVER).

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability that a molecule crosses the blood brain barrier

Permeability

MoleculeNet

Chemical graph model

Alzheimer

https://github.com/ersilia-os/eos1amr

https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html

https://github.com/tencent-ailab/grover

MIT

Amna-28

4/6/2021

https://github.com/Amna-28

https://hub.docker.com/r/ersiliaos/eos1amr

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1amr.zip

Yes

Local

2021

chembl-multitask-descriptor

Ready

Multi-target prediction based on ChEMBL data

This is a ligand-based target prediction model developed by the ChEMBL team. They trained the model using pairs of small molecules and their protein targets, and produced a multitask predictor. The activity thresholds were determined by protein family (kinases: <= 30nM, GPCRs: <= 100nM, Nuclear Receptors: <= 100nM, Ion Channels: <= 10μM, Non-IDG Family Targets: <= 1μM). Here we provide the model trained on ChEMBL_28, which showed an accuracy of 85%.

Pretrained

Classification

Compound

Single

Probability

Float

List

Probability that the protein (identified by ChEMBL ID) is a target of the molecule

Bioactivity profile

Target identification

ChEMBL

https://github.com/ersilia-os/eos1vms

http://chembl.blogspot.com/2019/05/multi-task-neural-network-on-chembl.html

https://github.com/chembl/chembl_multitask_model/

None

miquelduranfrigola

4/6/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos1vms

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos1vms.zip

Local

2021

etoxpred

Ready

Toxicity and synthetic accessibility prediction

The eToxPred tool has been developed to predict, on the one hand, the synthetic accessibility (SA) score, or how easy it is to make the molecule in the laboratory, and, on the other hand, the toxicity (Tox) score, or the probability that the molecule is toxic to humans. The authors trained and cross-validated both predictors on a large number of datasets, and demonstrated the method's usefulness in building virtual custom libraries.

Pretrained

Regression

Compound

Single

Score

Float

Single

Higher scores indicate easier synthetic accessibility and higher toxicity, respectively

Toxicity

Synthetic accessibility

https://github.com/ersilia-os/eos92sw

https://bmcpharmacoltoxicol.biomedcentral.com/articles/10.1186/s40360-018-0282-6

https://github.com/pulimeng/eToxPred

GPL-3.0

miquelduranfrigola

4/6/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos92sw

AMD64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos92sw.zip

Local

2021

chemprop-sars-cov-inhibition

Ready

SARS-CoV inhibition

This model was developed to support the early efforts to identify novel drugs against SARS-CoV-2. It predicts the probability that a small molecule inhibits SARS-3CLpro-mediated peptide cleavage. It was developed using data from a high-throughput screen against the 3CL protease of SARS-CoV-1, as no data were yet available for the new virus (SARS-CoV-2) causing the COVID-19 pandemic. It uses the ChemProp model.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability of 3CL protease inhibition (%). The classifier was trained using a threshold of 12% inhibition.

COVID19

Antiviral activity

Sars-CoV-2

Chemical graph model

https://github.com/ersilia-os/eos9f6t

https://www.sciencedirect.com/science/article/pii/S0092867420301021

http://chemprop.csail.mit.edu/checkpoints

MIT

miquelduranfrigola

3/6/2021

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos9f6t

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos9f6t.zip

Yes

Local

2021

compound-test-3

Test

Test model 3

Dummy

Compound

Single

Dummy

Dummy model

Dummy

GPL-3.0

miquelduranfrigola

3/9/2021

https://github.com/miquelduranfrigola

Local

2021

compound-test-2

Test

Test model 2

Dummy

Compound

Single

Dummy

Dummy model

Dummy

GPL-3.0

miquelduranfrigola

3/9/2021

https://github.com/miquelduranfrigola

Local

2021

compound-test-1

Test

Test model 1

Dummy

Compound

Single

Dummy

Dummy model

Dummy

GPL-3.0

miquelduranfrigola

3/9/2021

https://github.com/miquelduranfrigola

Local

2021

chemprop-antibiotic

Ready

Broad spectrum antibiotic activity

Based on a simple E. coli growth inhibition assay, the authors trained a model capable of identifying antibiotic potential in compounds structurally divergent from conventional antibiotic drugs. One of the predicted active molecules, Halicin (SU3327), was experimentally validated in vitro and in vivo. Halicin was originally investigated as a treatment for diabetes.

Pretrained

Classification

Compound

Single

Probability

Float

Single

Probability that a compound inhibits E. coli growth. The inhibition threshold was set at 80% growth inhibition in the training set.

E.coli

IC50

Antimicrobial activity

Chemical graph model

https://github.com/ersilia-os/eos4e40

https://pubmed.ncbi.nlm.nih.gov/32084340/

http://chemprop.csail.mit.edu/checkpoints

MIT

https://eos4e40-rovva.ondigitalocean.app/

miquelduranfrigola

6/6/2018

https://github.com/miquelduranfrigola

https://hub.docker.com/r/ersiliaos/eos4e40

AMD64

ARM64

https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4e40.zip

https://ersilia-app-cubsw.ondigitalocean.app/?model_id=eos4e40

Yes

Local

2018
