Explore DrugBank’s Machine Learning Solutions

DrugBank provides machine-learning-ready, structured, and curated datasets that support the exploration of different algorithms, approaches, and features. Train and evaluate your machine learning models using our detailed, labelled datasets, or build predictive models for drug targets, side effects, toxicity, and drug-drug interactions. Our customers have successfully built ML models for drug discovery and development.


Use DrugBank Data to

Train your machine learning models

with DrugBank’s foundation of authenticated machine-readable data

Enhance your data pool

with structured information covering everything from chemical structures to approved indications

Build predictive models

with a wide range of data features to choose from

Explore our key datasets for Machine Learning

Our chemical structures dataset provides chemical structures and protein sequences for pre-clinical drugs, as well as for every drug approved by the FDA and Health Canada. DrugBank includes structures of formulations, salts, and drug metabolites, so customers can easily integrate the data into their models and make meaningful progress in drug discovery. Chemical structures are available in multiple formats, including SDF, SMILES, and InChI, and protein sequences are available in FASTA format with UniProt and GenBank identifiers.

Customers use our machine-readable pharmacology dataset to build similarity-based predictors, train predictive models, and develop intelligent drug development solutions. The pharmacology dataset includes detailed descriptions of mechanism of action, metabolism, absorption, distribution, and elimination, as well as pharmacokinetic and pharmacodynamic parameters such as half-life, clearance, and LD50.

The pharmacogenomics dataset covers SNP-mediated adverse drug reactions and SNP-mediated pharmacological effects. Each entry includes a description of the effect, the affected drugs, references, SNP IDs, the allele name, the gene identifier, and the affected genotype. Coverage of predicted markers is also provided for some pre-clinical drugs. In addition, the Structured Indication Dataset provides detailed information on genetic variants that are part of an approved indication, which is useful for customers looking to make their drug repurposing ML models more efficient.


Our customers use the adverse effects dataset to build predictive models in their drug discovery solutions. This dataset includes more than 110,000 adverse effects linked to drugs, drawn from clinical trial data, drug labels, and post-market reporting, and it also includes incidence rates when available. Each listing includes the names and synonyms of the condition, along with associated ICD-10, MedDRA, and SNOMED-CT identifiers to facilitate data integration.

DrugBank offers an indication dataset that covers more than 10,000 drug indications approved by Health Canada and the FDA, as well as common off-label indications. Each indication includes a text description, the type of indication, and references to drug labels, clinical guidelines, and scientific literature. Each condition is associated with ICD-10, MedDRA, and SNOMED-CT identifiers to facilitate data integration, making it easy for our customers to build intelligent models.

DrugBank makes data integration easy by providing extensive synonyms, external identifiers, formulations, salt forms, and chemical structures, which our customers use to cross-map DrugBank with other datasets. External mappings include MedDRA, ICD-10, SNOMED-CT, UniProt, PDB, UNII, CAS, InChI, InChIKey, NDC, NDA, EMA, and ATC codes.

The drug metabolism dataset provides structured descriptions of every step in the metabolism of a drug including the enzymes involved and the chemical structures of every metabolite.

Healx

Healx integrates DrugBank data into its internal databases, enabling it to draw on a wide range of data to train its drug repurposing algorithms. By using DrugBank datasets, Healx has lowered the time and cost of its R&D and can bring repurposed drugs to market sooner.


We’ve been very happy with the DrugBank data and service. The data is well structured and DrugBank is always very responsive to requests.
Richard Smith
Senior Software Developer, Healx

Learn more about our Machine Learning Solutions

Our products and services can be tailored to your company’s needs. Contact us today to talk about which solution is right for you.