A guided workspace for predictive modelling, transfer learning, and multi-objective optimization in drug discovery.

No programming background or in-house infrastructure required.

What MergenKit delivers

Three principles, applied to every prediction.

MergenKit replaces fragmented manual routines with a unified pipeline. Each output is paired with the analytical context required for rigorous lead discovery, then preserved in a documentation structure aligned with regulated assessment frameworks. The platform exists because computational drug discovery should not require a dedicated software engineering team to run reproducibly.

Explainability

Every prediction is paired with an analysis that links model output to the underlying molecular features, presented through interactive visualisations alongside a built-in descriptor dictionary with mathematical formulation and chemical context for each term. A prediction without interpretation is a number without meaning.

Reproducibility

A single pipeline runs every study the same way, minimising human-dependent variability and preventing procedural inconsistencies across iterative research. Configuration, preprocessing, and scaling parameters travel with every exported model, so the same study can be replicated externally or extended on a new compound batch.

Scientific rigour

Outputs are auditable and grounded in the physical and chemical context required for lead discovery. Applicability domain analysis accompanies every prediction so researchers can judge whether a molecule lies within the chemical space the model was trained on, and when a result needs confirmatory work.

Why this matters

Predictive chemistry, without the tooling tax.

Drug discovery teams today move data between separate tools at every stage: one environment for descriptor calculation, another for modelling, a third for interpretation, a fourth for report writing. Each handoff is an opportunity for procedural drift, for undocumented configuration choices, for the small inconsistencies that quietly undermine reproducibility.

For pharmaceutical R&D

Teams that already model structure-activity relationships in spreadsheets and notebooks gain a workflow that is rigorous by default. The platform's reporting layer produces records compatible with documentation frameworks used in regulated assessments, so the scientific record is ready for the discussions that follow it.

For biotech startups

Small teams do not always have the headcount for a dedicated cheminformatics platform engineer. MergenKit provides the workflow without the infrastructure burden, so a scientific founder can move from molecular structures to a defensible prediction in days, rather than after months of tool integration.

For academic groups

Doctoral and postdoctoral researchers gain a system where rigorous, reproducible modelling is the default rather than a discipline imposed on top of disconnected scripts. The reporting structure aligns with the documentation expectations of peer-reviewed cheminformatics work.

Built for

Computational scientists: Modelling, interpretation, and reporting in one environment, with full control over descriptor selection, validation strategy, and export of every artefact produced during training.
Pharmaceutical R&D teams: A standardised workflow without setting up infrastructure, training engineers, or stitching together several different tools for the stages of a single predictive chemistry study.
Academic drug discovery groups: Rigorous, reproducible outputs with no programming background required, suitable for thesis-grade and publication-grade work and the documentation reviewers expect.

Documentation

Aligned with regulated frameworks.

Scientific reports follow established structures used in regulated assessments.

QMRF, QPRF, ICH M7, REACH

MergenKit generates model documentation in the QSAR Model Reporting Format (QMRF) and prediction documentation in the QSAR Prediction Reporting Format (QPRF), compatible with frameworks such as ICH M7 and REACH. A QMRF describes the model itself, including the training data, the validation strategy, and the applicability domain. A QPRF documents an individual prediction made using a documented model, including whether the input molecule lies inside the applicability domain. Both records are generated from the same configuration that drove the modelling run, so the documentation matches the science.

The platform does not act as a regulatory authority. Regulatory decisions remain the responsibility of the user organisation. The scientific reporting layer is built to support that decision-making, not to replace it. Read more about the reporting structures and the scientific principles applied in every run.

From input to report

One guided workflow, end to end.

Researchers start from molecular structures in SMILES format paired with target variables, or from a previously prepared feature matrix. Inputs are validated, standardised, and deduplicated, then partitioned by molecular scaffold so that closely related structures do not appear in both the training and the test set. Modelling, interpretation, and reporting proceed through a single configuration interface, with applicability domain analysis attached to every prediction.
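For readers who want to see the idea behind scaffold-aware partitioning, here is a minimal sketch in Python. It is an illustration under stated assumptions, not MergenKit's implementation: the function name `scaffold_split` is hypothetical, each scaffold key is assumed to have been computed beforehand (for example as a Bemis-Murcko scaffold from a cheminformatics toolkit), and deduplication is reduced to dropping exact repeats of a record.

```python
from collections import defaultdict


def scaffold_split(records, train_fraction=0.8):
    """Split (smiles, scaffold_key) records so no scaffold spans both sets.

    Hypothetical helper: scaffold keys are assumed to be precomputed.
    """
    # Drop exact duplicate records while keeping the original order.
    unique = list(dict.fromkeys(records))

    # Group molecules by their scaffold key.
    groups = defaultdict(list)
    for smiles, scaffold in unique:
        groups[scaffold].append(smiles)

    # Place the largest scaffold groups first; a group is never divided.
    ordered = sorted(groups.values(), key=len, reverse=True)
    target = train_fraction * len(unique)

    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test
```

Because whole scaffold groups are assigned together, the realised training fraction can differ from the requested one. That is the usual trade-off of this kind of split: a slightly uneven partition in exchange for a test set that contains scaffolds the model has not seen.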

The three analytical modules share that same workflow infrastructure. Predictive modelling handles studies with sufficient labelled data; transfer learning addresses sparse-data cases by adapting a base model trained on open chemistry data to the target task; multi-objective optimization evaluates competing discovery goals such as activity, toxicity, and solubility concurrently, surfacing Pareto-optimal candidates for prioritisation. Explore the three analytical modules in detail, or read about the scientific principles that shape the platform's design.
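To make "Pareto-optimal" concrete, the sketch below selects the candidates that no other candidate beats on every objective at once. It is a simplified illustration rather than the platform's optimisation routine: the function names, the candidate labels, and the example values are all hypothetical, and each objective is assumed to be a plain number that is either maximised or minimised.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Both tuples are assumed to be oriented so that larger is better.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates, directions):
    """Return the names of non-dominated candidates.

    candidates: dict mapping a name to a tuple of objective values.
    directions: tuple of +1 (maximise) or -1 (minimise), one per objective.
    """
    # Flip the sign of minimised objectives so that larger is always better.
    oriented = {
        name: tuple(d * v for d, v in zip(directions, values))
        for name, values in candidates.items()
    }
    return [
        name
        for name, score in oriented.items()
        if not any(
            dominates(other_score, score)
            for other, other_score in oriented.items()
            if other != name
        )
    ]


# Illustrative values only: (activity, toxicity, solubility).
example = {
    "A": (8.0, 0.2, 0.5),
    "B": (7.0, 0.1, 0.6),
    "C": (6.0, 0.3, 0.4),
}
front = pareto_front(example, directions=(+1, -1, +1))
```

In this example candidate C is dropped because A is more active, less toxic, and more soluble. A and B both remain because each is better than the other on at least one objective, which is exactly the trade-off a researcher then has to weigh.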

No programming background or in-house infrastructure is required to operate the system. The workflow runs through configuration menus rather than code, so medicinal chemists and pharmacology team leads can drive a study end to end. Computational scientists keep full access to inspect, export, and extend every artefact the workflow produces, including the configuration that drove each run.

The platform handles the workflow plumbing so teams focus on hypothesis generation and candidate evaluation, which is where scientific judgement matters most. Configuration choices are recorded and exported with every study, so a result can be reproduced months later or audited by a colleague who was not present when the run was started.

Built around

A working methodology, not a feature list.

MergenKit's design encodes a set of practical commitments that come from research experience rather than product marketing. They show up in every part of the workflow.

Transparent interpretation

Feature attribution, the relevant descriptor dictionary entries, and the applicability domain check appear in the same interface as the prediction itself. Researchers do not move between three tools to interpret one result. Explanation is the default operating mode, not an optional analysis run after the fact.

Honest scope

Applicability domain analysis attaches to every prediction so a researcher knows when the model is operating within its training distribution and when it is not. Out-of-domain predictions are flagged rather than hidden, protecting the scientific record from quietly overconfident inferences.
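One common way to express an applicability domain check is a distance rule in descriptor space: a query molecule is flagged when it sits further from its nearest training neighbours than training molecules typically sit from each other. The sketch below shows that idea only. It is not a description of the method MergenKit uses; the function names, the choice of k, and the three-standard-deviation cut-off are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_knn_distance(point, reference, k):
    """Mean distance from point to its k nearest vectors in reference."""
    distances = sorted(euclidean(point, r) for r in reference)
    return sum(distances[:k]) / k


def fit_domain_threshold(train, k=3, z=3.0):
    """Derive a distance cut-off from the training set itself.

    train must be a list of descriptor vectors. The cut-off is the mean
    leave-one-out k-NN distance plus z standard deviations (an assumption).
    """
    scores = [
        mean_knn_distance(p, train[:i] + train[i + 1:], k)
        for i, p in enumerate(train)
    ]
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return mean + z * std


def in_domain(point, train, threshold, k=3):
    """True if the query lies within the distance cut-off."""
    return mean_knn_distance(point, train, k) <= threshold
```

A prediction for a query that fails `in_domain` would be reported with a flag rather than suppressed, which matches the behaviour described above. In practice, descriptors would be scaled before distances are computed, since unscaled features with large ranges dominate a Euclidean distance.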

Documented by design

The reporting layer is part of the workflow, not a separate tool added at the end. QMRF model documentation and QPRF prediction documentation are generated from the same configuration that drove the modelling run, so the record matches the run.

Each of these commitments has a long history in the QSAR and QSPR literature and is a working assumption of any rigorous publication in computational chemistry, which is why MergenKit treats them as defaults rather than features. Read the longer treatment on the Science page, or learn about the founder's research background in computational chemistry, machine learning, and molecular modelling.

See MergenKit on your data.

Request a demo to discuss how the platform can support your discovery workflow. The conversation begins with your use case, not with a license sale.

Request a demo