Drug Formulation Development

Pharmaceutical formulation data include formulation compositions and manufacturing process parameters. One of the main difficulties in formulation prediction is that the available experimental data are limited and unstandardized, so datasets are small and cover the input space unevenly, which leads to overfitting and poor generalization. Artificial intelligence methods can help uncover the intricate correlations between pharmaceutical formulations and their in vitro and in vivo characteristics.

The AI-driven formulation platform supported by CD ComputaBio is designed to enable the targeted, smart development of novel drug candidates. Through the integration of machine learning, deep learning, quantum simulation, and high-throughput experimentation, our experts enable formulation scientists to develop clinically differentiated products rapidly, comprehensively, and intelligently. The cross-disciplinary integration of pharmaceutics and artificial intelligence may shift the paradigm of pharmaceutical research from experience-dependent studies to data-driven methodologies. These intelligent ways of working can transform drug development and drug product lifecycle management, and ultimately bring more high-quality drug products to patients. AI methods such as artificial neural networks (ANNs) and deep learning strategies can speed up development, optimize formulations, reduce cost, and maintain product consistency.


Figure 1 Drug Formulation Development Platform

Iterative Learning Cycle of AI-driven Drug Formulation

  • Pharmaceutical datasets (formulation and experimental data extracted from Web of Science, divided into training / validation / test datasets) and dataset selection. Generation of massive volumes of highly accurate, semantically consistent observational facts from the biomedical literature and other sources. Development of pre-curated vocabularies to enable lexical matching and to handle synonym variations across data sources.
  • Molecular descriptors (molecular weight, XLogP3, hydrogen bond donor count, hydrogen bond acceptor count, rotatable bond count, topological polar surface area, heavy atom count, complexity, and logS)
  • Process parameters (weight, thickness, tensile strength, elongation, folding endurance, actual drug content, granulation process, diameter, hardness, etc.)
  • Data splitting algorithm (random or manual selection)
  • Fully connected deep feed-forward network
  • Model systems (construction of regression models)
  • Evaluation criteria, including specific criteria for evaluating model performance
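As a rough illustration of the data splitting and evaluation steps listed above, the sketch below shows a random training / validation / test split and two common regression metrics. It is a minimal example under stated assumptions, not the platform's implementation: the function names, the 80/10/10 split fractions, and the choice of RMSE and R² as evaluation criteria are all hypothetical.

```python
import math
import random


def random_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle records and cut them into train / validation / test lists.

    The 80/10/10 default is an illustrative assumption, not a recommendation.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination (1.0 means a perfect fit)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return 1.0 - ss_res / ss_tot
```

With small, imbalanced formulation datasets, a purely random split can leave the validation or test set unrepresentative, which is one motivation for the manual and dissimilarity-based selection schemes mentioned on this page.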

Table 1 Recent progress of machine learning in formulation design (Yilong Yang, et al. 2018)

Drawing on large volumes of data, intelligent tools that implement bioinformatics and machine learning methods have been developed for drug research and discovery. CD ComputaBio also offers biomarker discovery and targeted proteomics services to researchers who want to benefit from our technology.


  • Deep Neural Networks (DNNs, with five and four hidden layers)
  • Artificial Neural Networks (ANNs)
  • Expert Systems (ESs)
  • Deep Convolutional Networks
  • Maximum Dissimilarity algorithm with the small group filter and representative initial set selection (MD-FIS)
  • Naive Bayes
  • Support Vector Machines (SVM)
  • Random forests (RF, ensemble learning method)
  • Multitask Deep Learning
  • One-shot Learning
  • Multiple Linear Regression (MLR, simple and easy to model)
  • Partial Least Squares Regression (PLSR)
  • k-Nearest Neighbors (k-NN)
  • Multivariate adaptive regression spline
  • Classification and regression tree
  • Hybrid systems of fuzzy logic and evolutionary computations
  • Cubist
  • Cross Validation (multi-fold)
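Two of the simpler entries in the list above, k-nearest neighbors and multi-fold cross validation, can be combined in a few lines. The following is a minimal sketch using only the Python standard library; the function names, the interleaved fold assignment, and the use of mean absolute error are assumptions made for illustration, and a production workflow would more likely rely on an established machine learning library.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; fold f holds every n_folds-th sample.

    Assumes n_samples >= n_folds so that no test fold is empty.
    """
    for f in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == f]
        train_idx = [i for i in range(n_samples) if i % n_folds != f]
        yield train_idx, test_idx


def cross_validated_mae(x, y, n_folds=5, k=3):
    """Mean absolute error of k-NN regression, averaged over the folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(x), n_folds):
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = [abs(knn_predict(train_x, train_y, x[i], k) - y[i]) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)
```

Here each row of `x` would be a feature vector (for example, scaled molecular descriptors and process parameters) and `y` the measured property. Because k-NN depends on distances, the features should be put on comparable scales before use.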

Figure 1 The workflow of MD-FIS algorithm (Yilong Yang, et al. 2018)
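To give a feel for the idea behind maximum dissimilarity selection, the sketch below implements the generic greedy max-min procedure: starting from one sample, it repeatedly adds the sample that lies farthest from everything already chosen. This is an assumption-laden illustration of the basic technique only. It does not reproduce MD-FIS, whose small group filter and representative initial set selection are not described on this page, and the function name, the Euclidean distance, and the fixed starting index are all hypothetical choices.

```python
import math


def max_dissimilarity_select(points, n_select, start=0):
    """Greedily pick n_select indices that are mutually far apart.

    points: list of equal-length numeric feature vectors.
    start:  index of the first selected point (an arbitrary choice here).
    """
    if not 1 <= n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] = distance from point i to its closest selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        candidates = [i for i in range(len(points)) if i not in selected]
        # choose the candidate that is farthest from the current selection
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected
```

The selected indices could serve as a validation or test set that spans the formulation space more evenly than a random draw, which matters when the dataset is small and imbalanced.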


  • Prediction of the water solubility of drugs, the epoxidation reactivity of molecules, in vivo and in vitro characteristics, drug-induced liver injury, and toxicity
  • Design of pharmaceutical experiments (reinforcement learning to improve process DoE)
  • Product quality control across the whole product lifecycle
  • High-dimensional optimization of the proportions of pharmaceutical excipients
  • R&D for Class II drugs and extension of the patent life of existing drugs (by developing and patenting new formulations)

CD ComputaBio provides AI-powered solutions for drug formulation development according to clients' detailed requirements. Our platform improves the efficiency and success rate of drug discovery and development with the help of advanced technologies.

