BAGIM is an active community of Boston area scientists bringing together people from diverse fields of modeling and informatics to impact life and health sciences. BAGIM strives to create a forum for great scientific discussions covering a wide range of topics including data management, visualization, computational chemistry, drug discovery, protein structure, molecular modeling, structure-based drug design, data mining, software tools, and the sharing of goals and experiences. Our community is made up of participants from academia, government, and industry whose goal is to engage in the discussion of science involving a synthesis of theory and technology. Discussions sponsored by BAGIM are targeted to the needs and interests of informatics scientists, computational chemists, medicinal chemists, and statisticians. BAGIM also provides opportunities for networking within these disciplines as well as an arena for the dissemination of information of specific interest to the membership.

Saturday, March 27, 2021

Cambridge Crystallographic Data Centre presents... Greg Warren, Doree Sitkoff, and Vera Prytkova

A BAGIM/SAGIM joint production.

Thursday, April 15, 2021

4:30 pm PT (UTC-7)

7:30 pm ET (UTC-4)

Sifting through poses: Applying crystal-based conformational health assessments in ligand docking

Doree Sitkoff, Principal Scientist in Computational Chemistry, Discovery Chemistry, Janssen Research and Development

Molecular docking is a primary computational chemistry tool for hypothesizing how organic ligands bind to macromolecular receptor sites. For a variety of reasons, however, docking software packages can sometimes suggest binding models in which the ligand is conformationally strained. A rapid and reliable way to identify such poses would provide added context when assessing docking results. Here, we calculate a ligand geometry fitness score, which coarsely estimates the torsional health of a docked pose, as a supplement to the calculated docking score. The geometry fitness score is based on the Mogul knowledge base of molecular conformations derived from the Cambridge Structural Database (CSD). Applications of this additional geometric score to examining preferred binding poses and to virtual screening will be discussed.

Sifting through the PDB: Would you prefer a diamond or coprolite for that engagement ring?

Gregory Warren, Director of Computational Chemistry, DeepCure

Historically, computational chemists have paid little attention to quality metrics for X-ray and neutron diffraction crystal structures other than resolution. While resolution is a useful (and easily obtained) metric, it is not sufficient on its own. This presentation will discuss metrics that are better suited to assessing the quality and/or reliability of structures before selecting them for use. Data showing how structure choice affects docking and ligand strain estimates will be presented.

Structural databases in drug discovery: extracting useful information from the CSD and the PDB

Vera Prytkova, Research and Applications Scientist, CCDC

Knowledge of molecular conformations and interactions derived from small-molecule and protein structures can have a significant impact on drug discovery. Structural databases can be mined to identify interaction patterns or potential scaffold hops, helping researchers design novel motifs and retrieve a diversity of ligand topologies. Statistically significant information about molecular conformations and intermolecular interactions can help a researcher evaluate the probability of observing a particular conformation of a newly designed drug in the binding site. Finally, conformational analysis of a potential drug candidate supports stability analysis for solid-form development. This presentation will cover specific aspects and examples of insights derived from structural databases that are useful for drug discovery efforts.

Tuesday, January 12, 2021

Steven Kearnes: Pursuing a Prospective Perspective

Join us on Tuesday, January 26, 2021, from 12:00 PM to 1:30 PM EST

We spend a lot of time building models and comparing them to other models. The field generally agrees that forward-looking predictions are the best validation, but even prospective validation can be misleading if it does not consider the actual deployment context: How does the model affect compound selection? What controls are in place to ensure that downstream uses of model predictions are consistent? As a concrete example, I'll review our recent work with DNA-encoded libraries. I'll conclude with a summary of the Open Reaction Database, a large-scale data gathering effort with some interesting possibilities for prospective applications.

Presentation: https://youtu.be/6IDKEIln1JM

Wednesday, November 4, 2020

Andrew Dalke: Cheminformatics before there was Cheminformatics

Join us on Thursday, October 29, 2020

Cheminformatics Before "Cheminformatics"

The roots of modern cheminformatics, which I regard as the machine-based organization, analysis, and prediction of chemical structures and their properties, especially as applied to drug discovery, stretch back to the 1940s, when punched cards were used to find structure-activity relationships and to search chemical documentation. Computerization started in the 1950s, with large-scale projects taking place in the 1960s and commercial systems appearing by the 1970s. The process required developing new forms of chemical nomenclature, new methods for data entry, and new algorithms such as canonicalization and substructure matching. These efforts were funded primarily by the cost savings from finding known chemical information. While it has long been known that similar molecules tend to have similar properties, it wasn't until the 1980s that there was widespread research into different ways to automate similarity and apply it to property prediction. In parallel, Moore's law made it possible to work with corporate-sized data sets in memory, allowing an orders-of-magnitude increase in effective performance and enabling the large-scale clustering and other early machine learning projects of the 1990s. This in turn helped the field transform from chemical documentation and chemical information management into cheminformatics.

My presentation will highlight some of the people, ideas, hardware, software, and economic factors from the 50 years of cheminformatics before the modern label was coined, and will hopefully show both how far we've come and how little things have changed.

Andrew Dalke started with molecular dynamics in 1992 as a summer programming job at Florida State University before starting graduate school at UIUC. He joined Klaus Schulten's group, where he was one of the co-authors of VMD and NAMD. He was the full-time VMD developer for two years, then went west to the Bay Area to develop modeling and bioinformatics software for the Molecular Applications Group, and then to the southwest, to Santa Fe, to develop cheminformatics software to support machine learning at Bioreason. At the same time, he promoted Python and open source software in bioinformatics and cheminformatics, including co-founding the Biopython project and supporting the Open Bioinformatics Foundation. He has been an independent consultant since 1999. His income comes from custom software development for cheminformatics, Python training for cheminformaticians, and sales of chemfp, a high-performance fingerprint similarity search package.

Andrew lives in Trollhättan, Sweden.

Presentation: https://youtu.be/y6dUkCxlrd8

Future Directions in Medicinal Chemistry

September 29: Noon GMT-5 (Eastern Standard Time)

In collaboration with Novartis, BAGIM is pleased to announce our first multi-continent panel on current issues in the pursuit of medicinal chemistry.

Medicinal chemistry is a challenging research area crossing multiple scientific boundaries. Finding and optimizing chemical entities toward a drug candidate is a process in which complex and multi-factorial problems are addressed. Potent compounds not only need to be synthetically feasible, but they also need to strike an acceptable balance between efficacy and safety, while being unique enough to be marketable.

Today’s medicinal chemist needs to be akin to a Renaissance man: knowledgeable in different fields, able to work effectively with other scientists on drug discovery campaigns, able to hunt for needles in haystacks, and able to communicate disparate information clearly enough to enable efficient internal and external collaborations.

Much effort, both technical and financial, has been invested to support early drug discovery workflows. High-throughput experimental techniques, lab automation, and computational prediction tools hold great promise and are still being actively developed in drug research programs. Their continued development suggests that there is still much room for improvement.

In this panel discussion, we will discuss:

(1) the main challenges medicinal chemists still face in their research today;

(2) which of those challenges deserve more attention to improve the efficiency of drug discovery research programs; and

(3) overall, what a happy medicinal chemist would look like within the next 10 years.

Keynote Speaker: Derek Lowe (20 minutes)

Panelists (20 minutes each)

* Anthony Donofrio (Merck)

* Peter Schmidtke (Discngine/Novartis Collaboration)

* Connor Coley (MIT Chemical Engineering)

The talks will be followed by a moderated Q&A session.

Tuesday, July 14, 2020

BAGIM (Virtual) Event: Clayton Springer

Join us on July 28th at 6:00 pm for a seminar by Clayton Springer entitled “Self Driving Chemical Space Exploration.” Clayton Springer is in the Computer-Aided Drug Design group at Novartis.

He graduated with a B.S. in Chemistry and a minor in Math from Penn State, where he was introduced to applying computers to chemistry in Kennie Merz’s group. From there, he earned his PhD with Martin Head-Gordon at UC Berkeley in the Chemistry department. He did his postdocs at Sandia National Labs with Diana Roe and at UCSF with Fred Cohen.

At Novartis, he provides medicinal chemistry-based support for projects and technology development. His disclosed project work includes Selective Estrogen Receptor Degraders (SERDs); BCL-2; hERG inhibition; Matrix Metalloproteinase; Protein Kinase D; and Aldosterone Synthase.

On the technology side, he has been involved in Matched Molecular Pairs (MMP) and machine learning for ADMET, specifically hERG, Human Serum Albumin (HSA), and Volume of Distribution. He was part of the team that developed Novartis’ internal med-chem productivity tool called Focus.

His recent activities in library design led him to the topic of this talk, “Self Driving Chemical Space Exploration.”

You must register in advance for this Zoom meeting via Meetup. All registered members will receive the link in advance.

Friday, May 1, 2020

DE Shaw Simulation Data and GitHub Links: COVID-19 Panel

Dear BAGIM Community,
Last night, we held our first-ever virtual event. Panelist Michael LeVine, from DE Shaw Research, has forwarded links to their simulation data and GitHub, along with information on accessing Anton2.

Our simulation data and software can be found here:
Our Github, which includes our software msys, viparr, and our force fields, is here:
Information for getting access to the Anton2 machine at the Pittsburgh Supercomputing Center can be found here:
If there are any questions, please email us at sarscov2@DEShawResearch.com!
Thanks again everyone!

Thursday, April 30, 2020


We had a very successful virtual event tonight discussing the impact of computation on COVID-19 efforts. After this event, we received this letter from Rommie Amaro of UCSD (a panelist at tonight's event and a former mentor of mine).

Dear BAGIM Community,

I want to bring your attention to a new resource to help coordinate the CADD community on opportunities to contribute to COVID-19 research on potential new therapies and COVID-19 biology:

The COVID-19 Molecular Structure and Therapeutics Hub [http://covid.molssi.org] is an open data repository where biomolecular scientists can gather and disseminate critical information on the community’s efforts in targeting COVID-19. The main goals of the Hub are to provide a central resource for researchers in CADD, drug discovery, molecular modeling, and molecular simulation to not only share their efforts, but benefit from the intermediate or end products of others contributing to various aspects of COVID-19 research and drug discovery.

The COVID-19 Molecular Hub organizes:

* Therapeutic opportunities (such as key target proteins and binding sites)
* Structural data (highlighting structures particularly suited to particular therapeutic opportunities)
* Structural models (suitable for CADD or biomolecular modeling purposes, often correcting significant modeling issues in the original structural data)
* Simulation data and ensembles associated with these structures and models
* Molecules and bioassay data of note (such as known inhibitors, inhibitors of related proteins, and molecules with available bioassay data)

We are calling on the CADD and biomolecular modeling communities currently working on COVID-19 (or interested in doing so) to help by contributing new information about structures, models, simulations, and molecules, or joining in on data curation and review teams in areas of their expertise. New contributions can easily be made via pull requests to the GitHub repository by following the instructions on the Contributions page.
The BAGIM community has so much to offer! Please join us in this effort.
Best wishes,
Rommie Amaro