About BAGIM

BAGIM is an active community of Boston area scientists bringing together people from diverse fields of modeling and informatics to impact life and health sciences. BAGIM strives to create a forum for great scientific discussions covering a wide range of topics including data management, visualization, computational chemistry, drug discovery, protein structure, molecular modeling, structure-based drug design, data mining, software tools, and the sharing of goals and experiences. Our community is made up of participants from academia, government, and industry whose goal is to engage in the discussion of science involving a synthesis of theory and technology. Discussions sponsored by BAGIM are targeted to the needs and interests of informatics scientists, computational chemists, medicinal chemists, and statisticians. BAGIM also provides opportunities for networking within these disciplines as well as an arena for the dissemination of information of specific interest to the membership.

Wednesday, November 4, 2020

Andrew Dalke: Cheminformatics before there was Cheminformatics

Join us on Thursday, October 29, 2020

Cheminformatics Before "Cheminformatics"

The roots of modern cheminformatics, which I regard as the machine-based organization, analysis, and prediction of chemical structures and their properties, especially as applied to drug discovery, stretch back to the 1940s, when punched cards were used to find structure-activity relationships and to search chemical documentation. Computerization started in the 1950s, with large-scale projects taking place in the 1960s and commercial systems arriving by the 1970s. The process required developing new forms of chemical nomenclature, new methods for data entry, and new algorithms like canonicalization and substructure matching. This work was primarily funded by the cost savings from finding known chemical information. While it has long been known that similar molecules tend to have similar properties, it wasn't until the 1980s that there was widespread research into different ways to automate similarity comparisons and apply them to property prediction. In parallel, Moore's law made it possible to work with corporate-sized data sets in memory, allowing an orders-of-magnitude increase in effective performance and enabling the large-scale clustering and other early machine learning projects of the 1990s. This in turn helped the field transform from chemical documentation and chemical information management into cheminformatics.

My presentation will highlight some of the people, ideas, hardware, software, and economic factors from the 50 years of cheminformatics before the modern label was coined and hopefully show both how far we've come and how little things have changed.

Andrew Dalke started working on molecular dynamics in 1992, in a summer programming job at Florida State University, before starting graduate school at UIUC. He joined Klaus Schulten's group, where he was one of the co-authors of VMD and NAMD. He was the full-time VMD developer for two years, then went west to the Bay Area to develop modeling and bioinformatics software for the Molecular Applications Group, and then southwest to Santa Fe to develop cheminformatics software to support machine learning at Bioreason. At the same time he promoted Python and open source software in bioinformatics and cheminformatics, including co-founding the Biopython project and supporting the Open Bioinformatics Foundation. He has been an independent consultant since 1999. His income comes from custom software development for cheminformatics, Python training for cheminformaticians, and sales of chemfp, a high-performance fingerprint similarity search package.

Andrew lives in Trollhättan, Sweden.
