Join us on Thursday, October 29, 2020
Cheminformatics Before "Cheminformatics"
The roots of modern cheminformatics, which I regard as the machine-based organization, analysis, and prediction of chemical structures and their properties, especially as applied to drug discovery, stretch back to the 1940s, when punched cards were used to find structure-activity relationships and to search for chemical documentation. Computerization started in the 1950s, with large-scale projects taking place in the 1960s and commercial systems arriving by the 1970s. The process required developing new forms of chemical nomenclature, new methods for data entry, and new algorithms like canonicalization and substructure matching. This work was funded primarily by the cost savings from finding known chemical information. While it has long been known that similar molecules tend to have similar properties, it wasn't until the 1980s that there was widespread research into different ways to automate similarity and apply it to property prediction. In parallel, Moore's law made it possible to work with corporate-sized data sets in memory, allowing an orders-of-magnitude increase in effective performance and enabling the large-scale clustering and other early machine learning projects of the 1990s. This in turn helped the field transform from chemical documentation and chemical information management into cheminformatics.
My presentation will highlight some of the people, ideas, hardware, software, and economic factors from the 50 years of cheminformatics before the modern label was coined, and will hopefully show both how far we've come and how little things have changed.
Andrew Dalke started with molecular dynamics in 1992 as a summer programming job at Florida State University before starting graduate school at UIUC. He joined Klaus Schulten's group, where he was one of the co-authors of VMD and NAMD. He was the full-time VMD developer for two years, then went west to the Bay Area to develop modeling and bioinformatics software for the Molecular Applications Group, and then southwest to Santa Fe to develop cheminformatics software to support machine learning at Bioreason. At the same time he promoted Python and open source software in bioinformatics and cheminformatics, including co-founding the Biopython project and supporting the Open Bioinformatics Foundation. He has been an independent consultant since 1999. His income comes from custom software development for cheminformatics, Python training for cheminformaticians, and sales of chemfp, a high-performance fingerprint similarity search package.
Andrew lives in Trollhättan, Sweden.