This project has been funded in whole or in
part with Federal funds from the National Cancer Institute, National
Institutes of Health, under Contract No. N01-CO-12400. The content of
this publication does not necessarily reflect the views or policies of
the Department of Health and Human Services, nor does mention of trade
names, commercial products or organizations imply endorsement by the
U.S. Government.
Contains part of an electronic structure collection donated by Tudor
Oprea. This set originates from a compilation of ~4.5 million compounds
commercially available in August 2002. These were collected from CDs
offered by 10 vendors. The structures were processed into a standardized
format using OpenEye’s FILTER software
(http://www.eyesopen.com/products/applications/filter.html). Compliance
with Lipinski’s Rule-of-5 was enforced (no violations allowed), and
several "undesirable" chemical substructures were removed. A low cutoff
for drug-like scores (scores > 0.2) was implemented in order to further
remove chemicals that were very different from the then-accepted
medicinal chemistry space. Approximately 2.5 million compounds passed
these filters, and these were subsequently subjected to diversity
selection using D-optimal design and a 2D-based descriptor system (mostly
topological indices, atom counts, and LogP-type descriptors), in order
to arrive at the final collection of ~800K compound structures.
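
The two property-based filters described above can be illustrated with a minimal sketch. It is not the FILTER software's implementation: it assumes the descriptors (molecular weight, LogP, hydrogen-bond donor and acceptor counts, and a drug-like score) have already been computed by some other tool, it reads "scores > 0.2" as meaning that compounds scoring above 0.2 were retained, and the `Compound` record and the sample values are hypothetical. The thresholds are the standard Rule-of-5 limits.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # Hypothetical record; descriptor values are assumed to be precomputed.
    name: str
    mol_weight: float      # molecular weight in daltons
    logp: float            # calculated octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donor count
    h_acceptors: int       # hydrogen-bond acceptor count
    druglike_score: float  # drug-likeness score from an external model


def lipinski_violations(c: Compound) -> int:
    """Count how many of the four Rule-of-5 criteria the compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_filters(c: Compound, min_score: float = 0.2) -> bool:
    """Keep a compound only if it has no Rule-of-5 violations and its
    drug-like score is above the cutoff (assumed reading of 'scores > 0.2')."""
    return lipinski_violations(c) == 0 and c.druglike_score > min_score


# Made-up example values, for illustration only.
compounds = [
    Compound("cmpd_a", 320.4, 2.1, 2, 5, 0.65),   # passes both filters
    Compound("cmpd_b", 612.7, 3.0, 3, 8, 0.70),   # too heavy: one violation
    Compound("cmpd_c", 250.0, 1.0, 1, 3, 0.10),   # drug-like score too low
]

kept = [c.name for c in compounds if passes_filters(c)]
print(kept)
```

Substructure removal and the D-optimal diversity selection are not covered by this sketch; both depend on structure handling and descriptor sets that the text does not specify in detail.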
A description of the technical platform and data pipeline of ChemBank
can be found here.
The ChemBank team
Stuart Schreiber - Principal Investigator, ICG
Mike Foley - Director, Platform
Dave DeCaprio - Director, Informatics
Greg George - Group Leader, Software Development
Kathy Seiler - Head of Biological Data Management
Nurgees Banu Sulthan - Senior Software Engineer
Steve Brudz - Senior Software Engineer
Bob Brady - Senior Software Engineer
Dan Durkin - Software Engineer III
Josh Nichols - Software Engineer I
Paul Clemons - Institute Fellow & Head of Computational Science