ChemBank

registration

Registered users of ChemBank have full access to all features of the ChemBank website. Users logged in as "Guest" (available only on the public website) are not allowed to export data. Registered users will have access to additional features in future releases.

User Guide

Users of ChemBank are encouraged to explore the new ChemBank User Guide. Whereas this ChemBank help page consists of basic information about individual ChemBank features, the User Guide provides several interactive scenarios designed to help users combine ChemBank features to answer research questions.

ChemBank Entities

ChemBank stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Chemical Biology screening center in collaborations involving biomedical researchers worldwide. The following figure shows the main ChemBank entities and the navigation paths among them. Generally, when ChemBank displays an entity, it provides links to any associated entities of interest. For example, the 'Molecule Display' page includes links to the associated sample, plate, assay, and project.

Links (either in the left-hand navigation bar or on the ChemBank 'Welcome' page) provide several ways of searching for molecules, assays, and proteins. To find a project, click 'view projects'. To find a plate or a well on a plate, under 'Find Small Molecules', click 'by user list'. At present, this is the only mechanism provided by ChemBank for locating samples directly; it is an indirect mechanism, in that searching using plate and/or well returns a list of molecules, not a list of samples, per se.

Molecule Search

ChemBank contains structures for over 500,000 unique molecular entities. Each structure is stored with associated chemical information, including name(s), source information, molecular descriptors, assay results, and activity-related terms from the scientific literature.

Often, researchers come to ChemBank looking for information about a particular molecule. ChemBank provides a number of search options to help you find that molecule. Most commonly, you would search by one of the following:

molecule search builder

The molecule search builder allows you to build complex queries of ChemBank to find molecular structures associated with chemical names, source information, calculated molecular descriptors, measured assay-result values, and (for a subset of molecules) biological activities curated from the scientific literature. Each of these fields can be used to specify limits that restrict the set of molecules returned by a query. Each information type is specified through a different interface, designed to meet the specific needs of that information type. More than one search criterion can be specified by using the 'Select a criterion to add' drop-down box after the first criterion is defined. The current search function joins query conditions with Boolean AND operations, returning the intersection of the specified compound subsets. Future releases will support additional Boolean operators for search building.
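
The AND behavior described above amounts to a set intersection. The following Python sketch illustrates the idea, assuming each criterion has already been resolved to a set of ChemBankIDs; the IDs and criterion labels are hypothetical, and this is not ChemBank's actual implementation.

```python
from functools import reduce

def combine_criteria(criterion_hits):
    """Return the IDs matched by every criterion (Boolean AND)."""
    hit_sets = [set(ids) for ids in criterion_hits.values()]
    if not hit_sets:
        return set()
    return reduce(set.intersection, hit_sets)

# Hypothetical ChemBankIDs matched by each criterion.
hits = {
    "name criterion": {101, 102, 103},
    "descriptor criterion": {102, 103, 104},
    "assay criterion": {103, 105},
}
result = combine_criteria(hits)
```

Each added criterion can only shrink the result set, or leave it unchanged.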

search by assay

Small molecules within ChemBank can be searched according to results from all of the assays performed at our facility. There are numerous ways to search assay results. Under 'Find Small Molecules', click 'by assay' (either in the left-hand navigation bar or on the ChemBank 'Welcome' page).

The 'Search by assay' page shows the list of projects and assays arranged hierarchically by project. To display the assays associated with a project, click the plus sign [+] next to the project name. To close this list of assays, click the minus sign [-] next to the project name. Choose projects and assays of interest by clicking the checkbox next to the project or assay name. Alternatively, choose all projects and assays by clicking 'Check all'.

To filter the projects/assays appearing in the project and assay list, select appropriate filters for your search using the series of drop-down boxes and textboxes in the 'Filter' panel to the right of the project and assay list. As you change the filter settings, the project and assay list is dynamically repopulated with matching projects and assays. Note that to update the list after changing the project description filter, you must press the TAB key.

Once you have chosen a set of projects and/or assays, use the drop-down menu in the sentence below the project and assay list to select the molecules from these screens that you want returned. The options are: 'all screened molecules', 'molecules that were standard hits' (the default), or 'molecules satisfying the condition'. If you select the last option, a series of drop-down lists allows you to select a condition (the raw value or any calculated value) and a cutoff or range of interest.

Search by assay also allows you to limit results to a particular replicate. Most screens in ChemBank are done in replicate, and this option allows you to choose among the replicates. Generally, you should leave this set to 'any replicate'.

search using descriptors

ChemBank allows you to restrict search results to compounds falling within a specified range of one or more molecular descriptors. Molecular descriptors are values calculated for each small molecule in ChemBank based on the electronic representation of chemical structure. Currently, 35 different descriptors are available for filtering on an individual basis. Each descriptor falls into one of three main types:

  • Counts provide the number of occurrences of a specified element or structural feature in a molecule.
  • Indices are calculated metrics designed to describe molecular complexity.
  • Properties are estimates of physical characteristics of a molecule; while properties are calculated based on accepted models, they do not represent actual measured values.

In future releases, ChemBank will provide the capability to find molecules that are described by sets of descriptors that characterize molecular shape or electrotopological state.

Under 'Find Small Molecules', click 'using descriptors'. On the 'Search using descriptors' page, the descriptors available for compound searches are listed in the 'Descriptors' list box. Selecting a descriptor displays a brief description of the descriptor and how it is calculated, as well as the current minimum ('Min'), maximum ('Max'), and mean values for that descriptor across all structures in ChemBank. After selecting a descriptor for use as a search criterion, choose a range operator from the set of operators displayed in the drop-down box and enter a limit value. Currently supported range operators include:

  • = – exact equality, including all relevant decimal places; use of this operator is best restricted to descriptors that carry integer values
  • < – less than
  • > – greater than
  • <= – less than or equal to
  • >= – greater than or equal to
  • between – in a range of values bounded by inclusive lower and upper limits; this option provides two value-entry boxes, with the lower limit specified in the left box and the upper limit specified in the right box
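
The range operators listed above can be sketched in Python as follows. This is an illustration only: the function name, the descriptor values, and the IDs are hypothetical, and nothing here reflects how ChemBank evaluates these operators internally.

```python
import operator

# Map each range operator named in the list above to a predicate.
RANGE_OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

def matches(value, op, limit, upper=None):
    """Test a descriptor value against a range operator.

    For 'between', `limit` is the lower bound and `upper` the upper bound;
    both bounds are inclusive.
    """
    if op == "between":
        if upper is None:
            raise ValueError("'between' needs both a lower and an upper limit")
        return limit <= value <= upper
    return RANGE_OPERATORS[op](value, limit)

# Hypothetical integer descriptor values keyed by a made-up ID.
counts = {1: 0, 2: 2, 3: 5, 4: 7}
selected = sorted(cid for cid, v in counts.items() if matches(v, "between", 2, 5))
```

The sketch also shows why '=' is best kept for integer-valued descriptors: in floating-point arithmetic, `matches(0.1 + 0.2, "=", 0.3)` is false even though the two values look equal on paper.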

Optional filtering criteria are provided in the 'Filter' portion of the dialog box to help navigate the list of available descriptors:

  • The 'Type' drop-down box can be used to filter the set of displayed descriptors to one of the three types described above.
  • The 'Subtype' drop-down box allows further subdivision of the descriptor list. It will become increasingly useful as the number of descriptors available to the search capability is increased in future ChemBank releases.
  • The 'Source' drop-down box filters the displayed descriptor set by the calculation package used to generate the values in the database. Two such packages are currently used:
    • Pipeline Pilot (SciTegic, 9665 Chesapeake Drive, San Diego, CA 92123; (858) 279-8800)
    • QsarIS (Elsevier MDL, 2440 Camino Ramon, San Ramon, CA, 94583; (925) 543-5400)
    Additional descriptor calculation packages will be supported in future releases.
  • The 'Name' field can be used to filter the descriptors by name. Enter a word (string) and press the TAB key. The displayed descriptor set lists only descriptors with names that include that word.

Presently, it is not possible to select two different descriptors simultaneously. To apply cumulative search criteria, you must specify the first descriptor criterion, add it to a search, and then specify the second criterion by revisiting the 'Search using descriptors' interface.

search by molecule name

ChemBank allows you to search by the common name of a molecule (e.g., FK506 or fluorouracil), most often associated with bioactive small molecules. Many bioactive small molecules have several common names (synonyms). Under 'Find Small Molecules', click 'by molecule name' to search all synonyms that are stored in ChemBank. In the absence of wildcards, the search is for an exact match only; use wildcards to search for substrings. Thus, a search for "arginine" does not produce a result, since the common name is "L-arginine", but a search for "*arginine" finds L-arginine (and several other arginine derivatives).
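
The exact-match and wildcard behavior can be mimicked with Python's standard `fnmatch` module. This sketch is only an analogy: the synonym list is made up, the match is treated as case-sensitive (an assumption), and `fnmatch` also interprets `?` and `[...]`, which the ChemBank search may not.

```python
from fnmatch import fnmatchcase

def search_names(pattern, names):
    """Return the names matching pattern; '*' stands for zero or more characters.

    Without a wildcard, the pattern must equal the whole name.
    """
    return [n for n in names if fnmatchcase(n, pattern)]

# Hypothetical synonym list.
synonyms = ["L-arginine", "D-arginine", "FK506", "fluorouracil"]

exact = search_names("arginine", synonyms)       # no wildcard, no exact match
suffix = search_names("*arginine", synonyms)     # wildcard covers the prefix
```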

search by function

ChemBank compounds can be searched according to their biological function. Biological terms currently displayed within ChemBank are contained within three separate categories, called "controlled vocabularies". The vocabularies used to search by function are:

  • Therapeutic Indication – a disease treated using the compound (e.g., Alzheimer disease); these are MeSH (Medical Subject Heading) terms.
  • Therapeutic Use – a classification of biological action for the compound (e.g., anti-neoplastic); this is an internally derived vocabulary.
  • Biological Process – a biological process within cells or organisms affected by the compound (e.g., DNA replication); these terms are derived from the Gene Ontology project's Biological Process vocabulary.

To search for compounds that are associated with a term, under 'Find Small Molecules', click 'by function'. On the 'Search by function' page, first use the 'Ontology' drop-down menu to select the vocabulary you wish to search. Then, enter a specific term into the 'Term' textbox and click the 'search now' button. Your search returns a list of compounds that are associated with the term you selected. Click a ChemBankID in the resulting list to display the 'Molecule Display' page for more details.

Terms are case sensitive. You can enter a specific term by typing it into the 'Term' textbox. Alternatively, you can click the looking-glass icon to the right of the 'Term' textbox and browse through the vocabularies for terms. Click a plus [+] sign to expand that section of the vocabulary so you can choose a search term.

search by similarity

ChemBank allows the retrieval of compounds structurally similar to a target structure that you specify. The interface for definition of the target structure is the JME Molecular Editor, developed by Peter Ertl. Target structures are specified by drawing directly in the Editor or by entering a SMILES string in the textbox below the editor. The Editor is intended to be used without any special documentation or training, but there are some tricks that can help you to work with it more efficiently.

Similarity searches use Daylight fingerprints, which are binary representations of the chemical features in a molecule. A mathematical distance is computed between the fingerprint of the target molecule and the fingerprint of every other molecule in the database. Three methods (i.e., three different mathematical distance metrics) are provided for calculating these distances. All three methods produce values bounded by 0 and 1, and these values are used to rank molecules for their similarity to the target. As part of the search criteria, you specify a bound (threshold) for minimum similarity or distance to the target molecule. The three distance metrics compute similarity as follows:

  • Tanimoto similarity – proportion of total features present that are shared between target and query; the total number of features present differs for each pairwise comparison (threshold is specified as a lower bound).
  • Euclidian distance – proportion of total features defined that are different between target and query; the total number of features defined is the same for each pairwise comparison, but most features are populated with values of zero (threshold is specified as an upper bound).
  • Tversky similarity – proportion of weighted total features present that are shared between target and query; the total number of features present differs for each pairwise comparison, and features are weighted to provide greater contributions for features present in the target or query (threshold is specified as a lower bound). The Tversky similarity search is an advanced topic; refer to Daylight documentation for a full description.
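
To make the three metrics concrete, the sketch below computes them for two small fingerprints, each written as the set of bit positions that are switched on. The formulas are the textbook forms of each metric. Whether ChemBank or Daylight applies exactly these forms (for example, the square root in the normalized Euclidean distance) is an assumption, and the fingerprints and the 16-bit length are made up.

```python
import math

def tanimoto(target, query):
    """Shared features divided by features present in either molecule."""
    union = len(target | query)
    return len(target & query) / union if union else 0.0

def euclidean_distance(target, query, n_bits):
    """Euclidean distance between bit vectors, normalized to the range 0..1."""
    differing = len(target ^ query)  # bits set in exactly one fingerprint
    return math.sqrt(differing / n_bits)

def tversky(target, query, alpha, beta):
    """Tversky index; alpha weights target-only bits, beta query-only bits."""
    shared = len(target & query)
    denom = alpha * len(target - query) + beta * len(query - target) + shared
    return shared / denom if denom else 0.0

# Hypothetical fingerprints: positions of the bits that are on.
target = {1, 4, 7, 9}
query = {1, 4, 8}
```

With alpha = beta = 1, the Tversky index reduces to the Tanimoto coefficient. Unequal weights make the comparison asymmetric, which is why a threshold on it behaves differently from a Tanimoto threshold.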

search by chemist

ChemBank allows you to retrieve compounds submitted to the database from a specified source. Sources can be individual chemists, commercial vendors, or other sources of compounds or molecular structures. You can enter a name by typing it into the 'Name' box, optionally using an asterisk (*) wildcard to represent 0 or more characters. Alternatively, you can click the looking-glass icon to the right of the 'Name' box and browse through the names to select one.

For synthetic chemists, the last name of a chemist is listed, followed by the first initial separated by a hyphen, followed by the chemist's institution separated by a slash (e.g., Mazitschek-R/Harvard). For commercial vendors, some sources represent named subsets of compounds, and these names are separated from the vendor name using a slash (e.g., Chembridge/DiverSetE). In addition to chemists and commercial vendors, three additional values for Source are listed: 'Bioactives Collection', 'NationalInstitutesOfHealth', and 'NaturalProduct'. Each of these values for Source is a "catch-all" that groups compounds whose actual sources are widely variable, unavailable, or both. In future ChemBank releases, information on members of these groups will be elaborated to provide more information about the actual source of the compound (e.g., the species of origin of a natural product).
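
If you process exported Source values in your own scripts, the naming convention described above can be split apart as sketched below. The `Source` type and `parse_source` function are hypothetical helpers, not part of ChemBank, and the rule that a single trailing character after a hyphen is an initial is an assumption.

```python
from typing import NamedTuple, Optional

class Source(NamedTuple):
    name: str                # chemist's last name, vendor, or catch-all value
    initial: Optional[str]   # chemist's first initial, if present
    group: Optional[str]     # institution or named vendor subset, if present

def parse_source(text):
    """Split 'Last-I/Institution' or 'Vendor/Subset' into its parts."""
    head, _, group = text.partition("/")
    name, _, initial = head.rpartition("-")
    # Assumption: only a single character after the last hyphen is an initial.
    if not name or len(initial) != 1:
        name, initial = head, ""
    return Source(name, initial or None, group or None)

chemist = parse_source("Mazitschek-R/Harvard")
vendor = parse_source("Chembridge/DiverSetE")
```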

search by substructure

ChemBank allows the retrieval of compounds containing a target substructure that you specify. The interface for definition of the target substructure is the JME Molecular Editor, developed by Peter Ertl. Target substructures are specified by drawing directly in the Editor or by entering a SMILES string or SMARTS query in the textbox below the editor. The Editor is intended to be used without any special documentation or training, but there are some tricks that can help you to work with it more efficiently.

Atom or bond substitutions can be specified with the Editor's QRY button, or using appropriate syntax in SMARTS queries. Chiral centers can be specified using wedge bonds in the Editor, or using appropriate syntax in SMILES strings or SMARTS queries. When no stereochemistry is specified for a chiral center, both enantiomers are found in the search. Geometric isomers must be specified using the stereo tool in the Editor (click the wedge-bond button, then click the isomeric double bond to "lock" the configuration of attached atoms) or using appropriate syntax in SMILES and SMARTS strings. When E or Z stereochemistry is not explicitly assigned, both configurations are found in the search.

You may also paste in SMILES strings from ChemDraw. To accomplish this, in ChemDraw, select the structure and then, on the 'Edit' menu, select the 'Copy As' sub-menu and select 'SMILES'. In ChemBank, you can then paste this SMILES text in the textbox below the JME editor on the 'Search by substructure' page.

After defining the target substructure, click the 'search now' or 'add to search' button.

search by user list

To search for a list of compounds, under 'Find Small Molecules', click 'by user list'. On the 'Search by user list' page, you can enter a list of molecule names, SMILES strings, plate numbers, plate and well numbers, or ChemBankIDs. After entering the identifiers, click the 'search now' or 'add to search' button.

molecule search results

After executing a compound search query, ChemBank displays the results as a list that includes compound structure, ChemBankID, and values for any data fields used to restrict the search. Ten compounds are returned per page, with the total number of compounds found, and the total number of pages to display, indicated at the top of each results page. For substructure searches, the list is sorted by the length of the SMILES string for each structure in ascending order. For similarity searches, the list is sorted by the similarity score in ascending order for the Euclidian distance metric and in descending order for the Tanimoto and Tversky metrics. Otherwise, the list is initially sorted by ChemBank identification number.
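
The ordering and paging rules above can be summarized in a short Python sketch. The row layout (keys 'id', 'smiles', and 'score') and the function names are hypothetical; only the sort directions and the page size of ten come from the description.

```python
PAGE_SIZE = 10  # compounds shown per results page

def sort_results(rows, search_type, metric=None):
    """Order result rows as described above."""
    if search_type == "substructure":
        return sorted(rows, key=lambda r: len(r["smiles"]))
    if search_type == "similarity":
        # Distances rank smallest-first; similarities rank largest-first.
        descending = metric in ("tanimoto", "tversky")
        return sorted(rows, key=lambda r: r["score"], reverse=descending)
    return sorted(rows, key=lambda r: r["id"])

def page_count(n_results):
    """Number of pages needed at PAGE_SIZE results per page."""
    return -(-n_results // PAGE_SIZE)  # ceiling division

# Made-up result rows.
rows = [
    {"id": 3, "smiles": "CCO", "score": 0.9},
    {"id": 1, "smiles": "c1ccccc1O", "score": 0.5},
    {"id": 2, "smiles": "C", "score": 0.7},
]
```

In each similarity case, the molecule most like the target appears first: the smallest distance or the largest similarity.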

At the top of the page, an English description of the criteria used to generate the search result is displayed. If the search has returned too many or too few compounds, you can recall and modify the original search criteria by clicking '[modify]'.

Three action links appear below the description of your search:

  • '[view multi-assay results heatmap]' displays a heatmap of these molecules and selected assays in which they scored as "hits". Clicking this link displays the 'Feature selection' page, which you use to select the assays to display in the heatmap.
  • '[export as text]' exports the data returned from the search, including the fields displayed on the results page, in a tab-delimited text format (with structures encoded as SMILES strings).
  • '[export as SDF]' exports the data returned from each search, including the fields displayed on the results page, in SDF format (with structures encoded as two-dimensional coordinates plus a connection table).
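
A tab-delimited export can be read with Python's standard `csv` module, as sketched below. The sketch assumes the file starts with a header row; the column names and values are invented for illustration, and the actual columns depend on the fields shown on your results page.

```python
import csv
import io

def read_export(handle):
    """Parse a tab-delimited export into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Stand-in for an exported file; real column names may differ.
sample = io.StringIO(
    "ChemBankID\tSMILES\tMolecularWeight\n"
    "1001\tCCO\t46.07\n"
    "1002\tc1ccccc1\t78.11\n"
)
records = read_export(sample)
```

To read a saved file instead, pass `open(path, newline="")` in place of the `StringIO` object.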

Search criteria are lost once the definition of a new search is initiated by clicking any of the 'Find Small Molecules' or 'Find Assays' options (either in the left-hand navigation bar or on the ChemBank 'Welcome' page).

Cross-sectional Analysis

Version 2.1.2 of ChemBank adds a new tool for cross-sectional analysis. Using this tool, you can explore the performance of sets of molecules across all screens contained within ChemBank. The first step in using this tool is to create a set of molecules of interest. Molecule sets are defined by any executed molecule search created by using the normal molecule-search mechanisms within ChemBank. For instance, if you are interested in exploring the performance of a particular structural motif, create a molecule set by executing a search by substructure for that motif. Another common use would be to pick molecules that scored as "hits" in an assay or project of interest by executing a search by assay. You can also create molecule sets from the assay histogram and assay scatterplot visualizations by selecting regions of interest.

Typically, you begin cross-sectional analysis by executing a molecule search. On the molecule search results page, you click '[view multi-assay results heatmap]' to display the 'Feature selection' page, which allows you to pick the assays whose performance you want to explore for this molecule set. Generally, you click 'Check all' to pick all assays. The heatmap visualization automatically removes assays that were not exposed to any of the molecules in your molecule set.

select heatmap visualization features

The 'Feature selection' page allows you to choose the features (i.e., assays) to view in the multi-assay heatmap visualization. These features become the columns in the heatmap. The rows of the heatmap are all of the screened wells containing molecules from the molecule search used to reach this page. Saving the link for this page allows you to process a different selection of assays as features without rerunning the molecule search. Selecting projects and assays to use as features works identically to the way these selections are made in the 'Search by assay' page. After selecting the features of interest, click the 'Generate visualization' button to proceed to the heatmap visualization page.

cross-sectional analysis heatmap visualization

A heatmap Java applet (provided by the GenePattern project) is used to visualize molecule features. The heatmap contains a feature column for each assay selected on the 'Feature selection' page. These columns are sorted by assay name. The heatmap contains a row for every sample well from your molecule search that was screened in any of the experiments chosen as features. For example, if a molecule appears in multiple wells for dosing purposes, each well appears as a separate row. For each sample well, the heatmap displays the molecule name (or well position if the name is unknown), source, and SMILES string.

Double-click a column feature name to display the 'View Assay' page for the assay represented by that column. Double-click a row's molecule name to display the 'Molecule Display' page for the molecule represented by that row. You can re-sort the rows in the heatmap by feature (assay), molecule name, or description columns by clicking on the button below an assay column name, the 'compound' column heading, or the 'description' column heading, respectively. Hover the mouse pointer over a particular well to display the detailed information for that well.

Note: The values (and colors) displayed in the visualization heatmap are CompositeZ scores for the well and are capped at +/- 8.53 (the cutoff for standard "hit" determination). The actual value for the well can be found on the 'Molecule Display' page for that well.
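
The capping described in this note is a simple clamp. In the sketch below, the function name is hypothetical; the cap value of 8.53 comes from the note.

```python
CAP = 8.53  # display cap, equal to the standard "hit" cutoff

def capped_score(composite_z):
    """Clamp a CompositeZ score to the range [-CAP, +CAP] for display."""
    return max(-CAP, min(CAP, composite_z))
```

Two wells with CompositeZ scores of 9 and 40 therefore look identical in the heatmap, which is why the uncapped value is worth checking on the 'Molecule Display' page.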

Note: The information used to generate the heatmap is cached on the ChemBank server. If you are the first user to include a particular well in the heatmap visualization, you may see some delay in the applet. As rows are received from the server, the applet begins to draw them. If information has not yet been received for a particular row, you see only the row number under 'Compound Name' for the row. Until all rows have been received, you cannot use any of the interactive (re-sorting, etc.) features of the applet. In rare cases you may see a dialog box indicating that the cache is being refreshed. If so, wait 5 minutes or so before refreshing the heatmap visualization.

Assay Search

ChemBank contains data from many high-throughput screening (HTS) and small-molecule microarray (SMM) projects. Currently, these datasets represent primary small-molecule screens; in the future, they will also include assay development, secondary (follow-up) assays for interesting "hits" from primary screens, and screens involving other perturbagens (e.g., RNAi screens, among others). These datasets can be searched by attributes within a class of assays (small-molecule, RNAi, protein) or by the biologist who performed the assay.

high-throughput screen search

To find all high-throughput screen (HTS) assays performed at the Broad Chemical Biology screening center, under 'Find Assays', click 'HTS' (either in the left-hand navigation bar or on the ChemBank 'Welcome' page).

small-molecule microarray search

To find all small-molecule microarray (SMM) assays performed at the Broad Chemical Biology screening center, under 'Find Assays', click 'SMM' (either in the left-hand navigation bar or on the ChemBank 'Welcome' page).

assay search by screener

Assays can be selected according to the screener who performed the assay. Under 'Find Assays', click 'by screener'. On the 'Find assays by biologist' page, select a name from the drop-down menu and click the 'Search' button.

advanced assay search

To find particular small-molecule assays performed at the Broad Chemical Biology (BCB) screening center, under 'Find Assays', click 'advanced assay search' (either in the left-hand navigation bar or on the ChemBank 'Welcome' page). On the 'Find small-molecule assays' page, choose your search criteria:

  • Project – grouping of assays, usually by a single screener, under a single biological motivation.
  • Screener – name of the primary person who conducted the screen.
  • Organization – institution to which the primary screener belongs.
  • AssayType – type of assay (e.g., high-throughput screen or small-molecule microarray).
  • Species – biological species tested in a particular assay.
  • Project Description Contains – words (strings) that appear in the project description.

When you have entered all search criteria, click the 'Search' button.

assay search results

ChemBank displays the results of an assay search as a list of assays and assay details in a tabular format. This table includes information on:

  • Project – grouping of assays, usually by a single screener, under a single biological motivation.
  • ProjectDescription – the biological description (i.e., motivation) of the project.
  • Assay Name – name of the assay; click to view more detail about the assay.
  • Assay Description – the description (i.e., technical details) of the assay.
  • Species – biological species tested in a particular assay.
  • Screener – name of the primary person who conducted the screen.
  • Organization – institution to which the primary screener belongs.

To export these data as a tab-delimited text file (that can be opened with a tool such as Microsoft Excel), click '[export as text]' (at the top of the search results page).

Protein Search

ChemBank serves as a repository for protein information with links out to external databases such as EntrezGene and the RCSB Protein Data Bank.

find proteins by name or id

Under 'Find Proteins', click 'by name or id' to query for proteins across several different fields at the same time. ChemBank assembles the query as follows:

  • Official Symbol is like your search string (case-insensitive)
    OR
  • Name is like your search string (case-insensitive)
    OR
  • at least one Synonym is like your search string (case-insensitive)
    OR
  • Description is like your search string (case-insensitive)
    OR
  • Entrez GeneID equals your search string
    OR
  • Protein DataBank ID equals your search string
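
The query assembly above can be expressed as a single predicate. In this Python sketch, the record layout and all values are hypothetical, and "is like" is approximated by a case-insensitive substring test, which is an assumption about the matching rule.

```python
def protein_matches(protein, text):
    """Apply the OR-joined conditions listed above to one protein record."""
    needle = text.lower()

    def like(value):
        # Assumed meaning of "is like": case-insensitive substring match.
        return needle in (value or "").lower()

    return (
        like(protein.get("official_symbol"))
        or like(protein.get("name"))
        or any(like(s) for s in protein.get("synonyms", []))
        or like(protein.get("description"))
        or str(protein.get("entrez_gene_id", "")) == text
        or protein.get("pdb_id") == text
    )

# Placeholder record; none of these values refer to a real protein.
record = {
    "official_symbol": "ABC1",
    "name": "example binding protein",
    "synonyms": ["EBP-12"],
    "description": "binds a hypothetical ligand",
    "entrez_gene_id": 12345,
    "pdb_id": "1XYZ",
}
```

Because the identifier fields use equality rather than "like", a partial GeneID such as "1234" does not match this record.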

protein advanced search

Under 'Find Proteins', click 'advanced search' to find proteins using precise criteria. On the 'Protein Advanced Search' page, each field into which you enter information is included in the search. Choosing multiple fields narrows your result set (the criteria are joined by Boolean AND operators). If you leave a field blank, the search result includes proteins with any value for that field.
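
In contrast to the name-or-id search, the advanced search joins its fields with AND and ignores blank fields. In this sketch, the field names and records are hypothetical, and the exact, case-insensitive comparison is an assumption.

```python
def advanced_match(protein, criteria):
    """True if the protein satisfies every non-blank criterion (Boolean AND)."""
    return all(
        str(protein.get(field, "")).lower() == wanted.strip().lower()
        for field, wanted in criteria.items()
        if wanted.strip()  # blank fields place no restriction
    )

# Placeholder records.
proteins = [
    {"official_symbol": "ABC1", "species": "Homo sapiens"},
    {"official_symbol": "ABC1", "species": "Mus musculus"},
]
human_only = [p for p in proteins
              if advanced_match(p, {"official_symbol": "", "species": "homo sapiens"})]
everything = [p for p in proteins
              if advanced_match(p, {"official_symbol": "", "species": ""})]
```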

protein search results

After executing a protein search, ChemBank displays the search results in a tabular format that includes the following information:

  • Official Symbol – a locally controlled vocabulary term that represents this protein.
  • Name – the name of the protein; click to view more detail about the protein.
  • Species – the species of organism that produces the protein.
  • Entrez Gene Id – the identifier in the Entrez Gene database of the gene that represents this protein.
  • Status – the curation status of this protein within ChemBank; only visible to curators.

The number of proteins found by the search is displayed at the top of the page. Below that number is a text description of your search. If the search has returned too many or too few proteins, you can recall and modify the original search criteria by clicking '[modify]'. Click a protein name to display the 'Protein' page for the protein.

Views

ChemBank views of various datasets allow inspection of high-throughput screening (HTS) data in substantial detail. At present, ChemBank provides histogram and scatter-plot views of HTS data. Future releases of ChemBank will allow increasingly sophisticated data visualizations.

view molecule page

On the results page for a molecule search, each ChemBankID links to a 'Molecule Display' page, which provides a detailed view of that individual molecule. This view includes a larger depiction of the structure, an IUPAC name for the molecule, the accumulated set of common names (synonyms) associated with the unique molecule, and the canonical SMILES and InChI representations of the structure. Canonicalized SMILES strings (Simplified Molecular Input Line Entry Specification) are generated using Daylight Molecular Toolkit software (Daylight Chemical Information Systems, Aliso Viejo, CA; (949) 831-9990). IUPAC names and InChI strings (International Chemical Identifier) are generated using ACD/Labs NameBatch software (Advanced Chemistry Development, Inc., Toronto, ON; (416) 368-3435).

The 'Molecule Display' page also provides a subset of chemical properties. Click '[view all descriptors]' to display a larger set of descriptors as a list sorted by descriptor Type and Subtype. Presently, all descriptors in ChemBank are generated using either Pipeline Pilot (SciTegic, San Diego, CA; (858) 279-8800) or QsarIS (Elsevier MDL, San Ramon, CA; (925) 543-5400).

Biochemical interactions, therapeutic indications, and therapeutic uses, all curated from the scientific literature, are listed when available.

If a compound has associated assay data registered in ChemBank, the 'Molecule Display' page lists the screening "instances" within each project/assay to which the small molecule has been exposed. Each entry provides links to the project and assay descriptions, as well as to graphical depictions of assay results for any assay plate involving the small molecule. The list of assay plates is followed by a list of compound source plates containing the molecule.

autofluorescence methods

Measurement protocol. Compounds are incubated at a nominal concentration of 25 µM in phosphate-buffered saline (PBS) for 5 hours. Fluorescence excitation, emission, and absorbance measurements are performed consecutively on a Varioskan (Thermo Scientific; Waltham, MA) spectral-scanning monochromator to characterize each compound. Absorbance measurements (quantity 53) are made every 10 nm. Fluorescence measurements (quantity 205) are performed with either the excitation or emission monochromator held fixed at increments of 50 nm, and the complementary monochromator scanned at 10-nm resolution. The bandpass of the excitation monochromator is 5 nm, and the bandpass of the emission monochromator is 12 nm. Measurements are performed at a separation of at least 20 nm to eliminate stray light.

Data analysis. Varioskan outputs are submitted to the standard normalization and scoring procedure for ChemBank, which uses a trimmed distribution of mock-treatment wells (dimethylsulfoxide in PBS) at each measurement coordinate to calculate background-subtracted values (BSubValue) and corresponding ChemBank Z-Scores.

Autofluorescence. Compounds having one or more ChemBank Z-Score measurements of at least 8.53 and having a continuous region of the spectrum for which the BSubValue is at least 1 Relative Fluorescence Unit (RFU) on a Varioskan are indicated to be autofluorescent.
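
The two-part criterion above can be sketched as follows for a single scan ordered by wavelength. The text does not say how many adjacent points make up a "continuous region", so the `min_run` default of 2 is an assumption, as are the function names and sample values.

```python
Z_CUTOFF = 8.53    # minimum ChemBank Z-Score for at least one measurement
RFU_CUTOFF = 1.0   # minimum background-subtracted signal, in RFU

def longest_run(values, threshold):
    """Length of the longest run of consecutive values >= threshold."""
    best = current = 0
    for v in values:
        current = current + 1 if v >= threshold else 0
        best = max(best, current)
    return best

def is_autofluorescent(z_scores, bsub_values, min_run=2):
    """Apply both criteria to one scan, ordered by wavelength.

    min_run is an assumed minimum number of adjacent points that counts
    as a "continuous region"; the text does not give a figure.
    """
    has_hit = any(z >= Z_CUTOFF for z in z_scores)
    return has_hit and longest_run(bsub_values, RFU_CUTOFF) >= min_run
```

An isolated point above 1 RFU does not qualify under this reading, even when its Z-Score exceeds the cutoff.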

Contour plot. The locations of actual measurements are plotted as white crosshatches. The locations of the measurements shown as the excitation spectrum are plotted as blue crosshatches. The locations of the measurements shown as the emission spectrum are plotted as red crosshatches. Heights of the contour lines as well as the color scale are displayed on a logarithmic axis. Values below 0.3 RFU are below the level of noise from which the human eye can distinguish a peak, and are plotted as dark blue. Values above 0.3 RFU are plotted on a scale of log10(BSubValue + 1) such that at a signal of 1 RFU (approximately 3 times the noise level) the autofluorescence is interpreted as real and is plotted as light pink. Autofluorescence above 1 RFU moves progressively through the hues of an HSV color space to saturate at 150 RFU (light blue), which is equivalent to 0.4 µM fluorescein at pH 7.4. Intensity values on the contour surface are interpolated by a modified least-squares fit between measured points using GridFit by John D'Errico utilizing MATLAB (The Mathworks; Natick, MA) to represent the shape of the spectra without being unduly affected by individual points of large magnitude. High variability in the instrument at long wavelengths requires a filter to consider only values in the range of 300 nm to 650 nm excitation and 370 nm to 700 nm emission.

Bar plot. A binary representation of the wavelengths for which any BSubValue is at least 1 RFU. The colors are blue for excitation, red for emission, and purple for both excitation and emission.

Fluorescence spectra. The shape of the excitation spectra and emission spectra based on fluorescence measurements are shown scaled to the range of 0 to 1. Choice of the fixed wavelength depends on the location of the largest fluorescence peak.

Absorbance spectrum. The background-subtracted absorbance is shown as a percentage of the maximum absorbance for each compound for the purpose of displaying the shape of the spectrum. The maximum background-subtracted value of absorbance for each compound is included in the upper right corner of the plot. Background subtraction minimizes the contribution of the polystyrene 384-well plate to the absorbance spectrum.
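
As an illustration of the percentage scaling described above, the following Python sketch converts a list of background-subtracted absorbance values into percentages of their maximum and also returns that maximum, which is the number reported in the corner of the plot. The function name is hypothetical, and rejecting spectra with a non-positive maximum is an assumption.

```python
def absorbance_percent_of_max(bsub_absorbance):
    """Scale background-subtracted absorbance values to percent of maximum.

    Returns (percentages, peak), where peak is the maximum
    background-subtracted absorbance for the compound.
    """
    peak = max(bsub_absorbance)
    if peak <= 0:
        # Assumption: a spectrum with no positive signal cannot be scaled.
        raise ValueError("maximum background-subtracted absorbance must be positive")
    percentages = [100.0 * value / peak for value in bsub_absorbance]
    return percentages, peak
```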

assay histogram

From the 'View Assay' page, click '[view histogram]' to display the 'Assay Histogram' page. This histogram view displays the distributions of both mock-treatment and compound-treatment wells from a selected assay. The vertical axis represents the relative frequency of occurrence (e.g., number of occurrences) and cannot be changed. The horizontal axis represents the value type selected from the 'Data type' drop-down list, which can be set to background-subtracted, dimensionless Z-score, signal-to-noise ratio, reproducibility value, or composite Z-score. These values (as well as standard "hit") are defined in the screening data section.

The 'Sample type' drop-down box includes options to change the distributions displayed in the histogram. For example, the mock-treatment distribution can be displayed alone, without the compound-treatment distribution. A range of values can be specified in the box to the left of the histogram, either by entering values or by dragging over a desired range of the histogram itself. A list of molecules in this range can be viewed by clicking '[view molecules in range as list]', and the histogram can be regenerated with new limits by clicking '[resample histogram using this range]'. Click '[reset range]' to reset the horizontal axis.

assay scatter plot

From the 'View Assay' page, click '[view scatterplot]' to display the 'Assay Scatter Plot' page. This scatter-plot view represents the values of both mock-treatment and compound-treatment wells from a selected assay. The horizontal axis represents values for the first replicate ("A"), and the vertical axis represents values for the second replicate ("B"). Values represented in the scatter plot can be changed using the 'Data type' drop-down box, which can be set to background-subtracted or dimensionless Z-score. These values are defined in the screening data section. A range of values can be specified in the box to the left of the scatter plot, either by entering values or by dragging over a desired range of the scatter plot itself. A list of molecules in this range can be viewed by clicking '[view molecules in range as list]', and the scatter plot can be regenerated with new limits by clicking '[resample histogram using this range]'. Click '[reset range]' to reset the horizontal and vertical axes.

The 'Sample type' drop-down box includes options to change the distributions displayed in the scatter plot. For example, the mock-treatment distribution can be displayed alone, without the compound-treatment distribution.

screening data

ChemBank stores the raw value associated with each screened well along with a number of calculated values. The calculated values and their definitions are:

  • Background-subtracted values – background-subtracted raw values, using a per-plate measure of central tendency as the background.
  • Dimensionless Z-score values – background-subtracted values scaled by the variance of values across all plates in a single replicate of an assay.
  • Signal-to-noise ratios – (unsigned) ratios of the length of the "replicate vector" [ZScoreA, ZScoreB] to the radius of a (random) noise envelope corresponding to P < 0.005 (Z-score values > 1.28); a dimensionless value, with <1 indicating "within" the noise and >1 indicating "beyond" the noise.
  • Reproducibility values – correlation cosine between the "replicate vector" [ZScoreA, ZScoreB] and the vector [1, 1] representing perfect reproducibility.
  • Composite Z-score values – vector projection of the "replicate vector" [ZScoreA, ZScoreB] onto a line (parametrized by the vector [1, 1]) representing perfect reproducibility; this representation is the default for the histogram view.
  • Standard hit – categorical outcome associated with the Boolean expression |CompositeZ| > 8.53 AND |Reproducibility| > 0.99 for most assay types in ChemBank.
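
The geometric definitions above can be made concrete for the two-replicate case with a short Python sketch. It is an illustration under stated assumptions, not ChemBank's implementation: the composite value is computed as the plain scalar projection onto the unit vector along [1, 1], without any further scaling ChemBank may apply, and a zero-length replicate vector is assigned a reproducibility of 0. The signal-to-noise ratio is omitted because the radius of the noise envelope is not given here in a form that can be computed directly.

```python
import math

HIT_COMPOSITE_Z = 8.53       # threshold on |CompositeZ| from the text
HIT_REPRODUCIBILITY = 0.99   # threshold on |Reproducibility| from the text


def composite_z(z_a, z_b):
    """Signed length of the projection of [z_a, z_b] onto the line along [1, 1]."""
    return (z_a + z_b) / math.sqrt(2.0)


def reproducibility(z_a, z_b):
    """Cosine of the angle between [z_a, z_b] and [1, 1]."""
    length = math.hypot(z_a, z_b)
    if length == 0.0:
        return 0.0  # assumption: the cosine is undefined for a zero vector
    return (z_a + z_b) / (math.sqrt(2.0) * length)


def is_standard_hit(z_a, z_b):
    """Evaluate |CompositeZ| > 8.53 AND |Reproducibility| > 0.99."""
    return (abs(composite_z(z_a, z_b)) > HIT_COMPOSITE_Z
            and abs(reproducibility(z_a, z_b)) > HIT_REPRODUCIBILITY)
```

For example, replicate Z-scores of 10 and 10 give a composite value of about 14.1 with reproducibility close to 1, so the well passes. Z-scores of 12 and 2 give a composite value of about 9.9 but a reproducibility of about 0.81, so the well does not pass.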

Further information about the standard analysis model can be found in the ChemBank paper in Nucleic Acids Research (Seiler et al., 2007).


view protein

On the results page for a protein search, each protein name links to a 'Protein' page, which provides a more detailed view of an individual protein. This view includes the following information:

  • Official Symbol – a locally controlled vocabulary term that represents this protein.
  • Name – the name of the protein.
  • Species – the species of organism that produces the protein.
  • Entrez Gene Id – the identifier in the Entrez Gene database of the gene that represents this protein; click to open a new browser window containing the corresponding Entrez Gene entry.
  • NCBI RefSeq – the identifier in the National Center for Biotechnology Information (NCBI) Reference Sequence database (http://www.ncbi.nlm.nih.gov/RefSeq/); this database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins.
  • GenBank Accession # – the identifier assigned to the entire sequence record for this protein in the GenBank database.
  • PDB Id – the identifier assigned to this protein in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB); click to open a new browser window containing the corresponding PDB entry.
  • Description – description of any special characteristics of this protein.
  • Synonyms – alternate names by which this protein might be called.
  • Status – the curation status of this protein within ChemBank; only visible to curators.
  • Small-Molecule Microarray Experiment Positives – list of small-molecule microarray experiments that indicate binding events between this protein and the given small molecules.
  • Referencing Assays – a list of all experiments that involve this protein in some manner.

Data Management

view projects

A project in ChemBank is a collection of assays with a related goal. Click 'view projects' to display a list of all ChemBank projects, each with a brief description, the name of the primary screener, the number of standard "hits" found for the project, and the number of discrete molecules screened in the project. (The last two fields are available only in the ChemBank data-sharing environment.) The displayed table can be sorted by any of these columns by clicking the column label. Click a project name to display the 'View Project' page for that project.

view project

Clicking a project name (for example, from the 'View Projects' page) displays the 'View Project' page, which includes the following details:

  • Name
  • Description
  • Motivation
  • Biological Process
  • Cellular Component
  • Miscellaneous Terms
  • Molecular Function
  • Phenotype
  • Therapeutic Indication
  • Therapeutic Use
  • Screener
  • Collaborators
  • Assays

From this page, you can download all screening data for the project.

view assay

Clicking an assay name (for example, from a 'View Project' page) displays the 'View Assay' page containing details of the assay. From this page, you can link either to additional details of the project (click the project name) or to the dataset for the assay displayed as a histogram (click '[view histogram]') or as a scatter plot (click '[view scatterplot]').

From this page, you can also download all screening data for the assay.

view plate

The ChemBank plate view represents 384 wells of an assay microtiter plate with each well colored by a binned value. Values are evenly distributed into five bins, with the bin interval calculated by subtracting the minimum value from the maximum value for the plate and dividing by five. The value represented in the plate view can be changed with the 'Data type' drop-down box, and may be set to background-subtracted, dimensionless Z-score, signal-to-noise ratio, reproducibility value, or composite Z-score. These values (as well as standard "hit") are defined in the screening data section.
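
The binning rule described above can be sketched in Python as follows. This is a minimal illustration. The function name is hypothetical, and the treatment of bin edges (lower edge inclusive, with the plate maximum placed in the top bin) is an assumption, since the text specifies only the bin interval.

```python
def bin_plate_values(values, n_bins=5):
    """Assign each well value to one of n_bins equal-width bins (0-based).

    The bin interval is (max - min) / n_bins, as described for the plate view.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    if width == 0:
        # All wells have the same value, so everything falls in one bin.
        return [0 for _ in values]
    # The maximum itself would compute to index n_bins, so clamp it into the top bin.
    return [min(int((value - lowest) / width), n_bins - 1) for value in values]
```

With values ranging from 0 to 5, the interval is 1, so a value of 2.5 falls in the third bin (index 2) and both 4.9 and 5 fall in the top bin (index 4).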

As an alternative to the plate view, click 'View well values' to list all plate information in a tabular format. The checkbox 'Mark those wells that scored as standard hits' places a black dot in each well that passes the standard "hit" criterion. The checkbox 'Set color ranges using only wells that scored as standard hits' recalculates bins (using the method described above) using values passing the standard "hit" criterion. Wells outlined in red are mock-treatment (DMSO) control wells. Click a well to display the 'Molecule Display' page corresponding to that well, if the molecule record exists.


Assay Search

If you are logged in to ChemBank, you can download all of the screening data for a particular project or a particular assay by clicking the '[download data]' link on the 'View Project' page or the 'View Assay' page, respectively.

The download file is a tab-delimited text file containing the following columns of data:

  • Project
  • AssayName
  • Plate
  • Well
  • WellType
  • RawValue *
  • BSubValue *
  • ZScore *
  • CompositeZ
  • Reproducibility
  • ChemBankID
  • Virtual_Id
  • SMILES

* The RawValue, BSubValue, and ZScore columns repeat according to the number of replicates, with one set of columns per replicate. The total number of columns in the file is determined by the assay in the file with the greatest number of replicates. The meaning of these columns is defined in the screening data section. Further information about the standard analysis model can be found in the ChemBank paper in Nucleic Acids Research (Seiler et al., 2007).

For projects, the download file contains a header row and one row of data per assay, plate, and well tested in the project. For assays, the download file contains a header row and one row of data per plate and well tested in the assay.

Compound-treatment, mock-treatment, positive-control, and negative-control wells are identified in the download file via the WellType column. The ChemBankID, Virtual_Id, and SMILES values for mock-treatment and other control wells are empty. It is also possible for compound-treatment wells to have empty ChemBankID and SMILES values, which indicates either that the structure of the compound in the well is not known or that the compound is biological in nature and does not have a readily expressible molecular formula.
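
A download file with this layout can be read with Python's standard csv module. The sketch below is an illustration under assumptions: it matches the per-replicate columns by the prefixes RawValue, BSubValue, and ZScore, because the exact header text used for each replicate is not specified here. Empty cells in those columns, which can occur when an assay has fewer replicates than the widest assay in the file, are read as None.

```python
import csv

# Column-name prefixes that repeat once per replicate.
REPLICATE_PREFIXES = ("RawValue", "BSubValue", "ZScore")


def read_download(handle):
    """Read a tab-delimited ChemBank download into a list of dictionaries.

    Replicate columns are collected into lists keyed by their prefix;
    all other columns are kept as strings keyed by their header name.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = []
    for fields in reader:
        row = {prefix: [] for prefix in REPLICATE_PREFIXES}
        for name, value in zip(header, fields):
            prefix = next((p for p in REPLICATE_PREFIXES if name.startswith(p)), None)
            if prefix is not None:
                row[prefix].append(float(value) if value else None)
            else:
                row[name] = value
        rows.append(row)
    return rows
```

A typical call would be `with open(path, newline="") as handle: rows = read_download(handle)`, where path points to a saved download file. Control wells can then be recognized by their WellType value and by their empty ChemBankID and SMILES fields.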

Protein Curation

ChemBank allows curators and administrators to create and maintain protein data.

Existing proteins can be edited by navigating to the 'Protein' page and clicking the '[edit]' link. This action displays the 'Edit Protein' page, which allows curators to change any of the data that ChemBank stores for the protein.

Any changes made are treated as Pending until an administrator approves them. Pending changes can only be seen from within the 'Edit Protein' page. This feature allows changes to be made over a period of time without having the general public see curation work in progress.

The general workflow is to edit the protein, make changes, and save them. When done making changes, mark the protein as ready for review. Administrators periodically check for proteins that need to be reviewed and either approve the changes or return them to curators for further revisions. Once the protein is marked approved, all of the changes are committed and become visible to the public.
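
The workflow described above can be modeled as a small state machine. The Python sketch below is purely illustrative: the status and action names are hypothetical labels chosen for this example, not identifiers used by ChemBank, and the rule that saving edits to an approved protein returns it to a pending state is an interpretation of the statement that any changes are treated as Pending until approved.

```python
from enum import Enum


class CurationStatus(Enum):
    PENDING = "pending"
    READY_FOR_REVIEW = "ready for review"
    APPROVED = "approved"


# (current status, action) -> next status
TRANSITIONS = {
    # A curator saves further changes; they stay hidden from the public.
    (CurationStatus.PENDING, "save"): CurationStatus.PENDING,
    # The curator marks the protein as ready for review.
    (CurationStatus.PENDING, "mark_ready"): CurationStatus.READY_FOR_REVIEW,
    # An administrator approves; the changes become publicly visible.
    (CurationStatus.READY_FOR_REVIEW, "approve"): CurationStatus.APPROVED,
    # An administrator returns the protein for further revision.
    (CurationStatus.READY_FOR_REVIEW, "return_to_curator"): CurationStatus.PENDING,
    # Assumption: new edits to an approved protein are pending again.
    (CurationStatus.APPROVED, "save"): CurationStatus.PENDING,
}


def next_status(status, action):
    """Return the status that follows `action`, or raise ValueError if not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(
            f"action {action!r} is not allowed in status {status.value!r}"
        ) from None
```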

The following diagram summarizes the curation process:

To create new proteins, use the 'Create Protein > new' option in the toolbar. New proteins are not visible to the public until they have been approved by an administrator.

Administrators can also delete proteins. Attempting to delete a protein that is linked to other entities in ChemBank (e.g., one that was used in an assay or a biochemical interaction) results in an error message. To delete such a protein, you must first remove all of its dependent links.

ChemBank Tools

Generally available to ChemBank administrators only.

structure cleanup

Available to ChemBank administrators only.