ChemBank User Guide

Paul A. Clemons, Mary Pat Happ, Heidi Kuehn, Kathleen Petri Seiler
(last updated 09 January 2008)

 

ChemBank, which was created by the National Cancer Institute’s Initiative for Chemical Genetics, stores information on hundreds of thousands of small molecules and hundreds of biomedically relevant assays that have been performed at the Broad Chemical Biology screening center in collaborations involving biomedical researchers worldwide. The ChemBank web-based user interface makes it easy to retrieve this information and the ChemBank online help provides descriptions of the web pages and the data displayed.

This ChemBank User Guide provides a brief introduction to ChemBank followed by a series of scenarios that show how researchers might use ChemBank to research specific questions. Each scenario is an independent hands-on tutorial. The collection of scenarios introduces most of the features in ChemBank (version 2.1.3).

Note: ChemBank is regularly updated with new information. To consistently provide current information, this User Guide includes live links to ChemBank in place of screenshots. For example, this link displays the ChemBank home page: http://chembank.broad.harvard.edu/welcome.htm.

 

About ChemBank

 

Scenario 1: Make a hypothesis about the potential biological activity of a molecule

15 minutes

Scenario 2: Select screened compounds on which to pursue additional screening or follow-up chemistry

15 minutes

Scenario 3: Determine which small molecules may perturb biological pathways and processes

15 minutes

Scenario 4: Dissect small-molecule structure using assay profiles

20 minutes

Scenario 5: Identify structurally related small molecules with known biological functions

15 minutes

Scenario 6: Download data for further calculation in external applications

25 minutes

 

 

About ChemBank

Molecules and compounds

In ChemBank, an eight-digit ChemBankID is assigned to each unique molecule. During registration, a standardized representation of each new chemical structure is checked against the existing small-molecule collection in ChemBank. The structure is assigned an existing ChemBankID if it matches an existing molecule, or a new ChemBankID if it is a new, unique structure. Individual physical samples of a molecule are called compounds; a compound may be a salt, hydrate or other form of the unique molecule. Compounds are distinguished by unique Plate/Well assignments. There can be several compound samples for any given ChemBankID (molecule).
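The relationship between molecules and compounds is one-to-many, which can be pictured with a short Python sketch. This is illustrative only: the class name `Compound`, its fields, and the plate/well values are hypothetical and do not reflect ChemBank's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """One physical sample of a molecule (hypothetical record layout)."""
    chembank_id: int  # identifies the unique molecule
    plate: str        # plate and well together identify the sample
    well: str


def group_by_molecule(compounds):
    """Map each ChemBankID to the list of compound samples that share it."""
    groups = defaultdict(list)
    for compound in compounds:
        groups[compound.chembank_id].append(compound)
    return dict(groups)


# Made-up samples: two wells hold the same molecule, one holds another.
samples = [
    Compound(1347770, "P0001", "A01"),
    Compound(1347770, "P0002", "C05"),
    Compound(3052589, "P0001", "B02"),
]
by_molecule = group_by_molecule(samples)
```

Here three compounds map onto two molecules, which is why a list of compounds can be longer than the corresponding list of molecules.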

Projects and assays

Assays to measure the biological impact of compounds are organized into screening projects. Each project is assigned a four-digit project identifier. A project comprises a group of assays that all assess the same general area of biology, but may differ from each other in details of the execution protocol, the date performed, the reagents employed, or the nature of the measurement (baseline and post-incubation measurements are considered two different assays, and the calculated difference between them is a third). An assay ID is the four-digit project identifier dot-separated from a four-digit assay index. Within each project, assays are uniquely and sequentially numbered beginning with assay 0001.
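As a minimal sketch of the assay ID format described above, the following Python function splits an assay ID into its project identifier and assay index. The function name is hypothetical. Accepting project identifiers with fewer than four digits is an assumption, made because identifiers are sometimes written without leading zeros.

```python
import re

# Project identifier, a dot, then a four-digit assay index (e.g. "1021.0019").
# Assumption: the project part may be written without leading zeros.
ASSAY_ID_PATTERN = re.compile(r"(\d{1,4})\.(\d{4})")


def parse_assay_id(assay_id):
    """Return (project_id, assay_index) as integers, or raise ValueError."""
    match = ASSAY_ID_PATTERN.fullmatch(assay_id)
    if match is None:
        raise ValueError(f"not a valid assay ID: {assay_id!r}")
    return int(match.group(1)), int(match.group(2))


project_id, assay_index = parse_assay_id("1021.0019")
```

Grouping assay IDs by the first element of the returned pair collects all assays that belong to the same project.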

Hits and standard hits

An assay screens a collection of compounds. For each screened well, ChemBank stores the associated raw value and a number of calculated values. The calculated CompositeZ score value is the primary measure of whether a compound scored as active in an assay. In ChemBank, the term hit refers to a non-zero response based on a researcher’s subjective criteria. The term standard hit refers to a defined cutoff for the CompositeZ score and reproducibility based on the following objective criteria: |CompositeZ| > 8.53 AND |Reproducibility| > 0.99, where CompositeZ and Reproducibility are both calculated values stored by ChemBank. For more information about how ChemBank calculates these values, see Seiler et al. (2007).
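The standard-hit criterion can be written as a small Python predicate. This sketch only applies the two cutoffs stated above to values that ChemBank has already calculated; the function and constant names are hypothetical, and the sketch does not reproduce how ChemBank computes CompositeZ or Reproducibility.

```python
COMPOSITE_Z_CUTOFF = 8.53
REPRODUCIBILITY_CUTOFF = 0.99


def is_standard_hit(composite_z, reproducibility):
    """Apply |CompositeZ| > 8.53 AND |Reproducibility| > 0.99."""
    return (abs(composite_z) > COMPOSITE_Z_CUTOFF
            and abs(reproducibility) > REPRODUCIBILITY_CUTOFF)
```

Because the criterion uses absolute values, both strongly negative and strongly positive CompositeZ scores can qualify. Both comparisons are strict, so a score of exactly 8.53 does not qualify.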

‘Molecule Display’ page and heatmaps

In ChemBank, the ‘Molecule Display’ page is the primary source of information about a molecule. It includes the name, SMILES string, descriptors, molecular structure, activity-related terms (if annotated yet) from the scientific literature, and all sample sources for the molecule. It also lists every screening test instance of the molecule including the screening project, assay, plate, well, and resulting CompositeZ score. Click here to display the ‘Molecule Display’ page for the molecule with ChemBankID 1347770: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=1347770.
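If you keep a list of ChemBankIDs, links like the one above can be generated rather than typed. The following Python sketch assumes that the URL pattern seen in this guide's links (`viewMolecule.htm?cbid=...`) holds for any ChemBankID; it is inferred from those links and is not a documented API. The function name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://chembank.broad.harvard.edu"


def molecule_display_url(chembank_id):
    """Build a 'Molecule Display' link for a ChemBankID (inferred pattern)."""
    query = urlencode({"cbid": chembank_id})
    return f"{BASE_URL}/chemistry/viewMolecule.htm?{query}"


url = molecule_display_url(1347770)
```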

Heatmaps are used to visualize the screening results (CompositeZ scores) for multiple compounds across multiple assays. In a multi-assay result heatmap, each row represents a compound (identified by plate/well and SMILES string) and each column represents an assay (identified by assay ID). The intersection of a row and a column represents the CompositeZ score for that compound in that assay. Dark blue represents the lowest CompositeZ scores and dark red represents the highest CompositeZ scores. Click here to view a ChemBank heatmap (it takes a minute or two to gather data and display the heatmap): http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046996&featureSelectId=5046999.

Following are a few tips for using ChemBank heatmaps:

·         Click ‘View > Color Scheme Legend’ to display a legend for the colors. The colors in a heatmap always range from dark blue to dark red; however, the CompositeZ scores represented by those colors vary from one heatmap to another. Dark blue represents the lowest CompositeZ score in the heatmap and dark red the highest CompositeZ score in the heatmap.

·         Hover your cursor over a cell to display the associated compound name, assay ID and CompositeZ score.

·         Double-click an assay ID (column header) to display its details. Double-click a compound name (right side of the heatmap) to display its ‘Molecule Display’ page.

·         Heatmaps truncate CompositeZ scores at +/- 8.53 (the ChemBank cutoff for a standard hit). To see the precise CompositeZ scores, display the ‘Molecule Display’ page for a compound.

·         For a more readable display, sort the heatmap in one of three ways: click the box below an assay ID (column header) to sort compounds based on their CompositeZ scores in that assay; click the ‘Compound’ column heading (right side of the heatmap) to sort compounds by name and plate number; or click the ‘SMILES’ column heading (right side of the heatmap) to sort compounds by SMILES string. Sorting by SMILES string is a crude way of grouping compounds by structure similarity.
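The truncation and relative coloring described in the tips above can be sketched in Python as follows. The truncation limit comes from this guide. The linear mapping between the lowest and highest score is an assumption made for illustration, since the guide states only which colors mark the two extremes. The function names are hypothetical.

```python
HEATMAP_LIMIT = 8.53  # heatmaps truncate CompositeZ scores at +/- this value


def truncate_score(composite_z, limit=HEATMAP_LIMIT):
    """Clamp a CompositeZ score to the range [-limit, +limit]."""
    return max(-limit, min(limit, composite_z))


def color_position(score, lowest, highest):
    """Map a score to 0.0 (dark blue) through 1.0 (dark red).

    'lowest' and 'highest' are the extreme scores in one heatmap, so the
    same score can get a different color in a different heatmap.
    The linear interpolation is an assumption.
    """
    if highest == lowest:
        return 0.5  # all scores identical: use the middle of the scale
    return (score - lowest) / (highest - lowest)
```

For example, a CompositeZ score of -20 is displayed as -8.53, which is why the precise value must be read from the ‘Molecule Display’ page.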

Troubleshooting

You can log into ChemBank as a guest or as a registered user. Guests can use the scenarios; however, you must be a registered user to download any data from ChemBank. If you are logged in as a guest and attempt to download data, ChemBank displays an error message and offers you the opportunity to register.

The links in this User Guide display pages in ChemBank. If you are logged into ChemBank as a registered user, ChemBank displays the requested page for you. Otherwise, ChemBank logs you in as a guest and then displays the requested page.

ChemBank supports most commonly used browsers. To test your browser for compatibility with ChemBank, click the ‘Browser Requirements’ link at the bottom of the ChemBank home page: http://chembank.broad.harvard.edu/welcome.htm. You may need to allow pop-up windows from the ChemBank site, depending on your browser.

It may take ChemBank a few minutes to complete a molecule search or draw a heatmap. Any single step in a scenario should complete in less than five minutes.

To report problems, click the ‘Report Problem’ link at the bottom of any ChemBank page. If you have questions not answered by this User Guide or the online help, contact the ChemBank team: chembank@broad.harvard.edu.

Scenario 1: Make a hypothesis about the potential biological activity of a molecule

Research Objective: A researcher has isolated or synthesized a compound (“Compound X”) and is looking for clues about the potential biological activity of the molecule. Because the molecule’s structure is known, the researcher has the SMILES string, structure or ChemBankID for the molecule. Using ChemBank, find structurally similar molecules that scored as standard hits in assays, and then explore those assays to generate hypotheses about the potential biological activity of Compound X.

ChemBank Overview:

Tutorial: For this example, the SMILES string for Compound X is as follows: CCn1cc(C(=O)O)c(=O)c2c(N)c(F)c(N3CCNCC3)c(F)c12.

Find the molecule of interest.

1.      Under ‘Find Small Molecules’, click ‘by user list’.

ChemBank displays the ‘Search by user list’ page: http://chembank.broad.harvard.edu/chemistry/search/input/userList.htm.

2.      Enter the SMILES string shown above. Alternatively, if you know the ChemBankID, enter it here instead. Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038336.

Tip: If a SMILES string search does not return a ChemBank molecule, under ‘Find Small Molecules’, click ‘by similarity’ and enter the SMILES string on the ‘Search by similarity’ page. Identify the molecule of interest among those returned by the similarity search.

Find molecules similar to the molecule of interest. For this example, focus on the molecule with ChemBankID 1347770.

3.      On the ‘Compound Search’ page containing the search results, click ChemBankID 1347770 to display the ‘Molecule Display’ page: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=1347770.

4.      Click ‘[find similar molecules]’ (below the structure depiction) to search for related compound structures.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5033037.

Modify the search to find molecules that are similar to the molecule of interest AND that are scored as standard hits in any assay.

5.      Click ‘[modify]’ to modify the query.

ChemBank displays the ‘Molecule search builder’ page: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5033037.

6.      From the drop-down list labeled ‘Select a criterion to add’, select Assay and then click the ‘add’ button.

ChemBank displays the ‘Search by assay’ page: http://chembank.broad.harvard.edu/chemistry/search/input/assayResults.htm?id=5033037.

7.      Select all projects and assays by clicking ‘Check all’. By default, ChemBank finds molecules that scored as standard hits in the selected assays.

8.      Click the ‘add to search’ button to review the search criteria or the ‘search now’ button to begin the search.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5033040.
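Conceptually, adding a criterion in the search builder narrows the result to molecules that satisfy every criterion. A minimal Python sketch of that AND logic, using set intersection, is shown below. The function name is hypothetical and the IDs are made-up placeholders, not real ChemBank results; how ChemBank evaluates the query internally is not described in this guide.

```python
def combine_with_and(*result_sets):
    """Return the IDs that are present in every result set."""
    if not result_sets:
        return set()
    combined = set(result_sets[0])
    for result in result_sets[1:]:
        combined &= set(result)
    return combined


# Placeholder IDs for illustration only.
similar_to_query = {101, 102, 103, 104}  # matched the similarity criterion
standard_hits = {102, 104, 205}          # scored as standard hits somewhere
matches = combine_with_and(similar_to_query, standard_hits)
```

Only the IDs found in both sets remain, so each added criterion can shrink the result but never enlarge it.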

Use a heatmap to visualize the screening results for these molecules.

9.      Click ‘[view multi-assay result heatmap]’.

ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap: http://chembank.broad.harvard.edu/chemistry/featureSelection.htm?molSearchId=5033040.

10.  Select all projects and assays by clicking ‘Check all’ and then click the ‘generate visualization’ button.

ChemBank displays a heatmap that shows the molecules that were on the search result page and the assays in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5033040&featureSelectId=5033041.

A molecule can be used in multiple assays and/or multiple wells in an assay; therefore, the heatmap may contain more compounds than there are molecules in the search results.

11.  For a more readable heatmap, sort the compounds by name and plate number by clicking the ‘Compound’ column heading (on the right side of the heatmap).

Display the details of each compound, including activity-related terms (if any) from the scientific literature. For this example, focus on ciprofloxacin and norfloxacin.

12.  Double-click a compound name to display the ‘Molecule Display’ page. Notice that both ciprofloxacin and norfloxacin include the term ‘anti‑bacterial’ under the ‘Therapeutic Uses’ heading.

Tip: Your web browser may need to allow popup windows for the ChemBank site to see this result properly.

Having found compounds of interest, use the heatmap to explore assays in which those compounds scored as standard hits.

13.  Scroll the heatmap display scanning for the dark blue and dark red cells, which indicate the lowest and highest CompositeZ scores for the compounds of interest, ciprofloxacin and norfloxacin. Tip: Hover over a cell to display the associated compound name, assay name, and CompositeZ score.

14.  When you find an assay in which the compound scored as a standard hit, double-click the assay name (at the top of the column) to display information about the assay and its associated project. Examining the descriptions of these assays/projects may provide insight into the molecule of interest.

Ciprofloxacin and norfloxacin scored as standard hits in several assays. For example, both compounds scored as standard hits in project 907 (ciprofloxacin in assays 907.0134 and 907.0135, and norfloxacin in assay 907.0115), and ciprofloxacin scored as a standard hit in assay 1000.0052. Project 907, SolutionAutofluorescence, tested autofluorescence and provides little insight into the biological activities of the compounds.

Having found compounds with annotations of interest, search ChemBank to find all compounds that have those annotations.

15.  Under ‘Find Small Molecules’, click ‘by function’. On the ‘Search by function’ page: for Ontology, select ‘Therapeutic Use’; for Term, enter ‘anti-bacterial’; and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button to start the search.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5033546.

Tip: Ontological terms are case-sensitive. If you are unsure of a term, use the browse (magnifying glass) button to select it rather than simply typing it in.

16.  Modify the search to find only those molecules that scored as standard hits (see steps 5 through 8).

ChemBank displays a subset of the original search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5047922.

You have now found all molecules that have an activity annotation of ‘anti-bacterial’ from the literature and that scored as standard hits in an assay.

Tip: To generate a list of all projects and assays, under ‘Find Assays’, click ‘advanced assay search’. On the ‘Find small-molecule assays’ page, click the ‘Search’ button. ChemBank displays a list of all assays and their associated projects: http://chembank.broad.harvard.edu/assays/search/execute.htm. To save the list to a file, click ‘[export as text]’.

Further exploration: Further explore the assays to hypothesize about Compound X’s effect on biological systems. For example, explore what other molecules scored as hits in the assays. Do those hits provide insight into the potential biological activity of Compound X? The biological annotations can also be further investigated; for example, what other molecules also have these biological annotations? Do they have other documented biological activity that might be applicable to Compound X?

Scenario 2: Select screened compounds on which to pursue additional screening or follow-up chemistry

Research Objective: A researcher has observed that hundreds of compounds have been screened in a ChemBank project and would now like to select a small number of those compounds for their own additional screening or follow-up chemistry.

ChemBank Overview:

Tutorial: For this example, the project of interest is DihydroorotateDehydrogenase.

Find compounds that scored as hits in the project of interest.

1.      Click ‘view projects’ and then click DihydroorotateDehydrogenase.

ChemBank displays the ‘View Project’ page: http://chembank.broad.harvard.edu/assays/view-project.htm?id=1000454.

Notice that the project contains pairs of assays taken at the 0-minute and 30-minute time points. The assays named “Calc…” show the change between the two time points, which is the value of interest. In looking at this project, we focus on the “Calc…” assays.

2.      Click ‘[find hits]’ to find the compounds that scored as standard hits in the assays of this project.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046984.

Use a heatmap to visualize the molecules and the assays in which they were screened. Focus on the values of interest in this project by including only the Calc assays of the DihydroorotateDehydrogenase project in the heatmap.

3.      Click ‘[view multi-assay result heatmap]’.

ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap: http://chembank.broad.harvard.edu/chemistry/featureSelection.htm?molSearchId=5046984.

4.      In the ‘Select Projects and Assays’ list box, click the plus (+) icon next to DihydroorotateDehydrogenase to display the assays in that project.

5.      Select the calculated assays by clicking the check box next to each assay. For this example, select assays 1021.0018, 1021.0019 and 1021.0020.

6.      Click the ‘generate visualization’ button to display the heatmap.

ChemBank displays a heatmap that shows the molecules that were on the search page and the selected calculated assays in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046984&featureSelectId=5046985.

7.      Scroll the heatmap scanning for the dark blue and dark red cells that indicate the lowest and highest CompositeZ scores for these compounds. The dark blue cells show that the compounds that scored as hits in assay 1021.0019 have the lowest CompositeZ scores.

Use an assay histogram to select the compounds with the lowest CompositeZ scores in assay 1021.0019.

8.      Double-click assay 1021.0019 to display its details: http://chembank.broad.harvard.edu/assays/view-assay.htm?id=1004057.

9.      Click ‘[view histogram]’ to display a histogram of the CompositeZ scores for the assay: http://chembank.broad.harvard.edu/assays/assay-histogram.htm?id=1004057&method=CompositeZ&min=NaN&max=NaN&mock=false.

10.  Select the compounds with a CompositeZ score less than -20 by clicking on the histogram image at -20 and dragging the cursor to the left side of the histogram (it does not matter where the cursor is on the vertical axis). ChemBank draws a box around the selected portion of the histogram.

Tip: You can also manually input values into the boxes to the left of the histogram image rather than drawing a box within the histogram.

11.  Click ‘[view molecules in range as list]’ to display the selected compounds on the search result page: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046988.
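The histogram selection in steps 10 and 11 amounts to filtering compounds by a score range. The Python sketch below illustrates this on made-up data. The function name, the compound labels, and the scores are hypothetical, and treating the range boundaries as inclusive is an assumption.

```python
def select_in_range(scores, minimum=None, maximum=None):
    """Return the IDs whose score lies between minimum and maximum.

    'scores' maps an ID to its CompositeZ score. A bound of None means
    that side of the range is open. Bounds are inclusive (an assumption).
    """
    selected = []
    for compound_id, score in scores.items():
        if minimum is not None and score < minimum:
            continue
        if maximum is not None and score > maximum:
            continue
        selected.append(compound_id)
    return sorted(selected)


# Made-up scores for illustration only.
example_scores = {"A": -25.1, "B": -19.9, "C": 3.0, "D": -40.0}
strongly_negative = select_in_range(example_scores, maximum=-20.0)
```

Dragging from -20 to the left edge of the histogram corresponds to setting only the upper bound, as in the example.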

Use an assay scatterplot to determine the replicate reproducibility of a selected small‑molecule test.

12.  Click a representative molecule from the list to display information about the molecule. For this example, click ChemBankID 3052589: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=3052589.

13.  The Screening Test Instances section of the ‘Molecule Display’ page shows compound activity across all projects. Click the ‘Composite Z-score’ heading once to sort the list by ascending CompositeZ scores. The assay of interest, 1021.0019, with its low CompositeZ score moves to the top of the list.

14.  Click assay 1021.0019, DihydroorotateDehydrogenase: Calc(E1-E2), to display its details: http://chembank.broad.harvard.edu/assays/view-assay.htm?cbid=3052589&id=1004057.

15.  Click ‘[view scatterplot]’ to determine replicate reproducibility for the compound of interest, ChemBankID 3052589. The plot shows the CompositeZ scores of both mock-treatment and compound-treatment wells from the selected assay. The score for the compound of interest is highlighted in cyan. In this example, most scores (including the highlighted score) lie on the diagonal, indicating similar results for both replicates: http://chembank.broad.harvard.edu/assays/assay-scatter-plot.htm?id=1004057&cbid=3052589&method=ZScoreValue&mock=false&replicate1=A&replicate2=B&minrep1=NaN&maxrep1=NaN&minrep2=NaN&maxrep2=NaN.

16.  Optionally, check the replicate reproducibility of other molecules in the list.
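Reading the scatterplot comes down to asking how far each point lies from the diagonal, where replicate A equals replicate B. The Python sketch below computes that distance for one pair of replicate scores. It is a geometric illustration only and is not the Reproducibility value that ChemBank calculates; the function names and the tolerance are hypothetical.

```python
import math


def distance_from_diagonal(replicate_a, replicate_b):
    """Perpendicular distance from the point (a, b) to the line a = b."""
    return abs(replicate_a - replicate_b) / math.sqrt(2)


def is_near_diagonal(replicate_a, replicate_b, tolerance=1.0):
    """True if the two replicate scores agree within a chosen tolerance.

    The default tolerance is an arbitrary illustration value.
    """
    return distance_from_diagonal(replicate_a, replicate_b) <= tolerance
```

A point with identical replicate scores has a distance of zero; the larger the distance, the less the two replicates agree.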

Use a heatmap to determine whether the compounds active in this project are selectively active.

17.  Click the Back button of the browser until you return to the list of the molecules with the lowest CompositeZ scores in assay 1021.0019 of the DihydroorotateDehydrogenase project.

18.  Click ‘[view multi-assay result heatmap]’. ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap. Select all projects and assays by clicking ‘Check all’ and then click the ‘generate visualization’ button: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046988&featureSelectId=5046989.

19.  Scroll the heatmap scanning for the dark blue and dark red cells that indicate the lowest and highest CompositeZ scores for these compounds. The heatmap shows relatively restricted activity in projects other than project 1021, DihydroorotateDehydrogenase.

 

Further exploration: Decide how you might choose compounds for follow-up studies. For example, would you choose selectively active compounds to follow up? Would you choose molecules that are structurally similar or different? Once you choose the compounds in which you are interested, you can record their ChemBankID numbers separately in a file for future reference.

Scenario 3: Determine which small molecules may perturb biological pathways and processes

Research Objective: A researcher studying a biological pathway or process has found a related high-throughput screening (HTS) project in ChemBank (the biological object of the project is a cell line or an organism). ChemBank also contains screening projects for purified proteins: small-molecule microarray (SMM) projects and HTS projects for homogeneous proteins. The researcher seeks to identify assays related to the biological process of interest, or SMM projects containing proteins known to be involved in that biological process. By searching for compounds that are active in both types of assays, the researcher hopes to identify small molecules that may affect the biological pathway or process being studied.

ChemBank Overview:

Tutorial: For this example, the project of interest is PSACAntagonistScreen.

1.      Click ‘view projects’ and then click PSACAntagonistScreen.

ChemBank displays the ‘View Project’ page: http://chembank.broad.harvard.edu/assays/view-project.htm?id=1000468.

2.      Click ‘[find hits]’ to find the compounds that scored as standard hits in the assays of this project.

The ChemBank search returns thousands of molecules: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046990.

Modify the search to find compounds that scored as standard hits in PSACAntagonistScreen AND in a small-molecule microarray (SMM) project.

3.      Click ‘[modify]’ to modify the query.

ChemBank displays the ‘Molecule search builder’ page: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5046990.

4.      From the drop-down list labeled ‘Select a criterion to add’, select Assay and then click the ‘add’ button.

ChemBank displays the ‘Search by assay’ page.

5.      From the ‘Assay Type’ drop-down list, select ‘small-molecule microarray’. ChemBank filters the list of projects and assays to display only small-molecule microarray projects. For this example, select the small-molecule microarray project SMMDIV06Annotation. Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046992.

Each molecule scored as a standard hit in the project of interest and as a standard hit in the small-molecule microarray project SMMDIV06Annotation. For this example, focus on the molecule with ChemBankID 2144641.

6.      Click ChemBankID 2144641 to display the molecule details.

ChemBank displays the ‘Molecule Display’ page: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=2144641.

7.      Click the ‘Composite Z-score’ heading to sort the screening test instances by descending CompositeZ score. Locate a small-molecule microarray assay in which the compound scored as a hit. For this example, select the SMMDIV06Annotation assay with the highest CompositeZ score, 1066.0011.

8.      Click the assay name, ManualSNR(1066.0011), to view assay details, including the name of the protein that was screened.

ChemBank displays the ‘View Assay’ page: http://chembank.broad.harvard.edu/assays/view-assay.htm?cbid=2144641&id=1003695.

9.      Click the protein name for more information about the protein.

ChemBank displays the ‘Protein’ page: http://chembank.broad.harvard.edu/protein/view-protein.htm?id=56.

10.  Click the Back button of the browser to return to the ‘Molecule Display’ page. Optionally, examine the other SMMDIV06Annotation assays and their proteins.

Tip: To list the small-molecule microarray assays together, click the ‘Assay Type’ heading to sort the screening test instances by assay type.

11.  Click the Back button of the browser to return to the search results page: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046992.

On the search results page, each molecule scored as a standard hit in the project of interest and in at least one screening assay for a purified protein. For a molecule of interest, use the ‘Molecule Display’ page to find other HTS projects in which it scored as a hit. For this example, focus on the molecule with ChemBankID 2144641.

12.  Click ChemBankID 2144641 to display molecule details.

ChemBank displays the ‘Molecule Display’ page: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=2144641.

13.  Click the ‘Composite Z-score’ heading to sort the screening test instances by descending CompositeZ score. In addition to scoring as a standard hit in the PSACAntagonistScreen and SMMDIV06Annotation projects, this compound also scored as a hit in the FacioscapulohumeralMD, HemeDetoxification, and PKCoreAssaySet projects.

Use a heatmap to examine the compound response patterns across these projects.

14.  Click the Back button of the browser to return to the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046992.

15.  Click ‘[view multi-assay result heatmap]’.

ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap: http://chembank.broad.harvard.edu/chemistry/featureSelection.htm?molSearchId=5046992.

16.  In the ‘Select Projects and Assays’ list box, select the projects of interest: FacioscapulohumeralMD, HemeDetoxification, PKCoreAssaySet, PSACAntagonistScreen and SMMDIV06Annotation. Click the ‘generate visualization’ button.

ChemBank displays a heatmap that shows the molecules on the search result page and the assays (from the selected projects) in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046992&featureSelectId=5046994.

17.  Examine the assays in project 1066, SMMDIV06Annotation. The response pattern for the Proco-J/BostonU compound (well 2111L03) is distinctly different from that of the Neumann-C/Harvard compounds (wells 2098E03, 2097G06, 2098G18 and 2098E08). While all five compounds show reactivity in assays 1066.0001 and 1066.0002, only the Proco-J/BostonU compound shows consistently negative CompositeZ scores across the other assays in that project. To examine a compound, including its structure, double-click the compound name.

18.  The response patterns for the Neumann-C/Harvard compounds appear to have potentially interesting similarities across the assays of project 1051 (HemeDetoxification) and potentially interesting differences across the assays in project 1035 (PSACAntagonistScreen). For more information about an assay, double-click the assay number.

Further exploration: Decide how to determine which small molecules might affect the biological process in the HTS assay. Do any of those molecules also have an effect on a purified protein (found in an SMM assay) that is known to participate in the same biological process?

 

Scenario 4: Dissect small-molecule structure using assay profiles

Research Objective: A researcher has isolated or synthesized a small molecule and is now interested in examining response patterns for structurally related compounds. Because this is a known molecule, the researcher knows the structure, SMILES string, or ChemBankID of the molecule.

ChemBank Overview:

Tutorial: For this example, the known molecular structure is:

 

Find molecules structurally related to the molecule of interest.

1.      Under ‘Find Small Molecules’, click ‘by substructure’. On the ‘Search by substructure’ page, use the JME Molecular Editor to draw the molecular structure shown above. Click the ‘search now’ button to find molecules that contain that substructure.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046995.

Modify the search to find molecules that contain this substructure AND that scored as standard hits in any assay.

2.      Click ‘[modify]’ to modify the query: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5046995.

3.      From the drop-down list labeled ‘Select a criterion to add’, select Assay and then click the ‘add’ button.

ChemBank displays the ‘Search by assay’ page: http://chembank.broad.harvard.edu/chemistry/search/input/assayResults.htm?id=5046995.

4.      Select all projects and assays by clicking ‘Check all’. By default, ChemBank finds molecules that scored as standard hits in the selected assays.

5.      Click the ‘add to search’ button to review the search criteria or the ‘search now’ button to begin the search.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5046996.

Use a heatmap to visualize the molecules on the search result page and the assays in which they were screened.

6.      Click ‘[view multi-assay result heatmap]’.

ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap: http://chembank.broad.harvard.edu/chemistry/featureSelection.htm?molSearchId=5046996.

7.      Select all projects and assays by clicking ‘Check all’ and then click the ‘generate visualization’ button.

ChemBank displays a heatmap that shows molecules on the search result page and the assays in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046996&featureSelectId=5046997.

8.      Scroll the heatmap scanning for the dark blue and dark red cells that indicate the lowest and highest CompositeZ scores for these compounds.

For this example, notice that the compound of interest is active in project 1012 and its compound response patterns vary across the assays in that project.

Examine project 1012.

9.      From the heatmap, double-click an assay to display its information. For example, double-click assay 1012.0064.

ChemBank displays the ‘View Assay’ page: http://chembank.broad.harvard.edu/assays/view-assay.htm?id=1002522.

10.  From the ‘View Assay’ page, click the project name to display information about the project.

ChemBank displays the ‘View Project’ page for the NOXSuperoxideGeneration project: http://chembank.broad.harvard.edu/assays/view-project.htm?id=1000446.

Simplify the heatmap by displaying only the NOXSuperoxideGeneration project.

11.  Return to the heatmap. Use the Back button of the browser to redisplay the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap: http://chembank.broad.harvard.edu/chemistry/featureSelection.htm?molSearchId=5046996.

12.  Select the NOXSuperoxideGeneration project and click the ‘generate visualization’ button.

ChemBank displays a heatmap that shows the same set of compounds and the assays (in the NOXSuperoxideGeneration project) in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5046996&featureSelectId=5046999.

Group compounds by structure in the heatmap and examine their response patterns.

13.  Click the ‘SMILES’ heading to sort the compounds by SMILES string.

ChemBank sorts the compounds by ordering the SMILES strings alphabetically. This is a crude way of grouping compounds by structural similarity.

14.  Notice that compounds 2110K11 and 2110K03 have similar response patterns: high CompositeZ scores in assays 1012.0064 and 1012.0065 and low scores in all other assays. Compound 2111M11 has a distinctly different response pattern: high CompositeZ scores in all assays.

15.  For a visualization of the response pattern, select all assays (shift-click the assay numbers) and compounds 2110K11, 2110K03 and 2111M11 (control-click the compound names) and then select ‘View > Profile’ from the heatmap menu bar. When you are finished viewing the profile, close the profile window.

16.  Double-click the compound name to display more detailed information, including its molecular structure and activity across assays in all projects. For example, double-click 2110K11.

ChemBank displays the ‘Molecule Display’ page: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=3011059.

17.  Display the ‘Molecule Display’ page for the three compounds of interest: 2110K11, 2110K03 and 2111M11. As one would expect, the first two have a similar structure, which is distinct from that of the third.
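The limits of sorting by SMILES string (step 13) can be illustrated with a minimal Python sketch. The SMILES strings below are illustrative and are not taken from ChemBank, and Python's case-sensitive ordering is an assumption; the sort order that ChemBank applies may differ.

```python
# Illustrative SMILES strings (assumption: not taken from ChemBank).
# "c1ccccc1O" and "Oc1ccccc1" are two valid ways of writing phenol.
smiles = ["c1ccccc1O", "CCO", "Oc1ccccc1", "CCN", "c1ccccc1"]

# A plain string sort compares characters left to right and knows
# nothing about chemistry.
ordered = sorted(smiles)

for s in ordered:
    print(s)
```

In this ordering the two spellings of phenol are not adjacent, which is why the sorted list is only a rough guide and the response patterns and ‘Molecule Display’ pages remain the real check.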

Find molecules that have a response pattern similar to compounds 2110K11 and 2110K03. Generate a Structure-Data File (SDF) for those molecules.

18.  Return to the heatmap and examine the response pattern for 2110K11 and 2110K03. These compounds have CompositeZ scores greater than 5.4 in assays 0064 and 0065 and less than -1.5 in assays 0068 and 0069.

Tip: Hover over a cell of the heatmap to see the CompositeZ scores.

19.  Define the first search criterion: molecules that have CompositeZ scores > 5.4 in assay 1012.0064.

Under ‘Find Small Molecules’, click ‘by assay’. On the ‘Search by assay’ page, select assay 0064 from the NOXSuperoxideGeneration project. At the bottom of the page, use the drop-down list to select ‘molecules satisfying the condition’ and then specify CompositeZ > 5.4 to complete the search criterion. Click the ‘add to search’ button.

ChemBank displays the ‘Molecule search builder’ page: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5047001.

Tip: Projects can have many assays. On the ‘Search by assay’ page, use the Find function of the browser to find the correct assay.

20.  Define the second search criterion: molecules that have CompositeZ scores > 5.4 in assay 1012.0065.

From the drop-down list labeled ‘Select a criterion to add’, select Assay and then click the ‘add’ button. On the ‘Search by assay’ page, select assay 0065 from the NOXSuperoxideGeneration project. At the bottom of the page, use the drop-down list to select ‘molecules satisfying the condition’ and then specify CompositeZ > 5.4 to complete the search criterion. Click the ‘add to search’ button: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5047002.

21.  Define the third search criterion: molecules that have CompositeZ scores < -1.5 in assay 1012.0068: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5047003.

22.  Define the fourth search criterion: molecules that have CompositeZ scores < -1.5 in assay 1012.0069: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5047004.

23.  Click the ‘search’ button.

ChemBank displays the two molecules from the heatmap. No other molecules screened in this project have this response pattern: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5047004.

24.  Click ‘[export as SDF]’ to generate an SDF for these molecules.
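The four criteria defined in steps 19 through 22 amount to a logical AND over per-assay thresholds. The following Python sketch applies the same logic to a small in-memory table. The compound names and scores are hypothetical, and the data layout is an assumption made for illustration; it is not how ChemBank stores or evaluates searches.

```python
# Hypothetical CompositeZ scores keyed by compound and assay
# (assumption: invented values, not real ChemBank data).
scores = {
    "compound_A": {"1012.0064": 6.1, "1012.0065": 5.9, "1012.0068": -2.0, "1012.0069": -1.8},
    "compound_B": {"1012.0064": 6.3, "1012.0065": 5.5, "1012.0068": 0.4, "1012.0069": -1.9},
    "compound_C": {"1012.0064": 4.9, "1012.0065": 5.0, "1012.0068": 5.2, "1012.0069": 4.8},
}

# The thresholds from steps 19 through 22: (assay, comparison, value).
criteria = [
    ("1012.0064", ">", 5.4),
    ("1012.0065", ">", 5.4),
    ("1012.0068", "<", -1.5),
    ("1012.0069", "<", -1.5),
]


def matches(profile, criteria):
    """Return True only if the profile satisfies every criterion."""
    for assay, comparison, threshold in criteria:
        value = profile.get(assay)
        if value is None:
            return False  # not screened in this assay
        if comparison == ">" and not value > threshold:
            return False
        if comparison == "<" and not value < threshold:
            return False
    return True


hits = [name for name, profile in scores.items() if matches(profile, criteria)]
print(hits)
```

Each added criterion can only narrow the result set, which is consistent with the search returning just the two molecules seen in the heatmap.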

Find molecules that have a response pattern similar to compound 2111M11. Generate a Structure-Data File (SDF) for those molecules.

25.  Return to the heatmap (displayed in step 12) and examine the response pattern for 2111M11. This compound has CompositeZ scores greater than 4.7 in assays 0064, 0065, 0066, 0067, 0068 and 0069 of the NOXSuperoxideGeneration project.

26.  Define a search to find all molecules that have this response pattern: http://chembank.broad.harvard.edu/chemistry/search.htm?id=5047012.

27.  Click the ‘search’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5047012.

28.  Click ‘[export as SDF]’ to generate an SDF for these molecules.

Further exploration: Use the downloaded SDF to visualize and compare structural features of molecules with similar response patterns using other programs that can read SDF format.
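As one possible starting point for reading the exported file in another program, the sketch below parses SDF text with plain Python. It relies only on two general conventions of the SDF format: records end with a line containing ‘$$$$’, and data items begin with a header line such as ‘> <FieldName>’. The sample record, the title lines, and the ‘ChemBankID’ field name are placeholders; the fields that ChemBank actually writes to its SDF export are not specified in this guide.

```python
import re

# Hypothetical two-record SDF text (assumption: titles, IDs, and the
# "ChemBankID" field name are placeholders, not real ChemBank output).
SAMPLE_SDF = """example-1
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChemBankID>
0000001

$$$$
example-2
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <ChemBankID>
0000002

$$$$
"""


def parse_record(lines):
    """Extract the title line and the data items from one SDF record."""
    title = lines[0].strip() if lines else ""
    properties = {}
    i = 0
    while i < len(lines):
        header = re.match(r">.*<([^>]+)>", lines[i])
        if header:
            i += 1
            values = []
            # Data values run until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            properties[header.group(1)] = "\n".join(values)
        i += 1
    return {"title": title, "properties": properties}


def parse_sdf(text):
    """Split SDF text into records at each '$$$$' delimiter line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


records = parse_sdf(SAMPLE_SDF)
for record in records:
    print(record["title"], record["properties"])
```

This sketch does not interpret the atom and bond blocks. A cheminformatics toolkit is the better choice for drawing or comparing the structures themselves.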

Scenario 5: Identify structurally related small molecules with known biological functions

Research Objective: A researcher has identified compounds with a known biological function and would like to identify structurally related compounds involved in the same biological function.

ChemBank Overview:

Tutorial: For this example, the known biological function of the compounds of interest is their use in the treatment of asthma (Therapeutic Indication=Asthma).

Find the molecules of interest and browse their chemical structures.

1.      Under ‘Find Small Molecules’, click ‘by function’. On the ‘Search by function’ page: for Ontology, select ‘Therapeutic Indication’; for Term, enter ‘Asthma’; and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button to start the search.

Tip: Ontological terms are case-sensitive. If you are unsure of a term, use the Browse (magnifying glass) button to select it rather than simply typing it in.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034459.

2.      Examine the structures of the molecules by browsing the search results. Notice that many of the molecules include a particular substructure of four fused rings. For this example, select molecule 1000123 as having a chemical structure of interest.

Find molecules with similar chemical structures.

3.      Click ChemBankID 1000123 to display the ‘Molecule Display’ page: http://chembank.broad.harvard.edu/chemistry/viewMolecule.htm?cbid=1000123.

4.      Copy the SMILES string for the molecule by clicking it. ChemBank displays the SMILES string in a text box from which it can be copied without embedded line breaks. Copy the SMILES string from the text box and close the pop-up window.

5.      Under ‘Find Small Molecules’, click ‘by similarity’. On the ‘Search by similarity’ page, paste the SMILES string into the SMILES field. Leave the similarity metric set to Tanimoto and the similarity threshold set to 0.8. Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034793.

Expand the search by reducing the similarity threshold from 0.8 to 0.6.

6.      Click ‘[modify]’ to modify the query.

7.      Click ‘[edit criterion]’ to modify the structure criterion. Notice that the structure has been drawn for you in the JME Molecular Editor.

8.      On the ‘Search by similarity’ page, set the similarity threshold to 0.6 and click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038367.

9.      Click ‘[export as text]’ to save the results to a text file. Rename the file similarity_1000123.txt.

Tip: A similarity search returns molecules that have the smallest number of “unshared” features when compared to the query structure. A substructure search returns molecules that share the query structure, but may have complex superstructures (extensive “unshared” features). For a similarity search, simplifying the query structure is likely to reduce the number of molecules found. For a substructure search, simplifying the query structure is likely to increase the number of molecules found.
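The contrast in the tip above can be made concrete with a small Python sketch. The Tanimoto coefficient of two feature sets is the number of shared features divided by the number of features in either set. The integer ‘features’ below are hypothetical stand-ins for fingerprint bits, and the subset test is only an analogy for substructure matching, which in practice compares molecular graphs; neither is a description of how ChemBank computes its results.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical feature sets (assumption: integers stand in for
# structural features; they are not real fingerprints).
query = set(range(10))                # full query structure
simplified = set(range(5))            # query after deleting part of it
molecule = set(range(8)) | {20, 21}   # a candidate molecule

print(round(tanimoto(query, molecule), 3))       # similarity to full query
print(round(tanimoto(simplified, molecule), 3))  # similarity to simplified query

# Subset test as a loose analogy for a substructure match.
print(query <= molecule)       # full query is not contained in the molecule
print(simplified <= molecule)  # simplified query is contained
```

Simplifying the query lowers the similarity score for this molecule because more of its features are now unshared, while the same simplification turns a failed containment test into a successful one.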

Modify the query structure.

10.  Click ‘[modify]’ to modify the query and ‘[edit criterion]’ to modify the structure criterion.

11.  Use the delete (DEL) function of the JME Molecular Editor to simplify the chemical structure, as shown in the following figure.

[Figure: chemical structure of 1000123 and the edited structure]


Note: Due to a known software issue, when you modify the query, the chemical structure may be drawn incorrectly. To avoid this issue, set the similarity threshold to 0.6 in step 5 and skip steps 6 through 8. Or, click the following link and begin at step 10: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5050874.

12.  Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038368.

13.  Click ‘[export as text]’ to save the results to a text file. Rename the file similarity_edited.txt.

Use a substructure search to find molecules that share the edited query structure.

14.  Copy the SMILES string for the edited structure as it appears in the query statement at the top of the search results page.

15.  Under ‘Find Small Molecules’, click ‘by substructure’. On the ‘Search by substructure’ page, paste the SMILES string into the SMILES/SMARTS field and click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034521.

16.   Click ‘[export as text]’ to save the results to a text file. Rename the file substructure_edited.txt.

Compare the three result files. Sorting by ChemBankID reveals the overlap between the search results. Sorting by molecular weight reveals the higher average molecular weights of the overall larger compound structures in the substructure search result.
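The overlap between result files can also be computed outside a spreadsheet. The following Python sketch compares two sets of IDs read from tab-delimited text. The column header ‘ChemBankID’ and the sample rows are assumptions; check the header line of the exported files for the actual column names.

```python
import csv
import io


def read_ids(text, column="ChemBankID"):
    """Collect the values of one column from tab-delimited text.

    The default column name is an assumption about the export format.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[column] for row in reader}


# Hypothetical file contents (assumption: invented IDs and SMILES).
similarity_text = "ChemBankID\tSMILES\n1\tC\n2\tCC\n3\tCCC\n"
substructure_text = "ChemBankID\tSMILES\n2\tCC\n3\tCCC\n4\tCCCC\n"

similarity_ids = read_ids(similarity_text)
substructure_ids = read_ids(substructure_text)

shared = sorted(similarity_ids & substructure_ids)
only_substructure = sorted(substructure_ids - similarity_ids)

print("in both searches:", shared)
print("substructure search only:", only_substructure)
```

To run this on the saved results, read the contents of similarity_1000123.txt, similarity_edited.txt and substructure_edited.txt and pass each to read_ids in place of the sample strings.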

Determine which search was most effective in finding compounds with the known biological function of interest (Therapeutic Indication=Asthma).

17.  Copy all ChemBankIDs from the first result file.

18.  Under ‘Find Small Molecules’, click ‘by user list’ and paste the copied IDs into the ‘ChemBankIDs’ list box.

19.  Click the ‘add to search’ button. ChemBank displays the ‘Molecule search builder’ page.

20.  From the drop-down list labeled ‘Select a criterion to add’, select Function and click the ‘add’ button.

21.  On the ‘Search by function’ page: for Ontology, select ‘Therapeutic Indication’; for Term, enter ‘Asthma’; and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button to start the search.

22.  Repeat the process for the other result files.

The similarity search for the chemical structure of 1000123 finds 6 compounds: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034612.

The similarity search for the edited chemical structure finds 0 compounds: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034614.

The substructure search for the edited chemical structure finds 6 compounds, the same 6 compounds as the first similarity search: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5034616.

Shortcut: After displaying details for molecule 1000123, this scenario finds molecules with similar chemical structures by copying the SMILES string from the ‘Molecule Display’ page to the ‘Search by similarity’ page (steps 4 and 5). Alternatively, from the ‘Molecule Display’ page, you can find molecules with similar chemical structures by clicking ‘[find similar molecules]’.

Further exploration: ChemBank biological annotation is not complete at this time. Not all molecules with known biological effects have been annotated. Annotation of ChemBank molecules is an ongoing project, so check back often for updated biological annotations and enhanced searching capabilities.

 

Scenario 6: Download data for further calculation in external applications

Research Objective: A researcher would like to download information from ChemBank for use in other applications.

ChemBank Overview:

ChemBank provides several options for downloading data, as summarized in the following table. In general, from a ChemBank display page, click ‘[download data]’ or ‘[export as text]’ to write the associated data to a text file.

Note: You must be logged into ChemBank as a registered user to download data.

Object: Projects & Assays
Instructions: Click ‘advanced assay search’, click ‘Search’, and then click ‘[export as text]’.
Downloaded Data: project ID, project description, assay description, project name, project motivation, assay name, assay type, species, screener, and organization for all ChemBank projects

Object: Project
Instructions: Click ‘view projects’, click a project, and then click ‘[download data]’.
Downloaded Data: project name, assay name, plate, well, raw data values, background-subtracted values, CompositeZ scores, reproducibility calculations, ChemBankIDs and SMILES strings for all compounds screened in all assays in this project

Object: Assay
Instructions: Click ‘advanced assay search’, click ‘Search’, click an assay, and then click ‘[download data]’.
Downloaded Data: project name, assay name, plate, well, raw data values, background-subtracted values, CompositeZ scores, reproducibility calculations, ChemBankIDs and SMILES strings for all compounds screened in this assay

Object: Search results
Instructions: Find small molecules and then click ‘[export as text]’.
Downloaded Data: ChemBankIDs and SMILES strings for all compounds on the search results page, as well as any additional information (such as Chemist, Molecule Name, or Descriptors) displayed on the search results page

Object: Search results
Instructions: Find small molecules and then click ‘[export as SDF]’.
Downloaded Data: Structure-Data File (SDF) for the compounds on the search results page

Object: Heatmap
Instructions: Display a heatmap and then click ‘[download data]’. (To display a heatmap: find small molecules, click ‘[view multi-assay result heatmap]’, select assays, click ‘generate visualization’.)
Downloaded Data: matrix of assay names and compounds (with SMILES strings), with CompositeZ scores for each compound in each assay (CompositeZ scores are truncated at +/- 8.53)


Tutorial: For this example, download all possible information for the AspulvinoneUpregulation project. First, view and download project screening data. Second, view and download the molecules screened in the project with all available information about those molecules. Third, view and download functional annotations for the molecules screened in the project. Finally, display a heatmap showing the assays in the project and the compounds screened in those assays. Download the heatmap data.

View the project and download the associated screening data.

1.      Click ‘view projects’ and then click AspulvinoneUpregulation.

ChemBank displays the ‘View Project’ page: http://chembank.broad.harvard.edu/assays/view-project.htm?id=1000455.

2.      Click ‘[download data]’.

ChemBank writes the data to a tab-delimited text file and then prompts you to open or save the file.

3.      Save the file to disk. Depending on the web browser, you may be prompted for a directory location and file name.

Use Excel, a text editor, or another program of your choice to view the data.
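For readers who prefer a script to a spreadsheet, the following Python sketch previews the first rows of a tab-delimited download without assuming particular column names. The sample header and row are hypothetical; the actual headers are whatever appears on the first line of the downloaded file.

```python
import csv
import io
from itertools import islice


def preview(text, n=3):
    """Return the header and the first n rows of tab-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [dict(zip(header, row)) for row in islice(reader, n)]
    return header, rows


# Hypothetical file contents (assumption: invented column names and values).
sample = (
    "ProjectName\tAssayName\tPlate\tWell\tCompositeZ\n"
    "DemoProject\t0001\tP1\tA01\t1.25\n"
    "DemoProject\t0001\tP1\tA02\t-0.40\n"
)

header, rows = preview(sample, n=1)
print(header)
print(rows[0])
```

To use it on the saved file, read the file contents into a string and pass that string to preview. Values are returned as text, so convert numeric columns with float() before calculating with them.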

Find all molecules screened in this project.

4.      Under ‘Find Small Molecules’, click ‘by assay’. On the ‘Search by assay’ page, select the AspulvinoneUpregulation project. At the bottom of the page, use the drop-down list to direct ChemBank to find all screened molecules (by default, ChemBank finds molecules that scored as standard hits). Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038913.

Download chemist, molecule name, and descriptor values for the compounds screened in this project: modify the query to display that information and then download the search results.

5.      Add the criterion Chemist = “*”.

Click ‘[modify]’ to modify the query. From the drop-down list labeled ‘Select a criterion to add’, select Chemist and then click the ‘add’ button. On the ‘Search by chemist’ page, input “*” in the text box to return the source of synthesis for every compound. Click the ‘search now’ button.

Tip: The asterisk (*) is a wildcard character that matches zero or more characters. In this example, entering “*” finds every instance of the search object.

ChemBank returns significantly more results – all compounds synthesized for all molecules screened in this project. The search results include the source of synthesis for each compound: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038914.
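The behavior of the asterisk wildcard used in step 5 can be illustrated with Python's standard fnmatch module. This is only an analogy: the names are hypothetical, the ‘asp*’ pattern is an assumed usage, and fnmatch also gives special meaning to ‘?’ and ‘[ ]’, which ChemBank may not.

```python
from fnmatch import fnmatchcase

# Hypothetical values (assumption: not taken from ChemBank).
names = ["", "aspirin", "asparagine", "caffeine"]

# "*" matches zero or more characters, so it matches every value,
# including the empty string.
all_matches = [name for name in names if fnmatchcase(name, "*")]

# A prefix followed by "*" matches only values with that prefix.
prefix_matches = [name for name in names if fnmatchcase(name, "asp*")]

print(all_matches)
print(prefix_matches)
```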

6.      Add the criterion Molecule Name = “*”.

Click ‘[modify]’ to modify the query. From the drop-down list labeled ‘Select a criterion to add’, select Molecule Name and then click the ‘add’ button. On the ‘Search by molecule name’ page, input “*” in the text box to return the molecule name for every compound. Click the ‘search now’ button.

The ChemBank search returns the same compounds and the search results now include the molecule name for each compound: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038915.

7.      Add the descriptor criterion Aqueous Solubility >= -59.44.

Click ‘[modify]’ to modify the query. From the drop-down list labeled ‘Select a criterion to add’, select Descriptor and then click the ‘add’ button. On the ‘Search using descriptors’ page, select the first descriptor, Aqueous Solubility. To return that descriptor value for every compound, select greater than or equal to (>=) and input the minimum value displayed (-59.44) into the text box. Click the ‘search now’ button.

The ChemBank search returns fewer compounds than before and the search results now include the Aqueous Solubility descriptor value for each compound. Fewer compounds are returned because a subset of ChemBank compounds (for example, 701 and 1484) are missing descriptor values. These compounds are removed from the search results when descriptors are added to the query: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038917.
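The drop in result count can be reproduced with a short Python sketch: a comparison against a missing value cannot succeed, so compounds without the descriptor fall out even when the threshold is the minimum value. The compound identifiers and values below are hypothetical, and the use of None for a missing value is an assumption made for illustration.

```python
# Hypothetical compounds (assumption: invented identifiers and values;
# None marks a missing Aqueous Solubility descriptor).
compounds = [
    {"id": "cpd-1", "aqueous_solubility": -3.2},
    {"id": "cpd-2", "aqueous_solubility": None},
    {"id": "cpd-3", "aqueous_solubility": -59.44},
]

MINIMUM = -59.44  # the minimum value entered in step 7

kept = [
    c["id"]
    for c in compounds
    if c["aqueous_solubility"] is not None and c["aqueous_solubility"] >= MINIMUM
]
dropped = [c["id"] for c in compounds if c["id"] not in kept]

print("kept:", kept)
print("dropped:", dropped)
```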

8.      Add additional descriptor criteria as desired.

9.      From the search result page, click ‘[export as text]’.

ChemBank writes the data to a tab-delimited text file and then prompts you to open or save the file.

10.  Save the file to disk. Depending on the web browser, you may be prompted for a directory location and file name.

Use Excel, a text editor, or another program of your choice to view the data.

Download functional annotations for the compounds.

11.  Remove all previously added search criteria except the first: that the molecule was screened in any assay of project AspulvinoneUpregulation.

Click ‘[modify]’ to modify the query. Click ‘[remove criterion]’ next to the criterion that you want to remove.

12.  Modify the query to add the functional annotation. From the drop-down list labeled ‘Select a criterion to add’, select Function and then click the ‘add’ button.

13.  On the ‘Search by function’ page: for Ontology, select ‘Therapeutic Indication’. For Term, use the Browse (magnifying glass) button to select the MeSH root term and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button.

ChemBank displays the compounds screened in this project that have annotations for Therapeutic Indication: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038944.

14.  From the search result page, click ‘[export as text]’ and save the results to a file.

15.  Modify the query and edit the Therapeutic Indication criterion.

16.  On the ‘Search by function’ page: for Ontology, select ‘Therapeutic Use’. For Term, use the Browse button to select the root term (‘use classification ontology’) and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button.

ChemBank displays the compounds screened in this project that have annotations for Therapeutic Use: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038946.

17.  From the search result page, click ‘[export as text]’ and save the results to a file.

18.  Modify the query and edit the Therapeutic Use criterion.

19.  On the ‘Search by function’ page: for Ontology, select ‘Biological Process’. For Term, use the Browse button (magnifying glass) to select the root term (‘biological process’) and select the ‘Include child term matches’ check box (the default). Click the ‘search now’ button.

The ChemBank search returns the compounds screened in this project that have annotations for Biological Process: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5038949.

20.  From the search result page, click ‘[export as text]’ and save the results to a file.

View a heatmap of the assays in the project and the compounds screened in those assays. Download the heatmap data.

21.  Under ‘Find Small Molecules’, click ‘by assay’. On the ‘Search by assay’ page, select the AspulvinoneUpregulation project. By default, ChemBank finds molecules that scored as standard hits in the assays of the selected project. Click the ‘search now’ button.

ChemBank displays the search results: http://chembank.broad.harvard.edu/chemistry/search/execute.htm?id=5039734.

22.  Click ‘[view multi-assay result heatmap]’. ChemBank displays the ‘Feature selection’ page, which prompts you to select the assays to display in the heatmap. Select the AspulvinoneUpregulation project and click the ‘generate visualization’ button.

ChemBank displays a heatmap of the compounds from the search result page and the assays (in the AspulvinoneUpregulation project) in which they were screened: http://chembank.broad.harvard.edu/chemistry/featureSelection/visualize.htm?molSearchId=5039734&featureSelectId=5039735.

23.  Click ‘[download data]’ and save the heatmap data to a tab-delimited text file.

Use Excel, a text editor, or another program of your choice to view the data. CompositeZ scores in the heatmap and downloaded file are truncated at +/- 8.53. For actual scores, download the screening data from the ‘View Project’ or ‘View Assay’ page.
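The effect of the truncation can be sketched in Python. The limit of 8.53 comes from this guide; the function names and sample scores are hypothetical, and the clipping shown is an assumed model of the truncation rather than ChemBank's own code.

```python
LIMIT = 8.53  # truncation limit stated for heatmap data


def truncate(score, limit=LIMIT):
    """Clip a score to the range [-limit, +limit]."""
    return max(-limit, min(limit, score))


def is_at_limit(score, limit=LIMIT, tolerance=1e-9):
    """Flag a downloaded value that may hide a more extreme actual score."""
    return abs(abs(score) - limit) <= tolerance


# Hypothetical actual scores (assumption: invented values).
actual = [1.5, 12.0, -20.0, 8.53]
downloaded = [truncate(z) for z in actual]
flags = [is_at_limit(z) for z in downloaded]

print(downloaded)
print(flags)
```

Any heatmap value flagged this way is a candidate for lookup in the untruncated project or assay download.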

Further exploration: Use other software applications to manipulate data downloaded from ChemBank. Compare output to the current data manipulation available within ChemBank. Have suggestions for ChemBank data analysis tools? Email them to the ChemBank team using the following link: http://chembank.broad.harvard.edu/reportSuggestion.htm?type=suggestion&problemPage=%2Fwelcome.htm.


Summary

These scenarios have introduced the user to some of the basic functions of ChemBank. ChemBank is constantly evolving and being updated. In the future, ChemBank will have additional data analysis tools, enhanced biological annotations, and more screening data from new screens and possibly other screening facilities as well. Bookmark ChemBank in your browser and check back often for new features.

Further information on how to use ChemBank can be obtained by going to the Help section at http://chembank.broad.harvard.edu/details.htm?tag=Help. ChemBank encourages user suggestions, which can be mailed to the ChemBank team using the following link: http://chembank.broad.harvard.edu/reportSuggestion.htm?type=suggestion&problemPage=%2Fwelcome.htm.