The G1E Database, G1EDb, contains the experimental results from a microarray-based analysis of differentiating G1E erythroid cells. Experimental results are also available on this site in other formats, and have been deposited with the NIH in the GEO database. G1EDb is particularly useful, however, for reviewing the expression pattern of either individual genes or groups of genes sorted by functional, numerical or other criteria. The database was designed to make it easy to get at the underlying information in an intuitive manner.
All of the functions of the G1EDb are accessed from one screen. The screen shows the transcriptional profile for one probeset of the 12,488 present on the MG-U74Av2 microarray. This information is conveyed at a glance through the use of graphic elements, but detailed information is also present. The user can page through the database probeset by probeset or search for a specific probeset using defined criteria. Subsets of data can be created using search parameters, or by manually picking probesets one at a time. Finally, data can be exported from the database in a number of formats. A number of hyperlinks connect each page to relevant web resources (assuming the user's computer has a live internet connection). A picture of the screen is shown below; it is divided into several thematic regions, which are described in additional detail in the following text. For a larger picture of the screen shot, click on the picture.
Running the Database
The database is distributed in Microsoft Access format, so Microsoft Access is required to use it (MS Access cannot be distributed with the database). The database does not run remotely over the web, so it is necessary to download a copy from our website. Once the database file is present on your computer, double-click its icon. The database should automatically show the first probeset on the array, similar to the picture above. If the form is clipped by the edges of the screen, either drag the edges of the window to enlarge it, or hit the maximize button in the upper right-hand corner of the window. If the form does not fit because of screen resolution limitations, increase your monitor's resolution using the control panel or by right-clicking on the desktop. Some institutions limit your ability to modify desktop settings; if so, either ask your IS department to change your resolution or use the scrollbars to navigate around the database.
The most annoying problem with bioinformatics is that the underlying information is in a constant state of flux. The database is designed around the one invariant element of this experiment: the probeset ID number provided by Affymetrix. Since each probeset is based on a GenBank entry, those GenBank accessions were used to extract information for the database. Perl scripts were written to map GenBank sequences to the U96Mm version of UniGene. The database is modular in design, and when these mappings become sufficiently antiquated, this process can be repeated, substituting newer definitions. There are two titles for each gene because one was extracted directly from GenBank and the other from UniGene. UniGene cluster assignment, LocusLink ID, and tissue specificity were extracted from UniGene entries. Some sequences were not associated with GenBank accession numbers or did not map to UniGene, resulting in NO_GENE and NO_UG entries in the database.
Next to various fields, there are buttons. These are hotlinks to relevant NCBI and Affymetrix resources. Clicking on LocusLink will bring you to the LocusLink page assigned to the gene you are viewing. Clicking on Gene will retrieve all entries with that Gene Symbol, which can take a while to download -- usually this is not the button you want. The Probeset button will attempt to access probeset data from Affymetrix. Presently, retrieval of information from Affymetrix requires an Affymetrix account, and login information must be entered at their site. Additionally, there are two other buttons: NetAffx and Blast. NetAffx brings you to the NetAffx site at Affymetrix, while Blast launches a Blastn search of the corresponding GenBank sequence. This search is performed at the NCBI and the results are returned in a window. The expected behavior of these buttons is that they will open new windows in a web browser. Hitting the browser's back button should return you to the database and close the window.
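The handling of unmapped probesets described above can be sketched in a few lines of Python. This is only an illustration, not the Perl scripts that were actually used: the function name, the dictionary layout, and the identifiers in the example are all made up, and the choice of which placeholder (NO_GENE or NO_UG) goes in which field is an assumption.

```python
NO_GENE = "NO_GENE"  # placeholder named in the text
NO_UG = "NO_UG"      # placeholder named in the text


def annotate(probeset_id, accession_by_probeset, cluster_by_accession):
    """Return an annotation record for one probeset (hypothetical layout)."""
    accession = accession_by_probeset.get(probeset_id)
    if accession is None:
        # Assumption: with no accession, nothing can be looked up in UniGene.
        return {"probeset": probeset_id, "accession": NO_GENE, "unigene": NO_UG}
    # Assumption: an accession that maps to no cluster gets the NO_UG marker.
    cluster = cluster_by_accession.get(accession, NO_UG)
    return {"probeset": probeset_id, "accession": accession, "unigene": cluster}


# Made-up identifiers, for illustration only.
accession_by_probeset = {"probe_1": "ACC001", "probe_2": "ACC002"}
cluster_by_accession = {"ACC001": "Mm.1"}

mapped = annotate("probe_1", accession_by_probeset, cluster_by_accession)
no_cluster = annotate("probe_2", accession_by_probeset, cluster_by_accession)
no_accession = annotate("probe_3", accession_by_probeset, cluster_by_accession)
```

Because the probeset ID is the only key that never changes, refreshing the annotations would only require replacing the two lookup tables with newer ones.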
Since the probeset ID and GenBank accession number should not change, those fields are read-only. The other fields may be changed by the user, but be careful: once you leave the record, those changes become part of the database. It is sensible to keep an untouched copy of the database when you download it, in case a field is changed by accident.
The data synopsis provides a numerical look at the transcript and gives an indication of how well the three experimental replicates agreed. All of these values are derived from analysis with Affymetrix MAS 5.0 software. The synopsis is divided into three tiers.
The initial annotations were derived automatically from a number of databases including UniGene, the GO database and the Mouse Genome Database. Additionally, many transcripts were manually reviewed and categorized. The field at the bottom contains comments pasted in from literature or database searches. An "importance" rating is also present; this allows the user to subjectively designate a probeset as "a winner", "interesting", "maybe" or "forget it". The "Murine" checkbox is present because not every probeset on the array is murine -- for instance, there are several control probesets on the array, such as Cre. Finally, beneath the annotation is the record number. There are a total of 12,488 probesets on the array, each with a unique record number. If the dataset is filtered in some way, this area will state the number of records which passed the filter and can be viewed. If you are not seeing all of the transcripts, a filter may have restricted the dataset. To revert to an unfiltered view, hit the "Reset All" button in the upper right-hand panel.
Two graphs are generated for each probeset: a plot of absolute signal (in red) and a relativized plot (in green). The absolute signal plot is just another way of viewing the absolute signal numbers shown on the left side of the form. To generate the relativized plot, the highest value is set to 100% and the lowest is defined as 0%; intermediate values are scaled proportionally between them. The intention of this plot is to allow comparison of profiles that have different absolute signals but the same overall shape. Different probe affinities and hybridization efficiencies can result in variable signal levels, and using one scale for all transcripts would make it hard to see low signals. This plot can be misleading when a signal does not change much, because it magnifies small differences to fill the whole scale. If these graphs are selected and copied to the clipboard in Access, they can be pasted into other applications.
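The relativized plot corresponds to what is usually called min-max scaling. The Python sketch below illustrates the arithmetic with invented signal values; the function name is hypothetical, and returning all zeros for a completely flat profile is an assumption, since the text does not say how the database handles that case.

```python
def relativize(signals):
    """Scale signals so the lowest maps to 0% and the highest to 100%."""
    lo, hi = min(signals), max(signals)
    span = hi - lo
    if span == 0:
        # Assumption: a perfectly flat profile is drawn at 0%.
        return [0.0 for _ in signals]
    return [100.0 * (s - lo) / span for s in signals]


# Invented values: two profiles with the same shape but a tenfold
# difference in absolute signal, and one nearly flat profile.
strong = [200.0, 400.0, 800.0, 1000.0]
weak = [20.0, 40.0, 80.0, 100.0]
flat = [500.0, 505.0, 510.0, 502.0]

strong_rel = relativize(strong)  # [0.0, 25.0, 75.0, 100.0]
weak_rel = relativize(weak)      # [0.0, 25.0, 75.0, 100.0]
flat_rel = relativize(flat)      # [0.0, 50.0, 100.0, 20.0]
```

The strong and weak profiles become identical after scaling, which is the point of the plot. The flat profile shows the caveat: a change of about 2% in absolute signal fills the entire 0-100% range.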
To narrow down the number of genes displayed (or for output), you may filter the data using several types of parameters, each on its own tabbed page. Parameters from multiple pages may be used, with the understanding that there is an implicit Boolean "AND" in effect -- that is, only data which satisfies all of the filter requirements will be displayed. Thus, if you check "erythroid" and "liver", some erythroid genes will not be displayed because they are not also hepatic genes. To implement the filter, press the "Set Limits" button. To remove all filtering, press "Reset All".
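The implicit "AND" can be illustrated with a short Python sketch. The records, field names and function name here are invented for the example; the database itself applies its filters inside Access.

```python
# Invented records; each flag stands for one checkbox on a filter page.
records = [
    {"probeset": "probe_A", "erythroid": True, "liver": False},
    {"probeset": "probe_B", "erythroid": True, "liver": True},
    {"probeset": "probe_C", "erythroid": False, "liver": True},
]


def set_limits(records, required_flags):
    """Keep only the records that satisfy every requested flag (Boolean AND)."""
    return [r for r in records if all(r[flag] for flag in required_flags)]


both = set_limits(records, ["erythroid", "liver"])
erythroid_only = set_limits(records, ["erythroid"])
unfiltered = set_limits(records, [])  # no requirements, like "Reset All"
```

Only probe_B survives when both flags are requested, even though probe_A is also an erythroid gene. Each added requirement can only shrink the result.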
Within the dataset (which may have been restricted through the filtering discussed above), records can be searched according to a number of parameters. The drop box determines which field will be searched, for example, gene symbol (the internationally agreed upon abbreviation of the gene's name). The search term may be entered in the lower box. If you provide a prefix, the search will return all hits that start with the prefix. For example, "hb" will return all hemoglobin genes (hba-a1, hbb-b1, hbb-b2, etc.). Wildcards may also be used, where * matches any string of characters and ? matches any single character. More advanced searches using regular expressions are not implemented, nor can all fields be searched simultaneously. The Mark Record checkbox will mark the record which is currently displayed. Search and Mark can be used together to generate a custom dataset manually. After marking the relevant records, filter the dataset to include only the marked records by using the qualitative filtering dialog above.
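The prefix and wildcard behavior can be mimicked in Python with the standard fnmatch module, which uses the same meaning for * and ?. This is only a sketch of the matching rules: the function name is hypothetical, case-insensitive matching is an assumption, and the symbols other than the three hemoglobin genes are added purely for illustration. Note that fnmatch also gives square brackets a special meaning, which the text does not mention.

```python
from fnmatch import fnmatchcase


def search(symbols, term):
    """Return the symbols matching a prefix or a */? wildcard pattern."""
    pattern = term.lower()
    if "*" not in pattern and "?" not in pattern:
        # A plain term is treated as a prefix, as described in the text.
        pattern += "*"
    return [s for s in symbols if fnmatchcase(s.lower(), pattern)]


symbols = ["hba-a1", "hbb-b1", "hbb-b2", "gata1", "klf1"]

by_prefix = search(symbols, "hb")        # all three hemoglobin genes
by_single = search(symbols, "hbb-b?")    # hbb-b1 and hbb-b2
by_suffix = search(symbols, "*1")        # every symbol ending in 1
```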
The output options operate on the current dataset, so if it has been filtered, only a subset of the entire database will be written out to a file or printed. This is helpful, because outputting large files or printing large reports takes a while, and the resulting files or print jobs can be very large if the data are not restricted in some way. There are two options for outputting a file: writing it in Microsoft Excel format or as HTML. Often, it is easier to manipulate large chunks of information in a spreadsheet. Additionally, two types of reports (shown below) can be generated from the database: a detailed report and a summary. The detailed report provides essentially the same information as the database screen for each probeset, and probesets are printed one to a page. The summary view prints some identifying information, the absolute signal values for each time point, and reproduces the relative expression plot for each transcript. Transcripts printed as a summary are roughly organized by transcription profile shape, but no formal clustering methodology should be assumed. Multiple transcripts are printed on each page of the summary report.
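For readers who want to post-process an exported subset outside of Access, the sketch below shows how a filtered list of records could be written as a simple HTML table in Python. It is not the export routine used by the database; the function name, column names and values are invented for the example.

```python
from html import escape


def to_html_table(records, columns):
    """Render records as a minimal HTML table, one row per record."""
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    rows = []
    for record in records:
        cells = "".join(f"<td>{escape(str(record[c]))}</td>" for c in columns)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n<tr>" + header + "</tr>\n" + "\n".join(rows) + "\n</table>"


# Invented subset standing in for the current (filtered) dataset.
subset = [
    {"symbol": "hba-a1", "signal": 1200.5},
    {"symbol": "hbb-b1", "signal": 950.0},
]

table = to_html_table(subset, ["symbol", "signal"])
```

Only the records passed in are written, which mirrors the way the output options act on the current dataset rather than on the whole database.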