How to find promoter region on integrated genome viewer?
Answers
To do this workshop, you'll need:
A computer with 5 GB or more of computer memory (RAM)
A computer with Java 1.5 or higher installed
Ability to download and install software from the Internet
Internet connection (the faster the better)
About two hours to do the tutorial from start to finish
You'll also need a basic understanding of molecular biology concepts. Terms like "DNA," "alternative splicing," "transcription," and "gene" should be familiar to you.
You should also be familiar with sequencing technologies and sequencing-based assays, including ChIP-Seq and RNA-Seq.
Download and install Integrated Genome Browser
There are two ways to get IGB.
The first (easiest) way is to use Java Web Start, a system for downloading and running Java-based software programs from the internet. However, your computer may have been configured to block Java Web Start, in which case we'll use a different method (igb.zip) to download and run IGB.
Check memory
To start, check how much memory is available on your computer. You'll need to know this to decide whether to run the large-, medium-, or small-memory version of IGB.
On Mac, choose Apple > About This Mac.
On Windows 7, follow the directions in this link: http://windows.microsoft.com/en-us/windows7/find-out-how-much-ram-your-computer-has
On Windows Vista, follow the directions in this link: http://windows.microsoft.com/en-us/windows-vista/find-out-how-much-ram-your-computer-has
On Debian/Linux use the command $ grep MemTotal /proc/meminfo
Or use Google to find out how to check the memory on your computer
Make a note of the amount of memory your computer has and proceed.
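On Linux, the output of the grep command above can also be read from a script. The sketch below is an illustration, not part of IGB: it parses the MemTotal line of /proc/meminfo (which Linux reports in kB) and converts it to GB. The function name mem_total_gb is hypothetical.

```python
def mem_total_gb(meminfo_text: str) -> float:
    """Return total RAM in GB from the text of /proc/meminfo (Linux only)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like: "MemTotal:       16303428 kB"
            value_kb = int(line.split()[1])
            return value_kb / (1024 * 1024)  # kB -> GB (binary units)
    raise ValueError("MemTotal line not found")
```

Usage on a Linux machine: open /proc/meminfo, read its contents, and pass the text to mem_total_gb. The file is read by the caller so the function itself can be tested on any platform.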
Try Java Web Start to launch IGB
To launch IGB by Java Web Start, go to http://www.bioviz.org and click Download. You will see three IGB icons:
IGB Java Web Start icons
These icons link to a special type of file called a Java Network Launch Protocol (JNLP) file. A JNLP file contains computer-readable instructions for downloading and starting Java-based programs like IGB. However, because downloading and running programs from the Internet is often risky, many system administrators block computers from doing this, which means Java Web Start might not work on your system. It is still worth trying first: if your computer is configured to handle JWS applications, you'll automatically get IGB software updates, and it's convenient and easy (when it works).
If you are running Linux, you can use IcedTea-Web to get Java Web Start functionality. On Debian or Ubuntu, install it with:
sudo apt-get install icedtea-netx
If you install IGB using Java Web Start, then every time you launch IGB, Java Web Start checks whether a new version of IGB is available. If so, the new version is downloaded and installed automatically. If you ever need an older version of IGB, older releases are available at the BioViz.org site.
IGB rules of thumb
If you have 8 GB or more of computer memory, choose the "Large memory" option.
If you have 4 GB (but less than 8 GB) of computer memory, choose "Medium memory" IGB.
If you have less than 4 GB of memory, choose "Small memory" IGB.
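The rules of thumb can be written as a small decision function. This is a sketch, assuming that any machine with at least 4 GB but under 8 GB falls in the medium range; the function name igb_memory_option is hypothetical.

```python
def igb_memory_option(ram_gb: float) -> str:
    """Map installed RAM (in GB) to an IGB memory option, per the rules of thumb."""
    if ram_gb >= 8:
        return "Large memory"
    if ram_gb >= 4:
        # Assumption: everything from 4 GB up to (not including) 8 GB is "medium".
        return "Medium memory"
    return "Small memory"
```

Note that Linux often reports slightly less than the installed amount (for example, about 7.7 GB on an 8 GB machine), so with these strict cutoffs you may want to round up before calling the function.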
When you click the icon, the corresponding JNLP file will download. If all goes well, your browser will launch the Java Web Start plug-in, which will then download IGB from BioViz.org and start it. Java Web Start may also show a window inviting you to create a shortcut icon for IGB on your desktop. If you click "yes", then Java Web Start will put a new IGB icon on your desktop which you can use to start IGB later on.
If this process fails, your computer may not have enough memory to run the version you chose. In that case, try a lower-memory version.
If that doesn't help, the problem may be that your computer is not configured to allow Java Web Start to run. If this is the case, try downloading igb.zip.
If you can't use Java Web Start, download igb.zip
If Java Web Start didn't work, you can instead download igb.zip, unpack it, and launch IGB by double-clicking an IGB start script.
On the download page, click the link labeled igb.zip and save the file.
On Mac, double-click the file to unpack it; on Windows, use a tool like WinZip.
When you unpack igb.zip, you'll see a new folder called igb. Open the folder and double-click one of the IGB start scripts designed for your computer platform. There are three start scripts for the three memory options: small, medium, and large.
The IGB start scripts for Mac end with .command
The IGB start scripts for Windows end with .bat
The IGB start scripts for UNIX end with .sh
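If you launch IGB from your own scripts, the platform-to-extension mapping above can be looked up with Python's standard platform module. This is a sketch: the dictionary and function names are hypothetical, and treating any unrecognized system as UNIX (.sh) is an assumption.

```python
import platform
from typing import Optional

# Start-script extensions listed above, keyed by platform.system() values.
SCRIPT_EXTENSIONS = {
    "Darwin": ".command",  # macOS
    "Windows": ".bat",
    "Linux": ".sh",
}


def start_script_extension(system_name: Optional[str] = None) -> str:
    """Return the IGB start-script extension for the given (or current) OS."""
    name = system_name or platform.system()
    # Assumption: other UNIX-like systems (e.g. FreeBSD) use the .sh scripts.
    return SCRIPT_EXTENSIONS.get(name, ".sh")
```

Calling start_script_extension() with no argument checks the machine it runs on; passing a name such as "Windows" is handy for testing.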
When you double-click a start script, a new window will open called a "shell" or "terminal." This window will remain open until you close IGB. If you close the window, IGB will shut down. So you should leave it open until you are ready to quit IGB.
Promoter sequences are usually the sequences immediately upstream of the transcription start site (TSS) or first exon. If we know the TSS of a gene, we can locate the promoter with reasonable confidence even without experimental characterization. For many organisms, such as human and mouse, the genome is well annotated and TSSs are well defined, so retrieving promoter sequences is an easy task. There are three major genome browsers: NCBI, Ensembl, and UCSC. For our purpose, Ensembl provides the most convenient interface. Here is an example:
1. Go to the Ensembl website: http://www.ensembl.org/index.html
2. Choose an organism, such as human: http://www.ensembl.o...iens/Info/Index
3. Search for your gene, such as BRCA2: http://www.ensembl.o...ns;idx=;q=brca2
4. Click the correct hit on the search results page to open the gene summary page. For example, the link to the BRCA2 gene is http://www.ensembl.o...ns;idx=;q=brca2
5. On the left, under "Gene Summary", click "Sequence". The sequence of the gene is displayed, including the 5' flanking region, exons, introns, and the 3' flanking region.
6. Exons are highlighted with a pink background and red text; the sequence upstream of the first exon is the promoter sequence.
7. By default, 600 bp of 5' flanking sequence (promoter) is displayed. To get more, click "Configure this page" in the lower-left column; a pop-up window opens where you can enter the size of the 5' flanking sequence (upstream). For example, enter "1000" and then save the configuration.
8. Ensembl and UCSC annotations sometimes disagree about the TSS. To make sure the first exon given by Ensembl is correct, copy the promoter sequence.
9. Go to UCSC BLAT search at http://genome.ucsc.e...t?command=start, choose the right genome (e.g., human), and paste the sequence there. On the results page, click the "browser" link for the first hit; this takes you to the Genome Browser page, where the query sequence is aligned with the UCSC genome sequence. Zoom out a bit and you will be able to determine whether the promoter sequence matches the UCSC annotation. If it matches, the sequence is very likely the right one. Here is the BRCA2 promoter sequence aligned to the BRCA2 gene.
10. In the UCSC Genome Browser, you can turn on the CpG Islands track. If there is a CpG island in the promoter sequence, the sequence is likely a true promoter. In the above example (BRCA2), a CpG island is displayed in the proximal promoter.
11. Beware that some genes have alternative promoters. Finding those sequences requires extensive bioinformatic and experimental analysis.
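Because the promoter is defined relative to the TSS and the gene's strand, the upstream window points toward higher coordinates for minus-strand genes. The sketch below computes the region that Ensembl's 5' flanking option shows, assuming 1-based inclusive coordinates (the convention both Ensembl and the UCSC browser display) and strand given as +1 or -1. The function names are hypothetical.

```python
from typing import Tuple


def promoter_interval(tss: int, strand: int, upstream: int = 600) -> Tuple[int, int]:
    """Return (start, end) of the region immediately upstream of a TSS.

    Coordinates are 1-based and inclusive; the TSS base itself is excluded.
    The chromosome end is not checked, since its length is not known here.
    """
    if strand == 1:
        start, end = tss - upstream, tss - 1
    elif strand == -1:
        # On the minus strand, "upstream" means higher genomic coordinates.
        start, end = tss + 1, tss + upstream
    else:
        raise ValueError("strand must be +1 or -1")
    start = max(start, 1)  # clamp to the start of the chromosome
    if end < start:
        raise ValueError("no upstream sequence before this TSS")
    return start, end


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA, e.g. to read a minus-strand promoter 5' to 3'."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCAntgcan"))[::-1]
```

For a plus-strand gene with its TSS at position 1000, promoter_interval(1000, 1) gives (400, 999); for a minus-strand gene the same call with strand -1 gives (1001, 1600), and the sequence read from the forward strand must be reverse-complemented.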
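Once you know the coordinates, the same upstream region can be downloaded without the web interface. This sketch assumes the public Ensembl REST endpoint GET /sequence/region/:species/:region at rest.ensembl.org, with the region written as chrom:start..end:strand. The helper names are hypothetical, and no coordinates are supplied here: they must come from the gene's Ensembl annotation.

```python
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(species: str, chrom: str, start: int, end: int, strand: int) -> str:
    """Build an Ensembl REST URL for a genomic region (1-based, inclusive)."""
    region = f"{chrom}:{start}..{end}:{strand}"
    return f"{ENSEMBL_REST}/sequence/region/{species}/{region}?content-type=text/plain"


def fetch_region(species: str, chrom: str, start: int, end: int, strand: int) -> str:
    """Download the region as plain-text sequence (requires network access)."""
    url = region_url(species, chrom, start, end, strand)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("ascii").strip()
```

Building the URL separately from downloading it keeps the network call isolated, so the URL can be checked (or pasted into a browser) before anything is fetched.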
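The CpG island check in step 10 relies on UCSC's track, but a rough version can be run locally on the retrieved promoter sequence. This sketch uses the classic Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6) applied to the whole sequence as a single window. UCSC computes its track with its own implementation, so results will not match exactly; the function names are hypothetical.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (CpG count * length) / (C count * G count)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden/Frommer-style thresholds to the whole sequence."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp
```

A real island finder slides a window along the sequence; checking the whole promoter at once is only a quick first pass.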