Just as supercomputers have played a key role in unravelling the genome, they're also set to play a pivotal role in its analysis. To work out which parts of the DNA code contain meaningful information, scientists are using gene-finding software and neural network techniques to identify promising patterns within the three-billion-letter string of DNA code. Pattern-recognition software is also being used to mine genetic databases. By finding close matches to genes that have already been annotated, researchers can get good clues about a new gene's function.
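To make the database-matching idea concrete, here is a minimal sketch in Python. It is not the software the researchers use: it simply scores a new sequence against annotated ones by the fraction of short subsequences (k-mers) they share, and reports the closest. The function names, the toy sequences and their annotations are all invented for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of the two sequences' k-mer sets (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_match(query, annotated, k=4):
    """Return (name, annotation, score) for the closest annotated sequence."""
    name = max(annotated, key=lambda n: similarity(query, annotated[n][0], k))
    seq, note = annotated[name]
    return name, note, similarity(query, seq, k)


# Invented toy database: name -> (sequence, annotation).
ANNOTATED = {
    "geneA": ("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", "hypothetical kinase"),
    "geneB": ("ATGTTTCCCGGGAAATTTCCCGGGAAATTTTAA", "hypothetical transporter"),
}

# A "new" sequence that differs from geneA by a single letter.
query = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGTTAG"
name, note, score = best_match(query, ANNOTATED)
print(f"closest match: {name} ({note}), similarity {score:.2f}")
```

Real search tools align sequences rather than just counting shared fragments, but the principle is the same: the annotation of the closest known match becomes a clue to what the new gene does.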
But they need more than this. They need to know what's called a gene's 'expression': a measure of how much of which protein is made, and when. When a gene needs to make a protein, it triggers the creation of multiple copies of a molecule called mRNA, which is derived from, and complementary to, a segment of DNA. Scientists have devised a micro-array, commonly called a DNA chip, which can monitor the activity levels of thousands of different mRNAs. This gives researchers a powerful tool for observing the behaviour of thousands of genes simultaneously - scientists are already able to discriminate between cancer cells that are indistinguishable under the microscope.
To achieve real breakthroughs in understanding, it's necessary to compare human DNA not only between different people, but also with that of other species. As it happens, humans share an amazing amount of meaningful DNA code with mice, but the 'junk' code is quite different. Celera is already a third of the way towards mapping the DNA of a mouse. By comparing the two maps, it hopes to locate important information, such as the gene triggers, which are hidden in the 'junk'.
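The description of mRNA as 'complementary to' a segment of DNA can be shown in a few lines of Python. This is a minimal sketch of the base-pairing rule only (A pairs with U, T with A, G with C, C with G); the function name and the example strand are invented for illustration, and the sketch assumes the DNA template strand is written in the 3'-to-5' direction so that the resulting mRNA reads 5'-to-3'.

```python
# Base-pairing rules for transcription: DNA template base -> mRNA base.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template):
    """Return the mRNA complementary to a DNA template strand.

    Assumes the template is written 3'->5', so the mRNA reads 5'->3'.
    """
    try:
        return "".join(DNA_TO_MRNA[base] for base in template.upper())
    except KeyError as err:
        # Reject anything that is not one of the four DNA letters.
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

A DNA chip works on this same pairing principle: each spot on the chip carries a known sequence, and only the complementary mRNA sticks to it.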