The genome of a terrestrial metazoan extremophile

Project ID
FG-379
Project Categories
Life Science
Completed
Abstract
We have a large quantity of genomic data (~80x, ABI SOLiD) derived from the muscle tissue of an earthworm that lives in a remarkable environment within a geothermal field. These geothermal biotopes are reducing environments with unique features, such as elevated soil, water, and atmospheric elemental concentrations, together with constant diffuse degassing and high temperatures.
“The secondary manifestations of volcanism in the geothermal field include low temperature fumaroles (maximum temperature around 100 °C), hot springs, CO2 cold springs and several diffuse degassing areas. Also the volcanic gases present in Furnas geothermal field (Azores Islands, Portugal) typically comprise water vapour, carbon dioxide (CO2), hydrogen sulfide (H2S), sulfur dioxide (SO2), hydrogen chloride (HCl), with lesser amounts of hydrogen fluoride (HF), and, the radioactive gas radon (Rn). Therefore, the ephemeral nature of the geothermal field is expected to favour the colonization by species with admirable colonization abilities”.
The assembly of this genome will provide an unusual opportunity to understand the population structure and genetic diversity of a thriving population, as well as the integrated modifications, ranging from the genetic and biochemical to the cellular and physiological levels of organisation, that arise under such extreme environmental conditions.
Now that I have presented my “motivational scenario”, let me describe the data we have. Our data currently come from several single-fragment, paired-end, and mate-pair libraries:
100x coverage of short reads (SOLiD 5500 with ECC module)
1x single fragment (75 bp) (with ECC)
2x paired end (~160-200 bp)
2x mate pair (500 bp-1 kb) (with ECC)
1x mate pair (3-5 kb) (with ECC)
The de novo assembly I am undertaking is far from easy, but I believe that with your computing power we could go a bit further.
Our aim is to recover as many genetic units as possible from this genome. Large contigs would be a great achievement, but for that we need many processor cores and as much RAM as we can get. These are very compute-intensive jobs, and I am having difficulty finding suitable resources.
Use of FutureSystems
Just for research
Scale of Use
I would like to run several genome assemblies that may scale up to a bit more than 500 GB of RAM and 1.5 TB of storage. I will manage resources wisely.