SciLifeLab
5 files:
  • CT01_draft_longest.fa.gz (3.45 MB)
  • CT01_draft_longest.gff.gz (1.82 MB)
  • CT01_draft_verbose.gff.gz (5.11 MB)
  • MANIFEST.txt (0.2 kB)
  • README.txt (2.81 kB)

The Chironomus tentans draft genome annotation

dataset
Posted on 2023-08-03, 16:04; authored by Alexey Kutsenko, Thomas Svensson, Björn Nystedt, Joakim Lundeberg, Petra Björk, Erik Sonnhammer, Stefania Giacomello, Neus Visa, Lars Wieslander

If you use this data, please cite:
Kutsenko, A., Svensson, T., Nystedt, B. et al. The Chironomus tentans genome sequence and the organization of the Balbiani ring genes. BMC Genomics 15, 819 (2014). https://doi.org/10.1186/1471-2164-15-819


The dipteran Chironomus tentans (C. tentans) and its Balbiani ring (BR) genes serve as a model system for studies of eukaryotic gene expression. Kutsenko et al. (2014) report the first draft genome of C. tentans, characterizing its gene expression machinery and the genomic architecture of its BR genes.

In brief, genomic DNA was extracted and sequenced, yielding an assembly of 213 Mb, which was likely an overestimate due to allelic variants. The genome size is estimated at around 200 Mb, with a low GC content (31%) and a low repeat fraction (15%) compared with other dipterans. Phylogenetic analysis places C. tentans in a sister clade to the mosquitoes, with the two lineages diverging 150 to 250 million years ago. The assembled genome was relatively fragmented (scaffold NG50 = 65 kbp) but still reasonably complete in gene content, with 97% of 248 highly conserved core eukaryotic genes represented.
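The scaffold NG50 and GC content quoted above are simple summary statistics. The following is a minimal Python sketch of both, not the code used in the original analysis. It assumes NG50 is computed against the estimated genome size (about 200 Mb here) rather than the 213 Mb assembly length, which would give the N50 instead, and that ambiguous bases (N) are left out of the GC denominator. The function names `ng50` and `gc_fraction` are hypothetical.

```python
def ng50(lengths: list[int], genome_size: int) -> int:
    """Return the NG50 of a set of scaffold lengths.

    NG50 is the length L such that scaffolds of length >= L together cover
    at least half of the *estimated genome size* (not the assembly size).
    """
    target = genome_size / 2
    covered = 0
    # Walk from the longest scaffold down, accumulating covered bases
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0  # assembly covers less than half of the estimated genome


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T).

    Lowercase (soft-masked) bases are counted; N and other IUPAC codes
    are ignored in both numerator and denominator (an assumption).
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc / acgt if acgt else 0.0
```

For example, `ng50([50, 40, 30, 20, 10], 200)` returns 30, whereas using the assembly length of 150 as the denominator (that is, the N50) returns 40. This shows why NG50 is the more conservative figure when an assembly is smaller than the estimated genome.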

For transcriptome sequencing and genome annotation, poly(A)+ RNA was extracted from various tissues and developmental stages. These data were used as evidence to guide ab initio predictions of gene models and alternative splice variants, resulting in a draft annotation of 15,120 predicted genes.
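Users who want to check gene and splice-variant counts in the downloaded annotation could start from a sketch like the one below. It assumes, without confirmation from the README (not reproduced here), that the `.gff.gz` files listed for this dataset are standard GFF3. It also assumes that genes use the feature type `gene` and that transcripts use `mRNA` with a `Parent=<gene ID>` attribute. The function name `summarize_gff` is hypothetical.

```python
import gzip
from collections import Counter


def summarize_gff(path: str) -> tuple[Counter, Counter]:
    """Count feature types and mRNAs per parent gene in a gzipped GFF3 file.

    Assumes standard GFF3: 9 tab-separated columns, feature type in column 3,
    and key=value attributes in column 9 where mRNA rows carry Parent=<gene ID>.
    """
    feature_counts: Counter = Counter()
    mrnas_per_gene: Counter = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # GFF3 may append raw sequences after this directive
            if line.startswith("#") or not line.strip():
                continue  # skip comments, directives, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed GFF3 feature line
            feature_type = fields[2]
            feature_counts[feature_type] += 1
            if feature_type == "mRNA":
                # Parse "key=value;key=value" attributes from column 9
                attrs = dict(
                    item.strip().split("=", 1)
                    for item in fields[8].strip().strip(";").split(";")
                    if "=" in item
                )
                # A feature may list several comma-separated parents
                for parent in attrs.get("Parent", "").split(","):
                    if parent:
                        mrnas_per_gene[parent] += 1
    return feature_counts, mrnas_per_gene
```

For example, `features, isoforms = summarize_gff("CT01_draft_verbose.gff.gz")` followed by `features["gene"]` would give the number of gene records, which could be compared against the 15,120 predicted genes. This holds only if the files follow the feature-type conventions assumed above.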

The C. tentans draft genome assembly can be downloaded here or from NCBI:

GenBank accession number: CBTT000000000.1

https://www.ncbi.nlm.nih.gov/assembly/GCA_000786525.1/


The draft genome annotation and the corresponding longest predicted protein for each gene locus are provided here for download. Note that these preliminary annotations are provided as is; some incomplete, missing, or incorrect gene models are to be expected.
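The longest-protein file (presumably `CT01_draft_longest.fa.gz` in the file list) is a gzipped FASTA file. It can be streamed without decompressing it to disk, as in the minimal parser sketched below. This sketch assumes only the standard FASTA layout (a `>` header line followed by sequence lines). It keeps the first whitespace-delimited token of each header as the record name, because the actual header format is not described here. The function name `read_fasta_gz` is hypothetical.

```python
import gzip
from typing import Iterator


def read_fasta_gz(path: str) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from a gzipped FASTA file.

    The name is the first whitespace-delimited token after '>';
    multi-line sequences are joined into one string.
    """
    name = None
    chunks: list[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name = line[1:].split()[0] if line[1:].split() else ""
                chunks = []
            elif name is not None:
                chunks.append(line)  # sequence line for the current record
    if name is not None:
        yield name, "".join(chunks)  # emit the final record
```

For example, `lengths = {name: len(seq) for name, seq in read_fasta_gz("CT01_draft_longest.fa.gz")}` builds a map from record name to protein length. A generator is used so that the whole file never needs to be held in memory.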


Acknowledgements

We acknowledge the Science for Life Laboratory and the National Genomics Infrastructure (NGI) for sequencing service. Computations were mainly performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX). Microscopy was performed at IFSU, Stockholm University. Ann-Charlotte Sonnhammer at BILS is acknowledged for assistance concerning the initial bioinformatics analysis. We thank Magnus Bjursell for initial support in the project. This work was financed by grants from The Knut and Alice Wallenberg Foundation through The Center for Metagenomic Sequence analysis (CMS), The Granholm’s Foundation, The Carl Trygger’s Foundation and The Swedish Research Council (VR).

Funding

The Knut and Alice Wallenberg Foundation through The Center for Metagenomic Sequence analysis (CMS)

The Granholm’s Foundation

The Carl Trygger’s Foundation

The Swedish Research Council (VR)

Publisher

Stockholm University

SciLifeLab acknowledgement

  • National Genomics Infrastructure unit
  • Support Infrastructure and Training unit
