SciLifeLab

Swedish Reference Genome Portal: assets

Version 2 2025-01-22, 08:12
Version 1 2024-11-04, 10:27
online resource
posted on 2025-01-22, 08:12 authored by Daniel Brink, Rory Crean, Angela P. Fuentes-Pardo, Quentin Ågren

The Swedish Reference Genome Portal (genomes.scilifelab.se) is a service that facilitates access to, and discovery of, genome data of non-model eukaryotic species studied in Sweden.

This record contains assets generated during the development of the Genome Portal that are relevant for reusing its data configurations. This version of the record contains the refNameAlias files used for the genome browser pages of the Genome Portal, and a mirrored copy of one annotation track.

The Swedish Reference Genome Portal uses the open source genome browser JBrowse 2 (https://jbrowse.org) to visualise genomic data. When a new genome assembly is added to the portal, the portal staff generate a so-called 'refNameAlias' file for it. These files make it possible to display annotation tracks associated with the assembly in JBrowse 2, as described in the refNameAlias format description section below.

File descriptions

For detailed descriptions of the files, please see the README.txt file.

refNameAlias format description

For a track to be rendered in JBrowse 2, the reference sequence names in the underlying data must match the FASTA headers of the sequences (contigs, scaffolds, chromosomes) of the genome assembly the track describes. However, FASTA headers are often renamed during an assembly's life cycle, even when the nucleotide sequences they label remain the same. For instance, a genome assembly uploaded to ENA is mirrored to NCBI, but the hosted files have slightly different headers that follow each repository's conventions. A data track formatted for use with an assembly downloaded from ENA might therefore not work with the version of the assembly hosted by NCBI.
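Such a mismatch can be spotted before a track is loaded. The minimal sketch below assumes that the assembly's sequence names and the reference names used by a track (for example, the first column of a GFF3 file) have already been collected into Python lists, and reports which track names have no exact match in the assembly. The function `unmatched_ref_names` is hypothetical, and the sample names are taken from the CAMGYJ01.fna.alias example in this record.

```python
def unmatched_ref_names(assembly_names, track_names):
    """Return track reference names with no exact match among the assembly's sequence names."""
    known = set(assembly_names)
    seen = set()
    missing = []
    # Keep the order in which names first appear in the track, without duplicates.
    for name in track_names:
        if name not in known and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


# Assembly downloaded from ENA; annotation track written against the NCBI copy.
assembly = ["ENA|CAMGYJ010000001|CAMGYJ010000001.1", "ENA|CAMGYJ010000002|CAMGYJ010000002.1"]
track = ["CAMGYJ010000001.1", "CAMGYJ010000002.1", "CAMGYJ010000001.1"]

print(unmatched_ref_names(assembly, track))
# ['CAMGYJ010000001.1', 'CAMGYJ010000002.1']
```

Every NCBI-style name is reported as missing, even though the underlying sequences are identical: this is the situation a refNameAlias file is meant to resolve.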

A refNameAlias file is a tab-delimited file that can be loaded into JBrowse 2 to resolve this issue without reformatting the data track file. The first column contains the FASTA header of the genome assembly that will be loaded in JBrowse 2, and each subsequent column contains a synonymous FASTA header for the same sequence. A file can have any number of columns, but the refNameAlias files created for the Swedish Reference Genome Portal typically contain the header formats used by ENA and NCBI and, if applicable, any internal header names used by the submitting research group.
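As a minimal sketch of this layout, the Python function below writes one tab-separated row per sequence, with the name used in the loaded FASTA file first and its synonyms after it, optionally preceded by a `#`-prefixed column description line like the one in the CAMGYJ01.fna.alias example. The function name `write_ref_name_alias`, its input structure, and the output path `demo.fna.alias` are hypothetical; this is not the tooling used by the Genome Portal staff.

```python
def write_ref_name_alias(path, rows, column_names=None):
    """Write a tab-delimited refNameAlias file.

    Each row lists the names of one sequence: the first entry is the FASTA
    header used by the assembly loaded in JBrowse 2, the rest are synonyms.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        if column_names:
            # Optional description line, as in the CAMGYJ01.fna.alias example.
            handle.write("#" + "\t".join(column_names) + "\n")
        for names in rows:
            # A tab or newline inside a name would corrupt the column layout.
            if any("\t" in name or "\n" in name for name in names):
                raise ValueError(f"names must not contain tabs or newlines: {names!r}")
            handle.write("\t".join(names) + "\n")


# Hypothetical output path; rows copied from the CAMGYJ01.fna.alias example.
write_ref_name_alias(
    "demo.fna.alias",
    [
        ["ENA|CAMGYJ010000001|CAMGYJ010000001.1", "CAMGYJ010000001.1", "CHL"],
        ["ENA|CAMGYJ010000002|CAMGYJ010000002.1", "CAMGYJ010000002.1", "LG1"],
    ],
    column_names=["ENA_header", "NCBI_header", "original_header"],
)
```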

The refNameAlias files are published in this record for use in the Swedish Reference Genome Portal and to facilitate data reuse. They can be used to create local JBrowse 2 instances of the specific combinations of genome assembly and data track versions used for a given species in the Swedish Reference Genome Portal, for instance with the JBrowse 2 desktop client or a self-hosted deployment of JBrowse 2 web (https://jbrowse.org/jb2/download/).
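For a self-hosted JBrowse 2 web instance, the alias file is attached to the assembly entry in the instance's `config.json`. The sketch below builds such an entry in Python and writes it out, using JBrowse 2's `IndexedFastaAdapter` for the sequence and `RefNameAliasAdapter` for the aliases. It assumes an uncompressed FASTA file indexed with `samtools faidx`; the assembly name and the file names `CAMGYJ01.fna` and `CAMGYJ01.fna.fai` are hypothetical placeholders, and the option names should be checked against the JBrowse 2 configuration documentation for the installed version.

```python
import json


def assembly_config(name, fasta_uri, fai_uri, alias_uri):
    """Return a JBrowse 2 assembly entry with a refNameAlias file attached."""
    return {
        "name": name,
        "sequence": {
            "type": "ReferenceSequenceTrack",
            "trackId": f"{name}-ReferenceSequenceTrack",
            "adapter": {
                # Plain (uncompressed) FASTA plus its samtools faidx index.
                "type": "IndexedFastaAdapter",
                "fastaLocation": {"uri": fasta_uri, "locationType": "UriLocation"},
                "faiLocation": {"uri": fai_uri, "locationType": "UriLocation"},
            },
        },
        # JBrowse 2 uses the rows of this file to map alternative names
        # onto the assembly's own sequence names.
        "refNameAliases": {
            "adapter": {
                "type": "RefNameAliasAdapter",
                "location": {"uri": alias_uri, "locationType": "UriLocation"},
            }
        },
    }


config = {
    "assemblies": [
        assembly_config(
            "CAMGYJ01",  # hypothetical assembly name
            "CAMGYJ01.fna",  # hypothetical FASTA file name
            "CAMGYJ01.fna.fai",  # hypothetical index file name
            "CAMGYJ01.fna.alias",
        )
    ],
    "tracks": [],
}

with open("config.json", "w", encoding="utf-8") as handle:
    json.dump(config, handle, indent=2)
```

Tracks added to the `tracks` list for this assembly can then keep the reference names they were created with, as long as those names appear in the alias file.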

Example of the first lines of CAMGYJ01.fna.alias (columns are tab-separated; the first line, starting with #, names the columns):

#ENA_header NCBI_header original_header
ENA|CAMGYJ010000001|CAMGYJ010000001.1 CAMGYJ010000001.1 CHL
ENA|CAMGYJ010000002|CAMGYJ010000002.1 CAMGYJ010000002.1 LG1
ENA|CAMGYJ010000003|CAMGYJ010000003.1 CAMGYJ010000003.1 LG10
ENA|CAMGYJ010000004|CAMGYJ010000004.1 CAMGYJ010000004.1 LG2
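To illustrate how these rows are interpreted, the sketch below reads the example lines into a dictionary that maps every name in a row, canonical or alias, to the first-column name. It is a hypothetical helper (`load_aliases`), not JBrowse 2's own code; it skips blank lines and lines starting with `#`, and splits on tabs, assuming the file is tab-delimited as described above.

```python
import io


def load_aliases(handle):
    """Map every name in a refNameAlias file to the first-column name of its row."""
    to_canonical = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the column description line
        names = [name for name in line.split("\t") if name]
        if not names:
            continue  # a line consisting only of tabs carries no names
        canonical = names[0]
        for name in names:
            to_canonical[name] = canonical
    return to_canonical


example = (
    "#ENA_header\tNCBI_header\toriginal_header\n"
    "ENA|CAMGYJ010000001|CAMGYJ010000001.1\tCAMGYJ010000001.1\tCHL\n"
    "ENA|CAMGYJ010000002|CAMGYJ010000002.1\tCAMGYJ010000002.1\tLG1\n"
)
aliases = load_aliases(io.StringIO(example))
print(aliases["LG1"])  # ENA|CAMGYJ010000002|CAMGYJ010000002.1
print(aliases["CAMGYJ010000001.1"])  # ENA|CAMGYJ010000001|CAMGYJ010000001.1
```

A track that uses the submitting group's names (such as LG1) or NCBI accessions thus resolves to the ENA header of the loaded assembly.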


Funding

SciLifeLab and Wallenberg Research Program for Data-Driven Life Science (DDLS)

Swedish Foundation for Strategic Research (SSF)


Publisher

Uppsala University and The Swedish Museum of Natural History

SciLifeLab acknowledgement

  • SciLifeLab Data Centre
  • Bioinformatics platform (NBIS)

Usage metrics

    Data Science Node in Evolution and Biodiversity