Swedish Reference Genome Portal: assets
The Swedish Reference Genome Portal (genomes.scilifelab.se) is a service that facilitates access to and discovery of genome data for non-model eukaryotic species studied in Sweden.
This record contains assets generated during the development of the Genome Portal that are relevant for re-use of the data configurations. As of this version of the record, it contains refNameAlias files used for the genome browser pages in the Genome Portal, and a mirrored copy of an annotation track.
The Swedish Reference Genome Portal uses the open source genome browser JBrowse 2 (https://jbrowse.org) to visualise genomic data. When a new genome assembly is added to the portal, the portal staff generate a so-called 'refNameAlias' file. These files facilitate the display of annotation tracks associated with the assembly in JBrowse 2, as described in the refNameAlias format description section below.
File descriptions
For detailed descriptions of the files, please see the README.txt file.
refNameAlias format description
For a track to be rendered in JBrowse 2, the underlying data must refer to the FASTA header of the sequence (contig, scaffold, chromosome) of the genome assembly it describes. However, it is not unusual for the FASTA headers of an assembly to be renamed during its life cycle, even though the nucleotide sequences themselves remain the same. For instance, a genome assembly uploaded to ENA will be mirrored to NCBI, but the hosted files will have slightly different header formatting, following the conventions of each repository. A data track formatted for use with an assembly downloaded from ENA might therefore not work with the version of the assembly hosted on NCBI.
A refNameAlias is a tab-delimited file that can be loaded in JBrowse 2 to mitigate this issue, so that the data track file does not need to be reformatted. The first column of the refNameAlias contains the FASTA headers of the genome assembly that will be loaded in JBrowse 2, and each subsequent column contains synonymous FASTA headers. There can be any number of columns, but refNameAlias files created for the Swedish Reference Genome Portal typically contain the header formatting from ENA, NCBI, and, if applicable, any internal header names used by the submitting research group.
The refNameAlias files are published in this record for use in the Swedish Reference Genome Portal, and to facilitate data reuse. They can be used to create local JBrowse 2 instances that reproduce the combination of genome assembly and data track versions used for a given species in the Swedish Reference Genome Portal. This can, for instance, be done using the JBrowse 2 desktop client or a self-hosted deployment of JBrowse 2 web (https://jbrowse.org/jb2/download/).
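As an illustration of how a refNameAlias file might be attached to an assembly in a self-hosted JBrowse 2 web deployment, the following Python sketch prints a minimal config.json fragment. It assumes the configuration keys commonly used by JBrowse 2 (a refNameAliases entry with a RefNameAliasAdapter, and an IndexedFastaAdapter for the sequence); these should be checked against the JBrowse 2 documentation for the version in use. The assembly name and the FASTA and index file names are hypothetical placeholders; only CAMGYJ01.fna.alias is a file name taken from this record.

```python
import json


def uri_location(uri):
    """Wrap a path or URL in the location object form assumed for JBrowse 2."""
    return {"uri": uri, "locationType": "UriLocation"}


def build_assembly_config(name, fasta_uri, fai_uri, alias_uri):
    """Build one assembly entry that loads a refNameAlias file.

    The key names follow the JBrowse 2 config format as assumed here;
    they are not taken from this record.
    """
    return {
        "name": name,
        "sequence": {
            "type": "ReferenceSequenceTrack",
            "trackId": f"{name}-ReferenceSequenceTrack",
            "adapter": {
                "type": "IndexedFastaAdapter",
                "fastaLocation": uri_location(fasta_uri),
                "faiLocation": uri_location(fai_uri),
            },
        },
        # The alias file lets tracks that use ENA, NCBI or internal
        # sequence names be displayed on the same assembly.
        "refNameAliases": {
            "adapter": {
                "type": "RefNameAliasAdapter",
                "location": uri_location(alias_uri),
            }
        },
    }


# Hypothetical file names, except for the .alias file from this record.
config = {
    "assemblies": [
        build_assembly_config(
            name="CAMGYJ01",
            fasta_uri="CAMGYJ01.fna",
            fai_uri="CAMGYJ01.fna.fai",
            alias_uri="CAMGYJ01.fna.alias",
        )
    ]
}

print(json.dumps(config, indent=2))
```

The printed JSON could serve as a starting point for a config.json; track definitions would still need to be added separately.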
Example of the first lines of CAMGYJ01.fna.alias (columns are tab-separated):
#ENA_header NCBI_header original_header
ENA|CAMGYJ010000001|CAMGYJ010000001.1 CAMGYJ010000001.1 CHL
ENA|CAMGYJ010000002|CAMGYJ010000002.1 CAMGYJ010000002.1 LG1
ENA|CAMGYJ010000003|CAMGYJ010000003.1 CAMGYJ010000003.1 LG10
ENA|CAMGYJ010000004|CAMGYJ010000004.1 CAMGYJ010000004.1 LG2
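Outside JBrowse 2, the same file can be used to look up which assembly sequence a given synonym refers to. The following is a minimal Python sketch, assuming tab-separated columns and that lines starting with '#' are header or comment lines, as in the example above. The function name read_ref_name_aliases is hypothetical, and the sample data is a subset of the example lines with the tabs written out explicitly.

```python
import io

# First lines of the example above, with explicit tab separators.
SAMPLE = (
    "#ENA_header\tNCBI_header\toriginal_header\n"
    "ENA|CAMGYJ010000001|CAMGYJ010000001.1\tCAMGYJ010000001.1\tCHL\n"
    "ENA|CAMGYJ010000002|CAMGYJ010000002.1\tCAMGYJ010000002.1\tLG1\n"
)


def read_ref_name_aliases(handle):
    """Map every name in a row (first column included) to the first-column name."""
    alias_to_primary = {}
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and the '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        primary = columns[0]
        for name in columns:
            if name:
                alias_to_primary[name] = primary
    return alias_to_primary


aliases = read_ref_name_aliases(io.StringIO(SAMPLE))
print(aliases["LG1"])
```

To read a downloaded file instead of the inline sample, the same function can be given an open file handle, for example `open("CAMGYJ01.fna.alias", encoding="utf-8")`.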
Funding
SciLifeLab and Wallenberg Research Program for Data-Driven Life Science (DDLS)
Swedish Foundation for Strategic Research (SSF)
Publisher
Uppsala University and The Swedish Museum of Natural History
Contact email
datacentre@scilifelab.se
SciLifeLab acknowledgement
- SciLifeLab Data Centre
- Bioinformatics platform (NBIS)