### General information

- Author: National Bioinformatics Infrastructure Sweden, https://nbis.se
- Contact e-mail: john.sundh@nbis.se
- DOI: 10.17044/scilifelab.22246417
- License: CC BY 4.0
- This readme file was last updated: 2023-03-10

Please cite as: NBIS (2023). MRSA case study example data. https://doi.org/10.17044/scilifelab.22246417

### Dataset description

This dataset contains FASTQ files for three Illumina HiSeq runs from an RNA-seq analysis (see Osmundson J et al., PLoS One, 2013;8(10):e76572). The data is used as a case study in the Tools in Reproducible Research course.

We previously used `fastq-dump` from the `sra-tools` package to download a subsampled set of sequences from the Sequence Read Archive (SRA). However, `sra-tools` has recently become very unreliable because of a certificate/security issue when downloading from the National Center for Biotechnology Information (NCBI). We have therefore created this dataset as an alternative starting point for the course case study.

All three files were generated on the Rackham compute cluster by installing `sra-tools` (v3.0.3) from the bioconda channel:

```
mamba create -n sra-tools -c bioconda sra-tools
conda activate sra-tools
```

then running:

```
fastq-dump SRR935090 -X 100000 --gzip -Z > SRR935090.fastq.gz
fastq-dump SRR935091 -X 100000 --gzip -Z > SRR935091.fastq.gz
fastq-dump SRR935092 -X 100000 --gzip -Z > SRR935092.fastq.gz
```

Each file therefore contains a subset of 100,000 reads for one sample, taken from the original data in the SRA. The original samples contain between 76.3 and 176.6 million reads each.

The idea is to let students download these subsampled files either directly or as part of the bioinformatic workflows taught during the course.
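To check that a downloaded file holds the expected 100,000 reads, the number of lines can be divided by four. The following is a minimal sketch, not part of the original dataset generation. It assumes each read occupies exactly four lines in the FASTQ file and that the files are in the current directory. The helper name `count_fastq_reads` is hypothetical.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes four lines per read (header, sequence, '+', qualities).
count_fastq_reads() {
    n_lines=$(gzip -dc "$1" | wc -l)
    echo $(( n_lines / 4 ))
}

# Report the read count for each of the three files, if present.
for f in SRR935090.fastq.gz SRR935091.fastq.gz SRR935092.fastq.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_fastq_reads "$f")"
    else
        echo "missing: $f" >&2
    fi
done
```

If the files were produced as described above, each should report 100000. A different number suggests a truncated download or a file with a different layout.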