This dataset includes two human genome references assembled by the Genome Reference Consortium: Hg19 and Hg38. Additionally, this dataset includes a specially curated version of the Hg19 reference dataset, known as the decoy genome. This dataset contains all the resource files needed to run GATK Best Practices workflows on sequencing data. AWS S3 has made this collection of reference data available free of charge so that anyone can use the AWS cloud platform to perform large-scale genomics analysis without worrying about the cost to download or host this data for themselves.
The dataset is organized as a directory structure in which each set of reference files sits in its own sub-directory directly under the bucket.
Each reference directory includes:

* a README with details on how these files are generated
* `*.fasta` and other common auxiliary resource files
* VCF files of known SNPs from various genomic projects, such as dbSNP
* an interval list of contiguous regions, used to analyze whole-genome sequencing data in chunks
For example, to find the fasta files associated with the hg19 set of resources, start with a listing of the hg19 reference directory.
The first directory under the reference type is a version directory. Versioning protects the files from mutating, so any change or update to an existing reference is explicitly managed. Listing the contents of the version directory and filtering by the term "fasta" returns the following files:
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.64.amb
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.64.ann
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.64.bwt
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.64.pac
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.64.sa
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.alt
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.amb
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.ann
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.bwt
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.pac
    s3://broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.sa
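
The list-then-filter lookup described above could also be done programmatically. The following is a minimal Python sketch, not an official client. It assumes that `boto3` is installed and that the bucket permits anonymous (unsigned) listing, which this description does not confirm. The function names `list_keys` and `filter_keys` are hypothetical helpers introduced here for illustration; the bucket name and prefix are taken from the paths shown above.

```python
from typing import Iterable, List

BUCKET = "broad-references"  # bucket name as it appears in the paths above
PREFIX = "hg19/v0/"          # <reference>/<version>/ prefix from the paths above


def filter_keys(keys: Iterable[str], term: str, bucket: str = BUCKET) -> List[str]:
    """Return sorted s3:// URIs for keys whose file name contains `term`."""
    return sorted(
        f"s3://{bucket}/{key}"
        for key in keys
        if term in key.rsplit("/", 1)[-1]  # match on the file name only
    )


def list_keys(bucket: str = BUCKET, prefix: str = PREFIX) -> List[str]:
    """List object keys under `prefix`.

    Assumption: the bucket allows anonymous listing. If it does not,
    drop the unsigned config and use regular AWS credentials instead.
    """
    # Imported lazily so filter_keys stays usable without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Calling `filter_keys(list_keys(), "fasta")` would then produce a list of URIs comparable to the one above. Filtering is kept separate from listing so it can be applied to any collection of keys, for example a cached listing.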
This dataset is maintained and updated by the Broad Institute. For any help, please contact email@example.com.
This data is acquired from the NCBI GenBank site. The GenBank database is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data. However, some submitters may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims, and therefore cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in GenBank.