What are the different formats for archival of genome assembly data?
Answers
Answer:
The ENCODE consortium uses several file formats to store, display, and disseminate data:
FASTQ
BAM
bigWig
bigBed
FASTQ[1] is a text-based format for storing nucleotide sequences (reads) together with their per-base quality scores. The Sequence Alignment/Map (SAM)[2] format is a text-based format for storing read alignments against reference sequences; it is interconvertible with BAM, its binary counterpart. bigWig is an indexed binary format for rapid display of dense, continuous data in the UCSC Genome Browser. bigBed is likewise an indexed binary format, used for rapid display of annotation items such as a linked collection of exons or the binding peaks of a transcription factor.
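To make the FASTQ description concrete, here is a minimal sketch of a reader for the common four-lines-per-record layout (header starting with "@", sequence, separator starting with "+", quality string). It is an illustration rather than the consortium's own tooling: the names FastqRecord and read_fastq are hypothetical, the example reads are made up, and the Phred+33 quality offset is an assumption that does not hold for every FASTQ variant.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: its name, its bases, and one Phred score per base."""
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-record FASTQ stream.

    Assumes Phred+33 quality encoding unless another offset is given,
    and does not support sequences wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Each quality character encodes one score as chr(score + offset).
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Made-up example data: two short reads.
example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!5?I\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.qualities)
```

The length check reflects the fact that every base must carry exactly one quality score; a record that fails it has lost raw data and is not suitable for archival.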
These file formats were originally designed to be generic and flexible. Because ENCODE is a collaborative effort, the consortium has added several specifications to the file formats to facilitate data archival, presentation, and distribution, as well as integrative analysis of the data. The consortium treats FASTQ as the basic file format for archival purposes, so its FASTQ specifications aim to preserve the raw sequence data. The other file formats, in comparison, are geared towards data visualization and dissemination, so their specifications aim to make the data easy to use.
Analyses of ENCODE data produce annotation files, e.g., genomic.....