MAGE-TAB Version	1.1
Investigation Title	TARGET: Acute Myeloid Leukemia (AML) WXS
Experimental Design	disease state design
Experimental Design Term Source REF	EFO
Experimental Factor Name
Experimental Factor Type
Experimental Factor Term Source REF
Person Last Name	NCI Office of Cancer Genomics (OCG)	NCI Center for Biomedical Informatics and Information Technology (CBIIT)	Meshinchi	Arceci	Ma	Zhang
Person First Name			Soheil	Robert	Xiaotu	Jinghui
Person Mid Initials				J		
Person Email	ocg@mail.nih.gov	ncicbiit@mail.nih.gov	smeshinc@fredhutch.org	rarceci@phoenixchildrens.com	xiaotu.ma@stjude.org	jinghui.zhang@stjude.org
Person Phone	+1 301 451 8027	+1 888 478 4423	+1 206 667 4077	+1 602 827 2508	+1 901 595 3774	+1 901 595 6829
Person Fax	+1 301 480 4368		+1 206 667 4310	+1 602 271 0264	+1 901 595 7100	+1 901 595 7100
Person Address	31 Center Dr, Rm 10A07, Bethesda, MD 20892	9609 Medical Center Dr, Rockville, MD 20850	1100 Fairview Avenue North, D5-380, POB 19024, Seattle, WA  98109	445 N. 5th St, Tgen Building Room 322, Phoenix, AZ 85004	262 Danny Thomas Place, Memphis, TN 38105	262 Danny Thomas Place, Memphis, TN 38105
Person Affiliation	National Cancer Institute	National Cancer Institute	Fred Hutchinson Cancer Research Center	Phoenix Children's Hospital	St Jude Children's Research Hospital	St Jude Children's Research Hospital
Person Roles	funder;investigator	data coder;curator	investigator	investigator	investigator;data analyst;submitter	investigator;data analyst
Person Roles Term Source REF	EFO;EFO	EFO;EFO	EFO	EFO	EFO;EFO;EFO	EFO;EFO
Quality Control Type
Quality Control Term Source REF
Replicate Type
Replicate Term Source REF
Normalization Type
Normalization Term Source REF
Date of Experiment
Public Release Date
PubMed ID
Publication DOI
Publication Author List
Publication Title
Publication Status
Publication Status Term Source REF
Experiment Description	"There are 200 fully characterized patient cases with AML (all tumor/normal pairs, 100 with relapse sample as well) that will make up the TARGET AML dataset, each with gene expression, tumor and paired normal copy number analyses, methylation and comprehensive next-generation sequencing to include whole genome sequencing, mRNA-seq and miRNA-seq. A subset of these cases will also have whole exome sequencing data available. There are additionally a large number of cases with partial molecular characterization, making this a large and informative genomic dataset. All cases can be sorted according to data type via the Case Matrix on the TARGET Data Matrix. Please visit the TARGET website listed above for additional information on this and other TARGET genomics projects. Please see the TARGET Publication Guidelines at the OCG website for updated details on sharing of any TARGET substudy data."
Protocol Name	fredhutch.org:Protocol:DNA-Extraction-Qiagen-AllPrep:01	bcm.edu:Protocol:WXS-LibraryPrep-Illumina:01	bcm.edu:Protocol:WXS-ExomeCapture-NimbleGen-SeqCapEZHumanExomeV2:01	bcm.edu:Protocol:WXS-Sequence-Illumina-HiSeq2000:01	bcm.edu:Protocol:WXS-BaseCall-Illumina:01	bcm.edu:Protocol:WXS-ReadAlign-BWA-GATK-ITDAssembler:01	bcm.edu:Protocol:WXS-CnvSegment-LOHcate:01	stjude.org:Protocol:WXS-VariantCall-Bambino-DToxoG:01
Protocol Type	nucleic acid extraction protocol	nucleic acid library construction protocol	nucleic acid library construction protocol	nucleic acid sequencing protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol
Protocol Term Source REF	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO
Protocol Description	"Genomic DNA was prepared using the Qiagen AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA, USA). Please see https://ocg.cancer.gov/programs/target/target-methods for full extraction protocol details."	"Library construction: Specimen processing, DNA extraction, standard QC and Illumina paired-end pre-capture libraries were prepared according to the manufacturer's protocol (Illumina Inc, San Diego, CA) with the following modifications: 0.5-1 ug genomic DNA in 100 ul volume was sheared into fragments of approximately 300 base pairs in a Covaris E210 system (Covaris, Inc. Woburn, MA). The setting was 10% duty cycle, intensity of 4, 200 cycles per burst for 120 seconds. Fragment size was checked using a 2.2% Flash Gel DNA Cassette (Lonza, Walkersville, MD, Cat. No. 57023). End-repair of fragmented DNA was performed in 90 ul total reaction volume containing sheared DNA, 9 ul 10X buffer, 5 ul END Repair Enzyme Mix and H2O (NEBNext End-Repair Module, New England BioLabs, Ipswich, MA, Cat. No. E6050L), incubated at 20 degC for 30 minutes. A-tailing was performed in a total reaction volume of 60 ul containing end-repaired DNA, 6 ul 10X buffer, 3 ul Klenow fragment (NEBNext dA-Tailing Module; Cat. No. E6053L) and H2O followed by incubation at 37 degC for 30 minutes. Illumina multiplex adapter ligation (NEBNext Quick Ligation Module Cat. No. E6056L) was performed in a total reaction volume of 90 ul containing 18 ul 5X buffer, 5 ul ligase, 0.5 ul 100 uM adaptor and H2O at room temperature for 30 minutes. After ligation, PCR with Illumina PE 1.0 and modified barcode primers (manuscript in preparation) was performed in 170 ul reactions containing 85 ul of 2x Phusion High-Fidelity PCR master mix, adaptor ligated DNA, 1.75 ul of 50 uM primers and H2O. PCR was performed using a 5 minute initial denaturation at 95 degC, 6-10 cycles of 15 seconds at 95 degC, 15 seconds at 60 degC and 30 seconds at 72 degC followed by a final extension for 5 minutes at 72 degC. 
Agencourt XP Beads (Beckman Coulter Genomics, Inc., Danvers, MA, Cat. No. A63882) were used to purify DNA after each enzymatic reaction. After purification, PCR product quantification and size distribution was determined using the Caliper GX 1K/12K/High Sensitivity Assay Labchip (Hopkinton, MA, Cat. No. 760517)."	"Exome capture: Illumina pre-capture libraries (1 ug DNA input) were hybridized in solution to SeqCap EZ Human Exome 2.0 (Nimblegen, Madison, WI) probes targeting approximately 44Mbs of sequence from approximately 30K genes according to the manufacturer's protocol with the following modifications: hybridization enhancing oligos IHE1, IHE2 and IHE3 replaced oligos HE1.1 and HE2.1 and post-capture LM-PCR was performed using 14 cycles. Capture libraries were quantified using Caliper GX 1K/12K/High Sensitivity Assay Labchip (Hopkinton, MA, Cat. No. 760517). The efficiency of the capture was evaluated by performing a qPCR-based quality check on the built-in controls (qPCR SYBR Green assays, Applied Biosystems, Grand Island, NY). Four standardized oligo sets, RUNX2, PRKG1, SMG1, and NLK, were employed as internal quality controls. The enrichment of the capture libraries was estimated to range from 7- to 9-fold over background."	"Library templates were prepared for sequencing using Illumina's cBot cluster generation system with TruSeq PE Cluster Generation Kits (Part no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 6-9 pM in hybridization buffer in order to achieve a load density of ~800K clusters/mm2. Each library pool was loaded in a single lane of a HiSeq flow cell, and each lane was spiked with 2% phiX control library for run quality control. The sample libraries then underwent bridge amplification to form clonal clusters, followed by hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode using the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Part no. 
FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional 7 cycles for the index read. Sequencing runs generated approximately 300-400 million successful reads on each lane of a flow cell, with approximately 9-10 Gb produced per sample. With these sequencing yields, samples achieved an average of 95% of the targeted exome bases covered to a depth of 20X or greater."	"Real Time Analysis (RTA) software was used to process the image analysis and nucleotide base calling. On average, about 80-100 million successful reads, consisting of 2X 100 bp, were generated on each lane of a flow cell."	"Mapping Reads: Illumina HiSeq bcl files were processed using BCLConvertor v1.7.1. All reads from the prepared libraries that passed the Illumina Chastity filter were formatted into fastq files. The fastq files were aligned to human reference genome build37 (NCBI) using BWA (bwa-0.5.9-R16) with default parameters with the following exceptions: seed sequence: 40 bp, seed mismatch: 2, total mismatches allowed: 3. BAM files generated from alignment were preprocessed using GATK (v1.3-8-gb0e6afe) to recalibrate and locally realign reads. ITD Detection: We used ITD Assembler, a combined de novo assembly/algorithmic approach that takes the entire set of unmapped and significantly soft-clipped reads, and employs a De Bruijn graph assembly algorithm to select read sets that form cycles, indicative of repetitive sequence structures, in order to find reads that span duplications. Read sets that formed De Bruijn graph cycles are independently assembled using the Overlap Layout Consensus (OLC) methodology of the Phrap algorithm, thereby alleviating the collapse of repeat sequences from De Bruijn graph assembly approaches. 
Resulting OLC assembled contigs are locally aligned to the reference sequence, and the mapped location data from aligned soft-clipped reads and aligned-unaligned read pairs from that contig are utilized to annotate the position of detected internal tandem duplications (ITDs). FLT3/ITDs were also verified by fragment length analysis utilizing Life Technologies GeneMapper software (Life Technologies, Grand Island, NY). Variant allele fractions are reported as the size of the ITD peak divided by the sum of the wild type and ITD peaks."	"CNA analysis was performed using LOHcate, a method that identifies CNA events in whole exome tumor sequence data via detecting enrichment in variant or reference allele quantities per site across polymorphic exonic sites. The per-site quantities are plotted two-dimensionally between matched normal and tumor. Significant sites are then clustered in this Euclidean 2D space using an optimized version of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, after which clusters are classified to denote the appropriate CNA events: somatic gain, loss of heterozygosity (LOH), or copy neutral LOH (cnLOH). The sites within these classified CNA clusters are mapped back to the exome, which is subsequently segmented into CNA regions. Regions and sub-regions of low to high recurrence are identified by comparing CNA regions across samples. Segments with 200 or more markers were visualized by Partek software (Partek Inc., St. Louis, MO) and utilized in the analysis. Dewal, N et al. LOHcate: Robust detection and analysis of aneuploidy in whole exome sequence data from cancer genomes. In preparation"	"Of the 1,131 paired tumor-normal WES data, all except for 23 osteosarcoma pairs exhibit the expected binomial distribution of variant allele fraction for known germline SNPs. The 23 outlier samples were not used for discovery of driver genes due to their abnormal profile of variant allele distribution. 
They were included only for determining mutation prevalence of the driver genes identified in this study. Somatic SNVs and indels were detected by the Bambino program followed by postprocessing and manual curation as previously described. As 8-oxo-G artifacts have been reported previously, we implemented an 8-oxo-G filter following the principles described in the D-ToxoG (http://archive.broadinstitute.org/cancer/cga/dtoxog) algorithm. The filter was tested on the 'Exome_Native' neuroblastoma samples known to have 8-oxo-G artifacts and then applied to the variants identified in all WES data sets."
Protocol Parameters					Software Versions	Software Versions		
Protocol Hardware				Illumina HiSeq 2000				
Protocol Software					Casava	bfast;picard tools;raw bam read validator		
Protocol Contact
SDRF File	TARGET_AML_WXS_20170609.sdrf.txt
Term Source Name	NCBITaxon	NCIt	MO	EFO	OBI
Term Source File	http://www.ncbi.nlm.nih.gov/taxonomy	http://ncit.nci.nih.gov/	http://mged.sourceforge.net/ontologies/MGEDontology.php	http://www.ebi.ac.uk/efo	http://purl.obolibrary.org/obo/obi
Term Source Version
Comment[SRA_STUDY]	SRP012000
Comment[BioProject]	PRJNA89525
Comment[dbGaP Study]	phs000465
