MAGE-TAB Version	1.1
Investigation Title	TARGET: Kidney - Wilms Tumor (WT) WXS
Experimental Design	disease state design
Experimental Design Term Source REF	EFO
Experimental Factor Name
Experimental Factor Type
Experimental Factor Term Source REF
Person Last Name	NCI Office of Cancer Genomics (OCG)	NCI Center for Biomedical Informatics and Information Technology (CBIIT)	Gadd	Perlman	Ma	Zhang
Person First Name			Samantha	Elizabeth	Xiaotu	Jinghui
Person Mid Initials			L	J		
Person Email	ocg@mail.nih.gov	ncicbiit@mail.nih.gov	sgadd@luriechildrens.org	eperlman@luriechildrens.org	xiaotu.ma@stjude.org	jinghui.zhang@stjude.org
Person Phone	+1 301 451 8027	+1 888 478 4423	+1 773 755 6392	+1 312 227 3967	+1 901 595 3774	+1 901 595 6829
Person Fax	+1 301 480 4368				+1 901 595 7100	+1 901 595 7100
Person Address	31 Center Dr, Rm 10A07, Bethesda, MD 20892	9609 Medical Center Dr, Rockville, MD 20850	2430 N Halsted St, Room C366, Chicago, IL 60614	225 E Chicago Ave, Chicago, IL 60611	262 Danny Thomas Place, Memphis, TN 38105	262 Danny Thomas Place, Memphis, TN 38105
Person Affiliation	National Cancer Institute	National Cancer Institute	Lurie Children's Hospital of Chicago Research Center	Ann & Robert H. Lurie Children's Hospital of Chicago	St Jude Children's Research Hospital	St Jude Children's Research Hospital
Person Roles	funder;investigator	data coder;curator	investigator;data analyst	investigator	investigator;data analyst;submitter	investigator;data analyst
Person Roles Term Source REF	EFO;EFO	EFO;EFO	EFO;EFO	EFO	EFO;EFO;EFO	EFO;EFO
Quality Control Type
Quality Control Term Source REF
Replicate Type
Replicate Term Source REF
Normalization Type
Normalization Term Source REF
Date of Experiment
Public Release Date
PubMed ID
Publication DOI
Publication Author List
Publication Title
Publication Status
Publication Status Term Source REF
Experiment Description	"There are 130 fully characterized patient cases with high-risk Wilms tumor (all tumor/normal pairs; 8 with additional samples for analysis: 3 with tumor-adjacent normal and 5 with a relapse sample) that will make up the TARGET WT dataset. Each case will have gene expression, tumor and paired normal copy number, methylation, and whole genome sequencing data; a subset of WT cases also has mRNA-seq, miRNA-seq, and whole exome sequencing data available. All cases can be sorted by data type via the Case Matrix on the TARGET Data Matrix. Please visit the TARGET website for additional information on this and other TARGET genomics projects. Please see the TARGET Publication Guidelines at the OCG website for updated details on sharing of any TARGET substudy data."
Protocol Name	nationwidechildrens.org:Protocol:DNA-Extraction-Qiagen-AllPrep:01	bcm.edu:Protocol:WXS-LibraryPrep-Illumina:01	bcm.edu:Protocol:TargetedCapture-LibraryPrep-IonTorrent:01	bcm.edu:Protocol:WXS-ExomeCapture-NimbleGen-SeqCapEZHumanExomeV2:01	bcm.edu:Protocol:WXS-Sequence-Illumina-HiSeq2000:01	bcm.edu:Protocol:TargetedCapture-Sequence-IonTorrent-IonPGM:01	bcm.edu:Protocol:WXS-BaseCall-Illumina:01	bcm.edu:Protocol:TargetedCapture-BaseCall-IonTorrent:01	bcm.edu:Protocol:WXS-ReadAlign-BWA-GATK:01	bcm.edu:Protocol:TargetedCapture-ReadAlign-BLAT-CrossMatch:01	bcm.edu:Protocol:WXS-VariantCall-AtlasPindel:01	bcm.edu:Protocol:TargetedCapture-FilterVerified:01	bcm.edu:Protocol:WXS-FilterVerified:01	bcm.edu:Protocol:TargetedCapture-VariantCall-AtlasPindel:01	nci.nih.gov:CBIIT.Meerzaman.Protocol:WXS-VariantCall:01	nci.nih.gov:CBIIT.Meerzaman.Protocol:WXS-FilterVerified:01	stjude.org:Protocol:WXS-VariantCall-Bambino-DToxoG:01
Protocol Type	nucleic acid extraction protocol	nucleic acid library construction protocol	nucleic acid library construction protocol	nucleic acid library construction protocol	nucleic acid sequencing protocol	nucleic acid sequencing protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol
Protocol Term Source REF	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO
Protocol Description	"Genomic DNA was prepared using the Qiagen AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA, USA). Please see https://ocg.cancer.gov/programs/target/target-methods for full extraction protocol details."	"Library construction: Specimen processing, DNA extraction, and standard QC were performed, and Illumina paired-end pre-capture libraries were prepared according to the manufacturer's protocol (Illumina Inc, San Diego, CA) with the following modifications: 0.5-1 ug of genomic DNA in a 100 ul volume was sheared into fragments of approximately 300 base pairs in a Covaris E210 system (Covaris, Inc., Woburn, MA). The settings were 10% duty cycle, intensity of 4, and 200 cycles per burst for 120 seconds. Fragment size was checked using a 2.2% Flash Gel DNA Cassette (Lonza, Walkersville, MD, Cat. No. 57023). End-repair of fragmented DNA was performed in a 90 ul total reaction volume containing sheared DNA, 9 ul 10X buffer, 5 ul END Repair Enzyme Mix and H2O (NEBNext End-Repair Module, New England BioLabs, Ipswich, MA, Cat. No. E6050L), incubated at 20 degC for 30 minutes. A-tailing was performed in a total reaction volume of 60 ul containing end-repaired DNA, 6 ul 10X buffer, 3 ul Klenow fragment (NEBNext dA-Tailing Module; Cat. No. E6053L) and H2O, followed by incubation at 37 degC for 30 minutes. Illumina multiplex adapter ligation (NEBNext Quick Ligation Module, Cat. No. E6056L) was performed in a total reaction volume of 90 ul containing 18 ul 5X buffer, 5 ul ligase, 0.5 ul 100 uM adaptor and H2O at room temperature for 30 minutes. After ligation, PCR with Illumina PE 1.0 and modified barcode primers (manuscript in preparation) was performed in 170 ul reactions containing 85 ul of 2x Phusion High-Fidelity PCR master mix, adaptor-ligated DNA, 1.75 ul of 50 uM primers and H2O. PCR was performed using a 5-minute initial denaturation at 95 degC, 6-10 cycles of 15 seconds at 95 degC, 15 seconds at 60 degC and 30 seconds at 72 degC, followed by a final extension of 5 minutes at 72 degC.
Agencourt XP Beads (Beckman Coulter Genomics, Inc., Danvers, MA, Cat. No. A63882) were used to purify DNA after each enzymatic reaction. After purification, PCR product quantity and size distribution were determined using the Caliper GX 1K/12K/High Sensitivity Assay Labchip (Hopkinton, MA, Cat. No. 760517)."	"Verification of putative SNV and indel mutation calls from whole exome sequencing was performed using alternative instrumentation and chemistries to avoid any systematic errors inherent to the processes described above. Sample-sites were submitted for amplification by PCR followed by analysis on the Ion Torrent sequencing platform. Ion Torrent libraries were constructed using Life Technologies protocols and reagents (Cat. No. 4471269). Briefly, 500 ug of amplicon DNA (143 bp average size) was processed through end-repair, followed by ligation with Ion Xpress Barcode Adapters (1-32; Cat. No. 4471250 and 4474009) and purification with Agencourt XP beads. The resulting libraries were quantified using the Agilent Bioanalyzer 2100 DNA Chip 7500 (Cat. No. 5067-1506) and then pooled for sequencing. Library templates were clonally amplified on Ion Sphere Particles (ISPs) through emulsion PCR and then enriched for template-positive ISPs using manufacturer reagents (Cat. No. 4479878). Approximately 35 million template-positive ISPs were deposited onto each Ion 318C chip (Cat. No. 4484355)."	"Exome capture: Illumina pre-capture libraries (1 ug DNA input) were hybridized in solution to SeqCap EZ Human Exome 2.0 (NimbleGen, Madison, WI) probes targeting approximately 44 Mb of sequence from approximately 30K genes according to the manufacturer's protocol, with the following modifications: hybridization-enhancing oligos IHE1, IHE2 and IHE3 replaced oligos HE1.1 and HE2.1, and post-capture LM-PCR was performed using 14 cycles. Capture libraries were quantified using the Caliper GX 1K/12K/High Sensitivity Assay Labchip (Hopkinton, MA, Cat. No. 760517).
The efficiency of the capture was evaluated by performing a qPCR-based quality check on the built-in controls (qPCR SYBR Green assays, Applied Biosystems, Grand Island, NY). Four standardized oligo sets, RUNX2, PRKG1, SMG1, and NLK, were employed as internal quality controls. The enrichment of the capture libraries was estimated to range from 7- to 9-fold over background."	"Library templates were prepared for sequencing using Illumina's cBot cluster generation system with TruSeq PE Cluster Generation Kits (Part no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 6-9 pM in hybridization buffer to achieve a load density of ~800K clusters/mm2. Each library pool was loaded in a single lane of a HiSeq flow cell, and each lane was spiked with 2% phiX control library for run quality control. The sample libraries then underwent bridge amplification to form clonal clusters, followed by hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode on the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Part no. FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional 7 cycles for the index read. Sequencing runs generated approximately 300-400 million successful reads on each lane of a flow cell, with approximately 9-10 Gb produced per sample. With these sequencing yields, samples achieved an average of 95% of targeted exome bases covered to a depth of 20X or greater."	"Sequencing was performed on the Ion PGM platform using the corresponding reagents (Cat. No. 4482002) and the 850-flow (400 bp) run format."	"Real Time Analysis (RTA) software was used for image analysis and nucleotide base calling. On average, about 80-100 million successful reads, consisting of 2 x 100 bp, were generated on each lane of a flow cell."		"Mapping Reads: Illumina HiSeq bcl files were processed using BCLConvertor v1.7.1.
All reads from the prepared libraries that passed the Illumina Chastity filter were formatted into fastq files. The fastq files were aligned to human reference genome build 37 (NCBI) using BWA (bwa-0.5.9-R16) with default parameters, with the following exceptions: seed length: 40 bp; seed mismatches: 2; total mismatches allowed: 3. BAM files generated from alignment were preprocessed using GATK (v1.3-8-gb0e6afe) [1] to recalibrate and locally realign reads."	"Verification sequencing results were analyzed using a highly accurate two-step mapping process. First, verification fastq sequence files were aligned to the human genome reference using BLAT; the top-scoring alignment was reported from the cognate amplicon hits if, and only if, that top-scoring hit was greater than 90% of the next-best hit. Second, the passing BLAT hits were pairwise aligned to their respective amplicon sequences using Crossmatch. A passing verification status was assigned if at least 50 reads spanned a sample-site (depth of 50x)."	"Mutation Detection: Sequence variants were called from tumor and matched normal BAM files using Atlas [2], an integrative variant analysis suite of tools specializing in the separation of true SNPs and insertions and deletions (indels) from sequencing and mapping errors in whole exome capture sequencing (WXS) data. The suite implements logistic regression models trained on validated WXS data to identify true variants. ATLAS-SNP-2 (v1.3) [3] and ATLAS-Indel-2 (v0.3.1), along with Pindel (v0.2.4q) [4], were run on the BAM files, producing variant data that were further filtered to remove variants observed fewer than 5 times or present in less than 0.08 of the reads (i.e., the variant allele fraction had to be greater than 0.08 for the variant to undergo validation). At least one variant read of Q30 or better was required, and the variant had to lie in the central portion of the read (15% from the 5' end of the read and 20% from the 3' end).
In addition, reads harboring the variant must have been observed in both forward and reverse orientations. Finally, the variant base must not have been observed in the matched normal tissue. Indels were discovered by similar processing, except that indels must have been observed in at least 10 of the reads."						"Of the 1,131 paired tumor-normal WES data sets, all except 23 osteosarcoma pairs exhibited the expected binomial distribution of variant allele fraction for known germline SNPs. The 23 outlier samples were not used for discovery of driver genes due to their abnormal variant allele distribution profile. They were included only for determining mutation prevalence of the driver genes identified in this study. Somatic SNVs and indels were detected by the Bambino program, followed by postprocessing and manual curation as previously described. Because 8-oxo-G artifacts have been reported previously, we implemented an 8-oxo-G filter following the principles described in the D-ToxoG (http://archive.broadinstitute.org/cancer/cga/dtoxog) algorithm. The filter was tested on the 'Exome_Native' neuroblastoma samples known to have 8-oxo-G artifacts and then applied to the variants identified in all WES data sets."
Protocol Parameters							Software Versions	Software Versions	Software Versions	Software Versions							
Protocol Hardware					Illumina HiSeq 2000	Ion Torrent PGM											
Protocol Software							Casava	Torrent Suite	bfast;picard tools;raw bam read validator;bwa;MateInfoFixer;picard;samtools;GATK	blat;cross_match;crossmatch2SAM;samtools;picard							
Protocol Contact
SDRF File	TARGET_WT_WXS_20170609.sdrf.txt
Term Source Name	NCBITaxon	NCIt	MO	EFO	OBI
Term Source File	http://www.ncbi.nlm.nih.gov/taxonomy	http://ncit.nci.nih.gov/	http://mged.sourceforge.net/ontologies/MGEDontology.php	http://www.ebi.ac.uk/efo	http://purl.obolibrary.org/obo/obi
Term Source Version
Comment[SRA_STUDY]	SRP012006
Comment[BioProject]	PRJNA89521
Comment[dbGaP Study]	phs000471
