MAGE-TAB Version	1.1
Investigation Title	TARGET: Kidney - Wilms Tumor (WT) WGS
Experimental Design	disease state design
Experimental Design Term Source REF	EFO
Experimental Factor Name
Experimental Factor Type
Experimental Factor Term Source REF
Person Last Name	NCI Office of Cancer Genomics (OCG)	NCI Center for Biomedical Informatics and Information Technology (CBIIT)	Gadd	Perlman	Ma	Zhang	Ma	Novik
Person First Name			Samantha	Elizabeth	Xiaotu	Jinghui	Yussanne	Karen
Person Mid Initials			L	J			P	L
Person Email	ocg@mail.nih.gov	ncicbiit@mail.nih.gov	sgadd@luriechildrens.org	eperlman@luriechildrens.org	xiaotu.ma@stjude.org	jinghui.zhang@stjude.org	yma@bcgsc.ca	knovik@bcgsc.ca
Person Phone	+1 301 451 8027	+1 888 478 4423	+1 773 755 6392	+1 312 227 3967	+1 901 595 3774	+1 901 595 6829	+1 604 707 5800 Ext 6082	+1 604 707 8000 Ext 7983
Person Fax	+1 301 480 4368				+1 901 595 7100	+1 901 595 7100	+1 604 876 3561	+1 604 675 8178
Person Address	31 Center Dr, Rm 10A07, Bethesda, MD 20892	9609 Medical Center Dr, Rockville, MD 20850	2430 N Halsted St, Room C366 Chicago, IL 60614	225 E Chicago Ave, Chicago, IL 60611	262 Danny Thomas Place, Memphis, TN 38105	262 Danny Thomas Place, Memphis, TN 38105	Suite 100-570 West 7th Ave, Vancouver, BC Canada V5Z 4S6	675 West 10th Ave Vancouver, BC Canada V5Z 1L3
Person Affiliation	National Cancer Institute	National Cancer Institute	Lurie Children's Hospital of Chicago Research Center	Ann & Robert H. Lurie Children's Hospital of Chicago	St Jude Children's Research Hospital	St Jude Children's Research Hospital	BC Cancer Agency Canada's Michael Smith Genome Sciences Centre	BC Cancer Agency Canada's Michael Smith Genome Sciences Centre
Person Roles	funder;investigator	data coder;curator	investigator;data analyst	investigator	investigator;data analyst;submitter	investigator;data analyst	investigator;data analyst;submitter	investigator
Person Roles Term Source REF	EFO;EFO	EFO;EFO	EFO;EFO	EFO	EFO;EFO;EFO	EFO;EFO	EFO;EFO;EFO	EFO
Quality Control Type
Quality Control Term Source REF
Replicate Type
Replicate Term Source REF
Normalization Type
Normalization Term Source REF
Date of Experiment
Public Release Date
PubMed ID
Publication DOI
Publication Author List
Publication Title
Publication Status
Publication Status Term Source REF
Experiment Description	"There are 130 fully characterized patient cases with high risk Wilms tumor (all tumor/normal pairs; 8 with additional samples for analysis = 3 with tumor adjacent normal, 5 with relapse sample) that will make up the TARGET WT dataset. Each case will have gene expression, tumor and paired normal copy number analyses, methylation and whole genome sequencing; a subset of WT cases also has mRNA-seq, miRNA-seq, and whole exome sequencing data available. All cases can be sorted according to data type via the Case Matrix on the TARGET Data Matrix. Please visit the TARGET website listed above for additional information on this and other TARGET genomics projects. Please see the TARGET Publication Guidelines at the OCG website for updated details on sharing of any TARGET substudy data."
Protocol Name	nationwidechildrens.org:Protocol:DNA-Extraction-Qiagen-AllPrep:01	bcgsc.ca:Protocol:WGS-LibraryPrep-Illumina:01	completegenomics.com:Protocol:WGS-LibraryPrep-CGI:01	bcgsc.ca:Protocol:WGS-Sequence-Illumina-HiSeq2000:01	completegenomics.com:Protocol:WGS-Sequence-CGI-CGI:01	bcgsc.ca:Protocol:WGS-BaseCall-Illumina:01	completegenomics.com:Protocol:WGS-BaseCall-CGI:01	bcgsc.ca:Protocol:WGS-ReadAlign-BWA-Picard:01	completegenomics.com:Protocol:WGS-ReadAlign-CGI:01	bcgsc.ca:Protocol:WGS-VariantCall:01	completegenomics.com:Protocol:WGS-CnvSegment-CGI:01	completegenomics.com:Protocol:WGS-Circos-CGI:01	completegenomics.com:Protocol:WGS-Junction-CGI:01	completegenomics.com:Protocol:WGS-VariantCall-CGI:01	completegenomics.com:Protocol:WGS-Vcf2Maf-CGI:01	completegenomics.com:Protocol:WGS-FilterSomatic-CGI:01	completegenomics.com:Protocol:WGS-HigherLevelSummary-CGI:01	stjude.org:Protocol:WGS-StructVariant-CGI:01	stjude.org:Protocol:WGS-CnvSegment-CONCERTING-CGI:01	stjude.org:Protocol:WGS-VariantCall-CGI:01
Protocol Type	nucleic acid extraction protocol	nucleic acid library construction protocol	nucleic acid library construction protocol	nucleic acid sequencing protocol	nucleic acid sequencing protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol	data transformation protocol
Protocol Term Source REF	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO	EFO
Protocol Description	"Genomic DNA was prepared using the Qiagen AllPrep DNA/RNA Mini Kit (Qiagen, Valencia, CA, USA). Please see https://ocg.cancer.gov/programs/target/target-methods for full extraction protocol details."	"DNA quality was assessed by spectrophotometry (260/280 and 260/230) and gel electrophoresis before library construction. Depending on the availability of DNA, between 2 and 10ug was used in WGSS library construction. Briefly, DNA was sheared for 10 min using a Sonic Dismembrator 550 with a power setting of "7" in pulses of 30 seconds interspersed with 30 seconds of cooling (Cup Horn, Fisher Scientific, Ottawa, Ontario, Canada), and analyzed on 8% PAGE gels. The 200-300bp DNA fraction was excised and eluted from the gel slice overnight at 4 degrees Celsius in 300 ul of elution buffer (5:1, LoTE buffer (3 mM Tris-HCl, pH 7.5, 0.2 mM EDTA)-7.5 M ammonium acetate), and was purified using a Spin-X Filter Tube (Fisher Scientific), and by ethanol precipitation. WGSS libraries were prepared using a modified paired-end protocol supplied by Illumina Inc. (Illumina, Hayward, USA). This involved DNA end-repair and formation of 3' A overhangs using Klenow fragment (3' to 5' exo minus) and ligation to Illumina PE adapters (with 5' overhangs). Adapter-ligated products were purified on Qiaquick spin columns (Qiagen, Valencia, CA, USA) and PCR-amplified using Phusion DNA polymerase (NEB, Ipswich, MA, USA) and 10 cycles with the PE primer 1.0 and 2.0 (Illumina). PCR products of the desired size range were purified from adapter ligation artifacts using 8% PAGE gels. DNA quality was assessed and quantified using an Agilent DNA 1000 series II assay (Agilent, Santa Clara CA, USA) and Nanodrop 7500 spectrophotometer (Nanodrop, Wilmington, DE, USA) and DNA was subsequently diluted to 10nM. The final concentration was confirmed using a Quant-iT dsDNA HS assay kit and Qubit fluorometer (Invitrogen, Carlsbad, CA, USA)."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."		"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."		"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Illumina paired-end whole genome sequencing reads were aligned to the hg19 reference using BWA version 0.5.7. This reference contains chromosomes 1-22, X, Y, MT, 20 unlocalized scaffolds and 39 unplaced scaffolds. Multiple lanes of sequences were merged and duplicated reads were marked with Picard Tools."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."		"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"Complete Genomics Inc. standard protocol, please see CGI READMEs for details."	"MAF files containing structural variants identified by CGI were downloaded from the TARGET Data Matrix and filtered to remove germline rearrangements and low confidence somatic calls. Germline variant databases used for filtering included the Database of Genomic Variants (DGV), dbSNP, PCGP, and also recurrent germline rearrangements from the downloaded MAF files. Rearrangements where both breakpoints fall into gap regions in the human genome (hg19) were also excluded. To filter out low confidence rearrangements, a BLAT search was performed on the assembled sequence for each rearrangement, and those that could be fully mapped (>90% similarity to the reference genome) were excluded. We further required each variant to have an assembled contig length of at least 10 bp on each breakpoint. 
Since copy number alterations were highly coupled with rearrangement events, to avoid over-filtering we also integrated the copy number alterations into the SV analysis. Briefly, breakpoints from CNV analysis were matched to those detected in SVs, using a window size of 5kb. Those rearrangements with possible CNV support were rescued after manual curation. Of the 1,011,810 putative CGI SVs, 3,265 passed these filters. Experimental verification using 14 CGI diagnosis-remission-relapse trio samples from a previous publication showed a validation rate of 78% as 79 out of the 101 SVs were experimentally verified by targeted capture sequencing."	"We adapted the CONSERTING algorithm to detect copy number alterations from CGI whole genome sequencing data. Briefly, the germline single nucleotide polymorphisms (SNPs) reported by CGI in the MAF files were extracted, with recurrent paralogous variants (identified from the 625 germline whole genome sequencing data generated by the St Jude Pediatric Cancer Genome Project) removed. The read counts of the SNPs were then used to construct a coverage file by taking the mean of all SNPs within a sliding window of 100bp. The coverage difference between tumor and normal samples was then used as the input for CONSERTING. To detect loss-of-heterozygosity (LOH), we used SNPs that have variant allele fraction (VAF) in normal within an interval of (0.4, 0.6) and have >15X coverage in both tumor and normal samples. For these SNPs, the allelic imbalance (AI), defined as |Tumor_VAF-0.5|, was used as the input for CONSERTING to detect LOH. Regions with concomitant copy number changes (log ratio>0.2 or log ratio<-0.2) and/or LOH (AI>0.1) were subjected to manual review. Finally, regions with length <2Mb that passed manual review were considered to be focal changes and included in the GRIN analysis to determine the significance of the somatic alterations. 
We compared the MYCN amplification status derived from CONSERTING with that of the original CGI analysis to evaluate the accuracy of the recalled CNVs. A subset of 32 NBL tumors carried a clinically-validated high-amplitude amplification of MYCN, which is a known oncogenic driver in pediatric neuroblastoma. While CGI's HMM CNA model only reported MYCN amplifications in 15 out of these 32 tumors, CONSERTING successfully identified high-amplitude amplifications in 31 tumors. For the NBL (PASJZC) with a negative finding of MYCN amplification by CONSERTING, a follow-up review of the initial diagnosis data indicated that this discrepancy could be explained by tumor heterogeneity and tumor material sampling bias. Moreover, two additional subclonal MYCN amplification events were predicted in the remaining tumor samples (PARACM, PATHVK). These results demonstrate that CONSERTING achieved higher sensitivity than the original CGI analysis. For osteosarcoma, CNA analysis was limited to the TP53 locus but not the other regions due to the excessive number of rearrangements caused by chromothripsis in this cancer histotype."	"Putative somatic SNVs and indels were extracted from MAF files downloaded from the TARGET Data Matrix and run through a 3-step filter to remove germline, low-confidence and paralog variations. In the first step, the following data sets were used for filtering germline variants: 1) NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/); 2) dbSNP build 132 (https://www.ncbi.nlm.nih.gov/projects/SNP/); 3) St. Jude/Washington University Pediatric Cancer Genome Project (PCGP), and 4) germline variants present in >= 5 cases in TARGET CGI WGS data. 
In the second step, a variant was considered low-confidence unless it met the following criteria: 1) at least 3 more reads support the mutant allele in the tumor sample than in the normal sample; 2) the mutant read count in tumor is significantly higher than in the matched normal (P<0.01 by Fisher's Exact test); and 3) mutant allele fraction in normal is below 0.05. In the third step, we ran a BLAT search using a template sequence consisting of the mutant allele and its 20-bp flanking region to determine whether or not the mutation was uniquely mapped. Because pathogenic germline variants may overlap with oncogenic somatic mutations, we implemented a "rescue" pipeline to avoid over-filtering. All putative somatic variants were first re-annotated using a customized AnnoVar pipeline (Edmonson et al, unpublished) and then classified using Medal_Ceremony. Variants assigned "Gold" by Medal_Ceremony are those matching known mutation hotspots, or truncation mutations in tumor suppressor genes. These were "rescued" and merged with the filtered variants for each gene and the results further curated using our visualization program ProteinPaint (https://pecan.stjude.org/proteinpaint/study/pan-target). The filtering process reduced the original 51 million SNVs and 38 million indels from the CGI MAF files to a set of ~700,000 SNVs and 58,000 indels. Of these, 9,397 SNVs and 1,000 indels are in protein coding regions. We tested the filter on 14 diagnosis-remission-relapse trio samples that were analyzed by both CGI and WES. Of the 661 CGI SNVs passing the filter, 580 (88%) were verified by WES, while the indel verification rate was 67% (48/72). Notably, all 53 variants (45 SNVs and 8 indels) on the driver genes identified in this study were cross-validated by WES."
Protocol Parameters						Software Versions														
Protocol Hardware				Illumina HiSeq 2000	Complete Genomics															
Protocol Software						Illumina RTA														
Protocol Contact
SDRF File	TARGET_WT_WGS_20170609.sdrf.txt
Term Source Name	NCBITaxon	NCIt	MO	EFO	OBI
Term Source File	http://www.ncbi.nlm.nih.gov/taxonomy	http://ncit.nci.nih.gov/	http://mged.sourceforge.net/ontologies/MGEDontology.php	http://www.ebi.ac.uk/efo	http://purl.obolibrary.org/obo/obi
Term Source Version
Comment[SRA_STUDY]	SRP012006
Comment[BioProject]	PRJNA89521
Comment[dbGaP Study]	phs000471
