	BROAD TCGA ALGORITHM DESCRIPTION
Last edited: Aug 3, 2012

For the latest description of the algorithms, please see the supplementary
information of the paper at:

 http://www.nature.com/nature/journal/vaop/ncurrent/suppinfo/nature07385.html


1) Invariant Set Median-Polish Values

Protocol Name:    broad.mit.edu:invariantset_medianpolish:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.ismpolish.data.txt

These data are the probe sets' normalized
intensity values.  First, the probes' raw intensity values were
brightness corrected using invariant set normalization as described in
Li and Wong's dChip paper. Then the probe sets were summarized using
median polish, a robust summarization method described in Bolstad
et al.'s RMA paper.  Both of these steps were executed by a
GenePattern module called SNPFileCreator.
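The summarization step can be sketched as Tukey's median polish applied to one probe set. This is an illustrative NumPy sketch, not the SNPFileCreator code. It assumes the input is already normalized, log2-scaled, and laid out with one row per probe and one column per sample. The function name `median_polish` and its `max_iter`/`tol` parameters are hypothetical, and the invariant set normalization step is not shown.

```python
import numpy as np


def median_polish(log_intensities, max_iter=10, tol=1e-6):
    """Summarize one probe set (rows = probes, columns = samples) by median polish.

    Fits x[i, j] = overall + row[i] + col[j] + residual[i, j] using medians
    and returns overall + col[j], one summary value per sample.
    """
    resid = np.array(log_intensities, dtype=float)  # working copy of the data
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # sample effects
    for _ in range(max_iter):
        # Sweep row medians into the probe effects.
        r = np.median(resid, axis=1)
        resid -= r[:, None]
        row += r
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep column medians into the sample effects.
        c = np.median(resid, axis=0)
        resid -= c[None, :]
        col += c
        shift = np.median(row)
        row -= shift
        overall += shift
        # Stop once a full sweep no longer moves anything.
        if np.max(np.abs(r)) < tol and np.max(np.abs(c)) < tol:
            break
    return overall + col
```

For a purely additive probe set such as `[[1, 2, 3], [3, 4, 5]]`, the sketch returns `[2., 3., 4.]`. Using medians rather than means is what keeps a single aberrant probe from dragging the per-sample summary.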


2) Allele-Specific Copy-Numbers

Protocol Name:    broad.mit.edu:byallele_copynumber:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.byallele.copynumber.data.txt

Allele-specific copy numbers were estimated at each of the SNP markers
by subtracting a background term and dividing by a scaling factor,
separately for each allele. The background term for each allele is
estimated from the center of the Birdseed cluster of samples called
homozygous for the other allele (for example, for allele A we use the
A coordinate of the center of the BB cluster). The scaling factor is
set to the distance between the BB cluster and the AB cluster for the
A allele, or between the AA cluster and the AB cluster for the B allele.
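The background-and-scale calculation above can be sketched for a single sample at a single SNP. This sketch assumes that "distance" means the difference along that allele's own coordinate (A for allele A, B for allele B). The function name and the `centers` dictionary layout are hypothetical.

```python
def allele_specific_copy_numbers(a, b, centers):
    """Return (copy number of allele A, copy number of allele B) for one sample.

    `a`, `b`: the sample's A and B intensities at this SNP.
    `centers`: maps "AA", "AB", "BB" to (A, B) cluster-center coordinates.
    """
    aa, ab, bb = centers["AA"], centers["AB"], centers["BB"]
    # Allele A: the BB cluster (zero copies of A) supplies the background, and
    # the BB-to-AB step along the A coordinate (one copy of A) supplies the scale.
    cn_a = (a - bb[0]) / (ab[0] - bb[0])
    # Allele B: the mirror image, using the AA and AB clusters along the B coordinate.
    cn_b = (b - aa[1]) / (ab[1] - aa[1])
    return cn_a, cn_b
```

With centers `{"AA": (5.0, 1.0), "AB": (3.0, 3.0), "BB": (1.0, 5.0)}`, a sample sitting on the AB center gets `(1.0, 1.0)` and one on the AA center gets `(2.0, 0.0)`.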

3) Copy-Numbers

Protocol Name:    broad.mit.edu:raw_copynumber:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    2
Data File:     *.raw.copynumber.data.txt

Raw copy numbers were estimated at each of the SNP and copy-number
(CN) markers by subtracting a background term and dividing by a
scaling factor. The total copy number at SNP markers was calculated by
summing the two allele-specific values.
For CN probes we built a model based on an X-dosage experiment,
which estimates the background and scaling factor as a function of the
median intensity of the probe across normal samples.
Finally, we divide the total copy number by the average of all normal
samples and multiply by 2, so that the average normal has a value of 2.
The value is in linear space.
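The SNP summation and the final scaling to normals can be sketched with NumPy. The sketch assumes the normal average is taken per marker. It does not reproduce the X-dosage model for CN probes, because the form of that model is not given here. Both function names are hypothetical.

```python
import numpy as np


def total_copy_number(cn_a, cn_b):
    """Total copy number at SNP markers: the sum of the two allele-specific values."""
    return np.asarray(cn_a, dtype=float) + np.asarray(cn_b, dtype=float)


def scale_to_normals(sample_cn, normal_cn):
    """Divide by the (assumed per-marker) average over normals and multiply by 2.

    `sample_cn`: one sample's total copy numbers, one value per marker.
    `normal_cn`: normal samples' total copy numbers, shape (samples, markers).
    """
    normal_mean = np.mean(np.asarray(normal_cn, dtype=float), axis=0)
    return 2.0 * np.asarray(sample_cn, dtype=float) / normal_mean
```

For example, a sample whose totals match the normal averages, `[2.0, 3.0]` against normals `[[1.0, 4.0], [3.0, 2.0]]`, is scaled to `[2.0, 2.0]`.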

4) Tangent Copy-Numbers

Protocol Name:    broad.mit.edu:tangent_copynumber:Genome_Wide_SNP_6:01
Link:        http://www.ncbi.nlm.nih.gov/geosuppl/?acc=GSE19399&file=GSE19399%5Ftangent%5Fnorm%5Ffiles%2Etar%2Egz
Data Level:    2
Data File:     *.tangent.copynumber.data.txt

The total copy number is smoothed by first removing outliers and
then applying the tangent algorithm.  The value is in linear space,
and each sample has been centered around 2, as if it were diploid.

Tangent normalization determines normalized copy number values
by calculating, for each data sample (in this case, a cancer sample),
its orthogonal distance from the high-dimensional hyperplane defined
by a set of reference samples.  The projection of each sample onto this
hyperplane is equivalent to constructing a 'hypothetical' reference
sample that most closely approximates that data sample; by normalizing
against this 'hypothetical' sample, we normalize away as much of the
variance in signal intensity observed in that sample as can be
explained by a linear combination of signal intensities in the
reference set, yielding cleaner copy number values.
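The projection described above can be sketched as a least-squares fit of the sample onto the reference columns. What remains is the orthogonal residual. This is a minimal sketch that makes several assumptions: the inputs are log2(copy number / 2) values, the references form the columns of a matrix, and `to_linear` is the back-transform to linear space centered on 2. The outlier-removal step is not shown, and both function names are hypothetical.

```python
import numpy as np


def tangent_normalize(sample_log2, reference_log2):
    """Remove the part of a sample explained by a linear combination of references.

    `sample_log2`: one sample's values, shape (markers,).
    `reference_log2`: reference samples as columns, shape (markers, references).
    Returns the sample minus its orthogonal projection onto the span of the
    references, i.e. the sample normalized against the 'hypothetical' reference.
    """
    t = np.asarray(sample_log2, dtype=float)
    refs = np.asarray(reference_log2, dtype=float)
    # Least-squares weights give the closest 'hypothetical' reference sample.
    weights, *_ = np.linalg.lstsq(refs, t, rcond=None)
    hypothetical_reference = refs @ weights
    return t - hypothetical_reference


def to_linear(residual_log2):
    """Map log2(copy number / 2) values back to linear space, centered on 2."""
    return 2.0 * np.power(2.0, residual_log2)
```

A sample that is exactly a linear combination of the references normalizes to all zeros (copy number 2 in linear space). Only the component orthogonal to the reference set survives.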

5) Segmentation

Protocol Name:    broad.mit.edu:segmented_scna_hg18:Genome_Wide_SNP_6:01,
 broad.mit.edu:segmented_scna_hg19:Genome_Wide_SNP_6:01,
 broad.mit.edu:segmented_scna_minus_germline_cnv_hg18:Genome_Wide_SNP_6:01,
 broad.mit.edu:segmented_scna_minus_germline_cnv_hg19:Genome_Wide_SNP_6:01
Link:        http://www.broad.mit.edu/cancer/software/genepattern/
Data Level:    3
Data File:     *.hg18.seg.txt, *.hg19.seg.txt, *.nocnv_hg18.seg.txt,
               *.nocnv_hg19.seg.txt

The probes are sorted by genomic position according to the genome
build (hg18 or hg19).  For SNP probes, the position used is the
position of the SNP; for copy-number probes, the position used is the
average of the start and end positions, rounded down.  The data are
then segmented using the Circular Binary Segmentation (CBS) algorithm.
The 'minus_germline_cnv' seg files also had a fixed set of probes,
located in germline copy-number variant (CNV) regions, removed prior to
segmentation, and are more suitable for GISTIC or other analyses of
statistical significance.  The segment value is log2(copy number / 2),
so a diploid region is centered on 0.
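The probe positioning, the segment value, and the core search step of CBS can be sketched as follows. These are illustrative helpers with hypothetical names, not the production code. `best_arc` covers only the split search. Full CBS also assesses each split with a permutation test and then recurses on the resulting segments, and neither step is shown here. The statistic also assumes unit-variance data.

```python
import math


def probe_position(start, end=None):
    """SNP probes use the SNP position; CN probes use floor((start + end) / 2)."""
    return start if end is None else (start + end) // 2


def seg_value(copy_number):
    """Segment value: log2(copy number / 2), so a diploid region maps to 0."""
    return math.log2(copy_number / 2)


def best_arc(values):
    """Split search of circular binary segmentation (no permutation test).

    Treats `values` as joined end to end in a circle and finds the arc
    values[i:j] whose mean differs most from the rest, using
    Z = (mean_inside - mean_outside) / sqrt(1/k + 1/(n - k)), k = j - i.
    Arcs that wrap past the end are covered by their complements, which
    have the same |Z|. Returns (Z, i, j) for the arc with the largest |Z|.
    """
    n = len(values)
    prefix = [0.0]  # prefix[m] == sum(values[:m])
    for v in values:
        prefix.append(prefix[-1] + v)
    total = prefix[n]
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:  # the whole circle leaves nothing outside the arc
                continue
            inside = prefix[j] - prefix[i]
            z = (inside / k - (total - inside) / (n - k)) / math.sqrt(1 / k + 1 / (n - k))
            if abs(z) > abs(best[0]):
                best = (z, i, j)
    return best
```

On `[0, 0, 0, 1, 1, 1, 0, 0]`, `best_arc` picks the raised block at indices 3 to 5 (`i=3, j=6`). The double loop is quadratic in the number of probes. That is acceptable for a sketch but is far slower than production CBS implementations.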

6) Birdseed Genotypes

Protocol Name:    broad.mit.edu:birdseed_genotype:Genome_Wide_SNP_6:01
Link:        https://www.affymetrix.com/support/developer/powertools/index.affx
Data Level:    2
Data File:     *.birdseed.data.txt

Birdseed results are genotype calls produced by the Birdseed algorithm
from the probe sets' intensity values normalized by the Invariant Set
Median-Polish algorithm. First, the normalized values of the SNP probe
sets from the normal samples were passed as input to Birdseed, along
with the 6.0 priors file and the special SNPs file.  This run generated
the clusters, confidences, and calls files.  Birdseed was then run
again on the SNP probe sets from all samples, using the '--clusters'
option to supply the clusters file from the previous normals run.
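The two-pass workflow can be illustrated with a deliberately simplified stand-in. Birdseed's own clustering model is more elaborate. This sketch swaps in k-means-style refinement and nearest-center calling to show only the structure of the workflow: cluster centers are learned from normal samples starting from priors, then held fixed while every sample is called. All names are hypothetical, and the sketch does not imply any Birdseed command-line options beyond the '--clusters' option named in the text.

```python
import math

Point = tuple[float, float]  # (A intensity, B intensity) for one sample at one SNP


def nearest_genotype(point: Point, centers: dict[str, Point]) -> str:
    """Return the genotype label whose cluster center is closest to `point`."""
    return min(centers, key=lambda g: math.dist(point, centers[g]))


def fit_clusters(normals: list[Point], priors: dict[str, Point], rounds: int = 10) -> dict[str, Point]:
    """Pass 1: start from prior centers and refine them on normal samples."""
    centers = dict(priors)
    for _ in range(rounds):
        members: dict[str, list[Point]] = {g: [] for g in centers}
        for p in normals:
            members[nearest_genotype(p, centers)].append(p)
        for g, pts in members.items():
            if pts:  # keep the previous center if no sample joined this cluster
                centers[g] = (
                    sum(a for a, _ in pts) / len(pts),
                    sum(b for _, b in pts) / len(pts),
                )
    return centers


def call_genotypes(samples: list[Point], clusters: dict[str, Point]) -> list[str]:
    """Pass 2: call every sample against the fixed clusters from pass 1."""
    return [nearest_genotype(p, clusters) for p in samples]
```

Fixing the clusters from the normals run means that tumor samples, whose intensities may be shifted by copy-number changes, do not pull the cluster centers used to call them.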

