The file REFERENCE.fa contains two types of sequences: BACKGROUND and CROSSJUNCTION. 

BACKGROUND
Background sequences correspond to annotated transcripts. The header format is as follows:

<individual number>_REFERENCE_<gene id>_<chromosome>_<strand>_nan_BACKGROUND_REFERENCE_<transcript id>_<number of exons>_<exon_1 start>_<exon_1 stop>_..._<exon_n start>_<exon_n stop>
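
The layout above can be split into fields with a few lines of Python. This is a minimal sketch, not part of the original pipeline: the function name parse_background_header and the example header are hypothetical, and the code assumes that no individual field (gene id, chromosome, transcript id) itself contains an underscore.

```python
def parse_background_header(header):
    """Split a BACKGROUND header into named fields.

    Assumption: no single field contains '_' (this would break, for
    example, on chromosome names such as unplaced scaffolds).
    """
    fields = header.strip().lstrip(">").split("_")
    # Positions: 0 individual, 1 REFERENCE, 2 gene id, 3 chromosome, 4 strand,
    # 5 nan, 6 BACKGROUND, 7 REFERENCE, 8 transcript id, 9 number of exons
    if len(fields) < 10 or fields[6] != "BACKGROUND":
        raise ValueError(f"not a BACKGROUND header: {header!r}")
    n_exons = int(fields[9])
    coords = [int(x) for x in fields[10:]]
    if len(coords) != 2 * n_exons:
        raise ValueError(f"expected {2 * n_exons} coordinates, got {len(coords)}")
    return {
        "individual": fields[0],
        "gene_id": fields[2],
        "chromosome": fields[3],
        "strand": fields[4],
        "transcript_id": fields[8],
        # Pair consecutive values as (start, stop) for each exon
        "exons": list(zip(coords[0::2], coords[1::2])),
    }


# Made-up header, for illustration only
example = "1_REFERENCE_GENE1_chr1_+_nan_BACKGROUND_REFERENCE_TX1_2_100_200_300_400"
record = parse_background_header(example)
```

Checking the number of coordinates against the declared exon count gives an early warning if the no-underscore assumption does not hold for a given header.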


CROSSJUNCTION
Crossjunction sequences correspond to the intron-spanning polypeptides described in the manuscript, i.e., peptides obtained by translating the pair of exons joined across an intron in a given reading frame.
The header format is as follows:

<individual number>_REFERENCE_<gene id>_<chromosome>_<strand>_nan_CROSSJUNCTION_REFERENCE_<1/number of reading frames spanning the junction>_<upstream exon start>_<upstream exon stop>_<downstream exon start>_<downstream exon stop>
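
A CROSSJUNCTION header can be parsed in the same way. The sketch below is illustrative only: the function name parse_crossjunction_header and the example header are hypothetical, the code assumes that no field contains an underscore, and the reading-frame field is returned as an uninterpreted string because its exact encoding is not specified here.

```python
def parse_crossjunction_header(header):
    """Split a CROSSJUNCTION header into named fields.

    Assumption: no single field contains '_', so the header always
    splits into exactly 13 underscore-separated parts.
    """
    fields = header.strip().lstrip(">").split("_")
    # Positions: 0 individual, 1 REFERENCE, 2 gene id, 3 chromosome, 4 strand,
    # 5 nan, 6 CROSSJUNCTION, 7 REFERENCE, 8 reading-frame field, 9-12 coordinates
    if len(fields) != 13 or fields[6] != "CROSSJUNCTION":
        raise ValueError(f"not a CROSSJUNCTION header: {header!r}")
    up_start, up_stop, down_start, down_stop = (int(x) for x in fields[9:13])
    return {
        "individual": fields[0],
        "gene_id": fields[2],
        "chromosome": fields[3],
        "strand": fields[4],
        "frame_field": fields[8],  # kept as raw text, not interpreted
        "upstream_exon": (up_start, up_stop),
        "downstream_exon": (down_start, down_stop),
    }


# Made-up header, for illustration only
example = "1_REFERENCE_GENE1_chr1_-_nan_CROSSJUNCTION_REFERENCE_1_100_200_300_400"
junction = parse_crossjunction_header(example)
```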

NOTE
Polypeptide sequences that consist entirely of X characters should be ignored.
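
One way to apply this filter while reading the file is sketched below. The helper names (read_fasta, is_all_x, informative_records) are hypothetical and use only the Python standard library; the sketch treats only the uppercase letter X as the placeholder and leaves empty sequences untouched, both of which are assumptions.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def is_all_x(sequence):
    """Return True if the sequence is non-empty and made of 'X' only."""
    return len(sequence) > 0 and set(sequence) == {"X"}


def informative_records(lines):
    """Yield only the records whose sequence is not all X."""
    for header, sequence in read_fasta(lines):
        if not is_all_x(sequence):
            yield header, sequence
```

For example, `with open("REFERENCE.fa") as handle: records = list(informative_records(handle))` would collect every record except the all-X ones.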
