Publication

Please cite the publication for this reference sequence:

Martínez‐García PJ, Crepeau MW, Puiu D, Gonzalez‐Ibeas D, Whalen J, Stevens KA, Paul R, Butterfield TS, Britton MT, Reagan RL, Chakraborty S. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of nonstructural polyphenols. The Plant Journal. 2016 May 1.

Browse the genome and annotation on JBrowse.

Download

Raw data can be found in NCBI under bioproject PRJNA291087.

These downloads are linked directly to the data hosted at TreeGenes.

Genomic scaffolds

The genome includes 186,636 scaffolds (> 100 bp), with a total length of 713 Mbp.

Genomic scaffolds
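
The scaffold count and total length quoted above can be recomputed from the downloaded FASTA file. The following is a minimal sketch, not the pipeline used by the authors; the helper names `read_fasta` and `scaffold_summary` are hypothetical, and it assumes the download is a standard (uncompressed) FASTA file.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def scaffold_summary(lines: Iterable[str], min_length: int = 100) -> Tuple[int, int]:
    """Count scaffolds longer than min_length bp and sum their lengths."""
    count = total = 0
    for _, seq in read_fasta(lines):
        if len(seq) > min_length:
            count += 1
            total += len(seq)
    return count, total


# Tiny in-memory example: one 120 bp scaffold is kept, one 4 bp scaffold is dropped.
example = [">s1", "ACGT" * 30, ">s2", "ACGT"]
print(scaffold_summary(example))  # (1, 120)
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.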

Gene models

32,496 gene models were developed from the genomic scaffolds.

Genomic coordinates
CDS nucleotide sequences
CDS protein sequences

These models were further classified into three categories:

High-quality, full-length genes (16,852 sequences):
Genes that are multi-exonic or mono-exonic, complete (have both a start and a stop codon), supported by expression data (walnut transcriptome) or protein evidence, and annotated with at least one protein domain.

Genomic coordinates
CDS nucleotide sequences
CDS protein sequences

High-quality, partial genes (8,782 sequences):
Genes that are multi-exonic, partial (lack a start codon, a stop codon, or both), supported by expression data (walnut transcriptome) or protein evidence, and annotated with at least one protein domain.

Genomic coordinates
CDS nucleotide sequences
CDS protein sequences

Low-quality genes (6,862 sequences):
Genes that are multi-exonic or mono-exonic, complete or partial, and either not supported by expression data (walnut transcriptome) or protein evidence, or not annotated with at least one protein domain.

Genomic coordinates
CDS nucleotide sequences
CDS protein sequences
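
The three categories above amount to a small decision rule. The sketch below illustrates that rule under stated assumptions: the `GeneModel` fields and the `classify` function are hypothetical names, not columns or tools from the TreeGenes files, and the `"unclassified"` branch covers a case (mono-exonic, partial, supported, with a domain) that the descriptions above do not explicitly assign.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    # Field names are illustrative assumptions, not TreeGenes column names.
    exon_count: int
    has_start_codon: bool
    has_stop_codon: bool
    has_evidence: bool  # expression data or protein evidence
    has_domain: bool    # at least one annotated protein domain


def classify(model: GeneModel) -> str:
    """Assign a gene model to one of the categories described above."""
    complete = model.has_start_codon and model.has_stop_codon
    # Missing evidence or missing domain annotation means low quality.
    if not (model.has_evidence and model.has_domain):
        return "low_quality"
    if complete:
        return "high_quality_full_length"
    if model.exon_count > 1:
        return "high_quality_partial"
    # Not covered by the category descriptions (assumption, see lead-in).
    return "unclassified"


# The published category sizes add up to the total number of gene models.
category_sizes = {
    "high_quality_full_length": 16_852,
    "high_quality_partial": 8_782,
    "low_quality": 6_862,
}
total_models = sum(category_sizes.values())
print(total_models)  # 32496
```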

Gene Models - Functional Annotation

Functional annotation includes InterPro accessions and Gene Ontology terms. 30,843 gene models have annotations.

Annotations
Log file and annotation statistics

Transcriptome Assembly

A combined de novo assembly of 19 tissues (Illumina GA IIx, 75 bp paired-end reads) yielded 29,785 unigenes.

CDS sequences
Peptide sequences

Transcriptome Assembly - Functional Annotation

Functional annotation includes BLAST results, InterPro accessions and Gene Ontology terms. 25,373 unigenes have annotations.

Annotations
Log file and annotation statistics
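
As a quick worked calculation, the annotated counts on this page can be expressed as a share of their totals (30,843 of 32,496 gene models; 25,373 of 29,785 unigenes). The `coverage` helper is a hypothetical name used only for this sketch.

```python
def coverage(annotated: int, total: int) -> float:
    """Return the annotated share of a set as a percentage, to one decimal."""
    return round(100 * annotated / total, 1)


# Counts are taken from the functional annotation sections of this page.
gene_model_coverage = coverage(30_843, 32_496)
unigene_coverage = coverage(25_373, 29_785)

print(f"Gene models with annotations: {gene_model_coverage}%")
print(f"Unigenes with annotations: {unigene_coverage}%")
```

This gives roughly 95% of gene models and 85% of unigenes with at least one annotation.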

miRNA loci

205 high-quality miRNAs were identified.

Nucleotide sequence precursor
Genomic coordinates

64 low-quality miRNAs were identified.

Nucleotide sequence precursor
Genomic coordinates

Repeats

Repeat elements were identified using de novo and similarity-based techniques. RepeatModeler identified 811 de novo elements, and a similarity search against Repbase identified 2,009 repeat elements.

Repeat sequences
Genomic coordinates

Repeat Statistics:
Percentage of genome covered by repeat elements: 51.33% (50.38% interspersed repeats + 0.95% tandem repeats)

Retrotransposons
    Gypsy 8.40%
    Copia 6.57%
    Caulimovirus 0.52%
    Other LTR retrotransposons 7.09%
    L1 7.60%
    R1 0.01%
    Non-LTR SINE 0.29%
    Non-LTR LINE 2.51%
    Other non-LTR 0.06%
    Penelope 0.01%
    Other retrotransposons 1.75%
DNA transposons
    hAT 2.08%
    EnSpm 2.23%
    MuDR 0.63%
    Helitron 0.68%
    Harbinger 0.71%
    Other DNA 8.02%
rRNA 0.58%
Unknown 0.6%
Other 0.01%
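
The per-class percentages can be added up to see how they relate to the headline figures. The sketch below is only an arithmetic check on the values listed above; treating rRNA, Unknown and Other as a separate group (rather than as DNA transposons) is an assumption about the original table layout. `Decimal` is used so that the sums are exact.

```python
from decimal import Decimal

# Percentages of the genome, copied from the list above.
retrotransposons = ["8.40", "6.57", "0.52", "7.09", "7.60", "0.01",
                    "0.29", "2.51", "0.06", "0.01", "1.75"]
dna_transposons = ["2.08", "2.23", "0.63", "0.68", "0.71", "8.02"]
# Assumed to be top-level classes: rRNA, Unknown, Other.
other_classes = ["0.58", "0.6", "0.01"]


def subtotal(values):
    """Sum percentage strings exactly using Decimal arithmetic."""
    return sum(Decimal(v) for v in values)


retro_total = subtotal(retrotransposons)
dna_total = subtotal(dna_transposons)
other_total = subtotal(other_classes)
class_total = retro_total + dna_total + other_total

print(retro_total, dna_total, other_total, class_total)
# Headline figure: interspersed plus tandem repeats.
print(Decimal("50.38") + Decimal("0.95"))
```

The listed classes sum to 50.35%, slightly below the reported 50.38% for interspersed repeats; the small gap is presumably due to rounding of the individual entries. The reported 50.38% and 0.95% do add up to the stated 51.33%.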

Complete Genome Annotation

This combined annotation file includes gene models, repeats and miRNAs.

Genomic coordinates