Overview
Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, differs markedly from the common silk produced by the domesticated silkworm, Bombyx mori. Silk moths can be categorized into two families: Bombycidae and Saturniidae. Saturniidae has been estimated to contain approximately 1,861 species in 162 genera and is one of the largest families in Lepidoptera. Among the many species in the family Saturniidae, only a few, including A. yamamai, can be used for silk production. For whole-genome sequencing, we selected one male sample (Ay-7-male1) from a breeding line (Ay-7) of A. yamamai raised at the National Academy of Agricultural Science, Rural Development Administration, Korea. A total of 147 Gb of genomic data and 76 Gb of transcriptomic data were generated for this study. We present the genome sequence of A. yamamai, the first published genome in the family Saturniidae, together with gene expression data collected from ten different organ tissues.
Statistics
Genome
A total of 147 Gb of sequence data was generated using the Illumina and PacBio sequencing platforms, approximately 210-fold coverage of the estimated 700 Mb genome of A. yamamai. The assembled genome of A. yamamai was 656 Mb (scaffolds >2 Kb) and comprised 3,675 scaffolds.
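The fold-coverage figure is the total number of sequenced bases divided by the estimated genome size. A minimal Python sketch of that arithmetic, using the 147 Gb and 700 Mb values stated above; the function name `sequencing_depth` is illustrative and not from the source.

```python
def sequencing_depth(total_bases: float, genome_size: float) -> float:
    """Average fold coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# 147 Gb of reads over an estimated 700 Mb genome
depth = sequencing_depth(147e9, 700e6)
print(f"{depth:.0f}x")  # prints "210x"
```

Using the 709 Mb k-mer estimate instead gives about 207-fold, so the reported 210-fold corresponds to the rounded 700 Mb figure.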
The scaffold N50 length of the assembly was 739 Kb, and the GC content was 34.07%.
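N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of how N50 and GC content can be computed from scaffold lengths and sequences; the helper names and toy inputs are hypothetical. Whether the reported 34.07% excludes gap (N) bases is not stated in the source; this sketch assumes N and other ambiguity codes are left out of the denominator.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that sequences >= L cover at least half the total."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty input


def gc_percent(sequence: str) -> float:
    """GC content as a percentage of A/C/G/T bases, ignoring N and other codes."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Toy values, not the real A. yamamai data
print(n50([100, 80, 50, 40, 30]))  # 80 (100 + 80 = 180 >= 300 / 2)
print(gc_percent("ATGCNNGGTA"))    # 50.0
```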
Identified repeat elements covered 37.33% of the genome, and BUSCO v2 analysis estimated the completeness of the assembly at 96.7%.
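Repeat coverage is usually reported as the fraction of genome bases overlapped by at least one annotated repeat, so overlapping annotations must be merged before summing to avoid double counting. A minimal sketch, assuming repeats on a single sequence are given as 0-based half-open (start, end) intervals; the helper names and toy intervals are hypothetical, and the source does not say which repeat-annotation tools produced the 37.33% figure. For a multi-scaffold assembly, the covered bases would be summed per scaffold.

```python
from typing import List, Tuple


def covered_bases(intervals: List[Tuple[int, int]]) -> int:
    """Count bases covered by at least one half-open [start, end) interval."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(intervals: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the sequence covered by merged repeat intervals."""
    return covered_bases(intervals) / genome_length


repeats = [(0, 100), (50, 150), (300, 400)]  # toy annotations
print(repeat_fraction(repeats, 1000))  # 0.25 (merged blocks: 150 + 100 bases)
```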
Loci
A total of 76 Gb of transcriptomic data was generated for this study.
A total of 21,124 genes were identified with EVidenceModeler by combining gene predictions from three different methods (ab initio, RNA-seq-based, and known-gene-based).
Assembly
Before genome assembly, we performed a k-mer distribution analysis on reads from a 350 bp paired-end library to estimate the size and characteristics of the A. yamamai genome.
Figure: The 19-mer distribution of the A. yamamai genome, computed from the 350 bp paired-end library.
The 19-mer distribution analysis estimated the genome size of A. yamamai at 709 Mb. Next, we corrected errors in the Illumina paired-end libraries using the error-correction module of ALLPATHS-LG (ALLPATHS-LG, RRID:SCR_010742) before the initial contig assembly. After error correction, initial contig assembly with the 350 bp and 700 bp libraries was performed using SOAPdenovo2 with K=19 (SOAPdenovo2, RRID:SCR_014986); this configuration gave the best assembly statistics among the assemblers and parameters tested.
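A genome-size estimate from a k-mer distribution is commonly obtained by dividing the total number of k-mers by the depth of the main (homozygous) peak. A minimal sketch of that estimator, assuming the input is a histogram mapping depth to the number of distinct k-mers observed at that depth; the source does not name the k-mer counting tool or the exact estimator used, and the low-depth error cutoff (`min_depth`) and toy histogram are assumptions.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth d -> number of distinct k-mers seen exactly d times.
    Depths below min_depth (assumed to be mostly sequencing errors) are ignored.
    """
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Total k-mer occurrences remaining after error removal
    total_kmers = sum(d * n for d, n in kept.items())
    # Peak depth: the depth holding the most distinct k-mers
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth


toy_hist = {1: 5000, 2: 800, 19: 200, 20: 1000, 21: 200, 40: 50}
print(estimate_genome_size(toy_hist))  # 1500.0 (30000 k-mers / peak depth 20)
```

In a heterozygous genome the distribution can also show a peak at half the main depth; dividing by that peak instead would roughly double the estimate, so the chosen peak should be checked against a plot of the distribution.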
At each scaffolding step, SOAP GapCloser[21] was run repeatedly with the -l 155 and -p 31 parameters to close gaps within each scaffold.
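SOAPdenovo2 scaffolds represent unresolved gaps as runs of N, which GapCloser then tries to fill using the paired-end reads. A small sketch for checking the effect of each gap-closing round by counting N-runs and total gap length; it does not implement GapCloser itself, and the helper names and toy sequences are hypothetical.

```python
import re
from typing import List, Tuple

GAP_RE = re.compile(r"[Nn]+")


def find_gaps(scaffold: str) -> List[Tuple[int, int]]:
    """Return 0-based half-open (start, end) coordinates of each run of N."""
    return [(m.start(), m.end()) for m in GAP_RE.finditer(scaffold)]


def gap_summary(scaffold: str) -> Tuple[int, int]:
    """Return (number of gaps, total gap length in bp)."""
    gaps = find_gaps(scaffold)
    return len(gaps), sum(end - start for start, end in gaps)


before = "ACGTNNNNNACGTTTGNNACG"
after = "ACGTACGTAACGTTTGNNACG"  # hypothetical result after one gap was closed
print(gap_summary(before))  # (2, 7)
print(gap_summary(after))   # (1, 2)
```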
After scaffolding with SSPACE-LongRead using Illumina synthetic long-read data, the number of assembled scaffolds fell from 398,446 to 24,558, and the average scaffold length increased from 1.7 Kb to 24.8 Kb. However, the scaffold N50 improved only modestly, from approximately 91 Kb to 112 Kb.
After a final round of scaffolding with PacBio long reads, the number of scaffolds was reduced to 3,675 and the N50 length increased from 112 Kb to 739 Kb.
Gene Prediction
Three different approaches were used for gene prediction in the A. yamamai genome: ab initio, RNA-seq transcript-based, and protein homology-based.
For RNA-seq transcript-based prediction, transcriptome data generated from ten organ tissues of A. yamamai were aligned to the assembled genome, and gene structures were predicted using Cufflinks[44] (Cufflinks, RRID:SCR_014597). The longest CDS sequences were then identified from the Cufflinks results using TransDecoder. For the homology-based approach, all known genes of the order Lepidoptera in the NCBI database were aligned using PASA. The final gene set of the A. yamamai genome contains 21,124 genes.
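TransDecoder's first stage extracts long open reading frames (ORFs) from assembled transcripts. The sketch below shows only the core idea on the forward strand: scan each of the three reading frames and keep the longest complete ATG-to-stop stretch. It is not TransDecoder; the real tool also searches the reverse strand, reports partial ORFs, and scores candidates by coding potential. The function name and toy transcript are hypothetical.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest complete ATG..stop ORF, forward strand.

    Coordinates are 0-based and half-open; end includes the stop codon.
    Returns None if no complete ORF exists.
    """
    seq = transcript.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # first ATG since the previous stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


tx = "CCATGAAATTTGGGTAACCATGCCCTGA"
print(longest_orf(tx))  # (2, 17), i.e. ATGAAATTTGGGTAA
```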
The average gene length was 8,331 bp with a GC content of 38.76%, and genes contained an average of 4.44 exons. To infer the functions of the predicted genes, sequence similarity searches were performed with blastp against Swiss-Prot, UniRef100, the NCBI NR database, and the gene sets of B. mori and D. melanogaster.
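Summary figures such as average gene length and exons per gene are typically computed from the gene annotation file. A minimal sketch, assuming the annotation is GFF3 text with a gene, mRNA, exon hierarchy linked through ID and Parent attributes and one mRNA per gene; the source does not state the annotation format, and the helper names and toy records are hypothetical. Gene length is taken as the full genomic span (end - start + 1, since GFF3 coordinates are 1-based and inclusive).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if item)


def gene_stats(gff3_text: str) -> Tuple[float, float]:
    """Return (mean gene length in bp, mean exons per gene) from GFF3 text."""
    gene_lengths: Dict[str, int] = {}
    transcript_to_gene: Dict[str, str] = {}
    exon_parents: List[str] = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        feature, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if feature == "gene":
            gene_lengths[attrs["ID"]] = end - start + 1  # 1-based, inclusive
        elif feature == "mRNA":
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]
        elif feature == "exon":
            exon_parents.append(attrs["Parent"])
    if not gene_lengths:
        raise ValueError("no gene features found")
    # Resolve each exon to its gene through its parent transcript
    exons_per_gene: Dict[str, int] = defaultdict(int)
    for parent in exon_parents:
        exons_per_gene[transcript_to_gene.get(parent, parent)] += 1
    n_genes = len(gene_lengths)
    mean_length = sum(gene_lengths.values()) / n_genes
    mean_exons = sum(exons_per_gene.get(g, 0) for g in gene_lengths) / n_genes
    return mean_length, mean_exons


toy_gff = "\n".join([
    "##gff-version 3",
    "scf1\ttoy\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "scf1\ttoy\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "scf1\ttoy\texon\t1\t200\t.\t+\t.\tID=e1;Parent=t1",
    "scf1\ttoy\texon\t801\t1000\t.\t+\t.\tID=e2;Parent=t1",
    "scf1\ttoy\tgene\t2001\t2500\t.\t-\t.\tID=g2",
    "scf1\ttoy\tmRNA\t2001\t2500\t.\t-\t.\tID=t2;Parent=g2",
    "scf1\ttoy\texon\t2001\t2500\t.\t-\t.\tID=e3;Parent=t2",
])
print(gene_stats(toy_gff))  # (750.0, 1.5)
```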
Contacts
Seong-Ryul Kim (email: ksr319@korea.kr)
Seong-Wan Kim (email: tarupa@korea.kr)
Reference Publication
Kim SR, Kwak W, Kim H, Caetano-Anolles K, Kim KY, Kim SB, Choi KH, Kim SW, Hwang JS, Kim M, Kim I, Goo TW, Park SW. Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae. GigaScience. 2018 Jan 1;7(1):1-11. doi: 10.1093/gigascience/gix113.