NABIC

Introduction info

Overview

The Perilla citriodora genome project was initiated by the National Agricultural Genome Project (NAGP) of the Rural Development Administration (RDA). Genomic sequence of diploid perilla (Perilla citriodora) was produced at scale and assembled using Illumina and PacBio data, and the quality of the assembly was confirmed with BAC sequences and Illumina short reads. The assembly was then re-checked and analyzed by Hi-C grouping, and the diploid perilla genome size was estimated by k-mer analysis. For de novo assembly, PacBio contigs were scaffolded with Illumina mate-pair reads. The contigs were then separated and corrected by Hi-C analysis to determine the pseudomolecules of diploid perilla. Final gene prediction was performed at the pseudomolecule level, and the assembly was evaluated by BLAST2GO and BUSCO analysis.

Statistics

Genome

Approximately 677.6 Mb arranged in 1,622 scaffolds

Approximately 676.0 Mb arranged in 2,802 contigs

Scaffold N50 (L50) = 12.3 Mbp

Contig N50 (L50) = 1,672.9 Kbp

78 scaffolds larger than 1 Mbp, covering 95.6% of the genome (out of 1,622 scaffolds in total)

10 assembled pseudomolecules = 638.7 Mbp
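The N50 values above are lengths, while L50 is conventionally a count of sequences. As a minimal sketch of how both are usually derived from a list of scaffold or contig lengths (the function name `n50_l50` and the toy lengths are illustrative assumptions, not data from this assembly):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the
    assembly size. L50 is how many sequences that takes.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy lengths for illustration only (not from the perilla assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)  # 70 2
```

With the real scaffold lengths as input, the same function would be expected to reproduce the reported scaffold N50 of about 12.3 Mbp.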

Protein-coding genes

43,175 protein-coding genes, 43,664 transcripts, and 36,015 CDS have been predicted

 

Sequencing, Assembly, and Annotation

NGS whole genome sequencing methodology

To take advantage of the strengths of each NGS platform, high-quality data were produced on three types of sequencers (HiSeq 2500, MiSeq, HiSeq 2000), and insert lengths were varied to produce a total of 36 libraries. A total of 1,198.9 Gb was produced from these libraries, and the genome size of diploid perilla (P. citriodora) was estimated by k-mer analysis to be approximately 650 Mbp. First, MiSeq reads were merged using FLASH, contig assembly was performed with the MiSeq reads, and scaffolding was performed with HiSeq mate-pair reads.
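The page does not state which tool or k-mer size was used for the genome size estimate. As a rough sketch of one common approach, genome size can be approximated as the total number of k-mers divided by the depth at the main histogram peak. The function name, the `min_depth` cutoff, and the toy histogram below are all assumptions for illustration:

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers seen at
    that depth. Depths below min_depth are treated as sequencing
    errors and ignored (an assumed, adjustable cutoff).
    """
    usable = {d: n for d, n in kmer_histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates sequencing depth.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by depth approximates genome size.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram for illustration only (not perilla data).
toy_histogram = {1: 1000, 2: 300, 18: 50, 19: 120, 20: 200,
                 21: 120, 22: 50, 40: 10}
print(estimate_genome_size(toy_histogram))  # 560.0
```

Real analyses typically also model heterozygosity and repeat peaks, which this sketch ignores.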

PacBio long-read sequencing was performed to overcome the limitations of Illumina short-read sequencing and to improve scaffold quality. After sequence trimming, the average read quality value was 0.83, the mean read length was 11,855 bp, the total read bases were 49,142,181,846 bp, the number of reads was 4,145,022, and the read N50 was 16,794 bp. For P. citriodora, the total subread bases were 48,959,646,854 bp, the number of subreads was 7,318,606, and the average subread length was 6,689 bp.
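The PacBio figures above can be cross-checked with simple arithmetic. The sketch below recomputes the mean read and subread lengths from the reported totals and, assuming the k-mer estimate of about 650 Mbp as the genome size, derives an approximate fold coverage. The coverage figure is a derived illustration, not a value stated on this page:

```python
# Values reported on this page.
READ_BASES = 49_142_181_846
READ_COUNT = 4_145_022
SUBREAD_BASES = 48_959_646_854
SUBREAD_COUNT = 7_318_606

# Assumption: genome size taken from the k-mer estimate (~650 Mbp).
ESTIMATED_GENOME_BP = 650_000_000


def mean_length(total_bases, count):
    """Mean sequence length in bp."""
    return total_bases / count


def fold_coverage(total_bases, genome_bp):
    """Sequenced bases per genome base."""
    return total_bases / genome_bp


mean_read = mean_length(READ_BASES, READ_COUNT)
mean_subread = mean_length(SUBREAD_BASES, SUBREAD_COUNT)
coverage = fold_coverage(READ_BASES, ESTIMATED_GENOME_BP)

print(int(mean_read))      # 11855
print(int(mean_subread))   # 6689
print(round(coverage, 1))  # 75.6
```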

Hi-C library and raw sequence data

The Hi-C library was prepared from diploid perilla leaf tissue, and a Bioanalyzer was used for library QC. The completed library was about 200-400 bp in size, with a median of about 300 bp. The library was sequenced twice with 80 bp paired-end reads on a HiSeq 2500, producing about 112.7 million read pairs (225.3 million reads). In total, 18 Gb of raw data were generated and used. Raw data QC covered per-base sequence quality, per-sequence quality, per-sequence GC content, and sequence length distribution, using the FastQC program.

Clustering

The first draft genome of diploid perilla was produced by scaffolding the PacBio assembly with Illumina mate-pair data, and consists of 1,622 scaffolds and 2,802 contigs. Hi-C raw data were mapped to this draft assembly, and contig clustering was performed using the LACHESIS program.

Of the 2,802 contigs, 2,098 were clustered into 10 Hi-C groups with a total length of 666.6 Mb. This corresponds to 98.6% of the total length of all 2,802 contigs (676,030,824 bp) and 74.9% of the contig count. The remaining 704 contigs, about 25% of the total, could not be clustered because of their short length; their average length was about 14 kb.
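The clustering percentages follow directly from the reported counts and lengths. A small sketch of that arithmetic, assuming the clustered length is exactly the rounded 666.6 Mb given above (the true base count is not stated on this page):

```python
# Values reported on this page.
TOTAL_CONTIGS = 2802
CLUSTERED_CONTIGS = 2098
TOTAL_BP = 676_030_824

# Assumption: rounded figure from the text, exact value not given.
CLUSTERED_BP = 666_600_000


def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal."""
    return round(100 * part / whole, 1)


unclustered_contigs = TOTAL_CONTIGS - CLUSTERED_CONTIGS

print(percent(CLUSTERED_CONTIGS, TOTAL_CONTIGS))    # 74.9
print(percent(CLUSTERED_BP, TOTAL_BP))              # 98.6
print(unclustered_contigs)                          # 704
print(percent(unclustered_contigs, TOTAL_CONTIGS))  # 25.1
```

The contrast between the two percentages shows that the unclustered contigs are numerous but short, so they contribute little to the total assembly length.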

Ordering/orientation

The clustered contigs were then ordered and oriented to produce a pseudomolecule for each cluster. In the end, 2,065 contigs were ordered and oriented into 10 chromosome-scale scaffolds. These account for 98.4% of the clustered contigs by number and ~100% by length (73.4% by number and 98.6% by length relative to all input contigs). Another 33 contigs were clustered but could not be ordered or oriented; together with the 704 unclustered contigs, 737 contigs were excluded from the final Hi-C assembly. Their total length is 10 Mb, less than 1.5% of the total assembly. To validate the pseudomolecule sequences, 10 full BAC insert sequences were compared against them and found to be more than 99.8% identical, and about 90% of the pseudomolecule QC was confirmed by insert-size prediction using paired BAC-end sequences.

Gene prediction

Protein-coding genes were predicted with Seqping, using major transcript sequences, perilla de novo repeat sequences generated by RepeatModeler, NCBI plant RefSeq protein sequences, and the Gypsy Database (GyDB). As a result, 43,175 genes, 43,664 transcripts, and 36,015 CDS were predicted.

Pseudomolecule construction

Ten pseudomolecules of diploid perilla were constructed using Hi-C analysis and a genetic map.

 

Contacts

Tae-Ho Kim (Email: thkim@rda.go.kr)

Myoung-Hee Lee (Email: emhee@korea.kr)

Jeong-Hee Lee (Email: jhlee@seeders.co.kr)

Hong-Il Ahn (Email: ahi0101@korea.kr)

Sun-Hwa Bae (Email: bae209@korea.kr)

 

 

Reference Publication(s)

Yun-Joo Kang, Bo-Mi Lee, Moon Nam, Ki-Won Oh, Myoung-Hee Lee, Tae-Ho Kim, Sung-Hwan Jo & Jeong-Hee Lee, Identification of quantitative trait loci associated with flowering time in perilla using genotyping by sequencing, Molecular Biology Reports, 2019; 46:4397-4407

Myoung Hee Lee, Ki Won Oh, Myung Sik Kim, Sung Up Kim, Jung In Kim, Eun Young Oh, Suk Bok Pae, Un Sang Yeo, Tae-Ho Kim, Jeong Hee Lee, Chan Sik Jung, Do Yeon Kwak, and Yong Chul Kim, Detection of QTLs in an Interspecific Cross between Perilla citriodora × P. hirtella Mapping Population, Korean J. Breed. Sci., 2018; 50(1):13-20

Kyeong-Seong Cheon, In-Seon Jeong, Kyung-Hee Kim, Myoung-Hee Lee, Tae-Ho Lee, Jeong-Hee Lee, Ung-Han Yoon, Romika Chandra, Ye-Ji Lee, Tae-Ho Kim, Comparative SNP Analysis of Chloroplast Genomes and 45S nrDNAs Reveals Genetic Diversity of Perilla Species, Plant Breed. Biotech., 2018; 6(2):125-139

Ji-Su Mo, Kyunghee Kim, Myoung Hee Lee, Jeong-Hee Lee, Ung-Han Yoon & Tae-Ho Kim, The complete chloroplast genome sequence of Perilla citriodora (Makino) Nakai, Mitochondrial DNA Part A, 2017; 28(1):131–132

Ji-Eun Kim, Junkyoung Choe, Woo Kyung Lee, Sangmi Kim, Myoung Hee Lee, Tae-Ho Kim, Sung-Hwan Jo, Jeong Hee Lee, De novo gene set assembly of the transcriptome of diploid, oilseed-crop species Perilla citriodora, J Plant Biotechnol, 2016; 43:293–301

