Overview
We sequenced a reference strain of H. marmoreus (Haemi 51987–8). Of the assembly strategies we evaluated, the combination of Allpaths and PBJelly produced the best assembly. The resulting genome was 42.7 Mbp in length and was annotated with 16,627 gene models. We identified a putative gene (Hypma_04324) encoding hypsin, an antifungal and antiproliferative protein; the predicted protein shares 75% sequence identity with the previously reported N-terminal sequence. Carbohydrate-active enzyme (CAZyme) analysis showed the typical profile of white-rot fungi, in which auxiliary activity (AA) enzymes and carbohydrate-binding modules (CBMs) are enriched. Genome annotation revealed four terpene synthase genes responsible for terpenoid biosynthesis. Gene tree analysis showed that terpene synthase genes can be classified into six clades. The four H. marmoreus terpene synthase genes fell into four different clades, which implies that they may be involved in synthesizing terpenes with different structures. A terpene synthase gene cluster containing known biosynthesis and regulatory genes was well conserved across Agaricomycetes genomes.
Statistics
Genome
Approximately 42.71 Mb arranged in 235 scaffolds
Approximately 42.68 Mb arranged in 278 contigs (~0.06% gap)
Scaffold N50 = 764.8 kbp (L50 = 17 scaffolds)
Contig N50 = 621.3 kbp (L50 = 20 contigs)
83 scaffolds larger than 50 kbp, together containing 96.68% of the genome
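N50 and L50 summarize assembly contiguity: sort the sequences from longest to shortest and add up their lengths until at least half of the assembly is covered. Under the common convention, the length of the sequence that crosses the halfway mark is the N50 and the number of sequences needed is the L50 (the two names are sometimes used the other way around). The Python sketch below computes both, plus the "larger than 50 kbp" summary, from a list of sequence lengths. The function names and toy lengths are hypothetical illustrations, not the H. marmoreus scaffolds.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths in bp.

    N50: length of the sequence at which the running total, taken from the
    longest sequence down, first reaches half of the total assembly length.
    L50: number of sequences needed to reach that point.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


def large_sequence_summary(lengths, min_length=50_000):
    """Return (number of sequences longer than min_length, fraction of bp they hold)."""
    large = [x for x in lengths if x > min_length]
    return len(large), sum(large) / sum(lengths)


# Hypothetical toy scaffold lengths (bp), not real assembly data.
toy_scaffolds = [900_000, 800_000, 500_000, 300_000, 40_000, 20_000]
print(n50_l50(toy_scaffolds))                 # (800000, 2)
print(large_sequence_summary(toy_scaffolds))  # (4, 0.9765625)
```

Sorting dominates the cost, so the calculation runs in O(n log n) for n sequences.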
Assembly
We selected the Allpaths+PBJelly assembly for further analyses. The final assembly was 42,710,661 bp long, comprising 235 scaffolds (278 contigs) at 287.3× sequence coverage, with a GC content of 49.64%. From the k-mer frequency distribution of the Illumina paired-end reads, we estimated the genome size at 43.0 Mbp.
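The text does not state which k-mer counter, k value, or formula produced the 43.0 Mbp estimate. A common approach, assumed here, is to build a histogram of k-mer depths, skip the low-depth error tail up to the first valley, and divide the total number of k-mer occurrences by the depth of the main peak. The Python sketch below implements that rule; `estimate_genome_size` and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers observed at that depth.
    Assumed method: ignore the error tail up to the first valley, then
    genome size ~= (total k-mer occurrences) / (depth of the main peak).
    Returns (peak_depth, estimated_size).
    """
    depths = sorted(histogram)
    # Walk down the error tail until counts start rising again: that is the valley.
    for prev, cur in zip(depths, depths[1:]):
        if histogram[cur] > histogram[prev]:
            valley = prev
            break
    else:
        raise ValueError("no coverage peak found after the error tail")

    # Keep only depths beyond the valley (assumed to be real genomic k-mers).
    kept = {d: n for d, n in histogram.items() if d > valley}
    peak_depth = max(kept, key=kept.get)
    total_occurrences = sum(d * n for d, n in kept.items())
    return peak_depth, total_occurrences / peak_depth


# Hypothetical toy histogram (depth: distinct k-mers), not real read data.
toy_hist = {1: 5000, 2: 800, 3: 100, 4: 200, 5: 600, 6: 1000, 7: 600, 8: 200, 9: 50}
print(estimate_genome_size(toy_hist))  # (6, 2675.0)
```

With real reads, repeats and the choice of k both shift the estimate, which is one reason a k-mer estimate can differ slightly from the assembled length.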
NCBI GenBank Records
Release Date: 25/7/2018
BioProject: PRJNA312409
Accession ID: LUEZ00000000
Is it complete?
Genome completeness was assessed at the gene level using BUSCO v3.0. Only 5 of the 1,335 single-copy orthologs searched were missing, indicating >99% genome completeness. In addition, 97.27% of the RNA-seq reads aligned to the genome.
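The >99% figure follows from simple arithmetic on the BUSCO counts: (1335 - 5) / 1335. BUSCO's own "C" (complete) percentage also excludes fragmented BUSCOs, whose count is not given in this summary, so the share of groups that are not missing is an upper bound on C. A minimal Python sketch; the helper names are hypothetical.

```python
def percent_present(n_total, n_missing):
    """Percentage of BUSCO groups not reported as missing (complete + fragmented)."""
    return 100 * (n_total - n_missing) / n_total


def percent_complete(n_total, n_missing, n_fragmented):
    """BUSCO-style C%: groups that are neither missing nor fragmented.

    The fragmented count is not reported in the text, so this can only be
    evaluated with that number in hand.
    """
    return 100 * (n_total - n_missing - n_fragmented) / n_total


# Counts from the text: 5 of 1335 BUSCO groups missing.
print(f"{percent_present(1335, 5):.2f}%")  # 99.63%
```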
Is it accurate?
The transcriptome assembly aligns to the genome with >99% identity, and 97.35% of the predicted genes are complete, suggesting that the assembly contains few sequencing or assembly errors.
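Percent identity can be defined in several ways, and the text does not say which definition or aligner produced the >99% figure. The Python sketch below assumes one common definition: identical positions divided by the alignment columns in which neither sequence has a gap. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only ungapped columns; gap columns are ignored under this definition.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a.upper() == b.upper() for a, b in columns)
    return 100 * matches / len(columns)


# Hypothetical pairwise alignment: 9 ungapped columns, 8 identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 2))  # 88.89
```

Counting gap columns in the denominator instead would give a lower value for the same alignment, so the definition matters when comparing identity figures across studies.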
What about polyploidy?
The genome is haploid.
Gene prediction
Using the FunGAP pipeline (Min, 2017), we predicted 16,627 protein-coding genes with an average length of 1586.1 nt. Of these, 14,179 genes (85.3%) were supported by assembled transcripts, including 10,522 (63.3%) highly supported genes (>90% coverage). Gene prediction quality was evaluated by comparing the predictions of three programs run within the FunGAP pipeline: Augustus 3.2.1 (Stanke, 2005), Braker 1.8 (Hoff, 2016), and Maker 2.31.8 (Cantarel, 2008). Approximately half of the predicted genes were functionally annotated: 7786 genes (46.8%) with Pfam domains and 7447 genes (44.8%) with SwissProt. The dominant functional domains included WD, F-box, protein kinase, cytochrome P450, and major facilitator superfamily domains, as also observed in other mushroom genomes (Gupta, 2018; Yuan, 2017). The genome contained 1793 genes encoding secreted proteins. We identified 1262 noncoding RNA elements, including 171 tRNAs (9 of which are selenocysteine tRNAs), 191 small nucleolar RNAs (snoRNAs) from 127 families, and 224 microRNAs from 90 families.
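The text does not specify how transcript coverage was computed or what threshold counts as "supported". The Python sketch below assumes that coverage is the fraction of a gene's span overlapped by the union of its transcript alignments, that any overlap counts as supported, and that coverage above 0.9 counts as highly supported, matching the >90% cutoff above. `covered_fraction`, `support_summary`, and the toy coordinates are hypothetical.

```python
def covered_fraction(gene_start, gene_end, alignments):
    """Fraction of the half-open span [gene_start, gene_end) covered by the
    union of transcript alignment intervals (overlaps are counted once)."""
    clipped = sorted(
        (max(s, gene_start), min(e, gene_end))
        for s, e in alignments
        if e > gene_start and s < gene_end
    )
    covered = 0
    run_start = run_end = None
    for s, e in clipped:
        if run_end is None or s > run_end:
            # Gap before this interval: close the current merged run.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = s, e
        else:
            # Overlapping or touching interval: extend the current run.
            run_end = max(run_end, e)
    if run_end is not None:
        covered += run_end - run_start
    return covered / (gene_end - gene_start)


def support_summary(genes, high_cutoff=0.9):
    """genes maps gene_id -> (start, end, [(aln_start, aln_end), ...]).

    Returns (supported, highly_supported, total): supported means any
    transcript overlap, highly supported means coverage > high_cutoff.
    """
    fractions = [covered_fraction(s, e, alns) for s, e, alns in genes.values()]
    supported = sum(f > 0 for f in fractions)
    highly = sum(f > high_cutoff for f in fractions)
    return supported, highly, len(fractions)


# Hypothetical toy gene models and transcript alignments (0-based, half-open).
toy_genes = {
    "g1": (0, 1000, [(0, 600), (500, 1000)]),  # fully covered
    "g2": (2000, 3000, [(2100, 2400)]),        # 30% covered
    "g3": (5000, 6000, []),                    # no transcript evidence
}
print(support_summary(toy_genes))  # (2, 1, 3)
```

Dividing the first two counts by the total gives percentages in the same form as those reported above, for example 14,179 / 16,627 for the supported fraction.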
References:
Min B, Grigoriev IV, Choi IG. FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation. Bioinformatics (Oxford, England). 2017;33(18):2936–7.
Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005;33(Web Server):W465–7.
Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics (Oxford, England). 2016;32(5):767–9.
Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, Holt C, Sanchez Alvarado A, Yandell M. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18(1):188–96.
Gupta DK, Ruhl M, Mishra B, Kleofas V, Hofrichter M, Herzog R, Pecyna MJ, Sharma R, Kellner H, Hennicke F, et al. The genome sequence of the commercially cultivated mushroom Agrocybe aegerita reveals a conserved repertoire of fruiting-related genes and a versatile suite of biopolymer-degrading enzymes. BMC Genomics. 2018;19(1):48.
Yuan Y, Wu F, Si J, Zhao YF, Dai YC. Whole genome sequence of Auricularia heimuer (Basidiomycota, Fungi), the third most important cultivated mushroom worldwide. Genomics. 2017. https://doi.org/10.1016/j.ygeno.2017.12.013
Contacts
jkim5aug@korea.kr; igchoi@korea.ac.kr
Reference Publication(s)
Genomic discovery of the hypsin gene and biosynthetic pathways for terpenoids in Hypsizygus marmoreus. BMC Genomics 19:789 (2018)