The Smart Bacterial Completed Map is a hybrid assembly strategy. The genome is first assembled from high-accuracy second-generation data (Illumina or BGI-seq). During assembly, Nanopore long reads are used to span the branch points and join the contigs into a completed map. Finally, the second-generation data is used for error correction. This strategy quickly yields high-quality complete bacterial genome maps, with the error rate reduced to the 0.001% level.
Advantages of the Smart Bacterial Completed Map
Faster: analysis is completed within one week after sequencing.
More economical: more results at lower cost.
More accurate: accuracy is determined by the second-generation data, and the error rate is as low as 0.001%.
No preference: no GC bias and even coverage of the whole genome, so the method suits both high-GC and low-GC genomes.
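To put the 0.001% figure in more familiar terms, the sketch below converts an error rate into a Phred-scaled quality value and into an expected number of erroneous bases. The 5.5 Mb genome size is borrowed from the KP58 chromosome in the case study further down this page; the function names are illustrative only.

```python
import math


def phred_from_error_rate(error_rate: float) -> float:
    """Convert a per-base error probability into a Phred quality value."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def expected_errors(genome_size_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a genome of the given size."""
    return genome_size_bp * error_rate


# 0.001% expressed as a probability
error_rate = 0.001 / 100

print("Phred quality:", round(phred_from_error_rate(error_rate)))
print("Expected errors in 5.5 Mb:", round(expected_errors(5_500_000, error_rate)))
```

Under these assumptions, a 0.001% error rate corresponds to roughly one erroneous base per 100 kb of consensus sequence.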
Application areas
Pathogenic microorganisms: human infection, animal infection, plant infection, etc.
Industrial applications: antibiotic production, yogurt fermentation, energy utilization, etc.
Environmental science: microbial resource utilization, heavy metal treatment, prevention and control of invasive organisms, extreme-environment microorganisms, etc.
Recommended papers
De Maio N, Shaw LP, et al. Comparison of long-read sequencing technologies in the hybrid assembly of complex bacterial genomes. Microbial Genomics, 2019.
Todd SM, Settlage RE, Lahmers KK, et al. Fusobacterium genomics using MinION and Illumina sequencing enables genome completion and correction. mSphere, 2018, 3(4).
Lam MMC, Wyres KL, Duchêne S, et al. Population genomics of hypervirulent Klebsiella pneumoniae clonal group 23 reveals early emergence and rapid global dissemination. Nature Communications, 2018.
Frequently asked questions
(1) What is the Smart Bacterial Completed Map?
The Smart Bacterial Completed Map is a technology independently developed by Wuhan Benagen Technology Co., Ltd., based on the joint assembly of Nanopore and next-generation sequencing data. It combines the high accuracy of next-generation sequencing data with the long read length of Nanopore, avoiding the high error rate of long reads.
The genome is first assembled from high-accuracy second-generation data to obtain high-quality contigs. Nanopore data is then used to connect the contigs into a completed map (Nanopore is used rather than PacBio because PacBio reads are not long enough), and finally the second-generation data is used for error correction. This addresses both the low quality of Nanopore sequence data and the difficulty of assembling second-generation short reads, and yields a high-quality completed map of the bacterial genome. The accuracy of this strategy is determined by the second-generation data; it also avoids sequence contamination introduced by Nanopore data and brings the error rate down to 0.001%.
(2) Can plasmids be assembled in the Smart Bacterial Completed Map?
Bacterial cells generally carry some plasmids, and the Smart Bacterial Completed Map can assemble some plasmid sequences. However, because of limits imposed by the library fragment length and the number of plasmids, complete assembly of every plasmid cannot be guaranteed.
(3) Why does the Smart Bacterial Completed Map use the Nanopore + second-generation sequencing strategy?
Because sequenced fragments are limited in length, a bacterial genome sequence usually has to be reconstructed by software that splices together a large number of sequenced fragments, and repetitive sequences in the bacterial genome greatly increase the complexity of this splicing. Bacterial repeats range in size from several hundred bp to 7 kb. In this strategy, the second-generation data is assembled first to obtain high-quality contigs, Nanopore data is then used to connect the contigs into a completed map (Nanopore rather than PacBio, because PacBio reads are not long enough), and finally the second-generation data is used for error correction. This addresses both the low quality of Nanopore sequence data and the difficulty of assembling second-generation short reads, and yields a high-quality completed map of the bacterial genome.
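This page does not say which software implements the strategy. As one publicly available example, Unicycler (first entry in the References) likewise starts from a short-read assembly and uses long reads to bridge it. The sketch below only builds a Unicycler command line; the file names, output directory and thread count are hypothetical placeholders, and this is not presented as the pipeline actually used for the Smart Bacterial Completed Map.

```python
import shlex
from typing import List


def build_unicycler_command(
    short_r1: str,
    short_r2: str,
    long_reads: str,
    out_dir: str,
    threads: int = 8,
) -> List[str]:
    """Build a Unicycler hybrid-assembly command as an argument list.

    -1 / -2 : paired-end short reads (e.g. Illumina)
    -l      : long reads (e.g. Nanopore)
    -o      : output directory
    -t      : number of threads
    """
    return [
        "unicycler",
        "-1", short_r1,
        "-2", short_r2,
        "-l", long_reads,
        "-o", out_dir,
        "-t", str(threads),
    ]


# Hypothetical input files, for illustration only.
cmd = build_unicycler_command(
    "short_R1.fastq.gz", "short_R2.fastq.gz", "nanopore.fastq.gz", "hybrid_out"
)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it (requires Unicycler to be installed):
#   subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.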
(4) How are ribosomal RNA genes (rDNA) predicted in bacterial genomes?
There are usually two methods for predicting rDNA genes in bacterial genomes: de novo prediction based on the structural features of rDNA sequences, and homology prediction using rDNA sequences from closely related species. De novo prediction is more accurate, but it requires a complete rDNA structure in the assembly. In draft (frame) maps and some fine maps, the rDNA region may be incompletely assembled and spread across multiple scaffolds, in which case the rDNA cannot be predicted by the de novo method. To obtain a more complete prediction, you can provide closely related rDNA sequences in advance so that homology prediction can be used to improve the result.
(5) Variation in new strains can be detected by comparative genomics or by resequencing. How should we choose?
Both comparative genomics and resequencing can detect variation in new strains, but the two methods differ in application. In typical resequencing analyses, read length is limited, so to ensure alignment accuracy generally only alignments with more than 95% similarity are retained. Detection is therefore often unsatisfactory in highly divergent regions. Because bacteria evolve rapidly, most strains, apart from individual strains obtained by mutagenesis, contain highly divergent regions to varying degrees. Generally speaking, variation detection by comparative genomics is more comprehensive. Since comparative genomics based on a bacterial completed map does not cost more to analyze than bacterial resequencing, we generally recommend the comparative genomics strategy.
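The effect of the 95% similarity cut-off can be illustrated with a toy filter. The sketch below is a simplified assumption of how such a filter might work: it takes two pre-aligned strings of equal length (with `-` for gaps), computes percent identity over the alignment columns, and keeps only alignments at or above a threshold. Real read mappers apply their own scoring, and the sequences here are invented.

```python
def percent_identity(aligned_read: str, aligned_ref: str) -> float:
    """Percent of alignment columns where read and reference match.

    Both strings must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_read:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_read, aligned_ref) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_read)


def keep_alignment(aligned_read: str, aligned_ref: str,
                   min_identity: float = 95.0) -> bool:
    """Return True if the alignment passes the identity threshold."""
    return percent_identity(aligned_read, aligned_ref) >= min_identity


reference = "ACGTACGTACGTACGTACGT"
conserved_read = "ACGTACGTACGTACGTACGA"  # 1 mismatch in 20 columns
divergent_read = "TCGTACGTTCGTACGTACGA"  # 3 mismatches in 20 columns

print(keep_alignment(conserved_read, reference))
print(keep_alignment(divergent_read, reference))
```

In this toy case the read from the divergent region is discarded, so any variants it carries would go undetected, which is the limitation described above.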
(6) What are the requirements for selecting genomes in population evolution analysis?
To study the evolutionary order and origin of multiple strains, the single-copy core genes shared by these strains can be used as genetic markers to construct an evolutionary tree, which then supports related studies such as geographical spread, divergence time, and selection pressure. In general, the more distantly related the selected genomes are, the fewer single-copy core genes they share. If the number of single-copy core genes falls below a certain limit, such as a few hundred genes, the accuracy of the phylogenetic tree will suffer. In a typical application, about 5 strains of the same genus can be selected, plus about 2 strains from a neighboring genus as outgroups, so that a total of 5 to 8 strains is used to construct the evolutionary tree.
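The way single-copy core genes shrink as more distant genomes are added can be sketched as follows. The input format (a mapping from strain name to gene-family copy numbers) and all strain and gene names are hypothetical; in practice these counts would come from an ortholog clustering step that this page does not specify.

```python
from typing import Dict, Set


def single_copy_core(gene_counts: Dict[str, Dict[str, int]]) -> Set[str]:
    """Return gene families present exactly once in every strain.

    gene_counts maps strain name -> {gene family -> copy number}.
    """
    per_strain = list(gene_counts.values())
    if not per_strain:
        return set()
    candidates = set(per_strain[0])
    return {
        family
        for family in candidates
        if all(counts.get(family, 0) == 1 for counts in per_strain)
    }


# Toy data: two ingroup strains and one more distant outgroup.
counts = {
    "strain_A": {"dnaA": 1, "gyrB": 1, "rpoB": 1, "tnpA": 3},
    "strain_B": {"dnaA": 1, "gyrB": 1, "rpoB": 1, "tnpA": 1},
    "outgroup": {"dnaA": 1, "gyrB": 1, "tnpA": 2},
}

ingroup_only = {k: v for k, v in counts.items() if k != "outgroup"}
print(sorted(single_copy_core(ingroup_only)))
print(sorted(single_copy_core(counts)))
```

Adding the outgroup removes `rpoB` from the marker set in this toy example, mirroring the trade-off between outgroup distance and the number of usable markers.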
(7) How should strain variation detected by collinearity analysis be interpreted?
Collinearity analysis can reveal different types of variation in the target strain. Variants are generally prioritized by how severely they affect the protein sequence, so that the genes affected most are examined first. The following order can generally be used: gene insertion/deletion > frameshift InDel > non-frameshift InDel ≈ nonsynonymous SNP > synonymous SNP. Among these, gene insertions and deletions, especially insertions of new genes, are often difficult to detect by resequencing. This is one of the reasons why we prefer to detect variants by assembly combined with comparative genomics.
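The priority order above can be applied mechanically once variants are labeled by type. The sketch below assumes a simple list of dictionaries with a `type` field; the type labels, field names and gene names are hypothetical and do not correspond to the output of any particular tool.

```python
from typing import Dict, List

# Lower rank = higher priority. Non-frameshift InDels and nonsynonymous
# SNPs share a rank because the text treats them as roughly equivalent.
SEVERITY_RANK: Dict[str, int] = {
    "gene_indel": 0,
    "frameshift_indel": 1,
    "inframe_indel": 2,
    "nonsynonymous_snp": 2,
    "synonymous_snp": 3,
}


def prioritize(variants: List[dict]) -> List[dict]:
    """Sort variants from most to least severe (stable within a rank)."""
    return sorted(variants, key=lambda v: SEVERITY_RANK[v["type"]])


# Toy variants with placeholder gene names.
variants = [
    {"gene": "gene_1", "type": "synonymous_snp"},
    {"gene": "gene_2", "type": "nonsynonymous_snp"},
    {"gene": "gene_3", "type": "gene_indel"},
    {"gene": "gene_4", "type": "frameshift_indel"},
    {"gene": "gene_5", "type": "inframe_indel"},
]

for v in prioritize(variants):
    print(v["gene"], v["type"])
```

Because Python's `sorted` is stable, variants with the same rank keep their input order, so a secondary ordering can be applied beforehand if needed.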
Classic Case
Application of Nanopore and Illumina sequencing technologies for hybrid genome assembly and annotation of pandrug-resistant Klebsiella pneumoniae.
Background
The worldwide prevalence of multidrug-resistant Klebsiella pneumoniae is increasing and is associated with a variety of infections with high mortality. This study reports the whole genome sequence of Klebsiella pneumoniae KP58, a pandrug-resistant strain isolated in China with strong resistance to colistin and tigecycline.
Methods and materials
KP58 was isolated and cultured from the urine of an 89-year-old patient in Hangzhou who had received a variety of broad-spectrum antibiotics during hospitalization, including cefoperazone/sulbactam, imipenem, tigecycline, and colistin. Antimicrobial susceptibility tests were performed to determine minimum inhibitory concentrations (MICs). Whole genome sequencing was performed using Illumina and Nanopore technologies. After assembly, the genome characteristics, drug resistance genes and virulence genes were comprehensively analyzed, and genomic epidemiology and phylogenetic analyses of Klebsiella pneumoniae KP58 and closely related strains were carried out.
Main results
Illumina sequencing yielded 912 Mb of data, with read quality values all above Q30. Nanopore MinION sequencing yielded 76K reads with an average read length of 9.8 kb (N50 19 kb, longest read 125 kb) and a total data volume of 746.5 Mb; reads with a Q value of 9 and a length greater than 3 kb amounted to 368.8 Mb. Hybrid assembly produced the whole genome sequence of KP58, comprising 1 circular chromosome (5.5 Mb) and 5 plasmids (197 kb, 135 kb, 87 kb, 12 kb and 5.6 kb) (Fig. S2). A total of 5913 protein-coding genes, 85 tRNA genes and 25 rRNA operons were annotated. Genome completeness was 98.62%, and the contamination rate and strain heterogeneity were 0.36% and 0%, respectively, indicating that the assembled genome was of high quality.
Consistent with the drug susceptibility data, KP58 contains multiple resistance genes, including aadA2, rmtB, blaCTX-M-65, blaKPC-2, blaTEM-1B, blaSHV-12, blaSHV-182, qnrS1, fosA, catA2, sul2, tet(A) and dfrA14, which confer resistance to β-lactams, fluoroquinolones, phenicols, fosfomycin, sulfonamides, tetracyclines, and trimethoprim. In addition, analysis of quinolone resistance-determining regions (QRDRs) revealed alterations in target genes, including mutations in GyrA (S83I and D87G) and ParC (S80I).
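Two of the read statistics quoted above, N50 and sequencing depth, are simple to compute. The sketch below is a minimal illustration: the read lengths in the N50 example are invented toy values, while the data volumes and replicon sizes are the rounded figures reported in this case, so the resulting depths are approximate.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return ordered[-1]  # not reached for positive lengths


def mean_depth(total_bases: float, genome_size_bp: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


# Toy read lengths in kb (invented, for illustration only).
toy_reads_kb = [30, 25, 19, 19, 12, 8, 5, 2]
print("toy N50 (kb):", n50(toy_reads_kb))

# Rounded KP58 sizes from the text: chromosome plus five plasmids, in bp.
genome_size = 5_500_000 + 197_000 + 135_000 + 87_000 + 12_000 + 5_600
print("Nanopore depth: %.0fx" % mean_depth(746.5e6, genome_size))
print("Illumina depth: %.0fx" % mean_depth(912e6, genome_size))
```

Both platforms therefore cover the genome more than a hundredfold under these rounded figures.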
Figure S2. Circular maps of the whole genome. (A) Chromosome; (B)-(F) plasmids.
Expression levels of tigecycline resistance-related efflux pump genes (such as acrA, acrB, ramA, marA, soxS, and acrAB) were determined using qRT-PCR; the results indicate that the tigecycline resistance of KP58 is related to overexpression of the AcrAB efflux system. The plasmid-free tigecycline resistance gene tet(X) was also found. Although the isolate was negative for the mcr gene, it carried an ISKpn26-like element inserted into the mgrB gene, an insertion known to confer colistin resistance.
Three potential virulence factors were also identified: aerobactin, hypermucoviscosity, and yersiniabactin. KP58 was identified as ST11, a widespread multidrug-resistant clone of K. pneumoniae that causes severe infections worldwide. The KL type (polysaccharide capsule and lipopolysaccharide O antigen) of KP58 was predicted by Kaptive as KL64. Replicons from four plasmids were also identified, belonging to the incompatibility (Inc) groups IncFIB, IncR, and IncFII, and to ColRNAI.
Vertically inherited core-genome SNPs and core-genome MLST (cgMLST) analysis were used to assess the phylogenetic relationship between KP58 and 416 ST11 K. pneumoniae strains collected from China. The phylogenetic tree was highly diverse, and most geographically related strains grouped into the same clades, differing by <10 SNPs (Fig. 1). Most of the isolates collected from Sichuan and Hangzhou contained the highest proportion of known antibiotic resistance determinants. The strain most closely related to KP58 was L39_2, another ST11 strain isolated from a fecal sample in Hangzhou, which differed from KP58 by only 10 cgMLST loci or 9 SNPs (Fig. 2).
Among the KP58 plasmids, the IncR/IncFII plasmid (134,972 bp) carrying the carbapenem resistance gene blaKPC-2 shares 99% sequence identity, at 75% coverage, with plasmid pKPC2_020037 of the KPC-2-producing strain wchkp020037 isolated in Chengdu, China. The backbone region of pKP58-2 contains genes for plasmid replication and stabilization but lacks conjugation genes. The genetic context of blaKPC-2 on plasmid pKP58-2 consists of Tn3-tnpA, Tn3-tnpR, ISKpn27, Tn3-ΔblaTEM-1, blaKPC-2, ΔISKpn6, korC, klcA, ΔrepB and ΔTn1721, and is almost identical to that of three plasmids: p3_L39, p2b2 and pKPC2_020037.
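The SNP distances used in the phylogenetic comparison above (for example, 9 SNPs between KP58 and L39_2) are counts of differing positions in a core-genome alignment. The sketch below shows that calculation on two invented toy sequences; it assumes the sequences are already aligned to equal length and skips positions with ambiguous bases or gaps, which is a common but not universal convention.

```python
def snp_distance(seq_a: str, seq_b: str, ignore: str = "N-") -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a character in `ignore`
    (ambiguous base or gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Toy aligned core-genome fragments (invented).
strain_1 = "ACGTACGTAC"
strain_2 = "ACGTTCGTAN"

print(snp_distance(strain_1, strain_2))
```

Computing this value for every pair of strains yields the distance matrix from which closely related isolates can be identified.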
Another IncFII-type plasmid (87,095 bp) carries the resistance genes qnrS1, tet(A), catA2, sul2 and dfrA14, which confer resistance to fluoroquinolones, tetracyclines, phenicols, sulfonamides and trimethoprim, respectively.
In addition, a group of virulence genes consisting of iucABCD, iutA, rmpA and rmpA2 was located on the IncFIB/IncHI1B plasmid (197,415 bp), and no resistance genes were detected on this plasmid. This virulence plasmid was named pKP58-1. Comparison with other plasmids in the NCBI GenBank database showed that it is similar to three previously reported virulence plasmids: pK2044 (99% identity, 93% coverage), pLVPK (99% identity, 89% coverage) and pVir-CR-HvKP4 (99% identity, 78% coverage). pKP58-1 is smaller than pK2044 and pLVPK but larger than pVir-CR-HvKP4 (178,154 bp) (Fig. 3).
Conclusion
In this study, the pandrug-resistant Klebsiella pneumoniae strain KP58 was sequenced using the Illumina and Nanopore platforms. Hybrid genome assembly produced a high-quality whole genome sequence comprising one circular chromosome and five circular plasmids. The strain carries multiple antibiotic resistance and virulence factors. Resistance to colistin was associated with the insertion of an ISKpn26-like element into the mgrB gene; resistance to tigecycline was associated with overexpression of the AcrAB efflux pump system. The closest relative of KP58 is another clinical isolate collected from Hangzhou, which differs by only 10 cgMLST loci. These data provide important guidance for prevention, diagnosis and treatment strategies for Klebsiella pneumoniae infection.
References
Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 2017;13(6):e1005595.
Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genom. 2017;3(10):e000132.
Ruan Z, Wu J, Chen H, Draz MS, Xu J, He F. Hybrid Genome Assembly and Annotation of a Pandrug-Resistant Klebsiella pneumoniae Strain Using Nanopore and Illumina Sequencing. Infect Drug Resist. 2020;13:199-206.