New Tools in Orthology Analysis: A Brief Review of Promising Perspectives

Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Paraná, Curitiba, Brazil
Defining homology relationships among sequences is essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, genome annotation, and phylogenetic inference. Since 2007, with the growing number of new sequences deposited in large biological databases, researchers have been evaluating computational methodologies and tools in order to select the most promising ones for predicting orthologous groups. The literature in this field describes the problems shown by most available tools: limited accuracy, long analysis times (a growing concern as the volume of submitted data increases and demands faster techniques), and the difficulty of automating the process without manual intervention. Searching BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles in pursuit of the most recent techniques and tools developed to solve most of the problems that persist in orthology detection. We listed the main computational tools created and developed between 2011 and 2017, taking into account the differences in the type of orthology analysis, outlining the main features of each tool, and pointing to the problems each one tries to address. We also observed that several tools still use the BLAST “all-against-all” methodology as their main algorithm, which entails limitations such as a limited number of queries, high computational cost, and long processing times.
However, promising new tools are being developed, such as OrthoVenn (which uses Venn diagrams to show the relationships among the ortholog groups generated by its algorithm); proteinOrtho (which improves the accuracy of ortholog groups and was developed to handle large amounts of biological data); ReMark (which integrates the pipeline so that the input process becomes automatic); and OrthAgogue (which uses algorithms designed to minimize processing time). We compared the main features of four tools and tested them on four prokaryotic genomes. We hope that our review will be useful to researchers and will help them select the most appropriate tool for their work in the field of orthology.

Introduction
Finding the homology relationships between sequences is an essential step in biological research. Within homology, orthology analysis, which consists of determining whether a pair of homologous genes are orthologs (i.e., resulting from a speciation event) or paralogs (i.e., resulting from a gene duplication), is very important in computational biology, genome annotation, and phylogenetic inference (Ullah et al., 2015). For this reason, the present review focuses on the development of computational tools that aim to facilitate this field of study.

Ortholog detection, besides being closely related to comparative analysis and genome dynamics, is also extremely important for improving the functional annotation of various organisms (Kim et al., 2011), and it remains very important for elucidating the processes involved in the emergence of species (Wang et al., 2015). Accurate orthology recognition is an essential step in comparative genomic research (Petersen et al., 2017). In some cases, there is therefore a need for tools that analyze closely related species through pangenomes (Fouts et al., 2012), or for tools that use different strategies, such as protein post-translational modifications (PTMs), for better orthology inference (Chaudhuri et al., 2015).

Since the earliest studies on techniques for inferring orthology, the main difficulty has been the lack of a methodology and a tool that are fully reliable in assembling sets of orthologs. It was only in 2007 that the first study of the sensitivity, accuracy, and performance of methods for detecting these groups appeared (Altenhoff and Dessimoz, 2009). It established as methodological “gold standards” adaptations of, among others, the Markov Cluster Algorithm models built on the Basic Local Alignment Search Tool (BLAST) algorithm (Chen et al., 2007); Reciprocal Best Hits (RBH) (Zielezinski et al., 2017); Correlation Coefficient-based Clustering (COCO-CL) (Raja et al., 2006); and Automatic Clustering of Orthologs and In-paralogs (InParanoid) (O'Brien, 2004). This led to the appearance of various tools, such as OrthoMCL, which uses a Markov Cluster algorithm to group proteins from multiple species into ortholog groups (Li et al., 2003). BLAST and its adaptations were consolidated in subsequent studies (Kristensen et al., 2011), resulting in several publications and in the creation of large biological databases of ortholog clusters, such as the Clusters of Orthologous Groups/euKaryotic Orthologous Groups (COG/KOG), Ortholog Data Bank (OrthoDB), and eggNOG (Kuzniar et al., 2008).
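To make the Reciprocal Best Hits idea concrete, the following is a minimal sketch rather than the implementation used by any of the tools cited here. It assumes that similarity hits between two genomes (for example, parsed from BLAST output) are already available as `(query, subject, score)` tuples; the type alias and the function names `best_hits` and `reciprocal_best_hits` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed input format: (query gene, subject gene, similarity score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Keep the first hit seen on ties; real tools may treat ties differently.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}
```

A pair is reported only when each gene is the other's top-scoring match, which is why RBH is simple to compute but can miss orthologs when duplications or low conservation change which hit ranks first.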

However, detecting orthologous groups is difficult (Tekaia et al., 2016), because the accumulation of dynamic evolutionary events tends to hinder the recognition of true orthologs among homologs (Novo et al., 2009). There is no single effective tool for detecting these groups, but rather a set of tools that each meet certain computational demands and interests of their users (Altenhoff and Dessimoz, 2009). Many improvements are also still needed for more accurate orthology prediction with these tools: building or updating the ortholog relationships between genomes requires a great deal of computational effort and time (Tabari and Su, 2017), and establishing orthology between distantly related organisms, for instance, remains a considerable challenge (Chen et al., 2010).

All of this gets worse when a large number of sequences must be analyzed in order to infer orthology (Bitard-Feildel et al., 2015). Another problem is that researchers need a high level of programming knowledge to analyze the data, which hinders a smooth workflow. Some established methodologies and tools, such as BLAST all-vs.-all (Schreiber and Sonnhammer, 2013), RBBH (Gupta and Singh, 2015), and OrthoMCL (Chen et al., 2007), have a high computational cost (Linard, 2011) that strains the capabilities of ordinary hardware and ends up requiring access to supercomputer resources (Lechner et al., 2011). Another factor that directly drives the demand for better tools is the ever-increasing number of genomes that are deposited in large biological sequence databanks and can be compared simultaneously (Muir et al., 2016). This requires more efficient software tools (Curtis et al., 2013), because tools such as BLAST and InParanoid fail when the sequences are orthologous but the level of conservation among the orthologs is low; such cases require sophisticated manual intervention, which makes the process difficult to automate (Wagner et al., 2014). In addition, tools are needed that improve the sensitivity of orthologous group detection (Emms and Kelly, 2015).
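The growth in cost of an all-against-all comparison can be illustrated with simple counting. The sketch below is an assumption-laden simplification: it counts only candidate sequence pairs between different genomes, in one direction, and says nothing about actual BLAST runtime, which depends on heuristics and hardware. The function names and the proteome sizes in the usage note are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def genome_pairs(n_genomes: int) -> int:
    """Number of unordered genome pairs: n * (n - 1) / 2."""
    return n_genomes * (n_genomes - 1) // 2


def pairwise_sequence_comparisons(proteome_sizes: Sequence[int]) -> int:
    """Count candidate sequence pairs between every two distinct genomes.

    Each genome pair contributes (size of first) * (size of second) pairs.
    Within-genome pairs and the reverse search direction are not counted.
    """
    return sum(a * b for a, b in combinations(proteome_sizes, 2))
```

For instance, under these assumptions four genomes of 1,000 proteins each give 6 genome pairs and 6,000,000 candidate sequence pairs, while doubling to eight such genomes gives 28 genome pairs, so the workload grows roughly with the square of the number of genomes.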

These are the most important needs, and because of them several research groups are putting great effort into developing new tools that improve and facilitate orthology analysis and may also contribute to advances in later studies. The latest tools already available should therefore gain prominence in the scientific field. Reviews of recent orthology tools are also gaining prominence, to the point that the first reviews of tools for pan-genome homology analysis were published by Vernikos et al. (2015) and Xiao et al. (2015). A great number of programs are available for supra-genome analysis, but each of them suffers from one limitation or another, leaving gaps for further improvement (Chaudhari et al., 2016).

To compile our review, we focused on the most recent tools developed with high expectations for the study of orthologs, in order to present the latest advances in the development of more effective, faster, and multi-tasking tools for processing orthologous sequence data.

Source:- frontiersin