Institute of Zoology, Chinese Academy of Sciences
Group of Molecular Ecology and Evolution
CVhaplot
(Version 2.01)
Haplotype inference from population genotypic data is a complex statistical problem, showing considerable internal algorithm variability and among-algorithm discordance (Huang et al., 2009). Huang et al. (2008) explored the consensus vote (CV) approach to increase the confidence of statistical haplotyping results. This approach emphasizes examining discordance among independent algorithms and identifying uncertain inferences in the solutions. Alternatively, haplotype inference uncertainty can be reduced by controlling the internal variability of individual algorithms (e.g. Orzack et al., 2003). CVhaplot has been developed to combine these two complementary approaches and to automate the analysis procedure.
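The voting idea can be sketched in a few lines of Python. This is only an illustration, not the rule set implemented in CV.pl: the strict-majority rule, the function name `consensus_vote`, the algorithm labels and the sample data are all assumptions made for the example.

```python
from collections import Counter


def consensus_vote(solutions):
    """Pick, per individual, the haplotype pair backed by a strict majority.

    `solutions` maps algorithm name -> {individual: (hap1, hap2)}.
    The strict-majority rule is an assumption for illustration only;
    it is not taken from CV.pl.
    """
    individuals = sorted(set().union(*(s.keys() for s in solutions.values())))
    consensus, uncertain = {}, []
    for ind in individuals:
        # Sort each pair so that (a, b) and (b, a) count as the same call.
        votes = Counter(
            tuple(sorted(sol[ind])) for sol in solutions.values() if ind in sol
        )
        pair, count = votes.most_common(1)[0]
        # Keep the call only if more than half of the algorithms agree.
        consensus[ind] = pair if count * 2 > sum(votes.values()) else None
        if len(votes) > 1:
            uncertain.append(ind)  # the algorithms disagree on this individual
    return consensus, uncertain


# Hypothetical output of three algorithms for two individuals.
example = {
    "algo_A": {"ind1": ("ACG", "ATG"), "ind2": ("ACG", "TTG")},
    "algo_B": {"ind1": ("ATG", "ACG"), "ind2": ("ATG", "TCG")},
    "algo_C": {"ind1": ("ACG", "ATG"), "ind2": ("ACG", "TTG")},
}
consensus, uncertain = consensus_vote(example)
```

Here `ind2` receives a majority call but is still listed in `uncertain`, because one algorithm proposes a different phasing of the same genotype.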
CVhaplot is a small package of Perl scripts. It has the following features:
· reformats the data into the input file formats of several popular algorithms for haplotype reconstruction: PHASE (Stephens et al., 2001; Stephens & Donnelly, 2003), HAPLOTYPER (Niu et al., 2002), HAPLOREC (Eronen et al., 2004, 2006), ARLEQUIN-EM (Excoffier et al., 2005), GCHAP (Thomas, 2003a,b), GERBIL (Kimmel & Shamir, 2005), and HAPINFERX (Clark, 1990);
· facilitates running multiple iterations of these algorithms;
· extends the applicability of HAPLOTYPER, GCHAP and GERBIL to triallelic sites with a coding switch technique (i.e. coding a triallelic site as two biallelic ones);
· provides various indices to evaluate the consistency of individual algorithms;
· identifies a HAPINFERX ensemble solution with high accuracy from multiple independent HAPINFERX iterations, to overcome its high inconsistency;
· generates CV solutions according to consensus rules;
· identifies uncertain haplotypes in the data;
· identifies samples that show any mismatch between the inferred and original genotypes after the CV analysis.
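As an illustration of the coding switch mentioned in the list above, the following Python sketch recodes one triallelic site as two biallelic columns and converts it back afterwards. The specific two-column code ("00", "10", "01") and the helper names `site_code`, `expand` and `collapse` are assumptions chosen for the example; the scheme actually used by CVhaplot is not described here.

```python
def site_code(alleles):
    """Map the three alleles of one site to two biallelic columns.

    The 00/10/01 assignment is a hypothetical scheme for illustration.
    """
    if len(set(alleles)) != 3:
        raise ValueError("expected exactly three distinct alleles")
    first, second, third = alleles
    return {first: "00", second: "10", third: "01"}


def expand(haplotype, site, code):
    """Replace the allele at index `site` by its two-column code."""
    return haplotype[:site] + code[haplotype[site]] + haplotype[site + 1:]


def collapse(haplotype, site, code):
    """Undo `expand`: turn the two columns at `site` back into one allele."""
    decode = {cols: allele for allele, cols in code.items()}
    cols = haplotype[site:site + 2]
    if cols not in decode:
        # "11" does not correspond to any observed allele.
        raise ValueError(f"column pair {cols!r} has no matching allele")
    return haplotype[:site] + decode[cols] + haplotype[site + 2:]


# Site 1 carries three alleles (C, G, T) in this hypothetical sample.
code = site_code("CGT")
expanded = [expand(h, 1, code) for h in ("ACT", "AGT", "ATT")]
restored = [collapse(h, 1, code) for h in expanded]
```

Each of the two new columns takes only the values 0 and 1, so a program restricted to biallelic sites can read the expanded data; any inferred haplotype carrying the unused pair "11" is reported as an error rather than silently decoded.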
Platform requirements: Windows XP/Vista, Linux or Mac OS X, with Perl version 5.10.0 or later installed.
Table 1. The Perl scripts of CVhaplot
Script | Function |
trans.pl | Reformats a PHYLIP input file into the input formats of seven popular algorithms |
consistency.pl | Performs the consistency test and identifies a HAPINFERX ensemble solution with high accuracy |
CV.pl | Performs the consensus vote analysis |
To download the package with example files, click here (WinZip file).
Version history
2.01 (August 2009)
The new version can now run under Windows XP/Vista, Linux and Mac OS X. A warning report bug was fixed: the previous version did not report a warning message when all HAPINFERX runs were abnormal. A bug in the haplotype output of HAPINFERX has also been fixed: the present version outputs the inferred haplotypes in the same order as in the original genotypic data.
2.0 (May 2009)
CVhaplot 2.0 introduces more functions and greater flexibility. It generates three batch files to help perform the independent iterations, and automatically analyzes the output files of the various algorithms for the consistency test and the consensus vote analysis. The revised consensus vote rules further improve the performance of the CV approach. The new version can also identify a HAPINFERX ensemble solution with high accuracy from those independent iterations whose NDH (number of distinct haplotypes) values are among the smallest. This strategy helps to overcome the high inconsistency of some algorithms (e.g. HAPINFERX), as demonstrated in Orzack et al. (2003). The script CV.pl incorporates the inspection function of inspect.pl from the previous version. Two programs (HAPINFERX and GCHAP) are included in this version of the CVhaplot package, with sincere thanks to their original authors for their permission. This version also uses the new HAPLOREC 2.3.
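The NDH-based selection described above can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the code of consistency.pl: the `keep` parameter (how many runs count as "among the smallest"), the tie-breaking by run order and the sample runs are hypothetical.

```python
def ndh(run):
    """Number of distinct haplotypes in one run ({individual: (hap1, hap2)})."""
    return len({hap for pair in run.values() for hap in pair})


def smallest_ndh_runs(runs, keep=2):
    """Return the indices of the `keep` runs with the smallest NDH.

    `keep` and the tie-break by run order are assumptions for this sketch.
    """
    ranked = sorted(range(len(runs)), key=lambda i: (ndh(runs[i]), i))
    return ranked[:keep]


# Three hypothetical iterations over the same three individuals.
runs = [
    {"ind1": ("ACG", "TTG"), "ind2": ("ACG", "TTG"), "ind3": ("ACG", "ACG")},
    {"ind1": ("ACG", "TTG"), "ind2": ("ATG", "TCG"), "ind3": ("ACG", "ACG")},
    {"ind1": ("ATG", "TCG"), "ind2": ("ATG", "TCG"), "ind3": ("ACG", "ACG")},
]
ndh_values = [ndh(run) for run in runs]
selected = smallest_ndh_runs(runs, keep=2)
```

The selected runs would then form the ensemble from which a single solution is derived; how that final step is carried out in CVhaplot is not specified in this description.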
1.0 (March 2008)
This was the original implementation of the consensus vote approach as introduced in Huang et al. (2008). It reformats the genotypic data into the input file formats of several popular algorithms for haplotype reconstruction, performs the consistency test for inspecting the internal variability of five algorithms (all except GERBIL and GCHAP), and performs the consensus vote analysis. This version only displays the discordance among algorithms and generates haplotype data for individual algorithms; it does not provide the consensus vote solution. It allows users to supply an ensemble of random iterations of HAPINFERX as the HAPINFERX input solution to CV.pl. It includes a Perl script (inspect.pl) to identify samples that show any mismatch between the inferred and original genotypes after the CV analysis.
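The genotype mismatch check can be illustrated with a short Python sketch. It is not a port of inspect.pl: the data layout (each genotype as a list of two-letter strings, one per site) and the function names are assumptions made for the example.

```python
def genotype_of(hap1, hap2):
    """Collapse a haplotype pair into an unphased genotype, site by site."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must have the same length")
    return [tuple(sorted(site)) for site in zip(hap1, hap2)]


def mismatched_samples(inferred, genotypes):
    """List samples whose inferred pair does not reproduce the original genotype.

    `inferred` maps sample -> (hap1, hap2); `genotypes` maps sample -> list of
    two-letter strings such as "AT" (a hypothetical layout for this sketch).
    A sample without an inferred pair is also reported.
    """
    bad = []
    for sample, original in genotypes.items():
        expected = [tuple(sorted(site)) for site in original]
        if sample not in inferred or genotype_of(*inferred[sample]) != expected:
            bad.append(sample)
    return bad


# Hypothetical data: ind2 is homozygous C at the second site, but the
# inferred pair carries C and T there.
genotypes = {"ind1": ["AT", "CT", "GG"], "ind2": ["AA", "CC", "GG"]}
inferred = {"ind1": ("ACG", "TTG"), "ind2": ("ACG", "ATG")}
bad = mismatched_samples(inferred, genotypes)
```

Alleles are sorted within each site, so the comparison ignores phase and allele order and flags only genuine differences in allele content.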