We consider the following three related problems: 1) Given complete data (taxon by character data) remove the minimum number of characters so that the remaining data is pairwise compatible (contains a perfect phylogeny); 2) Given data with missing entries, impute the missing data in order to minimize, in the completed data, the number of pairs of characters that are incompatible; 3) Given data with missing entries, impute the missing data to minimize, in the completed data, the number of characters that must be removed so that the remaining sites are pairwise compatible. We consider in detail the cases of two, three and four states allowed per character and describe compact integer programming formulations that solve these problems (the approach extends in principle to any fixed number of states). We discuss empirical tests on problem instances of size of interest in phylogenetics, and establish that these instances can be solved very efficiently in practice, even though the problems are all NP-hard.
Related Links
* http://wwwcsif.cs.ucdavis.edu/~gusfield/ - powerpoint, papers and software for the 2-state case
view more