Such expressions as that famous one of Linnæus, and which we often meet with in a more or less concealed form, that the characters do not make the genus, but that the genus gives the characters, seem to imply that something more is included in our classification, than mere resemblance. I believe that something more is included; and that propinquity of descent,—the only known cause of the similarity of organic beings,—is the bond, hidden as it is by various degrees of modification, which is partially revealed to us by our classifications (Darwin, 1859, p. 413f).

Monday, 29 October 2007

The Great Phenetic Revival 2 Revisited: A Reply to Felsenstein


Recently we received a reply from Joseph Felsenstein (University of Washington) to our post The Great Phenetic Revival 2: Phenetics. We thought it would be a pity to relegate our response to the comments section, so we include it here as a separate post.

Felsenstein claims not to have been "trying to give the history of 'phylogenetic methods'" in his chapter 10. Nevertheless, this seems not to have prevented him from making sweeping (and damning) statements concerning classification - some published before the publication of his book: "The focus of systematics has shifted massively away from classification: it is the phylogenies that are central, and it is nearly irrelevant how they are then used in taxonomy" (Felsenstein 2001: 467), "Systematists get so worked up declaiming the centrality of classification in systematics that I have argued the opposite" (Felsenstein in Franz 2005, p. 495); others see things in much the same light: "Many phylogeneticists now see nomenclature and classification as largely irrelevant to phylogenetics..." (Hillis 2007: 331).

Still, Felsenstein sees himself as commenting only upon "algorithmic methods", when, of course, any method proposed can be made 'algorithmic' and many attempted to do so in constructing early versions of data matrices, way before Sneath or Sokal (see figure above as well as Tillyard 1919, Abel 1910 and Willman 2003).

Cladistics and phenetics might (erroneously) be seen as methods. Felsenstein wished to drop the terminology: "Making this distinction [between phenetics and cladistics] implies that something fundamental is missing from the 'phenetic' methods, that they are ignoring information that the 'cladistic' methods do not. In fact, both methods can be considered to be statistical methods, making their estimates in slightly different ways ... In this book we will give the terms 'cladistic' and 'phenetic' a rest and consider all approaches as methods of statistical inference of the phylogeny" (Felsenstein 2004: 145-146). Our comment on Felsenstein's wish to drop the terms 'cladistic' and 'phenetic' was "to grant equal time to all quantitative (numerical) methodologies", which now leaves us puzzled as to what, exactly, in this passage was "an outrageous misrepresentation of the content of my Chapter 10".

Further, "... numerical phylogenetics is not 'based on simple similarity'. It just isn't. There is no way you can compute either a parsimony tree, or a likelihood tree, from a table of similarities between species". What, then, is it based upon? The matrices that grace our systematic accounts certainly look to us as if they are sets of similarities.

To many (us included), cladistics was about the reform of palaeontology rather than the elaboration, support and promotion of one kind of method or another. That reform began in the 1960s almost entirely independently of the numerical development of data manipulation, which manifests itself as the ever-present pernicious influence of phenetics (regardless of whether it appears as 'parsimony', 'compatibility', 'likelihood', etc.). Felsenstein doesn't mention palaeontology in his history chapter but does later in "Phylogenies and Paleontology" (Felsenstein 2004: 547 et seq.). Here his imprecision seems a little troubling: "...If the fossil record of a group has been searched thoroughly enough, then we should not only be allowed to interpret fossils as ancestors, we should be encouraged to do so" (Felsenstein 2004, p. 547) - searched thoroughly enough; we should not only be allowed to...we should be encouraged to do so. How thorough is enough? And since when has the scientific endeavour required 'permission' to be 'allowed' and 'encouraged' to 'believe' something? It was because of such 'beliefs' that the first cladistic revolution was necessary. It is from the ever-present phenetics that the second cladistic revolution will (eventually) be born.

References
Abel, O. 1910. Kritische Untersuchungen über die paläogenen Rhinocerotiden Europas. Abhandlungen Kaiserlich-Koenigliche Geologische Reichsanstalt 20: 1-22.
Felsenstein, J. 2001. The troubled growth of statistical phylogenetics. Systematic Biology 50: 465-467.
Felsenstein, J. 2004. A digression on history and philosophy. In: Felsenstein, J. (Ed.), Inferring Phylogenies. Sinauer Associates, Sunderland, MA, pp. 123-146.
Franz, N. 2005. On the lack of good scientific reasons for the growing phylogeny/classification gap. Cladistics 21: 495-500.
Hillis, D. M. 2007. Constraints in naming parts of the Tree of Life. Molecular Phylogenetics and Evolution 42: 331-338.
Tillyard, R. J. 1919. The panorpoid complex. Part 3: the wing venation. Proceedings of the Linnean Society of New South Wales 44: 533-717.
Willman, R. 2003. From Haeckel to Hennig: the early development of phylogenetics in German-speaking Europe. Cladistics 19: 449-479.

5 comments:

Joe Felsenstein said...

It seems to me three points need a comment:
1. What constitutes an algorithmic method.
2. How can inference of ancestral-descendant relations be done with paleontological data.
3. What constitutes "cladistic" or "phenetic" methods.


Algorithmic methods:

"any method proposed can be made 'algorithmic' and many attempted to do so in constructing early versions of data matrices, way before Sneath or Sokal"

The coded data tables from papers by Abel (1910) and Tillyard (1919) are interesting, and I'm happy to hear of them. But until someone shows me discussion in which Abel or Tillyard described the methods they used to resolve conflicts among characters, I cannot consider them to have algorithmic methods. It is not just a matter of constructing a coded data matrix. They need to have a method for going from that to a tree that can be coded into a computer program. Given that criterion, Michener and Sokal (1957) remains the first use of an algorithmic method, and thus a good starting point for my chapter.


Paleontological data and ancestors:

"Here his imprecision seems a little troubling: '...If the fossil record of a group has been searched thoroughly enough, then we should not only be allowed to interpret fossils as ancestors, we should be encouraged to do so' (Felsenstein 2004, p. 547) How thorough is enough? And since when has the scientific endeavour required 'permission' to be 'allowed' and 'encouraged' to 'believe' something? It was with such 'beliefs' that the first cladistic revolution was necessary."

Ebach seems troubled that I am not algorithmic enough. Of course my description is vague. It describes what a statistical model would do, one that had rates of speciation, rates of extinction, a model of fossilization and a model of stratigraphic sampling. The point I was making (and still make) is that such a statistical process, given complete enough sampling to have a high chance of seeing the ancestral lineage itself, will on occasion allow us to infer that a particular lineage is that of the ancestor of some of the modern species. There is an anathema in phylogenetic systematics against having a method that ever does this. This anathema is not logical.


"Cladistic" and "phenetic"

"What we've got here is a failure to communicate." (said by the Paul Newman character in the 1967 film "Cool Hand Luke")

I'll take some of the blame, but we are talking past each other largely because we have different definitions of what is "cladistic" and what is "phenetic". To me these terms describe approaches to making a classification. Phenetic classification groups by measures of overall similarity, starting from a table of pairwise similarities between species. It does not use an estimate of the phylogeny at all. Cladistic classification makes only groups that are monophyletic, as judged by a phylogeny or cladogram. In the comments in chapter 10 I urged us not to use the terms "cladistic" or "phenetic" to distinguish between different methods for inferring phylogenies.
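A minimal sketch of what "a table of pairwise similarities" means in this sense, assuming a made-up 0/1 character matrix and the simple matching coefficient as the similarity measure (both are illustrative choices, not taken from the discussion above). It suggests that the table can be computed from a character matrix, but that recoding a character leaves the table unchanged, so the table no longer records which state each species has.

```python
from itertools import combinations

# Hypothetical 0/1 character matrix: one row per species, one column per character.
matrix = {
    "A": [0, 0, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 1, 1, 0],
    "D": [1, 0, 1, 1],
}


def simple_matching(x, y):
    """Fraction of characters in which two species show the same state."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def similarity_table(m):
    """Pairwise similarities between all species in the matrix."""
    return {
        (s, t): simple_matching(m[s], m[t])
        for s, t in combinations(sorted(m), 2)
    }


table = similarity_table(matrix)

# Swap the 0 and 1 labels of the first character in every species.
recoded = {name: [1 - row[0]] + row[1:] for name, row in matrix.items()}

# The two matrices differ, yet they give the same similarity table.
same_table = similarity_table(recoded) == table
```

Under these assumptions `same_table` is true: the character matrix carries information (the states themselves) that the similarity table has discarded.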

Ebach's comment in his first post was
"Why does Felsenstein reduce the theory of cladistics and phenetics to different types of method?"
If by that he had meant that I reduced different methods of classification to "different types of method" for inferring phylogenies, that would have been a misunderstanding of my intent. And I would be right to complain of misrepresentation. But I now see that Ebach has a very different definition of those terms, and in view of that I do have to withdraw my charge of misrepresentation. And replace it by a charge of silliness. For he thinks of a simple data matrix (say an alignment of nucleotide sequences) as "based on simple similarities":

"The matrices that grace our systematic accounts certainly look to us as if they are sets of similarities."

Inferring a tree from them, without judging first whether in site 37 state A or state G is the apomorphic state, is (as far as I can tell) a phenetic method, as far as Ebach is concerned. Even if you use a parsimony program. If I understand this point correctly, most of the trees constructed by parsimony, likelihood, and Bayesian programs would count as "phenetic" according to this view. I don't think that most people who consider themselves phylogenetic systematists would agree with Ebach that they are being pheneticists in making their trees.

I think my usage is clearer and more straightforward. Others often disagree and assign the terms "cladistic" to parsimony methods, "phenetic" to distance matrix methods, with likelihood and Bayesian methods called one or the other, depending on who is writing. Ebach wants to go much further. I am reminded of the late Charles Sibley. He inferred trees of birds by distance methods, using DNA hybridization data. But he told me that he was absolutely opposed to ever making a group in the classification system that was not monophyletic. By my definition he was a cladist. Many other people would argue that his use of DNA hybridization data and distance methods made him a pheneticist. But Ebach goes farther, for he wants to designate as pheneticists not only me, not only Sibley, but most users of numerical methods for inferring phylogenies. Even the most adamant fans of parsimony, even the most paid-up dues-paying members of the Will Hennig Society.

Joe Felsenstein said...

(Of course that should have been "Willi", not "Will".)

Crawford Tillinghast said...

I have been lurking on this blog for quite some time and finally decided to post.

To make it short, I basically agree with Ebach & Williams on their conceptions about the misuse of barcoding and the overselling of molecular phylogeny as a panacea for every systematic problem. Obviously, DNA phylogenies are flawed by intrinsic issues related to the quality of the data, like poor sequence alignment, poor handling of indel information (especially by distance methods) and single-gene bias.

As a morphologist and systematist, I feel confused about one topic which is, inter alia, often pointed out by Felsenstein on this blog: Why do Ebach & Williams consider parsimony computational methods as "phenetics"? Don't these methods rely on synapomorphies to form monophyletic groups? Which method is thus a real "cladistic method" in Ebach & Williams' concept? "Pen and paper" cladistics is, in my humble view, impossible to do adequately (viz. finding all most parsimonious trees) if you happen to have a matrix with, say, 150 taxons and 235 characters!
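To give a rough sense of the scale behind the "pen and paper" remark, here is a small sketch that counts the distinct unrooted, fully bifurcating trees for a given number of labelled taxa, using the standard double-factorial count (2n - 5)!!. The function name is hypothetical, and the count says nothing about how any particular program searches tree space.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


trees_for_10 = num_unrooted_trees(10)
trees_for_150 = num_unrooted_trees(150)
digits_for_150 = len(str(trees_for_150))
```

Ten taxa already allow 2,027,025 such trees; for 150 taxa, `digits_for_150` gives the number of decimal digits in the count, which is far beyond anything that could be enumerated by hand.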

David Williams & Malte Ebach said...

Rather than dissect the issue afresh, we humbly suggest that one might find some notion of our ideas on methods and phenetics in our book The Foundations of Systematics and Biogeography and in our paper Drowning by Numbers (Botanical Review 71, 4). But simply put:

Phenetics is about similarity, sometimes represented by a code, such as 1. Thus phenetics sees shared characters as 11.



Cladistics is about relationships, which can also be represented by a code. Thus cladistics sees relationships as 0(11).

David Marjanović said...

"The point I was making (and still make) is that such a statistical process, given complete enough sampling to have a high chance of seeing the ancestral lineage itself, will on occasion allow us to infer that a particular lineage is that of the ancestor of some of the modern species. There is an anathema in phylogenetic systematics against having a method that ever does this. This anathema is not logical."

I haven't seen such an anathema; but I haven't seen such a method either, apart from very vague hints at it. What I've seen among my fellow vertebrate paleontologists are two things:

1) History. Before phylogenetic analysis was introduced in the mid-late 1980s, vertebrate paleontologists commonly indulged in "ancestor worship" where any fossil was by default considered an ancestor of its later relatives (and if it was inconveniently too young, it was still considered "a good structural ancestor" and similar wordings). Naturally, this led to all kinds of untenable or plain wrong inferences. For instance, ancestors don't have autapomorphies (in parsimony anyway), so whatever Archaeopteryx had was thought to be ipso facto plesiomorphic for birds. It took till well into the 1990s till people noticed that Archie has a bunch of easily visible autapomorphies. The pendulum may have swung a little too far in the direction of assuming that nothing we find can be an ancestor.

2) The quality of the fossil record. As of today, amphibiaweb.org counts 7,306 known extant species of lissamphibians. In the fossil record from the beginning of the Triassic to the end of the Miocene, I counted 319 known species or possible species* of lissamphibians as of April/May 2013. How probable is it that any of those is an ancestor of any extant species? There is no ecological or phylogenetic reason to think that the real diversity has exploded in the last 5 million years (but still left almost no fossils in the Plio-Pleistocene; if I had included those records that are mostly referred to extant species, the total number would hardly have doubled). Indeed, the tree (in the same paper) is full of ghost lineages. – People working on groups with a better fossil record, like rodents or ammonites or conodonts, used to believe their probabilities of finding an ancestor were much higher, but even there the (still ongoing) introduction of phylogenetic analysis has shown that autapomorphies and ghost lineages are everywhere. It really is improbable – not impossible, but highly improbable – that anything we find is an ancestor.

* I'm talking about poor preservation, not so much about species concepts.

"a matrix with, say, 150 taxons and 235 characters"

I wouldn't bother publishing an original matrix with fewer than three times as many characters as taxa. Because homoplasy is rampant, you really have to worry about accidental sampling bias below – empirically, anecdotally – about that number.

"Cladistics is about relationships, which can also be represented by a code. Thus cladistics sees relationships as 0(11)."

That's exactly what parsimony programs do. The rooting of the tree, and thus the characters, is done by the outgroup – it isn't necessary to polarize characters a priori, if that's what you mean. Or are you talking about the widespread indefensible practice of "keeping" all multistate characters unordered?
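A minimal sketch of the point about outgroup rooting, assuming Fitch's counting rule for a single unordered character and a made-up four-taxon example (outgroup O and A with state 0, B and C with state 1). The tree encoding and function name are illustrative only and do not reproduce any specific parsimony program.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one character.

    `tree` is either a taxon name or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change needed.
        return shared, left_steps + right_steps
    # No state in common: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


# One character, coded 0 in the outgroup O and in A, 1 in B and C.
states = {"O": 0, "A": 0, "B": 1, "C": 1}

# Three possible ingroup arrangements, each rooted by the outgroup O.
candidates = {
    "A(BC)": ("O", ("A", ("B", "C"))),
    "B(AC)": ("O", ("B", ("A", "C"))),
    "C(AB)": ("O", ("C", ("A", "B"))),
}

steps = {name: fitch(tree, states)[1] for name, tree in candidates.items()}
best = min(steps, key=steps.get)
```

Here the arrangement grouping B with C needs one change and the other two need two each, so the shared state 1 ends up supporting the group (BC) without anyone declaring in advance which state is derived.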