Such expressions as that famous one of Linnæus, and which we often meet with in a more or less concealed form, that the characters do not make the genus, but that the genus gives the characters, seem to imply that something more is included in our classification, than mere resemblance. I believe that something more is included; and that propinquity of descent,—the only known cause of the similarity of organic beings,—is the bond, hidden as it is by various degrees of modification, which is partially revealed to us by our classifications (Darwin, 1859, p. 413f).

Sunday, 16 March 2008

Defining Phenetics, Intentions and Mimics

Many reading this blog are probably wondering why we seem to call everything phenetics. Phenetics is a term used, incorrectly, to describe only a certain type of methodology, namely clustering based on similarity (e.g., neighbor-joining). In fact phenetics is nothing more than Numerical Taxonomy (Sneath & Sokal, 1973), a topic that we have discussed in a previous post (Phenetic "Natural" Classifications).

Phenetics attempts to classify organisms based on overall similarity. An excellent definition of phenetics, which can be found at Wikipedia, goes one step further:
"In biology, phenetics, also known as numerical taxonomy, is an attempt to classify organisms based on overall similarity, usually in morphology or other observable traits, regardless of their phylogeny or evolutionary relation".
Where phenetics becomes problematic is when these classifications are considered to be natural, that is monophyletic. A monophyletic taxon is based on relationship, namely homology. Homology is not a measurement of similarity but an expression of relationship. Phenetically grouped organisms may not necessarily be more closely related to each other than they are to another group. In other words, phenetics cannot distinguish paraphyly from monophyly. An analogous problem exists in biogeography.
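The point that phenetically grouped organisms need not be each other's closest relatives can be seen in a toy example. The sketch below is our own illustration, not a method from any cited author: the three taxa and six binary characters are hypothetical, and we assume that polarity is already known (0 = ancestral state, 1 = derived state). Counting shared derived states is only a crude stand-in for a judgment about homology, but it is enough to show where the two criteria part ways.

```python
from itertools import combinations

# Hypothetical character matrix (an assumption for illustration only).
# 0 = ancestral state, 1 = derived state; polarity is taken as given.
taxa = {
    "A": [0, 0, 0, 0, 0, 1],
    "B": [1, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 0],
}

def overall_similarity(x, y):
    """Count every matching state, ancestral or derived (simple matching)."""
    return sum(a == b for a, b in zip(x, y))

def shared_derived(x, y):
    """Count only the characters in which both taxa show the derived state."""
    return sum(a == 1 and b == 1 for a, b in zip(x, y))

def best_pair(score):
    """Return the pair of taxa with the highest score under the given criterion."""
    pairs = combinations(sorted(taxa), 2)
    return max(pairs, key=lambda p: score(taxa[p[0]], taxa[p[1]]))

phenetic_pair = best_pair(overall_similarity)
derived_pair = best_pair(shared_derived)

print("Grouped by overall similarity:", phenetic_pair)
print("Grouped by shared derived states:", derived_pair)
```

In this matrix A and B agree in four characters, but every one of those agreements is a shared ancestral state, so the A + B group formed by overall similarity is paraphyletic if the single derived state shared by B and C is a genuine homology.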

Parsimony Analysis of Endemicity (PAE) is a method developed in order to find similarities between areas (see Rosen 1988). The method simply requires a data matrix of the presences and absences of taxa in each area. In contrast, cladistic biogeography demands that the taxa used in an analysis are monophyletic; however, many fossil groups have no relations that coexisted in the same period. This means that some paleontologists are forced to deal with higher taxon biogeography (i.e. at family or ordinal level) or abandon cladistic biogeography altogether. The idea behind PAE is to use any group within a phenetic context. Monophyly is not a requirement of PAE; absences can therefore be used to cluster organisms into areas, since no notion of homology or relationship is assumed. As with phenetic findings in systematics, some users have made the mistake of assuming that PAE can find phylogenetic signals based on non-evolutionary data, that is, non-homologous information, in the data matrix.
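To make the role of absences concrete, here is a minimal sketch of an area-by-taxon presence/absence matrix. It is not Rosen's PAE procedure, which runs a parsimony analysis on such a matrix; it is a plain matching count, and the area names, taxon labels and values are all hypothetical. It only shows what happens when a shared absence is allowed to count as agreement between two areas.

```python
from itertools import combinations

# Hypothetical presence/absence matrix (assumed data, for illustration only).
# Rows are areas, columns are taxa t1..t6; 1 = present, 0 = absent.
taxon_labels = ["t1", "t2", "t3", "t4", "t5", "t6"]
areas = {
    "Area1": [1, 0, 0, 0, 0, 0],
    "Area2": [0, 1, 0, 0, 0, 0],
    "Area3": [1, 0, 1, 1, 1, 1],
}

def shared_presences(x, y):
    """Number of taxa recorded in both areas."""
    return sum(a == 1 and b == 1 for a, b in zip(x, y))

def shared_absences(x, y):
    """Number of taxa recorded in neither area."""
    return sum(a == 0 and b == 0 for a, b in zip(x, y))

# For each pair of areas, record (shared presences, shared absences).
summary = {
    (a, b): (shared_presences(areas[a], areas[b]),
             shared_absences(areas[a], areas[b]))
    for a, b in combinations(sorted(areas), 2)
}

for (a, b), (present, absent) in summary.items():
    print(f"{a}-{b}: shared presences={present}, shared absences={absent}, "
          f"total matches={present + absent}")
```

Area1 and Area2 have no taxon in common, yet they come out as the most similar pair because four taxa are missing from both. Any grouping built on that total rests on absences alone.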

On closer examination we find that many systematists and biogeographers intent on discovering homology, monophyly and endemism are nevertheless using phenetic methods. Perhaps this is due to a lack of readily available methods in the literature. After all, cladistics and cladistic biogeography started off as "pen and paper" methods whereas phenetics was always a numerical method (hence numerical taxonomy). The issue at stake is whether using phenetic methods jeopardizes our intent, namely to search for homologies, monophyly and endemic areas. We argue that it does.

The problem lies in transposing data into a data matrix and then analysing it with neighbor-joining, clustering, parsimony or compatibility, as all of these are phenetic - that is, methods that use overall similarity in order to find classifications. These methods cannot distinguish natural (monophyletic) from artificial (non-monophyletic) classifications.

Our favorite programs are rightly pointed out as black boxes, yet we shrug this off and cite Farris (1983) or recite some algorithm. In some extreme cases we justify our intentions by making sure that our data are compatible with our methods (sensu Patterson 1982). But we cannot continue skirting this issue. Similarity is an anathema that our forebears, Goethe, Vicq d'Azyr, Saint-Hilaire and Owen, the founders of homology, had quickly disposed of. Similarity is the foundation of phenetics, not cladistics. Our intent to find homology, monophyly and endemicity (rather than the superficial cousin, similarity) must be kept in mind when selecting the methods and programs that we use, n'est-ce pas?

Assumptions held so dearly by some cladists, such as Patterson's test for homology and similarity as a requisite for monophyly, are all phony. Cladists should not use phenetic methods in order to make sense of classification; instead they should use homology and relationships. The only way (if any) in which we are able to use phenetics meaningfully is to treat it as a mimic of the real thing (cladistic pen and paper methods). After all, that is what phenetics is about: mimicking reality.

A mimic in cladistics is any phenetic method that attempts to implement a genuine theory or intention. Any phenetic implementation needs to be considered carefully, since it was not originally intended for cladistic or biogeographical analysis. Many of the methods and implementations we use today have long existed in statistical and mathematical classification (e.g., data matrices, parsimony, compatibility, clustering, subtrees). Rather than accepting these methods wholeheartedly as being "cladistic", cladists should fool the mimics. This has been successfully done by a program called TAX (Nelson & Ladiges, 1991). TAX fools the program into treating areas of no relationships as question marks, without treating absences as evidence.

If cladistics is to survive as an evolutionary field intent on finding homologies and monophyly, it needs to re-examine the phenetic methods that it uses. A field that is becoming dependent on phenetic methodology can easily become phenetic.

The image above was made by David Maddison in 1981 when "... Cladistics versus Phenetics debates were still fresh in people's minds". We hope that the same image may re-spark some of that debate. The image may be found on his website.


Farris, J. S. 1983. The logical basis of phylogenetic analysis. pp. 1-47 in Advances in Cladistics, Volume 2, Proceedings of the Second Meeting of the Willi Hennig Society. ed. Norman I. Platnick and V. A. Funk. Columbia University Press, New York.
Nelson, G., & Ladiges, P.Y. 1992. TAS and TAX: MSDOS programs for cladistics, version 3.0. Published by the authors, New York and Melbourne.
Patterson, C. 1982. Morphological characters and homology. In: K. A. Joysey and A. E. Friday (eds.), Problems of Phylogenetic Reconstruction. Systematics Association Special Volume, 21: 21-74.
Rosen, B.R. (1988) From fossils to Earth history: applied historical biogeography. Analytical biogeography: an integrated approach to the study of animal and plant distributions (ed. by A.A. Myers and P.S. Giller), pp. 437–481. Chapman & Hall,
Sneath, P.H.A. & Sokal, R.R. 1973. Numerical taxonomy — The principles and practice of numerical classification. W. H. Freeman, San Francisco.


Joe Felsenstein said...

Why you call many methods "phenetics" is not clarified by your clarification. The quote from Wikipedia is fine -- it refers to classifying by "overall similarity". In your comments you drop the "overall" and argue that methods that use "similarity" are phenetic. That makes most systematists pheneticists. Few others will agree that this is a good way to define "phenetic". You should find some other term.

David Williams & Malte Ebach said...

We 'argue that methods that use "similarity" are phenetic. That makes most systematists pheneticists.' No, that makes most numerical systematists pheneticists. Consider this sentence from Felsenstein (2004: xix), "Phylogenies ... have been around for over 140 years, but statistical, computational, and algorithmic work on them is barely 40 years old." So, the first 100 years struggled towards defining relationships; the next 40 retarded that progress.

Dalton de Souza Amorim said...

I am not sure if Felsenstein's complaint is correct... Obviously this is not a trivial discussion. The point is that when we refer to "similarity" in general we mean _any_ kind of similarity, including plesiomorphies, homoplasies, reversions and synapomorphies alike. What would differentiate phenetics from a phylogenetic perspective is the ability to specifically discern synapomorphy from general "similarity". In other words, to avoid pheneticism, an analysis needs to be qualitatively restrictive when dealing with similarity information. If an analysis renders results that are not able to indicate exclusive common ancestry (by those sharing that similarity), it is using a phenetic algorithm after all. Most algorithms (not only the “old” ones) are not able to discriminate special similarity (synapomorphy) among listed similarities. Under this perspective, phenetics would not be an intentional philosophy of systematics, but an intrinsic property or limitation of given methods. One could argue that this would imply that programs working with parsimony principles that are not able to properly recover the true phylogenies (because of heuristic limitations) are also phenetic. I would agree with that, although this could bring the discussion to another level. Anyway, a number of recent programs (including, e.g., those for molecular alignment), not only those that address “overall similarity”, would perfectly fit a phenetic profile because of their working criteria or their limited results. From this perspective the original posting was correct in its concepts of similarity and phenetics.