Systematics and Biogeography: Natural Classification

Such expressions as that famous one of Linnæus, and which we often meet with in a more or less concealed form, that the characters do not make the genus, but that the genus gives the characters, seem to imply that something more is included in our classification, than mere resemblance. I believe that something more is included; and that propinquity of descent,—the only known cause of the similarity of organic beings,—is the bond, hidden as it is by various degrees of modification, which is partially revealed to us by our classifications (Darwin, 1859, p. 413f).


Wednesday, 24 August 2011

The Autonomous Algorithm

The S&B Blog will be running a series of posts dealing with the rise of the black box and the fall of the foundations of systematics.


The Timetree of Life: A product of the Autonomous Algorithm?

Presently, the majority of systematic analyses are constructed in the same way: a matrix is assembled and fed into a computer, which then produces a branching diagram. Students of systematics are taught how to produce this branching diagram with the algorithm, but without reference to the foundations of systematics. The result is a whole new generation of computer users ignorant of the fundamentals of systematics, such as theory (e.g., homology, monophyly), history (e.g., why we do what we do) and methodology (e.g., how to find homologues and construct a cladogram by hand). This also results in an increased dependency on algorithms, which in turn creates a new systematic history and theory that revolves around algorithms rather than concepts. The former black box, which implemented basic algorithms to find approximations of cladograms, is now totally autonomous from the theory, method and history that went into its creation. We call this the Autonomous Algorithm.

Classification and Non-Trees

In spite of its role as a ‘central metaphor’ and two decades of effort to promote ‘tree-thinking’, evolutionary relationships are now being portrayed in ways other than the simple bifurcating tree, recent examples being the ‘ring of life’ (Rivera & Lake 2004), the interlinking, anastomosing networks of major eukaryote groups (Doolittle 1999, 2000, Doolittle & Bapteste 2007, for commentary see Arnold 2007, Lane & Archibald 2008, McInerney et al. 2008, Dagan & Martin 2006), interconnecting networks relating various taxa (Hertel et al. 2006), and so on, the idea being summarised in a recent New Scientist article “Why Darwin was wrong about the tree of life”.

Most of this recent batch of non-trees have resulted from analysis of molecular data, although the general argument – if biological classification is hierarchical, then it prevents the representation of ‘real’ reticulate patterns – was explored in a cladistic context some three decades ago (Bremer & Wanntorp 1979).
Significance (or explanation) for many of these molecular diagrams is offered via the process of Lateral (or Horizontal) Gene Transfer (LGT, HGT), the horizontal transfer of a gene or genetic material from one organism to another, distantly related organism (Dagan & Martin 2006). The process was first outlined some years ago in support of the theory of serial endosymbiosis (Margulis 1998), which explains the origin of chloroplasts and mitochondria (see Journal of Phycology 44 (1) and Lane & Archibald 2008). LGT is a mechanism to explain instances of xenology (“foreign genes”, Gray and Fitch 1983, p. 64), “a form of homology (inferred common ancestry) in which the sequence (gene) homology is incongruent with that of the organisms carrying the gene, and horizontal gene transfer or transfection is the assumed cause” (Patterson 1988, p. 612). Xenology finds its closest morphological equivalent in parallelism, a term which remains hard to define but can be simplified by associating it with incongruent homologies (similarities); xenology finds its biogeographical equivalent in dispersal, a term equally hard to define but one that simply suggests incongruent distributions (Williams & Embley 1996, pp. 581–582). Parallelism (Arendt & Reznick 2008) and dispersal (de Queiroz 2005) are being discussed again, under the fresh gloss provided by molecular data, although interpretations of parallelism never really disappeared (Roth 1984:14; Sluys 1989; Wagner 1989:55, 66; Brooks 1996; DeSalle et al. 1996; Gould 2002), with suggestions being made such as “the significance of this similarity [parallelism] is thus dependent on the existence of a relevant underlying process” (Sanderson and Hufford 1996:328). Even earlier, Simpson wrote:

“In the most restricted sense virtually all evolution involves parallelism. Homologous genes tend to mutate in the same way (p. 9)… Homology is always valid evidence of affinity. Parallelism is less direct and reliable, but it is also valid evidence within somewhat broader limits. It may lead to overestimates of degree of affinity, but it is not likely to induce belief in wholly false affinity (p. 10)” (Simpson 1945, pp. 9–10).

Simpson’s words did not turn out to be true, for the parallelisms he noted simply misled the determination of exact relationships among mammals (McKenna & Bell 1997): those similarities identified as parallelisms (like xenology and dispersal) are simply incongruent characters.
All the same, it has been argued that reticulate networks allow incongruent ‘homologies’ to be accommodated on the same diagram relative to congruent homologies (Huson & Bryant 2006). The general idea seems similar to that explored by William Sharp MacLeay and his circular systems: an attempt to represent what he called analogies and affinities (homologies) in one system (MacLeay 1819, Fig. 6).

Yet if even orthologous (homologous) genes do not support ‘tree-thinking’ (Bapteste et al. 2005), incongruence among gene trees presents problems for the effectiveness of these data, rather than providing alternative explanations for incongruence (LGT = parallelism = dispersal). Simply put, cladograms deal with character distributions and their implications for taxon relationships (classifications); they are not vehicles for explaining incongruence.
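
One way to make "incongruent characters" concrete is the pairwise compatibility (four-gamete) test for binary characters: two characters can both be placed on a single unrooted tree without homoplasy unless all four state combinations occur among the taxa. This is a minimal sketch, assuming binary characters and no missing data; the test is a standard one but is not discussed in this post, and the taxa A-E and their states are hypothetical.

```python
from itertools import combinations


def compatible(char_a, char_b):
    """Four-gamete test for two binary characters: they can both be placed
    on one unrooted tree without homoplasy unless all four state pairs
    (0,0), (0,1), (1,0) and (1,1) occur among the taxa."""
    return len(set(zip(char_a, char_b))) < 4


def incongruent_pairs(matrix):
    """Return index pairs of characters (columns) that fail the test."""
    columns = list(zip(*matrix.values()))
    return [(i, j) for i, j in combinations(range(len(columns)), 2)
            if not compatible(columns[i], columns[j])]


# Hypothetical matrix: taxa A-E, three binary characters (0 absent, 1 present).
matrix = {
    "A": (0, 0, 0),
    "B": (1, 1, 0),
    "C": (1, 1, 1),
    "D": (1, 0, 1),
    "E": (0, 0, 1),
}
print(incongruent_pairs(matrix))   # [(0, 2), (1, 2)]
```

Characters 0 and 1 are congruent; character 2 conflicts with both, so no single branching diagram accommodates all three without extra steps.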

References

Arnold, M. 2007. Evolution through Genetic Exchange, Oxford University Press, Oxford.
Arendt, J. & Reznick, D. 2008. Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends in Ecology & Evolution 23: 26–32.
Bapteste, E., Susko, E., Leigh, J., MacLeod, D., Charlebois, R.L. & Doolittle, W.F. 2005. Do orthologous gene phylogenies really support tree-thinking? BMC Evolutionary Biology, 5:33; doi:10.1186/1471-2148-5-33.
Bremer, K. & Wanntorp, H.-E. 1979. Hierarchy and reticulation in systematics. Systematic Zoology 28: 624—627.
Brooks, D. R. 1996. Explanation of homoplasy at different levels of biological organisation. In M.J. Sanderson and L. Hufford (eds) Homoplasy. The Recurrence of Similarity in Evolution, pp. 3—36. San Diego: Academic Press.
Dagan, T. & Martin, W. 2006. The tree of one percent. Genome Biology 7: 118.1—118.7.
DeSalle, R., Agosti, D., Whiting, M., Perez-Sweeney, B., Renson, J., Baker, R., Bonacum, J. & Bang, R. 1996. Cross-roads, milestones, and landmarks in insect development and evolution: Implications for systematics. Aliso 14:305—21.
Doolittle, W.F. 1999. Phylogenetic classification and the universal tree. Science 284: 2124—2128.
Doolittle, W.F. 2000. Uprooting the tree of life. Scientific American, Feb. 2000: 90—95.
Doolittle, W. F. & Bapteste, E. 2007. Pattern pluralism and the Tree of Life hypothesis. PNAS 104:2043—2049.
Gould, S.J. 2002. The Structure of Evolutionary Theory. Cambridge MA: Harvard Univ. Press.
Gray, G.S. & Fitch, W.M. 1983. Evolution of antibiotic resistance genes: the DNA sequence of a kanamycin resistance gene from Staphylococcus aureus. Mol. Biol. Evol. 1: 57–66.
Hertel, J., Lindemeyer, M., Missal, K., Fried, C., Tanzer, A., Flamm, C., Hofacker, I.L., Stadler, P.F. and the Students of Bioinformatics Computer Labs 2004 and 2005. 2006. The expansion of the metazoan microRNA repertoire. BMC Genomics 7: 25.
Huson, D.H. & Bryant, D. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology & Evolution 23: 254–267.
Lane, C.E. & Archibald, J.M. 2008. The eukaryotic tree of life: Endosymbiosis takes its TOL. Trends in Ecology and Evolution 23: 268—275.
Margulis, L. 1998. Symbiotic Planet: A New Look at Evolution. New York: Basic Books.
McInerney, J.O., Cotton, J.A. & Pisani, D. 2008. The prokaryotic tree of life: Past, present...and future? Trends in Ecology and Evolution 23: 276—281.
McKenna, M.C. & Bell, S.K. 1997. [with contributions from G. G. Simpson et al.]. Classification of mammals above the species level. New York: Columbia University Press.
MacLeay, W.S. 1819—1821. Horae entomologicae: or Essays on the Annulose Animals, &c. Vol. 1, Pt. 1 & 2. S. Bagster, London.
Patterson, C. 1988. Homology in classical and molecular biology. Molecular Biology and Evolution 5: 603—625.
Rivera, M.C. & Lake, J.A. 2004. The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature (9th September 2004) 431: 152—155.
Roth, V. 1984. On homology. Biological Journal of the Linnean Society 22:13—29.
Sanderson, M.J. and Hufford, L. (eds) 1996. Homoplasy. The Recurrence of Similarity in Evolution, San Diego: Academic Press.
Simpson, G. G. 1945. The principles of classification and a classification of mammals. Bulletin of the American Museum of Natural History 85:1-350.
Sluys, R. 1989. Rampant parallelism: An appraisal of the use of nonuniversal derived character states in phylogenetic reconstruction. Systematic Zoology 38:350—70.
Wagner, G.P. 1989. The Biological Homology Concept. Annual Review of Ecology and Systematics 20: 51—69; doi:10.1146/annurev.es.20.110189.000411
Williams, D.M. & Embley, T.M. 1996. Microbial Diversity. Annual Review of Ecology and Systematics 27: 569–595.

Sunday, 16 March 2008

Defining Phenetics, Intentions and Mimics

Many reading this blog are probably wondering why we seem to call everything phenetics. Phenetics is a term often used, incorrectly, to describe only a certain type of methodology, namely clustering based on similarity (e.g., neighbor-joining). In fact phenetics is nothing more than Numerical Taxonomy (Sneath & Sokal, 1973), a topic that we have discussed in a previous post (Phenetic "Natural" Classifications).

Phenetics attempts to classify organisms based on overall similarity. An excellent definition of phenetics, which can be found at Wikipedia, goes one step further:

"In biology, phenetics, also known as numerical taxonomy, is an attempt to classify organisms based on overall similarity, usually in morphology or other observable traits, regardless of their phylogeny or evolutionary relation".

Where phenetics becomes problematic is when these classifications are considered to be natural, that is, monophyletic. A monophyletic taxon is based on relationship, namely homology. Homology is not a measurement of similarity but an expression of relationship. Phenetically grouped organisms may not necessarily be more closely related to each other than they are to another group. In other words, phenetics cannot distinguish paraphyly from monophyly. An analogous problem exists in biogeography.
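
To illustrate why overall similarity cannot tell paraphyly from monophyly, here is a minimal sketch using the simple matching coefficient (our choice of similarity measure; the post does not name one) on a hypothetical matrix in which 0 is the ancestral and 1 the derived state. Taxon B shares a derived state (the second character) with C and D, yet its most similar taxon overall is A, because the two share the absence of the states that unite C and D.

```python
def simple_matching(x, y):
    """Proportion of characters in which two taxa have the same state.
    A shared 0 (absence or ancestral state) counts as much as a shared 1."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def shared_derived(x, y):
    """Number of characters in which both taxa have the derived state 1."""
    return sum(a == b == 1 for a, b in zip(x, y))


def most_similar(name, taxa):
    """The taxon with the highest overall similarity to `name`."""
    others = [t for t in taxa if t != name]
    return max(others, key=lambda t: simple_matching(taxa[name], taxa[t]))


# Hypothetical matrix: 0 = ancestral state, 1 = derived state.
taxa = {
    "A": (1, 0, 0, 0, 0, 0),
    "B": (1, 1, 0, 0, 0, 0),
    "C": (1, 1, 1, 1, 1, 0),
    "D": (1, 1, 1, 1, 1, 1),
}

print(most_similar("B", taxa))                # A  (similarity 5/6)
print(shared_derived(taxa["B"], taxa["A"]))   # 1
print(shared_derived(taxa["B"], taxa["C"]))   # 2
```

A grouping of A with B built on this measure rests on shared absences, so it is paraphyletic with respect to the derived states B shares with C and D.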

Parsimony Analysis of Endemicity (PAE) is a method developed in order to find similarities between areas (see Rosen 1988). The method simply requires a data matrix of the presences and absences of taxa in areas. In contrast, cladistic biogeography demands that the taxa used in analysis are monophyletic; however, many fossil groups have no relatives that coexisted in the same period. This means that some paleontologists are forced to deal with higher-taxon biogeography (i.e., at family or ordinal level) or to abandon cladistic biogeography altogether. The idea behind PAE is to use any group within a phenetic context. Monophyly is not a requirement of PAE; therefore absences can be used to cluster organisms into areas, since no notion of homology or relationship is assumed. As with phenetic findings in systematics, some users have made the mistake of assuming that PAE can find phylogenetic signals based on non-evolutionary data, that is, non-homologous information, in the data matrix.
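
PAE scores area diagrams by parsimony on the presence/absence matrix. A minimal sketch, assuming Fitch parsimony on rooted binary trees and a hypothetical all-absent "Root" area used to root the diagram (a common PAE convention, but an assumption here); the areas W-Z and their taxa are invented. Absences enter the count in exactly the same way as presences.

```python
def fitch_length(tree, states):
    """Fitch parsimony length of one character on a rooted binary tree.
    `tree` is a nested 2-tuple of leaf names; `states` maps leaf -> set."""
    def visit(node):
        if isinstance(node, str):               # leaf
            return states[node], 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]


def pae_length(tree, matrix):
    """Total length over all taxa (columns) of a 0/1 area-by-taxon matrix."""
    n_taxa = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {area: {row[j]} for area, row in matrix.items()})
        for j in range(n_taxa)
    )


# Hypothetical areas (rows) x taxa (columns); 1 = present, 0 = absent.
matrix = {
    "Root": "0000",
    "W": "1100",
    "X": "1100",
    "Y": "0011",
    "Z": "0011",
}
tree_1 = ("Root", (("W", "X"), ("Y", "Z")))
tree_2 = ("Root", (("W", "Y"), ("X", "Z")))
print(pae_length(tree_1, matrix), pae_length(tree_2, matrix))   # 4 8
```

The shorter diagram is preferred, whether its groups rest on shared presences or on shared absences.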

On closer examination we find that many systematists and biogeographers intent on discovering homology, monophyly and endemism are nevertheless using phenetic methods. Perhaps this is due to a lack of readily available methods in the literature. After all, cladistics and cladistic biogeography started off as "pen and paper" methods whereas phenetics was always a numerical method (hence numerical taxonomy). The issue at stake is whether using phenetic methods jeopardizes our intent, namely to search for homologies, monophyly and endemic areas. We argue that it does.

The problem lies in transposing data into a data matrix and analysing it with neighbor-joining, clustering, parsimony or compatibility, as these are all phenetic, that is, methods that use overall similarity in order to find classifications. These methods cannot distinguish natural (monophyletic) from artificial (non-monophyletic) classifications.

Our favorite programs are rightly pointed out as black boxes, yet we shrug this off and cite Farris (1983) or recite some algorithm. In some extreme cases we justify our intentions by making sure that our data are compatible with our methods (sensu Patterson 1982). But we cannot continue skirting this issue. Similarity is an anathema that our forebears, Goethe, Vicq d'Azyr, Geoffroy Saint-Hilaire and Owen, the founders of homology, had quickly disposed of. Similarity is the foundation of phenetics, not cladistics. Our intent to find homology, monophyly and endemicity (rather than their superficial cousin, similarity) must be kept in mind when selecting the methods and programs that we use, must it not?

Assumptions held so dearly by some cladists, such as Patterson's test for homology and similarity as a requisite for monophyly, are all phony. Cladists should not use phenetic methods in order to make sense of classification; instead they should use homology and relationships. The only way (if any) in which we are able to use phenetics meaningfully is to treat it as a mimic of the real thing (cladistic pen-and-paper methods). After all, that is what phenetics is about: mimicking reality.

A mimic in cladistics is any phenetic method that attempts to implement a genuine theory or intention. Any phenetic implementation needs to be considered carefully, since such implementations were not originally intended for cladistic or biogeographical analysis. Many of the methods and implementations we use today have existed in statistical and mathematical classifications (e.g., data matrix, parsimony, compatibility, clustering, subtrees). Rather than accepting these methods wholeheartedly as being "cladistic", cladists should fool the mimics. This has been successfully done by a program called TAX (Nelson & Ladiges, 1991). TAX fools the program into treating areas of no relationship as question marks, without treating absences as evidence.
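
The idea of not treating absence as evidence can be illustrated with a toy agreement count. This is not the TAX algorithm (the post gives no details of it); it is a hypothetical sketch in which every absence is recoded as "?" and positions with "?" are skipped. The areas K, L and M are invented: K and L share no taxon, yet with absences coded 0 they agree as often as K and M, which do share one.

```python
def agreements(x, y):
    """Count taxa (positions) where two areas have the same code,
    skipping any position where either area is coded '?'."""
    return sum(a == b for a, b in zip(x, y) if "?" not in (a, b))


def absences_as_unknown(row):
    """Hypothetical recoding: every absence ('0') becomes '?'."""
    return row.replace("0", "?")


# Hypothetical areas x taxa; 1 = present, 0 = absent.
K, L, M = "1000", "0100", "1011"

print(agreements(K, L), agreements(K, M))        # 2 2
K2, L2, M2 = (absences_as_unknown(r) for r in (K, L, M))
print(agreements(K2, L2), agreements(K2, M2))    # 0 1
```

Once absences stop counting, only the shared presence between K and M remains as evidence.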

If cladistics is to survive as an evolutionary field intent on finding homologies and monophyly, it needs to re-examine the phenetic methods that it uses. A field that is becoming dependent on phenetic methodology can easily become phenetic.

The image above was made by David Maddison in 1981 when "... Cladistics versus Phenetics debates were still fresh in people's minds". We hope that the same image may re-spark some of that debate. The image may be found on his website.

References

Farris, J. S. 1983. The logical basis of phylogenetic analysis. pp. 1-47 in Advances in Cladistics, Volume 2, Proceedings of the Second Meeting of the Willi Hennig Society. ed. Norman I. Platnick and V. A. Funk. Columbia University Press, New York.
Nelson, G. & Ladiges, P.Y. 1992. TAS and TAX: MSDOS programs for cladistics, version 3.0. Published by the authors, New York and Melbourne.
Patterson, C. 1982. Morphological characters and homology. In: K. A. Joysey and A. E. Friday (eds.), Problems of Phylogenetic Reconstruction. Systematics Association Special Volume 21: 21-74.
Rosen, B.R. 1988. From fossils to Earth history: applied historical biogeography. In: A.A. Myers and P.S. Giller (eds), Analytical biogeography: an integrated approach to the study of animal and plant distributions, pp. 437–481. Chapman & Hall, London.
Sneath, P.H.A. & Sokal, R.R. 1973. Numerical taxonomy — The principles and practice of numerical classification. W. H. Freeman, San Francisco.

Monday, 3 December 2007

Buddha: Look at the moon, not my finger!

Joe Felsenstein has suggested an analytical example, one he felt we might like to examine. The example is simple:

"If we take a sequence alignment, perhaps an easy case such as an alignment of exon sequences of a gene, and then we run (say) a parsimony algorithm, and consider ourselves to be making an estimate of the unrooted evolutionary tree (perhaps later rooting it by outgroup), what do Ebach and Williams say of this?"(Felsenstein in Comments)

Felsenstein kindly offers a few suggestions ("guesses") as to what we might think. These are as follows:

1. It is not inferring the phylogeny because this process is "phenetic"

2. It is not making a classification so it is fine but not of interest to us

3. It should instead be trying to make a classification

4. It is making a classification but a "phenetic" one so not a good one.

Felsenstein offers a view as to which of the suggestions ("guesses") is correct, opting for number 4: 'It is making a classification but a "phenetic" one so not a good one'.

Of course, we welcome helpful suggestions ("guesses"), as our desire has been (and hopefully will remain) the examination of the process of systematics, a complex field that develops and grows, as does all science. Thus, we crave his indulgence at our dissection of his suggestions in the interest of scientific endeavour.

First, we find it a little troublesome to deal with efforts that are thought "good" or "bad", and do not really know what those words might mean in the context above. To us, phenetics is neither good nor bad. Consider the following. Linnaeus created the Sexual System of classification for plants, a system he acknowledged as artificial. That system still has its uses: when one is faced with a particular plant and needs to know its name, then (usually) that can be achieved by working through the Sexual System. It is an Artificial Classification; it is neither bad nor good (Linnaeus knew that). It is inappropriate when wishing to investigate the natural system; it is appropriate when wishing to find a name.

Second, whether one is "inferring the phylogeny" or just exploring the distribution of homologies, any branching diagram that results can be made into a classification. Thus, points 1—4 above are without meaning.

In our (several) posts we noted that Natural Classification is investigated using homologies, and that similarities, in and of themselves, are not homologies. Consider a matrix of characters, with either 1's and 0's or A's and T's ("…take a sequence alignment…"). What are they? Similarities. The matrix is, one might say, phenetic. The application of UPGMA, or Neighbor-joining, or parsimony, or…well, whatever, cannot change that fact. And it would appear that UPGMA, Neighbor-joining, parsimony and so on are all forms of weighting, regardless of whether one might believe that the 'model' is an accurate representation of the evolutionary process. Now, as we noted, "Phenetics uses a method in order to generate a classification that mimics a natural group. The method for doing so can be useful in order to work out similarities between taxa, but the method is only a mimic." Thus, we might offer the following: much of the last 40 years of exploration of methods has, inadvertently, focused on ways one might modify or adjust a matrix of similarities.
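
The point that the algorithm cannot change what the matrix contains can be seen in a small UPGMA sketch. This is a generic textbook version of average-linkage clustering, not any particular program's implementation; the p-distance, the 0/1 "alignment" and the taxa A-D are hypothetical. The second position is derived in B, C and D only, yet UPGMA pairs B with A because the two share the absence of the states that unite C and D.

```python
from itertools import combinations


def p_distance(x, y):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def pair_key(x, y):
    """Order-independent dictionary key for a pair of cluster labels."""
    return tuple(sorted((x, y)))


def upgma(seqs):
    """Average-linkage (UPGMA) clustering; returns a parenthetical tree."""
    size = {name: 1 for name in seqs}              # cluster label -> leaf count
    dist = {pair_key(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    while len(size) > 1:
        a, b = min(dist, key=dist.get)              # closest pair of clusters
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):                     # size-weighted average distance
                dist[pair_key(merged, c)] = (
                    dist[pair_key(a, c)] * size[a] + dist[pair_key(b, c)] * size[b]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
    return next(iter(size))


# Hypothetical 0/1 "alignment"; the second position is derived in B, C and D.
seqs = {"A": "100000", "B": "110000", "C": "111110", "D": "111111"}
print(upgma(seqs))   # ((A,B),(C,D))
```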

We do not have, nor do we promote, any "favorite approach…". This is not a competition. Systematics (classification, phylogeny) is about homologies and their distribution.

The cladistic revolution of the 1960s was necessary because of palaeontology, its promises, its claims, and what it delivered. Palaeontology was reformed as a consequence, yet its effect on systematics, mostly detrimental, had lasted 100 years.

Perhaps it's time for another revolution.

Friday, 30 November 2007

Wag the Dog: Mimics, False Prophets and Phenetics

"Near enough is not good enough" should be the motto of cladistics. For many, however, near enough is not only better, but something worth pursuing. Phenetics is that "something". It is a mimic, and some of its proponents are false prophets who prefer a "near enough" result to a real understanding. Systematics and biogeography cannot rest on their numerical laurels too long. Already in molecular systematics the numerical method is defining the field. When the mimic starts to dictate what the science should be, we have a severe case of the dog’s tail wagging the dog.

Mimics

An artificial classification is a key or classification based on a particular organ. This forms a System, one that can predict or mimic a natural classification.

Taxonomists, systematists and biogeographers often use artificial classifications or Classification Systems in order to identify and classify taxa. People around the world use classification systems every day. This is one that many learn at school:

Fish have scales and no limbs.
Amphibians lay eggs on land and live in water.
Reptiles lay eggs, have scales and live on land.
Birds lay eggs and have feathers.
Mammals have skin and hair, mothers feed their young milk.

Classification systems are helpful in identifying taxa but they only mimic real relationships. In the case above only mammals and birds are natural (monophyletic) groups, but the classification system for birds may also apply to taxa that are categorized as reptiles. In other words, the system above only mimics the natural group (i.e., birds), but it does use the homologies that define that group.
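
The school-book list can be written as a small rule-based key. A minimal sketch: the trait labels, the rule order and the two example animals are our assumptions, not part of the list itself. It shows the point made above: the rule for birds uses traits of that group, yet it assigns a hypothetical feathered non-avian theropod to Birds just as readily as a pigeon.

```python
def school_key(traits):
    """Apply the school-book rules in a fixed order and return the first
    class whose required traits are all present in `traits` (a set)."""
    rules = [
        ("Mammals", {"hair", "milk"}),
        ("Birds", {"lays eggs", "feathers"}),
        ("Reptiles", {"lays eggs", "scales", "lives on land"}),
        ("Amphibians", {"lays eggs on land", "lives in water"}),
    ]
    for group, required in rules:
        if required <= traits:          # subset test: every rule trait present
            return group
    if "scales" in traits and "limbs" not in traits:
        return "Fish"
    return "unclassified"


pigeon = {"lays eggs", "feathers", "limbs", "lives on land"}
# Hypothetical feathered non-avian theropod: the key cannot tell it apart.
theropod = {"lays eggs", "feathers", "scales", "limbs", "lives on land"}

print(school_key(pigeon), school_key(theropod))   # Birds Birds
```

The key identifies; it does not tell us which of the two animals belongs to the natural group.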

Linnaeus was the first person to define a classification system that attempts to mimic natural groups. The system can still be used today in order to identify plants. What Linnaeus’s classification, or any such classification system, does not do is purport to be a natural method.

A method is a key or classification based on all of the organs of a taxon; methods are sub-divided into artificial and natural depending on their purpose.

Classification methods not only mimic, they also may predict. In either case they attempt to generate classifications that are near the mark. Phenetics uses a method in order to generate a classification that mimics a natural group. The method for doing so can be useful in order to work out similarities between taxa, but the method is only a mimic. Phenetics becomes problematic when it starts getting closer to the mark. In some cases a phenetic analysis can replicate a true relationship – a homology – without the need for homologies. Although these methods are praiseworthy, they do not actually find homologies. A mimic only replicates something, it does not actually discover. A phenetic analysis may for instance replicate a monophyletic group perfectly, using an assortment of homologues, but since the method uses similarity (i.e., non-relationships) it cannot, by definition, discover homologies, even though it replicates them perfectly.

An analogy would be to state that anything that lives in water and lays eggs on land is an amphibian. Although this behavioural trait is more likely to be common amongst toads, frogs, salamanders and newts, it is not a homology, as it is not unique to that group. Birds may lay eggs and bear feathers, but so do a number of theropod groups. Similarity is not a relationship, only a measurement of likeness based on one or more hypotheses.

False Prophets

Phenetics becomes problematic when it mistakes the mimic for the real thing. Certainly phenetics can create a classification system using a method of similarity, but it does not discover natural groups. Therefore the term Natural System is a contradiction. A system cannot be natural as it is based on a single characteristic or assumption and not relationship. Natural groups, as pointed out in the post Phenetic "Natural" Classifications, are not based on an a priori assumption:

"... system of classification is the more natural the more propositions there are that can be made regarding its constituent classes" (Sokal & Sneath 1963: 19).

Sokal and Sneath (1963) have turned the mimic into a natural group.

Phenetics as purveyor of natural groups is erroneous and prophetic. Stating that natural groups can be reached through a system of quantification and similarity is appealing to those who rely on statistical programs. Most systematists and biogeographers rely on such programs and have swallowed the “phenetic prophecy” hook, line and sinker. Natural groups, it seems, are just a matter of quantity.

Wag the Dog

The phenetic prophecy states that similarity* is relationship, and can discover natural groups. This is wagging the dog.

Taxonomists, systematists and biogeographers can only discover patterns, homologies that give us insight into relationship. Before we do this we may impose a system of beliefs, hypotheses and theories about our own groups and their relationships. Sometimes we test these assumptions by discovering homologies and find that we were right. That is the nature of a robust scientific discipline. Once we turn that around and impose our own “natural” law, then we can only formulate more hypotheses in differing ways, never discovering, only generating. Molecular systematics is now in a unique position to learn from 300 years of systematic theory that has discovered time and time again that homology is not similarity. Unfortunately, many in the field ignore the past systematic literature and read that of the phenetic prophecy instead.

One day someone bent over a PCR machine may come to realise that they are part of a 300-year cycle of wagging.

*There are two forms of similarity. One is that of simile “That kangaroo looks like a rat”. The other is quantifiable and is born from statistics (i.e., divergence and possibility) “The ape is 22% banana”. We refer to the latter form throughout this post.
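
The second, quantifiable sense of similarity is typically a count over aligned positions. A minimal sketch of percent identity, assuming equal-length, gap-free aligned sequences; the sequences are hypothetical, and the 22% in the footnote is rhetorical, not something this code reproduces.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying identical residues.
    Assumes the sequences are already aligned, equal in length and gap-free."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Hypothetical aligned sequences.
print(percent_identity("ACGTACGTAC", "ACGTTCGAAC"))   # 80.0
```

The number says how alike two sequences are; it says nothing about which positions are homologues or how the taxa are related.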

References

Sokal R.R. & Sneath P.H.A. 1963. Principles of Numerical Taxonomy. W. H. Freeman, San Francisco.

Thursday, 29 November 2007

Natural and Artificial Classification: A reply to Wilkins

The following post is a reply to John Wilkins’s post The philosophy of classification on his blog Evolving Thoughts.

An Uninformed Consensus

John Wilkins, in his recent post, believes that our view is "radical" because

"… they have presented some views on classification that do, indeed, differ from the received consensus."

We beg to differ.

In late 20th- and early 21st-century literature there are very few discussions on the nature of classification. Most revolve around explaining existing classifications (e.g., Reptilia) or defending poorly defined taxonomic groups that fail to form natural groups (e.g., paraphyly). It is these debates (i.e., paraphyly versus monophyly) that would benefit from the discussions of late 19th- and early 20th-century morphologists, who did hold a consensus view of natural and artificial classifications. That consensus was this:

We then follow a Natural Method, which cannot be called a system, because it is destitute of any unity of principle. (Candolle & Sprengel, 1821)

It is our belief that it was the pursuit of explanations for existing classifications that ended this debate, and therefore any consensus. Furthermore, it is the equation of homology with similarity that radically altered how we view classifications, leading to the almost Fukuyamaist statement that,

"I would say that the effort put into this controversy is further evidence that systematists do not have their priorities straight. In their day-to-day work they really do not make much use of classifications, but they show a strange obsession with fighting about them for reasons that seem to me to be an historical curiosity" (Felsenstein 2005)

Currently there is no consensus over natural or artificial classifications. The topic is a moot point and very few concern themselves with its relevance to 21st-century systematics and biogeography. As systematists we are more or less tied to the consensus of the past, namely to the literature of the 19th century and early 20th century. In that sense we are not “radicals", but rather “old fashioned”.

Similarity and Homology

Similarity, as expressed in the usual kinds of data matrices, is 11 (or, in the molecular version, AA), and it is not a relation. The 11 and the AA are, if anything, homologues, the parts, the 'namesakes' as Owen called them. We see homology as a relation: 0(11), or the molecular version, G(AA). We stated earlier:

"...all molecular systematic studies are phenetic as they ignore relationship, that is, homology". One might expand that and say, "...all numerical systematic studies are phenetic as they ignore relationship, that is, homology."

This would be more accurate.
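
The difference between similarity and relation can be made concrete for a single aligned character. In this hypothetical sketch the similarity view lists every pair of taxa with the same state, shared 0s included, while the relational view keeps only the group of taxa whose state differs from an outgroup, written in the spirit of the 0(11) notation above. Treating the outgroup's state as the reference is our assumption, not something the post specifies.

```python
from itertools import combinations


def pairwise_similarities(column):
    """Similarity view: every pair of taxa sharing a state (11 and 00 alike)."""
    return [(a, b) for a, b in combinations(sorted(column), 2)
            if column[a] == column[b]]


def relation(column, outgroup):
    """Relational view: outgroup state, then the taxa that differ from it,
    written like the 0(11) notation, e.g. '0(B,C)'."""
    reference = column[outgroup]
    group = sorted(t for t, s in column.items()
                   if t != outgroup and s != reference)
    return f"{reference}({','.join(group)})"


# One hypothetical aligned character; "Out" is the outgroup.
column = {"Out": "0", "A": "0", "B": "1", "C": "1"}
print(pairwise_similarities(column))   # [('A', 'Out'), ('B', 'C')]
print(relation(column, "Out"))         # 0(B,C)
```

The shared 0 of A and Out appears as a similarity but plays no part in the relation.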

In response to John’s comment,

"I'm not sure I follow this. According to current usage, molecular systematics does rely on homologies: they have a number of special terms devoted to identifying them: paralogy, xenology and orthology. Of course, they often don't use homology properly. And to identify a homology in molecular biology you need to do some prior work; homology is an inference from sequence similarity (including eyeball alignment). In short, if I understand the argument, molecular systematics derives homology from similarity".

In fact we would suggest that it would be more accurate to say:

"... molecular systematics does rely on HOMOLOGUES: they have a number of RELATIONS DERIVED FROM them: paralogy, xenology and orthology....And to identify a HOMOLOGUE in molecular biology you need to do some prior work; HOMOLOGUES ARE inferenceS from sequence similarity (including eyeball alignment). In short, if I understand the argument, molecular systematics derives HOMOLOGUES from similarity ..."

This certainly is not radical. What we are suggesting is that de Candolle (1813) presented a very clear account of classification, an account still of significance today.

Haeckel and Classification

In our understanding, Ernst Haeckel did more than most to promote the genealogical view of species relationships. It might be fair to say that all our genealogical endeavours stem from Haeckel. Adolf Naef (1917, 1919) was the first to critique that viewpoint. His interest was in natural classification. Hennig (1950), quite deliberately, focused on Naef. Thus, it might be fair to say that Hennig's efforts were directed towards rehabilitating Haeckel. Further, one might see Systematics and Biogeography (Nelson & Platnick, 1981) as a further detailed critique of Haeckel (perhaps the most detailed critique available) and a restatement of de Candolle's viewpoints on classification. In this sense, cladistics sensu Nelson & Platnick is of greater significance than cladistics sensu computer programs.

We would venture the suggestion that Sober (1988) mistook cladistics sensu Farris (parsimony sensu Farris) for the generally accepted view (in the mid-1980s that might have been possible). In fact, Sober deliberately excludes the more general view, as if the argument really were about parsimony versus likelihood, one algorithm versus another:

"Because this work is about phylogenetic inference, not classification, nothing will be said about the current controversy concerning so-called 'pattern' cladism." (Sober, 1988:8, footnote 7).

Thus, in our view, the more general study of classification excludes Sober's work as a relevant commentary on the matter.

References
Candolle, A.P., de, & Sprengel, K. 1978. Elements of the philosophy of plants. Reprint of the 1821 ed. New York, NY.
Hennig, W. 1950. Grundzüge einer Theorie der phylogenetischen Systematik, Deutscher Zentralverlag, Berlin.
Naef, A. 1917. Die individuelle Entwicklung organischer Formen als Urkunde ihrer Stammesgeschichte: (Kritische Betrachtungen über das sogenannte "biogenetische Grundgesetz"), Verlag von Gustav Fischer, Jena.
Naef, A. 1919. Idealistische Morphologie und Phylogenetik (zur Methodik der systematischen Morphologie), Verlag von Gustav Fischer, Jena.
Nelson, G. & Platnick, N.I. 1981. Systematics and biogeography. Cladistics and vicariance. Columbia University Press, New York.
Sober, E. 1988. Reconstructing the Past: Parsimony, Evolution, and Inference. MIT Press, Cambridge, Massachusetts.

Artificial and Natural Classifications: A Clarification

It was not by accident that we referred to de Candolle (1813): "Naef's concern was with the discovery of natural, as opposed to artificial classification, a problem examined in detail by A. P. de Candolle (1813)".

This is what de Candolle had to say about artificial classifications:

"Others have as their essential goal to give to persons who know nothing of the names of plants an easy way to discover the names in the books by inspection of the plant itself. These classifications have been given the name of Artificial Methods."

And,

"...there are those persons who want to study plants, either in themselves, or in their real relations among themselves, and to class them so that those plants most closely related in the order of nature are also those most closely related in our books. These classifications have received the name of Natural Methods."

De Candolle distinguishes between Systems and Methods.

A system is a key or classification based on a particular organ - leaf, flower, etc.

A method is a key or classification based on all of the organs of a plant; methods are sub-divided into artificial and natural depending on their purpose.

De Candolle again:

"classes that are truly natural, established on the basis of one of the major functions, are necessarily the same as those established on the basis of the other."

That is, congruence.
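The idea of congruence can be made concrete with a small sketch. Assuming, purely for illustration, that each classification is written as a list of groups of taxon names (the taxa A to D and the organ labels below are hypothetical), two classifications are congruent when they recover the same groups. This is only a set comparison meant to show the idea, not a procedure de Candolle himself described.

```python
def as_groups(classification):
    """Represent a classification as a set of groups, each a frozenset of taxon names."""
    return {frozenset(group) for group in classification}


def shared_groups(first, second):
    """Groups recovered by both classifications, i.e. the congruent part."""
    return as_groups(first) & as_groups(second)


def fully_congruent(first, second):
    """True when both classifications recover exactly the same groups."""
    return as_groups(first) == as_groups(second)


# Hypothetical classifications of four taxa, each built from a different organ
by_flower = [{"A", "B"}, {"A", "B", "C"}]
by_leaf = [{"A", "B", "C"}, {"A", "B"}]  # same groups, listed in another order
by_root = [{"A", "B"}, {"C", "D"}]       # agrees with by_flower on {A, B} only

print(fully_congruent(by_flower, by_leaf))   # True
print(fully_congruent(by_flower, by_root))   # False
print(sorted(sorted(g) for g in shared_groups(by_flower, by_root)))  # [['A', 'B']]
```

Comparing groups as sets ignores the order in which they happen to be listed, which is why by_flower and by_leaf count as the same classification.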

Barcoding, based on "a particular organ" (here interpreted as a piece of DNA), is, in this sense, a system. It might also be seen as an artificial classification, as its purpose is to find the name of any given plant or animal.
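To see why identification by a single marker behaves like a system in de Candolle's sense, consider a toy identification step: one marker, one distance, one name returned. This is an assumption-laden illustration, not how any particular barcoding pipeline works (real pipelines rely on curated reference libraries and more elaborate criteria); the species names and sequences are invented, and the sequences are assumed to be already aligned to equal length.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def identify(query: str, references: dict[str, str]) -> str:
    """Return the name whose reference sequence is closest to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical reference library: invented names mapped to short aligned fragments
references = {
    "Species alpha": "ACGTACGTAC",
    "Species beta": "ACGTTCGTAA",
    "Species gamma": "TCGAACGTTC",
}

print(identify("ACGTACGTAT", references))  # Species alpha
```

Nothing in this lookup asks whether the matching sites are homologous or whether other organs agree; it only returns a name, which is the purpose de Candolle assigned to artificial methods.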

Now, is molecular systematics a system or a method? It too is based upon "a particular organ", so it too might be considered a system. If it were considered a method instead, there would be no notion of congruence at all, as no other datasets are taken into consideration. Molecular systematics, as a form of measuring similarity, constitutes a system, not a method.

Ancestors and other mechanical explanations are not of any concern in the debate between artificial and natural classifications. One does not decide on homology in advance. It is either there or it is not. Homology, as we understand it, is a relation. A similarity such as 11, or AA, is not a relation. Thus, all molecular systematic studies are phenetic, as they ignore relationship, that is, homology.

Wednesday, 28 November 2007

Adolf Naef - A Potted Biography

Who was he?
Adolf Naef was a Swiss systematist, malacologist and proponent of systematic morphology. He was born in Niederhelfenschwil on 1 May 1883 and passed away on 11 May 1949.

What did he do?
Naef studied at the University of Zurich under the guidance of Arnold Lang (1855–1914), a former Professor at Jena University and close friend of Ernst Haeckel. Naef visited and worked at Anton Dohrn’s Zoological Station in Naples, Italy, in 1908, studying the squid Loligo vulgaris, the subject of his dissertation (Naef, 1909a, b). Naef returned to the Naples Zoological Station in the mid-1920s to study cephalopods, publishing a two-part monograph in the Station’s Fauna und Flora des Golfes von Neapel und der angrenzenden Meeres-Abschnitte (Fauna e Flora del Golfo di Napoli) series (Naef 1921d, 1923c, 1928, later translated into English, Naef, 1972a, 1972b, 2000), which formed the basis for his two short but significant monographs on systematic theory (Naef, 1917, 1919). In 1922 he became Professor at the University of Zagreb, and in 1927 Professor of Zoology at the University of Cairo.

What’s the big idea?
Naef’s studies were framed within Systematische Morphologie (systematic morphology) (Naef, 1917, 1919), the details of which he had sketched out as early as 1913:

“Phylogenetic and natural systematics deal with the same factual material, and although each has different basic concepts, both disciplines can be united in a single concept because their objects are so similar. I have therefore proposed the name ‘systematic morphology’ for this concept (Naef, 1913: 344)…It is intended to show that there is an inner relationship between natural systematics and (comparative) morphology” (Naef, 1921-23: 7, from the English translation, Naef, 1972a: 12).

Naef’s concern was with the discovery of natural, as opposed to artificial classification, a problem examined in detail by A. P. de Candolle (1813). Naef put it thus:

“For decades, phylogenetics lacked a valid methodological basis and developed on the decayed trunk of a withering tradition rooted in the idealistic morphology and the systematics of pre-Darwinian times. There was talk of systematic ‘tact’ and morphological ‘instinct’, terms which were felt rather than understood and consequently insufficient to form the frame of a science which required sound definitions and clearly formulated principles” (Naef, 1921–23: 6–7, from the English translation, Naef, 1972a: 12).

And thus was born ‘Systematische Morphologie’, perhaps the beginnings of cladistics, in its most general form (of which more in a further post). Towards the end of his career, Naef published several detailed accounts of ‘Systematische Morphologie’ (Naef, 1931a, b, 1933a), including a succinct summary in the widely read 2nd edition of the Handwörterbuch der Naturwissenschaften (Naef, 1933b).

Naef might be considered a man out of time – as might many morphologists today, relative to the explosion of molecular data. In Naef’s day, palaeontology and the post-World War II hegemony of the modern synthesis were attracting young minds. Today it is molecular systematics and DNA barcoding – versions of artificial classification.

References

Candolle, A.-P. de 1813. Théorie élémentaire de la botanique ou exposition des principes de la classification naturelle et de l'art de décrire et d'étudier les végétaux. Deterville, Paris.
Naef, A. 1909a. Die Organogenese des Cölomsystems und der zentralen Blutgefässe von Loligo. Jenaische Zeitschrift für Naturwissenschaft 45 (N.F. 38): 221–266.
Naef, A. 1909b. Die Organogenese des Cölomsystems und der zentralen Blutgefässe von Loligo. Inaugural-Dissertation, Universität Zürich, 46 pp.
Naef, A. 1913. Studien zur generellen Morphologie der Mollusken. 2. Teil. Das Cölomsystem in seinen topographischen Beziehungen. Ergebnisse und Fortschritte der Zoologie 3: 329–462.
Naef, A. 1917. Die individuelle Entwicklung organischer Formen als Urkunde ihrer Stammesgeschichte (Kritische Betrachtungen über das sogenannte “biogenetische Grundgesetz”). Verlag von Gustav Fischer, Jena.
Naef, A. 1919. Idealistische Morphologie und Phylogenetik (zur Methodik der systematischen). Verlag von Gustav Fischer, Jena.
Naef, A. 1921–23. Die Cephalopoden (Systematik). In: Fauna e Flora del Golfo di Napoli, Monograph 35 (I-1), Pubblicazioni della Stazione Zoologica di Napoli. R. Friedländer & Sohn, Berlin, pp. 1–863.
Naef, A. 1931a. Allgemeine Morphologie. I. Die Gestalt als Begriff und Idee, pp. 77–118 in Bolk, L., Göppert, E., Kallius, E. & Lubosch, W. (editors), Handbuch der vergleichenden Anatomie der Wirbeltiere 1. Urban & Schwarzenberg, Berlin.
Naef, A. 1931b. Phylogenie der Tiere, pp. 1–200 in Baur, E. & Hartmann, M. (editors), Handbuch der Vererbungswissenschaft 13 (3i). Gebrüder Borntraeger, Berlin.
Naef, A. 1933a. Die Vorstufen der Menschwerdung. Eine anschauliche Darstellung der menschlichen Stammesgeschichte und eine kritische Betrachtung ihrer allgemeinen Voraussetzungen. Verlag von Gustav Fischer, Jena.
Naef, A. 1933b. Cephalopoda, pp. 293–310 in Dittler, R., Joos, G., Korschelt, E., Linck, G., Oltmanns, F. & Schaum, K. (editors), Handwörterbuch der Naturwissenschaften, 2nd edition, volume 2. Verlag von Gustav Fischer, Jena.
Naef, A. 1933c. Morphologie der Tiere (Allgemeines und Grundsätzliches), pp. 3–17 in Dittler, R., Joos, G., Korschelt, E., Linck, G., Oltmanns, F. & Schaum, K. (editors), Handwörterbuch der Naturwissenschaften, 2nd edition, volume 7. Verlag von Gustav Fischer, Jena.
Naef, A. 1972a. Cephalopoda. Fauna and Flora of the Bay of Naples (Fauna und Flora des Golfes von Neapel und der angrenzenden Meeres-Abschnitte), Monograph 35, Part I, [Vol. I], Fascicle I. Smithsonian Institution Libraries, Washington.
Naef, A. 1972b. Cephalopoda (systematics). Fauna and Flora of the Bay of Naples (Fauna e Flora del Golfo di Napoli), Monograph 35, Part I, [Vol. I], Fascicle II. Smithsonian Institution Libraries, Washington.
Naef, A. 2000. Cephalopoda. Embryology. Fauna and Flora of the Bay of Naples (Fauna und Flora des Golfes von Neapel), Monograph 35, Part I, Vol. II [Final part of the Monograph No. 35], pp. 3–461. Smithsonian Institution, Washington.

Monday, 26 November 2007

The Curse of Complexity

The world is biologically complex. Scientists have always known this and it is not a new discovery. Rather than accepting complexity as an everyday wonder, scientists are surprised that the world is indeed complex and some are annoyed with those who describe complexity in simple statements or methods. Here are a couple of examples:

"Historical biogeography has recently experienced a significant advancement in three integrated areas. The first is the adoption of an ontology of complexity, replacing the traditional ontology of simplicity, or a priori parsimony; simple and elegant models of the biosphere are not sufficient for explaining the geographical context of the origin of species and their post-speciation movements, producing evolutionary radiations and complex multi-species biotas" (Brooks, 2005: 79).

"The problem can be reduced to deciding when a collection of trees—a 'forest'—is a better explanation for evolutionary relationships among a set of sequences than is a single tree" (Ané & Sanderson 2005: 146).

We see no problem with simplifying a complex world in order to communicate in the form of classifications. We know, for instance, that a cat is a highly complex creature. It is so complex, in fact, that we rely on the term cat (or Felis silvestris) and on the classification of the Felidae to communicate that we are referring to a tabby and everything associated with its complexity. These terms and this classification are not, however, sufficient to explain the highly complex nature of cat behaviour, sexual reproduction or neural activity. Classification is not about explaining complexity; that is the job of general biology.

Classification, an integral part of comparative biology, attempts to convey what information we have (e.g., about cats) without having to divulge and detail all of its complexity (e.g., sexual behaviour). The aim of classification is to summarize (not reduce*) relationships based on known homologues without recourse to inference. That is, comparative biology is about "simplicity", not causality or interconnectivity (sensu reductionism). We can, for instance, classify all mammals based on their hair and vertebrates based on the presence of a backbone. The more complexity we introduce, the fewer unique traits there are to compare (e.g., eye colour). Since comparative biology is about comparing and classifying, explicit, unobserved explanatory mechanisms have little to do with classification. They are statements about a type of complexity reserved for general biology (e.g., physiology, behaviour, sexual reproduction). Although such explanations concern unique events (or series of events) and rest on careful consideration of general biological laws and processes, they can nonetheless be represented by a single classification.

Let us say, for instance, that the trilobite Eoharpes guichenensis evolved into E. cristatus, which then evolved into E. primus. This can be represented as an anagenetic event and drawn accordingly. Another person may object to this explanation and suggest that E. guichenensis gave rise to both E. cristatus and E. primus through cladogenesis. A third may point out that both explanations overlook the possibility that E. guichenensis evolved into E. primus, and E. primus into E. cristatus.

Regardless of how these species of Eoharpes evolved, the phylogenetic trees can be summarized, or simplified, as the relationships in the cladogram E. guichenensis (E. cristatus, E. primus). Moreover, the nodes on the cladogram are not events, ancestors or morphotypes, but simply junctions supported by homologues. Yet rather than being accepted as a means of communicating three or more different evolutionary scenarios, the cladogram is rejected as too simplistic, or as though it were itself an explicit scenario (e.g., the "cladification" of Mayr & Bock, 2002).
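That several scenarios collapse into one cladogram can be checked with a toy sketch. Assume, for illustration only, that each scenario is written as ancestor-to-descendant links, with the first scenario read as a chain from E. guichenensis through E. cristatus to E. primus, and that a group is simply every species descended from a given species. Ancestry is used here only to encode the rival scenarios; the grouping that comes out is the same in all three, the single group {E. cristatus, E. primus}, that is, E. guichenensis (E. cristatus, E. primus).

```python
from collections import defaultdict

G, C, P = "E. guichenensis", "E. cristatus", "E. primus"

# Each hypothetical scenario is a list of (ancestor, descendant) links
scenarios = {
    "anagenesis: guichenensis -> cristatus -> primus": [(G, C), (C, P)],
    "cladogenesis: guichenensis -> cristatus, primus": [(G, C), (G, P)],
    "anagenesis: guichenensis -> primus -> cristatus": [(G, P), (P, C)],
}


def descendants(taxon, children):
    """Every species descended from taxon, excluding taxon itself."""
    found, stack = set(), list(children.get(taxon, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def groups(links):
    """All descendants of each species, kept as a group when it has two or more members."""
    children = defaultdict(list)
    for ancestor, descendant in links:
        children[ancestor].append(descendant)
    result = set()
    for taxon in children:
        below = descendants(taxon, children)
        if len(below) >= 2:
            result.add(frozenset(below))
    return result


for name, links in scenarios.items():
    print(name, "->", sorted(sorted(g) for g in groups(links)))
```

Each line prints the same grouping, [['E. cristatus', 'E. primus']], whichever scenario it came from.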

As systematists and biogeographers, that is, comparative biologists, we study the shadows of the past. At best we are able to find gross relationships between taxa or areas. The ability to extract any pattern at all from the bits and pieces of information at hand is an extraordinary achievement, but for some this is not enough. A complex world, it seems, must be shown to be complex, as though this were not already appreciated. Communicating and understanding such complexity is impossible without "simplification", that is, classification. Simplifying the complexity that surrounds us is not a crime but a way to understand the world and to communicate that information to others. Without classification, complexity becomes a curse, leaving us dumbfounded in a sea of information.

*It is important to note that reduction is not simplification. Mechanical explanations, for instance, are reductions. The philosophy of reductionism revolves around causality, not natural classification.

References

Ané, C. & Sanderson, M.J. 2005. Missing the Forest for the Trees: Phylogenetic Compression and Its Implications for Inferring Complex Evolutionary Histories. Systematic Biology 54: 146–157.
Brooks, D.R. 2005. Historical biogeography in the age of complexity: expansion and integration. Revista Mexicana de Biodiversidad 76: 79–94.
Mayr, E. & Bock, W.J. 2002. Classifications and other ordering systems. Journal of Zoological Systematics and Evolutionary Research 40: 169–194.

Monday, 5 November 2007

Haeckel, Hennig and History: Evolving Thoughts and Words

John Wilkins, in his eminently readable and ever provocative blog Evolving Thoughts, presents an account of some historical matters relevant to Natural and Artificial Classifications, matters that might illuminate the differences of opinion between Joe Felsenstein and ourselves. To be sure, we differ on certain fundamental matters. But the issue of natural classification is a subject that might repay closer attention and discussion. John's history features a cast of worthy individuals (Adanson, Linnaeus, Agassiz, Macleay, etc.), many of whom made valuable contributions to finding the means by which natural classifications might be discovered. They all, to one degree or another, had some sort of interpretation of that classification. They all, to one degree or another, had some sort of axe to grind. Never mind.

John moves on to note that "...it is with Haeckel and the early German paleontologists of the 20th century that phylogenetic relations become the core of classification, and we all know, of course, that Hennig defined a natural group as a monophyletic group." Haeckel represents something of a departure, and one we see as significant. Here is Agassiz on Haeckel:

"It is not that I hold Darwin himself responsible for these troublesome consequences. In the different works of his pen, he never made allusion to the importance that his ideas could have for the point of view of classification. It is his henchmen who took hold of his theories in order to transform zoological taxonomy" (Agassiz, 1869: 375, our translation; see also http://www.athro.com/general/atrans.html).

Those henchmen included Haeckel. Most of Haeckel's genealogical trees ('phylogenies') were linear schemes of hypothesized relationships, with some taxa 'giving rise' to others, that is, paraphyletic groups not so much created by him (many were, of course) but retained and explained in terms of ancestry and descent, in terms of evolutionary relationships, relative to a particular model of change. It was to Hennig's credit that Haeckel's paraphyletic groups were exposed for what they were: empty conventions. And thus, a circle was closed and certain groups understood as not part of the discovery process of classification - or so it seemed. Haeckel's problem was taking a viewpoint (ancestry and descent) and interpreting classification from that perspective.

Now that's not a whole million miles away from the current viewpoint: find a model of evolution and interpret the data from that point of view. Still, again, never mind. What comes shining through most of the earlier contributions to the debate is that, one way or another, Adanson, Linnaeus, Agassiz and Macleay, among others, did have a notion of what is central to classification: homology. So when John suggests that "...It does not seem to me that cladistic classification is in possession of a notion of taxon that grounds its classifications" he omits consideration of homology. But he is not alone. The entire crop of recently published books dealing with the 'mathematisation' of phylogeny does not deal with that subject at all. Thus, or so it seems to us, the 'mathematisation' of classification (phylogeny) has led to a profusion of artificial methods.

We finished a recent (as yet unpublished) paper with the following words, the first few are from Gareth Nelson:

'"What, then, of cladistics in relation to the history of systematics? If cladistics is merely a restatement of the principles of natural classification, why has cladistics been the subject of argument? I suspect that the argument is largely misplaced, and that the misplacement stems, as de Candolle suggests, from the confounding goals of artificial and natural systems" (Nelson, 1979, p. 20). Cladistics is concerned with homology, monophyly, evolutionary patterns, taxa (species), and natural classifications. That is, natural classification is concerned with relationships.'

PS. One of us [DMW] is a Londoner. We are aware (sometimes painfully) of the relationship between Australia and the UK capital city, the latter a onetime plentiful source of persons to inhabit the former. The 'relationship' between Australians and Londoners is such that, when travelling in the United States, a London accent is often mistaken for an Australian one. We mention these facts partly because the other half of this pair [MCE] is an Australian, and partly because the word 'Barny' is also said to originate from Cockney rhyming slang, as in Barney Rubble = trouble, and thus (if true) a mingling of Australian slang, London slang and American cartoon characters. Words really do have a life of their own!

Monday, 22 October 2007

Phenetic "Natural" Classifications

Why would anyone talk about Phenetic "Natural" Classifications? Strangely, the concept turned up in a recent review of Johann-Wolfgang Wägele's book Foundations of Phylogenetics by Norman Platnick in The Quarterly Review of Biology (Vol. 81: 56–57).

What caught our eye was the following:

"Phenetics is the theory that clustering by raw similarity (i.e., by counting as significant both the presence and absence of characters, the 0s as well as the 1s in data matrices) will retrieve natural groups" (Platnick 2007: 56).
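What "counting the 0s as well as the 1s" does can be illustrated with two textbook coefficients. The simple matching coefficient counts shared absences (0-0) as agreement, as raw-similarity clustering does; the Jaccard coefficient ignores them. The three rows below are an invented matrix, and Jaccard is used only as a contrast: it is still a similarity measure, not a test of homology.

```python
def simple_matching(a, b):
    """Raw similarity: shared presences (1-1) and shared absences (0-0) both count."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of characters")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def jaccard(a, b):
    """Counts shared presences only; shared absences (0-0) are ignored."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    present = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / present if present else 0.0


# Hypothetical rows of a 0/1 data matrix
x = [1, 0, 0, 0, 0, 0]
y = [0, 1, 0, 0, 0, 0]  # shares no presence with x, only absences
z = [1, 1, 1, 1, 0, 1]  # shares the first presence with x

print(simple_matching(x, y), simple_matching(x, z))  # x looks closer to y
print(jaccard(x, y), jaccard(x, z))                  # x looks closer to z
```

Here the apparent closeness of x and y under simple matching rests entirely on characters both lack.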

Phenetics and Natural Groups? We had to investigate.

The concept of Natural Groups or a Natural Classification in phenetics was championed by P.H.A. Sneath and R.R. Sokal. Their claim followed Gilmour's dictum, namely that a "... system of classification is the more natural the more propositions there are that can be made regarding its constituent classes" (Sokal & Sneath 1963: 19).

If we look at Gilmour (1951) we see that his definition states: "In the general theory of classification, classifications which serve a large number of purposes are called natural, while those serving a more limited number of purposes are termed artificial" (Gilmour 1951: 401).

It is clear that the meaning of the term "Natural" has been misinterpreted, both by Gilmour and by Sokal & Sneath. No one who wished for a Natural Classification would have bought into the idea that Natural = more data whereas Artificial = less data. Moreover, Sneath and Sokal (1973) went as far as to defend their version of natural classification by invoking A. P. de Candolle's distinction, namely that Artificial Classifications, or Systems (e.g. Linnaeus' system), should be rejected in favour of Natural Classifications, or Methods (Candolle, 1813), a concept that was also supported by Goethe.

Gilmour's Natural Classification is a System of Classification and not a Natural Classification or Method. The former imposes a way to order nature (i.e. overall similarity), whereas the latter discovers the way nature is ordered (homology and monophyly). The mistake is monumental and is one that often gets made (e.g. the PhyloCode).

References

Candolle, A.-P. de 1813. Théorie élémentaire de la botanique. Deterville, Paris.
Gilmour, J.S.L. 1951. The development of taxonomic theory since 1851. Nature 168: 400–402.
Sneath, P.H.A. & Sokal, R.R. 1973. Numerical Taxonomy. W. H. Freeman, San Francisco.
Sokal, R.R. & Sneath, P.H.A. 1963. Principles of Numerical Taxonomy. W. H. Freeman, San Francisco.