Such expressions as that famous one of Linnæus, and which we often meet with in a more or less concealed form, that the characters do not make the genus, but that the genus gives the characters, seem to imply that something more is included in our classification, than mere resemblance. I believe that something more is included; and that propinquity of descent,—the only known cause of the similarity of organic beings,—is the bond, hidden as it is by various degrees of modification, which is partially revealed to us by our classifications (Darwin, 1859, p. 413f).

Sunday, 6 November 2011

The Autonomous Algorithm: Malpractice in Theory

In our last post we introduced the topic of the Autonomous Algorithm, a black box that acts as the foundation for a theory and method. In this post we explain what we mean by theory and method and why we believe that no tool can function as a logical foundation. Doing so is a form of malpractice.

For all the non-philosophers reading this post, we define theory as a set of principles on which an activity is based. One of the principles underlying the study of geophysics, for instance, is that radio waves and sound waves penetrate the earth to different degrees; another is that a reflected sound wave can tell us the depth and density of an object, such as a rock. These principles are based on physics, not on the actual program that models the depth and density of the rock; basing them on the program would be putting the cart before the horse. If we change the way we model the results of our acoustic test, we do not change the underlying principles of physics.
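To make the separation concrete, here is a minimal sketch (our own illustration, assuming the simplest possible single-layer model, not any real survey software) of how a program might estimate depth from a reflected sound wave. However the code is rewritten, the physical principle it rests on stays the same.

```python
# Illustrative sketch only: a single-layer reflection model.
# The physics (depth = velocity * two-way travel time / 2) is the theory;
# this particular function is just one way of modelling the measurement.

def reflector_depth(two_way_time_s: float, velocity_m_per_s: float) -> float:
    """Estimate the depth of a reflector from a reflected sound wave."""
    return velocity_m_per_s * two_way_time_s / 2.0

# Invented example: a reflection returning after 0.5 s through rock at ~3000 m/s
print(reflector_depth(0.5, 3000.0))  # 750.0 metres
```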

A method is a procedure for accomplishing something. Methods are generally activities that can be done with pen and paper (although they are sometimes easier when automated), in which we set out the steps needed, for instance, to work out what lies beneath a particular surface. The implementation is the tool used to carry out the method, usually a computer algorithm. So, an algorithm is a tool that is based on a method that is underpinned by a theory. This seems simple enough, but it is often misinterpreted.
Take parsimony for example (and by parsimony we mean Wagner parsimony). It is a method that has several implementations. It is not a theory. The theory is phylogenetic systematics. In recent times, however, many have unwittingly chosen to treat the implementation as a tool for testing a method that in turn decides what the theory should be. Not only is this non-empirical and non-scientific, it is also a form of malpractice within systematics.

Algorithms are simply tools. The algorithms in parsimony programs, for example, are not capable of recognising reversals, parsimony, evolution, plesiomorphy, homology or synapomorphy. They are tools that manipulate binary digits in certain ways. When character-state 0 in character 1 of Taxon A is forced to be basal (because of a 0 present in the outgroup), its appearance further up the branching diagram is interpreted as a reversal by the operator (us), not by the algorithm. Parsimony programs have no concept of transformation. That concept lives with the operator (us).
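To illustrate the point, here is a hypothetical sketch (our own toy code, not taken from any actual parsimony program) of what such an algorithm does for a single binary character on a fixed tree: it counts state changes and returns a number. The tree, taxa and states below are invented for the example.

```python
# Hypothetical sketch: Fitch-style counting of state changes for one
# binary character on a fixed, rooted tree given as nested tuples.
# (For binary characters this agrees with Wagner counting.)
# The code manipulates sets of 0s and 1s; it has no concept of
# 'reversal', 'plesiomorphy' or 'transformation'.

def fitch_steps(node, states):
    """Return (possible state set at node, number of changes below node)."""
    if isinstance(node, str):                  # a terminal taxon
        return {states[node]}, 0
    left, right = node
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    common = left_set & right_set
    if common:                                 # state sets overlap: no extra step
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

# Invented example: the outgroup has state 0, and Taxon D shows 0 again
# inside a clade that is otherwise state 1.
tree = ("Outgroup", (("A", "B"), ("C", "D")))
character_1 = {"Outgroup": 0, "A": 1, "B": 1, "C": 1, "D": 0}

state_set, steps = fitch_steps(tree, character_1)
print(steps)  # 2 -- just a count of changes
```

Run on this invented matrix, the function simply reports two steps; deciding that the 0 in Taxon D is a reversal, rather than homoplasy or an independent loss, is the operator's interpretation, not the program's.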

So what are these programs doing? In one sense they are mimicking concepts that would remain incomprehensible to any machine (naturally). To use one of Douglas Adams' analogies, an algorithm doesn't know what transformation is any more than a "tea-leaf knows the history of the East India Company".

Transformations, reversals and plesiomorphies are all concepts interpreted by us. What is a reversal to one user is homoplasy to another. The current crop of algorithms act upon and manipulate 'similarities', not homologies. They are phenetic methods. While this is not a bad thing (provided we do not interpret our cladograms as the real thing), phenetics does have its limitations: it cannot identify homology. That is left to us, the observers and users of the program.
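As a hypothetical illustration (invented data, and a deliberately simple similarity measure rather than any particular program's code): what such an algorithm actually computes from a binary matrix is an overall similarity, a count of matching digits, and nothing in the calculation marks which of those matches are homologies.

```python
# Invented binary matrix for three taxa and four characters.
# The calculation sees only matching digits; whether a shared '1' is a
# homology or a convergence is not something it can decide.

matrix = {
    "TaxonA": [1, 1, 0, 1],
    "TaxonB": [1, 1, 0, 0],
    "TaxonC": [0, 1, 1, 0],
}

def simple_matching(a, b):
    """Proportion of characters with the same state in both taxa."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

for x, y in [("TaxonA", "TaxonB"), ("TaxonA", "TaxonC"), ("TaxonB", "TaxonC")]:
    print(x, y, simple_matching(matrix[x], matrix[y]))
# TaxonA TaxonB 0.75, TaxonA TaxonC 0.25, TaxonB TaxonC 0.5
```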

Malpractice in phylogenetics comes from interpreting a quantitative result as a qualitative reality. For example, representation is a relationship interpreted through similarity alone. Algorithms represent real objects (e.g., think of two binary characters depicting the presence of wings). Representation, however, is often confused with replication, namely a copy of a real object. Algorithms cannot replicate evolution, reversal, transformation, homology and so on. Assuming that they do, and presenting results in this way, is a form of malpractice.
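As a small, invented illustration of the difference: in a data matrix, 'wings present' in a bird and in a bat is represented by the same digit, but the coding replicates nothing about the structures themselves, and it cannot tell us that the two wings are not the same thing.

```python
# Invented illustration: the matrix *represents* 'wings present' with the
# same digit for both taxa; it does not *replicate* the wings themselves,
# nor record that a bird wing and a bat wing are not the same structure.

wings_present = {"Bird": 1, "Bat": 1, "Crocodile": 0}

print(wings_present["Bird"] == wings_present["Bat"])  # True: identical representation
# Nothing in the coding distinguishes the two wings; that judgement is ours.
```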

Where this type of thinking has led us is to the incorrect notion that a homolog (a part of an organism) is actually a homology (that organism's relationships). But more on this in the next post.

2 comments:

Anonymous said...

Part of the problem here is the near-total absence of character analysis in most studies. Genes require little analysis, although more thought should be given to alignment. And many of those using morphology score features without any comprehension of homology constraints (yes, this is your point about practice absent theory). Isn't this a case where an absence of materials contributes to an absence of methods?

Buddhist_Monk_Wannabe said...

I disagree with the commenter above me. Genes should require just as much character analysis as any other character. Molecular characters would benefit from moving from individual nucleotides to sequence lengths as characters. Individual nucleotides are honestly very close to the realm of insignificant, bland uniformity (given how easily they can differ between individual organisms). Furthermore, sequences from individual organisms are taken to be representative of a species overall. Since when does a morphologist look at a single specimen when forming hypotheses of homology?