Sunday, November 26, 2017

Comparing 179 Machine Learning Categorizers on 121 Data Sets


It is often argued that the algorithm used for machine learning is less important than the amount of data used to train it (e.g., Domingos, 2012: "More data beats a cleverer algorithm").  In a monumental study, Fernández-Delgado and colleagues tested 179 machine learning classifiers on 121 data sets. They found that a large majority of them were essentially identical in their accuracy. In fact, 121 of them (the match with the number of data sets is a coincidence) were within ±5 percentage points of one another when accuracy was averaged over all of the data sets.
The following two graphs show the same data organized either by family (color and order) or by accuracy (order) and family (color).



Families
1. Bagging (BAG): 24 classifiers.
2. Bayesian (BY) approaches: 6 classifiers.
3. Boosting (BST): 20 classifiers.
4. Decision trees (DT): 14 classifiers.
5. Discriminant analysis (DA): 20 classifiers.
6. Generalized Linear Models (GLM): 5 classifiers.
7. Logistic and multinomial regression (LMR): 3 classifiers.
8. Multivariate adaptive regression splines (MARS): 2 classifiers.
9. Nearest neighbor methods (NN): 5 classifiers.
10. Neural networks (NNET): 21 classifiers.
11. Other ensembles (OEN): 11 classifiers.
12. Other Methods (OM): 10 classifiers.
13. Partial least squares and principal component regression (PLSR): 6 classifiers.
14. Random Forests (RF): 8 classifiers.
15. Rule-based methods (RL): 12 classifiers.
16. Stacking (STC): 2 classifiers.
17. Support vector machines (SVM): 10 classifiers.

Classifiers within each family rely on the same core approach but may use different parameters or different transformations of the data.  There is no simple way to assess the variety of the specific classifiers in each group.

A few observations

The observation that so many of the classifiers performed so well over such a variety of data sets is remarkable: more than two thirds of the classifiers tested were within plus or minus 5 percentage points of one another when averaged over the full collection of data sets.
The observation that the range of accuracies differed almost as much within a family as between families is also remarkable.  Classifiers in the Bagging family (BAG), for example, were among the most and among the least accurate classifiers in the experiment.  Bagging is an ensemble approach in which several different classifiers are trained and their predictions are combined by a kind of averaging.  The Boosting, Stacking, and OEN (the authors' abbreviation for other ensembles) families also involve ensembles of classifiers.  The high level of variability among members of these families is a little surprising and may be, at least partially, due to the ways in which the parameters for these models were chosen.
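Bagging is easiest to see in code. As a rough illustration of the idea rather than a reproduction of any of the study's 24 BAG classifiers, here is a minimal scikit-learn sketch comparing a single decision tree against a bagged ensemble of trees; the dataset and parameter choices are mine, for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# A single decision tree versus a bagged ensemble of 50 trees.
# Each tree in the ensemble is fit on a bootstrap resample of the
# training data, and the trees' predictions are combined by voting.
single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)  # default base learner is a decision tree

print("single tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```

Typically the bagged ensemble comes out a few points more accurate than the single tree, which is the averaging effect described above.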
Although Fernández-Delgado and associates tried to choose optimal parameters for each method, there is no guarantee that their selection procedures were optimal for every classifier.  Poor classifiers may have performed poorly either because they were ill-suited to one or more of the data sets in the collection or because their parameters were chosen poorly.
Three other families showed relatively high accuracy and also high consistency.  The best performing family was Random Forests (RF), followed by the Support Vector Machines (SVM) family.  A Random Forest classifier uses an ensemble of decision trees to perform its classification.  Support Vector Machines learn boundaries that separate classes of objects.  Both are well-established machine learning methods.  Classifiers in the Decision Trees (DT) family were also relatively consistent, though slightly less accurate.
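For concreteness, here is a minimal scikit-learn sketch comparing a random forest with an RBF-kernel SVM using cross-validation on a small standard dataset; the dataset and parameter settings are illustrative choices of mine, not those evaluated in the study.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

classifiers = {
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    # SVMs are sensitive to feature scale, so the features are standardized first.
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale")),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```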
Classifiers in the Bayesian family (BY) were also quite consistent, but slightly less accurate.  Bayesian models tend to be the simplest models to compute, with relatively few parameters and no iterative training (that is, no repeated adjustment of parameters over multiple passes through the same data).
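To make the "no iterative training" point concrete, here is a toy Gaussian naive Bayes sketch (not one of the study's six BY classifiers): all of the model's parameters, a prior plus per-feature means and variances for each class, are computed in a single pass over the data.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# One pass over the data: for each class, store its prior probability
# plus the mean and variance of every feature within that class.
params = {}
for c in np.unique(y):
    Xc = X[y == c]
    params[c] = (len(Xc) / len(X),       # class prior
                 Xc.mean(axis=0),        # per-feature means
                 Xc.var(axis=0) + 1e-9)  # per-feature variances (smoothed)

def predict(x):
    # Choose the class with the highest log posterior, assuming the
    # features are conditionally independent given the class.
    def log_posterior(c):
        prior, mu, var = params[c]
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(params, key=log_posterior)

preds = np.array([predict(x) for x in X])
print("training accuracy:", (preds == y).mean())
```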

Conclusion

So, what do we make of this result?  Classification accuracy is not particularly sensitive to the family of classifier that is employed; practically any family can be used to achieve high-quality results.  Based on these results, a Random Forest or SVM classifier is likely to be the most reliable choice, in that these seem to work well over a variety of data sets and configurations.  Many classifiers from other families, if effectively tuned, are also likely to be effective.  There is no guarantee that all classifiers are equally affected by a single tuning method, or that all varieties of classifier are equal, but many of them will yield high-quality results.  It appears that how a classifier is used is more important than what kind of classifier it is.
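To illustrate how much the "how" can matter, here is a small scikit-learn sketch comparing an SVM with default settings against the same SVM after a modest grid search over its two main parameters; the dataset and grid are illustrative choices of mine and are not tied to the study's tuning protocol.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_digits(return_X_y=True)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Cross-validated accuracy with the library's default SVM settings.
default_score = cross_val_score(pipe, X, y, cv=5).mean()

# The same classifier after a small search over regularization strength (C)
# and kernel width (gamma).
grid = GridSearchCV(pipe,
                    {"svm__C": [0.1, 1, 10, 100],
                     "svm__gamma": [1e-4, 1e-3, 1e-2]},
                    cv=5)
grid.fit(X, y)

print("default SVM accuracy:", round(default_score, 3))
print("tuned SVM accuracy  :", round(grid.best_score_, 3), grid.best_params_)
```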
I have left out many of the details of exactly how these different classifiers work.  That information can be found in the Fernández-Delgado paper or on Wikipedia.


Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78-87.
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15, 3133-3181.


Thursday, January 12, 2017

Intelligence: Natural and Artificial



It does not take a genius to recognize that artificial intelligence is going to be one of the hot topics for 2017.  AI is suddenly everywhere, from phones that answer your questions to self-driving cars.  Once a technology achieves prominence in the consumer space, it moves into the mainstream of applied fields, even those that are slow to adopt technology.

Predictions 2017.  Source: http://www.psychics.com/blog/a-brief-history-of-the-crystal-ball/


AI has also caught the imagination of people who like to worry about the future.  Are we going to achieve some kind of singularity where we can upload human consciousness to a computer, or is Skynet going to determine that people are simply superfluous and work to destroy them?

The prospects for either of these worrisome events seem extraordinarily remote.  I think that these concerns rest largely on a deep misunderstanding of just what intelligence is.  "Intelligence" is often used to describe something akin to cognitive processing power, and processing power of a particular kind: the kind reflected in the cultural achievements of Western civilization (e.g., success in school or in business).

Intelligent people (or things) are generally those that are better able to think.  The implication is that some people are better than others at thinking; they are generally more intelligent.  This is the kind of idea that underlies the concept of IQ (intelligence quotient).  IQ was originally invented to predict how well children would do in school.

This notion of general intelligence, one that is intended to measure how well people think overall, has proven to be elusive.  Although there is some correlation between performance on one cognitive test and another, that correlation is not particularly high.  Moreover, the correlation may be more indicative of the similarity between cognitive tests and tasks than of shared cognitive abilities.  The correlation may be a result of the kind of situations where we attribute intelligence (for example, multiple classroom activities or business) and not be general at all.  Even among so-called intellectual activities, the correlation may be absent.  

The same applies to artificial intelligence.  We don't have any generally intelligent machines.  So far, artificial intelligence machines are rather narrowly specialized.  Building a great chess-playing machine is unlikely to be of any use in winning at Jeopardy.  Intelligence, both natural and artificial, seems to be largely domain specific.  If the evidence were stronger for general human intelligence, I might be more willing to predict that kind of success in general artificial intelligence, but so far, the evidence seems strongly to the contrary.

Further, the problems that seem to rely most on intellectual capacity, such as chess playing or Jeopardy answering, turn out to be the easier problems to solve with computers.  Problems that people find natural, such as recognizing a voice or a face, turn out to be more difficult for computers.  It is only recently that we have made progress on addressing such problems with computers.

Chess playing and Jeopardy answering by computers use approaches that are different from those used by humans.  The differences are often revealed in the kinds of mistakes people and machines make.  IBM's Watson beat the human Jeopardy players handily, but it made certain mistakes that humans would not (for example, asserting that Toronto was a US city).  The difference in mistakes (artificial vs. natural stupidity?) is not a sign of AI's failure, just a sign that the computer is doing things in a different way than a smart person would.

Similarly, the kinds of mistakes people make tell us something about how they form their intelligence.  For example, people will give different answers to the same question, depending on precisely how it is asked.  In a seminal study by Kahneman and Tversky, participants were asked to choose between two treatments for 600 people infected with a deadly disease.

If the people were given a positively framed choice, 72% chose Treatment A:
  • Positive Frame: With Treatment A 200 people will be saved.  With Treatment B, there is a 33% chance of saving all 600 and a 66% chance of saving no one.
On the other hand, if the situation was described more negatively, only 22% chose Treatment A:
  • Negative Frame: With Treatment A, 400 people will die.  With Treatment B, there is a 33% chance that no one will die and 66% chance that all 600 people will die.
With both sets of alternatives, Treatment A results in 200 people living and 400 people dying, and Treatment B offers a 33% chance that everyone will survive and a 66% chance that no one will survive.  Logically, people should give the same answer to both, but instead they are affected by how the question is framed.  The first description matches a positive pattern and the second matches a negative pattern, and the two patterns lead to different choices.
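The equivalence of the two frames is easy to verify with a couple of lines of arithmetic (treating the 33% and 66% figures as exactly one third and two thirds):

```python
# Expected number of survivors (out of 600) under each description.
treatment_a = 200                      # 200 saved for certain, 400 die
treatment_b = (1/3) * 600 + (2/3) * 0  # 1/3 chance all saved, 2/3 chance none

print(treatment_a, treatment_b)  # 200 and 200.0: the same expected outcome
```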

People have a tendency to jump to conclusions based on first impressions or other patterns that they perceive.  In fact, the root factor underlying human cognition seems to be pattern recognition.  People see patterns in everything.  The gambler's fallacy, for example, relies on the fact that people see patterns in random events.  If heads come up six times in a row, people are much more likely to think that the next flip will result in tails, but in reality heads and tails remain equally likely.
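A quick simulation makes the point about the gambler's fallacy: even immediately after six heads in a row, the next flip still comes up heads about half the time.

```python
import random

random.seed(0)
streaks = 0
heads_after_streak = 0

# Flip seven fair coins many times; whenever the first six are all heads,
# record what the seventh flip turns out to be.
for _ in range(200_000):
    flips = [random.random() < 0.5 for _ in range(7)]
    if all(flips[:6]):
        streaks += 1
        heads_after_streak += flips[6]

print(heads_after_streak / streaks)  # close to 0.5: the coin is not "due" for tails
```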

Humans evolved the ability to exploit patterns over millions of years.  Artificial intelligence, on the other hand, has seen dramatic progress over the last few decades because it is only recently that computer software has been designed to take advantage of patterns.

People are naturally impressionistic, intuitive reasoners.  Computers are naturally logical and consistent.  Computers recognize patterns, and thereby become more intelligent, to the extent that those patterns can be accommodated in a logical, mathematical framework.  Humans have a difficult time with logic.  They can use logic to the extent that it is consistent with the patterns they perceive or with patterns that are "emulated" by external devices.  But logic is difficult to learn and difficult to employ consistently.

Every increase in human intelligence over the last several thousand years, I would argue, has been caused by the introduction of some artifact that helped people to think more effectively.  These artifacts range from language itself, which makes a number of thinking processes more accessible, to checklists and pro/con lists, which help make decisions more systematic, to mathematics.

In contrast, the kinds of tasks that people find easy (and that challenge computers), such as recognizing faces, are apparently a property of specific brain structures, which evolved over millions of years.  Other aspects of what we usually think of as intelligence are much more recent developments, evolutionarily speaking, emerging over a time frame of, at most, a few thousand years.  Our species, Homo sapiens, has only been around for about 150,000 years.  There have been quite a few changes to our intellectual capacity over that time, particularly over the last few thousand years.  The cave paintings at Lascaux in France, among the earliest known artifacts of human intelligence, are only about 20,000 years old.

An example of face recognition by humans.  Although upside down, this face is easily recognizable, but there is something strange about it.  See below.


Computer face-recognition systems do not yet have the same capacity as human face recognition, but progress in computerized face recognition has come largely from algorithms that exploit invariant measurements of faces (such as the ratio of the distance between the eyes to the length of the nose).
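As a rough illustration of what such an invariant measurement looks like, here is a toy sketch that computes a ratio of distances from facial landmark coordinates; the landmark names and pixel values are made up for illustration, and a real system would obtain them from a landmark detector.

```python
import numpy as np

# Hypothetical (x, y) pixel coordinates of a few landmarks in one face image.
landmarks = {
    "left_eye":  np.array([112.0, 140.0]),
    "right_eye": np.array([188.0, 142.0]),
    "nose_top":  np.array([150.0, 150.0]),
    "nose_tip":  np.array([151.0, 205.0]),
}

def dist(a, b):
    return float(np.linalg.norm(landmarks[a] - landmarks[b]))

# A ratio of two distances is unchanged when the whole face is scaled
# (e.g., photographed closer or farther away), which is what makes it
# useful as an invariant feature for matching faces.
eye_to_nose_ratio = dist("left_eye", "right_eye") / dist("nose_top", "nose_tip")
print(round(eye_to_nose_ratio, 3))
```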

The birth of self-driving cars can be attributed to the DARPA Grand Challenge.  DARPA offered a million-dollar prize for a self-driving car that could negotiate, unrehearsed, a 142-mile off-road course through the Mojave Desert.  The 2004 competition was a complete failure: none of the vehicles managed more than about 5% of the course.  In 2005, on the other hand, things were dramatically different.  Stanley, the Stanford team's car, negotiated the full course in just under 7 hours.  A major source of its success, I believe, lay in the sensors that were deployed and in the way information from those sensors was processed and pattern analyzed.  Core to the system were machine learning algorithms that learned to avoid obstacles from examples in which human drivers avoided them.

Enhancement of computer intelligence has similarly come from hacks and artifacts.  Face recognition algorithms, navigational algorithms, sensors, parallel distributed computing, and pattern recognition (often called machine learning) have all contributed to the enhancement of machine intelligence.  But, just as general human intelligence has proven elusive, it is doubtful that we will see anything resembling general intelligence in computers.

Winning a Nobel Prize in physics is no guarantee that one can come to sensible conclusions about race and intelligence, for example.  Being able to answer Jeopardy questions is arguably more challenging than winning chess games, but it is not the same thing as being able to navigate a vehicle from Barstow, California to Primm, Nevada.  Computers are getting better at what they do, but the functions on which each one is successful are still narrowly defined.  Computers may increasingly take over jobs that used to require humans, but they are unlikely, I think, to replace them altogether.  Ultimately, computers, even those endowed with artificial intelligence, are tools for raising human capabilities.  They are not a substitute for those capabilities.


The same face turned right-side up.  In the view above, the mouth and eyes were upright while the rest of the face was upside down.  The effect is readily seen when the face is right-side up, but was glossed over when upside down.  We recognize the pattern inherent in the parts and in the whole.  Source: http://thatchereffect.com/