Sunday, August 9, 2015

Where Did All The Thinking Go?

 
Some people are saying that statistical methods in data science and analytics are obsolete. These people have either just grown tired of thinking or have forgotten how to.

What is wrong with this picture?

This view has two major problems. First, the idea that machine learning algorithms are the only method required for providing analytic solutions to business problems is a very naïve one. Second, the idea is philosophically dangerous and carries an undertone of quantitative inadequacy.

How can you be so naïve?

Naïve is being kind. What you really have is extreme arrogance. You have some people, whom practically no one has ever heard of, essentially saying they are smarter than the late George Box, who is not here to defend himself. They apparently know more about probability and statistics than Andrey Kolmogorov, Nikolai Smirnov, Andrey Markov, Richard Jeffrey, Adrien-Marie Legendre, John Herschel, Friedrich Bessel and Richard Cox. They want to throw away statistical models and use only machine learning algorithms, which reminds me of the King James Version-only movement. What I really see is a desperate cry of “We do not understand mathematics, probability or statistics, so we’ll assume it away.”

Why is this dangerous?

To me this is a no-brainer, but those who propose this seem to be brainless. We (in the United States) already have a math-phobic society and an educational system that is substandard relative to many other countries. As if we had not dumbed down quantitative skills enough, we added the “for Dummies” series to rub salt in the wound.
 
It seems that undergraduate programs are teaching tools, and when you ask a recent graduate to solve a real problem with a customer's licensed tool, you may hear, “Can I do it in MATLAB? That’s what I know.” We tend to want to force every problem into our favorite tool or technique, rather than solve the problem with the appropriate tool, or actually think.
 
The cry is, “Give me a tool that does not require me to apply much thought in order to use it!” And many are providing such tools, along with courses to learn them, and making lots of money in the process. What we get is a society of people who do not have any critical thinking skills. Moreover, critical thinking skills are required not only in the quantitative sciences, but also in disciplines like biology (my undergraduate degree). Though I am not a great writer, I am thinking critically about sentence structure, grammar, logic and so on, as I write.

Can Machines Think?




Alan Turing said they could, but he qualified his statement by saying they think differently than humans. Roger Penrose basically said “Ditto” when addressing artificial intelligence. So, are machine learning algorithms the way to solve problems? Certainly, except they are not the only way, as some might propose. If you are trying to solve a problem where all the assumptions of a linear program are met, will a genetic algorithm give a better answer? Not necessarily and probably not.
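To make the point concrete, here is a minimal sketch (the toy objective and constraints are my own invention, not from the text) of solving a problem whose linear-programming assumptions are met. A simplex or interior-point solver finds the exact optimum directly; a genetic algorithm could only approximate it by search.

```python
# Toy LP: maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x >= 0, y >= 0.
# linprog minimizes, so negate the objective coefficients.
from scipy.optimize import linprog

res = linprog(
    c=[-3, -2],                     # minimize -(3x + 2y)
    A_ub=[[1, 1], [1, 3]],          # constraint coefficients
    b_ub=[4, 6],                    # constraint right-hand sides
    bounds=[(0, None), (0, None)],  # x, y nonnegative
    method="highs",
)
print(res.x, -res.fun)  # optimal point (4, 0) with objective value 12
```

When the problem truly is linear, this gives a provably optimal answer in a fraction of a second; a heuristic like a genetic algorithm offers no advantage and no optimality guarantee.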

There has to be a decision process involved in choosing the best functional form for solving various problems. Decision points, like whether or not data pathologies exist, have to be weighed. Generally, if the assumptions of traditional methods are not violated, those methods yield the best results. Do an experiment. Take a problem where all the assumptions of a logistic regression are met and compare the results with an artificial neural network. I performed such an experiment with a real business problem, and two different logistic regression models outperformed a neural network. However, when used together in an ensemble, the logistic regression and neural network combination (using averaging) outperformed everything else in performance testing. In very simple terms, this takes the strengths of both and negates the weaknesses of either.
I also checked the results of a logistic regression uplift model built in SAS by employing a random forest in R. Although the distribution among the pentiles was a little different, the overall net lift was the same. So, I am not saying that machine learning algorithms should not be used, only that some logic has to be applied when selecting them as the functional form of your solution method.

Should Humans Think?

They should, but there seems to be a growing inclination not to. “I don’t want to think!” “It makes my brain hurt!” When solving problems, we usually examine the “What” or the “So what”. However, the “Why”, though it may not be important to the business owner, should be important to the analyst. Anytime our methods produce answers, we should be asking “Why?” (and probably “How?”). I would never give my customer a solution without knowing the “Why” and the “How”. I may never be asked questions that require my understanding of either, but as the analyst, I have to know.
 
If my solution method is a black box, I must try to make it as “gray” as possible. One of the things we have a tendency to do is forget intuition as a legitimate problem-solving process. When I produce a solution through the logical approach, I have to ask, “Does this intuitively make sense?” Does the period required for underwriting have a bearing on a decision to buy insurance from company X? Does the possession of a reward card from Citibank have a bearing on a decision to buy insurance from company X? The latter is not so intuitively clear, but we have to know why the relation exists.

Conclusion

If we were asked to build a house, would we show up with just a screwdriver? Probably not. We would bring our complete set of tools to bear. If we were asked to make a decision about financing our new home with a mortgage, would we choose the loan type and mortgage company at random? Would you force the problem into a model with an unsupervised learning algorithm? (You would probably just ask who has the lowest interest rate.)
The analysis of data should produce information that is useful for making a decision. Yet, that is not all of the information. This is the fallacy of taking “Human” out of HR. When we screen every resume with software and reject some based on certain criteria, are we possibly eliminating the very best candidate for the job? The human element must be involved in decisions, no matter what the question is or in what discipline it occurs. Blindly accepting solutions is naïve and dangerous. Believing you know better than George Box is arrogant.

“All models are wrong, but some are useful.”

—George Box

About The Author

Serving in the military for 24 years as a cavalry unit officer and operations research analyst, Jeffrey Strickland has been applying quantitative methods in decision making for 34 years. He has been involved in the design of long-range unmanned aerial vehicles (UAV), manned space launch systems, missile defense systems, satellite systems, and communication systems. He has developed models for predicting combat outcomes, weapon systems effectiveness, vulnerability to cyber-attacks, occurrences of crime, propensity to purchase, propensity to engage, and propensity to churn. He holds a Masters and Doctorate in Mathematics and is a Certified Modeling and Simulation Professional (CMSP). Jeffrey has published over 20 technical books and written over 300 articles and blogs.
