Sunday, August 9, 2015

What the Heck is Predictive Analytics?


[Excerpt from my new book, Predictive Analytics using R, downloadable from my profile for free]

Predictive analytics, sometimes used synonymously with predictive modeling, is not synonymous with statistics. It often requires modification of functional forms and use of ad hoc procedures, which makes it a part of data science to some degree. It does, however, encompass a variety of statistical techniques for modeling, incorporates machine learning, and uses data mining to analyze current and historical facts in order to make predictions about the future.

In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. Predictive models are not restricted to business, for they are used to predict anything from the reliability of an electronic component to the success of a manned lunar landing. These models, however, are usually stochastic models that can be used in a simulation.
Predictive analytics is used in actuarial science (Conz, 2008), marketing (Fletcher, 2011), financial services (Korn, 2011), insurance, telecommunications (Barkin, 2011), retail (Das & Vidyashankar, 2006), travel (McDonald, 2010), healthcare (Stevenson, 2011), pharmaceuticals (McKay, 2009), defense (Strickland, 2011) and other fields.

Definition

Predictive analytics is an area of data science that deals with extracting information from data and using it to predict trends and behavior patterns. Often the unknown event of interest is in the future, but predictive analytics can be applied to any type of unknown, whether it be in the past, present, or future. Examples include identifying suspects after a crime has been committed, or detecting credit card fraud as it occurs (Strickland J., 2013). The core of predictive analytics relies on capturing relationships between explanatory variables and the predicted variables from past occurrences, and exploiting them to predict the unknown outcome. It is important to note, however, that the accuracy and usability of results will depend greatly on the level of data analysis and the quality of assumptions.
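The idea of capturing a relationship from past occurrences and then exploiting it can be sketched in a few lines of Python. This is only a minimal illustration using ordinary least squares with a single explanatory variable. The function names and the data are hypothetical and are not taken from the book.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the fitted relationship to a unit whose outcome is unknown."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical past occurrences: explanatory variable and observed outcome.
past_x = [1.0, 2.0, 3.0, 4.0]
past_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(past_x, past_y)
unknown_outcome = predict(model, 5.0)
```

The quality of `unknown_outcome` depends entirely on whether the past relationship still holds for the new unit, which is the assumption the paragraph above warns about.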

Not Statistics

Predictive analytics uses statistical methods, but also machine learning algorithms and heuristics. Though statistical methods are important, the analytics professional cannot always follow the “rules of statistics” to the letter. Instead, the analyst often implements what I call “modeler judgment”. Unlike the statistician, the analytics professional (akin to the operations research analyst) must understand the system, business, or enterprise where the problem lies, and in the context of the business processes, rules, operating procedures, budget, and so on, make judgments about the analytical solution subject to various constraints. This requires a certain degree of creativity, and makes the discipline both a science and an art.

For example, a pure statistical model, say a logistic regression, may determine that the response is explained by 30 independent variables at a significance level of 0.05. However, the analytics professional knows that 10 of the variables cannot be used because of legal constraints imposed on, say, a bank product. Moreover, the analytics modeler is aware that variables with many degrees of freedom can lead to overfitting the model. Thus, in the final analysis, the modeler develops a good model with 12 explanatory variables using modeler judgment. The regression got them near to a solution, and their intuition carried them to the end.
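The mechanical part of that screening can be sketched as follows. This is a simplified illustration only: the variable names, p-values, and the list of legally prohibited variables are hypothetical, and real modeler judgment involves far more than a filter and a cap.

```python
def screen_variables(p_values, prohibited, alpha=0.05, max_vars=None):
    """Keep significant, legally usable variables, strongest first."""
    usable = [
        (p, name)
        for name, p in p_values.items()
        if p < alpha and name not in prohibited
    ]
    usable.sort()  # smallest p-value first
    names = [name for _, name in usable]
    return names if max_vars is None else names[:max_vars]


# Hypothetical p-values from a fitted regression.
p_values = {
    "income": 0.001,
    "age": 0.030,
    "zip_code": 0.002,
    "tenure": 0.200,
    "balance": 0.010,
}
# Hypothetical variables that may not be used for this product.
prohibited = {"age", "zip_code"}

selected = screen_variables(p_values, prohibited, alpha=0.05, max_vars=2)
```

Here `tenure` is dropped for lack of significance and `age` and `zip_code` are dropped by the constraint, even though the regression alone would have kept them.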

Additionally, the analytics professional does not always look for a hypothesis a priori. Consequently, they may use a machine learning algorithm, such as Random Forests, that does not depend upon statistical assumptions but instead "learns" from the data.

Types

Generally, the term predictive analytics is used to mean predictive modeling, “scoring” data with predictive models, and forecasting. However, people are increasingly using the term to refer to related analytical disciplines, such as descriptive modeling and decision modeling or optimization. These disciplines also involve rigorous data analysis and are widely used in business for segmentation and decision making, but they have different purposes, and the statistical techniques underlying them vary.

Predictive models

Predictive models are models of the relation between the specific performance of a unit in a sample and one or more known attributes or features of the unit. The objective of the model is to assess the likelihood that a similar unit in a different sample will exhibit the specific performance. This category encompasses models used in many areas, such as marketing, where they seek out subtle data patterns to answer questions about customer performance; fraud detection models are one example. Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision. With advancements in computing speed, individual agent modeling systems have become capable of simulating human behavior or reactions to given stimuli or scenarios.

The available sample units with known attributes and known performances are referred to as the “training sample.” The units in other samples, with known attributes but unknown performances, are referred to as “out of [training] sample” units. The out-of-sample units bear no necessary chronological relation to the training sample units. For example, the training sample may consist of literary attributes of writings by Victorian authors, with known attribution, and the out-of-sample unit may be newly found writing with unknown authorship; a predictive model may aid in attributing the work to an author. Another example is given by analysis of blood splatter in simulated crime scenes, in which the out-of-sample unit is the actual blood splatter pattern from a crime scene. The out-of-sample unit may be from the same time as the training units, from a previous time, or from a future time.
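The split between a training sample and an out-of-sample unit can be sketched with a nearest-centroid classifier. This is one simple technique chosen for illustration, not a method the text prescribes. The two features (say, average sentence length and semicolons per hundred words) and the author labels are hypothetical, as are all of the numbers.

```python
import math


def centroids(training):
    """Average the feature vectors of each known label."""
    sums, counts = {}, {}
    for features, label in training:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def attribute(model, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Training sample: known attributes and known attribution (hypothetical).
training = [
    ([28.0, 0.9], "author_a"),
    ([30.0, 1.1], "author_a"),
    ([18.0, 0.3], "author_b"),
    ([20.0, 0.5], "author_b"),
]
model = centroids(training)

# Out-of-sample unit: known attributes, unknown authorship.
guess = attribute(model, [27.0, 0.8])
```

Nothing in the code refers to time, which mirrors the point that the out-of-sample unit may predate, coincide with, or follow the training units.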

Descriptive models

Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups. Unlike predictive models that focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers or products. Descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do. Instead, descriptive models can be used, for example, to categorize customers by their product preferences and life stage. Descriptive modeling tools can be utilized to develop further models that can simulate a large number of individualized agents and make predictions.
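Grouping customers without ranking them can be sketched with a small one-dimensional k-means (Lloyd's algorithm). This is an illustrative choice of technique under simplifying assumptions: a single hypothetical attribute (annual spend), caller-supplied starting centers, and made-up values.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm in one dimension with caller-supplied starting centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recompute each center as its group mean; keep the old center if empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical annual spend per customer.
spend = [120.0, 150.0, 130.0, 900.0, 950.0, 1000.0]
centers, segments = kmeans_1d(spend, centers=[0.0, 1000.0])
```

The output is a set of segments and their typical values, not a score or a likelihood for any individual customer.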

Decision models

Decision models describe the relationship between all the elements of a decision—the known data (including results of predictive models), the decision, and the forecast results of the decision—in order to predict the results of decisions involving many variables. These models can be used in optimization, maximizing certain outcomes while minimizing others. Decision models are generally used to develop decision logic or a set of business rules that will produce the desired action for every customer or circumstance.
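A toy version of such decision logic might combine a predictive score with the cost and payoff of each candidate action, and pick the action with the highest expected profit. The action names, payoffs, and costs below are hypothetical, and the sketch assumes (unrealistically) that the response probability is the same for every action.

```python
def best_action(p_respond, actions):
    """Pick the action with the highest expected profit for one customer."""

    def expected_profit(action):
        return p_respond * action["payoff"] - action["cost"]

    return max(actions, key=lambda name: expected_profit(actions[name]))


# Hypothetical candidate actions with payoff if the customer responds.
actions = {
    "no_offer": {"payoff": 0.0, "cost": 0.0},
    "email": {"payoff": 40.0, "cost": 1.0},
    "phone_call": {"payoff": 60.0, "cost": 12.0},
}

# p_respond would come from a predictive model; here it is made up.
decision = best_action(0.10, actions)
```

Evaluating `best_action` over a grid of scores yields score thresholds, which is one way a set of business rules can be derived from a decision model.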

Applications

Although predictive analytics can be put to use in many applications, I outline a few examples where predictive analytics has shown positive impact in recent years.

Clinical decision support systems

Experts use predictive analysis in health care primarily to determine which patients are at risk of developing certain conditions, like diabetes, asthma, heart disease, and other lifetime illnesses. Additionally, sophisticated clinical decision support systems incorporate predictive analytics to support medical decision making at the point of care. A working definition has been proposed by Robert Hayward of the Centre for Health Evidence: “Clinical Decision Support Systems link health observations with health knowledge to influence health choices by clinicians for improved health care.” (Hayward, 2004)

Customer retention

With the number of competing services available, businesses need to focus efforts on maintaining continuous consumer satisfaction, rewarding consumer loyalty and minimizing customer attrition. Businesses tend to respond to customer attrition on a reactive basis, acting only after the customer has initiated the process to terminate service. At this stage, changing the customer's decision is almost impossible. Proper application of predictive analytics can lead to a more proactive retention strategy.

Direct marketing

When marketing consumer products and services, there is the challenge of keeping up with competing products and consumer behavior. Apart from identifying prospects, predictive analytics can also help to identify the most effective combination of product versions, marketing material, communication channels and timing that should be used to target a given consumer. The goal of predictive analytics is typically to lower the cost per order or cost per action.

Fraud detection

Fraud is a big problem for many businesses and can be of various types: inaccurate credit applications, fraudulent transactions (both offline and online), identity theft and false insurance claims. These problems plague firms of all sizes in many industries. Some examples of likely victims are credit card issuers, insurance companies (Schiff, 2012), retail merchants, manufacturers, business-to-business suppliers and even service providers. A predictive model can help weed out the “bads” and reduce a business's exposure to fraud.

The Internal Revenue Service (IRS) of the United States also uses predictive analytics to mine tax returns and identify tax fraud (Schiff, 2012).

Recent advancements in technology have also introduced predictive behavior analysis for web fraud detection. This type of solution utilizes heuristics in order to study normal web user behavior and detect anomalies indicating fraud attempts.
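One very simple heuristic of this kind is a z-score rule: learn what is normal from a user's history and flag observations that fall far outside it. The sketch below is a stand-in for illustration, not a description of any particular product. The metric (requests per session), the threshold of 3, and the data are all assumptions.

```python
import statistics


def is_anomalous(history, observed, threshold=3.0):
    """Flag an observation that lies more than `threshold` standard
    deviations from the mean of the user's past behavior."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        # No variation in history: anything different is unusual.
        return observed != mean
    return abs(observed - mean) / spread > threshold


# Hypothetical requests per session for one web user.
history = [12, 15, 11, 14, 13, 15, 12]

suspicious = is_anomalous(history, 90)
ordinary = is_anomalous(history, 14)
```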

Portfolio, product or economy-level prediction

Often the focus of analysis is not the consumer but the product, portfolio, firm, industry or even the economy. For example, a retailer might be interested in predicting store-level demand for inventory management purposes. Or the Federal Reserve Board might be interested in predicting the unemployment rate for the next year. These types of problems can be addressed by predictive analytics using time series techniques. They can also be addressed via machine learning approaches which transform the original time series into a feature vector space, where the learning algorithm finds patterns that have predictive power.
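The transformation of a time series into a feature vector space is often done with lagged values: each row uses the previous few observations as features and the next observation as the target. The following is a minimal sketch of that step alone. The function name and the demand figures are hypothetical.

```python
def make_lag_features(series, n_lags):
    """Turn a series into (features, target) rows using the previous
    `n_lags` observations as the feature vector."""
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]
        target = series[t]
        rows.append((features, target))
    return rows


# Hypothetical weekly store-level demand.
demand = [10, 12, 13, 15, 16]
rows = make_lag_features(demand, n_lags=2)
```

Any learning algorithm that accepts feature vectors can then be trained on `rows`, which is what lets general machine learning methods be applied to forecasting problems.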

Risk management

When employing risk management techniques, the goal is always to predict and benefit from a future scenario. The capital asset pricing model (CAPM) and probabilistic risk assessment (PRA) are examples of approaches that can extend from project to market, and from near to long term. CAPM (Chong, Jin, & Phillips, 2013) “predicts” the best portfolio to maximize return. PRA, when combined with mini-Delphi techniques and statistical approaches, yields accurate forecasts (Parry, 1996). @Risk is an Excel add-in used for modeling and simulating risks (Strickland, 2005). Underwriting (see below) and other business approaches identify risk management as a predictive method.
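For readers unfamiliar with CAPM, its textbook form expresses an asset's expected return as the risk-free rate plus the asset's beta times the market risk premium. The sketch below shows only that standard relation. It is not taken from the cited paper, and the input values are hypothetical.

```python
def capm_expected_return(risk_free, beta, market_return):
    """Textbook CAPM: E[R_i] = R_f + beta_i * (E[R_m] - R_f)."""
    return risk_free + beta * (market_return - risk_free)


# Hypothetical inputs: 2% risk-free rate, 8% expected market return.
expected = capm_expected_return(risk_free=0.02, beta=1.5, market_return=0.08)
```

An asset with a beta of 1 is expected to earn the market return, and an asset with a beta of 0 the risk-free rate.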

Underwriting

Many businesses have to account for risk exposure due to their different services and determine the cost needed to cover the risk. For example, auto insurance providers need to accurately determine the amount of premium to charge to cover each automobile and driver. A financial company needs to assess a borrower's potential and ability to pay before granting a loan. For a health insurance provider, predictive analytics can analyze a few years of past medical claims data, as well as lab, pharmacy and other records where available, to predict how expensive an enrollee is likely to be in the future. Predictive analytics can help underwrite these quantities by predicting the chances of illness, default, bankruptcy, etc. Predictive analytics can streamline the process of customer acquisition by predicting the future risk behavior of a customer using application level data. Predictive analytics in the form of credit scores has reduced the amount of time it takes for loan approvals, especially in the mortgage market, where lending decisions are now made in a matter of hours rather than days or even weeks. Proper predictive analytics can lead to proper pricing decisions, which can help mitigate future risk of default.

Technology and big data influences

Big data is a collection of data sets that are so large and complex that they become awkward to work with using traditional database management tools. The volume, variety and velocity of big data have introduced challenges across the board for capture, storage, search, sharing, analysis, and visualization. Examples of big data sources include web logs, RFID and sensor data, social networks, Internet search indexing, call detail records, military surveillance, and complex data in the astronomical, biogeochemical, genomic, and atmospheric sciences. Thanks to technological advances in computer hardware (faster CPUs, cheaper memory, and MPP architectures) and new technologies such as Hadoop, MapReduce, and in-database and text analytics for processing big data, it is now feasible to collect, analyze, and mine massive amounts of structured and unstructured data for new insights (Conz, 2008). Today, exploring big data and using predictive analytics is within reach of more organizations than ever before, and new methods capable of handling such datasets are being proposed (Ben-Gal I. Dana A., 2014).

Analytical Techniques

The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques. [condensed]

Regression techniques

Regression models are the mainstay of predictive analytics.
  • Linear regression model
  • Ridge regression
  • LASSO (Least Absolute Shrinkage and Selection Operator)
  • Logistic regression
  • Quantile regression
  • Multinomial logistic regression
  • Probit regression
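To give a feel for how two entries in this list differ, the sketch below compares ordinary least squares with ridge regression for a single centered predictor, where the ridge slope has the closed form Σxy / (Σx² + λ). It covers only this special case, and the data and penalty values are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Slope of a one-predictor ridge regression on centered data.

    Minimizes sum((y - b*x)^2) + lam * b^2 after centering x and y,
    which gives b = Sxy / (Sxx + lam). With lam = 0 this is ordinary
    least squares.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

ols = ridge_slope(xs, ys, lam=0.0)
shrunk = ridge_slope(xs, ys, lam=5.0)
```

The penalty pulls the slope toward zero, trading some bias for lower variance, which is the usual motivation for ridge and LASSO when there are many correlated predictors.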

Classification and regression trees

  • Hierarchical Optimal Discriminant Analysis (HODA)
  • Classification and regression trees (CART)
  • Decision trees
  • Multivariate adaptive regression splines (MARS)

Machine learning techniques

Machine learning, a branch of artificial intelligence, was originally employed to develop techniques to enable computers to learn.
  • Neural networks
  • Multilayer Perceptron (MLP)
  • Radial basis function (RBF)
  • Naïve Bayes
  • K-Nearest Neighbor algorithm (k-NN)
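Of the techniques listed, k-nearest neighbor is the easiest to show in full: a new unit is assigned the majority label among the k closest training units. The sketch below is a bare-bones version that uses Euclidean distance and no feature scaling. The features, labels, and data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training units."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training sample: two features per unit and a known label.
training = [
    ([1.0, 1.0], "good"),
    ([1.2, 0.8], "good"),
    ([0.9, 1.1], "good"),
    ([5.0, 5.0], "bad"),
    ([5.2, 4.8], "bad"),
    ([4.9, 5.1], "bad"),
]

label = knn_predict(training, [1.1, 1.0], k=3)
```

There is no fitting step and no distributional assumption; the training sample itself is the model.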

Criticism

There are plenty of skeptics when it comes to computers' and algorithms' abilities to predict the future, including Gary King, a professor from Harvard University and the director of the Institute for Quantitative Social Science. People are influenced by their environment in innumerable ways. Trying to understand what people will do next assumes that all the influential variables can be known and measured accurately. “People’s environments change even more quickly than they themselves do. Everything from the weather to their relationship with their mother can change the way people think and act. All of those variables are unpredictable. How they will impact a person is even less predictable. If put in the exact same situation tomorrow, they may make a completely different decision. This means that a statistical prediction is only valid in sterile laboratory conditions, which suddenly isn't as useful as it seemed before.” (King, 2014)

Tools

Tools change often, but SAS appears to be the industry standard, and I rely heavily on SAS Enterprise Miner for my job. Be that as it may, I use R a great deal and find SPSS (particularly SPSS Modeler) useful for some things. Personally, I prefer R.

▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄

About the Author

Jeffrey Strickland is the author of "Predictive Analytics Using R" and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries. He is also the author of 20 books, including:
  • Discrete Event simulation using ExtendSim
  • Crime Analysis and Mapping
  • Missile Flight Simulation
  • Mathematical modeling of Warfare and Combat Phenomenon
  • Predictive Modeling and Analytics
  • Using Math to Defeat the Enemy
  • Verification and Validation for Modeling and Simulation
  • Simulation Conceptual Modeling
  • System Engineering Process and Practices
  • Weird Scientist: the Creators of Quantum Physics
  • Albert Einstein: No one expected me to lay a golden eggs
  • The Men of Manhattan: the Creators of the Nuclear Era
  • Fundamentals of Combat Modeling

▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄▀▄

References

Barkin, E. (2011). CRM + Predictive Analytics: Why It All Adds Up. New York: Destination CRM. Retrieved 2014, from http://www.destinationcrm.com/Articles/Editorial/Magazine-Features/CRM---Predictive-Analytics-Why-It-All-Adds-Up-74700.aspx
Conz, N. (2008). Insurers Shift to Customer-focused Predictive Analytics Technologies. New York: Insurance & Technology. Retrieved 2014, from http://www.insurancetech.com/business-intelligence/insurers-shift-to-customer-focused-predi/210600271
Das, K., & Vidyashankar, G. (2006). Competitive Advantage in Retail Through Analytics: Developing Insights, Creating Value. New York: Information Management. Retrieved 2014, from http://www.information-management.com/infodirect/20060707/1057744-1.html
Fletcher, H. (2011). The 7 Best Uses for Predictive Analytics in Multichannel Marketing. Philadelphia: Target Marketing. Retrieved 2014, from http://www.targetmarketingmag.com/article/7-best-uses-predictive-analytics-modeling-multichannel-marketing/1#
Hayward, R. (2004). Clinical decision support tools: Do they support clinicians? FUTURE Practice, 66-68.
Korn, S. (2011). The Opportunity for Predictive Analytics in Finance. San Diego: HPC Wire. Retrieved 2014, from http://www.hpcwire.com/2011/04/21/the_opportunity_for_predictive_analytics_in_finance/
McDonald, M. (2010). New Technology Taps 'Predictive Analytics' to Target Travel Recommendations. Oyster Bay: Travel Market Report. Retrieved 2014, from http://www.travelmarketreport.com/technology?articleID=4259&LP=1
McKay, L. (2009, August). The New Prescription for Pharma. Destination CRM. Retrieved 2014, from http://www.destinationcrm.com/articles/Web-Exclusives/Web-Only-Bonus-Articles/The-New-Prescription-for-Pharma-55774.aspx
Parry, G. (1996, November–December). The characterization of uncertainty in Probabilistic Risk Assessments of complex systems. Reliability Engineering & System Safety, 54(2-3), 119–1. Retrieved 2014, from http://www.sciencedirect.com/science/article/pii/S0951832096000695
Schiff, M. (2012, March 6). BI Experts: Why Predictive Analytics Will Continue to Grow. Renton: The Data Warehouse Institute. Retrieved 2014, from http://tdwi.org/Articles/2012/03/06/Predictive-Analytics-Growth.aspx?Page=1
Stevenson, E. (2011, December 16). Tech Beat: Can you pronounce health care predictive analytics? Times-Standard. Retrieved 2014, from http://www.times-standard.com/business/ci_19561141
Strickland, J. (2013). Introduction to Crime Analysis and Mapping. Lulu.com. Retrieved from http://www.lulu.com/shop/jeffrey-strickland/introduction-to-crime-analysis-and-mapping/paperback/product-21628219.html
