The Analytics Journey
An IBM view of the structured data analysis landscape: descriptive, predictive and prescriptive analytics.
By Irv Lustig, Brenda Dietrich, Christer Johnson and Christopher Dziekan
The term “business analytics” is used within the information technology industry to refer to the use of computing to gain insight from data. The data may be obtained from a company’s internal sources, such as its enterprise resource planning application or data warehouses/marts, from a third-party data provider or from public sources. Companies seek to leverage the digitized data from transaction systems and automated business processes to support “fact-based” decision-making. Thus, business analytics is a category of computing rather than a specific method, application or product.
In many ways, business analytics is the next competitive breakthrough following business automation, but with the goal of making better business decisions rather than simply automating standardized processes. This new computing category leverages the wealth of data being produced on a daily basis. In the near future tremendous amounts of additional data will become available, including both structured data, such as from sensors, and unstructured data, such as from cameras, social media and sentiment from the social network.
Within IBM the term “business analytics” applies to software products (business intelligence and performance management, predictive analytics, mathematical optimization, enterprise information management, enterprise content and collaboration), analytic solutions areas (industry solutions, finance/risk/fraud analytics, customer analytics, human capital analytics, supply chain analytics), consulting services, even outsourced business processes and configured hardware. Business analytics centers around five key areas of customer needs:
- Information access: This first segment is foundational to business analytics. It is all about fostering informed/collaborative decision-making across the organization – ensuring that decision-makers can understand how their area of the business is doing so they can make informed decisions.
- Insight: Gaining a deeper understanding of why things are happening, for example, gaining a full view of your customer (transaction history, segmentation, sentiment and opinion, etc.) to make better decisions and enable profitable growth.
- Foresight: Leveraging the past to predict potential future outcomes so that actions and decisions are computed in order to meet the objectives and requirements of the organization.
- Business agility: Driving real-time decision optimization in both people-centric and process/automated-centric processes.
- Strategic alignment: This segment of the market is about strategically aligning everyone in the organization – from strategy to execution. It is about enabling enterprise and operational visibility. It is about documenting the preferences, priorities, objectives, and requirements that drive decision-making.
This article focuses on one slice of the business analytics space that is based on leveraging structured data to achieve improved business outcomes. Our thesis is that structured data analytics includes three categories of increasing complexity and impact: descriptive, predictive and prescriptive. These three categories support the customer needs described above.
In their book “Competing on Analytics: The New Science of Winning” [1], Thomas Davenport and Jeanne Harris define analytics as “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions,” and define business intelligence as “a set of technologies and processes that use data to understand and analyze business performance.” We propose that business analytics comprises both of these areas. We further refine the analysis of structured data into three categories of analytics:
- Descriptive Analytics: A set of technologies and processes that use data to understand and analyze business performance
- Predictive Analytics: The extensive use of data and mathematical techniques to uncover explanatory and predictive models of business performance representing the inherent relationship between data inputs and outputs/outcomes.
- Prescriptive Analytics: A set of mathematical techniques that computationally determine a set of high-value alternative actions or decisions given a complex set of objectives, requirements, and constraints, with the goal of improving business performance.
Organizations that undertake a journey into the applications of business analytics must begin with an information management agenda that treats data and information as a strategic asset. Once information is treated as an asset, then descriptive, predictive and prescriptive analytics can be applied. Typically an organization begins this journey by examining the data generated from its automation systems: enterprise resource planning, customer relationship management, time and attendance, e-commerce, warranty management and the like. The organization may also have unstructured data, such as contracts; customer complaints; internal e-mails; and, increasingly, image data from facility monitoring systems, as well as unstructured data from various Web sources, such as Facebook, Twitter and blogs. This article addresses only the analysis of structured data but recognizes that structured data, such as time and location, author, topic and other semantic information, is attached to unstructured data. Together, structured and unstructured data establish a 360-degree view of information to improve decision-making. Many of the business analytic techniques used for structured data can be applied to unstructured data as well.
Descriptive analytics can be classified into three areas that answer certain kinds of questions:
- Standard reporting and dashboards: What happened? How does it compare to our plan? What is happening now?
- Ad-hoc reporting: How many? How often? Where?
- Analysis/query/drill-down: What exactly is the problem? Why is it happening?
Descriptive analytics is the most commonly used and best understood type of analytics. It categorizes, characterizes, consolidates and classifies data. Descriptive analytics includes dashboards, reports (e.g., budget, sales, revenue and costs) and various types of queries. Tools for descriptive analytics may provide mechanisms for interfacing to enterprise data sources. They typically include report generation, distribution capability and data visualization facilities. Descriptive analytics techniques are most commonly applied to structured data, although there have been numerous efforts to extend their reach to unstructured data, often through the creation of structured metadata and indices. Descriptive analytics helps provide an understanding of the past as well as of events occurring in real time.
Many descriptive analytics applications are implemented through out-of-the-box business intelligence software solutions or spreadsheet tools; however, version control difficulties may result from a proliferation of spreadsheets. The advantage of a descriptive analytics software platform (business intelligence and information/data management software) is the connectivity it provides to the underlying trusted information management system, as well as the ability to work with data along multiple dimensions to gain insight. Insight into what is happening now or has happened in the past can be useful in making decisions about the future. However, descriptive analytics relies on human review of data; it does not contain robust techniques that facilitate understanding what might happen in the future, nor does it provide tools to suggest what should be done next.
Descriptive analytics does provide significant insight into business performance and enables users to better monitor and manage their business processes. Additionally, descriptive analytics often serves as a first step in the successful application of predictive or prescriptive analytics. Organizations that effectively use descriptive analytics typically have a single view of the past and can focus their attention on the present, rather than on reconciling different views of the past.
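The three kinds of descriptive questions listed earlier (standard reporting, ad-hoc reporting and drill-down) can be illustrated with a minimal Python sketch. The sales records, field layout and function names below are hypothetical and chosen only for illustration; a business intelligence platform would run comparable aggregations against governed enterprise data rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, units, revenue).
SALES = [
    ("East", "widget", 10, 250.0),
    ("East", "gadget", 4, 320.0),
    ("West", "widget", 7, 175.0),
    ("West", "gadget", 1, 80.0),
    ("West", "widget", 3, 75.0),
]

def revenue_by(records, key_index):
    """Standard report: total revenue grouped by one dimension."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3]
    return dict(totals)

def count_where(records, region):
    """Ad-hoc query: how many transactions occurred in a region?"""
    return sum(1 for record in records if record[0] == region)

def drill_down(records, region):
    """Drill-down: break one region's revenue out by product."""
    in_region = [record for record in records if record[0] == region]
    return revenue_by(in_region, 1)

print(revenue_by(SALES, 0))        # what happened, by region
print(count_where(SALES, "West"))  # how many?
print(drill_down(SALES, "West"))   # where exactly is the revenue coming from?
```

Each function only summarizes what has already happened; interpreting the numbers and deciding what to do next is still left to the person reading the report.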
Predictive analytics can be classified into six categories:
- Data mining: What data is correlated with other data?
- Pattern recognition and alerts: When should I take action to correct or adjust a process or piece of equipment?
- Monte-Carlo simulation: What could happen?
- Forecasting: What if these trends continue?
- Root cause analysis: Why did something happen?
- Predictive modeling: What will happen next if?
Predictive analytics uses the understanding of the past to make “predictions” about the future. It is applied either in real time to affect an operational process (e.g., real-time retention actions via chat messages or real-time identification of suspicious transactions) or in batch (e.g., targeting new customers on a Web site or by direct mail to drive cross-sell/up-sell, or predicting churn). These predictions are made by examining data about the past, detecting patterns or relationships in this data and then extrapolating these relationships forward in time. For example, a particular type of insurance claim that falls into a category (pattern) that has proven troublesome in the past might be flagged for closer investigation.
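The insurance claim example can be sketched in a few lines of Python. This is a deliberately simple illustration, not a description of any product: the claim categories, the past outcomes and the 0.5 threshold are all made-up assumptions, and a production model would use many more attributes than a single category.

```python
from collections import Counter

# Hypothetical past claims: (category, was_fraudulent).
PAST_CLAIMS = [
    ("soft_tissue", True), ("soft_tissue", True), ("soft_tissue", False),
    ("windshield", False), ("windshield", False), ("windshield", False),
    ("theft", True), ("theft", False), ("theft", False), ("theft", False),
]

def fraud_rates(past_claims):
    """Learn from the past: observed fraud rate for each claim category."""
    totals, frauds = Counter(), Counter()
    for category, was_fraud in past_claims:
        totals[category] += 1
        if was_fraud:
            frauds[category] += 1
    return {category: frauds[category] / totals[category] for category in totals}

def flag_for_review(new_claims, rates, threshold=0.5):
    """Extrapolate forward: flag new claims in historically troublesome categories.

    Categories never seen before are treated as rate 0.0 (an assumption).
    """
    return [claim_id for claim_id, category in new_claims
            if rates.get(category, 0.0) >= threshold]

rates = fraud_rates(PAST_CLAIMS)
incoming = [("C1", "soft_tissue"), ("C2", "theft"), ("C3", "windshield")]
print(flag_for_review(incoming, rates))
```

The same scoring function could be called once per transaction in real time or over a whole file in batch, which is the distinction drawn in the paragraph above.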
Descriptive analytics may begin by providing a static view of the past, but as more instances are accumulated in the data sources that document past experience, the steps of evaluation, classification and categorization can be performed repetitively by fast algorithms, endowing the overall work process with a measure of adaptability. As descriptive analytics reach the stage where they support anticipatory action, a threshold is passed into the domain of predictive analytics. Predictive analysis applies advanced techniques to examine scenarios and helps to detect hidden patterns in large quantities of data in order to project future events. It uses techniques that segment and group data (transaction, individuals, events, etc.) into coherent sets in order to predict behavior and detect trends. It utilizes techniques such as clustering, expert rules, decision trees and neural networks. Predictive analysis is most commonly used to calculate potential behavior in ways that allow one to:
- Examine time series, evaluating past data and trends to predict future demands (level, trend, seasonality). Advanced methods include identifying cyclical patterns, isolating the impact of external events (e.g., weather), characterizing inherent variability and detecting trends.
- Determine “causality” relationships between two or more time series, for example forecasting the demand for replacement parts at a municipal bus maintenance facility by considering both historical usage rates and known, predicted or seasonal changes in passenger demand.
- Extract patterns from large data quantities via data mining, to predict non-linear behavior not easily identifiable through other approaches. This predicted behavior can be used to create policies that automate actions to be taken in the future; for example, by classifying past insurance claims, future claims can be flagged for investigation if they have a high probability of being fraudulent.
In operational terms, predictive analytics may be applied as a guide to answer questions such as:
- Who are my best customers and what is the best way to target them?
- Which patients are most likely to respond to a given treatment?
- Is this insurance application likely to be rejected?
- Is this a suspicious transaction that may be fraudulent?
It is at this level that the term “advanced analytics” is more aptly applied. Included are techniques for predictive modeling and simulation as well as forecasting. In simulation, a model of the system is created; estimates or predictions about the future behavior of the system are made by exercising the model under a variety of scenarios. Simulation requires being able to build algorithms or mathematical constructs that provide a sufficiently accurate representation of the observable behavior of a system. This in turn can be used to evaluate proposed changes to a system before they are implemented, thus minimizing cost and risk.
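As a rough illustration of simulation, the following Python sketch exercises a very small model under two scenarios to evaluate a proposed change before it is implemented. The demand distribution (normal, mean 100, standard deviation 15) and the two capacity levels are assumptions invented for the example, not figures from any real system.

```python
import random

def simulate_shortfall_rate(capacity, days=10_000, seed=42):
    """Monte Carlo estimate of the fraction of days on which demand exceeds capacity."""
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    shortfalls = 0
    for _ in range(days):
        # Assumed demand model: normally distributed daily demand.
        demand = rng.gauss(100, 15)
        if demand > capacity:
            shortfalls += 1
    return shortfalls / days

# Compare the current system with a proposed change, using the same random demand stream.
current = simulate_shortfall_rate(capacity=110)
proposed = simulate_shortfall_rate(capacity=130)
print(f"current capacity:  shortfall on {current:.1%} of simulated days")
print(f"proposed capacity: shortfall on {proposed:.1%} of simulated days")
```

The value of the exercise depends on how well the assumed demand model represents the observable behavior of the real system, which is the requirement stated above.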
Much of business process modeling falls into this category. Forecasting, which is part of predictive analytics, can be applied in many ways, not the least of which is predicting workload, which is often translated into resources required, including human resources. The forecasting activity establishes a desired end state, and the details are subsequently translated into an agreed-upon operational plan through enterprise planning. Together, descriptive analytics, advanced analytics, enterprise planning and final-mile close/consolidate/compliance activities form a closed-loop performance management system that repeats continually within an organization.
IBM offers the SPSS suite of products that allows clients to address these questions. The aim is to predict the future based on past events. This can be as simple as setting control levels via business rules that declare that when a particular business parameter goes out of a specified range, an alert should arise. It can be a detailed simulation model of how the business works that is played forward in time under different scenarios to evaluate what might happen. It includes statistical forecasting, which uses the past as a predictor of the future in order to gain a better understanding of what might happen if the past trends continue.
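Two of the simpler ideas in this paragraph, a range-based alert rule and a trend forecast, can be sketched in plain Python. This is a generic illustration and does not represent how SPSS or any other product implements them; the three-sigma control limit, the least-squares trend line and the sample history are assumptions chosen for the example.

```python
from statistics import mean, stdev

def out_of_range(value, history, sigmas=3.0):
    """Business-rule alert: True if value lies outside mean +/- sigmas * stdev of history."""
    center, spread = mean(history), stdev(history)
    return abs(value - center) > sigmas * spread

def linear_trend_forecast(history, steps_ahead=1):
    """Fit a least-squares line to the history and extend it forward in time."""
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * (n - 1 + steps_ahead)

weekly_orders = [100, 102, 104, 106, 108]  # hypothetical history
print(out_of_range(130, weekly_orders))    # does a new reading of 130 trigger an alert?
print(linear_trend_forecast(weekly_orders, steps_ahead=1))
```

The forecast simply answers "what if these trends continue?"; it carries no information about events that are absent from the history.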
Predictive modeling techniques can also be used to examine data to evaluate hypotheses. If each data point (or observation) is comprised of multiple attributes, then it may be useful to understand whether some combinations of a subset of attributes are predictive of a combination of other attributes. For example, one may examine insurance claims in order to validate the hypothesis that age, gender and zip code can predict the likelihood of an auto insurance claim. Predictive modeling tools can aid in both validating and generating hypotheses. This is particularly useful when some of the attributes are actions determined by the business decision-makers.
Data is at the heart of predictive analytics, and to drive a complete view, data is combined from descriptive data (attributes, characteristics, geo/demographics), behavior data (orders, transaction, payment history), interaction data (e-mail, chat transcripts, Web click-streams) and attitudinal data (opinions, preferences, needs and desires). With a full view, customers can achieve higher performance such as dramatically lowering costs of claims, fighting fraud and maximizing payback, turning a call center into a profit center, servicing customers faster, and effectively reducing costs.
Beyond capturing the data, accessing trusted and social data inside and outside of the organization, and modeling and applying predictive algorithms, deployment of the model is just as vital in order to maximize the impact of analytics in real-time operations. The IBM goal is to drive predictive insights to the business user at the point of decision, driving real-time actions. Recent product capabilities combine predictive analytics, data mining, business intelligence, event processing and data management, plus business rules and optimization, to automate the high-volume, high-value, actionable decisions taken every day within an organization. Web-based business user interfaces are highly configurable in the field, enabling business users to deploy real-time actionable decision services to a wide range of new business problems.
Prescriptive analytics, which is part of “advanced analytics,” is based on the concept of optimization, which can be divided into two areas:
- Optimization: How can we achieve the best outcome?
- Stochastic optimization: How can we achieve the best outcome and address uncertainty in the data to make better decisions?
Once the past is understood and predictions can be made about what might happen in the future, it is then time to think about what the best response or action will be, given the limited resources of the enterprise. This is the area of prescriptive analytics. Many problems simply involve too many choices or alternatives for a human decision-maker to effectively consider, weigh and trade off – scheduling or work planning problems, for example. Twenty, 15 or 10 years ago these problems could only be solved using computers running algorithms on a particular data set for hours or even days. It was not useful to embed such problem-solving capability into a decision support system since it could not provide timely results. Now, however, with improvements in the speed and memory size of computers, as well as the significant progress in the performance of the underlying mathematical algorithms, similar computations can be performed in minutes. While this kind of information might have been used in the past to shape policy and offer guidance on action in a class of situations, assessments can now be completed in real time to support decisions to modify actions, assign resources and so on.
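To see why such problems overwhelm a human decision-maker, consider a toy work-assignment problem, sketched here in Python. The cost matrix is hypothetical, and the exhaustive search is used only because it is easy to read; it is not how optimization solvers work, since they rely on far more sophisticated mathematical algorithms.

```python
from itertools import permutations
from math import factorial

# Hypothetical cost (in hours) for worker i (row) to complete job j (column).
COST = [
    [9, 2, 7],
    [6, 4, 3],
    [5, 8, 1],
]

def total_cost(cost, jobs):
    """Total cost when worker w is assigned job jobs[w]."""
    return sum(cost[worker][job] for worker, job in enumerate(jobs))

def best_assignment(cost):
    """Try every one-to-one assignment of workers to jobs and keep the cheapest."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda jobs: total_cost(cost, jobs))
    return best, total_cost(cost, best)

print(best_assignment(COST))

# The number of alternatives grows factorially with problem size.
for n in (3, 10, 20):
    print(n, "workers:", factorial(n), "possible assignments")
```

With three workers there are only six alternatives to compare, but the count grows so quickly that enumeration stops being practical well before problems reach realistic size.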
IBM offers the ILOG Optimization products, including ILOG CPLEX Optimization Studio, which allow clients to address these questions. Prescriptive analytics, based on mathematical optimization, is used to model a system of potential decisions, the interactions between those decisions and the factors or constraints limiting combinations of the decisions; it then uses robust mathematical algorithms to search for the best set of decisions that meet the constraints. Optimization is used pervasively in many industries in applications ranging from long-term planning to operational scheduling. Because of the computational requirements to solve an optimization problem, optimization is not applied in high-volume transactional applications. This is in contrast with the real-time applications of predictive analytics. However, the foundational mathematical and statistical techniques of predictive analytics can be combined with optimization in an area called stochastic optimization, where the goal is to create systems that make decisions that take into account the uncertainty in the data. IBM Research is actively investigating applications of stochastic optimization with its customers.
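A small Python sketch can suggest what taking uncertainty into account means in practice. It uses a classic single-period ordering problem with a handful of demand scenarios; the scenarios, probabilities, prices and the simple search over candidate quantities are all illustrative assumptions and do not describe the ILOG products or IBM Research methods.

```python
# Hypothetical demand scenarios: (demand, probability). Probabilities sum to 1.
SCENARIOS = [(80, 0.3), (100, 0.4), (120, 0.3)]
UNIT_COST, UNIT_PRICE = 8.0, 10.0  # assumed purchase cost and selling price

def expected_profit(order_qty, scenarios):
    """Probability-weighted profit: sales are limited by demand, cost by the order."""
    return sum(prob * (UNIT_PRICE * min(order_qty, demand) - UNIT_COST * order_qty)
               for demand, prob in scenarios)

def best_order(scenarios, candidates):
    """Choose the order quantity with the highest expected profit across scenarios."""
    return max(candidates, key=lambda qty: expected_profit(qty, scenarios))

mean_demand = sum(demand * prob for demand, prob in SCENARIOS)
stochastic_qty = best_order(SCENARIOS, range(60, 141))

print("plan for average demand:", mean_demand, expected_profit(mean_demand, SCENARIOS))
print("plan across scenarios:  ", stochastic_qty, expected_profit(stochastic_qty, SCENARIOS))
```

In this made-up instance, ordering for the average demand earns less in expectation than the quantity chosen by weighing all scenarios, because unsold units are costly relative to the margin on each sale.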
Aside from the categorization of a variety of analytical techniques given above, it is also important that these techniques be applied in a disciplined way. As a result, IBM has created a service line, Business Analytics and Optimization, to help clients realize the benefits of these powerful techniques. For those clients, IBM believes the biggest competitive advantage that they can realize from their information is when math is applied in new ways to solve specific challenges or opportunities within their business. For this reason, it is important to design a holistic framework that enables better use of advanced analytics by carefully defining and prioritizing the key business questions or opportunities which closely align with the performance management process and then focus on building the analytical tools for each of those questions and opportunities in a structured and prioritized order, leveraging a consistent data foundation.
IBM often begins the advanced analytics journey for clients by focusing on two related, but different objectives: efficiency and effectiveness.
- Efficiency. In the supply chain area, advanced analytics are often used to produce and/or deliver a set of services or products as efficiently as possible in order to meet defined customer needs or demands. Advanced analytics techniques such as inventory optimization, advanced planning and scheduling of resources or production plans, and supply chain network design/optimization represent common ways that companies apply advanced analytics to improve their ability to minimize the costs of delivering upon a given set of business and marketing goals associated with perceived customer needs and desires. Achieving efficiency in operations, however, does not necessarily mean that those operations are effective if the goals and objectives are based on an imperfect understanding of the needs and desires of the customers in the first place.
- Effectiveness. One phrase that is often used to describe the meaning of effectiveness for retail companies is “the right product/service, at the right place, at the right time, at the right price.” From an advanced analytics perspective, this is about applying predictive analytics to better understand what customers truly want and to understand the underlying drivers behind their buying behaviors. This often takes the form of advanced customer segmentation, pricing optimization, demand forecasting, marketing mix optimization, social network analysis, social media analysis and many other customer analytics techniques.
We see many organizations delay the use of advanced analytics until after they have fully rationalized and developed complex data warehouses and verified the quality of all data. However, beginning the process of applying advanced analytics to address critical business questions, challenges or opportunities can actually serve as an extremely valuable input into a broader data warehousing and data quality initiative. This occurs by highlighting the key data and data transformations that are needed to drive important business insights, decisions and actions. As is often the case when applying truly advanced mathematical and data analytics concepts, it is not known what questions can actually be answered until the analytical discovery process has started. These previously unforeseen questions or unforeseen ways to analyze data can have a significant impact on the design of data warehouses and information management systems. For this reason, we often separate the application of advanced analytics into two categories:
- Creating the mathematical models. Companies can easily use data from a wide variety of sources and formats during the model creation process to create and refine models that reflect in-depth understanding of the relationship between inputs and desired outcomes. Data quality issues can also be addressed during the model creation process in an exploratory manner. In most cases, IBM believes in scoping and phasing these activities around business challenges that can be solved in 8- to 12-week intervals so that the business can see, understand and validate the value that these advanced analytics models can produce.
- Putting the models in action. Once the models or algorithms have been created and validated, then it is time to build the data architectures and systems to assemble current or real-time data into a model or algorithm that is then architected into an operational or planning system to drive decisions or actions for a business. This is the stage where completing the enterprise wide master data management and data quality activities becomes critical.
The vision at IBM is that descriptive analytics allows an understanding of what has happened, while advanced analytics, consisting of both predictive and prescriptive analytics, is where there is real impact on the decisions made by businesses every day. As IBM embarks on its strategy for business analytics, we envision applications that will take advantage of the combination of descriptive, predictive and prescriptive analytics. With the combination of the individual software offerings in business analytics, industry-specific analytic solutions, the business analytics and optimization service line and IBM Research, IBM is aiming to be a market leader in business analytics.
Irv Lustig (email@example.com) is Business Unit Executive, [IBM] ILOG Optimization Solutions Leader, responsible for sales strategy for selling solutions based on ILOG optimization products. Brenda Dietrich (firstname.lastname@example.org) is an IBM Fellow and Vice President, Business Analytics and Mathematical Sciences, IBM T J Watson Research Center. Christer Johnson (email@example.com) is IBM Global Business Services Leader for Advanced Analytics and Optimization Services for North America. Christopher Dziekan (Christopher.Dziekan@ca.ibm.com) is Chief Strategy Officer for Business Analytics Software at IBM and leads the Office of Strategy.
1. Davenport, Thomas H., and Jeanne G. Harris, 2007, “Competing on Analytics: The New Science of Winning,” Boston, Mass., Harvard Business School Press, p. 7.