The times they are a changin’ for advanced analytics
Statistical modelers urged to embrace machine learning, open-source tools for the road ahead.
By Sameer Chopra
My thesis below addresses the following points:
- While statistical modeling is not going away, analytics groups are advised to leverage machine-learning approaches as well.
- While traditional statistical modeling software packages are not going away, analytics groups need to actively embrace new skill sets in emerging software such as open-source tools (e.g., R, MongoDB) and Big Data tools (e.g., Hadoop). Big Data is just getting bigger, and new tools are emerging that round out the tool suite of analytics groups.
Statistical Modeling vs. Machine Learning
Since the mid-1990s I have used statistical modeling tools such as SAS as the primary tool for advanced analytics. I would place myself squarely in the camp of “statistical modelers” (vs. my machine learning friends – though I realize some might quibble with this distinction). Over the years I have led teams of statistical analysts who have primarily used such statistical packages as SAS/SPSS/S-Plus, etc. as their go-to analysis tool.
In my current capacity, I am responsible for advanced analytics at Orbitz Worldwide. Advanced analytics is a strategic lever at Orbitz and has the good fortune of executive support at the highest levels. Competing on analytics is feasible only if there is buy-in at the highest levels.
I lead the traditional statistical modelers as well as the chief scientist and the machine learning (ML) crew. At Orbitz, we have found value in incorporating both types of data mining professionals (machine learners and statistical modelers) because there are many problems well suited to each camp. For example, the statistical modelers effectively address areas such as marketing mix analysis, predictive models across online marketing channels, customer lifetime value models, churn models, credit card fraud models, etc. Similarly, the machine learning staff deploys its algorithms in areas that rely on Big Data, where system feedback is used to learn quickly from patterns and self-improve, such as the Hotel Recommendation Engine and Hotel Sort on the Orbitz Web site.
Conceptually, both camps are “data mining” professionals, so there is a lot of overlap. For instance, both fields do work with some common methods such as decision trees and clustering algorithms. I also find that the camps often use different jargon for the same basic concepts (“weights” vs. “parameters,” “learning” vs. “fitting,” etc.).
However, I find machine learning to be clearly cut from a different cloth: the contrast in tools and approaches between ML practitioners and statistical modelers is rather stark. The following are but a few examples to illustrate some differences between the two sides:
- Apart from cosmetic differences in the labels used, statistical modeling takes a probabilistic approach with a strong emphasis on parametric assumptions, regression diagnostics, inference, hypothesis testing, model interpretability and so on, areas that receive far less emphasis in the ML world.
- On the flip side, ML practitioners regularly use tools such as support vector machines (SVM), tools that are not commonly used by statistical modelers. ML focuses on predictive accuracy and not much on interpretation of models. Note that ML has its roots in artificial intelligence (AI), and practitioners of machine learning usually tend to have a strong computer science background – another key difference.
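To make the contrast in emphasis concrete, here is a minimal Python sketch using only the standard library. The data points are invented for illustration, and the helper names (least_squares, slope_t_statistic, holdout_rmse) are hypothetical, not taken from any Orbitz system or statistical package. The same straight-line model is examined twice: once the way a statistical modeler might (is the slope significantly different from zero?) and once the way a machine learner might (how well does it predict rows it has not seen?).

```python
import math

# Invented toy data (not from the article): y is roughly 3*x + 2 plus noise.
data = [(1, 5.2), (2, 7.9), (3, 11.1), (4, 13.8), (5, 17.3),
        (6, 19.9), (7, 23.2), (8, 25.7), (9, 29.1), (10, 32.0)]


def least_squares(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slope_t_statistic(points):
    """Statistical-modeling view: is the slope distinguishable from zero?"""
    slope, intercept = least_squares(points)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    # Residual sum of squares around the fitted line.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    # Standard error of the slope uses n - 2 degrees of freedom.
    std_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_error


def holdout_rmse(train, test):
    """Machine-learning view: how well does the model predict unseen rows?"""
    slope, intercept = least_squares(train)
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Simple alternating split: odd-positioned rows train, even-positioned rows test.
train, test = data[::2], data[1::2]

print("t-statistic for slope:", round(slope_t_statistic(data), 1))
print("held-out RMSE:", round(holdout_rmse(train, test), 3))
```

Both functions rest on the same fitted line. What differs is the question asked of it, which is roughly the distinction the two bullets above draw between inference and predictive accuracy.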
The comparison sparked the following question: “Which side of this analytics fence lends itself better to the road ahead?” My (likely controversial) response: “At this point in time, machine learning!” In fact, never before has the need for machine-learning skills been as forceful and urgent as it is today. I am not implying that statistical modeling is going away, but I am stating that machine learning is rapidly increasing in relevance and prominence. It makes sense for analytical teams to complement their skill sets by incorporating machine-learning approaches in order to be better positioned for the road ahead.
Not surprisingly, general interest in machine learning has exploded in the past year. Late last year, Stanford University offered a free online course in ML/AI that went viral to the point of having well over 100,000 students register from around the world in a matter of weeks! (This speaks to both the growing interest in ML as well as to a fundamental paradigm shift in the making vis-à-vis the educational method/framework.)
Big Data & Open Source Analytics
Machine learning lends itself well to situations where algorithms must be designed and developed against high-dimensional data and where computational issues are very important. The Big Data paradigm shift, along with open source tools, is ideally suited for ML to leverage.
The open source language R has become the data-mining tool of choice for machine learners for the following reasons:
- R has very good integration with Hadoop, an area where established commercial statistical tools have frankly been playing catch-up over the past year. (Note: At the time of this writing, some established statistical solution providers were announcing an access interface to Hadoop.)
- Many startups and smaller firms do not have deep pockets and are embracing open source tools such as the R programming language and NoSQL database systems such as MongoDB.
- R is a leading language for developing new statistical methods, and it is a platform for statistical innovation and collaboration across both the corporate world and academia. In my opinion, for the first time in years, the stronghold of established commercial players seems to be potentially threatened; open source tools are better suited for Big Data and will slowly but surely continue to take share away from commercialized statistical packages. In fact, traditional statistical vendors have recognized that R is a force to be reckoned with. In response, many of these vendors have developed hooks into R so users can interface with the R language.
- Based on the resumes I’ve been reading, the next generation of data miners is flocking to R as their go-to tool. Professors in general are comfortable with R; they tend to use R and Excel as part of their curriculum.
In short, open-source analytics tools and platforms have arrived.
R hasn’t been widely adopted in the corporate world because it used to be considered (and still is to a large extent) not quite “enterprise ready,” but even that is changing as firms such as Revolution Analytics focus on the enterprise capabilities for R.
Despite some hype associated with the topic of Big Data, it is generally acknowledged that Big Data and distributed computing are rapidly changing the analytics landscape. Leveraging Hadoop and being well-versed in writing MapReduce jobs is quickly transitioning from a “nice to know” to a “must have” skill. Here again, machine learning practitioners tend to adapt seamlessly, whereas many traditional statistical modelers seem to face a “who moved my cheese” syndrome. Prerequisites such as being well-versed in Python or Java tend to be second nature to those in the ML camp.
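For statistical modelers who have not yet written a MapReduce job, the following Python sketch shows the shape of one. It is an in-memory illustration only: it does not use Hadoop or any Hadoop API, the log lines are invented, and the function names (map_phase, shuffle_phase, reduce_phase, run_job) are hypothetical. On a real cluster the framework would distribute the map and reduce steps across machines and perform the shuffle itself.

```python
from collections import defaultdict

# Hypothetical log lines (invented for illustration): "user_id,city_searched".
log_lines = [
    "u1,Chicago",
    "u2,Boston",
    "u3,Chicago",
    "u1,Denver",
    "u4,Chicago",
    "u2,Boston",
]


def map_phase(line):
    """Map: emit one (key, value) pair per input record."""
    _user_id, city = line.split(",")
    yield city, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Chain the three phases over an in-memory list of records."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


searches_per_city = run_job(log_lines)
print(searches_per_city)  # {'Chicago': 3, 'Boston': 2, 'Denver': 1}
```

The point of the pattern is that map_phase and reduce_phase each see only one record or one key at a time, which is what allows the work to be spread over many machines.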
What does this mean for today’s traditional statistical modelers?
Gone are the days when a statistical analyst might have been complacent about a relatively slowly changing world (relative to, say, a computer science or IT professional who had to strive more to stay current with changing languages and new tools). In order to stay competitive, it would behoove traditional statistical modelers to proactively plunge into professional development mode and take a page from the book of our machine-learning friends.
Specifically, the best-in-class analytical organizations of the future will be those that embrace traditional statistical modeling and machine learning approaches along with established and emerging tools and technology associated with Big Data analytics, including R, Hadoop/HDFS, MapReduce, Java/Python, Pig, Hive, etc.
The times they are a changin’….
Sameer Chopra (Sameer.Chopra@orbitz.com) is vice president of Advanced Analytics at Orbitz Worldwide, Inc., a leading global online travel company. He has more than 15 years of experience in applying data mining and predictive analytics across various business domains at both Fortune 500 firms and startups. Before joining Orbitz, Chopra led the Marketing Analytics and Web testing team at Intuit’s Small Business Group and served as director of analytics at eBay. He holds a master’s degree in Operations Research from the Massachusetts Institute of Technology.