Last week here in New York we had the opportunity to invite a couple of dozen law firms to “An Introduction to IBM Watson” at the brand new $1-billion IBM Watson facility down on Astor Place.
This is not going to be a report on that event, except insofar as it helped advance our thinking on the general concept of “machine learning,” which was also the topic of a lead article in the current McKinsey Quarterly. That piece, “An Executive’s Guide to Machine Learning,” subtitled with the observation that machine learning is “no longer the preserve of artificial intelligence researchers and born-digital companies like Amazon, Google, and Netflix,” is a primer on the need for machine learning in a world of big data, and on how it is emerging as a mainstream management tool.
The proper starting point is simply to define “machine learning.” Here’s what McKinsey has to say:
Machine learning is based on algorithms that can learn from data without relying on rules-based programming. It came into its own as a scientific discipline in the late 1990s as steady advances in digitization and cheap computing power enabled data scientists to stop building finished models and instead train computers to do so.
Even this definition would benefit from a bit of unpacking, so let me take a stab at it with an example. For years, AI researchers focused on writing exhaustive lists of rules to guide computers in tasks such as recognizing whether a given image was a cat. Yet no matter how many rules were piled on top of rules, exceptions and unforeseen situations invariably arose, at which point the system was essentially helpless: no available rule applied, so the rule set was exhausted.
Machine learning took the opposite approach: Without relying on any rules whatsoever, it simply ingested a large data set of (in this case) cat images, and began to generate its own scoring system to arrive at probabilities on whether image X was in fact a cat. Note one key difference from rules-based systems: Machine learning is designed to achieve probabilities, so whereas a rules-based system that came to a “dead end” with rules couldn’t offer any opinion whatsoever, at least machine learning could venture an opinion joined to a probabilistic confidence level.
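To make the contrast concrete, here is a deliberately tiny sketch in Python. The feature names, training examples, and scoring rule are all invented for illustration and bear no resemblance to Watson’s actual machinery (real systems learn from raw pixels, not hand-picked attributes): a rules-based classifier that falls silent when its rules run out, next to a crude learned scorer that always ventures a probability.

```python
# Toy contrast: rules-based vs. probabilistic (learned) classification.
# All features and data are invented for illustration only.

def rules_based_is_cat(features):
    """Hard-coded rules; returns None (no opinion) on unforeseen input."""
    if features.get("says_meow"):
        return True
    if features.get("barks"):
        return False
    return None  # rule set exhausted -- the system is "helpless"

def train(examples):
    """Learn, per feature, the fraction of cat examples that show it."""
    cats = [feats for feats, label in examples if label]
    all_features = {k for feats, _ in examples for k in feats}
    return {f: sum(1 for c in cats if c.get(f)) / len(cats)
            for f in all_features}

def learned_cat_probability(weights, features):
    """Average the learned weights of the features present (a crude score)."""
    present = [weights.get(f, 0.5) for f, v in features.items() if v]
    return sum(present) / len(present) if present else 0.5

examples = [
    ({"says_meow": True, "has_whiskers": True}, True),
    ({"has_whiskers": True, "has_tail": True}, True),
    ({"barks": True, "has_tail": True}, False),
]
weights = train(examples)

novel = {"has_whiskers": True, "has_tail": True}  # no rule covers this
print(rules_based_is_cat(novel))                          # None
print(round(learned_cat_probability(weights, novel), 2))  # 0.75
```

Run on an input no rule anticipated, the rules-based system offers no opinion at all, while even this crude learned scorer still ventures a probability derived from the examples it has seen.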
Another point: The larger the data set the machine learning algorithm can review, the more accurate it will become (and it never forgets what it’s already seen).
I feel compelled to intercept a debate that may be going on in some readers’ minds at about this point: If you’re asking yourself whether the humans or the machines will be the ultimate winners in all this, might I respectfully suggest that is massively the wrong question. As a practical matter, we simply don’t know, but then we have never known at the introduction of any major new technology. As an optimist and an armchair business historian, my money is on the machines providing a platform from which humans can vault to the next level of productivity, insight, creativity, and intellectual achievement.
Back to the thread.
Now, recognizing cats is obviously a trivial example, but Watson and other implementations are addressing far more sophisticated and consequential decisions. Nowhere is the work probably more advanced than in assisting medical diagnoses, and those of us who were able to attend the event at IBM last week saw a powerful, literally life-saving example of that type of domain expertise in action.
How is machine learning different from classical statistics, including such things as regression analysis?
Probably the most important distinction is that to arrive at hypotheses about correlations, statistics has to rely on the conjectures of statisticians (or other experts in the field). As we have all learned over the past decade or so, unconscious and unrecognized biases creep in when humans are applying judgment. Here’s an example from the McKinsey article (emphasis supplied):
Closer to home, our colleagues have been applying hard analytics to the soft stuff of talent management. Last fall, they tested the ability of three algorithms developed by external vendors and one built internally to forecast, solely by examining scanned résumés, which of more than 10,000 potential recruits the firm would have accepted. The predictions strongly correlated with the real-world results. Interestingly, the machines accepted a slightly higher percentage of female candidates, which holds promise for using analytics to unlock a more diverse range of profiles and counter hidden human bias.
In other words, with classical statistics, to test hypotheses statisticians first have to have hypotheses, and that’s where human bias can come in.
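A toy illustration of that distinction, with made-up feature names and numbers: the classical route tests the single correlation the analyst thought to hypothesize, while a data-driven system can simply score every pair of features and let the strong correlations surface on their own, no prior conjecture required.

```python
# Toy contrast: hypothesis-driven statistics vs. hypothesis-free search.
# Feature names and values are invented for illustration only.

from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

data = {
    "years_experience": [1, 3, 5, 7, 9],
    "interview_score":  [60, 65, 80, 85, 95],
    "shoe_size":        [9, 7, 10, 8, 9],
}

# Classical route: test the one correlation the analyst hypothesized.
print(round(pearson(data["years_experience"], data["interview_score"]), 2))  # 0.99

# Data-driven route: score every pair, no prior hypothesis required.
for a, b in combinations(data, 2):
    print(a, b, round(pearson(data[a], data[b]), 2))
```

The second loop is, of course, a caricature of what real machine-learning systems do, but it captures the point: the machine does not need a human to guess in advance which relationships are worth examining.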
Watson sounds really neat, but will it really make more of an impact on law practice than other machine learning systems? I’m unconvinced.
There is currently no single overarching best machine learning technology; rather, there are different best approaches depending on the problem, and constant advances in understanding of which approach is best for a specific problem. IBM is devoting a lot of resources to Watson, but other big companies are also pouring money into machine learning (e.g., Google, Microsoft, HP, Facebook, Baidu, Amazon). Perhaps as important, lots of companies are building machine learning technology for specific verticals (like us with contract review). Current machine learning is quite problem-specific, and these companies are gaining experience honing their technology for their particular use cases. Will Watson’s technology really be better for specific verticals (like law, or sub-areas within law) than companies focused on those specific verticals?
For a much more detailed analysis of these points, see my recent post “One Ring to Rule Them All? Will IBM’s Watson Transform Contract Review and Law Practice?”:
http://info.kirasystems.com/blog/one-ring-to-rule-them-all-will-ibms-watson-change-law-practice
I have read the ASE posts on Watson and machine learning as using Watson as an example (in fact, as almost surely the only example most of us would recognize by name), not as an endorsement, much less a prediction.
The McKinsey article contains some additional information that strikes me as crucial, specifically the need for users of machine learning, especially in areas like law, to have access to two classes of personnel: “Quants” and “Translators.” The harder position to fill will be your Translator, who will need to work in both directions with something like equal facility. It is not just explaining to the C-suite what that graph actually says and why its results should be found reliable; it is also being able to take the strategic directions of the company and explain to the Quants what is required. If the problem is not understood properly, and if all parties lack clarity as to what counts as an answer to it, then the Quants will go off and do their thing, and it may be some fair time before anyone knows whether the problem has been addressed in ways that are in fact useful.
Where will one find / how will one develop the Translators?