Natural Language Understanding and Processing are Key to Unlocking Information Value
Five years ago, as organizations were starting to grapple with Big Data and Big Content and what to do about them, AIIM issued an eBook called The Big Data Balancing Act – Too much yin and not enough yang?
I recently had occasion to take another look at that eBook. I was pleased to see that, rather than being dated, it was well ahead of its time in understanding the issues now facing the information management space, including the key role that natural language processing (NLP) and natural language understanding (NLU) will play in resolving them.
The core thesis of the eBook was that yes, we were on the cusp of a massive growth in demand for data scientists — the “yin” of Big Data and Big Content.
“The current big data story talks about predictive analytics, text-mining, Hadoop, MapReduce and the other latest toys – sorry, tools – that can process all of the zettabytes of information coming at your organization daily….A perfect storm of data, where capability and execution can now match expectation, has arrived – and there is a vast array of literature available describing why this has happened, what tools and technologies are around to cope with this, and how the data scientist fits in.”
However, organizations need more than the “yin” (data scientists) in order to be successful. They also need the “yang”: business executives who understand data and information and can use them in totally new combinations to structure new hypotheses and insights about the business. Call this new breed of executive a data entrepreneur or information professional:
“No matter how well a data scientist ‘understands’ business, the most valuable insights that can be derived from a big data project all require intimate knowledge of the specific business areas of an organization; a combination of detailed subject matter knowledge and experience, awareness of what questions need to be asked, and how the subsequent results can be deployed.”
These two sets of skills — data scientists, who have their “ears to the business and their eyes full on the data,” and data entrepreneurs, who have “exactly the opposite focus, eyes full on the business, ears to the data” — are both critical to capitalizing on new opportunities created by artificial intelligence and machine learning.
Natural language understanding, NLU (the ability of machines to understand and draw inferences about what information actually means, based on the context in which it is used) and natural language processing, NLP (the ability to query machines in natural language and have machines understand these commands) lie at this critical intersection between the data scientist and the data entrepreneur.
The current challenge is that data and content in most organizations are not only BIG, they are also DARK. All of the machine learning technology in the world will be for naught if the raw material fueling machine learning is incomprehensible. Metadata is the key to solving this problem and opening the locks on information value. Organizations need the capability to understand BOTH semantics AND concepts in their content and communication systems.
NLU is critical to assigning metadata where it doesn’t exist and to the ability of machines to understand what an information object is and what it means. This task is simply too big and too complex to be performed by humans. Automation through artificial intelligence is needed in order to keep up with the rising scale and complexity of information coming into our organizations.
And as we then query this information, and as data and information entrepreneurs form new hypotheses about the business and customers based on it, NLP becomes a key enabler. The data entrepreneur needs to focus on which business questions are of interest, create a series of hypotheses that can help answer those questions (or at least move closer to an answer), and determine what data can assist in the process. The data entrepreneur will also expect to perform this function in natural language, not machine language.
We are in an era in which work will be radically redefined, people will have much more powerful tools, and mundane and repetitive work will transfer to machines. To do this, and to do it at scale, organizations need technology that automatically draws intelligence from text in unstructured content, streamlining the process of classifying, processing and securing business information while also providing contextual insights on related content assets and workflows.