
Tuesday 28 January 2014

Natural Language Processing

Sonali Narang
Computer Applications Department
sonalinarang9@gmail.com


INTRODUCTION
There have been high hopes for Natural Language Processing. Natural Language Processing, also known simply as NLP, is part of the broader field of Artificial Intelligence, the effort to make machines think. Computers may appear intelligent as they crunch numbers and process information with blazing speed, but in truth they understand nothing beyond on and off and are limited to exact instructions. Since the invention of the computer, scientists have therefore been attempting to make computers not merely appear intelligent but actually be intelligent. A truly intelligent computer would not be limited to rigid computer-language commands; it would be able to process and understand a natural language such as English. This is the concept behind Natural Language Processing.

The phases a message goes through during NLP are message, syntax, semantics, pragmatics, and intended meaning (M. A. Fischer, 1987). Syntax is the grammatical structure, semantics is the literal meaning, and pragmatics covers world knowledge, knowledge of the context, and a model of the sender. Alan Turing made a prediction about this kind of capability in 1950 (Daniel Crevier, 1994, p. 9): "I believe that in about fifty years' time it will be possible to program computers to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning."
But in 1950 computer technology was still very limited. Because of these limitations, the NLP programs of the day focused on exploiting the strengths computers did have. For example, a program called SYNTHEX tried to determine the meaning of sentences by looking up each word in its encyclopedia.
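To make these phases concrete, here is a purely illustrative Python sketch; every function in it is a hypothetical placeholder invented for this example, not part of any real NLP library.

# Conceptual sketch only: each function is a stand-in for the corresponding
# phase (syntax, semantics, pragmatics), not a real NLP API.

def syntactic_analysis(tokens):
    # Grammatical structure; here just a flat placeholder "parse".
    return [("TOKEN", t) for t in tokens]

def semantic_analysis(parse):
    # Literal meaning; here a bag of words standing in for a logical form.
    return {"content": [word for _, word in parse]}

def pragmatic_analysis(meaning, context):
    # World knowledge, context, and a model of the sender narrow the
    # literal meaning down to the intended meaning.
    return {"literal": meaning, "context": context}

message = "the report is due tomorrow"
parse = syntactic_analysis(message.split())
meaning = semantic_analysis(parse)
print(pragmatic_analysis(meaning, context={"sender": "project manager"}))

A real system would replace each placeholder with substantial machinery for parsing, meaning representation, and context modelling.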
OBJECTIVES
The goal of Natural Language Processing (NLP) is to design and build software that will analyze, understand, and generate the languages that humans use naturally, so that eventually you will be able to address your computer as though you were addressing another person.
This goal is not easy to reach.
 "Understanding" language means, among other things, knowing what concepts a word or phrase stands for and knowing how to link those concepts together in a meaningful way. It's ironic that natural language, the symbol system that is easiest for humans to learn and use, is hardest for a computer to master.
Long after machines have proven capable of inverting large matrices with speed and grace, they still fail to master the basics of our spoken and written languages.


Instructional Objective

The aim is to show how an intelligent language system can be developed. As a first step, students must understand the need for NLP and the ambiguities involved in processing natural language; the difference between natural and formal languages and what it takes to process the former; the steps involved in natural language understanding; the kinds of information required, i.e. syntax, semantics, world knowledge, phonology, and morphology; and basic language operations such as semantic processing, knowledge representation, parts-of-speech tagging, and morphological analysis.
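As a small, hedged illustration of the last two operations, the sketch below uses the NLTK toolkit (assumed to be installed) for parts-of-speech tagging and a rough morphological analysis via lemmatization; the example sentence and the tag-mapping table are made up for this example.

import nltk
from nltk.stem import WordNetLemmatizer

# First run only: fetch the tagger model and WordNet data
# (exact resource names can vary slightly between NLTK versions).
nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("wordnet", quiet=True)

tokens = "The students were analyzing noisy sentences".split()

# Parts-of-speech tagging: each token receives a Penn Treebank tag.
tagged = nltk.pos_tag(tokens)
print(tagged)  # e.g. [('The', 'DT'), ('students', 'NNS'), ('were', 'VBD'), ...]

# A rough morphological analysis: map each coarse tag onto a WordNet word class
# and reduce the word to its lemma (students -> student, were -> be, ...).
wordnet_pos = {"J": "a", "V": "v", "N": "n", "R": "r"}
lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(word, wordnet_pos.get(tag[0], "n"))
       for word, tag in tagged])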

METHODOLOGY
Medical language processing (MLP) systems that codify information in textual patient reports have been developed to help solve the data-entry problem. Some systems have been evaluated to assess their performance, but there has been little evaluation of the underlying technology. The different MLP systems use various methodologies, yet a comparison of the methods has not been performed, although evaluations of MLP methodologies would be extremely beneficial to the field. This section describes a study that evaluates the different techniques. To accomplish this, an existing MLP system, MedLEE, was modified, and results from a previous study were used. Based on confidence intervals and on differences in sensitivity and specificity between each technique and all the others combined, the results showed that the two methods based on obtaining the largest well-formed segment within a sentence had significantly higher sensitivity than the others, by 5% and 6%. The method based on recognizing a complete sentence had significantly worse sensitivity than the others, by 7%, and better specificity, by 0.2%. None of the methods had significantly worse specificity.
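To make the evaluation measures concrete, here is a minimal Python sketch of how sensitivity, specificity, and a simple 95% confidence interval can be computed from raw agreement counts; the counts are invented for illustration and are not taken from the study.

import math

def sensitivity_specificity(tp, fn, tn, fp):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)

def wald_ci(p, n, z=1.96):
    # Simple (Wald) 95% confidence interval for a proportion p measured on n items.
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# Invented counts: a technique that recovers 85 of 100 reference findings and
# wrongly asserts 8 of 400 findings that are absent from the reports.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=392, fp=8)
print(f"sensitivity = {sens:.1%}, 95% CI {wald_ci(sens, 100)}")
print(f"specificity = {spec:.1%}, 95% CI {wald_ci(spec, 400)}")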

MAJOR TASKS IN NLP

NLP comprises a number of established tasks, some of which have direct real-world applications, while others serve as subtasks that help solve larger problems. What distinguishes these tasks from other potential and actual NLP tasks is the volume of research devoted to them: each has a well-defined problem setting, a standard evaluation metric, standard corpora on which systems can be evaluated, and competitions devoted to it.

THE FUTURE IN NLP

Human-level natural language processing is an AI-complete problem. That is, it is equivalent to solving the central artificial intelligence problem—making computers as intelligent as people, or strong AI. NLP's future is therefore tied closely to the development of AI in general.
As natural language understanding improves, computers will be able to learn from the information online and apply what they learned in the real world. Combined with natural language generation, computers will become more and more capable of receiving and giving instructions.

CONCLUSION
The challenges we face stem from the highly ambiguous nature of natural language. As an English speaker you effortlessly understand a sentence like "Flying planes can be dangerous". Yet this sentence presents difficulties to a software program that lacks both your knowledge of the world and your experience with linguistic structures. Is the more plausible interpretation that the pilot is at risk, or that the danger is to people on the ground? Should "can" be analyzed as a verb or as a noun? Which of the many possible meanings of "plane" is relevant? Depending on context, "plane" could refer to, among other things, an airplane, a geometric object, or a woodworking tool. How much and what sort of context needs to be brought to bear on these questions in order to adequately disambiguate the sentence?
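The structural side of this ambiguity can be made explicit with a toy grammar. The sketch below uses NLTK's chart parser with a small context-free grammar written purely for this example; it yields two parse trees for the sentence, one in which "flying" is a gerund verb taking "planes" as its object, and one in which it is an adjective modifying "planes".

import nltk

# Toy grammar invented for this example; it deliberately licenses both
# readings of "flying planes can be dangerous".
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Gerund NP | Adj N | N
    VP -> Modal VP | V Adj
    Gerund -> 'flying'
    Adj -> 'flying' | 'dangerous'
    N -> 'planes'
    Modal -> 'can'
    V -> 'be'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("flying planes can be dangerous".split()):
    tree.pretty_print()  # one tree per reading

Choosing between the two trees is precisely where the world knowledge and contextual reasoning discussed above come in; the grammar alone cannot decide.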
We address these problems using a mix of knowledge-engineered and statistical/machine-learning techniques to disambiguate and respond to natural language input. Our work has implications for applications such as text critiquing, information retrieval, question answering, summarization, gaming, and translation. The grammar checkers in Office for English, French, German, and Spanish are outgrowths of our research; Encarta uses our technology to retrieve answers to user questions; Intellishrink uses natural language technology to compress cellphone messages; and Microsoft Product Support uses our machine translation software to translate the Microsoft Knowledge Base into other languages. As our work evolves, we expect it to benefit any area in which human users can gain by communicating with their computers in a natural way.

