Definition
Natural Language Processing (NLP) is the field of computer science that studies and develops algorithms and models enabling computers to process, understand, and generate natural human language (text and speech). NLP combines computational linguistics with machine learning to solve practical problems that require semantic understanding.
The discipline is interdisciplinary: linguistics, computer science, cognitive psychology, and statistics converge on how to formalize language and teach systems to handle it.
Fundamental Components
Morphology and Syntax: analysis of linguistic structure.
- Tokenization: segmentation of text into tokens (words, subwords)
- Part-of-speech tagging: identification of nouns, verbs, adjectives
- Parsing: extraction of syntactic structures (dependency trees)
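The first of these steps can be illustrated with a minimal regex-based word tokenizer (a toy sketch; production systems use trained subword tokenizers such as BPE):

```python
import re

def tokenize(text: str) -> list[str]:
    # Runs of word characters form one token; each punctuation mark
    # becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Computers don't understand text directly."))
# ['Computers', 'don', "'", 't', 'understand', 'text', 'directly', '.']
```

Even a contraction like "don't" already forces a design decision: this naive rule splits it into three tokens, whereas subword tokenizers learn such splits from data.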
Semantics: meaning.
- Word sense disambiguation: which meaning of a word is intended?
- Relationship extraction: which entities are correlated and how?
- Semantic role labeling: who does what to whom?
Pragmatics and Discourse: context and intention.
- Coreference resolution: which pronouns refer to which entities?
- Sentiment analysis: what is the emotional tone of the text?
- Entailment: does one sentence logically imply another?
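As a concrete illustration of why pragmatics is hard, consider a minimal lexicon-based sentiment scorer (the word lists are invented for this sketch):

```python
# Invented toy lexicons; real systems use learned representations.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def sentiment(text: str) -> str:
    # Count positive vs. negative words and compare.
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))  # positive
```

Sarcasm ("great, just what I needed") inverts the lexical signal entirely, which is exactly the kind of context-dependence that pragmatics studies.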
Main NLP Tasks
Understanding (Discriminative):
- Classification: sentiment, spam detection, topic categorization
- Named Entity Recognition (NER): identification of people, places, organizations
- Relation extraction: extraction of relations between entities
- Question Answering: answering questions about text
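A toy rule-based NER shows both the idea and why hand-written rules are fragile (the pattern below is illustrative only):

```python
import re

def naive_ner(text: str) -> list[str]:
    # Toy rule: any run of capitalized words is an "entity".
    # Fails on sentence-initial words, lowercase brands, all-caps acronyms, etc.
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)

print(naive_ner("he met Ada Lovelace in New York"))
# ['Ada Lovelace', 'New York']
```

Statistical and neural taggers replace this brittle pattern with features or representations learned from annotated corpora.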
Generation (Generative):
- Machine translation: text from one language to another
- Summarization: synthesis of long documents
- Text generation: creation of coherent text (articles, creative writing)
- Dialogue: conversational systems
Structured Prediction:
- Tagging: assigning labels to sequences (POS tagging, NER, chunking)
- Parsing: extraction of structures (syntax trees, dependency graphs)
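Structured prediction can be made concrete with the classic most-frequent-tag baseline for POS tagging, sketched here on an invented three-sentence corpus:

```python
from collections import Counter, defaultdict

# Tiny invented training corpus of (word, tag) pairs.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("runs", "NOUN")],  # "runs" is ambiguous
]

counts = defaultdict(Counter)
for sentence in train:
    for word, tag_ in sentence:
        counts[word][tag_] += 1

def tag(words: list[str]) -> list[str]:
    # Assign each word its most frequent training tag; unknown words -> NOUN.
    return [counts[w].most_common(1)[0][0] if w in counts else "NOUN"
            for w in words]

print(tag(["the", "dog", "runs"]))  # ['DET', 'NOUN', 'VERB']
```

Despite ignoring context entirely, this baseline is a surprisingly strong starting point for POS tagging; sequence models (HMMs, CRFs, neural taggers) improve on it by modeling neighboring tags.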
Methodological Evolution
Era 1: Rule-based (1950s-1980s): hand-written rule systems. Fragile, limited to restricted domains.
Era 2: Statistical NLP (1990s-2010s): probabilistic models (HMM, CRF, SVM). Manual feature engineering, but better generalization.
Era 3: Neural NLP (2010s): recurrent (LSTM, GRU) and convolutional neural networks. Automatic feature learning. Breakthroughs with sequence-to-sequence models.
Era 4: Pre-trained models / Transformers (2018+): BERT, GPT, T5. Pre-training on web-scale data. Dominant paradigm today.
Benchmarks and Evaluation
GLUE (General Language Understanding Evaluation): 9 tasks. Top models now exceed the human baseline, and the benchmark is widely considered “solved” by LLMs.
SuperGLUE: a deliberately harder successor to GLUE. It resisted longer, though top models have since matched or exceeded the human baseline here as well.
SQuAD (Stanford Question Answering Dataset): machine reading comprehension. Recent models score above 90 F1 (the standard metric, alongside exact match).
MTEB (Massive Text Embedding Benchmark): 56 tasks of retrieval, clustering, classification. Comprehensive benchmark for embedding models.
WMT (Workshop on Machine Translation): benchmark for translation. BLEU is the standard metric (its correlation with human quality judgments is imperfect).
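BLEU's core ingredient is modified n-gram precision (candidate n-gram counts clipped by reference counts); the full score combines n = 1..4 with a brevity penalty. A minimal sketch of the precision term:

```python
from collections import Counter

def ngram_precision(candidate: list[str], reference: list[str], n: int) -> float:
    # Modified precision: clip each candidate n-gram count by its reference count.
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "the cat is on the mat".split()
ref = "there is a cat on the mat".split()
print(ngram_precision(cand, ref, 1))  # 0.833... (5 of 6 unigrams match after clipping)
```

Clipping is what prevents a degenerate candidate like "the the the the" from scoring perfectly on unigrams.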
Use Cases
Chatbots and Assistants: conversational agents, FAQ answering, customer support automation.
Content analysis: analysis of customer feedback, social media monitoring, content moderation.
Information extraction: structured extraction from unstructured documents (contracts, articles).
Search and Ranking: semantic search (vs. keyword matching), ranking results by relevance.
Machine translation: automatic translation between languages.
Document classification: automatic document categorization.
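Semantic search typically embeds queries and documents as vectors and ranks by cosine similarity. A hand-rolled sketch with invented 3-dimensional vectors (real embeddings have hundreds of dimensions and come from a trained model):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Invented toy "embeddings" standing in for a real embedding model's output.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "how do I get my money back"
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # refund policy
```

Unlike keyword matching, the query shares no words with "refund policy"; the ranking comes entirely from vector proximity.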
Practical Considerations
Data requirements: modern NLP requires abundant data (millions of examples for specific tasks). Transfer learning (fine-tuning pre-trained models) mitigates this for some tasks.
Language diversity: models are often trained predominantly on English. Coverage of other languages (e.g. Italian, and especially low-resource languages) remains challenging. Multilingual models (mBERT, XLM-RoBERTa) typically underperform strong monolingual models on a per-language basis.
Pragmatic ambiguity: sarcasm, idioms, and referential ambiguity remain difficult. Humans disambiguate using world knowledge and situational context; models often lack this grounding.
Interpretability: LLMs are black boxes. Understanding why a model makes a prediction is hard. Research in explainability is active (attention weights, SHAP, LIME).
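One simple, model-agnostic explainability technique is occlusion: remove each token and measure how the prediction score changes. A sketch with an invented stand-in scoring function:

```python
# score() is a stand-in for a real model's output (invented for this sketch).
def score(words: list[str]) -> int:
    return sum(w in {"great", "love"} for w in words)

def explain(words: list[str]) -> dict[str, int]:
    # Occlusion: each token's attribution = score drop when it is removed.
    base = score(words)
    return {w: base - score([x for x in words if x != w]) for w in words}

print(explain("i love this great phone".split()))
# {'i': 0, 'love': 1, 'this': 0, 'great': 1, 'phone': 0}
```

Perturbation-based methods such as LIME generalize this idea by fitting a local surrogate model over many such perturbations.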
Common Misconceptions
“Modern NLP is ‘solved’ by LLMs”
Partially true. Stylized benchmarks (such as GLUE) have reached human-level accuracy, but realistic settings (domain shift, adversarial examples, linguistically rich phenomena) remain difficult. Zero-shot generalization has improved but is imperfect.
“NLP models ‘understand’ language”
No. They operate on statistical representations. They have no inner world, consciousness, or understanding in the cognitive sense. They produce plausible output without awareness.
“Once trained, the NLP model solves any linguistic task”
No. Transfer learning mitigates data scarcity, but specialization still matters: a model fine-tuned on news writes in journalistic style and may underperform on legal text.
Related Terms
- LLM: the dominant modern instance of NLP, generative and trained at web scale
- Transformer: dominant architecture in contemporary NLP
- Embeddings: vector representations of text
- Tokenization: fundamental NLP preprocessing
- RAG: retrieval pattern extending NLP capabilities with external knowledge
Sources
- Jurafsky, D. & Martin, J.H. (2024). Speech and Language Processing (3rd edition draft). Stanford University (standard textbook)
- Lewis-Kraus, G. (2016). The Great AI Awakening. The New York Times Magazine
- Papers with Code - NLP Benchmarks: aggregation of benchmarks and state-of-the-art results
- EMNLP: primary conference for NLP research