Natural Language Processing (NLP) – Study Notes

General Science July 28, 2025 4 min read

What is Natural Language Processing?

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that enables computers to understand, interpret, and generate human language. It bridges the gap between human communication and computer understanding, allowing machines to process text and speech data in ways that are meaningful and useful.

Key Components of NLP

Tokenization: Breaking text into words, phrases, or symbols.
Part-of-Speech Tagging: Identifying grammatical categories (noun, verb, etc.).
Parsing: Analyzing sentence structure.
Named Entity Recognition (NER): Detecting names, organizations, dates, and other entities.
Sentiment Analysis: Determining the emotional tone behind text.
Machine Translation: Automatically translating text between languages.
Text Summarization: Reducing text to its essential points.

How NLP Works

Preprocessing: Text is cleaned (removing punctuation, stop words, etc.).
Feature Extraction: Numerical features are generated from text (e.g., word embeddings).
Modeling: Machine learning models (e.g., neural networks, transformers) are trained on labeled data.
Prediction: The model predicts outputs such as sentiment, translation, or entity types.

Practical Applications of NLP

Healthcare: Extracting information from medical records, automating patient queries.
Drug Discovery: AI-powered NLP identifies chemical interactions and potential drug candidates by mining scientific literature. For example, NLP models have accelerated COVID-19 research by analyzing thousands of papers (Wang et al., 2020).
Customer Service: Chatbots and virtual assistants handle queries and complaints.
Legal: Automating contract analysis, legal research, and compliance monitoring.
Education: Automated grading, feedback generation, and language learning apps.
Social Media Monitoring: Tracking public sentiment, misinformation, and trends.
Accessibility: Speech-to-text and text-to-speech tools for people with disabilities.
Material Science: NLP parses research papers to find new material properties and synthesis methods.

Surprising Facts about NLP

NLP Can Detect Mental Health Issues: Recent studies show NLP algorithms can identify signs of depression and anxiety from social media posts and online conversations.
Language Models Can Generate Scientific Hypotheses: Large language models have been used to suggest new chemical compounds and experimental approaches in drug discovery.
NLP Models Learn Biases from Data: NLP systems can inadvertently learn and propagate social, cultural, and gender biases present in training data, impacting fairness and ethics.

Debunking a Myth

Myth: NLP can fully understand human language just like a person.

Fact: NLP models do not “understand” language in a human sense. They identify patterns and statistical relationships within text, but lack true comprehension, context awareness, and common sense reasoning. Their outputs are based on learned associations, not genuine understanding.

Impact on Daily Life

Search Engines: NLP improves the relevance of search results by interpreting queries more accurately.
Voice Assistants: Devices like smartphones and smart speakers rely on NLP for voice commands and conversation.
Spam Filtering: NLP detects and blocks unwanted emails.
Social Media: NLP powers content moderation, translation, and personalized recommendations.
Accessibility: Real-time translation and speech recognition help break language barriers and support users with disabilities.

Recent Research Example

A notable 2020 study, “CORD-19: The COVID-19 Open Research Dataset” (Wang et al., 2020), used NLP to analyze over 29,000 scientific articles about COVID-19. The project enabled researchers to rapidly identify relevant information, track emerging trends, and accelerate drug and vaccine discovery.

Reference:
Wang, Lucy Lu, et al. “CORD-19: The COVID-19 Open Research Dataset.” ArXiv preprint arXiv:2004.10706 (2020). Link

Diagram: NLP in Daily Life

NLP daily life diagram

Future Directions

Multimodal NLP: Combining text, image, and audio data for richer understanding.
Low-resource Languages: Expanding NLP tools to languages with limited digital resources.
Ethical NLP: Reducing bias and improving fairness in language models.

Summary Table

Aspect	Details
Definition	AI field for processing human language
Key Techniques	Tokenization, NER, sentiment analysis, translation
Applications	Healthcare, drug discovery, education, customer service
Surprising Facts	Mental health detection, hypothesis generation, bias
Daily Impact	Search, voice assistants, accessibility, moderation
Recent Study	CORD-19 dataset for COVID-19 research (2020)

Conclusion

Natural Language Processing is a rapidly evolving field with wide-ranging impacts on science, technology, and daily life. Its applications in drug discovery and materials science are transforming research, while its presence in consumer technology is making interactions more intuitive and accessible. Ongoing advances promise even greater integration and understanding in the future.