Computer Vision: Study Notes

General Science, July 28, 2025

What is Computer Vision?

Computer Vision (CV) is a field of artificial intelligence (AI) that enables machines to interpret and understand the visual world. It involves developing algorithms and systems that can process, analyze, and make decisions based on images or videos.

Key Concepts

1. Image Acquisition

Analogy: Like human eyes capturing a scene, cameras or sensors capture digital images for computers.
Real-World Example: Security cameras recording footage for facial recognition.

2. Image Processing

Analogy: Like adjusting brightness or contrast on a photo app.
Real-World Example: Instagram filters automatically enhancing photos.
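
As a minimal sketch of the brightness and contrast adjustment described above, the snippet below treats a grayscale image as a nested list of 0..255 values and applies a linear transform to each pixel. The function name adjust and the sample pixel values are illustrative assumptions, not part of any particular photo app.

```python
def adjust(image, contrast=1.0, brightness=0):
    """Apply out = contrast * pixel + brightness, clamped to the 0..255 range."""
    return [
        [max(0, min(255, round(contrast * pixel + brightness))) for pixel in row]
        for row in image
    ]


# A tiny 2x3 grayscale image (made-up values for illustration).
image = [
    [10, 50, 90],
    [130, 170, 250],
]

brighter = adjust(image, brightness=40)   # shift every pixel up by 40
punchier = adjust(image, contrast=1.5)    # stretch values away from 0
```

Values that would exceed 255 are clamped, which is why very bright pixels lose detail when brightness is pushed too far.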

3. Feature Extraction

Analogy: Identifying key landmarks on a map.
Real-World Example: Detecting edges, corners, or textures in an image to find objects.
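
One common way to detect edges is the Sobel operator, which estimates how quickly brightness changes horizontally and vertically around each pixel. The sketch below is a plain-Python illustration under the assumption of a small grayscale image stored as nested lists; the helper names filter3x3 and edge_strength are hypothetical, and real systems would normally use an optimized library instead.

```python
import math

# Sobel kernels for horizontal (x) and vertical (y) brightness change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (r, c)."""
    return sum(
        kernel[i][j] * image[r + i - 1][c + j - 1]
        for i in range(3)
        for j in range(3)
    )


def edge_strength(image):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = filter3x3(image, SOBEL_X, r, c)
            gy = filter3x3(image, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out


# Synthetic test image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = edge_strength(image)
```

In this synthetic image the response is large only in the two columns next to the dark-to-bright boundary and zero in the flat regions, which is the behaviour an edge feature should have.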

4. Object Detection & Recognition

Analogy: Spotting your friend in a crowd.
Real-World Example: Self-driving cars recognizing pedestrians and traffic signs.
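
Object detectors usually report each object as a bounding box, and a standard way to check a predicted box against a reference box is intersection over union (IoU). The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) tuples; that layout and the sample coordinates are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)       # hypothetical detector output
ground_truth = (5, 5, 15, 15)    # hypothetical hand-labelled box
score = iou(predicted, ground_truth)
```

A score of 1.0 means the boxes coincide and 0.0 means they do not overlap; evaluation protocols typically count a detection as correct above some chosen threshold.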

5. Image Segmentation

Analogy: Cutting a pizza into slices to analyze each piece.
Real-World Example: Medical imaging separating organs in an MRI scan.
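
A very simple form of segmentation is to threshold the image and then group neighbouring bright pixels into labelled regions. The sketch below is a toy illustration of that idea, not how medical imaging systems work in practice (those generally rely on learned models); the function name label_regions, the threshold, and the pixel values are assumptions.

```python
def label_regions(image, threshold):
    """Label 4-connected groups of pixels brighter than threshold.

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for each separate bright region.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if image[r][c] <= threshold or labels[r][c]:
                continue  # background, or already part of a region
            count += 1
            labels[r][c] = count
            stack = [(r, c)]
            while stack:  # flood fill outward from the seed pixel
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


# Made-up grayscale image with two separate bright blobs.
image = [
    [200, 200, 10, 10, 10],
    [200, 10, 10, 180, 10],
    [10, 10, 10, 180, 190],
]
labels, count = label_regions(image, threshold=128)
```

Each pixel ends up with a region number, so later steps can measure or analyze every region separately.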

6. Classification

Analogy: Sorting mail into different bins based on address.
Real-World Example: Sorting images into categories like cats, dogs, or cars.
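
To make the sorting idea concrete, the sketch below uses a nearest-centroid classifier: each category is summarized by the average of its training feature vectors, and a new image is assigned to the closest average. The two-number feature vectors and their values are invented for illustration; a real system would compute features from pixels, typically with a neural network.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(len(vectors[0]))]


def fit(examples):
    """examples maps label -> list of feature vectors; returns label -> centroid."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical features per image: (fur-texture score, straight-edge score).
training = {
    "cat": [[0.9, 0.1], [0.8, 0.2]],
    "car": [[0.1, 0.9], [0.2, 0.7]],
}
model = fit(training)
first = predict(model, [0.7, 0.3])
second = predict(model, [0.2, 0.95])
```

Here first comes out as "cat" and second as "car", because each query lies closer to that category's average feature vector.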

7. Tracking

Analogy: Following a moving ball in a sports game.
Real-World Example: Surveillance systems tracking a person across multiple cameras.
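
Within a single camera, a basic tracker can link detections from one frame to the next by matching each known object to the nearest new detection. The sketch below shows that greedy matching under simplifying assumptions: objects are reduced to (x, y) centre points, the name update_tracks and the max_distance value are hypothetical, and tracks with no match are simply dropped. Following a person across multiple cameras additionally needs appearance matching, which is not shown.

```python
import math


def update_tracks(tracks, detections, max_distance=50.0):
    """Greedy nearest-neighbour association between frames.

    tracks: dict of track_id -> (x, y) from the previous frame.
    detections: list of (x, y) points in the current frame.
    Returns a new dict; unmatched detections start new tracks.
    """
    updated = {}
    unmatched = list(detections)
    for track_id, position in tracks.items():
        if not unmatched:
            break
        nearest = min(unmatched, key=lambda d: math.dist(position, d))
        if math.dist(position, nearest) <= max_distance:
            updated[track_id] = nearest
            unmatched.remove(nearest)
    next_id = max(tracks, default=-1) + 1
    for detection in unmatched:
        updated[next_id] = detection
        next_id += 1
    return updated


frame1 = update_tracks({}, [(10, 10), (100, 100)])
frame2 = update_tracks(frame1, [(104, 98), (14, 12)])
```

Even though the detections arrive in a different order in the second frame, each object keeps its identifier because it is matched by position rather than by list index.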

Core Techniques

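Convolutional Neural Networks (CNNs): Loosely inspired by the visual cortex, they slide small learned filters across an image to detect local patterns. Used for image classification, object detection, and more.
Transfer Learning: Reusing a model pre-trained on a large dataset (e.g., ImageNet) as the starting point for a new task, which cuts training time and the amount of labeled data needed.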
Data Augmentation: Creating new training data by altering existing images (rotating, flipping, etc.).
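Generative Adversarial Networks (GANs): Two neural networks are trained against each other: a generator produces images and a discriminator tries to tell them apart from real ones, which pushes the generator toward realistic output.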
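
The data augmentation entry above can be sketched with a few geometric transforms. The snippet assumes an image stored as a nested list of pixel values; the function names are illustrative, and libraries used in practice offer many more transforms (crops, colour jitter, noise).

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
]

# One original image yields four training samples.
augmented = [
    image,
    flip_horizontal(image),
    flip_vertical(image),
    rotate_90_clockwise(image),
]
```

Whether a given transform is appropriate depends on the task: flipping is harmless for most animal photos but would change the meaning of images containing text or direction-dependent traffic signs.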

Analogies & Real-World Examples

Barcode Scanner: Like a cashier scanning items, CV systems read barcodes to identify products.
Face ID on Smartphones: Similar to recognizing a friend by their face, CV matches facial features to unlock devices.
Google Photos Search: Like flipping through a photo album to find all beach pictures, CV can search and group similar images.

Applications

Healthcare: Analyzing X-rays and MRIs for disease detection.
Agriculture: Monitoring crop health via drone imagery.
Retail: Automated checkout using visual recognition.
Manufacturing: Detecting defects in products on assembly lines.
Autonomous Vehicles: Navigating roads using real-time video feeds.
Drug and Material Discovery: Applying deep learning methods related to those used in CV to molecular structures, as in the AI-driven discovery of the antibiotic halicin (Stokes et al., Cell, 2020).

Common Misconceptions

CV is the same as human vision:
- Computers do not “see” like humans; they process pixels and patterns, not meaning.
CV is always accurate:
- Performance depends on data quality, context, and model limitations.
CV can work with any image:
- Poor lighting, occlusion, or low resolution can hinder performance.
CV systems are fully autonomous:
- Most require human oversight, especially in critical applications.
CV only works with photos:
- It also processes videos, 3D scans, and even satellite imagery.

Future Directions

Explainable CV: Making models transparent and understandable for critical applications (e.g., healthcare).
Edge Computing: Running CV algorithms directly on devices (phones, drones) for faster response and privacy.
Multimodal AI: Combining vision with language and audio for richer understanding (e.g., video captioning).
Self-supervised Learning: Reducing reliance on labeled data by learning from unlabeled images.
AI for Scientific Discovery: Accelerating drug and material discovery by analyzing molecular images and simulations.
- Example: In 2020, researchers used deep learning to discover Halicin, a novel antibiotic, by screening molecular structures (Stokes et al., Cell, 2020).
Ethical & Fair CV: Addressing bias, privacy, and responsible deployment in society.

Future Trends

Real-time CV in AR/VR: Enabling immersive and interactive experiences.
Federated Learning: Training models across decentralized devices without sharing raw data.
Synthetic Data Generation: Using GANs to create realistic training data for rare scenarios.
Integration with Robotics: Enhancing robot perception for complex tasks.
Sustainable AI: Reducing the environmental impact of large-scale CV models.
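
For the federated learning entry above, one widely used aggregation idea is federated averaging: each device trains locally and sends back only model weights, which a server averages in proportion to how much data each device used. The sketch below assumes weights are flat lists of numbers and uses made-up values; the function name federated_average is hypothetical, and real systems add secure communication and many training rounds.

```python
def federated_average(client_updates):
    """Average client weights, weighted by each client's sample count.

    client_updates: list of (weights, num_samples) pairs, where weights is a
    flat list of floats of the same length for every client.
    """
    total_samples = sum(n for _, n in client_updates)
    size = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(size)
    ]


# Two hypothetical devices: only weights and sample counts leave the device.
updates = [
    ([1.0, 2.0], 100),
    ([3.0, 6.0], 300),
]
global_weights = federated_average(updates)
```

The raw images never appear in the exchange; the device with more data simply has more influence on the shared model.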

Quick Revision Points

CV enables machines to interpret images/videos using AI.
Key processes: acquisition, processing, feature extraction, object detection and recognition, segmentation, classification, tracking.
Real-world impact: healthcare, transportation, retail, science.
Misconceptions: not the same as human vision, not always accurate, not fully autonomous.
Future: explainability, edge AI, multimodal systems, ethical deployment, scientific discovery.