Computer Vision: Study Notes

General Science, July 28, 2025

Overview

Computer Vision (CV) is a multidisciplinary field that enables computers to interpret and understand visual information from the world, such as images and videos. It combines techniques from artificial intelligence, machine learning, image processing, and neuroscience to automate tasks that require visual cognition.

Key Concepts

1. Image Acquisition

Sensors: Cameras, LiDAR, infrared sensors.
Formats: JPEG, PNG, RAW, DICOM (medical).

2. Preprocessing

Noise Reduction: Gaussian blur, median filtering.
Normalization: Adjusting pixel values for consistency.
Segmentation: Dividing images into meaningful regions.
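
As a rough illustration of the noise reduction and normalization steps above, the sketch below applies a median filter and min-max scaling to a tiny grayscale image stored as nested lists. The function names `median_filter` and `normalize` and the sample image are assumptions made for this example; real pipelines would normally rely on an image library rather than pure Python.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood.

    Border pixels use only the neighbors that fall inside the image.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(median(window))
        result.append(row)
    return result


def normalize(image):
    """Min-max scale pixel values to the range [0.0, 1.0]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


# Illustrative data: a flat gray patch with one bright noise pixel in the center.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
denoised = median_filter(noisy)  # the outlier is replaced by its neighbors' median
scaled = normalize(noisy)        # darkest pixel maps to 0.0, brightest to 1.0
```

The median is used here rather than the mean because a single extreme pixel does not shift it, which is why median filtering suits salt-and-pepper noise.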

3. Feature Extraction

Edges: Sobel, Canny detectors.
Corners: Harris, FAST.
Descriptors: SIFT, SURF, ORB.
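
To make the edge detectors above more concrete, here is a minimal sketch of the Sobel operator: two 3x3 kernels estimate the horizontal and vertical intensity gradients, and their combined magnitude is large where an edge is present. The function name `sobel_magnitude` and the test image are assumptions for this example, and border pixels are simply left at zero.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude for interior pixels (borders stay 0.0)."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[r][c] = math.hypot(gx, gy)
    return out


# Illustrative data: dark left half, bright right half (a vertical edge).
step = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
edges = sobel_magnitude(step)
flat_response = sobel_magnitude([[7] * 4 for _ in range(4)])  # no edges at all
```

Canny builds on gradients like these and adds smoothing, non-maximum suppression, and hysteresis thresholding, which this sketch does not attempt.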

4. Object Detection & Recognition

Classification: Assigning labels to images (e.g., cat, dog).
Localization: Identifying object positions.
Detection: Drawing bounding boxes around objects.
Semantic Segmentation: Pixel-level classification.
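
Detection output is typically a set of bounding boxes, and a common way to compare a predicted box with a ground-truth box is intersection over union (IoU). The notes above do not name this metric, so treat the sketch below as an illustrative assumption; the `(x_min, y_min, x_max, y_max)` box format and the sample coordinates are also chosen just for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if the boxes overlap at all.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes: the prediction overlaps half of the ground truth.
predicted = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
score = iou(predicted, ground_truth)  # 50 / 150
```

An IoU of 1.0 means the boxes coincide, while 0.0 means they do not overlap.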

5. Deep Learning in CV

Convolutional Neural Networks (CNNs): Automatically learn hierarchical features.
Transfer Learning: Using pre-trained models for new tasks.
Vision Transformers (ViTs): Use attention mechanisms for image understanding.
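
The core operation inside a CNN layer can be sketched in a few lines: a small kernel slides over the image, and each output value is the weighted sum of the pixels under it, followed by a nonlinearity such as ReLU. In a trained network the kernel weights are learned from data. The hand-picked kernel, the function names, and the sample image below are assumptions for illustration only.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses, keeping only positive activations."""
    return [[max(0, v) for v in row] for row in feature_map]


# Illustrative data: bright left half, dark right half.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# Hypothetical hand-set filter: responds where brightness drops from left to right.
kernel = [[1, -1]]
feature_map = relu(conv2d(image, kernel))
```

Stacking many such layers is what lets later layers respond to progressively larger and more abstract patterns, which is the "hierarchical features" idea noted above.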

Diagram: Computer Vision Workflow

Applications

Medical Imaging: Tumor detection, organ segmentation.
Autonomous Vehicles: Lane detection, pedestrian recognition.
Industrial Automation: Defect inspection, robotics.
Augmented Reality: Object tracking, scene understanding.
Security: Face recognition, surveillance.

Surprising Facts

Human-level Performance: Deep learning models have matched or exceeded clinician-level accuracy on certain medical imaging tasks, such as detecting diabetic retinopathy from retinal fundus photographs.
Zero-shot Learning: Modern CV systems can classify object categories for which they saw no labeled examples, by leveraging semantic relationships such as text descriptions or attributes shared with known classes.
Non-visual Data: CV techniques are now applied to non-image data, such as interpreting protein structures or analyzing astronomical signals.
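
One way to picture the zero-shot idea from the list above is as a nearest-neighbor search in a shared embedding space: the image is embedded as a vector, each candidate class is embedded from its description, and the closest class wins. The sketch below assumes such embeddings already exist. The three-dimensional vectors, class names, and function names are entirely hypothetical and stand in for what a trained model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def zero_shot_classify(image_embedding, class_embeddings):
    """Return the class whose embedding is most similar to the image embedding."""
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(image_embedding, class_embeddings[name]),
    )


# Hypothetical class embeddings, e.g. derived from text descriptions.
class_embeddings = {
    "zebra": [0.9, 0.1, 0.8],
    "horse": [0.9, 0.1, 0.1],
    "tiger": [0.1, 0.9, 0.8],
}
# Hypothetical embedding of an image from a class never seen with labels.
image_embedding = [0.8, 0.2, 0.7]
label = zero_shot_classify(image_embedding, class_embeddings)
```

No zebra image is needed at training time in this setup; only a description that places "zebra" sensibly relative to known classes.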

Algorithms and Techniques

1. Classical Methods

Template Matching: Comparing image patches.
Histogram of Oriented Gradients (HOG): Feature descriptor for shape detection.
K-means Clustering: Image segmentation.
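
As a small worked example of K-means used for segmentation, the sketch below clusters grayscale pixel intensities into two groups and turns the cluster labels back into a mask. The function name `kmeans_1d`, the evenly spaced initialization, and the sample image are assumptions for this example. Practical implementations usually cluster color vectors and use randomized or k-means++ initialization.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Cluster scalar values with Lloyd's algorithm; returns (centers, labels).

    Assumes k >= 2. Centers start evenly spaced over the value range so the
    result is deterministic.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, labels


# Illustrative data: a dark background with a bright region on the right.
image = [
    [12, 10, 200],
    [11, 198, 205],
    [9, 13, 202],
]
width = len(image[0])
pixels = [p for row in image for p in row]
centers, labels = kmeans_1d(pixels, k=2)
# Reshape the flat label list back into the image layout.
mask = [labels[i:i + width] for i in range(0, len(labels), width)]
```

Here label 0 marks the dark cluster and label 1 the bright one, so `mask` is a simple two-region segmentation of the image.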

2. Deep Learning Methods

YOLO (You Only Look Once): Real-time object detection.
Mask R-CNN: Instance segmentation.
U-Net: Biomedical image segmentation.

3. Emerging Techniques

Self-supervised Learning: Models learn from unlabeled data.
Generative Models: GANs for image synthesis and enhancement.

Case Studies

Case Study: Diabetic Retinopathy Detection

Problem

Early diagnosis of diabetic retinopathy (DR) is critical for preventing blindness, but manual screening is resource-intensive.

Solution

Researchers developed a deep learning model using CNNs trained on roughly 128,000 retinal fundus images graded by ophthalmologists. The system automatically detects DR signs with high sensitivity and specificity.
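
Sensitivity and specificity, the two metrics mentioned above, can be computed directly from predicted and true labels. The sketch below shows the arithmetic. The ten labels are made up for illustration and are not data from the study, and the function name is an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up example: 4 diseased eyes and 6 healthy eyes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)  # 3/4 and 5/6
```

For screening, sensitivity matters because a missed case delays treatment, while specificity limits unnecessary referrals. A single accuracy figure hides that trade-off.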

Results

Reported sensitivity and specificity above 90% on two validation sets, measured against a reference standard set by panels of ophthalmologists.
Deployed in clinics for rapid, scalable screening.

Reference

Gulshan, V. et al. (2016). “Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs.” JAMA, 316(22), 2402–2410.

Recent Advances

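Vision Transformers (ViTs): Introduced in 2020, ViTs split an image into fixed-size patches and apply self-attention across them for image classification; when pre-trained on large datasets, they match or outperform strong CNN baselines on several benchmarks (Dosovitskiy et al., 2021).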
Multimodal Learning: Integrates visual data with text, audio, or sensor data for richer understanding (e.g., CLIP by OpenAI).
Federated Learning: Training CV models across decentralized data sources while preserving privacy.
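
The self-attention mechanism behind ViTs can be sketched compactly: every patch embedding is compared with every other one, the similarity scores are turned into weights with a softmax, and each output is a weighted mix of all patches. The sketch below is a simplification that uses the patch embeddings directly as queries, keys, and values, whereas a real ViT applies learned projections and multiple heads. The two-dimensional patch vectors and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(a * value[d] for a, value in zip(attn, tokens)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical embeddings for three image patches.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, weights = self_attention(patches)
```

Because every patch attends to every other patch in one step, distant image regions can influence each other without stacking many local layers.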

Challenges

Data Bias: Models may inherit biases from training datasets.
Explainability: Difficulty in interpreting model decisions.
Real-time Performance: Balancing accuracy and computational efficiency.

Future Trends

Generalized CV Models: Unified models capable of handling multiple vision tasks with minimal retraining.
Edge Deployment: Efficient CV algorithms for mobile and IoT devices.
Synthetic Data Generation: Using GANs and simulation to create diverse training datasets.
Ethical AI: Addressing fairness, transparency, and privacy in CV applications.
Integration with Other Modalities: Combining CV with natural language processing (NLP) and robotics for holistic AI systems.

Reference

Dosovitskiy, A. et al. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” International Conference on Learning Representations (ICLR).

Additional Resources

Related Technologies

CRISPR: While not a CV technology, CRISPR enables gene editing with high precision, which can be visualized and analyzed using CV techniques in biomedical research.

Summary Table

Technique	Application Area	Key Benefit
CNNs	Image classification	Automatic feature extraction
YOLO	Real-time detection	Fast, accurate object localization
Vision Transformers	General image tasks	Improved accuracy, scalability
GANs	Image synthesis	Data augmentation, enhancement

Conclusion

Computer Vision is rapidly evolving, driven by advances in deep learning, hardware, and interdisciplinary research. Its impact spans healthcare, industry, and daily life, with future trends pointing toward more generalized, ethical, and multimodal systems.