Introduction

Computer Vision is a field of artificial intelligence (AI) that enables computers to interpret and understand visual information from the world, such as images and videos. It combines concepts from mathematics, physics, engineering, and computer science to analyze visual data and automate tasks that would otherwise require human vision.
History of Computer Vision

  • 1960s: Early research focused on simple image processing tasks, such as edge detection and pattern recognition. The first experiments involved digitizing images and analyzing their pixel values.
  • 1970s: The development of algorithms for extracting features from images, such as lines and shapes. Researchers began exploring object recognition and scene understanding.
  • 1980s: Neural networks, such as Fukushima's neocognitron, were introduced for pattern and image recognition. David Marr's computational theory formalized the study of visual perception, and the first attempts at automated face recognition appeared.
  • 1990s: Advances in machine learning led to improved image segmentation and object detection. The use of support vector machines (SVMs) and principal component analysis (PCA) became common.
  • 2000s: The rise of digital cameras and large datasets enabled more complex experiments. Feature extraction methods like SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) were developed.
  • 2010s: Deep learning revolutionized computer vision. Convolutional Neural Networks (CNNs) achieved state-of-the-art results in image classification and object detection.
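The convolution operation underlies both the early edge-detection work of the 1960s and the CNNs of the 2010s. As a minimal sketch (using NumPy and a toy two-tone image assumed purely for illustration), sliding a Sobel kernel over an image highlights vertical edges; note that, as in most deep learning libraries, this is technically cross-correlation:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation of a grayscale image with a kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the kernel with the image window, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Toy image: dark left half, bright right half, giving one vertical edge.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

edges = convolve2d(image, sobel_x)
```

A CNN learns many such kernels from data instead of hand-designing them, which is the key difference between 1960s edge detectors and 2010s deep networks.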

Key Experiments

| Year | Experiment/Method | Description | Impact |
|------|-------------------|-------------|--------|
| 1966 | "Summer Vision Project" | MIT attempt to solve object recognition in a single summer | Highlighted the complexity of vision tasks |
| 2001 | Viola-Jones detector | Real-time face detection using Haar-like features | Enabled practical face detection |
| 2004 | SIFT | Robust method for identifying keypoints in images | Improved image matching and retrieval |
| 2012 | AlexNet | CNN that won the ImageNet competition | Sparked the deep learning era in vision |
| 2016 | YOLO (You Only Look Once) | Real-time object detection system | Fast, accurate single-pass detection |
| 2021 | Vision Transformers (ViT) | Applied transformer models to image patches | Enhanced accuracy and scalability |
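The speed of the Viola-Jones detector comes from the integral image (summed-area table), which lets the sum of any rectangular region, and hence any Haar-like feature, be evaluated with four array lookups regardless of rectangle size. A minimal sketch, with an illustrative 4x4 input:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle in O(1) via four table lookups."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

img = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 4, 4)  # equals img.sum()
```

A Haar-like feature is then just the difference between two or more such rectangle sums, cheap enough to evaluate thousands of times per image in real time.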

Modern Applications

  • Autonomous Vehicles: Computer vision enables self-driving cars to detect objects, read traffic signs, and navigate safely.
  • Medical Imaging: Automated analysis of X-rays, MRIs, and CT scans assists doctors in diagnosing diseases.
  • Facial Recognition: Used in security systems, smartphones, and social media platforms for identification and authentication.
  • Retail: Automated checkout systems and inventory management use vision to track products.
  • Agriculture: Drones and robots analyze crop health, detect pests, and optimize irrigation.
  • Environmental Monitoring: Satellite imagery and sensors detect deforestation, pollution, and wildlife populations.
  • Augmented Reality (AR): Vision algorithms overlay digital information onto real-world scenes.

Recent Breakthroughs

  • Vision Transformers (ViT): Since 2020, transformer-based models have outperformed traditional CNNs in image classification tasks, offering better scalability and accuracy (Dosovitskiy et al., 2021).
  • Self-supervised Learning: Models now learn from unlabeled data, reducing the need for large labeled datasets and enabling broader applications.
  • Zero-shot Learning: Systems can recognize objects they have never seen before by leveraging semantic relationships.
  • Real-time 3D Scene Understanding: Advances in depth sensing and point cloud analysis allow for detailed 3D reconstruction from 2D images.
  • AI for Climate Monitoring: Computer vision is used to analyze satellite images for tracking climate change indicators, such as melting glaciers and urban heat islands.
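The core idea behind Vision Transformers, treating an image as a sequence of flattened patches (the "16x16 words" of the paper's title), can be sketched in a few lines of NumPy. The image and patch sizes below are the standard ImageNet configuration, used here purely for illustration:

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns shape (num_patches, patch_size * patch_size * C): the token
    sequence a Vision Transformer linearly embeds and attends over.
    """
    h, w, c = image.shape
    p = patch_size
    # Split height and width into (blocks, within-block) axes, then group
    # the two block axes together and the two within-block axes together.
    patches = image.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * c)

image = np.random.rand(224, 224, 3)   # ImageNet-sized RGB input
tokens = image_to_patches(image, 16)  # 14 * 14 = 196 patches of length 768
```

Each row of `tokens` is then projected to the model dimension and processed by a standard transformer encoder, with no convolutions required.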

Table: Computer Vision Tasks and Performance

| Task | Typical Dataset | Best Model (2023) | Score (%) | Notes |
|------|-----------------|-------------------|-----------|-------|
| Image classification | ImageNet | Vision Transformer (ViT) | 88.5 (top-1) | Large-scale dataset |
| Object detection | COCO | YOLOv7 | 56.8 (mAP) | Real-time detection |
| Semantic segmentation | Cityscapes | DeepLabV3+ | 82.1 (mIoU) | Urban scene analysis |
| Face recognition | LFW | ArcFace | 99.83 | High accuracy |
| Medical diagnosis | CheXpert | DenseNet-121 | 76.2 | Chest X-ray classification |
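Detection scores such as mAP are built on intersection-over-union (IoU): a predicted box counts as correct when its IoU with a ground-truth box exceeds a threshold (commonly 0.5). A minimal sketch, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (empty if boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two unit boxes offset by half a side: overlap 0.5, union 1.5, IoU 1/3.
score = iou((0, 0, 1, 1), (0.5, 0, 1.5, 1))
```

mAP then averages precision over recall levels, classes, and (for COCO) a range of IoU thresholds, which is why it is reported separately from plain accuracy in the table above.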

Environmental Implications

  • Positive Impacts:

    • Wildlife Conservation: Computer vision helps track endangered species and monitor habitats using drones and camera traps.
    • Pollution Detection: Automated analysis of satellite images identifies oil spills, illegal dumping, and air pollution sources.
    • Precision Agriculture: Reduces resource waste by optimizing water and fertilizer use.
    • Climate Change Monitoring: Tracks deforestation, glacier retreat, and urbanization at a global scale.
  • Negative Impacts:

    • Energy Consumption: Training large vision models requires significant computational power, contributing to carbon emissions.
    • E-waste: Increased use of cameras and sensors can lead to electronic waste if not properly recycled.
    • Privacy Concerns: Surveillance systems may impact personal privacy and civil liberties.

Cited Research

  • Dosovitskiy, A., et al. (2021). “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.” arXiv preprint arXiv:2010.11929.

Summary

Computer Vision has evolved from simple image processing experiments in the 1960s to sophisticated deep learning models capable of real-time scene understanding and medical diagnosis. Key experiments, such as the development of SIFT, AlexNet, and Vision Transformers, have driven progress. Modern applications span autonomous vehicles, healthcare, agriculture, and environmental monitoring. Recent breakthroughs include transformer-based models and self-supervised learning, which have expanded the scope and accuracy of vision systems. While computer vision offers significant environmental benefits, such as improved conservation and climate monitoring, it also poses challenges related to energy use and privacy. Continued research and responsible deployment are essential for maximizing its positive impact.