Machine Learning: Study Notes
1. Introduction
Machine Learning (ML) is a subset of artificial intelligence (AI) that enables computers to learn from data and improve their performance over time without being explicitly programmed. ML algorithms identify patterns in data and use these patterns to make predictions or decisions.
2. Historical Context
- 1950: Alan Turing introduces the concept of a “learning machine” in his paper “Computing Machinery and Intelligence.”
- 1957: Frank Rosenblatt develops the Perceptron, an early neural network.
- 1986: The backpropagation algorithm is popularized by Rumelhart, Hinton, and Williams, making it practical to train multi-layer neural networks.
- 2012: Deep learning breakthrough with AlexNet winning the ImageNet competition.
- 2020s: ML is integral to fields like healthcare, finance, autonomous vehicles, and genomics.
3. How Machine Learning Works
Main Components
- Data: The raw information fed into algorithms.
- Features: Individual measurable properties or characteristics of data.
- Model: Mathematical representation that maps input data to output predictions.
- Training: The process of adjusting model parameters using data.
- Evaluation: Testing the model’s performance on unseen data.
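The five components above can be seen together in a minimal sketch. It assumes a one-feature linear model trained by gradient descent on an invented toy dataset (the points follow y = 2x + 1). The names `predict`, `mse`, and `train_model` are illustrative, not taken from any library.

```python
# Data: (feature, target) pairs that follow y = 2x + 1 (invented for illustration).
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test = [(4.0, 9.0), (5.0, 11.0)]  # unseen data, held out for evaluation


def predict(w, b, x):
    """Model: a line that maps one input feature to a prediction."""
    return w * x + b


def mse(w, b, data):
    """Mean squared error of the model on a dataset."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train_model(data, lr=0.05, steps=2000):
    """Training: adjust the parameters w and b by gradient descent on the MSE."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train_model(train)
test_error = mse(w, b, test)  # Evaluation: error on data not used for training
print(f"w={w:.3f}, b={b:.3f}, test MSE={test_error:.6f}")
```

The learning rate and step count are hand-picked for this toy data; real problems usually need them tuned.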
Types of Machine Learning
Type | Description | Example |
---|---|---|
Supervised | Learns from labeled data | Email spam detection |
Unsupervised | Finds patterns in unlabeled data | Customer segmentation |
Semi-supervised | Mix of labeled and unlabeled data | Speech recognition |
Reinforcement | Learns via rewards and penalties | Game-playing AI (e.g., AlphaGo) |
4. Key Algorithms
- Linear Regression: Predicts continuous values.
- Logistic Regression: Estimates class probabilities and uses them to classify data into categories.
- Decision Trees: Splits data based on feature values.
- Random Forests: Ensemble of decision trees for better accuracy.
- Support Vector Machines (SVM): Finds the boundary that separates classes with the largest margin.
- Neural Networks: Layers of weighted units, loosely inspired by the brain, that learn complex nonlinear patterns.
- Clustering (K-Means): Groups similar data points.
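As one concrete instance from the list above, here is a small sketch of K-Means using the standard assign-then-update loop (Lloyd's algorithm). The data points and starting centroids are made up for illustration, and the helper names `nearest` and `kmeans` are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
labels = [nearest(p, centroids) for p in points]
print(centroids, labels)
```

Fixed starting centroids keep the sketch deterministic; practical implementations usually pick them randomly and run several restarts.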
5. Machine Learning Workflow
- Data Collection
- Data Preprocessing (cleaning, normalization)
- Feature Engineering
- Model Selection
- Training
- Evaluation
- Deployment
- Monitoring & Maintenance
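The first six workflow steps can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the dataset is invented, min-max scaling stands in for preprocessing, and a nearest-centroid classifier stands in for the selected model. Deployment and monitoring are outside the scope of the sketch.

```python
import random


def fit_scaler(rows):
    """Preprocessing: learn per-feature minimum and maximum from training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row, lows, highs):
    """Rescale each feature using the training minimum and maximum."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))


def fit_centroids(rows, labels):
    """Training: store the mean feature vector of each class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def predict(centroids, row):
    """Predict the class whose centroid is closest to the row."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


# Data collection: toy (feature vector, label) pairs, invented for illustration.
data = [
    ((150.0, 50.0), 0), ((155.0, 53.0), 0), ((160.0, 55.0), 0), ((158.0, 52.0), 0),
    ((180.0, 80.0), 1), ((185.0, 85.0), 1), ((178.0, 82.0), 1), ((190.0, 90.0), 1),
]

# Hold out part of the data so evaluation uses examples the model never saw.
rng = random.Random(0)
rng.shuffle(data)
train, test = data[:6], data[6:]

train_rows = [row for row, _ in train]
train_labels = [label for _, label in train]

# The scaler is fitted on training data only, then reused unchanged on test data.
lows, highs = fit_scaler(train_rows)
scaled_train = [scale(r, lows, highs) for r in train_rows]
centroids = fit_centroids(scaled_train, train_labels)

# Evaluation: fraction of held-out examples classified correctly.
correct = sum(predict(centroids, scale(row, lows, highs)) == label for row, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split alone avoids leaking information from the test split into preprocessing.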
6. Visual Representation
7. Surprising Facts
- ML Models Can Detect Diseases Before Symptoms Appear: Studies have reported that ML can pick up subtle patterns in medical images or genetic data, in some cases flagging diseases like cancer or Alzheimer’s before clinical symptoms manifest.
- Adversarial Examples: Slight, often imperceptible changes to input data can fool even the most advanced ML models, raising security concerns.
- Zero-Shot Learning: Some models can classify data into categories they were never explicitly trained on, by leveraging semantic relationships.
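The adversarial-example effect can be illustrated even on a tiny linear classifier. The sketch below nudges every feature by a small amount in the direction that raises the score, in the spirit of the fast gradient sign method. The weights, input, and `epsilon` are invented for illustration; attacks on deep models need the gradient of the loss rather than the raw weights.

```python
# A hypothetical linear classifier: class 1 if the score is positive, else class 0.
weights = [0.5, -0.25, 0.75]
bias = -0.1


def score(x):
    return sum(w * v for w, v in zip(weights, x)) + bias


def classify(x):
    return 1 if score(x) > 0 else 0


x = [0.2, 0.4, 0.1]  # original input (invented)
epsilon = 0.05       # maximum change allowed per feature

# For a linear model, the gradient of the score with respect to the input
# is the weight vector, so stepping along its sign raises the score fastest
# under a per-feature limit.
x_adv = [v + epsilon * (1.0 if w > 0 else -1.0) for v, w in zip(x, weights)]

print(classify(x), classify(x_adv))
```

No feature moves by more than `epsilon`, yet the predicted class changes, which is the core of the security concern noted above.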
8. Common Misconceptions and Myths Debunked
Myth: “ML Models Understand Data Like Humans”
Reality: ML models do not “understand” data contextually. They identify statistical patterns, not meaning. For example, a model trained to recognize cats in images does not know what a cat is; it just learns pixel patterns associated with the label “cat.”
Other Misconceptions
- ML is Always Accurate: ML models can be biased or make errors, especially with poor-quality data.
- ML Replaces Humans: ML augments human decision-making but often requires human oversight for critical decisions.
- Bigger Models Are Always Better: Larger models can overfit or require impractical amounts of data and computation.
9. Applications
- Healthcare: Disease prediction, drug discovery, medical imaging.
- Finance: Fraud detection, algorithmic trading, credit scoring.
- Autonomous Vehicles: Object detection, path planning.
- Natural Language Processing: Translation, sentiment analysis, chatbots.
- Genomics: Pattern recognition in DNA sequences, CRISPR gene-editing guidance.
10. Recent Research Example
A 2022 study published in Nature reported that ML models can predict CRISPR–Cas9 off-target effects from DNA sequences with high accuracy, which could improve the safety of gene-editing therapies (Nature, 2022).
11. Challenges and Limitations
- Bias and Fairness: Models can inherit biases present in training data.
- Explainability: Complex models (e.g., deep neural networks) are often “black boxes.”
- Data Requirements: Large, high-quality datasets are essential.
- Security: Vulnerable to adversarial attacks.
12. Future Trends
- Federated Learning: Training models across decentralized devices while preserving privacy.
- Explainable AI (XAI): Making decisions of ML models more interpretable.
- Integration with Biotechnology: ML is accelerating discoveries in genomics and gene editing.
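To make the federated learning idea concrete, here is a rough sketch in the style of federated averaging. Each client improves the shared parameter on its own data, and the server combines only the resulting parameters. The one-parameter model y = w * x, the client datasets, and the function names are all assumptions for illustration. Real systems add client sampling, secure aggregation, and much more.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one client's private data, starting from w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Average the clients' updated parameters, weighted by dataset size."""
    total = sum(len(data) for data in clients)
    return sum(local_update(w_global, data) * len(data) for data in clients) / total


# Each inner list stays on its own device; only parameters reach the server.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(f"global w after 50 rounds: {w:.4f}")
```

Both toy clients follow y = 3x, so the shared parameter settles near 3 without the server ever reading a raw data point.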
13. Glossary
- Overfitting: Model fits training data too closely, performing poorly on new data.
- Underfitting: Model is too simple, missing important patterns.
- Feature Engineering: Creating new input features to improve model performance.
- Hyperparameter Tuning: Adjusting algorithm settings for optimal results.
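Overfitting and hyperparameter tuning can be seen together in a small sketch. It uses a k-nearest-neighbours classifier on invented one-dimensional data, where the true rule is "label 1 if x > 5" and two training labels are deliberately noisy. The hyperparameter is `k`, chosen by accuracy on a separate validation set. All data and names are hypothetical.

```python
# Training data: true rule is label 1 if x > 5; (4.0, 1) and (6.0, 0) are noise.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (4.5, 0),
         (5.5, 1), (6.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
validation = [(3.8, 0), (4.2, 0), (6.2, 1), (5.8, 1), (1.5, 0), (8.5, 1)]


def knn_predict(train_data, x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train_data, key=lambda item: abs(item[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train_data, eval_data, k):
    """Fraction of eval_data points whose predicted label matches the true one."""
    hits = sum(knn_predict(train_data, x, k) == y for x, y in eval_data)
    return hits / len(eval_data)


# Hyperparameter tuning: try several values of k and keep the one that
# scores best on the validation set, not on the training set.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

for k in scores:
    print(k, accuracy(train, train, k), scores[k])
print("best k:", best_k)
```

With k = 1 the model memorizes every training point, noise included, so it scores perfectly on the training set but worse on the validation set than larger values of k do. That gap is the signature of overfitting.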
14. References
- Nature. (2022). “Machine learning enables CRISPR–Cas9 off-target prediction at high accuracy.”
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
- Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
15. Diagram: Types of ML