Introduction

Machine Learning (ML) is a subfield of artificial intelligence (AI) that enables systems to learn from data, identify patterns, and make decisions with minimal human intervention. Unlike traditional programming, in which behavior is specified through explicit, hand-written rules, ML algorithms improve their performance through experience, adapting to new data over time. ML is foundational to many modern technologies, including recommendation systems, autonomous vehicles, and medical diagnostics.


Main Concepts

1. Types of Machine Learning

  • Supervised Learning: Algorithms learn from labeled datasets, making predictions or classifications based on input-output pairs. Common algorithms include linear regression, support vector machines, and neural networks. (A short sketch contrasting this setting with unsupervised learning follows this list.)
  • Unsupervised Learning: Algorithms analyze unlabeled data to find structure, such as clustering or association. Examples include k-means clustering and principal component analysis (PCA).
  • Semi-supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data during training.
  • Reinforcement Learning: Agents learn optimal actions through trial and error, receiving feedback via rewards or penalties. Used in robotics and game AI.
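
As a brief illustration of the first two paradigms, the following sketch fits a supervised classifier to labeled data and then clusters the same feature matrix without using the labels. It is a minimal example, assuming scikit-learn and its bundled Iris dataset; the model choices and variable names are illustrative, not prescriptive.

    # Supervised vs. unsupervised learning on the same feature matrix (scikit-learn assumed).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    X, y = load_iris(return_X_y=True)

    # Supervised: learn a mapping from inputs X to the known labels y.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print("supervised training accuracy:", clf.score(X, y))

    # Unsupervised: group the same inputs without ever seeing the labels.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print("cluster labels for the first five samples:", kmeans.labels_[:5])

The supervised model requires the label vector y to learn; the clustering step sees only X, which is the defining difference between the two paradigms.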

2. Key Algorithms

  • Decision Trees: Hierarchical models for classification and regression that split data on feature values (several of the classifiers in this list are compared in the sketch after it).
  • Random Forests: Ensembles of decision trees to improve accuracy and reduce overfitting.
  • Neural Networks: Models inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers.
  • Support Vector Machines (SVM): Classifiers that find the optimal boundary between classes in high-dimensional space.
  • K-Nearest Neighbors (KNN): Classifies a data point by the majority class among its k nearest neighbors.
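
The sketch below trains four of these algorithms on the same train/test split and reports held-out accuracy. It assumes scikit-learn and its bundled breast-cancer dataset; the hyperparameters are illustrative defaults rather than tuned values.

    # Comparing several of the classifiers above on one train/test split (scikit-learn assumed).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    models = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "SVM": SVC(),
        "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        # .score() reports accuracy on the held-out test split.
        print(f"{name}: {model.score(X_test, y_test):.3f}")

The ranking of the four models depends on the dataset and on tuning, so the output should be read as a demonstration of the workflow rather than a benchmark.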

3. Model Evaluation

  • Accuracy: Proportion of correct predictions.
  • Precision & Recall: Precision is the fraction of predicted positives that are truly positive; recall is the fraction of actual positives the model identifies.
  • F1 Score: Harmonic mean of precision and recall.
  • Confusion Matrix: Table showing true vs. predicted classifications.
  • Cross-Validation: Assesses model performance by repeatedly partitioning the data into training and validation folds and averaging the results (all of these metrics are computed in the sketch after this list).
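
The following sketch computes the metrics above for a simple classifier. It assumes scikit-learn; the dataset, the scaler-plus-logistic-regression pipeline, and the five-fold setting are illustrative choices.

    # Computing the evaluation metrics above for a simple classifier (scikit-learn assumed).
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, confusion_matrix)
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    print("accuracy: ", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall:   ", recall_score(y_test, y_pred))
    print("F1 score: ", f1_score(y_test, y_pred))
    print("confusion matrix:\n", confusion_matrix(y_test, y_pred))

    # 5-fold cross-validation: five different train/validation partitions of the full data.
    print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())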

4. Feature Engineering

  • Feature Selection: Identifying the most relevant variables for model training.
  • Feature Extraction: Creating new features from raw data (e.g., using PCA).
  • Normalization & Scaling: Adjusting feature ranges (e.g., to zero mean and unit variance) so that no variable dominates simply because of its units; see the sketch after this list.
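
A minimal sketch of these three steps, assuming scikit-learn and its bundled Wine dataset; the choice of two principal components and five selected features is arbitrary and only meant to show the transformations.

    # Scaling, PCA-based feature extraction, and a simple feature-selection step (scikit-learn assumed).
    from sklearn.datasets import load_wine
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_classif

    X, y = load_wine(return_X_y=True)

    # Normalization/scaling: transform each feature to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(X)

    # Feature extraction: project the 13 original features onto 2 principal components.
    X_pca = PCA(n_components=2).fit_transform(X_scaled)
    print("shape after PCA:", X_pca.shape)

    # Feature selection: keep the 5 features most associated with the class label.
    X_selected = SelectKBest(f_classif, k=5).fit_transform(X_scaled, y)
    print("shape after selection:", X_selected.shape)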

5. Overfitting and Underfitting

  • Overfitting: The model learns noise and idiosyncrasies of the training data and therefore performs poorly on new data (see the sketch below).
  • Underfitting: Model is too simple, failing to capture underlying patterns.
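
One common way to see both failure modes is to compare training and test accuracy while varying model capacity. The sketch below uses decision-tree depth as the capacity knob; it assumes scikit-learn, and the dataset and depth values are illustrative.

    # Overfitting vs. underfitting, seen as the gap between training and test accuracy
    # (scikit-learn assumed; tree depth is the illustrative capacity knob).
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (1, 4, None):   # None lets the tree grow until every leaf is pure
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
        print(f"max_depth={depth}: train={tree.score(X_train, y_train):.3f}, "
              f"test={tree.score(X_test, y_test):.3f}")

In general, a very shallow tree underfits (similar, relatively low scores on both splits), while an unconstrained tree overfits (perfect training accuracy but a noticeable drop on the test set); the exact numbers depend on the dataset.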

Ethical Considerations

  • Bias and Fairness: ML models may perpetuate or amplify biases present in training data, leading to unfair outcomes (e.g., in hiring or lending).
  • Privacy: Use of personal data in ML raises concerns about consent and data protection.
  • Transparency: Many ML models, especially deep learning, are “black boxes,” making it difficult to explain decisions.
  • Accountability: Determining responsibility for decisions made by autonomous systems is an ongoing challenge.
  • Environmental Impact: Training large models requires significant computational resources, contributing to carbon emissions.

Comparison with Another Field: Statistical Analysis

  • Machine Learning focuses on prediction and pattern recognition, often using large datasets and complex algorithms.
  • Statistical Analysis primarily aims to infer relationships and test hypotheses, emphasizing interpretability and theoretical foundations.
  • Overlap: Both use data-driven approaches and share methods (e.g., regression).
  • Differences: ML prioritizes predictive accuracy, while statistics values explainability and causality.

Future Trends

  • Federated Learning: Training models across decentralized devices so that raw data never leaves the device, preserving privacy (a toy sketch follows this list).
  • Explainable AI (XAI): Developing methods to interpret and understand ML decisions.
  • Edge Computing: Deploying ML models on local devices for real-time inference.
  • Automated Machine Learning (AutoML): Tools that automate model selection, feature engineering, and hyperparameter tuning.
  • Integration with Quantum Computing: Exploring quantum algorithms to accelerate ML tasks.
  • Sustainability: Research into energy-efficient ML architectures and training methods.
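
To make the federated-learning idea concrete, here is a toy, single-round federated-averaging sketch in plain NumPy. Everything in it is an illustrative assumption: the simulated clients, the local least-squares "model", and the unweighted averaging step. Real systems handle communication, weighting by client data size, and many rounds of updates.

    # Toy, single-round federated averaging in plain NumPy: each simulated client fits a
    # local linear model on its private data, and only the fitted weights are shared.
    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])   # ground-truth weights the clients try to recover

    def client_update(n_samples):
        # The raw data stays on the "device"; only the fitted weights leave it.
        X = rng.normal(size=(n_samples, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=n_samples)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    client_weights = [client_update(n) for n in (50, 80, 120)]   # three simulated clients
    global_w = np.mean(client_weights, axis=0)                   # simple unweighted aggregation
    print("averaged global weights:", global_w)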

Recent Research

A 2022 study published in Nature Communications (“Machine learning enables detection of plastic pollution in the ocean from satellite imagery”) demonstrated the use of ML algorithms to analyze satellite data, identifying plastic debris in marine environments with high accuracy. This approach facilitates large-scale monitoring of ocean plastic pollution, supporting conservation efforts and policy decisions.


Conclusion

Machine Learning is transforming scientific research, industry, and daily life by enabling systems to learn from data and make autonomous decisions. Its applications are vast, from environmental monitoring to healthcare diagnostics. As ML continues to evolve, addressing ethical concerns and advancing interpretability will be essential. Future trends point toward greater privacy, transparency, and sustainability, helping to ensure that ML remains a powerful tool for positive societal impact.


References

  • Nature Communications, 2022. “Machine learning enables detection of plastic pollution in the ocean from satellite imagery.”
  • IEEE Spectrum, 2021. “The Carbon Footprint of Machine Learning.”
  • MIT Technology Review, 2023. “Federated Learning: Privacy-Preserving AI at Scale.”