MCA · TMC-211 / TMC-311 · Machine Learning

Jatin's Notes

Premium Exam Preparation · PYQ Analysis 2025–2026

KNN | Linear Regression | Confusion Matrix | SVM & Kernels | Neural Networks | Decision Trees | Backpropagation | CNN · RNN · LSTM | K-Means · DBSCAN | PCA | Gradient Descent | Apriori Algorithm
40+ Expected Questions | 5 PYQ Papers Analysed | 15+ Solved Numericals | 100% Exam Ready
Unit 1
Foundations of Machine Learning
★★★
Define Machine Learning. How is it trained and tested?
10 Marks PYQ 2025 Mid-Sem Most Repeated

Definition

Standard Exam Definition:
"Machine Learning is the study of algorithms and statistical models that computer systems use to perform tasks by learning patterns from data, without using rule-based programming."

Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

How ML Model is Trained and Tested

  1. Data Collection: Gather relevant data (structured/unstructured)
  2. Data Preprocessing: Clean data, handle missing values, normalize
  3. Feature Selection: Choose important input variables
  4. Model Selection: Choose algorithm (Linear Regression, KNN, SVM, etc.)
  5. Training: Feed training data → model learns patterns by adjusting parameters
  6. Validation: Evaluate on validation set, tune hyperparameters
  7. Testing: Test on unseen test data to evaluate final performance
  8. Deployment: Use model for real-world predictions
Data → Split → Training Set (70%) + Testing Set (30%)
  ↓
Train the Model
  ↓
Evaluate on Test Data
  ↓
Calculate Accuracy/Metrics
Key Points to Remember
  • Training: Model learns weights/parameters
  • Testing: Model predicts on new, unseen data
  • Overfitting: Performs very well on training data but poorly on test data
  • Underfitting: Performs poorly on both training and test data
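The 70/30 split in the flow above can be sketched in plain Python. This is only an illustration: the helper name split_train_test, the seed value, and the toy data are assumptions made for the example, not part of any library.

```python
import random

def split_train_test(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))                       # hypothetical dataset of 10 samples
train, test = split_train_test(data)
print(len(train), "training samples,", len(test), "test samples")
```

The model is fitted only on train; test is kept aside until the final evaluation so that the reported metric reflects unseen data.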
★★★
Define & Compare: Supervised, Unsupervised, Semi-Supervised, Reinforcement Learning
10 Marks PYQ 2026 Mid-Sem & 2025 End-Term Most Repeated

1. Supervised Learning

Definition: Model is trained on labeled data (input + correct output given).
Goal: Learn a mapping function f(X) → Y
Examples: Email spam detection, house price prediction, medical diagnosis
Algorithms: Linear Regression, Logistic Regression, SVM, KNN, Decision Trees

💡
Memory Trick: "Teacher is present" → labels = teacher

2. Unsupervised Learning

Definition: Model is trained on unlabeled data (only input, no output labels).
Goal: Find hidden patterns or groupings in data
Examples: Customer segmentation, anomaly detection, topic modeling
Algorithms: K-Means, DBSCAN, PCA, Apriori

💡
Memory Trick: "No teacher" → model discovers structure itself

3. Reinforcement Learning

Definition: An agent learns by interacting with environment, receiving rewards or penalties.

  • Agent — learner/decision maker
  • Environment — what agent interacts with
  • State (S) — current situation
  • Action (A) — what agent does
  • Reward (R) — feedback signal
  • Policy (π) — strategy of agent
Q(s,a) ← Q(s,a) + α[R + γ·max Q(s',a') − Q(s,a)]
💡
Memory Trick: "Carrot and stick" → reward/penalty drives learning
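The Q-learning update rule above can be traced with one hand-sized step. This is a minimal sketch: the two states, the two actions, the reward, and the values α = 0.1 and γ = 0.9 are all made up for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * [R + gamma * max Q(s',a') - Q(s,a)]."""
    best_next = max(q[next_state].values())          # max over a' of Q(s', a')
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Hypothetical Q-table with two states and two actions
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
}
new_q = q_update(q_table, "s0", "right", reward=0.5, next_state="s1")
print(round(new_q, 4))  # 0.1 * (0.5 + 0.9 * 1.0 - 0.0) = 0.14
```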

Comparison Table [MUST DRAW IN EXAM]

Feature | Supervised | Unsupervised | Semi-Supervised | Reinforcement
Labels | Required (all) | Not required | Partial | Reward signal
Goal | Predict output | Find patterns | Improve with few labels | Maximize reward
Output | Class/Value | Clusters/Patterns | Class/Value | Policy
Example | Spam detection | Clustering | Image tagging | Game AI
Algorithms | SVM, KNN, LR | K-Means, PCA | Self-training | Q-Learning
★★★
Write a Short Note on NumPy and TensorFlow
10 Marks PYQ 2026 Mid-Sem Q2a

NumPy — Numerical Python

Purpose: Provides support for large multi-dimensional arrays and matrices, along with mathematical functions. Faster than Python lists (implemented in C).

Key Features: N-dimensional array object (ndarray), Broadcasting, Linear algebra, Fourier transform, Random number generation

import numpy as np
a = np.array([1, 2, 3])       # Create array
b = np.zeros((3, 3))          # Zero matrix
c = np.dot(a, a)              # Dot product
d = np.mean(a), np.std(a)     # Statistics (mean, standard deviation)
e = a.reshape(1, 3)           # Reshape

TensorFlow

Developer: Google Brain Team (2015) | Purpose: Open-source library for numerical computation and large-scale Machine Learning using data flow graphs.

Key Features: Tensors = multi-dimensional arrays, Automatic differentiation (tf.GradientTape) for backpropagation, GPU/TPU acceleration, Keras API (high-level), Eager execution mode

import tensorflow as tf

# X_train and y_train are assumed to be already-prepared feature and label arrays
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)

NumPy vs TensorFlow

Feature | NumPy | TensorFlow
Purpose | Array computation | Deep Learning
GPU Support | No | Yes
Auto-differentiation | No | Yes
Level | Low-level | High-level (with Keras)
Unit 2
Supervised Learning
★★★
Linear Regression — Theory + Solved Numerical [EXAM FAVOURITE]
10 Marks PYQ 2025 Mid-Sem Q5a & End-Term Q3b Numerical

Definition

Linear Regression is a supervised learning algorithm used to predict a continuous output variable (Y) based on one or more input features (X) by fitting a straight line to the data.

Formula

ŷ = a₀ + a₁x (Simple Linear Regression)
Where:
a₁ = [n·Σxy − Σx·Σy] / [n·Σx² − (Σx)²] ← slope
a₀ = ȳ − a₁·x̄ ← intercept

Solved Numerical — PYQ 2025

Problem: Find the Linear Regression equation for the following data and estimate Y when X = 7.0
X: 2.0 | 3.0 | 4.0 | 5.0 | 6.0
Y: 3.00 | 4.00 | 3.40 | 6.00 | 5.00

Step 1: Calculation Table

X | Y | XY | X²
2 | 3.00 | 6.00 | 4
3 | 4.00 | 12.00 | 9
4 | 3.40 | 13.60 | 16
5 | 6.00 | 30.00 | 25
6 | 5.00 | 30.00 | 36
ΣX=20 | ΣY=21.40 | ΣXY=91.60 | ΣX²=90

Step 2: Calculate a₁ (slope) — n = 5

a₁ = [n·ΣXY − ΣX·ΣY] / [n·ΣX² − (ΣX)²]
   = [5×91.60 − 20×21.40] / [5×90 − (20)²]
   = [458 − 428] / [450 − 400]
   = 30 / 50 = 0.6

Step 3: Calculate a₀ (intercept)

x̄ = 20/5 = 4, ȳ = 21.40/5 = 4.28
a₀ = ȳ − a₁·x̄ = 4.28 − 0.6×4 = 4.28 − 2.4 = 1.88

Step 4: Regression Equation & Prediction

ŷ = 1.88 + 0.6x
When X = 7: ŷ = 1.88 + 0.6×7 = 1.88 + 4.2 = 6.08
Answer: Regression equation is ŷ = 1.88 + 0.6x
When X = 7.0, predicted Y = 6.08
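The hand calculation above can be cross-checked with a few lines of Python that apply the same slope and intercept formulas. This is a sketch for verification only; the function name fit_line is an assumption.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for y = a0 + a1*x using the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
    a0 = sum_y / n - a1 * sum_x / n                                # intercept
    return a0, a1

xs = [2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 4.0, 3.4, 6.0, 5.0]
a0, a1 = fit_line(xs, ys)
y_at_7 = a0 + a1 * 7.0
print(round(a0, 2), round(a1, 2), round(y_at_7, 2))  # 1.88 0.6 6.08
```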
★★★
Confusion Matrix — Diabetes Problem (1000 Patients) [PYQ EXACT]
10 Marks PYQ 2026 Mid-Sem Q3a Must Practice
Problem: 1000 patients tested. 900 healthy, 100 sick.
Sick: 72 test +ve, 28 test –ve. Healthy: 28 test +ve, 872 test –ve.
Construct confusion matrix. Calculate Accuracy, Precision, Recall, F1-Score.

Step 1: Identify Values

  • TP (Sick predicted Sick) = 72
  • FN (Sick predicted Healthy) = 28
  • FP (Healthy predicted Sick) = 28
  • TN (Healthy predicted Healthy) = 872

Step 2: Confusion Matrix

                | Pred: Sick (+) | Pred: Healthy (–)
Actual: Sick    | TP = 72        | FN = 28
Actual: Healthy | FP = 28        | TN = 872

Step 3: Calculate Metrics

Accuracy = (TP+TN)/(TP+TN+FP+FN) = (72+872)/1000 = 0.944 = 94.4%
Precision = TP/(TP+FP) = 72/(72+28) = 0.72 = 72%
Recall = TP/(TP+FN) = 72/(72+28) = 0.72 = 72%
F1-Score = 2×P×R/(P+R) = 2×0.72×0.72/1.44 = 0.72 = 72%
Memory Trick
  • TP & TN = friends (both correct)
  • FP = False Alarm (predicted sick but healthy)
  • FN = Missed case (predicted healthy but sick)
  • Precision = "When I say sick, how often right?"
  • Recall = "Of all sick people, how many did I catch?"
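The four metrics can be recomputed from the counts in a few lines. This is a minimal sketch for checking the arithmetic; the function name classification_metrics is an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts from the diabetes problem above
metrics = classification_metrics(tp=72, fn=28, fp=28, tn=872)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note how accuracy (94.4%) is much higher than recall (72%): with 900 of 1000 patients healthy, the large TN count dominates the accuracy figure.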
★★★
K-NN Classifier — Theory + Numerical (Euclidean Distance) [PYQ EXACT]
10 Marks PYQ 2026 Mid-Sem Q3b, 2025 End-Term Q1a, 2025 Mid Q4b Must Practice

Definition

K-Nearest Neighbour (K-NN) is a non-parametric, instance-based supervised learning algorithm used for classification and regression. It classifies a new point based on the majority class of its K nearest neighbors.

Euclidean Distance Formula

d(p,q) = √[(x₁-x₂)² + (y₁-y₂)²] (2D)
General: d = √[Σ(pᵢ - qᵢ)²]

Algorithm Steps

  1. Choose value of K
  2. Calculate Euclidean distance from test point to all training points
  3. Sort distances in ascending order
  4. Select K nearest neighbors
  5. For classification: take majority vote of K neighbors
  6. Assign that class to test point

Solved Numerical [PYQ 2025 Mid-Sem Q4b]

Problem: Classify point (4,6) using K=3.
D1:(2,1)=Y, D2:(4,2)=N, D3:(3,3)=Y, D4:(3,5)=N, D5:(4,3)=N, D6:(5,4)=Y

Euclidean Distance Calculations

Point | Coord | Class | Distance from (4,6)
D1 | (2,1) | Y | √[(4-2)²+(6-1)²] = √[4+25] = √29 ≈ 5.39
D2 | (4,2) | N | √[(4-4)²+(6-2)²] = √[0+16] = 4.00
D3 | (3,3) | Y | √[(4-3)²+(6-3)²] = √[1+9] = √10 ≈ 3.16
D4 | (3,5) | N | √[(4-3)²+(6-5)²] = √[1+1] = √2 ≈ 1.41
D5 | (4,3) | N | √[(4-4)²+(6-3)²] = √[0+9] = 3.00
D6 | (5,4) | Y | √[(4-5)²+(6-4)²] = √[1+4] = √5 ≈ 2.24

K=3 Nearest Neighbors (sorted)

Rank | Point | Distance | Class
1 | D4 | 1.41 | N
2 | D6 | 2.24 | Y
3 | D5 | 3.00 | N
K=3 Neighbors: N, Y, N → Majority = N (2 votes)
∴ Class of (4,6) = N
💡
Memory: "Find K Friends and vote!" → nearest K decide the class by majority
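The algorithm steps and the worked example above can be reproduced with a short stdlib-only sketch. The function name knn_classify is an assumption; the data points are the six from the problem.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # Steps 2-3: compute Euclidean distances and sort them in ascending order
    distances = sorted((math.dist(point, query), label) for point, label in train)
    # Steps 4-5: take the k nearest and count their class labels
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

train = [((2, 1), "Y"), ((4, 2), "N"), ((3, 3), "Y"),
         ((3, 5), "N"), ((4, 3), "N"), ((5, 4), "Y")]
predicted = knn_classify(train, query=(4, 6), k=3)
print(predicted)  # N
```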
★★★
Sigmoid Function + Logistic Regression Numerical [PYQ EXACT]
10 Marks PYQ 2026 Mid-Sem Q4a Numerical

Sigmoid Function

σ(z) = 1 / (1 + e^(-z))
Where z = a₀ + a₁x (linear combination)
Output range: (0, 1), used as probability
Decision boundary: σ(z) ≥ 0.5 → class 1, else class 0
Equivalent to: z ≥ 0 → class 1, else class 0

Solved Numerical [PYQ 2026 Mid-Sem Q4a]

Problem: With a₀ = -64, a₁ = 2, find pass % for student who studies 33 hours.

Step 1: Calculate z

z = a₀ + a₁·x = -64 + 2×33 = -64 + 66 = 2

Step 2: Apply Sigmoid

σ(z) = 1/(1 + e^(-2)) = 1/(1 + 0.1353) = 1/1.1353 = 0.8808
Pass Probability = 88.08%
Since σ(2) ≈ 0.88 > 0.5 → Student PASSES

Verify with Data Table

Hours (x) | Pass/Fail | z = -64+2x | σ(z) | Predicted
24 | 0 (Fail) | -16 | ≈0.0000 | Fail ✓
15 | 0 (Fail) | -34 | ≈0.0000 | Fail ✓
28 | 1 (Pass) | -8 | ≈0.0003 | Fail ✗
33 | 1 (Pass) | 2 | ≈0.8808 | Pass ✓
39 | 1 (Pass) | 14 | ≈1.0000 | Pass ✓
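The verification table can be regenerated with a few lines of Python. This is a sketch using the coefficients from the problem (a₀ = -64, a₁ = 2); the function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pass_probability(hours, a0=-64.0, a1=2.0):
    """Logistic regression: probability of passing after studying `hours`."""
    return sigmoid(a0 + a1 * hours)

for hours in (24, 15, 28, 33, 39):
    p = pass_probability(hours)
    verdict = "Pass" if p >= 0.5 else "Fail"
    print(f"x={hours:2d}  z={-64 + 2 * hours:4d}  sigma(z)={p:.4f}  {verdict}")
```

With these coefficients the decision boundary z = 0 falls at x = 32 hours, which is why the student with 28 hours is predicted to fail even though the actual label is Pass.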
★★★
SVM — Support Vector Machine + Kernel Functions [PYQ EXACT]
10 Marks PYQ 2026 Mid-Sem Q2b, 2025 End-Term Q3c Theory + Kernel

Definition

SVM is a supervised learning algorithm that finds the optimal hyperplane which maximizes the margin between two classes.

Decision boundary: w·x + b = 0
Margin = 2 / ||w||
Maximize margin = Minimize ||w||²/2
Support Vectors = data points closest to hyperplane
💡
"Draw the FATTEST possible line between two classes" — the support vectors are the data points that sit right on the edge of that fat line.

Kernel Functions [List any 4]

Kernel | Formula | Use Case
Linear | K(x,y) = xᵀy | Linearly separable data
Polynomial | K(x,y) = (xᵀy + c)^d | Non-linear boundaries
RBF/Gaussian | K(x,y) = exp(-γ||x-y||²) | Most common, non-linear
Sigmoid | K(x,y) = tanh(αxᵀy + c) | Neural network-like

Kernel Trick

The Kernel Trick computes the dot product in high-dimensional space without explicitly transforming data, making SVM computationally efficient for non-linearly separable data.

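Example from PYQ 2026: Points like (0,2), (0,-2) belong to class O and points like (1,1), (-1,-1), (2,0) belong to class X. These are not linearly separable in 2D. An explicit quadratic feature map such as φ(x) = x₁² + x₂² (a polynomial-style mapping, not the RBF kernel itself) is the usual way to turn such data into a linearly separable 1D problem. Check it against the exact coordinates in the paper: with the points as listed here, φ sends (0,2), (0,-2) and (2,0) all to 4 and (1,1), (-1,-1) to 2, so (2,0) lands with class O, whereas φ(x) = x₂² gives O → 4 and X → 1, 1, 0, which is separable.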
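The four kernels from the table, and the idea behind the kernel trick, can be sketched in plain Python. The default values of c, d, γ and α are arbitrary choices for illustration, and phi_quadratic is a hypothetical helper: it is the explicit feature map that the degree-2 polynomial kernel with c = 0 corresponds to for 2D inputs.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=2):
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, alpha=0.1, c=0.0):
    return math.tanh(alpha * dot(x, y) + c)

def phi_quadratic(x):
    """Explicit 2D -> 3D map whose dot product equals (x.y)^2."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, 4.0)
# Kernel trick: same number, without ever building the 3D vectors
implicit = polynomial_kernel(x, y, c=0.0, d=2)
explicit = dot(phi_quadratic(x), phi_quadratic(y))
print(implicit, explicit)
```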
★★
Cross Validation + Bias-Variance Tradeoff
10 Marks PYQ 2025 Mid-Sem Q5b, 2025 End-Term

Cross Validation

Definition: Cross Validation is a technique to evaluate ML models by training and testing on different subsets of data to avoid overfitting and get a reliable performance estimate.

Methods

  • K-Fold CV: Divide data into K equal folds. Train on K-1 folds, test on 1 fold. Repeat K times, average results.
  • Leave-One-Out (LOOCV): Each data point is test set once. Very accurate but slow.
  • Stratified K-Fold: Like K-fold but preserves class proportions in each fold.
K-Fold: Final Score = (Score₁ + Score₂ + ... + ScoreK) / K
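K-Fold splitting and score averaging can be sketched as follows. This is a minimal illustration without shuffling or stratification; the generator name k_fold_indices and the per-fold scores are made up for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)

# Final score = average of the per-fold scores (hypothetical values)
fold_scores = [0.80, 0.85, 0.90, 0.75, 0.70]
final_score = sum(fold_scores) / len(fold_scores)
```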

Bias-Variance Tradeoff

Total Error = Bias² + Variance + Irreducible Noise
Bias: Error from wrong assumptions (underfitting)
Variance: Error from sensitivity to training data (overfitting)
Model | Bias | Variance | Problem
Simple (Linear) | High | Low | Underfitting
Complex (Deep Tree) | Low | High | Overfitting
Optimal Model | Low | Low | Best!
💡
Bias = "wrong assumption" | Variance = "too sensitive to training data"
Goal: Balance both! Not too simple, not too complex.
★★★
Decision Tree — Information Gain, Entropy, Gini Index
10 Marks PYQ 2025 End-Term Theory + Numerical

Entropy Formula

Entropy(S) = -Σ pᵢ · log₂(pᵢ)
Where pᵢ = proportion of class i in set S
Pure node (all same class) → Entropy = 0
Equally mixed (2 classes) → Entropy = 1 (maximum disorder; for k classes the maximum is log₂k)

Information Gain

IG(S, A) = Entropy(S) − Σ [|Sᵥ|/|S| × Entropy(Sᵥ)]
Where Sᵥ = subset of S where attribute A = v
Choose attribute with HIGHEST Information Gain as root

Gini Index

Gini(S) = 1 − Σ pᵢ²
Pure node → Gini = 0
Equally mixed (2 classes) → Gini = 0.5 (maximum)
Measure | Formula | Range (2 classes) | Best Split
Entropy | −Σ pᵢ log₂pᵢ | 0 to 1 | Highest IG
Gini Index | 1 − Σ pᵢ² | 0 to 0.5 | Lowest Gini
ID3 Algorithm Steps
  • Calculate Entropy of whole dataset
  • For each attribute, calculate Information Gain
  • Select attribute with highest IG as root node
  • Split dataset, repeat recursively for each branch
  • Stop when: all data in leaf is same class OR no attributes left
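The three measures that ID3-style splitting relies on can be sketched in plain Python. The function names and the tiny four-row dataset are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum(p_i^2)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """IG(S, A) = Entropy(S) - sum(|S_v|/|S| * Entropy(S_v))."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: attribute "outlook" separates the classes perfectly
labels = ["Y", "Y", "N", "N"]
outlook = ["sunny", "sunny", "rainy", "rainy"]
print(entropy(labels), gini(labels), information_gain(labels, outlook))
```

An attribute that splits the data into pure subsets recovers the full parent entropy as gain, which is why ID3 would pick it as the root.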
Unit 3
Unsupervised Learning
★★★
K-Means Clustering Algorithm
10 Marks PYQ 2025 End-Term Q1c

Definition

K-Means is an unsupervised learning algorithm that partitions n data points into K clusters by minimizing the sum of squared distances from each point to its cluster centroid.

Algorithm Steps

  1. Initialize: Randomly choose K centroids from data
  2. Assignment: Assign each data point to nearest centroid (using Euclidean distance)
  3. Update: Recalculate centroids as mean of all points in each cluster
  4. Repeat: Go to step 2 until centroids do not change (convergence)
Centroid update: μₖ = (1/|Cₖ|) × Σ xᵢ for xᵢ ∈ Cₖ
Objective: Minimize J = Σₖ Σ_{xᵢ∈Cₖ} ||xᵢ − μₖ||²

Advantages

  • Simple and fast
  • Scales to large data
  • Easy to implement

Disadvantages

  • Need to specify K
  • Sensitive to outliers
  • Assumes spherical clusters
💡
K-Means = "Pick K centers, assign, move centers, repeat until stable"
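The assign/update loop can be sketched without any libraries. One deliberate simplification: the starting centroids are passed in explicitly instead of being chosen at random, so the run is repeatable. The function name k_means and the six toy points are assumptions.

```python
import math

def k_means(points, centroids, max_iters=100):
    """Run assignment/update steps until the centroids stop changing."""
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:   # convergence
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
print(clusters)
```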
★★★
DBSCAN Algorithm — Density Based Clustering [PYQ REPEATED]
10 Marks PYQ 2025 End-Term Q1c, 2025 End-Term Q2a

Definition

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups data points based on density. It can find clusters of arbitrary shapes and identifies outliers as noise.

Key Parameters

  • ε (epsilon): Radius of neighborhood around a point
  • MinPts: Minimum number of points required to form a dense region

Point Types

Core Point: Has ≥ MinPts within ε radius
Border Point: Has < MinPts within ε, but within ε of a Core point
Noise Point: Neither Core nor Border, treated as outlier

Algorithm Steps

  1. Pick an unvisited point
  2. Find all points within ε radius (neighbors)
  3. If neighbors ≥ MinPts → Core point → start new cluster
  4. Expand cluster by recursively adding density-connected points
  5. If neighbors < MinPts → mark as noise (may change to border later)
  6. Repeat until all points visited

DBSCAN vs K-Means

Feature | K-Means | DBSCAN
Need to specify K? | Yes | No
Cluster Shape | Spherical only | Any shape
Outlier handling | Assigns to cluster | Labels as noise
Sensitive to outliers | Yes | No
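The DBSCAN steps can be sketched with a brute-force neighbour search. Two assumptions worth flagging: a point counts itself among its neighbours when compared against MinPts, and cluster expansion uses an explicit queue rather than recursion. The function name, the toy points, ε = 0.5 and MinPts = 3 are all illustrative.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)            # None = not visited yet

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE                 # may become a border point later
            continue
        cluster_id += 1                       # i is a core point: start a cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id        # noise reached from a core point = border
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbours)
    return labels

points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.0),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.0),
          (10.0, 10.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```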
★★
PCA — Principal Component Analysis
10 Marks PYQ 2025 End-Term

Definition

PCA is an unsupervised dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance.

Steps of PCA

  1. Standardize: Normalize data (mean=0, variance=1)
  2. Covariance Matrix: Compute covariance matrix of features
  3. Eigenvalues & Eigenvectors: Compute from covariance matrix
  4. Sort: Sort eigenvalues in descending order
  5. Select: Choose top K eigenvectors (principal components)
  6. Project: Transform data onto new K-dimensional space
Covariance: Cov(X,Y) = Σ(xᵢ-x̄)(yᵢ-ȳ) / (n-1)
Variance explained = λᵢ / Σλ (where λ = eigenvalue)
💡
"Compress by keeping IMPORTANT directions" — PCA finds the directions of maximum variance (spread) in data.
Applications
  • Face recognition, Image compression
  • Remove noise from data
  • Visualization of high-dimensional data
  • Speed up machine learning algorithms
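The six PCA steps can be traced by hand for the special case of two features, where the eigenvalues of the 2×2 covariance matrix have a closed form. This is a sketch under that restriction only; real use would rely on a linear algebra library. The function name pca_2d and the toy data (y exactly twice x) are assumptions.

```python
import math
import statistics

def pca_2d(xs, ys):
    """PCA for exactly two features; returns (lam1, lam2, v1, projected)."""
    # Step 1: standardize each feature (mean 0, sample std 1)
    zx = [(x - statistics.fmean(xs)) / statistics.stdev(xs) for x in xs]
    zy = [(y - statistics.fmean(ys)) / statistics.stdev(ys) for y in ys]
    n = len(xs)
    # Step 2: entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(a * a for a in zx) / (n - 1)
    syy = sum(b * b for b in zy) / (n - 1)
    sxy = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
    # Steps 3-4: eigenvalues of a symmetric 2x2 matrix, largest first
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = centre + radius, centre - radius
    # Step 5: unit eigenvector belonging to lam1 (first principal component)
    if sxy != 0:
        v = (lam1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    v1 = (v[0] / norm, v[1] / norm)
    # Step 6: project the standardized data onto the first component
    projected = [a * v1[0] + b * v1[1] for a, b in zip(zx, zy)]
    return lam1, lam2, v1, projected

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
lam1, lam2, v1, projected = pca_2d(xs, ys)
explained = lam1 / (lam1 + lam2)
print(round(lam1, 4), round(abs(lam2), 4), round(explained, 4))
```

Because the two features here are perfectly correlated, one component carries essentially all the variance and the second eigenvalue is zero up to rounding.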
★★
Apriori Algorithm — Association Rule Mining
10 Marks PYQ 2025 End-Term Q3

Definition

Apriori is an algorithm for frequent itemset mining and association rule learning. Used to discover interesting relationships (rules) between variables in large databases.

Key Measures

Support(A) = Transactions containing A / Total transactions
Confidence(A→B) = Support(A∪B) / Support(A)
Lift(A→B) = Confidence(A→B) / Support(B)
Lift > 1 = Positive association (useful rule)

Apriori Principle

Apriori Property: If an itemset is frequent, ALL its subsets must also be frequent.
Contrapositive: If any subset is infrequent → the superset is also infrequent (prune it!).

Algorithm Steps

  1. Find all frequent 1-itemsets (≥ min_support)
  2. Generate candidate 2-itemsets from frequent 1-itemsets
  3. Prune candidates with infrequent subsets
  4. Find frequent 2-itemsets
  5. Repeat until no more frequent itemsets found
  6. Generate association rules from frequent itemsets
  7. Keep rules with Confidence ≥ min_confidence

Application: Market Basket Analysis — "Customers who buy bread also buy butter"
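Support, confidence and lift can be computed directly from a list of transactions. The sketch below covers only the three measures, not the full candidate generation and pruning loop; the five baskets are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, a, b):
    """Confidence(A -> B) = Support(A u B) / Support(A)."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)

def lift(transactions, a, b):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    return confidence(transactions, a, b) / support(transactions, b)

# Hypothetical market-basket data
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]
sup = support(transactions, {"bread", "butter"})
conf = confidence(transactions, {"bread"}, {"butter"})
lft = lift(transactions, {"bread"}, {"butter"})
print(sup, conf, lft)
```

In this toy data the lift of bread → butter comes out slightly below 1, so despite a confidence of 75% the rule is not a positive association: butter is already in 80% of all baskets.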

Unit 4
Neural Networks & Deep Learning
★★★
Artificial Neuron + Activation Functions [PYQ REPEATED]
10 Marks PYQ 2025 ML-2 Mid-Sem Q3a, 2025 End-Term

Artificial Neuron Model

z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b = Σwᵢxᵢ + b
Output: y = f(z) where f is the activation function
Inputs (x) → Weights (w) → Sum (z) → Activation f(z) → Output (y)

x₁ ─── w₁ ──┐
x₂ ─── w₂ ──┤
x₃ ─── w₃ ──┼──→ [ Σwᵢxᵢ + b ] ──→ f(z) ──→ Output
...         │
xₙ ─── wₙ ──┘

Activation Functions

Function | Formula | Range | Use Case
Sigmoid | 1/(1+e^(-z)) | (0,1) | Binary classification output
Tanh | (eᶻ-e^(-z))/(eᶻ+e^(-z)) | (-1,1) | Hidden layers (zero-centered)
ReLU | max(0,z) | [0,∞) | Most common in deep networks
Leaky ReLU | max(0.01z, z) | (-∞,∞) | Fix dying ReLU problem
Softmax | e^zᵢ / Σe^zⱼ | (0,1), sums to 1 | Multi-class classification output
Linear | z | (-∞,∞) | Regression output

Solved Numerical [PYQ 2025 End-Term Q5a]

Problem: 3-input neuron, inputs x = (0.8, 0.6, 0.4), weights w = (0.2, 0.1, -0.3), bias b = 0.35 (the fourth value in the given list [0.2, 0.1, -0.3, 0.35] is the bias). Use sigmoid. Find output y.
z = w₁x₁ + w₂x₂ + w₃x₃ + b
  = 0.2×0.8 + 0.1×0.6 + (-0.3)×0.4 + 0.35
  = 0.16 + 0.06 − 0.12 + 0.35 = 0.45
y = sigmoid(0.45) = 1/(1+e^(-0.45)) = 1/(1+0.6376) = 1/1.6376 ≈ 0.611
Output y ≈ 0.611
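The neuron calculation above can be checked in a few lines. This sketch uses the inputs, weights and bias from the problem; the function name neuron_output is an assumption.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum followed by a sigmoid activation; returns (z, y)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    y = 1.0 / (1.0 + math.exp(-z))
    return z, y

z, y = neuron_output(inputs=[0.8, 0.6, 0.4], weights=[0.2, 0.1, -0.3], bias=0.35)
print(round(z, 2), round(y, 3))  # 0.45 0.611
```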
★★★
Backpropagation + Gradient Descent [PYQ EXACT]
10 Marks PYQ 2025 ML-2 Mid-Sem Q2a, 2025 End-Term Numerical

Gradient Descent

Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively updating parameters in the direction of steepest descent.

Weight update rule: w = w − α × ∂L/∂w
where:
α = learning rate
∂L/∂w = gradient of loss with respect to weight

Solved Numerical [PYQ ML-2 Q2a]

Problem: Use Gradient Descent to minimize f(x) = x² − 2x, learning rate η = 0.1, start x = 0, 3 iterations.
f(x) = x² − 2x
f'(x) = 2x − 2 ← derivative (gradient)
Update rule: xₙₑₓₜ = x − η × f'(x)

Iteration 1 (x = 0)

f'(0) = 2(0) − 2 = −2
x₁ = 0 − 0.1×(−2) = 0 + 0.2 = 0.2

Iteration 2 (x = 0.2)

f'(0.2) = 2(0.2) − 2 = 0.4 − 2 = −1.6
x₂ = 0.2 − 0.1×(−1.6) = 0.2 + 0.16 = 0.36

Iteration 3 (x = 0.36)

f'(0.36) = 2(0.36) − 2 = 0.72 − 2 = −1.28
x₃ = 0.36 − 0.1×(−1.28) = 0.36 + 0.128 = 0.488
After 3 iterations: x ≈ 0.488
True minimum: x = 1 (where f'(x) = 0 → 2x−2=0 → x=1)
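The three iterations above can be reproduced, and extended, with a short loop. This is a sketch for checking the arithmetic; the function name gradient_descent is an assumption.

```python
def gradient_descent(grad, x0, lr, steps):
    """Return the list [x0, x1, ..., x_steps] produced by x <- x - lr * grad(x)."""
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - lr * grad(x)
        history.append(x)
    return history

# f(x) = x^2 - 2x, so f'(x) = 2x - 2
history = gradient_descent(grad=lambda x: 2 * x - 2, x0=0.0, lr=0.1, steps=3)
print([round(x, 3) for x in history])  # [0.0, 0.2, 0.36, 0.488]

# Running many more steps approaches the true minimum at x = 1
long_run = gradient_descent(grad=lambda x: 2 * x - 2, x0=0.0, lr=0.1, steps=200)
```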

Backpropagation Steps

  1. Forward Pass: Compute output layer by layer
  2. Compute Loss: Measure the error between actual and predicted output with a loss function, e.g. L = ½(actual − predicted)²
  3. Backward Pass: Use chain rule to compute gradients layer by layer
  4. Update Weights: w = w − α × ∂L/∂w
  5. Repeat until convergence
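The five steps can be followed on the smallest possible network: one sigmoid neuron with one input. This is a sketch, assuming a squared-error loss L = ½(target − y)²; the starting weight, bias, input, target and learning rate are made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, target, lr):
    """One forward pass, one backward pass, one weight update."""
    # 1. Forward pass
    z = w * x + b
    y = sigmoid(z)
    # 2. Loss
    loss = 0.5 * (target - y) ** 2
    # 3. Backward pass via the chain rule:
    #    dL/dz = dL/dy * dy/dz = (y - target) * y * (1 - y)
    dL_dz = (y - target) * y * (1.0 - y)
    dL_dw = dL_dz * x        # dz/dw = x
    dL_db = dL_dz            # dz/db = 1
    # 4. Update weights against the gradient
    return w - lr * dL_dw, b - lr * dL_db, loss

w, b = 0.5, 0.0
losses = []
for _ in range(20):          # 5. Repeat
    w, b, loss = train_step(w, b, x=1.0, target=1.0, lr=0.5)
    losses.append(loss)
print(round(losses[0], 4), round(losses[-1], 4))
```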
★★★
CNN — Convolutional Neural Network [PYQ ML-2]
10 Marks PYQ 2025 ML-2 Mid-Sem Q3b

CNN Architecture Layers

Input → Conv Layer → ReLU → Pooling → Conv Layer → ReLU → Pooling → Flatten → FC Layer → Output (Softmax)

Layers Explained

  1. Conv Layer: Applies filters/kernels to extract features (edges, textures, patterns). Output = Feature Map. Formula: Output size = ⌊(Input − Kernel + 2×Padding) / Stride⌋ + 1 (rounded down when the division is not exact)
  2. ReLU Activation: Applies max(0,z) — removes negative values, adds non-linearity
  3. Pooling Layer: Reduces spatial dimensions. Max Pooling = take maximum in each region. Reduces computation, provides translation invariance
  4. Flatten: Converts 2D feature maps to 1D vector
  5. Fully Connected (FC) Layer: Regular neural network layers for classification
  6. Softmax Output: Converts to class probabilities
Applications: Image Recognition, Object Detection, Medical Image Analysis, Self-driving cars, Face Recognition
💡
CNN = "Look → Shrink → Repeat → Guess!" (Conv→Pool→Conv→Pool→FC)
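The output-size formula from the Conv Layer step is easy to get wrong in an exam, so here is a small sketch that applies it along one spatial dimension. The function name and the layer sizes (32-pixel input, 5×5 kernel, 2×2 pooling) are example values, and the same formula is assumed for pooling with padding 0.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """Output size = floor((Input - Kernel + 2*Padding) / Stride) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

after_conv = conv_output_size(32, kernel_size=5)                 # 28
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)  # 14
same_padding = conv_output_size(32, kernel_size=3, padding=1)    # 32
print(after_conv, after_pool, same_padding)
```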
★★★
RNN + LSTM — Architecture and Gates [PYQ ML-2 REPEATED]
10 Marks PYQ 2025 ML-2 Mid-Sem Q3b, Q4a

RNN — Recurrent Neural Network

RNN is a neural network designed for sequential/time-series data. It has a feedback loop — the output of previous step is used as input to current step.

hₜ = tanh(Wₕ·hₜ₋₁ + Wₓ·xₜ + b)
yₜ = Wᵧ·hₜ + bᵧ
Problem: Vanishing Gradient (can't remember long sequences!)

LSTM — Long Short-Term Memory

LSTM is an improved RNN with 3 gates that control information flow, which largely mitigates the vanishing gradient problem.

Cell State (Cₜ): Long-term memory highway
Hidden State (hₜ): Short-term/working memory

Three Gates of LSTM

  1. Forget Gate (fₜ): Decides what information to THROW AWAY from cell state.
    fₜ = σ(Wf·[hₜ₋₁, xₜ] + bf) → Output: 0 = forget all, 1 = keep all
  2. Input Gate (iₜ): Decides what NEW information to ADD to cell state.
    iₜ = σ(Wᵢ·[hₜ₋₁, xₜ] + bᵢ), C̃ₜ = tanh(Wc·[hₜ₋₁, xₜ] + bc)
    Cell state update: Cₜ = fₜ × Cₜ₋₁ + iₜ × C̃ₜ
  3. Output Gate (oₜ): Decides what to OUTPUT as hidden state.
    oₜ = σ(Wo·[hₜ₋₁, xₜ] + bo), hₜ = oₜ × tanh(Cₜ)
💡
"3-gated memory controller" — LSTM: Forget useless, Input new, Output relevant!
Feature | RNN | LSTM
Memory | Short-term only | Long + Short term
Vanishing Gradient | Yes (big problem) | Largely mitigated by gates
Long dependencies | Fails | Handles well
Complexity | Simple | More complex
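One LSTM time step can be traced with scalars to see how the three gates interact. This is a deliberately reduced sketch: real LSTMs use weight matrices applied to the concatenated vector [hₜ₋₁, xₜ], whereas here each gate has one scalar weight for hₜ₋₁, one for xₜ and a bias, all set to made-up values. The cell-state line is the standard update Cₜ = fₜ·Cₜ₋₁ + iₜ·C̃ₜ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_h, w_x, bias)."""
    def gate(name, activation):
        w_h, w_x, bias = params[name]
        return activation(w_h * h_prev + w_x * x_t + bias)

    f_t = gate("f", sigmoid)          # forget gate: how much old memory to keep
    i_t = gate("i", sigmoid)          # input gate: how much new info to admit
    c_tilde = gate("c", math.tanh)    # candidate cell state
    o_t = gate("o", sigmoid)          # output gate
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Hypothetical parameters: same (w_h, w_x, bias) for every gate
params = {name: (0.5, 0.5, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, params=params)
print(round(h_t, 4), round(c_t, 4))
```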
★★
Ensembling — Bagging, Boosting, Random Forest [PYQ ML-2]
10 Marks PYQ 2025 ML-2 Mid-Sem Q1a,Q1b

Ensembling

Ensembling combines multiple models to produce a better prediction than any single model. Two main methods: Bagging and Boosting.

Bagging (Bootstrap Aggregating)

  • Train models in parallel on different bootstrap samples (random subsets drawn with replacement)
  • Combine by majority vote (classification) or average (regression)
  • Reduces Variance
  • Example: Random Forest

Boosting

  • Train models sequentially — each fixes errors of previous
  • Combine by weighted sum
  • Reduces Bias
  • Examples: AdaBoost, XGBoost, Gradient Boosting

Random Forest

Random Forest = Bagging + Decision Trees. It builds multiple decision trees on random subsets of data and features, then combines their predictions.

Random Forest = Many Decision Trees + Voting = Better accuracy, less overfitting
Unit 5
Implementation, Preprocessing & Ethics
★★
Data Preprocessing — Missing Values, Feature Scaling, Encoding
10 Marks PYQ 2025 End-Term Q2a, 2025 End-Term Q5b

Missing Value Handling

  • Mean Imputation: Replace with column mean (for normal distribution)
  • Median Imputation: Replace with median (for skewed data)
  • Mode Imputation: Replace with most frequent value (for categorical)
  • KNN Imputation: Replace using K nearest neighbors' values
  • Deletion: Remove rows/columns with too many missing values

Feature Scaling [PYQ: "What is feature scaling? Why required?"]

Feature scaling normalizes the range of features so that no feature dominates due to its scale. Required for algorithms that use distance (KNN, SVM, K-Means) or gradient descent (Neural Networks).

Min-Max Normalization: x' = (x − min) / (max − min) → Range: [0, 1]
Z-Score Standardization: x' = (x − μ) / σ → Mean=0, Std=1

Categorical Encoding

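  • Label Encoding: Assign an integer to each category; best for ordinal data where order matters (Low=0, Medium=1, High=2). On nominal data such as Red=0, Blue=1, Green=2 the integers imply an order that does not exist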
  • One-Hot Encoding: Create binary column for each category — use for nominal data
💡
Min-Max → squishes into [0,1] | Z-Score → centers at 0 with unit spread
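The two scaling formulas and one-hot encoding can be sketched with the standard library alone. The function names and sample values are assumptions; the z-score here uses the population standard deviation, and the one-hot columns are ordered alphabetically by category.

```python
import statistics

def min_max_scale(values):
    """x' = (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """x' = (x - mean) / std, giving mean 0 and std 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(values):
    """One binary column per category, columns in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

scaled = min_max_scale([10, 20, 30])
standardized = z_score([2, 4, 6])
encoded = one_hot(["Red", "Blue", "Red"])
print(scaled)        # [0.0, 0.5, 1.0]
print(standardized)
print(encoded)       # columns are [Blue, Red]
```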
Ethical Issues in Machine Learning
5–10 Marks Short Note
  • Bias and Fairness: ML models can inherit biases from training data, leading to unfair decisions (e.g., gender bias in hiring algorithms)
  • Privacy: Training on personal data without consent violates privacy (facial recognition, health data)
  • Transparency (Explainability): "Black box" models are hard to interpret — doctors/judges need explanations for AI decisions
  • Accountability: Who is responsible when an AI system causes harm? (Self-driving car accident)
  • Job Displacement: Automation through ML may cause unemployment
  • Deepfakes & Misinformation: ML can generate realistic fake content, spreading false information
  • Security: Adversarial attacks can fool ML models with small, imperceptible changes
Quick Reference
Important Formulas Sheet
📐 All Formulas at a Glance
Linear Regression — Slope
a₁ = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
Linear Regression — Intercept
a₀ = ȳ − a₁·x̄
Sigmoid Function
σ(z) = 1 / (1 + e^(-z))
Euclidean Distance
d = √[Σ(xᵢ − yᵢ)²]
Accuracy
(TP + TN) / (TP + TN + FP + FN)
Precision
TP / (TP + FP)
Recall (Sensitivity)
TP / (TP + FN)
F1-Score
2 × Precision × Recall / (Precision + Recall)
Entropy
−Σ pᵢ × log₂(pᵢ)
Gini Index
1 − Σ pᵢ²
Information Gain
IG = Entropy(S) − Σ(|Sv|/|S|) × Entropy(Sv)
Gradient Descent Update
w = w − α × ∂L/∂w
Min-Max Scaling
x' = (x − min) / (max − min)
Z-Score Standardization
x' = (x − μ) / σ
Apriori: Support
Support(A) = freq(A) / N
Apriori: Confidence
Conf(A→B) = Support(A∪B) / Support(A)
Neuron Output
y = f(Σwᵢxᵢ + b)
ReLU
f(z) = max(0, z)
Softmax
f(zᵢ) = e^zᵢ / Σe^zⱼ
Q-Learning (RL)
Q(s,a) ← Q(s,a) + α[R + γ·maxQ(s',a') − Q(s,a)]
Viva Prep
Viva-Style Short Questions
Q: What is a tensor?
A generalization of scalars (0D), vectors (1D), matrices (2D) to N-dimensions. TensorFlow's basic data structure.
Q: What is the curse of dimensionality?
As the number of features increases, data becomes sparse and algorithms perform poorly.
Q: What is regularization?
Technique to prevent overfitting by adding a penalty term to the loss function (L1=Lasso, L2=Ridge).
Q: What is the vanishing gradient problem?
Gradients become extremely small during backpropagation in deep networks, making training very slow or stuck. LSTM gates largely mitigate this in recurrent networks.
Q: Difference between Precision and Recall?
Precision = how many predicted positives are actually positive. Recall = how many actual positives are correctly predicted.
Q: Why is K-fold better than simple train-test split?
K-fold uses all data for both training and testing across K experiments, giving a more reliable performance estimate.
Q: What does 'kernel trick' mean in SVM?
Computing dot product in high-dimensional space without explicitly transforming data, making it computationally efficient.
Q: What is Gini Impurity?
Measure of how often a randomly chosen element would be incorrectly labeled. Gini=0 means pure node.
Q: What is one-hot encoding?
Converting categorical variables into binary columns (each category = one column with 0 or 1).
Q: What is dropout in neural networks?
Regularization technique that randomly sets neurons to zero during training to prevent overfitting.
Q: What is epoch in deep learning?
One complete pass through the entire training dataset.
Q: What is transfer learning?
Using a pre-trained model as starting point for a new task. Reduces training time and data requirements.
Q: What is similarity score?
A measure of how similar two data points are. Cosine similarity = cos(θ) = A·B / (|A|×|B|). Range: [-1, 1].
Q: What is hyperparameter tuning?
Process of finding the best values for hyperparameters (like K in KNN, learning rate). Methods: Grid Search, Random Search, Bayesian Optimization.
Emergency Prep
Last Night Revision Notes 🔥
🔥 Must Remember for Tomorrow

Unit 1 — Foundations

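ML Definition = "Learning from data without explicit programming" (commonly attributed to Arthur Samuel); Tom Mitchell's version: performance on task T, measured by P, improves with experience E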
4 Types: Supervised (labeled), Unsupervised (unlabeled), Semi-supervised (partial), Reinforcement (reward)
NumPy = array computation | TensorFlow = deep learning framework by Google

Unit 2 — Supervised Learning

Linear Reg slope: a₁ = (nΣxy−ΣxΣy)/(nΣx²−(Σx)²)
Sigmoid: σ(z) = 1/(1+e^(-z)), z = a₀ + a₁x, decision: z≥0 → class 1
KNN: Euclidean distance → sort → K nearest → majority vote
Confusion Matrix: TP,TN,FP,FN → Accuracy=(TP+TN)/N, Precision=TP/(TP+FP), Recall=TP/(TP+FN)
Cross Validation: K-Fold, LOOCV, Stratified K-Fold
Bias-Variance: Simple=High Bias(underfitting), Complex=High Variance(overfitting)
SVM: Maximize margin, Kernel trick for non-linear data

Unit 3 — Unsupervised Learning

K-Means: Random centroids → Assign → Update → Repeat until convergence
DBSCAN: ε and MinPts → Core/Border/Noise points, no K needed, handles outliers
PCA: Standardize→Covariance→Eigenvalues→Project, keeps max variance directions
Apriori: Support, Confidence, Lift — "Customers who buy X also buy Y"

Unit 4 — Neural Networks & Deep Learning

Neuron: z = Σwᵢxᵢ + b, output = f(z)
Activations: Sigmoid(0,1), Tanh(-1,1), ReLU(max(0,z)), Softmax(multiclass)
Backprop: Forward pass → Loss → Backward (chain rule) → Update weights
GD: w = w − α×∂L/∂w (move against gradient)
CNN: Conv→ReLU→Pool→Flatten→FC→Output (for images)
LSTM: 3 gates (Forget, Input, Output) — solves vanishing gradient problem in RNN
Random Forest = Bagging + Decision Trees

Unit 5 — Preprocessing & Ethics

Missing values: Mean/Median/Mode imputation or KNN imputation
Scaling: Min-Max x'=(x-min)/(max-min), Z-score x'=(x-μ)/σ
Encoding: Label encoding (integers), One-hot (binary columns)
Ethics: Bias, Privacy, Transparency, Accountability, Deepfakes

⚡ Memory Tricks

📌 KNN: "Find K Friends and vote"
📌 SVM: "Draw fattest line between classes"
📌 PCA: "Compress by keeping important directions"
📌 LSTM: "3-gated memory controller: Forget, Input, Output"
📌 Confusion Matrix: "TP and TN are correct friends, FP and FN are mistakes"
Most Probable
Mock Exam Paper (PYQ Pattern Based)

MCA II SEMESTER — MACHINE LEARNING (TMC-211)

MOST PROBABLE END-TERM EXAM PAPER

Time: 3 Hours | Max Marks: 100
Note: Answer any TWO parts from each question. Each part = 10 marks.
Q1. (10×2=20)
a) Define supervised and unsupervised learning. Illustrate each with two examples. Give four applications of Machine Learning.
— OR —
b) What is Reinforcement Learning? Explain its components (Agent, Environment, State, Action, Reward, Policy) with a suitable example.
— OR —
c) Write a short note on: (i) NumPy (ii) TensorFlow (iii) Pandas
Q2. (10×2=20)
a) Use KNN classifier (K=5) to classify the test point (Brightness=20, Saturation=35). Use Euclidean distance: [Data points: 7 rows with Brightness, Saturation, Class (Red/Blue)]
— OR —
b) Explain: (i) Recall (ii) Information Gain (iii) Gini Index (iv) K-Means Clustering (v) F1 Score
— OR —
c) Explain DBSCAN algorithm for density based clustering. List its advantages compared to K-Means.
Q3. (10×2=20)
a) What is feature scaling? Why is it required? Explain Min-Max Normalization and Z-Score Standardization with examples.
— OR —
b) Explain Bayesian Classifier for multiclass classification with a suitable example.
— OR —
c) What are Kernel Functions in SVM? Explain any four kernel functions with examples.
Q4. (10×2=20)
a) What is sigmoidal function? With a₀=−64, a₁=2, find pass/fail probability for a student who studies 33 hours.
— OR —
b) Obtain the Linear Regression equation for: X: 2.0 3.0 4.0 5.0 6.0 / Y: 3.00 4.00 3.40 6.00 5.00. Find Y when X=7.0.
— OR —
c) 1000 patients tested for Diabetes; 900 healthy, 100 sick. Sick: 72 +ve, 28 −ve. Healthy: 28 +ve, 872 −ve. Construct confusion matrix. Calculate Accuracy, Precision, Recall, F1-Score.
Q5. (10×2=20)
a) Explain Backpropagation algorithm. Use Gradient Descent to minimize f(x)=x²−2x with η=0.1, starting from x=0 for 3 iterations.
— OR —
b) Draw clear diagram of CNN. Give brief introduction of all layers. Explain Sigmoid, tanh, ReLU activation functions.
— OR —
c) Give architecture of LSTM. What problem does it solve in RNN? Explain its three gates in detail.