🤖 Exam Notes

AI & ML Using Python

Complete notes & cheatsheet — CAP378 coverage of AI foundations, search, machine learning, neural networks, and hands-on Python practicals.

📄 6 Units + Practicals ⏱ Credits: 3+1 🐍 Python Based

Course Outcomes

What You Will Learn

CO1 — AI Fundamentals

Explain fundamental concepts of Artificial Intelligence
Understand intelligent agents and problem solving
Describe knowledge representation techniques

CO2 — Search & Reasoning

Apply search algorithms to classical AI problems
Use logical reasoning and basic planning approaches
Solve problems using uninformed and informed search

CO3 — ML Analysis

Analyze ML paradigms: regression, classification, clustering
Understand and apply model evaluation techniques
Distinguish between AI, ML, DL, and Data Science

CO4 — Python Development

Develop ML and neural network models in Python
Apply models to real-world application scenarios
Use libraries such as NumPy, Pandas, Scikit-learn

Unit I

Foundations of Artificial Intelligence

What is Artificial Intelligence?

📖 Definition

Artificial Intelligence (AI) is the simulation of human intelligence by machines, especially computer systems. It encompasses the ability of a machine to perceive its environment, reason about it, learn from experience, and take actions to achieve goals.

History and Evolution of AI

Era	Key Event
1950s	Alan Turing proposes the Turing Test; term "Artificial Intelligence" coined by John McCarthy (1956 Dartmouth Conference)
1960s–70s	Early expert systems and symbolic AI; first AI winter due to unrealistic expectations
1980s	Expert systems boom; machine learning research grows; second AI winter
1990s–2000s	Statistical ML approaches dominate; Support Vector Machines, neural networks revived
2010s–Now	Deep learning revolution; GPUs enable large-scale training; NLP breakthroughs (BERT, GPT)

Applications of AI Across Industries

AI is used across virtually every industry today:

Healthcare: Disease diagnosis, drug discovery, medical imaging analysis.
Finance: Fraud detection, algorithmic trading, credit scoring.
Transportation: Self-driving vehicles, route optimization.
NLP: Chatbots, machine translation, sentiment analysis.
Entertainment: Recommendation systems (Netflix, Spotify), game AI.
Manufacturing: Predictive maintenance, quality control via computer vision.

Rational Agents and Their Properties

📖 Definition

An agent is anything that perceives its environment through sensors and acts upon that environment through actuators. A rational agent acts so as to achieve the best expected outcome given the available information.

The task environment of a rational agent is specified by four components:

Performance Measure: The criterion that defines the degree of success — e.g., distance travelled, score earned.
Environment: Everything the agent interacts with (the "world").
Actuators: The agent's output mechanisms (wheels, speakers, API calls).
Sensors: The agent's input mechanisms (camera, keyboard, microphone).

Together, these four form the PEAS description of an agent.

Types of Environments (Task Environment Properties)

Property	Meaning	Example
Fully / Partially Observable	Can the agent see the complete environment state?	Chess (full) vs. Poker (partial)
Deterministic / Stochastic	Is the next state fully determined by current state + action?	Crossword (det.) vs. Driving (stoch.)
Episodic / Sequential	Is the current decision independent of past decisions?	Image classification (ep.) vs. Chess (seq.)
Static / Dynamic	Does the environment change while the agent deliberates?	Crossword (static) vs. Driving (dynamic)
Discrete / Continuous	Do states, percepts, and actions take distinct, countable values or vary over a continuous range?	Chess (discrete) vs. Robot arm (continuous)

Types of Agents

Simple Reflex Agent

Acts based only on the current percept
Uses condition–action rules: if (situation) then (action)
No memory of past states
Works reliably only when the environment is fully observable
Example: thermostat (if temp < 20°C → turn on heater)
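
The thermostat rule can be sketched as a function from the current percept to an action. The function name, the action labels, and the default threshold parameter are illustrative assumptions, not part of the course material.

```python
def thermostat_agent(temperature_c: float, threshold_c: float = 20.0) -> str:
    """Simple reflex agent: maps the current percept directly to an action."""
    # Condition-action rule: if temp < threshold then turn on the heater
    if temperature_c < threshold_c:
        return "heater_on"
    return "heater_off"


print(thermostat_agent(18.5))
print(thermostat_agent(22.0))
```

The agent keeps no state between calls, which is exactly what "no memory of past states" means here.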

Model-Based Reflex Agent

Maintains an internal model of the world
Tracks the current state even in partially observable env.
Combines current percept with stored state
More powerful than simple reflex agents
Example: a robot that maps its surroundings

Goal-Based Agent

Has explicit goal states to achieve
Uses search and planning to find action sequences
More flexible — goals can change
Less efficient (needs planning for each goal)
Example: GPS navigation agent

Utility-Based Agent

Uses a utility function to measure desirability
Makes decisions that maximize expected utility
Handles conflicting goals and trade-offs
Most general and powerful of the four types listed here
Example: stock trading agent, route planner with traffic

State Space Representation & Problem Solving

📖 Definition

State space representation models a problem as a graph where each node is a possible state of the world and each edge is an action that transitions between states. Problem solving is the process of finding a path from the initial state to a goal state.

Components of a Problem:

Initial State: The starting configuration of the problem. E.g., for 8-puzzle: a specific tile arrangement.
Actions / Operators: The set of moves available to the agent from any given state.
Transition Model: Describes what each action does (next state = result(state, action)).
Goal State / Goal Test: Condition(s) that determine if a state is a solution.
Path Cost: A numeric cost assigned to each path through the state space; the agent seeks minimum-cost paths.

🔍 Example — 8-Puzzle

Initial State: A specific scrambled arrangement of tiles 1–8 with one blank.
Actions: Move blank Up, Down, Left, Right.
Goal State: Tiles arranged in order 1–8 with blank at bottom-right.
Path Cost: Number of moves made.
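
A minimal sketch of this formulation, assuming a state is stored as a tuple of nine numbers read row by row with 0 standing for the blank. The names MOVES, GOAL, successors, and is_goal are hypothetical helpers chosen for illustration.

```python
# Moves of the blank expressed as (row change, column change)
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # blank (0) at bottom-right


def successors(state):
    """Transition model: return {action: next_state} for every legal blank move."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    result = {}
    for action, (dr, dc) in MOVES.items():
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:          # stay on the 3x3 board
            target = r * 3 + c
            tiles = list(state)
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            result[action] = tuple(tiles)
    return result


def is_goal(state):
    """Goal test."""
    return state == GOAL


start = (1, 2, 3, 4, 5, 6, 7, 0, 8)
for action, next_state in successors(start).items():
    print(action, next_state, is_goal(next_state))
```

Each move costs 1, so the path cost of a solution is simply the number of actions applied.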

Game Playing: Minimax & Alpha-Beta Pruning

Game playing is a classic area of AI where the agent competes against an adversary. The key algorithm is:

📖 Minimax Algorithm

The Minimax algorithm is used for two-player zero-sum games (e.g., chess, tic-tac-toe). The MAX player tries to maximize the score; the MIN player tries to minimize it. The algorithm performs a complete depth-first search of the game tree to find the optimal move.

How it works:

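Generate the game tree down to terminal states or, when the tree is too large, down to a cutoff depth.
Assign utility values to terminal states; at a cutoff, score the non-terminal states with a heuristic evaluation function instead.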
Propagate values upward: MAX nodes take the maximum of children's values; MIN nodes take the minimum.
The root node's value gives MAX's best guaranteed outcome; the move that achieves it is selected.

⚠️ Alpha-Beta Pruning

Alpha-Beta Pruning is an optimization of Minimax. It prunes (skips) branches that cannot possibly affect the final decision, significantly reducing computation.

α = best value MAX can guarantee so far.
β = best value MIN can guarantee so far.
A branch is pruned as soon as α ≥ β. With ideal move ordering, Alpha-Beta examines about O(b^(m/2)) nodes instead of O(b^m), so it can search roughly twice as deep in the same time.

📋 Unit I — Quick Cheatsheet

AI = simulation of human intelligence in machines; term coined by McCarthy (1956)
PEAS = Performance measure, Environment, Actuators, Sensors
Simple Reflex = if (percept) → action; no memory
Model-Based = maintains internal world state
Goal-Based = plans to achieve specific goals
Utility-Based = maximizes a utility function; most general
Problem components = Initial state, Actions, Transition model, Goal test, Path cost
Minimax = MAX maximizes, MIN minimizes; guarantees best outcome in zero-sum games
Alpha-Beta = prunes branches when α ≥ β; in the best case reduces O(b^m) to O(b^(m/2))

Unit II

Search, Logic and Reasoning

Uninformed Search Strategies

Uninformed (blind) search algorithms have no additional information about states beyond the problem definition — they don't know how far they are from the goal.

Algorithm	Strategy	Complete?	Optimal?	Time / Space
BFS (Breadth-First Search)	Expands shallowest node first; uses a queue (FIFO)	✅ Yes	✅ Yes (if all step costs are equal)	O(b^d) / O(b^d)
DFS (Depth-First Search)	Expands deepest node first; uses a stack (LIFO)	❌ No (infinite spaces)	❌ No	O(b^m) / O(bm)
UCS (Uniform Cost Search)	Expands lowest path-cost node first; uses a priority queue	✅ Yes (if every step cost ≥ ε > 0)	✅ Yes	O(b^(1+⌊C*/ε⌋)) / O(b^(1+⌊C*/ε⌋))

Where: b = branching factor, d = depth of shallowest solution, m = max depth of tree, C* = cost of the optimal solution, ε = smallest step cost.
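
As a rough illustration of the FIFO strategy, here is a breadth-first search over a small graph stored as an adjacency dictionary. The graph and the function name bfs_path are made up for the example.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([start])            # FIFO queue
    parent = {start: None}               # doubles as the visited set
    while frontier:
        node = frontier.popleft()        # shallowest unexpanded node
        if node == goal:
            path = []
            while node is not None:      # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                frontier.append(neighbour)
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
print(bfs_path(graph, "A", "F"))
```

Swapping popleft() for pop() turns the queue into a stack, which gives a depth-first expansion order instead.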

Informed Search Strategies & Heuristics

Informed search uses a heuristic function h(n) that estimates the cost from node n to the goal. A good heuristic dramatically reduces search time.

Greedy Best-First Search

Expands the node that appears closest to the goal according to h(n). It uses only h(n) and ignores the path cost g(n), which makes it fast but not optimal; the tree-search version is also incomplete because it can follow a misleading path forever.

A* Search

📖 A* Algorithm

A* evaluates nodes using f(n) = g(n) + h(n) where:
g(n) = actual cost from start to node n
h(n) = estimated (heuristic) cost from n to goal
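A* is complete and optimal if h(n) is admissible (never overestimates). When repeated states are skipped (graph search), h(n) should also be consistent to keep that guarantee.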
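
A compact sketch of A* on a tiny weighted graph, assuming the graph is a dictionary of (neighbour, step cost) lists and the heuristic is a lookup table. The graph, the h values, and the function name a_star are invented for illustration; the h values were chosen to be admissible and consistent for this graph.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path, or None if the goal is unreachable."""
    frontier = [(h[start], 0, start, [start])]     # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # lowest f = g + h first
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue                                # stale queue entry
        for neighbour, step in graph[node]:
            new_g = g + step
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    frontier,
                    (new_g + h[neighbour], new_g, neighbour, path + [neighbour]),
                )
    return None


graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
h = {"S": 4, "A": 3, "B": 2, "G": 0}
print(a_star(graph, h, "S", "G"))
```

Setting every h value to 0 makes f(n) = g(n), which reduces this to Uniform Cost Search.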

Heuristic Design Principles

Admissibility: h(n) ≤ actual cost to goal. Never overestimates — guarantees optimality.
Consistency (Monotonicity): h(n) ≤ cost(n, n') + h(n') for all successors n'. Ensures nodes are expanded in order of their f values.
Informedness: A more accurate h(n) leads to fewer nodes expanded. If h₁(n) ≥ h₂(n) for all n, h₁ dominates h₂ (fewer expansions).

🔍 Heuristic Example — 8-Puzzle

h₁ = number of misplaced tiles (admissible, less informed)
h₂ = Manhattan distance (sum of |row_curr − row_goal| + |col_curr − col_goal|) — admissible and dominates h₁; typically expands fewer nodes.
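
Both heuristics are easy to compute directly. The sketch assumes a state is a tuple of nine numbers read row by row, with 0 for the blank and the goal having the blank at bottom-right; the function names are illustrative.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def misplaced_tiles(state, goal=GOAL):
    """h1: number of tiles (blank excluded) that are not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def manhattan_distance(state, goal=GOAL):
    """h2: sum over tiles of |row - goal_row| + |col - goal_col|."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue                     # the blank is not counted
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total


state = (8, 1, 3, 4, 0, 2, 7, 6, 5)
print(misplaced_tiles(state), manhattan_distance(state))
```

For this state h₁ = 5 and h₂ = 10. Every misplaced tile is at least one move from home, so h₂ is never smaller than h₁, which is what dominance means.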

Propositional Logic & First-Order Logic

Propositional Logic

Propositional logic deals with propositions (statements that are true or false) connected by logical connectives:

Connective	Symbol	Meaning
NOT	`¬`	Negation — flips truth value
AND	`∧`	Conjunction — true only if both are true
OR	`∨`	Disjunction — true if at least one is true
IMPLIES	`⇒`	Implication — false only when P is true and Q is false
BICONDITIONAL	`⟺`	Equivalence — true when both sides match
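
The connectives can be checked by printing a truth table. Python has not, and, or built in; implies and biconditional below are small helper functions written for this example.

```python
from itertools import product


def implies(p, q):
    """P => Q is false only when P is true and Q is false."""
    return (not p) or q


def biconditional(p, q):
    """P <=> Q is true when both sides have the same truth value."""
    return p == q


print("P      Q      P∧Q    P∨Q    P⇒Q    P⟺Q")
for p, q in product([True, False], repeat=2):
    row = [p, q, p and q, p or q, implies(p, q), biconditional(p, q)]
    print("  ".join(f"{str(value):<5}" for value in row))
```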

Predicate Logic / First-Order Logic (FOL)

FOL extends propositional logic to express relationships between objects. It introduces:

Constants: Specific objects — e.g., Ansh, Delhi.
Variables: Stand-ins for objects — e.g., x, y.
Predicates: Relations between objects — e.g., LikesAI(Ansh), Greater(5, 3).
Functions: Map objects to objects — e.g., FatherOf(Ansh).
Quantifiers: ∀ (for all) and ∃ (there exists).

🔍 FOL Example

"All students who study AI pass the exam."
∀x: Student(x) ∧ StudiesAI(x) ⇒ Passes(x)

"There exists a student who studies both AI and ML."
∃x: Student(x) ∧ StudiesAI(x) ∧ StudiesML(x)

Semantic Networks for Knowledge Representation

📖 Definition

A semantic network is a graph-based knowledge representation where: nodes represent concepts or objects, and edges represent relationships between them (e.g., "is-a", "has-a", "part-of").

Key relationships used in semantic networks:

is-a: Represents inheritance. Dog is-a Animal means Dog inherits all properties of Animal.
instance-of: A specific member of a class. Buddy instance-of Dog.
has-a (part-of): Compositional relationship. Car has-a Engine.
Custom relationships: Ansh lives-in Delhi, Python used-for ML.

💡 Advantage of Semantic Networks

Semantic networks support inheritance naturally. If "Animal has-a Heart" and "Dog is-a Animal," then a dog automatically has a heart — knowledge propagates through the network without redundant facts.
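
Inheritance through is-a and instance-of links can be sketched with a dictionary. The structure below (a single parent link per node plus a set of has-a facts) and the extra "Tail" fact are simplifying assumptions for illustration.

```python
# Toy network: each node stores its parent (is-a / instance-of) and local has-a facts
network = {
    "Animal": {"parent": None, "has": {"Heart"}},
    "Dog": {"parent": "Animal", "has": {"Tail"}},      # Dog is-a Animal
    "Buddy": {"parent": "Dog", "has": set()},          # Buddy instance-of Dog
}


def inherited_parts(node):
    """Collect has-a facts from the node and from every ancestor."""
    parts = set()
    while node is not None:
        parts |= network[node]["has"]
        node = network[node]["parent"]
    return parts


print(sorted(inherited_parts("Buddy")))
```

"Buddy has-a Heart" is never stored explicitly; it is found by following the links upward.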

📋 Unit II — Quick Cheatsheet

Uninformed search = BFS (optimal, high memory), DFS (low memory, not optimal), UCS (optimal by cost)
A* = f(n) = g(n) + h(n); optimal if h is admissible
Admissible heuristic = never overestimates true cost
Propositional logic = statements connected by ¬, ∧, ∨, ⇒, ⟺
FOL = adds objects, predicates, quantifiers (∀, ∃)
Semantic network = nodes (concepts) + edges (is-a, has-a, instance-of)
Greedy Best-First uses only h(n); A* uses g(n) + h(n)

Unit III

Machine Learning

Basics of Machine Learning

📖 Definition

Machine Learning (ML) is a subset of AI where systems learn from data to improve their performance on tasks without being explicitly programmed. Instead of writing rules, you give the machine data and let it figure out the rules.

AI vs ML vs DL vs Data Science

Field	Focus	Subset of
Artificial Intelligence	Any technique that enables machines to mimic human intelligence	—
Machine Learning	Algorithms that learn patterns from data automatically	AI
Deep Learning	Multi-layered neural networks; learns representations automatically	ML
Data Science	Extracting insights from data using statistics, programming, and domain knowledge	Overlaps with ML/AI

Types of Learning

Supervised Learning

Training data has labels (input-output pairs)
Model learns mapping: input → output
Tasks: Classification, Regression
Examples: spam detection, house price prediction

Unsupervised Learning

Training data has no labels
Model finds hidden structure in data
Tasks: Clustering, Dimensionality Reduction
Examples: customer segmentation, anomaly detection

Reinforcement Learning

Agent learns by interacting with environment
Receives rewards (+) or penalties (−)
Goal: maximize cumulative reward
Examples: game-playing AI (AlphaGo), robotics

ML Workflow

A machine learning project follows a systematic pipeline:

Data Collection: Gather raw data from databases, APIs, web scraping, sensors, etc.
Data Cleaning: Handle missing values (imputation or removal), remove duplicates, fix inconsistencies, handle outliers.
Feature Engineering: Select relevant features, create new ones (e.g., age from date of birth), encode categorical variables, normalize/scale numerical features.
Model Selection: Choose appropriate algorithm based on problem type, data size, and interpretability requirements.
Training: Fit the model to training data — the algorithm finds optimal parameters.
Evaluation: Test the model on unseen data using appropriate metrics.
Deployment: Integrate the model into the production system.

Train-Test Split and Cross-Validation

Train-Test Split: Divide the dataset into two parts — typically 70–80% for training and 20–30% for testing. The test set is never seen during training, providing an unbiased estimate of performance.

K-Fold Cross-Validation: The dataset is split into k equal folds. The model is trained k times, each time using k−1 folds for training and one fold for validation. The final performance is the average across all k runs. Common choice: k = 5 or 10.

💡 Why Cross-Validation?

A single train-test split can be "lucky" or "unlucky" depending on which data ended up where. Cross-validation uses all the data for both training and validation (at different times), giving a more robust and reliable estimate of model performance.
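
To make the fold bookkeeping concrete, here is a small pure-Python sketch that produces the index sets for each fold. The function name k_fold_indices is hypothetical, and the data is not shuffled here; in practice it usually is, and scikit-learn offers train_test_split and KFold for the same job.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 5):
    print("train:", train, "validate:", validation)
```

Every sample appears in exactly one validation fold and in k−1 training sets.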

Introduction to Regression

Regression is a supervised learning task where the output is a continuous numerical value.

Linear Regression

Models the relationship between input features and output as a straight line:

Linear Regression ŷ = w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ

The model learns the weights w by minimizing the Mean Squared Error (MSE) using Gradient Descent or the Normal Equation.
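
For a single feature, the least-squares weights have a closed form (the one-variable case of the Normal Equation). The sketch below uses it on made-up data; the function names and the data are assumptions for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = w0 + w1 * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w1 = covariance / variance           # slope
    w0 = mean_y - w1 * mean_x            # intercept
    return w0, w1


def mean_squared_error(xs, ys, w0, w1):
    """Average squared difference between targets and predictions."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]                    # generated from y = 1 + 2x
w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1, mean_squared_error(xs, ys, w0, w1))
```

With several features the same idea is usually written in matrix form, or the weights are found iteratively with Gradient Descent.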

Polynomial Regression

When the relationship is non-linear, polynomial regression fits a curve by introducing polynomial features (x², x³, etc.). It is still a linear model in the weight space but captures non-linear patterns. Risk: overfitting with high-degree polynomials.

Evaluation Metrics

Classification Metrics

Metric	Formula	When to Use
Accuracy	(TP + TN) / (TP + TN + FP + FN)	Balanced classes; overall correctness
Precision	TP / (TP + FP)	When false positives are costly (spam filter)
Recall (Sensitivity)	TP / (TP + FN)	When false negatives are costly (disease diagnosis)
F1-Score	2 × (Precision × Recall) / (Precision + Recall)	Imbalanced classes; need balance of P and R
ROC-AUC	Area under the ROC curve	Ranking models; threshold-independent evaluation

⚠️ Confusion Matrix

For binary classification, a confusion matrix is a 2×2 table showing TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives). The accuracy, precision, recall, and F1 formulas above all derive from these four values. High accuracy on imbalanced data can be misleading, so always check precision and recall too.
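
The formulas in the table can be applied by hand to a small set of labels. The label lists and function names below are made up for the example, with 1 treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def classification_report(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from the four counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))
print(classification_report(y_true, y_pred))
```

Here TP = 3, TN = 5, FP = 1, FN = 1, giving accuracy 0.8 and precision, recall, and F1 all equal to 0.75.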

📋 Unit III — Quick Cheatsheet

ML = systems that learn from data; subset of AI
Supervised = labelled data; Classification + Regression
Unsupervised = no labels; Clustering + Dimensionality Reduction
Reinforcement = reward-based learning; agent-environment interaction
ML Workflow = Collect → Clean → Feature Eng. → Model → Train → Evaluate → Deploy
Linear Regression = ŷ = w₀ + w₁x₁ + ...; minimizes MSE
Cross-validation = k-fold; robust performance estimate
Precision = TP/(TP+FP); Recall = TP/(TP+FN)
F1 = harmonic mean of Precision & Recall; good for imbalanced data

Unit IV

Machine Learning Algorithms

Key Classification Algorithms

Logistic Regression

Despite the name, logistic regression is a classification algorithm. It uses the sigmoid function to output a probability between 0 and 1, then applies a threshold (usually 0.5) to classify.

Sigmoid Function σ(z) = 1 / (1 + e^−z), where z = w₀ + w₁x₁ + ... + wₙxₙ
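
A short sketch of how the sigmoid turns a weighted sum into a class label. The weights and bias are hypothetical values picked by hand, not learned from data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, class label) for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


weights, bias = [0.8, -0.4], -1.0        # hand-picked for illustration
print(predict([3.0, 1.0], weights, bias))   # z is about 1.0, so class 1
print(predict([0.5, 2.0], weights, bias))   # z is about -1.4, so class 0
```

Lowering the threshold below 0.5 trades precision for recall, which can be useful when false negatives are costly.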

Decision Trees

A decision tree splits data into subsets based on feature values, creating a tree of decisions. Each internal node tests a feature, each branch represents an outcome, and each leaf node holds a class label.

✅ Advantages

Easy to understand and visualize
Handles both numerical and categorical data
No need for feature scaling
Interpretable — can explain decisions

❌ Disadvantages

Prone to overfitting (deep trees)
Unstable — small data changes → different tree
Biased towards features with more levels
Not optimal for large datasets

Random Forests

A Random Forest builds many decision trees during training and outputs the mode (classification) or mean (regression) of all trees. The key ideas:

Bagging (Bootstrap Aggregation): Each tree is trained on a random sample (with replacement) of the data.
Feature Randomness: Each split considers only a random subset of features, reducing correlation between trees.
Result: More accurate and robust than individual decision trees; reduces overfitting significantly.

Support Vector Machines (SVM)

📖 SVM Concept

SVM finds the hyperplane that separates the classes with the largest margin in the feature space. The margin is the distance between the hyperplane and the nearest training points of each class; those nearest points are called support vectors. For data that is not linearly separable, the kernel trick implicitly maps the data into a higher-dimensional space where a linear separator may exist.

Naïve Bayes

Based on Bayes' Theorem, Naïve Bayes assumes that features are conditionally independent given the class label (the "naïve" assumption). Despite this simplification, it works surprisingly well for:

Text classification (spam detection, sentiment analysis)
Real-time prediction (low computation cost)
High-dimensional data

Bayes' Theorem P(Class | Features) ∝ P(Features | Class) × P(Class)

K-Nearest Neighbors (KNN)

KNN is a lazy learner — it stores all training data without building a model. To classify a new point:

Calculate the distance from the new point to all training points (usually Euclidean distance).
Find the K nearest neighbors.
Assign the class that is most common among the K neighbors (majority vote).
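
The three steps above can be sketched in a few lines of standard-library Python. The training points, labels, and function name knn_predict are made up for the example, and math.dist (Python 3.8+) supplies the Euclidean distance.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Step 1: distance from the query to every training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Step 2: labels of the k nearest neighbours
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: most common label wins
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

All the work happens at prediction time, which is why KNN is called a lazy learner.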

⚠️ Choosing K

Small K → complex boundary, overfitting. Large K → smoother boundary, underfitting. Use cross-validation to find the optimal K. A common heuristic: K = √n where n is the number of training samples.

Clustering Algorithms

K-Means Clustering

K-Means partitions data into K clusters by iteratively assigning points to the nearest centroid and updating centroids.

Algorithm:

Choose K (number of clusters) and initialize K centroids randomly.
Assignment Step: Assign each data point to the nearest centroid.
Update Step: Recalculate each centroid as the mean of all points assigned to it.
Repeat steps 2–3 until centroids no longer change (convergence).
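
A minimal pure-Python sketch of the loop above. It assumes the initial centroids are passed in rather than chosen randomly, so the run is repeatable; the data and function name are illustrative.

```python
import math


def k_means(points, centroids, max_iterations=100):
    """Run K-Means from the given initial centroids; return (centroids, assignments)."""
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, assignments


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
initial = [(1.0, 1.0), (9.0, 8.0)]       # fixed start instead of random initialization
centroids, assignments = k_means(points, initial)
print(centroids, assignments)
```

With random initialization the result can differ between runs, which is why K-Means is often restarted several times and the lowest-WCSS run kept.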

💡 Choosing K — Elbow Method

Plot the Within-Cluster Sum of Squares (WCSS) against K. The point where the curve "elbows" (rate of decrease slows significantly) is the optimal K.

Hierarchical Clustering

Hierarchical clustering builds a tree of clusters (dendrogram) without pre-specifying K:

Agglomerative (bottom-up): Start with each point as its own cluster; merge the two closest clusters repeatedly until one remains.
Divisive (top-down): Start with one cluster containing all points; split recursively.

Dimensionality Reduction — PCA

📖 Principal Component Analysis (PCA)

PCA is an unsupervised technique that transforms data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they explain. By keeping only the top k components, you reduce dimensionality while retaining most information.

Steps:

Standardize the data (zero mean, unit variance).
Compute the covariance matrix.
Compute eigenvectors and eigenvalues of the covariance matrix.
Sort eigenvectors by eigenvalue (descending) — these are the principal components.
Project data onto the top k eigenvectors.

Explained Variance: The eigenvalue of a component divided by the total sum of eigenvalues gives the fraction of variance explained. A scree plot shows variance explained by each component.
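
For two features, the eigenvalues of the covariance matrix have a closed form, so the explained-variance calculation can be shown without a linear algebra library. This sketch centres the data but, as a simplification, skips scaling to unit variance; the function name and data are assumptions.

```python
import math


def explained_variance_2d(points):
    """Fraction of variance explained by each of the two principal components."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, d]]
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    d = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, largest first
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = [centre + radius, centre - radius]
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


points = [(1, 2), (2, 4), (3, 6), (4, 8)]    # every point lies on the line y = 2x
print(explained_variance_2d(points))
```

Because the points lie exactly on a line, the first component explains essentially all of the variance and the second explains none, so one dimension is enough to describe this data.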

📋 Unit IV — Quick Cheatsheet

Logistic Regression = classification using sigmoid; outputs probability 0–1
Decision Tree = splits features; interpretable; prone to overfitting
Random Forest = ensemble of trees + bagging; reduces overfitting
SVM = max-margin hyperplane; kernel trick for non-linear data
Naïve Bayes = Bayes' Theorem + feature independence assumption; fast
KNN = lazy learner; majority vote of K nearest neighbors
K-Means = assign → update centroids; repeat until convergence
Hierarchical = builds dendrogram; agglomerative (bottom-up) or divisive (top-down)
PCA = transforms to principal components; maximizes variance; reduces dimensions

Unit V

Artificial Neural Networks & Deep Learning

Biological vs. Artificial Neurons

🧠 Biological Neuron

Dendrites: Receive signals from other neurons
Cell Body (Soma): Integrates incoming signals
Axon: Transmits output signal
Synapse: Junction between neurons; strengthens with learning
Fires (action potential) if combined signal exceeds threshold

🤖 Artificial Neuron

Inputs (x₁, x₂, …): Features or outputs of previous layer
Weights (w₁, w₂, …): Learnable; analogous to synaptic strengths
Bias (b): Shifts the activation threshold
Activation Function: Determines output from weighted sum
Output = f(w₁x₁ + w₂x₂ + … + b)

Perceptron, MLP, and Activation Functions

Perceptron

The Perceptron (Rosenblatt, 1958) is the simplest neural network — a single artificial neuron that learns a binary classifier. It uses a step activation function and can only classify linearly separable data.
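
A small sketch of the perceptron learning rule on the logical AND function, which is linearly separable. The learning rate of 1, zero initial weights, and 20 epochs are assumptions chosen to keep the arithmetic simple.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: nudge the weights by the error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            prediction = step(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = target - prediction          # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]                            # logical AND
weights, bias = train_perceptron(samples, labels)
predictions = [step(bias + sum(w * xi for w, xi in zip(weights, x))) for x in samples]
print(weights, bias, predictions)
```

Replacing the labels with XOR ([0, 1, 1, 0]) gives a problem no single perceptron can solve, because no straight line separates the two classes.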

Multilayer Perceptron (MLP)

An MLP consists of:

Input Layer: Receives raw features; one node per feature.
Hidden Layer(s): Intermediate layers that learn representations; the "depth" of the network.
Output Layer: Produces final prediction; nodes depend on task (1 for regression, C for C-class classification).

Common Activation Functions

Function	Formula	Range	Use Case
Sigmoid	σ(x) = 1 / (1 + e^−x)	(0, 1)	Output layer for binary classification
Tanh	tanh(x) = (e^x − e^−x) / (e^x + e^−x)	(−1, 1)	Hidden layers; zero-centered (better than sigmoid)
ReLU	f(x) = max(0, x)	[0, ∞)	Hidden layers in deep networks; most popular
Softmax	σ(z)ᵢ = e^zᵢ / Σ e^zⱼ	(0, 1)	Multi-class classification output
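
The formulas in the table translate directly into code. In this sketch the softmax subtracts the largest score before exponentiating, a common numerical-stability step that does not change the result; tanh comes from the standard library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(scores)                          # keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 3), round(math.tanh(x), 3), relu(x))
print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])
```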

⚠️ Vanishing Gradient Problem

Sigmoid and Tanh saturate for inputs of large magnitude (sigmoid outputs approach 0 or 1, tanh outputs approach −1 or 1), so their gradients become nearly zero during backpropagation. This makes training deep networks very slow. ReLU mitigates this because its gradient is a constant 1 for positive inputs, although it is 0 for negative inputs.

Training: Backpropagation & Gradient Descent

📖 Backpropagation

Backpropagation is the algorithm used to train neural networks. It computes the gradient of the loss function with respect to each weight by applying the chain rule of calculus backwards through the network — from output layer to input layer.

Training Process:

Forward Pass: Input flows through the network layer by layer; compute output and loss.
Backward Pass: Compute gradients of loss w.r.t. each weight using backpropagation.
Weight Update: Adjust weights using Gradient Descent: w = w − η × ∂L/∂w
Repeat for many epochs (full passes over the training data) until the loss converges.

Gradient Descent Update Rule w ← w − η · ∇L(w)

Where η (eta) is the learning rate — controls the step size. Too large → overshooting; too small → slow convergence.
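
The effect of the learning rate is easy to see on a one-dimensional example. The loss L(w) = (w − 3)² and the two learning rates below are assumptions chosen for illustration.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly apply w <- w - eta * gradient(w); return every visited value."""
    w = start
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
        history.append(w)
    return history


def loss_gradient(w):
    """Gradient of L(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2 * (w - 3)


good = gradient_descent(loss_gradient, start=0.0, learning_rate=0.1, steps=50)
too_large = gradient_descent(loss_gradient, start=0.0, learning_rate=1.1, steps=50)
print(good[-1])        # settles close to 3
print(too_large[-1])   # overshoots further on every step and diverges
```

With η = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; with η = 1.1 it grows by a factor of 1.2 per step.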

Training, Validation, and Testing

Split	Purpose	Typical Size
Training Set	Model learns weights on this data	60–70%
Validation Set	Tune hyperparameters; monitor overfitting during training	10–20%
Test Set	Final unbiased evaluation; never seen during training or tuning	10–20%

Deep Learning Overview: CNNs and RNNs

🖼️ Convolutional Neural Networks (CNNs)

Designed for image data
Convolutional layers learn local features (edges, shapes)
Pooling layers reduce spatial dimensions
Fully connected layers for final classification
Key insight: weight sharing — same filter applied across all positions
Applications: image classification, object detection, face recognition

🔄 Recurrent Neural Networks (RNNs)

Designed for sequential data (text, time series, speech)
Have hidden state that carries information across time steps
Process input one element at a time while remembering context
Problem: vanishing gradient with long sequences
LSTM (Long Short-Term Memory) solves this with gates
Applications: text generation, machine translation, sentiment analysis

📋 Unit V — Quick Cheatsheet

Artificial Neuron = weighted sum of inputs + bias → activation function
Perceptron = single neuron; linearly separable data only
MLP = input + hidden layers + output; non-linear decision boundaries
ReLU = most popular hidden layer activation; mitigates vanishing gradient
Sigmoid → binary output; Softmax → multi-class output
Backprop = chain rule to compute gradients from output layer back to input; gradient descent then updates the weights
Learning rate (η) = controls gradient descent step size
CNN = convolution + pooling; designed for images; weight sharing
RNN = sequential data; hidden state; LSTM for long sequences

Unit VI

Advanced AI Concepts

Planning in Artificial Intelligence

📖 Definition

AI Planning is automated reasoning about the sequences of actions that an agent must take to reach a desired goal from an initial state. It builds on search but uses structured descriptions of states and actions, so plans can be abstract, hierarchical, and conditional.

Key concepts in AI planning:

STRIPS Representation: States described by predicates; actions have preconditions and effects (add/delete lists).
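Forward Chaining (progression search): Start from the initial state and apply applicable actions until the goal is reached.
Backward Chaining (regression search): Start from the goal and work backward through actions whose effects achieve it, collecting the preconditions that must hold.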
Partial-Order Planning: Build a plan where not all steps are totally ordered, allowing parallelism.
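
A STRIPS-style action can be sketched with Python sets: a state is a set of true predicates, and an action carries precondition, add, and delete sets. The blocks-world predicates and the action below are a hypothetical example, not taken from the course material.

```python
# Hypothetical action: move block A from on top of B down to the table
move_a_to_table = {
    "preconditions": {"On(A, B)", "Clear(A)"},
    "add": {"OnTable(A)", "Clear(B)"},
    "delete": {"On(A, B)"},
}


def applicable(state, action):
    """An action applies when all of its preconditions hold in the state."""
    return action["preconditions"] <= state


def apply_action(state, action):
    """Successor state: remove the delete list, then add the add list."""
    if not applicable(state, action):
        raise ValueError("preconditions not satisfied")
    return (state - action["delete"]) | action["add"]


state = {"On(A, B)", "Clear(A)", "OnTable(B)"}
print(sorted(apply_action(state, move_a_to_table)))
```

A forward planner repeatedly applies applicable actions like this one until the goal predicates are all contained in the state.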

Fuzzy Logic

📖 Definition

Fuzzy Logic extends classical binary logic (true/false → 1/0) to allow degrees of truth between 0 and 1. This models the imprecise and vague way humans think and communicate. Introduced by Lotfi Zadeh (1965).

Fuzzy Sets and Membership Functions

A fuzzy set is a set where each element has a membership degree μ ∈ [0, 1] indicating how much it belongs to the set. The membership function defines this mapping.

🔍 Example

For the fuzzy set "tall": Person with height 5'0" → membership 0.1 (barely tall)
Person with height 5'9" → membership 0.6 (somewhat tall)
Person with height 6'2" → membership 0.95 (very tall)

Fuzzy Rules and Inference

Fuzzy systems use IF-THEN rules that operate on fuzzy sets:

IF temperature is HIGH AND humidity is HIGH THEN fan_speed is VERY_FAST
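
One common way to evaluate such a rule is to take the minimum of the two membership degrees as the fuzzy AND. The ramp-shaped membership functions and their breakpoints (25–35 °C, 50–90 % humidity) are invented for illustration.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at low to 1 at high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def temperature_is_high(celsius):
    return ramp_up(celsius, 25.0, 35.0)      # illustrative breakpoints


def humidity_is_high(percent):
    return ramp_up(percent, 50.0, 90.0)      # illustrative breakpoints


def rule_strength(celsius, percent):
    """Firing strength of the rule, using min for fuzzy AND."""
    return min(temperature_is_high(celsius), humidity_is_high(percent))


print(rule_strength(32.0, 80.0))
```

At 32 °C and 80 % humidity the memberships are 0.7 and 0.75, so the rule fires with strength 0.7; that strength then scales the VERY_FAST output set.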

Defuzzification

Defuzzification converts the fuzzy output back to a crisp (numerical) value for action. Common methods:

Centroid Method: Center of gravity of the output fuzzy set — most common.
Maximum Method: Take the point with the highest membership value.
Weighted Average: Average of maximum points weighted by their membership.
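
The centroid and maximum methods can be compared on a sampled output set. The fan-speed values and membership degrees below are made-up numbers, and the discrete sum is an approximation of the continuous centre of gravity.

```python
def centroid(xs, memberships):
    """Discrete centre of gravity: sum(x * mu(x)) / sum(mu(x))."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("output fuzzy set is empty")
    return sum(x * mu for x, mu in zip(xs, memberships)) / total


def maximum_method(xs, memberships):
    """The x with the highest membership (the first one if there is a tie)."""
    return xs[memberships.index(max(memberships))]


fan_speed = [0, 25, 50, 75, 100]                 # percent of full speed
membership = [0.0, 0.2, 0.5, 0.8, 0.5]           # aggregated output fuzzy set
print(centroid(fan_speed, membership))
print(maximum_method(fan_speed, membership))
```

The centroid (about 70) takes the whole shape of the output set into account, while the maximum method (75) looks only at its peak.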

Expert Systems

📖 Definition

An Expert System is an AI program that emulates the decision-making ability of a human expert in a specific domain. It uses a knowledge base and applies inference to answer queries. Classic examples: MYCIN (medical diagnosis), DENDRAL (chemical analysis).

Expert System Architecture

Component	Role
Knowledge Base	Stores domain-specific facts and rules (IF-THEN rules; typically 100s–1000s of rules)
Inference Engine	Applies logical rules to the knowledge base to derive conclusions; uses forward or backward chaining
User Interface	Allows users to input queries and receive explanations/answers
Explanation Facility	Explains the reasoning ("Why did you ask that?" / "How did you reach this conclusion?")
Knowledge Acquisition Module	Helps knowledge engineers capture new knowledge from domain experts and add it to the knowledge base
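
The interplay of knowledge base and inference engine can be sketched with forward chaining over IF-THEN rules. The rules and facts below form a toy knowledge base invented for illustration; they are not taken from MYCIN or any real system.

```python
def forward_chain(facts, rules):
    """Inference engine: fire IF-THEN rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)        # rule fires, new fact derived
                changed = True
    return known


# Knowledge base: each rule is (conditions, conclusion)
rules = [
    (("fever", "cough"), "flu_suspected"),
    (("flu_suspected", "short_of_breath"), "refer_to_doctor"),
]

print(sorted(forward_chain({"fever", "cough", "short_of_breath"}, rules)))
```

Backward chaining would instead start from a goal such as refer_to_doctor and check whether the conditions of the rules that conclude it can be established.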

Basics of Robotics

Robotics is the interdisciplinary field of designing, building, and operating robots — physical agents that interact with the physical world.

Sensors (Perception): Gather information from the environment: cameras, lidar, sonar, GPS, IMU (accelerometers/gyroscopes), touch sensors.
Actuators (Action): Create physical effects: electric motors, pneumatic/hydraulic actuators, servo motors, grippers.
Controller: The "brain" — processes sensor data and sends commands to actuators (the AI/ML component).
Degrees of Freedom (DoF): Number of independent ways a robot can move. A 6-DoF robotic arm can position and orient an end-effector freely in 3D space.

Fundamentals of Natural Language Processing

📖 Definition

NLP (Natural Language Processing) is the field of AI that enables computers to understand, interpret, and generate human language (text or speech).

Text Preprocessing Pipeline

Tokenization: Split text into tokens (words, sentences, subwords). E.g., "Hello world!" → ["Hello", "world", "!"]
Lowercasing: Convert all text to lowercase for case-insensitive matching.
Stop Word Removal: Remove common words with little meaning (the, is, in, etc.).
Stemming: Reduces words to their root form by chopping suffixes. (running → run; studies → studi — may not be a real word).
Lemmatization: Reduces to dictionary root form using vocabulary. (better → good; running → run — always a valid word).
N-grams: Sequences of N consecutive tokens. Bigrams (2), Trigrams (3). Used to capture context and phrases.
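
Several of these steps need only the standard library. The sketch below covers tokenization, lowercasing, stop word removal, and n-grams; the tiny stop word list is an illustrative assumption, and stemming and lemmatization are left out because they normally rely on a library such as NLTK.

```python
import re

STOP_WORDS = {"the", "is", "in", "a", "an", "of", "and", "to"}   # tiny illustrative list


def tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(text):
    """Lowercase, tokenize, then drop punctuation and stop words."""
    tokens = tokenize(text.lower())
    return [t for t in tokens if t.isalnum() and t not in STOP_WORDS]


def ngrams(tokens, n):
    """All runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(tokenize("Hello world!"))
tokens = preprocess("The cat is in the garden.")
print(tokens)
print(ngrams(tokens, 2))
```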

Stemming

Rule-based, crude suffix stripping
Fast but may produce non-words
"Studies" → "Studi" (not valid)
Algorithms: Porter, Snowball

Lemmatization

Uses vocabulary and morphological analysis
Slower but produces real dictionary words
"Studies" → "Study" (valid)
Tools: WordNet Lemmatizer

Basic Chatbot Concepts

A simple rule-based chatbot uses pattern matching (regular expressions or keyword detection) to select pre-written responses. Modern chatbots use:

Intent Classification: Identify what the user wants (e.g., book flight, check weather).
Entity Extraction (NER): Extract key info from text (e.g., dates, names, locations).
Dialogue Management: Track conversation state and determine next action.
Natural Language Generation (NLG): Generate human-like responses.
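
A rule-based chatbot of the simple kind described at the start of this section can be sketched with regular expressions. The patterns and canned replies are hypothetical examples.

```python
import re

# Each rule pairs a keyword pattern with a pre-written response
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bweather\b", re.IGNORECASE), "I cannot check live weather, sorry."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."


def reply(message):
    """Return the response of the first rule whose pattern matches."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK


print(reply("Hello there"))
print(reply("What is the weather like?"))
print(reply("Tell me a joke"))
```

The order of the rules matters: the first match wins, so more specific patterns should come before general ones.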

📋 Unit VI — Quick Cheatsheet

AI Planning = automated action sequencing; STRIPS uses preconditions + effects
Fuzzy Logic = degrees of truth between 0 and 1; models imprecision
Fuzzy sets = membership function μ ∈ [0, 1]
Defuzzification = converts fuzzy output to crisp value (centroid method)
Expert System = Knowledge Base + Inference Engine + User Interface
Inference types = Forward chaining (facts → conclusion) and Backward (goal → facts)
Sensors (camera, lidar) → Robot Controller → Actuators (motors)
NLP pipeline = Tokenize → Lowercase → Remove stopwords → Stem/Lemmatize
Stemming = fast, crude; Lemmatization = slower, linguistically correct
N-grams = sequences of N tokens; bigram, trigram for context

Practicals

List of Experiments (P1–P9)

P1: Python Environment Setup

🎯 Objective

Introduction to Python programming environment and installation of Anaconda / Python IDE.

Key Steps

Download and install Anaconda from anaconda.com (includes Python, Jupyter, and 250+ packages).
Launch Jupyter Notebook via Anaconda Navigator or command: jupyter notebook.
Alternatively, install VS Code with the Python extension and select the Python interpreter.
Verify installation: open terminal and run python --version and pip --version.
Create a new Jupyter notebook (.ipynb) and run a test cell: print("Hello, AI World!").
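
As an optional extra check, a short script can confirm the interpreter version and whether the libraries used in later practicals are installed. The package list below is an assumption based on the libraries named in these notes.

```python
import importlib.util
import sys

print("Python version:", sys.version.split()[0])

# find_spec returns None when a package is not installed (nothing is imported).
packages = ["numpy", "pandas", "matplotlib", "sklearn"]
status = {name: importlib.util.find_spec(name) is not None for name in packages}

for name, installed in status.items():
    print(f"{name:<12} {'OK' if installed else 'missing'}")
```

Any package reported as missing can be installed with pip or conda before starting P4.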

P2: Basic Python Programs

🎯 Objective

Write simple Python programs using variables, data types, input/output, and control statements.

🔍 Sample Program — Variables & Control Flow


            # Data types
            name = "Alice"       # str
            age = 20             # int
            gpa = 9.2            # float
            is_student = True    # bool

            # Input / Output
            score = int(input("Enter score: "))
            print(f"Score entered: {score}")

            # Control statements
            if score >= 90:
                print("Grade: A")
            elif score >= 75:
                print("Grade: B")
            else:
                print("Grade: C")

            # Loop
            for i in range(1, 6):
                print(f"{i} × {i} = {i*i}")

P3: Data Structures & File Handling

🎯 Objective

Implementation of Python programs using lists, tuples, dictionaries, and basic file handling.

Structure	Syntax	Mutable?	Use Case
List	`[1, 2, 3]`	✅ Yes	Ordered collection; most common
Tuple	`(1, 2, 3)`	❌ No	Fixed data; smaller and slightly faster to create than a list
Dictionary	`{'a': 1, 'b': 2}`	✅ Yes	Key-value pairs; fast lookup
Set	`{1, 2, 3}`	✅ Yes	Unique elements; set operations
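
A short sketch exercising each structure in the table above. The variable names and values are made up for illustration.

```python
# List: ordered and mutable
scores = [72, 88, 95]
scores.append(60)
scores.sort()

# Tuple: fixed record; its items cannot be reassigned
point = (3, 4)
px, py = point  # unpacking

# Dictionary: key-value lookup
marks = {"Alice": 92, "Bob": 78}
marks["Carol"] = 85
top = max(marks, key=marks.get)  # key with the highest value

# Set: duplicates are dropped automatically
tags = {"ai", "ml", "ai"}
common = tags & {"ml", "dl"}     # intersection

print(scores, (px, py), top, sorted(tags), common)
```

Trying `point[0] = 1` raises a TypeError, which is what "not mutable" means in practice.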

🔍 File Handling


            # Write to file
            with open("data.txt", "w") as f:
                f.write("AI is fascinating!\n")

            # Read from file
            with open("data.txt", "r") as f:
                content = f.read()
                print(content)

P4: NumPy, Pandas & Matplotlib

🎯 Objective

Introduction to NumPy, Pandas, and Matplotlib for numerical computation, data handling, and visualization.

Library	Purpose	Key Operations
NumPy	Numerical computing; fast array ops	`np.array()`, `np.mean()`, `np.dot()`, broadcasting
Pandas	Data manipulation; DataFrames	`pd.read_csv()`, `df.describe()`, `df.dropna()`, `groupby()`
Matplotlib	2D plotting and visualization	`plt.plot()`, `plt.bar()`, `plt.scatter()`, `plt.show()`
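
A minimal sketch of the NumPy and Pandas operations listed in the table, using made-up records with one missing score. Plotting is left to P5.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised math and broadcasting
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print("mean:", np.mean(a))
print("dot:", np.dot(a, b))
print("broadcast:", a * 10)  # the scalar is applied to every element

# Pandas: hypothetical student records, one score missing
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dev"],
    "section": ["A", "A", "B", "B"],
    "score": [80.0, 90.0, np.nan, 70.0],
})
clean = df.dropna()                                      # drop the row with NaN
section_means = clean.groupby("section")["score"].mean()

print(clean.describe())
print(section_means)
```

In a real practical the DataFrame would usually come from `pd.read_csv()` rather than an inline dictionary.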

P5: Data Visualization Techniques

🎯 Objective

Implementation of basic data visualization techniques such as line charts, bar charts, and scatter plots using Python.

🔍 Sample Code


            import matplotlib.pyplot as plt
            import numpy as np

            x = np.arange(1, 11)
            y = x ** 2

            # One figure holding three side-by-side panels
            plt.figure(figsize=(12, 4))

            # Line Chart
            plt.subplot(1, 3, 1)
            plt.plot(x, y, 'b-o')
            plt.title("Line Chart")

            # Bar Chart
            plt.subplot(1, 3, 2)
            plt.bar(['A','B','C','D'], [23, 45, 12, 67])
            plt.title("Bar Chart")

            # Scatter Plot
            plt.subplot(1, 3, 3)
            plt.scatter(np.random.randn(50), np.random.randn(50))
            plt.title("Scatter Plot")

            plt.tight_layout()
            plt.show()

P6: Linear Regression / KNN

🎯 Objective

Implementation of a simple ML model — Linear Regression or K-Nearest Neighbors — using Python.

🔍 Linear Regression with Scikit-learn


            from sklearn.linear_model import LinearRegression
            from sklearn.model_selection import train_test_split
            from sklearn.metrics import mean_squared_error
            import numpy as np

            # Generate data: y ≈ 2.5x plus Gaussian noise (seeded so runs are repeatable)
            np.random.seed(42)
            X = np.random.rand(100, 1) * 10
            y = 2.5 * X.squeeze() + np.random.randn(100) * 2

            # Split data (80% train, 20% test)
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

            # Train model
            model = LinearRegression()
            model.fit(X_train, y_train)

            # Predict & evaluate (the learned coefficient should be close to 2.5)
            y_pred = model.predict(X_test)
            print("MSE:", mean_squared_error(y_test, y_pred))
            print("Coefficient:", model.coef_, "Intercept:", model.intercept_)
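
The objective also names K-Nearest Neighbors. A minimal sketch, assuming the Iris dataset and k = 5 (both are illustrative choices, not requirements of the practical):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Each test sample takes the majority class of its 5 nearest training samples.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)
knn_accuracy = accuracy_score(y_test, knn_pred)

print(f"KNN accuracy (k=5): {knn_accuracy:.3f}")
```

KNN stores the training data instead of learning coefficients, so changing `n_neighbors` is the main way to tune it.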

P7: Classification — Logistic Regression / Decision Tree

🎯 Objective

Implementation of a basic classification algorithm — Logistic Regression or Decision Tree — using Python.

🔍 Logistic Regression on Iris Dataset


            from sklearn.datasets import load_iris
            from sklearn.linear_model import LogisticRegression
            from sklearn.model_selection import train_test_split
            from sklearn.metrics import accuracy_score, classification_report

            iris = load_iris()
            X, y = iris.data, iris.target
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

            model = LogisticRegression(max_iter=200)
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)

            print("Accuracy:", accuracy_score(y_test, y_pred))
            print(classification_report(y_test, y_pred, target_names=iris.target_names))
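
The objective also allows a Decision Tree. A minimal sketch on the same Iris split; `max_depth=3` is an assumed setting chosen to keep the printed tree short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# A shallow tree is easier to read and less prone to overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
tree_accuracy = accuracy_score(y_test, tree.predict(X_test))

print(f"Decision Tree accuracy: {tree_accuracy:.3f}")
print(export_text(tree, feature_names=iris.feature_names))  # learned if/else rules
```

The text dump shows the thresholds the tree learned, which makes it easy to explain a prediction by following one path from root to leaf.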

P8: K-Means Clustering

🎯 Objective

Implementation of K-Means clustering using Python.

🔍 K-Means with Scikit-learn


            from sklearn.cluster import KMeans
            from sklearn.datasets import make_blobs
            import matplotlib.pyplot as plt

            # Generate blob data
            X, true_labels = make_blobs(n_samples=300, centers=4, random_state=42)

            # Fit K-Means (n_init set explicitly because its default differs across scikit-learn versions)
            kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
            kmeans.fit(X)
            labels = kmeans.labels_
            centers = kmeans.cluster_centers_

            # Plot clusters, with centroids marked as red X symbols
            plt.scatter(X[:,0], X[:,1], c=labels, cmap='viridis', alpha=0.6)
            plt.scatter(centers[:,0], centers[:,1], c='red', marker='X', s=200)
            plt.title("K-Means Clustering (K=4)")
            plt.show()

P9: Model Evaluation — Accuracy & Confusion Matrix

🎯 Objective

Evaluation of machine learning models using accuracy and confusion matrix.

🔍 Confusion Matrix Visualization


            from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
            from sklearn.ensemble import RandomForestClassifier
            from sklearn.datasets import load_iris
            from sklearn.model_selection import train_test_split
            import matplotlib.pyplot as plt

            iris = load_iris()
            X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

            clf = RandomForestClassifier(n_estimators=100, random_state=42)
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)

            print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")

            cm = confusion_matrix(y_test, y_pred)
            disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
            disp.plot(cmap='Blues')
            plt.title("Random Forest — Confusion Matrix")
            plt.show()

📖 Reading a Confusion Matrix

Rows = Actual classes. Columns = Predicted classes.
Diagonal elements = correctly classified samples (true positives for each class).
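Off-diagonal = misclassifications. A good model has high counts on the diagonal and near-zero counts elsewhere; with cmap='Blues', higher counts are drawn darker, so the diagonal looks dark and the off-diagonal cells pale.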
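
The reading rules above can be checked numerically. A small sketch using a hypothetical 3-class matrix (the counts are made up, not the output of the Random Forest run):

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual, columns = predicted
cm = np.array([
    [19, 0, 0],
    [0, 12, 1],
    [0, 2, 11],
])

accuracy = np.trace(cm) / cm.sum()         # diagonal total / all samples
recall = np.diag(cm) / cm.sum(axis=1)      # correct / row total (actual class)
precision = np.diag(cm) / cm.sum(axis=0)   # correct / column total (predicted class)

print(f"Accuracy: {accuracy:.3f}")
print("Recall per class:   ", np.round(recall, 3))
print("Precision per class:", np.round(precision, 3))
```

Row totals give recall because each row holds all samples of one actual class; column totals give precision because each column holds everything predicted as that class.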