AI & ML Using Python
Complete notes & cheatsheet: CAP378 coverage of AI foundations, search, machine learning, neural networks, and hands-on Python practicals.
What You Will Learn
CO1: AI Fundamentals
- Explain fundamental concepts of Artificial Intelligence
- Understand intelligent agents and problem solving
- Describe knowledge representation techniques
CO2: Search & Reasoning
- Apply search algorithms to classical AI problems
- Use logical reasoning and basic planning approaches
- Solve problems using uninformed and informed search
CO3: ML Analysis
- Analyze ML paradigms: regression, classification, clustering
- Understand and apply model evaluation techniques
- Distinguish between AI, ML, DL, and Data Science
CO4: Python Development
- Develop ML and neural network models in Python
- Apply models to real-world application scenarios
- Use libraries such as NumPy, Pandas, Scikit-learn
Foundations of Artificial Intelligence
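What is Artificial Intelligence?
Artificial Intelligence (AI) is the simulation of human intelligence in machines: building systems that perceive their environment, reason, learn, and act to achieve goals.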
History and Evolution of AI
| Era | Key Event |
|---|---|
| 1950s | Alan Turing proposes the Turing Test; term "Artificial Intelligence" coined by John McCarthy (1956 Dartmouth Conference) |
| 1960s–70s | Early expert systems and symbolic AI; first AI winter due to unrealistic expectations |
| 1980s | Expert systems boom; machine learning research grows; second AI winter |
| 1990s–2000s | Statistical ML approaches dominate; Support Vector Machines, neural networks revived |
| 2010s–present | Deep learning revolution; GPUs enable large-scale training; NLP breakthroughs (BERT, GPT) |
Applications of AI Across Industries
AI is used across virtually every industry today:
- Healthcare: Disease diagnosis, drug discovery, medical imaging analysis.
- Finance: Fraud detection, algorithmic trading, credit scoring.
- Transportation: Self-driving vehicles, route optimization.
- NLP: Chatbots, machine translation, sentiment analysis.
- Entertainment: Recommendation systems (Netflix, Spotify), game AI.
- Manufacturing: Predictive maintenance, quality control via computer vision.
Rational Agents and Their Properties
Key properties of a rational agent:
- Performance Measure: The criterion that defines the degree of success, e.g., distance travelled, score earned.
- Environment: Everything the agent interacts with (the "world").
- Actuators: The agent's output mechanisms (wheels, speakers, API calls).
- Sensors: The agent's input mechanisms (camera, keyboard, microphone).
Together, these four form the PEAS description of an agent.
Types of Environments (PEAS Properties)
| Property | Meaning | Example |
|---|---|---|
| Fully / Partially Observable | Can the agent see the complete environment state? | Chess (full) vs. Poker (partial) |
| Deterministic / Stochastic | Is the next state fully determined by current state + action? | Crossword (det.) vs. Driving (stoch.) |
| Episodic / Sequential | Is the current decision independent of past decisions? | Image classification (ep.) vs. Chess (seq.) |
| Static / Dynamic | Does the environment change while the agent deliberates? | Crossword (static) vs. Driving (dynamic) |
| Discrete / Continuous | Are states and actions finite or infinite? | Chess (discrete) vs. Robot arm (continuous) |
Types of Agents
Simple Reflex Agent
- Acts based only on the current percept
- Uses condition–action rules: if (situation) then (action)
- No memory of past states
- Only works in fully observable environments
- Example: thermostat (if temp < 20 °C → turn on heater)
Model-Based Reflex Agent
- Maintains an internal model of the world
- Tracks the current state even in partially observable environments
- Combines current percept with stored state
- More powerful than simple reflex agents
- Example: a robot that maps its surroundings
Goal-Based Agent
- Has explicit goal states to achieve
- Uses search and planning to find action sequences
- More flexible: goals can change
- Less efficient (needs planning for each goal)
- Example: GPS navigation agent
Utility-Based Agent
- Uses a utility function to measure desirability
- Makes decisions that maximize expected utility
- Handles conflicting goals and trade-offs
- Most general and powerful type
- Example: stock trading agent, route planner with traffic
State Space Representation & Problem Solving
Components of a Problem:
- Initial State: The starting configuration of the problem. E.g., for 8-puzzle: a specific tile arrangement.
- Actions / Operators: The set of moves available to the agent from any given state.
- Transition Model: Describes what each action does (next state = result(state, action)).
- Goal State / Goal Test: Condition(s) that determine if a state is a solution.
- Path Cost: A numeric cost assigned to each path through the state space; the agent seeks minimum-cost paths.
Example (8-puzzle):
- Actions: Move the blank Up, Down, Left, or Right.
- Goal State: Tiles arranged in order 1–8 with the blank at bottom-right.
- Path Cost: Number of moves made.
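The five problem components map directly onto code. Below is a minimal sketch of the 8-puzzle formulation, assuming a state is a 9-tuple in row-major order with `0` standing for the blank. The names `actions`, `result`, `goal_test` and `path_cost` are illustrative, not from any library.

```python
# Hypothetical 8-puzzle formulation: state = 9-tuple, row-major, 0 = blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)          # blank at bottom-right

# Index offset of the square the blank moves into, for each action.
MOVES = {"Up": -3, "Down": 3, "Left": -1, "Right": 1}


def actions(state):
    """Legal moves of the blank from this state."""
    row, col = divmod(state.index(0), 3)
    legal = []
    if row > 0:
        legal.append("Up")
    if row < 2:
        legal.append("Down")
    if col > 0:
        legal.append("Left")
    if col < 2:
        legal.append("Right")
    return legal


def result(state, action):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    j = i + MOVES[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)


def goal_test(state):
    return state == GOAL


def path_cost(path):
    """Each move costs 1, so the path cost is the number of moves."""
    return len(path)


start = (1, 2, 3, 4, 5, 6, 7, 0, 8)
print(actions(start))                     # ['Up', 'Left', 'Right']
print(goal_test(result(start, "Right")))  # True
```

Any search algorithm only needs these functions, which is why the same formulation can be reused by BFS, UCS or A*.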
Game Playing: Minimax & Alpha-Beta Pruning
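Game playing is a classic area of AI where the agent competes against an adversary. The key algorithm is Minimax, which assumes both players play optimally: MAX tries to maximize the utility, and MIN tries to minimize it.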
How it works:
- Generate the game tree to a certain depth (terminal states or cutoff depth).
- Assign utility values to all terminal states.
- Propagate values upward: MAX nodes take the maximum of children's values; MIN nodes take the minimum.
- The root node's value gives MAX's best guaranteed outcome; the move that achieves it is selected.
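Alpha-Beta Pruning computes the same value as Minimax but skips branches that cannot affect the final decision. It tracks two bounds:
- α = best (highest) value MAX can guarantee so far on the current path.
- β = best (lowest) value MIN can guarantee so far on the current path.

A branch is pruned when α ≥ β. With perfect move ordering, Alpha-Beta reduces the time complexity from $O(b^m)$ to $O(b^{m/2})$. This lets it search roughly twice as deep in the same time.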
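The sketch below shows plain Minimax next to the Alpha-Beta version so the two can be compared on the same tree. It assumes a toy representation in which a leaf is a number (its utility), an internal node is a list of children, and levels alternate between MAX and MIN. The tree itself is made-up illustration data.

```python
import math


def minimax(node, maximizing):
    """Plain Minimax: back up max/min values from the leaves."""
    if not isinstance(node, list):              # terminal state: its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, pruning branches once alpha >= beta."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                   # MIN above will never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                       # MAX above already has better
            break
    return value


# Two-ply example: MAX root over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), alphabeta(tree, True))  # 3 3
```

In the second MIN node, the first leaf (2) is already worse for MAX than the 3 it can secure, so leaves 4 and 6 are never examined.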
- AI = simulation of human intelligence in machines; term coined by McCarthy (1956)
- PEAS = Performance measure, Environment, Actuators, Sensors
- Simple Reflex = if (percept) → action; no memory
- Model-Based = maintains internal world state
- Goal-Based = plans to achieve specific goals
- Utility-Based = maximizes a utility function; most general
- Problem components = Initial state, Actions, Transition model, Goal test, Path cost
- Minimax = MAX maximizes, MIN minimizes; guarantees the best outcome against an optimal opponent in zero-sum games
- Alpha-Beta = prunes branches when α ≥ β; best case reduces $O(b^m)$ to $O(b^{m/2})$
Search, Logic and Reasoning
Uninformed Search Strategies
Uninformed (blind) search algorithms have no information about states beyond the problem definition; they don't know how far they are from the goal.
| Algorithm | Strategy | Complete? | Optimal? | Time / Space |
|---|---|---|---|---|
| BFS (Breadth-First Search) | Expands shallowest node first; uses a queue (FIFO) | ✓ Yes (finite b) | ✓ Yes (if all step costs are equal) | $O(b^d)$ / $O(b^d)$ |
| DFS (Depth-First Search) | Expands deepest node first; uses a stack (LIFO) | ✗ No (infinite spaces) | ✗ No | $O(b^m)$ / $O(bm)$ |
| UCS (Uniform Cost Search) | Expands lowest path-cost node first; uses a priority queue | ✓ Yes (if step costs ≥ ε > 0) | ✓ Yes | $O(b^{\lceil C^*/\varepsilon \rceil})$ for both |

Where: b = branching factor, d = depth of the shallowest solution, m = maximum depth of the tree, C* = cost of the optimal solution, ε = smallest step cost.
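The three strategies differ only in the data structure that holds the frontier. Here is a minimal sketch on a small hypothetical weighted graph, given as a dict of `node -> [(neighbour, step cost)]`. BFS uses a FIFO `deque`, DFS uses a Python list as a LIFO stack, and UCS uses a `heapq` priority queue keyed on path cost.

```python
from collections import deque
import heapq

# Hypothetical weighted graph: node -> list of (neighbour, step cost).
GRAPH = {"S": [("A", 1), ("B", 5)], "A": [("B", 1), ("G", 10)], "B": [("G", 2)], "G": []}


def bfs(graph, start, goal):
    """FIFO queue: returns the path with the fewest edges."""
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt, _ in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def dfs(graph, start, goal):
    """LIFO stack: finds *a* path, not necessarily the cheapest."""
    frontier, visited = [[start]], set()
    while frontier:
        path = frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, _ in reversed(graph[node]):     # push in reverse to explore in listed order
            if nxt not in visited:
                frontier.append(path + [nxt])
    return None


def ucs(graph, start, goal):
    """Priority queue on path cost g(n); goal test on expansion keeps it optimal."""
    frontier, explored = [(0, [start])], set()
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for nxt, step in graph[node]:
            if nxt not in explored:
                heapq.heappush(frontier, (cost + step, path + [nxt]))
    return None


print(bfs(GRAPH, "S", "G"))  # ['S', 'A', 'G']  (fewest edges, but cost 11)
print(dfs(GRAPH, "S", "G"))  # ['S', 'A', 'B', 'G']
print(ucs(GRAPH, "S", "G"))  # (4, ['S', 'A', 'B', 'G'])
```

The BFS result illustrates the table's caveat: it is optimal only when every step costs the same.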
Informed Search Strategies & Heuristics
Informed search uses a heuristic function h(n) that estimates the cost from node n to the goal. A good heuristic dramatically reduces search time.
Greedy Best-First Search
Expands the node that appears closest to the goal according to h(n). It uses only h(n) and ignores the path cost so far, which often makes it fast but neither complete (in general) nor optimal.
A* Search
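A* expands the node with the lowest f(n) = g(n) + h(n), where:
- g(n) = actual cost from the start to node n
- h(n) = estimated (heuristic) cost from n to the goal

A* is complete and optimal if h(n) is admissible (never overestimates). For graph search that discards repeated states, h(n) should also be consistent.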
Heuristic Design Principles
- Admissibility: h(n) ≤ actual cost to the goal. It never overestimates, which guarantees optimality.
- Consistency (Monotonicity): h(n) ≤ cost(n, n') + h(n') for all successors n'. Ensures nodes are expanded in order of their f values.
- Informedness: A more accurate h(n) leads to fewer nodes being expanded. If h₂(n) ≥ h₁(n) for all n (both admissible), h₂ dominates h₁ and expands no more nodes.

For the 8-puzzle: h₁ = number of misplaced tiles; h₂ = Manhattan distance (sum over tiles of |row_curr − row_goal| + |col_curr − col_goal|). Both are admissible; h₂ dominates h₁ and typically expands fewer nodes.
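A compact A* sketch for the 8-puzzle follows, with both heuristics plugged in. It assumes states are 9-tuples with `0` as the blank and unit move costs. The function names are illustrative only. Because Manhattan distance is consistent, returning on the first pop of the goal gives the optimal move count.

```python
import heapq

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)            # 0 = blank, bottom-right


def misplaced(state):
    """h1: tiles not on their goal square (blank excluded)."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])


def manhattan(state):
    """h2: sum of |row - goal_row| + |col - goal_col| over tiles (blank excluded)."""
    total = 0
    for i, tile in enumerate(state):
        if tile:
            g = GOAL.index(tile)
            total += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return total


def neighbours(state):
    """States reachable by sliding the blank one square."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)


def astar(start, h=manhattan):
    """Expand lowest f = g + h first; returns the optimal number of moves."""
    frontier = [(h(start), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == GOAL:
            return g
        if g > best_g[state]:
            continue                            # stale queue entry
        for nxt in neighbours(state):
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None


start = (1, 2, 3, 4, 0, 6, 7, 5, 8)
print(misplaced(start), manhattan(start))  # 2 2
print(astar(start))                        # 2
```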
Propositional Logic & First-Order Logic
Propositional Logic
Propositional logic deals with propositions (statements that are true or false) connected by logical connectives:
| Connective | Symbol | Meaning |
|---|---|---|
| NOT | ¬ | Negation: flips the truth value |
| AND | ∧ | Conjunction: true only if both are true |
| OR | ∨ | Disjunction: true if at least one is true |
| IMPLIES | ⇒ | Implication (P ⇒ Q): false only when P is true and Q is false |
| BICONDITIONAL | ⇔ | Equivalence: true when both sides have the same truth value |
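The connectives can be checked by enumerating a truth table. Below is a small sketch using Python booleans. The helper names `implies` and `iff` are illustrative, since Python has no built-in operator for either.

```python
from itertools import product


def implies(p, q):
    """P ⇒ Q is false only when P is true and Q is false."""
    return (not p) or q


def iff(p, q):
    """P ⇔ Q is true when both sides have the same truth value."""
    return p == q


print("P      Q      ¬P     P∧Q    P∨Q    P⇒Q    P⇔Q")
for p, q in product([True, False], repeat=2):
    row = [p, q, not p, p and q, p or q, implies(p, q), iff(p, q)]
    print("  ".join(f"{str(v):5}" for v in row))
```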
Predicate Logic / First-Order Logic (FOL)
FOL extends propositional logic to express relationships between objects. It introduces:
- Constants: Specific objects, e.g., `Ansh`, `Delhi`.
- Variables: Stand-ins for objects, e.g., `x`, `y`.
- Predicates: Relations between objects, e.g., `LikesAI(Ansh)`, `Greater(5, 3)`.
- Functions: Map objects to objects, e.g., `FatherOf(Ansh)`.
- Quantifiers: ∀ (for all) and ∃ (there exists).

"Every student who studies AI passes."
∀x: Student(x) ∧ StudiesAI(x) ⇒ Passes(x)

"There exists a student who studies both AI and ML."
∃x: Student(x) ∧ StudiesAI(x) ∧ StudiesML(x)
Semantic Networks for Knowledge Representation
Key relationships used in semantic networks:
- is-a: Represents inheritance. Dog is-a Animal means Dog inherits all properties of Animal.
- instance-of: A specific member of a class. Buddy instance-of Dog.
- has-a (part-of): Compositional relationship. Car has-a Engine.
- Custom relationships: Ansh lives-in Delhi, Python used-for ML.
- Uninformed search = BFS (optimal, high memory), DFS (low memory, not optimal), UCS (optimal by cost)
- A* = f(n) = g(n) + h(n); optimal if h is admissible
- Admissible heuristic = never overestimates true cost
- Propositional logic = statements connected by ¬, ∧, ∨, ⇒, ⇔
- FOL = adds objects, predicates, quantifiers (∀, ∃)
- Semantic network = nodes (concepts) + edges (is-a, has-a, instance-of)
- Greedy Best-First uses only h(n); A* uses g(n) + h(n)
Machine Learning
Basics of Machine Learning
AI vs ML vs DL vs Data Science
| Field | Focus | Subset of |
|---|---|---|
| Artificial Intelligence | Any technique that enables machines to mimic human intelligence | None (broadest field) |
| Machine Learning | Algorithms that learn patterns from data automatically | AI |
| Deep Learning | Multi-layered neural networks; learns representations automatically | ML |
| Data Science | Extracting insights from data using statistics, programming, and domain knowledge | Overlaps with ML/AI |
Types of Learning
Supervised Learning
- Training data has labels (input-output pairs)
- Model learns mapping: input → output
- Tasks: Classification, Regression
- Examples: spam detection, house price prediction
Unsupervised Learning
- Training data has no labels
- Model finds hidden structure in data
- Tasks: Clustering, Dimensionality Reduction
- Examples: customer segmentation, anomaly detection
Reinforcement Learning
- Agent learns by interacting with environment
- Receives rewards (+) or penalties (−)
- Goal: maximize cumulative reward
- Examples: game-playing AI (AlphaGo), robotics
ML Workflow
A machine learning project follows a systematic pipeline:
- Data Collection: Gather raw data from databases, APIs, web scraping, sensors, etc.
- Data Cleaning: Handle missing values (imputation or removal), remove duplicates, fix inconsistencies, handle outliers.
- Feature Engineering: Select relevant features, create new ones (e.g., age from date of birth), encode categorical variables, normalize/scale numerical features.
- Model Selection: Choose appropriate algorithm based on problem type, data size, and interpretability requirements.
- Training: Fit the model to the training data; the algorithm finds the parameters that minimize the training loss.
- Evaluation: Test the model on unseen data using appropriate metrics.
- Deployment: Integrate the model into the production system.
Train-Test Split and Cross-Validation
Train-Test Split: Divide the dataset into two parts, typically 70–80% for training and 20–30% for testing. The test set is never seen during training, providing an unbiased estimate of performance.
K-Fold Cross-Validation: The dataset is split into k equal folds. The model is trained k times, each time using k−1 folds for training and the remaining fold for validation. The final performance is the average across all k runs. Common choice: k = 5 or 10.
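The sketch below combines both ideas with scikit-learn: a hold-out split, then 5-fold cross-validation on the training portion. The Iris dataset and logistic regression are arbitrary choices for illustration. `KFold(shuffle=True)` is used because Iris is stored sorted by class.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold-out split: 80% train / 20% test (stratified keeps class ratios equal).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 5-fold CV on the training data: 5 fits, each validated on a different fold.
model = LogisticRegression(max_iter=200)
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv)

print("Fold accuracies:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))
```

The test set stays untouched until a final model has been chosen using the CV scores.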
Introduction to Regression
Regression is a supervised learning task where the output is a continuous numerical value.
Linear Regression
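Models the relationship between the input features and the output as a straight line (a hyperplane when there are several features):

$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$$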
The model learns the weights w by minimizing the Mean Squared Error (MSE) using Gradient Descent or the Normal Equation.
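The Normal Equation gives the MSE-minimizing weights in closed form: $w = (X^\top X)^{-1} X^\top y$. The NumPy sketch below applies it to synthetic data with known weights. The data, the weight values and the noise level are made up for illustration. A column of ones is prepended so that w₀ acts as the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))            # two input features
true_w = np.array([4.0, 2.5, -1.0])              # w0 (intercept), w1, w2
y = true_w[0] + X @ true_w[1:] + rng.normal(0, 0.5, size=100)

# Prepend a column of ones so w0 is learned as the intercept.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal Equation: solve (XᵀX) w = Xᵀy instead of forming the inverse explicitly.
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

mse = np.mean((y - Xb @ w) ** 2)
print("Learned weights:", w.round(2))            # close to [4.0, 2.5, -1.0]
print("Training MSE:", round(mse, 3))
```

Gradient Descent reaches the same weights iteratively. It is preferred when there are very many features, because solving the $(n+1) \times (n+1)$ system becomes expensive.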
Polynomial Regression
When the relationship is non-linear, polynomial regression fits a curve by introducing polynomial features (x², x³, etc.). It is still a linear model in the weight space but captures non-linear patterns. Risk: overfitting with high-degree polynomials.
Evaluation Metrics
Classification Metrics
| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Balanced classes; overall correctness |
| Precision | TP / (TP + FP) | When false positives are costly (spam filter) |
| Recall (Sensitivity) | TP / (TP + FN) | When false negatives are costly (disease diagnosis) |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Imbalanced classes; need balance of P and R |
| ROC-AUC | Area under the ROC curve | Ranking models; threshold-independent evaluation |
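The formulas in the table can be computed directly from confusion-matrix counts. The counts below describe a hypothetical screening test: 100 patients, 10 of them actually sick. They show why accuracy alone misleads on imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


acc, prec, rec, f1 = classification_metrics(tp=6, fp=4, fn=4, tn=86)
print(f"Accuracy={acc:.2f} Precision={prec:.2f} Recall={rec:.2f} F1={f1:.2f}")
# Accuracy=0.92 Precision=0.60 Recall=0.60 F1=0.60
```

Accuracy looks high at 0.92, yet 4 of the 10 sick patients are missed. A model that always predicts "healthy" would still score 0.90 accuracy, with a recall of 0.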
- ML = systems that learn from data; subset of AI
- Supervised = labelled data; Classification + Regression
- Unsupervised = no labels; Clustering + Dimensionality Reduction
- Reinforcement = reward-based learning; agent-environment interaction
- ML Workflow = Collect → Clean → Feature Eng. → Model → Train → Evaluate → Deploy
- Linear Regression = ŷ = w₀ + w₁x₁ + ...; minimizes MSE
- Cross-validation = k-fold; robust performance estimate
- Precision = TP/(TP+FP); Recall = TP/(TP+FN)
- F1 = harmonic mean of Precision & Recall; good for imbalanced data
Machine Learning Algorithms
Key Classification Algorithms
Logistic Regression
Despite the name, logistic regression is a classification algorithm. It uses the sigmoid function to output a probability between 0 and 1, then applies a threshold (usually 0.5) to classify.
Decision Trees
A decision tree splits data into subsets based on feature values, creating a tree of decisions. Each internal node tests a feature, each branch represents an outcome, and each leaf node holds a class label.
✓ Advantages
- Easy to understand and visualize
- Handles both numerical and categorical data
- No need for feature scaling
- Interpretable β can explain decisions
✗ Disadvantages
- Prone to overfitting (deep trees)
- Unstable: small data changes → different tree
- Biased towards features with more levels
- Not optimal for large datasets
Random Forests
A Random Forest builds many decision trees during training and outputs the mode (classification) or mean (regression) of all trees. The key ideas:
- Bagging (Bootstrap Aggregation): Each tree is trained on a random sample (with replacement) of the data.
- Feature Randomness: Each split considers only a random subset of features, reducing correlation between trees.
- Result: More accurate and robust than individual decision trees; reduces overfitting significantly.
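Support Vector Machines (SVM)
An SVM finds the hyperplane that separates the classes with the maximum margin, i.e., the largest distance to the nearest training points (the support vectors). With the kernel trick (e.g., polynomial or RBF kernels), it can learn non-linear decision boundaries by implicitly mapping the data into a higher-dimensional space.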
Naïve Bayes
Based on Bayes' Theorem, Naïve Bayes assumes that features are conditionally independent given the class label (the "naïve" assumption). Despite this simplification, it works surprisingly well for:
- Text classification (spam detection, sentiment analysis)
- Real-time prediction (low computation cost)
- High-dimensional data
K-Nearest Neighbors (KNN)
KNN is a lazy learner: it stores all training data without building a model. To classify a new point:
- Calculate the distance from the new point to all training points (usually Euclidean distance).
- Find the K nearest neighbors.
- Assign the class that is most common among the K neighbors (majority vote).
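The three steps translate almost line for line into Python. Below is a minimal from-scratch sketch on a hypothetical 2-D toy dataset. `knn_predict` is an illustrative name, and `math.dist` requires Python 3.8+.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # 1. Euclidean distance from the query to every training point.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # 2. Keep the k closest.
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # 3. Majority vote over their labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(train_X, train_y, (2, 2)))  # red
print(knn_predict(train_X, train_y, (6, 5)))  # blue
```

All of the work happens at prediction time, which is what "lazy" means here: every query scans the whole training set.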
Clustering Algorithms
K-Means Clustering
K-Means partitions data into K clusters by iteratively assigning points to the nearest centroid and updating centroids.
Algorithm:
- Choose K (number of clusters) and initialize K centroids randomly.
- Assignment Step: Assign each data point to the nearest centroid.
- Update Step: Recalculate each centroid as the mean of all points assigned to it.
- Repeat the assignment and update steps until the centroids no longer change (convergence).
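A from-scratch NumPy sketch of the loop just described is shown below. It assumes random initialization by picking K distinct data points, and an empty cluster simply keeps its previous centroid. Library implementations such as scikit-learn's `KMeans` add smarter initialization (k-means++) and multiple restarts.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means: random initial centroids, then assign/update until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):     # convergence
            break
        centroids = new_centroids
    return labels, centroids


X = np.array([[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 8.5], [8.5, 9]], dtype=float)
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids.round(2))
```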
Hierarchical Clustering
Hierarchical clustering builds a tree of clusters (dendrogram) without pre-specifying K:
- Agglomerative (bottom-up): Start with each point as its own cluster; merge the two closest clusters repeatedly until one remains.
- Divisive (top-down): Start with one cluster containing all points; split recursively.
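Dimensionality Reduction: PCA
Principal Component Analysis (PCA) transforms correlated features into a smaller set of uncorrelated principal components that capture as much of the variance as possible.
Steps: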
- Standardize the data (zero mean, unit variance).
- Compute the covariance matrix.
- Compute eigenvectors and eigenvalues of the covariance matrix.
- Sort eigenvectors by eigenvalue (descending); these are the principal components.
- Project data onto the top k eigenvectors.
Explained Variance: The eigenvalue of a component divided by the total sum of eigenvalues gives the fraction of variance explained. A scree plot shows variance explained by each component.
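The five steps and the explained-variance ratio map directly onto NumPy. The sketch below uses synthetic data in which two of the three features are driven by one hidden factor. You would therefore expect the first component to explain roughly two-thirds of the variance. `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np


def pca(X, k):
    """PCA via eigen-decomposition; returns projected data and explained-variance ratios."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)         # 1. standardize
    C = np.cov(Z, rowvar=False)                      # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # 3. eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]                # 4. sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = Z @ eigvecs[:, :k]                   # 5. project onto top k
    explained = eigvals / eigvals.sum()              # fraction of variance per component
    return projected, explained


rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
projected, explained = pca(X, k=2)
print(projected.shape)          # (200, 2)
print(explained.round(3))       # first component explains about two-thirds
```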
- Logistic Regression = classification using sigmoid; outputs a probability between 0 and 1
- Decision Tree = splits features; interpretable; prone to overfitting
- Random Forest = ensemble of trees + bagging; reduces overfitting
- SVM = max-margin hyperplane; kernel trick for non-linear data
- Naïve Bayes = Bayes' Theorem + feature independence assumption; fast
- KNN = lazy learner; majority vote of K nearest neighbors
- K-Means = assign → update centroids; repeat until convergence
- Hierarchical = builds dendrogram; agglomerative (bottom-up) or divisive (top-down)
- PCA = transforms to principal components; maximizes variance; reduces dimensions
Artificial Neural Networks & Deep Learning
Biological vs. Artificial Neurons
🧠 Biological Neuron
- Dendrites: Receive signals from other neurons
- Cell Body (Soma): Integrates incoming signals
- Axon: Transmits output signal
- Synapse: Junction between neurons; strengthens with learning
- Fires (action potential) if combined signal exceeds threshold
🤖 Artificial Neuron
- Inputs (x₁, x₂, …): Features or outputs of the previous layer
- Weights (w₁, w₂, …): Learnable; analogous to synaptic strengths
- Bias (b): Shifts the activation threshold
- Activation Function: Determines the output from the weighted sum
- Output = f(w₁x₁ + w₂x₂ + … + b)
Perceptron, MLP, and Activation Functions
Perceptron
The Perceptron (Rosenblatt, 1958) is the simplest neural network β a single artificial neuron that learns a binary classifier. It uses a step activation function and can only classify linearly separable data.
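Below is a minimal sketch of the perceptron learning rule (w ← w + η · error · x) with a step activation, trained on the logical AND function. The learning rate and epoch count are arbitrary choices. AND is linearly separable, so the rule converges.

```python
def step(z):
    return 1 if z > 0 else 0


def train_perceptron(data, lr=0.1, epochs=20):
    """Perceptron learning rule with a step activation."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in data:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def predict(w, b, x1, x2):
    return step(w[0] * x1 + w[1] * x2 + b)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, *x) for x, _ in AND])  # [0, 0, 0, 1]
```

Replacing the targets with XOR (0, 1, 1, 0) shows the limitation. No single line separates the classes, so the weights keep changing and the predictions never all match.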
Multilayer Perceptron (MLP)
An MLP consists of:
- Input Layer: Receives raw features; one node per feature.
- Hidden Layer(s): Intermediate layers that learn representations; the "depth" of the network.
- Output Layer: Produces final prediction; nodes depend on task (1 for regression, C for C-class classification).
Common Activation Functions
| Function | Formula | Range | Use Case |
|---|---|---|---|
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | (0, 1) | Output layer for binary classification |
| Tanh | $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | (−1, 1) | Hidden layers; zero-centered (better than sigmoid) |
| ReLU | $f(x) = \max(0, x)$ | [0, ∞) | Hidden layers in deep networks; most popular |
| Softmax | $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$ | (0, 1), outputs sum to 1 | Multi-class classification output |
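The four functions take a line each in NumPy. In the softmax sketch, the maximum is subtracted before exponentiating. This is a common numerical-stability trick that leaves the result unchanged, because the factor cancels in the ratio.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    return np.tanh(x)


def relu(x):
    return np.maximum(0.0, x)


def softmax(z):
    e = np.exp(z - np.max(z))       # shift for numerical stability
    return e / e.sum()


x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x).round(3))                          # values ≈ 0.119, 0.5, 0.881
print(tanh(x).round(3))                             # values ≈ -0.964, 0, 0.964
print(relu(x))                                      # values 0, 0, 2
print(softmax(np.array([1.0, 2.0, 3.0])).round(3))  # values ≈ 0.09, 0.245, 0.665
```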
Training: Backpropagation & Gradient Descent
Training Process:
- Forward Pass: Input flows through the network layer by layer; compute output and loss.
- Backward Pass: Compute gradients of loss w.r.t. each weight using backpropagation.
- Weight Update: Adjust each weight using Gradient Descent: $w \leftarrow w - \eta \cdot \frac{\partial L}{\partial w}$
- Repeat for many epochs (full passes over the training data) until the loss converges.

Where η (eta) is the learning rate, which controls the step size. Too large → overshooting; too small → slow convergence.
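To make the forward pass, backward pass and update step concrete, the sketch below trains a single linear neuron (ŷ = w·x + b) with MSE loss. With only one layer, the chain rule stays visible in two lines. The data, η = 0.5 and the epoch count are illustrative choices. A real MLP repeats the same backward step layer by layer.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=50)
y = 3.0 * x + 0.5                        # target relationship: w = 3, b = 0.5

w, b = 0.0, 0.0
eta = 0.5                                # learning rate η
for epoch in range(200):
    y_hat = w * x + b                    # forward pass
    loss = np.mean((y_hat - y) ** 2)     # MSE loss L
    # Backward pass (chain rule): ∂L/∂ŷ = 2(ŷ - y)/n, ∂ŷ/∂w = x, ∂ŷ/∂b = 1
    grad_y_hat = 2 * (y_hat - y) / len(x)
    dw = np.sum(grad_y_hat * x)
    db = np.sum(grad_y_hat)
    # Weight update: w ← w − η · ∂L/∂w
    w -= eta * dw
    b -= eta * db

print(round(w, 3), round(b, 3))          # close to 3.0 and 0.5
```

Raising η well above 1 on this data makes the updates overshoot and diverge, which illustrates the trade-off described above.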
Training, Validation, and Testing
| Split | Purpose | Typical Size |
|---|---|---|
| Training Set | Model learns weights on this data | 60–70% |
| Validation Set | Tune hyperparameters; monitor overfitting during training | 10–20% |
| Test Set | Final unbiased evaluation; never seen during training or tuning | 10–20% |
Deep Learning Overview: CNNs and RNNs
🖼️ Convolutional Neural Networks (CNNs)
- Designed for image data
- Convolutional layers learn local features (edges, shapes)
- Pooling layers reduce spatial dimensions
- Fully connected layers for final classification
- Key insight: weight sharing (the same filter is applied across all positions)
- Applications: image classification, object detection, face recognition
Recurrent Neural Networks (RNNs)
- Designed for sequential data (text, time series, speech)
- Have hidden state that carries information across time steps
- Process input one element at a time while remembering context
- Problem: vanishing gradient with long sequences
- LSTM (Long Short-Term Memory) solves this with gates
- Applications: text generation, machine translation, sentiment analysis
- Artificial Neuron = weighted sum of inputs + bias → activation function
- Perceptron = single neuron; linearly separable data only
- MLP = input + hidden layers + output; non-linear decision boundaries
- ReLU = most popular hidden-layer activation; mitigates the vanishing gradient problem
- Sigmoid → binary output; Softmax → multi-class output
- Backprop = chain rule to compute gradients; updates weights layer by layer
- Learning rate (Ξ·) = controls gradient descent step size
- CNN = convolution + pooling; designed for images; weight sharing
- RNN = sequential data; hidden state; LSTM for long sequences
Advanced AI Concepts
Planning in Artificial Intelligence
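Planning is the task of finding a sequence of actions that transforms the initial state into a goal state. Key concepts in AI planning: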
- STRIPS Representation: States described by predicates; actions have preconditions and effects (add/delete lists).
- Forward Chaining: Start from the initial state and apply actions to reach the goal.
- Backward Chaining: Start from the goal and work back to find required preconditions.
- Partial-Order Planning: Build a plan where not all steps are totally ordered, allowing parallelism.
Fuzzy Logic
Fuzzy Sets and Membership Functions
A fuzzy set is a set where each element has a membership degree μ ∈ [0, 1] indicating how much it belongs to the set. The membership function defines this mapping.
Example (fuzzy set "Tall"):
- Person with height 5'9" → membership 0.6 (somewhat tall)
- Person with height 6'2" → membership 0.95 (very tall)
Fuzzy Rules and Inference
Fuzzy systems use IF-THEN rules that operate on fuzzy sets:
IF temperature is HIGH AND humidity is HIGH THEN fan_speed is VERY_FAST
Defuzzification
Defuzzification converts the fuzzy output back to a crisp (numerical) value for action. Common methods:
- Centroid Method: Center of gravity of the output fuzzy set β most common.
- Maximum Method: Take the point with the highest membership value.
- Weighted Average: Average of maximum points weighted by their membership.
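The sketch below shows rule outputs being combined and then defuzzified. It assumes triangular membership functions for hypothetical MEDIUM and FAST fan-speed sets on a 0–100 scale. It also assumes made-up rule firing strengths of 0.3 and 0.7, and uses Mamdani-style clipping (min) and aggregation (max). The centroid is approximated by sampling the output range finely.

```python
import numpy as np


def triangular(x, a, b, c):
    """Membership rising from 0 at a to 1 at b, falling back to 0 at c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


speed = np.linspace(0, 100, 1001)              # output universe: fan speed (%)
medium = triangular(speed, 25, 50, 75)
fast = triangular(speed, 50, 100, 150)         # peak at 100 (edge of range)

# Hypothetical firing strengths: MEDIUM fired at 0.3, FAST at 0.7.
aggregated = np.maximum(np.minimum(medium, 0.3), np.minimum(fast, 0.7))

# Centroid method: centre of gravity of the aggregated fuzzy set.
crisp = np.sum(speed * aggregated) / np.sum(aggregated)

# Maximum method: first point with the highest membership.
max_point = speed[np.argmax(aggregated)]

print(round(crisp, 1), max_point)
```

The centroid lands between the two clipped sets, weighted toward FAST. The maximum method jumps to where FAST's plateau begins (about 85%).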
Expert Systems
Expert System Architecture
| Component | Role |
|---|---|
| Knowledge Base | Stores domain-specific facts and rules (IF-THEN rules; typically 100s–1000s of rules) |
| Inference Engine | Applies logical rules to the knowledge base to derive conclusions; uses forward or backward chaining |
| User Interface | Allows users to input queries and receive explanations/answers |
| Explanation Facility | Explains the reasoning ("Why did you ask that?" / "How did you reach this conclusion?") |
| Knowledge Acquisition Module | Helps add new knowledge from domain experts (knowledge engineers) |
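The sketch below shows how an inference engine can apply IF-THEN rules in both directions. The toy medical knowledge base is entirely hypothetical. `forward_chain` records which premises produced each conclusion, which is the raw material for a "How did you reach this conclusion?" explanation. `backward_chain` assumes the rules contain no cycles.

```python
# Hypothetical knowledge base: each rule is (set of premises, conclusion).
RULES = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "see_doctor"),
    ({"has_rash"}, "possible_allergy"),
]


def forward_chain(facts, rules):
    """Data-driven: fire rules whose premises are all known until nothing new appears."""
    facts, how = set(facts), {}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                how[conclusion] = premises       # explanation trace
                changed = True
    return facts, how


def backward_chain(goal, facts, rules):
    """Goal-driven: prove `goal` by recursively proving a rule's premises (acyclic rules)."""
    if goal in facts:
        return True
    return any(conclusion == goal and all(backward_chain(p, facts, rules) for p in premises)
               for premises, conclusion in rules)


observed = {"has_fever", "has_cough", "short_of_breath"}
facts, how = forward_chain(observed, RULES)
print("see_doctor" in facts)                          # True
print(sorted(how["see_doctor"]))                      # ['possible_flu', 'short_of_breath']
print(backward_chain("see_doctor", observed, RULES))  # True
```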
Basics of Robotics
Robotics is the interdisciplinary field of designing, building, and operating robots: physical agents that interact with the physical world.
- Sensors (Perception): Gather information from the environment: cameras, lidar, sonar, GPS, IMU (accelerometers/gyroscopes), touch sensors.
- Actuators (Action): Create physical effects: electric motors, pneumatic/hydraulic actuators, servo motors, grippers.
- Controller: The "brain"; it processes sensor data and sends commands to actuators (the AI/ML component).
- Degrees of Freedom (DoF): Number of independent ways a robot can move. A 6-DoF robotic arm can position and orient an end-effector freely in 3D space.
Fundamentals of Natural Language Processing
Text Preprocessing Pipeline
- Tokenization: Split text into tokens (words, sentences, subwords). E.g., "Hello world!" → ["Hello", "world", "!"]
- Lowercasing: Convert all text to lowercase for case-insensitive matching.
- Stop Word Removal: Remove common words with little meaning (the, is, in, etc.).
- Stemming: Reduces words to their root form by chopping off suffixes (running → run; studies → studi, which may not be a real word).
- Lemmatization: Reduces words to their dictionary root form using a vocabulary (better → good; running → run), so the result is always a valid word.
- N-grams: Sequences of N consecutive tokens. Bigrams (2), Trigrams (3). Used to capture context and phrases.
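The first three steps and n-grams need nothing beyond the standard library. Below is a minimal sketch with a deliberately tiny, hypothetical stop-word list; real toolkits ship much longer ones. This sketch also drops punctuation tokens. Stemming and lemmatization are usually delegated to a library such as NLTK, so they are left out here.

```python
import re

# Small illustrative stop-word list (not any library's official list).
STOP_WORDS = {"the", "is", "in", "a", "an", "and", "of", "to", "are"}


def tokenize(text):
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def preprocess(text):
    tokens = [t.lower() for t in tokenize(text)]                          # lowercasing
    return [t for t in tokens if t.isalnum() and t not in STOP_WORDS]     # drop punctuation + stop words


def ngrams(tokens, n):
    """Consecutive n-token sequences (n=2 gives bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(tokenize("Hello world!"))   # ['Hello', 'world', '!']
tokens = preprocess("The cats are running in the garden.")
print(tokens)                     # ['cats', 'running', 'garden']
print(ngrams(tokens, 2))          # [('cats', 'running'), ('running', 'garden')]
```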
Stemming
- Rule-based, crude suffix stripping
- Fast but may produce non-words
- "Studies" → "Studi" (not valid)
- Algorithms: Porter, Snowball
Lemmatization
- Uses vocabulary and morphological analysis
- Slower but produces real dictionary words
- "Studies" → "Study" (valid)
- Tools: WordNet Lemmatizer
Basic Chatbot Concepts
A simple rule-based chatbot uses pattern matching (regular expressions or keyword detection) to select pre-written responses. Modern chatbots use:
- Intent Classification: Identify what the user wants (e.g., book flight, check weather).
- Entity Extraction (NER): Extract key info from text (e.g., dates, names, locations).
- Dialogue Management: Track conversation state and determine next action.
- Natural Language Generation (NLG): Generate human-like responses.
- AI Planning = automated action sequencing; STRIPS uses preconditions + effects
- Fuzzy Logic = degrees of truth between 0 and 1; models imprecision
- Fuzzy sets = membership function μ ∈ [0, 1]
- Defuzzification = converts fuzzy output to crisp value (centroid method)
- Expert System = Knowledge Base + Inference Engine + User Interface
- Inference types = Forward chaining (facts → conclusion) and Backward chaining (goal → facts)
- Sensors (camera, lidar) → Robot Controller → Actuators (motors)
- NLP pipeline = Tokenize → Lowercase → Remove stopwords → Stem/Lemmatize
- Stemming = fast, crude; Lemmatization = slower, linguistically correct
- N-grams = sequences of N tokens; bigram, trigram for context
List of Experiments (P1–P9)
P1: Python Environment Setup
Key Steps
- Download and install Anaconda from anaconda.com (includes Python, Jupyter, and 250+ packages).
- Launch Jupyter Notebook via Anaconda Navigator or with the command `jupyter notebook`.
- Alternatively, install VS Code with the Python extension and select the interpreter.
- Verify the installation: open a terminal and run `python --version` and `pip --version`.
- Create a new Jupyter notebook (.ipynb) and run a test cell: `print("Hello, AI World!")`.
P2: Basic Python Programs
```python
# Data types
name = "Alice"        # str
age = 20              # int
gpa = 9.2             # float
is_student = True     # bool

# Input / Output
score = int(input("Enter score: "))
print(f"Score entered: {score}")

# Control statements
if score >= 90:
    print("Grade: A")
elif score >= 75:
    print("Grade: B")
else:
    print("Grade: C")

# Loop
for i in range(1, 6):
    print(f"{i} × {i} = {i*i}")
```
P3: Data Structures & File Handling
| Structure | Syntax | Mutable? | Use Case |
|---|---|---|---|
| List | [1, 2, 3] | ✓ Yes | Ordered collection; most common |
| Tuple | (1, 2, 3) | ✗ No | Fixed data; faster than list |
| Dictionary | {'a': 1, 'b': 2} | ✓ Yes | Key-value pairs; fast lookup |
| Set | {1, 2, 3} | ✓ Yes | Unique elements; set operations |
```python
# Write to file
with open("data.txt", "w") as f:
    f.write("AI is fascinating!\n")

# Read from file
with open("data.txt", "r") as f:
    content = f.read()
print(content)
```
P4: NumPy, Pandas & Matplotlib
| Library | Purpose | Key Operations |
|---|---|---|
| NumPy | Numerical computing; fast array ops | np.array(), np.mean(), np.dot(), broadcasting |
| Pandas | Data manipulation; DataFrames | pd.read_csv(), df.describe(), df.dropna(), groupby() |
| Matplotlib | 2D plotting and visualization | plt.plot(), plt.bar(), plt.scatter(), plt.show() |
P5: Data Visualization Techniques
```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = x ** 2

# Line Chart
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.plot(x, y, 'b-o'); plt.title("Line Chart")

# Bar Chart
plt.subplot(1, 3, 2)
plt.bar(['A', 'B', 'C', 'D'], [23, 45, 12, 67]); plt.title("Bar Chart")

# Scatter Plot
plt.subplot(1, 3, 3)
plt.scatter(np.random.randn(50), np.random.randn(50)); plt.title("Scatter Plot")

plt.tight_layout(); plt.show()
```
P6: Linear Regression / KNN
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate data
X = np.random.rand(100, 1) * 10
y = 2.5 * X.squeeze() + np.random.randn(100) * 2

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict & evaluate
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))
print("Coefficient:", model.coef_, "Intercept:", model.intercept_)
```
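The experiment title also names KNN. Below is a minimal sketch of that half using scikit-learn's `KNeighborsClassifier`. It assumes the Iris dataset and k = 5, both illustrative choices. Features are standardized first in a pipeline, because KNN relies on distances.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features, then classify by majority vote of the 5 nearest neighbours.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("KNN accuracy:", accuracy_score(y_test, y_pred))
```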
P7: Classification β Logistic Regression / Decision Tree
```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```
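The title also mentions Decision Trees. A minimal sketch with the same split follows. `max_depth=3` is an illustrative setting that limits overfitting. `export_text` prints the learned if/else rules, which is what makes trees interpretable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Limit depth to keep the tree small and reduce overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Human-readable rules learned by the tree.
print(export_text(tree, feature_names=list(iris.feature_names)))
```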
P8: K-Means Clustering
```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate blob data
X, true_labels = make_blobs(n_samples=300, centers=4, random_state=42)

# Fit K-Means
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_

# Plot clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', alpha=0.6)
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200)
plt.title("K-Means Clustering (K=4)")
plt.show()
```
P9: Model Evaluation β Accuracy & Confusion Matrix
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
disp.plot(cmap='Blues')
plt.title("Random Forest: Confusion Matrix")
plt.show()
```
Diagonal elements = correctly classified samples (the true positives for each class).
Off-diagonal elements = misclassifications. A good model has high counts on the diagonal (dark cells with the 'Blues' colormap) and counts near zero off the diagonal.