Machine Learning Lab



1. Install and set up Python and essential libraries like NumPy and pandas.

Install NumPy and pandas:
Python itself must be installed first; the installers from python.org include pip.
Open a command prompt (Windows) or terminal (macOS/Linux).
To install NumPy, type:

pip install numpy

To install pandas, type:

pip install pandas

Press Enter after each command and wait for the installation to complete.

Verify installation:

After installation, you can verify if NumPy and pandas are installed correctly by opening a Python interpreter.

Type python in your command prompt or terminal to open the Python interpreter.

Once inside the Python interpreter, try importing NumPy and pandas:

import numpy
import pandas

If there are no errors, the libraries are successfully installed.
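
The installed versions can also be checked without importing the libraries themselves. The sketch below uses the standard-library importlib.metadata module; the helper name installed_version is made up for this example and is not part of any library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two libraries used in this lab
for name in ("numpy", "pandas"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A package reported as "not installed" here would also fail at the import step above.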

2. Introduce scikit-learn as a machine learning library

Here's an introduction to some key aspects of scikit-learn:

1. Versatility: scikit-learn offers a wide range of machine
learning algorithms and tools that are suitable for various
tasks such as classification, regression, clustering,
dimensionality reduction, and more. Whether you're working
on supervised learning or unsupervised learning problems,
scikit-learn provides algorithms for both.

2. Ease of Use: One of the main advantages of scikit-learn is
its user-friendly and consistent interface. Every model is trained
with fit() and used with predict() or transform(), which makes it easy
to experiment with different machine learning techniques. This
accessibility makes it a great choice for beginners and experts alike.

3. Integration with NumPy and pandas: scikit-learn seamlessly
integrates with other popular libraries like NumPy and pandas,
allowing for efficient data manipulation and preprocessing before
feeding the data into machine learning models.

4. Model Evaluation: scikit-learn provides tools for model evaluation
and validation, including functions for cross-validation, hyperparameter
tuning, and performance metrics such as accuracy, precision, recall, and
F1-score. These tools help in assessing the performance of machine learning
models and selecting the best one for your specific task.

5. Community and Documentation: scikit-learn has a vibrant community of
users and contributors, which means you can find plenty of resources,
tutorials, and examples online. The library also offers comprehensive
documentation with detailed explanations of each function and example
code snippets to get you started.
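
The consistent interface and the evaluation tools described in the list above can be sketched in a few lines. The choice of the Iris dataset, the two models, and 5 folds are assumptions made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Different algorithms share the same estimator interface,
# so they can be swapped without changing the evaluation code
models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation; each fold reports classification accuracy
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```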

3. Install and set up scikit-learn and other necessary tools.

Install NumPy and pandas:
If you haven't installed NumPy and pandas yet, you can do so
by following the steps mentioned earlier:

pip install numpy
pip install pandas

Install scikit-learn:
Once you have NumPy and pandas installed, you can install scikit-learn using pip:

pip install scikit-learn

Verify installation:
After installing scikit-learn and any optional tools, you can verify the installation
by opening a Python interpreter or a Jupyter Notebook and trying to import scikit-learn.
Note that the package is installed as scikit-learn but imported as sklearn.
Start the interpreter:

python

Then, at the Python prompt, run:

import sklearn

Update packages (optional but recommended):
It's a good practice to regularly update your Python packages to ensure you have the latest
features and bug fixes. You can update the packages used in this lab using pip. This command
also installs matplotlib and Jupyter if they are not already present:

pip install --upgrade numpy pandas scikit-learn matplotlib jupyter




4. Write a program to load and explore a dataset from CSV and Excel files using pandas


import pandas as pd

def explore_dataset(file_path):
    # Check the file extension to determine the file type
    if file_path.lower().endswith('.csv'):
        # Load CSV file into a pandas DataFrame
        df = pd.read_csv(file_path)
    elif file_path.lower().endswith('.xlsx'):
        # Load Excel file into a pandas DataFrame (requires the openpyxl package)
        df = pd.read_excel(file_path)
    else:
        print("Unsupported file format. Please provide a CSV or Excel file.")
        return

    # Display basic information about the dataset
    print("Shape of the dataset:", df.shape)
    print("\nColumn names:")
    print(df.columns)
    print("\nFirst 5 rows of the dataset:")
    print(df.head())
    print("\nSummary statistics:")
    print(df.describe())
    print("\nInformation about the dataset:")
    df.info()  # info() prints its report directly and returns None

# Example usage
file_path = 'Cola.xlsx'  # Replace 'Cola.xlsx' with the path to your own CSV or Excel file
explore_dataset(file_path)

*Alert: replace 'Cola.xlsx' in the example usage with the name of your own CSV or Excel file, and give the full path if the file is not in the same folder as the program.
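
If no dataset is at hand, a small sample file is enough to try the loading and exploration calls. The sketch below writes a made-up table to a temporary CSV file and reads it back; the column names and values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, only for trying out the loading code
sample = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [23, 35, 41],
    "city": ["Pune", "Leeds", "Taipei"],
})

# Write the sample to a temporary CSV file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    sample.to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)

print("Shape:", loaded.shape)
print("Missing values per column:")
print(loaded.isna().sum())
```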

5. Write a program to visualize the dataset to gain insights using Matplotlib by plotting scatter plots and bar charts.

import matplotlib.pyplot as plt
import pandas as pd

# Read the CSV file
df = pd.read_csv('fifth.csv')

# Scatter plot
plt.figure(figsize=(10, 5))
plt.scatter(df['Age'], df['Income'], c=df['Education'], cmap='coolwarm', s=100, alpha=0.8)
plt.colorbar(label='Education Level')
plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Income vs Age')
plt.grid(True)
plt.show()

# Bar chart
plt.figure(figsize=(10, 5))
region_counts = df['Region'].value_counts()
region_counts.plot(kind='bar', color='lightgreen')
plt.xlabel('Regions')
plt.ylabel('Counts')
plt.title('Distribution of Regions')
plt.show()

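*Alert: use your own dataset, and change the file name and the column names ('Age', 'Income', 'Education', 'Region') to match your CSV file. The column passed to c= in the scatter plot must contain numbers.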
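
Before plotting a new file, it can help to confirm that the expected columns exist and that the ones used for axes and colors are numeric. The helper below is a hypothetical sketch (find_column_problems is not a pandas function), shown on made-up data.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def find_column_problems(df, numeric_cols, other_cols=()):
    """Return a list of readable problems; an empty list means the columns are usable."""
    problems = []
    # First pass: every requested column must be present
    for col in list(numeric_cols) + list(other_cols):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    # Second pass: columns used for axes or colors must hold numbers
    for col in numeric_cols:
        if col in df.columns and not is_numeric_dtype(df[col]):
            problems.append(f"column is not numeric: {col}")
    return problems


# Hypothetical data where Education is stored as text and Region is absent
example = pd.DataFrame({
    "Age": [25, 40],
    "Income": [30000, 52000],
    "Education": ["Bachelor", "Master"],
})
problems = find_column_problems(example, ["Age", "Income", "Education"], ["Region"])
print(problems)
```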

6. Write a program to handle missing data, encode categorical variables, and perform feature scaling.

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Sample dataset with missing data and categorical variables
data = {
    'age': [25, 30, None, 40, 45, 50, 55, 60],
    'income': [40000, 45000, 55000, None, 65000, 70000, None, 80000],
    'education': ['High School', 'Bachelor', 'Master', 'PhD', 'Bachelor', 'Master', 'High School', 'PhD'],
    'region': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']
}
df = pd.DataFrame(data)

# Define preprocessing steps
# Numerical features: impute missing values with the mean, then standardize
numeric_steps = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Categorical features: one-hot encode (the 0/1 columns are left unscaled)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_steps, ['age', 'income']),
        ('cat', OneHotEncoder(), ['education', 'region'])
    ],
    sparse_threshold=0  # always return a dense array
)

# Apply imputation, scaling, and encoding
transformed_data = preprocessor.fit_transform(df)

# Convert transformed data back to a DataFrame for inspection (optional).
# The column names come from the fitted transformer, so they match the
# encoder's category order (sorted alphabetically, not in order of appearance).
transformed_df = pd.DataFrame(transformed_data, columns=preprocessor.get_feature_names_out())
print(transformed_df)
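
To see what mean imputation and standardization do to the numbers, here is a small worked sketch on a single made-up age column. With the values 25, 30, 40, and 45 present, the mean is 35, so the missing entry becomes 35; standardization then subtracts the mean and divides by the standard deviation.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# One numerical column with a single missing value
ages = np.array([[25.0], [30.0], [np.nan], [40.0], [45.0]])

# Mean imputation: the missing entry is replaced by (25 + 30 + 40 + 45) / 4 = 35
imputed = SimpleImputer(strategy="mean").fit_transform(ages)
print(imputed.ravel())

# Standardization: the result has mean 0 and standard deviation 1
scaled = StandardScaler().fit_transform(imputed)
print(scaled.ravel().round(2))
```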

7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species)

# Split data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the k-NN classifier
k = 3  # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Train the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Evaluate classifier performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of k-NN classifier with k={k}: {accuracy:.2f}")

# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
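
The value k = 3 above is fixed by hand. One common way to choose it is to compare several values with cross-validation on the training data only. The sketch below assumes odd values of k from 1 to 15 and 5 folds; both choices are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Mean cross-validated accuracy for each candidate k, using training data only
cv_scores = {}
for k in range(1, 16, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()

# Pick the k with the highest mean accuracy (the smallest k wins ties)
best_k = max(cv_scores, key=cv_scores.get)
print(f"Best k by cross-validation: {best_k} (accuracy {cv_scores[best_k]:.3f})")
```

The test set is left untouched during this search so that the final accuracy estimate stays unbiased.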

8. Write a program to implement a linear regression model for regression tasks and train the model on a dataset with continuous target variables.

# Import necessary libraries
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic dataset for regression (you can replace this with your own dataset)
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)

# Split data into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the linear regression model
linear_reg = LinearRegression()

# Train the model on the training data
linear_reg.fit(X_train, y_train)

# Make predictions on the test data
y_pred = linear_reg.predict(X_test)

# Evaluate model performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R^2 Score: {r2:.2f}")

# Print coefficients and intercept
print("\nCoefficients:", linear_reg.coef_)
print("Intercept:", linear_reg.intercept_)
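
A quick way to check what the coefficient and intercept mean is to fit data generated from a known line. In this sketch the data follow y = 3x + 2 exactly (values chosen for illustration), so the fitted slope and intercept should come out very close to 3 and 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free data from the line y = 3x + 2
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

model = LinearRegression().fit(X, y)

# coef_ holds one slope per feature; intercept_ is the value predicted at x = 0
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```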

9. Write a program to implement a decision tree classifier using scikit-learn, visualize the decision tree, and understand its splits.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a decision tree classifier (max_depth=3 keeps the plot readable)
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_clf.fit(X_train, y_train)

# Evaluate the classifier on the test data
y_pred = tree_clf.predict(X_test)
print(f"Accuracy of decision tree: {accuracy_score(y_test, y_pred):.2f}")

# Visualize the tree: each node shows its split rule, impurity,
# number of samples, and the majority class
plt.figure(figsize=(12, 8))
plot_tree(tree_clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.title('Decision Tree on Iris Training Data')
plt.show()
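
To read a decision tree's splits as plain text, scikit-learn provides export_text. A minimal sketch on the Iris data follows; max_depth=2 is chosen only to keep the printed rules short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree so that the printed rules stay short
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(iris.data, iris.target)

# Each indented line is one split: a feature name, a threshold, and the branch taken
rules = export_text(tree_clf, feature_names=list(iris.feature_names))
print(rules)
```

Each path from the first line down to a "class:" line is one decision rule the tree applies to a sample.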

10. Write a program to implement K-Means clustering and visualize clusters.

import numpy as np

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate sample data
X, y = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Perform K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_

# Plot the results
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100, c='red', label='Centroid')
plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
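
The program above sets n_clusters=4 because the data were generated with four centers. When the number of clusters is unknown, one common heuristic is the elbow method: fit K-Means for several values of k and look for the point where the inertia (within-cluster sum of squared distances) stops dropping sharply. The range of k from 1 to 8 and n_init=10 are assumptions of this sketch.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same kind of sample data as in the program above
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)

# Fit K-Means for each candidate k and record the inertia
inertias = {}
for k in range(1, 9):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    inertias[k] = kmeans.inertia_
    print(f"k={k}: inertia={inertias[k]:.1f}")
```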

*Note: all the files are available at the link below.

https://mega.nz/folder/9tpknDYT#nHoMRHVoc4T9Z8kON9Mp9Q
