Linear Regression | Machine Learning Beginners

Hello Learners,

In this article, I will show you how to implement linear regression using the gradient descent method in Python.

We apply the regression to a data set containing the area of each house and its sale price, and we predict the sale price from the area.

Gradient descent is an iterative optimization algorithm for finding a minimum of a function: at each step it moves the parameters a small distance in the direction opposite to the gradient.
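To make the idea concrete before we apply it to regression, here is a minimal sketch of gradient descent on a one-variable function. The function f(w) = (w - 3)^2, the helper name gradient_descent_1d, and the step size are assumptions chosen for illustration only; they are not part of the house price model.

```python
def gradient_descent_1d(grad, start, alpha=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

# f(w) = (w - 3)^2 has derivative 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent_1d(lambda w: 2 * (w - 3), start=0.0)
print(w_min)  # very close to 3
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed number of steps is enough for this toy function.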

First of all, we need to import the required libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Next, we read the training data from a CSV file and check it by printing the first five rows with the head() method of the pandas DataFrame.

data = pd.read_csv('Train.csv')
data.head()

x	y
0	24	21.549452
1	50	47.464463
2	15	17.218656
3	38	36.586398
4	87	87.288984

Now we build the training arrays. In x_train we add a dummy first column filled with ones, so that theta[0] acts as the intercept; the second column holds the area values. The test set comes from a separate file and is loaded later.

# one row per training example; column 0 stays 1 for the intercept
x_train = np.ones((len(data), 2))
x_train[:,1],y_train = np.array(data['x']),np.array(data['y'])
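The reason for the column of ones is easier to see on a tiny example. The sketch below uses made-up inputs and a made-up theta_demo (both are assumptions, not values from Train.csv) to show that multiplying the matrix by theta gives intercept + slope * x for every row.

```python
import numpy as np

# Hypothetical toy inputs standing in for data['x'].
x = np.array([24.0, 50.0, 15.0])

# Column 0 is all ones (multiplies theta[0], the intercept);
# column 1 holds the feature (multiplies theta[1], the slope).
X = np.column_stack([np.ones(len(x)), x])

theta_demo = np.array([2.0, 0.5])  # hypothetical intercept and slope
print(X @ theta_demo)              # same values as 2.0 + 0.5 * x
```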

Next, we initialize theta to zero. From the training set we read the number of training examples (m) and the number of attributes. We also set the number of iterations we want to run and the learning rate alpha. The cost_history and theta_history lists store the cost and theta for each iteration so that we can plot them later.

theta = np.zeros(2)

m = x_train.shape[0]
no_of_attr = x_train.shape[1]
no_of_iteration = 20
alpha = 0.0001
cost_history=[]
theta_history=[]

The cost function computes the cost for an iteration: the sum of the squared errors divided by 2m.

def cost(error, m):
    J = np.dot(error.T,error)/(2*m)
    return J
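As a quick sanity check, here is the same cost function applied to a small hand-made error vector. The three error values are an assumption picked so the result can be verified by hand: (1 + 4 + 9) / (2 * 3).

```python
import numpy as np

def cost(error, m):
    # Sum of squared errors divided by 2m.
    return np.dot(error.T, error) / (2 * m)

error_demo = np.array([1.0, -2.0, 3.0])  # hypothetical prediction errors
J_demo = cost(error_demo, len(error_demo))
print(J_demo)  # equals 14 / 6
```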

Training The Model

Now we loop for the chosen number of iterations. Inside the loop we first initialize new_theta, which will hold the gradient sums, to zero. Then we take the dot product of the training data and theta and store it in y_predicted, and we find the error by subtracting y_train from y_predicted.

A nested loop over the attributes computes, for each attribute, the sum of the error multiplied by that attribute's column. Theta is then updated with the gradient descent rule: theta minus alpha times (1/m) times those sums.

for i in range(no_of_iteration):
    # new_theta holds the gradient sum for each parameter
    new_theta = np.zeros(no_of_attr)
    y_predicted = np.dot(x_train,theta)
    error = y_predicted - y_train
    for j in range(no_of_attr):
        new_theta[j] = np.dot(error,x_train[:,j])

    # gradient descent update
    theta = theta - (alpha) * (1/m)*new_theta

    # the cost recorded here uses the error from before the update
    cost_history.append(cost(error,m))
    theta_history.append(theta)
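The inner loop over attributes can also be written as a single matrix product. The sketch below is one possible alternative, not a change to the code above; it uses randomly generated toy data (an assumption) to check that both forms give the same gradient sums.

```python
import numpy as np

# Hypothetical toy data with the same layout as x_train and y_train.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.uniform(0, 100, size=5)])
y = rng.uniform(0, 100, size=5)
theta_demo = np.zeros(2)
error = X @ theta_demo - y

# Loop form, as in the training loop above.
loop_grad = np.zeros(2)
for j in range(2):
    loop_grad[j] = np.dot(error, X[:, j])

# Vectorized form: one product with the transposed matrix.
vec_grad = X.T @ error

print(np.allclose(loop_grad, vec_grad))
```

With this form the update step would read theta - alpha * (1/m) * (X.T @ error), which scales more conveniently if more attributes are added.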

Now we compute the predictions for the training set and print the slope and intercept of the fitted model. Finally, we plot the actual and predicted values.

final_prediction = theta[0] + theta[1]*x_train[:,1]
print('Slope ',theta[1])
print('Intercept ',theta[0])

# label each artist explicitly so the legend does not depend on draw order
plt.scatter(x_train[:,1],y_train,label='Actual')
plt.plot(x_train[:,1],final_prediction,color ='red',label='Predicted')
plt.legend()

Number of Iterations vs Cost

The code below plots the cost against the number of iterations, which helps us see when gradient descent has settled near the minimum. You can see that after iteration 7 there are only minor changes in the cost. So, roughly, we can say that the cost is close to its minimum by iteration 7 to 10.

noitr = np.arange(start=1, stop=no_of_iteration+1, step=1)
plt.plot(noitr,cost_history,color ='green')
plt.title('Cost Function')
plt.xlabel("Number Of Iteration")
plt.ylabel("Cost")
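Instead of reading the flat part off the plot, we could also detect it in code. The helper below is a hypothetical sketch: the name first_flat_iteration, the 1% tolerance, and the demo cost values are all assumptions, not something measured from the training run above.

```python
def first_flat_iteration(costs, rel_tol=0.01):
    """Return the 1-based iteration at which the cost first changes by
    less than rel_tol relative to the previous iteration, or None."""
    for i in range(1, len(costs)):
        prev, curr = costs[i - 1], costs[i]
        if prev != 0 and abs(prev - curr) / abs(prev) < rel_tol:
            return i + 1
    return None

demo_costs = [100.0, 40.0, 20.0, 19.9, 19.89]  # made-up cost values
print(first_flat_iteration(demo_costs))        # prints 4
```

Calling first_flat_iteration(cost_history) after training would give a number to compare against what the plot suggests.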

Testing The Model

Now we apply the trained model to the test set and make predictions. Finally, we plot the test data with the actual and predicted values.

testdata = pd.read_csv('Test.csv')
testdata

x_test,y_test = np.array(testdata['x']),np.array(testdata['y'])

test_prediction = theta[0] + theta[1]*x_test

# label each artist explicitly so the legend does not depend on draw order
plt.scatter(x_test,y_test,label='Actual')
plt.plot(x_test,test_prediction,color ='red',label='Predicted')
plt.legend()
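A plot gives a visual impression, but a number is often easier to compare between runs. The sketch below computes the mean squared error and the R-squared score with plain NumPy. The helper name mse_and_r2 and the small demo arrays are assumptions for illustration; in this article's setting we would pass y_test and test_prediction instead.

```python
import numpy as np

def mse_and_r2(y_true, y_pred):
    """Return (mean squared error, R-squared) for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)
    ss_res = np.sum(residual ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return mse, 1.0 - ss_res / ss_tot

# Hypothetical values: two exact predictions and one that is off by 1.
mse_demo, r2_demo = mse_and_r2([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(mse_demo, r2_demo)
```

An R-squared close to 1 would suggest that the fitted line explains most of the variation in the test prices.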

That's it. This is how we can apply linear regression to predict the sale price of a house based on its area.
