Understanding Logistic Regression

Today we will learn about a probabilistic classification algorithm known as logistic regression, and walk through its implementation.

Logistic Regression: Concept & Application (Source: dimensionless.in)

Logistic regression is a supervised machine learning algorithm. Its name is a bit misleading, as it is used for binary classification; the “regression” part stems from its similarity to linear regression. As we have learnt, linear regression revolves around finding a regression line that “fits” our dataset, which means finding the line parameters that give the minimum value of the loss function. Logistic regression is pretty similar. However, it adds just one more thing: the sigmoid (also called logistic) function, which takes any value and returns a number lying between 0 and 1.

Why not simply use linear regression for classification? Because the regression line can be affected a lot by outliers. Let’s take an example. In the image given below, without outliers the regression line is pretty acceptable: if we take a threshold value like y = 0.5, then any point with a predicted y value ≥ 0.5 is classified in the “Malignant” group; otherwise it is classified as “Benign”.

But with the introduction of an outlier, we see that in order to decrease the loss function and “fit” the dataset better, the regression line must adjust itself. This causes some points to be misclassified, as shown below.

We can see that the points circled in blue are now classified incorrectly. Another flaw is that linear regression might produce a y value > 1 or < 0, in which case we cannot determine which class our data point should lie in.

So, apart from the procedure followed in linear regression, logistic regression has one small addition: the sigmoid function, which normalizes the dot product of θ (the line parameters) and x (the data point). This is important because, as we saw above, outliers can drastically change the line and lead to misclassifications.

The sigmoid function is defined as

σ(z) = 1 / (1 + e⁻ᶻ)

where z = θᵀx. (Sources: oreilly.com, en.wikipedia.org)

The value of σ(z) always lies between 0 and 1, which squashes extreme values of z and thus nullifies the effect that outliers might otherwise have.
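As a quick NumPy sketch of this function (the name sigmoid is my own; the original code is not shown here):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value (or array of values) into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))
```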

So the hypothesis of logistic regression, hθ(x) = σ(θᵀx), gives the probability of our data point belonging to a certain class.

After calculating the probability from σ(z), we use a threshold value (generally 0.5) to make the final prediction: for σ(z) ≥ 0.5 the data point belongs to class y = 1; otherwise it belongs to y = 0.

This also implies that σ(z) ≥ 0.5 exactly when z = θᵀx ≥ 0, and similarly σ(z) < 0.5 when z < 0.

To fit θ, logistic regression uses the cross-entropy (log) cost:

Cost = −y·log(hθ(x)) − (1 − y)·log(1 − hθ(x))

The above cost function is convex, which makes it easy for gradient descent to converge to the point where the cost is minimum. Notice that if y = 1 the second term vanishes, while for y = 0 the first term vanishes.

For a confident correct classification the cost is close to 0, while a confident wrong one is penalized with a cost tending to ∞. This happens because for y = 1 and hθ(x) very close to 0, −log(hθ(x)) → ∞; similarly, for y = 0 and hθ(x) very close to 1, −log(1 − hθ(x)) → ∞.
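A minimal sketch of this cost in NumPy, assuming y and the hypothesis values are column vectors (the function name and the eps guard are my own):

```python
def cost(theta, X, y):
    # Cross-entropy cost averaged over all training examples
    h = sigmoid(X @ theta)      # hypothesis h_theta(x) for every row of X
    eps = 1e-15                 # guard against log(0) for fully confident predictions
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```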

Differentiating this cost with respect to θ gives a remarkably simple gradient:

∂(Cost)/∂θ = (hθ(x) − y)·x

(Derivation: stats.stackexchange.com)
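Vectorized over all rows of X, this gradient can be sketched as follows (the function name is an assumption, not from the original code):

```python
def gradient(theta, X, y):
    # Average of (h_theta(x) - y) * x over the whole training set
    return X.T @ (sigmoid(X @ theta) - y) / len(y)
```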
Now let’s put the pieces together into a full implementation. The first step is normalization of the dataset.
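A common way to do this (a sketch; the original normalization code is not shown) is to standardize each feature:

```python
def normalize(X):
    # Give every feature zero mean and unit variance
    return (X - X.mean(axis=0)) / X.std(axis=0)
```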

alpha and epoch are the learning rate and the number of iterations, respectively.

r and c store the number of rows and columns of the dataset X, respectively.

theta (the parameter vector) is initialized with ones and has shape (c, 1).
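Putting this setup together (the concrete values of alpha and epoch below are assumptions, since the original snippet is not shown):

```python
alpha, epoch = 0.1, 1000    # learning rate and number of iterations (assumed values)
r, c = X.shape              # number of rows and columns of the dataset X
theta = np.ones((c, 1))     # parameter vector initialized with ones
```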

At each iteration, we calculate the hypothesis and take a gradient descent step:
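A sketch of that training loop, using the sigmoid and gradient helpers defined above:

```python
for _ in range(epoch):
    # Move theta a step of size alpha against the gradient of the cost
    theta -= alpha * gradient(theta, X, y)
```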

To predict labels for our test set:

First, find the hypothesis using the theta learned from the training dataset, applied to the test dataset. This gives the probability for each row in our testing data. Then, for each row, check whether the probability is greater than or equal to our threshold value (i.e., 0.5). If it is, the predicted value is 1; otherwise it is 0.
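As a sketch (X_test is an assumed name for the test matrix):

```python
def predict(theta, X_test):
    probs = sigmoid(X_test @ theta)    # probability for each test row
    return (probs >= 0.5).astype(int)  # threshold at 0.5 to get class labels
```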

For every match between our predicted and actual y labels, increment the correct variable, then find the accuracy percentage by dividing the value stored in correct by the number of rows in our test set.
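In NumPy this bookkeeping collapses to a couple of lines (y_test is an assumed name for the actual test labels):

```python
y_pred = predict(theta, X_test)
correct = np.sum(y_pred == y_test)      # matches between predicted and actual labels
accuracy = correct / len(y_test) * 100  # percentage of correct predictions
```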

So, that wraps up the blog. I hope you have learnt something from it.

Thank you for taking some time off your day to read this blog. Have a wonderful day 😄!!!
