Logistic Regression Program Rabla
Logistic Regression from Scratch in Python

5 minute read

In this post, I'm going to implement standard logistic regression from scratch. Logistic regression is a generalized linear model that we can use to model or predict categorical outcome variables. For example, we might use logistic regression to predict whether someone will be denied or approved for a loan, but probably not to predict the value of someone's house.

So, how does it work? In logistic regression, we're essentially trying to find the weights that maximize the likelihood of producing our given data and use them to categorize the response variable. Maximum Likelihood Estimation is a well-covered topic in statistics courses (my Intro to Statistics professor has a straightforward, high-level description), and it is extremely useful.

Since the likelihood maximization in logistic regression doesn't have a closed-form solution, I'll solve the optimization problem with gradient ascent. Gradient ascent is the same as gradient descent, except I'm maximizing instead of minimizing a function. Before I do any of that, though, I need some data.

Generating Data

As in an earlier post, I'm going to use simulated data.
I can easily simulate separable data by sampling from a multivariate normal distribution.

    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline  # Jupyter notebook magic; omit when running as a plain script

    np.random.seed(12)
    numobservations = 5000

    # Two Gaussian clusters with the same covariance and different means
    x1 = np.random.multivariate_normal([0, 0], [[1, .75], [.75, 1]], numobservations)
    x2 = np.random.multivariate_normal([1, 4], [[1, .75], [.75, 1]], numobservations)

    simulatedseparableishfeatures = np.vstack((x1, x2)).astype(np.float32)
    simulatedlabels = np.hstack((np.zeros(numobservations), np.ones(numobservations)))

Let's see how it looks.

The model also needs the sigmoid function, which maps a score to a probability between 0 and 1:

    def sigmoid(scores):
        return 1 / (1 + np.exp(-scores))

Maximizing the Likelihood

To maximize the likelihood, I need equations for the likelihood and the gradient of the likelihood.
Fortunately, the likelihood (for binary classification) can be reduced to a fairly intuitive form by switching to the log-likelihood. We're able to do this without affecting the weight estimates because the log transformation is monotonic.

For anyone interested in the derivations of the functions I'm using, check out Section 4.4.1 of Hastie, Tibshirani, and Friedman's The Elements of Statistical Learning. For those less comfortable reading math, Carlos Guestrin (University of Washington) has a fantastic walkthrough of one possible formulation of the likelihood and gradient in a series of short lectures.

Calculating the Log-Likelihood

The log-likelihood can be viewed as a sum over all the training data. Mathematically,

$$ll = \sum_{i=1}^{N} \left[ y_i \beta^T x_i - \log\left(1 + e^{\beta^T x_i}\right) \right]$$

where $y_i$ is the target class (0 or 1), $x_i$ is an individual data point, and $\beta$ is the weights vector.

I can easily turn that into a function and take advantage of matrix algebra.

    def loglikelihood(features, target, weights):
        scores = np.dot(features, weights)
        ll = np.sum(target * scores - np.log(1 + np.exp(scores)))
        return ll

Calculating the Gradient

Now I need an equation for the gradient of the log-likelihood. By taking the derivative of the equation above and reformulating in matrix form, the gradient becomes:

$$\nabla ll = X^T \left( Y - \sigma(X\beta) \right)$$

where $X$ is the feature matrix, $Y$ is the vector of targets, and $\sigma(X\beta)$ is the vector of predictions, that is, the sigmoid applied to the scores.

Like the other equation, this is really easy to implement. It's so simple I don't even need to wrap it into a function.

Building the Logistic Regression Function

Finally, I'm ready to build the model function. I'll add in the option to calculate the model with an intercept, since it's a good option to have.
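Two ingredients go into the function: an intercept, implemented as a column of ones, and the gradient formula from the previous section. As a rough sanity check, the sketch below builds the intercept column on hypothetical toy data (the `toy_*` names and the random data are my own assumptions, not part of the post's dataset) and compares the analytic gradient with a finite-difference estimate of the log-likelihood's slope.

```python
import numpy as np

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

def loglikelihood(features, target, weights):
    scores = np.dot(features, weights)
    return np.sum(target * scores - np.log(1 + np.exp(scores)))

# Hypothetical toy data, used only for this check
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(20, 2))
toy_target = (rng.random(20) < 0.5).astype(float)

# Intercept: prepend a column of ones so that weights[0] acts as the bias term
toy_features = np.hstack((np.ones((toy_features.shape[0], 1)), toy_features))
toy_weights = rng.normal(size=toy_features.shape[1])

# Analytic gradient: X^T (y - sigmoid(X w))
analytic = np.dot(toy_features.T,
                  toy_target - sigmoid(np.dot(toy_features, toy_weights)))

# Central finite differences of the log-likelihood, one weight at a time
eps = 1e-6
numeric = np.zeros_like(toy_weights)
for j in range(len(toy_weights)):
    step = np.zeros_like(toy_weights)
    step[j] = eps
    numeric[j] = (loglikelihood(toy_features, toy_target, toy_weights + step)
                  - loglikelihood(toy_features, toy_target, toy_weights - step)) / (2 * eps)

gradients_agree = np.allclose(analytic, numeric, atol=1e-5)
print(gradients_agree)
```

If the two gradients agree, the matrix expression is at least consistent with the log-likelihood function it is supposed to differentiate.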
    def logisticregression(features, target, numsteps, learningrate, addintercept=False):
        if addintercept:
            # Prepend a column of ones so the first weight acts as the intercept
            intercept = np.ones((features.shape[0], 1))
            features = np.hstack((intercept, features))

        weights = np.zeros(features.shape[1])

        for step in range(numsteps):
            scores = np.dot(features, weights)
            predictions = sigmoid(scores)

            # Update weights with gradient
            outputerrorsignal = target - predictions
            gradient = np.dot(features.T, outputerrorsignal)
            weights += learningrate * gradient

            # Print log-likelihood every so often
            if step % 10000 == 0:
                print(loglikelihood(features, target, weights))

        return weights

Time to run the model.

    435.-162-157-155

Comparing with Sk-Learn's LogisticRegression

How do I know if my algorithm spit out the right weights? Well, on the one hand, the math looks right, so I should be confident it's correct. On the other hand, it would be nice to have a ground truth.

Fortunately, I can compare my function's weights to the weights from sk-learn's logistic regression function, which is known to be a correct implementation.
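A comparison along these lines could be sketched as follows. This is not the post's original code: the helper `gradient_ascent`, the smaller sample size, and the closer class means (`[1, 2]` instead of `[1, 4]`) are assumptions I picked so the sketch converges in about a second. The very large `C` stands in for "no regularization", and `tol` and `max_iter` are tightened only to make the solver run closer to convergence.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

def gradient_ascent(features, target, num_steps, learning_rate):
    """Plain gradient ascent on the log-likelihood, with an intercept column."""
    features = np.hstack((np.ones((features.shape[0], 1)), features))
    weights = np.zeros(features.shape[1])
    for _ in range(num_steps):
        predictions = sigmoid(np.dot(features, weights))
        weights += learning_rate * np.dot(features.T, target - predictions)
    return weights

# Assumption: smaller, more overlapping data than in the post, for a quick run
rng = np.random.default_rng(12)
n = 500
cov = [[1, .75], [.75, 1]]
features = np.vstack((rng.multivariate_normal([0, 0], cov, n),
                      rng.multivariate_normal([1, 2], cov, n)))
labels = np.hstack((np.zeros(n), np.ones(n)))

my_weights = gradient_ascent(features, labels, num_steps=20000, learning_rate=5e-4)

# A huge C makes the L2 penalty negligible
clf = LogisticRegression(fit_intercept=True, C=1e15, tol=1e-8, max_iter=1000)
clf.fit(features, labels)
sk_weights = np.hstack((clf.intercept_, clf.coef_.ravel()))

print(my_weights)
print(sk_weights)
```

Both printed vectors are ordered as intercept first, then one weight per feature, so they can be compared element by element.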
They should be the same if I did everything correctly. Since sk-learn's LogisticRegression automatically does L2 regularization (which I didn't do), I set C=1e15 to essentially turn off regularization.

    7 -5.02712572 8.23286799
    -1 -5.05899648 8.28955762

As expected, my weights nearly perfectly match the sk-learn LogisticRegression weights. If I trained the algorithm longer and with a small enough learning rate, they would eventually match exactly, because gradient ascent on a concave function converges to the global optimum given enough time and a sufficiently small learning rate.

What's the Accuracy?

To get the accuracy, I just need to use the final weights to get the logits for the dataset (finalscores). Then I can use sigmoid to get the final predictions and round them to the nearest integer (0 or 1) to get the predicted class.
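The steps just described can be sketched on a tiny example. The features, labels, and weights below are made up for illustration and stand in for the simulated data and the trained weights; only the names `finalscores` and `preds` follow the post.

```python
import numpy as np

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

# Hypothetical stand-ins for the simulated data and the trained weights
features = np.array([[0.0, 0.0],
                     [1.0, 4.0],
                     [-1.0, 0.5],
                     [2.0, 3.5],
                     [1.0, 2.0]])
labels = np.array([0, 1, 0, 1, 1])
weights = np.array([-4.0, -1.0, 2.0])  # [intercept, w1, w2], chosen by hand

# Prepend the intercept column so the features line up with the weights
data_with_intercept = np.hstack((np.ones((features.shape[0], 1)), features))

# Logits, then probabilities, then rounded class predictions
finalscores = np.dot(data_with_intercept, weights)
preds = np.round(sigmoid(finalscores))

# Accuracy is the fraction of predictions that equal the labels
accuracy = np.mean(preds == labels)
print('Accuracy: {0}'.format(accuracy))
```

In this toy example the last point has a negative score although its label is 1, so four of the five predictions are correct.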
    # preds holds the rounded class predictions described above
    plt.figure(figsize=(12, 8))
    plt.scatter(simulatedseparableishfeatures[:, 0], simulatedseparableishfeatures[:, 1],
                c=preds == simulatedlabels - 1, alpha=.8, s=50)

Conclusion

In this post, I built a logistic regression function from scratch and compared it with sk-learn's logistic regression function. While both functions give essentially the same result, my own function is significantly slower because sklearn uses a highly optimized solver. While I'd probably never use my own algorithm in production, building algorithms from scratch makes it easier to think about how you could design extensions to fit more complex problems or problems in new domains.

For those interested, a Jupyter Notebook with all the code is available for this post.

Updated: November 05, 2016