Free Will

Am I there?

Sometimes in your life you feel the need to ponder the hard questions. Today's question, and really the question of the last few months, has been "Do we have free will?" The answer is currently "I don't know, but I don't think so." This is my attempt to put …

What is Gradient Descent?

Fitting a machine learning model means finding optimal values for the model's parameters. Sometimes this can be done in one step (when a closed-form solution is available), but more often than not, even when such a solution exists, computing it is prohibitively expensive. That is why most …
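
As a rough illustration of the iterative alternative to a closed-form solution, here is a minimal sketch of gradient descent fitting a straight line by minimizing mean squared error. The function name, the toy data, the learning rate, and the step count are all assumptions chosen for illustration, not taken from the post.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point at the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move a small step in the direction that lowers the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

On this toy data the parameters approach the slope 2 and intercept 1 used to generate it. A learning rate that is too large would make the updates diverge instead.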

What's so naive about naive Bayes?

Cover art: Naive art by Ivan Generalic

Naive Bayes (NB) is 'naive' because it makes the assumption that features of a measurement are independent of each other. This is naive because it is (almost) never true. Here is why NB works anyway.
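
In symbols, the independence assumption is commonly written as below. The notation is an assumption for illustration ($y$ for the class, $x_1, \dots, x_n$ for the features of one measurement) and is not taken from the post.

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The product on the right is only valid if the features are conditionally independent given the class, which is exactly the 'naive' part.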

NB is a very intuitive classification algorithm. It …

Entropy and Information Gain

Neither 'entropy' nor 'information' is a concept with a very intuitive definition. Most people learn about entropy in chemistry class, where it is used to describe the amount of 'order' in a system. But how do you translate 'order' into a mathematical equation? And what about information?
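
One common way to turn 'order' into an equation is Shannon entropy, $H = -\sum_i p_i \log_2 p_i$, where $p_i$ is the fraction of items in class $i$. The sketch below assumes this is the definition meant here; the function name and the example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum -p * log2(p) over the observed classes.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


mixed = entropy(["a", "a", "b", "b"])  # evenly split between two classes
pure = entropy(["a", "a", "a", "a"])   # a single class only
```

A perfectly 'ordered' set with one class has zero entropy, while an even split between two classes has an entropy of one bit.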

In data science the terms …

What is Logistic Regression?

Logistic Regression is closely related to Linear Regression. Read my post on Linear Regression here

Logistic Regression is a classification technique, meaning that the target Y is qualitative instead of quantitative. An example is predicting whether a customer will stop doing business with you, a.k.a. churn.
Logistic …

What is Linear Regression?

Linear regression is used to model the relationship between continuous variables, for example to predict the price of a house from features such as its size in square meters and the crime rate in the neighborhood. A linear regression function takes the form

$$\hat{y}=\hat{\beta_0}+\hat{\beta_1}x_1 …

What is k-Nearest Neighbors?

k-Nearest Neighbors, or kNN, is a classification algorithm that assigns a class to a new data point based on known data points that are similar, or nearby.
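
The idea can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance as the similarity measure and a simple majority vote among the k closest points. The function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each known point's distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k nearest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote decides the class.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
near_a = knn_predict(points, labels, (2, 2))
near_b = knn_predict(points, labels, (6, 5))
```

Choosing an odd k avoids tied votes when there are only two classes.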

What do you mean 'nearby'?
To determine the similarity of data points you can use a few different distance measures. For example Euclidean distance, which …

Bayes and Binomial Theorems

Bayes Theorem
In statistics there are many situations where you want to determine the probability that a sample for which you have certain measurements belongs to a certain set. Say you want to know the chance that you have HIV if you test positive. No test is perfect, so this …
