Sunday, February 12, 2017

Predicting the Future!

Konnichiwa Mina San!! (Hello, everyone!!)


We are approaching 15,000 views for the Kaizen/Kaikaku Life!! So I have a treat: let's learn how to predict the future.

Mirror, mirror, on the wall, tell me what the future is for us all? The mirror replies: do your regression analysis and you will find out, you lazy bastard!

The best way to predict the future, other than using the advanced software found in big-data applications, is through regression analysis with as many data points as possible.

There are many types of regression analysis, and each has its purpose. I'll give a quick overview of the different types, then go over an example.


  1. Linear Regression: The most common type of linear regression model is the Ordinary Least Squares method. To use this model the data should be linear; however, if the data has some quadratic curvature, the model can capture it by transforming the variables (instead of the coefficients). Linear regression is also the oldest type of regression and is most useful with smaller data sets. It is better suited for interpolation and is not the best for predictive analytics because of its sensitivity to cross-correlations and outliers. A linear model that is better suited for more robust predictive analytics is piecewise-linear regression. The beauty is that both types of models can be built in Excel.
  2. Logistic Regression: Used primarily in clinical trials, scoring, and fraud detection, when the response is binary (chance of succeeding or failing, e.g. for a newly tested drug or a credit card transaction). It suffers the same drawbacks as linear regression (not robust, model-dependent), and computing the regression coefficients involves a complex, iterative, numerically unstable algorithm. Some versions (Poisson or Cox regression) have been designed for a non-binary response: categorical data (classification), ordered integer responses (age groups), and even continuous responses (regression trees).
  3. Ridge Regression: A more robust version of linear regression that puts constraints on the regression coefficients to make them more natural, less subject to over-fitting, and easier to interpret.
  4. Lasso Regression: Similar to ridge regression, but it automatically performs variable reduction by allowing regression coefficients to shrink all the way to 0.
  5. Power Law: Experience Curves are extremely helpful, and I have found them to be very accurate in predicting future prices and costs. Ln = L1N^(-a), where a = -log10(P)/log10(N) and P is the ratio of the Nth value to the first one. I have used this law a couple of times already this year when asked to predict the number of future defects in the system over time (there is a quick sketch right after this list). If you would like to know more about this, hit me up.
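
Since the Power Law is the one I actually reach for most, here is a minimal Python sketch of the Experience Curve; the release numbers and defect counts are made-up values just for illustration:

    import math

    # Experience curve: L_n = L_1 * N^(-a), with a = -log10(P) / log10(N),
    # where P is the ratio of the Nth value to the first one (P = L_N / L_1).

    L1 = 100.0   # made-up: defects found in release 1
    LN = 60.0    # made-up: defects found in release 4
    N = 4

    P = LN / L1                          # 0.6
    a = -math.log10(P) / math.log10(N)   # ~0.37

    def predict(n):
        # Predicted defect count for release n
        return L1 * n ** (-a)

    for n in (1, 2, 4, 8, 16):
        print(f"release {n:2d}: ~{predict(n):.1f} defects")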

Let's see a linear regression example:

What is the equation we will be using?
     y = ax + b

Where: a = slope of the line (rise over run: the change in y divided by the change in x)
             b = y-intercept

The equation for the slope with many data points can be calculated simply as:

     a = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

and the y-intercept as:

     b = (Σy - aΣx) / n

Let's take this data that I made up (remember that x always comes first in the parentheses).

I made up a very linear data set, so anyone can figure out in their head what the solution should be and test this linear model:

(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12), (7, 14)

Step 1: Count the number of x values: n = 7

Step 2: Find the xy and x² for each given value:

      x     y     xy    x²
      1     2      2     1
      2     4      8     4
      3     6     18     9
      4     8     32    16
      5    10     50    25
      6    12     72    36
      7    14     98    49

Step 3: Find the sum of each category:

     Σx = 28, Σy = 56, Σxy = 280, Σx² = 140

Step 4: Let's substitute the numbers into the slope equation:

     a = (7(280) - (28)(56)) / (7(140) - (28)²)
       = (1960 - 1568) / (980 - 784)
       = 392 / 196
       = 2

Step 5: Substitute into the intercept formula:

     b = (Σy - aΣx) / n
       = (56 - 2(28)) / 7
       = 0 / 7
       = 0

Step 6: Now substitute into the regression formula:

     y = ax + b
     y = 2x + 0

So here is where it gets good. There is an x in that equation, right?

What do we put there?

This is where we can start to tell the future! x is the mirror!! That sounded very Matrix-like.

The regression line on a graph is a mathematical model used to predict the value of y for a given x. Pretty cool, eh? So let's try this.

We will define x as 8; what will y end up being?

y = 2(8) + 0
y = 16

The reason I used such simple numbers is so we could figure out the solution in our heads and prove the equation works. Now you can trust this model with any linear data set.
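
If you would rather let the computer grind through Steps 1 to 6, here is a short Python sketch of the same least-squares arithmetic (standard library only, nothing to install):

    # Ordinary least squares by hand: the same sums as Steps 1-6 above.
    points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (6, 12), (7, 14)]

    n = len(points)                          # Step 1: n = 7
    sum_x = sum(x for x, _ in points)        # Step 3: Σx  = 28
    sum_y = sum(y for _, y in points)        # Step 3: Σy  = 56
    sum_xy = sum(x * y for x, y in points)   # Step 3: Σxy = 280
    sum_x2 = sum(x * x for x, _ in points)   # Step 3: Σx² = 140

    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope: 2.0
    b = (sum_y - a * sum_x) / n                                   # intercept: 0.0

    print(f"y = {a}x + {b}")                  # y = 2.0x + 0.0
    print("prediction at x = 8:", a * 8 + b)  # 16.0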

See how simple this can be? This can be done for quadratic data as well as linear.

A quadratic regression is the process of finding the equation of the parabola that best fits a set of data. As a result, we get an equation of the form:

     y = ax² + bx + c, where a ≠ 0

I got my quadratic regression example from Hotmath.com because I thought it was a simple, straightforward one; check it out there for the full worked problem.
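
If you want to try a quadratic fit yourself, here is a minimal sketch using NumPy's polyfit; the data points are ones I made up to sit near the parabola y = x², not the Hotmath.com example:

    import numpy as np

    # Made-up points that lie near the parabola y = x^2
    xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    ys = np.array([1.1, 3.9, 9.2, 15.8, 25.1])

    # Least-squares fit of y = ax^2 + bx + c (a degree-2 polynomial)
    a, b, c = np.polyfit(xs, ys, 2)
    print(f"y = {a:.3f}x^2 + {b:.3f}x + {c:.3f}")

    # Predict the future: evaluate the fitted parabola at x = 6
    print("prediction at x = 6:", np.polyval([a, b, c], 6))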