A principal component analysis step-by-step

Apr 17, 2014

Sebastian Raschka offers a step-by-step tutorial for a principal component analysis in Python.

The main purposes of a principal component analysis are the analysis of data to identify patterns and finding patterns to reduce the dimensions of the dataset with minimal loss of information.

Here, our desired outcome of the principal component analysis is to project a feature space (our dataset consisting of n x d-dimensional samples) onto a smaller subspace that represents our data “well”. A possible application would be a pattern classification task, where we want to reduce the computational costs and the error of parameter estimation by reducing the number of dimensions of our feature space by extracting a subspace that describes our data “best”.

That is, imagine you have a dataset with a lot of variables, some of them important and some of them not so much. A PCA helps you identify which is which, so the source doesn’t seem so unwieldy or to reduce overhead.

Favorites

Top Brewery Road Trip, Routed Algorithmically

There are a lot of great craft breweries in the United States, but there is only so much time. This is the computed best way to get to the top rated breweries and how to maximize the beer tasting experience. Every journey begins with a single sip.

Interactive: When Do Americans Leave For Work?

We don’t all start our work days at the same time, despite what morning rush hour might have you think.

Divorce and Occupation

Some jobs tend towards higher divorce rates. Some towards lower. Salary also probably plays a role.

Reviving the Statistical Atlas of the United States with New Data

Due to budget cuts, there is no plan for an updated atlas. So I recreated the original 1870 Atlas using today’s publicly available data.