A principal component analysis step-by-step

Posted to Statistics  |  Tags:  |  Nathan Yau

Sebastian Raschka offers a step-by-step tutorial for a principal component analysis in Python.

The main purposes of a principal component analysis are the analysis of data to identify patterns and finding patterns to reduce the dimensions of the dataset with minimal loss of information.

Here, our desired outcome of the principal component analysis is to project a feature space (our dataset consisting of n x d-dimensional samples) onto a smaller subspace that represents our data “well”. A possible application would be a pattern classification task, where we want to reduce the computational costs and the error of parameter estimation by reducing the number of dimensions of our feature space by extracting a subspace that describes our data “best”.

That is, imagine you have a dataset with a lot of variables, some of them important and some of them not so much. A PCA helps you identify which is which, so the source doesn’t seem so unwieldy or to reduce overhead.

Favorites

Causes of Death

There are many ways to die. Cancer. Infection. Mental. External. This is how different groups of people died over the past 10 years, visualized by age.

The Changing American Diet

See what we ate on an average day, for the past several decades.

Graphical perception – learn the fundamentals first

Before you dive into the advanced stuff – like just about everything in your life – you have to learn the fundamentals before you know when you can break the rules.

Who is Older and Younger than You

Here’s a chart to show you how long you have until you start to feel your age.