
A Practical Guide to Autoencoders

In a conventional neural network, one usually tries to predict a target vector $y$ from an input vector $x$. In an autoencoder, one tries to predict $x$ from $x$ itself. Learning a mapping from $x$ to $x$ is trivial if the network is unconstrained (it can simply learn the identity), but once the network is constrained, for example by forcing the data through a narrow hidden layer, the learning process becomes more interesting. In this article, we take a detailed look at the mathematics of different types of autoencoders (with different constraints), along with sample implementations using Keras with a TensorFlow backend.
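To make the effect of a constraint concrete, here is a minimal sketch in plain Python, not the Keras implementation from the full article: a linear autoencoder (no biases, no activations) with a one-unit bottleneck, trained by per-sample gradient descent on 2-D points that lie on the line $x_2 = 2x_1$. The function names and the hyperparameters (`lr`, `epochs`, `seed`) are our own illustrative choices.

```python
import random


def encode(W_e, x):
    """Code h = W_e x (W_e has shape code_dim x n)."""
    return [sum(w * xk for w, xk in zip(row, x)) for row in W_e]


def decode(W_d, h):
    """Reconstruction x_hat = W_d h (W_d has shape n x code_dim)."""
    return [sum(w * hj for w, hj in zip(row, h)) for row in W_d]


def mean_loss(W_e, W_d, data):
    """Average of 0.5 * ||x_hat - x||^2 over the data set."""
    total = 0.0
    for x in data:
        x_hat = decode(W_d, encode(W_e, x))
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_linear_autoencoder(data, code_dim=1, lr=0.05, epochs=300, seed=0):
    """Per-sample gradient descent on the reconstruction loss of a linear
    autoencoder whose hidden layer has only code_dim units."""
    rng = random.Random(seed)
    n = len(data[0])
    W_e = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(code_dim)]
    W_d = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(n)]
    for _ in range(epochs):
        for x in data:
            h = encode(W_e, x)
            err = [a - b for a, b in zip(decode(W_d, h), x)]  # x_hat - x
            # dLoss/dh, computed with the decoder weights *before* this update
            grad_h = [sum(W_d[i][j] * err[i] for i in range(n)) for j in range(code_dim)]
            for i in range(n):
                for j in range(code_dim):
                    W_d[i][j] -= lr * err[i] * h[j]
            for j in range(code_dim):
                for k in range(n):
                    W_e[j][k] -= lr * grad_h[j] * x[k]
    return W_e, W_d


# Toy data: 21 points on the line x2 = 2 * x1, for x1 from -1.0 to 1.0.
data = [[t, 2 * t] for t in (i / 10 - 1 for i in range(21))]
W_e0, W_d0 = train_linear_autoencoder(data, epochs=0)  # same seed: untrained weights
W_e, W_d = train_linear_autoencoder(data)
print("loss before:", mean_loss(W_e0, W_d0, data))
print("loss after: ", mean_loss(W_e, W_d, data))
print("reconstruct [0.5, 1.0]: ", decode(W_d, encode(W_e, [0.5, 1.0])))
print("reconstruct [2.0, -1.0]:", decode(W_d, encode(W_e, [2.0, -1.0])))
```

Because the code has a single dimension, every reconstruction is a multiple of the decoder's one column, so the network can only reproduce points along one line through the origin. It learns to reconstruct the training line well, but a point such as (2, -1), which lies off that line, cannot be reconstructed: the bottleneck forces the network to capture the structure of the data rather than copy its input.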

Reading Time: 22 minutes

Descriptive Statistics

One of the first tasks in any data science project is getting to know the data. This pays off in several ways: it helps us catch mistakes in the data, see patterns in the data, find violations of statistical assumptions, and generate hypotheses. We can think of this task as an exercise in summarizing the data. Two kinds of methods are commonly used to summarize its main characteristics: numerical and graphical.
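As a minimal sketch of the numerical side, using only Python's standard `statistics` module (the function name `summarize` and the sample values are made up for illustration), a basic summary reports the count, centre, spread and quartiles of one variable:

```python
import statistics


def summarize(values):
    """Numerical summary of one variable: count, centre, spread and quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


# Made-up values; the 60 looks like a possible data-entry mistake.
sample = [2, 4, 4, 4, 5, 5, 7, 9, 60]
for name, value in summarize(sample).items():
    print(f"{name:>6}: {value:.2f}")
```

Here the single value 60 pulls the mean (about 11.1) well above the median (5), and the maximum sits far beyond the third quartile; a gap like that is exactly the kind of signal that prompts a closer look for mistakes in the data.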

Reading Time: 13 minutes

Exploring Multiple Variables

In this post, we will continue to use the Pseudo Facebook data from Udacity, introduced in the previous post.

The data from the project is representative of a typical Facebook data set. You can load it with the following command; note that it is a tab-delimited (.tsv) file. The data set consists of roughly 99,000 rows. We will look at the details of the different columns using the command below.
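A minimal standard-library sketch of loading a tab-delimited file and summarizing one variable across two others (here, median friend count by age and gender). The column names `userid`, `age`, `gender` and `friend_count`, the file name `pseudo_facebook.tsv`, and the missing-value markers are assumptions; the rows below are made up so the example runs on its own.

```python
import csv
import io
import statistics
from collections import defaultdict

# Made-up rows in a tab-delimited layout (column names assumed). For the real
# data, pass open("pseudo_facebook.tsv", newline="") instead of the StringIO
# object; that file name is also an assumption.
SAMPLE_TSV = (
    "userid\tage\tgender\tfriend_count\n"
    "1\t20\tfemale\t300\n"
    "2\t20\tmale\t150\n"
    "3\t20\tfemale\t500\n"
    "4\t35\tmale\t90\n"
    "5\t35\tfemale\t120\n"
    "6\t35\tNA\t40\n"
)


def load_tsv(handle):
    """Read a tab-delimited file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def median_by_group(rows, keys, value, missing=("", "NA")):
    """Median of the `value` column for each combination of the `keys` columns,
    skipping rows whose grouping columns hold a missing-value marker."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in keys)
        if any(part in missing for part in key):
            continue
        groups[key].append(float(row[value]))
    return {key: statistics.median(vals) for key, vals in sorted(groups.items())}


rows = load_tsv(io.StringIO(SAMPLE_TSV))
print(len(rows), "rows; columns:", list(rows[0].keys()))
for key, med in median_by_group(rows, ["age", "gender"], "friend_count").items():
    print(key, med)
```

Medians are used instead of means because counts like these are typically skewed by a few very large values.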

Reading Time: 14 minutes

Pseudo Facebook Data - Exploring Two Variables

In this post, we will reuse the Pseudo Facebook data from Udacity, introduced in the previous post.

The data from the project is representative of a typical Facebook data set. You can load it with the following command; note that it is a tab-delimited (.tsv) file. The data set consists of roughly 99,000 rows. We will look at the details of the different columns using the command below.
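Two common ways to summarize the relationship between two variables are a correlation coefficient and the mean of one variable at each value of the other. A minimal pure-Python sketch of both follows; the function names and the (age, friend count) pairs are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def conditional_means(xs, ys):
    """Mean of y for each distinct value of x (e.g. mean friend count per age)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[x] = sums.get(x, 0.0) + y
        counts[x] = counts.get(x, 0) + 1
    return {x: sums[x] / counts[x] for x in sorted(sums)}


# Made-up (age, friend_count) pairs for illustration only.
ages = [18, 18, 25, 25, 40, 40]
friends = [400, 300, 250, 150, 100, 60]
print("r =", round(pearson(ages, friends), 3))
print(conditional_means(ages, friends))
```

Pearson's r only captures linear association, so it is worth looking at the conditional means (or a scatter plot) as well: a curved relationship can have a small r and still be strong.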

Reading Time: 10 minutes

Pseudo Facebook Data - Plots in Python

In this post, we will learn about exploratory data analysis (EDA) of single variables using simple plots such as histograms, frequency plots and box plots.

The data set used below is part of a project from Udacity's UD651 course, created by Facebook. The data from the project is representative of a typical Facebook data set. You can load it with the following command; note that it is a tab-delimited (.tsv) file. The data set consists of roughly 99,000 rows. We will look at the details of the different columns using the command below.
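Before plotting, it can help to see the numbers these single-variable plots are built from. A minimal sketch using only the standard library (function names and friend counts are made up for illustration): `binned_counts` gives histogram bar heights for a chosen bin width, and `box_plot_stats` computes the quartiles, whisker ends and outliers of a Tukey-style box plot.

```python
import statistics
from collections import Counter


def binned_counts(values, binwidth):
    """Counts per bin [k * binwidth, (k + 1) * binwidth), keyed by left edge.
    These are the bar heights of a histogram (or the points of a frequency polygon)."""
    counts = Counter((v // binwidth) * binwidth for v in values)
    return dict(sorted(counts.items()))


def box_plot_stats(values):
    """Quantities drawn by a Tukey-style box plot: the quartiles, whiskers that
    reach the most extreme points within 1.5 * IQR of the box, and the points
    beyond the whiskers, which are drawn individually as outliers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Made-up friend counts for illustration; one user has an unusually large count.
friend_counts = [0, 3, 12, 25, 31, 48, 52, 75, 90, 1200]
print(binned_counts(friend_counts, binwidth=25))
print(box_plot_stats(friend_counts))
```

Changing `binwidth` can change the apparent shape of a histogram considerably, so trying a few values is worthwhile. Plotting libraries may use a slightly different quartile convention, so their exact box and whisker positions can differ from these numbers.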

Reading Time: 6 minutes