A Practical guide to Autoencoders

Usually in a conventional neural network, one tries to predict a target vector $y$ from input vectors $x$. In an auto-encoder network, one tries to predict $x$ from $x$. It is trivial to learn a mapping from $x$ to $x$ if the network has no constraints, but if the network is constrained the learning process becomes more interesting. In this article, we are going to take a detailed look at the mathematics of different types of autoencoders (with different constraints) along with a sample implementation of it using Keras, with a tensorflow back-end.

Reading Time: 22 minutes       Read more…

Understanding Boosted Trees Models

In the previous post, we learned about tree based learning methods - basics of tree based models and the use of bagging to reduce variance. We also looked at one of the most famous learning algorithms based on the idea of bagging- random forests. In this post, we will look into the details of yet another type of tree-based learning algorithms: boosted trees.

Reading Time: 26 minutes       Read more…

A Practical Guide to Tree Based Learning Algorithms

Tree based learning algorithms are quite common in data science competitions. These algorithms empower predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. Common examples of tree based models are: decision trees, random forest, and boosted trees.

Reading Time: 25 minutes       Read more…

Understanding Support Vector Machine via Examples

In the previous post on Support Vector Machines (SVM), we looked at the mathematical details of the algorithm. In this post, I will be discussing the practical implementations of SVM for classification as well as regression. I will be using the iris dataset as an example for the classification problem, and a randomly generated data as an example for the regression problem.

Reading Time: 12 minutes       Read more…

Switching to Hugo from Nikola

I have been using Nikola to build this Blog. Its a great static site build system that is based on Python. However, It has some crazy amount of dependencies (to have reasonable looking site). It uses restructured text (rst) as the primary language for content creation. Personally, I use markdown for almost every thing else - taking notes, making diary, code documentation etc. Furthermore, given Nikola tries to support almost everything in a static site builder, lately its is becoming more and more bloated.

Reading Time: 8 minutes       Read more…

Descriptive Statistics

One of the first tasks involved in any data science project is to get to understand the data. This can be extremely beneficial for several reasons: Catch mistakes in data See patterns in data Find violations of statistical assumptions Generate hypotheses etc. We can think of this task as an exercise in summarization of the data. To summarize the main characteristics of the data, often two methods are used: numerical and graphical.

Reading Time: 13 minutes       Read more…

My Arch Linux Setup with Plasma 5

Arch Linux is a general purpose GNU/Linux distribution that provides most up-to-date software by following the rolling-release model. Arch Linux allows you to use updated cutting-edge software and packages as soon as the developers released them. KDE Plasma 5 is the current generation of the desktop environment created by KDE primarily for Linux systems. In this post, we will do a complete installation of Arch Linux with Plasma 5 as the desktop environment.

Reading Time: 21 minutes       Read more…