A Practical Guide to Tree-Based Learning Algorithms

Tree-based learning algorithms are quite common in data science competitions. They give predictive models high accuracy, stability, and ease of interpretation. Unlike linear models, they capture non-linear relationships well. Common examples of tree-based models are decision trees, random forests, and boosted trees.
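To make the non-linearity claim concrete, here is a minimal sketch (not code from the full post) of the core step a regression tree performs. It scans candidate thresholds on one feature and keeps the split that minimizes the total squared error of the two sides. The function names `sse` and `best_split` and the step-shaped sample data are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors of values around their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, total SSE) of the best single split on feature x."""
    pairs = sorted(zip(x, y))
    best_threshold, best_error = None, float("inf")
    # Candidate thresholds lie halfway between adjacent distinct x values
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for xv, yv in pairs if xv <= threshold]
        right = [yv for xv, yv in pairs if xv > threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


# Hypothetical step-shaped (non-linear) data: y jumps from about 1 to about 5
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.0, 1.1, 0.9, 1.0, 1.2, 5.0, 5.1, 4.9, 5.2, 5.0]
threshold, error = best_split(x, y)
print(f"split at x <= {threshold}, SSE = {error:.3f}")
```

A single straight line cannot follow the jump in this data, while one split recovers it. A full decision tree simply repeats this search recursively on each side; this brute-force version is quadratic in the number of samples, which is fine for a sketch but not how production libraries implement it.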

Reading Time: 25 minutes

Understanding Support Vector Machine via Examples

In the previous post on Support Vector Machines (SVM), we looked at the mathematical details of the algorithm. In this post, I will discuss practical implementations of SVM for both classification and regression. I will use the iris dataset as an example for the classification problem and randomly generated data as an example for the regression problem.

Reading Time: 18 minutes

Switching to Hugo from Nikola

I have been using Nikola to build this blog. It's a great static site generator built on Python. However, it has a huge number of dependencies (at least if you want a reasonable-looking site), and it uses reStructuredText (rst) as its primary language for content creation. Personally, I use Markdown for almost everything else: taking notes, keeping a diary, writing code documentation, etc. Furthermore, because Nikola tries to support almost every feature a static site generator could offer, it has lately become more and more bloated.

Reading Time: 10 minutes

Descriptive Statistics

One of the first tasks in any data science project is understanding the data. This is beneficial for several reasons: catching mistakes in the data, seeing patterns, finding violations of statistical assumptions, generating hypotheses, etc. We can think of this task as an exercise in summarizing the data. Two methods are commonly used to summarize its main characteristics: numerical and graphical.
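As a small illustration of the numerical approach, here is a sketch using Python's standard `statistics` module (`statistics.quantiles` needs Python 3.8 or later). The helper `summarize`, the sample values, and the use of Tukey's 1.5 × IQR fences to flag suspicious values are assumptions made for this example, not necessarily what the full post uses.

```python
import statistics
from typing import Dict, List


def summarize(data: List[float]) -> Dict[str, object]:
    """Hypothetical helper: numerical summary of a single variable."""
    # Quartiles with the module's default 'exclusive' method (Python 3.8+)
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Tukey's fences: values outside them are worth a second look
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "min": min(data),
        "max": max(data),
        "quartiles": (q1, q2, q3),
        "outliers": [v for v in data if v < low or v > high],
    }


# Hypothetical sample; 95 might be a data-entry mistake (a typo for 9 or 5?)
sample = [4, 2, 5, 4, 7, 95, 4, 9, 5]
for key, value in summarize(sample).items():
    print(f"{key:>9}: {value}")
```

Here the mean (15) is three times the median (5), which hints that a single extreme value is dragging the average; the outlier list points straight at 95. A box plot or histogram would show the same thing graphically.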

Reading Time: 15 minutes

My Arch Linux Setup with Plasma 5

Arch Linux is a general-purpose GNU/Linux distribution that provides the most up-to-date software by following a rolling-release model, so you can use cutting-edge software and packages as soon as the developers release them. KDE Plasma 5 is the current generation of the desktop environment created by KDE, primarily for Linux systems. In this post, we will do a complete installation of Arch Linux with Plasma 5 as the desktop environment.

Reading Time: 21 minutes

Python Tutorial - Week 2

In Week 1 we got started with Python. Now that we can interact with Python, let's dig deeper into it.

This week we will go over some more fundamentals common to almost any program: interactive input from users, comments in your code, conditional logic (i.e., if-else statements), loops, and formatted output with strings and print() statements.
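Here is a short sketch (not the tutorial's own code) that touches each Week 2 topic in one place: comments, `input()`, an if / elif / else chain, a `for` loop, and f-string formatting passed to `print()`. The function names `read_count`, `describe`, and `table` are hypothetical. The script uses a fixed count so it runs without a keyboard; swap in `read_count()` to make it interactive.

```python
# Comments start with a hash sign; Python ignores the rest of the line.
from typing import List


def read_count(prompt: str = "How many numbers? ") -> int:
    """Ask the user for a whole number."""
    # input() always returns a string, so convert it with int()
    return int(input(prompt))


def describe(n: int) -> str:
    """Classify n using conditional logic."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "even" if n % 2 == 0 else "odd"


def table(count: int) -> List[str]:
    """Build one formatted line per number from 1 to count."""
    lines = []
    for i in range(1, count + 1):
        # {i:>3} right-aligns i in a field 3 characters wide
        lines.append(f"{i:>3} squared is {i * i:>4} ({describe(i)})")
    return lines


if __name__ == "__main__":
    count = 5  # replace with read_count() to ask the user interactively
    for line in table(count):
        print(line)
```

Keeping the `input()` call inside its own function separates "getting data" from "processing data", which makes `describe` and `table` easy to test without typing anything.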

Reading Time: 16 minutes