Descriptive Statistics

One of the first tasks in any data science project is getting to understand the data. This is beneficial for several reasons: it helps us catch mistakes in the data, see patterns, find violations of statistical assumptions, generate hypotheses, and so on. We can think of this task as an exercise in summarizing the data. Two methods are commonly used to summarize its main characteristics: numerical and graphical.
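As a rough illustration of the numerical side, the sketch below computes a few common summary statistics for one numeric variable using only Python's standard `statistics` module. It is not the method used in the full post. The function name `numeric_summary` and the toy `ages` list are hypothetical, and the choice of the "inclusive" quartile method is an assumption (other conventions give slightly different quartiles).

```python
import statistics

def numeric_summary(values):
    """Count, mean, standard deviation and five-number summary of a numeric list."""
    data = sorted(values)
    # "inclusive" interpolates linearly between order statistics; other
    # quartile conventions can give slightly different Q1/Q3 values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical toy values, not taken from the Pseudo Facebook data.
ages = [13, 19, 21, 25, 25, 31, 44, 60]
for name, value in numeric_summary(ages).items():
    print(f"{name:>6}: {value:.2f}")
```

The five-number summary (min, Q1, median, Q3, max) is also what a box plot draws, which links the numerical and graphical approaches.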

Reading Time: 15 minutes

Exploring Multiple Variables

In this section, we will continue re-using the data from the previous post, based on the Pseudo Facebook data set from Udacity.

The data in the project resembles a typical data set at Facebook. You can load the data with the following command. Note that this is a tab-delimited TSV file. The data set consists of 99,000 rows. We will look at the details of the different columns using the command below.
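The loading command itself is in the full post. As a minimal sketch of why the tab delimiter matters, here is one way to read a tab-delimited file with Python's standard `csv` module. The helper `load_tsv`, the in-memory sample, and its column names are hypothetical stand-ins, not the post's actual code or data.

```python
import csv
import io

def load_tsv(file_obj):
    """Parse a tab-delimited file object into a list of dicts keyed by the header row."""
    # delimiter="\t" is the key setting: with the default comma delimiter,
    # a line containing no commas would be read as a single field.
    return list(csv.DictReader(file_obj, delimiter="\t"))

# Hypothetical two-row sample standing in for the real file; the column
# names are assumptions for illustration.
sample = "userid\tage\tfriend_count\n1\t14\t0\n2\t30\t120\n"
rows = load_tsv(io.StringIO(sample))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

With the real file (its name, for example `pseudo_facebook.tsv`, is an assumption here), you would pass `open(path, newline="")` to `load_tsv`, and `len(rows)` gives the row count to compare against the expected number of rows.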

Reading Time: 10 minutes

Pseudo Facebook Data - Exploring Two Variables

In this section, we will re-use the data from the previous post, based on the Pseudo Facebook data set from Udacity.

The data in the project resembles a typical data set at Facebook. You can load the data with the following command. Note that this is a tab-delimited (TSV) file. The data set consists of 99,000 rows. We will look at the details of the different columns using the command below.
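To give a flavor of what "the details of the different columns" can mean, the sketch below reports, for each column of rows read as strings, how many values are present, how many are distinct, and whether they all parse as numbers. This is a hypothetical helper, not the command used in the post; treating empty strings and `"NA"` as missing is an assumption, and the toy rows are invented for illustration.

```python
def describe_columns(rows):
    """Per column: non-missing count, distinct values, and whether all values are numeric."""
    missing_markers = ("", "NA")  # assumption: how missing values are written
    details = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows if row[name] not in missing_markers]
        numeric = True
        for v in values:
            try:
                float(v)
            except ValueError:
                numeric = False
                break
        details[name] = {
            "non_missing": len(values),
            "distinct": len(set(values)),
            "numeric": numeric,
        }
    return details

# Hypothetical toy rows, not taken from the Pseudo Facebook data.
rows = [
    {"userid": "1", "age": "14", "gender": "female"},
    {"userid": "2", "age": "30", "gender": "male"},
    {"userid": "3", "age": "NA", "gender": "male"},
]
for column, info in describe_columns(rows).items():
    print(column, info)
```

Knowing which columns are numeric and which are categorical helps decide which pairs of variables suit a scatter plot and which suit grouped summaries.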

Reading Time: 12 minutes

Pseudo Facebook Data - Plots in Python

In this post, we will learn about exploratory data analysis (EDA) of single variables using simple plots such as histograms, frequency plots, and box plots.

The data set used below is part of a project from the UD651 course on Udacity by Facebook. The data in the project resembles a typical data set at Facebook. You can load the data with the following command. Note that this is a tab-delimited (TSV) file. The data set consists of 99,000 rows. We will look at the details of the different columns using the command below.
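The post draws these plots with a plotting library. As a hedged, library-free sketch of what a histogram and a box plot actually compute, the code below tallies integer values into fixed-width bins (keeping empty bins) and derives box-plot statistics using the common Tukey convention of whiskers at 1.5 times the interquartile range, which is also matplotlib's default `whis=1.5`. The function names, bin width, and toy data are hypothetical.

```python
import statistics
from collections import Counter

def histogram_counts(values, bin_width):
    """Tally integer values into bins [start, start + bin_width), keeping empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0)
            for start in range(first, last + bin_width, bin_width)}

def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for a Tukey-style box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers reach the most extreme data points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical toy values, not taken from the Pseudo Facebook data.
ages = [13, 19, 21, 25, 25, 31, 44, 60]
width = 10
for start, n in histogram_counts(ages, width).items():
    print(f"{start:>3}-{start + width - 1:<3} {'#' * n}")
print(box_plot_stats(ages))

# A frequency plot of a categorical variable draws these counts as bars.
print(Counter(["female", "male", "male"]).most_common())
```

Changing `bin_width` can noticeably change the shape a histogram suggests, so it is worth trying a few widths during EDA.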

Reading Time: 8 minutes