# Descriptive Statistics

Reading Time: 15 minutes


# EDA of Lending Club Data - II

# EDA of Lending Club Data

# Exploring Multiple Variables

# Pseudo Facebook Data - Exploring Two Variables

# Pseudo Facebook Data - Plots in Python

# Reddit Survey: Introduction to Pandas

One of the first tasks in any data science project is understanding the data. This is beneficial for several reasons, including:

- Catch mistakes in the data
- See patterns in the data
- Find violations of statistical assumptions
- Generate hypotheses

We can think of this task as an exercise in summarizing the data. Two kinds of methods are commonly used to summarize its main characteristics: numerical and graphical.
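As a minimal sketch of the numerical side of such a summary, the snippet below uses pandas on a small series. The values and the `ages` name are hypothetical and made up for illustration; they do not come from the post's data.

```python
import pandas as pd

# Hypothetical sample values, made up for illustration only.
ages = pd.Series([23, 25, 31, 35, 35, 41, 58], name="age")

# Numerical summary: count, mean, std, min, quartiles, max.
summary = ages.describe()
print(summary)

# Individual statistics are also available directly.
mean_age = ages.mean()
median_age = ages.median()
print(f"mean={mean_age:.2f}, median={median_age}")
```

A gap between the mean and the median is often a first hint of skew or outliers, which is the kind of pattern this summarization step is meant to surface.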

Reading Time: 15 minutes

In the last post we looked at some initial cleanup of the data. We will pick up from there by loading the pickled DataFrame.
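Loading a pickled DataFrame might look like the sketch below. The file name, the column names and the tiny frame are all hypothetical stand-ins, since the excerpt does not give the real path or schema; the frame is written first only so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned frame standing in for the output of the earlier post.
cleaned = pd.DataFrame({"loan_amnt": [5000, 12000], "grade": ["B", "A"]})

# Hypothetical file name; the real path is not given in the post.
path = os.path.join(tempfile.mkdtemp(), "loans_cleaned.pkl")
cleaned.to_pickle(path)       # what an earlier cleanup step might have saved

loans = pd.read_pickle(path)  # reload with column dtypes preserved
print(loans.dtypes)
```

Unlike a round trip through CSV, pickling keeps dtypes such as categoricals and datetimes intact, which is presumably why the cleaned data was stored this way.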

Reading Time: 15 minutes

We will first look at various aspects of the LendingClub data using techniques of Exploratory Data Analysis (EDA).
See my earlier post for further details on EDA techniques.
The data files for this analysis have already been downloaded to the current folder.

Reading Time: 10 minutes

In this section, we will continue to reuse the Pseudo Facebook data from Udacity that we used in the previous post.

The data from the project corresponds to a typical data set at Facebook. You can load the data through the following command. Notice that this is a tab-delimited (*tsv*) file. This data set consists of 99,000 rows of data. We will see the details of different columns using the command below.
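Reading a tab-delimited file with pandas can be sketched as follows. The inline sample and its column names are hypothetical, used only so the snippet is self-contained; with the real data, a file path would replace the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample; column names are illustrative only.
raw = (
    "userid\tage\tgender\tfriend_count\n"
    "1\t14\tmale\t0\n"
    "2\t21\tfemale\t12\n"
    "3\t35\tmale\t7\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas.
pf = pd.read_csv(io.StringIO(raw), sep="\t")

print(pf.shape)   # (rows, columns)
print(pf.dtypes)  # one dtype per column
```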

Reading Time: 10 minutes

In this section, we will reuse the Pseudo Facebook data from Udacity that we used in the previous post.

The data from the project corresponds to a typical data set at Facebook.
You can load the data through the following command. Notice that this is a tab-delimited (*tsv*) file.
This data set consists of 99,000 rows of data. We will see the details of different columns using the
command below.

Reading Time: 12 minutes

In this post, we will learn about EDA of single variables using simple plots like histograms, frequency plots and box plots.

The data sets used below are part of a project from the UD651 course on Udacity by Facebook.
The data from the project corresponds to a typical data set at Facebook. You can load the data through the following command. Notice that this is a `<TAB>`-delimited (*tsv*) file. This data set consists of 99,000 rows of data. We will see the details of different columns using the command below.
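The numbers behind a histogram and a box plot for a single variable can be sketched without any plotting. The `friend_count` values and the bin edges here are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical friend counts, made up for illustration.
friend_count = pd.Series([0, 2, 5, 5, 9, 14, 30, 120], name="friend_count")

# Histogram-style frequency table: number of observations per bin.
bins = [0, 10, 50, 200]
binned = pd.cut(friend_count, bins=bins, include_lowest=True)
freq = binned.value_counts(sort=False)  # keep bins in their natural order
print(freq)

# Minimum, quartiles and maximum: the values a box plot is built from.
five_num = friend_count.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(five_num)
```

Inspecting these tables before plotting makes it easier to choose sensible bin widths and to notice long tails such as the single large value in this sample.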

Reading Time: 8 minutes

The data set used here is part of a project from the UD651 course on Udacity by Facebook.

The data from the project corresponds to a survey from reddit.com. You can load the data through the following command. We will first look at the different attributes of this data using the `summary()` and `describe()` pandas methods.
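A first look at a DataFrame's attributes might be sketched as below. The rows and column names are hypothetical. The sketch uses `info()` for the structural overview alongside `describe()`; that choice is an assumption about what the post intends, not something the excerpt confirms.

```python
import pandas as pd

# Hypothetical survey rows; column names and values are illustrative only.
reddit = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "gender": [0, 1, 0],
        "age_range": ["25-34", "18-24", "25-34"],
    }
)

# Structural overview: column names, dtypes and non-null counts.
reddit.info()

# Summary statistics for numeric and non-numeric columns together.
stats = reddit.describe(include="all")
print(stats)
```

With `include="all"`, numeric columns get mean and quartiles while text columns get `unique`, `top` and `freq`; cells that do not apply to a column are filled with `NaN`.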

Reading Time: 5 minutes