
[Python] How to visualize data


Introduction


When doing machine learning, for example in a Kaggle competition, the first thing to do is visualize the data, and seaborn is a common choice for that. But with so many types of plots available, have you ever wondered which one to use? (I have.)


There are many explanations of how to draw a particular kind of plot, but few that explain when a given plot is the right choice. So here I have summarized which seaborn function to use for each combination of explanatory variable type and objective variable type.


Environment: Python 3.6.6, seaborn 0.10.0


Explanatory variable: discrete (categorical) / Objective variable: discrete


First, the case where both the explanatory variable and the objective variable are discrete (categorical). Use seaborn's countplot, which draws how many samples fall into each category of the objective variable. Pass the explanatory variable to the x argument of countplot and the objective variable to hue. The data is Kaggle's Titanic dataset.


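import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv("train.csv")  # Kaggle Titanic training data
# One group of bars per port of embarkation, split by Survived (0/1)
sns.countplot(x='Embarked', data=data, hue='Survived')
plt.show()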
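The bar heights countplot draws are simply counts per (x, hue) pair. As a minimal sketch, the same numbers can be read off as a table with pandas' crosstab; the tiny DataFrame below is hypothetical stand-in data, not the real Titanic file.

```python
import pandas as pd

# Hypothetical miniature of the two Titanic columns used above (not real data)
df = pd.DataFrame({
    "Embarked": ["S", "S", "C", "Q", "S", "C"],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# Rows = categories on the x axis, columns = hue levels, cells = bar heights
counts = pd.crosstab(df["Embarked"], df["Survived"])
print(counts)
```

Combinations that never occur (here, survivors from Q) appear as 0, which matches a missing bar in the plot.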

You can also swap x and hue; which arrangement reads better is largely a matter of taste.


sns.countplot(x='Survived', data=data, hue='Embarked')
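Swapping x and hue corresponds to transposing the count table behind the plot. When the categories differ a lot in size, raw counts can be hard to compare, so a common complement is to normalize each row into rates. A sketch, again on hypothetical stand-in data, using crosstab's normalize option:

```python
import pandas as pd

# Hypothetical stand-in data, not the real Titanic file
df = pd.DataFrame({
    "Embarked": ["S", "S", "C", "Q", "S", "C"],
    "Survived": [0, 1, 1, 0, 0, 1],
})

counts = pd.crosstab(df["Embarked"], df["Survived"])
swapped = counts.T  # the x='Survived', hue='Embarked' view of the same counts

# Each row sums to 1: the survival rate within each port of embarkation
rates = pd.crosstab(df["Embarked"], df["Survived"], normalize="index")
print(rates)
```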


Explanatory variable: continuous / Objective variable: discrete


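Next, the case where the explanatory variable is continuous and the objective variable is discrete. Draw the distribution of the explanatory variable for each category of the objective variable with seaborn's distplot. Because distplot has no hue argument, it is drawn through a FacetGrid, which handles the color-coding by Survived. (distplot was deprecated in seaborn 0.11 in favor of histplot and displot.)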


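g = sns.FacetGrid(data=data, hue='Survived', height=5)  # 'size' was renamed to 'height' in seaborn 0.9
g.map(sns.distplot, 'Fare')  # one distplot (histogram + KDE) of Fare per Survived value
g.add_legend()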

See my other article for how to color-code plots made with a function that does not take hue as an argument.
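One detail worth knowing: FacetGrid.map calls distplot once per Survived value, and each call picks its own histogram bins from that group's data, so the two groups' bars may not line up. A minimal sketch, on hypothetical fares, of computing shared bin edges with NumPy; the resulting edges could then be passed through distplot's bins argument, e.g. g.map(sns.distplot, 'Fare', bins=edges).

```python
import numpy as np

# Hypothetical fares grouped by the objective variable (not the real Titanic data)
fares = {
    0: np.array([7.25, 8.05, 7.9, 13.0, 26.0]),
    1: np.array([71.3, 53.1, 7.9, 30.0, 151.6]),
}

# Bin edges computed once over all groups, so every group shares the same bins
all_fares = np.concatenate(list(fares.values()))
edges = np.histogram_bin_edges(all_fares, bins=4)

# Per-group counts on the shared edges
hist = {survived: np.histogram(values, bins=edges)[0] for survived, values in fares.items()}
for survived, counts in hist.items():
    print(survived, counts)
```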


Explanatory variable: discrete / Objective variable: continuous


Next, the case where the explanatory variable is discrete and the objective variable is continuous. Draw the distribution of the objective variable for each category of the explanatory variable with seaborn's violinplot. Here the data is Kaggle's House Prices dataset.


train_data = pd.read_csv("train.csv")  # Kaggle House Prices training data (not the Titanic file)
sns.violinplot(x="MSZoning", y="SalePrice", data=train_data)
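With its default inner='box', violinplot draws a small box plot inside each violin, marking the quartiles of the objective variable within each category. A sketch that prints those numbers with pandas, on hypothetical prices rather than the real House Prices data:

```python
import pandas as pd

# Hypothetical sale prices for three MSZoning categories (illustrative values only)
df = pd.DataFrame({
    "MSZoning": ["RL", "RL", "RL", "RM", "RM", "RM", "FV", "FV", "FV"],
    "SalePrice": [180000, 210000, 250000, 120000, 135000, 150000,
                  200000, 215000, 230000],
})

# 25th, 50th (median) and 75th percentiles of SalePrice within each category
summary = df.groupby("MSZoning")["SalePrice"].describe()[["25%", "50%", "75%"]]
print(summary)
```

The violin outline adds the shape of the distribution (a kernel density estimate) around these summary numbers.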

Explanatory variable: continuous / Objective variable: continuous


Finally, the case where both the explanatory variable and the objective variable are continuous. Draw the relationship between them with seaborn's jointplot.


sns.jointplot(x="LotArea", y="SalePrice", data=train_data)

jointplot is especially useful because it shows the scatter of the two variables together and each variable's own distribution (on the margins) at the same time.
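jointplot shows the relationship visually; to put a number on it, pandas can compute a correlation coefficient directly. A minimal sketch with hypothetical LotArea/SalePrice values. Pearson's r measures linear association; the rank-based Spearman's rho (computed here as the Pearson correlation of the ranks) is less sensitive to outliers and skew.

```python
import pandas as pd

# Hypothetical values for the two House Prices columns used above
df = pd.DataFrame({
    "LotArea": [5000, 8000, 10000, 12000, 15000],
    "SalePrice": [120000, 150000, 180000, 200000, 260000],
})

# Pearson correlation (pandas' default method)
r = df["LotArea"].corr(df["SalePrice"])

# Spearman's rho = Pearson correlation of the ranks
rho = df["LotArea"].rank().corr(df["SalePrice"].rank())
print(round(r, 3), round(rho, 3))
```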



Summary


The table below summarizes the four cases (rows: explanatory variable, columns: objective variable).

Explanatory \ Objective   Discrete                Continuous
Discrete                  countplot               violinplot
Continuous                FacetGrid + distplot    jointplot
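The four cases above can also be written as a small lookup. This is only a sketch: the helper names choose_plot and is_continuous are hypothetical, and the rule "numeric with more than 10 distinct values means continuous" is an assumed heuristic you would tune per dataset.

```python
import pandas as pd

def is_continuous(s: pd.Series, max_categories: int = 10) -> bool:
    """Hypothetical heuristic: numeric columns with many distinct values are continuous."""
    return pd.api.types.is_numeric_dtype(s) and s.nunique() > max_categories

def choose_plot(explanatory: pd.Series, objective: pd.Series) -> str:
    """Map (explanatory, objective) continuity to the seaborn approach used in this article."""
    table = {
        (False, False): "countplot",
        (True, False): "FacetGrid + distplot",
        (False, True): "violinplot",
        (True, True): "jointplot",
    }
    return table[(is_continuous(explanatory), is_continuous(objective))]

# Example: a continuous fare-like column against a 0/1 target
fare = pd.Series([float(i) for i in range(50)])
survived = pd.Series([0, 1] * 25)
print(choose_plot(fare, survived))
```

A 0/1 target such as Survived is numeric but has only two distinct values, which is why the heuristic checks the number of unique values and not just the dtype.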
