Hubway Capstone Project-- User Type & Gender-- Category Breakdown

Hubway is a bike-share program collectively owned by the metro Boston cities of Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar initiatives in NYC, Portland, Chicago, Washington DC, and several other metro areas in Ohio, Tennessee, and New Jersey. Motivate is opening operations in San Francisco in June 2017. Hubway currently exists as a system of 188 stations with 1,800 bikes.

For this project, I investigated shared data for the months of January, May, June, July, and October in 2015 and 2016.
The questions of concern were:
- How do riders use the bike-share service?
- Are the bikes used as a conveyance or for recreation?
- What type of customer uses the service?

Below is a continuation of the empirical data analysis, looking into user type and gender. The category matrix of probabilities of where a trip begins and ends is also presented.

Import libraries

import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
#from sklearn import datasets
#from sklearn.linear_model import LogisticRegression
#from sklearn.multiclass import OneVsRestClassifier
#import statsmodels.api as sm

Here is the category matrix of probabilities of where trips begin and end. It gives a first look at the baseline values before running the models.

# Load the start/end category probabilities (read_csv already returns a DataFrame)
catsp_df = pd.read_csv('./database/hubway-cats-prob.csv')
catsp_df.head()

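# Drop the leftover CSV index column; the positional axis argument is not accepted by newer pandas
catsp_df = catsp_df.drop(columns='Unnamed: 0')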
catsp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
Categories Percent          5 non-null object
end_side_streets            5 non-null float64
end_mixed_squares           5 non-null float64
end_recreation              5 non-null float64
end_business_institution    5 non-null float64
end_major_shopping          5 non-null float64
dtypes: float64(5), object(1)
memory usage: 312.0+ bytes

catsp_df = catsp_df.set_index('Categories Percent')
catsp_df.head()
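Each row of this matrix sums to roughly 1, which suggests the trip counts were normalized within each start category. How hubway-cats-prob.csv was actually built is not shown in this post, so the following is only a minimal sketch of one way such a matrix could be produced. The `trips` frame, its column names, and its rows are made up for illustration.

```python
import pandas as pd

# Hypothetical trip records: column names and rows are invented for this sketch.
trips = pd.DataFrame({
    "start_category": [
        "start_recreation", "start_recreation",
        "start_side_streets", "start_side_streets", "start_side_streets",
    ],
    "end_category": [
        "end_recreation", "end_business_institution",
        "end_business_institution", "end_business_institution", "end_side_streets",
    ],
})

# Count trips for every (start, end) pair, then divide each row by its total
# so that the row for a start category sums to 1.
trip_probs = pd.crosstab(
    trips["start_category"], trips["end_category"], normalize="index"
)
print(trip_probs.round(3))
```

With `normalize="index"`, each cell reads as the share of trips from that start category that finish in a given end category, which is how the matrix in this post is interpreted.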

The heatmap shows how strongly end_business_institution stands out as the most probable end point, regardless of where a trip starts.

ax = sns.heatmap(catsp_df, linewidths=.5)
plt.show()
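The heatmap reading can also be checked numerically. The sketch below rebuilds the matrix from the values printed in this post (rather than reading the CSV) and looks up the most likely end category for each start category. The variable names are chosen only for this example.

```python
import pandas as pd

# Values copied from the catsp_df output shown in this post.
catsp = pd.DataFrame(
    {
        "end_side_streets": [0.114010, 0.138742, 0.116273, 0.128534, 0.130734],
        "end_mixed_squares": [0.197736, 0.236436, 0.121135, 0.173401, 0.179296],
        "end_recreation": [0.135013, 0.100348, 0.247220, 0.151474, 0.153727],
        "end_business_institution": [0.408610, 0.376485, 0.373013, 0.402990, 0.379662],
        "end_major_shopping": [0.144632, 0.147988, 0.142358, 0.143601, 0.156582],
    },
    index=[
        "start_side_streets", "start_mixed_squares", "start_recreation",
        "start_business_institution", "start_major_shopping",
    ],
)

# For each start category: the end category with the highest probability,
# and that probability.
top_end = catsp.idxmax(axis=1)
top_prob = catsp.max(axis=1)
print(pd.DataFrame({"top_end": top_end, "probability": top_prob}))
```

Using `idxmax(axis=1)` keeps the check tied to the same table the heatmap is drawn from, so the two views cannot drift apart.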

catsp_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, start_side_streets to start_major_shopping
Data columns (total 5 columns):
end_side_streets            5 non-null float64
end_mixed_squares           5 non-null float64
end_recreation              5 non-null float64
end_business_institution    5 non-null float64
end_major_shopping          5 non-null float64
dtypes: float64(5)
memory usage: 240.0+ bytes

Next are the probabilities for user type and gender for each ending category.

# Load the user type and gender shares per end category (read_csv already returns a DataFrame)
gndsr_df = pd.read_csv('./database/hubway-g-u-percent.csv')
gndsr = gndsr_df.set_index('End Category')
gndsr.head()

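# Keep only the user-type columns; the positional axis argument is not accepted by newer pandas
cust_sub = gndsr_df.drop(columns=['Gender(0) End', 'Gender(1) End', 'Gender(2) End'])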

cust_sub.index = ['end_side_streets', 'end_mixed_squares', 'end_recreation', 
                  'end_business_institution', 'end_major_shopping']
cust_sub.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, end_side_streets to end_major_shopping
Data columns (total 3 columns):
End Category      5 non-null object
Customer End      5 non-null float64
Subscriber End    5 non-null float64
dtypes: float64(2), object(1)
memory usage: 160.0+ bytes

The service has two types of users: subscribers, and customers who pay at the docking stations.
In this heatmap, the strongest value is for subscribers going to businesses and institutions.

ax = sns.heatmap(cust_sub[['Customer End', 'Subscriber End']] , linewidths=.5)
plt.show()

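# Label the rows by end category. set_index() is not applied here, so
# 'End Category' stays a regular column, as the info() output below shows.
gndsr_df.index = ['end_side_streets', 'end_mixed_squares', 'end_recreation',
                  'end_business_institution', 'end_major_shopping']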
gndsr_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, end_side_streets to end_major_shopping
Data columns (total 6 columns):
End Category      5 non-null object
Customer End      5 non-null float64
Subscriber End    5 non-null float64
Gender(0) End     5 non-null float64
Gender(1) End     5 non-null float64
Gender(2) End     5 non-null float64
dtypes: float64(5), object(1)
memory usage: 280.0+ bytes

ax = gndsr_df[['Customer End', 
              'Subscriber End']].plot(kind='bar', title ="User by Category",figsize=(15,10), legend=True, fontsize=12)
ax.set_xlabel("Category",fontsize=12)
ax.set_ylabel("User Type",fontsize=12)
plt.show()

Clearly the largest share of both user types is going to a business or institution. Note, though, that compared with subscribers, customers are much higher for recreation (about 20% versus 14%), slightly higher for major shopping centers, and even for residential side streets. The strength of subscribers in the area's many neighborhood squares, where they live or shop, is to be expected.
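To put numbers on the customer versus subscriber comparison, the gap between the two columns can be computed directly. This is a minimal sketch using the values printed in this post. The `users` frame and the `Customer minus Subscriber` column are names introduced only for this example.

```python
import pandas as pd

# Values copied from the gndsr output shown in this post.
users = pd.DataFrame(
    {
        "Customer End": [0.127103, 0.159643, 0.198834, 0.358619, 0.155801],
        "Subscriber End": [0.126786, 0.188436, 0.139543, 0.402499, 0.142735],
    },
    index=[
        "end_side_streets", "end_mixed_squares", "end_recreation",
        "end_business_institution", "end_major_shopping",
    ],
)

# Positive values: customers end here more often than subscribers do.
# Negative values: subscribers end here more often.
users["Customer minus Subscriber"] = users["Customer End"] - users["Subscriber End"]
print(users.sort_values("Customer minus Subscriber", ascending=False).round(3))
```

Because each column is a share of that user type's trips rather than a trip count, the differences compare where each group tends to go, not how many trips each group makes.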

The Hubway data did not come with a data dictionary, and it lists three gender types (0, 1, 2). In this section, (0) is assumed to cover customers, whose gender would be unknown at time of service, as well as subscribers who do not declare their gender. (1) is assumed male, and (2) is assumed female.
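One way this assumption could be checked is to cross-tabulate user type against gender in the raw trip records. The sketch below is not taken from the original analysis: the column names `usertype` and `gender` are assumptions about the raw data, and the rows are made up purely to show the shape of the check.

```python
import pandas as pd

# Made-up rows standing in for raw trip records; "usertype" and "gender"
# are assumed column names, not confirmed by this post.
trips = pd.DataFrame({
    "usertype": ["Customer", "Customer", "Subscriber",
                 "Subscriber", "Subscriber", "Subscriber"],
    "gender": [0, 0, 1, 2, 1, 0],
})

# Trip counts for every (user type, gender code) pair.
counts = pd.crosstab(trips["usertype"], trips["gender"])
print(counts)

# True only if every customer trip carries gender code 0.
customers_all_zero = (trips.loc[trips["usertype"] == "Customer", "gender"] == 0).all()
print(customers_all_zero)
```

If the real data showed customers only under code 0 and subscribers spread across 0, 1, and 2, that would be consistent with reading (0) as "gender unknown".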

ax = sns.heatmap(gndsr_df[['Gender(0) End', 
               'Gender(1) End', 'Gender(2) End']], linewidths=.5)
plt.show()

# Keep only the gender columns; the positional axis argument is not accepted by newer pandas
gndsr_df = gndsr_df.drop(columns=['Customer End', 'Subscriber End'])
#[['Gender(0) Start', 'Gender(0) End', 'Gender(1) Start', 
               #'Gender(1) End', 'Gender(2) Start', 'Gender(2) End']]
ax = gndsr_df.plot(kind='bar', title ="User by Gender",figsize=(15,10), legend=True, fontsize=12)
ax.set_xlabel("Category",fontsize=12)
ax.set_ylabel("Gender Category",fontsize=12)
plt.show()

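This is interesting: businesses and institutions are the top destination for males (about 41% of their trips), while females edge out males in most other categories (side streets, mixed squares, and major shopping). Recreation trips, the likely tourist trips, skew toward (0): about 23% of (0) trips end at recreation, versus 12 to 14% for (1) and (2). This coincides with the characteristics of a customer, lending supporting evidence that (0) is gender unknown.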
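The gender comparison can be reproduced from the values printed in this post. This is a minimal sketch: the `gender` frame and the helper variable names are introduced here for illustration, and the male/female reading of codes (1) and (2) remains the assumption stated earlier.

```python
import pandas as pd

# Values copied from the gndsr output shown in this post.
gender = pd.DataFrame(
    {
        "Gender(0) End": [0.105590, 0.155309, 0.233586, 0.332738, 0.172777],
        "Gender(1) End": [0.129084, 0.183183, 0.136850, 0.414207, 0.136676],
        "Gender(2) End": [0.143775, 0.201574, 0.123022, 0.387037, 0.144592],
    },
    index=[
        "end_side_streets", "end_mixed_squares", "end_recreation",
        "end_business_institution", "end_major_shopping",
    ],
)

# End categories where code (2), assumed female, has a larger share than
# code (1), assumed male.
female_leads = gender.index[gender["Gender(2) End"] > gender["Gender(1) End"]].tolist()
print(female_leads)

# For each end category, the gender code with the largest share.
top_gender = gender.idxmax(axis=1)
print(top_gender)
```

Each column sums to about 1, so these are shares of each gender code's own trips. A larger value means that group is more inclined toward a destination, not that it makes more trips there in absolute terms.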

Output of catsp_df.head() before the 'Unnamed: 0' column is dropped:

	Unnamed: 0	Categories Percent	end_side_streets	end_mixed_squares	end_recreation	end_business_institution	end_major_shopping
0	0	start_side_streets	0.114010	0.197736	0.135013	0.408610	0.144632
1	1	start_mixed_squares	0.138742	0.236436	0.100348	0.376485	0.147988
2	2	start_recreation	0.116273	0.121135	0.247220	0.373013	0.142358
3	3	start_business_institution	0.128534	0.173401	0.151474	0.402990	0.143601
4	4	start_major_shopping	0.130734	0.179296	0.153727	0.379662	0.156582

Output of catsp_df.head() after 'Categories Percent' is set as the index:

Categories Percent	end_side_streets	end_mixed_squares	end_recreation	end_business_institution	end_major_shopping
start_side_streets	0.114010	0.197736	0.135013	0.408610	0.144632
start_mixed_squares	0.138742	0.236436	0.100348	0.376485	0.147988
start_recreation	0.116273	0.121135	0.247220	0.373013	0.142358
start_business_institution	0.128534	0.173401	0.151474	0.402990	0.143601
start_major_shopping	0.130734	0.179296	0.153727	0.379662	0.156582

Output of gndsr.head():

End Category	Customer End	Subscriber End	Gender(0) End	Gender(1) End	Gender(2) End
end_side_streets	0.127103	0.126786	0.105590	0.129084	0.143775
end_mixed_squares	0.159643	0.188436	0.155309	0.183183	0.201574
end_recreation	0.198834	0.139543	0.233586	0.136850	0.123022
end_business_institution	0.358619	0.402499	0.332738	0.414207	0.387037
end_major_shopping	0.155801	0.142735	0.172777	0.136676	0.144592

Erik Ellis // Technical Communications || Data Analytics

Wednesday, December 12, 2018
