Hubway Capstone Project: User Type & Gender, Category Breakdown
Hubway is a bike-share program collectively owned by four metro Boston municipalities: Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar initiatives in NYC, Portland, Chicago, Washington DC, and several other metro areas in Ohio, Tennessee, and New Jersey. Motivate is opening operations in San Francisco in June 2017. Hubway currently consists of 188 stations and 1,800 bikes.
For this project, I investigated shared data for January, May, June, July, and October of 2015 and 2016.
The questions of concern were:
- How do riders use the bike-share service?
- Are the bikes used as a conveyance or for recreation?
- What type of customer uses the service?
Import libraries
In [2]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
#from sklearn import datasets
#from sklearn.linear_model import LogisticRegression
#from sklearn.multiclass import OneVsRestClassifier
#import statsmodels.api as sm
Here is the category matrix of probabilities of where trips begin and end. It gives a first look at the baseline values before any models are run.
In [3]:
catsp_csv = pd.read_csv('./database/hubway-cats-prob.csv')
catsp_df = pd.DataFrame(catsp_csv)
catsp_df.head()
Out[3]:
In [4]:
# Drop the leftover CSV index column (axis must be passed by keyword in pandas 2.x)
catsp_df = catsp_df.drop('Unnamed: 0', axis=1)
catsp_df.info()
In [6]:
catsp_df = catsp_df.set_index('Categories Percent')
catsp_df.head()
Out[6]:
The heatmap shows how strongly end_business_institution stands out as a probable end point.
In [7]:
ax = sns.heatmap(catsp_df, linewidths=.5)
plt.show()
In [8]:
catsp_df.info()
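The matrix above is loaded already computed from a CSV. As a minimal sketch of one way such a start/end probability matrix could be built from trip-level records, `pd.crosstab` with `normalize="all"` divides each start/end pair by the total trip count. The column names `start_category` and `end_category` and the rows below are hypothetical, and the normalization used for the actual file is not stated in this notebook.

```python
import pandas as pd

# Hypothetical trip-level records; column names and rows are illustrative only.
trips = pd.DataFrame({
    "start_category": ["side_streets", "side_streets", "recreation", "mixed_squares"],
    "end_category": ["business_institution", "business_institution",
                     "recreation", "business_institution"],
})

# normalize="all" divides every cell by the total number of trips,
# so the table holds joint start/end probabilities that sum to 1.
prob_matrix = pd.crosstab(
    trips["start_category"], trips["end_category"], normalize="all"
)
print(prob_matrix)
```

Passing `normalize="index"` instead would give, for each start category, the conditional probability of each end category.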
Next are the probabilities for user type and gender for each ending category.
In [9]:
gndsr_csv = pd.read_csv('./database/hubway-g-u-percent.csv')
gndsr_df = pd.DataFrame(gndsr_csv)
gndsr = gndsr_df.set_index('End Category')
gndsr.head()
Out[9]:
In [12]:
# Keep only the user-type columns (axis must be passed by keyword in pandas 2.x)
cust_sub = gndsr_df.drop(['Gender(0) End', 'Gender(1) End', 'Gender(2) End'], axis=1)
cust_sub.index = ['end_side_streets', 'end_mixed_squares', 'end_recreation',
'end_business_institution', 'end_major_shopping']
cust_sub.info()
The service has two types of users: subscribers to the service, and customers who pay at the docking stations.
In this heatmap, the strongest value is for subscribers going to businesses and institutions.
In [14]:
ax = sns.heatmap(cust_sub[['Customer End', 'Subscriber End']] , linewidths=.5)
plt.show()
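The user-type table plotted above is also loaded already computed. A rough sketch of how per-category shares for each user type might be derived from trip-level data follows. It assumes hypothetical columns `end_category` and `usertype`, and it normalizes within each user type so that customers and subscribers can be compared even if one group takes far more trips. Whether the original file was normalized this way is an assumption.

```python
import pandas as pd

# Hypothetical trip-level records; values are illustrative, not Hubway results.
trips = pd.DataFrame({
    "end_category": ["business_institution", "business_institution",
                     "business_institution", "recreation",
                     "recreation", "major_shopping"],
    "usertype": ["Subscriber", "Subscriber", "Subscriber",
                 "Customer", "Customer", "Subscriber"],
})

# normalize="columns" makes each user-type column sum to 1, giving the
# share of that user type's trips that end in each category.
shares = pd.crosstab(trips["end_category"], trips["usertype"], normalize="columns")
print(shares)
```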
In [19]:
# Index the rows by end category so the plots below are labeled by category
gndsr_df = gndsr_df.set_index('End Category')
gndsr_df.info()
In [20]:
ax = gndsr_df[['Customer End', 'Subscriber End']].plot(
    kind='bar', title="User by Category", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Category",fontsize=12)
ax.set_ylabel("User Type",fontsize=12)
plt.show()
Most users of both types are going to a business or institution. Note, however, that customers go to recreation areas at a much higher rate than subscribers, to major shopping centers at a slightly higher rate, and to residential side streets at about the same rate. The strong showing of subscribers in the area's many neighborhood squares, where they live or shop, is also to be expected.
The Hubway data did not come with a data dictionary, and it lists three gender codes (0, 1, 2). In this section, (0) is assumed to cover customers, whose gender would be unknown at time of service, and subscribers who did not declare their gender. (1) is assumed male, and (2) is assumed female.
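To keep that assumption visible in plot legends, the coded column names could be renamed before plotting. This is a small sketch only: the label mapping restates the assumption above, and the numbers in the example table are made up for illustration.

```python
import pandas as pd

# Assumed meaning of the gender codes; the source data has no dictionary.
GENDER_LABELS = {0: "Unknown", 1: "Male", 2: "Female"}

# Illustrative values only, not Hubway results.
summary = pd.DataFrame(
    {
        "Gender(0) End": [0.2, 0.6],
        "Gender(1) End": [0.5, 0.2],
        "Gender(2) End": [0.3, 0.2],
    },
    index=["end_business_institution", "end_recreation"],
)

# Build a rename map such as 'Gender(1) End' -> 'Male End'.
rename_map = {
    f"Gender({code}) End": f"{label} End" for code, label in GENDER_LABELS.items()
}
labeled = summary.rename(columns=rename_map)
print(labeled.columns.tolist())
```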
In [25]:
ax = sns.heatmap(gndsr_df[['Gender(0) End',
'Gender(1) End', 'Gender(2) End']], linewidths=.5)
plt.show()
In [28]:
# Keep only the gender columns (axis must be passed by keyword in pandas 2.x)
gndsr_df = gndsr_df.drop(['Customer End', 'Subscriber End'], axis=1)
#[['Gender(0) Start', 'Gender(0) End', 'Gender(1) Start',
#'Gender(1) End', 'Gender(2) Start', 'Gender(2) End']]
ax = gndsr_df.plot(kind='bar', title="User by Gender", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Category",fontsize=12)
ax.set_ylabel("Gender Category",fontsize=12)
plt.show()
This is interesting: most males are going to businesses and institutions, while females edge out males in most other categories. Recreation users and tourists are mostly (0), which matches the profile of a customer and supports the assumption that (0) means gender unknown.
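The (0) assumption could also be checked more directly on trip-level data by cross-tabulating user type against gender code. The sketch below uses hypothetical columns `usertype` and `gender` and made-up rows. If the assumption holds in the real data, every customer trip would fall in the 0 column.

```python
import pandas as pd

# Hypothetical trip-level records; values are illustrative, not Hubway results.
trips = pd.DataFrame({
    "usertype": ["Customer", "Customer", "Subscriber", "Subscriber", "Subscriber"],
    "gender": [0, 0, 1, 2, 0],
})

# Count trips for every (user type, gender code) pair.
counts = pd.crosstab(trips["usertype"], trips["gender"])
print(counts)

# True only if no customer trip carries a gender code other than 0.
is_customer = trips["usertype"] == "Customer"
customers_all_unknown = bool((trips.loc[is_customer, "gender"] == 0).all())
print(customers_all_unknown)
```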