Hubway Capstone Project: User Age
Hubway is a bike-share program collectively owned by the metro Boston cities: Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar programs in NYC, Portland, Chicago, Washington DC, and several other metro areas in Ohio, Tennessee, and New Jersey. Motivate is opening operations in San Francisco in June 2017. Hubway currently consists of 188 stations with 1,800 bikes.
For this project, I investigated the data shared by Hubway for January, May, June, July, and October of 2015 and 2016.
The questions of interest were:
- How do riders use the bike-share service?
- Are the bikes used as a conveyance or for recreation?
- What type of customer uses the service?
Import Libraries
In [42]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from datetime import datetime
Import SQL libraries and connect to the database
In [43]:
import sqlite3
from pandas.io import sql
from sqlalchemy import create_engine
In [44]:
hubway = './database/hubway.db'
conn = sqlite3.connect(hubway)
c = conn.cursor()
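Because the analysis below only uses the 'age decile' column, the query could select that column alone instead of using SELECT *. The sketch below is self-contained: it builds a throwaway in-memory SQLite table with made-up rows. Only the table name 'hubway' and the 'age decile' column come from this notebook; the 'tripduration' column and all values are hypothetical. A column name containing a space has to be double-quoted in SQL.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table standing in for ./database/hubway.db
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE hubway ("age decile" INTEGER, tripduration INTEGER)')
conn.executemany(
    'INSERT INTO hubway VALUES (?, ?)',
    [(20, 600), (20, 420), (30, 900), (40, 300)],
)

# Double quotes let SQLite accept the space in the column name
ages = pd.read_sql('SELECT "age decile" FROM hubway', con=conn)
conn.close()
print(ages.shape)  # (4, 1)
```

Selecting one column keeps the DataFrame small when the trip table is large.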
The 'age decile' column (age rounded down to its decade, e.g. 20, 30, 40) was created in the logistic regression portion of the project; some of that work is reproduced, commented out, below.
In [ ]:
# Drop the rows whose birth year is the null marker '\N'
# (r'\N' must be a raw string; a plain '\N' is a syntax error in Python 3)
#mask = hubway_df['birth year'] == r'\N'
#drop_index = hubway_df[mask].index
#hubway = hubway_df.drop(drop_index)
# Convert to numeric and drop values that are obvious errors; the cutoff was 1927
#hubway['birth year'] = pd.to_numeric(hubway['birth year'])
#mask = hubway['birth year'] < 1927
#print(len(hubway[mask]))
#drop_index = hubway[mask].index
#hubway = hubway.drop(drop_index)
# Turn birth year into age (as of 2016)
#hubway['birth year'] = hubway['birth year'].map(lambda x: 2016 - x)
#hubway['birth year'].head()
# Flag life-stage groups; assign the boolean mask itself, not hubway[mask]
#hubway['college_age'] = hubway['birth year'] < 23
#hubway['young_professional'] = (hubway['birth year'] > 22) & (hubway['birth year'] < 31)
#hubway['working_professional'] = (hubway['birth year'] > 30) & (hubway['birth year'] < 55)
#hubway['pension'] = hubway['birth year'] > 54
# Create the age decile by rounding age down to its decade (e.g. 27 -> '20')
#hubway['birth year'] = ((hubway['birth year'] // 10) * 10).astype(str)
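The commented cell above can be condensed into a runnable sketch on a small, made-up sample. Two choices here differ from the original and are assumptions of this sketch: age is kept in its own 'age' column instead of overwriting 'birth year', and the four life-stage flags are replaced by a single categorical 'age_group' column built with pd.cut, using the same cut points. The decade is kept as an integer; the original converts it to a string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of raw birth years as strings, with '\N' marking a missing value
hubway_df = pd.DataFrame({'birth year': ['1990', r'\N', '1985', '1920', '1960', '1996']})

# Drop rows whose birth year is the '\N' null marker (raw string, since '\N' alone is a syntax error)
hubway = hubway_df[hubway_df['birth year'] != r'\N'].copy()

# Convert to numbers and keep only birth years from 1927 on
hubway['birth year'] = pd.to_numeric(hubway['birth year'])
hubway = hubway[hubway['birth year'] >= 1927].copy()

# Age as of 2016, kept in its own column
hubway['age'] = 2016 - hubway['birth year']

# Life-stage groups with the original cut points (integer ages): <=22, 23-30, 31-54, >=55
hubway['age_group'] = pd.cut(
    hubway['age'],
    bins=[-np.inf, 22, 30, 54, np.inf],
    labels=['college_age', 'young_professional', 'working_professional', 'pension'],
)

# Round age down to its decade, e.g. 26 -> 20 and 56 -> 50
hubway['age decile'] = (hubway['age'] // 10) * 10
print(hubway)
```

A single categorical column is easier to group by or one-hot encode later than four separate flag columns.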
Descriptive statistics of the age decile
In [45]:
hubbirth = pd.read_sql('''SELECT * FROM hubway''', con = conn)
hubbirth['age decile'].describe()
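For a binned column like this, describe() on its own says little. If the values are numeric, it reports count, mean, standard deviation, and quartiles; if they were stored as text, as the .astype(str) in the commented cell suggests, it reports count, unique, top, and freq instead. Counting trips per decade is often more informative. A minimal sketch on made-up values:

```python
import pandas as pd

# Hypothetical decade values; in the notebook these come from hubbirth['age decile']
age_decile = pd.Series([20, 20, 30, 20, 40, 30, 50, 20], name='age decile')

# Summary statistics: count, mean, std, min, quartiles, max
summary = age_decile.describe()
print(summary)

# Trips per decade, in ascending decade order
per_decade = age_decile.value_counts().sort_index()
print(per_decade)

# Share of trips per decade
share = age_decile.value_counts(normalize=True).sort_index()
print(share.round(3))
```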
Graph of age decile distribution
In [46]:
counts, bins, patches = plt.hist(hubbirth['age decile'], bins=8)
plt.title('Trips per age decile')
plt.xlabel('Age decile')
plt.ylabel('Trips')
plt.grid(axis='y', alpha=0.75)
plt.show()
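plt.hist with bins=8 splits the range of the data into eight equal-width bins, which need not line up with the decades themselves. Since the column already holds one value per decade, a bar chart of value_counts() gives exactly one bar per decade. A sketch on made-up values; the Agg backend line is only there so it runs outside a notebook, where plt.show() would be used instead.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical decade values standing in for hubbirth['age decile']
age_decile = pd.Series([20, 20, 30, 20, 40, 30, 50, 20], name='age decile')

# One bar per decade, in ascending order
counts = age_decile.value_counts().sort_index()

fig, ax = plt.subplots()
bars = ax.bar(counts.index.astype(str), counts.values, width=0.8)
ax.set_title('Trips per age decile')
ax.set_xlabel('Age decile (decade of age)')
ax.set_ylabel('Trips')
ax.grid(axis='y', alpha=0.75)
plt.close(fig)  # in a notebook, call plt.show() instead
```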
The histogram shows that most trips are taken by riders between roughly 20 and 35 years old.
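A statement like this can be checked numerically, but only from exact ages: decade bins cannot separate a 34-year-old from a 36-year-old. A minimal sketch, assuming an age column like the one produced by the commented cleaning cell; the ages below are made up.

```python
import pandas as pd

# Hypothetical ages in years, one per trip
ages = pd.Series([19, 22, 25, 27, 31, 34, 36, 44, 52, 29])

# between() includes both endpoints by default, so ages 20 and 35 count
in_range = ages.between(20, 35)
share = in_range.mean()
print(f'{share:.0%} of trips by riders aged 20-35')  # 60% of trips by riders aged 20-35
```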
In [13]:
conn.close()