Wednesday, December 12, 2018

Hubway Capstone Project-- User Age

Hubway is a bike-share program collectively owned by four metro Boston municipalities: Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar initiatives in NYC, Portland, Chicago, Washington DC, and several other metro areas in Ohio, Tennessee, and New Jersey. Motivate is opening operations in San Francisco in June 2017. Hubway currently consists of 188 stations and 1,800 bikes.
  • For this project, I investigated shared data for the months of January, May, June, July, and October in 2015 and 2016.
  • The questions of concern were:
    • How do riders use the bike-share service?
    • Are the bikes used as a conveyance or for recreation?
    • What type of customer uses the service?
Below is the continuation of the empirical data analysis, looking into age. [Note: the age decile work was completed in the logistic regression and is included here as commented text.]
Import Libraries
In [42]:
import numpy as np
import scipy.stats as stats
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from datetime import datetime
Import SQL functionality and database
In [43]:
import sqlite3
from pandas.io import sql
from sqlalchemy import create_engine
In [44]:
hubway = './database/hubway.db'
conn = sqlite3.connect(hubway) 
c = conn.cursor()
The age decile was created in the logistic regression portion of the project; some of that work is shown, commented out, below.
In [ ]:
'''Drop the null values from the birth year'''


#mask = hubway_df['birth year'] == r'\N'
#drop_index = hubway_df[mask].index
#hubway = hubway_df.drop(drop_index)

'''Turn them to numeric and drop values that are obvious errors or incorrect; the cutoff was 1927'''


#hubway['birth year'] = pd.to_numeric(hubway['birth year'])
#mask = hubway['birth year'] < 1927
#print(len(hubway[mask]))
#drop_index = hubway[mask].index
#hubway = hubway.drop(drop_index)

'''Turn birth year into age value'''


#hubway['birth year'] = hubway['birth year'].map(lambda x: 2016 - x)
#hubway['birth year'].head()

'''Create age-group indicator columns and the age decile'''


#mask = hubway['birth year'] < 23
#hubway['college_age'] = mask.astype(int)
#mask = (hubway['birth year'] > 22) & (hubway['birth year'] < 31)
#hubway['young_professional'] = mask.astype(int)
#mask = (hubway['birth year'] > 30) & (hubway['birth year'] < 55)
#hubway['working_professional'] = mask.astype(int)
#mask = (hubway['birth year'] > 54)
#hubway['pension'] = mask.astype(int)
#hubway['birth year'] = ((hubway['birth year'] // 10) * 10).astype(str)
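The commented steps above can be sketched as runnable code on a small toy table. This is only an illustration under stated assumptions: the `trips` frame, its sample birth years, and the `age` and `age group` column names are hypothetical, and `pd.cut` is swapped in for the chained masks while keeping the same age boundaries (under 23, 23 to 30, 31 to 54, 55 and over).

```python
import pandas as pd

# Toy stand-in for the raw trip table (hypothetical data);
# the string \N marks a missing birth year, as in the commented cell.
trips = pd.DataFrame(
    {'birth year': ['1990', r'\N', '1985', '1920', '1961', '1998']}
)

# Coerce to numeric so the missing marker becomes NaN, then drop it.
trips['birth year'] = pd.to_numeric(trips['birth year'], errors='coerce')
trips = trips.dropna(subset=['birth year'])

# Drop implausible birth years (cutoff of 1927, as in the commented cell).
trips = trips[trips['birth year'] >= 1927].copy()

# Age as of 2016, then floor to the decade to get the decile.
trips['age'] = 2016 - trips['birth year'].astype(int)
trips['age decile'] = (trips['age'] // 10) * 10

# Age groups with the same boundaries as the masks in the commented cell.
# pd.cut intervals are right-inclusive: (0, 22], (22, 30], (30, 54], (54, 200].
trips['age group'] = pd.cut(
    trips['age'],
    bins=[0, 22, 30, 54, 200],
    labels=['college_age', 'young_professional',
            'working_professional', 'pension'],
)

print(trips)
```

Keeping the age in its own column, rather than overwriting `birth year`, makes it easier to check each step against the original values.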
Descriptive statistics of the age decile
In [45]:
hubbirth = pd.read_sql('''SELECT * FROM hubway''', con = conn)
hubbirth['age decile'].describe()
Out[45]:
count    946473.000000
mean         30.809194
std          11.678176
min          10.000000
25%          20.000000
50%          30.000000
75%          40.000000
max          80.000000
Name: age decile, dtype: float64
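Since `describe()` only reports quartiles, a per-decile count may be a useful complement. The sketch below does the aggregation in SQL so that only one row per decile is returned instead of the whole table. It runs against a small in-memory database; the `demo_conn` connection, the sample values, and the `n_rows` and `share` column names are assumptions made for illustration, and only the `age decile` column of the real `hubway` table is modelled.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for hubway.db (hypothetical sample values).
demo_conn = sqlite3.connect(':memory:')
demo_conn.execute('CREATE TABLE hubway ("age decile" REAL)')
demo_conn.executemany(
    'INSERT INTO hubway VALUES (?)',
    [(20.0,), (20.0,), (30.0,), (30.0,), (30.0,), (40.0,), (60.0,)],
)

# Count rows per decile inside SQLite; the column name contains a space,
# so it has to be double-quoted in the query.
decile_counts = pd.read_sql(
    '''SELECT "age decile" AS age_decile, COUNT(*) AS n_rows
       FROM hubway
       GROUP BY "age decile"
       ORDER BY "age decile"''',
    con=demo_conn,
)

# Share of all rows that falls in each decile.
decile_counts['share'] = decile_counts['n_rows'] / decile_counts['n_rows'].sum()

print(decile_counts)
demo_conn.close()
```

Against the real database, the same query could be passed to `pd.read_sql` with `con = conn`.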
Graph of age decile distribution
In [46]:
plt.hist(hubbirth['age decile'], bins=8)
plt.title('Age decile count per customer')
plt.xlabel('Age decile')
plt.ylabel('Counts')
plt.grid(axis='y', alpha=0.75)
plt.show()
Most users are between 20 and 35 years old.
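Because the decile column only takes the values 10, 20, and so on up to 80, the counts behind the histogram can also be computed with explicit bin edges centred on each decile. The following is a minimal sketch with hypothetical sample values standing in for `hubbirth['age decile']`; the `deciles` and `edges` names are illustrative.

```python
import numpy as np

# Hypothetical decile values standing in for hubbirth['age decile'].
deciles = np.array([10, 20, 20, 20, 30, 30, 40, 50, 80])

# Edges at 5, 15, ..., 85 put each decile value in the middle of its own bin.
edges = np.arange(5, 95, 10)
counts, _ = np.histogram(deciles, bins=edges)

# Print one line per decile with its count.
for left, n in zip(edges[:-1], counts):
    print(f'decile {left + 5}: {n}')
```

The same `edges` array can be passed as `bins=edges` to `plt.hist`, which keeps each bar aligned with a single decile rather than relying on eight automatically spaced bins.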
In [13]:
conn.close()