Hubway Capstone Project: Demand Over Time of Day

Hubway is a bike-share program collectively owned by the metro Boston cities of Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar programs in NYC, Portland, Chicago, Washington, DC, and several other metro areas in Ohio, Tennessee, and New Jersey. Motivate is opening operations in San Francisco in June 2017. Hubway currently consists of 188 stations and 1,800 bikes.

For this project, I investigated Hubway's shared trip data for January, May, June, July, and October of 2015 and 2016.
The guiding questions were:
- How do riders use the bike-share service?
- Are the bikes used as a means of transportation or for recreation?
- What type of customer uses the service?

Below is a look at demand over the course of the day, aggregated across all days in the dataset.

Import Libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import warnings
warnings.filterwarnings("ignore")
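warnings.filterwarnings("ignore") silences every warning, including the SettingWithCopyWarning that pandas (before copy-on-write became the default) can raise when new columns are added to a column selection such as dayhub below. A narrower option is to take an explicit .copy() of the selection. This is a minimal sketch using two of the trips shown in this post; the names hub and subset are illustrative only.

```python
import warnings

import pandas as pd

# Two trips from this post, in the same column layout as hubway_df.
hub = pd.DataFrame({
    "starttime": ["2015-01-01 00:21:44", "2015-01-01 00:53:46"],
    "stoptime": ["2015-01-01 00:30:47", "2015-01-01 01:00:58"],
    "end station category": [4, 4],
})

# Record every warning raised while we subset and add a column.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # .copy() makes the subset an independent DataFrame, so assigning a new
    # column to it is unambiguous and leaves `hub` unchanged.
    subset = hub[["starttime", "stoptime"]].copy()
    subset["srt_time"] = pd.to_datetime(subset["starttime"])

# Collect any SettingWithCopyWarning that slipped through (expected: none).
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(copy_warnings)
```

With the copy in place, the global filter is no longer needed to keep the notebook output clean of this particular warning.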

Read the data CSV file and create a data frame

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
hubway_df = pd.read_csv('./hubway.csv')
hubway_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 946473 entries, 0 to 946472
Data columns (total 19 columns):
tripduration               946473 non-null int64
starttime                  946473 non-null object
stoptime                   946473 non-null object
start station id           946473 non-null int64
start station name         946473 non-null object
start station latitude     946473 non-null float64
start station longitude    946473 non-null float64
end station id             946473 non-null int64
end station name           946473 non-null object
end station latitude       946473 non-null float64
end station longitude      946473 non-null float64
bikeid                     946473 non-null int64
age decile                 946473 non-null int64
male                       946473 non-null int64
end station category       946473 non-null int64
start station category     946473 non-null int64
day_of_week                946473 non-null object
usertype_Customer          946473 non-null int64
usertype_Subscriber        946473 non-null int64
dtypes: float64(4), int64(10), object(5)
memory usage: 137.2+ MB
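hubway_df.info() reports about 137 MB for 19 columns, while this analysis only uses three of them. One option is to load just those columns and parse the timestamps while reading. The sketch below assumes the column names listed by info() above; it reads an in-memory stand-in for ./hubway.csv built from trips shown in this post, and its tripduration values are computed from the start and stop times for illustration only. The names NEEDED and trips are hypothetical.

```python
import io

import pandas as pd

# Columns this analysis actually uses (names as listed by hubway_df.info()).
NEEDED = ["starttime", "stoptime", "end station category"]

# Stand-in for ./hubway.csv: three trips from this post. The tripduration
# values are derived from start/stop times for illustration only.
sample_csv = io.StringIO(
    "tripduration,starttime,stoptime,end station category\n"
    "543,2015-01-01 00:21:44,2015-01-01 00:30:47,4\n"
    "432,2015-01-01 00:53:46,2015-01-01 01:00:58,4\n"
    "580,2015-01-04 14:29:05,2015-01-04 14:38:45,4\n"
)

# usecols skips the columns we do not need; parse_dates converts the two
# timestamp columns to datetime64 during the read.
trips = pd.read_csv(sample_csv, usecols=NEEDED, parse_dates=["starttime", "stoptime"])
print(trips.dtypes)
```

Against the real file, the same call would take './hubway.csv' in place of sample_csv.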

Create a data frame containing only the start times, stop times, and end station category

# .copy() makes dayhub independent of hubway_df, so adding columns to it below is safe
dayhub = hubway_df[['starttime', 'stoptime', 'end station category']].copy()
dayhub.head()

Convert the string columns to actual datetime values; the date information and the seconds are dropped in the next step

dayhub['srt_time'] = pd.to_datetime(dayhub['starttime'])
dayhub['stp_time'] = pd.to_datetime(dayhub['stoptime'])
dayhub.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 946473 entries, 0 to 946472
Data columns (total 5 columns):
starttime               946473 non-null object
stoptime                946473 non-null object
end station category    946473 non-null int64
srt_time                946473 non-null datetime64[ns]
stp_time                946473 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(1), object(2)
memory usage: 36.1+ MB
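Passing an explicit format to pd.to_datetime skips per-row format inference and raises an error on any string that does not match, instead of silently guessing. The format string below is an assumption based on the timestamps shown in this post (year-month-day, 24-hour clock, seconds); the sample values are stop times from the tail of dayhub.

```python
import pandas as pd

# Stop times as they appear in the stoptime column shown in this post.
raw = pd.Series(["2016-10-26 18:11:12", "2016-10-27 07:43:55", "2016-10-30 15:29:57"])

# Explicit format (assumed from the sample rows); errors="raise" is the
# default, so a malformed timestamp stops the conversion with an error.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")
print(parsed.dtype)
```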

# Keep only the time of day at which each trip ended, then zero out the seconds
dayhub['Time'] = [d.time() for d in dayhub['stp_time']]
dayhub['Time'] = dayhub['Time'].map(lambda x: x.replace(second=0))
dayhub.tail(5)
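The list comprehension and .map() above process each of the roughly 946,000 rows in Python. A vectorized alternative is to floor the timestamps to the minute with the .dt accessor and then take the time of day. For timestamps without fractional seconds, like those shown in this post, the two approaches should give the same values. The sample stop times come from the tail of dayhub shown in this post.

```python
import pandas as pd

stp_time = pd.to_datetime(pd.Series([
    "2016-10-26 18:11:12",
    "2016-10-27 07:43:55",
    "2016-10-30 15:29:57",
]))

# Vectorized: floor each timestamp to the minute, then keep the time of day.
time_of_day = stp_time.dt.floor("min").dt.time

# Row-by-row version, as in the cell above, for comparison.
loop_version = [d.time().replace(second=0) for d in stp_time]

print(list(time_of_day))
```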

Drop the unneeded columns, leaving only the end station category and the time

dayhub_vis = dayhub.drop(columns=['starttime', 'stoptime', 'srt_time', 'stp_time'])
dayhub_vis.tail()

dayhub_vis.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 946473 entries, 0 to 946472
Data columns (total 2 columns):
end station category    946473 non-null int64
Time                    946473 non-null object
dtypes: int64(1), object(1)
memory usage: 14.4+ MB

hubway_demand = dayhub_vis['Time'].value_counts()
dayhub_dmnd = pd.DataFrame(hubway_demand)
dayhub_dmnd.columns = ['Demand']
dayhub_dmnd.tail()
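Series.value_counts() orders its result by count, highest first, which is why dayhub_dmnd.tail() lists the quietest minutes rather than the latest ones. Sorting the index puts the demand table in time-of-day order for inspection. The small series below is illustrative input built from times shown in this post, and the name demand is hypothetical.

```python
import datetime

import pandas as pd

# Illustrative minute-of-day values (from the sample rows in this post).
times = pd.Series([
    datetime.time(18, 11), datetime.time(7, 43), datetime.time(15, 29),
    datetime.time(15, 29), datetime.time(13, 10),
])

# value_counts() sorts by frequency; sort_index() restores time-of-day order.
demand = times.value_counts().sort_index().to_frame("Demand")
print(demand)
```

to_frame("Demand") also names the column in one step, in place of building the DataFrame and renaming its columns separately.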

ax = dayhub_dmnd.plot(kind='line', title="Total demand over the course of the day",
                      figsize=(15, 10), legend=True, fontsize=12)

plt.show()

The highest demand falls during rush hour, which points to a commuter-type customer.
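Per-minute counts are noisy, so grouping trips by hour is one way to check the rush-hour reading numerically. This is a sketch, not the method used above: it bins trips by stop time, as the cells above do, and runs on ten stop times shown in this post, so the counts it prints say nothing about the full dataset.

```python
import pandas as pd

stp_time = pd.to_datetime(pd.Series([
    "2015-01-01 00:30:47", "2015-01-01 01:00:58", "2015-01-04 14:38:45",
    "2015-01-08 16:29:39", "2015-01-10 11:51:57", "2016-10-26 18:11:12",
    "2016-10-27 07:43:55", "2016-10-30 15:29:57", "2016-10-30 15:29:57",
    "2016-10-14 13:10:14",
]))

# Count trips by the hour (0-23) in which they ended; reindex so that hours
# with no trips appear as 0 instead of being missing.
hourly = stp_time.dt.hour.value_counts().reindex(range(24), fill_value=0)

# Busiest hour and its trip count.
print(hourly.idxmax(), hourly.max())
```

On the full dayhub['stp_time'] column, hourly.plot(kind='bar') would then draw one bar per hour of the day.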

Output of dayhub.head():

	starttime	stoptime	end station category
0	2015-01-01 00:21:44	2015-01-01 00:30:47	4
1	2015-01-01 00:53:46	2015-01-01 01:00:58	4
2	2015-01-04 14:29:05	2015-01-04 14:38:45	4
3	2015-01-08 16:17:04	2015-01-08 16:29:39	4
4	2015-01-10 11:40:49	2015-01-10 11:51:57	4

Output of dayhub.tail(5):

	starttime	stoptime	end station category	srt_time	stp_time	Time
946468	2016-10-26 18:07:59	2016-10-26 18:11:12	2	2016-10-26 18:07:59	2016-10-26 18:11:12	18:11:00
946469	2016-10-27 07:40:49	2016-10-27 07:43:55	2	2016-10-27 07:40:49	2016-10-27 07:43:55	07:43:00
946470	2016-10-30 15:21:23	2016-10-30 15:29:57	2	2016-10-30 15:21:23	2016-10-30 15:29:57	15:29:00
946471	2016-10-30 15:21:28	2016-10-30 15:29:57	2	2016-10-30 15:21:28	2016-10-30 15:29:57	15:29:00
946472	2016-10-14 13:05:51	2016-10-14 13:10:14	4	2016-10-14 13:05:51	2016-10-14 13:10:14	13:10:00

Output of dayhub_vis.tail():

	end station category	Time
946468	2	18:11:00
946469	2	07:43:00
946470	2	15:29:00
946471	2	15:29:00
946472	4	13:10:00

Output of dayhub_dmnd.tail():

	Demand
03:32:00	8
04:24:00	8
03:46:00	8
04:08:00	8
04:06:00	6

Erik Ellis // Technical Communications || Data Analytics

Wednesday, December 12, 2018
