Hubway Capstone: Preliminary Exploration of Data
Hubway is a bike-share program collectively owned by the metro Boston municipalities of Boston, Cambridge, Somerville, and Brookline. It is operated by Motivate, which manages similar initiatives in NYC, Portland, Chicago, Washington DC, and several other metro areas in Ohio, Tennessee, and New Jersey, and is opening operations in San Francisco in June 2017. Hubway currently runs a system of 188 stations with 1,800 bikes. For this project, I investigated shared data for the months of January, May, June, July, and October of 2015 and 2016.
Of concern were the following questions:
- How do riders use the bike-share service?
- Are the bikes used as a conveyance or for recreation?
- What type of customer uses the service?
Once the database has been created and verified, first observations are made about the data's structure and characteristics, such as outliers and null values. This is also the time to consider which hypotheses can and cannot be tested.
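Before the detailed checks below, a minimal sketch of that first pass; the check_conn and check_df names are just for this sketch, and the database path and table name are assumed to be the ones used in the following cells.
# Sketch of the initial structure checks described above.
import sqlite3
import pandas as pd

check_conn = sqlite3.connect('./database/hubway_full.db')
check_df = pd.read_sql('SELECT * FROM hubway_db', con=check_conn)
check_conn.close()

print(check_df.shape)                       # rows and columns
print(check_df.isnull().sum())              # null counts per column
print(check_df['tripduration'].describe())  # quick look for extreme trip durations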
Import libraries
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats   # scipy.stats is used for stats.mode below
Load SQL database
In [3]:
import sqlite3
hubway_db = './database/hubway_full.db'
conn = sqlite3.connect(hubway_db)
c = conn.cursor()
In [6]:
# hubway_db is just the file path string; descriptive statistics are computed on the dataframe later
pd.options.display.max_rows = 1000
Looking at outliers in tripduration that were detected on various dates
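For reference, the boxplots below flag any point beyond the whiskers as an outlier; a small sketch of that rule, assuming the whis=0.75 setting used in the plotting calls (the iqr_outliers name is just for this sketch).
# Sketch of the whisker rule the boxplots below rely on
# (whis=0.75 is carried over from the plotting calls; matplotlib's default is 1.5).
import numpy as np

def iqr_outliers(values, whis=0.75):
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - whis * iqr, q3 + whis * iqr
    return [v for v in values if v < lo or v > hi]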
October 12, 2016
In [13]:
oct_12 = pd.read_sql('SELECT * FROM hubway_db WHERE "stoptime" LIKE "2016-10-12%"', con=conn)
oct_12 = oct_12.set_index('stoptime')
oct_12 = oct_12.drop(['starttime', 'start station name', 'start station id',
'start station latitude',
'start station longitude', 'end station name',
'end station id',
'end station latitude',
'end station longitude',
'bikeid', 'usertype',
'birth year', 'gender'], axis=1)
oct_12_list = oct_12.tripduration.values.tolist()
oct_12_list
Out[13]:
Boxplot of the list (outliers hidden)
In [14]:
plt.figure()
plt.boxplot(oct_12_list, notch=False, sym='', vert=True, whis=0.75)
plt.show()
Boxplot showing the outlier points
In [15]:
plt.figure()
plt.boxplot(oct_12_list, notch=False, sym='rs', vert=True, whis=0.75)
plt.show()
January 14, 2016
In [16]:
jan_14 = pd.read_sql('SELECT * FROM hubway_db WHERE "stoptime" LIKE "2016-01-14%"', con=conn)
jan_14 = jan_14.set_index('stoptime')
jan_14 = jan_14.drop(['starttime', 'start station name', 'start station id',
'start station latitude',
'start station longitude', 'end station name',
'end station id',
'end station latitude',
'end station longitude',
'bikeid', 'usertype',
'birth year', 'gender'], axis=1)
jan_14_list = jan_14.tripduration.values.tolist()
In [18]:
plt.figure()
plt.boxplot(jan_14_list, notch=False, sym='', vert=True, whis=0.75)
plt.show()
In [27]:
plt.figure()
plt.boxplot(jan_14_list, notch=False, sym='rs', vert=True, whis=0.75)
Out[27]:
July 23, 2016
In [20]:
jul_23 = pd.read_sql('SELECT * FROM hubway_db WHERE "stoptime" LIKE "2016-07-23%"', con=conn)
jul_23 = jul_23.set_index('stoptime')
jul_23 = jul_23.drop(['starttime', 'start station name', 'start station id',
'start station latitude',
'start station longitude', 'end station name',
'end station id',
'end station latitude',
'end station longitude',
'bikeid', 'usertype',
'birth year', 'gender'], axis=1)
jul_23_list = jul_23.tripduration.values.tolist()
In [21]:
plt.figure()
plt.boxplot(jul_23_list, notch=False, sym='', vert=True, whis=0.75)
plt.show()
In [24]:
plt.figure()
plt.boxplot(jul_23_list, notch=False, sym='rs', vert=True, whis=0.75)
plt.show()
Look at descriptive statistics of the database
In [25]:
hubway_df = pd.read_sql('SELECT * FROM hubway_db', con=conn)
hubway_df.describe()
Out[25]:
Look at the bike ids
In [26]:
stats.mode(hubway_df.bikeid)
Out[26]:
The bike with id 351 is the most used, with 1,317 trips
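As a cross-check, the same answer can be read from a pandas value count; a sketch, assuming hubway_df is still in memory.
# Sketch: top bikes by trip count, confirming the mode above.
hubway_df['bikeid'].value_counts().head()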
In [28]:
most_traveled_bike_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "bikeid" LIKE 351''', con = conn)
most_traveled_bike_df.head()
Out[28]:
Look at Birth Year
In [30]:
hubway_df["birth year"].unique()
Out[30]:
In [9]:
hubway_df["birth year"].value_counts()
Out[9]:
Obviously the data needs some cleaning, both for null values and for customers inputting incorrect information
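One possible cleanup, sketched without altering the working dataframe; the '\\N' placeholder comes from the unique values above, and birth_year_clean is just a name for this sketch.
# Sketch only: coerce birth year to numeric so placeholder strings become NaN.
birth_year_clean = pd.to_numeric(hubway_df['birth year'], errors='coerce')
birth_year_clean.isnull().sum()   # rows that would need imputation or dropping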
Looking at the start station id
In [31]:
hubway_df["start station id"].unique()
Out[31]:
In [32]:
stats.mode(hubway_df["start station id"])
Out[32]:
Station id 22 (South Station) is the most used for beginning trips, accounting for 32,415 trips
In [34]:
most_visited_start_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "start station id" LIKE 22''', con = conn)
most_visited_start_df.head()
Out[34]:
Looking at the end station id
In [35]:
stats.mode(hubway_df["end station id"])
Out[35]:
The end station with id 22 is also the most used, ending 32,783 trips, just slightly more than the start station count.
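A sketch that lines up start and end trip counts per station in one frame, using the column names above; start_counts and end_counts are names introduced only for this sketch.
# Sketch: compare how often each station begins and ends trips.
start_counts = hubway_df['start station id'].value_counts()
end_counts = hubway_df['end station id'].value_counts()
pd.concat([start_counts, end_counts], axis=1, keys=['starts', 'ends']).sort_values('starts', ascending=False).head()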
In [36]:
most_visited_end_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "end station id" LIKE 22''', con = conn)
most_visited_end_df
Out[36]:
Looking at recreational or tourist riders
In [38]:
day_trip_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "end station id" LIKE "start station id"''', con = conn)
day_trip_df.describe()
Out[38]:
Looking for the recreational riders' most used start and end stations
In [39]:
stats.mode(day_trip_df["start station id"])
Out[39]:
In [40]:
stats.mode(day_trip_df["end station id"])
Out[40]:
The station most used for recreation is 58, The Esplanade, with 1,341 trips. This makes sense given the station's location.
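A quick sketch that ranks all loop-trip stations rather than taking only the mode, assuming day_trip_df from the cell above.
# Sketch: full ranking of stations where trips start and end at the same place.
day_trip_df['start station id'].value_counts().head(10)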
In [41]:
day_trip_station_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "start station id" LIKE 58 AND "end station id" LIKE 58''', con = conn)
day_trip_station_df.describe()
Out[41]:
In [42]:
day_trip_station_df.head(21)
Out[42]:
In [47]:
#Cleaning of the birth year feature actually occurred during the logistic regression,
#when the years were converted to an age decile.
#hubway_df["birth year"] = hubway_df["birth year"].str.replace('''\\\\N''', '1970')
#hubway_df["birth year"] = hubway_df["birth year"].str.replace('\0','0')
#hubway_df["birth year"].unique()
In [48]:
#stats.mode(hubway_df['birth year'])
In [49]:
#np.mean(hubway_df['birth year'])
In [50]:
#if hubway_df['birth year'] <= '1974':
#hubway_df['birth year'].replace(1974)
#hubway_df["birth year"] = pd.to_numeric(hubway_df["birth year"])
Cambridge has the highest volume of traffic
In [51]:
camb_trip_station_df = pd.read_sql('''SELECT * FROM hubway_db WHERE "start station name" LIKE "%Cambridge%" OR "end station name" LIKE "%Cambridge%"''', con = conn)
camb_trip_station_df.tail()
Out[51]:
These results aren't accurate; the SQL string needs to be refined. For example, Cambridge Street in Boston appears in the results.
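One possible refinement, sketched below: keep the name match but exclude "Cambridge St" addresses. This may also drop legitimate stations on Cambridge Street in Cambridge, so it is only a starting point; refined_query and camb_refined_df are names used only in this sketch.
# Sketch: exclude street-address matches for "Cambridge St" while keeping city matches.
refined_query = '''
SELECT * FROM hubway_db
WHERE ("start station name" LIKE '%Cambridge%' OR "end station name" LIKE '%Cambridge%')
  AND "start station name" NOT LIKE '%Cambridge St%'
  AND "end station name" NOT LIKE '%Cambridge St%'
'''
camb_refined_df = pd.read_sql(refined_query, con=conn)
camb_refined_df.tail()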
Questions to investigate:
Peak usage can be tied to rush-hour commuter traffic
Involves breaking the date, which is stored as a string, down into a time feature (a sketch follows below).
Other peaks in usage involve traveling to 'squares' that are both commercial and residential
Involves setting up a classification feature for station locations (addressed in the "Creation of categories" section below)
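A minimal sketch of the time-feature step referenced above, assuming starttime parses cleanly with pandas.to_datetime; start_hour is a name used only for this sketch.
# Sketch: derive an hour-of-day feature from the starttime string to look for rush-hour peaks.
start_hour = pd.to_datetime(hubway_df['starttime']).dt.hour
start_hour.value_counts().sort_index()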
In [53]:
#conn.close()
Creation of categories
Station categories will be classified as follows (a sketch of the resulting mapping appears after the list):
- 1: Residential side streets
- 2: Squares with businesses and residential elements
- 3: Recreational and tourist areas
- 4: Businesses (large enterprise) and institutions (hospitals, colleges, transit)
- 5: Major shopping areas and plazas
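For illustration only, the mapping is expected to look like the small frame below; the two example rows and the category column name are assumptions, and the real labels live in the hand-built cat_csv.csv read in later.
# Illustrative sketch of the intended station-to-category mapping (rows are assumptions).
example_categories = pd.DataFrame({
    'start station name': ['South Station', 'The Esplanade'],
    'start station category': [4, 3],
})
example_categories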
Isolate and export list of station names
In [61]:
cat_df = pd.Series(hubway_df['start station name'].unique(), name='start station name')  # Series so it can be exported to CSV below
In [63]:
#cat_csv = cat_df.to_csv(path_or_buf='./database/cat_csv.csv')
In [64]:
cat_db = pd.read_csv('./database/cat_csv.csv')
#cat_csv = pd.DataFrame(cat_csv)
type(cat_db)
Out[64]:
Join the category list to main dataframe and create new database
In [66]:
cat_db.head()
#hubway_cats_df = hubway_df.join(cat_csv, on='start station name', how='right')
#hubway_cats_df.tail(10)
#from sqlalchemy import create_engine
#engine = create_engine(r'sqlite:///C:\Users\Owner\Documents\Capstone\database\cats.db', echo=True)  # raw string avoids backslash escapes
#cats_db = cat_csv.to_sql(name = 'cats_db', con = engine, if_exists = 'replace', index = False)
Out[66]:
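For reference, a cleaned-up sketch of the commented-out export above, assuming sqlalchemy is installed; the relative path and table name mirror the original attempt, and cat_db is the category frame read in earlier.
# Sketch: write the category table to a new SQLite database with sqlalchemy + to_sql.
from sqlalchemy import create_engine

engine = create_engine('sqlite:///./database/cats.db', echo=False)
cat_db.to_sql(name='cats_db', con=engine, if_exists='replace', index=False)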
In [35]:
#hubway_start_cats_db = './database/cats.db'
#connct = sqlite3.connect('cats_db')
Create new table and merge categories to appropriate end station names
In [36]:
#Note: DDL such as CREATE TABLE is better run with conn.execute than pd.read_sql.
#hubway_cats_db = pd.read_sql('''CREATE TABLE cats_db
#    ("start station name" VARCHAR, "start station category" INTEGER)''', con=conn)
#hubway_cats_db
In [37]:
hubway_cats_df = pd.merge(hubway_df, cat_db, on='end station name')
In [38]:
hubway_cats_df.head()
Out[38]:
Perform the same operation with the start station names
In [39]:
cat2_db = pd.read_csv('./database/cat2_csv.csv')
#cat_csv = pd.DataFrame(cat_csv)
type(cat2_db)
Out[39]:
In [40]:
cat2_db.head()
Out[40]:
In [41]:
hubway_cats2_df = pd.merge(hubway_cats_df, cat2_db, on='start station name')
In [42]:
#hubway_cats2_df.to_csv(path_or_buf='./database/hubway_cats.csv')
In [43]:
hubway_cats2_df.head()
Out[43]:
Drop unneeded columns and check the dataframe's structure
In [44]:
#hubway_cats2_df = hubway_cats2_df.drop(columns=['Unnamed: 0_x', 'Unnamed: 0_y'])  # drop returns a copy, so assign it back
hubway_cats2_df.info()
Final save and close SQL connection
In [47]:
hubway_cats2_df.to_csv(path_or_buf='./database/hubway_cats.csv')
In [46]:
conn.close()