A Capstone Project for the Google Data Analytics Professional Certificate

Discover Chicago on Two Wheels with Cyclistic! 🚲

Ever wonder how Chicagoans and visitors zip around the city? At Cyclistic, we were curious too! That’s why we delved into a year-long exploration, diving deep into our ride data. Our mission? To understand the unique ways our riders experience the Windy City.

Guess what we found? Fascinating trends and patterns that are now steering our next big moves. Armed with these insights, we’re fine-tuning our services and gearing up our marketing game to give you an even smoother ride around Chicago. Because for us, it’s not just about getting from point A to B; it’s about enhancing your journey every pedal of the way.

Ready to be a part of Cyclistic’s exciting future in urban transportation? Hop on, and let’s ride the end together!

Marketing Analytics Team’s Objectives 🎯

The Marketing Analytics team has established a clear directive:

  • Goal: Design and implement marketing strategies with a focus on converting casual riders into annual members.

To realize this ambition, the team recognizes the need for a deeper understanding of several facets:

  • Distinctions between annual members and casual riders.
  • Incentives that might persuade casual riders to opt for a membership.
  • The influence of digital media on their marketing strategies.

In addition to the above, there’s a marked interest from the Marketing Analytics team in dissecting Cyclistic’s historical bike trip data to pinpoint pivotal trends.

My Hypothesis đź’ˇ

While the primary focus remains on converting casual riders into annual members, considering the significant presence of casual riders, I suggest a dual-strategy approach. Not only should we direct efforts towards conversion, but we must also amplify our engagement strategies targeting the casual rider segment. By resonating with the preferences and requirements of this substantial segment, we may uncover greater opportunities and ensure elevated user satisfaction across the board.

Sneak Peek 🔍

My analysis indicates that Cyclistic’s annual members are mainly workday commuters who prefer shorter rides. At the same time, casual riders opt for relaxed weekend explorations. As summer is the prime biking season, I recommend implementing strategies tailored to these insights, such as offering a limited Spring 2024 Pass, exclusive digital perks, and targeted SEO campaigns. By leveraging these insights, Cyclistic can convert casual riders into committed members. Check out the report for a more detailed analysis.

To learn more about other projects I have, you can visit my website ➡️ click here

Stakeholders

  • Director of Marketing
  • Marketing Analytics team
  • Cyclistic users

Setting up my environment

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## âś” dplyr     1.1.3     âś” readr     2.1.4
## âś” forcats   1.0.0     âś” stringr   1.5.0
## âś” ggplot2   3.4.3     âś” tibble    3.2.1
## âś” lubridate 1.9.3     âś” tidyr     1.3.0
## âś” purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## âś– dplyr::filter() masks stats::filter()
## âś– dplyr::lag()    masks stats::lag()
## â„ą Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(dplyr)
library(hms)
## 
## Attaching package: 'hms'
## 
## The following object is masked from 'package:lubridate':
## 
##     hms

Data Preparation đź’»

I’ve sourced 12 months of trip data for Cyclistic from July 2022 to June 2023. Each month’s dataset resides in its own .csv file. This first-party data, which is directly gathered and utilized by Cyclistic, is the backbone of my analysis. For this capstone, the data is licensed from “divvybikes.com,” an established bike-sharing service in Chicago. It’s worth noting that all riders’ personal information has been thoroughly anonymized for privacy reasons.

Loading .csv files

jun23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202306-divvy-tripdata.csv")
## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
may23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202305-divvy-tripdata.csv")
## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
apr23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202304-divvy-tripdata.csv")
## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
mar23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202303-divvy-tripdata.csv")
## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
feb23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202302-divvy-tripdata.csv")
## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
jan23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202301-divvy-tripdata.csv")
## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
dec22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202212-divvy-tripdata.csv")
## Rows: 181806 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
nov22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202211-divvy-tripdata.csv")
## Rows: 337735 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
oct22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202210-divvy-tripdata.csv")
## Rows: 558685 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
sep22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202209-divvy-publictripdata.csv")
## Rows: 701339 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
aug22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202208-divvy-tripdata.csv")
## Rows: 785932 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
jul22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202207-divvy-tripdata.csv")
## Rows: 823488 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## â„ą Use `spec()` to retrieve the full column specification for this data.
## â„ą Specify the column types or set `show_col_types = FALSE` to quiet this message.
For sep22_df, the file name deviates from the usual pattern.
However, after inspecting the dataset, the columns are consistent with other monthly datasets.

Data Credibility: The ROCCC Approach

  • Reliable: The dataset is both complete and accurate, encapsulating every bike ride in Chicago during our analysis period.

  • Original: The data has been sourced from the Google Data Analytics Capstone: Complete a Case Study.

  • Comprehensive: It provides a holistic view of each ride, detailing the start/end times, station names and IDs, membership types, and more.

  • Current: The dataset is recent, covering up to the end of June 2023.

  • Cited: We’ve adhered to data licensing terms, sourcing our information under the appropriate Data License Agreement.

Cleaning

Checking Dataset Column Consistency

In this section, I will verify if all the datasets have the same column structure. This ensures that they’re consistent and compatible for my further analysis.

all_same_columns <- all(
  identical(colnames(jun23_df), colnames(may23_df)),
  identical(colnames(jun23_df), colnames(apr23_df)),
  identical(colnames(jun23_df), colnames(mar23_df)),
  identical(colnames(jun23_df), colnames(feb23_df)),
  identical(colnames(jun23_df), colnames(jan23_df)),
  identical(colnames(jun23_df), colnames(dec22_df)),
  identical(colnames(jun23_df), colnames(nov22_df)),
  identical(colnames(jun23_df), colnames(oct22_df)),
  identical(colnames(jun23_df), colnames(sep22_df)),
  identical(colnames(jun23_df), colnames(aug22_df)),
  identical(colnames(jun23_df), colnames(jul22_df))
)

if (all_same_columns) {
  cat("All datasets have the same columns.\n")
} else {
  cat("Not all datasets have the same columns.\n")
}
## All datasets have the same columns.
All datasets have the same columns.

Let’s Dive Deep Into Cyclistic’s Data 🚴

Step 1: First thing’s first! Merging all our monthly data sets - from July 2022 to June 2023 - into a single dataframe.

cyclistic_df <- rbind(jun23_df, may23_df, apr23_df, mar23_df, feb23_df, jan23_df, dec22_df, nov22_df, oct22_df, sep22_df, aug22_df, jul22_df)

Step 2: Now, I’m zooming into the essential details. Let’s keep our data tidy by focusing only on the columns we really need.

trimmed_cyclistic_df <- select(cyclistic_df, rideable_type, started_at, ended_at, member_casual)

Step 3: Keeping things neat! I’ll clone our trimmed dataframe so that we can freely manipulate without messing the original.

cyclistic_new_df <- trimmed_cyclistic_df

Step 4: Time for some calculations! Figuring out how long each ride lasted.

cyclistic_new_df$ride_length <- difftime(cyclistic_new_df$ended_at, cyclistic_new_df$started_at, units = "mins")
cyclistic_new_df$ride_length <- round(cyclistic_new_df$ride_length, digits = 1)

Step 5: Knowledge is power, right? Let’s enrich our dataset with some juicy time details.

cyclistic_new_df$date <- as.Date(cyclistic_new_df$started_at) 
cyclistic_new_df$day_of_week <- format(as.Date(cyclistic_new_df$date),"%A") 
cyclistic_new_df$month <- format(as.Date(cyclistic_new_df$date), "%m") 
cyclistic_new_df$day <- format(as.Date(cyclistic_new_df$date), "%d") 
cyclistic_new_df$year <- format(as.Date(cyclistic_new_df$date), "%Y") 
cyclistic_new_df$time <- as_hms(cyclistic_new_df$started_at) 
cyclistic_new_df$hour <- hour(cyclistic_new_df$time) 

Step 6: A sneak peek! Let’s see the first few rows of our enriched data.

head(cyclistic_new_df)
## # A tibble: 6 Ă— 12
##   rideable_type started_at          ended_at            member_casual
##   <chr>         <dttm>              <dttm>              <chr>        
## 1 electric_bike 2023-06-05 13:34:12 2023-06-05 14:31:56 member       
## 2 electric_bike 2023-06-05 01:30:22 2023-06-05 01:33:06 member       
## 3 electric_bike 2023-06-20 18:15:49 2023-06-20 18:32:05 member       
## 4 electric_bike 2023-06-19 14:56:00 2023-06-19 15:00:35 member       
## 5 electric_bike 2023-06-19 15:03:34 2023-06-19 15:07:16 member       
## 6 electric_bike 2023-06-09 21:30:25 2023-06-09 21:49:52 member       
## # â„ą 8 more variables: ride_length <drtn>, date <date>, day_of_week <chr>,
## #   month <chr>, day <chr>, year <chr>, time <time>, hour <int>

Step 7: The final touch! Cleaning is essential, be it our rooms or our data. Let’s dust off any inconsistencies.

cyclistic_new_df <- na.omit(cyclistic_new_df) 
cyclistic_new_df <- distinct(cyclistic_new_df) 
cyclistic_new_df <- cyclistic_new_df[!(cyclistic_new_df$ride_length <=0),] 

There you have it! With these steps, we’ve paved a smooth path to dive into deeper analyses and visuals. Let’s roll!

Deep Dive: Analyzing the Ride Data

1. General Overview

overall_stats <- cyclistic_new_df %>% 
  summarise(
    total_rides = n(),
    avg_ride_length = mean(ride_length, na.rm = TRUE),
    busiest_time = hms::as_hms(as.numeric(names(which.max(table(hour)))) * 3600),
    busiest_weekday = names(which.max(table(day_of_week))),
    busiest_month = month.name[as.numeric(names(which.max(table(month))))],
    most_popular_bike = names(which.max(table(rideable_type))),
    most_rides = ifelse(sum(member_casual == "member") > sum(member_casual == "casual"), "Members", "Casual Riders")
  )
overall_stats
## # A tibble: 1 Ă— 7
##   total_rides avg_ride_length busiest_time busiest_weekday busiest_month
##         <int> <drtn>          <time>       <chr>           <chr>        
## 1     5768331 18.37589 mins   17:00        Saturday        July         
## # â„ą 2 more variables: most_popular_bike <chr>, most_rides <chr>

2. Breaking Down by Customer Type

stats_by_type <- cyclistic_new_df %>%
  group_by(member_casual) %>%
  summarise(
    total_rides = n(),
    avg_ride_length = mean(ride_length, na.rm = TRUE),
    busiest_time = names(which.max(table(hour))),
    busiest_weekday = names(which.max(table(day_of_week))),
    busiest_month = names(which.max(table(month))),
    most_popular_bike = names(which.max(table(rideable_type)))
  )
stats_by_type
## # A tibble: 2 Ă— 7
##   member_casual total_rides avg_ride_length busiest_time busiest_weekday
##   <chr>               <int> <drtn>          <chr>        <chr>          
## 1 casual            2240782 27.76864 mins   17           Saturday       
## 2 member            3527549 12.40939 mins   17           Wednesday      
## # â„ą 2 more variables: busiest_month <chr>, most_popular_bike <chr>

3. Weekday Analysis

avg_ride_by_weekday <- cyclistic_new_df %>%
  group_by(member_casual, day_of_week) %>%
  summarise(average_ride_length = mean(ride_length, na.rm = TRUE), .groups = 'drop') %>%
  spread(key = member_casual, value = average_ride_length) %>%
  arrange(desc(casual), desc(member))
avg_ride_by_weekday
## # A tibble: 7 Ă— 3
##   day_of_week casual        member       
##   <chr>       <drtn>        <drtn>       
## 1 Sunday      32.82831 mins 13.69855 mins
## 2 Saturday    32.39272 mins 13.98907 mins
## 3 Monday      27.21524 mins 11.79224 mins
## 4 Friday      26.95822 mins 12.33629 mins
## 5 Tuesday     24.24343 mins 11.86197 mins
## 6 Thursday    23.29346 mins 11.91361 mins
## 7 Wednesday   23.28850 mins 11.84987 mins

4. Members vs. Casual Riders

most_rides <- cyclistic_new_df %>%
  count(member_casual) %>%
  arrange(desc(n))
most_rides
## # A tibble: 2 Ă— 2
##   member_casual       n
##   <chr>           <int>
## 1 member        3527549
## 2 casual        2240782

Visualizing 🎨

Total Rides

stats_by_type <- stats_by_type %>%
  arrange(desc(member_casual)) %>%
  mutate(
    percent = total_rides / sum(total_rides) * 100,
    label = sprintf("%s\n%.1f%%", scales::comma(total_rides), percent),
    lab.y = cumsum(total_rides) - 0.5 * total_rides
  )

stats_by_type %>%
  ggplot(aes(x = "", y = total_rides, fill = member_casual)) +
  geom_bar(width = 1, stat = "identity") +
  geom_text(aes(label = label, y = lab.y), size = 4) +
  coord_polar(theta = "y") +
  labs(title = "Total Rides By Customer Type", fill = "Customer Type", caption = "Data represents rides from Jan to Dec 2023.") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

Average Ride Length

stats_by_type %>%
  ggplot(aes(x = member_casual, y = avg_ride_length, fill = member_casual)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = sprintf("%.2f mins", avg_ride_length)), vjust = -0.25, size = 4) +
  labs(title = "Average Ride Length By Customer Type",
       y = "Ride Length (minutes)",
       x = "Customer Type") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(face = "bold", size = 12))
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.

Weekday Analysis

Computing the average ride length by weekday for each member_casual category

avg_ride_by_weekday <- cyclistic_new_df %>%
  group_by(member_casual, day_of_week) %>%
  summarize(average_ride_length = mean(ride_length, na.rm = TRUE)) %>%
  arrange(member_casual, day_of_week)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

Plotting the average ride length by weekday

avg_ride_by_weekday %>%
  ggplot(aes(x = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), 
             y = average_ride_length, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.8) +
  labs(title = "Average Ride Length by Weekday", 
       y = "Average Ride Length (minutes)", 
       x = "") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())
## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.

Understanding Customer Behavior by Analyzing Bike Usage by Hour

Function to compute stats by hour

stats_by_time_function <- function(df) {
  df %>%
    group_by(hour, member_casual) %>%
    summarize(number_of_trips = n()) %>%
    arrange(hour, member_casual)
}

stats_by_time <- stats_by_time_function(cyclistic_new_df)
## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.

Plotting the results

stats_by_time %>%
  ggplot(aes(x = hour, y = number_of_trips, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(title = "Bike Demand by Hour", 
       y = "Number of Rides", 
       x = "Hour of Day") +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())

Understanding Customer Behavior by Analyzing Bike Usage by Day of the Week

Function to compute stats by weekday
stats_by_weekday_function <- function(df) {
  df %>%
    group_by(day_of_week, member_casual) %>%
    summarize(number_of_rides = n(), .groups = "drop") %>%
    mutate(day_of_week = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))) %>%
    arrange(day_of_week)
}

Function to plot stats by weekday

plot_by_weekday <- function(data) {
  data %>%
    ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
    geom_col(position = "dodge", width = 0.7) +
    labs(title = "Bike Rides by Day of the Week", 
         y = "Number of Rides", 
         x = "Day of the Week") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.title = element_blank())
}

Compute stats and plot

weekday_data <- stats_by_weekday_function(cyclistic_new_df)
plot_by_weekday(weekday_data)

Understanding Customer Behavior by Analyzing Bike Usage by Month of the Year

Function to compute stats by month

stats_by_month_function <- function(df) {
  df %>%
    group_by(month, member_casual) %>%
    summarize(number_of_rides = n(), .groups = "drop") %>%
    arrange(match(month, month.name), member_casual)
}

Compute stats

stats_by_month_data <- stats_by_month_function(cyclistic_new_df)

Plotting the results

stats_by_month_data %>%
  ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(title = "Bike Rides by Month of the Year", 
       y = "Number of Rides", 
       x = "Month") +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())

Key Findings for Cyclistic Bike Share Usage 🔑

My analysis of Cyclistic’s bike share data has revealed interesting usage patterns:

  • Ride Frequency: The bulk of the rides are dominated by members, contributing to 61.2% of the total. In comparison, casual riders are responsible for 38.8% of the rides. This skew underscores the loyalty and habitual usage patterns of members.

  • Ride Duration Disparity: Casual riders are not in a hurry, it seems! Their rides average 27.77 minutes, a leisurely pace in contrast to the brisk 12.41 minutes that members spend on average for their trips. The shorter rides of members may hint at utilitarian trips, while casual users indulge in longer, exploratory rides.

  • Peak Usage Times: A synchrony is observed in the usage patterns of both member and casual riders around 17:00 (5 pm). This popular window may coincide with the end of a workday, an opportune moment for a ride.

  • Weekday Analysis: The weekdays are bustling for members, suggesting a potential alignment with work-related commutes. On the flip side, casual riders are most active during the weekends, painting a picture of recreational usage.

  • Monthly Trends: Summer is the reigning champion for both categories of riders. June, July, and August see the highest usage. A noteworthy mention is the continued preference of members for biking even into September, possibly stretching their work-related commutes a bit more before winter sets in.

Differences in Usage Between Annual Members and Casual Riders

Frequency & Duration

  • Members: Represent 61.2% of total rides with brisk, purpose-driven rides averaging 12.41 minutes.
  • Casual Riders: Constitute 38.8% of rides, indulging in more leisurely rides, averaging 27.77 minutes.

Peak Times & Weekday Patterns

  • Shared Peak Time: Both groups converge in usage around 17:00 (5 pm).
  • Usage Pattern: Members lean towards weekdays, likely for commuting. Casual riders are active on weekends, suggesting recreational intent.

Recommendations for Encouraging Casual Riders to Purchase Annual Memberships

Cost Efficiency for Regular Riders

  • Spring 2024 Pass: A limited three-month discounted pass for spring can attract casual riders by demonstrating value over single-use fares.

Weekend Incentives

  • Weekend Wanderer Pass: Casual riders often use the service during weekends, averaging 27.7 minutes per ride. A specialized weekend pass can extend their rides without time concerns, increasing brand affinity.

Exclusive Benefits & Perks

  • Member-Only Rewards: By introducing exclusive perks like special bikes, discounts, or early access events for members, Cyclistic can create a compelling case for casual riders to commit to longer-term memberships.

How can Cyclistic use digital media to influence casual riders to become members?

Targeted Advertising

Amplify brand awareness with targeted campaigns at high-frequency stations during peak hours (5 p.m.) and summer months.

Engaging Content

  • Weekend Wonders: Collaborate with renowned influencers to spotlight Chicago’s prime locations best explored by bike.

  • Healthy Commutes: Promote the health benefits of daily bike commuting.

  • Transformational Tales: Share impactful stories of individuals whose lives transformed positively through regular cycling.

Exclusive Digital Offers

Unveil time-limited membership deals targeting casual riders on emails, social media, and partner apps.

Seasonal Campaigns

Capitalize on summer’s allure with offers such as “Summer Special 2024” or September’s “Ride into Autumn 2024” to captivate casual riders.

Interactive Tools

Launch a “Savings Calculator” online, empowering casual riders to visualize potential savings with memberships.

SEO Strategy

  • Search Intent: Boost organic reach with an SEO campaign focusing on queries about cycling in Chicago.

  • Engaging Queries: Address questions like “Best bike spots in Chicago” or “Chicago bike rental costs” to build a memorable brand presence.