Cyclistic_Case_Study_Using

A Capstone Project for the Google Data Analytics Professional Certificate

Discover Chicago on Two Wheels with Cyclistic! 🚲

Ever wonder how Chicagoans and visitors zip around the city? At Cyclistic, we were curious too! That’s why we delved into a year-long exploration, diving deep into our ride data. Our mission? To understand the unique ways our riders experience the Windy City.

Guess what we found? Fascinating trends and patterns that are now steering our next big moves. Armed with these insights, we’re fine-tuning our services and gearing up our marketing game to give you an even smoother ride around Chicago. Because for us, it’s not just about getting from point A to B; it’s about enhancing your journey every pedal of the way.

Ready to be a part of Cyclistic’s exciting future in urban transportation? Hop on, and let’s ride the end together!

Marketing Analytics Team’s Objectives 🎯

The Marketing Analytics team has established a clear directive:

Goal: Design and implement marketing strategies with a focus on converting casual riders into annual members.

To realize this ambition, the team recognizes the need for a deeper understanding of several facets:

Distinctions between annual members and casual riders.
Incentives that might persuade casual riders to opt for a membership.
The influence of digital media on their marketing strategies.

In addition to the above, there’s a marked interest from the Marketing Analytics team in dissecting Cyclistic’s historical bike trip data to pinpoint pivotal trends.

My Hypothesis 💡

While the primary focus remains on converting casual riders into annual members, considering the significant presence of casual riders, I suggest a dual-strategy approach. Not only should we direct efforts towards conversion, but we must also amplify our engagement strategies targeting the casual rider segment. By resonating with the preferences and requirements of this substantial segment, we may uncover greater opportunities and ensure elevated user satisfaction across the board.

Sneak Peek 🔍

My analysis indicates that Cyclistic’s annual members are mainly workday commuters who prefer shorter rides. At the same time, casual riders opt for relaxed weekend explorations. As summer is the prime biking season, I recommend implementing strategies tailored to these insights, such as offering a limited Spring 2024 Pass, exclusive digital perks, and targeted SEO campaigns. By leveraging these insights, Cyclistic can convert casual riders into committed members. Check out the report for a more detailed analysis.

To learn more about other projects I have, you can visit my website ➡️ click here

Stakeholders

Director of Marketing
Marketing Analytics team
Cyclistic users

Setting up my environment

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate)
library(ggplot2)
library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(dplyr)
library(hms)

## 
## Attaching package: 'hms'
## 
## The following object is masked from 'package:lubridate':
## 
##     hms

Data Preparation 💻

I’ve sourced 12 months of trip data for Cyclistic from July 2022 to June 2023. Each month’s dataset resides in its own .csv file. This first-party data, which is directly gathered and utilized by Cyclistic, is the backbone of my analysis. For this capstone, the data is licensed from “divvybikes.com,” an established bike-sharing service in Chicago. It’s worth noting that all riders’ personal information has been thoroughly anonymized for privacy reasons.

Loading .csv files

jun23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202306-divvy-tripdata.csv")

## Rows: 719618 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

may23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202305-divvy-tripdata.csv")

## Rows: 604827 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

apr23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202304-divvy-tripdata.csv")

## Rows: 426590 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

mar23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202303-divvy-tripdata.csv")

## Rows: 258678 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

feb23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202302-divvy-tripdata.csv")

## Rows: 190445 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

jan23_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202301-divvy-tripdata.csv")

## Rows: 190301 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dec22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202212-divvy-tripdata.csv")

## Rows: 181806 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

nov22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202211-divvy-tripdata.csv")

## Rows: 337735 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

oct22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202210-divvy-tripdata.csv")

## Rows: 558685 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

sep22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202209-divvy-publictripdata.csv")

## Rows: 701339 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

aug22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202208-divvy-tripdata.csv")

## Rows: 785932 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

jul22_df <- read_csv("C:/Users/irmin/OneDrive/Desktop/Case_Study_R/202207-divvy-tripdata.csv")

## Rows: 823488 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

For sep22_df, the file name deviates from the usual pattern.

However, after inspecting the dataset, the columns are consistent with other monthly datasets.

Data Credibility: The ROCCC Approach

Reliable: The dataset is both complete and accurate, encapsulating every bike ride in Chicago during our analysis period.
Original: The data has been sourced from the Google Data Analytics Capstone: Complete a Case Study.
Comprehensive: It provides a holistic view of each ride, detailing the start/end times, station names and IDs, membership types, and more.
Current: The dataset is recent, covering up to the end of June 2023.
Cited: We’ve adhered to data licensing terms, sourcing our information under the appropriate Data License Agreement.

Cleaning

Checking Dataset Column Consistency

In this section, I will verify if all the datasets have the same column structure. This ensures that they’re consistent and compatible for my further analysis.

all_same_columns <- all(
  identical(colnames(jun23_df), colnames(may23_df)),
  identical(colnames(jun23_df), colnames(apr23_df)),
  identical(colnames(jun23_df), colnames(mar23_df)),
  identical(colnames(jun23_df), colnames(feb23_df)),
  identical(colnames(jun23_df), colnames(jan23_df)),
  identical(colnames(jun23_df), colnames(dec22_df)),
  identical(colnames(jun23_df), colnames(nov22_df)),
  identical(colnames(jun23_df), colnames(oct22_df)),
  identical(colnames(jun23_df), colnames(sep22_df)),
  identical(colnames(jun23_df), colnames(aug22_df)),
  identical(colnames(jun23_df), colnames(jul22_df))
)

if (all_same_columns) {
  cat("All datasets have the same columns.\n")
} else {
  cat("Not all datasets have the same columns.\n")
}

## All datasets have the same columns.

All datasets have the same columns.

Let’s Dive Deep Into Cyclistic’s Data 🚴

Step 1: First thing’s first! Merging all our monthly data sets - from July 2022 to June 2023 - into a single dataframe.

cyclistic_df <- rbind(jun23_df, may23_df, apr23_df, mar23_df, feb23_df, jan23_df, dec22_df, nov22_df, oct22_df, sep22_df, aug22_df, jul22_df)

Step 2: Now, I’m zooming into the essential details. Let’s keep our data tidy by focusing only on the columns we really need.

trimmed_cyclistic_df <- select(cyclistic_df, rideable_type, started_at, ended_at, member_casual)

Step 3: Keeping things neat! I’ll clone our trimmed dataframe so that we can freely manipulate without messing the original.

cyclistic_new_df <- trimmed_cyclistic_df

Step 4: Time for some calculations! Figuring out how long each ride lasted.

cyclistic_new_df$ride_length <- difftime(cyclistic_new_df$ended_at, cyclistic_new_df$started_at, units = "mins")
cyclistic_new_df$ride_length <- round(cyclistic_new_df$ride_length, digits = 1)

Step 5: Knowledge is power, right? Let’s enrich our dataset with some juicy time details.

cyclistic_new_df$date <- as.Date(cyclistic_new_df$started_at) 
cyclistic_new_df$day_of_week <- format(as.Date(cyclistic_new_df$date),"%A") 
cyclistic_new_df$month <- format(as.Date(cyclistic_new_df$date), "%m") 
cyclistic_new_df$day <- format(as.Date(cyclistic_new_df$date), "%d") 
cyclistic_new_df$year <- format(as.Date(cyclistic_new_df$date), "%Y") 
cyclistic_new_df$time <- as_hms(cyclistic_new_df$started_at) 
cyclistic_new_df$hour <- hour(cyclistic_new_df$time)

Step 6: A sneak peek! Let’s see the first few rows of our enriched data.

head(cyclistic_new_df)

## # A tibble: 6 × 12
##   rideable_type started_at          ended_at            member_casual
##   <chr>         <dttm>              <dttm>              <chr>        
## 1 electric_bike 2023-06-05 13:34:12 2023-06-05 14:31:56 member       
## 2 electric_bike 2023-06-05 01:30:22 2023-06-05 01:33:06 member       
## 3 electric_bike 2023-06-20 18:15:49 2023-06-20 18:32:05 member       
## 4 electric_bike 2023-06-19 14:56:00 2023-06-19 15:00:35 member       
## 5 electric_bike 2023-06-19 15:03:34 2023-06-19 15:07:16 member       
## 6 electric_bike 2023-06-09 21:30:25 2023-06-09 21:49:52 member       
## # ℹ 8 more variables: ride_length <drtn>, date <date>, day_of_week <chr>,
## #   month <chr>, day <chr>, year <chr>, time <time>, hour <int>

Step 7: The final touch! Cleaning is essential, be it our rooms or our data. Let’s dust off any inconsistencies.

cyclistic_new_df <- na.omit(cyclistic_new_df) 
cyclistic_new_df <- distinct(cyclistic_new_df) 
cyclistic_new_df <- cyclistic_new_df[!(cyclistic_new_df$ride_length <=0),]

There you have it! With these steps, we’ve paved a smooth path to dive into deeper analyses and visuals. Let’s roll!

Deep Dive: Analyzing the Ride Data

1. General Overview

overall_stats <- cyclistic_new_df %>% 
  summarise(
    total_rides = n(),
    avg_ride_length = mean(ride_length, na.rm = TRUE),
    busiest_time = hms::as_hms(as.numeric(names(which.max(table(hour)))) * 3600),
    busiest_weekday = names(which.max(table(day_of_week))),
    busiest_month = month.name[as.numeric(names(which.max(table(month))))],
    most_popular_bike = names(which.max(table(rideable_type))),
    most_rides = ifelse(sum(member_casual == "member") > sum(member_casual == "casual"), "Members", "Casual Riders")
  )
overall_stats

## # A tibble: 1 × 7
##   total_rides avg_ride_length busiest_time busiest_weekday busiest_month
##         <int> <drtn>          <time>       <chr>           <chr>        
## 1     5768331 18.37589 mins   17:00        Saturday        July         
## # ℹ 2 more variables: most_popular_bike <chr>, most_rides <chr>

2. Breaking Down by Customer Type

stats_by_type <- cyclistic_new_df %>%
  group_by(member_casual) %>%
  summarise(
    total_rides = n(),
    avg_ride_length = mean(ride_length, na.rm = TRUE),
    busiest_time = names(which.max(table(hour))),
    busiest_weekday = names(which.max(table(day_of_week))),
    busiest_month = names(which.max(table(month))),
    most_popular_bike = names(which.max(table(rideable_type)))
  )
stats_by_type

## # A tibble: 2 × 7
##   member_casual total_rides avg_ride_length busiest_time busiest_weekday
##   <chr>               <int> <drtn>          <chr>        <chr>          
## 1 casual            2240782 27.76864 mins   17           Saturday       
## 2 member            3527549 12.40939 mins   17           Wednesday      
## # ℹ 2 more variables: busiest_month <chr>, most_popular_bike <chr>

3. Weekday Analysis

avg_ride_by_weekday <- cyclistic_new_df %>%
  group_by(member_casual, day_of_week) %>%
  summarise(average_ride_length = mean(ride_length, na.rm = TRUE), .groups = 'drop') %>%
  spread(key = member_casual, value = average_ride_length) %>%
  arrange(desc(casual), desc(member))
avg_ride_by_weekday

## # A tibble: 7 × 3
##   day_of_week casual        member       
##   <chr>       <drtn>        <drtn>       
## 1 Sunday      32.82831 mins 13.69855 mins
## 2 Saturday    32.39272 mins 13.98907 mins
## 3 Monday      27.21524 mins 11.79224 mins
## 4 Friday      26.95822 mins 12.33629 mins
## 5 Tuesday     24.24343 mins 11.86197 mins
## 6 Thursday    23.29346 mins 11.91361 mins
## 7 Wednesday   23.28850 mins 11.84987 mins

4. Members vs. Casual Riders

most_rides <- cyclistic_new_df %>%
  count(member_casual) %>%
  arrange(desc(n))
most_rides

## # A tibble: 2 × 2
##   member_casual       n
##   <chr>           <int>
## 1 member        3527549
## 2 casual        2240782

Visualizing 🎨

Total Rides

stats_by_type <- stats_by_type %>%
  arrange(desc(member_casual)) %>%
  mutate(
    percent = total_rides / sum(total_rides) * 100,
    label = sprintf("%s\n%.1f%%", scales::comma(total_rides), percent),
    lab.y = cumsum(total_rides) - 0.5 * total_rides
  )

stats_by_type %>%
  ggplot(aes(x = "", y = total_rides, fill = member_casual)) +
  geom_bar(width = 1, stat = "identity") +
  geom_text(aes(label = label, y = lab.y), size = 4) +
  coord_polar(theta = "y") +
  labs(title = "Total Rides By Customer Type", fill = "Customer Type", caption = "Data represents rides from Jan to Dec 2023.") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )

Average Ride Length

stats_by_type %>%
  ggplot(aes(x = member_casual, y = avg_ride_length, fill = member_casual)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = sprintf("%.2f mins", avg_ride_length)), vjust = -0.25, size = 4) +
  labs(title = "Average Ride Length By Customer Type",
       y = "Ride Length (minutes)",
       x = "Customer Type") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(face = "bold", size = 12))

## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.

Weekday Analysis

Computing the average ride length by weekday for each member_casual category

avg_ride_by_weekday <- cyclistic_new_df %>%
  group_by(member_casual, day_of_week) %>%
  summarize(average_ride_length = mean(ride_length, na.rm = TRUE)) %>%
  arrange(member_casual, day_of_week)

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

Plotting the average ride length by weekday

avg_ride_by_weekday %>%
  ggplot(aes(x = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), 
             y = average_ride_length, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.8) +
  labs(title = "Average Ride Length by Weekday", 
       y = "Average Ride Length (minutes)", 
       x = "") +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())

## Don't know how to automatically pick scale for object of type <difftime>.
## Defaulting to continuous.

Understanding Customer Behavior by Analyzing Bike Usage by Hour

Function to compute stats by hour

stats_by_time_function <- function(df) {
  df %>%
    group_by(hour, member_casual) %>%
    summarize(number_of_trips = n()) %>%
    arrange(hour, member_casual)
}

stats_by_time <- stats_by_time_function(cyclistic_new_df)

## `summarise()` has grouped output by 'hour'. You can override using the
## `.groups` argument.

Plotting the results

stats_by_time %>%
  ggplot(aes(x = hour, y = number_of_trips, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(title = "Bike Demand by Hour", 
       y = "Number of Rides", 
       x = "Hour of Day") +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())

Understanding Customer Behavior by Analyzing Bike Usage by Day of the Week

Function to compute stats by weekday

stats_by_weekday_function <- function(df) {
  df %>%
    group_by(day_of_week, member_casual) %>%
    summarize(number_of_rides = n(), .groups = "drop") %>%
    mutate(day_of_week = factor(day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))) %>%
    arrange(day_of_week)
}

Function to plot stats by weekday

plot_by_weekday <- function(data) {
  data %>%
    ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
    geom_col(position = "dodge", width = 0.7) +
    labs(title = "Bike Rides by Day of the Week", 
         y = "Number of Rides", 
         x = "Day of the Week") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          plot.title = element_text(hjust = 0.5),
          legend.title = element_blank())
}

Compute stats and plot

weekday_data <- stats_by_weekday_function(cyclistic_new_df)
plot_by_weekday(weekday_data)

Understanding Customer Behavior by Analyzing Bike Usage by Month of the Year

Function to compute stats by month

stats_by_month_function <- function(df) {
  df %>%
    group_by(month, member_casual) %>%
    summarize(number_of_rides = n(), .groups = "drop") %>%
    arrange(match(month, month.name), member_casual)
}

Compute stats

stats_by_month_data <- stats_by_month_function(cyclistic_new_df)

Plotting the results

stats_by_month_data %>%
  ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(title = "Bike Rides by Month of the Year", 
       y = "Number of Rides", 
       x = "Month") +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c("casual" = "#B3DDF2", "member" = "#3F81B6")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_blank())

Differences in Usage Between Annual Members and Casual Riders

Frequency & Duration

Members: Represent 61.2% of total rides with brisk, purpose-driven rides averaging 12.41 minutes.
Casual Riders: Constitute 38.8% of rides, indulging in more leisurely rides, averaging 27.77 minutes.

Peak Times & Weekday Patterns

Shared Peak Time: Both groups converge in usage around 17:00 (5 pm).
Usage Pattern: Members lean towards weekdays, likely for commuting. Casual riders are active on weekends, suggesting recreational intent.

Seasonal Trends

Summer Surge: Summer sees a ride surge for both categories. Members, however, persist with high usage into September.

Recommendations for Encouraging Casual Riders to Purchase Annual Memberships

Cost Efficiency for Regular Riders

Spring 2024 Pass: A limited three-month discounted pass for spring can attract casual riders by demonstrating value over single-use fares.

Weekend Incentives

Weekend Wanderer Pass: Casual riders often use the service during weekends, averaging 27.7 minutes per ride. A specialized weekend pass can extend their rides without time concerns, increasing brand affinity.

Exclusive Benefits & Perks

Member-Only Rewards: By introducing exclusive perks like special bikes, discounts, or early access events for members, Cyclistic can create a compelling case for casual riders to commit to longer-term memberships.

How can Cyclistic use digital media to influence casual riders to become members?

Targeted Advertising

Amplify brand awareness with targeted campaigns at high-frequency stations during peak hours (5 p.m.) and summer months.

Engaging Content

Weekend Wonders: Collaborate with renowned influencers to spotlight Chicago’s prime locations best explored by bike.
Healthy Commutes: Promote the health benefits of daily bike commuting.
Transformational Tales: Share impactful stories of individuals whose lives transformed positively through regular cycling.

Exclusive Digital Offers

Unveil time-limited membership deals targeting casual riders on emails, social media, and partner apps.

Seasonal Campaigns

Capitalize on summer’s allure with offers such as “Summer Special 2024” or September’s “Ride into Autumn 2024” to captivate casual riders.

Interactive Tools

Launch a “Savings Calculator” online, empowering casual riders to visualize potential savings with memberships.

SEO Strategy

Search Intent: Boost organic reach with an SEO campaign focusing on queries about cycling in Chicago.
Engaging Queries: Address questions like “Best bike spots in Chicago” or “Chicago bike rental costs” to build a memorable brand presence.

Cyclistic_Case_Study_Using_R

Irmin Corona

2023-10-10

A Capstone Project for the Google Data Analytics Professional Certificate

Discover Chicago on Two Wheels with Cyclistic! 🚲

Marketing Analytics Team’s Objectives 🎯

My Hypothesis 💡

Sneak Peek 🔍

To learn more about other projects I have, you can visit my website ➡️ click here

Stakeholders

Setting up my environment

Data Preparation 💻

Loading .csv files

For sep22_df, the file name deviates from the usual pattern.

However, after inspecting the dataset, the columns are consistent with other monthly datasets.

Data Credibility: The ROCCC Approach

Cleaning

Checking Dataset Column Consistency

All datasets have the same columns.

Let’s Dive Deep Into Cyclistic’s Data 🚴

Step 1: First thing’s first! Merging all our monthly data sets - from July 2022 to June 2023 - into a single dataframe.

Step 2: Now, I’m zooming into the essential details. Let’s keep our data tidy by focusing only on the columns we really need.

Step 3: Keeping things neat! I’ll clone our trimmed dataframe so that we can freely manipulate without messing the original.

Step 4: Time for some calculations! Figuring out how long each ride lasted.

Step 5: Knowledge is power, right? Let’s enrich our dataset with some juicy time details.

Step 6: A sneak peek! Let’s see the first few rows of our enriched data.

Step 7: The final touch! Cleaning is essential, be it our rooms or our data. Let’s dust off any inconsistencies.

There you have it! With these steps, we’ve paved a smooth path to dive into deeper analyses and visuals. Let’s roll!

Deep Dive: Analyzing the Ride Data

1. General Overview

2. Breaking Down by Customer Type

3. Weekday Analysis

4. Members vs. Casual Riders

Visualizing 🎨

Total Rides

Average Ride Length

Weekday Analysis

Computing the average ride length by weekday for each member_casual category

Plotting the average ride length by weekday

Understanding Customer Behavior by Analyzing Bike Usage by Hour

Function to compute stats by hour

Plotting the results

Understanding Customer Behavior by Analyzing Bike Usage by Day of the Week

Function to compute stats by weekday

Function to plot stats by weekday

Compute stats and plot

Understanding Customer Behavior by Analyzing Bike Usage by Month of the Year

Function to compute stats by month

Compute stats

Plotting the results

Key Findings for Cyclistic Bike Share Usage 🔑

My analysis of Cyclistic’s bike share data has revealed interesting usage patterns:

Differences in Usage Between Annual Members and Casual Riders

Frequency & Duration

Peak Times & Weekday Patterns

Seasonal Trends

Recommendations for Encouraging Casual Riders to Purchase Annual Memberships

Cost Efficiency for Regular Riders

Weekend Incentives

Exclusive Benefits & Perks

How can Cyclistic use digital media to influence casual riders to become members?

Targeted Advertising

Engaging Content

Exclusive Digital Offers

Seasonal Campaigns

Interactive Tools

SEO Strategy