NYC Taxi Demand Forecasting

Time series analysis and forecasting of New York City taxi demand using ETS models in R

Timeline: 1 week

Status: Completed

Figure: ETS model forecast, 30-day NYC taxi demand prediction

Tags: R, tidyverse, forecast, tseries, lubridate, Time Series, ETS Models, Statistical Analysis

Project Overview

A time series forecasting project that predicts New York City taxi demand from historical NYC Taxi and Limousine Commission data. The analysis is done in R and covers preprocessing, exploratory analysis, stationarity testing, autocorrelation analysis, and exponential smoothing (ETS) modeling.

Business Impact: Accurate taxi demand forecasting helps optimize fleet management, reduce wait times, and improve overall transportation efficiency in urban environments.
          

Introduction

Problem Statement: New York City's taxi industry faces challenges in matching supply with fluctuating demand. Accurate forecasting enables better resource allocation and improved customer service.

Dataset: The dataset contains 10,320 observations of taxi demand recorded at 30-minute intervals, including timestamps and demand values ranging from 8 to 39,197 rides per interval.

Methodology: Implemented in R, using tidyverse for data manipulation, lubridate for date-time handling, the forecast package for ETS modeling, and tseries for stationarity testing.
          

Data Preprocessing

Data Transformation

Converted timestamp strings to datetime format
Aggregated the 30-minute interval data to the daily level for trend analysis (48 intervals per day, so 10,320 / 48 = 215 daily observations if every day is complete)
Handled time series conversion with proper frequency settings
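
The transformation steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual script: the column names `timestamp` and `value`, the start date, the use of daily sums, and the weekly frequency of 7 are all assumptions.

```r
library(dplyr)
library(lubridate)

# Synthetic stand-in for the raw data: 3 days of 30-minute readings.
# The column names `timestamp` and `value` are assumptions.
set.seed(1)
stamps <- seq(ymd_hms("2014-07-01 00:00:00"), by = "30 min", length.out = 48 * 3)
raw <- tibble(
  timestamp = format(stamps, "%Y-%m-%d %H:%M:%S"),  # stored as strings
  value     = rpois(48 * 3, lambda = 15000)
)

# 1. Parse the timestamp strings. 2. Sum the 48 intervals of each day.
daily <- raw %>%
  mutate(timestamp = ymd_hms(timestamp),
         date      = as_date(timestamp)) %>%
  group_by(date) %>%
  summarise(demand = sum(value), .groups = "drop")

# 3. Convert to a ts object; frequency = 7 (weekly cycle) is an assumption.
daily_ts <- ts(daily$demand, frequency = 7)
print(daily)
```

Summing (rather than averaging) keeps the daily series in units of rides per day; swap `sum` for `mean` if an average per interval is preferred.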

Key Libraries Used

# Loading essential R libraries
library(tidyverse)    # Data manipulation and visualization
library(lubridate)    # Date-time operations
library(forecast)     # Time series forecasting
library(tseries)      # Stationarity testing
library(zoo)          # Time series objects

Exploratory Data Analysis

The time series visualization revealed important patterns in NYC taxi demand:

Long-term trends with demand fluctuations over time
Seasonal patterns with repeating high and low demand periods
Short-term spikes potentially due to external factors like weather or events

Visualization: Line plots of the daily aggregated data, built with ggplot2, were used to identify trends, seasonality, and anomalies.
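
A line plot of this kind might look like the sketch below. The data are synthetic (a weekly cycle plus noise and one artificial dip), and the start date, column names, and styling are assumptions rather than the project's actual plot code.

```r
library(ggplot2)

# Synthetic daily demand: 215 days with a weekly cycle and one artificial dip.
set.seed(42)
n <- 215
daily <- data.frame(
  date   = seq(as.Date("2014-07-01"), by = "day", length.out = n),
  demand = 700000 + 50000 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 30000)
)
daily$demand[100] <- daily$demand[100] * 0.5  # artificial anomaly

# Raw series as a line, plus a loess curve to make slow movements visible.
p <- ggplot(daily, aes(x = date, y = demand)) +
  geom_line(colour = "steelblue") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, colour = "firebrick") +
  labs(title = "Daily taxi demand (synthetic)", x = "Date", y = "Rides per day")
print(p)
```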
          

Stationarity Testing

Augmented Dickey-Fuller (ADF) Test

Null Hypothesis: Time series is non-stationary
Result: p-value = 0.0334
Conclusion: Reject the null hypothesis at the 5% significance level - the series is treated as stationary

KPSS Test

Null Hypothesis: Time series is stationary
Result: p-value > 0.1
Conclusion: Fail to reject the null hypothesis - no evidence against stationarity

Key Finding: Both tests are consistent with a stationary series at the 5% level, so no differencing or transformation was applied before modeling.
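
The two tests can be run with `adf.test()` and `kpss.test()` from tseries, as in this sketch. It uses a synthetic stationary AR(1) series, so the p-values it prints will not match the ones reported above.

```r
library(tseries)

# Synthetic stationary AR(1) series with 215 points; NOT the project's data.
set.seed(123)
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 215))

# tseries interpolates p-values from tables and warns when the true value
# lies outside the tabulated range; this is why a result can read "> 0.1".
adf_res  <- suppressWarnings(adf.test(x, alternative = "stationary")) # H0: non-stationary
kpss_res <- suppressWarnings(kpss.test(x, null = "Level"))            # H0: stationary

cat("ADF p-value: ", adf_res$p.value, "\n")
cat("KPSS p-value:", kpss_res$p.value, "\n")
```

The tests have opposite null hypotheses, so agreement means a small ADF p-value together with a large KPSS p-value.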
          

Autocorrelation Analysis

ACF Plot Analysis

Showed how current values correlate with past values at different lags
Revealed significant autocorrelation at multiple lag periods
Helped identify the memory structure of the time series

PACF Plot Analysis

Measured the direct correlation at each lag after removing the effect of intermediate lags
Provided insights into the appropriate model order
Supported the ETS model selection process
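
As a rough illustration of both plots, base R's `acf()` and `pacf()` return the underlying values when called with `plot = FALSE`. The series here is a synthetic AR(1) process and the 21-lag window is an arbitrary choice.

```r
# Synthetic AR(1) series; NOT the project's data.
set.seed(7)
x <- arima.sim(model = list(ar = 0.7), n = 215)

acf_vals  <- acf(x,  lag.max = 21, plot = FALSE)  # lags 0..21
pacf_vals <- pacf(x, lag.max = 21, plot = FALSE)  # lags 1..21

# Approximate 95% bound for a white-noise series.
bound <- qnorm(0.975) / sqrt(length(x))

# Lags whose autocorrelation exceeds the bound (lag 0 is dropped).
sig_acf  <- which(abs(acf_vals$acf[-1]) > bound)
sig_pacf <- which(abs(pacf_vals$acf) > bound)

cat("Significant ACF lags: ", sig_acf, "\n")
cat("Significant PACF lags:", sig_pacf, "\n")
```

Drop `plot = FALSE` to draw the usual correlograms with the same bounds shown as dashed lines.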

ETS Model Implementation

Model Selection: ETS(A,N,N)

Error: Additive (A)
Trend: None (N)
Seasonality: None (N)
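
For reference, ETS(A,N,N) is simple exponential smoothing with additive errors. In the usual state space notation, with level \ell_t and smoothing parameter \alpha, it can be written as:

```latex
\begin{aligned}
y_t &= \ell_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{N}(0, \sigma^2) \\
\ell_t &= \ell_{t-1} + \alpha\,\varepsilon_t \\
\hat{y}_{T+h \mid T} &= \ell_T \quad \text{for all } h \ge 1
\end{aligned}
```

The last line shows that the point forecast is the same for every horizon h: it equals the final estimated level.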

Model Parameters

# ETS Model Summary
ETS(A,N,N)

Smoothing parameters:
  alpha = 0.9999

Initial states:
  l = 685786.821

sigma: 79791.59
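
A summary like the one above comes from fitting with `ets()` in the forecast package. The sketch below forces the ETS(A,N,N) form on a synthetic series; whether the project fixed the model or let `ets()` select it automatically is not stated, so `model = "ANN"` is an assumption.

```r
library(forecast)

# Synthetic random-walk-like daily series; NOT the project's data.
set.seed(2024)
y <- ts(700000 + cumsum(rnorm(215, mean = 0, sd = 80000)))

# Force additive error, no trend, no seasonality.
fit <- ets(y, model = "ANN")

print(fit$method)        # model label
print(fit$par["alpha"])  # estimated smoothing parameter
print(fit$initstate)     # estimated initial level l
print(sqrt(fit$sigma2))  # residual standard deviation (sigma)
```

Calling `ets(y)` without the `model` argument lets the package choose among candidate models by information criterion.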

Smoothing parameter (alpha): 0.9999
MAPE: 8.93%
RMSE: 79,419.6

Results & Performance

Model Evaluation Metrics

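RMSE: 79,419.6 - Root mean squared error, in the same units as the demand series; penalizes large errors more heavily
MAPE: 8.93% - Mean absolute percentage error; on average, predictions miss actual demand by about 9%
AIC: 6012.162 - Akaike information criterion; meaningful only when comparing candidate models on the same data (lower is better)
ACF1: 0.0887 - Lag-1 autocorrelation of the residuals; a value near zero suggests little structure is left unexplained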
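
To make the definitions concrete, the sketch below computes RMSE, MAPE, and ACF1 by hand on a small made-up example. The `actual` and `fitted` vectors are illustrative values, not project data.

```r
# Made-up actual and fitted values for illustration only.
actual <- c(100, 200, 400, 500, 250)
fitted <- c(110, 180, 400, 450, 250)

e <- actual - fitted                 # errors: -10, 20, 0, 50, 0

rmse <- sqrt(mean(e^2))              # sqrt(3000 / 5) = sqrt(600)
mape <- mean(abs(e / actual)) * 100  # mean(0.1, 0.1, 0, 0.1, 0) * 100 = 6
acf1 <- acf(e, plot = FALSE)$acf[2]  # lag-1 autocorrelation of the errors

cat("RMSE:", rmse, "\n")
cat("MAPE:", mape, "%\n")
cat("ACF1:", acf1, "\n")
```

For a fitted model, `accuracy()` from the forecast package reports the same measures in one table.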

30-Day Forecast

A 30-day forecast was generated from the fitted ETS model to support:

Fleet management and resource allocation
Driver scheduling optimization
Demand anticipation for special events
Infrastructure planning
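
A 30-day forecast from an ETS(A,N,N) model can be produced with `forecast()`, as sketched here on a synthetic series. The 80% and 95% interval levels are the package defaults and are written out only for clarity.

```r
library(forecast)

# Synthetic daily series; NOT the project's data.
set.seed(2024)
y   <- ts(700000 + cumsum(rnorm(215, mean = 0, sd = 80000)))
fit <- ets(y, model = "ANN")

# 30-step-ahead forecast with 80% and 95% prediction intervals.
fc <- forecast(fit, h = 30, level = c(80, 95))

print(head(fc$mean, 3))   # point forecasts: identical for every horizon
print(fc$upper[c(1, 30), ])  # upper bounds at day 1 and day 30
```

Because the model has no trend or seasonal term, the point forecast is a flat line, and only the prediction intervals change with the horizon.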

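Interpretation: The smoothing parameter (alpha = 0.9999) is very close to 1, so the estimated level tracks the most recent observation almost exactly. Because ETS(A,N,N) has no trend or seasonal component, the 30-day point forecast is essentially flat at the last observed demand level, with prediction intervals that widen over the horizon.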
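
The effect of an alpha this close to 1 can be checked by running the level update by hand. In the sketch below, `ses_level` is a hypothetical helper and the four demand values are made up; only alpha and the initial level are taken from the model summary.

```r
# Hypothetical helper: apply l <- l + alpha * (y - l) for each observation.
ses_level <- function(y, alpha, l0) {
  l <- l0
  for (obs in y) {
    l <- l + alpha * (obs - l)
  }
  l
}

y <- c(650000, 700000, 720000, 690000)  # made-up daily demand values

final_level <- ses_level(y, alpha = 0.9999, l0 = 685786.821)
forecast_30 <- rep(final_level, 30)     # same value for every horizon

cat("Last observation:", tail(y, 1), "\n")
cat("Final level:     ", final_level, "\n")
```

The final level lands within a few rides of the last observation, which is why such a model behaves almost like a naive "tomorrow equals today" forecast.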
          
