
What is demand forecasting for hotels? How does it help?

Problem: Hotels face “random” demand spikes tied to local events (concerts, UT games, conferences). Most forecasting tools ignore events out of the box, so teams either underprice sell-out nights or overprice and leave rooms empty; fixing this usually requires hiring data scientists.

Our approach: We show how hotels can use the Synthefy Agent to automatically extract local event signals and add them to forecasts, improving accuracy. You can find links to each of these steps in the table of contents on the right.

Outcome: In this walkthrough, the Synthefy forecast has roughly half the error (MAPE) of the Prophet and Seasonal ARIMA baselines, or less, when predicting hotel demand. This allows teams to charge the right price for their rooms and avoid over- or under-pricing.

1. Load historical demand data

This dataset comes from the owner of a hotel in Austin. We will attempt to improve the quality of demand forecasting by providing correlates gathered by the Synthefy Agent platform. First, we load the data.
import asyncio
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import synthefy
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error
from synthefy import SynthefyAsyncAPIClient

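pricing_link = "https://drive.google.com/uc?export=download&id=1DsYn2tTR0W0bmGoz5vczUqJEcImAcyjG"
events_link = "https://drive.google.com/uc?export=download&id=1wcfQagVUz8PWYQeVPURuXxsnrOizLG0Q"

# The plots below are saved under hotel_demand/, so make sure the directory exists
os.makedirs("hotel_demand", exist_ok=True)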

data_df = pd.read_csv(pricing_link)
# Convert date column to datetime timestamp
data_df["timestamp"] = pd.to_datetime(data_df["date"])

colors = {
    "prophet": "#16a34a",
    "synthefy": "#fea333",
    "arima": "#2563eb",
    "groundtruth": "black",
}

print(data_df.head().to_markdown(index=False))
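| date       |   num_rooms_booked |   num_rooms_unused |   price_per_room |   max_competitor_price |   avg_competitor_price |   min_competitor_price |
|:-----------|-------------------:|-------------------:|-----------------:|-----------------------:|-----------------------:|-----------------------:|
| 2024-01-01 |                 65 |                 34 |           193.64 |                 263.71 |                 227.45 |                 150.27 |
| 2024-01-02 |                 65 |                 32 |           183.9  |                 248.8  |                 224.77 |                 131.04 |
| 2024-01-03 |                 65 |                 31 |           191.45 |                 233.2  |                 213.53 |                 141.65 |
| 2024-01-04 |                 65 |                 31 |           196.9  |                 296.25 |                 179.41 |                 156.84 |
| 2024-01-05 |                 67 |                 24 |           231.27 |                 322.55 |                 305.05 |                 180.79 |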
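
Before forecasting, it can be worth confirming that the series has one row per calendar day with no gaps, since most forecasting models assume a regular frequency. The snippet below is a minimal sketch on a toy frame that reuses the `timestamp` column name; `find_missing_days` is a hypothetical helper written for illustration and is not part of the Synthefy SDK.

```python
import pandas as pd


def find_missing_days(df: pd.DataFrame, timestamp_col: str = "timestamp") -> pd.DatetimeIndex:
    """Return calendar days between the first and last timestamp that have no row.

    Hypothetical helper for illustration; not part of the Synthefy SDK.
    """
    observed = pd.DatetimeIndex(pd.to_datetime(df[timestamp_col])).normalize()
    expected = pd.date_range(observed.min(), observed.max(), freq="D")
    return expected.difference(observed)


# Toy data: 2024-01-03 is deliberately missing
toy_df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"]),
        "num_rooms_booked": [65, 65, 65],
    }
)
missing = find_missing_days(toy_df)
print(list(missing.strftime("%Y-%m-%d")))  # prints ['2024-01-03']
```

Running the same helper on `data_df` should return an empty index if the dataset really is a gap-free daily series.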

2. Visualize the raw data

Let’s take a look at the data we are predicting. We want to look for periodicity or general trends that a model would be expected to capture.
def plot_1_raw_demand(data_df):
    """
    Plot raw daily demand for number of rooms booked.

    Plot shows:
    - Ground truth data in black with thick lines
    - Vertical split line at 2024-08-31
    - Time series from Jan-Dec 2024

    Args:
        data_df: DataFrame with 'timestamp' and 'num_rooms_booked' columns
    """
    # Create figure
    _ = plt.figure(figsize=(14, 6))
    ax = plt.subplot(1, 1, 1)

    # Plot ground truth data with thick black line
    ax.plot(
        data_df["timestamp"],
        data_df["num_rooms_booked"],
        color=colors["groundtruth"],
        linewidth=2,
        label="Actual Demand",
        zorder=2,
    )

    # Add vertical line at 2024-08-31 (train/test split)
    split_date = pd.Timestamp("2024-08-31")
    ax.axvline(
        x=split_date, # type: ignore
        color="red",
        linestyle="--",
        linewidth=2,
        alpha=0.7,
        label="Train/Test Split",
        zorder=3,
    )

    # Set labels and title
    ax.set_xlabel("Time (timestamp)")
    ax.set_ylabel("Rooms booked (rooms/day)")
    ax.set_title(
        "Raw Daily Demand for Rooms (Jan–Dec 2024)",
    )

    # Add grid for better readability
    # ax.grid(True, linestyle="--", alpha=0.3)

    # Add legend
    ax.legend()  # loc="upper left", frameon=True)

    # Format x-axis to show dates nicely
    plt.xticks(rotation=45, ha="right")

    # plt.tight_layout()
    plt.savefig("hotel_demand/raw_data.png", dpi=300, bbox_inches="tight")
    plt.show()


# Visualize the raw demand data
plot_1_raw_demand(data_df)

Example Output: Raw Demand Data

Raw Daily Demand for Hotel Rooms

We see that the data is highly periodic, but the height of the spikes varies greatly from date to date. On weekends in particular, demand is much higher than average. We could hypothesise from these plots that events happening on these dates cause the spikes.
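
One quick way to check the weekend observation numerically is to average bookings by day of the week. The following is a rough sketch on synthetic data that stands in for `data_df`; `mean_demand_by_weekday` is a hypothetical helper, and the booking values are invented purely for illustration.

```python
import numpy as np
import pandas as pd


def mean_demand_by_weekday(df: pd.DataFrame) -> pd.Series:
    """Average rooms booked for each day of the week, Monday first."""
    order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
    by_day = df.groupby(df["timestamp"].dt.day_name())["num_rooms_booked"].mean()
    return by_day.reindex(order)


# Synthetic stand-in for data_df: four weeks with busier Fridays and Saturdays
dates = pd.date_range("2024-01-01", periods=28, freq="D")  # 2024-01-01 is a Monday
bookings = np.where(dates.dayofweek.isin([4, 5]), 90, 60)
toy_df = pd.DataFrame({"timestamp": dates, "num_rooms_booked": bookings})

weekday_means = mean_demand_by_weekday(toy_df)
print(weekday_means)
```

Applied to the real data, a clear gap between weekend and weekday averages would support the periodicity seen in the plot, while the spread within each weekday hints at how much is left for events to explain.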

3. Univariate predictions

We first set up univariate predictions to use as baselines for our model. This also lets us establish how much improvement adding correlates gives us. The univariate predictions are generated using Prophet and ARIMA models, with our API. The code below loads the saved forecasts from local parquet files.
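def get_univariate_forecasts(data_df, cutoff_date="2024-08-31"):
    """
    Load precomputed univariate forecasts from Prophet and Seasonal ARIMA models.

    The forecasts are read from local parquet files with 'timestamps' and
    'values' columns, which are copied into 'timestamp' and 'num_rooms_booked'
    columns to match data_df.

    Args:
        data_df: DataFrame with 'timestamp' and 'num_rooms_booked' columns
            (not used when loading precomputed forecasts)
        cutoff_date: Date to split history and forecast (default: '2024-08-31');
            not used when loading precomputed forecasts

    Returns:
        prophet_df, arima_df: DataFrames with forecast results
    """
    # Load the precomputed forecasts and align column names with data_df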
    prophet_forecast = pd.read_parquet(
        "./prophet_forecast.parquet"
    )
    prophet_forecast["timestamp"] = pd.to_datetime(
        prophet_forecast["timestamps"]
    )
    prophet_forecast["num_rooms_booked"] = prophet_forecast["values"]
    sarima_forecast = pd.read_parquet(
        "./sarima_forecast.parquet"
    )
    sarima_forecast["timestamp"] = pd.to_datetime(sarima_forecast["timestamps"])
    sarima_forecast["num_rooms_booked"] = sarima_forecast["values"]
    return prophet_forecast, sarima_forecast


prophet_df, sarima_df = get_univariate_forecasts(data_df)
We can now plot these predictions.
def plot_2_univariate_forecast(
    data_df: pd.DataFrame,
    forecast_dfs: list[pd.DataFrame],
    cutoff_date="2024-08-31",
    cutoff_start="2024-05-01",
):
    """
    Plot univariate forecast comparing actual data with Prophet and ARIMA predictions.

    Plot shows:
    - Actual data in black with thick lines
    - Prophet forecast in one color
    - ARIMA forecast in another color
    - Vertical split line at cutoff date
    - Shaded forecast horizon

    Args:
        data_df: Original DataFrame with actual data
        forecast_dfs: List of two DataFrames, [prophet_df, arima_df], with forecast results
        cutoff_date: Date where forecast begins (default: '2024-08-31')
        cutoff_start: First date shown on the plot (default: '2024-05-01')
    """
    # Create figure
    _ = plt.figure(figsize=(14, 6))
    ax = plt.subplot(1, 1, 1)

    prophet_df, arima_df = forecast_dfs
    # Prepare data
    df = data_df.copy()
    cutoff_ts = pd.Timestamp(cutoff_date)

    # Filter to show data starting from cutoff_start
    start_date = pd.Timestamp(cutoff_start)
    end_date = pd.Timestamp("2024-12-31")

    # Plot actual data (full range)
    actual_data = df[
        (df["timestamp"] >= start_date) & (df["timestamp"] <= end_date)
    ]
    ax.plot(
        actual_data["timestamp"],
        actual_data["num_rooms_booked"],
        color=colors["groundtruth"],
        linewidth=2,
        label="Actual Demand",
        zorder=3,
    )

    # Plot Prophet forecast
    prophet_df["timestamp"] = pd.to_datetime(prophet_df["timestamp"])
    prophet_forecast = prophet_df[
        (prophet_df["timestamp"] >= start_date)
        & (prophet_df["timestamp"] <= end_date)
    ]
    ax.plot(
        prophet_forecast["timestamp"],
        prophet_forecast["num_rooms_booked"],
        color=colors["prophet"],
        linewidth=1.5,
        label="Prophet",
        linestyle="-",
        zorder=2,
    )

    # Plot ARIMA forecast
    arima_df["timestamp"] = pd.to_datetime(arima_df["timestamp"])
    arima_forecast = arima_df[
        (arima_df["timestamp"] >= start_date)
        & (arima_df["timestamp"] <= end_date)
    ]
    ax.plot(
        arima_forecast["timestamp"],
        arima_forecast["num_rooms_booked"],
        color=colors["arima"],
        linewidth=1.5,
        label="Seasonal ARIMA",
        linestyle="-",
        zorder=2,
    )

    # Add vertical line at cutoff date
    ax.axvline(
        x=cutoff_ts, # type: ignore
        color="red",
        linestyle="--",
        linewidth=2,
        alpha=0.7,
        zorder=1,
    )

    # Add shaded forecast horizon
    forecast_dates = prophet_forecast["timestamp"]
    if len(forecast_dates) > 0:
        ax.axvspan(
            forecast_dates.min(),
            forecast_dates.max(),
            color="gray",
            alpha=0.1,
            zorder=0,
        )

    # Set labels and title
    ax.set_xlabel("Time (timestamp)")
    ax.set_ylabel("Rooms booked (rooms/day)")
    ax.set_title(
        "Univariate Forecast: Actual vs Prophet/ARIMA (Sep–Dec 2024)",
        weight="semibold",
    )

    # Add grid for better readability
    ax.grid(True, linestyle="--", alpha=0.3)

    # Add legend
    ax.legend(loc="upper left", frameon=True)

    # Format x-axis
    plt.xticks(rotation=45, ha="right")

    plt.tight_layout()
    plt.savefig(
        "hotel_demand/univariate_forecast.png", dpi=300, bbox_inches="tight"
    )
    plt.show()


# Generate the plot
plot_2_univariate_forecast(
    data_df,
    [prophet_df, sarima_df],
    cutoff_start="2024-06-01",
)

Example Output: Univariate Forecast Comparison

Univariate Forecast: Prophet vs ARIMA

ARIMA predicts the mean, completely failing to model the complexity of the data. Prophet picks out an incorrect downward trend, and additionally under-forecasts peaks and over-forecasts quieter periods: good for regular weeks, poor on high-value nights. The magnitude of its peaks is also roughly the same every time, suggesting the model has insufficient information. Now we have concrete motivation for adding context.

4. Adding correlate information

The Synthefy Agent is great at fetching time series information that is helpful for predictions of this sort. Let’s ask it the following: “Fetch me major events that happened in Austin in 2024 with a large number of attendees”. We can then convert these events to numerical values for the forecast.
This document is focused on the Synthefy SDK. To learn more about the Synthefy Agent, please refer to the Synthefy Agent documentation.
Agent Response with Austin Events

Let’s load this data and take a look at what kind of events it contains.
# events_link was defined alongside pricing_link in step 1
events_data = pd.read_json(events_link)
events_data["timestamp"] = pd.to_datetime(events_data["date"])
events_data.drop(columns=["date"], inplace=True)
data_df = data_df.merge(events_data, on="timestamp", how="left")

# Display events data as markdown table
events_display = (
    events_data.drop_duplicates(subset=["events_around_hotel"])
    .sort_values("timestamp")
    .head(n=10)[
        ["timestamp", "events_around_hotel", "events_around_hotel_numerical"]
    ]
)
events_display.columns = [
    "Date",
    "Events Around Hotel",
    "Events Around Hotel (Numerical)",
]
print("\n### Sample Events Data\n")
print(events_display.to_markdown(index=False))
print("\n")
These events look like they would help predict the value of demand.
| Date                | Events Around Hotel              |   Events Around Hotel (Numerical) |
|:--------------------|:---------------------------------|----------------------------------:|
| 2024-02-18 00:00:00 | Austin Marathon                  |                               1   |
| 2024-03-08 00:00:00 | SXSW Festival                    |                               1   |
| 2024-03-28 00:00:00 | Texas Relays                     |                               1   |
| 2024-06-06 00:00:00 | ROT Biker Rally                  |                               1   |
| 2024-07-04 00:00:00 | Fourth of July                   |                               1   |
| 2024-08-31 00:00:00 | UT Football vs Colorado State    |                               0.8 |
| 2024-09-14 00:00:00 | UT Football vs UTSA              |                               1   |
| 2024-09-21 00:00:00 | UT Football vs ULM               |                               0.4 |
| 2024-09-28 00:00:00 | UT Football vs Mississippi State |                               0.8 |
| 2024-10-04 00:00:00 | ACL Music Festival (Weekend 1)   |                               1   |
Let’s plot them!
def normalize_series(series, scale_max=100):
    """
    Normalize a series to 0-scale_max range using min-max scaling.

    Args:
        series: pandas Series to normalize
        scale_max: Maximum value of the normalized scale (default 100)

    Returns:
        Normalized series scaled to 0-scale_max
    """
    min_val = series.min()
    max_val = series.max()

    if max_val == min_val:
        return pd.Series([scale_max / 2] * len(series), index=series.index)

    return ((series - min_val) / (max_val - min_val)) * scale_max


def plot_3_events_overlay(
    data_df,
    events_col="events_around_hotel",
    lag_days=0,
    start_date=None,
    end_date=None,
    period_label="",
    show_prices=False,
):
    """
    Plot 3: How Events Track Demand
    Title: Event Overlays Align with Demand Spikes (Normalized)

    Shows normalized target (demand) and event indicators on the same scale
    to visualize how events correlate with demand patterns.

    Args:
        data_df: DataFrame with 'timestamp', 'num_rooms_booked', and events column
        events_col: Name of the events column (default: 'events_around_hotel')
        lag_days: Number of days to lag the events (default: 0)
        start_date: Start date for filtering data (default: None, uses full range)
        end_date: End date for filtering data (default: None, uses full range)
        period_label: Label for the time period (e.g., "Period 1", "Q1")
        show_prices: Whether to show price lines (price_per_room and avg_competitor_price) (default: False)
    """
    # Prepare data
    df = data_df.copy()

    # Filter by date range if specified
    if start_date is not None:
        df = df[df["timestamp"] >= pd.to_datetime(start_date)]
    if end_date is not None:
        df = df[df["timestamp"] <= pd.to_datetime(end_date)]

    # Reset index after filtering
    df = df.reset_index(drop=True)

    # Create binary event indicator (1 if event exists, 0 otherwise)
    df["has_event"] = df[events_col].notna().astype(int)

    # Apply lag if specified
    if lag_days != 0:
        df["has_event_lagged"] = df["has_event"].shift(lag_days)
    else:
        df["has_event_lagged"] = df["has_event"]

    # Normalize both series to 0-100 scale
    df["demand_normalized"] = normalize_series(df["num_rooms_booked"], scale_max=100)
    df["event_normalized"] = (
        df["has_event_lagged"] * 100
    )  # Events are already 0/1, scale to 0/100

    # Normalize price series to 0-100 scale for overlay (only if show_prices is True)
    if show_prices:
        df["price_per_room_normalized"] = normalize_series(
            df["price_per_room"], scale_max=100
        )
        df["avg_competitor_price_normalized"] = normalize_series(
            df["avg_competitor_price"], scale_max=100
        )

    # Create figure
    _ = plt.figure(figsize=(14, 8))
    ax = plt.subplot(1, 1, 1)

    # Plot normalized demand (target)
    ax.plot(
        df["timestamp"],
        df["demand_normalized"],
        color="black",
        linewidth=2,
        label="Demand (normalized)",
        zorder=2,
    )

    # Plot price lines (only if show_prices is True)
    if show_prices:
        # Plot price per room (purple, high transparency)
        ax.plot(
            df["timestamp"],
            df["price_per_room_normalized"],
            color="purple",
            linewidth=1.5,
            alpha=0.3,
            label="Price per room (normalized)",
            zorder=1,
        )

        # Plot average competitor price (green, high transparency)
        ax.plot(
            df["timestamp"],
            df["avg_competitor_price_normalized"],
            color="green",
            linewidth=1.5,
            alpha=0.3,
            label="Avg. competitor price (normalized)",
            zorder=1,
        )

    # Plot normalized events as vertical lines (dotted with higher alpha) - behind labels
    event_dates = df[df["has_event_lagged"] == 1]["timestamp"]
    for date in event_dates:
        ax.axvline(
            x=date,
            color="red",
            alpha=0.6,
            linewidth=1.5,
            linestyle="--",
            zorder=0,  # Behind labels
        )

    # Annotate specific events (filter for football, festival, texas relays, biker rally - keep only first occurrence)
    event_rows = df[df["has_event_lagged"] == 1]
    seen_labels = set()

    for idx, row in event_rows.iterrows():
        if pd.notna(df.loc[idx, events_col]):
            event_name = str(df.loc[idx, events_col])
            # Only annotate events containing specified keywords
            if any(
                keyword in event_name.lower()
                for keyword in [
                    "football",
                    "festival",
                    "texas relays",
                    "biker rally",
                ]
            ):
                # Only show first occurrence of each unique label
                if event_name not in seen_labels:
                    seen_labels.add(event_name)
                    # Make label prettier with offset to the left
                    # Catch overflows to keep labels within the plot
                    ax.annotate(
                        event_name[:37] + "..."
                        if len(event_name) > 37
                        else event_name,
                        xy=(row["timestamp"], 75),  # Position near top
                        xytext=(
                            -12,
                            0,
                        ),  # Offset to the left to avoid overlapping with line
                        textcoords="offset points",
                        rotation=90,  # Vertical orientation
                        va="top",  # Align to top
                        ha="right",  # Align to right (since we offset left)
                        fontsize=11,
                        color="black",  # Black text color
                        bbox=dict(
                            boxstyle="round,pad=0.4",
                            facecolor="white",  # White background
                            alpha=1.0,  # Fully opaque
                            edgecolor="orange",  # Orange border
                        ),
                    )

    # Set labels and title
    ax.set_xlabel("Time (timestamp)")
    ax.set_ylabel("Normalized level (0–100)")

    title = "Event Overlays Align with Demand Spikes (Normalized)"
    if period_label:
        title += f" - {period_label}"
    if lag_days != 0:
        title += f"\nEvent lag: {lag_days} days"

    ax.set_title(title)

    # Add grid for better readability
    ax.grid(True, linestyle="--", alpha=0.3)

    # Add legend
    ax.legend(loc="upper left", frameon=True)

    # Format x-axis
    plt.xticks(rotation=45, ha="right")

    # Set y-axis limits
    ax.set_ylim(-5, 105)

    plt.tight_layout()
    plt.savefig("hotel_demand/events_overlay.png", dpi=300, bbox_inches="tight")
    plt.show()


# Split the data into 3 equal time periods
df_temp = data_df.copy()
df_temp = df_temp.sort_values("timestamp")

min_date = df_temp["timestamp"].min()
max_date = df_temp["timestamp"].max()
total_days = (max_date - min_date).days
period_days = total_days / 3

# Calculate the boundaries for 3 equal periods
period1_start = min_date
period1_end = min_date + pd.Timedelta(days=period_days)
period2_start = period1_end + pd.Timedelta(days=1)
period2_end = period1_end + pd.Timedelta(days=period_days)
period3_start = period2_end + pd.Timedelta(days=1)
period3_end = max_date

print(f"Period 1: {period1_start.date()} to {period1_end.date()}")
print(f"Period 2: {period2_start.date()} to {period2_end.date()}")
print(f"Period 3: {period3_start.date()} to {period3_end.date()}")

# Generate the event overlay plots for each time period
print("\n" + "=" * 60)
print("Plot 3a: Period 1")
print("=" * 60)
plot_3_events_overlay(
    data_df,
    lag_days=0,
    start_date=period1_start,
    end_date=period1_end,
    period_label="Period 1",
    show_prices=True,
)

print("\n" + "=" * 60)
print("Plot 3b: Period 2")
print("=" * 60)
plot_3_events_overlay(
    data_df,
    lag_days=0,
    start_date=period2_start,
    end_date=period2_end,
    period_label="Period 2",
    show_prices=True,
)

print("\n" + "=" * 60)
print("Plot 3c: Period 3")
print("=" * 60)
plot_3_events_overlay(
    data_df,
    lag_days=0,
    start_date=period3_start,
    end_date=period3_end,
    period_label="Period 3",
    show_prices=True,
)

Example Output: Events Overlay Analysis

Event Overlays Align with Demand Spikes

We see how closely the spikes in demand line up with the events. Note that pricing doesn’t always align with the demand spikes, indicating that neither the competitors nor our partner is able to adjust prices effectively to demand. Let’s forecast with this new correlate, assigning a value of 1 if an event occurred and 0 if it didn’t.
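
The 1/0 encoding described above can be sketched on a small example. The frames below are toy data with invented booking numbers; only the column names (`timestamp`, `num_rooms_booked`, `events_around_hotel`, `has_event`) are taken from this tutorial.

```python
import pandas as pd

# Toy demand and events frames; the column names mirror the tutorial's data
demand = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-10-03", periods=4, freq="D"),
        "num_rooms_booked": [62, 95, 97, 64],  # invented values
    }
)
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-10-04", "2024-10-05"]),
        "events_around_hotel": ["ACL Music Festival (Weekend 1)"] * 2,
    }
)

# Left merge keeps every demand row; days without an event get NaN
merged = demand.merge(events, on="timestamp", how="left")

# 1 if an event is listed for that day, 0 otherwise
merged["has_event"] = merged["events_around_hotel"].notna().astype(int)

# Compare average demand on event days and non-event days
print(merged.groupby("has_event")["num_rooms_booked"].mean())
```

A left merge matters here: an inner merge would silently drop every day without an event, leaving the model with no examples of ordinary demand.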

5. Perform multivariate forecast

Now, we use the Synthefy API to construct the forecast and check what kind of improvement it delivers. Leak columns are columns whose future values are already known when the forecast is made. For example, we will know what events are going to take place next week. Therefore, events can act as a leak column. Correlates are referred to as metadata in our API.
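
To make the distinction concrete, here is a small sketch of what a history frame and a target frame might look like before they are handed to the API. It uses toy values and makes no API call; the layout follows the forecasting code in this section, where `has_event` is filled in for the future while `price_per_room` is left empty.

```python
import numpy as np
import pandas as pd

# Toy history: every column is observed (values are invented)
history = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-08-29", periods=3, freq="D"),
        "num_rooms_booked": [64, 70, 96],
        "has_event": [0, 0, 1],
        "price_per_room": [190.0, 205.0, 260.0],
    }
)

# Toy target period: the two days we want to forecast
target = pd.DataFrame({"timestamp": pd.date_range("2024-09-01", periods=2, freq="D")})
target["num_rooms_booked"] = np.nan  # the value to predict
target["has_event"] = [0, 1]  # leak column: the event calendar is known ahead of time
target["price_per_room"] = np.nan  # ordinary metadata: future values are not known

metadata_cols = ["has_event", "price_per_room"]
leak_cols = ["has_event"]

# Leak columns should be fully populated over the forecast horizon
assert target[leak_cols].notna().all().all()

# Metadata that is not a leak column stays empty over the forecast horizon
non_leak = [c for c in metadata_cols if c not in leak_cols]
print(target[non_leak].isna().all().all())
```

Every leak column is also a metadata column, but only leak columns carry real values into the target period.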
def prepare_forecasting_data(df, cutoff_date="2024-08-31"):
    """
    Prepare data for forecasting by splitting into history and target periods.

    Args:
        df: DataFrame with 'timestamp' and 'num_rooms_booked' columns
        cutoff_date: Date to split history and forecast (default: '2024-08-31')

    Returns:
        history_df, target_period_df: DataFrames for history and target periods
    """
    # Prepare data
    df_copy = df.copy()
    df_copy["has_event"] = df_copy["events_around_hotel"].notna().astype(int)

    # Split into history and target
    cutoff_ts = pd.Timestamp(cutoff_date)
    history_df = df_copy[df_copy["timestamp"] <= cutoff_ts].copy()
    target_period_df = df_copy[df_copy["timestamp"] > cutoff_ts].copy()

    return history_df, target_period_df


async def run_forecast(history_df, target_period_df, method_name="univariate"):
    """Make actual forecast using SynthefyAsyncAPIClient"""
    print(f"\n=== {method_name} Forecasting ===")

    # Create target dataframe for forecasting
    target_df = pd.DataFrame(
        {
            "timestamp": target_period_df["timestamp"],
            "num_rooms_booked": np.nan,  # What we want to predict
        }
    )

    # Add event and price features if this is the multivariate method
    if method_name == "multivariate":
        target_df["has_event"] = target_period_df["has_event"]
        target_df["price_per_room"] = np.nan
        target_df["avg_competitor_price"] = np.nan
        metadata_cols = [
            "has_event",
            "price_per_room",
            "avg_competitor_price",
        ]
        leak_cols = [
            "has_event",
        ]
    elif method_name == "univariate":
        metadata_cols = []
        leak_cols = []

    else:
        raise ValueError(f"Invalid method name: {method_name}")

    # Prepare historical data
    history_forecast_df = history_df[
        ["timestamp", "num_rooms_booked"] + metadata_cols
    ].copy()

    # Make the forecast using SynthefyAsyncAPIClient
    async with SynthefyAsyncAPIClient() as client:
        forecast_dfs = await client.forecast_dfs(
            history_dfs=[history_forecast_df],
            target_dfs=[target_df],
            target_col="num_rooms_booked",
            timestamp_col="timestamp",
            metadata_cols=metadata_cols,
            leak_cols=leak_cols,
            model="Migas-1.0",
        )

    forecast_df = forecast_dfs[0]

    # Ensure the forecast DataFrame has the timestamp column
    if "timestamp" not in forecast_df.columns:
        forecast_df["timestamp"] = target_period_df["timestamp"].values

    # Calculate metrics
    actual_values = target_period_df["num_rooms_booked"].values
    predicted_values = forecast_df["num_rooms_booked"].values

    mae = mean_absolute_error(actual_values, predicted_values)
    mape = (
        mean_absolute_percentage_error(actual_values, predicted_values) * 100
    )  # scikit-learn returns a fraction

    print(f"{method_name} Forecast Metrics:")
    print(f"MAE: {mae:.2f}")
    print(f"MAPE: {mape:.2f}%")

    return mae, mape, forecast_df


# Run the complete forecast comparison workflow
print("=" * 60)
print("FORECAST COMPARISON WORKFLOW")
print("=" * 60)

# Prepare data
history_df, target_period_df = prepare_forecasting_data(
    data_df, cutoff_date="2024-08-31"
)

print("Data preparation complete:")
print(
    f"History period: {history_df['timestamp'].min()} to {history_df['timestamp'].max()}"
)
print(
    f"Target period: {target_period_df['timestamp'].min()} to {target_period_df['timestamp'].max()}"
)
print(
    f"History records: {len(history_df)}, Target records: {len(target_period_df)}"
)

# Generate forecasts
print("\n" + "=" * 60)
print("GENERATING FORECASTS")
print("=" * 60)

# Multivariate forecast (with events)
# Top-level `await` works in a notebook; in a plain script, wrap the call in asyncio.run(...)
multivariate_mae, multivariate_mape, forecast_with_events = await run_forecast( # type: ignore
    history_df, target_period_df, method_name="multivariate"
)
def create_plot_5a_comparison_timeseries(
    df: pd.DataFrame,
    forecasts: dict[str, pd.DataFrame],
    cutoff_date="2024-08-31",
    cutoff_start="2024-06-01",
):
    """Plot 5a: Synthefy forecast vs. Univariate Forecast - Step 5a"""
    print(
        "\n=== PLOT 5a: Aggregate Performance: Univariate vs Multivariate ==="
    )

    # Prepare data
    history_df, target_period_df = prepare_forecasting_data(df, cutoff_date)

    history_df = history_df[history_df["timestamp"] >= cutoff_start]

    forecast_prophet = forecasts["prophet"]
    forecast_arima = forecasts["arima"]
    forecast_with_events = forecasts["multivariate"]

    # Calculate metrics for Prophet
    prophet_mae = mean_absolute_error(
        target_period_df["num_rooms_booked"].values,
        forecast_prophet["num_rooms_booked"].values,
    )
    prophet_mape = (
        mean_absolute_percentage_error(
            target_period_df["num_rooms_booked"].values,
            forecast_prophet["num_rooms_booked"].values,
        )
        * 100
    )

    # Calculate metrics for ARIMA
    arima_mae = mean_absolute_error(
        target_period_df["num_rooms_booked"].values,
        forecast_arima["num_rooms_booked"].values,
    )
    arima_mape = (
        mean_absolute_percentage_error(
            target_period_df["num_rooms_booked"].values,
            forecast_arima["num_rooms_booked"].values,
        )
        * 100
    )

    # Calculate metrics for multivariate
    multivariate_mae = mean_absolute_error(
        target_period_df["num_rooms_booked"].values,
        forecast_with_events["num_rooms_booked"].values,
    )
    multivariate_mape = (
        mean_absolute_percentage_error(
            target_period_df["num_rooms_booked"].values,
            forecast_with_events["num_rooms_booked"].values,
        )
        * 100
    )

    # Create plot
    _ = plt.figure(figsize=(14, 6))
    ax = plt.subplot(1, 1, 1)

    # Ensure data is sorted by timestamp for proper line connections
    history_df = history_df.sort_values("timestamp")
    target_period_df = target_period_df.sort_values("timestamp")
    forecast_prophet = forecast_prophet.sort_values("timestamp")
    forecast_arima = forecast_arima.sort_values("timestamp")
    forecast_with_events = forecast_with_events.sort_values("timestamp")

    # Plot Prophet forecast - GREEN
    ax.plot(
        forecast_prophet["timestamp"],
        forecast_prophet["num_rooms_booked"],
        color=colors["prophet"],
        linewidth=1.5,
        label=f"Prophet (MAPE: {prophet_mape:.2f}%)",
        linestyle="-",
        alpha=0.7,
        zorder=3,
    )

    # Plot ARIMA forecast - BLUE
    ax.plot(
        forecast_arima["timestamp"],
        forecast_arima["num_rooms_booked"],
        color=colors["arima"],
        linewidth=1.5,
        label=f"Seasonal ARIMA (MAPE: {arima_mape:.2f}%)",
        linestyle="-",
        alpha=0.7,
        zorder=2,
    )

    # Plot Synthefy forecast (multivariate) - ORANGE
    ax.plot(
        forecast_with_events["timestamp"],
        forecast_with_events["num_rooms_booked"],
        color=colors["synthefy"],
        linewidth=1.5,
        label=f"Synthefy (Multivariate) (MAPE: {multivariate_mape:.2f}%)",
        linestyle="-",
        alpha=0.7,
        zorder=1,
    )

    # Plot ground truth (actual data) throughout entire period - BLACK LINE (lowest z-order)
    full_data = pd.concat([history_df, target_period_df]).sort_values(
        "timestamp"
    )
    ax.plot(
        full_data["timestamp"],
        full_data["num_rooms_booked"],
        color=colors["groundtruth"],
        linewidth=2,
        label="Actual Demand",
        zorder=0,
    )

    # Add vertical line at cutoff date - RED DASHED LINE
    cutoff_ts = pd.Timestamp(cutoff_date)
    ax.axvline(
        x=cutoff_ts, # type: ignore
        color="red",
        linestyle="--",
        linewidth=2,
        alpha=0.7,
        label="Train/Test Split",
        zorder=1,
    )

    # Add shaded forecast horizon
    forecast_dates = forecast_with_events["timestamp"]
    if len(forecast_dates) > 0:
        ax.axvspan(
            forecast_dates.min(),
            forecast_dates.max(),
            color="gray",
            alpha=0.1,
            zorder=0,
        )

    # Set labels and title
    ax.set_xlabel("Time (timestamp)")
    ax.set_ylabel("Rooms booked (rooms/day)")
    ax.set_title("Aggregate Performance: Univariate vs Multivariate")

    # Add grid for better readability
    ax.grid(True, linestyle="--", alpha=0.3)

    # Add legend
    ax.legend(loc="upper left", frameon=True)

    # Format x-axis
    plt.xticks(rotation=45, ha="right")

    plt.tight_layout()
    plot_path = "hotel_demand/plot_5a_comparison_timeseries.png"
    plt.savefig(plot_path, dpi=300, bbox_inches="tight")
    print(f"Plot 5a saved to: {plot_path}")
    plt.show()

    # Return metrics as a list of tuples (label, mae, mape) for plotting
    return [
        ("Prophet", prophet_mae, prophet_mape),
        ("SARIMA", arima_mae, arima_mape),
        ("Synthefy", multivariate_mae, multivariate_mape),
    ]


def create_plot_5b_performance_metrics(
    forecast_metrics: list,
) -> tuple[float, float]:
    """Plot 5b: Performance metrics comparison - Step 5b

    Args:
        forecast_metrics: List of tuples (label, mae, mape)
    """
    print("\n=== PLOT 5b: Aggregate Performance: Baseline vs Multivariate ===")

    # Extract data
    models = [label for label, _, _ in forecast_metrics]
    mae_values = [mae for _, mae, _ in forecast_metrics]
    mape_values = [mape for _, _, mape in forecast_metrics]

    # Create figure with 2 subplots side by side
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 7))

    x = np.arange(len(models))

    # Define colors for each model (Prophet=green, ARIMA=blue, Synthefy=orange)
    bar_colors = [colors["prophet"], colors["arima"], colors["synthefy"]]

    # ===== Plot 1: MAE =====
    bars1 = ax1.bar(
        x,
        mae_values,
        alpha=0.8,
        color=bar_colors,
        edgecolor="black",
        linewidth=1,
    )

    # Add value labels on top of MAE bars
    for i, bar in enumerate(bars1):
        ax1.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height() + max(mae_values) * 0.02,
            f"{mae_values[i]:.2f}",
            ha="center",
            va="bottom",
            color=bar_colors[i],
        )

    # Set labels and title for MAE plot
    ax1.set_xlabel("Models")
    ax1.set_ylabel("MAE (Mean Absolute Error)")
    ax1.set_title(
        "MAE Comparison"
    )
    ax1.set_xticks(x)
    ax1.set_xticklabels(models, rotation=45, ha="right")
    ax1.grid(False)
    # Set y-axis limit to prevent overflow
    ax1.set_ylim(0, max(mae_values) * 1.15)

    # ===== Plot 2: MAPE =====
    bars2 = ax2.bar(
        x,
        mape_values,
        alpha=0.8,
        color=bar_colors,
        edgecolor="black",
        linewidth=1,
    )

    # Add value labels on top of MAPE bars
    for i, bar in enumerate(bars2):
        ax2.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height() + max(mape_values) * 0.02,
            f"{mape_values[i]:.1f}%",
            ha="center",
            va="bottom",
            color=bar_colors[i],
        )

    # Set labels and title for MAPE plot
    ax2.set_xlabel("Models")
    ax2.set_ylabel("MAPE (Mean Absolute Percentage Error %)")
    ax2.set_title(
        "MAPE Comparison"
    )
    ax2.set_xticks(x)
    ax2.set_xticklabels(models, rotation=45, ha="right")
    ax2.grid(False)
    # Set y-axis limit to prevent overflow
    ax2.set_ylim(0, max(mape_values) * 1.15)

    plt.tight_layout()
    plot_path = "hotel_demand/plot_5b_performance_metrics.png"
    plt.savefig(plot_path, dpi=300, bbox_inches="tight")
    print(f"Plot 5b saved to: {plot_path}")
    plt.show()

    # Calculate the MAPE improvement of Synthefy (last entry) over each baseline
    synthefy_mape = mape_values[-1]
    improvements = (
        (mape_values[0] - synthefy_mape) / mape_values[0] * 100,
        (mape_values[1] - synthefy_mape) / mape_values[1] * 100,
    )
    print(
        f"Conclusion: Synthefy shows {improvements[0]:.1f}% improvement over {models[0]} "
        f"and {improvements[1]:.1f}% over {models[1]}."
    )

    return improvements


# Create plots
print("\n" + "=" * 60)
print("CREATING PLOTS")
print("=" * 60)

# Plot 5a: Time series comparison
forecast_metrics = create_plot_5a_comparison_timeseries(
    data_df,
    {
        "prophet": prophet_df,
        "arima": sarima_df,
        "multivariate": forecast_with_events,
    },
    cutoff_date="2024-08-31",
)

# Plot 5b: Performance metrics comparison
improvements = create_plot_5b_performance_metrics(forecast_metrics)

# Summary
print("\n" + "=" * 60)
print("SUMMARY")
print("=" * 60)
for label, mae, mape in forecast_metrics:
    print(f"{label} MAE: {mae:.2f}, MAPE: {mape:.2f}%")
print(f"MAPE change (Synthefy vs baseline Prophet): -{improvements[0]:.1f}%")
print(f"MAPE change (Synthefy vs baseline SARIMA): -{improvements[1]:.1f}%")

Example Output: Forecast Comparison Results

Aggregate Performance: Univariate vs Multivariate

The Synthefy model delivers far superior performance, closely tracking the true demand. Its peak magnitude changes over time, further improving our partners’ ability to accurately gauge demand for their rooms. We also support pricing simulations, which you can find here.

Performance Metrics Comparison

ΔMAPE = -65.2% vs Prophet, -49.6% vs Seasonal ARIMA. The context-aware approach is consistently better across the entire time horizon. Simply providing the events and pricing information improves forecast quality.
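
The ΔMAPE figures above are relative changes in MAPE between a baseline and the Synthefy forecast. As a worked illustration, the sketch below computes MAE, MAPE, and the relative change with plain NumPy on toy numbers. These are not the tutorial's results, and `relative_change` is a hypothetical helper name.

```python
import numpy as np


def mae(actual, predicted) -> float:
    """Mean absolute error, in the units of the target (rooms/day here)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent. Assumes no zero actuals."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change from baseline to candidate; negative means lower error."""
    return (candidate - baseline) / baseline * 100


# Toy numbers, not the tutorial's results
actual = [50, 100, 80]
baseline_pred = [60, 80, 80]  # misses by 10 and 20 rooms
candidate_pred = [55, 90, 80]  # halves both misses

baseline_mape = mape(actual, baseline_pred)
candidate_mape = mape(actual, candidate_pred)
print(f"{relative_change(baseline_mape, candidate_mape):.1f}%")  # prints -50.0%
```

MAPE weights each day by its own demand level, so a 10-room miss on a quiet day counts for more than the same miss on a sell-out night; reporting MAE alongside it, as this tutorial does, gives a view in absolute rooms.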

Key insights

From this analysis, you can answer these critical business questions:
  • How many rooms will be occupied on the weekend of F1?
  • How does competitor pricing affect demand for my rooms?
And so much more! Happy forecasting!