
Why Data Enrichment for Forecasting?

Forecasting models often rely heavily on historical internal data such as sales, production, and demand. However, real-world outcomes are also influenced by external factors such as weather, economic conditions, and regional events. Data enrichment can improve forecasting accuracy by integrating these contextual signals from trusted external data providers. This broader view helps your models:
  • Capture hidden relationships and causal effects (e.g., weather affecting retail sales)
  • Improve predictive accuracy and robustness
  • Enable more informed decision-making
By combining internal and external data, you create a more complete, dynamic foundation for your forecasts.

Get Started

Install the Synthefy Python package:
pip install synthefy
Set your API key as an environment variable (you can get your key here: https://prod.synthefy.com/home/api-keys):
export SYNTHEFY_API_KEY="your-api-key-here"
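
Before creating a client, it can help to confirm that the key is actually visible to your Python process. The sketch below is a minimal check under one assumption: the key lives in the SYNTHEFY_API_KEY variable shown above. The helper require_api_key is hypothetical and is not part of the Synthefy package.

```python
import os


def require_api_key(env=None, name="SYNTHEFY_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the synthefy package.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a client.")
    return key
```

Calling require_api_key() at the top of a script fails early with a readable message instead of surfacing as an authentication error on the first API call.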

Weather Data

Weather conditions can significantly impact business outcomes such as sales, logistics, or energy consumption. Synthefy’s weather enrichment feature allows you to pull relevant meteorological data directly into your forecasting pipeline.

How to Fetch Weather Data from Our API

The example below demonstrates how to retrieve daily temperature, humidity, precipitation, and wind speed data for a given location and time period.
from synthefy.api_client import SynthefyAPIClient
from synthefy.synthefy_helpers import get_weather_data

client = SynthefyAPIClient()

weather_df = get_weather_data(
    client=client,
    location_name="Springfield",
    weather_parameters="basic",  # temperature, humidity, precip, wind_speed
    start_time="2012-10-01T00:00:00",
    end_time="2012-10-26T23:59:59",
    auto_select_location=True,  # Skip interactive selection for demo
)

print(weather_df.head())
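
The start_time and end_time values above are ISO 8601 strings. If you compute the window from dates in your own data, a small helper can build strings in the same shape. This is a sketch only: iso_window is a hypothetical helper, and it assumes the API accepts second-precision timestamps exactly as they appear in the example.

```python
from datetime import datetime, timedelta


def iso_window(start, days):
    """Return (start_time, end_time) strings covering `days` full days.

    Hypothetical helper; the format mirrors the strings used in the example.
    """
    # End one second before the next day begins, e.g. 23:59:59.
    end = start + timedelta(days=days) - timedelta(seconds=1)
    return start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")


start_time, end_time = iso_window(datetime(2012, 10, 1), days=26)
print(start_time, end_time)
```

With these inputs the helper reproduces the window used in the example, from 2012-10-01T00:00:00 through 2012-10-26T23:59:59.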

Enrich Your Data with Weather

Once weather data is available, you can merge it with your internal dataset (e.g., store-level sales). This enrichment creates a richer feature set for forecasting, capturing the environmental conditions that may influence your metrics.
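import pandas as pd

from synthefy.api_client import SynthefyAPIClient
from synthefy.synthefy_helpers import get_weather_data

client = SynthefyAPIClient()

# Load your internal data (same sample file as in the forecasting example)
df = pd.read_csv("walmart.csv")
df["Date"] = pd.to_datetime(df["Date"])
walmart_sample = df.head(10)  # Small sample for demo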

enriched_df = get_weather_data(
    client=client,
    location_name="Springfield",
    weather_parameters="comprehensive",
    user_dataframe=walmart_sample,
    user_timestamp_column="Date",
    auto_select_location=True,  # Skip interactive selection for demo
)

print(enriched_df.head())
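
Conceptually, enriching a dataset this way resembles a left join on the timestamp column: every row of your data is kept, and weather values are attached where the dates match. The plain-Python sketch below illustrates that idea with made-up numbers. It is not how get_weather_data is implemented, and the column names temperature and precip are assumptions.

```python
def left_join_on_date(rows, weather_by_date, date_key="Date"):
    """Attach weather fields to each row; unmatched dates get None values."""
    # Collect every weather column so all output rows share the same keys.
    weather_cols = sorted({col for rec in weather_by_date.values() for col in rec})
    enriched = []
    for row in rows:
        match = weather_by_date.get(row[date_key], {})
        merged = dict(row)  # copy so the input rows are not modified
        for col in weather_cols:
            merged[col] = match.get(col)
        enriched.append(merged)
    return enriched


# Illustrative values only.
sales = [
    {"Date": "2012-10-05", "Weekly_Sales": 1500.0},
    {"Date": "2012-10-12", "Weekly_Sales": 1620.0},
]
weather = {
    "2012-10-05": {"temperature": 18.2, "precip": 0.0},
}

for row in left_join_on_date(sales, weather):
    print(row)
```

Rows without a matching weather record keep their original values and receive None for the weather columns, so gaps remain visible rather than silently dropping rows.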

Haver Data

Economic indicators often influence consumer demand, investment decisions, and market performance. Integrating Haver Analytics data provides macroeconomic context for your forecasts, which is useful in finance, retail, and manufacturing. Use find_haver_data to search for relevant Haver series (e.g., consumer spending, GDP, or employment rates) by keyword and identify which economic metrics best align with your forecasting objectives.
from synthefy.api_client import SynthefyAPIClient
from synthefy.synthefy_helpers import find_haver_data

client = SynthefyAPIClient()

haver_data = find_haver_data(
    client=client,
    prompt="sales consumer spending",
    count=3,
    auto_select=True,  # Auto-select for demo
)

print("✅ Found Haver data!")
print(f"📊 Series: {haver_data.get('name')}@{haver_data.get('database_name')}")
print(f"📝 Description: {haver_data.get('description')}")

Item            Value
✅ Status       Found Haver data
📊 Series       TSSTB@USECON
📝 Description  Sales: Total Business (SA, Bil.$)
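
The series is reported in the form name@database_name (for example TSSTB@USECON). If you store these identifiers as strings, a small helper can split them back into their parts. This is a sketch: split_series_id is a hypothetical helper and is not part of the Synthefy package.

```python
def split_series_id(series_id):
    """Split 'NAME@DATABASE' into a (name, database) tuple."""
    name, sep, database = series_id.partition("@")
    if not sep or not name or not database:
        raise ValueError(f"expected 'name@database', got {series_id!r}")
    return name, database


print(split_series_id("TSSTB@USECON"))
```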

Fetch Haver Data

After identifying relevant series, you can easily pull historical economic time series into your forecasting pipeline. These data streams allow you to factor in broader economic trends and policy effects.
from synthefy.api_client import SynthefyAPIClient
from synthefy.synthefy_helpers import get_haver_data

client = SynthefyAPIClient()

haver_df = get_haver_data(
    client=client,
    search_prompt="unemployment rate",
    start_time="2010-01-01T00:00:00",
    end_time="2013-01-01T00:00:00",
    auto_select=True,
)

print(haver_df.head())
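
Economic series are often published at a lower frequency (monthly or quarterly) than operational data. One common way to line them up, sketched below in plain Python, is to carry the most recent published value forward to each date in your dataset. The monthly frequency and the values here are assumptions for illustration; check the actual frequency of the series you fetch.

```python
from bisect import bisect_right
from datetime import date


def forward_fill(observations, target_dates):
    """For each target date, return the latest observation on or before it.

    `observations` is a list of (date, value) pairs sorted by date.
    Target dates earlier than the first observation map to None.
    """
    obs_dates = [d for d, _ in observations]
    filled = []
    for target in target_dates:
        idx = bisect_right(obs_dates, target) - 1
        filled.append(observations[idx][1] if idx >= 0 else None)
    return filled


# Illustrative monthly values (not real data).
monthly_series = [(date(2012, 9, 1), 5.0), (date(2012, 10, 1), 5.2)]
weekly_dates = [date(2012, 8, 31), date(2012, 9, 28), date(2012, 10, 5)]

print(forward_fill(monthly_series, weekly_dates))
```

Carrying values forward, rather than interpolating, avoids using information that was not yet published on a given date.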

Forecast with Enriched Data

Once you’ve enriched your internal dataset with weather and economic features, you can feed the combined data into a forecasting model. This example shows how to generate future sales predictions using weather-enriched time series data. The added covariates can improve model performance by supplying signals that influence future outcomes, helping your model “see” beyond past trends.
# forecast_with_enriched_df.py
import asyncio
import numpy as np
import pandas as pd

from synthefy.api_client import SynthefyAsyncAPIClient, SynthefyAPIClient
from synthefy.synthefy_helpers import get_weather_data

async def main():
    # Load sample data
    df = pd.read_csv("walmart.csv")
    df["Date"] = pd.to_datetime(df["Date"])
    df = df[df["Store"] == "store_1"]  # Use one store for simplicity

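    # Split into history vs. future target
    cutoff_date = pd.Timestamp('2012-11-02')
    history_df = df[df['Date'] < cutoff_date].copy()  # strictly before the forecast window

    # The forecast window starts at the cutoff, so history and target do not overlap
    future_dates = pd.date_range(cutoff_date, periods=7, freq='D')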
    target_df = pd.DataFrame({
        'Date': future_dates,
        'Weekly_Sales': np.nan,  # unknown future target
    })

    # Step 1: Enrich both frames with weather data
    base_client = SynthefyAPIClient()

    enriched_history = get_weather_data(
        client=base_client,
        location_name="Springfield",
        weather_parameters="basic",
        user_dataframe=history_df,
        user_timestamp_column="Date",
        auto_select_location=True,
    )

    enriched_target = get_weather_data(
        client=base_client,
        location_name="Springfield",
        weather_parameters="basic",
        user_dataframe=target_df,
        user_timestamp_column="Date",
        auto_select_location=True,
    )

    # Step 2: Get weather column names for covariates
    added_hist = [c for c in enriched_history.columns if c not in history_df.columns]
    added_targ = [c for c in enriched_target.columns if c not in target_df.columns]
    metadata_cols = sorted(list(set(added_hist).intersection(added_targ)))

    # Step 3: Generate forecast with enriched data
    async with SynthefyAsyncAPIClient() as client:
        forecast_dfs = await client.forecast_dfs(
            history_dfs=[enriched_history],
            target_dfs=[enriched_target],
            target_col='Weekly_Sales',
            timestamp_col='Date',
            metadata_cols=metadata_cols,
            leak_cols=[],
            model='sfm-moe-v1'
        )

    forecast_df = forecast_dfs[0]
    print(forecast_df.head())

# Run
asyncio.run(main())
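
To check whether enrichment actually helps on your data, you can hold out the most recent observations, forecast them with and without the added covariates, and compare the errors. The sketch below computes mean absolute error in plain Python. The numbers are placeholders chosen for illustration, not results from the Synthefy model.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Placeholder values; substitute your held-out actuals and the two forecasts.
actual = [100.0, 110.0, 120.0]
baseline_forecast = [90.0, 115.0, 130.0]   # forecast without covariates
enriched_forecast = [98.0, 112.0, 117.0]   # forecast with covariates

print("baseline MAE:", mean_absolute_error(actual, baseline_forecast))
print("enriched MAE:", mean_absolute_error(actual, enriched_forecast))
```

Running the same comparison over several cutoff dates gives a more reliable picture than a single holdout window.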