How to Schedule Google Analytics 4 Data Backfill in BigQuery
Setting up the Google Analytics 4 export to BigQuery opens up a world of advanced data analysis, but many users quickly discover a frustrating limitation: the data export only starts from the day you establish the link. It doesn't include your historical data. This article will show you exactly how to perform a data backfill and, more importantly, how to schedule it so you can be confident your dataset is as complete as possible.
Why Is a GA4 BigQuery Backfill Necessary?
When you first link GA4 to BigQuery, Google creates daily tables with your event data moving forward. It’s a great start, but it leaves you with an incomplete picture. You can't analyze long-term trends, compare this year's performance to last year's, or build complete user journey models without that historical context. This is what makes a manual backfill, and eventually a scheduled one, so valuable.
Backfilling lets you:
- Achieve a Complete Dataset: Pull in data from the weeks or months before you set up the initial export, creating a unified dataset for comprehensive analysis.
- Improve Trend Analysis: A longer time horizon allows you to spot seasonal trends, measure the long-term impact of campaigns, and make more accurate forecasts.
- Fix Data Gaps: Even with an active link, GA4 data processing can sometimes have slight delays. A scheduled backfill of recent days can help ensure any straggling data is accounted for.
By scheduling a backfill, you automate the process of keeping your historical data up-to-date, ensuring your reports and dashboards are always built on the most complete information available.
Getting Started: Prerequisites for a Backfill
Before you jump in, you’ll want to have a few things in place. Think of this as getting your tools ready before you start the project to save headaches later.
You will need:
- A Linked GA4 and BigQuery Project: This process assumes you’ve already gone through the initial GA4 property settings to link to an active Google Cloud Platform (GCP) project with BigQuery enabled.
- Sufficient Permissions: In your GCP project, your user account needs the right permissions to manage data transfers. Key roles include BigQuery Data Editor or higher, Cloud Functions Developer, and Cloud Scheduler Admin for the automation steps.
- Billing Enabled: BigQuery, Cloud Functions, and Cloud Scheduler are paid services. While there are generous free tiers, you must have an active billing account linked to your project for any of this to work. Costs are typically low for this kind of process but are important to be aware of.
How to Manually Backfill GA4 Data in BigQuery
First, let's walk through the manual process. This is useful for one-time historical data pulls and serves as the foundation for the automated schedule we'll build later.
- Open the Google Cloud Console and navigate to BigQuery.
- In the BigQuery navigation menu, find and click on "Data transfers".
- You should see your existing Google Analytics transfer configuration listed. It will be named something like "Google Analytics GA4" followed by your property ID. Click on its name to go to the transfer details page.
- In the top right corner of the transfer details page, you will see a button labeled "SCHEDULE BACKFILL". Click it.
- A pop-up will appear allowing you to select a start and end date for your backfill. Select the date range for the historical data you wish to import. Note that there are limits to how far back you can go, typically correlating with when data collection in your GA4 property began.
- Click "OK". BigQuery will queue one transfer run for each day in your selected range. You can monitor progress on the "RUN HISTORY" tab.
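Each day in the range becomes its own transfer run, which is why a long backfill queues many jobs. As a rough sketch of that expansion (plain Python with a hypothetical helper name, not part of any Google library):

```python
from datetime import date, timedelta

def backfill_dates(start: date, end: date) -> list[date]:
    """Enumerate each day in an inclusive range — one date per queued transfer run."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]

# A three-day backfill produces three runs:
# backfill_dates(date(2024, 1, 1), date(2024, 1, 3))
```

This also explains why large backfills take a while to complete: each daily run is processed individually.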
This is great for a one-off import, but nobody wants to do this every morning to ensure yesterday's data is fully synced. That's where automation comes in.
How to Schedule GA4 Backfills with Cloud Functions and Cloud Scheduler
To fully automate this process, we'll use two powerful GCP services together: Cloud Functions (to execute the backfill script) and Cloud Scheduler (to run that function on a daily schedule). This might sound technical, but we’ll break it down into manageable steps.
Step 1: Get Your Transfer Configuration Name
Before writing any code, you need a specific piece of information: the full resource name of your data transfer configuration. This is how your script will tell BigQuery which transfer to run.
Go back to the transfer details page you visited in the manual steps. Look at the URL in your browser's address bar. It will look something like this:
console.cloud.google.com/bigquery/transfers/locations/us/configs/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx?project=your-project-name
The URL contains the three pieces you need: the project name (after ?project=), the location (us in this example), and the config ID (the long xxxxxxxx-... string after configs/).

Assembled in order, the full name from that example is: projects/your-project-name/locations/us/transferConfigs/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Pro-tip: note that the URL path says configs/ but the resource name uses transferConfigs/, so you can't copy the URL path verbatim. Pull out the project name, location (e.g., us), and config ID, then construct the name yourself formatted as above. Copy the entire string and save it in a text editor for a later step.
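If you'd rather not hand-assemble the string, a tiny helper (a sketch of my own, not a Google client method) makes the format explicit and keeps typos out:

```python
def transfer_config_name(project: str, location: str, config_id: str) -> str:
    """Assemble the full BigQuery Data Transfer config resource name
    from the three pieces found in the console URL."""
    return f"projects/{project}/locations/{location}/transferConfigs/{config_id}"

# Example with placeholder values from the URL above:
# transfer_config_name("your-project-name", "us", "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")
```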
Step 2: Create a Cloud Function to Trigger the Backfill
A Cloud Function is a small piece of code that runs in response to a trigger — in our case, an HTTP request from Cloud Scheduler.
- In the Google Cloud Console, navigate to Cloud Functions.
- Click "CREATE FUNCTION". Configure the initial settings: give the function a name (for example, trigger-ga4-backfill), choose a region, and select an HTTPS trigger, since Cloud Scheduler will call it over HTTP.
- Click "Next" to move to the code editor. Choose a Python runtime and set the entry point to schedule_daily_backfill (it must match the function name in the script below).
- In the `requirements.txt` file, add the necessary Google Cloud client library:

```
# requirements.txt
google-cloud-bigquery-datatransfer
```

- In the `main.py` file, replace the boilerplate code with the following Python script. Remember to replace the placeholder in the `parent` variable with your unique Transfer Configuration Name from Step 1.
```python
# main.py
from google.cloud import bigquery_datatransfer_v1
from datetime import datetime, timedelta, time

def schedule_daily_backfill(request):
    """
    HTTP Cloud Function to trigger a backfill for the previous day
    or a specific date from the request payload.
    """
    transfer_client = bigquery_datatransfer_v1.DataTransferServiceClient()

    # --- IMPORTANT: PASTE YOUR TRANSFER CONFIG PARENT NAME HERE ---
    parent = "projects/your-project-id/locations/us/transferConfigs/your-transfer-id"

    # Check for a specific date in the request first
    request_json = request.get_json(silent=True)
    if request_json and 'run_date' in request_json:
        # Expected date format: "YYYY-MM-DD"
        target_date_str = request_json['run_date']
        try:
            target_date = datetime.strptime(target_date_str, '%Y-%m-%d')
        except ValueError:
            return "Invalid date format. Please use YYYY-MM-DD.", 400
    else:
        # Default to yesterday if no date is provided
        utc_now = datetime.utcnow()
        target_date = utc_now - timedelta(days=1)

    # The API expects a timestamp, so set the time component to midnight
    run_time = datetime.combine(target_date, time(0, 0))

    start_manual_transfer_request = bigquery_datatransfer_v1.StartManualTransferRunsRequest(
        parent=parent,
        requested_run_time=run_time,
    )

    try:
        response = transfer_client.start_manual_transfer_runs(
            request=start_manual_transfer_request
        )
        run_count = len(response.runs)
        print(f"Successfully started {run_count} manual transfer run(s).")
        return f"Successfully started backfill for {target_date.strftime('%Y-%m-%d')}.", 200
    except Exception as e:
        print(f"Error starting transfer run: {e}")
        return f"Error starting transfer run: {str(e)}", 500
```

- Click "DEPLOY" and wait for the function to be created. It can take a few minutes. Once deployed, find and copy its Trigger URL.
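Because the full function needs live GCP credentials to run, it helps to sanity-check the date-resolution logic on its own. Here is that logic extracted into a standalone function you can run locally (a sketch mirroring the script above, not a Google API):

```python
from datetime import datetime, timedelta

def resolve_target_date(request_json, today=None):
    """Return the backfill date: 'run_date' from the JSON payload if present,
    otherwise yesterday relative to `today` (defaults to UTC now)."""
    if request_json and 'run_date' in request_json:
        # Raises ValueError on a malformed date, matching the 400 path above
        return datetime.strptime(request_json['run_date'], '%Y-%m-%d')
    now = today or datetime.utcnow()
    return now - timedelta(days=1)
```

The optional `today` parameter exists purely so the "yesterday" branch is deterministic in tests.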
Step 3: Create a Cloud Scheduler Job to Automate the Function
Finally, we need to create a scheduled "cron job" to call our function automatically.
- In the Google Cloud Console, navigate to Cloud Scheduler.
- Click "CREATE JOB".
- Configure the job settings: give the job a name and region, and define the frequency as a cron expression, for example 0 8 * * * to run every day at 8:00 AM in your chosen timezone.
- Under "Configure the execution," set the target type to HTTP, paste your Cloud Function's Trigger URL, and set the HTTP method to POST. If your function requires authentication, add an OIDC token from a service account with permission to invoke it.
- Click "CREATE". Your job is now active!
You can test it by clicking the "RUN NOW" button in the Cloud Scheduler list. Then, check the Run History back in the BigQuery Data Transfers interface to confirm a new manual run was successfully triggered for yesterday.
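You can also trigger a backfill for an arbitrary day by POSTing a JSON body containing run_date to the function. A small standard-library sketch for building and sanity-checking that payload (the URL in the comment is a placeholder you'd replace with your real Trigger URL):

```python
import json
from datetime import datetime

def build_backfill_payload(run_date: str) -> bytes:
    """Validate a YYYY-MM-DD date and serialize it as the function's JSON body."""
    datetime.strptime(run_date, "%Y-%m-%d")  # raises ValueError if malformed
    return json.dumps({"run_date": run_date}).encode("utf-8")

# To actually send it (requires your deployed function's Trigger URL):
# import urllib.request
# req = urllib.request.Request(
#     "https://REGION-PROJECT.cloudfunctions.net/FUNCTION",  # placeholder
#     data=build_backfill_payload("2024-01-15"),
#     headers={"Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
```

Validating the date client-side just surfaces typos before the function's own 400 response would.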
Final Thoughts
Manually starting a backfill for historical GA4 data is a solid first step, but automating the process with Cloud Functions and Scheduler transforms your setup into a resilient and reliable data pipeline. Taking the time to build this automation ensures your datasets are always as rich and complete as possible, empowering you to conduct deeper and more accurate analysis.
While this Google Cloud setup gives you powerful control, it also requires managing code, monitoring logs, and wrangling different tools to get your data in one place. We built Graphed to remove that friction completely. We connect directly to your Google Analytics account via a simple, one-click integration, handling the entire data pipeline automatically. We pull your entire historical data back-catalog without you having to touch a line of code or step inside the Google Cloud Console, allowing you to go from connecting your sources to building real-time dashboards in seconds.