What is PDT in Looker?

Cody Schneider · 9 min read

A slow-loading dashboard can be the difference between a timely decision and a missed opportunity. If you've spent any time working with large datasets in Looker, you've likely encountered queries that take just a little too long to run. That's precisely where Persistent Derived Tables, or PDTs, come in. This article will show you what PDTs are, why they are so powerful for speeding up your reports, and how to build one yourself.

The Two Types of Looker Derived Tables

Before diving into PDTs, it's helpful to understand their simpler sibling: the standard "derived table."

What is a Standard Derived Table?

A standard (or "ephemeral") derived table is a query whose results are used as if they were an actual table in your database. Think of it as a subquery that you define once in LookML and can then reference in your Explores. For example, you could write a query that joins user data with order data to create a new "user_orders" table you can analyze.
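Here's a minimal sketch of what that might look like in LookML (the users and orders tables and their columns here are hypothetical):

view: user_orders {
  derived_table: {
    # No persistence parameters, so this is ephemeral: Looker runs
    # this SQL fresh every time the view is queried
    sql: SELECT
        u.id AS user_id,
        COUNT(o.id) AS order_count,
        SUM(o.amount) AS lifetime_revenue
      FROM users AS u
      LEFT JOIN orders AS o ON o.user_id = u.id
      GROUP BY 1
      ;;
  }
}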

The key thing to remember is that a standard derived table is temporary. Looker re-runs the SQL to generate this table every single time a user queries it. This is great for real-time data but can become slow if the underlying query is complex or runs against massive tables.

What is a Persistent Derived Table (PDT)?

A Persistent Derived Table (PDT) takes this concept a step further. Instead of being temporary, a PDT’s results are physically written into a 'scratch schema' in your database. Looker stores this table for a specific period and only rebuilds it based on a schedule you define.

Subsequent queries from users don't rerun the whole complex query from scratch; instead, they hit the much smaller, pre-calculated results stored in the PDT. This makes queries significantly faster and puts less strain on your database.

An easy way to think about the difference:

  • A standard derived table is like solving a complicated math problem from scratch every time someone asks for the answer.
  • A PDT is like solving the problem once, writing the answer on a whiteboard, and then just pointing to the whiteboard whenever someone asks. You only re-solve the problem when it's time to update the board.

Why and When Should You Use PDTs?

PDTs aren't necessary for every query, but for the right use cases, they are a complete game-changer. They offer several powerful benefits that directly impact both developers and the business users who consume the data.

1. Dramatically Improve Dashboard and Query Performance

This is the primary reason people use PDTs. Complicated queries involving multiple-table joins, window functions, and heavy aggregations across millions of rows can be sluggish. By pre-calculating these results and storing them in a PDT, you turn a minutes-long query into one that runs in seconds. Your dashboards load faster, users are happier, and data becomes much more interactive.

Example: Imagine a daily sales summary dashboard that joins sales, products, stores, and customer tables. Instead of running that resource-intensive join every time the dashboard loads, you can create a PDT that builds this summary report once a day. Everyone viewing the dashboard gets near-instant results.
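As a rough sketch (the table and column names here are hypothetical), the PDT's SQL might collapse that join down to one row per store, product category, and day, persisted for 24 hours with the persist_for parameter:

view: daily_sales_summary {
  derived_table: {
    # persist_for is the simplest persistence strategy; a datagroup
    # (covered later in this article) gives you finer control
    persist_for: "24 hours"
    sql: SELECT
        DATE(s.created_at) AS sale_date,
        st.name AS store_name,
        p.category AS product_category,
        SUM(s.amount) AS total_revenue
      FROM sales AS s
      JOIN products AS p ON p.id = s.product_id
      JOIN stores AS st ON st.id = s.store_id
      GROUP BY 1, 2, 3
      ;;
  }
}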

2. Reduce the Load on Your Database

Constantly running the same complex queries can put a significant strain on your data warehouse. This can lead to slow performance across all applications that rely on it and can increase costs dramatically, especially with pay-per-query warehouses like Google BigQuery or Snowflake.

A PDT runs its expensive logic just once according to its "persistence strategy." So instead of hundreds of users hammering the database with a heavy query throughout the day, the database only has to deal with it a single time. This saves computational resources and, by extension, money.

3. Simplify and Abstract Complex Data Models

PDTs empower analysts and LookML developers to create their own optimized data transformations without relying on a data engineering team. You can create clean, pre-joined, and aggregated tables that are much easier for business users to understand and build reports from. This speeds up development and allows for more self-service analytics, as the base Explores given to users are much simpler to navigate.

How to Create a PDT in Looker: Step-by-Step

Creating a PDT involves a few steps within LookML. Let's walk through building a simple one for user daily activity.

Step 1: Enable PDTs on Your Database Connection

First, your Looker administrator needs to enable PDTs for your database connection. This is done in the Admin > Connections section of Looker. When editing the connection, you'll need to toggle on "Persistent Derived Tables" and specify a "Temp Database" or "Scratch Schema" - this is where Looker will have permission to write the PDTs.

Step 2: Define a Datagroup in Your Model File

While you can use simple triggers like sql_trigger_value, the best practice is to use a datagroup. A datagroup is a shareable policy that defines how an entire group of PDTs should be refreshed. You define it once in a model file (.lkml).

A datagroup has two main parts:

  • sql_trigger: A SQL query that returns one value. When the value returned by this query changes, it triggers a rebuild for any PDTs using this datagroup. A common strategy is to check the max ID or last updated timestamp of a table.
  • max_cache_age: A fallback that tells Looker how long to use the old data if the trigger rebuild somehow fails.

Here’s an example for an ecomm_main.model.lkml file:

datagroup: ecomm_daily_refresh {
  # Trigger a rebuild whenever a new event is added
  sql_trigger: SELECT MAX(id) FROM public.events ;;
  # Use old data for up to 24 hours if trigger fails
  max_cache_age: "24 hours"
}

Step 3: Build Your PDT in a View File

Now, let's create the LookML for the PDT itself. We'll make a daily_user_summary view that summarizes key metrics for each user daily.

Here’s what the view file (daily_user_summary.view.lkml) would look like:

view: daily_user_summary {
  derived_table: {
    # This connects our PDT rebuild to the datagroup
    datagroup_trigger: ecomm_daily_refresh

    # The SQL for our derived table
    sql: SELECT
        DATE(created_at) AS activity_date,
        user_id,
        COUNT(DISTINCT session_id) AS session_count,
        COUNT(*) AS event_count
      FROM public.events
      GROUP BY 1, 2
      ;;
  }

  # Now we define our dimensions and measures on top of the PDT

  # The table has one row per user per day, so the primary key
  # combines both columns
  dimension: activity_user_pk {
    primary_key: yes
    hidden: yes
    sql: ${TABLE}.activity_date || '-' || ${TABLE}.user_id ;;
  }

  dimension_group: activity {
    type: time
    timeframes: [raw, date, week, month, year]
    sql: ${TABLE}.activity_date ;;
  }

  dimension: user_id {
    type: number
    sql: ${TABLE}.user_id ;;
  }

  measure: total_sessions {
    type: sum
    sql: ${TABLE}.session_count ;;
    label: "Total Sessions"
  }

  measure: total_events {
    type: sum
    sql: ${TABLE}.event_count ;;
  }

  measure: average_events_per_session {
    type: number
    sql: ${total_events} / NULLIF(${total_sessions}, 0) ;;
    value_format_name: decimal_2
  }
}

By adding the datagroup_trigger attribute to the derived_table block, we've officially made this view a Persistent Derived Table!
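To actually query the PDT, you'd expose the view through an Explore in your model file. A minimal sketch, assuming a users view exists to join against:

explore: daily_user_summary {
  label: "Daily User Activity"
  join: users {
    type: left_outer
    sql_on: ${daily_user_summary.user_id} = ${users.id} ;;
    relationship: many_to_one
  }
}

From a user's perspective, this Explore behaves like any other; they never need to know a PDT is doing the heavy lifting underneath.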

Go a Step Further with Incremental PDTs

For enormous, append-only datasets (like event logs or sensor data), rebuilding the entire PDT can still be excessive. This is where incremental PDTs shine.

An incremental PDT intelligently appends fresh data to the existing table instead of rebuilding it from scratch. You specify a time-based column for Looker to track.

To convert a standard PDT to an incremental one, add two parameters:

  • increment_key: The time-based column in your SQL that Looker uses to determine which new records to append (e.g., "event_ts" in the example below).
  • increment_offset: A small buffer to rebuild a certain number of previous time periods, which helps capture any late-arriving data.

Example:

view: event_stream_incremental {
  derived_table: {
    datagroup_trigger: ecomm_daily_refresh
    increment_key: "event_ts"   # Which time column to increment on
    increment_offset: 3         # Rebuild the 3 most recent periods to catch late-arriving data

    sql: SELECT
          user_id,
          CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', event_timestamp) AS event_ts,
          event_name
        FROM raw_data.events
        WHERE {% incrementcondition %} event_ts {% endincrementcondition %}
      ;;
  }
  ...
}

When this PDT rebuilds, Looker finds the latest event_ts already in the table and only pulls newer records from raw_data.events (going back the number of periods set by increment_offset to catch late arrivals), making the process far more efficient.

PDT Best Practices

Keep these tips in mind as you start building your awesome, performant models:

  • Choose Your Trigger Smartly: Align your PDT’s refresh schedule (sql_trigger) with the rate at which your source data updates. Rebuilding a PDT every five minutes when its source table only updates once a day is wasteful.
  • Not Everything Needs Persistence: If a standard derived table query is already fast, an ephemeral one is fine. PDTs add a layer of management overhead, so use them where they'll have a real performance impact.
  • Use Indexes: Looker allows you to define indexes for the physical PDT written to the database using the indexes LookML parameter. Adding an index on user_id or activity_date in our example above can further speed up downstream queries on the PDT itself (see the snippet after this list).
  • Monitor with the Admin Panel: The "Persistent Derived Tables" page in the Looker Admin panel is your command center. You can see build schedules, check event logs for errors, and manually trigger rebuilds while developing.
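For example, adding indexes to the derived_table block from Step 3 is a one-line change (note that on dialects without traditional indexes, such as BigQuery, you'd use partition_keys or cluster_keys instead):

derived_table: {
  datagroup_trigger: ecomm_daily_refresh
  indexes: ["user_id", "activity_date"]
  # ...same sql as in Step 3
}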

Final Thoughts

Persistent Derived Tables are one of Looker's most powerful features for creating fast, scalable, and user-friendly data experiences. By pre-aggregating complex logic into materialized tables, you reduce database load and deliver insights to your team at the speed of thought. They're a fundamental tool for any serious Looker developer.

Of course, mastering LookML and managing a fleet of PDTs requires a significant investment in learning a specific platform and data modeling principles. For teams that want to turn their fragmented marketing and sales data into clear reports without this technical overhead, we've built a more intuitive solution. With Graphed, you simply connect your data sources - like Google Analytics, Salesforce, or Shopify - and use plain English to build real-time dashboards and ask questions. We take care of the heavy lifting so you can focus on getting insights, not on writing code or managing rebuild schedules.
