Time travel in BigQuery and undelete tables

Written by Lace Rogers - 16 Aug 2023

Analytics and Data | 3 MIN READ

If you ever have that scary moment overwriting data in BigQuery, fear not - there is a way out of it. Read on and find out how to recover data quickly.

One of my favourite things is working on data warehouses using BigQuery; in fact, BigQuery provides a wealth of tools to automate and process data to build a robust ETL.

BigQuery has some super powerful features which allow us to build, manipulate and even run machine learning algorithms within the interface. However, we're all human, and things can go wrong, so what do you do when that happens?

Imagine one morning, you're working in the BigQuery interface and want to aggregate your table to check the volumes, but you forget to clear down the options before running.

SELECT
    event_name,
    COUNT(DISTINCT(user_psuedo_id) ) AS distinct count
FROM
    testing.out_table
GROUP BY
1

Just as the code finishes running, you realise you've lost months of daily updates after accidentally overwriting the table and even worse, any reporting in dashboard tools like Data Studio or Tableau using this table will now fail.

Don't Panic!

Before you look to completely re-run all your historical data (which could cost a lot of your time and increase your BigQuery spend), Google Cloud Platform has some great tools that allow you to recover and revert your datasets.

The world has ended; what can, or can't you do?

  1. If you have deleted a table or have an error on your table, you have seven days to recover - effectively you can time travel.
  2. Remember, at present, you cannot restore a deleted dataset.

The syntax fix

The syntax itself for reverting the tables is relatively simple and is not expensive to run. The interval can be seconds/hours/days; however, it cannot be over seven days due to the reversion time limit.

You must remember when you are reverting if you have added correct data in the period you are reverting to, you will lose this, so it’s crucial to check when the error occurred if possible.

You can use Data Manipulation Language (DML)* or the output functionality to overwrite the table you are reverting to. You will simply see the data in a temp table without a destination.

Here are the get-out-of-jail cards

Here are step-by-step instructions from Google Link.

CREATE OR REPLACE TABLE
    'testing.out_table' AS
SELECT
    *
FROM
    testing.out_table FOR system_time AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 75 second)

Here is a link to the script on GitHub.

Then voilà, check your table has reverted, and if all looks well, have a tea and relax in the knowledge the reports feeding on this table will be back up and running when they next process (phew).

Whilst we should always be super careful when amending or overwriting tables if something does go wrong, remember to look at this syntax before you re-run the entire history.

* A quick refresher on DML

DML stands for Data Manipulation Language. DML statement is used to insert, update or delete the records on tables and is super powerful when creating tables. With DML and DDL, you can both effectively script and manage your tables and code effectively.

How GA4 Conversion Attribution Really Works

15 Nov 2023

Sam Dunkley

Written by
Sam Dunkley

What to do with GTM now that UA has stopped collecting data

8 Aug 2023

Sam Dunkley

Written by
Sam Dunkley

Welcome Lola, Our First Academy Web Analyst

7 Aug 2023

Lee Colbran

Written by
Lee Colbran

Join our email list like thousands of other marketing professionals to get updates on key industry changes, early access to free resources and exclusive invitations to Fresh Egg events in your inbox.