5 Common BigQuery Mistakes That Cost You Money (And How to Fix Them)

By Advait Mishra · 12/15/2025
Tags: BigQuery, Cost Optimization, Best Practices, SQL

BigQuery is powerful, but it can get expensive quickly if you’re not careful. Over 8 years of working with BigQuery, I’ve seen companies waste thousands of dollars on avoidable mistakes. In this post, I’ll share the 5 most common costly mistakes and exactly how to fix them.

[Image: BigQuery, Google Cloud's data warehouse platform]

How BigQuery Pricing Works (Quick Refresher)

Before diving into mistakes, understand how you’re charged:

  • On-demand queries: about $5 per TB of data scanned (the round figure used in this post's examples; US on-demand pricing is now $6.25 per TiB, so check the current rate for your region)
  • Storage: $0.02 per GB per month for active storage, $0.01 per GB per month for long-term storage (tables untouched for 90 days)
  • The first 1 TB of query data processed each month is free

The key insight: You pay for data scanned, not data returned. A query that scans 100 GB and returns 10 rows costs exactly the same as one that scans 100 GB and returns 1 million rows: 0.1 TB × $5 = $0.50 either way.
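
You can see this in practice by checking what your own recent queries actually billed. A minimal sketch, assuming your jobs run in the US multi-region (swap region-us for your region) and the $5/TB figure above:

-- What did my last few queries cost?
SELECT
  job_id,
  total_bytes_billed,
  ROUND(total_bytes_billed / POW(1024, 4) * 5, 4) AS est_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
LIMIT 5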

Mistake #1: Using SELECT * on Large Tables

This is the most common and expensive mistake I see.

The Problem

-- DON'T DO THIS
SELECT *
FROM `project.dataset.large_table`
WHERE date = '2024-01-15'

Even if you only need 3 columns, SELECT * scans ALL columns. On a table with 50 columns and 500 GB of data, you’re scanning 500 GB when you might only need 30 GB.

The Fix

-- DO THIS INSTEAD
SELECT
  customer_id,
  order_value,
  created_at
FROM `project.dataset.large_table`
WHERE date = '2024-01-15'
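
When you genuinely need most of the columns, BigQuery's SELECT * EXCEPT lets you drop just the wide ones, and you're billed only for the columns that remain. The excluded column names below are hypothetical placeholders:

-- Scans every column except the two excluded ones
SELECT * EXCEPT (raw_payload, debug_log)
FROM `project.dataset.large_table`
WHERE date = '2024-01-15'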

Real Cost Impact

Approach | Columns Scanned | Data Scanned | Monthly Cost (100 queries)
SELECT * | 50 columns | 500 GB | $250
SELECT specific | 3 columns | 30 GB | $15
Savings | | | $235/month

Pro Tip: Preview Without Scanning

To explore a table’s structure without scanning data:

-- Free! Shows schema only (LIMIT 0 is the special case that scans nothing)
SELECT * FROM `project.dataset.large_table` LIMIT 0

-- Caution: a nonzero LIMIT does NOT reduce bytes scanned
-- Or use the preview button in the BigQuery console (also free)
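
If you prefer SQL, the INFORMATION_SCHEMA.COLUMNS view lists a table's columns and types as a cheap metadata query (a small minimum charge, on the order of 10 MB, applies to INFORMATION_SCHEMA reads):

-- Inspect the schema without touching table data
SELECT column_name, data_type
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'large_table'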

Mistake #2: Not Using Partitioned Tables

Partitioning is the single most effective way to reduce BigQuery costs.

The Problem

-- Scans ENTIRE table (expensive!)
SELECT SUM(revenue)
FROM `project.dataset.sales`
WHERE created_at >= '2024-01-01'

Without partitioning, BigQuery must scan every row to find matches.

The Fix: Create Partitioned Tables

-- Create a partitioned table
CREATE TABLE `project.dataset.sales_partitioned`
PARTITION BY DATE(created_at)
AS SELECT * FROM `project.dataset.sales`

Now the same query only scans relevant partitions:

-- Only scans January 2024 partition
SELECT SUM(revenue)
FROM `project.dataset.sales_partitioned`
WHERE DATE(created_at) >= '2024-01-01'

Real Cost Impact

Table Type | Data Scanned | Cost per Query
Non-partitioned (1 TB) | 1 TB | $5.00
Partitioned (1 month ≈ 100 GB) | 100 GB | $0.50
Savings | | 90%

Important: Use Partition Filters

Always include the partition column in your WHERE clause:

-- Good: Uses partition pruning
WHERE DATE(created_at) = '2024-01-15'

-- Bad: Scans all partitions (most functions on the partition column defeat pruning;
-- DATE() on a TIMESTAMP partitioning column is a documented exception)
WHERE EXTRACT(MONTH FROM created_at) = 1
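
You can also make BigQuery enforce this rule for you. A sketch of the earlier DDL with two useful table options, plus clustering (the cluster column here is an assumption; pick one you filter on often):

-- Reject unfiltered full scans and expire year-old partitions automatically
CREATE TABLE `project.dataset.sales_partitioned`
PARTITION BY DATE(created_at)
CLUSTER BY region
OPTIONS (
  require_partition_filter = TRUE,
  partition_expiration_days = 365
)
AS SELECT * FROM `project.dataset.sales`

With require_partition_filter = TRUE, a query that omits a filter on created_at fails instead of silently scanning everything.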

Mistake #3: Running Expensive Queries in Looker Studio

Dashboards can secretly rack up huge bills because they run queries repeatedly.

The Problem

A Looker Studio dashboard with:

  • 10 charts
  • Each chart runs a query scanning 50 GB
  • Data freshness set to 15 minutes
  • 10 users viewing throughout the day

Monthly cost: 10 charts × 50 GB = 500 GB per refresh; 4 refreshes/hour × 8 hours × 20 days = 640 refreshes; 500 GB × 640 refreshes = 320 TB × $5/TB = $1,600/month

The Fix: Pre-Aggregate Your Data

Create a summary table that Looker Studio queries instead:

-- Create a daily summary table (run once per day)
CREATE OR REPLACE TABLE `project.dataset.daily_summary` AS
SELECT
  DATE(created_at) AS date,
  product_category,
  region,
  COUNT(*) AS orders,
  SUM(revenue) AS total_revenue,
  AVG(revenue) AS avg_order_value
FROM `project.dataset.orders`
WHERE created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
GROUP BY 1, 2, 3

Original table: 500 GB → Summary table: 5 GB

New monthly cost: 10 charts × 5 GB × 640 refreshes = 32 TB × $5/TB = $160/month

Savings: $1,440/month (90%)
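
Rebuilding the whole summary still scans the source once a day. If even that is too much, an incremental refresh touches only today's slice. A sketch assuming the daily_summary table above, run as a daily scheduled query:

-- Upsert only today's rows in the summary
MERGE `project.dataset.daily_summary` t
USING (
  SELECT
    DATE(created_at) AS date,
    product_category,
    region,
    COUNT(*) AS orders,
    SUM(revenue) AS total_revenue,
    AVG(revenue) AS avg_order_value
  FROM `project.dataset.orders`
  WHERE DATE(created_at) = CURRENT_DATE()
  GROUP BY 1, 2, 3
) s
ON t.date = s.date AND t.product_category = s.product_category AND t.region = s.region
WHEN MATCHED THEN
  UPDATE SET orders = s.orders, total_revenue = s.total_revenue, avg_order_value = s.avg_order_value
WHEN NOT MATCHED THEN
  INSERT ROW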

Additional Dashboard Optimizations

  1. Relax data freshness - Change the refresh interval from 15 minutes to 1 hour or longer
  2. Use BI Engine - Caches data in memory for faster, cheaper queries
  3. Use extracts - For data that doesn’t need real-time updates
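
A fourth option on the pre-aggregation theme: for simple rollups, a materialized view gives the same effect and BigQuery keeps it fresh automatically. Materialized views have restrictions (a limited set of aggregate functions, among others), so treat this as a sketch:

-- BigQuery maintains this incrementally as the orders table changes
CREATE MATERIALIZED VIEW `project.dataset.daily_summary_mv` AS
SELECT
  DATE(created_at) AS date,
  product_category,
  region,
  COUNT(*) AS orders,
  SUM(revenue) AS total_revenue
FROM `project.dataset.orders`
GROUP BY 1, 2, 3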

Mistake #4: Joining Large Tables Inefficiently

Joins are necessary but can be expensive if done wrong.

The Problem

-- Expensive: Joining two large tables without filters
SELECT
  o.order_id,
  c.customer_name,
  o.revenue
FROM `project.dataset.orders` o  -- 500 GB
JOIN `project.dataset.customers` c  -- 100 GB
ON o.customer_id = c.customer_id

This scans 600 GB every time it runs, because you're billed for every column read from both tables.

The Fix: Filter Before Joining

-- Better: Filter both tables before joining
WITH filtered_orders AS (
  SELECT order_id, customer_id, revenue
  FROM `project.dataset.orders`
  WHERE DATE(created_at) = CURRENT_DATE()  -- Only today's orders
),
filtered_customers AS (
  SELECT customer_id, customer_name
  FROM `project.dataset.customers`
  WHERE customer_id IN (SELECT DISTINCT customer_id FROM filtered_orders)
)
SELECT
  o.order_id,
  c.customer_name,
  o.revenue
FROM filtered_orders o
JOIN filtered_customers c
ON o.customer_id = c.customer_id

One caveat: on-demand billing charges for the columns read, so a WHERE clause lowers the bill only when it prunes partitions or clusters (as the date filter above does if orders is partitioned). Otherwise, filtering early mainly reduces shuffle and slot time, which still pays off on capacity-based pricing.

Use Nested and Repeated Fields

For frequently joined data, consider denormalization:

-- Instead of joining every time, store related data together
CREATE TABLE `project.dataset.orders_enriched` AS
SELECT
  o.*,
  STRUCT(
    c.customer_name,
    c.customer_email,
    c.customer_segment
  ) AS customer
FROM `project.dataset.orders` o
JOIN `project.dataset.customers` c
ON o.customer_id = c.customer_id

Now queries on orders_enriched don't need joins. You pay slightly more for storage, but storage is far cheaper than repeatedly scanning both tables.
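
Queries then read the nested fields directly with dot notation (the 'enterprise' segment value below is a hypothetical example):

-- No join: customer attributes travel with each order
SELECT
  order_id,
  customer.customer_name,
  revenue
FROM `project.dataset.orders_enriched`
WHERE customer.customer_segment = 'enterprise'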

Mistake #5: Not Monitoring Query Costs

You can’t optimize what you don’t measure.

The Problem

Most teams don’t know:

  • Which queries cost the most
  • Who’s running expensive queries
  • Whether costs are trending up or down

The Fix: Enable Query Auditing

  1. Check query cost before running:

The query validator in the BigQuery console (top right of the editor) shows the estimated bytes to be processed as you type. The same estimate is available from the command line via bq query --dry_run, which never executes or bills.
  2. Use INFORMATION_SCHEMA to track costs:
-- Find your most expensive queries (last 7 days)
SELECT
  user_email,
  query,
  total_bytes_billed / 1024 / 1024 / 1024 AS gb_billed,
  (total_bytes_billed / 1024 / 1024 / 1024 / 1024) * 5 AS estimated_cost_usd  -- assumes the $5/TB rate
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND total_bytes_billed > 0
ORDER BY total_bytes_billed DESC
LIMIT 20
  3. Set up budget alerts:

Go to GCP Console → Billing → Budgets & alerts → Create budget for BigQuery

  4. Use slot reservations for predictable costs:

If you spend more than about $2,000/month on BigQuery, consider capacity-based pricing (BigQuery editions, which replaced the old flat-rate plans):

  • Buy slot commitments
  • Predictable monthly cost
  • No per-query billing
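
To see whether costs are trending up or down (the third blind spot above), aggregate the same JOBS view by day. A sketch, again assuming region-us and the $5/TB figure:

-- Daily estimated spend per user, last 30 days
SELECT
  DATE(creation_time) AS day,
  user_email,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 5, 2) AS est_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1, 2
ORDER BY day DESC, est_cost_usd DESC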

Quick Wins Checklist

Here’s a checklist you can use today:

Action | Potential Savings | Effort
Replace SELECT * with specific columns | 50-80% | Low
Add partition filters to all queries | 80-95% | Low
Create summary tables for dashboards | 80-95% | Medium
Filter before joining | 50-70% | Low
Enable cost monitoring | Awareness | Low
Consider BI Engine | 50-70% | Medium

Cost Optimization Query Templates

Template 1: Check Table Sizes

SELECT
  table_id,
  ROUND(size_bytes / 1024 / 1024 / 1024, 2) AS size_gb,
  row_count
FROM `your_dataset.__TABLES__`
ORDER BY size_bytes DESC
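
The newer INFORMATION_SCHEMA.TABLE_STORAGE view adds a split between active and long-term bytes, which maps directly onto the two storage prices above. A sketch assuming region-us:

-- Which tables cost the most to store, and how much has aged into long-term?
SELECT
  table_name,
  ROUND(active_logical_bytes / POW(1024, 3), 2) AS active_gb,
  ROUND(long_term_logical_bytes / POW(1024, 3), 2) AS long_term_gb
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
WHERE table_schema = 'your_dataset'
ORDER BY active_logical_bytes DESC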

Template 2: Estimate Query Cost

-- A dry run reports the bytes a query would scan without executing or billing
-- Console: the query validator (top right) shows the estimate as you type
-- CLI: bq query --dry_run --use_legacy_sql=false 'SELECT ...'

Template 3: Find Unused Tables

-- Tables not modified in 90+ days (these already bill at the long-term storage rate)
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `your_dataset.__TABLES__`
WHERE TIMESTAMP_MILLIS(last_modified_time) < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)

Note that "last modified" is not "last queried"; cross-check INFORMATION_SCHEMA.JOBS before deleting anything.

Summary: The 5 Mistakes

Mistake | Fix | Savings
SELECT * | Select specific columns | 50-80%
No partitioning | Partition by date | 80-95%
Dashboard full scans | Pre-aggregate data | 80-95%
Inefficient joins | Filter before joining | 50-70%
No monitoring | Track query costs | Varies

Need Help Optimizing Your BigQuery Costs?

If your BigQuery bills are higher than expected, I can help. With 8+ years of experience optimizing BigQuery for cost and performance, I’ve helped companies reduce their bills by 70-90%.

Schedule a free consultation to review your BigQuery setup and identify quick wins.


Want to learn more about BigQuery and Looker Studio? Check out our guide on Connecting BigQuery to Looker Studio for performance tips, or learn how to Build an E-commerce Dashboard.