GA4 · BigQuery · Data Engineering

Stop Using the GA4 API — Use BigQuery Export Instead

Alice by TranX Team · 7 min read

If you're pulling GA4 data into Looker Studio, a custom script, or a third-party connector, you're using the GA4 Data API. It works. Until it doesn't.

One day your product report shows an "(other)" row swallowing half your catalog. Your dashboard hits a quota limit and stops refreshing. Your funnel numbers look different every time you change the date range. And you start wondering: is this data even accurate?

The short answer: probably not. The GA4 Data API silently degrades your data in ways most teams never notice. BigQuery Export gives you the raw, unsampled truth — and it's free.

The five ways the GA4 API quietly ruins your data

The GA4 Data API isn't broken — it's just designed for simplicity, not precision. And for SaaS and ecommerce teams making real budget decisions, that gap is expensive.

Five ways the GA4 Data API silently degrades your data:

  • Sampling (Mode 01): kicks in over 10M events. Numbers become estimates, scaled up.
  • Thresholds (Mode 02): hides rows under ~50 users. Niche segments disappear silently.
  • (other) row (Mode 03): cardinality cap of ~50,000 values. Long-tail SKUs and pages collapse into one bucket.
  • Row / dimension caps (Mode 04): 10K rows and 9 dimensions per request. You end up stitching queries together by hand.
  • Quota (Mode 05): daily token cap of 200K. Dashboards go blank; the team trusts stale data.
Fig. 1 — Each mode is invisible by default. You don't see the warning when sampling kicks in, when a row gets thresholded, when (other) eats your catalog, or when a quota burns out — the dashboard just quietly lies.

1. Sampling kicks in faster than you think

Once your query touches more than 10 million events, GA4 takes a sample and scales it up. For a mid-size site doing 50K sessions a day with enhanced ecommerce or product event tracking, that threshold can be hit with just a few weeks of data.

Standard Reports are unsampled, but they're rigid — you can't customize them. Explorations give you flexibility but introduce sampling. The API inherits the same problem. You get convenience or accuracy, not both.

2. Data thresholds hide your small segments

GA4 applies "data thresholds" to prevent identifying individual users. If a segment has fewer than roughly 40–50 users, GA4 hides the entire row — not groups it, hides it. You won't even know it's missing.

For B2B SaaS, this kills niche analysis. Want to see how trial users from a specific company-size segment behave? Or how a small-but-high-LTV demographic responds to a new campaign? GA4 silently withholds that data. You can't turn this off.

3. The "(other)" row eats your catalog

GA4's pre-aggregated reporting tables can only hold a limited number of unique values per dimension. Once a dimension like "Item name" or "Page path" becomes high-cardinality — more than roughly 500 unique values in a single day — GA4 starts collapsing the long tail into a catch-all (other) row.

5,000-SKU catalog — what each surface actually shows you:

  • GA4 Data API: the top 500 SKUs are individually visible; the other 4,500 collapse into (other). Roughly 90% of the catalog is aggregated and opaque — no per-SKU revenue, no margin signal.
  • BigQuery Export: all 5,000 SKUs individually queryable. No cap, no (other) row — raw event rows for every SKU, real margin insight, long tail visible.
Fig. 2 — A 5,000-SKU catalog through the API is roughly 10% individually visible and 90% mashed into (other). BigQuery sees every SKU, every page, every UTM as its own row.

If you have 5,000 SKUs or thousands of dynamically generated landing pages, a huge portion of your data is invisible. You can see your top performers, but the mid-tier and long-tail — often where the real margin and conversion insights live — are lumped together and lost.

4. Row and dimension limits box you in

The API returns a default of 10,000 rows per request (max 250,000 with pagination) and allows only 9 dimensions per query. Need item name + category + campaign + source + medium + landing page + device + geography + a custom dimension? That's already 9. Add one more and you're splitting into multiple queries and trying to stitch them back together.
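For contrast, the raw export has no dimension ceiling. A sketch against the standard GA4 export schema — `analytics_123456` is a placeholder for your property's dataset (`analytics_<property_id>`), and the date range is illustrative:

```sql
-- Ten dimensions in one pass: no 9-dimension cap, no pagination.
SELECT
  event_date,
  device.category        AS device_category,
  geo.country,
  traffic_source.source,
  traffic_source.medium,
  traffic_source.name    AS campaign,
  item.item_name,
  item.item_category,
  item.item_brand,
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'page_location') AS page_location,
  COUNT(*) AS purchase_events
FROM `analytics_123456.events_*`,
     UNNEST(items) AS item
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240331'
  AND event_name = 'purchase'
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
```

Every row comes back; if the result is wide, you page through it with SQL, not with API tokens.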

And every paginated request past the first 10K rows burns more quota tokens. Which brings us to the next problem.

5. Quota throttling kills your dashboards

Standard GA4 properties get 200,000 tokens per day and roughly 40,000 tokens per hour. A single complex query with wide date ranges and multiple dimensions can cost hundreds of tokens. A Looker Studio dashboard with 15 widgets, each hitting the API separately, can exhaust your hourly quota in minutes.

When the quota is gone, your dashboards go blank until it resets. Your team sees stale data — or worse, they don't realize the data stopped updating.

What BigQuery Export gives you instead

BigQuery Export sends your raw GA4 event data directly into Google BigQuery. No pre-aggregation, no sampling, no artificial limits. Here's what that means in practice.

GA4 Data API vs BigQuery Export — side by side (API: simple, sampled, capped · BigQuery: raw, joinable, free):

  • Sampling (when it kicks in): over 10M events vs. never
  • Cardinality (long-tail SKUs / pages): (other) over ~50K values vs. unlimited
  • Dimensions per query (width of analysis): 9 max vs. any SQL
  • Event params & user props (custom attributes you tracked): subset, pre-aggregated vs. every key, raw
  • Quota / cost (as usage scales): 200K tokens/day vs. 1 TB/month free
Fig. 3 — Every row that bites you in the API table goes away once the data lives in BigQuery. The trade-off is writing SQL instead of clicking dimensions in a UI.

100% of your events, zero sampling

Every single event — page views, sign-ups, feature usage, purchases, custom events — lands in BigQuery as an individual row. Query a year of data across billions of events and get exact numbers. No estimation, no scaling. This used to cost $150K+/year with Universal Analytics 360. With GA4, it's free for every property.
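A minimal sketch of what "exact numbers" means in practice — `analytics_123456` is a placeholder dataset, and `COUNT(DISTINCT …)` in BigQuery is exact, not approximate:

```sql
-- Exact, unsampled event and user counts over a full year of daily tables.
SELECT
  event_name,
  COUNT(*)                       AS events,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
GROUP BY event_name
ORDER BY events DESC
```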

No "(other)" row — ever

BigQuery stores raw event rows, not pre-aggregated tables. There is no cardinality limit and no row cap. Every SKU, every page path, every UTM parameter is individually queryable. Your entire catalog is visible, not just the top performers.

Full SQL flexibility

No 9-dimension limit. No rigid report structure. Write any SQL query you want — joins, window functions, CTEs, subqueries. Build custom attribution models. Create cohort analyses. Calculate true LTV by acquisition source. Do things the API literally cannot do.
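As one sketch of that flexibility, here is a query the API has no equivalent for: time-to-first-purchase by acquisition source, computed from raw events. The dataset ID is a placeholder; `traffic_source` in the export is the user's first attributed source:

```sql
-- Days from a user's first event to their first purchase, averaged by source.
WITH user_timeline AS (
  SELECT
    user_pseudo_id,
    traffic_source.source AS first_source,  -- user-scoped first source
    MIN(event_timestamp)  AS first_seen,
    MIN(IF(event_name = 'purchase', event_timestamp, NULL)) AS first_purchase
  FROM `analytics_123456.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231'
  GROUP BY user_pseudo_id, first_source
)
SELECT
  first_source,
  COUNT(first_purchase) AS converters,   -- AVG/COUNT skip non-converters (NULLs)
  ROUND(AVG(TIMESTAMP_DIFF(TIMESTAMP_MICROS(first_purchase),
                           TIMESTAMP_MICROS(first_seen), DAY)), 1)
    AS avg_days_to_first_purchase
FROM user_timeline
GROUP BY first_source
ORDER BY converters DESC
```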

No quota throttling

Once data is in BigQuery, you query it with BigQuery's own pricing — the first 1 TB per month is free. Your Looker Studio dashboards, dbt models, and scheduled queries all run against BigQuery directly. No GA4 tokens consumed. No hourly caps. No blank dashboards.

Real GA4 use cases you can't do with the API

Here's what becomes possible once your GA4 data lives in BigQuery — even before you bring in any other source:

  • Full catalog or page-level analytics. Query events for all 10,000+ SKUs or landing pages without hitting cardinality limits. Find your hidden winners.
  • Custom funnel analysis. Build any funnel from raw events — not just the pre-defined funnels GA4 supports. See exactly where users drop off, by segment, by device, by campaign.
  • Every event_param, every user_property. The API exposes a curated subset; BigQuery gives you every custom attribute you tracked, exactly as it was sent.
  • True session reconstruction. Walk a user's actual event sequence rather than the API's pre-bucketed session metrics — including events that fall into the (other) row.
  • Backfilled historical analysis. Once data lands, it stays. No 14-month retention cliff, no quota when you query 18 months of history.
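Session reconstruction, for example, is a short query against the raw export. Both the dataset ID and the `user_pseudo_id` below are placeholders:

```sql
-- One user's raw event sequence, with GA4 session IDs for grouping.
SELECT
  user_pseudo_id,
  (SELECT value.int_value FROM UNNEST(event_params)
    WHERE key = 'ga_session_id')      AS session_id,
  TIMESTAMP_MICROS(event_timestamp)   AS event_time,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params)
    WHERE key = 'page_location')      AS page_location
FROM `analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240601' AND '20240630'
  AND user_pseudo_id = '1234567890.1234567890'  -- placeholder ID
ORDER BY event_timestamp
```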

When the GA4 API is still the right tool

This post argues hard for BigQuery Export, but the API isn't useless — it's the right choice for a real subset of cases. To be fair:

  • You're under the limits. Small or early-stage properties under ~100K events/day rarely hit sampling, thresholds, or the (other) row. You get exact data with zero infrastructure.
  • No BigQuery setup. If nobody on the team writes SQL, the GA4 → Looker Studio connector is one click. BigQuery only pays off if someone will write the queries.
  • You need realtime. The Realtime Report API surfaces the last 30 minutes. The BigQuery daily export is t+24h, and even streaming has minute-level lag. For launch-day dashboards, the API is better.
  • Standard, pre-aggregated KPIs. Sessions, users, conversions sliced by source, medium, and page — exactly what the API was designed for. No long-tail SKUs, no nested event params.
  • Simple embeds. A few KPIs on an internal page or a Slack daily digest — the API plus a 20-line script beats setting up a warehouse.

Rule of thumb: use the API for shallow, narrow, fresh data; use BigQuery Export the moment you need depth, accuracy, or long history. Many mature teams run both — the API for live dashboards, BigQuery for the analysis that actually drives decisions.

The one caveat: the daily export limit

Standard GA4 properties have a 1 million events/day limit on the daily batch export. If you consistently exceed this, the export pauses.

Two ways to manage this:

  1. Filter your export. In GA4's BigQuery export settings, exclude high-volume but low-value events like scroll, file_download, or video_progress. Export only the events that matter for your analysis.
  2. Use streaming export. The intraday streaming tables have no daily event cap and update within minutes. For high-traffic sites, this is often the better option anyway.
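If you enable both, one gotcha is worth knowing: the `events_*` wildcard also matches the `events_intraday_*` tables, so filter deliberately. A sketch (placeholder dataset and dates):

```sql
-- Finalized daily tables plus today's intraday streaming table in one query.
-- For the intraday tables, _TABLE_SUFFIX is 'intraday_YYYYMMDD'.
SELECT
  event_name,
  COUNT(*) AS events
FROM `analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240601' AND '20240630'
   OR STARTS_WITH(_TABLE_SUFFIX, 'intraday_')
GROUP BY event_name
ORDER BY events DESC
```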

For very high-volume sites (10M+ events/day), GA4 360 removes the limit entirely.

How Alice makes this easy

BigQuery Export is powerful — but writing SQL against GA4's nested event schema isn't exactly beginner-friendly. The events table uses nested arrays for event_params and items that require UNNEST() operations, and building even a basic funnel query can take dozens of lines of SQL.
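For a taste of what "not beginner-friendly" means, here is the standard pattern for pulling a single event_param out of the nested array — a correlated subquery over `UNNEST(event_params)` (dataset ID and dates are placeholders):

```sql
-- Page views per page: extract page_location from the nested params array.
SELECT
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'page_location') AS page_location,
  COUNT(*) AS page_views
FROM `analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240601' AND '20240630'
  AND event_name = 'page_view'
GROUP BY page_location
ORDER BY page_views DESC
```

Multiply that boilerplate by every param in a funnel and the "dozens of lines" claim stops sounding like an exaggeration.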

Alice, the AI growth analyst at TranX, connects directly to your BigQuery warehouse and lets you ask questions in plain English:

  • "What's my conversion rate by landing page this month?"
  • "Which products had the highest add-to-cart rate but lowest purchase rate?"
  • "Show me the full funnel from /pricing to signup, broken down by source."
  • "Which event_params are most predictive of a converting session?"

Alice generates the SQL, runs it against your BigQuery data, and shows you the answer — with the full query visible so you can verify, tweak, and rerun. No sampling. No "(other)" row. No quota limits. Just the real numbers.

Let Alice query your BigQuery in plain English

Connect your GA4 BigQuery export and Alice writes the SQL for you — UNNEST and all. Ask questions, get the real numbers, see the query.

Try Alice Free