Preparing Assignment Source Tables for Warehouse Native Experimentation
Overview
To prepare Assignment Sources for Warehouse Native Experimentation, transform your raw exposure or impression logs into a clean, standardized table that serves as the foundation for experimentation analyses.
This page describes the required fields, recommended fields, and best practices for preparing your assignment source tables.
Required columns
Every Assignment Source table must include the following columns:
Column | Type | Description |
---|---|---|
Unique Key | STRING | Unique identifier for the unit of randomization (for example, `user_id`, `account_id`, or a custom key). Must be stable across the experiment duration. |
Exposure Timestamp | DATETIME / TIMESTAMP | The precise time when the assignment occurred (for example, when an impression was logged, a flag was evaluated, or `getTreatment` was called). |
Treatment (Variant Group) | STRING | The assigned experiment variant (for example, `control`, `treatment_a`, `variant_1`). |
These fields are mandatory. Without them, Warehouse Native cannot map exposures to experiment results.
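For example, a minimal sketch that projects a raw log into this shape (the table name `raw_impressions` and the source column names are illustrative, not part of any required schema):

```sql
-- Project a raw impression log into the three required columns.
-- Substitute your own table and column names.
SELECT
  user_id               AS unique_key,         -- unit of randomization
  CAST(ts AS TIMESTAMP) AS exposure_timestamp, -- when the assignment occurred
  variant               AS treatment           -- assigned variant group
FROM raw_impressions;
```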
Recommended columns
While not required, the following fields make debugging, filtering, and governance more efficient.
Column | Type | Description |
---|---|---|
Experiment ID / Name | STRING | Helps differentiate exposures when multiple experiments are logged in the same raw table. |
Targeting Rule | STRING | Indicates which targeting rule or condition led to the assignment. Useful for audit and debugging. If you are using FME feature flag impressions, filter by a single targeting rule to ensure the experiment analyzes the intended population. |
Environment ID | STRING | Allows filtering by environment (for example, `production`, `staging`). When configuring an assignment source in FME, you can map column values to a matching Harness environment or hard-code a single environment. Each experiment must be scoped to one environment. |
Traffic Type | STRING | Distinguishes the unit type (for example, `user`, `account`, `anonymous visitor`). When configuring an assignment source, you can map column values to a matching traffic type or hard-code a single traffic type. Each experiment must be scoped to one traffic type. |
Common raw table schemas
Most organizations log impressions or exposures from feature flag evaluations, SDKs, or event pipelines. Below are common raw schemas and how to normalize them.
Feature Flag Evaluation Logs
Example Raw Schema | Transformations |
---|---|
`user_id`, `flag_name`, `treatment`, `impression_time`, `environment`, `rule_id` | • Map `flag_name` values → `experiment_id` (if multiple flags correspond to the same experiment). • Cast `impression_time` to `TIMESTAMP`. • Deduplicate on `(user_id, experiment_id)` by keeping the earliest exposure. |
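A sketch of these transformations, assuming the raw table is named `raw_flag_evaluations` and that two flags roll up to a single experiment (all names are illustrative):

```sql
-- Normalize feature flag evaluation logs into the standard assignment shape.
SELECT
  user_id,
  -- Map flag names to one experiment_id when several flags
  -- belong to the same experiment.
  CASE flag_name
    WHEN 'checkout_flow_v2_web'    THEN 'checkout_flow_v2'
    WHEN 'checkout_flow_v2_mobile' THEN 'checkout_flow_v2'
    ELSE flag_name
  END AS experiment_id,
  treatment,
  CAST(impression_time AS TIMESTAMP) AS exposure_timestamp,
  environment AS environment_id,
  rule_id     AS targeting_rule
FROM raw_flag_evaluations
-- Keep only the earliest exposure per user per experiment.
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id, experiment_id
  ORDER BY impression_time ASC
) = 1;
```

`QUALIFY` is supported in warehouses such as Snowflake and BigQuery (BigQuery additionally requires a `WHERE`, `GROUP BY`, or `HAVING` clause in the same query); elsewhere, wrap the `ROW_NUMBER()` in a subquery and filter on it.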
A/B Test Impression Logs
Example Raw Schema | Transformations |
---|---|
`experiment_id`, `user_id`, `bucket` (or `arm`), `impression_time` | • Standardize `bucket` → `treatment`. • Standardize `impression_time` → `exposure_timestamp`. • Deduplicate to keep only the first exposure per user per experiment. |
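A sketch under the same assumptions (the raw table name `raw_ab_impressions` is illustrative):

```sql
-- Standardize A/B impression logs and keep the first exposure
-- per user per experiment.
SELECT
  user_id,
  experiment_id,
  bucket AS treatment,  -- rename bucket/arm to treatment
  CAST(impression_time AS TIMESTAMP) AS exposure_timestamp
FROM raw_ab_impressions
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id, experiment_id
  ORDER BY impression_time ASC
) = 1;
```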
Event Logging Pipelines (Custom Analytics Events)
Example Raw Schema | Transformations |
---|---|
`event_name`, `event_time`, `properties.experiment_id`, `properties.variant`, `properties.user_id` | • Flatten nested fields (JSON → explicit columns). • Filter to `event_name = 'experiment_exposure'`. • Standardize column names to match the required schema. |
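A sketch that flattens the nested properties. The JSON accessor varies by warehouse; this example uses BigQuery's `JSON_VALUE`, and the table name `raw_events` is illustrative:

```sql
-- Flatten nested exposure events into explicit columns.
SELECT
  JSON_VALUE(properties, '$.user_id')       AS user_id,
  JSON_VALUE(properties, '$.experiment_id') AS experiment_id,
  JSON_VALUE(properties, '$.variant')       AS treatment,
  CAST(event_time AS TIMESTAMP)             AS exposure_timestamp
FROM raw_events
WHERE event_name = 'experiment_exposure';
```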
Prepare your assignment table
Follow these best practices when preparing your assignment table in your data warehouse; a consolidated sketch follows the list.
- **De-duplication**: Keep only the earliest exposure per user per experiment. For example:

  ```sql
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id, experiment_id
    ORDER BY exposure_timestamp ASC
  ) = 1
  ```
- **Consistent Variant Labels**: Standardize variant naming (`control`, `treatment`, `variant_1`) across experiments. Avoid null or empty strings; default to `control` if needed.
- **Timestamps in UTC**: Store all exposure timestamps in UTC for consistent comparisons across regions.
- **Stable Identifiers**: Use the same user or account key across Assignment Source and Metric Source tables. If your system logs multiple IDs (for example, `cookie_id` and `user_id`), choose the most stable one.
- **Environment Separation**: If raw tables mix environments (for example, `staging` and `production`), add an `environment_id` column and filter accordingly. This prevents accidental inclusion of test data in production environments.
- **Partitioning and Indexing**: Partition large tables by `DATE(exposure_timestamp)` to optimize query performance. Cluster or index by `experiment_id` and `user_id` for faster lookups.
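Putting these practices together, here is a sketch of a materialized assignment table. The DDL uses BigQuery partitioning and clustering syntax; adapt it to your warehouse, and treat the table and column names (`analytics.assignments`, `raw_exposures`, the hard-coded `'user'` traffic type) as illustrative assumptions:

```sql
-- Materialize a clean, partitioned, clustered assignment table.
CREATE TABLE analytics.assignments
PARTITION BY DATE(exposure_timestamp)  -- prune scans by exposure date
CLUSTER BY experiment_id, user_id      -- speed up per-experiment lookups
AS
SELECT
  user_id,
  experiment_id,
  -- Normalize variant labels; avoid null or empty strings.
  COALESCE(NULLIF(TRIM(treatment), ''), 'control') AS treatment,
  -- Assumes raw timestamps are already UTC; convert here if not.
  CAST(impression_time AS TIMESTAMP) AS exposure_timestamp,
  environment AS environment_id,
  'user' AS traffic_type               -- hard-coded for illustration
FROM raw_exposures
WHERE environment = 'production'       -- keep test data out of analyses
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id, experiment_id
  ORDER BY impression_time ASC
) = 1;
```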
Example prepared table schema
Column | Type | Example |
---|---|---|
user_id | STRING | abc123 |
experiment_id | STRING | checkout_flow_v2 |
treatment | STRING | control |
exposure_timestamp | TIMESTAMP | 2025-03-14T12:45:00Z |
environment_id | STRING | prod |
traffic_type | STRING | user |
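Before connecting the table, it can help to run a few sanity checks. A sketch, assuming the prepared table is named `analytics.assignments`:

```sql
-- 1) Duplicate exposures per user per experiment (should return no rows
--    after de-duplication).
SELECT experiment_id, user_id, COUNT(*) AS exposures
FROM analytics.assignments
GROUP BY experiment_id, user_id
HAVING COUNT(*) > 1;

-- 2) Users assigned to more than one variant in the same experiment.
SELECT experiment_id, user_id, COUNT(DISTINCT treatment) AS variants
FROM analytics.assignments
GROUP BY experiment_id, user_id
HAVING COUNT(DISTINCT treatment) > 1;

-- 3) Null required fields.
SELECT COUNT(*) AS bad_rows
FROM analytics.assignments
WHERE user_id IS NULL OR treatment IS NULL OR exposure_timestamp IS NULL;
```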
Once your Assignment Source tables are prepared and validated, see Setting Up an Assignment Source to connect them in Harness FME.