Provides professional business-intelligence data-transformation advice with clear logic and strong practicality.
The following defines the data transformations and metric definitions needed for a unified dashboard (sales volume, repurchase rate, average order value), covering time and currency standards, duplicate-order deduplication, refund integration, and consistent metric definitions. The goal is a reusable data layer with clear business rules so that channel, campaign, and province dimensions can be compared on equal footing.

1) Overall transformation principles
- Time: attribute metrics by order payment time pay_time (UTC+0). The dashboard aggregates by UTC+0 calendar day by default; local-time display, if needed, is converted in the presentation layer without changing attribution.
- Currency: convert all amounts to one base currency (CNY recommended). Keep the original currency and FX rate in the detail layer; aggregate in the base currency. If the currency field is missing, backfill it at collection time or build a rule-based mapping; do not infer currency from province or channel.
- Deduplication: dedupe at the order-line level first, then aggregate to order headers, so multi-SKU orders are not double-counted in order count or AOV.
- Refunds: integrate refunds at the order level to produce net amounts and a full-refund flag. Net amounts are attributed back to the payment date (order-based) for AOV/volume; the refund date is used for refund-rate monitoring if needed.
- Promotions and discounts: discount_amt is the discount received by the order; is_campaign is used for grouping only and does not change metric definitions.

2) Detail layer (cleaning and standardization)
- Types and field standardization
  - Cast types: order_id and user_id to canonical string/integer; pay_time/refund_time to UTC+0 timestamps; qty to integer; amounts to numeric.
  - Derive standard fields:
    - order_date_utc = date(pay_time at UTC+0)
    - currency_code (backfill via collection or rules if missing)
    - fx_rate_to_cny (daily rate at pay_time, or a fixed monthly rate)
    - pay_amt_cny = pay_amt * fx_rate_to_cny
    - discount_amt_cny = discount_amt * fx_rate_to_cny
    - channel_normalized (unified channel enum)
    - province_normalized (unified province enum)
- Order-line deduplication (remove historical duplicate rows)
  - Duplicate rule: records identical on order_id + sku_id + user_id + pay_time + qty + pay_amt + discount_amt + is_campaign + channel are duplicates.
  - Retention: keep the first record by load time or row_number; drop the rest.
- Order-header aggregation (merge multi-SKU orders)
  - Aggregate per order_id:
    - order_qty = sum(qty)
    - order_pay_amt_cny = sum(pay_amt_cny)
    - order_discount_amt_cny = sum(discount_amt_cny)
    - order_is_campaign = max(is_campaign) (an order counts as promotional if any line is)
    - pay_time = max(pay_time), or the order-level payment time if an order-header table exists
    - channel = the order's primary channel (define an extraction rule if lines disagree)
- User-level enrichment
  - first_pay_time = the user's first-order payment time (earliest pay_time across order headers)
  - register_date kept for new-customer identification and compliance analysis (first-order time is the more robust new-customer signal; registration time serves as a secondary check)

3) Refund integration (order level)
- Aggregate refunds to the order: sum multiple refunds per order_id; drop exact-duplicate refund rows (same order_id + refund_amt + refund_time, keep one).
  - refund_total_amt_cny = sum(refund_amt * fx_rate_to_cny_by_refund_time)
  - last_refund_time = max(refund_time)
- Derive the order net amount and flags:
  - net_order_amt_cny = order_pay_amt_cny - refund_total_amt_cny
  - is_full_refund = (refund_total_amt_cny >= order_pay_amt_cny - epsilon), with a small epsilon (e.g., 0.01 CNY) to absorb rounding error
  - has_refund = (refund_total_amt_cny > 0)
- Refund attribution (for the unified dashboard):
  - AOV, order count, sales volume: attribute by payment date; exclude fully refunded orders; keep partially refunded orders and their quantities, using net amounts.
  - For refund-rate or refund-amount trends by refund date, build a separate subject-area metric keyed on refund_time; it does not affect the three core metrics above.
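The three detail-layer steps above (line-level dedup, order-header aggregation, refund netting) can be sketched in plain Python. The list-of-dicts shapes, the helper names (`dedupe_lines`, `to_order_headers`, `net_refunds`), and the use of already-converted CNY amounts are illustrative assumptions, not a fixed implementation:

```python
from collections import defaultdict

EPS = 0.01  # rounding tolerance for the full-refund flag (CNY)

def dedupe_lines(lines):
    """Keep the first occurrence of each exact-duplicate order line."""
    seen, out = set(), []
    for r in lines:  # assumes rows arrive in load order
        key = (r["order_id"], r["sku_id"], r["user_id"], r["pay_time"],
               r["qty"], r["pay_amt_cny"], r["discount_amt_cny"],
               r["is_campaign"], r["channel"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def to_order_headers(lines):
    """Collapse multi-SKU lines into one header per order_id."""
    agg = {}
    for r in lines:
        h = agg.setdefault(r["order_id"], {
            "order_id": r["order_id"], "user_id": r["user_id"],
            "order_qty": 0, "order_pay_amt_cny": 0.0,
            "order_discount_amt_cny": 0.0, "order_is_campaign": False,
            "pay_time": r["pay_time"], "channel": r["channel"]})
        h["order_qty"] += r["qty"]
        h["order_pay_amt_cny"] += r["pay_amt_cny"]
        h["order_discount_amt_cny"] += r["discount_amt_cny"]
        h["order_is_campaign"] |= bool(r["is_campaign"])
        h["pay_time"] = max(h["pay_time"], r["pay_time"])
    return list(agg.values())

def net_refunds(headers, refunds):
    """Left-join refund totals onto headers; derive net amount and flags."""
    total, seen = defaultdict(float), set()
    for r in refunds:
        key = (r["order_id"], r["refund_amt_cny"], r["refund_time"])
        if key in seen:          # drop exact-duplicate refund rows
            continue
        seen.add(key)
        total[r["order_id"]] += r["refund_amt_cny"]
    for h in headers:
        t = total.get(h["order_id"], 0.0)
        h["net_order_amt_cny"] = h["order_pay_amt_cny"] - t
        h["is_full_refund"] = t >= h["order_pay_amt_cny"] - EPS
        h["has_refund"] = t > 0
    return headers
```

In a warehouse the same logic would be a `row_number()` dedup, a `group by order_id`, and a left join to the refund rollup; the sketch just makes the row-level rules explicit.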
4) Core metric definitions (unified)
- Order validity (valid_order):
  - Condition: the order is paid (order_pay_amt_cny > 0) and not fully refunded (is_full_refund = false)
- Sales volume (units)
  - Definition: sum(order_qty) over valid_order
  - Note: without refund quantities, units cannot be adjusted for partial refunds; if the business requires it, collect refund quantities or SKU-level refund detail. The current definition is "units excluding fully refunded orders"; partial refunds do not adjust units.
- Order count
  - Definition: count(distinct order_id) over valid_order
- Revenue (recommend publishing all three)
  - GMV (pre-discount): sum(order_pay_amt_cny + order_discount_amt_cny) over paid orders (fully refunded orders not excluded; for marketing/pricing analysis only)
  - Paid revenue (post-discount): sum(order_pay_amt_cny) over paid orders
  - Net sales (used for AOV): sum(net_order_amt_cny) over valid_order
- Average order value (AOV)
  - Definition (unified): AOV = net sales / order count = sum(net_order_amt_cny over valid_order) / count(distinct order_id over valid_order)
  - Note: fully refunded orders are excluded; partial refunds enter at net amount, keeping numerator and denominator consistent.
- Repurchase rate (two common definitions; label the one in use on the dashboard)
  - Period-based:
    - Scope: choose a reporting period P (e.g., calendar week/month, attributed by pay_time)
    - Denominator: users with at least one valid order in P (U1)
    - Numerator: users with at least two valid orders in P (U2)
    - Repurchase rate = U2 / U1
    - Pros: simple and intuitive; cons: sensitive to the period chosen
  - X-day post-first-purchase (cohort-based; recommended as the primary tracker):
    - Denominator: users whose first order falls in period P (C1)
    - Numerator: those users who place another valid order within X days (e.g., 30/60/90) of their first purchase (C2)
    - Repurchase rate (X days) = C2 / C1
    - Implementation: compute the gap between first_pay_time and the next valid order's pay_time; support rolling backfill
  - Note: repurchase counting is based on valid_order (fully refunded orders excluded); partial refunds do not change whether a user counts as repurchasing.

5) Example transformation logic (pseudo-SQL, for reference)
- Order-line dedup: row_number over the dedup keys; keep rn = 1.
- Order-header aggregation: group by order_id; sum qty and pay/discount amounts; take the unified pay_time.
- Refund rollup: group by order_id; sum refund_amt into refund_total_amt_cny; take last_refund_time.
- Order-refund join: left join order headers to the refund rollup; derive net_order_amt_cny, is_full_refund, has_refund.
- Valid-order set: where order_pay_amt_cny > 0 and is_full_refund = false
- Metrics:
  - Sales volume: sum(order_qty)
  - Order count: count(distinct order_id)
  - Net sales: sum(net_order_amt_cny)
  - AOV: sum(net_order_amt_cny) / count(distinct order_id)
  - Repurchase (period-based): users with count(distinct order_id) >= 2
  - Repurchase (X days): per user, next_order_time - first_pay_time <= X

6) Dimension rules and grouping
- Channel: grouping only (core definitions unchanged). If one order spans channels, define a primary-channel rule (e.g., payment channel or an attribution rule).
- Promotion (is_campaign): grouping and attribution analysis only; does not affect AOV/repurchase/volume calculations.
- Province: group at the user level; handle user relocation explicitly (province at order time vs. registration province; pick one rule).
- New vs. returning customers (must match across the dashboard)
  - New (recommended): users whose first_pay_time falls in the reporting period
  - Returning: users whose first purchase predates the reporting period
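The core metrics and the period-based repurchase rate defined above can be illustrated with a stdlib-only sketch. It assumes order headers shaped as in the refund-integration section; the function names are hypothetical:

```python
def core_metrics(headers):
    """Sales volume, order count, net sales, and AOV over valid orders."""
    valid = [h for h in headers
             if h["order_pay_amt_cny"] > 0 and not h["is_full_refund"]]
    volume = sum(h["order_qty"] for h in valid)
    orders = len({h["order_id"] for h in valid})
    net_sales = sum(h["net_order_amt_cny"] for h in valid)
    aov = net_sales / orders if orders else 0.0
    return {"volume": volume, "orders": orders,
            "net_sales": net_sales, "aov": aov}

def period_repurchase_rate(headers):
    """Period-based rate: U2 / U1 over valid orders in the period.

    U1 = users with >= 1 valid order, U2 = users with >= 2; callers are
    expected to pre-filter headers to the reporting period P.
    """
    per_user = {}
    for h in headers:
        if h["order_pay_amt_cny"] > 0 and not h["is_full_refund"]:
            per_user.setdefault(h["user_id"], set()).add(h["order_id"])
    u1 = sum(1 for orders in per_user.values() if len(orders) >= 1)
    u2 = sum(1 for orders in per_user.values() if len(orders) >= 2)
    return u2 / u1 if u1 else 0.0
```

Note that both functions filter on the same valid_order predicate, which is what keeps the AOV numerator and denominator (and the repurchase user base) consistent.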
7) Data quality and definition-risk controls
- Currency and FX: keep the original currency and the FX-rate version in the detail layer; convert to CNY at the payment-date rate to avoid cross-period skew from refund-date conversion. Maintain a separate accounting-rate version if finance reconciliation is required.
- Late refunds: payment-date attribution of net amounts forces historical restatement; build a daily incremental backfill job and record revision flags (versioning).
- Duplicate-order anomalies: if the same order_id appears with different pay_time values (re-issued or amended orders), use the latest valid payment time and flag the revision.
- Missing data: if currency or refund quantities are unavailable, state the limitation clearly on the dashboard (e.g., "partial refunds affect amounts only, not units").

Conclusions and key points
- Three steps — order-line dedup, order-header aggregation, refund integration — produce the unified base of "valid orders (fully refunded orders excluded)."
- AOV, order count, and sales volume all run on valid orders; amounts are net (post-discount, refunds deducted); units are not adjusted for partial refunds.
- Prefer the cohort-based repurchase definition; keep the period-based definition as a labeled alternative.
- All time attribution uses pay_time in UTC+0; amounts are converted to the base currency, with the original currency and rate kept in the detail layer.
- Channel, promotion, and geography are grouping dimensions only and do not change the core calculations.

If helpful, I can provide implementation templates and a field dictionary for your specific platform (SQL/ETL/model layer) to speed up delivery.
Below is a concise, implementation-ready description of the data transformations required to make your event log usable for A/B experiments, funnels, and retention analyses. It focuses on correctness under late-arriving and out-of-order events, UTC+8 time boundaries, and identity stitching between anonymous_id and uid.

1) Target outputs (curated data model)
- canonical_events: one row per cleaned event, ordered by event time, with standardized fields.
- user_identity_map: mapping of anonymous_id → uid (many-to-one) with first_seen_at and last_seen_at.
- ab_assignments: first exposure row per user_key and experiment, with frozen variant.
- sessions: sessionized user activity (optional but recommended for funnel and retention diagnostics).
- fact_orders and fact_order_items: order-level and line-item facts derived from purchase events.
- daily_user_activity: one row per user per local day for retention and DAU/WAU/MAU (based on a defined activity set).

2) Core challenges to address
- Late and out-of-order events: use event-time semantics with watermarks; reprocess recent partitions.
- Time zone: the business day is UTC+8; conversion must be systematic and consistent.
- Identity stitching: merge anonymous_id and uid into a durable user_key without contaminating A/B assignment.
- Schema normalization: parse props (JSON), standardize field names/types, and deduplicate.

3) Transformations pipeline

A. Parse and normalize the raw schema
- Input columns: event_name, user_id, ts, ab_group, device, props (JSON).
- Parse props (JSON) and extract canonical fields (e.g., product_id, cart_id, order_id, currency, price, quantity, anonymous_id, uid, experiment_id if present).
- Type-cast all extracted fields (strings, numerics, ISO currency, etc.).
- Standardize device (e.g., device_type, os, app/web).

B. Timestamp handling and local day boundaries (UTC+8)
- Assume ts is recorded in UTC+8. Create:
  - event_time_local = parse(ts) in UTC+8.
  - event_time_utc = convert(event_time_local to UTC).
  - event_date_local = date(event_time_local).
- Add ingestion_time_utc (load time) if available.
- Use event_time_local for funnels, retention, and A/B day boundaries; use event_time_utc for cross-system joins.

C. Identity stitching (anonymous_id + uid)
- Extraction:
  - From props, extract anonymous_id and uid (do not rely solely on user_id).
- Mapping table (user_identity_map):
  - When an event carries both anonymous_id and uid, create or update a link anonymous_id → uid with first_seen_at = min(link times), last_seen_at = max.
  - Enforce many-to-one: if an anonymous_id later links to a different uid, keep the earliest uid and flag conflicts for QA.
- Durable user_key:
  - user_key = coalesce(uid, map(anonymous_id), anonymous_id).
- Backfill: once a link is established, backfill user_key for past events of that anonymous_id within a configurable lookback window (e.g., 30 days) to unify history.
- A/B caveat: for A/B analysis, freeze assignment at first exposure (see E). Do not let post-exposure identity merges change the assigned variant; handle collisions explicitly.

D. Event standardization and deduplication
- Event ID:
  - Use the provided event_id if present in props; otherwise create a deterministic id_hash = hash(source_id, user identifiers, event_name, critical keys like order_id/product_id, event_time_utc).
- Dedup rule:
  - Keep the latest record by ingestion_time_utc for identical event_id.
  - If no event_id exists, dedup within a time window (e.g., 24h) by (user_key, event_name, product_id/order_id, event_time_utc at second granularity) with last-write-wins.
- Ordering and late data:
  - Use event_time_local as the primary sort key; ingestion_time_utc as a tie-breaker.
  - Apply an event-time watermark (e.g., 72 hours) for streaming; in batch, reprocess the last N local days (e.g., 7 days) to capture late arrivals.
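Section D's deterministic-id and last-write-wins dedup can be sketched as follows. The choice of hash keys and the dict-based shapes are assumptions following the text, not a fixed schema:

```python
import hashlib

def event_id(e):
    """Deterministic id when the source provides none: hash stable keys.

    The key set (user_key, event_name, order_id, product_id,
    event_time_utc) follows section D; missing keys hash as "".
    """
    if e.get("event_id"):
        return e["event_id"]
    raw = "|".join(str(e.get(k, "")) for k in
                   ("user_key", "event_name", "order_id",
                    "product_id", "event_time_utc"))
    return hashlib.sha256(raw.encode()).hexdigest()

def dedupe(events):
    """Last-write-wins by ingestion_time_utc for identical event ids."""
    best = {}
    for e in events:
        eid = event_id(e)
        if (eid not in best
                or e["ingestion_time_utc"] > best[eid]["ingestion_time_utc"]):
            best[eid] = e
    return list(best.values())
```

Because the id is a pure function of stable event keys, rerunning the job over a reprocessed partition is idempotent: late re-deliveries collapse onto the same id and the freshest ingestion wins.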
E. A/B experiment readiness
- Exposure definition:
  - Exposure occurs on the first event where ab_group (and optionally experiment_id) is observed, or where props indicates assignment/exposure.
- ab_assignments table:
  - Keys: user_key, experiment_id (if multiple experiments), variant (ab_group normalized), exposure_time_local, exposure_date_local.
  - Freeze the variant at first exposure_time_local. If a later event shows a different variant for the same user_key/experiment, flag it as contamination and exclude or handle per analysis policy.
  - If an identity merge links two identifiers with different variants, choose the earliest exposure and flag the conflict; define a rule (exclude from primary, include in sensitivity).
- Attach variant to events:
  - Left-join canonical_events to ab_assignments on user_key and event_time_local >= exposure_time_local to tag post-exposure events for effect measurement.
  - For pre-exposure covariates, use events strictly before exposure_time_local.

F. Funnel readiness
- Flatten and standardize core events:
  - view_product: product_id, category, price, currency, referrer.
  - add_to_cart: product_id, quantity, price, currency, cart_id.
  - checkout: cart_id, step (if available), order_id (if available).
  - purchase: order_id (required), revenue_gross, discount, tax, shipping, currency; items[] if available.
- Derivations/joins:
  - If purchase is order-level with items[], explode to fact_order_items for product-level funnels.
  - Link checkout and add_to_cart via cart_id; link purchase via order_id. If IDs are missing, use session- and time-based heuristics (nearest prior cart within the same session).
- Sequencing:
  - Order events per user_key by event_time_local to build funnel steps.
  - Optional product-consistent funnels: require the same product_id across steps; otherwise build cart/order-consistent funnels.
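Section E's first-exposure freeze and post-exposure tagging can be sketched as below. The `"default"` experiment fallback and the function names are assumptions for illustration:

```python
def first_exposures(events):
    """One frozen (variant, exposure time) per (user_key, experiment).

    Later events showing a different variant are recorded as
    contamination so the user can be excluded from the primary read.
    """
    assignments, contaminated = {}, set()
    for e in sorted(events, key=lambda e: e["event_time_local"]):
        if e.get("ab_group") is None:
            continue
        key = (e["user_key"], e.get("experiment_id", "default"))
        if key not in assignments:
            assignments[key] = {"variant": e["ab_group"],
                                "exposure_time_local": e["event_time_local"]}
        elif assignments[key]["variant"] != e["ab_group"]:
            contaminated.add(key)
    return assignments, contaminated

def tag_post_exposure(events, assignments):
    """Attach the frozen variant to events at/after first exposure."""
    tagged = []
    for e in events:
        key = (e["user_key"], e.get("experiment_id", "default"))
        a = assignments.get(key)
        if a and e["event_time_local"] >= a["exposure_time_local"]:
            tagged.append({**e, "ab_variant": a["variant"]})
    return tagged
```

The key property is that `tag_post_exposure` never reads `ab_group` from the event itself; it always applies the frozen assignment, so a late flip in raw `ab_group` shows up only in the contamination set.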
G. Retention readiness
- Activity set:
  - Define active_event_flag = 1 for events considered "active" (e.g., any of view_product, add_to_cart, checkout, purchase; adjust as needed).
- Cohort logic:
  - cohort_event default: first observed active event for the user_key (configurable).
  - cohort_date_local = date of cohort_event.event_time_local.
- Daily activity:
  - daily_user_activity aggregates by user_key and event_date_local; include device, A/B variant, and activity flags.
  - Retention is computed as returning activity on day N after cohort_date_local (D1, D7, etc.) using event_date_local.

4) Canonical event schema (recommended)
- Core fields:
  - event_id, event_name
  - event_time_local, event_time_utc, event_date_local
  - user_key, uid, anonymous_id
  - device_type, os, platform (from device and props)
  - ab_group_raw, experiment_id (if present), ab_variant (normalized)
  - product_id, category, price, currency, quantity
  - cart_id, order_id, revenue_gross, discount, tax, shipping
  - ingestion_time_utc, source (pipeline source)
- Notes:
  - Keep raw props as a JSON blob for auditability, in addition to the flattened fields.

5) Event definitions and conversion rules (implementation-ready)
- view_product
  - Required: product_id
  - Optional: price, currency, category, referrer
  - Dedup recommended: one view per user_key-product_id per session to avoid overcount inflation (configurable).
- add_to_cart
  - Required: product_id, cart_id (if available), quantity >= 1
  - Revenue fields optional; if missing, inherit the latest known price for that product in the session.
- checkout
  - Required: cart_id; if order_id is already known, capture it.
  - Treat the first checkout per cart_id as "checkout_start."
- purchase
  - Required: order_id, currency; revenue fields (revenue_gross, discount, tax, shipping) strongly recommended.
  - If items[] is present, explode into fact_order_items with product_id, quantity, price.
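Section G's cohort and day-N logic can be illustrated with a small stdlib sketch. The `(user_key, event_date_local)` tuple input is an assumed simplification of daily_user_activity, and "retained on day N" here means active on exactly that day:

```python
from datetime import date, timedelta

def retention(activity, days=(1, 7)):
    """D-N retention from (user_key, event_date_local) activity rows.

    Cohort date = each user's first active local day; a user is
    retained on day N if active exactly N days after the cohort date.
    """
    by_user = {}
    for user, day in activity:
        by_user.setdefault(user, set()).add(day)
    cohort = {u: min(ds) for u, ds in by_user.items()}
    out = {}
    for n in days:
        retained = sum(1 for u, c in cohort.items()
                       if c + timedelta(days=n) in by_user[u])
        out[f"D{n}"] = retained / len(cohort) if cohort else 0.0
    return out
```

In practice the same computation would group daily_user_activity by cohort_date_local so each daily cohort gets its own D1/D7 curve; the sketch pools all users for brevity.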
- Conversion rules (parameterized; defaults are placeholders):
  - Funnel conversion window: analyze with event_time_local ordering; default window e.g., 7 days from the first step (configurable).
  - Uniqueness: one conversion per user_key per funnel instance; define per cart_id/order_id for cart/order funnels, per product_id for product funnels.
  - Attribution: attribute a purchase to the most recent eligible funnel path for the same cart_id/order_id; if absent, use the latest session path.

6) Handling late and out-of-order data
- Watermarks and reprocessing:
  - Streaming: allow 72h of lateness (tune to the actual delay distribution).
  - Batch: a daily job reprocesses the last 7 local days; a weekly job reprocesses the last 30 days for identity backfills and order corrections.
- Idempotency:
  - Dedup by event_id; backfill identity safely (do not change frozen A/B variants).
- Monitoring:
  - Track the late ratio by delay bucket (0–1h, 1–24h, 1–7d), the out-of-order rate, and the dedup rate.

7) A/B analysis safeguards
- Freeze the variant at first exposure_time_local in ab_assignments.
- Exclude users with variant collisions from primary reads by default (publish a collision-rate metric).
- For cross-device merges, apply the earliest exposure after identity stitching; do not reassign retroactively.
- Anchor all experiment day boundaries to event_date_local (UTC+8).

8) Sessions (recommended)
- Build sessions per user_key using a 30-minute inactivity gap on event_time_local.
- Store session_id, session_start_local, session_end_local, device, and ab_variant at session start.
- Use session_id to constrain funnels and to impute missing cart_id/order_id when needed.

9) Data quality checks
- Required-field non-null rates per event_name (e.g., purchase must have order_id).
- Currency and price sanity (positive values, valid ISO code).
- Identity consistency: anonymous_id mapping cardinality; collision rate.
- Time sanity: no future timestamps; distribution of negative intervals (purchase before view) after ordering by event_time_local.
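Section 8's 30-minute-gap sessionization can be sketched as below, with integer minutes standing in for timestamps (an assumption for brevity; real pipelines would subtract actual datetimes):

```python
GAP_MINUTES = 30  # inactivity gap that closes a session

def sessionize(events, gap_minutes=GAP_MINUTES):
    """Assign session ids per user_key in event_time_local order.

    A new session starts when the gap to the user's previous event
    exceeds gap_minutes; event_time_local is in minutes here.
    """
    last_time, counters, out = {}, {}, []
    for e in sorted(events,
                    key=lambda e: (e["user_key"], e["event_time_local"])):
        u, t = e["user_key"], e["event_time_local"]
        if u not in last_time or t - last_time[u] > gap_minutes:
            counters[u] = counters.get(u, 0) + 1  # open a new session
        last_time[u] = t
        out.append({**e, "session_id": f"{u}-{counters[u]}"})
    return out
```

In SQL the equivalent is a `lag(event_time_local)` per user_key, a new-session flag where the gap exceeds the threshold, and a running sum of that flag as the session number.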
- A/B coverage: share of events/users with an A/B assignment; variant balance.

10) Example implementation notes (SQL-level guidance)
- Flatten props: SELECT json_extract_scalar(props, '$.product_id') AS product_id, ...
- Identity map: INSERT INTO user_identity_map when anonymous_id and uid coexist in the same event; coalesce in canonical_events via a left join to the map table.
- Canonical events: build via staging → dedup (by event_id/hash) → enrichment (device/identity/A-B) → publish to canonical_events partitioned by event_date_local.
- A/B assignments: SELECT first_value(ab_group) OVER (PARTITION BY user_key, experiment_id ORDER BY event_time_local) as the frozen variant at first exposure.
- Watermark: filter or tag events with event_time_local older than the watermark; reprocess recent partitions to integrate late data.

Key conclusions
- Use event-time processing with UTC+8 business-day boundaries to ensure correct funnels, retention, and A/B windows.
- Create a durable user_key via anonymous_id-to-uid stitching with backfill, but freeze A/B assignment at first exposure to avoid contamination.
- Standardize and deduplicate core events, enrich with order/cart links, and expose both order-level and item-level facts.
- Implement watermarks and scheduled reprocessing to capture late and out-of-order arrivals reliably.
- Maintain explicit ab_assignments and daily_user_activity tables to streamline experiment reads, funnel building, and retention views.
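As an appendix to the user_identity_map note in section 10, the many-to-one stitching rule from section C can be sketched in plain Python. The dict-based map and function names are illustrative; a production pipeline would persist this as a table:

```python
def build_identity_map(events):
    """anonymous_id -> {uid, first_seen_at, last_seen_at}; earliest uid wins.

    A later link to a different uid is flagged as a conflict for QA
    rather than overwriting the map (many-to-one enforcement).
    """
    id_map, conflicts = {}, []
    for e in sorted(events, key=lambda e: e["event_time_utc"]):
        anon, uid = e.get("anonymous_id"), e.get("uid")
        if not (anon and uid):
            continue  # only events carrying both ids create links
        if anon not in id_map:
            id_map[anon] = {"uid": uid,
                            "first_seen_at": e["event_time_utc"],
                            "last_seen_at": e["event_time_utc"]}
        elif id_map[anon]["uid"] != uid:
            conflicts.append((anon, id_map[anon]["uid"], uid))
        else:
            id_map[anon]["last_seen_at"] = e["event_time_utc"]
    return id_map, conflicts

def user_key(event, id_map):
    """coalesce(uid, map(anonymous_id), anonymous_id)."""
    if event.get("uid"):
        return event["uid"]
    anon = event.get("anonymous_id")
    return id_map[anon]["uid"] if anon in id_map else anon
```

Backfilling then amounts to re-evaluating `user_key` over recent partitions once new links land, while leaving frozen A/B variants untouched.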
Build the data-transformation plan for sales, repurchase, and AOV dashboards; unify campaign and everyday metric definitions; locate data gaps and get BI live quickly.
Define conversion rules and event definitions for A/B experiments, funnels, and retention analysis; pin down tracking and reporting requirements to shorten the experiment-review cycle.
Turn messy raw tables into analyzable structures; output field mappings, cleaning standards, and aggregation logic; improve reporting speed and accuracy.
Generate a data-preparation checklist for ad attribution, ROI, and LTV calculation; align definitions across channels and cut back-and-forth with the data team.
Design transformation steps for revenue recognition, cost allocation, and gross-margin analysis; keep finance and business reporting consistent to sharpen operating insight.
Produce a standard data-transformation blueprint and deliverables checklist at project kickoff; pin down scope and acceptance criteria to reduce rework and delays.
Quickly turn scattered, complex business data into structured outputs ready for analysis and reporting. With the AI acting as a senior business-intelligence consultant, it delivers, for your specific scenario: clear field mappings and metric definitions, data-cleaning and normalization rules, calculation logic and grouping dimensions, time and aggregation strategies, and validation checklists with risk warnings. It shortens the path from data to conclusions, improves report accuracy, reduces rework, and enables consistent definitions and efficient collaboration across teams. Just provide a description of your raw data, the target analysis or report type, and the desired output language to receive a professional, concise, actionable transformation plan.
Copy the prompt generated from the template into your preferred chat app (e.g., ChatGPT, Claude) and use it directly in conversation, with no extra development. Suited to quick individual trials and lightweight use.
Turn the prompt template into an API: your program can modify template parameters freely and call it directly through the interface, making automation and batch processing easy. Suited to developer integration and embedding in business systems.
Configure the corresponding server address in your MCP client and let your AI application call the prompt template automatically. Suited to advanced users and team collaboration, letting prompts move seamlessly between AI tools.