🔥 Members only · Text-to-text · Data processing

Data Transformation Logic Guide

📅 Sep 22, 2025
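💡 Core value: Provides professional business-intelligence recommendations for data transformation, with clear logic and strong practical value.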

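🎯 Customizable parameters (3)

Raw data description
Describe the raw data to be transformed, e.g., the fields and formats of sales data.
Target analysis or report
Describe the type of target analysis or report, e.g., a quarterly sales trend analysis.
Output language
Specify the language of the generated content, e.g., Chinese.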

🎨 Sample output

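The following defines the data transformations and metric definitions required for a dashboard with unified definitions (units sold, repurchase rate, average order value). It covers time and currency standards, deduplication of repeated orders, refund integration, and definition alignment. The goal is a reusable data layer with clear business rules, so that figures line up across channel, promotion, and province dimensions.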

1. Overall transformation principles

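  • Time: attribute metrics by the order payment time pay_time (UTC+0). The dashboard aggregates by UTC+0 calendar day by default; if local-time display is needed, convert in the presentation layer without changing attribution.
  • Currency: convert all amounts to one base currency (CNY recommended). Keep the original currency and exchange rate in the detail layer and use the base currency in the aggregate layer. If the currency field is currently missing, add it at collection time or build an explicit mapping rule; do not infer currency from province or channel.
  • Deduplication: deduplicate at order-line level first, then aggregate to order headers, so that multi-SKU orders are not double-counted in order counts or AOV.
  • Refunds: integrate refunds at order level, producing a net amount and a full-refund flag. Net amounts are always restated against the payment date (order-based attribution) for AOV and units; the refund date is used only for refund-rate monitoring (if needed).
  • Promotions and discounts: discount_amt is the discount the order received; is_campaign is used for grouped analysis and does not change metric definitions.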

2. Detail layer (cleansing and standardization)

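  • Type and field standardization
    • Cast data types: store order_id and user_id in a consistent string or integer form; convert pay_time/refund_time to UTC+0 timestamps; qty as integer; amounts as numeric.
    • Generate standard fields:
      • order_date_utc = date(pay_time at UTC+0)
      • currency_code (if missing, re-collect it or fill it by an explicit rule)
      • fx_rate_to_cny (daily rate on the pay_time date, or a fixed monthly rate)
      • pay_amt_cny = pay_amt * fx_rate_to_cny
      • discount_amt_cny = discount_amt * fx_rate_to_cny
      • channel_normalized (standard channel enumeration)
      • province_normalized (standard province enumeration)
  • Order-line deduplication (removes historical duplicate rows)
    • Duplicate rule: records with identical order_id + sku_id + user_id + pay_time + qty + pay_amt + discount_amt + is_campaign + channel are duplicates.
    • Retention: keep the first row by load time or row number (row_number) and drop the rest.
  • Order header aggregation (merging multi-SKU orders)
    • Aggregate per order_id:
      • order_qty = sum(qty)
      • order_pay_amt_cny = sum(pay_amt_cny)
      • order_discount_amt_cny = sum(discount_amt_cny)
      • order_is_campaign = max(is_campaign) (the order counts as promotional if any line is promotional)
      • pay_time = max(pay_time), or the order-level payment time (if an order header table exists, it takes precedence)
      • channel = the order's primary channel (if lines disagree, define a rule for extracting the primary channel)
  • User-level enrichment
    • first_pay_time = the user's first payment time (earliest pay_time across order headers)
    • register_date is kept for new-customer identification and compliance analysis (identifying new customers by first order time is more robust; registration date can serve as a secondary check)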
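The order-line deduplication and order-header aggregation rules above can be sketched in Python as follows. This is a minimal illustration over plain dicts, not a production ETL job: it assumes rows are already sorted by load time (so "first" means row_number = 1) and that each line carries the detail-layer fx_rate_to_cny. Primary-channel selection is left out because that rule still has to be defined.

```python
# Exact-match columns for order-line deduplication (from the rule above).
DEDUP_KEY = ("order_id", "sku_id", "user_id", "pay_time", "qty",
             "pay_amt", "discount_amt", "is_campaign", "channel")


def dedup_lines(rows):
    """Keep the first copy of each fully identical line (row_number = 1).

    Assumes rows arrive sorted by load time or row number.
    """
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[c] for c in DEDUP_KEY)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def build_order_headers(lines):
    """Aggregate deduplicated lines into one header per order_id.

    Each line carries fx_rate_to_cny from the detail layer. Channel selection
    is omitted because it needs a primary-channel rule.
    """
    headers = {}
    for line in lines:
        h = headers.setdefault(line["order_id"], {
            "order_id": line["order_id"],
            "user_id": line["user_id"],
            "order_qty": 0,
            "order_pay_amt_cny": 0.0,
            "order_discount_amt_cny": 0.0,
            "order_is_campaign": False,
            "pay_time": line["pay_time"],
        })
        fx = line["fx_rate_to_cny"]
        h["order_qty"] += line["qty"]
        h["order_pay_amt_cny"] += line["pay_amt"] * fx
        h["order_discount_amt_cny"] += line["discount_amt"] * fx
        # max(is_campaign): promotional if any line is promotional.
        h["order_is_campaign"] = h["order_is_campaign"] or bool(line["is_campaign"])
        # max(pay_time): the latest payment time wins when lines disagree.
        h["pay_time"] = max(h["pay_time"], line["pay_time"])
    return headers
```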

3. Refund integration (order level)

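  • Aggregate refunds to orders: sum multiple refunds per order_id in the refund table and remove duplicate refund rows (fully identical records with the same order_id + refund_amt + refund_time are kept only once).
    • refund_total_amt_cny = sum(refund_amt * fx_rate_to_cny), converted at the order's payment-date rate so that refunds stay consistent with order amounts (see the FX guidance in section 7, which warns against refund-date conversion)
    • last_refund_time = max(refund_time)
  • Build order net amounts and flags:
    • net_order_amt_cny = order_pay_amt_cny - refund_total_amt_cny
    • is_full_refund = (refund_total_amt_cny >= order_pay_amt_cny - ε), where a small tolerance ε (e.g., 0.01 CNY) absorbs rounding error
    • has_refund = (refund_total_amt_cny > 0)
  • Refund attribution (for the unified dashboard):
    • AOV, order count, units: attribute by payment date; exclude fully refunded orders; partially refunded orders keep their order and units, with amounts taken at net.
    • If refund-rate or refund-amount trends by refund date are needed, build a separate subject metric keyed on refund_time; it does not affect the definitions of the three core metrics above.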
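A minimal Python sketch of the refund integration step follows. It assumes refunds are converted at the order's payment-date rate (as the FX guidance in the data-quality section recommends), supplied through a hypothetical pay_fx_by_order lookup. The default refund total of 0.0 in net_order plays the role of coalescing a missing refund summary after the left join.

```python
# Tolerance for rounding error when flagging a full refund (0.01 CNY, as in the text).
FULL_REFUND_EPS = 0.01


def aggregate_refunds(refund_rows, pay_fx_by_order):
    """Sum refunds per order_id after dropping exact duplicate refund rows.

    pay_fx_by_order maps order_id -> the order's pay-date CNY rate
    (assumption: refunds are converted at the payment-date rate).
    """
    seen, totals = set(), {}
    for r in refund_rows:
        key = (r["order_id"], r["refund_amt"], r["refund_time"])
        if key in seen:
            continue  # same order_id + refund_amt + refund_time: keep one copy
        seen.add(key)
        t = totals.setdefault(r["order_id"], {"refund_total_amt_cny": 0.0,
                                              "last_refund_time": r["refund_time"]})
        t["refund_total_amt_cny"] += r["refund_amt"] * pay_fx_by_order[r["order_id"]]
        t["last_refund_time"] = max(t["last_refund_time"], r["refund_time"])
    return totals


def net_order(order_pay_amt_cny, refund_total_amt_cny=0.0):
    """Net amount and refund flags for one order.

    The 0.0 default mirrors coalescing a missing refund summary to zero.
    """
    return {
        "net_order_amt_cny": order_pay_amt_cny - refund_total_amt_cny,
        "is_full_refund": refund_total_amt_cny >= order_pay_amt_cny - FULL_REFUND_EPS,
        "has_refund": refund_total_amt_cny > 0,
    }
```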

4. Unified core metric definitions

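  • Order validity (valid_order):
    • Condition: the order has a payment (order_pay_amt_cny > 0) and is not fully refunded (is_full_refund = false)
  • Units sold
    • Definition: sum(order_qty) over valid_order
    • Note: refunded quantities are not available, so units cannot be adjusted for partial refunds; if the business strongly needs this, collect refunded quantities or SKU-level refund details. The current definition is "units excluding fully refunded orders"; partial refunds do not adjust units.
  • Order count
    • Definition: count(distinct order_id) over valid_order
  • Sales amount (output all three together)
    • GMV (pre-discount): sum(order_pay_amt_cny + order_discount_amt_cny) over paid orders (fully refunded orders are not excluded; for marketing and pricing analysis only)
    • Paid Revenue (post-discount): sum(order_pay_amt_cny) over paid orders
    • Net Sales (used for AOV): sum(net_order_amt_cny) over valid_order
  • Average order value (AOV)
    • Definition (unified): AOV = Net Sales / order count = sum(net_order_amt_cny over valid_order) / count(distinct order_id over valid_order)
    • Note: fully refunded orders are excluded and partially refunded orders contribute their net amount, so the numerator and denominator cover the same order set.
  • Repurchase rate (two common definitions; label which one the dashboard uses)
    • Period-based repurchase rate:
      • Scope: choose a reporting period P (e.g., calendar week or month, attributed by pay_time)
      • Denominator: U1 = users with at least one valid order in P
      • Numerator: U2 = users with at least two valid orders in P
      • Repurchase rate = U2 / U1
      • Pros: simple and intuitive; cons: sensitive to the choice of period
    • X-day repurchase rate after first purchase (cohort-based, recommended as the primary tracked metric):
      • Denominator: C1 = users whose first order falls in period P
      • Numerator: C2 = those users who place another valid order within X days of the first purchase (e.g., 30/60/90 days)
      • Repurchase rate (X days) = C2 / C1
      • Implementation: compute the gap between the user's first_pay_time and the pay_time of the next valid order; support rolling backfill
    • Note: user counts for repurchase must be based on valid_order (excluding full refunds); partial refunds do not affect whether a user counts as a repurchaser.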
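The valid-order filter and the unified metrics above fit in a few lines. The sketch below is illustrative only: it assumes one input row per order, already joined with its refund summary, for a single reporting period P, and it computes the period-based repurchase rate. The function name core_metrics is hypothetical.

```python
from collections import Counter


def core_metrics(orders):
    """Units, order count, net sales, AOV and period repurchase rate for one period.

    orders: one dict per order with order_id, user_id, order_qty,
    order_pay_amt_cny, net_order_amt_cny and is_full_refund.
    """
    # valid_order: paid and not fully refunded.
    valid = [o for o in orders if o["order_pay_amt_cny"] > 0 and not o["is_full_refund"]]
    order_count = len({o["order_id"] for o in valid})
    net_sales = sum(o["net_order_amt_cny"] for o in valid)
    orders_per_user = Counter(o["user_id"] for o in valid)
    u1 = len(orders_per_user)                                # >= 1 valid order
    u2 = sum(1 for n in orders_per_user.values() if n >= 2)  # >= 2 valid orders
    return {
        "units": sum(o["order_qty"] for o in valid),
        "order_count": order_count,
        "net_sales": net_sales,
        "aov": net_sales / order_count if order_count else None,
        "period_repurchase_rate": u2 / u1 if u1 else None,
    }
```

A fully refunded order drops out of every metric at once here, which keeps AOV's numerator and denominator on the same order set.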

5. Example transformation logic (pseudo-SQL, for implementation reference)

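  • Order-line deduplication
    • Use row_number partitioned by the dimensions above and keep rn = 1.
  • Order header aggregation
    • group by order_id, aggregating qty and pay/discount amounts and taking a single pay_time.
  • Refund summary
    • group by order_id, summing refund_amt to produce refund_total_amt_cny and last_refund_time.
  • Order and refund integration
    • left join order headers to the refund summary (coalesce a missing refund total to 0) to produce net_order_amt_cny, is_full_refund, and has_refund.
  • Valid order set
    • where order_pay_amt_cny > 0 and is_full_refund = false
  • Metric calculation
    • Units: sum(order_qty)
    • Order count: count(distinct order_id)
    • Net sales: sum(net_order_amt_cny)
    • AOV: sum(net_order_amt_cny) / count(distinct order_id)
    • Repurchase rate (period): at user grain, count(distinct order_id) >= 2
    • Repurchase rate (X days): at user grain, next_order_time - first_pay_time <= X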
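For the cohort-based X-day repurchase rate, here is a Python sketch under stated assumptions. The first purchase is taken as the user's first valid order (the text derives first_pay_time from all order headers, so adjust if the difference matters), and a user enters the denominator only once X days have elapsed before a hypothetical as_of date, which is one way to handle rolling backfill.

```python
from datetime import datetime, timedelta


def cohort_repurchase_rate(valid_orders, period_start: datetime,
                           period_end: datetime, x_days: int, as_of: datetime):
    """C2 / C1 for users whose first valid order falls in [period_start, period_end).

    valid_orders: iterable of (user_id, pay_time) pairs, valid orders only.
    Assumption: a user joins C1 only once x_days have elapsed before as_of,
    so immature cohort members do not drag the rate down.
    """
    by_user = {}
    for user, pay_time in valid_orders:
        by_user.setdefault(user, []).append(pay_time)
    window = timedelta(days=x_days)
    c1 = c2 = 0
    for times in by_user.values():
        times.sort()
        first = times[0]
        if not (period_start <= first < period_end) or first + window > as_of:
            continue
        c1 += 1
        # Repurchase: the next valid order lands within X days of the first.
        if len(times) > 1 and times[1] - first <= window:
            c2 += 1
    return c2 / c1 if c1 else None
```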

6. Dimension definitions and grouping recommendations

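  • Channel (channel): for grouping only (overall definitions unchanged). If one order spans multiple channels, define a primary-channel rule (e.g., by payment channel or an attribution rule).
  • Promotion (is_campaign): for grouping and attribution analysis; it does not affect how AOV, repurchase, or units are calculated.
  • Province (province): group at user level; watch for users who move (use either the user's province at order time or the registration province, and apply one rule consistently).
  • New vs. returning customers (must be consistent across the dashboard)
    • New customer (recommended): a user whose first_pay_time falls within the reporting period
    • Returning customer: a user whose first purchase predates the reporting period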

7. Data quality and definition risk control

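  • Currency and FX: the detail layer must keep the original currency and the FX rate version. Convert to CNY at the payment-date rate to avoid the cross-period drift that refund-date conversion causes. If financial reconciliation is required, maintain a separate accounting-rate version.
  • Late refunds: attributing net amounts to the payment date means history gets restated; set up a daily incremental backfill and record revision flags (versioning).
  • Duplicate order definition: if the same order_id appears with different pay_time values (re-issued or amended orders), use the latest valid payment time and flag the revision.
  • Missing data: if currency or refunded quantities cannot be obtained, state the definition limits clearly on the dashboard (e.g., "partial refunds affect amounts only, not units").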

Conclusion and key points

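  • Three steps (order-line deduplication, order header aggregation, refund integration) produce a unified base of "valid orders (fully refunded orders excluded)".
  • AOV, order count, and units are all based on valid orders; amounts use net values (after discounts and refunds), and units are not adjusted for partial refunds.
  • Prefer the cohort definition for repurchase rate; keep the period definition as a backup and label it on the dashboard.
  • All times are attributed by pay_time in UTC+0; currencies are converted to the base currency, with the original currency and rate kept in the detail layer.
  • Channel, promotion, and region are grouping dimensions only and do not change the core calculation rules.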

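If helpful, I can provide implementation templates and a field dictionary for a specific data platform (SQL/ETL/model layer) to speed up delivery.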

Below is a concise, implementation-ready description of the data transformations required to make your event log usable for A/B experiments, funnels, and retention analyses. It focuses on correctness under late-arriving and out-of-order events, UTC+8 time boundaries, and identity stitching between anonymous_id and uid.

  1. Target outputs (curated data model)
  • canonical_events: One row per cleaned event, ordered by event time, with standardized fields.
  • user_identity_map: Mapping of anonymous_id → uid (many-to-one) with first_seen_at and last_seen_at.
  • ab_assignments: First exposure row per user_key and experiment, with frozen variant.
  • sessions: Sessionized user activity (optional but recommended for funnel and retention diagnostics).
  • fact_orders and fact_order_items: Order-level and line-item facts derived from purchase events.
  • daily_user_activity: One row per user per local day for retention and DAU/WAU/MAU (based on a defined activity set).
  2. Core challenges to address
  • Late and out-of-order events: Use event time semantics with watermarks; reprocess recent partitions.
  • Time zone: Business day is UTC+8; conversion must be systematic and consistent.
  • Identity stitching: Merge anonymous_id and uid into a durable user_key without contaminating A/B assignment.
  • Schema normalization: Parse props(json), standardize field names/types, and deduplicate.
  3. Transformations pipeline

A. Parse and normalize the raw schema

  • Input columns: event_name, user_id, ts, ab_group, device, props (JSON).
  • Parse props(JSON) and extract canonical fields (e.g., product_id, cart_id, order_id, currency, price, quantity, anonymous_id, uid, experiment_id if present).
  • Type cast all extracted fields (strings, numerics, ISO currency, etc.).
  • Standardize device (e.g., device_type, os, app/web).

B. Timestamp handling and local day boundaries (UTC+8)

  • Assume ts is recorded in UTC+8. Create:
    • event_time_local = parse(ts) in UTC+8.
    • event_time_utc = convert(event_time_local to UTC).
    • event_date_local = date(event_time_local).
  • Add ingestion_time_utc (load time) if available.
  • Use event_time_local for funnels, retention, and A/B day boundaries; use event_time_utc for cross-system joins.
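As a sketch of the timestamp derivation, assuming ts is an ISO-8601 string with no offset that was recorded in UTC+8, Python's standard datetime module is enough. A fixed offset is used because the document defines the business day as UTC+8 rather than as a named time zone.

```python
from datetime import datetime, timedelta, timezone

# Business time zone as a fixed UTC+8 offset (no named zone or DST rules).
UTC8 = timezone(timedelta(hours=8))


def normalize_ts(ts: str) -> dict:
    """Derive local/UTC times and the local business date from a naive ts.

    Assumption: ts is an ISO-8601 string without offset, recorded in UTC+8.
    """
    local = datetime.fromisoformat(ts).replace(tzinfo=UTC8)
    return {
        "event_time_local": local,
        "event_time_utc": local.astimezone(timezone.utc),
        "event_date_local": local.date(),
    }
```

An event logged at 2025-03-01T01:30:00 local time belongs to 2025-03-01 for funnels and retention, while its UTC timestamp falls on the previous day; this is why event_date_local must be derived from the local time.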

C. Identity stitching (anonymous_id + uid)

  • Extraction:
    • From props, extract anonymous_id and uid (do not rely solely on user_id).
  • Mapping table (user_identity_map):
    • When an event carries both anonymous_id and uid, create or update a link anonymous_id → uid with first_seen_at = min(link times), last_seen_at = max.
    • Enforce many-to-one: if an anonymous_id later links to a different uid, keep the earliest uid and flag conflicts for QA.
  • Durable user_key:
    • user_key = coalesce(uid, map(anonymous_id), anonymous_id).
    • Backfill: Once a link is established, backfill user_key for past events of that anonymous_id within a configurable lookback window (e.g., 30 days) to unify history.
  • A/B caveat: For A/B analysis, freeze assignment at first exposure (see E). Do not let post-exposure identity merges change the assigned variant; handle collisions explicitly.
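The mapping and coalesce rules can be sketched as below. It is a simplified, in-memory illustration: events are processed in event-time order, the earliest uid wins, conflicting anonymous_ids are only flagged, and the configurable backfill lookback window is not modeled (every past event of a linked anonymous_id is remapped).

```python
def build_identity_map(events):
    """Build anonymous_id -> uid links, keeping the earliest uid per anonymous_id.

    events: dicts with event_time_utc and optional anonymous_id / uid.
    Returns (links, conflicts); conflicts holds anonymous_ids seen with a second uid.
    """
    links, conflicts = {}, set()
    for e in sorted(events, key=lambda ev: ev["event_time_utc"]):
        anon, uid, t = e.get("anonymous_id"), e.get("uid"), e["event_time_utc"]
        if not (anon and uid):
            continue  # only events carrying both identifiers create links
        link = links.get(anon)
        if link is None:
            links[anon] = {"uid": uid, "first_seen_at": t, "last_seen_at": t}
        elif link["uid"] != uid:
            conflicts.add(anon)  # many-to-one violated: keep earliest uid, flag for QA
        else:
            link["last_seen_at"] = t  # events are time-ordered, so t is the max so far
    return links, conflicts


def user_key(event, links):
    """user_key = coalesce(uid, map(anonymous_id), anonymous_id)."""
    anon = event.get("anonymous_id")
    mapped = links[anon]["uid"] if anon in links else None
    return event.get("uid") or mapped or anon
```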

D. Event standardization and deduplication

  • Event ID:
    • Use provided event_id if present in props; else create a deterministic id_hash = hash(source_id, user identifiers, event_name, critical keys like order_id/product_id, event_time_utc).
  • Dedup rule:
    • Keep the latest record by ingestion_time_utc for identical event_id.
    • If no event_id, dedup within a time window (e.g., 24h) by (user_key, event_name, product_id/order_id, event_time_utc truncated to the second) with last-write-wins.
  • Ordering and late data:
    • Use event_time_local as the primary sort key; ingestion_time_utc as a tie-breaker.
    • Apply an event-time watermark (e.g., 72 hours) for streaming; in batch, reprocess the last N local days (e.g., 7 days) to capture late arrivals.
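A minimal sketch of the event_id dedup rule, assuming events are dicts: an explicit event_id is used when present; otherwise a SHA-256 hash over an assumed field set stands in for id_hash. This hash fallback is exact-match and does not implement the 24h windowed rule, and the field names source_id and user_key in the hash are assumptions about what is available.

```python
import hashlib

# Assumed identifying fields for the fallback id_hash.
HASH_FIELDS = ("source_id", "user_key", "event_name", "order_id",
               "product_id", "event_time_utc")


def event_id(e):
    """Provided event_id, else a deterministic hash of identifying fields."""
    if e.get("event_id"):
        return e["event_id"]
    parts = [str(e.get(k, "")) for k in HASH_FIELDS]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def dedup_events(events):
    """Keep the latest-ingested copy per event id, then order by event time."""
    latest = {}
    for e in events:
        k = event_id(e)
        if k not in latest or e["ingestion_time_utc"] > latest[k]["ingestion_time_utc"]:
            latest[k] = e
    # Primary sort key: event time; tie-breaker: ingestion time.
    return sorted(latest.values(),
                  key=lambda ev: (ev["event_time_local"], ev["ingestion_time_utc"]))
```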

E. A/B experiment readiness

  • Exposure definition:
    • Exposure occurs on the first event where ab_group (and optionally experiment_id) is observed or where props indicates assignment/exposure.
  • ab_assignments table:
    • Keys: user_key, experiment_id (if multiple experiments), variant (ab_group normalized), exposure_time_local, exposure_date_local.
    • Freeze variant at first exposure_time_local. If a later event shows a different variant for the same user_key/experiment, flag as contamination and exclude or handle per analysis policy.
    • If identity merge links two identifiers with different variants, choose the earliest exposure and flag as conflict; define a rule (exclude from primary, include in sensitivity).
  • Attach variant to events:
    • Left join canonical_events to ab_assignments on user_key and event_time_local >= exposure_time_local to tag post-exposure events for effect measurement.
    • For pre-exposure covariates, use events strictly before exposure_time_local.
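Freezing the variant at first exposure and tagging post-exposure events could look like this Python sketch. It assumes an exposure is any event with a non-empty ab_group, and the strip/lowercase normalization is a placeholder rule. Contaminated keys are returned rather than dropped, so the analysis policy can decide whether to exclude them.

```python
def freeze_assignments(events):
    """First-exposure variant per (user_key, experiment_id) plus contaminated keys.

    Assumption: an event is an exposure when it carries a non-empty ab_group.
    """
    assignments, contaminated = {}, set()
    exposures = [e for e in events if e.get("ab_group")]
    for e in sorted(exposures, key=lambda ev: ev["event_time_local"]):
        key = (e["user_key"], e.get("experiment_id"))
        variant = e["ab_group"].strip().lower()  # hypothetical normalization rule
        if key not in assignments:
            assignments[key] = {"variant": variant,
                                "exposure_time_local": e["event_time_local"]}
        elif assignments[key]["variant"] != variant:
            contaminated.add(key)  # keep the frozen variant; flag for exclusion
    return assignments, contaminated


def variant_for(event, assignments, experiment_id=None):
    """Frozen variant for events at or after first exposure; None before it."""
    a = assignments.get((event["user_key"], experiment_id))
    if a is not None and event["event_time_local"] >= a["exposure_time_local"]:
        return a["variant"]
    return None
```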

F. Funnel readiness

  • Flatten and standardize core events:
    • view_product: product_id, category, price, currency, referrer.
    • add_to_cart: product_id, quantity, price, currency, cart_id.
    • checkout: cart_id, step (if available), order_id (if available).
    • purchase: order_id (required), revenue_gross, discount, tax, shipping, currency; items[] if available.
  • Derivations/joins:
    • If purchase is order-level with items[], explode to fact_order_items for product-level funnels.
    • Link checkout and add_to_cart via cart_id; link purchase via order_id. If IDs are missing, use session- and time-based heuristics (the nearest prior cart within the same session).
  • Sequencing:
    • Order events per user_key by event_time_local to build funnel steps.
    • Optional product-consistent funnels: require the same product_id across steps; otherwise do cart/order-consistent funnels.
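A simple ordered-funnel sketch for one user_key follows. It is deliberately narrow: one funnel instance per user, anchored at the first view_product, a 7-day window from the first step (the placeholder default from the conversion rules), and no cart_id, order_id, or product consistency checks. The step list and function name are illustrative.

```python
from datetime import timedelta

FUNNEL = ["view_product", "add_to_cart", "checkout", "purchase"]


def furthest_step(user_events, window=timedelta(days=7)):
    """Number of funnel steps reached in order within `window` of the first step.

    user_events: (event_time_local, event_name) pairs for one user_key.
    Returns 0 if the first step never occurs.
    """
    reached, start = 0, None
    for ts, name in sorted(user_events):
        if reached < len(FUNNEL) and name == FUNNEL[reached]:
            if start is None:
                start = ts  # the funnel instance starts at the first step
            elif ts - start > window:
                break  # conversion window closed
            reached += 1
    return reached
```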

G. Retention readiness

  • Activity set:
    • Define active_event_flag = 1 for events considered “active” (e.g., any of view_product, add_to_cart, checkout, purchase; adjust as needed).
  • Cohort logic:
    • cohort_event default: first observed active event for the user_key (configurable).
    • cohort_date_local = date of cohort_event.event_time_local.
  • Daily activity:
    • daily_user_activity aggregates by user_key, event_date_local; include device, ab variant, and activity flags.
    • Retention computed as returning activity on day N after cohort_date_local (D1, D7, etc.) using event_date_local.
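Day-N retention can be sketched from daily_user_activity reduced to sets of active local dates per user_key. This assumes the cohort event is the user's first active day and uses classic "active exactly on day N" retention; unbounded ("on or after day N") retention is a common alternative.

```python
from datetime import date, timedelta


def retention_rate(active_days, cohort_day: date, n: int):
    """Share of the cohort active exactly n local days after cohort_date_local.

    active_days: user_key -> non-empty set of event_date_local values.
    The cohort is every user whose first active day equals cohort_day.
    """
    cohort = [u for u, days in active_days.items() if min(days) == cohort_day]
    if not cohort:
        return None
    target = cohort_day + timedelta(days=n)
    return sum(1 for u in cohort if target in active_days[u]) / len(cohort)
```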
  4. Canonical event schema (recommended)
  • Core fields:
    • event_id, event_name
    • event_time_local, event_time_utc, event_date_local
    • user_key, uid, anonymous_id
    • device_type, os, platform (from device and props)
    • ab_group_raw, experiment_id (if present), ab_variant (normalized)
    • product_id, category, price, currency, quantity
    • cart_id, order_id, revenue_gross, discount, tax, shipping
    • ingestion_time_utc, source (pipeline source)
  • Notes:
    • Keep raw props as a JSON blob for auditability in addition to flattened fields.
  5. Event definitions and conversion rules (implementation-ready)
  • view_product
    • Required: product_id
    • Optional: price, currency, category, referrer
    • Dedup recommended: one view per user_key and product_id per session to avoid inflated view counts (configurable).
  • add_to_cart
    • Required: product_id, cart_id (if available), quantity >= 1
    • Revenue fields optional; if missing, inherit latest known price for that product in session.
  • checkout
    • Required: cart_id; if order_id already known, capture it.
    • Treat first checkout per cart_id as “checkout_start.”
  • purchase
    • Required: order_id, currency; revenue fields (revenue_gross, discount, tax, shipping) strongly recommended.
    • If items[] present, explode into fact_order_items with product_id, quantity, price.
  • Conversion rules (parameterized; defaults are placeholders):
    • Funnel conversion window: Analyze with event_time_local ordering; default window e.g., 7 days from first step (configurable).
    • Uniqueness: One conversion per user_key per funnel instance; define per cart_id/order_id for cart/order funnels; per product_id for product funnels.
    • Attribution: Attribute purchase to the most recent eligible funnel path for the same cart_id/order_id; if absent, use latest session path.
  6. Handling late and out-of-order data
  • Watermarks and reprocessing:
    • Streaming: allow 72h of lateness (tune to the actual delay distribution).
    • Batch: daily job reprocesses last 7 local days; weekly job reprocesses last 30 days for identity backfills and order corrections.
  • Idempotency:
    • Dedup by event_id; backfill identity safely (do not change frozen A/B variants).
  • Monitoring:
    • Track late ratio by delay buckets (0–1h, 1–24h, 1–7d), out-of-order rate, and dedup rate.
  7. A/B analysis safeguards
  • Freeze variant at first exposure_time_local in ab_assignments.
  • Exclude users with variant collisions by default from primary reads (publish a collision rate metric).
  • For cross-device merges, apply earliest exposure after identity stitch; do not reassign retroactively.
  • Anchor all experiment day boundaries to event_date_local (UTC+8).
  8. Sessions (recommended)
  • Build sessions per user_key using 30-minute inactivity gap on event_time_local.
  • Store session_id, session_start_local, session_end_local, device, ab_variant at session start.
  • Use session_id to constrain funnels and to impute missing cart_id/order_id when needed.
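The 30-minute inactivity rule can be sketched as a single pass over one user_key's sorted event times. This illustration assumes that a gap of exactly 30 minutes keeps the session open; session_id assignment and the device and variant attributes are left out.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def sessionize(times: list[datetime], gap: timedelta = SESSION_GAP):
    """Split one user_key's event times into sessions at gaps longer than `gap`.

    Returns (session_start_local, session_end_local, event_count) tuples.
    """
    sessions = []
    for t in sorted(times):
        if sessions and t - sessions[-1][1] <= gap:
            start, _, n = sessions[-1]
            sessions[-1] = (start, t, n + 1)  # extend the current session
        else:
            sessions.append((t, t, 1))  # inactivity gap exceeded: new session
    return sessions
```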
  9. Data quality checks
  • Required fields non-null rates per event_name (e.g., purchase must have order_id).
  • Currency and price sanity (positive, valid ISO code).
  • Identity consistency: anonymous_id mapping cardinality; collision rate.
  • Time sanity: no future timestamps; distribution of negative intervals (purchase before view) after ordering by event_time_local.
  • A/B coverage: share of events/users with ab assignment; variant balance.
  10. Example implementation notes (SQL-level guidance)
  • Flatten props: SELECT json_extract_scalar(props, '$.product_id') AS product_id, ...
  • Identity map: INSERT INTO user_identity_map when anonymous_id and uid coexist in the same event; coalesce in canonical_events using left join to map table.
  • Canonical events: Build via staging → dedup (by event_id/hash) → enrichment (device/identity/AB) → publish to canonical_events partitioned by event_date_local.
  • AB assignments: SELECT first_value(ab_group) over (partition by user_key, experiment_id order by event_time_local) as frozen_variant, computed over events where ab_group is not null, so the frozen variant is the one observed at first exposure.
  • Watermark: filter or tag events with event_time_local older than watermark; reprocess recent partitions to integrate late data.

Key conclusions

  • Use event-time processing with UTC+8 business-day boundaries to ensure correct funnels, retention, and A/B windows.
  • Create a durable user_key via anonymous_id-to-uid stitching with backfill, but freeze A/B assignment at first exposure to avoid contamination.
  • Standardize and deduplicate core events, enrich with order/cart links, and expose both order-level and item-level facts.
  • Implement watermarks and scheduled reprocessing to capture late and out-of-order arrivals reliably.
  • Maintain explicit ab_assignments and daily_user_activity tables to streamline experiment reads, funnel building, and retention views.


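✅ Feature summary

Generates a data transformation blueprint from the target analysis in one step, spelling out cleansing, standardization, and aggregation steps and their owners, which reduces back-and-forth communication.
Aligns core metric definitions automatically, with explanations and boundary examples, so the team does not interpret the same metric differently.
Targets e-commerce, growth, finance, and similar scenarios, producing an executable data-preparation checklist and sequence for fast report delivery.
Identifies gaps and risks in the raw data, with quality-check points and data-completion suggestions, lowering error rates in dashboards and reports.
Generates field mappings and naming conventions from your raw data description, easing cross-system collaboration and later reuse.
Produces the data structures and samples needed for dashboards, funnels, retention, and other reports, ready to use.
Supports professional, concise business writing in multiple languages for efficient collaboration with global teams.
Templated parameter input allows fast reuse across projects, keeping methods consistent and significantly shortening delivery cycles.
Proactively clarifies key information and assumptions around the business problem, so the transformation logic fits real scenarios and can be executed.
Provides chart granularity, dimension, and filter recommendations, reducing chart rework and speeding up communication and decisions.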

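🎯 Problems solved

Quickly turns scattered, complex business data into structured results ready for analysis and reporting. With the AI acting as a senior business-intelligence consultant, it produces, for your specific scenario: clear field mappings and metric definitions, data cleansing and normalization rules, calculation logic and grouping dimensions, time and aggregation strategies, and a validation checklist with risk notes. It shortens the path from data to conclusions, improves report accuracy, reduces rework, and helps departments share consistent definitions and collaborate efficiently. You only need to provide a key description of the raw data, the type of target analysis or report, and the desired output language to get a professional, concise, and executable transformation plan.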
