制定数据预处理步骤清单

幂简官方

180 浏览

15 试用

4 购买

Sep 28, 2025更新

数据分析文生文

生成清晰准确的数据预处理步骤，帮助用户高效完成数据分析。

以下为“新上线App的用户事件日志与画像（含时间戳、设备、事件名）”的标准化预处理步骤清单。目标是形成高质量、可复用的数据层（清洗层与建模层），为埋点分析、漏斗、归因与建模提供一致的数据基础。

数据理解与字段规范

明确数据源与刷新频率：客户端SDK、服务端日志、第三方归因/广告平台；批量/流式。
定义核心字段与数据类型：
- 事件层：event_id（生成或来源唯一ID）、user_id（匿名ID或登录ID）、device_id、event_name、event_time、server_time、event_params（JSON）、app_version、os、os_version、device_model、locale、network、ip。
- 画像层：user_id、性别/年龄段、注册时间、地区、渠道、设备属性、偏好标签等。
统一命名与取值域：事件名使用小写蛇形；设备系统统一为 ios/android；渠道、国家/地区、语言使用ISO标准码表。

原始数据接入与落地（Staging）

原样落库并打上元数据：ingest_time、source、batch_id；禁止在此层修改字段值。
半结构化字段（event_params）保留原始JSON，并记录schema版本（event_schema_version）。
分区策略：按事件发生日（event_date=toUTCDate(event_time)）分区，避免按摄取时间分区导致时区错位。

时间戳处理与时区对齐

时钟优先级：优先使用客户端事件时间（event_time），若缺失或异常则回退server_time。
时区统一：全部转换为UTC并保留原始时区字段（client_timezone）。
迟到事件处理：允许T+N天迟到窗口（例如N=7），对迟到数据进行分区补写；保留is_late标记。
异常时间过滤：剔除超出上线窗口的时间（如早于app发布日或未来>24h），或标记为suspect_time。

去重与事件排序

强去重键：coalesce(event_id, hash(user_id, device_id, event_name, event_time, app_version))，窗口±1秒内完全相同记录视为重复。
近似重复：同用户同事件在极短时间内（如<100ms）重复触发标记为burst_duplicate。
事件排序：按user_id、event_time排序，生成event_seq以支持序列分析。

设备字段归一化

设备型号标准化：映射厂商/机型别名（如 iPhone13,3 -> iPhone 12 Pro）。
系统与版本规整：os ∈ {ios, android}；版本标准化为语义化版本并拆分主/次/补丁号。
设备指纹风险标记：模拟器、越狱/Root迹象、异常分辨率/UA，产出is_emulator、is_jailbroken等标记。
标识符规范：对device_id/广告ID做不可逆哈希，存储hash值与哈希盐版本，避免明文PII。

用户标识解析与跨端合并（Identity Resolution）

标识图谱：使用login_id、device_id、install_id、第三方open_id进行确定性合并。
合并规则：登录后将历史匿名行为并到同一persistent_user_id；保留合并关系表与生效时间（SCD2）。
多设备用户：同一persistent_user_id下可能有多device_id，打标is_multi_device。

画像数据清洗与合并

缺失处理：性别/年龄段等敏感画像缺失不强填，采用unknown/未声明；地区字段标准化为行政区划码。
合同一致性：用户画像采用SCD2管理，保留画像生效起止时间，按事件发生时间点联接画像快照。
质量约束：类别域校验（如性别∈{male,female,unknown}），异常值（年龄<0或>100）重置为unknown。

事件词典与参数展开

事件词典：建立event_name白名单与定义文档；为关键事件（install、open、register、login、purchase、pay_success、add_to_cart、view_item等）定义必要参数。
JSON展开：将event_params扁平化为列；对数组字段explode；为新出现字段进行schema演进（新增列并记录版本）。
类型校验：参数强制类型转换（金额用decimal(18,2)，布尔/整型/浮点严格校验）；非法值置空并记录错误码。

会话切割（Sessionization）

基于不活跃阈值切割：默认30分钟无事件则新会话；前后台事件可用于精细切割。
产出会话指标：session_id、session_start/end、事件数、时长、是否首日会话、是否来源冷启动。
异常会话处理：时长<0或>24h标记异常；秒级爆发会话打burst标记。

归因与流量标记整合（如适用）

UTM/referrer解析：utm_source/medium/campaign/content/term统一，渠道字典映射。
触点窗口：安装归因为first-touch，转化归因为last-touch或自定义多触点权重；明确冷启动窗口（如安装后7天）。
冲突解决：优先级规则（付费渠道 > 自然流量），并保留原始触点列表以供复核。

异常与机器人流量过滤

规则示例：
- 单设备极端事件速率（如>20 events/sec持续>60秒）。
- 不可能序列（register前出现login/purchase）。
- 大量设备共享同一IP/UA组合且地理位置高速切换。
产出可解释标记：is_suspected_bot、bot_reason；默认不过滤，按分析需求可软过滤或硬过滤。

缺失值与异常值处理

时间/标识缺失：无有效时间或无user_id且无device_id的记录进入隔离区（quarantine）待回填或剔除。
数值参数：使用合理边界裁剪（如金额[0, 999999]），或基于分位数进行winsorize。
类别参数：罕见类别合并为other，避免高基数稀疏度。

特征工程准备（用于建模/分析）

用户粒度日特征：DAU标记、当日事件数/会话数、停留时长、近n日活跃天数（7/14/30）、最近一次事件到现在的recency。
设备特征：OS主版本、机型档位（高/中/低）、网络类型、是否新设备/新安装。
事件频率编码与目标泄露防控：仅使用T日前可得信息构造特征，严格时间切分。

数据一致性与质量校验

宏观校验：分日事件总量、DAU、安装量与监控平台对账（允许迟到窗口偏差）。
约束校验：主键唯一性、非空比例、取值域、画像关联完整性（联接命中率）。
分布漂移监控：事件名分布、关键参数分布的PSI/JS散度；异常报警与回滚策略。
漏斗不变式：进入下一步事件数不应超过上一步（考虑并行路径例外）。

隐私与合规

PII处理：对手机号、邮箱、IP等做哈希/脱敏，避免落地明文；最小化采集原则。
合规标记：consent_status、tracking_authorization；拒绝追踪用户不做跨域合并。
数据保留策略：原始日志与可识别ID设置TTL（如180/365天），到期脱敏或删除。

存储建模与性能优化

分层设计：
- DLS/Staging：原始落地。
- DWD/Clean：清洗后的明细事件与画像快照。
- DWM/ADS：会话表、用户日表、漏斗与归因宽表。
分区与排序：按event_date分区；对user_id、event_time排序/聚簇，加速按用户/时间的查询。
压缩与类型优化：字典编码高基数字段；JSON列使用列存+压缩；必要索引/二级分桶。

输出数据集定义（核心产物）

fact_event（干净事件明细）：含标准时间、去重、规范化字段与扁平化参数。
dim_user_scd（画像SCD2）：用户画像快照，支持按事件时间点关联。
fact_session（会话明细）：会话级指标与来源信息。
user_daily_features（用户-日特征）：用于留存、转化、LTV等建模。
事件词典与参数字典：版本化管理，供埋点与分析一致对齐。

作业编排与监控

任务链路：staging -> clean -> model -> marts 全链路DAG，支持增量与回填。
SLA与告警：延迟、行数对账、质量规则失败触发告警；自动隔离异常批次。
架构演进：schema变更检测与向前兼容策略（新增列默认空值、版本标记）。

验收标准（示例）

fact_event主键唯一性冲突率<0.01%；关键字段非空率>99%。
迟到数据在N日内自动补齐，补写成功率>99.5%。
事件词典覆盖率>95%，未知事件在7日内完成定义或下线。
会话切割一致性：基于同一规则多次重放结果稳定一致（误差<0.1%）。

以上步骤可作为实现清洗与建模流水线的操作手册。根据你的具体日志结构（字段名、参数JSON示例）可进一步细化字段映射、质量规则阈值与表结构DDL。需要时我可提供样例表结构与SQL/ETL伪代码。

以下为“多门店营销与销售日报汇总”数据集的预处理步骤，重点覆盖统一口径、去重与缺失填补，并给出可落地的规则与顺序。

一、目标与粒度定义

目标产出：按门店-日期为最小必备粒度的“营销与销售日报”事实表，可选分维度扩展至门店-日期-渠道、门店-日期-品类/商品。
主键建议：
- 销售事实：business_date, store_id[, sku_id 或 category_id]
- 营销事实：business_date, store_id, channel[, campaign_id]
关键约束：主键在清洗后唯一；指标为净额或毛额需事先约定并全局一致。

二、数据接入与基础标准化

编码与格式：统一为UTF-8；日期统一为ISO-8601（YYYY-MM-DD）；时区统一（例如 Asia/Shanghai）。
数据类型：对金额统一Decimal(18,2)，计数为整数型；布尔字段0/1统一；分类字段转为标准字典值。
货币与税：统一币种（CNY）；明确销售额与投放花费是否含税，必要时按税率转换到统一口径。
单位与精度：曝光为次、点击为次、花费为元；数量单位与换算率（箱/件）统一。

三、口径统一（指标定义与计算规则）为避免混口径，先定义统一的业务指标与计算优先级，所有来源对齐到以下口径再汇总：

销售口径：
- 订单数：支付成功订单数（去除取消/未支付）。
- 毛销售额（GMV）：成交价×数量，含/不含税需统一。
- 折扣额：活动立减+优惠券抵扣+会员折扣。
- 退款额：按“退款完成日”或“原订单日”归属，需一致（推荐按退款完成日计入净额）。
- 净销售额（Net Sales）= GMV − 折扣额 − 退款额。
- 件数/销量：出库量或支付量，须统一。
营销口径：
- 曝光/点击：去广告欺诈后的有效量（如可得）。
- 花费：平台花费+服务费-平台返利（含税/未税统一）。
- 转化：按选定归因窗口与模型（推荐店铺/本地线上以last-click同日或次日内进行）落入门店与日期。
- 新老客界定：基于统一用户主键与回溯窗口。
归因与映射：
- 渠道统一字典（如 SEM、信息流、联盟、线下DM、私域）。
- 跨来源维度映射（utm_*、campaign_name、广告位ID）到标准channel/campaign_id。
- 线上到门店映射：通过门店地理围栏、履约门店、活动门店表或POS绑定规则。

四、主数据与映射管理

门店主数据：store_id、名称标准化、营业状态（在营/停业/新开）、区域层级。
商品/品类主数据：sku_id->category_id->brand，名称、条码去噪与同义合并。
渠道/活动字典：channel_id、campaign_id标准化，活动生命周期与门店适用范围。
版本化映射：保存历史变更（门店更名/合并、SKU替代），便于回溯。

五、去重策略

完全重复记录：对原始行做行级哈希（全字段或业务字段）去重。
业务键去重：
- 销售：同一（business_date, store_id, sku_id）出现多条记录时，选“数据质量优先级最高来源”或“最后更新时间最新”的记录。其余记录标记并保留审计表。
- 营销：同一（business_date, store_id, channel[, campaign_id]）聚合为唯一记录，对数值型指标取求和（曝光/点击/花费），对单值属性取来源优先或非空优先。
近似重复与命名差异：
- campaign_name、store_name 通过规范化（去空格、大小写、全半角、常见前后缀）+ 模糊匹配（如编辑距离）对齐至标准ID；设定阈值与人工白名单校正。
冲突解决优先级：
1. 明确数据契约来源 > 2) 时间更近 > 3) 与主数据一致性更高 > 4) 值域校验通过。

六、缺失值处理策略

原则：先补维度键，再补度量值；基础度量补齐后，派生指标统一重算（如CTR、CVR、AOV、ROAS）。
维度字段：
- store_id、business_date缺失：不可填补，行剔除或返回数据源修复。
- channel/campaign缺失：若可由映射表推断则填补，否则填为“UNKNOWN”并标记数据质量告警。
- sku_id缺失：若条码/名称可唯一映射则回填，否则聚合到“UNASSIGNED_SKU”且不进入SKU层报表。
数值字段（依据业务含义区分0与缺失）：
- 明确“无事件即为0”的计数型（曝光、点击、订单数、件数）：若该粒度行存在且其他字段有效，缺失填0。
- 金额类（花费、销售额、退款额、折扣额）：
  - 若该粒度行存在且可业务推断为无（如无投放日），填0。
  - 否则按分层时序插补：
    1. 小间隙（<=3日）：门店×同粒度线性插值或同星期几中位数插补。
    2. 较大间隙：门店×品类（或渠道）近4-8周移动中位数；无则区域×品类/渠道中位数；最终全局中位数。
- 转化数/到店数等依赖上游事件的指标：优先不直接插补，按归因重算。确需插补时保持逻辑约束（转化数 ≤ 点击数）。
比率/派生指标：
- CTR=点击/曝光、CVR=订单/点击、AOV=净销售额/订单、ROAS=净销售额/花费等，一律由底层原子指标重算，不直接填补。
标记与可追溯：
- 为每个可插补字段增加 is_imputed 与 impute_method，记录层级与方法；对大额插补设置阈值告警（如超过门店近4周P95）。

七、数据一致性与质量校验

结构与唯一性：主键唯一；数据类型与范围检查（非负、金额精度、日期连续性）。
业务约束：
- 点击≥转化；GMV≥净销售额；净销售额=GMV−折扣−退款；花费≥0。
- 门店停业期间指标应为0或缺失且被合理处理。
- 汇总对账：门店/日期汇总与来源系统（POS、投放平台）日总差异在容忍阈值内（如≤1%或≤50元），超限报警。
交叉校验：
- ROI、ROAS极值与异常跳变检测（IQR或季节性残差法）；库存/销量约束（若可得）。

八、合并与产出

事实表建议：
- 销售日报：business_date, store_id[, category_id/sku_id], orders, units, gmv, discount, refund, net_sales。
- 营销日报：business_date, store_id, channel[, campaign_id], impressions, clicks, spend, conversions。
- 宽表（可选）：按business_date, store_id[, channel]对齐后左连接，确保营销为空的门店/日也存在销售数据，反之亦然。
元数据与字典：输出数据字典、口径说明、归因与税口径声明、主数据版本号。
审计表：记录去重数量、插补比例、对账差异、异常清单。

九、流程与落地建议

增量与幂等：按business_date分区，支持回填与重跑；每步生成中间层（raw→standardized→conformed→curated）。
源优先级与回写：建立source_priority配置；出现冲突或高缺失时回写工单至数据源。
自动化监控：数据延迟、主键唯一性、缺失率、异常跳变、对账差异的自动告警。

以上流程确保在多来源、多门店的场景下，实现口径统一、可靠去重与可解释的缺失填补，支撑后续建模与可视化的稳定性与一致性。

Preprocessing plan for a cross-project dataset with mixed text and numeric features, inconsistent encodings, and heterogeneous missing-value rules

Scope and objectives

Harmonize schemas, encodings, and missing-value semantics across projects.
Produce clean, typed, and comparable features for modeling while preventing leakage and preserving cross-project generalization.

Pipeline steps

Data inventory and schema harmonization

Compile a data dictionary per project: column names, types, units, allowed ranges, label definitions, missing-value codes, and known sentinels (e.g., -999, 9999, “N/A”, “Unknown”).
Create a canonical schema with:
- Standardized column names and data types.
- Crosswalk tables mapping project-specific categories/labels to canonical codes.
- Explicit unit definitions for numeric fields.
Add source metadata columns (e.g., project_id, source_version, ingestion_timestamp).

File and character encoding standardization

Detect file encodings via heuristic tools (e.g., uchardet/chardet) and fallback logic.
Convert all text to UTF-8 with strict error handling:
- Replace invalid byte sequences with U+FFFD and log occurrences.
- Normalize Unicode to NFC or NFKC consistently.
Normalize line endings and ensure consistent delimiter, quote, and escape rules for structured files.

Locale and format normalization

Dates/times: parse with project-specific formats; standardize to ISO-8601; convert time zones to UTC; record original timezone if needed.
Numbers: normalize decimal and thousands separators; enforce dot-decimal; strip locale artifacts (e.g., non-breaking spaces).
Units: convert to canonical units (e.g., kg, m/s, °C); document conversion rules and edge cases.
Booleans: unify to {0,1} or {False,True}; standardize truthy/falsey strings.

Column alignment and feature presence

Align to the canonical schema:
- For columns missing in some projects, create empty columns with NaN (or explicit missing-category token for categoricals).
- Resolve name collisions or synonyms via a mapping table.
Drop columns deemed non-informative or leakage-prone (e.g., direct label proxies, unique IDs used as features).
Handle duplicate columns (same semantics, different names) by merging with precedence rules.

Missingness harmonization (core for inconsistent rules)

Build a missingness lexicon per column and project:
- String tokens: “NA”, “N/A”, “—”, “?”, “unknown”, “none”, “null”.
- Numeric sentinels: -1, -999, 999, 9999, 1e9, 0 for non-zero-only measures, out-of-range flags.
- Domain-based invalids: negative durations, impossible dates, zero divisions.
Convert all such values to a single NaN representation; preserve a “missing_reason” code if useful.
Create missingness indicators:
- One binary flag per feature (is__missing).
- Optional reason-specific flags if they carry signal (e.g., system_not_collected vs user_refused).
Quantify per-record missingness to support downstream filtering or modeling strategies.

Type inference and casting

Infer and enforce types: numeric (int/float), categorical, text, datetime, boolean, ordinal.
Detect numeric-coded categoricals (e.g., 1=Red, 2=Blue) and cast to categorical with proper labels.
Ensure consistent types across projects for the same field.

De-duplication and record linkage

Remove exact duplicates (all fields identical) per project, then cross-project (if meaningful).
Identify probable duplicates via:
- Key-based (stable IDs) when available.
- Fuzzy matching for text fields (e.g., MinHash/LSH or cosine for embeddings).
Keep a linkage table for traceability; choose deduplication priority rules.

Outlier and invalid value handling

Define column-wise validation rules from domain knowledge (ranges, regexes, business constraints).
Flag violations as invalid; convert to NaN if irrecoverable.
For continuous features, detect outliers using robust methods (IQR/winsorization or robust z-scores using MAD) computed on the training split only.
Decide on clipping, winsorization, or leaving as-is depending on model sensitivity.

Text preprocessing (multi-project, potentially multi-lingual)

Language detection per text field; if multi-lingual, consider:
- Language-specific pipelines, or
- A language-agnostic representation (e.g., multilingual embeddings).
Normalize text:
- Unicode normalization already enforced.
- Lowercasing (unless casing is informative).
- Remove/control punctuation, excessive whitespace, control characters.
- Normalize URLs, emails, mentions, hashtags to placeholders if not informative.
- Handle emojis/symbols based on task needs.
Tokenization:
- Wordpiece/BPE for transformer models, or
- Standard tokenizers for classical models.
Optional normalization:
- Lemmatization/stemming (if using sparse models).
- Stopword handling by language.
Vectorization choices (decide once based on downstream model):
- Sparse: TF-IDF with n-grams; limit vocabulary; handle OOV.
- Dense: pretrained embeddings (e.g., multilingual models) with pooling; store embedding version.
Truncation strategy for long texts; log truncation rates.

Numeric and continuous feature processing

Unit-harmonized values already ensured.
Transformations as needed:
- Log1p for heavy-tailed distributions.
- Box-Cox/Yeo-Johnson if appropriate.
Scaling:
- Use robust scaler (median/IQR) or standard scaler.
- Compute statistics on training data only.
- Consider project-aware scaling if strong covariate shift exists (fit per project, then standardize to global space or use project residualization).
Handle zero-inflation explicitly if present (e.g., hurdle features).

Categorical feature processing

Harmonize category labels across projects via crosswalk tables.
Treat rare categories: collapse into “Other” using a frequency threshold per training data.
Encoding:
- One-hot for low-cardinality features.
- Target encoding for high-cardinality with K-fold, noise, and project-aware folds to prevent leakage.
- Hashing trick as an alternative when cardinality is extreme.
Reserve an “Unknown” category for unseen levels at inference.

Temporal features

From standardized datetimes, derive consistent time-based features (day-of-week, month, hour, season).
For time series, handle resampling/alignment; ensure no forward-looking leakage.

Cross-project distribution shift mitigation (optional but recommended)

Add project_id as a feature only if the downstream objective benefits and it does not encode leakage.
Reweight samples to align marginal distributions if class or feature imbalance is severe.
Consider per-project normalization or domain adaptation preprocessing (e.g., CORAL) where justified.

Train/validation/test splitting and leakage control

Split by project or by time to reflect deployment scenario:
- Cross-project generalization: leave-one-project-out or group-wise splits by project_id.
- Temporal generalization: chronological splits.
Fit all imputers, encoders, scalers, and vectorizers on training data only; apply to val/test.
Ensure any target-based encodings use only in-fold target information.

Imputation strategy (after harmonized missingness)

Numeric:
- Simple: median or constant with missingness indicator.
- Advanced: KNN imputer or model-based imputation; ensure scalability and leakage control.
Categorical:
- Mode or “Unknown”.
Text:
- Empty string or a special token [MISSING_TEXT].
Evaluate imputation impact per feature; prefer simple, robust methods unless gains justify complexity.

Feature selection and redundancy checks

Remove features with:
- Near-zero variance globally.
- Excessive missingness beyond a threshold.
- High multicollinearity (e.g., VIF or correlation pruning).
Preserve interpretability by documenting dropped features.

Quality assurance and validation

Automated checks:
- Type and range assertions post-transform.
- Missingness rates before/after.
- Category coverage and unseen-level handling.
- Text length and tokenization diagnostics.
Drift checks across projects/splits (e.g., PSI/KL) for key features.

Reproducibility and artifact management

Implement the pipeline as a single, versioned transformer (e.g., scikit-learn Pipeline/ColumnTransformer with custom steps).
Version control for:
- Data dictionaries, crosswalks, missingness lexicons, unit conversions, vectorizer vocabularies/embeddings.
Log configuration and random seeds; fix library versions; store training statistics for all preprocessors.

Deliverables

Canonical schema and crosswalk tables.
Missingness lexicon and imputation plan per feature.
Encoding and unit normalization rules.
A versioned preprocessing pipeline artifact with train-only fitted statistics.
QA report with pre/post distributions, missingness, and drift summaries.

解决的问题

把零散复杂的数据清洗工作，转化为“可直接执行”的步骤清单。该提示词可根据你提供的数据集场景，快速生成标准化、可复用的预处理方案，覆盖缺失值、字段规范、异常与重复检测、编码与归一、样本切分、特征工程与验证等关键环节。目标是减少返工与试错、缩短分析启动时间、提升模型表现与结论可信度，并支持指定输出语言与风格，让跨团队沟通更顺畅。

适用用户

数据分析师

快速为新数据集生成清洗与标准化方案，缩短建模准备时间，提升报表与模型稳定性。

商业运营经理

将复杂数据处理转为可执行清单，指导团队统一指标口径，减少错报漏报，提升决策可信度。

数据科学团队负责人

制定跨项目预处理规范与验收标准，一键下发检查表，保障协作一致性与交付质量。

特征总结

• 按数据集特征生成定制化预处理清单，一步到位明确清洗、转换与校验流程

• 自动识别常见数据问题并给出可执行方案，如缺失值、异常点与字段不一致

• 一键生成可复用的步骤说明与执行顺序，便于团队协作与跨项目标准化落地

• 支持多语言输出与不同写作风格，轻松适配报告、文档或培训材料的使用场景

• 结合业务目标给出数据筛选与分组建议，帮助指标口径统一，减少分析偏差

• 自动生成可视化前的数据整理建议，明确字段类型转换与分箱、标准化步骤

• 根据任务场景生成不同深度版本：速览清单、详细指南、执行检查表自由切换

• 提供数据质量评估要点与验收标准，确保预处理后数据可追溯、可对比、可复现

• 场景化提示参数，用户只需填写数据集简述，即可生成贴合业务的处理方案

• 严谨的表达与结构化输出，帮助新人快速上手，也让专家复核更高效更安心

如何使用购买的提示词模板

1. 直接在外部 Chat 应用中使用

将模板生成的提示词复制粘贴到您常用的 Chat 应用（如 ChatGPT、Claude 等），即可直接对话使用，无需额外开发。适合个人快速体验和轻量使用场景。

2. 发布为 API 接口调用

把提示词模板转化为 API，您的程序可任意修改模板参数，通过接口直接调用，轻松实现自动化与批量处理。适合开发者集成与业务系统嵌入。

3. 在 MCP Client 中配置使用

在 MCP client 中配置对应的 server 地址，让您的 AI 应用自动调用提示词模板。适合高级用户和团队协作，让提示词在不同 AI 工具间无缝衔接。

AI 提示词价格

￥10.00元

先用后买，用好了再付款，超安全！

在线免费用提示词

您购买后可以获得什么

✓

获得完整提示词模板

- 共 247 tokens

- 2 个可调节参数

{ 简要描述数据集 } { 输出语言 }

✓

获得社区贡献内容的使用权

- 精选社区优质案例，助您快速上手提示词

购买

制定数据预处理步骤清单

解决的问题