Popular roles are not just a source of inspiration; they are also your productivity assistants. With carefully curated role prompts you can quickly generate high-quality content, spark new ideas, and find the solution that best fits your needs. Create with less effort and get to the value faster!
We keep the role library up to date for different user needs, so you can always find a fitting entry point for inspiration.
Designed for business data scenarios: it intelligently identifies anomalous data points and proposes scientifically grounded handling plans. It covers anomaly detection, business impact analysis, handling strategies, and quality monitoring, produces structured reports, and supports Python automation to improve data quality and the reliability of analysis.
Outlier Detection Report
Dataset Overview
Detection Methods (all statistically validated)
Outlier Counts (by metric)
Outlier Analysis
Distribution Characteristics and Deviation (relative to the full-sample robust median)
Clustering Analysis
Business Impact Assessment
Recommended Handling Plan
Recommended Handling Methods (following the principle of "no arbitrary deletion; decisions grounded in business logic")
Implementation Steps (preferred and fallback)
Handling Priority and Validation Mechanism
Quality Assurance
Python automation example (runnable as-is). Notes: uses IQR with a MAD fallback to detect outliers, tags day-level events, estimates losses and gains, and outputs flag columns plus winsorized robust fields for visualization. Depends only on pandas/numpy.
import pandas as pd
import numpy as np
from io import StringIO

csv = """date,channel,sessions,orders,sales,customers,refund_rate,conversion_rate,avg_order_value,gross_margin_rate
2025-06-01,APP,18000,540,324000,520,0.03,0.03,600,0.42
2025-06-02,APP,17500,525,315000,508,0.028,0.03,600,0.41
2025-06-03,APP,18200,560,333200,542,0.027,0.0308,595,0.43
2025-06-04,APP,17000,340,170000,330,0.025,0.02,500,0.39
2025-06-05,APP,19000,610,366000,586,0.029,0.0321,600,0.44
2025-06-06,APP,21000,320,160000,305,0.03,0.0152,500,0.38
2025-06-07,APP,19500,585,351300,560,0.031,0.03,600,0.42
2025-06-08,APP,22000,990,643500,960,0.026,0.045,650,0.46
2025-06-09,APP,20500,615,369000,598,0.028,0.03,600,0.43
2025-06-10,APP,20000,600,360000,580,0.12,0.03,600,0.41
2025-06-11,APP,19800,594,356400,575,0.029,0.03,600,0.42
2025-06-12,APP,20200,606,363600,590,0.03,0.03,600,0.42
"""
df = pd.read_csv(StringIO(csv), parse_dates=['date'])

# Shared IQR threshold helper
def iqr_bounds(x, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Robust Z (MAD); returns NaN when MAD == 0 so the caller falls back to IQR
def robust_z(x):
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        return np.full_like(x, np.nan, dtype=float)
    return 0.6745 * (x - med) / mad

metrics = ["sessions", "orders", "sales", "customers", "refund_rate",
           "conversion_rate", "avg_order_value", "gross_margin_rate"]

# Per-metric IQR thresholds and out-of-bounds flags
flags = {}
bounds = {}
for m in metrics:
    lb, ub = iqr_bounds(df[m].values)
    bounds[m] = (lb, ub)
    flags[f"{m}_iqr_out"] = (df[m] < lb) | (df[m] > ub)

# Robust Z scores and flags (used only where MAD > 0)
for m in metrics:
    rz = robust_z(df[m].values)
    df[f"{m}_robust_z"] = rz
    flags[f"{m}_rz_out"] = np.where(np.isnan(rz), False, np.abs(rz) > 3.5)

# Combined outlier flag: triggered by either IQR or robust Z
for m in metrics:
    df[f"{m}_is_outlier"] = flags[f"{m}_iqr_out"] | flags[f"{m}_rz_out"]

# Consistency checks (data quality)
df["check_orders"] = np.isclose(df["orders"], df["sessions"] * df["conversion_rate"], rtol=0.01)
df["check_sales"] = np.isclose(df["sales"], df["orders"] * df["avg_order_value"], rtol=0.005)

# Robust baseline (full-sample medians)
med = df[metrics].median(numeric_only=True)
cr_med = med["conversion_rate"]
aov_med = med["avg_order_value"]
gmr_med = med["gross_margin_rate"]

# Expected values and loss/gain estimates (analysis only; originals untouched)
df["expected_orders"] = (df["sessions"] * cr_med).round(0)
df["expected_sales"] = df["expected_orders"] * aov_med
df["delta_orders"] = df["orders"] - df["expected_orders"]
df["delta_sales"] = df["sales"] - df["expected_sales"]

# Refund amounts under two bases (only the same-day basis is shown here;
# the order-date basis needs a join to an order-creation-date dimension table)
df["refund_amount_approval"] = df["sales"] * df["refund_rate"]
df["refund_amount_expected"] = df["sales"] * med["refund_rate"]
df["refund_amount_delta"] = df["refund_amount_approval"] - df["refund_amount_expected"]
df["gross_margin_loss_from_refund"] = df["refund_amount_delta"] * df["gross_margin_rate"]

# Day-level event tag: >= 3 conversion-funnel metrics out of bounds on the same day
# (orders / sales / customers / CR / AOV)
chain = ["orders_is_outlier", "sales_is_outlier", "customers_is_outlier",
         "conversion_rate_is_outlier", "avg_order_value_is_outlier"]
df["chain_outlier_cnt"] = df[chain].sum(axis=1)

def classify(row):
    if row["chain_outlier_cnt"] >= 3:
        # Determine the direction of the deviation
        signs = []
        for m in ["orders", "sales", "customers", "conversion_rate", "avg_order_value"]:
            if row[f"{m}_is_outlier"]:
                signs.append(np.sign(row[m] - med[m]))
        direction = np.sign(np.nansum(signs))  # > 0 peak; < 0 trough
        return "promo_peak" if direction > 0 else "conversion_drop"
    if row["refund_rate_is_outlier"]:
        return "refund_anomaly"
    return "normal"

df["event_tag"] = df.apply(classify, axis=1)

# Key summaries
outlier_summary = {m: int(df[f"{m}_is_outlier"].sum()) for m in metrics}
day_events = df[["date", "event_tag", "chain_outlier_cnt", "delta_orders",
                 "delta_sales", "refund_amount_delta", "gross_margin_loss_from_refund"]]
print("Outlier counts per metric:", outlier_summary)
print("\nDay-level events:\n", day_events)

# Optional: winsorized robust series for chart smoothing (does not replace original values)
def winsorize_series(s, k=1.5):
    lb, ub = iqr_bounds(s.values, k)
    return s.clip(lower=lb, upper=ub)

for m in ["sales", "orders", "conversion_rate", "avg_order_value", "refund_rate", "gross_margin_rate"]:
    df[f"{m}_robust"] = winsorize_series(df[m])

# Export: df now holds per-metric outlier flags, robust Z scores, expected values,
# loss estimates, event tags, and winsorized robust fields,
# e.g. df.to_csv("outlier_report.csv", index=False)
How to Apply the Handling Strategy to This Dataset
Summary
Dataset Overview
Detection Methods and Rationale
Outlier Counts (by metric)
Clustering: anomalies are highly concentrated in two months, 2024-12 and 2025-02 (a sharp ARPU cut, a cliff in churn and repurchase, and a dip in gross margin); see the verification sketch after the code below.
import io
import pandas as pd
import numpy as np

csv = """month,plan,active_customers,new_orders,churn_customers,mrr,arpu,gross_margin_rate,repurchase_rate
2024-07,Pro,820,60,40,656000,800,0.78,0.87
2024-08,Pro,840,70,45,672000,800,0.78,0.88
2024-09,Pro,860,72,42,688800,801,0.79,0.89
2024-10,Pro,880,75,50,704000,800,0.79,0.88
2024-11,Pro,905,95,40,724000,800,0.80,0.90
2024-12,Pro,890,55,70,667500,750,0.76,0.85
2025-01,Pro,900,80,50,720000,800,0.79,0.88
2025-02,Pro,830,40,120,664000,800,0.75,0.78
2025-03,Pro,845,65,35,676000,800,0.78,0.89
2025-04,Pro,860,70,40,688000,800,0.79,0.90
2025-05,Pro,875,72,45,700000,800,0.80,0.89
2025-06,Pro,890,80,40,712000,800,0.81,0.90
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=['month'])
df = df.sort_values('month').reset_index(drop=True)

# 1) Derived fields and consistency checks
df['arpu_calc'] = df['mrr'] / df['active_customers']
df['mrr_consistency_err'] = (df['mrr'] - df['active_customers'] * df['arpu']).abs() / df['mrr']

# Active-customer roll-forward residual (from the second row onward)
residual = [np.nan]
for i in range(1, len(df)):
    exp = df.loc[i-1, 'active_customers'] + df.loc[i, 'new_orders'] - df.loc[i, 'churn_customers']
    residual.append(df.loc[i, 'active_customers'] - exp)
df['active_residual'] = residual

# 2) Robust outlier-detection helpers
def mad_z(x):
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    if mad == 0:
        return pd.Series([0]*len(x), index=x.index), med, mad
    z = 0.6745 * (x - med) / mad
    return z, med, mad

def iqr_flag(x):
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    lb, ub = q1 - 1.5*iqr, q3 + 1.5*iqr
    return lb, ub

anomalies = []

# churn_customers: MAD + IQR
z_churn, med_churn, mad_churn = mad_z(df['churn_customers'])
lb_c, ub_c = iqr_flag(df['churn_customers'])
for i, v in enumerate(df['churn_customers']):
    methods = []
    if abs(z_churn.iloc[i]) > 3.5:
        methods.append(f"MAD|Z|={z_churn.iloc[i]:.2f}")
    if (v < lb_c) or (v > ub_c):
        methods.append(f"IQR[{lb_c:.1f},{ub_c:.1f}]")
    if methods:
        anomalies.append(dict(month=df['month'][i], metric='churn_customers', value=v, method=' & '.join(methods)))

# new_orders: MAD
z_new, med_new, mad_new = mad_z(df['new_orders'])
for i, v in enumerate(df['new_orders']):
    if abs(z_new.iloc[i]) > 3.5:
        anomalies.append(dict(month=df['month'][i], metric='new_orders', value=v, method=f"MAD|Z|={z_new.iloc[i]:.2f}"))
# Flag notably low months (reference threshold |Z| > 2.5)
for i, v in enumerate(df['new_orders']):
    if 2.5 < abs(z_new.iloc[i]) <= 3.5:
        anomalies.append(dict(month=df['month'][i], metric='new_orders_low', value=v, method=f"MAD|Z|={z_new.iloc[i]:.2f} (borderline)"))

# repurchase_rate: MAD
z_rep, med_rep, mad_rep = mad_z(df['repurchase_rate'])
for i, v in enumerate(df['repurchase_rate']):
    if abs(z_rep.iloc[i]) > 3.5:
        anomalies.append(dict(month=df['month'][i], metric='repurchase_rate', value=v, method=f"MAD|Z|={z_rep.iloc[i]:.2f}"))

# gross_margin_rate: MAD + IQR
z_gm, med_gm, mad_gm = mad_z(df['gross_margin_rate'])
lb_gm, ub_gm = iqr_flag(df['gross_margin_rate'])
for i, v in enumerate(df['gross_margin_rate']):
    methods = []
    if abs(z_gm.iloc[i]) > 3.5:
        methods.append(f"MAD|Z|={z_gm.iloc[i]:.2f}")
    if (v < lb_gm) or (v > ub_gm):
        methods.append(f"IQR[{lb_gm:.3f},{ub_gm:.3f}]")
    if methods:
        anomalies.append(dict(month=df['month'][i], metric='gross_margin_rate', value=v, method=' & '.join(methods)))
# Borderline low (Z <= -2.0)
for i, v in enumerate(df['gross_margin_rate']):
    if z_gm.iloc[i] <= -2.0 and abs(z_gm.iloc[i]) <= 3.5:
        anomalies.append(dict(month=df['month'][i], metric='gross_margin_rate_low', value=v, method=f"MAD Z={z_gm.iloc[i]:.2f} (borderline)"))

# ARPU level shift: > 5% deviation from the centered rolling median
roll_med_arpu = df['arpu'].rolling(5, center=True, min_periods=3).median()
rel_dev = (df['arpu'] - roll_med_arpu).abs() / roll_med_arpu
for i, v in enumerate(df['arpu']):
    if pd.notna(rel_dev.iloc[i]) and rel_dev.iloc[i] > 0.05:
        anomalies.append(dict(month=df['month'][i], metric='arpu', value=v, method=f"level_shift {rel_dev.iloc[i]:.2%}"))

# Consistency alert: MRR vs Active x ARPU
for i, err in enumerate(df['mrr_consistency_err']):
    if err > 0.001:  # > 0.1%
        anomalies.append(dict(month=df['month'][i], metric='mrr_consistency', value=float(err), method="mrr != active*arpu"))

# Roll-forward residual bias (approximate sign test)
res = df['active_residual'].dropna()
neg_share = (res < 0).mean()
sign_bias = "negative-bias" if neg_share > 0.7 else "no-strong-bias"

anomalies_df = pd.DataFrame(anomalies).sort_values(['metric', 'month'])
print("Anomalies:")
print(anomalies_df.to_string(index=False))

# 3) Extended business metrics
# Previous month's actives (for churn rate)
prev_active = df['active_customers'].shift(1)
df['churn_rate'] = df['churn_customers'] / prev_active
df['net_adds'] = df['new_orders'] - df['churn_customers']
df['active_change'] = df['active_customers'].diff()
df['gross_margin_amt'] = df['mrr'] * df['gross_margin_rate']

# 4) Remediation artifacts: keep arpu_calc and other_movements (= active_residual)
df['other_movements'] = df['active_residual']

# 5) Basic validation (example)
print("\nConsistency pass rate (MRR == Active*ARPU within 0.1%):",
      (df['mrr_consistency_err'] <= 0.001).mean())
print("\nActive roll-forward residual summary:", sign_bias, f"(negative share {neg_share:.0%})")
print(df[['month', 'active_residual']])

# 6) Event flags for downstream strategies
df['event_arpu_drop'] = (df['arpu'] < df['arpu'].median()*0.975).astype(int)
df['event_feb_spike'] = ((df['month'].dt.month == 2) & (df['churn_customers'] >= df['churn_customers'].median() + 3*mad_churn)).astype(int)
print("\nEvent flags:")
print(df[['month', 'event_arpu_drop', 'event_feb_spike']])
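To check the clustering claim above (anomalies concentrated in 2024-12 and 2025-02), a short follow-up can be run after the script; this is a minimal sketch that reuses the anomalies_df the script builds:

# Count flagged anomalies per calendar month; the spike months should dominate.
monthly_counts = (
    anomalies_df.assign(period=anomalies_df['month'].dt.to_period('M'))
    .groupby('period')['metric']
    .count()
    .sort_values(ascending=False)
)
print(monthly_counts)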
Key Outputs (interpreting the results)
Validation Mechanism
Conclusion
Outlier Detection Report
Dataset Overview
Detection Methods (all statistically validated)
Baseline and Statistics (based on steady-state medians)
Outlier Counts (by severity)
Outlier Analysis
Anomaly Characteristics
Business Impact Assessment (quantitative)
Baseline selection: the 09:00-11:00 window is stable, giving p0_ctr=0.05 and p0_cvr=0.05 (close to the overall medians) with AOV=600; see the sketch after this list.
13:00 (impr=53,000)
14:00 (impr=52,000)
16:00 (orders=110)
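A minimal sketch of how that stable-window baseline could be derived (purely illustrative; the per-hour values are copied from the 09:00-11:00 rows of the CSV in the code example below):

import numpy as np

# Baseline from the calm 09:00-11:00 window only, so anomalous hours
# cannot drag the baseline (values copied from the CSV below).
ctr_stable = np.array([0.05, 0.05, 0.05])        # 09:00, 10:00, 11:00
cvr_stable = np.array([0.05, 0.05, 0.0519])
aov_stable = np.array([600.0, 600.0, 600.0])     # sales / orders per hour
p0_ctr = np.median(ctr_stable)        # 0.05
p0_cvr = np.median(cvr_stable)        # 0.05
aov_baseline = np.median(aov_stable)  # 600.0
print(p0_ctr, p0_cvr, aov_baseline)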
Business Implications
Recommended Handling Plan
Recommended Handling Methods (following the principle of "preserve business-critical anomalies")
Implementation Steps and Parameters (online monitoring template)
Parameter Suggestions
Handling Priority and Validation Mechanism
Quality Assurance
Python automation example (runnable as-is). Notes: the example works from the provided CSV string, computes robust Z scores, runs one-sided binomial tests, quantifies losses, and outputs flags. For an online stream, p0 can also be updated as a rolling median and persisted; a sketch after the code shows one way to do this.
import io
import numpy as np
import pandas as pd
from scipy.stats import binomtest

csv = """timestamp,campaign,impressions,clicks,orders,sales,ad_spend,ctr,cvr,refund_rate
2025-06-15 09:00,A,50000,2500,125,75000,15000,0.05,0.05,0.03
2025-06-15 10:00,A,52000,2600,130,78000,15500,0.05,0.05,0.03
2025-06-15 11:00,A,54000,2700,140,84000,16000,0.05,0.0519,0.03
2025-06-15 12:00,A,56000,2800,120,72000,17000,0.05,0.0429,0.035
2025-06-15 13:00,A,53000,2100,90,54000,16500,0.0396,0.0429,0.032
2025-06-15 14:00,A,52000,80,1,600,14000,0.0015,0.0125,0.02
2025-06-15 15:00,A,54000,2600,120,72000,16000,0.0481,0.0462,0.03
2025-06-15 16:00,A,55000,2500,110,66000,15800,0.0455,0.044,0.18
2025-06-15 17:00,A,57000,2850,140,84000,16200,0.05,0.0491,0.031
2025-06-15 18:00,A,59000,2950,150,90000,16800,0.05,0.0508,0.03
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=['timestamp'])

# Derived metrics
df['aov'] = df['sales'] / df['orders']
df['cpc'] = df['ad_spend'] / df['clicks'].replace(0, np.nan)
df['cpa'] = df['ad_spend'] / df['orders'].replace(0, np.nan)
df['roas'] = df['sales'] / df['ad_spend']

# Baselines (steady-state medians; simplified version that does not pre-drop anomalies)
p0_ctr = df['ctr'].median()          # 0.05
p0_cvr = df['cvr'].median()          # ~0.04765
p0_ref = df['refund_rate'].median()  # 0.03
aov_baseline = df['aov'].median()    # 600

# MAD-based robust Z
def mad(x):
    med = np.median(x)
    return np.median(np.abs(x - med))

def robust_z(series):
    med = np.median(series)
    m = mad(series)
    # Guard against a tiny MAD producing false positives: apply a numeric floor
    eps = max(m, 1e-6)
    return 0.6745 * (series - med) / eps

df['z_ctr'] = robust_z(df['ctr'].values)
df['z_cvr'] = robust_z(df['cvr'].values)
df['z_refund'] = robust_z(df['refund_rate'].values)

# One-sided binomial tests
def binom_p_low(x, n, p0):
    # Lower tail: P(X <= x); exact binomtest (a normal approximation could speed this up)
    return binomtest(int(x), int(n), p0, alternative='less').pvalue if pd.notna(x) and n > 0 else np.nan

def binom_p_high(x, n, p0):
    return binomtest(int(x), int(n), p0, alternative='greater').pvalue if pd.notna(x) and n > 0 else np.nan

# CTR: lower tail
df['p_ctr'] = df.apply(lambda r: binom_p_low(r['clicks'], r['impressions'], p0_ctr), axis=1)
# CVR: lower tail (clicks as trials, orders as successes)
df['p_cvr'] = df.apply(lambda r: binom_p_low(r['orders'], r['clicks'], p0_cvr) if r['clicks'] >= 1 else np.nan, axis=1)
# Refund rate: upper tail (orders as trials; refund count unknown, approximated as round(orders * refund_rate))
df['refund_cnt'] = np.round(df['orders'] * df['refund_rate']).astype(int)
df['p_refund'] = df.apply(lambda r: binom_p_high(r['refund_cnt'], r['orders'], p0_ref) if r['orders'] >= 1 else np.nan, axis=1)

# Loss quantification (counterfactual; original values untouched)
df['expected_clicks'] = df['impressions'] * p0_ctr
df['expected_orders_from_exp'] = df['expected_clicks'] * p0_cvr
df['loss_orders_total'] = df['expected_orders_from_exp'] - df['orders']
df['loss_sales_total'] = df['loss_orders_total'] * aov_baseline

# Decomposition: CTR vs CVR contributions
df['ctr_gap_clicks'] = df['expected_clicks'] - df['clicks']
df['loss_orders_from_ctr'] = df['ctr_gap_clicks'] * p0_cvr
df['expected_orders_given_clicks'] = df['clicks'] * p0_cvr
df['loss_orders_from_cvr'] = df['expected_orders_given_clicks'] - df['orders']
df['loss_sales_from_ctr'] = df['loss_orders_from_ctr'] * aov_baseline
df['loss_sales_from_cvr'] = df['loss_orders_from_cvr'] * aov_baseline

# Severity grading
def severity(row):
    sev = []
    # Significance-based rules, gated on minimum sample sizes
    if pd.notna(row['p_ctr']) and row['impressions'] >= 20000:
        if row['p_ctr'] < 1e-6 or abs(row['z_ctr']) >= 8:
            sev.append(('ctr_drop', 'critical'))
        elif row['p_ctr'] < 1e-3 or abs(row['z_ctr']) >= 5:
            sev.append(('ctr_drop', 'major'))
    if pd.notna(row['p_cvr']) and row['clicks'] >= 1000:
        if row['p_cvr'] < 1e-3 or abs(row['z_cvr']) >= 5:
            sev.append(('cvr_drop', 'major'))
    if pd.notna(row['p_refund']) and row['orders'] >= 80:
        if row['p_refund'] < 1e-6 or abs(row['z_refund']) >= 8:
            sev.append(('refund_spike', 'critical'))
        elif row['p_refund'] < 5e-3 or abs(row['z_refund']) >= 5:
            sev.append(('refund_spike', 'major'))
    if not sev:
        return None, None
    # Report the highest severity level
    if any(s[1] == 'critical' for s in sev):
        t = [s[0] for s in sev if s[1] == 'critical'][0]
        return t, 'critical'
    return sev[0][0], 'major'

df[['anomaly_type', 'severity']] = df.apply(severity, axis=1, result_type='expand')
print(df[['timestamp', 'ctr', 'cvr', 'refund_rate', 'cpc', 'cpa', 'roas',
          'p_ctr', 'p_cvr', 'p_refund', 'z_ctr', 'z_cvr', 'z_refund',
          'anomaly_type', 'severity', 'loss_orders_total', 'loss_sales_total',
          'loss_orders_from_ctr', 'loss_orders_from_cvr',
          'loss_sales_from_ctr', 'loss_sales_from_cvr']])
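As noted above, for an online stream p0 can be refreshed as a rolling median rather than held as a fixed constant. A minimal sketch against the df built in the script above (the 6-hour window and the output path are illustrative assumptions, not recommendations):

# Rolling baseline: median of the trailing 6 hours, shifted by one row so the
# current hour never influences its own baseline (avoids self-masking).
df['p0_ctr_roll'] = df['ctr'].rolling(6, min_periods=3).median().shift(1)
df['p0_cvr_roll'] = df['cvr'].rolling(6, min_periods=3).median().shift(1)
# Persist for the next scoring run (hypothetical path)
df[['timestamp', 'p0_ctr_roll', 'p0_cvr_roll']].to_csv("p0_baseline.csv", index=False)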
Output Interpretation Guide (key rows):
Implementation and Validation (directly applicable)
Ongoing Monitoring and Prevention
Summary
Built for data operations, growth, risk-control, and quality-control teams: an out-of-the-box prompt for intelligent outlier detection and handling that lets teams, without relying on complex tooling, 1) quickly locate anomalies and outliers and quantify their impact; 2) combine statistical judgment with business logic into actionable handling recommendations; 3) produce standardized reports and reusable workflows that shorten cleaning and review cycles; 4) improve metric consistency and decision credibility; and 5) close the loop from problem discovery to resolution at lower cost, driving steady growth of data assets and paid conversion.
Quickly scan large tables for anomalies, generate clear detection reports, and design and validate cleaning plans, significantly shortening report delivery cycles.
Identify abnormal swings in campaigns, channels, and retention, quantify their impact on targets, locate root causes, and define emergency and optimization actions.
Detect abnormal transactions and high-risk samples, distinguish entry errors from suspicious behavior, preserve key evidence, and reduce misjudgment and compliance risk.
Copy the prompt generated by the template into your usual chat app (such as ChatGPT or Claude) and start the conversation directly, with no extra development. Suited to quick personal trials and lightweight use.
Turn the prompt template into an API: your program can modify the template parameters at will and call it directly through the interface, making automation and batch processing straightforward. Suited to developer integration and embedding in business systems.
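A minimal sketch of what such a call could look like in Python (the endpoint URL, token, and parameter names are hypothetical placeholders, not this site's actual API; consult the real API documentation for the exact contract):

import requests

# Hypothetical endpoint and payload fields; substitute the real ones from the API docs.
resp = requests.post(
    "https://api.example.com/v1/prompt-templates/outlier-detection/render",
    headers={"Authorization": "Bearer <YOUR_TOKEN>"},
    json={"params": {"dataset_description": "daily APP-channel KPIs", "language": "en"}},
    timeout=30,
)
resp.raise_for_status()
prompt = resp.json()  # the rendered prompt, ready to send to your chat model
print(prompt)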
Configure the corresponding server address in your MCP client so your AI application can invoke the prompt template automatically. Suited to advanced users and team collaboration, letting prompts move seamlessly across AI tools.