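Understood. I will act as a data analysis expert and help you complete specific data analysis tasks. To keep the analysis accurate, reproducible, and aligned with business goals, please first provide the key information about the task and the data. The collaboration approach and delivery process are outlined below.
1. Information to provide
If you cannot share the full dataset yet, you can start by providing:
2. Standard analysis workflow (can be tailored to your goals)
3. Data submission and samples
4. Reference code skeleton (Python/pandas)
Example (abridged):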
# Import and audit
import numpy as np
import pandas as pd

df = pd.read_parquet("data.parquet")
df.info()  # info() prints its report directly; wrapping it in print() only adds "None"
desc = df.describe(include="all")
null_rate = df.isna().mean().sort_values(ascending=False)
assert df["id"].is_unique

# Outliers (IQR)
def iqr_outlier_mask(s, k=1.5):
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)

# Hypothesis test (two-group means; prefer a nonparametric test when normality is doubtful)
from scipy import stats
g1, g2 = df[df["group"] == "A"]["metric"], df[df["group"] == "B"]["metric"]
stat, p = stats.ttest_ind(g1, g2, equal_var=False, nan_policy="omit")

# Visualization (distribution and box plot)
import seaborn as sns
sns.histplot(df["amount"], bins=50)
sns.boxplot(data=df, x="group", y="amount")
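The hypothesis-test step above recommends a nonparametric test when normality is doubtful. The following is a minimal sketch of one way to make that choice, not a definitive procedure. The helper name `compare_two_groups`, the use of a Shapiro-Wilk check, the `alpha=0.05` cutoff, and the synthetic exponential samples are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def compare_two_groups(g1, g2, alpha=0.05):
    """Hypothetical helper: Welch's t-test if both samples look normal,
    otherwise the Mann-Whitney U test."""
    a = np.asarray(g1, dtype=float)
    b = np.asarray(g2, dtype=float)
    # Drop missing values before testing.
    a, b = a[~np.isnan(a)], b[~np.isnan(b)]

    # Shapiro-Wilk: a small p-value suggests the sample is not normal.
    looks_normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))

    if looks_normal:
        res = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
        name = "welch_t"
    else:
        res = stats.mannwhitneyu(a, b, alternative="two-sided")
        name = "mann_whitney_u"
    return {"test": name,
            "statistic": float(res.statistic),
            "p_value": float(res.pvalue)}


# Synthetic, strongly skewed samples (illustration only, not real data).
rng = np.random.default_rng(0)
skewed_a = rng.exponential(scale=1.0, size=200)
skewed_b = rng.exponential(scale=1.5, size=200)
result = compare_two_groups(skewed_a, skewed_b)
print(result["test"], round(result["p_value"], 4))
```

With large samples, Shapiro-Wilk rejects normality for even small deviations, so a Q-Q plot or domain knowledge may be a better guide than the p-value alone.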
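5. Clarifying questions (please answer as many as you can)
Once you are ready, please share a data sample or the field and metric definitions. I will use them to draw up a targeted analysis plan and a list of deliverables, and provide reproducible code and clear conclusions at every step.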
Understood. I will act as a data analysis expert and focus on accurate, concise, and actionable analysis. Please provide the inputs below so I can proceed efficiently.
What I need from you
Proposed workflow
Minimal Python skeleton (ready to adapt once data is provided)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold, KFold, cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score, f1_score, mean_squared_error
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def load_data(path):
    df = pd.read_csv(path)
    return df

def summarize(df):
    # Collect shape, dtypes, missing counts, and descriptive statistics.
    out = {}
    out['shape'] = df.shape
    out['dtypes'] = df.dtypes.to_dict()
    out['missing_by_col'] = df.isna().sum().sort_values(ascending=False).to_dict()
    out['basic_stats'] = df.describe(include='all').to_dict()
    return out

def flag_outliers_iqr(s, k=1.5):
    # True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (s < lower) | (s > upper)

def build_pipeline(df, target, problem_type='classification'):
    X = df.drop(columns=[target])
    y = df[target]
    num_cols = X.select_dtypes(include=['number']).columns.tolist()
    cat_cols = X.select_dtypes(exclude=['number']).columns.tolist()
    preproc = ColumnTransformer(
        transformers=[
            ('num', StandardScaler(with_mean=False), num_cols),
            # sparse_output replaces the older `sparse` argument (scikit-learn >= 1.2).
            ('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=False), cat_cols),
        ],
        remainder='drop',
    )
    if problem_type == 'classification':
        model = LogisticRegression(max_iter=200, n_jobs=None)
        pipe = Pipeline(steps=[('pre', preproc), ('clf', model)])
    else:
        model = Ridge(alpha=1.0)
        pipe = Pipeline(steps=[('pre', preproc), ('reg', model)])
    return X, y, pipe

def evaluate(pipe, X, y, problem_type='classification'):
    if problem_type == 'classification':
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
        scorer = 'roc_auc'
    else:
        cv = KFold(n_splits=5, shuffle=True, random_state=42)
        scorer = 'neg_root_mean_squared_error'
    scores = cross_val_score(pipe, X, y, cv=cv, scoring=scorer, n_jobs=-1)
    return scores

# Time-series helpers
from sklearn.model_selection import TimeSeriesSplit
import statsmodels.api as sm

def ts_decompose(y, period):
    return sm.tsa.seasonal_decompose(y, period=period, model='additive', two_sided=False)

def rolling_origin_cv(y, X=None, splits=5):
    # Yield (train_idx, test_idx) pairs in which training data always precedes test data.
    tscv = TimeSeriesSplit(n_splits=splits)
    for train_idx, test_idx in tscv.split(y):
        yield train_idx, test_idx
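To make the rolling-origin idea concrete, here is a small sketch of what `TimeSeriesSplit` yields on a toy series. The 12-point series and `n_splits=3` are assumptions chosen only to keep the output readable.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series of 12 time-ordered observations (illustration only).
y = np.arange(12)

tscv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in tscv.split(y)]

for i, (train, test) in enumerate(folds, start=1):
    # The training window expands; the test window always comes later in time.
    print(f"fold {i}: train={train[0]}..{train[-1]}  test={test[0]}..{test[-1]}")
# fold 1: train=0..2  test=3..5
# fold 2: train=0..5  test=6..8
# fold 3: train=0..8  test=9..11
```

Because every test window lies strictly after its training window, this scheme avoids the look-ahead leakage that shuffled K-fold would introduce on time-ordered data.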
Quality and rigor practices I will follow
Next step
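Received. I will carry out the task as a data analysis expert and strictly follow the accuracy and technical requirements. To run the analysis efficiently, please provide the following information and a structured description of the data.
Required information and data format
Standard analysis workflow (to be customized based on the data and goals you provide)
3) Feature engineering and modeling (if prediction, classification, or clustering is required)
Notes on outlier identification
Recommendations for providing data
Next step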
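Quickly turns scattered numeric summaries into an "outlier list plus verification suggestions", pinpointing risk areas even without charts or explicit thresholds. It helps operations, product, risk control, finance, and data analysis roles shorten troubleshooting time and improve data quality and the credibility of decisions. For each outlier it provides possible causes, correction and sampling suggestions, segment comparisons, and a follow-up action list, producing a summary of conclusions that can be reused directly. It supports reuse across languages and scenarios, covering high-frequency cases such as weekly reports, retrospectives, ad hoc alerts, and pre-launch checks.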
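As a rough illustration of turning a numeric summary into an outlier list without charts, the sketch below applies the IQR fences used earlier to summary statistics alone. The function name `flag_from_summary`, the dictionary layout, and the sample figures are hypothetical; a real summary may expose different fields.

```python
def flag_from_summary(summary, k=1.5):
    """Hypothetical helper: flag columns whose min or max falls outside
    the IQR fences [Q1 - k*IQR, Q3 + k*IQR], using summary stats only."""
    findings = []
    for col, s in summary.items():
        iqr = s["q3"] - s["q1"]
        lower, upper = s["q1"] - k * iqr, s["q3"] + k * iqr
        if s["min"] < lower:
            findings.append((col, "min", s["min"], f"below lower fence {lower:g}"))
        if s["max"] > upper:
            findings.append((col, "max", s["max"], f"above upper fence {upper:g}"))
    return findings


# Made-up summary values for illustration only.
summary = {
    "order_amount": {"min": 1.0, "q1": 20.0, "q3": 60.0, "max": 5000.0},
    "items_per_order": {"min": 1.0, "q1": 1.0, "q3": 3.0, "max": 5.0},
}

findings = flag_from_summary(summary)
for col, stat, value, reason in findings:
    print(f"{col}: {stat}={value:g} is {reason}; verify against raw records")
# order_amount: max=5000 is above upper fence 120; verify against raw records
```

A flag from summary statistics only says that at least one extreme value exists. How many records are affected, and whether they are errors, still has to be checked against the raw data.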