Popular roles are not just a source of inspiration; they are also your productivity assistants. With carefully curated role prompts, you can quickly generate high-quality content, spark new ideas, and find the solution that best fits your needs. Creation becomes easier, and the value is delivered more directly!
We continuously update the role library based on different user needs, so you can always find a suitable entry point for inspiration.
This prompt is designed for AI/ML engineers and provides hyperparameter tuning plans for specific machine learning models. Through a systematic analysis framework, it accurately identifies a model's key tuning parameters, including the learning rate, regularization coefficients, network architecture parameters, and other core settings. The output follows a professional technical-documentation style with a clear structure and rigorous logic, helping engineers quickly grasp the key points of model tuning and improve the efficiency of performance optimization. It applies to parameter tuning for supervised learning, unsupervised learning, and deep learning models.
| Parameter | Function | Common Range (CTR scenarios) | Tuning Advice |
|---|---|---|---|
| objective | Loss function and task type | binary:logistic | Use binary:logistic for CTR probability outputs; binary:logitraw returns raw scores (offline analysis) |
| eval_metric | Evaluation metric | auc, aucpr, logloss | AUC is robust, PR-AUC is sensitive to rare positives; online systems usually track logloss and calibration; multiple metrics can be used together |
| booster | Base learner | gbtree, dart | gbtree is the default; try dart (dropout-style GBDT) when overfitting is severe and stronger regularization is needed |
| tree_method | Histogram construction / acceleration | hist, gpu_hist | Prefer hist for large-scale or sparse data; use gpu_hist with a GPU; exact is not recommended |
| eta (learning_rate) | Learning rate | 0.01–0.2 | A small eta with more rounds is more robust; 0.03–0.1 is the common range; the smaller eta is, the more boosting rounds are needed |
| num_boost_round / n_estimators | Number of weak learners | 200–3000 | Combine with early stopping; with eta=0.05 the optimum often falls in 600–2000; set an upper bound, then stop early |
| early_stopping_rounds | Early-stopping rounds | 50–200 | Stop once the validation metric plateaus; avoids overfitting and wasted iterations |
| max_depth | Maximum tree depth (depthwise) | 3–8 | Deeper trees overfit more easily; CTR models usually use 4–7; with lossguide, prefer max_leaves to control complexity |
| grow_policy | Tree growth policy | depthwise, lossguide | lossguide (leaf-wise) is often effective on sparse CTR data and requires max_leaves; depthwise converges more stably |
| max_leaves | Leaf-count limit (lossguide) | 31–255 | Controls model complexity; start at 31/63 and adjust based on overfitting; tune jointly with min_child_weight |
| min_child_weight | Minimum sum of hessians in a child | 1–20 | Increase to suppress overfitting. For binary logistic loss the per-sample hessian is ≤ 0.25, so it is roughly equivalent to "at least min_child_weight / 0.25 samples"; e.g. a value of 5 ≈ at least 20 samples |
| gamma (min_split_loss) | Minimum loss reduction required to split | 0–5 | Increase to suppress fragmented splits; start at 0 and raise gradually to 0.1–2 to control overfitting |
| subsample | Row sampling ratio (per tree) | 0.6–1.0 | 0.7–0.9 is common; reduce to 0.6–0.8 against overfitting, but values that are too small increase variance |
| colsample_bytree | Feature sampling (per tree) | 0.5–1.0 | 0.6–0.9 is common; lower it somewhat for high-dimensional sparse features to reduce correlation |
| colsample_bylevel | Feature sampling (per level) | 0.5–1.0 | Secondary knob; usually keep at 1 or reduce slightly |
| colsample_bynode | Feature sampling (per split) | 0.5–1.0 | Interacts with bytree/bylevel; a slight reduction helps against overfitting |
| reg_lambda | L2 regularization | 1–10 | Often effective for CTR; start at 1–2 and increase to 5–10 when overfitting |
| reg_alpha | L1 regularization | 0–5 | Promotes sparsity and robustness in sparse, high-dimensional settings; start at 0–0.5, go up to 1–3 if needed |
| scale_pos_weight | Positive/negative sample weight ratio | 0.5–2x of (negatives / positives) | Start from the global ratio, then search a small range; larger values help recall but may hurt calibration, so follow up with probability calibration |
| max_delta_step | Maximum per-step weight update | 0–10 | Setting 1–10 stabilizes training on imbalanced data; 1–3 is typical |
| base_score | Initial prediction | 0.1–0.5 | Approximates the prior CTR; usually left untouched, or set to the validation-set mean CTR |
| max_bin | Number of histogram bins | 256–1024 | Larger values improve precision but cost memory/GPU; 256–512 is typical with gpu_hist |
| enable_categorical | Native categorical-feature support | True/False | Native categorical splits require hist/gpu_hist and integer-encoded categories; assess memory for high-cardinality features |
| max_cat_to_onehot | One-hot threshold for small cardinalities | 4–32 | One-hot encode small categories, use optimal partitioning for large ones; for high-cardinality CTR features, consider raising to 8–16 |
| interaction_constraints | Restrict feature interactions | list | Prevents unreasonable interactions; controls complexity and leakage |
| monotone_constraints | Monotonicity constraints | vector | Improves interpretability and robustness when a monotonic business relationship is known |
| dart.rate_drop | DART dropout rate | 0.0–0.3 | Only effective with booster=dart; try 0.05–0.2 when overfitting is pronounced |
| dart.skip_drop | Probability of skipping dropout | 0–0.9 | Affects variability; the default of 0 is fine, adjust only if needed |
| random_state / seed | Random seed | integer | Keep results reproducible; averaging over several different seeds is more stable |
| nthread / gpu_id | Compute-resource settings | - | CPU parallelism / GPU selection; ensure sufficient resources to avoid stalls |
| single_precision_histogram | Single-precision histograms | True/False | Faster and lighter on memory; tiny numerical differences may occur in rare cases |
The plan above is based on common CTR practice and the XGBoost documentation and can serve as a framework for systematic tuning and production rollout. Actual values should be validated iteratively on a validation set, taking data scale, sparsity, and business metrics into account.
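The snippet below is a minimal training sketch that wires these parameters together with the xgboost Python package. The arrays `X_train`, `y_train`, `X_valid`, `y_valid` and all concrete values are illustrative assumptions drawn from the ranges above, not tuned results; treat it as a starting point to validate on your own data.

```python
import xgboost as xgb

# Assumed inputs: X_train/X_valid as numpy or scipy sparse matrices, y_train/y_valid as 0/1 labels.
dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

pos = y_train.sum()
neg = len(y_train) - pos

params = {
    "objective": "binary:logistic",
    "eval_metric": ["auc", "logloss"],
    "tree_method": "hist",
    "eta": 0.05,
    "max_depth": 6,
    "min_child_weight": 5,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "reg_lambda": 2.0,
    "scale_pos_weight": neg / pos,   # start from the global negative/positive ratio
    "seed": 42,
}

bst = xgb.train(
    params,
    dtrain,
    num_boost_round=3000,            # upper bound; early stopping picks the actual count
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=100,
    verbose_eval=100,
)
print("best iteration:", bst.best_iteration)
```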
| Parameter | Function | Common Range | Tuning Advice |
|---|---|---|---|
| optimizer (AdamW) | Adam with decoupled weight decay, standard for Transformer fine-tuning | AdamW standard | Use AdamW unless constrained; stable and well-supported |
| learning_rate | Step size for weight updates | 1e-5 to 5e-5 (typical), occasionally 7e-5 with large batches | Start at 2e-5 or 3e-5. Sweep {1e-5, 2e-5, 3e-5, 5e-5}. Scale up slightly with larger effective batch size |
| betas (β1, β2) | Momentum terms for Adam | (0.9, 0.999) | Keep default; only adjust β2 to 0.98 if you observe very slow adaptation on small data |
| adam_epsilon | Numerical stability in Adam | 1e-8 to 1e-6 | For fp16/bf16, 1e-8 or 1e-6 both work; try 1e-8 first |
| weight_decay | L2-like regularization (decoupled) | 0.0 to 0.1 (typical 0.01) | 0.01 is a strong default. Consider 0.02–0.05 if overfitting on very small datasets |
| lr_scheduler_type | Shapes LR over steps | linear, cosine, cosine_with_restarts, constant | linear is a strong default; cosine may help stability for longer runs |
| warmup_ratio / warmup_steps | Gradual LR ramp-up to stabilize early updates | 0.06–0.1 of total steps, or 100–2000 steps depending on run length | Use warmup_ratio 0.06–0.1 for short fine-tunes; tune ±2% around a chosen ratio |
| num_train_epochs | Total passes over training data | 2–5 typical; up to 10 on tiny datasets with early stopping | Start at 3–4; add early stopping to avoid overfitting |
| per_device_train_batch_size | Number of samples per device per step | 8–64 (memory-dependent) | Choose the largest stable batch. If small batches, use gradient accumulation |
| gradient_accumulation_steps | Accumulate grads to simulate larger batch | 1–8 | Adjust to reach effective batch size 32–256 without OOM |
| effective_batch_size | per_device_batch × accumulation × num_devices | 32–256 effective | For LR scaling, roughly linearly scale LR with effective batch size within reasonable bounds |
| max_grad_norm | Gradient clipping to prevent exploding grads | 0.5–1.0 (common 1.0) | Keep at 1.0; lower to 0.5 if you see instability spikes |
| eval_steps / evaluation_strategy | Validation frequency for early stopping | every 100–1000 steps or per epoch | Higher frequency for small datasets to react early to overfitting |
| seed | Controls initialization and data shuffling | integer | Run 3–5 seeds to report mean and std; fix seed for final model |
| mixed_precision | Faster training with reduced precision | fp16 or bf16 (hardware dependent) | Use bf16 if supported; otherwise fp16. Monitor loss scaling if using fp16 |
| gradient_checkpointing | Trades compute for memory | on/off | Enable when memory-limited; expect slower wall-clock |
| max_seq_length | Maximum tokenized length | 128–256 typical; 512 for long reviews | Prefer 128 for short Chinese sentiment tasks to maximize throughput; profile 256 if texts are longer |
| padding/truncation strategy | How sequences are padded/truncated | dynamic padding; truncation=longest_first | Use dynamic padding for efficiency; increase max_seq_length only if accuracy requires it |
| attention_dropout (encoder) | Dropout on attention probabilities | 0.1 (BERT default) | Usually keep default; can raise to 0.15 on very small data if reloading with config override |
| hidden_dropout_prob (encoder) | Dropout on hidden states | 0.1 (BERT default) | Keep at 0.1; 0.1–0.2 if small data and you reload with modified config |
| classifier_dropout | Dropout before classification head | 0.1–0.3 | 0.1 default; try 0.2–0.3 on small/noisy datasets |
| label_smoothing_factor | Softens hard labels to reduce overconfidence | 0.0–0.1 | 0.05 is a good starting value for small or noisy data |
| class_weights | Reweights classes for imbalance | proportional to 1/freq | Use when class imbalance harms macro-F1. Validate that loss decreases stably |
| freeze_encoder_layers | Freeze early Transformer layers | 0–10 layers | Consider freezing 6–10 layers on very small data to reduce overfitting and speed training |
| layerwise_lr_decay (LLRD) | Decrease LR for lower layers | 0.8–0.95 decay factor | Start at 0.95; combine with slightly higher top LR (e.g., head ×2 LR) |
| reinit_top_layers | Re-initialize top k Transformer layers | 0–2 | Try 1–2 on small data to reduce catastrophic forgetting; stabilize with warmup |
| pooling_strategy | CLS vs mean pooling | CLS (default), mean | CLS is standard; mean pooling can help on noisy short texts—benchmark both |
| num_labels | Number of sentiment classes | 2 or 3 typically | Ensure consistent loss/metrics; reinitialize classification head when changed |
| early_stopping_patience | Stop if no val improvement | 2–3 eval checks | Use with moving average metrics; monitor macro-F1 if classes are imbalanced |
| dataloader_num_workers | Parallelism for data loading | 2–8 (system-dependent) | Tune for throughput; avoid CPU bottlenecks |
| fp16/bf16 loss scaling | Stabilizes mixed-precision training | dynamic/static | Use dynamic scaling (framework default) to avoid underflow |
This guidance is based on established practice for BERT fine-tuning and should be validated on the target dataset with controlled experiments.
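As a companion to the table, here is a minimal Hugging Face Trainer sketch for a small Chinese sentiment-classification fine-tune. The checkpoint name `bert-base-chinese`, the `train_dataset`/`eval_dataset` objects (assumed to be pre-tokenized with a `labels` column), and all hyperparameter values are assumptions chosen from the ranges above, not a definitive recipe.

```python
import numpy as np
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-chinese"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"macro_f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="bert-sentiment",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.06,
    weight_decay=0.01,
    num_train_epochs=3,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,
    max_grad_norm=1.0,
    label_smoothing_factor=0.05,
    evaluation_strategy="steps",   # renamed to eval_strategy in newer transformers releases
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",
    bf16=True,                     # fall back to fp16=True if bf16 is unsupported
    seed=42,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,   # assumed pre-tokenized datasets
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```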
| Parameter | Function / Role | Common Range | Tuning Guidance |
|---|---|---|---|
| n_clusters (k) | Number of clusters; determines the bias-variance trade-off. Increasing k monotonically decreases inertia but risks over-segmentation and lower interpretability. | Roughly 3–12 for initial EC segmentation exploration; 2–20 in general | Select via the elbow method (inertia vs. k), silhouette score (k ∈ [2, 15]), CH/DB indices, and bootstrap stability. Also factor in the minimum viable cluster size and operational feasibility for the business. |
| init | Initial centroid selection. k-means++ stabilizes and speeds up convergence; random has high variance and needs a larger n_init. | 'k-means++' (recommended) / 'random' / known initial centroids | Use 'k-means++' by default. Initialize manually only when there is a clear hypothesis about the initial centroids. |
| n_init | Number of initializations; the run with the best inertia is kept, reducing the risk of local optima. | 10–50 (30–100 for high-dimensional, large-scale, or large-k problems) | Increase as time allows. Use a higher value (≥50) when the cluster structure is weak or unstable. |
| max_iter | Upper bound on Lloyd iterations until convergence. | 100–300 (500–1000 for hard problems) | Increase if runs frequently hit the iteration limit without converging; 300 is usually sufficient. |
| tol | Convergence tolerance (stop when the relative improvement in inertia falls below this value). | 1e-4 (1e-3 for speed, 1e-5 for precision) | Time-versus-precision trade-off: 1e-4 to 1e-3 for initial exploration, 1e-4 to 1e-5 for the final fit. |
| algorithm | Optimization algorithm. 'elkan' uses the triangle inequality for speedups (dense, Euclidean data). | 'elkan' (recommended for dense data) / 'lloyd' (sparse-data compatible) | Use 'elkan' for dense data with a moderate number of features (up to a few dozen); use 'lloyd' for sparse matrices or special cases. |
| random_state | Random seed for reproducibility. | any integer | Fix it for experiment comparisons and production deployment. |
| copy_x | Whether to copy the input X. False saves memory but may overwrite the input. | True/False | Use False only under tight memory constraints; True is the safe default. |
Following the above, tuning the main KMeans hyperparameters for EC user segmentation along the flow of "proper preprocessing → systematic search over k → multiple initializations → stability evaluation → business alignment" yields segmentation that is both reproducible and interpretable.
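The sketch below illustrates that flow with scikit-learn. The input `X` is assumed to be a dense (n_users, n_features) matrix of EC user features (for example RFM-style aggregates), and the k range, `n_init`, and other values are illustrative defaults drawn from the table, not tuned choices.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumed input: X is an (n_users, n_features) array of EC behavior features.
X_scaled = StandardScaler().fit_transform(X)   # preprocessing: standardize before KMeans

results = []
for k in range(2, 13):                         # systematic search over k = 2..12
    km = KMeans(
        n_clusters=k,
        init="k-means++",
        n_init=30,          # multiple restarts to reduce the local-optimum risk
        max_iter=300,
        tol=1e-4,
        random_state=42,
    )
    labels = km.fit_predict(X_scaled)
    results.append((k, km.inertia_, silhouette_score(X_scaled, labels)))

# Inspect the elbow (inertia) and silhouette curves, then pick k with business constraints in mind.
for k, inertia, sil in results:
    print(f"k={k:2d}  inertia={inertia:12.1f}  silhouette={sil:.3f}")
```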
When rolling out a new model, quickly identify the key parameters, draw up a staged experiment plan, cut down on large-scale blind trials, and submit clear weekly tuning reports.
In multi-dataset comparisons, generate standardized parameter settings and result summaries, build reusable baselines, and guide subsequent feature and sampling strategies.
Unify the tuning process and documentation structure across the team, reference the template directly when assigning tasks, speed up reviews and retrospectives, and shorten delivery cycles.
Copy the prompt generated by the template into your usual chat application (such as ChatGPT or Claude) and use it directly in conversation, with no extra development. Suitable for quick personal trials and lightweight use cases.
Turn the prompt template into an API: your program can modify the template parameters freely and call it directly through the interface, making automation and batch processing easy. Suitable for developer integration and embedding into business systems.
Configure the corresponding server address in your MCP client so that your AI application can call the prompt template automatically. Suitable for advanced users and team collaboration, letting prompts work seamlessly across different AI tools.