Machine Learning Algorithm Recommendation Expert


This prompt is designed for AI/ML engineers. Given a specific data type and problem scenario, it recommends the most suitable machine learning algorithms. Using a systematic analysis framework that weighs data characteristics, problem complexity, and compute resources, it returns 10 highly relevant algorithm recommendations, each with applicable scenarios, key strengths, and implementation caveats. It is especially useful in practical engineering settings such as algorithm selection, technical solution design, and project planning, helping engineers settle on a technical route quickly and develop more efficiently.

Problem Summary

  • Data type: mixed Chinese-English long documents (meeting transcripts, email minutes, agendas), roughly 20,000–60,000 characters each, containing colloquialisms, abbreviations, timestamps, and person names.
  • Task: produce a 200–300-character Chinese executive summary plus 3 action items (with key figures, owners, and deadlines); expand each acronym in full at first occurrence; deduplicate and merge repeated topics across sections; reduce hallucination.
  • Evaluation and constraints: primarily ROUGE-L, factual consistency, and coverage; end-to-end latency <5 s; medium compute.
  • Key challenges: long-document processing, code-switched (Chinese-English) text, verifiable facts, structured action-item extraction, deduplication and topic merging, and length- and element-controlled generation.

Recommended Algorithms (ranked by applicability)

  1. Hierarchical Retrieval-Augmented Abstractive Summarization (HRAS)
  • Applicable scenario: factual summarization of very long documents. Retrieve key passages first, then let a small model generate the summary, balancing coverage and speed.
  • Main advantages: markedly lower hallucination (evidence-grounded); scales to long documents; preserves figures and person names.
  • Potential limitations: insufficient retrieval recall drops information; requires good bilingual semantic embeddings.
  • Typical applications: executive summaries of corporate weekly reports and meeting minutes; condensing long policy documents.
  2. MMR-guided extractive skeleton (TextRank/PositionRank + MMR)
  • Applicable scenario: build a factual "skeleton" of sentences to support later generation, prioritizing sentences with numbers, entities, and conclusions (see the MMR sketch after this list).
  • Advantages: fast, interpretable, low hallucination; diversifies away redundant content.
  • Limitations: extractive output is less concise; cross-lingual syntactic differences affect ranking.
  • Cases: news key-point extraction; sentence selection from technical review minutes.
  3. Structured action-item extraction (sequence labeling / pointer-style span extraction + relation extraction)
  • Applicable scenario: extract {owner, task, deadline, metric} quadruples from the source text.
  • Advantages: structured output directly satisfies the action-item requirement; constraints on dates/names/figures can be enforced.
  • Limitations: cross-sentence dependencies require coreference resolution; domain transfer needs a small amount of annotation.
  • Cases: project-management logs; closed-loop measure extraction in customer-service QA.
  4. Acronym detection and expansion (Schwartz-Hearst + semantic disambiguation)
  • Applicable scenario: output "full form (acronym)" at first occurrence, then the acronym thereafter.
  • Advantages: rules are reliable and fast; contextual semantics disambiguate polysemous acronyms.
  • Limitations: rare non-standard acronyms need a fallback to contextual retrieval or a glossary.
  • Cases: acronym-table generation for research papers and technical documents.
  5. QA/NLI-based factual consistency checking (QA/NLI Factuality)
  • Applicable scenario: verify summary claims against the source and filter or rewrite inconsistent content.
  • Advantages: improves factual-consistency scores; acts as a post-generation gatekeeper.
  • Limitations: adds some latency; cross-lingual NLI/QA requires strong models.
  • Cases: fact-checking of news/report summaries (the FactCC/QAFactEval paradigm).
  6. Cross-lingual coreference resolution and speaker attribution (span-based coref + rules)
  • Applicable scenario: merging repeated topics, unifying person/team aliases, attributing action items.
  • Advantages: reduces cross-section repetition and ambiguity; more stable action-item owners.
  • Limitations: mixed Chinese names, English aliases, and team names require a custom dictionary.
  • Cases: entity unification in meeting transcripts and dialogue summarization.
  7. Topic segmentation and semantic-clustering deduplication (TextTiling/paragraph splitting + SBERT/HDBSCAN + SimHash)
  • Applicable scenario: topic segmentation of long documents, merging repeated topics, cross-section deduplication.
  • Advantages: markedly better coverage and non-redundancy; effective on colloquial repetition.
  • Limitations: thresholds need tuning; colloquial noise blurs boundary detection.
  • Cases: topic aggregation for forum threads and meeting minutes.
  8. Copy-augmented generation (Pointer-Generator + Coverage)
  • Applicable scenario: abstractive compression that must preserve numbers, names, and proper nouns.
  • Advantages: strong copying lowers number/entity hallucination; coverage suppresses repetition.
  • Limitations: weaker cross-section integration than retrieval augmentation; needs a skeleton or retrieval to assist.
  • Cases: long-document compression; report-style summaries.
  9. Constrained decoding (length and lexical constraints, lexically constrained beam search)
  • Applicable scenario: hard output-format constraints (200–300 characters; must include names/figures/deadlines/first-occurrence full forms).
  • Advantages: stable, controllable output; can force inclusion of key elements and terms.
  • Limitations: overly strong constraints hurt fluency; needs high-quality candidate phrases.
  • Cases: legal/compliance summaries; templated reports.
  10. Lightweight Transformer compressive summarization (distilled mT5/mBART or edit-based LaserTagger)
  • Applicable scenario: the final production model for generating the Chinese summary within the 5-second latency budget.
  • Advantages: small models keep latency low; edit-based methods are more stable with fewer hallucinations.
  • Limitations: long documents still need retrieval/skeleton support; pure generation easily loses context.
  • Cases: mobile/online summarization; real-time recaps within customer-service SLAs.
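To make the MMR step in algorithm 2 concrete, here is a minimal selection sketch in Python. The random unit vectors stand in for SBERT embeddings, and the sizes and λ value are illustrative assumptions, not prescribed by the recommendation above.

```python
# Minimal MMR sketch over precomputed sentence vectors (numpy only).
import numpy as np

def mmr_select(doc_vec, sent_vecs, k=20, lam=0.7):
    """Pick k sentence indices balancing relevance to the document
    against redundancy with already-selected sentences."""
    sims = sent_vecs @ doc_vec                    # relevance to the document
    selected, candidates = [], list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        if selected:
            red = np.max(sent_vecs[candidates] @ sent_vecs[selected].T, axis=1)
        else:
            red = np.zeros(len(candidates))
        scores = lam * sims[candidates] - (1 - lam) * red
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage: random unit vectors standing in for real SBERT embeddings.
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 384)); S /= np.linalg.norm(S, axis=1, keepdims=True)
d = S.mean(axis=0); d /= np.linalg.norm(d)
print(mmr_select(d, S, k=5))
```

λ trades relevance against diversity: values near 0.7, as suggested in the hyperparameter section below, favor relevance while still penalizing near-duplicate skeleton sentences.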

Technical Implementation Suggestions

  • Overall pipeline (meets <5 s on GPU / medium compute)

    1. Preprocessing (~0.3 s)
      • Strip timestamps while keeping "date/week/deadline" expressions; normalize mixed Chinese/English punctuation and casing; split sentences with a mixed-language segmenter.
      • Fast rule-based tagging of person/team names (dictionary + regex) and of numbers/percentages/amounts/dates.
    2. Topic segmentation and deduplication (~0.6 s)
      • Paragraph-level segmentation (TextTiling / semantic-shift detection) → sentence vectors (multilingual SBERT/MiniLM) → HDBSCAN clustering;
      • SimHash deduplication of similar sentences (suggested threshold 0.9); prefer keeping sentences that carry figures and conclusions.
    3. Skeleton extraction (~0.6 s)
      • PositionRank/TextRank top-N sentences (N ≈ 20–40 depending on length); diversify with MMR (λ ≈ 0.7).
      • Priority scoring: +0.2 weight for sentences containing numbers, names, or deadlines.
    4. Action-item extraction (~0.7 s)
      • Sequence labeling (BiLSTM-CRF or a small RoBERTa-CRF) for entities/times/metrics;
      • Relation extraction pairs up {Owner, Task, Deadline, Metric}; fill gaps with rules (e.g., "this Friday", "EOW").
    5. Acronym expansion (~0.2 s)
      • Schwartz-Hearst detects acronym/long-form pairs (a simplified sketch follows this list); polysemous acronyms are resolved by semantic similarity against an in-domain glossary; insert "full form (acronym)" at the first occurrence.
    6. Retrieval-augmented generation (~1.2 s)
      • Build an index over the skeleton sentences; for each candidate point/topic retrieve the top-k (k ≈ 5–8 passages) as evidence;
      • A small distilled mT5/mBART or edit-based model generates the 200–300-character Chinese executive summary, with a copy pointer or constrained decoding enabled:
        • Length constraint: at least 190 and at most 310 characters;
        • Lexical constraints: must include the top key figures, names, and deadlines; the first mention of an acronym pairs the long and short forms.
    7. Fact checking and rewriting (~0.9 s)
      • Extract claims from the summary (figures, trends, dates, names) → verify them against source evidence via QA/NLI (keep if score ≥0.8);
      • On inconsistency, fall back to the extractive wording from the source or drop the claim.
    8. Action-item output
      • Emit 3 executable items from the extracted quadruples; if more than 3, rank by urgency (nearest deadline) plus impact (amounts or traffic share involved) and keep the top 3.
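A simplified sketch of the Schwartz-Hearst detection in step 5, assuming the common "long form (ACRONYM)" layout and matching acronym letters against word initials; the published algorithm matches characters inside words and handles more edge cases.

```python
import re

def schwartz_hearst(text):
    """Find (acronym -> long form) pairs of the shape 'long form (ABBR)'.
    Simplified: match acronym letters right-to-left against the initials
    of the words preceding the parenthesis."""
    pairs = {}
    for m in re.finditer(r"\(([A-Z][A-Za-z]{1,9})\)", text):
        abbr = m.group(1)
        words = text[:m.start()].split()
        # Schwartz-Hearst search window: at most min(|A|+5, 2|A|) words.
        window = words[-min(len(abbr) + 5, 2 * len(abbr)):]
        i, j = len(abbr) - 1, len(window) - 1
        while i >= 0 and j >= 0:
            if window[j][0].lower() == abbr[i].lower():
                i -= 1
            j -= 1
        if i < 0:  # every acronym letter matched a word initial
            pairs[abbr] = " ".join(window[j + 1:])
    return pairs

print(schwartz_hearst("We track the service level agreement (SLA) weekly."))
# {'SLA': 'service level agreement'}
```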
  • Engineering and optimization

    • Model sizes: encoder-class models ≤150–300M parameters; generator ≤300–600M; prefer INT8 quantization and FP16 inference.
    • Parallelism and caching: batch the sentence embeddings; in-memory FAISS index (a minimal indexing sketch follows this list); pipeline parallelism (run extraction and acronym expansion concurrently).
    • Fault-tolerant fallbacks: fact check fails → degrade to an extractive summary; generation exceeds 1.2 s → output the compressed skeleton directly.
    • Evaluation:
      • Automatic: ROUGE-1/2/L, coverage (claim recall), consistency (QAFactEval score);
      • Human spot checks: accuracy of numbers, names, and deadlines; first-occurrence acronym-expansion correctness; cross-section deduplication rate.
    • Domain adaptation: build industry acronym glossaries (finance, traffic, release management, etc.) and person/team entity dictionaries (synced with HR/AD).
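A minimal sketch of the in-memory FAISS index mentioned above; the dimensionality and random vectors are placeholders for real multilingual SBERT/MiniLM embeddings of the skeleton sentences.

```python
# In-memory FAISS index for skeleton sentences (exact inner-product search).
import faiss
import numpy as np

dim = 384
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(200, dim)).astype("float32")
faiss.normalize_L2(skeleton)            # cosine similarity via inner product

index = faiss.IndexFlatIP(dim)          # exact search is fine at this scale
index.add(skeleton)

query = rng.normal(size=(1, dim)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 8)    # top-k ≈ 5–8 evidence passages
print(ids[0], scores[0])
```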
  • Key hyperparameter suggestions

    • Sentence chunking: 300–500 characters per chunk;
    • Retrieval top-k: 6–8 passages at topic level, 3–5 for action-item verification;
    • MMR λ = 0.6–0.8; similarity thresholds (clustering/dedup) 0.8–0.9;
    • Constrained decoding: coverage penalty and a repetition penalty of 1.2–1.5; required-token list = figures, names, deadlines, and first-occurrence acronym full forms (a post-hoc validator sketch follows).
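As a post-hoc complement to constrained decoding, a validator sketch that checks the length window and required elements and would trigger regeneration on failure; the field values and the acronym check are illustrative assumptions.

```python
# Post-hoc output validator (sketch): checks constraints after generation.
import re

def validate_summary(summary, required_terms, min_len=190, max_len=310):
    problems = []
    if not (min_len <= len(summary) <= max_len):
        problems.append(f"length {len(summary)} outside [{min_len}, {max_len}]")
    for term in required_terms:
        if term not in summary:
            problems.append(f"missing required element: {term}")
    # First-occurrence rule: an ASCII acronym should first appear as '(ABBR)'
    # right after its full form (full-width parentheses not handled here).
    for abbr in set(re.findall(r"[A-Z]{2,10}", summary)):
        first = summary.find(abbr)
        if summary[first - 1:first] != "(":
            problems.append(f"acronym {abbr} not expanded at first use")
    return problems

# Hypothetical required elements: a date, an owner, a key figure.
print(validate_summary("摘要…", ["3月5日", "王伟", "15%"]))
```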

Summary

Taken together, this stack delivers high-consistency, low-hallucination Chinese executive summaries and action items within the <5 s latency budget on medium compute; the closed loop of "extractive skeleton + retrieved evidence + controlled generation + fact checking" balances coverage, accuracy, and stability.

1) Problem Summary

  • Data: 5k–20k user reviews and Q&A per product; bilingual (English + Chinese), with emojis and colloquial expressions.
  • Task: Generate English marketing copy — title (≤60 chars), 5 bullet points (each ≤120 chars), short description, and FAQ draft. Tone: trustworthy, no over-promising. Must avoid contradictions with review evidence and remove prohibited/exaggerated claims.
  • Constraints: Safety/compliance filtering, factual grounding in reviews, length control, readability and conversion orientation prioritized.
  • Resources: Medium compute; practical, deployable NLP stack.
  • Goal: Select algorithms that together produce grounded, safe, concise English copy from noisy bilingual UGC.

2) Recommended Algorithms (ranked by applicability)

  1. Retrieval-Augmented Generation (RAG) with Multilingual Dense Retrieval
  • Applicable scenario: Ground the generator on the most relevant, high-signal review/Q&A snippets per product, across English/Chinese.
  • Main advantages:
    • Reduces hallucination by citing product-specific evidence.
    • Naturally handles bilingual input via multilingual embeddings.
    • Scales to 5k–20k docs with fast ANN indexes.
  • Potential limitations:
    • Quality depends on retriever recall; poor retrieval leads to weak grounding.
    • Needs careful prompt/context formatting to avoid context overflow.
  • Typical application case: Retrieve top 20–50 snippets about “battery life,” “weight,” “durability,” and feed them to the generator to produce title, bullets, FAQ grounded in reviews (a minimal retrieval sketch follows this list).
  2. Multilingual Instruction-Tuned Seq2Seq Transformer (e.g., mT5-like) for Copywriting
  • Applicable scenario: Core generator that converts retrieved snippets into structured English marketing copy with style control.
  • Main advantages:
    • Strong controllable generation for structured outputs (title, bullets, description, FAQ).
    • Handles code-switch input; outputs English consistently.
    • Compatible with parameter-efficient fine-tuning (LoRA/PEFT) on medium compute.
  • Potential limitations:
    • Requires curated instruction data (task formats + safe tone).
    • Without grounding may over-generalize; best paired with RAG.
  • Typical application case: “Plan-and-generate” prompt that asks the model to produce: Title, 5 Bullets, Short Description, FAQ based on retrieved evidence.
  3. Aspect-Based Sentiment Analysis (ABSA) with Multilingual Transformers
  • Applicable scenario: Extract product aspects (battery, weight, build, fit, material, etc.) and their polarity from mixed-language reviews.
  • Main advantages:
    • Ensures generated claims align with sentiment (“battery lasts ~2 days” vs “all-day”).
    • Helps select salient, positively perceived aspects for bullets.
  • Potential limitations:
    • Requires domain adaptation for product categories.
    • Ambiguous/contradictory reviews need aggregation logic.
  • Typical application case: Identify “battery life: neutral/positive,” “weight: positive,” “charging speed: negative,” to constrain what copy emphasizes.
  4. Natural Language Inference (NLI) for Factual Consistency Checking
  • Applicable scenario: Post-check whether generated statements contradict retrieved evidence.
  • Main advantages:
    • Systematic contradiction detection; prevents “over-claiming.”
    • Language-agnostic with multilingual NLI variants.
  • Potential limitations:
    • Sensitive to wording; borderline cases need thresholds and human spot-checks for high-impact products.
  • Typical application case: Validate “2-day battery” claim against evidence like “电池两天一充” to avoid “lasts a week” wording.
  5. Safety/Compliance Multi-Label Classifier (Policy, Medical/Performance Claims, Banned Phrases)
  • Applicable scenario: Detect and block prohibited or high-risk claims (e.g., “cures,” “guaranteed,” unverified superlatives).
  • Main advantages:
    • Explicit policy enforcement beyond simple keyword lists.
    • Supports multiple labels (medical claim, extreme performance, adult content, etc.).
  • Potential limitations:
    • Needs a tailored label set and curated policy data.
    • Must combine with rules/regex for edge cases (e.g., units).
  • Typical application case: Flag “guaranteed results” or “clinically proven” when not supported; trigger rewrite.
  6. Constrained Decoding (Length and Lexical Constraints)
  • Applicable scenario: Enforce max characters (title ≤60; bullets ≤120), require or forbid phrases during generation.
  • Main advantages:
    • Hard control over length and banned terms.
    • Can force inclusion of safe qualifiers (“up to,” “designed to,” “helps”) and exclusion of banned words.
  • Potential limitations:
    • Over-constraining can reduce fluency.
    • Character limits are stricter than token limits; may require post-edit passes.
  • Typical application case: Beam search with forbidden-phrase masks; length-aware decoding and post-truncation with semantic checks.
  7. Candidate Reranking with Cross-Encoder (Learning-to-Rank)
  • Applicable scenario: Generate multiple candidates and select the best using a cross-encoder scoring model tuned for readability, clarity, and conversion cues.
  • Main advantages:
    • Improves final quality without heavy generator fine-tuning.
    • Allows multi-objective scoring (readability, safety, aspect coverage, consistency).
  • Potential limitations:
    • Needs labeled preferences or proxy signals to train.
    • Inference cost scales with number of candidates.
  • Typical application case: Score 10–20 bullet sets using features like readability, evidence coverage, and lack of contradictions; pick top-1.
  8. Unsupervised Keyphrase/Aspect Extraction (BERTopic/KeyBERT-style)
  • Applicable scenario: Discover salient topics/phrases from large review sets per product without labels.
  • Main advantages:
    • Surfaces product-specific terminology and benefits users mention most.
    • Provides seed phrases for lexically constrained generation.
  • Potential limitations:
    • Topic drift or noisy clusters if reviews are short or overly diverse.
    • Requires heuristic cleaning for emojis/slang.
  • Typical application case: Extract “lightweight,” “sturdy,” “battery two-day charging,” to guide bullets and FAQs.
  9. Content Planning (Plan-and-Write) with Transformer Planner + Realizer
  • Applicable scenario: First produce a content outline (ordered aspects + claims + FAQs), then surface realization.
  • Main advantages:
    • Reduces repetition and improves structure for title/bullets/FAQ.
    • Makes length budgeting easier by planning first.
  • Potential limitations:
    • Two-stage models add complexity and latency.
    • Planner quality impacts final outputs significantly.
  • Typical application case: Planner outputs: [Title focus: lightweight & durable], [Bullets: battery 2-day, comfort, materials, warranty, compatibility], [FAQ: charging time, returns]; Realizer writes final text.
  10. Bilingual Normalization & Translation (Multilingual NMT for Canonicalization)
  • Applicable scenario: Normalize code-switched, emoji-rich review snippets to concise English evidence before generation.
  • Main advantages:
    • Improves downstream retrieval and ABSA by canonicalizing noisy text.
    • Helps unify measurement units and colloquialisms.
  • Potential limitations:
    • Risk of losing nuance; needs quality checks for product terms.
    • Additional inference step.
  • Typical application case: Convert “电池两天一充 😊” to “Battery needs charging about every two days” for consistent evidence.
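A minimal sketch of the multilingual dense retrieval in algorithm 1, assuming the sentence-transformers package; the model name and snippets are illustrative choices, not the only viable ones.

```python
# Multilingual dense retrieval over review snippets (sketch).
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

snippets = [
    "Battery easily lasts two days.",
    "电池两天一充,很省心。",
    "A bit heavy but feels sturdy.",
]
emb = model.encode(snippets, normalize_embeddings=True)

query = model.encode(["battery life"], normalize_embeddings=True)
scores = emb @ query.T                  # cosine similarity (vectors normalized)
top = np.argsort(-scores[:, 0])[:2]     # top-2 evidence snippets
for i in top:
    print(f"{scores[i, 0]:.3f}  {snippets[i]}")
```

At production scale (5k–20k snippets per product), the same vectors would go into an ANN index (FAISS/HNSW) as described in the pipeline below.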

3) Technical Implementation Suggestions

A. End-to-end pipeline (recommended)

  1. Preprocess
    • Language ID + sentence splitting; retain original + normalized versions.
    • Emoji/colloquial mapping to plain English (custom dictionary + model-backed normalization).
    • Deduplicate near-duplicates; filter spam/irrelevant content.
  2. Evidence indexing (RAG)
    • Create multilingual sentence embeddings (e.g., LaBSE/multilingual SBERT).
    • Build ANN index (FAISS/HNSW). Store metadata (aspect tags, language, ratings).
    • Retrieval recipe: hybrid sparse + dense (BM25 plus embedding search) for robustness.
  3. Aspect mining
    • Run unsupervised keyphrase/topic extraction to pre-seed aspects per product.
    • Run ABSA to get aspect polarity and representative quotes.
  4. Content plan
    • Build a lightweight planner that selects top positive aspects, flags negatives (to avoid or carefully phrase), and allocates length budgets per section.
  5. Generation
    • Use a multilingual instruction-tuned seq2seq model with RAG context windows (top-20 evidence snippets).
    • Prompt includes: product category, content plan, do/don’t guidelines, length limits, banned words, hedging lexicon.
    • Apply constrained decoding: length control (token-target aligned with character budgets), forbid list masks, optional required-phrase constraints for compliance qualifiers.
    • Generate multiple candidates (e.g., 8–16) per section.
  6. Post-checks
    • NLI contradiction check between each candidate sentence and evidence; drop candidates with contradictions (a gating sketch follows this pipeline).
    • Safety/compliance multi-label classifier; auto-rewrite unsafe lines via a small seq2seq editor or regenerate with stronger constraints.
    • Readability scoring (FKGL/SMOG + a learned readability classifier).
  7. Rerank & select
    • Cross-encoder scores combine: readability, evidence coverage, aspect diversity, safety, and length adherence. Select top-1 per section.
  8. Final QA
    • Rule-based unit checks (dimensions, battery hours), product name consistency, trademark/capitalization.
    • Optional human spot-check on first deployments.
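A sketch of the NLI contradiction gate from step 6. `nli_label` is a hypothetical stand-in for any multilingual NLI model scoring a (premise, hypothesis) pair; the toy scores below are illustrative.

```python
# NLI gating sketch: block contradicted claims, require entailment.
def nli_label(premise, hypothesis):
    """Hypothetical stand-in for a multilingual NLI model."""
    toy = {
        ("电池两天一充", "Battery lasts about two days"):
            {"entailment": 0.92, "neutral": 0.06, "contradiction": 0.02},
        ("电池两天一充", "Battery lasts a week"):
            {"entailment": 0.03, "neutral": 0.10, "contradiction": 0.87},
    }
    return toy.get((premise, hypothesis),
                   {"entailment": 0.0, "neutral": 1.0, "contradiction": 0.0})

def passes_gate(evidence, claim, contra_max=0.1, entail_min=0.5):
    """Keep a claim only if no evidence snippet contradicts it and at
    least one snippet entails it; 'neutral' alone is not enough."""
    entailed = False
    for ev in evidence:
        scores = nli_label(ev, claim)
        if scores["contradiction"] > contra_max:
            return False            # hard block on contradiction
        entailed = entailed or scores["entailment"] >= entail_min
    return entailed

evidence = ["电池两天一充"]
print(passes_gate(evidence, "Battery lasts about two days"))  # True
print(passes_gate(evidence, "Battery lasts a week"))          # False
```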

B. Model/training tips

  • Data construction
    • Build weakly supervised pairs: select top evidence snippets → draft target copy via seed LLM → human edit a small subset; use edits as high-quality fine-tuning data.
    • Mine FAQs: common question templates from Q&A (charging, warranty, compatibility), summarized answers grounded in evidence.
  • Fine-tuning
    • Use PEFT (LoRA) on 3B–7B multilingual seq2seq for medium compute; train with instruction format and structural tags (Title:, Bullet1:, …, FAQ:).
    • Include English-only targets even when inputs are bilingual to bias English output.
  • Decoding & length control
    • Map character budgets to token budgets empirically (e.g., title 60 chars ≈ 12–16 tokens for English; calibrate per tokenizer; a calibration sketch follows these tips).
    • Enforce hard stop via constrained decoding and post-trim with semantic-safe truncation (avoid cutting units/claims mid-phrase).
  • Safety/compliance
    • Maintain a curated banned/hedged lexicon per product vertical (medical, electronics, cosmetics).
    • Train multi-label classifier with focal loss to handle class imbalance; combine with deterministic regex rules (units, “100%,” “guaranteed,” medical verbs).
  • Consistency
    • NLI threshold tuning: treat “contradiction” strictly, “neutral” as allowable only with hedging (“may,” “up to,” “typically”).
    • Aggregate ABSA sentiment over many reviews; use confidence-weighted averages to avoid overfitting to outliers.
  • Reranking
    • Train cross-encoder on pairwise preferences (A/B choices) with criteria: clarity, trust, benefit-first wording, and evidence alignment.
    • Use features in the scorer: aspect coverage, banned-word count, length delta, NLI scores, ABSA alignment.
  • Evaluation
    • Human evaluation rubric: clarity (1–5), trustworthiness (1–5), specificity (1–5), alignment with reviews (1–5).
    • Automated: FKGL, coherence scores, contradiction rate, compliance violation rate, and coverage of top-k aspects.
  • Deployment
    • Cache retrieval results per product.
    • Batch generation and reranking for efficiency.
    • Keep a rollback strategy: if constraints filter out all candidates, relax non-critical constraints (e.g., minor length overrun) and regenerate.
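An empirical character-to-token calibration sketch for the decoding tip above: measure characters per token on sample titles, then map the 60-character budget to a `max_new_tokens` value. The tokenizer checkpoint is an assumption; calibrate with whatever generator you actually deploy.

```python
# Calibrate a character budget to a token budget for constrained decoding.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/mt5-small")  # assumed checkpoint

samples = [
    "Lightweight Wireless Earbuds with 2-Day Battery",
    "Sturdy Travel Backpack, Water-Resistant",
]
ratios = [len(s) / len(tok.tokenize(s)) for s in samples]
chars_per_token = sum(ratios) / len(ratios)

title_budget_chars = 60
title_budget_tokens = int(title_budget_chars / chars_per_token)
print(f"~{chars_per_token:.1f} chars/token -> max_new_tokens ≈ {title_budget_tokens}")
```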

C. Practical parameter hints (medium compute)

  • Retriever: multilingual SBERT-base embeddings; HNSW index; top-200 recall → re-rank to top-20.
  • Generator: 3B–7B multilingual seq2seq with LoRA; max input 4–8k tokens if available; otherwise chunked RAG.
  • Candidates: 8–16 per section; top-3 reranked; 1 selected.
  • Classifiers (ABSA, NLI, safety): base-size transformer encoders for low latency.

D. Content style safeguards

  • Use calibrated hedging: “up to,” “typically,” “helps,” “designed for,” “may.”
  • Prefer measurable, review-grounded phrases (“charges about every two days,” “lightweight yet sturdy build”).
  • Avoid absolutes unless supported by overwhelming evidence and policy permits (a small regex check of these safeguards follows).
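A small regex-based check of these safeguards; the word lists are illustrative, not a complete compliance policy.

```python
# Banned-phrase and hedging check (sketch) for generated copy lines.
import re

BANNED = re.compile(r"\b(guaranteed|cures?|clinically proven|100%)\b", re.I)
HEDGES = ("up to", "typically", "helps", "designed to", "may")

def style_check(line):
    issues = []
    if BANNED.search(line):
        issues.append("banned/over-promising phrase")
    if not any(h in line.lower() for h in HEDGES):
        issues.append("no calibrated hedge present")
    return issues

print(style_check("Guaranteed all-week battery life!"))
print(style_check("Battery typically lasts up to two days."))
```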

4) Summary

These algorithms, when integrated, provide a practical, safe, and grounded solution for generating concise, trustworthy English marketing copy from mixed-language user reviews and Q&A at medium compute budgets.

Problem Summary

  • Data: multi-turn customer-center dialogue logs (with typos, emojis, and timestamps) plus short policy clauses (sentence-level IDs).
  • Task: generate a standard Korean answer (≤180 characters) containing actionable steps and a polite closing, attach 3–5 relevant policy sentence IDs, never fabricate costs or policies, and ask a polite clarification question when needed.
  • Constraints: <800 ms per turn, very high concurrency, tight compute budget (lightweight, quantized models required).
  • Goal: high-precision policy-sentence retrieval + accurate evidence citation + controlled short-form generation.

Recommended Algorithms (ranked by suitability)

  1. Multilingual dual-encoder dense retrieval (E5-multilingual-small + FAISS HNSW)
  • Use: dialogue-summary query → sentence-level policy retrieval.
  • Advantages: very low latency (ANN); robust to multilingual text and typos (subword tokenization); embeddings precomputed offline.
  • Limitations: limited precision among top candidates (reranking required).
  • Cases: lightweight RAG knowledge retrieval; call-center FAQ search.
  2. Hybrid retrieval (BM25 + RRF fusion)
  • Use: reinforce keyword-critical policy clauses ("refund", "deposit delay", etc.); see the RRF sketch after this list.
  • Advantages: guarantees lexical matches; easy to deploy; complements E5.
  • Limitations: sensitive to morphological analysis and tokenization quality; weak on synonyms.
  • Cases: strengthening e-commerce policy retrieval.
  3. Lightweight cross-encoder reranker (MiniLM/XLM-R-mini cross-encoder, INT8)
  • Use: top-50 candidates → precise selection of the top 3–5 sentences.
  • Advantages: high-precision evidence through full query-document interaction.
  • Limitations: added latency (quantize, and rerank only a minimal top-K).
  • Cases: finalizing evidence sentences before production output.
  4. Learning-to-rank (LambdaMART/LightGBM) for final evidence selection
  • Use: select 3–5 sentences from features (dense similarity, BM25, dialogue intent, sentence length, confidence).
  • Advantages: reproducible and easy to tune; fast at serving time.
  • Limitations: needs labels; feature drift must be managed.
  • Cases: web-search ranking; choosing the best FAQ evidence.
  5. Multi-label intent classification (DistilKoBERT/XLM-R-small)
  • Use: label intents such as "payment failure / refund / delay / duplicate charge" → enrich the retrieval query / rule-based routing.
  • Advantages: inference in hundreds of microseconds to a few milliseconds; useful for context summarization.
  • Limitations: class management gets harder as domains multiply.
  • Cases: call classification and routing.
  6. Slot/entity extraction (DistilBERT-CRF or BiLSTM-CRF)
  • Use: extract amounts, order numbers, times, and payment methods → drive action steps and clarification questions.
  • Advantages: structured fields stabilize step-by-step answers.
  • Limitations: annotation cost; retraining needed for domain transfer.
  • Cases: collecting inputs for automated refund handling.
  7. Noise-robust normalization and typo correction (noisy channel + edit distance + subword LM)
  • Use: remove emojis/timestamps, normalize typos and slang, expand abbreviations.
  • Advantages: improves retrieval recall; high value for the cost.
  • Limitations: per-language rules must be maintained.
  • Cases: SNS/chat preprocessing.
  8. Constrained generation (mT5-small/KoT5 + templates/constrained decoding)
  • Use: standard Korean answers (≤180 characters, steps included, polite register) with evidence IDs inserted.
  • Advantages: length, style, and banned terms (such as cost mentions) can all be constrained; easy to keep lightweight.
  • Limitations: low creativity; templates must be designed.
  • Cases: standard operating scripts.
  9. Factual consistency verification (NLI, distilled XLM-R-small)
  • Use: judge whether the generated answer is entailed by the evidence sentences (blocking fabricated costs/policies).
  • Advantages: suppresses hallucination; prevents policy violations.
  • Limitations: can reject conservatively; boundary cases need tuning.
  • Cases: factual-consistency gating for RAG.
  10. Uncertainty-triggered clarification questions (temperature calibration + thresholds / conformal prediction)
  • Use: automatically generate a short clarification question when intent/slot/evidence confidence is low.
  • Advantages: recovers missing information; reduces repeat contacts.
  • Limitations: thresholds need tuning; UX adjustment required.
  • Cases: safety confirmations for medical and financial inquiries.
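A reciprocal rank fusion (RRF) sketch for the hybrid retriever in item 2, fusing BM25 and dense rankings of policy-sentence IDs; k = 60 is the commonly used constant, and the rankings and ID format are illustrative.

```python
# Reciprocal rank fusion of two rankings of policy-sentence IDs.
from collections import defaultdict

def rrf(rankings, k=60):
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["P-12", "P-07", "P-33", "P-01"]
dense_top = ["P-07", "P-12", "P-45", "P-02"]
print(rrf([bm25_top, dense_top])[:5])   # fused candidates for reranking
```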

Technical Implementation Suggestions

  • Indexing / retrieval

    • Split policies into sentences and assign unique IDs. Compute embeddings offline (E5-multilingual-small).
    • Build the ANN index with FAISS HNSW or IVF-PQ (PQ compression as the memory budget dictates). Target roughly 5–10 ms per query.
    • Hybrid: BM25 top-200 ∪ E5 top-200 → RRF fusion → rerank 50.
  • Multi-turn understanding

    • Summarize/aggregate intent over the last N turns (e.g., 3): ensemble intent labels with DistilKoBERT; update the latest slot values with the CRF.
    • Query rewriting: build the retrieval query from [intent keywords + slot key/value pairs].
  • Reranking and final evidence selection

    • MiniLM cross-encoder (INT8) reranks top 50 → 10.
    • LambdaMART selects the final 3–5 sentence IDs (features: CE score, E5 similarity, BM25, intent match, length penalty).
  • Generation and constraints

    • Deploy mT5-small/KoT5 via ONNX/INT8; hard-cut the decoded length (≤180 characters); apply a banned-term list (costs, invented figures).
    • Templates: domain-specific slot filling such as "Verification steps 1) required information (time/amount/order number) 2) payment method / app version guidance. Evidence: [ID…]. Thank you."
    • Evidence IDs may only be inserted from retrieval results (whitelist); a sketch of the length cut and ID whitelist follows this list.
  • Fact verification / safety

    • Pass the answer only when a small NLI model scores [evidence set → answer] entailment ≥ τ; otherwise switch to a clarification-question template.
    • Uncertainty: softmax temperature calibration plus thresholds, or variance estimation via 4–8 MC Dropout passes (within the latency budget).
  • Preprocessing

    • Remove emojis; normalize timestamp/amount patterns; correct frequent typos with confusion sets plus edit distance (SymSpell-like).
    • Language detection (fastText); keep the output in Korean even when the input is not (if needed, a "please provide the following information in Korean" template).
  • Performance / latency optimization

    • INT8-quantize every model (ONNX Runtime/TensorRT); dynamic batching; gRPC keep-alive.
    • Caching: LRU cache of retrieval/rerank results keyed by the normalized query; warm cache for popular policies.
    • Latency budget (suggested): preprocessing 5 ms, retrieval 20 ms, reranking 25 ms, ranking 2 ms, generation 80–120 ms, verification 20 ms; total <250 ms with headroom.
  • Monitoring / evaluation

    • Retrieval: nDCG@10, Recall@50; generation: length-compliance, step-inclusion, and polite-closing rates; safety: NLI pass rate, zero banned-term violations.
    • A/B tests: how clarification-trigger thresholds and template changes affect the repeat-inquiry rate.
  • Concurrency / operations

    • FAISS sharding plus memory mapping; horizontal scaling of model servers (Triton/ORT); dynamic autoscaling at peak hours.
    • Fallback on failure: hybrid retrieval plus template-based rule answers (always with evidence IDs attached).
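A sketch of the two hard output constraints above: the 180-character cut and the evidence-ID whitelist. The `P-\d+` ID pattern and field values are illustrative assumptions.

```python
# Enforce the evidence-ID whitelist and the hard length cut (sketch).
import re

def finalize_reply(text, retrieved_ids, max_chars=180):
    # 1) Only IDs that actually came from retrieval may appear.
    cited = set(re.findall(r"P-\d+", text))        # assumed ID pattern
    if not cited <= retrieved_ids:
        raise ValueError(f"non-retrieved evidence IDs: {cited - retrieved_ids}")
    # 2) Hard length cut, at a sentence boundary where possible.
    if len(text) > max_chars:
        cut = text.rfind(".", 0, max_chars)
        text = text[:cut + 1] if cut > 0 else text[:max_chars]
    return text

reply = "확인 단계 1) 주문번호와 결제 시각을 알려주세요. 근거:[P-12, P-07]. 감사합니다."
print(finalize_reply(reply, {"P-12", "P-07", "P-33"}))
```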


Example Details

Problems Solved

For teams that need to make algorithm-selection and technical-route decisions fast, this offers a shortcut from "business problem" straight to "deployable algorithm plan". Describe your data and scenario once and immediately receive 10 algorithm suggestions ranked by priority, each with applicability conditions, comparative strengths, potential risks, and implementation essentials, helping engineers cut repeated trial and error, speed up project reviews, and keep delivery on track.

Intended Users

AI/ML Engineers

At project kickoff, enter the data type and business goal to get 10 algorithm options ranked by applicability, with implementation essentials and a risk checklist, for building prototypes and iteration roadmaps.

Data Scientists

After feature exploration, quickly generate a multi-algorithm comparison plan with evaluation metrics and a clear data-processing scheme, shortening offline validation and A/B cycles.

Tech Leads / Architects

Based on budget, timeline, and compliance requirements, get advice on balancing performance, cost, and interpretability for design reviews, scheduling, and milestone setting.

Feature Summary

Parses your data and business goal in one pass, automatically identifies the problem type and constraints, and produces actionable algorithm directions.
Screens and ranks 10 candidate algorithms against data scale and compute budget, making priorities and trade-offs immediately visible.
Pairs each algorithm with applicable scenarios, strengths, potential risks, and typical cases, so you can quickly convince your team and clients.
Provides implementation essentials and tuning directions covering data preparation, feature processing, and evaluation methods, noticeably shortening trial-and-error cycles.
Supports multiple output languages and adjustable technical depth, generating ready-to-use write-ups from executive summaries to engineering detail.
Fits industry scenarios such as risk control, recommendation, medical imaging, and equipment-failure prediction, producing algorithm routes in business context.
Automatically generates structured reports that drop straight into proposals, bids, and review materials, improving communication and decision speed.
Builds in interpretability, compliance, and cost considerations to balance effectiveness, transparency, and resource spend in high-risk scenarios.
Combines current practice with proven experience into reusable optimization checklists and upgrade paths for long-term evolution and retrospectives.

How to Use a Purchased Prompt Template

1. Use it directly in an external chat app

Copy the prompt generated from the template into your usual chat app (ChatGPT, Claude, etc.) and converse with it directly, with no extra development. Suited to quick personal trials and lightweight use.

2. Publish it as an API

Turn the prompt template into an API: your programs can modify the template parameters at will and call it through the interface, enabling automation and batch processing. Suited to developer integration and embedding in business systems.

3. Configure it in an MCP client

Configure the corresponding server address in your MCP client so your AI application calls the prompt template automatically. Suited to advanced users and team collaboration, letting prompts move seamlessly across AI tools.

Prompt Price
¥20.00
Try before you buy: pay only after it works for you.

What You Get After Purchase

The complete prompt template
- 712 tokens
- 5 adjustable parameters
{ Data type and problem description } { Output language } { Technical depth level } { Application scenario } { Compute resource constraints }
Usage rights to community-contributed content
- Curated community cases to help you get up to speed quickly
Free for a limited time
