Popular roles are more than a source of inspiration; they are your productivity assistant. With carefully curated role prompts you can quickly generate high-quality content, spark new ideas, and find the solution that best fits your needs. Creation becomes easier, and the value is delivered more directly.
We keep updating the role library around different user needs, so you can always find a suitable starting point for inspiration.
Designs case study scenarios related to a given topic and provides professional recommendations.
Case Study Scenario: Performance-Task-Based Course Assessment and Rubric Design for an Undergraduate "Introduction to Engineering Design" Course

1. Background and Problem Statement
A required "Introduction to Engineering Design" course at an engineering school is built around a team project and aims to develop first-year students' design thinking, engineering ethics, and communication and collaboration skills. Final project grading has shown three persistent problems: (1) scoring criteria are implicit and scores vary considerably across instructors; (2) students do not understand the expected quality features well, so feedback is not actionable; (3) project-based evidence is hard to trace back to scores, and learning outcomes cannot be accumulated at the course level. To address these difficulties, the course team plans to rebuild the course assessment around a systematic scoring rubric and an accompanying quality-assurance process in order to improve validity, reliability, and fairness (AERA, APA, & NCME, 2014; Wiggins, 1998).

2. Learning Outcomes and Assessment Goals
Measurable, course-aligned learning outcomes:
1) Propose feasible engineering solutions based on constraints and user needs, with evidence-based technical justification (engineering reasoning).
2) Demonstrate an iterative design process and standards-compliant engineering documentation (design process and rigor).
3) Communicate design rationale, trade-offs, and limitations clearly in oral and written form (communication).
4) Identify and address safety, ethical, and societal impacts (ethics and social responsibility).
5) Collaborate effectively in a team setting and conduct peer assessment (collaboration and professionalism).

3. Assessment Task Design
The summative assessment has two parts:
- Team design project (70% of the course grade): technical report, functional prototype demonstration, and design-process portfolio (requirements analysis, trade-off records, test data, iteration log).
- Individual reflection and technical appendix (30% of the course grade): the individual's technical justification of key design decisions, ethical considerations, and a statement of personal contribution.
The task is a typical performance assessment, emphasizing authentic context, diverse evidence, and verifiability (Wiggins, 1998).

4. Rubric Type and Structure
An analytic rubric is adopted because it decomposes the multidimensional quality features of complex performance, makes feedback more targeted, and, with appropriate rater training, improves scoring consistency (Jonsson & Svingby, 2007; Brookhart, 2013). Four performance levels are defined: Excellent, Proficient, Basic Pass, and Below Standard; each level is anchored with clearly defined, observable quality features, avoiding vague terms (Sadler, 1989).

Example dimensions and weights:
- Function and technical performance (30%): degree to which constraints and requirements are met; adequacy and quality of verification data.
- Design process and evidence chain (25%): completeness of iterations, trade-off records, transparency of decision rationale.
- Engineering reasoning and technical justification (20%): correctness of models/calculations/tests; statement of assumptions and uncertainty.
- Communication (15%): report structure, figures and reproducibility, clarity and focus of the oral presentation.
- Professionalism and ethics (10%): identification of safety and ethical risks, compliance, division of labor, and integrity.

Example level descriptors for "Engineering reasoning and technical justification":
- Excellent: builds and applies appropriate models/calculations accurately and systematically presents uncertainty and boundary conditions; corroborates conclusions with three or more independent types of evidence; the reasoning chain is complete and auditable.
- Proficient: models/calculations are largely correct; main assumptions and limitations are identified; evidence is sufficient and reasoning is broadly coherent.
- Basic Pass: uses basic calculations or empirical estimates; some assumptions are unstated; the link between evidence and conclusions is weak but discernible.
- Below Standard: lacks valid models or supporting evidence; reasoning has major flaws or cannot be traced.

Weights were negotiated by the course team and external industry advisors to align with the relative importance of the learning outcomes and the course level (the AAC&U VALUE framework's emphasis on higher-order abilities provides an alignment reference; Rhodes, 2010).

5. Validity Argument and Evidence Collection Plan
Following contemporary validity theory and the argument-based approach (Messick, 1995; Kane, 2013):
- Content validity: three rounds of expert review (two course instructors, one college teaching supervisor, one industry engineer) examine whether the rubric dimensions and descriptors cover and faithfully represent the learning outcomes; a content validity index (CVI) records agreement.
- Response processes: student focus groups and rater think-alouds check whether students and raters interpret the terms and level anchors consistently; wording is revised to improve scorability.
- Internal structure and reliability: a pilot double-scores 40 anonymized past projects; dimension-level ICC(2,k) and weighted kappa are computed, with ≥ 0.75 targeted as "good" agreement; G-study and D-study analyses from generalizability theory estimate how the number of raters and tasks contributes to reliability and optimize the scoring design (Shavelson & Webb, 1991).
- Relations to other variables: examine whether project scores relate to prior course measures (basic physics/mathematics diagnostics) in the expected convergent/discriminant pattern.
- Consequences: after implementation, track changes in the quality of student self- and peer-assessment, repeat rates, and appeal rates to review potential positive and negative consequences (AERA, APA, & NCME, 2014).

6. Rater Training and Calibration
- Anchor exemplars and a calibration procedure: consensus-benchmarked sample work for each level; two rounds of independent scoring followed by debriefing to reduce severity/leniency bias, dimension confusion, and evidence-weighting errors (Jonsson & Svingby, 2007).
- Scoring manual: positive and negative examples for each dimension with common misjudgment alerts, and an explicit "evidence first; textual claims must be supported by evidence" principle.
- Process monitoring: 10% of submissions are re-scored by a third party on site; inter-rater agreement is computed, and cut lines are adjusted or work re-scored as needed.

7. Fairness and Accessibility
- Language and accessibility review: rubric wording is concise and avoids culturally loaded terms; a bilingual glossary and samples are provided when needed; reasonable accommodations (e.g., alternative presentation formats) control construct-irrelevant interference (AERA, APA, & NCME, 2014).
- Bias monitoring: many-facet Rasch or hierarchical models examine rater effects and potential group differences; after controlling for prior ability, check for systematic differences and, if necessary, adjust scoring procedures and training emphasis (Eckes, 2015).

8. Implementation and Data Analysis Plan
- Timeline: weeks 1-3, revise the rubric and train raters; weeks 4-12, collect formative evidence on a rolling basis and provide rubric-based feedback; weeks 13-15, complete summative scoring and reliability analysis; end of term, integrate validity evidence and draft improvement recommendations.
- Data analysis: descriptive statistics for dimensions and totals, rater agreement indices, and G-theory variance decomposition; inter-dimension correlations to identify redundant or overlapping constructs; sensitivity analysis of the weights assigned to different task evidence to verify that the weighting is robust.

9. Standard Setting and Grade Mapping
- Use the Body of Work method to organize instructors' holistic judgments of multi-level anchor work, combine these judgments with the rubric total-score distribution to set cut scores for each grade, and keep evidence at the center of the final decision (Cizek & Bunch, 2007). Borderline cases are double-checked so that standards remain defensible and workable.

10. Expected Results and Improvement Actions
- Expected outcomes: markedly better scoring consistency; clearer student mental models of high-quality performance; more targeted formative feedback and therefore better iteration quality (Panadero & Jonsson, 2013).
- Data-driven iteration: if "Communication" and "Engineering reasoning" are highly correlated and scoring conflicts are frequent, merge or redraw the dimensions in the next round; if the G-study shows an excessive rater variance component, increase calibration frequency or add a third rater.
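To make the weighting in section 4 and the weight sensitivity analysis mentioned in section 8 concrete, here is a minimal Python sketch. The dimension names and weights follow the rubric above; the example scores and the perturbation scheme are illustrative assumptions, not part of the case.

```python
# Rubric dimensions and weights from section 4 (must sum to 1.0).
WEIGHTS = {
    "function_performance": 0.30,
    "design_process": 0.25,
    "engineering_reasoning": 0.20,
    "communication": 0.15,
    "ethics_professionalism": 0.10,
}

def composite(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted composite of dimension scores (e.g., on the 1-4 level scale)."""
    return sum(weights[d] * scores[d] for d in weights)

def weight_sensitivity(scores: dict, delta: float = 0.05) -> dict:
    """Shift each weight by +delta (renormalizing the rest) and report how much
    the composite moves; large shifts suggest the weighting is fragile."""
    base = composite(scores)
    effects = {}
    for d in WEIGHTS:
        perturbed = {k: (v + delta if k == d else v) for k, v in WEIGHTS.items()}
        total = sum(perturbed.values())
        perturbed = {k: v / total for k, v in perturbed.items()}  # renormalize to 1.0
        effects[d] = round(composite(scores, perturbed) - base, 3)
    return effects

# Illustrative dimension-level scores for one team on the four-level scale.
team_scores = {
    "function_performance": 3.5,
    "design_process": 3.0,
    "engineering_reasoning": 2.5,
    "communication": 4.0,
    "ethics_professionalism": 3.0,
}
print(round(composite(team_scores), 2))  # weighted total score
print(weight_sensitivity(team_scores))   # change in total per +0.05 weight shift
```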
References
- AERA, APA, & NCME. (2014). Standards for Educational and Psychological Testing. Washington, DC: AERA.
- Brookhart, S. M. (2013). How to Create and Use Rubrics for Formative Assessment and Grading. Alexandria, VA: ASCD.
- Cizek, G. J., & Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Thousand Oaks, CA: Sage.
- Eckes, T. (2015). Introduction to Many-Facet Rasch Measurement. Frankfurt am Main: Peter Lang.
- Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130–144.
- Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
- Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances. American Psychologist, 50(9), 741–749.
- Panadero, E., & Jonsson, A. (2013). The use of scoring rubrics for formative assessment revisited: A review. Educational Research Review, 9, 129–144.
- Rhodes, T. (2010). Assessing Outcomes and Improving Achievement: Tips and Tools for Using Rubrics. Washington, DC: AAC&U. (Includes VALUE framework resources.)
- Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119–144.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability Theory: A Primer. Newbury Park, CA: Sage.
- Wiggins, G. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco, CA: Jossey-Bass.

This case scenario provides an actionable design-validation-implementation loop for course assessment and rubrics, centered on a validity-oriented evidence chain, rater calibration, and fairness safeguards, so that high-stakes course decisions remain defensible and student learning keeps improving.
Case Study Scenario: Integrating Performance Appraisal and Training Debriefs in an Emergency Department

Overview and Problem Statement
A 900-bed tertiary hospital's Emergency Department (ED) has observed variable team performance in early sepsis management, evidenced by inconsistent door-to-antibiotic times and uneven adherence to the sepsis care bundle. The hospital seeks to implement an integrated performance appraisal system aligned with a simulation-based training program that embeds structured debriefs, with the dual aims of improving clinical performance and establishing defensible, fair personnel decisions. This case examines the design, implementation, and evaluation of that integrated system.

Intervention Components
1) Performance appraisal
- Rubrics: Behaviorally Anchored Rating Scales (BARS) for non-technical skills (communication, leadership, situational awareness, teamwork) and a checklist for adherence to the sepsis bundle.
- Sources: Trained peers and charge nurses rate live simulations and selected real resuscitations; self-assessments are collected; optional patient/family compliments and complaints are coded qualitatively. Multiple raters and occasions are used to mitigate idiosyncratic rater effects and halo/leniency biases (Scullen, Mount, & Goff, 2000).
- Rater training and calibration: Frame-of-reference training with video vignettes, practice scoring, and group calibration sessions; quarterly drift checks. Reliability targets: intraclass correlation (ICC) ≥ 0.70 for global BARS dimensions (Shrout & Fleiss, 1979); a computational sketch is shown after this list. Generalizability studies inform the optimal number of raters and occasions (Shavelson & Webb, 1991).

2) Training and debriefing
- Simulation-based training: Interprofessional high-fidelity scenarios (sepsis recognition, escalation, bundle implementation) with deliberate practice (Salas, Tannenbaum, Kraiger, & Smith-Jentsch, 2012).
- Structured debriefing: Facilitators use the PEARLS blended debriefing model, emphasizing psychological safety, performance-gap analysis, and guided discovery; each simulation and selected real cases are followed by an after-action review (AAR) within 24-48 hours (Eppich & Cheng, 2015; Tannenbaum & Cerasoli, 2013). Debrief quality is periodically audited using the Debriefing Assessment for Simulation in Healthcare (DASH) to sustain fidelity (Brett-Fleegler et al., 2012).
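To make the ICC target concrete, the following is a minimal Python sketch of the two-way random-effects ICC described by Shrout and Fleiss (1979). The ratings matrix is invented for illustration, and the formula assumes a fully crossed ratee-by-rater design with no missing cells.

```python
import numpy as np

def icc2k(ratings: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters
    (Shrout & Fleiss, 1979). `ratings` is an n_ratees x k_raters matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # per ratee
    col_means = ratings.mean(axis=0)   # per rater
    # Mean squares from the two-way ANOVA decomposition.
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between ratees
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between raters
    sse = np.sum((ratings - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                          # residual
    return (msr - mse) / (msr + (msc - mse) / n)

# Illustrative BARS ratings: 6 clinicians (rows) scored by 3 calibrated raters (columns).
ratings = np.array([
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
    [3, 2, 3],
], dtype=float)

print(f"ICC(2,k) = {icc2k(ratings):.2f}")  # compare against the 0.70 target
```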
Evaluation Questions
- EQ1 (Effectiveness): To what extent does the integrated appraisal-plus-debrief intervention improve sepsis bundle adherence and door-to-antibiotic times?
- EQ2 (Transfer): Does improved simulation performance translate to real-world behavior change in the ED?
- EQ3 (Measurement quality): Are appraisal ratings reliable, valid, and fair for formative and summative use?
- EQ4 (Mechanisms): Do debrief attendance and quality predict performance gains, controlling for baseline performance and case mix?
- EQ5 (Implementation): What contextual factors facilitate or hinder adoption, according to the CIPP framework (context, input, process, product)?

Design and Methods
- Design: Stepped-wedge, cluster-randomized rollout across six ED pods over nine months, enabling within-unit baseline comparisons and control of secular trends. The design accommodates practical constraints while improving causal inference relative to simple pre-post designs.
- Sample: Approximately 180 clinicians (attendings, residents, nurses, respiratory therapists). All staff participate in appraisal and training; observation sampling is balanced by shift and acuity.
- Outcomes and measures (aligned with the Kirkpatrick levels):
  • Level 1 (Reaction): Post-session measures of training relevance and utility.
  • Level 2 (Learning): Knowledge test on sepsis protocols; simulation checklist scores; BARS ratings of non-technical skills.
  • Level 3 (Behavior/Transfer): Real-case adherence to the sepsis bundle extracted from the EHR; observational BARS during live resuscitations.
  • Level 4 (Results): Patient-level outcomes adjusted for acuity (e.g., time to antibiotics, ICU transfer within 24 hours, in-hospital mortality).
- Mechanism and climate measures: Team psychological safety (Edmondson, 1999); feedback climate and perceived fairness of appraisal. Debrief fidelity is indexed by facilitator adherence to PEARLS and by DASH scores.

Validity and Reliability Strategy (Argument-Based Validation)
- Scoring inference: Standardized rubrics with explicit behavioral anchors; facilitator and rater certification; double-scoring of 10% of events to estimate scorer agreement (Kane, 2013).
- Generalization inference: Generalizability study to partition variance across person, rater, scenario, and occasion facets; decision study to set the minimum number of observations per ratee for dependable scores (Shavelson & Webb, 1991).
- Extrapolation inference: Convergent validity via correlations between simulation BARS and real-case adherence; predictive validity via associations with patient outcomes (Messick, 1995).
- Decision inference: Cut-score policy for summative decisions set via Angoff-type standard setting with a multidisciplinary panel; impact and fairness analyses precede high-stakes use.
- Fairness: Monitor subgroup differences (e.g., profession, gender, years of experience), test for rater-by-ratee interactions (idiosyncratic effects), and adjust through rater calibration. Triangulate with 360-degree narrative data to contextualize scores.

Bias Control and Rater Training
- Frame-of-reference rater training is implemented to improve rating accuracy and reduce variability unrelated to true performance (Woehr & Huffcutt, 1994). Quarterly recalibration addresses drift; dashboards flag raters with systematic severity or leniency for coaching.
- Multiple raters and occasions dilute single-rater bias (Scullen et al., 2000). When feasible, analyses incorporate random rater effects in mixed models to adjust for severity.

Analysis Plan
- Primary impact model: Interrupted time series with mixed-effects or segmented regression within the stepped-wedge framework, estimating changes in level and slope for door-to-antibiotic times and bundle adherence (see the sketch after this list). Covariates include case severity, crowding, and staffing levels.
- Reliability: ICCs for BARS dimensions; generalizability coefficients from G-studies to inform minimum observations.
- Validity: Multitrait-multimethod correlations among BARS, checklists, and outcomes; examine whether simulation gains predict real-world behavior (transfer).
- Mediation: Test whether debrief exposure and quality predict performance improvements through increased psychological safety and feedback utility (Tannenbaum & Cerasoli, 2013; Kluger & DeNisi, 1996).
- Sensitivity analyses: Robustness to missing data (multiple imputation), contamination checks across pods, and alternative functional forms in time-series models.
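A minimal sketch of the segmented (interrupted time series) regression named in the analysis plan, using statsmodels. The data frame columns, covariate, and simulated values are assumptions for illustration; a full stepped-wedge analysis would add pod-level random effects and the actual rollout schedule.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated monthly series for one ED pod: 9 pre-intervention and 9 post-intervention months.
n_pre, n_post = 9, 9
months = np.arange(n_pre + n_post)
post = (months >= n_pre).astype(int)                  # 1 after the pod switches to the intervention
time_since = np.where(post == 1, months - n_pre, 0)   # months since the switch

df = pd.DataFrame({
    "month": months,
    "post": post,
    "time_since": time_since,
    # Hypothetical door-to-antibiotic times (minutes): slight drift, then a drop and a steeper decline.
    "dta_minutes": 90 - 0.5 * months - 10 * post - 1.5 * time_since + rng.normal(0, 5, months.size),
    "crowding": rng.normal(0, 1, months.size),         # placeholder covariate
})

# Segmented regression: baseline trend (month), level change (post), slope change (time_since).
model = smf.ols("dta_minutes ~ month + post + time_since + crowding", data=df).fit()
print(model.summary().tables[1])
```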
Implementation and Governance
- Distinguish formative coaching from summative decisions: Early quarters emphasize formative feedback; summative use follows evidence of adequate reliability and fairness thresholds (Pulakos & O'Leary, 2011).
- Data governance: Role-based access; de-identification in learning analytics; transparent communication of use cases and appeal procedures for contested ratings.
- Fidelity monitoring: Training completion, debrief attendance, and facilitator adherence tracked and reported monthly (Stufflebeam & Coryn, 2014).

Risks and Mitigations
- Selection and history effects: Stepped-wedge design and time-series modeling mitigate these threats.
- Rater drift: Scheduled recalibration and reliability monitoring.
- Feedback backfire: Emphasis on task-focused, behaviorally specific feedback, goal setting, and psychological safety to maximize positive effects (Kluger & DeNisi, 1996).
- Burden: Streamlined instruments and sampling plans minimize observation load while meeting reliability targets.

Expected Contributions
- Demonstrates how a debrief-centric training strategy can be integrated with a defensible performance appraisal system to improve clinical outcomes and support personnel decisions.
- Provides a validity argument and the reliability evidence necessary for responsible use of performance data.
- Illustrates how debrief quality and learning climate mediate the transfer of training to clinical performance.

Selected References
- Brett-Fleegler, M., et al. (2012). Debriefing Assessment for Simulation in Healthcare (DASH): Development and psychometric properties. Simulation in Healthcare, 7(4), 176–184.
- Cheng, A., Eppich, W., Grant, V., Sherbino, J., Zendejas, B., & Cook, D. A. (2014). Debriefing for technology-enhanced simulation: A systematic review and meta-analysis. Medical Education, 48(7), 657–666.
- Edmondson, A. C. (1999). Psychological safety and learning behavior in work teams. Administrative Science Quarterly, 44(2), 350–383.
- Eppich, W., & Cheng, A. (2015). Promoting Excellence and Reflective Learning in Simulation (PEARLS): Development and rationale for a blended approach to debriefing. Simulation in Healthcare, 10(2), 106–115.
- Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
- Kirkpatrick, D. L., & Kirkpatrick, J. D. (2016). Evaluating Training Programs: The Four Levels (4th ed.). ATD Press.
- Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance. Psychological Bulletin, 119(2), 254–284.
- Messick, S. (1995). Validity of psychological assessment. American Psychologist, 50(9), 741–749.
- Pulakos, E. D., & O'Leary, R. S. (2011). Why is performance management broken? Industrial and Organizational Psychology, 4(2), 146–164.
- Salas, E., Tannenbaum, S. I., Kraiger, K., & Smith-Jentsch, K. A. (2012). The science of training and development in organizations. Psychological Science in the Public Interest, 13(2), 74–101.
- Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85(6), 956–970.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability Theory: A Primer. Sage.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
- Stufflebeam, D. L., & Coryn, C. L. S. (2014). Evaluation Theory, Models, and Applications (2nd ed.). Jossey-Bass.
- Tannenbaum, S. I., & Cerasoli, C. P. (2013). Do team and individual debriefs enhance performance? Human Factors, 55(1), 231–245.
- Woehr, D. J., & Huffcutt, A. I. (1994). Rater training for performance appraisal: A quantitative review. Personnel Psychology, 47, 191–227.
Case Study Scenario: Interview Study of Strategic Issues for a Provincial Energy Utility amid the Energy Transition and Spot Power Market Reform

1. Background and Objectives
Under the dual pressure of the "dual carbon" goals and the accelerating rollout of the electricity spot market, the energy utility of Province A (hereafter "the Company") faces key uncertainties around business portfolio optimization, deployment of flexibility resources, data and digital capability building, capital expenditure efficiency, and its bargaining relationships with regulators and large customers. Management intends to formulate a medium-term strategy for the next 3-5 years and has commissioned a strategic-issue identification and prioritization study based on semi-structured in-depth interviews, in order to build a "strategic issue map" that supplies evidence for strategic choices and resource allocation (Yin, 2018; Eisenhardt, 1989).

Overall research objectives:
- Identify and define 10-15 core strategic issues that will affect the Company over the next 3-5 years;
- Assess each issue's strategic importance, urgency, and controllability;
- Build consensus across stakeholders and produce actionable recommendations.

2. Evaluation Questions and Analytical Framework
- EQ1: How will the external environment and policy/market changes shape the Company's opportunities and threats? (PESTEL; Johnson et al., 2017)
- EQ2: Can the Company's current capabilities and resources support the energy transition and market-based trading? (resource and capability perspective)
- EQ3: What expectations and constraints do stakeholders (regulators, large industrial customers, distributed-energy developers, financial institutions, etc.) place on the Company's strategic choices? (stakeholder theory; Freeman, 1984)
- EQ4: What criteria should govern the classification and ranking of strategic issues? (strategic issue categorization; Dutton & Jackson, 1987)
Integrated framework: PESTEL (macro environment), stakeholder analysis (external relationships), strategic issue categorization (diagnosis), and an importance-urgency-controllability matrix (decision).

3. Research Design and Methods
- Design type: an embedded single-case study using a qualitatively driven mixed-methods approach (in-depth interviews + document analysis + a small-sample quantitative priority survey), with triangulation to strengthen credibility (Yin, 2018; Creswell & Plano Clark, 2018; Patton, 2015).
- Data sources:
1) Semi-structured in-depth interviews (management, business units, external stakeholders);
2) Internal and external documents (annual reports, investment plans, regulatory filings, market trading data);
3) A small-sample issue-scoring survey (for comparability and priority confirmation).

4. Sampling and Recruitment Strategy
- Sampling method: stratified purposive sampling, supplemented by snowballing to reach elite interviewees (Harvey, 2011). Strata: organizational level (board/executives/middle managers/front line), business unit (generation/distribution/retail/storage/integrated energy), and key external parties (provincial development and reform commission/energy bureau, the power exchange, large industrial customers, distributed-energy developers, major financial institutions).
- Size and saturation: 30-40 interviews are targeted, with at least 5 per stratum; thematic saturation within homogeneous strata is expected at around 12±3 interviews (Guest et al., 2006), and the sample size is adjusted dynamically according to the "information power" principle (Malterud et al., 2016).

5. Interview Guide (example, semi-structured)
- Environmental drivers: Which three external changes over the past 12-18 months have most affected your unit's or organization's expectations of the Company, and through what mechanisms? (Probes: policy, prices, technology, capital markets)
- Market reform: Through what pathways do the spot market and ancillary-service mechanisms affect the Company's asset portfolio and earnings volatility?
- Capability assessment: Which key capabilities need to be added or upgraded for flexibility and digital trading, and how should the current gaps be measured?
- Investment and risk: What are the main uncertainties and key assumptions for capital projects above RMB 500 million over the next three years?
- Stakeholders: What are the key pain points in cooperating or negotiating with your organization, and which institutional improvements are feasible?
- Value creation: Which customer segments or use cases are most likely to generate incremental EBIT or cash flow, and what business models and partnership structures would they require?
- Barriers and accelerators: Which internal policies, processes, or cultural factors impede strategy execution, and which "low-hanging fruit" could be validated within 6-9 months?
- Priority judgment: Please rate each candidate issue on importance, urgency, and controllability on a 1-5 scale and explain your reasoning.
Interviews will use neutral probing, clarification, and example-checking techniques to reduce social desirability and leading effects (Kvale & Brinkmann, 2009).

6. Quality Assurance and Validity Controls
- Instrument validity: the guide is revised through expert review and 2-3 cognitive interviews; item-level and scale-level content validity indices (I-CVI, S-CVI) are computed, with acceptance thresholds of ≥ 0.78 and ≥ 0.90 respectively (Polit & Beck, 2006).
- Process credibility: the COREQ/SRQR reporting standards are followed, documenting the sampling frame, researcher positionality, interaction context, and evidence of data saturation (Tong et al., 2007; O'Brien et al., 2014).
- Researcher triangulation and audit trail: two or more coders code independently and reach consensus; decision logs and codebook evolution records are retained (Miles et al., 2014).
- Reliability: coding consistency is reviewed for the key theme set (consensus is reached through joint discussion; agreement indices and the disagreement-resolution process are reported where necessary; see O'Connor & Joffe's 2020 practical guidance on reliability in thematic analysis).
- Reflexivity and negative case analysis: researcher assumptions are documented; evidence inconsistent with the dominant narrative is actively sought and reported to strengthen the robustness of interpretation (Lincoln & Guba's framework; see Patton, 2015).

7. Data Analysis and Prioritization
- Qualitative analysis: thematic analysis, first deductive (based on PESTEL and the strategic issue categories) and then inductive, producing first- and second-level codes, issue statements, and evidence excerpts (Braun & Clarke, 2006; Miles et al., 2014).
- Evidence integration: "issue cards" present each issue's definition, generating mechanism, impact pathway, evidence sources, and interdependencies, cross-checked against the document analysis (Patton, 2015).
- Quantitative scoring and ranking: each issue is rated 1-5 on importance (impact on long-term value), urgency (criticality within 12-18 months), and controllability (internally mobilizable resources and influence), and the ratings are aggregated with weights; the weights are determined by management and external experts through AHP, with a consistency check (Saaty, 1980). A computational sketch follows after this list.
- Outputs: a 2×2 or 3D priority matrix, an issue dependency network, a scenario-assumption table, and "early signal" monitoring indicators.
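As an illustration of the AHP weighting and consistency check cited above (Saaty, 1980), here is a minimal Python sketch. The pairwise comparison matrix is a made-up example; the principal-eigenvector method and the 0.10 consistency-ratio threshold follow Saaty's standard procedure.

```python
import numpy as np

# Saaty's random consistency index (RI) for matrices of size 1..9.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(pairwise: np.ndarray):
    """Return criterion weights (principal eigenvector, normalized to sum to 1)
    and the consistency ratio CR = CI / RI; CR <= 0.10 is conventionally acceptable."""
    n = pairwise.shape[0]
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = np.argmax(eigvals.real)                 # index of the principal eigenvalue
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()
    lambda_max = eigvals[k].real
    ci = (lambda_max - n) / (n - 1)             # consistency index
    return w, ci / RI[n]

# Hypothetical pairwise comparisons among importance, urgency, and controllability.
# Entry (i, j) states how much more criterion i matters than criterion j (1-9 scale).
pairwise = np.array([
    [1.0, 3.0, 5.0],   # importance vs. (importance, urgency, controllability)
    [1/3, 1.0, 2.0],   # urgency
    [1/5, 1/2, 1.0],   # controllability
])

weights, cr = ahp_weights(pairwise)
print("weights:", np.round(weights, 3))    # criterion weights for score aggregation
print("consistency ratio:", round(cr, 3))  # should be <= 0.10
```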
8. Ethics and Compliance
- Informed consent: a written consent form explaining the study purpose, anonymization principles, data use, and the right to withdraw.
- Privacy and data security: de-identified transcripts; encrypted storage and tiered access; data retention and destruction policies that comply with local regulations (with reference to GDPR principles where applicable).
- Power asymmetry: for elite interviews, the interviewee chooses the time and place, may decline recording, and reviews quotations afterwards to reduce risk (Harvey, 2011).

9. Schedule and Deliverables (example, 12 weeks)
- Weeks 1-2: stakeholder mapping, instrument development and pilot interviews, CVI review;
- Weeks 3-7: formal interviews and document analysis, with rolling saturation assessment;
- Weeks 8-10: thematic analysis and priority scoring, AHP weighting and consistency checks;
- Weeks 11-12: consensus-building workshop and final report.
- Deliverables: strategic issue map and priority matrix; issue cards with an evidence appendix; methodological technical report (including the COREQ checklist and codebook); executive recommendations and a 90-day validation plan.

10. Risks and Mitigations
- Limited access and sample bias: maintain a list of substitute interviewees and triangulate across multiple sources;
- Social desirability and silence bias: anonymized reporting, explicit "safety statements," and indirect questioning;
- Interpretation bias: cross-researcher review, negative case analysis, and member checking.

References
- Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.
- Creswell, J. W., & Plano Clark, V. L. (2018). Designing and Conducting Mixed Methods Research (3rd ed.). Sage.
- Dutton, J. E., & Jackson, S. E. (1987). Categorizing strategic issues: Links to organizational action. Academy of Management Review, 12(1), 76–90.
- Eisenhardt, K. M. (1989). Building theories from case study research. Academy of Management Review, 14(4), 532–550.
- Freeman, R. E. (1984). Strategic Management: A Stakeholder Approach. Pitman.
- Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? Field Methods, 18(1), 59–82.
- Harvey, W. S. (2011). Strategies for conducting elite interviews. Qualitative Research, 11(4), 431–441.
- Johnson, G., Whittington, R., Scholes, K., Angwin, D., & Regnér, P. (2017). Exploring Strategy (11th ed.). Pearson.
- Kvale, S., & Brinkmann, S. (2009). Interviews: Learning the Craft of Qualitative Research Interviewing (2nd ed.). Sage.
- Malterud, K., Siersma, V. D., & Guassora, A. D. (2016). Sample size in qualitative interview studies: Guided by information power. Qualitative Health Research, 26(13), 1753–1760.
- Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative Data Analysis (3rd ed.). Sage.
- O'Brien, B. C., Harris, I. B., Beckman, T. J., Reed, D. A., & Cook, D. A. (2014). Standards for reporting qualitative research. Academic Medicine, 89(9), 1245–1251.
- Patton, M. Q. (2015). Qualitative Research & Evaluation Methods (4th ed.). Sage.
- Polit, D. F., & Beck, C. T. (2006). The content validity index. Research in Nursing & Health, 29(5), 489–497.
- Saaty, T. L. (1980). The Analytic Hierarchy Process. McGraw-Hill.
- Tong, A., Sainsbury, P., & Craig, J. (2007). Consolidated criteria for reporting qualitative research (COREQ). International Journal for Quality in Health Care, 19(6), 349–357.
- Yin, R. K. (2018). Case Study Research and Applications (6th ed.). Sage.
Quickly design course assessment cases and examination scenarios, clarify learning objectives and scoring rubrics, and generate teaching-research reports and improvement recommendations.
Build performance appraisal cases and behavioral indicators, design training-effectiveness evaluations and debrief plans, and output recommendations for promotion and competency models.
Build case study frameworks for client issues, develop questionnaires and interview guides, and produce evidence-based insights and implementation roadmaps.
Quickly generate research-method and case chapter structures, standardize citations and argumentation, refine theses and research proposals, and prepare defense materials.
Design usability evaluation scenarios and experimental plans, produce survey questionnaires and analysis outlines, and build an evidence chain that supports release decisions.
Build project impact evaluation cases and indicator systems, define evidence-collection methods and monitoring paths, and output policy recommendations and evaluation reports.
Helps educational institutions, corporate HR, and research and consulting teams quickly generate ready-to-use case study scenarios and evaluation plans. Users simply enter a topic and the desired language to obtain structured, evidence-oriented, stylistically consistent scenario designs, assessment approaches, indicators, and recommendations, suitable for case-based teaching in courses and training as well as for performance appraisal, questionnaire research, and project evaluation. Core value: faster drafting, more professional and more credible output, easier reuse, lower trial-and-error costs, and higher delivery quality and decision efficiency.
Copy and paste the prompt generated by the template into your preferred chat application (such as ChatGPT or Claude) and use it directly in conversation, with no extra development required. Suitable for quick personal trials and lightweight use cases.
Turn the prompt template into an API: your program can modify the template parameters freely and call it directly through the interface, making automation and batch processing easy. Suitable for developer integration and embedding in business systems.
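As a rough illustration of calling a prompt template through an API, here is a minimal Python sketch. The endpoint URL, authentication header, parameter names, and response field are placeholders rather than this service's actual interface; consult the API documentation for the real contract.

```python
import requests

API_URL = "https://api.example.com/v1/prompt-templates/case-study/render"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"  # placeholder credential

def render_case_study(topic: str, language: str = "en") -> str:
    """Render the case-study prompt template with custom parameters (hypothetical API)."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"topic": topic, "language": language},  # template parameters (assumed names)
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["prompt"]  # assumed response field

if __name__ == "__main__":
    # Batch processing example: render prompts for several topics in one run.
    for topic in ["course assessment rubric design", "ED performance appraisal"]:
        print(render_case_study(topic)[:200])
```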
Configure the corresponding server address in your MCP client so that your AI application can call the prompt templates automatically. Suitable for advanced users and team collaboration, letting prompts move seamlessly across different AI tools.