Create a Data Collection Plan

Updated Sep 29, 2025

Develops a detailed data collection plan for a given research topic, with professional methods and recommendations.

Example 1

The following data collection plan is designed to examine the effect of classroom interaction reform on student engagement in authentic educational settings. It is grounded in multidimensional engagement theory, a mixed-methods design, and triangulation across multiple sources of evidence, so as to secure the reliability, validity, and inferential strength of the measurements (Fredricks, Blumenfeld, & Paris, 2004; Creswell & Plano Clark, 2017).

I. Research Purpose and Questions
- Purpose: Evaluate the short- and medium-term effects of interaction-oriented classroom reform (e.g., peer assessment, questioning with peer discussion, immediate feedback) on student engagement across its behavioral, emotional, and cognitive dimensions.
- Research questions:
  1) Does the intervention significantly improve students' multidimensional engagement (overall and by dimension)?
  2) Do intervention effects vary by subject, grade level, or teacher characteristics (cross-level differences)?
  3) How is the classroom interaction ecology (teacher–student and student–student interaction patterns) related to changes in student engagement?

II. Research Design and Overall Strategy
- Design type: A stratified randomized or stepped-wedge cluster design (with classes/teachers as the unit of assignment) is preferred; if randomization is not feasible, use a matched quasi-experimental pretest–posttest comparison design. This design controls selection bias and supports stratified comparisons and estimation of time-point effects (Cohen, 1988).
- Methodological paradigm: Convergent parallel mixed methods. Quantitative data estimate the causal effect and its magnitude; qualitative material explains mechanisms and contextual conditions (Creswell & Plano Clark, 2017).
- Theoretical framework: Multidimensional engagement (behavioral/emotional/cognitive) and the ICAP framework (Interactive > Constructive > Active > Passive) (Fredricks et al., 2004; Chi & Wylie, 2014).

III. Sample and Sampling
- Research setting: K-12 or undergraduate general-education/STEM courses (chosen according to the actual setting). To improve external validity, sample across disciplines (e.g., mathematics/language/science) and stratify by grade level.
- Sampling strategy: Stratify by school/grade/subject, treat classes as clusters, and randomly assign them to an early versus delayed interaction-reform group, or run them in parallel with matched comparison classes.
- Sample size and power (brief): Estimate on the basis of multilevel models. Under typical within-school class-level intraclass correlations (ICC ≈ 0.05–0.15), detecting a medium effect (d ≈ 0.30) at 0.80 power requires at least 24 classes with at least 25 students each; the final figure should come from an a priori power calculation using the baseline ICC (Cohen, 1988).
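
The class counts above are sensitive to the assumed ICC and cluster size. As a minimal sketch (not a substitute for the formal a priori power analysis the plan calls for, e.g., in PowerUpR or Optimal Design), the standard design-effect approximation can be scripted to show how the required number of classes moves with those assumptions:

```python
from math import ceil
from scipy.stats import norm

def clusters_needed(d, icc, m, alpha=0.05, power=0.80):
    """Approximate classes per arm for a two-arm cluster-randomized design.

    d   : target standardized effect size (Cohen's d)
    icc : intraclass correlation at the class level
    m   : students per class
    Inflates the individually randomized sample size by the design effect;
    ignores covariate adjustment and unequal cluster sizes.
    """
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    n_per_arm = 2 * (z_a + z_b) ** 2 / d ** 2   # individually randomized n per arm
    deff = 1 + (m - 1) * icc                    # design effect for clustering
    return ceil(n_per_arm * deff / m)           # classes per arm

for icc in (0.05, 0.10, 0.15):
    print(f"ICC={icc}: {clusters_needed(d=0.30, icc=icc, m=25)} classes per arm")
```

Baseline covariates (e.g., prior achievement) reduce the required number of classes; the formal power analysis should incorporate them.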

IV. Intervention Definition and Implementation Fidelity
- Core intervention components (examples): Think–Pair–Share, peer instruction with in-class polling (clickers or online voting), high-quality questioning with wait time, structured small-group discussion, formative assessment with immediate feedback, and task-centered interaction scripts (aligned with ICAP principles; Chi & Wylie, 2014).
- Fidelity of Implementation (FOI) indicators: coverage (proportion of planned activities delivered), dosage (interaction time/frequency per lesson), quality (depth of interaction, quality of follow-up questioning), and adherence (consistency with the training protocol). Triangulate teacher logs, classroom observation checklists, and interviews (Century, Rudnick, & Freeman, 2010).

V. Data Sources and Measurement Instruments (Multi-Source Triangulation)
1) Student self-report scales (multidimensional engagement)
   - Student Engagement Instrument (SEI; Appleton, Christenson, Kim, & Reschly, 2006): emphasizes cognitive and psychological engagement (suitable for middle and high school).
   - Engagement vs. Disaffection scale (Skinner, Kindermann, & Furrer, 2009): measures behavioral and emotional engagement/disaffection.
   - In higher education settings, supplement with relevant NSSE subscales to capture in-class participation and collaborative learning (Kuh, 2009).
   - Localization and piloting: use a translation–back-translation procedure, cognitive interviews, and confirmatory factor analysis (CFA) to check structural validity and measurement invariance over time (targets: α ≥ 0.70; RMSEA ≤ 0.08, CFI ≥ 0.90).

2) Classroom observation (classroom interaction ecology)
   - COPUS to quantify the distribution of teacher and student activities (suited to undergraduate STEM; Smith, Jones, Gilbert, & Wieman, 2013).
   - CLASS for the quality of teacher–student interaction and emotional/organizational/instructional support (K-3 or the corresponding version; Pianta, La Paro, & Hamre, 2008).
   - ICAP behavioral coding: label learning activities at the P/A/C/I levels (Chi & Wylie, 2014).
   - Observation sampling: systematic sampling covering each class at least three times (baseline/midpoint/posttest), one full lesson per visit; train observers and maintain an operations manual. Target inter-observer agreement: Cohen's κ or ICC ≥ 0.70 (Shrout & Fleiss, 1979); a minimal reliability-check sketch covering both scale α and coder agreement appears at the end of this section.

3) Behavioral and process data (objective indicators)
   - Attendance, on-time assignment submission rate, and counts of in-class contributions/questions (from observation records or the course management system).
   - Learning analytics data: LMS clickstreams, resource access, numbers of posts and replies, dwell time, and so on (aligned with privacy regulations; Long & Siemens, 2011).
   - Momentary time-sampling of students' on-task state in the moment (Altmann, 1974).

4) Experience sampling method (ESM)
   - Brief in-class prompts (1–3 items) measuring momentary engagement, interest, and cognitive effort (1–2 times per week, low burden), used to capture fluctuations over time (Hektner, Schmidt, & Csikszentmihalyi, 2007).

5) Qualitative data
   - Student focus groups and semi-structured interviews: probe the interaction experience, psychological safety, the challenge–support balance, and barriers.
   - Teacher interviews and reflective journals: document instructional decisions, implementation tensions, and contextual constraints to help interpret the quantitative results.

6) Background covariates
   - Student level: baseline achievement, gender/age, prior motivation indicators, etc.
   - Class/teacher level: class size, subject, teacher seniority and professional development history.
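
For the piloting and observer-calibration thresholds referenced in items 1) and 2) above (α ≥ 0.70; κ/ICC ≥ 0.70), a minimal reliability-check sketch might look like the following. The data are simulated placeholders; the real checks would run on pilot survey responses and double-coded observation segments.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of Likert responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Simulated pilot responses (rows = students, cols = scale items) -- illustrative only.
rng = np.random.default_rng(0)
pilot = rng.integers(1, 6, size=(80, 8)).astype(float)
print("alpha =", round(cronbach_alpha(pilot), 2))      # flag scales below .70

# Observer agreement on ICAP codes (P/A/C/I) for double-coded lesson segments.
rater_a = ["I", "C", "A", "P", "I", "C", "C", "A"]
rater_b = ["I", "C", "A", "A", "I", "C", "P", "A"]
print("kappa =", round(cohen_kappa_score(rater_a, rater_b), 2))  # target >= .70
```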

VI. Measurement Time Points and Procedures
- T0 baseline (weeks 0–2): scales (SEI/EvsD), classroom observation (once per class), background covariates, initial LMS data pull.
- Intervention period (weeks 3–12 or one semester): at least two classroom observations per class; ESM 1–2 times per week; continuous collection of behavioral and LMS data; weekly teacher logs; monthly fidelity checks.
- T1 posttest (end of semester): readminister scales, classroom observation (once per class), teacher and student interviews.
- T2 follow-up (+6–8 weeks, optional): readminister short-form scales and re-collect key behavioral indicators to test persistence.
- Under the stepped-wedge design, classes enter the intervention in batches so that every class contributes both pre- and post-intervention periods (see the scheduling sketch below).
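
To illustrate the batch rollout in the last point, here is a minimal sketch of a randomized stepped-wedge schedule; the cluster count and number of steps are illustrative, and the real schedule would follow the randomization protocol.

```python
import numpy as np
import pandas as pd

def stepped_wedge(n_clusters: int, n_steps: int, seed: int = 42) -> pd.DataFrame:
    """Random stepped-wedge schedule: every class starts in control (0) and
    crosses over to the intervention (1) at its assigned step, so each class
    contributes both pre- and post-intervention periods."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_clusters)            # random rollout order
    batches = np.array_split(order, n_steps)       # one batch of classes per step
    schedule = np.zeros((n_clusters, n_steps + 1), dtype=int)
    for step, batch in enumerate(batches, start=1):
        schedule[batch, step:] = 1                 # switched on from this period onward
    return pd.DataFrame(schedule,
                        index=[f"class_{i:02d}" for i in range(n_clusters)],
                        columns=[f"T{t}" for t in range(n_steps + 1)])

print(stepped_wedge(n_clusters=24, n_steps=3))
```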

VII. Quality Control and Reliability/Validity Safeguards
- Instrument piloting and localization: run a small pilot study (n ≈ 60–100) to check item comprehension and scale structure; fine-tune items based on CFA/IRT results.
- Observer training: calibrate with annotated videos; observers begin live coding only after reaching the κ/ICC threshold; retrain periodically to prevent drift.
- Triangulation: cross-validate self-reports, observations, behavioral/log data, ESM, and interviews to reduce common method bias.
- Data completeness: set up a missing-data monitoring dashboard; run in-class and online collection in parallel to reduce missingness; record reasons for missingness to support later handling.

VIII. Ethics and Data Management
- Ethics: follow the Belmont Report and local IRB/ethics committee requirements; obtain informed consent, guardian consent, and student assent; minimize identifiability of minors' data; participants may withdraw at any time without academic consequences (The Belmont Report, 1979).
- Privacy and security: de-identification, tiered access, encrypted storage; separate key management for sensitive data.
- Data management and sharing: prepare a codebook, a variable dictionary, and version control; annotate metadata and share in compliance with the FAIR principles (Wilkinson et al., 2016); preregister hypotheses, primary outcomes, and the data collection protocol on OSF, and document protocol deviations.

IX. Biases, Threats, and Mitigation
- Selection bias: random/stratified assignment; for the quasi-experimental variant, use propensity score matching and covariate adjustment (a minimal matching sketch follows this list).
- Hawthorne/observer effects: extend the observation period, mask the observation purpose (within ethical limits), and use non-intrusive logging.
- Teacher effects/peer contamination: assign at the class level, minimize teachers teaching across arms, and include teacher random effects in the models.
- Common method bias: measure the same construct with different methods and stagger collection time points.
- Implementation heterogeneity: incorporate FOI indicators and run subgroup or dose–response analyses (Century et al., 2010).
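
For the quasi-experimental variant, the propensity-score matching mentioned under selection bias could be prototyped roughly as below. The column names (treatment flag and covariate list) are hypothetical, and overlap diagnostics, calipers, and balance checks would still be required before any effect estimation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def ps_match(df: pd.DataFrame, treat_col: str, covariates: list[str]) -> pd.DataFrame:
    """1:1 nearest-neighbour matching on the estimated propensity score.
    Returns the treated rows plus their matched controls (with replacement)."""
    X = df[covariates].to_numpy()
    t = df[treat_col].to_numpy()
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated = np.where(t == 1)[0]
    control = np.where(t == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    matched_controls = control[idx.ravel()]
    return df.iloc[np.concatenate([treated, matched_controls])]

# Usage sketch: matched = ps_match(df, "treatment", ["baseline_engagement", "gpa", "grade"])
```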

X. Timeline (One-Semester Example)
- Weeks -4 to -1: instrument localization and piloting, observer training, power analysis and randomization, ethics approval, and home–school communication.
- Weeks 1–2: baseline collection (T0).
- Weeks 3–12: intervention and ongoing collection (observation/ESM/LMS/logs/FOI).
- Weeks 13–14: posttest (T1) and interviews.
- Weeks +6–8: follow-up measurement (T2, optional).
- Throughout: data quality control, missingness tracking, fidelity checks, and process documentation.

XI. Feasibility and Burden Control
- Reducing burden: use short-form scales administered in stages; keep ESM items lean; align observation scheduling with the teaching calendar.
- Feedback and incentives: provide teachers and schools with classroom-interaction profiles and improvement suggestions in exchange for scheduling cooperation and data access.
- Equipment and platforms: use the existing LMS and a mobile micro-survey platform to minimize additional deployment.

References (APA 7th edition)
- Appleton, J. J., Christenson, S. L., Kim, D., & Reschly, A. L. (2006). Measuring cognitive and psychological engagement: Validation of the Student Engagement Instrument. Journal of School Psychology, 44(5), 427–445.
- Century, J., Rudnick, M., & Freeman, C. (2010). A framework for measuring fidelity of implementation: A foundation for shared language and accumulation of knowledge. American Journal of Evaluation, 31(2), 199–218.
- Chi, M. T. H., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
- Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE.
- Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1), 59–109.
- Hektner, J. M., Schmidt, J. A., & Csikszentmihalyi, M. (2007). Experience sampling method: Measuring the quality of everyday life. SAGE.
- Kuh, G. D. (2009). The National Survey of Student Engagement: Conceptual and empirical foundations. New Directions for Institutional Research, 2009(141), 5–20.
- Long, P., & Siemens, G. (2011). Penetrating the fog: Analytics in learning and education. EDUCAUSE Review, 46(5), 31–40.
- Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System (CLASS) manual, K–3. Paul H. Brookes.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
- Skinner, E. A., Kindermann, T. A., & Furrer, C. J. (2009). A motivational perspective on engagement and disaffection: Conceptualization and assessment of children’s behavioral and emotional participation in academic activities in the classroom. Journal of Educational Psychology, 101(4), 765–781.
- Smith, M. K., Jones, F. H. M., Gilbert, S. L., & Wieman, C. E. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new tool for characterizing university STEM classroom practices. CBE—Life Sciences Education, 12(4), 618–627.
- The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont report.
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Built around multi-source data, a stratified design, and implementation-fidelity tracking, this plan can capture the average effect of classroom interaction reform on engagement while also revealing contextual differences and mechanistic pathways, ensuring the credibility and applicability of the research conclusions.

Example 2

Data Collection Plan: The Effect of Blended Learning on Academic Self-Efficacy

1. Purpose and Research Questions
- Purpose: To estimate the causal effect of blended learning (BL) on students’ academic self-efficacy (ASE) and to document implementation fidelity and contextual moderators.
- Primary question: Does participation in a well-specified BL course increase ASE relative to traditional face-to-face instruction?
- Secondary questions:
  - Do effects vary by baseline ASE, prior achievement, discipline, or gender?
  - Are BL effects mediated by mastery experiences and instructional presence?
  - How do implementation fidelity and engagement relate to ASE change?

Rationale: BL may enhance ASE by increasing mastery opportunities, feedback, and learner control (Bandura, 1997; Graham, 2006; Means et al., 2013).

2. Design Overview
- Preferred design: Cluster randomized controlled trial (cRCT) at the course-section level to minimize contamination (randomize sections to BL vs. business-as-usual face-to-face).
- Alternative (if randomization not feasible): Quasi-experimental matched comparison with propensity score methods and difference-in-differences, using baseline ASE, prior achievement, demographics, and prior online experience as covariates (following causal-inference principles in Murnane & Willett; see also WWC standards).
- Mixed-methods concurrent design: Quantitative (surveys, LMS logs, administrative data) with qualitative interviews/focus groups for triangulation and explanatory depth (Creswell & Plano Clark, 2017).

3. Setting, Sampling, and Participants
- Setting: Multiple undergraduate gateway courses (e.g., introductory psychology, biology, statistics) across 2–4 institutions to enhance generalizability (Graham, 2006).
- Units:
  - Clusters: Course sections (target 40–80 sections total, balanced across arms and disciplines).
  - Students: All enrolled students in selected sections; anticipated n per section = 25–40.
  - Instructors: Faculty teaching participating sections.
- Inclusion criteria: Degree-seeking undergraduates enrolled at census; instructors willing to implement the specified BL model or the standard face-to-face format.
- Exclusion criteria: Sections taught by graduate assistants without training support; courses with existing extensive online components in the control arm.

4. Intervention and Comparison (for Fidelity Anchoring)
- BL treatment: Pre-specified blended model with defined dosage (e.g., 30–50% online), structured weekly online modules (readings, quizzes, discussion), and in-person active learning sessions. Alignment documented via a BL design template and a checklist derived from established blended/course quality frameworks (e.g., Community of Inquiry presence indicators; Arbaugh et al., 2008; Graham et al., 2013).
- Control: Business-as-usual face-to-face delivery without systematic online components beyond standard LMS posting.

5. Outcome Measures
Primary outcome: Academic self-efficacy
- Instrument: MSLQ Self-Efficacy for Learning and Performance subscale (8 items; 1–7 Likert) administered at baseline (T0), midterm (T1), and end of term (T2) (Pintrich et al., 1991; Pintrich et al., 1993).
- Evidence: Consistently high internal consistency for college samples (typically α ≈ .90) and demonstrated predictive validity for achievement (Pintrich et al., 1993).

Secondary/auxiliary measures (for mechanism, covariate adjustment, and sensitivity)
- Sources of self-efficacy (mastery experiences, vicarious experiences, social persuasion, physiological states), adapted from domain-appropriate scales following Bandura’s construction guidelines (Bandura, 2006; Usher & Pajares, 2009).
- Teaching, social, and cognitive presence (CoI survey; Arbaugh et al., 2008) to capture instructional/learning environment characteristics associated with BL.
- Prior achievement: Cumulative GPA or standardized placement scores; baseline course diagnostic if available.
- Engagement/effort: LMS activity metrics (time-on-task, resource views, assignment submission patterns) and brief self-report engagement scale.
- Demographics: Age, gender, major/discipline, first-generation status, prior online course experience.

6. Implementation Fidelity and Exposure
- Instructor-reported adherence: Weekly implementation logs aligned to the BL design template (dosage of online components, use of active learning).
- Independent observations: Structured observations of in-person sessions using a validated active-learning checklist.
- LMS analytics: Actual BL dosage (e.g., completion of online modules, discussion participation, quiz attempts).
- Fidelity rubric: Global fidelity scores synthesized from logs, observations, and analytics (O’Donnell, 2008).
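
One way the global fidelity rubric could be operationalized, purely as an illustrative sketch with hypothetical column names and arbitrary weights, is a weighted composite of the three evidence streams:

```python
import pandas as pd

# Illustrative weights only; the real rubric should be finalized with the study team.
WEIGHTS = {"log_adherence": 0.4, "observed_active_learning": 0.3, "lms_dosage": 0.3}

def fidelity_index(df: pd.DataFrame) -> pd.Series:
    """Rescale each evidence stream to 0-1 and combine into one weighted score per section."""
    scaled = (df - df.min()) / (df.max() - df.min())
    return sum(w * scaled[col] for col, w in WEIGHTS.items())

sections = pd.DataFrame(
    {"log_adherence": [0.90, 0.70, 0.50],          # share of planned BL elements reported delivered
     "observed_active_learning": [3.8, 2.9, 3.1],  # mean score on the active-learning checklist
     "lms_dosage": [0.85, 0.60, 0.75]},            # completion rate of required online modules
    index=["sec_01", "sec_02", "sec_03"])
print(fidelity_index(sections).round(2))
```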

7. Timing and Procedures
- Pre-semester (T−1): Instructor recruitment; randomization at section level; training for BL instructors; pilot testing of instruments; LMS instrumentation.
- Week 1–2 (T0): Baseline student survey (ASE, demographics, prior online experience), consent, and retrieval of prior GPA. Randomization concealed from analysts; treatment known to implementers.
- Midterm (T1): Short ASE assessment, CoI survey, and engagement check to examine trajectories and reduce common method bias via temporal separation (Podsakoff et al., 2003).
- End of term (T2): Posttest ASE, CoI, course grade collection; instructor fidelity summaries; LMS data export.
- Postterm (T3, optional): Follow-up ASE 6–8 weeks later to assess persistence.
- Qualitative sampling: Purposive subsample (e.g., n ≈ 20–30 students per condition across disciplines; 8–12 instructors) for semi-structured interviews near T2 to explain quantitative patterns (Creswell & Plano Clark, 2017).

8. Data Quality Assurance
- Pilot: Cognitive interviews with 8–12 students to ensure clarity and contextual fit of items; small pilot (n ≈ 60–80) to examine reliability and preliminary factor structure.
- Measurement equivalence: Test measurement invariance (configural, metric, scalar) for the MSLQ ASE subscale across conditions and time points before estimating effects (Putnick & Bornstein, 2016).
- Administration controls: Uniform survey windows, standardized reminders, proctoring in class where feasible to maximize response rates; incentives (e.g., small course credit or raffle) approved by IRB.
- Nonresponse management: Track response propensity; implement targeted reminders; document reasons for attrition.

9. Sample Size and Power Guidance
- Plan for a cRCT with students nested in sections. Use established software (e.g., Optimal Design) with plausible assumptions: small effect size (d = 0.20–0.25), intraclass correlation at section level ICC ≈ .03–.07 for psychosocial outcomes, covariate R2 ≈ .40 from baseline ASE and GPA. Aim for at least 30–40 sections per arm with 25–35 students each to achieve ≈ .80 power for small effects (Hedges & Rhoads, 2010; Raudenbush & Bryk, 2002). Final targets should be set via a formal power analysis with local ICC estimates.
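
As a rough complement to the software-based analysis above, the standard two-level minimum detectable effect size (MDES) formula used by PowerUp-style tools can be scripted to see how the detectable effect responds to the assumed ICC, covariate R², and number of sections. This is an approximation under the bullet's assumptions, not a replacement for the formal power analysis with local ICC estimates.

```python
from scipy.stats import t

def mdes_crt2(J, n, icc, R2_1=0.0, R2_2=0.0, P=0.5, alpha=0.05, power=0.80, g=1):
    """MDES for a 2-level cluster RCT (J sections, n students per section).

    R2_1 / R2_2 : proportion of student-/section-level variance explained by covariates
    P           : proportion of sections assigned to treatment
    g           : number of section-level covariates (affects degrees of freedom)
    """
    df = J - g - 2
    multiplier = t.ppf(1 - alpha / 2, df) + t.ppf(power, df)
    var = (icc * (1 - R2_2) / (P * (1 - P) * J)
           + (1 - icc) * (1 - R2_1) / (P * (1 - P) * J * n))
    return multiplier * var ** 0.5

# Scan totals of 60-80 sections with n = 30, ICC = .05, covariate R^2 = .40 at both levels
for J in (60, 70, 80):
    print(J, "sections -> MDES ≈", round(mdes_crt2(J, n=30, icc=0.05, R2_1=0.40, R2_2=0.40), 3))
```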

10. Data Management and Security
- Unique study IDs linking surveys, LMS logs, and grades; no storage of direct identifiers with analytic datasets.
- Secure, encrypted storage on institutional servers; access controls; audit logs.
- Codebook, metadata, and version control (e.g., Git with restricted access).
- Pre-registration of hypotheses, primary outcomes, and analysis plan (e.g., OSF) before data collection.
- Data sharing: De-identified datasets and instruments archived following FAIR principles when permitted (Wilkinson et al., 2016).

11. Ethical Considerations
- IRB approval; informed consent specifying voluntary participation, data uses, and withdrawal rights.
- FERPA-compliant handling of educational records; minimal-risk classification anticipated.
- Mitigation of risks: Anonymized reporting; instructor training to ensure equitable learning opportunities in both arms.

12. Minimizing Bias and Ensuring Internal Validity
- Randomization by an independent coordinator; blocked by course and instructor to balance prior outcomes (a minimal blocked-assignment sketch follows this list).
- Baseline measures to adjust for any residual imbalance.
- Analyst blinding to treatment during data cleaning and pre-specification of models.
- Multiple data sources (surveys, logs, grades) and temporal separation to reduce common method variance (Podsakoff et al., 2003).
- Monitoring protocol deviations and documenting contamination.
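
A minimal sketch of the blocked randomization described in the first bullet might look like the following; the section identifiers and block keys are hypothetical, and the production procedure would be run and archived by the independent coordinator.

```python
import random
from collections import defaultdict

def blocked_assignment(sections, block_keys=("course", "instructor"), seed=2025):
    """Randomize sections to BL vs. control within course-by-instructor blocks,
    alternating arms within each shuffled block to keep the arms balanced."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for sec in sections:
        blocks[tuple(sec[k] for k in block_keys)].append(sec["id"])
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, sec_id in enumerate(members):
            assignment[sec_id] = "BL" if i % 2 == 0 else "control"
    return assignment

sections = [{"id": "S1", "course": "PSY101", "instructor": "A"},
            {"id": "S2", "course": "PSY101", "instructor": "A"},
            {"id": "S3", "course": "BIO110", "instructor": "B"},
            {"id": "S4", "course": "BIO110", "instructor": "B"}]
print(blocked_assignment(sections))
```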

13. Analysis-Linked Data Needs (to inform collection)
- Primary impact: Change in ASE (T2 minus T0) estimated with multilevel modeling (students nested in sections), adjusting for baseline ASE and covariates (Raudenbush & Bryk, 2002); a minimal model sketch follows this list.
- Moderation: Interaction terms for baseline ASE, discipline, gender.
- Mediation (exploratory): Mastery experiences and teaching presence measured at T1 and T2 (Bandura, 1997; Arbaugh et al., 2008).
- Sensitivity: Per-protocol and complier-average causal effect analyses using fidelity indices; missing data handled via FIML or multiple imputation under MAR assumptions (Enders, 2010).
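
The multilevel impact model in the first bullet could be prototyped along these lines with statsmodels; the variable names (ase_t2, treatment, section_id, etc.) are placeholders for the study's actual codebook.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_impact_model(df: pd.DataFrame):
    """Random-intercept model with students nested in sections, adjusting for
    baseline ASE and prior GPA; the treatment coefficient is the adjusted impact."""
    model = smf.mixedlm("ase_t2 ~ treatment + ase_t0 + gpa_baseline",
                        data=df, groups="section_id")
    return model.fit()

# Usage sketch: result = fit_impact_model(analytic_df); print(result.summary())
```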

14. Reporting and Documentation
- Document recruitment, participation flow, response rates, and attrition by arm (CONSORT for cRCTs adapted to education).
- Report reliability (Cronbach’s alpha, omega), CFA/invariance results, and fidelity statistics.
- Provide a transparent account of deviations from the pre-registered plan and robustness checks.

References
- Arbaugh, J. B., Cleveland-Innes, M., Diaz, S. R., Garrison, D. R., Ice, P., Richardson, J. C., & Swan, K. P. (2008). Developing a community of inquiry instrument: Testing a measure of the Community of Inquiry framework using a multi-institutional sample. The Internet and Higher Education, 11(3–4), 133–136.
- Bandura, A. (1997). Self-efficacy: The exercise of control. W. H. Freeman.
- Bandura, A. (2006). Guide for constructing self-efficacy scales. In F. Pajares & T. Urdan (Eds.), Self-efficacy beliefs of adolescents (pp. 307–337). Information Age.
- Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE.
- Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
- Graham, C. R. (2006). Blended learning systems: Definition, current trends, and future directions. In C. J. Bonk & C. R. Graham (Eds.), The handbook of blended learning (pp. 3–21). Pfeiffer.
- Graham, C. R., Woodfield, W., & Harrison, J. B. (2013). A framework for institutional adoption and implementation of blended learning in higher education. The Internet and Higher Education, 18, 4–14.
- Hedges, L. V., & Rhoads, C. (2010). Statistical power analysis in education research (NCEE 2010-4017). U.S. Department of Education.
- Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record, 115(3), 1–47.
- O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes. Review of Educational Research, 78(1), 33–84.
- Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). University of Michigan.
- Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the MSLQ. Educational and Psychological Measurement, 53(3), 801–813.
- Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research. Journal of Applied Psychology, 88(5), 879–903.
- Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance in the social and behavioral sciences. Child Development Perspectives, 10(1), 29–34.
- Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). SAGE.
- Usher, E. L., & Pajares, F. (2009). Sources of self-efficacy in mathematics: A validation study. Contemporary Educational Psychology, 34(1), 89–101.
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Note: If a quasi-experimental design is required, add a data element for teacher/course matching variables (e.g., historical grading distributions, class size, instructor experience) to strengthen the propensity model and include a clear overlap/diagnostics protocol before estimating effects.

Example 3

The following is a data collection plan for a pre–post evaluation of the effect of teacher training on teaching quality. The plan centers on causal inference and measurement rigor, emphasizing multi-source data, standardized instruments, reliability and validity safeguards, and implementation-fidelity monitoring, so that estimates of the training's impact are accurate, reproducible, and generalizable.

I. Research Purpose and Evaluation Framework
- Purpose: Evaluate the impact of a specific teacher training program on classroom teaching quality, and explore mechanisms (e.g., implementation fidelity, changes in teacher self-efficacy) and heterogeneous effects (seniority, subject, school stage).
- Core hypotheses: Compared with controls, trained teachers score higher on observed teaching quality, student-perceived teaching quality, and tests of teacher knowledge/skills aligned with the training content, and these effects persist or grow at follow-up; effects on student engagement and achievement are positive but relatively small.
- Evaluation principles: triangulation (classroom observation + student surveys + objective assignments/assessments + teacher self-report), multiple measurement points (baseline/posttest/follow-up), and parallel collection of implementation-process data (fidelity and quality).

II. Research Design and Sample
- Design type:
  1) Preferred: cluster randomized controlled trial (with schools or teaching-research groups as the unit), with a T0 baseline, a T1 short-term posttest (2–4 weeks after training ends), a T2 medium-term follow-up (3–4 months), and a T3 long-term follow-up (6–9 months).
  2) Second-best: quasi-experimental difference-in-differences (DiD), with baseline equivalence testing plus propensity score weighting/matching and covariate adjustment.
- Sample and stratification: cover multiple school stages and subjects, with stratified sampling by stage × subject × region to protect external validity; recruit all target teachers in each school or draw a random sample; student surveys and achievement data come from the classes they teach.
- Size and power (illustrative; recalculate with real parameters): with observed teaching-quality scores as the primary outcome, an expected standardized effect of d ≈ 0.30, α = 0.05, 80% power, and a two-level structure (teachers nested in schools, within-school ICC ≈ 0.05–0.10), about 10–15 schools per arm with 10 target teachers per school (roughly 200–300 teachers in total) is the more robust target; use PowerUpR or Optimal Design to run the power analysis with actual ICCs, variances, and attrition rates (Spybrook et al., 2011).
- Baseline equivalence: test standardized differences according to WWC standards; if 0.05 < |SMD| ≤ 0.25, subsequent models must adjust for the covariate (What Works Clearinghouse, 2020).
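
The WWC baseline-equivalence check in the last bullet reduces to a standardized mean difference on the baseline measure; a minimal sketch with toy numbers is shown below.

```python
import numpy as np

def baseline_smd(treat: np.ndarray, control: np.ndarray) -> float:
    """Standardized baseline difference using the pooled SD (no small-sample correction)."""
    n1, n2 = len(treat), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treat.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    return (treat.mean() - control.mean()) / pooled_sd

smd = baseline_smd(np.array([3.2, 3.5, 3.1, 3.8]), np.array([3.0, 3.4, 3.3, 3.6]))
if abs(smd) > 0.25:
    print("not baseline equivalent under WWC rules")
elif abs(smd) > 0.05:
    print("equivalent only with statistical adjustment for this covariate")
else:
    print("baseline equivalent")
```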

III. Indicator System and Measurement Instruments
A. Primary outcome (classroom teaching quality)
- Classroom observation instruments (choose one as primary; a second may be added for robustness checks):
  1) CLASS (pre-K, elementary, and secondary versions): emotional support, classroom organization, and instructional support dimensions, suitable for evaluation across school stages (Pianta et al.).
  2) Danielson Framework for Teaching: planning and preparation, classroom environment, instruction, and professional responsibilities (Danielson, 2013).
  3) If the training targets subject-specific instruction, a subject-specific instrument such as the Mathematical Quality of Instruction (MQI) can be used (Hill et al.).
- Observation design: at each time point, record 2–3 routine lessons per teacher (35–45 minutes each), scored blindly by two independent raters; double-score at least 20% of the sample and run regular drift calibration; use many-facet Rasch or generalizability theory to decompose error and correct for rater severity (Shavelson & Webb, 1991). A minimal ICC sketch for the double-scored videos appears at the end of this section.

B. Secondary outcomes
- Student-perceived teaching quality: a standardized student survey (e.g., Tripod 7Cs or a comparable validated instrument), aggregated to the class level; at least 20 student questionnaires per class to improve reliability (MET project evidence; Kane et al.).
- Student learning outcomes: curriculum-aligned standardized tests or high-quality course baseline/final scores; if a researcher-developed test is used, include anchor items for pre–post equating and calibrate with item response theory (IRT).
- Teacher knowledge and skills (aligned with training content): e.g., tests of pedagogical content knowledge; use an existing validated item bank, or develop items with expert review, piloting, reliability analysis, and equating.
- Teacher self-efficacy: Teachers' Sense of Efficacy Scale, TSES (long/short form) (Tschannen-Moran & Hoy, 2001), after cross-cultural adaptation and measurement invariance testing.
- Instructional artifacts and assignment quality: collect lesson plans, assignments, and grading samples, scored blindly with a validated rubric (e.g., the IQA framework).
- Student behavioral engagement: a short classroom-engagement observation checklist or experience sampling (where feasible).

C. Implementation process and fidelity
- Indicator dimensions: attendance/duration (dosage), activity completion (adherence/consistency), trainer quality and interaction quality, and the intensity of practice feedback and peer exchange (following Desimone, 2009; Century et al., 2010).
- Instruments: sign-in and duration records, trainer and coach logs, sampled sit-in rating forms, teacher reflection notes and assignment-completion scales, and interviews/focus groups.
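
For the double-scored videos described under the primary outcome, inter-rater agreement for continuous observation scores is usually summarized with ICC(2,1) (Shrout & Fleiss, 1979). A minimal sketch with toy ratings follows; it would be applied to the ≥ 20% double-scored sample.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1), Shrout & Fleiss (1979): two-way random effects, absolute agreement,
    single rater. Rows = lessons, columns = raters."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Double-scored CLASS-style ratings (rows = lessons, columns = the two raters) -- toy values.
ratings = np.array([[5.0, 5.5], [4.0, 4.5], [6.0, 5.5], [3.5, 3.0], [5.5, 6.0]])
print(round(icc_2_1(ratings), 2))   # flag if below the 0.70 target
```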

IV. Data Collection Procedures and Schedule
- T0 baseline (2–4 weeks before training):
  1) Teachers: 2–3 lesson videos; knowledge/skills test; TSES; basic background (seniority, highest degree, professional rank, class size, teaching load, etc.).
  2) Students: perceived-teaching-quality survey; the most recent available standardized achievement data, or administer a baseline test.
  3) Schools: resource and management characteristics (size, student–teacher ratio, instructional routines, regional attributes).
- During training: rolling collection of process data (sign-ins for every activity, logs, in-session qualitative notes); regular records and spot checks by coaches/supervisors.
- T1 posttest (2–4 weeks after training ends):
  1) Teachers: 2–3 lesson videos; a parallel or equated form of the knowledge/skills test; TSES.
  2) Students: perceived-teaching-quality survey; unit or end-of-term standardized test.
  3) Implementation fidelity: full compilation and quality control.
- T2/T3 follow-ups (3–4 months / 6–9 months): a reduced version of T1, including at minimum lesson videos and student surveys; obtain standardized achievement data where possible and assess maintenance effects.
- Scheduling and on-site organization: avoid exam weeks and holidays; randomize the recording order; standardize recording specifications (camera position, audio, framing); establish incident-reporting and backup-collection plans.

V. Quality Control and Reliability/Validity Safeguards
- Instrument selection and cross-cultural adaptation: prefer validated instruments; if translation is needed, follow a translation, back-translation, expert review, cognitive interview, and small-sample pilot process (Beaton et al., 2000), followed by confirmatory factor analysis and measurement invariance tests (across time points and groups).
- Rater training and calibration:
  1) Training duration and qualification standards: follow the instrument's official requirements (e.g., CLASS requires certification); raters score live data only after passing the assessment.
  2) Blind scoring and drift control: raters are blind to both condition and time point; hold monthly co-scoring and threshold-calibration sessions; compute ICC/weighted kappa with a target of ≥ 0.70.
  3) Generalizability theory / many-facet Rasch: estimate how the numbers of tasks, raters, and durations contribute to reliability, and optimize the sampling design to reach a G coefficient of ≥ 0.70.
- Test quality:
  1) Pre–post equating: include anchor items; link scores with IRT or equipercentile methods and guard against ceiling/floor effects (a minimal anchor-item linking sketch appears at the end of this section).
  2) Internal consistency: Cronbach's α or ω ≥ 0.70; inspect item response characteristics and handle biased items.
- Student survey administration:
  1) Uniform proctoring and electronic administration; minimize social desirability bias and teacher interference.
  2) Aggregate to the class level and report design effects; set a minimum sample threshold for small classes (≥ 15–20 recommended).
- Blinding and bias control: keep observation and scoring blind throughout; offer the control group delayed training to raise consent rates; guard against Hawthorne effects and compensatory rivalry.
- Data completeness and missingness:
  1) Monitor key KPIs (video completion rate, double-scoring proportion, valid questionnaire rate, test missingness rate).
  2) Establish make-up and alternative sampling mechanisms; record reasons for missingness; use multiple imputation at the analysis stage and report sensitivity analyses.
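
For the anchor-item equating mentioned under test quality, the simplest illustration is mean/sigma linking of IRT difficulty parameters. The sketch below uses toy values; in practice, characteristic-curve methods (e.g., Stocking–Lord) implemented in dedicated IRT software would usually be preferred.

```python
import numpy as np

def mean_sigma_link(b_anchor_new: np.ndarray, b_anchor_old: np.ndarray):
    """Mean/sigma linking constants A, B that place new-form item difficulties on the
    old-form scale via b_old = A * b_new + B, estimated from the anchor items only."""
    A = b_anchor_old.std(ddof=1) / b_anchor_new.std(ddof=1)
    B = b_anchor_old.mean() - A * b_anchor_new.mean()
    return A, B

# Anchor-item difficulties estimated separately on the pretest and posttest forms (toy values).
b_new = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_old = np.array([-1.0, -0.3, 0.2, 0.9, 1.7])
A, B = mean_sigma_link(b_new, b_old)
print(round(A, 3), round(B, 3))   # apply to all new-form difficulties before rescoring
```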

VI. Data Management and Ethical Compliance
- Approval and consent: obtain institutional ethics approval; secure informed consent (or documented waivers) from teachers and students' parents or guardians; clearly disclose how videos and questionnaires will be used, retention periods, de-identification, and the right to withdraw.
- Privacy and security: separate identifying information from research data; encrypt videos and PII with restricted access; establish data retention and destruction plans; follow the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014).
- Data governance: build a data dictionary with version control; double-check key variables; log all data changes; predefine outlier and logical-validation rules.

VII. Preregistration and Analysis Preparation (points directly tied to data collection)
- Preregistration: register the study and the pre-analysis plan on OSF, clearly defining primary and secondary outcomes, time points, exclusion criteria, and the covariate set.
- Statistical framework (overview): multilevel models (students/classes/teachers/schools) adjusting for baseline values and key covariates; the group × time interaction is the primary effect of interest; cluster-robust standard errors; for rater-scored outcomes, control rater effects with many-facet Rasch or multilevel models.
- Heterogeneity and mechanisms: predefine strata (seniority, subject, baseline level, school characteristics); explore mechanisms with implementation fidelity as a mediator/moderator.
- Reporting standards: follow WWC and transparent-reporting principles; make materials and the (de-identified) data dictionary public.

VIII. Key Operational Checklist (for field execution)
- Lesson videos: 2–3 per teacher per time point; two blinded raters; ≥ 20% double-scored; monthly drift calibration.
- Student surveys: completed within the same week; target ≥ 20 valid questionnaires per class; uniform administration instructions.
- Tests: parallel forms or IRT equating; anchor items recommended at ≥ 20% of the form; set collection and make-up windows.
- Fidelity: full sign-in for every training session (duration to the minute), content-coverage checklists, coach/trainer logs, and sampled sit-in ratings; combine these into per-teacher dosage and quality indices.
- Quality thresholds: observation-score ICC ≥ 0.70; questionnaire α ≥ 0.80 (total score); test information function covering the target ability range; missingness kept below 10% with reasons recorded.

References (illustrative, APA style)
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing.
- Beaton, D. E., Bombardier, C., Guillemin, F., & Ferraz, M. B. (2000). Guidelines for the process of cross-cultural adaptation of self-report measures. Spine, 25(24), 3186–3191.
- Century, J., Rudnick, M., & Freeman, C. (2010). A framework for measuring fidelity of implementation. American Journal of Evaluation, 31(2), 199–218.
- Danielson, C. (2013). The framework for teaching evaluation instrument. Danielson Group.
- Desimone, L. M. (2009). Improving impact studies of teachers’ professional development: Toward better conceptualizations and measures. Educational Researcher, 38(3), 181–199.
- Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Bill & Melinda Gates Foundation MET Project.
- Kraft, M. A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement. Review of Educational Research, 88(4), 547–588.
- Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System (CLASS) manual. Paul H. Brookes.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. SAGE.
- Spybrook, J., et al. (2011). Optimal design for longitudinal and multilevel research. In S. H. Cole (Ed.), The Oxford handbook of quantitative methods.
- Tschannen-Moran, M., & Hoy, A. W. (2001). Teacher efficacy: Capturing an elusive construct. Teaching and Teacher Education, 17(7), 783–805.
- What Works Clearinghouse. (2020). Procedures and standards handbook (Version 4.1/5.0). Institute of Education Sciences.

Note: The Chinese versions and usage licenses of the specific instruments must be confirmed before the project starts; any local adaptation requires a systematic translation, equating, and reliability/validity validation process, with the operational definition and scoring scheme of the primary outcome specified in the preregistration. The sample sizes, ICCs, and effect sizes above illustrate commonly used values; an actual study should optimize power and design based on pilot studies and local parameters.

Intended Users

Teaching-research directors / principals

Quickly build a school-level data collection plan for instructional reform: specify samples and instruments, scheduling and staffing, and submit compliance materials that pass review the first time.

Graduate and doctoral students in education

Develop a publishable data collection plan for a thesis or funded project: sample size estimation, scale selection, interview guides, and citation formatting in one pass.

Teacher development centers / training providers

Design training-effectiveness evaluations: pre–post schemes, classroom observation forms, and trainee interview scripts that are practical to run and easy to report.

Online-education product managers / growth leads

Build studies of learning behavior and retention: mixed-methods designs, experimental grouping, and data quality checks to support growth decisions.

Education nonprofits / foundation program officers

Establish program evaluation frameworks: align goals with indicators, safeguard sampling and ethics, and keep field-execution checklists within budget.

District education research institutes / supervision offices

Plan regional monitoring and supervisory evaluation: stratified sampling, school visit routes, quality-control forms, and the evidence framework needed for supervision reports.

Problems Solved

- Rapidly turns a research topic into a rigorous, executable data collection blueprint covering research design, sample and sampling, measurement instruments, collection procedures, quality control, ethical compliance, timeline, and resource allocation.
- Provides expert-level methodological advice with clear justification, strengthening the plan's scientific soundness, reviewability, and persuasiveness.
- Supports multilingual academic writing, ready for use in proposal materials, ethics/IRB applications, thesis proposals, and methods chapters.
- Structured output reduces back-and-forth communication, shortens preparation time, lowers trial-and-error costs, and improves review pass rates and collaboration efficiency.

Feature Summary

Generates a structured data collection plan in one step, covering goals, samples, instruments, procedures, quality control, and a timeline.
Automatically matches the topic to the best-suited methods (surveys, interviews, observation, experiments) and explains the rationale and applicable scenarios.
Offers practical sampling and sample-size recommendations that balance cost and bias control, keeping conclusions reliable and reproducible.
Built-in ethics and privacy prompts generate informed-consent and risk checklists, reducing review back-and-forth and compliance risk.
Automatically drafts questionnaires and interview outlines, including scale choices and sample items, exportable for field use in one click.
Generates data-quality and bias-control schemes, including training guides, piloting steps, and troubleshooting procedures, improving data usability.
Supports multilingual academic output with standardized citations and formatting, ready for proposals, applications, and thesis writing.
Provides analysis pathways and indicator mappings that connect collection to statistical and qualitative analysis, avoiding mid-course rework and waste.
Automatically optimizes the plan for budget and timeline, with resource-allocation and scheduling suggestions to finish on time and to standard.
Templated parameter input quickly adapts to K-12, higher education, vocational training, online education, and other research settings.

How to Use a Purchased Prompt Template

1. Use it directly in an external chat application

Copy the prompt generated from the template into your preferred chat application (e.g., ChatGPT, Claude) and use it in conversation directly, with no extra development. Suited to quick personal trials and lightweight use.

2. Publish it as an API endpoint

Turn the prompt template into an API: your program can modify the template parameters freely and call it through the interface, making automation and batch processing straightforward. Suited to developer integration and embedding in business systems.

3. Configure it in an MCP client

Configure the corresponding server address in an MCP client so your AI application can call the prompt template automatically. Suited to advanced users and team collaboration, letting prompts move seamlessly across AI tools.


What You Get After Purchase

The complete prompt template
- 231 tokens in total
- 2 adjustable parameters: { 研究主题 } (research topic) and { 输出语言 } (output language)
Automatically added to "My Prompt Library"
- Prompt-optimizer support
- Versioned management support
Community-shared application examples