Create a Data Collection Plan

Updated Sep 29, 2025

Develops a detailed data collection plan for a given research topic, with professional methods and recommendations.

Example 1

The following data collection plan is designed to examine the effect of classroom interaction reform on student engagement in authentic educational settings. It is grounded in multidimensional engagement theory, a mixed-methods design, and triangulation across multiple sources of evidence, so as to secure the reliability, validity, and inferential strength of the measurements (Fredricks, Blumenfeld, & Paris, 2004; Creswell & Plano Clark, 2017).

I. Research Purpose and Questions
- Purpose: Evaluate the short- and medium-term effects of interaction-oriented classroom reform (e.g., peer assessment, questioning with peer discussion, immediate feedback) on student engagement across its behavioral, emotional, and cognitive dimensions.
- Research questions:
  1) Does the intervention significantly improve students' multidimensional engagement (overall and by dimension)?
  2) Do intervention effects vary by subject, grade level, or teacher characteristics (cross-level differences)?
  3) How is the classroom interaction ecology (teacher–student and student–student interaction patterns) related to changes in student engagement?

II. Research Design and Overall Strategy
- Design type: A stratified randomized or stepped-wedge cluster design (with classes/teachers as the unit of assignment) is preferred; if randomization is not feasible, use a matched quasi-experimental pretest–posttest comparison design. This design controls selection bias and supports stratified comparisons and estimation of time-point effects (Cohen, 1988).
- Methodological paradigm: Convergent parallel mixed methods. Quantitative data estimate the causal effect and its magnitude; qualitative material explains mechanisms and contextual conditions (Creswell & Plano Clark, 2017).
- Theoretical framework: Multidimensional engagement (behavioral/emotional/cognitive) and the ICAP framework (Interactive > Constructive > Active > Passive) (Fredricks et al., 2004; Chi & Wylie, 2014).

III. Sample and Sampling
- Research setting: K-12 or undergraduate general-education/STEM courses (chosen according to the actual setting). To improve external validity, sample across disciplines (e.g., mathematics/language/science) and stratify by grade level.
- Sampling strategy: Stratify by school/grade/subject, treat classes as clusters, and randomly assign them to an early versus delayed interaction-reform group, or run them in parallel with matched comparison classes.
- Sample size and power (brief): Estimate on the basis of multilevel models. Under typical within-school class-level intraclass correlations (ICC ≈ 0.05–0.15), detecting a medium effect (d ≈ 0.30) at 0.80 power requires at least 24 classes with at least 25 students each; the final figure should come from an a priori power calculation using the baseline ICC (Cohen, 1988).
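
The class counts above are sensitive to the assumed ICC and cluster size. As a minimal sketch (not a substitute for the formal a priori power analysis the plan calls for, e.g., in PowerUpR or Optimal Design), the standard design-effect approximation can be scripted to show how the required number of classes moves with those assumptions:

```python
from math import ceil
from scipy.stats import norm

def clusters_needed(d, icc, m, alpha=0.05, power=0.80):
    """Approximate classes per arm for a two-arm cluster-randomized design.

    d   : target standardized effect size (Cohen's d)
    icc : intraclass correlation at the class level
    m   : students per class
    Inflates the individually randomized sample size by the design effect;
    ignores covariate adjustment and unequal cluster sizes.
    """
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    n_per_arm = 2 * (z_a + z_b) ** 2 / d ** 2   # individually randomized n per arm
    deff = 1 + (m - 1) * icc                    # design effect for clustering
    return ceil(n_per_arm * deff / m)           # classes per arm

for icc in (0.05, 0.10, 0.15):
    print(f"ICC={icc}: {clusters_needed(d=0.30, icc=icc, m=25)} classes per arm")
```

Baseline covariates (e.g., prior achievement) reduce the required number of classes; the formal power analysis should incorporate them.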

IV. Intervention Definition and Implementation Fidelity
- Core intervention components (examples): Think–Pair–Share, peer instruction with in-class polling (clickers or online voting), high-quality questioning with wait time, structured small-group discussion, formative assessment with immediate feedback, and task-centered interaction scripts (aligned with ICAP principles; Chi & Wylie, 2014).
- Fidelity of Implementation (FOI) indicators: coverage (proportion of planned activities delivered), dosage (interaction time/frequency per lesson), quality (depth of interaction, quality of follow-up questioning), and adherence (consistency with the training protocol). Triangulate teacher logs, classroom observation checklists, and interviews (Century, Rudnick, & Freeman, 2010).

V. Data Sources and Measurement Instruments (Multi-Source Triangulation)
1) Student self-report scales (multidimensional engagement)
   - Student Engagement Instrument (SEI; Appleton, Christenson, Kim, & Reschly, 2006): emphasizes cognitive and psychological engagement (suitable for middle and high school).
   - Engagement vs. Disaffection scale (Skinner, Kindermann, & Furrer, 2009): measures behavioral and emotional engagement/disaffection.
   - In higher education settings, supplement with relevant NSSE subscales to capture in-class participation and collaborative learning (Kuh, 2009).
   - Localization and piloting: use a translation–back-translation procedure, cognitive interviews, and confirmatory factor analysis (CFA) to check structural validity and measurement invariance over time (targets: α ≥ 0.70; RMSEA ≤ 0.08, CFI ≥ 0.90).

2) Classroom observation (classroom interaction ecology)
   - COPUS to quantify the distribution of teacher and student activities (suited to undergraduate STEM; Smith, Jones, Gilbert, & Wieman, 2013).
   - CLASS for the quality of teacher–student interaction and emotional/organizational/instructional support (K-3 or the corresponding version; Pianta, La Paro, & Hamre, 2008).
   - ICAP behavioral coding: label learning activities at the P/A/C/I levels (Chi & Wylie, 2014).
   - Observation sampling: systematic sampling covering each class at least three times (baseline/midpoint/posttest), one full lesson per visit; train observers and maintain an operations manual. Target inter-observer agreement: Cohen's κ or ICC ≥ 0.70 (Shrout & Fleiss, 1979); a minimal reliability-check sketch covering both scale α and coder agreement appears at the end of this section.

3) Behavioral and process data (objective indicators)
   - Attendance, on-time assignment submission rate, and counts of in-class contributions/questions (from observation records or the course management system).
   - Learning analytics data: LMS clickstreams, resource access, numbers of posts and replies, dwell time, and so on (aligned with privacy regulations; Long & Siemens, 2011).
   - Momentary time-sampling of students' on-task state in the moment (Altmann, 1974).

4) Experience sampling method (ESM)
   - Brief in-class prompts (1–3 items) measuring momentary engagement, interest, and cognitive effort (1–2 times per week, low burden), used to capture fluctuations over time (Hektner, Schmidt, & Csikszentmihalyi, 2007).

5) Qualitative data
   - Student focus groups and semi-structured interviews: probe the interaction experience, psychological safety, the challenge–support balance, and barriers.
   - Teacher interviews and reflective journals: document instructional decisions, implementation tensions, and contextual constraints to help interpret the quantitative results.

6) Background covariates
   - Student level: baseline achievement, gender/age, prior motivation indicators, etc.
   - Class/teacher level: class size, subject, teacher seniority and professional development history.
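
For the piloting and observer-calibration thresholds referenced in items 1) and 2) above (α ≥ 0.70; κ/ICC ≥ 0.70), a minimal reliability-check sketch might look like the following. The data are simulated placeholders; the real checks would run on pilot survey responses and double-coded observation segments.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of Likert responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Simulated pilot responses (rows = students, cols = scale items) -- illustrative only.
rng = np.random.default_rng(0)
pilot = rng.integers(1, 6, size=(80, 8)).astype(float)
print("alpha =", round(cronbach_alpha(pilot), 2))      # flag scales below .70

# Observer agreement on ICAP codes (P/A/C/I) for double-coded lesson segments.
rater_a = ["I", "C", "A", "P", "I", "C", "C", "A"]
rater_b = ["I", "C", "A", "A", "I", "C", "P", "A"]
print("kappa =", round(cohen_kappa_score(rater_a, rater_b), 2))  # target >= .70
```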

VI. Measurement Time Points and Procedures
- T0 baseline (weeks 0–2): scales (SEI/EvsD), classroom observation (once per class), background covariates, initial LMS data pull.
- Intervention period (weeks 3–12 or one semester): at least two classroom observations per class; ESM 1–2 times per week; continuous collection of behavioral and LMS data; weekly teacher logs; monthly fidelity checks.
- T1 posttest (end of semester): readminister scales, classroom observation (once per class), teacher and student interviews.
- T2 follow-up (+6–8 weeks, optional): readminister short-form scales and re-collect key behavioral indicators to test persistence.
- Under the stepped-wedge design, classes enter the intervention in batches so that every class contributes both pre- and post-intervention periods (see the scheduling sketch below).
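
To illustrate the batch rollout in the last point, here is a minimal sketch of a randomized stepped-wedge schedule; the cluster count and number of steps are illustrative, and the real schedule would follow the randomization protocol.

```python
import numpy as np
import pandas as pd

def stepped_wedge(n_clusters: int, n_steps: int, seed: int = 42) -> pd.DataFrame:
    """Random stepped-wedge schedule: every class starts in control (0) and
    crosses over to the intervention (1) at its assigned step, so each class
    contributes both pre- and post-intervention periods."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_clusters)            # random rollout order
    batches = np.array_split(order, n_steps)       # one batch of classes per step
    schedule = np.zeros((n_clusters, n_steps + 1), dtype=int)
    for step, batch in enumerate(batches, start=1):
        schedule[batch, step:] = 1                 # switched on from this period onward
    return pd.DataFrame(schedule,
                        index=[f"class_{i:02d}" for i in range(n_clusters)],
                        columns=[f"T{t}" for t in range(n_steps + 1)])

print(stepped_wedge(n_clusters=24, n_steps=3))
```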

VII. Quality Control and Reliability/Validity Safeguards
- Instrument piloting and localization: run a small pilot study (n ≈ 60–100) to check item comprehension and scale structure; fine-tune items based on CFA/IRT results.
- Observer training: calibrate with annotated videos; observers begin live coding only after reaching the κ/ICC threshold; retrain periodically to prevent drift.
- Triangulation: cross-validate self-reports, observations, behavioral/log data, ESM, and interviews to reduce common method bias.
- Data completeness: set up a missing-data monitoring dashboard; run in-class and online collection in parallel to reduce missingness; record reasons for missingness to support later handling.

VIII. Ethics and Data Management
- Ethics: follow the Belmont Report and local IRB/ethics committee requirements; obtain informed consent, guardian consent, and student assent; minimize identifiability of minors' data; participants may withdraw at any time without academic consequences (The Belmont Report, 1979).
- Privacy and security: de-identification, tiered access, encrypted storage; separate key management for sensitive data.
- Data management and sharing: prepare a codebook, a variable dictionary, and version control; annotate metadata and share in compliance with the FAIR principles (Wilkinson et al., 2016); preregister hypotheses, primary outcomes, and the data collection protocol on OSF, and document protocol deviations.

IX. Biases, Threats, and Mitigation
- Selection bias: random/stratified assignment; for the quasi-experimental variant, use propensity score matching and covariate adjustment (a minimal matching sketch follows this list).
- Hawthorne/observer effects: extend the observation period, mask the observation purpose (within ethical limits), and use non-intrusive logging.
- Teacher effects/peer contamination: assign at the class level, minimize teachers teaching across arms, and include teacher random effects in the models.
- Common method bias: measure the same construct with different methods and stagger collection time points.
- Implementation heterogeneity: incorporate FOI indicators and run subgroup or dose–response analyses (Century et al., 2010).
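
For the quasi-experimental variant, the propensity-score matching mentioned under selection bias could be prototyped roughly as below. The column names (treatment flag and covariate list) are hypothetical, and overlap diagnostics, calipers, and balance checks would still be required before any effect estimation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def ps_match(df: pd.DataFrame, treat_col: str, covariates: list[str]) -> pd.DataFrame:
    """1:1 nearest-neighbour matching on the estimated propensity score.
    Returns the treated rows plus their matched controls (with replacement)."""
    X = df[covariates].to_numpy()
    t = df[treat_col].to_numpy()
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    treated = np.where(t == 1)[0]
    control = np.where(t == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    matched_controls = control[idx.ravel()]
    return df.iloc[np.concatenate([treated, matched_controls])]

# Usage sketch: matched = ps_match(df, "treatment", ["baseline_engagement", "gpa", "grade"])
```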

X. Timeline (One-Semester Example)
- Weeks -4 to -1: instrument localization and piloting, observer training, power analysis and randomization, ethics approval, and home–school communication.
- Weeks 1–2: baseline collection (T0).
- Weeks 3–12: intervention and ongoing collection (observation/ESM/LMS/logs/FOI).
- Weeks 13–14: posttest (T1) and interviews.
- Weeks +6–8: follow-up measurement (T2, optional).
- Throughout: data quality control, missingness tracking, fidelity checks, and process documentation.

XI. Feasibility and Burden Control
- Reducing burden: use short-form scales administered in stages; keep ESM items lean; align observation scheduling with the teaching calendar.
- Feedback and incentives: provide teachers and schools with classroom-interaction profiles and improvement suggestions in exchange for scheduling cooperation and data access.
- Equipment and platforms: use the existing LMS and a mobile micro-survey platform to minimize additional deployment.

References (APA 7th edition)
- Appleton, J. J., Christenson, S. L., Kim, D., & Reschly, A. L. (2006). Measuring cognitive and psychological engagement: Validation of the Student Engagement Instrument. Journal of School Psychology, 44(5), 427–445.
- Century, J., Rudnick, M., & Freeman, C. (2010). A framework for measuring fidelity of implementation: A foundation for shared language and accumulation of knowledge. American Journal of Evaluation, 31(2), 199–218.
- Chi, M. T. H., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
- Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE.
- Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1), 59–109.
- Hektner, J. M., Schmidt, J. A., & Csikszentmihalyi, M. (2007). Experience sampling method: Measuring the quality of everyday life. SAGE.
- Kuh, G. D. (2009). The National Survey of Student Engagement: Conceptual and empirical foundations. New Directions for Institutional Research, 2009(141), 5–20.
- Long, P., & Siemens, G. (2011). Penetrating the fog: Analytics in learning and education. EDUCAUSE Review, 46(5), 31–40.
- Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System (CLASS) manual, K–3. Paul H. Brookes.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.
- Skinner, E. A., Kindermann, T. A., & Furrer, C. J. (2009). A motivational perspective on engagement and disaffection: Conceptualization and assessment of children’s behavioral and emotional participation in academic activities in the classroom. Journal of Educational Psychology, 101(4), 765–781.
- Smith, M. K., Jones, F. H. M., Gilbert, S. L., & Wieman, C. E. (2013). The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new tool for characterizing university STEM classroom practices. CBE—Life Sciences Education, 12(4), 618–627.
- The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. (1979). The Belmont report.
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Built around multi-source data, a stratified design, and implementation-fidelity tracking, this plan can capture the average effect of classroom interaction reform on engagement while also revealing contextual differences and mechanistic pathways, ensuring the credibility and applicability of the research conclusions.

Example 2

Data Collection Plan: The Effect of Blended Learning on Academic Self-Efficacy

1. Purpose and Research Questions
- Purpose: To estimate the causal effect of blended learning (BL) on students’ academic self-efficacy (ASE) and to document implementation fidelity and contextual moderators.
- Primary question: Does participation in a well-specified BL course increase ASE relative to traditional face-to-face instruction?
- Secondary questions:
  - Do effects vary by baseline ASE, prior achievement, discipline, or gender?
  - Are BL effects mediated by mastery experiences and instructional presence?
  - How do implementation fidelity and engagement relate to ASE change?

Rationale: BL may enhance ASE by increasing mastery opportunities, feedback, and learner control (Bandura, 1997; Graham, 2006; Means et al., 2013).

2. Design Overview
- Preferred design: Cluster randomized controlled trial (cRCT) at the course-section level to minimize contamination (randomize sections to BL vs. business-as-usual face-to-face).
- Alternative (if randomization not feasible): Quasi-experimental matched comparison with propensity score methods and difference-in-differences, using baseline ASE, prior achievement, demographics, and prior online experience as covariates (following causal-inference principles in Murnane & Willett; see also WWC standards).
- Mixed-methods concurrent design: Quantitative (surveys, LMS logs, administrative data) with qualitative interviews/focus groups for triangulation and explanatory depth (Creswell & Plano Clark, 2017).

3. Setting, Sampling, and Participants
- Setting: Multiple undergraduate gateway courses (e.g., introductory psychology, biology, statistics) across 2–4 institutions to enhance generalizability (Graham, 2006).
- Units:
  - Clusters: Course sections (target 40–80 sections total, balanced across arms and disciplines).
  - Students: All enrolled students in selected sections; anticipated n per section = 25–40.
  - Instructors: Faculty teaching participating sections.
- Inclusion criteria: Degree-seeking undergraduates enrolled at census; instructors willing to implement the specified BL model or the standard face-to-face format.
- Exclusion criteria: Sections taught by graduate assistants without training support; courses with existing extensive online components in the control arm.

4. Intervention and Comparison (for Fidelity Anchoring)
- BL treatment: Pre-specified blended model with defined dosage (e.g., 30–50% online), structured weekly online modules (readings, quizzes, discussion), and in-person active learning sessions. Alignment documented via a BL design template and a checklist derived from established blended/course quality frameworks (e.g., Community of Inquiry presence indicators; Arbaugh et al., 2008; Graham et al., 2013).
- Control: Business-as-usual face-to-face delivery without systematic online components beyond standard LMS posting.

5. Outcome Measures
Primary outcome: Academic self-efficacy
- Instrument: MSLQ Self-Efficacy for Learning and Performance subscale (8 items; 1–7 Likert) administered at baseline (T0), midterm (T1), and end of term (T2) (Pintrich et al., 1991; Pintrich et al., 1993).
- Evidence: Consistently high internal consistency for college samples (typically α ≈ .90) and demonstrated predictive validity for achievement (Pintrich et al., 1993).

Secondary/auxiliary measures (for mechanism, covariate adjustment, and sensitivity)
- Sources of self-efficacy (mastery experiences, vicarious experiences, social persuasion, physiological states), adapted from domain-appropriate scales following Bandura’s construction guidelines (Bandura, 2006; Usher & Pajares, 2009).
- Teaching, social, and cognitive presence (CoI survey; Arbaugh et al., 2008) to capture instructional/learning environment characteristics associated with BL.
- Prior achievement: Cumulative GPA or standardized placement scores; baseline course diagnostic if available.
- Engagement/effort: LMS activity metrics (time-on-task, resource views, assignment submission patterns) and brief self-report engagement scale.
- Demographics: Age, gender, major/discipline, first-generation status, prior online course experience.

6. Implementation Fidelity and Exposure
- Instructor-reported adherence: Weekly implementation logs aligned to the BL design template (dosage of online components, use of active learning).
- Independent observations: Structured observations of in-person sessions using a validated active-learning checklist.
- LMS analytics: Actual BL dosage (e.g., completion of online modules, discussion participation, quiz attempts).
- Fidelity rubric: Global fidelity scores synthesized from logs, observations, and analytics (O’Donnell, 2008).
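
One way the global fidelity rubric could be operationalized, purely as an illustrative sketch with hypothetical column names and arbitrary weights, is a weighted composite of the three evidence streams:

```python
import pandas as pd

# Illustrative weights only; the real rubric should be finalized with the study team.
WEIGHTS = {"log_adherence": 0.4, "observed_active_learning": 0.3, "lms_dosage": 0.3}

def fidelity_index(df: pd.DataFrame) -> pd.Series:
    """Rescale each evidence stream to 0-1 and combine into one weighted score per section."""
    scaled = (df - df.min()) / (df.max() - df.min())
    return sum(w * scaled[col] for col, w in WEIGHTS.items())

sections = pd.DataFrame(
    {"log_adherence": [0.90, 0.70, 0.50],          # share of planned BL elements reported delivered
     "observed_active_learning": [3.8, 2.9, 3.1],  # mean score on the active-learning checklist
     "lms_dosage": [0.85, 0.60, 0.75]},            # completion rate of required online modules
    index=["sec_01", "sec_02", "sec_03"])
print(fidelity_index(sections).round(2))
```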

7. Timing and Procedures
- Pre-semester (T−1): Instructor recruitment; randomization at section level; training for BL instructors; pilot testing of instruments; LMS instrumentation.
- Week 1–2 (T0): Baseline student survey (ASE, demographics, prior online experience), consent, and retrieval of prior GPA. Randomization concealed from analysts; treatment known to implementers.
- Midterm (T1): Short ASE assessment, CoI survey, and engagement check to examine trajectories and reduce common method bias via temporal separation (Podsakoff et al., 2003).
- End of term (T2): Posttest ASE, CoI, course grade collection; instructor fidelity summaries; LMS data export.
- Postterm (T3, optional): Follow-up ASE 6–8 weeks later to assess persistence.
- Qualitative sampling: Purposive subsample (e.g., n ≈ 20–30 students per condition across disciplines; 8–12 instructors) for semi-structured interviews near T2 to explain quantitative patterns (Creswell & Plano Clark, 2017).

8. Data Quality Assurance
- Pilot: Cognitive interviews with 8–12 students to ensure clarity and contextual fit of items; small pilot (n ≈ 60–80) to examine reliability and preliminary factor structure.
- Measurement equivalence: Test measurement invariance (configural, metric, scalar) for the MSLQ ASE subscale across conditions and time points before estimating effects (Putnick & Bornstein, 2016).
- Administration controls: Uniform survey windows, standardized reminders, proctoring in class where feasible to maximize response rates; incentives (e.g., small course credit or raffle) approved by IRB.
- Nonresponse management: Track response propensity; implement targeted reminders; document reasons for attrition.

9. Sample Size and Power Guidance
- Plan for a cRCT with students nested in sections. Use established software (e.g., Optimal Design) with plausible assumptions: small effect size (d = 0.20–0.25), intraclass correlation at section level ICC ≈ .03–.07 for psychosocial outcomes, covariate R2 ≈ .40 from baseline ASE and GPA. Aim for at least 30–40 sections per arm with 25–35 students each to achieve ≈ .80 power for small effects (Hedges & Rhoads, 2010; Raudenbush & Bryk, 2002). Final targets should be set via a formal power analysis with local ICC estimates.
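
As a rough complement to the software-based analysis above, the standard two-level minimum detectable effect size (MDES) formula used by PowerUp-style tools can be scripted to see how the detectable effect responds to the assumed ICC, covariate R², and number of sections. This is an approximation under the bullet's assumptions, not a replacement for the formal power analysis with local ICC estimates.

```python
from scipy.stats import t

def mdes_crt2(J, n, icc, R2_1=0.0, R2_2=0.0, P=0.5, alpha=0.05, power=0.80, g=1):
    """MDES for a 2-level cluster RCT (J sections, n students per section).

    R2_1 / R2_2 : proportion of student-/section-level variance explained by covariates
    P           : proportion of sections assigned to treatment
    g           : number of section-level covariates (affects degrees of freedom)
    """
    df = J - g - 2
    multiplier = t.ppf(1 - alpha / 2, df) + t.ppf(power, df)
    var = (icc * (1 - R2_2) / (P * (1 - P) * J)
           + (1 - icc) * (1 - R2_1) / (P * (1 - P) * J * n))
    return multiplier * var ** 0.5

# Scan totals of 60-80 sections with n = 30, ICC = .05, covariate R^2 = .40 at both levels
for J in (60, 70, 80):
    print(J, "sections -> MDES ≈", round(mdes_crt2(J, n=30, icc=0.05, R2_1=0.40, R2_2=0.40), 3))
```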

10. Data Management and Security
- Unique study IDs linking surveys, LMS logs, and grades; no storage of direct identifiers with analytic datasets.
- Secure, encrypted storage on institutional servers; access controls; audit logs.
- Codebook, metadata, and version control (e.g., Git with restricted access).
- Pre-registration of hypotheses, primary outcomes, and analysis plan (e.g., OSF) before data collection.
- Data sharing: De-identified datasets and instruments archived following FAIR principles when permitted (Wilkinson et al., 2016).

11. Ethical Considerations
- IRB approval; informed consent specifying voluntary participation, data uses, and withdrawal rights.
- FERPA-compliant handling of educational records; minimal-risk classification anticipated.
- Mitigation of risks: Anonymized reporting; instructor training to ensure equitable learning opportunities in both arms.

12. Minimizing Bias and Ensuring Internal Validity
- Randomization by an independent coordinator; blocked by course and instructor to balance prior outcomes (a minimal blocked-assignment sketch follows this list).
- Baseline measures to adjust for any residual imbalance.
- Analyst blinding to treatment during data cleaning and pre-specification of models.
- Multiple data sources (surveys, logs, grades) and temporal separation to reduce common method variance (Podsakoff et al., 2003).
- Monitoring protocol deviations and documenting contamination.
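
A minimal sketch of the blocked randomization described in the first bullet might look like the following; the section identifiers and block keys are hypothetical, and the production procedure would be run and archived by the independent coordinator.

```python
import random
from collections import defaultdict

def blocked_assignment(sections, block_keys=("course", "instructor"), seed=2025):
    """Randomize sections to BL vs. control within course-by-instructor blocks,
    alternating arms within each shuffled block to keep the arms balanced."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for sec in sections:
        blocks[tuple(sec[k] for k in block_keys)].append(sec["id"])
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, sec_id in enumerate(members):
            assignment[sec_id] = "BL" if i % 2 == 0 else "control"
    return assignment

sections = [{"id": "S1", "course": "PSY101", "instructor": "A"},
            {"id": "S2", "course": "PSY101", "instructor": "A"},
            {"id": "S3", "course": "BIO110", "instructor": "B"},
            {"id": "S4", "course": "BIO110", "instructor": "B"}]
print(blocked_assignment(sections))
```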

13. Analysis-Linked Data Needs (to inform collection)
- Primary impact: Change in ASE (T2 minus T0) estimated with multilevel modeling (students nested in sections), adjusting for baseline ASE and covariates (Raudenbush & Bryk, 2002); a minimal model sketch follows this list.
- Moderation: Interaction terms for baseline ASE, discipline, gender.
- Mediation (exploratory): Mastery experiences and teaching presence measured at T1 and T2 (Bandura, 1997; Arbaugh et al., 2008).
- Sensitivity: Per-protocol and complier-average causal effect analyses using fidelity indices; missing data handled via FIML or multiple imputation under MAR assumptions (Enders, 2010).
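
The multilevel impact model in the first bullet could be prototyped along these lines with statsmodels; the variable names (ase_t2, treatment, section_id, etc.) are placeholders for the study's actual codebook.

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_impact_model(df: pd.DataFrame):
    """Random-intercept model with students nested in sections, adjusting for
    baseline ASE and prior GPA; the treatment coefficient is the adjusted impact."""
    model = smf.mixedlm("ase_t2 ~ treatment + ase_t0 + gpa_baseline",
                        data=df, groups="section_id")
    return model.fit()

# Usage sketch: result = fit_impact_model(analytic_df); print(result.summary())
```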

14. Reporting and Documentation
- Document recruitment, participation flow, response rates, and attrition by arm (CONSORT for cRCTs adapted to education).
- Report reliability (Cronbach’s alpha, omega), CFA/invariance results, and fidelity statistics.
- Provide a transparent account of deviations from the pre-registered plan and robustness checks.

References
- Arbaugh, J. B., Cleveland-Innes, M., Diaz, S. R., Garrison, D. R., Ice, P., Richardson, J. C., & Swan, K. P. (2008). Developing a community of inquiry instrument: Testing a measure of the Community of Inquiry framework using a multi-institutional sample. The Internet and Higher Education, 11(3–4), 133–136.
- Bandura, A. (1997). Self-efficacy: The exercise of control. W. H. Freeman.
- Bandura, A. (2006). Guide for constructing self-efficacy scales. In F. Pajares & T. Urdan (Eds.), Self-efficacy beliefs of adolescents (pp. 307–337). Information Age.
- Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE.
- Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
- Graham, C. R. (2006). Blended learning systems: Definition, current trends, and future directions. In C. J. Bonk & C. R. Graham (Eds.), The handbook of blended learning (pp. 3–21). Pfeiffer.
- Graham, C. R., Woodfield, W., & Harrison, J. B. (2013). A framework for institutional adoption and implementation of blended learning in higher education. The Internet and Higher Education, 18, 4–14.
- Hedges, L. V., & Rhoads, C. (2010). Statistical power analysis in education research (NCEE 2010-4017). U.S. Department of Education.
- Means, B., Toyama, Y., Murphy, R., & Baki, M. (2013). The effectiveness of online and blended learning: A meta-analysis of the empirical literature. Teachers College Record, 115(3), 1–47.
- O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes. Review of Educational Research, 78(1), 33–84.
- Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1991). A manual for the use of the Motivated Strategies for Learning Questionnaire (MSLQ). University of Michigan.
- Pintrich, P. R., Smith, D. A. F., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the MSLQ. Educational and Psychological Measurement, 53(3), 801–813.
- Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research. Journal of Applied Psychology, 88(5), 879–903.
- Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance in the social and behavioral sciences. Child Development Perspectives, 10(1), 29–34.
- Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). SAGE.
- Usher, E. L., & Pajares, F. (2009). Sources of self-efficacy in mathematics: A validation study. Contemporary Educational Psychology, 34(1), 89–101.
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Note: If a quasi-experimental design is required, add a data element for teacher/course matching variables (e.g., historical grading distributions, class size, instructor experience) to strengthen the propensity model and include a clear overlap/diagnostics protocol before estimating effects.

Example 3

The following is a data collection plan for a pre–post evaluation of the effect of teacher training on teaching quality. The plan centers on causal inference and measurement rigor, emphasizing multi-source data, standardized instruments, reliability and validity safeguards, and implementation-fidelity monitoring, so that estimates of the training's impact are accurate, reproducible, and generalizable.

I. Research Purpose and Evaluation Framework
- Purpose: Evaluate the impact of a specific teacher training program on classroom teaching quality, and explore mechanisms (e.g., implementation fidelity, changes in teacher self-efficacy) and heterogeneous effects (seniority, subject, school stage).
- Core hypotheses: Compared with controls, trained teachers score higher on observed teaching quality, student-perceived teaching quality, and tests of teacher knowledge/skills aligned with the training content, and these effects persist or grow at follow-up; effects on student engagement and achievement are positive but relatively small.
- Evaluation principles: triangulation (classroom observation + student surveys + objective assignments/assessments + teacher self-report), multiple measurement points (baseline/posttest/follow-up), and parallel collection of implementation-process data (fidelity and quality).

II. Research Design and Sample
- Design type:
  1) Preferred: cluster randomized controlled trial (with schools or teaching-research groups as the unit), with a T0 baseline, a T1 short-term posttest (2–4 weeks after training ends), a T2 medium-term follow-up (3–4 months), and a T3 long-term follow-up (6–9 months).
  2) Second-best: quasi-experimental difference-in-differences (DiD), with baseline equivalence testing plus propensity score weighting/matching and covariate adjustment.
- Sample and stratification: cover multiple school stages and subjects, with stratified sampling by stage × subject × region to protect external validity; recruit all target teachers in each school or draw a random sample; student surveys and achievement data come from the classes they teach.
- Size and power (illustrative; recalculate with real parameters): with observed teaching-quality scores as the primary outcome, an expected standardized effect of d ≈ 0.30, α = 0.05, 80% power, and a two-level structure (teachers nested in schools, within-school ICC ≈ 0.05–0.10), about 10–15 schools per arm with 10 target teachers per school (roughly 200–300 teachers in total) is the more robust target; use PowerUpR or Optimal Design to run the power analysis with actual ICCs, variances, and attrition rates (Spybrook et al., 2011).
- Baseline equivalence: test standardized differences according to WWC standards; if 0.05 < |SMD| ≤ 0.25, subsequent models must adjust for the covariate (What Works Clearinghouse, 2020).
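
The WWC baseline-equivalence check in the last bullet reduces to a standardized mean difference on the baseline measure; a minimal sketch with toy numbers is shown below.

```python
import numpy as np

def baseline_smd(treat: np.ndarray, control: np.ndarray) -> float:
    """Standardized baseline difference using the pooled SD (no small-sample correction)."""
    n1, n2 = len(treat), len(control)
    pooled_sd = np.sqrt(((n1 - 1) * treat.var(ddof=1) +
                         (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    return (treat.mean() - control.mean()) / pooled_sd

smd = baseline_smd(np.array([3.2, 3.5, 3.1, 3.8]), np.array([3.0, 3.4, 3.3, 3.6]))
if abs(smd) > 0.25:
    print("not baseline equivalent under WWC rules")
elif abs(smd) > 0.05:
    print("equivalent only with statistical adjustment for this covariate")
else:
    print("baseline equivalent")
```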

III. Indicator System and Measurement Instruments
A. Primary outcome (classroom teaching quality)
- Classroom observation instruments (choose one as primary; a second may be added for robustness checks):
  1) CLASS (pre-K, elementary, and secondary versions): emotional support, classroom organization, and instructional support dimensions, suitable for evaluation across school stages (Pianta et al.).
  2) Danielson Framework for Teaching: planning and preparation, classroom environment, instruction, and professional responsibilities (Danielson, 2013).
  3) If the training targets subject-specific instruction, a subject-specific instrument such as the Mathematical Quality of Instruction (MQI) can be used (Hill et al.).
- Observation design: at each time point, record 2–3 routine lessons per teacher (35–45 minutes each), scored blindly by two independent raters; double-score at least 20% of the sample and run regular drift calibration; use many-facet Rasch or generalizability theory to decompose error and correct for rater severity (Shavelson & Webb, 1991). A minimal ICC sketch for the double-scored videos appears at the end of this section.

B. Secondary outcomes
- Student-perceived teaching quality: a standardized student survey (e.g., Tripod 7Cs or a comparable validated instrument), aggregated to the class level; at least 20 student questionnaires per class to improve reliability (MET project evidence; Kane et al.).
- Student learning outcomes: curriculum-aligned standardized tests or high-quality course baseline/final scores; if a researcher-developed test is used, include anchor items for pre–post equating and calibrate with item response theory (IRT).
- Teacher knowledge and skills (aligned with training content): e.g., tests of pedagogical content knowledge; use an existing validated item bank, or develop items with expert review, piloting, reliability analysis, and equating.
- Teacher self-efficacy: Teachers' Sense of Efficacy Scale, TSES (long/short form) (Tschannen-Moran & Hoy, 2001), after cross-cultural adaptation and measurement invariance testing.
- Instructional artifacts and assignment quality: collect lesson plans, assignments, and grading samples, scored blindly with a validated rubric (e.g., the IQA framework).
- Student behavioral engagement: a short classroom-engagement observation checklist or experience sampling (where feasible).

C. Implementation process and fidelity
- Indicator dimensions: attendance/duration (dosage), activity completion (adherence/consistency), trainer quality and interaction quality, and the intensity of practice feedback and peer exchange (following Desimone, 2009; Century et al., 2010).
- Instruments: sign-in and duration records, trainer and coach logs, sampled sit-in rating forms, teacher reflection notes and assignment-completion scales, and interviews/focus groups.
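
For the double-scored videos described under the primary outcome, inter-rater agreement for continuous observation scores is usually summarized with ICC(2,1) (Shrout & Fleiss, 1979). A minimal sketch with toy ratings follows; it would be applied to the ≥ 20% double-scored sample.

```python
import numpy as np

def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1), Shrout & Fleiss (1979): two-way random effects, absolute agreement,
    single rater. Rows = lessons, columns = raters."""
    n, k = scores.shape
    grand = scores.mean()
    ss_rows = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

# Double-scored CLASS-style ratings (rows = lessons, columns = the two raters) -- toy values.
ratings = np.array([[5.0, 5.5], [4.0, 4.5], [6.0, 5.5], [3.5, 3.0], [5.5, 6.0]])
print(round(icc_2_1(ratings), 2))   # flag if below the 0.70 target
```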

IV. Data Collection Procedures and Schedule
- T0 baseline (2–4 weeks before training):
  1) Teachers: 2–3 lesson videos; knowledge/skills test; TSES; basic background (seniority, highest degree, professional rank, class size, teaching load, etc.).
  2) Students: perceived-teaching-quality survey; the most recent available standardized achievement data, or administer a baseline test.
  3) Schools: resource and management characteristics (size, student–teacher ratio, instructional routines, regional attributes).
- During training: rolling collection of process data (sign-ins for every activity, logs, in-session qualitative notes); regular records and spot checks by coaches/supervisors.
- T1 posttest (2–4 weeks after training ends):
  1) Teachers: 2–3 lesson videos; a parallel or equated form of the knowledge/skills test; TSES.
  2) Students: perceived-teaching-quality survey; unit or end-of-term standardized test.
  3) Implementation fidelity: full compilation and quality control.
- T2/T3 follow-ups (3–4 months / 6–9 months): a reduced version of T1, including at minimum lesson videos and student surveys; obtain standardized achievement data where possible and assess maintenance effects.
- Scheduling and on-site organization: avoid exam weeks and holidays; randomize the recording order; standardize recording specifications (camera position, audio, framing); establish incident-reporting and backup-collection plans.

V. Quality Control and Reliability/Validity Safeguards
- Instrument selection and cross-cultural adaptation: prefer validated instruments; if translation is needed, follow a translation, back-translation, expert review, cognitive interview, and small-sample pilot process (Beaton et al., 2000), followed by confirmatory factor analysis and measurement invariance tests (across time points and groups).
- Rater training and calibration:
  1) Training duration and qualification standards: follow the instrument's official requirements (e.g., CLASS requires certification); raters score live data only after passing the assessment.
  2) Blind scoring and drift control: raters are blind to both condition and time point; hold monthly co-scoring and threshold-calibration sessions; compute ICC/weighted kappa with a target of ≥ 0.70.
  3) Generalizability theory / many-facet Rasch: estimate how the numbers of tasks, raters, and durations contribute to reliability, and optimize the sampling design to reach a G coefficient of ≥ 0.70.
- Test quality:
  1) Pre–post equating: include anchor items; link scores with IRT or equipercentile methods and guard against ceiling/floor effects (a minimal anchor-item linking sketch appears at the end of this section).
  2) Internal consistency: Cronbach's α or ω ≥ 0.70; inspect item response characteristics and handle biased items.
- Student survey administration:
  1) Uniform proctoring and electronic administration; minimize social desirability bias and teacher interference.
  2) Aggregate to the class level and report design effects; set a minimum sample threshold for small classes (≥ 15–20 recommended).
- Blinding and bias control: keep observation and scoring blind throughout; offer the control group delayed training to raise consent rates; guard against Hawthorne effects and compensatory rivalry.
- Data completeness and missingness:
  1) Monitor key KPIs (video completion rate, double-scoring proportion, valid questionnaire rate, test missingness rate).
  2) Establish make-up and alternative sampling mechanisms; record reasons for missingness; use multiple imputation at the analysis stage and report sensitivity analyses.
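
For the anchor-item equating mentioned under test quality, the simplest illustration is mean/sigma linking of IRT difficulty parameters. The sketch below uses toy values; in practice, characteristic-curve methods (e.g., Stocking–Lord) implemented in dedicated IRT software would usually be preferred.

```python
import numpy as np

def mean_sigma_link(b_anchor_new: np.ndarray, b_anchor_old: np.ndarray):
    """Mean/sigma linking constants A, B that place new-form item difficulties on the
    old-form scale via b_old = A * b_new + B, estimated from the anchor items only."""
    A = b_anchor_old.std(ddof=1) / b_anchor_new.std(ddof=1)
    B = b_anchor_old.mean() - A * b_anchor_new.mean()
    return A, B

# Anchor-item difficulties estimated separately on the pretest and posttest forms (toy values).
b_new = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])
b_old = np.array([-1.0, -0.3, 0.2, 0.9, 1.7])
A, B = mean_sigma_link(b_new, b_old)
print(round(A, 3), round(B, 3))   # apply to all new-form difficulties before rescoring
```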

VI. Data Management and Ethical Compliance
- Approval and consent: obtain institutional ethics approval; secure informed consent (or documented waivers) from teachers and students' parents or guardians; clearly disclose how videos and questionnaires will be used, retention periods, de-identification, and the right to withdraw.
- Privacy and security: separate identifying information from research data; encrypt videos and PII with restricted access; establish data retention and destruction plans; follow the Standards for Educational and Psychological Testing (AERA/APA/NCME, 2014).
- Data governance: build a data dictionary with version control; double-check key variables; log all data changes; predefine outlier and logical-validation rules.

VII. Preregistration and Analysis Preparation (points directly tied to data collection)
- Preregistration: register the study and the pre-analysis plan on OSF, clearly defining primary and secondary outcomes, time points, exclusion criteria, and the covariate set.
- Statistical framework (overview): multilevel models (students/classes/teachers/schools) adjusting for baseline values and key covariates; the group × time interaction is the primary effect of interest; cluster-robust standard errors; for rater-scored outcomes, control rater effects with many-facet Rasch or multilevel models.
- Heterogeneity and mechanisms: predefine strata (seniority, subject, baseline level, school characteristics); explore mechanisms with implementation fidelity as a mediator/moderator.
- Reporting standards: follow WWC and transparent-reporting principles; make materials and the (de-identified) data dictionary public.

VIII. Key Operational Checklist (for field execution)
- Lesson videos: 2–3 per teacher per time point; two blinded raters; ≥ 20% double-scored; monthly drift calibration.
- Student surveys: completed within the same week; target ≥ 20 valid questionnaires per class; uniform administration instructions.
- Tests: parallel forms or IRT equating; anchor items recommended at ≥ 20% of the form; set collection and make-up windows.
- Fidelity: full sign-in for every training session (duration to the minute), content-coverage checklists, coach/trainer logs, and sampled sit-in ratings; combine these into per-teacher dosage and quality indices.
- Quality thresholds: observation-score ICC ≥ 0.70; questionnaire α ≥ 0.80 (total score); test information function covering the target ability range; missingness kept below 10% with reasons recorded.

References (illustrative, APA style)
- American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing.
- Beaton, D. E., Bombardier, C., Guillemin, F., & Ferraz, M. B. (2000). Guidelines for the process of cross-cultural adaptation of self-report measures. Spine, 25(24), 3186–3191.
- Century, J., Rudnick, M., & Freeman, C. (2010). A framework for measuring fidelity of implementation. American Journal of Evaluation, 31(2), 199–218.
- Danielson, C. (2013). The framework for teaching evaluation instrument. Danielson Group.
- Desimone, L. M. (2009). Improving impact studies of teachers’ professional development: Toward better conceptualizations and measures. Educational Researcher, 38(3), 181–199.
- Kane, T. J., McCaffrey, D. F., Miller, T., & Staiger, D. O. (2013). Have we identified effective teachers? Bill & Melinda Gates Foundation MET Project.
- Kraft, M. A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement. Review of Educational Research, 88(4), 547–588.
- Pianta, R. C., La Paro, K. M., & Hamre, B. K. (2008). Classroom Assessment Scoring System (CLASS) manual. Paul H. Brookes.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. SAGE.
- Spybrook, J., et al. (2011). Optimal design for longitudinal and multilevel research. In S. H. Cole (Ed.), The Oxford handbook of quantitative methods.
- Tschannen-Moran, M., & Hoy, A. W. (2001). Teacher efficacy: Capturing an elusive construct. Teaching and Teacher Education, 17(7), 783–805.
- What Works Clearinghouse. (2020). Procedures and standards handbook (Version 4.1/5.0). Institute of Education Sciences.

Note: The Chinese versions and usage licenses of the specific instruments must be confirmed before the project starts; any local adaptation requires a systematic translation, equating, and reliability/validity validation process, with the operational definition and scoring scheme of the primary outcome specified in the preregistration. The sample sizes, ICCs, and effect sizes above illustrate commonly used values; an actual study should optimize power and design based on pilot studies and local parameters.

Intended Users

Teaching-research directors / principals

Quickly build a school-level data collection plan for instructional reform: specify samples and instruments, scheduling and staffing, and submit compliance materials that pass review the first time.

Graduate and doctoral students in education

Develop a publishable data collection plan for a thesis or funded project: sample size estimation, scale selection, interview guides, and citation formatting in one pass.

Teacher development centers / training providers

Design training-effectiveness evaluations: pre–post schemes, classroom observation forms, and trainee interview scripts that are practical to run and easy to report.

Online-education product managers / growth leads

Build studies of learning behavior and retention: mixed-methods designs, experimental grouping, and data quality checks to support growth decisions.

Education nonprofits / foundation program officers

Establish program evaluation frameworks: align goals with indicators, safeguard sampling and ethics, and keep field-execution checklists within budget.

District education research institutes / supervision offices

Plan regional monitoring and supervisory evaluation: stratified sampling, school visit routes, quality-control forms, and the evidence framework needed for supervision reports.

Problems Solved

- Rapidly turns a research topic into a rigorous, executable data collection blueprint covering research design, sample and sampling, measurement instruments, collection procedures, quality control, ethical compliance, timeline, and resource allocation.
- Provides expert-level methodological advice with clear justification, strengthening the plan's scientific soundness, reviewability, and persuasiveness.
- Supports multilingual academic writing, ready for use in proposal materials, ethics/IRB applications, thesis proposals, and methods chapters.
- Structured output reduces back-and-forth communication, shortens preparation time, lowers trial-and-error costs, and improves review pass rates and collaboration efficiency.

Feature Summary

Generates a structured data collection plan in one step, covering goals, samples, instruments, procedures, quality control, and a timeline.
Automatically matches the topic to the best-suited methods (surveys, interviews, observation, experiments) and explains the rationale and applicable scenarios.
Offers practical sampling and sample-size recommendations that balance cost and bias control, keeping conclusions reliable and reproducible.
Built-in ethics and privacy prompts generate informed-consent and risk checklists, reducing review back-and-forth and compliance risk.
Automatically drafts questionnaires and interview outlines, including scale choices and sample items, exportable for field use in one click.
Generates data-quality and bias-control schemes, including training guides, piloting steps, and troubleshooting procedures, improving data usability.
Supports multilingual academic output with standardized citations and formatting, ready for proposals, applications, and thesis writing.
Provides analysis pathways and indicator mappings that connect collection to statistical and qualitative analysis, avoiding mid-course rework and waste.
Automatically optimizes the plan for budget and timeline, with resource-allocation and scheduling suggestions to finish on time and to standard.
Templated parameter input quickly adapts to K-12, higher education, vocational training, online education, and other research settings.

How to Use a Purchased Prompt Template

1. Use it directly in an external chat application

Copy the prompt generated from the template into your preferred chat application (e.g., ChatGPT, Claude) and use it in conversation directly, with no extra development. Suited to quick personal trials and lightweight use.

2. Publish it as an API endpoint

Turn the prompt template into an API: your program can modify the template parameters freely and call it through the interface, making automation and batch processing straightforward. Suited to developer integration and embedding in business systems.

3. Configure it in an MCP client

Configure the corresponding server address in an MCP client so your AI application can call the prompt template automatically. Suited to advanced users and team collaboration, letting prompts move seamlessly across AI tools.


What You Get After Purchase

The complete prompt template
- 231 tokens in total
- 2 adjustable parameters: { 研究主题 } (research topic) and { 输出语言 } (output language)
Automatically added to "My Prompt Library"
- Prompt-optimizer support
- Versioned management support
Community-shared application examples