Purpose and Alignment
This performance task assesses students' end-to-end engineering competence in an authentic context within the software engineering capstone project. It covers requirements engineering; architecture and design; implementation and code quality; verification and validation; DevOps and maintainability; security compliance and risk management; team process; and professional communication. The task aligns with the ABET student outcomes (problem solving, design, communication, teamwork, ethics and professional responsibility) and with the core competencies in the ACM/IEEE SE2014 curriculum guidelines (requirements, design, construction, verification, process, quality). Quality attributes follow the ISO/IEC 25010 framework. The design follows a validity-evidence framework (Messick; Kane) so that content is representative and score interpretations are defensible.
Task Scenario and Objectives
A team of 3–5 students has 8–10 weeks to build a cloud-native, deployable software product for a real or realistic client, for example a microservice-based campus event management application or a system of equivalent complexity. The goal is a minimum viable product (MVP) that can be demonstrated and operated in a controlled runtime environment, satisfies explicit stakeholder requirements and quality attribute targets (ISO/IEC 25010), and implements baseline security controls (OWASP ASVS Level 2).
Evidence and Deliverables
- Requirements package: SRS per ISO/IEC/IEEE 29148; use cases or user stories with acceptance criteria; quality attribute scenarios (response time, reliability, security, maintainability, and so on).
- Architecture and design: architecture views and architecture decision records (ADRs); trade-off analyses for key design decisions; interface contracts and data models.
- Code and configuration: Git repository with full history, following the agreed branching strategy and commit conventions; Infrastructure-as-Code (IaC) scripts; comments on key modules and static analysis reports.
- Test assets: test strategy and plan per ISO/IEC/IEEE 29119; automated unit, integration, and end-to-end tests; coverage and defect reports; performance and security test reports.
- Operations and release: reproducible CI/CD pipeline; deployment artifacts (container images, manifests); runbook and draft SLA; monitoring and alerting dashboards that can be demonstrated.
- Security and compliance: threat model (STRIDE or equivalent); dependency (SCA) and container image vulnerability scan results, alongside SAST results; ASVS gap analysis and remediation tickets.
- Demo and communication: 15-minute product demo and technical defense; concise end-user guide, including accessibility notes.
Non-negotiable Minimums
- Build and test: main-branch CI pass rate ≥ 90%; automated test coverage ≥ 70% on critical paths; no open P1 defects.
- Quality and security: zero high or critical static analysis findings; zero high or critical vulnerabilities in dependencies or container images; ASVS L2 authentication and session management controls implemented; rate limiting and input validation on critical APIs.
- Operability: one-command deployment succeeds in the designated environment; p95 latency for key operations < 300 ms during the demo (or a justified alternative performance target is met).
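The measurable gates above could be checked mechanically along the following lines. This is a minimal sketch, not a prescribed tool: the `GateInputs` fields, the `check_gates` function, and the choice of the nearest-rank method for p95 are all assumptions made for illustration. Gates that need human review (ASVS L2 controls, rate limiting, input validation, one-command deployment) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical container for the measured evidence of one team."""
    ci_runs_passed: int
    ci_runs_total: int
    critical_path_coverage: float  # fraction in [0, 1]
    open_p1_defects: int
    high_critical_findings: int    # static analysis + dependency/image scans combined
    latencies_ms: list             # latency samples for key operations, in ms


def p95(samples):
    """95th percentile by the nearest-rank method (an assumed convention)."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_gates(g, latency_target_ms=300.0):
    """Return a dict mapping each gate name to True (met) or False (not met)."""
    return {
        "ci_pass_rate": g.ci_runs_total > 0
        and g.ci_runs_passed / g.ci_runs_total >= 0.90,
        "coverage": g.critical_path_coverage >= 0.70,
        "no_open_p1": g.open_p1_defects == 0,
        "no_high_critical": g.high_critical_findings == 0,
        "p95_latency": p95(g.latencies_ms) < latency_target_ms,
    }
```

A team that negotiated a different, justified performance target would pass it as `latency_target_ms`; a submission meets the measurable gates only if every value in the returned dict is `True`.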
Analytic Rubric and Weights (100%)
Four levels are used: Exemplary, Proficient, Developing, Insufficient. The descriptors below are indicative; the level anchors follow the list.
- Requirements engineering 15%: complete traceability matrix; verifiable quality attribute scenarios; disciplined change and baseline management (per 29148).
- Architecture and design 20%: architecture decisions backed by evidence-based trade-offs; interface stability and controlled coupling; clear mapping to ISO 25010 attributes.
- Implementation and code quality 20%: readability, modularity, and controlled complexity; conformance to static analysis rules and coding standards; strong coverage on critical paths and low defect density.
- Verification and validation 15%: well-shaped test pyramid; risk-based design of performance and security tests; managed defect lifecycle (per 29119).
- DevOps and maintainability 10%: reproducible CI/CD with rollback and canary release; monitoring metrics (error rate, latency, SLOs) with effective alerting.
- Security and compliance 10%: threat model covering misuse cases; ASVS L2 controls implemented; dependency governance and secret management.
- Team process and project management 5%: iteration plans consistent with burndown; clear work item granularity and Definition of Done (DoD); consistent peer assessments.
- Professional communication and documentation 5%: SRS, design, test, and operations documents mutually consistent and under version control; demo effective for its stakeholder audience.
Level anchors (excerpt): Exemplary = systematic evidence and metrics, with reproducible rationale for decisions; Proficient = most criteria met, with well-justified deviations; Developing = significant gaps but basic operability; Insufficient = below the minimums or missing critical evidence.
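The weighted sum over the eight rubric dimensions can be sketched as follows. Two assumptions are made here: each dimension is scored on a 0–100 scale (the rubric defines four levels but does not fix a level-to-number mapping), and the dictionary keys are hypothetical shorthand for the dimension names.

```python
# Weights in percent, taken from the rubric; keys are illustrative names.
WEIGHTS = {
    "requirements": 15,
    "architecture_design": 20,
    "implementation_quality": 20,
    "verification_validation": 15,
    "devops_maintainability": 10,
    "security_compliance": 10,
    "team_process": 5,
    "communication_docs": 5,
}


def weighted_total(scores):
    """Combine per-dimension scores (assumed 0-100 each) into a 0-100 total."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"score out of range for {name}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

Because architecture and implementation together carry 40% of the weight, a weak result in either moves the total far more than the two 5% dimensions combined.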
Scoring Procedures and Standard Setting
- Scores are the weighted sum over the analytic rubric, combined with hard gates (the non-negotiable minimums) that ensure the evidence is sufficient.
- Absolute standard setting: modified Angoff, calibrated with the borderline-group method and anchored exemplars; the recommended passing standard is ≥ 70/100 with all hard gates met.
- Rater reliability: double marking with adjudication of disagreements; target ICC ≥ 0.75; a pre-scoring calibration session aligns raters on benchmark samples, and 10% of projects are sampled for moderation during scoring.
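To make the ICC target concrete, the sketch below computes one common form, ICC(2,1): two-way random effects, absolute agreement, single rater. The document does not say which ICC form is intended, so this choice is an assumption, as are the function name `icc2_1` and the input layout (one row per project, one column per rater).

```python
def icc2_1(ratings):
    """ICC(2,1) from a complete ratings table (rows: projects, columns: raters)."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 projects and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-project mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC undefined: ratings have no variance")
    return (msr - mse) / denom
```

With double marking, `ratings` would hold two columns of total scores, for example `icc2_1([[62, 65], [78, 74], [85, 88], [55, 60]])`, and the result would be compared against the 0.75 target.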
Administration and Evidence Collection
- Milestones: week 2 requirements review; week 4 architecture review; week 6 mid-term integration demo; weeks 8–10 final delivery and oral defense.
- Evidence triangulation: repository analytics (commits, reviews, issue links), operability of the demo, and document consistency checks.
- Tool boundaries: generative tools are allowed, provided their use and the human review are disclosed in ADRs or commit messages; code with unknown or incompatible licenses is strictly prohibited.
Validity, Fairness, and Academic Integrity
- Construct validity: indicators cover the key ISO 25010 quality characteristics and the SE2014 competency areas, avoiding construct underrepresentation (for example, treating lines of code as the sole proxy for quality).
- Score-inference validity: rubric anchors are supported by observable evidence and objective metrics (coverage, defect density, latency, SLO attainment).
- Fairness and accessibility: equivalent evidence pathways for different technology stacks; accessibility guidance; reasonable accommodations permitted.
- Academic integrity: similarity and dependency provenance tools; analysis of anomalous commit patterns; oral examination to verify individual contribution; peer assessment as corroborating evidence.
Formative Feedback and Summative Reporting
- Formative: targeted improvement guidance after each review, highlighting high-risk gaps (performance, security, operability).
- Summative: the weighted score, the level for each dimension, and evidence excerpts, with improvement suggestions and links to benchmark exemplars.
Rationale and Evidence Base
Performance assessment in an authentic context captures integrated competence more effectively, but its scoreability and reliability depend on multi-source evidence, explicit rubrics, and rater calibration. Adopting international standards (ISO/IEC 25010, 29148, 29119, 12207) and a security baseline (OWASP ASVS) supports content validity and industry alignment. CI/CD and observability metrics, drawing on engineering performance research, supply measurable evidence that aids judgment and reduces scoring subjectivity.
References
[1] ISO/IEC 25010:2011, Systems and software engineering—Systems and software Quality Requirements and Evaluation (SQuaRE)—System and software quality models.
[2] ISO/IEC/IEEE 29148:2018, Systems and software engineering—Life cycle processes—Requirements engineering.
[3] ISO/IEC/IEEE 29119-3:2013, Software and systems engineering—Software testing—Part 3: Test documentation.
[4] ISO/IEC/IEEE 12207:2017, Systems and software engineering—Software life cycle processes.
[5] OWASP Foundation, OWASP Application Security Verification Standard (ASVS) 4.0.3, 2021.
[6] ABET Engineering Accreditation Commission, Criteria for Accrediting Engineering Programs, 2024–2025.
[7] ACM/IEEE-CS, Software Engineering 2014: Curriculum Guidelines for Undergraduate Degree Programs in Software Engineering, 2014.
[8] N. Forsgren, J. Humble, and G. Kim, Accelerate: The Science of Lean Software and DevOps. IT Revolution, 2018.
[9] S. Messick, “Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning,” American Psychologist, vol. 50, no. 9, pp. 741–749, 1995.
[10] M. T. Kane, “Validating the interpretations and uses of test scores,” Journal of Educational Measurement, vol. 50, no. 1, pp. 1–73, 2013.
[11] T. K. Koo and M. Y. Li, “A guideline of selecting and reporting intraclass correlation coefficients for reliability research,” Journal of Chiropractic Medicine, vol. 15, no. 2, pp. 155–163, 2016.
[12] N. Falchikov and J. Goldfinch, “Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks,” Review of Educational Research, vol. 70, no. 3, pp. 287–322, 2000.