
MEASURE 2: AI System Performance and Trustworthiness


60 requirements in the Measure 2: AI System Performance and Trustworthiness function

MEASURE 2.2: Evaluations involving human subjects meet applicable requirements (including human subject
MS-2.2-001: Assess and manage statistical biases related to GAI content provenance through techniques such as
MS-2.2-002: Document how content provenance data is tracked and how that data interacts with privacy and
MS-2.2-003: Provide human subjects with options to withdraw participation or revoke their consent for present
MS-2.2-004: Use techniques such as anonymization, differential privacy, or other privacy-enhancing
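MS-2.2-004 names differential privacy among the suggested privacy-enhancing techniques. A minimal sketch of the Laplace mechanism for a counting query, using only the standard library (function names here are illustrative, not from NIST guidance):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(values, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1, so scale = 1/epsilon
    # gives epsilon-differential privacy under the Laplace mechanism.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller epsilon means more noise and stronger privacy; real deployments would track a privacy budget across queries rather than call this once.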
MEASURE 2.3: AI system performance or assurance criteria are measured qualitatively or quantitatively and
MS-2.3-001: Consider baseline model performance on suites of benchmarks when selecting a model for fine-tuning
MS-2.3-002: Evaluate claims of model capabilities using empirically validated methods
MS-2.3-003: Share results of pre-deployment testing with relevant GAI Actors, such as those with system
MS-2.3-004: Utilize a purpose-built testing environment such as NIST Dioptra to empirically evaluate GAI
MEASURE 2.5: The AI system to be deployed is demonstrated to be valid and reliable
MS-2.5-001: Avoid extrapolating GAI system performance or capabilities from narrow, non-systematic, and
MS-2.5-002: Document the extent to which human domain knowledge is employed to improve GAI system performance
MS-2.5-003: Review and verify sources and citations in GAI system outputs during pre-deployment risk
MS-2.5-004: Track and document instances of anthropomorphization (e.g., human images, mentions of human
MS-2.5-005: Verify GAI system training data and TEVV data provenance, and that fine-tuning or
MS-2.5-006: Regularly review security and safety guardrails, especially if the GAI system is being operated in
MEASURE 2.6: The AI system is evaluated regularly for safety risks -- as identified in the MAP function
MS-2.6-001: Assess adverse impacts, including health and wellbeing impacts for value chain or other AI Actors
MS-2.6-002: Assess existence or levels of harmful bias, intellectual property infringement, data privacy
MS-2.6-003: Re-evaluate safety features of fine-tuned models when the negative risk exceeds organizational
MS-2.6-004: Review GAI system outputs for validity and safety: Review generated code to assess risks that may
MS-2.6-005: Verify that GAI system architecture can monitor outputs and performance, and handle, recover from
MS-2.6-006: Verify that systems properly handle queries that may give rise to inappropriate, malicious, or
MS-2.6-007: Regularly evaluate GAI system vulnerabilities to possible circumvention of safety measures
MEASURE 2.7: AI system security and resilience -- as identified in the MAP function -- are evaluated and
MS-2.7-001: Apply established security measures to assess likelihood and magnitude of vulnerabilities and
MS-2.7-002: Benchmark GAI system security and resilience related to content provenance against industry
MS-2.7-003: Conduct user surveys to gather user satisfaction with the AI-generated content and user
MS-2.7-004: Identify metrics that reflect the effectiveness of security measures, such as data provenance, the
MS-2.7-005: Measure reliability of content authentication methods, such as watermarking, cryptographic
MS-2.7-006: Measure the rate at which recommendations from security checks and incidents are implemented
MS-2.7-007: Perform AI red-teaming to assess resilience against: Abuse to facilitate attacks on other systems
MS-2.7-008: Verify fine-tuning does not compromise safety and security controls
MS-2.7-009: Regularly assess and verify that security measures remain effective and have not been compromised
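MS-2.7-005 asks for measuring the reliability of content authentication methods such as cryptographic signing. A hedged sketch of one such check using Python's stdlib `hmac` (the key, function names, and reliability metric are illustrative assumptions, not part of NIST AI 600-1):

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-managed-key"  # illustrative placeholder

def tag_content(content: bytes) -> str:
    # Attach an HMAC-SHA256 tag so provenance can be verified later.
    return hmac.new(SECRET_KEY, content, hashlib.sha256).hexdigest()

def verify_content(content: bytes, tag: str) -> bool:
    # compare_digest avoids timing side channels.
    return hmac.compare_digest(tag_content(content), tag)

def authentication_reliability(samples) -> float:
    # Fraction of (content, tag) pairs that verify: a simple
    # reliability metric in the spirit of MS-2.7-005.
    ok = sum(1 for content, tag in samples if verify_content(content, tag))
    return ok / len(samples)
```

Running this check over a corpus of tagged outputs yields a verification rate that can be tracked over time alongside the other security metrics this subsection calls for.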
MEASURE 2.8: Risks associated with transparency and accountability -- as identified in the MAP function -- are
MS-2.8-001: Compile statistics on actual policy violations, take-down requests, and intellectual property
MS-2.8-002: Document the instructions given to data annotators or AI red-teamers
MS-2.8-003: Use digital content transparency solutions to enable the documentation of each instance where
MS-2.8-004: Verify adequacy of GAI system user instructions through user testing
MEASURE 2.9: The AI model is explained, validated, and documented, and AI system output is interpreted within
MS-2.9-001: Apply and document ML explanation results such as: Analysis of embeddings, Counterfactual prompts
MS-2.9-002: Document GAI model details including: Proposed use and organizational value; Assumptions and
MEASURE 2.10: Privacy risk of the AI system -- as identified in the MAP function -- is examined and documented
MS-2.10-001: Conduct AI red-teaming to assess issues such as: Outputting of training data samples, and
MS-2.10-002: Engage directly with end-users and other stakeholders to understand their expectations and
MS-2.10-003: Verify deduplication of GAI training data samples, particularly regarding synthetic data
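MS-2.10-003 calls for verifying deduplication of GAI training data samples. A minimal exact-deduplication sketch using content hashes (illustrative only; production pipelines typically add near-duplicate detection such as MinHash, which this omits):

```python
import hashlib

def deduplicate(samples: list[str]) -> tuple[list[str], float]:
    # Exact deduplication by SHA-256 of normalized text; returns the
    # unique samples plus the duplicate rate for documentation.
    seen, unique = set(), []
    for s in samples:
        digest = hashlib.sha256(s.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(s)
    dup_rate = 1 - len(unique) / len(samples) if samples else 0.0
    return unique, dup_rate
```

Reporting the duplicate rate, rather than silently dropping rows, supports the documentation emphasis that runs through the MEASURE function.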
MEASURE 2.11: Fairness and bias -- as identified in the MAP function -- are evaluated and results are documented
MS-2.11-001: Apply use-case appropriate benchmarks (e.g., Bias Benchmark Questions, Real Hateful or Harmful
MS-2.11-002: Conduct fairness assessments to measure systemic bias
MS-2.11-003: Identify the classes of individuals, groups, or environmental ecosystems which might be impacted
MS-2.11-004: Review, document, and measure sources of bias in GAI training and TEVV data: Differences in
MS-2.11-005: Assess the proportion of synthetic to non-synthetic training data and verify training data is not
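MS-2.11-002 asks for fairness assessments to measure systemic bias. One common quantitative check, sketched here, is the disparate impact ratio across groups (the ~0.8 threshold mentioned in the comment is the conventional "four-fifths rule", not a NIST AI 600-1 requirement):

```python
from collections import defaultdict

def disparate_impact(records) -> float:
    # records: iterable of (group, favorable_outcome: bool) pairs.
    # Returns min selection rate / max selection rate across groups;
    # values below ~0.8 are commonly flagged for further review.
    totals, favorable = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        favorable[group] += int(ok)
    rates = [favorable[g] / totals[g] for g in totals]
    return min(rates) / max(rates)
```

A single ratio is a coarse signal; documented fairness assessments would pair it with the impacted-class analysis of MS-2.11-003 and the bias-source review of MS-2.11-004.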
MEASURE 2.12: Environmental impact and sustainability of AI model training and management activities -- as
MS-2.12-001: Assess safety to physical environments when deploying GAI systems
MS-2.12-002: Document anticipated environmental impacts of model development, maintenance, and deployment in
MS-2.12-003: Measure or estimate environmental impacts (e.g., energy and water consumption) for training, fine
MS-2.12-004: Verify effectiveness of carbon capture or offset programs for GAI training and applications, and
MEASURE 2.13: Effectiveness of the employed TEVV metrics and processes in the MEASURE function are evaluated and
MS-2.13-001: Create measurement error models for pre-deployment metrics to demonstrate construct validity for
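MS-2.13-001 concerns measurement error models for pre-deployment metrics. One simple way to quantify the uncertainty in a reported metric, sketched below under the assumption of per-item scores from an evaluation run, is a percentile bootstrap confidence interval (an illustrative technique, not a method NIST prescribes):

```python
import random

def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    # Percentile bootstrap CI for the mean of a per-item metric,
    # e.g., per-prompt accuracy from a pre-deployment eval.
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

A wide interval on a small eval set is itself evidence that the metric, as measured, may lack the construct validity this subsection asks teams to demonstrate.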