Chapter 18 Appendix A: Responsible AI Checklist

As we conclude this book, it is crucial to remember that technical skill is only half of the equation. Data science has real-world consequences. Before deploying any model, analysis, or data pipeline to production, use this checklist to ensure your work is robust, fair, and transparent.

This checklist is designed to be actionable for R users, pointing to specific packages and practices where applicable.

18.1 Data Quality & Lineage

“Garbage in, garbage out” applies to ethics as well as accuracy.

    • Tip: Scan for patterns resembling PII (emails, SSNs) with regular expressions (e.g., via stringr) or a dedicated screening script before data leaves your secure environment.
    • Action: Compare the distribution of key demographics in your training set against your production data.
    • Tool: Use the pointblank or validate packages to define and enforce data quality rules (e.g., col_vals_between(age, 0, 120)); see the sketch below.
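
The sketch below shows how such rules might be wired together with pointblank; the table name customers and the columns age and email are hypothetical placeholders.

      library(pointblank)

      # Define data quality rules, run them, and collect pass/fail results
      agent <- create_agent(tbl = customers) |>
        col_vals_between(columns = vars(age), left = 0, right = 120) |>
        col_vals_not_null(columns = vars(email)) |>
        interrogate()

      agent   # print the validation report with results per rule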

18.2 Fairness & Bias

Algorithms can reinforce existing inequalities.

    • Tool: Use fairness, fairmodels, or DALEX to calculate metrics like Disparate Impact or the Equal Opportunity difference.
    • Example Code: fairness_check(explainer, protected = data$gender, privileged = "Male") (a fuller sketch follows this list).
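
Here is a minimal sketch of that workflow with DALEX and fairmodels; the fitted classifier model, the data frame train_data, the outcome column default, and the protected column gender are hypothetical.

      library(DALEX)
      library(fairmodels)

      # Wrap the fitted classifier so fairness metrics can be computed
      explainer <- explain(model,
                           data = subset(train_data, select = -default),
                           y    = train_data$default)

      # Compare metrics across groups, relative to the privileged group
      fobject <- fairness_check(explainer,
                                protected  = train_data$gender,
                                privileged = "Male")

      print(fobject)   # flag metrics outside the acceptable ratio
      plot(fobject)    # visualize fairness metrics by group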

18.3 Transparency & Explainability

Black boxes should not make high-stakes decisions.

    • Tool: Use DALEX, lime, or iml to create feature-contribution or break-down plots; see the sketch below.
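
As a sketch, a single prediction can be decomposed with DALEX; the objects explainer (built as in 18.2) and new_obs (one row of new data) are assumed to exist.

      library(DALEX)

      # Attribute the prediction for one observation to individual features
      bd <- predict_parts(explainer,
                          new_observation = new_obs,
                          type = "break_down")
      plot(bd)   # break-down plot of feature contributions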

18.4 Reproducibility & Integrity

Science must be reproducible.

    • Tool: Use renv to capture exact package versions in a renv.lock file so collaborators can restore the same environment; see the sketch below.
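
A typical renv workflow looks like the sketch below, run from the project root.

      # Create a project-local library and start tracking packages
      renv::init()

      # After adding or updating packages, record exact versions in renv.lock
      renv::snapshot()

      # On a collaborator's machine or a server, reinstall those versions
      renv::restore()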

18.5 GenAI Specifics

If you are using Large Language Models (LLMs), the same standards of data protection, transparency, and reproducibility apply.

“With great power comes great responsibility.” — Stan Lee (and every Data Scientist)