Hardcoding Secrets in Notebooks? Lessons from a DataOps Misstep

Author: Prateek Arora
Date: Oct 25, 2021   |   Read time: 5 min


During a fast-moving data science pilot, our team made a high-risk security mistake without realizing it. We hardcoded secrets (database credentials, API tokens, and private keys) directly into Jupyter notebooks. These notebooks were shared across teams, stored in a public Git repo, and left entirely exposed. It was a DataOps red flag we caught just in time.

🧠 What Happened?

Analysts were granted access to prototype new data pipelines using Python notebooks inside a shared JupyterHub environment. The goal was speed and agility, but it came at the cost of hygiene. Rather than using secure references, users copied credentials into variables within cells:


# Anti-pattern: secrets pasted as string literals into a notebook cell
db_password = "P@ssw0rd123"
api_key = "sk_test_51M9xyc..."

These notebooks were then pushed to Git repositories without any pre-commit checks or secret scanning. With no access boundaries in place, any contributor could view or misuse these secrets. Worse, some of those credentials pointed to production systems.
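To make the missing check concrete, here is a minimal sketch of a scan that could have run before those notebooks were committed. It is an illustration, not the tooling we used: the function name `find_secrets_in_notebook` and the two regular expressions are hypothetical, and dedicated scanners ship far larger rule sets. It relies only on the fact that an .ipynb file is JSON with a list of cells.

```python
import json
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; real scanners use many more rules.
SECRET_PATTERNS = [
    # A credential-looking variable name assigned a quoted string literal
    re.compile(
        r"\b\w*(password|passwd|secret|token|api_key|apikey)\w*\s*=\s*[\"'][^\"']+[\"']",
        re.IGNORECASE,
    ),
    # The header line of a PEM-encoded private key
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_secrets_in_notebook(notebook_json: str) -> List[Tuple[int, str]]:
    """Return (cell_index, offending_line) pairs for code cells matching a pattern."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # Notebook JSON stores cell source as a single string or a list of lines
        text = source if isinstance(source, str) else "".join(source)
        for line in text.splitlines():
            if any(pattern.search(line) for pattern in SECRET_PATTERNS):
                findings.append((index, line.strip()))
    return findings
```

Called on the text of a notebook file, the function returns an empty list when nothing matches, so a commit hook or CI step could fail the build whenever the list is non-empty. A check like this only catches the patterns it knows about, which is why it complements rather than replaces a dedicated scanner.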

🛑 Why This Matters

  • Security exposure: Secrets in source code can be exploited if repos are cloned, leaked, or made public, whether intentionally or accidentally.
  • Audit failure risk: Audits against standards and regulations such as ISO 27001 or GDPR are likely to flag hardcoded secrets, especially in production environments.
  • Propagation problem: Once secrets are in Git history, removing them completely requires rewriting commit history, a painful and error-prone task.

🧰 What We Should've Done

  • Use Azure Key Vault or AWS Secrets Manager: Store and retrieve secrets securely with access policies and audit trails.
  • Access secrets via environment variables: Configure notebooks to pull credentials from environment variables injected during job or container startup.
  • Implement secret scanning tools: Integrate tools like truffleHog, gitleaks, or GitHub Advanced Security to detect secrets in commits automatically.
  • Pre-commit hooks: Add hooks to prevent credentials from being staged accidentally (e.g., using the pre-commit framework).
  • Isolate dev/test from prod: Use different credentials and environments for sandbox work, and never reuse production keys in exploratory work.
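As a rough sketch of the environment-variable approach from the list above, a notebook cell can look up each credential by name instead of embedding its value. The helper `get_secret`, the `MissingSecretError` class, and the variable names `DB_PASSWORD` and `API_KEY` are assumptions made for illustration. How the values reach the environment (for example, injected from Azure Key Vault or AWS Secrets Manager at job or container startup) depends on your platform and is not shown here.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str) -> str:
    """Read a secret from an environment variable, failing loudly if unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Report the variable name only; never echo a secret value into cell output
        raise MissingSecretError(f"Environment variable {name!r} is not set")
    return value


# Hypothetical usage inside a notebook cell (variable names are assumptions):
# db_password = get_secret("DB_PASSWORD")
# api_key = get_secret("API_KEY")
```

Failing fast on a missing variable is a deliberate choice in this sketch: a notebook that stops with a clear error is less likely to tempt someone into pasting the value back into a cell as a quick fix.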

⚖️ Real-World Trade-offs

It's easy to dismiss prototypes as "disposable," but the truth is, temporary code becomes permanent more often than we admit. Once notebooks were shared for review or demoed to leadership, they were treated as trusted artefacts, and assumptions about security fell apart.

We learned that governance must scale with the environment, and that even in exploratory phases you need sensible defaults and guardrails.

📘 Takeaways

  • Don't trust notebooks to stay private: Always assume that what goes into version control may one day become visible to others.
  • Shift-left secrets management: Make secure secret handling part of the dev workflow, not an afterthought left to the ops team.
  • Create a security-first culture: Train analysts and developers alike to understand the impact of insecure practices, especially in data workflows.

As teams adopt notebooks for data engineering, machine learning, and rapid prototyping, the line between dev and prod continues to blur. Embed secrets management into your stack from day one, even when you think you're just experimenting.