Preparing for GDPR: A Technical Checklist

February 27, 2017

The General Data Protection Regulation (GDPR) takes effect on May 25, 2018. It represents the most significant change to data protection law in decades, with substantial fines for non-compliance: up to €20 million or 4% of global annual turnover, whichever is higher.

While legal and compliance teams handle policy, engineering teams must implement technical capabilities. Here’s what you need to know and do.

Understanding GDPR Requirements

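GDPR applies if you are established in the EU, or if you offer goods or services to people in the EU or monitor their behavior, wherever you are based.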

“Personal data” is broadly defined: any information relating to an identified or identifiable person. Names, emails, IP addresses, device identifiers—all count.

Key Principles

Lawful basis: You need a legal reason to process personal data. Common bases include consent, contractual necessity, and legitimate interest.

Purpose limitation: Data collected for one purpose shouldn’t be used for incompatible purposes.

Data minimization: Collect only what you need.

Accuracy: Keep data accurate and up to date.

Storage limitation: Don’t keep data longer than necessary.

Security: Implement appropriate technical and organizational security measures.

User Rights

GDPR grants individuals rights you must technically support:

Right of access: Users can request all data you hold about them.

Right to rectification: Users can correct inaccurate data.

Right to erasure (“right to be forgotten”): Users can request deletion of their data.

Right to data portability: Users can request their data in a portable format.

Right to object: Users can object to certain processing.

Technical Preparations

Data Inventory

You can’t comply with regulations about data you don’t know you have.

Map your data:

Create a data inventory documenting all personal data flows. This is the foundation for everything else.
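As a rough illustration, an inventory can start as a list of structured records, one per system that holds personal data. The `InventoryEntry` fields and the example systems below are hypothetical; a real inventory would reflect your own architecture.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryEntry:
    """One row of a hypothetical data inventory."""
    system: str                 # where the data lives, e.g. "users_db"
    data_fields: list[str]      # personal data fields held there
    purpose: str                # why the data is processed
    lawful_basis: str           # e.g. "consent", "contract"
    shared_with: list[str] = field(default_factory=list)  # downstream recipients


def systems_holding(inventory: list[InventoryEntry], field_name: str) -> list[str]:
    """Return every system that stores the given personal data field."""
    return sorted({e.system for e in inventory if field_name in e.data_fields})


inventory = [
    InventoryEntry("users_db", ["email", "name"], "account management", "contract"),
    InventoryEntry("web_logs", ["ip_address"], "security monitoring", "legitimate interest"),
    InventoryEntry("mailing_tool", ["email"], "marketing", "consent", ["email vendor"]),
]

print(systems_holding(inventory, "email"))
```

Being able to answer "where does this field live?" from the inventory is what makes access and erasure requests tractable later.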

Consent Management

If consent is your legal basis, you need robust consent management:

Explicit consent: Pre-checked boxes don’t count. Users must actively consent.

Specific consent: Consent for one purpose doesn’t cover others. Be specific about what you’re asking consent for.

Withdrawable consent: Users must be able to withdraw consent as easily as they gave it.

Documented consent: Record what users consented to and when.

Technical implementation:

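from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Consent:
    user_id: str
    purpose: str  # "marketing", "analytics", etc.
    granted_at: datetime
    withdrawn_at: Optional[datetime]  # None while consent is still active
    version: str  # Version of privacy policy

def check_consent(user_id: str, purpose: str) -> bool:
    # get_latest_consent is assumed to return a Consent, or None if none was recorded
    consent = get_latest_consent(user_id, purpose)
    return consent is not None and consent.withdrawn_at is None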
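One way to satisfy both "withdrawable" and "documented" is an append-only log in which a withdrawal is a new event rather than an overwrite. The sketch below is an in-memory stand-in, not a production store; `ConsentEvent` and `ConsentLog` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentEvent:
    user_id: str
    purpose: str
    granted: bool          # True = consent given, False = consent withdrawn
    policy_version: str
    recorded_at: datetime


class ConsentLog:
    """Append-only, in-memory stand-in for a consent table (hypothetical)."""

    def __init__(self) -> None:
        self._events: list[ConsentEvent] = []

    def record(self, user_id: str, purpose: str, granted: bool, policy_version: str) -> None:
        # Never update in place: every grant and withdrawal is kept as history.
        self._events.append(
            ConsentEvent(user_id, purpose, granted, policy_version, datetime.now(timezone.utc))
        )

    def latest(self, user_id: str, purpose: str) -> Optional[ConsentEvent]:
        # Events are appended in order, so the last match is the most recent.
        for event in reversed(self._events):
            if event.user_id == user_id and event.purpose == purpose:
                return event
        return None

    def has_consent(self, user_id: str, purpose: str) -> bool:
        event = self.latest(user_id, purpose)
        return event is not None and event.granted


log = ConsentLog()
log.record("u1", "marketing", granted=True, policy_version="v1")
log.record("u1", "marketing", granted=False, policy_version="v1")  # withdrawal
```

Because nothing is overwritten, the log can still show what a user agreed to, and when, after they have withdrawn.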

Data Subject Access Requests

Users can request all data you hold about them. You have one month to respond, extendable by up to two further months for complex or numerous requests.

Technical requirements:

Build tooling to export user data:

def export_user_data(user_id: str) -> dict:
    return {
        "profile": get_profile(user_id),
        "orders": get_orders(user_id),
        "activity": get_activity_log(user_id),
        "preferences": get_preferences(user_id),
        # ... all personal data
    }

Consider building a self-service portal for common requests.

Right to Erasure

Users can request deletion of their data. You must delete or anonymize personal data across all systems.

Challenges:

Implementation approach:

def delete_user_data(user_id: str):
    # Primary deletion
    delete_profile(user_id)
    delete_orders(user_id)
    delete_activity(user_id)

    # Queue for backup removal
    queue_backup_deletion(user_id)

    # Anonymize where deletion isn't possible
    anonymize_analytics(user_id)

    # Log the deletion request
    log_deletion_request(user_id)

For backups, you may need to mark records for exclusion and delete on next backup cycle, or accept that backup data will age out per retention policy.
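The "mark for exclusion" option can be sketched as a filter applied whenever a backup is restored. This assumes a deletion ledger of erased user IDs kept outside the backups and rows keyed by a `user_id` column; both are illustrative assumptions, and `filter_restored_rows` is a hypothetical helper.

```python
def filter_restored_rows(rows: list[dict], erased_user_ids: set[str]) -> list[dict]:
    """Drop rows belonging to erased users from a restored backup.

    `rows` are dicts with a "user_id" key (an assumed schema);
    `erased_user_ids` comes from a deletion ledger stored outside the backups.
    """
    return [row for row in rows if row["user_id"] not in erased_user_ids]


restored = [
    {"user_id": "u1", "email": "a@example.com"},
    {"user_id": "u2", "email": "b@example.com"},
]
erased = {"u2"}  # u2 asked for erasure after this backup was taken

clean = filter_restored_rows(restored, erased)
print(clean)
```

The ledger should hold only the identifier needed to re-apply the deletion, not the data that was deleted.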

Data Portability

Users can request their data in a “structured, commonly used, machine-readable format.”

Implement data export in standard formats:

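import json

def export_portable_data(user_id: str) -> bytes:
    data = get_user_data(user_id)
    # JSON shown here; CSV or XML are also commonly used formats.
    # default=str serializes dates and other non-JSON types as strings.
    return json.dumps(data, default=str).encode('utf-8')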
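For tabular data such as order history, CSV is often easier for recipients to open. A minimal sketch using the standard library follows; it assumes every row has the same keys, and `export_rows_as_csv` is a hypothetical helper.

```python
import csv
import io


def export_rows_as_csv(rows: list[dict]) -> str:
    """Serialize a list of flat dicts to CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


orders = [
    {"order_id": 1, "total": "9.99"},
    {"order_id": 2, "total": "24.50"},
]
print(export_rows_as_csv(orders))
```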

Pseudonymization and Encryption

GDPR encourages pseudonymization (replacing identifiers with pseudonyms) and encryption as security measures.

Encryption:

Pseudonymization:

Where possible, store data with pseudonymous identifiers that can only be linked to individuals with additional information stored separately.
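One common way to derive such identifiers is a keyed hash, where the key is the "additional information" kept separately. The sketch below uses HMAC-SHA256 from the standard library. The key value is a placeholder, and `pseudonymize` is a hypothetical helper, not a prescribed method.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder only: in practice, load the key from a secrets store kept
# apart from the pseudonymized data.
key = b"example-key-load-from-a-secrets-store"

pseudonym = pseudonymize("user@example.com", key)
print(pseudonym)
```

The same input and key always produce the same pseudonym, so records can still be joined. Without the key, the identifier cannot be feasibly recomputed. Note that pseudonymized data still counts as personal data under GDPR, since it can be re-linked.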

Data Retention

Implement retention policies and enforcement:

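from datetime import timedelta

# timedelta has no "years" argument, so multi-year periods are given in days
RETENTION_POLICIES = {
    "user_profile": None,  # No fixed period; deleted on request
    "order_history": timedelta(days=7 * 365),  # Legal requirement
    "activity_log": timedelta(days=90),
    "analytics": timedelta(days=2 * 365),
}

def enforce_retention():
    for data_type, retention in RETENTION_POLICIES.items():
        if retention is None:
            continue  # Nothing to expire automatically
        delete_expired_data(data_type, retention)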

Automated retention enforcement ensures data isn’t kept indefinitely.
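The core of `delete_expired_data`, which is left undefined above, is a cutoff comparison. The following sketch shows that logic on in-memory records; it assumes each record carries a timezone-aware `created_at` timestamp, and `split_expired` is a hypothetical helper. A real implementation would more likely issue a `DELETE ... WHERE created_at < cutoff` against the datastore.

```python
from datetime import datetime, timedelta, timezone


def split_expired(
    records: list[dict], retention: timedelta, now: datetime
) -> tuple[list[dict], list[dict]]:
    """Partition records into (kept, expired) by their "created_at" timestamp."""
    cutoff = now - retention
    kept = [r for r in records if r["created_at"] >= cutoff]
    expired = [r for r in records if r["created_at"] < cutoff]
    return kept, expired


now = datetime(2017, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": datetime(2017, 1, 1, tzinfo=timezone.utc)},  # 151 days old
    {"id": 2, "created_at": datetime(2017, 5, 1, tzinfo=timezone.utc)},  # 31 days old
]

kept, expired = split_expired(records, timedelta(days=90), now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the retention logic easy to test.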

Breach Notification

GDPR requires notifying the supervisory authority of a personal data breach within 72 hours of becoming aware of it.

Technical requirements:

Ensure you can detect breaches (logging, monitoring, intrusion detection) and respond quickly.
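Incident tooling can make the 72-hour window explicit by computing the deadline from the moment a breach is discovered. This is a small sketch with hypothetical function names; timestamps are kept in UTC to avoid ambiguity.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time by which the breach notification should be sent."""
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


aware_at = datetime(2018, 6, 1, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(aware_at)
print(deadline.isoformat())
```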

Privacy by Design

New systems should incorporate privacy from the start:

Third-Party Data Processors

If you share data with third parties (analytics, marketing, infrastructure), you need:

Audit your third-party integrations and ensure agreements are in place.

Implementation Roadmap

Phase 1: Inventory (Now)

Phase 2: Foundations (Q3 2017)

Phase 3: Operations (Q4 2017)

Phase 4: Verification (Q1 2018)

Phase 5: Maintenance (Ongoing)

Common Technical Challenges

Data Scattered Across Systems

Personal data ends up in many places: databases, logs, analytics, backups, third-party services. Comprehensive data mapping is essential but difficult.

Backups and Retention

Backups contain personal data. Deleting from live systems doesn’t delete from backups. Consider backup retention policies and the ability to exclude specific records.

Log Data

Logs often contain personal data (IP addresses, user IDs). Implement log retention limits and consider pseudonymization in logs.
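As one illustration, IPv4 addresses can be truncated before a line is written to the log. The sketch below zeroes the last octet using the standard `ipaddress` module; `scrub_log_line` and the regular expression are illustrative assumptions. Whether truncation is sufficient for your purposes is a question for your compliance team.

```python
import ipaddress
import re

# Candidate dotted-quad strings; validity is checked by ipaddress below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_ipv4(match: re.Match) -> str:
    """Zero the last octet of an IPv4 address; leave non-addresses untouched."""
    try:
        network = ipaddress.ip_network(f"{match.group(0)}/24", strict=False)
    except ValueError:
        return match.group(0)  # looked like an IP but is not a valid one
    return str(network.network_address)


def scrub_log_line(line: str) -> str:
    """Replace every valid IPv4 address in a log line with its /24 network address."""
    return IPV4_PATTERN.sub(mask_ipv4, line)


print(scrub_log_line("GET /login from 203.0.113.42 ok"))
```

Applying the scrubber in the logging pipeline, before data is stored, avoids having to clean up historical logs later.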

Analytics and ML

Historical data used for analytics or machine learning contains personal data. You may need to anonymize or delete from these systems too.

Legacy Systems

Older systems may not support required capabilities (granular deletion, export). Budget time for legacy system updates.

Key Takeaways