The General Data Protection Regulation (GDPR) takes effect on 25 May 2018. It represents the most significant change to data protection law in decades, with substantial fines for non-compliance (up to €20 million or 4% of total worldwide annual turnover, whichever is higher).
While legal and compliance teams handle policy, engineering teams must implement technical capabilities. Here’s what you need to know and do.
Understanding GDPR Requirements
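GDPR applies to your processing of personal data if you:
- Are established in the EU (wherever the processing itself takes place)
- Offer goods or services to individuals in the EU, even from outside it
- Monitor the behavior of individuals in the EU (for example, tracking or profiling them online)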
“Personal data” is broadly defined: any information relating to an identified or identifiable natural person. Names, email addresses, IP addresses, and device identifiers all count.
Key Principles
Lawful basis: You need a legal reason to process personal data. GDPR lists six; common ones include consent, contractual necessity, legal obligation, and legitimate interests.
Purpose limitation: Data collected for one purpose shouldn’t be used for incompatible purposes.
Data minimization: Collect only what you need.
Accuracy: Keep data accurate and up to date.
Storage limitation: Don’t keep data longer than necessary.
Security: Implement appropriate technical and organizational security measures.
User Rights
GDPR grants individuals rights you must technically support:
Right of access: Users can request all data you hold about them.
Right to rectification: Users can correct inaccurate data.
Right to erasure (“right to be forgotten”): Users can request deletion of their data.
Right to data portability: Users can receive the data they provided to you in a structured, machine-readable format, and can ask you to transmit it to another provider.
Right to object: Users can object to certain processing.
Technical Preparations
Data Inventory
You can’t comply with regulations about data you don’t know you have.
Map your data:
- What personal data do you collect?
- Where is it stored? (Databases, files, backups, logs, third-party services)
- Why do you collect it?
- How long do you keep it?
- Who has access?
Create a data inventory documenting all personal data flows. This is the foundation for everything else.
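As a rough illustration, each inventory row can be a small record that answers the five questions above, which makes gaps easy to query. The `InventoryEntry` type and `find_gaps` helper are hypothetical names for this sketch, not part of any standard tool; a spreadsheet or data catalog can hold the same fields.

```python
# Hypothetical data inventory sketch; field names are illustrative.
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Optional

@dataclass
class InventoryEntry:
    data_category: str                # What: e.g. "email address"
    system: str                       # Where: database, logs, backups, third party
    purpose: str                      # Why it is collected
    retention: Optional[timedelta]    # How long it is kept (None = undecided)
    access_roles: list[str] = field(default_factory=list)  # Who has access

def find_gaps(inventory: list[InventoryEntry]) -> list[str]:
    """Flag entries that leave an inventory question unanswered."""
    gaps = []
    for entry in inventory:
        where = f"{entry.data_category} in {entry.system}"
        if entry.retention is None:
            gaps.append(f"{where}: no retention period")
        if not entry.access_roles:
            gaps.append(f"{where}: access not documented")
    return gaps

inventory = [
    InventoryEntry("email address", "users DB", "account login",
                   timedelta(days=365), ["support"]),
    InventoryEntry("IP address", "web server logs", "security monitoring", None),
]
for gap in find_gaps(inventory):
    print(gap)
# IP address in web server logs: no retention period
# IP address in web server logs: access not documented
```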
Consent Management
If consent is your legal basis, you need robust consent management:
Affirmative consent: Users must actively opt in. Pre-checked boxes, silence, and inactivity don’t count.
Specific consent: Consent for one purpose doesn’t cover others. Be specific about what you’re asking consent for.
Withdrawable consent: Users must be able to withdraw consent as easily as they gave it.
Documented consent: Record what users consented to and when.
Technical implementation:
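```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Consent:
    user_id: str
    purpose: str                      # "marketing", "analytics", etc.
    granted_at: datetime
    withdrawn_at: Optional[datetime]  # None while consent is active
    version: str                      # Version of privacy policy

def check_consent(user_id: str, purpose: str) -> bool:
    # get_latest_consent is your own data-access helper
    consent = get_latest_consent(user_id, purpose)
    # No record, or a withdrawn one, means no consent
    return consent is not None and consent.withdrawn_at is None
```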
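The `Consent` record above keeps one mutable row per grant. An alternative, sketched here with an in-memory list standing in for a database table, is an append-only event log: grants and withdrawals are both appended, so the history itself documents what each user agreed to and when. `ConsentEvent`, `ConsentLedger`, and their methods are hypothetical names.

```python
# Hypothetical append-only consent ledger (in memory; use a database table in practice).
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class ConsentEvent:
    user_id: str
    purpose: str                   # One purpose per event, e.g. "marketing"
    action: str                    # "grant" or "withdraw"
    policy_version: Optional[str]  # Privacy policy version shown when granting
    at: datetime

class ConsentLedger:
    def __init__(self) -> None:
        self._events: list[ConsentEvent] = []

    def _append(self, user_id: str, purpose: str, action: str,
                policy_version: Optional[str] = None) -> None:
        # Events are only ever appended, never edited or removed
        self._events.append(ConsentEvent(user_id, purpose, action, policy_version,
                                         datetime.now(timezone.utc)))

    def grant(self, user_id: str, purpose: str, policy_version: str) -> None:
        self._append(user_id, purpose, "grant", policy_version)

    def withdraw(self, user_id: str, purpose: str) -> None:
        # Withdrawing takes a single call, just like granting
        self._append(user_id, purpose, "withdraw")

    def has_consent(self, user_id: str, purpose: str) -> bool:
        # The latest event for this user and purpose decides; other purposes are ignored
        for event in reversed(self._events):
            if event.user_id == user_id and event.purpose == purpose:
                return event.action == "grant"
        return False

    def history(self, user_id: str) -> list[ConsentEvent]:
        # The documented record of what the user agreed to, and when
        return [e for e in self._events if e.user_id == user_id]

ledger = ConsentLedger()
ledger.grant("u1", "analytics", "2017-06")
ledger.grant("u1", "marketing", "2017-06")
ledger.withdraw("u1", "marketing")
print(ledger.has_consent("u1", "analytics"))  # True
print(ledger.has_consent("u1", "marketing"))  # False
print(len(ledger.history("u1")))              # 3
```

Because nothing is overwritten, answering “what did this user consent to last March?” is a filter over the history rather than a reconstruction.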
Data Subject Access Requests
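Users can request all data you hold about them. You must respond without undue delay and within one month of receiving the request; for complex or numerous requests, this can be extended by two further months.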
Technical requirements:
- Query all systems containing user data
- Export data in a readable format
- Verify requester identity
- Handle requests at scale (if you have many users)
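GDPR sets the response deadline in calendar months rather than days: one month from receipt, extendable by two further months for complex or numerous requests. A minimal sketch of the date arithmetic follows, assuming the common reading that a missing day (such as 31 February) falls back to the last day of that month. It ignores weekends and public holidays, which local guidance may treat differently. `add_months` and `dsar_due_date` are hypothetical helpers.

```python
# Hypothetical DSAR deadline helpers (calendar-month arithmetic, standard library only).
import calendar
from datetime import date

def add_months(start: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to that month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def dsar_due_date(received: date, extended: bool = False) -> date:
    # One month from receipt; up to two further months if the deadline is extended
    return add_months(received, 3 if extended else 1)

print(dsar_due_date(date(2018, 1, 31)))                 # 2018-02-28
print(dsar_due_date(date(2018, 6, 15), extended=True))  # 2018-09-15
```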
Build tooling to export user data:
```python
def export_user_data(user_id: str) -> dict:
    return {
        "profile": get_profile(user_id),
        "orders": get_orders(user_id),
        "activity": get_activity_log(user_id),
        "preferences": get_preferences(user_id),
        # ... all personal data
    }
```
Consider building a self-service portal for common requests.
Right to Erasure
Users can request deletion of their data. You must delete or anonymize personal data across all systems.
Challenges:
- Data in backups
- Data in logs
- Data in derived systems (analytics, ML training data)
- Data required for legal/regulatory purposes (you can refuse deletion)
Implementation approach:
```python
def delete_user_data(user_id: str):
    # Primary deletion
    delete_profile(user_id)
    delete_orders(user_id)  # Keep (or restrict) records you must retain by law
    delete_activity(user_id)

    # Queue for backup removal
    queue_backup_deletion(user_id)

    # Anonymize where deletion isn't possible
    anonymize_analytics(user_id)

    # Log the deletion request
    log_deletion_request(user_id)
```
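For backups, you may need to mark records for exclusion and purge them on the next backup cycle, or accept that backup data will age out under your retention policy. Either way, make sure erased records are not restored into live systems in the meantime.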
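One way to mark records for exclusion is a suppression list kept outside the backups: when you restore, filter out anyone who has since been erased before the data reaches live systems. The sketch assumes restored rows are dictionaries with a `user_id` key; `ErasureLedger` is a hypothetical name, and in production the list would live in its own durable storage and hold nothing beyond the identifier.

```python
# Hypothetical suppression list for erased users, stored outside the backups.
from typing import Iterable, Iterator

class ErasureLedger:
    def __init__(self) -> None:
        self._erased: set[str] = set()

    def record(self, user_id: str) -> None:
        # Called once an erasure request has been completed in live systems
        self._erased.add(user_id)

    def filter_restored(self, rows: Iterable[dict]) -> Iterator[dict]:
        # Drop erased users' rows before restored data reaches live systems
        for row in rows:
            if row.get("user_id") not in self._erased:
                yield row

erasures = ErasureLedger()
erasures.record("u2")
backup = [{"user_id": "u1", "email": "a@example.com"},
          {"user_id": "u2", "email": "b@example.com"}]
restored = list(erasures.filter_restored(backup))
print([row["user_id"] for row in restored])  # ['u1']
```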
Data Portability
Users can request their data in a “structured, commonly used, machine-readable format.”
Implement data export in standard formats:
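```python
import json

def export_portable_data(user_id: str) -> bytes:
    # get_user_data collects everything held on the user (e.g. export_user_data above)
    data = get_user_data(user_id)
    # JSON here; CSV, XML, etc. are also machine-readable. default=str encodes
    # values such as datetimes that json can't serialize natively.
    return json.dumps(data, indent=2, default=str).encode("utf-8")
```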
Pseudonymization and Encryption
GDPR encourages pseudonymization (replacing identifiers with pseudonyms) and encryption as security measures.
Encryption:
- Encrypt data at rest
- Encrypt data in transit (TLS)
- Encrypt backups
Pseudonymization:
Where possible, store data with pseudonymous identifiers that can only be linked to individuals with additional information stored separately.
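A common way to pseudonymize identifiers is a keyed hash (HMAC-SHA256): the same input always maps to the same pseudonym, so datasets can still be joined, but only someone holding the secret key can recompute the mapping. A plain unkeyed hash is weaker, because anyone can hash a list of guessed email addresses and compare. This sketch uses only the Python standard library; the hard-coded key is a placeholder, and where the real key lives (separately from the data) is a decision for your own security setup.

```python
# HMAC-SHA256 pseudonymization sketch (standard library only).
import hashlib
import hmac

def pseudonymize(identifier: str, key: bytes) -> str:
    """Map an identifier to a stable pseudonym that can't be recomputed without `key`."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Placeholder only: load the real key from a secrets store kept apart from the data.
key = b"example-key-do-not-use"
event = {"user": pseudonymize("alice@example.com", key), "action": "login"}
print(len(event["user"]))  # 64 hex characters
```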
Data Retention
Implement retention policies and enforcement:
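```python
from datetime import timedelta
from typing import Optional

# None = no age-based expiry: the data is deleted when the account is closed
# or on an erasure request, not by this job. (timedelta has no `years` argument.)
RETENTION_POLICIES: dict[str, Optional[timedelta]] = {
    "user_profile": None,
    "order_history": timedelta(days=7 * 365),  # Legal requirement (varies by jurisdiction)
    "activity_log": timedelta(days=90),
    "analytics": timedelta(days=2 * 365),
}

def enforce_retention():
    for data_type, retention in RETENTION_POLICIES.items():
        if retention is None:
            continue  # Handled by the deletion workflow, not by age
        # delete_expired_data removes records older than `retention`
        delete_expired_data(data_type, retention)
```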
Automated retention enforcement ensures data isn’t kept indefinitely.
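The `delete_expired_data` helper above is left abstract. Its core rule fits in a few lines; this hypothetical in-memory version, `partition_expired`, assumes each record carries a timezone-aware `created_at` timestamp. In a database the same rule would be a delete of rows whose `created_at` is older than the cutoff.

```python
# Hypothetical in-memory version of the retention rule behind delete_expired_data.
from datetime import datetime, timedelta, timezone
from typing import Optional

def partition_expired(records: list[dict], retention: timedelta,
                      now: Optional[datetime] = None) -> tuple[list[dict], list[dict]]:
    """Split records into (kept, expired) using a cutoff of now - retention."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - retention
    kept = [r for r in records if r["created_at"] >= cutoff]
    expired = [r for r in records if r["created_at"] < cutoff]
    return kept, expired

now = datetime(2018, 5, 25, tzinfo=timezone.utc)
logs = [{"id": 1, "created_at": now - timedelta(days=10)},
        {"id": 2, "created_at": now - timedelta(days=120)}]
kept, expired = partition_expired(logs, timedelta(days=90), now=now)
print([r["id"] for r in kept], [r["id"] for r in expired])  # [1] [2]
```

Passing `now` explicitly keeps the rule deterministic in tests; a scheduled job would simply omit it.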
Breach Notification
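GDPR requires notifying the relevant supervisory authority within 72 hours of becoming aware of a breach, unless the breach is unlikely to put individuals at risk. If the risk to individuals is high, you must also notify the affected users without undue delay.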
Technical requirements:
- Breach detection capabilities
- Incident response procedures
- Ability to assess impact (what data, which users)
- Notification infrastructure
Ensure you can detect breaches (logging, monitoring, intrusion detection) and respond quickly.
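The 72-hour clock runs from when you become aware of the breach and counts elapsed hours, weekends included. A small helper like this hypothetical one could drive alerts in an incident tracker; the names `notification_deadline` and `hours_remaining` are illustrative.

```python
# Hypothetical breach-notification deadline helpers.
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)

def notification_deadline(aware_at: datetime) -> datetime:
    # Counted from awareness of the breach, not from when the breach occurred
    return aware_at + NOTIFICATION_WINDOW

def hours_remaining(aware_at: datetime, now: datetime) -> float:
    # Negative once the deadline has passed
    return (notification_deadline(aware_at) - now).total_seconds() / 3600

aware = datetime(2018, 6, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())             # 2018-06-04T09:00:00+00:00
print(hours_remaining(aware, aware + timedelta(hours=30)))  # 42.0
```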
Privacy by Design
New systems should incorporate privacy from the start:
- Minimize data collection
- Default to privacy (opt-in rather than opt-out)
- Build retention and deletion into the design
- Consider privacy impact during architecture decisions
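Privacy by default can be as simple as making the most protective value the default in your settings model, so that nothing is shared until a user opts in. The `PrivacySettings` class and its fields are hypothetical examples.

```python
# Hypothetical settings model: every sharing option defaults to off.
from dataclasses import dataclass

@dataclass
class PrivacySettings:
    marketing_emails: bool = False
    analytics_tracking: bool = False
    public_profile: bool = False

settings = PrivacySettings()        # A new account starts fully opted out
settings.marketing_emails = True    # Changed only in response to a user's opt-in action
print(settings)
# PrivacySettings(marketing_emails=True, analytics_tracking=False, public_profile=False)
```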
Third-Party Data Processors
If you share data with third parties (analytics, marketing, infrastructure), you need:
- Data processing agreements
- Assurance they’re GDPR compliant
- Understanding of what data goes where
Audit your third-party integrations and ensure agreements are in place.
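Part of that audit can be automated once your data inventory records which processors receive which personal data. This sketch assumes a hypothetical `Processor` record with a `has_dpa` flag and lists every integration that receives personal data without a signed agreement; the vendor names are made up.

```python
# Hypothetical processor audit; vendor names and fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class Processor:
    name: str
    data_sent: set[str] = field(default_factory=set)  # Personal data categories shared
    has_dpa: bool = False                             # Signed data processing agreement?

def missing_agreements(processors: list[Processor]) -> list[str]:
    """Names of processors that receive personal data without a signed agreement."""
    return [p.name for p in processors if p.data_sent and not p.has_dpa]

vendors = [
    Processor("analytics-vendor", {"ip_address", "device_id"}, has_dpa=True),
    Processor("email-service", {"email"}),
    Processor("status-page"),  # Receives no personal data
]
print(missing_agreements(vendors))  # ['email-service']
```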
Implementation Roadmap
Phase 1: Inventory (Now)
- Map all personal data
- Identify data processors
- Document data flows
- Assess current compliance gaps
Phase 2: Foundations (Q3 2017)
- Implement consent management
- Build data export capability
- Implement data deletion
- Establish retention policies
Phase 3: Operations (Q4 2017)
- Train staff on procedures
- Test subject access requests
- Test deletion requests
- Review and update privacy notices
Phase 4: Verification (Q1 2018)
- Audit implementation
- Test incident response
- Verify third-party compliance
- Final gap assessment
Phase 5: Maintenance (Ongoing)
- Process subject requests
- Enforce retention
- Monitor and improve
- Adapt to regulatory guidance
Common Technical Challenges
Data Scattered Across Systems
Personal data ends up in many places: databases, logs, analytics, backups, third-party services. Comprehensive data mapping is essential but difficult.
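Column names alone miss personal data hidden in free-text fields, so some teams also scan sample rows. The sketch below is a deliberately simple, regex-based scanner with a hypothetical `scan_records` function; its two patterns are assumptions that will both miss cases and over-match (the IPv4 pattern accepts octets above 255), so treat the output as leads for the inventory rather than a complete answer.

```python
# Hypothetical regex-based scanner for personal data in sample records.
import re

# Deliberately simple patterns; real scanners need far broader coverage
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_records(source: str, records: list[dict]) -> set[tuple[str, str, str]]:
    """Return (source, field, kind) for every string field that looks like personal data."""
    findings = set()
    for record in records:
        for field_name, value in record.items():
            if not isinstance(value, str):
                continue
            for kind, pattern in PATTERNS.items():
                if pattern.search(value):
                    findings.add((source, field_name, kind))
    return findings

sample = [{"msg": "login from 203.0.113.7", "note": "contact bob@example.com"}]
print(sorted(scan_records("app_logs", sample)))
# [('app_logs', 'msg', 'ipv4'), ('app_logs', 'note', 'email')]
```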
Backups and Retention
Backups contain personal data. Deleting from live systems doesn’t delete from backups. Consider backup retention policies and the ability to exclude specific records.
Log Data
Logs often contain personal data (IP addresses, user IDs). Implement log retention limits and consider pseudonymization in logs.
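For IP addresses in application logs, one option is to mask them before they are written. The hypothetical `MaskIPFilter` below uses Python’s standard `logging.Filter` hook to zero the last octet of IPv4 addresses. Note that this is truncation rather than pseudonymization: it reduces precision but cannot be joined back to a user; if you need to correlate requests, a keyed hash of the address is the alternative. IPv6 addresses are not handled.

```python
# Hypothetical log filter that truncates IPv4 addresses (standard library only).
import logging
import re

# Captures the first three octets so the last one can be replaced
IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")

class MaskIPFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so IPs passed as arguments are masked too
        message = record.getMessage()
        record.msg = IPV4.sub(r"\1.0", message)
        record.args = ()  # Already merged into msg
        return True       # Keep the record; only its text changes

logger = logging.getLogger("web")
handler = logging.StreamHandler()
handler.addFilter(MaskIPFilter())
logger.addHandler(handler)
logger.warning("failed login for %s from %s", "u42", "203.0.113.7")
# Writes to stderr: failed login for u42 from 203.0.113.0
```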
Analytics and ML
Historical data used for analytics or machine learning contains personal data. You may need to anonymize or delete from these systems too.
Legacy Systems
Older systems may not support required capabilities (granular deletion, export). Budget time for legacy system updates.
Key Takeaways
- Start with a comprehensive data inventory; you can’t comply without understanding your data
- Build consent management with explicit, specific, withdrawable, and documented consent
- Implement data subject request handling: access, rectification, erasure, portability
- Establish and automate data retention policies
- Prepare for breach detection and the 72-hour notification requirement
- Audit third-party data processors and establish agreements
- Build privacy into new systems from the start
- Start now; May 2018 will arrive quickly