Synthetic Data
Overview of the Synthetic Data feature for sharing sensitive data while maintaining compliance and protecting user privacy.
Overview
Synthetic data is artificially generated data that mimics the statistical properties of real-world data. It is used to maintain privacy and compliance by allowing organizations to share sensitive information without exposing actual user data.
The Synthetic Data feature is designed for customers who need to share limited amounts of sensitive data, including PHI, with third-party destinations while maintaining compliance and protecting user privacy. Synthetic data allows you to generate and assign properties to visitors that are statistically inferred or anonymized versions of the real data.
Why Use Synthetic Data?
- Enhanced Privacy: Hashing is an option, but hashes of predictable values such as email addresses or phone numbers can sometimes be reversed through dictionary or brute-force attacks, especially across large datasets. Synthetic data provides an additional layer of protection.
- Fill Data Gaps: If your organization lacks certain user information (e.g., city, state), synthetic data can generate these properties based on statistical models.
- Flexible Mapping: Synthetic properties are stored alongside visitor data, and you can map these synthetic fields to destinations like Facebook, Google Ads, or other vendors.
For example, instead of sending visitor.email to a destination, you can map visitor.synthetic_email.
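To illustrate the Enhanced Privacy point above, here is a minimal Python sketch of why an unsalted hash of an email address is not anonymous: anyone holding a list of candidate addresses can hash each one and look for a match. The addresses and the helper names (sha256_hex, reverse_hashes) are made up for this illustration, and the normalization step (trim and lowercase) is an assumption about how values are prepared before hashing.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a trimmed, lowercased value."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def reverse_hashes(hashed_values, candidates):
    """Map each hash back to a candidate whose digest matches, when one exists."""
    lookup = {sha256_hex(candidate): candidate for candidate in candidates}
    return {h: lookup[h] for h in hashed_values if h in lookup}


# Hypothetical data: one hash shared with a third party, and a list of guesses.
shared_hash = sha256_hex("jane.doe@example.com")
guesses = ["john.smith@example.com", "jane.doe@example.com"]

recovered = reverse_hashes([shared_hash], guesses)
print(recovered[shared_hash])  # prints jane.doe@example.com
```

A synthetic value has no such one-to-one link back to the real address, which is the additional protection described above.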
Synthetic data is generated by analyzing geolocation data, real user data, and statistical trends for each property. These insights are then used to create anonymized data that mimics an account's real data, ensuring privacy and compliance while maintaining utility.
Synthetic Properties
Ours automatically generates the following synthetic properties for every visitor:
- synthetic_email: A synthetic version of the visitor's email address.
- synthetic_first_name: A synthetic first name.
- synthetic_last_name: A synthetic last name.
- synthetic_gender: A synthetic gender, based on statistical allocation.
- synthetic_date_of_birth: A synthetic date of birth.
- synthetic_phone_number: A synthetic phone number.
- synthetic_city: A synthetic city.
- synthetic_state: A synthetic state.
- synthetic_zip: A synthetic zip code.
- synthetic_country: A synthetic country.
- synthetic_ip: A synthetic IP address.
Synthetic data is generated for every visitor that interacts with your application. However, mappings use "real" data properties by default unless you explicitly opt in to synthetic data.
How to Use Synthetic Data
- Navigate to the Mapping Section of a destination.
- Review the variable dictionary or type-ahead variable inputs to select a synthetic property.
- To use synthetic data for a specific destination, update your mapping settings:
  - Replace real properties (e.g., visitor.email) with synthetic properties (e.g., visitor.synthetic_email) for the desired fields.
The default mappings Ours generates when you create a destination generally use the real data properties. You will have to explicitly update the mapping to use synthetic data.
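The swap described in the steps above can be sketched in Python. This is only an illustration: mappings are edited in the Mapping Section of a destination, and the dictionary shape used here (destination field name to variable path) is an assumption, not the product's actual mapping format. The destination field names and the function name use_synthetic_properties are hypothetical; the synthetic property names are the ones listed in this document.

```python
# Synthetic property names as listed under "Synthetic Properties".
SYNTHETIC_PROPERTIES = {
    "synthetic_email", "synthetic_first_name", "synthetic_last_name",
    "synthetic_gender", "synthetic_date_of_birth", "synthetic_phone_number",
    "synthetic_city", "synthetic_state", "synthetic_zip",
    "synthetic_country", "synthetic_ip",
}


def use_synthetic_properties(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy of the mapping with visitor.<prop> replaced by
    visitor.synthetic_<prop> wherever a synthetic counterpart exists."""
    result = {}
    for field, source in mapping.items():
        prefix, _, prop = source.partition(".")
        synthetic = f"synthetic_{prop}"
        if prefix == "visitor" and synthetic in SYNTHETIC_PROPERTIES:
            result[field] = f"visitor.{synthetic}"
        else:
            # Non-visitor variables and already-synthetic properties stay as-is.
            result[field] = source
    return result


# Hypothetical default mapping that uses real data properties.
default_mapping = {"em": "visitor.email", "event_name": "event.name"}
synthetic_mapping = use_synthetic_properties(default_mapping)
print(synthetic_mapping["em"])  # prints visitor.synthetic_email
```

The function returns a new dictionary rather than modifying the input, so the original mapping stays available for comparison.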
Key Considerations
Customization
Synthetic data generation is not customizable at this time. If you have specific use cases or requirements for generating synthetic data, please contact our team to discuss potential solutions.
Compliance
Synthetic data can help organizations maintain compliance with frameworks like HIPAA and GDPR by ensuring no direct PHI or identifiable data is shared with third parties. However, always review your mappings to ensure compliance with your specific needs.
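One way to approach the mapping review is a small audit script run against an exported copy of your mappings. The Python sketch below assumes a mapping can be represented as a dictionary of destination field to variable path; that shape, the SENSITIVE_SOURCES list, and the function name find_real_sensitive_fields are all hypothetical and should be adapted to the properties your organization treats as sensitive.

```python
# Hypothetical list of real properties that should not reach third parties.
SENSITIVE_SOURCES = frozenset({
    "visitor.email",
    "visitor.phone_number",
    "visitor.date_of_birth",
})


def find_real_sensitive_fields(mapping, sensitive_sources=SENSITIVE_SOURCES):
    """Return the destination fields that still read from a real sensitive property."""
    return sorted(
        field for field, source in mapping.items() if source in sensitive_sources
    )


# Hypothetical mapping: one field was switched to synthetic data, one was missed.
mapping = {
    "em": "visitor.email",
    "ph": "visitor.synthetic_phone_number",
}
flagged = find_real_sensitive_fields(mapping)
print(flagged)  # prints ['em']
```

An empty result means no field in the mapping reads from a property on the sensitive list; it does not by itself establish compliance.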
Testing Synthetic Data
If you want to test how synthetic data behaves, create a duplicate destination for Facebook or Google Ads and modify its mapping to use synthetic properties. This allows you to preview the data without affecting existing configurations.
FAQs
Q: Can I customize how synthetic properties are generated?
A: Not at this time. If you have specific use cases, feel free to reach out to our team.
Q: Is synthetic data generated for all visitors by default?
A: Yes, synthetic data is generated automatically for all visitors. However, you have to opt in to using it for your destinations by editing your mappings.
Q: Where is synthetic data stored?
A: Synthetic data is stored alongside your existing visitor data and follows the same security and retention policies.