Databricks

Deposit your event, dispatch, and visitor data as daily Parquet files into your Amazon S3 or Google Cloud Storage bucket for analysis in Databricks.

Use this page to connect Ours Privacy to Databricks. Ours Privacy deposits your event, dispatch, and visitor data as daily Parquet files into a cloud storage bucket you own — Amazon S3 or Google Cloud Storage — which Databricks reads as an external location. This works whether your Databricks workspace runs on AWS or Google Cloud.


How the integration works

  • Daily deposits: Events and dispatches are deposited into your S3 or Google Cloud Storage bucket once per day.
  • Visitor updates: Each day's visitor files contain the records whose last-seen timestamp falls on the previous day or later. A visitor can therefore reappear across days, so merge visitor data with an upsert rather than appending it.
  • Parquet format: Data is written in Parquet, ready for Delta Lake operations and efficient querying.
  • Your bucket, your control: Data lands in a bucket you own — you decide retention, access, and how Databricks ingests it.

Data organization

Files are written in a partitioned structure (shown here for an S3 bucket; Google Cloud Storage uses the same layout under gs://):

your-bucket/
  ├── events/
  │   └── YYYY/MM/DD/*.parquet
  ├── dispatches/
  │   └── YYYY/MM/DD/*.parquet
  └── visitors/
      └── YYYY/MM/DD/*.parquet

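Partitioning by year/month/day lets you read a specific date range by path and manage retention one day at a time. The folders use plain YYYY/MM/DD names rather than Hive-style key=value names, so select dates by path or glob instead of relying on Spark's automatic partition discovery.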
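
As a minimal sketch of how the layout maps to paths, the helper below builds the folder for one dataset and day, and the list of folders for a date range. The function names daily_prefix and prefixes_between are hypothetical, and the s3 scheme is only a default (pass "gs" for Google Cloud Storage). The sketch does not account for an optional key prefix.

```python
from datetime import date, timedelta

# Dataset folders taken from the layout on this page.
DATASETS = ("events", "dispatches", "visitors")


def daily_prefix(bucket: str, dataset: str, day: date, scheme: str = "s3") -> str:
    """Return the folder holding one dataset's Parquet files for one day."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Layout: <bucket>/<dataset>/YYYY/MM/DD/*.parquet
    return f"{scheme}://{bucket}/{dataset}/{day:%Y/%m/%d}/"


def prefixes_between(bucket: str, dataset: str, start: date, end: date,
                     scheme: str = "s3") -> list[str]:
    """Return one folder per day from start to end, inclusive."""
    days = (end - start).days
    return [daily_prefix(bucket, dataset, start + timedelta(days=i), scheme)
            for i in range(days + 1)]
```

For example, daily_prefix("your-bucket", "events", date(2024, 5, 1)) returns s3://your-bucket/events/2024/05/01/. A list from prefixes_between can be passed to a reader that accepts several paths at once.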

Data processing considerations

Events and dispatches

Events and dispatches are complete for each day: a day's Parquet files contain all events and dispatches that occurred on that date, so you can append each day's folder to your tables without merging against earlier days.
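
One way to append a day's files is Databricks' COPY INTO command, which skips files it has already loaded, so a rerun for the same day should not duplicate rows. The sketch below only builds the statement text. The function name copy_into_sql and the table name main.ours_privacy.events are hypothetical, and it assumes the target Delta table already exists and that no key prefix is configured.

```python
from datetime import date


def copy_into_sql(table: str, bucket: str, dataset: str, day: date,
                  scheme: str = "s3") -> str:
    """Build a COPY INTO statement that appends one day's Parquet files."""
    # Folder for the requested day, following <bucket>/<dataset>/YYYY/MM/DD/.
    source = f"{scheme}://{bucket}/{dataset}/{day:%Y/%m/%d}/"
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source}'\n"
        f"FILEFORMAT = PARQUET"
    )
```

In a notebook you would pass the result to spark.sql, for example spark.sql(copy_into_sql("main.ours_privacy.events", "your-bucket", "events", date(2024, 5, 1))).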

Visitors

Visitor files contain recently updated records, so merge them into your Databricks tables with an upsert (MERGE INTO):

  1. Read the Parquet files from the visitors directory for the current day.
  2. Match against existing rows using the visitor identifier.
  3. Update matched rows with the newer values.
  4. Insert rows for visitors not yet in your table.
  5. Resolve conflicts with your own rule (for example, latest timestamp wins).
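
The five steps above can be sketched as a single MERGE INTO statement. The helper below only builds the statement text. The column names visitor_id and last_seen_at, the function name visitors_merge_sql, and any table name you pass in are assumptions for illustration, so check them against the actual Parquet schema. It also assumes the source and target columns line up, which UPDATE SET * and INSERT * require, and that your external location grants read access to the path.

```python
from datetime import date


def visitors_merge_sql(target: str, bucket: str, day: date,
                       key: str = "visitor_id",
                       updated_at: str = "last_seen_at",
                       scheme: str = "s3") -> str:
    """Build a MERGE INTO statement for one day's visitor files."""
    # Step 1: read the day's Parquet files straight from the visitors folder.
    source = f"{scheme}://{bucket}/visitors/{day:%Y/%m/%d}/"
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING (SELECT * FROM parquet.`{source}`) AS s",
        # Step 2: match on the visitor identifier (column name assumed).
        f"ON t.{key} = s.{key}",
        # Steps 3 and 5: update only when the incoming row is at least as recent.
        f"WHEN MATCHED AND s.{updated_at} >= t.{updated_at} THEN UPDATE SET *",
        # Step 4: insert visitors that are not in the table yet.
        "WHEN NOT MATCHED THEN INSERT *",
    ])
```

MERGE fails when several source rows match the same target row, so if a day's files could contain the same visitor more than once, deduplicate the source first (for example, keep the row with the latest timestamp per visitor).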

Before you start

  • A Databricks workspace on AWS or Google Cloud, with permission to create a Unity Catalog external location.
  • A cloud storage bucket you own to receive the deposits — an Amazon S3 bucket or a Google Cloud Storage bucket.
  • Access to the Destinations screen in Ours Privacy.

Setup

  1. In Ours Privacy, create a Databricks destination from the Destinations screen.
  2. Choose your Cloud Provider (Google or AWS) and enter your bucket name, region, and optional key prefix. Reach out to your account manager so we can confirm the access policy your bucket needs to grant.
  3. Connect Databricks to that bucket as a Unity Catalog external location.
  4. Point a Delta Lake table or notebook at the partitioned paths above, applying upsert logic for the visitors directory.

Once configured, your data is deposited daily and is ready to query in Databricks.


Next Steps

Need help?

Contact us at support@oursprivacy.com.
