One Snowflake, Multiple Vaults: A Solution to Data Residency

Data residency requirements, which govern where sensitive data can be stored or processed in the cloud (or on an on-prem server), are a common feature of many modern data protection laws. Because of these requirements, the location of sensitive data has significant regulatory compliance implications in countries and regions around the world.

In this post, we’ll look at the challenges of managing data residency with Snowflake. We’ll start by examining how Snowflake Cloud Regions address data residency challenges, and consider the compliance implications of this approach — especially when loading data from cloud storage. Then, we’ll look at how to simplify data residency compliance using one or more regional data privacy vaults.

Let’s begin with a deeper dive into data residency, and how it impacts compliance.

The implications of data residency on compliance

When you work with personally identifiable information (PII), where you store and process this information has a direct impact on your legal compliance requirements. Some jurisdictions have regulations that govern the protection and privacy of their residents’ PII, restricting how and where it’s used by businesses and other organizations.

For example, the personal data (i.e., PII) of European Union residents cannot be transferred outside the EU without appropriate safeguards.

The laws of each jurisdiction impact how you transmit, manage, process, and store sensitive data in that jurisdiction. Because data residency dictates where (geographically) data is stored in the cloud, it becomes a critical concern in cloud environments that handle sensitive data.

Choose your cloud region carefully

Cloud service providers have data centers located in multiple regions around the world. When businesses sign up for cloud services and configure storage regions and other tooling, they select specific regions where their data is stored.

For many businesses, the selection of regions and locations for data storage is an afterthought.

But, treating this decision as an afterthought is a costly mistake that can come back to haunt you if you’re handling sensitive data. That’s because choosing storage regions is a weighty decision that can have a long-term impact on compliance, and on your business operations.

Snowflake Cloud Regions: a data residency solution?

Snowflake Cloud Regions let you choose the geographic location where your Snowflake data is stored across the data centers provided by the Snowflake-supported public cloud providers — AWS, GCP, and Azure. Each cloud provider offers a set of regions across the globe, with specific geographic data center locations in each cloud provider region.

Source: Snowflake Documentation Supported Cloud Regions

If your company uses Snowflake Cloud Regions, you have your choice of providers, as well as regions where your data can be stored. When you create an account to deploy and set up Snowflake, whichever region you select becomes the primary location for data storage and for data processing resources.

At first glance, it might seem like Snowflake Cloud Regions provide a simple, effective solution to your data residency and compliance concerns. But for global companies that need global analytics, it isn’t that simple. That’s because, as noted in the Snowflake Cloud Regions documentation:

Each Snowflake account is hosted in a single region. If you wish to use Snowflake across multiple regions, you must maintain a Snowflake account in each of the desired regions.

This means that for each region where your business operates that has data residency requirements, you’ll need a different Snowflake account hosted in that region. Compliance becomes increasingly complex as you scale globally to more and more regions around the world. With this approach, running global analytics operations across different accounts to get a comprehensive view of your business can be a massive and ongoing challenge.

Instead of managing multiple Snowflake accounts with multiple Snowflake instances distributed in various regions around the world, you’d rather maintain a Snowflake instance in a single region to support global data operations. However, you still need to consider the need to honor data residency requirements for sensitive data so you can uphold your compliance obligations and safeguard customer trust.

For example, if you collect the personal data (PII) of customers located in the EU, but your Snowflake instance is located somewhere else, then you need to think through the privacy and compliance impact of storing and processing that data.

Loading data from cloud storage into Snowflake

Snowflake also lets businesses load data from cloud storage services like AWS S3, Google Cloud Storage, and Microsoft Azure, regardless of which cloud platform hosts their Snowflake account. This can present additional challenges when working to ensure data residency compliance.

For example, let’s say that your company collects PII from both US and EU customers using its website. And, let’s say that this sensitive data is then stored in a Google Cloud Storage bucket that’s located in the AUSTRALIA-SOUTHEAST1 (Sydney) region.

How does transmitting this PII data to Australia, and then storing it in Australia, affect your compliance with regulations like the EU’s GDPR?

The answer is: unless appropriate transfer safeguards are in place, doing this likely puts you out of compliance with GDPR. This is just one example of how the location where sensitive data is stored, processed, and replicated complicates the compliance requirements faced by businesses that handle sensitive PII.

Businesses that handle PII must ensure regulatory compliance by aligning their choice of cloud storage regions with the data residency requirements of markets where they operate.
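One way to picture this alignment is as a simple check of a storage region against a per-jurisdiction policy. The sketch below is illustrative only: the ALLOWED_STORAGE_REGIONS table and the residency_violations helper are assumptions made up for this example, and real rules depend on the specific regulation and on any transfer safeguards you have in place. The region names are Google Cloud region identifiers.

```python
# Hypothetical policy: which storage regions we accept for each customer jurisdiction.
# These entries are illustrative assumptions, not a statement of what any law requires.
ALLOWED_STORAGE_REGIONS = {
    "EU": {"europe-west3"},          # Frankfurt
    "US": {"us-east4", "us-west1"},  # Northern Virginia, Oregon
}


def residency_violations(customer_jurisdictions, storage_region):
    """Return the jurisdictions whose policy does not allow the given storage region."""
    return sorted(
        jurisdiction
        for jurisdiction in customer_jurisdictions
        # Jurisdictions missing from the policy are flagged rather than silently allowed.
        if storage_region not in ALLOWED_STORAGE_REGIONS.get(jurisdiction, set())
    )


# The scenario from the example above: EU and US customer PII stored in Sydney.
violations = residency_violations({"EU", "US"}, "australia-southeast1")
print(violations)
```

Under this made-up policy, the Sydney bucket is flagged for both the EU and US customers, while a Frankfurt bucket would pass for EU customers.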

And beyond compliance issues, businesses should also consider data transfer costs. Transferring data between cloud storage regions can incur significant additional costs, especially if your company is frequently transferring large volumes of data. So, we not only have compliance concerns with cross-border transfers of PII, we also have a cost concern.

So, to briefly recap our problem:

  • Countries and regions have their own laws and regulations that govern how to handle their residents’ sensitive data (PII).
  • The geographic location where your business stores and processes sensitive data impacts whether you’re compliant with the data residency requirements of the markets where you operate.
  • If you use Snowflake to perform analytics on PII, then the complexity of meeting your compliance obligations will depend on the location of your Snowflake account.
  • If you load PII data into Snowflake from cloud storage, then your compliance obligations are also impacted by the location of your cloud storage.

So, how can we meet data residency requirements, support global analytics operations, and remove the operational overhead of managing multiple Snowflake accounts and instances?

We can solve our data residency problems and protect sensitive data with one or more data privacy vaults.

How a data privacy vault simplifies data privacy

A data privacy vault isolates, protects, and governs access to sensitive customer data. Sensitive data is stored in the vault, while opaque tokens that serve as references to this data are stored in traditional cloud storage or used in data warehouses. A data privacy vault can store sensitive data in a specific geographic location, and tightly controls access to this data. Other systems only have access to non-sensitive tokenized data.

In the example architecture shown below, a phone number is collected by a front end application. Ideally, we should de-identify (i.e., tokenize) this sensitive information as early in the data lifecycle as possible. A data privacy vault lets us do just that.

This phone number, along with any other PII, is stored securely in the vault, which is isolated outside of your company’s existing infrastructure. Any downstream services — the application database, data warehouse, analytics, any logs, etc. — store only a token representation of the data, and are removed from the scope of compliance:

Example of reducing compliance scope with a data privacy vault
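To make the tokenization idea concrete, here is a minimal sketch of the pattern. ToyVault is a hypothetical in-memory stand-in written for this post, not Skyflow's API or any real vault product, and the region label and phone number are made-up values. It only shows the core idea: the sensitive value stays in the vault, and everything downstream sees an opaque token.

```python
import secrets


class ToyVault:
    """In-memory stand-in for a data privacy vault (hypothetical, not a real API)."""

    def __init__(self, region):
        self.region = region
        self._records = {}  # token -> sensitive value, kept only inside the vault

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the value it stands for.
        token = "tok_" + secrets.token_hex(8)
        self._records[token] = value
        return token

    def detokenize(self, token):
        # A real vault would enforce access control here; this sketch does not.
        return self._records[token]


vault = ToyVault(region="eu-frankfurt")
phone_token = vault.tokenize("+49 30 1234567")

# Downstream systems (application database, warehouse, logs) store only the token.
order_row = {"order_id": 1001, "phone": phone_token}
print(order_row)
```

Because the token is random rather than derived from the phone number, a copy of order_row in a log or warehouse reveals nothing about the original value without access to the vault.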

Snowflake handles only de-identified data

Because no sensitive data is stored outside the data privacy vault, your compliance scope is restricted to just the vault. This removes the compliance burden from your Snowflake instance.

Example pipeline where sensitive data is isolated and protected within a data privacy vault

To satisfy data residency requirements, we can extend this approach by using multiple regional data privacy vaults placed near customers whose data is subject to these requirements. With sensitive data stored in these data privacy vaults, Snowflake contains only de-identified, tokenized data. It no longer matters if you operate a single global instance of Snowflake or multiple Snowflake accounts across different regions because data residency concerns no longer apply to your Snowflake instances.

Compliance with data residency requirements now depends solely on where your data privacy vaults are located. You no longer need to worry about data residency for all the different parts of your data tech stack, including cloud storage and Snowflake. All sensitive data goes into your data privacy vaults, and these vaults become the only component of your architecture subject to data residency requirements.

Store PII in a data privacy vault in a specific region

With Skyflow Data Privacy Vault you can host your vaults in a wide variety of regions around the world. You can also route sensitive data to a data privacy vault located in a specific region for storage.

For example, consider how the application architecture shown below supports data residency requirements from multiple regions:

Using vaults to satisfy multiple data residency requirements for one Snowflake instance
  1. Your company’s e-commerce site collects customer PII whenever a customer places an order.
  2. On the client side, the website detects the customer’s location.
  3. Detecting that the customer is in the EU, the client-side code uses Skyflow’s API to send the PII data to your company’s data privacy vault in Frankfurt, Germany.
    Note: For customers based in the US, the PII data is instead routed to the data privacy vault in the US (in this case, Virginia).
  4. This EU-based customer’s sensitive PII is stored in the EU-based data privacy vault, and Skyflow’s API responds with tokenized data.
  5. The client-side code sends the customer order request, now with tokenized data, to the server.
  6. The server processes the order, storing the data (now de-identified and tokenized) in cloud storage in the “Oregon, US” region.
  7. At the end of the week, your company’s Snowflake instance in Tokyo, Japan, loads the data (already de-identified and tokenized) from cloud storage to perform analytics.
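The routing in steps 2 through 5 can be sketched roughly as follows. Everything here is an assumption made for illustration: the vault names, the VAULT_FOR_REGION mapping, the PII_FIELDS list, and the tokenize_order helper are hypothetical, and the in-memory dictionaries stand in for real regional vaults that you would call through a vault API.

```python
import secrets

# Hypothetical mapping from customer region to regional vault, mirroring the example
# (Frankfurt for EU customers, Virginia for US customers).
VAULT_FOR_REGION = {"EU": "vault-frankfurt", "US": "vault-virginia"}

# One in-memory dict per vault stands in for real vault storage (token -> PII value).
VAULTS = {name: {} for name in VAULT_FOR_REGION.values()}

# Fields we treat as PII in this sketch.
PII_FIELDS = ("name", "email", "phone")


def tokenize_order(order, customer_region):
    """Store the order's PII in the customer's regional vault and return a tokenized copy."""
    vault_name = VAULT_FOR_REGION[customer_region]
    store = VAULTS[vault_name]
    tokenized = dict(order)
    for field in PII_FIELDS:
        if field in order:
            token = "tok_" + secrets.token_hex(8)
            store[token] = order[field]  # the raw value never leaves the regional vault
            tokenized[field] = token
    # Record which vault holds the originals so an authorized service knows where to look.
    tokenized["vault"] = vault_name
    return tokenized


order = {"order_id": 1001, "email": "anna@example.com", "total": 59.90}
safe_order = tokenize_order(order, customer_region="EU")
print(safe_order)
```

Only safe_order would be sent on to the server, cloud storage, and eventually Snowflake. Non-sensitive fields such as the order total pass through unchanged, so analytics on them work as before.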

By using multiple vaults located in different regions around the world, you can easily manage all of your sensitive data to meet various data residency compliance obligations across each of your global markets.

The data privacy vault architectural pattern vastly simplifies the challenges of data residency and compliance. Additionally, by de-scoping Snowflake from the compliance burden of data residency, global analytics executes as normal — within a single Snowflake instance.

Final thoughts

Data protection regulations and their data residency requirements oblige businesses to uphold stringent standards for data localization, protection, privacy, and security to reduce their risk of breaches, penalties, and reputational damage. However, businesses with customers (and data) located in a variety of global regions face the added challenge of managing multiple regulations across jurisdictions.

Using data privacy vaults lets businesses simplify their global compliance obligations around data residency as they relate to Snowflake and cloud storage.

Companies can isolate and secure all sensitive data in one or more data privacy vaults, removing Snowflake and cloud storage from their compliance footprint. At the same time, by leveraging data privacy vaults in different regions, companies can help ensure that sensitive data is stored and transmitted according to the laws and regulations of each specific region where they operate.

Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.
Sean Falconer

Sean Falconer is Head of Developer Relations at Skyflow, an all-in-one data privacy solution delivered as a simple API. Prior to Skyflow, Sean was Head of Developer Relations for Google’s Business Communications product suite.
