Managing Cloud Data Services with Heroku

Article Tuesday, March 31 2020

Nearly all modern web applications depend on persisted data to function. Since the introduction of JavaScript in the mid-1990s, developers have demanded more functionality from their web-based programs than traditional “static” websites could provide. Today, single-page applications (SPAs) such as Gmail provide a dynamic user experience by interacting with the server to rewrite individual components of a page. Increasing demands for interactivity and a customized UX, along with a broadened horizon of what a website can or should do, have made persistent data storage more and more critical to a web application’s engineering.

Heroku was the first, and remains the most prominent, Layer 2 Cloud Provider. It is a “Platform-as-a-Service” (PaaS) provider that builds upon the infrastructure of Layer 1 Cloud Providers, such as AWS, to create a streamlined, developer-first platform for deploying and managing 12-Factor Web Apps. Heroku is a strong proponent of 12-Factor best practices, and the 12-Factor “manifesto” was written by Heroku engineers.

One critical element of 12-Factor Web Apps is the use of so-called “backing services”: that is, “any service the app consumes over the network as part of its normal operation.” Best practices dictate that these backing services, including databases, should be treated as “attached resources,” and that an app’s code should be agnostic to whether a resource is accessed locally or over a network.
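In practice, treating a database as an attached resource usually means reading its location from configuration rather than hard-coding it. The sketch below assumes the connection string arrives in the DATABASE_URL config var that Heroku Postgres sets; the function name and the local fallback URL are hypothetical and exist only for illustration.

```python
import os
from urllib.parse import urlparse


def database_settings(env=os.environ):
    """Read connection settings for the attached database from the environment."""
    # DATABASE_URL is the config var Heroku Postgres sets for an attached database.
    # The fallback URL is a hypothetical local default, not a Heroku convention.
    url = urlparse(env.get("DATABASE_URL", "postgres://localhost:5432/devdb"))
    return {
        "host": url.hostname,
        "port": url.port or 5432,
        "user": url.username,
        "password": url.password,
        "dbname": url.path.lstrip("/"),
    }
```

Because the application only ever sees a URL, swapping a local database for a managed one becomes a configuration change rather than a code change.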

The principle of “loose coupling” of backing services comes with an implicit contract: the increased flexibility must not come at the cost of availability or durability. This is especially important for services such as attached databases, which hold critical data such as user account information. Users of web applications expect the data they request to render quickly and accurately.

Heroku provides several managed data resources, including PostgreSQL, Redis, and Kafka. In addition, the Heroku Elements Marketplace contains dozens of add-ons available to developers using Heroku’s platform-as-a-service offering.

Heroku’s flagship data management offering is Heroku Postgres. PostgreSQL is an open-source relational database management system (RDBMS) that has been widely adopted since its release in 1996 due to its support for a wide variety of data types, its ACID-compliant transactions, and its use of write-ahead logging to increase fault tolerance. Heroku adopted Postgres in 2007, and it continues to be the most popular data storage offering on the platform. Heroku Postgres allows users to manage schema migrations, database access controls, and scaling from the Heroku platform. A Heroku Postgres database can be shared between several applications with a few commands from the CLI. Heroku Postgres also has a feature called rollback, which acts like a time machine for the database: a developer can “roll back” to a copy of the data as it existed at a previous point in time without affecting the present state of the database.
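To make the “A” in ACID concrete, here is a minimal sketch of an atomic transfer between two rows. It uses Python’s built-in sqlite3 module as a stand-in so that it runs without a server; it is not Heroku Postgres code, and the accounts table and transfer function are invented for illustration. The same commit-or-roll-back behavior is what a Postgres transaction provides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()
```

A transfer that would overdraw the source account leaves both balances untouched, because the first update is rolled back along with the second.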

In addition to Postgres and the add-ons in the Elements Marketplace, Heroku offers official integrations with Redis and Kafka. Redis is a key-value store that supports a wide variety of abstract data types. While Redis traditionally holds all data in memory, Heroku Redis is configured to persist data to disk using an Append-Only File (AOF) and to maintain a high-availability standby for failover. Heroku Redis also provides tools to federate data with Postgres. This ability to manage data from multiple sources in a streamlined fashion is another advantage of a platform-as-a-service offering, which abstracts away the work of creating a common data model across data sources.
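The idea behind an append-only file can be shown with a toy store: every write is appended to a log on disk before it is acknowledged, and the in-memory state is rebuilt by replaying that log on startup. The sketch below is not Redis code and does not use the Redis file format; the AppendOnlyStore class, the JSON log entries, and the file name are all assumptions made for illustration.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy in-memory key-value store that logs each write to an append-only file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            # Rebuild the in-memory state by replaying the log from the start.
            with open(path, encoding="utf-8") as log:
                for line in log:
                    self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "del":
            self.data.pop(entry["key"], None)

    def _append(self, entry):
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to push the write to disk
        self._apply(entry)

    def set(self, key, value):
        self._append({"op": "set", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "del", "key": key})

    def get(self, key, default=None):
        return self.data.get(key, default)


# Write a few entries, then simulate a restart by opening the same log again.
path = os.path.join(tempfile.mkdtemp(), "store.aof")
store = AppendOnlyStore(path)
store.set("session:42", "alice")
store.set("session:43", "bob")
store.delete("session:42")
restored = AppendOnlyStore(path)
```

After the simulated restart, the restored store holds the same keys as the original, which is the durability property the AOF is meant to provide.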

Heroku also offers a managed Kafka service for streaming data. Apache Kafka is a distributed streaming platform with four core APIs (Producers, Consumers, Streams, and Connectors) that allow communication across a distributed system using an abstraction called a “topic.” A topic is a stream of records, written by Producers, to which other members of a distributed application (the Consumers) can subscribe. Kafka builds on the concept of an event-driven architecture (EDA), which uses messages between services as the drivers of application state. Kafka also acts as a transport for large volumes of immutable event streams, making it a key tool for real-time data streaming and parallel processing of Big Data. Kafka acts as a “distributed commit log,” storing key-value records of these messages across several nodes in a cluster. Kafka also works hand-in-hand with ZooKeeper, which helps coordinate nodes across the cluster and handle failover.
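The topic abstraction is easier to see in miniature. The following is a single-process toy model, not the Kafka client API: the Topic and Consumer classes and the sample records are invented for illustration, and real Kafka adds partitions, replication, and network transport on top of the same idea. Producers append key-value records to a log, and each consumer reads from its own offset without removing anything.

```python
class Topic:
    """Toy append-only log of (key, value) records."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def append(self, key, value):
        """Append a record (the producer side) and return its offset."""
        self.records.append((key, value))
        return len(self.records) - 1


class Consumer:
    """Reads a topic from its own offset; reading never removes records."""

    def __init__(self, topic, offset=0):
        self.topic = topic
        self.offset = offset

    def poll(self, max_records=10):
        """Return up to max_records unread records and advance the offset."""
        batch = self.topic.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


# Hypothetical topic with two independent consumers.
signups = Topic("user-signups")
signups.append("u1", "alice")
signups.append("u2", "bob")

emailer = Consumer(signups)
analytics = Consumer(signups)
first_batch = emailer.poll(max_records=1)
signups.append("u3", "carol")
```

Because each consumer tracks its own position, the analytics consumer still sees every record from the beginning even though the emailer has already read some of them.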

As distributed applications scale, their management complexity increases rapidly. Here, as with its other data services, Heroku’s focus is on streamlining the management of data resources. Heroku Kafka allows management of Kafka through the web platform or the CLI, while lower-level configuration tasks are abstracted away. Heroku Kafka allows straightforward monitoring of Kafka clusters, and is built to be secure and compliant with regulations involving the streaming of Personally Identifiable Information (PII).

For developers building 12-Factor Web Apps, or any cloud-based applications that can benefit from a streamlined development-to-production workflow, Heroku’s data management tools offer significant benefits. Less time spent on configuration can mean more time spent coding the application itself. For more information on Heroku Postgres, we did a deep dive on the subject with Jon Daniel, an infrastructure engineer at Heroku. For more information on Heroku, check out their website, or visit our Heroku archives at SoftwareDaily.com.