Emerging Architectures for Real-Time Analytics in Applications
There’s been a proliferation of applications built around using data to derive actionable insights. Companies like Command Alkon digitize construction logistics by providing a platform for suppliers, transportation providers and contractors to monitor and coordinate the shipment of construction materials. Fintech startup Matter uses natural language processing to evaluate the sustainability of investments. In the last five years, the number of these data applications has grown by 73%, and they represent more than 20% of SaaS applications (data from Crunchbase).
This wave of data applications incorporates predictive features to automate decision-making, as well as drill-down features so users can reach their own conclusions. For example, security analytics applications alert users to potential threats, but security teams still want to dig into the data to determine the risk to the organization. In many scenarios, including security analytics, there’s a symbiotic relationship between predictive capabilities and embedded search and analytics.
The challenge in delivering real-time analytics in applications is different from surfacing analytics to a team of analysts. For one, users don’t want to wait seconds for in-app analytics to load. They want the same snappy, responsive experience across the application. User engagement wanes when users have to wait for queries to load. They run fewer queries, log in less frequently and don’t take the desired actions in the product. As a result, slow analytics reduces the value of the application to the organization.
The other big change is that users want to take immediate action on the freshest data. Industries like security, logistics, and advertising need to access the latest data to make operational decisions; data delays can result in security breaches, suboptimal routes and inefficient bidding. These aren’t just periodic, strategic questions that are being asked. Many applications are using analytics for operational decision-making and that requires the latest data.
Getting data applications to meet user expectations requires a new type of data architecture: one designed for sub-second search and analytics across thousands of concurrent users. This article shares two emerging architectures that support this new wave of real-time analytics in applications. The first accelerates data warehouse performance to meet the sub-second query latency requirement of applications. The second indexes streaming data from event streaming platforms or database change data capture to support time-critical use cases.
Accelerate Data Warehouse Performance
Interactive analytics enables users to slice and dice the data to derive strategic insights. One goal of accelerating data warehouse performance is to shorten the query latency, the time between when a request is made and when the result is returned to the application. To keep users engaged, you want that latency to be under a second.
Waiting for analytics in applications to load is a less than desirable user experience.
For example, a music analytics application is used by record labels to uncover trends in the industry and monitor the success of artists and songs across channels. When users hop into the application, they want to dive into the data to quickly uncover insights. Waiting seconds for each query to load adds time to the decision-making process and isn’t a desirable user experience.
Many engineering teams will look to build data applications on data warehouses because they excel at analytics. While data warehouses are excellent solutions for BI and data science, they were built for a world where analytical queries were executed manually by teams of analysts, not for a new world where hundreds to thousands of users execute queries from within an application.
That’s why it’s a best practice to pair a data warehouse with Rockset to accelerate the performance of analytics for applications. Rockset’s indexing approach optimizes for millisecond-latency, high-concurrency queries in a cost-effective way.
In this architecture, Rockset is accelerating the performance of data warehouses for interactive, sub-second analytics.
There are four main components of this data architecture:
Data Warehouse: Data warehouses such as Amazon Redshift, Google BigQuery and Snowflake serve as a central repository for data in the enterprise for business intelligence, reporting and data science. Data warehouses are optimized for processing large-scale data but have not been designed to serve applications.
dbt: dbt allows teams to transform data in their warehouse using SQL. In dbt, SQL queries are models that can be represented as materialized views. These views can be ingested into Rockset and served to applications.
Rockset: Rockset is a real-time indexing layer on top of a data warehouse such as Snowflake that accelerates query response times. Rockset uses a Converged Index™ to index any data (structured, semi-structured, geo or time series) in a columnar store, search index and row store for fast search and analytics. Rockset is designed for sub-second, high-concurrency analytics.
API: Creating a data API with Rockset is as simple as saving your SQL query as a REST endpoint and triggering it from your application.
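To make the indexing idea in the Rockset component more concrete, here is a minimal toy sketch in Python of keeping the same documents in three forms: a row store, a columnar store and a search (inverted) index. The class name ToyConvergedIndex, its methods and the sample music records are hypothetical and purely illustrative. This is not Rockset's implementation, which the article does not describe in detail.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


class ToyConvergedIndex:
    """Illustrative only: three in-memory views of the same documents."""

    def __init__(self) -> None:
        # Row store: document id -> full document.
        self.rows: dict[int, dict[str, Any]] = {}
        # Columnar store: field name -> {document id: value}.
        self.columns: dict[str, dict[int, Any]] = defaultdict(dict)
        # Search index: (field name, value) -> set of document ids.
        self.inverted: dict[tuple[str, Any], set[int]] = defaultdict(set)

    def add(self, doc_id: int, doc: dict[str, Any]) -> None:
        """Write the document into all three views."""
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def search(self, field: str, value: Any) -> list[dict[str, Any]]:
        """Selective lookup: find ids in the search index, then fetch rows."""
        ids = self.inverted.get((field, value), set())
        return [self.rows[i] for i in sorted(ids)]

    def total(self, field: str) -> float:
        """Aggregation: read one column without touching other fields."""
        return sum(self.columns.get(field, {}).values())


index = ToyConvergedIndex()
index.add(1, {"artist": "A", "channel": "radio", "plays": 120})
index.add(2, {"artist": "B", "channel": "streaming", "plays": 300})
index.add(3, {"artist": "A", "channel": "streaming", "plays": 80})

streaming_rows = index.search("channel", "streaming")  # served by the search index
total_plays = index.total("plays")                     # served by the columnar store
```

The design point is that a selective filter and a broad aggregation read different structures, so neither has to scan every full document.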
Real-Time Analytics for Data Streams and Operational Databases
In time-critical scenarios, real-time analytics uses newly generated data to make predictions, ask questions, and automate decision-making in the application. Real-time analytics optimizes for both low query latency and low data latency, or the time from when data is generated to when it is queryable.
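The two latencies defined above can be expressed as simple differences between timestamps. The sketch below is a minimal illustration in Python. The function names and the timestamps are made up for the example and are not measurements from any system.

```python
from datetime import datetime, timedelta


def data_latency(generated_at: datetime, queryable_at: datetime) -> timedelta:
    """Time from when data is generated to when it can be queried."""
    return queryable_at - generated_at


def query_latency(requested_at: datetime, returned_at: datetime) -> timedelta:
    """Time from when a query is requested to when its result is returned."""
    return returned_at - requested_at


# Illustrative timestamps only.
generated = datetime(2024, 1, 1, 12, 0, 0)
queryable = generated + timedelta(seconds=2)          # event indexed 2 s later
requested = datetime(2024, 1, 1, 12, 0, 5)
returned = requested + timedelta(milliseconds=40)     # result back in 40 ms

ingest_delay = data_latency(generated, queryable)
response_time = query_latency(requested, returned)

# A sub-second budget for in-app queries, as the article recommends.
meets_budget = response_time < timedelta(seconds=1)
```

Tracking the two numbers separately matters because a system can answer queries quickly while still serving stale data, or the reverse.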
For example, Rumble is an application that enables users to earn money and redeem rewards by adopting and maintaining healthy habits. To nudge individuals towards desired behavior, such as exercising on a regular basis, Rumble counts steps and creates leaderboards so users can track activity relative to their peers. Users of the application want to see in real time how their steps help move them up or down the leaderboard. In this scenario, data freshness is key.
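As a rough illustration of why freshness matters here, the following Python sketch sums step events per user and ranks the totals. The event shape, user names and the leaderboard function are hypothetical, and this is not Rumble's actual implementation. In practice the aggregation would more likely run as a query in the analytics serving layer.

```python
from collections import Counter


def leaderboard(events, top_n=3):
    """Rank users by total steps; ties are broken by user name."""
    totals = Counter()
    for user, steps in events:
        totals[user] += steps
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical step events as (user, steps) pairs.
events = [("ana", 4000), ("ben", 5500), ("ana", 2500), ("cy", 3000)]
before = leaderboard(events)

# A new event arrives; the ranking changes only once the event is visible.
events.append(("cy", 4000))
after = leaderboard(events)
```

Until the last event is ingested and queryable, the user "cy" still sees the old ranking, which is the delay that low data latency is meant to remove.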
Real-time analytics relies on real-time data. Real-time data can either be streamed from a transactional database using change data capture or from applications or devices using an event streaming platform. Rockset ingests and indexes the data to serve real-time analytics in applications.
In this architecture, Rockset is indexing real-time data from data streams and operational databases to serve real-time analytics.
There are four main components of this data architecture:
OLTP Databases: Operational databases such as MongoDB, Amazon DynamoDB, PostgreSQL and MySQL are excellent at processing transactions but have not been designed for complex analytical workloads at scale. Engineering teams will often choose to move analytics to a separate serving layer for complex search, aggregations and joins at scale.
Data Streams: Application and device data is streamed through event streaming platforms such as Amazon Kinesis and Kafka to data sinks, like Rockset, to power downstream applications.
Rockset: Rockset continuously syncs data from OLTP databases and streams so new data is queryable within seconds of being generated. Rockset indexes the data in real time to support millisecond-latency SQL queries.
API: Create an API and query Rockset directly from your application using Query Lambdas, which are named parameterized SQL queries stored in Rockset and executed from a REST endpoint.
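Calling a named, parameterized query over REST from application code might look like the Python sketch below. It only builds the HTTP request and does not send it. The base URL, path layout, authorization header format and payload shape are assumptions for illustration and should be checked against the Rockset API reference. The workspace, query name, tag and parameter are hypothetical.

```python
import json
import urllib.request


def build_query_lambda_request(base_url, workspace, name, tag, api_key, parameters):
    """Build (but do not send) a POST request that executes a stored query.

    The URL layout and payload shape here are assumed, not confirmed.
    """
    url = f"{base_url}/v1/orgs/self/ws/{workspace}/lambdas/{name}/tags/{tag}"
    payload = {
        "parameters": [
            {"name": key, "type": "string", "value": str(value)}
            for key, value in parameters.items()
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
    )


request = build_query_lambda_request(
    base_url="https://api.example.com",  # placeholder host
    workspace="commons",
    name="leaderboard",
    tag="latest",
    api_key="YOUR_API_KEY",
    parameters={"user_id": "ana"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Keeping the SQL on the server side and passing only parameters means the application never assembles query strings itself.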
Integrating Real-Time Analytics into Applications
Many SaaS applications look at active users as one of the golden measures of product health. If a user goes into an application and takes an action on a recurrent basis, the product is sticky. More and more, data is the key ingredient helping users take desired actions in products.
Keeping users engaged in products also requires a seamless, snappy experience. That means analytics load incredibly fast, recommendations are made on the latest data and the application experience does not change as usage grows.
The two emerging architectures for real-time analytics shared in this article have been designed to achieve the low query and data latency demanded of applications in a scalable way.