Rockset, Tecton and the rise of real-time data

The term “real-time” has been ubiquitous in technology, from real-time stock picking to real-time pizza tracking. But there is near real time, and then there is real time.

As everyday businesses begin to adopt the data tools and tactics used inside larger tech companies, an industry of data service providers has emerged to help them take advantage of the analytics, data-driven approaches and real-time machine learning that only giant companies with much larger database teams and resources could have afforded in the past. Companies like Hazelcast, Rockset, Tecton and others enable split-second analytics and machine learning for things like financial fraud prevention, dynamic pricing, or product recommendations that respond to what you just clicked.

These companies promise to help businesses ditch the laborious batch data processing behind old-fashioned business intelligence analysis. But it remains to be seen whether every company needs, wants or is ready to operate at the speed of a Citibank, Uber or Amazon.

Updating data every few days, every night or even every hour or so for business analysis using a typical batch approach "is like playing Monday-morning quarterback," said Venkat Venkataramani, CEO and co-founder of Rockset, a company that provides a database for building applications on real-time data, analytics and queries. "That will no longer be enough. 'I'm six points behind, the game isn't over yet; what should I do differently to change the outcome?'" he said, extending a football metaphor that he predicts will describe more and more business scenarios involving fresh data over the next two to three years.

These startups believe that the growing influx of real-time data into data lakes and lakehouses — from clicks on e-commerce sites to pings from IoT sensors — will push businesses to use this information immediately, as it arrives.

Some of Hazelcast’s customers process the data they consume in real time for predictive analytics, based on machine learning models used to service oil rig and wind turbine equipment, said CEO Kelly Herrel. Tecton, which helps companies manage real-time data pipelines to power ML models, has seen insurers use its services to power discount programs based on driver behavior, according to company founder Mike Del Balso.

But these are rare use cases. These startups say most of their inbound interest comes from banking and e-commerce customers who want to prevent fraud while someone waits for a banking transaction to occur at an ATM, or to immediately detect when a payment stops working in a particular country. “Wherever there’s real money and risk involved if you don’t run something in real time,” that’s where companies use real-time data services, Venkataramani said.

“The primary use case for us is recommendations,” said Del Balso, who said customers “want to make a recommendation based on what the user just did.” Anyone who’s browsed through products on an e-commerce site or scrolled through movie options on a content delivery platform knows the micro-frustrations that result from systems that don’t recognize their most recent moves.

Machine learning helps drive engagement in real time

Across the business spectrum, there are a few key factors driving the rise of real-time data processing and analytics, and the rise of AI and machine learning is significant. Businesses want to use machine learning systems that improve as they are exposed to new information in hopes of making smarter decisions and optimizing existing efforts within milliseconds.

While traditional business intelligence analytics efforts don’t need real-time data or processing, “real-time and machine learning really go hand in hand,” said Gaetan Castelein, vice president of marketing at Tecton, who explained that the real-time data and machine learning trends converge and feed off each other.

Consider a bank processing millions of transactions every hour, for example. Whether or not the model decides to approve a transaction may depend on as many as 2,000 individual pieces of information: some relatively static, such as a postal code, and some brand new, such as the amount of a cash transfer. Data systems like Tecton’s can optimize how this process is managed by separating the data elements that are fresh from those that stay the same.

“Because you have to respond very quickly to the transaction, but you don’t want to calculate that stuff in real time,” Del Balso said. “It’s okay if some of the signals are a bit delayed,” he said, adding, “It becomes a performance trade-off.”
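The trade-off Del Balso describes can be sketched in a few lines of Python. This is an illustration of the general pattern, not Tecton's actual API; all the names, feature values and the toy scoring rule below are hypothetical.

```python
# Hypothetical sketch: score a transaction by combining slowly refreshed
# batch features with a few features computed at request time.

# Features refreshed on a slow schedule (nightly or hourly batch jobs):
# cheap to look up, and acceptable if slightly stale.
BATCH_FEATURE_STORE = {
    "user_123": {"zip_code": "94107", "avg_txn_30d": 52.40, "txn_count_30d": 41},
}

def fresh_features(txn):
    # Computed on the fly from the event itself: these must be current.
    avg = BATCH_FEATURE_STORE[txn["user_id"]]["avg_txn_30d"]
    return {
        "amount": txn["amount"],
        "is_large": txn["amount"] > 10 * avg,  # far above this user's norm?
    }

def score_transaction(txn):
    # Merge stale-but-cheap signals with fresh-but-costly ones.
    features = {**BATCH_FEATURE_STORE[txn["user_id"]], **fresh_features(txn)}
    # Stand-in for a real fraud model: flag large, out-of-pattern amounts.
    risk = 0.9 if features["is_large"] else 0.1
    return features, risk

features, risk = score_transaction({"user_id": "user_123", "amount": 980.0})
print(risk)  # 0.9: well above 10x this user's 30-day average
```

Only the last step runs inside the latency budget of the transaction; everything expensive was precomputed, which is the performance trade-off Del Balso refers to.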

Optimizing a recommendation engine to respond in real time to something a user just did half a second ago “is the difference between Netflix and TikTok,” said Manish Devgan, chief product officer at Hazelcast. “As you browse, it’s actually updating a machine learning model,” he said of TikTok’s content recommendation system.
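The "model updates as you browse" idea can be shown with a deliberately tiny sketch: an exponentially decayed preference profile that is re-ranked after every interaction. This is a generic illustration of online updating, not TikTok's or Hazelcast's actual system, and the catalog and learning rate are made up.

```python
# Toy online recommender: each interaction immediately shifts the user's
# topic profile, so the very next recommendation reflects it.

CATALOG = {
    "v1": {"cooking": 1.0, "sports": 0.0},
    "v2": {"cooking": 0.0, "sports": 1.0},
    "v3": {"cooking": 0.6, "sports": 0.4},
}

def update_profile(profile, item, lr=0.5):
    # Blend the watched item's topic vector into the profile right away,
    # instead of waiting for a nightly retraining job.
    for topic, weight in CATALOG[item].items():
        profile[topic] = (1 - lr) * profile.get(topic, 0.0) + lr * weight
    return profile

def recommend(profile):
    # Rank the catalog by dot product with the current profile.
    def score(item):
        return sum(profile.get(t, 0.0) * w for t, w in CATALOG[item].items())
    return max(CATALOG, key=score)

profile = {}
update_profile(profile, "v1")  # the user just watched a cooking video...
print(recommend(profile))      # ...so cooking content now ranks first
```

A batch system would apply the same update hours later; the whole difference is when the update happens, not the math.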

Along with the machine learning boom, innovations in database architecture have also helped drive interest in working with real-time data. The availability and ease of use of real-time data streaming and analytics technologies such as Apache Kafka and Confluent help companies run real-time data initiatives with smaller engineering teams.

More generally, the explosion of data flowing from online and IoT systems, coupled with cloud adoption and its cost-effectiveness, is also driving interest in using real-time data.

“As more businesses continue to migrate to the cloud and invest in digital transformations, the volume, variety and velocity of machine-generated data — clickstreams, logs, metrics, IoT — will proliferate exponentially,” said Derek Zanutto, general partner at CapitalG, who added that Google’s investment arm currently has no portfolio companies in the real-time data space.

“As the volume of machine-generated data continues to proliferate, forward-looking, data-driven organizations will increasingly seek opportunities to leverage this data for real-time operational analytics use cases that will help them maintain or improve their market leadership,” Zanutto said.

When near real time is enough

There’s a difference between what’s real time and what’s merely approximate, according to database experts. “People use the word ‘real-time’ very loosely,” said Ravi Mayuram, chief technology officer at Couchbase, a database company that enables real-time data processing.

He and others say that if data analysis is done in minutes rather than seconds or fractions of seconds, it is not real time; it’s just near real time. Venkataramani said he defines real-time data processing as something that takes less than two seconds.

“It’s a bit complicated and nuanced, so it’s easy to mix up real-time and near-real-time,” Del Balso said, adding that for most businesses and use cases, near-real-time is sufficient.

Indeed, some experts say processing data every few minutes should be enough for many businesses.

“There are extreme ends where you really, really need [real time],” said Ryan Blue, co-founder and CEO of data platform startup Tabular and a former Netflix database engineer who helped build Iceberg, a core data architecture used for lakehouse-style analysis. “The question is, when is a five-minute batch enough?” Blue said.

Some Rockset customers don’t even use the company’s most extreme real-time data capabilities. Seesaw, an e-learning platform, uses Rockset to enable analysis, data visualization, and data queries. But for now, said Seesaw product manager Emily Voigtlander, batching every night is fine. While she didn’t rule out future needs for Rockset’s real-time data services, Voigtlander said, “That’s not really what’s most critical to our business right now.”

But wait, some say. Companies that are still mastering batch processing today might decide to leapfrog the competition, said Preeti Rathi, general partner at venture capital firm Icon Ventures. These types of businesses might ask, “If we can jump right in here, why not?” she said.

The growing interest in real-time analytics and data processing represents what Gerrit Kazmaier, Google Cloud VP and GM for Database, Data Analytics, and Looker, has called a “paradigm shift” from traditional data stacks to systems that “connect intelligence systems” to applications, enabling businesses to influence customer behavior or take action using machine learning and real-time analytics.

“So now you’re at a tipping point, where suddenly the strategic business platform is no longer the functional system, it’s the data system,” he said.
