UPDATED 08:00 EDT / APRIL 10 2024

BIG DATA

Starburst to release its Iceberg-based lakehouse as a managed service

Starburst Data Inc., which sells a commercial version of the open-source Trino distributed query engine, today announced a fully managed Icehouse data lake on its Galaxy cloud.

Icehouse is an open-source lakehouse that combines Trino and Apache Iceberg storage. A lakehouse is a new type of data architecture that combines the flexibility of data lakes with the performance of data warehouses. Iceberg is an open-source data table format designed for data warehouses and data lakes that supports SQL tables on large amounts of storage.

Iceberg’s schema evolution feature allows the data structure of tables to change over time. The format keeps track of all data versions within a table, including additions, updates and deletes. It also integrates easily with popular big data processing frameworks such as Apache Spark, Apache Flink and Apache Hive.
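To make the idea of schema evolution and version tracking concrete, here is a minimal toy model in plain Python. It is an illustration only and does not use the Iceberg API: the `ToyTable` class, its method names and the sample columns are all hypothetical, and real Iceberg tracks versions through metadata and manifest files rather than in-memory copies.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy stand-in for a versioned table. Hypothetical; NOT the Iceberg API."""
    columns: list
    rows: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)  # one (columns, rows) pair per commit

    def _commit(self):
        # Record an immutable copy of the current schema and rows as a new snapshot.
        self.snapshots.append(
            (tuple(self.columns), tuple(dict(r) for r in self.rows))
        )

    def append(self, row):
        self.rows.append(dict(row))
        self._commit()

    def delete_where(self, predicate):
        self.rows = [r for r in self.rows if not predicate(r)]
        self._commit()

    def add_column(self, name):
        # Schema evolution: existing rows are not rewritten; the new column reads as None.
        self.columns.append(name)
        self._commit()

    def read(self, snapshot_id=None):
        # Read the current state, or an earlier version by snapshot index.
        if snapshot_id is None:
            columns, rows = tuple(self.columns), self.rows
        else:
            columns, rows = self.snapshots[snapshot_id]
        return [{c: r.get(c) for c in columns} for r in rows]


table = ToyTable(columns=["id", "amount"])
table.append({"id": 1, "amount": 10})                     # snapshot 0
table.add_column("currency")                              # snapshot 1
table.append({"id": 2, "amount": 5, "currency": "USD"})   # snapshot 2
table.delete_where(lambda r: r["id"] == 1)                # snapshot 3

print(table.read())    # current state: only the row with id 2
print(table.read(0))   # the table as it looked after the first append
```

Each change produces a new snapshot, so an earlier version of the table, including its earlier schema, stays readable after later additions and deletes.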

A push for openness

The announcement furthers Starburst’s efforts to promote an open data lake architecture. “We’re seeing thousands of companies combining Trino with Iceberg for a data warehouse that allows them to own the data and isn’t locked into a proprietary architecture like Snowflake,” said Chief Executive Justin Borgman, referring to Snowflake Inc., a data warehousing rival.

Although statistics on Iceberg adoption are scarce, there is a growing consensus that it’s becoming the preferred option for data lake storage. “Iceberg turns the model upside down,” Borgman said. “No one owns your Iceberg tables except you. It can turn a data lake into a fully featured data warehouse and the query engine and formats are sophisticated enough that you can combine them into an open warehouse.” Starburst will continue to support Databricks Inc.’s Delta Lake format.

The managed Icehouse adds support for near-real-time data ingestion into a managed Iceberg table at petabyte scale. Data and development teams can use SQL to prepare and optimize their data and make it available for production.

“Many customers are using Kafka,” said Starburst co-founder Matthew Fuller, referring to the Apache Kafka event streaming platform. “We connect to a Kafka topic and can read from that topic to land on the lake. Ultimately, we’re limited only by the speed of writing to [Amazon Web Services Inc.’s] S3, which is pretty good. From the time data shows up in Kafka to your being able to query it is one minute or less.”
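Starburst did not describe how its ingestion pipeline works internally. One common way for a stream-to-lake pipeline to bound the delay between arrival and queryability is micro-batching: buffer incoming records and commit them once the buffer is large enough or its oldest record reaches an age limit. The sketch below illustrates that general technique only. The `MicroBatcher` class, its parameters and the 60-second limit are assumptions chosen to mirror the "one minute or less" figure, and no Kafka or S3 client is involved.

```python
class MicroBatcher:
    """Hypothetical micro-batching buffer; not Starburst's implementation."""

    def __init__(self, max_age_seconds=60.0, max_records=1000):
        self.max_age_seconds = max_age_seconds
        self.max_records = max_records
        self.buffer = []       # records waiting to be committed
        self.oldest_ts = None  # arrival time of the oldest buffered record
        self.committed = []    # stand-in for batches written to the lake

    def offer(self, record, now):
        # Accept one record that arrived at time `now` (in seconds).
        if not self.buffer:
            self.oldest_ts = now
        self.buffer.append(record)
        self.flush_if_due(now)

    def flush_if_due(self, now):
        # Commit the buffer if it is full or its oldest record hit the age limit.
        if not self.buffer:
            return False
        too_old = now - self.oldest_ts >= self.max_age_seconds
        too_big = len(self.buffer) >= self.max_records
        if too_old or too_big:
            self.committed.append(self.buffer)
            self.buffer = []
            self.oldest_ts = None
            return True
        return False


batcher = MicroBatcher(max_age_seconds=60, max_records=3)
batcher.offer("a", now=0)
batcher.offer("b", now=10)
early = batcher.flush_if_due(now=59)    # oldest record is 59 s old: not due yet
on_time = batcher.flush_if_due(now=60)  # age limit reached: batch is committed
for i, rec in enumerate(["c", "d", "e"]):
    batcher.offer(rec, now=61 + i)      # third record fills the batch
print(batcher.committed)
```

With these settings, a timer that calls `flush_if_due` regularly commits every record no later than 60 seconds after the oldest record in its batch arrived, which is the kind of bound the quoted latency describes.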

Developed in tandem

Iceberg isn’t necessarily the fastest storage engine in all cases, but Fuller said, “Trino and Iceberg were built together, so there’s some symbiosis for performance.”

Borgman said with the new managed service, “the time to productivity is measured in minutes. The cluster can be up almost instantaneously, and customers can connect to Kafka and existing data lakes. You can also query data from multiple clouds and multiple data sources.”

Starburst said the list of companies adopting the Icehouse architecture includes Netflix Inc., Apple Inc., Shopify Inc. and Stripe Inc.

Borgman didn’t specify pricing but said it would be based on consumption. “In our own benchmarks, we’re significantly less expensive than traditional cloud data warehouses,” he said. “We’re consistently one-half to one-tenth the cost of Snowflake.”

The service isn’t generally available yet. Starburst said it has started reviews with a few customers to get feedback. Starting today, customers can sign up for early access to Galaxy Icehouse.

Photo: Janvanbizar/Pixabay
