Short Recap of the Data + AI Summit
This year, several members of our data and cloud engineering team attended the Data + AI Summit in San Francisco from June 26 to 29. From the first morning, it was clear that the success of ChatGPT and other large language models (LLMs) had sent the conference's popularity soaring compared with previous years. Attendance was massive: every breakout session was overbooked, and the conference halls overflowed with data engineers and data scientists leaning forward in their seats to hear what companies are doing to capitalize on AI.
Most of the talks centered on the platform built by Databricks, the primary sponsor of the event, which served as the data lakehouse behind many of the speakers' tech stacks and AI models.
Takeaway #1: The Big Trends in Data Lakehouses Are All about AI
Unified Governance Solution for Data and AI on the Lakehouse
Most sessions touched on Unity Catalog in some way. The tool's features include account management and permission handling across warehouse platforms, ML monitoring, and collaboration, along with several new additions such as Governance for AI and Lakehouse Federation.
Knowledge Engine for a Proprietary LLM
Another big topic covered in the keynote was LakehouseIQ. Built on the infrastructure of Unity Catalog, LakehouseIQ is a proprietary AI tool that lets a business put its data to work through its own AI assistant, one trained on that business's data, to help engineers and even non-technical users work more efficiently and effectively.
A demo of LakehouseIQ showcased its ability to understand a business's acronyms and KPI definitions in order to answer a prompted question that ChatGPT could not answer without specific information about that business.
Data Preparation, Simplified
On the data preparation side of the house, the Databricks team announced Delta Lake 3.0, with several key features including Universal Format, aka UniForm. With UniForm, data stored in Delta Lake can also be read as if it were in either of the other two popular table formats, Apache Iceberg and Apache Hudi. Based on the shouts and cheers the announcement received, this update solved a seemingly widespread issue for platform users.
Another improvement on display was the new liquid clustering feature, which is designed to keep queries fast as data grows while making clustering more cost-efficient.
Takeaway #2: “The Hottest New Programming Language is English”
The above quote came from Andrej Karpathy, former Director of AI at Tesla, and it received a good amount of attention at the summit. It refers to the growing ability of AI and ML tools to understand plain English and assist with programming.
This is perhaps most evident in the way Databricks is positioning itself within the industry: as a technology pioneer that will popularize AI and ML technologies for non-technical users, data engineers, and data scientists alike.
Databricks recently acquired MosaicML, as Satya Nadella, Chairman and CEO at Microsoft, announced in a guest Skype appearance at the summit. MosaicML is a smaller company that works on optimized machine learning models to save companies massive LLM training costs.
Takeaway #3: The Data + AI Summit Evolved our Lakehouse Outlook
Data Democratization Ahead
Businesses are looking to avoid vendor lock-in, enable AI, and improve developer efficiency with faster development at reduced cost. Databricks shows its leadership in value creation with a single unified data platform that is cloud-agnostic and simpler to govern. The conference demonstrated several upcoming AI-enabled platform features intended to reduce cost and improve efficiency for everyone.
AI and ML Demonstrate Incremental Value
Our team was excited and inspired by the many breakout sessions that demonstrated how these tools can multiply companies' effectiveness in various areas. Collins Aerospace described how it uses ML models to monitor commercial airplane components, predict when parts will fail, and proactively order replacements. Chase described how it uses ML models to predict cases of fraud and save significant money for its customers (me included). Both were perfect examples of success stories that showed real value in putting effort toward AI and ML.
There’s Still an Opportunity to Clarify the Value Proposition
On the flip side of these inspiring talks, some of the talks covered concepts that spanned multiple professions, and, by the third session in a row, my brain had melted trying to keep up with the amount of information coming at me.
Sessions would cover engineering topics, then dive deep into MLOps or data science algorithms. Several times during live demos, presenters opened their terminal to run cURL commands or do some other scripting that I wasn't familiar with, which left me lost and had groups of people standing up and heading for the door. I suspect that some of my trouble understanding these concepts was due to my own inexperience, but if I was struggling to keep up with the topics, then I pray for the C-level executive who has to approve the budget for one of these lakehouse platforms.
Build Solid Foundations to Weather More Changes
As we all left the conference, I reflected on how Further should advise our many clients based on what we learned, and on what they should do to keep up with everything we saw during our time there.
The bottom line is that some companies have been working in lakehouses for years, and they are the ones that have seen the most value out of platforms like Databricks. However, it seems that not all companies are able to resolve data silo problems and reap the benefits of the ongoing AI revolution.
A trusted partner like Further is the key to helping you assess, implement, migrate to, and maintain a data lakehouse that fits your business needs and capitalizes on (and creates!) your unique opportunities. When you work with our data engineering experts, we deliver more value than other partners because of our experience and deep expertise in analytics and data science. Contact us below to get case studies of how we’ve helped teams begin and optimize their data lakehouse success.
Databricks lets businesses move their data once, efficiently, and then build several capabilities on top of it, but doing so requires significant uplift and knowledge of best practices. Just like the shift from on-premises servers to the cloud, the shift to the lakehouse and AI + ML is happening, and the companies that learn how to get there will reap the rewards.