The lakehouse concept spent about three years being debated in blog posts and conference talks. Data warehouse vendors argued that transactional tables on object storage could never match the query performance of a purpose-built warehouse. Data lake vendors argued that strict schemas were unnecessary overhead. The debate was interesting, but it obscured a more practical question: what are enterprises actually deploying, and is it working?
The evidence is settling the question. Across our portfolio and our broader deal flow, the lakehouse architecture has moved from experimental workloads to production analytics at scale. That shift is not universal, and it has not come without operational complexity, but the pattern is clear enough that we have been actively increasing our data infrastructure exposure over the past 18 months.
What Changed
Two things happened around the same time that unlocked enterprise adoption:
Open table formats matured enough to be production-ready. The formats that allow ACID transactions on object storage (Delta Lake being the most widely deployed in our portfolio companies' customer base) became stable and well supported enough that an enterprise data engineering team could actually trust them in production. Two years ago, there were too many edge cases. Today the failure modes are well understood, and the operational tooling to manage them exists.
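For readers who want a feel for how ACID commits can work on storage that only offers "create this object if it does not already exist", here is a minimal sketch. It is a toy model and not Delta Lake's implementation: the `ToyTableLog` class, the `_log` directory name, and the file names are all hypothetical, and a local temporary directory stands in for a bucket. The idea it illustrates is optimistic concurrency: each writer tries to claim the next numbered log entry, and exactly one writer can succeed.

```python
import json
import os
import tempfile


class CommitConflict(Exception):
    """Raised when another writer already claimed the target version."""


class ToyTableLog:
    """Hypothetical, simplified commit log: one JSON file per table version."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version):
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        # -1 means the table has no commits yet.
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)]
        return max(versions, default=-1)

    def commit(self, read_version, added_files):
        """Try to publish version read_version + 1; fail if it already exists."""
        target = read_version + 1
        flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY  # create only if absent
        try:
            fd = os.open(self._path(target), flags)
        except FileExistsError:
            raise CommitConflict(f"version {target} already committed")
        # Simplification: a real system also needs the write itself to be
        # atomic, so readers never observe a half-written log entry.
        with os.fdopen(fd, "w") as handle:
            json.dump({"add": added_files}, handle)
        return target

    def snapshot(self):
        """Replay the log in order to list the data files visible to readers."""
        files = []
        for version in range(self.latest_version() + 1):
            with open(self._path(version)) as handle:
                files.extend(json.load(handle)["add"])
        return files


root = tempfile.mkdtemp()                 # stands in for a bucket prefix
log = ToyTableLog(root)
seen = log.latest_version()               # both writers read the same state
log.commit(seen, ["part-0.parquet"])      # writer A wins version 0
try:
    log.commit(seen, ["part-1.parquet"])  # writer B loses the race
except CommitConflict:
    log.commit(log.latest_version(), ["part-1.parquet"])  # re-read and retry
print(log.snapshot())
```

The losing writer never corrupts the table; it simply re-reads the latest version and retries. The edge cases that used to worry teams tend to live around this loop: retries under heavy contention, storage backends without a true create-if-absent primitive, and cleanup of data files that no commit references.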
Query engines got genuinely fast. Sub-second query latency on petabyte-scale datasets is now achievable with well-tuned configurations on modern engines, at least for selective queries that touch a small fraction of the data. That matters because it removes the performance argument for maintaining a separate, expensive data warehouse alongside the lake. The moment you can run your BI queries directly against the lake with comparable latency, the cost case for decommissioning the warehouse becomes compelling. We have seen customers of two portfolio companies run that calculation and eliminate six-figure annual data warehouse spend within 12 months of deploying a lakehouse.
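The decommissioning calculation mentioned above is simple arithmetic, and a rough sketch may help make it concrete. The function name, its parameters, and every dollar figure below are hypothetical placeholders chosen for illustration; they are not numbers from our portfolio.

```python
def payback_months(migration_cost, annual_warehouse_cost, annual_lakehouse_cost):
    """Months needed for warehouse savings to repay a one-time migration cost.

    Returns None when the lakehouse is not cheaper to run, because the
    migration then never pays for itself on cost alone.
    """
    annual_savings = annual_warehouse_cost - annual_lakehouse_cost
    if annual_savings <= 0:
        return None
    return migration_cost / (annual_savings / 12)


# Hypothetical inputs: a 300k/year warehouse contract, 120k/year to run the
# same workloads on the lakehouse, and a 90k one-time migration effort.
months = payback_months(
    migration_cost=90_000,
    annual_warehouse_cost=300_000,
    annual_lakehouse_cost=120_000,
)
print(months)  # 6.0
```

A fuller model would also account for the period when both systems run in parallel and for the engineering time spent tuning queries, both of which lengthen the payback period.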
The Implementation Reality
What the enthusiast press gets wrong is the smoothness of the transition. The architecture is sound. The execution is hard.
The biggest friction points we hear consistently from data engineering teams:
"Governance is not solved. You can ingest everything into the lake, but knowing what you have, who can access it, and what the lineage is — that is still largely a manual or best-effort process at most organizations."
This is the most common complaint. Lakehouses inherit the data catalog problems that plagued earlier lake architectures. The table format solves the transaction and performance problems. It does not solve the organizational discipline problems that cause a lake to become a data swamp.
The second friction point is streaming ingestion. Batch ingestion into a lakehouse is largely a solved problem. Real-time streaming with sub-minute latency while maintaining ACID guarantees requires careful architecture choices that most teams are still working through. We see this as an ongoing investment category: the companies that make streaming into lakehouses as operationally simple as batch ingestion will have a significant market to sell into.
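One of those architecture choices is how often to commit. A common approach is micro-batching: buffer incoming events and publish them as a single atomic commit once the buffer is large enough or old enough. The sketch below is illustrative only. `MicroBatcher` and its parameters are hypothetical names, the `commit` callable stands in for whatever atomic table write a team actually uses, and the clock is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MicroBatcher:
    commit: Callable[[List[dict]], None]  # hypothetical atomic table commit
    max_rows: int = 1000                  # flush when the buffer is this large
    max_wait_s: float = 30.0              # or when its oldest event is this old
    _buffer: List[dict] = field(default_factory=list)
    _opened_at: float = 0.0

    def add(self, event: dict, now: float) -> bool:
        """Buffer one event; return True if this call triggered a commit."""
        if not self._buffer:
            self._opened_at = now  # the batch window starts at its first event
        self._buffer.append(event)
        too_big = len(self._buffer) >= self.max_rows
        too_old = now - self._opened_at >= self.max_wait_s
        if too_big or too_old:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Publish the buffered events as one commit, then clear the buffer."""
        if self._buffer:
            # Commit first: if the commit raises, the events stay buffered.
            self.commit(list(self._buffer))
            self._buffer.clear()


commits: List[List[dict]] = []
batcher = MicroBatcher(commit=commits.append, max_rows=3, max_wait_s=10.0)
for i, ts in enumerate([0.0, 1.0, 2.0, 5.0, 15.0]):
    batcher.add({"id": i}, now=ts)
print(len(commits))  # 2: one size-triggered commit, one age-triggered commit
```

The two thresholds capture the tension teams describe. Small, frequent commits lower latency but produce many small files and more commit contention, while larger batches are cheaper to query but staler. This sketch only checks the age threshold when a new event arrives; a production system would also need a timer, plus compaction to merge the small files that frequent commits leave behind.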
Where Enterprise Adoption Is Concentrating
Not every enterprise use case is moving to the lakehouse architecture at the same pace:
| Use Case | Adoption Speed | Primary Driver or Constraint |
|---|---|---|
| Historical analytics and reporting | Fast | Cost reduction vs. data warehouse |
| Machine learning feature stores | Fast | Unified access to structured and unstructured data |
| Real-time operational analytics | Moderate | Streaming architecture complexity |
| Transactional workloads | Slow | OLTP systems remain separate for now |
| Regulatory reporting | Moderate | Audit trail requirements still being solved |
The fastest-moving use cases share three traits: they are read-heavy, the data volumes are large, and the primary performance requirement is throughput rather than latency. Those workloads are natural fits for the architecture. The slower ones involve either strict transactional guarantees or real-time requirements that the current tooling ecosystem handles less gracefully.
Our Portfolio Perspective
Crestdata, one of our Series B portfolio companies, is sitting in an interesting position in this market. They built specifically for the sub-second analytics query requirement — the performance gap that still exists between theoretical lakehouse capability and what a business analyst can run interactively against a petabyte-scale dataset. Their early customers are analytics teams that outgrew their data warehouse on cost but could not fully migrate because of query performance requirements.
What we are watching in their customer base gives us a reasonable proxy for where enterprise adoption is heading more broadly. The pattern we see: initial deployment for a specific high-volume analytics workload, gradual migration of adjacent workloads as confidence builds, and then a strategic decision about the data warehouse contract at next renewal. That renewal decision (keep the warehouse, reduce it, or eliminate it) is the clearest indicator of where an enterprise's lakehouse conviction has landed.
The architecture debate is over. The execution work is where the value is being created now, and that is where we are focused.