From Research To Production: Kiran Gadhave On Druid, Kubernetes, And Data Provenance

In April, Apache Druid 33.0.0 was released, marking one of the project’s most substantial updates yet. The new version focuses on making real-time analytics smoother to build and easier to maintain, from faster data loading to more flexible control over storage and monitoring. For engineers, it means less manual tuning and more confidence that their systems will keep up with growing data demands.

To understand what these improvements mean in practice, we spoke with Kiran Gadhave, software engineer and IEEE Senior Member. Blending research and real-world experience, Gadhave explains how the latest release helps teams strengthen reliability in cloud environments. 

Gadhave is a full-stack software engineer at Imply Data who works closely with Apache Druid through his role on Imply Polaris. During his PhD in Computer Science at the University of Utah, he developed Trrack, a JavaScript and TypeScript library that tracks the whole history of user interactions in visualization tools. The library and its scientific and healthcare applications have been cited in over 60 academic publications. He also contributed to open-source platforms like reVISit and Loops, helping teams make analytical workflows more transparent and reproducible. “Tracking provenance is about answering the question: how did this state come to be? That perspective hasn’t changed, whether I’m building visual tools or distributed systems,” he says.

Now working on real-time pipelines at Imply, Gadhave brings the same focus on clarity and control to large-scale infrastructure. Druid 33.0.0, he explains, introduces features that reduce the operational complexity of ingestion and monitoring. One of the most impactful is scheduled ingestion, which allows periodic batch loads to be defined directly inside Druid using cron syntax. “In previous versions, batch ingestion had to be orchestrated through external systems. That added more points of failure and made debugging more difficult. With scheduled ingestion built into Druid, it becomes easier to keep pipelines reliable and self-contained.”
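As an illustration, a scheduled batch load would be defined by submitting a supervisor spec to Druid. The sketch below uses Druid's long-standing supervisor endpoint; the spec fields (the supervisor type, the cron block, and the SQL body) are illustrative placeholders rather than the exact Druid 33.0.0 schema, so the release notes remain the authority.

    # Sketch: submitting a cron-scheduled batch ingestion supervisor.
    # POST /druid/indexer/v1/supervisor is Druid's standard supervisor API;
    # the field names in the spec below are illustrative, not the exact
    # Druid 33.0.0 schema.
    import requests

    ROUTER = "http://localhost:8888"  # assumed quickstart router address

    spec = {
        "type": "scheduled_batch",        # hypothetical supervisor type
        "schedulerConfig": {
            "type": "cron",
            "schedule": "0 0 * * * ?",    # run at the top of every hour
        },
        "spec": {
            # A SQL-based batch load; table and source names are placeholders.
            "query": "REPLACE INTO events OVERWRITE ALL "
                     "SELECT * FROM source_events PARTITIONED BY DAY",
        },
    }

    resp = requests.post(f"{ROUTER}/druid/indexer/v1/supervisor", json=spec, timeout=30)
    resp.raise_for_status()
    print(resp.json())  # on success, Druid echoes back the supervisor id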

Another improvement is the turbo segment loading mode, which speeds up getting new data online. This is especially useful during scale-ups or cluster restarts. “When a node is restarted or a large batch is ingested, turbo loading helps ensure the data becomes queryable faster. It’s a practical feature in scenarios where low latency and up-to-date results are essential.”
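How the mode is switched depends on cluster configuration. As a rough sketch, assuming the loading mode is exposed through the coordinator's dynamic configuration: the GET/POST endpoints below are standard Druid APIs, but the property name is a placeholder to verify against the Druid 33 documentation.

    # Sketch: flipping segment loading into a faster mode via coordinator
    # dynamic config. GET/POST /druid/coordinator/v1/config are standard
    # Druid endpoints; "turboLoading" is a placeholder property name.
    import requests

    COORDINATOR = "http://localhost:8081"  # assumed coordinator address

    cfg = requests.get(f"{COORDINATOR}/druid/coordinator/v1/config", timeout=30).json()
    cfg["turboLoading"] = True  # hypothetical key; check the Druid 33 docs
    requests.post(f"{COORDINATOR}/druid/coordinator/v1/config",
                  json=cfg, timeout=30).raise_for_status()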

This kind of attention to system responsiveness builds on Gadhave’s earlier work in interactive data analysis. While at the University of Utah, he developed tools to support exploratory research in healthcare and cancer studies. In a study involving 350 users, a machine learning system he designed for scatterplot visualizations achieved 77% accuracy in predicting user intent, demonstrating how interaction data could improve responsiveness and usability in visual analytics. His open-source libraries for tracking user interactions have been downloaded over 20,000 times and continue to support reproducibility and provenance in scientific applications.

During internships at the Bosch Center for Artificial Intelligence, Gadhave worked on production-ready user interfaces for biomedical research, including visualizing segmentation results from more than 10,000 medical images. These experiences sharpened his ability to translate complex analytical processes into clear, interactive systems. 

This breadth of experience informs how he approaches infrastructure-level challenges. In Druid 33.0.0, for example, performance enhancements to cloud storage integration reduce segment upload latency by leveraging the S3 Transfer Manager. “In cloud-native systems, you often hit I/O limits before CPU or memory limits. Optimizing that segment push step means fresher data in dashboards and lower ingestion lag.”
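The transfer manager in the AWS SDK for Java splits large files into parts and uploads them in parallel. To illustrate the mechanism itself, rather than Druid's internal code, here is the same idea expressed with boto3's transfer configuration in Python; the bucket and file names are placeholders.

    # Sketch of what a transfer manager does: multipart, parallel uploads.
    # boto3's TransferConfig is the Python analogue of the Java SDK's
    # S3 Transfer Manager that Druid 33 leverages for segment pushes.
    import boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,  # split files larger than 8 MiB
        multipart_chunksize=8 * 1024 * 1024,  # size of each uploaded part
        max_concurrency=10,                   # upload up to 10 parts at once
    )

    s3 = boto3.client("s3")
    s3.upload_file("segment.zip", "example-deep-storage-bucket",
                   "segments/segment.zip", Config=config)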

Additional gains come from improvements in the Druid web console and its integration with Kubernetes-based deployments. The UI now includes clearer task status indicators and more flexible filters, which help streamline day-to-day monitoring. “These aren’t flashy features, but they reduce friction when operating a live system,” says Gadhave. “The ability to quickly assess supervisor health, filter logs, or export results in the right format saves a lot of time.” For teams deploying Druid in containerized environments, version 33.0.0 improves system stability through hostname-based service registration, which prevents common coordination failures during pod churn.

Recognition of Gadhave’s expertise and achievements extends beyond the systems he builds. This year, he was elected to membership in Sigma Xi, an honor society that admits researchers based on peer evaluation of their scientific contributions. He has reviewed submissions for leading human-computer interaction and data visualization conferences, including the ACM Conference on Human Factors in Computing Systems and the IEEE Pacific Visualization Symposium. His selection for a 2023 Dagstuhl Seminar placed him among a small group of international experts invited to shape the future of interactive visualization research.

Beyond ingestion and monitoring, Gadhave sees value in new API-level control over compaction and task status. These capabilities help teams maintain performance consistency as datasets grow. “When compaction is opaque, it’s hard to tune or optimize,” he explains. “The new APIs make triggering or checking compaction programmatically easier, which helps maintain predictable query speed over time.”
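The coordinator has long exposed compaction configuration and status over HTTP; a minimal sketch of programmatic control along those lines follows. The datasource name and tuning values are placeholders, and Druid 33's newer compaction APIs may differ in detail.

    # Sketch: configure auto-compaction for a datasource, then check its
    # status. Both coordinator endpoints shown here are standard Druid APIs.
    import requests

    COORDINATOR = "http://localhost:8081"  # assumed coordinator address

    compaction_config = {
        "dataSource": "events",            # placeholder datasource
        "skipOffsetFromLatest": "PT1H",    # leave the most recent hour alone
    }
    requests.post(f"{COORDINATOR}/druid/coordinator/v1/config/compaction",
                  json=compaction_config, timeout=30).raise_for_status()

    status = requests.get(f"{COORDINATOR}/druid/coordinator/v1/compaction/status",
                          params={"dataSource": "events"}, timeout=30).json()
    print(status)  # snapshot: bytes awaiting compaction, intervals compacted, etc.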

Gadhave emphasizes the importance of resilience in Kubernetes environments. Updates in Druid 33.0.0 reduce reliance on dynamic pod IPs, instead registering services using stable hostnames. “In earlier deployments, service discovery could break after a pod restart if addresses changed unexpectedly. With hostname-based registration, those issues occur far less often, and system coordination becomes more reliable.”

This focus on long-term maintainability extends beyond code. During his doctoral years, Gadhave mentored students and junior developers on building interactive systems and led training sessions on data analysis and Python for researchers from non-technical backgrounds. Communicating complex technical ideas, whether in code, documentation, or teaching, has remained a consistent part of his practice.

Another small but meaningful update is the improved handling of supervisor restarts. “If there are no changes to the ingestion spec, Druid can now skip restarting the supervisor. That might sound minor, but it reduces churn and lowers the chance of ingestion gaps or duplicates.”
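In practice this makes spec submission close to idempotent. A small sketch, assuming a running supervisor named "events" behind a quickstart router: fetching its current spec and re-submitting it verbatim should now be a no-op rather than forcing a restart.

    # Sketch: re-submit an unchanged ingestion spec. With the Druid 33
    # behavior described above, an identical spec skips the restart.
    import requests

    ROUTER = "http://localhost:8888"  # assumed router address
    SUPERVISOR_ID = "events"          # placeholder supervisor name

    spec = requests.get(f"{ROUTER}/druid/indexer/v1/supervisor/{SUPERVISOR_ID}",
                        timeout=30).json()
    resp = requests.post(f"{ROUTER}/druid/indexer/v1/supervisor",
                         json=spec, timeout=30)
    resp.raise_for_status()  # succeeds without cycling the supervisor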

These changes are more than technical niceties; they reflect real use cases from production. Imply Polaris, which provides a managed experience on top of Druid, incorporates these enhancements to offer greater stability with less operational overhead. “The aim is to make the capabilities of Druid accessible without requiring users to engage with its internal complexity,” Gadhave says.

He shares a few principles that guide his work. First: automate early. “If you’re still relying on manual triggers for ingestion or compaction, you’re taking on risk. Let the system do the routine work.” Second: design for failure. “Logs, metrics, lineage: these aren’t optional. You can’t fix what you can’t observe.” Third: keep the architecture minimal. “Druid now handles more natively, from streaming ingestion to batch loads to querying, so it often makes sense to consolidate rather than rely on external glue.”

For Gadhave, there has never been a clear line between academic research and production systems. “Transparency, whether in visual analysis or infrastructure, builds trust. And trust is what makes analytics useful.”

