Modern application delivery relies on visibility—seeing, understanding, and acting on what’s happening within dynamic containerized environments. Our platform engineering team at NetApp IT faced a familiar challenge: ensuring real-time observability and security across our Kubernetes clusters, without added complexity or cost.
What began as an experiment soon evolved into replacing an industry-standard tool with one of our own NetApp solutions: NetApp Data Infrastructure Insights (DII).
We previously used Sysdig to monitor our Kubernetes clusters, relying on its observability and security features. However, as we introduced a modern SDLC security suite onto our platform, we needed to reassess our tooling strategy.
At the same time, DII was rapidly maturing. After evaluating its observability capabilities, we strategically decided to retire Sysdig Monitor, reallocate our licenses to Sysdig Secure for runtime protection, and adopt DII as our core monitoring and diagnostics platform.
“It wasn’t a decision we took lightly,” said David Fox, NetApp Platform Engineering Team Lead. “We put DII through the same scrutiny we would any external tool. We had to be confident it would give us the insight we needed, especially when things go wrong in production.”
For our team, observability is mission-critical. We need to respond quickly to issues like dropped connections or performance slowdowns, and DII has become our go-to platform for identifying root causes and resolving incidents.
When developers flag issues, we use DII to track metrics like CPU usage, memory pressure, host utilization, and error rates. DII helps us correlate resource spikes with pod configurations and resource limits, so we can, for example, identify containers hitting their CPU cap before they cause wider disruptions.
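To make the CPU-cap check concrete, here is a minimal Python sketch of the underlying idea: compare each container's CPU usage against its configured limit and flag the ones running close to it. This is an illustration only and does not use or represent DII's API. The `ContainerSample` shape, the `near_cpu_cap` helper, and the 90% threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ContainerSample:
    """One usage reading for a container (hypothetical shape, not a DII type)."""
    pod: str
    container: str
    cpu_usage_millicores: float
    cpu_limit_millicores: Optional[float]  # None when no CPU limit is set


def near_cpu_cap(
    samples: List[ContainerSample], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (pod, container, usage/limit ratio) for samples at or above
    `threshold` of their CPU limit, highest ratio first.

    Containers without a limit are skipped because there is no cap to hit.
    The 0.9 default is an assumed value for illustration.
    """
    flagged = []
    for s in samples:
        if not s.cpu_limit_millicores:
            continue
        ratio = s.cpu_usage_millicores / s.cpu_limit_millicores
        if ratio >= threshold:
            flagged.append((s.pod, s.container, round(ratio, 2)))
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

In practice the readings would come from whatever metrics source is in use. The point of the sketch is only that usage is meaningful relative to the limit, not as an absolute number.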
Another game-changer is DII’s change-tracking capability. We can trace changes to application manifests over time and correlate them with incident reports. If something started breaking three days ago, DII shows us exactly what changed and when. That historical lens is essential for solving intermittent issues.
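The idea behind correlating manifest changes with an incident can be sketched in a few lines of Python. This is a simplified illustration, not how DII implements change tracking. It assumes manifest snapshots are available as timestamped text, and the `changes_before` helper and its three-day default lookback are hypothetical.

```python
import difflib
from datetime import datetime, timedelta
from typing import List, Tuple

Snapshot = Tuple[datetime, str]  # (capture time, manifest text), assumed shape


def changes_before(
    snapshots: List[Snapshot],
    incident_time: datetime,
    lookback: timedelta = timedelta(days=3),
) -> List[Tuple[datetime, List[str]]]:
    """List manifest changes captured in the window before an incident.

    `snapshots` must be sorted oldest first. Each snapshot inside
    [incident_time - lookback, incident_time] is diffed against its
    predecessor; only added ("+") and removed ("-") lines are returned.
    """
    window_start = incident_time - lookback
    results = []
    for (_, prev_text), (ts, text) in zip(snapshots, snapshots[1:]):
        if not (window_start <= ts <= incident_time):
            continue
        raw = list(
            difflib.unified_diff(
                prev_text.splitlines(), text.splitlines(), lineterm="", n=0
            )
        )
        # Skip the two file-header lines and the "@@" hunk markers.
        changed = [line for line in raw[2:] if not line.startswith("@@")]
        if changed:
            results.append((ts, changed))
    return results
```

Given an incident timestamp, the function returns each change in the lookback window along with the lines that were added or removed, which is the kind of "what changed and when" view that makes intermittent issues easier to trace.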
Our journey with DII wasn’t seamless. Early versions lacked full feature parity with Sysdig, and there were compatibility hurdles. But we didn’t go it alone—we partnered closely with the DII product team, submitting enhancement requests and validating new capabilities as they came online.
“In many ways, we’ve acted as both customer and collaborator,” said Fox. “We’ve helped shape the direction of DII, and that’s been a unique and rewarding aspect of this journey.”
This close collaboration means NetApp IT directly influences the tool’s development—a rare and valuable position when working with internal solutions.
As DII continues to evolve, we’re driving toward deeper integration across platforms, increased automation, and support for multi-cluster and multi-cloud environments.
We’re leveraging services like Amazon FSx for NetApp ONTAP to extend this visibility into our public cloud footprint. With FSx for ONTAP, we maintain consistent observability and automation standards across both on-prem and cloud environments, powered by Trident and tightly integrated into the DII platform.
DII is also becoming a cornerstone of our broader observability and FinOps strategy, providing insights into health, performance, cost, and application behavior at scale.
This shift is more than a tooling change—it’s part of our commitment to operational excellence, platform resiliency, and delivering unified observability across NetApp IT’s hybrid infrastructure.