
How to Solve Kubernetes’ Persistent Storage Challenges

By Romuald Vandepoel

Kubernetes and containerization may be thought of as transformative compared with more traditional cloud deployment approaches. At least, that’s how it appears in development environments. However, heads of infrastructure, platform managers, and other professionals responsible for production environments know the benefits have limits.

That’s because the scalability, resource efficiency, fast failover, and other benefits of Kubernetes apply mainly to ephemeral workloads. Stateless applications, those that don’t need to store data from one session to the next, perform very well on Kubernetes. Stateful applications that need persistent storage, however, still lean on legacy infrastructure in production, so the benefits of Kubernetes remain out of reach for them. Thankfully, the challenges of persistent storage in a Kubernetes environment can be solved if you adopt the right approach.

The Kubernetes Persistent Storage Paradox

Containers work best with stateless applications. Kubernetes can create and remove containers rapidly and dynamically because the applications inside them are packaged with the dependencies they need to run. Wherever a new container is spun up, whether in the same cluster or with a different cloud provider, Kubernetes ensures the application has access to the fundamental resources it needs to operate.

The dynamic creation and deletion of containers doesn’t sit well with applications that need to store persistent data. A stateful containerized application must know where its data is, and it must have consistent, reliable access to that data. It must also preserve the data’s integrity as its containers are created and destroyed across a Kubernetes cluster. None of this is possible if the application’s stored state is destroyed every time its container is.
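To make the problem concrete, here is a minimal sketch in Python that builds a Kubernetes Pod manifest whose only storage is an `emptyDir` volume. Kubernetes permanently deletes an `emptyDir` volume’s contents when the Pod is removed from its node, so any state the application writes there disappears with the Pod. The Pod name, image, and mount path are hypothetical placeholders; the script prints JSON, which `kubectl apply -f` accepts just as it accepts YAML.

```python
import json


def ephemeral_pod(name: str, image: str) -> dict:
    """Build a Pod manifest whose only writable storage is an emptyDir volume.

    emptyDir data exists only while the Pod runs on its node; when the Pod
    is deleted or evicted, the directory and everything in it is removed.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                # The application writes its state to this path...
                "volumeMounts": [{"name": "scratch", "mountPath": "/var/lib/app"}],
            }],
            # ...but this volume is deleted together with the Pod.
            "volumes": [{"name": "scratch", "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    # Hypothetical Pod name and image, for illustration only.
    print(json.dumps(ephemeral_pod("demo-db", "example/db:1.0"), indent=2))
```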

Naturally, platform managers and developers want the best of both worlds: easy deployment, fast failover, and the efficiency of containers, alongside persistence for stateful workloads. There are ways to provide persistent storage for cloud-based applications, but, as we’ll see, many have downsides.

Platform Manager Responses

Build It Yourself: Ephemeral-only solutions and downtime are not options in many modern industries, including the financial and health care sectors. In these instances, businesses deploy traditional storage systems to manage their stateful applications. And they invest a sizeable amount of time and resources in the process.

The main problem here is that this approach doesn’t provide the benefits that come with Kubernetes deployments. It’s not platform-agnostic. It must run in a very specific environment. It’s challenging to scale and more likely to suffer downtime. Plus, you’ll have to be very hands-on in managing things like failover and storage. Self-service? Who needs it!

Construct a Workaround: The decision not to use Kubernetes creates a whole raft of problems. Some developers instead build arcane workarounds to give Kubernetes persistent storage and make it work with stateful applications. The trouble is that these workarounds often undermine the very reasons for choosing Kubernetes in the first place.

You might, for example, pin your application’s storage to the node that runs its container. This means Kubernetes can’t move your application elsewhere if that node fails. Alternatively, you might attach your containers to external cloud storage, which can slow your system down considerably and introduce another potential point of failure.
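Pinning storage to a node often looks something like the sketch below: a `local` PersistentVolume whose `nodeAffinity` ties it to one node by hostname. Any Pod that uses this volume can only be scheduled onto that node, which is why Kubernetes cannot relocate the application when the node fails. The node name, disk path, size, and `local-storage` class name are hypothetical.

```python
import json


def node_local_pv(name: str, node: str, path: str, size: str = "10Gi") -> dict:
    """PersistentVolume backed by a disk on one specific node.

    The nodeAffinity below forces every Pod that uses this volume onto
    `node`; if that node fails, the Pod cannot be rescheduled elsewhere
    with its data.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "local-storage",  # hypothetical class name
            "local": {"path": path},              # a directory or disk on the node
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }]
                    }]
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(node_local_pv("db-data", "worker-1", "/mnt/disks/ssd1"), indent=2))
```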

Furthermore, it will be almost impossible to hire anyone with the expertise to manage your storage array, because it will be unique to your organization. New hires will have to learn on the job, and quickly. What’s more, only a handful of highly specialized people at your organization will ever understand how your infrastructure works. So unless your staff retention is exceptional, you’re likely to be permanently on the back foot, without a full complement of experts on board.

Most importantly, however, traditional storage systems lack the community knowledge and fast-paced innovation of open-source projects such as Kubernetes. For these reasons, your system is likely to become outdated faster, updates will take longer, and problems will be harder to fix.

Software-Defined, Cloud-Native Storage: An Alternative Approach

In an ideal world, your application would treat storage as just one more declared requirement. If a containerized application doesn’t suddenly lose access to its declared load balancer when a node is spun down, for example, why can’t storage work the same way?
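In Kubernetes terms, declaring storage as a requirement looks roughly like the following sketch: the application requests storage through a PersistentVolumeClaim and mounts it by claim name, never by node or disk. The claim name, size, image, and the `replicated` storage class are hypothetical; whatever provisioner backs that class decides where the data actually lives.

```python
import json


def claim(name: str, size: str, storage_class: str) -> dict:
    """PersistentVolumeClaim: the app declares *what* storage it needs, not where."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_using_claim(pod_name: str, image: str, claim_name: str) -> dict:
    """Pod that mounts the claim by name; it never refers to a node or disk."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "volumeMounts": [{"name": "data", "mountPath": "/var/lib/app"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    pvc = claim("app-data", "20Gi", "replicated")
    pod = pod_using_claim("app", "example/app:1.0", "app-data")
    # A v1 List lets both objects be applied from a single JSON file.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}, indent=2))
```

The design point is the indirection: the Pod only knows the claim name, so the storage layer is free to satisfy the claim wherever the Pod is scheduled.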

Luckily, there’s a class of solutions that provides exactly this kind of functionality. Storage orchestrators are cloud-native, software-defined solutions that let applications manage storage as a declarative resource. They deliver storage that follows the application’s container wherever it is scheduled, allowing the data to survive if a node fails.
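Storage orchestrators typically plug into Kubernetes through a StorageClass that names their CSI driver as the provisioner. The sketch below is a hedged illustration: `csi.example-orchestrator.io` is a made-up driver name and `numberOfReplicas` is an assumed, driver-specific parameter, so check your orchestrator’s documentation for the real values.

```python
import json

# Hypothetical CSI driver name: each storage orchestrator registers its own.
PROVISIONER = "csi.example-orchestrator.io"


def storage_class(name: str, replicas: int) -> dict:
    """StorageClass that delegates volume provisioning to a storage orchestrator."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        # Driver-specific settings; values must be strings. The key is an assumption.
        "parameters": {"numberOfReplicas": str(replicas)},
        "reclaimPolicy": "Delete",
        "allowVolumeExpansion": True,
        # Delay volume binding until a Pod using the claim is scheduled,
        # so placement can take the consumer's node into account.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(json.dumps(storage_class("replicated", 3), indent=2))
```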

These solutions pool a cluster’s storage into a shared resource. When an application requests access to its data, the storage orchestrator acts as the intermediary: it locates the appropriate volume in the pool and makes it available to the requesting container, wherever in the cluster that container runs.

As a result, when a container is spun down or a node goes offline, the data remains accessible from the cluster’s pool. Platform managers therefore don’t need to build a complex external storage solution for stateful workloads purely to support developer working environments.
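The pooling behaviour described above can be illustrated with a deliberately simplified, hypothetical model: each volume is replicated on more than one node, and a container on any healthy node can be served from a surviving replica. This is not how any particular orchestrator is implemented; real systems also handle consistency, replica rebuilds, and fencing, none of which is modelled here.

```python
class StoragePool:
    """Toy model of a cluster-wide pool that replicates each volume across nodes.

    Hypothetical and greatly simplified: no consistency protocol, no replica
    rebuilds, no fencing. It only shows why data outlives a failed node.
    """

    def __init__(self, nodes: list, replicas: int = 2):
        if replicas > len(nodes):
            raise ValueError("not enough nodes for the requested replica count")
        self.data = {n: {} for n in nodes}  # node -> {volume: [records]}
        self.online = set(nodes)
        self.replicas = replicas

    def create_volume(self, volume: str) -> None:
        # Naive placement: replicas go on the first online nodes in name order.
        for node in sorted(self.online)[: self.replicas]:
            self.data[node][volume] = []

    def write(self, volume: str, record: str) -> None:
        # Synchronously append the record to every online replica.
        targets = [n for n in self.online if volume in self.data[n]]
        if not targets:
            raise RuntimeError(f"no online replica for {volume}")
        for node in targets:
            self.data[node][volume].append(record)

    def fail(self, node: str) -> None:
        self.online.discard(node)

    def attach(self, volume: str, consumer_node: str) -> list:
        """Serve a volume to a container on any online node from a surviving replica."""
        if consumer_node not in self.online:
            raise RuntimeError(f"{consumer_node} is offline")
        for node in sorted(self.online):
            if volume in self.data[node]:
                return list(self.data[node][volume])
        raise RuntimeError(f"all replicas of {volume} are offline")


if __name__ == "__main__":
    pool = StoragePool(["node-a", "node-b", "node-c"], replicas=2)
    pool.create_volume("orders")         # replicas on node-a and node-b
    pool.write("orders", "order-1")
    pool.fail("node-a")                  # the node running the app goes down
    print(pool.attach("orders", "node-c"))  # rescheduled container still sees ['order-1']
```

Note that `node-c` holds no replica itself: as in the paragraph above, the pool makes the volume available to the requesting container wherever it lands.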

How Do I Find the Right Storage Orchestrator?

Deploying a storage orchestrator can bring substantial improvements in reliability, scalability, and performance. That doesn’t mean every orchestrator is equally suited to every use case, however. To find out, we commissioned a benchmarking study that compared the performance of four leading solutions:

Longhorn
OpenEBS
Rook
StorageOS

Each orchestrator was subjected to the same tests, on the same hardware and software configuration, to evaluate its performance under different conditions. For platform managers who want to deploy the best storage orchestrator for Kubernetes’ persistent storage in production, the study is a valuable first step in choosing the right option. If you’d like to know more, you can read the full study.
