Consider a scenario where your application uses a MySQL pod in which data is added, updated, and deleted as the situation demands. By default, when you restart the MySQL pod, all the data is lost, because Kubernetes does not provide data persistence out of the box. You have to explicitly configure persistent storage for each application that needs to keep data between pod restarts. That storage should meet the following requirements:
- A storage that is not dependent on pod lifecycle
- A storage that is available on all nodes
- A highly available storage, i.e., storage that can survive cluster crashes
Kubernetes allows you to define how you want to persist data and how to access it. It also supports several persistence options such as local storage, cloud storage, and Network File System (NFS).
In this article, you will learn how to persist data in Kubernetes using abstractions like persistent volumes, persistent volume claims, and storage classes, and how each component is created and used for data persistence. You will also learn how to use config maps and secrets to define configuration for your applications.
Persistent Volumes - PV
A persistent volume is a cluster resource that is used to store data. It can be created using a YAML file. It's an abstraction that needs actual physical storage, such as a local hard drive, NFS storage, or cloud storage, to persist data. Storage in Kubernetes needs to be managed by an administrator, because Kubernetes only provides the interface for storing data and doesn't manage the storage itself. You can have multiple storage backends configured for your cluster, where one application uses local disk storage and another uses an NFS server or cloud storage. It's also possible for one application to use different storage backends. The different storage options are configured under the spec section of the persistent volume configuration file.
This is a sample persistent volume with nfs as the storage backend:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-name
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: slow
  mountOptions:
    - hard
    - nfsvers=4.0
  nfs:
    path: /dir/path/on/nfs/server
    server: nfs-server-ip-address
An example of a persistent volume with a Google Compute Engine persistent disk as the backend:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
  labels:
    failure-domain.beta.kubernetes.io/zone: us-central1-a__us-central1-b
spec:
  capacity:
    storage: 400Gi
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: my-data-disk
    fsType: ext4
Depending on the storage type, the spec attribute of the YAML configuration will be different, because part of it is specific to the storage type. Kubernetes supports over 25 storage backends for your persistent volumes. You can find the full list in the Kubernetes documentation.
Persistent volumes are not namespaced, unlike other Kubernetes objects such as pods, deployments, and replica sets. This means they are available to the whole cluster and can be accessed from any namespace.
Persistent Volume Claims - PVC
A persistent volume claim is a request for storage by a pod. A Kubernetes administrator creates a persistent volume that accesses data using local or external storage. The persistent volume claim then requests resources from the persistent volume created by the administrator, and the pod accesses the local or external storage through the claim. Storage is not assigned when the requested storage exceeds the available storage defined in the persistent volume.
For example, the PVC below tries to claim 10Gi of storage from a persistent volume in the cluster:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-name
spec:
  storageClassName: manual
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
The process of claiming storage from a persistent volume involves the following steps:
- The pod requests a volume through the persistent volume claim
- The persistent volume claim tries to find a persistent volume in the cluster that satisfies its requirements
- The persistent volume holds the actual storage and is bound to the claim only when it satisfies the claim's requirements
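To make the matching step concrete, here is a minimal sketch of a persistent volume that the 10Gi claim shown above could bind to, since its storage class, access mode, and capacity all satisfy the request. The name `pv-manual-example` and the path `/mnt/data` are assumptions for illustration, and a `hostPath` volume like this is only suitable for single-node development or test clusters.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-manual-example      # hypothetical name, not from the article
spec:
  storageClassName: manual     # must match the claim's storageClassName
  capacity:
    storage: 10Gi              # must be at least the 10Gi the claim requests
  accessModes:
    - ReadWriteOnce            # must include the access mode the claim asks for
  hostPath:
    path: /mnt/data            # hypothetical directory on the node
```

Neither of the two sample volumes from the previous section would satisfy that claim: the NFS volume offers only 5Gi in the `slow` class, and the GCE volume declares no storage class.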
It's important to note that claims must exist in the same namespace as the pod. Once a matching persistent volume is found through the persistent volume claim, the volume is mounted into the pod:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: pvc-name
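Applied to the MySQL scenario from the introduction, the same pattern might look like the sketch below. It assumes the official `mysql` image, which keeps its data under `/var/lib/mysql`, and it reads the root password from the `mysecret` secret defined in the secrets part of this article. The pod name and volume name are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod                    # hypothetical name
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD  # required by the official mysql image
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password
      ports:
        - containerPort: 3306
      volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql  # MySQL data directory inside the container
  volumes:
    - name: mysql-data
      persistentVolumeClaim:
        claimName: pvc-name          # the claim defined earlier
```

With the data directory backed by the claim, restarting the pod no longer discards the database files.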
Storage Classes
Storage classes allow you to provision persistent volumes dynamically whenever a persistent volume claim requests one. A storage class can also be created using a YAML configuration file, e.g.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "10"
  fsType: ext4
The provisioner attribute in the storage class configuration file determines the storage backend used for the persistent volume. Each storage backend has its own provisioner.
Storage classes are another level of abstraction that hides the underlying storage provider and the parameters for that storage. They can then be used to provision persistent volumes dynamically as the situation demands. Storage classes are usually requested by a persistent volume claim. You can think of the flow for claiming storage using the steps below:
- The pod claims storage via a persistent volume claim
- The persistent volume claim requests storage from the storage class
- The storage class creates a persistent volume that satisfies the claim's requirements, using the provisioner for the actual storage backend
An example of how Persistent Volume Claim claims storage from the Storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: storage-class-name
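Besides the provisioner and its parameters, a storage class can carry a few settings that control how dynamically provisioned volumes behave. The sketch below is illustrative only: the class name and the chosen values are assumptions, while `reclaimPolicy`, `allowVolumeExpansion`, and `volumeBindingMode` are standard fields of the `storage.k8s.io/v1` StorageClass.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storage-class-retain      # hypothetical name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
reclaimPolicy: Retain             # keep the volume after its claim is deleted (default is Delete)
allowVolumeExpansion: true        # allow claims to be resized later
volumeBindingMode: WaitForFirstConsumer  # provision only once a pod uses the claim
```

A claim selects this class the same way as in the example above, by setting `storageClassName` to the class name.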
Config Maps
With config maps, you can store non-confidential data as key-value pairs that serve as configuration for your applications. Pods can consume this configuration as environment variables, as command-line arguments, or as files in a volume.
It is important to note that config maps do not provide encryption or secrecy of data. If you need to store confidential data, consider using a Kubernetes secret instead.
# Sample config maps
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  special.how: very
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: env-config
  namespace: default
data:
  log_level: INFO
How the config map is consumed in the pod
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: SPECIAL_LEVEL_KEY
          valueFrom:
            configMapKeyRef:
              name: special-config
              key: special.how
        - name: LOG_LEVEL
          valueFrom:
            configMapKeyRef:
              name: env-config
              key: log_level
  restartPolicy: Never
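The pod above consumes the config maps as environment variables. As a sketch of the volume option, the pod below mounts `special-config` as a directory, where each key becomes a file whose content is the value. The pod name, volume name, mount path, and the plain `busybox` image are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-volume-pod          # hypothetical name
spec:
  containers:
    - name: test-container
      image: busybox
      # List the mounted files, then print the value stored under the key
      command: [ "/bin/sh", "-c", "ls /etc/config && cat /etc/config/special.how" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config   # each key appears as a file in this directory
  volumes:
    - name: config-volume
      configMap:
        name: special-config
  restartPolicy: Never
```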
There are several other options when creating a config map, such as creating config maps from files or passing values via command-line arguments. To read more about these, see the Kubernetes documentation on config maps.
Secrets
With Kubernetes secrets, you can store confidential information. Secrets keep you from hardcoding sensitive data into your code. They are similar to config maps but differ in terms of data security. There are different kinds of secrets, namely:
- Opaque secrets
- Service account token secrets
- Docker config secrets
- Basic authentication secrets
- SSH authentication secrets
- TLS secrets
- Bootstrap token secrets
You can read more about each type of secret in the Kubernetes documentation on secrets.
NB: It's important to note that anyone with access to your Kubernetes API or etcd can read or modify secrets, because by default they are stored unencrypted in the API server's underlying data store (etcd). To reduce this risk, you can consider:
- Enabling encryption at rest for secrets
- Configuring role-based access control (RBAC) rules to limit who can create and access secrets
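The two measures above might look like the following sketches. The first is a Role that only allows reading secrets in one namespace, bound to a single user. The names `secret-reader`, `read-secrets`, and `jane` are placeholders, not something defined in this article.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: secret-reader            # hypothetical role name
rules:
  - apiGroups: [""]              # "" is the core API group, where secrets live
    resources: ["secrets"]
    verbs: ["get", "list"]       # read-only: no create, update, or delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-secrets             # hypothetical binding name
  namespace: default
subjects:
  - kind: User
    name: jane                   # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io
```

Encryption at rest is configured on the API server rather than applied as a cluster resource. A minimal configuration file, assuming the `aescbc` provider and a key you generate yourself, could look like this. It is passed to kube-apiserver through the `--encryption-provider-config` flag.

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <BASE64_ENCODED_32_BYTE_KEY>  # placeholder, generate your own
      - identity: {}             # fallback so existing unencrypted data stays readable
```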
Secrets can be mounted as files in a volume in one or more containers. They can also be exposed to a container as environment variables.
An example of a secret
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  username: YWRtaW4=
  password: MWYyZDFlMmU2N2Rm
The values in the data section must be base64-encoded. To convert a string to base64, you can run:
echo -n 'admin' | base64          # YWRtaW4=
echo -n '1f2d1e2e67df' | base64   # MWYyZDFlMmU2N2Rm
Here, admin is the username and 1f2d1e2e67df is the password.
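If you would rather not encode the values by hand, the Secret API also accepts a `stringData` field that takes plain strings and has the API server store them base64-encoded. The sketch below is equivalent in content to the secret above. The name `mysecret-plain` is an assumption, used only to avoid clashing with `mysecret`.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysecret-plain        # hypothetical name
type: Opaque
stringData:                   # plain-text values, encoded by the API server on write
  username: admin
  password: "1f2d1e2e67df"
```

Note that base64 is an encoding, not encryption, so a manifest in either form should be handled as sensitive.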
To use the secret as environment variables in a pod
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myapp
      image: ubuntu
      env:
        - name: USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: password
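The other option mentioned earlier, mounting the secret as files in a volume, could be sketched as follows. Each key of `mysecret` becomes a file under the mount path, containing the decoded value. The pod name, volume name, mount path, and `busybox` image are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-pod        # hypothetical name
spec:
  containers:
    - name: myapp
      image: busybox
      # List the mounted secret files, then keep the container running
      command: [ "/bin/sh", "-c", "ls /etc/secret && sleep 3600" ]
      volumeMounts:
        - name: secret-volume
          mountPath: /etc/secret # files: /etc/secret/username, /etc/secret/password
          readOnly: true
  volumes:
    - name: secret-volume
      secret:
        secretName: mysecret
```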
Conclusion
Storage is one of the core components of modern applications, so understanding how to configure different storage backends and how to access them is crucial. Kubernetes provides several levels of abstraction, such as persistent volumes, persistent volume claims, storage classes, secrets, and config maps, for defining how data can be requested and used in your applications. Setting permissions and putting appropriate measures in place to prevent data breaches or intrusion is just as important. You can read more on RBAC and encrypting secrets at rest in the Kubernetes documentation.