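TL;DR

Using a tool called korb, I was able to migrate the Storage Class of existing PVs.
I hit an error that appears to be caused by a timeout that was too short, and resolved it with an option available in the latest version.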

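Motivation

As I wrote in earlier posts, I recently added a Synology NAS to my home server environment. A natural next step is to switch the Storage Class of my existing Persistent Volumes (PVs) to one backed by the NAS's NFS shared folder.

However, rewriting the Kubernetes resources of the applications that use a PV while preserving the data inside it is somewhat tricky. After some research, I found that an OSS tool called korb addresses exactly this need, so I used it to change the Storage Class.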

Changing the Storage Class with Korb

This assumes the new Storage Class has already been created.

korb provides precompiled binaries, so download one from GitHub.

$ curl -LO https://github.com/BeryJu/korb/releases/download/v2.3.2/korb_2.3.2_linux_amd64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 10.9M  100 10.9M    0     0  5764k      0  0:00:01  0:00:01 --:--:-- 13.5M

$ tar -xvzf korb_2.3.2_linux_amd64.tar.gz
LICENSE
README.md
korb

$ ls
LICENSE  README.md  korb  korb_2.3.2_linux_amd64.tar.gz

Now let's change the Storage Class of the PV used as OpenSearch's data store. This PV currently uses the Storage Class nfs-volume1, and we will change it to nfs-volume2.

$ kubectl get pvc -n monitoring
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
opensearch-cluster-master-opensearch-cluster-master-0                                                    Bound    pvc-ddde3b36-70b2-416d-b394-a2148242fe29   8Gi        RWO            nfs-volume1    15d

$ kubectl get sc
NAME          PROVISIONER                                           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
nfs-volume1   k8s-sigs.io/nfs-subdir-external-provisioner-volume1   Delete          Immediate           false                  22d
nfs-volume2   k8s-sigs.io/nfs-subdir-external-provisioner-volume2   Delete          Immediate           false                  22d

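Temporarily delete the Pods that use the PV, then run the command with copy-twice-name as the strategy. This strategy copies the PVC to a new PVC with a temporary name in the new Storage Class, deletes the old PVC, and then copies the data back into a PVC with the original name.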

$ ./korb --new-pvc-storage-class nfs-volume2 --source-namespace monitoring --strategy=copy-twice-name  opensearch-cluster-master-opensearch-cluster-master-0
...
WARN[0018] failed to copy                                component=mover-job error=EOF
WARN[0018] failed to copy                                component=mover-job error=EOF
WARN[0018] failed to copy                                component=mover-job error=EOF
INFO[0018] And we're done                                component=strategy strategy=copy-twice-name
INFO[0018] Cleaning up...                                component=strategy strategy=copy-twice-name

The many WARNINGs are alarming, but the migration appears to have succeeded in the end. Indeed, looking at the PVC shows that the Storage Class has changed.

$ kubectl get pvc -n monitoring
NAME                                                                                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
opensearch-cluster-master-opensearch-cluster-master-0                                                    Bound    pvc-65bbc892-eaa7-4cd0-b478-d7aa4f30ddee   8Gi        RWO            nfs-volume2    102s
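
The "temporarily delete the Pods" step can be done by scaling the owning workload down to zero. Below is a minimal sketch, assuming the OpenSearch Pods are managed by a StatefulSet named opensearch-cluster-master (a guess based on the PVC name, not something confirmed in this post). scale_statefulset is a hypothetical helper name, and the replica count to restore afterwards depends on your setup.

```shell
#!/usr/bin/env bash
# Sketch: stop (and later restore) the workload that mounts the PVC.
# ASSUMPTION: the Pods belong to a StatefulSet; its name below is inferred
# from the PVC name and may differ in your cluster.

# Set the replica count of a StatefulSet in the given namespace.
scale_statefulset() {
  local namespace="$1" name="$2" replicas="$3"
  kubectl scale statefulset "$name" -n "$namespace" --replicas="$replicas"
}

# Before running korb:
#   scale_statefulset monitoring opensearch-cluster-master 0
# After the migration, restore the original replica count, for example:
#   scale_statefulset monitoring opensearch-cluster-master 1
```

Before starting korb, `kubectl get pods -n monitoring` can be used to confirm that the Pods are actually gone. If the workload is managed by an operator, the replica count may be reconciled back, in which case it probably has to be stopped through the operator's own resource instead.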

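Caveats when changing large PVs

I carried out the migrations following the flow above, but hit errors several times on PVs that contain, for example, a large number of files, so here is how I dealt with them.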

The migration appears to progress partway, but after a while (about a minute into the run shown below) the following error appears and the command is aborted.

$ ./korb --strategy copy-twice-name --new-pvc-storage-class nfs-client-hdd-ds1522 --source-namespace monitoring prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 --timeout 3600s
DEBU[0000] Created client from kubeconfig                component=migrator kubeconfig=/home/localadmin/.kube/config
DEBU[0000] Got current namespace                         component=migrator namespace=default
DEBU[0000] Got Source PVC                                component=migrator name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 uid=97923774-c989-4e14-b349-1310bd49b80f
DEBU[0000] No new Name given, using old name             component=migrator
DEBU[0000] Compatible Strategies:                        component=migrator
DEBU[0000] Copy the PVC to the new Storage class and with new size and a new name, delete the old PVC, and copy it back to the old name.  component=migrator identifier=copy-twice-name
DEBU[0000] Export PVC content into a tar archive.        component=migrator identifier=export
DEBU[0000] Import data into a PVC from a tar archive.    component=migrator identifier=import
DEBU[0000] User selected strategy                        component=migrator identifier=copy-twice-name
DEBU[0000] Set timeout from PVC size                     component=strategy strategy=copy-twice-name timeout=1m0s
WARN[0000] This strategy assumes you've stopped all pods accessing this data.  component=strategy strategy=copy-twice-name
DEBU[0000] creating temporary PVC                        component=strategy stage=1 strategy=copy-twice-name
DEBU[0000] skipping waiting for PVC to be bound          component=strategy stage=2 strategy=copy-twice-name
DEBU[0000] starting mover job                            component=strategy stage=2 strategy=copy-twice-name
DEBU[0002] Pod not in correct state yet                  component=mover-job phase=Pending
...
WARN[0064] Failed to move data                           component=strategy error="client rate limiter Wait returned an error: context deadline exceeded" strategy=copy-twice-name
INFO[0064] Cleaning up...                                component=strategy strategy=copy-twice-name

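Searching for the error message suggests that it is a timeout inside the Go client library for the Kubernetes API.

After some trial and error, I resolved it by moving to the latest korb (I had been using v2.3.2, so I cloned the repository and built commit cd58f99e5029a770ef771704a6fe1f1fa9d57404 myself) and specifying the copyTimeout option.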

$ ./hogehoge --strategy copy-twice-name --new-pvc-storage-class nfs-client-hdd-ds1522 --source-namespace monitoring prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 --timeout 3600s --copyTimeout 3600s
DEBU[0000] Created client from kubeconfig                component=migrator kubeconfig=/home/localadmin/.kube/config
DEBU[0000] Got current namespace                         component=migrator namespace=default
DEBU[0000] Got Source PVC                                component=migrator name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 uid=97923774-c989-4e14-b349-1310bd49b80f
DEBU[0000] No new Name given, using old name             component=migrator
DEBU[0000] Compatible Strategies:                        component=migrator
DEBU[0000] Copy the PVC to the new Storage class and with new size and a new name, delete the old PVC, and copy it back to the old name.  component=migrator identifier=copy-twice-name
DEBU[0000] Export PVC content into a tar archive.        component=migrator identifier=export
DEBU[0000] Import data into a PVC from a tar archive.    component=migrator identifier=import
DEBU[0000] User selected strategy                        component=migrator identifier=copy-twice-name
DEBU[0000] Set timeout from PVC size                     component=strategy strategy=copy-twice-name timeout=1h0m0s
WARN[0000] This strategy assumes you've stopped all pods accessing this data.  component=strategy strategy=copy-twice-name
DEBU[0000] creating temporary PVC                        component=strategy stage=1 strategy=copy-twice-name
DEBU[0000] skipping waiting for PVC to be bound          component=strategy stage=2 strategy=copy-twice-name
DEBU[0000] starting mover job                            component=strategy stage=2 strategy=copy-twice-name
DEBU[0002] Pod not in correct state yet                  component=mover-job phase=Pending
DEBU[0144] Cleaning up successful job                    component=mover-job
DEBU[0144] deleting original PVC                         component=strategy stage=3 strategy=copy-twice-name
DEBU[0144] Waiting for PVC Deletion, retrying            component=strategy pvc-name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0 strategy=copy-twice-name
DEBU[0146] creating final destination PVC                component=strategy stage=4 strategy=copy-twice-name
DEBU[0146] starting mover job to final PVC               component=strategy stage=5 strategy=copy-twice-name
DEBU[0148] Pod not in correct state yet                  component=mover-job phase=Pending
DEBU[0330] Cleaning up successful job                    component=mover-job
DEBU[0330] deleting temporary PVC                        component=strategy stage=6 strategy=copy-twice-name
DEBU[0330] Waiting for PVC Deletion, retrying            component=strategy pvc-name=prometheus-kube-prometheus-kube-prome-prometheus-db-prometheus-kube-prometheus-kube-prome-prometheus-0-copy-1734180022 strategy=copy-twice-name
INFO[0332] And we're done                                component=strategy strategy=copy-twice-name
INFO[0332] Cleaning up...                                component=strategy strategy=copy-twice-name
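
For reference, the self-build mentioned earlier might look roughly like the following. This is only a sketch: it assumes git and a Go toolchain are installed and that the repository's main package sits at the repository root so that a plain `go build` works. build_korb and the ./korb-src directory are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build korb from a specific commit.
# ASSUMPTIONS: git and Go are installed; the main package is at the repo root.

# Clone the repository into workdir, check out the commit, and build a binary
# named "korb" inside workdir.
build_korb() {
  local commit="$1" workdir="$2"
  git clone https://github.com/BeryJu/korb.git "$workdir" &&
    git -C "$workdir" checkout "$commit" &&
    (cd "$workdir" && go build -o korb .)
}

# Example:
#   build_korb cd58f99e5029a770ef771704a6fe1f1fa9d57404 ./korb-src
#   ./korb-src/korb --help
```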

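My guess is that korb automatically sets the timeout from the data size, and the copy was cancelled because it did not finish within that time. In the logs above, the "Set timeout from PVC size" line reports timeout=1m0s in the failing run and timeout=1h0m0s in the successful one. Setting the timeout manually therefore lets the copy complete normally before it is cancelled.

Note that even when the command stops with an error, the Kubernetes Job keeps running in the background. You need to delete this Job before you can run the command again.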
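
Cleaning up the leftover Job could look like the sketch below. The name korb gives its mover Job does not appear in the logs above, so the sketch lists the Jobs first and leaves the name as a parameter. It also assumes the Job lives in the source namespace (monitoring here). list_jobs and delete_job are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: find and delete the mover Job left behind by a failed korb run.
# ASSUMPTION: the Job is in the same namespace as the source PVC.

# List all Jobs in a namespace so the leftover one can be identified.
list_jobs() {
  local namespace="$1"
  kubectl get jobs -n "$namespace"
}

# Delete a single Job by name; with kubectl's default cascading deletion,
# the Job's Pods are removed as well.
delete_job() {
  local namespace="$1" job_name="$2"
  kubectl delete job "$job_name" -n "$namespace"
}

# Example:
#   list_jobs monitoring
#   delete_job monitoring <job-name-from-the-list>
```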
