This post is a study summary of the online course by 기시다 of Cloud@net.
Lab environment deployment
[Case2] External LB (L4 switch) → control plane nodes (3) + (Worker Client-Side LoadBalancing)

The study builds its nodes with Vagrant, but to test against a production-like environment I installed and configured everything on OpenStack, so the installation steps differ.
[Configuration]
- Hypervisor : OPENSTACK 7.1.4 (version: 2023.2 Antelope)
- EXTERNAL LB : virtualized L4 (using PAS-KS)
- admin-node : node for Kubespray and the k8s deployment
- Server nodes : 5 instances
- OS : Rocky Linux 9.x
- Kubespray
| NAME | Description | Flavor | CPU | RAM | NIC | floating ip |
| PAS-KS | Virtualized L4 switch | m1.pask | 1 | 3GB | - | 192.168.7.222 |
| k8s-admin | K8S admin | m1.small | 1 | 2GB | 100.100.100.244 | 192.168.7.244 |
| k8s-c1 | K8S ControlPlane | m1.large | 4 | 8GB | 100.100.100.245 | 192.168.7.245 |
| k8s-c2 | K8S ControlPlane | m1.large | 4 | 8GB | 100.100.100.246 | 192.168.7.246 |
| k8s-c3 | K8S ControlPlane | m1.large | 4 | 8GB | 100.100.100.247 | 192.168.7.247 |
| k8s-w1 | K8S Worker | m1.medium | 2 | 4GB | 100.100.100.248 | 192.168.7.248 |
| k8s-w2 | K8S Worker | m1.medium | 2 | 4GB | 100.100.100.249 | 192.168.7.249 |

Network diagram

Deploying the lab environment
- OpenStack Orchestration Stack settings for node deployment
heat_template_version: "2018-08-31"
description: >
  K8S cluster on Rocky Linux 9.6 (6 nodes).
  Version that resolves the boot-source conflict and applies OS::Heat::CloudConfig.
parameters:
  key_name:
    type: string
    default: "cli-key"
  flavor_admin:
    type: string
    default: "m1.small"
  flavor_ctrl:
    type: string
    default: "m1.large"
  flavor_worker:
    type: string
    default: "m1.medium"
  private_subnet:
    type: string
    default: "cli-subnet"
resources:
  # [Shared initialization script resource]
  common_script:
    type: OS::Heat::CloudConfig
    properties:
      cloud_config:
        runcmd:
          - ln -sf /usr/share/zoneinfo/Asia/Seoul /etc/localtime
          - systemctl stop firewalld && systemctl disable firewalld
          - setenforce 0
          - sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
          - swapoff -a
          - sed -i '/swap/s/^/#/' /etc/fstab
          - |
            cat <<EOF > /etc/modules-load.d/k8s.conf
            overlay
            br_netfilter
            EOF
          - modprobe overlay
          - modprobe br_netfilter
          - |
            cat <<EOF > /etc/sysctl.d/k8s.conf
            net.bridge.bridge-nf-call-iptables = 1
            net.bridge.bridge-nf-call-ip6tables = 1
            net.ipv4.ip_forward = 1
            EOF
          - sysctl --system
          - dnf install -y sshpass jq git tree vim tar curl >/dev/null 2>&1
          - echo 'root:qwe123' | chpasswd
          # The Vagrant-based study sets the password for "vagrant"; the later ssh-copy-id steps
          # log in as "rocky", so the password is set for that user here (lab-only password).
          - echo 'rocky:qwe123' | chpasswd
          - sed -i "s/^#PasswordAuthentication yes/PasswordAuthentication yes/g" /etc/ssh/sshd_config
          - echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
          - systemctl restart sshd
          - |
            cat << EOF >> /etc/hosts
            100.100.100.244 k8s-admin
            100.100.100.245 k8s-c1
            100.100.100.246 k8s-c2
            100.100.100.247 k8s-c3
            100.100.100.248 k8s-w1
            100.100.100.249 k8s-w2
            EOF
  # [1] Admin node (20GB)
  k8s_admin:
    type: OS::Nova::Server
    properties:
      name: k8s-admin
      flavor: { get_param: flavor_admin }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.244 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 20, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
  # [2] Control plane 1 (80GB)
  k8s_c1:
    type: OS::Nova::Server
    properties:
      name: k8s-c1
      flavor: { get_param: flavor_ctrl }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.245 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 80, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
  # [3] Control plane 2 (80GB)
  k8s_c2:
    type: OS::Nova::Server
    properties:
      name: k8s-c2
      flavor: { get_param: flavor_ctrl }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.246 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 80, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
  # [4] Control plane 3 (80GB)
  k8s_c3:
    type: OS::Nova::Server
    properties:
      name: k8s-c3
      flavor: { get_param: flavor_ctrl }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.247 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 80, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
  # [5] Worker node 1 (40GB)
  k8s_w1:
    type: OS::Nova::Server
    properties:
      name: k8s-w1
      flavor: { get_param: flavor_worker }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.248 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 40, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
  # [6] Worker node 2 (40GB)
  k8s_w2:
    type: OS::Nova::Server
    properties:
      name: k8s-w2
      flavor: { get_param: flavor_worker }
      key_name: { get_param: key_name }
      networks: [{ subnet: { get_param: private_subnet }, fixed_ip: 100.100.100.249 }]
      block_device_mapping_v2: [{ boot_index: 0, image: rocky9, volume_size: 40, delete_on_termination: true }]
      user_data_format: RAW
      user_data: { get_resource: common_script }
Deploying k8s with Kubespray (Worker Client-Side LoadBalancing)
k8s-admin node setup
sudo -i
cat >/etc/hosts <<'EOF'
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
100.100.100.244 k8s-admin
100.100.100.245 k8s-c1
100.100.100.246 k8s-c2
100.100.100.247 k8s-c3
100.100.100.248 k8s-w1
100.100.100.249 k8s-w2
EOF
Install required packages
sudo -i
dnf install -y git rsync gcc python3 python3-pip python3-devel
dnf install -y nmap-ncat jq
Generate and distribute the SSH key
sudo -i
mkdir -p /root/.ssh && chmod 700 /root/.ssh
[ -f /root/.ssh/id_ed25519 ] || ssh-keygen -t ed25519 -N "" -f /root/.ssh/id_ed25519
NODES=(k8s-c1 k8s-c2 k8s-c3 k8s-w1 k8s-w2)
for h in "${NODES[@]}"; do
  ssh-copy-id -i /root/.ssh/id_ed25519.pub -o StrictHostKeyChecking=no rocky@"$h"
done
# Verify key-based login
for h in "${NODES[@]}"; do
  echo -n ">> $h : "
  ssh -o BatchMode=yes -i /root/.ssh/id_ed25519 rocky@"$h" "hostname; whoami" && echo "OK" || echo "FAIL"
done
# Check on k8s-c1~c3 (control plane/etcd)
ssh rocky@k8s-c1 "python3 -V; sudo -n true && echo SUDO_OK"
ssh rocky@k8s-c2 "python3 -V; sudo -n true && echo SUDO_OK"
ssh rocky@k8s-c3 "python3 -V; sudo -n true && echo SUDO_OK"
# Check on k8s-w1~w2 (workers)
ssh rocky@k8s-w1 "python3 -V; sudo -n true && echo SUDO_OK"
ssh rocky@k8s-w2 "python3 -V; sudo -n true && echo SUDO_OK"
Kubespray installation / virtual environment
Kubespray clone + venv setup
sudo -i
mkdir -p ~/k8s-ha-kubespray && cd ~/k8s-ha-kubespray
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
# If you use python3.11: python3.11 -m venv venv
python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt
Writing the inventory
# Create the inventory
cd ~/k8s-ha-kubespray/kubespray
source venv/bin/activate
cp -rfp inventory/sample inventory/mycluster
# inventory.ini
cat > inventory/mycluster/inventory.ini <<'EOF'
[all]
k8s-c1 ansible_host=100.100.100.245 ip=100.100.100.245 access_ip=100.100.100.245
k8s-c2 ansible_host=100.100.100.246 ip=100.100.100.246 access_ip=100.100.100.246
k8s-c3 ansible_host=100.100.100.247 ip=100.100.100.247 access_ip=100.100.100.247
k8s-w1 ansible_host=100.100.100.248 ip=100.100.100.248 access_ip=100.100.100.248
k8s-w2 ansible_host=100.100.100.249 ip=100.100.100.249 access_ip=100.100.100.249
[kube_control_plane]
k8s-c1
k8s-c2
k8s-c3
[etcd]
k8s-c1
k8s-c2
k8s-c3
[kube_node]
k8s-w1
k8s-w2
[k8s_cluster:children]
kube_control_plane
kube_node
[all:vars]
ansible_user=rocky
EOF
# Verify
ansible-inventory -i inventory/mycluster/inventory.ini --host k8s-c1 \
| grep -Ei 'ansible_host|ip|access_ip|ansible_user'
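The inventory above places three hosts in `[etcd]`. etcd needs a majority of its members to accept writes, so an odd member count is the usual choice. As a rough sanity check before deploying, the sketch below counts the hosts under one group header of an INI-style inventory. `count_group_hosts` and `is_odd` are hypothetical helpers written for this post, not part of Kubespray or Ansible, and the parser only understands the simple layout used here.

```shell
# count_group_hosts FILE GROUP
# Prints how many host lines sit directly under the [GROUP] header.
count_group_hosts() {
  local file="$1" group="$2"
  awk -v g="[$group]" '
    /^\[/ { in_group = ($0 == g); next }      # any section header switches groups
    in_group && NF && $0 !~ /^#/ { n++ }      # count non-empty, non-comment lines
    END { print n + 0 }
  ' "$file"
}

# is_odd N: succeeds when N is odd.
is_odd() {
  [ $(( $1 % 2 )) -eq 1 ]
}
```

For example, `is_odd "$(count_group_hosts inventory/mycluster/inventory.ini etcd)" && echo "odd etcd member count"` would report on the inventory written above. Groups declared with `:children` are not expanded by this sketch.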
Tuning cluster variables (Addons & Network)
cd /root/kubespray/ 2>/dev/null || cd ~/k8s-ha-kubespray/kubespray
source venv/bin/activate
# Change the defaults (owner, plugin, proxy mode, etc.)
sed -i 's|kube_owner: kube|kube_owner: root|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_network_plugin: calico|kube_network_plugin: flannel|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_proxy_mode: ipvs|kube_proxy_mode: iptables|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|enable_nodelocaldns: true|enable_nodelocaldns: false|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
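# [Important] Register the external L4 VIP (PAS-KS) as the API server load balancer
# -> the VIP is included in the certificate SANs, and kubectl can talk to the API through the L4
# (guard against duplicates: nothing is appended if the key already exists)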
grep -q '^apiserver_loadbalancer_domain_name:' inventory/mycluster/group_vars/all/all.yml \
|| echo 'apiserver_loadbalancer_domain_name: "PAS-KS"' >> inventory/mycluster/group_vars/all/all.yml
grep -q '^loadbalancer_apiserver:' inventory/mycluster/group_vars/all/all.yml \
|| echo 'loadbalancer_apiserver: { address: 192.168.7.222, port: 6443 }' >> inventory/mycluster/group_vars/all/all.yml
# Set the Flannel interface (usually eth0; check the actual interface name on the nodes)
# (guard against duplicates)
grep -q '^flannel_interface:' inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml \
|| echo "flannel_interface: eth0" >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# Enable Metrics Server and trim its resource requests
sed -i 's|metrics_server_enabled: false|metrics_server_enabled: true|g' inventory/mycluster/group_vars/k8s_cluster/addons.yml
grep -q '^metrics_server_requests_cpu:' inventory/mycluster/group_vars/k8s_cluster/addons.yml \
|| echo "metrics_server_requests_cpu: 25m" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
grep -q '^metrics_server_requests_memory:' inventory/mycluster/group_vars/k8s_cluster/addons.yml \
|| echo "metrics_server_requests_memory: 16Mi" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
# Check the applied values
grep -iE 'kube_owner:|kube_network_plugin:|kube_proxy_mode:|enable_nodelocaldns:' \
inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
egrep -n 'apiserver_loadbalancer_domain_name|loadbalancer_apiserver' \
inventory/mycluster/group_vars/all/all.yml
grep -n '^flannel_interface:' inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
egrep -n 'metrics_server_enabled|metrics_server_requests_' \
inventory/mycluster/group_vars/k8s_cluster/addons.yml
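The `grep -q ... || echo ... >>` pairs above all follow the same pattern: append a line only when the key is not already set, so the commands can be re-run safely. A minimal sketch of that pattern as a reusable function is shown below. `ensure_line` is a hypothetical helper name, and the key is used as a plain regular expression, which is fine for keys made of letters and underscores like the ones in this post.

```shell
# ensure_line FILE KEY LINE
# Appends LINE to FILE unless some line already starts with "KEY:".
# An existing value is left untouched; this helper never overwrites.
ensure_line() {
  local file="$1" key="$2" line="$3"
  grep -q "^${key}:" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

With it, the Flannel step could be written as `ensure_line inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml flannel_interface "flannel_interface: eth0"`. Because an existing key wins, changing a value still means editing the file by hand or with `sed`.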
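VIP (PAS-KS) integration settings
# When using the L4 switch as the VIP entry point
# Note: if the guarded grep/echo lines above already added these keys, appending them again leaves
# duplicate YAML keys in all.yml (Ansible warns and uses the last value), so run only one of the two.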
cat >> inventory/mycluster/group_vars/all/all.yml <<'EOF'
apiserver_loadbalancer_domain_name: "PAS-KS"
loadbalancer_apiserver:
address: 192.168.7.221
port: 6443
EOF
Virtualized L4 switch (PAS-K) configuration
# L4 switch configuration
# Real servers : 3 (control-plane)
# LB method : Round-Robin
# port : 6443
# vip : 192.168.7.221
test-L4# show run brief slb k8s_cluster
!
! Application switch configuration (v2.2.7.3.1)
! 2026/02/06 16:08:16
!
! Slb configuration
!
slb k8s_cluster
nat-mode lan-to-lan
health-check 100
! id 100, type tcp, port 6443
vip 192.168.7.221 protocol tcp vport 6443
lan-to-lan 0.0.0.0/0
apply
filter 1
protocol tcp
dip 192.168.7.221/32
dport 6443
apply
real 245
! id 245, rip 192.168.7.245
real 246
! id 246, rip 192.168.7.246
real 247
! id 247, rip 192.168.7.247
apply
exit
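The SLB configuration above balances round-robin across three real servers and attaches a TCP health check on port 6443. As a way to picture what that combination does, here is a simplified model in shell: pick the next server in order, skipping any that the health check has marked down. This is only an illustration under that assumption; `next_upstream` is a hypothetical function and says nothing about how PAS-K implements its scheduler internally.

```shell
# next_upstream START STATE...
# STATE is "up" or "down" for each real server, in configuration order.
# Prints the index of the first "up" server at or after START (wrapping around),
# or returns 1 when every server is down.
next_upstream() {
  local start="$1"; shift
  local n="$#" pos=0 first state
  [ "$n" -gt 0 ] || return 1
  first=$(( start % n ))
  # Walk the list twice so the search can wrap past the last entry.
  for state in "$@" "$@"; do
    if [ "$pos" -ge "$first" ] && [ "$pos" -lt $(( first + n )) ] && [ "$state" = "up" ]; then
      echo $(( pos % n ))
      return 0
    fi
    pos=$(( pos + 1 ))
  done
  return 1
}

# Real servers 245, 246, 247 with the first one failing its health check:
next_upstream 0 down up up   # prints 1, i.e. the second real server
```

In this model a failed control plane node simply drops out of the rotation, which is the behaviour an external LB in front of three API servers is expected to provide.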
Final connectivity check before deployment
# ping test
cd ~/k8s-ha-kubespray/kubespray
source venv/bin/activate
ansible -i inventory/mycluster/inventory.ini all -u rocky -b -m ping
Start the Kubespray deployment
cd ~/k8s-ha-kubespray/kubespray
source venv/bin/activate
# List the tasks before deploying
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --list-tasks
# Run the actual installation (based on v1.32.9)
ANSIBLE_FORCE_COLOR=true \
ansible-playbook -i inventory/mycluster/inventory.ini -b -v cluster.yml \
-e kube_version="1.32.9" | tee kubespray_install.log
Post-deployment verification
# 1. Check an API call (through the PAS-KS L4 VIP)
echo ">> [Checking K8S API via PAS-KS L4] <<"
curl -sk https://192.168.7.222:6443/version | grep Version
# 2. Set up credentials (run on the admin node)
mkdir -p /root/.kube
scp k8s-c1:/root/.kube/config /root/.kube/config
# 3. Pin the API server address to the L4 VIP (for high availability)
sed -i 's/127.0.0.1/192.168.7.222/g' /root/.kube/config
kubectl cluster-info
# 4. Check node status
kubectl get nodes -owide
# 5. Check etcd status (on c1~c3)
for i in {1..3}; do
  echo ">> k8s-c$i <<";
  ssh k8s-c$i etcdctl.sh endpoint status -w table;
  echo;
done
# 6. Set up completion and aliases
cat <<EOF >> /etc/profile
source <(kubectl completion bash)
alias k=kubectl
alias kc=kubecolor
complete -F __start_kubectl k
EOF
source /etc/profile
Cluster status

Checking the cluster service on the PAS-K L4 switch

(Worker Client-Side LoadBalancing) Deploying k8s with Kubespray
Pre-configuration (syncing the SSH key settings)
NODES="k8s-c1 k8s-c2 k8s-c3 k8s-w1 k8s-w2"
for n in $NODES; do
echo "===== APPLY: $n ====="
ssh root@"$n" 'set -e
# 1) Create the script
cat > /usr/local/sbin/fix-root-authorizedkeys.sh <<'"'"'EOF'"'"'
#!/usr/bin/env bash
set -euo pipefail
install -d -m 700 -o root -g root /root/.ssh
cat > /root/.ssh/authorized_keys <<'"'"'EOK'"'"'
no-port-forwarding,no-agent-forwarding,no-X11-forwarding,command="echo '"'"'Please login as the user \"rocky\" rather than the user \"root\".'"'"';echo;sleep 10;exit 142" ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCs1SkMGLxNzWgK75TDiII1KWqNfLHWk5X792nNCuAM3zjMvSiE2T9rEpM9Gko3gyxIaInCsmqAHqyqFX5LP6txy+dZEPusKKSDZvbWFcg/8tyuYXR0gC4G8tR/PEmtEh5RoGJWQKXNMzN1NXdW8X1saaRP1Usxl9pX0/nFKyMlD6Fn9VR1dMqEUfybENTHSxeMgkCi3agvprtbgfsBUxvX87n5G967sZWDTgAgf2/+MKPfY4xEPcfZI0efE/YfDotbKP2QANtzEeXDb986G72cy4r8lIGjowmrrDyYmCU9vkKsDwP5tq4Fwi0wuCs2QXUbK0PVMBKEx+3iWDZe0bcr Generated-by-Nova
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFRGOoRW7879SCsJIbqmYEKDpa6jJNBla2FgEmrrlpcR root@k8s-admin.novalocal
EOK
chmod 600 /root/.ssh/authorized_keys
chown root:root /root/.ssh/authorized_keys
restorecon -Rv /root/.ssh >/dev/null 2>&1 || true
EOF
chmod +x /usr/local/sbin/fix-root-authorizedkeys.sh
# 2) systemd service
cat > /etc/systemd/system/fix-root-authorizedkeys.service <<'"'"'EOF'"'"'
[Unit]
Description=Force /root/.ssh/authorized_keys to lab value
After=network-online.target sshd.service
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/fix-root-authorizedkeys.sh
[Install]
WantedBy=multi-user.target
EOF
# 3) systemd timer (in case the file gets reverted)
cat > /etc/systemd/system/fix-root-authorizedkeys.timer <<'"'"'EOF'"'"'
[Unit]
Description=Periodically enforce /root/.ssh/authorized_keys
[Timer]
OnBootSec=10
OnUnitActiveSec=30
Unit=fix-root-authorizedkeys.service
[Install]
WantedBy=timers.target
EOF
# 4) enable + run
systemctl daemon-reload
systemctl enable --now fix-root-authorizedkeys.service
systemctl enable --now fix-root-authorizedkeys.timer
/usr/local/sbin/fix-root-authorizedkeys.sh
# 5) verify
echo "### verify on $(hostname)"
nl -ba /root/.ssh/authorized_keys
'
done
Worker node deployment
cd /root/k8s-ha-kubespray/kubespray
source venv/bin/activate
pwd
# /root/k8s-ha-kubespray/kubespray
# Check the inventory directory
tree inventory/mycluster/
# inventory.ini
cat inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-c1 ansible_host=100.100.100.245 ip=100.100.100.245 etcd_member_name=etcd1
k8s-c2 ansible_host=100.100.100.246 ip=100.100.100.246 etcd_member_name=etcd2
k8s-c3 ansible_host=100.100.100.247 ip=100.100.100.247 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-w1 ansible_host=100.100.100.248 ip=100.100.100.248
k8s-w2 ansible_host=100.100.100.249 ip=100.100.100.249
# Check
ansible-inventory -i inventory/mycluster/inventory.ini --graph
# Check that the workers are reachable over SSH
ssh rocky@100.100.100.248 "hostname; id"
ssh rocky@100.100.100.249 "hostname; id"
# Add ansible_user to the worker lines in inventory.ini, then do a final check with an Ansible ping
ansible -i inventory/mycluster/inventory.ini kube_node -m ping -b -o
# k8s-cluster.yml
grep -iE 'kube_owner:|kube_network_plugin:|kube_proxy_mode:|enable_nodelocaldns:|enable_dns_autoscaler:' \
inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# flannel interface
grep -n "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# List the tasks before deploying
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --list-tasks
# Install only the workers
ANSIBLE_FORCE_COLOR=true \
ansible-playbook -i inventory/mycluster/inventory.ini -b -v cluster.yml \
-e kube_version="1.32.9" \
--limit kube_node | tee kubespray_worker_install.log
Verify the installation
kubectl get node -owide
kubectl describe node k8s-w1 | egrep -i 'Ready|Taints|NetworkUnavailable|Conditions'
kubectl describe node k8s-w2 | egrep -i 'Ready|Taints|NetworkUnavailable|Conditions'
kubectl get pod -A -owide | head -n 80
Kubespray variable precedence structure and searching
- Ansible variable precedence : the higher the number, the higher the priority - Ansible_Docs
1. Command-line values (for example, -u my_user, these are not variables)
2. Role defaults (as defined in Role directory structure)
3. Inventory file or script group vars
4. Inventory group_vars/all
5. Playbook group_vars/all
6. Inventory group_vars/*
7. Playbook group_vars/*
8. Inventory file or script host vars
9. Inventory host_vars/*
10. Playbook host_vars/*
11. Host facts and cached set_facts
12. Play vars
13. Play vars_prompt
14. Play vars_files
15. Role vars (as defined in Role directory structure)
16. Block vars (for tasks in block only)
17. Task vars (for the task only)
18. include_vars
19. Registered vars and set_facts
20. Role (and include_role) params
21. include params
22. Extra vars (for example, -e "user=my_user") (always win precedence)
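The rule behind the list above is "the last source that defines a variable wins". The sketch below models just that rule in shell, with the sources passed from lowest to highest precedence. `resolve_var` and the source names are hypothetical and only illustrate the ordering; Ansible's real resolution also covers scoping and merging behaviour that this ignores.

```shell
# resolve_var "SOURCE=VALUE"...
# Arguments are ordered from lowest to highest precedence. Use "SOURCE=" for a
# source that does not define the variable. Prints the winning value and source.
resolve_var() {
  local layer winner="" value=""
  for layer in "$@"; do
    if [ -n "${layer#*=}" ]; then     # text after the first "=" is the value
      winner="${layer%%=*}"           # text before the first "=" is the source
      value="${layer#*=}"
    fi
  done
  printf '%s (from %s)\n' "$value" "$winner"
}

# kube_network_plugin: role default is calico, group_vars sets flannel, nothing higher sets it.
resolve_var "role_defaults=calico" "group_vars=flannel" "host_vars=" "extra_vars="
# prints: flannel (from group_vars)
```

Passing `-e kube_version=...` on the command line corresponds to filling in the last argument, which is why it overrides everything else.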
# Example) search for where a given variable is declared and used
grep -Rn "allow_unsupported_distribution_setup" inventory/mycluster/ playbooks/ roles/ -A1 -B1
inventory/mycluster/group_vars/all/all.yml-141-## If enabled it will allow kubespray to attempt setup even if the distribution is not supported. For unsupported distributions this can lead to unexpected failures in some cases.
inventory/mycluster/group_vars/all/all.yml:142:allow_unsupported_distribution_setup: false
--
roles/kubernetes/preinstall/tasks/0040-verify-settings.yml-22- assert:
roles/kubernetes/preinstall/tasks/0040-verify-settings.yml:23: that: (allow_unsupported_distribution_setup | default(false)) or ansible_distribution in supported_os_distributions
roles/kubernetes/preinstall/tasks/0040-verify-settings.yml-24- msg: "{{ ansible_distribution }} is not a known OS"
# Kubespray variable precedence structure (Override Flow)
[ Low precedence ]
┌─────────────────────────────────────────────┐
│ roles/*/defaults/main.yml │ ← Kubespray role defaults
│ (e.g. bin_dir, kube_version defaults) │
└─────────────────────────────────────────────┘
⬇ override
┌─────────────────────────────────────────────┐
│ inventory/mycluster/group_vars/all/*.yml │ ← settings common to all nodes # 99% of tuning happens here
│ inventory/mycluster/group_vars/k8s_cluster/*.yml
│ inventory/mycluster/group_vars/etcd.yml │
└─────────────────────────────────────────────┘
⬇ override
┌─────────────────────────────────────────────┐
│ inventory/mycluster/host_vars/<node>.yml │ ← applies to one node only # when a single node must differ
└─────────────────────────────────────────────┘
⬇ override
┌─────────────────────────────────────────────┐
│ playbook vars (vars:, vars_files:) │ ← vars inside reset.yml / cluster.yml # when declared in the playbook being run
└─────────────────────────────────────────────┘
⬇ override
┌─────────────────────────────────────────────┐
│ roles/*/vars/main.yml │ ← role-internal forced variables; these override inventory and play vars (rarely touched)
└─────────────────────────────────────────────┘
⬇ override
┌─────────────────────────────────────────────┐
│ --extra-vars (-e) │ ← value given on the CLI (always wins)
│ ex) -e kube_version=v1.29.3 │
└─────────────────────────────────────────────┘
[ High precedence ]
# Settings common to all nodes (example) : 99% of tuning happens here
inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_version: v1.29.3
kube_network_plugin: cilium
# Applies to one node only (example) : when a single node must differ
inventory/mycluster/host_vars/k8s-ctr1.yml
node_labels:
  node-role.kubernetes.io/control-plane: "true"
# Declared in the playbook being run (example) : a "local variable override" that applies only while this playbook is imported
cat playbooks/scale.yml | grep 'Install etcd' -A5
- name: Install etcd
  vars: # higher precedence than inventory/group_vars, valid only within the scope of this playbook import
    etcd_cluster_setup: false # disable the bootstrap logic for a new etcd cluster
    etcd_events_cluster_setup: false # do not set up a dedicated events etcd cluster
  import_playbook: install_etcd.yml # run the separate playbook install_etcd.yml at this point
# Example) search in order to skip installing the dns autoscaler
grep -Rni "autoscaler" inventory/mycluster/ playbooks/ roles/ -A2 -B1
grep -Rni "autoscaler" inventory/mycluster/ playbooks/ roles/ --include="*.yml" -A2 -B1
roles/kubespray_defaults/defaults/main/main.yml:130:# Enable dns autoscaler
roles/kubespray_defaults/defaults/main/main.yml:131:enable_dns_autoscaler: true
...
Installing kube-ops-view
# kube-ops-view
## helm show values geek-cookbook/kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
# macOS users
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system \
--set image.repository="abihf/kube-ops-view" --set image.tag="latest"
# Windows users
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system
# Verify the installation
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view
# Check the kube-ops-view URL (scale 1.5, 2) : since this is a NodePort, the IP of any node works!
open "http://192.168.10.14:30000/#scale=1.5"
open "http://192.168.10.14:30000/#scale=2"

Deploying a sample application and calling it repeatedly
- Deploy the sample application
# Deploy the sample application
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webpod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webpod
  template:
    metadata:
      labels:
        app: webpod
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                # Must match this Deployment's own label (the original used sample-app,
                # which matches no pod and so never spreads the replicas).
                - webpod
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: webpod
        image: traefik/whoami
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webpod
  labels:
    app: webpod
spec:
  selector:
    app: webpod
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30003
  type: NodePort
EOF
- Repeated calls

# Check the deployment
kubectl get deploy,svc,ep webpod -owide
[admin-lb] # the IP depends on which node you target
while true; do curl -s http://192.168.10.14:30003 | grep Hostname; sleep 1; done
# (Optional) Check calling the service by name from a k8s node
ssh k8s-node1 cat /etc/resolv.conf
# Generated by NetworkManager
search default.svc.cluster.local svc.cluster.local
nameserver 10.233.0.3
nameserver 168.126.63.1
nameserver 8.8.8.8
options ndots:2 timeout:2 attempts:2
# Succeeds
ssh k8s-node1 curl -s webpod -I
HTTP/1.1 200 OK
# Succeeds
ssh k8s-node1 curl -s webpod.default -I
HTTP/1.1 200 OK
# Fails
ssh k8s-node1 curl -s webpod.default.svc -I
ssh k8s-node1 curl -s webpod.default.svc.cluster -I
# Succeeds
ssh k8s-node1 curl -s webpod.default.svc.cluster.local -I
HTTP/1.1 200 OK
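Why do `webpod` and `webpod.default` resolve while `webpod.default.svc` does not? The `search` and `ndots:2` lines in `/etc/resolv.conf` decide which fully qualified names are tried. The sketch below lists those candidates in order, assuming the usual glibc-style rule: a name with at least `ndots` dots is tried as-is first, otherwise the search domains are tried first. `dns_candidates` is a hypothetical helper for illustration, and other resolver implementations may order things differently.

```shell
# dns_candidates NAME NDOTS SEARCH_DOMAIN...
# Prints, one per line, the names a resolver would try for NAME.
dns_candidates() {
  local name="$1" ndots="$2" dots domain
  shift 2
  # Count the dots in NAME (the arithmetic expansion strips wc's padding).
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then printf '%s\n' "$name"; fi
  for domain in "$@"; do printf '%s.%s\n' "$name" "$domain"; done
  if [ "$dots" -lt "$ndots" ]; then printf '%s\n' "$name"; fi
}

dns_candidates webpod.default 2 default.svc.cluster.local svc.cluster.local
dns_candidates webpod.default.svc 2 default.svc.cluster.local svc.cluster.local
```

For `webpod.default` the second candidate is `webpod.default.svc.cluster.local`, which exists. For `webpod.default.svc` none of the candidates is a real service name, because the search list on the node has no bare `cluster.local` entry to complete it.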
(Failure reproduction) Impact when control plane node 1 fails

# [admin-lb] Check the server entry used by the kubeconfig credentials
cat /root/.kube/config | grep server
server: https://192.168.10.11:6443
# Monitoring : open 4 new terminals
# ----------------------
## [admin-lb]
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
## [k8s-node2]
watch -d kubectl get pod -n kube-system
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
## [k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
# ----------------------
# Reproduce the failure
[k8s-node1] poweroff
# [k8s-node2]
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
2026/01/28 12:47:08 [error] 20#20: *3145 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:6443, upstream: "192.168.10.11:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
2026/01/28 12:47:08 [warn] 20#20: *3145 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: 127.0.0.1:6443, upstream: "192.168.10.11:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
# [k8s-node4] Two backend servers remain, so the requests below are still served normally!
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
"gitVersion": "v1.32.9",
# [admin-lb] The server entry in the credentials needs to be changed, as shown below
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
Unable to connect to the server: dial tcp 192.168.10.11:6443: connect: no route to host # << this one fails!
"gitVersion": "v1.32.9", # << this one succeeds!
sed -i 's/192.168.10.11/192.168.10.12/g' /root/.kube/config
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
NAME STATUS ROLES AGE VERSION
k8s-node1 NotReady control-plane 4h35m v1.32.9
k8s-node2 Ready control-plane 4h35m v1.32.9
k8s-node3 Ready control-plane 4h35m v1.32.9
k8s-node4 Ready <none> 4h34m v1.32.9
"gitVersion": "v1.32.9",
External LB → HA control plane nodes (3) : configuring k8s apiserver calls
#
curl -sk https://192.168.10.10:6443/version | grep gitVersion
"gitVersion": "v1.32.9",
#
sed -i 's/192.168.10.12/192.168.10.10/g' /root/.kube/config
# Check the certificate SAN list
kubectl get node
E0128 23:53:41.079370 70802 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.10.10:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 192.168.10.11, 127.0.0.1, ::1, 192.168.10.12, 192.168.10.13, 10.0.2.15, fd17:625c:f037:2:a00:27ff:fe90:eaeb, not 192.168.10.10"
# Check the SAN entries in the certificate
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
...
ssh k8s-node1 kubectl get cm -n kube-system kubeadm-config -o yaml
apiServer:
certSANs:
- kubernetes
- kubernetes.default
- kubernetes.default.svc
- kubernetes.default.svc.cluster.local
- 10.233.0.1
- localhost
- 127.0.0.1
- ::1
- k8s-node1
- k8s-node2
- k8s-node3
- lb-apiserver.kubernetes.local
- 192.168.10.11
- 192.168.10.12
- 192.168.10.13
- 10.0.2.15
- fd17:625c:f037:2:a00:27ff:fe90:eaeb
# Add the 'IP, Domain' to the certificate SANs
echo "supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]" >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "kube-apiserver" --list-tasks
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "kubeadm" --list-tasks
# ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "facts" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --list-tasks
...
play #10 (kube_control_plane): Install the control plane TAGS: []
tasks:
...
kubernetes/control-plane : Kubeadm | aggregate all SANs TAGS: [control-plane, facts]
...
# (New terminal) Monitoring
[k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date ; sleep 1; echo ; done
# Completes within 1 minute
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --limit kube_control_plane -e kube_version="1.32.9"
Gather minimal facts ------------------------------------------------------------------------------------------------------- 2.00s
kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN hosts --------------------------------------------------------- 1.57s
kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN IPs ----------------------------------------------------------- 1.33s
kubernetes/control-plane : Backup old certs and keys ----------------------------------------------------------------------- 1.26s
Gather necessary facts (hardware) ------------------------------------------------------------------------------------------ 0.98s
kubernetes/control-plane : Install | Copy kubectl binary from download dir ------------------------------------------------- 0.95s
kubernetes/preinstall : Create other directories of root owner ------------------------------------------------------------- 0.92s
win_nodes/kubernetes_patch : debug ----------------------------------------------------------------------------------------- 0.84s
kubernetes/control-plane : Backup old confs -------------------------------------------------------------------------------- 0.83s
kubernetes/control-plane : Update server field in component kubeconfigs ---------------------------------------------------- 0.78s
kubernetes/control-plane : Kubeadm | Create kubeadm config ----------------------------------------------------------------- 0.76s
kubernetes/preinstall : Create kubernetes directories ---------------------------------------------------------------------- 0.67s
kubernetes/control-plane : Kubeadm | regenerate apiserver cert 2/2 --------------------------------------------------------- 0.50s
kubernetes/control-plane : Renew K8S control plane certificates monthly 2/2 ------------------------------------------------ 0.46s
kubernetes/control-plane : Create kube-scheduler config -------------------------------------------------------------------- 0.41s
Gather necessary facts (network) ------------------------------------------------------------------------------------------- 0.38s
kubernetes/control-plane : Install script to renew K8S control plane certificates ------------------------------------------ 0.37s
kubernetes/control-plane : Kubeadm | regenerate apiserver cert 1/2 --------------------------------------------------------- 0.34s
kubernetes/control-plane : Kubeadm | aggregate all SANs -------------------------------------------------------------------- 0.29s
kubernetes/control-plane : Check which kube-control nodes are already members of the cluster ------------------------------- 0.28s
# Requests to the 192.168.10.10 endpoint now succeed!
kubectl get node -v=6
...
I0129 00:17:13.825729 81610 round_trippers.go:560] GET https://192.168.10.10:6443/api/v1/nodes?limit=500 200 OK in 8 milliseconds
NAME STATUS ROLES AGE VERSION
k8s-node1 Ready control-plane 7h v1.32.9
k8s-node2 Ready control-plane 7h v1.32.9
k8s-node3 Ready control-plane 7h v1.32.9
k8s-node4 Ready <none> 6h59m v1.32.9
# Check both the IP and the domain
sed -i 's/192.168.10.10/k8s-api-srv.admin-lb.com/g' /root/.kube/config
# Additional check
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
X509v3 Subject Alternative Name:
DNS:k8s-api-srv.admin-lb.com, DNS:k8s-node1, DNS:k8s-node2, DNS:k8s-node3, DNS:kubernetes, DNS:kubernetes.default,
DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-apiserver.kubernetes.local,
DNS:localhost, IP Address:10.233.0.1, IP Address:192.168.10.11, IP Address:127.0.0.1,
IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.10.10, IP Address:192.168.10.12, IP Address:192.168.10.13,
IP Address:10.0.2.15, IP Address:FD17:625C:F037:2:A00:27FF:FE90:EAEB
# This cm is not updated automatically after the initial install and is reportedly used during upgrades, so when the kubeadm config changes as above, update the cm by hand as well.
kubectl get cm -n kube-system kubeadm-config -o yaml
...
kubectl edit cm -n kube-system kubeadm-config # or k9s -> cm kube-system
...
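The x509 error earlier in this section came down to one question: is the address kubectl connects to listed in the API server certificate's SANs? A small sketch of that check is below, taking SAN text in the comma-separated form that `openssl x509 -text -noout` prints. `san_contains` is a hypothetical helper and does exact matching only, so wildcard DNS entries are not expanded.

```shell
# san_contains SAN_TEXT ENTRY
# Succeeds when ENTRY appears as an exact DNS or IP entry in SAN_TEXT.
san_contains() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
          -e 's/^DNS://' -e 's/^IP Address://' \
    | grep -Fxq -- "$2"
}

san='DNS:k8s-node1, DNS:localhost, IP Address:10.233.0.1, IP Address:192.168.10.11'
san_contains "$san" 192.168.10.10 && echo "VIP is covered" || echo "VIP is missing from the SANs"
```

With the sample SAN string the check reports the VIP as missing, which matches the state before `supplementary_addresses_in_ssl_keys` was added. The whole-line match (`grep -x`) matters here: a substring match would wrongly accept `192.168.10.1` because of `192.168.10.11`.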
노드 관리
노드 추가 : playbook(scale.yml), role(playbooks/scale.yml) , 기존 클러스터는 건드리지 않고, 새로 추가된 노드만 단계적으로 합류

#
cat scale.yml
---
- name: Scale the cluster
ansible.builtin.import_playbook: playbooks/scale.yml
cat playbooks/scale.yml
---
- name: Common tasks for every playbooks
import_playbook: boilerplate.yml
- name: Gather facts
import_playbook: internal_facts.yml
- name: Install etcd # 기존 etcd 클러스터는 변경하지 않음, 새 노드가 etcd 멤버일 경우에만 join
vars:
etcd_cluster_setup: false
etcd_events_cluster_setup: false
import_playbook: install_etcd.yml
- name: Download images to ansible host cache via first kube_control_plane node # download_run_once 설정이 되어 있다면, 첫 번째 control-plane 노드에서만 실행 : 이미지/바이너리 캐시를 ansible host에 적재
hosts: kube_control_plane[0]
gather_facts: false
any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults, when: "not skip_downloads and download_run_once and not download_localhost" }
- { role: kubernetes/preinstall, tags: preinstall, when: "not skip_downloads and download_run_once and not download_localhost" }
- { role: download, tags: download, when: "not skip_downloads and download_run_once and not download_localhost" }
- name: Target only workers to get kubelet installed and checking in on any new nodes(engine) # prepare the worker nodes
  hosts: kube_node
  gather_facts: false
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults }
    - { role: kubernetes/preinstall, tags: preinstall }
    - { role: container-engine, tags: "container-engine", when: deploy_container_engine }
    - { role: download, tags: download, when: "not skip_downloads" }
    - role: etcd # (conditional) when a network plugin such as Calico uses etcd directly, configure access so worker nodes can reach it too
      tags: etcd
      vars:
        etcd_cluster_setup: false
      when:
        - etcd_deployment_type != "kubeadm"
        - kube_network_plugin in ["calico", "flannel", "canal", "cilium"] or cilium_deploy_additionally | default(false) | bool
        - kube_network_plugin != "calico" or calico_datastore == "etcd"
- name: Target only workers to get kubelet installed and checking in on any new nodes(node) # install kubelet
  hosts: kube_node
  gather_facts: false
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults }
    - { role: kubernetes/node, tags: node } # installs kubelet and registers it with systemd; the node has not joined the cluster yet
- name: Upload control plane certs and retrieve encryption key # share the kubeadm certificates
  ## Uploads the certificates via kubeadm so that new nodes can join the cluster safely, then extracts the certificate_key needed for the join and stores it as a variable.
  hosts: kube_control_plane | first # target: the first control plane node
  environment: "{{ proxy_disable_env }}"
  gather_facts: false
  tags: kubeadm
  roles:
    - { role: kubespray_defaults }
  tasks:
    # Equivalent to: kubeadm init phase upload-certs --upload-certs
    - name: Upload control plane certificates
      command: >-
        {{ bin_dir }}/kubeadm init phase
        --config {{ kube_config_dir }}/kubeadm-config.yaml
        upload-certs
        --upload-certs
      environment: "{{ proxy_disable_env }}"
      register: kubeadm_upload_cert
      changed_when: false
    - name: Set fact 'kubeadm_certificate_key' for later use
      set_fact:
        kubeadm_certificate_key: "{{ kubeadm_upload_cert.stdout_lines[-1] | trim }}"
      when: kubeadm_certificate_key is not defined
- name: Target only workers to get kubelet installed and checking in on any new nodes(network) # cluster join and network setup
  hosts: kube_node
  gather_facts: false
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults }
    - { role: kubernetes/kubeadm, tags: kubeadm } # runs kubeadm join on the new worker node to register it with the cluster
    - { role: kubernetes/node-label, tags: node-label } # applies the labels and taints configured for the node
    - { role: kubernetes/node-taint, tags: node-taint } # same as above
    - { role: network_plugin, tags: network } # applies the CNI (Calico, Flannel, etc.) configuration so nodes can communicate with each other
- name: Apply resolv.conf changes now that cluster DNS is up # DNS configuration
  hosts: k8s_cluster
  gather_facts: false
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults }
    - { role: kubernetes/preinstall, when: "dns_mode != 'none' and resolvconf_mode == 'host_resolvconf'", tags: resolvconf, dns_late: true }
    # resolvconf: now that the in-cluster DNS (CoreDNS, etc.) is up, update /etc/resolv.conf on each node so the nodes can resolve internal domains.
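The `Set fact 'kubeadm_certificate_key'` task in the listing above simply takes the last line of the `upload-certs` output and trims it. Here is a minimal Python sketch of that extraction, assuming the command prints the key on its final line. The sample output and the key value are made-up placeholders, not captured from this cluster.

```python
# Sketch of `kubeadm_upload_cert.stdout_lines[-1] | trim` from the play above.
# SAMPLE_STDOUT is illustrative; the key is a placeholder, not real output.
SAMPLE_STDOUT = (
    '[upload-certs] Storing the certificates in Secret "kubeadm-certs" '
    'in the "kube-system" Namespace\n'
    "[upload-certs] Using certificate key:\n"
    "  0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef  \n"
)


def extract_certificate_key(stdout: str) -> str:
    """Return the last line of the command output with whitespace trimmed."""
    lines = stdout.splitlines()
    if not lines:
        raise ValueError("upload-certs produced no output")
    return lines[-1].strip()


if __name__ == "__main__":
    print(extract_certificate_key(SAMPLE_STDOUT))
```

Because the play only sets the fact `when: kubeadm_certificate_key is not defined`, a key passed in from outside would take precedence over the extracted one.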
Adding node k8s-node5: takes about 3 minutes
# Edit inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
@all:
|--@ungrouped:
|--@etcd:
| |--@kube_control_plane:
| | |--k8s-node1
| | |--k8s-node2
| | |--k8s-node3
|--@kube_node:
| |--k8s-node4
| |--k8s-node5
# Check Ansible connectivity
ansible -i inventory/mycluster/inventory.ini k8s-node5 -m ping
# Monitoring
watch -d kubectl get node
kube-ops-view
# Add the worker node: takes about 3 minutes
ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --list-tasks
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9" | tee kubespray_add_worker_node.log
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 48m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 48m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 48m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 47m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 66s v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
kubectl get pod -n kube-system -owide |grep k8s-node5
kube-flannel-ds-arm64-2djxl 1/1 Running 1 (80s ago) 114s 192.168.10.15 k8s-node5 <none> <none>
kube-proxy-x6cmm 1/1 Running 0 114s 192.168.10.15 k8s-node5 <none> <none>
nginx-proxy-k8s-node5 1/1 Running 0 113s 192.168.10.15 k8s-node5 <none> <none>
# Inspect what changed on the new node
ssh k8s-node5 tree /etc/kubernetes
ssh k8s-node5 tree /var/lib/kubelet
ssh k8s-node5 pstree -a
# Redistribute the sample pods
kubectl get pod -owide
kubectl scale deployment webpod --replicas 1
kubectl get pod -owide
kubectl scale deployment webpod --replicas 2
Removing a node: playbook (remove-node.yml), which imports playbooks/remove_node.yml. It removes a specific node safely (graceful removal).

#
cat remove-node.yml
---
- name: Remove node
  ansible.builtin.import_playbook: playbooks/remove_node.yml
#
cat playbooks/remove_node.yml
---
- name: Validate nodes for removal # enforces that the node(s) to delete were specified explicitly
  hosts: localhost
  gather_facts: false
  become: false
  tasks:
    - name: Assert that nodes are specified for removal
      assert:
        that:
          - node is defined
          - node | length > 0
        msg: "No nodes specified for removal. The `node` variable must be set explicitly."
- name: Common tasks for every playbooks # loads common settings (the preparation step shared by every playbook): Kubespray common variables, handlers, default settings, etc.
  import_playbook: boilerplate.yml
- name: Confirm node removal # asks the user for final confirmation before the node is deleted; the play continues only if the user types yes. To skip this step in automation, pass -e skip_confirmation=true.
  hosts: "{{ node | default('this_is_unreachable') }}"
  gather_facts: false
  tasks:
    - name: Confirm Execution
      pause:
        prompt: "Are you sure you want to delete nodes state? Type 'yes' to delete nodes."
      register: pause_result
      run_once: true # ask only once even when several nodes are targeted
      when:
        - not (skip_confirmation | default(false) | bool)
    - name: Fail if user does not confirm deletion
      fail:
        msg: "Delete nodes confirmation failed"
      when: pause_result.user_input | default('yes') != 'yes'
- name: Gather facts
  import_playbook: internal_facts.yml
  when: reset_nodes | default(True) | bool
- name: Reset node # the actual node removal
  hosts: "{{ node | default('this_is_unreachable') }}"
  gather_facts: false
  environment: "{{ proxy_disable_env }}"
  pre_tasks:
    - name: Gather information about installed services
      service_facts:
      when: reset_nodes | default(True) | bool
  roles:
    - { role: kubespray_defaults, when: reset_nodes | default(True) | bool } # loads default variables
    - { role: remove_node/pre_remove, tags: pre-remove } # cordons the node so nothing new is scheduled on it, drains its running pods to other nodes, and stops kubelet
    - role: remove-node/remove-etcd-node # if the node is an etcd member, removes it safely from the etcd cluster membership (preserving data integrity)
      when: "'etcd' in group_names"
    - { role: reset, tags: reset, when: reset_nodes | default(True) | bool } # removes the Kubernetes components installed on the node (kubeadm reset -f, binaries, configs, network interfaces) to return it to a clean state
# Currently cannot remove first control plane node or first etcd node # the first control plane node or the first etcd node cannot be removed with this playbook (risk of destroying the cluster)
- name: Post node removal # removes any leftover information about the removed node from the control plane
  hosts: "{{ node | default('this_is_unreachable') }}"
  gather_facts: false
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults, when: reset_nodes | default(True) | bool }
    - { role: remove-node/post-remove, tags: post-remove } # deletes the node object with kubectl delete node <node name>
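The confirmation step in the listing above is easy to misread, so here is a small Python model of its two tasks. It is only a sketch of the `when` conditions as written in the playbook; the function name and parameters are my own, not part of Kubespray.

```python
from typing import Optional


def removal_confirmed(skip_confirmation: bool, user_input: Optional[str]) -> bool:
    """Model the 'Confirm Execution' and 'Fail if user does not confirm' tasks.

    user_input is what the operator typed at the pause prompt. It is ignored
    when skip_confirmation is true, because the pause task is skipped then.
    """
    if skip_confirmation:
        # Pause skipped: pause_result has no user_input, so default('yes') applies.
        answer = "yes"
    else:
        answer = user_input if user_input is not None else "yes"
    # The fail task triggers for any answer other than the exact string 'yes'.
    return answer == "yes"


if __name__ == "__main__":
    print(removal_confirmed(False, "yes"))  # operator confirmed
    print(removal_confirmed(False, "no"))   # operator declined
    print(removal_confirmed(True, None))    # -e skip_confirmation=true
```

The comparison is an exact string match, so in this model an answer such as "Yes" or an empty line also aborts the run.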
Remove node k8s-node5 (about 2 minutes), then add it again (about 3 minutes)
# Set a PDB on the webpod deployment: with 2 replicas and maxUnavailable: 0, both Pods must stay Ready at all times, so drain/eviction cannot evict even a single Pod
kubectl scale deployment webpod --replicas 1
kubectl scale deployment webpod --replicas 2
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: webpod
  namespace: default
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: webpod
EOF
# Verify
kubectl get pdb
NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
webpod N/A 0 0 6s
# Removal fails
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml --list-tags
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
...
PLAY [Confirm node removal] *******************************************************************************************************
Thursday 29 January 2026 14:10:10 +0900 (0:00:00.106) 0:00:01.562 ******
[Confirm Execution]
Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
...
TASK [remove_node/pre_remove : Remove-node | List nodes] **************************************************************************
ok: [k8s-node5 -> k8s-node1(192.168.10.11)] => {"changed": false, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "get", "nodes", "-o", "go-template={{ range .items }}{{ .metadata.name }}{{ \"\\n\" }}{{ end }}"], "delta": "0:00:00.159970", "end": "2026-01-31 15:02:13.863633", "msg": "", "rc": 0, "start": "2026-01-31 15:02:13.703663", "stderr": "", "stderr_lines": [], "stdout": "k8s-node1\nk8s-node2\nk8s-node3\nk8s-node4\nk8s-node5", "stdout_lines": ["k8s-node1", "k8s-node2", "k8s-node3", "k8s-node4", "k8s-node5"]}
Saturday 31 January 2026 15:02:13 +0900 (0:00:00.552) 0:00:22.561 ******
FAILED - RETRYING: [k8s-node5 -> k8s-node1]: Remove-node | Drain node except daemonsets resource (3 retries left).
Cancel with CTRL+C
# Delete the PDB
kubectl delete pdb webpod
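The drain retries above come from the `ALLOWED DISRUPTIONS 0` value shown by `kubectl get pdb`. A rough Python sketch of how that number is derived for integer `maxUnavailable` / `minAvailable` values follows. It is a simplified model (percentages and unhealthy-pod policies are left out), and the function name is my own.

```python
from typing import Optional


def allowed_disruptions(
    expected_pods: int,
    healthy_pods: int,
    max_unavailable: Optional[int] = None,
    min_available: Optional[int] = None,
) -> int:
    """Return how many pods may be evicted right now under a PDB."""
    if (max_unavailable is None) == (min_available is None):
        raise ValueError("set exactly one of max_unavailable or min_available")
    if max_unavailable is not None:
        # Pods that must stay healthy = expected replicas minus the budget.
        desired_healthy = max(expected_pods - max_unavailable, 0)
    else:
        desired_healthy = min_available
    return max(healthy_pods - desired_healthy, 0)


if __name__ == "__main__":
    # webpod: 2 replicas, both Ready, maxUnavailable: 0
    print(allowed_disruptions(2, 2, max_unavailable=0))
```

With 2 healthy replicas and `maxUnavailable: 0` the budget is zero, so every eviction request from the drain is refused until the PDB is relaxed or deleted.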
# Retry the removal: takes about 2 minutes 20 seconds
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
...
PLAY [Confirm node removal] *******************************************************************************************************
Thursday 29 January 2026 14:10:10 +0900 (0:00:00.106) 0:00:01.562 ******
[Confirm Execution]
Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
...
# Verify
kubectl get node -owide
# Confirm the node was cleaned up
ssh k8s-node5 tree /etc/kubernetes
ssh k8s-node5 tree /var/lib/kubelet
ssh k8s-node5 pstree -a
# Edit inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
EOF
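Adding or removing a node boils down to adding or dropping one line under `[kube_node]` and rewriting the file. As a rough illustration, the inventory layout used above can be rendered from plain lists in Python. The helper name and the `etcdN` numbering rule are assumptions that simply mirror the heredoc in this post.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (hostname, IP address)


def render_inventory(control_plane: List[Node], workers: List[Node]) -> str:
    """Render an inventory.ini in the same layout as the heredoc above."""
    lines = ["[kube_control_plane]"]
    for index, (name, ip) in enumerate(control_plane, start=1):
        lines.append(f"{name} ansible_host={ip} ip={ip} etcd_member_name=etcd{index}")
    lines += ["[etcd:children]", "kube_control_plane", "[kube_node]"]
    for name, ip in workers:
        lines.append(f"{name} ansible_host={ip} ip={ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    control_plane = [
        ("k8s-node1", "192.168.10.11"),
        ("k8s-node2", "192.168.10.12"),
        ("k8s-node3", "192.168.10.13"),
    ]
    workers = [("k8s-node4", "192.168.10.14")]
    print(render_inventory(control_plane, workers), end="")
```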
Monitoring setup
Installing the NFS subdir external provisioner
# Install the NFS subdir external provisioner: an NFS server (/srv/nfs/share) is already configured on admin-lb
kubectl create ns nfs-provisioner
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -n nfs-provisioner \
--set nfs.server=192.168.10.10 \
--set nfs.path=/srv/nfs/share \
--set storageClass.defaultClass=true
# Check the storage class
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client (default) cluster.local/nfs-provisioner-nfs-subdir-external-provisioner Delete Immediate true 30s
# Check the pod
kubectl get pod -n nfs-provisioner -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nfs-provisioner-nfs-subdir-external-provisioner-b549b9dff-b2bsn 1/1 Running 0 57s 10.244.1.4 k8s-w1 <none> <none>
Installing kube-prometheus-stack and adding dashboards
# Add the repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Create the values file
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    scrapeInterval: "20s"
    evaluationInterval: "20s"
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
    additionalScrapeConfigs:
      - job_name: 'haproxy-metrics'
        static_configs:
          - targets:
              - '192.168.10.10:8405'
    externalLabels:
      cluster: "myk8s-cluster"
  service:
    type: NodePort
    nodePort: 30001
grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator
  service:
    type: NodePort
    nodePort: 30002
alertmanager:
  enabled: false
defaultRules:
  create: false
kubeProxy:
  enabled: false
prometheus-windows-exporter:
  prometheus:
    monitor:
      enabled: false
EOT
cat monitor-values.yaml
# Deploy
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 80.13.3 \
-f monitor-values.yaml --create-namespace --namespace monitoring
# Verify
helm list -n monitoring
kubectl get pod,svc,ingress,pvc -n monitoring
kubectl get prometheus,servicemonitors,alertmanagers -n monitoring
kubectl get crd | grep monitoring
# Open each web UI via NodePort
open http://192.168.10.14:30001 # prometheus
open http://192.168.10.14:30002 # grafana: login admin / prom-operator
# Check the Prometheus version
kubectl exec -it sts/prometheus-kube-prometheus-stack-prometheus -n monitoring -c prometheus -- prometheus --version
prometheus, version 3.9.1
# Check the Grafana version
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- grafana --version
grafana version 12.3.1
Grafana dashboards: 15661, 12693, and https://github.com/dotdc/grafana-dashboards-kubernetes
# Download the dashboards
curl -o 12693_rev12.json https://grafana.com/api/dashboards/12693/revisions/12/download
curl -o 15661_rev2.json https://grafana.com/api/dashboards/15661/revisions/2/download
curl -o k8s-system-api-server.json https://raw.githubusercontent.com/dotdc/grafana-dashboards-kubernetes/refs/heads/master/dashboards/k8s-system-api-server.json
# Bulk-replace the datasource uid with sed: use 'prometheus', the uid of the default datasource
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' 12693_rev12.json
sed -i -e 's/${DS__VICTORIAMETRICS-PROD-ALL}/prometheus/g' 15661_rev2.json
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' k8s-system-api-server.json
# Create the my-dashboard ConfigMap: the sidecar container in the Grafana pod detects the grafana_dashboard="1" label
kubectl create configmap my-dashboard --from-file=12693_rev12.json --from-file=15661_rev2.json --from-file=k8s-system-api-server.json -n monitoring
kubectl label configmap my-dashboard grafana_dashboard="1" -n monitoring
# Confirm the files were added to the dashboard directory
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- ls -l /tmp/dashboards
-rw-r--r-- 1 grafana 472 333790 Jan 22 06:27 12693_rev12.json
-rw-r--r-- 1 grafana 472 198839 Jan 22 06:27 15661_rev2.json
...
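The `sed` step above replaces a datasource placeholder inside each dashboard JSON. The same substitution can be sketched in Python with an extra check that the file is still valid JSON afterwards. The function name and the tiny sample dashboard are hypothetical; real dashboards are much larger.

```python
import json


def set_datasource_uid(dashboard_json: str, placeholder: str, uid: str = "prometheus") -> str:
    """Replace every occurrence of the placeholder and validate the result."""
    replaced = dashboard_json.replace(placeholder, uid)
    json.loads(replaced)  # raises ValueError if the result is no longer valid JSON
    return replaced


# Hypothetical minimal dashboard, only to exercise the function.
SAMPLE_DASHBOARD = (
    '{"panels": [{"datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}}]}'
)

if __name__ == "__main__":
    print(set_datasource_uid(SAMPLE_DASHBOARD, "${DS_PROMETHEUS}"))
```

A plain string replacement is used on purpose, so the placeholder needs no regex escaping.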

Upgrade
(Preliminary step) Upgrading the flannel CNI plugin
# Search for related variables
grep -Rni "flannel" inventory/mycluster/ playbooks/ roles/ --include="*.yml" -A2 -B1
...
roles/kubespray_defaults/defaults/main/download.yml:115:flannel_version: 0.27.3
roles/kubespray_defaults/defaults/main/download.yml:116:flannel_cni_version: 1.7.1-flannel1
roles/kubespray_defaults/defaults/main/download.yml:219:flannel_image_repo: "{{ docker_image_repo }}/flannel/flannel"
roles/kubespray_defaults/defaults/main/download.yml:220:flannel_image_tag: "v{{ flannel_version }}"
roles/kubespray_defaults/defaults/main/download.yml:221:flannel_init_image_repo: "{{ docker_image_repo }}/flannel/flannel-cni-plugin"
roles/kubespray_defaults/defaults/main/download.yml:222:flannel_init_image_tag: "v{{ flannel_cni_version }}"
# Check the current state
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 167m kube-flannel docker.io/flannel/flannel:v0.27.3 app=flannel
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# Pre-pull the images on the node: the play downloads the images before applying them, so this step is optional
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel:v0.27.4
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel-cni-plugin:v1.8.0-flannel1
# Update the flannel settings
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
flannel_version: 0.27.4
EOF
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# Monitoring
watch -d "ssh k8s-node3 crictl ps"
# flannel tag (Network plugin flannel) => every attempt below fails
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "cni,network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "cni,network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
## cordon -> apiserver pod recreated -> uncordon
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --limit k8s-node3 -e kube_version="1.32.9"
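# flannel runs as a DaemonSet, so the upgrade cannot be limited to a specific node -> in a sensitive cluster it is probably better to deploy and manage the CNI plugin separately from Kubespray and roll it out node by node.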
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" -e kube_version="1.32.9"
# Verify
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 3h27m kube-flannel docker.io/flannel/flannel:v0.27.4 app=flannel
...
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
docker.io/flannel/flannel v0.27.4 7a52f3ae4ee60 33.2MB
kubectl get pod -n kube-system -l app=flannel -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-arm64-48r2f 1/1 Running 0 98s 192.168.10.11 k8s-node1 <none> <none>
kube-flannel-ds-arm64-hchn8 1/1 Running 0 108s 192.168.10.15 k8s-node5 <none> <none>
kube-flannel-ds-arm64-jbjw9 1/1 Running 0 2m13s 192.168.10.12 k8s-node2 <none> <none>
kube-flannel-ds-arm64-qf6q9 1/1 Running 0 112s 192.168.10.13 k8s-node3 <none> <none>
kube-flannel-ds-arm64-qtv2m 1/1 Running 0 2m2s 192.168.10.14 k8s-node4 <none> <none>
Official Kubespray upgrade documentation - Docs
- Unsafe upgrade (immediate, not graceful)
  - Uses cluster.yml
  - -e upgrade_cluster_setup=true : immediately migrates deployments such as kube-apiserver, which is otherwise done only during a graceful upgrade
- Graceful upgrade : supports cordon, drain, and uncordon of nodes; requires at least one kube_control_plane to be already deployed
  - Uses upgrade-cluster.yml
  - serial: 20% (default) : with the default, 20% of the worker nodes are upgraded at a time; setting it to 1 upgrades the worker nodes one by one
    - Ansible serial : runs the play on the given number or percentage of hosts before moving on to the next hosts - Docs
  - Pausing the upgrade
    - upgrade_node_confirm: true
      - Pauses the playbook before each node is upgraded.
      - The playbook resumes once you approve manually by typing "yes" in the terminal.
    - upgrade_node_pause_seconds: 60
      - Pauses the playbook for 60 seconds before each node is upgraded, then resumes automatically.
    - upgrade_node_post_upgrade_confirm: true
      - Pauses the playbook after each node is upgraded, before the node is uncordoned. The playbook resumes once you approve manually by typing "yes" in the terminal.
    - upgrade_node_post_upgrade_pause_seconds: 60
      - Pauses the playbook for 60 seconds after each node is upgraded, before the node is uncordoned, then resumes automatically.
    - upgrade_node_confirm: true
      - Pausing *before* each upgrade can be useful for inspecting the Pods running on that node or for doing manual work on the node
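To make the `serial` setting concrete, here is a small Python sketch of how hosts are split into batches. It assumes a percentage is taken against the total host count, rounded down, with a minimum of one host per batch; that is my understanding of Ansible's behavior, and the function itself is not Ansible code.

```python
from typing import List, Union


def serial_batches(hosts: List[str], serial: Union[int, str]) -> List[List[str]]:
    """Split hosts into the batches a play with this `serial` value would use."""
    total = len(hosts)
    if isinstance(serial, str) and serial.endswith("%"):
        # Percentage of all hosts, rounded down, but never less than 1.
        size = (int(serial[:-1]) * total // 100) or 1
    else:
        size = int(serial)
    if size <= 0:
        # Assumption: a non-positive value means "all hosts in one batch".
        return [list(hosts)]
    return [hosts[i:i + size] for i in range(0, total, size)]


if __name__ == "__main__":
    workers = [f"worker{n}" for n in range(1, 11)]
    for batch in serial_batches(workers, "20%"):
        print(batch)
```

With only two workers, 20% rounds down to zero and is bumped to one, so the nodes are still upgraded one at a time.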
# Run
# Refresh the facts cache for all nodes before using --limit
ansible-playbook playbooks/facts.yml -b -i inventory/sample/hosts.ini
# Control plane
ansible-playbook upgrade-cluster.yml -b -i inventory/sample/hosts.ini -e kube_version=1.20.7 --limit "kube_control_plane:etcd"
# Worker nodes
ansible-playbook upgrade-cluster.yml -b -i inventory/sample/hosts.ini -e kube_version=1.20.7 --limit "node4:node6:node7:node12"
ansible-playbook upgrade-cluster.yml -b -i inventory/sample/hosts.ini -e kube_version=1.20.7 --limit "node5*"
# Upgrade the worker nodes one at a time
ansible-playbook upgrade-cluster.yml -b -i inventory/sample/hosts.ini -e kube_version=1.20.7 -e "serial=1"
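The `--limit` values used above combine group names, host names, and wildcards separated by `:`. A simplified Python model of that expansion is sketched below. It covers only union and `*` wildcards (no `!`, `&`, or regex patterns), and the inventory is hypothetical rather than the one in this post.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def resolve_limit(pattern: str, groups: Dict[str, List[str]]) -> List[str]:
    """Expand a ':'-separated limit pattern into an ordered host list."""
    all_hosts: List[str] = []
    for members in groups.values():
        for host in members:
            if host not in all_hosts:
                all_hosts.append(host)

    selected: List[str] = []
    for part in pattern.split(":"):
        if part in groups:
            matched = groups[part]  # exact group name
        else:
            # Host name or shell-style wildcard such as node5*
            matched = [h for h in all_hosts if fnmatchcase(h, part)]
        for host in matched:
            if host not in selected:
                selected.append(host)
    return selected


# Hypothetical inventory, only for illustration.
GROUPS = {
    "kube_control_plane": ["node1", "node2", "node3"],
    "etcd": ["node1", "node2", "node3"],
    "kube_node": ["node4", "node5", "node50", "node6"],
}

if __name__ == "__main__":
    print(resolve_limit("kube_control_plane:etcd", GROUPS))
    print(resolve_limit("node5*", GROUPS))
```

Note that in this model a wildcard like `node5*` also matches `node50`, which is worth keeping in mind before limiting an upgrade run.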
Installation order
- Docker
- containerd
- etcd
- kubelet and kube-proxy
- Network plugin (e.g., Calico)
- kube-apiserver, kube-scheduler, and kube-controller-manager
- Add-ons (e.g., KubeDNS)