Kubernetes debugging commands and troubleshooting for the K3s cluster

Install

mkdir -p .claude/skills/k8s-debug && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/13379" && unzip -o skill.zip -d .claude/skills/k8s-debug && rm skill.zip

Installs to .claude/skills/k8s-debug

About this skill

🔧 Kubernetes Debugging

This skill provides debugging commands and troubleshooting guides for the K3s cluster on AWS.

Cluster Overview

Component    Details
Cloud        AWS (Tel Aviv region)
Instance     EC2 t3.small
Kubernetes   K3s
Ingress      NGINX Ingress Controller
TLS          cert-manager + Let's Encrypt
GitOps       ArgoCD

Quick Status Commands

Check All Resources

kubectl get all

Check Pods

# All pods
kubectl get pods

# With more details
kubectl get pods -o wide

# Watch for changes
kubectl get pods -w

Check Specific App Pods

kubectl get pods -l app=barak-web           # Frontend
kubectl get pods -l app=barak-web-backend   # Backend
kubectl get pods -l app=barak-web-db        # Database

Common Debugging Commands

View Pod Logs

# Frontend logs (with a label selector, only the last 10 lines per pod are shown unless --tail is set)
kubectl logs -l app=barak-web

# Backend logs
kubectl logs -l app=barak-web-backend

# Follow logs in real-time
kubectl logs -f -l app=barak-web-backend

# Last 100 lines
kubectl logs -l app=barak-web --tail=100
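
When logs are long, it can help to pipe them through a small filter. The helper below is a minimal sketch: the function name `errors_only` and the keyword pattern are assumptions, not part of the original skill, and should be adapted to the application's actual log format.

```shell
# Hypothetical helper: keep only log lines that look like errors.
# The keyword list is an assumption; adjust it to the app's log format.
errors_only() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -iE 'error|exception|fatal|panic' || true
}

# Usage (requires cluster access):
#   kubectl logs -l app=barak-web-backend --tail=500 --since=1h --prefix | errors_only
```

The `--since` and `--prefix` flags limit the time window and tag each line with its source pod, which is useful when the selector matches more than one pod.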

Describe Pod (for errors)

kubectl describe pod <pod-name>

# Example
kubectl describe pod barak-web-backend-xxx-xxx

Check Services

kubectl get svc

# Expected:
# barak-web-service         ClusterIP   80/TCP
# barak-web-backend-service ClusterIP   80/TCP
# barak-web-db-service      ClusterIP   5432/TCP

Check Ingress

kubectl get ingress

# Detailed
kubectl describe ingress barak-web-ingress

Check Secrets

kubectl get secrets

# View secret (values are base64-encoded, not encrypted)
kubectl get secret barak-web-secrets -o yaml
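
To read a single value rather than the whole YAML, the base64 field can be extracted and decoded. This is a sketch under assumptions: `decode_secret_value` is a hypothetical helper name, and `DATABASE_URL` is a placeholder key, since the original does not list the keys stored in `barak-web-secrets`.

```shell
# Decode a base64 value read from stdin (the encoding used in a Secret's .data).
decode_secret_value() {
  base64 --decode
}

# Usage (requires cluster access; DATABASE_URL is a hypothetical key name):
#   kubectl get secret barak-web-secrets -o jsonpath='{.data.DATABASE_URL}' | decode_secret_value
```

To see which keys exist without printing their values, `kubectl describe secret barak-web-secrets` lists each key with its size in bytes.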

Troubleshooting Scenarios

1. Pods Not Starting (CrashLoopBackOff)

# Check pod status
kubectl get pods

# If CrashLoopBackOff, check logs
kubectl logs <pod-name>

# Logs from the previous (crashed) container instance
kubectl logs <pod-name> --previous

# Check events
kubectl describe pod <pod-name>

Common Causes:

  • Missing secrets
  • Database connection failed
  • Image pull error (reported as ErrImagePull or ImagePullBackOff rather than CrashLoopBackOff)
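
The first step above (spotting pods that are not healthy) can be sketched as a filter over `kubectl get pods --no-headers`. The helper name `unhealthy_pods` is an assumption; it relies on the default column order NAME, READY, STATUS.

```shell
# Hypothetical helper: print pods that are not Running, or that are Running
# but have fewer ready containers than expected. Completed pods are skipped.
# Reads `kubectl get pods --no-headers` output from stdin.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")                 # READY column, e.g. "0/1"
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage (requires cluster access):
#   kubectl get pods --no-headers | unhealthy_pods
```

Each name it prints is a candidate for `kubectl describe pod` and `kubectl logs`.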

2. 502 Bad Gateway

# Check if pods are running
kubectl get pods

# Check ingress backend
kubectl describe ingress barak-web-ingress

# Check service endpoints
kubectl get endpoints

Common Causes:

  • Pod not ready
  • Service selector mismatch
  • Health check failing
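
A Service with no endpoints is a frequent reason for a 502 from the ingress. As a rough sketch, the filter below picks out such Services from `kubectl get endpoints --no-headers`; the helper name `services_without_endpoints` is an assumption.

```shell
# Hypothetical helper: print Services whose ENDPOINTS column is "<none>".
# Reads `kubectl get endpoints --no-headers` output from stdin.
services_without_endpoints() {
  awk '$2 == "<none>" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get endpoints --no-headers | services_without_endpoints
#
# To compare a Service selector with the pod labels:
#   kubectl get svc barak-web-backend-service -o jsonpath='{.spec.selector}'
#   kubectl get pods --show-labels
```

An empty endpoint list usually means the selector matches no pods, or the matching pods are not Ready.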

3. Database Connection Issues

# Check DB pod
kubectl get pods -l app=barak-web-db

# Check DB logs
kubectl logs -l app=barak-web-db

# Test connection from backend
kubectl exec -it <backend-pod> -- sh
# Then, inside the pod shell (requires nc in the image):
nc -zv barak-web-db-service 5432

4. TLS/Certificate Issues

# Check certificate
kubectl get certificate

# Check cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager

# Check secret
kubectl get secret barak-web-tls
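
The certificate check above can be narrowed to the certificates that are not ready. This is a sketch, assuming the default cert-manager column order NAME, READY, SECRET, AGE; `certs_not_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: print certificates whose READY column is not "True".
# Reads `kubectl get certificate --no-headers` output from stdin.
certs_not_ready() {
  awk '$2 != "True" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get certificate --no-headers | certs_not_ready
#
# Follow-up for a certificate that is not ready:
#   kubectl describe certificate <certificate-name>
#   kubectl get certificaterequest,order,challenge
#
# Expiry date of the certificate currently stored in the secret:
#   kubectl get secret barak-web-tls -o jsonpath='{.data.tls\.crt}' \
#     | base64 --decode | openssl x509 -noout -enddate
```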

5. ArgoCD Sync Issues

# Check ArgoCD app status
argocd app get barak-web

# Force sync
argocd app sync barak-web

# Check sync status
argocd app list

Useful One-Liners

# Restart deployment (trigger rollout)
kubectl rollout restart deployment/barak-web
kubectl rollout restart deployment/barak-web-backend

# Check rollout status
kubectl rollout status deployment/barak-web

# Scale deployment
kubectl scale deployment/barak-web --replicas=2

# Execute shell in pod
kubectl exec -it <pod-name> -- /bin/sh

# Port forward for local testing
kubectl port-forward svc/barak-web-service 8080:80
kubectl port-forward svc/barak-web-backend-service 3000:80

# Get pod resource usage
kubectl top pods

Check Resource Limits

# Current usage
kubectl top pods

# Configured limits (from app.yaml)
# Frontend: 64Mi-128Mi memory, 250m-500m CPU
# Backend: No limits configured (should add!)
# Database: No limits configured
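
On a t3.small, memory is usually the tighter constraint, so it may be worth flagging pods above a threshold. The sketch below assumes `kubectl top pods --no-headers` reports memory in Mi (lines with other units are ignored); `pods_over_memory` is a hypothetical helper name and the threshold is up to the caller.

```shell
# Hypothetical helper: print pods using more than the given number of Mi.
# Reads `kubectl top pods --no-headers` output (NAME CPU MEMORY) from stdin.
pods_over_memory() {
  limit_mi="$1"
  awk -v limit="$limit_mi" '$3 ~ /Mi$/ {
    mem = $3
    sub(/Mi$/, "", mem)                   # strip the unit, keep the number
    if (mem + 0 > limit + 0) print $1, $3
  }'
}

# Usage (requires cluster access and a metrics source such as metrics-server):
#   kubectl top pods --no-headers | pods_over_memory 100
```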

SSH to EC2 (If Needed)

ssh -i <key.pem> ec2-user@<ec2-public-ip>

# Then use kubectl as normal
kubectl get pods

Quick Health Check Checklist

  • All pods are Running (not CrashLoopBackOff or Pending)
  • Services have endpoints
  • Ingress has ADDRESS assigned
  • TLS certificate is valid
  • ArgoCD shows Synced status
  • Website loads at https://dogs.barakaloni.com
  • Admin dashboard loads at /admin
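
The checklist above can be partly automated with a small pass/fail wrapper. This is a minimal sketch, not part of the original skill: `check` and `failures` are hypothetical names, and the example commands in the comments assume the admin dashboard is served from the same host as the website.

```shell
# Count of failed checks, incremented by check().
failures=0

# Run a command quietly and report PASS or FAIL with a label.
check() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Usage (requires cluster and network access):
#   check "Frontend rollout complete" kubectl rollout status deployment/barak-web --timeout=30s
#   check "Backend rollout complete"  kubectl rollout status deployment/barak-web-backend --timeout=30s
#   check "Website loads"             curl -fsS https://dogs.barakaloni.com
#   check "Admin dashboard loads"     curl -fsS https://dogs.barakaloni.com/admin
#   [ "$failures" -eq 0 ]
```

Any command that exits non-zero on failure can be wrapped this way, so the remaining checklist items can be added as needed.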
