🧱 CI/CD Pipelines
-
Jenkins test stage fails
-
Answer: Check environment variables, script logic, and whether any dependencies are missing. Add `echo` statements for debugging.
-
-
GitLab pipeline stuck in pending
-
Answer: The runner may be offline or unregistered, or its tags may not match the tags in the job definition.
-
-
Secrets printing in logs
-
Answer: Mask variables, use credential stores, and avoid printing secret variables.
-
-
Old version deployed
-
Answer: Check source branch, Git tags, image tag in Docker, or pipeline caching issues.
-
-
Approval before deploy
-
Answer: Use the `input` step in Jenkins, or a manual approval gate in GitLab (`when: manual` jobs) or GitHub Actions (required reviewers on an environment).
-
-
Trigger build on tag
-
Answer: Configure the webhook or job trigger to match a tag pattern (e.g., `refs/tags/*`).
-
-
Build fails on one branch
-
Answer: Compare differences in branch-specific configs or pipeline YAML.
-
-
Skip tests on branches
-
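Answer: Add a branch-based condition to the test job. In GitLab CI, for example, a rule with `if: $CI_COMMIT_BRANCH != 'main'` and `when: never` skips the job on every branch except main.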
-
-
Scheduled pipeline not running
-
Answer: Check cron syntax, timezone, and whether the schedule is enabled.
-
-
Missing Maven in Jenkins
-
Answer: Install Maven via Jenkins global tools or Docker image with Maven.
-
☁️ Cloud Platforms
-
EC2 instance unreachable
-
Answer: Check security group, public IP, and SSH key.
-
-
Azure VM Terraform fails intermittently
-
Answer: Could be resource quota, rate limit, or dependency timing.
-
-
Restrict access to S3
-
Answer: Use bucket policies, IAM roles, and ACLs.
-
-
Blue-green in AWS
-
Answer: Use ELB with two target groups, switch traffic between them.
-
-
GCP Cloud Run 503
-
Answer: Check logs, start timeouts, and ensure correct container port is exposed.
-
-
Autoscale VMs
-
Answer: Use AWS Auto Scaling Groups or Azure VMSS.
-
-
Static IP in Azure
-
Answer: Define a public IP resource in Terraform and associate with the VM NIC.
-
-
Rotate cloud access keys
-
Answer: Follow IAM best practice: create a new key, update every system that uses the old one, confirm nothing still depends on it, then deactivate and delete the old key.
-
-
Recover deleted cloud resource
-
Answer: Use backups, snapshots, or reapply Terraform.
-
-
Azure user locked out
-
Answer: Check AAD MFA settings, RBAC, or reset password via admin.
-
Infrastructure as Code
-
Terraform state lock error
-
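Answer: Confirm that no other Terraform run is still active, then release the stale lock with `terraform force-unlock <LOCK_ID>`. Most backends do not expire locks on their own; `-lock-timeout` only makes Terraform retry for the given duration.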
-
-
Manual change in portal
-
Answer: Terraform will detect drift; reapply or import manually changed resource.
-
-
Resume Ansible playbook
-
Answer: Use `--start-at-task` to resume from a named task, or make tasks idempotent (for example with `when` conditions) so the whole playbook can be rerun safely.
-
-
Secure Terraform secrets
-
Answer: Use environment variables, `vault`, or `tfvars` files ignored in `.gitignore`.
-
-
Unexpected resource replacement
-
Answer: Check whether an immutable field in the config, such as `name` or `region`, has changed; the plan output marks such changes with "forces replacement".
-
-
Rollback infra version
-
Answer: Keep `.tf` files in version control, revert to the last working version, and apply again.
-
-
Pushed tfstate to Git
-
Answer: Remove it from the repo, rotate any secrets it exposed, and add it to `.gitignore`.
-
-
terraform taint vs destroy
-
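Answer: `taint` marks a resource to be destroyed and recreated on the next apply; `destroy` removes it. Since Terraform v0.15.2, `taint` is deprecated in favor of `terraform apply -replace=<address>`.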
-
-
Multi-env with Terraform
-
Answer: Use workspaces or directory-based separation with separate state files.
-
-
New module not picked up
-
Answer: Run `terraform get -update=true` (or `terraform init -upgrade`) and make sure the module source path is correct.
-
📦 Docker & Containers
-
Container won’t start
-
Answer: Check logs, entrypoint errors, or port conflicts.
-
-
App crashes in container
-
Answer: Missing dependencies, wrong base image, or environment mismatch.
-
-
Reduce image size
-
Answer: Use smaller base images (alpine), multi-stage builds, clean up cache.
-
-
Port in use error
-
Answer: Use a different port or stop the conflicting service.
-
-
Share data between containers
-
Answer: Use Docker volumes or bind mounts.
-
-
ENTRYPOINT vs CMD
-
Answer: `ENTRYPOINT` sets the fixed executable; `CMD` supplies default arguments. When both are set, arguments passed to `docker run` replace `CMD`.
-
-
Secrets in Docker
-
Answer: Use Docker secrets or environment variables injected via orchestrator.
-
-
Debug a running container
-
Answer: Use `docker exec -it <id> /bin/sh` to open a shell, or follow the logs with `docker logs -f <id>`.
-
-
.dockerignore usage
-
Answer: Prevents unnecessary files from being sent to Docker daemon.
-
-
Container exits immediately
-
Answer: Entrypoint script finishes or crashes.
-
☸ Kubernetes (Beginner)
-
Pod in CrashLoopBackOff
-
Answer: View logs (`kubectl logs`, adding `--previous` for the crashed container), and check init containers and probes.
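The checks above can be wrapped in a small helper. This is a minimal sketch: `diagnose_pod` is a hypothetical function name, and the namespace default is an assumption.

```shell
# Print the usual first diagnostics for a crash-looping pod.
# Arguments: pod name, optional namespace (defaults to "default").
diagnose_pod() {
  pod="$1"
  ns="${2:-default}"
  # Logs from the previous (crashed) container instance.
  kubectl logs "$pod" -n "$ns" --previous
  # Events, init container status, and probe configuration.
  kubectl describe pod "$pod" -n "$ns"
}
# Usage: diagnose_pod <pod-name> <namespace>
```

For a failing init container, `kubectl logs <pod-name> -c <init-container-name>` shows that container's output.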
-
-
Service not reachable
-
Answer: Check labels, service selector, and port mapping.
-
-
Scale deployment
-
Answer: `kubectl scale deployment <name> --replicas=n`
-
-
Rollback deployment
-
Answer: `kubectl rollout undo deployment <name>`
-
-
Secret not mounting
-
Answer: Ensure the secret exists in the pod's namespace and that the volume name and volumeMount path are correct.
-
-
Zero-downtime deploy
-
Answer: Use readiness/liveness probes and rolling updates.
-
-
Apply vs Create
-
Answer: `apply` creates the resource if it is missing and updates it otherwise; `create` only adds new resources and fails if one already exists.
-
-
Expose pod externally
-
Answer: Use LoadBalancer or Ingress.
-
-
View pod logs
-
Answer: `kubectl logs <pod-name>`
-
-
Schedule on specific node
-
Answer: Use nodeSelector or node affinity to target the node; add tolerations if the node is tainted.
-
-----------------------------------------------------------------------------------------------------------------------
-
Accidentally committed secrets to GitHub
-
Answer: Immediately remove the file, rotate the secrets, and use tools like `git filter-branch` or `BFG Repo-Cleaner` to scrub history.
-
-
Manage access keys in CI/CD
-
Answer: Store them securely in credential stores or environment variables, never hardcode in scripts.
-
-
Scan for vulnerabilities
-
Answer: Use tools like Snyk, Trivy, or GitHub Advanced Security to detect known CVEs.
-
-
chmod 777 issue
-
Answer: Avoid it; it's insecure. Assign minimum required permissions with chown and chmod.
-
-
Enforce MFA
-
Answer: Use IAM policies, enable MFA in cloud provider account settings.
-
-
Jenkins UI exposed publicly
-
Answer: Add authentication, firewall rules, and use HTTPS.
-
-
Access private Docker registries securely
-
Answer: Use `docker login`, store creds securely, or use orchestrator secrets.
-
-
Secure Terraform state
-
Answer: Use remote backends like S3 with encryption and versioning.
-
-
Rotate SSH keys
-
Answer: Generate new keys, update authorized_keys on all hosts, and remove old keys.
-
-
Implement least privilege
-
Answer: Define fine-grained roles, restrict permissions to what's necessary.
-
Monitoring & Logging
-
App is slow
-
Answer: Check CPU, memory, I/O, network, and response time metrics.
-
-
Logs not in CloudWatch
-
Answer: Ensure log group and stream exist; verify IAM role permissions.
-
-
High CPU alerts
-
Answer: Set CloudWatch or Prometheus alerts with defined thresholds.
-
-
Push vs Pull monitoring
-
Answer: Push: metrics sent to a collector (e.g., StatsD); Pull: metrics scraped (e.g., Prometheus).
-
-
Track deployment events
-
Answer: Emit custom logs or use deployment tracking tools (e.g., Rollbar, Datadog).
-
-
Disk space filling
-
Answer: Use `df` to find the full filesystem and `du` to find what is using the space, check temp files and large directories, and set up log rotation.
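A minimal sketch of the `du` step, assuming a POSIX shell; `largest_entries` is a hypothetical helper name and the default of 10 entries is arbitrary.

```shell
# List the N largest files and directories under a path, sizes in KiB.
largest_entries() {
  path="$1"
  count="${2:-10}"
  # -a includes files, -k reports KiB, -x stays on one filesystem.
  du -akx "$path" 2>/dev/null | sort -rn | head -n "$count"
}
# Usage: run `df -h` to find the full filesystem, then e.g.
#   largest_entries /var 20
```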
-
-
Monitor K8s pods
-
Answer: Use `kubectl top pod` (requires metrics-server) or integrate Prometheus/Grafana.
-
-
Log visualization
-
Answer: Use tools like Kibana, Grafana Loki, or ELK stack.
-
-
False positive alerts
-
Answer: Tune thresholds, use alert deduplication and anomaly detection.
-
-
Duplicate alerts
-
Answer: Check for misconfigured alert rules or overlapping checks.
-
🧪 Automation & Scripting
-
Automate log cleanup
-
Answer: Write a Bash script that uses `find` to delete old logs, and schedule it with cron.
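One way to sketch this, assuming logs end in `.log` and a 7-day retention; both are illustrative choices, and `cleanup_logs` and the script path in the cron comment are hypothetical.

```shell
# Delete *.log files under a directory that are older than N days.
cleanup_logs() {
  log_dir="$1"
  max_age_days="${2:-7}"
  # -mtime +N matches files last modified more than N full days ago.
  find "$log_dir" -type f -name '*.log' -mtime +"$max_age_days" -exec rm -f {} +
}
# Example crontab entry (daily at 03:00; the script path is hypothetical):
# 0 3 * * * /usr/local/bin/cleanup-logs.sh
```

Replacing `-exec rm -f {} +` with `-print` first shows what would be deleted without removing anything.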
-
-
Script fails on server
-
Answer: Check for shell compatibility, permissions, and dependencies.
-
-
Schedule backup with cron
-
Answer: Create a script and add a crontab entry such as `0 2 * * * /backup.sh` (runs daily at 02:00).
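What the backup script might contain is not specified here; the following is a minimal sketch that assumes a tar archive of one directory is enough. `backup_dir` and the archive naming are hypothetical.

```shell
# Create a timestamped, gzip-compressed archive of a directory
# and print the archive path on success.
backup_dir() {
  src="$1"
  dest="$2"
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="$dest/backup-$stamp.tar.gz"
  mkdir -p "$dest"
  # -C keeps paths inside the archive relative to the source directory.
  tar -czf "$archive" -C "$src" . && echo "$archive"
}
# Crontab entry that would run such a script daily at 02:00:
# 0 2 * * * /backup.sh
```

Keep the destination outside the source directory so the archive does not try to include itself.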
-
-
Test Bash script safely
-
Answer: Use `set -x` to trace each command, and run the script in a controlled test environment.
-
-
Permission denied in Python
-
Answer: Check file permissions, user privileges, and SELinux/AppArmor if enabled.
-
-
Send alerts to Slack
-
Answer: Use `curl` to post JSON to a Slack webhook URL.
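A minimal sketch of the webhook call. `slack_payload` and `send_slack_alert` are hypothetical helper names, and `SLACK_WEBHOOK_URL` is an assumed environment variable holding your incoming-webhook URL.

```shell
# Build a minimal Slack payload, escaping backslashes and double quotes.
# Multi-line messages are not handled in this sketch.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# Post the message; fails with a non-zero status on HTTP errors (-f).
send_slack_alert() {
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}
# Usage: send_slack_alert "Disk usage above 90% on web-1"
```

Keep the webhook URL out of the script itself, since anyone who has it can post to the channel.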
-
-
Capture script output
-
Answer: Redirect stdout and stderr to a log file: `./script.sh > output.log 2>&1`
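When the output should also stay visible on the terminal, `tee` is a common alternative to plain redirection. A minimal sketch; `run_logged` is a hypothetical helper name.

```shell
# Run a command, appending stdout and stderr to a log file
# while still showing the output on the terminal.
run_logged() {
  log_file="$1"
  shift
  "$@" 2>&1 | tee -a "$log_file"
}
# Usage: run_logged output.log ./script.sh
```

The pipeline's exit status is that of `tee`, not of the command, so check the log if the result matters.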
-
-
Cronjob not running
-
Answer: Check crontab syntax, path to script, and user permissions.
-
-
Check and restart service
-
Answer: `if ! systemctl is-active --quiet myservice; then systemctl restart myservice; fi`
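The same check reads more easily as a reusable function. A minimal sketch; `ensure_running` is a hypothetical name, and the restart needs root or sudo.

```shell
# Restart a systemd service only when it is not active.
ensure_running() {
  service="$1"
  if ! systemctl is-active --quiet "$service"; then
    echo "$service is not active; restarting"
    systemctl restart "$service"
  fi
}
# Usage: ensure_running myservice
```

Run from cron, this acts as a simple watchdog; a unit with `Restart=on-failure` lets systemd handle the same case itself.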
-
-
Automate passwordless SSH
-
Answer: Generate an SSH key pair with `ssh-keygen` and copy the public key to the host with `ssh-copy-id`.
-
Git & Version Control
-
Pushed to wrong branch
-
Answer: Cherry-pick the commit onto the correct branch, then revert it on the wrong branch. Force push to remove it only if nobody else has pulled the branch.
-
-
Revert merge commit
-
Answer: Use `git revert -m 1 <merge_commit_hash>`, where `-m 1` keeps the first parent (the branch that was merged into) as the mainline.
-
-
git reset vs revert
-
Answer: `reset` moves the branch pointer and rewrites history; `revert` adds a new commit that undoes an earlier one.
-
-
Remove secrets from Git history
-
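Answer: Use `BFG` or `git filter-branch` (the Git documentation now recommends `git filter-repo` over `filter-branch`), then force push and rotate the exposed secrets.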
-
-
Force push broke build
-
Answer: Identify last working commit, create hotfix, avoid force pushes.
-
-
Squash commits
-
Answer: Use `git rebase -i` to squash, then push with `--force-with-lease`.
-
-
package-lock.json conflict
-
Answer: Resolve manually by merging changes or regenerating the lock file.
-
-
Enforce branch naming
-
Answer: Use Git hooks or CI validation scripts.
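A validation script can be as small as one `case` statement. This is a minimal sketch; the allowed prefixes are an assumed convention and `valid_branch_name` is a hypothetical name.

```shell
# Return 0 when a branch name follows the convention, 1 otherwise.
valid_branch_name() {
  case "$1" in
    main|develop) return 0 ;;
    # "?*" requires at least one character after the prefix.
    feature/?*|bugfix/?*|hotfix/?*|release/?*) return 0 ;;
    *) return 1 ;;
  esac
}
# In a CI job or a pre-push hook:
#   valid_branch_name "$CI_COMMIT_BRANCH" || echo "branch name rejected" >&2
```

Client-side hooks can be bypassed, so a CI or server-side check is the one that actually enforces the rule.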
-
-
Git branching strategy
-
Answer: Use GitFlow, trunk-based, or feature branch workflows.
-
-
Make hotfix without disrupting main
-
Answer: Branch from main, fix, test, and merge with minimal changes.
-
⟳ Troubleshooting & Operations
-
Server is down
-
Answer: Ping, SSH, check logs, disk, memory, and restart services.
-
-
IP blacklisted
-
Answer: Contact provider, rotate IP, or review firewall/email settings.
-
-
Setup load balancer
-
Answer: Use NGINX, HAProxy, or cloud LB service with backend configs.
-
-
Jenkins agent not connecting
-
Answer: Check network, authentication tokens, and agent logs.
-
-
Deploy works in staging but not prod
-
Answer: Check environment variables, configs, secrets, and IAM roles.
-
-
Allow only Cloudflare traffic
-
Answer: Use firewall rules to allow only Cloudflare IP ranges.
-
-
Track config changes
-
Answer: Store configs in Git, use tools like Ansible or Puppet.
-
-
Breakage due to dependency
-
Answer: Use version pinning and virtual environments.
-
-
Test without affecting prod
-
Answer: Use staging environment or feature flags.
-
-
Prepare for high traffic
-
Answer: Scale resources, use a CDN and caching, and load test beforehand.
-