
Kubernetes in the cloud: Architecture, security and best practices

By: Aniruddha Chakrabarti
Kubernetes has transformed the way modern applications are deployed and managed, becoming a fundamental part of cloud-native architecture. Its declarative configuration model, self-healing capabilities and extensibility allow organisations to automate infrastructure tasks and focus on delivering scalable, resilient applications.

Managed Kubernetes services such as Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE) have simplified cluster provisioning and maintenance. These services enable teams to innovate quickly without being weighed down by operational overhead.

As of 2025, Kubernetes is the leading orchestration platform for cloud-native enterprises. Its adaptability and extensive ecosystem are crucial for managing complex, distributed workloads across hybrid and multi-cloud environments.

However, successful adoption involves navigating challenges across architecture, operations, security and cost. This article explores those challenges and outlines best practices, drawing from Grant Thornton Bharat’s experience in delivering enterprise-grade Kubernetes solutions.

Key challenges in running Kubernetes in production environments

  1. Design: 

    In a production environment, design is not limited to the choice of cloud provider. It also includes how workloads are structured, isolated and scaled.
    • Multi-tenancy complexity: Designing clusters for multiple teams or business units can lead to namespace sprawl, conflicting resource quotas and shared infrastructure risks (a namespace quota sketch follows this list).
    • Cluster per environment vs shared clusters: Teams often face challenges deciding whether to isolate development, staging and production environments into separate clusters or manage them within a single cluster. This decision affects cost, security and operational overheads.
    • Pod placement and affinity rules: Incorrect use of node affinity, taints and tolerations may result in uneven workload distribution and underutilised nodes.
  2. Architecture: 

    Architectural decisions directly impact the scalability, resilience and maintainability of a Kubernetes environment.
    • Control plane limitations: In managed services such as AKS and EKS, the control plane is abstracted. However, API throttling and etcd performance can still affect large-scale deployments.
    • Network architecture: Cloud-native networking, including container network interface (CNI) plugins and service meshes, can introduce latency, complicate debugging and make it difficult to enforce network policies across namespaces.
    • Ingress and traffic routing: Managing ingress controllers such as NGINX, application load balancer (ALB) and Istio in multi-tenant environments can lead to routing conflicts, transport layer security (TLS) misconfigurations and performance bottlenecks.
  3. Build: 

    Building for Kubernetes in production requires secure integration with continuous integration/continuous delivery (CI/CD), along with a focus on consistency and compliance.
    • Image vulnerabilities: Without automated scanning tools such as Trivy or Clair, vulnerable base images may reach production, exposing clusters to common vulnerabilities and exposures (CVEs).
    • Helm chart mismanagement: Poor version control or inconsistent values in Helm charts can lead to failed deployments, particularly across different environments.
    • GitOps drift: In GitOps workflows, manual changes to live clusters can cause drift from the declared state, resulting in unpredictable behaviour and failed rollbacks.
  4. Operation: 

    Operations in production Kubernetes environments are where teams feel the most pain, especially around performance, security and cost.
    • Performance
      • Resource misallocation: Over-provisioned central processing unit (CPU)/memory leads to wasted spend, while under-provisioned pods cause throttling and out-of-memory (OOM) kills.
      • Autoscaling misconfigurations: Horizontal pod autoscaler (HPA) and vertical pod autoscaler (VPA) often rely on inaccurate metrics or thresholds, causing flapping or delayed scaling (an autoscaling sketch follows this list).
      • Node pool fragmentation: Mixing workloads with different resource profiles on the same node pool leads to bin-packing inefficiencies.
    • Security
      • Role-based access control (RBAC) misconfigurations: Granting overly permissive roles such as cluster-admin increases the risk of lateral movement within the cluster.
      • Secrets leakage: Storing secrets in plain-text ConfigMaps or failing to rotate them regularly may lead to data breaches.
      • Pod security policy (PSP) deprecation: Many clusters still rely on the deprecated PSP API, which was removed in Kubernetes 1.25, and have not yet migrated to Pod Security Admission or to policy engines such as Kyverno or Open Policy Agent (OPA) Gatekeeper.
    • Optimisation and cost
      • Idle workloads: Resources are often consumed by unused test pods, CronJobs or zombie deployments.
      • Persistent volume waste: Persistent volumes (PVs) are frequently left orphaned after the deletion of pods, especially in stateful workloads.
      • Lack of cost visibility: Without tools such as Kubecost or native billing integrations, it is difficult to allocate costs accurately across teams, namespaces or services.
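A practical starting point for the multi-tenancy and resource-quota issues above is to give each tenant namespace an explicit resource envelope. The sketch below is minimal and hypothetical: the namespace name team-a and the quota figures are assumptions to be tuned to real workloads, not prescribed values.

```yaml
# Hypothetical per-tenant quota: caps aggregate CPU/memory requests and limits
# so one team cannot starve the shared cluster.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # assumed tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
---
# Default requests and limits for containers that do not declare their own,
# so unbounded pods cannot slip into the namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
```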
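The resource-misallocation and autoscaling problems are usually tackled together: give every container realistic requests and limits, then scale on a metric the workload actually tracks. A minimal sketch, assuming a Deployment named api whose containers declare CPU requests and a cluster with a metrics pipeline (such as metrics-server) installed:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: team-a              # assumed namespace from the previous sketch
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                    # assumed deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale when average CPU use exceeds 70% of requests
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # damps the flapping described above
```

The stabilisation window and sensible min/max bounds are simple levers against flapping; the utilisation target only behaves predictably when CPU requests reflect actual usage.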

Best practices for cloud-native Kubernetes success

To overcome these challenges, organisations should adopt best practices aligned with their cloud platform and operational maturity.

  • Platform design
    Effective platform design starts with clear strategies for environment isolation, workload distribution and governance. Rather than defaulting to shared clusters, organisations should evaluate the trade-offs between multi-cluster and multi-namespace architectures.
    Namespace-level isolation, enforced with resource quotas and limit ranges, helps to prevent contention and ensure fair usage. Pod placement should be deliberate. Using affinity rules, taints and tolerations allows workloads to be scheduled according to resource needs and operational priorities.
    Recommended tools: Kubernetes native policies, Azure Policy (for Azure Kubernetes Service), identity and access management (IAM) roles for service accounts (for Amazon Elastic Kubernetes Service), workload identity (for Google Kubernetes Engine).
  • Resilient and scalable architecture
    Although AKS, EKS and GKE manage the control plane, it is still essential to architect applications for scalability and resilience.
    Service meshes provide secure and observable communication between services. Ingress controllers must be configured with TLS, path-based routing and integration with web application firewalls (WAFs) to ensure secure and efficient traffic handling.
    Network policies should be enforced across namespaces, CNI plugins selected for performance and compatibility, and network segmentation implemented to prevent lateral movement (a network policy sketch follows this list).
    Recommended tools: Istio, Linkerd, NGINX Ingress, AWS Application Load Balancer Ingress, Azure Application Gateway, GKE Ingress.
  • Secure and automated CI/CD pipelines
    Kubernetes delivery pipelines should be automated, secure and auditable. Image scanning must be integrated into CI to detect vulnerabilities early. Declarative deployment models promote consistency and traceability.
    Helm charts should be version-controlled and validated prior to use. Manual updates to live clusters should be avoided to prevent drift, and audit mechanisms should identify and correct deviations from the declared state (a GitOps sketch follows this list).
    Recommended tools: ArgoCD, Flux, Helm, Trivy, Snyk, Azure DevOps, GitHub Actions.
  • Operational maturity and observability
    Operational excellence depends on strong observability and proactive management. Metrics, logs and traces should be collected and correlated to provide insights.
    Autoscaling should reflect actual usage patterns. Horizontal and vertical pod autoscalers must be configured with appropriate metrics and tested under real workloads. Node pools should be segmented by workload type to maximise resource efficiency.
    Recommended tools: Prometheus, Grafana, OpenTelemetry, Goldilocks, Azure Monitor, Amazon CloudWatch, GKE Monitoring.
  • Security governance and policy enforcement
    Security controls must be implemented at every layer. RBAC should follow the principle of least privilege, and role bindings should be reviewed regularly. Secrets must be securely stored and rotated on a regular schedule.
    Admission controllers should be used to enforce security policies during deployment. Pod security standards must be adopted to minimise risks such as privilege escalation and container breakout (a namespace-level sketch follows this list).
    Recommended tools: Kyverno, Open Policy Agent Gatekeeper, Azure Key Vault, AWS Secrets Manager, Google Cloud Platform Secret Manager.
  • Cost awareness and resource optimisation
    Cost optimisation begins with visibility. Without the proper tools, idle workloads, orphaned volumes and oversized pods can go unnoticed and lead to increased costs.
    Select storage classes that align with performance and retention requirements. Enforce lifecycle policies to remove unused resources and logs. Autoscaling should be configured based on actual demand rather than theoretical peak usage.
    Recommended tools: Kubecost, GKE Autopilot, Azure Advisor, AWS Compute Optimizer.
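For the network segmentation guidance above, a default-deny baseline per namespace is the usual first step, with required traffic then allowed explicitly. This is a minimal sketch (the namespace, labels and port are assumptions), and it only takes effect if the installed CNI plugin enforces NetworkPolicy:

```yaml
# Deny all ingress and egress traffic in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a                  # assumed tenant namespace
spec:
  podSelector: {}                    # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Then allow only what is needed, e.g. ingress to the API pods from the
# ingress controller's namespace (labels and port are hypothetical).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
```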
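To keep clusters from drifting away from the declared state, reconciliation can be made continuous and self-healing. The sketch below uses an Argo CD Application as one example of the GitOps model described above; the repository URL, chart path and values file are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: team-a-api
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/platform-config.git   # placeholder repository
    targetRevision: main
    path: charts/api
    helm:
      valueFiles:
        - values-prod.yaml       # per-environment values, kept under version control
  destination:
    server: https://kubernetes.default.svc
    namespace: team-a
  syncPolicy:
    automated:
      prune: true       # remove resources that were deleted from Git
      selfHeal: true    # revert manual changes made directly to the cluster
```

With selfHeal enabled, manual edits to live objects are reverted on the next reconciliation, which is what closes the drift gap highlighted earlier.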
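With pod security policies removed from Kubernetes, the built-in Pod Security Admission controller can enforce pod security standards at namespace level, while engines such as Kyverno or OPA Gatekeeper layer finer-grained rules on top. A minimal sketch, assuming the same hypothetical tenant namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                    # assumed tenant namespace
  labels:
    # Reject pods that violate the 'restricted' standard...
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # ...and emit warnings and audit events against the same profile.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```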

Grant Thornton Bharat’s expertise in Kubernetes solutions

Grant Thornton Bharat offers a full suite of Kubernetes services across AKS, EKS and GKE, supporting organisations on their cloud-native transformation journey:

  • Cloud-native consulting: Platform selection, architecture design and migration planning.
  • DevSecOps enablement: Secure CI/CD pipelines, GitOps workflows and policy enforcement.
  • Managed operations: Around-the-clock monitoring, patching, backup and incident response.
  • Security and compliance audits: Center for Internet Security (CIS) benchmarking, IAM reviews and zero-trust architecture.
  • FinOps and optimisation: Cost visibility, resource right-sizing and usage alignment.

Venkata Jagadeesh Kuriti, Assistant Manager, Grant Thornton Bharat, has also contributed to this article.
