
    Creating an AI/ML-Optimized Google Kubernetes Engine (GKE) Cluster | by Xb4sh | Jul, 2025

    By Team_AIBS News, July 15, 2025, 3 min read


    The growth of AI and Machine Learning (ML) is driving demand for infrastructure that is scalable, flexible, and cost-efficient. Google Kubernetes Engine (GKE) is one of the most powerful platforms for running AI workloads, from model serving and inference to large-scale training.

    In this article, I walk step by step through building a complete AI-optimized GKE cluster, including GPU node setup, autoscaling, security, and cost-saving tips. It is aimed at practitioners who want something production-ready, not just a PoC.

    Why GKE for AI/ML?

    • Automation: deployments, scaling, and rolling updates are easy.
    • GPU/TPU support: native NVIDIA support saves setup time.
    • Cost efficiency: the autoscaler and preemptible nodes cut training and inference costs.
    • Managed security: Workload Identity, private nodes, and network policies.
    • Google Cloud integration: Cloud Logging/Monitoring (formerly Stackdriver), BigQuery, GCS, IAM, etc.

    Recommended architecture:

    • Regional vs. zonal: choose regional for a higher SLA.
    • Separate node pools: keep GPU and CPU workloads apart.
    • Autoscaling: enable it for efficient resource usage.
    • Workload Identity: secure, with no service account keys.
    • Private nodes: avoid public exposure.
    • VPC-native: a more secure and scalable network.

    a. Create the main cluster

    export PROJECT_ID=your-project-id
    export REGION=asia-southeast2
    export CLUSTER_NAME=ai-optimized-gke

    gcloud config set project $PROJECT_ID

    # In a regional cluster, --num-nodes, --min-nodes and --max-nodes apply per zone.
    # --enable-autoscaling needs --max-nodes; 1..3 is an example range for the default pool.
    gcloud container clusters create $CLUSTER_NAME \
      --region $REGION \
      --release-channel regular \
      --enable-ip-alias \
      --enable-private-nodes \
      --enable-autoscaling \
      --min-nodes 1 \
      --max-nodes 3 \
      --enable-autoprovisioning \
      --min-cpu 4 \
      --max-cpu 64 \
      --min-memory 16 \
      --max-memory 512 \
      --enable-shielded-nodes \
      --workload-pool=$PROJECT_ID.svc.id.goog \
      --machine-type "e2-standard-8" \
      --num-nodes "1" \
      --logging=SYSTEM,WORKLOAD \
      --monitoring=SYSTEM \
      --addons=HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
      --no-enable-basic-auth \
      --no-issue-client-certificate \
      --enable-master-authorized-networks \
      --master-authorized-networks 1.2.3.4/32  # replace with your office IP
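Before running kubectl against this cluster, it helps to confirm that the machine you are on is actually covered by --master-authorized-networks; otherwise the control plane rejects the request. The sketch below is a minimal illustration, assuming Python 3.9+ and the placeholder CIDR 1.2.3.4/32 from the command above: it checks an IP against a list of authorized CIDR blocks with the standard ipaddress module. Finding out your own public IP is left out.

```python
import ipaddress

# Placeholder CIDRs: use the same values you pass to --master-authorized-networks.
AUTHORIZED_NETWORKS = ["1.2.3.4/32"]


def is_authorized(client_ip: str, networks: list[str]) -> bool:
    """Return True if client_ip falls inside any of the authorized CIDR blocks."""
    ip = ipaddress.ip_address(client_ip)
    # ip_network() raises ValueError for malformed CIDRs or host bits set past the prefix.
    return any(ip in ipaddress.ip_network(cidr) for cidr in networks)


if __name__ == "__main__":
    for candidate in ("1.2.3.4", "1.2.3.5"):
        status = "allowed" if is_authorized(candidate, AUTHORIZED_NETWORKS) else "blocked"
        print(candidate, status)
```

A /32 entry admits exactly one address, so if your office IP changes you will need to update the list, for example with gcloud container clusters update and the same --master-authorized-networks flag.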

    b. Add a GPU node pool (e.g., NVIDIA T4, A100, or L4; this example uses T4, which attaches to N1 machine types)

    gcloud container node-pools create gpu-pool-t4 \
      --cluster $CLUSTER_NAME \
      --region $REGION \
      --accelerator type=nvidia-tesla-t4,count=1 \
      --machine-type n1-standard-8 \
      --num-nodes 0 \
      --min-nodes 0 \
      --max-nodes 10 \
      --enable-autoscaling \
      --node-labels workload=ai,gpu=t4 \
      --node-taints ai-gpu=true:NoSchedule \
      --scopes=cloud-platform
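Because this pool belongs to a regional cluster, --min-nodes and --max-nodes are applied per zone, so the real ceiling on GPUs is higher than the flag suggests. The helper below is a back-of-the-envelope sketch; it assumes the pool spans 3 zones (GKE's default for a regional cluster when --node-locations is not set) and one GPU per node, as in the accelerator flag above. Compare the result with your regional T4 GPU quota, and note that if T4 is not offered in every zone of the region you may need --node-locations to restrict the pool.

```python
def max_pool_capacity(max_nodes_per_zone: int, zones: int, gpus_per_node: int) -> tuple[int, int]:
    """Return (max nodes, max GPUs) for an autoscaled pool whose limits are per zone."""
    if min(max_nodes_per_zone, zones, gpus_per_node) < 0:
        raise ValueError("counts must be non-negative")
    max_nodes = max_nodes_per_zone * zones
    return max_nodes, max_nodes * gpus_per_node


if __name__ == "__main__":
    # Assumption: 3 zones, --max-nodes 10 and one GPU per node, as in gpu-pool-t4.
    nodes, gpus = max_pool_capacity(max_nodes_per_zone=10, zones=3, gpus_per_node=1)
    print(f"gpu-pool-t4 can grow to {nodes} nodes / {gpus} T4 GPUs")  # 30 nodes / 30 T4 GPUs
```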

    c. Add a high-memory node pool (optional)

    gcloud container node-pools create high-mem-pool \
      --cluster $CLUSTER_NAME \
      --region $REGION \
      --machine-type n2-highmem-32 \
      --num-nodes 0 \
      --min-nodes 0 \
      --max-nodes 10 \
      --enable-autoscaling \
      --node-labels workload=ai,mem=high \
      --node-taints ai-mem=true:NoSchedule
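Workloads only land on this pool if they both select its label and tolerate its taint. As a sketch, assuming the pool's label is mem=high (use whatever value you set in --node-labels) and its taint is ai-mem=true:NoSchedule, the function below builds a Pod manifest as a Python dict and prints it as JSON, which kubectl apply -f - accepts. The pod name, image, and memory size are hypothetical placeholders.

```python
import json


def highmem_pod(name: str, image: str, memory: str) -> dict:
    """Build a Pod manifest pinned to the high-memory pool via its label and taint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Must match the --node-labels value on high-mem-pool (assumed mem=high).
            "nodeSelector": {"mem": "high"},
            # Must match --node-taints ai-mem=true:NoSchedule.
            "tolerations": [
                {"key": "ai-mem", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
            ],
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Request = limit so the scheduler reserves the full amount.
                    "resources": {"requests": {"memory": memory}, "limits": {"memory": memory}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(json.dumps(highmem_pod("feature-etl", "python:3.12-slim", "64Gi"), indent=2))
```

Setting the memory request equal to the limit is a design choice here, not a requirement; it simply makes the pod's footprint on the pool predictable.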

    To install the NVIDIA drivers on the GPU nodes (Container-Optimized OS images), run:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml

    Tip: for Ubuntu node images or GKE Autopilot, check the official GKE GPU documentation.
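Once the driver installer has finished, each GPU node should advertise nvidia.com/gpu under status.allocatable; a GPU node that reports nothing has usually not completed driver setup yet. This sketch parses the JSON produced by kubectl get nodes -o json and maps node names to their allocatable GPU count. The sample input is made up to show the shape of the data, not captured from a real cluster.

```python
import json


def gpu_allocatable(nodes_json: str) -> dict[str, int]:
    """Map node name -> allocatable nvidia.com/gpu from `kubectl get nodes -o json`."""
    data = json.loads(nodes_json)
    result = {}
    for item in data.get("items", []):
        name = item["metadata"]["name"]
        allocatable = item.get("status", {}).get("allocatable", {})
        # Quantities are serialized as strings; a missing key means no GPUs advertised.
        result[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return result


# Hypothetical sample with one GPU node and one CPU-only node.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "gpu-node-1"},
         "status": {"allocatable": {"cpu": "7910m", "nvidia.com/gpu": "1"}}},
        {"metadata": {"name": "cpu-node-1"},
         "status": {"allocatable": {"cpu": "7910m"}}},
    ]
})

if __name__ == "__main__":
    print(gpu_allocatable(SAMPLE))  # {'gpu-node-1': 1, 'cpu-node-1': 0}
```

In practice you would save the output with kubectl get nodes -o json > nodes.json and pass the file's contents to gpu_allocatable().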

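    Security best practices:

    • Workload Identity: enabled by default in the script above (--workload-pool), and far safer than service account keys.
    • Private nodes: no node has a public IP.
    • Master authorized networks: the control plane is reachable only from specific IPs.
    • Shielded nodes: protect against boot-level malware.
    • Network policy: isolate pod traffic and enforce zero trust (this needs network policy enforcement, e.g. --enable-network-policy or GKE Dataplane V2, which the cluster command above does not enable).
    • IAM audit: least privilege, using narrowly scoped GCP service accounts.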
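Workload Identity works by linking a Kubernetes service account (KSA) to a Google service account (GSA): the KSA carries the iam.gke.io/gcp-service-account annotation, and the GSA grants roles/iam.workloadIdentityUser to the member serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME], for example with gcloud iam service-accounts add-iam-policy-binding. The sketch below only builds those two pieces of text; the namespace, KSA name, and GSA email are hypothetical.

```python
import json


def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member string that lets the given Kubernetes SA act as a Google SA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def annotated_ksa(namespace: str, ksa_name: str, gsa_email: str) -> dict:
    """Kubernetes ServiceAccount manifest linked to a Google SA via annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


if __name__ == "__main__":
    # Hypothetical identifiers for illustration only.
    project = "your-project-id"
    gsa = f"inference-gsa@{project}.iam.gserviceaccount.com"
    print(workload_identity_member(project, "ml", "inference-ksa"))
    print(json.dumps(annotated_ksa("ml", "inference-ksa", gsa), indent=2))
```

Pods that set serviceAccountName to the annotated KSA then obtain the GSA's credentials from the GKE metadata server, without any key file.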

    A simple YAML example for PyTorch (TorchServe) inference on a GPU node:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: pytorch-inference
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: pytorch
      template:
        metadata:
          labels:
            app: pytorch
        spec:
          containers:
            - name: pytorch
              image: pytorch/torchserve:latest
              resources:
                limits:
                  nvidia.com/gpu: 1
          nodeSelector:
            gpu: t4
          tolerations:
            - key: "ai-gpu"
              operator: "Equal"
              value: "true"
              effect: "NoSchedule"
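The nodeSelector and tolerations in this Deployment are what steer it onto the GPU pool. The following is a simplified model of that part of Kubernetes scheduling, written for illustration only: it checks label selectors and NoSchedule/NoExecute taints, and ignores resource requests, affinity, and any extra taints GKE itself may place on GPU nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect tolerates every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified version of the rule for whether a toleration matches a taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key matches any taint key.
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(node_labels: dict[str, str], node_taints: list[Taint],
                 node_selector: dict[str, str], tolerations: list[Toleration]) -> bool:
    """True if the pod's nodeSelector matches and every hard taint is tolerated."""
    if any(node_labels.get(k) != v for k, v in node_selector.items()):
        return False
    # PreferNoSchedule is only a soft preference, so it never blocks placement here.
    hard_taints = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard_taints)


if __name__ == "__main__":
    # Node labels/taint from the gpu-pool-t4 command; pod fields from the YAML above.
    gpu_node_labels = {"workload": "ai", "gpu": "t4"}
    gpu_node_taints = [Taint("ai-gpu", "true", "NoSchedule")]
    pod_selector = {"gpu": "t4"}
    pod_tolerations = [Toleration("ai-gpu", "Equal", "true", "NoSchedule")]

    print(can_schedule(gpu_node_labels, gpu_node_taints, pod_selector, pod_tolerations))  # True
    print(can_schedule(gpu_node_labels, gpu_node_taints, pod_selector, []))  # False
```

The second call shows why the toleration matters: without it, the ai-gpu taint keeps the pod off the GPU node even though the label selector matches.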

    Cost-saving tips:

    • Autoscaler: enable it on every node pool.
    • Preemptible (or Spot) GPU nodes: use them for training jobs that can tolerate interruption.
    • Scale-to-zero: node pools with --min-nodes 0 shrink to zero nodes when idle, so they stop incurring node costs.
    • Monitor billing: set up budget alerts in GCP Billing.
    • Review resources: delete node pools you no longer use.

    Monitoring:

    • Cloud Logging and Cloud Monitoring (formerly Stackdriver): enabled by default; check logs and metrics in the Google Cloud console.
    • Prometheus + Grafana: for custom AI metrics.
    • Node Problem Detector: check node hardware health.
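The scale-to-zero and preemptible tips above come down to simple arithmetic: you pay per node-hour, so cutting hours or the hourly rate cuts the bill. The sketch below compares an always-on GPU node with one that only runs during busy hours. The hourly rate, busy hours, and discount are hypothetical placeholders, not real GCP prices; look up current pricing for your region and machine type.

```python
HOURS_PER_MONTH = 730  # common approximation: 24 h * 365 d / 12 months


def monthly_cost(rate_per_node_hour: float, nodes: int, hours: float,
                 discount: float = 0.0) -> float:
    """Cost of running `nodes` nodes for `hours` hours at a flat hourly rate.

    discount is a fraction in [0, 1), e.g. a hypothetical preemptible/Spot reduction.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return rate_per_node_hour * nodes * hours * (1.0 - discount)


if __name__ == "__main__":
    RATE = 1.00  # hypothetical price per GPU node-hour
    BUSY_HOURS = 160  # hypothetical: about 8 h/day on 20 working days
    always_on = monthly_cost(RATE, nodes=1, hours=HOURS_PER_MONTH)
    scale_to_zero = monthly_cost(RATE, nodes=1, hours=BUSY_HOURS)
    spot = monthly_cost(RATE, nodes=1, hours=BUSY_HOURS, discount=0.6)  # hypothetical discount
    print(f"always on: {always_on:.2f}, scale-to-zero: {scale_to_zero:.2f}, "
          f"scale-to-zero + spot: {spot:.2f}")
```

The model deliberately ignores the default CPU pool, disks, and network egress, which keep costing money even when the GPU pool is at zero.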

    With this setup, you can run AI/ML workloads (LLMs, NLP, computer vision, etc.) on GKE securely, scalably, and cost-efficiently.
    If you need a YAML template, a Helm chart, or Terraform automation, mention it in the comments!

    Interested?
    Bookmark, share, and follow for updates on DevOps, Kubernetes, and AI Engineering.

    Bonus: Terraform example (partial)

    resource "google_container_node_pool" "gpu_pool" {
      name     = "gpu-pool-t4"
      cluster  = google_container_cluster.ai_optimized.name
      location = google_container_cluster.ai_optimized.location

      # initial_node_count rather than node_count: the provider docs advise
      # against combining node_count with the autoscaling block below.
      initial_node_count = 1

      node_config {
        machine_type = "n1-standard-8"
        guest_accelerator {
          type  = "nvidia-tesla-t4"
          count = 1
        }
        oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
        labels = {
          workload = "ai"
          gpu      = "t4"
        }
        taint {
          key    = "ai-gpu"
          value  = "true"
          effect = "NO_SCHEDULE"
        }
      }

      autoscaling {
        min_node_count = 0
        max_node_count = 10
      }
    }
