Moving Beyond Pipelines: Building a Self-Service Developer Platform with GitLab
Count the lines of CI/CD YAML in your monorepo. Now ask how many of your senior engineers actually understand all of it. If you're honest with yourself, the answer is usually two, and one of them is on vacation this week.
I. The YAML Fatigue Epidemic
CI/CD pipelines were supposed to make software delivery faster. And they did—for a while. Then your startup scaled from 5 engineers to 50. The monorepo grew. Everyone started copy-pasting the .gitlab-ci.yml from the last project they worked on. Someone added a security scan stage that nobody configured correctly. The deploy job has 47 environment variables that need to exist for the pipeline to not fail, and the only person who knows what all of them do left eight months ago.
The result: deploying a new service requires filing a ticket to the platform team, waiting two days, getting a PR with 200 lines of YAML that nobody outside the team can review meaningfully, and hoping that the environment variables for staging don't drift from production again.
This isn't a YAML problem. It's a cognitive overhead problem. When the accidental complexity of infrastructure management exceeds the capacity of the teams who depend on it, you've created a bottleneck that scales linearly with your organization—every new service, every new engineer, every new environment adds to the pile.
The thesis: The future of DevOps isn't writing better pipelines for developers. It's building an Internal Developer Platform (IDP) that allows developers to serve themselves safely—without needing to understand what's happening underneath.
II. What an Internal Developer Platform Actually Is
The term "Internal Developer Platform" gets misused constantly. It is not:
- A Confluence wiki with deployment runbooks
- A dashboard that shows you your Kubernetes pod counts
- A collection of Slack bots that trigger pipelines
- Backstage, installed once and never maintained
An IDP is a paved road. The metaphor is deliberate: a paved road doesn't prevent you from going off-road—it makes the common path so much easier that most people choose it voluntarily. A good IDP abstracts infrastructure complexity that doesn't differentiate your product (GCP load balancer configuration, Kubernetes namespace setup, IAM bindings, Terraform state management) and exposes a simpler interface through which developers can do what they actually care about: ship working software.
The concrete outcome is that a frontend engineer who needs to deploy a new backend service shouldn't need to understand:
- How Workload Identity Federation works
- What a Cloud Run concurrency setting does
- Why your Terraform state is stored in GCS and not in GitLab
- Which GCP project maps to which environment
They should be able to write a 15-line configuration file, commit it, and get infrastructure. The platform handles the rest.
The Three Layers
| Layer | What it provides | GitLab primitive |
|---|---|---|
| Self-service catalog | Versioned pipeline components teams consume without owning the internals | CI/CD Component Catalog |
| Golden paths | Opinionated defaults that encode security, cost, and reliability best practices automatically | Base templates, Terraform modules |
| Guardrails | Automated policy enforcement that prevents dangerous configurations without blocking progress | OPA policies, protected environments, approval rules |
III. Leveraging GitLab as Your Platform Foundation
GitLab ships with more IDP-relevant primitives than most teams realize, and most organizations use only a fraction of them.
The CI/CD Component Catalog
Before the Component Catalog shipped, scaling GitLab CI across a large organization meant one of two things: copy-paste .gitlab-ci.yml fragments between repos (fragile, diverges immediately) or use the include: remote: directive to pull in a shared pipeline file (better, but no versioning or input parameters).
The Component Catalog solves this properly. A component is a versioned, reusable pipeline snippet with a defined interface—inputs it accepts, stages it runs, artifacts it produces. Teams consume components like library packages: they pin a version and pass in the parameters their project needs.
Here's a component that handles the full build-test-scan-deploy cycle for a Node.js service targeting Cloud Run:
# components/nodejs-app/template.yml
spec:
  inputs:
    service_name:
      description: "Name of the service (used for image tagging and Cloud Run service name)"
      type: string
    environment:
      description: "Target environment: staging or production"
      type: string
      default: staging
    gcp_project_id:
      description: "GCP project ID for the target environment"
      type: string
    node_version:
      description: "Node.js major version"
      type: string
      default: "20"
---
# The bundled scanner templates attach their jobs to the `test` stage by default.
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml

stages:
  - test
  - build
  - deploy

test:
  stage: test
  image: node:$[[ inputs.node_version ]]
  script:
    - npm ci --prefer-offline
    - npm run lint
    - npm test -- --ci

build-and-push:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE/$[[ inputs.service_name ]]:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE/$[[ inputs.service_name ]]:$CI_COMMIT_SHA

deploy-to-$[[ inputs.environment ]]:
  stage: deploy
  image: google/cloud-sdk:alpine
  script:
    - gcloud run deploy $[[ inputs.service_name ]]
      --image $CI_REGISTRY_IMAGE/$[[ inputs.service_name ]]:$CI_COMMIT_SHA
      --project $[[ inputs.gcp_project_id ]]
      --region us-central1
      --platform managed
  environment:
    name: $[[ inputs.environment ]]
    url: https://$[[ inputs.service_name ]].run.app

A team consuming this component needs zero knowledge of how Docker builds work, how GitLab's image registry is configured, or where security scan results go. Their entire .gitlab-ci.yml becomes this:
include:
  - component: $CI_SERVER_FQDN/platform/components/nodejs-app@1.0.0
    inputs:
      service_name: payments-api
      environment: production
      gcp_project_id: acme-prod-391045

Six lines. Versioned. SAST and dependency scanning run on every commit, by default, whether or not anyone remembered to add them.
Versioning is the critical part. Without it, you can never safely update a shared pipeline without risking breaking every project that uses it. Pin components to semver tags. Treat component releases like library releases: changelog, deprecation notices, migration guide. A @latest alias is fine for development; it's not fine in production pipelines.
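The difference between a production-safe pin and a development-only alias is a single version specifier. A sketch of the two styles side by side (the component path is illustrative; `~latest` is GitLab's moving alias for the newest published release):

```yaml
include:
  # Production: pinned to an exact release, upgraded deliberately.
  - component: $CI_SERVER_FQDN/platform/components/nodejs-app@1.0.0

  # Development sandboxes only: silently tracks every new release.
  - component: $CI_SERVER_FQDN/platform/components/nodejs-app@~latest
```

A project would use one or the other, not both; the point is that the alias form makes every component release an immediate, untested change to the consuming pipeline.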
Environment Tracking and Deployment Visibility
GitLab Environments give developers something genuinely useful: a live view of what's deployed where. When a deploy job uses the environment: keyword (as the component above does), GitLab tracks which commit is running in staging, which is in production, and exposes one-click rollbacks to any previous deployment.
This is underused. Most teams configure it for production because it's required for protected environments, and ignore it elsewhere. The real value comes from using it consistently across all environments so developers can self-diagnose: "My staging deploy was three hours ago and production was last Tuesday. Is that why the behavior differs?"
Combine environment tracking with GitLab's required approval rules for protected environments and you get a lightweight change management system without ServiceNow. Any deploy to production requires approval from a member of the platform team or a senior engineer. The audit trail is automatic and lives in the same tool everyone already uses.
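Consistent tracking mostly means putting an environment: block on every deploy job, not just production. A sketch for a per-branch review environment (job names, URLs, and the auto-stop window are illustrative; the deploy scripts are hypothetical):

```yaml
deploy-review:
  stage: deploy
  script:
    - ./deploy.sh "review-$CI_COMMIT_REF_SLUG"   # hypothetical deploy script
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    url: https://$CI_COMMIT_REF_SLUG.review.example.com
    on_stop: stop-review
    auto_stop_in: 3 days   # GitLab tears the environment down automatically

stop-review:
  stage: deploy
  script:
    - ./teardown.sh "review-$CI_COMMIT_REF_SLUG"
  environment:
    name: review/$CI_COMMIT_REF_SLUG
    action: stop
  when: manual
```

With this in place, every branch shows up in the Environments view with its current commit, and stale review environments clean themselves up.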
Security as a Default, Not an Afterthought
The canonical compliance failure: security scanning is a separate pipeline, run by a separate team, on a separate schedule. Findings arrive weeks after the code merged. Nobody fixes them because the context is gone and the engineer has moved on.
GitLab's native scanners—SAST, DAST, dependency scanning, container scanning, secret detection—exist to make security part of every pipeline. They only work if they're in every pipeline. Base templates are exactly what makes that true.
Embed all relevant scanner types into your base component template. Set allow_failure: false for genuinely blocking categories (secrets committed to code, critical CVEs in production images). Use allow_failure: true for informational findings so they appear in the Security Dashboard but don't block releases. Teams get coverage automatically. The platform team can tune thresholds centrally when policies change.
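In the base template, that tuning might look like the following sketch (the job names match GitLab's bundled templates; which categories block is the platform team's call):

```yaml
include:
  - template: Security/Secret-Detection.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml

# A committed secret is always a hard stop.
secret_detection:
  allow_failure: false

# Container CVEs land in the Security Dashboard without blocking the release.
container_scanning:
  allow_failure: true
```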
IV. Bridging GitLab to GCP
Workload Identity Federation: Eliminate Long-Lived Keys
The old authentication pattern: create a GCP service account, download a JSON key file, base64-encode it, store it as a GitLab CI variable. That key never rotates, it's accessible to anyone with access to the CI variable store, and when an engineer leaves, nobody remembers to delete the service account.
Workload Identity Federation replaces this with cryptographic trust assertions. GitLab issues an OIDC token to every CI job automatically. Google Cloud verifies that token directly against GitLab's identity infrastructure. No key files. No rotation schedule. No secrets in your variable store.
The setup is a one-time operation per GCP project:
# Create the Workload Identity Pool
gcloud iam workload-identity-pools create "gitlab-pool" \
  --project="YOUR_PROJECT_ID" \
  --location="global" \
  --display-name="GitLab CI Runners"

# Configure the OIDC provider
gcloud iam workload-identity-pools providers create-oidc "gitlab-provider" \
  --project="YOUR_PROJECT_ID" \
  --location="global" \
  --workload-identity-pool="gitlab-pool" \
  --display-name="GitLab OIDC" \
  --attribute-mapping="google.subject=assertion.sub,attribute.project_path=assertion.project_path" \
  --issuer-uri="https://gitlab.com"

# Bind a service account to a specific GitLab project
gcloud iam service-accounts add-iam-policy-binding \
  platform-deployer@YOUR_PROJECT_ID.iam.gserviceaccount.com \
  --project="YOUR_PROJECT_ID" \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/gitlab-pool/attribute.project_path/your-org/your-repo"

In the pipeline component, job authentication becomes a single block:
deploy:
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/gitlab-pool/providers/gitlab-provider
  script:
    - echo "$GITLAB_OIDC_TOKEN" > /tmp/oidc_token
    # Generate a credential configuration that references the token file;
    # the raw OIDC token itself is not a valid --cred-file.
    - gcloud iam workload-identity-pools create-cred-config
      "projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/gitlab-pool/providers/gitlab-provider"
      --service-account="platform-deployer@YOUR_PROJECT_ID.iam.gserviceaccount.com"
      --credential-source-file=/tmp/oidc_token
      --output-file=/tmp/cred.json
    - gcloud auth login --cred-file=/tmp/cred.json
    - gcloud run deploy ...

The CI job gets an ephemeral GCP access token scoped to that job. When the job ends, the token expires. There's nothing to rotate and nothing to steal.
Scope bindings tightly. The attribute.project_path attribute in the IAM binding restricts authentication to jobs running from a specific GitLab project. Avoid using attribute.namespace_path (which covers all projects in a group) unless you genuinely need that breadth—it's a significantly larger attack surface.
Infrastructure as Code as a Service
Most platform teams build Terraform modules for their standard infrastructure patterns. Fewer take the next step: wrapping those modules in an interface simple enough that developers never touch Terraform directly.
The pattern: developers commit a declarative configuration file to their repo. A platform-owned pipeline reads that file and invokes the appropriate Terraform module. The developer never interacts with HCL, state backends, or provider configuration.
A developer-facing service declaration:
# infra/service.yaml
name: payments-api
team: payments
cost_center: "cc-payments-001"
tier: standard  # standard | premium | batch
runtime:
  memory: 512Mi
  cpu: 1
  max_instances: 20
  min_instances: 2  # keep warm in production
database:
  type: postgres
  version: "15"
  tier: db-g1-small  # platform validates against tier limits
iam:
  invokers:
    - serviceAccount:api-gateway@acme-prod-391045.iam.gserviceaccount.com

The platform pipeline translates this into Terraform and applies it. The underlying module:
module "cloud_run_service" {
  source = "git::https://gitlab.company.com/platform/tf-modules//cloud-run?ref=v3.2.0"

  service_name    = var.name
  container_image = var.image_uri
  region          = local.region
  project_id      = local.gcp_project
  memory_mib      = local.memory_mib
  cpu             = var.runtime.cpu
  max_instances   = var.runtime.max_instances
  min_instances   = var.runtime.min_instances

  cloudsql_instances = var.database != null ? [module.postgres[0].connection_name] : []
  invoker_members    = var.iam.invokers
}

module "postgres" {
  count  = var.database != null ? 1 : 0
  source = "git::https://gitlab.company.com/platform/tf-modules//cloud-sql-postgres?ref=v2.0.0"

  instance_name    = "${var.name}-db"
  database_version = "POSTGRES_${var.database.version}"
  tier             = var.database.tier
  project_id       = local.gcp_project
}

Developers get their infrastructure in one commit. The platform team controls the module versions centrally. When Google releases a new recommended configuration for Cloud Run, the platform updates the module—and every service picks it up on its next deploy without any action from the application team.
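The glue between the declaration and the modules is a platform-owned CI job. A minimal sketch, assuming a hypothetical render-tfvars helper that schema-validates infra/service.yaml and emits Terraform inputs (the image name and backend prefix are also assumptions):

```yaml
infra-apply:
  stage: deploy
  image: registry.company.com/platform/tf-runner:1.7   # assumed image with terraform + helpers
  script:
    # Hypothetical helper: validates the declaration against the platform
    # schema, rejects unknown keys, and writes an auto-loaded tfvars file.
    - render-tfvars infra/service.yaml > service.auto.tfvars.json
    - terraform init -backend-config="prefix=services/${CI_PROJECT_PATH}"
    - terraform plan -out=tfplan
    - terraform apply -auto-approve tfplan
  environment:
    name: production
```

Because the tfvars file is generated rather than hand-written, the declaration file is the only interface developers ever see.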
V. The Cultural Shift: Managing the Transition
The hardest part of building an IDP has nothing to do with Terraform or GitLab YAML. It's getting people to use it.
Treat Platform as a Product
Your platform team has internal customers. Treat it that way. This means:
- Running user research. Interview the teams that are supposed to use the platform. What's actually painful about their current workflow? What have they hacked together to work around the official tooling? Those workarounds are your backlog.
- Measuring adoption explicitly. Track what percentage of new services are created through the platform's golden path versus ad-hoc. Track time-to-first-production-deploy for new services. These are product metrics.
- Writing internal release notes. When you update a component or add a new module, announce it. Explain what changed and why. Make the upgrade path explicit. Your internal developers are first-class users.
- Providing a migration path for legacy services. The worst outcome is a two-tier system where new services use the platform and legacy services accumulate technical debt unchecked. Offer migration support. Make the before/after comparison concrete.
The fastest way to kill adoption: force teams to use the platform for everything immediately. The fastest way to build adoption: make the first 80% of what teams do significantly easier, and leave well-documented escape hatches for the unusual 20%.
Guardrails, Not Gates
A common failure mode: the platform team gets compliance requirements from security or legal, implements them as hard pipeline blocks, and then spends the next quarter fielding exceptions. Engineers learn to route around the platform rather than work with it.
The better model is policy as code with clear escalation paths. Open Policy Agent (OPA) lets you express infrastructure policies in a language (Rego) evaluated against your Terraform plans before apply, your Kubernetes manifests before admission, or your service declarations before the pipeline runs.
# policies/cloudsql.rego
package platform.cloudsql

# Block: Cloud SQL with public IP and no allow-list
deny[msg] {
  r := input.resource_changes[_]
  r.type == "google_sql_database_instance"
  r.change.after.settings[0].ip_configuration[0].ipv4_enabled == true
  count(r.change.after.settings[0].ip_configuration[0].authorized_networks) == 0
  msg := sprintf(
    "Cloud SQL instance '%v': public IP enabled with no authorized networks. Use private IP or add an IP allow-list.",
    [r.address]
  )
}

# Warn: oversized instances that likely weren't intentional
warn[msg] {
  r := input.resource_changes[_]
  r.type == "google_sql_database_instance"
  r.change.after.settings[0].tier == "db-n1-standard-16"
  msg := sprintf(
    "Instance '%v' uses db-n1-standard-16 (~$800/month). Confirm this size is necessary before applying.",
    [r.address]
  )
}

The critical distinction: deny rules block the pipeline; warn rules post a comment on the merge request and require only an explicit acknowledgment. Developers see the policy, understand why it exists, and can request an exception through a documented process—instead of discovering at 3am that their deploy is blocked by a rule nobody explained to them.
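One common way to wire these rules into the pipeline is conftest, which evaluates Rego policies against a JSON-rendered Terraform plan. A sketch (the image tag, stage name, and artifact layout are assumptions):

```yaml
policy-check:
  stage: validate
  image: openpolicyagent/conftest:v0.49.0   # assumed pinned version
  needs: [infra-plan]   # assumes a prior job exported tfplan.json as an artifact
  script:
    # deny results fail the job; warn results print without failing by default
    - conftest test tfplan.json --policy policies/
```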
Measuring Platform ROI
You will eventually need to justify the platform team's existence to someone who wants to know why engineers are building tooling instead of shipping features. Measure these:
| Metric | Before IDP (typical) | Target with IDP |
|---|---|---|
| Time from "new service" to first production deploy | 5–10 days (tickets, reviews, manual setup) | < 4 hours |
| Platform team tickets per new service | 3–6 | 0 (self-service) |
| Security scan coverage (% of pipelines) | 30–50% (opt-in, forgotten) | 100% (default) |
| Mean time to rollback | 15–45 min (manual, documented poorly) | < 2 min (one-click in GitLab UI) |
| CI YAML lines per service repo | 150–400 | 3–10 (component includes) |
VI. The Payoff
Platform engineering is a long game. The returns compound: every team that adopts the golden path is one fewer source of ad-hoc infrastructure, one more pipeline getting automatic security scanning, one fewer service with a custom runbook that only two people understand.
The YAML fatigue problem doesn't go away by writing better YAML. It goes away when the engineers shipping features don't need to write or understand YAML at all. The platform team's job is to make that true—and to make the abstractions good enough that teams choose them voluntarily, not because they're forced to.
GitLab gives you the raw materials: component catalog, environment tracking, OIDC-based GCP authentication, a mature CI/CD engine, and a security dashboard that actually works when you feed it data. What it doesn't give you is the opinionated defaults, the internal marketing, or the patience to migrate 40 legacy services. Those require judgment that only your team has.
Where to Start
- Week 1: Audit your most common pipeline pattern. How many repos copy-paste some version of it? That's your first component candidate.
- Week 2–4: Build and publish that component in the catalog. Migrate three repos to it. Collect feedback before building anything else.
- Month 2: Set up Workload Identity Federation for at least one GCP project. Remove the first service account key from your variable store.
- Month 3: Write one OPA policy for your most-violated infrastructure rule. Make it a warn, not a deny. Tune it based on the exceptions you actually see.
- Month 4+: Build the developer-facing service declaration format. Start migrating new services to it first, then tackle the legacy backlog.
What does your CI/CD bottleneck look like right now? Where are developers hitting a wall and filing tickets instead of deploying themselves? That's usually the right place to start—and the answer is usually more specific than "our pipelines are slow."