
Import from Kubernetes

In modern cloud-native environments, applications run across dozens of Kubernetes namespaces, each hosting multiple Deployments, Services, and other resources. Manually onboarding these Kubernetes workloads into the Harness Software Catalog quickly becomes error-prone and unsustainable.

This script helps you discover and add your Kubernetes resources to the Harness Internal Developer Portal catalog automatically. It's especially useful when you have many Kubernetes resources across multiple namespaces that would be time-consuming to add manually.

The script follows a comprehensive workflow:

  1. Scans your Kubernetes cluster and finds all your resources
  2. Generates IDP-compatible YAML files for each resource
  3. Commits these files to a central Git repository (GitHub)
  4. Registers them with Harness IDP through the Entities API

This workflow ensures you have version control for all your catalog entities and can track changes over time.

By automatically analyzing Deployments, Services, and their interdependencies, the script ensures the catalog reflects a near real-time view of your cluster without requiring manual intervention.

Script Source

Source Code

curl -o kubernetes-harness-idp-catalog-sync.py https://raw.githubusercontent.com/harness-community/idp-samples/refs/heads/main/IDP-2.0-Samples/catalog-scripts/kubernetes-harness-idp-catalog-sync.py

Prerequisites

Local Environment

This script is designed to run on your local machine or in a CI/CD pipeline with access to both your Kubernetes cluster and GitHub. You'll need:

  • Python 3 with the following libraries installed:

    pip install requests python-dotenv kubernetes

Kubernetes Access

  • Access to your Kubernetes cluster via properly configured kubectl and kubeconfig

  • Permissions to list and get deployments, services, and other resources

  • For local use, make sure you're connected to the right cluster context:

    kubectl config current-context
    # If needed, switch context
    kubectl config use-context <your-context>
  • A .env file configured with the following environment variables:

HARNESS_API_KEY='<harness-api-key>'
HARNESS_ACCOUNT_ID='<harness-account-id>'
ORG_IDENTIFIER='<harness-org-id>'
PROJECT_IDENTIFIER='<harness-project-id>'
CONNECTOR_REF='<harness-git-connector-ref>'
CENTRAL_REPO='<name-of-central-repo-to-store-yamls>'
GITHUB_TOKEN='<github-token>'
GITHUB_ORG='<github-org-name>'

HARNESS_API_KEY must have write access to IDP entities. GITHUB_TOKEN requires repo and read:org scopes. CONNECTOR_REF is the Git connector reference in Harness pointing to your CENTRAL_REPO.
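Before running the script, it is worth confirming that all eight variables are actually set. The helper below is a minimal sketch (not part of the script itself); the key names match the .env entries above:

```python
import os

# The eight variables the script expects (names taken from the .env above)
REQUIRED_KEYS = [
    "HARNESS_API_KEY", "HARNESS_ACCOUNT_ID", "ORG_IDENTIFIER",
    "PROJECT_IDENTIFIER", "CONNECTOR_REF", "CENTRAL_REPO",
    "GITHUB_TOKEN", "GITHUB_ORG",
]

def missing_config(env=None):
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

Calling `missing_config()` after `load_dotenv()` and failing fast on a non-empty result gives a clearer error than a mid-run API rejection.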

Execution

Run the script with namespace and dependency analysis flags:

python3 kubernetes-harness-idp-catalog-sync.py --namespace <namespace-name> --analyze-dependencies

Options:

  • --namespace (optional): Limit discovery to a specific namespace. Defaults to all namespaces.
  • --resource-kind (optional): Filter by resource type (Deployment, Service, Pod). Defaults to Deployments and Services.
  • --analyze-dependencies (flag): Enables detection of service-to-deployment dependencies based on selectors and environment variables.
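The interface above can be modeled with argparse. This is an illustrative sketch, not the script's actual parser; the defaults are assumptions based on the option descriptions:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Sync Kubernetes resources into the Harness IDP catalog"
    )
    # Limit discovery to one namespace; None means all namespaces
    parser.add_argument("--namespace", default=None)
    # Restrict discovery to a single resource kind
    parser.add_argument("--resource-kind",
                        choices=["Deployment", "Service", "Pod"], default=None)
    # Opt-in flag for service-to-deployment dependency analysis
    parser.add_argument("--analyze-dependencies", action="store_true")
    return parser
```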

What the Script Does

  1. Connects to your Kubernetes cluster using kubeconfig
  2. Discovers Deployments, Services (and optionally Pods)
  3. Generates Harness-compatible idp.yaml for each resource
  4. Pushes each YAML file into a GitHub central repo at a structured path
  5. Registers the entity in Harness IDP via the Entities API

Resource Discovery Logic

The script discovers Kubernetes resources using the official Kubernetes Python client:

# Initialize API client
v1 = client.CoreV1Api()
apps_v1 = client.AppsV1Api()

It uses different API methods based on resource types and filters:

  • For Deployments: apps_v1.list_namespaced_deployment() or apps_v1.list_deployment_for_all_namespaces()
  • For Services: v1.list_namespaced_service() or v1.list_service_for_all_namespaces()

Each resource is extracted with its complete metadata including:

  • Name, namespace, kind
  • Labels and selectors
  • Environment variables (for Deployments)
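To make that extracted shape concrete, here is an illustrative helper (not the script's actual code) that flattens a Deployment into those fields, using plain dicts in place of the kubernetes-client objects:

```python
def summarize_deployment(dep):
    """Collapse a Deployment (shown here as nested dicts mirroring the
    Kubernetes API shape) into the fields the catalog sync cares about."""
    meta = dep["metadata"]
    containers = dep["spec"]["template"]["spec"]["containers"]
    # Merge env vars across all containers into one name -> value map
    env = {
        e["name"]: e.get("value", "")
        for c in containers for e in c.get("env", [])
    }
    return {
        "name": meta["name"],
        "namespace": meta["namespace"],
        "kind": "Deployment",
        "labels": meta.get("labels", {}),
        "env": env,
    }
```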

Dependency Detection Mechanism

The script employs two complementary methods to detect dependencies between resources:

  1. Service-to-Deployment Mapping:

    # Check if deployment labels match service selector
    if service_selector and all(
        deployment_labels.get(k) == v
        for k, v in service_selector.items()
        if k in deployment_labels
    ):
        implementing_deployments.append(resource["name"])

    This identifies which Deployments implement each Service by comparing Service selectors with Deployment labels.

  2. Environment Variable Analysis:

    # Look for service name in environment variables
    if service_name.lower() in env_value.lower():
        dependencies.append({
            "name": service_name,
            "identifier": service_id,
            "type": "Service"
        })

    This detects when one resource references another via environment variables, revealing implicit dependencies.
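Both heuristics are simple enough to express as standalone functions. The sketch below is illustrative (the script inlines this logic), but it mirrors the two snippets above:

```python
def implements_service(service_selector, deployment_labels):
    """Selector match: every selector key that appears in the labels
    must agree in value, and the selector must be non-empty."""
    return bool(service_selector) and all(
        deployment_labels.get(k) == v
        for k, v in service_selector.items()
        if k in deployment_labels
    )

def env_dependencies(env, services):
    """Env-var scan: a service becomes a dependency when its name
    appears (case-insensitively) in any environment variable value."""
    deps = []
    for name, identifier in services:
        if any(name.lower() in value.lower() for value in env.values()):
            deps.append({"name": name, "identifier": identifier,
                         "type": "Service"})
    return deps
```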

YAML Generation and Entity Creation

The script dynamically generates Harness-compatible entity definitions with these key features:

  1. Deterministic Identifiers:

    # Generate a unique Harness identifier based on resource metadata
    name = f"{resource_namespace}_{resource_kind}_{resource_name}".lower()

    Creates stable, consistent identifiers using namespace, kind, and name.

  2. Rich Metadata:

    metadata:
      description: "Kubernetes {resource['kind']} {resource['name']} in namespace {resource['namespace']}"
      tags:
        - kubernetes
        - auto-onboarded
        - {resource['namespace']}
        - {resource['kind'].lower()}

    Includes descriptive information and automatic tagging.

  3. Dependency Relationships:

    # Add dependsOn section as a proper array
    idp_yaml += '\n  dependsOn:'
    for dep in dependencies:
        idp_yaml += f'\n    - Component:{dep["name"]}'

    Maps the discovered dependencies into Harness relationship format.
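The identifier scheme can be sketched as a small function. Note the underscore sanitization is an assumption here (Harness identifiers allow only alphanumerics and underscores); the snippet above shows only the lowercasing step:

```python
import re

def harness_identifier(namespace, kind, name):
    """Deterministic identifier: namespace_kind_name, lowercased,
    with any other characters replaced by underscores (assumed rule)."""
    raw = f"{namespace}_{kind}_{name}".lower()
    return re.sub(r"[^a-z0-9_]", "_", raw)
```

Running the same resource through this function always yields the same identifier, which is what lets repeated syncs update entities instead of duplicating them.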

GitHub Integration

The script interfaces with GitHub's API to store entity definitions:

  1. Path Organization:

    file_path = f"{resource['namespace']}/{resource['kind'].lower()}/{resource['name']}/idp.yaml"

    Creates a logical folder structure based on Kubernetes hierarchy.

  2. Smart File Operations:

    # Check if file exists and get its SHA
    file_sha = None
    try:
        check_response = requests.get(url, headers=GITHUB_HEADERS)
        if check_response.status_code == 200:
            file_sha = check_response.json()["sha"]
    except requests.RequestException:
        pass  # treat the file as new if the lookup fails

    Checks if files already exist before creating or updating them.
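GitHub's "create or update file contents" endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`) takes base64-encoded content and requires the current blob `sha` when updating an existing file. A sketch of the request body the script would send (illustrative helper, not the script's actual function):

```python
import base64

def contents_payload(yaml_text, message, existing_sha=None):
    """Build the JSON body for GitHub's contents API: base64 content,
    commit message, and the blob sha only when updating."""
    payload = {
        "message": message,
        "content": base64.b64encode(yaml_text.encode()).decode(),
    }
    if existing_sha:
        payload["sha"] = existing_sha
    return payload
```

Omitting `sha` for an existing file makes GitHub reject the request with a 409/422, which is why the script looks the file up first.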

Harness Catalog Registration

The script registers entities with Harness using the Entities API:

  1. API Integration:

    harness_url = (
        f"https://qa.harness.io/v1/entities"
        f"?convert=false&dry_run=false"
        f"&orgIdentifier={ORG_IDENTIFIER}&projectIdentifier={PROJECT_IDENTIFIER}"
    )
  2. Intelligent Retries:

    # If entity already exists, try with operationMode=UPSERT
    if response.status_code == 400 and "already exists" in response.text.lower():
        print("Trying with UPSERT mode...")

        # Try with UPSERT mode
        upsert_url = (
            f"https://qa.harness.io/v1/entities"
            f"?convert=false&dry_run=false&operationMode=UPSERT"
            f"&orgIdentifier={ORG_IDENTIFIER}&projectIdentifier={PROJECT_IDENTIFIER}"
        )

    The script automatically retries with UPSERT mode if entities already exist.
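The two URLs differ only in the `operationMode=UPSERT` query parameter, so they can be built from one helper (an illustrative refactor, not the script's code; the base URL matches the snippets above):

```python
def entities_url(org, project, upsert=False, base="https://qa.harness.io"):
    """Build the Harness Entities API URL; UPSERT mode is appended
    only on the retry path after an 'already exists' 400."""
    query = "convert=false&dry_run=false"
    if upsert:
        query += "&operationMode=UPSERT"
    query += f"&orgIdentifier={org}&projectIdentifier={project}"
    return f"{base}/v1/entities?{query}"
```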

Output Structure

The GitHub repo will store files in the following format:

central-repo/
├── <namespace>/
│   ├── deployment/
│   │   └── my-deployment/idp.yaml
│   └── service/
│       └── my-service/idp.yaml

Each YAML will look like:

apiVersion: harness.io/v1
kind: component
orgIdentifier: default
projectIdentifier: sd2
type: Service
identifier: default_deployment_myapp
name: "myapp"
owner: group:account/IDP_Test
spec:
  dependsOn:
    - Component:my-service # automatically detected
  lifecycle: production
  type: kubernetes
  subtype: Deployment
metadata:
  description: "Kubernetes Deployment myapp in namespace default"
  tags:
    - kubernetes
    - auto-onboarded
    - default
    - deployment

Logs & Troubleshooting

  • Logs are printed to stdout for each resource:

    • Discovery status (Found N resources)
    • Entity creation (✓ Registered in Harness successfully)
    • Dependency detection (Detected dependency: ...)
  • Failures include Harness API error codes and response details.

  • If entity already exists, the script automatically retries with UPSERT mode.

  • If running on a personal GitHub account instead of an org, change the GitHub API call from:

https://api.github.com/orgs/{GITHUB_ORG}/repos

to:

https://api.github.com/users/{GITHUB_ORG}/repos