Skip to content

Conversation

nicolevanderhoeven
Copy link
Contributor

@nicolevanderhoeven nicolevanderhoeven commented Aug 25, 2025

This PR takes the existing CI tests for the Helm chart and copies them over into the new testing framework based on helm-chart-toolbox. The goal is to get the new tests running in parallel with the old tests until we decide that we can safely retire the old testing framework.

The new tests include:

  • distributed-advanced: Complex multi-tenant microservices architecture
  • enterprise: Licensing and enterprise authentication
  • ingress: routing and k8s networking
  • single-binary: simple monolithic deployment
  • ssd: simple scalable deployment

There were some tests from the old framework that I didn't move over, namely:

  • The distributed test, which had been disabled anyway because it's multi-node and the old framework couldn't test that (to be added back later in the new framework)
  • Legacy monitoring values, which uses Grafana Agent (now deprecated).

Copy link

claude bot commented Aug 25, 2025

Claude finished @nicolevanderhoeven's task —— View job


PR Review: Copy Helm CI tests into new framework

I've completed a comprehensive review of this PR that migrates Helm CI tests to the new helm-chart-toolbox framework. Here's my analysis:

Todo List:

  • Read and analyze all changed files
  • Review ingress test configuration
  • Review single-binary test configuration
  • Review SSD test configuration
  • Check for consistency across test plans
  • Identify potential bugs or issues
  • Assess security considerations
  • Provide comprehensive feedback

Review Findings

Positive Aspects

  1. Well-structured migration: The test organization follows a clear pattern with separate directories for each deployment mode (ingress, single-binary, ssd).

  2. Consistent test plan format: All test plans follow the helm-chart-toolbox API specification properly with consistent structure.

  3. Appropriate resource checks: Each test validates the correct Kubernetes objects for its deployment mode:

    • Ingress test checks for Ingress resource
    • Single-binary checks for StatefulSet (loki) and associated services
    • SSD checks for distributed components (backend, read, write as separate resources)
  4. Good test coverage: Tests include both Kubernetes object validation and query functionality testing.

🔍 Issues and Recommendations

Critical Issues
  1. Inconsistent log-generator URLs (production/helm/loki/test/integration/single-binary/log-generator.yaml:47 and production/helm/loki/test/integration/ssd/log-generator.yaml:47)

    • Single-binary uses: http://loki.loki.svc:3100/loki/api/v1/push
    • SSD uses: http://loki-gateway.loki.svc/loki/api/v1/push
    • Issue: Single-binary should also use the gateway service for consistency and proper routing.
  2. Inconsistent query endpoints (production/helm/loki/test/integration/single-binary/test-plan.yaml:76 vs production/helm/loki/test/integration/ssd/test-plan.yaml:88)

    • Single-binary: http://loki.loki.svc:3100/loki/api/v1/query
    • SSD: http://loki-gateway.loki.svc/loki/api/v1/query
    • Issue: Similar inconsistency in query endpoints.
Minor Issues
  1. Missing log-generator dependency: The ingress test doesn't include a log-generator dependency, so the query test count_over_time({source="log-generator"}[1h]) referenced in other tests wouldn't work for ingress testing.

  2. Hardcoded tenant ID: All configurations use tenant ID "1" hardcoded. While this works for testing, consider if this should be configurable.

  3. Resource naming inconsistency:

    • Single-binary test expects StatefulSet named loki
    • SSD test expects StatefulSet named loki-backend, loki-write but Deployment named loki-read
    • Verify these match the actual Helm chart resource names.

🛡️ Security Considerations

  • Low Risk: The configurations use test schemas and minimal security configs, which is appropriate for testing
  • Good Practice: Tenant isolation is maintained with explicit tenant IDs
  • Consideration: The use of X-Scope-OrgID header is correct for multi-tenant testing

🚀 Performance Considerations

  • Good: chunksCache.allocatedMemory=1024 is set appropriately for test environments
  • Good: Replica counts are set to 1 for test efficiency
  • Good: Test schema usage (useTestSchema: true) optimizes for testing scenarios

📋 Test Coverage Assessment

  • Excellent: Tests cover both infrastructure (Kubernetes objects) and functionality (LogQL queries)
  • Good: Each deployment mode has appropriate resource validation
  • Missing: Integration testing between components (e.g., gateway → backend communication)

Recommendations

  1. Fix URL inconsistencies: Standardize on using gateway service URLs for both log ingestion and queries across all test configurations.

  2. Add log-generator to ingress test: Include the log-generator dependency in the ingress test plan if query testing is desired.

  3. Validate resource names: Ensure the expected Kubernetes resource names in test plans match what the Helm chart actually creates.

  4. Consider parameterization: Make tenant IDs and other hardcoded values configurable if this framework will be extended.

Overall, this is a solid migration that maintains good testing practices while transitioning to the new framework. The identified issues are mostly consistency problems that should be addressed before merging.

@nicolevanderhoeven nicolevanderhoeven changed the title Copy Helm CI tests into new framework test: Copy Helm CI tests into new framework Aug 25, 2025
Copy link
Contributor

Helm Diff Output - Summary

Default Values Scenario-diff-output
Distributed Scenario-diff-output
Ingress Values Scenario-diff-output
Legacy Monitoring Values Scenario-diff-output
Simple Scalable AWS Kube IRSA Values Scenario-diff-output
Simple Thanos Values Scenario-diff-output
Single Binary Scenario-diff-output

@pull-request-size pull-request-size bot added size/XL and removed size/L labels Sep 2, 2025
@nicolevanderhoeven nicolevanderhoeven marked this pull request as ready for review September 3, 2025 14:46
@nicolevanderhoeven nicolevanderhoeven requested a review from a team as a code owner September 3, 2025 14:46
@QuentinBisson
Copy link
Contributor

QuentinBisson commented Sep 4, 2025

This is some really great work and this will definitely improve the maintenance. LGTM

@Jayclifford345 Jayclifford345 merged commit 62cec3e into main Sep 5, 2025
83 checks passed
@Jayclifford345 Jayclifford345 deleted the nvdh-helm-testing branch September 5, 2025 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants