Conversation

@HynoR HynoR commented Jul 25, 2025

This pull request introduces support for ClickHouse as a database option, primarily for log storage, along with necessary adjustments to accommodate its unique features. The changes span configuration, database initialization, and migration logic.

Ready for review

Adds ClickHouse support for the log database, intended for large-scale deployments that store a lot of logs.
Consumption logs can all be stored there; ClickHouse is better suited to log storage, compression, and analysis, and its TTL can be adjusted to automatically clean up stale logs.
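For reference, a minimal sketch of the table options that make the TTL-based cleanup work (the exact DDL lives in model/main.go; the 90-day interval here is illustrative, standing in for the configured retention):

// Hedged sketch: ClickHouse drops rows automatically once they age past the TTL.
tableOptions := "ENGINE=MergeTree() ORDER BY (created_at, id) " +
	"PARTITION BY toYYYYMM(toDateTime(created_at)) " + // one partition per month
	"TTL toDateTime(created_at) + INTERVAL 90 DAY"     // illustrative retention window
err := LOG_DB.Set("gorm:table_options", tableOptions).AutoMigrate(&Log{})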

docker-compose deployment

version: '3.8' 

services:
  new-api:
    image: calciumion/new-api:dev-ch # build and package the Docker image yourself
    container_name: new-api
    restart: always
    command: --log-dir /app/logs
    ports:
      - "3000:3000"
    volumes:
      - ./data:/data
      - ./logs:/app/logs
    environment:
      - SQL_DSN=root:123456@tcp(mysql:3306)/new-api
      - REDIS_CONN_STRING=redis://redis
      - TZ=Asia/Shanghai
      - LOG_SQL_DSN=clickhouse://admin:123456@clickhouse-db:9000/default
      - ERROR_LOG_ENABLED=true
    depends_on:
      redis:
        condition: service_started
      mysql:
        condition: service_healthy
      clickhouse-db:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "wget -q -O - http://localhost:3000/api/status | grep -q '\"success\":true'"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s 

  redis:
    image: redis:latest
    container_name: redis
    restart: always

  mysql:
    image: mysql:8.2
    container_name: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: 123456
      MYSQL_DATABASE: new-api
    volumes:
      - mysql_data:/var/lib/mysql
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost", "-u", "root", "-p123456"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

  clickhouse-db:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse-db
    restart: always
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: admin
      CLICKHOUSE_PASSWORD: 123456
    volumes:
      - clickhouse_data:/var/lib/clickhouse
      - clickhouse_logs:/var/log/clickhouse-server
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

volumes:
  mysql_data:
  clickhouse_data:
  clickhouse_logs:

Summary by CodeRabbit

  • New Features

    • Added ClickHouse support for log storage with driver integration, tailored migrations, partitioning, TTL and optimized indexes.
  • Bug Fixes

    • Updated log query ordering and token-sum expressions to produce correct results across ClickHouse and other databases.
  • Chores

    • Updated dependencies to add ClickHouse drivers and related telemetry, compression, and utility libraries.


coderabbitai bot commented Jul 25, 2025

Walkthrough

ClickHouse support was added for the log database: new constant and flag in common, ClickHouse driver and deps in go.mod, ClickHouse-aware ordering/SQL in log queries, and ClickHouse-specific DB initialization and LOG migration logic in model/main.go.

Changes

  • Common defs (common/database.go): Added DatabaseTypeClickHouse = "clickhouse" constant and var UsingClickHouse = false.
  • Dependencies (go.mod): Added gorm.io/driver/clickhouse@v0.6.0 and multiple ClickHouse/telemetry/compression-related indirect dependencies; updated versions.
  • Log queries (model/log.go): Use created_at DESC ordering when UsingClickHouse; use coalesce(...) for null handling in SumUsedToken on ClickHouse.
  • DB init & migrations (model/main.go): Detect clickhouse:// DSNs for the log DB, open GORM with the ClickHouse driver (prepared statements disabled, FKs off), set UsingClickHouse, apply ClickHouse table options (MergeTree, PARTITION BY month, TTL), create indices, and adjust log-column quoting for ClickHouse. A sketch of this DSN handling follows below.
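A minimal sketch of the DSN detection described above (the function shape and anything beyond the names listed in common/database.go are assumptions, not the PR's exact code):

import (
	"strings"

	"gorm.io/driver/clickhouse"
	"gorm.io/gorm"
)

// openLogDB is a hypothetical helper illustrating the clickhouse:// branch.
func openLogDB(dsn string) (*gorm.DB, error) {
	if strings.HasPrefix(dsn, "clickhouse://") {
		common.LogSqlType = common.DatabaseTypeClickHouse
		common.UsingClickHouse = true
		return gorm.Open(clickhouse.Open(dsn), &gorm.Config{
			PrepareStmt:                              false, // prepared statements disabled for ClickHouse
			DisableForeignKeyConstraintWhenMigrating: true,  // ClickHouse has no FK constraints
		})
	}
	return nil, nil // fall through to the standard SQL drivers
}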

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant Model
    participant Common
    participant ClickHouse
    participant SQLDB

    App->>Model: chooseDB(envName, isLog)
    Model->>Common: read DSN/type
    alt DSN starts with clickhouse:// and isLog
        Model->>ClickHouse: open GORM (clickhouse driver, prep stmts off)
        Model->>Common: UsingClickHouse = true
        Model->>ClickHouse: migrateLOGDB (CREATE MergeTree, PARTITION, TTL, indexes)
    else
        Model->>SQLDB: open GORM with standard driver
        Model->>SQLDB: migrateLOGDB (AutoMigrate)
    end

    App->>Model: Log queries (GetLogByKey / GetAllLogs / Search / SumUsedToken)
    alt UsingClickHouse
        Model->>ClickHouse: query with ORDER BY created_at DESC / use coalesce()
    else
        Model->>SQLDB: query with ORDER BY id DESC / use ifnull()
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble code at break of dawn,
ClickHouse seeds in rows well-sown.
MergeTree dances, TTL alight,
Coalesce hums through log-filled night.
A rabbit hops — migrations done. 🐇✨


@HynoR HynoR changed the base branch from main to alpha July 25, 2025 10:38

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
model/log.go (1)

369-376: Consider supporting PostgreSQL's coalesce function

PostgreSQL supports coalesce but has no ifnull function, so the coalesce branch applies there as well. Consider updating the condition to include PostgreSQL for better database compatibility.

 func SumUsedToken(logType int, startTimestamp int64, endTimestamp int64, modelName string, username string, tokenName string) (token int) {
 	var selectSQL string
-	if common.LogSqlType == common.DatabaseTypeClickHouse {
+	if common.LogSqlType == common.DatabaseTypeClickHouse || common.LogSqlType == common.DatabaseTypePostgreSQL {
 		selectSQL = "coalesce(sum(prompt_tokens),0) + coalesce(sum(completion_tokens),0)"
 	} else {
 		selectSQL = "ifnull(sum(prompt_tokens),0) + ifnull(sum(completion_tokens),0)"
 	}
-	
 	tx := LOG_DB.Table("logs").Select(selectSQL)
model/main.go (1)

329-336: Consider adding TTL support for automatic log cleanup

Since the PR objectives mention that ClickHouse supports automatic cleanup through TTL settings, consider adding TTL configuration to the table options for automatic log retention management.

Example enhancement:

// Add TTL for automatic log cleanup after a configurable period
ttlDays := common.GetEnvOrDefault("LOG_RETENTION_DAYS", 90)
tableOptions := fmt.Sprintf("ENGINE=MergeTree() ORDER BY (created_at, id) PARTITION BY toYYYYMM(toDateTime(created_at)) TTL toDateTime(created_at) + INTERVAL %d DAY", ttlDays)
err = LOG_DB.Set("gorm:table_options", tableOptions).AutoMigrate(&Log{})
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4d0037a and 53236b2.

📒 Files selected for processing (4)
  • common/database.go (1 hunks)
  • go.mod (2 hunks)
  • model/log.go (1 hunks)
  • model/main.go (4 hunks)
🔇 Additional comments (2)
common/database.go (1)

7-7: LGTM!

The ClickHouse database type constant and flag follow the established pattern for database support.

Also applies to: 14-14

model/main.go (1)

329-336: Ensure ClickHouse toDateTime matches your timestamp unit
ClickHouse’s toDateTime(UInt32) interprets its argument as seconds since the Unix epoch. Your Log.CreatedAt field is an int64—please confirm whether you’re populating it in seconds or milliseconds.

• If it’s already in seconds, the existing expression is correct.
• If it’s in milliseconds, adjust to:

ENGINE = MergeTree()
ORDER BY (created_at, id)
PARTITION BY toYYYYMM(toDateTime(created_at / 1000))

Locations to review:

  • model/log.go → Log.CreatedAt int64
  • model/main.go → migration snippet with toDateTime(created_at)

Please verify and update as needed.

@HynoR HynoR marked this pull request as ready for review July 25, 2025 10:41
@Calcium-Ion
Collaborator

There are merge conflicts.

@HynoR
Contributor Author

HynoR commented Aug 9, 2025

It's been a while; I'll do the merge a bit later.

@HynoR
Contributor Author

HynoR commented Aug 12, 2025

The conflicts are resolved.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
model/main.go (1)

158-172: Validate ClickHouse main database restriction and connection configuration.

The implementation correctly restricts ClickHouse to log databases only with appropriate panic handling. The GORM configuration properly disables prepared statements and foreign key constraints for ClickHouse compatibility.

The previous review comment about documenting the ClickHouse DSN format still applies. Users need clear documentation on the connection string format.

🧹 Nitpick comments (2)
model/main.go (2)

201-201: Fix inconsistent comment formatting.

The comment has inconsistent formatting and appears to be incomplete or misplaced.

-			// For Gorm NewVersion:	DisableForeignKeyConstraintWhenMigrating: true,  Disable FK constraints during migration
+			// Note: DisableForeignKeyConstraintWhenMigrating can be enabled if needed for migration compatibility

403-403: Enhance logging message clarity.

The success message could be more informative about what was actually configured.

-		common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL, and indices", retentionDays))
+		common.SysLog(fmt.Sprintf("ClickHouse log database migrated successfully: MergeTree engine with monthly partitioning, %d-day TTL, compression enabled, and secondary indices created", retentionDays))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e3c863 and de080aa.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • go.mod (2 hunks)
  • model/main.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod
🧰 Additional context used
🧬 Code Graph Analysis (1)
model/main.go (3)
common/database.go (3)
  • DatabaseTypeClickHouse (7-7)
  • LogSqlType (12-12)
  • UsingClickHouse (14-14)
common/logger.go (1)
  • SysLog (50-53)
model/log.go (1)
  • Log (17-37)
🔇 Additional comments (3)
model/main.go (3)

14-14: LGTM! ClickHouse driver import added correctly.

The GORM ClickHouse driver import is properly positioned with other database drivers.


46-49: LGTM! ClickHouse column quoting configured correctly.

The column quoting for ClickHouse uses backticks similar to MySQL, which aligns with ClickHouse's SQL syntax requirements for reserved keywords like group and key.


404-408: LGTM! Proper fallback for non-ClickHouse databases.

The else branch correctly maintains the existing AutoMigrate behavior for traditional databases, ensuring backward compatibility.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
model/main.go (1)

158-172: Document the ClickHouse DSN format where users will see it

This code assumes a URI-style DSN (clickhouse://user:pass@host:port/db?params...) for GORM’s clickhouse.Open. Please add a short example and guidance in README or configuration docs so operators don’t misconfigure.

Would you like me to draft the README block for DSN examples and common parameters (dial_timeout, read_timeout)?

🧹 Nitpick comments (2)
model/main.go (2)

375-385: Revisit ORDER BY for pruning and compression (time bucketing and selective dims first)

ORDER BY (toDateTime(created_at), user_id) places time first and lacks a time bucket. For log analytics, you typically get better pruning by:

  • Putting low-cardinality filters (e.g., user_id/tenant/service) first
  • Adding a time bucket before the raw timestamp
  • Ending with raw timestamp (and optionally a unique column)

If user_id is a common filter, consider:

-	ORDER BY (toDateTime(created_at), user_id)
+	ORDER BY (user_id, toStartOfHour(toDateTime(created_at)), created_at)

If most queries are time-only, a simpler improvement is to drop the cast in ORDER BY (keep it numeric for index efficiency):

-	ORDER BY (toDateTime(created_at), user_id)
+	ORDER BY (created_at, user_id)

Pick the key that best matches your WHERE patterns; verify with EXPLAIN and real workloads.


393-403: Index strategy: good choices, but decide failure policy for critical indices

Bloom filter on user_id and set index on model_name are appropriate for equality filters. You currently warn on failure; decide whether any index creation failure should abort migration (e.g., for user_id, if queries depend on it).

Optionally fail migration if creating the user_id bloom index fails:

-		for _, index := range indices {
-			if err = LOG_DB.Exec(index).Error; err != nil {
-				common.SysLog(fmt.Sprintf("Warning: Failed to create index: %s, error: %v", index, err))
-			}
-		}
+		for _, index := range indices {
+			if err = LOG_DB.Exec(index).Error; err != nil {
+				if strings.Contains(index, "idx_logs_user_bloom") {
+					return fmt.Errorf("failed to create critical index: %s, error: %w", index, err)
+				}
+				common.SysLog(fmt.Sprintf("Warning: Failed to create non-critical index: %s, error: %v", index, err))
+			}
+		}

Note: strings is already imported in this file.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de080aa and b015b7a.

📒 Files selected for processing (1)
  • model/main.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
model/main.go (3)
common/database.go (3)
  • DatabaseTypeClickHouse (7-7)
  • LogSqlType (12-12)
  • UsingClickHouse (14-14)
common/logger.go (1)
  • SysLog (50-53)
model/log.go (1)
  • Log (17-37)
🔇 Additional comments (6)
model/main.go (6)

14-14: Add ClickHouse driver import — looks good

Import path is correct for GORM’s ClickHouse driver and aligns with the new DSN handling.


46-49: Correct quoting for ClickHouse reserved identifiers

Backtick-quoting group/key for ClickHouse is appropriate (both group and key can conflict with SQL keywords).
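A minimal sketch of that quoting switch (the variable name and the UsingMySQL flag are assumptions):

// Reserved identifiers such as group and key need dialect-specific quoting.
logGroupCol := `"group"` // double quotes: PostgreSQL/SQLite style
if common.UsingClickHouse || common.UsingMySQL {
	logGroupCol = "`group`" // backticks: MySQL/ClickHouse style
}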


201-201: No action needed — comment-only change

This is just a comment; no functional impact.


371-406: ClickHouse MergeTree + TTL migration: baseline is solid

Engine selection, TTL, partitioning, and disabling FKs for GORM migrations are correct for ClickHouse. Good separation from non-ClickHouse path.


387-391: AutoMigrate won’t alter existing table engine/ORDER/TTL — verify first-run creation

If logs table already exists (from prior versions), AutoMigrate will not modify engine/ORDER BY/TTL. Ensure ClickHouse options are applied on first creation (clean install) or add an explicit path to recreate/alter when upgrading.

Would you like a guarded migration that checks existing table DDL (SHOW CREATE TABLE logs) and compares ORDER BY/TTL/engine, then prints actionable warnings or fails with a clear message if mismatched?
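A hedged sketch of such a guard (the substring checks are illustrative, not exhaustive):

// Inspect the existing DDL and warn when expected options are missing;
// AutoMigrate alone will not retrofit engine/ORDER BY/TTL.
var ddl string
if err := LOG_DB.Raw("SHOW CREATE TABLE logs").Scan(&ddl).Error; err == nil {
	if !strings.Contains(ddl, "MergeTree") || !strings.Contains(ddl, "TTL") {
		common.SysLog("Warning: existing logs table lacks the expected MergeTree engine or TTL; recreate it or apply ALTER TABLE manually")
	}
}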


375-385: All CreatedAt timestamps use seconds; no TTL unit mismatch
The search shows every CreatedAt assignment comes from time.Unix(…,0) or common.GetTimestamp() (both in seconds), not milliseconds. The toDateTime(created_at) in your DDL will therefore be correct—no change needed.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
model/main.go (1)

158-172: Document ClickHouse DSN format in project documentation.

The implementation correctly restricts ClickHouse to log databases only. Based on the GORM ClickHouse driver v0.6.1 documentation, the DSN format should be clearly documented for users.

Add this to your README or configuration documentation:

# ClickHouse DSN Format
clickhouse://user:password@host:port/database?param1=value1&param2=value2

Example:
LOG_SQL_DSN=clickhouse://admin:123456@clickhouse-db:9000/default?dial_timeout=10s&read_timeout=20s
🧹 Nitpick comments (3)
model/main.go (3)

201-201: Remove or update the outdated comment.

The comment references a "new version" feature that appears to already be in use. Either remove this comment or clarify what specific GORM version introduced this feature.

-			// For Gorm NewVersion:	DisableForeignKeyConstraintWhenMigrating: true,  Disable FK constraints during migration

373-373: Consider making retention configurable per model or user tier.

While the environment variable LOG_RETENTION_DAYS provides basic configurability, consider implementing tiered retention policies (e.g., longer retention for premium users or specific model types) for better flexibility.
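A hedged sketch of what a tiered policy could look like, using ClickHouse's WHERE-qualified TTL clauses (the user_tier column does not exist in this PR and is purely illustrative):

// Multiple TTL clauses, each scoped by a WHERE condition on a hypothetical tier column.
tableOptions := "ENGINE=MergeTree() ORDER BY (created_at, id) " +
	"PARTITION BY toYYYYMM(toDateTime(created_at)) " +
	"TTL toDateTime(created_at) + INTERVAL 30 DAY DELETE WHERE user_tier = 'free', " +
	"toDateTime(created_at) + INTERVAL 180 DAY DELETE WHERE user_tier = 'premium'"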


405-405: Enhance log message with index creation status.

Consider logging which indices were successfully created vs. failed to provide better visibility into the migration outcome.

-		common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL, and indices", retentionDays))
+		common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL", retentionDays))

Add a counter to track successful index creation:

successCount := 0
for _, index := range indices {
    if err = LOG_DB.Exec(index).Error; err != nil {
        common.SysLog(fmt.Sprintf("Warning: Failed to create index: %s, error: %v", index, err))
    } else {
        successCount++
    }
}
common.SysLog(fmt.Sprintf("Created %d/%d indices successfully", successCount, len(indices)))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de080aa and b015b7a.

📒 Files selected for processing (1)
  • model/main.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
model/main.go (3)
common/database.go (3)
  • DatabaseTypeClickHouse (7-7)
  • LogSqlType (12-12)
  • UsingClickHouse (14-14)
common/logger.go (1)
  • SysLog (50-53)
model/log.go (1)
  • Log (17-37)
🔇 Additional comments (4)
model/main.go (4)

14-14: LGTM! ClickHouse driver import added correctly.

The import is properly placed and follows the existing import structure.


46-48: LGTM! ClickHouse column quoting correctly configured.

The backtick quoting for group and key columns is appropriate for ClickHouse, as it follows MySQL-style quoting conventions.


393-403: LGTM! Appropriate index types for ClickHouse.

The bloom filter index for user_id and set index for model_name are well-suited for their respective cardinality and query patterns. The error handling with warnings is appropriate for non-critical index creation.


376-385: Verify partition strategy aligns with query patterns.

Monthly partitioning via toYYYYMM is appropriate for multi-year retention. However, ensure this aligns with your typical query patterns. If most queries span days rather than months, consider daily partitioning for better partition pruning.

Would you like to analyze your typical query patterns to determine the optimal partitioning strategy? I can help generate monitoring queries to track partition efficiency.
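For comparison, a hedged sketch of the daily-partitioning alternative (only worthwhile when day-scoped queries dominate, since it creates far more partitions; retentionDays is the variable introduced in the migration):

// toYYYYMMDD yields one partition per day instead of per month.
tableOptions := fmt.Sprintf(
	"ENGINE=MergeTree() ORDER BY (created_at, id) "+
		"PARTITION BY toYYYYMMDD(toDateTime(created_at)) "+
		"TTL toDateTime(created_at) + INTERVAL %d DAY", retentionDays)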
