Feat: Support ClickHouse For Log Storage #1435
base: alpha
Conversation
# Conflicts:
#	go.mod
#	go.sum
Walkthrough

ClickHouse support was added for the log database: a new database-type constant and flag in `common/database.go`, DSN-based driver selection and a ClickHouse-specific log-table migration (MergeTree engine, partitioning, TTL, indexes) in `model/main.go`, ClickHouse-aware query paths in `model/log.go`, and the GORM ClickHouse driver dependency in `go.mod`.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant App
    participant Model
    participant Common
    participant ClickHouse
    participant SQLDB
    App->>Model: chooseDB(envName, isLog)
    Model->>Common: read DSN/type
    alt DSN starts with clickhouse:// and isLog
        Model->>ClickHouse: open GORM (clickhouse driver, prep stmts off)
        Model->>Common: UsingClickHouse = true
        Model->>ClickHouse: migrateLOGDB (CREATE MergeTree, PARTITION, TTL, indexes)
    else
        Model->>SQLDB: open GORM with standard driver
        Model->>SQLDB: migrateLOGDB (AutoMigrate)
    end
    App->>Model: Log queries (GetLogByKey / GetAllLogs / Search / SumUsedToken)
    alt UsingClickHouse
        Model->>ClickHouse: query with ORDER BY created_at DESC / use coalesce()
    else
        Model->>SQLDB: query with ORDER BY id DESC / use ifnull()
    end
```
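To make the branch logic above concrete, here is a minimal Go sketch of the DSN-based selection the diagram describes; `chooseDB` and `UsingClickHouse` are real names from this PR, while the function body, return type, and MySQL fallback are assumptions for illustration.

```go
package model

import (
	"os"
	"strings"

	"gorm.io/driver/clickhouse"
	"gorm.io/driver/mysql"
	"gorm.io/gorm"
)

// chooseDB (assumed body): pick the GORM dialector from the DSN scheme.
// Per the diagram, the real code also sets common.UsingClickHouse = true
// on the ClickHouse path.
func chooseDB(envName string, isLog bool) (*gorm.DB, error) {
	dsn := os.Getenv(envName)
	if strings.HasPrefix(dsn, "clickhouse://") {
		if !isLog {
			// The PR restricts ClickHouse to the log database only.
			panic("ClickHouse is only supported for the log database")
		}
		// Prepared statements are disabled for the ClickHouse driver.
		return gorm.Open(clickhouse.Open(dsn), &gorm.Config{PrepareStmt: false})
	}
	// Fallback shown as MySQL purely for illustration.
	return gorm.Open(mysql.Open(dsn), &gorm.Config{PrepareStmt: true})
}
```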
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
model/log.go (1)
`369-376`: **Consider supporting PostgreSQL's coalesce function**

PostgreSQL also supports `coalesce` instead of `ifnull`. Consider updating the condition to include PostgreSQL for better database compatibility.

```diff
 func SumUsedToken(logType int, startTimestamp int64, endTimestamp int64, modelName string, username string, tokenName string) (token int) {
 	var selectSQL string
-	if common.LogSqlType == common.DatabaseTypeClickHouse {
+	if common.LogSqlType == common.DatabaseTypeClickHouse || common.LogSqlType == common.DatabaseTypePostgreSQL {
 		selectSQL = "coalesce(sum(prompt_tokens),0) + coalesce(sum(completion_tokens),0)"
 	} else {
 		selectSQL = "ifnull(sum(prompt_tokens),0) + ifnull(sum(completion_tokens),0)"
 	}
 	tx := LOG_DB.Table("logs").Select(selectSQL)
```

model/main.go (1)
`329-336`: **Consider adding TTL support for automatic log cleanup**

Since the PR objectives mention that ClickHouse supports automatic cleanup through TTL settings, consider adding TTL configuration to the table options for automatic log retention management.

Example enhancement:

```go
// Add TTL for automatic log cleanup after a configurable period
ttlDays := common.GetEnvOrDefault("LOG_RETENTION_DAYS", 90)
tableOptions := fmt.Sprintf("ENGINE=MergeTree() ORDER BY (created_at, id) PARTITION BY toYYYYMM(toDateTime(created_at)) TTL toDateTime(created_at) + INTERVAL %d DAY", ttlDays)
err = LOG_DB.Set("gorm:table_options", tableOptions).AutoMigrate(&Log{})
```
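The snippet above assumes an env-reading helper. If `common.GetEnvOrDefault` does not already exist in this codebase with that signature, a minimal sketch could look like this (the name and signature are taken from the snippet, not from the PR):

```go
package common

import (
	"os"
	"strconv"
)

// GetEnvOrDefault reads an integer environment variable and falls back
// to defaultValue when the variable is unset or not a valid integer.
func GetEnvOrDefault(key string, defaultValue int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return defaultValue
}
```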
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)

- `common/database.go` (1 hunks)
- `go.mod` (2 hunks)
- `model/log.go` (1 hunks)
- `model/main.go` (4 hunks)
🔇 Additional comments (2)

common/database.go (1)

`7-7`: **LGTM!**

The ClickHouse database type constant and flag follow the established pattern for database support.

Also applies to: 14-14
model/main.go (1)

`329-336`: **Ensure ClickHouse `toDateTime` matches your timestamp unit**

ClickHouse's `toDateTime(UInt32)` interprets its argument as seconds since the Unix epoch. Your `Log.CreatedAt` field is an `int64`; please confirm whether you're populating it in seconds or milliseconds.

- If it's already in seconds, the existing expression is correct.
- If it's in milliseconds, adjust to:

```sql
ENGINE = MergeTree() ORDER BY (created_at, id) PARTITION BY toYYYYMM(toDateTime(created_at / 1000))
```

Locations to review:

- `model/log.go` → `Log.CreatedAt int64`
- `model/main.go` → migration snippet with `toDateTime(created_at)`

Please verify and update as needed.
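The seconds-vs-milliseconds question comes down to which standard-library accessor populates the field; a quick self-contained illustration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Seconds since the Unix epoch: compatible with toDateTime(created_at).
	fmt.Println(time.Now().Unix())
	// Milliseconds since the epoch: would require created_at / 1000 in the DDL.
	fmt.Println(time.Now().UnixMilli())
}
```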
There are merge conflicts.
It's been too long; I'll merge it later.
The conflicts are resolved.
Actionable comments posted: 2
♻️ Duplicate comments (1)

model/main.go (1)

`158-172`: **Validate the ClickHouse main-database restriction and connection configuration.**

The implementation correctly restricts ClickHouse to log databases only, with appropriate panic handling. The GORM configuration properly disables prepared statements and foreign key constraints for ClickHouse compatibility.

The previous review comment about documenting the ClickHouse DSN format still applies. Users need clear documentation of the connection string format.
🧹 Nitpick comments (2)

model/main.go (2)

`201-201`: **Fix inconsistent comment formatting.**

The comment has inconsistent formatting and appears to be incomplete or misplaced.

```diff
- // For Gorm NewVersion: DisableForeignKeyConstraintWhenMigrating: true, Disable FK constraints during migration
+ // Note: DisableForeignKeyConstraintWhenMigrating can be enabled if needed for migration compatibility
```
`403-403`: **Enhance logging message clarity.**

The success message could be more informative about what was actually configured.

```diff
- common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL, and indices", retentionDays))
+ common.SysLog(fmt.Sprintf("ClickHouse log database migrated successfully: MergeTree engine with monthly partitioning, %d-day TTL, compression enabled, and secondary indices created", retentionDays))
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (2)

- `go.mod` (2 hunks)
- `model/main.go` (5 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- go.mod
🧰 Additional context used

🧬 Code Graph Analysis (1)

model/main.go (3)

- common/database.go (3)
  - `DatabaseTypeClickHouse` (7-7)
  - `LogSqlType` (12-12)
  - `UsingClickHouse` (14-14)
- common/logger.go (1)
  - `SysLog` (50-53)
- model/log.go (1)
  - `Log` (17-37)
🔇 Additional comments (3)

model/main.go (3)

`14-14`: **LGTM! ClickHouse driver import added correctly.**

The GORM ClickHouse driver import is properly positioned with the other database drivers.

`46-49`: **LGTM! ClickHouse column quoting configured correctly.**

The column quoting for ClickHouse uses backticks, as MySQL does, which aligns with ClickHouse's SQL syntax requirements for reserved keywords like `group` and `key`.

`404-408`: **LGTM! Proper fallback for non-ClickHouse databases.**

The else branch correctly maintains the existing AutoMigrate behavior for traditional databases, ensuring backward compatibility.
Actionable comments posted: 1
♻️ Duplicate comments (1)

model/main.go (1)

`158-172`: **Document the ClickHouse DSN format where users will see it**

This code assumes a URI-style DSN (`clickhouse://user:pass@host:port/db?params...`) for GORM's `clickhouse.Open`. Please add a short example and guidance in the README or configuration docs so operators don't misconfigure it.

Would you like me to draft the README block with DSN examples and common parameters (dial_timeout, read_timeout)?
🧹 Nitpick comments (2)

model/main.go (2)

`375-385`: **Revisit ORDER BY for pruning and compression (time bucketing and selective dims first)**

ORDER BY (toDateTime(created_at), user_id) places time first and lacks a time bucket. For log analytics, you typically get better pruning by:

- Putting low-cardinality filters (e.g., user_id/tenant/service) first
- Adding a time bucket before the raw timestamp
- Ending with the raw timestamp (and optionally a unique column)

If user_id is a common filter, consider:

```diff
- ORDER BY (toDateTime(created_at), user_id)
+ ORDER BY (user_id, toStartOfHour(toDateTime(created_at)), created_at)
```

If most queries are time-only, a simpler improvement is to drop the cast in ORDER BY (keep it numeric for index efficiency):

```diff
- ORDER BY (toDateTime(created_at), user_id)
+ ORDER BY (created_at, user_id)
```

Pick the key that best matches your WHERE patterns; verify with EXPLAIN and real workloads, as in the sketch below.
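One way to run that verification from Go, assuming the `LOG_DB` handle and `logs` table from this PR (`EXPLAIN indexes = 1` is standard ClickHouse syntax; `userID` and `startTimestamp` stand in for real filter values):

```go
// Sketch: inspect how well a candidate key prunes parts/granules for a
// representative query shape, then compare the "selected" vs. total counts.
rows, err := LOG_DB.Raw(
	"EXPLAIN indexes = 1 SELECT count() FROM logs WHERE user_id = ? AND created_at >= ?",
	userID, startTimestamp,
).Rows()
if err == nil {
	defer rows.Close()
	for rows.Next() {
		var line string
		if rows.Scan(&line) == nil {
			fmt.Println(line) // look for the Parts/Granules selected lines
		}
	}
}
```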
`393-403`: **Index strategy: good choices, but decide the failure policy for critical indices**

A bloom filter on user_id and a set index on model_name are appropriate for equality filters. You currently warn on failure; decide whether any index creation failure should abort the migration (e.g., for user_id, if queries depend on it).

Optionally fail the migration if creating the user_id bloom index fails:

```diff
 for _, index := range indices {
 	if err = LOG_DB.Exec(index).Error; err != nil {
-		common.SysLog(fmt.Sprintf("Warning: Failed to create index: %s, error: %v", index, err))
+		if strings.Contains(index, "idx_logs_user_bloom") {
+			return fmt.Errorf("failed to create critical index: %s, error: %w", index, err)
+		}
+		common.SysLog(fmt.Sprintf("Warning: Failed to create non-critical index: %s, error: %v", index, err))
 	}
 }
```

Note: strings is already imported in this file.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- `model/main.go` (5 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

model/main.go (3)

- common/database.go (3)
  - `DatabaseTypeClickHouse` (7-7)
  - `LogSqlType` (12-12)
  - `UsingClickHouse` (14-14)
- common/logger.go (1)
  - `SysLog` (50-53)
- model/log.go (1)
  - `Log` (17-37)
🔇 Additional comments (6)

model/main.go (6)

`14-14`: **Add ClickHouse driver import: looks good**

The import path is correct for GORM's ClickHouse driver and aligns with the new DSN handling.

`46-49`: **Correct quoting for ClickHouse reserved identifiers**

Backtick-quoting group/key for ClickHouse is appropriate (both `group` and `key` can conflict with SQL keywords).

`201-201`: **No action needed: comment-only change**

This is just a comment; no functional impact.

`371-406`: **ClickHouse MergeTree + TTL migration: baseline is solid**

Engine selection, TTL, partitioning, and disabling FKs for GORM migrations are correct for ClickHouse. Good separation from the non-ClickHouse path.
`387-391`: **AutoMigrate won't alter an existing table's engine/ORDER BY/TTL: verify first-run creation**

If the logs table already exists (from prior versions), AutoMigrate will not modify the engine, ORDER BY, or TTL. Ensure the ClickHouse options are applied on first creation (clean install), or add an explicit path to recreate/alter the table when upgrading.

Would you like a guarded migration that checks the existing table DDL (SHOW CREATE TABLE logs), compares ORDER BY/TTL/engine, and then prints actionable warnings or fails with a clear message if mismatched? A sketch follows.
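A minimal sketch of such a guard, assuming the `LOG_DB` handle and `common.SysLog` from this PR (the substring checks are illustrative, not a full DDL comparison; `strings` and `fmt` are already imported in `model/main.go` per the note above):

```go
// verifyClickHouseLogTable warns when an existing logs table lacks the
// expected options, since AutoMigrate will not retrofit engine/ORDER BY/TTL.
func verifyClickHouseLogTable() {
	var ddl string
	if err := LOG_DB.Raw("SHOW CREATE TABLE logs").Scan(&ddl).Error; err != nil {
		return // table likely absent; first-run creation will apply the options
	}
	for _, want := range []string{"MergeTree", "PARTITION BY", "TTL"} {
		if !strings.Contains(ddl, want) {
			common.SysLog(fmt.Sprintf(
				"Warning: existing logs table is missing %q; recreate or ALTER it manually", want))
		}
	}
}
```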
`375-385`: **All CreatedAt timestamps use seconds; no TTL unit mismatch**

The search shows every `CreatedAt` assignment comes from `time.Unix(…, 0)` or `common.GetTimestamp()` (both in seconds), not milliseconds. The `toDateTime(created_at)` in your DDL will therefore be correct; no change needed.
Actionable comments posted: 1
♻️ Duplicate comments (1)

model/main.go (1)

`158-172`: **Document the ClickHouse DSN format in the project documentation.**

The implementation correctly restricts ClickHouse to log databases only. Based on the GORM ClickHouse driver v0.6.1 documentation, the DSN format should be clearly documented for users.

Add this to your README or configuration documentation:

```
# ClickHouse DSN Format
clickhouse://user:password@host:port/database?param1=value1&param2=value2

# Example
LOG_SQL_DSN=clickhouse://admin:123456@clickhouse-db:9000/default?dial_timeout=10s&read_timeout=20s
```
🧹 Nitpick comments (3)

model/main.go (3)

`201-201`: **Remove or update the outdated comment.**

The comment references a "new version" feature that appears to already be in use. Either remove this comment or clarify which GORM version introduced the feature.

```diff
- // For Gorm NewVersion: DisableForeignKeyConstraintWhenMigrating: true, Disable FK constraints during migration
```

`373-373`: **Consider making retention configurable per model or user tier.**

While the `LOG_RETENTION_DAYS` environment variable provides basic configurability, consider implementing tiered retention policies (e.g., longer retention for premium users or specific model types) for better flexibility.

`405-405`: **Enhance the log message with index creation status.**

Consider logging which indices were created successfully vs. failed, to provide better visibility into the migration outcome.

```diff
- common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL, and indices", retentionDays))
+ common.SysLog(fmt.Sprintf("ClickHouse log database migrated with optimized MergeTree engine, compression, %d days TTL", retentionDays))
```

Add a counter to track successful index creation:

```go
successCount := 0
for _, index := range indices {
	if err = LOG_DB.Exec(index).Error; err != nil {
		common.SysLog(fmt.Sprintf("Warning: Failed to create index: %s, error: %v", index, err))
	} else {
		successCount++
	}
}
common.SysLog(fmt.Sprintf("Created %d/%d indices successfully", successCount, len(indices)))
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)

- `model/main.go` (5 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

model/main.go (3)

- common/database.go (3)
  - `DatabaseTypeClickHouse` (7-7)
  - `LogSqlType` (12-12)
  - `UsingClickHouse` (14-14)
- common/logger.go (1)
  - `SysLog` (50-53)
- model/log.go (1)
  - `Log` (17-37)
🔇 Additional comments (4)

model/main.go (4)

`14-14`: **LGTM! ClickHouse driver import added correctly.**

The import is properly placed and follows the existing import structure.

`46-48`: **LGTM! ClickHouse column quoting correctly configured.**

The backtick quoting for the `group` and `key` columns is appropriate for ClickHouse, as it follows MySQL-style quoting conventions.

`393-403`: **LGTM! Appropriate index types for ClickHouse.**

The bloom filter index for `user_id` and the set index for `model_name` are well suited to their respective cardinality and query patterns. The error handling with warnings is appropriate for non-critical index creation.

`376-385`: **Verify the partition strategy aligns with query patterns.**

Monthly partitioning via `toYYYYMM` is appropriate for multi-year retention. However, ensure this aligns with your typical query patterns: if most queries span days rather than months, consider daily partitioning for better partition pruning.

Would you like to analyze your typical query patterns to determine the optimal partitioning strategy? I can help generate monitoring queries to track partition efficiency.
This pull request introduces support for ClickHouse as a database option, primarily for log storage, along with necessary adjustments to accommodate its unique features. The changes span configuration, database initialization, and migration logic.
Ready for review
Adds ClickHouse support for the log database, suitable for large-scale sites that store logs.
Consumption logs can all be stored there; ClickHouse is better suited to log storage, compression, and analytics, and its TTL settings can be adjusted to automatically clean up stale logs.
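On that last point, the retention window can be changed later without recreating the table; a sketch using the PR's `LOG_DB` handle (`ALTER TABLE ... MODIFY TTL` is standard ClickHouse DDL, and the 30-day value here is only an example):

```go
// Shorten retention to 30 days; ClickHouse removes expired parts in the background.
err := LOG_DB.Exec(
	"ALTER TABLE logs MODIFY TTL toDateTime(created_at) + INTERVAL 30 DAY",
).Error
```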
docker-compose deployment
Summary by CodeRabbit

- New Features
- Bug Fixes
- Chores