feat: add bot detection in data ingestion #3347

skwowet · 2025-08-22T19:19:41Z

Changes proposed ✍️

Added bot detection logic to the data-sink-worker to automatically identify bot accounts during member creation. The detection system uses a three-tier approach:

1. Strong patterns for immediate bot confirmation, covering clear indicators such as the [bot] notation and well-known bot services (e.g., dependabot, renovate, coderabbit).
2. Known bots list for specific accounts that regex alone cannot detect consistently.
3. Common patterns for broader automation keywords and platform prefixes, used to flag potential bots for LLM validation.
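
For reviewers, the three tiers can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the names `classifyHandle`, `STRONG_PATTERNS`, `KNOWN_BOTS`, and `COMMON_PATTERNS`, the regexes, and the list entry are all hypothetical placeholders chosen for illustration.

```typescript
// Hypothetical sketch: the patterns and list entries are illustrative
// assumptions, not the actual values used in this PR.
type BotVerdict = 'bot' | 'suspected' | 'human'

// Tier 1: strong patterns that confirm a bot immediately.
const STRONG_PATTERNS: RegExp[] = [
  /\[bot\]$/i, // explicit [bot] suffix
  /^(dependabot|renovate|coderabbit)/i, // well-known bot services
]

// Tier 2: specific accounts that regex alone cannot catch consistently.
const KNOWN_BOTS = new Set<string>(['example-release-account']) // placeholder entry

// Tier 3: broader automation keywords; a match only raises suspicion.
const COMMON_PATTERNS: RegExp[] = [/(^|[-_])(bot|automation|ci)([-_]|$)/i]

function classifyHandle(handle: string): BotVerdict {
  const name = handle.trim().toLowerCase()
  if (STRONG_PATTERNS.some((p) => p.test(name))) return 'bot'
  if (KNOWN_BOTS.has(name)) return 'bot'
  // Suspected accounts are not confirmed here; they are left for LLM validation.
  if (COMMON_PATTERNS.some((p) => p.test(name))) return 'suspected'
  return 'human'
}
```

The tiers are checked in order of confidence, so a handle that matches a strong pattern never reaches the weaker keyword check.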

Bot flags provided by source integrations always take precedence over our detection logic, and suspected bots are explicitly flagged for LLM validation. Existing isBot values are preserved during member updates to ensure we do not overwrite integration-provided information.
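
The precedence rules above could be expressed along these lines. This is a sketch under stated assumptions: the `IncomingMember` and `BotResolution` shapes and the functions `resolveOnCreate` and `resolveOnUpdate` are hypothetical, and it assumes that a suspected bot is stored with `isBot: false` until validation, which the description does not specify.

```typescript
// Hypothetical types and helpers for illustration only.
type Verdict = 'bot' | 'suspected' | 'human'

interface IncomingMember {
  username: string
  isBot?: boolean // present only when the source integration reports it
}

interface BotResolution {
  isBot: boolean
  needsLlmValidation: boolean
}

// On creation: an integration-provided flag wins over our own detection.
function resolveOnCreate(member: IncomingMember, detected: Verdict): BotResolution {
  if (member.isBot !== undefined) {
    return { isBot: member.isBot, needsLlmValidation: false }
  }
  return {
    isBot: detected === 'bot',
    // Assumption: suspected bots are flagged for validation, not marked as bots.
    needsLlmValidation: detected === 'suspected',
  }
}

// On update: keep the stored value when there is one, so earlier
// integration-provided information is not overwritten.
function resolveOnUpdate(
  existingIsBot: boolean | undefined,
  incomingIsBot: boolean | undefined,
): boolean | undefined {
  return existingIsBot !== undefined ? existingIsBot : incomingIsBot
}
```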

Also, blacklistedDomains was converted from an array to a Set so that each lookup takes O(1) time on average instead of O(n).
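
To illustrate the lookup change, here is a small sketch. The domain values and the helper `isBlacklistedDomain` are placeholders assumed for the example; only the name `blacklistedDomains` comes from this PR.

```typescript
// Placeholder entries; the real list lives in the codebase.
const domainList: string[] = ['example.com', 'example.org']

// Before: Array.prototype.includes scans the array, O(n) per lookup.
const wasBlacklisted = (domain: string): boolean => domainList.includes(domain)

// After: Set.prototype.has is a hash lookup, O(1) on average.
const blacklistedDomains = new Set<string>(domainList)

// Hypothetical helper; lowercasing is an assumption about normalization.
function isBlacklistedDomain(domain: string): boolean {
  return blacklistedDomains.has(domain.toLowerCase())
}
```

The difference matters most when the check runs once per ingested member against a long list.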

feat: add bot detection in data ingestion

45fa241

skwowet self-assigned this Aug 22, 2025
