Skip to content

Conversation

winterhazel
Copy link
Member

Description

The listEvents and deleteEvents APIs execute some non-optimized queries, which results in very long response times in environments that have many events.

Some optimizations were performed in these two workflows in order to reduce response times. The listing now uses a covering index, and does not perform a unecessary DISTINCT anymore. The deletion is now performed in batches. The batch size is defined by the global setting delete.query.batch.size.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

How Has This Been Tested?

Events listing

In a test environment with 2 million event entries, I called listEvents multiple times passing different combinations of account, archived, domainid, duration, enddate, entrytime, id, keyword, level, projectid, resourceid, resourcetype, startdate, startid and type. For almost all the listings, I got a more reasonable response time (between 0.1 and 3 seconds depending on the filters) compared to before (30+ seconds). The only exception is the keyword parameter, which is not optimizable with a index because it performs a column LIKE "%keyword%" for 3 different columns.

Access control

I called listEvents and verified that:

  • root admins can list all events;
  • domain admins can only list events from their domain;
  • users can only list their events.

Events deletion

  1. I assigned a value to delete.query.batch.size, enabling batch delete;
  2. I called deleteEvents to remove all events before 2025-01-01. Then, I verified that they were removed successfully;
  3. I assigned a value to event.purge.delay, enabling the automatic deletion of events older than 15 days. After the deletion task executed, I verified that events older than 15 days were removed successfully.
Access control

I called deleteEvents and verified that:

  • root admins can delete any events;
  • domain admins can only delete events from their domain;
  • users can only delete their events.

@winterhazel
Copy link
Member Author

@blueorangutan package

@winterhazel winterhazel added this to the 4.20.2 milestone Aug 13, 2025
@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

Copy link

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 1.20482% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.16%. Comparing base (f5b4858) to head (0a8e293).

Files with missing lines Patch % Lines
...ava/com/cloud/upgrade/dao/Upgrade42010to42020.java 0.00% 30 Missing ⚠️
...in/java/com/cloud/server/ManagementServerImpl.java 0.00% 23 Missing ⚠️
...rc/main/java/com/cloud/event/dao/EventDaoImpl.java 0.00% 19 Missing ⚠️
.../main/java/com/cloud/domain/dao/DomainDaoImpl.java 0.00% 6 Missing ⚠️
...ain/java/com/cloud/upgrade/dao/DbUpgradeUtils.java 0.00% 3 Missing ⚠️
...dstack/api/command/user/event/DeleteEventsCmd.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.20   #11443      +/-   ##
============================================
- Coverage     16.17%   16.16%   -0.01%     
- Complexity    13284    13286       +2     
============================================
  Files          5656     5657       +1     
  Lines        497990   498034      +44     
  Branches      60399    60404       +5     
============================================
+ Hits          80529    80532       +3     
- Misses       408499   408541      +42     
+ Partials       8962     8961       -1     
Flag Coverage Δ
uitests 4.00% <ø> (ø)
unittests 17.02% <1.20%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14617

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the performance of events listing and deletion operations to reduce response times in environments with large numbers of events. The optimizations include removing unnecessary DISTINCT operations, adding a covering index for better query performance, and implementing batch deletion.

  • Replaced DISTINCT with NATIVE function in event listing queries for better performance
  • Implemented batch deletion using the existing delete.query.batch.size configuration
  • Added a new covering index on the event table to optimize search queries

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
ManagementServerImpl.java Refactored deleteEvents method to use batch deletion and simplified access control logic
ConfigurationManagerImpl.java Updated documentation for DELETE_QUERY_BATCH_SIZE to include events
QueryManagerImpl.java Removed DISTINCT operation from event search query for performance
EventDaoImpl.java Added new purgeAll method for batch deletion and removed unused listOlderEvents method
EventDao.java Added purgeAll method interface and removed listOlderEvents method
DomainDaoImpl.java Added getDomainAndChildrenIds convenience method
DomainDao.java Added getDomainAndChildrenIds method interface
DeleteEventsCmd.java Moved parameter validation from command to service layer and improved error message
ListEventsCmd.java Removed unnecessary blank line
Upgrade42010to42020.java Added database upgrade class to create covering index
DbUpgradeUtils.java Added utility method for creating indexes with custom names
schema-42010to42020.sql Empty schema upgrade file
schema-42010to42020-cleanup.sql Empty schema cleanup file

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@winterhazel
Copy link
Member Author

@blueorangutan package

@blueorangutan
Copy link

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 14637

Copy link
Contributor

@shwstppr shwstppr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good, will make the delete events action much better. Needs testing.

@winterhazel could we extend this to cover archive events as well? Also, I’m not aware of any other DB changes slated for 4.20.2. If this is the only one, would it be safer to target 4.22? Not blocking—just a suggestion.

throw new InvalidParameterValueException("Either 'ids', 'type' or 'enddate' must be specified.");
}
if (startDate != null && endDate == null) {
throw new InvalidParameterValueException("'startdate' must be specified with 'enddate' parameter.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not checked the current behaviour but can't we allow delete with only one of the params?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current behavior. I've just moved the code from DeleteEventsCmd to here.

I think that this can be allowed, but I do not want to modify how the parameters of the API behave in this PR.

@winterhazel
Copy link
Member Author

@winterhazel could we extend this to cover archive events as well? Also, I’m not aware of any other DB changes slated for 4.20.2. If this is the only one, would it be safer to target 4.22? Not blocking—just a suggestion.

@shwstppr I'll have a look into whether event archiving needs or would benefit from any changes.

About targeting 4.22 instead: the only DB change here is the addition of an index. This can not cause any inconsistency as far as I know, so it should be safe going to 4.20.2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants