bartizan
Contributor

@bartizan marked this pull request as ready for review August 26, 2025 11:33
@bartizan requested a review from a team as a code owner August 26, 2025 11:33
@waldekmastykarz
Collaborator

I just saw that SQLite has the .import command which could help us improve the performance even further. Have you considered using it?

@bartizan
Contributor Author

> I just saw that SQLite has the .import command which could help us improve the performance even further. Have you considered using it?

Frankly, no, I have not.
The .import command is part of the sqlite3 CLI shell, so it cannot be used through the Microsoft.Data.Sqlite provider. It also imports a CSV-like format, so we would still need to parse the OpenAPI specs and, on top of that, keep the CSV data around.

The current changes reduced the time to fill the data to about 0.5 s, which is highly satisfactory. It is very likely that measures such as bulk inserts, a single transaction, and journaling and cache management match the effectiveness of the .import command.
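
For illustration only, here is a minimal Python sketch of those measures using the standard `sqlite3` module. It is not the PR's C# code: the `endpoint` table, its columns, the function name, and the specific PRAGMA values are all hypothetical, and the relaxed journaling settings assume the database is a cache that can be rebuilt from the specs.

```python
import sqlite3


def fill_endpoints(db_path, rows):
    """Insert (path, method) rows in a single transaction; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumed settings: trade durability for speed, since the data can be regenerated.
        conn.execute("PRAGMA journal_mode = MEMORY")
        conn.execute("PRAGMA synchronous = OFF")
        conn.execute("PRAGMA cache_size = -20000")  # negative value = size in KiB (about 20 MB)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS endpoint "
            "(path TEXT NOT NULL, method TEXT NOT NULL)"  # hypothetical schema
        )
        with conn:  # one transaction: commit on success, roll back on error
            # Bulk insert: one prepared statement reused for every row.
            conn.executemany(
                "INSERT INTO endpoint (path, method) VALUES (?, ?)", rows
            )
        return conn.execute("SELECT COUNT(*) FROM endpoint").fetchone()[0]
    finally:
        conn.close()
```

Calling `fill_endpoints(":memory:", [("/me", "GET"), ("/users", "POST")])` inserts both rows with a single commit instead of one commit per row.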

The bottleneck now is downloading (~5 s) and parsing (~10 s) the OpenAPI specifications, but that happens only once a day (I bet we could lengthen that interval with no harm).

I also considered the following measures:

  • moving index creation to after the data is filled (tests showed slightly worse timing),
  • parallelizing the download and parsing of the Graph specs,
  • rearranging the parsing function to read the input stream directly from the network during "cold" initialization, when we need to download the data,
  • optimizing the OpenAPI reader by disabling validation.

But all of that complicates the code base and does not seem to yield a significant gain.

@waldekmastykarz
Collaborator

Good catch! Thanks for the additional explanation. Let's take the PR as-is.

As for the download frequency, we decided on daily so that in case there's a change to the API, you get it quickly, rather than waiting days or weeks. I wonder if there's something like e-tag headers that we could use to decide if we need to download the file.

@bartizan
Contributor Author

> I wonder if there's something like e-tag headers that we could use to decide if we need to download the file.

It is a great idea.
There is an ETag header in the response, so let's take a shot at it.
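
As a rough sketch of how such a check could work (in Python for brevity, not the project's C#): send the cached ETag back in an `If-None-Match` header and skip the download when the server answers 304 Not Modified. The function names, the `opener` parameter, and the idea of storing the ETag next to the cached file are assumptions for illustration.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, cached_etag=None):
    """Build a GET request that carries If-None-Match when an ETag is cached."""
    request = urllib.request.Request(url)
    if cached_etag:
        request.add_header("If-None-Match", cached_etag)
    return request


def download_if_changed(url, cached_etag=None, opener=urllib.request.urlopen):
    """Return (body, etag); body is None when the cached copy is still current."""
    request = build_conditional_request(url, cached_etag)
    try:
        with opener(request) as response:
            # 200 OK: the spec changed (or nothing was cached), so keep the new ETag.
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # 304 Not Modified: reuse the local file and the ETag we already have.
            return None, cached_etag
        raise
```

With this approach the check could run more often than once a day, since an unchanged spec costs only a small header exchange rather than a full download and parse.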

@waldekmastykarz
Collaborator

Shall I wait with reviewing the PR or will you open another one?

@bartizan
Contributor Author

> Shall I wait with reviewing the PR or will you open another one?

Either is fine with me. Maybe let's merge the PR and I will prepare a new one on Monday or so.

@waldekmastykarz
Collaborator

Perfect! Thank you!

@waldekmastykarz self-assigned this Aug 28, 2025
Collaborator

@waldekmastykarz left a comment

Awesome! Nothing to add 👏

@waldekmastykarz merged commit bff72ba into dotnet:main Aug 28, 2025
4 checks passed