
Conversation

@haydenhoang (Contributor) commented Jul 24, 2025

Change Summary

This PR modernizes the Rust client with multi-node support, a cleaner/ergonomic builder API, helpers for federated search, typed parsing, and a substantially improved derive workflow for complex/nested document schemas. It also replaces mocked HTTP tests with a real Typesense server in CI to increase reliability.

What’s in this PR

  • Xtasks: added fetch and code-gen tasks that fetch the OpenAPI spec and run code generation with the new config. Command: cargo xtask fetch code-gen

  • Multi-node client with retry/load-balance logic. All API methods now live inside typesense/src/client/*.

  • Ergonomic builder API for client config, search parameters, multi-search, collection schema & fields (typesense/src/builders/*).

  • Federated search helpers and typed result parsing (models/multi_search.rs, traits/multi_search_ext.rs).

  • Derive improvements: advanced nested + flattened field handling, with UI tests for invalid attributes. Previously only the facet flag was supported; this PR adds the ability to specify attribute values (string, number, boolean, ...).

  • WASM support (timeout and retry features are disabled on WASM due to limitations in reqwest and reqwest_middleware)

  • CI: runs against a real Typesense server (Docker), removes the API mocks, and updates the lint config (all lints disabled for typesense_codegen).

Usage highlights

Examples

use typesense::{
    Typesense,
    Client,
    models::Field,
    prelude::*,
    builders::new_search_parameters,
};
use serde::{Deserialize, Serialize};
use reqwest::Url;
use reqwest_retry::policies::ExponentialBackoff;
use std::time::Duration;

/// A nested struct that will be flattened into the parent.
#[derive(Typesense, Serialize, Deserialize, Debug, PartialEq, Clone)]
struct ProductDetails {
    #[typesense(facet)]
    part_number: String,
    #[typesense(sort = false)]
    weight_kg: f32,
    #[typesense(skip)]
    desc: String,
}

/// A nested struct that will be flattened and renamed.
#[derive(Typesense, Serialize, Deserialize, Debug, PartialEq, Clone)]
struct Logistics {
    warehouse_code: String,
    shipping_class: String,
}

/// A nested struct that will be indexed as a single "object".
#[derive(Typesense, Serialize, Deserialize, Debug, PartialEq, Clone)]
struct Manufacturer {
    name: String,
    city: String,
}

/// The main struct that uses every feature of the derive macro.
#[derive(Typesense, Serialize, Deserialize, Debug, PartialEq, Clone)]
#[typesense(
    collection_name = "mega_products",
    default_sorting_field = "price",
    enable_nested_fields = true,
    token_separators = ["-", "/"],
    symbols_to_index = ["+"]
)]
struct MegaProduct {
    id: String,

    #[typesense(infix, stem)]
    title: String,

    #[typesense(rename = "product_name")]
    #[serde(rename = "product_name")]
    official_name: String,

    #[typesense(facet)]
    brand: String,

    #[typesense(sort)]
    price: f32,

    #[typesense(range_index)]
    review_score: f32,

    #[typesense(index = false, store = false)]
    internal_sku: Option<String>,

    #[typesense(type = "geopoint")]
    location: (f32, f32),

    #[typesense(num_dim = 4, vec_dist = "cosine")]
    embedding: Vec<f32>,

    #[typesense(flatten)]
    details: ProductDetails,

    #[typesense(flatten, rename = "logistics_data")]
    #[serde(rename = "logistics_data")]
    logistics: Logistics,

    manufacturer: Manufacturer,

    tags: Option<Vec<String>>,
}

let client = Client::builder()
        .nodes(vec![Url::parse("http://localhost:8108").unwrap()])
        .api_key("xyz")
        .healthcheck_interval(Duration::from_secs(5))
        .retry_policy(ExponentialBackoff::builder().build_with_max_retries(1))
        .connection_timeout(Duration::from_secs(3))
        .build()
        .expect("Failed to create Typesense client");

// Create the collection using the schema generated by the derive macro
let schema = MegaProduct::collection_schema();
let collection_name = schema.name.clone();
let create_res = client.collections().create(schema).await;
// Build search parameters with the builder
let search_params = new_search_parameters().q("the").query_by("title").build();
// Generic, typed collection handle
let typed_collection = client.collection_of::<MegaProduct>(&collection_name);
let search_res = typed_collection.documents().search(search_params).await;

For more examples, including the multi-search and union-search typed parsing helpers, please see the integration tests.
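
As a rough sketch, a federated multi-search looks like the following; the import path and the exact parameter fields shown here are illustrative assumptions, not authoritative:

use typesense::models::{
    MultiSearchCollectionParameters, MultiSearchParameters, MultiSearchSearchesParameter,
};

// Two federated searches against different collections, sharing common parameters.
let search_requests = MultiSearchSearchesParameter {
    searches: vec![
        MultiSearchCollectionParameters {
            collection: Some("mega_products".to_owned()),
            q: Some("phone".to_owned()),
            query_by: Some("title".to_owned()),
            ..Default::default()
        },
        MultiSearchCollectionParameters {
            collection: Some("brands".to_owned()),
            q: Some("apple".to_owned()),
            query_by: Some("name".to_owned()),
            ..Default::default()
        },
    ],
};
let common_params = MultiSearchParameters {
    limit: Some(10),
    ..Default::default()
};

// `perform` runs a federated search; `perform_union` is the union-search variant.
let multi_res = client.multi_search().perform(&search_requests, &common_params).await?;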

Local quick start

# Start a local Typesense for tests
docker compose up -d

# Run the test suite
cargo test --all-features
# Run tests in WASM
wasm-pack test --headless --chrome

Backward compatibility

This PR is a substantial re-architecture of the Typesense Rust client. There is no backward compatibility guarantee with previous versions. APIs have been redesigned around the new multi-node client. The APIs in typesense_codegen are no longer exported.

Documentation

Inline docs added alongside the new builders and client modules. Follow-up PRs will update the README with examples.

Checklist

  • Multi-node client with retries/load-balancing
  • Builders for client/search/multi-search/schema/fields
  • Federated search typed parsing helpers & generic collection
  • Union search
  • Derive: nested/flattened handling + UI tests
  • CI with real Typesense; remove HTTP mocks
  • WASM target support
  • Integration tests
  • Operations API
  • Natural language search API
  • Enum for API key actions
  • Refactor: deduplicate open-api generated enums
  • Future-proofing Typesense v30

PR Checklist

This commit introduces a comprehensive and robust implementation for handling complex document structures within the `#[derive(Typesense)]` macro, enabling powerful schema generation directly from Rust structs.

The macro now supports the full range of advanced indexing strategies offered by Typesense, including automatic object indexing, field flattening with prefix control, and patterns for manual flattening.

### Key Features & Implementation Details

- **Automatic Object Indexing:**
  - A field containing a nested struct that also derives `Document` is now automatically mapped to a Typesense `object` (or `object[]` for `Vec<T>`).
  - This feature requires `#[typesense(enable_nested_fields = true)]` on the parent collection, which the macro now supports.

- **Automatic Field Flattening with `#[typesense(flatten)]`:**
  - A field marked `#[typesense(flatten)]` has its sub-fields expanded into the parent schema using dot-notation.
  - By default, the Rust field's name is used as the prefix for all sub-fields (e.g., `details: ProductDetails` results in schema fields like `details.part_number`).

- **Prefix Override for Flattening:**
  - The `flatten` attribute can be combined with `rename` to provide a custom prefix for the flattened fields.
  - Usage: `#[typesense(flatten, rename = "custom_prefix")]`
  - This provides powerful schema mapping flexibility, allowing the Rust struct's field name to differ from the prefix used in the Typesense schema.

- **Manual Flattening Pattern (`skip` + `rename`):**
  - A new `#[typesense(skip)]` attribute has been introduced to completely exclude a field from the generated Typesense schema.
  - This enables the powerful pattern of sending both nested and flattened data to Typesense: the nested version can be used for display/deserialization, while a separate set of flattened fields is used for indexing. This is achieved by:
    1.  Marking the nested struct field (e.g., `details: Details`) with `#[typesense(skip)]`.
    2.  Adding corresponding top-level fields to the Rust struct, marked with `#[typesense(rename = "details.field_name")]` (see the sketch after this list).

- **Ergonomic Boolean Attributes:**
  - All boolean attributes (`facet`, `sort`, `index`, `store`, `infix`, `stem`, `optional`, `range_index`) now support shorthand "flag" syntax.
  - For example, `#[typesense(sort)]` is a valid and recommended equivalent to `#[typesense(sort = true)]`, dramatically improving readability and consistency.

- **Robust Error Handling & Validation:**
  - The macro provides clear, compile-time errors for invalid or ambiguous attribute usage.
  - It correctly detects and reports duplicate attributes, whether they are in the same `#[typesense(...)]` block or across multiple attributes on the same field.
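
As a concrete illustration of the manual flattening pattern above, here is a sketch; the struct and field names are invented for the example, and pairing each `#[typesense(rename = ...)]` with a matching `#[serde(rename = ...)]` is an assumption about how the document is serialized, not something this PR prescribes:

use serde::{Deserialize, Serialize};
use typesense::{prelude::*, Typesense};

#[derive(Serialize, Deserialize)]
struct Details {
    color: String,
    size_mm: f32,
}

#[derive(Typesense, Serialize, Deserialize)]
#[typesense(collection_name = "gadgets")]
struct Gadget {
    id: String,

    // Kept on the Rust struct for display/deserialization of the nested JSON,
    // but excluded from the generated Typesense schema via `skip`.
    #[typesense(skip)]
    details: Details,

    // Separate top-level fields, renamed with dot-notation, carry the indexed copies.
    #[typesense(rename = "details.color", facet)]
    #[serde(rename = "details.color")]
    details_color: String,

    #[typesense(rename = "details.size_mm")]
    #[serde(rename = "details.size_mm")]
    details_size_mm: f32,
}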

### Testing

- **Comprehensive Integration Test (`derive_integration.rs`):**
  - A new, full-lifecycle integration test has been added to validate the entire feature set.
  - The test defines a complex struct using every new attribute and pattern, generates a schema, creates a real collection, and uses the generic client (`collection_of<T>`) to perform and validate a full Create, Read, Update, Delete, and Search lifecycle.
  - A second integration test was added to specifically validate the manual flattening pattern.

- **UI Tests:**
  - `trybuild` UI tests have been added to verify that the macro produces the correct compile-time errors for invalid attribute combinations, such as duplicate attributes.
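
For reference, a typical `trybuild` harness for those UI tests looks roughly like this (the `tests/ui/` path is an assumption):

#[test]
fn derive_ui() {
    // Every file under tests/ui/ is expected to fail to compile, and the emitted
    // compiler errors are checked against the accompanying .stderr snapshots.
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}
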
@haydenhoang changed the title from "feat: multi nodes" to "Client re-architecture" on Aug 22, 2025
@haydenhoang (Contributor, Author) commented:

Hi @morenol, @RoDmitry and maintainers, I'd love to get your thoughts on the overall direction of this PR before going too deep into refinements. Does this approach align with how you'd like the library to evolve?

@RoDmitry (Contributor) commented Aug 22, 2025

Why would you want to wrap each API function into its own struct implementation? That's something I have tried to avoid, because it's hard to synchronize (support) manually with an OpenAPI implementation. It does not provide any benefits except not needing to write &config.
Btw, the Arc in config: Arc<configuration::Configuration> is useless, params.clone() is not needed (use the move keyword), and &str must be converted to String using to_owned. Looks like it was generated by an AI.

I would also propose splitting this into different PRs, because smaller PRs are easier to review:

  • Tests are a good thing, except maybe for removing the mock tests.
  • OpenAPI update is very much needed. But the auto-generated code is not ideal; review the changes. For example, this code:
if let Some(ref apikey) = configuration.api_key {
        let key = apikey.key.clone();
        let value = match apikey.prefix {
            Some(ref prefix) => format!("{} {}", prefix, key),
            None => key,
        };

I previously manually replaced with:

if let Some(ref local_var_apikey) = local_var_configuration.api_key {
        let local_var_key = &local_var_apikey.key;
        let local_var_value = match local_var_apikey.prefix {
            Some(ref local_var_prefix) => format!("{local_var_prefix} {local_var_key}"),
            None => local_var_key.clone(),
        };

to avoid an extra clone.

Also documents_api::search_collection returns SearchResult, but your model is SearchResult<T> (nit: must be <D>, because Deserialize). Compiler would have noticed that. Did you test your code?
And why is it DeserializeOwned? There were no requirements previously.

@haydenhoang (Contributor, Author) commented Aug 23, 2025

Hi @RoDmitry,

Why would you want to wrap each API function into its own struct implementation?

It's to support the multi-node configuration (retrying against a different node), and it also gives us more control over the input and output schemas of each API method. One example is documents_api::multi_search: it accepts a request body of type MultiSearchSearchesParameter, in which the parameter union: Option<bool> determines the search result type (SearchResult for union: true and MultiSearchResult for union: false). Because of this, I manually defined a new multi-search body schema that removes the union field, and created 2 separate methods, .perform() and .perform_union(), which parse the result JSON accordingly:

  // Handle the result: parse to raw SearchResult, then convert to generic SearchResult<Value>
  match raw_result {
      Ok(json_value) => {
          // A union search returns a single SearchResult object, not a MultiSearchResult.
          // First, parse into the non-generic, raw model.
          let raw_search_result: raw_models::SearchResult =
              serde_json::from_value(json_value).map_err(Error::from)?;

          // Then, use our existing constructor to convert the raw result to the typed one,
          // specifying `serde_json::Value` as the document type.
          SearchResult::<serde_json::Value>::from_raw(raw_search_result).map_err(Error::from)
      }
      Err(e) => Err(e),
  }

And doesn't this

let search_requests = MultiSearchSearchesParameter {
    // the `union` field is removed
    searches: vec![
        MultiSearchCollectionParameters {
            collection: Some("company".to_owned()),
            ..Default::default()
        },
    ],
};
let common_params = MultiSearchParameters {
    limit: Some(1),
    ..Default::default()
};
let raw_response = client.multi_search().perform(&search_requests, &common_params).await?;

look cleaner than this?

let request_body = raw_models::MultiSearchSearchesParameter {
    union: Some(false),
    searches: search_requests,
};
let params = MultiSearchParams {
    // body
    multi_search_searches_parameter: Some(request_body),
    // common params
    collection: "companies".to_owned(),
};
documents_api::multi_search(&config, params).await

That's something I have tried to avoid, because it's hard to synchronize (support) manually with an OpenAPI implementation.

Sorry, I’m not sure I understand your point, could you clarify?

Also documents_api::search_collection returns SearchResult, but your model is SearchResult<T> (nit: must be <D>, because Deserialize). Compiler would have noticed that. Did you test your code?

Yes, all tests have passed.
I created the typesense/src/models module that re-exports from typesense_codegen so that we have more control over what is visible to users (because, like you said, auto-generated code is not ideal). The custom SearchResult<D> in typesense/src/models/search_result.rs is constructed from typesense_codegen::SearchResult under the hood.

This way we can minimize modifying the auto-generated code which will eventually be overwritten by future open api spec updates.

&str must be converted to String using to_owned. Looks like it was generated by an AI.

My bad, I'm fairly new to Rust, my thinking was simply 'I need a String,' so to_string() was the first method that came to mind 😅.

the Arc in config: Arc<configuration::Configuration> is useless

Thanks for the catch, I will get it updated!

and params.clone() is not needed (use the move keyword)

Hmmm, the retry mechanism in the execute() loop requires the closure to be callable multiple times. Rust will yell at me if I don't use that clone().

And why is it DeserializeOwned?

Because of the from_raw(raw_result: raw_models::SearchResult) function, which is used to construct SearchResult<T> from typesense_codegen::SearchResult. The raw_hit.document is temporary and will be dropped at the end of the function call.
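
Roughly, the constraint comes from something like this (a simplified sketch, not the actual from_raw code):

use serde::de::DeserializeOwned;

// Simplified sketch: the raw document is an owned serde_json::Value that is dropped when
// this function returns, so the target type `D` cannot borrow from it. That is exactly
// what the DeserializeOwned bound expresses.
fn convert_document<D: DeserializeOwned>(
    raw_document: serde_json::Value,
) -> Result<D, serde_json::Error> {
    serde_json::from_value(raw_document)
}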

OpenAPI update is very much needed. But auto-generated code is not ideal, review changes. For example this code:

The Rust client is too outdated right now; we should prioritize adding the missing features in my opinion. We can always optimize later and it won't be a breaking change.

Thanks for the feedback @RoDmitry, how does this sound? I can start by opening a small PR to add the Xtasks.

@RoDmitry (Contributor) commented:

I did not notice the retry mechanism, my bad, it makes sense. Will take a look at it later.

If you have multiple SearchResults, I would recommend renaming them so all names are unique; I still have not found the second SearchResult 😳

Xtasks is run manually, right? Because if it's automatic, it can break the library once the OpenAPI spec changes types or something, so it will not compile.

@haydenhoang (Contributor, Author) commented Aug 24, 2025

Xtasks is manually run, right?

Yes, will post up a PR for this shortly

@RoDmitry (Contributor) commented Aug 24, 2025

I have looked at the Client::execute() ("retry to a different node" mechanism) and it's a very messy, very non-optimal solution. Again, it looks like AI code. It's not even "load balancing logic" as you call it, because it uses the nearest_node until it's no longer is_healthy. I think most people don't even need that kind of behavior. But anyway, I think this can be done as a separate thing, not the default implementation; it's too complicated for default usage. And a multi-node configuration can be done using a single address, which is routed to multiple nodes through a separate load balancer (e.g. Kubernetes). So I think that separate struct implementations for each API method don't make much sense and are hard to support.

@haydenhoang (Contributor, Author) commented Aug 25, 2025

It's not even "load balancing logic" as you call it, because it uses the nearest_node until it's no longer is_healthy

That nearest_node is for the server-side load balancer; it is always prioritized. But in cases where the user doesn't have one, they can still specify multiple nodes and the client will load-balance across all of them.

That load-balance logic is similar to the logic in Typesense clients for other languages.

More info in the official docs.
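
Roughly, the selection policy can be sketched like this (illustrative only, not the actual Client::execute() code; Node and its health flag are stand-ins):

// Illustrative sketch of the selection policy, not the actual implementation.
// A real node would also carry its URL, the time of its last health check, etc.
struct Node {
    healthy: bool,
}

impl Node {
    fn is_healthy(&self) -> bool {
        self.healthy
    }
}

/// Prefer the nearest node while it is healthy; otherwise round-robin over the node list,
/// skipping nodes currently marked unhealthy. `nodes` is assumed to be non-empty.
fn pick_node<'a>(nearest: Option<&'a Node>, nodes: &'a [Node], cursor: &mut usize) -> &'a Node {
    if let Some(nearest_node) = nearest {
        if nearest_node.is_healthy() {
            return nearest_node;
        }
    }
    for _ in 0..nodes.len() {
        let candidate = &nodes[*cursor % nodes.len()];
        *cursor += 1;
        if candidate.is_healthy() {
            return candidate;
        }
    }
    // If every node currently looks unhealthy, fall back to the next node in rotation anyway.
    &nodes[*cursor % nodes.len()]
}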

So I think that separate struct implementations for each API method don't make much sense and are hard to support.

Could you clarify what the drawbacks are?

This wrapper design works really well in the Typesense Go client. It gives us more control and flexibility over the input and output schemas (like I said above). Auto-generated API methods are not very clean; we should not expose them to users, and they should only be used internally to reduce boilerplate code.

and it's very messy, very non-optimal solution.

I'm open to improve this in a separate smaller PR.

@RoDmitry (Contributor) commented Aug 25, 2025

We can always optimize later

I'm open to improve this in a separate smaller PR.

For me it sounds like: "I don't care about performance". Not good for somebody who is trying to re-architect the code API.

But all I was trying to say is that this can be made simpler, without much effect on performance. Why do you want to avoid modifying the auto-generated code? Do you have the same opinion towards AI-generated code? 😁😅

Auto-generated API methods are not very clean

So make them clean. Don't rely that much on auto-generation. It does not make the code unmodifiable. Git allows you to review changes after re-generation, and revert the changed parts. Also you can cover these changes with tests, so all output types will be checked.
Anybody can auto-generate the code using OpenAPI, but the purpose of this library is to make it cleaner, faster and easier to use.

Could you clarify what the drawbacks are?

The thing that bothers me is that serde_json::Value is very slow, that's why I want to avoid parsing into it (and then converting by serde_json::from_value). And it's just slower code overall. If you can achieve the same goals with simpler code, make it simple (the KISS principle), and as a bonus it will work faster.

P.S. I'm not against a builder concept.

@haydenhoang (Contributor, Author) commented Aug 26, 2025

we should prioritize adding the missing features in my opinion. We can always optimize later and it won't be a breaking change.

I'm open to improve this in a separate smaller PR.

For me it sounds like: "I don't care about performance". Not good for somebody who is trying to re-architect the code API.

That was my attempt to avoid premature optimization 😅; I do care about performance. That's why, while working on this, I always make sure that future improvements/optimizations won't introduce breaking changes for users.

Why do you want to avoid modifying the auto-generated code?

I am not against modifying the auto-generated code; in fact, I did modify some of it out of necessity (for endpoints that accept JSONL, like document import, the auto-generated code caused the JSONL to be serialized as JSON). But we should avoid it whenever possible, as this makes the developer experience much better overall: fewer modifications to the auto-generated code to keep track of, and future updates to parameters can be made by running a single command.

So make them clean. Don't rely that much on auto-generation. It does not make the code unmodifiable. Git allows you to review changes after re-generation, and revert the changed parts. Also you can cover these changes with tests, so all output types will be checked.
Anybody can auto-generate the code using OpenAPI, but the purpose of this library is to make it cleaner, faster and easier to use.

Imagine a new contributor runs the OpenAPI codegen expecting to add just one new search parameter, but it overwrites all the previous changes we made to the auto-generated code and a bunch of tests start failing. Also, it would be painful for reviewers to keep track of all the modifications.

The thing that bothers me is that serde_json::Value is very slow, that's why I want to avoid parsing into it (and then converting by serde_json::from_value)

Really? I thought it was fast enough to be negligible. In this case it might be worth directly modifying the auto-generated code. So the problem isn't the wrapper API methods but the wrapper struct?

I'm thinking of writing a simple script to find and replace in the auto-generated code, which will run after the codegen script.
FYI, there is a preprocessing step for the OpenAPI spec file.

Do you have the same opinion towards AI-generated code? 😁😅

One thing for sure is that it's a better Rust developer than I am 😂. I'm just picking up Rust as I work on this project, and I'm amazed by how good the developer experience is.
