Add Vector benchmarks to suite by connortsui20 · Pull Request #7397 · vortex-data/vortex

connortsui20 · 2026-04-11T16:43:27Z

Summary

Tracking issue: #7297

Adds a vector-search-bench crate similar to VectorDBBench.

The benchmark brute-forces cosine similarity search over public VectorDBBench datasets (Cohere, OpenAI, etc).

Since Vortex is not a database, we do not measure things like vector inserts and deletes. instead we just measure storage size and throughput for four targets:

Hand-rolled Rust baseline over &[f32]
Uncompressed (canonical) Vortex
Default compressed (which ends up being ALPrd)
TurboQuant (plus Recall@10 for the lossy TurboQuant path).

Every variant goes through a correctness check against the uncompressed scan before timing.

Not sure if this makes sense to have on a PR benchmark yet since this can only happen on a very specific array tree that will have a very specific optimized implementation.

Testing

N/A

Promote `compress_turboquant`, `build_constant_query_vector`, and `build_similarity_search_tree` from the bench-only `similarity_search_common` module into a new public `vortex_tensor::vector_search` module so downstream benchmark crates (and library users doing brute-force vector search) can reuse them without duplicating the canonical recipe. The existing similarity_search bench now delegates to these helpers instead of inlining them, and adds three unit tests for the new public surface: constant-query vector construction, end-to-end tree execution to a BoolArray, and a TurboQuant-roundtrip ranking check. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Adds `Format::VortexTurboQuant` as a new Format enum variant, parallel to `VortexCompact`, so downstream dashboards can distinguish TurboQuant-encoded Vortex results from the generic BtrBlocks-compressed OnDiskVortex layout via the existing `Target { engine, format }` grouping key. Also adds `vortex_bench::conversions::list_to_vector_ext`: a utility that rewraps a `List<float>` column (the shape an `emb` column takes when ingested from parquet via `parquet_to_vortex_chunks`) as `Extension<Vector>(FixedSizeList<T, D>)`. This is the only piece of conversion glue needed to bring a VectorDBBench-style parquet embedding column into a form the vector-search scalar functions expect. It handles both single-list and chunked inputs, validates uniform stride, and rejects nullable, non-float, or mismatched-length inputs. `vortex-bench` gains a direct dependency on `vortex-tensor` for the `Vector` extension type. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Introduce `VectorDataset`, a new `Dataset`-trait implementor that wires public VectorDBBench-hosted embedding corpora into the vortex-bench data-management machinery. The first variant is `CohereSmall` (Cohere wiki-22-12, 100K rows × 768 dims, cosine metric, ~150 MB zstd-parquet), targeting a CI-friendly size. Follow-up variants (CohereMedium, OpenAI*, SIFT*, etc.) can be added alongside without structural change. Each variant carries its upstream `assets.zilliz.com` parquet URL, dimension, row count, and curated distance metric. The download path reuses the existing `download_data` / `idempotent_async` infrastructure from `DownloadableDataset`, caching under `{name}/{name}.parquet` in the local benchmark data directory. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Introduce a new standalone benchmark crate at `benchmarks/vector-search-bench/` that measures brute-force cosine similarity search over VectorDBBench-style public embedding corpora. This commit wires up three Vortex variants: - `vortex-uncompressed`: raw `Vector<dim, f32>` with no encoding-level compression applied. Baseline for Vortex variants. - `vortex-default`: `BtrBlocksCompressor::default()` applied to the FSL storage — generic lossless compression for float vectors. - `vortex-turboquant`: the full `L2Denorm(SorfTransform(FSL(Dict)))` pipeline via `compress_turboquant`. Lossy; in-memory `.nbytes()` is used as the reported size until `L2Denorm` serialization lands. Four measurements are emitted per `(dataset, variant)` pair: 1. Compressed size (bytes) — on-disk `.vortex` file size for the lossless variants, in-memory footprint for TurboQuant. 2. Full-scan decode time — executes the column to a materialized `FixedSizeListArray`, forcing the entire decompression/dequantization path for each variant. 3. Cosine-similarity execute time — runs `CosineSimilarity(data, query)` into a materialized f32 array. 4. Filter execute time — runs `Binary(Gt, [cosine, threshold])` into a `BoolArray`. This is the end-to-end "query pushdown" path. Output goes through the existing `vortex_bench::measurements` types so results flow through the standard `gh-json` pipeline and appear on the CI dashboard alongside compress-bench / random-access-bench results. `Variant::VortexUncompressed` and `Variant::VortexDefault` both report as `Format::OnDiskVortex`; `Variant::VortexTurboQuant` reports as `Format::VortexTurboQuant`, letting downstream consumers distinguish them via the existing `Target { engine, format }` grouping key. The crate ships with a unit test that builds a 64×128 synthetic `Vector` array, runs all three variants through `prepare_variant` + `run_timings`, and verifies every variant reports a non-zero size and non-zero timings. Parquet-Arrow baseline and Recall@10 quality measurements are deferred to a follow-up commit; this commit intentionally keeps the Vortex-only core focused. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

…ment This completes the v1 scope of the vector-search benchmark by adding two orthogonal pieces of functionality: 1. **Parquet-Arrow hand-rolled cosine baseline** (`parquet_baseline.rs`) reads the canonical parquet file via `parquet::arrow`, decodes the `emb` column to a flat `Vec<f32>`, and runs a straightforward Rust cosine loop. This is the "what you'd do without Vortex" external floor — necessary for the benchmark to tell a credible story about Vortex's value vs. raw parquet. Reports size, decode time, cosine time, and filter time in the same format as the Vortex variants, under `Format::Parquet`. Handles both `List<f32>` and `FixedSizeList<f32>` parquet schemas. 2. **Recall@K quality measurement** (`recall.rs`) computes the fraction of the exact top-K nearest neighbours that the lossy TurboQuant variant recovers, using the uncompressed Vortex scan as local ground truth (not VectorDBBench's shipped `neighbors.parquet`, which would require an index and is out of scope). Samples are deterministic — rows picked at uniform intervals across the dataset — so results are stable across runs. Only TurboQuant is checked; lossless variants are trivially 1.0. Emitted as a `CustomUnitMeasurement` with unit "recall" so it flows through the existing gh-json pipeline alongside the size/timing measurements. The binary picks this all up behind two new CLI flags: `--parquet-baseline` (default true) and `--recall-queries N` (default 100), plus `--recall-k K` (default 10). Four unit tests cover the new functionality: - `parquet_baseline_reads_fsl_column` — end-to-end parquet write + read + cosine loop on a 3×3 fixture. - `uncompressed_has_perfect_self_recall` — sanity check that recall@10 of a variant against itself is exactly 1.0. - `turboquant_recall_is_reasonable_for_synthetic_data` — loose lower bound (>= 0.3) on TurboQuant recall for a 64×128 random dataset to catch total regressions without being flaky. A small `test_utils` submodule was extracted from the lib's existing tests module so the recall tests can reuse the synthetic vector generator. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Adds `vector-search-bench` as a new entry in the post-merge `bench` matrix alongside `random-access-bench` and `compress-bench`. Uses the same build-bin / bench-taskset / gh-json / S3-upload pipeline so results flow into the existing benchmarks dashboard without any dashboard-side changes. The four formats passed on the CLI (`parquet,vortex-uncompressed,vortex-default,vortex-turboquant`) surface in the emitted JSON as three different `target.format` values — `Format::Parquet` for the hand-rolled baseline, `Format::OnDiskVortex` for the two lossless variants, and `Format::VortexTurboQuant` for the lossy one — so dashboards can render them as distinct bars per metric. Also promotes the `--variants` clap flag on the binary to `--formats` for naming consistency with the other standalone benchmarks (the CI script pipes `--formats ${{ matrix.benchmark.formats }}` uniformly). The underlying `Variant` enum is unchanged; only the CLI surface moved. Finally, adds `benchmarks/vector-search-bench/README.md` documenting what the benchmark measures, the four formats, local-run instructions, and — importantly — the dataset-mirror caveat. The current dataset URL points at `assets.zilliz.com` (Zilliz's public anonymous-readable bucket). Running this matrix entry on every develop merge will create recurring egress traffic on a third-party bucket; the README explains how to swap the URL in `VectorDataset::parquet_url` to an internal mirror before turning this on for a fork with heavy merge frequency. Skipping `bench-pr.yml` (PR runs) deliberately — v1 is develop-only until the mirror / egress question is resolved. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Extends `VectorDataset` with five additional cosine-metric corpora from the VectorDBBench collection: - CohereMedium: 1M × 768d (wiki-22-12) - OpenAiSmall: 50K × 1536d (OpenAI-on-C4) - OpenAiMedium: 500K × 1536d (OpenAI-on-C4) - BioasqMedium: 1M × 1024d (biomedical) - GloveMedium: 1M × 200d (word embeddings) All six built-in datasets are exported via a new `ALL_VECTOR_DATASETS` slice and exposed through the `vector-search-bench` `--datasets` flag so local runs can pick any of them. Adds an `all_datasets_have_consistent_metadata` test that verifies every built-in variant has a unique kebab-cased name, a `train.parquet` URL under `assets.zilliz.com/benchmark/`, a dimension above the TurboQuant minimum (128), and cosine metric (v1 only wires cosine). L2 / IP datasets (SIFT, GIST, LAION) remain out of scope until vortex-tensor gains an L2-distance scalar function. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

…generator Fixes two issues discovered while running the benchmark end-to-end against a real parquet file in a no-network environment: 1. `parquet_to_vortex_chunks` produces `ListView` arrays for list columns by default, so `list_to_vector_ext` was rejecting them with "expects a List array, got list(f32)". Added a fast path that applies `recursive_list_from_list_view` before the offset-stride validation kicks in. 2. `list_to_vector_ext` returns a `ChunkedArray<Extension<Vector>>` when given a chunked input (the normal shape after parquet ingest), but downstream `extract_query_row` expects a single non-chunked `Extension<Vector>`. `prepare_dataset` now calls `wrapped.execute::<ExtensionArray>(&mut ctx)` to materialize the chunked wrapper into a single unchunked extension array. Also adds a new `gen_synthetic_dataset` helper binary that writes a VectorDBBench-shape parquet file (`id: int64`, `emb: list<float32>`, zstd-compressed) with deterministic xorshift-generated vectors. This is useful for: - Local dev runs of `vector-search-bench` without needing outbound network access to `assets.zilliz.com`. - Populating the `vortex-bench/data/<name>/<name>.parquet` cache path so the benchmark's idempotent download step skips the HTTP fetch. - Sanity-checking the benchmark pipeline in CI environments that block outbound HTTPS to third-party buckets. Verified end-to-end against 5000×768 synthetic Cohere-small and 5000×1536 synthetic OpenAI-small inputs — every measurement pipeline (size, decode, cosine, filter, recall@10) produces valid gh-json and TurboQuant recall@10 lands at 0.91 on both fixtures, consistent with the loose recall floor the unit test enforces. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

…press measurements A thorough audit of the benchmark's outputs found three issues that together made the results hard to interpret: 1. **"Size" was lying.** The old `measure_on_disk_size` helper wrote every variant through `SESSION.write_options()`, which uses `WriteStrategyBuilder::default()` — and that default applies BtrBlocks compression regardless of the in-memory tree. So both `vortex-uncompressed` and `vortex-default` ended up writing identical ALP-RD-compressed bytes to disk, reporting the same "size" number even though the two in-memory trees were different. Switched to reporting `ArrayRef::nbytes()` of the in-memory variant tree instead — the honest footprint of what the compute measurements actually operate on. Running the diagnostic `print_variant_trees` test now shows what each variant really is: ``` VortexUncompressed nbytes=15,360,000 Extension → FSL → Primitive<f32> VortexDefault nbytes=13,072,050 Extension → FSL → alprd(f32) [bitpacked u16/u32 + patches] VortexTurboQuant nbytes= 5,141,024 L2Denorm → SorfTransform → FSL → Dict(u8 codes + f32[256]) ``` So BtrBlocks really does find a ~15% lossless saving on random f32 via ALP-RD, and the previously-observed 3.5× cosine slowdown is the real cost of decoding ALP-RD on the fly inside `CosineSimilarity`. 2. **"Decompress time" for the compressed variants was a no-op.** The old `decode_full_scan` only executed the Extension shell and the FSL storage; it never forced the element buffer to materialize. Since FSL canonical form can keep its elements in their encoded representation, this reported ~60 ns for `vortex-default` even though reading any value would actually have to run the ALP-RD unpacking. Renamed to `decompress_full_scan` and added a final `fsl.elements().execute::<PrimitiveArray>()` call that forces the lazy decode to completion. `vortex-default`'s decompress time now shows ~6 ms on 5000×768 rows — the actual ALP-RD cost. 3. **No correctness verification.** The benchmark reported throughput numbers with no guarantee that the variants were producing the same answers. A new `verify.rs` module now computes cosine-similarity scores for one query against every variant *before* timing, compares them to the uncompressed baseline, and bails the run if any lossless variant drifts beyond 1e-4 max-abs-diff or any lossy variant drifts beyond 0.2 max-abs-diff. The `execute_cosine` implementation is promoted to `pub` and shared between the timing loop and verification so both paths exercise the exact same expression tree. Also adds a new `compress time` measurement alongside `decompress time`, so the dashboard can show the full round-trip cost per variant (previously we only timed one side). The TurboQuant verification is emitted as a warning instead of a hard error — the lossy tolerance is a quality-budget alert, not a correctness invariant. New unit tests (13 total in vector-search-bench, up from 4): - `verify::*` — 8 tests covering the `compare_scores`, `verify_scores`, and `verify_variant` paths including NaN handling and end-to-end lossless/lossy checks on synthetic 64×128 data. - `tests::uncompressed_decompress_is_fast` — regression guard ensuring the uncompressed variant's decompress pass is >5× faster than TurboQuant's so future refactors can't accidentally make the "uncompressed" path take the slow path. - `tests::print_variant_trees` — `#[ignore]` diagnostic test that dumps `display_tree_encodings_only()` for each variant. This is what was used to root-cause the size-lying issue. Verified end-to-end on synthetic 5000×768 Cohere-small: parquet nbytes=14,511,306 cosine= 2.9 ms correctness=0.00e0 vortex-uncompressed nbytes=15,360,000 cosine= 7.9 ms correctness=0.00e0 vortex-default nbytes=13,072,050 cosine=20.5 ms correctness=0.00e0 vortex-turboquant nbytes= 5,141,024 cosine=59.1 ms correctness=5.18e-3 recall@10 (tq) 0.9150 All lossless variants now produce bit-identical cosine scores (max diff exactly 0.0) against the uncompressed baseline, confirming the ALP-RD and hand-rolled parquet paths are correct. TurboQuant's 5.18e-3 drift is well within the 0.2 lossy tolerance. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

The compute side of the baseline isn't really a parquet implementation — it's a **hand-rolled Rust scalar cosine loop over a flat `Vec<f32>`** that happens to have been decoded from a parquet file via `parquet-rs` / `arrow-rs`. Labeling it as "parquet" in CLI flags, metric names, and docs suggested the baseline was measuring a real parquet-based analytical engine, when in reality it's the minimum cost a Rust programmer could get away with given no query engine. User-facing rename: everywhere the baseline was called "parquet" it now says "handrolled" — the CLI format, the metric labels, the CI matrix entry, the README, the type and function names in the baseline module, and the module filename itself (`parquet_baseline.rs` → `handrolled_baseline.rs`). Unchanged: - `read_parquet_embedding_column` keeps its name because it really does read parquet. Only the baseline-level wrappers take the `handrolled` label. - `target.format` stays `Format::Parquet` — the storage side really is parquet on disk, and dashboards grouping by format should still put the baseline's size/decompress measurements under "parquet". The `CompressionTimingMeasurement::to_json` prefix therefore still reads `"parquet_rs-zstd "`, which honestly describes the parquet reader stack. Only the compute label in the metric `name` string reflects the hand-rolled compute path. The module-level doc comment on `handrolled_baseline.rs` now explicitly frames the baseline as a compute-cost *floor*, not a fair DBMS comparison, and points at DuckDB `list_cosine_similarity` and DataFusion vector UDFs as better "what real users pay for parquet+cosine" baselines for future work. Verified end-to-end against the 5000×768 synthetic cohere-small fixture. All 13 existing tests still pass (the baseline test rename from `parquet_baseline_reads_fsl_column` to `handrolled_baseline_reads_fsl_column` is the only functional change to tests). Correctness verification still reports: handrolled max_abs_diff = 0.000e0 (bit-identical, lossless) vortex-uncompressed max_abs_diff = 0.000e0 vortex-default max_abs_diff = 0.000e0 vortex-turboquant max_abs_diff = 5.18e-3 (within 0.2 lossy tolerance) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Review caught four real bugs plus a stylistic stray import. All are small, targeted fixes. 1. `build_similarity_search_tree` (vortex-tensor) and `execute_cosine` (vector-search-bench) wrapped `CosineSimilarity::try_new_array` in `vortex_expect`, which panics on the `Err` branch. Both functions' doc comments claimed they *returned* errors on dimension mismatch, so the behavior contradicted the contract: a caller passing a wrong-dimension query vector would crash instead of seeing a clean error. Fixed by replacing `.vortex_expect(...)` with `?`. 2. The progress bar in `main.rs` was initialized with `datasets.len() * args.formats.len()` (which counts the handrolled baseline) but only `inc(1)`'d inside the Vortex-variants loop, so the bar finished at `N-1/N` whenever handrolled was selected. Fixed by incrementing inside the handrolled block and recomputing `total_work` from the actual increment sites. 3. `extract_query_row` blindly called `PrimitiveArray::as_slice::<f32>()` without checking the underlying ptype. The benchmark only ever uses f32 today, but the `Vector` extension type supports f16/f32/f64, so a future caller that accidentally passes an f64 `Vector` would get a mis-typed slice. Fixed by asserting `elements.ptype() == PType::F32` before the cast with a clear error message. While there, simplified the convoluted bounds check (was `row * dim + dim > len * dim`, reduces to `row >= len` without the multiplication). 4. `gen_synthetic_dataset.rs` and the README had shell examples using `--bin gen-synthetic-dataset` with hyphens, but cargo's default bin name is the filename minus extension — i.e., underscores. The example would fail to resolve. Fixed to `gen_synthetic_dataset`. 5. `use std::io::Write;` was stranded at the bottom of `main.rs` as a single-line import after the `main` function. Moved to the top import block for idiomatic style. All 13 vector-search-bench tests still pass. 222 vortex-tensor tests still pass. clippy and rustfmt clean. Public API unchanged — no public-api.lock regen needed. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Cleanup of seven dead-or-suppressed items the review flagged: 1. `QueryLen` type alias (`lib.rs`) — defined but only "used" via `let _ = QueryLen::default;`, a warning-suppression hack. The doc comment claimed it made "broadcast semantics obvious at call sites" but no call site actually referenced it. Both the alias and the suppression `let` are gone. 2. `async-trait` and `itertools` Cargo dependencies — listed in `vector-search-bench/Cargo.toml` but not referenced anywhere in the crate's source tree (probably copied from `compress-bench` when the crate was bootstrapped). Removed. 3. `VERIFICATION_QUERY_ROW` constant (`verify.rs`) — defined as `0` and documented as "the row we verify against" but never referenced from anywhere; `main.rs` passes `prepared.query` directly, and `prepared.query` is extracted at `DEFAULT_QUERY_ROW` time inside `prepare_dataset`. Removed. 4. Dead-code-suppression hack in `vortex_bench::conversions::tests`: // ...suppress the dead-code warning by referencing them in a // no-op expression. let _ = (Nullability::NonNullable, PType::I32); Both imports were never used in the test body. Removed the imports and the dead let. 5. `let _ = emb_idx;` in `handrolled_baseline::read_parquet_embedding_column`. Replaced with destructuring via `let (_, emb_field) = ...`. 6. `HandrolledBaselineData::file_size` — populated by `read_parquet_ embedding_column` via a second `File::metadata()` call, but no caller reads it: the benchmark's "handrolled size" measurement reads `PreparedDataset::parquet_bytes` (populated once in `prepare_dataset`). Keeping the duplicated state was a recipe for drift. Removed the field and its populating code, and added a doc comment on the struct pointing at `parquet_bytes` as the canonical file-size source. All 13 vector-search-bench tests + 16 vortex-bench tests still pass. clippy / rustfmt clean. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Four related cleanups the review flagged for modularity & clarity: 1. Deleted the duplicate `extract_query_row` helper from `recall.rs` (30 lines near-identical to the one in `lib.rs`). Both call sites now go through a single `pub(crate)` helper in `lib.rs`. The lib.rs helper also gets a fresh doc comment explicitly calling out its f32-only requirement (landed in the previous commit) so the deduped version is not just shorter but more correct. 2. Added `verify::verify_and_report_scores`: a score-vector-first variant of `verify_variant` that takes pre-computed scores as `&[f32]` instead of an `ArrayRef`. `verify_variant` now delegates to it. The point is to make both main-loop paths share a single error-handling / logging / bail-on-lossless codepath: - Vortex variants compute scores via `compute_cosine_scores(array)` and call `verify_variant`, which then calls `verify_and_report_scores` under the hood. - The handrolled baseline computes scores via `cosine_loop(&[f32])` and calls `verify_and_report_scores` directly. Previously `main.rs` hand-rolled the handrolled path's check/log/ bail triplet, duplicating logic in two places that could drift. 3. Dropped the `compress_turboquant` and `build_similarity_search_tree` wrapper shims from `vortex-tensor/benches/similarity_search_common/ mod.rs`. They existed only to re-expose the public helpers under local names for bench call-site compatibility. The bench now `pub use`s them directly, which is one fewer indirection layer. `compress_default` stays in the bench module because its `BtrBlocksCompressor` dependency lives in vortex-tensor's dev-dependencies, not main deps, so lifting it to the public `vector_search` module would pull `vortex-btrblocks` into vortex-tensor's main dep graph. 4. Replaced the stale "in future commits" doc comment on `PreparedDataset::uncompressed` — recall is already wired. All 13 vector-search-bench + 222 vortex-tensor tests pass. clippy / rustfmt clean. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Two micro-refactors for correctness quality: 1. `list_to_vector_ext` offset validation walks `ListArrayExt::offset_at` N+1 times (once per row boundary) instead of 2N times (once per start and once per end of every row). Each iteration now carries the previous `end` into the next iteration's `prev_end`, so we only fetch each offset boundary once. For a canonical list produced by `parquet_to_vortex_chunks` the offsets are a `Primitive` array and `offset_at` takes the fast slice-index path, so this halves the offset lookups from ~200K to ~100K on a 100K-row column. The loop body is still O(1) per row either way. 2. `run_timings` now interleaves its decompress / cosine / filter stages inside a single outer loop over iterations, instead of running the three stages as back-to-back sequential loops. The back-to-back shape gives each stage an asymmetric cache profile — the second+ iterations of the cosine stage run on cache lines left behind by the previous *cosine* iteration, while the second+ iterations of the filter stage run on cache lines left behind by the *cosine* stage. Interleaving makes each iteration of each stage see roughly the same cache state, removing the asymmetry. Each stage still gets a fresh `ExecutionCtx` so no cached scalar-fn state leaks between stages within a single iteration. The doc comment on `run_timings` now spells out why interleaving matters. All 13 vector-search-bench + 16 vortex-bench tests still pass. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

…bail Two review items related to testing coverage and trait-contract honesty: 1. `list_to_vector_ext` has five unit tests but none of them exercised the `ListView` → `List` fast path, even though that path is the one the benchmark actually hits after `parquet_to_vortex_chunks` (which canonicalizes list columns into `ListView`, not `List`). Added a `list_view_input_is_rewrapped_as_vector_extension` test with a `list_view_f32(dim, rows)` helper that constructs a `ListViewArray<f32>` with uniform-stride offsets and sizes, then asserts `list_to_vector_ext` returns an `Extension<Vector>` whose storage is a `FixedSizeList<f32, dim>`. A regression to the ListView handling would now be caught by unit tests rather than by the end-to-end benchmark smoke test. `vortex-bench` tests are now 17 (was 16). 2. `VectorDataset::to_vortex_array` previously implemented the `Dataset` trait by downloading the parquet file, writing it through the default Vortex write path, and reading it back. The returned array was a `StructArray` with `{ id: int64, emb: list<float> }` — **not** the `Extension<Vector>(FixedSizeList<...>)` shape the vector-search benchmark actually operates on. The benchmark worked around this by bypassing `to_vortex_array` entirely and rebuilding the Vector extension via `parquet_to_vortex_chunks` + `list_to_vector_ext`. But that left the trait method as a trap for future callers who expect it to return the same shape the benchmark measures. Replaced the body with `bail!` and a message that points callers at the parquet path + list_to_vector_ext sequence. This makes the contract unambiguous: either use the parquet path, or get a clear error instead of a semantically-wrong array. The unused imports (`tokio::fs::File`, `ArrayStreamExt`, `OpenOptionsSessionExt`, `WriteOptionsSessionExt`, `SESSION`, `parquet_to_vortex_chunks`, `idempotent_async`, `IntoArray`) are dropped too. 13 vector-search-bench + 17 vortex-bench tests pass. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

Five small style-only cleanups identified during the review pass. Pure documentation + formatting — no behavior change. 1. `list_to_vector_ext` had one error message missing the `"list_to_vector_ext: ..."` prefix every other error used (`"list_to_vector_ext expects a List array ..."` → `"list_to_vector_ext: expected a List array ..."`). The test that asserts the substring is updated too. 2. `handrolled_baseline::filter_loop` now has an explicit comment flagging the `>` comparison as the *strict* greater-than that must stay in sync with the Vortex-side `Operator::Gt` in `build_similarity_search_tree`. A divergence between the two would show up as a max-abs-diff correctness failure for the lossless variants, but the comment makes the invariant explicit so the next person to touch either side thinks about it. 3. `gen_synthetic_dataset.rs` had three magic-number constants in the per-element value computation (`0.00013`, `0.00007`, `0.25`, plus `1.0/32768.0` for the random scale). Promoted them to named constants (`POS_FREQ_ROW`, `POS_FREQ_COL`, `POS_AMPLITUDE`, `RAND_SCALE`) with a comment explaining the intent: the sinusoid mixes the `(row, col)` position so vectors stay distinct even at low bit widths after quantization, and the frequency constants are chosen small and coprime to avoid short-period aliasing over the 100K-row × 1536-col domain. 4. `gen_synthetic_dataset.rs`'s module doc said "bit-identical across runs". That overpromises: the generator uses `f32::sin`, whose last few ULPs are libm/CPU-dependent. Changed to "deterministic... on the same machine" with an explicit note about cross-machine variance. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

…ests A second pass over the branch caught several items I missed in the first review cleanup. All code-style and correctness-level, no behavior changes beyond the missed f32 assertion. ## Missed bug `verify::compute_cosine_scores` had the same blind `PrimitiveArray::as_slice::<f32>()` call I already fixed in `extract_query_row` last round — I only patched the lib.rs copy, not the verify.rs one. Added the `scores.ptype() != PType::F32` check with a clear error message before the slice cast. Matches the doc contract that the function returns `Vec<f32>` and only supports f32 Vector columns today. ## CLAUDE.md style violations CLAUDE.md's import rule: "All imports must be at the top of the module, never inside functions. The only exception is `#[cfg(test)]` blocks, where imports should be at the top of the test module." I had function-scoped `use` statements sprinkled across five functions in `lib.rs` and inside an `extract_row_zero` test helper in `verify.rs`, all against the guidance. Moved every function-scoped `use` into the top-level import block: - `prepare_dataset`: 1 import moved - `extract_emb_column`: 5 imports moved - `extract_query_row`: 4 imports moved - `decompress_full_scan`: 3 imports moved - `execute_cosine`: 2 imports moved - `PreparedDataset::dim`: 2 fully-qualified `vortex::dtype::DType::...` paths replaced with plain `DType::...` from top-level import ## Test duplication `verify::tests::extract_row_zero` was a 10-line reimplementation of `extract_query_row` inside the test module. Deleted it and made `verify::tests::make_prepared` call `extract_query_row` directly, which also sidestep the need for function-scoped imports in the test. Same rework applied to `lib::tests::prepare_variant_produces_...` (which had its own inline f32 extraction) and `recall::tests::...` (which was building a `PreparedDataset` with an empty query). A new `test_prepared` helper in `lib.rs` tests replaces the three separate inline constructions with a single shared builder. ## New test coverage Added 8 new tests across the three modules: - `vortex-bench::conversions::all_invalid_list_validity_is_rejected`: builds a `List<f32>` with `Validity::AllInvalid` and confirms the rejection path fires. Previously unexercised. - `vector-search-bench::tests::extract_query_row_returns_the_right_slice`: verifies row 0 extraction is idempotent and different rows differ. - `vector-search-bench::tests::extract_query_row_rejects_out_of_bounds_row`: verifies the `row >= len` bounds check. - `vector-search-bench::verify::tests::verify_scores_fails_on_nan_in_baseline`: symmetric NaN case — the existing `verify_scores_fails_on_nan` test only covered the variant-side NaN. - `verify::tests::verify_and_report_scores_is_ok_for_identical_inputs`: direct test for the helper that main.rs's handrolled path uses. - `verify::tests::verify_and_report_scores_bails_for_lossless_mismatch`: lossless failures must hard-error with a message that includes the variant name. - `verify::tests::verify_and_report_scores_warns_for_lossy_mismatch_without_bailing`: lossy failures must NOT bail — they log a warning and return the failing report so the caller can still emit it as a measurement. - `recall::tests::top_k_indices_handles_nan_without_panicking`: NaN scores used to sort via `partial_cmp(...).unwrap_or(Equal)`, which produced arbitrary orderings. Switched to `f32::total_cmp` for a proper total order; test confirms non-NaN scores still rank in descending order and the sort doesn't panic. ## Minor - Stale comment on `main.rs::total_work` calculation: referenced `args.formats.len()` but the expression uses a different formula after commit 74aaf5a. Fixed the comment. - `top_k_indices` switched from `scores[b].partial_cmp(&scores[a]).unwrap_or(Ordering::Equal)` to `scores[b].total_cmp(&scores[a])`. Added a doc comment on the function explaining the NaN-safety rationale. ## Test counts after this commit - vortex-bench: 18 tests (was 17, +1 for `all_invalid_list_validity`) - vector-search-bench: 20 tests passing + 1 ignored (was 13 + 1, +7 for the new coverage above) - vortex-tensor: 222 tests (unchanged) Clippy / rustfmt clean. End-to-end smoke run on synthetic cohere-small still reports: - all four correctness-max-diff values coherent (0.000e0 for the three lossless variants, 5.18e-3 for TurboQuant) - Recall@10 = 0.91 for TurboQuant Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

… reads Three targeted fixes from the third review pass. ## 1. Drop vestigial `session: &VortexSession` parameters The benchmark always passed `&vortex_bench::SESSION` to every function that took a session parameter, so the parameter was plumbing with zero value. More importantly it was *inconsistent* — some internal helpers (like `extract_query_row`) already hardcoded the global, so the API suggested a custom session was respected end-to-end when it wasn't. Removed the `session` parameter from: - `lib::prepare_variant(prepared, variant)` - `lib::run_timings(variant_array, query, iterations)` - `verify::compute_cosine_scores(data, query)` - `verify::verify_variant(name, array, query, baseline, kind)` - `recall::measure_recall_at_k(uncompressed, compressed, num_queries, top_k)` Each function now calls `vortex_bench::SESSION.create_execution_ctx()` directly where needed. Doc comments note the "uses the global SESSION" contract explicitly. `execute_cosine`, `decompress_full_scan`, and `execute_filter` still take `&mut ExecutionCtx` because they're the low-level primitives that `run_timings` constructs fresh contexts for on each iteration. All call sites in `main.rs` and the test modules updated. ## 2. Fix the handrolled double-parquet-read `main.rs` previously did: 1. Read parquet → run cosine → verify against baseline → bail on fail 2. Call `run_handrolled_baseline_timings`, which reads parquet N more times inside its own loop Total: N+1 parquet reads per dataset per run. The extra read was wasted work — `cosine_loop` is deterministic, so the scores from iteration N of the timing loop are bit-identical to a one-shot pre-timing computation. Restructured `run_handrolled_baseline_timings` to return a new `HandrolledBaselineResult { timings, last_scores }` where `last_scores` is the `Vec<f32>` produced by the final iteration. `main.rs` now: 1. Call `run_handrolled_baseline_timings` → gets timings + last_scores 2. Call `verify_and_report_scores` with those last_scores vs baseline → bails on lossless mismatch Total: N parquet reads per dataset per run. One less read, and the verification now uses the exact same scores the timing loop computed rather than a separate pass. Order reversal note: verification now runs *after* timing instead of before. The handrolled loop is cheap enough (~5 ms cosine + ~40 ms parquet read per iter) that running it once even for a broken variant is acceptable, and the correctness bail still prevents the run from reporting wrong-answer numbers. `run_handrolled_baseline_timings` now asserts `iterations >= 1` up front (previously relied on `last_scores` being populated after the loop). ## 3. Nullable element type test `list_to_vector_ext` has a rejection path for `List<f32?>` — element dtype is a nullable f32 (even when every value is present). Added `conversions::tests::nullable_element_dtype_is_rejected` to exercise this path. Passing `Validity::AllValid` to `PrimitiveArray::new::<f32>` produces the nullable dtype the rejection path is looking for. ## Also: housekeeping - Dropped `use vortex_bench::SESSION` from the `lib.rs` inner test module (was left over from the session-parameter removal). - Replaced two fully-qualified `vortex::error::vortex_panic!` calls in `PreparedDataset::dim()` with a plain `vortex_panic!` sourced from a top-level import. - Refreshed the doc comment on `PreparedDataset` — it previously described the struct as carrying an "execution session / context", which is no longer accurate. - Added two new tests for `run_handrolled_baseline_timings`: - `run_handrolled_baseline_timings_returns_last_iteration_scores`: verifies the `last_scores` contract on a 3-row fixture. - `run_handrolled_baseline_timings_panics_on_zero_iterations`: regression guard for the iterations>=1 assert. ## Test counts - vortex-bench: 19 tests (was 18, +1 for nullable_element_dtype) - vector-search-bench: 22 passing + 1 ignored (was 20+1, +2 for the handrolled timing tests) - vortex-tensor: 222 tests (unchanged) - Total: 263 passing, 3 ignored clippy / rustfmt clean. End-to-end smoke run still coherent: - correctness-max-diff = 0.000e0 for all three lossless variants - correctness-max-diff = 5.18e-3 for TurboQuant (within 0.2 tolerance) - Recall@10 for TurboQuant = 0.9267 Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>

connortsui20 · 2026-04-11T21:23:58Z

claude code web fell over so this doesnt even compile

connortsui20 · 2026-04-11T21:34:19Z

Superseded by #7399

claude added 18 commits April 11, 2026 12:33

connortsui20 added the changelog/feature A new feature label Apr 11, 2026

connortsui20 mentioned this pull request Apr 11, 2026

Tracking Issue: Vector Similarity Search #7297

Open

36 tasks

connortsui20 closed this Apr 11, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Vector benchmarks to suite#7397

Add Vector benchmarks to suite#7397
connortsui20 wants to merge 18 commits intodevelopfrom
ct/vector-bench

connortsui20 commented Apr 11, 2026 •

edited

Loading

Uh oh!

connortsui20 commented Apr 11, 2026

Uh oh!

connortsui20 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

connortsui20 commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Uh oh!

connortsui20 commented Apr 11, 2026

Uh oh!

connortsui20 commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

connortsui20 commented Apr 11, 2026 •

edited

Loading