Closed
Conversation
Promote `compress_turboquant`, `build_constant_query_vector`, and `build_similarity_search_tree` from the bench-only `similarity_search_common` module into a new public `vortex_tensor::vector_search` module so downstream benchmark crates (and library users doing brute-force vector search) can reuse them without duplicating the canonical recipe. The existing similarity_search bench now delegates to these helpers instead of inlining them, and adds three unit tests for the new public surface: constant-query vector construction, end-to-end tree execution to a BoolArray, and a TurboQuant-roundtrip ranking check. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds `Format::VortexTurboQuant` as a new Format enum variant, parallel to
`VortexCompact`, so downstream dashboards can distinguish TurboQuant-encoded
Vortex results from the generic BtrBlocks-compressed OnDiskVortex layout
via the existing `Target { engine, format }` grouping key.
Also adds `vortex_bench::conversions::list_to_vector_ext`: a utility that
rewraps a `List<float>` column (the shape an `emb` column takes when
ingested from parquet via `parquet_to_vortex_chunks`) as
`Extension<Vector>(FixedSizeList<T, D>)`. This is the only piece of
conversion glue needed to bring a VectorDBBench-style parquet embedding
column into a form the vector-search scalar functions expect. It handles
both single-list and chunked inputs, validates uniform stride, and
rejects nullable, non-float, or mismatched-length inputs.
`vortex-bench` gains a direct dependency on `vortex-tensor` for the
`Vector` extension type.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Introduce `VectorDataset`, a new `Dataset`-trait implementor that wires
public VectorDBBench-hosted embedding corpora into the vortex-bench
data-management machinery. The first variant is `CohereSmall`
(Cohere wiki-22-12, 100K rows × 768 dims, cosine metric, ~150 MB
zstd-parquet), targeting a CI-friendly size. Follow-up variants
(CohereMedium, OpenAI*, SIFT*, etc.) can be added alongside without
structural change.
Each variant carries its upstream `assets.zilliz.com` parquet URL,
dimension, row count, and curated distance metric. The download path
reuses the existing `download_data` / `idempotent_async` infrastructure
from `DownloadableDataset`, caching under `{name}/{name}.parquet` in
the local benchmark data directory.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Introduce a new standalone benchmark crate at
`benchmarks/vector-search-bench/` that measures brute-force cosine
similarity search over VectorDBBench-style public embedding corpora.
This commit wires up three Vortex variants:
- `vortex-uncompressed`: raw `Vector<dim, f32>` with no encoding-level
compression applied. Baseline for Vortex variants.
- `vortex-default`: `BtrBlocksCompressor::default()` applied to the FSL
storage — generic lossless compression for float vectors.
- `vortex-turboquant`: the full `L2Denorm(SorfTransform(FSL(Dict)))`
pipeline via `compress_turboquant`. Lossy; in-memory `.nbytes()` is
used as the reported size until `L2Denorm` serialization lands.
Four measurements are emitted per `(dataset, variant)` pair:
1. Compressed size (bytes) — on-disk `.vortex` file size for the
lossless variants, in-memory footprint for TurboQuant.
2. Full-scan decode time — executes the column to a materialized
`FixedSizeListArray`, forcing the entire decompression/dequantization
path for each variant.
3. Cosine-similarity execute time — runs `CosineSimilarity(data, query)`
into a materialized f32 array.
4. Filter execute time — runs `Binary(Gt, [cosine, threshold])` into a
`BoolArray`. This is the end-to-end "query pushdown" path.
Output goes through the existing `vortex_bench::measurements` types so
results flow through the standard `gh-json` pipeline and appear on the
CI dashboard alongside compress-bench / random-access-bench results.
`Variant::VortexUncompressed` and `Variant::VortexDefault` both report
as `Format::OnDiskVortex`; `Variant::VortexTurboQuant` reports as
`Format::VortexTurboQuant`, letting downstream consumers distinguish
them via the existing `Target { engine, format }` grouping key.
The crate ships with a unit test that builds a 64×128 synthetic
`Vector` array, runs all three variants through `prepare_variant` +
`run_timings`, and verifies every variant reports a non-zero size and
non-zero timings. Parquet-Arrow baseline and Recall@10 quality
measurements are deferred to a follow-up commit; this commit
intentionally keeps the Vortex-only core focused.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…ment This completes the v1 scope of the vector-search benchmark by adding two orthogonal pieces of functionality: 1. **Parquet-Arrow hand-rolled cosine baseline** (`parquet_baseline.rs`) reads the canonical parquet file via `parquet::arrow`, decodes the `emb` column to a flat `Vec<f32>`, and runs a straightforward Rust cosine loop. This is the "what you'd do without Vortex" external floor — necessary for the benchmark to tell a credible story about Vortex's value vs. raw parquet. Reports size, decode time, cosine time, and filter time in the same format as the Vortex variants, under `Format::Parquet`. Handles both `List<f32>` and `FixedSizeList<f32>` parquet schemas. 2. **Recall@K quality measurement** (`recall.rs`) computes the fraction of the exact top-K nearest neighbours that the lossy TurboQuant variant recovers, using the uncompressed Vortex scan as local ground truth (not VectorDBBench's shipped `neighbors.parquet`, which would require an index and is out of scope). Samples are deterministic — rows picked at uniform intervals across the dataset — so results are stable across runs. Only TurboQuant is checked; lossless variants are trivially 1.0. Emitted as a `CustomUnitMeasurement` with unit "recall" so it flows through the existing gh-json pipeline alongside the size/timing measurements. The binary picks this all up behind two new CLI flags: `--parquet-baseline` (default true) and `--recall-queries N` (default 100), plus `--recall-k K` (default 10). Four unit tests cover the new functionality: - `parquet_baseline_reads_fsl_column` — end-to-end parquet write + read + cosine loop on a 3×3 fixture. - `uncompressed_has_perfect_self_recall` — sanity check that recall@10 of a variant against itself is exactly 1.0. - `turboquant_recall_is_reasonable_for_synthetic_data` — loose lower bound (>= 0.3) on TurboQuant recall for a 64×128 random dataset to catch total regressions without being flaky. A small `test_utils` submodule was extracted from the lib's existing tests module so the recall tests can reuse the synthetic vector generator. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds `vector-search-bench` as a new entry in the post-merge `bench`
matrix alongside `random-access-bench` and `compress-bench`. Uses the
same build-bin / bench-taskset / gh-json / S3-upload pipeline so
results flow into the existing benchmarks dashboard without any
dashboard-side changes. The four formats passed on the CLI
(`parquet,vortex-uncompressed,vortex-default,vortex-turboquant`) surface
in the emitted JSON as three different `target.format` values —
`Format::Parquet` for the hand-rolled baseline, `Format::OnDiskVortex`
for the two lossless variants, and `Format::VortexTurboQuant` for the
lossy one — so dashboards can render them as distinct bars per metric.
Also promotes the `--variants` clap flag on the binary to `--formats`
for naming consistency with the other standalone benchmarks (the CI
script pipes `--formats ${{ matrix.benchmark.formats }}` uniformly).
The underlying `Variant` enum is unchanged; only the CLI surface moved.
Finally, adds `benchmarks/vector-search-bench/README.md` documenting
what the benchmark measures, the four formats, local-run instructions,
and — importantly — the dataset-mirror caveat. The current dataset URL
points at `assets.zilliz.com` (Zilliz's public anonymous-readable
bucket). Running this matrix entry on every develop merge will create
recurring egress traffic on a third-party bucket; the README explains
how to swap the URL in `VectorDataset::parquet_url` to an internal
mirror before turning this on for a fork with heavy merge frequency.
Skipping `bench-pr.yml` (PR runs) deliberately — v1 is develop-only
until the mirror / egress question is resolved.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Extends `VectorDataset` with five additional cosine-metric corpora from the VectorDBBench collection: - CohereMedium: 1M × 768d (wiki-22-12) - OpenAiSmall: 50K × 1536d (OpenAI-on-C4) - OpenAiMedium: 500K × 1536d (OpenAI-on-C4) - BioasqMedium: 1M × 1024d (biomedical) - GloveMedium: 1M × 200d (word embeddings) All six built-in datasets are exported via a new `ALL_VECTOR_DATASETS` slice and exposed through the `vector-search-bench` `--datasets` flag so local runs can pick any of them. Adds an `all_datasets_have_consistent_metadata` test that verifies every built-in variant has a unique kebab-cased name, a `train.parquet` URL under `assets.zilliz.com/benchmark/`, a dimension above the TurboQuant minimum (128), and cosine metric (v1 only wires cosine). L2 / IP datasets (SIFT, GIST, LAION) remain out of scope until vortex-tensor gains an L2-distance scalar function. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…generator Fixes two issues discovered while running the benchmark end-to-end against a real parquet file in a no-network environment: 1. `parquet_to_vortex_chunks` produces `ListView` arrays for list columns by default, so `list_to_vector_ext` was rejecting them with "expects a List array, got list(f32)". Added a fast path that applies `recursive_list_from_list_view` before the offset-stride validation kicks in. 2. `list_to_vector_ext` returns a `ChunkedArray<Extension<Vector>>` when given a chunked input (the normal shape after parquet ingest), but downstream `extract_query_row` expects a single non-chunked `Extension<Vector>`. `prepare_dataset` now calls `wrapped.execute::<ExtensionArray>(&mut ctx)` to materialize the chunked wrapper into a single unchunked extension array. Also adds a new `gen_synthetic_dataset` helper binary that writes a VectorDBBench-shape parquet file (`id: int64`, `emb: list<float32>`, zstd-compressed) with deterministic xorshift-generated vectors. This is useful for: - Local dev runs of `vector-search-bench` without needing outbound network access to `assets.zilliz.com`. - Populating the `vortex-bench/data/<name>/<name>.parquet` cache path so the benchmark's idempotent download step skips the HTTP fetch. - Sanity-checking the benchmark pipeline in CI environments that block outbound HTTPS to third-party buckets. Verified end-to-end against 5000×768 synthetic Cohere-small and 5000×1536 synthetic OpenAI-small inputs — every measurement pipeline (size, decode, cosine, filter, recall@10) produces valid gh-json and TurboQuant recall@10 lands at 0.91 on both fixtures, consistent with the loose recall floor the unit test enforces. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…press measurements
A thorough audit of the benchmark's outputs found three issues that together
made the results hard to interpret:
1. **"Size" was lying.** The old `measure_on_disk_size` helper wrote every
variant through `SESSION.write_options()`, which uses
`WriteStrategyBuilder::default()` — and that default applies BtrBlocks
compression regardless of the in-memory tree. So both
`vortex-uncompressed` and `vortex-default` ended up writing identical
ALP-RD-compressed bytes to disk, reporting the same "size" number even
though the two in-memory trees were different. Switched to reporting
`ArrayRef::nbytes()` of the in-memory variant tree instead — the honest
footprint of what the compute measurements actually operate on.
Running the diagnostic `print_variant_trees` test now shows what each
variant really is:
```
VortexUncompressed nbytes=15,360,000 Extension → FSL → Primitive<f32>
VortexDefault nbytes=13,072,050 Extension → FSL → alprd(f32)
[bitpacked u16/u32 + patches]
VortexTurboQuant nbytes= 5,141,024 L2Denorm → SorfTransform →
FSL → Dict(u8 codes + f32[256])
```
So BtrBlocks really does find a ~15% lossless saving on random f32 via
ALP-RD, and the previously-observed 3.5× cosine slowdown is the real
cost of decoding ALP-RD on the fly inside `CosineSimilarity`.
2. **"Decompress time" for the compressed variants was a no-op.** The old
`decode_full_scan` only executed the Extension shell and the FSL
storage; it never forced the element buffer to materialize. Since FSL
canonical form can keep its elements in their encoded representation,
this reported ~60 ns for `vortex-default` even though reading any value
would actually have to run the ALP-RD unpacking. Renamed to
`decompress_full_scan` and added a final
`fsl.elements().execute::<PrimitiveArray>()` call that forces the lazy
decode to completion. `vortex-default`'s decompress time now shows
~6 ms on 5000×768 rows — the actual ALP-RD cost.
3. **No correctness verification.** The benchmark reported throughput
numbers with no guarantee that the variants were producing the same
answers. A new `verify.rs` module now computes cosine-similarity
scores for one query against every variant *before* timing, compares
them to the uncompressed baseline, and bails the run if any lossless
variant drifts beyond 1e-4 max-abs-diff or any lossy variant drifts
beyond 0.2 max-abs-diff. The `execute_cosine` implementation is
promoted to `pub` and shared between the timing loop and verification
so both paths exercise the exact same expression tree.
Also adds a new `compress time` measurement alongside `decompress time`,
so the dashboard can show the full round-trip cost per variant
(previously we only timed one side). The TurboQuant verification is
emitted as a warning instead of a hard error — the lossy tolerance is a
quality-budget alert, not a correctness invariant.
New unit tests (13 total in vector-search-bench, up from 4):
- `verify::*` — 8 tests covering the `compare_scores`, `verify_scores`,
and `verify_variant` paths including NaN handling and end-to-end
lossless/lossy checks on synthetic 64×128 data.
- `tests::uncompressed_decompress_is_fast` — regression guard ensuring
the uncompressed variant's decompress pass is >5× faster than
TurboQuant's so future refactors can't accidentally make the
"uncompressed" path take the slow path.
- `tests::print_variant_trees` — `#[ignore]` diagnostic test that dumps
`display_tree_encodings_only()` for each variant. This is what was
used to root-cause the size-lying issue.
Verified end-to-end on synthetic 5000×768 Cohere-small:
parquet nbytes=14,511,306 cosine= 2.9 ms correctness=0.00e0
vortex-uncompressed nbytes=15,360,000 cosine= 7.9 ms correctness=0.00e0
vortex-default nbytes=13,072,050 cosine=20.5 ms correctness=0.00e0
vortex-turboquant nbytes= 5,141,024 cosine=59.1 ms correctness=5.18e-3
recall@10 (tq) 0.9150
All lossless variants now produce bit-identical cosine scores (max diff
exactly 0.0) against the uncompressed baseline, confirming the ALP-RD
and hand-rolled parquet paths are correct. TurboQuant's 5.18e-3 drift
is well within the 0.2 lossy tolerance.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
The compute side of the baseline isn't really a parquet implementation — it's a **hand-rolled Rust scalar cosine loop over a flat `Vec<f32>`** that happens to have been decoded from a parquet file via `parquet-rs` / `arrow-rs`. Labeling it as "parquet" in CLI flags, metric names, and docs suggested the baseline was measuring a real parquet-based analytical engine, when in reality it's the minimum cost a Rust programmer could get away with given no query engine. User-facing rename: everywhere the baseline was called "parquet" it now says "handrolled" — the CLI format, the metric labels, the CI matrix entry, the README, the type and function names in the baseline module, and the module filename itself (`parquet_baseline.rs` → `handrolled_baseline.rs`). Unchanged: - `read_parquet_embedding_column` keeps its name because it really does read parquet. Only the baseline-level wrappers take the `handrolled` label. - `target.format` stays `Format::Parquet` — the storage side really is parquet on disk, and dashboards grouping by format should still put the baseline's size/decompress measurements under "parquet". The `CompressionTimingMeasurement::to_json` prefix therefore still reads `"parquet_rs-zstd "`, which honestly describes the parquet reader stack. Only the compute label in the metric `name` string reflects the hand-rolled compute path. The module-level doc comment on `handrolled_baseline.rs` now explicitly frames the baseline as a compute-cost *floor*, not a fair DBMS comparison, and points at DuckDB `list_cosine_similarity` and DataFusion vector UDFs as better "what real users pay for parquet+cosine" baselines for future work. Verified end-to-end against the 5000×768 synthetic cohere-small fixture. All 13 existing tests still pass (the baseline test rename from `parquet_baseline_reads_fsl_column` to `handrolled_baseline_reads_fsl_column` is the only functional change to tests). Correctness verification still reports: handrolled max_abs_diff = 0.000e0 (bit-identical, lossless) vortex-uncompressed max_abs_diff = 0.000e0 vortex-default max_abs_diff = 0.000e0 vortex-turboquant max_abs_diff = 5.18e-3 (within 0.2 lossy tolerance) Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Review caught four real bugs plus a stylistic stray import. All are small, targeted fixes. 1. `build_similarity_search_tree` (vortex-tensor) and `execute_cosine` (vector-search-bench) wrapped `CosineSimilarity::try_new_array` in `vortex_expect`, which panics on the `Err` branch. Both functions' doc comments claimed they *returned* errors on dimension mismatch, so the behavior contradicted the contract: a caller passing a wrong-dimension query vector would crash instead of seeing a clean error. Fixed by replacing `.vortex_expect(...)` with `?`. 2. The progress bar in `main.rs` was initialized with `datasets.len() * args.formats.len()` (which counts the handrolled baseline) but only `inc(1)`'d inside the Vortex-variants loop, so the bar finished at `N-1/N` whenever handrolled was selected. Fixed by incrementing inside the handrolled block and recomputing `total_work` from the actual increment sites. 3. `extract_query_row` blindly called `PrimitiveArray::as_slice::<f32>()` without checking the underlying ptype. The benchmark only ever uses f32 today, but the `Vector` extension type supports f16/f32/f64, so a future caller that accidentally passes an f64 `Vector` would get a mis-typed slice. Fixed by asserting `elements.ptype() == PType::F32` before the cast with a clear error message. While there, simplified the convoluted bounds check (was `row * dim + dim > len * dim`, reduces to `row >= len` without the multiplication). 4. `gen_synthetic_dataset.rs` and the README had shell examples using `--bin gen-synthetic-dataset` with hyphens, but cargo's default bin name is the filename minus extension — i.e., underscores. The example would fail to resolve. Fixed to `gen_synthetic_dataset`. 5. `use std::io::Write;` was stranded at the bottom of `main.rs` as a single-line import after the `main` function. Moved to the top import block for idiomatic style. All 13 vector-search-bench tests still pass. 222 vortex-tensor tests still pass. clippy and rustfmt clean. Public API unchanged — no public-api.lock regen needed. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Cleanup of seven dead-or-suppressed items the review flagged:
1. `QueryLen` type alias (`lib.rs`) — defined but only "used" via
`let _ = QueryLen::default;`, a warning-suppression hack. The doc
comment claimed it made "broadcast semantics obvious at call sites"
but no call site actually referenced it. Both the alias and the
suppression `let` are gone.
2. `async-trait` and `itertools` Cargo dependencies — listed in
`vector-search-bench/Cargo.toml` but not referenced anywhere in the
crate's source tree (probably copied from `compress-bench` when the
crate was bootstrapped). Removed.
3. `VERIFICATION_QUERY_ROW` constant (`verify.rs`) — defined as `0`
and documented as "the row we verify against" but never referenced
from anywhere; `main.rs` passes `prepared.query` directly, and
`prepared.query` is extracted at `DEFAULT_QUERY_ROW` time inside
`prepare_dataset`. Removed.
4. Dead-code-suppression hack in `vortex_bench::conversions::tests`:
// ...suppress the dead-code warning by referencing them in a
// no-op expression.
let _ = (Nullability::NonNullable, PType::I32);
Both imports were never used in the test body. Removed the imports
and the dead let.
5. `let _ = emb_idx;` in `handrolled_baseline::read_parquet_embedding_column`.
Replaced with destructuring via `let (_, emb_field) = ...`.
6. `HandrolledBaselineData::file_size` — populated by `read_parquet_
embedding_column` via a second `File::metadata()` call, but no
caller reads it: the benchmark's "handrolled size" measurement
reads `PreparedDataset::parquet_bytes` (populated once in
`prepare_dataset`). Keeping the duplicated state was a recipe for
drift. Removed the field and its populating code, and added a doc
comment on the struct pointing at `parquet_bytes` as the canonical
file-size source.
All 13 vector-search-bench tests + 16 vortex-bench tests still pass.
clippy / rustfmt clean.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Four related cleanups the review flagged for modularity & clarity:
1. Deleted the duplicate `extract_query_row` helper from
`recall.rs` (30 lines near-identical to the one in `lib.rs`). Both
call sites now go through a single `pub(crate)` helper in `lib.rs`.
The lib.rs helper also gets a fresh doc comment explicitly calling
out its f32-only requirement (landed in the previous commit) so the
deduped version is not just shorter but more correct.
2. Added `verify::verify_and_report_scores`: a score-vector-first
variant of `verify_variant` that takes pre-computed scores as
`&[f32]` instead of an `ArrayRef`. `verify_variant` now delegates
to it. The point is to make both main-loop paths share a single
error-handling / logging / bail-on-lossless codepath:
- Vortex variants compute scores via `compute_cosine_scores(array)`
and call `verify_variant`, which then calls
`verify_and_report_scores` under the hood.
- The handrolled baseline computes scores via `cosine_loop(&[f32])`
and calls `verify_and_report_scores` directly.
Previously `main.rs` hand-rolled the handrolled path's check/log/
bail triplet, duplicating logic in two places that could drift.
3. Dropped the `compress_turboquant` and `build_similarity_search_tree`
wrapper shims from `vortex-tensor/benches/similarity_search_common/
mod.rs`. They existed only to re-expose the public helpers under
local names for bench call-site compatibility. The bench now
`pub use`s them directly, which is one fewer indirection layer.
`compress_default` stays in the bench module because its
`BtrBlocksCompressor` dependency lives in vortex-tensor's
dev-dependencies, not main deps, so lifting it to the public
`vector_search` module would pull `vortex-btrblocks` into
vortex-tensor's main dep graph.
4. Replaced the stale "in future commits" doc comment on
`PreparedDataset::uncompressed` — recall is already wired.
All 13 vector-search-bench + 222 vortex-tensor tests pass. clippy /
rustfmt clean.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Two micro-refactors for correctness quality: 1. `list_to_vector_ext` offset validation walks `ListArrayExt::offset_at` N+1 times (once per row boundary) instead of 2N times (once per start and once per end of every row). Each iteration now carries the previous `end` into the next iteration's `prev_end`, so we only fetch each offset boundary once. For a canonical list produced by `parquet_to_vortex_chunks` the offsets are a `Primitive` array and `offset_at` takes the fast slice-index path, so this halves the offset lookups from ~200K to ~100K on a 100K-row column. The loop body is still O(1) per row either way. 2. `run_timings` now interleaves its decompress / cosine / filter stages inside a single outer loop over iterations, instead of running the three stages as back-to-back sequential loops. The back-to-back shape gives each stage an asymmetric cache profile — the second+ iterations of the cosine stage run on cache lines left behind by the previous *cosine* iteration, while the second+ iterations of the filter stage run on cache lines left behind by the *cosine* stage. Interleaving makes each iteration of each stage see roughly the same cache state, removing the asymmetry. Each stage still gets a fresh `ExecutionCtx` so no cached scalar-fn state leaks between stages within a single iteration. The doc comment on `run_timings` now spells out why interleaving matters. All 13 vector-search-bench + 16 vortex-bench tests still pass. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…bail
Two review items related to testing coverage and trait-contract
honesty:
1. `list_to_vector_ext` has five unit tests but none of them exercised
the `ListView` → `List` fast path, even though that path is the one
the benchmark actually hits after `parquet_to_vortex_chunks` (which
canonicalizes list columns into `ListView`, not `List`). Added a
`list_view_input_is_rewrapped_as_vector_extension` test with a
`list_view_f32(dim, rows)` helper that constructs a
`ListViewArray<f32>` with uniform-stride offsets and sizes, then
asserts `list_to_vector_ext` returns an `Extension<Vector>` whose
storage is a `FixedSizeList<f32, dim>`. A regression to the
ListView handling would now be caught by unit tests rather than by
the end-to-end benchmark smoke test.
`vortex-bench` tests are now 17 (was 16).
2. `VectorDataset::to_vortex_array` previously implemented the
`Dataset` trait by downloading the parquet file, writing it through
the default Vortex write path, and reading it back. The returned
array was a `StructArray` with `{ id: int64, emb: list<float> }` —
**not** the `Extension<Vector>(FixedSizeList<...>)` shape the
vector-search benchmark actually operates on. The benchmark worked
around this by bypassing `to_vortex_array` entirely and rebuilding
the Vector extension via `parquet_to_vortex_chunks` +
`list_to_vector_ext`. But that left the trait method as a trap for
future callers who expect it to return the same shape the
benchmark measures.
Replaced the body with `bail!` and a message that points callers at
the parquet path + list_to_vector_ext sequence. This makes the
contract unambiguous: either use the parquet path, or get a clear
error instead of a semantically-wrong array.
The unused imports (`tokio::fs::File`, `ArrayStreamExt`,
`OpenOptionsSessionExt`, `WriteOptionsSessionExt`, `SESSION`,
`parquet_to_vortex_chunks`, `idempotent_async`, `IntoArray`) are
dropped too.
13 vector-search-bench + 17 vortex-bench tests pass.
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Five small style-only cleanups identified during the review pass. Pure documentation + formatting — no behavior change. 1. `list_to_vector_ext` had one error message missing the `"list_to_vector_ext: ..."` prefix every other error used (`"list_to_vector_ext expects a List array ..."` → `"list_to_vector_ext: expected a List array ..."`). The test that asserts the substring is updated too. 2. `handrolled_baseline::filter_loop` now has an explicit comment flagging the `>` comparison as the *strict* greater-than that must stay in sync with the Vortex-side `Operator::Gt` in `build_similarity_search_tree`. A divergence between the two would show up as a max-abs-diff correctness failure for the lossless variants, but the comment makes the invariant explicit so the next person to touch either side thinks about it. 3. `gen_synthetic_dataset.rs` had three magic-number constants in the per-element value computation (`0.00013`, `0.00007`, `0.25`, plus `1.0/32768.0` for the random scale). Promoted them to named constants (`POS_FREQ_ROW`, `POS_FREQ_COL`, `POS_AMPLITUDE`, `RAND_SCALE`) with a comment explaining the intent: the sinusoid mixes the `(row, col)` position so vectors stay distinct even at low bit widths after quantization, and the frequency constants are chosen small and coprime to avoid short-period aliasing over the 100K-row × 1536-col domain. 4. `gen_synthetic_dataset.rs`'s module doc said "bit-identical across runs". That overpromises: the generator uses `f32::sin`, whose last few ULPs are libm/CPU-dependent. Changed to "deterministic... on the same machine" with an explicit note about cross-machine variance. Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…ests A second pass over the branch caught several items I missed in the first review cleanup. All code-style and correctness-level, no behavior changes beyond the missed f32 assertion. ## Missed bug `verify::compute_cosine_scores` had the same blind `PrimitiveArray::as_slice::<f32>()` call I already fixed in `extract_query_row` last round — I only patched the lib.rs copy, not the verify.rs one. Added the `scores.ptype() != PType::F32` check with a clear error message before the slice cast. Matches the doc contract that the function returns `Vec<f32>` and only supports f32 Vector columns today. ## CLAUDE.md style violations CLAUDE.md's import rule: "All imports must be at the top of the module, never inside functions. The only exception is `#[cfg(test)]` blocks, where imports should be at the top of the test module." I had function-scoped `use` statements sprinkled across five functions in `lib.rs` and inside an `extract_row_zero` test helper in `verify.rs`, all against the guidance. Moved every function-scoped `use` into the top-level import block: - `prepare_dataset`: 1 import moved - `extract_emb_column`: 5 imports moved - `extract_query_row`: 4 imports moved - `decompress_full_scan`: 3 imports moved - `execute_cosine`: 2 imports moved - `PreparedDataset::dim`: 2 fully-qualified `vortex::dtype::DType::...` paths replaced with plain `DType::...` from top-level import ## Test duplication `verify::tests::extract_row_zero` was a 10-line reimplementation of `extract_query_row` inside the test module. Deleted it and made `verify::tests::make_prepared` call `extract_query_row` directly, which also sidestep the need for function-scoped imports in the test. Same rework applied to `lib::tests::prepare_variant_produces_...` (which had its own inline f32 extraction) and `recall::tests::...` (which was building a `PreparedDataset` with an empty query). A new `test_prepared` helper in `lib.rs` tests replaces the three separate inline constructions with a single shared builder. ## New test coverage Added 8 new tests across the three modules: - `vortex-bench::conversions::all_invalid_list_validity_is_rejected`: builds a `List<f32>` with `Validity::AllInvalid` and confirms the rejection path fires. Previously unexercised. - `vector-search-bench::tests::extract_query_row_returns_the_right_slice`: verifies row 0 extraction is idempotent and different rows differ. - `vector-search-bench::tests::extract_query_row_rejects_out_of_bounds_row`: verifies the `row >= len` bounds check. - `vector-search-bench::verify::tests::verify_scores_fails_on_nan_in_baseline`: symmetric NaN case — the existing `verify_scores_fails_on_nan` test only covered the variant-side NaN. - `verify::tests::verify_and_report_scores_is_ok_for_identical_inputs`: direct test for the helper that main.rs's handrolled path uses. - `verify::tests::verify_and_report_scores_bails_for_lossless_mismatch`: lossless failures must hard-error with a message that includes the variant name. - `verify::tests::verify_and_report_scores_warns_for_lossy_mismatch_without_bailing`: lossy failures must NOT bail — they log a warning and return the failing report so the caller can still emit it as a measurement. - `recall::tests::top_k_indices_handles_nan_without_panicking`: NaN scores used to sort via `partial_cmp(...).unwrap_or(Equal)`, which produced arbitrary orderings. Switched to `f32::total_cmp` for a proper total order; test confirms non-NaN scores still rank in descending order and the sort doesn't panic. ## Minor - Stale comment on `main.rs::total_work` calculation: referenced `args.formats.len()` but the expression uses a different formula after commit 74aaf5a. Fixed the comment. - `top_k_indices` switched from `scores[b].partial_cmp(&scores[a]).unwrap_or(Ordering::Equal)` to `scores[b].total_cmp(&scores[a])`. Added a doc comment on the function explaining the NaN-safety rationale. ## Test counts after this commit - vortex-bench: 18 tests (was 17, +1 for `all_invalid_list_validity`) - vector-search-bench: 20 tests passing + 1 ignored (was 13 + 1, +7 for the new coverage above) - vortex-tensor: 222 tests (unchanged) Clippy / rustfmt clean. End-to-end smoke run on synthetic cohere-small still reports: - all four correctness-max-diff values coherent (0.000e0 for the three lossless variants, 5.18e-3 for TurboQuant) - Recall@10 = 0.91 for TurboQuant Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
… reads
Three targeted fixes from the third review pass.
## 1. Drop vestigial `session: &VortexSession` parameters
The benchmark always passed `&vortex_bench::SESSION` to every function
that took a session parameter, so the parameter was plumbing with zero
value. More importantly it was *inconsistent* — some internal helpers
(like `extract_query_row`) already hardcoded the global, so the API
suggested a custom session was respected end-to-end when it wasn't.
Removed the `session` parameter from:
- `lib::prepare_variant(prepared, variant)`
- `lib::run_timings(variant_array, query, iterations)`
- `verify::compute_cosine_scores(data, query)`
- `verify::verify_variant(name, array, query, baseline, kind)`
- `recall::measure_recall_at_k(uncompressed, compressed, num_queries, top_k)`
Each function now calls `vortex_bench::SESSION.create_execution_ctx()`
directly where needed. Doc comments note the "uses the global SESSION"
contract explicitly. `execute_cosine`, `decompress_full_scan`, and
`execute_filter` still take `&mut ExecutionCtx` because they're the
low-level primitives that `run_timings` constructs fresh contexts for
on each iteration.
All call sites in `main.rs` and the test modules updated.
## 2. Fix the handrolled double-parquet-read
`main.rs` previously did:
1. Read parquet → run cosine → verify against baseline → bail on fail
2. Call `run_handrolled_baseline_timings`, which reads parquet N more
times inside its own loop
Total: N+1 parquet reads per dataset per run. The extra read was
wasted work — `cosine_loop` is deterministic, so the scores from
iteration N of the timing loop are bit-identical to a one-shot
pre-timing computation.
Restructured `run_handrolled_baseline_timings` to return a new
`HandrolledBaselineResult { timings, last_scores }` where `last_scores`
is the `Vec<f32>` produced by the final iteration. `main.rs` now:
1. Call `run_handrolled_baseline_timings` → gets timings + last_scores
2. Call `verify_and_report_scores` with those last_scores vs baseline
→ bails on lossless mismatch
Total: N parquet reads per dataset per run. One less read, and the
verification now uses the exact same scores the timing loop computed
rather than a separate pass.
Order reversal note: verification now runs *after* timing instead of
before. The handrolled loop is cheap enough (~5 ms cosine + ~40 ms
parquet read per iter) that running it once even for a broken variant
is acceptable, and the correctness bail still prevents the run from
reporting wrong-answer numbers.
`run_handrolled_baseline_timings` now asserts `iterations >= 1` up
front (previously relied on `last_scores` being populated after the
loop).
## 3. Nullable element type test
`list_to_vector_ext` has a rejection path for `List<f32?>` — element
dtype is a nullable f32 (even when every value is present). Added
`conversions::tests::nullable_element_dtype_is_rejected` to exercise
this path. Passing `Validity::AllValid` to `PrimitiveArray::new::<f32>`
produces the nullable dtype the rejection path is looking for.
## Also: housekeeping
- Dropped `use vortex_bench::SESSION` from the `lib.rs` inner test
module (was left over from the session-parameter removal).
- Replaced two fully-qualified `vortex::error::vortex_panic!` calls in
`PreparedDataset::dim()` with a plain `vortex_panic!` sourced from a
top-level import.
- Refreshed the doc comment on `PreparedDataset` — it previously
described the struct as carrying an "execution session / context",
which is no longer accurate.
- Added two new tests for `run_handrolled_baseline_timings`:
- `run_handrolled_baseline_timings_returns_last_iteration_scores`:
verifies the `last_scores` contract on a 3-row fixture.
- `run_handrolled_baseline_timings_panics_on_zero_iterations`:
regression guard for the iterations>=1 assert.
## Test counts
- vortex-bench: 19 tests (was 18, +1 for nullable_element_dtype)
- vector-search-bench: 22 passing + 1 ignored (was 20+1, +2 for the
handrolled timing tests)
- vortex-tensor: 222 tests (unchanged)
- Total: 263 passing, 3 ignored
clippy / rustfmt clean. End-to-end smoke run still coherent:
- correctness-max-diff = 0.000e0 for all three lossless variants
- correctness-max-diff = 5.18e-3 for TurboQuant (within 0.2 tolerance)
- Recall@10 for TurboQuant = 0.9267
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
36 tasks
Contributor
Author
|
claude code web fell over so this doesnt even compile |
Contributor
Author
|
Superseded by #7399 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Tracking issue: #7297
Adds a
vector-search-benchcrate similar to VectorDBBench.The benchmark brute-forces cosine similarity search over public VectorDBBench datasets (Cohere, OpenAI, etc).
Since Vortex is not a database, we do not measure things like vector inserts and deletes. instead we just measure storage size and throughput for four targets:
&[f32]Every variant goes through a correctness check against the uncompressed scan before timing.
Not sure if this makes sense to have on a PR benchmark yet since this can only happen on a very specific array tree that will have a very specific optimized implementation.
Testing
N/A