What's new in pgvector v0.7.0

2024-05-02

8 minute read

Real-world embedding datasets often contain redundancy buried within the vector space. For example, when vectors cluster around certain central points in a multidimensional space, that clustering reveals an exploitable structure. By reducing this redundancy, we can save memory and improve performance with minimal impact on precision. pgvector 0.7.0 introduces several ways to exploit this idea:

  • float16 vector representation
  • sparse vectors
  • bit vectors

Float16 vectors

An HNSW index is most efficient when it fits into shared memory and its pages are not evicted by concurrent operations, since Postgres keeps pages in shared memory to minimize costly I/O. Historically, pgvector supported only 32-bit float vectors. Version 0.7.0 introduces 16-bit float HNSW indexes, which consume exactly half the memory. Halving the memory footprint lets a dataset grow to roughly twice the size before the index no longer fits in memory.

There are two options when using float16 vectors:

  • The index uses float16, while the underlying table continues to use float32.
  • The index and the underlying table both use float16. This option uses 50% as much disk space, in addition to requiring 50% less shared memory to operate efficiently. Performance improves further because more vectors fit in a single Postgres page and fewer pages are evicted due to concurrent operations.
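
The first option can be sketched with an expression index. This assumes the float32 table is the embedding_full table used in this post and that it has an id column (an assumption; only its vector column is shown here). The query repeats the same cast expression so the planner can match it to the index.

```sql
-- Option 1: the table keeps float32 vectors; only the HNSW index stores float16.
create index on embedding_full
  using hnsw ((vector::halfvec(1536)) halfvec_l2_ops);

-- Nearest neighbours of one row (id = 1 is a hypothetical example row).
select id
from embedding_full
where id != 1
order by
  vector::halfvec(1536) <-> (select vector::halfvec(1536) from embedding_full where id = 1)
limit 5;
```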

To copy an existing float32 embedding table into a float16 one:


create table embedding_half (
  id serial,
  vector halfvec(1536),
  primary key (id)
);

insert into embedding_half (vector)
select
  vector::halfvec(1536)
from
  embedding_full;

With 900K OpenAI 1536-dimensional vectors, the table size is 3.5GB. For comparison, embedding_full required 7GB.

Then we can build a float16 HNSW index:


create index on embedding_half using hnsw (vector halfvec_l2_ops);

To test the performance of index creation, we chose a c7g.metal instance with 128GB of memory and the following parameters:


shared_buffers = 50000MB
maintenance_work_mem = 30000MB
max_parallel_maintenance_workers = {0-63}
wal_level = minimal
max_wal_size = 10GB
autovacuum = off
full_page_writes = off
fsync = off
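
These settings are tuned for a benchmark: turning off fsync, full_page_writes, and autovacuum trades durability and maintenance for speed, so they are not suited to a production database. As a rough sketch, the two build-related settings can also be set per session before creating an index. The values below are placeholders, not recommendations.

```sql
-- Memory available to the index build in this session (placeholder value).
set maintenance_work_mem = '8GB';

-- Extra parallel workers for the build (placeholder value);
-- the effective number is also bounded by max_parallel_workers.
set max_parallel_maintenance_workers = 7;

create index on embedding_half using hnsw (vector halfvec_l2_ops);
```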

HNSW build times recently saw a step-change improvement in the 0.6.2 release, which introduced parallel builds. With the halfvec (float16) type, 0.7.0 improves build times by a further 30%.

Note that float16 vector arithmetic on the ARM architecture performs identically to float32, so serial build times (with one parallel worker) have not improved. However, parallel builds are significantly faster due to better page and I/O utilization. Also note that this test doesn't use pre-warming or other artificial enhancements.

Both heap and HNSW relations for float16 occupy only half of the space compared to the previous float32 ones.
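
To check this on your own data, the sizes can be read with Postgres's built-in size functions. This is a minimal sketch assuming the embedding_half table from this post; run the same queries against embedding_full to compare.

```sql
-- Heap size including TOAST, excluding indexes.
select pg_size_pretty(pg_table_size('embedding_half')) as table_size;

-- Size of each index on the table (including the primary key index).
select
  indexrelid::regclass as index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_index
where indrelid = 'embedding_half'::regclass;
```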

There is a proposal to speed up float16 builds even further by using SVE intrinsics on the ARM architecture (see: https://github.com/pgvector/pgvector/pull/536).

Jonathan Katz measured HNSW performance on an r7gd.16xlarge (64 vCPU, 512GiB RAM), and his results are even better. For float16, HNSW build times are up to 3x faster. For select performance, ANN benchmark results show that recall is essentially unchanged at the lower bit width, and queries per second (QPS) are similar when the index fits in memory. But when real-world queries require I/O, or some HNSW pages are evicted from memory due to concurrent connections, there would be a meaningful difference. Because only half the memory is needed to hold the same HNSW index, the cost for the same performance and precision is also significantly lower.

Metric                    Vector / Vector   Vector / HalfVec
Index size (MB)                      7734               3867
Index build time (s)                  264                 90
Recall @ ef_search=10               0.819              0.809
QPS @ ef_search=10                   1231               1219
Recall @ ef_search=40               0.945              0.945
QPS @ ef_search=40                    627                642
Recall @ ef_search=200              0.987              0.987
QPS @ ef_search=200                   191                190

For full results on the different datasets, see this GitHub issue.
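
The ef_search values in the table correspond to pgvector's hnsw.ef_search setting, which controls the size of the candidate list during a search (higher values trade speed for recall). A minimal sketch of changing it for a single query, assuming the embedding_half table from this post and a hypothetical row with id = 1 as the query vector:

```sql
begin;
-- Applies only inside this transaction; pgvector's default is 40.
set local hnsw.ef_search = 200;

select id
from embedding_half
where id != 1
order by vector <-> (select vector from embedding_half where id = 1)
limit 10;
commit;
```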

Sparse vectors

If vectors contain many zero components, then a sparse vector representation can save significant storage space. For example, to populate sparse vectors:


create table embedding_sparse (
  id serial,
  vector sparsevec(1536),
  primary key (id)
);

insert into embedding_sparse (vector) values ('{1:0.1,3:0.2,5:0.3}/1536'), ('{1:0.4,3:0.5,5:0.6}/1536');

The sparse vector only consumes storage space for the non-zero components. In this case, that's 3 values in a 1536-dimensional vector.

Note the new vector syntax {1:3,3:1,5:2}/1536 for the sparse vector representation in:


select * from embedding_sparse order by vector <-> '{1:3,3:1,5:2}/1536' limit 5;
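
Sparse vectors can be indexed as well. The following is a sketch assuming the embedding_sparse table above; sparsevec_l2_ops is pgvector's L2 operator class for the sparsevec type, and the cast in the second statement converts a small dense vector to its sparse form.

```sql
-- HNSW index over sparse vectors using L2 distance.
create index on embedding_sparse using hnsw (vector sparsevec_l2_ops);

-- Convert a dense vector to the sparse representation (indices start at 1).
select '[0,0.1,0,0.2,0]'::vector::sparsevec;
```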

Bit vectors

Using binary quantization, we can represent a float vector as a vector in binary space. This reduces storage size dramatically and is intended as a way to quickly “pre-select” from a dataset before performing an additional search within the subset. When properly parameterized, the secondary select can be very fast, even without an index.


-- bit(3) matches the 3-dimensional example vector; use your own vector dimension,
-- and keep the indexed expression identical to the one used in queries
create index on embedding
  using hnsw ((binary_quantize(vector)::bit(3)) bit_hamming_ops);

select
  *
from
  embedding
order by
  binary_quantize(vector)::bit(3) <~> binary_quantize('[1,-2,3]')
limit 5;
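
To make the quantization step concrete, here is a small illustration of what binary_quantize and the Hamming operator <~> return for 3-dimensional inputs. It uses only literal values, so no table is needed.

```sql
-- Each positive component becomes 1, everything else becomes 0.
select binary_quantize('[1,-2,3]'::vector);   -- 101

-- Hamming distance counts the bit positions that differ.
select '101'::bit(3) <~> '100'::bit(3);       -- 1
```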

To use a binary quantized HNSW index to pre-select from a larger dataset and then make a fast selection from the resulting subset, without an index:


select * from (
  select
    *
  from
    embedding
  order by
    binary_quantize(vector)::bit(3) <~> binary_quantize('[1,-2,3]')
  limit 20
) as candidates -- Postgres versions before 16 require an alias on a subquery in FROM
order by
  vector <=> '[1,-2,3]'
limit 5;

This approach allows building a small and fast HNSW index for select, insert, or update operations while still having fast vector search. The exact configuration of the limit clauses is data dependent, so you’ll want to experiment with the sub-select size and the number of final results directly on your own dataset.

New distance functions

pgvector 0.7.0 also added support for the L1 distance operator <+>.
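
As a quick illustration, L1 (taxicab) distance is the sum of absolute component differences. The first statement uses literal vectors only; the second assumes a hypothetical items table with a 3-dimensional embedding column.

```sql
-- |1-3| + |2-1| + |3-2| = 4
select '[1,2,3]'::vector <+> '[3,1,2]'::vector as l1_distance;

-- Nearest neighbours by L1 distance (items and embedding are hypothetical names).
select * from items order by embedding <+> '[3,1,2]' limit 5;
```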

And new distance types for indexing:

L1 distance - added in 0.7.0


create index on items using hnsw (embedding vector_l1_ops);

Hamming distance - added in 0.7.0


create index on items using hnsw (embedding bit_hamming_ops);

Jaccard distance - added in 0.7.0


create index on items using hnsw (embedding bit_jaccard_ops);
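
At query time, the bit operator classes pair with the <~> (Hamming) and <%> (Jaccard) operators. A short sketch, assuming a hypothetical items table whose embedding column has type bit(3); an index is only considered when the query uses the operator that matches its operator class.

```sql
-- Hamming distance: number of differing bits.
select * from items order by embedding <~> '101' limit 5;

-- Jaccard distance: 1 - (bits set in both / bits set in either).
select * from items order by embedding <%> '101' limit 5;
```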

Conclusion

Over the last year pgvector has had significant development in both functionality and performance, including HNSW indexes, parallel builds, and many other options. With the introduction of half vectors (float16), sparse vectors, and bit vectors, we're now seeing over 100x speedup compared to one year ago.

For a more complete comparison of pgvector performance over the last year, check out this post by Jonathan Katz.

Using v0.7.0 in Supabase

All new projects ship with pgvector v0.7.0 (or later). Be sure to enable the extension if you haven't already:


create extension if not exists vector
with
  schema extensions;

If you are unsure which version of pgvector your project is using, search for vector on the Extensions page. If you are using a previous version, you can upgrade by navigating to the Service Versions section on the Infrastructure page and upgrading your Postgres version to 15.1.1.47 or later.
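
The installed version can also be checked from SQL. This reads the standard Postgres pg_extension catalog and returns no rows if the extension has not been enabled yet.

```sql
select extversion
from pg_extension
where extname = 'vector';
```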
