
UUID V4 Benchmark

12 Jan 2025

UUID V4 is one of several kinds of UUID, used as an identifier for its randomness and uniqueness. In this post I will show benchmarks of UUID V4 generation in several programming languages. This microbenchmark measures the time spent generating 1 million UUIDs. The code is simple: loop 1 million times and generate a UUID on each iteration. I use the standard benchmarking tools of each language, except for C.
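As a quick aside on what is actually being generated: a v4 UUID is 128 bits, of which 122 are random and six are fixed (the version nibble and the variant bits). A minimal Python check, just to illustrate the format:

```python
import uuid

# A v4 UUID is 128 bits: 122 random bits plus a fixed version nibble
# (0100 = 4) and two fixed variant bits (10), per RFC 4122.
u = uuid.uuid4()
assert u.version == 4              # version field is always 4
assert u.variant == uuid.RFC_4122  # variant bits mark the RFC 4122 layout
print(u)  # random each run, e.g. 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
```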

The machine spec for running this benchmark is:

OS: macOS 13.6 x86_64
Host: MacBookPro16,2
CPU: Intel i5-1038NG7 (8) @ 2.00GHz
Memory: 16GB 

C#

dotnet version: 9

Benchmarked with BenchmarkDotNet v0.13.12.

The benchmark setting is 10 warmup iterations and 10 measurement iterations.

...
[MemoryDiagnoser]
[WarmupCount(10)]
[IterationCount(10)]
public class UUIDBenchmark
{
    private readonly Consumer consumer = new Consumer();

    [Benchmark]
    public void NewGuid()
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            consumer.Consume(Guid.NewGuid());
        }
    }
}
...

Result: 280.4ms

BenchmarkDotNet v0.13.12, macOS 15.2 (24C101) [Darwin 24.2.0]
Intel Core i5-1038NG7 CPU 2.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 9.0.101
  [Host]     : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2
  Job-TVBFQK : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX2

IterationCount=10  WarmupCount=10

| Method  | Mean     | Error    | StdDev   | Allocated |
|-------- |---------:|---------:|---------:|----------:|
| NewGuid | 280.4 ms | 18.20 ms | 12.04 ms |     368 B |

Java

OpenJDK versions 11, 17, 21, and 23.

Benchmarked with JMH 1.37

The benchmark setting is 10 warmup iterations and 10 measurement iterations.

...
    @Param({"1000000"})
    private int iterations;

    @Benchmark
    public void benchmarkUUIDGeneration(Blackhole blackhole) {
        for (int i = 0; i < iterations; i++) {
            blackhole.consume(UUID.randomUUID());
        }
    }
...

Result:

Java 11: 230.76ms

Benchmark                              (iterations)  Mode  Cnt    Score   Error  Units
UUIDBenchmark.benchmarkUUIDGeneration       1000000  avgt   10  230.765 ± 1.495  ms/op

Java 17: 231.51ms

Benchmark                              (iterations)  Mode  Cnt    Score   Error  Units
UUIDBenchmark.benchmarkUUIDGeneration       1000000  avgt   10  231.511 ± 3.102  ms/op

Java 21: 238.43ms

Benchmark                              (iterations)  Mode  Cnt    Score    Error  Units
UUIDBenchmark.benchmarkUUIDGeneration       1000000  avgt   10  238.435 ± 14.640  ms/op

Java 23: 227.97ms

Benchmark                              (iterations)  Mode  Cnt    Score   Error  Units
UUIDBenchmark.benchmarkUUIDGeneration       1000000  avgt   10  227.973 ± 4.130  ms/op

All four versions are close, clustering in the 228–238ms range; around 232ms on average.

Go

Go versions 1.22 and 1.23.

Benchmarked using the built-in go test -bench tool.

The benchmark setting is 10 iterations.

As far as I know, there is no warm-up phase in Go's built-in benchmarking tool.

Libraries used: github.com/google/uuid, github.com/rogpeppe/fastuuid, and github.com/satori/go.uuid.

...
const iterations = 1_000_000

func BenchmarkGoogleUUID(b *testing.B) {
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		for i := 0; i < iterations; i++ {
			ret, err := googleUUID.NewRandom()
			if err != nil {
				panic(err)
			}
			runtime.KeepAlive(ret)
		}
	}
}

func BenchmarkFastUUIDRandom(b *testing.B) {
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		for i := 0; i < iterations; i++ {
			g, err := fastUUID.NewGenerator()
			if err != nil {
				panic(err)
			}
			runtime.KeepAlive(g.Hex128())
		}
	}
}

func BenchmarkFastUUIDNotRandom(b *testing.B) {
	b.ResetTimer()

	for n := 0; n < b.N; n++ {
		g, err := fastUUID.NewGenerator()
		if err != nil {
			panic(err)
		}

		for i := 0; i < iterations; i++ {
			runtime.KeepAlive(g.Hex128())
		}
	}
}

func BenchmarkSatoriUUID(b *testing.B) {
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		for i := 0; i < iterations; i++ {
			runtime.KeepAlive(satoriUUID.NewV4())
		}
	}
}
...

Note

BenchmarkFastUUIDNotRandom is not random because it only generates the next UUID in a sequence, which differs from the previous one in only 8 of its bytes. It's faster, but not secure.
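To see why such a generator is fast but unsafe, here is a hypothetical Python sketch of the same idea (this is not fastuuid's actual implementation): pay for randomness once at construction, then just increment a counter.

```python
import os

class CounterUUIDGen:
    """Illustrative sketch only (not fastuuid's real code):
    seed 128 random bits once, then increment for every new ID."""
    def __init__(self):
        # One expensive request to the OS RNG, at construction time.
        self._state = int.from_bytes(os.urandom(16), "big")

    def next_hex(self):
        # Each subsequent ID is just a cheap in-process increment.
        self._state = (self._state + 1) % (1 << 128)
        return self._state.to_bytes(16, "big").hex()

g = CounterUUIDGen()
a, b = g.next_hex(), g.next_hex()
# Successive IDs are consecutive integers: anyone who sees one ID can
# predict the next, which is why this is fast but not a secure identifier.
assert (int(b, 16) - int(a, 16)) % (1 << 128) == 1
```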

Result:

Go 1.22.0:

BenchmarkGoogleUUID-8                 10        1280294953 ns/op        16000116 B/op    1000001 allocs/op
BenchmarkFastUUIDRandom-8             10        1340914057 ns/op        80000519 B/op    2000005 allocs/op
BenchmarkFastUUIDNotRandom-8          10          49942936 ns/op        48000128 B/op    1000002 allocs/op
BenchmarkSatoriUUID-8                 10        1301521588 ns/op        16000057 B/op    1000000 allocs/op

Go 1.23.4:

BenchmarkGoogleUUID-8                 10         412086926 ns/op        16000100 B/op    1000001 allocs/op
BenchmarkFastUUIDRandom-8             10         485570965 ns/op        80000862 B/op    2000004 allocs/op
BenchmarkFastUUIDNotRandom-8          10          51265116 ns/op        48000329 B/op    1000004 allocs/op
BenchmarkSatoriUUID-8                 10         415979198 ns/op        16000096 B/op    1000001 allocs/op

JavaScript/NodeJS

NodeJS version 23.4.0.

Benchmarked with the benchmark package v2.1.4.

Libraries tested: the built-in crypto module and uuid 9.0.1.

const Benchmark = require('benchmark');
const crypto = require('node:crypto'); // also available as a global in recent Node versions
// const { v4: uuidv4 } = require('uuid'); /// for uuid

const suite = new Benchmark.Suite();

let consume;
suite
  .add('Generating UUID V4 1 million times', function () {
    for (let i = 0; i < 1_000_000; i++) {
      consume = crypto.randomUUID();
      // consume = uuidv4(); /// for uuid
    }
  }, {
    initCount: 10,
    minSamples: 10,
    maxTime: 10
  })
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .on('complete', function () {
    const benchmark = this[0];
    const meanSeconds = benchmark.stats.mean;
    const meanMilliseconds = meanSeconds * 1000;
    console.log(`Average time: ${meanMilliseconds.toFixed(3)} ms`);
  })
  .run({ async: false });

I use let consume; to prevent the runtime from optimizing away the unused values returned by the UUID computation. Async is turned off, just as in the other benchmarks.

Result:

Native crypto: 188.09ms

Generating UUID V4 1 million times x 5.32 ops/sec ±2.73% (14 runs sampled)
Average time: 188.099 ms

uuid: 157.56ms

Generating UUID V4 1 million times x 6.35 ops/sec ±1.67% (15 runs sampled)
Average time: 157.564 ms

Ruby

Ruby version 3.2.2.

Benchmarked using benchmark-ips.

require 'securerandom'
require 'benchmark/ips'

$consume = nil

def generate_uuids
    for i in 1..1_000_000 do
        $consume = SecureRandom.uuid
    end
end

Benchmark.ips do |bench|
    bench.config(
        iterations: 10,
        time: 10, 
        warmup: 10,
    )
    
    bench.report("Generate 1 million UUIDs") { generate_uuids }

    bench.compare!
end

Just like in JS, I use that global variable to prevent the runtime from optimizing away the unused values returned by the UUID computation.

Result:

Generate 1 million UUIDs
                          0.449 (± 0.0%) i/s     (2.23 s/i) -      5.000 in  11.153187s
Generate 1 million UUIDs
                          0.453 (± 0.0%) i/s     (2.21 s/i) -      5.000 in  11.030155s
Generate 1 million UUIDs
                          0.434 (± 0.0%) i/s     (2.31 s/i) -      5.000 in  11.559152s
Generate 1 million UUIDs
                          0.425 (± 0.0%) i/s     (2.35 s/i) -      5.000 in  11.759262s
Generate 1 million UUIDs
                          0.423 (± 0.0%) i/s     (2.37 s/i) -      5.000 in  11.864912s
Generate 1 million UUIDs
                          0.420 (± 0.0%) i/s     (2.38 s/i) -      5.000 in  11.971267s
Generate 1 million UUIDs
                          0.464 (± 0.0%) i/s     (2.16 s/i) -      5.000 in  10.783979s
Generate 1 million UUIDs
                          0.457 (± 0.0%) i/s     (2.19 s/i) -      5.000 in  10.948187s
Generate 1 million UUIDs
                          0.458 (± 0.0%) i/s     (2.18 s/i) -      5.000 in  10.922634s
Generate 1 million UUIDs
                          0.394 (± 0.0%) i/s     (2.54 s/i) -      5.000 in  12.937615s

Average: 2.3s

Python

Python version 3.13.1

Benchmarked using timeit

import timeit
import uuid

consume = None
def generate_uuids():
    global consume  # without this, the assignment below would create a local variable
    for x in range(1_000_000):
        consume = uuid.uuid4()

timer = timeit.Timer(
    stmt='generate_uuids()',
    setup='from __main__ import generate_uuids'
)

timer.timeit(number=1)  # warm-up run, not included in the measurements

results = timer.repeat(repeat=10, number=1)
min_time = min(results)
max_time = max(results)
mean_time = sum(results) / len(results)

print(f"Min: {min_time:.6f} seconds")
print(f"Max: {max_time:.6f} seconds")
print(f"Mean: {mean_time:.6f} seconds")
print(f"All times: {[f'{r:.6f}' for r in results]} seconds")

Result: 2.6s

Min: 2.582918 seconds
Max: 2.712734 seconds
Mean: 2.622747 seconds
All times: ['2.712734', '2.627278', '2.632984', '2.588112', '2.599262', '2.634081', '2.607896', '2.596771', '2.582918', '2.645434'] seconds

Rust

Cargo version 1.83.0.

Benchmarked with Criterion 0.5.

The library used is uuid, benchmarked with both the default and the fast-rng feature.

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use uuid::Uuid;

fn generate_uuid_v4_benchmark(c: &mut Criterion) {
    c.bench_function("generate_uuid_v4", |b| {
        b.iter(|| {
            for _ in 0..1_000_000 {
                black_box(Uuid::new_v4());
            }
        });
    });
}

criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(10);
    targets = generate_uuid_v4_benchmark
}
criterion_main!(benches);

Result:

  • Default feature (using the OS RNG): 1.06s
generate_uuid_v4        time:   [1.0541 s 1.0618 s 1.0715 s]
                        change: [+5519.4% +5623.5% +5715.8%] (p = 0.00 < 0.05)
  • fast-rng feature: 17.81ms
generate_uuid_v4        time:   [17.681 ms 17.812 ms 17.896 ms]
                        change: [-5.9214% -1.9871% +0.9153%] (p = 0.38 > 0.05)

C

Clang version 16.0.0.

Using the uuid4 library.

I don't use a benchmarking framework for C: the code is compiled ahead of time with little runtime machinery involved, so I simply time the loop directly.

...
  int uuid_generated = 1000000;
  int i;
  char uuid_buf[UUID4_LEN];
  float elapsed;

  start_timer();
  for (i = 0; i < uuid_generated; i++){
    uuid4_generate(uuid_buf);
  }
  elapsed = stop_timer();

  printf("%.2fms\n", elapsed);
...

Result: 107.87ms

Summary

I ran all the benchmarks using the benchmarking tools available in each language. In C#, Java, and Rust I use a consume mechanism (Consumer, Blackhole, black_box), which prevents compiler/runtime dead-code elimination, an optimization that removes computations whose results are never used; so I pass each returned value into the consume function/method.

For the benchmarks without consume functions, I use something similar: runtime.KeepAlive in Go, or a global variable to store the values.

Since the returned values are randomized, assigning them to a global variable should also rule out constant folding.

From the results we can see the winner is Rust with the fast-rng feature, at 17.81ms. The fast-rng feature uses the rand crate, while the default uses the getrandom crate, which obtains random values from the operating system via a syscall.
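The underlying trade-off can be sketched in Python, where os.urandom requests bytes from the kernel CSPRNG on each call (analogous to getrandom), while random.Random is a userspace PRNG seeded once (analogous to the fast-rng approach). This is only an illustration; interpreter overhead can blur the difference:

```python
import os
import random
import timeit

N = 100_000

# Kernel CSPRNG: every call is a request to the operating system.
t_os = timeit.timeit(lambda: os.urandom(16), number=N)

# Userspace PRNG: seeded once from the OS, then runs entirely in-process.
rng = random.Random()
t_user = timeit.timeit(lambda: rng.getrandbits(128), number=N)

print(f"os.urandom(16):        {t_os:.3f}s for {N:,} calls")
print(f"PRNG getrandbits(128): {t_user:.3f}s for {N:,} calls")
```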

Go happens to improve in performance from 1.22 to 1.23 by roughly 3x: the random libraries all take around 1.30s in 1.22 and around 437ms on average in 1.23. Generating with fastuuid (not random) is the fastest option in Go, but I won't consider it since it's not random and less secure.
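The rough 3x figure can be checked directly from the ns/op numbers reported above (fastuuid-not-random is excluded, since it barely changed):

```python
# ns/op figures copied from the Go benchmark output above.
go_1_22 = {"uuid": 1_280_294_953, "fastuuid (random)": 1_340_914_057, "go.uuid": 1_301_521_588}
go_1_23 = {"uuid": 412_086_926, "fastuuid (random)": 485_570_965, "go.uuid": 415_979_198}

for lib, old in go_1_22.items():
    speedup = old / go_1_23[lib]  # roughly 3.1x, 2.8x, 3.1x respectively
    print(f"{lib}: {speedup:.2f}x faster in Go 1.23")
```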

| Language                      | Mean     |
|-------------------------------|---------:|
| C#                            | 280.4ms  |
| Java 11.0.25                  | 230.76ms |
| Java 17.0.13                  | 231.51ms |
| Java 21.0.5                   | 238.43ms |
| Java 23.0.1                   | 227.97ms |
| Go 1.22 uuid                  | 1.28s    |
| Go 1.22 fastuuid (random)     | 1.34s    |
| Go 1.22 go.uuid               | 1.30s    |
| Go 1.22 fastuuid (not random) | 49.94ms  |
| Go 1.23 fastuuid (not random) | 51.26ms  |
| Go 1.23 uuid                  | 412ms    |
| Go 1.23 fastuuid (random)     | 485ms    |
| Go 1.23 go.uuid               | 415ms    |
| NodeJS 23.4.0 crypto          | 188.09ms |
| NodeJS 23.4.0 uuid            | 157.56ms |
| Ruby 3.2.2                    | 2.3s     |
| Python 3.13.1                 | 2.6s     |
| Rust 1.83.0 uuid default      | 1.06s    |
| Rust 1.83.0 uuid fast-rng     | 17.81ms  |
| C Clang 16.0.0                | 107.87ms |

Generating 1 Million UUIDs (lower is better)

This microbenchmark shows only a fraction of what happens in real long-running systems. More adjustments and optimizations can occur depending on workload and runtime configuration, especially in languages running on top of virtual machines like C# and Java. There are many considerations beyond microbenchmarks when evaluating a whole system. Still, this at least shows the performance of one small, frequently contended part of a system, especially for CPU-bound operations.