# Random number generator using modulo

You have probably heard about modulo bias using random number generator. If you read Effective Java by Joshua Bloch this problem is described in Item 47. But did you ever think why there is a problem with simple code like this `Math.abs(rnd.nextInt()) % n`

? Why the distribution of elements is not equal and by how much is it exactly biased? Here is the mathematical explanation:

Let’s assume that function `r()`

returns a random number in range `[0, N)`

with equal distribution. Let’s examine function `f(x) = r() mod x`

, where `x ≤ N`

. I have faced this problem recently when I needed a random long value in given range – `java.util.Random`

class provides only function `nextLong()`

, there is no `nextLong(long max)`

.

Let’s split all numbers in range 0 to N into buckets of length x. This gives k buckets of length x and a single bucket smaller than x (assuming `N mod x ≠ 0`

).

We can easily notice that `k = (N - N mod x) / x`

and last bucket has size `N mod x`

.

If the random number returned by function `r()`

lands in any of these k buckets of length x, everything is fine – if we do `mod x`

we will get a number from the first bucket (i.e. `[0; x)`

). But if the number lands in the last bucket, if we do `mod x`

we will get number in range `[0; N mod x)`

which is a smaller section of the first bucket.

Each number in range `[0; N)`

has probability of being picked equal to `1/N`

. However, after we do `mod x`

, each number in range `[0; N mod x)`

will have probability equal to `(k+1)/N`

while each number in range `[N mod x; x)`

will have probability equal to `k/N`

.

For example, if `x = 2N/3`

then `N mod x = N/3 = x/2`

and `k = 1`

. This means that each number in the first half of requested range will have twice as high probability as each number in the second half. By changing the value of x (relatively to N) the range of affected numbers is changed (e.g. for `x ~ N mod x`

only the last few number in the requested range will have lower probability). And obviously the larger the k is (i.e. the smaller x is relatively to N), the more difficult the bias will be to observe.

So how in Java get a random long in specified range? The following code is obviously deeply flawed:

private static final Random RANDOM = new Random(); public static long random(long x) { return Math.abs(RANDOM.nextLong()) % x; }

Common solution that you can find online is to discard all numbers greater than requested maximum:

public static long random(long x) { long value; do { value = Math.abs(RANDOM.nextLong()); } while (value >= x); return value; }

Although correct, it is slow as the loop will be iterating many times for smaller values of x. The key lies in discarding numbers only from the last, smaller than others, bucket:

public static long random(long x) { long n = Long.MAX_VALUE; long nmodx = n % x; long value; do { value = Math.abs(RANDOM.nextLong()); } while (value >= n - nmodx); // iterate again if value in the last bucket return value % x; // safely take remainder }

Worst case time complexity? Two iterations when `N mod x ~ N / 2`

(i.e. one full bucket and the last smaller bucket has almost the same size). If x is much smaller than N, this loop however will rarely iterate more than once.

This solution has one other small flaw that can cause problems difficult to reproduce, but is easy to solve: `Math.abs(Long.MIN_VALUE) = Long.MIN_VALUE`

. Solution is to simply discard this value as well, by changing the condition to `value >= n - nmodx || value < 0`

.

Posted on October 11, 2014, in Test Driven Development. Bookmark the permalink. Leave a comment.

## Leave a comment

## Comments 0