CS 70

The Source for Our randuint32 Library

This is bonus material. You don't need to look at this code.

Header File: randuint32.hpp

/* randuint32.hpp
   ==============

   Interface definition for the RandUInt32 class, which provides a simple
   way to get random numbers.

   This file is provided as part of CS 70, but may be used in other
   code.
*/

#ifndef RANDUINT32_HPP_INCLUDED
#define RANDUINT32_HPP_INCLUDED

#include <iostream>
#include <cstdint>
#include "jsf.hpp"

/**
 * \class RandUInt32
 * \brief Provides a simplified way to get random numbers.
 *
 * \details
 * Contains a C++-style a random number engine as a data member that it
 * uses to produce the numbers.
 *
 */

class RandUInt32 {
 public:
    /*
     * \brief    Construct a new RandUInt32 with a random seed (uses system
     *           entropy under the hood).
     *
     * \note     Entropy collection is slow. Prefer creating one RandUInt32
     *           and reusing it rather than constructing per draw.  In fact,
     *           this code just delegates to the parameterized constructor
     *           with the seed -1, which causes it to use system entropy.
     */
    RandUInt32();

    /*
     * \brief    Construct a new RandUInt32 with the given seed.
     *
     * \param seedval
     *           63-bit seed used to initialize the generator.  If the value
     *           is positive, it is used directly; if negative, its value is
     *           negated and mixed with some system entropy to produce a
     *           truly random seed.
     *
     * \note     Same positive seed ⇒ same sequence. Initialization is ~100×
     *           slower than drawing a value, so prefer reusing a single
     *           instance.  The negative seed option is provided to make it
     *           easier to truly random seeds from conditional code paths
     *           (e.g., set the seed value to -1 and then if seed is specified
     *           on the command line, set it to that positive value).
     *
     * \warning  Seeding the state with a 32-bit value causes measurable bias
     *           in early outputs. Classic occupancy problem: 2^32 seeds mapping
     *           to 2^32 possible first outputs means ~37% of uint32_t values
     *           will NEVER occur as the first result (n balls → n bins leaves
     *           n/e bins empty). Not a PRNG flaw -- this is fundamental
     *           probability theory.  When you want actual random behavior,
     *           it's easiest to use the default constructor, which provides
     *           a good source of randomness and avoids this issue, but if
     *           you want to use std::random_device or some other source
     *           of 32-bit entropy to seed the generator, make *two* calls
     *           and combine them to get a 64-bit seed.
     */
    RandUInt32(int64_t seedval);

    /*
     * \brief    Get a random unsigned integer in the range [0, 2^32-1].
     *
     * \return   A random unsigned 32-bit integer.
     *
     * \warning  Don't *ever* write `gen.get() % n` to get a number
     *           in the range [0, n-1].  This introduces bias unless
     *           n is a power of two.  Use `gen.get(n)` instead.
     */
    uint32_t get();

    /*
     * \brief    Get a random unsigned integer in the range [0, max-1].
     *
     * \param    max The upper bound (exclusive) for the random number.
     * \return   A random unsigned 32-bit integer in the range [0, max-1].
     *
     * \throws   std::invalid_argument if max is 0.
     */
    uint32_t get(uint32_t max);

    // Although the preferred interface is to use get(), we also allow the
    // PRNG to be used as a standard C++ PRNG, so that it can be used
    // with standard library facilities like std::shuffle,
    // std::uniform_int_distribution, etc.
    using result_type = typename jsf32::result_type;
    static constexpr result_type min() {
        return jsf32::min();
    }
    static constexpr result_type max() {
        return jsf32::max();
    }
    result_type operator()() {
        return rng_();
    }

 private:
    typedef jsf32 rng_type;
    rng_type rng_;
};

#endif

The class is designed so that we could change rng_type to be a different PRNG. Currently it uses jsf32, but it could for example use std::minstd_rand or std::mt19937 instead.

Source File: randuint32.cpp

The following would be entirely sufficient:

/* randuint32.cpp
   ==============

   Implementation of the RandUInt32 class, which provides a simple way
   to get random numbers.

   This file is provided as part of CS 70, but may be used in other
   code.
*/

#include "randuint32.hpp"
#include <chrono>
#include <random>
#include <stdexcept>
#include <cinttypes>

namespace {
// Assumes std::random_device actually provides non-deterministic entropy.
uint64_t system_entropy() {
    // Phase one, use the random device to get some entropy
    std::random_device rd;
    uint64_t val = rd();
    val = (val << 32) | rd();
    return val;
}

}  // namespace

RandUInt32::RandUInt32() : RandUInt32(-1) {
    // Nothing (else) to do.
}

RandUInt32::RandUInt32(int64_t seedval)
    : rng_(seedval < 0 ? system_entropy() : uint64_t(seedval)) {
    // Nothing (else) to do.
}

uint32_t RandUInt32::get(size_t max) {
    std::uniform_int_distribution<uint32_t> udist(0, max - 1);
    return udist(rng_);
}

uint32_t RandUInt32::get() {
    std::uniform_int_distribution<uint32_t> udist(0, UINT32_MAX);
    return udist(rng_);
}

If you want to have you're own version of RandUInt32 to use, the one above might be the best, because the code is the simplest.

For this class though, we like to give examples where we can know for sure what the output would be for a given seed. The std::uniform_int_distribution class template is not guaranteed to produce the same results on different standard library implementations (or even different versions of the same library). To have identical results across systems, we avoid using std::uniform_int_distribution and instead implement its functionality ourselves (unless you switch the underlying generator to something that makes doing that super-awkward).

Generally though, reimplementing code that is already in the standard library is a bad idea. Often considerable engineering effort has gone into making the standard library code perform very well (although obviously that cany vary from system to system).

The code below uses various recent C++ features we've not covered in this class, including templated (compile-time) constants, and if constexpr which is an if whose outcome must be determined at compile-time.

/* randuint32.cpp
   ==============

   Implementation of the RandUInt32 class, which provides a simple way
   to get random numbers.

   This file is provided as part of CS 70, but may be used in other
   code.
*/

#include "randuint32.hpp"
#include <chrono>
#include <random>
#include <stdexcept>
#include <cstdint>

namespace {
// In some ways, this function is overkill, but the C++ standard does not
// actually guarantee that std::random_device is non-deterministic, so
// we mix in some other sources of (hopefully) non-deterministic entropy.
uint64_t system_entropy() {
    // Phase one, use the random device to get some entropy
    std::random_device rd;
    uint64_t val = rd();
    val = (val << 32) | rd();
    // Phase two, mix in some timing noise
    auto now = std::chrono::high_resolution_clock::now();
    val ^= now.time_since_epoch().count();
    // Phase three, multiply-xorshift mixing
    val *= 17857243053658340393ULL;
    val ^= (val >> 35);
    // Phase four, mix in the address of a local variable
    val ^= reinterpret_cast<uint64_t>(&val);
    // Phase five, multiply-xorshift mixing again
    val *= 1080363914018236217ULL;
    val ^= (val >> 43);
    // Phase six, mix in the address of our own code
    val ^= reinterpret_cast<uint64_t>(&system_entropy);
    // Phase seven, final multiply-xorshift mixing
    val *= 3262792193083468147ULL;
    val ^= (val >> 31);
    return val;
}

}  // namespace

RandUInt32::RandUInt32() : RandUInt32(-1) {
    // Nothing (else) to do.
}

RandUInt32::RandUInt32(int64_t seedval)
    : rng_(seedval < 0 ? system_entropy() ^ uint64_t(seedval)
                       : uint64_t(seedval)) {
    // jsf32 only has a 32-bit seed, which means that we would have bias
    // if we only used 32 bits of our entropy, so we use some of the high
    // bits to advance a little bit.
    uint32_t advance = (seedval >> 32) & 0x1ff;
    for (uint32_t i = 0; i < advance; ++i) {
        rng_();
    }
}

uint32_t RandUInt32::get(uint32_t max) {
    if (max == 0) {
        throw std::invalid_argument("RandUInt32::get(max): max must be > 0");
    }
    if constexpr (rng_type::min() == 0 && rng_type::max() == UINT32_MAX) {
        // From https://www.pcg-random.org/posts/bounded-rands.html
        uint32_t x = rng_();
        uint64_t m = uint64_t(x) * uint64_t(max);
        uint32_t l = uint32_t(m);
        if (l < max) {
            uint32_t t = -max;
            if (t >= max) {
                t -= max;
                if (t >= max)
                    t %= max;
            }
            while (l < t) {
                x = rng_();
                m = uint64_t(x) * uint64_t(max);
                l = uint32_t(m);
            }
        }
        return m >> 32;
    } else {
        std::uniform_int_distribution<unsigned int> udist(0, max - 1);
        return udist(rng_);
    }
}

unsigned int RandUInt32::get() {
    if constexpr (rng_type::min() == 0 && rng_type::max() == UINT32_MAX) {
        return rng_();
    } else {
        std::uniform_int_distribution<unsigned int> udist(0, UINT32_MAX);
        return udist(rng_);
    }
}

Queries

  • Duck speaking

    What's jsf32?

  • LHS Cow speaking

    It's a C++-style PRNG that is fast and compact. The original algorithm was invented by Bob Jenkins and implemented in C, but we use this C++ implementation.

  • RHS Cow speaking

    Prof. Melissa discusses it in depth on her blog here and here.

  • Duck speaking

    Are there other generators I can use?

  • LHS Cow speaking

    Yes, a lot. There are a number built into the C++ standard library, but they tend to be older generators that have various flaws.

  • RHS Cow speaking

    The std::mt19937 generator (a.k.a. The Mersenne Twister) is pretty good, but uses up a lot of space -- at least 19937 bits or about 2.5 kiB, and some implementations use twice that to gain added speed.

  • LHS Cow speaking

    Technically, the Mersenne Twister fails some PRNG test suites (failing the Matrix Rank and Linear Complexity tests).

  • Cat speaking

    What about Prof. Melissa's PRNG?

  • LHS Cow speaking

    You can download a C++ implementation of PCG from GitHub.

  • Goat speaking

    Why don't you use it here?

  • LHS Cow speaking

    Prof. Melissa, as a matter of principle, prefers not to require anyone to use her PRNG.

(When logged in, completion status appears here.)