Parallelized list-processing with stream.py

Version 0.8 of stream.py is out with a lot of goodies to make it easy to parallelize a pipeline. I have written previously that by expressing a list-processing task as a pipeline (or at least thinking of it as such), we can have parallelism for free. Well, at least as free as Python allows us to be. We can choose between threading and multiprocessing as our concurrency model: with first choice, we have shared-memory but no real concurrence of threads (thanks to the GIL), and with second choice, we have real multi-tasking but only under the tax of serialization and interprocess communication.

But there is still hope! Many tasks can benefit from parallelization even when Python is shackled by those chains. Today I am going to demonstrate one such program: generating RSA keys.

The main bulk of the program is generating random prime numbers, ones with hundreds of digits, for which we will simply reuse the code from this small library: ent.py. I will only mention in passing that it uses the Miller-Rabin probabilistic primality test, which is not that hard to implement, but why waste time when already written code is available?

An RSA tuple (public, private, modulus) is generated using two prime numbers p and q. The modulus is simply the product p*q. The public and private exponents are inverse elements modulo (p-1)*(q-1), chosen such that the private exponent is as large as possible and is coprime with (p-1)*(q-1). ent.py provides the function rsa_init(p, q) for this very purpose:

>>> import ent

>>> p = ent.random_prime(50); p
20095834826155173708596891384562552406675590831401L

>>> q = ent.random_prime(50); q
99396218006787791985834438209829764811078465689153L

>>> e, d, n = ent.rsa_init(p, q)
>>> e  # public key for encryption
3
>>> d  # private key for decryption
13316333196059453962327958527234463764532500697076
11789395763884874151730570514580176855049093981867L
>>> n  # modulus
19974499794089180943491937790851695646798751045615
37176146478770276922027185366262582500327697493353L

You can read the details of RSA elsewhere. To say the least, it is very easy to generate public and private keys from a pair of prime numbers. But in order to obtain the decryption key from the modulus, we need to factor it and that requires a substantial amount of computation. The recently factored RSA-768 number with 232 decimal digits took an equivalent of more than 1,500 years on one core of a 2.2GHz Opteron! [arstechnica.com]

Let us write a script that takes an argument from the command line and generates that many RSA tuples, using 200-digit primes. Such task is embarrasingly parallel, and I am going to demonstrate how it can be done using streams. Here it the complete script:

#!/usr/bin/env python2.6

"""
Usage:  python ./rsa_tuples [-s] n

Generate `n` RSA tuples using 200-digit primes.

Requires: ent.py <http://modular.fas.harvard.edu/ent/ent_py>.
"""

import ent
from stream import ForkedFeeder, PCollector, chop

NUM_DIGITS = 200

def random_primes(n):
    """Yield `n` random primes."""
    for _ in range(n):
        yield ent.random_prime(num_digits=NUM_DIGITS)

def sequential_main(n):
    """Print `n` RSA (e, d, n) tuples sequentially."""
    for p, q in random_primes(2*n) >> chop(2):
        ent.rsa_init(p,q)

def parallel_main(n):
    """Print `n` RSA (e, d, n) tuples using random primes
    generated by two child processes.
    """
    primes = PCollector()
    for _ in range(2):
        ForkedFeeder(random_primes, n) >> primes
    for p, q in primes >> chop(2):
        print rsa_init(p, q)

if __name__ == '__main__':
    import sys
    if sys.argv[1] == '-s':
        sequential_main(int(sys.argv[2]))
    else:
        parallel_main(int(sys.argv[1]))

It is worth pointing out that sequential_main() is usually written as:

def sequential_main(n):
    """Print `n` RSA (e, d, n) tuples sequentially."""
    for _ in range(n):
        p = ent.random_prime(NUM_DIGITS)
        q = ent.random_prime(NUM_DIGITS)
        print ent.rsa_init(p, q)

However, by rewriting it with a stream-oriented view, we make it much easier to parallelize. First we factor out the task of generating random prime numbers into a generator. The stream-based sequential_main() simply consumes from that generator two primes at a time using chop() and then call rsa_init().

Parallelizing this is almost trivial. Since the problem is CPU-bound, we will use two ForkedFeeder's to run two instances of our prime numbers generator in 2 child processes, each yielding half the number of primes needed. A PCollector is used to merge those concurrent streams of primes generated by child processes into a single output stream, to be consumed just like the above-described sequential_main().

And that is all it takes! The timing results are what you would expect:

# sequential
$ time python rsa_tuples.py -s 20 >/dev/null

real    4m54.438s
user    4m53.144s
sys     0m0.123s

# parallel
$ time python rsa_tuples.py 20 >/dev/null

real    2m57.388s
user    5m21.502s
sys     0m0.393s

Obviously, not all problems can be reduced to list-processing. But if they can, good for you!

Now let's wait until that GIL is removed ؟

Comments