Version 0.8 of stream.py is out with a lot of goodies to make it easy to parallelize a pipeline. I have written previously that by expressing a list-processing task as a pipeline (or at least thinking of it as such), we can have parallelism for free. Well, at least as free as Python allows us to be. We can choose between threading and multiprocessing as our concurrency model: with first choice, we have shared-memory but no real concurrence of threads (thanks to the GIL), and with second choice, we have real multi-tasking but only under the tax of serialization and interprocess communication.
But there is still hope! Many tasks can benefit from parallelization even when Python is shackled by those chains. Today I am going to demonstrate one such program: generating RSA keys.
The main bulk of the program is generating random prime numbers, ones with hundreds of digits, for which we will simply reuse the code from this small library: ent.py. I will only mention in passing that it uses the Miller-Rabin probabilistic primality test, which is not that hard to implement, but why waste time when already written code is available?
An RSA tuple
(public, private, modulus) is generated using two
prime numbers
p and
q. The
modulus is simply the
product
p*q. The public and private exponents are inverse elements
modulo
(p-1)*(q-1), chosen such that the private exponent is as
large as possible and is coprime with
(p-1)*(q-1).
ent.py provides the
function
rsa_init(p, q) for this very purpose:
>>> import ent >>> p = ent.random_prime(50); p 20095834826155173708596891384562552406675590831401L >>> q = ent.random_prime(50); q 99396218006787791985834438209829764811078465689153L >>> e, d, n = ent.rsa_init(p, q) >>> e # public key for encryption 3 >>> d # private key for decryption 13316333196059453962327958527234463764532500697076 11789395763884874151730570514580176855049093981867L >>> n # modulus 19974499794089180943491937790851695646798751045615 37176146478770276922027185366262582500327697493353L
You can read the details of RSA elsewhere. To say the least, it is very easy to generate public and private keys from a pair of prime numbers. But in order to obtain the decryption key from the modulus, we need to factor it and that requires a substantial amount of computation. The recently factored RSA-768 number with 232 decimal digits took an equivalent of more than 1,500 years on one core of a 2.2GHz Opteron! [arstechnica.com]
Let us write a script that takes an argument from the command line and generates that many RSA tuples, using 200-digit primes. Such task is embarrasingly parallel, and I am going to demonstrate how it can be done using streams. Here it the complete script:
#!/usr/bin/env python2.6
"""
Usage: python ./rsa_tuples [-s] n
Generate `n` RSA tuples using 200-digit primes.
Requires: ent.py <http://modular.fas.harvard.edu/ent/ent_py>.
"""
import ent
from stream import ForkedFeeder, PCollector, chop
NUM_DIGITS = 200
def random_primes(n):
"""Yield `n` random primes."""
for _ in range(n):
yield ent.random_prime(num_digits=NUM_DIGITS)
def sequential_main(n):
"""Print `n` RSA (e, d, n) tuples sequentially."""
for p, q in random_primes(2*n) >> chop(2):
ent.rsa_init(p, q)
def parallel_main(n):
"""Print `n` RSA (e, d, n) tuples using random primes
generated by two child processes.
"""
primes = PCollector()
for _ in range(2):
ForkedFeeder(random_primes, n) >> primes
for p, q in primes >> chop(2):
print rsa_init(p, q)
if __name__ == '__main__':
import sys
if sys.argv[1] == '-s':
sequential_main(int(sys.argv[2]))
else:
parallel_main(int(sys.argv[1]))
It is worth pointing out that
sequential_main() is usually
written as:
def sequential_main(n):
"""Print `n` RSA (e, d, n) tuples sequentially."""
for _ in range(n):
p = ent.random_prime(NUM_DIGITS)
q = ent.random_prime(NUM_DIGITS)
print ent.rsa_init(p, q)
However, by rewriting it with a stream-oriented view, we make it much
easier to parallelize. First we factor out the task of generating random
prime numbers into a generator. The stream-based
sequential_main() simply consumes from that generator two primes at
a time using
chop() and then call
rsa_init().
Parallelizing this is almost trivial. Since the problem is CPU-bound, we
will use two
ForkedFeeder's to run two instances of our prime numbers generator
in 2 child processes, each yielding half the number of primes needed.
A
PCollector is used to merge those concurrent streams of
primes generated by child processes into a single output stream, to be
consumed just like the above-described
sequential_main().
And that is all it takes! The timing results are what you would expect:
# sequential $ time python rsa_tuples.py -s 20 >/dev/null real 4m54.438s user 4m53.144s sys 0m0.123s # parallel $ time python rsa_tuples.py 20 >/dev/null real 2m57.388s user 5m21.502s sys 0m0.393s
Obviously, not all problems can be reduced to list-processing. But if they can, good for you!
Now let's wait until that GIL is removed ؟ ■
Previous Post Next Post