[PRL] the saga of ip number lookups
Matthias Felleisen
matthias at ccs.neu.edu
Sat Jan 24 17:56:38 EST 2004
Here is the (I think) end of the story of processing IP numbers
from the ISP routers in Scheme. I basically did four things:
0. I read in the entire routing table, because a simple check
suggested its just 3M and fits into memory as a list. That's a
lot of time saved and I won't count it :-)
1. I realized that I can use the first 8 bits of the index number
as a "hashnumber". So instead of a list of 126,000 routing
entries, I now have a vector of 256 lists of say 4,000 entries.
(Actually the distribution is extremely uneven. But the lists
aren't too bad now.) One order of magnitude.
2. Dan re-computed properties of the routing
table entries for every ip line he processed. So I did a "partial
evaluation." Naturally I am wondering whether a partial eval
could have done this. I did change the representation.
Another order of magnitude.
3. Last thing I saw was that Dan used pregexp in an inner
loop. When Robby reminded me that regexp is faster, I
rewrote that, too. I didn't find regexp-split (it's probably in
some library ...) I rewrote that function from first principles.
That was a factor of 2 or 3. Surprising but it's nice to know.
4. I rewrote according to the design recipe. In 300 lines,
I needed one "advanced" concept -- an accumulator to
keep track of the "right" match. But it's all a structural
recursion, occasionally with map, quite suitable for a
210 student. Dan had forgotten his design recipe in so
many places. There were five totally useless tests in
the inner loop. That shaved off a bit, too.
He did use (= (length l) 0) in a post-processing phase,
but that one takes 2 minutes so it doesn't matter.
;; ---
Running the stress test is now a matter of one hour
and 20 minutes or so. When I started (post 0), it was
a 200 to 300 hour job (min). Dan guestimated it may
take almost a month of computer time when I sent
him the 0-th draft.
-- Matthias
More information about the PRL
mailing list