[PRL] the saga of ip number lookups

Sat Jan 24 17:56:38 EST 2004

Here is the (I think) end of the story of processing IP numbers
from the ISP routers in Scheme. I basically did four things:

0. I read in the entire routing table, because a simple check
suggested its just 3M and fits into memory as a list. That's a
lot of time saved and I won't count it :-)

1. I realized that I can use the first 8 bits of the index number
as a "hashnumber". So instead of a list of 126,000 routing
entries, I now have a vector of 256 lists of say 4,000 entries.
(Actually the distribution is extremely uneven. But the lists
aren't too bad now.) One order of magnitude.

2. Dan re-computed properties of the routing
table entries for every ip line he processed. So I did a "partial
evaluation." Naturally I am wondering whether a partial eval
could have done this. I did change the representation.
Another order of magnitude.

3. Last thing I saw was that Dan used pregexp in an inner
loop. When Robby reminded me that regexp is faster, I
rewrote that, too. I didn't find regexp-split (it's probably in
some library ...) I rewrote that function from first principles.

That was a factor of 2 or 3. Surprising but it's nice to know.

4. I rewrote according to the design recipe. In 300 lines,
I needed one "advanced" concept -- an accumulator to
keep track of the "right" match. But it's all a structural
recursion, occasionally with map, quite suitable for a
210 student. Dan had forgotten his design recipe in so
many places. There were five totally useless tests in
the inner loop. That shaved off a bit, too.

He did use (= (length l) 0) in a post-processing phase,
but that one takes 2 minutes so it doesn't matter.

;; ---

Running the stress test is now a matter of one hour
and 20 minutes or so. When I started (post 0), it was
a 200 to 300 hour job (min). Dan guestimated it may
take almost a month of computer time when I sent
him the 0-th draft.

-- Matthias