Saturday, April 16, 2005

Anagrams ....

You have dictionary words in a file. You are solving crosswords. You want to quickly find anagrams of a given word.
Perf requirements:
1. Load up time of a couple of seconds is acceptable (wordlist of ~50k which you can get from web)
2. Each anagram query should be fast say < 0.5 or 1 sec.

Here's how building the strucure would look in python. Reads like psuedo code but it works.. Pretty neat?

def sort(word):
l = list(word)
l.sort()
return "".join(l)

def anagrams(words):
ags = {}
for word in words:
sw = sort(word)
ags.setdefault(sw, []).append(word)

return dict([(x,y) for (x,y) in ags.iteritems() if len(y) > 1])

if __name__ == "__main__":
words = open("wordlist.txt").readlines()
words = [word.strip() for word in words]
rs = anagrams(words)
print len(rs)

If you think about it your first reaction would be - but it would be **so** slow :)). But you know what? It satisfies the requirements and takes a fraction of time to write than in C.

2 comments:

Anonymous said...

How about this short perl one-liner that prints each anagram group on a separate line? Had seen this on the web sometime back - uses those clever Perl tricks to make it really short.

perl -lne'$x{1,sort/./g}.="$_ "}{/ ./&&print for%x' wordlist.txt

For DOS
perl -lne"$x{1,sort/./g}.=\"$_ \"}{/ ./&&print for%x" wordlist.txt

-MN

Kalyan said...

Thats neat too, but I personally dont find perl readable :)