Perf requirements:
1. Load up time of a couple of seconds is acceptable (wordlist of ~50k which you can get from web)
2. Each anagram query should be fast say < 0.5 or 1 sec.
Here's how building the strucure would look in python. Reads like psuedo code but it works.. Pretty neat?
def sort(word):
l = list(word)
l.sort()
return "".join(l)
def anagrams(words):
ags = {}
for word in words:
sw = sort(word)
ags.setdefault(sw, []).append(word)
return dict([(x,y) for (x,y) in ags.iteritems() if len(y) > 1])
if __name__ == "__main__":
words = open("wordlist.txt").readlines()
words = [word.strip() for word in words]
rs = anagrams(words)
print len(rs)
If you think about it your first reaction would be - but it would be **so** slow :)). But you know what? It satisfies the requirements and takes a fraction of time to write than in C.
2 comments:
How about this short perl one-liner that prints each anagram group on a separate line? Had seen this on the web sometime back - uses those clever Perl tricks to make it really short.
perl -lne'$x{1,sort/./g}.="$_ "}{/ ./&&print for%x' wordlist.txt
For DOS
perl -lne"$x{1,sort/./g}.=\"$_ \"}{/ ./&&print for%x" wordlist.txt
-MN
Thats neat too, but I personally dont find perl readable :)
Post a Comment