Kalyan's Blog: Anagrams ....

Saturday, April 16, 2005

Anagrams ....

You have dictionary words in a file. You are solving crosswords. You want to quickly find anagrams of a given word.
Perf requirements:
1. Load up time of a couple of seconds is acceptable (wordlist of ~50k which you can get from web)
2. Each anagram query should be fast say < 0.5 or 1 sec.

Here's how building the strucure would look in python. Reads like psuedo code but it works.. Pretty neat?


def sort(word):
    l = list(word)
    l.sort()
    return "".join(l)

def anagrams(words):
    ags = {}
    for word in words:
        sw = sort(word)
        ags.setdefault(sw, []).append(word)

    return dict([(x,y) for (x,y) in ags.iteritems() if len(y) > 1])

if __name__ == "__main__":
    words = open("wordlist.txt").readlines()
    words = [word.strip() for word in words]
    rs = anagrams(words)
    print len(rs)

If you think about it your first reaction would be - but it would be **so** slow :)). But you know what? It satisfies the requirements and takes a fraction of time to write than in C.

2 comments:

Anonymous said...: How about this short perl one-liner that prints each anagram group on a separate line? Had seen this on the web sometime back - uses those clever Perl tricks to make it really short.

perl -lne'$x{1,sort/./g}.="$_ "}{/ ./&&print for%x' wordlist.txt

For DOS
perl -lne"$x{1,sort/./g}.=\"$_ \"}{/ ./&&print for%x" wordlist.txt

-MN; 5:14 AM
Kalyan said...: Thats neat too, but I personally dont find perl readable :); 5:33 PM

Kalyan's Blog

Saturday, April 16, 2005

Anagrams ....

2 comments:

About Me

Blog Archive

Blogs I read