When dealing with large data, most people would suggest using a database. Here's an interesting problem though. Someone on another board wants to filter "dictionary" flat files, one word per line, so that every word appears only once.
In effect it amounts to a database where the keys are the data. There doesn't seem to be an easy answer. I suggested using the Windows ports of the Linux utilities "sort" and "uniq", since the files are too large to manipulate in memory. "sort" seems to be working, but it's not an ideal solution.
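For the record, the invocation I had in mind merges and dedupes in one pass; GNU sort does an external merge sort, spilling to temporary files on disk, so inputs bigger than RAM are fine (file names here are made up):

sort -u list1.txt list2.txt list3.txt > merged.txt

If the individual files are already sorted, "sort -m -u" merges them without re-sorting, and -T lets you point the temporary files at a drive with room to spare.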
I'm wondering if a db guru has run into this before, where essentially you want quick lookup of, and cheap additions to, keys that all point to null data?
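One approach that seems to fit, if anyone can confirm it: a SQLite table with a single primary-key column really is "keys pointing to nothing", and INSERT OR IGNORE gives cheap additions that silently skip words already present. A rough sketch using the sqlite3 command-line shell (file and table names are invented, and the here-document assumes a bash-style shell from one of those Linux-utility ports):

# one-column table where the word itself is the key
sqlite3 words.db "CREATE TABLE IF NOT EXISTS words(word TEXT PRIMARY KEY) WITHOUT ROWID;"

# load one flat file: stage it, then keep only the words not already present
sqlite3 words.db <<'SQL'
CREATE TEMP TABLE staging(word TEXT);
.import newlist.txt staging
INSERT OR IGNORE INTO words SELECT word FROM staging WHERE word <> '';
SQL

# dump the de-duplicated list back out as a flat file
sqlite3 words.db "SELECT word FROM words ORDER BY word;" > merged.txt

Repeating the middle step for each file folds everything into one unique set, and lookups and inserts against the primary key should stay quick as the table grows.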
Right now the user just has a bunch of flat files, one word per line, that have grown independently. There are scores of duplicates across the files.