Except for tiny sites, this is impossible to do manually.
Note that lower-level fields are sometimes needed to compute more useful ones. Higher-level fields can also be harder to compute, so they are not always worth the effort.
Chimera implements a probabilistic algorithm to find full-page near-duplicate text: if two pages have nearly the same content, it will probably flag them as duplicates.
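
The text does not say which probabilistic algorithm Chimera uses. As an illustration only, the sketch below uses MinHash over word shingles, one common technique for this kind of near-duplicate detection. The function names (`shingles`, `minhash_signature`, `estimated_similarity`) and the parameters `k` and `num_hashes` are assumptions made for this example and are not part of Chimera.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of overlapping k-word windows in the text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle so the set is never empty.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(seed, shingle):
    """Deterministic 64-bit hash; each seed acts as a separate hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=128, k=5):
    """For each hash function, keep the minimum hash over all shingles."""
    page_shingles = shingles(text, k)
    return [
        min(_seeded_hash(seed, s) for s in page_shingles)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two pages whose signatures agree in most positions are likely near duplicates. The estimate is probabilistic, which is why such a method finds duplicates "probably" rather than with certainty: more hash functions reduce the error at the cost of more computation. The threshold above which two pages count as duplicates is a tuning choice.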