Many sites do not have true content types, but to move toward more structured content, an understanding of at least notional content types is essential. The most direct way to get content types is to pull them from a CMS, but even if that is available you may need to rationalize across a constellation of sites / CMSes (for instance, "blog" on one system may be equivalent to "article" on another).
If you cannot get the content type directly, then you have a couple of other options: scraping this information out (either from what's exposed to the user or from what's in the HTML behind the scenes) or creating maps/rules to estimate it (such as pulling out elements of URLs).
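As a rough illustration of the scraping option, the sketch below looks for a content type hint in a page's HTML. It assumes the hint lives either in an `og:type` meta tag or in a `body` class prefixed with `type-`. Both conventions are assumptions about a hypothetical site, and the names `ContentTypeSniffer` and `sniff_content_type` are invented for this example, so adjust them to whatever your own templates actually expose.

```python
from html.parser import HTMLParser


class ContentTypeSniffer(HTMLParser):
    """Collects content type hints from markup (hypothetical conventions)."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the site exposes an Open Graph type, e.g. "article".
        if tag == "meta" and attrs.get("property") == "og:type":
            content = attrs.get("content")
            if content:
                self.hints.append(content)
        # Assumption: the CMS theme adds a body class such as "type-blog".
        if tag == "body":
            for cls in (attrs.get("class") or "").split():
                if cls.startswith("type-"):
                    self.hints.append(cls[len("type-"):])


def sniff_content_type(html):
    """Return the first hint found in the HTML, or None if there is none."""
    sniffer = ContentTypeSniffer()
    sniffer.feed(html)
    return sniffer.hints[0] if sniffer.hints else None
```

Hints are collected in document order, so a meta tag in the head wins over a body class. Whether that priority is right depends on which signal is more trustworthy on your sites.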
Content Chimera allows you to create custom formulas that work against maps: a set of from → to pairs, such as "/blog" → blog (if the URL has /blog in it, then categorize it as a blog content type).
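To make the map idea concrete, here is a minimal sketch of the same logic in plain Python. It is not Content Chimera's formula syntax. The map entries, the `unknown` fallback, and the function name `content_type_from_url` are all hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical map of URL fragments to content types ("from" -> "to" pairs).
# Order matters: the first matching fragment wins.
URL_MAP = [
    ("/blog", "blog"),
    ("/news", "article"),
    ("/events", "event"),
]


def content_type_from_url(url, url_map=URL_MAP, default="unknown"):
    """Estimate a content type by looking for known fragments in the URL path."""
    path = urlparse(url).path.lower()
    for fragment, content_type in url_map:
        if fragment in path:
            return content_type
    return default


print(content_type_from_url("https://example.com/blog/2020/some-post"))
print(content_type_from_url("https://example.com/about"))
```

A simple substring match like this can over-match (for example, "/blogroll" would also count as a blog), so in practice you may want stricter rules such as matching whole path segments.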