Most inventories or audits start with an automated crawl to get a list of content, and crawling tools typically provide a variety of data about each URL in the list, such as the MIME Type and Crawl Depth. As always, we recommend an iterative approach, so it helps to start with basic information. A first pass may even lead to useful insights right away, although its main purpose is to ensure you have a good list in the first place. That said, we generally need to iterate and add more fields in order to answer our content questions and test our hypotheses.
Note: if for any reason you decide to generate the list of content manually, it's even more important to be careful about which fields you add to your analysis, since every field costs effort on every page (and in the meantime you may lose sight of the top priority in the first pass: getting a reasonable list).
We need to be methodical rather than just dumping more columns into our analysis because:
Extra fields can distract you in your analysis.
You should concentrate on the goals and highest priority questions that face you at each iteration, and not just add the fields that are easiest.
Sometimes the first field(s) that come to mind are not as effective as other, related ones.
Depending on your analysis approach, it may be important to pick fields that can be automated (and, if you add a manual field to a large analysis, to ensure that it will be high impact and that you can actually gather the data).
Decision makers (the ultimate consumers of the analysis) probably only care about a handful of fields.
Note: in the course of a deep content analysis you may in fact wind up with many fields, and many of them will end up being dead ends (after all, an analysis is an exploration). The point is that whenever you add a field you should be methodical about it, especially since a long-running analysis will probably accumulate a lot of fields, some of which flow from one to the other.
If you get some fields "for free" (as with an automated tool), then that's fine. But you should still think clearly about the fields you really need to meet your goals and apply your approach. If you are going with a completely manual method, then there's even more reason to focus on which columns you add to your inventory.
You need to have:
A goal for your analysis. In other words, what is the reason you are doing the analysis and what's the end result you want?
An approach for your analysis. Brute force, line-by-line, manual content review works for about 80% of sites, since most sites are not especially complex. But you should take a moment to consider whether other approaches (such as Sample, Rules, Repeat) make more sense, especially if you have a complex site.
Key content questions. What questions do you most want answered?
To develop a candidate list:
Go backwards from your goal to consider the data you need to get there
Ensure you have a clear definition of what each field means
Consider how you will get the data
Consider the order in which you could get the data
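As an illustration, the steps above can be captured in a small candidate list that records, for each field, its definition, the question it serves, how it will be gathered, and in which pass. This is only a sketch: the class name, the "automated"/"manual" labels, and the example fields other than URL and title are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class CandidateField:
    name: str         # column name in the inventory
    definition: str   # what the field means, in one sentence
    question: str     # the content question it helps answer
    source: str       # "automated" or "manual" (hypothetical labels)
    pass_number: int  # iteration in which you plan to gather it


# Hypothetical candidates, deliberately listed out of order.
candidates = [
    CandidateField("Audience", "Primary audience the page serves",
                   "Who is the content for?", "manual", 2),
    CandidateField("URL", "Address of the page",
                   "What content do we have?", "automated", 1),
    CandidateField("Content Type", "Kind of content, such as article or product page",
                   "Which types dominate the site?", "automated", 2),
    CandidateField("Title", "Text of the page title",
                   "What content do we have?", "automated", 1),
]


def gathering_order(fields):
    """Sort by pass, then automated before manual within a pass."""
    return sorted(fields, key=lambda f: (f.pass_number, f.source != "automated"))


for field in gathering_order(candidates):
    print(field.pass_number, field.source, field.name)
```

Writing down the question next to each field makes it easier to spot fields that serve no current goal before any effort is spent gathering them.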
With progress toward your analysis goals as the overriding consideration, review your possible fields by asking:
Are these generally-useful fields?
Overall how difficult will it be to get these fields?
Do you have the basic fields like URL and title?
Do you have few enough fields that you can manage and coordinate?
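Some of these review questions can be checked mechanically. The sketch below is one possible way to do so; the function name, the ceiling of 12 fields, and the "more than half manual" rule are all hypothetical choices, and the question of whether fields are generally useful still needs human judgment.

```python
BASIC_FIELDS = {"URL", "Title"}  # the basics named in the text
MAX_FIELDS = 12                  # hypothetical ceiling; choose your own


def review_fields(field_names, manual_fields=()):
    """Return a list of warnings for a proposed set of inventory columns."""
    warnings = []

    # Do you have the basic fields?
    missing = BASIC_FIELDS - set(field_names)
    if missing:
        warnings.append("missing basic fields: " + ", ".join(sorted(missing)))

    # Do you have few enough fields to manage and coordinate?
    if len(field_names) > MAX_FIELDS:
        warnings.append(f"{len(field_names)} fields may be too many to manage")

    # Rough proxy for difficulty: how much of the list needs manual entry?
    if len(manual_fields) > len(field_names) / 2:
        warnings.append("more than half of the fields need manual entry")

    return warnings


print(review_fields(["URL", "MIME Type", "Crawl Depth"]))
```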
In general you don't want to just dive in and get all the fields at once, but instead iterate as you better understand and refine the questions. Broadly speaking, you should:
First do a pass of the absolute basics, just to ensure you have a good list.
Do a pass with some categorical information and some user-focused information.
And then iterate from there.
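A minimal sketch of the first two passes, assuming the crawl tool can export a CSV. The column names, the sample rows, and the choice to keep only HTML pages are assumptions for illustration; real tools use their own column names and you may want other MIME types in your list.

```python
import csv
import io

# Hypothetical crawl export; real tools use their own column names.
CRAWL_CSV = """URL,Title,MIME Type,Crawl Depth
https://example.com/,Home,text/html,0
https://example.com/about,About us,text/html,1
https://example.com/logo.png,,image/png,1
https://example.com/about,About us,text/html,2
"""


def first_pass(csv_text):
    """Keep one row per HTML page, with only the basic fields."""
    seen = set()
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip non-HTML resources and URLs already in the list.
        if row["MIME Type"] != "text/html" or row["URL"] in seen:
            continue
        seen.add(row["URL"])
        rows.append({"URL": row["URL"], "Title": row["Title"]})
    return rows


inventory = first_pass(CRAWL_CSV)

# Second pass (sketch): add one categorical column, empty until filled in.
for row in inventory:
    row["Content Type"] = ""

for row in inventory:
    print(row)
```

Keeping the first pass this small makes it easy to check that the list itself is sound before any further columns are added.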
If you are using a brute force, line-by-line approach, then it's time to start slogging through the spreadsheet. Otherwise, start up your tools to get the first pass.