The problems with massive content
As Web sites expand in size, and as more people in every organization pour content onto the sites, new problems emerge:
Inconsistent structure and format
A series of documents, posted as individual pages, turns out to be organized differently.
Perhaps they were written by different people at different times for different purposes, but now all those different articles appear together on the site.
Visitors who get used to product descriptions that start with a challenge, move to a solution, and end with a section of features and benefits may be a bit puzzled to find another product description that offers only a list of features.
From department to department, the structure of similar pages, the layout, and the interface all change subtly, which is challenging and sometimes completely frustrating to visitors--as well as to software.
What kind of thing am I creating? (Full chapter from Hot Text, in PDF, 728K, or 12 minutes at 56K)
The old system of hand-coding HTML tags, dropping in the standard elements of an interface, and publishing one page at a time just won't work any more when a company publishes thousands, or hundreds of thousands, of pages a month.
The process has to be automated.
Gigantic chunks of information instead of pinpointed answers
When visitors search for a specific fact, they often receive a 20-page report, or a 300-page manual, and the site says, in effect, "It's in here somewhere. Good luck."
That approach was bad enough when users got actual books, but on the Web, it's crazy.
People need fast access to a particular fact, and just that fact, even if it appears in a tiny paragraph, sentence, or phrase. You must be able to serve up the smallest chunks of information that someone might want to look at.
Lack of customization
When everything appears inside large documents, it's difficult to create different content for different visitors.
You need to offer parts of these documents in a new order, adding a few elements tailored just for each niche audience.
But taking apart an existing whole, and rebuilding it, can be tedious if you work in word-processing software.
On the other hand, if you can treat the pieces as objects in a database, then you can issue a wide variety of reports by picking and choosing different components.
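As a rough illustration of treating pieces as objects, here is a minimal Python sketch. The chunk names, the sample text, the audience profiles, and the `assemble` function are all hypothetical, invented for this example; a real site would pull the chunks from a database rather than a dictionary.

```python
# Hypothetical content chunks, keyed by id. In practice these would be
# records in a database, not entries in an in-memory dictionary.
CHUNKS = {
    "challenge": "Teams lose hours hunting for files.",
    "solution": "The product indexes every shared drive.",
    "features": "Full-text search, saved queries, access control.",
    "admin_note": "Installation takes one command on the server.",
}

# Each audience gets its own ordered selection of chunk ids (assumed names).
PROFILES = {
    "buyer": ["challenge", "solution", "features"],
    "administrator": ["solution", "admin_note", "features"],
}


def assemble(audience):
    """Build a page for one audience by picking chunks in that audience's order."""
    return "\n\n".join(CHUNKS[chunk_id] for chunk_id in PROFILES[audience])


if __name__ == "__main__":
    print(assemble("administrator"))
```

The same four chunks yield two different pages, and adding a new audience means adding one list of ids rather than rewriting a document.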
Humans are reading your stuff, but software may have to read it, too. Software often manipulates the content, transforms the structure, and adjusts the format to deliver a customized version to a particular user.
You may want to be able to send down a Java applet, for instance, that offers to re-order the list of book titles by date, by author, by price, or whatever--on the client's machine, without having to go back to the server and ask the original database to perform this chore.
To aid the software, your text must contain tags indicating what each component is. Preferably, you create and insert eXtensible Markup Language (XML) tags that identify each element, and you provide a schema or a Document Type Definition (DTD) to tell the software how each element fits into the overall structure.
That way, the software can quickly tell which paragraph contains a date, which contains an author's name, and then--if the user asks for a list sorted chronologically, or by author--reorder the paragraphs.
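The reordering described above might look something like the following Python sketch, which uses the standard library's `xml.etree.ElementTree` parser. The element names (`catalog`, `book`, `title`, `author`, `date`), the sample entries, and the `sorted_titles` function are assumptions made up for illustration, not a schema from any real system.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog. Each tag says what its content is, not how it looks.
CATALOG_XML = """
<catalog>
  <book><title>Sample Title C</title><author>Adams</author><date>2001-06-01</date></book>
  <book><title>Sample Title A</title><author>Young</author><date>1999-03-15</date></book>
  <book><title>Sample Title B</title><author>Miller</author><date>2000-11-30</date></book>
</catalog>
"""


def sorted_titles(xml_text, key):
    """Return book titles ordered by the text of the child element named `key`."""
    root = ET.fromstring(xml_text.strip())
    books = root.findall("book")
    # Dates in YYYY-MM-DD form sort chronologically as plain strings.
    books.sort(key=lambda book: book.findtext(key, default=""))
    return [book.findtext("title", default="") for book in books]


if __name__ == "__main__":
    print(sorted_titles(CATALOG_XML, "date"))
    print(sorted_titles(CATALOG_XML, "author"))
```

Because the tags identify the date and the author, the software never has to guess which paragraph holds which fact; it simply asks for the element by name.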
You are now writing for two audiences, one made of code, and the other of flesh--software and wetware.
These large problems are forcing content creators to take apart entire documents, breaking them down into their components, and building new assemblages out of those chunks on the fly, to offer personalized, updated, consistently organized content through software, rather than manual control.
To identify the chunks, we are being forced to expand the tags with which we mark up our text. We no longer put in formatting tags (bold, italic, Tahoma). Instead we use tags that define the meaningful content of each element (step, caption, tip). We are moving toward semantic markup.
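One way to picture the shift: the writer marks up meaning, and software decides the formatting later. The Python sketch below assumes a hypothetical house style that maps the semantic tags mentioned above (step, caption, tip) to HTML; the `STYLE` table, the sample text, and the `render` function are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical house style: formatting lives here, not in the writer's text.
STYLE = {
    "step": '<p class="step">{}</p>',
    "tip": '<p class="tip"><b>Tip:</b> {}</p>',
    "caption": "<p><i>{}</i></p>",
}

# The source uses semantic tags only (sample content, made up).
SOURCE = (
    "<procedure>"
    "<step>Open the file.</step>"
    "<tip>Save often.</tip>"
    "<caption>Figure 1. The File menu.</caption>"
    "</procedure>"
)


def render(xml_text):
    """Turn each semantic element into HTML using the STYLE table."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        template = STYLE[element.tag]
        lines.append(template.format(escape(element.text or "")))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(SOURCE))
```

Changing how every tip on the site looks then means editing one entry in the style table, while the marked-up text stays untouched.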