The Importance of Historiography on the Web

Feb 28 2011 by Alexander Dawson | 10 Comments

The topic of history immediately brings to mind a dusty classroom in which professors tell stories of war, royalty and civilizations lost to the sands of time.

While traditional history is expressed as a vibrant tapestry of events, dates, people and places, we often forget that the web has its own rich history and a legacy to leave future generations that needs both preservation and recognition.

By examining current problems in how we preserve our digital heritage and through a significant change in our attitude towards web content, we can hope to leave future Internet users with something tangible and useful.

History doesn’t exist separate from our actions; it is built up over time in what we write and record, allowing those in the future to analyze and improve upon our work. Addressing our current perspective of web content as "disposable data" is critical at this time.

Evolution of Knowledge-Sharing

The passing on and recording of knowledge is a time-honored tradition. This practice has spanned generations, ranging from mankind’s early cave paintings to the industrious storage of information, such as that once found in the Library of Alexandria in Egypt.

Yet, while so much emphasis is placed on preserving our rich analog history, our digital past seems to be rapidly disappearing around us; and like the great Egyptian library that was lost to fire, we are now left with large gaps in our understanding of how much of our present web culture came to be.

Creative Commons, freeing content from the claws of copyright and loss.

The average website exists in a single form for a period of time before being reinvented in a redesign, but if we value published content, then preserving it should be a priority.

Today, many argue that if content does not appear in search engines or if there’s no clear link to it on the web, then it no longer exists. By law, the famous Library of Alexandria preserved any scroll it acquired, yet with the web, we throw out useful old content if visitor numbers are not high enough or because certain copyright laws prohibit us from preserving knowledge soon to be lost.

Digital Curators and Librarians

It would be unfair not to mention some present-day schemes to protect valuable web content from being discarded, such as the Internet Archive and, to a point, Wikipedia and Google.

However, if we look to Geocities being shut down, we can see the damage that a web service’s disappearance can cause to our web culture. With the bookmarking service Delicious being threatened into extinction, we might see valuable bookmarks of its users lost as well.

The knowledge and imprint of our history serve to teach others about the development of the web, and we must accept our role as curators and librarians of this modern digital world.

The Internet Archive is like a museum of old websites, but it hasn’t saved everything.

As a web professional, seeing web content disappear makes me sad, even if its relevancy and accuracy change over time.

As we produce more and more content, finding true treasures on the web is becoming increasingly difficult. While the average blog may only have fragments of gold, it still reflects the diverse world we inhabit. The purpose of owning a website is to increase visibility, but so many still leave their creations unattended, creating dead, broken links, orphan pages and poor navigation and archival systems.

We should do more to improve our websites and showcase the good content in our archives, even if only to revisit an article or subject from a present-day perspective.

The Danger of Disposable Data

We have bred a society in which information and opinion aren’t valued over the long term. As a result, we are left with no infrastructure to ensure the sustainability of that content.

For example, little exists from websites created in the 1990s, and what can be found is often disjointed and scattered; imagine the state of our web content 20 years from now!

Quite paradoxically, we value our own work so highly, often spending hours upon hours creating beautiful websites and amazing content, and yet we forget them after their 15 minutes of fame.

The BBC archives old information, but even that’s not safe forever.

To wake up from this mentality of waste, we have to be critical in our evaluation of modern web history. We must recognize that as the web changes, formats will shift and media consumption will alter, and we will need to maintain some level of control over the effects of popular trends.

Missing Links and Lost Empires

As a web professional or website owner, you can do plenty to keep content from disappearing from your website. Certain practices have profound benefits and could give users additional reasons to return to your website. If we can expose a website’s history, we provide a more enriching experience full of quality content.

The first thing to do in promoting a healthy archive is to weed out the dead links that accumulate over time. This waste is easy to spot with tools like Xenu’s Link Sleuth (freeware), which crawls a website and reports the links that no longer resolve.

Chronicling a website with thousands of pages can be quite complex, but maintaining an effective website navigation scheme and an up-to-date site map is central to good information architecture.

Automated tools can poke around your website and report dead links.
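If you would rather see how such a check works than rely on a ready-made tool, the idea can be sketched in a few lines of Python. This is a minimal sketch, not how Xenu’s Link Sleuth or any other product works: the names `extract_links`, `fetch_status` and `find_dead_links` are made up for illustration, only `<a href>` links are examined, and a link is treated as dead if it cannot be reached or returns a status of 400 or above.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for every link found in html."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = (urljoin(base_url, href) for href in collector.links)
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def fetch_status(url, timeout=10):
    """Return the HTTP status for url, or None if it cannot be reached."""
    try:
        # HEAD avoids downloading the body; some servers reject it,
        # so a real checker might fall back to GET.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except (URLError, OSError):
        return None


def find_dead_links(html, base_url, status_of=fetch_status):
    """Return (url, status) pairs for links that are unreachable or >= 400."""
    dead = []
    for url in extract_links(html, base_url):
        status = status_of(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead
```

Calling `find_dead_links(page_html, "http://example.com/")` on each page of a website would give a rough report of what needs fixing. The `status_of` parameter exists so the check can be tried out without touching the network.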

Next, we can ensure the survival of our content by connecting every page to the rest of the website and listing them all on a site map (as well as creating a robot-readable Sitemaps XML file). Disconnected pages — pages that have no remaining active links to them — are orphans. Orphan pages rarely index well, which negatively impacts the findability of your content and its usefulness to future generations.

Site maps can be easily produced using an app, but you might want to code one yourself.
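For those who do want to code one, here is a rough sketch of both ideas: writing a Sitemaps XML file and spotting orphan pages. It assumes you already have a list of your page URLs and a record of which pages link to which (the `link_graph` dictionary below is a hypothetical structure, as are the function names); how you gather that data depends on your website.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps XML protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemaps XML document (as a string) listing each URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def find_orphans(link_graph, home):
    """Return pages that no other page links to.

    link_graph maps each page URL to the list of URLs it links to.
    The home page is never reported, since visitors arrive there directly.
    """
    linked_to = set()
    for page, targets in link_graph.items():
        # A page linking to itself does not rescue it from being an orphan.
        linked_to.update(target for target in targets if target != page)
    return sorted(page for page in link_graph
                  if page != home and page not in linked_to)
```

Any page that `find_orphans` reports is a candidate for a new link from your navigation or archive pages, after which it belongs in the list passed to `build_sitemap`.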

If you’re the kind of person who shudders at the thought of their earlier work, consider making active revisions of your old content visible on your website; do, however, preserve the original using some form of version control system.

If you redesign your website regularly and change content often, keep older pages available for nostalgic users or those interested in seeing previous versions. Exposing revisions (like a version history file) can also be very beneficial in tracking the progress and evolution of websites.

Some people allow users to revisit previous versions of their website’s layout as a way to showcase their evolving talent.

Donating published content is another practice to consider. While republishing old material ad infinitum is not sensible (because it would create duplicate content for search engines to index and perhaps be regarded as spam), there may come a time when you close down, change direction or overhaul your website.

In such cases, consider donating your posts to other websites (think of it as content acquisition); you could make a bit of money from it and free up useful data.

Certain interest pieces might be useful for Wikipedia or could be donated elsewhere.

While donating outdated or useless content could help you clean up your web presence, the websites or services that receive this old material might not be able to manage it effectively, especially if it comes in at an unsustainable rate; this could, in turn, lead to dead image links, stale content and an increase in 404 errors. Instead, try to use your archived material in other ways, perhaps by promoting earlier articles, as a retrospective of sorts.

This is the first article published on Six Revisions, and yet it could still help a number of readers today.

Finally, the most important precaution for ensuring the survival of your content is to back it up. To this day, many websites still have no substantial process for archiving and backing up their content. What happens if your website is hacked? What if your computer crashes? If the history of computing has shown us anything, it’s that data increasingly disappears as a result of both computer failure and the aging of a website or web technology.
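A backup routine does not have to be elaborate to be useful. As a minimal sketch, assuming your website’s files live in a single folder and that the backup folder sits outside it, the following Python function writes a timestamped archive and prunes older copies. The function name, the `site-` file prefix and the `keep` parameter are illustrative choices, and a database-driven website would also need a database export, which is not covered here.

```python
import shutil
import time
from pathlib import Path


def backup_site(site_dir, backup_dir, keep=5):
    """Archive site_dir into backup_dir, keeping only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp keeps archives in chronological order by name.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_dir / f"site-{stamp}"
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=str(site_dir))

    # Delete everything except the newest `keep` archives.
    archives = sorted(backup_dir.glob("site-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return Path(archive)
```

Run on a schedule, and with the archives copied somewhere other than the machine that hosts the website, something like this covers both the hacked-website and the crashed-computer scenarios.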

Historiography for the People

The web is ever-changing, and its history is being erased, written over and lost in poorly maintained archives.

High-quality content — even an individual’s personal reflection on the world on his or her blog — never loses value. Of course, drowning out the spam and fluff helps, but if we value a healthy digital ecosystem, then we will focus on producing things that contribute to our evolving worldwide virtual library.

The content we produce will give future generations a fascinating look at how the web has evolved over time and how web professionals and ordinary folks have carried out their daily tasks.

We should not build websites solely for the here and now, forgetting the mistakes and successes of those who have come before us. By preserving the past and documenting the development of the web, we are immortalizing ourselves, ensuring that we don’t become yet another people who simply fade into oblivion.

About the Author

Alexander Dawson is a freelance web designer, author and recreational software developer specializing in web standards, accessibility and UX design. As well as running a business called HiTechy and writing, he spends time on Twitter, SitePoint’s forums and other places, helping those in need.

10 Comments

joshintosh

February 28th, 2011

Not so sure the history of the web is all that important. In order for something to evolve, the weak need to be weeded out, the web’s version of weeding out is no longer being on the grid. I think the history that matters will be preserved naturally. Like a library, not every note or book that was ever written makes it into a library. If we try to keep track of every thought that gets put on the internet, well, I think we will be doing ourselves a disservice. Let the passing thoughts pass.

Ellie K

February 28th, 2011

I liked the article, Alexander Dawson, and disagree with @joshintosh (cute name though!). History of the internet is relevant in the same way as History of Science or History of Computing. It isn’t necessary to meticulously preserve EVERYTHING (unless it is your own personal stuff and you want to!), but lessons learned from past situations that are analogous to issues affecting us currently, and in the future, are the obvious reasons that internet history should be documented with some degree of care.

Best example: This is more of a hardware storage example, but is relevant to cloud computing, which is definitely part of the internet! Specifically, the lessons learned about 30 years ago regarding optimizing storage disk configurations, read vs write seek times, and avoiding disk contention affecting processor performance have become directly relevant again for spinning instances for upload to the cloud with AWS EC2 or JetS.

I look forward to reading other articles on SixRevisions that just caught my eye, on “History of Web Browsers” and “Evolution of Web Design”.

Young

February 28th, 2011

I’m with Josh. I think it’s unfair to compare the web to, say, the Library of Alexandria, since the content at the library didn’t exactly evolve. The records kept at libraries are in their essence contemporary human thoughts. While this is true to a degree on the web, most “content” survives through many iterations of websites, and if they are lost, chances are people deemed them useless or the author felt so. Even offline historical records should be disposed of if there’s nothing to be learned from them, in my opinion. Natural selection trumps all – your article is saying Neanderthals should have been preserved. It’s not a “waste” as you say – web development has evolved a lot, and for a reason.

aurel

February 28th, 2011

Looking at this issue from a blogging perspective, I think we should evaluate the content and make it easier to access for those that need it (and I believe that there’s always someone that would benefit from old content).
But I don’t think that the websites from the 90s should be active; it would be nice if they were collected in one place for those who were not active back then, to see how everything started. But the content from the 90s (for example) would be useless for us now. So it is a case of not losing the content as quickly as we do. For example, the website which lets you see their first version: that is a great idea. But if their first version was their 20th re-design, then it would be worthless.

Chris Quinn

February 28th, 2011

Sharing knowledge is the most important topic I think with the Internet. There are so many ways to share and distribute knowledge (this blog being one) and it helps people that are learning new things or just browsing the net. Anyway, great article.

Grün Weiss

March 1st, 2011

nice article, good stuff

Albany

March 1st, 2011

@joshintosh not sure I agree with that stance. Sure, there needs to be an evolution to the web, but that doesn’t mean there’s no value in preserving the history of things that didn’t succeed. History doesn’t just teach us what works, it teaches us what doesn’t work.

PS – Interesting angle for an article, Alexander.

Alexander Dawson

March 1st, 2011

Thanks for the interesting debate points everyone!

@Young: I have to disagree about most content surviving through iteration. While it’s true that older sites that remain unmaintained do disappear over time (and we can put some of that down to Darwinism), the issue of record keeping on the web is only going to become more profound as the Web ages. Consider this: currently, it’s fair to say that the majority of those who have been producing content for the web since its inception are still alive. What happens when a site author passes away? It’s not something we often like talking about, but the fact remains: if something happened to a site owner who’s a respected expert in their field (posting information on a subject that doesn’t evolve as rapidly as web design), what would happen to their site and its content? I had a friend who unfortunately passed away a year ago, and the information on their site was relevant, even to this day, but their family, not knowing about its value, simply didn’t “hand the site over” to someone else to continue or keep it up. They took it down (the value of human insights was lost in this case).

Here’s the situation: we as a society are placing greater demands on the Web to hold all of our information (books are going digital, as are most records), and releasing this content into the wild for all to read is also a central part of society. What will happen in 100 years’ time, when many of us will no longer be around? Copyright exists post-death, and there are plenty of other reasons why content disappears online: businesses going bust, services ceasing, users getting bored and not carrying on with a project. I just think we treat data so poorly these days that we risk not having that iterative system you speak about, and instead having holes in our part of the world’s evolution. I’ve seen a number of sites I considered valuable disappear in the past few years.

Rachel

March 1st, 2011

If you want to help preserve Internet history, consider joining the efforts of Archive Team (http://www.archiveteam.org), a group made up entirely of volunteers who, when they hear that some content service is going down (like the aforementioned Geocities and Delicious, and the soon-to-be-retired Yahoo! video), mobilize and save as much of the content as they can before it disappears.

Young

March 1st, 2011

I see your point a little more clearly now. Your article seemed to be supporting preservation of web content regardless of content – and a clearer line needs to be drawn between your discussion of content and structure. Most people, when we hear “old sites” we think horrendous non-compliant HTML. If the author or whoever’s maintaining the site doesn’t deem the content valuable enough to keep it up with the times, I’m okay with letting that content disappear.

I can’t agree with your stance that web content is regarded with less importance than traditional documents. If anything, I think most people are switching over to digital media now and are much more conscientious about maintaining these “records” than they were years ago. It just seemed silly to claim that information is better preserved in traditional media, especially if you think back to things like the Cultural Revolution, erasing all “irrelevant” information. I’m sure the Library of Alexandria could keep every record cuz back then, people who could afford to write something down on scrolls were people worth listening to. The proportion of useful info to useless info on the web nowadays is beyond comparison to ancient Greece.

Posthumous management of web content might very well be tricky, as you say. But realistically, with domains expiring and server resources being limited, is there much more to be done than for YOU to take up the content you found relevant from your friend’s site, and publishing it on your site? How long could we keep up that chain of re-publication? Should websites become family heirlooms? Traditional media definitely has the advantage there – so if you wanna preserve content, write a book?
