I only recently became aware of the pandemic that is linkrot after reading an article by the Columbia Journalism Review, which found that 25% of the links examined were completely inaccessible. Linkrot becomes more common as links age: 6% of links from 2018 had rotted, compared to 43% of links from 2008 and 72% of links from 1998. 53% of all articles that contained deep links had at least one rotted link.
While I assumed that this must be a relatively new phenomenon, I found articles dating as early as 1998, which found that 6% of the links on the Web were broken at that time and that linkrot in May 1998 was double that found by a similar survey in August 1997.
Why does this matter?
Well, while a couple of 404 errors might seem like a minor inconvenience, the problem compounds over time, and it isn't only casually shared social media links that suffer from it: 50% of U.S. Supreme Court opinions contain dead links, and so do 70% of the links in Harvard academic journals.
Because linkrot is not going to stop by itself, we need to find a way to address the issue; academic research, digital journalism and even law will be greatly affected in the near future if we don't find a solution.
Why do links rot?
There are many reasons why links die, which is why this is such a difficult problem to solve. People change their site names, rotting many of the old links; they let domain registrations expire or can't afford to keep their domains alive; businesses have their domains shut down after being acquired by another business; content "drifts", meaning that a link takes you to different content than was originally intended; and sometimes governments and businesses purposefully censor certain publications.
Currently the Internet Archive is curating a collection of snapshots of websites to keep them accessible even after their links die. Unfortunately, their collection is still far from comprehensive, and while snapshots do save the content for future use, they do not stop us from encountering dead links on the internet; we aren't going to cross-reference a massive database of archived dead links to check whether the one we are looking for happens to be there.
Blockchain of the internet + page ID of every historic version
Darko Savic, May 25, 2021
What if every page on the internet received its own ID as soon as it was detected by a scanner/bot, and every modification of the page was saved as a newer version?
Blockchain archive of the internet (minus AI-detected malicious/nonsense/spam pages)
Some huge institutions, on par with the Library of Alexandria, would keep the blockchain copies safe and up to date. This would be funded by nation-states and philanthropy.
If any link went missing, your web browser would poll a local server that pulls the data from the blockchain and shows you the missing version as it was saved at the time the link was created.
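Purely as an illustration, here is a minimal Python sketch of the versioning part of this idea: each URL gets a stable ID, and every captured revision is hash-chained to the previous one, so a dead link can be resolved to whatever version existed when the link was created. The names (PageVersion, PageHistory, the in-memory archive dict) are my own assumptions; real blockchain storage, consensus and the spam filtering mentioned above are left out.

```python
# Illustrative sketch only: stable page IDs plus an append-only, hash-chained
# version history, standing in for the proposed blockchain archive.
import hashlib
import time
from dataclasses import dataclass, field

def page_id(url: str) -> str:
    """Stable identifier derived from the URL the first time a crawler sees it."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

@dataclass
class PageVersion:
    content_hash: str   # hash of this capture, chained to the previous one
    prev_hash: str      # hash of the previous version ("genesis" for the first)
    timestamp: float
    content: str

@dataclass
class PageHistory:
    versions: list = field(default_factory=list)

    def add_version(self, content: str) -> None:
        prev = self.versions[-1].content_hash if self.versions else "genesis"
        h = hashlib.sha256((prev + content).encode("utf-8")).hexdigest()
        self.versions.append(PageVersion(h, prev, time.time(), content))

    def version_at(self, when: float):
        """Return the version that was current at a given time, e.g. when a link was created."""
        candidates = [v for v in self.versions if v.timestamp <= when]
        return candidates[-1] if candidates else None

# Usage: the archive maps page IDs to their version chains.
archive: dict = {}
pid = page_id("https://example.com/article")
archive.setdefault(pid, PageHistory()).add_version("original text")
archive[pid].add_version("edited text")
print(archive[pid].version_at(time.time()).content)
```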
The linkrot (LR) number as a webpage "up-to-date" metric
jnikola, May 24, 2021
Although it is not going to stop the appearance of rotten links, we could use it as a marketing tool to give web pages another metric to be evaluated by.
What do you think of a Rotten Link Number or LinkRot Number (RL/LR) as a metric that is simply a count of non-working links on a website?
We would build a simple tool that scans a site for links, opens them, and checks the validity of each linked object. If the object is there, it scores the link as 1; if it's not, 0. It then calculates the percentage of working links out of all the links on the website and gives the website an overall LR score.
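A rough sketch of what such a scanner might look like, assuming Python with the third-party requests and beautifulsoup4 packages; the function names and the "any HTTP status under 400 counts as working" rule are my own assumptions, not part of the idea as written.

```python
# Sketch of an LR-score scanner: fetch a page, extract its outbound links,
# probe each one, and report the percentage of links that still resolve.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def link_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Treat anything under HTTP 400 as a working link; errors count as rot."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:  # some servers reject HEAD; retry with GET
            resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

def linkrot_score(page_url: str) -> float:
    """Percentage of working links on the page (100 = no rot found)."""
    html = requests.get(page_url, timeout=10.0).text
    links = [urljoin(page_url, a["href"])
             for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)
             if a["href"].startswith(("http", "/"))]
    if not links:
        return 100.0
    working = sum(1 if link_is_alive(u) else 0 for u in links)
    return 100.0 * working / len(links)

print(linkrot_score("https://example.com"))
```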
Why?
It could be a nice metric that tells you how up to date a website is. Many rotten links would suggest that the site is not regularly maintained and that you should be careful about the facts found there.
It would push people to fix their rotten links promptly in order to keep a high LR ranking, since it would become a metric considered by search algorithms.
Spook Louw, 4 years ago
I think this is a great idea and a very practical approach. That way, pages with broken/dead links would become irrelevant and die themselves. The one problem could be legitimate/important pages whose legitimate/important links have gone dead (I'm thinking more along the lines of academic research). You don't want to demerit those pages just because a couple of links on them have died, but for those we could perhaps use the screenshot technique.
jnikola, 4 years ago
Spook Louw I agree that it does not solve the problem, but it could force the maintainers of important web pages to constantly review and update them. It could also encourage startups and new sites with curated, "refurbished" or remastered old content.
There are archives available, but they don't solve the problem
Spook Louw, May 25, 2021
Upon further research, I've been directed to this list of available archives, and I found Mementos to be a good tool for searching through all of them simultaneously.
However, I think this highlights the scope of the problem more than it solves it, and as I mentioned before, archiving the content of dead links does nothing to help with coming across dead links while browsing.
Wouldn't it be possible to have a dead link take you directly to the archive? That would probably defeat the purpose of domain registrations, though. Would it be possible to find a way to remove all links pointing to a domain when it dies or moves? I'm not sure if they can be traced back that way.
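One way to prototype the "dead link takes you to the archive" part would be a small resolver that checks whether a link still responds and, if not, asks the Internet Archive's Wayback Machine availability endpoint (https://archive.org/wayback/available) for the closest snapshot. This is a hedged sketch: the resolve_link function and its fallback behaviour are my own assumptions, not an existing browser feature.

```python
# Sketch of a dead-link resolver: try the live URL first, then fall back to
# the closest Wayback Machine snapshot if the link appears to be rotten.
import requests

def resolve_link(url: str, timeout: float = 10.0) -> str:
    """Return the live URL if it still works, otherwise the closest archived copy."""
    try:
        if requests.head(url, allow_redirects=True, timeout=timeout).status_code < 400:
            return url
    except requests.RequestException:
        pass  # the link is dead or unreachable; fall back to the archive

    resp = requests.get("https://archive.org/wayback/available",
                        params={"url": url}, timeout=timeout)
    closest = resp.json().get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return url  # nothing archived either; the link stays rotten

print(resolve_link("http://example.com/some-old-page"))
```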