Government websites have undergone massive changes since President Donald Trump returned to office.
Some of the changes are routine, like swapping out the previous president and vice president for the current ones on the White House’s official site.
But other changes go much further. Several sites — like USAID.gov, ReproductiveRights.gov, and the Spanish-language version of WhiteHouse.gov — have gone offline. Remaining sites have been scrubbed of certain data and terminology in order to comply with Trump’s executive orders targeting “gender ideology” and DEI.
It’s an acceleration of a problem known as digital decay — or linkrot. Large quantities of the internet are disappearing as media outlets go under, companies upgrade their web infrastructure, or organizations take down information they believe is no longer valuable or relevant. A recent Pew Research Center study found that 38 percent of webpages that existed in 2013 are no longer available. Because so much of our culture now happens online, losing those pages means losing part of the record of ourselves.
Mark Graham, director of the Wayback Machine, joined Sean Rameswaram on Today, Explained to talk about digital decay, what his team is doing to combat the problem both generally and during Trump’s second term, and why internet preservation is so important.
Below is an excerpt of the conversation, edited for length and clarity. There’s much more in the full podcast, so listen to Today, Explained wherever you get podcasts, including Apple Podcasts, Spotify, and Stitcher.
For people who have maybe stumbled upon your website but don’t really know what you do, can you give them a sense of the things that you guys have saved in 30 years?
Where do I begin? It’s like walking into a very large library and saying, “Show me your favorite book.”
Last year, there was a big news story that MTV News was shut down. The founding editor wrote about it on LinkedIn, and there were a lot of other editors talking about it: “My God, all of our articles are gone. They’re missing.” And I just casually waded into the conversation and went, “Hi, um … check the Wayback Machine.”
They were like, “Oh my God, you guys got it all. What did you do?” We didn’t do anything when the site went down because we’ve been doing our job all along. We’ve been working to archive the public web, as it’s published, on an ongoing continuous basis. If we have to start paying attention to something after it’s gone down, that means we screwed up.
So what are you guys doing in advance of these sites going down to make sure that people can find out what Everlast was singing about in 2004?
We set our web crawlers and archiving software out on a mission every day to identify and to download web pages and related web-based resources. We bring in millions and millions of URLs every day that are signals of where new material is being published on the web. And we make sure that we archive all of those URLs and all the web pages associated with those URLs.
Then, we look at those pages, and we identify links to other pages. And then we go to those pages and we archive them. That’s where you get this metaphor of crawling like a spider throughout this web.
The net result of it is that we add more than a billion archived URLs to the Wayback Machine every day. This material that’s added to the Wayback Machine is indexed and it’s immediately available to people who go to web.archive.org and enter a URL. They are then able to see a history of the archives we have of the web page that was available from that URL at any given time.
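For readers curious about the mechanics, the crawl loop Graham describes (archive a page, find its links, then archive those pages too) can be sketched in a few lines of Python. This is a toy illustration only, not the Internet Archive's actual software: the `crawl` function, the `LinkExtractor` class, and the three-page `FAKE_WEB` dictionary are all hypothetical stand-ins, and the "fetch" step reads from that dictionary rather than the real network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: archive each page, then queue the pages it links to.

    `fetch` is any function that returns a page's HTML, or None if the page
    is unavailable (a hypothetical stand-in for a real HTTP request).
    """
    archive = {}           # url -> saved HTML
    queue = deque(seeds)   # URLs waiting to be visited
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        if url in archive:
            continue       # already saved during this crawl
        html = fetch(url)
        if html is None:
            continue       # page is gone; nothing to save
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in archive:
                queue.append(link)
    return archive


# A made-up three-page "web" so the sketch runs without network access.
FAKE_WEB = {
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<a href="/b">B</a>',
    "https://example.org/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

archived = crawl(["https://example.org/"], FAKE_WEB.get)
print(sorted(archived))
```

Starting from a single seed URL, the sketch saves all three pages and skips the link to a page that no longer exists. A production crawler would also need politeness rules, scheduling, and storage at a vastly larger scale, none of which is shown here.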
I want to talk about government websites, because that’s the reason we’re having this conversation today. I think most people probably think the government will take care of archiving government websites. But here we are in a new administration and websites are disappearing, coming back online, and people are worried. When you — an archivist of the internet — see this happening, how do you react to that? Is it better or worse than regular, non-governmental websites going offline?
Well, as an American, my tax dollars help pay for some of this stuff and much of it is a benefit to people. Certainly my first reaction is: That might not be such a good thing.
I do want to underscore that the National Archives and Records Administration does do archiving as well, and the Library of Congress. So it’s not like we’re the only game in town. But for whatever reason, we seem to be one of the main players in the space of trying to archive much of the public web, including — and right now, especially — US government websites and making those archives available in near real time.
Were you caught off-guard when you saw the new administration removing web pages, removing websites?
In some respects, this is normal and expected. It’s what’s happened, frankly, for each administration in the time that we’ve been working on this effort. I mean, look, it’s under new management, right? You wouldn’t expect the WhiteHouse.gov website under any new presidential administration to be the same as it was before. You’re going to see the bios of the people that are part of the current administration, the news of that administration. We go out of our way to try to anticipate the frequency with which web pages should be archived so that we have a pretty good shot at getting those changes.
You’re saying that the WhiteHouse.gov site obviously changes administration to administration. I think to some degree people understand that: Joe Biden’s administration probably wouldn’t have been posting trolly Valentines about immigration to their Instagram account a year ago. But what we’re seeing here is websites that people need — websites that record public health information going offline — briefly, permanently, what have you.
Is that a different degree of erasing the historical record — or messing with the historical record — than we’ve seen?
That’s true. It is. It’s different. It’s certainly different in terms of the number [of changes] — seemingly! We’re still in the early stages of this administration, but yeah, I’d say on the face of it, you’re right. Historically, we haven’t seen major US government websites taken offline like we did, for example, with regard to USAID. But I’m going to leave that kind of analysis to others, and really just focus on trying to archive the material.
The Wayback Machine and the Internet Archive are mostly funded through donations: the generosity of people, institutions, even governments. Is that going to be enough to archive the internet to the extent that future generations will want and need?
“Enough” is a very subjective term. As an archivist, for me, it’s never enough. I don’t know, and no one knows, what is going to be of use, value, importance in the future — maybe even the near future of tomorrow, much less the very far-off future. Since millions of people use our site on a daily basis, we get a lot of feedback from them. It motivates us, but it also helps direct us and inspires us to continuously try to do a better job at being the best library that we can be.
You guys have been at this for nearly three decades. Certainly, you’ve saved a lot of stuff. Certainly, a lot of stuff has fallen through the cracks. I wonder, is there something that slipped through the cracks that might suggest to our audience what is lost when we can’t archive to the extent we want to, or need to?
Okay, I got one! This is just in recent history. Apparently there was a page up on the CDC website about bird flu last week that was only up for a few minutes, and no one got it.
And by losing that fleeting web page, that one maybe minor, maybe major web page about bird flu on the CDC website, what are we losing?
Well, we’re losing part of the story, right? We’re losing part of our understanding of the evolution of arguably a significant health issue. We don’t know where this is going to go. I guess that’s the other point, right? You don’t know now what is going to be very important in the near or longer term.
In the time of Martin Luther, there were raging debates. Much of that debate took the form of things that were written on pamphlets. The pamphlets at the time were considered of little value: People read them and they shared them, but they didn’t necessarily save them. So today, a scholar of that time — or someone like me, who is strangely curious — what I would give for a collection of those pamphlets.
You are comparing, in a way, a CDC website to the Protestant Reformation. But I think you mean it, don’t you?
I do! Because I don’t know. One really can’t know without the benefit of the long historical view. That’s not something that we have access to today. Why? Because we don’t have a real time machine.