How to save a web page to the Internet Archive

02 August 2014

This short tutorial shows how to take a snapshot of a web page and save it to the Internet Archive’s Wayback Machine.

Method 1: web interface

  1. Go to the Wayback website: https://archive.org/web/
  2. Paste the URL of the page you want to archive into the Save Page Now box (at the bottom-right).
  3. Click on the Save Page button (or press Enter).
  4. Wait while the page is being crawled. Once the archiving process is complete, the URL of the archived page appears.
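
The steps above can also be scripted. Below is a minimal Python sketch, assuming that Save Page Now accepts a plain GET request to https://web.archive.org/save/ followed by the address of the page. That endpoint is an assumption on my part (it is not part of the steps above), it may be rate-limited, and its behaviour may change. The names build_save_url and save_page are made up for this example.

```python
import urllib.request

# Assumed Save Page Now endpoint; not taken from the steps above.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url):
    """Return the (assumed) Save Page Now URL for page_url."""
    page_url = page_url.strip()
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    return SAVE_ENDPOINT + page_url


def save_page(page_url, timeout=120):
    """Ask the Wayback Machine to crawl page_url.

    Returns the final URL after any redirects. This is expected to be
    the address of the new snapshot, but that is not guaranteed.
    """
    request = urllib.request.Request(
        build_save_url(page_url),
        headers={"User-Agent": "save-page-sketch/0.1"},  # arbitrary label
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


# Example (performs a real network request, so it is left commented out):
# print(save_page("https://example.com/"))
```

Building the URL is kept separate from sending the request, so that the first part can be checked without touching the network.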

Method 2: bookmarklet

This method is faster than using the web interface, but you will first need to install a bookmarklet (which is just a browser bookmark that contains some JavaScript).

Installation

  1. Go to the Save Page to Wayback Machine Bookmarklet link here: http://marklets.com/Save%20Page%20to%20Wayback%20Machine.aspx

  2. Click at the left-hand side of the URL bar, and drag it to the bookmarks toolbar of your browser. The figure below shows how this works in Firefox:

    [Figure: Installation of bookmarklet]

    Alternatively, you can use Add Bookmark in the Bookmarks menu.

Using the bookmarklet

  1. Open the web page that you want to save in your browser.
  2. Click on Save Page to Wayback Machine in the bookmarks toolbar.
  3. Wait while the page is being crawled. Once the archiving process is complete, the URL of the archived page appears.

Method 3: Chrome extension

If you’re using the Google Chrome browser, you may want to check out Jimmy Lin’s “Save a Page” extension. Once installed, it allows you to save a page by simply right-clicking on it. The extension can be found here:

https://github.com/lintool/chrome-archive-this-page

Just follow the installation instructions on that page.

Limitations

  • Webmasters can use robots.txt to prevent web crawlers from crawling/saving anything on their website.
  • If a webmaster decides to change the robots.txt permissions at some point in the future, a saved page may be removed from the Wayback Machine. For details see: https://archive.org/about/exclude.php.
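
To get a rough idea beforehand of whether a site's robots.txt would block a page, the rules can be evaluated with Python's standard urllib.robotparser module. This is only a sketch: it assumes that the crawler identifies itself as ia_archiver (an assumption, not something stated above), and the robots.txt content and the function name is_crawl_allowed are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""


def is_crawl_allowed(robots_txt, page_url, user_agent="ia_archiver"):
    """Return True if robots_txt lets user_agent fetch page_url.

    The default user agent is an assumption about how the
    archive's crawler identifies itself.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Blocked for the assumed archive crawler:
blocked = is_crawl_allowed(EXAMPLE_ROBOTS, "https://example.com/private/a.html")
# Allowed, because no rule matches this path:
allowed = is_crawl_allowed(EXAMPLE_ROBOTS, "https://example.com/index.html")
```

For a live site you would first download its robots.txt (or use the parser's set_url and read methods). Keep in mind that this only reflects the rules at the moment you check.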

Acknowledgement

This tutorial partially draws from a blog post by Gary Price on Search Engine Land.



Comments

  • corneliusroemer wrote (archived comment):

    This Python-based CLI works very well, and it is very convenient for scripting and for quick saving and archiving in general:

    https://github.com/palewire/savepagenow

    Installation: pip install savepagenow
    Usage: savepagenow URL

    2022-10-21T12:58:01Z
