How to save a web page to the Internet Archive

02 August 2014

This short tutorial shows how to take a snapshot of a web page and save it to the Internet Archive’s Wayback Machine.

Method 1: web interface

  1. Go to the Wayback website: https://archive.org/web/
  2. Paste the URL of the page you want to archive into the Save Page Now box (at the bottom-right).
  3. Click on the Save Page button (or press enter).
  4. Wait while the page is being crawled. Once the archiving process is complete, the URL of the archived page appears.
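
The same request can also be made from a script. The sketch below is not part of the original tutorial: it assumes that the Save Page Now form is equivalent to requesting https://web.archive.org/save/ followed by the page URL, and that the address reached after any redirects is the archived snapshot. The function names and the User-Agent string are made up for illustration, and the service may rate-limit or refuse automated requests.

```python
from urllib.request import Request, urlopen

# Assumed Save Page Now prefix; the page URL is appended to it unchanged.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks the Wayback Machine to save page_url."""
    return SAVE_ENDPOINT + page_url


def save_page(page_url: str, timeout: float = 60.0) -> str:
    """Request a snapshot and return the URL reached after redirects.

    Whether that final URL is the archived copy is an assumption,
    not documented behaviour.
    """
    request = Request(
        build_save_url(page_url),
        headers={"User-Agent": "save-page-sketch/0.1"},  # hypothetical name
    )
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

Calling save_page("http://example.com/") performs the network request; build_save_url on its own only constructs the address, which can also be pasted into a browser.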

Method 2: bookmarklet

This method is faster than using the web interface, but you will first need to install a bookmarklet (which is just a browser bookmark that contains some JavaScript).

Installation

  1. Go to the Save Page to Wayback Machine Bookmarklet link here: http://marklets.com/Save%20Page%20to%20Wayback%20Machine.aspx

  2. Click at the left-hand side of the URL bar, and drag it to the bookmarks toolbar of your browser. The figure below shows how this works in Firefox:

    [Figure: Installation of bookmarklet]

    Alternatively, you can use Add Bookmark in the Bookmarks menu.

Using the bookmarklet

  1. Open the web page that you want to save in your browser.
  2. Click on Save Page to Wayback Machine in the bookmarks toolbar.
  3. Wait while the page is being crawled. Once the archiving process is complete, the URL of the archived page appears.
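
To make the idea of a bookmarklet more concrete, here is a rough sketch of how one could be generated. It is an illustration only: the JavaScript snippet is an assumption about what a bookmarklet of this kind might do (redirect the current tab to the Save Page Now address for the page being viewed) and is not the actual code served by marklets.com. The names SAVE_PAGE_JS and make_bookmarklet are invented for this example.

```python
from urllib.parse import quote

# Hypothetical JavaScript: send the current tab to the assumed
# Save Page Now address for whatever page is open.
SAVE_PAGE_JS = (
    "window.location.href="
    "'https://web.archive.org/save/'+window.location.href;"
)


def make_bookmarklet(js_source: str) -> str:
    """Wrap JavaScript in a javascript: URL that can be stored as a bookmark."""
    # An immediately invoked function keeps variables out of the page's scope.
    wrapped = "(function(){" + js_source + "})();"
    # Percent-encode characters (such as spaces) that are unsafe in a URL,
    # while leaving common JavaScript punctuation readable.
    return "javascript:" + quote(wrapped, safe="(){};:/.'=+")


bookmarklet = make_bookmarklet(SAVE_PAGE_JS)
```

The resulting string can be pasted into the URL (location) field of a new bookmark; clicking that bookmark then runs the JavaScript on the page that is currently open.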

Method 3: Chrome extension

If you’re using the Google Chrome browser, you may want to check out Jimmy Lin’s “Save a Page” extension. Once installed, it allows you to save a page by simply right-clicking on it. The extension can be found here:

https://github.com/lintool/chrome-archive-this-page

Just follow the installation instructions on that page.

Limitations

  • Webmasters can use robots.txt to prevent web crawlers from crawling/saving anything on their website.
  • If a webmaster decides to change the robots.txt permissions at some point in the future, a saved page may be removed from the Wayback Machine. For details see: https://archive.org/about/exclude.php.
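
Whether a given robots.txt would block a crawler can be checked with Python's standard urllib.robotparser module. The sketch below makes two assumptions that are not stated in this tutorial: that the crawler identifies itself as "ia_archiver", and that it interprets the rules in the same way as Python's parser. The helper name is_allowed is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A robots.txt that shuts out the assumed archive crawler entirely.
blocking_rules = "User-agent: ia_archiver\nDisallow: /\n"

# A robots.txt that only hides one directory from all crawlers.
partial_rules = "User-agent: *\nDisallow: /private/\n"
```

With these rules, is_allowed(blocking_rules, "http://example.com/page.html") is False, while the same call with partial_rules is True because only paths under /private/ are disallowed.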

Acknowledgement

This tutorial partially draws from a blog post by Gary Price on Search Engine Land.



