Website Archive with Automated Screenshots in Astro with Playwright & GitHub Actions

Cosmo the Space Jellyfish on April 14th, 2023

Have you ever used the Wayback Machine from the Internet Archive? It’s an incredible tool that lets you access past versions of a website as snapshots in history, complete with actual HTML, images, and more. In this tutorial, we’re going to build our own website archive: a GitHub Actions workflow will automatically take a screenshot with Playwright and upload it to Cloudinary, and a static Astro app will display the results.

Table of Contents

Getting Started
Setting Up the Project
Capturing a Screenshot with Playwright
Uploading the Screenshot to Cloudinary
Saving a local file with fs
Automating the Process with GitHub Actions
Displaying the Archives on the Astro Page

Getting Started

We’ll be working in a new Astro project. If you want to follow along, you can jump in with this starter: https://github.com/colbyfayock/demo-web-archive-starter

Setting Up the Project

Inside your Astro project, create a new directory called scripts containing a new file, archive.js. In this file, we’ll set up a script to grab information about the website and the screenshots.

First, we’ll need some data to display on the page. Inside the archive.js file, create an object with the following structure:

const url = 'https://spacejelly.dev';
const date = new Date(Date.now());

const archive = {
  url,
  image: {
    url: '',
    width: 0,
    height: 0,
  },
  date,
};

This object will include the URL we want to take a screenshot of, an object for the image (including the URL, width, and height), and the current date.

Capturing a Screenshot with Playwright

To take the screenshot of the website, we’ll use Playwright. Playwright is a fantastic tool for browser automation and testing, and its API makes it easy to take screenshots of websites.

Install Playwright as a development dependency:

npm install playwright --save-dev

Then, import the chromium browser into your archive.js file:

import { chromium } from 'playwright';

Next, we’ll use Playwright to capture a screenshot. Place the following above the archive object:

// Launch the browser and create a new page
const browser = await chromium.launch();
const page = await browser.newPage();

// Navigate to the specified URL
await page.goto(url);

// Take a screenshot
await page.screenshot({
  path: `screenshots/${date.toISOString()}.png`,
  fullPage: true
});

// Close the browser
await browser.close();

If you now run the script:

node scripts/archive.js

Playwright will create a new screenshot in the screenshots directory.
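One thing to watch: date.toISOString() produces a string such as 2023-04-14T12:30:45.123Z, and colons are not allowed in file names on Windows. As a minimal sketch, assuming you want file names that work on any operating system, a small helper could swap those characters out. The name toFileSafeTimestamp is hypothetical and not part of Playwright:

```javascript
// Hypothetical helper: turn a Date into a timestamp that is safe to use in file names.
function toFileSafeTimestamp(date) {
  // toISOString() always returns UTC in the form YYYY-MM-DDTHH:mm:ss.sssZ,
  // so replacing ":" and "." leaves only letters, digits, and dashes.
  return date.toISOString().replace(/[:.]/g, '-');
}

const exampleDate = new Date('2023-04-14T12:30:45.123Z');
console.log(`screenshots/${toFileSafeTimestamp(exampleDate)}.png`);
// screenshots/2023-04-14T12-30-45-123Z.png
```

If you adopt something like this, use the same helper everywhere the script builds a path from the date so the screenshot, the upload, and the data file all agree on the name.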

Next we need to upload that screenshot to make it available to use in our app.

Uploading the Screenshot to Cloudinary

Cloudinary is a powerful media management platform that provides various features for optimizing, transforming and delivering images and videos. We’ll use Cloudinary to store and serve the screenshots of our website archives.

First, install the Cloudinary Node SDK:

npm install cloudinary --save-dev

Next, import and configure Cloudinary in your archive.js file:

import { v2 as cloudinary } from 'cloudinary';

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

In the above, we’re using environment variables, so we need to also create a new file in our project called .env and add the values from our Cloudinary account:

CLOUDINARY_CLOUD_NAME="<Your Cloud Name>"
CLOUDINARY_API_KEY="<Your API Key>"
CLOUDINARY_API_SECRET="<Your API Secret>"

However, Node.js doesn’t read a .env file on its own, so our script won’t see those values by default. We can use dotenv to load them into process.env.

In your terminal run:

npm install dotenv --save-dev

And at the top of the file, we need to now import dotenv with:

import * as dotenv from 'dotenv';

And then invoke it with:

dotenv.config();
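If one of those variables is missing, the upload will fail later with a less obvious error. As a rough sketch, assuming you would rather find out early, a check like the following could run right after dotenv.config(). The helper name findMissingEnv is hypothetical:

```javascript
// Hypothetical helper: return the names of any variables that are unset or empty.
function findMissingEnv(names, env = process.env) {
  return names.filter((name) => !env[name]);
}

const required = [
  'CLOUDINARY_CLOUD_NAME',
  'CLOUDINARY_API_KEY',
  'CLOUDINARY_API_SECRET',
];

const missing = findMissingEnv(required);

if (missing.length > 0) {
  // This sketch only warns so it can run anywhere; a real script may prefer to throw or exit.
  console.warn(`Missing environment variables: ${missing.join(', ')}`);
}
```

Passing the environment in as a second argument keeps the helper easy to try out with a plain object instead of the real process.env.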

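With everything set up, we can perform the upload. Add the following after the browser closes and before the archive object, since the object will need the upload results: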

const results = await cloudinary.uploader.upload(`screenshots/${date.toISOString()}.png`);

This code will now store the image in Cloudinary.

The upload response includes the image’s URL and dimensions, so we can now update our archive object to use them:

const archive = {
  url,
  date,
  image: {
    url: results.secure_url,
    width: results.width,
    height: results.height
  }
}

Saving a local file with fs

With our script collecting all of the data we need, we can now store the data in a local file which we’ll use in our Astro app to pull everything in.

We’ll use the fs module, which is built into Node.js.

First, import it at the top of archive.js:

import { promises as fs } from 'fs';

And with it imported, we can write our file:

await fs.writeFile(`./archives/${date.toISOString()}.json`, JSON.stringify(archive, null, 2))

Before running the script, create a folder manually in the root of the project called archives.
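If you would rather not rely on the folder already existing (for example, in a fresh clone), the script can create it. This is a minimal sketch, assuming the same fs import as above. The helper writeArchiveFile and its parameters are hypothetical names used only for illustration:

```javascript
import { promises as fs } from 'fs';

// Hypothetical helper: write an archive object to <directory>/<name>.json,
// creating the directory first if it does not exist yet.
async function writeArchiveFile(directory, name, archive) {
  // With recursive: true, mkdir does not fail when the directory is already there
  await fs.mkdir(directory, { recursive: true });

  const filePath = `${directory}/${name}.json`;
  await fs.writeFile(filePath, JSON.stringify(archive, null, 2));

  return filePath;
}
```

In the script, a call along the lines of await writeArchiveFile('./archives', date.toISOString(), archive) could then stand in for the direct fs.writeFile call.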

Now if you run the script again, you’ll see a new JSON file in the archives folder with all of our data!
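One detail worth knowing about that file: JSON has no date type, so JSON.stringify stores the date as an ISO 8601 string. The short example below uses a fixed sample date, not real archive data, to show the round trip. This is why the page later wraps the value in new Date() before formatting it:

```javascript
const archive = {
  url: 'https://spacejelly.dev',
  date: new Date('2023-04-14T00:00:00.000Z'),
};

// JSON.stringify calls Date.prototype.toJSON, which produces an ISO 8601 string
const json = JSON.stringify(archive, null, 2);
const parsed = JSON.parse(json);

console.log(typeof parsed.date); // string
console.log(new Date(parsed.date).getTime() === archive.date.getTime()); // true
```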

Automating the Process with GitHub Actions

Now that we’ve set everything up to take a screenshot and upload it to Cloudinary, let’s automate this process using GitHub Actions.

GitHub Actions is a powerful tool for automating tasks, CI/CD workflows, and more, directly within your GitHub repository.

First, create a .github/workflows directory in your project and add a new YAML file called archive.yml. In this file, we’ll define a GitHub Actions workflow to run our script:

name: Archive

on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:

jobs:
  archive:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-node@v3
        with:
          node-version: 18

      - run: npm ci

      - run: npx playwright install --with-deps

      - run: node scripts/archive.js
        env:
          CLOUDINARY_CLOUD_NAME: ${{ secrets.CLOUDINARY_CLOUD_NAME }}
          CLOUDINARY_API_KEY: ${{ secrets.CLOUDINARY_API_KEY }}
          CLOUDINARY_API_SECRET: ${{ secrets.CLOUDINARY_API_SECRET }}

      - uses: mikeal/publish-to-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          BRANCH_NAME: 'main'

This workflow consists of several steps:

Check out the repository
Set up Node.js with the specified version
Install the project dependencies
Install the Playwright browsers along with their system dependencies, as the Playwright docs recommend for CI
Run the archive.js script
Commit and push the changes to the repository

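Make sure to add the Cloudinary secrets referenced in the workflow to your GitHub repository (Settings > Secrets and variables > Actions) so the workflow can use your API credentials securely. You’ll also need to allow GitHub Actions to write to the repository (Settings > Actions > General > Workflow permissions > Read and write permissions) so the final step can commit and push the new files.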

Now, whenever the workflow is triggered (manually or on a schedule), the script will run, take a screenshot, upload it to Cloudinary, and add the archive file to your project!

Displaying the Archives on the Astro Page

Finally, let’s display the archives on the Astro page. In the frontmatter of src/pages/index.astro, load the archive files so we can map through them to create a list of images and dates:

const files = await Astro.glob('../../archives/*.json');

const archives = files.map( file => {
  const data = file.default;
  return data;
});

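We use Astro.glob to grab all of the JSON files in the archives folder. Each result is an imported module, so the data itself lives on its default property.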

Then in the UI, we can loop through these and display them in our page:

<ul class="grid">
  {archives.map(archive => {
    return (
      <li>
        <a href={archive.image.url}>
          <img width={archive.image.width} height={archive.image.height} src={archive.image.url} alt="Screenshot" />
        </a>
        <p>{ new Date(archive.date).toLocaleString() }</p>
      </li>
    )
  })}
</ul>
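The list above renders the archives in whatever order Astro.glob returns them. As a small sketch, assuming you want the most recent screenshot first, the array could be sorted by date before rendering. The helper name sortByNewest and the sample data are hypothetical:

```javascript
// Hypothetical helper: return a new array sorted from newest to oldest.
function sortByNewest(archives) {
  // Copy first so the original array is left untouched
  return [...archives].sort((a, b) => new Date(b.date) - new Date(a.date));
}

// Sample data shaped like the saved archive files, where dates are ISO strings
const sample = [
  { date: '2023-04-14T00:00:00.000Z' },
  { date: '2023-04-16T00:00:00.000Z' },
  { date: '2023-04-15T00:00:00.000Z' },
];

console.log(sortByNewest(sample).map((archive) => archive.date));
```

In the page, the same idea would apply to the archives array in the frontmatter, before it reaches the template.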

And that’s it! You now have a website archive that automatically updates with screenshots of your website, thanks to Playwright, GitHub Actions, Cloudinary, and Astro.

Last updated on April 26th, 2023.