Dyota's blog

PowerShell: Back Up This Blog

Backups are important! A few recent events around me have reminded me that digital information can be fragile, and that it's important to keep backups up to date.

This blog is important enough to me that I've started backing it up regularly, just in case.

Here is a PowerShell script that does just that.

$response = iwr "https://dyota257.bearblog.dev/blog/"
$root = "https://dyota257.bearblog.dev"

function downloadLinks() {
    $response.Links |
        select -Skip 7 |
        ? {
            # skip the nav links at the top
            -not ($_.outerHTML -like "*/blog/?q=*") -and
    
            # skip the hashtags at the bottom
            -not ($_.outerHTML -like "*bearblog.dev*")
        } |
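        % {
            # pull the href value out of the anchor tag;
            # [^"]* stops at the closing quote, and nothing is emitted if there is no match
            if ($_.outerHTML -match '(?<=href=")[^"]*') {
                $Matches[0]
            }
        } |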
        Set-Content '.\links.txt'
}

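function downloadAllArticles() {
    # make sure the output folder exists before writing to it
    [void] (New-Item -ItemType Directory -Path '.\articles' -Force)

    # note: "select -first 5" limits the run to the first 5 links;
    # remove that line to back up every article
    cat .\links.txt |
        select -first 5 |
        % {
            # links look like "/some-title/", so the slug is the second element
            $title = $_.split('/')[1]

            $content = (iwr "$root$_").Content

            # take the date part of the datetime attribute (everything before the "T");
            # using -match inside "if" keeps True/False out of the output
            if ($content -match '(?<=datetime=")[^"]+') {
                $date = $Matches[0].split('T')[0]
            } else {
                $date = 'undated'
            }

            $content > ".\articles\$date $title.html"
        }
}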
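
To see how the script gets from a link to a file name, here is a small offline sketch that runs the same kind of lookbehind regex on two sample strings. The sample HTML and the variable names ($anchor, $article, $fileName) are made up for illustration and the real pages may look different. The pattern `[^"]+` is a stricter variant that stops at the closing quote of the attribute.

```powershell
# Hypothetical sample inputs; the real markup on the blog may differ
$anchor  = '<a href="/my-first-post/">My first post</a>'
$article = '<time datetime="2023-01-31T09:00Z">31 Jan 2023</time>'

# (?<=href=") is a lookbehind: the match must come right after href="
# but the href=" text itself is not part of the match.
# [^"]+ then takes everything up to (not including) the closing quote.
[void] ($anchor -match '(?<=href=")[^"]+')
$link = $Matches[0]

# "/my-first-post/" splits into "", "my-first-post", "" so index 1 is the slug
$title = $link.split('/')[1]

# Same idea for the datetime attribute; keep only the part before the "T"
[void] ($article -match '(?<=datetime=")[^"]+')
$date = $Matches[0].split('T')[0]

# This is the file name pattern the backup script writes to
$fileName = "$date $title.html"
$fileName
```

With these sample strings, $link is "/my-first-post/" and $fileName is "2023-01-31 my-first-post.html". Casting the -match result to [void] stops True or False from leaking into the output, and $Matches only holds a fresh value when the match succeeds.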

#powershell #webscraping