Php library to dump an entire website (HTML, CSS & Javascript)

Languagewire/html-dumper

LanguageWire HtmlDumper library

HtmlDumper is a PHP library which downloads a copy of an HTML page and its assets into a target directory.

  • Downloads HTML source code and transforms all URIs into relative paths, creating an updated index.html file.
  • Parses the HTML and fetches the resources it references:
    • Stylesheets, scripts, images, and videos.
    • Assets referenced from within CSS files.
  • Removes anchor links to external pages.
  • Does not crawl pages beyond the initial URL.
$url = "https://example.com";
$targetDirectory = "/tmp/htmldump";

$downloader = new \LanguageWire\HtmlDumper\Service\PageDownloader();
if ($downloader->download($url, $targetDirectory)) {
    echo "Successfully downloaded $url to $targetDirectory";
}
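
The feature list mentions that assets referenced inside CSS files are fetched as well. As a rough illustration of what that step involves, the sketch below collects the targets of `url(...)` references from a CSS string. It is not HtmlDumper's actual implementation: the function name `extractCssUrls` is hypothetical, and the sketch assumes a simple regular expression is sufficient, which will not hold for every stylesheet (escaped characters or `@import "file.css"` without `url()` are not handled).

```php
<?php
// Illustrative sketch only; not part of the HtmlDumper API.

/**
 * Hypothetical helper: collect the targets of url(...) references in CSS.
 *
 * @return string[] Unique URLs, in order of first appearance.
 */
function extractCssUrls(string $css): array
{
    // Matches url(foo), url('foo') and url("foo"); the quote is optional
    // and, when present, must be the same on both sides (\1).
    preg_match_all('/url\(\s*([\'"]?)(.*?)\1\s*\)/i', $css, $matches);

    $urls = [];
    foreach ($matches[2] as $url) {
        // Skip empty values and inline data: URIs, which need no download.
        if ($url === '' || stripos($url, 'data:') === 0) {
            continue;
        }
        $urls[] = $url;
    }

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($urls));
}

// Example input, made up for this sketch.
$css = 'body { background: url("img/bg.png"); } '
     . '@font-face { src: url(fonts/a.woff2); }';

print_r(extractCssUrls($css));
```

Each URL found this way would still need to be resolved against the stylesheet's own location before it can be downloaded, since relative paths in CSS are relative to the CSS file rather than to the HTML page.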

Requirements

Installation

The recommended way to install HtmlDumper is through Composer.

composer require languagewire/html-dumper

Development

The build/ folder contains a Dockerfile that sets up the dependencies needed for local development and runs the unit tests and linters.

Customize build/.env like this:

cd build
cp .env.template .env
nano .env

Then run ./build.sh from within the build/ folder:

cd build
./build.sh

License

HtmlDumper is made available under the MIT License (MIT). Please see the LICENSE file for more information.
