A step-by-step htsquirrel tutorial
1 Prerequisites
2 Introduction
2.1 Code and Documentation contributions
2.2 How To Get Support
3 Installing htsquirrel
3.1 Linux
3.2 MacOS
3.3 Windows
4 Creating a new Google Cloud Project
5 Deploying a singleUseCollect VM
5.1 Step one - download JSON credentials for your Google Cloud Project
5.2 Step two - generate the singleUseCollect VM
5.3 Step three - connect to the singleUseCollect VM via SSH
5.4 Step four - manually installing httrack into the singleUseCollect VM
5.5 Step five - starting the web crawl for the assigned website
5.6 Step six - manually running 7zip to compress the completed web crawl
5.7 Step seven - copy the 7z file to Nearline storage and shutdown the VM
6 Deploying a reusableStore VM
6.1 Step one - download JSON credentials for your Google Cloud Project
6.2 Step two - generate the reusableStore VM
7 Final Words
Appendix A - htsquirrel code overview
Appendix B - Creating your own test website
References
htsquirrel - repeatedly crawls & archives websites
References