Gospider
Fast web spider written in Go
Usage
Usage: gospider [options]
| Option | Description |
|---|---|
| -s, --site string | Site to crawl |
| -S, --sites string | Site list to crawl |
| -p, --proxy string | Proxy (e.g., http://127.0.0.1:8080) |
| -o, --output string | Output folder |
| -u, --user-agent string | User agent to use (web: random web user agent, mobi: random mobile user agent, or a custom string) |
| --cookie string | Cookie to use (e.g., testA=a; testB=b) |
| -H, --header stringArray | Header to use (repeat the flag to set multiple headers) |
| --burp string | Load headers and cookies from a Burp raw HTTP request |
| --blacklist string | Blacklist URL regex |
| --whitelist string | Whitelist URL regex |
| --whitelist-domain string | Whitelist domain |
| -L, --filter-length string | Filter out responses by length (comma-separated list) |
| -t, --threads int | Number of threads (sites run in parallel) (default 1) |
| -c, --concurrent int | Max allowed concurrent requests per domain (default 5) |
| -d, --depth int | Recursion depth for visited URLs (0 = infinite) (default 1) |
| -k, --delay int | Delay in seconds before creating a new request to a matching domain |
| -K, --random-delay int | Extra randomized delay in seconds added to the base delay |
| -m, --timeout int | Request timeout in seconds (default 10) |
| -B, --base | Disable all other sources and only use HTML content |
| --js | Enable linkfinder in JavaScript files (default true) |
| --sitemap | Try to crawl sitemap.xml |
| --robots | Try to crawl robots.txt (default true) |
| -a, --other-source | Find URLs from 3rd-party sources (Archive.org, CommonCrawl, etc.) |
| -w, --include-subs | Include subdomains crawled from 3rd-party sources |
| -r, --include-other-source | Also include other-source URLs (they are still crawled and requested) |
| --subs | Include subdomains |
| --debug | Turn on debug mode |
| --json | Enable JSON output |
| -v, --verbose | Turn on verbose output |
| -q, --quiet | Suppress all output except URLs |
| --no-redirect | Disable redirects |
| --version | Check version |
| -l, --length | Show the length of each response |
| -R, --raw | Enable raw output |
Examples
Quiet output
gospider -q -s "https://google.com/"
Run with a single site
gospider -s "https://google.com/" -o output -c 10 -d 1
Run with a site list
gospider -S sites.txt -o output -c 10 -d 1
Run 20 sites at the same time with 10 concurrent requests per site
gospider -S sites.txt -o output -c 10 -d 1 -t 20
Also get URLs from 3rd-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source
Also get URLs from 3rd-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include the subdomains found there
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs
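With `--include-subs`, URLs on subdomains of the target also count toward the results. A minimal Go sketch of such a scope check, assuming "subdomain" means any host ending in `.` plus the target domain; `inScope` is hypothetical, and gospider's exact rules (for example around `www.`) may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope (hypothetical) reports whether rawURL belongs to the target domain.
// With includeSubs (the idea behind --include-subs), hosts such as
// mail.google.com also match; otherwise only the exact host does.
func inScope(rawURL, domain string, includeSubs bool) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	domain = strings.ToLower(domain)
	if host == domain {
		return true
	}
	// Requiring the leading dot stops "notgoogle.com" matching "google.com".
	return includeSubs && strings.HasSuffix(host, "."+domain)
}

func main() {
	for _, u := range []string{
		"https://google.com/search",
		"https://mail.google.com/inbox",
		"https://notgoogle.com/",
	} {
		fmt.Println(u, inScope(u, "google.com", false), inScope(u, "google.com", true))
	}
}
```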
Use custom headers/cookies
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"
gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt
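`-H` values are `Name: value` pairs, `--cookie` is the value of a single `Cookie` header, and `--burp` reads both from a saved raw request. The Go sketch below shows one way to parse all three with the standard `net/http` package. `parseHeaderFlags` and `headersFromRaw` are hypothetical helpers, not gospider's code, and the raw request is assumed to be a standard HTTP/1.1 request like the one in `burp_req.txt`.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

// parseHeaderFlags (hypothetical) turns repeated -H "Name: value" arguments
// into an http.Header, skipping entries that have no colon.
func parseHeaderFlags(flags []string) http.Header {
	h := http.Header{}
	for _, f := range flags {
		name, value, ok := strings.Cut(f, ":")
		if !ok {
			continue
		}
		h.Add(strings.TrimSpace(name), strings.TrimSpace(value))
	}
	return h
}

// headersFromRaw (hypothetical) parses a raw HTTP request, such as one saved
// from Burp, and returns its headers and the cookies from its Cookie header.
func headersFromRaw(raw string) (http.Header, []*http.Cookie, error) {
	req, err := http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
	if err != nil {
		return nil, nil, err
	}
	return req.Header, req.Cookies(), nil
}

func main() {
	h := parseHeaderFlags([]string{"Accept: */*", "Test: test"})
	fmt.Println(h.Get("Accept"), h.Get("Test")) // */* test

	// --cookie "testA=a; testB=b" travels as one Cookie header.
	req, _ := http.NewRequest("GET", "https://google.com/", nil)
	req.Header.Set("Cookie", "testA=a; testB=b")
	for _, c := range req.Cookies() {
		fmt.Println(c.Name, "=", c.Value)
	}

	raw := "GET / HTTP/1.1\r\nHost: google.com\r\nUser-Agent: test-agent\r\nCookie: session=abc\r\n\r\n"
	hdr, cookies, err := headersFromRaw(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(hdr.Get("User-Agent"), cookies[0].Name, cookies[0].Value)
}
```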
Blacklist URLs/file extensions
Note: by default, gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico)
gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
Show and blacklist response lengths
gospider -s "https://google.com/" -o output -c 10 -d 1 --length --filter-length "6871,24432"
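`--filter-length` takes a comma-separated list of lengths. A minimal Go sketch, assuming responses whose length is in that list are hidden and the rest are printed with their length; `parseLengths`, the sample results, and the `url [length]` output format are hypothetical and not gospider's actual output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLengths (hypothetical) turns a value such as "6871,24432" into a set
// of lengths to hide. Blank entries are ignored; non-numeric ones are errors.
func parseLengths(s string) (map[int]bool, error) {
	set := map[int]bool{}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		n, err := strconv.Atoi(part)
		if err != nil {
			return nil, fmt.Errorf("invalid length %q: %w", part, err)
		}
		set[n] = true
	}
	return set, nil
}

func main() {
	hide, err := parseLengths("6871,24432")
	if err != nil {
		panic(err)
	}
	// Hypothetical crawl results: URL and response length.
	results := []struct {
		url    string
		length int
	}{
		{"https://google.com/", 6871},
		{"https://google.com/about", 1534},
		{"https://google.com/404", 24432},
	}
	for _, r := range results {
		if hide[r.length] {
			continue // hidden, as with --filter-length
		}
		fmt.Printf("%s [%d]\n", r.url, r.length) // length shown, as with --length
	}
}
```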