Gospider

Fast web spider written in Go

Usage

Usage: gospider [options]

Option                         Description
-s, --site string              Site to crawl
-S, --sites string             Site list to crawl
-p, --proxy string             Proxy (Ex: http://127.0.0.1:8080)
-o, --output string            Output folder
-u, --user-agent string        User Agent to use (web: random web, mobi: random mobile, or custom)
--cookie string                Cookie to use (e.g., testA=a; testB=b)
-H, --header stringArray       Header to use (Use multiple flags to set multiple headers)
--burp string                  Load headers and cookie from burp raw HTTP request
--blacklist string             Blacklist URL Regex
--whitelist string             Whitelist URL Regex
--whitelist-domain string      Whitelist Domain
-L, --filter-length string     Turn on length filter
-t, --threads int              Number of threads (Run sites in parallel) (default 1)
-c, --concurrent int           Max allowed concurrent requests per domain (default 5)
-d, --depth int                Recursion depth for visited URLs (0 = infinite) (default 1)
-k, --delay int                Delay before creating new request to matching domains (seconds)
-K, --random-delay int         Extra randomized delay added to base delay (seconds)
-m, --timeout int              Request timeout in seconds (default 10)
-B, --base                     Disable all and only use HTML content
--js                           Enable linkfinder in JavaScript files (default true)
--sitemap                      Try to crawl sitemap.xml
--robots                       Try to crawl robots.txt (default true)
-a, --other-source             Find URLs from 3rd parties (Archive.org, CommonCrawl, etc.)
-w, --include-subs             Include subdomains crawled from 3rd party sources
-r, --include-other-source     Also include other-source URLs (still crawled and requested)
--subs                         Include subdomains
--debug                        Turn on debug mode
--json                         Enable JSON output
-v, --verbose                  Turn on verbose
-q, --quiet                    Suppress all output except URLs
--no-redirect                  Disable redirects
--version                      Check version
-l, --length                   Turn on length
-R, --raw                      Enable raw output

Examples

Quiet output

gospider -q -s "https://google.com/"

Run with a single site

gospider -s "https://google.com/" -o output -c 10 -d 1

Run with a site list

gospider -S sites.txt -o output -c 10 -d 1
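A site list for -S could be prepared as in the following sketch. It assumes the file holds one URL per line, which this README does not spell out; the hostnames and the make_site_list helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn bare hostnames into https:// URLs, one per line.
# Assumption: the file passed to -S holds one site URL per line.
make_site_list() {
    for host in "$@"; do
        case "$host" in
            http://*|https://*) printf '%s\n' "$host" ;;          # already a URL, keep as-is
            *)                  printf 'https://%s/\n' "$host" ;; # bare hostname, add scheme
        esac
    done
}

make_site_list example.com https://example.org/
```

Redirecting the output of make_site_list to sites.txt would give a file in the shape the command above expects, under that one-URL-per-line assumption.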

Run 20 sites in parallel, with up to 10 concurrent requests per site

gospider -S sites.txt -o output -c 10 -d 1 -t 20

Also get URLs from third-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com)

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source

Also get URLs from third-party sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com) and include subdomains

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --include-subs

Use custom headers/cookies (set directly, or loaded from a Burp raw HTTP request)

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source -H "Accept: */*" -H "Test: test" --cookie "testA=a; testB=b"

gospider -s "https://google.com/" -o output -c 10 -d 1 --other-source --burp burp_req.txt

Blacklist URLs/file extensions

Note: gospider blacklists .(jpg|jpeg|gif|css|tif|tiff|png|ttf|woff|woff2|ico) by default.

gospider -s "https://google.com/" -o output -c 10 -d 1 --blacklist ".(woff|pdf)"
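Because an unescaped . in a regular expression matches any character, a pattern like .(woff|pdf) can match more than file extensions. The sketch below uses grep -E as a stand-in for gospider's matcher, assuming the blacklist is applied as an unanchored regular expression against the full URL; the URLs and the blacklisted helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the URLs read from stdin that a blacklist pattern matches.
# grep -E stands in for gospider's regex engine (assumption: unanchored match on the URL).
blacklisted() {
    grep -E -- "$1" || true   # grep exits 1 when nothing matches; ignore that
}

urls='https://example.com/report.pdf
https://example.com/pdfs/index.html
https://example.com/font.woff
https://example.com/about'

echo "loose pattern:"
printf '%s\n' "$urls" | blacklisted '.(woff|pdf)'

echo "escaped and anchored pattern:"
printf '%s\n' "$urls" | blacklisted '\.(woff|pdf)$'
```

Under that assumption, the loose pattern also catches /pdfs/index.html, while the escaped and anchored form only catches URLs that end in .woff or .pdf. Note that the anchored form would miss URLs carrying a query string after the extension.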

Show and blacklist file lengths

gospider -s "https://google.com/" -o output -c 10 -d 1 --length --filter-length "6871,24432"
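The value given to --filter-length is a comma-separated list of lengths. The idea can be pictured with the sketch below, which assumes that a response is dropped when its length equals one of the listed values. The "length URL" input format and the filter_lengths helper are hypothetical and are not gospider's actual output format or implementation.

```shell
#!/bin/sh
# Hypothetical helper: drop "length URL" lines whose length is in a comma-separated list.
# This only mimics the idea behind --filter-length; it is not gospider's implementation.
filter_lengths() {
    awk -v list="$1" '
        BEGIN { n = split(list, parts, ","); for (i = 1; i <= n; i++) skip[parts[i]] = 1 }
        !($1 in skip)   # print only lines whose first field is not a listed length
    '
}

printf '%s\n' \
    '6871 https://example.com/404-a' \
    '1520 https://example.com/login' \
    '24432 https://example.com/404-b' |
    filter_lengths "6871,24432"
```

A typical use would be to run once with --length, note the lengths of repetitive pages such as custom error pages, and then pass those lengths to --filter-length on the next run.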