Katana

Katana is a command-line interface (CLI) web crawler written in Go, designed to be fast and efficient and to produce simple output. It crawls websites to gather information and endpoints. One of its defining features is headless browsing, which lets it crawl single-page applications (SPAs) built with JavaScript frameworks such as Angular or React and gather information from them effectively.

Usage

Usage: katana [flags]

Options

Input

Option        Description
-u, -list     target url / list to crawl
-resume       resume scan using resume.cfg
-e, -exclude  exclude host matching specified filter ('cdn', 'private-ips', cidr, ip, regex)
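
As a minimal sketch of how the input flags combine, the Python helper below assembles a Katana command line from a target URL and an optional exclusion filter. The helper name build_input_args and the example URL are illustrative assumptions; only the -u and -e flags come from the table above. The script prints the command rather than running it, so it works without Katana installed.

```python
import shlex


def build_input_args(target, exclude=None):
    """Return a katana argument list for one target (hypothetical helper)."""
    args = ["katana", "-u", target]
    if exclude is not None:
        # -e takes a filter such as 'cdn', 'private-ips', a CIDR, an IP, or a regex.
        args += ["-e", exclude]
    return args


cmd = build_input_args("https://example.com", exclude="cdn")
# Print the command instead of executing it.
print(shlex.join(cmd))
```

Building the arguments as a list keeps each value a separate argument, which avoids shell-quoting problems when a filter is a regex.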

Configuration

Option                      Description
-r, -resolvers              list of custom resolver (file or comma separated)
-d, -depth                  maximum depth to crawl (default 3)
-jc, -js-crawl              enable endpoint parsing / crawling in javascript file
-jsl, -jsluice              enable jsluice parsing in javascript file (memory intensive)
-ct, -crawl-duration        maximum duration to crawl the target for (s, m, h, d) (default s)
-kf, -known-files           enable crawling of known files (all,robotstxt,sitemapxml), a minimum depth of 3 is required for proper crawling
-mrs, -max-response-size    maximum response size to read (default 4194304)
-timeout                    time to wait for request in seconds (default 10)
-time-stable                time to wait until the page is stable in seconds (default 1)
-aff, -automatic-form-fill  enable automatic form filling (experimental)
-fx, -form-extraction       extract form, input, textarea & select elements in jsonl output
-retry                      number of times to retry the request (default 1)
-proxy                      http/socks5 proxy to use
-td, -tech-detect           enable technology detection
-H, -headers                custom header/cookie to include in all http request in header:value format (file)
-config                     path to the katana configuration file
-fc, -form-config           path to custom form configuration file
-flc, -field-config         path to custom field configuration file
-s, -strategy               Visit strategy (depth-first, breadth-first) (default "depth-first")
-iqp, -ignore-query-params  Ignore crawling same path with different query-param values
-tlsi, -tls-impersonate     enable experimental client hello (ja3) tls randomization
-dr, -disable-redirects     disable following redirects (default false)
-pc, -path-climb            enable path climb (auto crawl parent paths)
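
The configuration flags can be combined in one invocation. The sketch below, assuming a hypothetical helper named build_config_args, maps a few of them (-d, -jc, -s, -H) onto an argument list and rejects strategy values other than the two listed above. Passing one -H flag per header is an assumption; the header value shown is a placeholder.

```python
import shlex


def build_config_args(target, depth=3, js_crawl=False, strategy=None, headers=None):
    """Return a katana argument list using a few Configuration flags (hypothetical helper)."""
    args = ["katana", "-u", target, "-d", str(depth)]
    if js_crawl:
        args.append("-jc")
    if strategy is not None:
        # The table lists exactly these two visit strategies.
        if strategy not in ("depth-first", "breadth-first"):
            raise ValueError(f"unknown strategy: {strategy}")
        args += ["-s", strategy]
    # Assumption: one -H flag per header, each in header:value format.
    for name, value in (headers or {}).items():
        args += ["-H", f"{name}:{value}"]
    return args


cmd = build_config_args(
    "https://example.com",
    depth=5,
    js_crawl=True,
    strategy="breadth-first",
    headers={"Cookie": "session=placeholder"},
)
print(shlex.join(cmd))
```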

Debug

Option              Description
-health-check, -hc  run diagnostic check up
-elog, -error-log   file to write sent requests error log
-pprof-server       enable pprof server

Headless

Option                     Description
-hl, -headless             enable headless hybrid crawling (experimental)
-sc, -system-chrome        use local installed chrome browser instead of katana installed
-sb, -show-browser         show the browser on the screen with headless mode
-ho, -headless-options     start headless chrome with additional options
-nos, -no-sandbox          start headless chrome in --no-sandbox mode
-cdd, -chrome-data-dir     path to store chrome browser data
-scp, -system-chrome-path  use specified chrome browser for headless crawling
-noi, -no-incognito        start headless chrome without incognito mode
-cwu, -chrome-ws-url       use chrome browser instance launched elsewhere with debugger URL
-xhr, -xhr-extraction      extract xhr request url, method in jsonl output
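
A headless crawl is enabled with -hl and tuned with the other flags in this table. The following sketch, with the hypothetical helper name build_headless_args, shows one plausible combination. Adding -j whenever -xhr is requested is an assumption based on the description of -xhr, which mentions jsonl output.

```python
import shlex


def build_headless_args(target, system_chrome=False, no_sandbox=False, xhr=False):
    """Return a katana argument list for a headless crawl (hypothetical helper)."""
    args = ["katana", "-u", target, "-hl"]
    if system_chrome:
        # Use the locally installed Chrome instead of the one katana installs.
        args.append("-sc")
    if no_sandbox:
        args.append("-nos")
    if xhr:
        # Assumption: XHR data appears in JSONL output, so -j is added as well.
        args += ["-xhr", "-j"]
    return args


cmd = build_headless_args("https://example.com", system_chrome=True, xhr=True)
print(shlex.join(cmd))
```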

Scope

Option                   Description
-cs, -crawl-scope        in scope url regex to be followed by crawler
-cos, -crawl-out-scope   out of scope url regex to be excluded by crawler
-fs, -field-scope        pre-defined scope field (dn, rdn, fqdn) or custom regex (e.g., '(company-staging.io
-ns, -no-scope           disables host based default scope
-do, -display-out-scope  display external endpoint from scoped crawling
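
Because -cs and -cos take regular expressions, literal domain names should be escaped before they are passed in. The sketch below builds an alternation pattern from a list of domains and attaches it to -cs. The helper names and domains are hypothetical, and the pattern is checked here with Python's re module, whose syntax may differ from the regex engine Katana uses for less simple patterns.

```python
import re
import shlex


def scope_regex(domains):
    """Build one alternation regex that matches any of the given domains literally."""
    return "(" + "|".join(re.escape(d) for d in domains) + ")"


def build_scope_args(target, in_scope_domains=None, field_scope=None):
    """Return a katana argument list with scope flags (hypothetical helper)."""
    args = ["katana", "-u", target]
    if field_scope is not None:
        # Pre-defined values listed in the table: dn, rdn, fqdn.
        args += ["-fs", field_scope]
    if in_scope_domains:
        args += ["-cs", scope_regex(in_scope_domains)]
    return args


pattern = scope_regex(["example.com", "staging.example.org"])
cmd = build_scope_args("https://example.com", ["example.com", "staging.example.org"])
print(shlex.join(cmd))
```

Escaping the dots matters: an unescaped "example.com" would also match hosts such as "exampleXcom.net".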

Filter

Option                   Description
-mr, -match-regex        regex or list of regex to match on output url (cli, file)
-fr, -filter-regex       regex or list of regex to filter on output url (cli, file)
-f, -field               field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir) (Deprecated: use -output-template instead)
-sf, -store-field        field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-em, -extension-match    match output for given extension (e.g., -em php,html,js)
-ef, -extension-filter   filter output for given extension (e.g., -ef png,css)
-mdc, -match-condition   match response with dsl based condition
-fdc, -filter-condition  filter response with dsl based condition
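
The extension flags take comma-separated lists, as in the -em and -ef examples above. The sketch below normalizes a list of extensions into that form before building the arguments. The helper names are hypothetical, and lowercasing or stripping leading dots is a convenience of this sketch, not something Katana is known to require.

```python
import shlex


def normalize_extensions(extensions):
    """Lowercase extensions, strip leading dots, and join them with commas."""
    return ",".join(ext.lower().lstrip(".") for ext in extensions)


def build_filter_args(target, match_ext=None, filter_ext=None):
    """Return a katana argument list with extension filters (hypothetical helper)."""
    args = ["katana", "-u", target]
    if match_ext:
        args += ["-em", normalize_extensions(match_ext)]
    if filter_ext:
        args += ["-ef", normalize_extensions(filter_ext)]
    return args


cmd = build_filter_args("https://example.com", filter_ext=[".PNG", "css"])
print(shlex.join(cmd))
```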

Rate-Limit

Option                    Description
-c, -concurrency          number of concurrent fetchers to use (default 10)
-p, -parallelism          number of concurrent inputs to process (default 10)
-rd, -delay               request delay between each request in seconds
-rl, -rate-limit          maximum requests to send per second (default 150)
-rlm, -rate-limit-minute  maximum number of requests to send per minute

Update

Option                       Description
-up, -update                 update katana to latest version
-duc, -disable-update-check  disable automatic katana update check

Output

Option                     Description
-o, -output                file to write output to
-ot, -output-template      custom output template
-sr, -store-response       store http requests/responses
-srd, -store-response-dir  store http requests/responses to custom directory
-ncb, -no-clobber          do not overwrite output file
-sfd, -store-field-dir     store per-host field to custom directory
-or, -omit-raw             omit raw requests/responses from jsonl output
-ob, -omit-body            omit response body from jsonl output
-j, -jsonl                 write output in jsonl format
-nc, -no-color             disable output content coloring (ANSI escape codes)
-silent                    display output only
-v, -verbose               display verbose output
-debug                     display debug output
-version                   display project version
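
JSONL output (-j) contains one JSON object per line, which makes it straightforward to post-process. The sketch below pulls a single nested field out of each line by a dotted key. The sample records and the key "request.endpoint" are hypothetical; check the field names against real Katana output before relying on them.

```python
import json


def get_path(record, dotted_key):
    """Follow a dotted key such as 'request.endpoint' through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def extract_field(jsonl_lines, dotted_key):
    """Yield the value at dotted_key from each non-empty JSONL line that has it."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        value = get_path(json.loads(line), dotted_key)
        if value is not None:
            yield value


# Hypothetical sample records; real katana output may use different field names.
sample = [
    '{"request": {"method": "GET", "endpoint": "https://example.com/"}}',
    "",
    '{"request": {"method": "GET", "endpoint": "https://example.com/login"}}',
]
endpoints = list(extract_field(sample, "request.endpoint"))
print(endpoints)
```

With Katana installed, a file written with -j and -o could be read the same way by passing the open file object in place of the sample list.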