Katana
Katana is a command-line (CLI) web crawler written in Go, designed to be fast and efficient and to produce simple output. It crawls websites to gather information and endpoints. One of its defining features is headless browsing, which lets it crawl single-page applications (SPAs) built with technologies such as JavaScript, Angular, or React and gather information from them effectively.
Usage
Usage: katana [flags]
Options
Input
| Option | Description |
|---|---|
| -u, -list | target url / list to crawl |
| -resume | resume scan using resume.cfg |
| -e, -exclude | exclude host matching specified filter ('cdn', 'private-ips', cidr, ip, regex) |
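
As a rough illustration of the input options, the sketch below wraps two invocations in shell functions: one for a single URL and one for a list of targets. The function names, `urls.txt`, and `https://example.com` are placeholders, and the one-URL-per-line file layout is an assumption rather than something the option table states.

```shell
# Crawl a single target, skipping hosts that match the 'cdn' filter.
crawl_one() {
  katana -u "$1" -exclude cdn
}

# Crawl every target in a list file, skipping private IP ranges.
# Assumption: the file holds one URL per line.
crawl_list() {
  katana -list "$1" -exclude private-ips
}

# Example calls (placeholder values):
#   crawl_one https://example.com
#   crawl_list urls.txt
```

Only crawl hosts you are authorized to test; the exclusion filters reduce accidental requests but do not replace an explicit scope.
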
Configuration
| Option | Description |
|---|---|
| -r, -resolvers | list of custom resolver (file or comma separated) |
| -d, -depth | maximum depth to crawl (default 3) |
| -jc, -js-crawl | enable endpoint parsing / crawling in javascript file |
| -jsl, -jsluice | enable jsluice parsing in javascript file (memory intensive) |
| -ct, -crawl-duration | maximum duration to crawl the target for (s, m, h, d) (default s) |
| -kf, -known-files | enable crawling of known files (all,robotstxt,sitemapxml), a minimum depth of 3 is required for proper crawling |
| -mrs, -max-response-size | maximum response size to read (default 4194304) |
| -timeout | time to wait for request in seconds (default 10) |
| -time-stable | time to wait until the page is stable in seconds (default 1) |
| -aff, -automatic-form-fill | enable automatic form filling (experimental) |
| -fx, -form-extraction | extract form, input, textarea & select elements in jsonl output |
| -retry | number of times to retry the request (default 1) |
| -proxy | http/socks5 proxy to use |
| -td, -tech-detect | enable technology detection |
| -H, -headers | custom header/cookie to include in all http request in header:value format (file) |
| -config | path to the katana configuration file |
| -fc, -form-config | path to custom form configuration file |
| -flc, -field-config | path to custom field configuration file |
| -s, -strategy | Visit strategy (depth-first, breadth-first) (default "depth-first") |
| -iqp, -ignore-query-params | Ignore crawling same path with different query-param values |
| -tlsi, -tls-impersonate | enable experimental client hello (ja3) tls randomization |
| -dr, -disable-redirects | disable following redirects (default false) |
| -pc, -path-climb | enable path climb (auto crawl parent paths) |
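
The configuration flags combine freely, so a minimal sketch of two common setups may help. Everything named here (function names, the header value, the proxy address) is hypothetical, and the `10m` duration format is inferred from the units listed for `-ct`, not confirmed by the table.

```shell
# Deeper crawl that also parses JavaScript files and known files.
# Depth 5 is used because -kf needs a minimum depth of 3 to work properly.
crawl_deep() {
  katana -u "$1" -d 5 -jc -kf all -ct 10m -timeout 15 -retry 2
}

# Breadth-first crawl that sends a custom header through a proxy.
# $2 is a header in header:value form; $3 is an http/socks5 proxy URL.
crawl_authenticated() {
  katana -u "$1" -H "$2" -proxy "$3" -s breadth-first
}

# Example calls (placeholder values):
#   crawl_deep https://example.com
#   crawl_authenticated https://example.com "Cookie: session=abc" http://127.0.0.1:8080
```
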
Debug
| Option | Description |
|---|---|
| -health-check, -hc | run diagnostic check up |
| -elog, -error-log | file to write sent requests error log |
| -pprof-server | enable pprof server |
Headless
| Option | Description |
|---|---|
| -hl, -headless | enable headless hybrid crawling (experimental) |
| -sc, -system-chrome | use local installed chrome browser instead of katana installed |
| -sb, -show-browser | show the browser on the screen with headless mode |
| -ho, -headless-options | start headless chrome with additional options |
| -nos, -no-sandbox | start headless chrome in --no-sandbox mode |
| -cdd, -chrome-data-dir | path to store chrome browser data |
| -scp, -system-chrome-path | use specified chrome browser for headless crawling |
| -noi, -no-incognito | start headless chrome without incognito mode |
| -cwu, -chrome-ws-url | use chrome browser instance launched elsewhere with debugger URL |
| -xhr, -xhr-extraction | extract xhr request url, method in jsonl output |
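
For single-page applications, headless mode is usually paired with XHR extraction. The following is a sketch under assumptions, not a recommended configuration: the function names are made up, and the container scenario is only one possible reason to disable the sandbox.

```shell
# Headless crawl using the locally installed Chrome, recording XHR requests.
# XHR data is reported in JSONL output, so -jsonl is enabled as well.
crawl_spa() {
  katana -u "$1" -headless -system-chrome -xhr -jsonl
}

# Variant for environments where Chrome's sandbox is unavailable,
# such as some containers (an assumption about the deployment).
crawl_spa_no_sandbox() {
  katana -u "$1" -headless -no-sandbox -xhr -jsonl
}

# Example call (placeholder value):
#   crawl_spa https://example.com
```
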
Scope
| Option | Description |
|---|---|
| -cs, -crawl-scope | in scope url regex to be followed by crawler |
| -cos, -crawl-out-scope | out of scope url regex to be excluded by crawler |
| -fs, -field-scope | pre-defined scope field (dn, rdn, fqdn) or custom regex (e.g., ‘(company-staging.io |
| -ns, -no-scope | disables host based default scope |
| -do, -display-out-scope | display external endpoint from scoped crawling |
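
Scope flags decide which discovered URLs the crawler follows. One possible combination is sketched here; the function name and the example regexes (`/api/`, `/logout`) are illustrative assumptions only.

```shell
# Keep the crawl on the root domain name (rdn), follow only URLs
# matching the regex in $2, and skip URLs matching the regex in $3.
crawl_scoped() {
  katana -u "$1" -fs rdn -cs "$2" -cos "$3"
}

# Example call (placeholder values):
#   crawl_scoped https://example.com '/api/' '/logout'
```

Excluding a logout path this way is a common precaution when crawling with a session cookie, since following it could end the session mid-crawl.
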
Filter
| Option | Description |
|---|---|
| -mr, -match-regex | regex or list of regex to match on output url (cli, file) |
| -fr, -filter-regex | regex or list of regex to filter on output url (cli, file) |
| -f, -field | field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir) (Deprecated: use -output-template instead) |
| -sf, -store-field | field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir) |
| -em, -extension-match | match output for given extension (e.g., -em php,html,js) |
| -ef, -extension-filter | filter output for given extension (e.g., -ef png,css) |
| -mdc, -match-condition | match response with dsl based condition |
| -fdc, -filter-condition | filter response with dsl based condition |
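
Match flags keep only the output that satisfies a condition, while filter flags drop it. A small sketch of both directions, with hypothetical function names and a caller-supplied regex:

```shell
# Keep only URLs with a .js or .json extension in the output.
list_scripts() {
  katana -u "$1" -jc -em js,json
}

# Drop static assets, plus any URL matching the regex in $2.
list_without_static() {
  katana -u "$1" -ef png,css -fr "$2"
}

# Example calls (placeholder values):
#   list_scripts https://example.com
#   list_without_static https://example.com 'logout'
```
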
Rate-Limit
| Option | Description |
|---|---|
| -c, -concurrency | number of concurrent fetchers to use (default 10) |
| -p, -parallelism | number of concurrent inputs to process (default 10) |
| -rd, -delay | request delay between each request in seconds |
| -rl, -rate-limit | maximum requests to send per second (default 150) |
| -rlm, -rate-limit-minute | maximum number of requests to send per minute |
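
The defaults (10 fetchers, 150 requests per second) can be heavy for small or fragile targets. The sketch below shows two ways to throttle a crawl; the specific numbers and function names are arbitrary choices for illustration, not recommended values.

```shell
# Low-impact crawl: 2 concurrent fetchers, at most 5 requests per second.
crawl_gently() {
  katana -u "$1" -c 2 -rl 5
}

# Same idea with the cap expressed per minute instead of per second.
crawl_per_minute() {
  katana -u "$1" -c 2 -rlm 60
}

# Example call (placeholder value):
#   crawl_gently https://example.com
```
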
Update
| Option | Description |
|---|---|
| -up, -update | update katana to latest version |
| -duc, -disable-update-check | disable automatic katana update check |
Output
| Option | Description |
|---|---|
| -o, -output | file to write output to |
| -ot, -output-template | custom output template |
| -sr, -store-response | store http requests/responses |
| -srd, -store-response-dir | store http requests/responses to custom directory |
| -ncb, -no-clobber | do not overwrite output file |
| -sfd, -store-field-dir | store per-host field to custom directory |
| -or, -omit-raw | omit raw requests/responses from jsonl output |
| -ob, -omit-body | omit response body from jsonl output |
| -j, -jsonl | write output in jsonl format |
| -nc, -no-color | disable output content coloring (ANSI escape codes) |
| -silent | display output only |
| -v, -verbose | display verbose output |
| -debug | display debug output |
| -version | display project version |
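
To show how the output flags fit together, here is a hedged sketch of two ways to persist results. The function names, `results.jsonl`, and `responses/` are placeholders chosen for this example.

```shell
# Write compact JSONL results to a file: raw requests/responses and
# response bodies are omitted, and -silent limits the terminal to results.
crawl_to_jsonl() {
  katana -u "$1" -jsonl -omit-raw -omit-body -o "$2" -silent
}

# Store the HTTP requests/responses in a custom directory instead.
crawl_and_store() {
  katana -u "$1" -srd "$2"
}

# Example calls (placeholder values):
#   crawl_to_jsonl https://example.com results.jsonl
#   crawl_and_store https://example.com responses/
```

Omitting raw data and bodies keeps the JSONL file small, which matters on large crawls; drop those two flags when the full responses are needed for later analysis.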