htmlq

Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.

Usage

Usage: htmlq [FLAGS] [OPTIONS] [--] [selector]...
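
A minimal sketch of the simplest invocation: one CSS selector as the positional argument, with the document piped in on stdin. The sample HTML and the variable names (html, intro) are made up for illustration, and the guard only lets the snippet run cleanly on a machine without htmlq installed.

```shell
#!/bin/sh
# Hypothetical sample document, used only for illustration.
html='<div id="main"><h1>Title</h1><p class="intro">Hello</p></div>'

if command -v htmlq >/dev/null 2>&1; then
  # The selector is the positional argument; input is read from stdin.
  intro=$(printf '%s' "$html" | htmlq 'p.intro')
  printf '%s\n' "$intro"
else
  echo "htmlq not found on PATH; skipping example" >&2
fi
```

With htmlq installed, this should print the serialised <p> element that matches the selector.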

Options

Option                             Description
-B, --detect-base                  Try to detect the base URL from the <base> tag in the document. If not found, default to the value of --base, if supplied
-w, --ignore-whitespace            When printing text nodes, ignore those that consist entirely of whitespace
-p, --pretty                       Pretty-print the serialised output
-t, --text                         Output only the contents of text nodes inside selected elements
-a, --attribute <attribute>        Only return this attribute (if present) from selected elements
-b, --base <base>                  Use this URL as the base for links
-f, --filename <FILE>              The input file. Defaults to stdin
-o, --output <FILE>                The output file. Defaults to stdout
-r, --remove-nodes <SELECTOR>...   Remove nodes matching this expression before output. May be specified multiple times
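
A few of these options can be sketched together as follows. The sample HTML and the variable names (html, links, title, cleaned) are assumptions for illustration. Because --remove-nodes accepts more than one value, the selector is placed before it so that it is not read as another node to remove.

```shell
#!/bin/sh
# Hypothetical sample document, used only for illustration.
html='<article><h1>Post</h1><nav><a href="/home">Home</a></nav><p>Body <a href="/more">more</a></p></article>'

if command -v htmlq >/dev/null 2>&1; then
  # --attribute: return only the href attribute of each selected <a>.
  links=$(printf '%s' "$html" | htmlq --attribute href a)

  # --text: output only the text nodes inside the selected element.
  title=$(printf '%s' "$html" | htmlq --text h1)

  # --remove-nodes: drop <nav> before output; --pretty: pretty-print the result.
  cleaned=$(printf '%s' "$html" | htmlq --pretty article --remove-nodes nav)

  printf '%s\n' "$links" "$title" "$cleaned"
else
  echo "htmlq not found on PATH; skipping example" >&2
fi
```

To read from a file and write to a file instead of using stdin and stdout, the same calls can take --filename and --output, for example htmlq --filename page.html --output links.txt --attribute href a (both file names here are hypothetical).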