lychee

A fast, async link checker

Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!

Usage

Usage: lychee [OPTIONS] <inputs>...

The inputs are the sources from which links are collected. These can be files (e.g. README.md), file globs (e.g. "~/git/*/README.md"), remote URLs (e.g. https://example.com/README.md), or standard input (-). NOTE: Use -- to separate inputs from options that allow multiple arguments.
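Since the -- separator is easy to overlook, here is a minimal sketch of an invocation that uses it. The function name check_readmes and the two patterns are illustrative assumptions; --exclude is the option listed under Options. Repeating --exclude once per pattern is also an assumption about how several values are passed.

```shell
# Hypothetical helper: check the given inputs while excluding two URL patterns.
# Everything after `--` is treated as an input, never as an option value.
check_readmes() {
  lychee \
    --exclude '^https://www\.linkedin\.com' \
    --exclude '^https://web\.archive\.org/web/' \
    -- "$@"
}
```

It could then be called as check_readmes README.md docs/guide.md (the file names are placeholders).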

Options

| Option | Description |
| --- | --- |
| -c, --config <CONFIG_FILE> | Configuration file to use [default: lychee.toml] |
| -v, --verbose... | Set verbosity level; more output per occurrence (e.g. -v or -vv) |
| -q, --quiet... | Less output per occurrence (e.g. -q or -qq) |
| -n, --no-progress | Do not show progress bar. This is recommended for non-interactive shells (e.g. for continuous integration) |
| --cache | Use request cache stored on disk at .lycheecache |
| --max-cache-age <MAX_CACHE_AGE> | Discard all cached requests older than this duration [default: 1d] |
| --cache-exclude-status <CACHE_EXCLUDE_STATUS> | A list of status codes that will be ignored from the cache |
| --dump | Don’t perform any link checking. Instead, dump all the links extracted from inputs that would be checked |
| --dump-inputs | Don’t perform any link extraction and checking. Instead, dump all input sources from which links would be collected |
| --archive <ARCHIVE> | Specify the use of a specific web archive. Can be used in combination with --suggest [possible values: wayback] |
| --suggest | Suggest link replacements for broken links, using a web archive. The web archive can be specified with --archive |
| -m, --max-redirects <MAX_REDIRECTS> | Maximum number of allowed redirects [default: 5] |
| --max-retries <MAX_RETRIES> | Maximum number of retries per request [default: 3] |
| --max-concurrency <MAX_CONCURRENCY> | Maximum number of concurrent network requests [default: 128] |
| -T, --threads <THREADS> | Number of threads to utilize. Defaults to number of cores available to the system |
| -u, --user-agent <USER_AGENT> | User agent [default: lychee/0.16.1] |
| -i, --insecure | Proceed for server connections considered insecure (invalid TLS) |
| -s, --scheme <SCHEME> | Only test links with the given schemes (e.g. https). Omit to check links with any other scheme. At the moment, we support http, https, file, and mailto |
| --offline | Only check local files and block network requests |
| --include <INCLUDE> | URLs to check (supports regex). Has preference over all excludes |
| --exclude <EXCLUDE> | Exclude URLs and mail addresses from checking (supports regex) |
| --exclude-file <EXCLUDE_FILE> | Deprecated; use --exclude-path instead |
| --exclude-path <EXCLUDE_PATH> | Exclude file path from getting checked |
| -E, --exclude-all-private | Exclude all private IPs from checking. Equivalent to --exclude-private --exclude-link-local --exclude-loopback |
| --exclude-private | Exclude private IP address ranges from checking |
| --exclude-link-local | Exclude link-local IP address range from checking |
| --exclude-loopback | Exclude loopback IP address range and localhost from checking |
| --exclude-mail | Exclude all mail addresses from checking (deprecated; excluded by default) |
| --include-mail | Also check email addresses |
| --remap <REMAP> | Remap URI matching pattern to different URI |
| --header <HEADER> | Custom request header |
| -a, --accept <ACCEPT> | A list of accepted status codes for valid links |
| --include-fragments | Enable the checking of fragments in links |
| -t, --timeout <TIMEOUT> | Website timeout in seconds from connect to response finished [default: 20] |
| -r, --retry-wait-time <RETRY_WAIT_TIME> | Minimum wait time in seconds between retries of failed requests [default: 1] |
| -X, --method <METHOD> | Request method [default: get] |
| -b, --base <BASE> | Base URL or website root directory to check relative URLs e.g. https://example.com or /path/to/public |
| --basic-auth <BASIC_AUTH> | Basic authentication support. E.g. http://example.com username:password |
| --github-token <GITHUB_TOKEN> | GitHub API token to use when checking github.com links, to avoid rate limiting [env: $GITHUB_TOKEN] |
| --skip-missing | Skip missing input files (default is to error if they don’t exist) |
| --no-ignore | Do not skip files that would otherwise be ignored by ‘.gitignore’, ‘.ignore’, or the global ignore file |
| --hidden | Do not skip hidden directories and files |
| --include-verbatim | Find links in verbatim sections like pre- and code blocks |
| --glob-ignore-case | Ignore case when expanding filesystem path glob inputs |
| -o, --output <OUTPUT> | Output file of status report |
| --mode <MODE> | Set the output display mode. Determines how results are presented in the terminal [default: color] [possible values: plain, color, emoji] |
| -f, --format <FORMAT> | Output format of final status report [default: compact] [possible values: compact, detailed, json, markdown, raw] |
| --require-https | When HTTPS is available, treat HTTP links as errors |
| --cookie-jar <COOKIE_JAR> | Tell lychee to read cookies from the given file. Cookies will be stored in the cookie jar and sent with requests. New cookies will be stored in the cookie jar and existing cookies will be updated |
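To show how several of these options might fit together, the following is a rough sketch of a wrapper for non-interactive runs. The function name ci_link_check and the report file name lychee-report.md are hypothetical; the flags themselves are taken from the table above, and the values for --max-cache-age, --max-retries, and --timeout simply repeat the documented defaults.

```shell
# Hypothetical CI wrapper: no progress bar, on-disk cache, private IPs skipped,
# and a Markdown report written to a file (the file name is an assumption).
ci_link_check() {
  lychee \
    --no-progress \
    --cache --max-cache-age 1d \
    --exclude-all-private \
    --max-retries 3 --timeout 20 \
    --format markdown --output lychee-report.md \
    "$@"
}
```

A call such as ci_link_check README.md would then check a single file with those settings.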

Configuration

The configuration file is a TOML file that can be used to specify the options that are also available on the command line. It comes in handy when you want to specify a lot of options, or when you want to configure lychee for continuous integration as part of a repository (configuration as code).

If no other configuration file is specified, lychee uses ./lychee.toml in the current working directory. Here is an example of a configuration file. Please find the latest version on GitHub.

#############################  Display  #############################
 
# Verbose program output
# Accepts log level: "error", "warn", "info", "debug", "trace"
verbose = "info"
 
# Don't show interactive progress bar while checking links.
no_progress = false
 
# Path to summary output file.
output = ".config.dummy.report.md"
 
#############################  Cache  ###############################
 
# Enable link caching. This can be helpful to avoid checking the same links on
# multiple runs.
cache = true
 
# Discard all cached requests older than this duration.
max_cache_age = "2d"
 
#############################  Runtime  #############################
 
# Number of threads to utilize.
# Defaults to number of cores available to the system if omitted.
threads = 2
 
# Maximum number of allowed redirects.
max_redirects = 10
 
# Maximum number of allowed retries before a link is declared dead.
max_retries = 2
 
# Maximum number of concurrent link checks.
max_concurrency = 14
 
#############################  Requests  ############################
 
# User agent to send with each request.
user_agent = "curl/7.83.1"
 
# Website timeout from connect to response finished.
timeout = 20
 
# Minimum wait time in seconds between retries of failed requests.
retry_wait_time = 2
 
# Comma-separated list of accepted status codes for valid links.
# Supported values are:
#
# accept = ["200..=204", "429"]
# accept = "200..=204, 429"
# accept = ["200", "429"]
# accept = "200, 429"
accept = ["200", "429"]
 
# Proceed for server connections considered insecure (invalid TLS).
insecure = false
 
# Only test links with the given schemes (e.g. https).
# Omit to check links with any other scheme.
# At the moment, we support http, https, file, and mailto.
scheme = ["https"]
 
# When links are available using HTTPS, treat HTTP links as errors.
require_https = false
 
# Request method
method = "get"
 
# Custom request headers
headers = []
 
# Remap URI matching pattern to different URI.
remap = ["https://example.com http://example.invalid"]
 
# Base URL or website root directory to check relative URLs.
base = "https://example.com"
 
# HTTP basic auth support. This will be the username and password passed to the
# authorization HTTP header. See
# <https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization>
basic_auth = ["example.com user:pwd"]
 
#############################  Exclusions  ##########################
 
# Skip missing input files (default is to error if they don't exist).
skip_missing = false
 
# Check links inside `<code>` and `<pre>` blocks as well as Markdown code
# blocks.
include_verbatim = false
 
# Ignore case of paths when matching glob patterns.
glob_ignore_case = false
 
# Exclude URLs and mail addresses from checking (supports regex).
exclude = ['^https://www\.linkedin\.com', '^https://web\.archive\.org/web/']
 
# Exclude these filesystem paths from getting checked.
exclude_path = ["file/path/to/Ignore", "./other/file/path/to/Ignore"]
 
# URLs to check (supports regex). Has preference over all excludes.
include = ['gist\.github\.com.*']
 
# Exclude all private IPs from checking.
# Equivalent to setting `exclude_private`, `exclude_link_local`, and
# `exclude_loopback` to true.
exclude_all_private = false
 
# Exclude private IP address ranges from checking.
exclude_private = false
 
# Exclude link-local IP address range from checking.
exclude_link_local = false
 
# Exclude loopback IP address range and localhost from checking.
exclude_loopback = false
 
# Check mail addresses
include_mail = true
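As a small illustration of using such a file from a script, the sketch below writes a reduced configuration into a scratch directory and points lychee at it with --config. The helper name check_with_config and the choice of keys are assumptions made for this example; the keys and values are borrowed from the example above.

```shell
# Write a small lychee.toml (keys taken from the example above) into a scratch directory.
config_dir="$(mktemp -d)"
cat > "$config_dir/lychee.toml" <<'EOF'
# Cache results and discard entries older than two days.
cache = true
max_cache_age = "2d"

# Status codes treated as valid, using the range form shown above.
accept = ["200..=204", "429"]

# Skip private, link-local, and loopback addresses.
exclude_all_private = true
EOF

# Hypothetical helper: run lychee with that file instead of ./lychee.toml.
check_with_config() {
  lychee --config "$config_dir/lychee.toml" "$@"
}
```

Running check_with_config . would then check the current directory with those settings.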

GitHub Action

lychee is also available as a GitHub Action. This way you can set up a job which regularly checks all links in your repository. If you like, it can open an issue when lychee finds problems with your links.

Here is a full example of a GitHub workflow file. It checks all repository links once per day and creates an issue in case of errors. Save it under .github/workflows/links.yml:

name: Links
 
on:
  repository_dispatch:
  workflow_dispatch:
  schedule:
    - cron: "00 18 * * *"
 
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    permissions:
      issues: write # required for peter-evans/create-issue-from-file
    steps:
      - uses: actions/checkout@v4
 
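      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          # Don't fail this step on broken links, so the next step can run
          fail: false

      - name: Create Issue From File
        if: steps.lychee.outputs.exit_code != 0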
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: Link Checker Report
          content-filepath: ./lychee/out.md
          labels: report, automated issue

Here is how to pass arguments to the action:

- name: Link Checker
  uses: lycheeverse/lychee-action@v2
  with:
    # Check all markdown, html and reStructuredText files in repo (default)
    args: --base . --verbose --no-progress './**/*.md' './**/*.html' './**/*.rst'
    # Use json as output format (instead of markdown)
    format: json
    # Use different output file path
    output: /tmp/foo.txt
    # Use a custom GitHub token
    token: ${{ secrets.CUSTOM_TOKEN }}
    # Don't fail action on broken links
    fail: false

Examples

Check All Links In Current Directory:

The following command recursively checks all links in all supported files inside the current directory.

lychee .

Check All Links On A Website:

lychee https://example.com

Check Only Specific Files:

lychee README.md
lychee test.html info.txt
lychee test.html info.txt https://example.com

Check Links In Directories, But Block All Network Requests:

lychee --offline path/to/directory

Check Links In A Remote File:

lychee https://raw.githubusercontent.com/lycheeverse/lychee/master/README.md

Check Links From Stdin:

cat test.md | lychee -
echo 'https://example.com' | lychee -
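
List Links Without Checking Them:

The --dump option described under Options can be combined with ordinary shell tools. The following is only a sketch: the helper name list_links is hypothetical, and it assumes that --dump prints one link per line on standard output.

```shell
# Hypothetical helper: list the unique links lychee would check, without checking them.
# Assumes --dump prints one link per line on standard output.
list_links() {
  lychee --dump "$@" | sort -u
}
```

For example, list_links README.md would print the deduplicated links found in README.md.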