Styles

Vale has a powerful extension system that doesn't require knowledge of any programming language. Instead, it uses collections of individual YAML files (or "rules") to enforce particular writing constructs.

HeadingPunctuation.yml
# An example rule from the "Microsoft" style.
extends: existence
message: "Don't use end punctuation in headings."
link: https://docs.microsoft.com/en-us/style-guide/punctuation/periods
nonword: true
level: warning
scope: heading
action:
name: edit
params:
- remove
- '.?!'
tokens:
- '[a-z0-9][.?!](?:\s|$)'

These collections are referred to as styles and are organized in a nested folder structure at a user-specified location (see Configuration). For example,

styles/
β”œβ”€β”€ base/
β”‚ β”œβ”€β”€ ComplexWords.yml
β”‚ β”œβ”€β”€ SentenceLength.yml
β”‚ ...
β”œβ”€β”€ blog/
β”‚ β”œβ”€β”€ TechTerms.yml
β”‚ ...
└── docs/
β”œβ”€β”€ Branding.yml
...

where base, blog, and docs are your styles that each contain certain rules.

Extension points

Heads up!

Vale uses Go's regexp package to evaluate all patterns in rule definitions. This means that lookarounds and backreferences aren't supported.

This shouldn't be a limitation, though, as Vale offers its own ways to achieve same behavior.

The building blocks of styles are rules (YAML files ending in .yml), which utilize extension points to perform specific tasks.

The basic structure of a rule consists of a small header (shown below) followed by extension-specific arguments.

# All rules should define the following header keys:
#
# `extends` indicates the extension point being used (see below for information
# on the possible values).
extends: existence
# `message` is shown to the user when the rule is broken.
#
# Many extension points accept format specifiers (%s), which are replaced by
# extracted values. See the exention-specific sections below for more details.
message: "Consider removing '%s'"
# `level` assigns the rule's severity.
#
# The accepted values are suggestion, warning, and error.
level: warning
# `scope` specifies where this rule should apply -- e.g., headings, sentences, etc.
#
# See the Markup section for more information on scoping.
scope: heading
# `link` gives the source for this rule.
link: 'https://errata.ai/'
# The number of times this rule should raise an alert.
#
# By default, there is no limit.
limit: 1

The available extension points are discussed below.

existence

Example definition

extends: existence
message: Consider removing '%s'
level: warning
code: false
ignorecase: true
tokens:
- appears to be
- arguably

Key summary

NameTypeDescription
appendboolAdds raw to the end of tokens, assuming both are defined.
ignorecaseboolMakes all matches case-insensitive.
nonwordboolRemoves the default word boundaries (\b).
rawarrayA list of tokens to be concatenated into a pattern.
tokensarrayA list of tokens to be transformed into a non-capturing group.

The most general extension point is existence. As its name implies, it looks for the "existence" of particular tokens.

These tokens can be anything from simple phrases (as in the above example) to complex regular expressionsβ€”e.g., the number of spaces between sentences or the position of punctuation after quotes.

You may define the tokens as elements of lists named either tokens (shown above) or raw. The former converts its elements into a word-bounded, non-capturing group. For instance,

tokens:
- appears to be
- arguably

becomes \b(?:appears to be|arguably)\b.

raw, on the other hand, simply concatenates its elementsβ€”so, something like

raw:
- '(?:foo)\sbar'
- '(baz)'

becomes (?:foo)\sbar(baz).

substitution

Example definition

extends: substitution
message: Consider using '%s' instead of '%s'
level: warning
ignorecase: false
# swap maps tokens in form of bad: good
swap:
abundance: plenty
accelerate: speed up

Key summary

NameTypeDescription
appendboolAdds raw to the end of tokens, assuming both are defined.
ignorecaseboolMakes all matches case-insensitive.
nonwordboolRemoves the default word boundaries (\b).
swapmapA sequence of observed: expected pairs.

substitution associates a string with a preferred form. If we want to suggest the use of "plenty" instead of "abundance," for example, we'd write:

swap:
abundance: plenty

The keys may be regular expressions, but they can't include nested capture groups:

swap:
'(?:give|gave) rise to': lead to # this is okay
'(give|gave) rise to': lead to # this is bad!

Like existence, substitution accepts the keys ignorecase and nonword.

substitution can have one or two %s format specifiers in its message. This allows us to do either of the following:

message: "Consider using '%s' instead of '%s'"
# or
message: "Consider using '%s'"

occurrence

Example definition

extends: occurrence
message: "More than 3 commas!"
level: error
# Here, we're counting the number of times a comma appears
# in a sentence.
#
# If it occurs more than 3 times, we'll flag it.
scope: sentence
ignorecase: false
max: 3
token: ','

Key summary

NameTypeDescription
maxintThe maximum amount of times token may appear in a given scope.
minintThe minimum amount of times token has to appear in a given scope.
tokenstringThe token of interest.

occurrence enforces the maximum or minimum number of times a particular token can appear in a given scope.

In the example above, we're limiting the number of words per sentence.

This is the only extension point that doesn't accept a format specifier in its message.

repetition

Example definition

extends: repetition
message: "'%s' is repeated!"
level: error
alpha: true
tokens:
- '[^\s]+'

Key summary

NameTypeDescription
ignorecaseboolMakes all matches case-insensitive.
alphaboolLimits all matches to alphanumeric tokens.
tokensarrayA list of tokens to be transformed into a non-capturing group.

repetition looks for repeated occurrences of its tokens. If ignorecase is set to true, it'll convert all tokens to lower case for comparison purposes.

consistency

Example definition

extends: consistency
message: "Inconsistent spelling of '%s'"
level: error
scope: text
ignorecase: true
nonword: false
# We only want one of these to appear.
either:
advisor: adviser
centre: center

Key summary

NameTypeDescription
nonwordboolRemoves the default word boundaries (\b).
ignorecaseboolMakes all matches case-insensitive.
eitherarrayA map of option 1: option 2 pairs of which only one may appear.

consistency will ensure that a key and its value (e.g., "advisor" and "adviser") don't both occur in its scope.

conditional

Example definition

extends: conditional
message: "'%s' has no definition"
level: error
scope: text
ignorecase: false
# Ensures that the existence of 'first' implies the existence of 'second'.
first: '\b([A-Z]{3,5})\b'
second: '(?:\b[A-Z][a-z]+ )+\(([A-Z]{3,5})\)'
# ... with the exception of these:
exceptions:
- ABC
- ADD

Key summary

NameTypeDescription
ignorecaseboolMakes all matches case-insensitive.
firststringThe antecedent of the statement.
secondstringThe consequent of the statement.
exceptionsarrayAn array of strings to be ignored.

conditional ensures that the existence of first implies the existence of second. For example, consider the following text:

According to Wikipedia, the World Health Organization (WHO) is a specialized agency of the United Nations that is concerned with international public health. We can now use WHO because it has been defined, but we can't use DAFB because people may not know what it represents. We can use DAFB when it's presented as code, though.

Using the above text with our example rule yields the following:

test.md:1:224:style.UnexpandedAcronyms:'DAFB' has no definition

conditional also takes an optional exceptions list. Any token listed as an exception won't be flagged.

capitalization

Example definition

extends: capitalization
message: "'%s' should be in title case"
level: warning
scope: heading
# $title, $sentence, $lower, $upper, or a pattern.
match: $title
style: AP # AP or Chicago; only applies when match is set to $title.

Key summary

NameTypeDescription
matchstring$title, $sentence, $lower, $upper, or a pattern.
stylestringAP or Chicago; only applies when match is set to $title.
exceptionsarrayAn array of strings to be ignored.

capitalization checks that the text in the specified scope matches the case of match. There are a few pre-defined variables that can be passed as matches:

  • $title: "The Quick Brown Fox Jumps Over the Lazy Dog."
  • $sentence: "The quick brown fox jumps over the lazy dog."
  • $lower: "the quick brown fox jumps over the lazy dog."
  • $upper: "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG."

Additionally, when using match: $title, you can specify a style of either AP or Chicago.

readability

Example definition

extends: readability
message: "Grade level (%s) too high!"
level: warning
grade: 8
metrics:
- Flesch-Kincaid
- Gunning Fog

Key summary

NameTypeDescription
metricsarrayOne or more of "Gunning Fog", "Coleman-Liau", "Flesch-Kincaid", "SMOG", and "Automated Readability".
gradefloatThe highest acceptable score.

readability calculates a readability score according the specified metrics. The supported tests are Gunning Fog, Coleman-Liau, Flesch-Kincaid, SMOG, and Automated Readability.

If more than one is listed (as seen above), the scores will be averaged. This is also the only extension point that doesn't accept a scope, as readability is always calculated using the entire document (minus headings, code blocks, and lists).

grade is the highest acceptable score. Using the example above, a warning will be issued if grade exceeds 8.

spelling

Example definition

extends: spelling
message: "Did you really mean '%s'?"
level: error

Key summary

NameTypeDescription
affstringThe path to a Hunspell-compatible .aff file.
customboolTurn off the default filters for acronyms, abbreviations, and numbers.
dicstringThe path to a Hunspell-compatible .dic file.
filtersarrayAn array of patterns to ignore during spell checking.
ignorestringA relative path (from StylesPath) to a personal vocabulary file consisting of one word per line to ignore.

spelling implements spell checking based on Hunspell-compatible dictionaries.

Choosing a dictionary

By default, spelling includes a custom, open-source dictionary for American English. However, you may also provide your own by using the dic and aff keys:

extends: spelling
message: "Did you really mean '%s'?"
level: error
aff: path/to/my/file.aff
dic: path/to/my/file.dic

The values for both aff and dic may be absolute file paths or relative to the current StylesPath.

Ignoring non-dictionary words

spelling offers two different ways of ignoring non-dictionary words:

  1. Using ignore files: Ignore files are plain-text files that list words to be ignored during spell check (one case-insensitive entry per line) . For example,

    ignore.txt
    destructuring
    transpiler

    Here, we're instructing spelling to ignore both [Dd]estructuring and [Tt]ranspiler.

    You can name these files anything you'd like and reference them relative to the active StylesPath:

    extends: spelling
    message: "Did you really mean '%s'?"
    level: error
    ignore:
    # Located at StylesPath/ignore1.txt
    - ignore1.txt
    - ignore2.txt
  2. Using filters: You can also customize the spell-checking experience by defining filters, which are Go-compatible regular expressions to applied to individual words:

    extends: spelling
    message: "Did you really mean '%s'?"
    level: error
    # This disables the built-in filters. If you omit this
    # key or set it to false, custom filters (see below) are
    # added on top of the built-in ones.
    #
    # By default, filters for acronyms, abbreviations, and
    # numbers are included.
    custom: true
    # A "filter" is a regular expression specifying words
    # to ignore during spell checking.
    filters:
    # Ignore all words starting with 'py'.
    #
    # e.g., 'PyYAML'.
    - '[pP]y.*\b'

sequence (v2.3.0)

Example definition

extends: sequence
message: "The infinitive '%[4]s' after 'be' requries 'to'. Did you mean '%[2]s %[3]s *to* %[4]s'?"
tokens:
- tag: MD
- pattern: be
- tag: JJ
- tag: VB|VBN

Key summary

NameTypeDescription
tokens[]NLPTokenA list of tokens with associated NLP metadata.
ignorecaseboolMakes all matches case-insensitive.

Unlike the other extension points, sequence aims to support grammar-focused rules (rather than "style"). Its built on top of prose (an open-source natural language processing library) and makes significant use of part-of-speech tagging.

An individual NLPToken has the following structure:

# [optional]: A regular expression (required
# if `tag` isn't given).
pattern: '...'
# [optional]: If true, indicates that we
# *shouldn't* match this token.
negate: true # or false
# [optional]: A part-of-speech tag (required
# if `pattern` isn't given).
tag: '...'
# [optional]: An integer meaning that there may
# be up to `n` (3, in this case) tokens between
# this token and the next one.
skip: 3

The example rule illustrates the basics of the extension point: we're looking for a particular sequence of tokens (which can either be a regular expression or a part-of-speech tag). The | notation means that we'll accept VB or VBN in position 4.

There's also new notation in the message: %[4]s is like %s, but specifically refers to the 4th token in our sequence.

Built-in style

Heads up!

The built-in Vale.Spelling rule will respect an ignore file stored at StylesPath/vocab.txt. However, the recommended approach is to use a custom Vocab instead.

Vale comes with a single built-in style named Vale that implements three rules, as described in the table below.

RuleScopeLevelDescription
Vale.SpellingtexterrorSpell checks text while respecting the active project's vocabulary.
Vale.TermstexterrorEnforces the current project's Preferred vocabulary terms.
Vale.AvoidtexterrorEnforces the current project's Do not use vocabulary terms.

Third-party styles

Vale has a growing selection of pre-made styles available for download from its style library.