Involved Source Filesbacktrack.goexec.goonepass.go
Package regexp implements regular expression search.
The syntax of the regular expressions accepted is the same
general syntax used by Perl, Python, and other languages.
More precisely, it is the syntax accepted by RE2 and described at
https://golang.org/s/re2syntax, except for \C.
For an overview of the syntax, run
go doc regexp/syntax
The regexp implementation provided by this package is
guaranteed to run in time linear in the size of the input.
(This is a property not guaranteed by most open source
implementations of regular expressions.) For more information
about this property, see
https://swtch.com/~rsc/regexp/regexp1.html
or any book about automata theory.
All characters are UTF-8-encoded code points.
There are 16 methods of Regexp that match a regular expression and identify
the matched text. Their names are matched by this regular expression:
Find(All)?(String)?(Submatch)?(Index)?
If 'All' is present, the routine matches successive non-overlapping
matches of the entire expression. Empty matches abutting a preceding
match are ignored. The return value is a slice containing the successive
return values of the corresponding non-'All' routine. These routines take
an extra integer argument, n. If n >= 0, the function returns at most n
matches/submatches; otherwise, it returns all of them.
If 'String' is present, the argument is a string; otherwise it is a slice
of bytes; return values are adjusted as appropriate.
If 'Submatch' is present, the return value is a slice identifying the
successive submatches of the expression. Submatches are matches of
parenthesized subexpressions (also known as capturing groups) within the
regular expression, numbered from left to right in order of opening
parenthesis. Submatch 0 is the match of the entire expression, submatch 1
the match of the first parenthesized subexpression, and so on.
If 'Index' is present, matches and submatches are identified by byte index
pairs within the input string: result[2*n:2*n+1] identifies the indexes of
the nth submatch. The pair for n==0 identifies the match of the entire
expression. If 'Index' is not present, the match is identified by the text
of the match/submatch. If an index is negative or text is nil, it means that
subexpression did not match any string in the input. For 'String' versions
an empty string means either no match or an empty match.
There is also a subset of the methods that can be applied to text read
from a RuneReader:
MatchReader, FindReaderIndex, FindReaderSubmatchIndex
This set may grow. Note that regular expression matches may need to
examine text beyond the text returned by a match, so the methods that
match text from a RuneReader may read arbitrarily far into the input
before returning.
(There are a few other methods that do not match this pattern.)
Code Examples
package main
import (
"fmt"
"regexp"
)
func main() {
// Compile the expression once, usually at init time.
// Use raw strings to avoid having to quote the backslashes.
var validID = regexp.MustCompile(`^[a-z]+\[[0-9]+\]$`)
fmt.Println(validID.MatchString("adam[23]"))
fmt.Println(validID.MatchString("eve[7]"))
fmt.Println(validID.MatchString("Job[48]"))
fmt.Println(validID.MatchString("snakey"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
matched, err := regexp.Match(`foo.*`, []byte(`seafood`))
fmt.Println(matched, err)
matched, err = regexp.Match(`bar.*`, []byte(`seafood`))
fmt.Println(matched, err)
matched, err = regexp.Match(`a(b`, []byte(`seafood`))
fmt.Println(matched, err)
}
package main
import (
"fmt"
"regexp"
)
func main() {
matched, err := regexp.MatchString(`foo.*`, "seafood")
fmt.Println(matched, err)
matched, err = regexp.MatchString(`bar.*`, "seafood")
fmt.Println(matched, err)
matched, err = regexp.MatchString(`a(b`, "seafood")
fmt.Println(matched, err)
}
package main
import (
"fmt"
"regexp"
)
func main() {
fmt.Println(regexp.QuoteMeta(`Escaping symbols like: .+*?()|[]{}^$`))
}
package main
import (
"fmt"
"regexp"
)
func main() {
content := []byte(`
# comment line
option1: value1
option2: value2
# another comment line
option3: value3
`)
// Regex pattern captures "key: value" pair from the content.
pattern := regexp.MustCompile(`(?m)(?P<key>\w+):\s+(?P<value>\w+)$`)
// Template to convert "key: value" to "key=value" by
// referencing the values captured by the regex pattern.
template := []byte("$key=$value\n")
result := []byte{}
// For each match of the regex in the content.
for _, submatches := range pattern.FindAllSubmatchIndex(content, -1) {
// Apply the captured submatches to the template and append the output
// to the result.
result = pattern.Expand(result, template, content, submatches)
}
fmt.Println(string(result))
}
package main
import (
"fmt"
"regexp"
)
func main() {
content := `
# comment line
option1: value1
option2: value2
# another comment line
option3: value3
`
// Regex pattern captures "key: value" pair from the content.
pattern := regexp.MustCompile(`(?m)(?P<key>\w+):\s+(?P<value>\w+)$`)
// Template to convert "key: value" to "key=value" by
// referencing the values captured by the regex pattern.
template := "$key=$value\n"
result := []byte{}
// For each match of the regex in the content.
for _, submatches := range pattern.FindAllStringSubmatchIndex(content, -1) {
// Apply the captured submatches to the template and append the output
// to the result.
result = pattern.ExpandString(result, template, content, submatches)
}
fmt.Println(string(result))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo.?`)
fmt.Printf("%q\n", re.Find([]byte(`seafood fool`)))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo.?`)
fmt.Printf("%q\n", re.FindAll([]byte(`seafood fool`), -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
content := []byte("London")
re := regexp.MustCompile(`o.`)
fmt.Println(re.FindAllIndex(content, 1))
fmt.Println(re.FindAllIndex(content, -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a.`)
fmt.Println(re.FindAllString("paranormal", -1))
fmt.Println(re.FindAllString("paranormal", 2))
fmt.Println(re.FindAllString("graal", -1))
fmt.Println(re.FindAllString("none", -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
fmt.Printf("%q\n", re.FindAllStringSubmatch("-ab-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-axxb-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-ab-axb-", -1))
fmt.Printf("%q\n", re.FindAllStringSubmatch("-axxb-ab-", -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
// Indices:
// 01234567 012345678
// -ab-axb- -axxb-ab-
fmt.Println(re.FindAllStringSubmatchIndex("-ab-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-axxb-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-ab-axb-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-axxb-ab-", -1))
fmt.Println(re.FindAllStringSubmatchIndex("-foo-", -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo(.?)`)
fmt.Printf("%q\n", re.FindAllSubmatch([]byte(`seafood fool`), -1))
}
package main
import (
"fmt"
"regexp"
)
func main() {
content := []byte(`
# comment line
option1: value1
option2: value2
`)
// Regex pattern captures "key: value" pair from the content.
pattern := regexp.MustCompile(`(?m)(?P<key>\w+):\s+(?P<value>\w+)$`)
allIndexes := pattern.FindAllSubmatchIndex(content, -1)
for _, loc := range allIndexes {
fmt.Println(loc)
fmt.Println(string(content[loc[0]:loc[1]]))
fmt.Println(string(content[loc[2]:loc[3]]))
fmt.Println(string(content[loc[4]:loc[5]]))
}
}
package main
import (
"fmt"
"regexp"
)
func main() {
content := []byte(`
# comment line
option1: value1
option2: value2
`)
// Regex pattern captures "key: value" pair from the content.
pattern := regexp.MustCompile(`(?m)(?P<key>\w+):\s+(?P<value>\w+)$`)
loc := pattern.FindIndex(content)
fmt.Println(loc)
fmt.Println(string(content[loc[0]:loc[1]]))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo.?`)
fmt.Printf("%q\n", re.FindString("seafood fool"))
fmt.Printf("%q\n", re.FindString("meat"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`ab?`)
fmt.Println(re.FindStringIndex("tablett"))
fmt.Println(re.FindStringIndex("foo") == nil)
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b(y|z)c`)
fmt.Printf("%q\n", re.FindStringSubmatch("-axxxbyc-"))
fmt.Printf("%q\n", re.FindStringSubmatch("-abzc-"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo(.?)`)
fmt.Printf("%q\n", re.FindSubmatch([]byte(`seafood fool`)))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
// Indices:
// 01234567 012345678
// -ab-axb- -axxb-ab-
fmt.Println(re.FindSubmatchIndex([]byte("-ab-")))
fmt.Println(re.FindSubmatchIndex([]byte("-axxb-")))
fmt.Println(re.FindSubmatchIndex([]byte("-ab-axb-")))
fmt.Println(re.FindSubmatchIndex([]byte("-axxb-ab-")))
fmt.Println(re.FindSubmatchIndex([]byte("-foo-")))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(|b)`)
fmt.Println(re.FindString("ab"))
re.Longest()
fmt.Println(re.FindString("ab"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`foo.?`)
fmt.Println(re.Match([]byte(`seafood fool`)))
fmt.Println(re.Match([]byte(`something else`)))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(gopher){2}`)
fmt.Println(re.MatchString("gopher"))
fmt.Println(re.MatchString("gophergopher"))
fmt.Println(re.MatchString("gophergophergopher"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re0 := regexp.MustCompile(`a.`)
fmt.Printf("%d\n", re0.NumSubexp())
re := regexp.MustCompile(`(.*)((a)b)(.*)a`)
fmt.Println(re.NumSubexp())
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
fmt.Printf("%s\n", re.ReplaceAll([]byte("-ab-axxb-"), []byte("T")))
fmt.Printf("%s\n", re.ReplaceAll([]byte("-ab-axxb-"), []byte("$1")))
fmt.Printf("%s\n", re.ReplaceAll([]byte("-ab-axxb-"), []byte("$1W")))
fmt.Printf("%s\n", re.ReplaceAll([]byte("-ab-axxb-"), []byte("${1}W")))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllLiteralString("-ab-axxb-", "${1}"))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`a(x*)b`)
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))
}
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
re := regexp.MustCompile(`[^aeiou]`)
fmt.Println(re.ReplaceAllStringFunc("seafood fool", strings.ToUpper))
}
package main
import (
"fmt"
"regexp"
)
func main() {
a := regexp.MustCompile(`a`)
fmt.Println(a.Split("banana", -1))
fmt.Println(a.Split("banana", 0))
fmt.Println(a.Split("banana", 1))
fmt.Println(a.Split("banana", 2))
zp := regexp.MustCompile(`z+`)
fmt.Println(zp.Split("pizza", -1))
fmt.Println(zp.Split("pizza", 0))
fmt.Println(zp.Split("pizza", 1))
fmt.Println(zp.Split("pizza", 2))
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
fmt.Println(re.MatchString("Alan Turing"))
matches := re.FindStringSubmatch("Alan Turing")
lastIndex := re.SubexpIndex("last")
fmt.Printf("last => %d\n", lastIndex)
fmt.Println(matches[lastIndex])
}
package main
import (
"fmt"
"regexp"
)
func main() {
re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
fmt.Println(re.MatchString("Alan Turing"))
fmt.Printf("%q\n", re.SubexpNames())
reversed := fmt.Sprintf("${%s} ${%s}", re.SubexpNames()[2], re.SubexpNames()[1])
fmt.Println(reversed)
fmt.Println(re.ReplaceAllString("Alan Turing", reversed))
}
Package-Level Type Names (total 18, in which 1 are exported)
/* sort exporteds by: | */
Regexp is the representation of a compiled regular expression.
A Regexp is safe for concurrent use by multiple goroutines,
except for configuration methods, such as Longest.
// empty-width conditions required at start of match
// as passed to Compile
This field can be modified by the Longest method,
but it is otherwise read-only.
// whether regexp prefers leftmost-longest match
// size of recorded match lengths
maxBitStateLenint
// minimum length of the input in bytes
// pool for machines
numSubexpint
// onepass program or nil
// required prefix in unanchored matches
// prefix, as a []byte
// prefix is the entire regexp
// pc for last rune in prefix
// first rune in prefix
// compiled program
subexpNames[]string
Copy returns a new Regexp object copied from re.
Calling Longest on one copy does not affect another.
Deprecated: In earlier releases, when using a Regexp in multiple goroutines,
giving each goroutine its own copy helped to avoid lock contention.
As of Go 1.12, using Copy is no longer necessary to avoid lock contention.
Copy may still be appropriate if the reason for its use is to make
two copies with different Longest settings.
Expand appends template to dst and returns the result; during the
append, Expand replaces variables in the template with corresponding
matches drawn from src. The match slice should have been returned by
FindSubmatchIndex.
In the template, a variable is denoted by a substring of the form
$name or ${name}, where name is a non-empty sequence of letters,
digits, and underscores. A purely numeric name like $1 refers to
the submatch with the corresponding index; other names refer to
capturing parentheses named with the (?P<name>...) syntax. A
reference to an out of range or unmatched index or a name that is not
present in the regular expression is replaced with an empty slice.
In the $name form, name is taken to be as long as possible: $1x is
equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.
To insert a literal $ in the output, use $$ in the template.
ExpandString is like Expand but the template and source are strings.
It appends to and returns a byte slice in order to give the calling
code control over allocation.
Find returns a slice holding the text of the leftmost match in b of the regular expression.
A return value of nil indicates no match.
FindAll is the 'All' version of Find; it returns a slice of all successive
matches of the expression, as defined by the 'All' description in the
package comment.
A return value of nil indicates no match.
FindAllIndex is the 'All' version of FindIndex; it returns a slice of all
successive matches of the expression, as defined by the 'All' description
in the package comment.
A return value of nil indicates no match.
FindAllString is the 'All' version of FindString; it returns a slice of all
successive matches of the expression, as defined by the 'All' description
in the package comment.
A return value of nil indicates no match.
FindAllStringIndex is the 'All' version of FindStringIndex; it returns a
slice of all successive matches of the expression, as defined by the 'All'
description in the package comment.
A return value of nil indicates no match.
FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it
returns a slice of all successive matches of the expression, as defined by
the 'All' description in the package comment.
A return value of nil indicates no match.
FindAllStringSubmatchIndex is the 'All' version of
FindStringSubmatchIndex; it returns a slice of all successive matches of
the expression, as defined by the 'All' description in the package
comment.
A return value of nil indicates no match.
FindAllSubmatch is the 'All' version of FindSubmatch; it returns a slice
of all successive matches of the expression, as defined by the 'All'
description in the package comment.
A return value of nil indicates no match.
FindAllSubmatchIndex is the 'All' version of FindSubmatchIndex; it returns
a slice of all successive matches of the expression, as defined by the
'All' description in the package comment.
A return value of nil indicates no match.
FindIndex returns a two-element slice of integers defining the location of
the leftmost match in b of the regular expression. The match itself is at
b[loc[0]:loc[1]].
A return value of nil indicates no match.
FindReaderIndex returns a two-element slice of integers defining the
location of the leftmost match of the regular expression in text read from
the RuneReader. The match text was found in the input stream at
byte offset loc[0] through loc[1]-1.
A return value of nil indicates no match.
FindReaderSubmatchIndex returns a slice holding the index pairs
identifying the leftmost match of the regular expression of text read by
the RuneReader, and the matches, if any, of its subexpressions, as defined
by the 'Submatch' and 'Index' descriptions in the package comment. A
return value of nil indicates no match.
FindString returns a string holding the text of the leftmost match in s of the regular
expression. If there is no match, the return value is an empty string,
but it will also be empty if the regular expression successfully matches
an empty string. Use FindStringIndex or FindStringSubmatch if it is
necessary to distinguish these cases.
FindStringIndex returns a two-element slice of integers defining the
location of the leftmost match in s of the regular expression. The match
itself is at s[loc[0]:loc[1]].
A return value of nil indicates no match.
FindStringSubmatch returns a slice of strings holding the text of the
leftmost match of the regular expression in s and the matches, if any, of
its subexpressions, as defined by the 'Submatch' description in the
package comment.
A return value of nil indicates no match.
FindStringSubmatchIndex returns a slice holding the index pairs
identifying the leftmost match of the regular expression in s and the
matches, if any, of its subexpressions, as defined by the 'Submatch' and
'Index' descriptions in the package comment.
A return value of nil indicates no match.
FindSubmatch returns a slice of slices holding the text of the leftmost
match of the regular expression in b and the matches, if any, of its
subexpressions, as defined by the 'Submatch' descriptions in the package
comment.
A return value of nil indicates no match.
FindSubmatchIndex returns a slice holding the index pairs identifying the
leftmost match of the regular expression in b and the matches, if any, of
its subexpressions, as defined by the 'Submatch' and 'Index' descriptions
in the package comment.
A return value of nil indicates no match.
LiteralPrefix returns a literal string that must begin any match
of the regular expression re. It returns the boolean true if the
literal string comprises the entire regular expression.
Longest makes future searches prefer the leftmost-longest match.
That is, when matching against text, the regexp returns a match that
begins as early as possible in the input (leftmost), and among those
it chooses a match that is as long as possible.
This method modifies the Regexp and may not be called concurrently
with any other methods.
Match reports whether the byte slice b
contains any match of the regular expression re.
MatchReader reports whether the text returned by the RuneReader
contains any match of the regular expression re.
MatchString reports whether the string s
contains any match of the regular expression re.
NumSubexp returns the number of parenthesized subexpressions in this Regexp.
ReplaceAll returns a copy of src, replacing matches of the Regexp
with the replacement text repl. Inside repl, $ signs are interpreted as
in Expand, so for instance $1 represents the text of the first submatch.
ReplaceAllFunc returns a copy of src in which all matches of the
Regexp have been replaced by the return value of function repl applied
to the matched byte slice. The replacement returned by repl is substituted
directly, without using Expand.
ReplaceAllLiteral returns a copy of src, replacing matches of the Regexp
with the replacement bytes repl. The replacement repl is substituted directly,
without using Expand.
ReplaceAllLiteralString returns a copy of src, replacing matches of the Regexp
with the replacement string repl. The replacement repl is substituted directly,
without using Expand.
ReplaceAllString returns a copy of src, replacing matches of the Regexp
with the replacement string repl. Inside repl, $ signs are interpreted as
in Expand, so for instance $1 represents the text of the first submatch.
ReplaceAllStringFunc returns a copy of src in which all matches of the
Regexp have been replaced by the return value of function repl applied
to the matched substring. The replacement returned by repl is substituted
directly, without using Expand.
Split slices s into substrings separated by the expression and returns a slice of
the substrings between those expression matches.
The slice returned by this method consists of all the substrings of s
not contained in the slice returned by FindAllString. When called on an expression
that contains no metacharacters, it is equivalent to strings.SplitN.
Example:
s := regexp.MustCompile("a*").Split("abaabaccadaaae", 5)
// s: ["", "b", "b", "c", "cadaaae"]
The count determines the number of substrings to return:
n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings
String returns the source text used to compile the regular expression.
SubexpIndex returns the index of the first subexpression with the given name,
or -1 if there is no subexpression with that name.
Note that multiple subexpressions can be written using the same name, as in
(?P<bob>a+)(?P<bob>b+), which declares two subexpressions named "bob".
In this case, SubexpIndex returns the index of the leftmost such subexpression
in the regular expression.
SubexpNames returns the names of the parenthesized subexpressions
in this Regexp. The name for the first sub-expression is names[1],
so that if m is a match slice, the name for m[i] is SubexpNames()[i].
Since the Regexp as a whole cannot be named, names[0] is always
the empty string. The slice should not be modified.
allMatches calls deliver at most n times
with the location of successive matches in the input text.
The input text is b if non-nil, otherwise s.
backtrack runs a backtracking search of prog on the input starting at pos.
doExecute finds the leftmost match in the input, appends the position
of its subexpressions to dstCap and returns dstCap.
nil is returned if no matches are found and non-nil if matches are found.
doMatch reports whether either r, b or s match the regexp.
doOnePass implements r.doExecute using the one-pass execution engine.
(*T) expand(dst []byte, template string, bsrc []byte, src string, match []int) []byte
get returns a machine to use for matching re.
It uses the re's machine cache if possible, to avoid
unnecessary allocation.
The number of capture values in the program may correspond
to fewer capturing expressions than are in the regexp.
For example, "(a){0}" turns into an empty program, so the
maximum capture in the program is 0 but we need to return
an expression for \1. Pad appends -1s to the slice a as needed.
put returns a machine to the correct machine pool.
(*T) replaceAll(bsrc []byte, src string, nmatch int, repl func(dst []byte, m []int) []byte) []byte
tryBacktrack runs a backtracking search starting at pos.
*T : fmt.Stringer
*T : context.stringer
*T : runtime.stringer
func Compile(expr string) (*Regexp, error)
func CompilePOSIX(expr string) (*Regexp, error)
func MustCompile(str string) *Regexp
func MustCompilePOSIX(str string) *Regexp
func (*Regexp).Copy() *Regexp
func compile(expr string, mode syntax.Flags, longest bool) (*Regexp, error)
var gitlab.com/pcanilho/goneo.supportedIdentifierFormat *Regexp
bitState holds state for the backtracker.
cap[]intendintinputsinputsjobs[]jobmatchcap[]intvisited[]uint32
push pushes (pc, pos, arg) onto the job stack if it should be
visited.
reset resets the state of the backtracker.
end is the end position in the input.
ncap is the number of captures.
shouldVisit reports whether the combination of (pc, pos) has not
been visited yet.
func newBitState() *bitState
func freeBitState(b *bitState)
func (*Regexp).tryBacktrack(b *bitState, i input, pc uint32, pos int) bool
An entry is an entry on a queue.
It holds both the instruction pc and the actual thread.
Some queue entries are just place holders so that the machine
knows it has considered that pc. Such entries have t == nil.
pcuint32t*thread
A job is an entry on the backtracker's job stack. It holds
the instruction pc and the position in the input.
argboolpcuint32posint
A lazyFlag is a lazily-evaluated syntax.EmptyOp,
for checking zero-width flags like ^ $ \A \z \B \b.
It records the pair of relevant runes and does not
determine the implied flags until absolutely necessary
(most of the time, that means never).
( T) match(op syntax.EmptyOp) bool
func newLazyFlag(r1, r2 rune) lazyFlag
A machine holds all the state during an NFA simulation for p.
inputsinputs
// capture information for the match
// whether a match was found
// compiled program
// pool of available threads
// two queues for runq, nextq
// two queues for runq, nextq
// corresponding Regexp
add adds an entry to q for pc, unless the q already has such an entry.
It also recursively adds an entry for all instructions reachable from pc by following
empty-width conditions satisfied by cond. pos gives the current position
in the input.
alloc allocates a new thread with the given instruction.
It uses the free pool if possible.
clear frees all threads on the thread queue.
(*T) init(ncap int)
match runs the machine over the input starting at pos.
It reports whether a match was found.
If so, m.matchcap holds the submatch information.
step executes one step of the machine, running each of the threads
on runq and appending new threads to nextq.
The step processes the rune c (which may be endOfText),
which starts at position pos and ends at nextPos.
nextCond gives the setting for the empty-width flags after c.
func (*Regexp).get() *machine
func (*Regexp).put(m *machine)
A onePassInst is a single instruction in a one-pass regular expression program.
It is the same as syntax.Inst except for the new 'Next' field.
Instsyntax.Inst
// InstAlt, InstAltMatch, InstCapture, InstEmptyWidth
Inst.Opsyntax.InstOp
// all but InstMatch, InstFail
Inst.Rune[]runeNext[]uint32
MatchEmptyWidth reports whether the instruction matches
an empty string between the runes before and after.
It should only be called when i.Op == InstEmptyWidth.
MatchRune reports whether the instruction matches (and consumes) r.
It should only be called when i.Op == InstRune.
MatchRunePos checks whether the instruction matches (and consumes) r.
If so, MatchRunePos returns the index of the matching rune pair
(or, when len(i.Rune) == 1, rune singleton).
If not, MatchRunePos returns -1.
MatchRunePos should only be called when i.Op == InstRune.
(*T) String() string
op returns i.Op but merges all the Rune special cases into InstRune
*T : fmt.Stringer
*T : context.stringer
*T : runtime.stringer
func onePassNext(i *onePassInst, r rune) uint32
A onePassProg is a compiled one-pass regular expression program.
It is the same as syntax.Prog except for the use of onePassInst.
Inst[]onePassInst
// number of InstCapture insts in re
// index of start instruction
func compileOnePass(prog *syntax.Prog) (p *onePassProg)
func makeOnePass(p *onePassProg) *onePassProg
func onePassCopy(prog *syntax.Prog) *onePassProg
func cleanupOnePass(prog *onePassProg, original *syntax.Prog)
func makeOnePass(p *onePassProg) *onePassProg
A queue is a 'sparse array' holding pending threads of execution.
See https://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html
dense[]entrysparse[]uint32
runeSlice exists to permit sorting the case-folded rune sets.
( T) Len() int( T) Less(i, j int) bool( T) Swap(i, j int)
T : sort.Interface
A thread is the state of a single path through the machine:
an instruction and a corresponding capture array.
See https://swtch.com/~rsc/regexp/regexp2.html
cap[]intinst*syntax.Inst
Package-Level Functions (total 30, in which 8 are exported)
Compile parses a regular expression and returns, if successful,
a Regexp object that can be used to match against text.
When matching against text, the regexp returns a match that
begins as early as possible in the input (leftmost), and among those
it chooses the one that a backtracking search would have found first.
This so-called leftmost-first matching is the same semantics
that Perl, Python, and other implementations use, although this
package implements it without the expense of backtracking.
For POSIX leftmost-longest matching, see CompilePOSIX.
CompilePOSIX is like Compile but restricts the regular expression
to POSIX ERE (egrep) syntax and changes the match semantics to
leftmost-longest.
That is, when matching against text, the regexp returns a match that
begins as early as possible in the input (leftmost), and among those
it chooses a match that is as long as possible.
This so-called leftmost-longest matching is the same semantics
that early regular expression implementations used and that POSIX
specifies.
However, there can be multiple leftmost-longest matches, with different
submatch choices, and here this package diverges from POSIX.
Among the possible leftmost-longest matches, this package chooses
the one that a backtracking search would have found first, while POSIX
specifies that the match be chosen to maximize the length of the first
subexpression, then the second, and so on from left to right.
The POSIX rule is computationally prohibitive and not even well-defined.
See https://swtch.com/~rsc/regexp/regexp2.html#posix for details.
Match reports whether the byte slice b
contains any match of the regular expression pattern.
More complicated queries need to use Compile and the full Regexp interface.
MatchReader reports whether the text returned by the RuneReader
contains any match of the regular expression pattern.
More complicated queries need to use Compile and the full Regexp interface.
MatchString reports whether the string s
contains any match of the regular expression pattern.
More complicated queries need to use Compile and the full Regexp interface.
MustCompile is like Compile but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular
expressions.
MustCompilePOSIX is like CompilePOSIX but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular
expressions.
QuoteMeta returns a string that escapes all regular expression metacharacters
inside the argument text; the returned string is a regular expression matching
the literal text.
cleanupOnePass drops working memory, and restores certain shortcut instructions.
compileOnePass returns a new *syntax.Prog suitable for onePass execution if the original Prog
can be recharacterized as a one-pass regexp program, or syntax.nil if the
Prog cannot be converted. For a one pass prog, the fundamental condition that must
be true is: at any InstAlt, there must be no ambiguity about what branch to take.
extract returns the name from a leading "$name" or "${name}" in str.
If it is a number, extract returns num set to that number; otherwise num = -1.
makeOnePass creates a onepass Prog, if possible. It is possible if at any alt,
the match engine can always tell which branch to take. The routine may modify
p if it is turned into a onepass Prog. If it isn't possible for this to be a
onepass Prog, the Prog nil is returned. makeOnePass is recursive
to the size of the Prog.
maxBitStateLen returns the maximum length of a string to search with
the backtracker using prog.
onePassCopy creates a copy of the original Prog, as we'll be modifying it
OnePassNext selects the next actionable state of the prog, based on the input character.
It should only be called when i.Op == InstAlt or InstAltMatch, and from the one-pass machine.
One of the alternates may ultimately lead without input to end of line. If the instruction
is InstAltMatch the path to the InstMatch is in i.Out, the normal node in i.Next.
OnePassPrefix returns a literal string that all matches for the
regexp must start with. Complete is true if the prefix
is the entire match. Pc is the index of the last rune instruction
in the string. The OnePassPrefix skips over the mandatory
EmptyBeginText
Pools of *machine for use during (*Regexp).doExecute,
split up by the size of the execution queues.
matchPool[i] machines have queue size matchSize[i].
On a 64-bit system each queue entry is 16 bytes,
so matchPool[0] has 16*2*128 = 4kB queues, etc.
The final matchPool is a catch-all for very large queues.
Pools of *machine for use during (*Regexp).doExecute,
split up by the size of the execution queues.
matchPool[i] machines have queue size matchSize[i].
On a 64-bit system each queue entry is 16 bytes,
so matchPool[0] has 16*2*128 = 4kB queues, etc.
The final matchPool is a catch-all for very large queues.
mergeRuneSets merges two non-intersecting runesets, and returns the merged result,
and a NextIp array. The idea is that if a rune matches the OnePassRunes at index
i, NextIp[i/2] is the target. If the input sets intersect, an empty runeset and a
NextIp array with the single element mergeFailed is returned.
The code assumes that both inputs contain ordered and non-intersecting rune pairs.
const startSize = 10 // The size at which to start a slice in the 'All' routines.
The pages are generated with Goldsv0.4.2. (GOOS=darwin GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.
PR and bug reports are welcome and can be submitted to the issue list.
Please follow @Go100and1 (reachable from the left QR code) to get the latest news of Golds.