class OedipusLex

Oedipus Lex is a lexer generator in the same family as Rexical and Rex. It is my independent lexer fork of Rexical, which was in turn a fork of Rex. We've been unable to contact the author of Rex in order to take it over, fix it up, extend it, and relicense it to MIT. So Oedipus was written clean-room in order to bypass licensing constraints (and because bootstrapping is fun).

Oedipus brings a lot of extras to the table and at this point is only historically related to Rexical. The syntax has changed enough that any Rexical lexer will have to be tweaked to work inside of Oedipus. At the very least, you need to add slashes to all your regexps.

Oedipus, like Rexical, generates code that reads much like a hand-written lexer. It is not a table- or hash-driven lexer. It uses StringScanner within a multi-level case statement. As such, Oedipus takes the first rule that matches, not the longest match (which is what lex and its ilk do).
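
To make the first-match behaviour concrete, here is a minimal sketch that compares it with longest-match on the same input. It is not Oedipus-generated code: the RULES table and both helper methods are hypothetical, and only StringScanner from Ruby's standard library is assumed.

```ruby
require "strscan"

# Hypothetical rules, listed in the order a generated case statement
# would try them.
RULES = [
  [:kw_if, /if/],
  [:ident, /[a-z]+/],
]

# First match: stop at the first rule that matches at the current position.
def first_match(input)
  ss = StringScanner.new(input)
  RULES.each do |type, re|
    text = ss.scan(re)
    return [type, text] if text
  end
  nil
end

# Longest match: try every rule and keep the one that consumed the most.
def longest_match(input)
  RULES.map { |type, re| [type, StringScanner.new(input).scan(re)] }
       .reject { |_, text| text.nil? }
       .max_by { |_, text| text.length }
end

p first_match("iffy")   # [:kw_if, "if"]
p longest_match("iffy") # [:ident, "iffy"]
```

In practice this means rule order matters: more specific rules, or rules anchored with something like a word boundary, generally need to come before more general ones.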

This documentation is not meant to bypass any prerequisite knowledge on lexing or parsing. If you’d like to study the subject in further detail, please try [TIN321] or the [LLVM Tutorial] or some other good resource for CS learning. Books… books are good. I like books.

The generated lexer OedipusLex

Attributes

class_name[RW]

The class name to generate.

ends[RW]

An array of lines to have after the lexer class.

filename[RW]

The file name or path.

group[RW]

An array of all the groups within the lexer rules.

header[RW]

An array of header lines to have before the lexer class.

inners[RW]

An array of lines to have inside (but at the bottom of) the lexer class.

lineno[RW]

The current line number.

macros[RW]

An array of name/regexp pairs to generate constants inside the lexer class.

match[RW]

The StringScanner for this lexer.

old_pos[RW]

The previous position. Only available if the :column option is on.

option[RW]

A hash of options for the code generator. See README.rdoc for supported options.

rules[RW]

The rules for the lexer.

ss[RW]

The StringScanner for this lexer.

start_of_current_line_pos[RW]

The position of the start of the current line. Only available if the :column option is on.

starts[RW]

An array of lines of code to generate into the top of the lexer (next_token) loop.

state[RW]

The current lexical state.

Public Class Methods

[](name, *rules)

A convenience method to create a new lexer with a name and given rules.

# File lib/oedipus_lex.rb, line 227
def self.[](name, *rules)
  r = new
  r.class_name = name
  r.rules.concat rules
  r
end

Public Instance Methods

action() { || ... }

Yields on the current action.

# File lib/oedipus_lex.rex.rb, line 59
def action
  yield
end

column()

The current column, starting at 0. Only available if the :column option is on.

# File lib/oedipus_lex.rex.rb, line 77
def column
  old_pos - start_of_current_line_pos
end
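
The column bookkeeping can be sketched in isolation. The helper below is hypothetical (columns_of is not part of OedipusLex) and reuses the names old_pos and start_of_current_line_pos only to mirror the attributes described above; it assumes words are the only tokens of interest.

```ruby
require "strscan"

# Returns a hash of word => zero-based column. The current line start is
# remembered, and subtracted from the position where each token began.
def columns_of(input)
  ss = StringScanner.new(input)
  start_of_current_line_pos = 0
  columns = {}

  until ss.eos?
    # The next line starts one position after the newline.
    start_of_current_line_pos = ss.pos + 1 if ss.check(/\n/)
    old_pos = ss.pos

    if (word = ss.scan(/\w+/))
      columns[word] = old_pos - start_of_current_line_pos
    else
      ss.getch # skip anything that is not a word character
    end
  end

  columns
end

cols = columns_of("ab\ncd ef")
puts cols["ab"] # 0
puts cols["cd"] # 0
puts cols["ef"] # 3
```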

do_parse()

Parse the input by fetching each token and calling the matching lex_<type> method with the token's values.

# File lib/oedipus_lex.rex.rb, line 84
def do_parse
  while token = next_token do
    type, *vals = token

    send "lex_#{type}", *vals
  end
end
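
The dispatch convention is easier to see with a stand-in class. TinyLexer below is hypothetical: it replaces the real scanner with a pre-built token queue, and its lex_number and lex_word handlers are invented for illustration. Only the do_parse body mirrors the method above.

```ruby
# Hypothetical stand-in for a generated lexer, fed from a token queue.
class TinyLexer
  attr_reader :seen

  def initialize(tokens)
    @tokens = tokens
    @seen = []
  end

  # Returns the next [type, *values] token, or nil when exhausted.
  def next_token
    @tokens.shift
  end

  # Handler for [:number, text] tokens.
  def lex_number(text)
    @seen << [:number, Integer(text)]
  end

  # Handler for [:word, text] tokens.
  def lex_word(text)
    @seen << [:word, text]
  end

  # Same shape as do_parse above: the token type picks the handler.
  def do_parse
    while (token = next_token)
      type, *vals = token
      send "lex_#{type}", *vals
    end
  end
end

lexer = TinyLexer.new([[:number, "42"], [:word, "hi"]])
lexer.do_parse
p lexer.seen # [[:number, 42], [:word, "hi"]]
```

A token type with no matching lex_ method raises NoMethodError, so every token type the rules can emit needs a handler.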

end_group()

End a group.

# File lib/oedipus_lex.rb, line 345
def end_group
  rules << group
  self.group = nil
  self.state = :rule
end

generate()

Generate the lexer.

# File lib/oedipus_lex.rb, line 370
def generate
  filter = lambda { |r| Rule === r && r.start_state || nil }
  _mystates = rules.map(&filter).flatten.compact.uniq
  exclusives, inclusives = _mystates.partition { |s| s =~ /^:[A-Z]/ }

  # NOTE: doubling up assignment to remove unused var warnings in
  # ERB binding.

  all_states =
    all_states = [[nil, *inclusives],          # nil+incls # eg [[nil, :a],
                  *exclusives.map { |s| [s] }] # [excls]   #     [:A], [:B]]

  encoding = header.shift if /encoding:/.match?(header.first)
  encoding ||= "# encoding: UTF-8"

  erb = if RUBY_VERSION >= "2.6.0" then
          ERB.new(TEMPLATE, trim_mode:"%")
        else
          ERB.new(TEMPLATE, nil, "%")
        end

  erb.result binding
end
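
The state grouping at the top of generate can be tried on its own. In this sketch the helper name group_states is hypothetical, and start states are assumed to be strings such as ":ARG" or ":arg"; the partition and the all_states construction follow the code above.

```ruby
# Splits start states into exclusive (leading uppercase letter) and
# inclusive (anything else), then builds the nested list: inclusive
# states share a group with nil, each exclusive state gets its own.
def group_states(start_states)
  states = start_states.compact.uniq
  exclusives, inclusives = states.partition { |s| s =~ /^:[A-Z]/ }

  [[nil, *inclusives], *exclusives.map { |s| [s] }]
end

p group_states([nil, ":ARG", ":arg", nil, ":ARG", ":COMMENT"])
# [[nil, ":arg"], [":ARG"], [":COMMENT"]]
```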

lex_class(prefix, name)

Process a class lexeme.

# File lib/oedipus_lex.rb, line 270
def lex_class prefix, name
  header.concat prefix.split(/\n/)
  self.class_name = name
end

lex_comment(line)

Process a comment lexeme.

# File lib/oedipus_lex.rb, line 278
def lex_comment line
  # do nothing
end

lex_end(line)

Process an end lexeme.

# File lib/oedipus_lex.rb, line 285
def lex_end line
  ends << line
end

lex_group(start_state, regexp, action = nil)

Process a group lexeme.

# File lib/oedipus_lex.rb, line 336
def lex_group start_state, regexp, action = nil
  rule = Rule.new(start_state, regexp, action)
  rule.group = group
  self.group << rule
end

lex_groupend(start_state, regexp, action = nil)

Process the end of a group lexeme.

# File lib/oedipus_lex.rb, line 354
def lex_groupend start_state, regexp, action = nil
  end_group
  lex_rule start_state, regexp, action
end

lex_grouphead(re)

Process a group head lexeme.

# File lib/oedipus_lex.rb, line 327
def lex_grouphead re
  end_group if group
  self.state = :group
  self.group = Group.new re
end

lex_inner(line)

Process an inner lexeme.

# File lib/oedipus_lex.rb, line 292
def lex_inner line
  inners << line
end

lex_macro(name, value)

Process a macro lexeme.

# File lib/oedipus_lex.rb, line 306
def lex_macro name, value
  macros << [name, value]
end

lex_option(option)

Process an option lexeme.

# File lib/oedipus_lex.rb, line 313
def lex_option option
  self.option[option.to_sym] = true
end

lex_rule(start_state, regexp, action = nil)

Process a rule lexeme.

# File lib/oedipus_lex.rb, line 320
def lex_rule start_state, regexp, action = nil
  rules << Rule.new(start_state, regexp, action)
end

lex_start(line)

Process a start lexeme.

# File lib/oedipus_lex.rb, line 299
def lex_start line
  starts << line.strip
end

lex_state(_new_state)

Process a state lexeme.

# File lib/oedipus_lex.rb, line 362
def lex_state _new_state
  end_group if group
  # do nothing -- lexer switches state for us
end

location()

The current location in the parse.

# File lib/oedipus_lex.rex.rb, line 125
def location
  [
    (filename || "<input>"),
    lineno,
    column,
  ].compact.join(":")
end
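
As a quick illustration of the resulting string, the sketch below turns location into a standalone function. Passing filename, lineno, and column as arguments is a simplification made for this example; in the lexer they are attributes and methods.

```ruby
# Joins whichever parts are present with ":"; nil parts are dropped
# and a missing file name falls back to "<input>".
def location(filename, lineno, column)
  [
    (filename || "<input>"),
    lineno,
    column,
  ].compact.join(":")
end

puts location("lexer.rex", 3, 7) # lexer.rex:3:7
puts location(nil, 3, nil)       # <input>:3
```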

matches()

The match groups for the current scan.

# File lib/oedipus_lex.rex.rb, line 50
def matches
  m = (1..9).map { |i| ss[i] }
  m.pop until m[-1] or m.empty?
  m
end
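
The trimming behaviour is worth seeing on real input: trailing groups that did not participate are dropped, but a nil in the middle is kept so that positions stay aligned. This sketch passes the scanner in as an argument, which is a hypothetical change for the sake of a standalone example.

```ruby
require "strscan"

# Collects capture groups 1..9 from the last scan and trims trailing nils.
def matches(ss)
  m = (1..9).map { |i| ss[i] }
  m.pop until m[-1] || m.empty?
  m
end

ss = StringScanner.new("key = value")
ss.skip(/(\w+)\s*=\s*(\w+)(!)?/)
p matches(ss) # ["key", "value"]

ss = StringScanner.new("b")
ss.skip(/(a)?(b)/)
p matches(ss) # [nil, "b"]
```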

next_token()

Lex the next token.

# File lib/oedipus_lex.rex.rb, line 136
def next_token

  token = nil

  until ss.eos? or token do
    if ss.check(/\n/) then
      self.lineno += 1
      # line starts 1 position after the newline
      self.start_of_current_line_pos = ss.pos + 1
    end
    self.old_pos = ss.pos
    token =
      case state
      when nil, :option, :inner, :start, :macro, :rule, :group then
        case
        when ss.skip(/options?.*/) then
          [:state, :option]
        when ss.skip(/inner.*/) then
          [:state, :inner]
        when ss.skip(/macros?.*/) then
          [:state, :macro]
        when ss.skip(/rules?.*/) then
          [:state, :rule]
        when ss.skip(/start.*/) then
          [:state, :start]
        when ss.skip(/end/) then
          [:state, :END]
        when ss.skip(/\A((?:.|\n)*)class ([\w:]+.*)/) then
          action { [:class, *matches] }
        when ss.skip(/\n+/) then
          # do nothing
        when text = ss.scan(/\s*(\#.*)/) then
          action { [:comment, text] }
        when (state == :option) && (ss.skip(/\s+/)) then
          # do nothing
        when (state == :option) && (text = ss.scan(/stub/i)) then
          action { [:option, text] }
        when (state == :option) && (text = ss.scan(/debug/i)) then
          action { [:option, text] }
        when (state == :option) && (text = ss.scan(/do_parse/i)) then
          action { [:option, text] }
        when (state == :option) && (text = ss.scan(/lineno/i)) then
          action { [:option, text] }
        when (state == :option) && (text = ss.scan(/column/i)) then
          action { [:option, text] }
        when (state == :inner) && (text = ss.scan(/.*/)) then
          action { [:inner, text] }
        when (state == :start) && (text = ss.scan(/.*/)) then
          action { [:start, text] }
        when (state == :macro) && (ss.skip(/\s+(\w+)\s+#{RE}/o)) then
          action { [:macro, *matches] }
        when (state == :rule) && (ss.skip(/\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
          action { [:rule, *matches] }
        when (state == :rule) && (ss.skip(/\s*:[\ \t]*#{RE}/o)) then
          action { [:grouphead, *matches] }
        when (state == :group) && (ss.skip(/\s*:[\ \t]*#{RE}/o)) then
          action { [:grouphead, *matches] }
        when (state == :group) && (ss.skip(/\s*\|\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
          action { [:group, *matches] }
        when (state == :group) && (ss.skip(/\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
          action { [:groupend, *matches] }
        else
          text = ss.string[ss.pos .. -1]
          raise ScanError, "can not match (#{state.inspect}) at #{location}: '#{text}'"
        end
      when :END then
        case
        when ss.skip(/\n+/) then
          # do nothing
        when text = ss.scan(/.*/) then
          action { [:end, text] }
        else
          text = ss.string[ss.pos .. -1]
          raise ScanError, "can not match (#{state.inspect}) at #{location}: '#{text}'"
        end
      else
        raise ScanError, "undefined state at #{location}: '#{state}'"
      end # token = case state

    next unless token # allow functions to trigger redo w/ nil
  end # while

  raise LexerError, "bad lexical result at #{location}: #{token.inspect}" unless
    token.nil? || (Array === token && token.size >= 2)

  # auto-switch state
  self.state = token.last if token && token.first == :state

  token
end

parse(str)

Parse the given string.

# File lib/oedipus_lex.rex.rb, line 103
def parse str
  self.ss     = scanner_class.new str
  self.lineno = 1
  self.start_of_current_line_pos = 0
  self.state  ||= nil

  do_parse
end

parse_file(path)

Read in and parse the file at path.

# File lib/oedipus_lex.rex.rb, line 115
def parse_file path
  self.filename = path
  open path do |f|
    parse f.read
  end
end

scanner_class()

The scanner class used to create ss. Defaults to StringScanner; override it in subclasses to use a different scanner.

# File lib/oedipus_lex.rex.rb, line 96
def scanner_class
  StringScanner
end
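
One way a subclass might swap in its own scanner is sketched below. All three classes are hypothetical: BaseLexer only stands in for a generated lexer, with a parse method cut down to a single scan, and TracingScanner is an invented StringScanner subclass that records the patterns it is asked to scan.

```ruby
require "strscan"

# Hypothetical scanner that remembers every pattern passed to scan.
class TracingScanner < StringScanner
  def patterns
    @patterns ||= []
  end

  def scan(re)
    patterns << re
    super
  end
end

# Hypothetical stand-in for a generated lexer.
class BaseLexer
  attr_accessor :ss

  def scanner_class
    StringScanner
  end

  # Builds the scanner through scanner_class, then scans one word.
  def parse(str)
    self.ss = scanner_class.new str
    ss.scan(/\w+/)
  end
end

# Overriding scanner_class is the only change the subclass needs.
class TracingLexer < BaseLexer
  def scanner_class
    TracingScanner
  end
end

lexer = TracingLexer.new
p lexer.parse("hello world") # "hello"
p lexer.ss.class             # TracingScanner
p lexer.ss.patterns.size     # 1
```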