class OedipusLex
Oedipus Lex is a lexer generator in the same family as Rexical and Rex. Oedipus Lex is my independent lexer fork of Rexical. Rexical was in turn a fork of Rex. We’ve been unable to contact the author of rex in order to take it over, fix it up, extend it, and relicense it to MIT. So, Oedipus was written clean-room in order to bypass licensing constraints (and because bootstrapping is fun).
Oedipus brings a lot of extras to the table and at this point is only historically related to rexical. The syntax has changed enough that any rexical lexer will have to be tweaked to work inside of oedipus. At the very least, you need to add slashes to all your regexps.
Oedipus, like rexical, is based primarily on generating code much like you would a hand-written lexer. It is not a table or hash driven lexer. It uses StringScanner within a multi-level case statement. As such, Oedipus takes the first rule that matches, not the longest match as lex and its ilk do.
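To make the first-match point concrete, here is a minimal sketch that uses only StringScanner from the standard library. The rule table and the helper names (demo_rules, first_match, longest_match) are hypothetical and are not part of Oedipus Lex; the sketch only contrasts the two strategies.

```ruby
require "strscan"

# Hypothetical rule table: regexp/type pairs, tried in order.
def demo_rules
  [
    [/if/,  :keyword],
    [/\w+/, :identifier],
  ]
end

# First-match strategy: return as soon as any rule matches.
def first_match(input)
  ss = StringScanner.new(input)
  demo_rules.each do |regexp, type|
    text = ss.scan(regexp)
    return [type, text] if text
  end
  nil
end

# Longest-match strategy (lex style): try every rule, keep the longest.
def longest_match(input)
  candidates = demo_rules.map { |regexp, type|
    text = StringScanner.new(input).scan(regexp)
    [type, text] if text
  }
  candidates.compact.max_by { |_type, text| text.length }
end

first_match("iffy")   # => [:keyword, "if"]
longest_match("iffy") # => [:identifier, "iffy"]
```

With first-match semantics, rule order matters: putting the identifier rule before the keyword rule would change the result.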
This documentation is not meant to bypass any prerequisite knowledge on lexing or parsing. If you’d like to study the subject in further detail, please try [TIN321] or the [LLVM Tutorial] or some other good resource for CS learning. Books… books are good. I like books.
The generated lexer OedipusLex
Attributes
The class name to generate.
An array of lines to have after the lexer class.
The file name / path
An array of all the groups within the lexer rules.
An array of header lines to have before the lexer class.
An array of lines to have inside (but at the bottom of) the lexer class.
The current line number.
An array of name/regexp pairs to generate constants inside the lexer class.
The StringScanner for this lexer.
The previous position. Only available if the :column option is on.
A hash of options for the code generator. See README.rdoc for supported options.
The rules for the lexer.
The StringScanner for this lexer.
The position of the start of the current line. Only available if the :column option is on.
An array of lines of code to generate into the top of the lexer (next_token) loop.
The current lexical state.
Public Class Methods
A convenience method to create a new lexer with a name and given rules.

    # File lib/oedipus_lex.rb, line 227
    def self.[](name, *rules)
      r = new
      r.class_name = name
      r.rules.concat rules
      r
    end
Public Instance Methods
Yields on the current action.
    # File lib/oedipus_lex.rex.rb, line 59
    def action
      yield
    end
The current column, starting at 0. Only available if the :column option is on.
    # File lib/oedipus_lex.rex.rb, line 77
    def column
      old_pos - start_of_current_line_pos
    end
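The column computation above is a subtraction of two scanner positions. The following is a simplified sketch of that bookkeeping, not the generated lexer itself: word_positions and line_start are hypothetical names, and the newline handling is reduced to the bare minimum.

```ruby
require "strscan"

# Returns [word, line, column] triples for every word in +input+.
# line_start plays the role of start_of_current_line_pos.
def word_positions(input)
  ss = StringScanner.new(input)
  lineno = 1
  line_start = 0
  positions = []

  until ss.eos?
    if ss.skip(/\n/)
      lineno += 1
      line_start = ss.pos          # the next line starts right after "\n"
    elsif ss.skip(/[^\S\n]+/)
      # whitespace other than newline: nothing to record
    else
      old_pos = ss.pos             # position before the token is consumed
      word = ss.scan(/\S+/)
      positions << [word, lineno, old_pos - line_start]
    end
  end

  positions
end

word_positions("ab cd\nef")
# => [["ab", 1, 0], ["cd", 1, 3], ["ef", 2, 0]]
```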
Parse the file by getting all tokens and calling lex_<type> on each one, where <type> is the first element of the token.

    # File lib/oedipus_lex.rex.rb, line 84
    def do_parse
      while token = next_token do
        type, *vals = token

        send "lex_#{type}", *vals
      end
    end
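The dispatch in do_parse can be tried in isolation. In this sketch, TinyHandler, its canned token list, and its lex_number and lex_op handlers are all hypothetical stand-ins; only the do_parse loop mirrors the method shown above.

```ruby
# Hypothetical handler class, for illustration only.
class TinyHandler
  attr_reader :seen

  def initialize(tokens)
    @tokens = tokens.dup
    @seen = []
  end

  # Stand-in for the generated next_token: hands out canned tokens.
  def next_token
    @tokens.shift
  end

  def lex_number(text)
    @seen << Integer(text)
  end

  def lex_op(text)
    @seen << text.to_sym
  end

  # Same shape as do_parse above: the token type selects the handler.
  def do_parse
    while (token = next_token)
      type, *vals = token
      send "lex_#{type}", *vals
    end
  end
end

handler = TinyHandler.new([[:number, "1"], [:op, "+"], [:number, "2"]])
handler.do_parse
handler.seen # => [1, :+, 2]
```

A token whose type has no matching lex_ method raises NoMethodError, so every token type the lexer can emit needs a handler.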
End a group.
    # File lib/oedipus_lex.rb, line 345
    def end_group
      rules << group
      self.group = nil
      self.state = :rule
    end
Generate the lexer.
    # File lib/oedipus_lex.rb, line 370
    def generate
      filter = lambda { |r| Rule === r && r.start_state || nil }
      _mystates = rules.map(&filter).flatten.compact.uniq
      exclusives, inclusives = _mystates.partition { |s| s =~ /^:[A-Z]/ }

      # NOTE: doubling up assignment to remove unused var warnings in
      # ERB binding.

      all_states =
        all_states = [[nil, *inclusives],               # nil+incls # eg [[nil, :a],
                      *exclusives.map { |s| [s] }]      # [excls]   #     [:A], [:B]]

      encoding = header.shift if /encoding:/.match?(header.first)
      encoding ||= "# encoding: UTF-8"

      erb = if RUBY_VERSION >= "2.6.0" then
              ERB.new(TEMPLATE, trim_mode:"%")
            else
              ERB.new(TEMPLATE, nil, "%")
            end

      erb.result binding
    end
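The state grouping at the top of generate can be sketched on its own. This assumes start states arrive as strings with a leading colon, as the /^:[A-Z]/ test suggests; group_states is a hypothetical helper name, not part of the library.

```ruby
# Groups start states the way generate does: states whose name begins
# with an uppercase letter are exclusive and get a group each; the
# others are inclusive and share a group with the default (nil) state.
def group_states(start_states)
  states = start_states.compact.uniq
  exclusives, inclusives = states.partition { |s| s =~ /^:[A-Z]/ }

  [[nil, *inclusives], *exclusives.map { |s| [s] }]
end

group_states([nil, ":a", ":B", ":a", ":C"])
# => [[nil, ":a"], [":B"], [":C"]]
```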
Process a class lexeme.

    # File lib/oedipus_lex.rb, line 270
    def lex_class prefix, name
      header.concat prefix.split(/\n/)
      self.class_name = name
    end
Process a comment lexeme.

    # File lib/oedipus_lex.rb, line 278
    def lex_comment line
      # do nothing
    end
Process an end lexeme.

    # File lib/oedipus_lex.rb, line 285
    def lex_end line
      ends << line
    end
Process a group lexeme.

    # File lib/oedipus_lex.rb, line 336
    def lex_group start_state, regexp, action = nil
      rule = Rule.new(start_state, regexp, action)
      rule.group = group
      self.group << rule
    end
Process the end of a group lexeme.

    # File lib/oedipus_lex.rb, line 354
    def lex_groupend start_state, regexp, action = nil
      end_group
      lex_rule start_state, regexp, action
    end
Process a group head lexeme.

    # File lib/oedipus_lex.rb, line 327
    def lex_grouphead re
      end_group if group
      self.state = :group
      self.group = Group.new re
    end
Process an inner lexeme.

    # File lib/oedipus_lex.rb, line 292
    def lex_inner line
      inners << line
    end
Process a macro lexeme.

    # File lib/oedipus_lex.rb, line 306
    def lex_macro name, value
      macros << [name, value]
    end
Process an option lexeme.

    # File lib/oedipus_lex.rb, line 313
    def lex_option option
      self.option[option.to_sym] = true
    end
Process a rule lexeme.

    # File lib/oedipus_lex.rb, line 320
    def lex_rule start_state, regexp, action = nil
      rules << Rule.new(start_state, regexp, action)
    end
Process a start lexeme.

    # File lib/oedipus_lex.rb, line 299
    def lex_start line
      starts << line.strip
    end
Process a state lexeme.

    # File lib/oedipus_lex.rb, line 362
    def lex_state _new_state
      end_group if group
      # do nothing -- lexer switches state for us
    end
The current location in the parse.
    # File lib/oedipus_lex.rex.rb, line 125
    def location
      [
        (filename || "<input>"),
        lineno,
        column,
      ].compact.join(":")
    end
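As a small illustration of the string that location builds, here is a standalone sketch. format_location is a hypothetical helper that takes the three fields as arguments rather than reading them from lexer attributes.

```ruby
# Joins the available fields with ":"; nil fields are dropped by
# compact, and a missing filename falls back to "<input>".
def format_location(filename, lineno, column)
  [(filename || "<input>"), lineno, column].compact.join(":")
end

format_location("lexer.rex", 3, 7) # => "lexer.rex:3:7"
format_location(nil, 3, nil)       # => "<input>:3"
```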
The match groups for the current scan.
    # File lib/oedipus_lex.rex.rb, line 50
    def matches
      m = (1..9).map { |i| ss[i] }
      m.pop until m[-1] or m.empty?
      m
    end
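The same capture-collecting logic can be tried against a bare StringScanner. In this sketch, captures is a hypothetical free-standing version of matches that takes the scanner as an argument.

```ruby
require "strscan"

# Collects capture groups 1..9 from the scanner's last match and trims
# trailing nils, so only the groups the regexp defines remain.
def captures(ss)
  m = (1..9).map { |i| ss[i] }
  m.pop until m[-1] or m.empty?
  m
end

ss = StringScanner.new("width = 42")
ss.skip(/(\w+)\s*=\s*(\d+)/)
captures(ss) # => ["width", "42"]

# A group that did not participate in the match stays as nil when a
# later group matched, because only trailing nils are trimmed.
ss = StringScanner.new("ac")
ss.skip(/(a)(b)?(c)/)
captures(ss) # => ["a", nil, "c"]
```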
Lex the next token.
    # File lib/oedipus_lex.rex.rb, line 136
    def next_token

      token = nil

      until ss.eos? or token do
        if ss.check(/\n/) then
          self.lineno += 1
          # line starts 1 position after the newline
          self.start_of_current_line_pos = ss.pos + 1
        end
        self.old_pos = ss.pos
        token =
          case state
          when nil, :option, :inner, :start, :macro, :rule, :group then
            case
            when ss.skip(/options?.*/) then
              [:state, :option]
            when ss.skip(/inner.*/) then
              [:state, :inner]
            when ss.skip(/macros?.*/) then
              [:state, :macro]
            when ss.skip(/rules?.*/) then
              [:state, :rule]
            when ss.skip(/start.*/) then
              [:state, :start]
            when ss.skip(/end/) then
              [:state, :END]
            when ss.skip(/\A((?:.|\n)*)class ([\w:]+.*)/) then
              action { [:class, *matches] }
            when ss.skip(/\n+/) then
              # do nothing
            when text = ss.scan(/\s*(\#.*)/) then
              action { [:comment, text] }
            when (state == :option) && (ss.skip(/\s+/)) then
              # do nothing
            when (state == :option) && (text = ss.scan(/stub/i)) then
              action { [:option, text] }
            when (state == :option) && (text = ss.scan(/debug/i)) then
              action { [:option, text] }
            when (state == :option) && (text = ss.scan(/do_parse/i)) then
              action { [:option, text] }
            when (state == :option) && (text = ss.scan(/lineno/i)) then
              action { [:option, text] }
            when (state == :option) && (text = ss.scan(/column/i)) then
              action { [:option, text] }
            when (state == :inner) && (text = ss.scan(/.*/)) then
              action { [:inner, text] }
            when (state == :start) && (text = ss.scan(/.*/)) then
              action { [:start, text] }
            when (state == :macro) && (ss.skip(/\s+(\w+)\s+#{RE}/o)) then
              action { [:macro, *matches] }
            when (state == :rule) && (ss.skip(/\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
              action { [:rule, *matches] }
            when (state == :rule) && (ss.skip(/\s*:[\ \t]*#{RE}/o)) then
              action { [:grouphead, *matches] }
            when (state == :group) && (ss.skip(/\s*:[\ \t]*#{RE}/o)) then
              action { [:grouphead, *matches] }
            when (state == :group) && (ss.skip(/\s*\|\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
              action { [:group, *matches] }
            when (state == :group) && (ss.skip(/\s*#{ST}?[\ \t]*#{RE}[\ \t]*#{ACT}?/o)) then
              action { [:groupend, *matches] }
            else
              text = ss.string[ss.pos .. -1]
              raise ScanError, "can not match (#{state.inspect}) at #{location}: '#{text}'"
            end
          when :END then
            case
            when ss.skip(/\n+/) then
              # do nothing
            when text = ss.scan(/.*/) then
              action { [:end, text] }
            else
              text = ss.string[ss.pos .. -1]
              raise ScanError, "can not match (#{state.inspect}) at #{location}: '#{text}'"
            end
          else
            raise ScanError, "undefined state at #{location}: '#{state}'"
          end # token = case state

        next unless token # allow functions to trigger redo w/ nil
      end # while

      raise LexerError, "bad lexical result at #{location}: #{token.inspect}" unless
        token.nil? || (Array === token && token.size >= 2)

      # auto-switch state
      self.state = token.last if token && token.first == :state

      token
    end
Parse the given string.
    # File lib/oedipus_lex.rex.rb, line 103
    def parse str
      self.ss = scanner_class.new str
      self.lineno = 1
      self.start_of_current_line_pos = 0
      self.state ||= nil

      do_parse
    end
Read in and parse the file at path.

    # File lib/oedipus_lex.rex.rb, line 115
    def parse_file path
      self.filename = path
      open path do |f|
        parse f.read
      end
    end
The current scanner class. Must be overridden in subclasses.
    # File lib/oedipus_lex.rex.rb, line 96
    def scanner_class
      StringScanner
    end