Breaking Ruby code into tokens

  1. #1

    Breaking Ruby code into tokens

    I've been looking at lex.c and parse.y and parse.c, but it's all rather
    over my head.

    How might one simply break Ruby code into tokens?

    Maybe token isn't the proper term, but I think it is. I've written
    very few real parsers in my life.

    For example, obviously all keywords and identifiers and punctuation
    (such as <<) would be treated as single entities. Strings and
    regular expressions would also be treated as such.

    I know that Ruby grammar is nontrivial -- for example, by looking at
    the code I just now realized that "class <<" is treated as a special
    case so that it won't look like a here-doc. Never thought of that
    before.
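
    To see the difference between those two uses of "<<" at the token
    level, here is a minimal sketch. It assumes the Ripper library that
    ships with Ruby 1.9 and later, not the C lexer in parse.y, and
    token_kinds is a hypothetical helper name chosen for illustration.

```ruby
require 'ripper'

# Hypothetical helper: list the Ripper token types for a snippet.
def token_kinds(source)
  Ripper.lex(source).map { |_pos, type, _text| type }
end

singleton = "class << self\nend\n"          # "<<" opens a singleton class
heredoc   = "text = <<EOS\nhello\nEOS\n"    # "<<EOS" starts a here-doc

# Compare how the two "<<" forms are classified.
p token_kinds(singleton).include?(:on_heredoc_beg)
p token_kinds(heredoc).include?(:on_heredoc_beg)
```

    Under this assumption, the "<<" after "class" comes back as a plain
    operator token, while "<<EOS" comes back as a here-doc opener.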

    But a full-fledged parser is overkill, too, isn't it? Surely this could
    be done in 100 lines of Ruby or so?

    Enlighten me...

    Hal


    Hal Fulton Guest

  3. #2

    Re: Breaking Ruby code into tokens


    "Hal Fulton" <hal9000@hypermetrics.com> wrote:
    > I've been looking at lex.c and parse.y and parse.c, ...

    Pending a correction, lex.c is an unused remnant.
    parse.c is ignorable (generated by Yacc from parse.y).
    The real Ruby lexer is in parse.y (function yylex).

    > How might one simply break Ruby code into tokens?
    >
    > Hal

    While writing IRB, Keiju ISHITSUKA seems to have taken
    the trouble to expose his lexer to other callers.
    Thank you.

    ruby-lex is a pure-Ruby emulation of the interpreter's lexer.
    (It may have slight differences.)
    As part of IRB, it ships with the standard distribution.

    I haven't seen any examples, so here is one. As written it
    tokenizes itself, but you can point it at a script file instead.


    #------------------------------------
    # Written against the irb/ruby-lex library shipped with Ruby 1.8.
    require 'irb/ruby-lex'   # forward slash: 'irb\ruby-lex' only resolves on Windows

    include RubyToken

    #File.open('testfile.rb') do |infile| # see: lex.set_input

    tree = []
    ikeys = [:name, :op, :value, :node]   # optional per-token attributes

    lex = RubyLex.new
    DATA.rewind            # rewind so the lexer reads this whole script
    lex.set_input(DATA)    # (DATA) or (infile)

    line = lex.get_readed  # read (past tense;)
    while tk = lex.token

      # Strip the module prefix, e.g. RubyToken::TkNL -> TkNL
      tkc = tk.class.to_s.sub(/\ARubyToken::/, '')

      tkih = { :tk      => tkc,
               :line    => tk.line_no,
               :seek    => tk.seek,
               :char_no => tk.char_no }

      # some tokens have extra attributes.
      ikeys.each do |tkk|
        tkih[tkk] = tk.respond_to?(tkk) && tk.send(tkk)
      end

      tree << tkih

      if tkc == 'TkNL'
        # puts line unless line =~ /\A\s*\Z/ # line sep
        line = lex.get_readed # next line
        # Note: read line left here otherwise
        # position of NL is mis-reported [BUG?].
      end
    end

    # Print one row per token, with a blank line after each newline token.
    tree.each do |tkh|
      printf("line %-3d @%3d: %-12s", tkh[:line], tkh[:char_no], tkh[:tk])
      printf(" [%s]", tkh[:name]) if tkh[:name]

      tkh.each do |k, v|
        next unless (ikeys - [:name]).include?(k)
        printf(" %s(%s)", k, v) if v
      end
      puts
      puts if tkh[:tk] == 'TkNL'
    end

    #end # File.open
    __END__
    #------------------------------------


    There may be other methods of interest in:

    lib\ruby\1.8\irb\slex.rb
    lib\ruby\1.8\irb\ruby-lex.rb
    lib\ruby\1.8\irb\ruby-token.rb
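
    For readers on Ruby 1.9 or later, a similar per-token listing can
    be sketched with the Ripper standard library instead of
    irb/ruby-lex. This is a different lexer front end, not the code
    above, and tokenize_ruby is a hypothetical helper name used only
    for illustration.

```ruby
require 'ripper'

# Hypothetical helper: turn Ruby source into an array of token records.
# Ripper.lex yields [[line, column], type, text, ...] for each token.
def tokenize_ruby(source)
  Ripper.lex(source).map do |(line, col), type, text, _state|
    { :tk => type, :line => line, :char_no => col, :text => text }
  end
end

sample = "puts 'hi' if x << 2\n"

tokenize_ruby(sample).each do |t|
  next if t[:tk] == :on_sp   # skip whitespace tokens in the listing
  printf("line %-3d @%3d: %-18s %s\n",
         t[:line], t[:char_no], t[:tk], t[:text].inspect)
end
```

    One difference worth knowing: Ripper reports a string literal as
    separate begin, content and end tokens rather than as one entity,
    so a caller that wants whole strings has to join them.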


    daz



    daz Guest

  4. #3

    Re: Breaking Ruby code into tokens

    daz wrote:
    > "Hal Fulton" <hal9000@hypermetrics.com> wrote:
    >
    >> I've been looking at lex.c and parse.y and parse.c, ...
    >
    > Pending a correction, lex.c is an unused remnant.
    > parse.c is ignorable (generated by Yacc from parse.y).
    > The real ruby lexer is in parse.y (function yylex).

    Didn't know lex.c was a leftover.

    I know parse.c is generated from parse.y, but I can
    read C and can't read yacc. :)

    >> How might one simply break Ruby code into tokens?
    >>
    >> Hal
    >
    > While writing IRB, Keiju ISHITSUKA seems to have taken
    > the trouble to expose his lexer to other callers.
    > Thank you.

    irb/ruby-lex is what I've settled on. It works nicely.
    (Mauricio or batsman on IRC also pointed me that way.)

    Thanks,
    Hal



    Hal Fulton Guest
