  1. #1

    extracting text

    I have an HTML table and would like to extract the text inside a <TD>. For
    example:
    <TD class=12>Some text</TD>

    I can write code that detects the beginning of the TD:
    print line =~ /<TD class=12>/

    But how do I make it stop at </TD>? From the line above, I just want to print
    "Some text".

    thanks


    Dan Guest

  3. #2

    Re: extracting text

    thank you
    "Tim Hunter" <Tim.Hunter@sas.com> wrote in message
    news:snstgv4pbjgohvs40b5nv5gn9hcdpl3s9c@4ax.com...
    > Here's one answer to your question. Watch out, almost any change to
    > the input will break it.
    >
    > irb(main):012:0> s = "<TD class=12>Some text</TD>"
    > "<TD class=12>Some text</TD>"
    > irb(main):013:0> m = %r{<TD [^>]+>([^<]+)</TD>}.match(s)
    > #<MatchData:0x276f978>
    > irb(main):014:0> p m[1]
    > "Some text"
    > nil
    > irb(main):015:0>
    >
    > On Fri, 11 Jul 2003 07:46:44 -0400, "Dan" <falseflyboy@yahoo.comNONO>
    > wrote:
    >
    > >I have a HTML table which I would like to extract text inside a <TD>. For an
    > >example
    > ><TD class=12>Some text</TD>
    > >
    > >I can write a code that detects the beginning of TD...
    > >print line ~= "<TD class12>"
    > >
    > >But how do I make it stop at </TD>. In the code above, I just want to print
    > >"Some text"
    > >
    > >thanks
    > >
    >

    Dan Guest
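
The fragility Tim warns about comes mostly from the pattern insisting on upper-case `TD`, at least one attribute, and text with no nested tags. A slightly more tolerant variant can be sketched as below. The helper name `td_texts` is made up for illustration, and this is still a regular expression: nested tables, a `>` inside an attribute value, or an unclosed cell will defeat it.

```ruby
# Collect the text of every <td> cell, whatever its case or attributes.
# NOTE: td_texts is a hypothetical helper name, not part of any library.
def td_texts(html)
  # \b stops <td from matching longer tag names; [^>]* allows zero or more
  # attributes; (.*?) is non-greedy so each match ends at the nearest </td>;
  # /i ignores case and /m lets . match newlines inside a cell.
  html.scan(%r{<td\b[^>]*>(.*?)</td>}im).map { |(text)| text.strip }
end

row = "<tr><TD class=12>Some text</TD><td>\n  More text </td></tr>"
p td_texts(row)   # prints ["Some text", "More text"]
```

`String#scan` returns every match rather than only the first, so one call covers a whole row or table.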

  4. #3

    Re: extracting text


    > -----Original Message-----
    > From: Tim Hunter [mailto:Tim.Hunter@sas.com]
    > Sent: Friday, July 11, 2003 12:36 PM
    > To: ruby-talk@ruby-lang.org
    > Subject: Re: extracting text
    >
    >
    > Here's one answer to your question. Watch out, almost any
    > change to the input will break it.
    >
    > irb(main):012:0> s = "<TD class=12>Some text</TD>"
    > "<TD class=12>Some text</TD>"
    > irb(main):013:0> m = %r{<TD [^>]+>([^<]+)</TD>}.match(s)
    > #<MatchData:0x276f978>
    > irb(main):014:0> p m[1]
    > "Some text"
    > nil
    > irb(main):015:0>
    >
    > On Fri, 11 Jul 2003 07:46:44 -0400, "Dan" <falseflyboy@yahoo.comNONO>
    > wrote:
    >
    > >I have a HTML table which I would like to extract text inside a <TD>.
    > >For an example <TD class=12>Some text</TD>
    > >
    > >I can write a code that detects the beginning of TD...
    > >print line ~= "<TD class12>"
    > >
    > >But how do I make it stop at </TD>. In the code above, I just want to
    > >print "Some text"
    > >
    > >thanks
    > >
    I suspect you'll want to use a parser, instead of regular expressions, to
    parse HTML. There's an html-parser module on the RAA, though I haven't used
    it myself.

    Regards,

    Dan

    Berger, Daniel Guest
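
To give a feel for what moving away from a single regular expression looks like, here is a minimal sketch that walks the markup tag by tag with `StringScanner` from Ruby's standard library. It is not the html-parser module mentioned above and does not reproduce its API; `cell_texts` is a hypothetical helper, and a real HTML parser handles far more (entities, comments, malformed nesting) than this does.

```ruby
require "strscan"

# Hypothetical helper: collect the text inside each <td>...</td> pair,
# skipping any other tags that appear inside or outside a cell.
def cell_texts(html)
  scanner = StringScanner.new(html)
  cells = []
  current = nil   # text of the cell being read, or nil when outside a cell

  until scanner.eos?
    if scanner.scan(/<td\b[^>]*>/i)          # opening tag, any attributes
      current = +""
    elsif scanner.scan(%r{</td\s*>}i)        # closing tag ends the cell
      cells << current.strip if current
      current = nil
    elsif scanner.scan(/<[^>]*>/)            # any other tag
      # nothing to keep from the tag itself
    else
      # plain text, or a single stray "<" that starts no tag
      text = scanner.scan(/[^<]+/) || scanner.getch
      current << text if current
    end
  end
  cells
end

p cell_texts("<table><tr><TD class=12>Some <b>text</b></TD></tr></table>")
# prints ["Some text"]
```

Unlike the single-pattern approach, this keeps the cell text even when it contains inline tags such as `<b>`.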

  5. #4

    Extracting Text

    Hi,

    I have a text field with values such as:

    12345/6
    12345/85
    12345/127

    There are always 5 characters before the /.

    I would like to get all the characters on the right hand side of the /,
    whether it is 1, 2, 3 characters etc. I've been playing with the position
    function but can't get it right - anyone know how this can be done?

    Many thanks in advance

    Peter Pan Guest

  6. #5

    Re: Extracting Text

    You can figure out how many characters come after your delimiter (the slash)
    by subtracting 6 (5 plus the slash itself) from the total number of characters
    in the string:

    Length(Your_Field)-6

    ...and you know they will always be on the right. Now that you know how many
    characters you want, you can use the Right() function to retrieve their
    value.

    --
    John Weinshel
    Datagrace
    Vashon Island, WA
    (206) 463-1634
    Associate Member, Filemaker Solutions Alliance


    "Peter Pan" <splash@NOSPAMPLEASEmac.com> wrote in message
    news:BBAAFF1B.1A969%splash@NOSPAMPLEASEmac.com...
    > Hi,
    >
    > I have a text field with values such as:
    >
    > 12345/6
    > 12345/85
    > 12345/127
    >
    > There is always 5 characters before the /.
    >
    > I would like to get all the characters on the right hand side of the /,
    > whether it is 1, 2, 3 characters etc. I've been playing with the position
    > function but can't get it right - anyone know how this can be done?
    >
    > Many thanks in advance
    >

    John Weinshel Guest
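
The arithmetic in this reply can be checked against the three sample values. The sketch below is written in Ruby (the language used earlier in this thread), not in FileMaker's calculation syntax; `right_of_fixed_prefix` and `PREFIX_LENGTH` are illustrative names, and the code assumes exactly five characters before the slash, as stated in the question.

```ruby
# Mirrors Right(Your_Field, Length(Your_Field) - 6).
PREFIX_LENGTH = 6   # 5 characters plus the slash itself

def right_of_fixed_prefix(value)
  count = value.length - PREFIX_LENGTH   # Length(Your_Field) - 6
  value[-count, count]                   # the rightmost `count` characters
end

%w[12345/6 12345/85 12345/127].each do |value|
  puts "#{value} -> #{right_of_fixed_prefix(value)}"
end
# prints:
# 12345/6 -> 6
# 12345/85 -> 85
# 12345/127 -> 127
```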

  7. #6

    Re: Extracting Text

    Thanks John, that was just what I was after, works a treat.

    Many thanks
    >
    > You can figure out which characters come after your delimiter (the slash) by
    > subtracting 6 (5 plus the slash itself) from the total number of characters
    > in the string:
    >
    > Length(Your_Field)-6
    >
    > ...and you know they will always be on the right. Now you know how many
    > characters you want, and you can use the Right() function to retrieve their
    > value.
    >
    > --
    > John Weinshel
    > Datagrace
    > Vashon Island, WA
    > (206) 463-1634
    > Associate Member, Filemaker Solutions Alliance
    >
    >
    > "Peter Pan" <splash@NOSPAMPLEASEmac.com> wrote in message
    > news:BBAAFF1B.1A969%splash@NOSPAMPLEASEmac.com...
    >> Hi,
    >>
    >> I have a text field with values such as:
    >>
    >> 12345/6
    >> 12345/85
    >> 12345/127
    >>
    >> There is always 5 characters before the /.
    >>
    >> I would like to get all the characters on the right hand side of the /,
    >> whether it is 1, 2, 3 characters etc. I've been playing with the position
    >> function but can't get it right - anyone know how this can be done?
    >>
    >> Many thanks in advance
    >>
    >
    >
    Peter Pan Guest

  8. #7

    Re: Extracting Text

    One of many ways to do this, using the / as a delimiter and not relying on
    the "always 5 characters before" part:

    After (calculation, text) = Replace( text field, 1, Position( text field, "/", 1, 1), "")

    "Peter Pan" <splash@NOSPAMPLEASEmac.com> wrote in message
    news:BBAAFF1B.1A969%splash@NOSPAMPLEASEmac.com...
    > Hi,
    >
    > I have a text field with values such as:
    >
    > 12345/6
    > 12345/85
    > 12345/127
    >
    > There is always 5 characters before the /.
    >
    > I would like to get all the characters on the right hand side of the /,
    > whether it is 1, 2, 3 characters etc. I've been playing with the position
    > function but can't get it right - anyone know how this can be done?
    >
    > Many thanks in advance
    >

    Glenn Schwandt Guest

  9. #8

    Re: Extracting Text

    Other possibilities:

    = Middle( text field, Position( text field, "/", 1, 1) + 1, 9999)
    = Right( text field, Length( text field) - Position( text field, "/", 1, 1))
    = Substitute( text field, Left( text field, Position( text field, "/", 1, 1)), "")

    All of these use the Position function. My original "Replace" solution
    and the "Middle" solution above require only two functions each, so I would
    assume they are the most efficient.

    "Glenn Schwandt" <schwandtgat@aoldot.com> wrote in message
    news:voaomfod70b2f6@corp.supernews.com...
    > One of many ways to do this, using the / as a delimiter and not relying on
    > the "always 5 characters before" part:
    >
    > After (calculation, text) = Replace( text field, 1, Position( text field,
    > "/", 1, 1), "")
    >
    > "Peter Pan" <splash@NOSPAMPLEASEmac.com> wrote in message
    > news:BBAAFF1B.1A969%splash@NOSPAMPLEASEmac.com...
    > > Hi,
    > >
    > > I have a text field with values such as:
    > >
    > > 12345/6
    > > 12345/85
    > > 12345/127
    > >
    > > There is always 5 characters before the /.
    > >
    > > I would like to get all the characters on the right hand side of the /,
    > > whether it is 1, 2, 3 characters etc. I've been playing with the position
    > > function but can't get it right - anyone know how this can be done?
    > >
    > > Many thanks in advance
    > >
    >
    >

    Glenn Schwandt Guest
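
The delimiter-based idea (find the slash, keep what follows) carries over to other languages. A rough Ruby equivalent, offered as an illustration and not as FileMaker syntax, is sketched below; `after_slash` is a hypothetical helper. Note that Ruby's `String#index` is zero-based, whereas the Position function discussed above counts from 1.

```ruby
# Hypothetical helper: everything after the first "/", or nil if there is none.
def after_slash(value)
  slash = value.index("/")            # zero-based position of the delimiter
  slash && value[(slash + 1)..-1]     # start one past the slash itself
end

p after_slash("12345/127")            # prints "127"
p after_slash("1/85")                 # prints "85" (no fixed prefix length needed)
p after_slash("12345")                # prints nil

# String#partition does the same split in one call:
p "12345/6".partition("/").last       # prints "6"
```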
