"smart" quotes in PHP

  1. #1

    Default "smart" quotes in PHP

    Hello all,

    I've been struggling for a few days with the question of how to convert
    "smart" (curly) quotes into straight quotes. I tried playing with the
    htmlentities() function, but all that is doing is changing the smart
    quotes into nonsense characters. I also searched the web for quite a
    while and was unsuccessful in finding a solution.

    What puzzles me is that doing it the other way around is simple enough.
    For example, this works fine in converting a straight quote into an
    "open" smart quote:

    if ($content[$k] == "\"")
    $content = substr($content, 0, $k) . "&#147;" . substr
    ($content, $k+1, strlen($content)-$k+1);

    But the other way around doesn't work. Any ideas?

    Thanks,

    Martin Goldman
    My e-mail address's correct domain name is mgoldman.com.
    Martin Goldman Guest

  3. #2

    Default Re: "smart" quotes in PHP

    Martin Goldman <www@nowhere.foo> wrote:
    > I've been struggling for a few days with the question of how to convert
    > "smart" (curly) quotes into straight quotes.
    Smart/curly quotes? straight quotes? What are these?
    > What puzzles me is that doing it the other way around is simple enough.
    > For example, this works fine in converting a straight quote into an
    > "open" smart quote:
    >
    > if ($content[$k] == "\"")
    > $content = substr($content, 0, $k) . "&#147;" . substr
    > ($content, $k+1, strlen($content)-$k+1);
    Funny way to do a str_replace :)

    What character is represented by #147? AFAIK it's not in any character
    set I know (ASCII or ISO-8859-x). So your actual problem might be that
    you are using a different encoding for the character you want to replace
    than PHP is actually using!

    BTW 3rd parameter in htmlentities specifies the character set.

    --

    Daniel Tryba

    Daniel Tryba Guest

  4. #3

    Default Re: "smart" quotes in PHP

    On Fri, 14 Nov 2003 17:42:08 GMT, Martin Goldman <www@nowhere.foo> wrote:
    >I've been struggling for a few days with the question of how to convert
    >"smart" (curly) quotes into straight quotes. I tried playing with the
    >htmlentities() function, but all that is doing is changing the smart
    >quotes into nonsense characters. I also searched the web for quite a
    >while and was unsuccessful in finding a solution.
    You've got to work out what character set the text is encoded in, for
    starters, since 'smart quotes' exist in Microsoft's Codepage 1252 but not in
    the standard ISO 8859 character sets, e.g. iso-8859-15.

    In codepage 1252:

    hex  dec  Unicode  Unicode name
    91   145  8216     LEFT SINGLE QUOTATION MARK
    92   146  8217     RIGHT SINGLE QUOTATION MARK
    93   147  8220     LEFT DOUBLE QUOTATION MARK
    94   148  8221     RIGHT DOUBLE QUOTATION MARK

    But in iso-8859-15, 145-148 aren't defined as printable characters; 128-159
    are reserved for control characters.

    So if you change it to &#147;, but output your page encoded in iso-8859-1,
    you're just changing it to the code for a non-printable character. The same
    entity will appear as a left double quotation mark if encoded in Windows-1252
    though.
    >What puzzles me is that doing it the other way around is simple enough.
    >For example, this works fine in converting a straight quote into an
    >"open" smart quote:
    >
    > if ($content[$k] == "\"")
    > $content = substr($content, 0, $k) . "&#147;" . substr
    >($content, $k+1, strlen($content)-$k+1);
    >
    >But the other way around doesn't work. Any ideas?
    In what way doesn't it work? What does str_replace(chr(147), '"', $content);
    appear to do in your setup?
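
A minimal sketch of the replacement being discussed here, assuming the text really is encoded in Windows-1252 so that each curly quote is a single byte (145 to 148). The helper name `cp1252_straighten_quotes` is made up for illustration; it is not a PHP built-in.

```php
<?php
// Hypothetical helper: replaces the four Windows-1252 curly quote bytes
// with plain ASCII quotes. Assumes $text is single-byte cp1252 text.
function cp1252_straighten_quotes(string $text): string
{
    return strtr($text, [
        chr(145) => "'",  // LEFT SINGLE QUOTATION MARK
        chr(146) => "'",  // RIGHT SINGLE QUOTATION MARK
        chr(147) => '"',  // LEFT DOUBLE QUOTATION MARK
        chr(148) => '"',  // RIGHT DOUBLE QUOTATION MARK
    ]);
}

// Build the sample from byte values so the result does not depend on
// how this source file is saved.
$content = chr(147) . 'Hello' . chr(148);
echo cp1252_straighten_quotes($content), "\n"; // "Hello"
```

If the input is in some other encoding, none of these bytes will match and the string comes back unchanged, which is itself a useful hint about the encoding.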

    --
    Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
    Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
    Andy Hassall Guest

  5. #4

    Default Re: "smart" quotes in PHP

    Martin Goldman wrote:
    > I've been struggling for a few days with the question of how to convert
    > "smart" (curly) quotes into straight quotes.
    As D. Tryba hinted at, str_replace should work fine. After all,
    you're replacing one character with another.

    $string = str_replace($chr,'"',$string)

    where $chr is the character you want to replace.
    > I tried playing with the htmlentities() function, but all that is doing
    > is changing the smart quotes into nonsense characters.
    I'd be interested in seeing what you actually tried. Since so-called
    smart quotes aren't in the Latin-1 repertoire, you'd have to specify
    a charset other than the default ISO-8859-1. Say you typed smart
    quotes on a bog standard Windows system by holding down Alt and
    pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use

    $string = htmlentities($string,ENT_COMPAT,'cp1252')

    where $string is the string containing smart quotes. That converts
    smart quotes to their respective entity references.
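
As a rough illustration of the call above, assuming the input bytes are Windows-1252: the sample string is built with chr() so the example does not depend on the encoding of the script file. The variable names are only for illustration.

```php
<?php
// Sample input: cp1252 bytes 147 and 148 around a word.
$string = chr(147) . 'quoted' . chr(148);

// The third argument names the encoding of the INPUT string.
$encoded = htmlentities($string, ENT_COMPAT, 'cp1252');

echo $encoded, "\n"; // &ldquo;quoted&rdquo;
```

If the declared charset does not match the bytes actually in the string, the output will not be what you expect, which may explain the "nonsense characters".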
    > What puzzles me is that doing it the other way around is simple enough.
    Eek! I'd have thought that was *more* difficult...
    > if ($content[$k] == "\"")
    > $content = substr($content, 0, $k) . "&#147;" . substr
    > ($content, $k+1, strlen($content)-$k+1);
    How does your script know that the quotation mark was intended as an
    opening quotation mark? ;-)

    In HTML, the character reference &#147; is undefined. The LEFT DOUBLE
    QUOTATION MARK can be represented using the character reference
    &#8220; or the entity reference &ldquo;. The RIGHT DOUBLE QUOTATION
    MARK can be represented using the character reference &#8221; or the
    entity reference &rdquo;.

    --
    Jock
    John Dunlop Guest

  6. #5

    Default Re: "smart" quotes in PHP

    John Dunlop <john+usenet@johndunlop.info> wrote in
    news:MPG.1a1f806fb5038c649897c5@news.freeserve.net :
    > Martin Goldman wrote:
    > I'd be interested in seeing what you actually tried. Since so-called
    > smart quotes aren't in the Latin-1 repertoire, you'd have to specify
    > a charset other than the default ISO-8859-1. Say you typed smart
    > quotes on a bog standard Windows system by holding down Alt and
    > pressing 0, 1, 4, and 7 (or 8) on the numeric keypad, you'd use
    >
    > $string = htmlentities($string,ENT_COMPAT,'cp1252')
    >
    > where $string is the string containing smart quotes. That converts
    > smart quotes to their respective entity references.
    >
    This results in the smart quotes being replaced with nonsense characters.
    The thing is, though, that I'm totally unfamiliar with character sets,
    the differences between them, etc. I've never had any reason to care
    about them. So I'm a little confused about what you guys are talking
    about when it comes to them.
    > How does your script know that the quotation mark was intended as an
    > opening quotation mark? ;-)
    Well, I didn't paste the whole thing. :) I wrote a loop that goes through
    the string. It toggles a flag each time a quotation mark is found. If the
    flag is set, it makes it an open quote; if it's not, it makes it a closed
    quote. Hence the reason I'm not just using a str_replace for that. :)
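
The toggle loop described above was not posted in full, so what follows is only a guess at its shape. It assumes the quotes in the input are balanced and emits the &ldquo; and &rdquo; entity references; the function name `curlify_quotes` is hypothetical.

```php
<?php
// Hypothetical reconstruction of the flag-toggling loop: every straight
// double quote flips $open, and the flag decides which entity is emitted.
function curlify_quotes(string $content): string
{
    $open = false;   // true while we are inside a quoted run
    $result = '';
    $length = strlen($content);
    for ($k = 0; $k < $length; $k++) {
        if ($content[$k] === '"') {
            $open = !$open;
            $result .= $open ? '&ldquo;' : '&rdquo;';
        } else {
            $result .= $content[$k];
        }
    }
    return $result;
}

echo curlify_quotes('He said "hi" and "bye".'), "\n";
// He said &ldquo;hi&rdquo; and &ldquo;bye&rdquo;.
```

Building a new string avoids the index shifting that comes from splicing multi-character entities into the string being scanned.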

    Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
    $content) doesn't do anything. The exact same string is returned.

    -Martin
    Martin Goldman Guest

  7. #6

    Default Re: "smart" quotes in PHP

    Martin Goldman <www@nowhere.foo> wrote:
    [confused about charsets]
    > Oh, and to answer Mr. Hassall's question -- str_replace(chr(147), "\"",
    > $content) doesn't do anything. The exact same string is returned.
    That might mean that there is no chr(147) in the string, although you
    _see_ a character that might be represented as the character you know as
    147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
    cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
    currency symbol and the euro symbol is missing entirely. That's why, if you
    want to display the euro symbol, you are encouraged to use the HTML entity
    &euro;, which can be rendered in any font and any character set (with a
    fallback to EUR).

    So your job is to figure out how your quote is encoded (just step through
    the string and print the ord() value for each character)...
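
One way to do that stepping-through, as a small sketch. The helper name `byte_values` is invented for this example; it relies only on the built-in unpack().

```php
<?php
// Hypothetical helper: returns the decimal value of every byte in $text.
function byte_values(string $text): array
{
    if ($text === '') {
        return [];
    }
    // unpack('C*') yields a 1-indexed array of unsigned bytes;
    // array_values() renumbers it from 0.
    return array_values(unpack('C*', $text));
}

// Sample: the letter "a" followed by the cp1252 left double quote byte.
echo implode(' ', byte_values('a' . chr(147))), "\n"; // 97 147
```

A single visible quote that shows up as several values above 127 suggests a multi-byte encoding rather than cp1252.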

    BTW Unicode kind of solves the problem by defining every known character
    in one set; the problem is that not every program supports it yet. But
    Unicode also introduces another problem: the way the characters are
    encoded (e.g. utf7, utf8, utf16...). I don't know if PHP supports utf16+.

    --

    Daniel Tryba

    Daniel Tryba Guest

  8. #7

    Default Re: "smart" quotes in PHP

    Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in news:bp5nhq$d0e$1
    @news.tue.nl:
    > That might mean that there is no chr(147) in the string, although you
    > _see_ a character that might be represented as the character you know as
    > 147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
    > cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
    > currency symbol and the euro symbol is missing entirely. That's why, if you
    > want to display the euro symbol, you are encouraged to use the HTML entity
    > &euro;, which can be rendered in any font and any character set (with a
    > fallback to EUR).
    >
    > So your job is to figure out how your quote is encoded (just step through
    > the string and print the ord() value for each character)...
    Interesting you should suggest this, because I just did that. And indeed,
    it's not coming out as 147. It's coming out as 226, followed by 128,
    followed by 156. I suppose I could do a str_replace for these 3
    characters and replace it with 147. Although, then I'd have to do that
    for every character I want to support. What a drag.

    Thanks,
    Martin
    Martin Goldman Guest

  9. #8

    Default Re: "smart" quotes in PHP

    On Sat, 15 Nov 2003 19:57:14 GMT, Martin Goldman <www@nowhere.foo> wrote:
    >Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in news:bp5nhq$d0e$1
    >@news.tue.nl:
    >
    >> That might mean that there is no chr(147) in the string, although you
    >> _see_ a character that might be represented as the character you know
    >> as 147 in cp1252! Another fine example is the euro symbol: IIRC it's 128 in
    >> cp1252 and 164 in iso-8859-15, while in iso-8859-1 164 is the generic
    >> currency symbol and the euro symbol is missing entirely. That's why, if you
    >> want to display the euro symbol, you are encouraged to use the HTML entity
    >> &euro;, which can be rendered in any font and any character set (with a
    >> fallback to EUR).
    >>
    >> So your job is to figure out how your quote is encoded (just step through
    >> the string and print the ord() value for each character)...
    >
    >Interesting you should suggest this, because I just did that. And indeed,
    >it's not coming out as 147. It's coming out as 226, followed by 128,
    >followed by 156. I suppose I could do a str_replace for these 3
    >characters and replace it with 147. Although, then I'd have to do that
    >for every character I want to support. What a drag.
    Your text is encoded in UTF-8. Going back to the characters again:

    hex  dec  Unicode  Unicode name
    91   145  8216     LEFT SINGLE QUOTATION MARK
    92   146  8217     RIGHT SINGLE QUOTATION MARK
    93   147  8220     LEFT DOUBLE QUOTATION MARK
    94   148  8221     RIGHT DOUBLE QUOTATION MARK

    226,128,156 in binary is:

    11100010
    10000000
    10011100

    '1110' in the first few bits of the first byte indicates it is a lead byte for
    a three-byte character. The remaining two are trail bytes, as they start with
    10. So separating out the data gets:

    1110 0010
    10 000000
    10 011100

    => 0010000000011100 (binary)
    = 8220 (decimal)

    Which is LEFT DOUBLE QUOTATION MARK.
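
The bit-twiddling above can be sketched in PHP as follows. The function name `decode_utf8_three_bytes` is hypothetical, and the sketch only handles the three-byte form walked through here; it makes no attempt to reject overlong sequences or surrogates.

```php
<?php
// Hypothetical helper: decodes one three-byte UTF-8 sequence to its
// Unicode code point.
function decode_utf8_three_bytes(int $b1, int $b2, int $b3): int
{
    // Lead byte must look like 1110xxxx, trail bytes like 10xxxxxx.
    if (($b1 & 0xF0) !== 0xE0 || ($b2 & 0xC0) !== 0x80 || ($b3 & 0xC0) !== 0x80) {
        throw new InvalidArgumentException('Not a three-byte UTF-8 sequence');
    }
    // Keep 4 payload bits from the lead byte and 6 from each trail byte.
    return (($b1 & 0x0F) << 12) | (($b2 & 0x3F) << 6) | ($b3 & 0x3F);
}

echo decode_utf8_three_bytes(226, 128, 156), "\n"; // 8220
```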

    --
    Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk)
    Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space)
    Andy Hassall Guest

  10. #9

    Default Re: "smart" quotes in PHP

    Andy Hassall <andy@andyh.co.uk> wrote:
    >>> So your job is to figure out how your quote is encoded (just step through
    >>> the string and print the ord() value for each character)...
    >>
    >>Interesting you should suggest this, because I just did that. And indeed,
    >>it's not coming out as 147. It's coming out as 226, followed by 128,
    >>followed by 156. I suppose I could do a str_replace for these 3
    >>characters and replace it with 147. Although, then I'd have to do that
    >>for every character I want to support. What a drag.
    >
    > Your text is encoded in UTF-8. Going back to the characters again:
    [in depth UTF-8 decoding :)]

    So Martin, you should take a look at iconv or, if your server lacks
    support for it, utf8_decode(). The manual page for the latter also has a
    user-contributed note on how to use str_replace on a UTF-8 encoded string.
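
This is not the user-contributed note itself, just a minimal sketch of the same idea: since str_replace() works on bytes, it can match the three-byte UTF-8 sequences directly. It assumes the input is valid UTF-8, and the helper name `utf8_straighten_quotes` is made up for illustration.

```php
<?php
// Hypothetical helper: replaces UTF-8 encoded curly quotes with ASCII quotes.
function utf8_straighten_quotes(string $text): string
{
    $search = [
        "\xE2\x80\x98", // U+2018 LEFT SINGLE QUOTATION MARK
        "\xE2\x80\x99", // U+2019 RIGHT SINGLE QUOTATION MARK
        "\xE2\x80\x9C", // U+201C LEFT DOUBLE QUOTATION MARK (226, 128, 156)
        "\xE2\x80\x9D", // U+201D RIGHT DOUBLE QUOTATION MARK
    ];
    $replace = ["'", "'", '"', '"'];
    return str_replace($search, $replace, $text);
}

$content = "\xE2\x80\x9CHello\xE2\x80\x9D";
echo utf8_straighten_quotes($content), "\n"; // "Hello"
```

Supporting more characters means adding more entries to the two arrays, which is less of a drag than one str_replace() call per character.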

    --

    Daniel Tryba

    Daniel Tryba Guest

  11. #10

    Default Re: "smart" quotes in PHP

    Daniel Tryba <news_comp.lang.php@canopus.nl> wrote in
    news:bpee7i$5fr$2@news.tue.nl:
    > Andy Hassall <andy@andyh.co.uk> wrote:
    > So Martin, you should take a look at iconv or, if your server lacks
    > support for it, utf8_decode(). The manual page for the latter also has a
    > user-contributed note on how to use str_replace on a UTF-8 encoded string.
    >
    Great. Thanks to everyone who replied.

    -Martin
    my correct domain name is mgoldman.com
    Martin Goldman Guest
