extract asian language text from a PDF

Ask a Question related to Adobe Acrobat SDK, Design and Development.

  1. #1

    extract asian language text from a PDF

    Hi,

    In a plug-in implementation, I want to extract Japanese (or other Asian language) text from a PDF programmatically, while keeping the Regional language option in Control Panel set to "US English" (the plug-in mostly runs on an English-language OS with English PDFs). Is it possible to extract Japanese text, that is, get the Japanese characters and store them in a CString, without setting the Regional language option to Japanese?
    If yes, what more do I have to do to get Japanese characters from the following functions?
    PDWordFinder pdWordF = PDDocGetWordFinder(pdDoc, WF_LATEST_VERSION);
    PDWordFinderEnumWords(pdWordF, IndexPage, WordProc, (void *)&llWordSequence);

    Thanks a lot
    nayya7choti@adobeforums.com Guest

  3. #2

    Re: extract asian language text from a PDF

    Can you get the text in Japanese (or whatever language)? Of course! You can extract it as Unicode and then, if you wish, convert it to another encoding such as Shift-JIS (SJIS).

    Should you store it in a CString? Probably not, since that's not a great place to store non-ASCII strings. You could, but most folks recommend against it.
    Leonard_Rosenthol@adobeforums.com Guest
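
A minimal sketch of the "extract as Unicode, then convert" idea from the reply above. It assumes the extracted words have already been copied into a `std::u16string` (UTF-16 code units) rather than a CString, and it converts to UTF-8 instead of Shift-JIS because UTF-8 needs no lookup table. `Utf16ToUtf8` is a hypothetical helper, not part of the Acrobat SDK; a Shift-JIS conversion would need a table-based converter (on Windows, for example, `WideCharToMultiByte` with code page 932).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (the replacement character).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }

        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Keeping the text in a Unicode container and converting only at the point where a byte string is required avoids any dependence on the system's regional (ANSI code page) setting.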

  4. #3

    Re: extract asian language text from a PDF

    If I set the Regional option to Japanese, I get the text in Japanese. But if I set it to US English, I just get Unicode values for the Japanese text, and the Japanese characters show up as invalid characters such as '?'.
    Is it possible to get the correct characters without setting it to Japanese?
    nayya7choti@adobeforums.com Guest

  5. #4

    Re: extract asian language text from a PDF

    AH...

    It sounds like you are confusing two things: getting the correct Unicode code points from the text (which you ARE getting) and having a font available to display them.

    So I don't know what you are doing with the text you are retrieving, but the text itself is fine. Just be sure to set the correct font before you try to draw it...
    Leonard_Rosenthol@adobeforums.com Guest
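
One way to confirm that the extracted code points are correct, independent of any font or regional setting, is to log them as hex values instead of trying to display the characters. The sketch below assumes the word text is held as UTF-16 code units in a `std::u16string`; `DumpCodeUnits` is a hypothetical debugging helper, not an SDK call.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each UTF-16 code unit as "U+XXXX",
// separated by spaces, so the values can be checked against a Unicode chart.
std::string DumpCodeUnits(const std::u16string& text)
{
    std::string out;
    char buf[8];  // "U+FFFF" plus the terminating NUL fits in 8 bytes
    for (char16_t unit : text) {
        if (!out.empty()) {
            out.push_back(' ');
        }
        std::snprintf(buf, sizeof buf, "U+%04X", static_cast<unsigned>(unit));
        out += buf;
    }
    return out;
}
```

If the dump shows values in the expected ranges (for example, U+3040 to U+30FF for kana or U+4E00 to U+9FFF for common kanji), the extraction is working and any '?' seen later comes from a lossy conversion or a missing font.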

  6. #5

    Re: extract asian language text from a PDF

    I want that text to create bookmarks in the PDF, so I need the Japanese text for the bookmark titles. But I am not getting the actual characters, just numerical values (like ASCII values for English characters), and the characters themselves show up as junk (say, '?').
    nayya7choti@adobeforums.com Guest

  7. #6

    Re: extract asian language text from a PDF

    HOW are you creating the bookmarks? Remember that since you have Unicode text coming from the document, you need to flag the string in the bookmark title as being Unicode. That means putting the BOM (byte-order mark, the bytes 0xFE 0xFF for big-endian UTF-16) in front.
    Leonard_Rosenthol@adobeforums.com Guest
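
A minimal sketch of "putting the BOM in front", assuming the title text is available as UTF-16 code units in a `std::u16string`. It builds the byte sequence a PDF text string uses for Unicode: 0xFE 0xFF followed by each code unit in big-endian order. `MakeUnicodePdfString` is a hypothetical helper name, not an SDK function.

```cpp
#include <string>

// Hypothetical helper: build a Unicode PDF text string from UTF-16 code
// units. The result starts with the byte-order mark FE FF, followed by
// every code unit written high byte first (big-endian).
std::string MakeUnicodePdfString(const std::u16string& text)
{
    std::string out;
    out.reserve(2 + text.size() * 2);
    out.push_back(static_cast<char>(0xFE));
    out.push_back(static_cast<char>(0xFF));
    for (char16_t unit : text) {
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte
        out.push_back(static_cast<char>(unit & 0xFF));         // low byte
    }
    return out;
}
```

The resulting buffer contains embedded zero bytes for ASCII characters, so it should be passed together with its byte count rather than as a NUL-terminated string. In a plug-in that would presumably mean something like `PDBookmarkSetTitle(bookmark, s.data(), static_cast<ASInt32>(s.size()))`; check the exact signature against the SDK headers before relying on it.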
