Break paragraph into list of words / keyword detection

Ask a Question related to Macromedia ColdFusion, Design and Development.

  1. #1

    Break paragraph into list of words / keyword detection

    I'm writing a journaling type of application, and I was wondering if
    someone might be able to help me rewrite one of the modules so it's
    more streamlined and compact, and so that it requires less maintenance.
    Here's the situation:

    A person fills out and submits a journal entry. I then call a routine
    that extracts each individual word in the journal entry and displays
    them as a set of checkboxes, each with one of the extracted words
    beside it. The user can then check one or more of the boxes to
    indicate which words they'd like to have written to the database
    as "keywords". The purpose of this is that when the journal entry
    is retrieved from the database and viewed at a later time, any keywords
    found will be hyperlinked, and can be clicked on to display search
    results for other journal entries that contain that keyword.
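
    For illustration, the extraction step described above might be
    sketched roughly as follows. This is only a sketch under assumptions:
    the form action save_keywords.cfm and the field name "keywords" are
    hypothetical, and JournalEntry is taken to hold the submitted text.
    It trims punctuation from each word before building the checkbox,
    and uses a struct to skip repeats, since struct keys in ColdFusion
    are not case sensitive.

    ```cfm
    <!--- Sketch only. save_keywords.cfm and the field name "keywords"
          are hypothetical; JournalEntry holds the submitted text. --->
    <cfset seen = StructNew()>

    <cfoutput>
    <form action="save_keywords.cfm" method="post">
    <!--- Split on spaces, tabs and line breaks --->
    <cfloop index="token" list="#JournalEntry#" delimiters=" #Chr(9)##Chr(10)##Chr(13)#">

        <!--- Trim punctuation from both ends, then drop a trailing possessive 's --->
        <cfset bare = REReplace(token, "^[^[:alnum:]]+|[^[:alnum:]]+$", "", "ALL")>
        <cfset bare = REReplaceNoCase(bare, "'s$", "")>

        <!--- Struct keys ignore case, so "Joe" and "joe" produce one checkbox --->
        <cfif Len(bare) AND NOT StructKeyExists(seen, bare)>
            <cfset seen[bare] = true>
            <label>
                <input type="checkbox" name="keywords" value="#HTMLEditFormat(bare)#">
                #HTMLEditFormat(bare)#
            </label><br>
        </cfif>

    </cfloop>
    <input type="submit" value="Save keywords">
    </form>
    </cfoutput>
    ```

    Because the punctuation is stripped before the word is offered as a
    checkbox, only bare words would ever reach the keywords table, which
    should simplify matching later on.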

    It seems to be working quite well, except that keywords with
    punctuation attached to them have to be handled differently, which
    requires a lot of additional programming. Also, the matching I
    currently have in place is case sensitive, which means that "joe"
    and "Joe" are treated as completely different strings, which I
    would prefer NOT to be the case.

    What I'd like to know is if there might be a better and more efficient
    way for me to go about this. Specifically, I'd like to be able to avoid
    having to program additional routines to handle punctuation within
    keywords, and for the keyword detection to NOT be case sensitive.

    Here's the code that I have so far:

    <!--- QUERY KEYWORDS TABLE //--->

    <cfquery datasource="MyDSN" name="get_keywords">
    SELECT client_id,keyword
    FROM keywords
    WHERE client_id = '#COOKIE.USER#'
    </cfquery>

    <!--- QUERY KEYWORDS TABLE //--->

    <CFSET word_list = "#JournalEntry#">

    <CFLOOP INDEX="the_word" LIST="#word_list#" DELIMITERS=" ">

    <CFSET theword = "#the_word#">

    <!--- ASSIGN WORDS THAT CONTAIN PUNCTUATION TO NEW VARIABLES //--->

    <cfif the_word CONTAINS "'s.">

    <cfset the_word_appd = REPLACE(#the_word#,"'s.",'',"ALL")>
    <cfset thewordappd = REPLACE(#the_word#,"'s.",'',"ALL")>

    <cfset the_word_pd = the_word>
    <cfset thewordpd = the_word>

    <cfset the_word_cm = the_word>
    <cfset thewordcm = the_word>

    <cfset the_word_ap = the_word>
    <cfset thewordap = the_word>

    <cfelseif the_word CONTAINS "'s">

    <cfset the_word_ap = REPLACE(#the_word#,"'s",'',"ALL")>
    <cfset thewordap = REPLACE(#the_word#,"'s",'',"ALL")>

    <cfset the_word_pd = the_word>
    <cfset thewordpd = the_word>

    <cfset the_word_cm = the_word>
    <cfset thewordcm = the_word>

    <cfset the_word_appd = the_word>
    <cfset thewordappd = the_word>

    <cfelseif the_word CONTAINS ".">

    <cfset the_word_pd = REPLACE(#the_word#,".",'',"ALL")>
    <cfset thewordpd = REPLACE(#the_word#,".",'',"ALL")>

    <cfset the_word_ap = the_word>
    <cfset thewordap = the_word>

    <cfset the_word_cm = the_word>
    <cfset thewordcm = the_word>

    <cfset the_word_appd = the_word>
    <cfset thewordappd = the_word>

    <cfelseif the_word CONTAINS ",">

    <cfset the_word_cm = REPLACE(#the_word#,",",'',"ALL")>
    <cfset thewordcm = REPLACE(#the_word#,",",'',"ALL")>

    <cfset the_word_ap = the_word>
    <cfset thewordap = the_word>

    <cfset the_word_pd = the_word>
    <cfset thewordpd = the_word>

    <cfset the_word_appd = the_word>
    <cfset thewordappd = the_word>

    <cfelse>

    <cfset the_word_pd = the_word>
    <cfset thewordpd = the_word>

    <cfset the_word_cm = the_word>
    <cfset thewordcm = the_word>

    <cfset the_word_ap = the_word>
    <cfset thewordap = the_word>

    <cfset the_word_appd = the_word>
    <cfset thewordappd = the_word>

    </cfif>

    <!--- ASSIGN WORDS THAT CONTAIN PUNCTUATION TO NEW VARIABLES //--->


    <!--- CHECK TO SEE IF THE KEYWORD IS IN THE DATABASE //--->

    <cfquery dbtype="query" name="check_keyword">
    SELECT keyword
    FROM get_keywords
    WHERE (keyword = '#the_word#' OR keyword = '#the_word_pd#'
    OR keyword = '#the_word_cm#' OR keyword = '#the_word_ap#'
    OR keyword = '#the_word_appd#')
    AND client_id = '#COOKIE.USER#'
    </cfquery>

    <!--- CHECK TO SEE IF THE KEYWORD IS IN THE DATABASE //--->


    <!--- HIGHLIGHT AND HYPERLINK WORD IF IT'S A KEYWORD //--->

    <cfset the_word = "<a href=entry.cfm?search_process=true&keyword=#theword# class=#the_class#>#the_word#</a>">

    <cfset the_word_pd = "<a href=entry.cfm?search_process=true&keyword=#thewordpd# class=#the_class#>#the_word_pd#</a>">

    <cfset the_word_cm = "<a href=entry.cfm?search_process=true&keyword=#thewordcm# class=#the_class#>#the_word_cm#</a>">

    <cfset the_word_ap = "<a href=entry.cfm?search_process=true&keyword=#thewordap# class=#the_class#>#the_word_ap#</a>">

    <cfset the_word_appd = "<a href=entry.cfm?search_process=true&keyword=#thewordappd# class=#the_class#>#the_word_appd#</a>">

    <!--- HIGHLIGHT AND HYPERLINK WORD IF IT'S A KEYWORD //--->

    <!--- DISPLAY THE NEXT WORD IN THE LOOP //--->

    <cfoutput>

    <font face="garamond" color="333333" size="3">

    <cfif theword CONTAINS "'s.">

    #the_word_appd#'s.

    <cfelseif theword CONTAINS "'s">

    #the_word_ap#'s

    <cfelseif theword CONTAINS ".">

    #the_word_pd#.

    <cfelseif theword CONTAINS ",">

    #the_word_cm#,

    <cfelse>

    #the_word#

    </cfif>

    </font>

    </cfoutput>

    </font>

    <!--- DISPLAY THE NEXT WORD IN THE LOOP //--->

    </cfloop>
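
    One way the loop above might be condensed is sketched below. It is
    a sketch under assumptions, not a drop-in replacement: it assumes
    client_id is a character column (the posted query quotes it), and
    "keyword_link" is a hypothetical CSS class standing in for
    the_class, which is not set in the snippet shown. Instead of one
    variable per punctuation case, a regular expression reduces each
    word to its bare form, and ListFindNoCase does the lookup without
    regard to case. As far as I know, Query of Queries compares strings
    case sensitively, which would explain why "joe" and "Joe" fail to
    match in the posted version.

    ```cfm
    <!--- Sketch only. "keyword_link" is a hypothetical CSS class; the
          cf_sql_varchar type assumes client_id is a character column. --->
    <cfquery datasource="MyDSN" name="get_keywords">
    SELECT keyword
    FROM keywords
    WHERE client_id = <cfqueryparam value="#COOKIE.USER#" cfsqltype="cf_sql_varchar">
    </cfquery>

    <!--- One comma-delimited list of this user's keywords --->
    <cfset keyword_list = ValueList(get_keywords.keyword)>

    <cfoutput>
    <font face="garamond" color="333333" size="3">
    <cfloop index="token" list="#JournalEntry#" delimiters=" ">

        <!--- Trim punctuation from both ends, then drop a trailing possessive 's --->
        <cfset bare = REReplace(token, "^[^[:alnum:]]+|[^[:alnum:]]+$", "", "ALL")>
        <cfset bare = REReplaceNoCase(bare, "'s$", "")>

        <cfset shown = HTMLEditFormat(token)>

        <!--- ListFindNoCase ignores case, so "joe" matches "Joe" --->
        <cfif Len(bare) AND ListFindNoCase(keyword_list, bare)>
            <cfset link = '<a href="entry.cfm?search_process=true&amp;keyword=#URLEncodedFormat(bare)#" class="keyword_link">#HTMLEditFormat(bare)#</a>'>
            <!--- Link only the bare word; surrounding punctuation stays plain text --->
            <cfset shown = Replace(shown, HTMLEditFormat(bare), link)>
        </cfif>

        #shown#

    </cfloop>
    </font>
    </cfoutput>
    ```

    With this shape, supporting another punctuation mark needs no new
    branch, because the regular expression already trims any
    non-alphanumeric character from either end of the word. The
    per-word Query of Queries also goes away, since the keyword list is
    built once before the loop.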



    Any help would be appreciated. Thanks in advance!

    - yvan

    yvan@ideasdesign.com Guest

