yvan@ideasdesign.com #1
Break paragraph into list of words / keyword detection
I'm writing a journaling application, and I was wondering if
someone might be able to help me rewrite one of the modules so it's
more streamlined and compact, and so that it requires less maintenance.
Here's the situation:
A person fills out and submits a journal entry. I then call a routine
that extracts each individual word in the journal entry and converts
them into an array of checkboxes, each having one of the extracted
words beside it. The user can then check one or more of the boxes to
indicate which of the words they'd like to have written to the database
as a "keyword". The purpose is that when the journal entry
is retrieved from the database and viewed at a later time, any keywords
found will be hyperlinked, and can be clicked on to display search
results for other journal entries that contain that keyword.
It seems to be working quite well, but I've come up against a problem
where keywords that contain punctuation have to be handled
differently, which requires lots of additional programming. Also, the
programming I currently have in place is case sensitive, which means
that the words "joe" and "Joe" are treated as being completely different
strings, which I would prefer NOT to be the case.
What I'd like to know is if there might be a better and more efficient
way for me to go about this. Specifically, I'd like to be able to avoid
having to program additional routines to handle punctuation within
keywords, and for the keyword detection to NOT be case sensitive.
Here's the code that I have so far:
<!--- QUERY KEYWORDS TABLE //--->
<cfquery datasource="MyDSN" name="get_keywords">
SELECT client_id,keyword
FROM keywords
WHERE client_id = '#COOKIE.USER#'
</cfquery>
<!--- QUERY KEYWORDS TABLE //--->
<CFSET word_list = "#JournalEntry#">
<CFLOOP INDEX="the_word" LIST="#word_list#" DELIMITERS=" ">
<CFSET theword = "#the_word#">
<!--- ASSIGN WORDS THAT CONTAIN PUNCTUATION TO NEW VARIABLES //--->
<cfif the_word CONTAINS "'s.">
<cfset the_word_appd = REPLACE(#the_word#,"'s.",'',"ALL")>
<cfset thewordappd = REPLACE(#the_word#,"'s.",'',"ALL")>
<cfset the_word_pd = the_word>
<cfset thewordpd = the_word>
<cfset the_word_cm = the_word>
<cfset thewordcm = the_word>
<cfset the_word_ap = the_word>
<cfset thewordap = the_word>
<cfelseif the_word CONTAINS "'s">
<cfset the_word_ap = REPLACE(#the_word#,"'s",'',"ALL")>
<cfset thewordap = REPLACE(#the_word#,"'s",'',"ALL")>
<cfset the_word_pd = the_word>
<cfset thewordpd = the_word>
<cfset the_word_cm = the_word>
<cfset thewordcm = the_word>
<cfset the_word_appd = the_word>
<cfset thewordappd = the_word>
<cfelseif the_word CONTAINS ".">
<cfset the_word_pd = REPLACE(#the_word#,".",'',"ALL")>
<cfset thewordpd = REPLACE(#the_word#,".",'',"ALL")>
<cfset the_word_ap = the_word>
<cfset thewordap = the_word>
<cfset the_word_cm = the_word>
<cfset thewordcm = the_word>
<cfset the_word_appd = the_word>
<cfset thewordappd = the_word>
<cfelseif the_word CONTAINS ",">
<cfset the_word_cm = REPLACE(#the_word#,",",'',"ALL")>
<cfset thewordcm = REPLACE(#the_word#,",",'',"ALL")>
<cfset the_word_ap = the_word>
<cfset thewordap = the_word>
<cfset the_word_pd = the_word>
<cfset thewordpd = the_word>
<cfset the_word_appd = the_word>
<cfset thewordappd = the_word>
<cfelse>
<cfset the_word_pd = the_word>
<cfset thewordpd = the_word>
<cfset the_word_cm = the_word>
<cfset thewordcm = the_word>
<cfset the_word_ap = the_word>
<cfset thewordap = the_word>
<cfset the_word_appd = the_word>
<cfset thewordappd = the_word>
</cfif>
<!--- ASSIGN WORDS THAT CONTAIN PUNCTUATION TO NEW VARIABLES //--->
<!--- CHECK TO SEE IF THE KEYWORD IS IN THE DATABASE //--->
<cfquery dbtype="query" name="check_keyword">
SELECT keyword
FROM get_keywords
WHERE (keyword = '#the_word#' OR keyword = '#the_word_pd#' OR
keyword = '#the_word_cm#' OR keyword = '#the_word_ap#' OR
keyword = '#the_word_appd#')
AND client_id = '#COOKIE.USER#'
</cfquery>
<!--- CHECK TO SEE IF THE KEYWORD IS IN THE DATABASE //--->
<!--- HIGHLIGHT AND HYPERLINK WORD IF IT'S A KEYWORD //--->
<cfset the_word = "<a href=entry.cfm?search_process=true&keyword=#theword# class=#the_class#>#the_word#</a>">
<cfset the_word_pd = "<a href=entry.cfm?search_process=true&keyword=#thewordpd# class=#the_class#>#the_word_pd#</a>">
<cfset the_word_cm = "<a href=entry.cfm?search_process=true&keyword=#thewordcm# class=#the_class#>#the_word_cm#</a>">
<cfset the_word_ap = "<a href=entry.cfm?search_process=true&keyword=#thewordap# class=#the_class#>#the_word_ap#</a>">
<cfset the_word_appd = "<a href=entry.cfm?search_process=true&keyword=#thewordappd# class=#the_class#>#the_word_appd#</a>">
<!--- HIGHLIGHT AND HYPERLINK WORD IF IT'S A KEYWORD //--->
<!--- DISPLAY THE NEXT WORD IN THE LOOP //--->
<cfoutput>
<font face="garamond" color="333333" size="3">
<cfif theword CONTAINS "'s.">
#the_word_appd#'s.
<cfelseif theword CONTAINS "'s">
#the_word_ap#'s
<cfelseif theword CONTAINS ".">
#the_word_pd#.
<cfelseif theword CONTAINS ",">
#the_word_cm#,
<cfelse>
#the_word#
</cfif>
</font>
</cfoutput>
</font>
<!--- DISPLAY THE NEXT WORD IN THE LOOP //--->
</cfloop>
Any help would be appreciated. Thanks in advance!
- yvan
yvan@ideasdesign.com Guest
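One possible way to cut this down, offered as an untested sketch rather than a drop-in replacement: strip the trailing punctuation from each word once with a regular expression, then compare the bare word against the keyword list with a case-insensitive list function instead of a Query of Queries (whose string comparisons are case sensitive). The sketch assumes get_keywords, JournalEntry and the_class are defined as in the post, and that keywords are single words containing no commas. The names raw_word, core, tail and keyword_list are made up for illustration.

```cfm
<!--- Sketch only. Assumes get_keywords, JournalEntry and the_class exist
      as in the post above; raw_word, core and tail are hypothetical names. --->
<cfset keyword_list = ValueList(get_keywords.keyword)>

<cfoutput>
<font face="garamond" color="333333" size="3">
<cfloop index="raw_word" list="#JournalEntry#" delimiters=" ">
    <!--- Strip a trailing possessive and/or trailing punctuation:
          "Joe's." becomes "Joe", "home," becomes "home", "joe" is unchanged --->
    <cfset core = REReplace(raw_word, "('s[[:punct:]]*|[[:punct:]]+)$", "")>

    <!--- Keep whatever was stripped so it can be printed after the word --->
    <cfif Len(core) LT Len(raw_word)>
        <cfset tail = Right(raw_word, Len(raw_word) - Len(core))>
    <cfelse>
        <cfset tail = "">
    </cfif>

    <!--- ListFindNoCase ignores case, so "joe" and "Joe" both match --->
    <cfif Len(core) AND ListFindNoCase(keyword_list, core)>
        <a href="entry.cfm?search_process=true&keyword=#URLEncodedFormat(core)#" class="#the_class#">#HTMLEditFormat(core)#</a>#HTMLEditFormat(tail)#
    <cfelse>
        #HTMLEditFormat(raw_word)#
    </cfif>
</cfloop>
</font>
</cfoutput>
```

This replaces the five parallel variables and the per-word query with a single normalized lookup, so supporting another punctuation mark should not need a new branch. It only handles punctuation at the end of a word. Leading quotes or brackets, and line breaks inside the entry, would need the pattern and the delimiters extended.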