m and m <--> 1 relations is where the mess is, and those joining keys ought to be in each others' indexes, I think. Also, since fArtBkTitle is so small, the full-text on that column has to be trivial, and it seems like a UNION query with the separate conditions would be in order. If for some strange reason that doesn't work well, create a temp table, insert the results of one condition, then the other, then select distinct. Finding the distinct rows from the two queries should be no work at all. Finally, do you have to use WITH NOCHECK? I'm not sure if it applies here, but there are situations where when NOCHECK makes an index unusable for optimization. It's worth getting rid of that if you have it. SK Curtis Wensley wrote: [ref] >Hi Steve, > >Thanks for your quick response! The DISTINCT was just a test, it gives the >same lack of performance either way.... also tried 'group by' on the >subquery to try and eliminate the 180 million records in the table spool, >but that didn't work either.. > >Also, you are right in the structure of the database.. however, as the >database sits right now (it was converted from another database), there's >typically only ONE chapter, and ONE paragraph occurence for each >book/article. The paragraphs/chapters would be split up for new records, >and manually for existing ones at a later time. > >Doing some searches finds this: > >- 9902 out of 37762 records in tTextTab_Text (fText) contains 'health' >(paragraphs) >- tTextTab has 37763 records (chapters) >- 679 out of 46910 records in tContent (fArtBkTitle) contains 'health' >(books/articles) > >I am GUESSING that there might be 180 million ocurrences of HEALTH in all of >the text perhaps.. Though another perplexing thing is if I use CONTAINSTABLE >(to get the RANK), it jumps up to a whopping 300+ MILLION records in the >table spool... which is weird because it's the same search.. > >I've attached a script that can be used to create the table(s) and their >indecies (including full-text).. I stripped out any non-pertinent stuff to >keep it simple. > >Thanks, > >Curtis. > > >"Steve Kass" wrote in message >news:edu... > >[ref] >>Curtis, >> >> First, the query plans show SELECT DISTINCT in the subquery. Do >>you see a performance difference without it? IN (select ...) should >>be the same as IN (select distinct ...), but materializing a (select >>distinct) >>may be much work work. >> >>Otherwise, here's what I'm seeing in your diagram: >> >> [tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m >> >>[/ref] >[tTextTab_Text] > >[ref] >>A title (A) contains 1 or more chapters A_Ch1, A_Ch2, ... unique to >>that title. Each chapter consists of a text, which may be the text >>of other chapters for this or other titles. Each text contains 1 or more >>paragraphs, unique to that text: >> >>title A >> A_Ch1 = Txt1098 >> Paragraph1098_1 >> Paragraph1098_2 >> A_Ch2 = Txt4437 >> Paragraph4437_1 >> Paragraph4437_2 >> Paragraph4437_3 >> A_Ch3 = Txt1098 >> Paragraph1098_1 >> Paragraph1098_2 >>title B >> B_Ch1 = Txt1098 >> Paragraph1098_1 >> Paragraph1098_2 >> B_Ch2 = Txt1037 >> Paragraph1037_1 >> >> >>You want the titles where the title contains 'health' >>or one of the title's chapters' texts' paragraphs contains >>'health'. >> >>Can we see the table structures with indexes? There must >>be some reason why the comparison against 'health' is being >>delayed so much. Also, what percentage of Paragraphs, >>corresponding to what percentage of texts, and what percentage >>of Chapters, contain 'health'? >> >>What are typical values of m? It would help to know, say, that >>a typical text contains 200 paragraphs, and more or less all of >>them or none of them contain 'health'. >> >>SK >> >> >> >>Curtis Wensley wrote: >> >> >>[ref] >>>I forgot to mention, that it is not only a count that i'm after.. I just >>>did that because I didn't want to scroll through 600 hits. >>> >>>The results of this query are output directly to the end-user in a grid >>>control. >>> >>>A union wouldn't work (unless I used distinct I guess, but that's ugly).. >>> >>>[/ref][/ref] >as > >[ref][ref] >>>it can possibly give more results than there really is... Also, unions >>>won't work because there is the possibility of a very complex boolean >>> >>>[/ref][/ref] >search > >[ref][ref] >>>on separate fields (of the end-users choosing) which will not give the >>>correct results.. >>> >>>I really am beginning to think this is a BUG in SQL Server.. ): >>> >>>"oj" wrote in message >>>news:phx.gbl... >>> >>> >>> >>> >>>>hows this... >>>> >>>>select sum(cnt) as cnt >>>> >>>> >>>>from ( >>> >>> >>>>SELECT count(*) as cnt FROM tContent >>>>WHERE >>>>tContent.fSystemKey in ( >>>> SELECT tContent_TextTab.fTemplateKey FROM (( >>>> tContent_TextTab >>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = >>>>tContent_TextTab.fTextTab) >>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = >>>>tTextTab.fSystemKey) >>>> where contains(tTextTab_Text.fText, N'"health"') >>>> ) >>>>union all >>>>SELECT count(*) as cnt FROM tContent >>>>WHERE >>>>contains(tContent.fArtBkTitle, N'"health"') >>>>)Derived >>>> >>>>-- >>>>-oj >>>>RAC v2.2 & QALite! >>>>http://www.rac4sql.net >>>> >>>> >>>>"Curtis Wensley" wrote in message >>>>news:phx.gbl... >>>> >>>> >>>> >>>> >>>>>First, sorry for the cross post, but I think I posted on the wrong >>>>>newsgroup.. (; >>>>> >>>>>Here is my situation. Sql2k, Win2k, latest service packs. I have a >>>>> >>>>> >>>>> >>>>> >>>table >>> >>> >>> >>> >>>>>structure like: >>>>> >>>>>[tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m >>>>> >>>>> >>>>> >>>>> >>>[tTextTab_Text] >>> >>> >>> >>> >>>>>[tContent] has the Title (nvarchar), and [tTextTab_Text] has complete >>>>>article text in an ntext column. There's under 40k records in the >>>>> >>>>> >>>>> >>>>> >>>>table(s). >>>> >>>> >>>> >>>> >>>>>I cannot seem to get this query to go fast (takes over 5 minutes): >>>>> >>>>>SELECT count(*) FROM tContent >>>>>WHERE >>>>>tContent.fSystemKey in ( >>>>> SELECT tContent_TextTab.fTemplateKey FROM (( >>>>> tContent_TextTab >>>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = >>>>>tContent_TextTab.fTextTab) >>>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = >>>>>tTextTab.fSystemKey) >>>>> where contains(tTextTab_Text.fText, N'"health"') >>>>> ) >>>>>or contains(tContent.fArtBkTitle, N'"health"') >>>>> >>>>> >>>>>If I take the LAST line out, or change it to an AND, it only takes a >>>>> >>>>> >>>>> >>>>> >>>>matter >>>> >>>> >>>> >>>> >>>>>of a few *seconds* to produce results. The main purpose is to search >>>>> >>>>> >>>>> >>>>> >>>>both >>>> >>>> >>>> >>>> >>>>>the title and the article text at the same time, using an OR. I have >>>>> >>>>> >>>>> >>>>> >>>>tried >>>> >>>> >>>> >>>> >>>>>using EXISTS instead of IN, but it produces the same result. I cannot >>>>> >>>>> >>>>> >>>>> >>>>join >>>> >>>> >>>> >>>> >>>>>the tables directly as this will change the result set.. >>>>> >>>>>I have attached the estimated execution plan (which looks fine), and >>>>> >>>>>[/ref][/ref] >the > >[ref][ref] >>>>>actual execution plan. One interesting note is the # of rows in the >>>>> >>>>> >>>>> >>>>> >>>table >>> >>> >>> >>> >>>>>spool!! The amount of rows from the subquery is only about 9000 >>>>> >>>>>[/ref][/ref] >records > >[ref][ref] >>>>>(like the estimated plan shows).. but the temporary table is getting >>>>> >>>>> >>>>> >>>>> >>>over >>> >>> >>> >>> >>>>>180 MILLION records, and scanning through it many times... I would >>>>> >>>>> >>>>> >>>>> >>>THINK >>> >>> >>> >>> >>>>>this is a bug in SQL, but not too sure.. >>>>> >>>>>Has anyone run into this before? I would really like to know how to >>>>> >>>>>[/ref][/ref] >get > >[ref][ref] >>>>>around this bug or 'feature' of sql server so I can search these fields >>>>> >>>>> >>>>> >>>>> >>>in >>> >>> >>> >>> >>>>>the single query.. >>>>> >>>>>Thanks, >>>>>Curtis. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>> >>> >>> >>>[/ref][/ref] > > > >[/ref] [allowsmilie] => 1 [showsignature] => 0 [ipaddress] => [iconid] => 0 [visible] => 1 [attach] => 0 [infraction] => 0 [reportthreadid] => 0 [isusenetpost] => 1 [msgid] => [ref] => <#R1vIeRXDHA.1492@TK2MSFTNGP12.phx.gbl> <3F32BFC4.6010603@drew.edu> [htmlstate] => on_nl2br [postusername] => Steve [ip] => skass@drew.edu [isdeleted] => 0 [usergroupid] => [membergroupids] => [displaygroupid] => [password] => [passworddate] => [email] => [styleid] => [parentemail] => [homepage] => [icq] => [aim] => [yahoo] => [msn] => [skype] => [showvbcode] => [showbirthday] => [usertitle] => [customtitle] => [joindate] => [daysprune] => [lastvisit] => [lastactivity] => [lastpost] => [lastpostid] => [posts] => [reputation] => [reputationlevelid] => [timezoneoffset] => [pmpopup] => [avatarid] => [avatarrevision] => [profilepicrevision] => [sigpicrevision] => [options] => [akvbghsfs_optionsfield] => [birthday] => [birthday_search] => [maxposts] => [startofweek] => [referrerid] => [languageid] => [emailstamp] => [threadedmode] => [autosubscribe] => [pmtotal] => [pmunread] => [salt] => [ipoints] => [infractions] => [warnings] => [infractiongroupids] => [infractiongroupid] => [adminoptions] => [profilevisits] => [friendcount] => [friendreqcount] => [vmunreadcount] => [vmmoderatedcount] => [socgroupinvitecount] => [socgroupreqcount] => [pcunreadcount] => [pcmoderatedcount] => [gmmoderatedcount] => [assetposthash] => [fbuserid] => [fbjoindate] => [fbname] => [logintype] => [fbaccesstoken] => [newrepcount] => [vbseo_likes_in] => [vbseo_likes_out] => [vbseo_likes_unread] => [temp] => [field1] => [field2] => [field3] => [field4] => [field5] => [subfolders] => [pmfolders] => [buddylist] => [ignorelist] => [signature] => [searchprefs] => [rank] => [icontitle] => [iconpath] => [avatarpath] => [hascustomavatar] => 0 [avatardateline] => [avwidth] => [avheight] => [edit_userid] => [edit_username] => [edit_dateline] => [edit_reason] => [hashistory] => [pagetext_html] => [hasimages] => [signatureparsed] => [sighasimages] => [sigpic] => [sigpicdateline] => [sigpicwidth] => [sigpicheight] => [postcount] => 13 [islastshown] => [isfirstshown] => [attachments] => [allattachments] => ) --> m > and m <--> 1 relations is where the mess is, and those joining keys > ought to be in each others' indexes, I think. > > Also, since fArtBkTitle is so small, the full-text on that column > has to be trivial, and it seems like a UNION query with the > separate conditions would be in order. If for some strange reason > that doesn't work well, create a temp table, insert the results of > one condition, then the other, then select distinct. Finding the distinct > rows from the two queries should be no work at all. > > Finally, do you have to use WITH NOCHECK? I'm not sure > if it applies here, but there are situations where when NOCHECK > makes an index unusable for optimization. It's worth getting rid > of that if you have it. > > SK > > > Curtis Wensley wrote: >[ref] > >Hi Steve, > > > >Thanks for your quick response! The DISTINCT was just a test, it gives[/ref][/ref] the[ref][ref] > >same lack of performance either way.... also tried 'group by' on the > >subquery to try and eliminate the 180 million records in the table spool, > >but that didn't work either.. > > > >Also, you are right in the structure of the database.. however, as the > >database sits right now (it was converted from another database), there's > >typically only ONE chapter, and ONE paragraph occurence for each > >book/article. The paragraphs/chapters would be split up for new records, > >and manually for existing ones at a later time. > > > >Doing some searches finds this: > > > >- 9902 out of 37762 records in tTextTab_Text (fText) contains 'health' > >(paragraphs) > >- tTextTab has 37763 records (chapters) > >- 679 out of 46910 records in tContent (fArtBkTitle) contains 'health' > >(books/articles) > > > >I am GUESSING that there might be 180 million ocurrences of HEALTH in all[/ref][/ref] of[ref][ref] > >the text perhaps.. Though another perplexing thing is if I use[/ref][/ref] CONTAINSTABLE[ref][ref] > >(to get the RANK), it jumps up to a whopping 300+ MILLION records in the > >table spool... which is weird because it's the same search.. > > > >I've attached a script that can be used to create the table(s) and their > >indecies (including full-text).. I stripped out any non-pertinent stuff[/ref][/ref] to[ref][ref] > >keep it simple. > > > >Thanks, > > > >Curtis. > > > > > >"Steve Kass" wrote in message > >news:edu... > > > >[ref] > >>Curtis, > >> > >> First, the query plans show SELECT DISTINCT in the subquery. Do > >>you see a performance difference without it? IN (select ...) should > >>be the same as IN (select distinct ...), but materializing a (select > >>distinct) > >>may be much work work. > >> > >>Otherwise, here's what I'm seeing in your diagram: > >> > >> [tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m > >> > >>[/ref] > >[tTextTab_Text] > > > >[ref] > >>A title (A) contains 1 or more chapters A_Ch1, A_Ch2, ... unique to > >>that title. Each chapter consists of a text, which may be the text > >>of other chapters for this or other titles. Each text contains 1 or[/ref][/ref][/ref] more[ref][ref][ref] > >>paragraphs, unique to that text: > >> > >>title A > >> A_Ch1 = Txt1098 > >> Paragraph1098_1 > >> Paragraph1098_2 > >> A_Ch2 = Txt4437 > >> Paragraph4437_1 > >> Paragraph4437_2 > >> Paragraph4437_3 > >> A_Ch3 = Txt1098 > >> Paragraph1098_1 > >> Paragraph1098_2 > >>title B > >> B_Ch1 = Txt1098 > >> Paragraph1098_1 > >> Paragraph1098_2 > >> B_Ch2 = Txt1037 > >> Paragraph1037_1 > >> > >> > >>You want the titles where the title contains 'health' > >>or one of the title's chapters' texts' paragraphs contains > >>'health'. > >> > >>Can we see the table structures with indexes? There must > >>be some reason why the comparison against 'health' is being > >>delayed so much. Also, what percentage of Paragraphs, > >>corresponding to what percentage of texts, and what percentage > >>of Chapters, contain 'health'? > >> > >>What are typical values of m? It would help to know, say, that > >>a typical text contains 200 paragraphs, and more or less all of > >>them or none of them contain 'health'. > >> > >>SK > >> > >> > >> > >>Curtis Wensley wrote: > >> > >> > >> > >>>I forgot to mention, that it is not only a count that i'm after.. I[/ref][/ref][/ref] just[ref][ref][ref] > >>>did that because I didn't want to scroll through 600 hits. > >>> > >>>The results of this query are output directly to the end-user in a grid > >>>control. > >>> > >>>A union wouldn't work (unless I used distinct I guess, but that's[/ref][/ref][/ref] ugly)..[ref][ref][ref] > >>> > >>>[/ref] > >as > > > >[ref] > >>>it can possibly give more results than there really is... Also,[/ref][/ref][/ref] unions[ref][ref][ref] > >>>won't work because there is the possibility of a very complex boolean > >>> > >>>[/ref] > >search > > > >[ref] > >>>on separate fields (of the end-users choosing) which will not give the > >>>correct results.. > >>> > >>>I really am beginning to think this is a BUG in SQL Server.. ): > >>> > >>>"oj" wrote in message > >>>news:phx.gbl... > >>> > >>> > >>> > >>> > >>>>hows this... > >>>> > >>>>select sum(cnt) as cnt > >>>> > >>>> > >>>>from ( > >>> > >>> > >>>>SELECT count(*) as cnt FROM tContent > >>>>WHERE > >>>>tContent.fSystemKey in ( > >>>> SELECT tContent_TextTab.fTemplateKey FROM (( > >>>> tContent_TextTab > >>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = > >>>>tContent_TextTab.fTextTab) > >>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = > >>>>tTextTab.fSystemKey) > >>>> where contains(tTextTab_Text.fText, N'"health"') > >>>> ) > >>>>union all > >>>>SELECT count(*) as cnt FROM tContent > >>>>WHERE > >>>>contains(tContent.fArtBkTitle, N'"health"') > >>>>)Derived > >>>> > >>>>-- > >>>>-oj > >>>>RAC v2.2 & QALite! > >>>>http://www.rac4sql.net > >>>> > >>>> > >>>>"Curtis Wensley" wrote in message > >>>>news:phx.gbl... > >>>> > >>>> > >>>> > >>>> > >>>>>First, sorry for the cross post, but I think I posted on the wrong > >>>>>newsgroup.. (; > >>>>> > >>>>>Here is my situation. Sql2k, Win2k, latest service packs. I have a > >>>>> > >>>>> > >>>>> > >>>>> > >>>table > >>> > >>> > >>> > >>> > >>>>>structure like: > >>>>> > >>>>>[tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m > >>>>> > >>>>> > >>>>> > >>>>> > >>>[tTextTab_Text] > >>> > >>> > >>> > >>> > >>>>>[tContent] has the Title (nvarchar), and [tTextTab_Text] has complete > >>>>>article text in an ntext column. There's under 40k records in the > >>>>> > >>>>> > >>>>> > >>>>> > >>>>table(s). > >>>> > >>>> > >>>> > >>>> > >>>>>I cannot seem to get this query to go fast (takes over 5 minutes): > >>>>> > >>>>>SELECT count(*) FROM tContent > >>>>>WHERE > >>>>>tContent.fSystemKey in ( > >>>>> SELECT tContent_TextTab.fTemplateKey FROM (( > >>>>> tContent_TextTab > >>>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = > >>>>>tContent_TextTab.fTextTab) > >>>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = > >>>>>tTextTab.fSystemKey) > >>>>> where contains(tTextTab_Text.fText, N'"health"') > >>>>> ) > >>>>>or contains(tContent.fArtBkTitle, N'"health"') > >>>>> > >>>>> > >>>>>If I take the LAST line out, or change it to an AND, it only takes a > >>>>> > >>>>> > >>>>> > >>>>> > >>>>matter > >>>> > >>>> > >>>> > >>>> > >>>>>of a few *seconds* to produce results. The main purpose is to[/ref][/ref][/ref] search[ref][ref][ref] > >>>>> > >>>>> > >>>>> > >>>>> > >>>>both > >>>> > >>>> > >>>> > >>>> > >>>>>the title and the article text at the same time, using an OR. I[/ref][/ref][/ref] have[ref][ref][ref] > >>>>> > >>>>> > >>>>> > >>>>> > >>>>tried > >>>> > >>>> > >>>> > >>>> > >>>>>using EXISTS instead of IN, but it produces the same result. I[/ref][/ref][/ref] cannot[ref][ref][ref] > >>>>> > >>>>> > >>>>> > >>>>> > >>>>join > >>>> > >>>> > >>>> > >>>> > >>>>>the tables directly as this will change the result set.. > >>>>> > >>>>>I have attached the estimated execution plan (which looks fine), and > >>>>> > >>>>>[/ref] > >the > > > >[ref] > >>>>>actual execution plan. One interesting note is the # of rows in the > >>>>> > >>>>> > >>>>> > >>>>> > >>>table > >>> > >>> > >>> > >>> > >>>>>spool!! The amount of rows from the subquery is only about 9000 > >>>>> > >>>>>[/ref] > >records > > > >[ref] > >>>>>(like the estimated plan shows).. but the temporary table is getting > >>>>> > >>>>> > >>>>> > >>>>> > >>>over > >>> > >>> > >>> > >>> > >>>>>180 MILLION records, and scanning through it many times... I would > >>>>> > >>>>> > >>>>> > >>>>> > >>>THINK > >>> > >>> > >>> > >>> > >>>>>this is a bug in SQL, but not too sure.. > >>>>> > >>>>>Has anyone run into this before? I would really like to know how to > >>>>> > >>>>>[/ref] > >get > > > >[ref] > >>>>>around this bug or 'feature' of sql server so I can search these[/ref][/ref][/ref] fields[ref][ref][ref] > >>>>> > >>>>> > >>>>> > >>>>> > >>>in > >>> > >>> > >>> > >>> > >>>>>the single query.. > >>>>> > >>>>>Thanks, > >>>>>Curtis. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>> > >>> > >>> > >>>[/ref] > > > > > > > >[/ref] >[/ref] [allowsmilie] => 1 [showsignature] => 0 [ipaddress] => [iconid] => 0 [visible] => 1 [attach] => 0 [infraction] => 0 [reportthreadid] => 0 [isusenetpost] => 1 [msgid] => [ref] => <#R1vIeRXDHA.1492@TK2MSFTNGP12.phx.gbl> <3F32BFC4.6010603@drew.edu> [htmlstate] => on_nl2br [postusername] => Curtis [ip] => [isdeleted] => 0 [usergroupid] => [membergroupids] => [displaygroupid] => [password] => [passworddate] => [email] => [styleid] => [parentemail] => [homepage] => [icq] => [aim] => [yahoo] => [msn] => [skype] => [showvbcode] => [showbirthday] => [usertitle] => [customtitle] => [joindate] => [daysprune] => [lastvisit] => [lastactivity] => [lastpost] => [lastpostid] => [posts] => [reputation] => [reputationlevelid] => [timezoneoffset] => [pmpopup] => [avatarid] => [avatarrevision] => [profilepicrevision] => [sigpicrevision] => [options] => [akvbghsfs_optionsfield] => [birthday] => [birthday_search] => [maxposts] => [startofweek] => [referrerid] => [languageid] => [emailstamp] => [threadedmode] => [autosubscribe] => [pmtotal] => [pmunread] => [salt] => [ipoints] => [infractions] => [warnings] => [infractiongroupids] => [infractiongroupid] => [adminoptions] => [profilevisits] => [friendcount] => [friendreqcount] => [vmunreadcount] => [vmmoderatedcount] => [socgroupinvitecount] => [socgroupreqcount] => [pcunreadcount] => [pcmoderatedcount] => [gmmoderatedcount] => [assetposthash] => [fbuserid] => [fbjoindate] => [fbname] => [logintype] => [fbaccesstoken] => [newrepcount] => [vbseo_likes_in] => [vbseo_likes_out] => [vbseo_likes_unread] => [temp] => [field1] => [field2] => [field3] => [field4] => [field5] => [subfolders] => [pmfolders] => [buddylist] => [ignorelist] => [signature] => [searchprefs] => [rank] => [icontitle] => [iconpath] => [avatarpath] => [hascustomavatar] => 0 [avatardateline] => [avwidth] => [avheight] => [edit_userid] => [edit_username] => [edit_dateline] => [edit_reason] => [hashistory] => [pagetext_html] => [hasimages] => [signatureparsed] => [sighasimages] => [sigpic] => [sigpicdateline] => [sigpicwidth] => [sigpicheight] => [postcount] => 14 [islastshown] => [isfirstshown] => [attachments] => [allattachments] => ) --> m >>and m <--> 1 relations is where the mess is, and those joining keys >>ought to be in each others' indexes, I think. >> >>Also, since fArtBkTitle is so small, the full-text on that column >>has to be trivial, and it seems like a UNION query with the >>separate conditions would be in order. If for some strange reason >>that doesn't work well, create a temp table, insert the results of >>one condition, then the other, then select distinct. Finding the distinct >>rows from the two queries should be no work at all. >> >>Finally, do you have to use WITH NOCHECK? I'm not sure >>if it applies here, but there are situations where when NOCHECK >>makes an index unusable for optimization. It's worth getting rid >>of that if you have it. >> >>SK >> >> >>Curtis Wensley wrote: >> >> >>[ref] >>>Hi Steve, >>> >>>Thanks for your quick response! The DISTINCT was just a test, it gives >>> >>>[/ref][/ref] >the > >[ref][ref] >>>same lack of performance either way.... also tried 'group by' on the >>>subquery to try and eliminate the 180 million records in the table spool, >>>but that didn't work either.. >>> >>>Also, you are right in the structure of the database.. however, as the >>>database sits right now (it was converted from another database), there's >>>typically only ONE chapter, and ONE paragraph occurence for each >>>book/article. The paragraphs/chapters would be split up for new records, >>>and manually for existing ones at a later time. >>> >>>Doing some searches finds this: >>> >>>- 9902 out of 37762 records in tTextTab_Text (fText) contains 'health' >>>(paragraphs) >>>- tTextTab has 37763 records (chapters) >>>- 679 out of 46910 records in tContent (fArtBkTitle) contains 'health' >>>(books/articles) >>> >>>I am GUESSING that there might be 180 million ocurrences of HEALTH in all >>> >>>[/ref][/ref] >of > >[ref][ref] >>>the text perhaps.. Though another perplexing thing is if I use >>> >>>[/ref][/ref] >CONTAINSTABLE > >[ref][ref] >>>(to get the RANK), it jumps up to a whopping 300+ MILLION records in the >>>table spool... which is weird because it's the same search.. >>> >>>I've attached a script that can be used to create the table(s) and their >>>indecies (including full-text).. I stripped out any non-pertinent stuff >>> >>>[/ref][/ref] >to > >[ref][ref] >>>keep it simple. >>> >>>Thanks, >>> >>>Curtis. >>> >>> >>>"Steve Kass" wrote in message >>>news:edu... >>> >>> >>> >>> >>>>Curtis, >>>> >>>> First, the query plans show SELECT DISTINCT in the subquery. Do >>>>you see a performance difference without it? IN (select ...) should >>>>be the same as IN (select distinct ...), but materializing a (select >>>>distinct) >>>>may be much work work. >>>> >>>>Otherwise, here's what I'm seeing in your diagram: >>>> >>>> [tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m >>>> >>>> >>>> >>>> >>>[tTextTab_Text] >>> >>> >>> >>> >>>>A title (A) contains 1 or more chapters A_Ch1, A_Ch2, ... unique to >>>>that title. Each chapter consists of a text, which may be the text >>>>of other chapters for this or other titles. Each text contains 1 or >>>> >>>>[/ref][/ref] >more > >[ref][ref] >>>>paragraphs, unique to that text: >>>> >>>>title A >>>> A_Ch1 = Txt1098 >>>> Paragraph1098_1 >>>> Paragraph1098_2 >>>> A_Ch2 = Txt4437 >>>> Paragraph4437_1 >>>> Paragraph4437_2 >>>> Paragraph4437_3 >>>> A_Ch3 = Txt1098 >>>> Paragraph1098_1 >>>> Paragraph1098_2 >>>>title B >>>> B_Ch1 = Txt1098 >>>> Paragraph1098_1 >>>> Paragraph1098_2 >>>> B_Ch2 = Txt1037 >>>> Paragraph1037_1 >>>> >>>> >>>>You want the titles where the title contains 'health' >>>>or one of the title's chapters' texts' paragraphs contains >>>>'health'. >>>> >>>>Can we see the table structures with indexes? There must >>>>be some reason why the comparison against 'health' is being >>>>delayed so much. Also, what percentage of Paragraphs, >>>>corresponding to what percentage of texts, and what percentage >>>>of Chapters, contain 'health'? >>>> >>>>What are typical values of m? It would help to know, say, that >>>>a typical text contains 200 paragraphs, and more or less all of >>>>them or none of them contain 'health'. >>>> >>>>SK >>>> >>>> >>>> >>>>Curtis Wensley wrote: >>>> >>>> >>>> >>>> >>>> >>>>>I forgot to mention, that it is not only a count that i'm after.. I >>>>> >>>>>[/ref][/ref] >just > >[ref][ref] >>>>>did that because I didn't want to scroll through 600 hits. >>>>> >>>>>The results of this query are output directly to the end-user in a grid >>>>>control. >>>>> >>>>>A union wouldn't work (unless I used distinct I guess, but that's >>>>> >>>>>[/ref][/ref] >ugly).. > >[ref][ref] >>>>> >>>>> >>>as >>> >>> >>> >>> >>>>>it can possibly give more results than there really is... Also, >>>>> >>>>>[/ref][/ref] >unions > >[ref][ref] >>>>>won't work because there is the possibility of a very complex boolean >>>>> >>>>> >>>>> >>>>> >>>search >>> >>> >>> >>> >>>>>on separate fields (of the end-users choosing) which will not give the >>>>>correct results.. >>>>> >>>>>I really am beginning to think this is a BUG in SQL Server.. ): >>>>> >>>>>"oj" wrote in message >>>>>news:phx.gbl... >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>hows this... >>>>>> >>>>>>select sum(cnt) as cnt >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>from ( >>>>> >>>>> >>>>> >>>>> >>>>>>SELECT count(*) as cnt FROM tContent >>>>>>WHERE >>>>>>tContent.fSystemKey in ( >>>>>> SELECT tContent_TextTab.fTemplateKey FROM (( >>>>>> tContent_TextTab >>>>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = >>>>>>tContent_TextTab.fTextTab) >>>>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = >>>>>>tTextTab.fSystemKey) >>>>>> where contains(tTextTab_Text.fText, N'"health"') >>>>>> ) >>>>>>union all >>>>>>SELECT count(*) as cnt FROM tContent >>>>>>WHERE >>>>>>contains(tContent.fArtBkTitle, N'"health"') >>>>>>)Derived >>>>>> >>>>>>-- >>>>>>-oj >>>>>>RAC v2.2 & QALite! >>>>>>http://www.rac4sql.net >>>>>> >>>>>> >>>>>>"Curtis Wensley" wrote in message >>>>>>news:phx.gbl... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>First, sorry for the cross post, but I think I posted on the wrong >>>>>>>newsgroup.. (; >>>>>>> >>>>>>>Here is my situation. Sql2k, Win2k, latest service packs. I have a >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>table >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>structure like: >>>>>>> >>>>>>>[tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>[tTextTab_Text] >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>[tContent] has the Title (nvarchar), and [tTextTab_Text] has complete >>>>>>>article text in an ntext column. There's under 40k records in the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>table(s). >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>I cannot seem to get this query to go fast (takes over 5 minutes): >>>>>>> >>>>>>>SELECT count(*) FROM tContent >>>>>>>WHERE >>>>>>>tContent.fSystemKey in ( >>>>>>> SELECT tContent_TextTab.fTemplateKey FROM (( >>>>>>> tContent_TextTab >>>>>>> INNER JOIN tTextTab ON tTextTab.fSystemKey = >>>>>>>tContent_TextTab.fTextTab) >>>>>>> INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = >>>>>>>tTextTab.fSystemKey) >>>>>>> where contains(tTextTab_Text.fText, N'"health"') >>>>>>> ) >>>>>>>or contains(tContent.fArtBkTitle, N'"health"') >>>>>>> >>>>>>> >>>>>>>If I take the LAST line out, or change it to an AND, it only takes a >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>matter >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>of a few *seconds* to produce results. The main purpose is to >>>>>>> >>>>>>>[/ref][/ref] >search > >[ref][ref] >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>both >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>the title and the article text at the same time, using an OR. I >>>>>>> >>>>>>>[/ref][/ref] >have > >[ref][ref] >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>tried >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>using EXISTS instead of IN, but it produces the same result. I >>>>>>> >>>>>>>[/ref][/ref] >cannot > >[ref][ref] >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>join >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>the tables directly as this will change the result set.. >>>>>>> >>>>>>>I have attached the estimated execution plan (which looks fine), and >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>the >>> >>> >>> >>> >>>>>>>actual execution plan. One interesting note is the # of rows in the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>table >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>spool!! The amount of rows from the subquery is only about 9000 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>records >>> >>> >>> >>> >>>>>>>(like the estimated plan shows).. but the temporary table is getting >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>over >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>180 MILLION records, and scanning through it many times... I would >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>THINK >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>this is a bug in SQL, but not too sure.. >>>>>>> >>>>>>>Has anyone run into this before? I would really like to know how to >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>get >>> >>> >>> >>> >>>>>>>around this bug or 'feature' of sql server so I can search these >>>>>>> >>>>>>>[/ref][/ref] >fields > >[ref][ref] >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>in >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>>the single query.. >>>>>>> >>>>>>>Thanks, >>>>>>>Curtis. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> >>>[/ref][/ref] > > > >[/ref] [allowsmilie] => 1 [showsignature] => 0 [ipaddress] => [iconid] => 0 [visible] => 1 [attach] => 0 [infraction] => 0 [reportthreadid] => 0 [isusenetpost] => 1 [msgid] => [ref] => <#R1vIeRXDHA.1492@TK2MSFTNGP12.phx.gbl> <3F32BFC4.6010603@drew.edu> [htmlstate] => on_nl2br [postusername] => Steve [ip] => skass@drew.edu [isdeleted] => 0 [usergroupid] => [membergroupids] => [displaygroupid] => [password] => [passworddate] => [email] => [styleid] => [parentemail] => [homepage] => [icq] => [aim] => [yahoo] => [msn] => [skype] => [showvbcode] => [showbirthday] => [usertitle] => [customtitle] => [joindate] => [daysprune] => [lastvisit] => [lastactivity] => [lastpost] => [lastpostid] => [posts] => [reputation] => [reputationlevelid] => [timezoneoffset] => [pmpopup] => [avatarid] => [avatarrevision] => [profilepicrevision] => [sigpicrevision] => [options] => [akvbghsfs_optionsfield] => [birthday] => [birthday_search] => [maxposts] => [startofweek] => [referrerid] => [languageid] => [emailstamp] => [threadedmode] => [autosubscribe] => [pmtotal] => [pmunread] => [salt] => [ipoints] => [infractions] => [warnings] => [infractiongroupids] => [infractiongroupid] => [adminoptions] => [profilevisits] => [friendcount] => [friendreqcount] => [vmunreadcount] => [vmmoderatedcount] => [socgroupinvitecount] => [socgroupreqcount] => [pcunreadcount] => [pcmoderatedcount] => [gmmoderatedcount] => [assetposthash] => [fbuserid] => [fbjoindate] => [fbname] => [logintype] => [fbaccesstoken] => [newrepcount] => [vbseo_likes_in] => [vbseo_likes_out] => [vbseo_likes_unread] => [temp] => [field1] => [field2] => [field3] => [field4] => [field5] => [subfolders] => [pmfolders] => [buddylist] => [ignorelist] => [signature] => [searchprefs] => [rank] => [icontitle] => [iconpath] => [avatarpath] => [hascustomavatar] => 0 [avatardateline] => [avwidth] => [avheight] => [edit_userid] => [edit_username] => [edit_dateline] => [edit_reason] => [hashistory] => [pagetext_html] => [hasimages] => [signatureparsed] => [sighasimages] => [sigpic] => [sigpicdateline] => [sigpicwidth] => [sigpicheight] => [postcount] => 17 [islastshown] => [isfirstshown] => [attachments] => [allattachments] => ) --> VERY Slow contains search in a subquery using 'or' - Microsoft SQL / MS SQL Server

VERY Slow contains search in a subquery using 'or' - Microsoft SQL / MS SQL Server

hows this... select sum(cnt) as cnt from ( SELECT count(*) as cnt FROM tContent WHERE tContent.fSystemKey in ( SELECT tContent_TextTab.fTemplateKey FROM (( tContent_TextTab INNER JOIN tTextTab ON tTextTab.fSystemKey = tContent_TextTab.fTextTab) INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey = tTextTab.fSystemKey) where contains(tTextTab_Text.fText, N'"health"') ) union all SELECT count(*) as cnt FROM tContent WHERE contains(tContent.BkTitle, N'"health"') )Derived -- -oj RAC v2.2 & QALite! http://www.rac4sql.net "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message news:phx.gbl...  table(s).  matter  both  tried  join ...

  1. #1

    Default Re: VERY Slow contains search in a subquery using 'or'

    hows this...

    select sum(cnt) as cnt
    from (
    SELECT count(*) as cnt FROM tContent
    WHERE
    tContent.fSystemKey in (
    SELECT tContent_TextTab.fTemplateKey FROM ((
    tContent_TextTab
    INNER JOIN tTextTab ON tTextTab.fSystemKey =
    tContent_TextTab.fTextTab)
    INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey =
    tTextTab.fSystemKey)
    where contains(tTextTab_Text.fText, N'"health"')
    )
    union all
    SELECT count(*) as cnt FROM tContent
    WHERE
    contains(tContent.BkTitle, N'"health"')
    )Derived

    --
    -oj
    RAC v2.2 & QALite!
    http://www.rac4sql.net


    "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message
    news:phx.gbl... 
    table(s). 
    matter 
    both 
    tried 
    join 


    oj Guest

  2. #2

    Default Re: VERY Slow contains search in a subquery using 'or'

    I'm no full-text expert, but won't you need to
    subtract out the number of rows counted in both
    halves of the UNION to get the right answer?

    -- Steve Kass
    -- Drew University
    -- Ref: EB1C60AD-391E-43F0-80D5-70DFDDF38013


    oj wrote:
     
    >table(s).
    >

    >matter
    >

    >both
    >

    >tried
    >

    >join
    >

    >
    >
    >
    >[/ref]

    Steve Guest

  3. #3

    Default Re: VERY Slow contains search in a subquery using 'or'

    come on, math prof...should only need to subtract in the second half (i.e.
    the OR where it's already counted in the first half) part...hehehhe...see if
    you can find way to subtract it for us.

    --
    -oj


    "Steve Kass" <edu> wrote in message
    news:phx.gbl... 



    oj Guest

  4. #4

    Default Re: VERY Slow contains search in a subquery using 'or'

    I forgot to mention, that it is not only a count that i'm after.. I just
    did that because I didn't want to scroll through 600 hits.

    The results of this query are output directly to the end-user in a grid
    control.

    A union wouldn't work (unless I used distinct I guess, but that's ugly).. as
    it can possibly give more results than there really is... Also, unions
    won't work because there is the possibility of a very complex boolean search
    on separate fields (of the end-users choosing) which will not give the
    correct results..

    I really am beginning to think this is a BUG in SQL Server.. ):

    "oj" <com> wrote in message
    news:phx.gbl... [/ref]
    table [/ref]
    [tTextTab_Text] 
    > table(s). 
    > matter 
    > both 
    > tried 
    > join [/ref]
    table [/ref]
    over [/ref]
    THINK [/ref]
    in 
    >
    >[/ref]


    Curtis Guest

  5. #5

    Default Re: VERY Slow contains search in a subquery using 'or'

    if your goal is to return

    select *
    from tb
    where c1=123 or c2=456

    then

    select *
    from tb
    where c1=123
    union
    select *
    from tb
    where c2=456

    would give you the same #rows. "union" does a distinct by design.

    if you believe this is a bug, you could contact PSS and have them take a
    look. if it's truely a bug, they will refund you.

    --
    -oj
    RAC v2.2 & QALite!
    http://www.rac4sql.net


    "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message
    news:#phx.gbl... 
    as 
    search [/ref]
    > table [/ref]
    > [tTextTab_Text] 
    > > table(s). 
    > > matter 
    > > both 
    > > tried 
    > > join [/ref][/ref]
    the [/ref]
    > table [/ref][/ref]
    records [/ref]
    > over [/ref]
    > THINK [/ref][/ref]
    get [/ref][/ref]
    fields 
    > >
    > >[/ref]
    >
    >[/ref]


    oj Guest

  6. #6

    Default Re: VERY Slow contains search in a subquery using 'or'

    Curtis,

    First, the query plans show SELECT DISTINCT in the subquery. Do
    you see a performance difference without it? IN (select ...) should
    be the same as IN (select distinct ...), but materializing a (select
    distinct)
    may be much work work.

    Otherwise, here's what I'm seeing in your diagram:

    [tContent] 1---m [tContent_TextTab] m---1 [tTextTab] 1---m [tTextTab_Text]

    A title (A) contains 1 or more chapters A_Ch1, A_Ch2, ... unique to
    that title. Each chapter consists of a text, which may be the text
    of other chapters for this or other titles. Each text contains 1 or more
    paragraphs, unique to that text:

    title A
    A_Ch1 = Txt1098
    Paragraph1098_1
    Paragraph1098_2
    A_Ch2 = Txt4437
    Paragraph4437_1
    Paragraph4437_2
    Paragraph4437_3
    A_Ch3 = Txt1098
    Paragraph1098_1
    Paragraph1098_2
    title B
    B_Ch1 = Txt1098
    Paragraph1098_1
    Paragraph1098_2
    B_Ch2 = Txt1037
    Paragraph1037_1


    You want the titles where the title contains 'health'
    or one of the title's chapters' texts' paragraphs contains
    'health'.

    Can we see the table structures with indexes? There must
    be some reason why the comparison against 'health' is being
    delayed so much. Also, what percentage of Paragraphs,
    corresponding to what percentage of texts, and what percentage
    of Chapters, contain 'health'?

    What are typical values of m? It would help to know, say, that
    a typical text contains 200 paragraphs, and more or less all of
    them or none of them contain 'health'.

    SK



    Curtis Wensley wrote:
     [/ref]
    >table
    >
    > [/ref]
    >[tTextTab_Text]
    >

    >>table(s).
    >>
    >> 
    >>matter
    >>
    >> 
    >>both
    >>
    >> 
    >>tried
    >>
    >> 
    >>join
    >>
    >> [/ref]
    >table
    >
    > [/ref]
    >over
    >
    > [/ref]
    >THINK
    >
    > [/ref]
    >in
    >

    >>
    >>[/ref]
    >
    >
    >
    >[/ref]

    Steve Guest

  7. #7

    Default Re: VERY Slow contains search in a subquery using 'or'

    > would give you the same #rows.

    Nope.

    That union would return the same number of rows as:

    select distinct *
    from tb
    where c1=123 or c2=456

    which sometimes will produce fewer rows then your first query would produce.

    BTW, had you used UNION ALL instead of UNION
    the query would sometimes produce more rows then your first query would
    produce.

    Bye,
    Delbert Glass

    "oj" <com> wrote in message
    news:phx.gbl... [/ref]
    just [/ref]
    ugly).. 
    > search 
    > > table 
    > > [tTextTab_Text] [/ref][/ref]
    complete [/ref][/ref]
    search [/ref][/ref]
    have [/ref][/ref]
    cannot [/ref]
    > the 
    > > table [/ref]
    > records 
    > > over 
    > > THINK [/ref]
    > get [/ref]
    > fields 
    > >
    > >[/ref]
    >
    >[/ref]


    Delbert Guest

  8. #8

    Default Re: VERY Slow contains search in a subquery using 'or'

    perhaps...

    create table delbert(c1 int, c2 int)

    insert delbert select 123,123
    union all select 456,456
    union all select 456,123
    union all select 123,456
    union all select 123,789
    union all select 789,456

    select *
    from delbert
    where c1=123
    union
    select *
    from delbert
    where c2=456

    select *
    from delbert
    where c1=123 or c2=456

    drop table delbert


    --
    -oj
    RAC v2.2 & QALite!
    http://www.rac4sql.net


    "Delbert Glass" <com> wrote in message
    news:phx.gbl... 
    >
    > Nope.
    >
    > That union would return the same number of rows as:
    >
    > select distinct *
    > from tb
    > where c1=123 or c2=456
    >
    > which sometimes will produce fewer rows then your first query would[/ref]
    produce. 



    oj Guest

  9. #9

    Default Re: VERY Slow contains search in a subquery using 'or'

    Give these a try.

    SELECT count(*)
    FROM tContent
    WHERE tContent.fSystemKey IN (
    SELECT tContent_TextTab.fTemplateKey
    FROM
    (
    tContent_TextTab
    INNER JOIN tTextTab ON tTextTab.fSystemKey = tContent_TextTab.fTextTab
    )
    INNER JOIN tTextTab_Text ON tTextTab_Text.fTemplateKey =
    tTextTab.fSystemKey
    WHERE CONTAINS(tTextTab_Text.fText, N'"health"')
    )
    OR CONTAINS(tContent.BkTitle, N'"health"')


    SELECT count(*)
    FROM tContent
    WHERE tContent.fSystemKey IN (
    SELECT DT.fTemplateKey
    FROM (
    SELECT tContent_TextTab.fTemplateKey
    FROM
    (
    tContent_TextTab
    INNER JOIN tTextTab ON tTextTab.fSystemKey = tContent_TextTab.fTextTab
    )
    INNER JOIN
    tTextTab_Text ON tTextTab_Text.fTemplateKey = tTextTab.fSystemKey
    WHERE CONTAINS(tTextTab_Text.fText, N'"health"')
    ) DT
    )
    OR CONTAINS(tContent.BkTitle, N'"health"')

    Bye,
    Delbert Glass

    "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message
    news:phx.gbl... 
    table(s). 
    matter 
    both 
    tried 
    join 


    Delbert Guest

  10. #10

    Default Re: VERY Slow contains search in a subquery using 'or'

    Sometimes does not mean always.

    Insert the data a second time and have a second look at the row counts.

    Bye,
    Delbert Glass

    "oj" <com> wrote in message
    news:phx.gbl... 
    > >
    > > Nope.
    > >
    > > That union would return the same number of rows as:
    > >
    > > select distinct *
    > > from tb
    > > where c1=123 or c2=456
    > >
    > > which sometimes will produce fewer rows then your first query would[/ref]
    > produce. 
    >
    >
    >[/ref]


    Delbert Guest

  11. #11

    Default Re: VERY Slow contains search in a subquery using 'or'

    Hi Delbert,

    Thanks for your suggestions, but unfortunately they were about the same..
    though they did change around what SQL was doing a bit, but still had the
    extreemly huge table spool, and slow performance..

    Cheers,
    Curtis.

    "Delbert Glass" <com> wrote in message
    news:phx.gbl... 
    tContent_TextTab.fTextTab 
    tContent_TextTab.fTextTab [/ref]
    table [/ref]
    [tTextTab_Text] 
    > table(s). 
    > matter 
    > both 
    > tried 
    > join [/ref]
    table [/ref]
    over [/ref]
    THINK [/ref]
    in 
    >
    >[/ref]


    Curtis Guest

  12. #12

    Default Re: VERY Slow contains search in a subquery using 'or'

    I don't have fulltext installed where I am now
    but I'll take a look at the table and index definitions to
    see if anything explains the trouble and try to look at the
    fulltext business later.

    SK

    Curtis Wensley wrote:
     
    >[tTextTab_Text]
    >
    > [/ref]
    >as
    >
    > [/ref]
    >search
    >
    > [/ref]
    >the
    >
    > [/ref]
    >records
    >
    > [/ref]
    >get
    >
    > [/ref]
    >
    >
    >
    >[/ref]

    Steve Guest

  13. #13

    Default Re: VERY Slow contains search in a subquery using 'or'

    Curtis,

    A couple of things right away:

    I don't see any index on tContent_TextTab.fTextTab, which
    is a foreign key - I didn't look at the query plan, but generally
    if you're joining the FK column, indexes on both sides can help.

    I would also try changing

    CREATE INDEX [iContent_TextTab_IxP] ON
    [dbo].[tContent_TextTab]([fTemplateKey]) ON [PRIMARY]

    to

    CREATE INDEX [iContent_TextTab_IxP] ON
    [dbo].[tContent_TextTab]([fTemplateKey],[fTextTab]) ON [PRIMARY]

    and be sure to include [fTemplateKey] as a second column of the
    (missing?) index. This might avoid some bookmarks, though I
    haven't looked too closely. That middle table between the 1<--> m
    and m <--> 1 relations is where the mess is, and those joining keys
    ought to be in each others' indexes, I think.

    Also, since BkTitle is so small, the full-text on that column
    has to be trivial, and it seems like a UNION query with the
    separate conditions would be in order. If for some strange reason
    that doesn't work well, create a temp table, insert the results of
    one condition, then the other, then select distinct. Finding the distinct
    rows from the two queries should be no work at all.

    Finally, do you have to use WITH NOCHECK? I'm not sure
    if it applies here, but there are situations where when NOCHECK
    makes an index unusable for optimization. It's worth getting rid
    of that if you have it.

    SK


    Curtis Wensley wrote:
     
    >[tTextTab_Text]
    >
    > [/ref]
    >as
    >
    > [/ref]
    >search
    >
    > [/ref]
    >the
    >
    > [/ref]
    >records
    >
    > [/ref]
    >get
    >
    > [/ref]
    >
    >
    >
    >[/ref]

    Steve Guest

  14. #14

    Default Re: VERY Slow contains search in a subquery using 'or'

    Hmm, tried adding the index and modifying the existing one as you suggested
    (with both fields, in different order), but no change..

    I'm not explicitly using NOCHECK when the tables are actually created, that
    script was generated from Enterprise Manager.. NOCHECK must be the default?
    i'll try turning that off next...

    Using a union would be WAY too messy, as the user has control over the
    filter criteria (complete boolean expressions, with sub-expressions).. and
    is generated on the fly.. doing boolean expressions between different
    criteria would be very hard to achive using unions... (if it's even
    possible)..

    Perhaps a HINT would allow me to tell the optimizer to do it a specified
    way?

    Plus, from the execution plan, it's just storing what comes from the
    CONTAINS clause into the table spool.. nothing to do with my data
    structure yet.. ):

    Hmm.. any other thoughts?

    Curtis.

    "Steve Kass" <edu> wrote in message
    news:phx.gbl... [/ref]
    the [/ref]
    of [/ref]
    CONTAINSTABLE [/ref]
    to 
    > >[tTextTab_Text]
    > >
    > > [/ref][/ref]
    more [/ref][/ref]
    just [/ref][/ref]
    ugly).. 
    > >as
    > >
    > > [/ref][/ref]
    unions 
    > >search
    > >
    > > [/ref][/ref]
    search [/ref][/ref]
    have [/ref][/ref]
    cannot 
    > >the
    > >
    > > 
    > >records
    > >
    > > 
    > >get
    > >
    > > [/ref][/ref]
    fields 
    > >
    > >
    > >
    > >[/ref]
    >[/ref]


    Curtis Guest

  15. #15

    Default Re: VERY Slow contains search in a subquery using 'or'

    Hi Andrew,

    Thank you for the reply. I have tried such, and it works for this
    particular case. However, I don't think a union will solve my problem, as
    the user has control over the filter criteria (complete boolean expressions,
    with sub-expressions).. and is generated on the fly. Doing boolean
    expressions between different criteria would be very hard to achive using
    unions... (if it's even possible?)..

    I tried executing the query with about 1000 records in each of the tables...
    the execution plan was a little different, using sorting and a merge join,
    instead of a nested join with a huge table spool...

    Perhaps there's a way for me to force it to use a specific way of executing
    the query using a hint or something so that it will be faster?

    Curtis.

    "Andrew Cencini [MS]" <microsoft.com> wrote in message
    news:es6qC%phx.gbl... 
    unioned 
    rights. [/ref]
    table [/ref]
    [tTextTab_Text] 
    > table(s). 
    > matter 
    > both 
    > tried 
    > join [/ref]
    table [/ref]
    over [/ref]
    THINK [/ref]
    in 
    >
    >[/ref]


    Curtis Guest

  16. #16

    Default Re: VERY Slow contains search in a subquery using 'or'

    instead of using explicit hints you can effectively "order" the execution
    plan by splitting the query into #temp tables. i've had success with this
    strategy in eliminating large "eager spools" (which with ease might explode
    to 100+ mil rows!)

    HTH,
    </wqw>

    "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message
    news:phx.gbl... 
    expressions, 
    tables... 
    executing 
    > unioned 
    > rights. [/ref]
    > table [/ref]
    > [tTextTab_Text] 
    > > table(s). 
    > > matter 
    > > both 
    > > tried 
    > > join [/ref][/ref]
    the [/ref]
    > table [/ref][/ref]
    records [/ref]
    > over [/ref]
    > THINK [/ref][/ref]
    get [/ref][/ref]
    fields 
    > >
    > >[/ref]
    >
    >[/ref]


    Vlad Guest

  17. #17

    Default Re: VERY Slow contains search in a subquery using 'or'

    You do have the latest service packs, right?

    Can you put the title in as a "paragraph" so everything searched against
    is in the same place?

    SK

    Curtis Wensley wrote:
     [/ref]
    >the
    >
    > [/ref]
    >of
    >
    > [/ref]
    >CONTAINSTABLE
    >
    > [/ref]
    >to
    >
    > [/ref]
    >more
    >
    > [/ref]
    >just
    >
    > [/ref]
    >ugly)..
    >
    > [/ref]
    >unions
    >
    > [/ref]
    >search
    >
    > [/ref]
    >have
    >
    > [/ref]
    >cannot
    >
    > [/ref]
    >fields
    >
    > [/ref]
    >
    >
    >
    >[/ref]

    Steve Guest

  18. #18

    Default Re: VERY Slow contains search in a subquery using 'or'

    For AND between tables or columns use JOIN on the [KEY] column returned from
    CONTAINSTABLE. That should hopefully be some useful building blocks to
    start with -- you could project the key column from the table after joining
    the FT results to it and applying additional filters on those results, then
    UNION or JOIN those results and then join back to tContent.

    I guess the real challenge is putting together the logic to generate these
    statements but this to me seems to be a good path to look down.
    Thanks,
    --andrew

    Andrew Cencini
    Program Manger
    Microsoft Corp. - SQL Server Engine

    --
    This posting is provided "AS IS" with no warranties, and confers no rights.


    "Curtis Wensley" <cwensley at sydneyplus.com> wrote in message
    news:phx.gbl... 
    expressions, 
    tables... 
    executing 
    > unioned 
    > rights. [/ref]
    > table [/ref]
    > [tTextTab_Text] 
    > > table(s). 
    > > matter 
    > > both 
    > > tried 
    > > join [/ref][/ref]
    the [/ref]
    > table [/ref][/ref]
    records [/ref]
    > over [/ref]
    > THINK [/ref][/ref]
    get [/ref][/ref]
    fields 
    > >
    > >[/ref]
    >
    >[/ref]


    Andrew Guest

  19. #19

    Default Re: VERY Slow contains search in a subquery using 'or'

    Hi Andrew,

    I'm really not that eager to use union, for many reasons... First the
    complexity that it would take to build the filter criteria based on what the
    user selects, and second, because we already use a union for searching
    across multiple tables.. and this would mean there would be unions within
    unions.. (eek).. Also, if there were multiple 'or' conditions combined
    with 'and', I don't see how this would work if I join to the main query..

    The 'original' version of the statement actually did use CONTAINSTABLE,
    joining on the [Key]... this put 300+ MILLION records in the table spool,
    rather than 180+ million with just using CONTAINS()... I just used
    contains() here for simplicity.. (FYI, the full text index only has 260442
    unique keys, and an item count of 84674).. I don't know where it's getting
    the 300 million records from..

    I might not be understanding exactly what you are suggesting here though..
    this is my 'containstable' version:

    SELECT count(*) FROM
    tContent
    LEFT OUTER JOIN CONTAINSTABLE(tContent, BkTitle, N'"health"') FTT0
    ON (tContent.fSystemKey = FTT0.[Key])
    WHERE
    tContent.fSystemKey IN
    (
    SELECT tContent_TextTab.fTemplateKey FROM ((
    tContent_TextTab
    INNER JOIN tTextTab
    ON tTextTab.fSystemKey = tContent_TextTab.fTextTab)
    INNER JOIN tTextTab_Text
    ON tTextTab_Text.fTemplateKey = tTextTab.fSystemKey)
    INNER JOIN CONTAINSTABLE(tTextTab_Text, fText, N'"health"') FTT1
    ON (tTextTab_Text.fKey = FTT1.[Key])
    )
    OR tContent.fSystemKey = FTT0.[Key]

    If say, I the user also wanted to OR with the content on
    tTextTab.fChapterTitl, it might look something like:

    SELECT count(*) FROM
    tContent
    LEFT OUTER JOIN CONTAINSTABLE(tContent, BkTitle, N'"health"') FTT0
    ON (tContent.fSystemKey = FTT0.[Key])
    WHERE
    tContent.fSystemKey IN
    (
    SELECT tContent_TextTab.fTemplateKey FROM ((
    tContent_TextTab
    INNER JOIN tTextTab
    ON tTextTab.fSystemKey = tContent_TextTab.fTextTab)
    INNER JOIN tTextTab_Text
    ON tTextTab_Text.fTemplateKey = tTextTab.fSystemKey)
    INNER JOIN CONTAINSTABLE(tTextTab_Text, fText, N'"health"') FTT1
    ON (tTextTab_Text.fKey = FTT1.[Key])
    )
    OR
    tContent.fSystemKey IN
    (
    SELECT tContent_TextTab.fTemplateKey FROM ((
    tContent_TextTab
    INNER JOIN tTextTab
    ON tTextTab.fSystemKey = tContent_TextTab.fTextTab)
    INNER JOIN CONTAINSTABLE(tTextTab, fChapterTitl, N'"health"') FTT2
    ON (tTextTab.fSystemKey = FTT2.[Key])
    )
    OR tContent.fSystemKey = FTT0.[Key]

    It would be much easier if the queries being performed were static, but they
    are not.. I will look into using a union, but I don't think it would work
    in this situation (boolean expression):
    ((Title contains 'health' or Text contains 'health') and ChapterTitle
    contains 'health')

    Thanks!
    Curtis.


    "Andrew Cencini [MS]" <microsoft.com> wrote in message
    news:phx.gbl... 
    from 
    joining 
    then 
    rights. [/ref]
    as 
    > expressions, [/ref]
    using 
    > tables... [/ref]
    join, 
    > executing 
    > > unioned 
    > > rights. 
    > > table 
    > > [tTextTab_Text] [/ref][/ref]
    complete [/ref][/ref]
    search [/ref][/ref]
    have [/ref][/ref]
    cannot [/ref]
    > the 
    > > table [/ref]
    > records 
    > > over 
    > > THINK [/ref]
    > get [/ref]
    > fields 
    > >
    > >[/ref]
    >
    >[/ref]


    Curtis Guest

Similar Threads

  1. Best way to use subquery?
    By greg.scharlemann@gmail.com in forum MySQL
    Replies: 4
    Last Post: November 27th, 01:25 AM
  2. Subquery
    By Ruszaj in forum Coldfusion Database Access
    Replies: 2
    Last Post: November 2nd, 04:50 PM
  3. Help with subquery
    By Nalini in forum Dreamweaver AppDev
    Replies: 3
    Last Post: August 15th, 12:07 PM
  4. How use Subquery better ???
    By lubiel in forum Microsoft SQL / MS SQL Server
    Replies: 2
    Last Post: July 23rd, 07:47 PM
  5. subquery with more than 1 fields
    By Calvin in forum Microsoft SQL / MS SQL Server
    Replies: 2
    Last Post: June 30th, 05:53 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •