Is there a less resource-intensive or faster way of performing this query (which is partly based on another Stack Overflow question)? Currently it takes 0.008 seconds to search through only a dozen or so rows per table.
SELECT DISTINCT *
FROM (
(
SELECT DISTINCT ta.auto_id, li.address, li.title, GROUP_CONCAT( ta.tag ) , li.description, li.keyword, li.rating, li.timestamp
FROM tags AS ta
INNER JOIN links AS li ON ta.auto_id = li.auto_id
WHERE ta.user_id =1
AND (
ta.tag LIKE '%query%'
)
OR (
li.keyword LIKE '%query%'
)
GROUP BY li.auto_id
)
UNION DISTINCT (
SELECT DISTINCT auto_id, address, title, '', description, keyword, rating, `timestamp`
FROM links
WHERE user_id =1
AND (
keyword LIKE '%query%'
)
)
) AS total
GROUP BY total.auto_id
Thank you very much,
Ice
I would hope that the query optimizer does this for you, but just in case, in the first subquery you might try selecting from tags by user_id before doing the join. That should reduce the number of rows that have to be joined. You probably also want indexes on both auto_id and user_id.
SELECT DISTINCT *
FROM (
(SELECT ta.auto_id, li.address, li.title, GROUP_CONCAT( ta.tag ),
li.description, li.keyword, li.rating, li.timestamp
FROM (SELECT auto_id, tag FROM tags WHERE user_id = 1) AS ta
INNER JOIN links AS li ON ta.auto_id = li.auto_id
WHERE (ta.tag LIKE '%query%') OR (li.keyword LIKE '%query%')
GROUP BY li.auto_id
)
UNION (
SELECT auto_id, address, title, '', description, keyword, rating, `timestamp`
FROM links
WHERE user_id = 1 AND (keyword LIKE '%query%')
)
) AS total
GROUP BY total.auto_id
If you can use the MyISAM table format, try to use a full-text index and search on ta.tag and li.keyword.
Testing this on tables with dozens of rows won't necessarily tell you if there is a performance problem. A DBMS may use different strategies depending on the size of tables.
Try this on larger datasets to get a better assessment of whether there's a problem and just how serious it is.
It is difficult to be sure without the table definitions, but you might be able to rephrase the query as a simpler left join from LINKS to TAGS:
select li.auto_id,
address,
title,
group_concat(ta.tag),
description,
keyword,
rating,
timestamp
from links li
left join tags ta ON ta.auto_id = li.auto_id
where li.user_id = 1 and ( keyword like '%query%' or ta.tag like '%query%' )
group by li.auto_id;
The logic might need beefing up to cope with nulls in keyword or ta.tag - depending on the table definition.
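As a rough illustration of the null handling mentioned above, here is a minimal sketch that runs the same LEFT JOIN shape against SQLite through Python's sqlite3 module. SQLite is only a stand-in for MySQL, and the rows are invented sample data. It shows that a link with no tags is still returned (with a NULL tag list) when its keyword matches, and that filtering on ta.tag in the WHERE clause drops the non-matching tags of a link from the concatenated list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (auto_id INTEGER PRIMARY KEY, user_id INTEGER, keyword TEXT);
CREATE TABLE tags  (auto_id INTEGER, tag TEXT);
-- Invented sample rows, for illustration only.
INSERT INTO links VALUES (1, 1, 'query tips'), (2, 1, 'other'),
                         (3, 1, 'misc'),       (4, 1, 'query');
INSERT INTO tags  VALUES (1, 'sql'), (2, 'query-help'), (2, 'misc'), (3, 'x');
""")

rows = conn.execute("""
SELECT li.auto_id, GROUP_CONCAT(ta.tag)
FROM links li
LEFT JOIN tags ta ON ta.auto_id = li.auto_id
WHERE li.user_id = 1
  AND (li.keyword LIKE '%query%' OR ta.tag LIKE '%query%')
GROUP BY li.auto_id
ORDER BY li.auto_id
""").fetchall()

for row in rows:
    print(row)
```

Link 4 has no tags at all, so its tag list comes back as NULL. Link 2 matches only through its 'query-help' tag, so its other tag is filtered out before the GROUP_CONCAT runs. Whether that is acceptable depends on what the result is meant to show.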
The % wildcards are probably going to stop your query from being able to use the indexes, particularly the leading ones: searching for 'cat%' can still use an index, but '%cat%' can't. Unless your data set is small, that's probably fatal.
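The difference between a prefix pattern and a leading-wildcard pattern can be checked with the database's own plan output. The following is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL. The table, index name, and rows are invented, and the NOCASE collation is an SQLite-specific detail needed for its LIKE optimization. In MySQL the equivalent check would be EXPLAIN on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation lets SQLite's default (case-insensitive) LIKE use the index.
conn.execute(
    "CREATE TABLE links (auto_id INTEGER PRIMARY KEY, keyword TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX idx_links_keyword ON links (keyword)")
conn.executemany(
    "INSERT INTO links (keyword) VALUES (?)",
    [("cat",), ("catalog",), ("bobcat",), ("dog",)],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

prefix_sql = "SELECT keyword FROM links WHERE keyword LIKE 'cat%'"
infix_sql = "SELECT keyword FROM links WHERE keyword LIKE '%cat%'"

prefix_plan = plan(prefix_sql)   # expected to be an index range search
infix_plan = plan(infix_sql)     # expected to be a full scan

prefix_rows = sorted(r[0] for r in conn.execute(prefix_sql))
infix_rows = sorted(r[0] for r in conn.execute(infix_sql))

print(prefix_plan)
print(infix_plan)
```

The prefix pattern can be turned into a range lookup on the index, while the leading wildcard forces every row to be examined.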
I'd also check whether the OR logic is causing you trouble - I'm not sure whether the optimizer will be able to separately optimize the keyword and tag criteria. If it can't, it'll give up and brute-force it.
To re-iterate some of the other comments:
test with a bigger data set
try the components of this query first (there are about three separate queries in there) before trying to bolt them all together.
I would like to find a way to improve a query but it seems i've done it all. Let me give you some details.
Below is my query :
SELECT
`u`.`id` AS `id`,
`p`.`lastname` AS `lastname`,
`p`.`firstname` AS `firstname`,
COALESCE(`r`.`value`, 0) AS `rvalue`,
SUM(`rat`.`category` = 'A') AS `count_a`,
SUM(`rat`.`category` = 'B') AS `count_b`,
SUM(`rat`.`category` = 'C') AS `count_c`
FROM
`user` `u`
JOIN `user_customer` `uc` ON (`u`.`id` = `uc`.`user_id`)
JOIN `profile` `p` ON (`p`.`id` = `u`.`profile_id`)
JOIN `ad` FORCE INDEX (fk_ad_customer_idx) ON (`uc`.`customer_id` = `ad`.`customer_id`)
JOIN `ac` ON (`ac`.`id` = `ad`.`ac_id`)
JOIN `a` ON (`a`.`id` = `ac`.`a_id`)
JOIN `rat` ON (`rat`.`code` = `a`.`rat_code`)
LEFT JOIN `r` ON (`r`.`id` = `u`.`r_id`)
GROUP BY `u`.`id`
;
Note : Some table and column names are voluntarily hidden.
Now let me give you some volumetric data :
user => 6534 rows
user_customer => 12 923 rows
profile => 6511 rows
ad => 320 868 rows
ac => 4505 rows
a => 536 rows
rat => 6 rows
r => 3400 rows
And finally, my execution plan :
My query does currently run in around 1.3 to 1.7 seconds which is slow enough to annoy users of my application of course ... Also fyi result set is composed of 165 rows.
Is there a way I can improve this ?
Thanks.
EDIT 1 (answer to Rick James below) :
What are the speed and EXPLAIN when you don't use FORCE INDEX?
Surprisingly it gets faster when i don't use FORCE INDEX. To be honest, i don't really remember why i've done that change. I've probably found better results in terms of performance with it during one of my various tries and didn't remove it since.
When i don't use FORCE INDEX, it uses another index, ad_customer_ac_id_blocked_idx(customer_id, ac_id, blocked), and times are around 1.1 sec.
I don't really get it, because fk_ad_customer_idx(customer_id) is equivalent as far as indexing customer_id goes.
Get rid of FORCE INDEX. Even if it helped yesterday; it may hurt tomorrow.
Some of these indexes may be beneficial. (It is hard to predict; so simply add them all.)
a: (rat_code, id)
rat: (code, category)
ac: (a_id, id)
ad: (ac_id, customer_id)
ad: (customer_id, ac_id)
uc: (customer_id, user_id)
uc: (user_id, customer_id)
u: (profile_id, r_id, id)
(This assumes that id is the PRIMARY KEY of each table. Note that none have id first.) Most of the above are "covering".
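To make "covering" concrete, here is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL. The ad table is reduced to a few assumed columns, the rows are invented, and ad_customer_ac_idx is a hypothetical index name. The same query is planned once with a single-column index and once with a composite index that contains every column the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ad (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "ac_id INTEGER, blocked INTEGER)"
)
conn.executemany(
    "INSERT INTO ad (customer_id, ac_id, blocked) VALUES (?, ?, 0)",
    [(c, c * 10) for c in range(100)],
)

QUERY = "SELECT ac_id FROM ad WHERE customer_id = 7"

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Index on customer_id only: the row must still be fetched to read ac_id.
conn.execute("CREATE INDEX fk_ad_customer_idx ON ad (customer_id)")
single_column_plan = plan(QUERY)

# Composite index: both columns the query needs live in the index itself.
conn.execute("DROP INDEX fk_ad_customer_idx")
conn.execute("CREATE INDEX ad_customer_ac_idx ON ad (customer_id, ac_id)")
covering_plan = plan(QUERY)

print(single_column_plan)
print(covering_plan)
```

This may also be why the three-column index in the question beat the forced single-column one: it lets the join read ac_id without visiting the table rows.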
Another approach that sometimes helps: gather the SUMs before joining to any unnecessary table. But it seems that p is the only table not involved in getting from u (the target of GROUP BY) to r and rat (used in aggregates). It would look something like:
SELECT ..., firstname, lastname
FROM ( everything as above except for `p` ) AS most
JOIN `profile` `p` ON (`p`.`id` = most.`profile_id`)
GROUP BY most.id
This avoids hauling around firstname and lastname while doing most of the joins and the GROUP BY.
When doing JOINs and GROUP BY, be sure to sanity check the aggregates. Your COUNTs and SUMs may be larger than they should be.
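A small sketch of how the aggregates get inflated, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The tables client, receipt, and project are hypothetical and the rows are invented. Joining two independent one-to-many tables to the same parent multiplies the rows, so the SUM counts each receipt once per project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: one client with 2 receipts and 3 projects.
CREATE TABLE client  (client_id INTEGER PRIMARY KEY);
CREATE TABLE receipt (client_id INTEGER, amount INTEGER);
CREATE TABLE project (client_id INTEGER, project_id INTEGER);
INSERT INTO client  VALUES (1);
INSERT INTO receipt VALUES (1, 100), (1, 50);
INSERT INTO project VALUES (1, 10), (1, 11), (1, 12);
""")

# Both child tables joined at once: 2 receipts x 3 projects = 6 rows per client.
inflated_total = conn.execute("""
SELECT SUM(r.amount)
FROM client c
LEFT JOIN receipt r ON r.client_id = c.client_id
LEFT JOIN project p ON p.client_id = c.client_id
GROUP BY c.client_id
""").fetchone()[0]

# Aggregating the receipts in a derived table first keeps one row per client.
correct_total = conn.execute("""
SELECT r.total
FROM client c
LEFT JOIN (
    SELECT client_id, SUM(amount) AS total
    FROM receipt
    GROUP BY client_id
) r ON r.client_id = c.client_id
""").fetchone()[0]

print(inflated_total, correct_total)  # 450 150
```

COUNT(DISTINCT ...) hides this effect, but SUM and plain COUNT do not, so comparing one aggregate against a query on the child table alone is a quick sanity check.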
First, you don't need to backtick every table and column in your queries, nor result columns, aliases, etc. Backticks are needed primarily when a name conflicts with a reserved word, so the parser knows you are referring to a specific column: for example, a table with a column named "JOIN", when JOIN is also part of the SQL syntax. You can see the confusion that would cause. Dropping the unnecessary ones also helps readability.
Next, and this is just personal preference, but it can help you and others following behind you understand the data and its relationships: I indent each join under the table it comes from. As you can see below, this shows the chain from the user table (alias u) to the rat table. You get there only by going 5 levels deep. In each ON clause I put the table being joined from on the left side and the table being joined to on the right side.
Now that I can see the relationships, I would suggest the following. Make covering indexes on your tables that include the join criteria, plus the id/value columns where appropriate. This way the query can get what it needs from the index pages instead of having to go to the raw data. So here are suggestions for indexes.
table index
user_customer ( user_id, customer_id ) -- (I don't know what your fk_ad_customer_idx parts are)
ad ( customer_id, ac_id )
ac ( id, a_id )
a (id, rat_code )
rat ( code, category )
Reformatted query for readability and seeing relationships between the tables
SELECT
u.id,
p.lastname,
p.firstname,
COALESCE(r.value, 0) AS rvalue,
SUM(rat.category = 'A') AS count_a,
SUM(rat.category = 'B') AS count_b,
SUM(rat.category = 'C') AS count_c
FROM
user u
JOIN user_customer uc
ON u.id = uc.user_id
JOIN ad FORCE INDEX (fk_ad_customer_idx)
ON uc.customer_id = ad.customer_id
JOIN ac
ON ad.ac_id = ac.id
JOIN a
ON ac.a_id = a.id
JOIN rat
ON a.rat_code = rat.code
JOIN profile p
ON u.profile_id = p.id
LEFT JOIN r
ON u.r_id = r.id
GROUP BY
u.id
I have a couple of tables joined in MySQL, where one row has many rows in the other.
I am trying to select items from one table, ordered by the minimum values from the other table.
Without grouping it seems to be like this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
order by product_kits.new_price ASC
Result:
But when I add group by, I get this:
Code:
select `catalog_products`.id
, `catalog_products`.alias
, `tmpKits`.`minPrice`
from `catalog_products`
left join `product_kits` on `product_kits`.`product_id` = `catalog_products`.`id`
left join (
SELECT MIN(new_price) AS minPrice, id FROM product_kits GROUP BY id
) AS tmpKits on `tmpKits`.`id` = `product_kits`.`id`
where `category_id` in ('62')
group by `catalog_products`.`id`
order by product_kits.new_price ASC
Result:
And this is incorrect sorting!
Somehow when I group this results, I get id 280 before 281!
But I need to get:
281|1600.00
280|2340.00
So, grouping breaks existing ordering!
For one, when you apply GROUP BY to only one column, there is no guarantee that the values in the other columns will be consistently correct. Unfortunately, MySQL allows this type of SELECT/GROUP BY where other products don't. Two, using an ORDER BY in a subquery, while allowed in MySQL, is not allowed in other database products, including SQL Server. You should use a solution that will return the proper result each time it is executed.
So the query will be:
select CP.`id`, CP.`alias`, MIN(PK.`new_price`) AS `minPrice`
from catalog_products CP
left join `product_kits` PK on PK.`product_id` = CP.`id`
where CP.`category_id` IN ('62')
group by CP.`id`, CP.`alias`
order by `minPrice` ASC
The thing is that group by does not recognize order by in MySQL.
Actually, what I was doing is really bad practice.
In this case you should use DISTINCT on catalog_products.* instead.
In my opinion, group by is really useful when you need to group the results of aggregate functions.
Otherwise you should not use it to get unique values.
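For the ordering the question asks for, one deterministic option is to aggregate the price explicitly and sort by that aggregate. The sketch below runs against SQLite through Python's sqlite3 module as a stand-in for MySQL. The column types and the rows are assumptions, chosen so that the expected order matches the 281 then 280 order from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog_products (id INTEGER PRIMARY KEY, alias TEXT, category_id INTEGER);
CREATE TABLE product_kits (id INTEGER PRIMARY KEY, product_id INTEGER, new_price REAL);
-- Invented sample rows: each product has two kits.
INSERT INTO catalog_products VALUES (280, 'product-a', 62), (281, 'product-b', 62);
INSERT INTO product_kits (product_id, new_price) VALUES
    (280, 2340.00), (280, 2900.00),
    (281, 3100.00), (281, 1600.00);
""")

rows = conn.execute("""
SELECT cp.id, MIN(pk.new_price) AS min_price
FROM catalog_products cp
LEFT JOIN product_kits pk ON pk.product_id = cp.id
WHERE cp.category_id = 62
GROUP BY cp.id
ORDER BY min_price ASC
""").fetchall()

print(rows)  # [(281, 1600.0), (280, 2340.0)]
```

Because every selected column is either grouped or aggregated, the result does not depend on which kit row the engine happens to pick for each product.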
I have a MySQL query that works fine when I use a WHERE clause, but when I do not use the WHERE clause it never returns any output and finally times out.
Actually i have used Explain command to check the performance of the query and in both cases the Explain gives the same number of rows used in joining.
I have attached the image of output got with Explain command.
Below is the query.
I couldn't figure whats the problem here.
Any help is highly appreciated.
Thanks.
SELECT
MCI.CLIENT_ID AS CLIENT_ID, MCI.NAME AS CLIENT_NAME, MCI.PRIMARY_CONTACT AS CLIENT_PRIMARY_CONTACT,
MCI.ADDED_BY AS SP_ID, CONCAT(MUD_SP.FIRST_NAME, ' ', MUD_SP.LAST_NAME) AS SP_NAME,
MCI.FK_PROSPECT_ID AS PROSPECT_ID, MCI.DATE_ADDED AS ADDED_ON,
(SELECT GROUP_CONCAT(LT.TAG_TEXT SEPARATOR ', ')
FROM LK_TAG LT
INNER JOIN M_OBJECT_TAG_MAPPING MOTM
ON LT.PK_ID = MOTM.FK_TAG_ID
WHERE MOTM.FK_OBJECT_ID = MCI.FK_PROSPECT_ID
AND MOTM.OBJECT_TYPE = 1
AND MOTM.IS_ACTIVE = 1
) AS TAGS,
IFNULL(SUM(GET_DIGITS(MMR.RCP_AMOUNT)), 0) AS REVENUE_SO_FAR,
IFNULL(SUM(GET_DIGITS(MMR.RCP_RUPEES)), 0) AS REVENUE_INR,
COUNT(DISTINCT PMI_MONTHLY.PROJECT_ID) AS MONTHLY,
COUNT(DISTINCT PMI_FIXED.PROJECT_ID) AS FIXED,
COUNT(DISTINCT PMI_HOURLY.PROJECT_ID) AS HOURLY,
COUNT(DISTINCT PMI_ANNUAL.PROJECT_ID) AS ANNUAL,
COUNT(DISTINCT PMI_CURRENTLY_RUNNING.PROJECT_ID) AS CURRENTLY_RUNNING_PROJECTS,
COUNT(DISTINCT PMI_YET_TO_START.PROJECT_ID) AS YET_TO_START_PROJECTS,
COUNT(DISTINCT PMI_TECH_SALES_CLOSED.PROJECT_ID) AS TECH_SALES_CLOSED_PROJECTS
FROM
M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
ON MMR.CLIENT_ID = MCI.CLIENT_ID
LEFT OUTER JOIN M_PROJECT_INFO PMI_FIXED
ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID AND PMI_FIXED.PROJECT_TYPE = 1
LEFT OUTER JOIN M_PROJECT_INFO PMI_MONTHLY
ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_MONTHLY.PROJECT_TYPE = 2
LEFT OUTER JOIN M_PROJECT_INFO PMI_HOURLY
ON PMI_HOURLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_HOURLY.PROJECT_TYPE = 3
LEFT OUTER JOIN M_PROJECT_INFO PMI_ANNUAL
ON PMI_ANNUAL.CLIENT_ID = MCI.CLIENT_ID AND PMI_ANNUAL.PROJECT_TYPE = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_CURRENTLY_RUNNING
ON PMI_CURRENTLY_RUNNING.CLIENT_ID = MCI.CLIENT_ID AND PMI_CURRENTLY_RUNNING.STATUS = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_YET_TO_START
ON PMI_YET_TO_START.CLIENT_ID = MCI.CLIENT_ID AND PMI_YET_TO_START.STATUS < 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_TECH_SALES_CLOSED
ON PMI_TECH_SALES_CLOSED.CLIENT_ID = MCI.CLIENT_ID AND PMI_TECH_SALES_CLOSED.STATUS > 4
WHERE YEAR(MCI.DATE_ADDED) = '2012'
GROUP BY MCI.CLIENT_ID ORDER BY CLIENT_NAME ASC
Yes, as many people have said, the key is that when you have the where clause, the MySQL engine filters the table M_CLIENT_INFO, probably dramatically.
Adding this where clause has a similar effect to removing the where clause:
where 1 = 1
You will see that the performance is degraded also because mysql will try to get all the data.
Remove the where clause and all columns from select and add a count to see how many records you get. If it is reasonable, say up to 10k, then do the following,
put back the select columns related to M_CLIENT_INFO
do not include the nested one "TAGS"
remove all your joins
run your query without where clause and gradually include the joins
this way you'll find out when the timeout is caused.
I would try the following. First, MySQL has a keyword "STRAIGHT_JOIN" which tells the optimizer to do the query in the table order you've specified. Since all your left joins are child-related (like a lookup table), you don't want MySQL to try to interpret one of those as the primary basis of the query.
SELECT STRAIGHT_JOIN ... rest of query.
Next, your M_PROJECT_INFO table, I dont know how many columns of data are out there, but you appear to be concentrating on just a few columns on your DISTINCT aggregates. I would make sure you have a covering index on these elements to help the query via an index on
( Client_ID, Project_Type, Status, Project_ID )
This way the engine can apply the criteria and get the distinct all out of the index instead of having to go back to the raw data pages for the query.
Third, your M_CLIENT_INFO table. Ensure that has an index on both your criteria, group by AND your Order By, and change your order by from the aliased "CLIENT_NAME" to the actual column of the SQL table so it matches the index
( Date_Added, Client_ID, `Name` )
I have "name" in ticks as it is also a reserved word and helps clarify the column, not the keyword.
Next, the WHERE clause. Whenever you apply a function to an indexed column, the index generally cannot be used well, especially on date/time fields... You might want to change your where clause to
WHERE MCI.Date_Added between '2012-01-01' and '2012-12-31 23:59:59'
so the BETWEEN range is showing the entire year and the index can better be utilized.
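The effect of wrapping the column in a function can be seen in a plan comparison. This is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for MySQL: strftime plays the role of YEAR(), the table is cut down to two assumed columns, and the rows are invented. The half-open range (>= start, < next year) is a substitution for the BETWEEN above that also covers fractional seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE m_client_info (client_id INTEGER PRIMARY KEY, date_added TEXT)"
)
conn.execute("CREATE INDEX idx_mci_date_added ON m_client_info (date_added)")
conn.executemany(
    "INSERT INTO m_client_info (date_added) VALUES (?)",
    [
        ("2011-12-31 23:59:59",),
        ("2012-01-01 00:00:00",),
        ("2012-12-31 23:59:59",),
        ("2013-01-01 00:00:00",),
    ],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function applied to the column: every row has to be evaluated.
func_sql = (
    "SELECT client_id FROM m_client_info "
    "WHERE strftime('%Y', date_added) = '2012' ORDER BY client_id"
)
# Bare column compared against constants: the index can be range-searched.
range_sql = (
    "SELECT client_id FROM m_client_info "
    "WHERE date_added >= '2012-01-01' AND date_added < '2013-01-01' "
    "ORDER BY client_id"
)

func_plan, range_plan = plan(func_sql), plan(range_sql)
func_rows = conn.execute(func_sql).fetchall()
range_rows = conn.execute(range_sql).fetchall()

print(func_plan)
print(range_plan)
```

Both forms return the same rows. Only the range form leaves the column bare, so the optimizer can use the index on it.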
Finally, if the above do not help, I would consider splitting your query up somewhat. The GROUP_CONCAT inline select for the TAGS might be a bit of a killer for you. You might want to get all the distinct elements for the per-client grouping first, THEN get those details... Something like
select
PQ.*,
group_concat(...) tags
from
( the entire primary part of the query ) as PQ
Left join yourGroupConcatTableBasis on key columns
EDIT:
Sorry about the unreadable query, I was under deadline. I managed to solve the problem by breaking this query into two smaller ones and doing some business logic in Java. I still want to know why this query can return two different results at random.
So, it randomly returns all expected results one time and just half of them another time. I noticed that when I write it join by join, and execute after each join, in the end it returns all expected results. So I am wondering if there's some kind of MySQL memory or other limitation that stops it from taking whole tables in joins. I also read about nondeterministic queries, but I'm not sure what to make of that.
Please help, ask if needs clarification, and thank you in advance.
RESET QUERY CACHE;
SET SQL_BIG_SELECTS=1;
set @displayvideoaction_id = 2302;
set @ticSessionId = 3851;
select richtext.id,richtextcross.name,richtextcross.updates_demo_field,richtext.content from
(
select listitemcross.id,name,updates_demo_field,listitem.text_id from
(
select id,name, updates_demo_field, items_id from
(
SELECT id, name, answertype_id, updates_demo_field,
@student:=CASE WHEN @class <> updates_demo_field THEN 0 ELSE @student+1 END AS rn,
@class:=updates_demo_field AS clset FROM
(SELECT @student:= -1) s,
(SELECT @class:= '-1') c,
(
select id, name, answertype_id, updates_demo_field from
(
select manytomany.questions_id from
(
select questiongroup_id from
(
select questiongroup_id from `ticnotes`.`scriptaction` where ticsession_id=@ticSessionId and questiongroup_id is not null
) scriptaction
inner join
(
select * from `ticnotes`.`questiongroup`
) questiongroup on scriptaction.questiongroup_id=questiongroup.id
) scriptgroup
inner join
(
select * from `ticnotes`.`questiongroup_question`
) manytomany on scriptgroup.questiongroup_id=manytomany.questiongroup_id
) questionrelation
inner join
(
select * from `ticnotes`.`question`
) questiontable on questionrelation.questions_id=questiontable.id
where updates_demo_field = 'DEMO1' or updates_demo_field = 'DEMO2'
order by updates_demo_field, id desc
) t
having rn=0
) firstrowofgroup
inner join
(
select * from `ticnotes`.`multipleoptionstype_listitem`
) selectlistanswers on firstrowofgroup.answertype_id=selectlistanswers.multipleoptionstype_id
) listitemcross
inner join
(
select * from `ticnotes`.`listitem`
) listitem on listitemcross.items_id=listitem.id
) richtextcross
inner join
(
select * from `ticnotes`.`richtext`
) richtext on richtextcross.text_id=richtext.id;
My first impression is: don't use shortcuts to name your tables. I am lost as to which td3 is where, then td6, tdx3... I guess you might be lost as well.
If you name your aliases more sensibly there will be less chance of getting something wrong and mixing up 6 with 8 or whatever.
Just a suggestion :)
There is no such limitation in MySQL, so my bet would be on human error: somewhere in there the join logic fails.
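One possible source of results that change between runs is the row numbering done with user variables: MySQL does not define the order in which expressions involving user variables are evaluated, and an ORDER BY inside a derived table is not guaranteed to be honored by the outer query. A window function is a deterministic alternative and a swapped-in technique, not what the question's query does. It needs MySQL 8.0 or later. The sketch below runs it on SQLite 3.25 or later through Python's sqlite3 module as a stand-in, with a cut-down question table and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY, name TEXT, updates_demo_field TEXT);
-- Invented sample rows.
INSERT INTO question VALUES
    (1, 'q1', 'DEMO1'), (2, 'q2', 'DEMO1'),
    (3, 'q3', 'DEMO2'), (4, 'q4', 'DEMO2'),
    (5, 'q5', 'OTHER');
""")

# Keep the row with the highest id in each updates_demo_field group,
# which is what "order by updates_demo_field, id desc ... rn = 0" aims for.
first_rows = conn.execute("""
SELECT id, updates_demo_field
FROM (
    SELECT id, updates_demo_field,
           ROW_NUMBER() OVER (
               PARTITION BY updates_demo_field
               ORDER BY id DESC
           ) AS rn
    FROM question
    WHERE updates_demo_field IN ('DEMO1', 'DEMO2')
) AS ranked
WHERE rn = 1
ORDER BY updates_demo_field
""").fetchall()

print(first_rows)  # [(2, 'DEMO1'), (4, 'DEMO2')]
```

Here the ordering that decides which row is "first" is part of the window definition itself, so it cannot be reordered away by the optimizer.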
I currently have two tables, one for pages (objects) and one for images (media).
I want to get a list of all the pages in the system, and join a single image record onto each page record. It's very close to working now, but it is returning all of the records, when I just want to get one back.
I've tried a few sub queries + Distinct, but not quite got it working yet.
Can anyone point me in the right direction? I've tried group by, but it is applied before the sort and therefore brings back the wrong image.
SELECT title,
url,
media_file,
media_order
FROM ndxz_objects
LEFT JOIN ndxz_media ON ndxz_objects.id = ndxz_media.media_ref_id
WHERE `status` = 1
AND `section_id` = 2
GROUP BY title, url, media_file, media_order
ORDER BY ndxz_objects.ord, ndxz_media.media_order
If you have an AUTO_INCREMENT inside ndxz_media called id (for example), you can get the MAX(id) or MIN(id) in a subquery, and join against it.
I think this works because MySQL allows columns not specified in the GROUP BY to be included along with the aggregate. Here, that's media_file and media_order in the subquery.
SELECT title,
url,
media_file,
media_order
FROM ndxz_objects
JOIN (
SELECT media_ref_id, MIN(media_order), media_file
FROM ndxz_media
GROUP BY media_ref_id
) maxmedia ON ndxz_objects.id = maxmedia.media_ref_id
WHERE etc. etc. etc.
Assuming media_order as a high/low value instead of using the theoretical id, the inner subquery would look like the following. Everything else outside the parens doesn't change.
SELECT
media_ref_id,
MIN(media_order) AS media_order,
media_file
FROM ndxz_media
GROUP BY media_ref_id, media_file
Run the subquery alone first. It should retrieve singular records. Then when placed into the parens, the join should work properly.
I didn't find that query worked, this is what I used in the end.
SELECT title, url, media_file
FROM ".PX."objects
LEFT JOIN
(
SELECT m1.media_id, m1.media_ref_id, m1.media_file, m1.media_order
FROM ".PX."media AS m1
LEFT JOIN ".PX."media AS m2 ON m1.media_ref_id = m2.media_ref_id AND m1.media_order > m2.media_order
WHERE m2.media_ref_id IS NULL
)
maxmedia ON ndxz_objects.id = maxmedia.media_ref_id
WHERE `status` = 1
AND `section_id` = 2
ORDER BY `ord`
I found this in the within-group aggregates section of a page that seems like a good collection of SQL examples.
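The self-join used in that final query can be tried in isolation. Below is a minimal sketch on SQLite through Python's sqlite3 module as a stand-in for MySQL, with the prefix written out as ndxz_, the tables cut down to a few assumed columns, and invented rows. A media row is kept only when no other row for the same page has a lower media_order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ndxz_objects (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE ndxz_media (
    media_id INTEGER PRIMARY KEY,
    media_ref_id INTEGER,
    media_file TEXT,
    media_order INTEGER
);
-- Invented sample rows: page 1 has two images, page 2 has one, page 3 has none.
INSERT INTO ndxz_objects VALUES (1, 'Home'), (2, 'About'), (3, 'Contact');
INSERT INTO ndxz_media (media_ref_id, media_file, media_order) VALUES
    (1, 'a.jpg', 2), (1, 'b.jpg', 1), (2, 'c.jpg', 5);
""")

rows = conn.execute("""
SELECT o.id, o.title, first_media.media_file
FROM ndxz_objects o
LEFT JOIN (
    SELECT m1.media_ref_id, m1.media_file
    FROM ndxz_media AS m1
    LEFT JOIN ndxz_media AS m2
           ON m1.media_ref_id = m2.media_ref_id
          AND m1.media_order > m2.media_order
    WHERE m2.media_ref_id IS NULL   -- no row with a lower media_order exists
) AS first_media ON o.id = first_media.media_ref_id
ORDER BY o.id
""").fetchall()

print(rows)  # [(1, 'Home', 'b.jpg'), (2, 'About', 'c.jpg'), (3, 'Contact', None)]
```

If two images on the same page share the lowest media_order, both survive the filter and the page appears twice, so a tie-breaker such as media_id would be needed in that case.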