Making query efficient - mysql

I have come across a query which, while it works, is hard to understand (and to change) and, in my opinion, is unoptimized.
SELECT cp.`order` AS `order`, cp.parent_id, cp.id AS category_id, cp.stub, cp.name as category_name, dc.deals_in_cat, d.*
FROM category_parent cp,
(
SELECT id, title, subtitle, image, image_m, discount, itemid, price, new_price, catalog_id, property_id, seller_id, category FROM deals
WHERE deals.category = 1
AND itemid NOT IN (156785431)
ORDER BY e_order LIMIT 8
) d,
(
SELECT a.`id` AS parent_id, COUNT( DISTINCT c.`itemid` ) AS deals_in_cat
FROM `category_parent` AS a
LEFT JOIN `navigation_filters_weightage` AS d ON a.`id` = d.`cat_id`,
`deals_parent_cat` AS b,
`deals` AS c
WHERE a.`parent_id` = b.`id`
AND c.`category` = a.`id` GROUP BY a.id ORDER BY b.`order` ASC , a.`order` ASC
) AS dc
WHERE cp.id = d.category
AND cp.active = '1'
AND dc.parent_id = cp.id;
Can you please suggest ways to make it simpler?
Thanks

As noted in comments, indexes are probably a big factor for your query.
I would start by confirming you have at least the following indexes available
table             index
deals             (category, e_order, itemid)
category_parent   (active, id)
Typically, I would have the itemID before the order by since it is part of the WHERE clause, but since you are getting all EXCEPT one ID, I think the order-by clause column would help out more.
One additional note... Your "dc" query for getting counts is doing the counts for ALL entries, but your outer query is only considering "active=1". I would add this qualifier in your "dc" query as well via
WHERE a.Active='1' AND -- rest of your criteria
Finally, this being a website, doing counts on the fly repeatedly is always going to be a big performance hit. As suggested in other posts and again here, you may be better off adding a column to your category_parent table for "Deals_In_Cat" and having it updated via a trigger whenever any deals are added or removed. This way, the count is done ONCE when a deal is added or deleted, and all future references no longer require the count to be computed. This will probably be the best thing you can apply for performance.
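A minimal sketch of the trigger idea, run against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained. The deals_in_cat column, the trigger names and the sample rows are assumptions for illustration only. MySQL trigger syntax differs slightly (it requires FOR EACH ROW), and a real version would also have to handle updates that move a deal to another category. The sketch counts rows, whereas the original query counts DISTINCT itemid, so it assumes itemid is unique per deal.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question,
# except deals_in_cat, which is the hypothetical cached count.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_parent (
    id INTEGER PRIMARY KEY,
    name TEXT,
    deals_in_cat INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE deals (
    id INTEGER PRIMARY KEY,
    itemid INTEGER,
    category INTEGER
);

-- Keep the cached count in step with inserts and deletes on deals.
CREATE TRIGGER deals_after_insert AFTER INSERT ON deals
BEGIN
    UPDATE category_parent SET deals_in_cat = deals_in_cat + 1
    WHERE id = NEW.category;
END;
CREATE TRIGGER deals_after_delete AFTER DELETE ON deals
BEGIN
    UPDATE category_parent SET deals_in_cat = deals_in_cat - 1
    WHERE id = OLD.category;
END;
""")

conn.execute("INSERT INTO category_parent (id, name) VALUES (1, 'Travel'), (2, 'Food')")
conn.executemany(
    "INSERT INTO deals (itemid, category) VALUES (?, ?)",
    [(101, 1), (102, 1), (201, 2)],
)
conn.execute("DELETE FROM deals WHERE itemid = 102")

# Reading the count is now a plain column lookup, with no COUNT() at query time.
counts = dict(conn.execute("SELECT id, deals_in_cat FROM category_parent"))
print(counts)
```

With this in place the "dc" derived table in the original query could be replaced by reading category_parent.deals_in_cat directly.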

Related

MySQL Spring complicated query - ways to order and query efficiency

I run this complicated query on Spring JPA Repository.
My goal is to get all info from the site table, ordering it by events severity on each site.
This is my query:
SELECT alls.* FROM sites AS alls JOIN
(
SELECT distinct ets.id FROM
(
SELECT s.id, et.`type`, et.severity_level, COUNT(et.`type`) FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
ORDER BY et.severity_level, COUNT(et.`type`) DESC
) AS ets
) as etsd ON alls.id = etsd.id
The second select (the one with "distinct") returns site_ids ordered correctly by severity.
Note that there are different event_types + severity in each site, and I use pagination on the answer, so I need the distinct.
The problem is - the main select doesn't keep this order.
Is there any way to keep the order in one complicated query?
Another related question - one of my ideas was making two queries:
The "select distinct" query that will return me the order --> saved in a list "order list"
The main "sites" query (that becomes very simple) with "where id in {order list}"
Order the second query in code by "order list".
I use the query every 10 seconds, so it is very sensitive on performance.
What seems to be faster in this case - original complicated query or those 2?
Any insight will be appreciated.
Thanks a lot.
A quirk of SQL's declarative set-oriented syntax for us procedural programmers: ORDER BY clauses in subqueries are not carried through to the outer query, except sometimes by accident. If you want ordering at any query level, you must specify it at that level or you will get unpredictable results. The query optimizers are usually smart enough to avoid wasting sort operations.
Your requirement: give at most one sites row for each sites.id value, ordered by the worst event. Worst: lowest event severity, and if there is more than one event with the lowest severity, the largest count.
Use this sort of thing to get the "worst" for each id, in place of DISTINCT.
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
This gives at most one row per sites.id value. Then your outer query is
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
/* your inner query */
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
Putting it all together:
SELECT alls.*
FROM sites alls
JOIN (
SELECT id, MIN(severity_level) severity_level, MAX(num) num
FROM (
SELECT s.id, et.severity_level, COUNT(et.`type`) num
FROM sites AS s
JOIN users_sites AS us ON (s.id=us.site_id)
JOIN users AS u ON (us.user_id=u.user_id)
JOIN areas AS a ON (s.id=a.site_id)
JOIN panels AS p ON (a.id=p.area_id)
JOIN events AS e ON (p.id=e.panel_id)
JOIN event_types AS et ON (e.event_type_id=et.id)
WHERE u.user_id="98765432-123a-1a23-123b-11a1111b2cd3"
GROUP BY s.id , et.`type`, et.severity_level
) ets
GROUP BY id
) worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
An index on users.user_id will help performance for these single-user queries.
If you still have performance trouble, please read this and ask another question.
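The ordering rule from this answer can be tried on a cut-down schema. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database via Python's sqlite3 module instead of MySQL, collapses the users/areas/panels joins into a single hypothetical events.site_id column, and invents the sample rows. The point it shows is that the ORDER BY lives on the outermost query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (site_id INTEGER, type TEXT, severity_level INTEGER);
INSERT INTO sites VALUES (1, 'north'), (2, 'south'), (3, 'east');
INSERT INTO events VALUES
    (1, 'fire', 1),
    (2, 'door', 3), (2, 'door', 3), (2, 'door', 3), (2, 'door', 3),
    (3, 'fire', 1), (3, 'fire', 1);
""")

# One row per site: lowest severity first, then the larger count, then id.
rows = conn.execute("""
SELECT alls.id, alls.name, worstevents.severity_level, worstevents.num
FROM sites AS alls
JOIN (
    SELECT id, MIN(severity_level) AS severity_level, MAX(num) AS num
    FROM (
        SELECT e.site_id AS id, e.type, e.severity_level, COUNT(*) AS num
        FROM events AS e
        GROUP BY e.site_id, e.type, e.severity_level
    ) AS ets
    GROUP BY id
) AS worstevents ON alls.id = worstevents.id
ORDER BY worstevents.severity_level, worstevents.num DESC, alls.id
""").fetchall()

site_order = [row[0] for row in rows]
print(site_order)
```

Because the final ORDER BY includes alls.id as a tie-breaker, the result order is fully determined, which matters when the result is paginated.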

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c), for instance). MySQL should throw an error (and does when the ONLY_FULL_GROUP_BY SQL mode is enabled), but otherwise silently replaces count_vote, r.text_c by ANY_VALUE(count_vote), ANY_VALUE(r.text_c) instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
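If the intent of the ORDER BY plus GROUP BY trick was "the top-voted review per fruit, lowest r_id on ties" (an assumption, since the question never states it), that can be written deterministically. The sketch below uses an in-memory SQLite database through Python's sqlite3 module with invented sample rows; the Categories table is left out because its relationship to the other tables is unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fruits (fruit_id INTEGER PRIMARY KEY, fruit_name TEXT);
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Reviews (
    r_id INTEGER PRIMARY KEY,
    fruit_id INTEGER,
    customer_id INTEGER,
    count_vote INTEGER,
    text_c TEXT
);
INSERT INTO Fruits VALUES (1, 'apple'), (2, 'pear');
INSERT INTO Customers VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO Reviews VALUES
    (1, 1, 10, 5, 'crisp'),
    (2, 1, 11, 9, 'sweet'),
    (3, 1, 10, 9, 'juicy'),   -- ties with r_id 2 on votes; r_id breaks the tie
    (4, 2, 11, 2, 'soft');
""")

# Keep a review only if no other review of the same fruit outranks it.
top_reviews = conn.execute("""
SELECT b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits AS b
JOIN Reviews AS r ON r.fruit_id = b.fruit_id
JOIN Customers AS u ON u.customer_id = r.customer_id
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews AS better
    WHERE better.fruit_id = r.fruit_id
      AND (better.count_vote > r.count_vote
           OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
)
ORDER BY b.fruit_id
""").fetchall()

print(top_reviews)
```

The NOT EXISTS form avoids window functions, so it does not depend on a particular MySQL version; an index on Reviews (fruit_id, count_vote, r_id) would be the natural companion to it.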
Your Categories table does not seem to be joined or related to the others; this produces a Cartesian product between all the rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY on a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, you should use explicit join syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian join. It is also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

mysql query is taking too much time to execute

Hello everyone, I am working on a database through phpMyAdmin. Whenever I try to execute this query, it takes more than 10 minutes to show results. Is there any way to speed it up? Please respond.
The query is
SELECT ib.*, b.brand_name, m.model_name,
s.id as sale_id, br.branch_code,br.branch_name,r.rentry_date,r.id as rid
from in_book ib
left join brand b on ib.brand_id=b.id
left join model m on ib.vehicle_id=m.id
left join re_entry r on r.in_book_id=ib.id
left join sale s on ib.id=s.in_book_id
left join branch br on ib.branch_id=br.id
where ib.id !=''
and ib.branch_id='65'
group by ib.id
order by r.id ASC,
count(r.in_book_id) DESC ,
ib.purchaes_date ASC,
ib.id ASC
There are almost 7 tables.
Make sure you have an index on every key you use to join the tables.
From http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
... This of course also applies to the JOIN conditions.
You don't list any such indexes. However, I would start with the following suggested indexes:
table      index
in_book    ( branch_id, id, brand_id, vehicle_id )
brand      ( id, brand_name )
model      ( id, model_name )
re_entry   ( in_book_id, id, rentry_date )
sale       ( in_book_id, id )
branch     ( id )
Also, with MySQL, you can use a special keyword "STRAIGHT_JOIN" which tells the engine to query in the order you have selected the tables... Although you are doing LEFT JOINs, I don't think it will matter as it appears the secondary tables are all lookup type of tables and in_book is your primary. But as just a try it would be..
SELECT STRAIGHT_JOIN (...rest of query...)

Fast query slows down when within a subquery

I have this query:
SELECT timestmp
FROM devicelog
WHERE device_id = 5
ORDER BY id DESC LIMIT 1
which takes less than 0.001 seconds to fetch, but once I put it in a subquery, it slows down to about 3.05 seconds. Any reason to why it does this, or how I can remedy it?
Here is the second query (which is the one I want to optimize):
SELECT device.id,
(SELECT timestmp
FROM devicelog
WHERE device_id = device.id
ORDER BY id DESC LIMIT 1) as timestmp
FROM device
Table "device" only has about 10-15 records in it (devicelog has several million), so I would assume it goes one by one through each record and then executes the subquery, but obviously it's doing something else. The PK of devicelog is the id, and the PK in device is its id as well. There is an index on devicelog for timestmp (which is a datetime) and device_id, which is also a FK back to device. There are other indices as well, but they are irrelevant (things like names, descriptions, etc).
I just need it to loop through devices, then display the last timestamp record.
If I list each device in PHP, then perform the first query separately for each one, it performs extraordinarily well, but I want to do this in one single query. For example, I could do something like (pseudocode):
foreach($row in <device>)
query('<first query> with device_id = $row[id]')
Doing an entire join would be too expensive on devicelog because of the high row count.
Your question and query do not match what you are looking for per the comment to the first answer offered.
What you can do is a pre-aggregate on a per-device basis to get the max ID log, then join that to your master list of devices...
SELECT
d.name,
d.id,
DeviceMax.lastTime
from
device d
LEFT JOIN ( select dl.device_id,
max( dl.timestmp ) lastTime
from
devicelog dl
group by
dl.device_id ) as DeviceMax
ON d.id = DeviceMax.device_id
Now, if you needed other stuff from the device log for that specific entry, we could just add on to that...
LEFT JOIN devicelog dl2
on DeviceMax.Device_id = dl2.Device_id
AND DeviceMax.lastTime = dl2.timestmp
then you can get any other columns from the "dl2" alias added to your query.
Also, for your device log table, I would have a covering index on (device_id, timestmp)
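A small runnable sketch of the pre-aggregate approach, compared against the question's correlated subquery. It uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL, and the index name and sample rows are made up for illustration. The two queries agree here only because log ids increase with time, which is an assumption about the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE devicelog (
    id INTEGER PRIMARY KEY,
    device_id INTEGER,
    timestmp TEXT
);
-- Composite index so the per-device maximum can be read from the index.
CREATE INDEX idx_devicelog_device_time ON devicelog (device_id, timestmp);
INSERT INTO device VALUES (1, 'pump'), (2, 'valve'), (3, 'spare');
INSERT INTO devicelog (device_id, timestmp) VALUES
    (1, '2024-01-01 10:00:00'),
    (2, '2024-01-01 11:00:00'),
    (1, '2024-01-02 09:30:00');
""")

# Pre-aggregate once per device, then join to the short device list.
latest = conn.execute("""
SELECT d.id, d.name, DeviceMax.lastTime
FROM device AS d
LEFT JOIN (
    SELECT dl.device_id, MAX(dl.timestmp) AS lastTime
    FROM devicelog AS dl
    GROUP BY dl.device_id
) AS DeviceMax ON d.id = DeviceMax.device_id
ORDER BY d.id
""").fetchall()

# The question's correlated subquery, for comparison.
correlated = conn.execute("""
SELECT d.id, d.name,
       (SELECT timestmp FROM devicelog
        WHERE device_id = d.id
        ORDER BY id DESC LIMIT 1) AS timestmp
FROM device AS d
ORDER BY d.id
""").fetchall()

print(latest)
```

The LEFT JOIN keeps devices that have no log rows at all; their lastTime comes back as NULL (None in Python).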
COMMENT FROM FEEDBACK
Then I would offer this as a suggestion for you, which is something also common in the world of web development when someone needs "highest count", or "most" of something, or "most recent" etc.
Denormalize your Device table with only one respect... add a column for the LastDeviceLogID. Then, whenever your DeviceLog has an entry added to it, you just use an after insert trigger that does...
update Device
set LastDeviceLogID = newRecord.DeviceLogID
where ID = newRecord.Device_ID
Columns may not be exact, but the principle is there. This way, you never need to do a LIMIT 1, MAX(), etc. and go through your millions of records; you can get the result as simply as doing
SELECT
d.name,
d.id,
dl.timestamp,
dl.othercolumns
from
device d
LEFT JOIN devicelog dl
on d.LastDeviceLogID = dl.DeviceLogID
Try this with an inner join:
SELECT d.id, dl.timestmp
FROM device d
INNER JOIN devicelog dl
ON dl.device_id = d.id
ORDER BY d.id DESC LIMIT 1
To list all devices, you may consider GROUP BY with MAX() so that each device gets its latest timestamp:
SELECT d.id, MAX(dl.timestmp) AS timestmp
FROM device d
INNER JOIN devicelog dl
ON dl.device_id = d.id
GROUP BY d.id
ORDER BY d.id DESC
Edit:
If you have many indices, then it is better to make sure your id column is indexed.
Try this:
ALTER TABLE `device` ADD INDEX `id` (`id`)

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it is using non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not depend on the grouping items). This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery. Non-standard or not, nothing in the MySQL documentation says that this should work as expected. So it may be working now, but it may stop working in the not so distant future, when you upgrade to MySQL version X (where the optimizer is clever enough to understand that an ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
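To see the tie problem and its fix side by side, here is a small sketch using an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented, and the inner alias is renamed to latest for readability; otherwise the two queries follow the ones in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (
    id INTEGER PRIMARY KEY,
    id_people INTEGER,
    year INTEGER,
    note TEXT
);
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO visits VALUES
    (1, 1, 2011, 'first'),
    (2, 1, 2012, 'spring'),
    (3, 1, 2012, 'autumn'),   -- second visit in Ann's latest year
    (4, 2, 2010, 'only');
""")

# Joining on MAX(year) alone returns every visit in the latest year.
by_year = conn.execute("""
SELECT a.id, a.name, a.year, v.note
FROM (
    SELECT people.id, people.name, MAX(visits.year) AS year
    FROM people
    JOIN visits ON people.id = visits.id_people
    GROUP BY people.id, people.name
) AS a
JOIN visits AS v ON a.id = v.id_people AND a.year = v.year
ORDER BY a.id, v.id
""").fetchall()

# Nesting MAX(id) inside the latest year leaves exactly one row per person.
one_per_person = conn.execute("""
SELECT p.id, p.name, v.year, v.note
FROM (
    SELECT v.id_people, MAX(v.id) AS id
    FROM (
        SELECT id_people, MAX(year) AS year
        FROM visits
        GROUP BY id_people
    ) AS latest
    JOIN visits AS v ON latest.id_people = v.id_people
                    AND latest.year = v.year
    GROUP BY v.id_people
) AS m
JOIN people AS p ON m.id_people = p.id
JOIN visits AS v ON m.id = v.id
ORDER BY p.id
""").fetchall()

print(len(by_year), one_per_person)
```

Ann appears twice in by_year because she has two visits in 2012, and once in one_per_person, where the higher visit id decides which of the two is reported.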