How to Optimize mysql query the brings huge quantity of rows?

How to Optimize mysql query the brings huge quantity of rows? - mysql

I really need your help. I´m doing one work from my universitiy and before I come here I read a lot of things from the documentations of mysql, searched and searched but none of this helped me in my sql query. Look I have this query:
SELECT a.nome, COUNT(*)
FROM publ p JOIN auth a on p.pubid = a.pubid
WHERE p.pubid IN (SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3) // THIS VALUE 3 here I have to do with value 2, 4 and 5
GROUP BY a.nome // in different querys.
ORDER BY COUNT(*) DESC, a.nome ASC
I tried to put index in the where clause but I never get the results and takes to long time. What can I do to increase my query to bring me more faster the results? Thank you for the help

I would create these indexes and reorder the query
CREATE INDEX publ_pubid ON publ(pubid);
CREATE INDEX auth_pubid ON auth(pubid, nome);
SELECT a.nome, COUNT(*)
FROM (
SELECT pubid
FROM auth
GROUP BY pubid
HAVING COUNT(*) < 3
) L
LEFT JOIN publ p on L.pubid=publ.pubid
JOIN auth a on p.pubid = a.pubid
GROUP BY a.nome
ORDER BY COUNT(*) DESC, a.nome ASC;

Related

mysql avg function not returning all records

I am using following query
select
*,
dealer.id As dealerID,
services.id as serviceID
from services
LEFT JOIN dealer
on services.dealer=dealer.id
LEFT JOIN reviews
ON reviews.dealer_id=dealer.id
where services.brand_id = '9' and
services.model_id='107' and
services.petrol > 0
ORDER BY services.total asc ,
AVG(reviews.rating) desc
I have 6 records and it should display 6 records instead its displaying only 1. When i remove AVG(reviews.rating) desc. It display all records.
mysql tables are
services
dealer
brand_id
model_id
petrol
id
total
dealer
id
name
reviews
id
dealer_id
rating
I am not sure where i am doing mistake. If some can help.

avg() is an aggregation function. That is, it takes data from multiple rows and summarizes it.
Without a group by, the query is an aggregation query over all the data. Such a query always returns exactly one row.
Most databases would return an error when you use select *, use an aggregation function, and have no group by. MySQL has a (mis)feature where this syntax is allowed (although on the newest versions, the default settings disallow this).
I'm not sure what you are trying to do, but avg() doesn't make sense in this context. Perhaps this does what you want:
ORDER BY services.total asc, reviews.rating desc

As already mentioned AVG() is aggregate ftn, so I have changed the desc of your order by to include to select the average values.
For future reference:
Providing snippets of raw data also helps. Creating an sql fiddle helps even more
select
*,
dealer.id As dealerID,
services.id as serviceID
from services
LEFT JOIN dealer
on services.dealer=dealer.id
LEFT JOIN reviews
ON reviews.dealer_id=dealer.id
where services.brand_id = '9' and
services.model_id='107' and
services.petrol > 0
ORDER BY services.total asc ,
(SELECT AVG(r2.rating) FROM reviews r2 RIGHT JOIN ON r2.dealer_id=dealer.id) desc

You might try:
SELECT
*,
AVG(c.rating) AS `avg__rating`,
b.id AS dealerID,
a.id AS serviceID
FROM services a
LEFT JOIN dealer b
on a.dealer = b.id
LEFT JOIN reviews c
ON c.dealer_id = b.id
WHERE a.brand_id = '9' and
a.model_id='107' and
a.petrol > 0
GROUP BY a.dealer, a.brand_id, a.model_id, a.petrol, a.id, a.total
ORDER BY a.total asc,
AVG(c.rating) desc
This adds a GROUP BY on the columns in your services table so you will get one row per services/dealer.

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,00 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up by the most in stat and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
the other way i tried but it doesn't seem optimal is
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b ON (b.id=a.id) AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c ON (c.id=a.id) AND c.date = '2016-01-14') AS end,
((SELECT b.stat FROM stats AS b ON (b.id=a.id) AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c ON (c.id=a.id) AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

Introduction
Let's suppose we rewrite sentence like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure than:
users table has index on field id:
stats has index on composite field date, id: create index stats_idx_d_i on stats ( date, id );
Then
Database optimizer may use indexes to selected a Restricted Set of Date ('RSD'), that means, rows that match filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
They are no possible optimization on this sort because you should to calculate one by one all results on your 'RSD' (restricted set of data).
Conclusion
The question is, how may rows they are on your 'RSD'? If only they are few hundreds rows you query may run fast, else, your query will be slow.
Any case, you should to be sure the first step of query ( without sorting ) is made by index and no fullscanning. Use Explain command to be sure.

All you need to do is to help optimizer.At a bare minimum.have a check list which looks like below
1.Are my join columns indexed ?
2.Are the where clauses Sargable
3.are there any implicit,explicit conversions
4.Am i seeing any statistics issues
one more interesting aspect to look at is how is your data distributed,once you understand the data,you will be able to intrepret the execution plan and alter it as per your need
EX:
Think like i have any customers table with 100rows,Each one has a minimum of 10 orders(total upto 10000 orders).Now if you need to find out only top 3 orders by date,you dont want a scan happening of orders table
Now in your case ,i may not go with second option,even though the optimizer may choose a good plan for this one as well,I will go first approach and try to see if the execution time is acceptable.if not then i will go through my check list and try to tune it further

The Query Seems OK, Verify your Indexes ..
Or
Try this Query
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

How to use MAX and COUNT function in MYSQL with 2 tables

I am a newbie in MYSQL and had a question regarding the use of MAX and COUNT functions together in MYSQL. I have 2 tables worker and assignment and the primary key of worker is a foreign key in assignment table.
I need to show the employees name and id and the total assignment assigned to him, and only show the person with the most assignment that is the employee with the most assignment.
my code is
SELECT worker.Wrk_ID, worker.Wrk_LastName, MAX(a.count_id)
FROM worker,
(SELECT COUNT(assignment.Wrk_ID) as count_ID
FROM worker, assignment
WHERE worker.Wrk_ID = assignment.Wrk_ID
GROUP BY worker.Wrk_ID)as a
GROUP BY worker.Wrk_ID;
The code is giving an error no. #1054.
Please can anyone help me.
Thanking you in anticipation.

Try something like this:
SELECT worker.Wrk_ID, worker.Wrk_LastName, S.Count
FROM worker
JOIN
(SELECT Wrk_ID, COUNT(*) AS Count FROM Assignments
GROUP BY Wrk_Id ORDER BY COUNT(*) DESC LIMIT 1) S
ON worker.Wrk_ID = S.Wrk_ID
If you want a list of employees sorted by their total assignments:
SELECT w.WrkID, w.Wrk_LastName, COUNT(*) AS Assignments
FROM work w left join Assignments a
ON w.WrkID=a.WrkID
GROUP BY w.WrkID
ORDER BY COUNT(*) DESC;
To allow multiple winners:
SELECT s.*, w.Wrk_Lastname FROM
(
SELECT wrk_id , COUNT(*) AS tot_assignments
FROM Assignments
GROUP BY wrk_id
HAVING COUNT(*) =
(
SELECT MAX(tot) FROM
(
SELECT COUNT(*) AS TOT FROM Assignments GROUP BY wrk_id
) counts
)
) winners
INNER JOIN worker w ON s.wrk_id = w.wrk_id;
It can be slow since it does multiple GROUP BY. Doing it in separated steps in a procedure can be better.

generate index columns for "ORDER BY x, y"

I use this query to summarize the contents of the table export_blocks, aggregated by user and date, and save it as a new table:
CREATE TABLE export_days
SELECT user_id DATE(submitted) AS date_str,
FROM export_blocks
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, submitted
How can I, for each user_id get an incremental index for the date of records for that user? The indicies should start at 1 for each user, following the ORDER BY. I.e. I'd like to generate the date_index of the output below using SQL:
user_id date_str date_index
brian 2014-06-10 1
brian 2014-06-12 2
brian 2014-06-15 3
louis 2014-06-08 1
louis 2014-06-16 2
lucy 2013-11-15 1
(etc...)
I've been trying https://stackoverflow.com/a/5493480/1297830 but I cannot get it to work. It stops the counters prematurely, giving too low numbers for id_no and date_no.

Basing it on your sample query, you can do simple (dependent) subqueries to get the result;
SELECT id, date_str,
(SELECT COUNT(DISTINCT id)+1 FROM mytable WHERE id < a.id) id_no,
(SELECT COUNT(id)+1 FROM mytable WHERE id = a.id AND date_str < a.date_str) date_no
FROM mytable a
ORDER BY id;
...or you could do a couple of self joins;
SELECT a.id, a.date_str,
COUNT(DISTINCT b.id)+1 id_no,
COUNT(DISTINCT c.date_str)+1 date_no
FROM mytable a
LEFT JOIN mytable b ON a.id > b.id
LEFT JOIN mytable c ON a.id = c.id AND a.date_str > c.date_str
GROUP BY a.id, a.date_str
ORDER BY a.id, a.date_str;
An SQLfiddle showing both in action.
Sadly neither is really a very performant solution, but since MySQL lacks analytical (ie ranking) functions, the options are limited. Using user variables to do the ranking is also an option, however they're notoriously tricky to use and aren't portable so I'd go there only if performance demands it.

Based on Joachim's excellent answer I worked out the solution. It also works when there's multiple rows per day for each user.
CREATE TABLE export_days
SELECT
user_id, DATE(submitted) AS date_str,
(SELECT COUNT(DISTINCT DATE(submitted))+1 FROM export_blocks WHERE user_id = a.user_id AND submitted < a.submitted) date_no
FROM export_blocks a
GROUP BY user_id, DATE(submitted)
ORDER BY user_id, submitted

MySQL is not using INDEX in subquery

I have these tables and queries as defined in sqlfiddle.
First my problem was to group people showing LEFT JOINed visits rows with the newest year. That I solved using subquery.
Now my problem is that that subquery is not using INDEX defined on visits table. That is causing my query to run nearly indefinitely on tables with approx 15000 rows each.
Here's the query. The goal is to list every person once with his newest (by year) record in visits table.
Unfortunately on large tables it gets real sloooow because it's not using INDEX in subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use INDEX already defined on visits table?

Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, is using non-standard SQL syntax (items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions and do not sepend on the grouping items). This can give indeterminate (semi-random) results.
Second, ( to avoid the indeterminate results) you have added an ORDER BY inside a subquery which (non-standard or not) is not documented anywhere in MySQL documentation that it should work as expected. So, it may be working now but it may not work in the not so distant future, when you upgrade to MySQL version X (where the optimizer will be clever enough to understand that ORDER BY inside a derived table is redundant and can be eliminated).
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
The: SQL-fiddle
A compound index on (id_people, year) would help efficiency.
A different approach. It works fine if you limit the persons to a sensible limit (say 30) first and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;

Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people that have never had a visit, you should try switching the JOIN items in my statement with LEFT JOIN.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0 But this is not the way your example is set up. So you need another way to disambiguate your latest visit, so you just get one row per person. The only trick we have at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

How to Optimize mysql query the brings huge quantity of rows? - mysql

Related

mysql avg function not returning all records

How can I speed up a multiple inner join query?

How to use MAX and COUNT function in MYSQL with 2 tables

generate index columns for "ORDER BY x, y"

MySQL is not using INDEX in subquery

Categories

Resources