Optimizing Group By and Date functions


SELECT p.id_product
FROM popularity p, products pr
WHERE p.id_product = pr.id_product
AND p.date >= (DATE_SUB(CURDATE(), INTERVAL 30 DAY))
GROUP BY p.id_product order by count(p.id_product) desc LIMIT 0, 6
I'm wondering how to optimize my queries as much as possible. I'm trying to trim off the fat, and this query in particular is giving me trouble because it is used everywhere. I have one table that stores each visit to a product page, and another table that holds the products.
p.id_product is a non-unique key, whereas pr.id_product is the primary key of the products table. I don't think I'm writing the query correctly, since it seems to do a full index scan for the join between the two tables. I'm also not sure that calling DATE_SUB this way is the best idea (I'm looking at the past month), or whether my GROUP BY/ORDER BY is correct. My tables use InnoDB.
Is there any way I can make this faster? I've already added a LIMIT to the query, since the software never uses more than 6 results.

Use SHOW CREATE TABLE popularity and SHOW CREATE TABLE products to see which indexes are already in place.
Indexes speed up reads but slow down updates and inserts.
You need to find the right balance of indexes for the different ways you query your data, so I won't tell you to just pepper the tables with indexes.

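You are doing virtually nothing with products pr, so remove it from the query. This will speed things up somewhat. Because pr.id_product is the primary key, the join's only effect is to drop popularity rows whose id_product has no match in products. (If you do need the table, explain why.)
Then, adding this composite, covering index to popularity will speed it up: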
INDEX(date, id_product)
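Below is a minimal sketch of both suggestions, using the column names from the question. The index name idx_date_product is made up. The sketch assumes every popularity.id_product exists in products; otherwise dropping the join also changes the result. The cutoff CURDATE() - INTERVAL 30 DAY is a constant that is evaluated once, so computing it with DATE_SUB is not the problem. Wrapping p.date itself in a function would be.

```sql
-- Composite index: the range column (date) first, then id_product so the
-- query can be answered from the index alone (covering).
-- idx_date_product is a hypothetical name.
ALTER TABLE popularity ADD INDEX idx_date_product (date, id_product);

-- Same query without the products join (equivalent to the original only if
-- every popularity.id_product exists in products).
SELECT p.id_product
FROM popularity AS p
WHERE p.date >= CURDATE() - INTERVAL 30 DAY
GROUP BY p.id_product
ORDER BY COUNT(p.id_product) DESC
LIMIT 6;
```

With the index in place, EXPLAIN on the SELECT would be expected to show the new index in the key column and Using index in Extra. Using temporary and Using filesort are still expected, because the rows are sorted by a computed count.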

Related

How to make a faster query when joining multiple huge tables?

I have 3 tables, each with approximately 2 million rows, and 10,000-100,000 new rows are inserted every day. The SQL statement below takes approximately 10 seconds to finish. Is there a way to make it faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but it's slow. Is there a better way to write it?

All database servers have some form of optimization engine that determines how best to fetch the data you want. With a simple query such as the SELECT you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said, subqueries won't help, because they will be optimized into the same plan as the joins.
Reduce the number of columns, add indexes, and beef up the server if that's an option.
Consider caching. I'm not a MySQL expert, but I found this article interesting and worth a skim: https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider whether that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute, or would inserting this into a summary table once an hour be fine?
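Below is a minimal sketch of the summary-table idea, assuming an hourly refresh is acceptable. The table names customer_hotel_booking, customer_hotel_booking_new and customer_hotel_booking_old are hypothetical, and so are the column types. The refresh builds a new copy and swaps it in with a single RENAME TABLE, so readers never see a half-filled table.

```sql
-- Clean up leftovers from a previously interrupted refresh (hypothetical names).
DROP TABLE IF EXISTS customer_hotel_booking_new, customer_hotel_booking_old;

-- Pre-joined copy of the data the query needs; column types are assumptions.
CREATE TABLE customer_hotel_booking_new (
  cus_id INT NOT NULL,
  gender TINYINT,
  name   VARCHAR(255),
  KEY idx_cus_gender (cus_id, gender)
);

-- Run the expensive three-table join once per refresh.
INSERT INTO customer_hotel_booking_new (cus_id, gender, name)
SELECT customers.cus_id, customers.gender, customers.name
FROM customers
INNER JOIN hotels   ON hotels.cus_id    = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id;

-- First run only: make sure a live table exists, then swap atomically.
CREATE TABLE IF NOT EXISTS customer_hotel_booking LIKE customer_hotel_booking_new;
RENAME TABLE customer_hotel_booking     TO customer_hotel_booking_old,
             customer_hotel_booking_new TO customer_hotel_booking;
DROP TABLE customer_hotel_booking_old;

-- The application queries the small indexed table instead of the join.
SELECT name
FROM customer_hotel_booking
WHERE gender = 0 AND cus_id = 3
LIMIT 25 OFFSET 1;
```

The refresh statements could be run from cron or the MySQL Event Scheduler. Only the final SELECT would run on every request.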

A subquery won't help, but proper indexes can improve performance, so make sure you have them:
create index idx1 on customers(gender, cus_id, book_id, name);
create index idx2 on hotels(cus_id);
create index idx3 on bookings(book_id);

I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really odd that bookings are not linked to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
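The indexes above can be written as DDL, followed by an EXPLAIN to check that they are picked up. The index names are hypothetical. The sketch assumes customers.name is a VARCHAR short enough to index. Skip any index that duplicates an existing primary key, for example if book_id is already the primary key of bookings.

```sql
-- Covering index for customers: WHERE columns first, then ON and SELECT columns.
CREATE INDEX idx_customers_cover ON customers (cus_id, gender, book_id, name);
CREATE INDEX idx_hotels_cus      ON hotels (cus_id);
CREATE INDEX idx_bookings_book   ON bookings (book_id);

-- Each table should now list one of these indexes in the key column;
-- "Using index" in Extra means the data pages were not read.
EXPLAIN
SELECT customers.name
FROM customers
INNER JOIN hotels   ON hotels.cus_id    = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0
  AND customers.cus_id = 3
LIMIT 25 OFFSET 1;
```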
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
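If skipping a row really is intended, here is a sketch with an explicit order. Ordering by customers.book_id is only an assumption for illustration. Substitute whatever column(s) should define the order, and add a tie-breaker if that column is not unique. If skipping a row is not intended, simply drop OFFSET 1.

```sql
-- With ORDER BY, OFFSET 1 always skips the same row.
-- customers.book_id as the sort key is a hypothetical choice.
SELECT customers.name
FROM customers
INNER JOIN hotels   ON hotels.cus_id    = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0
  AND customers.cus_id = 3
ORDER BY customers.book_id
LIMIT 25 OFFSET 1;
```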

MySQL Slow query ~ 10 seconds

I have this query, which basically goes through a bunch of tables to get me some formatted results, but I can't seem to find the bottleneck. The most obvious bottleneck was ORDER BY RAND(), but performance is still bad.
The query takes 10 to 20 seconds even without ORDER BY RAND():
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here are the EXPLAIN output, its visual portion, and a mysqltuner screenshot (images not included).
EDIT 1
I have many indexes. Here they are: (index list not included)
EDIT 2:
You did it: the query is down to 0.5 to 2.5 seconds.
I followed most of your advice, changed some settings in my.cnf, and ran OPTIMIZE TABLE on my tables.

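You're searching for dates in a very suboptimal way. The condition date_format(c.datePub, '%d-%m-%Y') = '30-04-2016' wraps the column in a function, so no index on datePub can be used for it. Try this instead: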
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, adresse_id, rawData_id) and see if it helps.
Also try an index on a (validPoint). Note that your use of a geometry data type in that table is probably not helping anything.
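Below is a sketch of the suggested indexes, plus a cut-down query you can EXPLAIN on its own. The index names are hypothetical; the table and column names are the ones used in the question.

```sql
-- Compound index led by the range column, then the other columns read from c.
-- idx_datepub_cover and idx_validpoint are hypothetical names.
ALTER TABLE immobilier_ad_blank
  ADD INDEX idx_datepub_cover (datePub, prix, adresse_id, rawData_id);
ALTER TABLE Adresse
  ADD INDEX idx_validpoint (validPoint);

-- Sargable date filter: the bare column is compared with constants,
-- so the range can be read from idx_datepub_cover.
EXPLAIN
SELECT c.id, c.prix, c.adresse_id, c.rawData_id
FROM immobilier_ad_blank AS c
WHERE c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
  AND c.datePub <  STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
  AND c.prix != 0;
```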

To begin with, you have quite a lot of indexes, but many of them are not useful. Remember that more indexes mean slower inserts and updates. Also, MySQL is not good at using more than one index per table in complex queries. The following indexes have a cardinality below 10 and probably should be dropped:
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the exercise; there are many more.
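The cardinality figures came from screenshots, and the truncated index names above can't be matched to tables. This sketch therefore lists low-cardinality index columns across the whole current schema instead. CARDINALITY is an estimate that ANALYZE TABLE refreshes. For multi-column indexes, the value on each row is for the index prefix up to that column. Be careful with UNIQUE indexes: dropping one also removes the uniqueness constraint it enforces. The table list in ANALYZE TABLE is taken from the query, so adjust it as needed.

```sql
-- Refresh the statistics first; CARDINALITY is only an estimate.
ANALYZE TABLE immobilier_ad_blank, PropertyFeature, Adresse;

-- Index columns in the current schema with an estimated cardinality below 10.
SELECT TABLE_NAME, INDEX_NAME, NON_UNIQUE, SEQ_IN_INDEX, COLUMN_NAME, CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND CARDINALITY < 10
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;

-- After confirming an index is not needed (placeholder names):
-- ALTER TABLE some_table DROP INDEX some_index;
```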
Then you have some duplicated data.
point
lat
lng
The point field already contains the lat and lng, so the latter two columns are not needed. That means you can also drop two more indexes, idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
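Below is a sketch of dropping the redundant columns and their indexes. It assumes lat, lng, idxlat and idxlng all live on Adresse, the table that holds point. Any code that still reads lat or lng would first need to read them from point with ST_X()/ST_Y(), as the query already does.

```sql
-- Assumes these columns and indexes belong to Adresse.
-- Several changes in one ALTER TABLE statement, comma-separated.
ALTER TABLE Adresse
  DROP INDEX idxlat,
  DROP INDEX idxlng,
  DROP COLUMN lat,
  DROP COLUMN lng;
```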
These optimizations will improve overall performance for INSERTs and UPDATEs, and possibly for SELECTs as well, because the query planner needs to spend less time deciding which index to use.
Next, your EXPLAIN shows that the query does not use any index on table Adresse (a). But your WHERE clause has a.validPoint = 1, so you clearly need an index on it, as suggested by @Ollie-Jones.
However, I suspect that this index may have low cardinality. In that case, I recommend creating a composite index on this column plus another.

The problem is your join with (a). The table has an index, but the index can't be used, most likely because of the sort (GROUP BY) or possibly incompatible types. The EXPLAIN shows three quarters of a million rows examined, which means that an index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try FORCE INDEX (id) on table a; if it doesn't work, the error may give you more information.)
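Below is a sketch of both checks. The answer writes FORCE INDEX (id). Assuming id is the primary key of Adresse, MySQL names that index PRIMARY, which is what the hint below uses. The information_schema query compares the two join columns, because a type or character-set mismatch is one way the index lookup on a.id can be lost.

```sql
-- 1) The join columns should have identical types (and character sets, if strings).
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND ((TABLE_NAME = 'immobilier_ad_blank' AND COLUMN_NAME = 'adresse_id')
    OR (TABLE_NAME = 'Adresse'             AND COLUMN_NAME = 'id'));

-- 2) Index hint on the joined table; PRIMARY is the name MySQL gives a primary key
--    (assumes id is Adresse's primary key).
EXPLAIN
SELECT c.id, ST_X(a.point) AS X, ST_Y(a.point) AS Y
FROM immobilier_ad_blank AS c
LEFT JOIN Adresse AS a FORCE INDEX (PRIMARY) ON (c.adresse_id = a.id)
WHERE a.validPoint = 1;
```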

As others have pointed out, you need an index on a.validPoint, but what about c.datePub, which is also used in the WHERE clause? Why not a multiple-column index on (datePub, adresse_id)? The index on adresse_id is already used, so a multiple-column index should be better here.

Mysql query - faster?

I have this MySQL query, and since I have lots of records, it gets very slow; the computers that run the software (cash registers) aren't very powerful either.
Is there a way to get the same result, but faster? Would really appreciate help!
SELECT d.sifra, COUNT(d.sifra) AS pogosti, c.*, s.Stevilka as Stev_sk FROM Cenik c, dnevna d, Podskupina s
WHERE d.sifra = c.Sifra AND d.datum >= DATE(DATE_SUB(NOW(),INTERVAL 3 DAY))
GROUP BY d.sifra ORDER BY pogosti DESC limit 27

Have you tried indexing?
You are using c.Sifra in the WHERE, so you probably want
CREATE INDEX Cenik_Sifra ON Cenik(Sifra);
Also, you use datum and sifra from dnevna, and datum is in your WHERE condition, so
CREATE INDEX dnevna_ndx ON dnevna(datum, sifra);
Finally, there's no JOIN condition on Podskupina, from which you draw Stevilka. Is this a constant table? As it is, you're just counting rows in Podskupina and/or getting an unspecified value out of it, unless it has only one row.
On some versions of MySQL you might also benefit from pre-calculating the datum:
SELECT @datum := DATE(DATE_SUB(NOW(), INTERVAL 3 DAY));
and then using @datum in your query. This might improve its chances of good indexed performance.
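This sketch combines the user variable with the (datum, sifra) index: it counts in dnevna first, then joins only the top codes to Cenik. It is a restructuring rather than the original query. It assumes sifra is never NULL and that each sifra matches exactly one Cenik row. It also leaves out Podskupina, because its join condition is unknown. The alias top_sifra is arbitrary.

```sql
-- Compute the cutoff once (same expression as the question).
SET @datum = DATE(DATE_SUB(NOW(), INTERVAL 3 DAY));

-- Count using only dnevna (the dnevna(datum, sifra) index covers this part),
-- then join the 27 most frequent codes to Cenik.
SELECT top_sifra.sifra, top_sifra.pogosti, c.*
FROM (
    SELECT d.sifra, COUNT(*) AS pogosti
    FROM dnevna AS d
    WHERE d.datum >= @datum
    GROUP BY d.sifra
    ORDER BY pogosti DESC
    LIMIT 27
) AS top_sifra
JOIN Cenik AS c ON c.Sifra = top_sifra.sifra
ORDER BY top_sifra.pogosti DESC;
```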
Without knowing more about the structure and cardinality of the involved tables, though, there's little that can be done.
At the very least you should post the result of
EXPLAIN SELECT...(your select)
in the question.

You don't have a condition to join Podskupina s, so you get a cross join (all to all): the x rows from the join d.sifra = c.Sifra are multiplied by the y rows of Podskupina s.

This looks like a very problematic query. Do you really need to return all of c.*? And where's the join or filter on Podskupina? Once you tighten the query, make sure you've created good indexes on the tables. For example, presuming dnevna already has a clustered index on a unique ID as its primary key, performance would typically benefit from a secondary index on the sifra and datum columns.

Is this the best approach to this complex MySQL multi table query?

I'm building a complex multi-table MySQL query, and even though it works, I'm wondering whether I could make it simpler.
The idea is this: using the events table, which logs all site interaction, select the id, title, and slug of the 10 most popular blog posts, ordered by hits descending.
SELECT content.id, content.title, content.slug, COUNT(events.id) AS hits
FROM content, events
WHERE events.created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND events.page_url REGEXP '^/posts/[0-9]'
AND content.id = events.content_id
GROUP BY content.id
ORDER BY hits DESC
LIMIT 10
Blog post URLs have the following format:
/posts/2013-05-16-hello-world
As I mentioned, it seems to work, but I'm sure this could be done more cleanly.
Thanks,

The condition on created and the condition on page_url are both range conditions. You can get index-assistance for only one range condition per table in a SQL query, so you have to pick one or the other to index.
I would create an index on the events table over two columns (content_id, created).
ALTER TABLE events ADD KEY (content_id, created);
I'm assuming that restricting by created date is more selective than restricting by page_url, because I assume "/posts/" is going to match a large majority of the events.
After narrowing down the matching rows by created date, the page-url condition will have to be handled by the SQL layer, but hopefully that won't be too inefficient.
There is no performance difference between SQL-89 ("comma-style") join syntax and SQL-92 JOIN syntax. I do recommend SQL-92 syntax because it's more clear and it supports outer joins, but performance is not a reason to use it. The SQL query optimizer supports both join styles.
Temporary tables and filesorts are often costly for performance. This query is bound to create a temporary table and use a filesort, because you're using GROUP BY and ORDER BY on different columns. You can only hope that the temp table will be small enough to fit within your tmp_table_size limit (or increase that value). But that won't help if content.title or content.slug are BLOB/TEXT columns; in that case the temp table will be spooled to disk anyway.
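One common way around the wide temporary table is to aggregate events by content_id in a derived table first, then join content for the title and slug of the 10 winners. The temporary table for GROUP BY/ORDER BY then holds only content_id and a count. The sketch assumes every events.content_id refers to an existing content row. The alias top_posts is arbitrary.

```sql
-- Aggregate only narrow columns first; title/slug are fetched for 10 rows only.
SELECT c.id, c.title, c.slug, top_posts.hits
FROM (
    SELECT e.content_id, COUNT(*) AS hits
    FROM events AS e
    WHERE e.created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
      AND e.page_url REGEXP '^/posts/[0-9]'
    GROUP BY e.content_id
    ORDER BY hits DESC
    LIMIT 10
) AS top_posts
JOIN content AS c ON c.id = top_posts.content_id
ORDER BY top_posts.hits DESC;
```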

Instead of a regular expression, you can use the LEFT() function:
SELECT content.id, content.title, content.slug, COUNT(events.id) AS hits
FROM content
JOIN events ON content.id = events.content_id
WHERE events.created >= DATE_SUB(NOW(), INTERVAL 1 MONTH)
AND LEFT(events.page_url, 7) = '/posts/'
GROUP BY content.id
ORDER BY hits DESC
LIMIT 10
But that's just off the top of my head and untested, without a fiddle. Note that, unlike the REGEXP, it does not require a digit after /posts/. The JOIN suggestion made in the comments is also good and is reflected in my answer.

Why does the query take a long time in mysql even with a LIMIT clause?

Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
The above takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
This can somehow take many seconds, and each additional inner join slows it down further.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
Limiting the selected columns seems to speed up execution.
There are a few points here that I don't understand, and I would really appreciate it if anyone more knowledgeable about MySQL (or RDBMS query execution in general) could enlighten me.
Query 1 is fast since it's just a reverse lookup by PK and DB only needs to return the first 10 rows it encountered.
I don't see why query 2 should take forever. Shouldn't the operation be the same, i.e. get the first 10 rows by PK and then join with the other tables? Since there's an FK constraint, the relationship is guaranteed to be satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows a NULL FK? In that case, I guess a LEFT JOIN would make this much faster than an INNER JOIN?
Lastly, I'm guessing query 3 is simply faster because fewer columns are used in those unnecessary joins? But why would query execution need the other columns while joining? Shouldn't it just join using PKs first, and then fetch the columns for just the 10 rows?
Thanks!

My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html: "The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)"
EDIT: You could try this query to take advantage of the PK speed:
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID
order by o.OrderID desc;

All of your examples are asking for tablescans of the existing tables, so none of them will be more or less performant than the degree to which mysql can cache the data or results. Some of your queries have order by or join criteria, which can take advantage of indexes purely to make the joining process more efficient, however, that still is not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion: it can be thought of as a filter applied once a result set is determined. You save time on the client once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
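The output of EXPLAIN shows how many rows MySQL estimates it will examine, as well as whether any indexes are used. (In MySQL 5.7 and later, plain EXPLAIN already includes the extended information; the EXTENDED keyword was deprecated in 5.7 and removed in 8.0.)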
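Here is a sketch of what that looks like for query 2. The backticks are needed because ORDER is a reserved word. EXPLAIN ANALYZE actually runs the query and reports real row counts and timings; it requires MySQL 8.0.18 or later.

```sql
-- Estimated plan: check the table read order (usually top to bottom),
-- the rows column per table, and Extra. "Using temporary; Using filesort"
-- means the joined rows are sorted before LIMIT is applied.
EXPLAIN
SELECT *
FROM `Order` AS o
JOIN Store AS s ON s.StoreID = o.StoreID
ORDER BY o.OrderID DESC
LIMIT 10;

-- MySQL 8.0.18+: execute the query and report actual rows and timings per step.
EXPLAIN ANALYZE
SELECT *
FROM `Order` AS o
JOIN Store AS s ON s.StoreID = o.StoreID
ORDER BY o.OrderID DESC
LIMIT 10;
```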
