How to count related ids in a second table more quickly in MySQL?

I need to know how many orders were made for each product within a day, by product id. I select all the product_today.id values and count each of them in the second table, product_today_order.hid. I now have 20k+ rows of data, and this query alone takes 10s+.
Is there any way to make the query faster?
SELECT t.id,(select count(o.hid) from product_today_order o where o.hid=t.id) as zid
FROM product_today t
where date(t.dtime)='2021-11-26'
group by t.id

5 tips:
Probably the main slowdown is the un-sargable date(t.dtime)='...'. Change that to
WHERE t.dtime >= '2021-11-26'
AND t.dtime < '2021-11-26' + INTERVAL 1 DAY
Also, get rid of the GROUP BY. It is unnecessary (if t.id is the PRIMARY KEY).
Do you have an index on t that starts with dtime?
Do you need to check o.hid for being not-NULL? If not, simply say COUNT(*).
Do you have an index on o that starts with hid?
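Putting those tips together, here is a sketch of the revised query and supporting indexes (assuming t.id is the PRIMARY KEY of product_today; the index names are illustrative):
CREATE INDEX idx_pt_dtime ON product_today (dtime);
CREATE INDEX idx_pto_hid ON product_today_order (hid);

SELECT t.id,
       (SELECT COUNT(*) FROM product_today_order o WHERE o.hid = t.id) AS zid
FROM product_today t
WHERE t.dtime >= '2021-11-26'
  AND t.dtime < '2021-11-26' + INTERVAL 1 DAY;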

Will adding an index to a column improve the select query (without where) performance in SQL?

I have a MySQL table that contains 20,000,000 rows, with columns like (user_id, registered_timestamp, etc.). I have written the query below to get a count of users registered per day. The query takes a long time to execute. Will adding an index to the registered_timestamp column improve the execution time?
select date(registered_timestamp), count(userid) from `table` group by 1
Consider using this query to get a list of dates and the number of registrations on each date.
SELECT date(registered_timestamp) date, COUNT(*)
FROM `table`
GROUP BY date(registered_timestamp)
Then an index on table(registered_timestamp) will help a little because it's a covering index.
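For example (the index name is illustrative; backticks because TABLE is a reserved word):
CREATE INDEX regts ON `table` (registered_timestamp);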
If you adapt your query to return dates from a limited range, for example:
SELECT date(registered_timestamp) date, COUNT(*)
FROM `table`
WHERE registered_timestamp >= CURDATE() - INTERVAL 8 DAY
AND registered_timestamp < CURDATE()
GROUP BY date(registered_timestamp)
the index will help. (This query returns results for the week ending yesterday.) However, the index will not help this query.
SELECT date(registered_timestamp) date, COUNT(*)
FROM `table`
WHERE DATE(registered_timestamp) >= CURDATE() - INTERVAL 8 DAY /* slow! */
GROUP BY date(registered_timestamp)
because the function on the column makes the query unsargable.
You can probably address this performance issue with a MySQL generated column. This command:
ALTER TABLE `table`
ADD registered_date DATE
GENERATED ALWAYS AS (DATE(registered_timestamp))
STORED;
Then you can add an index on the generated column:
CREATE INDEX regdate ON `table` ( registered_date );
Then you can use that generated (derived) column in your query, and get a lot of help from that index.
SELECT registered_date, COUNT(*)
FROM `table`
GROUP BY registered_date;
But beware, creating the generated column and its index will take a while.
select date(registered_timestamp), count(userid) from `table` group by 1
This would benefit from INDEX(registered_timestamp, userid), but only because such an index is "covering". The query will still need to read every row of the index and do a filesort.
If userid is the PRIMARY KEY, then this would give you the same answers without bothering to check each userid for being NOT NULL.
select date(registered_timestamp), count(*) from `table` group by 1
And INDEX(registered_timestamp) would be equivalent to the above suggestion. (This is because InnoDB implicitly tacks on the PK.)
If this query is common, then you could build and maintain a "summary table", which collects the count every night for the day's registrations. Then the query would be a much faster fetch from that smaller table.
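A minimal sketch of that approach (the summary table name and the nightly job are assumptions, not from the question):
CREATE TABLE registration_counts (
  reg_date DATE PRIMARY KEY,
  cnt INT UNSIGNED NOT NULL
);

-- run once per night (e.g. from a cron job or MySQL EVENT) for yesterday's rows
INSERT INTO registration_counts (reg_date, cnt)
SELECT CURDATE() - INTERVAL 1 DAY, COUNT(*)
FROM `table`
WHERE registered_timestamp >= CURDATE() - INTERVAL 1 DAY
  AND registered_timestamp < CURDATE();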

Better way to exclude duplicates without creating new table

I have a query that uses a subquery to detect if an item in a joined table has a duplicate record, and if so the data is not pulled into the parent query:
select
    (f.listing_datetime) as datetime,
    round(avg(f.listing_price), 0) as price,
    round(avg(f.listing_sqft), 0) as sqft,
    round(avg(f.listing_p_per_sqft), 2) as p_per_ft,
    f.listing_neighborhood, count(*) as points
from (
    select
        a.listing_datetime, a.listing_price, a.listing_sqft, a.listing_p_per_sqft,
        a.listing_neighborhood, i.listing_tokens, count(i.listing_tokens) as c
    from
        agg_cl_data as a
        left join incoming_cl_data_desc as i
            on a.listing_url = i.listing_url
    where a.listing_datetime between curdate() - interval 30 day and curdate()
    group by i.listing_tokens
    having c < 2
) as f
group by day(f.listing_datetime), f.listing_neighborhood
order by f.listing_datetime;
As you can see, by using the HAVING clause as a simple way to deal with dupes, I'm actually losing the original record that was stored, because any aggregated record with a count of 2 or greater is thrown out. Is there a better way to do this so that I don't lose some of the data, WITHOUT creating a new table that would be queried against?
If you want to remove duplicate rows, use the DISTINCT clause. If you want to find duplicates based on partitioning by a particular column, use the ROW_NUMBER window function (see the sketch after the query below).
At first glance, your subquery is invalid, since you are grouping by one column while selecting other columns without aggregate functions.
select distinct
    a.listing_datetime, a.listing_price, a.listing_sqft, a.listing_p_per_sqft,
    a.listing_neighborhood, i.listing_tokens
from
    agg_cl_data as a
    left join incoming_cl_data_desc as i
        on a.listing_url = i.listing_url
where a.listing_datetime between curdate() - interval 30 day and curdate()
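And if you want to keep exactly one row per listing_url (so a listing with duplicates still contributes a single record), here is a sketch using ROW_NUMBER; it requires MySQL 8.0+, and the ordering inside the window is an assumption:
select listing_datetime, listing_price, listing_sqft, listing_p_per_sqft,
       listing_neighborhood, listing_tokens
from (
    select a.listing_datetime, a.listing_price, a.listing_sqft, a.listing_p_per_sqft,
           a.listing_neighborhood, i.listing_tokens,
           row_number() over (partition by a.listing_url
                              order by a.listing_datetime) as rn
    from agg_cl_data as a
    left join incoming_cl_data_desc as i
        on a.listing_url = i.listing_url
    where a.listing_datetime between curdate() - interval 30 day and curdate()
) as d
where rn = 1;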
Try using 'distinct' instead of 'having' in the subquery. You will get each url only once, without losing it, even if there were two entries for it.
So your code should be:
select DISTINCT a.listing_datetime, ...
and then no 'having' at the end.

MySQL query - faster?

So I have this MySQL query, and as I have lots of records this gets very slow, the computers that use the software (cash registers) aren't that powerful either.
Is there a way to get the same result, but faster? Would really appreciate help!
SELECT d.sifra, COUNT(d.sifra) AS pogosti, c.*, s.Stevilka AS Stev_sk
FROM Cenik c, dnevna d, Podskupina s
WHERE d.sifra = c.Sifra
  AND d.datum >= DATE(DATE_SUB(NOW(), INTERVAL 3 DAY))
GROUP BY d.sifra
ORDER BY pogosti DESC
LIMIT 27
Have you tried indexing?
You are using c.Sifra in the WHERE, so you probably want
CREATE INDEX Cenik_Sifra ON Cenik(Sifra);
Also, you use datum and sifra from dnevna, with datum in your WHERE filter, so
CREATE INDEX dnevna_ndx ON dnevna(datum, sifra);
Finally there's no JOIN condition on Podskupina, whence you draw Stevilka. Is this a constant table? As it is, you're just counting rows in Podskupina and/or getting an unspecified value out of it, unless it only has the one row.
On some versions of MySQL you might also find benefit in pre-calculating the datum:
SELECT @datum := DATE(DATE_SUB(NOW(), INTERVAL 3 DAY));
and then use @datum in your query. This might improve its chances of good indexed performance.
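For instance, a simplified sketch against the dnevna table alone:
SELECT @datum := DATE(DATE_SUB(NOW(), INTERVAL 3 DAY));

SELECT d.sifra, COUNT(d.sifra) AS pogosti
FROM dnevna d
WHERE d.datum >= @datum
GROUP BY d.sifra
ORDER BY pogosti DESC LIMIT 27;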
Without knowing more about the structure and cardinality of the involved tables, though, there's little that can be done.
At the very least you should post the result of
EXPLAIN SELECT...(your select)
in the question.
You don't have a join condition on Podskupina s, so you get a cross join (all to all): the x rows produced by the join "d.sifra = c.Sifra" are multiplied by the y rows of Podskupina s.
This looks like a very problematic query. Do you really need to return all of c.* ? And where's the join or filter on Podskupina? Once you tighten the query, make sure you've created good indexes on the tables. For example, presuming you've already got a clustered index on a unique ID as a primary key in dnevna, performance would typically benefit by putting a secondary index on the sifra and datum columns.
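For illustration only: if Podskupina were related to Cenik through a hypothetical column c.Podskupina_id, the tightened, explicit-JOIN form would look like this (the join columns s.id and c.Podskupina_id are assumptions; substitute the real relationship):
SELECT d.sifra, COUNT(d.sifra) AS pogosti, c.*, s.Stevilka AS Stev_sk
FROM dnevna d
JOIN Cenik c ON d.sifra = c.Sifra
JOIN Podskupina s ON s.id = c.Podskupina_id
WHERE d.datum >= DATE(DATE_SUB(NOW(), INTERVAL 3 DAY))
GROUP BY d.sifra
ORDER BY pogosti DESC
LIMIT 27;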

What SQL indexes to put for big table

I have two big tables from which I mostly select, but complex queries with 2 joins are extremely slow.
First table is GameHistory in which I store records for every finished game (I have 15 games in separate table).
Fields: id, date_end, game_id, ..
Second table is GameHistoryParticipants in which I store records for every player participated in certain game.
Fields: player_id, history_id, is_winner
The query to get today's top players is very slow (20+ seconds).
Query:
SELECT p.nickname, count(ghp.player_id) as num_games_today
FROM `GameHistory` as gh
INNER JOIN GameHistoryParticipants as ghp ON gh.id=ghp.history_id
INNER JOIN Players as p ON p.id=ghp.player_id
WHERE TIMESTAMPDIFF(DAY, gh.date_end, NOW())=0 AND gh.game_id='scrabble'
GROUP BY ghp.player_id ORDER BY count(ghp.player_id) DESC LIMIT 10
First table has 1.5 million records and the second one 3.5 million.
What indexes should I put? (I tried some and it was all slow.)
You are only interested in today's records. However, you search the whole GameHistory table with TIMESTAMPDIFF to detect those records. Even if you have an index on that column, it cannot be used, because you apply a function to the field.
You should have a composite index on both fields, game_id and date_end. Then ask for the date_end value directly:
WHERE gh.date_end >= DATE(NOW())
AND gh.date_end < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY)
AND gh.game_id = 'scrabble'
It would even be better to have an index on date_end's date part rather than on the whole time-carrying date_end. This is not possible in MySQL, however. So consider adding another column trunc_date_end for the date part alone, which you'd fill with a before-insert trigger. Then you'd have an index on trunc_date_end and game_id, which should help you find the desired records in no time.
WHERE gh.trunc_date_end = DATE(NOW())
AND gh.game_id = 'scrabble'
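A sketch of that setup (the trigger and index names are illustrative; a before-update trigger would also be needed if date_end can change):
ALTER TABLE GameHistory ADD trunc_date_end DATE;

CREATE TRIGGER gh_set_trunc_date BEFORE INSERT ON GameHistory
FOR EACH ROW SET NEW.trunc_date_end = DATE(NEW.date_end);

CREATE INDEX gh_trunc_game ON GameHistory (trunc_date_end, game_id);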
Add the EXPLAIN keyword at the beginning of your query, then run it in a database viewer (e.g. SQLyog) and you will see details about how the query executes. Look at the 'rows' column: it estimates how many rows are examined at each step. Then add indexes to the table columns, indicated in the EXPLAIN result, where the row counts are large.

MySQL query and performance

I would like to know the impact on performance if I run this query under the following conditions.
Query:
select `players`.*, count(`clicks`.`id`) as `clicks_count`
from `players` left join `clicks` on `clicks`.`player_id` = `players`.`id`
group by `players`.`id`
order by `clicks_count` desc
limit 1
Conditions:
In the clicks table I expect about 1000 inserts per minute.
The clicks table will contain more than 1,000,000 rows.
The players table will contain 10,000 rows.
The players table gets inserted into every 5 minutes.
I would like to know what to expect performance-wise if I run the query 1000 times in 1 minute.
Thanks
That query will never run in milliseconds with any meaningful amounts of data in your tables. It'll run two full table scans, join the two together, aggregate the mess, and fetch the top row from that.
Use a trigger to store the total in the players, and index that field. You'll then be able to avoid the join altogether:
select p.* from players p order by clicks_count desc limit 1
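A minimal sketch of that trigger-maintained counter (the column, trigger, and index names are assumptions):
ALTER TABLE players ADD clicks_count INT NOT NULL DEFAULT 0;
CREATE INDEX players_clicks ON players (clicks_count);

CREATE TRIGGER clicks_after_insert AFTER INSERT ON clicks
FOR EACH ROW
UPDATE players SET clicks_count = clicks_count + 1
WHERE id = NEW.player_id;

Remember to backfill clicks_count once from the existing clicks rows before relying on it.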
First & foremost, you should worry about your schema if you want decent performance with that number of records and frequent writes; i.e. proper indexes and constraints must be created if not already in place.
Next, the query itself: select the minimum number of fields needed (so if you do not need ALL the players fields, avoid using "players.*").
Personal preference: I'd restructure the tables (e.g. playerID in place of id) and write the query like so:
SELECT p.*, COUNT(c.id) as clicks_count
FROM players p
JOIN clicks c USING(playerID)
GROUP BY p.playerID
ORDER BY clicks_count desc
LIMIT 1
Again, see if you really need ALL player table fields; if not, omit "p.*" and replace with p.foo, p.bar, etc.