I am trying to retrieve rows from my database table, and I prefer those with a provider_count higher than 0. I want the records with a provider_count of 0 to come last in the results.
I'm currently using the following query on a table with around 1M records:
SELECT
`products`.*
FROM
`products`
WHERE
`products`.`provider_count` IN
(12,2,3,4,5,6,7,8,9,10,11,13,14,18,19,21,22,42,46,58,0)
GROUP BY
`products`.`search_name`
ORDER BY
FIELD(provider_count, 12,2,3,4,5,6,7,8,9,10,11,13,14,18,19,21,22,42,46,58,0)
LIMIT 48
Unfortunately the ORDER BY makes the query really slow. There is already an index on the provider_count column, and I've tried adding a composite index on (provider_count, search_name), but that didn't show any improvement in speed.
I've also tried changing the ordering by removing the WHERE clause and changing the ORDER BY to:
ORDER BY `products`.`provider_count` = 0 ASC, `products`.`provider_count`
But that results in roughly the same execution time (give or take 5 seconds).
Without the ORDER BY, the query execution time is only 0.005s.
Any suggestions on how I can improve this query?
Schema of my products table:
https://www.db-fiddle.com/f/azBzXiRRsLtXCmLDCVe9DP/0
I'm going to try to answer - it's still not totally clear what your query is supposed to do. The Fiddle you provide doesn't run your query (due to the GROUP BY bug @RickJames mentioned).
I believe your intent is to retrieve every record in the products table for a list of provider counts. You want the records where the provider count is 0 to be at the end of that list, and you want the query to perform quickly.
You should be able to achieve that via this query:
SELECT
`products`.*, 1 as no_provider
FROM
`products`
WHERE
`products`.`provider_count` IN
(12,2,3,4,5,6,7,8,9,10,11,13,14,18,19,21,22,42,46,58)
UNION
SELECT
`products`.*, 2 as no_provider
FROM
`products`
WHERE
`products`.`provider_count` = 0
ORDER BY no_provider, provider_count
LIMIT 48
Your LIMIT clause means that if there are more than 48 records with a provider count > 0, the rows from the second SELECT (the zero-provider records) won't show up in your result set.
You could, of course, keep it even simpler:
SELECT
`products`.*
FROM
`products`
WHERE
`products`.`provider_count` IN
(12,2,3,4,5,6,7,8,9,10,11,13,14,18,19,21,22,42,46,58,0)
ORDER BY provider_count desc
Related
I am trying to do a simple test where I'm pulling from a table the information of a specific part number as such:
SELECT *
FROM table_name
WHERE part_no IN ('abc123')
This returns 25 rows. Now I want to count the number that meet the "accepted" condition in a specific column, but with the count limited to only the 10 most recent rows. My approach is to write it as follows:
Select Count(*)
FROM table_name
WHERE part_no IN ('abc123') AND lot IN ('accepted')
ORDER BY date DESC
LIMIT 10
I'm having a hard time getting the ORDER BY and LIMIT operations to work. I could use help just getting it to limit appropriately, and I can figure out the rest from there.
Edit: I understand that the operations are happening on the COUNT, which only returns one row with a value; but I included the second snippet to show where I am stuck in my thought process.
Your query SELECT Count(*) FROM ... will always return exactly one row.
It's not 100% clear what exactly you want to do, but if you want to know how many of the last 10 have been accepted, you could use a subquery - something like:
SELECT COUNT(*) FROM (
    SELECT lot
    FROM table_name
    WHERE part_no IN ('abc123')
    ORDER BY date DESC
    LIMIT 10
) recent  -- MySQL requires an alias for the derived table
WHERE lot IN ('accepted')
The inner query will return the 10 most recent rows for part abc123, then the outer query will count the accepted ones.
There are also other solutions (for example, you could have the inner query output a field that is 0 when the part is not accepted and 1 when it is accepted, then take the SUM). Depending on which exact dialect/database you are using, you may also have more elegant options.
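A minimal sketch of that alternative, using the same hypothetical table and column names as above; the inner query flags each of the 10 most recent rows, and the outer query adds the flags up:
-- 1 = accepted, 0 = not accepted; the SUM of the flags is the accepted count
SELECT SUM(recent.accepted_flag) AS accepted_count
FROM (
    SELECT CASE WHEN lot IN ('accepted') THEN 1 ELSE 0 END AS accepted_flag
    FROM table_name
    WHERE part_no IN ('abc123')
    ORDER BY date DESC
    LIMIT 10
) recent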
SELECT COUNT(*) returns ONE ROW, therefore the ORDER BY and the LIMIT will not work on the results the way you expect.
The below statement does not work, but I can't seem to figure out why:
select AVG(delay_in_seconds) from A_TABLE ORDER by created_at DESC GROUP BY row_type limit 1000;
I want to get the averages of the most recent 1000 rows for each row_type. created_at is of type DATETIME and row_type is of type VARCHAR.
If you only want the 1000 most recent rows, regardless of row_type, and then get the average of delay_in_seconds for each row_type, that's a fairly straightforward query. For example:
SELECT t.row_type
, AVG(t.delay_in_seconds)
FROM (
SELECT r.row_type
, r.delay_in_seconds
FROM A_table r
ORDER BY r.created_at DESC
LIMIT 1000
) t
GROUP BY t.row_type
I suspect, however, that this query does not satisfy the requirements that were specified. (I know it doesn't satisfy what I understood as the specification.)
If what we want is the average of the most recent 1000 rows for each row_type, that would also be fairly straightforward... if we were using a database that supported analytic functions.
Unfortunately, MySQL (before version 8.0) doesn't provide support for analytic functions. It is possible to emulate one in MySQL, but the syntax is a bit involved, and it is dependent on behavior that is not guaranteed.
As an example:
SELECT s.row_type
, AVG(s.delay_in_seconds)
FROM (
SELECT #row_ := IF(#prev_row_type = t.row_type, #row_ + 1, 1) AS row_
, #prev_row_type := t.row_type AS row_type
, t.delay_in_seconds
FROM A_table t
CROSS
JOIN (SELECT #prev_row_type := NULL, #row_ := NULL) i
ORDER BY t.row_type DESC, t.created_at DESC
) s
WHERE s.row_ <= 1000
GROUP
BY s.row_type
NOTES:
The inline view query is going to be expensive for large sets. What it's effectively doing is assigning a row number to each row. The ORDER BY sorts the rows in descending sequence by created_at; what we want is for the most recent row to be assigned a value of 1, the next most recent 2, etc. This numbering of rows is repeated for each distinct value of row_type.
For performance, we'd want a suitable index with leading columns (row_type, created_at, delay_in_seconds) to avoid an expensive "Using filesort" operation. We need at least the first two columns for that; including delay_in_seconds makes it a covering index (the query can be satisfied entirely from the index).
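For example (the index name here is arbitrary):
-- Covering index for the inline view: supports the per-row_type ordering by
-- created_at and lets the query be satisfied entirely from the index
CREATE INDEX A_table_IX1 ON A_table (row_type, created_at, delay_in_seconds);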
The outer query then runs against the resultset returned from the inline view (a "derived table"). The predicate in the WHERE clause filters out all rows that were assigned a row number greater than 1000; the rest is a straightforward GROUP BY and an AVG aggregate.
A LIMIT clause is entirely unnecessary. It may be possible to incorporate some additional predicates for a further performance gain... for example, what if we wanted the most recent 1000 rows, but only those with created_at within the past 30 or 90 days?
(I'm not entirely sure this answers the question that OP was asking. What this answers is: Is there a query that can return the specified resultset, making use of AVG aggregate and GROUP BY, ORDER BY and LIMIT clauses.)
N.B. This query is dependent on a behavior of MySQL user-defined variables which is not guaranteed.
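For what it's worth, MySQL 8.0 and later do support window functions, so the analytic-function approach mentioned earlier can be written directly; a minimal sketch, assuming the same table and column names:
-- MySQL 8.0+ only: ROW_NUMBER() replaces the user-variable emulation
SELECT t.row_type
     , AVG(t.delay_in_seconds) AS avg_delay
  FROM ( SELECT row_type
              , delay_in_seconds
              , ROW_NUMBER() OVER (PARTITION BY row_type
                                       ORDER BY created_at DESC) AS rn
           FROM A_table
       ) t
 WHERE t.rn <= 1000
 GROUP
    BY t.row_type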
The user-variable query shows one approach, but there is also another: it's possible to use a "join" operation (of A_table with A_table) to get a row number assigned, by getting a COUNT of the number of rows that are "more recent" than each row. With large sets, however, that can produce a humongous intermediate result if we aren't careful to limit it.
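A rough sketch of that self-join approach (it assumes created_at is unique within each row_type; duplicate timestamps would throw the row numbering off):
-- COUNT(*) in the inner query is the row's rank: the number of rows at least
-- as recent as this one, within the same row_type
SELECT s.row_type
     , AVG(s.delay_in_seconds) AS avg_delay
  FROM ( SELECT a.row_type
              , a.created_at
              , a.delay_in_seconds
           FROM A_table a
           JOIN A_table b
             ON b.row_type = a.row_type
            AND b.created_at >= a.created_at
          GROUP
             BY a.row_type, a.created_at, a.delay_in_seconds
         HAVING COUNT(*) <= 1000
       ) s
 GROUP
    BY s.row_type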
Write the ORDER BY at the end of the statement.
SELECT AVG(delay_in_seconds) FROM A_TABLE GROUP BY row_type ORDER BY created_at DESC LIMIT 1000;
Read the MySQL dev site for details.
I was under the impression that using an ORDER BY in an SQL query would not affect which records were selected for the result set. I thought that ORDER BY would only affect the presentation of the result set.
Recently, however, I was getting unexpected results from a query until I used an ORDER BY clause. This suggests that either a) ORDER BY can affect which records are included in the result set, or b) I have some other bug which I need to work on.
Which is it?
Here's the query: SELECT node_id FROM users ORDER BY node_id LIMIT 100
(node_id is both a primary key and foreign key).
As you can see, the query includes a LIMIT clause. It seems that if I use the ORDER BY, the records are ordered before the top 100 are selected. I had expected it to select 100 records based on natural order, then order them according to node_id.
I've looked for info on ORDER BY but as yet, the only info I can find suggests that it affects presentation only... I am using MySQL.
ORDER BY is applied to all of the records before the LIMIT clause takes effect. To get the result you want you will need this:
select u.node_id
from users u
join
(
SELECT node_id
FROM users
LIMIT 100
) us ON u.node_id = us.node_id
ORDER BY u.node_id
This way you apply the LIMIT clause first and get the first 100 records, and then you sort that result. The join is faster than a double SELECT statement, especially if you are working with many records.
You can use a nested query:
SELECT node_id FROM
(
SELECT node_id FROM users LIMIT 100
) u
ORDER BY node_id
Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and an FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
the above takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this somehow can take up to many seconds. The more inner joins I add, the slower it gets.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this seems to speed the execution up, by limiting the columns we select.
There are a few points that I don't understand here, and I would really appreciate it if anyone more knowledgeable about MySQL (or RDBMS query execution in general) could enlighten me.
Query 1 is fast since it's just a reverse lookup by PK and DB only needs to return the first 10 rows it encountered.
I don't see why Query 2 should take forever. Shouldn't the operation be the same? i.e. get the first 10 rows by PK and then join with the other tables. Since there's an FK constraint, it is guaranteed that the relationship will be satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows NULL FKs? In which case I guess a LEFT JOIN would make this much faster than an INNER JOIN?
Lastly, I'm guessing Query 3 is simply faster because fewer columns are used in those unnecessary joins? But why would the query execution need the other columns while joining? Shouldn't it just join using PKs first, and then fetch the columns for just the 10 rows?
Thanks!
My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html, The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
EDIT: You could try using this query to take advantage of the PK speed.
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID;
All of your examples are asking for table scans of the existing tables, so none of them will be more or less performant beyond the degree to which MySQL can cache the data or results. Some of your queries have ORDER BY or join criteria, which can take advantage of indexes purely to make the joining process more efficient; however, that still is not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion -- it can be thought of as filtration once a result set is determined. You save time on the client, once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
The output of EXPLAIN will show you how many rows are being looked at by mysql, as well as whether or not any indexes are being used.
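For example, applied to the second query from the question (a sketch; plain EXPLAIN gives the key information on any recent version):
-- Look at the "type" column (ALL = full table scan, index/ref = index use),
-- the estimated "rows", and any "Using filesort" / "Using temporary" in Extra
EXPLAIN
SELECT *
  FROM `Order` o
  JOIN `Store` s ON s.StoreID = o.StoreID
 ORDER BY o.OrderID DESC
 LIMIT 10;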
I would like to know the impact on performance if I run this query in the following conditions.
Query:
select `players`.*, count(`clicks`.`id`) as `clicks_count`
from `players` left join `clicks` on `clicks`.`player_id` = `players`.`id`
group by `players`.`id`
order by `clicks_count` desc
limit 1
Conditions:
The clicks table gets inserted into 1,000 times per minute
The clicks table will contain more than 1,000,000 rows
The players table will contain 10,000 rows
The players table gets inserted into every 5 minutes
I would like to know what to expect performance-wise if I run the query 1000 times in 1 minute.
Thanks
That query will never run in milliseconds with any meaningful amounts of data in your tables. It'll run two full table scans, join the two together, aggregate the mess, and fetch the top row from that.
Use a trigger to store the total in the players table, and index that field. You'll then be able to avoid the join altogether:
select p.* from players p order by clicks_count desc limit 1
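For illustration, a minimal sketch of that trigger-plus-column setup (the column and trigger names here are made up; adjust to your schema):
-- Keep a denormalized clicks_count on players, maintained by an insert trigger
ALTER TABLE players
    ADD COLUMN clicks_count INT NOT NULL DEFAULT 0,
    ADD INDEX idx_players_clicks_count (clicks_count);

DELIMITER //
CREATE TRIGGER clicks_after_insert
AFTER INSERT ON clicks
FOR EACH ROW
BEGIN
    -- NEW.player_id references the row just inserted into clicks
    UPDATE players
       SET clicks_count = clicks_count + 1
     WHERE id = NEW.player_id;
END//
DELIMITER ;
If clicks can also be deleted, a corresponding AFTER DELETE trigger would be needed to decrement the count.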
First & foremost, you should worry about your schema if you want decent performance with that number of records and frequent writes; i.e. proper indexes and constraints must be created if not already in place.
Next, for the query itself, select the minimum number of fields needed (so if you do not need ALL the players fields, avoid using "players.*").
Personal preference: I'd restructure the tables (e.g. playerID in place of id) and write the query like so:
SELECT p.*, COUNT(c.id) as clicks_count
FROM players p
JOIN clicks c USING(playerID)
GROUP BY p.playerID
ORDER BY clicks_count desc
LIMIT 1
Again, see if you really need ALL player table fields; if not, omit "p.*" and replace with p.foo, p.bar, etc.