I want to improve my current query. So I have this table called Incomes. Where I have a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent if it was the first time on the row on what that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using Ordered Analytical Function.
You can use ROW_NUMBER or RANK to get the desired result.
Below query will give the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 5 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)
Related
I have the following mysql query that I think should be faster. The database table has 1 million records and the query table 3.5 seconds
set #numberofdayssinceexpiration = 1;
set #today = DATE(now());
set #start_position = (#pagenumber-1)* #pagesize;
SELECT *
FROM (SELECT ad.id,
title,
description,
startson,
expireson,
ad.appuserid UserId,
user.email UserName,
ExpiredCount.totalcount
FROM advertisement ad
LEFT JOIN (SELECT servicetypeid,
Count(*) AS TotalCount
FROM advertisement
WHERE Datediff(#today,expireson) =
#numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
GROUP BY servicetypeid) AS ExpiredCount
ON ExpiredCount.servicetypeid = ad.servicetypeid
LEFT JOIN aspnetusers user
ON user.id = ad.appuserid
WHERE Datediff(#today,expireson) = #numberofdayssinceexpiration
AND sendreminderafterexpiration = 1
ORDER BY ad.id) AS expiredAds
LIMIT 20 offset 1;
Here's the execution plan:
Here are the indexes defined on the table:
I wonder what I am doing wrong.
Thanks for any help
First, I would like to point out some problems. Then I will get into your Question.
LIMIT 20 OFFSET 1 gives you 20 rows starting with the second row.
The lack of an ORDER BY in the outer query may lead to an unpredictable ordering. In particular, the Limit and Offset can pick whatever they want. New versions will actually throw away the ORDER BY in the subquery.
DATEDIFF, being a function, makes that part of the WHERE not 'sargeable'. That is it can't use an INDEX. The usual way (which is sargeable) to compare dates is (assuming expireson is of datatype DATE):
WHERE expireson >= CURDATE() - INTERVAL 1 DAY
Please qualify each column name. With that, I may be able to advise on optimal indexes.
Please provide SHOW CREATE TABLE so that we can see what column(s) are in each index.
I am looking for the most efficient way to find the next or previous ID of the following query:
SELECT *
FROM transactions
ORDER
BY CASE order_status
WHEN 'order_accepted' THEN 1
WHEN 'processing_order' THEN 2
WHEN 'order_send_mailer' THEN 3
WHEN 'order_send' THEN 4
WHEN 'order_received' THEN 5
WHEN 'order_refunded' THEN 6
ELSE 7 END
, id DESC limit 1;
I tried adding a where id > '$id' or where id < '$id' claus to the query but it didn't give me te next or previous ID I was looking for.
For those that need some explanation of what I am trying to do: It's to go to the next or previous order by case with a forward of backward button.
What it currently looks like:
-id- -order_status-
9399 order_accepted
9398 processing_order
9363 processing_order
9403 order_send_mailer
9318 order_send
9346 order_received
9345 order_received
9050 order_refunded
The next ID for example of 9403 would be 9363 and previous ID would be 9318
Change your order_status into an enum column. This will save disk space and make sorting by order_status simpler and faster.
-- Add a new version of the column using an enum.
-- These strings are aliases for ordered numbers.
-- 'order_accepted' is 1, 'processing_order' is 2, etc.
alter table transactions add column enum_order_status enum(
'order_accepted',
'processing_order',
'order_send_mailer',
'order_send',
'order_received',
'order_refunded'
) not null;
-- Copy the status into the new enum column.
-- MySQL will translate the string into the number for you.
update transactions
set enum_order_status = order_status;
-- Drop the old column.
alter table transactions drop column order_status;
-- Rename the new enum column.
alter table transactions rename column enum_order_status to order_status;
-- Index it.
create index transactions_order_status on transactions(order_status);
-- Enjoy your vastly simplified and much faster query.
select *
from transactions
order by order_status, id desc
That's not actually necessary, but it makes everything much simpler.
With that out of the way, use the window functions lead and lag to refer to the previous and next rows in a query.
select
id, order_status,
lead(id) over w, lead(order_status) over w,
lag(id) over w, lag(order_status) over w
from transactions
window w as (order by order_status, id desc);
Note, window functions were added in MySQL 8. If you're using an older version I recommend upgrading ASAP; MySQL 8 has many big improvements. Otherwise you can simulate it with correlated subqueries and self-joins.
If you want the previous and next rows of a specific row, use the technique from this answer. We add row_numbers to the table in the desired order, and then fetch 9403 and its previous and next row by row number.
-- Add a row number to your table in the desired order.
with ordered_transactions as (
select
*, row_number() over w as rn
from transactions
window w as (order by order_status, id desc)
)
select *
from ordered_transactions
-- Find the row number for ID 9403, then add -1, 0, and 1.
-- If 9403 is row number 5 you'll fetch row numbers 4, 5, and 6.
where ot.rn in (
select rn+i
from ordered_transactions ot
-- All this is doing is making us three "rows" where i = -1, 0, and 1.
cross join (SELECT -1 AS i UNION ALL SELECT 0 UNION ALL SELECT 1) cj
where ot.id = 9403
);
Try it.
This my query with its performance (slow_query_log):
SELECT j.`offer_id`, o.`offer_name`, j.`success_rate`
FROM
(
SELECT
t.`offer_id`,
(
SUM(CASE WHEN `offer_id` = t.`offer_id` AND `sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*)
) AS `success_rate`
FROM `tblSales` AS t
WHERE DATE(t.`sales_time`) = CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
) AS j
LEFT JOIN `tblOffers` AS o
ON j.`offer_id` = o.`offer_id`
LIMIT 5;
# Time: 180113 18:51:19
# User#Host: root[root] # localhost [127.0.0.1] Id: 71
# Query_time: 10.472599 Lock_time: 0.001000 Rows_sent: 0 Rows_examined: 1156134
Here, tblOffers have all the OFFERS listed. And the tblSales contains all the sales. What am trying to find out is the top selling offers, based on the success rate (ie. those sales which are SUCCESS).
The query works fine and provides the output I needed. But it appears to be that its a bit slower.
offer_id and sales_status are already indexed in the tblSales. So do you have any suggestion on improving the inner query (where it calculates the success rate) so that performance can be improved? I have been playing with the math for more than 2hrs. But couldn't get a better way.
Btw, tblSales has lots of data. It contains those sales which are SUCCESSFUL, FAILED, PENDING, etc.
Thank you
EDIT
As you requested am including the table design also(only relevant fields are included):
tblSales
`sales_id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`offer_id` bigint UNSIGNED NOT NULL DEFAULT '0',
`sales_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`sales_status` ENUM('WAITING', 'SUCCESS', 'FAILED', 'CANCELLED') NOT NULL DEFAULT 'WAITING',
PRIMARY KEY (`sales_id`),
KEY (`offer_id`),
KEY (`sales_status`)
There are some other fields also in this table, that holds some other info. Amount, user_id, etc. which are not relevant for my question.
Numerous 'problems', none of which involve "math".
JOINs make things difficult. LEFT JOIN says "I don't care whether the row exists in the 'right' table. (I suspect you don't need LEFT??) But it also says "There may be multiple rows in the right table. Based on the column names, I will guess that there is only one offer_name for each offer_id. If this is correct, then here my first recommendation. (This will convince the Optimizer that there is no issue with the JOIN.) Change from
SELECT ..., o.offer_name, ...
LEFT JOIN `tblOffers` AS o ON j.`offer_id` = o.`offer_id`
...
to
SELECT ...,
( SELECT offer_name FROM tbloffers WHERE offer_id j.offer_id
) AS offer_name, ...
It also gets rid of a bug wherein you are assuming that the inner ORDER BY will be preserved for the LIMIT. This used to be the case, but in newer versions of MariaDB / MySQL, it is not. The ORDER BY in a "derived table" (your subquery) is now ignored.
2 down, a few more to go.
"Don't hide an indexed column in a function." I am referring to DATE(t.sales_time) = CURDATE(). Assuming you have no sales_time values for the 'future', then that test can be changed to t.sales_time >= CURDATE(). If you really need to restrict to just today, then do this:
AND sales_time >= CURDATE()
AND sales_time < CURDATE() + INTERVAL 1 DAY
The ORDER BY and the LIMIT should usually be put together. In your case, you may as well add the LIMIT to the "derived table", thereby leading to only 5 rows for the outer query to work with. But... There is still the question of getting them sorted correctly. So change from
SELECT ...
FROM ( SELECT ...
ORDER BY ... )
LIMIT ...
to
SELECT ...
FROM ( SELECT ...
ORDER BY ...
LIMIT 5 ) -- trim sooner
ORDER BY ... -- deal with the loss of ordering from derived table
Rolling it all together, I have
SELECT j.`offer_id`,
( SELECT offer_name
FROM tbloffers
WHERE offer_id = j.offer_id
) AS offer_name,
j.`success_rate`
FROM
( SELECT t.`offer_id`,
AVG(t.sales_status = 'SUCCESS') AS `success_rate`
FROM `tblSales` AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 5
) AS j
ORDER BY `success_rate` DESC;
(I took the liberty of shortening the SUM(...) in two ways.)
Now for the indexes...
tblSales needs at least (sales_time), but let's go for a "covering" (with sales_time specifically first):
INDEX(sales_time, sales_status, order_id)
If tbloffers has PRIMARY KEY(offer_id), then no further index is worth adding. Else, add this covering index (in this order):
INDEX(offer_id, offer_name)
(Apologies to other Answerers; I stole some of your ideas.)
Here, tblOffers have all the OFFERS listed. And the tblSales contains all the sales. What am trying to find out is the top selling offers, based on the success rate (ie. those sales which are SUCCESS).
Approach this with a simple JOIN and GROUP BY:
SELECT s.offer_id, o.offer_name,
AVG(s.sales_status = 'SUCCESS') as success_rate
FROM tblSales s JOIN
tblOffers o
ON o.offer_id = s.offer_id
WHERE s.sales_time >= CURDATE() AND
s.sales_time < CURDATE() + INTERVAL 1 DAY
GROUP BY s.offer_id, o.offer_name
ORDER BY success_rate DESC;
Notes:
The use of date arithmetic allows the query to make use of an index on tblSales(sales_time) -- or better yet tblSales(salesTime, offer_id, sales_status).
The arithmetic for success_rate has been simplified -- although this has minimal impact on performance.
I added offer_name to the GROUP BY. If you are learning SQL, you should always have all the unaggregated keys in the GROUP BY clause.
A LEFT JOIN is only needed if you have offers in tblSales which are not in tblOffers. I am guessing you have proper foreign key relationships defined, and this is not the case.
Based on not much information that you have provided (i mean table schema) you could try the following.
SELECT `o`.`offer_id`, `o`.`offer_name`, SUM(CASE WHEN `t`.`sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) AS `success_rate`
FROM `tblOffers` `o`
INNER JOIN `tblSales` `t`
ON `o`.`offer_id` = `t`.`offer_id`
WHERE DATE(`t`.`sales_time`) = CURDATE()
GROUP BY `o`.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 0,5;
You can find a sample of this query in this SQL Fiddle example
Without knowing your schema, the lowest hanging fruit I see is this part....
WHERE DATE(t.`sales_time`) = CURDATE()
Try changing that to something that looks like
Where t.sales_time >= #12-midnight-of-current-date and t.sales_time <= #23:59:59-of-current-date
Here is my DB data's sample:
both two columns are int
tuanId ,tuanSort
'375579', '55'
'370576', '54'
'366222', '54'
...
'346268', '52'
'369608', '52'
'370587', '52'
'370775', '52'
...
'370225', '52'
'370588', '52'
'360758', '52'
'366390', '51'
and I try these sqls bellow:
SELECT * FROM `tuan`.`TuanItem` WHERE ... ORDER BY `tuanSort` DESC LIMIT 140,20;
SELECT * FROM `tuan`.`TuanItem` WHERE ... ORDER BY `tuanSort` DESC LIMIT 160,20;
and I get these wrong data, I want to make a pagination, but the second page has some same data in the first page:
For Example, the first pic's 17th row has shown twice in the 2 pics
So, is the sort value the same can cause such a problem? Or MySQL has problem with such a select?
Given that tuanSort is not unique, the behavior is within the specification.
You are observing that one query returns a particular row as 157th row. In another query execution, it's returned as the 161st row.
To get more deterministic sequence, specify additional columns in the ORDER BY clause, e.g.
ORDER BY tuanSort DESC, tuanId DESC
If the intent behind this sequence of statements is "paging", there are more efficient approaches, such as saving a unique, sequenced identifier from the "last" row of the previous page.
If tuanSort,tuanId tuple is unique...
WHERE tuanSort <= :last_tuanSort
AND ( tuanSort < :last_tuanSort OR tuanId < :last_tuanId )
AND ...
ORDER BY tuanSort DESC, tuanId DESC
LIMIT 20
If you fetch all twenty rows, save tuanSort and tuanId from that last row. On the next "page", supply those saved values in the query predicates.
But that's an answer to a question you didn't ask.
I think you need to refresh your database table.
Try this : Go ->TuanItem->Click Operations->Alter table order by-> select your tuanid and set as an ascending order.
I think your problem is, your getting same data in 2 query.
then change the limit 160,20 to
SELECT * FROM `tuan`.`TuanItem` WHERE ... ORDER BY `tuanSort` DESC LIMIT 140,0;
SELECT * FROM `tuan`.`TuanItem` WHERE ... ORDER BY `tuanSort` DESC LIMIT 160,141;
the 0 is the offset which means the data start at 0 and the second query start at the 141 so that you cannot get same value twice
how could we remove or not displaying duplicate row with some conditional clause in sqlserver query, the case look like this one,
code decs
-------------------------
G-006 New
G-006 Re-Registration
how can we display just G-006 with Re-Registration Desc, i have tried with this query but no luck either
with x as (
select new_registration_no,category,rn = row_number()
over(PARTITION BY new_registration_no order by new_registration_no)
from equipment_registrations
)
select * from x
By using the same field in the PARTITION BY and ORDER BY clause, the rn field will always equal 1.
Assuming that new_registration_no = code and category = decs, you could change the ORDER BY field to be ORDER BY category DESC to get that result. However, that's a pretty arbitrary ORDER BY - you're just basing it on a random text value. I'm also not 100% sure how well the ROW_NUMBER() function works in a CTE.
A better solution might be something like:
SELECT *
FROM
(
SELECT
New_Registration_No,
Category,
ROW_NUMBER() OVER
(
PARTITION BY New_Registration_No
ORDER BY
CASE
WHEN Category = 'Re-Registration' THEN 1
WHEN Category = 'New' THEN 2
ELSE 3
END ASC ) rn
FROM Equipment_Registrations
) s
WHERE rn = 1
You can set the order in the CASE statement to be whatever you want - I'm afraid that without more information, that's the best solution I can offer you. If you have a known list of values that might appear in that field, it should be easy; if not, it will be a little harder to configure, but that will be based on business rules that you did not include in your original post.