I need to get data from December 2012 to November 2014.
I only need 1500 rows for each month.
For example:
SELECT * FROM data WHERE YEAR(submit_date) = 2012 AND MONTH(submit_date) = 12 limit 1500;
SELECT * FROM data WHERE YEAR(submit_date) = 2013 AND MONTH(submit_date) = 1 limit 1500;
SELECT * FROM data WHERE YEAR(submit_date) = 2013 AND MONTH(submit_date) = 2 limit 1500;
SELECT * FROM data WHERE YEAR(submit_date) = 2013 AND MONTH(submit_date) = 3 limit 1500;
and so on until November 2014.
Is there a way to write this as a shorter SQL query?
There are some options listed here: http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
IMHO one of the best is using a row-counter:
set @num := 0, @type := '';
select id, name, submit_date,
@num := if(@type = CONCAT(YEAR(submit_date), MONTH(submit_date)), @num + 1, 1) as row_number,
@type := CONCAT(YEAR(submit_date), MONTH(submit_date)) as dummy
from data force index(IX_submit_date)
group by id, name, submit_date
having row_number <= 2;
You can test it here: http://sqlfiddle.com/#!2/e829c/13 (the example keeps 2 rows per group rather than 1500)
I think you're looking for a GROUP BY clause. I would need to know a bit more to give you a definitive answer, but the following pseudo-query might guide you in the right direction.
SELECT *, SUM(some_field)
FROM data
GROUP BY MONTH(submit_date)
Or if you only need 1500 rows, select the top 1500 ordered by the date
SELECT TOP(1500) *
FROM data
WHERE submit_date >= '2012-12-01' AND submit_date < '2014-12-01'
ORDER BY MONTH(submit_date)
With MySQL you can use LIMIT
SELECT *
FROM data
WHERE submit_date >= '2012-12-01' AND submit_date < '2014-12-01'
ORDER BY MONTH(submit_date)
LIMIT 0,1500;
You can do it almost like you have it, just add a UNION between your queries. But you still have to create 1 query per month.
Otherwise you need to enumerate the rows that are returned. You need to first order and enumerate your records, then you can do a select on that select to get only the top X. Not sure if you want to include the last month or not.
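-- Number the rows within each month, then keep the first 1500 per month.
SET @prev_month := '', @incr := 0;
SELECT * FROM (
SELECT IF(@prev_month = DATE_FORMAT(submit_date, '%Y-%m'), @incr := @incr + 1, @incr := 1) AS row_num,
data.*,
(@prev_month := DATE_FORMAT(submit_date, '%Y-%m')) AS set_prev_month
FROM data WHERE submit_date BETWEEN '2012-12-01' AND '2014-11-30'
ORDER BY submit_date
) tmp WHERE row_num <= 1500;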
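If window functions are available (MySQL 8.0 and later), the rows can be numbered within each month without user variables. Below is a minimal sketch of that idea, not a drop-in MySQL query: it runs against an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25+ assumed) so that it is self-contained. The `id` column, the helper name `top_n_per_month`, and the ISO-formatted text dates are assumptions for illustration; in MySQL the month key would be `DATE_FORMAT(submit_date, '%Y-%m')` rather than SQLite's `strftime`.

```python
import sqlite3


def top_n_per_month(rows, n):
    """Return up to n (id, submit_date) rows per calendar month.

    rows: iterable of (id, submit_date) with submit_date as 'YYYY-MM-DD' text.
    The table layout is hypothetical; only submit_date comes from the question.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, submit_date TEXT)")
    conn.executemany("INSERT INTO data (id, submit_date) VALUES (?, ?)", rows)
    query = """
        SELECT id, submit_date
        FROM (
            SELECT id, submit_date,
                   -- restart the counter at 1 for every year-month
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y-%m', submit_date)
                       ORDER BY submit_date, id
                   ) AS rn
            FROM data
            WHERE submit_date >= '2012-12-01' AND submit_date < '2014-12-01'
        ) AS numbered
        WHERE rn <= ?
        ORDER BY submit_date, id
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result
```

Calling `top_n_per_month(rows, 1500)` would apply the per-month cap from the question; the half-open date range keeps all of November 2014 even if the column holds times as well as dates.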
I have a table like this:
01-Jul-17 100
02-Jul-17 100
03-Jul-17 300
04-Jul-17 300
05-Jul-17 500
06-Jul-17 500
07-Jul-17 300
08-Jul-17 400
09-Jul-17 100
10-Jul-17 100
I want to output the values below, in this order, eliminating consecutive duplicates but not all duplicates:
100
300
500
300
400
100
I cannot use SELECT DISTINCT, as it would also eliminate the second occurrences of 300 and 100. Is there a way to achieve this result in MySQL?
Thanks!
You want to get the previous value. If the dates really have no gaps or duplicates, just do:
select t.*
from t left join
t tprev
on t.col1 = date_add(tprev.col1, interval 1 day)
where tprev.col2 is null or tprev.col2 <> t.col2;
EDIT:
If the dates don't meet these conditions, then you can use variables:
select t.*
from (select t.*,
(@rn := if(@v = col2, @rn + 1,
if(@v := col2, 1, 1)
)
) as rn
from t cross join
(select @v := 0, @rn := 0) params
order by t.col1
) t
where rn = 1;
Note that MySQL does not guarantee the order of evaluation of expressions in the SELECT. So variables should not be assigned in one expression and then used in another -- they should be assigned in a single expression.
One way to handle this problem is by using session variables to track the changes of the values as ordered by your date column. In the query below, we keep track of the value, ordered by date, and assign a row number to each group of identical value. Then, only the first value in each group is retained. Note that this approach is robust to any number of duplicates. It is also robust with respect to there being gaps in your dates, so long as each record can be ordered by date.
SET @rn = 1;
SET @val = NULL;
SELECT t.val
FROM
(
SELECT
@rn:=CASE WHEN @val = val THEN @rn+1 ELSE 1 END rn,
@val:=val AS val,
dt
FROM yourTable
ORDER BY dt
) t
WHERE t.rn = 1
ORDER BY t.dt;
You can make use of the LAG and LEAD window functions (available in MySQL 8.0 and later).
select y from (select y, lag(y, 1, 0) over (order by x) as prev_y from t1) s where y <> prev_y;
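To make the LAG approach concrete, here is a small sketch run against the sample data from the question. It uses Python's sqlite3 module with an in-memory database (SQLite 3.25+ assumed) purely so it can be executed as is; the table name `t`, the column names `dt` and `val`, and the helper `collapse_consecutive` are assumptions. Instead of a default of 0 for the first row, it leaves the previous value as NULL and keeps that row explicitly, which also works if 0 is a legitimate value.

```python
import sqlite3


def collapse_consecutive(rows):
    """Return the values ordered by date with consecutive repeats removed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (dt TEXT, val INTEGER)")
    conn.executemany("INSERT INTO t (dt, val) VALUES (?, ?)", rows)
    query = """
        SELECT val
        FROM (
            SELECT dt, val,
                   LAG(val) OVER (ORDER BY dt) AS prev_val  -- NULL on the first row
            FROM t
        ) AS s
        WHERE prev_val IS NULL OR prev_val <> val
        ORDER BY dt
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


sample = [
    ("2017-07-01", 100), ("2017-07-02", 100), ("2017-07-03", 300),
    ("2017-07-04", 300), ("2017-07-05", 500), ("2017-07-06", 500),
    ("2017-07-07", 300), ("2017-07-08", 400), ("2017-07-09", 100),
    ("2017-07-10", 100),
]
```

Unlike the self-join on the previous day, this does not depend on the dates being gap-free, only on them being orderable.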
Is it possible to find a specific row in a query using something like SUM?
Example:
id  tickets  range   running sum
1   10       1-10    10=10
2   35       11-45   10+35=45
3   45       46-90   10+35+45=90
4   110      91-200  10+35+45+110=200
Total: 200 tickets (the SUM). I need to get the ID of the row that holds a given ticket number, for example 23. The output would be ID 2, because ID 2 covers tickets 11-45 in the running sum.
You can do it by defining a user variable inside your select query (in the FROM clause), e.g.:
select id, @total := @total + tickets as seats
from test, (select @total := 0) t
You seem to want the row where "23" fits in. I think this does the trick:
select t.*
from (select t.*, (@total := @total + tickets) as running_total
from t cross join
(select @total := 0) params
order by id
) t
where 23 > running_total - tickets and 23 <= running_total;
SELECT
d.id
,d.tickets
,CONCAT(
TRIM(CAST(d.RunningTotal - d.tickets + 1 AS CHAR(10)))
,'-'
,TRIM(CAST(d.RunningTotal AS CHAR(10)))
) as TicketRange
,d.RunningTotal
FROM
(
SELECT
id
,tickets
,@total := @total + tickets as RunningTotal
FROM
test
CROSS JOIN (select @total := 0) var
ORDER BY
id
) d
This is similar to Darshan's answer but there are a few key differences:
You shouldn't use implicit join syntax; explicit joins offer more functionality in the long run and have been standard for more than 20 years.
ORDER BY makes a huge difference to a running total calculated with a variable. If you change the order, the total is calculated differently, so decide how you want to accumulate it (by date, by id, or by something else) and make sure you put that in the query.
Finally, I calculated the range as well.
And here is how you can do it without using variables:
SELECT
d.id
,d.tickets
,CONCAT(
TRIM(d.LowRange)
,'-'
,TRIM(
CAST(RunningTotal AS CHAR(10))
)
) as TicketRange
,d.RunningTotal
FROM
(
SELECT
t.id
,t.tickets
,CAST(COALESCE(SUM(t2.tickets),0) + 1 AS CHAR(10)) as LowRange
,t.tickets + COALESCE(SUM(t2.tickets),0) as RunningTotal
FROM
test t
LEFT JOIN test t2
ON t.id > t2.id
GROUP BY
t.id
,t.tickets
) d
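On MySQL 8.0 or later, a windowed SUM gives the same running total without a user variable or a self-join. The following is only a sketch of that alternative, run through Python's sqlite3 module on an in-memory database (SQLite 3.25+ assumed) so it can be tried directly; the helper name `find_ticket_owner` is made up for illustration, while the `test` table and its `id` and `tickets` columns follow the answers above.

```python
import sqlite3


def find_ticket_owner(rows, ticket):
    """Return the id whose running-total range contains the given ticket number."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, tickets INTEGER)")
    conn.executemany("INSERT INTO test (id, tickets) VALUES (?, ?)", rows)
    query = """
        SELECT id
        FROM (
            SELECT id, tickets,
                   -- cumulative sum up to and including the current row
                   SUM(tickets) OVER (ORDER BY id) AS running_total
            FROM test
        ) AS d
        WHERE ? > running_total - tickets AND ? <= running_total
    """
    row = conn.execute(query, (ticket, ticket)).fetchone()
    conn.close()
    return row[0] if row else None


sample = [(1, 10), (2, 35), (3, 45), (4, 110)]
```

The ORDER BY inside the window plays the same role as the ORDER BY in the variable-based queries: it fixes the order in which the total accumulates.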
Couldn't really explain my problem with words, but with an example I can show it clearly:
I have a table like this:
id num val
0 3 10
1 5 12
2 7 12
3 11 15
And I want to go through all the rows, and calculate the increase of the "num", and multiply that difference with the "val" value. And when I calculated all of these, I want to add these results together.
This is the mathematical equation, that I want to run on the table:
Result = (3-0)*10 + (5-3)*12 + (7-5)*12 + (11-7)*15
138 = Result
Thank you.
You can do it with MySQL variables, but you will still get one record for each entry.
select
@lastTotal := @lastTotal + ( (yt.num - @lastNum) * yt.val ) thisLineTotal,
@lastNum := yt.num as saveForNextRow,
yt.id
from
yourTable yt,
( select @lastTotal := 0,
@lastNum := 0 ) sqlvars
order by
id
This should let you confirm the calculation on a per-record basis.
Now, to get the one record and one column result, you can wrap it such as
select
pq.thisLineTotal
from
(above entire query ) as pq
order by
pq.id DESC
limit 1
Assuming the IDs are consecutive as your sample data suggests, just join the table to itself:
select sum((t1.num-ifnull(t2.num,0))*t1.val) YourValue
from YourTable t1
left join YourTable t2
on t2.id = t1.id - 1;
http://www.sqlfiddle.com/#!2/40b9f/12
This will give you the total. Make sure to order in the order you wish - I have ordered by id
SET @runtot:=0;
SET @prevval:=0;
select max(rt) as total FROM (
SELECT
q.val,
q.num,
(@runtot := @runtot + (q.num- @prevval) * q.val) AS rt,
(@prevval := q.num) AS pv
FROM thetable q
ORDER by ID) tot
If you want to see the details of the calculation, leave out the outer select as so:
SET @runtot:=0;
SET @prevval:=0;
SELECT
q.val,
q.num,
(@runtot := @runtot + (q.num- @prevval) * q.val) AS rt,
(@prevval := q.num) AS pv
FROM thetable q
ORDER by ID
If it is possible to have negative numbers for your column values, using max(rt) won't work for the total. You should then use:
SET @runtot:=0;
SET @prevval:=0;
select @runtot as total FROM (
SELECT
q.val,
q.num,
(@runtot := @runtot + (q.num- @prevval) * q.val) AS rt,
(@prevval := q.num) AS pv
FROM thetable q
ORDER by ID) tot LIMIT 1
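If window functions are available (MySQL 8.0 and later), the "previous num" can come from LAG instead of a variable, which also avoids the need for consecutive ids. This is a rough sketch of that variant, executed through Python's sqlite3 module on an in-memory database (SQLite 3.25+ assumed) so it is runnable as written; the table name `thetable` follows the answer above, and the helper name `weighted_increase` is hypothetical.

```python
import sqlite3


def weighted_increase(rows):
    """Sum (num - previous num) * val over rows ordered by id; the first previous num is 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thetable (id INTEGER PRIMARY KEY, num INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO thetable (id, num, val) VALUES (?, ?, ?)", rows)
    query = """
        SELECT COALESCE(SUM((num - prev_num) * val), 0)
        FROM (
            SELECT num, val,
                   -- previous row's num in id order, defaulting to 0 for the first row
                   LAG(num, 1, 0) OVER (ORDER BY id) AS prev_num
            FROM thetable
        ) AS s
    """
    total = conn.execute(query).fetchone()[0]
    conn.close()
    return total


sample = [(0, 3, 10), (1, 5, 12), (2, 7, 12), (3, 11, 15)]
```

With the sample table from the question this reproduces (3-0)*10 + (5-3)*12 + (7-5)*12 + (11-7)*15 = 138, and because it sums the per-row terms directly it is not affected by negative values the way max(rt) is.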
This type of question is asked every now and then. The queries provided work, but they hurt performance.
I have tried the JOIN method:
SELECT *
FROM nbk_tabl
INNER JOIN (
SELECT ITEM_NO, MAX(REF_DATE) as LDATE
FROM nbk_tabl
GROUP BY ITEM_NO) nbk2
ON nbk_tabl.REF_DATE = nbk2.LDATE
AND nbk_tabl.ITEM_NO = nbk2.ITEM_NO
And the tuple one (way slower):
SELECT *
FROM nbk_tabl
WHERE (ITEM_NO, REF_DATE) IN (
SELECT ITEM_NO, MAX(REF_DATE)
FROM nbk_tabl
GROUP BY ITEM_NO
)
Is there any other performance friendly way of doing this?
EDIT: To be clear, I'm applying this to a table with thousands of rows.
Yes, there is a faster way.
select *
from nbk_tabl
order by ref_date desc
limit <n>
Where <n> is the number of rows that you want to return.
Hold on. I see you are trying to do this for a particular item. You might try this:
select *
from nbk_tabl n
where ref_date = (select max(ref_date) from nbk_tabl n2 where n.item_no = n2.item_no)
It might optimize better than the "in" version.
Also, in MySQL you can use user variables (assuming nbk_tabl.ITEM_NO <> 0):
select *
from (
select nbk_tabl.*,
@i := if(@ITEM_NO = ITEM_NO, @i + 1, 1) as row_num,
@ITEM_NO := ITEM_NO as t_itemNo
from nbk_tabl,(select @i := 0, @ITEM_NO := 0) t
order by Item_no, REF_DATE DESC
) as x where x.row_num = 1;
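The same "number the rows per item, keep row 1" idea can be written with ROW_NUMBER() where window functions exist (MySQL 8.0 and later). Here is a minimal sketch, run through Python's sqlite3 module on an in-memory database (SQLite 3.25+ assumed) so it is self-contained. The `NOTE` column and the helper name `latest_per_item` are invented for illustration; only `ITEM_NO` and `REF_DATE` come from the question. Whether this is faster than the join on your data is something to measure, not something this sketch shows.

```python
import sqlite3


def latest_per_item(rows):
    """Return one row per ITEM_NO: the one with the greatest REF_DATE."""
    conn = sqlite3.connect(":memory:")
    # NOTE is a hypothetical payload column standing in for the rest of the row.
    conn.execute("CREATE TABLE nbk_tabl (ITEM_NO INTEGER, REF_DATE TEXT, NOTE TEXT)")
    conn.executemany(
        "INSERT INTO nbk_tabl (ITEM_NO, REF_DATE, NOTE) VALUES (?, ?, ?)", rows
    )
    query = """
        SELECT ITEM_NO, REF_DATE, NOTE
        FROM (
            SELECT ITEM_NO, REF_DATE, NOTE,
                   -- newest date gets row number 1 within each item
                   ROW_NUMBER() OVER (
                       PARTITION BY ITEM_NO
                       ORDER BY REF_DATE DESC
                   ) AS rn
            FROM nbk_tabl
        ) AS x
        WHERE rn = 1
        ORDER BY ITEM_NO
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

One difference from the join version: if two rows of the same item share the latest REF_DATE, the join returns both, while ROW_NUMBER() keeps just one of them.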
I wanted to get the latest 4 dates for each symbolid. I adapted the code here as follows:
set @num := 0, @symbolid := '';
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
group by symbolid, date desc
having row_number < 5
and get the following results:
symbolid date row_number dummy
1 '2011-09-01 00:00:00' 1 1
1 '2011-08-31 00:00:00' 3 1
1 '2011-08-30 00:00:00' 5 1
2 '2011-09-01 00:00:00' 1 2
2 '2011-08-31 00:00:00' 3 2
2 '2011-08-30 00:00:00' 5 2
3 '2011-09-01 00:00:00' 1 3
3 '2011-08-31 00:00:00' 3 3
3 '2011-08-30 00:00:00' 5 3
4 '2011-09-01 00:00:00' 1 4
...
The obvious question is, why did I only get 3 rows per symbolid, and why are they numbered 1,3,5? A few details:
I tried both forcing an index and not (as seen here), and got the same results both ways.
The dates are correct, i.e., the listing correctly shows the top 3 dates per symbolid, but the row_number value is off
When I don't use the "having" statement, the row numbers are correct, i.e., the most recent date is 1, the next most recent is 2, etc
Obviously the row_number computed field is being affected by the "having" clause, but I don't know how to fix it.
I realize that I could just change the "having" to "having row_number < 7" (6 gives the same as 5), but it's very ugly and would like to know what to do to make it "behave".
I'm not 100% sure why it behaves this way (maybe it's because logically SELECT is processed prior to ORDER BY), but it should work as expected:
SELECT *
FROM
(
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
INNER JOIN (SELECT @symbolid:=0)c
INNER JOIN (SELECT @num:=0)d
group by symbolid, date desc
) a
WHERE row_number < 5
User-defined variables do not work reliably here; the MySQL documentation warns:
As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement; in addition, this order is not guaranteed to be the same between releases of the MySQL Server. In SELECT @a, @a:=@a+1, ..., you might think that MySQL will evaluate @a first and then do an assignment second. However, changing the statement (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may cause MySQL to select an execution plan with a different order of evaluation.
Here is my proposal
select symbolid,
substring_index(group_concat(date order by date desc), ',', 4) as last_4_dates
from projections
group by symbolid
The drawback of this approach is that it collapses the dates into a single comma-separated string,
which you need to split before you can actually use it.
Final code:
set @num := 0, @symbolid := '';
select d.* from
(
select symbolid, date,
@num := if(@symbolid = symbolid, @num + 1, 1) as row_number,
@symbolid := symbolid as dummy
from projections
order by symbolid, date desc
) d
where d.row_number < 5