Eliminate First 14 For Each Symbol From Query - mysql

The following query pulls all rows that do not exist in a relative_strength_index table. But I also need to eliminate the first 14 rows for each symbol based on date asc from the historical_data table. I have tried several attempts to do this but am having real trouble with the 14 days. How could this issue be resolved and added into my current query?
Current Query
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date);

What you want is the first argument of the LIMIT clause, which states which row to start from, combined with ORDER BY ... ASC.
select * from historical_data hd where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date ORDER BY rsi_date ASC LIMIT 14)

Use OFFSET along with LIMIT, like this. It returns a maximum of 100,000 rows, starting at row 15:
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.histDate asc
limit 100000 offset 14;
Because you're using LIMIT and OFFSET, you will want an ORDER BY clause before them, so that the skipped rows are well defined.
UPDATE: You mentioned "for each symbol", so try this query. It ranks the rows of each symbol by date ascending, then selects only the rows whose rank is 15 or higher.
SELECT *
FROM
(select hd.*,
CASE WHEN @previous_symbol = hd.symbol THEN @rank := @rank + 1
ELSE @rank := 1
END as symbol_rank,
@previous_symbol := hd.symbol
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.symbol, hd.histDate asc
)T
WHERE T.symbol_rank >= 15
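On MySQL 8.0 or later, where window functions are available, the same per-symbol ranking can be sketched without user variables. This is only a sketch under assumptions: it takes the date column to be `histDate`, as in the question, and it reads "the first 14 rows" as the 14 earliest rows of each symbol in `historical_data`, counted before the NOT EXISTS filter is applied. The alias `rn` is illustrative.

```sql
-- Assumes MySQL 8.0+ (ROW_NUMBER is not available in earlier versions).
SELECT ranked.*
FROM (
    SELECT hd.*,
           -- number each symbol's rows from its earliest date upwards
           ROW_NUMBER() OVER (PARTITION BY hd.symbol
                              ORDER BY hd.histDate ASC) AS rn
    FROM historical_data hd
) AS ranked
WHERE ranked.rn > 14                      -- drop the first 14 rows per symbol
  AND NOT EXISTS (SELECT 1
                  FROM relative_strength_index rsi
                  WHERE rsi.rsi_symbol = ranked.symbol
                    AND rsi.rsi_date   = ranked.histDate)
ORDER BY ranked.symbol, ranked.histDate;
```

Note that the user-variable query ranks rows after the NOT EXISTS filter, whereas this sketch ranks all rows first. Which of the two is wanted depends on what "first 14" is supposed to mean.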

It's not clear (to me) what resultset you want to return, or the conditions that specify whether a row should be returned.
All we have to go on is a confusingly vague description, to exclude "the first 14 rows", or "the first 14 days" for each symbol.
What we don't have is a representative sample of the data, or an example of what rows should be returned.
Without that, we don't have a way to know if we understand the description of the specification, and we don't have anything to test against or to compare our results to.
So, we are basically just guessing. (Which seems to be the most popular kind of answer provided by the "try this" enthusiasts.)
I can provide some examples of some patterns, which may suit your specification, or may not.
To get the earliest `histdate` for each `symbol`, and add 14 days to that, we can use an inline view. We can then do a semi-join to the `historical_data` data, to exclude rows that have a `histdate` before the date returned from the inline view.
(This is based on an assumption that the datatype of the `histdate` column is DATE.)
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
ORDER
BY hd.symbol
, hd.histdate
But that query doesn't include any reference to the `relative_strength_index` table. The original query includes a NOT EXISTS predicate, with a correlated subquery of the `relative_strength_index` table.
If the goal is to get the earliest `rsi_date` for each `rsi_symbol` from that table, and then add 14 days to that value...
SELECT hd.*
FROM ( SELECT rsi.rsi_symbol
, MIN(rsi.rsi_date) + INTERVAL 14 DAY AS rsi_date
FROM relative_strength_index rsi
GROUP BY rsi.rsi_symbol
) rs
JOIN historical_data hd
ON hd.symbol = rs.rsi_symbol
AND hd.histdate > rs.rsi_date
ORDER
BY hd.symbol
, hd.histdate
If the goal is to exclude rows where a matching row in relative_strength_index already exists, I would use an anti-join pattern...
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
LEFT
JOIN relative_strength_index xr
ON xr.rsi_symbol = hd.symbol
AND xr.rsi_date = hd.histdate
WHERE xr.rsi_symbol IS NULL
ORDER
BY hd.symbol
, hd.histdate
These are just example query patterns, which are likely not suited to your exact specification, since they are guesses.
It doesn't make much sense to provide more examples of other patterns, without a more detailed specification.

Related

Finding intersection of two select query

I need to find intersection between the following queries in MYSQL
SELECT *
FROM project.backup_table
where project.backup_table.date <= (SELECT date FROM project.main_inout_table ORDER BY date desc LIMIT 1)
and project.backup_table.date >= (SELECT date FROM project.main_inout_table ORDER BY date asc LIMIT 1)
SELECT *
FROM project.backup_table
WHERE concat(empid,date) not IN (SELECT concat(empid,date) FROM project.main_inout_table)
The tables are:
maintable
backuptable
My attempt:
SELECT * FROM project.backup_table
where project.backup_table.date <= (SELECT date FROM project.main_inout_table
ORDER BY date desc LIMIT 1) and project.backup_table.date >= (SELECT date FROM project.main_inout_table
ORDER BY date asc LIMIT 1) and exists (SELECT * FROM project.backup_table
WHERE concat(empid,date) not IN (SELECT concat(empid,date)
FROM project.main_inout_table));
Problem: the row with tid 4 is still present. Shouldn't it be filtered out by the second SELECT query?
The intersection would be the rows that meet both conditions. So, just bring the conditions together:
SELECT bt.*
FROM project.backup_table bt
WHERE bt.date <= (SELECT MAX(date) FROM project.main_inout_table mit) AND
bt.date >= (SELECT MIN(date) FROM project.main_inout_table mit) AND
NOT EXISTS (SELECT 1
FROM project.main_inout_table mit
WHERE mit.empid = bt.empid AND mit.date = bt.date
);
Note the following changes:
The tables are given aliases, which are abbreviations for the table names.
The columns are all qualified with the table aliases.
The first two subqueries simply use MIN() and MAX(). These could be combined into one subquery or join, but this follows your original formulation.
The last subquery uses NOT EXISTS rather than NOT IN with CONCAT(). Actually, this could also use NOT IN with tuples (something that MySQL supports, but not all databases).
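The tuple form mentioned in the last note could look roughly like the sketch below. It reuses the table and column names from the question and assumes that `empid` and `date` are never NULL in `main_inout_table`; if they can be NULL, NOT IN may filter out every row, and the NOT EXISTS version is the safer choice.

```sql
SELECT bt.*
FROM project.backup_table bt
WHERE bt.date BETWEEN (SELECT MIN(date) FROM project.main_inout_table)
                  AND (SELECT MAX(date) FROM project.main_inout_table)
  -- row constructor: compares both columns at once, no CONCAT() needed
  AND (bt.empid, bt.date) NOT IN (SELECT mit.empid, mit.date
                                  FROM project.main_inout_table mit);
```

Comparing the columns directly also avoids the false matches CONCAT() can produce when two different (empid, date) pairs happen to concatenate to the same string.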

SQL query with a major NOT IN not working

Does anyone know what's wrong with this query?
This works perfectly on its own:
SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC
So does this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
But combining them like this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' AND car NOT IN (SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC) GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
Gives errors. Any ideas?
Please try the following...
$result = mysqli_query( $con,
"SELECT *
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND car NOT IN ( SELECT car
FROM ( SELECT car,
carvotes
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND carvotes NOT LIKE '0'
AND ( time > ( NOW( ) - INTERVAL 14 DAY ) )
GROUP BY car
ORDER BY carvotes DESC
LIMIT 10 ) X
)
GROUP BY car
ORDER BY time DESC
LIMIT 30" );
The main cause of your problem is that with car NOT IN ( SELECT * FROM ( SELECT *... you are trying to compare each record's value of car with each row returned by your subquery. IN requires you to have the same number of fields on both sides of the comparison. By using SELECT * at both levels of the subquery you were ensuring that the right side of the comparison had however many fields are in data versus your single field on the left, which confused MySQL.
Since you are aiming to compare to a single field, namely car, our subquery has to select just the car field from its dataset. Since the sort order of the subquery's results has no effect upon the IN comparison, and since our innermost query will be returning just car, I have removed the outer level of the subquery.
Beyond changing the first part of the subquery to SELECT car, the only other change that I have made to the subquery is to change LIMIT 0, 10 to LIMIT 10. The former means limit to the 10 records that are offset by 0 from the first record. This is useful if you want records 6 to 15, but redundant for 1 to 10 as LIMIT 10 has the same effect and is slightly simpler. Ditto for LIMIT 0, 30 at the end of your overall statement.
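As a quick illustration of the two LIMIT forms, here is a sketch against the `data` table from the question (the ORDER BY column is chosen only for the example):

```sql
-- Records 1 to 10: both statements are equivalent
SELECT car FROM data ORDER BY carvotes DESC LIMIT 0, 10;
SELECT car FROM data ORDER BY carvotes DESC LIMIT 10;

-- Records 6 to 15: skip 5 rows, then return the next 10
SELECT car FROM data ORDER BY carvotes DESC LIMIT 5, 10;
SELECT car FROM data ORDER BY carvotes DESC LIMIT 10 OFFSET 5;
```

In the two-argument form the offset comes first and the row count second, which is the reverse of the LIMIT ... OFFSET spelling.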
As for the main body of the statement, I have not made any attempt to specify what fields (or aggregate functions of those fields) should be returned, since you have made no statement indicating what your requirements / preferences are. If you are satisfied that GROUP BY has left you with a still valid set of values, then all well and good, but if not then I recommend that you rewrite your Question to be specific about that detail.
By default, MySQL 5.7 and earlier sort the data subjected to a GROUP BY into ascending order, but if an ORDER BY clause is also present then it overrides the GROUP BY's sort pattern. As such, there is no benefit to specifying DESC after either of your GROUP BY car clauses, so I have removed it where it occurs. (MySQL 8.0 no longer sorts GROUP BY results implicitly, and from 8.0.13 it no longer accepts ASC or DESC after GROUP BY.)
Interesting Sidenote : You can override a GROUP BY's sort by specifying ORDER BY NULL.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html - on optimising your ORDER BY sorting
https://dev.mysql.com/doc/refman/5.7/en/select.html - on the SELECT statement's syntax - specifically the parts to do with LIMIT.
https://www.w3schools.com/php/php_mysql_select_limit.asp - a simpler explanation of LIMIT
This is your query:
SELECT *
FROM data
WHERE site = '".$id."' AND disabled = '0' AND
car NOT IN (SELECT *
FROM (SELECT *
FROM data
WHERE site = '".$id."' AND
disabled = '0' AND
carvotes NOT LIKE '0' AND
(time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car
ORDER BY carvotes DESC
LIMIT 0 , 10
) x
ORDER BY time DESC
)
GROUP BY car DESC
ORDER BY time desc
LIMIT 0 , 30 ;
Several comments:
Do not wrap integer constants in single quotes. This can mislead people. This can mislead optimizers.
Do not use string functions on integers (such as like). Same reason.
NOT IN with subqueries is dangerous. The construct does not handle NULL values the way you expect. Use NOT EXISTS or LEFT JOIN instead.
When using subqueries, ORDER BY is almost never appropriate.
Never use SELECT * with GROUP BY. It is just wrong. Happily, MySQL 5.7 has changed its defaults to reject this anti-pattern.
So, a better way to write this query is something like this:
SELECT d.car, MAX(d.time) AS time
FROM data d LEFT JOIN
     (SELECT d2.car
      FROM data d2
      WHERE d2.site = '".$id."' AND
            d2.disabled = 0 AND
            d2.carvotes <> 0 AND
            d2.time > (now() - INTERVAL 14 DAY)
      GROUP BY d2.car
      ORDER BY MAX(d2.carvotes) DESC
      LIMIT 0 , 10
     ) car10
     ON d.car = car10.car
WHERE d.site = '".$id."' AND d.disabled = 0 AND
      car10.car IS NULL
GROUP BY d.car
ORDER BY MAX(d.time) DESC
LIMIT 0 , 30 ;
Alternatively, use SELECT * and remove the GROUP BY in the outer query.
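The warning about NOT IN and NULL values can be seen in a small self-contained sketch. The temporary tables `wanted` and `blocked` are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo tables, not part of the original schema
CREATE TEMPORARY TABLE wanted  (car VARCHAR(20));
CREATE TEMPORARY TABLE blocked (car VARCHAR(20));
INSERT INTO wanted  VALUES ('audi'), ('bmw');
INSERT INTO blocked VALUES ('audi'), (NULL);

-- Returns no rows: 'bmw' compared with NULL is UNKNOWN,
-- so the NOT IN predicate is never TRUE for 'bmw'
SELECT w.car
FROM wanted w
WHERE w.car NOT IN (SELECT b.car FROM blocked b);

-- Returns 'bmw': the NULL row simply fails to match
SELECT w.car
FROM wanted w
WHERE NOT EXISTS (SELECT 1 FROM blocked b WHERE b.car = w.car);
```

A single NULL in the subquery result is enough to empty the NOT IN result, which is why NOT EXISTS or a LEFT JOIN ... IS NULL is usually the safer anti-join.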

SQL select aggregate values in columns

I have a table in this structure:
editor_id
rev_user
rev_year
rev_month
rev_page
edit_count
here is the sqlFiddle: http://sqlfiddle.com/#!2/8cbb1/1
I need to surface the 5 most active editors during March 2011 for example - i.e. for each rev_user - sum all of the edit_count for each rev_month and rev_year to all of the rev_pages.
Any suggestions how to do it?
UPDATE -
updated fiddle with demo data
You should be able to do it like this:
Select the total using SUM and GROUP BY, filtering by rev_year and rev_month
Order by the SUM in descending order
Limit the results to the top five items
Here is how:
SELECT * FROM (
SELECT rev_user, SUM(edit_count) AS total_edits
FROM edit_count_user_date
WHERE rev_year='2006' AND rev_month='09'
GROUP BY rev_user
) x
ORDER BY total_edits DESC
LIMIT 5
Demo on sqlfiddle.
Surely this is as straightforward as :
SELECT rev_user, SUM(edit_count) as TotalEdits
FROM edit_count_user_date
WHERE rev_month = 'March' and rev_year = '2014'
GROUP BY rev_user
ORDER BY TotalEdits DESC
LIMIT 5;
SqlFiddle here
May I also suggest using a more appropriate DATE type for the year and month storage?
Edit, re new Info
The below will return all edits for the given month for the 'highest' MonthTotal editor, and then re-group the totals by the rev_page.
SELECT e.rev_user, e.rev_page, SUM(e.edit_count) as TotalEdits
FROM edit_count_user_date e
INNER JOIN
(
SELECT rev_user, rev_year, rev_month, SUM(edit_count) AS MonthTotal
FROM edit_count_user_date
WHERE rev_month = '09' and rev_year = '2010'
GROUP BY rev_user, rev_year, rev_month
ORDER BY MonthTotal DESC
LIMIT 1
) as x
ON e.rev_user = x.rev_user AND e.rev_month = x.rev_month AND e.rev_year = x.rev_year
GROUP BY e.rev_user, e.rev_page;
SqlFiddle here - I've adjusted the data to make it more interesting.
However, if you need to do this across several months at a time, it will be more difficult on MySQL versions before 8.0, which lack PARTITION BY / analytical windowing functions.
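On MySQL 8.0 or later, window functions make the multi-month case manageable. The following is a sketch rather than a tested answer to the question: it assumes the `edit_count_user_date` table from the fiddle and returns the five most active editors for every year and month in one pass. The aliases `totals`, `ranked` and `rn` are illustrative.

```sql
-- Assumes MySQL 8.0+ for ROW_NUMBER() OVER (PARTITION BY ...)
SELECT rev_year, rev_month, rev_user, total_edits
FROM (
    SELECT totals.*,
           -- rank editors within each month by their edit total
           ROW_NUMBER() OVER (PARTITION BY rev_year, rev_month
                              ORDER BY total_edits DESC) AS rn
    FROM (
        -- one row per editor per month, summed over all pages
        SELECT rev_year, rev_month, rev_user,
               SUM(edit_count) AS total_edits
        FROM edit_count_user_date
        GROUP BY rev_year, rev_month, rev_user
    ) AS totals
) AS ranked
WHERE rn <= 5
ORDER BY rev_year, rev_month, total_edits DESC;
```

ROW_NUMBER() breaks ties arbitrarily. If editors with equal totals should all be kept, RANK() or DENSE_RANK() could be used instead.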

Catching latest column value change in SQL

How can I get the date for the latest value change in one column with one SQL query?
Possible database situation:
Date State
2012-11-25 state one
2012-11-26 state one
2012-11-27 state two
2012-11-28 state two
2012-11-29 state one
2012-11-30 state one
So result should return 2012-11-29 as latest change state. If I group by State value, I will get the date for first time I have that state in database.
The query will group the table on state and show the state and in the date field the latest date created of that state.
From the given input the output would be
Date State
2012-11-30 state one
2012-11-28 state two
This will get you the last state:
-- Query 1
SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1 ;
Encapsulating the above, we can use it to get the date just before the last change:
-- Query 2
SELECT t.date
FROM tableX AS t
JOIN
( SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1
) AS last
ON last.state <> t.state
ORDER BY t.date DESC
LIMIT 1 ;
And then use that to find the date (or the whole row) where the last change occurred:
-- Query 3
SELECT a.date -- can also be used: a.*
FROM tableX AS a
JOIN
( SELECT t.date
FROM tableX AS t
JOIN
( SELECT state
FROM tableX
ORDER BY date DESC
LIMIT 1
) AS last
ON last.state <> t.state
ORDER BY t.date DESC
LIMIT 1
) AS b
ON a.date > b.date
ORDER BY a.date
LIMIT 1 ;
Tested in SQL-Fiddle
And a solution that uses MySQL variables:
-- Query 4
SELECT date
FROM
( SELECT t.date
, @r := (@s <> state) AS result
, @s := state AS prev_state
FROM tableX AS t
CROSS JOIN
( SELECT @r := 0, @s := ''
) AS dummy
ORDER BY t.date ASC
) AS tmp
WHERE result = 1
ORDER BY date DESC
LIMIT 1 ;
I believe this is the answer:
SELECT
DISTINCT State AS State, `Date`
FROM
Table_1 t1
WHERE t1.`Date`=(SELECT MAX(`Date`) FROM Table_1 WHERE State=t1.State)
...and the test:
http://sqlfiddle.com/#!2/8b0d8/5
If you add another column, `changed` (of type DATETIME), you can fill it using an update trigger that sets it to NOW(). If you then query your table ordered by the `changed` column in descending order, the most recently changed row will end up first.
DELIMITER $$
CREATE TRIGGER `trigger` BEFORE UPDATE ON `table`
FOR EACH ROW
BEGIN
SET NEW.changed = NOW();
END$$
DELIMITER ;
Try this ::
Select
MAX(`Date`), state from mytable
group by state
If you had been using Postgres, you could compare different rows in the same table using "LEAD .. OVER". I have not managed to find the same functionality in MySQL.
A bit hairy, but I think this will do:
select min(t1.date) from table_1 t1 where
(select count(distinct state) from table_1 where table_1.date>=t1.date)=1
Basically, this asks for the earliest date from which no further changes in state are found in any later rows. Be warned: this query may scale terribly for large data sets.
I think your best choice here are analytical functions. Try this - it should be OK performance-wise:
SELECT *
FROM test
WHERE my_date = (SELECT MAX (my_date)
FROM (SELECT MY_DATE
FROM ( SELECT MY_DATE,
STATE,
LAG (state) OVER (ORDER BY MY_DATE)
lag_val
FROM test
ORDER BY MY_DATE) a
WHERE state != lag_val))
In the inner select, the LAG function gets the previous value in the STATE column and in the outer select I mark the date of a change - those with lag value different than the current state value. And outside, I'm getting the latest date from those dates of a change... I hope that this is what you needed.
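For MySQL specifically, LAG is available from version 8.0 onwards. A minimal sketch of the same idea, using the sample data from the question loaded into a temporary table named `tableX` (the name used in an earlier answer; the column types are assumptions):

```sql
-- Assumes MySQL 8.0+; sample rows copied from the question
CREATE TEMPORARY TABLE tableX (`date` DATE, state VARCHAR(20));
INSERT INTO tableX VALUES
  ('2012-11-25', 'state one'), ('2012-11-26', 'state one'),
  ('2012-11-27', 'state two'), ('2012-11-28', 'state two'),
  ('2012-11-29', 'state one'), ('2012-11-30', 'state one');

SELECT MAX(`date`) AS last_change
FROM (
    SELECT `date`, state,
           -- state of the previous row in date order (NULL for the first row)
           LAG(state) OVER (ORDER BY `date`) AS prev_state
    FROM tableX
) AS with_prev
WHERE state <> prev_state;   -- keep only rows where the state changed
```

With this sample data the changes fall on 2012-11-27 and 2012-11-29, so the query returns 2012-11-29. The first row is skipped automatically because comparing with a NULL `prev_state` is never TRUE.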
SELECT MAX(DATE) FROM YOUR_TABLE
Above answer doesn't seem to satisfy what OP needs.
UPDATED ANSWER WITH AFTER INSERT/UPDATE TRIGGER
SET @latestState = NULL;
SET @latestDate = NULL;
DELIMITER $$
CREATE TRIGGER latestInsertTrigger AFTER INSERT ON myTable
FOR EACH ROW
BEGIN
-- OLD is not available in an INSERT trigger, so record every inserted row
SET @latestState = NEW.state;
SET @latestDate = NEW.date;
END$$
CREATE TRIGGER latestUpdateTrigger AFTER UPDATE ON myTable
FOR EACH ROW
BEGIN
IF OLD.date = NEW.date AND OLD.state <> NEW.state THEN
SET @latestState = NEW.state;
SET @latestDate = NEW.date;
END IF;
END$$
DELIMITER ;
You may use the following query to get the latest record added/updated:
SELECT DATE, STATE FROM myTable
WHERE STATE = @latestState
OR DATE = @latestDate
ORDER BY DATE DESC
;
Results:
DATE STATE
November, 30 2012 00:00:00+0000 state one
November, 28 2012 00:00:00+0000 state two
November, 27 2012 00:00:00+0000 state two
The results of the above query need to be limited to 2, 3 or n rows, depending on what you need.
Frankly, it seems like you want the maximum of both columns, based on the data sample you have given, assuming that your state only increases with the date. I only wish the state were an integer :D
Then a union of two MAX subqueries on both columns would have solved it easily. Still, string manipulation with a regex can find the maximum in the state column. Finally, this approach needs LIMIT x. However, it still has a loophole. Anyway, it took me some time to figure out what you need :$

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
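On MySQL 8.0 or later, the same streak can be computed without user variables by using the "gaps and islands" technique. This is a sketch under assumptions: it uses the `beverages_log` columns from the question, and the aliases `qualifying_days`, `numbered`, `streaks` and `rn` are illustrative.

```sql
-- Assumes MySQL 8.0+ for ROW_NUMBER()
SELECT MAX(streak) AS longest_streak
FROM (
    SELECT COUNT(*) AS streak
    FROM (
        SELECT d,
               ROW_NUMBER() OVER (ORDER BY d) AS rn
        FROM (
            -- days on which the user logged the beverage at least 5 times
            SELECT DATE(`timestamp`) AS d
            FROM beverages_log
            WHERE users_id = 1
              AND beverages_id = 1
            GROUP BY DATE(`timestamp`)
            HAVING COUNT(*) >= 5
        ) AS qualifying_days
    ) AS numbered
    -- consecutive days share the same (day minus row number) value,
    -- so each distinct value identifies one streak
    GROUP BY DATE_SUB(d, INTERVAL rn DAY)
) AS streaks;
```

Unlike the user-variable version, this does not depend on the order in which the derived table's rows happen to be read, because the ordering is stated explicitly in the window definition.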
Why not include users_id in the daycounts view and group by users_id and date?
Also include users_id in view t.
Then, when you are querying against t, add the users_id to the WHERE clause.
Then you don't have to recreate your views for every single user; you just need to remember to include it in your WHERE clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID