SQL query with a major NOT IN not working - mysql

Does anyone know what's wrong with this query?
This works perfectly on its own:
SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC
So does this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
But combining them like this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' AND car NOT IN (SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC) GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
Gives errors. Any ideas?

Please try the following...
$result = mysqli_query( $con,
"SELECT *
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND car NOT IN ( SELECT car
FROM ( SELECT car,
carvotes
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND carvotes NOT LIKE '0'
AND ( time > ( NOW( ) - INTERVAL 14 DAY ) )
GROUP BY car
ORDER BY carvotes DESC
LIMIT 10 ) X
)
GROUP BY car
ORDER BY time DESC
LIMIT 30" );
The main cause of your problem is that with car NOT IN ( SELECT * FROM ( SELECT *... you are trying to compare each record's value of car with each row returned by your subquery. IN requires the same number of fields on both sides of the comparison. By using SELECT * at both levels of the subquery, you made the right side of the comparison return every field in data, while the left side has the single field car. MySQL rejects that mismatch with the error "Operand should contain 1 column(s)".
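The column-count rule can be reproduced outside MySQL. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (an assumption: SQLite words its error differently from MySQL, but it rejects the same mismatch). The two-column data table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (car TEXT, carvotes INTEGER);   -- hypothetical, cut-down table
    INSERT INTO data VALUES ('audi', 5), ('bmw', 0), ('fiat', 3);
""")

# One column on the left of NOT IN, two columns (SELECT *) on the right: rejected.
try:
    conn.execute("SELECT car FROM data WHERE car NOT IN (SELECT * FROM data)").fetchall()
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

# Selecting only car in the subquery makes both sides one column wide.
remaining = [row[0] for row in conn.execute(
    "SELECT car FROM data "
    "WHERE car NOT IN (SELECT car FROM data WHERE carvotes <> 0) "
    "ORDER BY car"
)]
```

Here remaining holds only the car whose votes are zero, because the other two are returned by the single-column subquery.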
Since you are aiming to compare to a single field, namely car, our subquery has to select just the car field from its dataset. Since the sort order of the subquery's results has no effect upon the IN comparison, and since our innermost query will be returning just car, I have removed the outer level of the subquery.
Beyond changing the first part of the subquery to SELECT car, the only other change that I have made to the subquery is to change LIMIT 0, 10 to LIMIT 10. The former means limit to the 10 records that are offset by 0 from the first record. The offset form is useful if you want records 6 to 15 (LIMIT 5, 10), but redundant for 1 to 10, as LIMIT 10 has the same effect and is slightly simpler. Ditto for LIMIT 0, 30 at the end of your overall statement.
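The equivalence of the two LIMIT forms is easy to check. This is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption: SQLite accepts the same "LIMIT offset, count" form). The numbers table is hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")  # hypothetical table holding 1..20
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 21)])

def fetch(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Offset 0 plus a count of 10 is the same as a bare count of 10.
first_ten_offset = fetch("SELECT n FROM numbers ORDER BY n LIMIT 0, 10")
first_ten_plain = fetch("SELECT n FROM numbers ORDER BY n LIMIT 10")

# A non-zero offset is where the two-argument form earns its keep: records 6 to 15.
rows_6_to_15 = fetch("SELECT n FROM numbers ORDER BY n LIMIT 5, 10")
```

Both first-ten lists contain 1 through 10, while the offset of 5 skips the first five records and returns 6 through 15.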
As for the main body of the statement, I have not made any attempt to specify what fields (or aggregate functions of those fields) should be returned, since you have not stated what your requirements or preferences are. If you are satisfied that GROUP BY has left you with a still valid set of values, then all well and good, but if not then I recommend that you rewrite your Question to be specific about that detail.
In MySQL 5.7 and earlier, a GROUP BY sorts its results into ascending order by default, but if an ORDER BY clause is also present then it overrides the GROUP BY's sort order. As such, there is no benefit to specifying DESC after either of your GROUP BY car clauses, so I have removed it where it occurs. (MySQL 8.0 dropped both the implicit sort and the GROUP BY ... DESC syntax.)
Interesting Sidenote : You can override a GROUP BY's sort by specifying ORDER BY NULL.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html - on optimising your ORDER BY sorting
https://dev.mysql.com/doc/refman/5.7/en/select.html - on the SELECT statement's syntax - specifically the parts to do with LIMIT.
https://www.w3schools.com/php/php_mysql_select_limit.asp - a simpler explanation of LIMIT

This is your query:
SELECT *
FROM data
WHERE site = '".$id."' AND disabled = '0' AND
car NOT IN (SELECT *
FROM (SELECT *
FROM data
WHERE site = '".$id."' AND
disabled = '0' AND
carvotes NOT LIKE '0' AND
(time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car
ORDER BY carvotes DESC
LIMIT 0 , 10
) x
ORDER BY time DESC
)
GROUP BY car DESC
ORDER BY time desc
LIMIT 0 , 30 ;
Several comments:
Do not wrap integer constants in single quotes. This can mislead people. This can mislead optimizers.
Do not use string functions on integers (such as LIKE). Same reason.
NOT IN with subqueries is dangerous. The construct does not handle NULL values the way you expect. Use NOT EXISTS or LEFT JOIN instead.
When using subqueries, ORDER BY is almost never appropriate.
Never use SELECT * with GROUP BY. It is just wrong. Happily, MySQL 5.7 has changed its defaults to reject this anti-pattern.
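The NULL warning in the list above is the easiest one to underestimate. Here is a minimal sketch of it, using Python's sqlite3 module as a stand-in for MySQL (an assumption: both follow standard three-valued logic for NOT IN). The cars and top_cars tables are hypothetical, with top_cars playing the role of the subquery result.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (car TEXT);
    CREATE TABLE top_cars (car TEXT);   -- hypothetical stand-in for the subquery result
    INSERT INTO cars VALUES ('audi'), ('bmw'), ('fiat');
    INSERT INTO top_cars VALUES ('audi'), (NULL);
""")

def fetch(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# A single NULL in the subquery makes NOT IN unknown for every non-matching row.
not_in = fetch(
    "SELECT car FROM cars WHERE car NOT IN (SELECT car FROM top_cars) ORDER BY car")

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = fetch(
    "SELECT c.car FROM cars c "
    "WHERE NOT EXISTS (SELECT 1 FROM top_cars t WHERE t.car = c.car) "
    "ORDER BY c.car")

# LEFT JOIN anti-join: keep the rows that found no partner.
left_join = fetch(
    "SELECT c.car FROM cars c LEFT JOIN top_cars t ON t.car = c.car "
    "WHERE t.car IS NULL ORDER BY c.car")
```

With the NULL present, not_in comes back empty, while the other two forms both return bmw and fiat.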
So, a better way to write this query is something like this:
SELECT d.car, MAX(d.time) AS time
FROM data d LEFT JOIN
(SELECT d2.car
FROM data d2
WHERE d2.site = '".$id."' AND
d2.disabled = 0 AND
d2.carvotes <> 0 AND
(d2.time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY d2.car
-- assumption: rank each car by its highest carvotes value
ORDER BY MAX(d2.carvotes) DESC
LIMIT 0 , 10
) car10
ON d.car = car10.car
WHERE d.site = '".$id."' AND d.disabled = 0 AND
car10.car IS NULL
GROUP BY d.car
ORDER BY MAX(d.time) DESC
LIMIT 0 , 30 ;
Alternatively, use SELECT * and remove the GROUP BY in the outer query.

Related

Eliminate First 14 For Each Symbol From Query

The following query pulls all rows that do not exist in a relative_strength_index table. But I also need to eliminate the first 14 rows for each symbol based on date asc from the historical_data table. I have tried several attempts to do this but am having real trouble with the 14 days. How could this issue be resolved and added into my current query?
Current Query
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date);
What you want is the first argument of the LIMIT clause, which states which row to start from, used together with ORDER BY ... ASC.
select * from historical_data hd where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date ORDER BY rsi_date ASC LIMIT 14)
Use OFFSET along with LIMIT like this. This will return a maximum of 100,000 rows, starting at row 15:
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by date asc
limit 100000 offset 14;
Because you're using LIMIT and OFFSET, you will want an ORDER BY that fixes the row order before the limit and offset are applied.
UPDATE: you mentioned "for each symbol", so try this query. It ranks the rows of each symbol by date ascending, then selects only the rows where rank >= 15:
SELECT *
FROM
(select hd.*,
CASE WHEN @previous_symbol = hd.symbol THEN @rank:=@rank+1
ELSE @rank := 1
END as rank,
@previous_symbol := hd.symbol
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.symbol, hd.date asc
)T
WHERE T.rank >= 15
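A variable-free way to express "skip the first 14 rows per symbol" is to count, for each row, how many earlier rows the same symbol has. The following is a sketch only, run against Python's sqlite3 module as a stand-in for MySQL (an assumption). It also assumes at most one row per symbol per date, and it leaves out the NOT EXISTS check against relative_strength_index to keep the example short. The sample rows are made up.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical_data (symbol TEXT, histDate TEXT)")

# Made-up sample: 16 consecutive days for AAA, only 3 days for BBB.
start = date(2014, 1, 1)
rows = [("AAA", (start + timedelta(days=i)).isoformat()) for i in range(16)]
rows += [("BBB", (start + timedelta(days=i)).isoformat()) for i in range(3)]
conn.executemany("INSERT INTO historical_data VALUES (?, ?)", rows)

SKIP = 14  # number of leading rows to drop per symbol

# Keep a row only if at least SKIP earlier rows exist for the same symbol.
kept = conn.execute(
    """
    SELECT hd.symbol, hd.histDate
    FROM historical_data hd
    WHERE (SELECT COUNT(*)
           FROM historical_data earlier
           WHERE earlier.symbol = hd.symbol
             AND earlier.histDate < hd.histDate) >= ?
    ORDER BY hd.symbol, hd.histDate
    """,
    (SKIP,),
).fetchall()
```

AAA keeps its 15th and 16th rows and BBB, having fewer than 15 rows, keeps none. The correlated count is quadratic per symbol, so it is a readability trade-off rather than a performance recommendation.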
It's not clear (to me) what resultset you want to return, or the conditions that specify whether a row should be returned.
All we have to go on is a confusingly vague description, to exclude "the first 14 rows", or "the first 14 days" for each symbol.
What we don't have is a representative sample of the data, or an example of what rows should be returned.
Without that, we don't have a way to know if we understand the description of the specification, and we don't have anything to test against or to compare our results to.
So, we are basically just guessing. (Which seems to be the most popular kind of answer provided by the "try this" enthusiasts.)
I can provide some examples of some patterns, which may suit your specification, or may not.
To get the earliest `histdate` for each `symbol`, and add 14 days to that, we can use an inline view. We can then do a semi-join to the `historical_data` data, to exclude rows that have a `histdate` before the date returned from the inline view.
(This is based on an assumption that the datatype of the `histdate` column is DATE.)
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
ORDER
BY hd.symbol
, hd.histdate
But that query doesn't include any reference to the `relative_strength_index` table. The original query includes a NOT EXISTS predicate, with a correlated subquery of the `relative_strength_index` table.
If the goal is to get the earliest `rsi_date` for each `rsi_symbol` from that table, and then add 14 days to that value...
SELECT hd.*
FROM ( SELECT rsi.rsi_symbol
, MIN(rsi.rsi_date) + INTERVAL 14 DAY AS rsi_date
FROM relative_strength_index rsi
GROUP BY rsi.rsi_symbol
) rs
JOIN historical_data hd
ON hd.symbol = rs.rsi_symbol
AND hd.histdate > rs.rsi_date
ORDER
BY hd.symbol
, hd.histdate
If the goal is to exclude rows where a matching row in relative_strength_index already exists, I would use an anti-join pattern...
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
LEFT
JOIN relative_strength_index xr
ON xr.rsi_symbol = hd.symbol
AND xr.rsi_date = hd.histdate
WHERE xr.rsi_symbol IS NULL
ORDER
BY hd.symbol
, hd.histdate
These are just example query patterns, which are likely not suited to your exact specification, since they are guesses.
It doesn't make much sense to provide more examples of other patterns, without a more detailed specification.

SQL select aggregate values in columns

I have a table in this structure:
editor_id
rev_user
rev_year
rev_month
rev_page
edit_count
here is the sqlFiddle: http://sqlfiddle.com/#!2/8cbb1/1
I need to surface the 5 most active editors during March 2011 for example - i.e. for each rev_user - sum all of the edit_count for each rev_month and rev_year to all of the rev_pages.
Any suggestions how to do it?
UPDATE -
updated fiddle with demo data
You should be able to do it like this:
Select the total using SUM and GROUP BY, filtering by rev_year and rev_month
Order by the SUM in descending order
Limit the results to the top five items
Here is how:
SELECT * FROM (
SELECT rev_user, SUM(edit_count) AS total_edits
FROM edit_count_user_date
WHERE rev_year='2006' AND rev_month='09'
GROUP BY rev_user
) x
ORDER BY total_edits DESC
LIMIT 5
Demo on sqlfiddle.
Surely this is as straightforward as :
SELECT rev_user, SUM(edit_count) as TotalEdits
FROM edit_count_user_date
WHERE rev_month = 'March' and rev_year = '2014'
GROUP BY rev_user
ORDER BY TotalEdits DESC
LIMIT 5;
SqlFiddle here
May I also suggest using a more appropriate DATE type for the year and month storage?
Edit, re new Info
The below will return all edits for the given month for the 'highest' MonthTotal editor, and then re-group the totals by the rev_page.
SELECT e.rev_user, e.rev_page, SUM(e.edit_count) as TotalEdits
FROM edit_count_user_date e
INNER JOIN
(
SELECT rev_user, rev_year, rev_month, SUM(edit_count) AS MonthTotal
FROM edit_count_user_date
WHERE rev_month = '09' and rev_year = '2010'
GROUP BY rev_user, rev_year, rev_month
ORDER BY MonthTotal DESC
LIMIT 1
) as x
ON e.rev_user = x.rev_user AND e.rev_month = x.rev_month AND e.rev_year = x.rev_year
GROUP BY e.rev_user, e.rev_page;
SqlFiddle here - I've adjusted the data to make it more interesting.
However, if you need to do this across several months at a time, it will be more difficult given MySql's lack of partition by / analytical windowing functions.

Mysql time difference UNIX_TIMESTAMP() - UNIX_TIMESTAMP(timestamp)

My current sql statement is like follows..
SELECT * FROM `Main`
WHERE username = '$username'
AND fromsite = '$website'
ORDER BY `votes`.`timestamp` DESC
LIMIT 0 , 1
However, the timestamp column shows a timestamp like "2014-03-19 12:00:43". Following another question, I had an answer along the lines of using...
SELECT UNIX_TIMESTAMP() - UNIX_TIMESTAMP(timestamp) AS seconds_ago
However, I'm not great with MySQL and still don't see how I can keep the original SELECT statement and work in a function like the one above, so that it converts the timestamp column and shows it as seconds ago instead.
Just ditch the * and list the columns and expressions you want returned.
SELECT m.username
, m.fromsite
, m.timestamp
, UNIX_TIMESTAMP()-UNIX_TIMESTAMP(m.timestamp) AS seconds_ago
, m.votes
FROM `Main` m
WHERE m.username = '$username'
AND m.fromsite = '$website'
ORDER BY m.votes, m.timestamp DESC
LIMIT 0,1

need the work around with the mysql query

Folks,
when I run the query below, I get an error for invalid use of a group function:
SELECT `Margin`.`destination`,
ROUND(sum(duration),2) as total_duration,
sum(calls) as total_calls
FROM `ilax`.`margins` AS `Margin`
WHERE `date1` = '2013-08-30' and `destination` like "af%"
AND ROUND(sum(duration),2) like "3%"
group by `destination`
ORDER BY duration Asc LIMIT 0, 20;
Please let me know the workaround.
The WHERE clause is evaluated before grouping takes place, so SUM() cannot be used therein; use the HAVING clause instead, which is evaluated after grouping:
SELECT destination,
ROUND(SUM(duration), 2) AS total_duration,
SUM(calls) AS total_calls
FROM ilax.margins
WHERE date1 = '2013-08-30'
AND destination LIKE 'af%'
GROUP BY destination
HAVING total_duration LIKE '3%'
ORDER BY total_duration ASC
LIMIT 0, 20
Note also that one really ought to use numeric comparison operations for numeric values, rather than string pattern matching. For example:
HAVING total_duration >= 3000 AND total_duration < 4000
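The WHERE versus HAVING distinction can be seen end to end in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption: the error text differs, but an aggregate in WHERE is rejected by both). The margins rows are made up.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE margins (destination TEXT, duration REAL, calls INTEGER);
    INSERT INTO margins VALUES
        ('afghanistan', 1500.0, 10),
        ('afghanistan', 2000.0, 5),
        ('africa-other', 100.0, 1);
""")

# WHERE filters individual rows before grouping, so an aggregate there is an error.
try:
    conn.execute(
        "SELECT destination FROM margins "
        "WHERE SUM(duration) >= 3000 GROUP BY destination"
    ).fetchall()
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# HAVING filters whole groups after aggregation, using a numeric comparison.
totals = conn.execute(
    """
    SELECT destination,
           ROUND(SUM(duration), 2) AS total_duration,
           SUM(calls) AS total_calls
    FROM margins
    WHERE destination LIKE 'af%'
    GROUP BY destination
    HAVING SUM(duration) >= 3000 AND SUM(duration) < 4000
    ORDER BY total_duration
    """
).fetchall()
```

Only the afghanistan group survives: its two rows sum to 3500, while africa-other totals 100 and is filtered out by HAVING.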

MySql sort within sorted results set

I have the following query, which queries a table of sports results for the last 20 matches that involved a team, returning the goals conceded in each of these matches.
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER BY `against`, `date` DESC
LIMIT 0 , 20
I want to sort the results by goals conceded and then, within each group of goals conceded, by date. So, for example:
the first 4 results where goals conceded=1 in date order
then the next 3 might be results where conceded=2 in date order
I have tried ORDER by date,against - this gives me a strict date order
I have tried ORDER by against,date - this gives me matches beyond the last 20
Is it possible to do what I want to do?
Thanks everyone, I found this worked. This solution was posted by another user but then was removed, not sure why?
SELECT * FROM (
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER by `goalsF`
LIMIT 0 , 20
) res
ORDER BY `date` DESC
If you want to limit by date, add the date range you are looking for into your WHERE clause and then order by the number of goals conceded.
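One reading of the question is "take the most recent N matches first, then sort only those by goals conceded and date". Under that reading, the limit has to be applied in an inner query ordered by date, and the final sort in the outer query. The following is a minimal sketch of that pattern, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The single-table matches schema, the match_date column, and the value of LAST_N are hypothetical simplifications of the original three-table join.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (match_date TEXT, against INTEGER);  -- hypothetical schema
    INSERT INTO matches VALUES
        ('2014-01-01', 0),
        ('2014-01-02', 2),
        ('2014-01-03', 1),
        ('2014-01-04', 2),
        ('2014-01-05', 1);
""")

LAST_N = 3  # stands in for the 20 matches of the original question

# Sorting by goals first and then limiting lets old matches leak in.
naive = conn.execute(
    "SELECT match_date, against FROM matches "
    "ORDER BY against, match_date DESC LIMIT ?",
    (LAST_N,),
).fetchall()

# Inner query picks the latest N matches; outer query re-sorts just those rows.
nested = conn.execute(
    """
    SELECT match_date, against
    FROM (SELECT match_date, against
          FROM matches
          ORDER BY match_date DESC
          LIMIT ?) AS recent
    ORDER BY against, match_date DESC
    """,
    (LAST_N,),
).fetchall()
```

The naive form returns the match from 2014-01-01, which is not among the latest three, whereas the nested form returns only the three most recent matches, grouped by goals conceded.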