I have a table in this structure:
editor_id
rev_user
rev_year
rev_month
rev_page
edit_count
here is the sqlFiddle: http://sqlfiddle.com/#!2/8cbb1/1
I need to find the 5 most active editors during a given month, for example March 2011; i.e. for each rev_user, sum edit_count across all rev_pages for that rev_month and rev_year.
Any suggestions how to do it?
UPDATE -
updated fiddle with demo data
You should be able to do it like this:
Select the total using SUM and GROUP BY, filtering by rev_year and rev_month
Order by the SUM in descending order
Limit the results to the top five items
Here is how:
SELECT * FROM (
SELECT rev_user, SUM(edit_count) AS total_edits
FROM edit_count_user_date
WHERE rev_year = '2006' AND rev_month = '09'
GROUP BY rev_user
) x
ORDER BY total_edits DESC
LIMIT 5
Demo on sqlfiddle.
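The three steps above (filter and SUM with GROUP BY, order by the sum, LIMIT 5) can be tried locally. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the rows are invented demo data, and the editor_id column from the question is left out for brevity.

```python
import sqlite3

# Invented demo rows; table and column names follow the question.
rows = [
    ("alice", "2011", "03", "PageA", 4),
    ("alice", "2011", "03", "PageB", 6),
    ("bob",   "2011", "03", "PageA", 7),
    ("carol", "2011", "03", "PageC", 1),
    ("bob",   "2011", "04", "PageA", 50),  # other month, must be ignored
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edit_count_user_date ("
    "rev_user TEXT, rev_year TEXT, rev_month TEXT, "
    "rev_page TEXT, edit_count INTEGER)"
)
conn.executemany(
    "INSERT INTO edit_count_user_date VALUES (?, ?, ?, ?, ?)", rows
)

# Sum edits per user for one month, largest totals first, top five only.
top_editors = conn.execute(
    """
    SELECT rev_user, SUM(edit_count) AS total_edits
    FROM edit_count_user_date
    WHERE rev_year = ? AND rev_month = ?
    GROUP BY rev_user
    ORDER BY total_edits DESC
    LIMIT 5
    """,
    ("2011", "03"),
).fetchall()

print(top_editors)
```

Passing the year and month as bound parameters keeps the query reusable for any month.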
Surely this is as straightforward as :
SELECT rev_user, SUM(edit_count) as TotalEdits
FROM edit_count_user_date
WHERE rev_month = 'March' and rev_year = '2014'
GROUP BY rev_user
ORDER BY TotalEdits DESC
LIMIT 5;
SqlFiddle here
May I also suggest using a more appropriate DATE type for the year and month storage?
Edit, re new Info
The below will return all edits for the given month for the 'highest' MonthTotal editor, and then re-group the totals by the rev_page.
SELECT e.rev_user, e.rev_page, SUM(e.edit_count) as TotalEdits
FROM edit_count_user_date e
INNER JOIN
(
SELECT rev_user, rev_year, rev_month, SUM(edit_count) AS MonthTotal
FROM edit_count_user_date
WHERE rev_month = '09' and rev_year = '2010'
GROUP BY rev_user, rev_year, rev_month
ORDER BY MonthTotal DESC
LIMIT 1
) as x
ON e.rev_user = x.rev_user AND e.rev_month = x.rev_month AND e.rev_year = x.rev_year
GROUP BY e.rev_user, e.rev_page;
SqlFiddle here - I've adjusted the data to make it more interesting.
However, if you need to do this across several months at a time, it will be more difficult, given that MySQL versions before 8.0 lack PARTITION BY / analytic window functions.
Hi, I am using MySQL with SUM() OVER (PARTITION BY ...).
I want the values to accumulate from one row to the next (a running total),
but my result just shows the same total on every row.
I'm using the following query:
select dea.location, sum(cast(vac.new_vaccinations as signed)) over (partition by dea.location order by dea.location)
From pr.CovidDeaths_csv dea
join pr.CovidVaccinations_csv vac
on dea.location = vac.location
and dea.date = vac.date
where dea.continent is not null
order by 2;
Does anyone know about this problem?
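You're missing the frame specification for the window function. With ORDER BY on the partition column alone, all rows in a partition are peers, so the default frame returns the same total for each of them. An explicit ROWS frame, ideally ordered by a column such as the date, gives a cumulative sum instead of a static sum:
select dea.location,
sum(cast(vac.new_vaccinations as signed))
over(partition by dea.location
order by dea.date ROWS UNBOUNDED PRECEDING)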
From pr.CovidDeaths_csv dea
join pr.CovidVaccinations_csv vac
on dea.location = vac.location
and dea.date = vac.date
where dea.continent is not null
order by 2;
As you've not shared your data from all your tables, I cannot replicate your case, but you can see an analogous pattern on sample data here.
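The difference between the static and the cumulative sum can be reproduced on a few invented rows. The sketch below assumes Python's sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for MySQL 8.0, and a single hypothetical table vac instead of the two joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vac (location TEXT, date TEXT, new_vaccinations INTEGER)"
)
conn.executemany(
    "INSERT INTO vac VALUES (?, ?, ?)",
    [
        ("A", "2021-01-01", 10),
        ("A", "2021-01-02", 20),
        ("A", "2021-01-03", 5),
        ("B", "2021-01-01", 1),
        ("B", "2021-01-02", 2),
    ],
)

# static_total: ordering by the partition key makes every row a peer,
# so the default frame covers the whole partition.
# running_total: an explicit ROWS frame ordered by date accumulates.
totals = conn.execute(
    """
    SELECT location, date,
           SUM(new_vaccinations) OVER (
               PARTITION BY location ORDER BY location
           ) AS static_total,
           SUM(new_vaccinations) OVER (
               PARTITION BY location ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM vac
    ORDER BY location, date
    """
).fetchall()

for row in totals:
    print(row)
```

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the long form of ROWS UNBOUNDED PRECEDING.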
Does anyone know what's wrong with this query?
This works perfectly on its own:
SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC
So does this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
But combining them like this:
SELECT * FROM data WHERE site = '".$id."' AND disabled = '0' AND car NOT IN (SELECT * FROM
(SELECT * FROM data WHERE site = '".$id."'
AND disabled = '0'
AND carvotes NOT LIKE '0'
AND (time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car ORDER BY carvotes DESC LIMIT 0 , 10)
X order by time DESC) GROUP BY car DESC ORDER BY time desc LIMIT 0 , 30
Gives errors. Any ideas?
Please try the following...
$result = mysqli_query( $con,
"SELECT *
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND car NOT IN ( SELECT car
FROM ( SELECT car,
carvotes
FROM data
WHERE site = '" . $id .
"' AND disabled = '0'
AND carvotes NOT LIKE '0'
AND ( time > ( NOW( ) - INTERVAL 14 DAY ) )
GROUP BY car
ORDER BY carvotes DESC
LIMIT 10 ) X
)
GROUP BY car
ORDER BY time DESC
LIMIT 30" );
The main cause of your problem is that with car NOT IN ( SELECT * FROM ( SELECT *... you are trying to compare each record's value of car with each row returned by your subquery. IN requires you to have the same number of fields on both sides of the comparison. By using SELECT * at both levels of the subquery you were ensuring that the right side of the comparison had however many fields are in data versus your single field on the left, which confused MySQL.
Since you are aiming to compare to a single field, namely car, our subquery has to select just the car field from its dataset. Since the sort order of the subquery's results has no effect upon the IN comparison, and since our innermost query will be returning just car, I have removed the outer level of the subquery.
Beyond changing the first part of the subquery to SELECT car, the only other change that I have made to the subquery is to change LIMIT 0, 10 to LIMIT 10. The former means limit to the 10 records that are offset by 0 from the first record. This is useful if you want records 6 to 15, but redundant for 1 to 10, as LIMIT 10 has the same effect and is slightly simpler. Ditto for LIMIT 0, 30 at the end of your overall statement.
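The equivalence of LIMIT 0, 10 and LIMIT 10 is easy to check on a throwaway table. This sketch assumes SQLite (via Python's sqlite3 module), which accepts the same "LIMIT offset, count" form as MySQL; the table t and its column n are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 21)])


def ns(sql):
    """Run a query and return the first column as a plain list."""
    return [row[0] for row in conn.execute(sql)]


# Offset 0 plus count 10 is the same as count 10 alone.
first_ten_a = ns("SELECT n FROM t ORDER BY n LIMIT 0, 10")
first_ten_b = ns("SELECT n FROM t ORDER BY n LIMIT 10")

# Offset 5 plus count 10 skips five rows and returns records 6 to 15.
six_to_fifteen = ns("SELECT n FROM t ORDER BY n LIMIT 5, 10")

print(first_ten_a == first_ten_b, six_to_fifteen)
```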
As for the main body of the statement, I have not made any attempt to specify what fields (or aggregate functions of those fields) should be returned, since you have not stated what your requirements or preferences are. If you are satisfied that GROUP BY has left you with a still valid set of values, then all is good, but if not then I recommend that you rewrite your Question to be specific about that detail.
In MySQL 5.7 and earlier, GROUP BY sorts the grouped data into ascending order by default, but if an ORDER BY clause is also present then it overrides the GROUP BY's sort pattern. As such, there is no benefit to specifying DESC after either of your GROUP BY car clauses, so I have removed it where it occurs. (MySQL 8.0 no longer sorts implicitly and has dropped the GROUP BY ... DESC syntax.)
Interesting Sidenote : You can override a GROUP BY's sort by specifying ORDER BY NULL.
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html - on optimising your ORDER BY sorting
https://dev.mysql.com/doc/refman/5.7/en/select.html - on the SELECT statement's syntax - specifically the parts to do with LIMIT.
https://www.w3schools.com/php/php_mysql_select_limit.asp - a simpler explanation of LIMIT
This is your query:
SELECT *
FROM data
WHERE site = '".$id."' AND disabled = '0' AND
car NOT IN (SELECT *
FROM (SELECT *
FROM data
WHERE site = '".$id."' AND
disabled = '0' AND
carvotes NOT LIKE '0' AND
(time > ( now( ) - INTERVAL 14 DAY ))
GROUP BY car
ORDER BY carvotes DESC
LIMIT 0 , 10
) x
ORDER BY time DESC
)
GROUP BY car DESC
ORDER BY time desc
LIMIT 0 , 30 ;
Several comments:
Do not wrap integer constants in single quotes. This can mislead people. This can mislead optimizers.
Do not use string functions on integers (such as like). Same reason.
NOT IN with subqueries is dangerous. The construct does not handle NULL values the way you expect. Use NOT EXISTS or LEFT JOIN instead.
When using subqueries, ORDER BY is almost never appropriate.
Never use SELECT * with GROUP BY. It is just wrong. Happily, MySQL 5.7 changed its defaults to reject this anti-pattern.
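The NULL pitfall of NOT IN mentioned in the list above can be seen on two tiny tables. This is a sketch using Python's sqlite3 module as a stand-in for MySQL; the tables cars and excluded are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cars (car TEXT);
    CREATE TABLE excluded (car TEXT);
    INSERT INTO cars VALUES ('audi'), ('bmw'), ('fiat');
    INSERT INTO excluded VALUES ('bmw'), (NULL);
    """
)

# One NULL in the subquery makes every NOT IN comparison unknown,
# so no row passes the filter.
not_in = conn.execute(
    "SELECT car FROM cars "
    "WHERE car NOT IN (SELECT car FROM excluded) ORDER BY car"
).fetchall()

# NOT EXISTS ignores the NULL row because it never matches.
not_exists = conn.execute(
    "SELECT c.car FROM cars c "
    "WHERE NOT EXISTS (SELECT 1 FROM excluded e WHERE e.car = c.car) "
    "ORDER BY c.car"
).fetchall()

# Anti-join: keep the rows that found no partner in the LEFT JOIN.
left_join = conn.execute(
    "SELECT c.car FROM cars c "
    "LEFT JOIN excluded e ON e.car = c.car "
    "WHERE e.car IS NULL ORDER BY c.car"
).fetchall()

print(not_in, not_exists, left_join)
```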
So, a better way to write this query is something like this:
SELECT d.car, MAX(d.time) AS time
FROM data d LEFT JOIN
(SELECT d2.car
FROM data d2
WHERE d2.site = '".$id."' AND
d2.disabled = 0 AND
d2.carvotes <> 0 AND
d2.time > (now() - INTERVAL 14 DAY)
GROUP BY d2.car
ORDER BY MAX(d2.carvotes) DESC
LIMIT 10
) car10
ON d.car = car10.car
WHERE d.site = '".$id."' AND d.disabled = 0 AND
car10.car IS NULL
GROUP BY d.car
ORDER BY MAX(d.time) DESC
LIMIT 30;
Alternatively, use SELECT * and remove the GROUP BY in the outer query.
The following query pulls all rows that do not exist in a relative_strength_index table. But I also need to eliminate the first 14 rows for each symbol based on date asc from the historical_data table. I have tried several attempts to do this but am having real trouble with the 14 days. How could this issue be resolved and added into my current query?
Current Query
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date);
What you want is the first argument of the LIMIT clause, which states which row to start from, combined with ORDER BY ... ASC.
select * from historical_data hd where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date ORDER BY rsi_date ASC LIMIT 14)
Use OFFSET along with LIMIT like this. It returns a maximum of 100,000 rows, starting at row 15:
select *
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.histDate asc
limit 100000 offset 14;
Because the result of LIMIT and OFFSET depends on row order, make sure to specify an ORDER BY before them.
UPDATE: you mentioned "for each symbol", so try this query. It ranks the rows of each symbol by date ascending, then selects only the rows where the rank is >= 15:
SELECT *
FROM
(select hd.*,
CASE WHEN @previous_symbol = hd.symbol THEN @rank := @rank + 1
ELSE @rank := 1
END as `rank`,
@previous_symbol := hd.symbol
from historical_data hd
where not exists (select rsi_symbol, rsi_date from relative_strength_index where hd.symbol = rsi_symbol and hd.histDate = rsi_date)
order by hd.symbol, hd.histDate asc
)T
WHERE T.`rank` >= 15
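Where window functions are available (MySQL 8.0 or later), the per-symbol ranking above can be written with ROW_NUMBER() instead of user variables. This is a swapped-in technique, not the answer's own query. The sketch assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in, invented rows, a skip count of 2 instead of 14 to keep the data small, and no NOT EXISTS filter.

```python
import sqlite3

SKIP = 2  # the question skips 14; 2 keeps the demo data short

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE historical_data (symbol TEXT, histDate TEXT)")
conn.executemany(
    "INSERT INTO historical_data VALUES (?, ?)",
    [("AAA", "2020-01-0%d" % d) for d in range(1, 5)]
    + [("BBB", "2020-01-0%d" % d) for d in range(1, 4)],
)

# Number the rows of each symbol by date, then drop the first SKIP rows.
remaining = conn.execute(
    """
    SELECT symbol, histDate
    FROM (
        SELECT symbol, histDate,
               ROW_NUMBER() OVER (
                   PARTITION BY symbol ORDER BY histDate
               ) AS rn
        FROM historical_data
    ) AS ranked
    WHERE rn > ?
    ORDER BY symbol, histDate
    """,
    (SKIP,),
).fetchall()

print(remaining)
```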
It's not clear (to me) what resultset you want to return, or the conditions that specify whether a row should be returned.
All we have to go on is a confusingly vague description, to exclude "the first 14 rows", or "the first 14 days" for each symbol.
What we don't have is a representative sample of the data, or an example of what rows should be returned.
Without that, we don't have a way to know if we understand the description of the specification, and we don't have anything to test against or to compare our results to.
So, we are basically just guessing. (Which seems to be the most popular kind of answer provided by the "try this" enthusiasts.)
I can provide some examples of some patterns, which may suit your specification, or may not.
To get the earliest `histdate` for each `symbol`, and add 14 days to that, we can use an inline view. We can then do a semi-join to the `historical_data` data, to exclude rows that have a `histdate` before the date returned from the inline view.
(This is based on an assumption that the datatype of the `histdate` column is DATE.)
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
ORDER
BY hd.symbol
, hd.histdate
But that query doesn't include any reference to the `relative_strength_index` table. The original query includes a NOT EXISTS predicate, with a correlated subquery of the `relative_strength_index` table.
If the goal is get the earliest `rsi_date` for each `rsi_symbol` from that table, and then add 14 days to that value...
SELECT hd.*
FROM ( SELECT rsi.rsi_symbol
, MIN(rsi.rsi_date) + INTERVAL 14 DAY AS rsi_date
FROM relative_strength_index rsi
GROUP BY rsi.rsi_symbol
) rs
JOIN historical_data hd
ON hd.symbol = rs.rsi_symbol
AND hd.histdate > rs.rsi_date
ORDER
BY hd.symbol
, hd.histdate
If the goal is to exclude rows where a matching row in relative_strength_index already exists, I would use an anti-join pattern...
SELECT hd.*
FROM ( SELECT d.symbol
, MIN(d.histdate) + INTERVAL 14 DAY AS histdate
FROM historical_data d
GROUP BY d.symbol
) dd
JOIN historical_data hd
ON hd.symbol = dd.symbol
AND hd.histdate > dd.histdate
LEFT
JOIN relative_strength_index xr
ON xr.rsi_symbol = hd.symbol
AND xr.rsi_date = hd.histdate
WHERE xr.rsi_symbol IS NULL
ORDER
BY hd.symbol
, hd.histdate
These are just example query patterns, which are likely not suited to your exact specification, since they are guesses.
It doesn't make much sense to provide more examples of other patterns, without a more detailed specification.
I have a data set that simulates the rate of return for a trading account. There is an entry for each day showing the balance and the open equity. I want to calculate the yearly, quarterly, or monthly change and the percent gain or loss. I have this working for daily data, but for some reason I can't seem to get it to work for yearly data.
The code for daily data follows:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM tim_account_history p
CROSS JOIN
(SELECT @pequity:= NULL
FROM tim_account_history
ORDER by `Date` LIMIT 1) as a
ORDER BY `Date`) as b
ORDER by `Date` ASC
Grouping by YEAR(Date) doesn't seem to make the desired difference. I have tried everything I can think of, but it still seems to return daily rate of change even if you group by month or year, etc. I think I'm not using windowing correctly, but I can't seem to figure it out. If anyone knows of a good book about this sort of query I'd appreciate that also.
Thanks.
sqlfiddle example
Using what Lolo contributed, I have added some code so the data comes from the last day of the year, instead of the first. I also just need the Open_Equity, not the sum.
I'm still not certain I understand why this works, but it does give me what I was looking for. Using another select statement as a from seems to be the key here; I don't think I would have come up with this without Lolo's help. Thank you.
SELECT b.`yyyy`, b.Open_Equity,
concat('$',round(delta, 2)) as delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT (EXTRACT(YEAR FROM `Date`)) as `yyyy`,
(SUBSTRING_INDEX(GROUP_CONCAT(CAST(`Open_Equity` AS CHAR) ORDER BY `Date` DESC), ',', 1 )) AS `Open_Equity`
FROM tim_account_history GROUP BY `yyyy` ORDER BY `yyyy` DESC) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `yyyy` ) as b
ORDER by `yyyy` ASC
Try this:
SELECT b.`Date`, b.Open_Equity, delta,
concat(round(delta_p*100,4),'%') as delta_p
FROM (SELECT *,
(Open_Equity - @pequity) as delta,
(Open_Equity - @pequity)/@pequity as delta_p,
(@pequity:= Open_Equity)
FROM (SELECT YEAR(`Date`) `Date`, SUM(Open_Equity) Open_Equity FROM tim_account_history GROUP BY YEAR(`Date`)) p
CROSS JOIN
(SELECT @pequity:= NULL) as a
ORDER BY `Date` ) as b
ORDER by `Date` ASC
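For comparison, the year-over-year change can also be computed with LAG() where window functions exist (MySQL 8.0 or later). This is a swapped-in alternative to the user-variable approach, not the method used in the answers. The sketch assumes Python's sqlite3 module (SQLite 3.25 or later) as a stand-in, invented balances, ISO-formatted date strings, and the last entry of each year as that year's value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tim_account_history (Date TEXT, Open_Equity REAL)")
conn.executemany(
    "INSERT INTO tim_account_history VALUES (?, ?)",
    [
        ("2019-06-30", 90.0),
        ("2019-12-31", 100.0),
        ("2020-03-31", 105.0),
        ("2020-12-31", 110.0),
        ("2021-12-31", 99.0),
    ],
)

# year_end keeps only the latest row of each year; LAG() then fetches
# the previous year's value so delta and delta_p can be derived.
yearly = conn.execute(
    """
    WITH year_end AS (
        SELECT strftime('%Y', h.Date) AS yyyy, h.Open_Equity
        FROM tim_account_history h
        WHERE h.Date = (
            SELECT MAX(h2.Date)
            FROM tim_account_history h2
            WHERE strftime('%Y', h2.Date) = strftime('%Y', h.Date)
        )
    )
    SELECT yyyy, Open_Equity,
           Open_Equity - prev AS delta,
           (Open_Equity - prev) / prev AS delta_p
    FROM (
        SELECT yyyy, Open_Equity,
               LAG(Open_Equity) OVER (ORDER BY yyyy) AS prev
        FROM year_end
    ) AS with_prev
    ORDER BY yyyy
    """
).fetchall()

for row in yearly:
    print(row)
```

The first year has no predecessor, so its delta and delta_p come back as NULL (None in Python).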
I have the following query, which searches a table of sports results for the last 20 matches that involved a team and returns the goals conceded in each of these matches.
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER BY `against`, `date` DESC
LIMIT 0 , 20
I want to sort the results by goals conceded, and then by date within each group of goals conceded. For example:
the first 4 results where goals conceded=1 in date order
then the next 3 might be results where conceded=2 in date order
I have tried ORDER by date,against - this gives me a strict date order
I have tried ORDER by against,date - this gives me matches beyond the last 20
Is it possible to do what I want to do?
Thanks everyone, I found this worked. This solution was posted by another user but then was removed, not sure why?
SELECT * FROM (
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER by `goalsF`
LIMIT 0 , 20
) res
ORDER BY `date` DESC
If you want to limit by date, add the date range you are looking for into your WHERE clause and then order by the number of goals conceded.
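If the intent is "take the most recent N matches, then sort only those by goals conceded and date", one possible shape is a derived table that picks the rows by date, with the final sort applied outside it. This is a sketch of that reading only. It assumes Python's sqlite3 module as a stand-in for MySQL, a simplified hypothetical table results, invented rows, and N = 3 instead of 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (match_date TEXT, against INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [
        ("2024-01-01", 0),  # oldest rows fall outside the last three
        ("2024-01-08", 2),
        ("2024-01-15", 1),
        ("2024-01-22", 2),
        ("2024-01-29", 1),
    ],
)

# Inner query: choose the three most recent matches.
# Outer query: re-sort just those by goals conceded, then by date.
recent_sorted = conn.execute(
    """
    SELECT match_date, against
    FROM (
        SELECT match_date, against
        FROM results
        ORDER BY match_date DESC
        LIMIT 3
    ) AS recent
    ORDER BY against, match_date
    """
).fetchall()

print(recent_sorted)
```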