I have a table that stores the date-wise quiz scores of different users. I want to load the top 5 scorers for every date.
Table sample create statement:
CREATE TABLE `subscriber_score` (
`msisdn` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`date` date NOT NULL,
`score` int(11) NOT NULL DEFAULT '0',
`total_questions_sent` int(11) NOT NULL DEFAULT '0',
`total_correct_answers` int(11) NOT NULL DEFAULT '0',
`total_wrong_answers` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`msisdn`,`date`),
KEY `fk_subscriber_score_subscriber1` (`msisdn`),
CONSTRAINT `fk_subscriber_score_subscriber1` FOREIGN KEY (`msisdn`) REFERENCES `subscriber` (`msisdn`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query which I have tried:
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp2.score AS score,tmp2.date AS winning_date
FROM subscriber,
(SELECT msisdn,tmp.date,tmp.score
FROM subscriber_score,
(SELECT date,MAX(score) AS score
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10' GROUP BY date)
tmp
WHERE subscriber_score.date=tmp.date AND subscriber_score.score=tmp.score)
tmp2
WHERE subscriber.msisdn=tmp2.msisdn ORDER BY winning_date
Actual output: only one top scorer is shown for every date.
Wanted output: the top 5 (or, say, 10) records for every date.
I think you can do this using variables to assign each row a row number, then filter the top 5 for each date.
SELECT s.name AS name,
s.gender AS gender,
s.msisdn,
ss.winning_date,
ss.score
FROM ( SELECT ss.msisdn,
ss.score,
@r:= CASE WHEN ss.date = @d THEN @r + 1 ELSE 1 END AS RowNum,
@d:= ss.date AS winning_date
FROM subscriber_score AS ss
CROSS JOIN (SELECT @d:= '', @r:= 0) AS v
WHERE ss.date > '2014-10-10'
AND ss.date < '2014-11-10'
ORDER BY ss.date, ss.score DESC
) AS ss
INNER JOIN subscriber AS s
ON s.msisdn = ss.msisdn
WHERE ss.RowNum <= 5;
Example on SQL Fiddle
Refer to this query. It is not complete, but I hope it helps:
SELECT SCORE
FROM table
WHERE date='somedate'
ORDER BY SCORE DESC LIMIT 5
select bc.msisdn msisdn,bc.name name,bc.gender gender,ab.score score,ab.date winning_date
from (
select msisdn,date,score,
dense_rank() over (partition by date order by score desc) rnk
from subscriber_score
) ab,subscriber bc
where bc.msisdn=ab.msisdn and ab.rnk<=5
order by winning_date ;
This is how you can solve your problem in Oracle SQL.
Try the query below:
SELECT subscriber.msisdn AS msisdn,subscriber.name AS name,subscriber.gender AS gender,tmp.score AS score,tmp.date AS winning_date
FROM subscriber inner join
(select msisdn,date, score, ROW_NUMBER() OVER(PARTITION BY date ORDER BY score DESC) AS row_num
FROM subscriber_score
WHERE date > '2014-10-10' AND date < '2014-11-10')
tmp
on subscriber.msisdn=tmp.msisdn and tmp.row_num<=5
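The window-function answers above number the rows within each date and keep the first few. Here is a minimal sketch of that idea, run against SQLite through Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained; it needs SQLite 3.25 or newer, and in MySQL the same SQL needs version 8.0 or newer). The sample rows and the TOP_N constant are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of subscriber_score: only the columns the ranking needs.
conn.execute("CREATE TABLE subscriber_score (msisdn TEXT, date TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO subscriber_score VALUES (?, ?, ?)",
    [
        ("a", "2014-10-11", 50), ("b", "2014-10-11", 70), ("c", "2014-10-11", 60),
        ("a", "2014-10-12", 10), ("b", "2014-10-12", 30), ("c", "2014-10-12", 20),
    ],
)

TOP_N = 2  # hypothetical; the question asks for 5, 2 keeps the sample small

top_rows = conn.execute(
    """
    SELECT date, msisdn, score
    FROM (
        SELECT date, msisdn, score,
               -- restart the numbering at 1 for every date, best score first
               ROW_NUMBER() OVER (PARTITION BY date ORDER BY score DESC) AS rn
        FROM subscriber_score
    ) AS ranked
    WHERE rn <= ?
    ORDER BY date, score DESC
    """,
    (TOP_N,),
).fetchall()

for row in top_rows:
    print(row)
```

ROW_NUMBER() cuts ties arbitrarily at the boundary; DENSE_RANK(), as in the Oracle answer, keeps every row that shares a top-N score, so it can return more than N rows per date.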
Tables
CREATE TABLE `aircrafts_in` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city_from` int(11) NOT NULL COMMENT 'From',
`city_to` int(11) NOT NULL COMMENT 'To',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=utf8 COMMENT='Aircraft by route'
CREATE TABLE `aircrafts_in_parsed_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`price` int(11) NOT NULL COMMENT 'Price',
`airline` varchar(255) NOT NULL COMMENT 'Airline',
`date` date NOT NULL COMMENT 'Departure date',
`info_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `info_id` (`info_id`),
KEY `price` (`price`),
KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=940682 DEFAULT CHARSET=utf8
date - departure date
CREATE TABLE `aircrafts_in_parsed_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('success','error') DEFAULT NULL,
`type` enum('roundtrip','oneway') NOT NULL,
`date` datetime NOT NULL COMMENT 'Parsing date',
`aircrafts_in_id` int(11) DEFAULT NULL COMMENT 'Route ID',
PRIMARY KEY (`id`),
KEY `aircrafts_in_id` (`aircrafts_in_id`)
) ENGINE=InnoDB AUTO_INCREMENT=577759 DEFAULT CHARSET=utf8
date - creation date (when the data was parsed)
Task
Get the lowest ticket price and its departure date for each month. Be aware that the minimum price must be a relevant (up-to-date) one, not just the overall minimum. If several dates share the minimum price, we need the first one.
My solution
I think there is something not quite right with it.
I don't like using subqueries for grouping. How can I solve this problem?
select *
from (
select * from (
select airline,
price,
pdata.`date` as `date`
from aircrafts_in_parsed_data `pdata`
inner join aircrafts_in_parsed_info `pinfo`
on pdata.`info_id` = pinfo.`id`
where pinfo.`aircrafts_in_id` = {$id}
and pinfo.status = 'success'
and pinfo.`type` = 'roundtrip'
and `price` <> 0
group by pdata.`date`, year(pinfo.`date`) desc, month(pinfo.`date`) desc, day(pinfo.`date`) desc
) base
group by `date`
order by price, year(`date`) desc, month(`date`) desc, day(`date`) asc
) minpriceperdate
group by year(`date`) desc, month(`date`) desc
It takes 0.015 s without the cache; the table sizes can be seen from the AUTO_INCREMENT values.
SELECT MIN(price) AS min_price,
LEFT(date, 7) AS yyyy_mm
FROM aircrafts_in_parsed_data
GROUP BY LEFT(date, 7)
will get the lowest price for each month. But it can't say 'first'.
From my groupwise-max cheat-sheet, I derive this:
SELECT
yyyy_mm, date, price, airline -- The desired columns
FROM
( SELECT @prev := '' ) init
JOIN
( SELECT LEFT(date, 7) != @prev AS first,
@prev := LEFT(date, 7),
LEFT(date, 7) AS yyyy_mm, date, price, airline
FROM aircrafts_in_parsed_data
ORDER BY
LEFT(date, 7), -- The 'GROUP BY'
price ASC, -- ASC to do "MIN()"
date -- To get the 'first' if there are dup prices for a month
) x
WHERE first -- extract only the first of the lowest price for each month
ORDER BY yyyy_mm; -- Whatever you like
Sorry, but subqueries are necessary. (I avoided YEAR(), MONTH(), and DAY().)
You are right, your query is not correct.
Let's start with the innermost query: You group by pdata.date + pinfo.date, so you get one result row per date combination. As you don't specify which price or airline you are interested in for each date combination (such as MAX(airline) and MIN(price)), you get one airline arbitrarily chosen for a date combination and one price also arbitrarily chosen. These don't even have to belong to the same record in the table; the DBMS is free to choose one airline and one price matching the dates. Well, maybe the date combination of pdata.date and pinfo.date is already unique, but then you wouldn't have to group by at all. So however we look at this, this isn't proper.
In the next query you group by pdata.date only, thus again getting arbitrary matches for airline and price. You could have done that in the innermost query already. It makes no sense to say: "give me a randomly picked price per pdata.date and pinfo.date and from these give me a randomly picked price per pdata.date", you could just as well say it directly: "give me a randomly picked price per pdata.date". Then you order your result rows. This is completely useless, as you are using the results as a subquery (derived table) again, and a derived table is considered an unordered set. So the ORDER BY gives the DBMS more work to do, but is in no way guaranteed to influence the main query's results.
In your main query then you group by year and month, again resulting in arbitrarily picked values.
Here is the same query a tad shorter and cleaner:
select
pdata.airline, -- some arbitrarily chosen airline matching year and month
pdata.price, -- some arbitrarily chosen price matching year and month
pdata.date -- some arbitrarily chosen date matching year and month
from aircrafts_in_parsed_data pdata
inner join aircrafts_in_parsed_info pinfo on pdata.info_id = pinfo.id
where pinfo.aircrafts_in_id = {$id}
and pinfo.status = 'success'
and pinfo.type = 'roundtrip'
and pdata.price <> 0
group by year(pdata.date), month(pdata.date)
order by year(pdata.date) desc, month(pdata.date) desc
As to the original task (as far as I understand it): Find the records with the lowest price per month. Per month means GROUP BY month. The lowest price is MIN(price).
select
min_price_record.departure_year,
min_price_record.departure_month,
min_price_record.min_price,
full_record.departure_date,
full_record.airline
from
(
select
year(`date`) as departure_year,
month(`date`) as departure_month,
min(price) as min_price
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
group by year(`date`), month(`date`)
) min_price_record
join
(
select
`date` as departure_date,
year(`date`) as departure_year,
month(`date`) as departure_month,
price,
airline
from aircrafts_in_parsed_data
where price <> 0
and info_id in
(
select id
from aircrafts_in_parsed_info
where aircrafts_in_id = {$id}
and status = 'success'
and type = 'roundtrip'
)
) full_record on full_record.departure_year = min_price_record.departure_year
and full_record.departure_month = min_price_record.departure_month
and full_record.price = min_price_record.min_price
order by
min_price_record.departure_year desc,
min_price_record.departure_month desc;
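The join above returns every row that ties for the monthly minimum, while the task asks for the first such date. One way to keep a single row per month is to number the rows within each month by price and then by date. The sketch below runs on SQLite through Python's sqlite3 module (an assumption made for self-containment; SQLite 3.25+ or MySQL 8.0+ is needed for ROW_NUMBER). The table is reduced to three columns, the filter on aircrafts_in_parsed_info is left out, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced aircrafts_in_parsed_data: the info_id filter is omitted for brevity.
conn.execute("CREATE TABLE aircrafts_in_parsed_data (price INTEGER, airline TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO aircrafts_in_parsed_data VALUES (?, ?, ?)",
    [
        (300, "AirA", "2016-01-05"),
        (200, "AirB", "2016-01-20"),
        (200, "AirC", "2016-01-10"),  # same minimum price, earlier date
        (150, "AirA", "2016-02-03"),
        (400, "AirB", "2016-02-01"),
    ],
)

cheapest = conn.execute(
    """
    SELECT yyyy_mm, date, price, airline
    FROM (
        SELECT substr(date, 1, 7) AS yyyy_mm, date, price, airline,
               -- lowest price first; the earliest date breaks ties
               ROW_NUMBER() OVER (
                   PARTITION BY substr(date, 1, 7)
                   ORDER BY price, date
               ) AS rn
        FROM aircrafts_in_parsed_data
        WHERE price <> 0
    ) AS ranked
    WHERE rn = 1
    ORDER BY yyyy_mm
    """
).fetchall()

for row in cheapest:
    print(row)
```

In MySQL, LEFT(date, 7) would play the role of substr(date, 1, 7) here.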
I have the following query, and I would like to get 100 items from the database, but host_id is in the urls table many times, and I would like to get a maximum of 10 unique rows from that table per host_id.
select *
from urls
join hosts using(host_id)
where
(
last_run_date is null
or last_run_date <= date_sub(curdate(), interval 30 day)
)
and ignore_url != 1
limit 100
So, I would like:
Maximum Results = 100
Max Rows Per Host = 10
I am not sure what I would need to do to accomplish this task. Is there a way to do this without a subquery?
Hosts Table
CREATE TABLE `hosts` (
`host_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`host` VARCHAR(50) NOT NULL,
`last_fetched` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ignore_host` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`host_id`),
UNIQUE INDEX `host` (`host`)
)
Urls Table
CREATE TABLE `urls` (
`url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`parent_url_id` INT(10) UNSIGNED NOT NULL,
`scheme` VARCHAR(5) NOT NULL,
`host_id` INT(10) UNSIGNED NOT NULL,
`path` VARCHAR(500) NOT NULL,
`query` VARCHAR(500) NOT NULL,
`date_found` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_run_date` DATETIME NULL DEFAULT NULL,
`ignore_url` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`url_id`),
UNIQUE INDEX `host_path_query` (`host_id`, `path`, `query`)
)
That's it (I hope).
I can't test it for real because I have no data. Please test it and let me know.
SELECT *
FROM (
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
OK.
First:
I select only the rows that match your query and order them
by host_id and time.
SELECT
u.*
FROM
urls u
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Second:
I add two variables, nr and lasthost, and set them in the SELECT. Now
I increment nr for each row and reset it to 1 when the host_id changes. So I get a
list of rows numbered from 1 to n for each host_id.
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
Third:
I wrap this query in a new SELECT so that I can join your second table, restrict the result to rows with nr below 11, and limit the whole result to 100 rows.
SELECT *
FROM (
SELECT
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
FROM
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) AS t
LEFT JOIN hosts USING(host_id)
WHERE t.nr < 11
LIMIT 100;
That's all.
So you need a limited JOIN. Another guess:
SELECT * FROM hosts
LEFT JOIN urls ON
urls.host_id = hosts.host_id
WHERE urls.host_id IN
(SELECT host_id FROM urls
LIMIT 0,10)
LIMIT 0,100
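The two limits from the question (a cap per host and a cap overall) can also be expressed with a window function instead of user variables. Below is a small sketch, run on SQLite through Python's sqlite3 module as an assumption for self-containment (SQLite 3.25+; the same SQL needs MySQL 8.0+). The urls table is reduced to two columns, the last_run_date and ignore_url filters are omitted, and PER_HOST and TOTAL are hypothetical small values standing in for 10 and 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced urls table: just the key and the host it belongs to.
conn.execute("CREATE TABLE urls (url_id INTEGER PRIMARY KEY, host_id INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1),   # host 1 has four URLs
     (5, 2), (6, 2),                   # host 2 has two
     (7, 3), (8, 3), (9, 3)],          # host 3 has three
)

PER_HOST = 2  # stands in for "max rows per host = 10"
TOTAL = 5     # stands in for "maximum results = 100"

picked = conn.execute(
    """
    SELECT url_id, host_id
    FROM (
        SELECT url_id, host_id,
               -- number the URLs of each host separately
               ROW_NUMBER() OVER (PARTITION BY host_id ORDER BY url_id) AS nr
        FROM urls
    ) AS numbered
    WHERE nr <= ?
    ORDER BY host_id, url_id
    LIMIT ?
    """,
    (PER_HOST, TOTAL),
).fetchall()

for row in picked:
    print(row)
```

The numbering still happens in a derived table, so this does not avoid a subquery; it only replaces the variable bookkeeping.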
I have a table with three fields: an ID, a date (stored as a string), and an INT, like this:
+---------------------------
+BH|2012-09-01|56789
+BH|2011-09-01|56765
+BH|2010-08-01|67866
+CH|2012-09-01|58789
+CH|2011-09-01|56795
+CH|2010-08-01|67866
+DH|2012-09-01|52789
+DH|2011-09-01|56665
+DH|2010-08-01|67866
Essentially, for each ID I need to return only the row with the highest date string. For this example, my results would need to be:
+---------------------------
+BH|2012-09-01|56789
+CH|2012-09-01|58789
+DH|2012-09-01|52789
SELECT t.id, t.date_column, t.int_column
FROM YourTable t
INNER JOIN (SELECT id, MAX(date_column) AS MaxDate
FROM YourTable
GROUP BY id) q
ON t.id = q.id
AND t.date_column = q.MaxDate
SELECT id, date, int
FROM ( SELECT id, date, int
FROM table_name
ORDER BY date DESC) AS h
GROUP BY id
Replace table_name and the column names with the right ones.
Assuming the following structure:
CREATE TABLE `stackoverflow`.`table_10357817` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Date` datetime NOT NULL,
`Number` int(11) NOT NULL,
`Code` char(2) NOT NULL,
PRIMARY KEY (`Id`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
The following query will yield the expected results:
SELECT Code, Date, Number
FROM table_10357817
GROUP BY Code
HAVING Date = MAX(Date)
The GROUP BY forces a single result per Code (you called it id) and the HAVING clause returns only the data where it matches the max date per code/id.
Update
Used the following data script:
INSERT INTO table_10357817
(Code, Date, Number)
VALUES
('BH', '2012-09-01', 56789),
('BH', '2011-09-01', 56765),
('BH', '2010-08-01', 67866),
('CH', '2012-09-01', 58789),
('CH', '2011-09-01', 56795),
('CH', '2010-08-01', 67866),
('DH', '2012-09-01', 52789),
('DH', '2011-09-01', 56665),
('DH', '2010-08-01', 67866)
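The GROUP BY ... HAVING query selects Date and Number without aggregating them, so it depends on which row MySQL happens to pick for each group, and it is rejected when ONLY_FULL_GROUP_BY is enabled. The join against the per-group MAX from the first answer does not have that dependency. Here is a minimal sketch of it using the data script above, run on SQLite through Python's sqlite3 module (an assumption made so the example is self-contained; the table name t is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Code TEXT, Date TEXT, Number INTEGER)")
conn.executemany(
    "INSERT INTO t (Code, Date, Number) VALUES (?, ?, ?)",
    [
        ("BH", "2012-09-01", 56789), ("BH", "2011-09-01", 56765), ("BH", "2010-08-01", 67866),
        ("CH", "2012-09-01", 58789), ("CH", "2011-09-01", 56795), ("CH", "2010-08-01", 67866),
        ("DH", "2012-09-01", 52789), ("DH", "2011-09-01", 56665), ("DH", "2010-08-01", 67866),
    ],
)

latest = conn.execute(
    """
    SELECT t.Code, t.Date, t.Number
    FROM t
    JOIN (
        -- one row per Code holding its latest date
        SELECT Code, MAX(Date) AS MaxDate
        FROM t
        GROUP BY Code
    ) AS q
      ON t.Code = q.Code AND t.Date = q.MaxDate
    ORDER BY t.Code
    """
).fetchall()

for row in latest:
    print(row)
```

If two rows of the same Code share the latest date, this returns both of them.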
I have a question that is almost the same as Sum amount of overlapping datetime ranges in MySQL, so I'm reusing part of his text, hope that is ok...
I have a table of events, each with a StartTime and EndTime (as type DateTime) in a MySQL Table.
I'm trying to output the sum of overlapping times for each type of event and the number of events that overlapped.
What is the most efficient / simple way to perform this query in MySQL?
CREATE TABLE IF NOT EXISTS `events` (
`EventID` int(10) unsigned NOT NULL auto_increment,
`EventType` int(10) unsigned NOT NULL,
`StartTime` datetime NOT NULL,
`EndTime` datetime default NULL,
PRIMARY KEY (`EventID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=37 ;
INSERT INTO `events` (`EventID`, EventType,`StartTime`, `EndTime`) VALUES
(10001,1, '2009-02-09 03:00:00', '2009-02-09 10:00:00'),
(10002,1, '2009-02-09 05:00:00', '2009-02-09 09:00:00'),
(10003,1, '2009-02-09 07:00:00', '2009-02-09 09:00:00'),
(10004,3, '2009-02-09 11:00:00', '2009-02-09 13:00:00'),
(10005,3, '2009-02-09 12:00:00', '2009-02-09 14:00:00');
# if the query was run using the data above,
# the table below would be the desired output
# Number of Overlapped Events , The event type, | Total Amount of Time those events overlapped.
1,1, 03:00:00
2,1, 02:00:00
3,1, 02:00:00
1,3, 01:00:00
There is a really beautiful solution given there by Mark Byers and I'm wondering if that one can be extended to include "Event Type".
His solution without event type was:
SELECT `COUNT`, SEC_TO_TIME(SUM(Duration))
FROM (
SELECT
COUNT(*) AS `Count`,
UNIX_TIMESTAMP(Times2.Time) - UNIX_TIMESTAMP(Times1.Time) AS Duration
FROM (
SELECT @rownum1 := @rownum1 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum1 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times1
JOIN (
SELECT @rownum2 := @rownum2 + 1 AS rownum, `Time`
FROM (
SELECT DISTINCT(StartTime) AS `Time` FROM events
UNION
SELECT DISTINCT(EndTime) AS `Time` FROM events
) AS AllTimes, (SELECT @rownum2 := 0) AS Rownum
ORDER BY `Time` DESC
) As Times2
ON Times1.rownum = Times2.rownum + 1
JOIN events ON Times1.Time >= events.StartTime AND Times2.Time <= events.EndTime
GROUP BY Times1.rownum
) Totals
GROUP BY `Count`
SELECT
COUNT(*) as occurrence
, sub.EventID
, SEC_TO_TIME(SUM(LEAST(e1end, e2end) - GREATEST(e1start, e2start))) as duration
FROM
( SELECT
e1.EventID
, UNIX_TIMESTAMP(e1.StartTime) as e1start
, UNIX_TIMESTAMP(e1.EndTime) as e1end
, UNIX_TIMESTAMP(e2.StartTime) as e2start
, UNIX_TIMESTAMP(e2.EndTime) as e2end
FROM events e1
INNER JOIN events e2
ON (e1.EventType = e2.EventType AND e1.EventID <> e2.EventID
AND NOT(e1.StartTime > e2.EndTime OR e1.EndTime < e2.StartTime))
) sub
GROUP BY sub.EventID
ORDER BY occurrence DESC
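The boundary-based idea from the earlier query (collect all start and end times, then count how many events cover each gap between neighbouring times) can be extended to event types by collecting the boundaries per EventType. The sketch below is one possible formulation, not the original author's query. It runs on SQLite through Python's sqlite3 module (an assumption for self-containment; LEAD needs SQLite 3.25+ or MySQL 8.0+) and reports durations in seconds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (EventID INTEGER PRIMARY KEY, EventType INTEGER, "
    "StartTime TEXT, EndTime TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (10001, 1, "2009-02-09 03:00:00", "2009-02-09 10:00:00"),
        (10002, 1, "2009-02-09 05:00:00", "2009-02-09 09:00:00"),
        (10003, 1, "2009-02-09 07:00:00", "2009-02-09 09:00:00"),
        (10004, 3, "2009-02-09 11:00:00", "2009-02-09 13:00:00"),
        (10005, 3, "2009-02-09 12:00:00", "2009-02-09 14:00:00"),
    ],
)

overlaps = conn.execute(
    """
    WITH bounds AS (              -- every distinct start/end time per type
        SELECT EventType, StartTime AS t FROM events
        UNION
        SELECT EventType, EndTime FROM events
    ),
    segs AS (                     -- neighbouring boundaries form a segment
        SELECT EventType, t AS seg_start,
               LEAD(t) OVER (PARTITION BY EventType ORDER BY t) AS seg_end
        FROM bounds
    ),
    counted AS (                  -- how many events cover each segment
        SELECT s.EventType, s.seg_start, s.seg_end, COUNT(*) AS n
        FROM segs AS s
        JOIN events AS e
          ON e.EventType = s.EventType
         AND e.StartTime <= s.seg_start
         AND e.EndTime >= s.seg_end
        WHERE s.seg_end IS NOT NULL
        GROUP BY s.EventType, s.seg_start, s.seg_end
    )
    SELECT n, EventType,
           SUM(CAST(strftime('%s', seg_end) AS INTEGER)
             - CAST(strftime('%s', seg_start) AS INTEGER)) AS seconds
    FROM counted
    GROUP BY EventType, n
    ORDER BY EventType, n
    """
).fetchall()

for row in overlaps:
    print(row)
```

For the sample data this gives 3 h, 2 h and 2 h for one, two and three simultaneous events of type 1, as in the table in the question. For type 3 it gives 2 h with one event active and 1 h with two active. In MySQL, UNIX_TIMESTAMP() and SEC_TO_TIME() would replace the strftime arithmetic.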
If my table looks like this:
CREATE TABLE `daily_individual_tracking` (
`daily_individual_tracking_id` int(10) unsigned NOT NULL auto_increment,
`daily_individual_tracking_date` date NOT NULL default '0000-00-00',
`sales` enum('no','yes') NOT NULL COMMENT 'no',
`repairs` enum('no','yes') NOT NULL COMMENT 'no',
`shipping` enum('no','yes') NOT NULL COMMENT 'no',
PRIMARY KEY (`daily_individual_tracking_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1
Basically, each field can be either yes or no.
How can I count how many yes values there are for each column over a date range?
Thanks!!
You can either run three queries like this:
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE sales = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
Or if you want you can get all three at once like this:
SELECT (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE sales = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS sales_count, (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE repairs = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS repairs_count, (
SELECT COUNT(*)
FROM daily_individual_tracking
WHERE shipping = 'YES'
AND daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
) AS shipping_count
Another way to do it is to use SUM instead of COUNT. You could try this too to see how it affects the performance:
SELECT
SUM(sales = 'YES') AS sales_count,
SUM(repairs = 'YES') AS repairs_count,
SUM(shipping = 'YES') AS shipping_count
FROM daily_individual_tracking
WHERE daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
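The single-pass variant can be checked on a few rows. The sketch below uses SUM(CASE ...) instead of SUM(column = 'YES'), which is equivalent and more portable, and runs on SQLite through Python's sqlite3 module (an assumption made for self-containment). SQLite compares strings case-sensitively, so the sketch matches the lowercase 'yes' stored in the table; MySQL's default collations are case-insensitive, which is why 'YES' works there. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_individual_tracking ("
    "daily_individual_tracking_date TEXT, sales TEXT, repairs TEXT, shipping TEXT)"
)
conn.executemany(
    "INSERT INTO daily_individual_tracking VALUES (?, ?, ?, ?)",
    [
        ("2010-01-05", "yes", "no", "yes"),
        ("2010-02-10", "yes", "yes", "no"),
        ("2010-03-31", "no", "yes", "no"),
        ("2010-04-01", "yes", "yes", "yes"),  # outside the date range
    ],
)

counts = conn.execute(
    """
    SELECT
        SUM(CASE WHEN sales = 'yes' THEN 1 ELSE 0 END) AS sales_count,
        SUM(CASE WHEN repairs = 'yes' THEN 1 ELSE 0 END) AS repairs_count,
        SUM(CASE WHEN shipping = 'yes' THEN 1 ELSE 0 END) AS shipping_count
    FROM daily_individual_tracking
    WHERE daily_individual_tracking_date BETWEEN '2010-01-01' AND '2010-03-31'
    """
).fetchone()

print(counts)
```

All three counts come from one scan of the date range, whereas the version with three scalar subqueries reads the range three times.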