I have this table:
CREATE TABLE IF NOT EXISTS `catalog_sites` (
`id` int(10) unsigned NOT NULL auto_increment,
`cat_id` int(10) unsigned NOT NULL,
`date` datetime NOT NULL,
`url` varchar(255) NOT NULL,
`title` varchar(255) NOT NULL,
`description` varchar(255) NOT NULL,
`keywords` varchar(255) NOT NULL,
`visited` int(10) unsigned NOT NULL,
`shown` int(10) unsigned NOT NULL,
`meta_try` int(1) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
I think my problem is simple, but I can't seem to find an appropriate solution.
This is a table of websites. I would like to get the 6 highest-rated sites in each of the 6 top categories (cat_id), 36 rows in total. The rating of a site is calculated as visited / shown.
So the result should contain 36 rows: the 6 top categories (which can be found by sorting on AVG(visited / shown)), and the 6 top sites in each of those categories.
If you have any ideas on how to approach this differently, please tell me.
This should get you what you want using MySQL user variables. The inner query pre-calculates the rank value (visited / shown) and orders the rows the way you need them: per category, highest rank first. The @RankSeq variable then numbers the URLs 1, 2, 3, ... within each category. From that pre-query (aliased PQ), the outer query simply keeps the rows whose rank sequence is <= 6.
To further ensure you only get the top 6 categories, the inner pre-query also joins against a limited sub-select aliased "TopCategories".
select
    PQ.url,
    PQ.cat_id,
    PQ.Rank,
    PQ.URLRankSeq
from
    ( select
          CS.cat_id,
          ( CS.visited / CS.shown ) as Rank,
          CS.url,
          -- restart the sequence at 1 whenever the category changes
          @RankSeq := if( @LastCat = CS.cat_id, @RankSeq + 1, 1 ) as URLRankSeq,
          @LastCat := CS.cat_id as ignoreIt
      from
          -- the 6 categories with the highest average rating
          ( select cat_id,
                   avg( visited / shown )
            from catalog_sites
            group by 1
            order by 2 desc
            limit 6 ) TopCategories
          JOIN catalog_sites CS
            on TopCategories.cat_id = CS.cat_id,
          ( select @RankSeq := 0, @LastCat := 0 ) SQLVars
      order by
          CS.cat_id,
          Rank desc ) PQ
where
    PQ.URLRankSeq <= 6
I've tried your example, but it doesn't really work for me, or I just don't know how to adapt it to my case. I'm still a noob as far as SQL goes, so I couldn't fully understand your query.
I have managed to solve my problem, however. It's convoluted and probably the worst possible approach, and it is slow too, but I'll cache the results, so that shouldn't be a problem.
Here is my solution:
SET @site_limit = 2;
SET @cat_limit = 6;
SET @row = 0;
SET @limiter = 0;
SET @last_cat = 0;

SELECT `cat_id`, `url`, `visited` / `shown` AS `rating`,
       -- count sites within the current category, capped at @site_limit
       @limiter := IF(@last_cat = `cat_id`,
                      IF(@limiter >= @site_limit - 1, @limiter, @limiter + 1),
                      0) AS `limiter`,
       @last_cat := `cat_id` AS `last_cat`
FROM `catalog_sites`
WHERE `cat_id`
IN (
    SELECT `cat_id`
    FROM (
        SELECT `cat_id`, @row := @row + 1 AS `row`
        FROM (
            SELECT `cat_id`
            FROM `catalog_sites`
            GROUP BY `cat_id`
            ORDER BY AVG(`visited` / `shown`) DESC
        ) AS derived1
    ) AS derived2
    WHERE `row` <= @cat_limit
)
GROUP BY `cat_id`, `limiter`
ORDER BY `cat_id`, `rating` DESC
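For reference, on MySQL 8.0 or newer the same result can be expressed with window functions instead of user variables. This is only a sketch under that version assumption, reusing the catalog_sites columns above; the aliases site_rank, top_categories and ranked are just illustrative names.
SELECT cat_id, url, rating
FROM (
    SELECT cat_id,
           url,
           visited / shown AS rating,
           -- number sites within each category, best rating first
           ROW_NUMBER() OVER (PARTITION BY cat_id
                              ORDER BY visited / shown DESC) AS site_rank
    FROM catalog_sites
    WHERE cat_id IN (
        -- the 6 categories with the highest average rating; the extra
        -- derived table avoids the "LIMIT & IN subquery" restriction
        SELECT cat_id
        FROM (
            SELECT cat_id
            FROM catalog_sites
            GROUP BY cat_id
            ORDER BY AVG(visited / shown) DESC
            LIMIT 6
        ) AS top_categories
    )
) AS ranked
WHERE site_rank <= 6
ORDER BY cat_id, rating DESC;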
I have a chatting application and an API that returns the list of users the current user has talked to. But MySQL takes a long time to return the message list once the table reaches 100,000 rows of data.
This is my messages table
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
and this is my users table (simplified)
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for user with id 1930)
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10
Please help to optimize this query
What I need:
Who the user talked to (name, sex, etc.)
What the last message was (from me or to me)
Count of all messages
Count of unread messages (only those sent to me)
The query works well, but takes too long.
The output must be like this
You have some design problems in your query and database.
You should avoid reserved words as column names, such as the `delete` column or the `count` alias;
You should avoid selecting columns that are not in the GROUP BY without an aggregation function. Although MySQL allows this, it is not standard and you have no control over which value gets picked;
Your NOT LIKE construction may misbehave, because '%1930;%' also matches '11930;' and user 11930 is not user 1930;
You should avoid LIKE patterns that both start and end with the % wildcard, which make the text processing take longer;
You should design a better way to represent message deletion, probably a proper flag and/or another table that stores the data related to the action;
Try to limit your result set before the join conditions (with a derived table) so less data has to be processed.
I tried to rewrite your query the way I understood it. I executed my version against a messages table with ~200,000 rows and no indexes, and it ran in 0.15 seconds. But you should definitely create the right indexes to keep it fast as the amount of data grows (a sketch of possible indexes follows the query).
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;
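As a follow-up to the point about indexes: a hedged sketch of indexes that would likely support the derived table's filtering on fromid/toid and the ORDER BY on date above. The index names are made up; validate the exact column order with EXPLAIN on real data.
ALTER TABLE messages
  ADD INDEX idx_messages_from_to_date (fromid, toid, `date`),
  ADD INDEX idx_messages_to_from_date (toid, fromid, `date`);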
Currently I am working on a project that has to do with Formula 1.
This is the structure of my results table:
CREATE TABLE IF NOT EXISTS `races_results` (
`resultid` int(11) NOT NULL,
`seasonyear` int(4) NOT NULL,
`trackid` tinyint(2) NOT NULL,
`raceid` int(2) NOT NULL,
`session` tinyint(1) NOT NULL,
`q` int(11) NOT NULL,
`place` tinyint(2) NOT NULL,
`driverid` int(2) NOT NULL,
`teamid` int(2) NOT NULL,
`time` int(11) NOT NULL,
`laps` int(2) NOT NULL,
`status` varchar(3) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
My problem is that I don't get the output the way I want it. This is my query:
SELECT place, driverid, teamid, if(q=1, time, '') as time1, if(q=2, time, '') as time2, if(q=3, time, '') as time3
FROM `races_results`
WHERE `seasonyear` = 2015 AND `raceid` = 3 AND `session` = 2 AND `q` IN (1,2,3)
GROUP BY driverid
ORDER BY CASE WHEN q = 3 THEN place >= 1 AND place <= 10 END ASC, CASE WHEN q = 2 THEN place >= 11 AND place <= 16 END ASC, CASE WHEN q = 1 THEN place >= 17 AND place <= 22 END ASC
My goal is to show all of a driver's times side by side and to order the rows the way the qualifying sessions are classified (Q3 participants first, then Q2, then Q1).
The output should look like this: http://www.formula1.com/content/fom-website/en/championship/results/2015-race-results/2015-japan-results/qualifying.html
From your question I understand that the table races_results has one row per result, so the times of the different qualifying sessions are on different rows. To get them onto one row you can self-join the table:
SELECT r1.place, r1.driverid, r1.teamid,
       r1.time AS time1, r2.time AS time2, r3.time AS time3
FROM races_results r1
LEFT JOIN races_results r2
       ON r1.driverid = r2.driverid AND r1.raceid = r2.raceid
      AND r1.seasonyear = r2.seasonyear AND r1.session = r2.session AND r2.q = 2
LEFT JOIN races_results r3
       ON r1.driverid = r3.driverid AND r1.raceid = r3.raceid
      AND r1.seasonyear = r3.seasonyear AND r1.session = r3.session AND r3.q = 3
WHERE r1.q = 1
  AND r1.seasonyear = 2015 AND r1.raceid = 3 AND r1.session = 2
ORDER BY r1.place;
I assume:
that there is always a result for q=1;
that driverid and raceid identify a driver's rows for a given race in a specific year;
that you want to order by place.
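An alternative sketch, closer to the original attempt with if(q=..., time, ...): pivot the three times with conditional aggregation instead of a self-join. This assumes at most one row per driver and qualifying segment for the chosen race and session, and orders by the place of the last segment the driver took part in (Q3 if present, else Q2, else Q1).
SELECT driverid,
       teamid,
       MAX(IF(q = 1, `time`, NULL)) AS time1,
       MAX(IF(q = 2, `time`, NULL)) AS time2,
       MAX(IF(q = 3, `time`, NULL)) AS time3
FROM races_results
WHERE seasonyear = 2015 AND raceid = 3 AND `session` = 2 AND q IN (1, 2, 3)
GROUP BY driverid, teamid
-- classify by the latest segment a driver reached
ORDER BY COALESCE(MAX(IF(q = 3, place, NULL)),
                  MAX(IF(q = 2, place, NULL)),
                  MAX(IF(q = 1, place, NULL)));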
For some background, I previously asked about retrieving the sets with the highest number of combined votes among objects. That works great for getting the top 25, but now I would like to get the top 10%, ordered by the rankset's timestamp. Here are the tables in question:
CREATE TABLE IF NOT EXISTS `rankset` (
`id` INT NOT NULL AUTO_INCREMENT ,
`name` TEXT NOT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB
CREATE TABLE IF NOT EXISTS `item` (
`id` BIGINT NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(128) NOT NULL ,
`rankset` BIGINT NOT NULL ,
`image` VARCHAR(45) NULL ,
`description` VARCHAR(140) NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB
CREATE TABLE IF NOT EXISTS `mydb`.`vote` (
`id` BIGINT NOT NULL AUTO_INCREMENT ,
`value` TINYINT NOT NULL ,
`item` BIGINT NOT NULL ,
`user` BIGINT NOT NULL ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB
I'd list what I've tried so far, but I honestly don't even know where to begin with this one. Here's the SQL fiddle:
http://sqlfiddle.com/#!2/fe315/9
From the comments:
Look at this answer: https://stackoverflow.com/a/4474389/97513. It generates a ranking column for the items; use a WHERE clause of the form rank <= (SELECT COUNT(1) / 10 FROM rankset) to identify the top 10%.
I put together an SQL fiddle to demonstrate: http://sqlfiddle.com/#!2/fe315/21 - Here's another with more results so you can see it scales up when you add more rows: http://sqlfiddle.com/#!2/a02ea/1
SQL Used:
SET @rn := 0;

SELECT (@rn := @rn + 1) AS rank, q.*
FROM (
  SELECT rankset.*, COALESCE(COUNT(vote.id), 0) AS votes
  FROM rankset, vote, item
  WHERE item.rankset = rankset.id
    AND vote.item = item.id
  GROUP BY rankset.id
  ORDER BY votes DESC
) q
WHERE
  @rn <= (SELECT COUNT(1)/10 FROM rankset);
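For completeness, a hedged sketch of the same "top 10% by votes" cut on MySQL 8.0 or newer, using NTILE() instead of user variables. It assumes the rankset/item/vote schema from the fiddle; NTILE(10) puts roughly the top tenth of ranksets into bucket 1.
SELECT t.*
FROM (
    SELECT r.id,
           r.name,
           COUNT(v.id) AS votes,
           -- decile bucket by vote count, 1 = most votes
           NTILE(10) OVER (ORDER BY COUNT(v.id) DESC) AS decile
    FROM rankset r
    JOIN item i ON i.rankset = r.id
    JOIN vote v ON v.item = i.id
    GROUP BY r.id, r.name
) AS t
WHERE t.decile = 1
ORDER BY t.votes DESC;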
How can I optimize this query? If I run it without the ORDER BY clause, it executes in under 100 ms. With the ORDER BY clause it takes many seconds and overwhelms the server when more than one system runs the query at once.
UPDATE companies
SET
crawling = 1
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
If I run this query as a SELECT, it's also fast ( <100ms ).
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1;
Here is my table schema:
CREATE TABLE `companies` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) DEFAULT NULL,
`url_scheme` varchar(10) DEFAULT NULL,
`url_host` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`crawl` tinyint(1) unsigned NOT NULL DEFAULT '1',
`crawling` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_crawled` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `url_host` (`url_host`),
KEY `crawl` (`crawl`),
KEY `crawling` (`crawling`),
KEY `last_crawled` (`last_crawled`),
KEY `url_scheme` (`url_scheme`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
UPDATE ONE
This query gives me the following error: You can't specify target table 'companies' for update in FROM clause
UPDATE companies
SET crawling = 1
WHERE id = (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);
This query gives me the following error: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
UPDATE companies
SET crawling = 1
WHERE id in (
SELECT id
FROM companies
WHERE
crawling = 0
AND url_host IS NOT NULL
ORDER BY
last_crawled ASC
LIMIT 1
);
Try not to use ORDER BY and LIMIT directly in the UPDATE for such a small number of updates; instead, number the candidate rows in a derived table and join on it:
UPDATE companies t1
JOIN
(
    SELECT c.id, @RowNum := @RowNum + 1 AS RowID
    FROM companies c, (SELECT @RowNum := 0) r
    WHERE c.crawling = 0 AND c.url_host IS NOT NULL
    ORDER BY c.last_crawled ASC
) t2
  ON t2.RowID = 1 AND t1.id = t2.id
SET t1.crawling = 1
EDIT 1:
Make sure you have an index on (last_crawled ASC, id ASC).
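For example (a sketch; the index name is arbitrary):
ALTER TABLE companies ADD INDEX idx_last_crawled_id (last_crawled, id);
With that index in place, this variant filters the row number in an extra derived table: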
UPDATE companies t1
JOIN
(
    SELECT ID, RowID
    FROM
    (
        SELECT c.id AS ID, @RowNum := @RowNum + 1 AS RowID
        FROM companies c, (SELECT @RowNum := 0) r
        WHERE c.crawling = 0 AND c.url_host IS NOT NULL
        ORDER BY c.last_crawled ASC
    ) t2
    WHERE RowID = 1
) t3
  ON t1.id = t3.id
SET t1.crawling = 1
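For reference, the two errors quoted in the question ("You can't specify target table ... for update in FROM clause" and "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'") have a well-known workaround: wrap the subquery in one more derived table so MySQL materializes it before running the UPDATE. A sketch along those lines, with next_company as an arbitrary alias:
UPDATE companies
SET crawling = 1
WHERE id = (
    SELECT id
    FROM (
        SELECT id
        FROM companies
        WHERE crawling = 0
          AND url_host IS NOT NULL
        ORDER BY last_crawled ASC
        LIMIT 1
    ) AS next_company  -- materialized first, so the target table is not read directly
);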
I have these two tables:
CREATE TABLE `workers` (
`id` int(7) NOT NULL AUTO_INCREMENT,
`number` int(7) NOT NULL,
`percent` int(3) NOT NULL,
`order` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `data` (
`id` bigint(15) NOT NULL AUTO_INCREMENT,
`workerId` int(7) NOT NULL,
PRIMARY KEY (`id`)
);
I want to return the first worker (ordered by `order` ASC) whose number of rows in the `data` table, multiplied by `percent` (from `workers`) and divided by 100, is smaller than `number` (from `workers`).
I have tried this query:
SELECT workers.id, COUNT(data.id) AS `countOfData`
FROM `workers` as workers, `data` as data
WHERE data.workerId = workers.id
AND workers.percent * `countOfData` < workers.number
LIMIT 1
But I get the error:
#1054 - Unknown column 'countOfData' in 'where clause'
This should work:
SELECT A.id
FROM workers A
LEFT JOIN (SELECT workerId, COUNT(*) AS Quant
FROM data
GROUP BY workerId) B
ON A.id = B.workerId
WHERE (COALESCE(Quant,0) * `percent`)/100 < `number`
ORDER BY `order`
LIMIT 1
You could calculate the number of rows per worker in a subquery and join that subquery to the workers table. If you use a left join, workers with no data rows are considered as well:
select *
from workers w
left join
(
select workerId
, count(*) as cnt
from data
group by
workerId
) d
on w.id = d.workerId
where coalesce(d.cnt, 0) * w.percent / 100 < w.number
order by
w.order
limit 1