Retrieving Nth subquery for INSERT - mysql

Abstract
From a table holding various posts of users to a forum, another table shall be daily updated with the top 20 posters. Posts are stored in posts, daily high-scores are held in hiscore.
Tables
posts:
post_id(PK:INT) | user_id(INT) | ... | timestamp(TIMESTAMP)
hiscore:
user_id(INT) | rank(INT)
Query
TRUNCATE TABLE `hiscore` ;
INSERT INTO `hiscore` (`user_id`,`rank`)
(
SELECT `user_id`, ???
FROM `posts`
WHERE `timestamp` BETWEEN blah AND blah
GROUP BY `user_id`
ORDER BY COUNT(`post_id`) DESC
LIMIT 20
)
The actual question
What is to be inserted in the above query instead of ??? to account for the rank?
Is there a variable like @NTH_SUBQUERY that'll substitute for 5 on the fifth run of the SELECT subquery?
UPDATE: The table hiscore is supposed to only hold the top 20 posters. I know the table structure can be optimized. The focus of the answers should be on how to determine the current retrieved row of the sub-query.

INSERT INTO `hiscore` (`user_id`,`rank`)
(
SELECT `user_id`, @rank := @rank + 1
FROM `posts`, (SELECT @rank := 0) r
WHERE `timestamp` BETWEEN blah AND blah
GROUP BY `user_id`
ORDER BY COUNT(`post_id`) DESC
LIMIT 20
)
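As a cross-check on the row-numbering idea, here is a minimal sketch that numbers the ordered rows in application code instead of inside the query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it is an illustration under that assumption rather than the answer's own method. The function name rebuild_hiscore and the sample data are hypothetical, and the timestamp filter from the question is left out for brevity.

```python
import sqlite3


def rebuild_hiscore(user_ids, limit=20):
    """Rebuild hiscore from one post per entry in user_ids; return (user_id, rank) rows."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER)")
    conn.execute('CREATE TABLE hiscore (user_id INTEGER, "rank" INTEGER)')
    conn.executemany("INSERT INTO posts (user_id) VALUES (?)", [(u,) for u in user_ids])

    # Same grouping and ordering as the question; user_id breaks ties deterministically.
    ordered = conn.execute(
        "SELECT user_id FROM posts GROUP BY user_id "
        "ORDER BY COUNT(post_id) DESC, user_id LIMIT ?",
        (limit,),
    ).fetchall()

    # The rank is simply the 1-based position of each row in the ordered result.
    conn.execute("DELETE FROM hiscore")
    conn.executemany(
        'INSERT INTO hiscore (user_id, "rank") VALUES (?, ?)',
        [(user_id, rank) for rank, (user_id,) in enumerate(ordered, start=1)],
    )
    rows = conn.execute('SELECT user_id, "rank" FROM hiscore ORDER BY "rank"').fetchall()
    conn.close()
    return rows


top_two = rebuild_hiscore([7, 7, 7, 3, 3, 9], limit=2)
print(top_two)
```

Numbering the rows after the ordered result has been fetched avoids any dependence on the order in which the database evaluates expressions in the SELECT list.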

You seem too attached to TRUNCATE. For your case, consider keeping one set of rows per day:
hiscore:
the_date (DATE) | user_id(INT) | rank(INT)
and build a key on (the_date, rank).
Insertion:
SET @pos = 0;
INSERT INTO hiscore
SELECT CURDATE(), user_id, @pos := @pos + 1
FROM ...
To keep the table size manageable, you can delete old rows every few months.
Alternatively, you can set AUTO_INCREMENT on rank:
create table hiscore
(
the_date date not null,
rank int(3) not null auto_increment,
user_id int(10) not null,
primary key (the_date, rank)
);
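So the rank is auto-incremented, which matches the order by number of daily posts descending as long as the rows are inserted in that order. Note that an AUTO_INCREMENT column in second position of a composite key, numbered separately per the_date, is a MyISAM feature; InnoDB requires the AUTO_INCREMENT column to be the first column of an index.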

Related

MySQL - INSERT INTO … ON DUPLICATE KEY UPDATE with values select from the same table

I have two tables, one is posts(ID, edited_date, is_public, ... etc), another one is post_pagination(index, post_id) for storing the pagination data of posts. In post_pagination, index is primary key.
In one step of the pagination process, I need to compare the sorted result with the post_pagination table and do an INSERT/UPDATE, like:
INSERT INTO post_pagination(`index`, post_id)
SELECT * /* full outer join new sorted result and pagination table */
FROM (
WITH
cte1 AS (
SELECT ROW_NUMBER() OVER(ORDER BY edited_date DESC, ID DESC) AS new_rank, ID
FROM posts
WHERE is_public = 1
),
cte2 AS (
SELECT `index`
FROM post_pagination
)
SELECT `index` AS dummy_rank, ID
FROM cte2
LEFT JOIN cte1 on `index` = new_rank
UNION ALL
SELECT new_rank AS dummy_rank, ID
FROM cte2
RIGHT JOIN cte1 on `index` = new_rank
WHERE `index` IS NULL
) AS a
ORDER BY dummy_rank
ON DUPLICATE KEY UPDATE post_id = ID
The first time I run this query, post_pagination is empty, so MySQL inserts all the data into the table.
index post_id
1 390
2 391
3 392
4 393
5 307
It works well.
When I run it a second time, I expect no data to change, but it updates every post_id field to the last value of the result.
index post_id
1 307
2 307
3 307
4 307
5 307
I did a few tests. It seems that inserting values into a table using values selected from the same table causes this problem, but I can't figure out why.
For now I work around the problem by changing the last line of the query to:
ON DUPLICATE KEY UPDATE post_id = ID + 0
Is there any better way to solve this issue?
Can you instead use a temporary table?
CREATE TEMPORARY TABLE `post_pagination_temp` (
`index` INTEGER(1) AUTO_INCREMENT PRIMARY KEY,
`post_id` INTEGER(1)
);
INSERT INTO `post_pagination_temp` (`post_id`)
SELECT `ID` FROM `posts` ORDER BY `edited_date` DESC, `ID` DESC;
INSERT INTO `post_pagination`
SELECT * FROM `post_pagination_temp`
ON DUPLICATE KEY UPDATE `index` = VALUES(`index`);
This assumes you have the correct index on post_pagination, i.e. the only unique being the primary key on post_id.
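The property the question is after is that rebuilding the pagination twice leaves the table unchanged. Below is a minimal sketch of that check, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; SQLite's INSERT OR REPLACE is used here in place of MySQL's ON DUPLICATE KEY UPDATE). The helper name rebuild_pagination and the edited_date values are hypothetical; the IDs are taken from the question's sample output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.executescript(
    """
    CREATE TABLE posts (ID INTEGER PRIMARY KEY, edited_date TEXT, is_public INTEGER);
    CREATE TABLE post_pagination ("index" INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES
        (307, '2020-01-01', 1),
        (390, '2020-01-05', 1),
        (391, '2020-01-04', 1),
        (392, '2020-01-03', 1),
        (393, '2020-01-02', 1);
    """
)


def rebuild_pagination(conn):
    """Write the sorted public post IDs into post_pagination, keyed by position."""
    ids = conn.execute(
        "SELECT ID FROM posts WHERE is_public = 1 ORDER BY edited_date DESC, ID DESC"
    ).fetchall()
    # Each position overwrites the row with the same primary key, if one exists.
    conn.executemany(
        'INSERT OR REPLACE INTO post_pagination ("index", post_id) VALUES (?, ?)',
        [(position, post_id) for position, (post_id,) in enumerate(ids, start=1)],
    )
    return conn.execute(
        'SELECT "index", post_id FROM post_pagination ORDER BY "index"'
    ).fetchall()


first_run = rebuild_pagination(conn)
second_run = rebuild_pagination(conn)
print(first_run == second_run)
```

Because the source rows are read completely before anything is written, the second run cannot observe a half-updated post_pagination table.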

MySQL update table only with the highest values of another table

For a game site.
All games are recorded if the player's score is greater than his old score
Table of all players (over 10,000 players)
CREATE TABLE games (
PlayerID INT UNSIGNED,
Date TIMESTAMP(12),
Score BIGINT UNSIGNED DEFAULT 0,
#...other data
);
Once a month, I update the records table best, and afterwards I erase all games.
Table of best players (top 50)
CREATE TABLE best (
#...same as games, without final other data
PlayerID INT UNSIGNED,
Date TIMESTAMP(12),
Score BIGINT UNSIGNED DEFAULT 0
);
So I add the 50 best players of the table games in to the table best:
INSERT INTO best (PlayerID, Date, Score)
SELECT PlayerID, Date, Score FROM games ORDER BY Score DESC LIMIT 50;
After that (and this is where I have a problem), I try to keep only the best 50 in best. At this point best contains 100 rows.
What I have to do:
Not store the same PlayerID more than once.
Delete the worse Score for that player.
In the end, keep only the top 50.
->
+----------+---------+
| PlayerID | Score |
+----------+---------+
| 25 | 20000 | New
| 25 | 25000 | Old best
| 40 | 10000 | Old best
| 57 | 80000 | New best
| 57 | 45000 | Old
| 80 | 35000 | New best
+----------+---------+
In the end I have to retain only 50 rows (the ones marked "best" in my example).
I tried many things, but I have not succeeded in achieving the expected result.
I am using PHP, so if it is possible to do it simply with intermediate storage in an array, that's fine too.
Speed is not a priority because the operation is done only once a month.
The following SQL returns the top 50 scores:
SELECT `PlayerId`, max(`Score`) MaxScore
FROM (
SELECT `PlayerId`, `Date`, `Score` FROM games
UNION
SELECT `PlayerId`, `Date`, `Score` FROM best
) t
GROUP BY `PlayerId`
ORDER BY `MaxScore` DESC
LIMIT 50
You can use the result to overwrite the table best. For this you also need the corresponding Date field, which is missing so far. The next SQL will also return a maxDate field which corresponds to the highscore.
SELECT t2.`PlayerId`, max(t2.`Date`) maxDate, top.`MaxScore`
FROM
(
SELECT `PlayerId`, max(`Score`) MaxScore
FROM (
SELECT `PlayerId`, `Date`, `Score` FROM games
UNION
SELECT `PlayerId`, `Date`, `Score` FROM best
) t1
GROUP BY `PlayerId`
ORDER BY `MaxScore` DESC
LIMIT 50
) top
LEFT JOIN (
SELECT `PlayerId`, `Date`, `Score` FROM games
UNION
SELECT `PlayerId`, `Date`, `Score` FROM best
) t2 ON t2.`PlayerId` = top.`PlayerId` AND t2.`Score` = top.`MaxScore`
GROUP BY t2.`PlayerId`
ORDER BY top.`MaxScore` DESC
To transfer the new top 50 highscores into the best table you can use a temporary table like tmp_best. Insert the top scores into the empty table tmp_best with (you have to insert your select query from above):
INSERT INTO tmp_best (`PlayerId`, `Date`, `Score`)
SELECT ...
After this the best table can be emptied and then you can copy the rows from tmp_best into best.
Here is an alternative solution with simpler SQL. The difference from the solution above is the use of a temporary table tmp_all at the beginning to hold the unified data. Before using the following SQL, you have to create tmp_all, which can be a copy of the structure of games or best.
DELETE FROM tmp_all;
INSERT INTO tmp_all
SELECT `PlayerId`, `Date`, `Score` FROM games
UNION
SELECT `PlayerId`, `Date`, `Score` FROM best
;
DELETE FROM best;
INSERT INTO best (`PlayerId`, `Date`, `Score`)
SELECT t2.`PlayerId`, max(t2.`Date`) maxDate, top.`MaxScore`
FROM
(
SELECT `PlayerId`, max(`Score`) MaxScore
FROM tmp_all t1
GROUP BY `PlayerId`
ORDER BY `MaxScore` DESC
LIMIT 50
) top
LEFT JOIN tmp_all t2 ON t2.`PlayerId` = top.`PlayerId` AND t2.`Score` = top.`MaxScore`
GROUP BY t2.`PlayerId`
ORDER BY top.`MaxScore` DESC
;
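(SELECT PlayerID, Date, Score FROM games ORDER BY Score DESC LIMIT 50)
UNION
SELECT PlayerID, Date, Score FROM best
This gives you the candidates for the all-time top 50. Then, as suggested by @ethrbunny, erase the best table and populate it again from the above query. You can use a TEMPORARY TABLE for the intermediate result.
Note that UNION only removes rows that are identical in every column, so a player with two different scores still appears twice; keep only the highest score per PlayerID before taking the top 50.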
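The merge described in the answers above (combine games and best, keep each player's highest score, then keep the top N) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption. The function name merge_best is hypothetical, the Date column is omitted for brevity, and the sample scores are the ones from the question's example table.

```python
import sqlite3


def merge_best(new_games, old_best, limit=50):
    """Return the refreshed best table as (PlayerID, Score) rows, highest first."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    conn.execute("CREATE TABLE games (PlayerID INTEGER, Score INTEGER)")
    conn.execute("CREATE TABLE best (PlayerID INTEGER, Score INTEGER)")
    conn.executemany("INSERT INTO games VALUES (?, ?)", new_games)
    conn.executemany("INSERT INTO best VALUES (?, ?)", old_best)

    # One row per player with that player's highest score across both tables.
    top = conn.execute(
        """
        SELECT PlayerID, MAX(Score) AS MaxScore
        FROM (SELECT PlayerID, Score FROM games
              UNION ALL
              SELECT PlayerID, Score FROM best) AS t
        GROUP BY PlayerID
        ORDER BY MaxScore DESC, PlayerID
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

    # Overwrite best with the merged result.
    conn.execute("DELETE FROM best")
    conn.executemany("INSERT INTO best VALUES (?, ?)", top)
    rows = conn.execute("SELECT PlayerID, Score FROM best ORDER BY Score DESC").fetchall()
    conn.close()
    return rows


new_games = [(25, 20000), (57, 80000), (80, 35000)]
old_best = [(25, 25000), (40, 10000), (57, 45000)]
merged = merge_best(new_games, old_best, limit=3)
print(merged)
```

Fetching the merged rows before deleting from best plays the role of the temporary table in the answers above.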

MySQL query optimization with group by clause

I want to calculate total and unique clickouts by country, partner, and retailer.
I have achieved the desired result, but I think it is not an optimal solution and it will take longer on larger data sets. How can I improve this query?
Here is my test table, the query I designed, and the expected output:
"country_id","partner","retailer","id_customer","id_clickout"
"1","A","B","100","XX"
"1","A","B","100","XX"
"2","A","B","100","XX"
"2","A","B","100","GG"
"2","A","B","100","XX"
"2","A","B","101","XX"
DROP TABLE IF EXISTS x;
CREATE TEMPORARY TABLE x AS
SELECT test1.country_id, test1.partner,test1.retailer, test1.id_customer,
SUM(CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END) AS clicks,
CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END AS unique_clicks
FROM test1
GROUP BY 1,2,3,4
;
SELECT country_id,partner,retailer, SUM(clicks), SUM(unique_clicks)
FROM x
GROUP BY 1,2,3
Output:
"country_id","partner","retailer","SUM(clicks)","SUM(unique_clicks)"
"1","A","B","2","1"
"2","A","B","4","2"
And here is DDL and input data:
CREATE TABLE test (
country_id INT(11) DEFAULT NULL,
partner VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
retailer VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
id_customer BIGINT(20) DEFAULT NULL,
id_clickout VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO test VALUES(1,'A','B','100','XX'),(1,'A','B','100','XX'),
(2,'A','B','100','XX'),(2,'A','B','100','GG'),
(2,'A','B','100','XX'),(2,'A','B','101','xx')
SELECT
country_id,
partner,
retailer,
COUNT(id_clickout) AS clicks,
COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL THEN id_customer END) AS unique_clicks
FROM
test1
GROUP BY
1,2,3
;
COUNT(a_field) won't count any NULL values.
So, COUNT(id_clickout) will only count the number of times that it is NOT NULL.
Equally, the CASE WHEN statement in the unique_clicks only returns the id_customer for records where they clicked, otherwise it returns NULL. This means that the COUNT(DISTINCT CASE) only counts distinct customers, and only when they clicked.
EDIT :
I just realised, it's potentially even simpler than that...
SELECT
country_id,
partner,
retailer,
COUNT(*) AS clicks,
COUNT(DISTINCT id_customer) AS unique_clicks
FROM
test1
WHERE
id_clickout IS NOT NULL
GROUP BY
1,2,3
;
The only material difference in the results will be that any country_id, partner, retailer that previously showed up with 0 clicks will now not appear in the results at all.
With an INDEX on country_id, partner, retailer, id_clickout, id_customer or country_id, partner, retailer, id_customer, id_clickout, however, this query should be significantly faster.
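To see the NULL handling of COUNT in action, here is a small sketch that runs the single-pass query against the question's sample rows plus one extra row with a NULL id_clickout. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the extra row for customer 102 is hypothetical and added only to show that it is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    "CREATE TABLE test1 (country_id INTEGER, partner TEXT, retailer TEXT, "
    "id_customer INTEGER, id_clickout TEXT)"
)
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "A", "B", 100, "XX"),
        (1, "A", "B", 100, "XX"),
        (1, "A", "B", 102, None),  # hypothetical row without a clickout
        (2, "A", "B", 100, "XX"),
        (2, "A", "B", 100, "GG"),
        (2, "A", "B", 100, "XX"),
        (2, "A", "B", 101, "XX"),
    ],
)

# COUNT(column) skips NULLs; the CASE yields NULL for rows without a clickout,
# so COUNT(DISTINCT ...) only sees customers who actually clicked.
click_stats = conn.execute(
    """
    SELECT country_id, partner, retailer,
           COUNT(id_clickout) AS clicks,
           COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL THEN id_customer END)
               AS unique_clicks
    FROM test1
    GROUP BY country_id, partner, retailer
    ORDER BY country_id
    """
).fetchall()
print(click_stats)
```

The row for customer 102 changes neither count, which is the behaviour the answer relies on.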
I think this is what you are after:
SELECT country_id,partner,retailer,COUNT(retailer) as `sum(clicks)`,count(distinct id_clickout) as `SUM(unique_clicks)`
FROM test1
GROUP BY country_id,partner,retailer
Result:
COUNTRY_ID PARTNER RETAILER SUM(CLICKS) SUM(UNIQUE_CLICKS)
1 A B 2 1
2 A B 4 2
See result in SQL Fiddle.

Limit count in sql

I have a query that looks like the below
SELECT
venueid as VENUES, venue2.venue AS LOCATION,
(SELECT COUNT(*) FROM events WHERE (VENUES = venueid) AND eventdate < CURDATE()) AS number
FROM events
INNER JOIN venues as venue2 ON events.venueid=venue2.id
GROUP BY VENUES
ORDER BY number DESC
I want to limit the count to the last 5 rows in the table (sorted by id); however, when I add LIMIT 0,5 the results don't seem to change. When counting, where do you add the LIMIT to restrict the number of rows being counted?
CREATE TABLE venues (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
venue VARCHAR(255)
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;
CREATE TABLE categories (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
category VARCHAR(255)
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;
CREATE TABLE events (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
eventdate DATE NOT NULL,
title VARCHAR(255),
venueid INT,
categoryid INT
) DEFAULT CHARACTER SET utf8 ENGINE=InnoDB;
INSERT INTO venues (id, venue) VALUES
(1, 'USA'),
(2, 'UK'),
(3, 'Japan');
INSERT INTO categories (id, category) VALUES
(1, 'Jazz'),
(2, 'Rock'),
(3, 'Pop');
INSERT INTO events (id, eventdate, title, venueid, categoryid) VALUES
(1,20121003,'Title number 1',1,3),
(2,20121010,'Title number 2',2,1),
(3,20121015,'Title number 3',3,2),
(4,20121020,'Title number 4',1,3),
(5,20121022,'Title number 5',2,1),
(6,20121025,'Title number 6',3,2),
(7,20121030,'Title number 7',1,3),
(8,20121130,'Title number 8',1,1),
(9,20121230,'Title number 9',1,2),
(10,20130130,'Title number 10',1,3);
The expected result should look like the below
|VENUES |LOCATION |NUMBER |
|1 | USA | 3 |
|2 | UK | 1 |
|3 | Japan | 1 |
As of the time of posting id 9,8,7,6,5 are the last 5 events before the current date.
See SQL Fiddle link below for full table details.
http://sqlfiddle.com/#!2/21ad85/32
This query gives you the five rows that you are trying to group and count:
SELECT *
FROM events
WHERE eventdate < CURDATE()
ORDER BY eventdate DESC
LIMIT 5
Now you can use this query as a subquery. You can join with the result of a subquery just as if it were an ordinary table:
SELECT
venueid as VENUES,
venue2.venue AS LOCATION,
COUNT(*) AS number
FROM
(
SELECT *
FROM events
WHERE eventdate < CURDATE()
ORDER BY eventdate DESC
LIMIT 5
) AS events
INNER JOIN venues as venue2 ON events.venueid=venue2.id
GROUP BY VENUES
ORDER BY number DESC
http://sqlfiddle.com/#!2/21ad85/37
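The limit-then-count pattern can be tried out with the question's sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). Two things differ from the original and are hypothetical choices: the cutoff date is passed as a parameter instead of calling CURDATE(), and the subquery is aliased as recent instead of reusing the name events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute("CREATE TABLE venues (id INTEGER PRIMARY KEY, venue TEXT)")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, eventdate TEXT, venueid INTEGER)")
conn.executemany("INSERT INTO venues VALUES (?, ?)", [(1, "USA"), (2, "UK"), (3, "Japan")])
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2012-10-03", 1), (2, "2012-10-10", 2), (3, "2012-10-15", 3),
        (4, "2012-10-20", 1), (5, "2012-10-22", 2), (6, "2012-10-25", 3),
        (7, "2012-10-30", 1), (8, "2012-11-30", 1), (9, "2012-12-30", 1),
        (10, "2013-01-30", 1),
    ],
)


def count_recent_by_venue(conn, cutoff, last_n=5):
    """Count events per venue among the last_n events dated before cutoff."""
    return conn.execute(
        """
        SELECT recent.venueid, v.venue, COUNT(*) AS number
        FROM (SELECT * FROM events
              WHERE eventdate < ?
              ORDER BY eventdate DESC
              LIMIT ?) AS recent
        INNER JOIN venues AS v ON recent.venueid = v.id
        GROUP BY recent.venueid, v.venue
        ORDER BY number DESC, recent.venueid
        """,
        (cutoff, last_n),
    ).fetchall()


venue_counts = count_recent_by_venue(conn, "2013-01-01")
print(venue_counts)
```

The LIMIT has to live inside the derived table: applied to the outer query it would limit the number of grouped result rows, not the rows being counted.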

SELECT newest record of any GROUP of records (ignoring records with one record)

Having trouble with a query to return the newest order of any grouped set of orders having more than 1 order. CREATE & INSERTs for the test data are below.
This query returns the unique customer id's I want to work with, along with the grouped order_id's. Of these records, I only need the most recent order (based on date_added).
SELECT COUNT(customer_id), customer_id, GROUP_CONCAT(order_id) FROM orderTable GROUP BY customer_id HAVING COUNT(customer_id)>1 LIMIT 10;
mysql> SELECT COUNT(customer_id), customer_id, GROUP_CONCAT(order_id) FROM orderTable GROUP BY customer_id HAVING COUNT(customer_id)>1 LIMIT 10;
+--------------------+-------------+------------------------+
| COUNT(customer_id) | customer_id | GROUP_CONCAT(order_id) |
+--------------------+-------------+------------------------+
| 2 | 0487 | F9,Z33 |
| 3 | 1234 | 3A,5A,88B |
+--------------------+-------------+------------------------+
2 rows in set (0.00 sec)
I'm looking for order Z33 (customer_id 0487) and 3A (customer_id 1234).
For clarification, I do not want orders for customers that have only ordered once.
Any help or tips to get me pointed in the right direction appreciated.
Sample table data:
--
-- Table structure for table orderTable
CREATE TABLE IF NOT EXISTS orderTable (
customer_id varchar(10) NOT NULL,
order_id varchar(4) NOT NULL,
date_added date NOT NULL,
PRIMARY KEY (customer_id,order_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
--
-- Dumping data for table orderTable
INSERT INTO orderTable (customer_id, order_id, date_added) VALUES
('1234', '5A', '1997-01-22'),
('1234', '88B', '1992-05-09'),
('0487', 'F9', '2002-01-23'),
('5799', 'A12F', '2007-01-23'),
('1234', '3A', '2009-01-22'),
('3333', '7FHS', '2009-01-22'),
('0487', 'Z33', '2004-06-23');
==========================================================
Clarification of the query.
The question asks for only those customers with more than one order, so the inner query (PreQuery) applies that condition together with the GROUP BY. It returns only customers with multiple orders and, for each of them, the date of the most recent order. PreQuery is then joined back to the orders table on the customer ID, keeping only the order whose date matches that last date. A customer with a single order has a count of 1 inside PreQuery and is therefore excluded from its result set.
select ot.*
from
( select
customer_id,
max( date_added ) as LastOrderDate
from
orderTable
group by
customer_id
having
count(*) > 1 ) PreQuery
join orderTable ot
on PreQuery.customer_id = ot.customer_id
and PreQuery.LastOrderDate = ot.date_added
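The PreQuery approach can be checked against the question's sample data. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the ORDER BY at the end is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
conn.execute(
    "CREATE TABLE orderTable (customer_id TEXT NOT NULL, order_id TEXT NOT NULL, "
    "date_added TEXT NOT NULL, PRIMARY KEY (customer_id, order_id))"
)
conn.executemany(
    "INSERT INTO orderTable VALUES (?, ?, ?)",
    [
        ("1234", "5A", "1997-01-22"),
        ("1234", "88B", "1992-05-09"),
        ("0487", "F9", "2002-01-23"),
        ("5799", "A12F", "2007-01-23"),
        ("1234", "3A", "2009-01-22"),
        ("3333", "7FHS", "2009-01-22"),
        ("0487", "Z33", "2004-06-23"),
    ],
)

# Inner query: latest order date per customer, restricted to customers with
# more than one order. Outer join: pick the order row carrying that date.
newest_orders = conn.execute(
    """
    SELECT ot.customer_id, ot.order_id
    FROM (SELECT customer_id, MAX(date_added) AS LastOrderDate
          FROM orderTable
          GROUP BY customer_id
          HAVING COUNT(*) > 1) AS PreQuery
    JOIN orderTable AS ot
      ON PreQuery.customer_id = ot.customer_id
     AND PreQuery.LastOrderDate = ot.date_added
    ORDER BY ot.customer_id
    """
).fetchall()
print(newest_orders)
```

Customer 3333 shares the date 2009-01-22 with customer 1234's newest order but is still excluded, because the join also matches on customer_id.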