EDITED Looking for SQL improvement - mysql

Based on the question "Collaborative filtering in MySQL?", I have created the following table and data:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
`rate` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
insert into ub values (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
insert into ub values (3, 'X', '10'), (3, 'Y', '8'), (3, 'C', '10'), (3,'Z', '10');
insert into ub values (4, 'W', '8'), (4, 'Q', '8'), (4, 'C', '10'), (4,'Z', '8');
Then I was able to build the following table and understand how it works:
create temporary table ub_rank as
select similar.user_id, count(*) AS `rank`
from ub target
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id and target.rate= similar.rate
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+---------+------+
However, I get confused by the following query:
select similar.rate, similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate
where target.book_id is null
group by similar.book_id
order by total_rank desc, rate desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| X | 4 |
| D | 3 |
| Z | 2 |
| Y | 1 |
| Q | 1 |
| W | 1 |
+---------+------------+
(SOLVED) First, I am wondering why the total ranks of X and D are not the same (i.e. 3). Doesn't it count the number of books that user B has in common with user A? So shouldn't both D and X be 3?!
(SOLVED) Second, how should I modify the code so that the rate acts as a factor in the ranking? That is, if two books have the same rank, the one with the higher rating should be placed higher.
Thanks
EDITED
(1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
(2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
What I want to do is this: suppose users 1 and 2 behave similarly (both chose A, B and C with matching ratings); then I would recommend D to user 1, as it has a higher rate than X.
The code above does not seem to do so, as X is ranked first.

First, I am wondering why the total ranks of X and D are not the same (i.e. 3). Doesn't it count the number of books that user B has in common with user A? So shouldn't both D and X be 3?!
X has a greater total rank because it appears for both user_id 2 and user_id 3, and the query sums the ranks of every similar user who has the book: in this case 3 (user_id = 2) + 1 (user_id = 3) = 4.
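To check that arithmetic, here is a minimal sketch that replays the question's data and both queries. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (only features both engines share are used), and it names the similarity column user_rank instead of rank; that rename is hypothetical and only avoids the RANK window-function name in newer engines.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ub (
    user_id INTEGER NOT NULL,
    book_id TEXT    NOT NULL,
    rate    INTEGER NOT NULL,
    PRIMARY KEY (user_id, book_id)
);
INSERT INTO ub VALUES (1,'A',8),(1,'B',7),(1,'C',10);
INSERT INTO ub VALUES (2,'A',8),(2,'B',7),(2,'C',10),(2,'D',8),(2,'X',7);
INSERT INTO ub VALUES (3,'X',10),(3,'Y',8),(3,'C',10),(3,'Z',10);
INSERT INTO ub VALUES (4,'W',8),(4,'Q',8),(4,'C',10),(4,'Z',8);

-- For each other user: how many books they rated exactly like user 1.
CREATE TEMP TABLE ub_rank AS
SELECT similar.user_id, COUNT(*) AS user_rank
FROM ub target
JOIN ub similar
  ON target.book_id = similar.book_id
 AND target.user_id != similar.user_id
 AND target.rate = similar.rate
WHERE target.user_id = 1
GROUP BY similar.user_id;
""")

user_rank = dict(conn.execute("SELECT user_id, user_rank FROM ub_rank").fetchall())

# Each remaining book of a similar user contributes that user's rank; the SUM adds them.
total_rank = dict(conn.execute("""
SELECT similar.book_id, SUM(ub_rank.user_rank) AS total_rank
FROM ub_rank
JOIN ub similar ON ub_rank.user_id = similar.user_id
LEFT JOIN ub target
  ON target.user_id = 1
 AND target.book_id = similar.book_id
 AND target.rate = similar.rate
WHERE target.book_id IS NULL   -- drop books user 1 already rated the same way
GROUP BY similar.book_id
""").fetchall())

print(user_rank)   # user 2 -> 3, users 3 and 4 -> 1
print(total_rank)  # X: 3 + 1 = 4, D: 3
```

D only comes from user 2, so it collects a single rank of 3, while X collects a rank from two different users.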
Second, how should I modify the code so that the rate acts as a factor in the ranking? That is, if two books have the same rank, the one with the higher rating should be placed higher.
Use the same query and order by the rate, descending, after the total rank, like:
select similar.book_id, sum(ub_rank.rank) total_rank, max(similar.rate) max_rate
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.rate = similar.rate
where target.book_id is null
group by similar.book_id
order by total_rank desc, max_rate desc;
Update: As per your requirement, you need to get the list of books that have the closest match with other users and the highest rate. Try the query below:
SELECT
temp.book_id,
temp.rate as book_rate
FROM (
SELECT
similar.user_id,
COUNT( similar.book_id ) as book_match_count
FROM
ub target
JOIN ub similar ON target.book_id= similar.book_id AND target.user_id != similar.user_id
WHERE
target.user_id = 1
GROUP BY
similar.user_id
) AS users_with_book_matches
JOIN ub temp ON ( temp.user_id =users_with_book_matches.user_id AND temp.book_id NOT IN ( SELECT book_id FROM ub WHERE ub.user_id = 1 ) )
GROUP BY
temp.book_id
ORDER BY
users_with_book_matches.book_match_count DESC,
temp.rate DESC
limit 5
The above query returns the top 5 closest book matches.
Here's the SqlFiddle; make sure to change the user_id in both places. Hope this serves your purpose.

How do I select the max(timestamp) from a relational mysql table fast

We are developing a ticket system, and for the dashboard we want to show each ticket with its latest status. We have two tables: the first one for the ticket itself and a second one for the individual edits.
The system is already running, but dashboard performance is very bad (6 seconds for ~1300 tickets). At first we used a statement that selected 'where timestamp = (select max(Timestamp))' for every ticket. In a second step we created a view that only includes the latest timestamp for every ticket, but we are not able to also include the correct status in this view.
So the main problem might be that we can't build a table that selects, for every ticket, both the latest ins_date and the latest status.
The simplified database looks like this:
CREATE TABLE `ticket` (
`id` int(10) NOT NULL,
`betreff` varchar(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_relation` (
`id` int(11) NOT NULL,
`ticket` int(10) NOT NULL,
`info` varchar(10000) DEFAULT NULL,
`status` int(1) NOT NULL DEFAULT '0',
`ins_date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`ins_user` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `ticket` (`id`, `betreff`) VALUES
(1, 'Technische Frage'),
(2, 'Ticket 2'),
(3, 'Weitere Fragen');
INSERT INTO `ticket_relation` (`id`, `ticket`, `info`, `status`, `ins_date`, `ins_user`) VALUES
(1, 1, 'Betreff 1', 0, '2019-05-28 11:02:18', 123),
(2, 1, 'Betreff 2', 3, '2019-05-28 12:07:36', 123),
(3, 2, 'Betreff 3', 0, '2019-05-29 06:49:32', 123),
(4, 3, 'Betreff 4', 1, '2019-05-29 07:44:07', 123),
(5, 2, 'Betreff 5', 1, '2019-05-29 07:49:32', 123),
(6, 2, 'Betreff 6', 3, '2019-05-29 08:49:32', 123),
(7, 3, 'Betreff 7', 2, '2019-05-29 09:49:32', 123),
(8, 2, 'Betreff 8', 1, '2019-05-29 10:49:32', 123),
(9, 3, 'Betreff 9', 2, '2019-05-29 11:49:32', 123),
(10, 3, 'Betreff 10', 3, '2019-05-29 12:49:32', 123);
I have created a SQL Fiddle: http://sqlfiddle.com/#!9/a873b6/3
The first three statements are attempts that either don't work correctly or are way too slow. I think the last one is the key, but I don't understand why it gets the status wrong.
The attempt to create the table with the latest ins_date AND status for each ticket:
SELECT
ticket, status, MAX(ins_date) as max_date
FROM
ticket_relation
GROUP BY
ticket
ORDER BY
ins_date DESC;
This query gets the correct (latest) ins_date for every ticket, but not the latest status:
+--------+--------+----------------------+
| ticket | status | max_date |
+--------+--------+----------------------+
| 3 | 1 | 2019-05-29T12:49:32Z |
+--------+--------+----------------------+
| 2 | 0 | 2019-05-29T10:49:32Z |
+--------+--------+----------------------+
| 1 | 0 | 2019-05-28T12:07:36Z |
+--------+--------+----------------------+
Expected output would be this:
+--------+--------+----------------------+
| ticket | status | max_date |
+--------+--------+----------------------+
| 3 | 3 | 2019-05-29T12:49:32Z |
+--------+--------+----------------------+
| 2 | 1 | 2019-05-29T10:49:32Z |
+--------+--------+----------------------+
| 1 | 3 | 2019-05-28T12:07:36Z |
+--------+--------+----------------------+
Is there an efficient way to select the latest timestamp and status for every ticket in the ticket table?
Another approach is to think in terms of filtering rather than GROUPing.
Query
SELECT
ticket_relation_1.ticket
, ticket_relation_1.status
, ticket_relation_1.ins_date
FROM
ticket_relation AS ticket_relation_1
LEFT JOIN
ticket_relation AS ticket_relation_2
ON
ticket_relation_1.ticket = ticket_relation_2.ticket
AND
ticket_relation_1.ins_date < ticket_relation_2.ins_date
WHERE
ticket_relation_2.id IS NULL
ORDER BY
ticket_relation_1.id DESC
Result
| ticket | status | ins_date |
| ------ | ------ | ------------------- |
| 3 | 3 | 2019-05-29 12:49:32 |
| 2 | 1 | 2019-05-29 10:49:32 |
| 1 | 3 | 2019-05-28 12:07:36 |
see demo
This query needs an index KEY(ticket, ins_date, id) to get maximum performance.
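A minimal sketch of this filtering (anti-join) approach, replaying the question's sample data with shorter aliases. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, stores ins_date as ISO-8601 text (so string comparison sorts it chronologically), and omits the info and ins_user columns, which the query does not use; the index name idx_ticket_date is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; ins_date is ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket_relation (
    id       INTEGER PRIMARY KEY,
    ticket   INTEGER NOT NULL,
    status   INTEGER NOT NULL DEFAULT 0,
    ins_date TEXT
);
INSERT INTO ticket_relation (id, ticket, status, ins_date) VALUES
 (1,1,0,'2019-05-28 11:02:18'), (2,1,3,'2019-05-28 12:07:36'),
 (3,2,0,'2019-05-29 06:49:32'), (4,3,1,'2019-05-29 07:44:07'),
 (5,2,1,'2019-05-29 07:49:32'), (6,2,3,'2019-05-29 08:49:32'),
 (7,3,2,'2019-05-29 09:49:32'), (8,2,1,'2019-05-29 10:49:32'),
 (9,3,2,'2019-05-29 11:49:32'), (10,3,3,'2019-05-29 12:49:32');
-- Composite index as suggested above (hypothetical name).
CREATE INDEX idx_ticket_date ON ticket_relation (ticket, ins_date, id);
""")

latest = conn.execute("""
SELECT r1.ticket, r1.status, r1.ins_date
FROM ticket_relation AS r1
LEFT JOIN ticket_relation AS r2
  ON r1.ticket = r2.ticket
 AND r1.ins_date < r2.ins_date
WHERE r2.id IS NULL            -- keep r1 only if no newer row exists for its ticket
ORDER BY r1.id DESC
""").fetchall()

for row in latest:
    print(row)
```

A row survives only when the self-join finds no later edit for the same ticket. If two edits of one ticket ever shared the same ins_date, both would be returned, so id might be worth adding as a tie-breaker.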
One solution would be to use a subquery to compute the latest insert date for each ticket, and then to join the results with the original table, like:
SELECT t.ticket, t.status, t.ins_date
FROM ticket_relation t
INNER JOIN (
SELECT ticket, max(ins_date) max_ins_date
FROM ticket_relation
GROUP BY ticket
) x ON t.ticket = x.ticket AND t.ins_date = x.max_ins_date
For better performance with this query, you want an index on (ticket, ins_date).
Another option is to use a NOT EXISTS condition to ensure that only the latest record is selected, like:
SELECT t.ticket, t.status, t.ins_date
FROM ticket_relation t
WHERE NOT EXISTS (
SELECT 1
FROM ticket_relation t1
WHERE t1.ticket = t.ticket AND t1.ins_date > t.ins_date
)
NB: when dealing with GROUP BY, all non-aggregated columns must appear in the GROUP BY clause. Otherwise, you will get either an error or unpredictable results (depending on whether the ONLY_FULL_GROUP_BY SQL mode is, respectively, enabled or disabled).
If you are able to upgrade to a recent version of MySQL (8.0), window functions can be used to simplify the query and possibly increase its performance, like:
SELECT ticket, status, ins_date
FROM (
SELECT
ticket,
status,
ins_date,
row_number() over(partition by ticket order by ins_date desc) rn
FROM ticket_relation
) x WHERE rn = 1
You can try the query below:
SELECT
ticket, status, ins_date as max_date
FROM ticket_relation a
where ins_date in (select max(ins_date) from ticket_relation b where a.ticket=b.ticket)
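As a sanity check that the four queries above (join on a MAX subquery, NOT EXISTS, ROW_NUMBER(), correlated IN) agree, here is a sketch that runs each one against the question's sample data. It assumes Python's sqlite3 module as a stand-in for MySQL (ROW_NUMBER() needs SQLite 3.25 or newer), stores ins_date as ISO-8601 text, and drops the info and ins_user columns; the dictionary keys are just hypothetical labels.

```python
import sqlite3

# Assumption: SQLite (3.25+ for window functions) stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket_relation (
    id INTEGER PRIMARY KEY, ticket INTEGER NOT NULL,
    status INTEGER NOT NULL DEFAULT 0, ins_date TEXT
);
INSERT INTO ticket_relation (id, ticket, status, ins_date) VALUES
 (1,1,0,'2019-05-28 11:02:18'), (2,1,3,'2019-05-28 12:07:36'),
 (3,2,0,'2019-05-29 06:49:32'), (4,3,1,'2019-05-29 07:44:07'),
 (5,2,1,'2019-05-29 07:49:32'), (6,2,3,'2019-05-29 08:49:32'),
 (7,3,2,'2019-05-29 09:49:32'), (8,2,1,'2019-05-29 10:49:32'),
 (9,3,2,'2019-05-29 11:49:32'), (10,3,3,'2019-05-29 12:49:32');
""")

QUERIES = {
    "join_max_subquery": """
        SELECT t.ticket, t.status, t.ins_date
        FROM ticket_relation t
        JOIN (SELECT ticket, MAX(ins_date) AS max_ins_date
              FROM ticket_relation GROUP BY ticket) x
          ON t.ticket = x.ticket AND t.ins_date = x.max_ins_date""",
    "not_exists": """
        SELECT t.ticket, t.status, t.ins_date
        FROM ticket_relation t
        WHERE NOT EXISTS (SELECT 1 FROM ticket_relation t1
                          WHERE t1.ticket = t.ticket AND t1.ins_date > t.ins_date)""",
    "row_number": """
        SELECT ticket, status, ins_date FROM (
            SELECT ticket, status, ins_date,
                   ROW_NUMBER() OVER (PARTITION BY ticket ORDER BY ins_date DESC) AS rn
            FROM ticket_relation) x
        WHERE rn = 1""",
    "correlated_in": """
        SELECT ticket, status, ins_date
        FROM ticket_relation a
        WHERE ins_date IN (SELECT MAX(ins_date) FROM ticket_relation b
                           WHERE a.ticket = b.ticket)""",
}

# Sort each result so that row order does not affect the comparison.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}

for name, rows in results.items():
    print(name, rows)
```

All four return the same three rows here; which one is fastest on MySQL depends on the version and on the (ticket, ins_date) index being present, so measuring on the real data is the safer way to choose.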

How to join two mysql table and retrieve latest results from joined table?

I have two tables. I need to join them and retrieve the latest status from the execution table. How can I do that?
My schema and data:
CREATE TABLE test
(`id` serial primary key, `ref_id` int, `ref_name` varchar(7))
;
INSERT INTO test
(`id`, `ref_id`, `ref_name`)
VALUES
(1, 1, 'trial'),
(2, 3, 'test'),
(3, 7, 'testing')
;
CREATE TABLE execution
(`id` serial primary key, `ref_id` int, `status` varchar(11))
;
INSERT INTO execution
(`id`, `ref_id`, `status`)
VALUES
(1, 1, 'Completed'),
(2, 2, 'Completed'),
(3, 1, 'Completed'),
(4, 3, 'In progress'),
(5, 3, 'To do'),
(6, 2, 'In progress'),
(7, 1, 'Completed'),
(8, 1, 'To do')
;
Expected result is here below.
ref_id | ref_name | status |
3 | testing | In progress |
2 | test | To do |
1 | trial | To do |
I have tried the query below:
SELECT
ref_id,
ref_name,
status
FROM
test
JOIN execution ON test.ref_id = execution.ref_id
GROUP BY `ref_id`
ORDER BY `ref_id` DESC;
This query retrieves a status, but not the latest one. How can I retrieve the latest status by joining these two tables?
You can use the query below:
select T2.ref_id,T2.ref_name,OE.status from
(
select t1.ref_id,t1.ref_name,e.id from test t1 inner join
(select max(id) as id,ref_id from execution group by ref_id) as e
on
t1.ref_id=e.ref_id
) as T2
inner join execution OE on T2.id=OE.id
https://www.db-fiddle.com/f/rvnm8APX27dmW9a84JkCsS/1
It seems you have given incorrect example data, as ref_id 7 is not found in the execution table. However, this might help you:
SELECT b.ref_id,
b.ref_name,
a.status
FROM execution a
JOIN (SELECT MAX(id) id ,ref_id
FROM execution
GROUP BY ref_id) a1
USING(id,ref_id)
JOIN test b ON a.ref_id = b.ref_id ORDER BY ref_id DESC;
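A sketch that replays the query above on the question's data, using Python's sqlite3 module as a stand-in for MySQL. Assumption: the last execution row is given id 8, because the question's two rows with id 7 would violate the primary key. The USING (id, ref_id) join is written as the equivalent ON condition.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the duplicate id 7 is renumbered to 8.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (id INTEGER PRIMARY KEY, ref_id INTEGER, ref_name TEXT);
INSERT INTO test VALUES (1,1,'trial'), (2,3,'test'), (3,7,'testing');
CREATE TABLE execution (id INTEGER PRIMARY KEY, ref_id INTEGER, status TEXT);
INSERT INTO execution VALUES
 (1,1,'Completed'), (2,2,'Completed'), (3,1,'Completed'), (4,3,'In progress'),
 (5,3,'To do'), (6,2,'In progress'), (7,1,'Completed'), (8,1,'To do');
""")

rows = conn.execute("""
SELECT b.ref_id, b.ref_name, a.status
FROM execution a
JOIN (SELECT MAX(id) AS id, ref_id
      FROM execution
      GROUP BY ref_id) a1                  -- newest execution id per ref_id
  ON a.id = a1.id AND a.ref_id = a1.ref_id
JOIN test b ON a.ref_id = b.ref_id
ORDER BY b.ref_id DESC
""").fetchall()

print(rows)
```

This treats the highest id as the most recent execution, which only holds if ids are assigned in insertion order; ref_id 7 disappears because the inner join finds no execution row for it.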

(mysql) Select 50 highest rated items, with at most one item coming from each user

I'm not sure how to go about doing this efficiently in MySQL and would appreciate any help.
The goal is to select 50 of the top-selling items, with at most one item from each user. I'm used to doing this with either CTEs or DISTINCT ON, but of course that's not an option in MySQL. I'm hoping for a single-query solution, and I'd like to avoid using stored procedures.
The basic schema is a table of items posted by users, and a table of sales with a field determining the score of that particular sale.
CREATE TABLE items (
item_id INT PRIMARY KEY,
user_id INT NOT NULL
);
CREATE TABLE sales (
item_id INT NOT NULL,
score INT NOT NULL
);
-- Create some sample data
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2), (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
The result of the query against this sample data should be
+---------+---------+-------------+
| user_id | item_id | total_score |
+---------+---------+-------------+
| 2 | 4 | 5 |
| 1 | 3 | 3 |
| 3 | 6 | 3 |
+---------+---------+-------------+
Here's the PostgreSQL solution:
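SELECT user_id, item_id, total_score
FROM (
SELECT DISTINCT ON (items.user_id)
items.user_id,
items.item_id,
SUM(sales.score) AS total_score
FROM items
JOIN sales ON (sales.item_id = items.item_id)
GROUP BY items.item_id
ORDER BY items.user_id, total_score DESC
) AS best_per_user
ORDER BY total_score DESC
LIMIT 50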
Here's the MySQL solution I've come up with, but it's quite ugly. I tried doing essentially the same thing using a temporary table, but in the process learned that MySQL doesn't allow joining to a temporary table multiple times in the same query.
SELECT items_scores.user_id, items_scores.item_id, items_scores.total_score
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS items_scores
WHERE items_scores.total_score =
(
SELECT MAX(t.total_score)
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS t
WHERE t.user_id = items_scores.user_id
)
ORDER BY items_scores.total_score DESC
MySQL query for it:
select user, item, total_score
from (
select sum(sales.score) as total_score, items.user_id as user, items.item_id as item
from sales
inner join items on sales.item_id = items.item_id
group by item,user
order by total_score desc) as t
group by user limit 50;
Output:
+------+------+-------------+
| user | item | total_score |
+------+------+-------------+
| 1 | 3 | 3 |
| 2 | 4 | 5 |
| 3 | 6 | 3 |
+------+------+-------------+
3 rows in set (0.00 sec)
Some explanation
MySQL documentation says:
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
In our subquery, the nonaggregated columns are user_id and item_id, and we expect them to be the same for every group we are summing over. Also, we are not using any ORDER BY that could influence the aggregation; we want all the values of the group to be summed up. Finally, we sort the output and save it as a derived table.
Then we run a SELECT on this derived table, GROUP BY user, and LIMIT the output to 50.
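The outer GROUP BY user above still depends on the server picking the first row of each group from the sorted derived table, which the quoted documentation says is not guaranteed. On MySQL 8.0+, ROW_NUMBER() makes the choice explicit. Here is a sketch of that variant, run through Python's sqlite3 module as a stand-in (SQLite 3.25+ for window functions); the aliases totals and ranked, and the item_id and user_id tie-breakers, are hypothetical additions to make ties deterministic.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE sales (item_id INTEGER NOT NULL, score INTEGER NOT NULL);
INSERT INTO items VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,3),(7,3);
INSERT INTO sales VALUES (1,1),(1,1),(2,1),(3,2),(3,1),(4,3),(4,2),(5,2),
                         (6,1),(6,1),(6,1),(7,2);
""")

rows = conn.execute("""
SELECT user_id, item_id, total_score
FROM (
    -- Number each user's items from best to worst total score.
    SELECT user_id, item_id, total_score,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY total_score DESC, item_id) AS rn
    FROM (
        SELECT items.user_id, items.item_id, SUM(sales.score) AS total_score
        FROM items
        JOIN sales ON sales.item_id = items.item_id
        GROUP BY items.user_id, items.item_id
    ) totals
) ranked
WHERE rn = 1                                -- at most one item per user
ORDER BY total_score DESC, user_id
LIMIT 50
""").fetchall()

print(rows)
```

On the sample data this yields the expected result from the question: user 2 with item 4 (5), then user 1 with item 3 (3) and user 3 with item 6 (3).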

SQL for filtering

Based on the question "Collaborative filtering in MySQL?", I have created the following table and data:
CREATE TABLE `ub` (
`user_id` int(11) NOT NULL,
`book_id` varchar(10) NOT NULL,
`rate` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`book_id`),
UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
insert into ub values (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
insert into ub values (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
insert into ub values (3, 'X', '10'), (3, 'Y', '8'), (3, 'C', '10'), (3,'Z', '10');
insert into ub values (4, 'W', '8'), (4, 'Q', '8'), (4, 'C', '10'), (4,'Z', '8');
Then I was able to build the following table and understand how it works:
create temporary table ub_rank as
select similar.user_id, count(*) AS `rank`
from ub target
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id and target.rate= similar.rate
where target.user_id = 1
group by similar.user_id;
select * from ub_rank;
+---------+------+
| user_id | rank |
+---------+------+
| 2 | 3 |
| 3 | 1 |
| 4 | 1 |
+---------+------+
However, I get confused by the following query:
select similar.rate, similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate
where target.book_id is null
group by similar.book_id
order by total_rank desc, rate desc;
+---------+------------+
| book_id | total_rank |
+---------+------------+
| X | 4 |
| D | 3 |
| Z | 2 |
| Y | 1 |
| Q | 1 |
| W | 1 |
+---------+------------+
(1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
(2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
What I want to do is this: suppose users 1 and 2 behave similarly (both chose A, B and C with matching ratings); then I would recommend D to user 1, as it has a higher rate than X.
The code above does not seem to do so, as X is ranked first. How can I change the code to achieve this goal?
Or is the existing method actually better/more accurate for recommendations?
The existing query is ranking the results based on the total value of rank for each book, and then using rate as a tie-break for books which have the same total rank. (Also, rate will essentially be random since similar.rate is not aggregated, grouped on or functionally dependent on a grouping item in the query.)
As such, X will be ranked higher than D because it has been chosen by one user of rank 3 and one user of rank 1, giving a total rank of 4, whereas D has only been chosen by one user of rank 3.
You could change the query to include a rating element weighted by ranking - for example:
select similar.book_id,
sum(ub_rank.rank) total_rank,
sum(ub_rank.rank*similar.rate) wtd_rate
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate
where target.book_id is null
group by similar.book_id
order by wtd_rate desc, total_rank desc
- although in this case this will still rank X higher, as it has a rating of 7 from a user of rank 3 plus a rating of 10 from a user of rank 1, giving a weighted rate of 31 (3 × 7 + 1 × 10), compared with D's weighted rate of 24 (3 × 8).
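The weighted ordering can be replayed with a small sketch, assuming Python's sqlite3 module as a stand-in for MySQL and a hypothetical column name user_rank in place of rank. The final book_id sort key is an addition that only makes ties deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; user_rank replaces the rank column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ub (user_id INTEGER, book_id TEXT, rate INTEGER,
                 PRIMARY KEY (user_id, book_id));
INSERT INTO ub VALUES (1,'A',8),(1,'B',7),(1,'C',10),
  (2,'A',8),(2,'B',7),(2,'C',10),(2,'D',8),(2,'X',7),
  (3,'X',10),(3,'Y',8),(3,'C',10),(3,'Z',10),
  (4,'W',8),(4,'Q',8),(4,'C',10),(4,'Z',8);
CREATE TEMP TABLE ub_rank AS
SELECT similar.user_id, COUNT(*) AS user_rank
FROM ub target
JOIN ub similar
  ON target.book_id = similar.book_id
 AND target.user_id != similar.user_id
 AND target.rate = similar.rate
WHERE target.user_id = 1
GROUP BY similar.user_id;
""")

rows = conn.execute("""
SELECT similar.book_id,
       SUM(ub_rank.user_rank)               AS total_rank,
       SUM(ub_rank.user_rank * similar.rate) AS wtd_rate  -- rating weighted by similarity
FROM ub_rank
JOIN ub similar ON ub_rank.user_id = similar.user_id
LEFT JOIN ub target
  ON target.user_id = 1
 AND target.book_id = similar.book_id
 AND target.rate = similar.rate
WHERE target.book_id IS NULL
GROUP BY similar.book_id
ORDER BY wtd_rate DESC, total_rank DESC, similar.book_id
""").fetchall()

for row in rows:
    print(row)
```

X comes first with 31 against D's 24, matching the arithmetic above.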
(SQLFiddle here)
If you want X to rank higher than D, you need to decide what criteria you are going to use that would rank X higher than D.

MySQL Ordering a query - further question

Further to a recently answered question, I have the following code:
SELECT q21coding, COUNT(q21coding) AS Count
FROM tresults_acme
WHERE q21 IS NOT NULL AND q21 <> ''
GROUP BY q21coding
ORDER BY IF(q21coding = 'Other', 1, 0) ASC, Count DESC
It brings back the following:
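+-------------------------------------------+-------+
| q21coding                                 | Count |
+-------------------------------------------+-------+
| Difficulty in navigating/finding content  |    53 |
| Positive comments                         |    28 |
| Suggestions for improvement               |    14 |
| Inappropriate content/use                 |    13 |
| Improve search facility                   |     6 |
| Include information about staff and teams |     5 |
| Content needs updating                    |     4 |
| Other                                     |    30 |
+-------------------------------------------+-------+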
You'll notice that Other is now at the bottom. However, is there a way of ensuring that Positive comments and Other are ALWAYS the bottom two (with Other at the very bottom), regardless of the Count?
Thanks,
Homer
Actually, there was no need to use IF(q21coding = 'Other', 1, 0) in your original query. In MySQL you can use any expression in the ORDER BY clause, and q21coding = 'Other' would have been enough:
... ORDER BY q21coding = 'Other', Count DESC
The q21coding = 'Other' expression returns 1 if true and 0 if false. Since the sort is ascending, the rows where it is 1 come after all the rows where it is 0, which puts rows with q21coding = 'Other' at the bottom.
What you need to do to have 'Positive comments' and 'Other' both at the bottom is something like this:
... ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC
Basic test case:
CREATE TABLE my_table (id int, q21coding varchar(100), count int);
INSERT INTO my_table VALUES (1, 'Inappropriate content/use', 13);
INSERT INTO my_table VALUES (2, 'Other', 30);
INSERT INTO my_table VALUES (3, 'Difficulty in navigating/finding content', 53);
INSERT INTO my_table VALUES (4, 'Positive comments', 28);
INSERT INTO my_table VALUES (5, 'Improve search facility', 6);
INSERT INTO my_table VALUES (6, 'Content needs updating', 4);
INSERT INTO my_table VALUES (7, 'Suggestions for improvement', 14);
INSERT INTO my_table VALUES (8, 'Include information about staff and teams', 5);
Result:
SELECT q21coding, count
FROM my_table
ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC;
+-------------------------------------------+-------+
| q21coding | count |
+-------------------------------------------+-------+
| Difficulty in navigating/finding content | 53 |
| Suggestions for improvement | 14 |
| Inappropriate content/use | 13 |
| Improve search facility | 6 |
| Include information about staff and teams | 5 |
| Content needs updating | 4 |
| Positive comments | 28 |
| Other | 30 |
+-------------------------------------------+-------+
8 rows in set (0.00 sec)