Group Concat TWO columns but in GROUPS -- THREE separators involved - mysql

We have a MySQL table:
id name groupid
1 user1 0
2 user2 0
3 user3 1
4 user4 1
We want the GROUP CONCAT such that we get the output as
1,user1;2,user2---3,user3;4,user4

This does what you describe:
create table NoOneEverNamesTheTableInSqlQuestions (
id int,
name text,
groupid int
);
insert into NoOneEverNamesTheTableInSqlQuestions values
(1, 'user1', 0),
(2, 'user2', 0),
(3, 'user3', 1),
(4, 'user4', 1);
select group_concat(g separator '---') as output
from (
select group_concat(concat_ws(',',id,name) separator ';') as g
from NoOneEverNamesTheTableInSqlQuestions
group by groupid
) as g;
Output, tested with MySQL 8.0.0-dmr:
+-----------------------------------+
| output |
+-----------------------------------+
| 1,user1;2,user2---3,user3;4,user4 |
+-----------------------------------+
But I don't know why you would want to do this. It seems like something that would be easier to do in application code.
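For comparison, here is a minimal sketch of the same formatting done in application code. It assumes the rows have already been fetched as (id, name, groupid) tuples; the function name `format_groups` is hypothetical and not part of the original answer.

```python
from itertools import groupby


def format_groups(rows):
    """Join (id, name, groupid) rows: ',' inside a pair, ';' inside a group, '---' between groups."""
    # groupby only merges adjacent rows, so sort by groupid (then id) first.
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    groups = []
    for _, members in groupby(ordered, key=lambda r: r[2]):
        groups.append(";".join(f"{row_id},{name}" for row_id, name, _ in members))
    return "---".join(groups)


rows = [(1, "user1", 0), (2, "user2", 0), (3, "user3", 1), (4, "user4", 1)]
print(format_groups(rows))
```

Sorting explicitly also pins down the order inside each group, which the SQL version leaves to the server unless GROUP_CONCAT is given its own ORDER BY.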

Related

Using the count function on third table in two table select statement in MariaDB

I just spent a few hours reading through the MariaDB docs and various questions here trying to figure out a SQL statement that did what I want. I'm definitely not an expert... eventually I did get the result I expected, but I have no idea why it works. I want to be sure I am actually getting the result I want, and it isn't just working for the few test cases I have thrown at it.
I have three tables guestbook, users, and user_likes. I am trying to write a SQL statement that will return the user name and first name from users, post content, post date, post id from guestbook, and a third column likes which is the total number of times that post id from guestbook appears in the user_likes table. It should only return posts which are of type standard and should order the rows by ascending post date.
Sample data:
CREATE TABLE users
(`user_id` int, `user_first` varchar(6), `user_last` varchar(7),
`user_email` varchar(26), `user_uname` varchar(6))
;
INSERT INTO users
(`user_id`, `user_first`, `user_last`, `user_email`, `user_uname`)
VALUES
(0, 'Bob', 'Abc', 'email@example.com', 'user1'),
(13, 'Larry', 'Abc', 'email@example.com', 'user2'),
(15, 'Noel', 'Abc', 'email@example.com', 'user3'),
(16, 'Kate', 'Abc', 'email@example.com', 'user4'),
(17, 'Walter', 'Sobchak', 'walter.sobchak@shabbus.com', 'Walter'),
(18, 'Jae', 'Abc', 'email@example.com', 'user5')
;
CREATE TABLE user_likes
(`user_id` int, `post_id` int, `like_id` int)
;
INSERT INTO user_likes
(`user_id`, `post_id`, `like_id`)
VALUES
(0, 23, 1),
(0, 41, 2),
(13, 23, 7)
;
CREATE TABLE guestbook
(`post_id` int, `user_id` int, `post_date` datetime,
`post_content` varchar(27), `post_type` varchar(8),
`post_level` int, `post_parent` varchar(4))
;
INSERT INTO guestbook
(`post_id`, `user_id`, `post_date`, `post_content`,
`post_type`, `post_level`, `post_parent`)
VALUES
(2, 0, '2018-12-15 20:32:40', 'test1', 'testing', 0, NULL),
(8, 0, '2018-12-16 14:06:40', 'test2', 'testing', 0, NULL),
(9, 13, '2018-12-16 15:47:55', 'test4', 'testing', 0, NULL),
(23, 0, '2018-12-25 17:59:46', 'Merry Christmas!', 'standard', 0, NULL),
(39, 16, '2018-12-26 00:28:04', 'Hello!', 'standard', 0, NULL),
(40, 15, '2019-01-27 00:46:12', 'Hello 2', 'standard', 0, NULL),
(41, 18, '2019-02-25 00:44:35', 'What are you doing?', 'standard', 0, NULL)
;
I tried a whole bunch of convoluted statements involving count and couldn't get what I wanted. Through what seems like dumb luck I stumbled into creating this statement which appears to be giving me what I want.
SELECT
u.user_uname, u.user_first, g.post_id, g.post_date,
g.post_content, count(user_likes.post_id) AS likes
FROM
users AS u, guestbook AS g
LEFT JOIN
user_likes on g.post_id=user_likes.post_id
WHERE
u.user_id=g.user_id AND g.post_type='standard'
GROUP BY
g.post_id
ORDER BY
g.post_date ASC;
Question:
Why does this count function appear to work?
The count function that I was able to get working is this, but it only works for hard coded post_id values.
SELECT COUNT(CASE post_id WHEN 23 THEN 1 ELSE null END) FROM user_likes;
When I try to match the post_id from guestbook table by changing to this I get an incorrect value which appears to be the whole table of user_likes.
SELECT COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) FROM guestbook, user_likes;
Adding a GROUP BY guestbook.post_id to the end gets me closer, but now I need to figure out how to combine that with my original select statement.
+----------------------------------------------------------------------------+
| COUNT(case when guestbook.post_id=user_likes.post_id then 1 else null end) |
+----------------------------------------------------------------------------+
| 0 |
| 0 |
| 0 |
| 2 |
| 0 |
| 0 |
| 1 |
+----------------------------------------------------------------------------+
This is the output I want, which I am getting. I just don't trust that my statement is reliable or correct.
+------------+------------+---------+---------------------+---------------------+-------+
| user_uname | user_first | post_id | post_date | post_content | likes |
+------------+------------+---------+---------------------+---------------------+-------+
| user1 | Bob | 23 | 2018-12-25 17:59:46 | Merry Christmas! | 2 |
| user4 | Kate | 39 | 2018-12-26 00:28:04 | Hello! | 0 |
| user3 | Noel | 40 | 2019-01-27 00:46:12 | Hello 2 | 0 |
| user5 | Jae | 41 | 2019-02-25 00:44:35 | What are you doing? | 1 |
+------------+------------+---------+---------------------+---------------------+-------+
Fiddle of statement working: http://sqlfiddle.com/#!9/968656/1/0
JOIN + COUNT -- A query first combines the tables as directed by the JOIN and ON clauses. The result is put (at least logically) into a temporary table. Often this temp table has many more rows than any of the tables being JOINed.
Then the COUNT(..) is performed. It is counting the number of rows in that temp table. Maybe that count is exactly what you want, maybe it is a hugely inflated number.
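count(user_likes.post_id) has the additional hiccup of not counting any rows where user_likes.post_id IS NULL. With an inner join that is usually irrelevant, in which case you should simply say COUNT(*). With a LEFT JOIN it matters: a post with no likes still produces one joined row whose user_likes.post_id is NULL, so COUNT(user_likes.post_id) gives 0 where COUNT(*) would give 1. That is why your query reports 0 likes for posts 39 and 40.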
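To see the difference between counting a column and counting rows after a LEFT JOIN, here is a small sketch on a cut-down copy of the question's data. It uses Python's built-in sqlite3 module as a stand-in for MariaDB (an assumption made only so the example is self-contained); the trimmed tables keep just the columns the count needs.

```python
import sqlite3

# Assumption: SQLite stands in for MariaDB; COUNT(col) vs COUNT(*) behaves the same way.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guestbook (post_id INT, post_type TEXT);
    CREATE TABLE user_likes (user_id INT, post_id INT, like_id INT);
    INSERT INTO guestbook VALUES (23, 'standard'), (39, 'standard'), (41, 'standard');
    INSERT INTO user_likes VALUES (0, 23, 1), (0, 41, 2), (13, 23, 7);
""")

counts = conn.execute("""
    SELECT g.post_id,
           COUNT(ul.post_id) AS likes,        -- skips the NULL of an unmatched post
           COUNT(*)          AS joined_rows   -- counts every row the join produced
    FROM guestbook AS g
    LEFT JOIN user_likes AS ul ON ul.post_id = g.post_id
    WHERE g.post_type = 'standard'
    GROUP BY g.post_id
    ORDER BY g.post_id
""").fetchall()

for post_id, likes, joined_rows in counts:
    print(post_id, likes, joined_rows)
```

Post 39 has no likes, so it shows 0 in the `likes` column but 1 in `joined_rows`.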
Please don't use the commalist form for joining. Always use FROM a JOIN b ON ... where the ON clause says how tables a and b are related. If there is also some filtering, put that into the WHERE clause.
If the COUNT is too big, put aside the query you have developed and start over to develop a query that does exactly one thing -- compute the count. This query will probably use fewer tables.
Then build on that to get any other data you need. It may look something like
SELECT ...
FROM ( SELECT foo, COUNT(*) AS ct FROM t1 GROUP BY foo ) AS sub1
JOIN t2 ON t2.foo = sub1.foo
JOIN t3 ON ...
WHERE ...
Get that initial query that gets the right COUNT. Then, if needed, come back for more help.
As tried by Bryan
OK, I made a few changes.
SELECT u.user_uname, u.user_first,
g2.post_id, g2.post_content, g2.post_date,
sub.likes
FROM
(
SELECT g.post_id,
SUM(g.post_id = ul.post_id) AS likes
FROM guestbook AS g
JOIN user_likes AS ul
WHERE g.post_type = 'standard'
GROUP BY g.post_id -- one row per post; without this the SUM collapses to a single row
) AS sub
JOIN guestbook AS g2 ON sub.post_id = g2.post_id
JOIN users AS u ON u.user_id = g2.user_id;
Indexes:
guestbook: (post_type, post_id) -- for derived table
guestbook: (post_id) -- for outer SELECT
users: (user_id)
user_likes: (post_id)
Notes:
ORDER BY removed since it was useless in context.
COUNT..CASE changed to shorter SUM.
JOIN ON used
Since there is only one value coming from the derived table, this might work equally well:
SELECT u.user_uname, u.user_first,
g.post_id, g.post_content, g.post_date,
( SELECT COUNT(*)
FROM user_likes AS ul
WHERE g.post_id = ul.post_id
) AS likes
FROM guestbook AS g
JOIN users AS u USING(user_id)
WHERE g.post_type = 'standard';
This involved lots of changes; see if it looks 'right'. It is now a lot simpler.
Indexes are same as above.

SQL join multiple criteria

I have a difficult task to build up an array retrieved from a table similar to the one below:
table_a
id | scenario_id | entity_id
1 1;2;3;4;5 1;3
2 4;5;8;10 2;3
3 1;5;8;11 1;2;4;
4 3;5;8;9 4;5;
Now, if a user selects one entity_id, let's say 3, the SQL query should return something similar to:
scenario_id
1;2;3;4;5;8;10
Or, if he selects 5, the returned array should look like:
scenario_id
3;5;8;9
Could that be done using only SQL statements?
For SQL Server you can use this to get desired output:
DECLARE @xml xml, @entity_id int = 3
--Here I generate data similar to yours
;WITH cte AS (
SELECT *
FROM (VALUES
(1, '1;2;3;4;5', '1;3'),
(2, '4;5;8;10', '2;3'),
(3, '1;5;8;11', '1;2;4;'),
(4, '3;5;8;9', '4;5;')
) as t(id, scenario_id, [entity_id])
)
--create xml
SELECT @xml = (
SELECT CAST('<i id="'+ CAST(id as nvarchar(10)) +'"><s>' + REPLACE(scenario_id,';','</s><s>') + '</s><e>' + REPLACE([entity_id],';','</e><e>') + '</e></i>' as xml)
FROM cte
FOR XML PATH('')
)
--Normalizing the table and getting result
SELECT STUFF((
SELECT ';' + CAST(scenario_id as nvarchar(10))
FROM (
SELECT DISTINCT t.v.value('.','int') as scenario_id
FROM @xml.nodes('/i/s') as t(v)
INNER JOIN @xml.nodes('/i/e') as s(r)
ON t.v.value('../@id','int') = s.r.value('../@id','int')
WHERE s.r.value('.','int') = @entity_id
) as p
FOR XML PATH('')),1,1,'') as scenario_id
Output for entity_id = 3:
scenario_id
1;2;3;4;5;8;10
For entity_id = 5
scenario_id
3;5;8;9
You can use something like this to find an id in scenario_id, but it is always a full table scan.
SELECT *
FROM table_a
WHERE
FIND_IN_SET('3', REPLACE(scenario_id,';',',')) > 0;
Simple. NORMALISE your schema... At its crudest, that might be as follows...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,scenario_id INT NOT NULL
,entity_id INT NOT NULL
,PRIMARY KEY (id,scenario_id,entity_id)
);
INSERT INTO my_table VALUES
(1, 1,1),
(1, 1,3),
(1, 2,1),
(1, 2,3),
(1, 3,1),
(1, 3,3),
(1, 4,1),
(1, 4,3),
(1, 5,1),
(1, 5,3),
(2, 4,2),
(2, 4,3),
(2, 5,2),
(2, 5,3),
(2, 8,2),
(2, 8,3),
(2,10,2),
(2,10,3),
(3, 1,1),
(3, 1,2),
(3, 1,4),
(3, 5,1),
(3, 5,2),
(3, 5,4),
(3, 8,1),
(3, 8,2),
(3, 8,4),
(3,11,1),
(3,11,2),
(3,11,4),
(4, 3,4),
(4, 3,5),
(4, 5,4),
(4, 5,5),
(4, 8,4),
(4, 8,5),
(4, 9,4),
(4, 9,5);
SELECT DISTINCT scenario_id FROM my_table WHERE entity_id = 3 ORDER BY scenario_id;
+-------------+
| scenario_id |
+-------------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 8 |
| 10 |
+-------------+
Split scenario_id on ';' and copy the pieces into a temporary table that your query can use; the INSTR and SUBSTRING functions can do the splitting.
This link may help you, but you will need a loop to call your procedure, because the ';' is repeated.
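If the splitting is done outside the database instead, the idea looks roughly like this. It is a minimal sketch, assuming the rows of table_a have been fetched as (id, scenario_id, entity_id) tuples; the helper names `split_ids` and `scenarios_for_entity` are hypothetical.

```python
# Rows as (id, scenario_id, entity_id), copied from table_a in the question.
TABLE_A = [
    (1, "1;2;3;4;5", "1;3"),
    (2, "4;5;8;10", "2;3"),
    (3, "1;5;8;11", "1;2;4;"),
    (4, "3;5;8;9", "4;5;"),
]


def split_ids(text):
    """Split '1;2;4;' into [1, 2, 4], ignoring the empty piece left by a trailing ';'."""
    return [int(piece) for piece in text.split(";") if piece.strip()]


def scenarios_for_entity(rows, entity_id):
    """Return the sorted, de-duplicated scenario ids of every row that lists entity_id."""
    found = set()
    for _, scenario_text, entity_text in rows:
        if entity_id in split_ids(entity_text):
            found.update(split_ids(scenario_text))
    return ";".join(str(s) for s in sorted(found))


print(scenarios_for_entity(TABLE_A, 3))
print(scenarios_for_entity(TABLE_A, 5))
```

Comparing parsed integers rather than substrings avoids false matches such as entity 1 matching inside "11".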

(mysql) Select 50 highest rated items, with at most one item coming from each user

I'm not sure how to go about doing this efficiently in MySQL and would appreciate any help.
The goal is to select 50 of the top-selling items, with at most one item from each user. I'm used to doing this with either CTE's or DISTINCT ON, but of course that's not an option in MySQL. I'm hoping for a single-query solution, and I'd like to avoid using stored procedures.
The basic schema is a table of items posted by users, and a table of sales with a field determining the score of that particular sale.
CREATE TABLE items (
item_id INT PRIMARY KEY,
user_id INT NOT NULL
);
CREATE TABLE sales (
item_id INT NOT NULL,
score INT NOT NULL
);
-- Create some sample data
INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2), (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
The result of the query against this sample data should be
+---------+---------+-------------+
| user_id | item_id | total_score |
+---------+---------+-------------+
| 2 | 4 | 5 |
| 1 | 3 | 3 |
| 3 | 6 | 3 |
+---------+---------+-------------+
Here's the PostgreSQL solution:
SELECT DISTINCT ON (items.user_id)
items.user_id,
items.item_id,
SUM(sales.score) AS total_score
FROM items
JOIN sales ON (sales.item_id = items.item_id)
GROUP BY items.item_id
ORDER BY total_score DESC
LIMIT 50
Here's the MySQL solution I've come up with, but it's quite ugly. I tried doing essentially the same thing using a temporary table, but in the process learned that MySQL doesn't allow joining to a temporary table multiple times in the same query.
SELECT items_scores.user_id, items_scores.item_id, items_scores.total_score
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS items_scores
WHERE items_scores.total_score =
(
SELECT MAX(t.total_score)
FROM (
SELECT items.user_id, items.item_id, SUM(sales.score) as total_score
FROM items
JOIN sales ON
sales.item_id = items.item_id
GROUP BY items.item_id
) AS t
WHERE t.user_id = items_scores.user_id
)
ORDER BY items_scores.total_score DESC
MySQL query for it:
select user, item, total_score
from (
select sum(sales.score) as total_score, items.user_id as user, items.item_id as item
from sales
inner join items on sales.item_id = items.item_id
group by item,user
order by total_score desc) as t
group by user limit 50;
Output:
+------+------+-------------+
| user | item | total_score |
+------+------+-------------+
| 1 | 3 | 3 |
| 2 | 4 | 5 |
| 3 | 6 | 3 |
+------+------+-------------+
3 rows in set (0.00 sec)
Some explanation
MySQL documentation says:
However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
In our subquery, the nonaggregated columns are user_id and item_id, and we expect them to be the same for every group that we are summing over. We are also not using any ORDER BY that could influence the aggregation; we want all the values in the group to be summed. We then sort the output and save it as a derived table.
Finally, we run a SELECT on this derived table, where we GROUP BY user and LIMIT the output to 50 rows.
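Because the documentation quoted above says the row chosen for each group cannot be influenced by ORDER BY, a version that does not depend on that choice may be safer. The sketch below is a different technique from the answer's query: it keeps an item only if no other item of the same user has a higher total, breaking ties by the lower item_id. It runs on Python's built-in sqlite3 as a stand-in for MySQL (an assumption), and the view name `item_scores` is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INT PRIMARY KEY, user_id INT NOT NULL);
    CREATE TABLE sales (item_id INT NOT NULL, score INT NOT NULL);
    INSERT INTO items VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3);
    INSERT INTO sales VALUES (1, 1), (1, 1), (2, 1), (3, 2), (3, 1), (4, 3), (4, 2),
                             (5, 2), (6, 1), (6, 1), (6, 1), (7, 2);
    -- Hypothetical helper view: one row per item with its summed score.
    CREATE VIEW item_scores AS
        SELECT i.user_id, i.item_id, SUM(s.score) AS total_score
        FROM items AS i
        JOIN sales AS s ON s.item_id = i.item_id
        GROUP BY i.user_id, i.item_id;
""")

top = conn.execute("""
    SELECT a.user_id, a.item_id, a.total_score
    FROM item_scores AS a
    WHERE NOT EXISTS (
        -- Reject a if the same user has a better item; ties go to the lower item_id.
        SELECT 1
        FROM item_scores AS b
        WHERE b.user_id = a.user_id
          AND (b.total_score > a.total_score
               OR (b.total_score = a.total_score AND b.item_id < a.item_id))
    )
    ORDER BY a.total_score DESC, a.user_id
    LIMIT 50
""").fetchall()

for row in top:
    print(row)
```

On the sample data this returns the three rows listed as the expected result in the question, in the same order.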

MySQL Ordering a query - further question

Further to a recently answered question, I have the following code:
SELECT q21coding, COUNT(q21coding) AS Count
FROM tresults_acme
WHERE q21 IS NOT NULL AND q21 <> ''
GROUP BY q21coding
ORDER BY IF(q21coding = 'Other', 1, 0) ASC, Count DESC
It brings back the following:
q21coding Count
Difficulty in navigating/finding content 53
Positive comments 28
Suggestions for improvement 14
Inappropriate content/use 13
Improve search facility 6
Include information about staff and teams 5
Content needs updating 4
Other 30
You'll notice that Other is now at the bottom. However, is there a way of ensuring that Positive comments and Other are ALWAYS the bottom two (with Other at the bottom), regardless of the Count size?
Thanks,
Homer
Actually there was no need to use IF(q21coding = 'Other', 1, 0) in your original query. In MySQL you can use any expression in the ORDER BY clause, and q21coding = 'Other' would have been enough:
... ORDER BY q21coding = 'Other', Count DESC
The q21coding = 'Other' expression will return 1 if true, or 0 if false. That will put rows with a q21coding = 'Other' at the bottom.
What you need to do to have 'Positive Comments' and 'Other' both at the bottom is something like this:
... ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC
Basic test case:
CREATE TABLE my_table (id int, q21coding varchar(100), count int);
INSERT INTO my_table VALUES (1, 'Inappropriate content/use', 13);
INSERT INTO my_table VALUES (2, 'Other', 30);
INSERT INTO my_table VALUES (3, 'Difficulty in navigating/finding content', 53);
INSERT INTO my_table VALUES (4, 'Positive comments', 28);
INSERT INTO my_table VALUES (5, 'Improve search facility', 6);
INSERT INTO my_table VALUES (6, 'Content needs updating', 4);
INSERT INTO my_table VALUES (7, 'Suggestions for improvement', 14);
INSERT INTO my_table VALUES (8, 'Include information about staff and teams', 5);
Result:
SELECT q21coding, count
FROM my_table
ORDER BY q21coding = 'Other', q21coding = 'Positive comments', Count DESC;
+-------------------------------------------+-------+
| q21coding | count |
+-------------------------------------------+-------+
| Difficulty in navigating/finding content | 53 |
| Suggestions for improvement | 14 |
| Inappropriate content/use | 13 |
| Improve search facility | 6 |
| Include information about staff and teams | 5 |
| Content needs updating | 4 |
| Positive comments | 28 |
| Other | 30 |
+-------------------------------------------+-------+
8 rows in set (0.00 sec)
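The same multi-key ordering can be mimicked outside SQL, which may make it easier to see why it works. This is a minimal sketch in Python, assuming the rows are (q21coding, count) tuples; `sort_key` is a hypothetical helper name.

```python
rows = [
    ("Inappropriate content/use", 13),
    ("Other", 30),
    ("Difficulty in navigating/finding content", 53),
    ("Positive comments", 28),
    ("Improve search facility", 6),
    ("Content needs updating", 4),
    ("Suggestions for improvement", 14),
    ("Include information about staff and teams", 5),
]


def sort_key(row):
    label, count = row
    # False sorts before True, just as 0 sorts before 1 in the SQL ORDER BY.
    # Negating count gives descending order for the last key.
    return (label == "Other", label == "Positive comments", -count)


ordered = sorted(rows, key=sort_key)
for label, count in ordered:
    print(label, count)
```

Each comparison acts as a flag: only the flagged row gets the larger value for that key, so it falls below every unflagged row, and the count decides the order among the rest.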

MySQL multiple table query with average for each row

This is my setup:
Table "files": id (PK), filename, user_id, date, filesize
Table "scores": id(PK), file_id, user_id, score
Table "files" contains a list of files with details; table "scores" keeps track of 1-5 points scored per file. I need to get entries from the "files" table, and in each row I need all the info for the file as well as the average score. I can do another query for the current file_id while I'm looping through the rows, but obviously that's not very optimized. I tried something like below, but no success.
SELECT files.*, (SUM(scores.score)/(COUNT(scores.score))) AS total FROM files INNER JOIN scores ON files.id=scores.file_id;
Please point me in the right direction - thanks!
You may want to try the following:
SELECT f.id, f.filename, f.user_id, f.date, f.filesize,
(
SELECT AVG(s.score)
FROM scores s
WHERE s.file_id = f.id
) average_score
FROM files f;
Note that you can use the AVG() aggregate function. There is no need to divide the SUM() by the COUNT().
Test case:
CREATE TABLE files (id int, filename varchar(10));
CREATE TABLE scores (id int, file_id int, score int);
INSERT INTO files VALUES (1, 'f1.txt');
INSERT INTO files VALUES (2, 'f2.txt');
INSERT INTO files VALUES (3, 'f3.txt');
INSERT INTO files VALUES (4, 'f4.txt');
INSERT INTO scores VALUES (1, 1, 10);
INSERT INTO scores VALUES (2, 1, 15);
INSERT INTO scores VALUES (3, 1, 20);
INSERT INTO scores VALUES (4, 2, 5);
INSERT INTO scores VALUES (5, 2, 10);
INSERT INTO scores VALUES (6, 3, 20);
INSERT INTO scores VALUES (7, 3, 15);
INSERT INTO scores VALUES (8, 3, 15);
INSERT INTO scores VALUES (9, 4, 12);
Result:
SELECT f.id, f.filename,
(
SELECT AVG(s.score)
FROM scores s
WHERE s.file_id = f.id
) average_score
FROM files f;
+------+----------+---------------+
| id | filename | average_score |
+------+----------+---------------+
| 1 | f1.txt | 15.0000 |
| 2 | f2.txt | 7.5000 |
| 3 | f3.txt | 16.6667 |
| 4 | f4.txt | 12.0000 |
+------+----------+---------------+
4 rows in set (0.06 sec)
Note that @Ignacio's solution produces the same result, and is therefore another option.
Aggregate functions are not usually useful without aggregation.
SELECT f.*, AVG(s.score) AS total
FROM files AS f
INNER JOIN scores AS s
ON f.id=s.file_id
GROUP BY f.id
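To check that the correlated subquery and the JOIN with GROUP BY agree on the test data above, here is a small sketch that runs both. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption), and the JOIN version groups by both selected columns so that it also works on engines that reject ungrouped columns.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the comparison is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (id INT, filename TEXT);
    CREATE TABLE scores (id INT, file_id INT, score INT);
    INSERT INTO files VALUES (1, 'f1.txt'), (2, 'f2.txt'), (3, 'f3.txt'), (4, 'f4.txt');
    INSERT INTO scores VALUES (1, 1, 10), (2, 1, 15), (3, 1, 20), (4, 2, 5), (5, 2, 10),
                              (6, 3, 20), (7, 3, 15), (8, 3, 15), (9, 4, 12);
""")

# Variant 1: correlated subquery, one AVG per file row.
subquery_rows = conn.execute("""
    SELECT f.id, f.filename,
           (SELECT AVG(s.score) FROM scores AS s WHERE s.file_id = f.id) AS average_score
    FROM files AS f
    ORDER BY f.id
""").fetchall()

# Variant 2: join the tables, then aggregate per file.
join_rows = conn.execute("""
    SELECT f.id, f.filename, AVG(s.score) AS average_score
    FROM files AS f
    INNER JOIN scores AS s ON f.id = s.file_id
    GROUP BY f.id, f.filename
    ORDER BY f.id
""").fetchall()

print(subquery_rows == join_rows)
```

The two differ only for a file with no scores: the subquery version still returns that file with a NULL average, while the INNER JOIN version omits it.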