Problems with a very advanced sql query - mysql

I need to do an advanced selection in SQL, but I'm stuck.
I have the following table:
id | user_id | position | value
1 | 1 | 1 | 1
1 | 1 | 2 | 1
1 | 1 | 3 | 3
1 | 2 | 1 | 2
1 | 2 | 2 | 2
1 | 2 | 3 | 2
1 | 3 | 1 | 3
1 | 3 | 2 | 2
1 | 3 | 3 | 1
I need a query that gives me a result set ordered as this:
Total sum for each user (user 1: 5, user 2: 6, user 3: 6)
Value for position 3 for each user (user 1: 3, user 2: 2, user 3: 1)
Val for pos 3 + val for pos 2 for each user (user 1: 4, user 2: 4, user 3: 3)
Val for pos 3 + val for pos 2 + val for pos 1 for each user (user 1: 5, user 2: 6, user 3: 6)
This is just an example, the table can actually contain more positions, so I need a query that is not hard coded on three positions.
NOTE: There is always the same number of positions for each user_id. In this example it's three, but I could as well truncate the table and add data for each user using five positions.
An ugly solution is to assume that there are never more than ten positions, create pos1, pos2, and so on as columns, and just add them up accordingly in the query. If you only use three positions, you get a lot of NULL values, and you are also stuck with a maximum of ten positions.
I have considered the use of temporary tables, but haven't found a breakthrough there either.
How would you do it?

I need a query that is not hard coded on three positions.
Then you can't output the subtotals in columns. SQL requires that the columns are fixed at the time you prepare the query; you can't write a query that appends more columns dynamically as it discovers how many distinct values are in the data.
You can, however, output a dynamic number of rows.
SELECT t1.user_id, CONCAT(t1.position, '-', MAX(t2.position)) AS position_range,
SUM(t2.value) AS subtotal
FROM MyTable t1
INNER JOIN MyTable t2
ON t1.user_id = t2.user_id AND t1.position <= t2.position
GROUP BY t1.user_id, t1.position;
The output is:
+---------+----------------+----------+
| user_id | position_range | subtotal |
+---------+----------------+----------+
| 1 | 1-3 | 5 |
| 1 | 2-3 | 4 |
| 1 | 3-3 | 3 |
| 2 | 1-3 | 6 |
| 2 | 2-3 | 4 |
| 2 | 3-3 | 2 |
| 3 | 1-3 | 6 |
| 3 | 2-3 | 3 |
| 3 | 3-3 | 1 |
+---------+----------------+----------+
You'll have to write application code to pivot this into columns after you fetch the whole result set.
Sorry, there is no way to write a fully dynamic pivot query in any brand of RDBMS. You have two choices:
Write code to generate the SQL based on data, as shown in @TimLehner's updated answer.
Write code to post-process a general-purpose query like the one I show above.
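For the second choice, here is a minimal sketch of the post-processing step. The answer does not name an application language, so Python and the helper name pivot_subtotals are assumptions made purely for illustration; the rows are the ones from the output table above.

```python
from collections import defaultdict


def pivot_subtotals(rows):
    """Turn (user_id, position_range, subtotal) rows into one dict per user."""
    pivoted = defaultdict(dict)
    for user_id, position_range, subtotal in rows:
        pivoted[user_id][position_range] = subtotal
    return dict(pivoted)


# Rows as they come back from the self-join query above.
rows = [
    (1, "1-3", 5), (1, "2-3", 4), (1, "3-3", 3),
    (2, "1-3", 6), (2, "2-3", 4), (2, "3-3", 2),
    (3, "1-3", 6), (3, "2-3", 3), (3, "3-3", 1),
]
table = pivot_subtotals(rows)
```

Each user ends up with as many keys as there are positions in the data, so nothing is hard coded on three positions.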

You can potentially do something like this:
select user_id
, sum(value) as value_sum
, (select value from my_table where user_id = t.user_id and position = 3) as pos_3_val
, (select sum(value) from my_table where user_id = t.user_id and position >= 2) as pos_2_3_val
, (select sum(value) from my_table where user_id = t.user_id and position >= 1) as pos_1_2_3_val
from my_table as t
group by user_id
order by user_id
I think this should work in most any RDBMS.
If it has to be dynamic, you could potentially create this query in a stored procedure or in your application and run it.
You could also dynamically pivot your results from a query like this:
select *
, (
select sum(value)
from my_table
where user_id = t.user_id
and position >= t.position
) as running_total_descending
from my_table t
Please let us know if any of this works, and if you have trouble creating a dynamic version (and which RDBMS).
UPDATE
Now that we know the RDBMS (MySQL) we can have a specific dynamic version:
set @sql = null;
-- Build one "sum(case ...)" column per distinct position.
select
  group_concat(distinct
    concat(
      ' sum(case when position >= ',
      position,
      ' then value end) as pos_',
      position,
      '_plus'
    )
  ) into @sql
from my_table;
-- Wrap the generated columns in the final query and run it.
set @sql = concat('select user_id,', @sql, ' from my_table t group by user_id;');
prepare stmt from @sql;
execute stmt;
deallocate prepare stmt;
Special thanks to @bluefeet for posting this type of solution often.
I should also note that many devs believe this type of pivoting often belongs in the application or front-end. I'm no exception, both for separation of concerns and because your app can generally scale better than your OLTP database.
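If you go the application route instead, the same statement can be generated there. The following is only a sketch under stated assumptions: Python is my choice of language, build_pivot_sql is a hypothetical helper, and an in-memory SQLite database stands in for MySQL so the example is self-contained (the id column from the question is left out).

```python
import sqlite3


def build_pivot_sql(positions):
    """Build one 'sum(case ...)' column per position, mirroring the group_concat above."""
    columns = ", ".join(
        # int() ensures only numbers are ever spliced into the SQL text.
        f"sum(case when position >= {int(p)} then value end) as pos_{int(p)}_plus"
        for p in sorted(set(positions))
    )
    return f"select user_id, {columns} from my_table group by user_id order by user_id"


conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute("create table my_table (user_id int, position int, value int)")
conn.executemany(
    "insert into my_table values (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (1, 3, 3),
     (2, 1, 2), (2, 2, 2), (2, 3, 2),
     (3, 1, 3), (3, 2, 2), (3, 3, 1)],
)
positions = [row[0] for row in conn.execute("select distinct position from my_table")]
result = conn.execute(build_pivot_sql(positions)).fetchall()
```

The column list grows with whatever positions are present in the table, which is the same idea as the prepared statement, just assembled outside the database.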

How to get maximum appearance count of number from comma separated number string from multiple rows in MySQL?

My MySQL table has a column with comma-separated numbers. See the example below:
| style_ids |
| ---------- |
| 5,3,10,2,7 |
| 1,5,12,9 |
| 6,3,5,9,4 |
| 8,3,5,7,12 |
| 7,4,9,3,5 |
My expected result is the top 5 numbers by appearance count, in descending order, as 5 rows like this:
| number | appearance_count_in_all_rows |
| -------|----------------------------- |
| 5 | 5 |
| 3 | 4 |
| 9 | 3 |
| 7 | 3 |
| 4 | 2 |
Is it possible to get the above result with a MySQL query?
As already alluded to in the comments, this is a really bad idea. But here is one way of doing it -
WITH RECURSIVE seq (n) AS (
SELECT 1 UNION ALL SELECT n+1 FROM seq WHERE n < 20
), tbl (style_ids) AS (
SELECT '5,3,10,2,7' UNION ALL
SELECT '1,5,12,9' UNION ALL
SELECT '6,3,5,9,4' UNION ALL
SELECT '8,3,5,7,12' UNION ALL
SELECT '7,4,9,3,5'
)
SELECT seq.n, COUNT(*) appearance_count_in_all_rows
FROM seq
JOIN tbl ON FIND_IN_SET(seq.n, tbl.style_ids)
GROUP BY seq.n
ORDER BY appearance_count_in_all_rows DESC
LIMIT 5;
Just replace the tbl CTE with your table.
As already pointed out, you should fix the data if possible.
For further details read Is storing a delimited list in a database column really that bad?.
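To illustrate what "fixing the data" could look like, here is a small sketch that splits the lists once and stores one row per number. The table name item_styles and its columns are hypothetical, and SQLite (through Python's sqlite3 module) is used only so the example runs on its own; it is not meant as the one right schema.

```python
import sqlite3

style_lists = ["5,3,10,2,7", "1,5,12,9", "6,3,5,9,4", "8,3,5,7,12", "7,4,9,3,5"]

conn = sqlite3.connect(":memory:")
# Hypothetical normalized table: one row per (source row, style id) pair.
conn.execute("create table item_styles (item_id int, style_id int)")
for item_id, style_ids in enumerate(style_lists, start=1):
    conn.executemany(
        "insert into item_styles values (?, ?)",
        [(item_id, int(s)) for s in style_ids.split(",")],
    )

# With normalized data the count is a plain GROUP BY; ties are broken by style_id.
top_five = conn.execute(
    """
    select style_id, count(*) as appearance_count
    from item_styles
    group by style_id
    order by appearance_count desc, style_id
    limit 5
    """
).fetchall()
```

Once the data is stored this way, no sequence table, FIND_IN_SET, or string splitting is needed at query time.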
You could use the approach below, which relies on a numbers helper table (a column n holding 1, 2, 3, ...).
Try:
select distinct_nr,count(distinct_nr) as appearance_count_in_all_rows
from ( select substring_index(substring_index(style_ids, ',', n), ',', -1) as distinct_nr
from test
join numbers on char_length(style_ids) - char_length(replace(style_ids, ',', '')) >= n - 1
) x
group by distinct_nr
order by appearance_count_in_all_rows desc ;

Leaderboard position SQL optimization

I'm offering an experience leaderboard for a Discord bot I actively develop with stuff like profile cards showing one's rank. The SQL query I'm currently using works flawlessly, however I notice that this query takes a rather long processing time.
SELECT id,
       discord_id,
       discord_tag,
       xp,
       level
FROM (SELECT @rank := @rank + 1 AS id,
             discord_id,
             discord_tag,
             xp,
             level
      FROM profile_xp,
           (SELECT @rank := 0) r
      ORDER BY xp DESC) t
WHERE discord_id = '12345678901';
The table isn't too big (roughly 20k unique records), but this query is taking anywhere between 300-450ms on average, which piles up relatively fast with a lot of concurrent requests.
I was wondering if this query can be optimized to increase performance. I've isolated this to this query, the rest of the MySQL server is responsive and swift.
I'd be happy about any hint and thanks in advance! :)
You're scanning 20,000 rows to assign "row numbers" then selecting exactly one row from it. You can use aggregation instead:
SELECT *, (
SELECT COUNT(*)
FROM profile_xp AS x
WHERE xp > profile_xp.xp
) + 1 AS rnk
FROM profile_xp
WHERE discord_id = '12345678901'
This will give you rank of the player. For dense rank use COUNT(DISTINCT xp). Create an index on xp column if necessary.
Not an answer; too long for a comment:
I usually write this kind of thing exactly the same way that you have done, because it's quick and easy, but actually there's a technical flaw with this method - although it only becomes apparent in certain situations.
By way of illustration, consider the following:
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT NOT NULL PRIMARY KEY);
INSERT INTO ints VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Your query:
SELECT a.*
, @i:=@i+1 rank
FROM ints a
JOIN (SELECT @i:=0) vars
ORDER
BY RAND() DESC;
+---+------+
| i | rank |
+---+------+
| 3 | 4 |
| 2 | 3 |
| 5 | 6 |
| 1 | 2 |
| 7 | 8 |
| 9 | 10 |
| 4 | 5 |
| 6 | 7 |
| 8 | 9 |
| 0 | 1 |
+---+------+
Look, the result set isn't 'random' at all: rank always tracks i (here it is always i + 1).
Now compare that with the following:
SELECT a.*
, @i:=@i+1 rank
FROM
( SELECT * FROM ints ORDER by RAND() DESC) a
JOIN (SELECT @i:=0) vars;
+---+------+
| i | rank |
+---+------+
| 5 | 1 |
| 2 | 2 |
| 8 | 3 |
| 7 | 4 |
| 4 | 5 |
| 6 | 6 |
| 0 | 7 |
| 1 | 8 |
| 3 | 9 |
| 9 | 10 |
+---+------+
Assuming discord_id is the primary key for the table, and you're just trying to get one entry's "rank", you should be able to take a different approach.
SELECT px.discord_id, px.discord_tag, px.xp, px.level
, 1 + COUNT(leaders.xp) AS rank
, 1 + COUNT(DISTINCT leaders.xp) AS altRank
FROM profile_xp AS px
LEFT JOIN profile_xp AS leaders ON px.xp < leaders.xp
WHERE px.discord_id = '12345678901'
GROUP BY px.discord_id, px.discord_tag, px.xp, px.level
;
Note that I have both "rank" and "altRank". rank should give you a position similar to what you were originally looking for; your original results could fluctuate for ties, whereas this rank always puts tied players at their highest tied position. If 3 records tie for 2nd place, each of them (queried separately with this) will show 2nd place, and the next xp down will show 5th place (assuming 1 in 1st; 2, 3, 4 in 2nd; 5 in 5th). The altRank "closes the gaps", putting that 5th record in the 3rd-place "group".
I would also recommend an index on xp to speed this up further.
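The tie behaviour described above can be checked with a tiny example. This is a sketch only: the five players and their xp values are made up, the table is cut down to two columns, the helper ranks_for is hypothetical, and SQLite via Python stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table profile_xp (discord_id text primary key, xp int)")
# One leader, three players tied behind them, and one player below the tie.
conn.executemany(
    "insert into profile_xp values (?, ?)",
    [("a", 100), ("b", 90), ("c", 90), ("d", 90), ("e", 80)],
)


def ranks_for(discord_id):
    """Return (rank with gaps, rank without gaps) for one player."""
    return conn.execute(
        """
        select 1 + count(leaders.xp), 1 + count(distinct leaders.xp)
        from profile_xp as px
        left join profile_xp as leaders on px.xp < leaders.xp
        where px.discord_id = ?
        group by px.discord_id
        """,
        (discord_id,),
    ).fetchone()
```

Calling ranks_for("b") gives 2nd place on both measures, while ranks_for("e") gives 5th with gaps and 3rd without, matching the description.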

How do I combine two queries on the same table to get a single result set in MySQL

I am not very good at sql but I am getting there. I have searched stackoverflow but I can't seem to find the solution and I hope someone out there can help me. I have a table (users) with data like the following. The book_id column is a key to another table that contains a book the user is subscribed to.
|--------|---------------------|------------------|
| id | book_id | name |
|--------|---------------------|------------------|
| 1 | 1 | jim |
| 2 | 1 | joyce |
| 3 | 1 | mike |
| 4 | 1 | eleven |
| 5 | 2 | max |
| 6 | 2 | dustin |
| 7 | 2 | lucas |
|--------|---------------------|------------------|
I have a function in my PHP code that returns two random users from a specific book id (either 1 or 2). Query one returns the result in column 1 and result two returns the results in column 2 like:
|---------------------|------------------|
| 1 | 2 |
|---------------------|------------------|
| jim | max |
| joyce | dustin |
|---------------------|------------------|
I have achieved this by running two separate queries as seen below. I want to know if it's possible to achieve this functionality with one query and how.
$random_users_with_book_id_1 = SELECT name FROM users WHERE book_id=1 LIMIT 2
$random_users_with_book_id_2 = SELECT name FROM users WHERE book_id=2 LIMIT 2
Again, I apologise if it's too specific. The query below is the closest I have come to what I am trying to achieve:
SELECT a.name AS book_id_1, b.name AS book_id_2
FROM users a, users b
WHERE a.book_id=1 AND b.book_id = 2
LIMIT 2
EDIT: I have created a fiddle to play around with this. I appreciate any help! Thank you!! http://sqlfiddle.com/#!9/7fcbca/1
It is easy actually :)
you can use UNION like this:
SELECT * FROM (
(SELECT * FROM users WHERE book_id=1 LIMIT 2)
UNION
(SELECT * FROM users WHERE book_id=2 LIMIT 2))
collection;
As the MySQL documentation on UNION explains, you can use parentheses to group the individual queries and then apply the UNION between them. Without the parentheses, a trailing LIMIT 2 would apply to the whole UNION result and show only the first two rows. Ref.: "To apply ORDER BY or LIMIT to an individual SELECT, place the clause inside the parentheses that enclose the SELECT:"
If you want to combine the queries in MySQL, you can just use parentheses:
(SELECT name
FROM users
WHERE book_id = 1
LIMIT 2
) UNION ALL
(SELECT name
FROM users
WHERE book_id = 2
LIMIT 2
);
First, only use UNION if you specifically want to incur the overhead of removing duplicates. Otherwise, use UNION ALL.
Second, this does not return random rows. This returns arbitrary rows. In many cases, this might be two rows near the beginning of the data. If you want random rows, then use ORDER BY rand():
(SELECT name
FROM users
WHERE book_id = 1
ORDER BY rand()
LIMIT 2
) UNION ALL
(SELECT name
FROM users
WHERE book_id = 2
ORDER BY rand()
LIMIT 2
);
There are other methods that are more efficient, but this should be fine for up to a few thousand rows.
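Note that the UNION ALL result comes back as rows, not as the two-column layout shown in the question. One way to get that layout is to also select the book id in each branch and pair the names up in application code. The sketch below assumes exactly that; Python is used for illustration (the question uses PHP), and to_columns is a hypothetical helper.

```python
from itertools import zip_longest


def to_columns(rows, book_ids=(1, 2)):
    """Group (book_id, name) rows by book and pair them up row by row."""
    names = {book_id: [] for book_id in book_ids}
    for book_id, name in rows:
        names[book_id].append(name)
    # zip_longest pads with None when one book has fewer users than the other.
    return list(zip_longest(*(names[b] for b in book_ids)))


# Rows as a UNION ALL query selecting (book_id, name) might return them.
rows = [(1, "jim"), (1, "joyce"), (2, "max"), (2, "dustin")]
columns = to_columns(rows)
```

Each tuple in columns is one display row, with the first element belonging to book 1 and the second to book 2.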

SQL query to find missing entries in

I have a database in which I need to find some missing entries and fill them in.
I have a table called "menu". Each restaurant has multiple dishes, and each dish has 4 different language entries (actually 8 in the main database, but for simplicity let's go with 4). I need to find out which dishes for a particular restaurant are missing any language entries.
select * from menu where restaurantid = 1
I get stuck there. I need something along the lines of "where language 1 or 2 or 3 or 4 doesn't exist", which is the complicated bit: I need to see the languages that exist in order to see the language that's missing, because I can't display something that isn't there. I hope that makes sense?
In the example table below, restaurant 2 dishid 2 is missing language 3; that's what I need to find.
+--------------+--------+----------+-----------+
| RestaurantID | DishID | DishName | Language |
+--------------+--------+----------+-----------+
| 1 | 1 | Soup | 1 |
| 1 | 1 | Soúp | 2 |
| 1 | 1 | Soupe | 3 |
| 1 | 1 | Soupa | 4 |
| 1 | 2 | Bread | 1 |
| 1 | 2 | Bréad | 2 |
| 1 | 2 | Breade | 3 |
| 1 | 2 | Breada | 4 |
| 2 | 1 | Dish1 | 1 |
| 2 | 1 | Dísh1 | 2 |
| 2 | 1 | Disha1 | 3 |
| 2 | 1 | Dishe1 | 4 |
| 2 | 2 | Dish2 | 1 |
| 2 | 2 | Dísh2 | 2 |
| 2 | 2 | Dishe2 | 4 |
+--------------+--------+----------+-----------+
An anti-join pattern is usually the most efficient, in terms of performance.
Your particular case is a little more tricky, in that you need to "generate" rows that are missing. If every (RestaurantID,DishID) should have 4 rows, with Language values of 1,2,3 and 4, we can generate that set of all rows with a CROSS JOIN operation.
The next step is to apply an anti-join... a LEFT OUTER JOIN to the rows that exist in the menu table, so we get all the rows from the CROSS JOIN set, along with matching rows.
The "trick" is to use a predicate in the WHERE clause that filters out rows where we found a match, so we are left with rows that didn't have a match.
(It seems a bit strange at first, but once you get your brain wrapped around the anti-join pattern, it becomes familiar.)
So a query of this form should return the specified result set.
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
LEFT
JOIN menu m
ON m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
Let's unpack that bit.
First we get a set containing the numbers 1 thru 4.
Next we get a set containing the (RestaurantID, DishID) distinct tuples. (For each distinct Restaurant, a distinct list of DishID, as long as there is at least one row for any Language for that combination.)
We do a CROSS JOIN, matching every row from set one (lang) with every row from set two (d), to generate a "complete" set of every (RestaurantID, DishID, Language) we want to have.
The next part is the anti-join... the left outer join to menu to find which of the rows from the "complete" set has a matching row in menu, and filtering out all the rows that had a match.
That may be a little confusing. If we think of that CROSS JOIN operation producing a temporary table that looks like the menu table, but containing all possible rows... we can think of it in terms of pseudocode:
create temporary table all_menu_rows (RestaurantID, DishID, Language) ;
insert into all_menu_rows ... all possible rows, combinations ;
Then the anti-join pattern is a little easier to see:
SELECT r.RestaurantID
, r.DishID
, r.Language
FROM all_menu_rows r
LEFT
JOIN menu m
ON m.RestaurantID = r.RestaurantID
AND m.DishID = r.DishID
AND m.Language = r.Language
WHERE m.RestaurantID IS NULL
ORDER BY 1,2,3
(But we don't have to incur the extra overhead of creating and populating the temporary table, we can do that right in the query.)
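To see the anti-join produce the missing row, here is a runnable sketch. It loads only the restaurant 2 rows from the question into an in-memory SQLite database (SQLite and Python are assumptions chosen to keep the example self-contained; the query text is the same shape as the MySQL one above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table menu (RestaurantID int, DishID int, DishName text, Language int)"
)
# Restaurant 2 from the question: dish 2 has no row for language 3.
conn.executemany(
    "insert into menu values (?, ?, ?, ?)",
    [(2, 1, "Dish1", 1), (2, 1, "Dísh1", 2), (2, 1, "Disha1", 3), (2, 1, "Dishe1", 4),
     (2, 2, "Dish2", 1), (2, 2, "Dísh2", 2), (2, 2, "Dishe2", 4)],
)

# Complete set (languages x dishes), left-joined to menu, keeping only non-matches.
missing = conn.execute(
    """
    select d.RestaurantID, d.DishID, lang.id as missing_language
    from (select 1 as id union all select 2 union all select 3 union all select 4) lang
    cross join (select RestaurantID, DishID from menu group by RestaurantID, DishID) d
    left join menu m
      on m.RestaurantID = d.RestaurantID
     and m.DishID = d.DishID
     and m.Language = lang.id
    where m.RestaurantID is null
    order by 1, 2, 3
    """
).fetchall()
```

The only row left after the filter is restaurant 2, dish 2, language 3, which is the entry the question asks for.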
Of course, this isn't the only approach. We could use a NOT EXISTS predicate instead of an anti-join, though this is not usually as efficient. The first part of the query is the same, to generate the "complete" set of rows we expect to have; what differs is how we identify whether or not there is a matching row in the menu table:
SELECT d.RestaurantID
, d.DishID
, lang.id AS missing_language
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
) lang
CROSS
JOIN (SELECT e.RestaurantID, e.DishID
FROM menu e
GROUP BY e.RestaurantID, e.DishID
) d
WHERE NOT EXISTS ( SELECT 1
FROM menu m
WHERE m.RestaurantID = d.RestaurantID
AND m.DishID = d.DishID
AND m.Language = lang.id
)
ORDER BY 1,2,3
For each row in the "complete" set (generated by the CROSS JOIN operation), we're going to run a correlated subquery that checks whether a matching row is found. The NOT EXISTS predicate returns TRUE if no matching row is found. (This is a little easier to understand, but it usually doesn't perform as well as the anti-join pattern.)
You can use the following statement if each menu item should have a record in each language (8 in real life, 4 in the example). Change the number 4 to 8 to see all menu items per restaurant that don't have all 8 entries.
SELECT RestaurantID,DishID, COUNT( * )
FROM Menu
GROUP BY RestaurantID,DishID
HAVING COUNT( * ) <4

How to do specific MySQL calculations depending on the column value?

Is there a way to multiply a column with a predefined number based on another column? There are multiple predefined numbers that are used depending on the value in the column.
Example:
Table
Columns: persons_id,activity,scale
Values
1,swimming,4
1,baseball,2
1,basketball,3
2,swimming,6
2,basketball,3
If my predefined numbers are: 6 (swimming), 8 (baseball), 5 (basketball)
The output would look like this
1,swimming,4,24
1,baseball,2,16
1,basketball,3,15
2,swimming,6,36
2,basketball,3,15
Edit: Thank you everyone for contributing. I ended up using the solution from sgeddes.
Sure, you can use CASE:
SELECT Persons_Id, Activity, Scale,
Scale *
CASE
WHEN Activity = 'swimming' THEN 6
WHEN Activity = 'baseball' THEN 8
WHEN Activity = 'basketball' THEN 5
ELSE 1
END Total
FROM YourTable
Good luck.
Have another column called WEIGHT that multiplies the SCALE value. Perhaps you can calculate the product using a trigger to populate the column. Otherwise, a simple SELECT will do fine.
you can use this query:
select persons_id, activity, scale,
scale * case when activity = 'swimming' then 6
when activity = 'baseball' then 8
when activity = 'basketball' then 5 end as result
from Table1
but a better solution would be to define a new table Coefficients(activity, coefficient)
so that you can insert rows:
'swimming', 6
'baseball', 8
'basketball', 5
then use something like this:
select persons_id, Table1.activity, scale, scale * coefficient as result
from Table1 inner join Coefficients on Table1.activity = Coefficients.activity
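A quick way to try the lookup-table approach end to end is sketched below. It uses the sample data from the question and an in-memory SQLite database driven from Python, both of which are assumptions for the sake of a self-contained example. The activity column is qualified with its table name because it exists in both tables, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (persons_id int, activity text, scale int)")
conn.execute("create table Coefficients (activity text primary key, coefficient int)")
conn.executemany(
    "insert into Table1 values (?, ?, ?)",
    [(1, "swimming", 4), (1, "baseball", 2), (1, "basketball", 3),
     (2, "swimming", 6), (2, "basketball", 3)],
)
conn.executemany(
    "insert into Coefficients values (?, ?)",
    [("swimming", 6), ("baseball", 8), ("basketball", 5)],
)

# Multiply each scale by the coefficient looked up for its activity.
results = conn.execute(
    """
    select persons_id, Table1.activity, scale, scale * coefficient as result
    from Table1
    inner join Coefficients on Table1.activity = Coefficients.activity
    order by persons_id, Table1.activity
    """
).fetchall()
```

Adding a new activity then only requires a new row in Coefficients rather than a change to the query.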
You can also use a table that stores the value or create a subquery that will return the multipliers:
select persons_id,
t.activity,
scale,
scale * s.val as result
from yourtable t
inner join
(
select 'swimming' activity, 6 val
union all
select 'baseball' activity, 8 val
union all
select 'basketball' activity, 5 val
) s
on t.activity = s.activity
The result is:
| PERSONS_ID | ACTIVITY | SCALE | RESULT |
--------------------------------------------
| 1 | swimming | 4 | 24 |
| 1 | baseball | 2 | 16 |
| 1 | basketball | 3 | 15 |
| 2 | swimming | 6 | 36 |
| 2 | basketball | 3 | 15 |