Issue with grouping? - mysql

I asked earlier about a solution to my problem, which worked. However, now that I'm trying to get some information from a second table (that stores more information), I'm running into a few issues.
My tables are as follows
Users
+----+----------------------+---------------+------------------+
| id | username | primary_group | secondary_groups |
+----+----------------------+---------------+------------------+
| 1 | Username1 | 3 | 7,10 |
| 2 | Username2 | 7 | 3,5,10 |
| 3 | LongUsername | 1 | 3,7 |
| 4 | Username3 | 1 | 3,10 |
| 5 | Username4 | 7 | |
| 6 | Username5 | 5 | 3,7,10 |
| 7 | Username6 | 2 | 7 |
| 8 | Username7 | 4 | |
+----+----------------------+---------------+------------------+
Profile
+----+---------------+------------------+
| id | facebook | steam |
+----+---------------+------------------+
| 1 | 10049424151 | 11 |
| 2 | 10051277183 | 55 |
| 3 | 10051281183 | 751 |
| 4 | | 735 |
| 5 | 10051215770 | 4444 |
| 6 | 10020210531 | 50415 |
| 7 | 10021056938 | 421501 |
| 8 | 10011547143 | 761 |
+----+---------------+------------------+
My SQL is as follows (based off the previous thread)
SELECT u.id, u.username, p.id, p.facebook, p.steam
FROM users u, profile p
WHERE p.id=u.id AND FIND_IN_SET( '7', secondary_groups )
OR primary_group = 7
GROUP BY u.id
The problem is my output is displayed as below
+----+----------------------+-------------+-------+
| id | username | facebook | steam |
+----+----------------------+-------------+-------+
| 1 | Username1 | 10049424151 | 11 |
| 2 | Username2 | 10051277183 | 55 |
| 3 | LongUsername | 10051281183 | 751 |
| 4 | Username4 | 10051215770 | 4444 |
| 5 | Username5 | 10049424151 | 11 |
| 6 | Username6 | 10049424151 | 55 |
+----+----------------------+-------------+-------+

I'm guessing that the problem is that users rows with a primary_group of 7 are getting matched to all profile rows. Remove the GROUP BY, and you'll be able to better see what is happening.
But that's just a guess. It's not clear what you are attempting to achieve.
I suspect you are getting tripped up with the order of precedence of the AND and OR. (The AND operator has a higher order of precedence than OR operator. That means the AND will be evaluated before the OR.)
The quick fix is to just add some parens, to override the default order of operations. Something like this:
WHERE p.id=u.id AND ( FIND_IN_SET('7',secondary_groups) OR primary_group = 7 )
--                  ^                                                        ^
The parens will cause the OR operation to be evaluated (as either TRUE, FALSE or NULL) and then the result from that will be evaluated in the AND.
Without the parens, it's the same as if the parens were here:
WHERE ( p.id=u.id AND FIND_IN_SET('7',secondary_groups) ) OR primary_group = 7
--    ^                                                 ^
Here the AND condition is evaluated first, and its result is then combined with the OR. This is what causes users rows with a primary_group of 7 to be matched to profile rows with different id values.
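To see the effect of the parens on the sample data, here is a small sketch. It runs in Python against an in-memory SQLite database rather than MySQL (an assumption made only so that it is self-contained), and since SQLite has no FIND_IN_SET, a rough Python stand-in is registered under that name. AND binds tighter than OR in both engines.

```python
import sqlite3

def find_in_set(needle, csv):
    """Rough stand-in for MySQL's FIND_IN_SET: 1-based position, 0 if absent."""
    if needle is None or csv is None:
        return None
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT,
                        primary_group INTEGER, secondary_groups TEXT);
    CREATE TABLE profile (id INTEGER PRIMARY KEY, facebook TEXT, steam TEXT);
    INSERT INTO users VALUES
        (1, 'Username1', 3, '7,10'),   (2, 'Username2', 7, '3,5,10'),
        (3, 'LongUsername', 1, '3,7'), (4, 'Username3', 1, '3,10'),
        (5, 'Username4', 7, ''),       (6, 'Username5', 5, '3,7,10'),
        (7, 'Username6', 2, '7'),      (8, 'Username7', 4, '');
    INSERT INTO profile VALUES
        (1, '10049424151', '11'),     (2, '10051277183', '55'),
        (3, '10051281183', '751'),    (4, '', '735'),
        (5, '10051215770', '4444'),   (6, '10020210531', '50415'),
        (7, '10021056938', '421501'), (8, '10011547143', '761');
""")

# Original WHERE clause, without GROUP BY: the OR branch is not tied to p.id = u.id,
# so every user with primary_group = 7 is paired with every profile row.
bad = conn.execute("""
    SELECT u.id, p.id FROM users u, profile p
    WHERE p.id = u.id AND FIND_IN_SET('7', u.secondary_groups)
       OR u.primary_group = 7
""").fetchall()

# Join predicate in ON, filter in WHERE: one row per matching user.
good = conn.execute("""
    SELECT u.id, p.id FROM users u JOIN profile p ON p.id = u.id
    WHERE FIND_IN_SET('7', u.secondary_groups) OR u.primary_group = 7
    ORDER BY u.id
""").fetchall()

good_ids = [user_id for user_id, _ in good]
print(len(bad), len(good), good_ids)
```

On this data the first query returns 20 rows (4 correctly joined rows plus 2 users times 8 profile rows), while the second returns 6 rows, for users 1, 2, 3, 5, 6 and 7.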
A few pointers on style:
avoid the old-school comma operator for join operations, and use the newer JOIN syntax
place the join predicates (conditions) in the ON clause, other filtering criteria in the WHERE clause
qualify all column references
As an example:
SELECT u.id
, u.username
, p.id
, p.facebook
, p.steam
FROM users u
JOIN profile p
ON p.id = u.id
WHERE u.primary_group = 7
OR FIND_IN_SET('7',u.secondary_groups)
ORDER BY u.id
We only need a GROUP BY clause if we want to "collapse" rows. If the id column is unique in both the users and profile tables, then there's no need for a GROUP BY u.id. We can add an ORDER BY clause if we want rows returned in a particular sequence.

I don't know exactly what you want to do with the output, but you can't group information like this. MySQL isn't really a classic programming language; it's more like a powerful tool for set mathematics. So if you want to get information based on correlations between two or more tables, first write a SELECT statement that contains the raw data you want to work with, like this:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id
GROUP BY u.id;
Now you select the relevant data with a WHERE clause:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id WHERE
FIND_IN_SET( '7', secondary_groups ) OR primary_group = 7
GROUP BY u.id;
Now you should see the joined rows from profile and users, grouped by user, and you can start mining the data. For example, if you want to count the items in each group, just add a COUNT function to the SELECT, and so on.
When debugging SQL, I highly recommend these steps:
1.) First, write down all correlations between the data and all foreign keys between the tables, so you will know whether your selection is fully deterministic. You can then start JOINing tables from left to right.
2.) Try small pieces of the query on a model database. Then you will see which selection works correctly and which doesn't do what you expected.

I think @SIDU has it in the comments: You are experiencing a Boolean order of operations problem. See also SQL Logic Operator Precedence: And and Or
For example:
SELECT 0 AND 0 OR 1 AS test;
+------+
| test |
+------+
| 1 |
+------+
When writing complex conditions with both AND and OR, use parentheses. The operator precedence problem is giving you an unintended cross join for some rows, which is being masked by your GROUP BY. You shouldn't need a GROUP BY for that statement.
Although I don't personally care for the style @spencer7593 suggests in his answer (using INNER JOIN, etc.), it does have the advantage of preventing or identifying errors early for people new to SQL, so it's something to consider.

Related

SQL - get records from many-to-many relations by the user itself -OR- his group

I have two database tables, one as the main table and the other as the relation table.
The first table is a table of contents and the second table is a table that connects to users or groups.
Some data may also be modified in this second table.
I'm not sure about the structure and performance.
For example, we have user id 160, which is in group id 7.
First, we have a post table.
id | title | content | cover | status
------------------------------------------------
1 | first | content 1 | /img/... | 1
2 | second | content 2 | /img/... | 1
3 | another | content 3 | /img/... | 1
4 | four | content 4 | /img/... | 1
5 | five | content 5 | /img/... | 1
and for the second we have a post_rel Table:
id | group_id | user_id | post_id | title | cover | sort | status
---------------------------------------------------------------------------
1 | 7 | NULL | 1 | g title | img/... | 1 | 1
2 | NULL | 160 | 1 | u title | NULL | 2 | 1 *** selected for user_id
3 | 7 | NULL | 2 | NULL | img/... | 6 | 0
4 | NULL | 160 | 2 | NULL | img/... | 4 | 1 *** selected for user_id
5 | NULL | 160 | 3 | some | img/... | 3 | 1 *** selected for user_id
6 | 7 | NULL | 4 | NULL | img/... | 9 | 1 *** selected for group_id
7 | NULL | 165 | 5 | NULL | img/... | 5 | 0
This is the basic query we have.
select
`post_rel`.`title` as `custom_title`,
`post_rel`.`cover` as `custom_cover`,
`post_rel`.`group_id`,
`post_rel`.`user_id`,
`post`.*
from
`post`
inner join `post_rel` on `post`.`id` = `post_rel`.`post_id`
where
`post`.`status` = 1
and `post_rel`.`status` = 1
and (
`post_rel`.`user_id` = 160
or (
`post_rel`.`group_id` = 7
and `post_rel`.`post_id` not in (
select
`post_rel`.`post_id`
from
`post_rel`
where
`post_rel`.`user_id` = 160
)
)
)
order by
`post_rel`.`sort` asc
So, what do you think about the basic query? In particular, won't the subquery hurt performance on a large table? Is it possible to write a better and simpler query, or to change the structure?
Edit: this is sqlfiddle example of my code and structure http://sqlfiddle.com/#!9/ed9d4b/1
I would change it to use "not exists" instead of "not in" and would use aliases so I could pull it off like so:
select
b.`title` as `custom_title`,
b.`cover` as `custom_cover`,
b.`group_id`,
b.`user_id`,
a.*
from
`post` a
inner join `post_rel` b on a.`id` = b.`post_id`
where
a.`status` = 1
and b.`status` = 1
and (
b.`user_id` = 160
or (
b.`group_id` = 7
and not exists (
select
'x'
from
`post_rel` c
where
c.`user_id` = 160 and c.`post_id`=b.`post_id`
)
)
)
order by
b.`sort` asc
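As a quick check that the two forms agree on the sample data, here is a sketch using Python with an in-memory SQLite database (an assumption for portability; the tables are trimmed to the columns the filter needs). It also adds one hypothetical post_rel row with a NULL post_id, which is not in the question, to show a case where NOT IN and NOT EXISTS stop agreeing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, status INTEGER);
    CREATE TABLE post_rel (id INTEGER PRIMARY KEY, group_id INTEGER,
                           user_id INTEGER, post_id INTEGER,
                           sort INTEGER, status INTEGER);
    INSERT INTO post VALUES (1, 'first', 1), (2, 'second', 1), (3, 'another', 1),
                            (4, 'four', 1), (5, 'five', 1);
    INSERT INTO post_rel VALUES
        (1, 7, NULL, 1, 1, 1), (2, NULL, 160, 1, 2, 1),
        (3, 7, NULL, 2, 6, 0), (4, NULL, 160, 2, 4, 1),
        (5, NULL, 160, 3, 3, 1), (6, 7, NULL, 4, 9, 1),
        (7, NULL, 165, 5, 5, 0);
""")

TEMPLATE = """
    SELECT b.id FROM post a
    JOIN post_rel b ON a.id = b.post_id
    WHERE a.status = 1 AND b.status = 1
      AND (b.user_id = 160 OR (b.group_id = 7 AND {guard}))
    ORDER BY b.sort
"""
NOT_IN = ("b.post_id NOT IN "
          "(SELECT c.post_id FROM post_rel c WHERE c.user_id = 160)")
NOT_EXISTS = ("NOT EXISTS (SELECT 1 FROM post_rel c "
              "WHERE c.user_id = 160 AND c.post_id = b.post_id)")

def run(guard):
    """Return the post_rel ids selected with the given group-level guard."""
    return [row[0] for row in conn.execute(TEMPLATE.format(guard=guard))]

rows_in, rows_exists = run(NOT_IN), run(NOT_EXISTS)
print(rows_in, rows_exists)

# Hypothetical extra row: a user-level entry whose post_id is NULL.
conn.execute("INSERT INTO post_rel VALUES (8, NULL, 160, NULL, 7, 1)")
rows_in_null, rows_exists_null = run(NOT_IN), run(NOT_EXISTS)
print(rows_in_null, rows_exists_null)
```

Both forms select post_rel rows 2, 5, 4 and 6 (in sort order) on the original data. Once the subquery can return a NULL, `NOT IN` evaluates to NULL for non-matching rows and the group-level row 6 disappears, while `NOT EXISTS` still returns it. If post_id is declared NOT NULL this cannot happen, but it is one more reason to prefer NOT EXISTS here.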
Typically when managing users and groups, there's this notion of an exception user who can be assigned directly to assets, just like a whole group. This seems to be an example of that.
From a modeling-only perspective, there are 2 ways to deal with that:
Ensure that every user exists in a group and that you only assign assets to groups. For the exception user, create a group. You could even enforce that every user belongs to only one group. This way your post_rel table deals with only groups. Unfortunately, the question does not describe the relationship between group and user well enough for me to weigh in properly.
The other option, driven only by the need to eliminate NULL values in favor of a cleaner model that also reduces overhead, is to use name-value pairs: the user or group id lives in a single field, with another field beside it denoting whether it is a Group or a User.
Here are the SQL Fiddles:
NOT EXISTS version: http://sqlfiddle.com/#!9/1af8cf/2
NOT IN version: http://sqlfiddle.com/#!9/1af8cf/1
Some reading on nulls https://dev.mysql.com/doc/refman/5.6/en/data-size.html
Specifically:
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.

"GROUP BY" on MariaDB behaves differently from MySQL

I have been told many times that the same queries will work on MariaDB just as they do on MySQL... until I ran into this problem.
Recently, I have been trying to clone an application from MySQL (InnoDB) to MariaDB (XtraDB).
Although MariaDB runs MySQL queries without the need of changing anything, I was surprised to discover that the same queries actually behave quite differently on both platforms particularly in ORDER BY and GROUP BY.
For example:
MyTable
=======
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 1 | 2357 | 2017-01-01 06:03:40 | Anna |
+----+----------+---------------------+-----------+
| 2 | 5480 | 2017-01-02 07:13:20 | Becky |
+----+----------+---------------------+-----------+
| 3 | 2357 | 2017-01-03 08:20:12 | Christina |
+----+----------+---------------------+-----------+
| 4 | 2357 | 2017-01-03 08:20:15 | Dorothy |
+----+----------+---------------------+-----------+
| 5 | 5480 | 2017-01-04 09:25:45 | Emma |
+----+----------+---------------------+-----------+
| 6 | 1168 | 2017-01-05 10:30:10 | Fiona |
+----+----------+---------------------+-----------+
| 7 | 5480 | 2017-01-05 10:33:23 | Gigi |
+----+----------+---------------------+-----------+
| 8 | 1168 | 2017-01-06 12:46:34 | Heidi |
+----+----------+---------------------+-----------+
| 9 | 1168 | 2017-01-06 12:46:34 | Irene |
+----+----------+---------------------+-----------+
| 10 | 2357 | 2017-01-07 14:58:37 | Jane |
+----+----------+---------------------+-----------+
| 11 | 2357 | 2017-01-07 14:58:37 | Katy |
+----+----------+---------------------+-----------+
Basically what I want to get from a query is the latest records from each GROUPing (i.e. parentId). By latest, I mean MAX(creationDate) and MAX(id)
So, for the above example, since there are only three different parentId values, I am hoping to get:
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 11 | 2357 | 2017-01-07 14:58:37 | Katy |
+----+----------+---------------------+-----------+
| 9 | 1168 | 2017-01-06 12:46:34 | Irene |
+----+----------+---------------------+-----------+
| 7 | 5480 | 2017-01-05 10:33:23 | Gigi |
+----+----------+---------------------+-----------+
Originally, the application had queries along these lines:
SELECT * FROM
( SELECT * FROM `MyTable` WHERE `parentId` IN (...)
ORDER BY `creationDate` DESC, `id` DESC ) AS `t`
GROUP BY `parentId`;
On MySQL, this works, since the inner query will order and then the outer query gets the first of each GROUP from the result of the inner query. The outer query basically obeys ordering of the inner query.
But on MariaDB, the outer query will ignore the ordering of the inner query result. I get this on MariaDB instead:
+----+----------+---------------------+-----------+
| id | parentId | creationDate | name |
+----+----------+---------------------+-----------+
| 1 | 2357 | 2017-01-01 06:03:40 | Anna |
+----+----------+---------------------+-----------+
| 2 | 5480 | 2017-01-02 07:13:20 | Becky |
+----+----------+---------------------+-----------+
| 6 | 1168 | 2017-01-05 10:30:10 | Fiona |
+----+----------+---------------------+-----------+
To achieve the same behaviour on MariaDB, I have come up with something like this. (Not sure if this is accurate though.)
SELECT `t1`.* FROM `MyTable` `t1` LEFT JOIN `MyTable` `t2` ON (
`t1`.`parentId` = `t2`.`parentId`
AND `t2`.`parentId` IN (...)
AND `t1`.`creationDate` <= `t2`.`creationDate`
AND `t1`.`id` < `t2`.`id`
) WHERE `t2`.`id` IS NULL;
Now the problem is... if I am going to rewrite the queries, I have to rewrite hundreds of them... and they are all somehow a little bit different from each other.
I wonder if anyone here has any ideas that would allow me to make the fewest changes possible.
Thank you all in advance.
Yeah, this is a link-only answer. But the links are to the MariaDB site.
Here is another discussion of the 'incompatibility': https://mariadb.com/kb/en/mariadb/group-by-trick-has-been-optimized-away/
Technically speaking, MySQL implemented an extension to the ANSI standard. Much later, it decided to remove it, so I think you will find that MySQL has migrated toward MariaDB.
Here is list of "fast" ways to do group-wise max, which is probably what you are trying to do: https://mariadb.com/kb/en/mariadb/groupwise-max-in-mariadb/
Your first query would probably work in MySQL but its behavior is not documented: you are grouping by parentId but you are selecting non-aggregated columns with *, and the value of any of those non-aggregated columns is undefined. If the value you get is the first value encountered, it's just a "matter of luck".
It is true that, even if it cannot be considered correct, on MySQL I have never seen this "trick" fail (and here on stackoverflow there are plenty of upvoted answers suggesting you to use this trick), but MariaDB uses a different optimization engine and you cannot rely on MySQL undocumented behavior.
Your second query needs a little adjustment:
and (
`t1`.`creationDate` < `t2`.`creationDate`
or (
`t1`.`creationDate` = `t2`.`creationDate`
and `t1`.`id` < `t2`.`id`
)
)
because first you are ordering by creation date, then if more than one record share the same creation date you are getting the one with the highest id.
There are other ways to write the same query, e.g.
select * from mytable
where id in (
select max(m.id)
from mytable m inner join (
select parentID, max(creationDate) as max_cd
from mytable
group by ParentID
) t on m.parentID = t.parentID and m.creationDate = t.max_cd
group by m.parentID, m.creationDate
)
but every query needs to be rewritten separately.
Edit
Your example is a little more complicated because you are ordering by both creationDate and id. Let me explain better. First thing to do, for every parentID you have to get the last creationDate:
select parentID, max(creationDate) as max_cd
from MyTable
group by parentID
then for every max creationDate you have to get the highest id:
select t.parentID, t1.max_cd, max(t.id) as max_id
from
MyTable t inner join (
select parentID, max(creationDate) as max_cd
from MyTable
group by parentID
) t1 on t.parentID = t1.parentID and t.creationDate = t1.max_cd
group by t.parentID, t1.max_cd
then you have to get all records where the id are returned by this query. In this particular context a LEFT JOIN with the table itself should be easier to write and more performant.
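Here is a small sketch of the LEFT JOIN approach with the adjusted condition, run from Python against an in-memory SQLite database (an assumption; it is only there to make the example self-contained, the SQL itself uses nothing engine-specific). Row 12 is hypothetical and not part of the question's data: it has the highest id but an early creationDate, which is the case where the asker's original `<=` / `<` condition and the adjusted one disagree.

```python
import sqlite3

rows = [
    (1, 2357, "2017-01-01 06:03:40", "Anna"),
    (2, 5480, "2017-01-02 07:13:20", "Becky"),
    (3, 2357, "2017-01-03 08:20:12", "Christina"),
    (4, 2357, "2017-01-03 08:20:15", "Dorothy"),
    (5, 5480, "2017-01-04 09:25:45", "Emma"),
    (6, 1168, "2017-01-05 10:30:10", "Fiona"),
    (7, 5480, "2017-01-05 10:33:23", "Gigi"),
    (8, 1168, "2017-01-06 12:46:34", "Heidi"),
    (9, 1168, "2017-01-06 12:46:34", "Irene"),
    (10, 2357, "2017-01-07 14:58:37", "Jane"),
    (11, 2357, "2017-01-07 14:58:37", "Katy"),
    (12, 1168, "2017-01-02 00:00:00", "Hypothetical"),  # not in the question
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, parentId INTEGER, "
             "creationDate TEXT, name TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)

QUERY = """
    SELECT t1.id FROM MyTable t1
    LEFT JOIN MyTable t2
      ON t1.parentId = t2.parentId AND ({newer})
    WHERE t2.id IS NULL          -- no "newer" row exists for t1
    ORDER BY t1.id DESC
"""
# Condition from the question: both comparisons must hold at once.
ORIGINAL = "t1.creationDate <= t2.creationDate AND t1.id < t2.id"
# Adjusted condition: later date wins, id only breaks ties.
ADJUSTED = ("t1.creationDate < t2.creationDate OR "
            "(t1.creationDate = t2.creationDate AND t1.id < t2.id)")

def latest_ids(newer):
    """Return the ids that have no newer sibling under the given condition."""
    return [row[0] for row in conn.execute(QUERY.format(newer=newer))]

original_ids = latest_ids(ORIGINAL)
adjusted_ids = latest_ids(ADJUSTED)
print(original_ids, adjusted_ids)
```

The adjusted condition returns ids 11, 9 and 7, one row per parentId as wanted. The original condition also returns row 12, because no other row is both at least as recent and higher in id, so parentId 1168 shows up twice.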

Calculating row indices with subquery having joins, results in A*B examined rows

This question is derived from one I started previously: Incorrect row index when grouping
Since it is a different kind of problem, I'm asking here and will post the answer back there once I have resolved this issue.
I thought about subqueries, and came up with this:
SELECT
mq.*,
@indexer := @indexer + 1 AS indexer
FROM
(
SELECT
p.id,
p.tag_id,
p.title,
p.created_at
FROM
`posts` AS p
LEFT JOIN
`votes` AS v
ON p.id = v.votable_id
AND v.votable_type = "Post"
AND v.deleted_at IS NULL
WHERE
p.deleted_at IS NULL
GROUP BY
p.id
) AS mq
JOIN
(SELECT @indexer := 0) AS i
Which actually works, I get the desired result:
+----+--------+------------------------------------+---------------------+---------+
| id | tag_id | title | created_at | indexer |
+----+--------+------------------------------------+---------------------+---------+
| 2 | 2 | PostPostPost | 2014-10-23 23:53:15 | 1 |
| 3 | 3 | Title | 2014-10-23 23:56:13 | 2 |
| 4 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 3 |
| 5 | 2 | GIFGIFIGIIF | 2014-10-23 23:59:03 | 4 |
| 6 | 4 | My new avatar | 2014-10-26 22:22:30 | 5 |
| 7 | 5 | Hi, haiii, oh Hey ! | 2014-10-26 22:38:10 | 6 |
| 8 | 6 | Mclaren testing stealth technology | 2014-10-26 22:44:15 | 7 |
| 9 | 7 | Just random thoughts while pooping | 2014-10-26 22:50:03 | 8 |
+----+--------+------------------------------------+---------------------+---------+
The problem now is... I ran an EXPLAIN query to see how fast it works, and there is a number in it that is really bugging me:
Well, the number is obvious: 252 * 1663 = 419076.
This worries me, though: is that row count normal, or do I have to optimize the query? And if so, how do I optimize this one?
As of MySQL version 5.7 all joins are treated as nested loop joins.
MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on.
So to answer your question... no, you won't be able to get that row count down. However, by adding indexes to your join columns you may be able to achieve faster results but your row count will be the same.
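To illustrate where a product like 252 * 1663 comes from, here is a toy model of a nested-loop join in Python. It is not MySQL's executor: the two row counts are taken from the question, the data is made up, and a dict stands in for an index on the join column. What EXPLAIN reports for the real query depends on the optimizer and on the indexes that actually exist.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key_outer, key_inner):
    """Plain nested loop: every outer row is compared with every inner row."""
    examined, matches = 0, []
    for o in outer:
        for i in inner:
            examined += 1
            if o[key_outer] == i[key_inner]:
                matches.append((o, i))
    return matches, examined

def indexed_join(outer, inner, key_outer, key_inner):
    """Same join, but inner rows are found through a lookup structure."""
    index = defaultdict(list)  # stand-in for an index on the join column
    for i in inner:
        index[i[key_inner]].append(i)
    examined, matches = 0, []
    for o in outer:
        for i in index.get(o[key_outer], []):
            examined += 1
            matches.append((o, i))
    return matches, examined

# Hypothetical data sized like the question's EXPLAIN output.
posts = [{"id": n} for n in range(1, 253)]                    # 252 rows
votes = [{"votable_id": (n % 252) + 1} for n in range(1663)]  # 1663 rows

plain_matches, plain_examined = nested_loop_join(posts, votes, "id", "votable_id")
index_matches, index_examined = indexed_join(posts, votes, "id", "votable_id")
print(plain_examined, index_examined, len(plain_matches))
```

In this model both joins produce the same 1663 matches, but the plain loop looks at 419076 row pairs while the lookup version only touches the rows that match. That is the kind of saving an index on the join columns is meant to give.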

MySQL: optimize query for scoring calculation

I have a data table that I use to do some calculations. The resulting data set after calculations looks like:
+------------+-----------+------+----------+
| id_process | id_region | type | result |
+------------+-----------+------+----------+
| 1 | 4 | 1 | 65.2174 |
| 1 | 5 | 1 | 78.7419 |
| 1 | 6 | 1 | 95.2308 |
| 1 | 4 | 1 | 25.0000 |
| 1 | 7 | 1 | 100.0000 |
+------------+-----------+------+----------+
On the other hand, I have another table that contains a set of ranges used to classify the calculation results. The ranges table looks like:
+----------+-------+-----+--------+
| id_level | start | end | status |
+----------+-------+-----+--------+
| 1        | 0     | 75  | Danger |
| 2        | 76    | 90  | Alert  |
| 3        | 91    | 100 | Good   |
+----------+-------+-----+--------+
I need a query that adds the corresponding 'status' column to each value when doing the calculations. Currently, I can do that by adding the following field to the calculation query:
select
...,
...,
[math formula] as result,
(select status
from ranges r
where result between r.start and r.end) status
from ...
where ...
It works OK. But when I have a lot of rows (more than 200K), the calculation query becomes slow.
My question is: is there some way to find that 'status' value without that subquery?
Has anyone worked on something similar before?
Thanks
Yes, you are looking for a subquery and join:
select s.*, r.status
from (<your query here>
) s left outer join
ranges r
on s.result between r.start and r.end
Explicit joins often optimize better than nested select. In this case, though, the ranges table seems pretty small, so this may not be the performance issue.
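A minimal sketch of that join on the sample rows, run from Python against an in-memory SQLite database (an assumption made for the sake of a runnable example). The results table is a stand-in for the asker's calculation query, and `start` and `end` are quoted because END is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "results" stands in for the output of the calculation query
    CREATE TABLE results (id_process INTEGER, id_region INTEGER,
                          type INTEGER, result REAL);
    CREATE TABLE ranges (id_level INTEGER, "start" REAL, "end" REAL, status TEXT);
    INSERT INTO results VALUES
        (1, 4, 1, 65.2174), (1, 5, 1, 78.7419), (1, 6, 1, 95.2308),
        (1, 4, 1, 25.0), (1, 7, 1, 100.0);
    INSERT INTO ranges VALUES
        (1, 0, 75, 'Danger'), (2, 76, 90, 'Alert'), (3, 91, 100, 'Good');
""")

classified = conn.execute("""
    SELECT s.result, r.status
    FROM (SELECT * FROM results) s
    LEFT JOIN ranges r ON s.result BETWEEN r."start" AND r."end"
    ORDER BY s.result
""").fetchall()
print(classified)

# Hypothetical value that falls between two ranges (75 < 75.5 < 76).
in_gap = conn.execute("""
    SELECT r.status
    FROM (SELECT 75.5 AS result) s
    LEFT JOIN ranges r ON s.result BETWEEN r."start" AND r."end"
""").fetchall()
print(in_gap)
```

The five sample results come back as Danger, Danger, Alert, Good, Good in ascending order of result. Because the ranges are stored with integer bounds, a fractional result such as 75.5 matches no range and the LEFT JOIN returns a NULL status for it, which may or may not be what you want.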

In MYSQL, how do I get a LEFT JOIN to return every row in one table, and a flag if there were any matching rows in another table?

Basically, I have two tables, admin_privilege and admin_roll_privilege. I'm trying to write a query to get every row from admin_privilege, and if there is a row in admin_roll_privilege with a matching admin_privilege_id AND a matching admin_roll_id, to set a new column to 1. So far, I have this:
SELECT ap.*,
IF(arp.admin_privilege_id IS NULL,0,1) AS has_privilege
FROM admin_privilege ap LEFT JOIN admin_roll_privilege arp
ON ap.admin_privilege_id=arp.admin_privilege_id
WHERE arp.admin_roll_id=3
OR arp.admin_roll_id IS NULL;
This works in every case except where there are no matching rows in admin_roll_privilege.
See Example:
+---------------+--------------------+
| admin_roll_id | admin_privilege_id |
+---------------+--------------------+
| 1 | 2 |
| 1 | 3 |
+---------------+--------------------+
+--------------------+------------------------+
| admin_privilege_id | admin_privilege_name |
+--------------------+------------------------+
| 1 | Access Developer Tools |
| 4 | Edit System Settings |
| 2 | Edit User Profiles |
| 3 | Resolve Challenges |
+--------------------+------------------------+
Querying with admin_roll_id=1 works as expected:
+--------------------+------------------------+---------------+
| admin_privilege_id | admin_privilege_name | has_privilege |
+--------------------+------------------------+---------------+
| 1 | Access Developer Tools | 0 |
| 4 | Edit System Settings | 0 |
| 2 | Edit User Profiles | 1 |
| 3 | Resolve Challenges | 1 |
+--------------------+------------------------+---------------+
But if I query for admin_roll_id=3, I only get two rows returned:
+--------------------+------------------------+---------------+
| admin_privilege_id | admin_privilege_name | has_privilege |
+--------------------+------------------------+---------------+
| 1 | Access Developer Tools | 0 |
| 4 | Edit System Settings | 0 |
+--------------------+------------------------+---------------+
How can I get this query to return all 4?
Edit: This is what ended up working, moving the condition to the on clause:
SELECT ap.*,
IF(arp.admin_privilege_id IS NULL,0,1) AS has_privilege
FROM admin_privilege ap LEFT JOIN admin_roll_privilege arp
ON (ap.admin_privilege_id=arp.admin_privilege_id AND arp.admin_roll_id=1)
Move the appropriate conditions from the WHERE clause to the ON clause.
You are not returning all rows because the WHERE clause applies to the entire statement.
Turn the LEFT JOIN into a subselect to which you can add the WHERE clause you need.
SELECT ap.admin_privilege_id
, ap.admin_privilege_name
, IF(arp.admin_privilege_id IS NULL,0,1) AS has_privilege
FROM admin_privilege ap
LEFT OUTER JOIN (
SELECT admin_privilege_id
FROM admin_roll_privilege arp
WHERE arp.admin_roll_id = 3
) arp ON arp.admin_privilege_id = ap.admin_privilege_id
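For comparison, here is a sketch that runs the WHERE version and the ON version side by side on the sample data. It uses Python with an in-memory SQLite database (an assumption for portability), and since SQLite has no IF() function, the flag is written as a CASE expression, which MySQL accepts as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE admin_privilege (admin_privilege_id INTEGER PRIMARY KEY,
                                  admin_privilege_name TEXT);
    CREATE TABLE admin_roll_privilege (admin_roll_id INTEGER,
                                       admin_privilege_id INTEGER);
    INSERT INTO admin_privilege VALUES
        (1, 'Access Developer Tools'), (4, 'Edit System Settings'),
        (2, 'Edit User Profiles'), (3, 'Resolve Challenges');
    INSERT INTO admin_roll_privilege VALUES (1, 2), (1, 3);
""")

# CASE replaces MySQL's IF(arp.admin_privilege_id IS NULL, 0, 1).
FLAG = "CASE WHEN arp.admin_privilege_id IS NULL THEN 0 ELSE 1 END"

def with_where(roll_id):
    """Roll filter in WHERE: applied after the join, so it can drop rows."""
    return conn.execute(f"""
        SELECT ap.admin_privilege_id, {FLAG} AS has_privilege
        FROM admin_privilege ap
        LEFT JOIN admin_roll_privilege arp
          ON ap.admin_privilege_id = arp.admin_privilege_id
        WHERE arp.admin_roll_id = ? OR arp.admin_roll_id IS NULL
        ORDER BY ap.admin_privilege_id
    """, (roll_id,)).fetchall()

def with_on(roll_id):
    """Roll filter in ON: only decides which arp rows match, keeps every ap row."""
    return conn.execute(f"""
        SELECT ap.admin_privilege_id, {FLAG} AS has_privilege
        FROM admin_privilege ap
        LEFT JOIN admin_roll_privilege arp
          ON ap.admin_privilege_id = arp.admin_privilege_id
         AND arp.admin_roll_id = ?
        ORDER BY ap.admin_privilege_id
    """, (roll_id,)).fetchall()

print(with_where(3))
print(with_on(3))
print(with_on(1))
```

For roll 3 the WHERE version returns only privileges 1 and 4, because privileges 2 and 3 did join to a row (for roll 1) and that row then fails the filter. The ON version returns all four privileges with has_privilege = 0, and for roll 1 it flags privileges 2 and 3 as before.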