SELECT grouping by value in field

SELECT grouping by value in field - mysql

Given the following (greatly simplified) example table:
CREATE TABLE `permissions` (
`name` varchar(64) NOT NULL DEFAULT '',
`access` enum('read_only','read_write') NOT NULL DEFAULT 'read_only'
);
And the following example contents:
| name | access |
=====================
| foo | read_only |
| foo | read_write |
| bar | read_only |
What I want to do is run a SELECT query that fetches one row for each unique value in name, favouring those with an access value of read_write. Is there a way that this can be done? That is, the results I would get are:
foo | read_write |
bar | read_only |
I may need to add new options to the access column in future, but they will always be in order of importance (lowest to highest) so, if possible, a solution that can cope with this would be especially useful.
Also, to clarify, my actual table includes other fields than these, which is why I'm not using a unique key on the name column; there will be multiple rows by name by design to suit various criteria.

The following will work on your data:
select name, max(access)
from permissions
group by name;
However, this orders by the string values, not the indexes. Here is another method:
select name,
substring_index(group_concat(access order by access desc), ',', 1) as access
from permissions
group by name;
It is rather funky that order by goes by the index but min() and max() use the character value. Some might even call that a bug.

You can create another table with the priority of the access (so you can add new options), and then group by and find the MIN() value of the priority table:
E.g. create a table called Priority with the values
| PriorityID| access |
========================
| 1 | read_write |
| 2 | read_only |
And then,
SELECT A.Name, B.Access
FROM (
SELECT A.name, MIN(B.PriorityID) AS Most_Valued_Option -- This will be 1 if there is a read_write for that name
FROM permissions A
INNER JOIN Priority B
ON A.Access = B.Access
GROUP BY A.Name ) A
INNER JOIN Priority B
ON A.Most_Valued_Option = B.PriorityID
-- Join that ID with the actual access
-- (and we will select the value of the access in the select statement)
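As a rough, runnable sketch of the priority-table approach above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made purely so the example is self-contained; the SQL itself is plain joins and GROUP BY). The table contents mirror the example data from the question, and the inner aliases p and pr are renamed here only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE permissions (name TEXT NOT NULL, access TEXT NOT NULL);
    INSERT INTO permissions VALUES
        ('foo', 'read_only'), ('foo', 'read_write'), ('bar', 'read_only');

    -- Lower PriorityID means a more preferred access level.
    CREATE TABLE Priority (PriorityID INTEGER PRIMARY KEY, access TEXT NOT NULL);
    INSERT INTO Priority VALUES (1, 'read_write'), (2, 'read_only');
""")

rows = conn.execute("""
    SELECT A.name, B.access
    FROM (
        -- Best (smallest) priority id found for each name
        SELECT p.name, MIN(pr.PriorityID) AS Most_Valued_Option
        FROM permissions p
        INNER JOIN Priority pr ON p.access = pr.access
        GROUP BY p.name
    ) A
    -- Translate the winning id back into its access value
    INNER JOIN Priority B ON A.Most_Valued_Option = B.PriorityID
    ORDER BY A.name
""").fetchall()

print(rows)
```

Adding a new access level later then only needs a new row in Priority, not a change to the query.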

The solution proposed by Gordon is sufficient for the current requirements.
If we anticipate a future requirement for a priority order to be other than alphabetical string order (or by enum index value)...
As a modified version of Gordon's answer, I would be tempted to use the MySQL FIELD function and (its converse) ELT function, something like this:
SELECT p.name
, ELT(
MIN(
FIELD(p.access
,'read_only','read_write','read_some'
)
)
,'read_only','read_write','read_some'
) AS access
FROM `permissions` p
GROUP BY p.name
If the specification is to pull the entire row, and not just the value of the access column, we could use an inline view query to find the preferred access, and a join back to the permissions table to pull the whole row...
SELECT p.*
FROM ( -- inline view, to get the highest priority value of access
SELECT r.name
, MIN(FIELD(r.access,'read_only','read_write','read_some')) AS ax
FROM `permissions` r
GROUP BY r.name
) q
JOIN `permissions` p
ON p.name = q.name
AND p.access = ELT(q.ax,'read_only','read_write','read_some')
Note that this query returns not just the access with the highest priority, but can also return any columns from that row.
With the FIELD and ELT functions, we can implement any ad-hoc ordering of a list of specific, known values. Not just alphabetic ordering, or ordering by the enum index value.
That logic for "priority" can be contained within the query, and won't rely on an extra column(s) in the permissions table, or the contents of any other table(s).
To get the behavior we are looking for, just specifying a priority for access, the "list of the values" used in the FIELD function will need to match the "list of values" in the ELT function, in the same order, and the lists should include all possible values of access.
Reference:
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_elt
http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_field
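To make the FIELD/ELT round trip easier to follow, here is a small Python model of the two functions. It is only a sketch of their documented behaviour (FIELD returns a 1-based position or 0, ELT returns the n-th value or NULL) and ignores details such as collation; the helper names field and elt and the sample rows are made up for illustration.

```python
def field(value, *candidates):
    """Model of MySQL FIELD(): 1-based position of value, or 0 if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if value == candidate:
            return position
    return 0


def elt(n, *candidates):
    """Model of MySQL ELT(): the n-th candidate, or None (NULL) if out of range."""
    if 1 <= n <= len(candidates):
        return candidates[n - 1]
    return None


# Same list, same order, in both calls; the first entry wins under MIN().
PRIORITY = ("read_only", "read_write", "read_some")

# Hypothetical rows of (name, access)
rows = [
    ("foo", "read_only"), ("foo", "read_write"),
    ("bar", "read_write"), ("bar", "read_some"),
]

# Equivalent of MIN(FIELD(access, ...)) ... GROUP BY name
best = {}
for name, access in rows:
    rank = field(access, *PRIORITY)
    best[name] = min(best.get(name, rank), rank)

# Equivalent of ELT(<min rank>, ...)
result = {name: elt(rank, *PRIORITY) for name, rank in best.items()}
print(result)

# A value missing from the list ranks as 0, which MIN() would pick first
# and which ELT() then maps to NULL.
unlisted = elt(field("admin", *PRIORITY), *PRIORITY)
print(unlisted)
```

Because MIN picks the smallest position, whichever value is listed first is the one favoured; to prefer read_write, it would go first in both lists.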
ADVANCED USAGE
Not that you have a requirement to do this, but considering possible future requirements... we note that...
A different order of the "list of values" will result in a different ordering of priority of access. So a variety of queries could each implement their own different rules for the "priority". Which access value to look for first, second and so on, by reordering the complete "list of values".
Beyond just reordering, it is also possible to omit a possible value from the "list of values" in the FIELD and ELT functions. Consider for example, omitting the 'read_only' value from the list on this line:
, MIN(FIELD(r.access,'read_write','read_some')) AS ax
and from this line:
AND p.access = ELT(q.ax,'read_write','read_some')
That will effectively limit the name rows returned to only those names that have an access value of 'read_write' or 'read_some'. Put another way, a name that has only a 'read_only' for access will not be returned by the query.
Other modifications to the "list of values", where the lists don't "match" are also possible, to implement even more powerful rules. For example, we could exclude a name that has a row with 'read_only'.
For example, in the ELT function, in place of the 'read_only' value, we use a value that we know does not (and cannot) exist on any rows. To illustrate,
we can include 'read_only' as the "highest priority" on this line...
, MIN(FIELD(r.access,'read_only','read_write','read_some')) AS ax
^^^^^^^^^^^
so if a row with 'read_only' is found, that will take priority. But in the ELT function in the outer query, we can translate that back to a different value...
AND p.access = ELT(q.ax,'eXcluDe','read_write','read_some')
^^^^^^^^^
If we know that 'eXcluDe' doesn't exist in the access column, we have effectively excluded any name which has a 'read_only' row, even if there is a 'read_write' row.
Not that you have a specification or current requirement to do any of that. Something to keep in mind for future queries that do have these kinds of requirements.

You can use distinct statement (or Group by)
SELECT distinct name, access
FROM tab;

This works too:
SELECT name, MAX(access)
FROM permissions
GROUP BY name ORDER BY MAX(access) desc

Related

SSRS 2008 need to return records with actual NULL value in a field

I have a report that has a parameter where I need to be able to have the actual value of NULL as an option to be selected and to actually have results returned but can't figure it out.
Some background (I'm oversimplifying here to make it more clear):
table 1: Major_Grp, with 2 fields: Major_grp (ID) and Major_desc
(description of group)
table 2: Order_Line (order line details) with
many fields, one of which is Major_Grp (linking the ID to table
Major_Grp)
Table Major_Grp has 5 values, C,P,R,S and W with corresponding descriptions of Construction, Plants, Retail, Seed and Wholesale. However, a record in Order_Line very well could have NULL as the value in Major_Grp. I need to be able to select JUST the records with NULL by themselves, and NOT have them included if I choose either C,P,S,R or W.
The parameter for Major_Grp is set to Allow Nulls (no blanks, and no multiples) and uses a query in the MajorGrp dataset to load the available and default values. In my MajorGrp dataset, I have the following: select distinct major_grp, major_grp_desc from major_grp
UNION select NULL,'Prelim'
When I open the dropdown, I see "Construction", "Plants", "Retail", "Seed", "Wholesale" and "Prelim". The report works when I choose one of the real values (i.e. "Plants") but if I choose "Prelim" and want to get ONLY the records that are really NULL, nothing is returned. I've tried Select '(NULL)' and 'NULL' as well with no luck.
In my dataset to retrieve the Order_Line (ol) data, if I use (ol.major_grp IN (#MajorGrp) OR ol.major_grp IS NULL), I get the NULL records every time along with the real group I chose; if I use just (ol.major_grp IN (#MajorGrp)), I get the Order_Line records that have a real value when I choose a single real value, but if I choose "Prelim" to get just the NULLs, none are returned.
This is NOT a case where I want NULL to get everything. I literally want to get either the Order_Line records that have a real MajorGrp in them (C,P,R,S or W) or the ones where it is NULL.

You're really close.
Instead of (ol.major_grp IN (#MajorGrp) OR ol.major_grp IS NULL) and ...
try (ol.major_grp IN (#MajorGrp) OR (ol.major_grp IS NULL and #MajorGrp IS NULL)) and ...
This should work if you are really getting a null through with your parameter. Sometimes this can be hard to control though, so sometimes I cheat a little and use a string such as "<No Value>" instead of NULL. Then I could do something like this:
(ISNULL(ol.major_grp, '<No Value>') IN (#MajorGrp)) and ...
This, of course, will not work if "<No Value>" is a valid value for your #majorgrp.
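The first predicate above can be tried out in isolation. This is a minimal sketch using Python's sqlite3 module rather than SQL Server and SSRS, so the named parameter :grp stands in for the report parameter, and the id column and sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Order_Line (id INTEGER PRIMARY KEY, major_grp TEXT);
    INSERT INTO Order_Line (id, major_grp) VALUES
        (1, 'C'), (2, 'P'), (3, NULL), (4, 'P'), (5, NULL);
""")

# NULL rows are returned only when the parameter itself is NULL.
QUERY = """
    SELECT id
    FROM Order_Line ol
    WHERE ol.major_grp IN (:grp)
       OR (ol.major_grp IS NULL AND :grp IS NULL)
    ORDER BY id
"""


def order_ids(grp):
    """Return the ids matching one parameter value (None plays the NULL option)."""
    return [row[0] for row in conn.execute(QUERY, {"grp": grp})]


plants = order_ids("P")    # a real group: no NULL rows mixed in
prelim = order_ids(None)   # the NULL option: only the NULL rows
print(plants, prelim)
```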

SQL Query sorting rows by duplicate name keeping lowest in result

I've got a table with 11 columns and I want to create a query that removes the rows with duplicate names in the Full Name's column but keeps the row with the lowest value in the Result's column. Currently I have this.
SELECT
MIN(sql363686.Results2014.Result),
sql363686.Results2014.Temp,
sql363686.Results2014.Full Name,
sql363686.Results2014.Province,
sql363686.Results2014.BirthDate,
sql363686.Results2014.Position,
sql363686.Results2014.Location,
sql363686.Results2014.Date
FROM
sql363686.Results2014
WHERE
sql363686.Results2014.Event = '50m Freestyle'
AND sql363686.Results2014.Gender = 'M'
AND sql363686.Results2014.Agegroup = 'Junior'
GROUP BY
sql363686.Results2014.Full Name
ORDER BY
sql363686.Results2014.Result ASC ;
At first glance it seems to work fine and I get all the correct values, but I seem to be getting a different (wrong) value in the Position column than what I have in my database table. All other values seem to be right. Any ideas on what I'm doing wrong?
I'm currently using dbVisualizer connected to a mysql database. Also, my knowledge and experience with sql is the bare minimum.

Use group by and a join:
select r.*
from sql363686.Results2014 r join
(select fullname, min(result) as minresult
from sql363686.Results2014
group by fullname
) rr
on rr.fullname = r.fullname and rr.minresult = r.result;

You have fallen into the trap of the nonstandard MySQL extension to GROUP BY.
(I'm not going to work with all those fully qualified column names; it's unnecessary and verbose.)
I think you're looking for each swimmer's best time in a particular event, and you're trying to pull that from a so-called denormalized table. It looks like your table has these columns.
Result
Temp
FullName
Province
BirthDate
Position
Location
Date
Event
Gender
Agegroup
So, the first step is to locate the best time in each event for each swimmer. To do this we need to make a couple of assumptions.
A person is uniquely identified by FullName, BirthDate, and Gender.
An event is uniquely identified by Event, Gender, Agegroup.
This subquery will get the best time for each swimmer in each event.
SELECT MIN(Result) BestResult,
FullName,BirthDate, Gender,
Event, Agegroup
FROM Results2014
GROUP BY FullName,BirthDate, Gender, Event, Agegroup
This gets you a virtual table with each person's fastest result in each event (using the definitions of person and event mentioned earlier).
Now the challenge is to go find out the circumstances of each person's best time. Those circumstances include Temp, Province, Position, Location, Date. We'll do that with a JOIN between the original table and our virtual table, like this
SELECT resu.Event,
resu.Gender,
resu.Agegroup,
resu.Result,
resu.Temp,
resu.FullName,
resu.Province,
resu.BirthDate,
resu.Position,
resu.Location,
resu.Date
FROM Results2014 resu
JOIN (
SELECT MIN(Result) BestResult,
FullName,BirthDate, Gender,
Event, Agegroup
FROM Results2014
GROUP BY FullName,BirthDate, Gender, Event, Agegroup
) best
ON resu.Result = best.BestResult
AND resu.FullName = best.FullName
AND resu.BirthDate = best.BirthDate
AND resu.Gender = best.Gender
AND resu.Event = best.Event
AND resu.Agegroup = best.Agegroup
ORDER BY resu.Agegroup, resu.Gender, resu.Event, resu.FullName, resu.BirthDate
Do you see how this works? You need an aggregate query that pulls the best times. Then you need to use the column values in that aggregate query in the ON clause to go get the details of the best times from the detail table.
If you want to report on just one event you can include an appropriate WHERE clause right before ORDER BY as follows.
WHERE resu.Event = '50m Freestyle'
AND resu.Gender = 'M'
AND resu.Agegroup = 'Junior'
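Here is a cut-down, runnable version of the aggregate-then-join pattern described above, using Python's sqlite3 module in place of MySQL. The swimmers, times and the reduced column list are invented for illustration; the point is only that Position comes from the same row as the best Result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Results2014 (
        FullName TEXT, BirthDate TEXT, Gender TEXT,
        Event TEXT, Agegroup TEXT, Result REAL, Position INTEGER
    );
    -- Hypothetical data: Al Smith swam the same event twice.
    INSERT INTO Results2014 VALUES
        ('Al Smith', '2000-01-01', 'M', '50m Freestyle', 'Junior', 31.2, 4),
        ('Al Smith', '2000-01-01', 'M', '50m Freestyle', 'Junior', 29.8, 1),
        ('Bo Chan',  '2001-05-05', 'M', '50m Freestyle', 'Junior', 30.5, 2);
""")

best_rows = conn.execute("""
    SELECT resu.FullName, resu.Result, resu.Position
    FROM Results2014 resu
    JOIN (
        -- One row per swimmer and event, carrying only the best time
        SELECT MIN(Result) AS BestResult,
               FullName, BirthDate, Gender, Event, Agegroup
        FROM Results2014
        GROUP BY FullName, BirthDate, Gender, Event, Agegroup
    ) best
      ON resu.Result = best.BestResult
     AND resu.FullName = best.FullName
     AND resu.BirthDate = best.BirthDate
     AND resu.Gender = best.Gender
     AND resu.Event = best.Event
     AND resu.Agegroup = best.Agegroup
    WHERE resu.Event = '50m Freestyle'
      AND resu.Gender = 'M'
      AND resu.Agegroup = 'Junior'
    ORDER BY resu.Result
""").fetchall()

print(best_rows)
```

If a swimmer ties their own best time, this join returns both rows, which may or may not be what a report wants.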

Empty set returned when asking for items not in empty set

I have a query that is behaving in ways I would otherwise not expect.
I have two tables, staging_users and users. In both tables I have a column called name. In the users table, EVERY value for name is NULL. In staging_users I have 13 rows that do not have a NULL value. I am trying to run a query where I get all users in the staging table whose name is not in the users table.
My query as written is:
SELECT name
FROM staging_users
WHERE name NOT IN (SELECT name FROM users);
As the query is written, I get NO results back. What is the reason for this behavior?
As the users table only has NULL values I know I could say WHERE name IS NOT NULL and I would get the same results, but I want this query to work against the values in the table, which all happen to be NULL.

From the docs:
To comply with the SQL standard, IN returns NULL not only if the expression on the left hand side is NULL, but also if no match is found in the list and one of the expressions in the list is NULL.
And
expr NOT IN (value,...) is the same as NOT (expr IN (value,...)).
Thus as you are SELECTing NULL values.. NOT IN returns NULL.. so no rows match.
You could rectify as so:
SELECT name
FROM staging_users
WHERE name IS NOT NULL
AND name NOT IN (
SELECT name
FROM users
WHERE name IS NOT NULL
);
Or, the same logic as an anti-join:
SELECT su.name
FROM staging_users su
LEFT JOIN users u
ON u.name = su.name
WHERE su.name IS NOT NULL
AND u.name IS NULL;
As an extra note, I would seriously question a data structure that allows users to have NULL names.. your original query will work if this is changed.

NOT EXISTS is usually a better option for this type of query. NOT IN tends to be inherently unsafe when working with potentially NULL values. See the following link for more details.
https://dba.stackexchange.com/questions/17407/why-does-not-in-with-a-set-containing-null-always-return-false-null
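A short sketch of the difference, using Python's sqlite3 module as a stand-in database (it applies the same three-valued logic to NOT IN). The two staging names are invented; the users table holds only NULL names, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT);
    CREATE TABLE staging_users (name TEXT);
    INSERT INTO users VALUES (NULL), (NULL);
    INSERT INTO staging_users VALUES ('alice'), ('bob');
""")

# NOT IN against a list containing NULL is never TRUE, so nothing matches.
not_in = conn.execute("""
    SELECT name
    FROM staging_users
    WHERE name NOT IN (SELECT name FROM users)
    ORDER BY name
""").fetchall()

# NOT EXISTS only asks whether a matching row was found; NULLs never match.
not_exists = conn.execute("""
    SELECT su.name
    FROM staging_users su
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.name = su.name)
    ORDER BY su.name
""").fetchall()

print(not_in)
print(not_exists)
```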

Storing boolean options in SQL

I am writing code (using MySQL) to solve a problem similar to the following:
There are 20 boolean options (per every user).
Should I store 20 ENUM('false','true') or put into a table only IDs of these options which are true (so probably having less than 20 rows per user)?

If new options are likely to appear and you don't filter by the options, you may as well go with an EAV structure (a record per option).
This way, you can add new options more easily (no change to metadata).
Assuming that the options values are either TRUE or FALSE (no NULL possible), you should create records only for non-default option values (TRUE in your case). An absence of the record would mean false.
To retrieve all options, you could use this:
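SELECT u.id AS user_id, GROUP_CONCAT(CONCAT(o.id, ': ', ov.user IS NOT NULL) ORDER BY o.id SEPARATOR ', ') AS options -- 1 when a record exists (TRUE), 0 when it is absent (FALSE)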
FROM users u
CROSS JOIN
options o
LEFT JOIN
option_value ov
ON (ov.user, ov.option) = (u.id, o.id)
GROUP BY
u.id
, which would give you dynamic output:
user_id options
1 1: 0, 2: 1, 3: 0

I'd suggest creating an Options table with the different options.
+---Options---+
ID
Option
+---Users---+
ID
Name
+---User_Options---+
User_id
Option_id
Now if you need more options, insert them into the Options table; you don't need to alter your database this way.
EDIT: Removed condition in user_options: like Quassnoi mentioned, it would be better to just add records in case of "TRUE" and the absence of a record should be considered "FALSE"
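A minimal sketch of that three-table layout, run through Python's sqlite3 module instead of MySQL so it is self-contained. The option names and sample user are made up, and the Option column is called Option_name here as a precaution against keyword clashes. A row in User_Options means TRUE; a missing row means FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Options (ID INTEGER PRIMARY KEY, Option_name TEXT NOT NULL);
    CREATE TABLE User_Options (
        User_id   INTEGER NOT NULL REFERENCES Users(ID),
        Option_id INTEGER NOT NULL REFERENCES Options(ID),
        PRIMARY KEY (User_id, Option_id)
    );

    INSERT INTO Users VALUES (1, 'alice');
    INSERT INTO Options VALUES (1, 'show_name'), (2, 'show_photos'), (3, 'show_email');
    -- Only the TRUE options get a row.
    INSERT INTO User_Options VALUES (1, 1), (1, 3);
""")


def options_for(user_id):
    """Return {option name: bool} for one user, covering every known option."""
    rows = conn.execute("""
        SELECT o.Option_name, uo.User_id IS NOT NULL AS enabled
        FROM Options o
        LEFT JOIN User_Options uo
               ON uo.Option_id = o.ID AND uo.User_id = :user_id
        ORDER BY o.ID
    """, {"user_id": user_id})
    return {name: bool(enabled) for name, enabled in rows}


alice_options = options_for(1)
print(alice_options)
```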

I would recommend storing as a TINYINT 0 or 1. many frameworks work out of box with the TINYINT data type and handle it as a boolean.

Create 3 tables . First one is 'user_table' . It contain username and user_id. Sample data is given below
Table create script is given below
Now create another table called options_table. It contain option_name and option_id for each option. Sample is given below
Now create a third table called 'selected_options'. That table maps user to options. It contain user_id and option_id
Sample is given below
In the above example user1 selected option1 and option2
and user2 selected option1,option2 and option3 ie..option1,option2 and option3 are true for user2

I would recommend using a bitmask column. If you have numerous options, rather than creating a new column per option, you would be able to quickly perform bit-wise comparisons.
For additional info, see:
SELECT users from MySQL database by privileges bitmask?
Implement bitmask or relational ACL in PHP
Using bitmasks to indicate status
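As an illustration of the bitmask idea, here is a small sketch in Python with sqlite3 standing in for MySQL (the & operator is written the same way in both). The flag names, the users table and its options column are hypothetical.

```python
import sqlite3
from enum import IntFlag


class Option(IntFlag):
    """Each option owns one bit, so 20 options fit in a single integer."""
    SHOW_NAME = 1
    SHOW_PHOTOS = 2
    SHOW_EMAIL = 4


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, options INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (id, options) VALUES (?, ?)",
    [
        (1, int(Option.SHOW_NAME | Option.SHOW_EMAIL)),  # stored as 5
        (2, int(Option.SHOW_PHOTOS)),                    # stored as 2
        (3, 0),                                          # no options set
    ],
)


def users_with(option):
    """Return ids of users whose mask has the given bit set."""
    rows = conn.execute(
        "SELECT id FROM users WHERE (options & ?) != 0 ORDER BY id",
        (int(option),),
    )
    return [row[0] for row in rows]


with_email = users_with(Option.SHOW_EMAIL)
with_photos = users_with(Option.SHOW_PHOTOS)
print(with_email, with_photos)
```

One trade-off to keep in mind: a bitwise test like this generally cannot use an ordinary index on the column, so it suits lookups by user better than filtering large tables by option.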

Table with a lot of attributes

I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: which is better, to divide the class into 2 separate tables or to put everything into one table? Below is an example:
create table User ( id, name, surname,... show_name, show_photos, ...)
or
create table User ( id, name, surname,... )
create table UserPrivacy (usr_id, show_name, show_photos, ...)
I suppose the performance is similar, since I can use an index.

It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines, including PDO in PHP include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).
Also, please note that your id field on User should be user_id, not just id, unless you're using Ruby, which forces just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.

I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.

You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.

Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.

If some columns are not identified by or dependent on the primary key, or hold values from a fixed set that is used repeatedly, make a different table for those columns and maintain a one-to-one relationship.

Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
| user_id | feature     | enabled |
===================================
| 291     | show_photos | 1       |
| 291     | show_name   | 1       |
| 292     | show_photos | 0       |
| 293     | show_name   | 0       |

I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK Users.ID
Attribute FK Attributes.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special-cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVP's into columns in a query - we're always querying "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having these be a ton of boolean-columns, most of which would be false or NULL for most customers, AND the tendency for these features to grow in number, this works well for us.
