Prioritize By Values Within Group - mysql

In MySQL, say I have the following table (called workers):
| id | specialty | status | name |
| :- | :-------- | :--------- | :--- |
| 1 | Bricks | Unemployed | Joe |
| 2 | Bricks | Employed | Eric |
| 3 | Bricks | Contracted | Bob |
| 4 | Tiles | Employed | Dylan |
| 5 | Tiles | Contracted | James |
In my query, say I want to find who is a prospective person for a new job. Thus, I would want to first find who is Unemployed, if no one is Unemployed, then who is only Contracted, and if no one is Contracted then at least who is Employed.
This would be GROUP BY specialty. The only methods I could figure out are either complex sub-queries or sets of UNIONs (or both). I also tried GROUP_CONCAT however this didn't work (or I didn't do it right). Googling this has not yielded any results.
Another idea is to assign a value to each category, and then do a group-wise max/min sub-query. I piloted this and it works, however seems quite messy and definitely not normalized:
SELECT
`id`,
`name`,
`status`,
-- I haven't been able to figure out how to get rid of MIN from the actual select
-- statement except by wrapping this in another sub-query, which I'm not keen on
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT 'Unemployed' AS `status`, 0 AS `priority` FROM dual UNION
SELECT 'Contracted' AS `status`, 1 AS `priority` FROM dual UNION
SELECT 'Employed' AS `status`, 2 AS `priority` FROM dual
) AS priorities USING (`status`)
GROUP BY `specialty`;
I am looking for a more standard, efficient, normalized or versatile method of doing this.
Update:
An additional method I could use is a CASE expression in the SELECT clause of the statement. This would apply if I were to normalize the status column through a foreign-key relationship to a related table:
New table called statuses
| id | status |
| :- | :------------- |
| 1 | Employed |
| 2 | Contracted |
| 3 | Unemployed |
| 4 | Not contracted |
Diffs: 'Not Contracted' is a new status and my workers table now stores the foreign key to the new statuses table.
Then my SQL would be:
SELECT
`id`,
`name`,
statuses.status,
MIN(`priority`) AS `priority`
FROM workers
INNER JOIN (
SELECT
`id`,
`status`,
CASE
-- currently uses text in `status`,
-- could also explicitly use `id`
WHEN `status` IN ('Unemployed', 'Not Contracted') THEN 0
WHEN `status` = 'Contracted' THEN 1
WHEN `status` = 'Employed' THEN 2
ELSE 3
END AS `priority`
FROM statuses
) AS statuses ON workers.status = statuses.id
GROUP BY `specialty`;
Note: You might think - why not put the priority in the statuses table? The reason why I am not doing that is because the priority changes depending on the data needed / the purpose of the report being generated.
Potentially this is a cleaner solution (for the times when the data to prioritize against lives in another table). Again, I am looking for a more standard, efficient, normalized or versatile method of doing this. Ideally the priority order would also be configurable through user input or variables.

The difficulty here mainly arises because you don't have an ordinal column that ranks the various statuses in some order. Absent that, we can introduce one using a CASE expression, similar to what your second query is trying to do:
SELECT w1.*
FROM workers w1
INNER JOIN
(
SELECT
specialty,
MIN(CASE status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END) AS status_rnk
FROM workers
GROUP BY specialty
) w2
ON w1.specialty = w2.specialty AND
w2.status_rnk = CASE w1.status WHEN 'Unemployed' THEN 1
WHEN 'Contracted' THEN 2
ELSE 3 END;
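On MySQL 8.0 or later, the same ranking could also be written with a window function instead of the self-join. The following is a minimal sketch, assuming the sample workers data from the question; the `ranked` alias and the choice of `RANK()` (which keeps ties, as the join above does) are illustrative assumptions rather than part of the original answer:

```sql
-- Sample data copied from the question.
CREATE TABLE workers (
    id        INT PRIMARY KEY,
    specialty VARCHAR(20) NOT NULL,
    status    VARCHAR(20) NOT NULL,
    name      VARCHAR(20) NOT NULL
);

INSERT INTO workers (id, specialty, status, name) VALUES
    (1, 'Bricks', 'Unemployed', 'Joe'),
    (2, 'Bricks', 'Employed',   'Eric'),
    (3, 'Bricks', 'Contracted', 'Bob'),
    (4, 'Tiles',  'Employed',   'Dylan'),
    (5, 'Tiles',  'Contracted', 'James');

-- Requires MySQL 8.0+ (window functions).
-- Rank each worker inside their specialty by status priority,
-- then keep only the best-ranked rows.
SELECT id, specialty, status, name
FROM (
    SELECT w.*,
           RANK() OVER (
               PARTITION BY w.specialty
               ORDER BY CASE w.status WHEN 'Unemployed' THEN 1
                                      WHEN 'Contracted' THEN 2
                                      ELSE 3 END
           ) AS status_rnk
    FROM workers w
) AS ranked
WHERE status_rnk = 1;
-- Returns Joe (Bricks) and James (Tiles) for this data.
```

Because the priority lives in a single CASE expression, changing the order for a different report means editing one place only.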

Related

How to count distinct values from two columns into one number

The two tables I'm working on are these:
Submissions:
+----+------------+
| id | student_id |
+----+------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+----+------------+
Group_submissions:
+----+---------------+------------+
| id | submission_id | student_id |
+----+---------------+------------+
| 1 | 1 | 2 |
| 2 | 2 | 1 |
+----+---------------+------------+
Only one student actually makes the submission and goes into the submissions table while the others go to the group_submissions table(if the submission is a group submission)
I want to count the unique number of students that have made submission either as a group or alone
I want just the number to be returned in the end (3 based on the data on the tables above)
A student that is in the submissions table should not be counted twice if he is in the group_submission table and vice-versa.
Students who have only made individual submissions (and so do not appear in the group_submissions table) should also be counted, regardless of whether they have ever been in a group submission.
I'm already doing some other operations on these tables in a query I'm building, so a solution based on joining these two tables would help.
This is what I have tried:
count(distinct case when group_submissions.student_id is not null then group_submissions.student_id end) + count(distinct case when submissions.student_id is not null then submissions.student_id end)
But it gives me duplicates so if a student is in both tables he is counted two times.
Any ideas?
NOTE: This is a MySQL database.
I think you want union and a count:
select count(*)
from ((select student_id
from submissions
)
union -- on purpose to remove duplicates
(select student_id
from group_submissions
)
) s;
After reading the clarification, I don't think it is wise to force this computation into the join. Instead, compute the count as a separate expression: a UNION of the two student_id columns removes the duplicates, and counting its rows gives the number you want.
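As a rough sketch of what "a separate expression" could look like, the UNION-based count can be wrapped in a scalar subquery and placed next to other columns. The `total_submissions` column is only a stand-in for whatever else the larger query computes, and both column aliases are assumptions:

```sql
-- Sample data copied from the question.
CREATE TABLE submissions (
    id         INT PRIMARY KEY,
    student_id INT NOT NULL
);
CREATE TABLE group_submissions (
    id            INT PRIMARY KEY,
    submission_id INT NOT NULL,
    student_id    INT NOT NULL
);
INSERT INTO submissions (id, student_id) VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO group_submissions (id, submission_id, student_id) VALUES (1, 1, 2), (2, 2, 1);

-- UNION (without ALL) removes duplicate student ids before counting.
SELECT
    (SELECT COUNT(*) FROM submissions) AS total_submissions,
    (SELECT COUNT(*)
     FROM (SELECT student_id FROM submissions
           UNION
           SELECT student_id FROM group_submissions) AS s) AS unique_students;
-- For this data: total_submissions = 3, unique_students = 3.
```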
OLD ANSWER BELOW THAT DOES NOT FIT THE PROBLEM:
Very simple fix is needed to your current version...
count(distinct case when group_submissions.student_id is not null then
group_submissions.student_id when submissions.student_id is
not null then submissions.student_id end)
Note:
your original expression is an addition between 2 case expressions, each with a single WHEN inside
now I turn it into a single case expression with 2 WHEN's

Insert data in table using two or more tables

I have two existing tables and want to create a third table from a few of their columns. The first two tables are:
Table one: users
|id | name | sid |
| 1 | demo | test1 |
| 2 | anu | test2 |
Table two: insights
| id | description| name |
| 1 | yes | demoone|
| 2 | no | demotwo|
I want to insert data into a new table called insight_owner. To the best of my knowledge the query below should work, but it gives me the following error:
ERROR 1242 (21000): Subquery returns more than 1 row
The query used is
insert into insight_owner (column_one, column_two, column_three, column_four, column_five) VALUES ('1', '0', NULL, (select u.id from users u where u.sid='test1'), (select i.id from insights i)) ;
Expected output is
| column_one| column_two| column_three| column_four| column_five| column_six |
+----+-----------------+--------------------+---------------+-----------+--------------------+
| 1 | 1 | 1 | NULL | 1 | 1 |
| 2 | 1 | 1 | NULL | 1 | 2 |
column_five = Users id
column_six = Insight id
INSERT...SELECT syntax is what you're looking for (instead of INSERT...VALUES, which is limited to single values per column in each value list). That allows you to select the data directly from the table(s) concerned, using normal SELECT and JOIN syntax. You can also hard-code values which you want to appear on every row, just as you can in a normal SELECT statement. Basically, write the SELECT statement, get it to output what you want. Then stick an INSERT at the start of it and it sends the output to the desired table.
insert into insight_owner (column_one, column_two, column_three, column_four, column_five)
select '1', '0', NULL, (select u.id from users u where u.sid='test1'), i.id
from insights i
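The same INSERT...SELECT can also be written with a join in place of the scalar subquery. The sketch below is self-contained so it can be tried in a scratch database; the definition of insight_owner is hypothetical, since the question does not show its column types:

```sql
-- Source tables and rows from the question.
CREATE TABLE users (
    id   INT PRIMARY KEY,
    name VARCHAR(50),
    sid  VARCHAR(50)
);
CREATE TABLE insights (
    id          INT PRIMARY KEY,
    description VARCHAR(50),
    name        VARCHAR(50)
);
-- Hypothetical target table: the real column types are not given.
CREATE TABLE insight_owner (
    column_one   VARCHAR(10),
    column_two   VARCHAR(10),
    column_three VARCHAR(10),
    column_four  INT,
    column_five  INT
);
INSERT INTO users (id, name, sid) VALUES (1, 'demo', 'test1'), (2, 'anu', 'test2');
INSERT INTO insights (id, description, name) VALUES (1, 'yes', 'demoone'), (2, 'no', 'demotwo');

-- One output row per insight, each paired with the single matching user.
INSERT INTO insight_owner (column_one, column_two, column_three, column_four, column_five)
SELECT '1', '0', NULL, u.id, i.id
FROM insights i
CROSS JOIN users u
WHERE u.sid = 'test1';

SELECT * FROM insight_owner;
-- Two rows: ('1', '0', NULL, 1, 1) and ('1', '0', NULL, 1, 2).
```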
You are using
insert into insight_owner (column_one, column_two, column_three, column_four, column_five) VALUES ('1', '0', NULL, (select u.id from users u where u.sid='test1'), (select i.id from insights i));
Which basically inserts one row in your new table.
So, when you add subquery
select i.id from insights i
It will return all rows from the insights table, and you actually want just one value.
The result you will get is
| id |
| 1 |
| 2 |
And you want
| id |
| 1 |
So, you should add a condition that makes sure you get only one result, as you do in the first subquery (where u.sid='test1'), or use LIMIT.
I hope this helps.

Fastest way to order by having true result on a left join in MYSQL

I am trying to set up something where data is being matched on two different tables. The results would be ordered by some data being true on the second table. However, not everyone in the first table is in the second table. My problem is twofold. 1) Speed. My current MySQL query takes 4 seconds to go through several thousand results on each table. 2) Not ordering correctly. I need it to order the results by who is online, but still be alphabetical. As it stands now, it orders everyone by whether or not they are online according to the chathelp table, then fills in the rest from the users table.
What I have:
SELECT u.name, u.id, u.url, c.online
FROM users AS u
LEFT JOIN chathelp AS c ON u.url = CONCAT('http://www.software.com/', c.chat_handle)
WHERE u.live_account = 'y'
ORDER BY c.online DESC, u.name ASC
LIMIT 0, 24
users
+-----------------------------------------------------------+--------------+
| id | name | url | live_account |
+-----------------------------------------------------------+--------------|
| 1 | Lisa Fuller | http://www.software.com/LisaHelpLady | y |
| 2 | Eric Reiner | | y |
| 3 | Tom Lansen | http://www.software.com/SaveUTom | y |
| 4 | Billy Bob | http://www.software.com/BillyBob | n |
+-----------------------------------------------------------+--------------+
chathelp
+------------------------------------+
| chat_id | chat_handle | online |
+------------------------------------+
| 12 | LisaHelpLady | 1 |
| 34 | BillyBob | 0 |
| 87 | SaveUTom | 0 |
+------------------------------------+
What I would like the data I receive to look like:
+----------------------------------------------------------------------+
| name | id | url | online |
+----------------------------------------------------------------------+
| Lisa Fuller | 1 | http://www.software.com/LisaHelpLady | 1 |
| Eric Reiner | 2 | | 0 |
| Tom Lansen | 3 | http://www.software.com/SaveUTom | 0 |
+----------------------------------------------------------------------+
Explanation: Billy is excluded right off the bat for not having a live account. Lisa comes before Eric because she is online. Tom comes after Eric because he is offline and alphabetically later in the data. The only matching data between the two tables is a portion of the url column with the chat_handle column.
What I am getting instead:
(basically, I am getting Lisa, Tom, then Eric)
I am getting everybody in the chathelp table listed first whether or not they are online or not. So 600 people come first, then I get the remaining people who aren't in both tables from users table. I need people who are offline in the chathelp table to be sorted into the users table people in alphabetical order. So if Lisa and Tom were the only users online they would come first, but everyone else from the users table regardless of whether or not they set up their chathelp handle would come alphabetically after those two users.
Again, I need to sort them and figure out how to do this in less than 4 seconds. I have tried indexes on both tables, but they don't help. Explain says it is using a key (name) on table users hitting rows 4771 -> Using where;Using temporary; Using filesort and on table2 NULL for key with 1054 rows and nothing in the extra column.
Any help would be appreciated.
Edit to add table info and EXPLAIN output
CREATE TABLE `chathelp` (
`chat_id` int(13) NOT NULL,
`chat_handle` varchar(100) NOT NULL,
`online` tinyint(1) NOT NULL DEFAULT '0',
UNIQUE KEY `chat_id` (`chat_id`),
KEY `chat_handle` (`chat_handle`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `users` (
`id` int(8) NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`url` varchar(250) NOT NULL,
`live_account` varchar(1) NOT NULL DEFAULT 'n',
PRIMARY KEY (`id`),
KEY `livenames` (`live_account`,`name`)
) ENGINE=MyISAM AUTO_INCREMENT=9556 DEFAULT CHARSET=utf8
+----+-------------+------------+------+---------------+--------------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+--------------+---------+-------+------+----------------------------------------------+
| 1 | SIMPLE | users | ref | livenames | livenames | 11 | const | 4771 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | chathelp | ALL | NULL | NULL | NULL | NULL | 1144 | |
+----+-------------+------------+------+---------------+--------------+---------+-------+------+----------------------------------------------+
We're going to guess that online is integer datatype.
You can modify the expression in your order by clause like this:
ORDER BY IFNULL(online,0) DESC, users.name ASC
^^^^^^^ ^^^
The problem is that for rows in users that don't have a matching row in chathelp, the value of the online column in the result set is NULL. In MySQL, NULL sorts before all non-NULL values in ascending order, so with ORDER BY ... DESC those rows land after every row that has a 0 or 1.
If we assume that a missing row in chathelp is to be treated the same as a row in chathelp that has 0 for online, we can replace the NULL value with 0. (If there are NULL values in the online column itself, this expression in the ORDER BY cannot distinguish them from a missing row in chathelp.)
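To see the effect on the sample rows from the question, here is a small self-contained sketch. The table definitions are trimmed-down assumptions (no secondary indexes, default engine and character set), not the exact schema:

```sql
-- Trimmed-down versions of the question's tables.
CREATE TABLE users (
    id           INT PRIMARY KEY,
    name         VARCHAR(50)  NOT NULL,
    url          VARCHAR(250) NOT NULL,
    live_account VARCHAR(1)   NOT NULL DEFAULT 'n'
);
CREATE TABLE chathelp (
    chat_id     INT PRIMARY KEY,
    chat_handle VARCHAR(100) NOT NULL,
    online      TINYINT      NOT NULL DEFAULT 0
);
INSERT INTO users (id, name, url, live_account) VALUES
    (1, 'Lisa Fuller', 'http://www.software.com/LisaHelpLady', 'y'),
    (2, 'Eric Reiner', '',                                     'y'),
    (3, 'Tom Lansen',  'http://www.software.com/SaveUTom',     'y'),
    (4, 'Billy Bob',   'http://www.software.com/BillyBob',     'n');
INSERT INTO chathelp (chat_id, chat_handle, online) VALUES
    (12, 'LisaHelpLady', 1),
    (34, 'BillyBob',     0),
    (87, 'SaveUTom',     0);

-- IFNULL turns "no chathelp row" into 0, so it ties with offline users
-- and the tie is then broken alphabetically by name.
SELECT u.name, u.id, u.url, IFNULL(c.online, 0) AS online
FROM users AS u
LEFT JOIN chathelp AS c
       ON u.url = CONCAT('http://www.software.com/', c.chat_handle)
WHERE u.live_account = 'y'
ORDER BY IFNULL(c.online, 0) DESC, u.name ASC
LIMIT 0, 24;
-- Lisa Fuller (1), then Eric Reiner (0), then Tom Lansen (0).
```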
EDIT
Optimizing Performance
To address performance, we'd need to see the output from EXPLAIN.
With the query as it's written above, there's no getting around the "Using filesort" to get the rows returned in the order specified, on that expression.
We may be able to re-write the query to get an equivalent result faster.
But I suspect the "Using filesort" operation is not really the problem, unless there's a boatload (thousands and thousands) of rows to sort.
I suspect that suitable indexes aren't available for the join operation.
But before we go to the knee-jerk "add an index!", we really need to look at EXPLAIN, and look at the table definitions including the indexes. (The output from SHOW CREATE TABLE is suitable.)
We just don't have enough information to make recommendations yet.
Reference: 8.8.1 Optimizing Queries with EXPLAIN
As a guess, we might want to try a query like this:
SELECT u.name
, u.id
, u.url
, l.online
FROM users u
LEFT
JOIN chathelp l
ON u.url = CONCAT('http://www.software.com/', l.chat_handle)
AND l.online = 1
WHERE u.live_account = 'y'
ORDER
BY IF(l.online=1,0,1) ASC
, u.name ASC
LIMIT 0,24
After we've added covering indexes, e.g.
... ON users (live_account, url, name, id)
... ON chathelp (chat_handle, online)
(If query is using a covering index, EXPLAIN should show "Using index" in the Extra column.)
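One possible reason the index on chat_handle goes unused in the EXPLAIN output from the question is that the join condition wraps that column in CONCAT(...). A hedged sketch of an alternative, assuming every relevant users.url starts with the fixed prefix shown in the question: strip the prefix from users.url instead, so the bare chat_handle column is what gets compared. Whether the optimizer then picks the index depends on the data, and possibly on the differing character sets of the two tables (latin1 and utf8), so compare the EXPLAIN output before and after rather than taking this as a guaranteed fix:

```sql
-- Runs against the users and chathelp tables defined in the question.
EXPLAIN
SELECT u.name, u.id, u.url, IFNULL(c.online, 0) AS online
FROM users AS u
LEFT JOIN chathelp AS c
       -- Compare the bare indexed column against an expression on users.url.
       ON c.chat_handle = SUBSTRING(u.url, CHAR_LENGTH('http://www.software.com/') + 1)
       -- Only urls that really start with the prefix may match.
      AND u.url LIKE 'http://www.software.com/%'
WHERE u.live_account = 'y'
ORDER BY IFNULL(c.online, 0) DESC, u.name ASC
LIMIT 0, 24;
```

Dropping the EXPLAIN keyword runs the query itself.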
One approach might be to break the query into two parts: an inner join, and an anti-join. This is just a guess at something we might try, but again, we'd want to compare the EXPLAIN output.
Sometimes, we can get better performance with a pattern like this. But for better performance, both of the queries below are going to need to be more efficient than the original query:
( SELECT u.name
, u.id
, u.url
, l.online
FROM users u
JOIN chathelp l
ON u.url = CONCAT('http://www.software.com/', l.chat_handle)
AND l.online = 1
WHERE u.live_account = 'y'
ORDER
BY u.name ASC
LIMIT 0,24
)
UNION ALL
( SELECT u.name
, u.id
, u.url
, 0 AS online
FROM users u
LEFT
JOIN chathelp l
ON u.url = CONCAT('http://www.software.com/', l.chat_handle)
AND l.online = 1
WHERE l.chat_handle IS NULL
AND u.live_account = 'y'
ORDER
BY u.name ASC
LIMIT 0,24
)
ORDER BY 4 DESC, 1 ASC
LIMIT 0,24

WHERE in Aggregate function

I have a quick question. I know that we cannot use WHERE clause in an aggregate function in MySQL. The table structure is as follows:
+----+------+----------+--------+
| ID | Name | Location | Active |
+----+------+----------+--------+
| 1 | Aaaa | India | 0 |
| 2 | Aaaa | USA | 0 |
| 3 | Aaaa | USA | 1 |
| 4 | Aaaa | India | 0 |
| 5 | Aaaa | UK | 0 |
| 6 | Aaaa | India | 1 |
| 7 | Aaaa | USA | 1 |
| 8 | Aaaa | USA | 0 |
| 9 | Aaaa | India | 0 |
| 10 | Aaaa | UK | 1 |
+----+------+----------+--------+
The query I have here is:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location`;
The above query will give me the counts of the location. But, I need only the active users. So, I need a WHERE clause that does something like:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location` WHERE `active`=1;
The above query is invalid. The valid query would be using HAVING. But, if I change the query to:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location` HAVING `active`=1;
The counts are no different from the original query, which is:
SELECT COUNT(*), `location`, `active` FROM `users` GROUP BY `location`;
So, what am I supposed to do for getting the user counts of the location, who are active? Thanks in advance.
Put WHERE before GROUP BY. The WHERE clause then filters the rows according to your criteria, and the count is computed over the filtered rows only:
SELECT COUNT(*), `location`, `active`
FROM `users`
WHERE `active`=1
GROUP BY `location`
Beyond your specific question, you can also use SUM with a condition. In MySQL the comparison evaluates to 1 or 0, so summing it counts the rows that satisfy the condition:
SELECT sum(`active`=1) AS `active_users`, `location`
FROM `users`
GROUP BY `location`
Above sum expression is equivalent to sum(CASE when active=1 THEN 1 ELSE 0 END)
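The difference between the two approaches shows up on the question's sample rows. The following sketch is self-contained; the column types and the `active_users` / `total_users` aliases are assumptions:

```sql
-- Sample data copied from the question.
CREATE TABLE users (
    id       INT PRIMARY KEY,
    name     VARCHAR(20) NOT NULL,
    location VARCHAR(20) NOT NULL,
    active   TINYINT     NOT NULL
);
INSERT INTO users (id, name, location, active) VALUES
    (1, 'Aaaa', 'India', 0), (2, 'Aaaa', 'USA',   0),
    (3, 'Aaaa', 'USA',   1), (4, 'Aaaa', 'India', 0),
    (5, 'Aaaa', 'UK',    0), (6, 'Aaaa', 'India', 1),
    (7, 'Aaaa', 'USA',   1), (8, 'Aaaa', 'USA',   0),
    (9, 'Aaaa', 'India', 0), (10, 'Aaaa', 'UK',   1);

-- Filter first, then count. A location without any active user
-- would not appear in the result at all.
SELECT location, COUNT(*) AS active_users
FROM users
WHERE active = 1
GROUP BY location;
-- India 1, UK 1, USA 2 (row order is not guaranteed without ORDER BY).

-- Conditional aggregation. Every location appears, and the total
-- and active counts sit side by side.
SELECT location, COUNT(*) AS total_users, SUM(active = 1) AS active_users
FROM users
GROUP BY location;
-- India 4 / 1, UK 2 / 1, USA 4 / 2.
```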
You can put the WHERE before the GROUP BY if you want to filter rows before aggregating. If you want to do some calculation among rows already aggregated, you have to use a CASE expression inside the aggregate. I don't think that's what you're doing here, so just put the WHERE clause in the right place.
The query looks like this, with active removed because it's now redundant (will always be 1):
SELECT COUNT(*), location
FROM users
WHERE active = 1
GROUP BY location, active
It's important here to be reminded of the logical order of operations of an SQL query:
FROM: starting table, view, derived table
JOIN: adding additional tables, views, etc.
WHERE: filtering the rows from the initial table and joined tables
GROUP BY: aggregating the rows that have been filtered by WHERE
HAVING: filtering the aggregated rows output by GROUP BY
SELECT: pick columns and compute expressions
ORDER BY: sort the resulting rows, with selected columns and computed expressions available for ordering
LIMIT: control which and how many rows go back to the client
If you keep this in mind, it'll be easier to understand when to do certain types of filtering and what the rows that you're filtering look like.
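As an illustration, here is a single query against the users table from the question with each clause annotated by its position in that logical order. There is no JOIN in this example, and the HAVING threshold of 2 is an arbitrary value chosen for the demonstration:

```sql
-- Clauses are written in syntax order; the numbers give the logical order.
SELECT location, COUNT(*) AS active_users   -- 5. SELECT: compute the output columns
FROM users                                  -- 1. FROM: start with every row of users
WHERE active = 1                            -- 2. WHERE: keep only active rows
GROUP BY location                           -- 3. GROUP BY: one group per location
HAVING COUNT(*) >= 2                        -- 4. HAVING: keep groups with at least 2 rows
ORDER BY active_users DESC                  -- 6. ORDER BY: may use the SELECT alias
LIMIT 10;                                   -- 7. LIMIT: cap the rows returned
-- With the question's sample data this leaves only USA, with a count of 2.
```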
You can use WHERE in aggregate queries; it just has to come after FROM and before GROUP BY.

Find a class with exactly specific students

I'm no SQL expert, but I'm not a total amateur, yet this is a query on a single table with 2 fields that I don't know how to approach.
Suppose you have a table with class # & student #. How do you find the classes that have only exactly students x, y & z?
My real problem is more like a table of catalogs & item numbers, and how to find all the catalogs that have exactly (no more or less) the specified items.
My only thought revolved around matching on GROUP_CONCAT, but there must be a more elegant way...
EDIT:
I misstated the problem, so I will provide table structure as well. The issue is more like products in boxes, where a box could contain more than one of a particular product, and you want to find boxes that have exactly the specified content. So the table, for example, is:
+------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| box_id | bigint(20) | YES | | NULL | |
| product_id | bigint(20) | YES | MUL | NULL | |
+------------+------------+------+-----+---------+----------------+
I want to find all boxes that contain exactly 2 items of product ID 22, one of 17, and one of 55. No more, no less.
You could use a having clause:
select class
from YourTable
group by
class
having count(distinct student) = 3
and max(case when student = 'X' then 1 end) = 1
and max(case when student = 'Y' then 1 end) = 1
and max(case when student = 'Z' then 1 end) = 1
I do have an answer that works, but it is far from efficient OR elegant, so I present it for anyone looking for a sub-optimal but correct solution to this problem, and an enticement to anyone else to provide a better one.
SELECT box_id, GROUP_CONCAT( product_id
ORDER BY product_id ASC
SEPARATOR ',' ) AS contents
FROM box_product
GROUP BY box_id
HAVING contents = '17,22,22,55';
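An alternative that avoids string matching is to compare counts per product in the HAVING clause. This is a sketch under assumptions: the sample rows are invented for illustration, and the table definition only mirrors the structure shown in the question:

```sql
-- Hypothetical sample data for the box_product table.
CREATE TABLE box_product (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    box_id     BIGINT,
    product_id BIGINT,
    KEY (product_id)
);
INSERT INTO box_product (box_id, product_id) VALUES
    (1, 22), (1, 22), (1, 17), (1, 55),           -- exact match
    (2, 22), (2, 17), (2, 55),                    -- only one of product 22
    (3, 22), (3, 22), (3, 17), (3, 55), (3, 99);  -- one extra product

-- A box qualifies when its total item count and the count of each
-- wanted product both match the specification.
SELECT box_id
FROM box_product
GROUP BY box_id
HAVING COUNT(*) = 4                  -- total items: 2 + 1 + 1
   AND SUM(product_id = 22) = 2
   AND SUM(product_id = 17) = 1
   AND SUM(product_id = 55) = 1;
-- Returns box_id 1 only.
```

The total-count check is what rules out boxes that contain the wanted products plus something else.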