Aggregate Text data using SQL - mysql

I have the following data:
Name | Condition
Mike | Good
Mike | Good
Steve | Good
Steve | Alright
Joe | Good
Joe | Bad
I want to classify each name with an IF-style rule: if Bad exists for a name, classify the name as Bad. If Bad does not exist but Alright exists, classify it as Alright. If only Good exists, classify it as Good.
So my data would turn into:
Name | Condition
Mike | Good
Steve | Alright
Joe | Bad
Is this possible in SQL?

An Access query would be easy if you first create a table which maps Condition to a rank number.
Condition rank
--------- ----
Bad 1
Alright 2
Good 3
Then a GROUP BY query would give you the minimum rank for each Name:
SELECT y.Name, Min(c1.rank) AS MinOfrank
FROM
[YourTable] AS y
INNER JOIN conditions AS c1
ON y.Condition = c1.Condition
GROUP BY y.Name;
If you want to display the Condition string for those ranks, join back to the conditions table again:
SELECT sub.Name, sub.MinOfrank, c2.Condition
FROM
(
SELECT y.Name, Min(c1.rank) AS MinOfrank
FROM
[YourTable] AS y
INNER JOIN conditions AS c1
ON y.Condition = c1.Condition
GROUP BY y.Name
) AS sub
INNER JOIN conditions AS c2
ON sub.MinOfrank = c2.rank;
Performance should be fine with indexes on those conditions fields.
Seems to me this approach could also work in those other databases (MySQL and SQL Server) tagged in the question.
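As a rough check of the rank-table idea, here is a minimal sketch using Python's built-in sqlite3 module in place of Access or MySQL. The table names your_table and conditions and the column names cond and rnk are made up for the illustration (renamed to steer clear of reserved words); only the sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (name TEXT, cond TEXT);
INSERT INTO your_table VALUES
  ('Mike', 'Good'), ('Mike', 'Good'),
  ('Steve', 'Good'), ('Steve', 'Alright'),
  ('Joe', 'Good'), ('Joe', 'Bad');

-- Hypothetical lookup table: the lower the rank, the worse the condition.
CREATE TABLE conditions (cond TEXT PRIMARY KEY, rnk INTEGER);
INSERT INTO conditions VALUES ('Bad', 1), ('Alright', 2), ('Good', 3);
""")

# Take the minimum rank per name, then join back to get the label.
rank_rows = conn.execute("""
SELECT sub.name, c2.cond
FROM (
    SELECT y.name, MIN(c1.rnk) AS min_rnk
    FROM your_table AS y
    JOIN conditions AS c1 ON y.cond = c1.cond
    GROUP BY y.name
) AS sub
JOIN conditions AS c2 ON sub.min_rnk = c2.rnk
ORDER BY sub.name
""").fetchall()

print(rank_rows)
```

With the sample data this should yield one row per name carrying the worst condition seen for that name.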

You can use a case statement to rank the conditions then max() or min() to summarize the results before returning them back to the user in the same format.
Query:
SELECT [Name]
, case min(case condition when 'bad' then 0 when 'alright' then 1 else 2 end)
when 0 then 'bad' when 1 then 'alright' when 2 then 'good' end as Condition
from mytable
group by [name]
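The nested CASE can be tried out the same way. This is only a sketch against SQLite through Python's sqlite3 module, not the original SQL Server setup; the column is renamed to cond, and the string literals match the data's capitalization because SQLite compares text case-sensitively by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (name TEXT, cond TEXT);
INSERT INTO mytable VALUES
  ('Mike', 'Good'), ('Mike', 'Good'),
  ('Steve', 'Good'), ('Steve', 'Alright'),
  ('Joe', 'Good'), ('Joe', 'Bad');
""")

# Inner CASE maps each label to a number, MIN picks the worst one per name,
# and the outer CASE maps the number back to a label.
case_rows = conn.execute("""
SELECT name,
       CASE MIN(CASE cond WHEN 'Bad' THEN 0 WHEN 'Alright' THEN 1 ELSE 2 END)
            WHEN 0 THEN 'Bad'
            WHEN 1 THEN 'Alright'
            ELSE 'Good'
       END AS cond
FROM mytable
GROUP BY name
ORDER BY name
""").fetchall()

print(case_rows)
```

No extra table is needed here, at the cost of repeating the label-to-number mapping inside the query.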

mysql has an IF - function.
Here, have a look at it: https://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html#function_if


Union as sub query using MySQL 8

I want to optimize a query that uses a union as a subquery.
I'm not really sure how to construct the query, though.
I'm using MySQL 8.0.12.
Here is the original query:
---------------
| c1 | c2 |
---------------
| 18182 | 0 |
| 18015 | 0 |
---------------
2 rows in set (0.35 sec)
I'm sorry, but the question would not save when I pasted the SQL query as text and formatted it using Ctrl+K.
Output expected
---------------
| c1 | c2 |
---------------
| 18182 | 167 |
| 18015 | 0 |
---------------
As output I would like to have the difference in row counts between the two tables in the UNION ALL.
I processed this question using the wizard https://stackoverflow.com/questions/ask
Since a parenthesized SELECT can be used almost anywhere an expression can go:
SELECT
ABS( (SELECT COUNT(*) FROM tbl_aaa) -
(SELECT COUNT(*) FROM tbl_bbb) ) AS diff;
Also, MySQL is happy to allow a SELECT without a FROM.
There are several ways to go for this, including UNION, but I wouldn't recommend it, as it is IMO a bit 'hacky'. Instead, I suggest you use subqueries or use CTEs.
With subqueries
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM (
SELECT
COUNT(*) as size
FROM tbl_aaa
) c_tbl_aaa
CROSS JOIN (
SELECT
COUNT(*) as size
FROM tbl_bbb
) c_tbl_bbb
With CTEs, also known as WITHs
WITH c_tbl_aaa AS (
SELECT
COUNT(*) as size
FROM tbl_aaa
), c_tbl_bbb AS (
SELECT
COUNT(*) as size
FROM tbl_bbb
)
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM c_tbl_aaa
CROSS JOIN c_tbl_bbb
In a practical sense, they are the same. Depending on the needs, you might want to define and join the results though, and in said cases, you could use a single number as a "pseudo id" in the select statement.
Since you only want to know the differences, I used the ABS function, which returns the absolute value of a number.
Let me know if you want a solution with UNIONs anyway.
Edit: As @Rick James pointed out, COUNT(*) should be used in the subqueries to count the number of rows, as COUNT(id_***) will only count the rows with non-null values in that field.
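To make the COUNT(*) versus COUNT(column) point concrete, here is a small sketch run against SQLite via Python's sqlite3 module. The id column and the rows in tbl_aaa and tbl_bbb are invented for the illustration; the real tables in the question are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_aaa (id INTEGER);
CREATE TABLE tbl_bbb (id INTEGER);
INSERT INTO tbl_aaa VALUES (1), (2), (NULL);  -- one row with a NULL id
INSERT INTO tbl_bbb VALUES (1);
""")

# COUNT(*) counts rows; COUNT(id) skips rows where id IS NULL.
count_star, count_id = conn.execute(
    "SELECT COUNT(*), COUNT(id) FROM tbl_aaa"
).fetchone()

# Difference of the two row counts, using scalar subqueries and no FROM clause.
(diff,) = conn.execute("""
SELECT ABS((SELECT COUNT(*) FROM tbl_aaa) -
           (SELECT COUNT(*) FROM tbl_bbb)) AS diff
""").fetchone()

print(count_star, count_id, diff)
```

The two counts differ by exactly the number of NULL ids, which is why COUNT(*) is the safer choice when the goal is a row count.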

SQL order of execution for correlated subquery

I have the following Personnel table:
+---------+----------+-------------+
| name | dept_nbr | job_title |
+---------+----------+-------------+
| Michael | 14 | Programmer |
| Kumar | 14 | Programmer |
| Dave | 14 | Programmer |
| Jane | 14 | Manager |
| Carol | 37 | Programmer |
| Joe | 37 | Programmer |
| John | 59 | CEO |
+---------+----------+-------------+
Problem: Find all dept_nbr's (departments) that have fewer than 3 programmers.
Working query:
SELECT DISTINCT dept_nbr
FROM Personnel AS P1
WHERE (SELECT COUNT(P2.dept_nbr)
FROM Personnel AS P2
WHERE P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer') < 3;
Result:
37
59
Notes:
Department 14 is correctly not included as it has 3 programmers (3 is equal to but not fewer than 3). Department 59 has zero programmers, and is also correctly included in the results.
My question:
When the above query executes, how does a generic SQL engine proceed? From what I have read, SQL execution order is (roughly): From, Where, Group By, Having, and Select. So, is the following correct?
1 - The Outer Query passes each row of the Personnel table as P1 into the Inner query.
2.a - The Inner Query scans the entire Personnel table as P2, row by row, looking for rows that satisfy the condition "P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer'".
2.b – Once the Inner Query is done with the entire table, it COUNTs the matching dept_nbr values and returns it to the Outer Query.
3 – In the Outer Query, if the count returned from the Inner Query satisfies the condition "WHERE (Inner Query Count Result) < 3", the corresponding dept_nbr for the P1 row is SELECTed.
4 – Following all rows processed by the Outer Query, the Outer Query does a DISTINCT on the results and displays the unique dept_nbr values.
Is my understanding above correct? Specifically, does the outer query do the DISTINCT at the very end (step #4)? It seems that in this way, the inner query does redundant scanning (for example, it processes dept_nbr = 14 four times, when it really has the answer in the first pass).
I tested the above query on sqlfiddle.com w/ MySQL 5.6.
When the above query executes, how does a generic SQL engine proceed?
From what I have read, SQL execution order is (roughly): From, Where,
Group By, Having, and Select.
This statement is -- generally -- not correct. SQL is parsed in the order that you describe. However, the execution is determined by the optimizer and might have little to do with the original query. Remember: SQL is a descriptive language, not a procedural language. It describes the result set, not the specific steps for calculating it.
That said, MySQL's execution plan is much closer to the query than most other databases (particularly more advanced databases with better optimizers). And, almost any database is going to proceed in the steps you describe for this query. The aggregation in the subquery limits the choices for optimization.
If you want to eliminate the redundancy, then do the select distinct before the filtering:
SELECT dept_nbr
FROM (SELECT DISTINCT dept_nbr FROM Personnel P1) P1
WHERE (SELECT COUNT(P2.dept_nbr)
FROM Personnel AS P2
WHERE P1.dept_nbr = P2.dept_nbr AND P2.job_title = 'Programmer'
) < 3;
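The effect of deduplicating first can be pictured with a plain Python model of the row-by-row evaluation described in the question. This is a conceptual sketch only, not what any optimizer is guaranteed to do; the helper name run_outer is made up for the illustration.

```python
personnel = [
    ("Michael", 14, "Programmer"),
    ("Kumar", 14, "Programmer"),
    ("Dave", 14, "Programmer"),
    ("Jane", 14, "Manager"),
    ("Carol", 37, "Programmer"),
    ("Joe", 37, "Programmer"),
    ("John", 59, "CEO"),
]


def run_outer(outer_depts):
    """Evaluate the correlated subquery once per outer value.

    Returns the distinct departments kept and the number of subquery runs.
    """
    subquery_runs = 0
    kept = []
    for dept in outer_depts:
        subquery_runs += 1  # one full scan of personnel per outer value
        programmers = sum(
            1 for _, d, job in personnel if d == dept and job == "Programmer"
        )
        if programmers < 3:
            kept.append(dept)
    return sorted(set(kept)), subquery_runs


# Original query: one subquery run per row, DISTINCT applied at the end.
naive_result, naive_runs = run_outer([d for _, d, _ in personnel])

# Rewritten query: DISTINCT first, so one subquery run per department.
dedup_result, dedup_runs = run_outer(sorted({d for _, d, _ in personnel}))

print(naive_result, naive_runs, dedup_result, dedup_runs)
```

Both variants return the same departments; under this model only the number of inner scans changes.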
You can also do this more simply with just an aggregation:
select dept_nbr
from personnel
group by dept_nbr
having sum(job_title = 'Programmer') < 3;
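The aggregation form relies on the comparison evaluating to 1 or 0, which MySQL does; SQLite happens to behave the same way, so the query can be sketched with Python's sqlite3 module using the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personnel (name TEXT, dept_nbr INTEGER, job_title TEXT);
INSERT INTO personnel VALUES
  ('Michael', 14, 'Programmer'), ('Kumar', 14, 'Programmer'),
  ('Dave', 14, 'Programmer'), ('Jane', 14, 'Manager'),
  ('Carol', 37, 'Programmer'), ('Joe', 37, 'Programmer'),
  ('John', 59, 'CEO');
""")

# job_title = 'Programmer' is 1 for programmers and 0 otherwise,
# so SUM counts the programmers in each department.
dept_rows = conn.execute("""
SELECT dept_nbr
FROM personnel
GROUP BY dept_nbr
HAVING SUM(job_title = 'Programmer') < 3
ORDER BY dept_nbr
""").fetchall()

print(dept_rows)
```

Department 59 still appears because its sum is 0 rather than NULL: the group has a row, it just is not a programmer. On databases that do not treat comparisons as numbers, a CASE expression inside SUM would be the usual substitute.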
Add EXPLAIN (or EXPLAIN EXTENDED on older MySQL versions) before your query to see the execution plan, which shows the steps the engine actually chose for it. This is a very useful tool when trying to optimize queries.

Mysql - Only rows that value from one column matches for all from other column

I have never asked a question here because I always thought searching was better than bothering people, but I couldn't find this anywhere and I really don't know how to get it done.
Picture the following:
DELIVER_METHOD_CODE | PRODUCT_CODE
1 | 909
1 | 4
2 | 4
I just need the values from the first column that are paired with ALL values from the second column.
Does anyone out there know how to do it?
For getting this far my query is like that:
select DELIVER_METHOD_CODE,PRODUCT_CODE
from DELIVER_METHOD_TABLE
right join PRODUCT_TABLE on PRODUCT_TABLE.PRODUCT_ID = DELIVER_METHOD_TABLE.PRODUCT_ID
Sorry for bad English
EDIT1:
How the output should be
DELIVER_METHOD_CODE | PRODUCT_CODE
1 | 909
1 | 4
This is an example of a "set-within-sets" subquery. I think the most flexible approach is to use aggregation with a having clause:
select DELIVER_METHOD_CODE
from DELIVER_METHOD_TABLE
group by DELIVER_METHOD_CODE
having count(distinct PRODUCT_CODE) = (select count(distinct PRODUCT_CODE)
from DELIVER_METHOD_TABLE
);
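Here is a minimal sketch of that HAVING comparison, run against SQLite through Python's sqlite3 module. It assumes, as the answer's query does, that both columns are available in a single table; in the question they come from a join, so the table layout here is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliver_method_table (
    deliver_method_code INTEGER,
    product_code INTEGER
);
INSERT INTO deliver_method_table VALUES (1, 909), (1, 4), (2, 4);
""")

# Keep a delivery method only if it covers as many distinct products
# as exist in the whole table.
method_rows = conn.execute("""
SELECT deliver_method_code
FROM deliver_method_table
GROUP BY deliver_method_code
HAVING COUNT(DISTINCT product_code) =
       (SELECT COUNT(DISTINCT product_code) FROM deliver_method_table)
""").fetchall()

print(method_rows)
```

Method 2 drops out because it covers only one of the two products.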

Update subsequent duplicate field values in mysql

I have the following schema:
id | order_ref | description | price
Currently I have the following duplicate issue:
1 | 34567 | This is the description | 19.99
2 | 34567 | This is the description | 13.99
This was due to the data I was importing having the description for each item duplicated. Is there a way I can keep the first row, and then UPDATE the description on subsequent (up to approx 20 rows) to be 'AS ABOVE'?
1 | 34567 | This is the description | 19.99
2 | 34567 | - AS ABOVE - | 13.99
Thanks
-------UPDATED
UPDATE documents_orders_breakdown
SET `desc` = '- AS ABOVE -'
WHERE NOT id IN (SELECT id
FROM documents_orders_breakdown AS D
WHERE D.`desc` <> `desc`
ORDER BY D.id
LIMIT 1)
But this returns [Err] 1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
--------UPDATED
UPDATE documents_orders_breakdown
SET `desc` = '- AS ABOVE -'
WHERE NOT id IN (SELECT MIN(id)
FROM documents_orders_breakdown AS t
WHERE t.`desc` = `desc`)
This now returns [Err] 1093 - You can't specify target table 'documents_orders_breakdown' for update in FROM clause
If this is a one-time thing, performance is not a big issue. You can run an UPDATE on all the records that are not returned by a SELECT with a LIMIT of 1.
UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT id
FROM the_table t
WHERE t.description = the_table.description
ORDER BY t.id
LIMIT 1)
This query assumes you want to keep the description of the record whose id comes first (hence the ORDER BY).
Since you can't use LIMIT in subqueries, you can work around that by using the aggregate function MIN:
UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT MIN(id)
FROM the_table t
WHERE t.description = the_table.description)
(Let's hope you can mix MIN and subqueries ;)
Apparently you can't SELECT from the table you're UPDATEing in MySQL. A workaround is to use an implicit temporary table. This is bad for performance, but, again, given this is a one-time thing, that's not a big concern.
UPDATE the_table
SET description = '- AS ABOVE -'
WHERE NOT id IN (SELECT m FROM (SELECT MIN(id) AS m
FROM the_table t
WHERE t.description = the_table.description) AS temp)
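A variant worth trying, offered as a sketch rather than a tested MySQL answer: replace the correlated MIN with a GROUP BY inside the derived table, so the inner query does not need to refer to the outer row at all. The example below runs on SQLite via Python's sqlite3 module, which does not have MySQL's error 1093 restriction, so it only demonstrates the logic. Grouping by order_ref and description is an assumption about what counts as a duplicate, and the third row is invented to show that unrelated rows are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (
    id INTEGER PRIMARY KEY,
    order_ref INTEGER,
    description TEXT,
    price REAL
);
INSERT INTO the_table VALUES
  (1, 34567, 'This is the description', 19.99),
  (2, 34567, 'This is the description', 13.99),
  (3, 50000, 'Another description', 5.00);  -- invented extra row
""")

# Keep the lowest id of each (order_ref, description) group; mark the rest.
conn.execute("""
UPDATE the_table
SET description = '- AS ABOVE -'
WHERE id NOT IN (
    SELECT m FROM (
        SELECT MIN(id) AS m
        FROM the_table
        GROUP BY order_ref, description
    ) AS temp
)
""")

updated_rows = conn.execute(
    "SELECT id, description FROM the_table ORDER BY id"
).fetchall()

print(updated_rows)
```

Note that "first" here means lowest id, which is a choice made by the query, not something the table guarantees on its own.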
Relational databases do not have a notion of subsequent. Records in a table are not in any particular order. If you do not specify an order in a SELECT query, you have to assume that the records are retrieved in an order that you do not expect.
The comment Oswald made about ordering (or lack thereof) of the rows is very important. You have no guarantee, period, that unsorted rows selected out of this table will be in the order you expect. This means that unless you specify the ordering explicitly every single time, things could be tagged 'AS ABOVE' even when this does not reflect reality. In addition, none of the solutions provided so far will deal with any out-of-sequence records properly.
Overall, this sounds more like a database design issue (specifically, a normalization problem), than a query issue.
Ideally, the descriptions would be extracted to some master datatable (along with the necessary ids). Then, the choice about the description to use is left to when the 'SELECT' runs. This has the added benefit of making the 'AS ABOVE' safe for changes in ordering.
So, assuming that each distinct value of the order_ref column should have its own description (barring the 'AS ABOVE' bit), the tables can be refactored as follows:
id | order_ref | price
=======================
1 | 34567 | 19.99
2 | 34567 | 13.99
and
order_ref_fk | description
==========================================
34567 | "This is the description"
At this point, you join to the description table normally. Displaying a different description is usually a display issue regardless, to be handled by whatever program you have outputting the rows to display (not directly in the database).
If you insist on doing this in-db, you could write the SELECT in this vein:
SELECT Orders.id, Orders.order_ref, Orders.price,
COALESCE(Description.description, 'AS ABOVE') AS description
FROM Orders
LEFT JOIN (SELECT order_ref, MIN(id) AS id
FROM Orders
GROUP BY order_ref) Ord
ON Ord.order_ref = Orders.order_ref
AND Ord.id = Orders.id
LEFT JOIN Description
ON Description.order_ref_fk = Ord.order_ref
ORDER BY Orders.order_ref, Orders.id

complex sql query issue

I have a little SQL query, but I can't find a way to get text back; it only returns numbers. (Revised!)
SELECT if( `linktype` = "group",
(SELECT contactgroups.grname
FROM contactgroups, groupmembers
WHERE contactgroups.id = groupmembers.id ???
AND contactgroups.id = groupmembers.link_id),
(SELECT contactmain.contact_sur
FROM contactmain, groupmembers
WHERE contactmain.id = groupmembers.id ???
AND contactmain.id = groupmembers.link_id) ) AS adat
FROM groupmembers;
Now that I have improved it a bit, it gives back some info, but the ??? marks (thanks to minitech) indicate my problem. I can't see how I could fix it. Any advice is welcome! Thanks
Contactmain (id, contact_sur, email2)
data:
1 | Peter | email@email.com
2 | Andrew| email2@email.com
Contactgroups (id, grname)
data:
1 | All
2 | Trustee
3 | Comitee
Groupmembers (id, group_id, linktype, link_id)
data:
1 | 1 | contact | 1
2 | 1 | contact | 2
3 | 2 | contact | 1
4 | 3 | group | 2
And I would like to list out who is in the 'Comitee' the result should be Andrew and Trustee if I am right:)
The join does look a bit redundant, since you are implying that the ID and Link_ID columns hold the same value. Since BOTH select values are derived from a qualification against the group members table, I have restructured the query to use THAT as the primary table and do a LEFT JOIN to each of the other tables, anticipating from your query that the link should be found in ONE or the OTHER table. With the two LEFT JOINs, you go through the GroupMembers table only ONCE. Now, your IF(): since group members is the basis and BOTH tables are available and linked, we just grab the column from one table or the other. I've included "linktype" too, just for reference. Using STRAIGHT_JOIN tells the optimizer to join the tables in the order listed rather than reordering them.
SELECT STRAIGHT_JOIN
gm.linktype,
if( gm.linktype = "group", cg.grname, cm.contact_sur ) ADat
from
groupmembers gm
left join contactgroups cg
ON gm.link_id = cg.id
left join contactmain cm
ON gm.link_id = cm.id
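The double LEFT JOIN can be tried on the question's sample rows. This sketch uses SQLite through Python's sqlite3 module, so MySQL's IF() is written as a CASE expression and STRAIGHT_JOIN is left out; the email2 column is omitted because the query does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contactmain (id INTEGER PRIMARY KEY, contact_sur TEXT);
CREATE TABLE contactgroups (id INTEGER PRIMARY KEY, grname TEXT);
CREATE TABLE groupmembers (
    id INTEGER PRIMARY KEY,
    group_id INTEGER,
    linktype TEXT,
    link_id INTEGER
);
INSERT INTO contactmain VALUES (1, 'Peter'), (2, 'Andrew');
INSERT INTO contactgroups VALUES (1, 'All'), (2, 'Trustee'), (3, 'Comitee');
INSERT INTO groupmembers VALUES
  (1, 1, 'contact', 1), (2, 1, 'contact', 2),
  (3, 2, 'contact', 1), (4, 3, 'group', 2);
""")

# link_id points at contactgroups or contactmain depending on linktype,
# so join both and pick the matching name column per row.
member_rows = conn.execute("""
SELECT gm.id,
       gm.linktype,
       CASE WHEN gm.linktype = 'group' THEN cg.grname
            ELSE cm.contact_sur
       END AS adat
FROM groupmembers AS gm
LEFT JOIN contactgroups AS cg ON gm.link_id = cg.id
LEFT JOIN contactmain AS cm ON gm.link_id = cm.id
ORDER BY gm.id
""").fetchall()

print(member_rows)
```

This resolves each membership row to a name but does not expand nested groups; listing the contacts inside 'Trustee' for the 'Comitee' group would need a further join or a recursive query.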
If contactgroups.id must equal groupmembers.id but must also equal 2, that's redundant and also probably where your problem is. It works fine as you've written it: http://ideone.com/7EGLZ so without knowing what it's actually supposed to do I can't help more.
EDIT: I'm unfamiliar with the comma-separated FROM, but it gives the same result since you don't select anything from the other table so it doesn't really matter.