MySQL - Reshape Data

I have the following data set (sample):
emplid | Citizeship |
100001 | USA |
100001 | CAN |
100001 | CHN |
100002 | USA |
100002 | CHN |
100003 | USA |
Is there a way to transform the data into the following:
emplid | Citizenship_1 | Citizenship_2 | Citizenship_3
100001 | USA | CHN | CAN
100002 | USA | CHN |
100003 | USA | |
The assumption is that each emplid will have up to 4 citizenships.
I started with the following code, but for emplids that have just one citizenship, the value is repeated in Citizenship_2 and Citizenship_3, which should be blank:
select *
, substring_index(Citizenship_multiple, ',', 1) as Citizenship_1
, substring_index(substring_index(Citizenship_multiple,',',-1),',',1) as Citizenship_2
, substring_index(substring_index(Citizenship_multiple,',',-2),',',1) as Citizenship_3
, substring_index(substring_index(Citizenship_multiple,',',-3),',',1) as Citizenship_4
from
(select *
, group_concat(distinct Citizenship) as Citizenship_multiple
from `citizenship_csv_meta`
group by emplid) a

You can do it with CASE and MAX:
SELECT emplid,
max(case when Citizeship = 'USA' then 'USA' else '' end) as Citizeship_1,
max(case when Citizeship = 'CHN' then 'CHN' else '' end) as Citizeship_2,
max(case when Citizeship = 'CAN' then 'CAN' else '' end) as Citizeship_3
FROM citizenship_csv_meta
GROUP BY emplid
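If hardcoding the country codes is not an option, one alternative (not part of the answer above) is to number each employee's citizenships and pivot on that number. The following is a minimal sketch that assumes MySQL 8.0 or later for window functions, and assumes the column is spelled `Citizenship` as in the question's own query; the alias `ranked` and the column `rnk` are made up for illustration.

```sql
-- Sketch for MySQL 8.0+ (window functions required).
-- Assumes table `citizenship_csv_meta` with columns `emplid` and `Citizenship`.
SELECT emplid,
       COALESCE(MAX(CASE WHEN rnk = 1 THEN Citizenship END), '') AS Citizenship_1,
       COALESCE(MAX(CASE WHEN rnk = 2 THEN Citizenship END), '') AS Citizenship_2,
       COALESCE(MAX(CASE WHEN rnk = 3 THEN Citizenship END), '') AS Citizenship_3,
       COALESCE(MAX(CASE WHEN rnk = 4 THEN Citizenship END), '') AS Citizenship_4
FROM (
    -- DENSE_RANK gives duplicate citizenships the same rank,
    -- so repeated rows do not take up an extra column.
    SELECT emplid,
           Citizenship,
           DENSE_RANK() OVER (PARTITION BY emplid ORDER BY Citizenship) AS rnk
    FROM citizenship_csv_meta
) AS ranked
GROUP BY emplid;
```

With this ordering the citizenships appear alphabetically within each row (for example CAN, CHN, USA for emplid 100001), which differs from the order shown in the question.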

I know you stated hardcoding was a pain, and likely not the best solution, but I was able to do this while using only one assumption: that an employee can have at most 4 citizenships. So, I just joined your table together 4 times. I had to use an outer join, because not every employee would have 4 citizenships. Here is the code, and I will explain what I did:
SELECT e.emplid, MAX(e.citizenship) AS citizenship1,
MAX(e1.citizenship) AS citizenship2,
MAX(e2.citizenship) AS citizenship3,
MAX(e3.citizenship) AS citizenship4
FROM employee e
LEFT JOIN employee e1 ON e1.emplid = e.emplid AND e1.citizenship < e.citizenship
LEFT JOIN employee e2 ON e2.emplid = e1.emplid AND e2.citizenship < e1.citizenship
LEFT JOIN employee e3 ON e3.emplid = e2.emplid AND e3.citizenship < e2.citizenship
GROUP BY e.emplid
I joined your table together 4 times and took the MAX() citizenship from each group. This works because the join condition e1.citizenship < e.citizenship makes sure that the previous values aren't included. For example, table e2 never included USA, so I was able to use the MAX() function again.
Note that once an employee has no more citizenships, the cells in the remaining columns are NULL, so you will need to be aware of that.
This tested beautifully on SQL Fiddle, and I actually referenced this question to figure out how to get the succeeding citizenships. Of course, I used a method slightly different from theirs, but I want to give credit where credit is due.
EDIT
If you want the null cells replaced with a blank value, refer to this SQL Fiddle.
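For reference, a minimal sketch of one way to replace the NULL cells with blanks, assuming the same `employee` table and columns as the query above. It only wraps each MAX() in COALESCE() and is not necessarily identical to what the fiddle contains.

```sql
-- Sketch: same self-join as above, with NULL cells shown as empty strings.
-- Assumes a table `employee` with columns `emplid` and `citizenship`.
SELECT e.emplid,
       MAX(e.citizenship)                AS citizenship1,
       COALESCE(MAX(e1.citizenship), '') AS citizenship2,
       COALESCE(MAX(e2.citizenship), '') AS citizenship3,
       COALESCE(MAX(e3.citizenship), '') AS citizenship4
FROM employee e
LEFT JOIN employee e1 ON e1.emplid = e.emplid  AND e1.citizenship < e.citizenship
LEFT JOIN employee e2 ON e2.emplid = e1.emplid AND e2.citizenship < e1.citizenship
LEFT JOIN employee e3 ON e3.emplid = e2.emplid AND e3.citizenship < e2.citizenship
GROUP BY e.emplid;
```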

Related

How to substitute the column values with a MySQL query

I am working with MySQL Workbench to get the table I am looking for. I am almost there. Here is the result of the query:
----------------------------------------------------------------------------
employee.manager_id | employee.id | employee.first_name | employee.last_name
----------------------------------------------------------------------------
null | 1 | Petra | Wallace
null | 3 | Peter | Willis
null | 5 | Michael | Best
1 | 2 | David | Lone
3 | 4 | Barbara | Grinder
5 | 6 | Anthony | Krone
Now, I want to replace the values of the column employee.manager_id with the following:
When the value is null, either leave it null or substitute it with the string "none"
When the value has a number, it references the number of the employee.id. For example, the value 1 in employee.manager_id represents employee.id number 1, who is Petra Wallace.
I would like to show the employee.first_name and employee.last_name in the employee.manager_id column instead of a number. Does anybody have any idea how to do it?
A left join retrieves a manager based on the JOIN criteria. If there isn't a manager, then all of the manager.* fields are NULL.
SELECT
COALESCE(manager.first_name, 'none') as manager_first_name,
manager.last_name as manager_last_name,
employee.id,
employee.first_name,
employee.last_name
FROM employee
LEFT JOIN employee manager
ON employee.manager_id = manager.id
ORDER BY employee.manager_id
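Since the question asks for the manager's first and last name in a single column, the same left join can be adapted as in the following sketch. The alias `manager_name` is made up for illustration; the table and column names are taken from the question.

```sql
-- Sketch: show the manager's full name in one column, or 'none' when there
-- is no manager. CONCAT() returns NULL if any argument is NULL, so
-- COALESCE() supplies the fallback for employees without a manager.
SELECT
    COALESCE(CONCAT(manager.first_name, ' ', manager.last_name), 'none') AS manager_name,
    employee.id,
    employee.first_name,
    employee.last_name
FROM employee
LEFT JOIN employee AS manager
    ON employee.manager_id = manager.id
ORDER BY employee.manager_id;
```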
Try this:
SELECT
CASE WHEN employee.manager_id IS NULL THEN 'NONE'
-- a scalar subquery may return only one column, so select just the name
ELSE (SELECT CONCAT(e.first_name, ' ', e.last_name) FROM employee e
WHERE e.id = employee.manager_id)
END AS case_generated_column,
employee.id, employee.first_name, employee.last_name
FROM employee
ORDER BY employee.manager_id;

How to properly add an additional column in a SELECT statement in MySQL?

I would like to extract the number of attendances (i.e., COUNT()) of "Coaches" at "Shows" happening during two separate months: March and April. I managed to create a query that collects that number over only one of the months. In addition, via slightly modifying the query, the numbers over the second month can be found easily. But how do I merge them into one table containing both columns?
So, given the two queries and resulting tables below, how would one "append" the result of Query 2 to the result of Query 1? In other words, how would one combine their respective SELECT statements?
I included links to the SQL fiddle in case you need them.
Thank you in advance.
SQL Fiddle
Query 1:
SELECT C.*, COUNT(CIS.idCoach) AS MarchNumOfShows
FROM Coach AS C
LEFT JOIN
(
CoachInShow AS CIS
LEFT JOIN
TVShow AS S
ON S.idShow = CIS.idShow
)
ON C.idCoach = CIS.idCoach AND S.airDate LIKE '_____04___'
GROUP BY C.idCoach
Results:
| idCoach | name | surname | MarchNumOfShows |
|---------|-----------|---------|-----------------|
| 1 | Stephen | Hawking | 5 |
| 2 | Nicholas | Cage | 7 |
| 3 | Sigourney | Weaver | 6 |
Query 2 (Minimal difference, querying for April instead of March):
SELECT COUNT(CIS.idCoach) AS AprilNumOfShows
FROM Coach AS C
LEFT JOIN
(
CoachInShow AS CIS
LEFT JOIN
TVShow AS S
ON S.idShow = CIS.idShow
)
ON C.idCoach = CIS.idCoach AND S.airDate LIKE '_____05___'
GROUP BY C.idCoach
Results:
| AprilNumOfShows |
|-----------------|
| 8 |
| 7 |
| 10 |
Wanted:
| idCoach | name | surname | MarchNumOfShows | AprilNumOfShows |
|---------|-----------|---------|-----------------|-----------------|
| 1 | Stephen | Hawking | 5 | 8 |
| 2 | Nicholas | Cage | 7 | 7 |
| 3 | Sigourney | Weaver | 6 | 10 |
You are very close; the only step you missed is combining MarchNumOfShows and AprilNumOfShows with a second LEFT JOIN,
as in the code below (or see the SQL Fiddle):
SELECT C.idCoach, C.name, C.surname, COUNT(distinct CIS4.idShow) AS MarchNumOfShows
, COUNT(distinct CIS5.idShow) AS AprilNumOfShows
FROM Coach AS C
LEFT JOIN
(
CoachInShow AS CIS4
LEFT JOIN
TVShow AS S4
ON S4.idShow = CIS4.idShow
)
ON C.idCoach = CIS4.idCoach AND S4.airDate LIKE '_____04___'
LEFT JOIN
(
CoachInShow AS CIS5
LEFT JOIN
TVShow AS S5
ON S5.idShow = CIS5.idShow
)
ON C.idCoach = CIS5.idCoach AND S5.airDate LIKE '_____05___'
GROUP BY C.idCoach;
And below is another way to get the same output (or look into SQL Fiddle):
SELECT C.idCoach, C.name, C.surname,
sum(case when DATE_FORMAT(airDate,'%M')='April' then 1 else 0 end ) AS AprilNumOfShows,
sum(case when DATE_FORMAT(airDate,'%M')='May' then 1 else 0 end ) AS MayNumOfShows
FROM Coach AS C
LEFT JOIN
(
CoachInShow AS CIS
LEFT JOIN
TVShow AS S
ON S.idShow = CIS.idShow
)
ON C.idCoach = CIS.idCoach
GROUP BY C.idCoach;
One way to do it is with a CASE expression:
select *,
sum(case when airdate like "%03%" then 1 else 0 end) as March,
sum(case when airdate like "%04%" then 1 else 0 end) as April
...
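A fuller sketch of the CASE approach, using the table and column names from the question. It assumes airDate is a DATE or a 'YYYY-MM-DD' string so that MONTH() can read it. The month numbers 4 and 5 mirror the LIKE patterns in the question's queries; adjust them if March and April are really meant.

```sql
-- Sketch: one pass over the joined rows, one conditional count per month.
-- COUNT() ignores NULLs, so a CASE without ELSE counts only matching rows
-- and yields 0 (not NULL) for coaches with no shows in that month.
SELECT C.idCoach, C.name, C.surname,
       COUNT(CASE WHEN MONTH(S.airDate) = 4 THEN 1 END) AS MarchNumOfShows,
       COUNT(CASE WHEN MONTH(S.airDate) = 5 THEN 1 END) AS AprilNumOfShows
FROM Coach AS C
LEFT JOIN CoachInShow AS CIS ON CIS.idCoach = C.idCoach
LEFT JOIN TVShow AS S ON S.idShow = CIS.idShow
GROUP BY C.idCoach, C.name, C.surname;
```

Comparing the month with MONTH() avoids the risk of a pattern such as "%04%" also matching the day or year part of the date.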

Efficiently join same data twice without CTE in MySQL?

I have this query that sums payments and reimbursements against pledges. It works, but smells bad:
select P.pledgeID, P.decamount,
(
sum(coalesce(C1.decamount, 0)) - sum(coalesce(C2.decamount, 0))
) as paymentTotal
from Pledge P
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID )
C1 on P.pledgeID = C1.pledgeID and C1.eaddOrSubtract = 'add'
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID)
C2 on P.pledgeID = C2.pledgeID and C2.eaddOrSubtract = 'subtract'
group by pledgeID
Particularly, I think there should be a better way to handle the joins inside the joins, especially since they produce the same results. On another RDBMS, I'd use a CTE, but that's not available here. Is there a more efficient way to calculate these payment totals (taking into account the fact that some are net additions and other net subtractions)?
Schema info:
PaymentType
---
| paymentTypeID | eaddOrSubtract | ...
| 1 | add |
| 2 | add |
| 3 | subtract |
| 4 | add |
| 5 | subtract |
Payment
---
| checkID | pledgeID | paymentTypeID | decamount | ...
| 1 | 19415 | 4 | 15.19 |
| 2 | 19414 | 2 | 900.00 |
| 3 | 19106 | 5 | 3856.00 |
| 4 | 19106 | 3 | 52.00 |
| 5 | 19414 | 1 | 15.00 |
The query should select all pledges (their pledgeID and decamount) and the total of payments for each pledge. Some payments are positive, some negative.
Your query selects all pledges, joins the positive payments to each pledge and joins the negative payments to each row related to the pledge. If there is at most one negative and at most one positive payment it almost works (except that it returns NULL instead of 0 when there are no payments). Once there are at least two payments in one category (positive/negative) and at least one in the other category, problems arise. Each negative payment is joined to each positive payment on the same pledge and all the pairs are summed.
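To make the pairing problem concrete, here is a small self-contained sketch with made-up numbers: one pledge with two 'add' payments (10 and 20) and one 'subtract' payment (5). The derived tables `pos` and `neg` and the column `amt` are hypothetical stand-ins for the two joined subqueries.

```sql
-- Hypothetical data: the correct net total is 10 + 20 - 5 = 25.
SELECT SUM(pos.amt) - SUM(neg.amt) AS paymentTotal
FROM (SELECT 1 AS pledgeID, 10 AS amt
      UNION ALL
      SELECT 1, 20) AS pos
JOIN (SELECT 1 AS pledgeID, 5 AS amt) AS neg
  ON neg.pledgeID = pos.pledgeID;
-- The join yields the rows (10, 5) and (20, 5), so the subtract payment is
-- counted twice and the result is 30 - 10 = 20 instead of 25.
```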
A cleaner way to look at the problem directly follows the first paragraph of this answer. Instead of two joins with filter on eaddOrSubtract, it suffices to make one join to a subquery, where the subquery internally handles the sign of the amount being summed. The CASE operator is great for such a job.
SELECT
P.pledgeID
, P.decamount
, COALESCE(SUM(C.signedDecamount), 0) AS paymentTotal
FROM Pledge P
LEFT JOIN (
SELECT
C.*
, CASE CT.eaddOrSubtract
WHEN 'add' THEN C.decamount
WHEN 'subtract' THEN -C.decamount
END AS signedDecamount
FROM Payment C
LEFT JOIN PaymentType CT ON C.paymentTypeID = CT.paymentTypeID
) C ON P.pledgeID = C.pledgeID
GROUP BY P.pledgeID
The COALESCE() call is there for the case when no payments are joined to the pledge or all the joined payments have NULL decamount. COALESCE() to 0 inside SUM() can always be safely omitted, as SUM() skips NULLs; I guess those calls were just artifacts of hacking the corner cases of joins in the original query.
SQL Fiddle

Get data from one table and count matching records from another

I'm not sure if this is possible. I have one table members and a second table transactions.
I need to get the name of the member from the members table, but also count the number of transactions that member has made from another table. Is this even possible in a JOIN statement, or do I need to write two statements?
SELECT
m.first_name,
m.last_name,
COUNT(t.giver_id),
COUNT(t.getter_id)
FROM
members AS m
JOIN
transactions AS t
ON
m.id = t.giver_id
WHERE
m.id = $i
I should add that it's possible a member has not made any transactions and would therefore not appear in the transactions table.
When I run this code, it returns all NULL columns. When I add the EXPLAIN statement, MySQL says "Impossible WHERE noticed after reading const table..."
Is this possible? If so, then what am I doing wrong? Thanks in advance.
EDIT:
Sample data structure and expected output:
members
id | first_name | last_name
_______________________________
1 | Bill | Smith
2 | Joe | Jones
transactions table
id | giver_id | getter_id | status
________________________________________
1 | 1 | 2 | complete
2 | 1 | 2 | complete
So running my query should return:
1 | Bill | Smith | 2 | 0
2 | Joe | Jones | 0 | 2
Simple LEFT JOIN should suffice:
SELECT
m.first_name,
m.last_name,
SUM(CASE WHEN m.id = t.giver_id THEN 1 ELSE 0 END) AS giver_count,
SUM(CASE WHEN m.id = t.getter_id THEN 1 ELSE 0 END) AS getter_count
FROM members AS m
LEFT JOIN transactions AS t ON m.id = t.giver_id OR m.id = t.getter_id
GROUP BY m.id, m.first_name, m.last_name
Do not forget to add GROUP BY when using aggregate functions. Just because MySQL allows the query to run without it does not mean it is advisable: with ONLY_FULL_GROUP_BY disabled, MySQL picks the values for unaggregated columns from an arbitrary row, which can be problematic. Avoid this anti-pattern.

SQL Self Join with null values

I have all employees (managers and regular employees) in one table called Employee. The table looks as follows:
Table
+-------+------------+---------+---------+------------+
|emp_id | name | dept_id | salary | manager_id |
+=======+============+=========+=========+============+
| 1 | Sally | 1 | 20000 | null |
| 2 | Ajit | 2 | 20000 | 1 |
| 3 | Rahul | 1 | 20000 | 1 |
| 4 | uday | 1 | 20000 | null |
| 5 | john | 1 | 20000 | null |
| 6 | netaji | 2 | 20000 | 2 |
| 7 | prakriti | 3 | 1111 | 3 |
| 8 | sachin | 3 | 1111 | 3 |
| 9 | santosh | 1 | 1111 | 2 |
| 10 | Ravi | 1 | 1111 | 2 |
+-------+------------+---------+---------+------------+
Both managers and employees belong to the same table. manager_id refers to the emp_id of the employee's manager.
I want to write a query that counts the number of employees belonging to each manager. Even if a manager doesn't have any employees under her or him, the count should show as 0.
Result should be as follows,
Expected Output
+------+----------+
|Count | Manager |
+======+==========+
| 2 | Sally |
| 3 | Ajit |
| 2 | Rahul |
| 0 | Uday |
| 0 | John |
+------+----------+
You need to do a left self-join on the table. The left join will ensure that there is a row for every manager even if there are no employees under them. You need to use the COUNT() aggregate on a field from the employee side of the join that will be NULL if the manager has no employees. COUNT() doesn't actually count NULLs, so this should give you zeroes where you want them.
The WHERE clause in this query defines managers by looking if their manager_id is NULL or if there are any matches in the joined table which means there are people that have them set as their manager.
SELECT mgr.name, COUNT(emp.emp_id) AS employee_count
FROM Employee AS mgr
LEFT JOIN Employee AS emp ON emp.manager_id=mgr.emp_id
WHERE mgr.manager_id IS NULL OR emp.emp_id IS NOT NULL
GROUP BY mgr.name
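The difference between COUNT(*) and COUNT(column) that this relies on can be seen in isolation with a tiny sketch; the derived table `t` and column `x` are made up for illustration.

```sql
-- Three rows, one of which has a NULL in column x.
SELECT COUNT(*) AS all_rows,
       COUNT(x) AS non_null_rows
FROM (SELECT 1 AS x
      UNION ALL
      SELECT NULL
      UNION ALL
      SELECT 3) AS t;
-- all_rows = 3, non_null_rows = 2
```

In the manager query, a manager without employees produces exactly one joined row whose emp.emp_id is NULL, so COUNT(emp.emp_id) returns 0 where COUNT(*) would return 1.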
The correct solution likely involves fixing the schema, as any approach will fail for a "sub-manager" (who is managed and thus has a manager_id) but does not currently manage anybody.
Anyway, if the above limitation is acceptable, then people are managers if either
They have a NULL manager_id (as stated in a comment), or
They currently manage other employees
Then this query (example sqlfiddle) can be used:
SELECT m.name as Manager, COUNT(e.id) as `Count`
FROM employee m
LEFT JOIN employee e
ON m.id = e.manager_id
GROUP BY m.id, m.name, m.manager_id
HAVING `Count` > 0 OR m.manager_id IS NULL
Notes/explanation:
The LEFT [OUTER] join is important here; otherwise managers who did not manage anybody would not be found. The filtering is then applied via the HAVING clause on the grouped result.
The COUNT is applied to a particular column, instead of *; when done so, NULL values in that column are not counted. In this case that means that employees (m) without a match (e) are not automatically selected by the COUNT condition in the HAVING. (The LEFT JOIN leaves in the left-side records, even when there is no join-match - all the right-side columns are NULL in this case.)
The GROUP BY contains all the grouping fields, even if they appear redundant. This allows the manager_id field to be used in the HAVING, for instance. (The group on ID was done in case two managers ever have the same name, or it is to be selected in the output clause.)
Here is a solution; you need to self-join the Employee table.
SELECT e1.manager_id, e2.name, COUNT(1) AS COUNT
FROM Employee e1 JOIN Employee e2 ON e1.manager_id = e2.id
GROUP BY e1.manager_id, e2.name
UNION ALL
SELECT e3.id, e3.name, 0 AS COUNT
FROM Employee e3
WHERE manager_id IS NULL
AND e3.id NOT IN ( SELECT e1.manager_id
FROM Employee e1
JOIN
Employee e2
ON e1.manager_id = e2.id
GROUP BY e1.manager_id, e2.name)
Maybe this helps:
select t1.name, count(*) -- all managers with emps
from t t1
join t t2
on t1.emp_id = t2.manager_id
group by t1.name
union all
select t1.name, 0 -- all managers without emps
from t t1
left join t t2
on t1.emp_id = t2.manager_id
where t1.manager_id is null
and t2.emp_id is null
Try the query below:
select (select count(*) from employees b where b.manager_id = a.emp_id) as Count,
a.Name as manager
from employees a
where a.emp_id in (select distinct c.manager_id from employees c)
Query
CREATE TABLE employee(emp_id varchar(5) NOT NULL,
emp_name varchar(20) NULL,
dt_of_join date NULL,
emp_supv varchar(5) NULL,
CONSTRAINT emp_id PRIMARY KEY(emp_id) ,
CONSTRAINT emp_supv FOREIGN KEY(emp_supv)
REFERENCES employee(emp_id));
You need to do a LEFT OUTER JOIN, like this:
SELECT movies.title,sequelies.title AS sequel_title
FROM movies
LEFT OUTER JOIN movies sequelies
ON movies.sequel_id = sequelies.id ;