SQL queries using self-join - mysql

I can make simple select, join, update queries. But this seems a bit hard for me (I'm just learning).
Customer has a table like this (using Mysql) (I don't have control on his DB schema, I can't create a Customers table for him. I just need to create some customer reports).
+-------------+----------------+-----------------+-------+
| Transaction | Customers name | Customers email | Set   |
+-------------+----------------+-----------------+-------+
| 1           | John           | jo@gmail.com    | blue  |
| 2           | Mary           | ma@gmail.com    | green |
| 3           | Paul           | pa@gmail.com    | red   |
| 4           | JOHN G.        | jo@gmail.com    | green |
| 5           | Paul Simon     | pa@gmail.com    | blue  |
+-------------+----------------+-----------------+-------+
As you can see, the customer types his name freely in each transaction. That can make it look as if there are more customers than there really are, but the email field identifies each customer uniquely.
I need to make these reports (all of them are driven by what he bought - the 'Set' field):
1) AND searches (like 'blue' AND 'green')
Customers that have bought 'this' AND 'that' set.
I need to get a result like this:
| John | jo@gmail.com |
or this (as I said, John could enter his name in different ways in each transaction. As long as the email is unique, it is fine):
| JOHN G. | jo@gmail.com |
2) OR searches (like 'blue' OR 'red')
I need to get this:
| John | jo@gmail.com |
| Paul | pa@gmail.com |
or this:
| John       | jo@gmail.com |
| Paul Simon | pa@gmail.com |
3) Bought one set, but not the other (like 'green' but not 'blue')
| Mary | ma@gmail.com |
Does anyone know how to do that? I believe this could be accomplished by some kind of 'self join'. But as I'm just a beginner, I couldn't figure out how to solve this.

Obviously a person can buy this or that set and I imagine it is even possible that one person buys the same set again in a later transaction.
So you want information per person. The easiest way is to aggregate your data by grouping by person (GROUP BY). Then you check your aggregates in the HAVING clause: did the customer buy set X and/or Y?
Query 1:
select email, name
from transactions
group by email
having max(case when `set` = 'blue' then 1 else 0 end) = 1
and max(case when `set` = 'green' then 1 else 0 end) = 1;
Query 2:
select email, name
from transactions
group by email
having max(case when `set` = 'blue' then 1 else 0 end) = 1
or max(case when `set` = 'red' then 1 else 0 end) = 1;
Query 3:
select email, name
from transactions
group by email
having max(case when `set` = 'green' then 1 else 0 end) = 1
and max(case when `set` = 'blue' then 1 else 0 end) = 0;
The name you get with these queries is just one of the matching names, arbitrarily chosen. This is a MySQL extension. Standard SQL does not allow it, and MySQL itself rejects it when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5). Either way, whether MySQL or standard SQL, you can use MIN(name) or MAX(name) instead to always get the first or last name in alphabetical order.
By the way: the CASE WHEN expressions are standard SQL. MySQL, however, features an additional special boolean handling: a true expression evaluates to 1 and a false expression to 0. So in MySQL you can simply write max(`set` = 'green') = 1 instead of max(case when `set` = 'green' then 1 else 0 end) = 1. (SET is a reserved word in MySQL, so a column with that name needs backticks.)
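To try the three HAVING variants without touching the customer's database, here is a minimal sketch that replays them against an in-memory SQLite table from Python. The table name transactions, the column names, and the use of SQLite instead of MySQL are assumptions made for the demo. It selects MIN(name) so the name returned is deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for the customer's MySQL table; the table name
# "transactions" and the column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE transactions (id INTEGER, name TEXT, email TEXT, "set" TEXT)'
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "John", "jo@gmail.com", "blue"),
        (2, "Mary", "ma@gmail.com", "green"),
        (3, "Paul", "pa@gmail.com", "red"),
        (4, "JOHN G.", "jo@gmail.com", "green"),
        (5, "Paul Simon", "pa@gmail.com", "blue"),
    ],
)

# Evaluates to 1 if the customer bought the given set at least once, else 0.
BOUGHT = 'MAX(CASE WHEN "set" = ? THEN 1 ELSE 0 END)'


def customers(having, sets):
    """Return one (email, name) row per customer matching the HAVING clause."""
    sql = (
        "SELECT email, MIN(name) FROM transactions "
        f"GROUP BY email HAVING {having} ORDER BY email"
    )
    return conn.execute(sql, sets).fetchall()


both = customers(f"{BOUGHT} = 1 AND {BOUGHT} = 1", ("blue", "green"))
either = customers(f"{BOUGHT} = 1 OR {BOUGHT} = 1", ("blue", "red"))
green_not_blue = customers(f"{BOUGHT} = 1 AND {BOUGHT} = 0", ("green", "blue"))

print(both)
print(either)
print(green_not_blue)
```

Because the grouping is by email, John's two spellings collapse into one customer, and Mary is the only one who bought green without ever buying blue.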

For the first query I am using a subquery (IN rather than =, so that it still works if a customer bought the same set more than once):
select name, email from customer c where `Set` = 'blue' and c.email in (select email from customer where `Set` = 'green' and email = c.email);
For the second query a simple OR condition is enough:
select * from customer where `Set` = 'green' or `Set` = 'blue' group by email;
For the third query (a workaround that fits your requirement: it assumes the customer has exactly one record and that this record's Set value is the one given in the input):
select * from customer group by email having count(`Set`) = 1 and `Set` like 'green';

Related

MySQL JOIN Statement from Multiple Tables

I have an old database of entries from an abandoned "Joomgalaxy" Joomla plugin.
There are three tables, joomgalaxy_entries, joomgalaxy_fields, and joomgalaxy_entries_data
The id from the entries table matches the entry_id in the entries_data table, but the actual field name is saved in another table, fields
Can someone please help me with the correct SQL statement to obtain results like you can see below in Ultimate Goal? My MySQL knowledge is very basic, and from my searching it sounds like I need to use a LEFT JOIN, but I have no idea how to use the value from field_name as the column name for returned values
Thank You!!
joomgalaxy_entries
---------------------------------------
| id | title | longitude | latitude |
---------------------------------------
| 50 | John | -79.333333 | 43.669999 |
| 51 | Bob | -79.333333 | 43.669999 |
---------------------------------------
joomgalaxy_fields
This is just two examples below to keep it simple, there are more than just these two, so it would have to be able to handle dynamically using the field_name as the column name.
--------------------------------
| id | field_type | field_name |
--------------------------------
| 1 | textbox | websiteurl |
| 2 | dropdown | occupation |
--------------------------------
joomgalaxy_entries_data
"Technically" there shouldn't be any duplicate entries (fieldid and entry_id), so from my understanding that shouldn't affect using the field_name from above as the column name, but what if there ends up being one?
-------------------------------------
| fieldid | field_value | entry_id |
-------------------------------------
| 1 | google.com | 50 |
| 2 | unemployed | 50 |
| 1 | doctor.com | 51 |
| 2 | doctor | 51 |
-------------------------------------
Ultimate Goal
Ultimately trying to get this type of result, so I can then use that statement in MySQL Workbench to export the data that would look like this:
------------------------------------------------------------------
| id | title | longitude | latitude | websiteurl | occupation |
------------------------------------------------------------------
| 50 | John | -79.333333 | 43.669999 | google.com | unemployed |
| 51 | Bob | -79.333333 | 43.669999 | doctor.com | doctor |
------------------------------------------------------------------
EDIT:
There are more than just the two fields websiteurl and occupation, I was just using those two as examples, there are numerous fields that are all different, so in theory pulling the value from field_name would be used for the column name
You can use some conditional logic, like a CASE statement, along with an aggregate function like max() or min() to return those values as columns:
SELECT je.id,
je.title,
je.longitude,
je.latitude,
max(case when jed.fieldid = 1 then jed.field_value end) as WebsiteUrl,
max(case when jed.fieldid = 2 then jed.field_value end) as Occupation
FROM joomgalaxy_entries je
INNER JOIN joomgalaxy_entries_data jed
on je.id = jed.entry_id
GROUP BY je.id,
je.title,
je.longitude,
je.latitude
An INNER JOIN only returns the joomgalaxy_entries rows that have matching rows in joomgalaxy_entries_data. If you want to return every row of joomgalaxy_entries even when there is nothing to join to, change the INNER JOIN to a LEFT JOIN.
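To see the difference between the two join types, here is a small sketch that runs the same MAX(CASE ...) pivot with both against an in-memory SQLite copy of the sample data. Entry 52 ("Ann") is a hypothetical row with no entries_data, added only to show the effect. SQLite stands in for MySQL here, and the longitude and latitude columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joomgalaxy_entries (id INTEGER, title TEXT);
CREATE TABLE joomgalaxy_entries_data (fieldid INTEGER, field_value TEXT, entry_id INTEGER);
INSERT INTO joomgalaxy_entries VALUES
    (50, 'John'),
    (51, 'Bob'),
    (52, 'Ann');  -- hypothetical entry without any data rows
INSERT INTO joomgalaxy_entries_data VALUES
    (1, 'google.com', 50), (2, 'unemployed', 50),
    (1, 'doctor.com', 51), (2, 'doctor', 51);
""")

# Same pivot query twice; only the join type changes.
PIVOT = """
SELECT je.id, je.title,
       MAX(CASE WHEN jed.fieldid = 1 THEN jed.field_value END) AS websiteurl,
       MAX(CASE WHEN jed.fieldid = 2 THEN jed.field_value END) AS occupation
FROM joomgalaxy_entries je
{join_type} JOIN joomgalaxy_entries_data jed ON je.id = jed.entry_id
GROUP BY je.id, je.title
ORDER BY je.id
"""

inner_rows = conn.execute(PIVOT.format(join_type="INNER")).fetchall()
left_rows = conn.execute(PIVOT.format(join_type="LEFT")).fetchall()

print(inner_rows)  # Ann is missing
print(left_rows)   # Ann is present, with NULL (None) in the pivoted columns
```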
You can write a simple SELECT query like this:
SELECT je.id, je.title, je.longitude, je.latitude,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 1 AND entry_id = je.id) AS websiteurl,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 2 AND entry_id = je.id) AS occupation
FROM joomgalaxy_entries je;
First step is easy:
SELECT JE.id, JE.title, JE.longitude, JE.latitude
FROM joomgalaxy_entries JE
Now you need to JOIN:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
JD.*
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
Now you need to convert rows to columns:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
MIN(CASE WHEN fieldid = 1 THEN JD.field_value END) as WebsiteUrl,
MIN(CASE WHEN fieldid = 2 THEN JD.field_value END) as Occupation
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
GROUP BY JE.id, JE.title, JE.longitude, JE.latitude
This depends on each entry having only these two fields; if the number of fields is dynamic, you would need a different approach.
This should work:
select id, title, longitude, latitude,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'websiteurl')
and jed.entry_id = je.id
) as websiteurl,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'occupation')
and jed.entry_id = je.id) as occupation
from joomgalaxy_entries je;
Note that the reason to have a left join would be if either websiteurl or occupation were null, however, this solution should work in that case anyway.
Well, that certainly makes it a bit more difficult... :) Honestly, I'm not sure what you're asking is possible with a static sql query. I'm sure someone will speak up, however, if I'm wrong.
That said, I do have a few options you can try:
Option 1 - Generate the SQL Dynamically
Assuming this is mysql, if you execute the following SQL, it will generate the subqueries dynamically:
select concat('(select field_value from joomgalaxy_entries_data jed ',
'where fieldid = (select id from joomgalaxy_fields ',
'where field_name = ''', field_name, ''') ',
'and jed.entry_id = je.id) as ', field_name, ',')
from joomgalaxy_fields;
Take the result of that command, copy-paste it into a text editor and add the following at the beginning:
select id, title, longitude, latitude,
And the rest of this at the end:
from joomgalaxy_entries je;
Then run your new uber-query and go grab a cup of coffee, lunch, or a good night's sleep depending on how much data is in your database.
Alternatively, you could add all of this to a stored procedure so you don't have to hand edit the SQL. Also, note that my syntax works for MySQL. Other databases have different concatenation operators so you may have to work around that if applicable. Also, with 50+ subqueries there is a good chance this uber-query will be quite slow, maybe too slow to make this option viable.
Option 2 - Create a table structured the way you want, and populate it
Hopefully, this is self-explanatory, but just create a new table with all of the necessary columns from the joomgalaxy_fields table. Then populate each column separately with a long series of what should be pretty straightforward sql commands. Granted this option is only viable if the database is no longer in use which I believe you indicated. From there the result is just:
select * from my_new_table;
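If hand-editing the generated SQL from Option 1 gets tedious, the same idea can be scripted. The sketch below is one possible way to do it, not the approach above verbatim: it reads joomgalaxy_fields and builds one MAX(CASE ...) column per field instead of one correlated subquery per field. It runs against an in-memory SQLite copy of the sample data, and the helper quote_identifier is a made-up name for this example. In MySQL the aliases would be quoted with backticks rather than double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joomgalaxy_entries (id INTEGER, title TEXT);
CREATE TABLE joomgalaxy_fields (id INTEGER, field_type TEXT, field_name TEXT);
CREATE TABLE joomgalaxy_entries_data (fieldid INTEGER, field_value TEXT, entry_id INTEGER);
INSERT INTO joomgalaxy_entries VALUES (50, 'John'), (51, 'Bob');
INSERT INTO joomgalaxy_fields VALUES
    (1, 'textbox', 'websiteurl'),
    (2, 'dropdown', 'occupation');
INSERT INTO joomgalaxy_entries_data VALUES
    (1, 'google.com', 50), (2, 'unemployed', 50),
    (1, 'doctor.com', 51), (2, 'doctor', 51);
""")


def quote_identifier(name):
    """Quote a column alias for SQLite, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


# One pivot column per row in joomgalaxy_fields (assumes at least one field).
fields = conn.execute(
    "SELECT id, field_name FROM joomgalaxy_fields ORDER BY id"
).fetchall()
pivot_columns = ",\n       ".join(
    f"MAX(CASE WHEN jed.fieldid = {int(field_id)} THEN jed.field_value END)"
    f" AS {quote_identifier(field_name)}"
    for field_id, field_name in fields
)

sql = f"""
SELECT je.id AS id, je.title AS title,
       {pivot_columns}
FROM joomgalaxy_entries je
LEFT JOIN joomgalaxy_entries_data jed ON je.id = jed.entry_id
GROUP BY je.id, je.title
ORDER BY je.id
"""

cursor = conn.execute(sql)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(header)
for row in rows:
    print(row)
```

The column list is rebuilt on every run, so new rows in joomgalaxy_fields show up as new columns without editing the query by hand.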

SQL GROUP BY Issue on GROUP BY

I've written a query that builds a small table of information from a couple of data sources. It uses a self-made table to look up the vehicle model for the final GROUP BY, which is how the data needs to be viewed. However, when I group by Vehicle, the subquery column (Qty) misses some figures: grouped by Prefix it shows the correct numbers, but grouped by Vehicle some of the data is dropped.
The Prefix can relate to a couple of similar vehicle models, hence the need to group by Vehicle. Can anyone easily see what I've done wrong in the SQL query below, please?
SELECT Vehicle, COUNT(`Chassis-No`) AS Stock,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '182' THEN 1 ELSE 0 END),1) AS Perc6Months,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '365' THEN 1 ELSE 0 END),1) AS Perc12Months,
(SELECT COUNT(VIN_Prefix) FROM Orderdownload
INNER JOIN VehicleMatrix ON (`VIN_Prefix` LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,2)) OR (`VIN_Prefix` NOT LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,1)) WHERE DealerCode = 'AA12345' AND `VIN_Prefix` = IF(LEFT(`Chassis-No`,1)='S',LEFT(`Chassis-No`,2),LEFT(`Chassis-No`,1))) As Qty
FROM DealerAgedStock
INNER JOIN VehicleMatrix AS VM
ON (`Chassis-No` LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,2)) OR (`Chassis-No` NOT LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,1))
WHERE `DL Dealer Code` = 'AA12345'
GROUP BY Vehicle
Grouped on Vehicle I get the following:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 2
Grouped on Prefix I get the following:
VIN_Prefix | Perc6Months | Perc12Months | Qty
S1 | 25 | 0 | 2
S2 | 50 | 0 | 2
Ideally it should look like this:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 4
Where S1 and S2 are relative to the Vehicle Mondeo, thus it gives me the first instance of subquery rather than adding them together.
My question is: why does the Group By not add the figures together properly from the subquery? I need it to add them to have the correct figures...

Totalizing and grouping data from an SQL query

I have a query that will display a list of products in order of their import/export property.
select i.`itemid`, p.`prodid`,
(
(case when p.`desc` like "Import/Export"
then 100 else 0 end) +
(case when p.`desc` like "Export"
then 70 else 0 end) +
(case when p.`desc` like "Import"
then 50 else 0 end)
) as priority
from item i , product p
where (
p.`name` LIKE "cleaning agent"
and p.`prodid` = i.`itemid`
)
The query does fine in adding a "priority" value to each product but what I would like to ask is how do I group them by ID and total the priority based on the ID? I can group similar prodid rows with the Order by keyword, but then it just gives me a single value for the priority field.
What I want to achieve is to group all similar product id's and get a total of their priority value. I've used sum() in select statements before, but I'm at a loss at trying to figure out how to get the total of all priority fields because it is a query-generated column.
+--------+----------+
| prodid | priority |
+--------+----------+
| 225 | 50 |
| 225 | 20 |
+--------+----------+
should be
+--------+----------+
| prodid | priority |
+--------+----------+
| 225 | 70 |
+--------+----------+
Here is a sqlfiddle: http://sqlfiddle.com/#!2/cec136/5
You can do this by turning your query into an aggregation using group by:
select p.`prodid`,
sum(case when p.`desc` like 'Import/Export' then 100
when p.`desc` like 'Export' then 70
when p.`desc` like 'Import' then 50
else 20
end) as priority
from item i join
product p
on p.`prodid` = i.`prodid`
where p.`type` LIKE 'cleaning agent'
group by p.prodid;
Along the way, I fixed a few things:
The join is now explicit in the from clause, rather than implicit in the where clause.
Because i.prodid = p.prodid, there is no need to include both in the select.
I changed the case statement to cascade. Only one of the conditions can match, so there is no reason to add things together.
I changed all the string constants to use single quotes rather than double quotes.
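As a quick check of the aggregated query, here is a sketch against an in-memory SQLite database. The sample rows are invented for the demo (the contents of the sqlfiddle are not reproduced here): product 225 has one 'Import' row and one row with another description, so its total should be 50 + 20 = 70.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prodid INTEGER, "desc" TEXT, type TEXT);
CREATE TABLE item (itemid INTEGER, prodid INTEGER);
-- Hypothetical sample rows, chosen to give 225 two priority values.
INSERT INTO product VALUES
    (225, 'Import', 'cleaning agent'),
    (225, 'Domestic', 'cleaning agent'),
    (300, 'Import/Export', 'cleaning agent'),
    (400, 'Export', 'solvent');
INSERT INTO item VALUES (1, 225), (2, 300), (3, 400);
""")

# Cascading CASE inside SUM: each row contributes exactly one value,
# and GROUP BY adds the values up per product id.
rows = conn.execute("""
SELECT p.prodid,
       SUM(CASE WHEN p."desc" LIKE 'Import/Export' THEN 100
                WHEN p."desc" LIKE 'Export' THEN 70
                WHEN p."desc" LIKE 'Import' THEN 50
                ELSE 20
           END) AS priority
FROM item i
JOIN product p ON p.prodid = i.prodid
WHERE p.type LIKE 'cleaning agent'
GROUP BY p.prodid
ORDER BY p.prodid
""").fetchall()

print(rows)
```

Product 400 is filtered out by the WHERE clause before the grouping happens, so it never contributes to any total.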

Counting partial values or conditional values

I hope someone can lend an assist and some advice here. I'm trying to get a fairly complex result and I'm not sure if I can do it as one query with subqueries, a union, or simply separate queries to be merged into Excel after the fact.
I'm working with a legacy database from my predecessor with the following tables:
Business (columns working with: id, sector, state)
Forms (columns working with: Submitted (Y/N), id, business_id)
Inventory (Columns working with: In_stock (Y/N), id, form_id)
I'm trying to get a final result that looks like this:
| SubmittedForms | Unsubmitted Forms | Sector | State |
|-----------------------------------------------------|
| 10 | 5 | Agr | UT |
| 0 | 7 | Chem | MT |
| 2 | 1 | Bio | OK |
| 13 | 0 | Chem | NM |
The main problem I'm getting is that while submitted forms doesn't need any further arguments and is a simple count, the unsubmitted forms are dependent on the Inventory.in_stock='Y'. Here's my query for the submitted forms:
SELECT COUNT(Forms.id) AS Submitted, Business.sector, Business.state
FROM Forms
JOIN Business ON Forms.business_id=Business.id
WHERE Forms.submitted='Y'
GROUP BY Business.state, Business.sector
Unfortunately, I can't seem to get the unsubmitted forms number to calculate correctly. It just returns the total count of rows where in_stock is Y for that sector.
If it's easier to run a separate query for Submitted and Unsubmitted that's fine for the end result but I need some help getting the correct count of Unsubmitted forms with the in_stock flagged as Y. Also, I attempted to use a COUNT DISTINCT but takes way too long, was still running after 10 minutes. Another complication I can envision in a single query option is the possibility of 0/null values in either Submitted or Unsubmitted forms
Any help is greatly appreciated!
One option:
SELECT COUNT(CASE WHEN Forms.submitted = 'Y' THEN 1 END) SubmittedForms,
COUNT
( CASE WHEN Forms.submitted = 'N'
AND EXISTS ( SELECT 1
FROM Inventory
WHERE form_id = Forms.id
AND in_stock = 'Y'
)
THEN 1
END
) UnsubmittedForms,
Business.sector Sector,
Business.state State
FROM Forms
RIGHT
OUTER
JOIN Business
ON Forms.business_id = Business.id
GROUP
BY Business.sector,
Business.state
;
Another option, which might perform better:
SELECT COUNT(CASE WHEN Forms.submitted = 'Y' THEN 1 END) SubmittedForms,
COUNT(CASE WHEN Forms.submitted = 'N' THEN 1 END) UnsubmittedForms,
Business.sector Sector,
Business.state State
FROM ( SELECT *
FROM Forms
WHERE submitted = 'Y'
OR id IN ( SELECT DISTINCT form_id
FROM Inventory
WHERE in_stock = 'Y'
)
) Forms
RIGHT
OUTER
JOIN Business
ON Forms.business_id = Business.id
GROUP
BY Business.sector,
Business.state
;
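The reason for EXISTS rather than a plain join to Inventory is that a form with several in-stock rows would otherwise be counted once per inventory row. Here is a small sketch of that effect on an in-memory SQLite database. All sample rows are made up, and the query is written as Business LEFT JOIN Forms, which is equivalent to the RIGHT OUTER JOIN above and also runs on SQLite versions that lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Business (id INTEGER, sector TEXT, state TEXT);
CREATE TABLE Forms (id INTEGER, submitted TEXT, business_id INTEGER);
CREATE TABLE Inventory (id INTEGER, in_stock TEXT, form_id INTEGER);
-- Hypothetical sample data.
INSERT INTO Business VALUES (1, 'Agr', 'UT'), (2, 'Chem', 'MT');
INSERT INTO Forms VALUES (10, 'Y', 1), (11, 'N', 1), (12, 'N', 1);
INSERT INTO Inventory VALUES
    (100, 'Y', 11), (101, 'Y', 11),  -- two in-stock rows for the same form
    (102, 'N', 12);                  -- form 12 has nothing in stock
""")

# Naive join: unsubmitted form 11 is counted once per in-stock inventory row.
naive_unsubmitted = conn.execute("""
SELECT COUNT(*)
FROM Forms f
JOIN Inventory i ON i.form_id = f.id
WHERE f.submitted = 'N' AND i.in_stock = 'Y'
""").fetchone()[0]

# Conditional counts with EXISTS: each form is counted at most once, and
# businesses without any forms still appear with zeros.
report = conn.execute("""
SELECT COUNT(CASE WHEN f.submitted = 'Y' THEN 1 END) AS SubmittedForms,
       COUNT(CASE WHEN f.submitted = 'N'
                   AND EXISTS (SELECT 1 FROM Inventory i
                               WHERE i.form_id = f.id AND i.in_stock = 'Y')
                  THEN 1 END) AS UnsubmittedForms,
       b.sector, b.state
FROM Business b
LEFT JOIN Forms f ON f.business_id = b.id
GROUP BY b.sector, b.state
ORDER BY b.sector
""").fetchall()

print(naive_unsubmitted)
print(report)
```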

sql query help on multiple count columns and group by

i have the following table Students:
id | status | school | name
----------------------------
0 | fail | skool1 | dan
1 | fail | skool1 | steve
2 | pass | skool2 | joe
3 | fail | skool2 | aaron
i want a result that gives me
school | fail | pass
---------------------
skool1 | 2 | 0
skool2 | 1 | 1
I have this but it's slow,
SELECT s.school, (
SELECT COUNT( * )
FROM school
WHERE name = s.name
AND status = 'fail'
) AS fail, (
SELECT COUNT( * )
FROM school
WHERE name = s.name
AND status = 'pass'
) AS pass,
FROM Students s
GROUP BY s.school
suggestions?
Something like this should work:
SELECT
school,
SUM(CASE WHEN status = 'fail' THEN 1 ELSE 0 END) as `fail`,
SUM(CASE WHEN status = 'pass' THEN 1 ELSE 0 END) as `pass`
FROM Students
GROUP BY school
ORDER BY school
EDIT
Almost forgot, but you could also write the query this way:
SELECT
school,
COUNT(CASE WHEN status = 'fail' THEN 1 END) as `fail`,
COUNT(CASE WHEN status = 'pass' THEN 1 END) as `pass`
FROM Students
GROUP BY school
ORDER BY school
I'm not sure if there's any performance benefit with the second query. My guess is that if there is one, it's probably very small. I tend to use the first query because I think it's clearer, but both should work. Also, I don't have a MySQL instance handy to test with, but according to @Johan the ORDER BY clauses are unnecessary.
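For anyone who wants to confirm that the SUM and COUNT variants agree, here is a minimal sketch using an in-memory SQLite table in place of MySQL. Only the sample rows from the question are used, and the ORDER BY is kept explicit so the row order does not depend on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, status TEXT, school TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        (0, "fail", "skool1", "dan"),
        (1, "fail", "skool1", "steve"),
        (2, "pass", "skool2", "joe"),
        (3, "fail", "skool2", "aaron"),
    ],
)

# SUM adds 1 for each matching row and 0 otherwise.
with_sum = conn.execute("""
SELECT school,
       SUM(CASE WHEN status = 'fail' THEN 1 ELSE 0 END) AS fail,
       SUM(CASE WHEN status = 'pass' THEN 1 ELSE 0 END) AS pass
FROM Students
GROUP BY school
ORDER BY school
""").fetchall()

# COUNT skips NULLs, so a CASE without ELSE counts only the matching rows.
with_count = conn.execute("""
SELECT school,
       COUNT(CASE WHEN status = 'fail' THEN 1 END) AS fail,
       COUNT(CASE WHEN status = 'pass' THEN 1 END) AS pass
FROM Students
GROUP BY school
ORDER BY school
""").fetchall()

print(with_sum)
print(with_count)
```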
SELECT q.school, q.fail, q.failpass-q.fail as pass
FROM
(
SELECT s.school, sum(if(status = 'fail',1,0)) as fail, count(*) as failpass
FROM students s
GROUP BY s.school
) q
This way you save one conditional sum.
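In MySQL 5.7 and earlier, GROUP BY also sorts the results implicitly, so a separate ORDER BY is not needed there. MySQL 8.0 removed that implicit sorting, so add an explicit ORDER BY if the order matters.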