Regarding MySQL queries, how can I get all table entries that belong to combinations of entries in another table?
Background: I would like to count orders, which consist of order items. Each order item has a state like 'canceled' or 'delivered'. There are partial deliveries, so there can be both delivered and canceled items in one order. I would like to count the net orders, and I would like to know how many orders have items with more than one item status.
Order Number | Order Item | Status
X0001 | Item | delivered
X0001 | Item | delivered
X0002 | Item | delivered
X0002 | Item | canceled
X0002 | Item | delivered
X0003 | Item | delivered
I have 3 valid orders here and one order with delivered/canceled. How can I ask for all orders which have at least one delivered and one canceled item?
As I am very new to MySQL I am basically looking for the right approach. Do I need subqueries or joins for this?
Edit: First of all, sorry for the late reply. And sorry again because my question was obviously misleading.
There are three tables involved: 'order', 'order_item' and 'order_item_status'.
'order' and 'order_item' are linked through 'order_item.fk_order' and 'order.id_order'.
'order_item' and 'order_item_status' are linked through 'order_item.fk_order_item_status' and 'order_item_status.id_order_item_status'.
You have been very helpful so far but I am still a bit stuck as I do not know how to finally count by combination.
The perfect result would be something like that:
{shipped} | 34
{canceled} | 12
{shipped, canceled} | 8
{closed, canceled} | 4
{closed} | 27
... | ...
But I don't know how to deal with combinations in a query. Maybe you have some more helpful hints for me ...
Thank you very much.
I assume you have two tables:
orders (the list of your orders)
order_items (the list of items in each order)
The two tables are related by the field fk_order.
Try this:
SELECT o.id, (select count(*) from order_items i where i.status = 'delivered' and i.fk_order = o.id) as item_delivered, (select count(distinct i.status) from order_items i where i.fk_order = o.id) as status_no
FROM orders o
I assume your data are all in one table like you described. Then you want to group your data by order number and then only get the orders with both canceled and delivered ?
The query should look something like this:
select order_number
from order_items
group by order_number
having sum(status = 'canceled') > 0 and sum(status = 'delivered') > 0
By joining the table to itself, you can select rows that have different status values for the same order. Then count the distinct order numbers.
SELECT COUNT(DISTINCT orig.Number)
FROM order_items AS orig
INNER JOIN order_items AS diff ON diff.Number = orig.Number AND diff.Status != orig.Status
This is quicker than using sub-queries.
Edit: I forgot to mention that it's also a more scalable implementation, as you can add more order statuses without this failing.
I don't understand why you are talking about "combinations of entries in another table".
According to the rest of your question, you have one single table with at least 3 attributes: Order number, Order item, Status. Is that correct, or are some of those attributes in another table? In that case, can you elaborate?
Otherwise I would try to use subqueries and do something like:
SELECT T1.* FROM order_items T1 WHERE T1.status = 'canceled' AND T1.order_number IN (SELECT T2.order_number FROM order_items T2 WHERE T2.status = 'delivered')
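For the combination counts asked for in the question's edit, one possible approach is to fetch the distinct (order, status) pairs and group them per order. The sketch below uses Python with an in-memory SQLite database purely as a runnable stand-in for MySQL. The flat order_items table and its column names are assumptions taken from the sample data, not the real three-table schema.

```python
import sqlite3
from collections import Counter, defaultdict

# In-memory SQLite stands in for MySQL here; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_number TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?)",
    [
        ("X0001", "delivered"), ("X0001", "delivered"),
        ("X0002", "delivered"), ("X0002", "canceled"), ("X0002", "delivered"),
        ("X0003", "delivered"),
    ],
)

# One row per (order, status) pair, duplicates removed by the database.
pairs = conn.execute(
    "SELECT DISTINCT order_number, status FROM order_items"
).fetchall()

# Collect the set of statuses seen for each order.
statuses_by_order = defaultdict(set)
for order_number, status in pairs:
    statuses_by_order[order_number].add(status)

# Count how many orders share each status combination.
combo_counts = Counter(frozenset(s) for s in statuses_by_order.values())

for combo, n in combo_counts.items():
    print("{" + ", ".join(sorted(combo)) + "}", "|", n)

# In MySQL the combination label could instead be built inside the query with
# GROUP_CONCAT(DISTINCT status ORDER BY status) per order, then grouped again.
```

With the real schema, the SELECT would join order_item to order_item_status first; the grouping step stays the same.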
Related
I have very limited experience with MySQL past standard queries, but when it comes to joins and relations between multiple tables I have a bit of an issue.
I've been tasked with creating a job that will pull a few values from a mysql database every 15 minutes but the info it needs to display is pulled from multiple tables.
I have worked with it for a while to figure out the relationships between everything for the phone system and I have discovered how I need to pull everything out but I'm trying to find the right way to create the job to do the joins.
I'm thinking of creating a new table for the info I need, with columns named as:
Extension | Total Talk Time | Total Calls | Outbound Calls | Inbound Calls | Missed Calls
I know that I need to start with the extension ID from my 'user' table and match it with 'extensionID' in my 'callSession'. There may be multiple instances of each extensionID but each instance creates a new 'UniqueCallID'.
The 'UniqueCallID' field then matches to 'UniqueCallID' in my 'CallSum' table. At that point, I just need to be able to say "For each 'uniqueCallID' that is associated with the same 'extensionID', get the sum of all instances in each column or a count of those instances".
Here is an example of what I need it to do:
callSession table
UniqueCallID | extensionID
A | 123
B | 123
C | 123
callSum table
UniqueCallID | Duration | Answered
A | 10 | 1
B | 5 | 1
C | 15 | 0
newReport table
Extension | Total Talk Time | Total Calls | Missed Calls
123 | 30 | 3 | 1
Hopefully that conveys my idea properly.
If I create a table to hold these values, I need to know how I would select, join and insert those things based on that diagram but I'm unable to construct the right query/statement.
You simply JOIN the two tables, and do a group by on the extensionID. Also, add formulas to summarize and gather the info.
SELECT
`extensionID` AS `Extension`,
SUM(`Duration`) AS `Total Talk Time`,
COUNT(DISTINCT a.`UniqueCallID`) AS `Total Calls`,
SUM(IF(`Answered` = 1,0,1)) AS `Missed Calls`
FROM `callSession` a
JOIN `callSum` b
ON a.`UniqueCallID` = b.`UniqueCallID`
GROUP BY a.`extensionID`
ORDER BY a.`extensionID`
You can use a join and group by
select
a.extensionID
, sum(b.Duration) as Total_Talk_Time
, count(b.Answered) as Total_Calls
, count(b.Answered) -sum(b.Answered) as Missed_calls
from callSession as a
inner join callSum as b on a.UniqueCallID = b.UniqueCallID
group by a.extensionID
This should do the trick. What you are being asked to do is to aggregate the number of and duration of calls. Unless explicitly requested, you do not need to create a new table to do this. The right combination of JOINs and AGGREGATEs will get the information you need. This should be pretty straightforward... the only semi-interesting part is calculating the number of missed calls, which is accomplished here using a "CASE" statement as a conditional check on whether each call was answered or not.
Pardon my syntax... My experience is with SQL Server.
SELECT CS.extensionID AS `Extension`, SUM(CA.Duration) AS `Total Talk Time`, COUNT(CS.UniqueCallID) AS `Total Calls`, SUM(CASE WHEN CA.Answered = 0 THEN 1 ELSE 0 END) AS `Missed Calls`
FROM callSession CS
INNER JOIN callSum CA ON CA.UniqueCallID = CS.UniqueCallID
GROUP BY CS.extensionID
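If a stored report table really is wanted, the join and aggregation can feed an INSERT ... SELECT directly. Here is a minimal sketch against the question's sample data, again using Python's built-in sqlite3 as a stand-in for MySQL. The newReport column names are assumptions, since the question only shows display headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE callSession (UniqueCallID TEXT, extensionID INTEGER);
    CREATE TABLE callSum (UniqueCallID TEXT, Duration INTEGER, Answered INTEGER);
    -- Assumed column names for the report table
    CREATE TABLE newReport (
        Extension INTEGER, TotalTalkTime INTEGER,
        TotalCalls INTEGER, MissedCalls INTEGER
    );
    INSERT INTO callSession VALUES ('A', 123), ('B', 123), ('C', 123);
    INSERT INTO callSum VALUES ('A', 10, 1), ('B', 5, 1), ('C', 15, 0);
""")

# Join the two tables on the call ID, aggregate per extension,
# and write the aggregated rows straight into the report table.
conn.execute("""
    INSERT INTO newReport (Extension, TotalTalkTime, TotalCalls, MissedCalls)
    SELECT cs.extensionID,
           SUM(ca.Duration),
           COUNT(*),
           SUM(CASE WHEN ca.Answered = 0 THEN 1 ELSE 0 END)
    FROM callSession AS cs
    JOIN callSum AS ca ON ca.UniqueCallID = cs.UniqueCallID
    GROUP BY cs.extensionID
""")

report = conn.execute("SELECT * FROM newReport ORDER BY Extension").fetchall()
print(report)
```

For a job that runs every 15 minutes, the report table would probably need to be emptied or upserted before each insert; that part is left out here.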
I have the following situation. I have a table with all the information about my articles. I would like to compare a column with itself, because I have multiple types of article: single products and master products. The only way I have to tell them apart is by SKU. For example:
ID | SKU
1 | 11111
2 | 11112
3 | 11113
4 | 11113-5
5 | 11113-8
6 | 11114
7 | 11115
8 | 11115-1-W
9 | 11115-2
10 | 11116
I only want to list and/or count the SKUs that are fully unique. Following the example, the SKUs that are unique and have no variants are ID = 1, 2, 6 and 10. I want to create a query where, if 11113 appears again in the column, it is not counted. So in total I would have 4 unique SKUs and not 6. Please let me know if this is possible.
Assuming the length of master SKUs is 5 characters, try this:
select a.*
from mytable a
left join mytable b on b.sku like concat(a.sku, '%') and b.id <> a.id
where length(a.sku) = 5
and b.sku is null
This query joins master SKUs to child ones, but filters out successful joins - leaving only solitary master SKUs.
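An anti-join like this can be checked against the question's sample data. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL and makes one assumption that is not in the answer: variants are recognised by a hyphen suffix rather than by a fixed length of 5. Matching on the master SKU followed by a hyphen also keeps a row from matching itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, sku TEXT)")
skus = ["11111", "11112", "11113", "11113-5", "11113-8",
        "11114", "11115", "11115-1-W", "11115-2", "11116"]
conn.executemany("INSERT INTO mytable (id, sku) VALUES (?, ?)",
                 list(enumerate(skus, start=1)))

# Left-join every SKU to its hyphenated variants, then keep only
# un-hyphenated SKUs for which no variant row was found.
# SQLite concatenates with ||; in MySQL this would be CONCAT(a.sku, '-%').
rows = conn.execute("""
    SELECT a.id, a.sku
    FROM mytable AS a
    LEFT JOIN mytable AS b
           ON b.sku LIKE a.sku || '-%'
    WHERE a.sku NOT LIKE '%-%'
      AND b.id IS NULL
    ORDER BY a.id
""").fetchall()

print(rows)
print("unique SKUs:", len(rows))
```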
You can do this by grouping and counting the unique rows.
First, we will need to take your table and add a new column, MasterSKU. This will be the first five characters of the SKU column. Once we have the MasterSKU, we can then GROUP BY it. This will bundle together all of the rows having the same MasterSKU. Once we are grouping we get access to aggregate functions like COUNT(). We will use that function to count the number of rows for each MasterSKU. Then, we will filter out any rows that have a COUNT() over 1. That will leave you with only the unique rows remaining.
Take that unique list and LEFT JOIN it back into your original table to grab the IDs.
SELECT B.ID, A.MasterSKU
FROM (
SELECT
SUBSTRING(SKU,1,5) AS MasterSKU,
COUNT(*) AS MasterSKUCount
FROM MyTable
GROUP BY SUBSTRING(SKU,1,5)
HAVING COUNT(*) = 1
) AS A
LEFT JOIN (
SELECT
ID,
SUBSTRING(SKU,1,5) AS MasterSKU
FROM MyTable
) AS B
ON A.MasterSKU = B.MasterSKU
Now one thing I noticed from your example. The original SKU column really looks like three columns in one. We have multiple values being joined with hyphens.
11115-1-W
There may be a reason for it, but most likely this violates first normal form and will make the database hard to query. It's part of the reason why such a complicated query is needed. If the SKU column really represents multiple things then we may want to consider breaking it out into MasterSKU, Version, and Color or whatever each hyphen represents.
I have a SELECT query that returns some fields like this:
Date | Campaign_Name | Type | Count_People
Oct | Cats | 1 | 500
Oct | Cats | 2 | 50
Oct | Dogs | 1 | 80
Oct | Dogs | 2 | 50
The query uses aggregation, and I only want to include results where, for Type = 1, the corresponding Count_People is greater than 99.
Using the example table, I'd like two rows returned: both Cats rows. The Dogs type 1 row is excluded because it's below 100, and in that case the Dogs type 2 row should be excluded as well.
Put another way, if type = 1 is less than 100 then remove all records of the corresponding campaign name.
I started out trying this:
HAVING CASE WHEN type = 1 THEN COUNT(DISTINCT Count_People) > 99 END
I used Teradata earlier in the year and remember working on a query that used an analytic function "Qualify PartitionBy". I suspect something along those lines is what I need? I need to base the exclusion on aggregation before the query is run?
How would I do this in MySQL? Am I making sense?
Now that I understand the question, I think your best bet will be a subquery to determine which date/campaign combinations of a type=1 have a count_people greater than 99.
SELECT
<table>.date,
<table>.campaign_name,
<table>.type,
count(distinct count_people) as count_people
FROM
(
SELECT
date,
campaign_name
FROM
<table>
WHERE type=1
GROUP BY 1,2
HAVING count(distinct count_people) > 99
) type1
LEFT OUTER JOIN <table> ON
type1.campaign_name = <table>.campaign_name AND
type1.date = <table>.date
WHERE <table>.type IN (1,2)
GROUP BY 1,2,3
The subquery here only returns campaign/date combinations where type=1 AND the count_people is greater than 99. It uses a LEFT JOIN back to the table to ensure that only those campaign/date combinations make it into the result set.
The WHERE on the main query keeps the results to only Types 1 and 2, which you stated was already a filter in place (though not mentioned in the question, it was stated in a comment to a previous answer).
Based on your comments on the answer by @JNevill, I think you will have no option but to use subselects to pre-filter the record set you are dealing with. Working with HAVING is going to limit you to the current record being evaluated; there is no way to compare against previous or subsequent records in the set in this manner.
So have a look at something like this:
SELECT
full_data.date AS date,
full_data.campaign_name AS campaign_name,
full_data.type AS type,
full_data.people_count AS people_count
FROM
(
SELECT
date,
campaign_name,
type,
COUNT(people) AS people_count
FROM table
WHERE type IN (1,2)
GROUP BY date, campaign_name, type
) AS full_data
LEFT JOIN
(
SELECT
date,
campaign_name,
COUNT(people) AS people_count
FROM table
WHERE type = 1
GROUP BY date, campaign_name
HAVING people_count < 100
) AS filter
ON
full_data.date = filter.date
AND full_data.campaign_name = filter.campaign_name
WHERE
filter.date IS NULL
AND filter.campaign_name IS NULL
The first subselect is basically your current query without any attempt at using HAVING to filter out results. The second subselect finds all date/campaign name combos whose type 1 people_count is below 100. The LEFT JOIN combined with the IS NULL checks then removes those combos from the full data set.
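The exclusion pattern (left join to the "bad" combinations, keep only rows with no match) can be tried on the four rows from the question. In this sketch the summary table is an assumption: it stands in for the already aggregated result of the asker's query. Python's built-in sqlite3 is used only so the example runs without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "summary" stands in for the aggregated result of the original query.
conn.execute("""
    CREATE TABLE summary (
        month TEXT, campaign_name TEXT, type INTEGER, count_people INTEGER
    )
""")
conn.executemany("INSERT INTO summary VALUES (?, ?, ?, ?)", [
    ("Oct", "Cats", 1, 500), ("Oct", "Cats", 2, 50),
    ("Oct", "Dogs", 1, 80), ("Oct", "Dogs", 2, 50),
])

# The derived table lists month/campaign combos whose type 1 count is
# below 100; rows that find a match there are dropped by the IS NULL check.
rows = conn.execute("""
    SELECT s.month, s.campaign_name, s.type, s.count_people
    FROM summary AS s
    LEFT JOIN (
        SELECT month, campaign_name
        FROM summary
        WHERE type = 1 AND count_people < 100
    ) AS small
      ON small.month = s.month
     AND small.campaign_name = s.campaign_name
    WHERE small.campaign_name IS NULL
    ORDER BY s.campaign_name, s.type
""").fetchall()

for row in rows:
    print(" | ".join(str(v) for v in row))
```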
I think it will be easiest to start with the table I have and the result I am aiming for.
Name | Date
A | 03/01/2012
A | 03/01/2012
B | 02/01/2012
A | 02/01/2012
B | 02/01/2012
A | 02/01/2012
B | 01/01/2012
B | 01/01/2012
A | 01/01/2012
I want the result of my query to be:
Name | 01/01/2012 | 02/01/2012 | 03/01/2012
A | 1 | 2 | 2
B | 2 | 2 | 0
So basically I want to count the number of rows that have the same date, but for each individual name. So a simple group by of dates won't do because it would merge the names together. And then I want to output a table that shows the counts for each individual date using php.
I've seen answers suggest something like this:
SELECT
NAME,
SUM(CASE WHEN GRADE = 1 THEN 1 ELSE 0 END) AS GRADE1,
SUM(CASE WHEN GRADE = 2 THEN 1 ELSE 0 END) AS GRADE2,
SUM(CASE WHEN GRADE = 3 THEN 1 ELSE 0 END) AS GRADE3
FROM Rodzaj
GROUP BY NAME
so I imagine there would be a way for me to tweak that but I was wondering if there is another way, or is that the most efficient?
I was perhaps thinking the while loop could output just one specific name and date each time along with the count, so the first result would be A,01/01/2012,1 then the next A,02/01/2012,2 - A,03/01/2012,2 - B,01/01/2012,2 etc. Perhaps that would be doable through a different technique, but I am not sure if something like that is possible or if it would be efficient.
So I'm basically looking to see if anyone has any ideas that are a bit outside the box for this and how they would compare.
I hope I explained everything well enough and thanks in advance for any help.
You have to include two columns in your GROUP BY:
SELECT name, date, COUNT(*) AS count
FROM your_table
GROUP BY name, date
This will get the counts of each name -> date combination in row-format. Since you also wanted to include a 0 count if the name didn't have any rows on a certain date, you can use:
SELECT a.name,
b.date,
COUNT(c.name) AS date_count
FROM (SELECT DISTINCT name FROM your_table) a
CROSS JOIN (SELECT DISTINCT date FROM your_table) b
LEFT JOIN your_table c ON a.name = c.name AND
b.date = c.date
GROUP BY a.name,
b.date
You're asking for a "pivot". Basically, it is what it is. The real problem with a pivot is that the column names must adapt to the data, which is impossible to do with SQL alone.
Here's how you do it:
SELECT
Name,
SUM(`Date` = '01/01/2012') AS `01/01/2012`,
SUM(`Date` = '02/01/2012') AS `02/01/2012`,
SUM(`Date` = '03/01/2012') AS `03/01/2012`
FROM mytable
GROUP BY Name
Note the cool way you can SUM() a condition in MySQL, because in MySQL true is 1 and false is 0, so summing a condition is equivalent to counting the number of times it's true.
It is not more efficient to use an inner group by first.
Just in case anyone is interested in what was the best method:
Zane's second suggestion was the slowest, I loaded in a third of the data I did for the other two and it took quite a while. Perhaps on smaller tables it would be more efficient, and although I am not working with a huge table roughly 28,000 rows was enough to create significant lag, with the between clause dropping the result to about 4000 rows.
Bohemian's answer gave me the least amount to code, I threw in a loop to create all the case statements and it worked with relative ease. The benefit of this method was the simplicity, besides creating the loop for the cases, the results come in without the need for any php tricks, just simple foreach to get all the columns. Recommended for those not confident with php.
However, I found Zane's first suggestion the quickest performing and despite the need for extra php coding it seems I will be sticking with this method. The disadvantage of this method is that it only gives the dates that actually have data, so creating a table with all the dates becomes a bit more complicated. What I did was create a variable that keeps track of what date it is supposed to be compared to the table column which is reset on each table row, when the result of the query is equal to that date it echoes the value otherwise it does a while loop echoing table cells with 0 until the dates do match. It also had to do a check to see if the 'Name' value is still the same and if not it would switch to the next row after filling in any missing cells with 0 to the end of that row. If anyone is interested in seeing the code you can message me.
Results of the two methods over 3 months of data (a column for each day, so roughly 90 case statements), ~12,000 rows out of 28,000:
Bohemian's Pivot - ~0.158s (highest seen ~0.36s)
Zane's Double Group by - ~0.086s (highest seen ~0.15s)
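The zero-filling step described above for the double group by method can be sketched as follows. This is an illustration in Python rather than the original PHP, with an in-memory SQLite table standing in for MySQL. It is not the poster's actual code, which was not shared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, date TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [
    ("A", "03/01/2012"), ("A", "03/01/2012"), ("B", "02/01/2012"),
    ("A", "02/01/2012"), ("B", "02/01/2012"), ("A", "02/01/2012"),
    ("B", "01/01/2012"), ("B", "01/01/2012"), ("A", "01/01/2012"),
])

# The database only does the cheap part: one count per (name, date) pair.
grouped = conn.execute(
    "SELECT name, date, COUNT(*) FROM your_table GROUP BY name, date"
).fetchall()

counts = {(name, date): n for name, date, n in grouped}
names = sorted({name for name, _ in counts})
# Sorting these strings only works because the sample dates share a month
# and year; real code should use a proper DATE column.
dates = sorted({date for _, date in counts})

# Application-side pivot: a missing (name, date) pair becomes 0.
pivot = {name: [counts.get((name, d), 0) for d in dates] for name in names}

print("Name", *dates, sep=" | ")
for name, row in pivot.items():
    print(name, *row, sep=" | ")
```

Looking counts up in a dictionary avoids tracking the "expected" date by hand while looping over the result rows.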
Using MySQL (and PHP), I am trying to hide from a select menu the categories that are not available for the current item selection.
Basically I'm stuck and can't think of a way to do it.
Asos uses this technique to browse through their products ( http://www.asos.com/Men/Jeans/Cat/pgecategory.aspx?cid=4208 ).
To be clear, the problem is not how to do the ajax stuff but how to efficiently use MySQL and PHP to make a system like this.
I've tried to figure out how to do this with nested sets:
(1) items (22)
    (2) products (7)
        (3) single (4)
        (5) bundle (6)
    (8) origin (21)
        (9) EU (14)
            (10) DU (11)
            (12) PL (13)
        (15) ASIA (20)
            (16) CN (17)
            (18) JP (19)
Menu:
type:
-single
-bundle
origin:
-EU
-ASIA
country:
-DU
-PL
-CN
-JP
Lets say there's one item in the db:
item '1' is in:
-single
-EU
-DU
Item category table:
itemID - cat_lft - cat_rgt
1 10 11
1 7 12
1 2 3
In the menu selecting DU should select:
DU-EU-SINGLE
And deselect the rest (return an array?), because there is only one item for now.
You could get the items and check whether the results are present in the remaining categories, but that sounds like a lot of queries...
Any advice where to go from here?
Here's how I would do it. Make sure your products are categorized with leaves only (i.e. right = left + 1). So, a product can be categorized as "DU," but not "origin" or "ASIA."
Category table:
category_id
name
left
right
Products table:
product_id
name
...
Categories_Products table:
category_id
product_id
SQL query to determine checked vs. unchecked categories:
SELECT
`categories`.`category_id`,
`categories`.`name`,
COUNT(DISTINCT `categories_products`.`product_id`) AS `count` FROM `categories`
LEFT JOIN `categories` as `children` ON
`children`.`left` >= `categories`.`left` AND
`children`.`right` <= `categories`.`right`
LEFT JOIN `categories_products` ON
`categories_products`.`category_id` = `children`.`category_id`
GROUP BY `categories`.`category_id`
If "count" is 0, leave it unchecked; otherwise, it's checked, and you know how many items are in that category. This is untested, so it might need some tweaking. You DEFINITELY want indexes on category_id, product_id, left, and right columns.
Addition: SQL query to select products in one or more categories (see below for compound conditions):
SELECT `products`.* FROM `categories`
JOIN `categories` as `children` ON
`children`.`left` >= `categories`.`left` AND
`children`.`right` <= `categories`.`right`
JOIN `categories_products` ON
`categories_products`.`category_id` = `children`.`category_id`
JOIN `products` ON
`products`.`product_id` = `categories_products`.`product_id`
WHERE `categories`.`category_id` IN (1,2,3,n,...)
GROUP BY `products`.`product_id`
Addition: Logic to select products in multiple categories, compound filtered (product must be in all categories to be a match):
You'll have to do separate joins for every category requirement you want. Please note that this type of query could get pretty slow, so do some benchmarking as you go. You might be better off with multiple queries (one for each category), combining the results in the application code.
Along those same lines, I'd recommend that something like "single vs. bundle" be an ENUM in your products table, and categorization is left to more abstract structures (region works, and so would a traditional consumer-facing, "logical" categorization structure) that aren't a simple either/or flag.
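As an alternative to one join per required category, the compound filter can also be written as a single grouped query: a product matches when the number of distinct selected categories it falls under equals the number of categories selected. This is a different technique from the separate joins suggested above, shown here only as a sketch. It uses Python's built-in sqlite3 in place of MySQL; the lft/rgt column names, the category ids, the products, and the products_in_all helper are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER
    );
    CREATE TABLE categories_products (category_id INTEGER, product_id INTEGER);
""")
# The nested set from the question, with made-up category ids.
conn.executemany("INSERT INTO categories VALUES (?, ?, ?, ?)", [
    (1, "items", 1, 22), (2, "products", 2, 7), (3, "single", 3, 4),
    (4, "bundle", 5, 6), (5, "origin", 8, 21), (6, "EU", 9, 14),
    (7, "DU", 10, 11), (8, "PL", 12, 13), (9, "ASIA", 15, 20),
    (10, "CN", 16, 17), (11, "JP", 18, 19),
])
# Hypothetical products, each filed under leaf categories only.
conn.executemany("INSERT INTO categories_products VALUES (?, ?)", [
    (3, 1), (7, 1),    # product 1: single, DU
    (4, 2), (10, 2),   # product 2: bundle, CN
    (3, 3), (11, 3),   # product 3: single, JP
])

def products_in_all(category_ids):
    """Return ids of products that sit in or below every given category."""
    placeholders = ", ".join("?" for _ in category_ids)
    sql = f"""
        SELECT cp.product_id
        FROM categories AS sel
        JOIN categories AS child
          ON child.lft >= sel.lft AND child.rgt <= sel.rgt
        JOIN categories_products AS cp
          ON cp.category_id = child.category_id
        WHERE sel.category_id IN ({placeholders})
        GROUP BY cp.product_id
        HAVING COUNT(DISTINCT sel.category_id) = ?
        ORDER BY cp.product_id
    """
    params = list(category_ids) + [len(category_ids)]
    return [row[0] for row in conn.execute(sql, params)]

print(products_in_all([3, 9]))  # single AND anything under ASIA
print(products_in_all([3, 6]))  # single AND anything under EU
```

Whether this beats separate joins or separate queries on real data would need benchmarking, as noted above.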