I'm using phpMyAdmin and I've two SQL tables:
___SalesTaxes
|--------|----------|------------|
| STX_Id | STX_Name | STX_Amount |
|--------|----------|------------|
| 1 | Tax 1 | 5.00 |
| 2 | Tax 2 | 13.50 |
|--------|----------|------------|
___Inventory
|--------|----------|----------|---------------------|
| INV_Id | INV_Name | INV_Rate | INV_ApplicableTaxes |
|--------|----------|----------|---------------------|
| 10 | Bike | 9.00 | 1 |
| 11 | Movie | 3.00 | 1,2 |
|--------|----------|----------|---------------------|
INV_ApplicableTaxes lists the applicable taxes.
For each item in the ___Inventory table, the ___SalesTaxes table is linked so that I know which taxes are applicable to the item.
How can I list items in ___Inventory and sum applicable taxes to have something like this:
Bike - Applicable sum of taxes is 5.00%
Movie - Applicable sum of taxes is 18.50%
What I already tried is:
SELECT
a.INV_ApplicableTaxes,
(
SELECT Count(b.STX_Amount)
FROM ___SalesTaxes
Where b.STX_Id = a.INV_ApplicableTaxes
) as b_count
FROM ___Inventory
Thanks.
You have a very poor data structure. You should not be storing 1,2 in a single column. Why not?
Integers should be stored as integers, not strings.
Foreign key relationships should be properly declared.
SQL has relatively poor string handling functions.
The SQL optimizer cannot optimize string operations very well, particularly between tables.
The proper way to store this would be using a separate itemTaxes table, with one row per item and applicable tax (so an item with two taxes gets two rows).
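As a minimal sketch of that normalized layout, the snippet below uses Python's built-in sqlite3 module purely as a convenient way to run the SQL. The junction table name ___ItemTaxes and its columns are assumptions made for illustration; the tax and item rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ___SalesTaxes (STX_Id INTEGER PRIMARY KEY, STX_Name TEXT, STX_Amount REAL);
CREATE TABLE ___Inventory  (INV_Id INTEGER PRIMARY KEY, INV_Name TEXT, INV_Rate REAL);
-- Hypothetical junction table: one row per (item, tax) pair.
CREATE TABLE ___ItemTaxes (
    INV_Id INTEGER NOT NULL REFERENCES ___Inventory(INV_Id),
    STX_Id INTEGER NOT NULL REFERENCES ___SalesTaxes(STX_Id),
    PRIMARY KEY (INV_Id, STX_Id)
);
INSERT INTO ___SalesTaxes VALUES (1, 'Tax 1', 5.00), (2, 'Tax 2', 13.50);
INSERT INTO ___Inventory  VALUES (10, 'Bike', 9.00), (11, 'Movie', 3.00);
INSERT INTO ___ItemTaxes  VALUES (10, 1), (11, 1), (11, 2);
""")

# With integer keys in a junction table, the sum is a plain join + GROUP BY.
rows = conn.execute("""
    SELECT i.INV_Name, SUM(st.STX_Amount) AS sum_taxes
    FROM ___Inventory i
    JOIN ___ItemTaxes it ON it.INV_Id = i.INV_Id
    JOIN ___SalesTaxes st ON st.STX_Id = it.STX_Id
    GROUP BY i.INV_Id, i.INV_Name
    ORDER BY i.INV_Id
""").fetchall()

for name, total in rows:
    print(f"{name} - Applicable sum of taxes is {total:.2f}%")
```

The join columns are integers that can be indexed and constrained by foreign keys, which is what the comma-separated string prevents.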
That said, sometimes we are stuck with other people's really bad design decisions and can't do anything about it (until performance is so bad that we have to).
You can do what you want using like:
SELECT i.INV_ApplicableTaxes,
(SELECT SUM(st.STX_Amount)
FROM ___SalesTaxes st
WHERE ',' || i.INV_ApplicableTaxes || ',' LIKE '%,' || st.STX_Id || ',%'
) as sum_taxes
FROM ___Inventory i;
Note: This uses ANSI standard syntax for string concatenation. Some databases have their own syntax.
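The delimiter-wrapped LIKE match can be tried out with the sketch below. It runs in SQLite through Python's sqlite3 module only because SQLite accepts || for concatenation; the 'Book' row with tax list '12' is a hypothetical addition that shows why both sides are wrapped in commas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ___SalesTaxes (STX_Id INTEGER PRIMARY KEY, STX_Name TEXT, STX_Amount REAL);
CREATE TABLE ___Inventory (
    INV_Id INTEGER PRIMARY KEY, INV_Name TEXT, INV_Rate REAL, INV_ApplicableTaxes TEXT
);
INSERT INTO ___SalesTaxes VALUES (1, 'Tax 1', 5.00), (2, 'Tax 2', 13.50);
INSERT INTO ___Inventory VALUES
    (10, 'Bike', 9.00, '1'),
    (11, 'Movie', 3.00, '1,2'),
    (12, 'Book', 4.00, '12');   -- hypothetical row: tax id 12 does not exist
""")

# The list becomes ',1,2,' and each tax id becomes the pattern '%,1,%',
# so id 1 cannot accidentally match inside '12'.
rows = conn.execute("""
    SELECT i.INV_Name,
           (SELECT SUM(st.STX_Amount)
            FROM ___SalesTaxes st
            WHERE (',' || i.INV_ApplicableTaxes || ',') LIKE ('%,' || st.STX_Id || ',%')
           ) AS sum_taxes
    FROM ___Inventory i
    ORDER BY i.INV_Id
""").fetchall()

print(rows)
```

The list column goes on the left of LIKE and the single id goes into the pattern; an item with no matching tax gets NULL (None in Python) as its sum.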
EDIT:
In MySQL, you can express this using find_in_set():
SELECT i.INV_ApplicableTaxes,
(SELECT SUM(st.STX_Amount)
FROM ___SalesTaxes st
WHERE FIND_IN_SET(st.STX_Id, i.INV_ApplicableTaxes) > 0
) as sum_taxes
FROM ___Inventory i;
Looking forward, this is quite a questionable design for a database, and sooner or later, the lack of 1NF normalization may very well cause you problems.
However, if changing the database schema is not an option, you could use the find_in_set function to help perform this join:
SELECT i.*,
(SELECT SUM(s.stx_amount)
FROM ___SalesTaxes s
WHERE FIND_IN_SET(s.stx_id, i.inv_applicableTaxes) > 0
) AS total_tax
FROM ___Inventory i
Please note that I'm an absolute n00b in MySQL but somehow I managed to build some (for me) complex queries that work as they should. My main problem now is that for many of the queries we're working on:
The query is becoming too big and very hard to see through.
The same subqueries get repeated many times and that is adding to the complexity (and probably to the time needed to process the query).
We want to further expand this query but we are reaching a point where we can no longer oversee what we are doing. I've added one of these subqueries at the end of this post, just as an example.
!! You can fast forward to the Problem section if you want to skip the details below. I think the question can also be answered without the additional info.
What we want to do
Create a MySQL query that calculates purchase orders and forecasts for a given supplier based on:
Sales history in a given period (past [x] months = interval)
Current stock
Items already in backorder (from supplier)
Reserved items (for customers)
Supplier ID
I've added an example of a subquery at the bottom of this message. We're showing just this part to keep things simple for now. The output of the subquery is:
Part number
Units sold
Units sold (outliers removed)
Units sold per month (outliers removed)
Number of invoices with the part number in the period (interval)
It works quite OK for us, although I'm sure it can be optimised. It removes outliers from the sales history (e.g. one customer that orders 50 pcs of one product in one order). Unfortunately it can only remove outliers when there is substantial data, so if the first order happens to be 50 pcs then it is not considered an outlier. For that reason we take the number of invoices into account in the main query. The number of invoices has to exceed a certain number, otherwise the system will revert to a fixed value of "maximum stock" for that product.
As mentioned, this is only a small part of the complete query and we want to expand it even further (so that it takes into account the "sales history" of parts that were used in assembled products).
For example, if we were to build and sell cars, and we want to place an order with our tyre supplier, the query calculates the number of tyres we need to order based on the sales history of the various car models (while also taking into account the stock of the cars, reserved cars and stock of the tyres).
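The outlier rule described above (keep only quantities below the average plus three standard deviations) can be sketched outside SQL as follows. The function name and the sample quantities are hypothetical; pstdev is used because MySQL's STDDEV() is the population standard deviation.

```python
from statistics import mean, pstdev


def remove_outliers(quantities, k=3):
    """Keep quantities strictly below mean + k * population stddev."""
    threshold = mean(quantities) + k * pstdev(quantities)
    kept = [q for q in quantities if q < threshold]
    return kept, threshold


# Hypothetical invoice quantities: 19 ordinary orders and one order of 50 pcs.
quantities = [2, 3, 4] * 6 + [3, 50]
kept, threshold = remove_outliers(quantities)

print(f"threshold = {threshold:.2f}, kept {len(kept)} of {len(quantities)} orders")
```

With only a handful of orders, a single large order raises both the mean and the standard deviation enough that it may stay below the threshold, which is in line with the limitation mentioned above.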
Problem
The query is becoming massive and incomprehensible. We are repeating the same subqueries many times, which to us seems highly inefficient, and it is the main reason why the query is becoming so bulky.
What we have tried
(Please note that we are on MySQL 5.5.33. We will update our server soon but for now we are limited to this version.)
Create a VIEW from the subqueries.
The main issue here is that we can't execute the view with parameters like supplier_id and interval period. Our subquery calculates the sum of the sold items for a given supplier within the given period. So even if we would build the VIEW so that it calculates this for ALL products from ALL suppliers we would still have the issue that we can't define the interval period after the VIEW has been executed.
A stored procedure.
Correct me if I'm wrong but as far as I know, MySQL only allows us to perform a Call on a stored procedure so we still can't run it against the parameters (period, supplier id...)
Even this workaround won't help us because we still can't run the SP against the parameters.
Using WITH at the beginning of the query
A common table expression in MySQL is a temporary result whose scope is confined to a single statement. You can refer to this expression multiple times within the statement.
The WITH clause in MySQL is used to specify a common table expression; a WITH clause can have one or more comma-separated subclauses.
Not sure if this would be the solution because we can't test it. WITH is not supported until MySQL version 8.0.
What now?
My last resort would be to put the mentioned subqueries in a temp table before starting the main query. This might not completely eliminate our problems but at least the main query will be more comprehensible and with less repetition of fetching the same data. Would this be our best option or have I overlooked a more efficient way to tackle this?
Many thanks for your kind replies.
SELECT
GREATEST((verkocht_sd/6*((100 + 0)/100)),0) as 'units sold p/month ',
GREATEST(ROUND((((verkocht_sd/6)*3)-voorraad+reserved-backorder),0),0) as 'Order based on units sold',
SUM(b.aantal) as 'Units sold in period',
t4.verkocht_sd as 'Units sold in period, outliers removed',
COUNT(*) as 'Number of invoices in period',
b.art_code as 'Part number'
FROM bongegs b -- Table that has all the sales records for all products
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code) -- Right Join stock data to also include items that are not in table bongegs (no sales history).
LEFT JOIN artcred ON (artcred.art_code = b.art_code) -- add supplier ID to the part numbers.
LEFT JOIN
(
SELECT
SUM(b.aantal) as verkocht_sd,
b.art_code
FROM bongegs b
RIGHT JOIN totvrd ON (totvrd.art_code = b.art_code)
LEFT JOIN artcred ON (artcred.art_code = b.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f" -- Selects only invoices
and artcred.vln = 1 -- 1 = Preferred supplier
and artcred.cred_nr = 9117 -- Supplier ID
and b.aantal < (select * from (SELECT AVG(b.aantal)+3*STDDEV(aantal)
FROM bongegs b
WHERE
b.bon_soort = 'f' and
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)) x)
GROUP BY b.art_code
) AS t4
ON (b.art_code = t4.art_code)
WHERE
b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
and b.bon_soort = "f"
and artcred.vln = 1
and artcred.cred_nr = 9117
GROUP BY b.art_code
Bongegs | all rows from sales forms (invoices F, offers O, delivery notes V)
| art_code | bon_datum | bon_soort | aantal |
|:---------|:---------: |:---------:|:------:|
| item_1 | 2021-08-21 | f | 6 |
| item_2 | 2021-08-29 | v | 3 |
| item_6 | 2021-09-03 | o | 2 |
| item_4 | 2021-10-21 | f | 6 |
| item_1 | 2021-11-21 | o | 6 |
| item_3 | 2022-01-17 | v | 6 |
| item_1 | 2022-01-21 | o | 6 |
| item_4 | 2022-01-26 | f | 6 |
Artcred | supplier ID's
| art_code | vln | cred_nr |
|:---------|:----:|:-------:|
| item_1 | 1 | 1001 |
| item_2 | 1 | 1002 |
| item_3 | 1 | 1001 |
| item_4 | 1 | 1007 |
| item_5 | 1 | 1004 |
| item_5 | 2 | 1008 |
| item_6 | 1 | 1016 |
| item_7 | 1 | 1567 |
totvrd | stock
| art_code | voorraad | reserved | backorder |
|:---------|:---------: |:--------:|:---------:|
| item_1 | 1 | 0 | 5 |
| item_2 | 0 | 0 | 0 |
| item_3 | 88 | 0 | 0 |
| item_4 | 9 | 0 | 0 |
| item_5 | 67 | 2 | 20 |
| item_6 | 112 | 9 | 0 |
| item_7 | 65 | 0 | 0 |
| item_8 | 7 | 1 | 0 |
Now, on to the query. You have LEFT JOINs to the artcred table, but then include artcred in the WHERE clause, which turns it into an INNER JOIN (requiring both left and right tables) in the result. Was this intended, or are you expecting records in the bongegs table that do NOT exist in artcred?
Well to be honest I was not fully aware that this would essentially form an INNER JOIN but in this case it doesn't really matter. A record that exists in bongegs always exists in artcred as well (every sold product must have a supplier). That doesn't work both ways since a product can be in artcred without ever being sold.
You also have RIGHT JOIN on totvrd which implies you want every record in the TotVRD table regardless of a record in the bongegs table. Is this correct?
Yes it is intended. Otherwise only products with actual sales in the period would end up in the result and we also wanted to include products with zero sales.
One simplification:
and b.aantal < ( SELECT * from ( SELECT AVG ...
-->
and b.aantal < ( SELECT AVG ...
A personal problem: my brain hurts when I see RIGHT JOIN; please rewrite as LEFT JOIN.
Check your RIGHTs and LEFTs -- they keep the other table's rows even if there is no match; are you expecting such NULLs? That is, it looks like they can all be plain JOINs (aka INNER JOINs).
These might help performance:
b: INDEX(bon_soort, bon_datum, aantal, art_code)
totvrd: INDEX(art_code)
artcred: INDEX(vln, cred_nr, art_code)
Is b what you keep needing? Build a temp table:
CREATE TEMPORARY TABLE tmp_b
SELECT ...
FROM b
WHERE ...;
But if you need to use tmp_b multiple times in the same query, (and since you are not yet on MySQL 8.0), you may need to make it a non-TEMPORARY table for long enough to run the query. (If you have multiple connections building the same permanent table, there will be trouble.)
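A minimal sketch of the temp-table idea, run against SQLite through Python's sqlite3 module for convenience. The column names of tmp_b and the fixed cutoff date are assumptions; the cutoff stands in for DATE_SUB(CURDATE(), INTERVAL 6 MONTH) so that the sample rows from the question give a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bongegs (art_code TEXT, bon_datum TEXT, bon_soort TEXT, aantal INTEGER);
INSERT INTO bongegs VALUES
    ('item_1', '2021-08-21', 'f', 6), ('item_2', '2021-08-29', 'v', 3),
    ('item_6', '2021-09-03', 'o', 2), ('item_4', '2021-10-21', 'f', 6),
    ('item_1', '2021-11-21', 'o', 6), ('item_3', '2022-01-17', 'v', 6),
    ('item_1', '2022-01-21', 'o', 6), ('item_4', '2022-01-26', 'f', 6);
-- Hypothetical layout for the pre-aggregated invoice data.
CREATE TEMPORARY TABLE tmp_b (art_code TEXT PRIMARY KEY, verkocht INTEGER, invoices INTEGER);
""")

cutoff = "2021-08-01"  # assumption: fixed date instead of "six months ago"

# Aggregate the invoices ONCE; later statements read tmp_b instead of bongegs.
conn.execute("""
    INSERT INTO tmp_b
    SELECT art_code, SUM(aantal), COUNT(*)
    FROM bongegs
    WHERE bon_soort = 'f' AND bon_datum > ?
    GROUP BY art_code
""", (cutoff,))

rows = conn.execute(
    "SELECT art_code, verkocht, invoices FROM tmp_b ORDER BY art_code"
).fetchall()
total_sold = conn.execute("SELECT SUM(verkocht) FROM tmp_b").fetchone()[0]

print(rows, total_sold)
```

SQLite does not restrict how often a temporary table is referenced, so this sketch shows only the "aggregate once, reuse many times" part, not the MySQL restriction mentioned above.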
Yes, 5.5.33 is rather antique; upgrade soon.
Having gathered what I believe are all the pieces you had, I think the query below is a significant simplification. Let's first start with the fact that you were trying to eliminate the outliers by using the average plus three standard deviations as the cutoff for what to exclude. Then you had the original summation of all sales, also from the bongegs table.
To simplify this, I have the sub-query ONCE internal that does the summation, counts, avg, stddev of all orders (f) within the last 6 months. I also computed the divide by 6 for per-month you wanted in the top.
Since the bongegs is now all pre-aggregated ONCE, and grouped per art_code, it does not need to be done one after the other. You can use the totals directly at the top (at least I THINK is similar output without all actual data and understanding of your context).
So the primary table is the stock table (totvrd), LEFT-JOINED to the pre-query of bongegs. This allows you to get all products regardless of whether they have been sold.
Since the one aggregation prequery has the avg and stddev in it, you can simply apply an additional AND clause when joining based on the total sold being less than the avg/stddev context.
The resulting query below.
SELECT
-- appears you are looking for the highest percentage?
-- typically NOT a good idea to name columns starting with numbers,
-- but ok. Typically let interface/output name the columns to end-users
GREATEST((b.verkocht_sdperMonth * ((100 + 0)/100)),0) as 'units sold p/month',
-- appears to be the total sold divided by 6 to get monthly average over 6 months query of data
GREATEST( ROUND(
( (b.verkocht_sdperMonth * 3) - v.voorraad + v.reserved - v.backorder), 0), 0)
as 'Order based on units sold',
b.verkocht_sd as 'Units sold in period',
b.AvgStdDev as 'AvgStdDeviation',
b.NumInvoices as 'Number of invoices in period',
v.art_code as 'Part number'
FROM
-- stock, master inventory, regardless of supplier
-- get all products, even though not all may be sold
totvrd v
-- LEFT join to pre-query of bongegs pre-grouped by the art_code which appears
-- to be basis of all other joins, std deviation and average while at it
LEFT JOIN
(select
b.art_code,
count(*) NumInvoices,
sum( b.aantal ) verkocht_sd,
sum( b.aantal ) / 6.0 verkocht_sdperMonth,
avg( b.aantal ) AvgSale,
AVG(b.aantal) + 3 * STDDEV( b.aantal) AvgStdDev
from
bongegs b
JOIN artcred ac
on b.art_code = ac.art_code
AND ac.vln = 1
and ac.cred_nr = 9117
where
-- only for invoices ('f') and within last 6 months
b.bon_soort = 'f'
AND b.bon_datum > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
group by
b.art_code ) b
-- result is one entry per art_code, thus preventing any Cartesian product
ON v.art_code = b.art_code
GROUP BY
v.art_code
I'm trying to create a report in ReportBuilder (Digital Metaphors, not Microsoft) and I'm having trouble getting the SQL to do what I want.
I have one table with a field building:
| building |
+------------+
| WhiteHouse |
| TajMahal |
and another table with a field locations:
| id | locations |
+----+-----------------------------------------------------------------+
| 1 | WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary |
| 2 | TajMahal:NorthGarden,WhiteHouse:GreenRoom |
I would like to create a table showing how many times each building is used in locations, like so:
| building | count |
+------------+-------+
| WhiteHouse | 3 |
| TajMahal | 2 |
The characters : and , are never used in building or room names. Even a quick-and-dirty solution that assumes that building names never appear in room names would be good enough for me.
Of course this would be easy to do in just about any sane programming language (total over something like /\bWhiteHouse:/); the trick will be getting RB to do it. Suggestions for workarounds are welcome.
It is possible to split the locations string into pieces, using the "," and ":" characters as separators, as follows in SQL Server with the help of a custom SQL split function:
select
p2.val,
count(p2.val)
from locations l
cross apply dbo.split(l.locations,',') p1
cross apply dbo.split(p1.val,':') p2
inner join building b
on b.building = p2.val
group by p2.val
I'm not sure there is a similar one in mysql, if so please check following solution as a template
You can try this, probably not the fastest, but certainly easier solution.
SELECT t1.building,
( SELECT SUM( ROUND( (LENGTH(t2.locations)
- LENGTH(REPLACE(t2.locations, concat(t1.building, ':'), ''))
) / (LENGTH(t1.building) + 1)
)
)
FROM table2 AS t2
) as count
FROM table1 as t1
SQL Fiddle Demo
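The LENGTH/REPLACE counting trick can be checked against the sample rows with the sketch below. It runs in SQLite through Python's sqlite3 module, so concat() is written as || and integer division replaces ROUND; the table names table1 and table2 follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (building TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, locations TEXT);
INSERT INTO table1 VALUES ('WhiteHouse'), ('TajMahal');
INSERT INTO table2 VALUES
    (1, 'WhiteHouse:RoseGarden,WhiteHouse:MapRoom,TajMahal:MainSanctuary'),
    (2, 'TajMahal:NorthGarden,WhiteHouse:GreenRoom');
""")

# Removing every 'Building:' shortens the string by len('Building:') per
# occurrence, so the length difference divided by that length is the count.
rows = conn.execute("""
    SELECT t1.building,
           (SELECT SUM((LENGTH(t2.locations)
                        - LENGTH(REPLACE(t2.locations, t1.building || ':', '')))
                       / (LENGTH(t1.building) + 1))
            FROM table2 t2) AS cnt
    FROM table1 t1
    ORDER BY t1.building
""").fetchall()

print(rows)
```

This is the quick-and-dirty variant: a room or building whose name merely ends in another building's name would also be counted.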
I have a tbl_remit from which I need to get the last remittance.
I'm developing a system in which I need to get the potential collection of each Employer using the Employer's last remittance x 12. Ideally, Employers should remit once every month. But there are cases where an Employer remits again for the same month for an additional, newly hired employee. The MySQL statement that I used was this.
SELECT Employer, MAX(AP_From) as AP_From,
MAX(AP_To) as AP_To,
MAX(Amount) as Last_Remittance,
(MAX(Amount) *12) AS LastRemit_x12
FROM view_remit
GROUP BY PEN
Result
|RemitNo.| Employer | ap_from | ap_to | amount |
| 1 | 1 |2016-01-01 |2016-01-31 | 2000 |
| 2 | 1 |2016-02-01 |2016-02-28 | 2000 |
| 3 | 1 |2016-03-01 |2016-03-31 | 2000 |
| 4 | 1 |2016-03-01 |2016-03-31 | 400 |
With that statement, I ended up getting the wrong potential collection.
What I've got:
400 - Last_Remittance
4800 - LastRemit_x12 (potential collection)
What I need to get:
2400 - Last_Remittance
28800 - LastRemit_x12 (potential collection)
Any help is greatly appreciated. I don't have a team on this project. This may be a novice question to some, but to me it's really a complex puzzle. Thank you in advance.
You want to filter the data for the last time period. So, think where rather than group by. Then, you want to aggregate by employer.
Here is one method:
SELECT Employer, MAX(AP_From) as AP_From, MAX(AP_To) as AP_To,
SUM(Amount) as Last_Remittance,
(SUM(Amount) * 12) AS LastRemit_x12
FROM view_remit vr
WHERE vr.ap_from = (SELECT MAX(vr2.ap_from)
FROM view_remit vr2
WHERE vr2.Employer = vr.Employer
)
GROUP BY Employer;
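To check the filter-then-aggregate approach against the sample rows from the question, here is a small sketch that runs the same query shape in SQLite via Python's sqlite3 module. view_remit is created as a plain table here, which is an assumption made only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE view_remit (
    RemitNo INTEGER PRIMARY KEY, Employer INTEGER,
    ap_from TEXT, ap_to TEXT, amount INTEGER
);
INSERT INTO view_remit VALUES
    (1, 1, '2016-01-01', '2016-01-31', 2000),
    (2, 1, '2016-02-01', '2016-02-28', 2000),
    (3, 1, '2016-03-01', '2016-03-31', 2000),
    (4, 1, '2016-03-01', '2016-03-31', 400);
""")

# Keep only the rows of each employer's latest period, then sum them.
rows = conn.execute("""
    SELECT vr.Employer, MAX(vr.ap_from), MAX(vr.ap_to),
           SUM(vr.amount) AS Last_Remittance,
           SUM(vr.amount) * 12 AS LastRemit_x12
    FROM view_remit vr
    WHERE vr.ap_from = (SELECT MAX(vr2.ap_from)
                        FROM view_remit vr2
                        WHERE vr2.Employer = vr.Employer)
    GROUP BY vr.Employer
""").fetchall()

print(rows)
```

Both March remittances (2000 and 400) survive the WHERE filter, so the sum is 2400 and the projection is 28800, matching the figures the question asks for.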
EDIT:
For performance, you want an index on view_remit(Employer, ap_from). Of course, that assumes that view_remit is really a table . . . which may be unlikely.
If you want to improve performance, you'll need to understand the view.
I am trying to figure out best way to get the aggregate of a person's hours spent on a project name that follows a certain pattern
Current Tables
+--------------------+----------------+-----------------+
| Tbl_Employee | Tbl Projects | tbl_timesheet |
+--------------------+----------------+-----------------+
| employee_id | project_id | timesheet_id |
| employee_full_name | cws_project_id | employee_id |
| | | project_id |
| | | timesheet_hours |
+--------------------+----------------+-----------------+
Here is the query I have so far
select
te.employee_id,
te.employee_last_name,
te.employee_first_name,
te.employee_department,
te.employee_type_id,
te.timesheet_routing,
sum(tt.timesheet_hours) as total_hours,
month(tt.timesheet_date) as "month",
year(tt.timesheet_date) as "year"
from tbl_employee te
left join tbl_timesheet tt
on te.employee_id = tt.employee_id
join tbl_projects tp
on tp.project_id = tt.project_id
where te.employee_active = 1
and te.employee_id > 0
and employee_department IN ("Project Management","Engineering","Deployment Srvs.")
and year(tt.timesheet_date) = 2015
group by te.employee_last_name, year(tt.timesheet_date), month(tt.timesheet_date)
order by employee_last_name
What I need to add to my select statement is something to the effect of
sum(tt.timesheet_hours) as where cws_project_id like '%Training%' as training
In short, I need to know the sum of hours an employee has contributed to a project where the cws_project_id contains the word Training. I know you can't add a where clause to a Sum, but I can't seem to find another way to do it.
If this makes a difference I need to do this several times - ie where the project_name contains a different word.
Thank you so much for any help that can be provided. I hope that is not clear as mud.
Here is the general form of what you are looking for:
SELECT SUM(IF(x LIKE '%y%', z, 0)) AS ySum
even more general
SELECT SUM(IF([condition on row], [value or calculation from row], 0)) AS [partialSum]
Edit: For more RDBMS portability (earlier versions of MS SQL do not support this form of IF):
SELECT SUM(CASE WHEN [condition on row] THEN [value or calculation from row] ELSE 0 END) AS [partialSum]
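Applied to the timesheet tables from the question, the CASE form might look like the sketch below. It runs in SQLite via Python's sqlite3 module; the project ids, employee ids and hours are hypothetical sample data, and only the columns needed for the sum are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_projects (project_id INTEGER PRIMARY KEY, cws_project_id TEXT);
CREATE TABLE tbl_timesheet (
    timesheet_id INTEGER PRIMARY KEY, employee_id INTEGER,
    project_id INTEGER, timesheet_hours REAL
);
-- Hypothetical sample rows.
INSERT INTO tbl_projects VALUES
    (1, '2015-Training-Safety'), (2, '2015-Install-A'), (3, 'Training-Onboarding');
INSERT INTO tbl_timesheet VALUES
    (1, 100, 1, 4.0), (2, 100, 2, 8.0), (3, 100, 3, 2.0), (4, 200, 2, 5.0);
""")

# One pass over the rows: every row counts toward total_hours, but only
# rows on a matching project contribute to the "training" column.
rows = conn.execute("""
    SELECT tt.employee_id,
           SUM(tt.timesheet_hours) AS total_hours,
           SUM(CASE WHEN tp.cws_project_id LIKE '%Training%'
                    THEN tt.timesheet_hours ELSE 0 END) AS training
    FROM tbl_timesheet tt
    JOIN tbl_projects tp ON tp.project_id = tt.project_id
    GROUP BY tt.employee_id
    ORDER BY tt.employee_id
""").fetchall()

print(rows)
```

Further keywords can be handled by adding one more SUM(CASE ...) column per keyword in the same SELECT.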
products
+----+--------+
| id | title |
+----+--------+
| 1 | Apple |
| 2 | Pear |
| 3 | Banana |
| 4 | Tomato |
+----+--------+
product_variants
+----+------------+------------+
| id | product_id | is_default |
+----+------------+------------+
| 1 | 1 | 0 |
| 2 | 1 | 1 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 4 | 1 |
+----+------------+------------+
properties
+----+-----------------+-----------+
| id | property_key_id | value |
+----+-----------------+-----------+
| 1 | 1 | Yellow |
| 2 | 1 | Green |
| 3 | 1 | Red |
| 4 | 2 | Fruit |
| 5 | 2 | Vegetable |
| 6 | 1 | Blue |
+----+-----------------+-----------+
property_keys
+----+-------+
| id | value |
+----+-------+
| 1 | Color |
| 2 | Type |
+----+-------+
product_has_properties
+----+------------+-------------+
| id | product_id | property_id |
+----+------------+-------------+
| 1 | 1 | 4 |
| 2 | 1 | 3 |
| 3 | 2 | 4 |
| 4 | 3 | 4 |
| 5 | 3 | 4 |
| 6 | 4 | 4 |
| 7 | 4 | 5 |
+----+------------+-------------+
product_variant_has_properties
+----+------------+-------------+
| id | variant_id | property_id |
+----+------------+-------------+
| 1 | 1 | 2 |
| 2 | 1 | 3 |
| 3 | 2 | 6 |
| 4 | 3 | 4 |
| 5 | 4 | 1 |
| 6 | 5 | 1 |
+----+------------+-------------+
I need to query my DB so it selects products which have certain properties attached to the product itself OR have those properties attached to one of its related product_variants. Also, properties with the same properties.property_key_id should be grouped like this: (pkey1='red' OR pkey1='blue') AND (pkey2='fruit' OR pkey2='vegetable')
Example cases:
Select all products with (color='red' AND type='vegetable'). This should return only Tomato.
Select all products with ((color='red' OR color='yellow') AND type='fruit') should return Apple and Banana
Please note that in the example cases above I don't really need to query by properties.value, I can query by properties.id.
I played around a lot with MySQL queries, but the biggest problem I'm struggling with is the properties being loaded through two pivot tables. Loading them is no problem, but loading them and combining them with the correct WHERE, AND and OR statements is.
The following code should give you what you're looking for, however you should note that your table currently has a Tomato listed as yellow and a vegetable. Obviously you want the Tomato as red and a Tomato is actually a fruit not a vegetable:
Select distinct title
from products p
inner join
product_variants pv on pv.product_id = p.id
inner join
product_variant_has_properties pvp on pvp.variant_id = pv.id
inner join
product_has_properties php on php.product_id = p.id
inner join
properties ps1 on ps1.id = pvp.property_id -- Color
inner join
properties ps2 on ps2.id = php.property_id -- Type
inner join
property_keys pk on pk.id = ps1.property_key_id or pk.id = ps2.property_key_id
where ps1.value = 'Red' and ps2.value = 'Vegetable'
Here is the SQL Fiddle: http://www.sqlfiddle.com/#!9/309ad/3/0
This is a convoluted answer, and it may be possible to do it in a far simpler way. However given that you seem to want to be able to query by color = xx and type = xx, we clearly need to have columns with those names, which as you've intimated, means we need to pivot the data.
Furthermore, since we want to get all the combinations of colours and types for each product, we need to perform a sort of cross join, to combine them.
This leads us to the query - first we get all the types for a product and its variants, then we join that to all the colours for a product and its variant. We use union to combine the product and variant properties in order to keep them all in the same column, rather than having multiple columns to check.
Of course all products may not have this information specified, so we use left joins all the way through. If it is guaranteed that a product will always have at least one colour, and at least one type - they can all be changed to inner joins.
Also, in your example you say tomato should have a colour of red, yet in the sample data you provide I'm sure the tomato has a colour of yellow.
Anyway, here's the query:
select distinct title from
(select q1.title, q1.value as color, q2.value as type from
(
select products.id, products.title, properties.value, properties.property_key_id
from products
left join product_has_properties
on products.id = product_has_properties.product_id
left join properties
on properties.id = product_has_properties.property_id and properties.property_key_id = 1
union
select product_variants.product_id, products.title, properties.value, properties.property_key_id
from product_variants
inner join products
on product_variants.product_id = products.id
left join product_variant_has_properties
on product_variants.id = product_variant_has_properties.variant_id
left join properties
on properties.id = product_variant_has_properties.property_id and properties.property_key_id = 1
) q1
left join
(
select products.id, products.title, properties.value, properties.property_key_id
from products
left join product_has_properties
on products.id = product_has_properties.product_id
left join properties
on properties.id = product_has_properties.property_id and properties.property_key_id = 2
union
select product_variants.product_id, products.title, properties.value, properties.property_key_id
from product_variants
inner join products
on product_variants.product_id = products.id
left join product_variant_has_properties
on product_variants.id = product_variant_has_properties.variant_id
left join properties
on properties.id = product_variant_has_properties.property_id and properties.property_key_id = 2
) q2
on q1.id = q2.id
where q1.value is not null or q2.value is not null
) main
where ((color = 'red' or color = 'yellow') and type = 'fruit')
And here's a demo: http://sqlfiddle.com/#!9/d3ded/76
If you were to get more types of property, in addition to colour and type, the query would need to be modified. Sorry, but that's pretty much what you're stuck with when trying to pivot in MySQL.
I think that you make unnecessary complications for your data model, your code and your queries.
Those eventually will be a performance killer for your application.
Your best solution is to consider an easier approach.
Try to flatten your data structure so you will not have such dependencies.
I don't know what exactly product_variants mean so I can't tell exactly how to do the change.
But the main idea is to save the properties always for each variant.
When you have only 1 variant - define it as a variant too.
And I suggest you to make the properties table to reference the exact variant instead of having global numbering with referencing tables in the structure of:
+----+-----------------+-------------+-----------+
| id | property_key_id | variant_id  | value     |
+----+-----------------+-------------+-----------+
| 1 | 1 | 1 | Yellow |
| 2 | 1 | 1 | Green |
| 3 | 1 | 1 | Red |
| 4 | 2 | 1 | Fruit |
| 5 | 2 | 2 | Vegetable |
| 6 | 1 | 2 | Blue |
| 7 | 1 | 2 | Yellow |
+----+-----------------+-------------+-----------+
With this approach you will have duplicate values, but all your queries will be simpler and you will be free to save whatever values you want for each specific product variant.
UPDATE
If you have no option to change the structure of the data, "LEFT OUTER JOIN" is your only hope.
Check the below query that selects the ones with color 'Yellow'
select p.* from products p
left outer join product_has_properties pp
on p.id=pp.product_id
left outer join product_variants v
on p.id=v.product_id
left outer join product_variant_has_properties vp
on v.id = vp.variant_id
where vp.property_id=1 or pp.property_id=1;
Considering products and not variants, you can simulate this (at least to some extent) with joins so that you
substitute each OR in your query with an equivalent condition in the WHERE clause. E.g. to have (color='red' OR color='yellow'),
SELECT product_id FROM product_has_properties
WHERE property_id IN (1, 3)
substitute each AND in your query with a self-join and a condition in the WHERE clause. This should yield rows that correspond to products that have the pair of properties in question. E.g. to have (color='red' AND type='vegetable'),
SELECT p1.product_id
FROM product_has_properties p1
INNER JOIN product_has_properties p2 ON (p1.product_id = p2.product_id)
WHERE p1.property_id = 3 AND p2.property_id = 5
Obviously this gets complicated as the number of conditions grows. To get ((color='red' OR color='yellow') AND type='fruit'), you would need to do
SELECT p1.product_id
FROM product_has_properties p1
INNER JOIN product_has_properties p2 ON (p1.product_id = p2.product_id)
WHERE (p1.property_id = 1 OR p1.property_id = 3) AND p2.property_id = 4
Assuming that some fruit could be both blue and red, to get pkey1='red' AND pkey1='blue' AND pkey2='fruit', you'd have to do
SELECT p1.product_id
FROM product_has_properties p1
INNER JOIN product_has_properties p2 ON (p1.product_id = p2.product_id)
INNER JOIN product_has_properties p3 ON (p1.product_id = p3.product_id)
WHERE p1.property_id = 3 AND p2.property_id = 6 AND p3.property_id = 4
There might be some case which isn't covered by this approach, though.
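The self-join pattern can be exercised on the product_has_properties rows from the question with the sketch below, which runs in SQLite via Python's sqlite3 module. As in the answer, it considers product-level properties only; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_has_properties (
    id INTEGER PRIMARY KEY, product_id INTEGER, property_id INTEGER
);
INSERT INTO product_has_properties VALUES
    (1, 1, 4), (2, 1, 3), (3, 2, 4), (4, 3, 4), (5, 3, 4), (6, 4, 4), (7, 4, 5);
""")

# Property ids from the question: 1 = Yellow, 3 = Red, 4 = Fruit, 5 = Vegetable.

# (color = red OR color = yellow) AND type = fruit
red_or_yellow_fruit = conn.execute("""
    SELECT DISTINCT p1.product_id
    FROM product_has_properties p1
    INNER JOIN product_has_properties p2 ON p1.product_id = p2.product_id
    WHERE p1.property_id IN (1, 3) AND p2.property_id = 4
    ORDER BY p1.product_id
""").fetchall()

# color = red AND type = vegetable
red_vegetable = conn.execute("""
    SELECT DISTINCT p1.product_id
    FROM product_has_properties p1
    INNER JOIN product_has_properties p2 ON p1.product_id = p2.product_id
    WHERE p1.property_id = 3 AND p2.property_id = 5
""").fetchall()

print(red_or_yellow_fruit, red_vegetable)
```

On the sample data only product 1 (Apple) is both red and a fruit at product level, and no product is both red and a vegetable, because the colours of the other products are stored on their variants.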
Short answer
I'm going to throw out a bit of a different answer to the ones you've been getting. While it is very possible to have a purely SQL answer to this, the question I would pose to you is: Why?
That answer will determine your next step.
If your answer is to try to learn the pure SQL way to do it, there are some great answers here which get you most if not all of the way there.
If your answer is to create scalable dynamic queries for an end application, then you may find your job eased by leaning on your programming language.
A little personal background
I had a requirement to pivot data with more tables. I was determined I'd try to do this the best possible way, and I spent a lot of time working out what was best for my application. Knowing full well this may not be the same experience you have, I will share my experience here in case it helps you.
I tried to create pure SQL solutions, which did work for specific use cases but required extensive tweaking for each additional use case. When I tried to scale the queries up I first attempted to create Stored Procedures. That was a nightmare and pretty early on in my development I realized it would be a headache to maintain.
I went on to use PHP and create my own query generating. While some of this code has morphed into something that is quite useful for me today, I learned that much of it was going to be challenging to maintain unless I created service libraries. At that point, I realized I was basically going to be creating an Object-relational Mapper (ORM). Unless my application was SO special and SO unique that no ORM on the market could come close to doing what I wanted, then I needed to take that opportunity to explore employing an ORM for my application. Despite my initial reservations which caused me to do everything BUT look at an ORM, I have started using one and it helped my development speed increase significantly.
Reaching your desired end result
Select all products with (color='red' AND type='vegetable'). This should return only Tomato.
Select all products with ((color='red' OR color='yellow') AND type='fruit') should return Apple and Banana
This is possible in an ORM. What you're describing is only loosely defined in your SQL but is in fact perfectly summarized in OOP. This is what it would look like in PHP, just as an example.
<?php
abstract class AbstractProductType {
public function __construct() {
}
}
class Color extends AbstractProductType {
}
class Yellow extends Color {
}
class Red extends Color {
}
class Type extends AbstractProductType {
}
class Vegetable extends Type {
}
class Fruit extends Type {
}
class Product {
public function setColor(Color $color) {
//
}
public function setType(Type $type) {
//
}
public function find() {
// Placeholder: a real ORM would build and run the query here.
return [];
}
}
$product = new Product();
$product->setColor(new Red());
$product->setType(new Fruit());
$result = $product->find();
?>
The idea behind this is that you can make full use of SQL in object oriented programming.
A slightly lower-key version of this would be to create a class which generates SQL snippets. My personal experience was that that's a lot of work for a limited payback. If your project is going to remain relatively small, it may work out just fine. However, if you anticipate that your project will grow, then an ORM may well be worth exploring.
Conclusion
Although I am not sure what language you will be utilizing to query and manipulate your data, there are great ORMs out there which should not be discounted. Despite their many cons (you can find a lot of debate about this all over the internet), I am a reluctant believer that, although certainly not ideal for all situations, they should be considered for some. If this is not one of those situations for you, be prepared to write lots of JOINs yourself. When referencing a table n times and requiring a reference back to the table, the only method I am aware of to add a reference is to create n JOINs.
I'll be very interested to see if there is a better way, of course!
Conditional Aggregation
You can use conditional aggregation in your having clause to see if a product has specific properties. For example, you can query all products that have both the "type vegetable" and "color red" properties.
You have to group by both the product id and the product variant id in order to make sure that all the properties you're searching for exist on the same variant or the product itself.
select p.id, pv.id from products p
left join product_has_properties php on php.product_id = p.id
left join properties pr on pr.id = php.property_id
left join property_keys pk on pk.id = pr.property_key_id
left join product_variants pv on pv.product_id = p.id
left join product_variant_has_properties pvhp on pvhp.variant_id = pv.id
left join properties pr2 on pr2.id = pvhp.property_id
left join property_keys pk2 on pk2.id = pr2.property_key_id
group by p.id, pv.id
having (
count(case when pk.value = 'Color' and pr.value = 'Red' then 1 end) > 0
and count(case when pk.value = 'Type' and pr.value = 'Vegetable' then 1 end) > 0
) or (
count(case when pk2.value = 'Color' and pr2.value = 'Red' then 1 end) > 0
and count(case when pk2.value = 'Type' and pr2.value = 'Vegetable' then 1 end) > 0
)
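The same HAVING technique can be tried out end to end with a much smaller setup. The following is only a sketch: it runs on SQLite through Python's sqlite3 module, it uses a hypothetical flattened product_attribute table instead of the key/property/variant tables above, and it ignores variants entirely.

```python
import sqlite3

# Hypothetical, simplified EAV tables (not the schema from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attribute (product_id INTEGER, attr_key TEXT, attr_value TEXT);
    INSERT INTO product VALUES (1, 'Tomato'), (2, 'Apple'), (3, 'Banana');
    INSERT INTO product_attribute VALUES
        (1, 'Color', 'Red'),    (1, 'Type', 'Vegetable'),
        (2, 'Color', 'Red'),    (2, 'Type', 'Fruit'),
        (3, 'Color', 'Yellow'), (3, 'Type', 'Fruit');
""")

# color = Red AND type = Vegetable: each COUNT only counts rows where its
# CASE expression is non-NULL, i.e. rows that carry the wanted property.
red_vegetables = [row[0] for row in conn.execute("""
    SELECT p.name
      FROM product p
      JOIN product_attribute a ON a.product_id = p.id
     GROUP BY p.id, p.name
    HAVING COUNT(CASE WHEN a.attr_key = 'Color' AND a.attr_value = 'Red' THEN 1 END) > 0
       AND COUNT(CASE WHEN a.attr_key = 'Type' AND a.attr_value = 'Vegetable' THEN 1 END) > 0
     ORDER BY p.name
""")]

# (color = Red OR color = Yellow) AND type = Fruit: the OR lives inside one CASE.
red_or_yellow_fruit = [row[0] for row in conn.execute("""
    SELECT p.name
      FROM product p
      JOIN product_attribute a ON a.product_id = p.id
     GROUP BY p.id, p.name
    HAVING COUNT(CASE WHEN a.attr_key = 'Color' AND a.attr_value IN ('Red', 'Yellow') THEN 1 END) > 0
       AND COUNT(CASE WHEN a.attr_key = 'Type' AND a.attr_value = 'Fruit' THEN 1 END) > 0
     ORDER BY p.name
""")]

print(red_vegetables)
print(red_or_yellow_fruit)
```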
What was the question? (I read through the post several times, and I'm still failing to see any actual question that is being asked.) A lot of the answers here seem to be answering the question "What SQL statement would return a result from these tables?" My answer doesn't provide an example or a "how to" guide to writing SQL. My answer addresses a fundamentally different question.
The difficulty that OP is experiencing writing SQL against the tables shown in the "question" is due to (what I refer to as) the "impedance mismatch" between the "Relational" model and the "Entity-Attribute-Value" (EAV) model.
SQL is designed to work with the "Relational" model. Each instance of an entity is represented as a tuple, stored as a row in a table. The attributes of an entity are stored as values in columns of the entity row.
The EAV model differs significantly from the Relational model. It moves attribute values off of the entity row, and moves them into multiple, separate rows in other tables. And that makes writing queries more complicated, if the queries are attempting to emulate queries against a "Relational" model by transforming the data from the "EAV" representation back into a "Relational" representation.
There are a couple of approaches to writing SQL queries against the EAV model that emulate the results returned from a Relational model (as demonstrated by the example SQL provided in other answers to this "question").
One approach is to use subqueries in the SELECT list to return values of attributes as columns in the entity row.
Another approach is to perform joins between the row in the entity table and the rows in the attribute table(s), use a GROUP BY to collapse the rows, and, in the SELECT list, use conditional expressions to "pick out" the value to be returned for a column.
There are lots of examples of both of those approaches. Neither is really better than the other; the suitability of each approach depends on the particular use case.
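For readers who want to see the two approaches side by side, here is a small sketch run on SQLite via Python's sqlite3 module. The product and product_attribute tables and their sample rows are hypothetical stand-ins for an EAV layout, not the tables from the question.

```python
import sqlite3

# Hypothetical, simplified EAV tables used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attribute (product_id INTEGER, attr_key TEXT, attr_value TEXT);
    INSERT INTO product VALUES (1, 'Tomato'), (2, 'Apple'), (3, 'Banana');
    INSERT INTO product_attribute VALUES
        (1, 'color', 'red'),    (1, 'type', 'vegetable'),
        (2, 'color', 'red'),    (2, 'type', 'fruit'),
        (3, 'color', 'yellow'), (3, 'type', 'fruit');
""")

# Approach 1: one correlated subquery per attribute in the SELECT list.
subquery_rows = conn.execute("""
    SELECT p.name,
           (SELECT a.attr_value FROM product_attribute a
             WHERE a.product_id = p.id AND a.attr_key = 'color') AS color,
           (SELECT a.attr_value FROM product_attribute a
             WHERE a.product_id = p.id AND a.attr_key = 'type') AS product_type
      FROM product p
     ORDER BY p.id
""").fetchall()

# Approach 2: join to the attribute rows, collapse them with GROUP BY, and
# pick out each attribute with a conditional expression inside an aggregate.
pivot_rows = conn.execute("""
    SELECT p.name,
           MAX(CASE WHEN a.attr_key = 'color' THEN a.attr_value END) AS color,
           MAX(CASE WHEN a.attr_key = 'type' THEN a.attr_value END) AS product_type
      FROM product p
      LEFT JOIN product_attribute a ON a.product_id = p.id
     GROUP BY p.id, p.name
     ORDER BY p.id
""").fetchall()

print(subquery_rows)
print(pivot_rows)
```

Both queries rebuild the same "one row per product" shape. Note that the subquery form assumes at most one value per attribute per product, while MAX silently picks one value if there are several.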
While it is possible to write SQL queries against the EAV-style tables shown, those queries are an order of magnitude more complicated than equivalent queries against data stored in a "relational" model.
Consider the result returned by a trivial query in the relational model, e.g.
SELECT p.id
FROM product p
WHERE p.color = 'red'
Returning that same set from data in the EAV model requires a much more complex SQL query, involving joins of several tables and/or subqueries.
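As a rough illustration of that difference, the sketch below loads the same three products into a relational table and into EAV-style tables, then asks both for the red products. It runs on SQLite via Python's sqlite3 module. The EAV table names follow the joins in the conditional aggregation answer, but the column layout and the sample rows are assumptions inferred from those join conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relational layout: color is a plain column on the entity row.
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, color TEXT);
    INSERT INTO product VALUES (1, 'Tomato', 'red'), (2, 'Apple', 'red'), (3, 'Banana', 'yellow');

    -- EAV layout (assumed columns): the color lives in separate rows.
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE property_keys (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE properties (id INTEGER PRIMARY KEY, property_key_id INTEGER, value TEXT);
    CREATE TABLE product_has_properties (product_id INTEGER, property_id INTEGER);
    INSERT INTO products VALUES (1, 'Tomato'), (2, 'Apple'), (3, 'Banana');
    INSERT INTO property_keys VALUES (1, 'color');
    INSERT INTO properties VALUES (1, 1, 'red'), (2, 1, 'yellow');
    INSERT INTO product_has_properties VALUES (1, 1), (2, 1), (3, 2);
""")

# Relational model: one table, one predicate.
relational_ids = [row[0] for row in conn.execute("""
    SELECT p.id
      FROM product p
     WHERE p.color = 'red'
     ORDER BY p.id
""")]

# EAV model: three joins are needed just to reach the key and the value.
eav_ids = [row[0] for row in conn.execute("""
    SELECT p.id
      FROM products p
      JOIN product_has_properties php ON php.product_id = p.id
      JOIN properties pr ON pr.id = php.property_id
      JOIN property_keys pk ON pk.id = pr.property_key_id
     WHERE pk.value = 'color'
       AND pr.value = 'red'
     ORDER BY p.id
""")]

print(relational_ids)
print(eav_ids)
```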
And once we move beyond the trivial query, to a query where we want to return attributes from multiple related entities... as a simple example, return information about orders in the past 30 days for products that were 'red':
SELECT c.customer_name
, c.address
, o.order_date
, p.product_name
, l.qty
FROM customer c
JOIN `order` o ON ...
JOIN line_item l ON ...
JOIN product p ON ...
WHERE p.color = 'red'
AND o.order_date >= DATE(NOW()) - INTERVAL 30 DAY
getting that same result, using SQL, from the EAV model is way more convoluted and confusing, and can be an excruciating exercise in frustration.
Certainly, it's possible to write the SQL. And once we do manage to get SQL statements that work to return a "correct" resultset, when the number of rows in the tables scale up beyond the trivial demonstration, up to the kind of volumes we expect databases to handle... the performance of those queries is horrendous (as compared to queries returning the same results from a traditional Relational model).
(And we've not even touched on the additional complexity for just adding and updating the attributes of entities, enforcing referential integrity between entities, etc.)
But why would we want to do that? Why do we need (or want) to write SQL statements against the EAV model tables that emulate the results returned from queries against Relational model tables?
Bottom line, if we are going to use an EAV model, we are much better off not attempting to use a single SQL statement to return results like we'd get back from a query of a "Relational" model.
The problem of retrieving information from the EAV model is much better suited to a programming language that is object-oriented and provides a framework, something that is entirely lacking in SQL.
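One way to picture this is to let SQL do only a plain fetch of the attribute rows and to assemble and filter the entities in application code. The sketch below does that in Python with the standard sqlite3 module. The flattened product_attribute table, the sample rows, and the small Product class with its has() method are hypothetical, not part of the question or of any particular framework.

```python
import sqlite3


class Product:
    """An entity assembled in application code from its EAV rows."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}  # attribute key -> set of values

    def has(self, key, *values):
        """True if the product has at least one of the given values for key."""
        return bool(self.attributes.get(key, set()) & set(values))


# Hypothetical, simplified EAV tables used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attribute (product_id INTEGER, attr_key TEXT, attr_value TEXT);
    INSERT INTO product VALUES (1, 'Tomato'), (2, 'Apple'), (3, 'Banana');
    INSERT INTO product_attribute VALUES
        (1, 'color', 'red'),    (1, 'type', 'vegetable'),
        (2, 'color', 'red'),    (2, 'type', 'fruit'),
        (3, 'color', 'yellow'), (3, 'type', 'fruit');
""")

# A single simple query: no pivoting or conditional logic in SQL.
rows = conn.execute("""
    SELECT p.id, p.name, a.attr_key, a.attr_value
      FROM product p
      JOIN product_attribute a ON a.product_id = p.id
     ORDER BY p.id
""")

# Collapse the attribute rows into one object per product.
products = {}
for product_id, name, key, value in rows:
    product = products.setdefault(product_id, Product(name))
    product.attributes.setdefault(key, set()).add(value)

red_vegetables = [
    p.name for p in products.values()
    if p.has("color", "red") and p.has("type", "vegetable")
]
red_or_yellow_fruit = [
    p.name for p in products.values()
    if p.has("color", "red", "yellow") and p.has("type", "fruit")
]
print(red_vegetables)
print(red_or_yellow_fruit)
```

This loads every attribute row into memory, so it is only reasonable as written for small data sets; a real application would narrow the fetch first.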