I have two tables that look like this:
Table cars
+--------+-----------+-------+---------+
| car_id | attribute | value | brand |
+--------+-----------+-------+---------+
| 1 | colore | rosso | Ferrari |
| 1 | prezzo | 100 | Ferrari |
| 2 | couleur | bleu | Renault |
| 2 | prix | 50 | Renault |
| 3 | colore | blu | Ferrari |
| 3 | prezzo | 100 | Ferrari |
+--------+-----------+-------+---------+
Table translations
+--------------------+----------------+---------+----------------------+------------------+
| original_attribute | original_value | brand | translated_attribute | translated_value |
+--------------------+----------------+---------+----------------------+------------------+
| colore | rosso | Ferrari | color | red |
| prezzo | 100 | Ferrari | price | 100 |
| colore | blu | Ferrari | color | blue |
| couleur | bleu | Renault | color | blue |
| prix | 50 | Renault | price | 50 |
+--------------------+----------------+---------+----------------------+------------------+
I am trying to get to a table that looks like this:
+-------------------+-------+-------------+--------------------+
| translated_car_id | color | price | translated_brand |
+-------------------+-------+-------------+--------------------+
| 1 | red | 100 | Ferrari |
| 2 | blue | 50 | Renault |
| 3 | blue | 100 | Ferrari |
+-------------------+-------+-------------+--------------------+
At the moment, I am using the code below. It works, but it is excruciatingly slow.
SELECT
c.car_id translated_car_id,
MAX(CASE
WHEN translations.translated_attribute = 'color' THEN translations.translated_value
END) color,
MAX(CASE
WHEN translations.translated_attribute = 'price' THEN translations.translated_value
END) price,
c.brand translated_brand
FROM
cars c
INNER JOIN
translations ON (c.attribute = translations.original_attribute
AND c.brand = translations.brand
AND c.value = translations.original_value)
GROUP BY c.car_id, c.brand
Does anybody have an idea how to make the query or the structure more efficient? I would really appreciate it.
Thanks in advance
From what I can find, MySQL doesn't have hash matches (hash joins only arrived in MySQL 8.0.18), which would be useful here. So I'm assuming that everything is being done as nested loop joins.
My concern is that because there isn't an index on translations, for every row in cars it has to scan the translations table to find the matching rows.
I would recommend a clustered index on translations(original_value, brand, original_attribute). (MS SQL Server does best with the most specific column first, everything else being equal; I'm not sure about MySQL.) That way it can go right to the row that it needs to match. That should allow queries for a single car to be done quickly.
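To make the index suggestion concrete, here is a minimal sketch that rebuilds the two sample tables, adds a composite index on the three join columns, and runs the pivot query. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the index is an ordinary secondary index rather than a clustered one. The index name idx_translations_lookup and the function name are made up for illustration.

```python
import sqlite3


def pivot_translated_cars():
    """Rebuild the sample data in memory and return the pivoted, translated rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cars (car_id INTEGER, attribute TEXT, value TEXT, brand TEXT);
        CREATE TABLE translations (
            original_attribute TEXT, original_value TEXT, brand TEXT,
            translated_attribute TEXT, translated_value TEXT
        );
        -- Hypothetical index name; it covers the three columns used in the join.
        CREATE INDEX idx_translations_lookup
            ON translations (original_value, brand, original_attribute);
    """)
    conn.executemany("INSERT INTO cars VALUES (?, ?, ?, ?)", [
        (1, "colore", "rosso", "Ferrari"), (1, "prezzo", "100", "Ferrari"),
        (2, "couleur", "bleu", "Renault"), (2, "prix", "50", "Renault"),
        (3, "colore", "blu", "Ferrari"), (3, "prezzo", "100", "Ferrari"),
    ])
    conn.executemany("INSERT INTO translations VALUES (?, ?, ?, ?, ?)", [
        ("colore", "rosso", "Ferrari", "color", "red"),
        ("prezzo", "100", "Ferrari", "price", "100"),
        ("colore", "blu", "Ferrari", "color", "blue"),
        ("couleur", "bleu", "Renault", "color", "blue"),
        ("prix", "50", "Renault", "price", "50"),
    ])
    # Same pivot as in the question: one MAX(CASE ...) per translated attribute.
    rows = conn.execute("""
        SELECT c.car_id,
               MAX(CASE WHEN t.translated_attribute = 'color'
                        THEN t.translated_value END) AS color,
               MAX(CASE WHEN t.translated_attribute = 'price'
                        THEN t.translated_value END) AS price,
               c.brand
        FROM cars c
        INNER JOIN translations t
                ON c.attribute = t.original_attribute
               AND c.brand = t.brand
               AND c.value = t.original_value
        GROUP BY c.car_id, c.brand
        ORDER BY c.car_id
    """).fetchall()
    conn.close()
    return rows
```

On a real MySQL server the same CREATE INDEX statement should apply; whether it actually speeds up the query is something to confirm with EXPLAIN on your own data.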
If you can reduce the translations from per-manufacturer to per-language, that would certainly help with the size of the translations table, but you would have to be sure that it works with your dataset, because it does take away a level of flexibility.
I think that MySQL will be able to use an index on cars to process the GROUP BY efficiently, but I will propose a more normalized schema where you don't need the GROUP BY.
car
-------------
car_id
brand
car_attribute
------------
car_id
attribute
value
clustered_index(car_id, attribute)
brand_attribute
------------
brand_attribute_id
brand
attribute
translated_attribute
clustered_index(brand, translated_attribute)
brand_attribute_value
------------
brand_attribute_id
value
translated_value
clustered_index(brand_attribute_id, value)
And here's the query to do it that way.
SELECT
car.car_id,
color_brand_value.translated_value AS color,
price_brand_value.translated_value AS price,
car.brand
FROM
car
INNER JOIN brand_attribute AS color_brand
ON color_brand.brand = car.brand
AND color_brand.translated_attribute = 'color'
INNER JOIN car_attribute AS color_attribute
ON color_attribute.car_id = car.car_id
AND color_attribute.attribute = color_brand.attribute
INNER JOIN brand_attribute_value AS color_brand_value
ON color_brand_value.brand_attribute_id = color_brand.brand_attribute_id
AND color_brand_value.value = color_attribute.value
INNER JOIN brand_attribute AS price_brand
ON price_brand.brand = car.brand
AND price_brand.translated_attribute = 'price'
INNER JOIN car_attribute AS price_attribute
ON price_attribute.car_id = car.car_id
AND price_attribute.attribute = price_brand.attribute
INNER JOIN brand_attribute_value AS price_brand_value
ON price_brand_value.brand_attribute_id = price_brand.brand_attribute_id
AND price_brand_value.value = price_attribute.value
It's definitely more complicated to do it this way. In your situation I'm not sure it's better, but it is something to consider if the first option isn't good enough.
My background is in MS SQL Server, so there may be differences that I'm not aware of. Please leave a comment if I've missed something or got something wrong.
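For anyone who wants to try the normalized layout, the following is a rough sketch of it with the sample data loaded, again using Python's sqlite3 module in place of MySQL. The primary keys and the UNIQUE constraint stand in for the clustered indexes listed above (in MySQL's InnoDB the primary key is the clustered index); the id values 1 to 4 in brand_attribute are invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE car (car_id INTEGER PRIMARY KEY, brand TEXT NOT NULL);
CREATE TABLE car_attribute (
    car_id INTEGER NOT NULL, attribute TEXT NOT NULL, value TEXT NOT NULL,
    PRIMARY KEY (car_id, attribute)
);
CREATE TABLE brand_attribute (
    brand_attribute_id INTEGER PRIMARY KEY,
    brand TEXT NOT NULL, attribute TEXT NOT NULL, translated_attribute TEXT NOT NULL,
    UNIQUE (brand, translated_attribute)
);
CREATE TABLE brand_attribute_value (
    brand_attribute_id INTEGER NOT NULL, value TEXT NOT NULL,
    translated_value TEXT NOT NULL,
    PRIMARY KEY (brand_attribute_id, value)
);
"""

# One chain of three joins per output column; no GROUP BY is needed.
QUERY = """
SELECT car.car_id,
       color_value.translated_value AS color,
       price_value.translated_value AS price,
       car.brand
FROM car
INNER JOIN brand_attribute AS color_brand
        ON color_brand.brand = car.brand
       AND color_brand.translated_attribute = 'color'
INNER JOIN car_attribute AS color_attribute
        ON color_attribute.car_id = car.car_id
       AND color_attribute.attribute = color_brand.attribute
INNER JOIN brand_attribute_value AS color_value
        ON color_value.brand_attribute_id = color_brand.brand_attribute_id
       AND color_value.value = color_attribute.value
INNER JOIN brand_attribute AS price_brand
        ON price_brand.brand = car.brand
       AND price_brand.translated_attribute = 'price'
INNER JOIN car_attribute AS price_attribute
        ON price_attribute.car_id = car.car_id
       AND price_attribute.attribute = price_brand.attribute
INNER JOIN brand_attribute_value AS price_value
        ON price_value.brand_attribute_id = price_brand.brand_attribute_id
       AND price_value.value = price_attribute.value
ORDER BY car.car_id
"""


def translated_cars():
    """Load the sample cars into the normalized schema and run the join query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO car VALUES (?, ?)",
                     [(1, "Ferrari"), (2, "Renault"), (3, "Ferrari")])
    conn.executemany("INSERT INTO car_attribute VALUES (?, ?, ?)", [
        (1, "colore", "rosso"), (1, "prezzo", "100"),
        (2, "couleur", "bleu"), (2, "prix", "50"),
        (3, "colore", "blu"), (3, "prezzo", "100"),
    ])
    # brand_attribute_id values are invented for this example.
    conn.executemany("INSERT INTO brand_attribute VALUES (?, ?, ?, ?)", [
        (1, "Ferrari", "colore", "color"), (2, "Ferrari", "prezzo", "price"),
        (3, "Renault", "couleur", "color"), (4, "Renault", "prix", "price"),
    ])
    conn.executemany("INSERT INTO brand_attribute_value VALUES (?, ?, ?)", [
        (1, "rosso", "red"), (1, "blu", "blue"), (2, "100", "100"),
        (3, "bleu", "blue"), (4, "50", "50"),
    ])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```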
I'm using MySQL 5.7 and I'm trying to do a join with one of my source tables to a reference table in order to get the appropriate corresponding values. However, I'd like the join to be conditional so it can match according to the length of the value found in the source column.
Source Table
|---------------------|------------------|
| Company_Name | NAICS_Code |
|---------------------|------------------|
| Chem Inc | 325 |
|---------------------|------------------|
| Joe's Farming | 1112 |
|---------------------|------------------|
Reference Table
|---------------------|---------------------------|--------------------|---------------------------|
| NAICS_Code_3_Digit  | NAICS_Code_3D_Description | NAICS_Code_4_Digit | NAICS_Code_4D_Description |
|---------------------|---------------------------|--------------------|---------------------------|
| 325                 | Chemicals                 | 3252               | Resin and Rubber          |
|---------------------|---------------------------|--------------------|---------------------------|
| 111                 | Crop Production           | 1112               | Fruit and Nuts            |
|---------------------|---------------------------|--------------------|---------------------------|
Final Table
|---------------------|------------------|---------------------------|---------------------------|
| Company_Name        | NAICS_Code       | NAICS_Code_3D_Description | NAICS_Code_4D_Description |
|---------------------|------------------|---------------------------|---------------------------|
| Chem Inc            | 325              | Chemicals                 | NULL                      |
|---------------------|------------------|---------------------------|---------------------------|
| Joe's Farming       | 1112             | Crop Production           | Fruit and Nuts            |
|---------------------|------------------|---------------------------|---------------------------|
While I'm able to write a query that works, it takes an extremely long time and I'm curious as to whether there is a better way. Here's what I've got so far:
SELECT src.Company_Name,
src.NAICS_Code,
CASE
WHEN LENGTH(src.NAICS_Code) < 3 THEN NULL
ELSE ref.NAICS_Code_3D_Description
END AS NAICS_Code_3D_Description,
CASE
WHEN LENGTH(src.NAICS_Code) < 4 THEN NULL
ELSE ref.NAICS_Code_4D_Description
END AS NAICS_Code_4D_Description
FROM source_table AS src
LEFT JOIN reference_table AS ref ON CASE
WHEN LENGTH(src.NAICS_Code) = 4
AND src.NAICS_Code = ref.NAICS_Code_4_Digit THEN 1
WHEN LENGTH(src.NAICS_Code) = 3
AND src.NAICS_Code = ref.NAICS_Code_3_Digit THEN 1
ELSE 0
END = 1;
It might be more efficient to left join twice:
- this avoids the need for the complicated logic in the ON clause of the join
- the conditions are mutually exclusive, so it will not generate duplicates in the result set
- you can then use COALESCE() in the SELECT clause
So:
select
s.company_name,
s.naics_code,
coalesce(r1.naics_code_3d_description, r2.naics_code_3d_description) naics_code_3d_description,
r2.naics_code_4d_description
from source_table s
left join reference_table r1 on r1.naics_code_3_digit = s.naics_code
left join reference_table r2 on r2.naics_code_4_digit = s.naics_code
If you want to filter out source rows that did not match in the reference table, you can add a WHERE clause, like:
where r1.naics_code_3_digit is not null or r2.naics_code_4_digit is not null
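Here is a small runnable sketch of the two-left-join approach against the sample rows, using Python's sqlite3 module as a stand-in for MySQL 5.7. The third source row ('No Match LLC', code 99) and the only_matched parameter are additions for illustration, so that the optional WHERE filter has something to remove.

```python
import sqlite3


def describe_naics(only_matched=False):
    """Return each company with its 3-digit and 4-digit NAICS descriptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (company_name TEXT, naics_code TEXT);
        CREATE TABLE reference_table (
            naics_code_3_digit TEXT, naics_code_3d_description TEXT,
            naics_code_4_digit TEXT, naics_code_4d_description TEXT
        );
        -- 'No Match LLC' is a hypothetical row with no reference entry.
        INSERT INTO source_table VALUES
            ('Chem Inc', '325'), ('Joe''s Farming', '1112'), ('No Match LLC', '99');
        INSERT INTO reference_table VALUES
            ('325', 'Chemicals', '3252', 'Resin and Rubber'),
            ('111', 'Crop Production', '1112', 'Fruit and Nuts');
    """)
    sql = """
        SELECT s.company_name,
               s.naics_code,
               COALESCE(r1.naics_code_3d_description,
                        r2.naics_code_3d_description) AS naics_code_3d_description,
               r2.naics_code_4d_description
        FROM source_table s
        LEFT JOIN reference_table r1 ON r1.naics_code_3_digit = s.naics_code
        LEFT JOIN reference_table r2 ON r2.naics_code_4_digit = s.naics_code
    """
    if only_matched:
        # Drop source rows that matched neither the 3-digit nor the 4-digit code.
        sql += " WHERE r1.naics_code_3_digit IS NOT NULL OR r2.naics_code_4_digit IS NOT NULL"
    sql += " ORDER BY s.company_name"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows
```

One caveat worth checking on real data: if the reference table holds one row per 4-digit code, a 3-digit code may appear on several rows, and the r1 join would then return one row per match.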
Cars
|Vin |Make |Year|
| 1 |Honda|2009|
| 2 |Honda|2010|
| 3 |Honda|2009|
| 4 |Toyota|2009|
Colors
|Color|Make |Vin|
| Red |Honda|1|
|Blue |Toyota|4|
|Pink |Honda|2|
Financer
|Bank|Make |
| BOA |Honda|
| Cha |Toyota|
| PNC |Jeep|
A query
SELECT
Cars.Vin,
Colors.Color,
financer.Bank
FROM Cars
JOIN Colors ON Cars.Vin = Colors.Vin
AND Cars.Make = Colors.Make
JOIN financer ON Cars.Make = financer.Make
WHERE Cars.Make = 'Honda'
I'm trying to join these tables, but I'm getting more rows than expected. My financer table above doesn't have a column that's unique.
Is there a way that I can do an ALTER TABLE to create a unique primary key by combining two columns?
Let's take a step back from the RDBMS details, and consider what it means that your financer table has no unique column. What does that mean in the world of your application?
It might mean that a particular Make has more than one Bank that gives loans, in which case there's some kind of choice, or alternative, embedded in that table.
It might mean there are completely duplicated rows.
Here are examples of both circumstances.
|Bank |Make |
| BOA |Honda |
| TD |Honda | 1. more than one bank finances Honda
| Cha |Toyota |
| PNC |Jeep |
| PNC |Jeep | 2. entirely duplicate row
This version of the financer table generates this result set (http://sqlfiddle.com/#!9/27705/2/0). It has, as you mention, plenty of duplicates.
| Vin | Color | Bank |
|-----|-------|------|
| 1 | Red | BOA |
| 2 | Pink | BOA |
| 1 | Red | TD |
| 2 | Pink | TD |
| 4 | Blue | Cha |
You can prevent the completely duplicated rows by creating a composite primary key, consisting of both columns. That probably makes sense in your application's world.
If you want a 1:1 relationship between Make and Bank, you can put a primary key on Make. But I suspect that won't model your application's world correctly. (http://sqlfiddle.com/#!9/27705/5/0)
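As a quick illustration of the composite key, the sketch below declares the key over both columns and tries to insert the rows from the example table above, including the duplicated PNC/Jeep row. It runs on Python's sqlite3 module rather than MySQL, so the key is declared in CREATE TABLE; on MySQL the equivalent for an existing table would be ALTER TABLE financer ADD PRIMARY KEY (Bank, Make), which fails while exact duplicates are still present. The function name is made up.

```python
import sqlite3


def insert_financers(rows):
    """Insert (bank, make) pairs; return (stored rows, pairs rejected as duplicates)."""
    conn = sqlite3.connect(":memory:")
    # Composite primary key: the combination of Bank and Make must be unique.
    conn.execute("""
        CREATE TABLE financer (
            Bank TEXT NOT NULL,
            Make TEXT NOT NULL,
            PRIMARY KEY (Bank, Make)
        )
    """)
    rejected = []
    for bank, make in rows:
        try:
            conn.execute("INSERT INTO financer (Bank, Make) VALUES (?, ?)", (bank, make))
        except sqlite3.IntegrityError:
            # Raised when the same (Bank, Make) pair is already stored.
            rejected.append((bank, make))
    stored = conn.execute("SELECT Bank, Make FROM financer ORDER BY Make, Bank").fetchall()
    conn.close()
    return stored, rejected
```

Note that the key still allows two different banks for Honda; it only blocks the entirely duplicated row.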
Maybe you want this query:
SELECT cars.Vin, cars.Make,
GROUP_CONCAT(DISTINCT colors.Color) Colors,
GROUP_CONCAT(DISTINCT financer.Bank) Banks
from Cars
JOIN Colors ON cars.vin = colors.vin
AND cars.make = colors.make
JOIN financer ON cars.make = financer.make
GROUP BY cars.Make, cars.vin
This combination of GROUP_CONCAT() and GROUP BY yields a list of matching colors, and a list of matching banks, for each car.
| Vin | Make | Colors | Banks |
|-----|--------|--------|--------|
| 1 | Honda | Red | TD,BOA |
| 2 | Honda | Pink | TD,BOA |
| 4 | Toyota | Blue | Cha |
As you can see, each of the Honda cars has a list of two banks.
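If you'd like to reproduce this without the fiddle, here is a minimal sketch on Python's sqlite3 module (standing in for MySQL, which also provides GROUP_CONCAT). The column aliases ColorList and BankList are chosen for the example, and because the order of items inside a concatenated list is not guaranteed, the helper splits and sorts each list before returning it.

```python
import sqlite3


def colors_and_banks_per_car():
    """Return (vin, make, sorted colors, sorted banks) for each car that has a color."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Cars (Vin INTEGER, Make TEXT, Year INTEGER);
        CREATE TABLE Colors (Color TEXT, Make TEXT, Vin INTEGER);
        CREATE TABLE financer (Bank TEXT, Make TEXT);
        INSERT INTO Cars VALUES
            (1, 'Honda', 2009), (2, 'Honda', 2010), (3, 'Honda', 2009), (4, 'Toyota', 2009);
        INSERT INTO Colors VALUES ('Red', 'Honda', 1), ('Blue', 'Toyota', 4), ('Pink', 'Honda', 2);
        -- The financer rows from the example above, including the duplicated PNC/Jeep row.
        INSERT INTO financer VALUES
            ('BOA', 'Honda'), ('TD', 'Honda'), ('Cha', 'Toyota'), ('PNC', 'Jeep'), ('PNC', 'Jeep');
    """)
    rows = conn.execute("""
        SELECT Cars.Vin, Cars.Make,
               GROUP_CONCAT(DISTINCT Colors.Color) AS ColorList,
               GROUP_CONCAT(DISTINCT financer.Bank) AS BankList
        FROM Cars
        JOIN Colors ON Cars.Vin = Colors.Vin AND Cars.Make = Colors.Make
        JOIN financer ON Cars.Make = financer.Make
        GROUP BY Cars.Make, Cars.Vin
        ORDER BY Cars.Vin
    """).fetchall()
    conn.close()
    # Split the comma-separated lists so callers don't depend on concatenation order.
    return [(vin, make, sorted(colors.split(",")), sorted(banks.split(",")))
            for vin, make, colors, banks in rows]
```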
I have a table in MySQL with this structure
table: member
| Id | Name  | Lastname | Username |
|----------------------------------|
| 1  | Alexi | Lalas    | alexi    |
| 2  | Jack  | Louis    | louis    |
And I have a table of member images with this structure:
table: image
| Id | MemberId | Image | Type |
|------------------------------------|
| 50 | 1 | face.jpg |Avetar |
| 51 | 1 | image.jpg |Gallery|
| 52 | 2 | main.jpg |Avetar |
| 53 | 2 | jungle.jpg |Gallery|
And I want to get this result
| Id | Name | Lastname | Username | Image1 | Image2 |
|-------------------------------------------------------|
| 1 | Alexi| Lalas | alexi |face.jpg |image.jpg |
| 2 | Jack | Louis | louis |main.jpg |jungle.jpg|
For various reasons I can't handle this on the app side, so I have to do it on the SQL side.
Assume that there are always exactly two types of images, so we always have Image1 and Image2.
Any help would be appreciated.
Since there are only two types, you can use the CASE expression to do so. Something like this:
SELECT
m.Id,
m.Name,
m.LastName,
m.UserName,
MAX(CASE WHEN i.Type = 'Avetar' THEN i.Image END) AS 'Image1',
MAX(CASE WHEN i.Type ='Gallery' THEN i.Image END) AS 'Image2'
FROM member AS m
LEFT JOIN image AS i ON m.Id = i.MemberId
GROUP BY m.Id,
m.Name,
m.LastName,
m.UserName;
Note that the LEFT JOIN will include all the members from the member table even if they have no images in the image table; in that case Image1 and Image2 will be NULL.
See it in action here:
SQL Fiddle Demo
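In case the fiddle link is unavailable, the same pivot can be tried locally with this sketch, which uses Python's sqlite3 module in place of MySQL. Member 3 ('Sam Doe') is a hypothetical extra row with no images, added only to show the NULLs that the LEFT JOIN produces.

```python
import sqlite3


def members_with_images():
    """Pivot each member's Avetar and Gallery images into Image1 and Image2."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE member (Id INTEGER PRIMARY KEY, Name TEXT, Lastname TEXT, Username TEXT);
        CREATE TABLE image (Id INTEGER PRIMARY KEY, MemberId INTEGER, Image TEXT, Type TEXT);
        -- Member 3 is hypothetical and has no images.
        INSERT INTO member VALUES
            (1, 'Alexi', 'Lalas', 'alexi'),
            (2, 'Jack', 'Louis', 'louis'),
            (3, 'Sam', 'Doe', 'sam');
        INSERT INTO image VALUES
            (50, 1, 'face.jpg', 'Avetar'),
            (51, 1, 'image.jpg', 'Gallery'),
            (52, 2, 'main.jpg', 'Avetar'),
            (53, 2, 'jungle.jpg', 'Gallery');
    """)
    rows = conn.execute("""
        SELECT m.Id, m.Name, m.Lastname, m.Username,
               MAX(CASE WHEN i.Type = 'Avetar' THEN i.Image END) AS Image1,
               MAX(CASE WHEN i.Type = 'Gallery' THEN i.Image END) AS Image2
        FROM member AS m
        LEFT JOIN image AS i ON m.Id = i.MemberId
        GROUP BY m.Id, m.Name, m.Lastname, m.Username
        ORDER BY m.Id
    """).fetchall()
    conn.close()
    return rows
```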
One way to do this is via subselects (subqueries). You would subselect the first column as the member's image with the Avetar type, and the second column as the image with the Gallery type. If you can't make the combination of MemberId and Type unique in the image table, you are going to have issues with this type of query. Keep in mind that subselects can impact performance heavily.
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
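A rough sketch of what the subselect version might look like follows, run on Python's sqlite3 module instead of MySQL. The extra row 54 (a second Avetar image for member 1) is invented to show the uniqueness problem mentioned above; the ORDER BY ... LIMIT 1 inside each subselect is one possible guard, not part of the original suggestion.

```python
import sqlite3


def members_with_images_subselect():
    """Fetch Image1 and Image2 through one correlated subselect per column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE member (Id INTEGER PRIMARY KEY, Name TEXT, Lastname TEXT, Username TEXT);
        CREATE TABLE image (Id INTEGER PRIMARY KEY, MemberId INTEGER, Image TEXT, Type TEXT);
        INSERT INTO member VALUES (1, 'Alexi', 'Lalas', 'alexi'), (2, 'Jack', 'Louis', 'louis');
        -- Row 54 is hypothetical: a second Avetar image for member 1.
        INSERT INTO image VALUES
            (50, 1, 'face.jpg', 'Avetar'),
            (51, 1, 'image.jpg', 'Gallery'),
            (52, 2, 'main.jpg', 'Avetar'),
            (53, 2, 'jungle.jpg', 'Gallery'),
            (54, 1, 'second.jpg', 'Avetar');
    """)
    rows = conn.execute("""
        SELECT m.Id, m.Name, m.Lastname, m.Username,
               (SELECT i.Image FROM image i
                 WHERE i.MemberId = m.Id AND i.Type = 'Avetar'
                 ORDER BY i.Id LIMIT 1) AS Image1,
               (SELECT i.Image FROM image i
                 WHERE i.MemberId = m.Id AND i.Type = 'Gallery'
                 ORDER BY i.Id LIMIT 1) AS Image2
        FROM member m
        ORDER BY m.Id
    """).fetchall()
    conn.close()
    return rows
```

Without the LIMIT 1, a scalar subselect that finds two Avetar rows for the same member is an error in MySQL, which is the issue the paragraph above warns about.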
Sorry I couldn't think of a better way to title this. In Amazon's SimpleDB, an item can have multiple values in the same column, so it's possible to select only those items that have all of the attributes being sought.
In MySQL, let's say the following table ("Photo_Attributes") contains an unlimited number of attributes for photographs that are contained in another table ("Photos"), and that the two tables are joined by Item_Number.
And, let's say I wanted to find a hat whose color was red and size was medium, which in this case would be ITEM_ID "ABC" and not "OPQ".
+-----+----------+--------+-----------+-------+
| ID | Item_ID | Object | Attribute | Value |
+-----+----------+--------+-----------+-------+
| 1 | ABC | Hat | Color | Red |
+-----+----------+--------+-----------+-------+
| 2 | FGH | Pants | Color | Blue |
+-----+----------+--------+-----------+-------+
| 3 | FGH | Pants | Size | Large |
+-----+----------+--------+-----------+-------+
| 4 | LMN | Shirt | Color | Red |
+-----+----------+--------+-----------+-------+
| 5 | ABC | Hat | Size | Med |
+-----+----------+--------+-----------+-------+
| 6 | LMN | Shirt | Size | Med |
+-----+----------+--------+-----------+-------+
| 7 | OPQ | Hat | Color | White |
+-----+----------+--------+-----------+-------+
| 8 | OPQ | Hat | Size | Med |
+-----+----------+--------+-----------+-------+
The following query would yield no results because each row contains only one Attribute and one Value.
SELECT * FROM Photo_Attributes WHERE Object='Hat' AND (Attribute='Color' AND
Value='Red') AND (Attribute='Size' AND Value='Med');
And, this query would produce more rows than it should (i.e., all red and all medium-sized items).
SELECT * FROM Photo_Attributes WHERE Object='Hat' AND (Attribute='Color' AND
Value='Red') OR (Attribute='Size' AND Value='Med');
What's the best way to write this, and is there a way to do it without using JOIN in the SELECT statement? I'm wondering about the latter because the query would be programmatically generated (in Node.js) and the number of Attribute-Value pairs could range from one to several. I figure I could also use nested queries, culling from the recordset, but that seems equally inefficient.
SELECT pa1.Item_ID
FROM Photo_Attributes pa1
INNER JOIN Photo_Attributes pa2
ON pa1.Item_ID = pa2.Item_ID
AND pa2.Attribute = 'Size'
AND pa2.Value = 'Med'
WHERE pa1.Object = 'Hat'
AND pa1.Attribute = 'Color'
AND pa1.value = 'Red'
Assuming no overlap in your name/value pairs (e.g., you'd never have Size/Red or Color/Med), you could probably also do something like this.
SELECT pa.Item_ID
FROM Photo_Attributes pa
WHERE pa.Object = 'Hat'
AND pa.Attribute IN ('Size', 'Color')
AND pa.Value IN ('Med', 'Red')
GROUP BY pa.Item_ID
HAVING COUNT(DISTINCT Value) = 2
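Since the question mentions generating the query programmatically for a varying number of pairs, here is one way the GROUP BY / HAVING idea could be generalized. This is a sketch in Python with sqlite3 rather than Node.js and MySQL, and it is a variation on the query above: it matches each (Attribute, Value) pair explicitly and counts distinct attributes, so it does not rely on the no-overlap assumption. It does assume that each attribute appears at most once in the list of pairs. The function names are made up.

```python
import sqlite3


def build_item_query(obj, pairs):
    """Return (sql, params) selecting Item_IDs of `obj` that match every pair."""
    if not pairs:
        raise ValueError("at least one (attribute, value) pair is required")
    # One parenthesized condition per pair, combined with OR.
    pair_clause = " OR ".join("(Attribute = ? AND Value = ?)" for _ in pairs)
    sql = (
        "SELECT Item_ID FROM Photo_Attributes "
        "WHERE Object = ? AND (" + pair_clause + ") "
        "GROUP BY Item_ID "
        "HAVING COUNT(DISTINCT Attribute) = ? "  # every requested attribute matched
        "ORDER BY Item_ID"
    )
    params = [obj]
    for attribute, value in pairs:
        params.extend([attribute, value])
    params.append(len(pairs))
    return sql, params


def sample_connection():
    """Create an in-memory copy of the Photo_Attributes sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Photo_Attributes "
        "(ID INTEGER PRIMARY KEY, Item_ID TEXT, Object TEXT, Attribute TEXT, Value TEXT)"
    )
    conn.executemany("INSERT INTO Photo_Attributes VALUES (?, ?, ?, ?, ?)", [
        (1, "ABC", "Hat", "Color", "Red"), (2, "FGH", "Pants", "Color", "Blue"),
        (3, "FGH", "Pants", "Size", "Large"), (4, "LMN", "Shirt", "Color", "Red"),
        (5, "ABC", "Hat", "Size", "Med"), (6, "LMN", "Shirt", "Size", "Med"),
        (7, "OPQ", "Hat", "Color", "White"), (8, "OPQ", "Hat", "Size", "Med"),
    ])
    return conn


def find_items(conn, obj, pairs):
    """Run the generated query and return the matching Item_IDs."""
    sql, params = build_item_query(obj, pairs)
    return [row[0] for row in conn.execute(sql, params)]
```

Calling find_items(sample_connection(), "Hat", [("Color", "Red"), ("Size", "Med")]) should return only the ABC item. The values are passed as bound parameters, so only the number of placeholders varies with the number of pairs.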
I've got the following schema:
phrase (in U.S. English): translation: code_value:
---------------------- ----------------------------------------- ------------------------------
| phrase_id | phrase | | phrase_id | translation | language_cd | | code_class | code_value |
---------------------- ----------------------------------------- ------------------------------
| 1 | cheese | | 1 | fromage | FR | | LANGUAGE_CD | FR |
---------------------- ----------------------------------------- | LANGUAGE_CD | DE |
| LANGUAGE_CD | ES |
------------------------------
What this collection of data does is, for a given U.S. English phrase, it will give you the corresponding translation in three languages, French, German, and Spanish (at least within the context of our web app - we're not trying to be Google Translate or anything).
What I'm trying to do is get a list of all translations for a given phrase, and if no translation into a given destination language exists, I want it to return NULL.
My query so far is:
SELECT phrase.phrase_id, phrase.string orig_phrase, code_value.code_value, translation.string as trans_phrase
FROM phrase, translation, code_value
WHERE code_value.code_class = 'LANGUAGE_CD' AND translation.phrase_id = phrase.phrase_id
ORDER BY orig_phrase;
Which returns:
-------------------------------------------------------
| phrase_id | orig_phrase | code_value | trans_phrase |
-------------------------------------------------------
| 1 | cheese | FR | fromage |
| 1 | cheese | DE | fromage |
| 1 | cheese | ES | fromage |
-------------------------------------------------------
But what I intend for it to return is:
-------------------------------------------------------
| phrase_id | orig_phrase | code_value | trans_phrase |
-------------------------------------------------------
| 1 | cheese | FR | fromage |
| 1 | cheese | DE | <NULL> |
| 1 | cheese | ES | <NULL> |
-------------------------------------------------------
I know I need a LEFT or RIGHT JOIN in there to get the NULL values back, but I can't seem to figure out exactly how to write it.
When this is all said and done, obviously we'll have the German and Spanish translations in there as well, and I need them to match up, but the purpose of the NULL values is to clearly show us into what languages we have yet to translate a given phrase.
SELECT phrase.phrase_id, phrase.string orig_phrase, code_value.code_value, translation.string as trans_phrase
FROM phrase
inner join translation on (translation.phrase_id = phrase.phrase_id)
left join code_value on (code_value.code_value=translation.language_cd)
WHERE code_value.code_class = 'LANGUAGE_CD'
ORDER BY orig_phrase;
You are right that you'll need a LEFT or a RIGHT join. This should work:
SELECT `p`.`phrase_id`, `p`.`phrase` `orig_phrase`, `c`.`code_value`, `t`.`translation` `trans_phrase`
FROM `phrase` `p`
INNER JOIN (
SELECT `code_value`
FROM `code_value`
WHERE `code_class` = 'LANGUAGE_CD'
) `c`
LEFT JOIN `translation` `t` ON `p`.`phrase_id` = `t`.`phrase_id` AND `c`.`code_value` = `t`.`language_cd`;
Hope this helps.
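To check the shape of the result, the sketch below loads the sample rows and runs the same idea on Python's sqlite3 module rather than MySQL. It writes the unconditioned join as CROSS JOIN to make the intent explicit (in MySQL, INNER JOIN without an ON clause behaves the same way), and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3


def translation_coverage():
    """Return one row per phrase and language, with NULL where no translation exists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE phrase (phrase_id INTEGER PRIMARY KEY, phrase TEXT);
        CREATE TABLE translation (phrase_id INTEGER, translation TEXT, language_cd TEXT);
        CREATE TABLE code_value (code_class TEXT, code_value TEXT);
        INSERT INTO phrase VALUES (1, 'cheese');
        INSERT INTO translation VALUES (1, 'fromage', 'FR');
        INSERT INTO code_value VALUES
            ('LANGUAGE_CD', 'FR'), ('LANGUAGE_CD', 'DE'), ('LANGUAGE_CD', 'ES');
    """)
    rows = conn.execute("""
        SELECT p.phrase_id, p.phrase AS orig_phrase, c.code_value,
               t.translation AS trans_phrase
        FROM phrase p
        -- Pair every phrase with every language code first...
        CROSS JOIN (
            SELECT code_value FROM code_value WHERE code_class = 'LANGUAGE_CD'
        ) c
        -- ...then attach the translation for that exact pair, if there is one.
        LEFT JOIN translation t
               ON p.phrase_id = t.phrase_id
              AND c.code_value = t.language_cd
        ORDER BY p.phrase_id, c.code_value
    """).fetchall()
    conn.close()
    return rows
```

The key point is that the language match sits in the ON clause of the LEFT JOIN; moving it to a WHERE clause would filter the NULL rows back out.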