I need to join two tables and append columns where certain data matches. The two tables are set up like so:
Apple | Flavor
Fuji | 1
Red Delicious | 2
Flavor | Quality
1 | Sour
1 | Sweet
2 | Bitter
2 | Sweet
And I need to append a column for each matching flavor quality like so:
Apple | Quality 1 | Quality 2
Fuji | sour | sweet
Red Delicious | bitter | sweet
Notice I also need to append an auto-incrementing integer to each appended column header.
This seems like a tall task and I'm not sure where to start? Is this even possible? Thanks for the help in advance.
You can group related rows together and concatenate the Quality values for the respective group. This produces one column, a string with comma-separated values. See GROUP_CONCAT().
SELECT a.Apple, GROUP_CONCAT(f.Quality) AS Qualities
FROM apples a JOIN flavors f ON a.flavor = f.flavor
GROUP BY a.Apple;
This doesn't do exactly what you asked, because it doesn't split the Quality values into separate columns. But it's a pretty simple query, and you can explode() the string of qualities after you fetch it back into your application.
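Splitting the concatenated string on the application side might look like the sketch below. It is written in Python rather than PHP, so `str.split` plays the role of explode(). The function name `split_qualities` and the "Quality N" key format are assumptions for illustration, and the sketch assumes no quality value itself contains a comma.

```python
def split_qualities(rows, sep=","):
    """Turn (apple, "Sour,Sweet") rows into dicts with numbered quality keys."""
    result = []
    for apple, qualities in rows:
        record = {"Apple": apple}
        # Number from 1 so the keys read "Quality 1", "Quality 2", ...
        for number, quality in enumerate(qualities.split(sep), start=1):
            record[f"Quality {number}"] = quality
        result.append(record)
    return result


# Rows shaped like the output of the GROUP_CONCAT query above.
rows = [("Fuji", "Sour,Sweet"), ("Red Delicious", "Bitter,Sweet")]
for record in split_qualities(rows):
    print(record)
```

Each apple ends up with as many numbered keys as it has qualities, so the variable column count is handled outside SQL.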
You can't have a variable number of columns like that. Use one query to fetch the apples, then loop over the result set to fetch their qualities. You can merge the result sets together using your server-side scripting language if you like.
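A minimal sketch of that two-step fetch, using Python and an in-memory SQLite database as a stand-in for MySQL and the server-side language (both are assumptions, as is the helper name `apples_with_qualities`). The table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apples (Apple TEXT, Flavor INTEGER);
    CREATE TABLE flavors (Flavor INTEGER, Quality TEXT);
    INSERT INTO apples VALUES ('Fuji', 1), ('Red Delicious', 2);
    INSERT INTO flavors VALUES (1, 'Sour'), (1, 'Sweet'), (2, 'Bitter'), (2, 'Sweet');
""")


def apples_with_qualities(conn):
    """Fetch the apples first, then look up each apple's qualities."""
    merged = {}
    apples = conn.execute(
        "SELECT Apple, Flavor FROM apples ORDER BY Apple"
    ).fetchall()
    for apple, flavor in apples:
        qualities = conn.execute(
            "SELECT Quality FROM flavors WHERE Flavor = ? ORDER BY Quality",
            (flavor,),
        )
        merged[apple] = [row[0] for row in qualities]
    return merged


print(apples_with_qualities(conn))
```

This runs one extra query per apple, which is fine for small result sets but may be worth replacing with a single join on larger ones.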
Contrary to what Mark said, you can have a variable number of columns, though I'm demonstrating here just how nonsensical it actually is.
Usually you would have a single statement like
SELECT * FROM apples a JOIN flavors f ON a.flavor = f.flavor;
then you'd do a little voodoo at the application level and that's it. Here's what this simple query would look like to get a static number of columns:
select
Apple,
max(case when my_quality_numbering = 1 then Quality else null end) as Quality1,
max(case when my_quality_numbering = 2 then Quality else null end) as Quality2
from (
select
a.Apple,
f.Quality,
case when (if(@prev_apple != a.Apple, @quality_no := 1, @quality_no := @quality_no + 1)) is null then null
when (@prev_apple := a.Apple) is null then null
else @quality_no end as my_quality_numbering
from
apples a
join flavors f on a.Flavor = f.Flavor
, (select @quality_no := 0, @prev_apple := NULL) v
order by a.Apple
) sq
group by Apple
see it working live in an sqlfiddle
And this is only half the work. To have a variable number of columns, you'd have to write a stored procedure that reads from the inner query how many columns are needed. Then you'd have to build the query string for the outer query and execute all of that via a prepared statement. That's a lot of work for a simple query. Again, I wrote the above query just for fun; in general it's really not worth the trouble.
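The "build your query string" step could be sketched as follows. This is Python building text on the application side rather than a MySQL stored procedure, and the helper name `build_quality_columns` is hypothetical. It only generates the select list for the outer query; the inner query and the execution step are left out.

```python
def build_quality_columns(column_count):
    """Build the outer query's select list for `column_count` quality columns."""
    if column_count < 1:
        raise ValueError("column_count must be at least 1")
    # column_count is an integer, so nothing user-supplied ends up in the SQL text.
    return ",\n".join(
        f"max(case when my_quality_numbering = {n} then Quality end) as Quality{n}"
        for n in range(1, column_count + 1)
    )


# Three columns instead of the two hard-coded in the query above.
print(build_quality_columns(3))
```

The column count would come from a separate query, for example the highest my_quality_numbering value the inner query produces.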
Currently the database looks like this: (product IDs with name value pairs)
id, attribute_name, attribute_value
1, Clockspeed, 1.6Ghz
1, Screen, 13.3"
2, Clockspeed, 1.8Ghz
2, Screen, 15.1"
I would like to convert the above data to the following format (separated by product ID, with only one line per id) for migrating to a new platform.
id, Clockspeed, Screen
1, 1.6Ghz, 13.3"
2, 1.8Ghz, 15.1"
What is the easiest way to achieve this result? My gut tells me this is going to be done with the concat or group_concat function but I need a point in the right direction, going bald from pulling my hair out.
This points out one of the problems with the entity-attribute-value database design.
There are two methods to use SQL to pivot the attributes into columns, as though you had stored the data in a conventional table:
SELECT id, MAX(CASE attribute_name WHEN 'Clockspeed' THEN attribute_value END) AS Clockspeed,
MAX(CASE attribute_name WHEN 'Screen' THEN attribute_value END) AS Screen
FROM eav_table
GROUP BY id;
SELECT id, c.attribute_value AS Clockspeed, s.attribute_value AS Screen
FROM eav_table AS c
JOIN eav_table AS s USING(id)
WHERE c.attribute_name = 'Clockspeed' AND s.attribute_name = 'Screen'
Output of both queries after testing on MySQL 5.6:
+------+------------+--------+
| id | Clockspeed | Screen |
+------+------------+--------+
| 1 | 1.6GHz | 13.3" |
| 2 | 1.8GHz | 15.1" |
+------+------------+--------+
The latter solution requires N-1 joins to output N attributes. It doesn't scale well.
Both of the above solutions require that you write quite a bit of application code to format the SQL query, according to the number of attributes you want to fetch. And that means if the number of attributes varies (which is likely because that's one of the primary advantages of using EAV), then it's possible to fetch too many attributes for the query to have good performance.
Another solution is to forget about pivoting the data using only SQL. Instead, fetch the rows of data back to your application one attribute per row, as they are stored in the database. Then write application code to post-process the results into one object per id.
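A minimal sketch of that post-processing step in Python, assuming the rows arrive as (id, attribute_name, attribute_value) tuples from a plain `SELECT` on the EAV table. The function name `pivot_eav` is an assumption for illustration.

```python
from collections import defaultdict


def pivot_eav(rows):
    """Collapse one-attribute-per-row data into one dict per product id."""
    products = defaultdict(dict)
    for product_id, name, value in rows:
        products[product_id][name] = value
    return dict(products)


rows = [
    (1, "Clockspeed", "1.6Ghz"),
    (1, "Screen", '13.3"'),
    (2, "Clockspeed", "1.8Ghz"),
    (2, "Screen", '15.1"'),
]
for product_id, attributes in pivot_eav(rows).items():
    print(product_id, attributes)
```

The query stays the same no matter how many attributes exist, and a product that lacks an attribute simply has no key for it.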
For simplicity, let's say we have a table with two columns: uid (user id) and fruit, describing what kinds of fruit a user likes.
E.g.:
uid | fruit
----|------------
1 | Strawberry
1 | Orange
2 | Strawberry
2 | Banana
3 | Watermelon
and so forth.
If I want to find what kinds of fruit are common in N particular users (i.e. the intersection N times of the table with itself), the first option is to use an INNER JOIN.
SELECT DISTINCT fruit FROM Fruits f1
INNER JOIN Fruits f2 USING (fruit)
INNER JOIN Fruits f3 USING (fruit)
...
INNER JOIN Fruits fN USING (fruit)
WHERE f1.uid = 1 AND f2.uid = 2 ... AND fN.uid = M
But this kind of looks silly to me. What if N = 10, or even 20? Is it sensible to do 20 joins? Is there some other join operation I'm missing?
Before learning the "magic" of joins, I used another method, which would apply in my current case as follows:
SELECT DISTINCT fruit FROM Fruits
WHERE uid IN (1, 2, ..., M)
GROUP BY fruit
HAVING COUNT(*) = N
It seems much more compact, but I remember somebody telling me to avoid using GROUP BY because it is slower than an INNER JOIN.
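The GROUP BY / HAVING method can be tried out on the sample data with the sketch below. It uses Python with an in-memory SQLite database instead of MySQL, and the helper name `common_fruits` is hypothetical. It also counts `DISTINCT uid` rather than `*`, which is an assumption added so that a duplicate (uid, fruit) row cannot inflate the count.

```python
import sqlite3


def common_fruits(conn, uids):
    """Return the fruits liked by every user in `uids`."""
    placeholders = ",".join("?" for _ in uids)
    sql = (
        "SELECT fruit FROM Fruits "
        f"WHERE uid IN ({placeholders}) "
        "GROUP BY fruit "
        "HAVING COUNT(DISTINCT uid) = ? "
        "ORDER BY fruit"
    )
    return [row[0] for row in conn.execute(sql, (*uids, len(uids)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fruits (uid INTEGER, fruit TEXT)")
conn.executemany(
    "INSERT INTO Fruits VALUES (?, ?)",
    [(1, "Strawberry"), (1, "Orange"), (2, "Strawberry"),
     (2, "Banana"), (3, "Watermelon")],
)
print(common_fruits(conn, [1, 2]))
```

The query text has the same shape for any N, only the placeholder list grows. Which method is faster still depends on the data and indexes, so it would need to be measured.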
So, I guess my question really is, is there maybe a third method for doing the above? If yes/no, which one is the most efficient?
-- EDIT --
So, it seems a question has been asked before, bearing a resemblance to mine. The two answers provided, are actually the two methods I'm using.
But the question remains. Which one is really more efficient? Is there, maybe, a third one?
I have a table like this
id | user_id | code | type | time
-----------------------------------
2 2 fdsa r 1358300000
3 2 barf r 1358311000
4 2 yack r 1358311220
5 3 surf r 1358311000
6 3 yooo r 1358300000
7 4 poot r 1358311220
I want to get the concatenated 'code' column for user 2 and user 3 for each matching time.
I want to receive a result set like this:
code | time
-------------------------------
fdsayooo 1358300000
barfsurf 1358311000
Please note that there is no yackpoot code because the query was not looking for user 4.
You can use GROUP_CONCAT function. Try this:
SELECT GROUP_CONCAT(code SEPARATOR '') code, time
FROM tbl
WHERE user_id in (2, 3)
GROUP BY time
HAVING COUNT(time) = 2;
What you are looking for is GROUP_CONCAT, but you are missing a lot of details in your question to provide a good example. This should get you started:
SELECT GROUP_CONCAT(code), time
FROM myTable
WHERE user_id in (2, 3)
GROUP BY time;
Missing details are:
Is an order required? I'm not sure how ordering would be done using grouping; I would need to test it if it's critical.
Need other fields? If so you will likely end up needing to do a sub-select or secondary query.
Do you only want results with multiple times?
Do you really want no separator between values in the results column? (Specify the delimiter with SEPARATOR '' in the GROUP_CONCAT.)
Notes:
You can add more fields to the GROUP BY if you want to do it by something else (like user_id and time).
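If the order of the concatenated codes matters, one way to control it is to do the concatenation in application code. The sketch below is Python operating on already-fetched (user_id, code, time) rows. The function name `concat_codes` is hypothetical, and it assumes each user has at most one code per time value.

```python
from collections import defaultdict


def concat_codes(rows, user_ids):
    """Join codes per time, ordered by user_id, keeping only times all users share."""
    wanted = set(user_ids)
    by_time = defaultdict(dict)
    for user_id, code, time in rows:
        if user_id in wanted:
            by_time[time][user_id] = code
    result = []
    for time in sorted(by_time):
        codes = by_time[time]
        # Skip times where some requested user has no row.
        if set(codes) == wanted:
            result.append(("".join(codes[u] for u in sorted(codes)), time))
    return result


rows = [
    (2, "fdsa", 1358300000), (2, "barf", 1358311000), (2, "yack", 1358311220),
    (3, "surf", 1358311000), (3, "yooo", 1358300000), (4, "poot", 1358311220),
]
print(concat_codes(rows, [2, 3]))
```

Sorting by user_id puts user 2's code before user 3's, which matches the result set in the question.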
I think it will be easiest to start with the table I have and the result I am aiming for.
Name | Date
A | 03/01/2012
A | 03/01/2012
B | 02/01/2012
A | 02/01/2012
B | 02/01/2012
A | 02/01/2012
B | 01/01/2012
B | 01/01/2012
A | 01/01/2012
I want the result of my query to be:
Name | 01/01/2012 | 02/01/2012 | 03/01/2012
A | 1 | 2 | 2
B | 2 | 2 | 0
So basically I want to count the number of rows that have the same date, but for each individual name. So a simple group by of dates won't do because it would merge the names together. And then I want to output a table that shows the counts for each individual date using php.
I've seen answers suggest something like this:
SELECT
NAME,
SUM(CASE WHEN GRADE = 1 THEN 1 ELSE 0 END) AS GRADE1,
SUM(CASE WHEN GRADE = 2 THEN 1 ELSE 0 END) AS GRADE2,
SUM(CASE WHEN GRADE = 3 THEN 1 ELSE 0 END) AS GRADE3
FROM Rodzaj
GROUP BY NAME
so I imagine there would be a way for me to tweak that but I was wondering if there is another way, or is that the most efficient?
I was thinking that perhaps the while loop could output just one specific name and date at a time along with the count, so the first result would be A,01/01/2012,1, then A,02/01/2012,2, then A,03/01/2012,2, then B,01/01/2012,2, and so on. Perhaps that would be doable through a different technique, but I'm not sure whether something like that is possible or whether it would be efficient.
So I'm basically looking to see if anyone has any ideas that are a bit outside the box for this and how they would compare.
I hope I explained everything well enough and thanks in advance for any help.
You have to include two columns in your GROUP BY:
SELECT name, date, COUNT(*) AS count
FROM your_table
GROUP BY name, date
This will get the counts of each name -> date combination in row-format. Since you also wanted to include a 0 count if the name didn't have any rows on a certain date, you can use:
SELECT a.name,
b.date,
COUNT(c.name) AS date_count
FROM (SELECT DISTINCT name FROM your_table) a
CROSS JOIN (SELECT DISTINCT date FROM your_table) b
LEFT JOIN your_table c ON a.name = c.name AND
b.date = c.date
GROUP BY a.name,
b.date
You're asking for a "pivot". Basically, it is what it is. The real problem with a pivot is that the column names must adapt to the data, which is impossible to do with SQL alone.
Here's how you do it:
SELECT
Name,
SUM(`Date` = '01/01/2012') AS `01/01/2012`,
SUM(`Date` = '02/01/2012') AS `02/01/2012`,
SUM(`Date` = '03/01/2012') AS `03/01/2012`
FROM mytable
GROUP BY Name
Note the cool way you can SUM() a condition in MySQL: because in MySQL true is 1 and false is 0, summing a condition is equivalent to counting the number of times it's true.
It is not more efficient to use an inner group by first.
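Since the column names have to come from the data, the pivot query can be generated from a list of dates. The sketch below does that in Python against an in-memory SQLite database, used here as a stand-in for MySQL (an assumption; SQLite also evaluates a comparison to 1 or 0). The helper name `pivot_counts` is hypothetical.

```python
import sqlite3


def pivot_counts(conn, dates):
    """Build and run one SUM(condition) column per date."""
    columns = ", ".join(
        # The date value is bound as a parameter; the alias is a quoted identifier.
        'SUM("Date" = ?) AS "{}"'.format(d.replace('"', '""'))
        for d in dates
    )
    sql = f"SELECT Name, {columns} FROM mytable GROUP BY Name ORDER BY Name"
    return conn.execute(sql, dates).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (Name TEXT, "Date" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", "03/01/2012"), ("A", "03/01/2012"), ("B", "02/01/2012"),
     ("A", "02/01/2012"), ("B", "02/01/2012"), ("A", "02/01/2012"),
     ("B", "01/01/2012"), ("B", "01/01/2012"), ("A", "01/01/2012")],
)
for row in pivot_counts(conn, ["01/01/2012", "02/01/2012", "03/01/2012"]):
    print(row)
```

The list of dates could itself come from a `SELECT DISTINCT` on the date column or from a date range generated in the application.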
Just in case anyone is interested in what was the best method:
Zane's second suggestion was the slowest. I loaded in a third of the data I used for the other two and it still took quite a while. Perhaps it would be more efficient on smaller tables. I am not working with a huge table, but roughly 28,000 rows was enough to create significant lag, even with the BETWEEN clause dropping the result to about 4,000 rows.
Bohemian's answer gave me the least amount to code, I threw in a loop to create all the case statements and it worked with relative ease. The benefit of this method was the simplicity, besides creating the loop for the cases, the results come in without the need for any php tricks, just simple foreach to get all the columns. Recommended for those not confident with php.
However, I found Zane's first suggestion the quickest performing and despite the need for extra php coding it seems I will be sticking with this method. The disadvantage of this method is that it only gives the dates that actually have data, so creating a table with all the dates becomes a bit more complicated. What I did was create a variable that keeps track of what date it is supposed to be compared to the table column which is reset on each table row, when the result of the query is equal to that date it echoes the value otherwise it does a while loop echoing table cells with 0 until the dates do match. It also had to do a check to see if the 'Name' value is still the same and if not it would switch to the next row after filling in any missing cells with 0 to the end of that row. If anyone is interested in seeing the code you can message me.
Results of the two methods over 3 months of data (a column for each day, so roughly 90 case statements), ~12,000 rows out of 28,000:
Bohemian's Pivot - ~0.158s (highest seen ~0.36s)
Zane's Double Group by - ~0.086s (highest seen ~0.15s)
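The zero-filling needed after the double GROUP BY query can be sketched as below. Note that this is not the poster's code: it is Python rather than PHP, and it uses a dictionary lookup per date instead of the date-tracking loop described above. The function name `fill_missing_dates` and the sample rows are assumptions for illustration.

```python
def fill_missing_dates(grouped_rows, all_dates):
    """Turn (name, date, count) rows into one list of counts per name, 0 where absent."""
    counts = {}
    for name, date, count in grouped_rows:
        counts.setdefault(name, {})[date] = count
    return {
        name: [per_date.get(date, 0) for date in all_dates]
        for name, per_date in counts.items()
    }


# Rows shaped like the output of GROUP BY name, date (no row for B on 03/01/2012).
grouped_rows = [
    ("A", "01/01/2012", 1), ("A", "02/01/2012", 2), ("A", "03/01/2012", 2),
    ("B", "01/01/2012", 2), ("B", "02/01/2012", 2),
]
all_dates = ["01/01/2012", "02/01/2012", "03/01/2012"]
for name, row in fill_missing_dates(grouped_rows, all_dates).items():
    print(name, row)
```

Each list lines up with `all_dates`, so it can be written out as one table row per name.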
Firstly, I'd like to apologize for the potentially misleading title... I am finding it difficult to describe what I am trying to do here.
With the current project I'm working on, we have set up a 'dynamic' database structure with MySQL that looks something like this.
item_details ( Describes the item_data )
fieldID | fieldValue | fieldCaption
1 | addr1 | Address Line 1
2 | country | Country
item_data
itemID | fieldID | fieldValue
12345 | 1 | Some Random Address
12345 | 2 | United Kingdom
So as you can see, if for example I wanted to look up the address for item 12345, I would simply run this statement:
SELECT fieldValue FROM item_data WHERE fieldID=1 and itemID=12345;
But here is where I am stuck... the database is relatively large with around ~80k rows and I am trying to create a set of search functions within PHP.
I would like to be able to perform a query on the result set of a query as quickly as possible...
For example, Search an address name within a certain country... ie: Search for the fieldValue of the results with the same itemID's as the results from the query:
SELECT itemID FROM item_data WHERE fieldID=2 AND fieldValue='United Kingdom';
Sorry If I am unclear, I have been struggling with this for the past couple of days...
Cheers
You can do this in a couple of ways. One is to use multiple joins to the item_data table with the fieldID limited to whatever it is you want to get.
SELECT *
FROM
Item i
INNER JOIN item_data country
ON i.itemID = country.itemID
AND country.fieldID = 2
INNER JOIN item_data address
ON i.itemID = address.itemID
AND address.fieldID = 1
WHERE
country.fieldValue = 'United Kingdom'
AND address.fieldValue = 'Whatever';
As an aside, this structure is often referred to as an Entity-Attribute-Value (EAV) database.
Sorry in advance if this sounds patronizing, but (as you suggested) I'm not quite clear what you are asking for.
If you are looking for one query to do the whole thing, you could simply nest them. For your example, pretend there is a table named CACHED with the results of your UK query, and write the query you want against that, but replace CACHED with your UK query.
If the idea is that you have ALREADY done this UK query and want to (re-)use its results, you could save the results to a table in the DB (which may not be practical if there are a large number of queries executed), or save the list of IDs as text and paste that into the subsequent query (...WHERE ID in (...) ... ), which might be OK if your 'cached' query gives you a manageable fraction of the original table.
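The nesting option might look like the sketch below, which puts the UK query inside an `IN (...)` subquery. It uses Python with an in-memory SQLite database as a stand-in for MySQL and PHP (an assumption). The helper name `search_in_country`, the LIKE pattern, and the second item (67890) are hypothetical additions for illustration.

```python
import sqlite3


def search_in_country(conn, country, address_pattern):
    """Find addresses matching a pattern among items located in `country`."""
    sql = """
        SELECT itemID, fieldValue
        FROM item_data
        WHERE fieldID = 1
          AND fieldValue LIKE ?
          AND itemID IN (
              SELECT itemID FROM item_data
              WHERE fieldID = 2 AND fieldValue = ?
          )
        ORDER BY itemID
    """
    return conn.execute(sql, (address_pattern, country)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_data (itemID INTEGER, fieldID INTEGER, fieldValue TEXT)"
)
conn.executemany(
    "INSERT INTO item_data VALUES (?, ?, ?)",
    [(12345, 1, "Some Random Address"), (12345, 2, "United Kingdom"),
     (67890, 1, "Another Random Address"), (67890, 2, "France")],
)
print(search_in_country(conn, "United Kingdom", "%Random%"))
```

Both items have an address containing "Random", but only item 12345 passes the country filter.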