MySQL JOIN Statement from Multiple Tables - mysql

I have an old database of entries from an abandoned "Joomgalaxy" Joomla plugin.
There are three tables: joomgalaxy_entries, joomgalaxy_fields, and joomgalaxy_entries_data.
The id from the entries table matches the entry_id in the entries_data table, but the actual field name is saved in another table, fields.
Can someone please help me with the correct SQL statement to obtain results like the ones shown below under Ultimate Goal? My MySQL knowledge is very basic, and from my searching it sounds like I need to use a LEFT JOIN, but I have no idea how to use the value from field_name as the column name for the returned values.
Thank You!!
joomgalaxy_entries
---------------------------------------
| id | title | longitude | latitude |
---------------------------------------
| 50 | John | -79.333333 | 43.669999 |
| 51 | Bob | -79.333333 | 43.669999 |
---------------------------------------
joomgalaxy_fields
Only two example fields are shown below to keep it simple. There are more than these two, so the query would have to use field_name as the column name dynamically.
--------------------------------
| id | field_type | field_name |
--------------------------------
| 1 | textbox | websiteurl |
| 2 | dropdown | occupation |
--------------------------------
joomgalaxy_entries_data
"Technically" there shouldn't be any duplicate entries (fieldid and entry_id), so from my understanding that shouldn't affect using the field_name from above as the column name, but what if there ends up being one?
-------------------------------------
| fieldid | field_value | entry_id |
-------------------------------------
| 1 | google.com | 50 |
| 2 | unemployed | 50 |
| 1 | doctor.com | 51 |
| 2 | doctor | 51 |
-------------------------------------
Ultimate Goal
Ultimately trying to get this type of result, so I can then use that statement in MySQL Workbench to export the data that would look like this:
------------------------------------------------------------------
| id | title | longitude | latitude | websiteurl | occupation |
------------------------------------------------------------------
| 50 | John | -79.333333 | 43.669999 | google.com | unemployed |
| 51 | Bob | -79.333333 | 43.669999 | doctor.com | doctor |
------------------------------------------------------------------
EDIT:
There are more than just the two fields websiteurl and occupation; I was only using those two as examples. There are numerous fields that are all different, so in theory the value pulled from field_name would be used as the column name.

You can use some conditional logic, like a CASE statement, along with an aggregate function like max() or min() to return those values as columns:
SELECT je.id,
je.title,
je.longitude,
je.latitude,
max(case when jed.fieldid = 1 then jed.field_value end) as WebsiteUrl,
max(case when jed.fieldid = 2 then jed.field_value end) as Occupation
FROM joomgalaxy_entries je
INNER JOIN joomgalaxy_entries_data jed
on je.id = jed.entry_id
GROUP BY je.id,
je.title,
je.longitude,
je.latitude
Using an INNER JOIN will only return the joomgalaxy_entries rows that have matching rows in joomgalaxy_entries_data. If you want to return all joomgalaxy_entries rows even when there are no matching rows to join on, change the INNER JOIN to a LEFT JOIN.
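To try the conditional-aggregation pivot without touching the real database, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here (an assumption for the sake of a runnable example); the query uses a LEFT JOIN so entries without data rows would still appear.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the pivot query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joomgalaxy_entries (id INTEGER, title TEXT, longitude REAL, latitude REAL);
CREATE TABLE joomgalaxy_entries_data (fieldid INTEGER, field_value TEXT, entry_id INTEGER);
INSERT INTO joomgalaxy_entries VALUES (50, 'John', -79.333333, 43.669999),
                                      (51, 'Bob',  -79.333333, 43.669999);
INSERT INTO joomgalaxy_entries_data VALUES (1, 'google.com', 50), (2, 'unemployed', 50),
                                           (1, 'doctor.com', 51), (2, 'doctor', 51);
""")

# One MAX(CASE ...) per field turns the matching data row into a column.
pivot_sql = """
SELECT je.id, je.title, je.longitude, je.latitude,
       MAX(CASE WHEN jed.fieldid = 1 THEN jed.field_value END) AS websiteurl,
       MAX(CASE WHEN jed.fieldid = 2 THEN jed.field_value END) AS occupation
FROM joomgalaxy_entries je
LEFT JOIN joomgalaxy_entries_data jed ON je.id = jed.entry_id
GROUP BY je.id, je.title, je.longitude, je.latitude
ORDER BY je.id
"""
pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

If a (fieldid, entry_id) pair ever appears twice, MAX() simply picks the largest of the duplicate values instead of failing.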

You can write a simple SELECT query like this:
SELECT je.id, je.title, je.longitude, je.latitude,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 1 AND entry_id = je.id) AS websiteurl,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 2 AND entry_id = je.id) AS occupation
FROM joomgalaxy_entries je;

First step is easy:
SELECT JE.id, JE.title, JE.longitude, JE.latitude
FROM joomgalaxy_entries JE
Now you need to JOIN:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
JD.*
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
Now you need to convert rows to columns:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
MIN(CASE WHEN fieldid = 1 THEN JD.field_value END) as WebsiteUrl,
MIN(CASE WHEN fieldid = 2 THEN JD.field_value END) as Occupation
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
GROUP BY JE.id, JE.title, JE.longitude, JE.latitude
This depends on there being only two fields for each entry. If the number of fields is dynamic, you would need a different approach.

This should work:
select id, title, longitude, latitude,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'websiteurl')
and jed.entry_id = je.id
) as websiteurl,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'occupation')
and jed.entry_id = je.id) as occupation
from joomgalaxy_entries je;
Note that the reason to have a left join would be if either websiteurl or occupation were null, however, this solution should work in that case anyway.

Well, that certainly makes it a bit more difficult... :) Honestly, I'm not sure what you're asking is possible with a static sql query. I'm sure someone will speak up, however, if I'm wrong.
That said, I do have a few options you can try:
Option 1 - Generate the SQL Dynamically
Assuming this is mysql, if you execute the following SQL, it will generate the subqueries dynamically:
select concat('(select field_value from joomgalaxy_entries_data jed ',
'where fieldid = (select id from joomgalaxy_fields ',
'where field_name = ''', field_name, ''') ',
'and jed.entry_id = je.id) as ', field_name, ',')
from joomgalaxy_fields;
Take the result of that command, copy-paste it into a text editor, remove the trailing comma from the last generated line, and add the following at the beginning:
select id, title, longitude, latitude,
And the rest of this at the end:
from joomgalaxy_entries je;
Then run your new uber-query and go grab a cup of coffee, lunch, or a good night's sleep depending on how much data is in your database.
Alternatively, you could add all of this to a stored procedure so you don't have to hand edit the SQL. Also, note that my syntax works for MySQL. Other databases have different concatenation operators so you may have to work around that if applicable. Also, with 50+ subqueries there is a good chance this uber-query will be quite slow, maybe too slow to make this option viable.
Option 2 - Create a table structured the way you want, and populate it
Hopefully, this is self-explanatory, but just create a new table with all of the necessary columns from the joomgalaxy_fields table. Then populate each column separately with a long series of what should be pretty straightforward sql commands. Granted this option is only viable if the database is no longer in use which I believe you indicated. From there the result is just:
select * from my_new_table;
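As a variation on Option 1, the column list can also be generated by a small script instead of by hand-editing the output. The sketch below is an assumption-laden illustration, not the answerer's method: it uses Python with an in-memory SQLite database as a stand-in for MySQL, a hypothetical helper named build_pivot_sql, and MAX(CASE ...) columns rather than the correlated subqueries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joomgalaxy_entries (id INTEGER, title TEXT, longitude REAL, latitude REAL);
CREATE TABLE joomgalaxy_fields (id INTEGER, field_type TEXT, field_name TEXT);
CREATE TABLE joomgalaxy_entries_data (fieldid INTEGER, field_value TEXT, entry_id INTEGER);
INSERT INTO joomgalaxy_entries VALUES (50, 'John', -79.333333, 43.669999),
                                      (51, 'Bob', -79.333333, 43.669999);
INSERT INTO joomgalaxy_fields VALUES (1, 'textbox', 'websiteurl'), (2, 'dropdown', 'occupation');
INSERT INTO joomgalaxy_entries_data VALUES (1, 'google.com', 50), (2, 'unemployed', 50),
                                           (1, 'doctor.com', 51), (2, 'doctor', 51);
""")


def build_pivot_sql(connection):
    """Return a pivot query with one MAX(CASE ...) column per row of joomgalaxy_fields."""
    select_list = ["je.id", "je.title", "je.longitude", "je.latitude"]
    fields = connection.execute(
        "SELECT id, field_name FROM joomgalaxy_fields ORDER BY id"
    ).fetchall()
    for field_id, field_name in fields:
        # Double-quote the alias (SQLite/ANSI style); MySQL would use backticks instead.
        alias = '"' + field_name.replace('"', '""') + '"'
        select_list.append(
            f"MAX(CASE WHEN jed.fieldid = {int(field_id)} THEN jed.field_value END) AS {alias}"
        )
    return (
        "SELECT " + ",\n       ".join(select_list) + "\n"
        "FROM joomgalaxy_entries je\n"
        "LEFT JOIN joomgalaxy_entries_data jed ON jed.entry_id = je.id\n"
        "GROUP BY je.id, je.title, je.longitude, je.latitude\n"
        "ORDER BY je.id"
    )


cursor = conn.execute(build_pivot_sql(conn))
column_names = [col[0] for col in cursor.description]
dynamic_rows = cursor.fetchall()
print(column_names)
for row in dynamic_rows:
    print(row)
```

Because the query is built from whatever is in joomgalaxy_fields, adding a field there adds a column to the result without any manual editing.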

Related

How to select only if the last element of an inner-join matches?

I have two tables: processes and notes. Each note is linked to a process. A process has several notes (one-to-many relationship). Notes also have a creation date.
I want to select each process whose last note contains a certain text (say 'some content'), but ONLY if this note is the last created one for the process.
For example :
processes table:
id | name
----------
42 | 'foo'
notes table:
content | creation_date | process_id
-------------------------------------------
'note1' | '09/13' | 42
'note1' | '09/14' | 42
'some_content'| '09/15' | 42
The process_id field in notes is a foreign key. In this example, the 'foo' process should be selected by my query.
If a new note is added, the notes table becomes something like this:
content | creation_date | process_id
-------------------------------------------
'note1' | '09/13' | 42
'note1' | '09/14' | 42
'some_content'| '09/15' | 42
'note4' | '09/16' | 42
In this case the 'foo' process should not be selected, because the last note content is not 'some_content' anymore.
Is it possible to do such a thing in a single query?
I'm using MySQL.
One possibility is aggregation:
select p.id, p.name
from processes p join
notes n
on n.process_id = p.id
group by p.id, p.name
having max(n.creation_date) = max(case when n.content like '%some_content%' then n.creation_date end);
You can use a correlated sub query like so:
SELECT *
FROM processes
WHERE (
SELECT content
FROM notes
WHERE notes.process_id = processes.id
ORDER BY creation_date DESC
LIMIT 1
) = 'some_content'
Yet another method simply uses exists
select *
from processes p
where exists (
select * from notes n
where n.process_id=p.id
and n.content='some_content'
and n.creation_date=(select Max(n2.creation_date) from notes n2 where n2.process_id=p.id)
)
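The "last note must match" behaviour is easy to check against the question's two scenarios. Below is a minimal sketch using Python with an in-memory SQLite database as a stand-in for MySQL (an assumption); it runs the correlated-subquery form before and after 'note4' is added. The helper name processes_ending_with is hypothetical, and the 'MM/DD' strings only sort correctly because all sample dates fall in the same year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE notes (content TEXT, creation_date TEXT, process_id INTEGER);
INSERT INTO processes VALUES (42, 'foo');
INSERT INTO notes VALUES ('note1', '09/13', 42),
                         ('note1', '09/14', 42),
                         ('some_content', '09/15', 42);
""")

# Select a process only when its most recent note has the requested content.
LAST_NOTE_SQL = """
SELECT p.id, p.name
FROM processes p
WHERE (SELECT n.content
       FROM notes n
       WHERE n.process_id = p.id
       ORDER BY n.creation_date DESC
       LIMIT 1) = ?
"""


def processes_ending_with(text):
    """Return the processes whose latest note content equals `text`."""
    return conn.execute(LAST_NOTE_SQL, (text,)).fetchall()


before = processes_ending_with("some_content")   # latest note is 'some_content'
conn.execute("INSERT INTO notes VALUES ('note4', '09/16', 42)")
after = processes_ending_with("some_content")    # latest note is now 'note4'
print(before, after)
```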

Compare two query results and output difference

I have two queries against the same table:
SELECT * FROM table WHERE fileName='x';
SELECT * FROM table WHERE fileName='y';
Now, I am trying to compare the result that is being returned by these queries. There is no ID I could compare against but I need to compare each column.
I was trying to modify this approach, as seen here:
SELECT 'robot' AS `set`, r.*
FROM robot r
WHERE ROW(r.col1, r.col2, …) NOT IN
(
SELECT *
FROM tbd_robot
)
UNION ALL
SELECT 'tbd_robot' AS `set`, t.*
FROM tbd_robot t
WHERE ROW(t.col1, t.col2, …) NOT IN
(
SELECT *
FROM robot
)
I am not sure how to modify this code correctly. My attempts to change the table names to the same table but adding a WHERE clause failed in SQL exceptions.
Is this even the best route to take? Maybe there is an even more clever way to compare two query results and output the differences?
Thank you very much for your help in advance :-)
EDIT:
Sample Data:
ID | fileName | firstName | lastName | address
1 | x.txt | John | Doe | 1 Test st
2 | x.txt | Jane | Doe | 3 Test st
3 | y.txt | John | Doe | 2 Test st
4 | y.txt | Jane | Doe | 3 Test st
Since the address differs where ID = 3, this is the row that should be returned.
Perhaps group by is a simpler approach:
select col1, col2, col3, . . .,
sum(fileName = 'x') as count_x,
sum(fileName = 'y') as count_y
from table t
where fileName in ('x', 'y')
group by col1, col2, col3, . . .;
For the columns that you specify, you will get the count of rows with 'x' and 'y'.
You can just output the differences by putting having count_x <> count_y at the end of the query.
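Here is a minimal sketch of the GROUP BY comparison run on the sample data from the question, using Python with an in-memory SQLite database as a stand-in for MySQL (an assumption). The table name files is hypothetical, since table is a reserved word, and the HAVING clause repeats the SUM expressions rather than relying on the aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER, fileName TEXT, firstName TEXT, lastName TEXT, address TEXT);
INSERT INTO files VALUES (1, 'x.txt', 'John', 'Doe', '1 Test st'),
                         (2, 'x.txt', 'Jane', 'Doe', '3 Test st'),
                         (3, 'y.txt', 'John', 'Doe', '2 Test st'),
                         (4, 'y.txt', 'Jane', 'Doe', '3 Test st');
""")

# Group on every compared column; a group that is not balanced between
# the two files is a difference.
diff_sql = """
SELECT firstName, lastName, address,
       SUM(fileName = 'x.txt') AS count_x,
       SUM(fileName = 'y.txt') AS count_y
FROM files
WHERE fileName IN ('x.txt', 'y.txt')
GROUP BY firstName, lastName, address
HAVING SUM(fileName = 'x.txt') <> SUM(fileName = 'y.txt')
ORDER BY address
"""
diff_rows = conn.execute(diff_sql).fetchall()
for row in diff_rows:
    print(row)
```

Note that this reports both sides of a mismatch: the John Doe row that exists only in x.txt and the one that exists only in y.txt. The count columns tell you which file each row came from.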

MySQL subquery from same table

I have a database with table xxx_facileforms_forms, xxx_facileforms_records and xxx_facileforms_subrecords.
Column headers for xxx_facileforms_subrecords:
id | record | element | title | name | type | value
Filtering records with element = '101' works and the query returns the proper records, but when I add a subquery to filter on an additional element = '4871' from the same table, 0 records are returned.
SELECT
F.id AS form_id,
R.id AS record_id,
PV.value AS prim_val,
COUNT(PV.value) AS count
FROM
xxx_facileforms_forms AS F
INNER JOIN xxx_facileforms_records AS R ON F.id = R.form
INNER JOIN xxx_facileforms_subrecords AS PV ON R.id = PV.record AND PV.element = '101'
WHERE R.id IN (SELECT record FROM xxx_facileforms_records WHERE record = R.id AND element = '4871')
GROUP BY PV.value
Does this look right?
Thank You!
EDIT
Thank you for the support and ideas! Yes, I left a lot to guesswork. Sorry. Some input/output table data might help make it more clear.
_facileforms_form:
id | formname
---+---------
1 | myform
_facileforms_records:
id | form | submitted
----+------+--------------------
163 | 1 | 2014-06-12 14:18:00
164 | 1 | 2014-06-12 14:19:00
165 | 1 | 2014-06-12 14:20:00
_facileforms_subrecords:
id | record | element | title | name|type | value
-----+--------+---------+--------+-------------+--------
5821 | 163 | 101 | ticket | radio group | flight
5822 | 163 | 4871 | status | select list | canceled
5823 | 164 | 101 | ticket | radio group | flight
5824 | 165 | 101 | ticket | radio group | flight
5825 | 165 | 4871 | status | select list | canceled
Successful query result:
form_id | record_id | prim_val | count
1 | 163 | flight | 2
So I have to return the value data (and count those records) from the records where a _subrecords row with element 4871 is present (in this case 163 and 165).
And again Thank You!
No, it doesn't look quite right. There's a predicate "R.id IN (subquery)" but that subquery itself has a reference to R.id; it's a correlated subquery. Looks like something is doubled up there. (We're assuming here that id is a UNIQUE or PRIMARY key in each table.)
The subquery references an identifier element... the only other reference we see to that identifier is from the _subrecords table (we don't see any reference to that column in the _records table). If there's no element column in _records, then that's a reference to the element column in PV, and that predicate in the subquery will never be true at the same time the PV.element='101' predicate is true.
Kudos for qualifying the column references with a table alias, that makes the query (and the EXPLAIN output) much easier to read; the reader doesn't need to go digging around in the table definitions to figure out which table does and doesn't contain which columns. But please take that pattern to the next step, and qualify all column references in the query, including column references in the subqueries.
Since the reference to element isn't qualified, we're left to guess whether the _records table contains a column named element.
If the goal is to return only the rows from R with element='4871', we could just do...
WHERE R.element='4871'
But, given that you've gone to the bother of using a subquery, I suspect that's not really what you want.
It's possible you're trying to return all rows from R for a _form, but only for the _form where there's at least one associated _record with element='4871'. We could get that result returned with either an IN (subquery) or an EXISTS (correlated subquery) predicate, or an anti-join pattern. I could give examples of those query patterns and take some guesses at the specification, but I would only be guessing at what you actually want to return.
But I'm guessing that's not really what you want. I suspect that _records doesn't actually contain a column named element.
(The query is already restricting the rows returned from PV to those that have element='101'.)
This is a case where some example data and the example output would help explain the actual specification; and that would be a basis for developing the required SQL.
FOLLOWUP
I'm just guessing... maybe what you want is something pretty simple. Maybe you want to return rows that have an element value of either '101' or '4871'.
The IN comparison operator is a convenient way of expressing the OR condition, that a column be equal to a value in a list:
SELECT F.id AS form_id
, R.id AS record_id
, PV.value AS prim_val
, COUNT(PV.value) AS count
FROM xxx_facileforms_forms F
JOIN xxx_facileforms_records R
ON R.form = F.id
JOIN xxx_facileforms_subrecords PV
ON PV.record = R.id
AND PV.element IN ('101','4871')
GROUP BY PV.value
NOTE: This query (like the OP query) is using a non-standard MySQL extension to GROUP BY, which allows non-aggregate expressions (e.g. bare columns) to be returned in the SELECT list.
The values returned for the non-aggregate expressions (in this case, F.id and R.id) will be values from a row included in the "group". But because there can be multiple rows, and different values on those rows, it's not deterministic which of those values will be returned. (Other databases would reject this statement, unless we wrapped those columns in an aggregate function, such as MIN() or MAX().)
FOLLOWUP
I noticed that you added information about the question into an answer... this information would better be added to the question as an EDIT, since it's not an answer to the question. I took the liberty of copying that, and reformatting.
The example makes it much more clear what you are trying to accomplish.
I think the easiest to understand is to use an EXISTS predicate, to check whether a row meeting some criteria "exists" or not, and exclude rows where such a row does not exist. This will use a correlated subquery of the _subrecords table, which checks for the existence of a matching row:
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4871 subrecord
WHERE EXISTS ( SELECT 1
FROM xxx_facileforms_subrecords sx
WHERE sx.element = '4871'
AND sx.record = r.id
)
--
GROUP BY pv.value
(I'm thinking this is where OP was headed with the idea that a subquery was required.)
Given that there's a GROUP BY in the query, we could actually accomplish an equivalent result with a regular join operation, to a second reference to the _subrecords table.
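A join operation can be more efficient than an EXISTS predicate, depending on the optimizer and the available indexes.
(Note that the existing GROUP BY clause collapses any "duplicate" rows that might otherwise be introduced by the JOIN operation. The COUNT is only equivalent if each record has at most one matching subrecord, as in the sample data; with several matches per record, the join would inflate the count.)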
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4871 subrecord
JOIN xxx_facileforms_subrecords sx
ON sx.record = r.id
AND sx.element = '4871'
--
GROUP BY pv.value
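To check the EXISTS and JOIN forms against the sample data from the question, here is a minimal sketch using Python with an in-memory SQLite database as a stand-in for MySQL (an assumption). It filters on element '4871' as in the question's data, wraps the id columns in MIN() so the output is deterministic, and assumes the six values in each sample subrecord row map to id, record, element, title, type, value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xxx_facileforms_forms (id INTEGER, formname TEXT);
CREATE TABLE xxx_facileforms_records (id INTEGER, form INTEGER, submitted TEXT);
CREATE TABLE xxx_facileforms_subrecords
    (id INTEGER, record INTEGER, element TEXT, title TEXT, type TEXT, value TEXT);
INSERT INTO xxx_facileforms_forms VALUES (1, 'myform');
INSERT INTO xxx_facileforms_records VALUES (163, 1, '2014-06-12 14:18:00'),
                                           (164, 1, '2014-06-12 14:19:00'),
                                           (165, 1, '2014-06-12 14:20:00');
INSERT INTO xxx_facileforms_subrecords VALUES
    (5821, 163, '101',  'ticket', 'radio group', 'flight'),
    (5822, 163, '4871', 'status', 'select list', 'canceled'),
    (5823, 164, '101',  'ticket', 'radio group', 'flight'),
    (5824, 165, '101',  'ticket', 'radio group', 'flight'),
    (5825, 165, '4871', 'status', 'select list', 'canceled');
""")

select_and_joins = """
SELECT MIN(f.id) AS form_id, MIN(r.id) AS record_id,
       pv.value AS prim_val, COUNT(pv.value) AS record_count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r ON r.form = f.id
JOIN xxx_facileforms_subrecords pv ON pv.record = r.id AND pv.element = '101'
"""

# Variant 1: keep a record only if a related 4871 subrecord exists.
exists_sql = select_and_joins + """
WHERE EXISTS (SELECT 1 FROM xxx_facileforms_subrecords sx
              WHERE sx.record = r.id AND sx.element = '4871')
GROUP BY pv.value
"""

# Variant 2: express the same filter as a second join to the subrecords table.
join_sql = select_and_joins + """
JOIN xxx_facileforms_subrecords sx ON sx.record = r.id AND sx.element = '4871'
GROUP BY pv.value
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(exists_rows)
print(join_rows)
```

Record 164 has no 4871 subrecord, so both variants count only records 163 and 165.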

SQL GROUP BY Issue on GROUP BY

I've written a query that builds a small table of information from a couple of data sources. It uses a self-made table to reference the vehicle model for the final GROUP BY, which is how the data needs to be viewed. However, when I group by Vehicle it misses out figures in the subquery column: if I group by Prefix it shows the correct numbers, but grouping by Vehicle hides some of the data.
The Prefix can relate to a couple of similar vehicle models, hence the need to group by Vehicle. Can anyone easily see what I've done wrong in the SQL query below?
SELECT Vehicle, COUNT(`Chassis-No`) AS Stock,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '182' THEN 1 ELSE 0 END),1) AS Perc6Months,
ROUND((100/COUNT(`Chassis-No`)) * SUM(CASE WHEN `Vehicle Age` > '365' THEN 1 ELSE 0 END),1) AS Perc12Months,
(SELECT COUNT(VIN_Prefix) FROM Orderdownload
INNER JOIN VehicleMatrix ON (`VIN_Prefix` LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,2)) OR (`VIN_Prefix` NOT LIKE 'S%' AND Prefix = LEFT(`VIN_Prefix`,1)) WHERE DealerCode = 'AA12345' AND `VIN_Prefix` = IF(LEFT(`Chassis-No`,1)='S',LEFT(`Chassis-No`,2),LEFT(`Chassis-No`,1))) As Qty
FROM DealerAgedStock
INNER JOIN VehicleMatrix AS VM
ON (`Chassis-No` LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,2)) OR (`Chassis-No` NOT LIKE 'S%' AND Prefix = LEFT(`Chassis-No`,1))
WHERE `DL Dealer Code` = 'AA12345'
GROUP BY Vehicle
Grouped on Vehicle I get the following:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 2
Grouped on Prefix I get the following:
VIN_Prefix | Perc6Months | Perc12Months | Qty
S1 | 25 | 0 | 2
S2 | 50 | 0 | 2
Ideally it should look like this:
Vehicle | Perc6Months | Perc12Months | Qty
Mondeo | 37.5 | 0 | 4
Where S1 and S2 are relative to the Vehicle Mondeo, thus it gives me the first instance of subquery rather than adding them together.
My question is: why does the Group By not add the figures together properly from the subquery? I need it to add them to have the correct figures...

Combining two tables in a complex way

The situation:
I have the main table, lets call it MainTable.
+---------+----------+----------+----------+
| Id (PK)| Title | Text | Type |
+---------+----------+----------+----------+
| 1 | Some Text|More Stuff| A |
| 2 | Another | Example | B |
+---------+----------+----------+----------+
And I have a second table called TranslationsTable, in which the Id field is the Id of the MainTable row (no foreign key, as it can refer to different tables), ObjType is the object type (same name as the table), FieldName is the name of the field in the ObjType table, and Value holds the translation of that field's value.
+---------+-----------+-----------+------------+----------+
| Id | ObjType | FieldName | Value | Language |
+---------+-----------+-----------+------------+----------+
| 1 | MainTable | Title | Algum Texto| PT |
| 1 | MainTable | Text | Mais Coisas| PT |
+---------+-----------+-----------+------------+----------+
And because I need to search in translated fields, I figured I could use a TEMPORARY TABLE to do so, but then came the problem of "Which SELECT query should I use?". I read some posts about pivot table queries, but I don't really know how can I build a query so my temp table is something like
+---------+------------+------------+----------+
| Id (PK)| Field_1 | Field_2 | Field_3 |
+---------+------------+------------+----------+
| 1 | Algum Texto| Mais Coisas| A |
+---------+------------+------------+----------+
Thank you.
EDIT:
I accepted AD7six's answer because for 500.000 entries in the MainTable and 1.500.000 in the Translations it is roughly 30 times faster than the other one.
SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,Field_1,Field_2,Field_3
FROM TranslationsTable
PIVOT(MIN(Value) FOR FieldName IN (Field_1,Field_2,Field_3)) p
WHERE ObjType = 'MainTable'
) xlate ON (orig.Id = xlate.Id)
If you want to include the (untranslated) rows from MainTable that have no matches in TranslationsTable, change the INNER JOIN to LEFT OUTER JOIN
Another alternative is to perform the pivot manually:
SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,
MIN(CASE FieldName WHEN 'Field_1' THEN Value END) AS Field_1,
MIN(CASE FieldName WHEN 'Field_2' THEN Value END) AS Field_2,
MIN(CASE FieldName WHEN 'Field_3' THEN Value END) AS Field_3
FROM TranslationsTable
WHERE ObjType = 'MainTable'
GROUP BY Id
) xlate ON (orig.Id = xlate.Id)
With a change in the MainTable schema like others have suggested, you won't need the repetition for (Field_1,Field_2,Field_3). It makes the code easier to maintain and modify.
That's not complex
It's just a query with one join per translated field.
That means you can query/sort/whatever it like any other query, e.g. (using some real names so that it's easier to read):
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
FROM
products
LEFT JOIN
TranslationsTable AS product_name
ON (
product_name.Language = 'PT' AND
product_name.ObjType = 'products' AND
product_name.FieldName = 'name' AND
product_name.id = products.id
)
LEFT JOIN
TranslationsTable AS product_description
ON (
product_description.Language = 'PT' AND
product_description.ObjType = 'products' AND
product_description.FieldName = 'description' AND
product_description.id = products.id
)
WHERE
product_name.value = 'Algum Texto' -- Find all products named "Algum Texto"
You don't need a temp table
But if you want to create one, it's easy to do using the query itself:
CREATE TABLE
products_pt
AS
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
...
This will create a table (no indexes) matching the structure of the query. If your data does not change frequently it can make querying your multilingual data a lot easier to manage, but has some disadvantages such as (obviously) your translation-specific table will not be up to date if the source table data changes.
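The one-join-per-translated-field pattern can be tried on the question's sample translations with a minimal sketch like the one below. It uses Python with an in-memory SQLite database as a stand-in for MySQL (an assumption) and follows the answer's products naming, so the products table and its columns are illustrative rather than the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE TranslationsTable
    (Id INTEGER, ObjType TEXT, FieldName TEXT, Value TEXT, Language TEXT);
INSERT INTO products VALUES (1, 'Some Text', 'More Stuff'), (2, 'Another', 'Example');
INSERT INTO TranslationsTable VALUES
    (1, 'products', 'name',        'Algum Texto', 'PT'),
    (1, 'products', 'description', 'Mais Coisas', 'PT');
""")

# One LEFT JOIN per translated field; COALESCE falls back to the original
# text when no translation exists for the requested language.
translated_sql = """
SELECT p.id,
       COALESCE(tn.Value, p.name)        AS name,
       COALESCE(td.Value, p.description) AS description
FROM products p
LEFT JOIN TranslationsTable tn
       ON tn.Id = p.id AND tn.ObjType = 'products'
      AND tn.FieldName = 'name' AND tn.Language = ?
LEFT JOIN TranslationsTable td
       ON td.Id = p.id AND td.ObjType = 'products'
      AND td.FieldName = 'description' AND td.Language = ?
ORDER BY p.id
"""
translated_rows = conn.execute(translated_sql, ("PT", "PT")).fetchall()
for row in translated_rows:
    print(row)
```

Product 2 has no PT translation, so it comes back with its original name and description instead of being dropped.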