Combining two tables in a complex way - mysql

The situation:
I have a main table; let's call it MainTable.
+---------+----------+----------+----------+
| Id (PK)| Title | Text | Type |
+---------+----------+----------+----------+
| 1 | Some Text|More Stuff| A |
| 2 | Another | Example | B |
+---------+----------+----------+----------+
And I have a second table called TranslationsTable, in which the Id field holds the Id of the MainTable row (there is no foreign key, as it can refer to rows in different tables), ObjType is the object type (the same as the table name), FieldName is the name of the field in the ObjType table, and Value holds the translation of that field's value.
+---------+-----------+-----------+------------+----------+
| Id | ObjType | FieldName | Value | Language |
+---------+-----------+-----------+------------+----------+
| 1 | MainTable | Title | Algum Texto| PT |
| 1 | MainTable | Text | Mais Coisas| PT |
+---------+-----------+-----------+------------+----------+
Because I need to search in translated fields, I figured I could use a TEMPORARY TABLE to do so, but then came the problem of "Which SELECT query should I use?". I read some posts about pivot table queries, but I don't really know how I can build a query so that my temp table looks something like this:
+---------+------------+------------+----------+
| Id (PK)| Field_1 | Field_2 | Field_3 |
+---------+------------+------------+----------+
| 1 | Algum Texto| Mais Coisas| A |
+---------+------------+------------+----------+
Thank you.
EDIT:
I accepted AD7six's answer because, for 500,000 entries in MainTable and 1,500,000 in TranslationsTable, it is roughly 30 times faster than the other one.

SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,Field_1,Field_2,Field_3
FROM TranslationsTable
PIVOT(MIN(Value) FOR FieldName IN (Field_1,Field_2,Field_3)) p
WHERE ObjType = 'MainTable'
) xlate ON (orig.Id = xlate.Id)
If you want to include the (untranslated) rows from MainTable that have no matches in TranslationsTable, change the INNER JOIN to a LEFT OUTER JOIN.
Another alternative is to perform the pivot manually, which also works on MySQL (MySQL has no PIVOT operator):
SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,
MIN(CASE FieldName WHEN 'Field_1' THEN Value END) AS Field_1,
MIN(CASE FieldName WHEN 'Field_2' THEN Value END) AS Field_2,
MIN(CASE FieldName WHEN 'Field_3' THEN Value END) AS Field_3
FROM TranslationsTable
WHERE ObjType = 'MainTable'
GROUP BY Id
) xlate ON (orig.Id = xlate.Id)
With a change in the MainTable schema like others have suggested, you won't need the repetition for (Field_1,Field_2,Field_3). It makes the code easier to maintain and modify.
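As a rough check of the manual pivot, here is a minimal sketch that runs it against the sample rows from the question. It makes several assumptions: it uses Python's built-in sqlite3 module instead of MySQL, it uses the question's real column names (Title, Text) in place of the Field_1/Field_2 placeholders, it adds a Language filter, and it uses a LEFT JOIN so the untranslated row is kept.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MainTable (Id INTEGER PRIMARY KEY, Title TEXT, Text TEXT, Type TEXT);
CREATE TABLE TranslationsTable (Id INTEGER, ObjType TEXT, FieldName TEXT,
                                Value TEXT, Language TEXT);
INSERT INTO MainTable VALUES (1, 'Some Text', 'More Stuff', 'A'),
                             (2, 'Another', 'Example', 'B');
INSERT INTO TranslationsTable VALUES
    (1, 'MainTable', 'Title', 'Algum Texto', 'PT'),
    (1, 'MainTable', 'Text',  'Mais Coisas', 'PT');
""")

# Manual pivot: one MIN(CASE ...) per translated field, grouped by row Id.
PIVOT_SQL = """
SELECT orig.Id,
       COALESCE(xlate.Title, orig.Title) AS Title,
       COALESCE(xlate.Text,  orig.Text)  AS Text,
       orig.Type
FROM MainTable orig
LEFT JOIN (
    SELECT Id,
           MIN(CASE FieldName WHEN 'Title' THEN Value END) AS Title,
           MIN(CASE FieldName WHEN 'Text'  THEN Value END) AS Text
    FROM TranslationsTable
    WHERE ObjType = 'MainTable' AND Language = 'PT'
    GROUP BY Id
) xlate ON orig.Id = xlate.Id
ORDER BY orig.Id
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)
```

Row 1 comes back translated and row 2 falls back to its original values through COALESCE.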

That's not complex
It's just a query with one join per translated field.
That means you can query, sort, or filter it like any other query, e.g. (using some real names so that it's easier to read):
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
FROM
products
LEFT JOIN
TranslationsTable AS product_name
ON (
product_name.Language = 'PT' AND
product_name.ObjType = 'products' AND
product_name.FieldName = 'name' AND
product_name.id = products.id
)
LEFT JOIN
TranslationsTable AS product_description
ON (
product_description.Language = 'PT' AND
product_description.ObjType = 'products' AND
product_description.FieldName = 'description' AND
product_description.id = products.id
)
WHERE
product_name.value = 'Algum Texto' -- Find all products named "Algum Texto"
You don't need a temp table
But if you want to create one, it's easy to do using the query itself:
CREATE TABLE
products_pt
AS
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
...
This will create a table (no indexes) matching the structure of the query. If your data does not change frequently it can make querying your multilingual data a lot easier to manage, but has some disadvantages such as (obviously) your translation-specific table will not be up to date if the source table data changes.
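A minimal sketch of the whole approach, under a few assumptions: it runs on Python's built-in sqlite3 rather than MySQL, the products table and its rows are hypothetical stand-ins for the question's data, and the index name is made up for illustration. It builds products_pt with one LEFT JOIN per translated field, adds an index by hand (since CREATE TABLE ... AS creates none), and then searches the translated name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical source data for the sketch.
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE TranslationsTable (Id INTEGER, ObjType TEXT, FieldName TEXT,
                                Value TEXT, Language TEXT);
INSERT INTO products VALUES (1, 'Some Text', 'More Stuff'),
                            (2, 'Another', 'Example');
INSERT INTO TranslationsTable VALUES
    (1, 'products', 'name',        'Algum Texto', 'PT'),
    (1, 'products', 'description', 'Mais Coisas', 'PT');

-- One LEFT JOIN per translated field; COALESCE falls back to the original.
CREATE TABLE products_pt AS
SELECT products.id,
       COALESCE(product_name.Value, products.name) AS name,
       COALESCE(product_description.Value, products.description) AS description
FROM products
LEFT JOIN TranslationsTable AS product_name
       ON product_name.Language = 'PT'
      AND product_name.ObjType = 'products'
      AND product_name.FieldName = 'name'
      AND product_name.Id = products.id
LEFT JOIN TranslationsTable AS product_description
       ON product_description.Language = 'PT'
      AND product_description.ObjType = 'products'
      AND product_description.FieldName = 'description'
      AND product_description.Id = products.id;

-- The copied table has no indexes, so add the ones the searches need.
CREATE INDEX idx_products_pt_name ON products_pt (name);
""")

found = conn.execute(
    "SELECT id, name, description FROM products_pt WHERE name = ?",
    ("Algum Texto",),
).fetchall()
all_rows = conn.execute("SELECT id, name FROM products_pt ORDER BY id").fetchall()
print(found)
print(all_rows)
```

The copy is a snapshot: rows inserted into products or TranslationsTable afterwards will not appear in products_pt until it is rebuilt.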

Related

Problems using SQL ALL operator

I'm having trouble using/understanding the SQL ALL operator. I have a table FOLDER_PERMISSION with the following columns:
+----+-----------+---------+----------+
| ID | FOLDER_ID | USER_ID | CAN_READ |
+----+-----------+---------+----------+
| 1 | 34353 | 45453 | 0 |
| 2 | 46374 | 342532 | 1 |
| 3 | 46374 | 32352 | 1 |
+----+-----------+---------+----------+
I want to select the folders where all the users have permission to read, how could I do it?
Use aggregation and having:
select folder_id
from folder_permission
group by folder_id
having min(can_read) = 1;
Gordon's answer seems better but for the sake of completeness, using ALL a query could look like:
SELECT x1.folder_id
FROM (SELECT DISTINCT
fp1.folder_id
FROM folder_permission fp1) x1
WHERE 1 = ALL (SELECT fp2.can_read
FROM folder_permission fp2
WHERE fp2.folder_id = x1.folder_id);
If you have a table for the folders themselves replace the derived table (aliased x1) with it.
But this only considers users that have a row in folder_permission. If not every user has a row in that table, you may not get the folders that truly all users can read.
You can use aggregation:
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING SUM( can_read = 0 ) = 0;
You can also express it as:
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING MIN(CAN_READ) = MAX(CAN_READ) AND MIN(CAN_READ) = 1;
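To compare the aggregation variants, here is a small sketch that runs two of them on the sample rows from the question. It assumes Python's built-in sqlite3 as a stand-in for the real database; like MySQL, SQLite evaluates the comparison can_read = 0 to 0 or 1, which is what makes the SUM form work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folder_permission (id INTEGER PRIMARY KEY, folder_id INTEGER,
                                user_id INTEGER, can_read INTEGER);
INSERT INTO folder_permission VALUES
    (1, 34353, 45453, 0),
    (2, 46374, 342532, 1),
    (3, 46374, 32352, 1);
""")

# Variant 1: the smallest can_read in the group must be 1.
by_min = [r[0] for r in conn.execute(
    "SELECT folder_id FROM folder_permission "
    "GROUP BY folder_id HAVING MIN(can_read) = 1")]

# Variant 2: count the rows that deny access; there must be none.
by_sum = [r[0] for r in conn.execute(
    "SELECT folder_id FROM folder_permission "
    "GROUP BY folder_id HAVING SUM(can_read = 0) = 0")]

print(by_min, by_sum)
```

Both variants return only folder 46374, because folder 34353 has a row with can_read = 0.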
If you wanted to return the full matching records, you could try using some exists logic:
SELECT ID, FOLDER_ID, USER_ID, CAN_READ
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.FOLDER_ID = t1.FOLDER_ID AND t2.CAN_READ = 0);
The existence of a matching record in the above exists subquery would imply that there exist one or more users for that folder who do not have read access rights.
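The NOT EXISTS version can be tried out with a sketch like the following. It assumes Python's built-in sqlite3 in place of the real database, and it adds a hypothetical folder 500 with mixed permissions (not in the question) to show that a single can_read = 0 row excludes every row of that folder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folder_permission (id INTEGER PRIMARY KEY, folder_id INTEGER,
                                user_id INTEGER, can_read INTEGER);
INSERT INTO folder_permission VALUES
    (1, 34353, 45453, 0),
    (2, 46374, 342532, 1),
    (3, 46374, 32352, 1),
    (4, 500, 1, 1),   -- hypothetical folder: one user can read...
    (5, 500, 2, 0);   -- ...and one cannot
""")

# Keep a row only if no row for the same folder denies read access.
full_rows = conn.execute("""
SELECT id, folder_id, user_id, can_read
FROM folder_permission t1
WHERE NOT EXISTS (SELECT 1 FROM folder_permission t2
                  WHERE t2.folder_id = t1.folder_id AND t2.can_read = 0)
ORDER BY id
""").fetchall()

for row in full_rows:
    print(row)
```

Only the two rows of folder 46374 survive; row 4 is dropped even though its own can_read is 1.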

MySQL JOIN Statement from Multiple Tables

I have an old database of entries from an abandoned "Joomgalaxy" Joomla plugin.
There are three tables, joomgalaxy_entries, joomgalaxy_fields, and joomgalaxy_entries_data
The id from the entries table matches the entry_id in the entries_data table, but the actual field name is saved in another table, fields
Can someone please help me with the correct SQL statement to obtain results like you can see below in Ultimate Goal? My MySQL knowledge is very basic, and from my searching it sounds like I need to use a LEFT JOIN, but I have no idea how to use the value from field_name as the column name for returned values
Thank You!!
joomgalaxy_entries
---------------------------------------
| id | title | longitude | latitude |
---------------------------------------
| 50 | John | -79.333333 | 43.669999 |
| 51 | Bob | -79.333333 | 43.669999 |
---------------------------------------
joomgalaxy_fields
This is just two examples below to keep it simple, there are more than just these two, so it would have to be able to handle dynamically using the field_name as the column name.
--------------------------------
| id | field_type | field_name |
--------------------------------
| 1 | textbox | websiteurl |
| 2 | dropdown | occupation |
--------------------------------
joomgalaxy_entries_data
"Technically" there shouldn't be any duplicate entries (fieldid and entry_id), so from my understanding that shouldn't affect using the field_name from above as the column name, but what if there ends up being one?
-------------------------------------
| fieldid | field_value | entry_id |
-------------------------------------
| 1 | google.com | 50 |
| 2 | unemployed | 50 |
| 1 | doctor.com | 51 |
| 2 | doctor | 51 |
-------------------------------------
Ultimate Goal
Ultimately trying to get this type of result, so I can then use that statement in MySQL Workbench to export the data that would look like this:
------------------------------------------------------------------
| id | title | longitude | latitude | websiteurl | occupation |
------------------------------------------------------------------
| 50 | John | -79.333333 | 43.669999 | google.com | unemployed |
| 51 | Bob | -79.333333 | 43.669999 | doctor.com | doctor |
------------------------------------------------------------------
EDIT:
There are more than just the two fields websiteurl and occupation, I was just using those two as examples, there are numerous fields that are all different, so in theory pulling the value from field_name would be used for the column name
You can use some conditional logic, like a CASE statement, along with an aggregate function like max() or min() to return those values as columns:
SELECT je.id,
je.title,
je.longitude,
je.latitude,
max(case when jed.fieldid = 1 then jed.field_value end) as WebsiteUrl,
max(case when jed.fieldid = 2 then jed.field_value end) as Occupation
FROM joomgalaxy_entries je
INNER JOIN joomgalaxy_entries_data jed
on je.id = jed.entry_id
GROUP BY je.id,
je.title,
je.longitude,
je.latitude
Using an INNER JOIN will only return the joomgalaxy_entries rows that have matching rows in joomgalaxy_entries_data. If you want to return all joomgalaxy_entries rows even when there are no matching rows to join on, change the INNER JOIN to a LEFT JOIN.
You can write a simple SELECT query like this:
SELECT je.id, je.title, je.longitude, je.latitude,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 1 AND entry_id = je.id) AS websiteurl,
(SELECT field_value FROM joomgalaxy_entries_data WHERE fieldid = 2 AND entry_id = je.id) AS occupation
FROM joomgalaxy_entries je;
First step is easy:
SELECT JE.id, JE.title, JE.longitude, JE.latitude
FROM joomgalaxy_entries JE
Now you need to JOIN:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
JD.*
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
Now you need to convert rows to columns:
SELECT JE.id, JE.title, JE.longitude, JE.latitude,
MIN(CASE WHEN fieldid = 1 THEN JD.field_value END) as WebsiteUrl,
MIN(CASE WHEN fieldid = 2 THEN JD.field_value END) as Occupation
FROM joomgalaxy_entries JE
JOIN joomgalaxy_entries_data JD
ON JE.id = JD.entry_id
GROUP BY JE.id, JE.title, JE.longitude, JE.latitude
This depends on there being only two fields for each entry; if the number of fields is dynamic, you would need a different approach.
This should work:
select id, title, longitude, latitude,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'websiteurl')
and jed.entry_id = je.id
) as websiteurl,
(select field_value from joomgalaxy_entries_data jed
where fieldid = (select id from joomgalaxy_fields
where field_name = 'occupation')
and jed.entry_id = je.id) as occupation
from joomgalaxy_entries je;
Note that the reason to have a left join would be if either websiteurl or occupation were null, however, this solution should work in that case anyway.
Well, that certainly makes it a bit more difficult... :) Honestly, I'm not sure what you're asking is possible with a static sql query. I'm sure someone will speak up, however, if I'm wrong.
That said, I do have a few options you can try:
Option 1 - Generate the SQL Dynamically
Assuming this is mysql, if you execute the following SQL, it will generate the subqueries dynamically:
select concat('(select field_value from joomgalaxy_entries_data jed ',
'where fieldid = (select id from joomgalaxy_fields ',
'where field_name = ''', field_name, ''') ',
'and jed.entry_id = je.id) as ', field_name, ',')
from joomgalaxy_fields;
Take the result of that command, copy-paste it into a text editor and add the following at the beginning:
select id, title, longitude, latitude,
And the rest of this at the end:
from joomgalaxy_entries je;
Then run your new uber-query and go grab a cup of coffee, lunch, or a good night's sleep depending on how much data is in your database.
Alternatively, you could add all of this to a stored procedure so you don't have to hand edit the SQL. Also, note that my syntax works for MySQL. Other databases have different concatenation operators so you may have to work around that if applicable. Also, with 50+ subqueries there is a good chance this uber-query will be quite slow, maybe too slow to make this option viable.
Option 2 - Create a table structured the way you want, and populate it
Hopefully, this is self-explanatory, but just create a new table with all of the necessary columns from the joomgalaxy_fields table. Then populate each column separately with a long series of what should be pretty straightforward sql commands. Granted this option is only viable if the database is no longer in use which I believe you indicated. From there the result is just:
select * from my_new_table;
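The idea behind Option 1 (generating the SQL from joomgalaxy_fields) can also be scripted so that no copy-pasting is needed. The sketch below is one possible way to do it, not the answerer's exact method: it swaps the correlated subqueries for one MAX(CASE ...) column per field, uses Python's built-in sqlite3 as a stand-in for MySQL, and the helper name build_pivot_sql is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE joomgalaxy_entries (id INTEGER PRIMARY KEY, title TEXT,
                                 longitude REAL, latitude REAL);
CREATE TABLE joomgalaxy_fields (id INTEGER PRIMARY KEY, field_type TEXT,
                                field_name TEXT);
CREATE TABLE joomgalaxy_entries_data (fieldid INTEGER, field_value TEXT,
                                      entry_id INTEGER);
INSERT INTO joomgalaxy_entries VALUES (50, 'John', -79.333333, 43.669999),
                                      (51, 'Bob',  -79.333333, 43.669999);
INSERT INTO joomgalaxy_fields VALUES (1, 'textbox', 'websiteurl'),
                                     (2, 'dropdown', 'occupation');
INSERT INTO joomgalaxy_entries_data VALUES
    (1, 'google.com', 50), (2, 'unemployed', 50),
    (1, 'doctor.com', 51), (2, 'doctor', 51);
""")


def build_pivot_sql(conn):
    """Build one pivot column per row of joomgalaxy_fields (hypothetical helper)."""
    select_list = ["je.id", "je.title", "je.longitude", "je.latitude"]
    fields = conn.execute(
        "SELECT id, field_name FROM joomgalaxy_fields ORDER BY id").fetchall()
    for field_id, field_name in fields:
        # Quote the alias as an identifier; MySQL's default quoting uses backticks.
        alias = '"' + field_name.replace('"', '""') + '"'
        select_list.append(
            f"MAX(CASE WHEN jed.fieldid = {int(field_id)} "
            f"THEN jed.field_value END) AS {alias}")
    return (
        "SELECT " + ",\n       ".join(select_list) + "\n"
        "FROM joomgalaxy_entries je\n"
        "LEFT JOIN joomgalaxy_entries_data jed ON je.id = jed.entry_id\n"
        "GROUP BY je.id, je.title, je.longitude, je.latitude\n"
        "ORDER BY je.id"
    )


pivot_sql = build_pivot_sql(conn)
cursor = conn.execute(pivot_sql)
column_names = [d[0] for d in cursor.description]
pivot_rows = cursor.fetchall()
print(column_names)
for row in pivot_rows:
    print(row)
```

If an entry ever had duplicate (fieldid, entry_id) rows, MAX would silently pick one value per column instead of failing, which may or may not be what you want for an export.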

SQL get multiple values (from the same column) from same table using multiple queries

I am trying to get a list of values from the same column in a table by running two queries.
This is what the table looks like:
Key  | Short_text | UID | Boolean_value
-----|------------|-----|--------------
Name | John       | 23  | NULL
Male | NULL       | 23  | true
Name | Ben        | 45  | NULL
Male | NULL       | 45  | true
I am trying to get the SHORT_TEXT of the NAME rows if the Boolean values of the Male rows are true based on the UIDs
This is what I have so far, which is throwing an error ("Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <=, >, >= or when the subquery is used as an expression."):
SELECT SHORT_TEXT_VALUE
FROM Table
WHERE ((SELECT UID
FROM Table
WHERE KEY = 'NAME') =
(SELECT CUSTOMER_UID
FROM Table
WHERE KEY = 'Male'
AND BOOLEAN_VALUE = 1))
I am very new to sql so I am not sure what I should do to achieve what I would like.
Any help would greatly be appreciated.
You can join your table with itself:
SELECT
t1.UID,
t1.Short_text
FROM
tablename t1 INNER JOIN tablename t2
ON t1.UID=t2.UID
WHERE
t1.Key='Name' AND t2.Key='Male' AND t2.Boolean_value=TRUE
or this with EXISTS:
SELECT
t1.UID,
t1.Short_text
FROM
tablename t1
WHERE
t1.Key='Name' AND
EXISTS (SELECT * FROM tablename t2
WHERE t1.UID=t2.UID AND t2.Key='Male' AND t2.Boolean_value=1)
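Both forms can be checked side by side with a short sketch. The assumptions: Python's built-in sqlite3 stands in for the real database, the table is called attrs (a made-up name, since the question only calls it Table), the Key column is quoted because KEY is a keyword, booleans are stored as 0/1, and a hypothetical third user with Male = false is added to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attrs ("Key" TEXT, Short_text TEXT, UID INTEGER, Boolean_value INTEGER);
INSERT INTO attrs VALUES
    ('Name', 'John', 23, NULL), ('Male', NULL, 23, 1),
    ('Name', 'Ben',  45, NULL), ('Male', NULL, 45, 1),
    ('Name', 'Ann',  67, NULL), ('Male', NULL, 67, 0);  -- hypothetical extra user
""")

# Self-join: pair each Name row with the Male row of the same UID.
join_rows = conn.execute("""
SELECT t1.UID, t1.Short_text
FROM attrs t1
INNER JOIN attrs t2 ON t1.UID = t2.UID
WHERE t1."Key" = 'Name' AND t2."Key" = 'Male' AND t2.Boolean_value = 1
ORDER BY t1.UID
""").fetchall()

# EXISTS: keep a Name row if a matching true Male row exists.
exists_rows = conn.execute("""
SELECT t1.UID, t1.Short_text
FROM attrs t1
WHERE t1."Key" = 'Name'
  AND EXISTS (SELECT 1 FROM attrs t2
              WHERE t1.UID = t2.UID AND t2."Key" = 'Male' AND t2.Boolean_value = 1)
ORDER BY t1.UID
""").fetchall()

print(join_rows)
print(exists_rows)
```

The two queries agree here. They would differ only if a UID had several matching Male rows, in which case the join repeats the name and EXISTS does not.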
I am unsure what you are trying to accomplish, but based on your code I think this is what you want:
SELECT SHORT_TEXT_VALUE
FROM Table
WHERE KEY='Name'
and UID in(SELECT UID
FROM Table
WHERE KEY = 'Male'
AND BOOLEAN_VALUE = 1)
On a more important note, you might want to think about redesigning your table. Why are the Male details of a specific UID stored on a different row?
Try it with a subquery:
SELECT Short_text
FROM table
WHERE uid in (SELECT uid FROM table WHERE boolean_value = "true")
AND Short_text IS NOT NULL
Make sure that the Short_text values of the Male rows are real NULLs and not the string "NULL".
By the way, this table is not in a normal form. Please read an introduction to database normalization.

select statement with only rows which have set true in second table

I have two tables:
activity
id | user_id | time | activity_id
1 | 1 | | 3
2 | 1 | | 1
and preferences
user_id | running | cycling | driving
1 | TRUE | FALSE | FALSE
I need a result set of:
id | user_id | time |
2 | 1 | |
I only need the rows from the first table whose activities are set to TRUE in the preferences table.
E.g. the activity_id for running is 1, which is set to TRUE in the preferences table, so that row is returned while the others are not.
If you can edit the schema, it would be better like this:
activity
id | name
1 | running
2 | cycling
3 | driving
user_activity
id | user_id | time | activity_id
1 | 1 | | 3
2 | 1 | | 1
preferences
user_id | activity_id
1 | 1
A row in preferences indicates a TRUE value from your schema. No row indicates a FALSE.
Then your query would simply be:
SELECT ua.id, ua.user_id, ua.time
FROM user_activity ua
JOIN preferences p ON ua.user_id = p.user_id
AND ua.activity_id = p.activity_id
If you want to see the activity name in the results:
SELECT ua.id, ua.user_id, ua.time, activity.name
FROM user_activity ua
JOIN preferences p ON ua.user_id = p.user_id
AND ua.activity_id = p.activity_id
JOIN activity ON ua.activity_id = activity.id
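Here is a minimal sketch of the proposed schema in action, assuming Python's built-in sqlite3 as a stand-in for the real database and the sample rows shown above (the empty time values are stored as NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_activity (id INTEGER PRIMARY KEY, user_id INTEGER,
                            time TEXT, activity_id INTEGER);
CREATE TABLE preferences (user_id INTEGER, activity_id INTEGER);
INSERT INTO activity VALUES (1, 'running'), (2, 'cycling'), (3, 'driving');
INSERT INTO user_activity VALUES (1, 1, NULL, 3), (2, 1, NULL, 1);
-- A row here means TRUE; user 1 only prefers running.
INSERT INTO preferences VALUES (1, 1);
""")

preferred = conn.execute("""
SELECT ua.id, ua.user_id, ua.time, activity.name
FROM user_activity ua
JOIN preferences p ON ua.user_id = p.user_id
                  AND ua.activity_id = p.activity_id
JOIN activity ON ua.activity_id = activity.id
ORDER BY ua.id
""").fetchall()

print(preferred)
```

Only the running row (id 2) is returned; the driving row has no matching preferences row, so the inner join drops it.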
What I would probably do is join the tables on a common column, looks like user_id is a common column in this case, which gives access to the columns in both tables to query against in the where clause of the query.
Which type of join depends on what information you want from preferences
Handy Visual Guide for joins
So you could query
SELECT * FROM activity LEFT JOIN preferences ON activity.user_id = preferences.user_id WHERE preferences.columnIWantToBeTrue = true
I'm using left join since you mentioned you want the values from the first table based on the second table.
Mike B has the right answer. The relational model relates rows together by common values.
You've got a table named activity with an id column which looks like the primary key. The column name activity_id would typically be the name of a column in another table that is a foreign key to the activity table, referencing activity.id.
It looks like you've used the activity_id column in the activity table as a reference to either "running", "cycling" or "driving".
It's possible to match activity.activity_id = 1 with "running", but this is a bizarre design.
Here's an example query:
SELECT a.id
, a.user_id
, a.time
FROM activity a
JOIN preferences p
ON p.user_id = a.user_id
AND ( ( p.running = 'TRUE' AND a.activity_id = 1 )
OR ( p.cycling = 'TRUE' AND a.activity_id = 2 )
OR ( p.driving = 'TRUE' AND a.activity_id = 3 )
)
But, again, this is a bizarre design.
As a start, each table in your database should have rows that represent either an entity (a person, place, thing, concept or event that can be uniquely identified, is important, and we need to store information about), or a relationship between the entities.
From the limited information we have about your use case, the entities appear to be "user", an "activity_type" (running, cycling, driving), an "activity" (an amount of time, for a user and an activity_type) and some user "preference" about which activity_types the user prefers.
See the answer from Mark B for a possible schema design.

Faster sql query than join

I have a big table with more than 10,000 rows, and it will grow to 1,000,000 in the near future. I need to run a query which gives back a Time value for each keyword for each user. I have one right now which is quite slow because I use left joins and it needs one subquery per keyword:
SELECT rawdata.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_time
FROM
rawdata left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Facebook_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Facebook%'
GROUP by user)t1 on rawdata.user = t1.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Outlook_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Outlook%'
GROUP by user)t2 on rawdata.user = t2.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Excel_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Excel%'
GROUP by user)t3 on rawdata.user = t3.user
The table looks like this:
WindowTitle | StartTime | EndTime | User
------------|-----------|---------|---------
Form1 | DateTime | DateTime| user1
Form2 | DateTime | DateTime| user2
... | ... | ... | ...
Form_n | DateTime | DateTime| user_n
The output should look like this:
User | Keyword | SUM(EndTime-StartTime)
-------|-----------|-----------------------
User1 | 'Facebook'| 00:34:12
User1 | 'Outlook' | 00:12:34
User1 | 'Excel' | 00:43:13
User2 | 'Facebook'| 00:34:12
User2 | 'Outlook' | 00:12:34
User2 | 'Excel' | 00:43:13
... | ... | ...
User_n | ... | ...
And the question is, which is the fastest way in MySQL to do this?
I think your wildcard searches are probably what's slowing it down the most, since a LIKE pattern with a leading % can't really use an index on that field. Also, if you can avoid doing sub-queries and just do a straight join, it might help, but the wildcard searches are far worse. Is there any way you could change the table to have a categoryName or categoryID that can have an index and not require a wildcard search? Like "where categoryName = 'Outlook'"
To optimize the data in your tables, add a categoryID (ideally this would reference a separate table, but let's just use arbitrary numbers for this example):
alter table rawData add column categoryID int not null;
alter table rawData add index (categoryID);
Then populate the categoryID field for the existing data:
update rawData set categoryID=1 where MainWindowTitle like '%Outlook%';
update rawData set categoryID=2 where MainWindowTitle like '%Facebook%';
-- etc...
Then change your insert to follow the same rules.
Then make your SELECT query like this (changed wild cards to categoryID):
SELECT rawdata.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_time
FROM
rawdata left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Facebook_Time'
FROM rawdata
WHERE categoryID = 2
GROUP by user)t1 on rawdata.user = t1.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Outlook_Time'
FROM rawdata
WHERE categoryID = 1
GROUP by user)t2 on rawdata.user = t2.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Excel_Time'
FROM rawdata
WHERE categoryID = 3
GROUP by user)t3 on rawdata.user = t3.user
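Once a categoryID column exists, the row-per-keyword output the question asks for can also come from a single GROUP BY, with no self-joins. The sketch below shows that alternative; it is a different technique from the query above, not a rewrite of it. It assumes Python's built-in sqlite3 instead of MySQL, a hypothetical categories lookup table, made-up sample rows, and times stored as text, so durations are computed with SQLite's strftime('%s', ...) and formatted in Python (on MySQL, TIMESTAMPDIFF and SEC_TO_TIME would play those roles).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (categoryID INTEGER PRIMARY KEY, name TEXT);  -- hypothetical
CREATE TABLE rawdata (MainWindowTitle TEXT, StartTime TEXT, EndTime TEXT,
                      user TEXT, categoryID INTEGER NOT NULL);
CREATE INDEX idx_rawdata_category ON rawdata (categoryID);
INSERT INTO categories VALUES (1, 'Outlook'), (2, 'Facebook'), (3, 'Excel');
INSERT INTO rawdata VALUES
    ('Facebook - Home', '2024-01-01 10:00:00', '2024-01-01 10:30:00', 'user1', 2),
    ('Facebook - Chat', '2024-01-01 11:00:00', '2024-01-01 11:04:12', 'user1', 2),
    ('Inbox - Outlook', '2024-01-01 12:00:00', '2024-01-01 12:12:34', 'user1', 1),
    ('Book1 - Excel',   '2024-01-01 09:00:00', '2024-01-01 09:43:13', 'user2', 3);
""")


def hhmmss(total_seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# One pass over rawdata: sum the duration per (user, category).
totals = conn.execute("""
SELECT r.user, c.name,
       SUM(CAST(strftime('%s', r.EndTime) AS INTEGER)
         - CAST(strftime('%s', r.StartTime) AS INTEGER)) AS seconds
FROM rawdata r
JOIN categories c ON c.categoryID = r.categoryID
GROUP BY r.user, c.name
ORDER BY r.user, c.name
""").fetchall()

report = [(user, keyword, hhmmss(seconds)) for user, keyword, seconds in totals]
for line in report:
    print(line)
```

Unlike the left-join version, this returns a row only for user and keyword combinations that actually occur in the data.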