Faster SQL query than join - MySQL

I have a big table with more than 10,000 rows, and it will grow to 1,000,000 in the near future. I need to run a query that returns a Time value for each keyword for each user. The one I have right now is quite slow because it uses left joins and needs one subquery per keyword:
SELECT rawdata.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_time
FROM
rawdata left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Facebook_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Facebook%'
GROUP by user)t1 on rawdata.user = t1.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Outlook_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Outlook%'
GROUP by user)t2 on rawdata.user = t2.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Excel_Time'
FROM rawdata
WHERE MainWindowTitle LIKE '%Excel%'
GROUP by user)t3 on rawdata.user = t3.user
The table looks like this:
WindowTitle | StartTime | EndTime | User
------------|-----------|---------|---------
Form1 | DateTime | DateTime| user1
Form2 | DateTime | DateTime| user2
... | ... | ... | ...
Form_n | DateTime | DateTime| user_n
The output should look like this:
User | Keyword | SUM(EndTime-StartTime)
-------|-----------|-----------------------
User1 | 'Facebook'| 00:34:12
User1 | 'Outlook' | 00:12:34
User1 | 'Excel' | 00:43:13
User2 | 'Facebook'| 00:34:12
User2 | 'Outlook' | 00:12:34
User2 | 'Excel' | 00:43:13
... | ... | ...
User_n | ... | ...
And the question is, which is the fastest way in MySQL to do this?

I think your wildcard searches are probably what's slowing it down the most, since a LIKE pattern with a leading wildcard can't use an index on that column. Avoiding the subqueries in favor of a straight join might also help, but the wildcard searches are far worse. Is there any way you could change the table to have a categoryName or categoryID column that can be indexed and doesn't require a wildcard search? For example: where categoryName = 'Outlook'
To optimize the data in your tables, add a categoryID (ideally this would reference a separate table, but let's just use arbitrary numbers for this example):
alter table rawData add column categoryID int not null
alter table rawData add index (categoryID)
Then populate the categoryID field for the existing data:
update rawData set categoryID=1 where MainWindowTitle like '%Outlook%';
update rawData set categoryID=2 where MainWindowTitle like '%Facebook%';
-- etc...
Then change your insert to follow the same rules.
Then make your SELECT query like this (changed wild cards to categoryID):
SELECT rawdata.user, t1.Facebook_Time, t2.Outlook_Time, t3.Excel_time
FROM
rawdata left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Facebook_Time'
FROM rawdata
WHERE categoryID = 2
GROUP by user)t1 on rawdata.user = t1.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Outlook_Time'
FROM rawdata
WHERE categoryID = 1
GROUP by user)t2 on rawdata.user = t2.user left join
(SELECT user, sec_to_time(SuM(time_to_sec(EndTime-StartTime))) as 'Excel_Time'
FROM rawdata
WHERE categoryID = 3
GROUP by user)t3 on rawdata.user = t3.user
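Once a categoryID column exists, the three subqueries and joins can possibly be dropped altogether in favor of conditional aggregation in a single pass over the table. The following is only a minimal sketch of that idea, not the answer's own query: it runs through Python's sqlite3 module so that it is self-contained, the sample rows are made up, and the category numbers follow the arbitrary 1/2/3 assignment above. SQLite's strftime('%s', ...) stands in for the duration calculation; in MySQL, TIMESTAMPDIFF(SECOND, StartTime, EndTime) wrapped in SEC_TO_TIME would be the usual equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rawdata (
    user TEXT, MainWindowTitle TEXT, StartTime TEXT, EndTime TEXT,
    categoryID INTEGER NOT NULL
);
CREATE INDEX idx_rawdata_category ON rawdata (categoryID);
-- Hypothetical sample rows; 1 = Outlook, 2 = Facebook, 3 = Excel.
INSERT INTO rawdata VALUES
    ('user1', 'Facebook - Home', '2024-01-01 09:00:00', '2024-01-01 09:10:00', 2),
    ('user1', 'Inbox - Outlook', '2024-01-01 09:10:00', '2024-01-01 09:15:00', 1),
    ('user1', 'Facebook - Chat', '2024-01-01 09:15:00', '2024-01-01 09:20:00', 2),
    ('user2', 'Book1 - Excel',   '2024-01-01 10:00:00', '2024-01-01 10:30:00', 3);
""")

# Duration of one row in seconds (SQLite dialect).
DURATION = ("CAST(strftime('%s', EndTime) AS INTEGER)"
            " - CAST(strftime('%s', StartTime) AS INTEGER)")

# One scan of rawdata: each SUM only counts rows of its own category,
# and GROUP BY yields exactly one output row per user.
rows = conn.execute(f"""
SELECT user,
       SUM(CASE WHEN categoryID = 2 THEN {DURATION} ELSE 0 END) AS facebook_sec,
       SUM(CASE WHEN categoryID = 1 THEN {DURATION} ELSE 0 END) AS outlook_sec,
       SUM(CASE WHEN categoryID = 3 THEN {DURATION} ELSE 0 END) AS excel_sec
FROM rawdata
GROUP BY user
ORDER BY user
""").fetchall()

for row in rows:
    print(row)
```

Grouping by user and categoryID instead (with a single SUM) would give the long User | Keyword | Time layout shown in the question.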

Related

Problems using SQL ALL operator

I'm having trouble using/understanding the SQL ALL operator. I have a table FOLDER_PERMISSION with the following columns:
+----+-----------+---------+----------+
| ID | FOLDER_ID | USER_ID | CAN_READ |
+----+-----------+---------+----------+
| 1 | 34353 | 45453 | 0 |
| 2 | 46374 | 342532 | 1 |
| 3 | 46374 | 32352 | 1 |
+----+-----------+---------+----------+
I want to select the folders where all the users have permission to read, how could I do it?
Use aggregation and having:
select folder_id
from t
group by folder_id
having min(can_read) = 1;
Gordon's answer seems better but for the sake of completeness, using ALL a query could look like:
SELECT x1.folder_id
FROM (SELECT DISTINCT
fp1.folder_id
FROM folder_permission fp1) x1
WHERE 1 = ALL (SELECT fp2.can_read
FROM folder_permission fp2
WHERE fp2.folder_id = x1.folder_id);
If you have a table for the folders themselves replace the derived table (aliased x1) with it.
Note that this only considers users who have a row in folder_permission. If some users have no row in that table, the result may include folders that not every user can actually read.
You can use aggregation:
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING SUM( can_read = 0 ) = 0;
You can also express it as:
SELECT fp.FOLDER_ID
FROM folder_permission fp
GROUP BY fp.FOLDER_ID
HAVING MIN(CAN_READ) = MAX(CAN_READ) AND MIN(CAN_READ) = 1;
If you wanted to return the full matching records, you could try using some exists logic:
SELECT ID, FOLDER_ID, USER_ID, CAN_READ
FROM yourTable t1
WHERE NOT EXISTS (SELECT 1 FROM yourTable t2
WHERE t2.FOLDER_ID = t1.FOLDER_ID AND t2.CAN_READ = 0);
The existence of a matching record in the above exists subquery would imply that there exist one or more users for that folder who do not have read access rights.
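To see that the aggregation and NOT EXISTS variants above agree, they can be run side by side on the question's sample rows. This is a rough sketch using Python's sqlite3 module so it is self-contained; the two rows for folder 50000 are hypothetical additions that give a mixed folder, and the ALL form is left out because SQLite does not implement that operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folder_permission (
    id INTEGER PRIMARY KEY, folder_id INTEGER, user_id INTEGER, can_read INTEGER
);
INSERT INTO folder_permission VALUES
    (1, 34353, 45453, 0),
    (2, 46374, 342532, 1),
    (3, 46374, 32352, 1),
    -- hypothetical extra rows: one reader and one non-reader on the same folder
    (4, 50000, 45453, 1),
    (5, 50000, 32352, 0);
""")

# Variant 1: every can_read in the group is 1 exactly when the minimum is 1.
by_min = [r[0] for r in conn.execute("""
    SELECT folder_id FROM folder_permission
    GROUP BY folder_id HAVING MIN(can_read) = 1
    ORDER BY folder_id
""")]

# Variant 2: count the rows that deny access and require that count to be zero.
by_sum = [r[0] for r in conn.execute("""
    SELECT folder_id FROM folder_permission
    GROUP BY folder_id HAVING SUM(can_read = 0) = 0
    ORDER BY folder_id
""")]

# Variant 3: full rows of folders for which no denying row exists.
full_rows = conn.execute("""
    SELECT id, folder_id, user_id, can_read
    FROM folder_permission t1
    WHERE NOT EXISTS (SELECT 1 FROM folder_permission t2
                      WHERE t2.folder_id = t1.folder_id AND t2.can_read = 0)
    ORDER BY id
""").fetchall()

print(by_min, by_sum, full_rows)
```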

Combine rows from two tables with different columns?

I'm having a hard time wrapping my head around this one. I believe it's happening because I am joining the two separate tables based on the same column (user_id), but I don't know how to fix it because the only thing in common between the two tables IS the user_id column.
Here is the query.
SELECT users_data_existing.`date`,`message`,`action`,`status`,`data`,
users_data_new.`date`,`data_new`
FROM users_data_existing
INNER JOIN users_data_action USING (action_id)
INNER JOIN users_data_status_user USING (status_user_id)
INNER JOIN `users` USING (user_id)
INNER JOIN users_data_new USING (user_id)
INNER JOIN data ON users_data_existing.`data_id` = data.`id`
WHERE users_data_existing.`user_id` = 2
ORDER BY users_data_existing.`date`,users_data_new.`date` DESC
The result is that the users_data_new.date and data_new columns are concatenated or "appended" to the previous rows.
+----------+-----------+-----------+-----------+-----------+----------+-----------+
| date | message | action | status | data | date | data_new |
+----------+-----------+-----------+-----------+-----------+----------+-----------+
|2011-01-01| data | data | data | data |2011-01-02| data_new |
-----------------------------------------------------------------------------------
|2011-01-01| data | data | data | data |2011-01-03| data_new1 |
-----------------------------------------------------------------------------------
REPEATS PATTERN FOR TOTAL RECORDS IN users_data_new TABLE
+----------+-----------+-----------+-----------+-----------+----------+-----------+
| date | message | action | status | data | date | data_new |
+----------+-----------+-----------+-----------+-----------+----------+-----------+
|2011-01-01| data1 | data1 | data1 | data1 |2011-01-02| data_new |
-----------------------------------------------------------------------------------
|2011-01-01| data1 | data1 | data1 | data1 |2011-01-03| data_new1 |
-----------------------------------------------------------------------------------
But that's not what I need. How can I get the last two columns into a separate row? I think a UNION would resolve this but I can't do that because the tables are almost identical but don't share the message column.
As suspected in the question, it was a UNION that I needed. The trick was to create an empty column in users_data_new to match users_data_existing. I also had a challenge with sorting it so I will include that here as well.
(SELECT data_existing.date AS submitdate,status_user.status,action.action,
data.data,data_existing.message
FROM users_data_existing AS data_existing
INNER JOIN users_requested_status_user status_user
ON data_existing.status_user_id = status_user.status_user_id
INNER JOIN users_requested_action action
ON data_existing.action_id = action.action_id
INNER JOIN websites data
ON data_existing.data_id = data.id
ORDER BY data_existing.date DESC) -- sorts sub-query
UNION ALL
(SELECT data_new.date AS submitdate,status_user.status,
action.action,data_new.data_new,'' message -- needed to add this last empty column
FROM users_data_new AS data_new
INNER JOIN users_requested_status_user status_user
ON data_new.status_user_id = status_user.status_user_id
INNER JOIN users_requested_action action
ON data_new.action_id = action.action_id
ORDER BY data_new.date DESC) -- sorts sub-query
ORDER BY submitdate DESC; -- sorts the entire result
Keep in mind that when you alias the date column, the associative array key will be whatever alias name you use, e.g. $result['submitdate'].
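The core of the trick (pad the narrower SELECT with an empty column, alias the date, and sort the combined result once) can be tried on a stripped-down version of the two tables. This is only an illustrative sketch run through Python's sqlite3 module: the joins to the status and action tables are omitted, the sample rows are invented, and sqlite3.Row plays the role of the associative array mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets columns be read by name, like an associative array
conn.executescript("""
CREATE TABLE users_data_existing (date TEXT, data TEXT, message TEXT);
CREATE TABLE users_data_new (date TEXT, data_new TEXT);
-- Hypothetical sample rows.
INSERT INTO users_data_existing VALUES ('2011-01-01', 'data', 'hello');
INSERT INTO users_data_new VALUES
    ('2011-01-02', 'data_new'),
    ('2011-01-03', 'data_new1');
""")

# Both SELECTs must return the same number of columns, so the second one
# supplies '' for the message column that users_data_new does not have.
# The single trailing ORDER BY sorts the combined result.
rows = conn.execute("""
SELECT date AS submitdate, data, message
FROM users_data_existing
UNION ALL
SELECT date AS submitdate, data_new, '' AS message
FROM users_data_new
ORDER BY submitdate DESC
""").fetchall()

for row in rows:
    print(row["submitdate"], row["data"], repr(row["message"]))
```

The column names of the combined result come from the first SELECT, which is why the second branch's data_new values appear under the key data.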

MySQL - updating table based on chronological order

I have two tables as below:
logs
id | user | log_id
---------------------
1 | user1 | abc
2 | user2 | def
3 | user1 | xyz
...
users
id | user | code
---------------
1 | user1 | 1234
2 | user2 | 9876
3 | user1 | 5678
...
I want to add log_id to users and update it with log_id's from Table1, to make Table2 as below:
id | user | code | log_id
---------------------------
1 | user1 | 1234 | abc
2 | user2 | 9876 | def
3 | user1 | 5678 | xyz
...
The only way to match rows in logs and users is using the user field, and the chronological order they appear in the tables. id, as you may have guessed, is the primary key in both tables.
Much appreciated if someone could help me with the query for this. Thanks.
If the id fields are always matched then the reply by Ronak Shah would be my choice.
If the ids do not match then possibly something like this:-
Firstly:-
ALTER TABLE table2 ADD COLUMN log_id VARCHAR(25);
Then an update like this:-
UPDATE table2
INNER JOIN
(
SELECT id, user, code, @rank2:=IF(@prev_user2 = user, @rank2+1, 1) AS rank, @prev_user2 := user
FROM table2
CROSS JOIN (SELECT @rank2:=0, @prev_user2:='') sub2
ORDER BY user, id
) tab_2
ON table2.id = tab_2.id
INNER JOIN
(
SELECT id, user, log_id, @rank1:=IF(@prev_user1 = user, @rank1+1, 1) AS rank, @prev_user1 := user
FROM table1
CROSS JOIN (SELECT @rank1:=0, @prev_user1:='') sub1
ORDER BY user, id
) tab_1
ON tab_1.user = tab_2.user
AND tab_1.rank = tab_2.rank
SET table2.log_id = tab_1.log_id;
What this does is use a pair of subqueries that each add a rank to one table's records (I have computed the rank within each user, which should make it cope a bit better if one user has an extra record in one of the tables). The results of these subqueries are joined together, and then joined to table2 to do the actual update (the subquery that ranks table2 can be joined back to table2 on id).
This seems to work when done in SQL fiddle:-
http://www.sqlfiddle.com/#!2/ad8a6b/1
Try this:
UPDATE Table2 A
INNER JOIN Table1 B
ON A.user = B.user
SET A.log_id = B.log_id;
But first you have to add the log_id column to table2 with an ALTER query.
try this:
alter table table1 add column code varchar(100);
update table1,table2 set table1.code = table2.code where table1.id=table2.id and table1.user=table2.user;
I figured out the solution. I added 2 columns rank and prev_user in both tables, and incremented the value for rank from 1 for the first record for user_x to n for the nth record for user_x, as below:
ALTER TABLE users ADD COLUMN rank tinyInt(1);
ALTER TABLE users ADD COLUMN prevuser varchar(50);
SET @prevuser = '';
SET @rank = 0;
UPDATE users
SET rank = (@rank:=IF(@prevuser != user,1,@rank+1)),
prevuser = (@prevuser := user)
ORDER BY user,id;
ALTER TABLE users DROP COLUMN prevuser;
and,
ALTER TABLE logs ADD COLUMN rank tinyInt(1);
ALTER TABLE logs ADD COLUMN prevuser varchar(50);
SET @prevuser = '';
SET @rank = 0;
UPDATE logs
SET rank = (@rank:=IF(@prevuser != user,1,@rank+1)),
prevuser = (@prevuser := user)
ORDER BY user,id;
ALTER TABLE logs DROP COLUMN prevuser;
Now records can be matched between the tables using user & rank. I added the field log_id to users and updated it as below:
UPDATE users, logs SET users.log_id=logs.log_id WHERE users.user=logs.user AND users.rank = logs.rank;
And voila!
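The same per-user rank matching can also be written without helper columns or user variables by using the ROW_NUMBER() window function, which MySQL 8.0 and SQLite 3.25 and later both provide. Note that this swaps in a different technique from the answers above. The sketch below runs it through Python's sqlite3 module; the logs ids are deliberately changed to 10, 11, 12 (an assumption, not the question's data) to show that only the order within each user matters, and the alias rn is used because RANK is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs  (id INTEGER PRIMARY KEY, user TEXT, log_id TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, user TEXT, code TEXT);
INSERT INTO logs VALUES (10, 'user1', 'abc'), (11, 'user2', 'def'), (12, 'user1', 'xyz');
INSERT INTO users VALUES (1, 'user1', '1234'), (2, 'user2', '9876'), (3, 'user1', '5678');
ALTER TABLE users ADD COLUMN log_id TEXT;
""")

# Number each table's rows per user in id order, then copy the log_id of the
# log row that has the same user and the same position.
conn.execute("""
WITH u AS (
    SELECT id, user,
           ROW_NUMBER() OVER (PARTITION BY user ORDER BY id) AS rn
    FROM users
),
l AS (
    SELECT log_id, user,
           ROW_NUMBER() OVER (PARTITION BY user ORDER BY id) AS rn
    FROM logs
)
UPDATE users
SET log_id = (
    SELECT l.log_id
    FROM u JOIN l ON l.user = u.user AND l.rn = u.rn
    WHERE u.id = users.id
)
""")

result = conn.execute("SELECT id, user, log_id FROM users ORDER BY id").fetchall()
print(result)
```

A users row with no log at the same position ends up with a NULL log_id, because the subquery then returns no row.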

Trying to delete duplicate rows based on a hash in MySQL

I'm trying to delete duplicate values (which will all have the same nid) based on the hash value.
I'm going to leave the initial (oldest) nid row with the same hash.
For some reason, I get the error "You can't specify target table 'node_revision' for update in FROM clause".
I'm trying to alias my tables, but that doesn't seem to work - what am I doing wrong?
delete from node_revision
WHERE nid NOT IN(SELECT MIN(nid) FROM node_revision GROUP BY hash)
(timestamp is just for illustration, don't actually want this used in any queries)
| nid | hash | timestamp |
| 2 | 123456 | 123364600 |
| 2 | 123456 | 123364601 |
| 2 | 1234567 | 123364602 |
Rows 1 and 3 would survive in this case.
You can phrase this as a left join:
delete nr from node_revision nr left join
(SELECT MIN(nid) as minnid
FROM node_revision
GROUP BY hash
) nrkeep
on nr.nid = nrkeep.minnid
where nrkeep.minnid is null;
You can also "trick" MySQL into using the subquery:
DELETE FROM node_revision
WHERE nid NOT IN (SELECT minnid
FROM (SELECT MIN(nid) as minnid FROM node_revision GROUP BY hash
) t
);
MySQL has a well-documented limitation on using the modified table in update and delete statements. This query gets around the limitation by actually materializing the list of minnids by using a subquery.
EDIT:
Based on the example now in the question, you should use timestamp as follows:
delete nr from node_revision nr left join
(SELECT hash, nid, min(timestamp) as mintimestamp
FROM node_revision
GROUP BY hash, nid
) nrkeep
on nr.hash = nrkeep.hash and
nr.nid = nrkeep.nid and
nr.timestamp = nrkeep.mintimestamp
where nrkeep.mintimestamp is null;
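The derived-table wrapping can be combined with the timestamp idea: compute the oldest timestamp per (nid, hash) in an inner subquery, then delete every row that is not one of those. The sketch below is a hedged illustration run through Python's sqlite3 module on the question's three rows. SQLite does not have MySQL's target-table restriction, so this only demonstrates which rows survive; it also assumes the timestamp is unique within each (nid, hash) group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node_revision (nid INTEGER, hash TEXT, timestamp INTEGER);
INSERT INTO node_revision VALUES
    (2, '123456',  123364600),
    (2, '123456',  123364601),
    (2, '1234567', 123364602);
""")

# The inner SELECT finds the oldest row per (nid, hash). Wrapping it in a
# derived table (aliased keep) is the step that makes MySQL materialize it.
conn.execute("""
DELETE FROM node_revision
WHERE (nid, hash, timestamp) NOT IN (
    SELECT nid, hash, min_ts
    FROM (SELECT nid, hash, MIN(timestamp) AS min_ts
          FROM node_revision
          GROUP BY nid, hash) AS keep
)
""")

remaining = conn.execute(
    "SELECT nid, hash, timestamp FROM node_revision ORDER BY timestamp"
).fetchall()
print(remaining)
```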

Combining two tables in a complex way

The situation:
I have the main table; let's call it MainTable.
+---------+----------+----------+----------+
| Id (PK)| Title | Text | Type |
+---------+----------+----------+----------+
| 1 | Some Text|More Stuff| A |
| 2 | Another | Example | B |
+---------+----------+----------+----------+
And I have a second table called TranslationsTable, in which the Id field holds the Id of the MainTable row (there is no foreign key, as it can refer to different tables), ObjType is the object type (same name as the table), FieldName is the name of the field in the ObjType table, and Value holds the translation of that field's value.
+---------+-----------+-----------+------------+----------+
| Id | ObjType | FieldName | Value | Language |
+---------+-----------+-----------+------------+----------+
| 1 | MainTable | Title | Algum Texto| PT |
| 1 | MainTable | Text | Mais Coisas| PT |
+---------+-----------+-----------+------------+----------+
And because I need to search in translated fields, I figured I could use a TEMPORARY TABLE to do so, but then came the problem of "Which SELECT query should I use?". I read some posts about pivot table queries, but I don't really know how I can build a query so that my temp table looks something like:
+---------+------------+------------+----------+
| Id (PK)| Field_1 | Field_2 | Field_3 |
+---------+------------+------------+----------+
| 1 | Algum Texto| Mais Coisas| A |
+---------+------------+------------+----------+
Thank you.
EDIT:
I accepted AD7six's answer because, for 500,000 entries in MainTable and 1,500,000 in the translations table, it is roughly 30 times faster than the other one.
SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,Field_1,Field_2,Field_3
FROM TranslationsTable
PIVOT(MIN(Value) FOR FieldName IN (Field_1,Field_2,Field_3)) p
WHERE ObjType = 'MainTable'
) xlate ON (orig.Id = xlate.Id)
If you want to include the (untranslated) rows from MainTable that have no matches in TranslationsTable, change the INNER JOIN to LEFT OUTER JOIN.
Another alternative is to perform the pivot manually:
SELECT
orig.Id,
COALESCE(xlate.Field_1, orig.Field_1) AS Field_1,
COALESCE(xlate.Field_2, orig.Field_2) AS Field_2,
COALESCE(xlate.Field_3, orig.Field_3) AS Field_3
FROM MainTable orig
INNER JOIN (
SELECT
Id,
MIN(CASE FieldName WHEN 'Field_1' THEN Value END) AS Field_1,
MIN(CASE FieldName WHEN 'Field_2' THEN Value END) AS Field_2,
MIN(CASE FieldName WHEN 'Field_3' THEN Value END) AS Field_3
FROM TranslationsTable
WHERE ObjType = 'MainTable'
GROUP BY Id
) xlate ON (orig.Id = xlate.Id)
With a change in the MainTable schema like others have suggested, you won't need the repetition for (Field_1,Field_2,Field_3). It makes the code easier to maintain and modify.
That's not complex
It's just a query with one join per translated field.
That means you can query, sort, or otherwise use it like any other query, e.g. (using some real names so that it's easier to read):
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
FROM
products
LEFT JOIN
TranslationsTable AS product_name
ON (
product_name.Language = 'PT' AND
product_name.ObjectType = 'products' AND
product_name.FieldName = 'name' AND
product_name.id = products.id
)
LEFT JOIN
TranslationsTable AS product_description
ON (
product_description.Language = 'PT' AND
product_description.ObjectType = 'products' AND
product_description.FieldName = 'description' AND
product_description.id = products.id
)
WHERE
product_name.value = "Algum Texto" -- Find all products named "Algum Texto"
You don't need a temp table
But if you want to create one, it's easy to do using the query itself:
CREATE TABLE
products_pt
AS
SELECT
products.id,
COALESCE(product_name.value, products.name) as name,
COALESCE(product_description.value, products.description) as description
...
This will create a table (no indexes) matching the structure of the query. If your data does not change frequently it can make querying your multilingual data a lot easier to manage, but has some disadvantages such as (obviously) your translation-specific table will not be up to date if the source table data changes.
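To make the fallback behavior concrete, here is a small sketch of the manual pivot applied to the question's own MainTable and TranslationsTable rows, using LEFT JOIN so that untranslated rows keep their original text. It runs through Python's sqlite3 module to stay self-contained; the Language = 'PT' filter and the choice to pivot only Title and Text are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MainTable (Id INTEGER PRIMARY KEY, Title TEXT, Text TEXT, Type TEXT);
CREATE TABLE TranslationsTable (
    Id INTEGER, ObjType TEXT, FieldName TEXT, Value TEXT, Language TEXT
);
INSERT INTO MainTable VALUES
    (1, 'Some Text', 'More Stuff', 'A'),
    (2, 'Another',   'Example',    'B');
INSERT INTO TranslationsTable VALUES
    (1, 'MainTable', 'Title', 'Algum Texto', 'PT'),
    (1, 'MainTable', 'Text',  'Mais Coisas', 'PT');
""")

# The subquery turns one row per translated field into one row per Id.
# COALESCE falls back to the original value when no translation exists.
rows = conn.execute("""
SELECT m.Id,
       COALESCE(x.Title, m.Title) AS Title,
       COALESCE(x.Text,  m.Text)  AS Text,
       m.Type
FROM MainTable m
LEFT JOIN (
    SELECT Id,
           MIN(CASE FieldName WHEN 'Title' THEN Value END) AS Title,
           MIN(CASE FieldName WHEN 'Text'  THEN Value END) AS Text
    FROM TranslationsTable
    WHERE ObjType = 'MainTable' AND Language = 'PT'
    GROUP BY Id
) x ON x.Id = m.Id
ORDER BY m.Id
""").fetchall()

for row in rows:
    print(row)
```

Row 1 comes back translated, while row 2, which has no entries in TranslationsTable, keeps its original Title and Text.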