MySQL update from another table with multiple entries

I have seen a bunch of helpful answers about updating table values from a different table with multiple values based on a timestamp using a MAX() subquery.
e.g. Update another table based on latest record
I was wondering how this compares with doing an ALTER first and relying on the order in the table to simplify the UPDATE. Something like this:
ALTER TABLE `table_with_multiple_data` ORDER BY `timestamp` DESC;
UPDATE `table_with_single_data` as `t1`
LEFT JOIN `table_with_multiple_data` AS `t2`
ON `t1`.`id`=`t2`.`t1id`
SET `t1`.`value` = `t2`.`value`;
(Apologies for the pseudocode but I hope you get what I'm asking)
Both achieve the same result for me, but I don't really have a big enough data set to see any difference in speed.
Thanks!!

You would normally use a correlated subquery:
UPDATE table_with_single_data t1
SET t1.value = (select t2.value
from table_with_multiple_data t2
where t2.t1id = t1.id
order by t2.timestamp desc
limit 1
);
If your method happens to work, that is just happenstance. Even if MySQL respected the ordering of tables, such ordering would not survive the join operation. Not to mention that there is no guarantee about *which* value is assigned when there are multiple matching rows.
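For comparison, the MAX() approach mentioned in the question can be written as a join against a derived table. This is only a sketch: the alias names `latest` and `latest_ts` are made up for illustration, and it assumes each (t1id, timestamp) pair is unique in table_with_multiple_data.

```sql
-- Sketch only: assumes each (t1id, `timestamp`) pair is unique in
-- table_with_multiple_data; ties would bring back the arbitrary assignment.
UPDATE table_with_single_data AS t1
JOIN (
    -- One row per t1id, carrying the newest timestamp for that id.
    SELECT t1id, MAX(`timestamp`) AS latest_ts
    FROM table_with_multiple_data
    GROUP BY t1id
) AS latest
    ON latest.t1id = t1.id
JOIN table_with_multiple_data AS t2
    ON t2.t1id = latest.t1id
   AND t2.`timestamp` = latest.latest_ts
SET t1.value = t2.value;
```

One behavioral difference is worth noting: the correlated subquery sets value to NULL for rows of table_with_single_data that have no match at all, while this join version leaves those rows untouched.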

Related

Mysql: How reliable are values provided when rows are grouped?

I think this is a relatively advanced question and I may have trouble asking it well, so apologies in advance for any babbling.
I love MySQL's grouping functions. MIN(), MAX(), etc. make it easy to group rows by a certain common factor, then fetch salient features of each pool of grouped rows. But the question I'm asking relates to cases where I do not want this behavior to happen; rather, in a particular situation, I want to ensure that when I group a set of (let's say 10) rows into a single row, for any values that vary from row to row, all values displayed in the resultant grouped row were derived from the same pre-grouped row. My question: is this possible? Are there potholes I should look out for?
Let me share a bit of this query's structure. At core, it has a "parent" table (here t1) joined to a "child" table (here t2). The query results, prior to any grouping or sorting, may list the same t1 record multiple times, associated with different t2 records and values. I want the final output to be grouped such that each t1 record only appears once, and that the t2 values displayed in each row reflect the t2 record that had the highest priority (among all t2 records associated with that t1 record). See my dumbed-down query below for example.
Based on my experimentation, it seems that nested queries should be able to do this, where I ORDER first, then GROUP later. The GROUP operation seems to reliably preserve the values from the first row it came across, meaning that if I ORDER then GROUP, I should have reasonable control over which values are included in the grouped output.
Here's an example of the query structure I'm planning. My question: Am I missing anything? Have you experienced GROUP to behave in ways that might make this a bad plan for me? Can you think of a simpler way to achieve what I'm describing?
Thanks in advance!
SELECT * FROM (
SELECT
# Each record from t1 may only appear once in the final output.
t1.id, t1.field2, t1.field3, t1.field4,
# there are multiple t2 records (each having different values & priority)
# associated with each t1 record.
t2.id AS t2_id, t2.field5, t2.field6, t2.priority
FROM t1
JOIN t2 ON t1.id = t2.t1_id
{ several other joins }
WHERE { lots of conditions }
ORDER BY t2.priority ) t
GROUP BY t.priority
It's not reliable at all. The DBMS does not specify which row will be returned in the case you describe. What's more, this is a MySQL-only feature; in standard SQL it is invalid to select non-aggregated columns that are not named in the GROUP BY. Further explanation of this behavior can be found in this manual page:
However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause. Sorting of the result set
occurs after values have been chosen, and ORDER BY does not affect
which values within each group the server chooses.
There's another way to get the right result that would work in any DBMS. Taking your original query, it would look something like this.
SELECT
t1.id, t1.field2, t1.field3, t1.field4,
t2.id AS t2_id, t2.field5, t2.field6, t2.priority
FROM t1
JOIN t2 ON t1.id = t2.t1_id AND t2.priority =
(Select Max(t2b.priority) From t2 AS t2b Where t1.id = t2b.t1_id)
{ several other joins }
WHERE { lots of conditions }
(I assumed there's only one row in t2 per (t1_id, priority) pair.)
Hope it helps!
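If the one-row-per-priority assumption does not hold, a window function is another option on MySQL 8.0 or later. The following is a sketch, not a drop-in replacement: it leaves out the other joins and WHERE conditions that the question elides, the names `ranked` and `rn` are made up, and breaking ties on t2.id is an assumption.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
-- The t2.id tie-breaker is an assumption; use whatever rule suits the data.
SELECT id, field2, field3, field4, t2_id, field5, field6, priority
FROM (
    SELECT
        t1.id, t1.field2, t1.field3, t1.field4,
        t2.id AS t2_id, t2.field5, t2.field6, t2.priority,
        -- Number the t2 rows of each t1 record, highest priority first.
        ROW_NUMBER() OVER (
            PARTITION BY t1.id
            ORDER BY t2.priority DESC, t2.id
        ) AS rn
    FROM t1
    JOIN t2 ON t1.id = t2.t1_id
) AS ranked
WHERE rn = 1;
```

Because every column in a kept row comes from the same joined row, this gives the "all values from the same pre-grouped row" guarantee that GROUP BY cannot.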

Is this query well written? I am fairly new at this and am wondering if there is a better way to write it

UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
My table has about 975,000 rows in it and I know this will take a while no matter what. Is there any better way to write this?
Thanks!
If the standard case is that table1.var2 is already equal to table2.var2, you may end up with an inflated write count, as the database may still update all those rows with no functional change in value.
You may get better performance by updating only those rows which have a different value than the one you desire.
UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
WHERE (table1.var2 is null and table2.var2 is not null OR
table1.var2 is not null and table2.var2 is null OR
table1.var2 <> table2.var2)
Edit: Never mind... MySQL only updates on actual changes, unlike some other RDBMSs (MS SQL, for example).
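If you still want the filter, for example to make the intent explicit, the three-way NULL check can probably be shortened with MySQL's NULL-safe equality operator `<=>`. A sketch using the same tables and columns as above:

```sql
-- <=> treats two NULLs as equal and never returns NULL itself,
-- so NOT (a <=> b) is true exactly when the two values differ.
UPDATE table1
INNER JOIN table2
    ON table1.var1 = table2.var1
SET table1.var2 = table2.var2
WHERE NOT (table1.var2 <=> table2.var2);
```

Note that `<=>` is MySQL-specific; standard SQL spells the same test as IS DISTINCT FROM.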
Your query:
UPDATE table1 INNER JOIN
table2
ON table1.var1 = table2.var1
SET table1.var2 = table2.var2;
A priori, this looks fine. The major issue that I can see would be a 1-many relationship from table1 to table2. In that case, multiple rows from table2 might match a given row from table1. MySQL assigns an arbitrary value in such a case.
You could fix this by choosing one value, such as the min():
UPDATE table1 INNER JOIN
(select var1, min(var2) as var2
from table2
group by var1
) t2
ON table1.var1 = t2.var1
SET table1.var2 = t2.var2;
For performance reasons, you should have an index on table2(var1, var2). By including both columns in the index, the query will be able to use the index only and not have to fetch rows directly from the table.
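A sketch of what that index might look like; the index name `idx_table2_var1_var2` is made up. Running EXPLAIN on the grouping subquery is one way to check whether the index is actually covering.

```sql
-- Composite index so the MIN(var2)-per-var1 subquery can be answered
-- from the index alone.
CREATE INDEX idx_table2_var1_var2 ON table2 (var1, var2);

-- If the index is used as a covering index, the Extra column of the
-- plan should mention "Using index".
EXPLAIN
SELECT var1, MIN(var2) AS var2
FROM table2
GROUP BY var1;
```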

Use ORDER BY 'x' with a JOIN, but keep rows that don't have a value for 'x'

This is a simplified version of a relatively complex problem that my colleagues and I can't quite get our heads around.
Consider two tables, table_a and table_b. In our CMS table_a holds metadata for all the data stored in the database, and table_b has some more specific information, so for simplicity's sake, a title and date column.
At the moment our query looks like:
SELECT *
FROM `table_a` LEFT OUTER JOIN `table_b` ON (table_a.id = table_b.id)
WHERE table_a.col = 'value'
ORDER BY table_b.date ASC
LIMIT 0,20
This degrades badly when table_a has a large number of rows. If the JOIN is changed to a RIGHT OUTER JOIN (which triggers MySQL to use the INDEX set on table_b.date), the query is far quicker, but it doesn't produce the same results (because any row for which table_b.date has no value is ignored).
This becomes an issue in our CMS because if the user sorts on the date column, any rows that don't have a date set yet disappear from the interface, which creates a confusing UI experience and makes it difficult to add dates for the rows that are missing them.
Is there a solution that will:
1. Use table_b.date's INDEX so that the query will scale better
2. Somehow retain those rows in table_b that don't have a date set so that a user can enter the data
I'm going to second ArtoAle's comment. Since the ORDER BY applies to a NULL value in the outer join for missing rows in table_b, those rows will be out of order anyway.
The simulated outer join is the ugly part, so let's look at that first. MySQL doesn't have EXCEPT, so you need to write the query in terms of EXISTS.
SELECT table_a.col1, table_a.col2, table_a.col3, ... NULL as table_b_col1, NULL as ...
FROM
table_a
WHERE
NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.id);
This should be UNION ALLed with the original query rewritten as an inner join. UNION ALL is needed to preserve the original order.
This sort of query is probably going to be dog-slow no matter what you do, because there won't be an index that readily supports a "Foreign Key not present" sort of query. This basically boils down to an index scan in table_a.id with a lookup (Or maybe a parallel scan) for the corresponding row in table_b.id.
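Put together, the idea might look something like the sketch below. The column list (id, col, date) and the aliases `a`, `b`, and `b_date` are illustrative only, and it assumes at most one table_b row per table_a row. Each branch is limited to 20 rows, so the final sort only has to handle at most 40 rows.

```sql
-- Sketch only: column list and aliases are illustrative.
(
    -- Rows that have a table_b match; this branch can use the date index.
    SELECT a.id, a.col, b.date AS b_date
    FROM table_a AS a
    INNER JOIN table_b AS b ON a.id = b.id
    WHERE a.col = 'value'
    ORDER BY b.date ASC
    LIMIT 20
)
UNION ALL
(
    -- Rows with no table_b match at all; their date is NULL.
    SELECT a.id, a.col, NULL AS b_date
    FROM table_a AS a
    WHERE a.col = 'value'
      AND NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
    LIMIT 20
)
-- MySQL sorts NULLs first in ascending order, matching the LEFT JOIN query.
ORDER BY b_date ASC
LIMIT 20;
```

This covers the first page only. For later pages, each branch would need to fetch offset + 20 rows before the outer LIMIT is applied.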
So we ended up implementing a different solution. While the results were not as good as using an INDEX, it still provided a nice speed boost of around 25%.
We removed the JOIN and instead used an ORDER BY subquery:
SELECT *
FROM `table_a`
WHERE table_a.col = 'value'
ORDER BY (
SELECT date
FROM table_b
WHERE id = table_a.id
) ASC
LIMIT 0,20

MYSQL: subquery into a table updated in the main query

I'd like to do something like this:
In table TAGS, find a row with name='someName' and remember its id
In the same table, find another row matching someCondition and set its reference column to the id from above
Tried to do this using a subquery, but mysql refused saying I can't subquery a table that I'm updating in the main query.
How can I otherwise implement the above idea?
Thank you
Convert your subquery to a join and then UPDATE:
You can also perform UPDATE operations covering multiple tables. However, you cannot use ORDER BY or LIMIT with a multiple-table UPDATE. The table_references clause lists the tables involved in the join. Its syntax is described in Section 12.2.8.1, “JOIN Syntax”. Here is an example:
UPDATE items,month SET items.price=month.price
WHERE items.id=month.id;
The preceding example shows an inner join that uses the comma operator, but multiple-table
UPDATE statements can use any type of join permitted in SELECT statements, such as LEFT JOIN.
You can do this:
update TAGS set
reference =
(select my_id from
(select id as my_id from TAGS where name='someName')
as SUB_TAGS)
where someCondition;
Not advisable though.
Edit#1 You can avoid the sub-queries altogether -- as taspeotis rightly mentioned, by joining the same table with the criteria. Here goes the code for that:
UPDATE
TAGS t1, TAGS t2
SET
t1.reference = t2.id
WHERE
t2.name = 'someName'
AND
t1.someField = someCondition;
This is a better approach.

How do I update MySQL table, using subselect

How can I write a query to update the table videos and set the value of the field name to 'something' for the row where average is the maximum? Or, alternatively, update the row where average has the second-highest value?
I think the query must look like this:
UPDATE videos
SET name = 'something'
WHERE average IN (SELECT `average`
FROM `videos`
ORDER BY `average` DESC
LIMIT 1)
but it doesn't work!
Two things here cause problems with my version of MySQL (5.0.84):
1. Using LIMIT is not supported in this kind of subquery
2. Using the table being updated (videos) in a subquery
I can't think of a good way to get around these problems. I'd suggest pulling the ids of the rows you want to update out into your code and then executing the update in a second statement. If you are using pure SQL and doing this by hand, then you could always just select into a temp table and then update based on the ids that you insert there.
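The temp-table route could be sketched roughly as follows. The table name `top_video_ids` is made up, and the sketch assumes videos has an id column that identifies each row.

```sql
-- Step 1: capture the id(s) to update in a temporary table.
CREATE TEMPORARY TABLE top_video_ids AS
    SELECT id FROM videos ORDER BY average DESC LIMIT 1;

-- Step 2: update by joining against the captured ids, so the UPDATE
-- no longer reads videos in a subquery.
UPDATE videos
INNER JOIN top_video_ids ON top_video_ids.id = videos.id
SET videos.name = 'something';

-- Step 3: clean up (the table would also vanish when the session ends).
DROP TEMPORARY TABLE top_video_ids;
```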
UPDATE videos
SET name = 'something'
WHERE videos.id IN (SELECT id
FROM `videos`
ORDER BY `average` DESC
LIMIT 1)
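As written, the statement above still combines LIMIT with an IN subquery on the table being updated. A commonly used workaround is to wrap the inner SELECT in a derived table, which MySQL materializes before running the UPDATE. This is a sketch; the alias `top_video` is made up, and whether it behaves the same on a release as old as 5.0.84 is an assumption.

```sql
UPDATE videos
SET name = 'something'
WHERE id IN (
    SELECT id FROM (
        -- The derived table hides both the LIMIT and the
        -- self-reference from the outer UPDATE.
        SELECT id
        FROM videos
        ORDER BY average DESC
        LIMIT 1
    ) AS top_video
);
```

To target the row with the second-highest average instead, the inner query could use LIMIT 1 OFFSET 1. Keep in mind that if several rows share the same average, which one is picked is arbitrary.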