Updating 10 million rows while having a check with another table

Updating 10 million rows while having a check with another table - mysql

There are 12 million rows in table A and 10 million in table B.
Now both these table have a common field, say user_id.
Now I've added a column in table A to add the primary key of B.
So the Tables are something like this
Table A
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| b_id | int (11) | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Table B
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
Now I want to update the b_id in table A. In order to do that, I had written the following query:
update A
set A.b_id = (select B.id from B
and A.user_id = B.user_id
);
But even after indexing it and doing it in a chunks of 100K, its taking a really long time(around 3 min each).
Is there a better and faster way to update it?

update A
SET A.b_id = B.id
FROM Table A
INNER JOIN TABLE B ON A.user_id = B.user_id
Make sure you have indexes setup on the user_id columns.
Can't comment yet: JOINS are historically faster in Mysql as long as there isn't duplicated data.

Related

SQL, delete only if exactly one row is found

I have a nested query that deletes a row in table terms only if exactly one row in definitions.term_id is found. It works but it takes like 9 seconds on my system. Im looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, it takes about 0.1 each
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+

To speed up the process, please consider creating an index on the relevant field:
CREATE INDEX term_id ON terms (term_id)

How about using correlated sub query using exists and try,
DELETE FROM terms t
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = t.term_id
GROUP BY term_id
HAVING COUNT(term_id) = 1)

It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id

Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.

MySQL - UPDATE one column based on results of a SELECT when the SELECT returns multiple columns

I've read MySQL - UPDATE query based on SELECT Query and am trying to do something similar - i.e. run an UPDATE query on a table and populate it with the results from a SELECT.
In my case the table I want to update is called substances and has a column called cas_html which is supposed to store CAS Numbers (chemical codes) as a HTML string.
Due to the structure of the database I am running the following query which will give me a result set of the substance ID and name (substances.id, substances.name) and the CAS as a HTML string (cas_values which comes from cas.value):
SELECT s.`id`, GROUP_CONCAT(c.`value` ORDER BY c.`id` SEPARATOR '<br>') cas_values, GROUP_CONCAT(s.`name` ORDER BY s.`id`) substance_name FROM substances s LEFT JOIN cas_substances cs ON s.id = cs.substance_id LEFT JOIN cas c ON cs.cas_id = c.id GROUP BY s.id;
Sample output:
id | cas_values | substance_name
----------------------------------------
1 | 133-24<br> | Chemical A
455-213<br>
21-234
-----|----------------|-----------------
2 999-23 | Chemical B
-----|----------------|-----------------
3 | | Chemical C
-----|----------------|-----------------
As you can see the cas_values column contains the HTML string (which may also be an empty string as in the case of "Chemical C"). I want to write the data in the cas_values column into substances.cas_html. However I can't piece together how to do this because other posts I'm reading get the data for the UPDATE in one column - I have other columns returned by my SELECT query.
Essentially the problem is that in my "sample output" table above I have 3 columns being returned. Other SO posts seem to have just 1 column being returned which is the actual values that are used in the UPDATE query (in this case on the substances table).
Is this possible?
I am using MySQL 5.5.56-MariaDB
These are the structures of the tables, if this helps:
mysql> DESCRIBE substances;
+-------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| app_id | varchar(8) | NO | UNI | NULL | |
| name | varchar(1500) | NO | | NULL | |
| date | date | NO | | NULL | |
| cas_html | text | YES | | NULL | |
+-------------+-----------------------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
mysql> DESCRIBE cas;
+-------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------------------+------+-----+---------+----------------+
| id | mediumint(8) unsigned | NO | PRI | NULL | auto_increment |
| value | varchar(13) | NO | UNI | NULL | |
+-------+-----------------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> DESCRIBE cas_substances;
+--------------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| cas_id | mediumint(8) unsigned | NO | MUL | NULL | |
| substance_id | mediumint(8) unsigned | NO | MUL | NULL | |
+--------------+-----------------------+------+-----+---------+----------------+
3 rows in set (0.02 sec)

Try something like this :
UPDATE substances AS s,
(
SELECT s.`id`,
GROUP_CONCAT(c.`value` ORDER BY c.`id` SEPARATOR '<br>') cas_values,
GROUP_CONCAT(s.`name` ORDER BY s.`id`) substance_name
FROM substances s
LEFT JOIN cas_substances cs ON s.id = cs.substance_id
LEFT JOIN cas c ON cs.cas_id = c.id
GROUP BY s.id
) AS t
SET s.cas_html=t.cas_values
WHERE s.id = t.id
If you don't want to modify all the value, the best way to limit the update to test it, is to add a condition in the where, something like that :
...
WHERE s.id = t.id AND s.id = 1

SQL join not working (or very slow)

I have the following tables in mysql:
Table A:
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| sid | varchar(50) | YES | | NULL | |
| type | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
Table B:
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| channel | varchar(20) | YES | | NULL | |
| sid | varchar(50) | YES | | NULL | |
| type | varchar(20) | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
I want to find the rows from A that have an entry in B with the same sid. I tried the following Join command:
SELECT A.sid FROM A join B on A.sid=B.sid;
This query never gives me the answer.
Tabe A has 465420 entries and table B has 291326 entries.
Why does it not work?
Are there too many entries?
Or does it have anything to do with the fact that I have no primary keys assigned?

Your query is fine. You would appear to need an index. I would suggest B(sid).
You can also write the query as:
select a.sid
from a
where exists (select 1 from b where a.sid = b.sid);
This will not affect performance -- unless there are lots of duplicates in b -- but it will eliminate issues caused by duplicates in b.

Try
SELECT A1.sid
FROM (select A.sid from A order by sid) A1
join (select B.sid from B order by sid) B1
on A1.sid=B1.sid;
Else above holds true. You need index.

MySQL merge results into table from count of 2 other tables, matching ids

I've got 3 tables: model, model_views, and model_views2. In an effort to have one column per row to hold aggregated views, I've done a migration to make the model look something like this, with a new column for the views:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, both totalling in the tens of millions of rows each. Each row is representative of one view, and this is a terrible mess for performance. So far, I've got this MySQL command to fetch a count of all the rows representing single views in both of these tables, sorted by model_id added up:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest route for pulling off commands from here on in to merge the values of SUM(c) into the column model.views, matched by the model.id to model_ids that I get out of the above SQL query? I want to only fill the rows for models that still exist - There is probably model_views referring to rows in the model table which have been deleted.

You can just use UPDATE with a JOIN on your subquery:
UPDATE model
JOIN (
SELECT model_views.model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_views2.model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.c

How to join tables so every item in one of them shows up independent of the joined table

I have 2 tables:
mysql> describe solution_sections;
+---------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------+----------------+
| solution_section_id | int(10) | NO | PRI | NULL | auto_increment |
| display_order | int(10) | NO | | NULL | |
| section_name | varchar(1000) | YES | | NULL | |
+---------------------+---------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> describe suggested_solution_comments;
+-----------------------+----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+----------------+------+-----+---------+----------------+
| comment_id | int(10) | NO | PRI | NULL | auto_increment |
| problem_id | int(10) | NO | | NULL | |
| suggested_solution_id | int(10) | NO | | NULL | |
| commenter_id | int(10) | NO | | NULL | |
| comment | varchar(10000) | YES | | NULL | |
| solution_part | int(3) | NO | | NULL | |
| date | date | NO | | NULL | |
+-----------------------+----------------+------+-----+---------+----------------+
What I am trying to do is to display the list of section_name from the solution_sections table. It only has about 10 rows in it. And for every section name, to get the list of suggested_solution_comments associated with it.
The tables are linked by suggested_solution_comments.solution_part and solution_sections.solution_section_id
Here is what I am trying so far:
select section_name , comment , solution_part , display_order from solution_sections
left join suggested_solution_comments on
solution_sections.solution_section_id = suggested_solution_comments.solution_part
where suggested_solution_id = 188
group by display_order;
But that returns nothing when there are no comments. But even if there are no comments, I'd like to still display the list of section_names from the solution_sections table.
Thanks!!

The problem is here:
where suggested_solution_id = 188
Your query requires the suggested_solution_id have a value of 188, which will never be true for records that have no comments. Try adding in this:
OR suggested_solution_id IS NULL

By using suggested_solution_id in the where clausule you are eliminating from the result any row that have no content no matching row in suggested_solution_comments table.
If you want to get results even when suggested_solution_comments has no content you can't use this field in the where clausule. or you have to consider the possibility that suggested_solution_id could be NULL.
EDITED to take in consideration the comment by #X-Zero

I think your table structure is not the best to do this. if you have just a primary key in each table, and want to perform on join on those, it would need to refer to the same thing.. Otherwise, introducing a foreign key able to join on a primary key in the other table.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Updating 10 million rows while having a check with another table - mysql

update A SET A.b_id = B.id FROM Table A INNER JOIN TABLE B ON A.user_id = B.user_id Make sure you have indexes setup on the user_id columns. Can't comment yet: JOINS are historically faster in Mysql as long as there isn't duplicated data.

Related

SQL, delete only if exactly one row is found

MySQL - UPDATE one column based on results of a SELECT when the SELECT returns multiple columns

SQL join not working (or very slow)

MySQL merge results into table from count of 2 other tables, matching ids

How to join tables so every item in one of them shows up independent of the joined table

Categories

Resources