UPDATE MySQL Query taking a long time to execute - ways to optimize - mysql

I have a doubt with the following query.
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
I am trying to update a new column 'woz_val' in the 'src_woz_waardeklasse_2011' table using the 'average' values from the 'waardeklasse_average' table. I am joining using the 'waaderklasse' numbers in both the tables. But the 'src_woz_waardeklasse_2011' table is nearly 7 million records and the 'waardeklasee_average' table is 46 records. So the query is taking a really long time. 25 minutes and counting.
Is there a way to optimize it? I am sure it's taking a long time as I am trying to compare values between a large table and a small table. I have included the table structure of both the tables below.
src_woz_waardeklasse_2011
+----------------------+---------------------------+------+-----+---------+-
---------------+
| Field | Type | Null | Key | Default |
Extra |
+----------------------+---------------------------+------+-----+---------+-
---------------+
| id | int(11) unsigned | NO | PRI | NULL |
auto_increment |
| postcode | varchar(150) | YES | MUL | NULL |
|
| huisnummeraanduiding | varchar(150) | YES | | NULL |
|
| huisletter | varchar(150) | YES | | NULL |
|
soort_woonobject | varchar(150) | YES | | NULL |
|
| bouwjaar | varchar(150) | YES | | NULL |
|
| bouwjaarsklasse | varchar(150) | YES | | NULL |
|
| inhoud | varchar(150) | YES | | NULL |
|
| reg_oppervlak | varchar(150) | YES | | NULL |
|
| woz_value | int(15) unsigned zerofill | YES | UNI | NULL |
|
| reg_oppervlak_bn | varchar(150) | YES | | NULL |
|
| waardeklasse | int(10) | NO | PRI | NULL |
|
| waardepeildatum | varchar(150) | YES | | NULL |
|
| zipandnumber | varchar(150) | YES | | NULL |
|
+----------------------+---------------------------+------+-----+---------+-
---------------+
waardeklasse_average
+-------------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+---------+------+-----+---------+-------+
| waardeklasse_average_id | int(11) | NO | | NULL | |
| waardeklasse_new | int(10) | NO | PRI | NULL | |
| lower | int(11) | NO | | NULL | |
| higher | int(11) | NO | | NULL | |
| average | int(11) | NO | PRI | NULL | |
+-------------------------+---------+------+-----+---------+-------+

update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
Left join waardeklasse_average b
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
Use the above query - in this you can use the left join and where condition to speed up the query.
Also don't use the index for the second table which has 46 record this will affect the performance - if the table has less records then index will create more burden in this case full table scan without index is lot better. Also if you have a cluster index in the first table(7 million) remove them and run the query, because for each and every update you perform default table structure And page alignment will be updated in the clustered index. Hope this helps..!

First, you don't need the where clause. Second, remove the index hints:
update src_woz_waardeklasse_2011 ww inner join
waardeklasse_average wa
on wa.waardeklasse_new = ww.waardeklasse
set ww.woz_value = wb.average ;
Then, try an index on src_woz_waardeklasse_2011(waardeklasse). This should improve the execution plan.
You might also check that the 46 records in the second table actually correspond to 46 updates. If the join conditions are wrong, you could be updating all the records.
EDIT:
Indexes have nothing to do with your problem. Updating 7 million records takes a long time. You might consider creating the data you want in a temporary table and then truncating the original table and inserting the new rows into it. Otherwise, batch the updates a few rows at a time.

update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
Where condition is not needed. As you had already mention that condition in On clause.
Try above query.

Related

SQL, delete only if exactly one row is found

I have a nested query that deletes a row in table terms only if exactly one row in definitions.term_id is found. It works but it takes like 9 seconds on my system. Im looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, it takes about 0.1 each
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed up the process, please consider creating an index on the relevant field:
CREATE INDEX term_id ON terms (term_id)
How about using correlated sub query using exists and try,
DELETE FROM terms t
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = t.term_id
GROUP BY term_id
HAVING COUNT(term_id) = 1)
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.

Improve query performance in MySQL

I am posting this thread in order to have some advices regarding the performance of my SQL query.
I have actually 2 tables, one which called HGVS_SNP with about 44657169 rows and another on run table which has an average of 2000 rows.
When I try to update field Comment of my run table it takes lot's of time to perform the query. I was wondering if there is any method to increase my SQL query.
Structure of HGVS_SNP Table:
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| snp_id | int(11) | YES | MUL | NULL | |
| hgvs_name | text | YES | | NULL | |
| source | varchar(8) | NO | | NULL | |
| upd_time | varchar(32) | NO | | NULL | |
+-----------+-------------+------+-----+---------+-------+
My run table has the following structure:
+----------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| ID | varchar(7) | YES | | NULL | |
| Reference | varchar(7) | YES | MUL | NULL | |
| HGVSvar2 | varchar(120) | YES | MUL | NULL | |
| Comment | varchar(120) | YES | | NULL | |
| Compute | varchar(20) | YES | | NULL | |
+----------------------+--------------+------+-----+---------+-------+
Here's my query:
UPDATE run
INNER JOIN SNP_HGVS
ON run.HGVSvar2=SNP_HGVS.hgvs_name
SET run.Comment=concat('rs',SNP_HGVS.snp_id) WHERE run.Compute not like 'tron'
I`m guessing since you JOIN a text column with a VARCHAR(120) column that you don`t really need a text column. Make it a VARCHAR so you can index it
ALTER TABLE `HGVS_SNP` modify hgvs_name VARCHAR(120);
ALTER TABLE `HGVS_SNP` ADD KEY idx_hgvs_name (hgvs_name);
This will take a while on large tables
Now your JOIN should be much faster,also add an index on compute column
ALTER TABLE `run` ADD KEY idx_compute (compute);
And the LIKE is unnecessary,change it to
WHERE run.Compute != 'tron'

Mysql paginate more than one table

I'm working on an ECommerce website, in which there are 2 database tables in MySQL, one is products and the other one is taxonomies, products and taxonomies are many to many relationship, and taxonomies have a tree structure, meaning there's a parent_id field in taxonomies table to identify the parent id of a taxonomy.
When user selects one taxonomy, I want to get all the products that belong to this taxonomy and all its offspring taxonomies, I did this by first finding out all the offspring taxonomies of the selected taxonomy, then get paginated products result from there, but in my site there are in total 5000 taxonomies, and my solution makes the site slow like a dog...... Any advice on how I could achieve this for the sake of performance?
products table:
+-------------------+----------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+----------------------+------+-----+---------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| code | bigint(20) | NO | UNI | NULL | |
| SKU | varchar(255) | NO | | NULL | |
| name | varchar(100) | NO | | NULL | |
| description | varchar(2000) | NO | | NULL | |
| short_description | varchar(200) | NO | | NULL | |
| price | decimal(8,2) | NO | | 0.00 | |
| discounted_price | decimal(8,2) | NO | | 0.00 | |
| stock | smallint(5) unsigned | NO | | 0 | |
| sales | smallint(5) unsigned | NO | | 0 | |
| num_reviews | smallint(6) | NO | | 0 | |
| weight | decimal(5,2) | NO | | 0.00 | |
| overall_rating | decimal(3,2) | NO | | 5.00 | |
| activity_id | int(10) unsigned | YES | MUL | NULL | |
| created_at | timestamp | NO | | 0000-00-00 00:00:00 | |
| updated_at | timestamp | NO | | 0000-00-00 00:00:00 | |
+-------------------+----------------------+------+-----+---------------------+----------------+
taxonomies table:
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(100) | YES | UNI | NULL | |
| parent_id | int(10) unsigned | YES | MUL | NULL | |
| num_products | smallint(6) | NO | | 0 | |
+--------------+------------------+------+-----+---------+----------------+
product_taxonomy table:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| product_id | int(10) unsigned | NO | MUL | NULL | |
| taxonomy_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
In case depth of single level one can use the following query
SELECT * FROM `product_taxonomy`
INNER JOIN (SELECT * FROM `taxonomies` WHERE `id` = 100 OR `parent_id` = 100) `taxonomies`
ON `product_taxonomy`.`taxonomy_id` = `taxonomies`.`id`
LEFT JOIN `products` ON `product_taxonomy`.`product_id` = `products`.`id`
You can add limit, offset to the above query for pagination.
100 in the above query represents the taxonomy id requested by the user.
Apart from this I would suggest :-
1) id in your product table to renamed if possible to product_id as referenced in your product_taxonomy and I presume in other tables, similarly taxonomy_id.
This way when you join query column name would be the same.
2) I hope product_taxonomy.product_id, product_taxonomy.taxonomy_id are indexed for faster querying.
Update:
What you had mentioned in the comment below is a hierarchical data problem and not what relational database ideally intended for.
Solution 1
IF you know for sure that you will have only 4 levels / generation then you can do 4 join queries.
I can elaborate on this if you need to.
Solution 2
If you are not too deep or committed to the architecture of this project I would recommend restructuring it such a way, where recursion is taken care of by the server side scripting. i.e You change your CMS/taxonomy management in such a way that whenever you add/remove/modify taxonomy the script will update a table called taxonomy_childs with all possible offspring for a given category so that you have a flat data at your disposal when you need it.
Personally I would prefer this. I always like my database to match my business logic requirement.
I can elaborate on this if you need to.
Solution 3
As mentioned earlier hierarchical data is not a strong point of a relational database. Having said that you can implement something called as Nested Set Model.
Please read more at http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
You would need to add 3 columns to your taxonomy table :- level_depth, lft, rht.
Please let me know which solution would you want me to elaborate.

Spliting database according to user's ID

I have a database of 5m rows and it grows and it's getting harder and harder to do operations with it.
Is it a good idea to split the table in 10 tables (v0_table, v1_table... v9_table), where the number(v*) is the first number of the user's id?
The user's id in my case are not auto-increment so it would sort the data evenly across those 10 tables.
The problem is I have never done similar things....
Can anyone spot any disadvantages?
EDIT:
I would appreciate any help with tuning the structure or the query.
So the slowest query is the following one:
SELECT logos.user,
logos.date,
logos.level,
logos.title,
Count(guesses.id),
Sum(guesses.points)
FROM logos
LEFT JOIN guesses
ON guesses.user = '".$user['uid']."'
AND guesses.done = '1'
AND guesses.logo = logos.id
WHERE open = '1'
GROUP BY level
Where guesses table:
+--------+------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| logo | int(11) | NO | MUL | NULL | |
| user | int(11) | NO | MUL | NULL | |
| date | timestamp | NO | | CURRENT_TIMESTAMP | |
| points | int(4) | YES | MUL | 100 | |
| done | tinyint(1) | NO | MUL | 0 | |
+--------+------------+------+-----+-------------------+----------------+
LOGOS table:
+-------+--------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| img | varchar(222) | NO | MUL | NULL | |
| level | int(3) | NO | MUL | NULL | |
| date | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| user | int(11) | NO | MUL | NULL | |
| open | tinyint(1) | NO | MUL | 0 | |
+-------+--------------+------+-----+-------------------+----------------+
EXPLAIN:
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
| 1 | SIMPLE | logos | ref | open | open | 1 | const | 521 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | guesses | ref | done,user,logo | user | 4 | const | 87 | |
+----+-------------+---------+------+----------------+------+---------+-------+------+----------------------------------------------+
Your problem isn't that you have too much data, it's that this data is not properly indexed. Try adding an index:
CREATE INDEX open_level ON logos(open, level)
This should eliminate Using temporary; Using filesort on logos.
Basically, you need an index on this table for this query to cover two things: open - for WHERE open = '1' and level - for GROUP BY level in this order, as MySQL will first filter by open, then will group the results by level (implicitly sorting by it in process).
Short and sweet: No. This is never a good idea. Is your table properly indexed? Is MySQL properly tuned? Are your queries efficient? Are you using any caching?
Instead of sharding your table, you may want to examine other tables in your database to see if they can be split off into other dbs. For example tables, that are never joined to are great candidates for this type of vertical partitioning.
This allows you to optimize hardware for smaller sets of data.

How to join tables so every item in one of them shows up independent of the joined table

I have 2 tables:
mysql> describe solution_sections;
+---------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------+----------------+
| solution_section_id | int(10) | NO | PRI | NULL | auto_increment |
| display_order | int(10) | NO | | NULL | |
| section_name | varchar(1000) | YES | | NULL | |
+---------------------+---------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> describe suggested_solution_comments;
+-----------------------+----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+----------------+------+-----+---------+----------------+
| comment_id | int(10) | NO | PRI | NULL | auto_increment |
| problem_id | int(10) | NO | | NULL | |
| suggested_solution_id | int(10) | NO | | NULL | |
| commenter_id | int(10) | NO | | NULL | |
| comment | varchar(10000) | YES | | NULL | |
| solution_part | int(3) | NO | | NULL | |
| date | date | NO | | NULL | |
+-----------------------+----------------+------+-----+---------+----------------+
What I am trying to do is to display the list of section_name from the solution_sections table. It only has about 10 rows in it. And for every section name, to get the list of suggested_solution_comments associated with it.
The tables are linked by suggested_solution_comments.solution_part and solution_sections.solution_section_id
Here is what I am trying so far:
select section_name , comment , solution_part , display_order from solution_sections
left join suggested_solution_comments on
solution_sections.solution_section_id = suggested_solution_comments.solution_part
where suggested_solution_id = 188
group by display_order;
But that returns nothing when there are no comments. But even if there are no comments, I'd like to still display the list of section_names from the solution_sections table.
Thanks!!
The problem is here:
where suggested_solution_id = 188
Your query requires the suggested_solution_id have a value of 188, which will never be true for records that have no comments. Try adding in this:
OR suggested_solution_id IS NULL
By using suggested_solution_id in the where clausule you are eliminating from the result any row that have no content no matching row in suggested_solution_comments table.
If you want to get results even when suggested_solution_comments has no content you can't use this field in the where clausule. or you have to consider the possibility that suggested_solution_id could be NULL.
EDITED to take in consideration the comment by #X-Zero
I think your table structure is not the best to do this. if you have just a primary key in each table, and want to perform on join on those, it would need to refer to the same thing.. Otherwise, introducing a foreign key able to join on a primary key in the other table.