I have a nested query that deletes a row in table terms only if exactly one row in definitions matches on term_id. It works, but it takes about 9 seconds on my system. I'm looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, each takes about 0.1 seconds.
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed this up, consider creating an index on the relevant column, which is term_id in the definitions table:
CREATE INDEX term_id ON definitions (term_id)
How about trying a correlated subquery with EXISTS:
DELETE FROM terms
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = terms.id
GROUP BY d.term_id
HAVING COUNT(d.term_id) = 1)
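For anyone who wants to check the logic of the EXISTS approach before running it on real data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption, since timings and the optimizer differ), and the sample rows and the index name idx_definitions_term_id are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT);
    CREATE TABLE definitions (
        id INTEGER PRIMARY KEY,
        term_id INTEGER,
        definition TEXT
    );
    -- Hypothetical index name; the point is that term_id is indexed.
    CREATE INDEX idx_definitions_term_id ON definitions (term_id);
    INSERT INTO terms (id, term) VALUES (1234, 'single'), (5678, 'double');
    INSERT INTO definitions (term_id, definition) VALUES
        (1234, 'a'), (5678, 'b'), (5678, 'c');
""")


def delete_if_single_definition(conn, term_id):
    """Delete the term only when exactly one definition references it."""
    cur = conn.execute(
        """
        DELETE FROM terms
        WHERE id = ?
          AND EXISTS (SELECT 1
                      FROM definitions d
                      WHERE d.term_id = terms.id
                      GROUP BY d.term_id
                      HAVING COUNT(*) = 1)
        """,
        (term_id,),
    )
    return cur.rowcount  # number of rows actually deleted


deleted_single = delete_if_single_definition(conn, 1234)  # one definition
deleted_double = delete_if_single_definition(conn, 5678)  # two definitions
remaining = [row[0] for row in conn.execute("SELECT id FROM terms ORDER BY id")]
```

Term 1234 is removed because it has exactly one definition, while 5678 stays because it has two.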
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE t
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.
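Since the question mentions that the two separate queries are fast on their own, another option is to run them from application code: count first, then delete only when the count is 1. This is a rough sketch with Python's sqlite3 module standing in for MySQL (an assumption); the function name delete_term_if_single is made up. On MySQL with concurrent writers you would probably want both statements in one transaction with a locking read, which this sketch does not show.

```python
import sqlite3


def delete_term_if_single(conn, term_id):
    """Count the definitions first, then delete the term if there is exactly one."""
    with conn:  # commits on success, rolls back if an exception is raised
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM definitions WHERE term_id = ?", (term_id,)
        ).fetchone()
        if count != 1:
            return False
        conn.execute("DELETE FROM terms WHERE id = ?", (term_id,))
        return True


# Made-up sample data: term 1234 has one definition, term 5678 has two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, term TEXT);
    CREATE TABLE definitions (id INTEGER PRIMARY KEY, term_id INTEGER);
    INSERT INTO terms (id, term) VALUES (1234, 'single'), (5678, 'double');
    INSERT INTO definitions (term_id) VALUES (1234), (5678), (5678);
""")

first = delete_term_if_single(conn, 1234)
second = delete_term_if_single(conn, 5678)
left = [row[0] for row in conn.execute("SELECT id FROM terms ORDER BY id")]
```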
I have a doubt with the following query.
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
I am trying to update a new column 'woz_val' in the 'src_woz_waardeklasse_2011' table using the 'average' values from the 'waardeklasse_average' table. I am joining on the 'waardeklasse' numbers in both tables. But the 'src_woz_waardeklasse_2011' table has nearly 7 million records and the 'waardeklasse_average' table has 46 records, so the query is taking a really long time: 25 minutes and counting.
Is there a way to optimize it? I am sure it's taking a long time as I am trying to compare values between a large table and a small table. I have included the table structure of both the tables below.
src_woz_waardeklasse_2011
+----------------------+---------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| postcode | varchar(150) | YES | MUL | NULL | |
| huisnummeraanduiding | varchar(150) | YES | | NULL | |
| huisletter | varchar(150) | YES | | NULL | |
| soort_woonobject | varchar(150) | YES | | NULL | |
| bouwjaar | varchar(150) | YES | | NULL | |
| bouwjaarsklasse | varchar(150) | YES | | NULL | |
| inhoud | varchar(150) | YES | | NULL | |
| reg_oppervlak | varchar(150) | YES | | NULL | |
| woz_value | int(15) unsigned zerofill | YES | UNI | NULL | |
| reg_oppervlak_bn | varchar(150) | YES | | NULL | |
| waardeklasse | int(10) | NO | PRI | NULL | |
| waardepeildatum | varchar(150) | YES | | NULL | |
| zipandnumber | varchar(150) | YES | | NULL | |
+----------------------+---------------------------+------+-----+---------+----------------+
waardeklasse_average
+-------------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+---------+------+-----+---------+-------+
| waardeklasse_average_id | int(11) | NO | | NULL | |
| waardeklasse_new | int(10) | NO | PRI | NULL | |
| lower | int(11) | NO | | NULL | |
| higher | int(11) | NO | | NULL | |
| average | int(11) | NO | PRI | NULL | |
+-------------------------+---------+------+-----+---------+-------+
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
Left join waardeklasse_average b
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
Use the above query: it combines the left join with the where condition to speed up the query.
Also, don't use the index hint on the second table, which has only 46 records. With so few rows an index just adds overhead, and a full table scan is likely to be faster. If you have a clustered index on the first table (7 million rows), try removing it before running the query, because the clustered index and its page layout have to be maintained for every row you update. Hope this helps!
First, you don't need the where clause. Second, remove the index hints:
update src_woz_waardeklasse_2011 ww inner join
waardeklasse_average wa
on wa.waardeklasse_new = ww.waardeklasse
set ww.woz_value = wa.average ;
Then, try an index on src_woz_waardeklasse_2011(waardeklasse). This should improve the execution plan.
You might also check that the 46 records in the second table actually correspond to 46 updates. If the join conditions are wrong, you could be updating all the records.
EDIT:
Indexes have nothing to do with your problem. Updating 7 million records takes a long time. You might consider creating the data you want in a temporary table and then truncating the original table and inserting the new rows into it. Otherwise, batch the updates a few rows at a time.
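To illustrate the batching idea, here is a small sketch that applies the update one primary-key range at a time, so each transaction stays short. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), a correlated subquery instead of MySQL's UPDATE ... JOIN syntax, and made-up sample data; the function name update_in_batches and the batch size are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_woz_waardeklasse_2011 (
        id INTEGER PRIMARY KEY,
        waardeklasse INTEGER NOT NULL,
        woz_value INTEGER
    );
    CREATE TABLE waardeklasse_average (
        waardeklasse_new INTEGER PRIMARY KEY,
        average INTEGER NOT NULL
    );
    INSERT INTO waardeklasse_average VALUES (1, 100), (2, 200);
""")
# Ten sample rows with classes 1..3; class 3 has no average on purpose.
conn.executemany(
    "INSERT INTO src_woz_waardeklasse_2011 (id, waardeklasse) VALUES (?, ?)",
    [(i, i % 3 + 1) for i in range(1, 11)],
)
conn.commit()


def update_in_batches(conn, batch_size):
    """Copy the averages into the big table, one id range per transaction."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM src_woz_waardeklasse_2011"
    ).fetchone()
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        with conn:  # one short transaction per batch
            conn.execute(
                """
                UPDATE src_woz_waardeklasse_2011
                SET woz_value = (SELECT b.average
                                 FROM waardeklasse_average b
                                 WHERE b.waardeklasse_new =
                                       src_woz_waardeklasse_2011.waardeklasse)
                WHERE id >= ? AND id < ?
                  AND waardeklasse IN (SELECT waardeklasse_new
                                       FROM waardeklasse_average)
                """,
                (start, start + batch_size),
            )
        batches += 1
    return batches


batches = update_in_batches(conn, batch_size=4)
values = conn.execute(
    "SELECT DISTINCT waardeklasse, woz_value "
    "FROM src_woz_waardeklasse_2011 ORDER BY waardeklasse"
).fetchall()
```

Rows whose waardeklasse has no entry in waardeklasse_average are left untouched, which matches what the inner join version would do.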
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
The WHERE condition is not needed, because you have already specified that condition in the ON clause.
Try the query above.
I have the following (simplified) MySQL tables:
Requests:
+----------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| UniqueIdentifier | varchar(255) | YES | MUL | NULL | |
| UniversalServiceId | bigint(20) | YES | MUL | NULL | |
+----------------------+--------------+------+-----+---------+-------+
Observations:
+---------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| Value | varchar(255) | NO | | NULL | |
| RequestId | bigint(20) | NO | MUL | NULL | |
+---------------------+--------------+------+-----+---------+-------+
I have indexed UniqueIdentifier, UniversalServiceId and RequestId.
The tables are queried on UniqueIdentifier and UniversalServiceId with a JOIN on RequestId.
The Observation table has many millions of records. The queries are painfully slow to return and I am wondering if there is anything that I can do to improve performance. I have just started reading about memcache but it seems that it may be useful only after the first query (which is often the only one) for a particular dataset.
This is the type of query that is being used:
select * from Observations where RequestId in (select ID from Requests where UniqueIdentifier = '123456' and UniversalServiceId = '1234')
Any advice / guidance appreciated!
I recommend you use a query using a JOIN operation, rather than an IN (subquery) predicate.
For example:
SELECT o.ID
, o.Value
, o.RequestId
FROM Observations o
JOIN Requests r
ON r.ID = o.RequestId
WHERE r.UniqueIdentifier = '123456'
AND r.UniversalServiceId = '1234'
For optimum performance, suitable indexes would be:
... ON Requests (UniversalServiceId, UniqueIdentifier, ID)
... ON Observations (RequestId, Value, ID)
(The choice of the leading column in the index on the Requests table would depend on the expected cardinality.)
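As a quick sanity check that the JOIN form returns the same rows as the original IN (subquery) form, here is a small sketch with Python's sqlite3 module. SQLite is only a stand-in for MySQL (an assumption, so it says nothing about MySQL's execution plan), and the index names Requests_IX1 and Observations_IX1 and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Requests (
        ID INTEGER PRIMARY KEY,
        UniqueIdentifier TEXT,
        UniversalServiceId INTEGER
    );
    CREATE TABLE Observations (
        ID INTEGER PRIMARY KEY,
        Value TEXT NOT NULL,
        RequestId INTEGER NOT NULL
    );
    -- Composite indexes in the column order suggested above.
    CREATE INDEX Requests_IX1
        ON Requests (UniversalServiceId, UniqueIdentifier, ID);
    CREATE INDEX Observations_IX1
        ON Observations (RequestId, Value, ID);
    INSERT INTO Requests VALUES
        (1, '123456', 1234), (2, '123456', 9999), (3, '654321', 1234);
    INSERT INTO Observations VALUES
        (10, 'a', 1), (11, 'b', 1), (12, 'c', 2), (13, 'd', 3);
""")

params = ("123456", 1234)

# JOIN form from the answer.
join_rows = conn.execute(
    """
    SELECT o.ID, o.Value, o.RequestId
    FROM Observations o
    JOIN Requests r ON r.ID = o.RequestId
    WHERE r.UniqueIdentifier = ? AND r.UniversalServiceId = ?
    ORDER BY o.ID
    """,
    params,
).fetchall()

# IN (subquery) form from the question, with explicit columns.
in_rows = conn.execute(
    """
    SELECT ID, Value, RequestId
    FROM Observations
    WHERE RequestId IN (SELECT ID FROM Requests
                        WHERE UniqueIdentifier = ? AND UniversalServiceId = ?)
    ORDER BY ID
    """,
    params,
).fetchall()
```

On the real server, running EXPLAIN on both forms is the way to see which indexes MySQL actually picks.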
I've got 3 tables: model, model_views, and model_views2. In an effort to have one column per row to hold aggregated views, I've done a migration to make the model look something like this, with a new column for the views:
+---------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | | NULL | |
| [...] | | | | | |
| views | int(20) | YES | | 0 | |
+---------------+---------------+------+-----+---------+----------------+
This is what the columns for model_views and model_views2 look like:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | smallint(5) | NO | MUL | NULL | |
| model_id | smallint(5) | NO | MUL | NULL | |
| time | int(10) unsigned | NO | | NULL | |
| ip_address | varchar(16) | NO | MUL | NULL | |
+------------+------------------+------+-----+---------+----------------+
model_views and model_views2 are gargantuan, each totalling tens of millions of rows. Each row represents one view, and this is a terrible mess for performance. So far, I've got this MySQL command to fetch a count of all the rows representing single views in both of these tables, grouped by model_id and added up:
SELECT model_id, SUM(c) FROM (
SELECT model_views.model_id, COUNT(*) AS c FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c FROM model_views2
GROUP BY model_views2.model_id)
AS foo GROUP BY model_id
So that I get a nice big table with the following:
+----------+--------+
| model_id | SUM(c) |
+----------+--------+
| 1 | 1451 |
| [...] | |
+----------+--------+
What would be the safest way to proceed from here to merge the values of SUM(c) into the column model.views, matching model.id to the model_ids that I get out of the above SQL query? I want to fill only the rows for models that still exist. There are probably rows in model_views referring to rows in the model table which have been deleted.
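You can just use UPDATE with a JOIN on your subquery. Keep the outer SUM so that each model_id yields a single row; with the bare UNION ALL a model present in both tables would match twice and only one of the two counts would be stored:
UPDATE model
JOIN (
SELECT model_id, SUM(c) AS c FROM (
SELECT model_views.model_id, COUNT(*) AS c
FROM model_views
GROUP BY model_views.model_id
UNION ALL
SELECT model_views2.model_id, COUNT(*) AS c
FROM model_views2
GROUP BY model_views2.model_id) AS foo
GROUP BY model_id) toupdate ON model.id = toupdate.model_id
SET model.views = toupdate.c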
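If you would rather drive this from a script, here is a minimal sketch of the same idea: compute the per-model totals with the UNION ALL plus SUM query, then write them back inside one transaction. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption) with made-up sample rows. model_id 3 plays the role of a model that has been deleted, so its views are simply ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0);
    CREATE TABLE model_views (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
    CREATE TABLE model_views2 (id INTEGER PRIMARY KEY, model_id INTEGER NOT NULL);
    INSERT INTO model (id) VALUES (1), (2);
    -- model_id 3 no longer exists in model (made-up data).
    INSERT INTO model_views (model_id) VALUES (1), (1), (3);
    INSERT INTO model_views2 (model_id) VALUES (1), (2), (3);
""")

# One total per model_id across both view tables.
totals = conn.execute(
    """
    SELECT model_id, SUM(c) FROM (
        SELECT model_id, COUNT(*) AS c FROM model_views GROUP BY model_id
        UNION ALL
        SELECT model_id, COUNT(*) AS c FROM model_views2 GROUP BY model_id
    ) AS foo
    GROUP BY model_id
    ORDER BY model_id
    """
).fetchall()

# Write the totals back in a single transaction; an UPDATE for a
# model_id that is not in model matches no rows and changes nothing.
with conn:
    conn.executemany(
        "UPDATE model SET views = ? WHERE id = ?",
        [(total, model_id) for model_id, total in totals],
    )

result = conn.execute("SELECT id, views FROM model ORDER BY id").fetchall()
```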
I have four tables like this:
mysql> describe courses;
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| course_id | int(11) | NO | PRI | NULL | auto_increment |
| course_name | varchar(75) | YES | | NULL | |
| course_price_id | int(11) | YES | MUL | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
mysql> describe pricegroups;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| price_id | int(11) | NO | PRI | NULL | auto_increment |
| price_name | varchar(255) | YES | | NULL | |
| price_value | int(11) | YES | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> describe courseplans;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| plan_id | int(11) | NO | PRI | NULL | auto_increment |
| plan_name | varchar(255) | YES | | NULL | |
| plan_time | int(11) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
mysql> describe course_to_plan;
+-----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| course_id | int(11) | NO | PRI | NULL | |
| plan_id | int(11) | NO | PRI | NULL | |
+-----------+---------+------+-----+---------+-------+
Let me try to explain what I have and what I would like to do...
Each of my courses (course_id) has different steps (plan_id), each of which has a value of 1 or more days (plan_time). A course has one or more steps (course_to_plan). A course is connected to a pricegroup (price_id).
I would like to query my MySQL database and get an output of:
The course_name, the plan_ids it has, and, based on the value of price_id together with the value in plan_time, a result that looks something like this:
+------------+--------------+------------+---------+
| course_name| pricegroup | plan_time | RESULT |
+------------+--------------+------------+---------+
| Math | Expensive | 7 | 3500 |
+------------+--------------+------------+---------+
I hope you understand me...
Is it even possible with the structure I have or should I "rebuild-and-redo-correct" something?
SELECT c.course_name, p.price_name, SUM(cp.plan_time), SUM(cp.plan_time * p.price_value)
FROM courses c
INNER JOIN pricegroups p ON p.price_id = c.course_price_id
INNER JOIN course_to_plan cpl ON cpl.course_id = c.course_id
INNER JOIN courseplans cp ON cp.plan_id = cpl.plan_id
GROUP BY c.course_name, p.price_name
Please note that it seems to me that your implementation might be erroneous. The way you want the data makes me think that you could be happier with a plan having a price, so you don't apply the same price for a plan which is "expensive" AND another plan which is "cheap", which is what you are doing at the moment. But I don't really know, this is intuitive :-)
Thanks for accepting the answer, regards.
Let me see if I understand what you need:
SELECT c.course_name, pg.price_name,
COUNT(cp.plan_time), SUM(pg.price_value * cp.plan_time) AS result
FROM courses c
INNER JOIN pricegroups pg ON c.course_price_id = pg.price_id
INNER JOIN course_to_plan ctp ON c.course_id = ctp.course_id
INNER JOIN courseplans cp ON ctp.plan_id = cp.plan_id
GROUP BY c.course_name, pg.price_name
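To see that the existing structure can produce the row from the question, here is a small worked example with Python's sqlite3 module standing in for MySQL (an assumption). The sample data is made up: a price_value of 500 and two plans of 3 and 4 days are chosen only so that the total comes out as 7 days and 3500.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (
        course_id INTEGER PRIMARY KEY,
        course_name TEXT,
        course_price_id INTEGER
    );
    CREATE TABLE pricegroups (
        price_id INTEGER PRIMARY KEY,
        price_name TEXT,
        price_value INTEGER
    );
    CREATE TABLE courseplans (
        plan_id INTEGER PRIMARY KEY,
        plan_name TEXT,
        plan_time INTEGER
    );
    CREATE TABLE course_to_plan (
        course_id INTEGER NOT NULL,
        plan_id INTEGER NOT NULL,
        PRIMARY KEY (course_id, plan_id)
    );
    -- Made-up sample data.
    INSERT INTO pricegroups VALUES (1, 'Expensive', 500);
    INSERT INTO courses VALUES (1, 'Math', 1);
    INSERT INTO courseplans VALUES (1, 'Intro', 3), (2, 'Advanced', 4);
    INSERT INTO course_to_plan VALUES (1, 1), (1, 2);
""")

rows = conn.execute(
    """
    SELECT c.course_name,
           pg.price_name,
           SUM(cp.plan_time) AS plan_time,
           SUM(cp.plan_time * pg.price_value) AS result
    FROM courses c
    INNER JOIN pricegroups pg ON pg.price_id = c.course_price_id
    INNER JOIN course_to_plan ctp ON ctp.course_id = c.course_id
    INNER JOIN courseplans cp ON cp.plan_id = ctp.plan_id
    GROUP BY c.course_name, pg.price_name
    """
).fetchall()
```

Note that this sketch uses SUM(cp.plan_time) for the plan_time column, since the expected output shows total days rather than the number of plans.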
I have 2 tables:
mysql> describe solution_sections;
+---------------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+---------------+------+-----+---------+----------------+
| solution_section_id | int(10) | NO | PRI | NULL | auto_increment |
| display_order | int(10) | NO | | NULL | |
| section_name | varchar(1000) | YES | | NULL | |
+---------------------+---------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> describe suggested_solution_comments;
+-----------------------+----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+----------------+------+-----+---------+----------------+
| comment_id | int(10) | NO | PRI | NULL | auto_increment |
| problem_id | int(10) | NO | | NULL | |
| suggested_solution_id | int(10) | NO | | NULL | |
| commenter_id | int(10) | NO | | NULL | |
| comment | varchar(10000) | YES | | NULL | |
| solution_part | int(3) | NO | | NULL | |
| date | date | NO | | NULL | |
+-----------------------+----------------+------+-----+---------+----------------+
What I am trying to do is to display the list of section_name from the solution_sections table. It only has about 10 rows in it. And for every section name, to get the list of suggested_solution_comments associated with it.
The tables are linked by suggested_solution_comments.solution_part and solution_sections.solution_section_id
Here is what I am trying so far:
select section_name , comment , solution_part , display_order from solution_sections
left join suggested_solution_comments on
solution_sections.solution_section_id = suggested_solution_comments.solution_part
where suggested_solution_id = 188
group by display_order;
But that returns nothing when there are no comments. But even if there are no comments, I'd like to still display the list of section_names from the solution_sections table.
Thanks!!
The problem is here:
where suggested_solution_id = 188
Your query requires the suggested_solution_id have a value of 188, which will never be true for records that have no comments. Try adding in this:
OR suggested_solution_id IS NULL
By using suggested_solution_id in the WHERE clause, you are eliminating from the result any row that has no matching row in the suggested_solution_comments table.
If you want to get results even when suggested_solution_comments has no matching rows, you can't use this field in the WHERE clause, or you have to consider the possibility that suggested_solution_id could be NULL.
EDITED to take into consideration the comment by #X-Zero
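One way to keep the filter out of the WHERE clause is to move it into the ON clause of the LEFT JOIN. Here is a minimal sketch with Python's sqlite3 module standing in for MySQL (an assumption), using simplified tables and made-up rows. It also shows a case where the OR ... IS NULL variant still drops a section: a section whose only comments belong to a different suggested_solution_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE solution_sections (
        solution_section_id INTEGER PRIMARY KEY,
        display_order INTEGER NOT NULL,
        section_name TEXT
    );
    CREATE TABLE suggested_solution_comments (
        comment_id INTEGER PRIMARY KEY,
        suggested_solution_id INTEGER NOT NULL,
        comment TEXT,
        solution_part INTEGER NOT NULL
    );
    -- Made-up data: section 2 only has a comment for another solution.
    INSERT INTO solution_sections VALUES (1, 1, 'Overview'), (2, 2, 'Details');
    INSERT INTO suggested_solution_comments VALUES
        (1, 188, 'looks good', 1),
        (2, 999, 'other solution', 2);
""")

# Filter in the ON clause: every section is kept.
on_rows = conn.execute(
    """
    SELECT s.section_name, c.comment
    FROM solution_sections s
    LEFT JOIN suggested_solution_comments c
           ON s.solution_section_id = c.solution_part
          AND c.suggested_solution_id = 188
    ORDER BY s.display_order
    """
).fetchall()

# Filter in the WHERE clause with OR ... IS NULL: section 2 is lost,
# because its joined row has suggested_solution_id = 999.
where_rows = conn.execute(
    """
    SELECT s.section_name, c.comment
    FROM solution_sections s
    LEFT JOIN suggested_solution_comments c
           ON s.solution_section_id = c.solution_part
    WHERE c.suggested_solution_id = 188
       OR c.suggested_solution_id IS NULL
    ORDER BY s.display_order
    """
).fetchall()
```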
I think your table structure is not the best for this. If you have just a primary key in each table and want to join on those, both keys would need to refer to the same thing. Otherwise, introduce a foreign key in one table that can be joined to the primary key of the other.