MySQL Performance and Memcache - mysql

I have the following (simplified) MySQL tables:
Requests:
+----------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| UniqueIdentifier | varchar(255) | YES | MUL | NULL | |
| UniversalServiceId | bigint(20) | YES | MUL | NULL | |
+----------------------+--------------+------+-----+---------+-------+
Observations:
+---------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| Value | varchar(255) | NO | | NULL | |
| RequestId | bigint(20) | NO | MUL | NULL | |
+---------------------+--------------+------+-----+---------+-------+
I have indexed UniqueIdentifier, UniversalServiceId and RequestId.
The tables are queried on UniqueIdentifier and UniversalServiceId with a JOIN on RequestId.
The Observation table has many millions of records. The queries are painfully slow to return and I am wondering if there is anything that I can do to improve performance. I have just started reading about memcache but it seems that it may be useful only after the first query (which is often the only one) for a particular dataset.
This is the type of query that is being used:
select * from Observations where RequestId in (select ID from Requests where UniqueIdentifier = '123456' and UniversalServiceId = '1234')
Any advice / guidance appreciated!

I recommend writing the query with a JOIN operation rather than an IN (subquery) predicate.
For example:
SELECT o.ID
, o.Value
, o.RequestId
FROM Observations o
JOIN Requests r
ON r.ID = o.RequestId
WHERE r.UniqueIdentifier = '123456'
AND r.UniversalServiceId = '1234'
For optimum performance, suitable indexes would be:
... ON Requests (UniversalServiceId, UniqueIdentifier, ID)
... ON Observations (RequestId, Value, ID)
(The choice of the leading column in the index on the Requests table would depend on the expected cardinality.)
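As a rough sketch, the two index definitions above could be spelled out as follows. The index names are made up for illustration, and the EXPLAIN at the end is only there to check whether the optimizer actually picks the new indexes up.

```sql
-- Index names are hypothetical; column order follows the suggestion above.
CREATE INDEX ix_requests_service_unique
    ON Requests (UniversalServiceId, UniqueIdentifier, ID);

CREATE INDEX ix_observations_request_value
    ON Observations (RequestId, Value, ID);

-- Check the plan: ideally both tables are read through the new indexes
-- rather than with a full scan.
EXPLAIN
SELECT o.ID, o.Value, o.RequestId
  FROM Observations o
  JOIN Requests r
    ON r.ID = o.RequestId
 WHERE r.UniqueIdentifier = '123456'
   AND r.UniversalServiceId = 1234;
```

Because each index contains every column the query touches, the server may be able to answer it from the indexes alone, without reading the table rows.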

Related

SQL, delete only if exactly one row is found

I have a nested query that deletes a row in table terms only if exactly one row in definitions.term_id is found. It works, but it takes about 9 seconds on my system. I'm looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, each takes about 0.1 seconds.
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed this up, consider creating an index on the column used for the lookup:
CREATE INDEX term_id ON definitions (term_id)
How about trying a correlated subquery with EXISTS:
DELETE FROM terms
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = terms.id
GROUP BY d.term_id
HAVING COUNT(*) = 1)
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE t
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.
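Spelled out against the tables in the question, the kludge might look like the sketch below. The literal 1234 is taken from the question; the behaviour for zero and multiple matches is how MySQL evaluates a scalar subquery.

```sql
-- Deletes term 1234 only when exactly one definitions row references it.
-- 0 matching rows:  the subquery yields NULL, so no row in terms matches.
-- 2+ matching rows: MySQL raises error 1242
--                   ("Subquery returns more than 1 row") and nothing is deleted.
DELETE FROM terms
 WHERE id = (SELECT term_id
               FROM definitions
              WHERE term_id = 1234);
```

The calling code has to expect and swallow that error, which is what makes this a kludge rather than a clean solution.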

Optimising query - user subscriptions in 1 database, 3-level data in another. Finding top level where user has subscriptions

I have an application which stores a hierarchical list of filters which a user can subscribe/unsubscribe from.
There are 2 databases involved:
db1: Stores the hierarchical list
db2: Stores the user's subscription preferences
Both databases are on the same server.
The hierarchical list (in db1) is composed of 3 tables as follows:
mysql> DESCRIBE regulations;
+-------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+----------------+
| id | tinyint(3) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
+-------------------+---------------------+------+-----+---------+----------------+
mysql> DESCRIBE groups;
+---------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------------+------+-----+---------+----------------+
| id | int(4) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| regulation_id | int(4) unsigned | NO | MUL | NULL | |
+---------------+-----------------+------+-----+---------+----------------+
mysql> DESCRIBE filters;
+----------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+----------------+
| id | smallint(5) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| group_id | int(4) unsigned | NO | MUL | NULL | |
+----------+----------------------+------+-----+---------+----------------+
So the hierarchy is:
regulations
groups (foreign key: regulation_id)
filters (foreign key: group_id)
The user is subscribed to 1 or more filters.id. These are stored in a separate database (database name: db2) where the f_id field corresponds to filters.id. The table structure is as follows:
mysql> DESCRIBE tbl_alerts;
+--------+-----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+-----------------------+------+-----+---------+----------------+
| tbl_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| u_id | mediumint(8) unsigned | NO | | NULL | |
| f_id | smallint(5) unsigned | NO | | NULL | |
+--------+-----------------------+------+-----+---------+----------------+
I need to know which regulations.name the user has subscriptions for.
The way in which I've done this is to select all of the f_id in tbl_alerts (assuming a user ID of 123, represented by u_id), e.g.
SELECT f_id FROM tbl_alerts WHERE u_id = 123;
And then use this in an IN condition as follows:
SELECT
DISTINCT(db1.regulations.`name`)
FROM
db1.groups
JOIN db1.regulations
ON db1.groups.regulation_id = db1.regulations.id
JOIN db1.filters
ON db1.filters.group_id = db1.groups.id
WHERE db1.filters.`id` IN (
SELECT f_id FROM db2.tbl_alerts WHERE u_id = 123
)
Is there a more optimal way to write this?
Using 5.5.60-MariaDB
You can try the following "Cross-Database" Join query:
SELECT
DISTINCT db1.regulations.name
FROM
db1.groups
JOIN db1.regulations
ON db1.groups.regulation_id = db1.regulations.id
JOIN db1.filters
ON db1.filters.group_id = db1.groups.id
JOIN db2.tbl_alerts
ON db2.tbl_alerts.f_id = db1.filters.id
WHERE db2.tbl_alerts.u_id = 123
Note that DISTINCT is a keyword, not a function, so the parentheses are unnecessary. As for performance, you will need to benchmark your current set of queries against this single query.
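One thing that may be worth testing while benchmarking: the DESCRIBE output for tbl_alerts shows no key on u_id or f_id, so the lookup by user probably scans the whole table. A sketch of an index that could help, with a made-up index name:

```sql
-- Hypothetical index name. Lets the u_id = 123 lookup avoid a full scan
-- of tbl_alerts and read f_id straight from the index.
CREATE INDEX ix_alerts_user_filter
    ON db2.tbl_alerts (u_id, f_id);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT DISTINCT db1.regulations.name
  FROM db2.tbl_alerts
  JOIN db1.filters
    ON db1.filters.id = db2.tbl_alerts.f_id
  JOIN db1.groups
    ON db1.groups.id = db1.filters.group_id
  JOIN db1.regulations
    ON db1.regulations.id = db1.groups.regulation_id
 WHERE db2.tbl_alerts.u_id = 123;
```

Whether this makes a measurable difference depends on how many rows tbl_alerts holds, so treat it as something to measure rather than a guaranteed win.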

UPDATE MySQL Query taking a long time to execute - ways to optimize

I have a question about the following query.
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
I am trying to update a new column 'woz_value' in the 'src_woz_waardeklasse_2011' table using the 'average' values from the 'waardeklasse_average' table. I am joining on the 'waardeklasse' numbers in both tables. But the 'src_woz_waardeklasse_2011' table has nearly 7 million records and the 'waardeklasse_average' table has 46, so the query is taking a really long time: 25 minutes and counting.
Is there a way to optimize it? I am sure it's taking a long time as I am trying to compare values between a large table and a small table. I have included the table structure of both the tables below.
src_woz_waardeklasse_2011
+----------------------+---------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+---------------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| postcode | varchar(150) | YES | MUL | NULL | |
| huisnummeraanduiding | varchar(150) | YES | | NULL | |
| huisletter | varchar(150) | YES | | NULL | |
| soort_woonobject | varchar(150) | YES | | NULL | |
| bouwjaar | varchar(150) | YES | | NULL | |
| bouwjaarsklasse | varchar(150) | YES | | NULL | |
| inhoud | varchar(150) | YES | | NULL | |
| reg_oppervlak | varchar(150) | YES | | NULL | |
| woz_value | int(15) unsigned zerofill | YES | UNI | NULL | |
| reg_oppervlak_bn | varchar(150) | YES | | NULL | |
| waardeklasse | int(10) | NO | PRI | NULL | |
| waardepeildatum | varchar(150) | YES | | NULL | |
| zipandnumber | varchar(150) | YES | | NULL | |
+----------------------+---------------------------+------+-----+---------+----------------+
waardeklasse_average
+-------------------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+---------+------+-----+---------+-------+
| waardeklasse_average_id | int(11) | NO | | NULL | |
| waardeklasse_new | int(10) | NO | PRI | NULL | |
| lower | int(11) | NO | | NULL | |
| higher | int(11) | NO | | NULL | |
| average | int(11) | NO | PRI | NULL | |
+-------------------------+---------+------+-----+---------+-------+
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
Left join waardeklasse_average b
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
where b.waardeklasse_new = a.waardeklasse;
Use the query above: the LEFT JOIN together with the WHERE condition should speed things up.
Also, don't use an index hint on the second table, which has only 46 records; it will hurt performance. On a table that small an index just adds overhead, and a full table scan without an index is much better. Also, if you have a clustered index on the first table (7 million rows), remove it before running the query, because every update you perform also has to maintain the table structure and page alignment in the clustered index. Hope this helps!
First, you don't need the where clause. Second, remove the index hints:
update src_woz_waardeklasse_2011 ww inner join
waardeklasse_average wa
on wa.waardeklasse_new = ww.waardeklasse
set ww.woz_value = wa.average;
Then, try an index on src_woz_waardeklasse_2011(waardeklasse). This should improve the execution plan.
You might also check that the 46 records in the second table actually correspond to 46 updates. If the join conditions are wrong, you could be updating all the records.
EDIT:
Indexes have nothing to do with your problem. Updating 7 million records takes a long time. You might consider creating the data you want in a temporary table and then truncating the original table and inserting the new rows into it. Otherwise, batch the updates a few rows at a time.
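A minimal sketch of the batching idea, assuming id is indexed as a leading key column so that a range on it is cheap. The window size of 100000 is an arbitrary choice, not something from the question.

```sql
-- One batch: only rows whose id falls inside a fixed window are updated.
-- The window size (100000) is an arbitrary assumption; tune it for your server.
UPDATE src_woz_waardeklasse_2011 ww
  JOIN waardeklasse_average wa
    ON wa.waardeklasse_new = ww.waardeklasse
   SET ww.woz_value = wa.average
 WHERE ww.id BETWEEN 1 AND 100000;

-- Next batch: shift the window (100001 to 200000, and so on) and repeat
-- until the lower bound passes SELECT MAX(id) FROM src_woz_waardeklasse_2011.
```

Each statement then commits a bounded amount of work, which keeps the transaction and undo log small compared with one 7-million-row update.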
update src_woz_waardeklasse_2011 a use index(waardeklasse_main,woz_val)
inner join waardeklasse_average b use index (waardeklasse)
on b.waardeklasse_new = a.waardeklasse
set a.woz_value = b.average
The WHERE condition is not needed, as you have already specified that condition in the ON clause.
Try the query above.

MySQL Limit query by time when there's not enough results

I have a big table with 670k rows, and I'm running a SELECT with a lot of WHERE conditions to search and filter useful results. The problem is that sometimes there are no results for the selected filters, and the query scans the whole table and takes a long time. I'd like to stop the query if no results are found within, say, 30 seconds.
This is my query:
SELECT date, s.name, l.id, l.title,ratingsum,numvotes,keyword,tag
from news_links l
LEFT JOIN sources s on s.id = l.source
WHERE
l.date BETWEEN STR_TO_DATE(?,'%Y-%m-%d')
AND STR_TO_DATE(?,'%Y-%m-%d')
AND s.name like ?
AND ((numvotes-1) *?) <= l.ratingsum
AND numvotes > ?
AND matches = 1
AND tag >= ?
AND tag <= ?
AND (l.title like ? or l.keyword like ?)
AND category >= ?
AND category <= ?
order by date desc
limit ?,15
I tried running a sub-query instead of joining but it didn't speed up the query.
News table(640k rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | UNI | NULL | auto_increment |
| link | varchar(450) | NO | PRI | NULL | |
| date | datetime | NO | MUL | NULL | |
| title | varchar(145) | NO | MUL | NULL | |
| source | int(11) | NO | MUL | NULL | |
| text | mediumtext | YES | | NULL | |
| numvotes | int(3) | NO | MUL | 0 | |
| ratingsum | int(3) | NO | | 0 | |
| matches | int(1) | NO | | 0 | |
| keyword | varchar(45) | YES | | NULL | |
| tag | int(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+----------------+
I have indexes set up on date,title,source,numvotes as well as the primary key on link
A query over 670k rows should run very fast in MySQL. You should have a closer look at your indexes. Start by adding a combined HASH index on news_links.source and news_links.matches:
ALTER TABLE news_links ADD INDEX myIdx1 USING HASH (source, matches)
What does EXPLAIN SELECT ... give you with that?
After that you can try to improve performance further by including more information in your index (note that MySQL will generally use only one index per table). Add a BTREE index:
ALTER TABLE news_links ADD INDEX myIdx2 USING BTREE (source, matches, `date`)
BTREE is good for range queries (e.g. with a BETWEEN in them). HASH is good for equality/inequality conditions. If you want to index several columns with mixed conditions (range and equality), use BTREE.
What does EXPLAIN SELECT ... give you now?
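Separately from indexing, the literal request in the question (give up after 30 seconds) can be expressed per statement on newer servers. The sketch below uses a shortened stand-in for the original query, and which form applies depends on the server, which the question does not state.

```sql
-- MySQL 5.7.8 or later: optimizer hint, value in milliseconds.
SELECT /*+ MAX_EXECUTION_TIME(30000) */ l.id, l.title
  FROM news_links l
 WHERE l.matches = 1
 ORDER BY l.date DESC
 LIMIT 15;

-- MariaDB 10.1 or later: per-statement limit, value in seconds.
SET STATEMENT max_statement_time = 30 FOR
SELECT l.id, l.title
  FROM news_links l
 WHERE l.matches = 1
 ORDER BY l.date DESC
 LIMIT 15;
```

In both cases the statement is aborted with an error when the limit is hit, so the application has to treat that error as "no results in time".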

NATURAL JOIN vs WHERE IN Clauses

Recently I had to retrieve a large amount of data, thousands of records, from a MySQL database. Since it was my first time handling such a large data set, I didn't think about the efficiency of the SQL statement, and that is where the problem came from.
Here are the tables of the database
(It is just a simple database model of a curriculum system):
course:
+-----------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+----------------+
| course_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(20) | NO | | NULL | |
| lecturer | varchar(20) | NO | | NULL | |
| credit | float | NO | | NULL | |
| week_from | tinyint(3) unsigned | NO | | NULL | |
| week_to | tinyint(3) unsigned | NO | | NULL | |
+-----------+---------------------+------+-----+---------+----------------+
select:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| select_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| card_no | int(10) unsigned | NO | | NULL | |
| course_id | int(10) unsigned | NO | | NULL | |
| term | varchar(7) | NO | | NULL | |
+-----------+------------------+------+-----+---------+----------------+
When I want to retrieve all the courses that a student has selected (with his card number),
the SQL statement is
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` WHERE course_id IN (
SELECT course_id FROM `select` WHERE card_no=<student's card number>
);
But, it was extremely slow and it didn't return anything for a long time.
So I changed WHERE IN clauses into NATURAL JOIN. Here is the SQL,
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `select` NATURAL JOIN `course`
WHERE card_no=<student's card number>;
It returns immediately and works fine!
So my question is:
What's the difference between NATURAL JOIN and WHERE IN Clauses?
What makes them perform differently?
(Is that maybe because I didn't set up any INDEX?)
When shall we use NATURAL JOIN or WHERE IN?
Theoretically the two queries are equivalent. I think it's just poor implementation of the MySQL query optimizer that causes JOIN to be more efficient than WHERE IN. So I always use JOIN.
Have you looked at the output of EXPLAIN for the two queries? Here's what I got for a WHERE IN:
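+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+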
| 1 | PRIMARY | t_users | ALL | NULL | NULL | NULL | NULL | 2458304 | Using where |
| 2 | DEPENDENT SUBQUERY | t_user_attributes | index_subquery | PRIMARY,attribute | PRIMARY | 13 | func,const | 7 | Using index; Using where |
+----+--------------------+-------------------+----------------+-------------------+---------+---------+------------+---------+--------------------------+
It's apparently performing the subquery, then going through every row in the main table testing whether it's in the result; it doesn't use the index. For the JOIN I get:
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
| 1 | SIMPLE | t_user_attributes | ref | PRIMARY,attribute | attribute | 1 | const | 15 | Using where |
| 1 | SIMPLE | t_users | eq_ref | username,username_2 | username | 12 | bbodb_test.t_user_attributes.username | 1 | |
+----+-------------+-------------------+--------+---------------------+-----------+---------+---------------------------------------+------+-------------+
Now it uses the index.
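To try the same thing on the tables from the question, a sketch might look like this. The DESCRIBE output shows no key on `select`.card_no or `select`.course_id, so the index below (with a made-up name) is an assumption about what would help, and 12345 is a placeholder card number.

```sql
-- Hypothetical index name; covers the card_no filter and the join column.
CREATE INDEX ix_select_card_course
    ON `select` (card_no, course_id);

-- Same result as the NATURAL JOIN, with the join column spelled out.
-- course_id is the only column the two tables share, so NATURAL JOIN
-- joins on exactly this condition.
EXPLAIN
SELECT c.course_id, c.name, c.lecturer, c.credit, c.week_from, c.week_to
  FROM `select` s
  JOIN course c
    ON c.course_id = s.course_id
 WHERE s.card_no = 12345;
```

Writing the join condition explicitly also protects the query if a column with a shared name is later added to either table, which would silently change what NATURAL JOIN matches on.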
Try this:
SELECT course_id, name, lecturer, credit, week_from, week_to
FROM `course` c
WHERE c.course_id IN (
SELECT s.course_id
FROM `select` s
WHERE card_no=<student's card number>
AND c.course_id = s.course_id
);
Notice the added AND clause in the subquery. This is called a correlated subquery because it relates the two course_id values, just as the NATURAL JOIN does.
I think Barmar's index explanation is on the mark.