My database is growing, so I have decided to add indexes (I've never done that before), and I need some advice. For example:
SELECT field1, field2
FROM mytable1
WHERE account_id = :account_id and product_id = :product_id and version_name = :version_name
or
SELECT field1, field2
FROM mytable1
WHERE version_name = :version_name and product_id = :product_id and account_id = :account_id
An account can have many products, and a product can have millions of versions. Which ordering of the conditions in the WHERE clause is faster? version_name is a string.
If I have version_id available, which is the primary key, should I always put that first?
Should I add an index on account_id and product_id together?
Does it get all the rows matching the first condition in the WHERE clause, then filter that result by the second condition, and so on?
Or does it scan every row and evaluate all three conditions, given that no index has been added?
The order of the conditions in the WHERE clause does not matter. What matters is the order of the columns in the index: if you have an index on (1) account_id, (2) product_id, that index will only be used when the WHERE clause filters on account_id.
For your queries you can put a single index on account_id, product_id and version_name.
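For example, a single composite index covering all three columns (assuming the table is mytable1 as in the question; the index name is arbitrary):

ALTER TABLE mytable1
  ADD INDEX idx_account_product_version (account_id, product_id, version_name);

With this index, both versions of your query can resolve all three equality conditions from the index, regardless of the order in which they appear in the WHERE clause.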
How can we optimize this DELETE query?
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
  AND id NOT IN (SELECT MaxID FROM temp)
ORDER BY id
LIMIT 1000
The subquery SELECT MaxID FROM temp returns about 35k rows, and temp is a temporary table.
SELECT * FROM student_score WHERE lesson_id IS NOT NULL returns around 500k rows.
I tried using LIMIT and ORDER BY clauses, but that did not make it any faster.
IN (SELECT ...) is, in many situations, really inefficient.
Use a multi-table DELETE. This involves a LEFT JOIN ... IS NULL, which is much more efficient.
Once you have mastered that, you might be able to get rid of the temp and simply fold it into the query.
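A sketch of what that multi-table DELETE could look like, using the table and column names from the question:

DELETE student_score
FROM student_score
LEFT JOIN temp ON temp.MaxID = student_score.id
WHERE student_score.lesson_id IS NOT NULL
  AND temp.MaxID IS NULL;

Note that the multiple-table form of DELETE does not allow ORDER BY or LIMIT, so batching in chunks of 1000 rows has to be handled differently (see the PRIMARY KEY walk below).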
Also more efficient is a NOT EXISTS test:
WHERE NOT EXISTS ( SELECT 1 FROM temp
                   WHERE student_score.id = temp.MaxID )
Also, DELETEing a large number of rows is inherently slow. 1000 is not so bad; 35K is. The reason is the need to save all the potentially-deleted rows until "commit" time.
Other techniques for big deletes: http://mysql.rjweb.org/doc.php/deletebig
Note that one of them explains a more efficient way to walk through the PRIMARY KEY (via id). Note that your query may have to step over lots of ids that have lesson_id IS NULL. That is, the LIMIT 1000 is not doing what you expected.
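The PRIMARY KEY walk described on that page boils down to deleting in small id ranges instead of relying on LIMIT. A rough sketch (the chunk size and variables are placeholders; the loop advancing the range would live in application code or a stored procedure):

SET @lo := 0;
SET @hi := @lo + 1000;
DELETE FROM student_score
WHERE id >= @lo AND id < @hi
  AND lesson_id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM temp WHERE temp.MaxID = student_score.id);
-- then set @lo := @hi and repeat until @lo passes MAX(id)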
You can do it without ORDER BY:
DELETE FROM student_score
WHERE lesson_id IS NOT NULL
  AND id NOT IN (SELECT MaxID FROM temp)
Or like this, using a LEFT JOIN, which is more optimized in terms of speed:
DELETE s
FROM student_score s
LEFT JOIN temp t1 ON s.id = t1.MaxID
WHERE s.lesson_id IS NOT NULL AND t1.MaxID IS NULL;
I would like to delete my MySQL selection.
Here is my MySQL selection request:
SELECT *
FROM Items
WHERE id_user=1
ORDER BY id_user
LIMIT 2,1
With this working request, I select the third item in my table that has id_user = 1.
Now, I would like to delete the item that has been selected by my request.
I am looking for a request with the same meaning, which would look like this:
DELETE FROM Items (
SELECT * FROM Items WHERE id_user=1 ORDER BY id_user LIMIT 2,1
)
The first thing to note is that there is an issue with your query. You are filtering on a unique value of id_user and sorting on the same column. As all records in the resultset will have the same id_user, the actual order of the resultset is undefined, and we cannot reliably tell which record comes third.
Assuming that you have another column to disambiguate the resultset (i.e. some value that is unique amongst each group of records having the same id_user), say id, here is a solution to your question that uses a self-join with ROW_NUMBER() to locate the third record in each group.
DELETE i
FROM items i
INNER JOIN (
SELECT
id,
id_user,
ROW_NUMBER() OVER(PARTITION BY id_user ORDER BY id) rn
FROM items
) c ON c.id = i.id AND c.id_user = i.id_user AND c.rn = 3
WHERE i.id_user=1 ;
Demo on DB Fiddle
You didn't provide the definition of your table. I guess it has a primary key column called id.
In that case you can use this
CREATE TEMPORARY TABLE doomed_ids
SELECT id FROM Items WHERE id_user = 1 ORDER BY id_user LIMIT 2,1;
DELETE FROM Items
WHERE id IN ( SELECT id FROM doomed_ids);
DROP TABLE doomed_ids;
It's a pain in the neck, but it works around the limitation of MySQL and MariaDB disallowing LIMITs in ... IN (SELECT ...) clauses.
You can use the select query to create a derived table and join it back to your main table to determine which record(s) to delete. Derived tables can use the limit clause.
Assuming that the PK is called id, the query would look as follows:
delete i from items i
inner join (SELECT id FROM Items
WHERE id_user=1
ORDER BY id_user LIMIT 2,1) i2 on i.id=i2.id
You need to substitute your PK in place of id. If you have a multi-column PK, then you need to select all the PK fields in the derived table and join on all of them.
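For illustration only, if the primary key were composite, say (id_user, item_no) where item_no is a hypothetical second key column, the derived table would select both columns and the join would match on both:

DELETE i
FROM Items i
INNER JOIN (SELECT id_user, item_no
            FROM Items
            WHERE id_user = 1
            ORDER BY id_user
            LIMIT 2,1) i2 ON i.id_user = i2.id_user AND i.item_no = i2.item_no;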
I often have a situation with two tables in MySQL where I need one record for each foreign key. For example:
table post {id, ...}
table comment {id, post_id, ...}
SELECT * FROM comment GROUP BY post_id ORDER BY id ASC
-- Oldest comment for each post
or
table client {id, ...}
table payment {id, client_id, ...}
SELECT * FROM payment GROUP BY client_id ORDER BY id DESC
-- Most recent payment from each client
These queries often fail because the "SELECT list is not in GROUP BY clause" and contains nonaggregated columns.
Failed Solutions
I can usually work around this with min()/max(), but that creates a very slow query with mismatched results (the row with min(id) isn't the same row as the one with min(textfield)):
SELECT min(id), min(textfield), ... FROM table GROUP BY fk_id
Adding all the columns to GROUP BY results in duplicate records (from the fk_id) which defeats the purpose of GROUP BY.
SELECT id, textfield, ... FROM table GROUP BY fk_id, id, textfield
Same idea as #GurV but using a join instead of a correlated subquery. The basic idea here is that the subquery finds, for each post which has comments, the id of the oldest comment in the comments table. We then join back to comments again to restrict to the records we want.
SELECT t1.*
FROM comments t1
INNER JOIN
(
SELECT post_id, MIN(id) AS min_id
FROM comments
GROUP BY post_id
) t2
ON t1.post_id = t2.post_id AND
t1.id = t2.min_id
You can use a correlated query with aggregation to find out the earliest comment for each post:
select *
from comments c1
where id = (
select min(id)
from comments c2
where c1.post_id = c2.post_id
)
Compound index - comments(post_id, id) should be helpful.
If you are querying the whole table with many rows, then it will be slower. This query is more useful and performant if you are querying for a small subset of posts. If you are querying the whole table, then #Tim's answer is better suited, I think.
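For reference, a compound index like the one suggested could be created as follows (a sketch, assuming the table is named comments as in the answers above; putting post_id first lets MySQL read MIN(id) per post straight from the index):

ALTER TABLE comments ADD INDEX idx_post_min (post_id, id);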
I have a MySQL query that I need to optimize as much as possible (it should have a load time below 5s, if possible).
The query is as follows:
SELECT domain_id, COUNT(keyword_id) as total_count
FROM tableName
WHERE keyword_id IN (SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
LIMIT ...
X is an integer that comes from an input
domain_id and keyword_id are indexed
database is on localhost, so the network speed should be max
The subquery in the WHERE clause can return up to 10 million rows. Also, MySQL seems to have a really hard time calculating the COUNT and ordering by that count.
I tried to mix this query with SOLR, but with no results; fetching such a high number of rows at once is hard for both MySQL and SOLR.
I'm looking for a solution that gives the same results, no matter if I have to use a different technology or an improvement to this MySQL query.
Thanks!
Query logic is this:
We have a domain, and we are searching for all the keywords that are being used on that domain (this is the subquery). Then we take all the domains that use at least one of the keywords found by the first query, grouped by domain, with the number of keywords used for each domain, and we have to display them ordered DESC by the number of keywords used.
I hope this makes sense.
You may try JOIN instead of subquery:
SELECT tableName.domain_id, COUNT(tableName.keyword_id) AS total_count
FROM tableName
INNER JOIN tableName AS rejoin
ON rejoin.keyword_id = tableName.keyword_id
WHERE rejoin.domain_id = X
GROUP BY tableName.domain_id
ORDER BY total_count DESC
LIMIT ...
I am not 100% sure, but can you try this, please?
SELECT t1.domain_id, COUNT(t1.keyword_id) as total_count
FROM tableName AS t1 LEFT JOIN
(SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X) AS t2
ON t1.keyword_id = t2.keyword_id
WHERE t2.keyword_id IS NOT NULL
GROUP BY t1.domain_id
ORDER BY total_count DESC
LIMIT ...
The goal is to replace the WHERE IN clause with an INNER JOIN, and that will make it a lot quicker. A WHERE IN clause always makes the MySQL server struggle, and it is even more noticeable when you use it with a huge amount of data. Use WHERE IN only if it makes your query easier to read/understand, you have a small data set, or it is not possible another way (but you will probably have another way to do it anyway :) )
In terms of MySQL, all you can do is minimize disk IO for the query using covering indexes and rewrite it a little more efficiently so that it benefits from them.
Since keyword_id has a match in another copy of the table, COUNT(keyword_id) becomes COUNT(*).
The kind of subquery you use is known to be the worst case for MySQL (it executes the subquery for each row), but I am not sure it should be replaced with a JOIN here, because it might be a proper strategy for your data.
As you probably understand, the query like:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (X,Y,Z)
GROUP BY domain_id
ORDER BY total_count DESC
would have the best performance with a covering composite index (keyword_id, domain_id [,...]), so it is a must. From the other side, the query like:
SELECT DISTINCT keyword_id FROM tableName WHERE domain_id = X
will have the best performance on a covering composite index (domain_id, keyword_id [,...]). So you need both of them.
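A hedged sketch of those two covering indexes on the question's table (the index names are arbitrary):

ALTER TABLE tableName
  ADD INDEX idx_keyword_domain (keyword_id, domain_id),
  ADD INDEX idx_domain_keyword (domain_id, keyword_id);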
Hopefully (though I am not sure), when you have the latter index, MySQL can understand that you do not need to select all those keyword_id values in the subquery, but only need to check whether there is an entry in the index; and that intent is better expressed if you do not use DISTINCT.
So, I would try to add those two indexes and rewrite the query as:
SELECT domain_id, COUNT(*) as total_count
FROM tableName
WHERE keyword_id IN (SELECT keyword_id FROM tableName WHERE domain_id = X)
GROUP BY domain_id
ORDER BY total_count DESC
Another option is to rewrite the query as follows:
SELECT domain_id, COUNT(*) as total_count
FROM (
SELECT DISTINCT keyword_id
FROM tableName
WHERE domain_id = X
) as kw
JOIN tableName USING (keyword_id)
GROUP BY domain_id
ORDER BY total_count DESC
Once again you need those two composite indexes.
Which one of the queries is quicker depends on the statistics in your tableName.
I have a MySQL table with the following columns:
id (autoincremented primary key)
user_id
test_id
test_score
How can I get all test data tied to the users that have participated in test_id = 5? I want all test data (not just test_id 5) for the users that have participated in that particular test.
I believe this will get you what you are looking for.
SELECT *
FROM table
WHERE user_id IN
(SELECT DISTINCT user_id
FROM table
WHERE test_id = 5)
Sounds like you'd want to do a self-join. Something like this would utilize indexes (assuming you have an index on user_id and an index on test_id, which you should):
SELECT DISTINCT t1.*
FROM `tableName` t1
INNER JOIN `tableName` t2 ON t2.user_id = t1.user_id
WHERE t2.test_id = 5
I guess that raises another point. How is your table indexed? Just the numeric primary key? You generally want to have indexes on all columns that are used in JOINs or WHERE clauses. Thus, you'd want to have an index on user_id and an index on test_id.
Can a user take the same test multiple times? If not, then you'd want restrict that by having a unique multiple-column ("composite") index across user_id and test_id together. And then add a regular index just to test_id for the WHERE clause.
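A sketch of that indexing, assuming the table is called tableName as in the query above (the index names are arbitrary):

ALTER TABLE tableName
  ADD UNIQUE INDEX uq_user_test (user_id, test_id),
  ADD INDEX idx_test (test_id);

The unique composite index enforces one attempt per user per test and also serves lookups and joins on user_id, while the separate index on test_id serves the WHERE test_id = 5 filter.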
If I understand your question correctly, then the basic approach is to join the table on itself.
select allTestData.* from TEST_TABLE allTestData
join (select distinct user_id from TEST_TABLE where test_id = 5) testIdFive
on allTestData.user_id = testIdFive.user_id
Try this Query,
SELECT *
FROM TABLENAME
WHERE user_id IN (
SELECT user_id
FROM TABLENAME
WHERE test_id = 5
)