How to optimise a join between two MySQL tables?

I have two tables: p_group.full_data, which is a large dataset I'm working on (100k rows, 200 columns), and p_group.full_data_aggregated, which I've produced to summarise a load of other tables.
Now, what I'd like to do is perform a join between full_data and full_data_aggregated to select out certain rows, averages, and so on. The query I have is as follows:
SELECT 'name', p.group_id, a.group_condition, p.event_index, AVG(p.value) FROM p_group.full_data p
JOIN p_group.full_data_aggregated as a on p.group_id = a.group_id AND p.event_index = a.event_index
WHERE (a.group_condition='open')
GROUP BY p.group_id, p.event_index
I have indexes on full_data.group_id and full_data.event_index, and on full_data_aggregated.group_id, full_data_aggregated.event_index and full_data_aggregated.group_condition.
Now, the problem is that this query simply won't finish: previously, I had my full_data split up into different tables (one for each group_id), and that worked fine. But now that I have joined the groups together, the query sits there running and so I can only assume I have done something stupid.
Is there anything else I can try to get this query to run at a decent speed? I'm sure I've messed something up with the indexes and the GROUP BY clause, but I can't work out what. I've tried all sorts of variations of the above query. EXPLAIN shows "Using where; Using temporary; Using filesort", but I'm not sure how to fix this.
Thanks!

I assume that your indexes are composite indexes (with group_id and event_index together). If you have a separate index on each field, MySQL will usually use only one of them per table, and the database engine has to go through significantly more data.
For example, if you have only a few distinct group_id values but lots of event_index values, and two single-column indexes (one on group_id, the other on event_index), then your query will run through a large number of rows for each group_id. If you have one index instead, with both fields in that order, the query will run much faster.
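A minimal sketch of the composite-index advice, using Python's built-in sqlite3 module as a stand-in for MySQL (so the plan output looks different from MySQL's EXPLAIN). The index names, the sample rows, and the choice to lead the aggregated table's index with group_condition (the column filtered in the WHERE clause) are assumptions for illustration:

```python
import sqlite3

# SQLite stands in for MySQL here; table/column names mirror the question,
# the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE full_data (group_id INTEGER, event_index INTEGER, value REAL);
CREATE TABLE full_data_aggregated (
    group_id INTEGER, event_index INTEGER, group_condition TEXT);

-- One composite index per table covering the join columns together,
-- rather than a separate single-column index per field.
-- (Index names, and putting group_condition first, are assumptions.)
CREATE INDEX idx_fd_group_event ON full_data (group_id, event_index);
CREATE INDEX idx_fda_cond_group_event
    ON full_data_aggregated (group_condition, group_id, event_index);
""")

# Two values per (group_id, event_index) pair: g*10+e and g*10+e+2.
conn.executemany(
    "INSERT INTO full_data VALUES (?, ?, ?)",
    [(g, e, float(g * 10 + e + k)) for g in (1, 2, 3) for e in (1, 2) for k in (0, 2)],
)
conn.executemany(
    "INSERT INTO full_data_aggregated VALUES (?, ?, ?)",
    [(1, 1, "open"), (1, 2, "closed"), (2, 1, "open"),
     (2, 2, "open"), (3, 1, "closed"), (3, 2, "closed")],
)

query = """
SELECT p.group_id, p.event_index, AVG(p.value)
FROM full_data p
JOIN full_data_aggregated a
  ON p.group_id = a.group_id AND p.event_index = a.event_index
WHERE a.group_condition = 'open'
GROUP BY p.group_id, p.event_index
ORDER BY p.group_id, p.event_index
"""

# Inspect which indexes the planner picks (wording varies by SQLite version).
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])

result = conn.execute(query).fetchall()
print(result)  # [(1, 1, 12.0), (2, 1, 22.0), (2, 2, 23.0)]
```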

Related

MySQL search query optimization from two many-to-many tables

I have three tables.
tbl_post for a table of posts. (post_idx, post_created, post_title, ...)
tbl_mention for a table of mentions. (mention_idx, mention_name, mention_img, ...)
tbl_post_mention for a unique many-to-many relation between the two tables. (post_idx, mention_idx)
For example,
PostA can have MentionA and MentionB.
PostB can have MentionA and MentionC.
PostC cannot have MentionC twice (each post/mention pair is unique).
tbl_post has about a million rows, tbl_mention has fewer than a hundred rows, and tbl_post_mention has a couple of million rows. All three tables are heavily loaded with foreign keys, unique indexes, etc.
I am trying to make two separate search queries.
Search for post ids with all the given mention ids [AND condition]
Search for post ids with any of the given mention ids [OR condition]
Then join with tbl_post and tbl_mention to populate the results with meaningful data, order them, and return the top n. In the end, I hope to have a list of n posts with all the data my service needs to display on the front end.
Here are the respective simpler queries:
SELECT post_idx
FROM
(SELECT post_idx, count(*) as c
FROM tbl_post_mention
WHERE mention_idx in (1,95)
GROUP BY post_idx) AS A
WHERE c >= 2;
The problem with this query is that it is already inefficient before the joins and ordering. This process alone takes 0.2 seconds.
SELECT DISTINCT post_idx
FROM tbl_post_mention
WHERE mention_idx in (1,95);
This is a simple index range scan, but because of the IN clause, the query becomes expensive again once you start joining it with other tables.
I tried more complex and "clever" queries and tried indexing different sets of columns, to no avail. Is there special syntax I could use in this case? Maybe a clever trick? Partitioning? Or am I missing some fundamental concept here... :(
Send help.
The query you want is this:
SELECT post_idx
FROM tbl_post_mention
WHERE mention_idx in (1,95)
GROUP BY post_idx
HAVING COUNT(*) >= 2
The HAVING clause does your post-GROUP BY filtering.
The index that will help you is this:
CREATE INDEX mentionsdex ON tbl_post_mention (mention_idx, post_idx);
It covers your query by allowing rapid lookup by mention_idx then grouping by post_idx.
Often so-called join tables with two columns -- like your tbl_post_mention -- work most efficiently when they have a pair of indexes with the columns in opposite orders: (post_idx, mention_idx) and (mention_idx, post_idx).
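A sketch of both searches against a toy copy of tbl_post_mention, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. The rows are invented; the composite primary key plus mentionsdex gives the opposite-order index pair, and the mention list is passed as bound parameters rather than spliced into the SQL string:

```python
import sqlite3

# SQLite stand-in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_post_mention (
    post_idx INTEGER NOT NULL,
    mention_idx INTEGER NOT NULL,
    PRIMARY KEY (post_idx, mention_idx)   -- (post, mention) pairs are unique
);
-- The reverse-order index from the answer; with the primary key this
-- gives the pair of opposite-order indexes.
CREATE INDEX mentionsdex ON tbl_post_mention (mention_idx, post_idx);
""")
conn.executemany(
    "INSERT INTO tbl_post_mention VALUES (?, ?)",
    [(1, 1), (1, 95), (2, 1), (2, 7), (3, 95), (4, 1), (4, 7), (4, 95), (5, 7)],
)

wanted = (1, 95)
marks = ", ".join("?" for _ in wanted)

# AND: posts that have every wanted mention. Because pairs are unique,
# COUNT(*) equals the number of wanted mentions a post matched.
all_sql = f"""
SELECT post_idx FROM tbl_post_mention
WHERE mention_idx IN ({marks})
GROUP BY post_idx
HAVING COUNT(*) >= ?
ORDER BY post_idx"""
all_posts = [r[0] for r in conn.execute(all_sql, (*wanted, len(wanted)))]

# OR: posts that have at least one wanted mention.
any_sql = f"""
SELECT DISTINCT post_idx FROM tbl_post_mention
WHERE mention_idx IN ({marks})
ORDER BY post_idx"""
any_posts = [r[0] for r in conn.execute(any_sql, wanted)]

print(all_posts)  # [1, 4]
print(any_posts)  # [1, 2, 3, 4]
```

The HAVING threshold is tied to len(wanted), so the same query works for any number of required mentions.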

How to make a faster query when joining multiple huge tables?

I have 3 tables. All 3 tables have approximately 2 million rows. Every day, 10,000-100,000 new entries are added. The SQL statement below takes approximately 10 seconds to finish. Is there a way to make it faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works, but it's slow. Is there a better way to write it?
All database servers have some form of optimization engine that determines how best to fetch the data you want. With a simple query such as the SELECT you showed, there isn't going to be any way to greatly improve performance within the SQL itself. As others have said, subqueries won't help, as they will be optimized into the same plan as the joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a MySQL expert, but I found this article interesting and worth a skim: https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider whether that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute, or would inserting this into a summary table once an hour be fine?
A subquery won't help, but proper indexes can improve performance, so make sure you have them:
create index idx1 on customers(gender, cus_id, book_id, name);
create index idx2 on hotels(cus_id);
create index idx3 on bookings(book_id);
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not linked to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the WHERE clause and put those columns first, then add the remaining columns from the ON and SELECT clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
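A small sketch of the suggested indexes with an explicit ORDER BY added, so that OFFSET skips a well-defined row. It uses SQLite through Python's sqlite3 module instead of MySQL; the hotel_id sort column, the index names and the data are hypothetical. It also reproduces the repeated-name effect: one customer joined to several hotels comes back once per hotel:

```python
import sqlite3

# SQLite stand-in for MySQL; schema guessed from the query, data invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cus_id INTEGER, gender INTEGER, book_id INTEGER, name TEXT);
CREATE TABLE hotels (hotel_id INTEGER PRIMARY KEY, cus_id INTEGER);
CREATE TABLE bookings (book_id INTEGER PRIMARY KEY);  -- PK doubles as bookings(book_id)

-- Covering indexes as suggested above (index names are hypothetical).
CREATE INDEX idx_customers_cover ON customers (cus_id, gender, book_id, name);
CREATE INDEX idx_hotels_cus ON hotels (cus_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)",
                 [(3, 0, 100, "Alice"), (4, 1, 101, "Bob")])
conn.executemany("INSERT INTO hotels VALUES (?, ?)", [(1, 3), (2, 3), (3, 3), (4, 4)])
conn.executemany("INSERT INTO bookings VALUES (?)", [(100,), (101,)])

names = [r[0] for r in conn.execute("""
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND customers.cus_id = 3
ORDER BY hotels.hotel_id  -- hypothetical sort key; makes OFFSET deterministic
LIMIT 25 OFFSET 1""")]

print(names)  # ['Alice', 'Alice']
```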

Optimising SQL query with Cartesian join

I have two tables, sl and sd.
I want to optimize the following query, if possible:
select sl.*, sd.* from sl join sd where sl.conf_id='blah' and sd.for_as=1
My understanding is that the cartesian product is first performed and then filtering happens.
Is there a way to have the filtering done first?
Run EXPLAIN SELECT ...; it will probably say "Using join buffer". This is where MySQL loads one table into memory (if it is not too big) and repeatedly scans it for the data. Not a pretty sight, but a lot faster than before the join buffer came into play.
Since you have no ON or WHERE condition tying the two tables together, do you really want a cross join? That is, if there are 40 'blah' rows and 70 '1' rows, you will end up with 40*70 = 2800 rows.
As for optimizing, the optimizer will pick one of the tables, giving preference to one that has a useful index, scan it (index or table), then repeatedly use the join buffer (if possible) to scan (index or table) of the other.
In other words, one table will use an index if possible, doing its filtering before the Cartesian product; the other might use the join buffer. If the tables aren't too big, the performance won't be too bad.
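A quick check of the row-count arithmetic, sketched with SQLite through Python's sqlite3 module rather than MySQL. The column layout, index names and data are invented so that 40 rows of sl match 'blah' and 70 rows of sd match for_as = 1; filtering each side explicitly in a derived table returns the same 40*70 = 2800 pairs as the query as written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sl (id INTEGER PRIMARY KEY, conf_id TEXT);
CREATE TABLE sd (id INTEGER PRIMARY KEY, for_as INTEGER);
CREATE INDEX idx_sl_conf_id ON sl (conf_id);   -- hypothetical index names
CREATE INDEX idx_sd_for_as ON sd (for_as);
""")
# 40 matching rows in sl and 70 in sd, plus non-matching noise.
conn.executemany("INSERT INTO sl (conf_id) VALUES (?)", [("blah",)] * 40 + [("other",)] * 60)
conn.executemany("INSERT INTO sd (for_as) VALUES (?)", [(1,)] * 70 + [(0,)] * 30)

# The query as written: JOIN with no ON, filters only in WHERE.
as_written = conn.execute(
    "SELECT COUNT(*) FROM sl JOIN sd WHERE sl.conf_id = 'blah' AND sd.for_as = 1"
).fetchone()[0]

# Same result with each side filtered explicitly before pairing.
prefiltered = conn.execute("""
SELECT COUNT(*)
FROM (SELECT * FROM sl WHERE conf_id = 'blah') AS l
CROSS JOIN (SELECT * FROM sd WHERE for_as = 1) AS d
""").fetchone()[0]

print(as_written, prefiltered)  # 2800 2800
```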

Speed of query using FIND_IN_SET on MySql

I have several problems with a query on a catalogue of products.
The query is as follows:
SELECT DISTINCT (cc_id) FROM cms_catalogo
JOIN cms_catalogo_lingua ON ccl_id_prod=cc_id
JOIN cms_catalogo_famiglia ON (FIND_IN_SET(ccf_id, cc_famiglia) != 0)
JOIN cms_catalogo_categoria ON (FIND_IN_SET(ccc_id, cc_categoria) != 0)
JOIN cms_catalogo_sottocat ON (FIND_IN_SET(ccs_id, cc_sottocat) != 0)
LEFT JOIN cms_catalogo_order ON cco_id_prod=cc_id AND cco_id_lingua=1 AND cco_id_sottocat=ccs_id
WHERE ccc_nome='Alpine Skiing' AND ccf_nome='Ski'
I noticed that the first time I run the query it takes 4.5 seconds on average, and after that it becomes fast.
I use FIND_IN_SET because in my database the table "cms_catalogo" has the columns "cc_famiglia", "cc_categoria" and "cc_sottocat", which contain IDs separated by commas (I know it's stupid).
Example:
Table cms_catalogo
Column cc_famiglia: 1,2,3,4,5
Table cms_catalogo_famiglia
Column ccf_id: 3
Could the slowdown come from using FIND_IN_SET this way?
Would it be faster to have a separate table holding the IDs, with an index, instead of comma-separated IDs?
What I cannot explain, however, is why the first execution of the query is very slow and later ones are fast.
It is better to relate the tables through proper key constraints, so you are better off connecting them by primary key.
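To illustrate the primary-key approach the question asks about, here is a sketch comparing a FIND_IN_SET join with a join through a junction table. It runs on SQLite via Python's sqlite3 module, which has no FIND_IN_SET, so a rough Python emulation is registered under that name; the junction-table name cms_catalogo_famiglia_link, its indexes and the sample rows are hypothetical. Both queries return the same products, but only the second joins on plain column equality that an index can serve:

```python
import sqlite3

def find_in_set(needle, haystack):
    """Rough Python emulation of MySQL FIND_IN_SET (1-based position, 0 if absent)."""
    if needle is None or haystack is None:
        return None
    items = haystack.split(",") if haystack else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
# SQLite has no FIND_IN_SET, so register the emulation under that name.
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
CREATE TABLE cms_catalogo (cc_id INTEGER PRIMARY KEY, cc_famiglia TEXT);
CREATE TABLE cms_catalogo_famiglia (ccf_id INTEGER PRIMARY KEY, ccf_nome TEXT);
-- Hypothetical junction table replacing the comma-separated column.
CREATE TABLE cms_catalogo_famiglia_link (
    cc_id INTEGER NOT NULL,
    ccf_id INTEGER NOT NULL,
    PRIMARY KEY (cc_id, ccf_id)
);
CREATE INDEX idx_link_famiglia ON cms_catalogo_famiglia_link (ccf_id, cc_id);
""")

products = [(1, "1,2,3,4,5"), (2, "3"), (3, "2,4")]
conn.executemany("INSERT INTO cms_catalogo VALUES (?, ?)", products)
conn.executemany("INSERT INTO cms_catalogo_famiglia VALUES (?, ?)",
                 [(2, "Boots"), (3, "Ski"), (4, "Poles")])
# Split each comma-separated list into one (product, family) row.
conn.executemany(
    "INSERT INTO cms_catalogo_famiglia_link VALUES (?, ?)",
    [(cc_id, int(f)) for cc_id, fams in products for f in fams.split(",")],
)

# Original style: FIND_IN_SET must be evaluated row by row; no index can look it up.
via_list = [r[0] for r in conn.execute("""
SELECT DISTINCT cc_id FROM cms_catalogo
JOIN cms_catalogo_famiglia ON FIND_IN_SET(ccf_id, cc_famiglia) != 0
WHERE ccf_nome = 'Ski' ORDER BY cc_id""")]

# Normalised style: plain equality joins that can use the junction-table indexes.
via_link = [r[0] for r in conn.execute("""
SELECT DISTINCT l.cc_id
FROM cms_catalogo_famiglia f
JOIN cms_catalogo_famiglia_link l ON l.ccf_id = f.ccf_id
WHERE f.ccf_nome = 'Ski' ORDER BY l.cc_id""")]

print(via_list, via_link)  # [1, 2] [1, 2]
```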
If you just want a quick optimisation for this query:
Run EXPLAIN SELECT ... in MySQL to see how your query performs;
Add indexes for columns ccc_id, ccf_id, ccs_id;
Run EXPLAIN SELECT ... again after adding the indexes.
The first MySQL query takes much more time because it is not yet cached; subsequent runs are served from cache. So you should rely on the first query's timing.
Unless it is a complicated report, execution time should be under 50-100 ms; otherwise you may run into performance problems overall, because I am sure this is not the only query in your application.

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name'; unfortunately the query still takes 5+ seconds to complete. Removing the ORDER BY reduces this to about 30 ms, since a filesort is no longer required. Is there any way I can alter this query to improve on the 5+ second response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field in the index, preferably matching the exact fields in the order by. Otherwise yes, it will be a slow query.
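A sketch of an index shaped for this ORDER BY, using SQLite through Python's sqlite3 module as a stand-in for MySQL. Because the WHERE clause is an equality on record_type, putting record_type first and the ORDER BY columns right after it lets rows be read already in sorted order. The index name and data are hypothetical. One hedged caveat: the query mixes ASC and DESC, and MySQL only honours descending index columns from 8.0 onwards, so on 5.x such an ORDER BY may still need a filesort:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (
    id INTEGER PRIMARY KEY,
    company_name TEXT,
    record_type TEXT,
    date_action TEXT
);
-- Hypothetical composite index: the WHERE equality column first, then the
-- ORDER BY columns in the same order and with the same directions.
CREATE INDEX idx_clients_type_sort
    ON clients (record_type, date_action, company_name, id DESC);
""")
rows = [
    (i, f"Company {i % 7}", "virgin" if i % 2 else "other", f"2024-01-{i % 5 + 1:02d}")
    for i in range(1, 101)
]
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", rows)

query = """
SELECT id FROM clients
WHERE record_type = 'virgin'
ORDER BY date_action, company_name ASC, id DESC
LIMIT 30"""

# Look for the absence of a "USE TEMP B-TREE FOR ORDER BY" step (roughly
# SQLite's counterpart of a filesort); exact wording depends on the version.
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row[-1])

ids = [r[0] for r in conn.execute(query)]

# Cross-check against a plain Python sort of the same rows.
expected = [
    r[0] for r in sorted((r for r in rows if r[2] == "virgin"),
                         key=lambda r: (r[3], r[1], -r[0]))
][:30]
print(ids == expected)  # True
```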
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
Thanks so much for the input and suggestions. In the end I decided to create a new DB table whose sole purpose is to return results for this query, so no joins are required; I just update that table when records are added to or deleted from the master clients table. Not ideal from a data-storage point of view, but it solves the problem and means I'm getting results fantastically fast. :)
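The asker's final approach is a denormalised summary table, along the lines of the summary-table suggestion in the caching answer. The asker updates it by hand; the sketch below assumes database triggers do that instead, which is one option rather than what was described. It uses SQLite trigger syntax through Python's sqlite3 module (MySQL's trigger syntax differs), leaves out the industry sector for brevity, and all table, trigger and index names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (
    id INTEGER PRIMARY KEY, company_name TEXT, record_type TEXT, date_action TEXT);
CREATE TABLE clients_details (client_id INTEGER, firstname TEXT, surname TEXT);

-- Hypothetical flattened table: exactly the columns the listing needs.
CREATE TABLE clients_listing (
    id INTEGER PRIMARY KEY, company_name TEXT, record_type TEXT,
    date_action TEXT, firstname TEXT, surname TEXT);
CREATE INDEX idx_listing_sort ON clients_listing (record_type, date_action, company_name);

-- Keep the listing in step with the master tables (SQLite trigger syntax).
CREATE TRIGGER trg_details_insert AFTER INSERT ON clients_details
BEGIN
    INSERT OR REPLACE INTO clients_listing
    SELECT c.id, c.company_name, c.record_type, c.date_action,
           NEW.firstname, NEW.surname
    FROM clients c WHERE c.id = NEW.client_id;
END;

CREATE TRIGGER trg_clients_delete AFTER DELETE ON clients
BEGIN
    DELETE FROM clients_listing WHERE id = OLD.id;
END;
""")

conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", [
    (1, "Acme", "virgin", "2024-01-02"),
    (2, "Beta", "virgin", "2024-01-01"),
    (3, "Gamma", "other", "2024-01-01"),
])
conn.executemany("INSERT INTO clients_details VALUES (?, ?, ?)", [
    (1, "Ann", "Smith"), (2, "Bob", "Jones"), (3, "Cy", "Brown"),
])

# The listing query needs no joins at all.
listing_sql = """
SELECT id FROM clients_listing
WHERE record_type = 'virgin'
ORDER BY date_action, company_name
LIMIT 30"""

before = [r[0] for r in conn.execute(listing_sql)]
conn.execute("DELETE FROM clients WHERE id = 2")
after = [r[0] for r in conn.execute(listing_sql)]
print(before, after)  # [2, 1] [1]
```

The trade-off is the one the asker names: the data is stored twice, in exchange for a read path with no joins and an index that matches the sort.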