Increase performance of a one-to-many join - mysql

We're building a model where we're joining a 13-part profile (01_01_resource_utilization_prepared) to a daily record to create 13 records per day; this is a deliberate one-to-many join which grows the size of the table.
It is a simple query and we have already tried indexing, but what is the best way to optimise it?
SELECT
a.DATE,
a.RUN_ID,
a.HOURS,
a.HOURS * b.RESOURCE_DISTRIBUTION,
a.SCHEDULE_PROFILE_ID,
a.WEEKDAY_NUMBER,
a.SCHEDULE_DISTRIBUTION,
b.RESOURCE_DISTRIBUTION,
a.LOCATION_DESC,
a.DEPARTMENT_DESC,
a.LANGUAGE_DESC,
a.JOB_TITLE_DESC
FROM
03_01_schedule a
LEFT JOIN 01_01_resource_utilization_prepared b ON (
a.RESOURCE_PROFILE_ID = b.RESOURCE_PROFILE_ID
AND a.DATE >= b.EFFECTIVE_FROM
AND a.DATE <= b.EFFECTIVE_TO
)

Does 01_01 refer to Jan 01? If so, I suggest that is a bad way to lay out the data. But meanwhile...
Checking whether a value falls within a range, where the range comes from another table, is hard to optimize. These composite indexes on b will help a little:
INDEX(RESOURCE_PROFILE_ID, EFFECTIVE_FROM)
INDEX(RESOURCE_PROFILE_ID, EFFECTIVE_TO)
Is LEFT needed? If it can be removed without destroying the semantics, then a much better option becomes available. Removing LEFT would let the optimizer start with b and make use of this index on a:
INDEX(RESOURCE_PROFILE_ID, `DATE`)
(Meanwhile, I did not understand the relevance of anything you said in your first paragraph.)
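As a rough sketch, the index suggestions in this answer could be created as follows. The index names and the RESOURCE_HOURS alias are made up for illustration, and the table names are backtick-quoted only as a precaution because they begin with digits.

```sql
-- Composite indexes on b: equality column first, range column second.
ALTER TABLE `01_01_resource_utilization_prepared`
  ADD INDEX idx_profile_from (RESOURCE_PROFILE_ID, EFFECTIVE_FROM),
  ADD INDEX idx_profile_to   (RESOURCE_PROFILE_ID, EFFECTIVE_TO);

-- Only useful if LEFT JOIN can become a plain JOIN, so that b may be read first.
ALTER TABLE `03_01_schedule`
  ADD INDEX idx_profile_date (RESOURCE_PROFILE_ID, `DATE`);

-- Check which table leads and which index the optimizer actually picks.
EXPLAIN
SELECT a.`DATE`,
       a.RUN_ID,
       a.HOURS * b.RESOURCE_DISTRIBUTION AS RESOURCE_HOURS
FROM `03_01_schedule` a
JOIN `01_01_resource_utilization_prepared` b
  ON a.RESOURCE_PROFILE_ID = b.RESOURCE_PROFILE_ID
 AND a.`DATE` >= b.EFFECTIVE_FROM
 AND a.`DATE` <= b.EFFECTIVE_TO;
```

Whether the plain JOIN is acceptable depends on whether every schedule row is guaranteed a matching profile row; if not, it silently drops the unmatched days.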

Without more information I can't say exactly, but performance will depend on indexing the columns you're comparing. Without indexes the join may have to scan every row, a "full table scan".
It's pretty common in MySQL to forget to declare foreign keys. 03_01_schedule.RESOURCE_PROFILE_ID and 01_01_resource_utilization_prepared.RESOURCE_PROFILE_ID should be linked by a foreign key; InnoDB requires an index on the referenced column and creates one on the referencing column if none exists. This will make the basic join much faster and also supply referential integrity.
03_01_schedule.DATE, 01_01_resource_utilization_prepared.EFFECTIVE_FROM, and 01_01_resource_utilization_prepared.EFFECTIVE_TO should all be indexed. This will make comparisons using those columns much faster.

Related

how can I improve the performance of this slow query in mysql

I have a mysql query which combines data from 3 tables, which I'm calling "first_table", "second_table", and "third_table" as shown below.
This query consistently shows up in the MySQL slow query log, even though all fields referenced in the query are indexed, and the actual amount of data in these tables is not large (< 1000 records, except for "third_table" which has more like 10,000 records).
I'm trying to determine if there is a better way to structure this query to achieve better performance, and which part of this query is the most likely culprit for the slowdown.
Please note that "third_table.placements" is a JSON field type. All "label" fields are varchar(255), "id" fields are primary key integer fields, "sample_img" is an integer, "guid" is a string, "deleted" is an integer, and "timestamp" is a datetime.
SELECT DISTINCT first_table.id,
first_table.label,
(SELECT guid
FROM second_table
WHERE second_table.id = first_table.sample_img) AS guid,
Count(third_table.id) AS
related_count,
Sum(Json_length(third_table.placements)) AS
placements_count
FROM first_table
LEFT JOIN third_table
ON Json_overlaps(third_table.placements,
Cast(first_table.id AS CHAR))
WHERE first_table.deleted IS NULL
AND third_table.deleted IS NULL
AND Unix_timestamp(third_table.timestamp) >= 1647586800
AND Unix_timestamp(third_table.timestamp) < 1648191600
GROUP BY first_table.id
ORDER BY Lower(first_table.label) ASC
LIMIT 0, 1000
The biggest problem is that these are not sargable:
WHERE ... Unix_timestamp(third_table.timestamp) < 1648191600
ORDER BY Lower(first_table.label)
That is, don't hide a potentially indexed column inside a function call. Instead:
WHERE ... third_table.timestamp < FROM_UNIXTIME(1648191600)
and use a case-insensitive COLLATION for first_table.label, so that ORDER BY first_table.label needs no LOWER(). That is, any collation ending in _ci. (Please provide SHOW CREATE TABLE so I can point that out, and to check the vague "all fields are indexed" -- that usually indicates not knowing the benefits of "composite" indexes.)
Json_overlaps(...) is probably also not sargable. But it gets trickier to fix. Please explain the structure of the json and the types of id and placements.
Do you really need 1000 rows in the output? That is quite large for "pagination".
How big are the tables? UUIDs/GUIDs are notorious when the tables are too big to be cached in RAM.
It is rarely, if ever, useful to have both SELECT DISTINCT and GROUP BY. Removing the DISTINCT may speed up the query by avoiding an extra sort.
Do you really want LEFT JOIN, not just JOIN? (I don't understand the query enough to make a guess.)
After you have fixed most of those, and if you still need help, I may have a way to get rid of the GROUP BY by adding a 'derived' table. Later. (Then I may be able to address the "json_overlaps" discussion.)
Please provide EXPLAIN SELECT ...
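Putting the first few suggestions together, a minimal sketch of the rewritten query might look like this. It assumes first_table.label already uses a case-insensitive (_ci) collation, which the question does not confirm, and it leaves the LEFT JOIN and the Json_overlaps condition untouched because those need the extra information requested above.

```sql
SELECT first_table.id,                      -- DISTINCT dropped: GROUP BY already yields one row per id
       first_table.label,
       (SELECT guid
          FROM second_table
         WHERE second_table.id = first_table.sample_img) AS guid,
       COUNT(third_table.id) AS related_count,
       SUM(JSON_LENGTH(third_table.placements)) AS placements_count
FROM first_table
LEFT JOIN third_table
  ON JSON_OVERLAPS(third_table.placements, CAST(first_table.id AS CHAR))
WHERE first_table.deleted IS NULL
  AND third_table.deleted IS NULL
  -- Bare column on the left, so an index on third_table.timestamp can be considered.
  AND third_table.timestamp >= FROM_UNIXTIME(1647586800)
  AND third_table.timestamp <  FROM_UNIXTIME(1648191600)
GROUP BY first_table.id
ORDER BY first_table.label ASC              -- assumes a _ci collation, so no LOWER()
LIMIT 0, 1000;
```

FROM_UNIXTIME() and UNIX_TIMESTAMP() both honour the session time zone, so the two forms should select the same rows as long as the time zone setting is unchanged.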

How to make a faster query when joining multiple huge tables?

I have 3 tables. All 3 tables have approximately 2 million rows. Every day 10,000-100,000 new entries are added. It takes approximately 10 seconds to finish the SQL statement below. Is there a way to make this SQL statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but it's slow. Is there a better way to write this code?
All database servers have some form of optimization engine that determines how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said, sub-queries won't help, as they will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a mysql expert but found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery doesn't help, but a proper index can improve performance, so be sure you have the proper indexes:
create index idx1 on customers(gender, cus_id, book_id, name)
create index idx2 on hotels(cus_id)
create index idx3 on bookings(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
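For illustration, the indexes recommended in this answer and a deterministic version of the pagination could be sketched as below. The index names are invented, and ordering by customers.book_id is only an assumption chosen because that column is already in the covering index; any stable sort key would do.

```sql
-- Covering index: WHERE columns first, then the join and select columns.
CREATE INDEX idx_customers_cover ON customers (cus_id, gender, book_id, name);
CREATE INDEX idx_hotels_cus      ON hotels (cus_id);
CREATE INDEX idx_bookings_book   ON bookings (book_id);

SELECT customers.name
FROM customers
INNER JOIN hotels   ON hotels.cus_id    = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.cus_id = 3
  AND customers.gender = 0
ORDER BY customers.book_id      -- assumed sort key; makes LIMIT/OFFSET repeatable
LIMIT 25 OFFSET 0;
```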

Optimizing the SQL Query to reduce execution time

My SQL query, with all the filters applied, returns 10 lakh (one million) records. Fetching all of them takes 76.28 seconds, which is not acceptable. How can I optimize the query so that it takes less time?
The Query I am using is :
SELECT cDistName , cTlkName, cGpName, cVlgName ,
cMmbName , dSrvyOn
FROM sspk.villages
LEFT JOIN gps ON nVlgGpID = nGpID
LEFT JOIN TALUKS ON nGpTlkID = nTlkID
left JOIN dists ON nTlkDistID = nDistID
LEFT JOIN HHINFO ON nHLstGpID = nGpID
LEFT JOIN MEMBERS ON nHLstID = nMmbHhiID
LEFT JOIN BNFTSTTS ON nMmbID = nBStsMmbID
LEFT JOIN STATUS ON nBStsSttsID = nSttsID
LEFT JOIN SCHEMES ON nBStsSchID = nSchID
WHERE (
(nMmbGndrID = 1 and nMmbAge between 18 and 60)
or (nMmbGndrID = 2 and nMmbAge between 18 and 55)
)
AND cSttsDesc like 'No, Eligible'
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
GROUP BY cDistName , cTlkName, cGpName, cVlgName ,
DATE_FORMAT(dSrvyOn , '%m-%Y')
I have searched this forum and elsewhere and applied some of the tips given, but they hardly make any difference. All the joins in the query above are LEFT JOINs on primary key and foreign key columns. Can anyone suggest how I can modify this SQL to reduce the execution time?
You are, sir, a very demanding user of MySQL! A million records retrieved from a massively joined result set at the speed you mentioned is 76 microseconds per record. Many would consider this to be acceptable performance. Keep in mind that your client software may be a limiting factor with a result set of that size: it has to consume the enormous result set and do something with it.
That being said, I see a couple of problems.
First, rewrite your query so every column name is qualified by a table name. You'll do this for yourself and the next person who maintains it. You can see at a glance what your WHERE criteria need to do.
Second, consider this search criterion. It requires TWO searches, because of the OR.
WHERE (
(MEMBERS.nMmbGndrID = 1 and MEMBERS.nMmbAge between 18 and 60)
or (MEMBERS.nMmbGndrID = 2 and MEMBERS.nMmbAge between 18 and 55)
)
I'm guessing that these criteria match most of your population -- females 18-60 and males 18-55 (a guess). Can you put the MEMBERS table first in your list of LEFT JOINs? Or can you put a derived column (MEMBERS.working_age = 1 or some such) in your table?
Also try a compound index on (nMmbGndrID,nMmbAge) on MEMBERS to speed this up. It may or may not work.
Third, consider this criterion.
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
You've applied a function to the dSrvyOn column. This defeats the use of an index for that search. Instead, try this.
AND dSrvyOn >= '2012-08-01'
AND dSrvyOn < '2012-08-01' + INTERVAL 1 MONTH
This will, if you have an index on dSrvyOn, do a range search on that index. My remark also applies to the function in your GROUP BY clause.
Finally, as somebody else mentioned, don't use LIKE to search where = will do. And NEVER use column LIKE '%something%' if you want acceptable performance.
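A small sketch of the indexing and date advice follows. Note that the original comparison of '%m-%Y' strings compares the month digits first, so '07-2013' sorts before '08-2012'. If the intent was "surveyed before August 2012", a single upper bound on the bare column is enough. The table name survey_table and the index names are hypothetical, since the question does not say which table holds dSrvyOn.

```sql
-- Compound index for the gender/age filter; as noted above, it may or may not be chosen.
CREATE INDEX idx_members_gender_age ON MEMBERS (nMmbGndrID, nMmbAge);

-- Index on the bare date column (survey_table is a placeholder name).
CREATE INDEX idx_survey_on ON survey_table (dSrvyOn);

-- Sargable range test: no function wrapped around the column.
SELECT COUNT(*)
FROM survey_table
WHERE dSrvyOn < '2012-08-01';   -- assumed intent: everything before August 2012
```

Running EXPLAIN on the full query before and after the change is the simplest way to see whether the range scan is actually used.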
You claim yourself you base your joins on good and unique indexes. So there is little to be optimized. Maybe a few hints:
try to optimize your table layout, maybe you can reduce the number of joins required. That probably brings more performance optimization than anything else.
check your hardware (available memory and things) and the server configuration.
use MySQL's EXPLAIN feature to find bottlenecks.
maybe you can create an auxiliary table especially for this query, which is filled by a background process. That way the query itself runs faster, since the work is done in the background before the query runs. That usually works if the query retrieves data that need not be in sync with every single change in the database.
check if an RDBMS is really the right type of database. For many purposes graph databases are much more efficient and offer better performance.
Try adding an index to nMmbGndrID, nMmbAge, and cSttsDesc and see if that helps your queries out.
Additionally you can use the "Explain" command before your select statement to give you some hints on what you might do better. See the MySQL Reference for more details on explain.
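If the tables used in the joins are rarely updated, you could consider changing the engine type from InnoDB to MyISAM.
SELECT queries on MyISAM can run faster than on InnoDB for some read-only workloads (the 2x figure is a rule of thumb, not a guarantee), but MyISAM uses table-level locking and has no transactions or foreign keys, so concurrent updates and inserts are much slower.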
You can create Views in order to avoid long queries and time.
Your like operator could be holding you up -- full-text search with like is not MySQL's strong point.
Consider setting a fulltext index on cSttsDesc (FULLTEXT indexes require a CHAR, VARCHAR, or TEXT column).
ALTER TABLE table_name ADD FULLTEXT(cSttsDesc);
SELECT
*
FROM
table_name
WHERE MATCH(cSttsDesc) AGAINST('No, Eligible')
Alternatively, you can set a boolean flag instead of cSttsDesc like 'No, Eligible'.
Source: http://devzone.zend.com/26/using-mysql-full-text-searching/
This SQL has many things that are redundant that may not show up in an explain.
If you require a field, it shouldn't be in a table that's in a LEFT JOIN - left join is for when data might be in the joined table, not when it has to be.
If all the required fields are in the same table, it should be the first table in your FROM clause.
If your text search is predictable (not from user input) and relates to a single known ID, use the ID not the text search (props to Patricia for spotting the LIKE bottleneck).
Your query is hard to read because the columns are not qualified with table names, but there does seem to be a pattern to your field names.
You require nMmbGndrID and nMmbAge to have a value, but these are probably in MEMBERS, which is 5 left joins down. That's a redundancy.
Remember that you can do a simple join like this:
FROM sspk.villages, gps, TALUKS, dists, HHINFO, MEMBERS [...] WHERE [...] nVlgGpID = nGpID
AND nGpTlkID = nTlkID
AND nTlkDistID = nDistID
AND nHLstGpID = nGpID
AND nHLstID = nMmbHhiID
It looks like cSttsDesc comes from STATUS. But if the text 'No, Eligible' matches exactly one nBStsSttsID in BNFTSTTS then find out the value and use that! If it is 7, take out LEFT JOIN STATUS ON nBStsSttsID = nSttsID and replace AND cSttsDesc like 'No, Eligible' with AND nBStsSttsID = '7'. This would see a massive speed improvement.

How to make my MySQL SUM() query faster

I have about 1 million rows in the users table, with columns A, AA, B, BB, C, CC, D, DD, E, EE, F, FF (for example) that hold int values 0 and 1, which I want to count.
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result 8 rows in set (24.49 sec).
How can I make my statement faster?
Use EXPLAIN to see the execution plan of your query.
Create at least one index. If possible, make CityCode the first column of the primary key.
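Try this one (note that grouping by the extra columns returns one row per distinct combination of values, not one row per city):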
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF
Create an index on the CityCode column.
I believe it is not because of SUM(). Try running select CityCode from users group by CityCode; it should take nearly the same time...
Use better hardware
increase caching size - if you use InnoDB engine, then increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)
You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table which contains the same data as "users", but without any other columns that you aren't interested in querying. It will still be slow, but not as slow, especially if the omitted columns are large
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data)
But be sure that this optimisation is absolutely necessary.
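The summary-table suggestion could be sketched roughly as follows. The table name users_city_summary and the sum_* column names are invented for this example, and the refresh is assumed to run from a scheduled job, so readers see data as of the last refresh.

```sql
-- Build the summary once; column types are inferred from the SELECT.
CREATE TABLE users_city_summary AS
SELECT CityCode,
       SUM(A)  AS sum_a,  SUM(B)  AS sum_b,  SUM(C)  AS sum_c,
       SUM(D)  AS sum_d,  SUM(E)  AS sum_e,  SUM(F)  AS sum_f,
       SUM(AA) AS sum_aa, SUM(BB) AS sum_bb, SUM(CC) AS sum_cc,
       SUM(DD) AS sum_dd, SUM(EE) AS sum_ee, SUM(FF) AS sum_ff
FROM users
GROUP BY CityCode;

-- Periodic refresh (hourly, nightly, or whatever staleness is tolerable).
TRUNCATE TABLE users_city_summary;
INSERT INTO users_city_summary
SELECT CityCode,
       SUM(A), SUM(B), SUM(C), SUM(D), SUM(E), SUM(F),
       SUM(AA), SUM(BB), SUM(CC), SUM(DD), SUM(EE), SUM(FF)
FROM users
GROUP BY CityCode;

-- The report now reads 8 rows instead of scanning about a million.
SELECT * FROM users_city_summary;
```

The slow scan still happens during each refresh; the gain is that it runs off the request path.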

When to use straight_join?

In what order does MySQL join the tables, how is that order chosen, and when does STRAIGHT_JOIN come in handy?
MySQL has traditionally been capable only of nested-loop joins (possibly using indexes; hash joins arrived in 8.0.18), so if both joined tables are indexed, the join time is roughly proportional to A * log(B) if A is leading and B * log(A) if B is leading.
It is easy to see that the table with fewer records satisfying the WHERE condition should be made leading.
There are some other factors that affect the join performance, such as WHERE conditions, ORDER BY and LIMIT clauses etc. MySQL tries to predict the time for the join orders and if statistics are up to date does it quite well.
STRAIGHT_JOIN is useful when the statistics are not accurate (say, naturally skewed) or in case of bugs in the optimizer.
For instance, the following spatial join:
SELECT *
FROM a
JOIN b
ON MBRContains(a.area, b.area)
is subject to a join swap (the smaller table is made leading), however, MBRContains is not converted to MBRWithin and the resulting plan does not make use of the index.
In this case you should explicitly set the join order using STRAIGHT_JOIN.
Others have described how the optimizer favours the tables likely to yield the smallest result sets, but that does not always work. I had been working with a gov't contract / grants database. The table was some 14+ million records. However, it also had over 20 lookup tables (states, congressional districts, type of business classification, owner ethnicity, etc)
Anyhow, with these smaller tables, the join was starting from one of the small lookups, going back to the master table and then joining all the others. It CHOKED the database, and the query was cancelled after 30+ hours. Since my primary table was listed FIRST, and all subsequent were lookup and joined AFTER, just adding STRAIGHT_JOIN at the top FORCED the order I had listed and the complex query was running again in just about 2 hrs (expected for all it had to do).
Putting whatever is your primary basis at the top, with all subsequent extras later, definitely helps, I've found.
The order of tables is chosen by the optimizer. STRAIGHT_JOIN comes in handy when the optimizer gets it wrong, which is not so often. I used it only once, in a big join where the optimizer put one particular table first (I saw it in the EXPLAIN SELECT output), so I rearranged the query so that this table is joined later. It helped a lot to speed up the query.
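The two ways of writing STRAIGHT_JOIN discussed in these answers look roughly like this. The spatial example reuses tables a and b from above; master_table, states, and their columns are hypothetical stand-ins for a large primary table and one of its lookups.

```sql
-- As a join operator: a is always read before b, for this one join only.
SELECT *
FROM a
STRAIGHT_JOIN b
  ON MBRContains(a.area, b.area);

-- As a SELECT modifier: every table is joined in the order it is listed.
SELECT STRAIGHT_JOIN m.*, s.state_name
FROM master_table AS m
JOIN states AS s
  ON s.state_id = m.state_id;
```

Comparing EXPLAIN output with and without the keyword shows whether the forced order really differs from what the optimizer would have chosen.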