How will I improve a MySQL query that retrieves 11.2M rows?

select tblfarmerdetails.ncode,
tblfarmerdetails.region,tblfarmerdetails.province, tblfarmerdetails.municipality,
concat(tblfarmerdetails.farmerfname, ' ', tblfarmerdetails.farmerlname) as nameoffarmer,
concat(tblfarmerdetails.spousefname, ' ',tblfarmerdetails.spouselname) as nameofspouse, tblstatus.statusoffarmer from tblfarmerdetails
INNER Join
tblstatus on tblstatus.ncode = tblfarmerdetails.ncode where tblstatus.ncode = tblfarmerdetails.ncode order by tblfarmerdetails.region
It takes too long to retrieve 11.2M rows. How will I improve this query?

Firstly, format the query so it is readable, or at least decipherable, by a human.
SELECT f.ncode
, f.region
, f.province
, f.municipality
, CONCAT(f.farmerfname,' ',f.farmerlname) AS nameoffarmer
, CONCAT(f.spousefname,' ',f.spouselname) AS nameofspouse
, s.statusoffarmer
FROM tblfarmerdetails f
JOIN tblstatus s
ON s.ncode = f.ncode
ORDER BY f.region
It's likely that a lot of time is spent doing a "Using filesort" operation, sorting all the rows into the order specified in the ORDER BY clause. A sort operation is certain to occur if there's no index with a leading column of region.
Having a suitable index available, for example
... ON tblfarmerdetails (region, ... )
means that MySQL may be able to return the rows "in order", using the index, without having to do a sort operation.
If MySQL has a "covering index" available, i.e. an index that contains all of the columns of the table referenced in the query, MySQL can make use of that index to satisfy the query without needing to visit pages in the underlying table.
But given the number of columns, and the potential that some of these columns may be goodly sized VARCHAR, this may not be possible or workable:
... ON tblfarmerdetails (region, ncode, province, municipality, farmerfname, farmerlname, spousefname, spouselname)
(MySQL does have some limitations on indexes, such as a limit on the total length of an index key. The goal of the "covering index" is to avoid lookups to pages in the table.)
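Spelled out as a full statement, that wide covering index might look something like this sketch (the index name is just illustrative, and as noted it may run into key-length limits if those VARCHAR columns are large):
CREATE INDEX ix_farmerdetails_region_covering
    ON tblfarmerdetails (region, ncode, province, municipality, farmerfname, farmerlname, spousefname, spouselname);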
And make sure that MySQL knows that ncode is UNIQUE in tblstatus. Either that's the PRIMARY KEY or there's a UNIQUE index.
We suspect tblstatus table contains a small number of rows, so the join operation is probably not that expensive. But an appropriate covering index, with ncode as the leading column, wouldn't hurt:
... ON tblstatus (ncode, statusoffarmer)
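As DDL, declaring ncode unique and adding that covering index might look like this sketch (index names are illustrative; use whichever uniqueness declaration matches the actual schema):
ALTER TABLE tblstatus ADD PRIMARY KEY (ncode);
-- or, if a different primary key already exists:
ALTER TABLE tblstatus ADD UNIQUE INDEX ux_status_ncode (ncode);

CREATE INDEX ix_status_ncode_covering
    ON tblstatus (ncode, statusoffarmer);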
If MySQL has to perform a "Using filesort" operation to get the rows ordered (to satisfy the ORDER BY clause), on a large set, that operation can spill to disk, and that can add (sometimes significantly) to the elapsed time.
The resultset produced by the query has to be transferred to the client. And that can also take some clock time.
And the client has to do something with the rows that are returned.
Are you sure you really need to return 11.2M rows? Or, are you only needing the first couple of thousand rows?
Consider adding a LIMIT clause to the query.
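For example, to return only the first couple of thousand rows (2000 here is an arbitrary illustration), using the reformatted query from above:
SELECT f.ncode
     , f.region
     , f.province
     , f.municipality
     , CONCAT(f.farmerfname,' ',f.farmerlname) AS nameoffarmer
     , CONCAT(f.spousefname,' ',f.spouselname) AS nameofspouse
     , s.statusoffarmer
  FROM tblfarmerdetails f
  JOIN tblstatus s
    ON s.ncode = f.ncode
 ORDER BY f.region
 LIMIT 2000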
And how long are those lname and fname columns? Do you need MySQL to do the concatenation for you, or could that be done on the client as the rows are processed?
It's possible that MySQL is having to do a "Using temporary" to hold the rows with the concatenated results. And MySQL is likely allocating enough space for that return column to hold the maximum possible length from lname + the maximum possible length from fname. And if that's a multibyte character set, that will double or triple the storage over a single-byte character set.
To really see what's going on, you'd need to take a look at the query execution plan. You get that by preceding your SELECT statement with the keyword EXPLAIN
EXPLAIN SELECT ...
The output from that will show the operations that MySQL is going to do and which indexes it's going to use. And armed with knowledge about the operations the MySQL optimizer can perform, we can use that to make some pretty good guesses as to how to get the biggest gains.

Related

how to optimize this sql when use order by select

select id, col1,col2,col3,seq
from `table`
order by seq asc
I have already created an index on 'seq', but I found that it doesn't use the index and uses a filesort when selecting. Because col1 may hold some large data, I don't want to create a covering index on this table. Is there some way to optimize this SQL, or the table, or the index? Thanks, my English is not good 😂😂😂
The SQL query optimizer apparently estimated the cost of using the index and concluded that it would be better to just do a table-scan and use a filesort of the result.
There is overhead to using a non-covering index. It reads the index in sorted order, but then has to dereference the primary key to get the other columns not covered by the index.
By analogy, this is like reading a book by looking up every word in alphabetical order in the index at the back, and then flipping back to the respective pages of the book, one word at a time. Time-consuming, but it's one way of reading the book in order by keyword.
That said, a filesort has overhead too. The query engine has to collect matching rows, and sort them manually, potentially using temporary files. This is expensive if the number of rows is large. You haven't described the size of your table in number of rows.
If the table you are testing has a small number of rows, the optimizer might have reasoned that it would be quick enough to do the filesort, so it would be unnecessary to read the rows by the index.
Testing with a larger table might give you different results from the optimizer.
The query optimizer makes the right choice in the majority of cases. But it's not infallible. If you think forcing it to use the index is better in this case, you can use the FORCE INDEX hint to make it believe that a table-scan is prohibitively expensive. Then if the index is usable at all, it'll prefer the index.
select id, col1,col2,col3,seq
from `table` FORCE INDEX(seq)
order by seq asc
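To see whether the hint actually changes the plan, you can compare the EXPLAIN output with and without it (this assumes the index really is named seq; check SHOW INDEX FROM `table` if unsure). Look at the key and Extra columns: with the hint you should see the seq index in key and no "Using filesort" in Extra.
EXPLAIN select id, col1,col2,col3,seq from `table` order by seq asc;
EXPLAIN select id, col1,col2,col3,seq from `table` FORCE INDEX(seq) order by seq asc;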

how can I improve the performance of this slow query in mysql

I have a mysql query which combines data from 3 tables, which I'm calling "first_table", "second_table", and "third_table" as shown below.
This query consistently shows up in the MySQL slow query log, even though all fields referenced in the query are indexed, and the actual amount of data in these tables is not large (< 1000 records, except for "third_table" which has more like 10,000 records).
I'm trying to determine if there is a better way to structure this query to achieve better performance, and what part of this query is likely to be the most likely culprit for causing the slowdown.
Please note that "third_table.placements" is a JSON field type. All "label" fields are varchar(255), "id" fields are primary key integer fields, "sample_img" is an integer, "guid" is a string, "deleted" is an integer, and "timestamp" is a datetime.
SELECT DISTINCT first_table.id,
       first_table.label,
       (SELECT guid
          FROM second_table
         WHERE second_table.id = first_table.sample_img) AS guid,
       Count(third_table.id) AS related_count,
       Sum(Json_length(third_table.placements)) AS placements_count
FROM first_table
LEFT JOIN third_table
       ON Json_overlaps(third_table.placements, Cast(first_table.id AS CHAR))
WHERE first_table.deleted IS NULL
  AND third_table.deleted IS NULL
  AND Unix_timestamp(third_table.timestamp) >= 1647586800
  AND Unix_timestamp(third_table.timestamp) < 1648191600
GROUP BY first_table.id
ORDER BY Lower(first_table.label) ASC
LIMIT 0, 1000
The biggest problem is that these are not sargable:
WHERE ... Unix_timestamp(third_table.timestamp) < 1648191600
ORDER BY Lower(first_table.label)
That is, don't hide a potentially indexed column inside a function call. Instead:
WHERE ... third_table.timestamp < FROM_UNIXTIME(1648191600)
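Applied to both timestamp bounds from the original query, the rewritten (sargable) conditions would look something like this sketch:
WHERE first_table.deleted IS NULL
  AND third_table.deleted IS NULL
  AND third_table.timestamp >= FROM_UNIXTIME(1647586800)
  AND third_table.timestamp <  FROM_UNIXTIME(1648191600)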
Also, use a case-insensitive COLLATION for first_table.label; that is, any collation ending in _ci. (Please provide SHOW CREATE TABLE so I can point that out, and so I can check the vague "all fields are indexed" -- that usually indicates not knowing the benefits of "composite" indexes.)
Json_overlaps(...) is probably also not sargable. But it gets trickier to fix. Please explain the structure of the json and the types of id and placements.
Do you really need 1000 rows in the output? That is quite large for "pagination".
How big are the tables? UUIDs/GUIDs are notorious when the tables are too big to be cached in RAM.
It is possibly never useful to have both SELECT DISTINCT and GROUP BY. Removing the DISTINCT may speed up the query by avoiding an extra sort.
Do you really want LEFT JOIN, not just JOIN? (I don't understand the query enough to make a guess.)
After you have fixed most of those, and if you still need help, I may have a way to get rid of the GROUP BY by adding a 'derived' table. Later. (Then I may be able to address the "json_overlaps" discussion.)
Please provide EXPLAIN SELECT ...

Two different queries on the same table with the same WHERE clause

I have two different queries. But they are both on the same table and have the same WHERE clause, so they are selecting the same rows.
Query 1:
SELECT HOUR(timestamp), COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY HOUR(timestamp)
Query 2:
SELECT country, COUNT(*) as hits
FROM hits_table
WHERE timestamp >= CURDATE()
GROUP BY country
How can I make this more efficient?
If this table is indexed correctly, it honestly doesn't matter how big the entire table is because you're only looking at today's rows.
If the table is indexed incorrectly the performance of these queries will be terrible no matter what you do.
Your WHERE timestamp >= CURDATE() clause means you need to have an index on the timestamp column. In one of your queries the GROUP BY country shows that a compound covering index on (timestamp, country) will be a great help.
So, a single compound index (timestamp, country) will satisfy both the queries in your question.
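As a sketch, that index could be created like this (the index name is just illustrative):
ALTER TABLE hits_table ADD INDEX ix_timestamp_country (`timestamp`, country);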
Let's explain how that works. To look for today's records (or indeed any records starting and ending with particular timestamp values) and group them by country, and count them, MySQL can satisfy the query by doing these steps:
random-access the index to the first record that matches the timestamp. O(log n).
grab the first country value from the index.
scan to the next country value in the index and count. O(n).
repeat step three until the end of the timestamp range.
This index scan operation is about as fast as a team of ace developers (the MySQL team) can get it to be with a decade of hard work. (You may not be able to outdo them on a Saturday afternoon.) MySQL satisfies the whole query with a small subset of the index, so it doesn't really matter how big the table behind it is.
If you run one of these queries right after the other, it's possible that MySQL will still have some or all the index data blocks in a RAM cache, so it might not have to re-fetch them from disk. That will help even more.
Do you see how your example queries lead with timestamp? The most important WHERE criterion chooses a timestamp range. That's why the compound index I suggested has timestamp as its first column. If you don't have any queries that lead with country your simple index on that column probably is useless.
You asked whether you really need compound covering indexes. You probably should read about how they work and make that decision for yourself.
There's obviously a tradeoff in choosing indexes. Each index slows the process of INSERT and UPDATE a little, and can speed up queries a lot. Only you can sort out the tradeoffs for your particular application.
Since both queries have different GROUP BY clauses they are inherently different and cannot be combined. Assuming there already is an index present on the timestamp field there is no straightforward way to make this more efficient.
If the dataset is huge (10 million or more rows) you might get a little extra efficiency out of making an extra combined index on country, timestamp, but that's unlikely to be measurable, and the lack of it will usually be mitigated by MySQL's own in-memory buffering if these 2 queries are executed directly after one another.

How to create Effective Indexing on MySQL

Can anyone help me with how I could create an index so that my query will execute smoothly?
Currently, I have the below query that returns 8k+ records, but it takes 2 sec or more to complete. The current record count on tblproduction is 16 million+.
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldGlobalID <> 0;
I have the below current indexes, but the query still executes slowly:
Index_1 (fldEmployeeNo, fldStartTime)
Index_2 (fldEmployeeNo, fldTask, fldTaskStatus)
Index_3 (fldGlobalId, fldProjectGroup)
Index_4 (fldGlobalId)
I have tried all of these indexes using FORCE INDEX, but the query still executes slowly.
Please advise, thanks!
This started as a comment on Gordon Linoff's answer but was getting too long.
"It would be better to include fldGlobalId in the index as well" - no, it would not - this is counterproductive for performance. It won't improve the speed of retrieving the data (indexes are not used for inequalities), but it will lead to more frequent index updates, hence increased index fragmentation (hence potentially worsening the performance of SELECT) and reduced performance for inserts and updates.
Ideally you should design your schema to optimize all the queries - which is rather a large task, but since you've only provided one....
The query as it stands will only use a single index for resolution, hence the index should include all the fields which have predicates in the query, except for the inequality predicate (i.e. fldGlobalID).
The order of the fields is important: in the absence of other queries with different sets of predicates, the fields with the highest relative cardinality should come first. It's rather hard to know what this is without analysing the data (SELECT COUNT(DISTINCT field)/COUNT(*) FROM yourtable), but at a guess the order should be fldstarttime, fldemployeeno, fldprojectgroup, fldshift.
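As a rough sketch of that cardinality check applied to this table (note it scans all 16 million+ rows, so run it off-peak):
SELECT COUNT(DISTINCT fldstarttime)    / COUNT(*) AS card_starttime,
       COUNT(DISTINCT fldemployeeno)   / COUNT(*) AS card_employeeno,
       COUNT(DISTINCT fldprojectgroup) / COUNT(*) AS card_projectgroup,
       COUNT(DISTINCT fldshift)        / COUNT(*) AS card_shift
  FROM tblproduction;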
If fldshift is dependent on fldemployeeno (i.e. an employee always, or at least more than around 90% of the time, works the same shift), then including fldshift in the index merely increases its size and does not make it any more efficient.
You didn't say what type of index you're using - btree indexes work well with ranges, hash indexes with equality comparisons. Since the highest-cardinality predicate here uses a range, a btree index will be much more efficient than a hash-based index.
You can use one index. Here is the query, slightly rearranged:
SELECT COUNT(fldglobalid) AS PackagesDone
FROM tblproduction
WHERE fldemployeeno = 'APD100401'
AND fldshift = 'B'
AND fldprojectgroup = 'FTO'
AND fldstarttime BETWEEN '2013-08-14 07:18:06' AND '2013-08-14 16:01:58'
AND fldGlobalID <> 0;
(I just moved the equality conditions together to the top of the where clause).
The query should make use of an index on tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime). It would be better to include fldGlobalId in the index as well, so the index "covers" the query (all columns in the query are in the index). So, try this index:
tblproduction(fldemployeeno, fldshift, fldprojectgroup, fldstarttime, fldGlobalID)
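Spelled out as DDL (the index name is just illustrative):
CREATE INDEX ix_emp_shift_group_start_gid
    ON tblproduction (fldemployeeno, fldshift, fldprojectgroup, fldstarttime, fldGlobalID);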

Better Query to Select Count Where

I'm hosted at GoDaddy and they are giving me a fit over a query that I have, which is understandable. I am wondering if there is a better way to go about rewriting this query I have to make it resource friendly:
SELECT count(*) as count
FROM 'aaa_db'
WHERE 'callerid' LIKE '16602158641' AND 'clientid'='41'
I have over 1 million rows and have duplicates, so I wrote a small script to output the duplicates and delete them rather than changing tables, etc.
Please help.
Having separate indexes on clientid and callerid causes MySQL to use just one index, or in some cases, attempt an index merge (MySQL 5.0+). Neither of these are as efficient as having a multi-column index.
Creating a multi-column index on both callerid and clientid columns will relieve CPU and disk IO resources (no table scan), however it will increase both disk storage and RAM usage. I guess you could give it a shot and see if Godaddy prefers that over the other.
The first thing I would try is to ensure you have an index on the clientid column, then rewrite your query to look for clientid first; this should remove rows from consideration, speeding up your query. Also, as Marcus stated, adding a multi-column index will be faster, but remember to add it as (clientid, callerid), as MySQL reads indices from left to right.
SELECT count(*) as count
FROM aaa_db
WHERE clientid = 41 and callerid LIKE '16602158641';
Notice I removed the quotes from the clientid value, as it should be an int datatype; if it is not, converting it to an int datatype should also be faster. Also, I am not sure why you are doing a LIKE without a wildcard; changing that to an = will also help.
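An illustrative definition of that composite index (the name is hypothetical, and this assumes clientid really is an int column):
ALTER TABLE aaa_db ADD INDEX ix_clientid_callerid (clientid, callerid);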
I like Marcus' answer, but I would first try a single index on just callerid, because with values like 16602158641, how many rows can there be that match out of a million? Not very many, so the single index could match or exceed performance of the double index.
Also, remove LIKE and use =:
SELECT count(*) as count
FROM aaa_db
WHERE callerid = '16602158641' -- equals instead of like
AND clientid = '41'