When we insert a new row into the DB, does MySQL always do a full table scan?
I think that since all the column values are newly inserted,
no index can be used for the INSERT itself.
The explain plan looks something like this:
{
"query_block": {
"select_id": 1,
"table": {
"insert": true,
"table_name": "my_table",
"access_type": "ALL"
}
}
}
I stumbled upon this today and was quite shocked. When searching Google I normally see the question asked the other way around, i.e. adding LIMIT made the query slower.
I have a MySQL table with a few million rows in it.
The PK is id and as such it's a unique index.
When I performed a query of the form select a, b, c, ... from table where id in (1, 2, 3, ..., 5000), it took about 15-20 minutes to fetch all the results.
However, when I simply added limit 1000000 at the end (an extremely large number, far more than needed, on purpose), it returned in a few seconds.
I know that a LIMIT smaller than the number of matching rows helps because the query returns as soon as the "quota" is filled, but here I can't find the reason for such a dramatic improvement.
Can anyone please explain it?
Should I just add a limit to every query to improve its performance?
Why doesn't MySQL run the search the same way with and without it?
Update
As requested, the EXPLAIN for each:
With limit (takes a few seconds)
{
"id" : 1,
"select_type" : "SIMPLE",
"table" : "table",
"partitions" : null,
"type" : "range",
"possible_keys" : "PRIMARY",
"key" : "PRIMARY",
"key_len" : "4",
"ref" : null,
"rows" : 4485,
"filtered" : 100.0,
"Extra" : "Using where"
}
Without limit (takes 15-20 minutes)
{
"id" : 1,
"select_type" : "SIMPLE",
"table" : "table",
"partitions" : null,
"type" : "ALL",
"possible_keys" : "PRIMARY",
"key" : null,
"key_len" : null,
"ref" : null,
"rows" : 69950423,
"filtered" : 50.0,
"Extra" : "Using where"
}
I'm not fluent in reading these, but it looks like MySQL used the key when I used LIMIT and didn't when I ran the query without it.
There are possibly other differences in the filtered and type fields, which I don't know how to interpret.
How come?
Update 2
A lot of questions were asked, so I'll try to provide details for all of them.
The MySQL version is 8.0.28 and the table engine is InnoDB.
I've run the tests a few times, one after the other, not just once.
Running the same EXPLAIN with fewer (10) values in the IN clause returned the same plan both with and without LIMIT!
{
"id" : 1,
"select_type" : "SIMPLE",
"table" : "table",
"partitions" : null,
"type" : "range",
"possible_keys" : "PRIMARY",
"key" : "PRIMARY",
"key_len" : "4",
"ref" : null,
"rows" : 10,
"filtered" : 100.0,
"Extra" : "Using where"
}
Now the FORMAT=JSON (with redacted parts):
Without limit
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "8369910.88"
},
"table": {
"table_name": "table",
"access_type": "ALL",
"possible_keys": [
"PRIMARY"
],
"rows_examined_per_scan": 70138598,
"rows_produced_per_join": 35069299,
"filtered": "50.00",
"cost_info": {
"read_cost": "4862980.98",
"eval_cost": "3506929.90",
"prefix_cost": "8369910.88",
"data_read_per_join": "558G"
},
"used_columns": [...],
"attached_condition": "(`db`.`table`.`id` in (...))"
}
}
}
With limit
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "8371410.92"
},
"table": {
"table_name": "table",
"access_type": "range",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "4",
"rows_examined_per_scan": 4485,
"rows_produced_per_join": 35069255,
"filtered": "100.00",
"cost_info": {
"read_cost": "4864485.17",
"eval_cost": "3506925.54",
"prefix_cost": "8371410.92",
"data_read_per_join": "558G"
},
"used_columns": [...],
"attached_condition": "(`db`.`table`.`id` in (...))"
}
}
}
As there is a very long thread in the comments under the post, I will just summarize the answer here (it is both mine and @Bill's): the issue is the very long argument list in the IN() part of the statement.
The fix is to raise the range_optimizer_max_mem_size system variable so it can accommodate more values in the IN() list; exceeding that limit causes the optimizer to fall back to a full table scan.
The range optimizer reserves memory for range scanning, so not having enough of that memory configured results in a full table scan.
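For completeness, a minimal sketch of how to inspect and raise that limit (the 32 MB value is only an example; pick a size that fits your IN() lists):

-- check the current range-optimizer memory limit (default is 8388608 bytes = 8 MB)
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';

-- raise it globally; a value of 0 means "no limit"
SET GLOBAL range_optimizer_max_mem_size = 33554432;  -- 32 MB

-- or only for the current session
SET SESSION range_optimizer_max_mem_size = 33554432;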
Now, why does the LIMIT clause change this? Here I can only guess:
LIMIT forces MySQL to use a different range scan type.
LIMIT actually caps the number of rows that will be returned, so MySQL knows it will not return more than X; without it, MySQL assumes it could return 69,950,423 rows, which would exceed some of the other memory limits you have set up. It is worth trying a LIMIT equal to the number of rows in the table.
I would like to form a nested aggregation query in Elasticsearch. Basically, the aggregation is nested four levels deep:
groupId.keyword
-- direction
---- billingCallType
------ durationCallAnswered
example:
"aggregations": {
"avgCallDuration": {
"terms": {
"field": "groupId.keyword",
"size": 10000,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [
{
"_count": "desc"
},
{
"_key": "asc"
}
]
},
"aggregations": {
"call_direction": {
"terms" : {
"field": "direction"
},
"aggregations": {
"call_type" : {
"terms": {
"field": "billingCallType"
},
"aggregations": {
"avg_value": {
"terms": {
"field": "durationCallAnswered"
}
}
}
}
}
}
}
}
}
This is part of a larger query. While running it, I get the following error:
"type": "illegal_argument_exception",
"reason": "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [direction] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
Can anyone shed light on this?
TL;DR
As the error states, you are performing an aggregation on a text field, the field direction.
Aggregations are not supported on text fields by default, as they are very expensive (CPU- and memory-wise).
There are three solutions to your issue (a minimal sketch of the first two follows the list):
Change the mapping from text to keyword (requires reindexing, but is the most efficient way to query the data)
Add fielddata: true to the field's mapping (flexible, but not optimised)
Don't do the aggregation on this field :)
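A minimal sketch of the first two options, assuming a recent Elasticsearch version without mapping types and an index named my_index (the index names are not from the question):

Option 1 - create a new index with a keyword mapping, then copy the data over with the _reindex API:
PUT /my_index_v2
{
  "mappings": {
    "properties": {
      "direction": { "type": "keyword" }
    }
  }
}

Option 2 - enable fielddata on the existing text field (can be done in place, no reindex needed):
PUT /my_index/_mapping
{
  "properties": {
    "direction": {
      "type": "text",
      "fielddata": true
    }
  }
}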
I have a MySQL database with about 12 million records. I use the following query to fetch the required rows:
SELECT date_time, price_l0, amount_l0, price_l1, amount_l1, price_l2, amount_l2, price_l3, /* 34 more columns */
FROM book_states
WHERE date_time > ? and
date_time < ? and
bookID = ?
ORDER BY date_time ASC
LIMIT 4350
The problem is that when I use a LIMIT of about 4340, this query takes about 0.002/0.15 seconds to run. However, if I use a LIMIT of, say, 4350, it takes 3.0/0.15 seconds (!) to run.
If I select fewer columns, the threshold between a very fast and a very slow query is slightly higher, but it takes 3 seconds or more even if I select only one column when the LIMIT is above 5000.
I suspect this is a MySQL setup problem or some sort of RAM limitation, but since I am not a MySQL expert by any means, I'm asking you to explain what causes this drastic performance difference.
EDIT:
This is the JSON EXPLAIN data for a query that takes 3 seconds:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "282333.60"
},
"ordering_operation": {
"using_filesort": true,
"table": {
"table_name": "book_states",
"access_type": "ref",
"possible_keys": [
"index1",
"index2",
"index3"
],
"key": "index2",
"used_key_parts": [
"bookID"
],
"key_length": "2",
"ref": [
"const"
],
"rows_examined_per_scan": 235278,
"rows_produced_per_join": 81679,
"filtered": "34.72",
"index_condition": "(`datastore`.`book_states`.`bookID` <=> 29)",
"cost_info": {
"read_cost": "235278.00",
"eval_cost": "16335.84",
"prefix_cost": "282333.60",
"data_read_per_join": "14M"
},
"used_columns": [
"id",
"date_time",
"bookID"
],
"attached_condition": "((`datastore`.`book_states`.`date_time` > '2018-09-28T16:18:49') and (`datastore`.`book_states`.`date_time` < '2018-09-29T23:18:49'))"
}
}
}
}
The best index for your query is a composite index on (bookID, date_time). Note the order of the columns; it is quite important.
MySQL is struggling to optimize your query with the indexes on hand. It can select the records using the date_time part of an index (or an index on bookID) and then sort the results.
Or, it can scan your compound index (which has records ordered by date/time), filtering out the unneeded books as it goes.
Choosing between these two methods is what you are (presumably) seeing. Which is better depends on the gathered statistics, and they necessarily provide only partial information.
So, switch the columns in the index and the problem should go away, at least for this particular query.
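A minimal sketch of the suggested change (the index name idx_book_time is made up; book_states is the table from the question):

ALTER TABLE book_states ADD INDEX idx_book_time (bookID, date_time);

With (bookID, date_time) the optimizer can jump straight to the requested bookID and read the rows already ordered by date_time, so the ORDER BY ... LIMIT needs no filesort and stops after the requested number of rows.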
I am trying to update my MySQL table and insert JSON data into a JSON-type column using JSON_INSERT. Here is the current value of the column:
{
"Data":
[{
"Devce": "ios",
"Status": 1
}]
}
This is the query I am using to insert more data into this field:
UPDATE table SET `Value` = JSON_INSERT
(`Value`,'$.Data','{\"Device\":\"ios\",\"Status\":1}') WHERE Meta = 'REQUEST_APP'
This is supposed to update the field to this:
{
"Data":
[{
"Devce": "ios",
"Status": 1
},
{
"Devce": "ios",
"Status": 1
}
]
}
But instead the result is:
0 rows affected. (Query took 0.0241 seconds.)
Any help regarding this would be appreciated.
JSON_ARRAY_APPEND serves your purpose better (it was named JSON_APPEND before MySQL 5.7.9, and the old name was removed in MySQL 8.0). See the JSON_ARRAY_APPEND docs.
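A minimal sketch of the corrected statement (table and column names taken from the question); the new element is built with JSON_OBJECT() so it is appended as a JSON object rather than as an escaped string:

UPDATE `table`
SET `Value` = JSON_ARRAY_APPEND(`Value`, '$.Data', JSON_OBJECT('Devce', 'ios', 'Status', 1))
WHERE Meta = 'REQUEST_APP';

JSON_INSERT(`Value`, '$.Data', ...) does nothing here because the path $.Data already exists; JSON_ARRAY_APPEND adds a new element to the end of the existing array, which is what the expected output shows.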
I'm trying to perform an Elasticsearch query as a POST request in order to pull data from an index I created. The data in the index is a table from a MySQL DB, loaded through Logstash.
Here is my request and the JSON body:
http://localhost:9200/response_summary/_search
Body:
{
"query": {
"query_string": {
"query": "transactionoperationstatus:\"charged\" AND api:\"payment\" AND operatorid:\"XL\" AND userid:*test AND time:\"2015-05-27*\" AND responsecode:(200+201)"
}
},
"aggs": {
"total": {
"terms": {
"field": "userid"
},
"aggs": {
"total": {
"sum": {
"script": "Double.parseDouble(doc['chargeamount'].value)"
}
}
}
}
}
}
In the above JSON body, I need to append a timestamp to the query_string in order to get the data from the index within a date range. I tried adding this at the end of the query:
AND timestamp:[2015-05-27T00:00:00.128Z+TO+2015-05-27T23:59:59.128Z]"
Where am I going wrong? Any help would be appreciated.
You just need to remove the + signs; they are only necessary when sending the query via the URL query string (i.e. to URL-encode the spaces). When you use the query_string query in the request body, you don't need them:
AND timestamp:[2015-05-27T00:00:00.128Z TO 2015-05-27T23:59:59.128Z]"
(the two + signs around TO are simply replaced by spaces)
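Put together, the request body would look something like this (only the timestamp clause changed; everything else is taken verbatim from the question):

{
  "query": {
    "query_string": {
      "query": "transactionoperationstatus:\"charged\" AND api:\"payment\" AND operatorid:\"XL\" AND userid:*test AND time:\"2015-05-27*\" AND responsecode:(200+201) AND timestamp:[2015-05-27T00:00:00.128Z TO 2015-05-27T23:59:59.128Z]"
    }
  },
  "aggs": {
    "total": {
      "terms": { "field": "userid" },
      "aggs": {
        "total": {
          "sum": { "script": "Double.parseDouble(doc['chargeamount'].value)" }
        }
      }
    }
  }
}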