I wonder whether MySQL/MariaDB offers a way to make use of an index directly in a query. Suppose we have a simple unsorted table of timestamp/value pairs:
CREATE TABLE simple (timestamp DATETIME, val INT);
By adding an index for the timestamp:
ALTER TABLE simple ADD INDEX ind_ts (timestamp);
we get "fast access" to the timestamps in sorted order.
Let's define a query that delivers the difference between the values of consecutive rows:
SELECT
c.timestamp AS currenttimestamp,
COALESCE(b.timestamp,'0000-00-00 00:00:00') AS timestampbefore,
c.val - COALESCE(b.val,0) AS difference
FROM simple c,
simple b
WHERE b.timestamp = (SELECT MAX(timestamp) FROM simple h WHERE h.timestamp < c.timestamp)
Obviously, this query is cumbersome and expensive. A more convenient way would be to add a column myindex to the table:
ALTER TABLE simple ADD COLUMN myindex INT AFTER timestamp;
and fill the new column with the chronological order of timestamp (e.g. with some PHP code).
The new query would be simpler and less expensive:
SELECT
c.timestamp AS currenttimestamp,
COALESCE(b.timestamp,'0000-00-00 00:00:00') AS timestampbefore,
c.val - COALESCE(b.val,0) AS difference
FROM simple c
LEFT JOIN simple b ON c.myindex = b.myindex+1
The new column myindex is somewhat similar to the table's index ind_ts. Is there a MySQL construct to use ind_ts instead of myindex, and if not, why not?
If you are using MySQL 8.0 or MariaDB 10.2 (or later), LEAD() and LAG() provide a simple way to see the row before or after.
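As a minimal sketch of the window-function approach, assuming the simple table from the question, the query could look like the following. COALESCE is used for the first row (which has no predecessor) instead of a default argument to LAG(), because support for that argument may differ between MySQL and MariaDB versions.

```sql
-- Sketch: difference between each row and the chronologically previous row.
-- Requires window-function support (MySQL 8.0+ or MariaDB 10.2+).
SELECT
    `timestamp` AS currenttimestamp,
    LAG(`timestamp`) OVER (ORDER BY `timestamp`) AS timestampbefore,   -- NULL for the first row
    val - COALESCE(LAG(val) OVER (ORDER BY `timestamp`), 0) AS difference
FROM simple
ORDER BY `timestamp`;
```

With the ind_ts index in place, the server can read the rows in index order for the window's ORDER BY, which is about as close as a query gets to "using the index directly".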
If you are using an older version, then do a "self join". That is, JOIN the table to itself. Then line up the two "tables" offset by one. This may require generating a temp table with a fresh AUTO_INCREMENT to provide an easy way to do the offset. This may be slightly better than your idea about "myindex".
CREATE TABLE new_table (
myindex INT UNSIGNED AUTO_INCREMENT,
PRIMARY KEY(myindex))
SELECT * FROM simple ORDER BY timestamp; -- number the rows in chronological order
Then
SELECT a.ts - b.ts AS diff -- (or whatever the math is)
FROM new_table AS a
JOIN new_table AS b ON a.myindex = b.myindex - 1
(This does not take care of the first and last rows of the table.)
Note: You cannot use a TEMPORARY TABLE, since a temporary table cannot be JOINed to itself.
I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table with 1,406,480 rows in total. The Filename and Ref_No columns are not indexed, but ID and End_Date have both been indexed.
My Query:
INSERT INTO UniqueIDs
(
SELECT T1.ID
FROM master_table T1
LEFT JOIN master_table T2
  ON T1.Ref_No = T2.Ref_No
  AND T1.End_Date = T2.End_Date
  AND T1.Filename = T2.Filename
  AND T1.ID > T2.ID
WHERE T2.ID IS NULL
  AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
);
Explain Results:
The reason for not indexing Ref_No is that it is a text column, so I get a BLOB/TEXT error when I try to index it.
I would really appreciate it if somebody could advise on how I can speed up this query.
Thanks
Thanks to Bill's advice on multi-column indexes, I have managed to make some headway. I first ran this code:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to hold the length of Ref_No, but had to deviate from the query Bill suggested because my version of MySQL is 5.5. So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the WHERE clause of my insert query to use the new length column. It was changed to:
AND t1.Ref_No_length between 5 and 10;
I then ran this query, and within 15 minutes I had 280k IDs inserted into my UniqueIDs table. I then changed my insert script to see if I could include more length values by doing the following:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to bring in the rows where the length is also equal to 13. This query took a lot longer, 2 hours 50 minutes to be precise, but the additional rows with a length of 13 gave me an extra 700k unique IDs.
I am still looking at ways to optimise the query with the IN clause, but this is already a big improvement over the original query, which kept running for 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on LENGTH(). An index only helps searches on the indexed value itself, not on functions of the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
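One way to check whether the optimizer actually picks the index is to run EXPLAIN on the filtering part of the query. This is only a sketch, assuming the Ref_No_length virtual column and its index were created as shown above.

```sql
-- Sketch: inspect the plan for the length filter on its own.
-- If the "key" column of the output names the index on Ref_No_length,
-- the optimizer matched the expression to the virtual column;
-- if "key" is NULL and "type" is ALL, it chose a table scan instead.
EXPLAIN
SELECT ID
FROM master_table
WHERE LENGTH(Ref_No) BETWEEN 5 AND 10;
```

Comparing the estimated "rows" value with the table's total row count gives a rough idea of how selective the condition is, and therefore whether the index is likely to pay off.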
"the ID and End_Date have both been indexed."
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
INDEX(Filename, End_Date, Ref_No, -- first, in any order
      ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)
I have a table with an index on the [date] column.
Sample values of the [date] column:
'2015-01-05'
'2015-01-06'
and so on.
Can I create an index on a function of the column?
When I try to create one I get an error. For example:
create index date_y on table (year(date))
If I could, we would not have to rewrite the program's queries to improve performance.
Yes and no. No, you cannot create an index on an expression in this fashion. However, if you have MySQL 5.7.8 or newer, you can create generated columns and put a secondary index on them (secondary index means that a generated column cannot be part of a primary key).
So, define your expression as a generated column and then create an index on it, provided you have MySQL 5.7.8 or newer.
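A minimal sketch of that approach follows. The table name my_table, the column name date_y and the index name idx_date_y are placeholders chosen for illustration, since the question only names the date column.

```sql
-- Sketch: expose YEAR(date) as a virtual generated column and index it.
-- Requires MySQL 5.7.8 or newer; my_table, date_y and idx_date_y are hypothetical names.
ALTER TABLE my_table
    ADD COLUMN date_y SMALLINT UNSIGNED AS (YEAR(`date`)) VIRTUAL,
    ADD INDEX idx_date_y (date_y);

-- Queries can filter on the generated column directly.
SELECT COUNT(*) FROM my_table WHERE date_y = 2015;
```

A virtual column stores no data in the row itself; only the index entries take up space.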
One more thing. Suppose I have:
an index on date in tableX
an index on id in tableX
an index on id in tableY
My query:
Select * from tableX as x left outer join tableY as y on x.id=y.id
where year(x.date)=2015 and month(x.date)=11
Should I recreate the date index as:
create index date on tableX (date,id)
or something else?
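The thread does not include a reply to this follow-up. As a hedged suggestion rather than an answer from the original posters: wrapping the column in YEAR() and MONTH() generally prevents the existing index on date from being used, whereas an equivalent range condition on the bare column does not. A sketch, assuming date is a DATE column:

```sql
-- Sketch: the same November 2015 filter written as a half-open range,
-- so that an ordinary index on tableX(date) can be used for the lookup.
SELECT *
FROM tableX AS x
LEFT OUTER JOIN tableY AS y ON x.id = y.id
WHERE x.`date` >= '2015-11-01'
  AND x.`date` <  '2015-12-01';
```

The half-open form (>= first day, < first day of the next month) also stays correct if the column is later changed to DATETIME.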
I have a table that contains the following columns, and I created indexes on them:
dt DATE and tm TIME
Now I am trying to find out whether there are any records within a certain time range:
explain SELECT count(*) from XX
where dt != CURRENT_DATE
and tm BETWEEN '14:00:00' AND '14:30:00'
EXPLAIN shows that the key is NULL here. Why is that?
If I replace the first time value with the number zero, EXPLAIN shows that the index is in use.
P.S. I just experimented some more. If I wrap the query in EXISTS, the index is used again.
Try creating a multiple-column index:
CREATE INDEX dt_tm_ix ON XX (dt, tm);
I started with this question: is my large mysql table destined for failure?
The answer that I found from that question was satisfactory. I have a table with 22 million rows that I would like to grow to about 100 million. At this time, the table minute_data structure is like this:
A problem that I am having is as follows. I need to execute this query:
select datediff(date,now()) from minute_data where symbol = "CSCO" order by date desc limit 1;
Which is very fast ( < 1 sec ) when the table contains the value "CSCO". The problem is, sometimes I will query for a symbol that is not in the table already. When I execute a query like this for, say, symbol = "ABCD":
select datediff(date,now()) from minute_data where symbol = "ABCD" order by date desc limit 1;
Then the query takes a LONG TIME... like forever ( 180 seconds ).
One way to get around this is to make sure that the table contains the symbol I am looking for before I execute the query. The fastest way I have found to do this is the following query, which I use only to check whether the table minute_data contains the symbol. Basically, I just need it to return a boolean value so I know whether the symbol is in the table:
select count(1) from minute_data where symbol = "CSCO";
This query takes over 30 seconds to return 1 value, way too long for my liking, since the query above, which actually returns a datediff calculation only takes less than 1 second.
The symbol column is part of the primary key, so I thought MySQL should be able to figure out very quickly whether a value exists there.
What am I doing wrong? Is there a fast way to do what I want to do? Should I change the structure of the data to optimize performance?
Thank You!
UPDATE
I think I found a good solution to this problem. From the answer below by LastCoder, I did the following:
1) Created a new table called minute_data_2 with the exact same definition as minute_data.
2)
ALTER TABLE minute_data_2 ADD PRIMARY KEY (symbol, date);
3)
INSERT IGNORE INTO minute_data_2 SELECT * FROM minute_data;
4)
DROP TABLE minute_data;
5) Rename minute_data_2 to minute_data
Now I am seeing blindingly fast speed: the same query that I described above as taking more than 180 seconds now completes in .001 seconds. Amazing.
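The rebuild steps above can also be written entirely in SQL. The following is a sketch under assumptions the post does not confirm: that the old table already has a primary key (with a different column order) and no AUTO_INCREMENT column, and that minute_data_old is a free table name. It swaps the tables with RENAME TABLE instead of dropping the original first, so the old data stays available until the new table has been checked.

```sql
-- Copy the column and index definitions of the existing table.
CREATE TABLE minute_data_2 LIKE minute_data;

-- Replace the copied primary key with one that leads with symbol.
-- (Assumes a primary key exists; if not, use only ADD PRIMARY KEY.)
ALTER TABLE minute_data_2
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (symbol, `date`);

-- Copy the rows; IGNORE skips rows that would duplicate (symbol, date).
INSERT IGNORE INTO minute_data_2 SELECT * FROM minute_data;

-- Swap the tables in one statement; minute_data_old is a hypothetical name.
RENAME TABLE minute_data TO minute_data_old,
             minute_data_2 TO minute_data;
```

Once the new table has been verified, the old copy can be removed with DROP TABLE minute_data_old.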
Did you try using EXISTS (...)?
select datediff(date,now()) from minute_data
where EXISTS(SELECT * FROM minute_data WHERE symbol = "CSCO")
AND symbol = "CSCO" order by date desc limit 1;
Even though symbol is part of the primary key, it seems the timestamp is part of it as well, which makes me think you are using a composite primary key ordered by timestamp first and symbol second. If that composite key is the only index you have, you may want to add a separate index on symbol.
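A sketch of both suggestions, assuming no existing index starts with symbol. The index name idx_symbol and the alias symbol_present are made up for illustration.

```sql
-- Hypothetical index name; only needed if no existing index leads with symbol.
ALTER TABLE minute_data ADD INDEX idx_symbol (symbol);

-- Returns 1 if at least one row exists for the symbol, otherwise 0.
-- Unlike COUNT(...), the server can stop at the first matching row.
SELECT EXISTS (SELECT 1 FROM minute_data WHERE symbol = 'ABCD') AS symbol_present;
```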
I think it is better to create a table named symbols and add a reference to that table in your minute_data table:
symbols:
symbol_id (INT, Primary Key, Auto Increment)
symbol_text (VARCHAR)
minute_data:
key_col (BIGINT, Primary Key, Auto Increment)
symbol_id (INT, Index)
other_field
Use InnoDB as the table type so that you can add references (foreign keys).
Try to avoid duplicate entries in your tables.
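The layout described above could be declared roughly as follows. This is a sketch: the VARCHAR length, the unique key on symbol_text (one way to avoid duplicate symbols) and all index and constraint names are assumptions, and the remaining data columns are left out.

```sql
CREATE TABLE symbols (
    symbol_id   INT NOT NULL AUTO_INCREMENT,
    symbol_text VARCHAR(16) NOT NULL,           -- length is an assumption
    PRIMARY KEY (symbol_id),
    UNIQUE KEY uq_symbol_text (symbol_text)     -- assumption: prevents duplicate symbols
) ENGINE=InnoDB;

CREATE TABLE minute_data (
    key_col   BIGINT NOT NULL AUTO_INCREMENT,
    symbol_id INT NOT NULL,
    -- other_field columns would be declared here
    PRIMARY KEY (key_col),
    KEY idx_symbol_id (symbol_id),
    CONSTRAINT fk_minute_data_symbol
        FOREIGN KEY (symbol_id) REFERENCES symbols (symbol_id)
) ENGINE=InnoDB;
```

With this layout, checking whether a symbol exists becomes a lookup in the small symbols table rather than a search through the large minute_data table.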
I have the following columns in a MySQL table named table:
ID: int(11) [this is the primary key]
Date: date
and I run the MySQL query:
SELECT * from table WHERE Date=CURDATE() and ID=1;
This takes between 0.6 and 1.2 seconds.
Is there any way to optimize this query to get results quicker?
My objective is to find out if I already have a record for today for this ID.
Add indexes on ID and Date.
See CREATE INDEX manual.
You could add LIMIT 1 at the end; since you are searching by primary key, there can be at most one result.
And if you only want to know whether the record exists, you could replace * with ID to select only the ID.
Furthermore, if you haven't already, you really need to add indexes.
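Put together, the suggestions in this answer might look like the sketch below. It keeps the question's placeholder table name; because TABLE is a reserved word in MySQL, the name has to be quoted with backticks. The index name idx_date is made up.

```sql
-- Hypothetical index on the date column (ID is already the primary key).
ALTER TABLE `table` ADD INDEX idx_date (`Date`);

-- Existence check: fetch only the ID and stop after the first match.
SELECT ID
FROM `table`
WHERE ID = 1
  AND `Date` = CURDATE()
LIMIT 1;
```

An empty result means there is no record for today for this ID.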
SET @cur_date = CURDATE();
...WHERE Date = @cur_date ...
and then create a composite index on (Date, ID). Column order matters in a composite index in general, because only a leftmost prefix of the index can be used, although for two equality conditions like these either order works.
In general, evaluating functions before running the query and storing the results in variables lets the server treat them as constant values rather than function calls, which tends to allow a faster query plan.