MySQL: Optimize left join on formatted date

I'm trying to optimize the speed of this query:
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON DATE_FORMAT(v.visit_date, '%Y-%m-%d') = t.t_date
ORDER BY t.t_date
v.visit_date is of type DATETIME and t.t_date is a string of format '%Y-%m-%d'.
Simply creating an index on v.visit_date didn't improve the speed, so I tried the solution @oysteing gave here:
How to optimize mysql group by with DATE_FORMAT
I successfully created a virtual column with this SQL:
ALTER TABLE visits ADD COLUMN datestr varchar(10) AS (DATE_FORMAT(visit_date, '%Y-%m-%d')) VIRTUAL;
However, when I try to create an index on this column with
CREATE INDEX idx_visit_date ON visits(datestr);
I get this error:
#1901 - Function or expression 'date_format()' cannot be used in the GENERATED ALWAYS AS clause of datestr
What am I doing wrong? My DB is MariaDB 10.4.8.
Best regards - Ulrich

date_format() cannot be used in persistent generated columns either. And a column used in an index cannot be just virtual; it has to be persisted.
I could not find an explicit statement in the manual, but I believe this is because the output of date_format() can depend on the locale and is therefore not strictly deterministic.
Instead of date_format() you can build the string using deterministic functions such as concat(), year(), month(), day() and lpad().
...
datestr varchar(10) AS (concat(year(visit_date),
'-',
lpad(month(visit_date), 2, '0'),
'-',
lpad(day(visit_date), 2, '0')))
...
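The concat()/lpad() expression works because every part is padded to a fixed width. As a language-neutral illustration (Python, not MySQL; date_key is a hypothetical name), the same construction looks like this and agrees with the '%Y-%m-%d' format for four-digit years:

```python
from datetime import datetime


def date_key(dt: datetime) -> str:
    """Build 'YYYY-MM-DD' the way the generated column does:
    concat(year(d), '-', lpad(month(d), 2, '0'), '-', lpad(day(d), 2, '0'))."""
    # str.rjust(2, "0") plays the role of LPAD(x, 2, '0')
    return "-".join([str(dt.year), str(dt.month).rjust(2, "0"), str(dt.day).rjust(2, "0")])


visit = datetime(2019, 3, 7, 14, 5, 9)
print(date_key(visit))                                # 2019-03-07
print(date_key(visit) == visit.strftime("%Y-%m-%d"))  # True
```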
But, as I already mentioned in a comment, you're fixing the wrong end. Dates and times should never be stored as strings. Rather, you should change temp_dates.t_date to a DATE and use date() to extract the date portion of visit_date in the generated, indexed column:
...
visit_date_date date AS (date(visit_date))
...
You might also want to index temp_dates.t_date.

Does this work for you?
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON DATE(v.visit_date) = DATE(t.t_date)
ORDER BY t.t_date
If so, there's a workable solution to your problem.
Add a DATE column generated by applying the deterministic DATE() function to your visit_date column, like this:
ALTER TABLE visits ADD COLUMN dateval DATE AS (DATE(visit_date)) VIRTUAL;
CREATE INDEX idx_visit_date on visits(dateval);
Then create a virtual column in the other table (the one with the nicely formatted dates jammed into a VARCHAR() column):
ALTER TABLE temp_dates ADD COLUMN dateval DATE AS (DATE(t_date)) VIRTUAL;
CREATE INDEX idx_temp_dates_date on temp_dates (dateval);
This works because DATE() is deterministic, unlike DATE_FORMAT().
Your query then becomes:
SELECT t.t_date td, v.visit_date vd
FROM temp_dates t
LEFT JOIN visits v ON v.dateval = t.dateval
ORDER BY t.t_date
This solution gives you indexes on (virtual) DATE columns. That's nice because index matching on such columns is efficient.
But your best solution is to change the data type of temp_dates.t_date from VARCHAR() to DATE.
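To make the semantics of the rewritten join concrete, here is a small Python sketch (not MySQL) of what LEFT JOIN ... ON v.dateval = t.dateval ORDER BY t.t_date returns. The dictionary keyed by date stands in for idx_visit_date; left_join_on_date and the sample data are hypothetical:

```python
from collections import defaultdict
from datetime import date, datetime


def left_join_on_date(t_dates, visit_dates):
    """LEFT JOIN temp_dates ('YYYY-MM-DD' strings) to visits (datetimes) on the date part."""
    # Group visits by DATE(visit_date); this lookup table plays the role of the index
    by_day = defaultdict(list)
    for v in visit_dates:
        by_day[v.date()].append(v)

    rows = []
    for t in sorted(t_dates):              # ORDER BY t.t_date
        d = date.fromisoformat(t)          # DATE(t_date)
        matches = by_day.get(d, [])
        if matches:
            # visits are sorted only to make the output deterministic
            rows.extend((t, v) for v in sorted(matches))
        else:
            rows.append((t, None))         # unmatched dates are kept with NULL (None)
    return rows


visits = [datetime(2019, 11, 1, 17, 0), datetime(2019, 11, 1, 9, 30)]
for row in left_join_on_date(["2019-11-02", "2019-11-01"], visits):
    print(row)
```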

DATE_FORMAT(expr, format) cannot be used in virtual columns as it depends on the locale of the connection (MariaDB issue MDEV-11553).
A three-argument form of DATE_FORMAT() was added that takes the locale explicitly.
DATE_FORMAT(visit_date, '%Y-%m-%d', 'en_US') is possible to use in virtual column expressions in MariaDB-10.3+ stable versions.
Even so, using DATE(), or rewriting your query so that it doesn't wrap column expressions in functions, is definitely recommended.

Wrapping a column in a function makes the condition non-"sargable": the optimizer cannot use an index on that column.
Consider:
ON v.visit_date >= t.t_date
AND v.visit_date < t.t_date + INTERVAL 1 DAY
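Why the range form helps: an index keeps visit_date values in sorted order, so visit_date >= d AND visit_date < d + INTERVAL 1 DAY can be answered by seeking to d and reading one contiguous run, whereas a function applied to visit_date has to be evaluated on every row. A Python analogy, with bisect on a sorted list standing in for the B-tree index (visits_on_day is a hypothetical name):

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def visits_on_day(sorted_visits, day_start):
    """Return v with day_start <= v < day_start + 1 day using two binary searches.

    sorted_visits must be sorted ascending, like the entries of an index.
    """
    lo = bisect_left(sorted_visits, day_start)                      # first v >= day_start
    hi = bisect_left(sorted_visits, day_start + timedelta(days=1))  # first v >= next midnight
    return sorted_visits[lo:hi]


visits = sorted([
    datetime(2019, 10, 31, 23, 0),
    datetime(2019, 11, 1, 0, 0),
    datetime(2019, 11, 1, 23, 59, 59),
    datetime(2019, 11, 2, 0, 0),   # excluded: the upper bound is exclusive
])
print(visits_on_day(visits, datetime(2019, 11, 1)))
```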

Related

ORDER BY STR_TO_DATE() not working in phpmyadmin sql

I have dates in varchar type like:
201601
201602
201603
201701
201702 and so on
I am trying to view all my records with the dates in ascending order, so I am using this query:
SELECT * FROM emp_pp GROUP BY YEARMM ORDER BY STR_TO_DATE(YEARMM,'%Y%m')
Here YEARMM is my column name. The query isn't working properly when I run it. Instead I keep getting all these notices:
Incorrect datetime value: '201601' for function str_to_date
Incorrect datetime value: '201602' for function str_to_date...
Why is that? Please help me
As Akina mentioned, the STR_TO_DATE function needs enough data to build at least a full date value to work correctly. You do not have that (you cannot have the date 2016-02-00, for example).
What you do have is a year-and-month value (201601, 201602, 201603, 201701 etc.). Because every value has the same fixed YYYYMM width, it sorts in exactly the order you want, whether it is compared as a number or as a string, so all you need to do is remove the STR_TO_DATE part entirely:
SELECT * FROM emp_pp GROUP BY YEARMM ORDER BY YEARMM ASC /* Oldest date first */
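A quick Python illustration of why plain ordering is enough here (chronological is a hypothetical helper): for fixed-width, six-digit YYYYMM strings, string order, numeric order and chronological order all coincide. The check guards the assumption, since an unpadded value would break it:

```python
def chronological(values):
    """Sort 'YYYYMM' strings; valid only when every value is exactly six digits."""
    if not all(len(v) == 6 and v.isdigit() for v in values):
        raise ValueError("expected zero-padded YYYYMM strings")
    return sorted(values)  # plain string order == chronological order here


yearmm = ["201702", "201601", "201603", "201701", "201602"]
print(chronological(yearmm))                             # ['201601', '201602', '201603', '201701', '201702']
print(chronological(yearmm) == sorted(yearmm, key=int))  # True: same as numeric order
```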
Of note:
SELECT * can be inefficient; you should name each column you want to retrieve.
Upper-case column names are hard to tell apart from upper-case SQL keywords; a lower-case name such as yearmm would be more readable in your SQL code.

Optimize query with large number of data on mysql

I have more than 10 million rows in my table and need to pull them to display in a report. The data was extracted from CSV files, so all of it is stored as text.
If I query with LIMIT 1000 only, it displays quickly. However, if I add a date filter, e.g. to get one day of data, it takes around 25-30 seconds:
SELECT STR_TO_DATE(SUBSTRING_INDEX(time, '_', 1), '%m/%d/%Y') FROM myTable
WHERE STR_TO_DATE(SUBSTRING_INDEX(time, '_', 1), '%m/%d/%Y') BETWEEN DATE('2019-9-3') AND DATE('2019-9-3');
I already tried creating an index on the time column that I filter on, but I still got the same result.
Are there any suggestions on how I can improve the speed of pulling the data? TIA
When you apply functions to a column as part of your search, MySQL can't use an index on that column, even if you have defined one.
You should also use a proper DATE or DATETIME data type for the column, which will require dates be stored in YYYY-MM-DD format, not a string column in MM/DD/YYYY format.
If you store the dates properly, you can do this:
SELECT DATE(time) FROM myTable
WHERE time >= '2019-09-03' AND time < '2019-09-04';
That will make use of the index.
You are storing your dates/timestamps as text, which is going to force you into doing suboptimal things like calling STR_TO_DATE all over the place. I suggest adding a new bona fide DATETIME column and then indexing that column:
ALTER TABLE myTable ADD COLUMN time_dt DATETIME;
Then, populate it using STR_TO_DATE:
UPDATE myTable
SET time_dt = STR_TO_DATE(time, '%m/%d/%Y_%H:%i:%s.%f');
Then, add an index on time_dt:
CREATE INDEX idx ON myTable (time_dt);
And finally, rewrite your query so that the WHERE clause is sargable (i.e. so that it may use the above index):
SELECT DATE(time_dt)
FROM myTable
WHERE time_dt >= '2019-09-03' AND time_dt < '2019-09-04';
Side note: You need to use %H in the format mask with STR_TO_DATE, because your hours are in 24-hour clock mode.
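The same conversion can be sanity-checked outside the database before running the UPDATE on 10 million rows. This Python sketch assumes a sample value shaped like the format mask above (the original table contents are not shown, so the value is hypothetical). Note that Python spells minutes %M where MySQL uses %i, and that %H is the 24-hour clock in both:

```python
from datetime import datetime

# MySQL '%m/%d/%Y_%H:%i:%s.%f' corresponds to Python '%m/%d/%Y_%H:%M:%S.%f'
# (Python uses %M for minutes and %S for seconds; %H is the 24-hour clock in both)
PY_FORMAT = "%m/%d/%Y_%H:%M:%S.%f"


def parse_time(text):
    """Parse one hypothetical 'time' value; raises ValueError if it doesn't match."""
    return datetime.strptime(text, PY_FORMAT)


print(parse_time("09/03/2019_14:07:55.123000"))  # 2019-09-03 14:07:55.123000
```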

sql error using between operator with date between two dates

I have an events table with a start date and an end date. I am trying to retrieve all the records for which a given date lies between the start and end dates.
For example:
SELECT *
FROM `events`
WHERE '2017-01-29' BETWEEN start_date='2017-01-28'
AND end_date='2017-01-31'
but the response is a syntax error. Can anyone help me finish the query?
Just list the columns.
WHERE '2017-01-29' BETWEEN start_date AND end_date
The values come from the table, you don't put them into the query.
According to the MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/comparison-operators.html#operator_between), the syntax for BETWEEN is
expr BETWEEN min AND max
it is not
expr BETWEEN blabla=min AND stuff=max
Also, it is rather pointless to be using constants in all three expressions, because in this case the result will be known in advance (either always TRUE or always FALSE) without having to consult the values in your table.
It is kind of hard to give you an example without knowing the structure of your table, but what you probably want is something like
WHERE '2017-01-29' BETWEEN start_date
AND end_date
(assuming start_date and end_date are columns in your table)
or something like
WHERE some_column BETWEEN '2017-01-28'
AND '2017-01-31'
(assuming some_column is a column in your table.)
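The first form, a constant tested against two columns, is what the question asks for. Since expr BETWEEN min AND max means min <= expr AND expr <= max (both bounds inclusive), a Python sketch over hypothetical event rows shows which ones match:

```python
from datetime import date


def between(expr, lo, hi):
    """SQL's `expr BETWEEN lo AND hi`: lo <= expr <= hi, both bounds inclusive."""
    return lo <= expr <= hi


# Hypothetical rows of the events table: (id, start_date, end_date)
events = [
    (1, date(2017, 1, 28), date(2017, 1, 31)),
    (2, date(2017, 1, 30), date(2017, 2, 2)),
    (3, date(2017, 1, 20), date(2017, 1, 29)),  # ends exactly on the target: still included
]
target = date(2017, 1, 29)

# WHERE '2017-01-29' BETWEEN start_date AND end_date
matching = [e[0] for e in events if between(target, e[1], e[2])]
print(matching)  # [1, 3]
```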
I believe you're trying to find all the rows where a date is 2017-01-29, and so, your query could be:
SELECT *
FROM `events`
WHERE
date = '2017-01-29';
If, however, you want all rows with date between 2017-01-28 and 2017-01-31, then you could do:
SELECT *
FROM `events`
WHERE
date BETWEEN '2017-01-28' AND '2017-01-31';
Instead of putting '2017-01-29' before BETWEEN, put the name of the field you want to filter by date, such as EventDate (or whatever your field is named).

MySQL query to find most sold products within a time period

I have two tables. Here are the fields that will be used in the query:
tgc_sales (code,sale_date_time, order_status)
tgc_sales_items (sales_code, quantity, product_name, product_code)
What I want is to get the most sold products within a specific period of time (say a week, a month etc).
What I tried so far:
SELECT SUM(tsi.quantity) AS quantity, tsi.product_name AS product_name
FROM tgc_sales_items tsi
JOIN tgc_sales ts ON ts.code = tsi.sales_code
WHERE DATE(ts.sale_date_time)>='2014-05-01'
AND ts.order_status='Completed'
GROUP BY tsi.product_code
ORDER BY quantity DESC LIMIT 10
Obviously, the query is wrong and it is giving me unexpected results. When I leave out the JOIN and WHERE clause it shows me the most sold products, but I need them for sales made in a specific period and I can't figure out how to do it.
The query looks right, assuming that code is a unique (or primary) key on the tgc_sales table, and sales_code is a foreign key reference to that column.
The use of the DATE() function seems a bit odd.
If the sale_date_time column has a datatype of DATE, DATETIME, or TIMESTAMP, then the DATE() function isn't needed, and it's not desirable because it disables MySQL's ability to use an index range scan to satisfy the predicate.
If sale_date_time is a character column and your intent is to convert the characters into a DATE, you'd use the STR_TO_DATE() function. But you don't really want to store sale_date_time as a character string.
If that's a DATETIME column, you'd do something like this:
WHERE ts.sale_date_time >= '2014-05-01'
AND ts.sale_date_time < '2014-06-01'
If it's a character column in a non-canonical format (e.g. 'mm/dd/yyyy hh:mi:ss'), then you could do something like:
WHERE STR_TO_DATE(ts.sale_date_time,'%m/%d/%Y %h:%i:%s') >= '2014-05-01'
AND STR_TO_DATE(ts.sale_date_time,'%m/%d/%Y %h:%i:%s') < '2014-06-01'
(But you don't really want to store a date time value in a character string; you want to use one of the MySQL datatypes like DATETIME.)
If you're storing a Unix-style timestamp ("seconds since the epoch"), then you'd want to do the comparison against the native column values. You could do something like this:
WHERE ts.sale_date_time >= UNIX_TIMESTAMP('2014-05-01')
AND ts.sale_date_time < UNIX_TIMESTAMP('2014-06-01')
...though you'd really prefer to use the same library used to do the conversion when the values were stored, and do the query more like this:
WHERE ts.sale_date_time >= 1398902400
AND ts.sale_date_time < 1401580800
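The two literals above are the epoch values of midnight UTC on 2014-05-01 and 2014-06-01. A quick Python check, assuming the boundaries are meant in UTC (MySQL's UNIX_TIMESTAMP() interprets its argument in the session time zone, so the values only match the UNIX_TIMESTAMP() form when that zone is UTC):

```python
import calendar
from datetime import datetime


def utc_epoch(year, month, day):
    """Seconds since 1970-01-01 00:00:00 UTC for midnight UTC of the given date."""
    return calendar.timegm(datetime(year, month, day).timetuple())


print(utc_epoch(2014, 5, 1))  # 1398902400
print(utc_epoch(2014, 6, 1))  # 1401580800
```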
Your GROUP BY seems suspicious: it groups by product_code, which is not in the SELECT list, while the SELECT list includes product_name, which is not in the GROUP BY.
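For reference, this Python sketch computes what the corrected query is meant to return: completed sales in the half-open period [start, end), joined to their items and summed per product_code. The dictionaries and the top_products helper are hypothetical stand-ins for the two tables:

```python
from collections import Counter
from datetime import datetime


def top_products(sales, items, start, end, limit=10):
    """Most sold product codes among completed sales with start <= sale_date_time < end."""
    # WHERE order_status = 'Completed' AND sale_date_time >= start AND sale_date_time < end
    codes = {
        s["code"]
        for s in sales
        if s["order_status"] == "Completed" and start <= s["sale_date_time"] < end
    }
    totals = Counter()
    for it in items:
        if it["sales_code"] in codes:                     # JOIN ... ON ts.code = tsi.sales_code
            totals[it["product_code"]] += it["quantity"]  # SUM(quantity) ... GROUP BY product_code
    return totals.most_common(limit)                      # ORDER BY quantity DESC LIMIT limit


sales = [
    {"code": "S1", "sale_date_time": datetime(2014, 5, 3, 10, 0), "order_status": "Completed"},
    {"code": "S2", "sale_date_time": datetime(2014, 5, 31, 23, 59), "order_status": "Completed"},
    {"code": "S3", "sale_date_time": datetime(2014, 6, 1, 0, 0), "order_status": "Completed"},
    {"code": "S4", "sale_date_time": datetime(2014, 5, 10), "order_status": "Cancelled"},
]
items = [
    {"sales_code": "S1", "product_code": "P1", "quantity": 2},
    {"sales_code": "S2", "product_code": "P1", "quantity": 3},
    {"sales_code": "S2", "product_code": "P2", "quantity": 4},
    {"sales_code": "S3", "product_code": "P2", "quantity": 10},
    {"sales_code": "S4", "product_code": "P3", "quantity": 99},
]
print(top_products(sales, items, datetime(2014, 5, 1), datetime(2014, 6, 1)))
# [('P1', 5), ('P2', 4)]
```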

SQL Server: Want to use between clause with dates, but dates in string form (YYYY.MM.DD)

Help! One column in my database is for dates. Unfortunately, all of my dates are stored as strings (YYYY.MM.DD). I have a MASSIVE database (300+ GB), so I would ideally like to avoid transformations.
Is there a way I can select rows for dates in between YYYY.MM.DD and YYYY.MM.DD? What would the script look like?
Thank you!
If the months and days are stored with leading zeroes, the BETWEEN operator will work as expected. So will ORDER BY.
create table your_table (
date_value varchar(10) not null
);
insert into your_table values
('2013.01.01'), ('2013.01.13'), ('2013.01.30'), ('2013.01.31'),
('2013.02.01'), ('2013.02.13'), ('2013.02.28'), ('2013.02.31'),
('2013.03.01'), ('2013.03.15'), ('2013.03.30'), ('2013.03.31');
select date_value
from your_table
where date_value between '2013.01.01' and '2013.01.31'
order by date_value;
2013.01.01
2013.01.13
2013.01.30
2013.01.31
One of the main problems with your structure is that you lose type safety. Look at this query.
select date_value
from your_table
where date_value between '2013.02.01' and '2013.02.31'
order by date_value;
2013.02.01
2013.02.13
2013.02.28
2013.02.31
If you'd used a column of type date or datetime or timestamp, the dbms would not have allowed inserting the values '2013.02.31', because that's not a value in the domain of date. It is a value in the domain of varchar. (And so is "Arrrrgh!", unless you've got a CHECK constraint on that column that severely restricts the acceptable values.)
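The type-safety point can be reproduced in a few lines of Python (an illustration, not the DBMS; is_real_date is a hypothetical helper): a string range happily returns the impossible date, while parsing it as a real date rejects it, which is roughly what a date column would have done at insert time:

```python
from datetime import datetime

rows = ["2013.02.01", "2013.02.13", "2013.02.28", "2013.02.31"]

# String comparison accepts 2013.02.31 because it is just text
string_hits = [r for r in rows if "2013.02.01" <= r <= "2013.02.31"]


def is_real_date(text):
    """True if text is a valid calendar date in YYYY.MM.DD form."""
    try:
        datetime.strptime(text, "%Y.%m.%d")
        return True
    except ValueError:
        return False


print(string_hits)                               # all four rows, including 2013.02.31
print([r for r in rows if not is_real_date(r)])  # ['2013.02.31']
```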
Not a good solution, but it works (at a considerable cost in performance).
Your dates are formatted in year, month, day order (a good order for comparing strings without converting them to datetime), so you can try:
SELECT * FROM Table WHERE StringDate > '2013.07.10' AND StringDate < '2013.07.14'
It returns wrong results if there are dates before the year 1000 stored without a leading zero ('999.07.14').
But I don't know how it performs on a big database.
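The year-1000 caveat follows from character-by-character comparison: '999.07.14' starts with '9', so it sorts after every four-digit year beginning with '1' or '2'. A Python sketch of the exclusive range from the query above (in_range is a hypothetical helper), with and without a leading zero:

```python
def in_range(values, lo, hi):
    """Exclusive string range, like StringDate > lo AND StringDate < hi."""
    return [v for v in values if lo < v < hi]


unpadded = ["999.07.14", "1200.01.01"]
padded = ["0999.07.14", "1200.01.01"]
print(in_range(unpadded, "0500.01.01", "1500.01.01"))  # ['1200.01.01']; 999.07.14 is missed
print(in_range(padded, "0500.01.01", "1500.01.01"))    # ['0999.07.14', '1200.01.01']
```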
Between in SQL is inclusive of both bounds. If that is what you want, you can just use between:
where col between 'YYYY.MM.DD' and 'YYYY.MM.DD'
Where the two constants are whatever values you are looking for.
If you have an index on the column, then between (as well as >, >=, and so on) will use the index. You do not need to transform the values. If your constants are dates of one form or another, then you can use date_format() to create a string in the right format. For instance, to get dates within the past week:
where col >= date_format(adddate(now(), -7), '%Y.%m.%d')
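The cutoff string only works as a lower bound because it uses the same zero-padded YYYY.MM.DD layout as the column. A Python sketch of the value that date_format(adddate(now(), -7), '%Y.%m.%d') should produce (week_ago_cutoff is a hypothetical name; a fixed "now" is passed in so the result is reproducible):

```python
from datetime import datetime, timedelta


def week_ago_cutoff(now=None):
    """'YYYY.MM.DD' string for 7 days before now, mirroring
    date_format(adddate(now(), -7), '%Y.%m.%d')."""
    now = now or datetime.now()
    return (now - timedelta(days=7)).strftime("%Y.%m.%d")


print(week_ago_cutoff(datetime(2013, 7, 14, 9, 30)))  # 2013.07.07
```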