Speed up join on string field - MySQL

I have a table with data like the tableA example below. The date column is stored as a string, the close column is an integer, and ticker is a string.
I'm trying to run the query below on a MySQL database, and it is taking a very long time.
Is there anything I can do to speed it up, such as changing the type of the date column, or adding
indexes or a primary key? The combination of ticker and date should be unique, and the date field is really a timestamp; it's just currently stored as a string.
Query:
select avg((a.close-b.close)/b.close) as avg_annual_returns,
a.ticker
from tableA a
join tableA b
on cast(a.date as date)=date_add(cast(b.date as date),interval 365 DAY)
and a.ticker=b.ticker
where b.close is not null
group by a.ticker
tableA
+----------+-------+--------+
| date     | close | ticker |
+----------+-------+--------+
| 2/1/2019 | 5     | abc    |
| 2/3/2019 | 7     | efd    |
| 2/4/2019 | 3     | hij    |
+----------+-------+--------+
Update (the query I ended up with, based on the answer below):
select ticker, avg(annual_returns) as avg_annual_returns
from (select ticker, date,
             ( -1 +
               a.close / max(a.close) over (partition by ticker
                                            order by date
                                            range between interval 365 day preceding and interval 365 day preceding
                                           )
             ) as annual_returns
      from tableA a
     ) b
where annual_returns is not null
group by ticker

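If you want the difference from a year ago, use window functions. Before that, though, fix the data model! Do not store dates as strings. Assuming the strings are month/day/year (as in the sample), convert them first and then change the column type:
update tableA set date = str_to_date(date, '%m/%d/%Y');
alter table tableA modify column date date;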
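Why the conversion matters can be seen outside MySQL. This Python sketch assumes the stored text is month/day/year, as the sample rows suggest; parse_mdy is a hypothetical helper (the MySQL counterpart is STR_TO_DATE()). Sorted as text, the values compare character by character, so ORDER BY or a RANGE window over the raw strings would not follow the calendar.

```python
from datetime import date, datetime

# Sample values in the m/d/yyyy text form used by tableA (assumed month-first).
raw = ["2/1/2019", "10/1/2019", "2/3/2019", "1/15/2020"]

def parse_mdy(s: str) -> date:
    """Parse 'm/d/yyyy' text into a real date (hypothetical helper)."""
    return datetime.strptime(s, "%m/%d/%Y").date()

# Sorting the strings compares characters, not calendar positions.
as_text = sorted(raw)
# Sorting the parsed dates gives true chronological order.
as_dates = sorted(parse_mdy(s) for s in raw)

print(as_text)
print([d.isoformat() for d in as_dates])
```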
Then, to compare each close with the close from a year ago:
select ( -1 +
         a.close / max(a.close) over (partition by ticker
                                      order by date
                                      range between interval 365 day preceding and interval 365 day preceding
                                     )
       ) as annual_return
from tableA a;
You don't have to worry about NULL values because AVG() ignores them.
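The window frame above picks out, for each row, the single row of the same ticker dated exactly 365 days earlier. Here is a plain Python sketch of the same calculation with hypothetical rows; a row with no partner contributes nothing, which is the role NULL plays for AVG(). avg_annual_returns is an illustrative name, not part of the original answer.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Hypothetical rows: (ticker, date, close). Real data would come from tableA.
rows = [
    ("abc", date(2019, 2, 1), 5),
    ("abc", date(2020, 2, 1), 6),  # exactly 365 days after 2019-02-01
    ("abc", date(2020, 2, 3), 9),  # no abc row 365 days earlier -> no return
    ("efd", date(2019, 2, 4), 4),
    ("efd", date(2020, 2, 4), 2),
]

def avg_annual_returns(rows, lag_days=365):
    """Average of close / close_lag_days_earlier - 1 per ticker.

    Like RANGE BETWEEN INTERVAL 365 DAY PRECEDING AND INTERVAL 365 DAY PRECEDING,
    only a row exactly lag_days earlier is used.
    """
    close_by_key = {(t, d): c for t, d, c in rows}
    returns = defaultdict(list)
    for ticker, day, close in rows:
        prev = close_by_key.get((ticker, day - timedelta(days=lag_days)))
        if prev:  # skip missing or zero previous closes (NULL in SQL)
            returns[ticker].append(close / prev - 1)
    return {t: mean(r) for t, r in returns.items()}

print(avg_annual_returns(rows))
```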

The problem is here:
on cast(a.date as date)=date_add(cast(b.date as date),interval 365 DAY)
Neither side of that is "sargable" (each wraps a column in a function), so MySQL can't use any index for the join.
Assuming date is datatype DATE, this works (b is the row one year earlier, as in the original query):
ON b.date = a.date - INTERVAL 1 YEAR
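To make "sargable" concrete, here is a Python analogy (not how InnoDB is actually implemented): treat INDEX(ticker, date) as a sorted list of keys. When the lookup value is computed first from the outer row, it can be found by binary search; when the stored date is wrapped in a function, every candidate entry has to be evaluated. index, seek and scan are hypothetical names, and the lag is 365 days only to keep the sketch short.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Hypothetical entries of INDEX(ticker, date): (ticker, date, row_id), kept sorted.
index = sorted([
    ("abc", date(2019, 2, 1), 1),
    ("abc", date(2020, 2, 1), 2),
    ("efd", date(2019, 2, 3), 3),
    ("efd", date(2020, 2, 3), 4),
])

def seek(ticker: str, day: date):
    """Sargable: the key is computed first, then found by binary search."""
    i = bisect_left(index, (ticker, day))
    if i < len(index) and index[i][:2] == (ticker, day):
        return index[i][2]
    return None

def scan(ticker: str, matches):
    """Non-sargable: a function of the stored date is evaluated entry by entry."""
    for t, d, row_id in index:
        if t == ticker and matches(d):
            return row_id
    return None

a_ticker, a_day = "abc", date(2020, 2, 1)
print(seek(a_ticker, a_day - timedelta(days=365)))                 # like b.date = a.date - INTERVAL ...
print(scan(a_ticker, lambda d: d + timedelta(days=365) == a_day))  # like DATE_ADD(b.date, ...) = a.date
```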
Also, have
INDEX(ticker, date) -- in this order.
Note: 1 YEAR will hiccup around Feb 28/29 in leap years; 365 DAY will be off by one day whenever the span includes Feb 29, since that year is 366 days long (every 4 years).
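Both hiccups can be reproduced in Python. minus_years is a hypothetical helper that imitates MySQL's documented behaviour of clamping the day to the end of the target month, so '2020-02-29' - INTERVAL 1 YEAR gives '2019-02-28'.

```python
import calendar
from datetime import date, timedelta

def minus_years(d: date, years: int) -> date:
    """Subtract whole years, clamping Feb 29 to Feb 28 like INTERVAL n YEAR."""
    y = d.year - years
    last = calendar.monthrange(y, d.month)[1]
    return d.replace(year=y, day=min(d.day, last))

# 1 YEAR: two different dates look for the same date a year earlier ...
print(minus_years(date(2020, 2, 29), 1), minus_years(date(2020, 2, 28), 1))
# ... and 2020-02-29 is never the target of any date in 2021.
targets_2021 = {minus_years(date(2021, 1, 1) + timedelta(days=i), 1) for i in range(365)}
print(date(2020, 2, 29) in targets_2021)

# 365 DAY: when the span contains Feb 29, the result drifts by one calendar day.
print(date(2020, 3, 1) - timedelta(days=365))
```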
Also, change
where b.close is not null
to
WHERE b.ticker IS NOT NULL
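(For the averages they are essentially the same: with an inner join b.ticker is never NULL, and AVG() already skips the NULL that a NULL b.close produces. I picked a column in the index, just in case it matters.) Well, OK, are there any rows where close is NULL?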
Oh, another issue. Because of weekends, only 3 or 4 days of each week can find a matching day in the previous year.

Related

MySQL searching timestamp columns by date only

I am building a query to search a table by a timestamp column value. An example of the format I am passing to the API is 2018-10-10. The user can select a date range; often the start date is 2018-10-10 and the end date is the same day, 2018-10-10. The query below doesn't seem to do the trick. What is the simplest way to accomplish this without having to specify the time? I'd like to query for the entire day of 2018-10-10, from start to end of day.
SELECT
count(*)
FROM
contact
WHERE
created_at >= '2018-10-10'
AND created_at <= '2018-10-10';
The problem here is that a TIMESTAMP value also has a time part (HH:MM:SS). When comparing a datetime with a date, MySQL assumes 00:00:00 as the HH:MM:SS for the date value.
So 2018-10-10 12:23:22 will not match the condition created_at <= '2018-10-10', since it is treated as 2018-10-10 12:23:22 <= '2018-10-10 00:00:00', which is false.
To handle this, add one day to the end date (date_to in the filter) and use the < operator for the range check.
SELECT
count(*)
FROM
contact
WHERE
created_at >= '2018-10-10'
AND created_at < ('2018-10-10' + INTERVAL 1 DAY);
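The half-open pattern is easy to check outside the database. In this Python sketch the created_at values are hypothetical and count_days is an illustrative name; the closed comparison against midnight only matches a row stamped exactly 00:00:00, while the half-open range catches the whole day.

```python
from datetime import date, datetime, timedelta

# Hypothetical created_at values from the contact table.
created_at = [
    datetime(2018, 10, 9, 23, 59, 59),
    datetime(2018, 10, 10, 0, 0, 0),
    datetime(2018, 10, 10, 12, 23, 22),
    datetime(2018, 10, 11, 0, 0, 0),
]

def count_days(values, start: date, end: date) -> int:
    """Count values on any day from start to end inclusive, using
    created_at >= start AND created_at < end + 1 day."""
    lo = datetime.combine(start, datetime.min.time())                    # start at 00:00:00
    hi = datetime.combine(end + timedelta(days=1), datetime.min.time())  # next day at 00:00:00
    return sum(1 for v in values if lo <= v < hi)

day = date(2018, 10, 10)
midnight = datetime.combine(day, datetime.min.time())
print(sum(1 for v in created_at if midnight <= v <= midnight))  # the original >= ... <= query
print(count_days(created_at, day, day))                         # the corrected query
```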

Dynamic due date finder in a single query

id  start_date  interval  period
1   2018-01-22  2         month
2   2018-02-25  3         week
3   2017-11-24  3         day
4   2017-07-22  1         year
5   2018-02-25  2         week
The above is a sample of my table data. Each row falls due repeatedly based on interval and period (i.e. id 1 is due every 2 months from start_date, id 2 every 3 weeks, and so on). period is an ENUM of (day, week, month, year). The requirement is that the client can give any range of dates, say 25-06-2026 to 13-07-2026, and I have to return the ids whose due dates fall within that range. I hope I made my question clear.
I am using MySQL 5.7. I found a way to achieve this with recursive CTEs, but those are not available in MySQL 5.7. There is also a way to do it by generating virtual records with inline subqueries and UNIONs (as in the linked question Generating a series of dates), but that is a performance killer, and we can't generate virtual records every time a client request comes in. Getting results for a single date is easy, and I have done that. Below is my query.
SELECT b.*
FROM (SELECT a.*,
CASE
WHEN period = 'week' THEN MOD(Datediff('2018-07-22', start_date), 7 * intervals)
WHEN period = 'month'
AND Day('2018-07-22') = Day(start_date)
AND MOD(Period_diff(201807, Extract(YEAR_MONTH FROM start_date)), intervals) = 0 THEN 0
WHEN period = 'year'
AND Day('2018-07-22') = Day(start_date)
AND MOD(Period_diff(201807, Extract(
YEAR_MONTH FROM start_date)) / 12,
intervals) = 0 THEN 0
WHEN period = 'day' THEN MOD(Datediff('2018-07-22', start_date) , intervals)
end filters
FROM kml_subs a)b
WHERE b.filters = 0;
But I need to do this for a range of dates, not a single date. Any suggestions or solutions would be much appreciated.
My desired result should look like this:
if I give two dates, say 2030-05-21 and 2030-05-27, the ids with a due date between 2030-05-21 and 2030-05-27 should be shown in the result.
id
1
4
My question is different from "Using DATE_ADD with a Column Name as the Interval Value": I am looking for a dynamic way to check due dates based on start_date.
Thanks, Kannan
In MySQL, it would seem that a query along these lines would suffice. (Almost) everything else could and should be handled in application level code...
SELECT *
, CASE my_period WHEN 'day' THEN start_date + INTERVAL my_interval DAY
WHEN 'week' THEN start_date + INTERVAL my_interval WEEK
WHEN 'month' THEN start_date + INTERVAL my_interval MONTH
WHEN 'year' THEN start_date + INTERVAL my_interval YEAR
END due_date
FROM my_table;
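The query above gives the first due date only; the question also needs the later recurrences. Here is a Python sketch of that application-level step, assuming due dates are start_date + k × interval periods for k ≥ 1 and a positive interval. add_months imitates MySQL's month clamping (Jan 31 + 1 month gives Feb 28/29), and both function names are hypothetical. Instead of generating a series of dates, it jumps straight to the first candidate k for the requested range.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """d + INTERVAL months MONTH, clamping the day like MySQL does."""
    idx = d.year * 12 + (d.month - 1) + months
    y, m0 = divmod(idx, 12)
    last = calendar.monthrange(y, m0 + 1)[1]
    return date(y, m0 + 1, min(d.day, last))

def due_in_range(start: date, interval: int, period: str, lo: date, hi: date) -> bool:
    """True if start + k * interval periods falls within [lo, hi] for some k >= 1."""
    if period in ("day", "week"):
        step = interval * (7 if period == "week" else 1)
        # Smallest k >= 1 whose due date is on or after lo (ceiling division).
        k = max(1, -(-(lo - start).days // step))
        due = start + timedelta(days=k * step)
        return lo <= due <= hi
    if period not in ("month", "year"):
        raise ValueError(f"unknown period: {period}")
    months = interval * (12 if period == "year" else 1)
    gap = (lo.year * 12 + lo.month) - (start.year * 12 + start.month)
    k = max(1, gap // months)  # lower bound; the first candidates may still be before lo
    while True:
        due = add_months(start, k * months)  # offset from start each time, so clamping never accumulates
        if due > hi:
            return False
        if due >= lo:
            return True
        k += 1

# Sample rows from the question: (id, start_date, interval, period).
subs = [
    (1, date(2018, 1, 22), 2, "month"),
    (2, date(2018, 2, 25), 3, "week"),
    (3, date(2017, 11, 24), 3, "day"),
    (4, date(2017, 7, 22), 1, "year"),
    (5, date(2018, 2, 25), 2, "week"),
]
lo, hi = date(2030, 5, 21), date(2030, 5, 27)
print([i for i, start, n, p in subs if due_in_range(start, n, p, lo, hi)])
```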

Need to find duplicates with multiple criteria and then comparing dates between duplicates

I am writing annual membership registrations to a single db table. I need to keep track of when renewals have occurred in less than 11 months from their last renewal.
I look for the duplicate rows based on multiple criteria. I currently have this working without the 11-month criterion, although it's slow. Here's what I currently use.
SELECT y_reg.* FROM y_reg WHERE (((y_reg.season) In (SELECT season FROM y_reg As Tmp
GROUP BY season, Father_Last_Name, Father_First_Name
HAVING Count(*)>1
AND Father_Last_Name = y_reg.Father_Last_Name
AND Father_First_Name = y_reg.Father_First_Name)))
ORDER BY y_reg.season, y_reg.Father_Last_Name, y_reg.Father_First_Name
I have a field Date, which is the date of the renewal that I need to evaluate. I'd like to add something like "AND Date - Date < 335".
335 is a number of days, about one month short of a year. But I keep getting a syntax error because I clearly don't know what I'm doing.
Date arithmetic works quite well in MySQL; you just need the knack.
You can say things like
AND later.Date >= earlier.Date
AND later.Date < earlier.Date + INTERVAL 11 MONTH
That particular pair of comparisons comes up true if the later date occurs in the time range between the earlier date and 11 months later.
In general you can say stuff like this to do date arithmetic.
datestamp + INTERVAL 1 HOUR
datestamp - INTERVAL 5 MINUTE
datestamp + INTERVAL 1 MONTH - INTERVAL 3 WEEK
datestamp - INTERVAL 3 QUARTER (3 quarters, i.e. 9 months)
LAST_DAY(datestamp) + INTERVAL 1 DAY - INTERVAL 1 MONTH
The last item is the first day of the month containing the datestamp. This whole date thing works quite well.
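The first-day-of-month identity can be checked step by step in Python. last_day and minus_one_month are hypothetical stand-ins for MySQL's LAST_DAY() and - INTERVAL 1 MONTH (with the same day clamping); this is a sketch for checking the arithmetic, not MySQL itself.

```python
import calendar
from datetime import date, timedelta

def last_day(d: date) -> date:
    """Stand-in for LAST_DAY(): the last day of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

def minus_one_month(d: date) -> date:
    """d - INTERVAL 1 MONTH, clamping the day to the target month's length."""
    y, m = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    return date(y, m, min(d.day, calendar.monthrange(y, m)[1]))

def first_of_month(d: date) -> date:
    """LAST_DAY(d) + INTERVAL 1 DAY - INTERVAL 1 MONTH."""
    return minus_one_month(last_day(d) + timedelta(days=1))

for d in (date(2024, 3, 15), date(2023, 12, 31), date(2024, 2, 29)):
    print(d, "->", first_of_month(d))
```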
I think you should consider a so-called self-join query to get your duplicate-except-for-date results. Try something like this.
SELECT a.*
FROM y_reg a
JOIN y_reg b ON a.Father_Last_Name = b.Father_Last_Name
            AND a.Father_First_Name = b.Father_First_Name
            AND a.Date > b.Date
            AND a.Date < b.Date + INTERVAL 11 MONTH
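The self-join condition can be traced with a small Python sketch: a nested loop over hypothetical y_reg rows, where a is the later registration and b an earlier one for the same father. add_months imitates MySQL's month clamping; the names are illustrative. One difference from the SQL: the JOIN returns one row per matching pair (SELECT DISTINCT a.* would collapse those), whereas this sketch reports each later row once.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """d + INTERVAL months MONTH, clamping the day like MySQL does."""
    idx = d.year * 12 + (d.month - 1) + months
    y, m0 = divmod(idx, 12)
    return date(y, m0 + 1, min(d.day, calendar.monthrange(y, m0 + 1)[1]))

# Hypothetical y_reg rows: (Father_Last_Name, Father_First_Name, Date).
regs = [
    ("Smith", "John", date(2017, 9, 1)),
    ("Smith", "John", date(2018, 5, 15)),  # renewed about 8.5 months later
    ("Jones", "Al", date(2017, 9, 1)),
    ("Jones", "Al", date(2018, 9, 3)),     # renewed about a year later
]

def early_renewals(rows, months=11):
    """Rows a with an earlier row b for the same father such that
    b.Date < a.Date < b.Date + INTERVAL months MONTH."""
    hits = []
    for a in rows:
        for b in rows:
            if a[:2] == b[:2] and b[2] < a[2] < add_months(b[2], months):
                hits.append(a)
                break  # report each later row once, even if several earlier rows match
    return hits

print(early_renewals(regs))
```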

SQL - Get result of current year only

How can I get the result of the current year using SQL?
I have a table that has a column date with the format yyyy-mm-dd.
Now, I want to do select query that only returns the current year result.
The pseudo code should be like:
select * from table where date is (current year dates)
The result should be as following:
id date
2 2015-01-01
3 2015-02-01
9 2015-01-01
6 2015-02-01
How can I do this?
Use YEAR() to get only the year of the dates you want to work with:
select * from table where YEAR(date) = YEAR(CURDATE())
Using WHERE YEAR(date) = YEAR(CURDATE()) is correct, but it cannot use an index on column date if one exists (and if one doesn't exist, it should).
A better solution is:
SELECT *
FROM tbl
WHERE `date` BETWEEN '2015-01-01' AND '2015-12-31'
The dates (first and last day of the year) need to be generated from the client code.
When I tried these answers on SQL Server, I got an error saying CURDATE() was not a recognized function.
If you get the same error, using GETDATE() instead of CURDATE() should work!
--========= Get Current Year ===========
Select DATEPART(yyyy, GETDATE())
SELECT id, date FROM your_table WHERE YEAR( date ) = YEAR( CURDATE() )
SELECT
date
FROM
TABLE
WHERE
YEAR (date) = YEAR (CURDATE());
If the date field contains a time component, you want to include all of December 31, so you have to go up to January 1 of the next year. You also don't have to use client code to insert dates into the SQL. You can use the following:
SELECT * FROM table
WHERE date BETWEEN MAKEDATE(YEAR(CURDATE()), 1) AND MAKEDATE(YEAR(CURDATE())+1, 1)
This will give you January 1st of the current year through January 1st at midnight of the following year.
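As @Clockwork-Muse pointed out, BETWEEN is inclusive, so the query above also picks up January 1 of the following year (the whole day, if the date field does not contain a time component). Exclude it with a half-open range, which works with or without a time component:
WHERE date >= MAKEDATE(YEAR(CURDATE()), 1) AND date < MAKEDATE(YEAR(CURDATE())+1, 1)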
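The difference between the half-open range and BETWEEN on whole dates shows up once values carry a time. This Python sketch uses hypothetical DATETIME-like values; year_bounds is an illustrative name that builds the same [Jan 1 this year, Jan 1 next year) bounds as the MAKEDATE() expressions.

```python
from datetime import date, datetime

def year_bounds(today: date):
    """Half-open [Jan 1 this year, Jan 1 next year), like
    date >= MAKEDATE(YEAR(CURDATE()), 1) AND date < MAKEDATE(YEAR(CURDATE()) + 1, 1)."""
    return datetime(today.year, 1, 1), datetime(today.year + 1, 1, 1)

# Hypothetical stored values that carry a time component.
values = [
    datetime(2014, 12, 31, 23, 0),
    datetime(2015, 1, 1, 0, 0),
    datetime(2015, 12, 31, 18, 30),
    datetime(2016, 1, 1, 0, 0),
]

lo, hi = year_bounds(date(2015, 6, 30))
half_open = [v for v in values if lo <= v < hi]
# BETWEEN '2015-01-01' AND '2015-12-31' compares against midnight of Dec 31.
between = [v for v in values if datetime(2015, 1, 1) <= v <= datetime(2015, 12, 31)]
print(half_open)
print(between)
```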
You can do this using DATE_FORMAT(), like below:
SELECT
date
FROM
TABLE
WHERE
DATE_FORMAT(date, '%Y') = YEAR (CURDATE());
SELECT [ID]
,[datefield]
FROM [targettable]
WHERE DATEPART(YYYY, [datefield]) = (SELECT TOP 1(MAX(DATEPART(YYYY, [datefield])))
FROM [targettable]
)
/*
This will find the newest records in the table regardless of how recent the last time data was entered.
To grab the oldest records from the table do this
SELECT [ID]
,[datefield]
FROM [targettable]
WHERE DATEPART(YYYY, [datefield]) = (SELECT TOP 1(MIN(DATEPART(YYYY, [datefield])))
FROM [targettable]
)
*/

How can I get the date difference of a timestamp

I am trying to create a query that will limit insertion into a table based on the last time the poster sent data to the table.
For example, if you posted data to the table, you are locked out of the system for the next 10 hours. Here is what I have come up with so far, but I get nowhere with the actual results. Any help?
SELECT DATE( `date` )
FROM tablename
WHERE DATE( CURDATE( ) ) < CURDATE( ) - INTERVAL 1002
DAY
LIMIT 0 , 30
This will return a single post from the last 10 hours, if it exists:
SELECT *
FROM tablename
WHERE `date` >= NOW() - INTERVAL 10 HOUR
LIMIT 1
I'm assuming date is declared as DATETIME, since actual DATE does not contain the time part and hence is only day-accurate.
If date is an integer UNIX timestamp, use this:
SELECT *
FROM tablename
WHERE `date` >= UNIX_TIMESTAMP(NOW() - INTERVAL 10 HOUR)
LIMIT 1
There are a number of ways you could do this. If you have a user settings table, you could simply add a "last_insert" field and store the timestamp as an integer value. That would be a super simple way to do it: you could check the current timestamp against user_settings.last_insert and voila!
I suppose you could use DATETIME too. Whatever floats your boat.
First of all, you need a DATETIME column and not a DATE column. Assuming that tablename.date is a DATETIME column, then 10 hours before right now is CURRENT_TIMESTAMP - INTERVAL 10 HOUR.
First of all, create a Time (TIMESTAMP DEFAULT CURRENT_TIMESTAMP) column in your table. It will be automatically set to the current date and time on row insert.
Then check:
SELECT COUNT(*) FROM Table WHERE Time > NOW() - INTERVAL 10 HOUR
If it's 1 or more, block the insert.
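The lockout rule boils down to comparing the time of the user's most recent post with the current time. A Python sketch of that check, assuming the last post time has already been fetched (for example as MAX(Time) for the user); is_locked_out, is_locked_out_unix and the 10-hour constant are illustrative, with a second variant for integer UNIX timestamps.

```python
from datetime import datetime, timedelta

LOCKOUT = timedelta(hours=10)  # lockout length from the question

def is_locked_out(last_post, now):
    """True while the most recent post is newer than now - 10 hours,
    the same test as WHERE Time > NOW() - INTERVAL 10 HOUR."""
    if last_post is None:  # never posted: nothing to block
        return False
    return last_post > now - LOCKOUT

def is_locked_out_unix(last_post_ts, now_ts):
    """Same check when times are stored as integer UNIX timestamps (seconds)."""
    return last_post_ts is not None and last_post_ts > now_ts - int(LOCKOUT.total_seconds())

now = datetime(2024, 1, 1, 12, 0)
print(is_locked_out(datetime(2024, 1, 1, 3, 0), now))  # posted 9 hours ago
print(is_locked_out(datetime(2024, 1, 1, 1, 0), now))  # posted 11 hours ago
```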
You must compare the time of the last post with the current time, not the current time with itself :|