Use of IF condition in MySQL queries

Does an IF condition in the where clause of a MySQL query slow down the execution drastically?
Here is a sample query:
SELECT *
FROM alert_details_v adv
WHERE (IF(DAY(LAST_DAY(NOW())) < DAY(adv.alert_date),
          DAY(LAST_DAY(NOW())),
          DAY(adv.alert_date)) - adv.alert_trigger_days) <= DAY(NOW());
Sample data:
alert_id alert_date alert_trigger_days
==================================================
1 2013-09-14 00:00:00 6
2 2013-09-13 00:00:00 5
alert_date: Some user input date
alert_trigger_days: the number of days before the actual date that the alert should be triggered.
Brief explanation of the query logic:
Here I compare the day of the last day of the current month with the day of alert_date (a database column); whichever day is smaller is used.
Basically this table stores alert information. So if the user has chosen the 30th of some month and the alert recurs monthly, then February has no 30th, and the record would not be shown.
My question is: does a query with IF conditions in the WHERE clause (as in the sample query above) slow down execution drastically or only slightly when the table has hundreds of thousands of records?

The IF() itself is cheap. The bigger cost here is that wrapping the column in functions (DAY(adv.alert_date), and so on) makes the condition non-sargable: MySQL cannot use an index on alert_date and must evaluate the expression for every row, i.e. a full table scan. Whether that is drastic depends on your table and data; with hundreds of thousands of rows it will be noticeable but not necessarily prohibitive.
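For readability, the IF() can be replaced by LEAST(), which is equivalent here. This is a sketch of the same query, not a performance fix, since the column is still wrapped in a function:

```sql
-- Same logic with LEAST(): take the smaller of the month-end day and the
-- alert day, then subtract the trigger offset.
SELECT *
FROM alert_details_v adv
WHERE LEAST(DAY(LAST_DAY(NOW())), DAY(adv.alert_date))
      - adv.alert_trigger_days <= DAY(NOW());
```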

Related

Date table lookup efficiency - MySQL

I need to calculate the number of "working minutes" between two datetime values; let's call them 'Created' and 'Finished'.
'Finished' is always subsequent to 'Created'. The two values can differ by anything from 1 second to several years. The median difference is 50,000 seconds or roughly 14 hours.
Working minutes are defined as those occurring between 0900 to 1700 hours, Monday to Friday; excluding weekends and official holidays in our country.
I decided a lookup table was the way to go, so I generated a table of all work minutes, explicitly excluding weekends, nights and holidays...
CREATE TABLE `work_minutes` (
`min` datetime NOT NULL,
PRIMARY KEY (`min`),
UNIQUE KEY `min_UNIQUE` (`min`)
)
I populated this programmatically with all the "working minutes" between the years 2017 and 2024, and at this point I started to get the feeling I was being very inefficient, as the table began to balloon to several hundred thousand rows.
I can do a lookup easily enough, for instance:
SELECT COUNT(min) FROM `work_minutes` AS wm
WHERE wm.min > '2022-01-04 00:04:03'
AND wm.min <= '2022-02-03 14:13:09';
#Returns 10394 'working minutes' in 0.078 sec
This is good enough for a one-off lookup but to query a table of 70,000 value pairs takes over 90 minutes.
So, I am uncomfortable with the slowness of the query and the sense that the lookup table is unnecessarily bloated.
I am thinking I need to set up two tables, one just for dates and another just for minutes, but not sure how to implement. Date logic has never been my forte. The most important thing to me is that the lookup can query over 70,000 values reasonably quickly and efficiently.
Working in MySQL 5.7.30. Thanks in advance for your expertise.
Divide the time range into three parts: the incomplete starting and finishing day parts, and a middle part consisting of many complete days. Of course, if both the starting and finishing timestamps share the same date there will be only one part, and if their dates are consecutive you'll have two parts to process.
Calculating the number of working minutes in an incomplete day part is no problem: the common interval-overlap formula, plus a weekday check, will do.
Create a static calendar/service table which is guaranteed to start earlier than any possible date in your beginning timestamp and to extend past any possible date in your finishing timestamp. Store the cumulative number of working minutes for each date in the table. This table lets you compute the amount of working time in any range of complete days with a single subtraction.
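A sketch of such a service table (table and column names here are hypothetical):

```sql
-- One row per calendar date; cum_work_min holds the running total of working
-- minutes from the table's first date through the end of that date.
CREATE TABLE work_calendar (
  d DATE NOT NULL PRIMARY KEY,
  cum_work_min INT NOT NULL
);

-- Working minutes over a range of complete days (d1 .. d2) is then a single
-- subtraction: the total through d2 minus the total through the day before d1.
SELECT c2.cum_work_min - c1.cum_work_min AS work_min
FROM work_calendar c1, work_calendar c2
WHERE c1.d = '2022-01-04'   -- day before the first complete day
  AND c2.d = '2022-02-02';  -- last complete day
```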
Plan A: Convert the DATETIME values to seconds (from some arbitrary time) via TO_SECONDS(), then manipulate them with simple arithmetic.
Plan B: Use the DATEDIFF() function.
Your COUNT(min) counts the number of rows where min IS NOT NULL. You may as well say COUNT(*). But did you really want to count the number of rows?
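A minimal illustration of both plans (the literal timestamps are just sample values):

```sql
-- Plan A: elapsed seconds via TO_SECONDS().
SELECT TO_SECONDS('2022-02-03 14:13:09')
     - TO_SECONDS('2022-01-04 00:04:03') AS elapsed_seconds;

-- Plan B: elapsed calendar days via DATEDIFF().
SELECT DATEDIFF('2022-02-03', '2022-01-04') AS elapsed_days;
```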

Looking for ideas with SQL query optimization

!!! ---------------------!!!
After exchanging some comments with people, I have decided to include the source data, so anyone who wishes to help with solving this could load it into a table and run the query.
Here is the link that retrieves data from Yahoo Finance for symbol SPY in csv format.
https://query1.finance.yahoo.com/v7/finance/download/SPY?period1=1584963550&period2=1616499550&interval=1d&events=history&includeAdjustedClose=true
The file header needs to be changed. Change Date to Source_Date and Adj Close to Adj_Close. The file does not have a Symbol column, so the query doesn't need to reference it. The only two relevant columns are Source_Date and Adj_Close.
!!! ---------------------!!!
The issue with my query is that it takes a very long time to run. The query is not wrong. I know exactly why it takes such a long time to run. I just couldn't come up with anything more efficient.
First, here is the business logic.
Let's say I bought Apple stock and it went down. It's been 10 days since I bought it and it is still down. I extracted the entire history of daily prices for Apple going back to 1993 and uploaded it into a database table. Now I want to write a query that tells me how often the Apple stock price took more than 10 days to recover.
For example. I bought Apple at $100. It went down to $90. It's been 10 days since I bought it. I run my query and it comes back with something like this:
-- Buy Date: April 1, 2001. Buy price: $10. Recovered on: April 12, 2001. Days to recovery: 12
-- Buy Date: June 12, 2006. Buy price: $23. Recovered on: July 20, 2006. Days to recovery: 38
-- Buy Date: January 15, 2009. Buy price: $65. Recovered on: December 30, 2010. Days to recovery: 700
Each example indicates that on each day between Buy Date and Recovered on date, the price stayed below the buy price.
My query has two steps.
The first step: join the table to itself, aliased as x and y. For each row in x, the query searches for the minimum y date where the y price is higher than the x price.
The second step simply extracts from the first step's result set only those records where the difference between x.date and y.date is greater than 10.
The reason the first query (see below) runs for so long is that for each record in x, the query must search the entire y table. That's a lot of table scans. It returns the correct result, but takes between 50 and 60 seconds.
The table structure is very simple: Symbol, Date, and Price. Symbol and Date form the primary key.
SELECT x.Symbol,
       x.Source_Date AS Source_Date,
       MIN(y.Source_Date) AS Recovery_Date
FROM transformed_source x,
     transformed_source y
WHERE x.Symbol = 'AAPL'
  AND y.Symbol = x.Symbol
  AND y.Source_Date > x.Source_Date
  AND y.Adj_Close > x.Adj_Close
GROUP BY x.Symbol, x.Source_Date
*** Note: this query will miss records where the price never recovered, so I will need to modify it with an outer join. That wouldn't make any difference to the run time, so I'm working with this version.
Any ideas are welcome.
Thank you
You might find that a correlated subquery is better:
SELECT ts.Symbol, ts.Source_Date,
       (SELECT MIN(ts2.Source_Date)
        FROM transformed_source ts2
        WHERE ts2.Symbol = ts.Symbol AND
              ts2.Source_Date > ts.Source_Date AND
              ts2.Adj_Close > ts.Adj_Close
       ) AS Recovery_Date
FROM transformed_source ts
WHERE ts.Symbol = 'AAPL';
Then, for performance, you want an index on transformed_source(Symbol, Source_Date, Adj_Close).
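The suggested index could be created like this (the index name is arbitrary):

```sql
-- Covering index: the correlated subquery can be answered entirely from the
-- index, filtering on Symbol, ranging on Source_Date, and reading Adj_Close.
CREATE INDEX idx_symbol_date_close
    ON transformed_source (Symbol, Source_Date, Adj_Close);
```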
If you're using an RDBMS that provides pattern matching (e.g. Oracle with MATCH_RECOGNIZE), then you can write a query that does a single pass over your table.
I've put together a DBFiddle

Optimization of a MySQL query

I'm using a MySQL database to store values from some energy measurement system. The problem is that the DB contains millions of rows, and the queries take somewhat long to complete. Are the queries optimal? What should I do to improve them?
The database table consists of rows with 15 columns each (t, UL1, UL2, UL3, PL1, PL2, PL3, P, Q1, Q2, Q3, CosPhi1, CosPhi2, CosPhi3, i), where t is time, P is total power, and i is some identifier.
Seeing as I display the data in graphs grouped into different intervals (15 minutes, 1 hour, 1 day, 1 month), I want to group the queries accordingly.
As an example I have a graph that shows the kWh for every day in the current year. The query to gather the data goes like this:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The database has been gathering measurements for 13 days, and this query alone is already taking 2-3 seconds to complete. Those 13 days have added about 1-1.3 million rows to the db, as a new row gets added every second.
Is this query optimal?
I would actually create a secondary aggregate table with a row for each day and a column for the daily total. Then, via a trigger, each insert into the detail table can update the aggregate table. This way you can sum the daily table, which will be much quicker, and yet still have the per-second table if you need to look at granular details.
Aggregate tables are a common time-saver for querying, especially for read-only data, or data you know won't be changing. Then, if you want more granular detail such as hourly or 15-minute intervals, go directly to the raw data.
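A sketch of that approach, assuming the raw table is called `measurements` (all names here are hypothetical):

```sql
-- Daily aggregate table: one row per (day, identifier).
CREATE TABLE daily_totals (
  day   DATE   NOT NULL,
  i     INT    NOT NULL,
  p_sum DOUBLE NOT NULL DEFAULT 0,
  PRIMARY KEY (day, i)
);

-- Keep it current as each per-second row arrives.
CREATE TRIGGER measurements_ai AFTER INSERT ON measurements
FOR EACH ROW
  INSERT INTO daily_totals (day, i, p_sum)
  VALUES (DATE(NEW.t), NEW.i, NEW.P)
  ON DUPLICATE KEY UPDATE p_sum = p_sum + NEW.P;
```

The year graph then becomes a sum over at most 365 rows per identifier instead of millions.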
For this query:
SELECT t, SUM(P) as P
FROM table
WHERE i = 0 and t >= '2015-01-01 00:00:00'
GROUP BY DAY(t), MONTH(t)
ORDER BY t
The optimal index is a covering index: table(i, t, p).
2-3 seconds for 1+ million rows suggests that you already have an index.
You may want to consider DRapp's suggestion and use summary tables. In a few months, you will have so much data that historical queries could be taking a long time.
In the meantime, though, indexes and partitioning might provide sufficient performance for your needs.
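For example (again assuming the raw table is named `measurements`), the covering index, and optionally monthly range partitioning:

```sql
-- Covering index for the query above: filter on i, range on t, read P.
ALTER TABLE measurements ADD INDEX idx_i_t_p (i, t, P);

-- Optional: partition by month so old months can be scanned or dropped cheaply.
-- (Any primary/unique key on the table must then include t.)
ALTER TABLE measurements
  PARTITION BY RANGE COLUMNS (t) (
    PARTITION p201501 VALUES LESS THAN ('2015-02-01'),
    PARTITION p201502 VALUES LESS THAN ('2015-03-01'),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
  );
```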

Correct MySQL Structure for a Time Range for Query Optimization?

I have a scenario where I want to be able to SELECT rows from a MySQL table, but exclude rows where the current time-of-day is inside a time-range.
Example:
The "quiet" period for one row is 10pm - 8:30am.
My SQL SELECT statement should not return that row if the current server time is after 10pm or before 8:30am.
Example 2: The "quiet period" is NULL and ignored.
Example 3: A new row is created with a quiet period from 9:53am to 9:55am. If the current server time is in that 2-minute window, the row is not returned by the SELECT.
My question:
What data format would you use in the database, and how would you write the query?
I have thought about a few different approaches (a start_time column plus a duration column, defining both in seconds... or using date stamps... or whatever). None of them seems ideal, and they all require a lot of calculation.
Thanks!
I would store the start and end times as native MySQL TIME fields.
You would need to treat ranges that span midnight as two separate ranges, but then you can query the table like this. To find all current quiet periods:
SELECT DISTINCT name FROM `quiet_periods`
WHERE start_time<=CURTIME() AND CURTIME()<=end_time
Or, to find all non-active quiet periods:
SELECT name FROM quiet_periods WHERE name NOT IN (
SELECT name FROM `quiet_periods`
WHERE start_time<=CURTIME() AND CURTIME()<=end_time
)
So with sample data
id | name        | start_time | end_time
---|-------------|------------|---------
1  | late_night  | 00:00:00   | 08:30:00
2  | late_night  | 22:00:00   | 23:59:59
3  | null_period | NULL       | NULL
4  | nearly_10am | 09:53:00   | 09:55:00
At 11pm this would return
null_period
nearly_10am
from the second query.
Depending on performance and how many rows you have, you might want to refactor the second query into a JOIN, and probably add the relevant indexes too.
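One way to write the second query as an anti-join instead of NOT IN (a sketch):

```sql
-- A name is "quiet" right now if ANY of its rows covers the current time,
-- so exclude every name that has at least one matching row.
SELECT DISTINCT qp.name
FROM quiet_periods qp
LEFT JOIN quiet_periods active
       ON active.name = qp.name
      AND active.start_time <= CURTIME()
      AND CURTIME() <= active.end_time
WHERE active.name IS NULL;
```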

Select day of week from date

I have the following table in MySQL that records event counts of stuff happening each day
event_date event_count
2011-05-03 21
2011-05-04 12
2011-05-05 12
I want to be able to query this efficiently by date range AND by day of week. For example - "What is the event_count on Tuesdays in May?"
Currently the event_date field is a date type. Are there any functions in MySQL that let me query this column by day of week, or should I add another column to the table to store the day of week?
The table will hold hundreds of thousands of rows, so given a choice I'll choose the most efficient solution (as opposed to most simple).
Use DAYOFWEEK in your query, something like:
SELECT * FROM mytable WHERE MONTH(event_date) = 5 AND DAYOFWEEK(event_date) = 7;
This will find all info for Saturdays in May.
To get the fastest reads, store a denormalized field that holds the day of the week (and whatever else you need). That way you can index the column and avoid full table scans.
Just try the above first to see if it suits your needs; if it doesn't, add the extra columns and populate them on write. Just watch out for update anomalies (make sure you update the day_of_week column if you change event_date).
Note that denormalized fields increase write time, add calculation on write, and take up more space. Make sure you really need the benefit and can measure that it helps you.
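On MySQL 5.7+, a stored generated column is one way to add the denormalized field without risking the update anomaly, since it is derived from event_date automatically (a sketch; the table name is taken from the query above):

```sql
ALTER TABLE mytable
  ADD COLUMN day_of_week TINYINT AS (DAYOFWEEK(event_date)) STORED,
  ADD INDEX idx_dow_date (day_of_week, event_date);
```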
Check the DAYOFWEEK() function.
If you want a textual representation of the day of week, use the DAYNAME() function.
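For example:

```sql
SELECT DAYOFWEEK('2011-05-03');  -- 3 (1 = Sunday ... 7 = Saturday)
SELECT DAYNAME('2011-05-03');    -- 'Tuesday'
```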