I have ~6 tables where I have to count or sum fields based on matching site_ids and dates. I have the following query, with many subqueries, which takes an extraordinarily long time to run. I am certain there is an easier, more efficient way, but I am rather new to these more complex queries. I have read about optimizations, specifically using JOIN ... ON, but I am struggling to understand and implement them.
The goal is to speed this up without bringing my small server to its knees when it runs. Any assistance or direction would be VERY much appreciated!
SELECT date(date_added) as dt_date,
site_id as dt_site_id,
(SELECT site_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as site_id,
(SELECT parent_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as main_site_id,
(SELECT corp_owned from branch_mappings bm WHERE mark_id_site = dt.site_id) as corp_owned,
count(id) as dt_calls,
(SELECT count(date_submitted) FROM mark_unbounce ub WHERE date(date_submitted) = dt_date AND ub.site_id = dt.site_id) as ub,
(SELECT count(timestamp) FROM mark_wordpress_contact wp WHERE date(timestamp) = dt_date AND wp.site_id = dt.site_id) as wp,
(SELECT count(added_on) FROM m_shrednations sn WHERE date(added_on) = dt_date AND sn.description = dt.site_id) as sn,
(SELECT sum(users) FROM mark_ga ga WHERE date(ga.date) = dt_date AND channel LIKE 'Organic%' AND ga.site_id = dt.site_id) as ga_organic
FROM mark_dialogtech dt
WHERE site_id is not null
GROUP BY site_name, dt_date
ORDER BY site_name, dt_date;
What you're doing is the equivalent of asking your server to query 7+ different tables every time you run this query. Personally, I use joins and nested queries because I can whittle things down to exactly what I need.
The first 3 subqueries can be replaced with...
SELECT date(date_added) as dt_date,
dt.site_id as dt_site_id,
bm.site_id as site_id,
bm.parent_id as main_site_id,
bm.corp_owned as corp_owned
FROM mark_dialogtech dt
INNER JOIN branch_mappings bm
ON bm.mark_id_site = dt.site_id
I'm not sure why you are running the last 3. Is there a business requirement? If so, consider how often this is to be run and when.
If absolutely necessary, add those to the joins like...
FROM mark_dialogtech dt
INNER JOIN
(SELECT site_id, count(date_submitted) as ub_count FROM mark_unbounce GROUP BY site_id) ub
on ub.site_id = dt.site_id
This should limit the results to only records where the site_id exists in both mark_dialogtech and mark_unbounce (or whichever table you join). In my experience, this method has sped things up.
Still, my concern is the number of aggregations you're performing. If the results can be cached to a dashboard and pulled during slow times, that would be best.
It's hard to analyze your query without sample data, but in your case I highly recommend using CTEs (Common Table Expressions). Check this:
https://www.sqlpedia.pl/cte-common-table-expressions/
CTEs do not have a physical representation in tempdb like temporary tables or table variables do. A CTE can be viewed as a temporary, non-materialized view. When MSSQL executes a query and encounters a CTE, it replaces the reference to that CTE with its definition. Therefore, if the CTE data is used several times in a given query, the same code will be executed several times, and MSSQL does not optimize it. So it works well only when the data volume is small, as in your case.
Appreciate all the responses.
I ended up creating a Python script that runs the queries separately and inserts the results into a rollup table for the respective KPI, so I scrapped the idea of a single query due to performance. I concatenated each date and site_id to create the id, then leveraged ON DUPLICATE KEY UPDATE with each INSERT statement.
The Python dictionaries look like this, and I simply looped over them. Again, thanks for the help.
SELECT STATEMENTS (Python Dict)
"dt":"SELECT date(date_added) as dt_date, site_id as dt_site, count(site_id) as dt_count FROM mark_dialogtech WHERE site_id is not null GROUP BY dt_date, dt_site ORDER BY dt_date, dt_site;",
"ub":"SELECT date_submitted as ub_date, site_id as ub_site, count(site_id) as ub_count FROM mark_unbounce WHERE site_id is not null GROUP BY ub_date, ub_site;",
"wp":"SELECT date(timestamp) as wp_date, site_id as wp_site, count(site_id) as wp_count FROM mark_wordpress_contact WHERE site_id is not null GROUP BY wp_date, wp_site;",
"sn":"SELECT date(added_on) as sn_date, description as sn_site, count(description) as sn_count FROM m_shrednations WHERE description <> '' GROUP BY sn_date, sn_site;",
"ga":"SELECT date as ga_date, site_id as ga_site, sum(users) as ga_count FROM mark_ga WHERE users is not null GROUP BY ga_date, ga_site;"
INSERT STATEMENTS (Python Dict)
"dt":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, dt_calls, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE dt_Calls={dbdata[3]}, added_on='{dbdata[4]}';",
"ub":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ub, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ub={dbdata[3]}, added_on='{dbdata[4]}';",
"wp":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, wp, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE wp={dbdata[3]}, added_on='{dbdata[4]}';",
"sn":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, sn, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE sn={dbdata[3]}, added_on='{dbdata[4]}';",
"ga":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ga_organic, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ga_organic={dbdata[3]}, added_on='{dbdata[4]}';",
It would be very difficult to analyze the query without the data. Anyway!
Try joining the tables and grouping; that should improve the performance.
Here is a LEFT JOIN sample:
SELECT column names
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
Check this for more detailed information: https://learnsql.com/blog/how-to-left-join-multiple-tables/
I want to insert one table's records into another table. I am selecting user id, date, and variance. When I insert the data for one user it works fine, but when I insert multiple records it gives me the error: SQL Error [1292] [22001]: Data truncation: Truncated incorrect time value: '841:52:24.000000'.
insert into
features.Daily_variance_of_time_between_calls(
uId,
date,
varianceBetweenCalls)
SELECT
table_test.uid as uId,
SUBSTRING(table_test.date, 1, 10) as date ,
VARIANCE(table_test.DurationSinceLastCall) as varianceBetweenCalls # calculating the variance of inter-event call time
FROM
(SELECT
id,m.uid, m.date,
TIME_TO_SEC(
timediff(m.date,
COALESCE(
(SELECT p.date FROM creditfix.call_logs AS p
WHERE
p.uid = m.uid
AND
p.`type` in (1,2)
AND
(p.id < m.id AND p.date < m.date )
ORDER BY m.date DESC, p.duration DESC LIMIT 1 ), m.date))
) AS DurationSinceLastCall,
COUNT(1)
FROM
(select distinct id, duration, date,uid from creditfix.call_logs as cl ) AS m
WHERE
m.uId is not NULL
AND
m.duration > 0
# AND
# m.uId=171
GROUP BY 1,2
) table_test
GROUP BY 1,2
If I remove the comment markers (restricting the query to one specific user), it works fine.
Let's start with the error message:
Data truncation: Truncated incorrect time value: '841:52:24.000000'
This message suggests that at some stage MySQL is running into a value which it cannot convert to a date/time/datetime. Efforts to isolate the issue should therefore begin with a focus on where values are being converted to those data types.
Without knowing the data types of all the fields used, it's difficult to say where the problem is likely to be. However, once we knew that the query on its own ran without complaint, we also knew that the problem had to be a conversion happening during the insert itself. Something in the selected data wasn't a valid date, but was being inserted into a date field. Although dates and times are involved in your calculation of varianceBetweenCalls, VARIANCE itself returns a numeric data type. Therefore I deduced the problem had to be with the data returned by SUBSTRING(table_test.date, 1, 10), which was being inserted into the date field.
As per the comments, this turned out to be correct. You can exclude the bad data and allow the insert to work by adding the clause:
WHERE
table_test.date NOT LIKE '841%'
AND table_test.DurationSinceLastCall NOT LIKE '841%' -- I actually think this line is not required.
Alternatively, you can retrieve only the bad data (with a view to fixing it), by removing the INSERT and using the clause
WHERE
table_test.date LIKE '841%'
OR table_test.DurationSinceLastCall LIKE '841%' -- I actually think this line is not required.
or better
SELECT *
FROM creditfix.call_logs m
WHERE m.date LIKE '841%'
However, I'm not sure of the data type of that field, so you may need to do it like this:
SELECT *
FROM creditfix.call_logs m
WHERE SUBSTRING(m.date, 1, 10) LIKE '841%'
Once you correct the offending data, you should be able to remove the "fix" from your INSERT/SELECT statement, though it would be wise to investigate how the bad data got into the system.
I have a database of ATM cards with the fields account_no, card_no, is_blocked, is_activated, issue_date.
Account numbers and card numbers are not unique: when an old card expires it is marked is_blocked='Y', and another record with the same card number and account number is inserted as a new row with is_blocked='N'. Now I need to update is_blocked/is_activated with the help of issue_date, i.e.
UPDATE card_info set is_blocked='Y' where card_no='6396163270002509'
AND opening_date=(SELECT MAX(opening_date) FROM card_info WHERE card_no='6396163270002509')
but it doesn't allow me to do so; it throws the following error:
1093 - You can't specify target table 'card_info' for update in FROM clause
Try this instead:
UPDATE card_info ci
INNER JOIN
(
SELECT card_no, MAX(opening_date) MaxOpeningDate
FROM card_info
GROUP BY card_no
) cm ON ci.card_no = cm.card_no AND ci.opening_date = cm.MaxOpeningDate
SET ci.is_blocked='Y'
WHERE ci.card_no = '6396163270002509'
That's one of those stupid limitations of the MySQL parser. The usual way to solve this is to use a JOIN query, as Mahmoud has shown.
The (at least to me) surprising part is that it really seems to be a parser problem, not a problem of the engine itself, because if you wrap the sub-select into a derived table, this does work:
UPDATE card_info
SET is_blocked='Y'
WHERE card_no = '6396163270002509'
AND opening_date = ( select max_date
from (
SELECT MAX(opening_date) as max_date
FROM card_info
WHERE card_no='6396163270002509') t
)
I have the task of repairing some invalid data in a MySQL database. In one table there are people with a missing date, which should be filled from a second table if there is a corresponding entry.
TablePeople: ID, MissingDate, ...
TableEvent: ID, people_id, replacementDate, ...
Update TablePeople
set missingdate = (select replacementDate
from TableEvent
where people_id = TablePeople.ID)
where missingdate is null
and (select count(*)
from TableEvent
where people_id = TablePeople.ID) > 0
Certainly doesn't work. Is there any other way with SQL? Or how can I process single rows in MySQL to get it done?
We need details about what's not working, but I think you only need to use:
UPDATE TablePeople
SET missingdate = (SELECT MAX(te.replacementDate)
FROM TableEvent te
WHERE te.people_id = TablePeople.id)
WHERE missingdate IS NULL
Notes:
MAX is used to return the latest replacementDate, guarding against the risk of the subquery returning multiple values
If there's no supporting record in TableEvent, the subquery returns NULL, so there's no change
I have this query:
SELECT
IF(isnull(ub.user_lecture_id), 0, ub.user_lecture_id) as IsPurchased,
cs.title,cs.start_date, cs.start_time, cs.end_time
FROM campus_schedule cs
LEFT JOIN campus_bookinfo cb ON cs.bookid=cb.idx_campus_bookinfo
LEFT JOIN user_lectures ub ON ub.id_product = cs.idx_campus_schedule AND ub.id_customer = 11
WHERE cs.idx_campus = 1 and cs.title like '%%' and cs.status=1
Which shows: (output screenshot omitted)
Explanation: if IsPurchased == 0, it has not yet been bought by the customer.
My question: if you look at the times on the row with IsPurchased = 1, its time range conflicts with the times on the rows with IsPurchased = 0. How can I compare rows and conclude that the date and time of one row conflict with the date and time of the other rows, with the result reported as 1 or 0 in a "conflict" field?
Hope you got the point. Thanks for the help!!!
To compare times, you will find it easier to use DATETIME fields.
To check for "conflicting" rows, you'll probably need to have a subquery in the WHERE clause.
A subquery should work but will be inefficient in MySQL. You could create a temporary table and analyze it, or do the same inline, like:
set @lastdate = 0;
set @lastend = 0;
select IsPurchased, title, start_date, start_time, end_time,
       if(@lastdate = start_date and start_time < @lastend, 1, 0) as conflict,
       @lastdate := start_date,
       @lastend := end_time
from (your_query ORDER BY start_date, start_time, end_time) t;
The conflict flag is 1 when a row starts before the previous row (in date/time order) has ended. That is just an idea, but it has worked for me several times.