I have ~6 tables where I have to count or sum fields based on matching site_ids and date. I have the following query, with many subqueries which takes an extraordinary amount of time to run. I am certain there is an easier, more efficient way, however I am rather new to these more complex queries. I have read regarding optimizations, specifically using joins ON but struggling to understand and implement.
The goal is to speed this up and not bring my small server to it's knees when running. Any assistance or direction would be VERY much appreciated!
SELECT date(date_added) as dt_date,
site_id as dt_site_id,
(SELECT site_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as site_id,
(SELECT parent_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as main_site_id,
(SELECT corp_owned from branch_mappings bm WHERE mark_id_site = dt.site_id) as corp_owned,
count(id) as dt_calls,
(SELECT count(date_submitted) FROM mark_unbounce ub WHERE date(date_submitted) = dt_date AND ub.site_id = dt.site_id) as ub,
(SELECT count(timestamp) FROM mark_wordpress_contact wp WHERE date(timestamp) = dt_date AND wp.site_id = dt.site_id) as wp,
(SELECT count(added_on) FROM m_shrednations sn WHERE date(added_on) = dt_date AND sn.description = dt.site_id) as sn,
(SELECT sum(users) FROM mark_ga ga WHERE date(ga.date) = dt_date AND channel LIKE 'Organic%' AND ga.site_id = dt.site_id) as ga_organic
FROM mark_dialogtech dt
WHERE site_id is not null
GROUP BY site_name, dt_date
ORDER BY site_name, dt_date;
What you're doing is the equivalent of asking your server to query 7+ different tables every time you run this query. Personally, I use Joins and nested queries because I can whittle down do what I need.
The first 3 subqueries can be replaced with...
SELECT date(date_added) as dt_date,
dt.site_id as dt_site_id,
bm.site_id as site_id,
bm.parent_id as main_site_id,
bm.corp_owned as corp_owned,
FROM mark_dialogtech dt
INNER JOIN branch_mappings bm
ON bm.mark_id_site = dt.site_id
I'm not sure why you are running the last 3. Is there a business requirement? If so, consider how often this is to be run and when.
If absolutely necessary, add those to the joins like...
FROM mark_dialogtech dt
INNER JOIN
(SELECT site_id, count(date_submitted) FROM mark_unbounce GROUP BY site_id) ub
on ub.site_id = dt.site_id
This should limit the results to only records where the site_id exists in both the mark_dialogtech and mark_unbounce (or whatever table). From my experience, this method has sped things up.
Still, my concern is the number of aggregations you're performing. If they can be cached to a dashboard and pulled during slow times, that would be best.
Its hard to analyze how big is your query(no data examples) but in your case I hightly recommend to use CTE(Common Table Expressions). Check this :
https://www.sqlpedia.pl/cte-common-table-expressions/
CTEs do not have a physical representation in tempdb like temporary tables or tabular variables. CTE can be viewed as such a temporary, non-materialized view. When MSSQL executes a query and encounters a CTE, it replace the reference to that CTE with definition. Therefore, if the CTE data is used several times in a given query, the same code will be executed several times and MSSQL does not optimize it. Soo... it will work just for few data like you want to do.
Appreciate all the responses.
I ended up creating a python script to run the queries separately and inserting the results into the table for the respective KPI. So, I scrapped the idea of a single query due to performance. I concatenated each date and site_id to create the id, then leveraged an ON DUPLICATE KEY UPDATE with each INSERT statement.
The python dictionary looks like this, and I simply looped. Again, thanks for the help.
SELECT STATEMENTS (Python Dict)
"dt":"SELECT date(date_added) as dt_date, site_id as dt_site, count(site_id) as dt_count FROM mark_dialogtech WHERE site_id is not null GROUP BY dt_date, dt_site ORDER BY dt_date, dt_site;",
"ub":"SELECT date_submitted as ub_date, site_id as ub_site, count(site_id) as ub_count FROM mark_unbounce WHERE site_id is not null GROUP BY ub_date, ub_site;",
"wp":"SELECT date(timestamp) as wp_date, site_id as wp_site, count(site_id) as wp_count FROM mark_wordpress_contact WHERE site_id is not null GROUP BY wp_date, wp_site;",
"sn":"SELECT date(added_on) as sn_date, description as sn_site, count(description) as sn_count FROM m_shrednations WHERE description <> '' GROUP BY sn_date, sn_site;",
"ga":"SELECT date as ga_date, site_id as ga_site, sum(users) as ga_count FROM mark_ga WHERE users is not null GROUP BY ga_date, ga_site;"
INSERT STATEMENTS (Python Dict)
"dt":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, dt_calls, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE dt_Calls={dbdata[3]}, added_on='{dbdata[4]}';",
"ub":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ub, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ub={dbdata[3]}, added_on='{dbdata[4]}';",
"wp":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, wp, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE wp={dbdata[3]}, added_on='{dbdata[4]}';",
"sn":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, sn, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE sn={dbdata[3]}, added_on='{dbdata[4]}';",
"ga":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ga_organic, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ga_organic={dbdata[3]}, added_on='{dbdata[4]}';",
It would be very difficult to analyze the query with out the data, Any ways!
try joining the tables and group it, that should improve the performance
here is a left join sample
SELECT column names
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
check this for more detailed inform https://learnsql.com/blog/how-to-left-join-multiple-tables/
Modified some stuff from my pic so you guys can understand it
I have this database. I am trying to update a value from a table based on another value from an another table.
I want to update the SUM from salary like this :
( sum = presence * 5 )
This is what I've been trying to use ( unsuccessful )
update table salary
set suma.salary = users.presence * 5
FROM salary INNER JOIN users1 INNER JOIN presence on id_salary = id_presence
I am not sure what to do, I'd appreciate some help, Thanks
In MySQL to UPDATE tables with a join you use this syntax:
UPDATE table1, table2
SET table1.column = some expression
WHERE table1.column = table2.column
That said, even with the updated picture, in your SQL you are mentioning columns that I cannot understand in which table are to be found. You also have an inner join between salariu and users1, with no join condition. Could you please clean up the question and make everything clear?
Assuming you are making the updates to the db structure you were talking about, then you can start working on this one maybe:
UPDATE salary, presence
SET salary.sum = SUM(presence.hours) * 5
WHERE presence.id = salary.id
AND <some filter on the month that depends on salary.date>
Another way, but I'm not sure it is supported in all RDBMS, would be something like this:
UPDATE salary
SET sum = (
SELECT SUM(presence.hours) * 5
FROM user, presence
WHERE presence.id = salary.id
AND <some filter on the month that depends on salary.date>
)
I just found out about this nifty little feature. I have a couple questions. consider the statement below.
This is how interpret how it works. The USING statement is what gets compared to see if there is a match correct? I want to use how it is now, but I want to use 2 other columns from the source table in the MATCH portion. I can't do that. So is there a way that I can use the 2 columns (decesed (I know its spelled wrong :) ) and hicno_enc)?
Another thing I would like to do and don't know if it possible, but if the row exists in target but not source, then mark it inactive.
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM
aco.tmpimport i
INNER JOIN aco.patients p
ON p.hicnoenc = i.hicno_enc
MERGE aco.patients AS target
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE
FROM aco.tmpimport
) AS source
ON target.hicnoenc = source.hicno_enc
WHEN MATCHED AND target.isdeceased <> CONVERT(BIT,source.decesed) THEN
UPDATE
SET
target.isdeceased = source.decesed,
updatedat = getdate(),
updatedby = 0
WHEN NOT MATCHED THEN
INSERT (firstname, lastname, gender, dob, isdeceased, hicnoenc)
VALUES (source.FIRST_NAME,
source.LAST_NAME,
source.sex1,
source.BIRTH_DATE,
source.decesed,
source.hicno_enc);
So is there a way that I can use the 2 columns (decesed (I know its
spelled wrong :) ) and hicno_enc)?
Add the columns you need in the select statement in the using clause.
USING (
SELECT FIRST_NAME, LAST_NAME, SEX1, BIRTH_DATE, decesed, hicno_enc
FROM aco.tmpimport
) AS source
if the row exists in target but not source, then mark it inactive.
Add a when not matched by source clause and do the update.
WHEN NOT MATCHED BY SOURCE THEN
UPDATE
SET active = 0
An earlier data import in CiviCRM placed some member numbers into a custom field (member_number) instead of the more useful (external_id) field.
My (admittedly limited) SQL skills are way too rusty, but what I'm trying to do is:
IF external_id field is empty,
AND the contact_type is "Individual"
THEN copy the data from member_number to external_id for the matching internal id number.
I've tried a few variations of this, with different errors:
INSERT INTO test_table (external_id)
SELECT member_number
FROM member_info
INNER JOIN test_table
ON memberinfo.entity_id=test_table.id
WHERE test_table.external_id IS NULL AND test_table.contact_type = "Individual"
Do I even really need the INNER JOIN on this? And I know the WHERE statement usually refers to the table you're pulling from, not the one you're inserting to, but I can't remember the right way to do this.
update test_table
set external_id =
if(external_id = '' and contact_type = 'Individual', member_number,external_id)
I have the task to repair some invalid data in a mysql-database. In one table there are people with a missing date, which should be filled from a second table, if there is a corresponding entry.
TablePeople: ID, MissingDate, ...
TableEvent: ID, people_id, replacementDate, ...
Update TablePeople
set missingdate = (select replacementDate
from TableEvent
where people_id = TablePeople.ID)
where missingdate is null
and (select count(*)
from TableEvent
where people_id = TablePeople.ID) > 0
Certainly doesn't work. Is there any other way with SQL? Or how can I process single rows in mysql to get it done?
We need details about what's not working, but I think you only need to use:
UPDATE TablePeople
SET missingdate = (SELECT MAX(te.replacementDate)
FROM TABLEEVENT te
WHERE te.people_id = TablePeople.id)
WHERE missingdate IS NULL
Notes
MAX is being used to return the latest replacementdate, out of fear of risk that you're getting multiple values from the subquery
If there's no supporting record in TABLEEVENT, it will return null so there's no change