I have ~6 tables where I have to count or sum fields based on matching site_ids and dates. The following query, with its many subqueries, takes an extraordinary amount of time to run. I am certain there is a simpler, more efficient way, but I am rather new to these more complex queries. I have read about optimizations, specifically using JOIN ... ON, but I am struggling to understand and implement them.
The goal is to speed this up and not bring my small server to its knees when running it. Any assistance or direction would be VERY much appreciated!
SELECT date(date_added) as dt_date,
site_id as dt_site_id,
(SELECT site_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as site_id,
(SELECT parent_id from branch_mappings bm WHERE mark_id_site = dt.site_id) as main_site_id,
(SELECT corp_owned from branch_mappings bm WHERE mark_id_site = dt.site_id) as corp_owned,
count(id) as dt_calls,
(SELECT count(date_submitted) FROM mark_unbounce ub WHERE date(date_submitted) = dt_date AND ub.site_id = dt.site_id) as ub,
(SELECT count(timestamp) FROM mark_wordpress_contact wp WHERE date(timestamp) = dt_date AND wp.site_id = dt.site_id) as wp,
(SELECT count(added_on) FROM m_shrednations sn WHERE date(added_on) = dt_date AND sn.description = dt.site_id) as sn,
(SELECT sum(users) FROM mark_ga ga WHERE date(ga.date) = dt_date AND channel LIKE 'Organic%' AND ga.site_id = dt.site_id) as ga_organic
FROM mark_dialogtech dt
WHERE site_id is not null
GROUP BY site_name, dt_date
ORDER BY site_name, dt_date;
What you're doing is the equivalent of asking your server to query 7+ different tables for every row this query returns, because each correlated subquery is conceptually re-evaluated per output row. Personally, I use joins and nested queries because they let me whittle the data down to what I need.
The first 3 subqueries can be replaced with...
SELECT date(dt.date_added) as dt_date,
dt.site_id as dt_site_id,
bm.site_id as site_id,
bm.parent_id as main_site_id,
bm.corp_owned as corp_owned
FROM mark_dialogtech dt
INNER JOIN branch_mappings bm
ON bm.mark_id_site = dt.site_id
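Here is a minimal sketch comparing the two forms, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the rows are made up. One difference worth knowing: when a site has no row in branch_mappings, the correlated subqueries return NULL for that site, while INNER JOIN drops the site entirely. LEFT JOIN keeps it and reproduces the subquery output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mark_dialogtech (id INTEGER PRIMARY KEY, site_id INTEGER, date_added TEXT);
    CREATE TABLE branch_mappings (mark_id_site INTEGER, site_id INTEGER,
                                  parent_id INTEGER, corp_owned INTEGER);
    -- Hypothetical sample rows: site 2 has no mapping.
    INSERT INTO mark_dialogtech (site_id, date_added) VALUES
        (1, '2021-03-01 09:00'), (1, '2021-03-01 10:30'), (2, '2021-03-01 11:00');
    INSERT INTO branch_mappings VALUES (1, 101, 100, 1);
""")

# Original shape: one correlated subquery per looked-up column.
subquery_sql = """
    SELECT date(dt.date_added) AS dt_date,
           dt.site_id AS dt_site_id,
           (SELECT bm.site_id FROM branch_mappings bm WHERE bm.mark_id_site = dt.site_id) AS site_id,
           (SELECT bm.parent_id FROM branch_mappings bm WHERE bm.mark_id_site = dt.site_id) AS main_site_id,
           (SELECT bm.corp_owned FROM branch_mappings bm WHERE bm.mark_id_site = dt.site_id) AS corp_owned,
           count(dt.id) AS dt_calls
    FROM mark_dialogtech dt
    GROUP BY dt.site_id, date(dt.date_added)
    ORDER BY dt.site_id, date(dt.date_added)
"""

# Join shape: branch_mappings is read once and matched on mark_id_site.
join_sql = """
    SELECT date(dt.date_added) AS dt_date,
           dt.site_id AS dt_site_id,
           bm.site_id AS site_id,
           bm.parent_id AS main_site_id,
           bm.corp_owned AS corp_owned,
           count(dt.id) AS dt_calls
    FROM mark_dialogtech dt
    {join} branch_mappings bm ON bm.mark_id_site = dt.site_id
    GROUP BY dt.site_id, date(dt.date_added), bm.site_id, bm.parent_id, bm.corp_owned
    ORDER BY dt.site_id, date(dt.date_added)
"""

rows_sub = conn.execute(subquery_sql).fetchall()
rows_inner = conn.execute(join_sql.format(join="INNER JOIN")).fetchall()
rows_left = conn.execute(join_sql.format(join="LEFT JOIN")).fetchall()
print(rows_sub)
print(rows_inner)  # site 2 is missing here
print(rows_left)   # same as rows_sub
```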
I'm not sure why you are running the remaining four (ub, wp, sn, ga_organic). Is there a business requirement? If so, consider how often this needs to run and when.
If they are absolutely necessary, add them as joins against pre-aggregated derived tables, like...
FROM mark_dialogtech dt
INNER JOIN
(SELECT site_id, date(date_submitted) AS ub_date, count(date_submitted) AS ub
FROM mark_unbounce
GROUP BY site_id, date(date_submitted)) ub
ON ub.site_id = dt.site_id
AND ub.ub_date = date(dt.date_added)
This limits the results to records that have a match in both mark_dialogtech and mark_unbounce (or whichever table). If unmatched rows should show a count of 0 instead of disappearing, use LEFT JOIN and wrap the count in COALESCE(..., 0). In my experience, this method has sped things up.
Still, my concern is the number of aggregations you're performing. If the results can be precomputed during slow periods and cached for a dashboard, that would be best.
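A minimal sketch of the derived-table approach, again with sqlite3 standing in for MySQL (this particular query is written the same way in both) and made-up rows. Each source table is aggregated by site_id and calendar date before the join, so every join is one-to-one per (site, day) and rows are not multiplied. Joining on both site_id and date matches what the original correlated subqueries did; LEFT JOIN plus COALESCE keeps days with no matches as 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mark_dialogtech (id INTEGER PRIMARY KEY, site_id INTEGER, date_added TEXT);
    CREATE TABLE mark_unbounce (site_id INTEGER, date_submitted TEXT);
    CREATE TABLE mark_wordpress_contact (site_id INTEGER, timestamp TEXT);
    -- Hypothetical sample rows: two sites, two days.
    INSERT INTO mark_dialogtech (site_id, date_added) VALUES
        (1, '2021-03-01 09:00'), (1, '2021-03-01 10:00'),
        (1, '2021-03-02 09:00'), (2, '2021-03-01 12:00');
    INSERT INTO mark_unbounce VALUES
        (1, '2021-03-01 08:00'), (1, '2021-03-01 13:00'), (2, '2021-03-01 14:00');
    INSERT INTO mark_wordpress_contact VALUES (1, '2021-03-02 16:00');
""")

rollup_sql = """
    SELECT d.dt_date, d.site_id, d.dt_calls,
           COALESCE(ub.ub, 0) AS ub,
           COALESCE(wp.wp, 0) AS wp
    FROM (SELECT date(date_added) AS dt_date, site_id, count(*) AS dt_calls
          FROM mark_dialogtech WHERE site_id IS NOT NULL
          GROUP BY site_id, date(date_added)) d
    LEFT JOIN (SELECT date(date_submitted) AS ub_date, site_id, count(*) AS ub
               FROM mark_unbounce GROUP BY site_id, date(date_submitted)) ub
           ON ub.site_id = d.site_id AND ub.ub_date = d.dt_date
    LEFT JOIN (SELECT date(timestamp) AS wp_date, site_id, count(*) AS wp
               FROM mark_wordpress_contact GROUP BY site_id, date(timestamp)) wp
           ON wp.site_id = d.site_id AND wp.wp_date = d.dt_date
    ORDER BY d.site_id, d.dt_date
"""
rows = conn.execute(rollup_sql).fetchall()
for row in rows:
    print(row)  # (date, site_id, dt_calls, ub, wp)
```

The same pattern extends to m_shrednations and mark_ga: one more derived table per source, each grouped by site and day.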
It's hard to judge how heavy your query is without sample data, but in your case I highly recommend using CTEs (Common Table Expressions). See:
https://www.sqlpedia.pl/cte-common-table-expressions/
CTEs do not have a physical representation in tempdb the way temporary tables or table variables do. A CTE can be thought of as a temporary, non-materialized view. When MSSQL executes a query and encounters a CTE, it replaces each reference to that CTE with its definition. Therefore, if the CTE is referenced several times in a query, the same code is executed several times, and MSSQL does not optimize that away. So it works well only for small amounts of data, like in your case.
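For reference, a minimal sketch of the same rollup written with CTEs: one named, pre-aggregated block per source table, then a join. It runs on sqlite3 (which supports WITH); MySQL supports WITH starting with version 8.0. The rows are made up, and only the mark_dialogtech and mark_ga pieces are shown. Because each CTE is referenced only once here, the repeated-execution concern above does not come into play.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mark_dialogtech (id INTEGER PRIMARY KEY, site_id INTEGER, date_added TEXT);
    CREATE TABLE mark_ga (site_id INTEGER, date TEXT, channel TEXT, users INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO mark_dialogtech (site_id, date_added) VALUES
        (1, '2021-03-01 09:00'), (1, '2021-03-01 10:00'), (2, '2021-03-01 11:00');
    INSERT INTO mark_ga VALUES
        (1, '2021-03-01', 'Organic Search', 40), (1, '2021-03-01', 'Paid Search', 15),
        (1, '2021-03-01', 'Organic Social', 5), (2, '2021-03-02', 'Organic Search', 7);
""")

cte_sql = """
    WITH dt AS (
        SELECT site_id, date(date_added) AS day, count(*) AS dt_calls
        FROM mark_dialogtech WHERE site_id IS NOT NULL
        GROUP BY site_id, date(date_added)
    ),
    ga AS (
        SELECT site_id, date(date) AS day, sum(users) AS ga_organic
        FROM mark_ga WHERE channel LIKE 'Organic%'
        GROUP BY site_id, date(date)
    )
    SELECT dt.day, dt.site_id, dt.dt_calls, COALESCE(ga.ga_organic, 0) AS ga_organic
    FROM dt LEFT JOIN ga ON ga.site_id = dt.site_id AND ga.day = dt.day
    ORDER BY dt.site_id, dt.day
"""
rows = conn.execute(cte_sql).fetchall()
print(rows)  # site 2's organic users fall on a different day, so they show as 0
```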
Appreciate all the responses.
I ended up writing a Python script that runs the queries separately and inserts the results into the table for the respective KPI. So I scrapped the idea of a single query for performance reasons. I concatenated each date and site_id to create the id, then used ON DUPLICATE KEY UPDATE with each INSERT statement.
The Python dictionaries look like this, and I simply loop over them. Again, thanks for the help.
SELECT STATEMENTS (Python Dict)
select_statements = {
"dt":"SELECT date(date_added) as dt_date, site_id as dt_site, count(site_id) as dt_count FROM mark_dialogtech WHERE site_id is not null GROUP BY dt_date, dt_site ORDER BY dt_date, dt_site;",
"ub":"SELECT date(date_submitted) as ub_date, site_id as ub_site, count(site_id) as ub_count FROM mark_unbounce WHERE site_id is not null GROUP BY ub_date, ub_site;",
"wp":"SELECT date(timestamp) as wp_date, site_id as wp_site, count(site_id) as wp_count FROM mark_wordpress_contact WHERE site_id is not null GROUP BY wp_date, wp_site;",
"sn":"SELECT date(added_on) as sn_date, description as sn_site, count(description) as sn_count FROM m_shrednations WHERE description <> '' GROUP BY sn_date, sn_site;",
"ga":"SELECT date(date) as ga_date, site_id as ga_site, sum(users) as ga_count FROM mark_ga WHERE users is not null AND channel LIKE 'Organic%' GROUP BY ga_date, ga_site;"
}
INSERT STATEMENTS (Python Dict)
# Rebuilt for each result row inside the loop, since the f-strings read dbdata = (id, on_date, site_id, count, added_on)
insert_statements = {
"dt":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, dt_calls, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE dt_calls={dbdata[3]}, added_on='{dbdata[4]}';",
"ub":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ub, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ub={dbdata[3]}, added_on='{dbdata[4]}';",
"wp":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, wp, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE wp={dbdata[3]}, added_on='{dbdata[4]}';",
"sn":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, sn, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE sn={dbdata[3]}, added_on='{dbdata[4]}';",
"ga":f"INSERT INTO mark_helper_rollup (id, on_date, site_id, ga_organic, added_on) VALUES ('{dbdata[0]}','{dbdata[1]}',{dbdata[2]},{dbdata[3]},'{dbdata[4]}') ON DUPLICATE KEY UPDATE ga_organic={dbdata[3]}, added_on='{dbdata[4]}';",
}
It is very difficult to analyze the query without the data. Anyway:
try joining the tables and grouping the result; that should improve performance.
Here is a LEFT JOIN sample:
SELECT column names
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
See this for more detailed information: https://learnsql.com/blog/how-to-left-join-multiple-tables/
Scenario is like this:
A table listing the different types of insurance: [ID, name, desc]
Each insurance type has its own table: [ID, user_id, ]
I want a query that shows the insurance columns [ID, Name, DESC] plus a new column showing whether this user has applied for each insurance. No need to worry about the user part.
Just guide me on how to write a subquery with a dynamic table name.
I tried making a setup table that maps each insurance to its table name, but I don't know how to use it in my subquery.
If the user has applied, show 1; otherwise 0.
SELECT i1.ID,
i1.name,
i1.desc,
IF(true, 1, 0) AS EXIST
#(SELECT t1.c_user_id FROM #tbl_name t1 WHERE t1.id = '101')
FROM app_fd_pdrm_insur_type i1;
Please tell me what to replace true with.
Thank you.
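Try this (tbl_name stands for the actual table name: a table name cannot be supplied as a variable inside a plain SQL statement, and a leading # would start a MySQL comment):
SELECT i1.ID, i1.name, i1.desc, IF(EXISTS(SELECT t1.c_user_id FROM tbl_name t1 WHERE t1.id = '101'), 1, 0) AS user_exists FROM app_fd_pdrm_insur_type i1;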
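Since the table name has to end up in the SQL text itself, one option is to build the query in application code from the setup table. Here is a minimal sketch with sqlite3; the setup table insur_table_map, the per-insurance tables insur_car and insur_home, and matching on c_user_id are all hypothetical. Table names are checked against the tables that actually exist before being spliced in, and the user id is still a bound parameter. In MySQL you would check information_schema.tables instead and quote names with backticks, or build the string inside MySQL and run it with PREPARE/EXECUTE. DESC is a keyword in SQLite, hence the quoted "desc".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_fd_pdrm_insur_type (ID INTEGER PRIMARY KEY, name TEXT, "desc" TEXT);
    -- Hypothetical setup table: insurance type -> name of its per-type table.
    CREATE TABLE insur_table_map (insur_id INTEGER, tbl_name TEXT);
    CREATE TABLE insur_car (ID INTEGER, c_user_id TEXT);
    CREATE TABLE insur_home (ID INTEGER, c_user_id TEXT);
    INSERT INTO app_fd_pdrm_insur_type VALUES (1, 'Car', 'Motor cover'), (2, 'Home', 'Property cover');
    INSERT INTO insur_table_map VALUES (1, 'insur_car'), (2, 'insur_home');
    INSERT INTO insur_car VALUES (1, '101');  -- user 101 applied for car insurance only
""")

def applied_flags(conn, user_id):
    # Whitelist: only table names that really exist are spliced into the SQL.
    existing = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")}
    parts, params = [], []
    for insur_id, tbl in conn.execute("SELECT insur_id, tbl_name FROM insur_table_map ORDER BY insur_id"):
        if tbl not in existing:
            raise ValueError(f"unknown table {tbl!r}")
        parts.append(f'SELECT {int(insur_id)} AS insur_id FROM "{tbl}" WHERE c_user_id = ?')
        params.append(user_id)
    applied = " UNION ALL ".join(parts)  # ids of insurances this user appears in
    sql = f"""
        SELECT i.ID, i.name, i."desc",
               CASE WHEN i.ID IN ({applied}) THEN 1 ELSE 0 END AS user_exists
        FROM app_fd_pdrm_insur_type i
        ORDER BY i.ID
    """
    return conn.execute(sql, params).fetchall()

flags = applied_flags(conn, "101")
print(flags)
```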
Hi, I am making a web browser game and I am trying to get monsters into my database, but I get the error:
Subquery returns more than 1 row
Here is my code:
INSERT INTO monster_stats(monster_id,stat_id,value)
VALUES
( (SELECT id FROM monsters WHERE name = 'Necroborg!'),
(SELECT id FROM stats WHERE short_name = 'atk'),
2);
Any ideas how to fix this problem?
Try using LIMIT 1:
INSERT INTO monster_stats(monster_id,stat_id,value) VALUES ((SELECT id FROM monsters WHERE name = 'Necroborg!' LIMIT 1),(SELECT id FROM stats WHERE short_name = 'atk' LIMIT 1),2);
Or you could use INSERT ... SELECT with a join, if the two tables are related:
INSERT INTO monster_stats(monster_id,stat_id,value)
(SELECT monsters.id, stats.id, 2 as value FROM monsters
LEFT JOIN stats on monsters.id = stats.monsters_id
WHERE monsters.name = 'Necroborg!'
AND stats.short_name = 'atk'
)
MySQL INSERT ... SELECT documentation:
http://dev.mysql.com/doc/refman/5.1/en/insert-select.html
The problem is one or both of the following:
There is more than one monster named 'Necroborg!'.
There is more than one stat named 'atk'.
You need to decide what you want to do. One option (mentioned elsewhere) is to use LIMIT 1 to get only one value from each subquery.
A second option is to specify the WHERE clause more precisely so you get only one row from each table.
Another is to insert all combinations. You would do this with INSERT ... SELECT and a CROSS JOIN:
INSERT INTO monster_stats(monster_id, stat_id, value)
SELECT m.id, s.id, 2
FROM (SELECT id FROM monsters WHERE name = 'Necroborg!') m CROSS JOIN
(SELECT id FROM stats WHERE short_name = 'atk') s;
A third possibility is that there is a field connecting the two tables, such as monster_id. But, based on the names of the tables, I don't think that is true.
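A minimal sketch of checking for the duplicate and then inserting all combinations, using sqlite3 with made-up rows. Note that SQLite does not raise the "more than 1 row" error (a scalar subquery there silently uses the first row), so this only demonstrates the diagnosis and the CROSS JOIN fix, not the MySQL error itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monsters (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, short_name TEXT);
    CREATE TABLE monster_stats (monster_id INTEGER, stat_id INTEGER, value INTEGER);
    -- Hypothetical data: the duplicated name is what makes MySQL's scalar subquery fail.
    INSERT INTO monsters (name) VALUES ('Necroborg!'), ('Necroborg!'), ('Slime');
    INSERT INTO stats (short_name) VALUES ('atk'), ('def');
""")

# Step 1: find out which lookup returns more than one row.
dupes = conn.execute(
    "SELECT name, count(*) FROM monsters WHERE name = 'Necroborg!' GROUP BY name"
).fetchall()
print(dupes)

# Step 2: insert one row per (monster, stat) combination.
conn.execute("""
    INSERT INTO monster_stats (monster_id, stat_id, value)
    SELECT m.id, s.id, 2
    FROM (SELECT id FROM monsters WHERE name = 'Necroborg!') m
    CROSS JOIN (SELECT id FROM stats WHERE short_name = 'atk') s
""")
rows = conn.execute("SELECT * FROM monster_stats ORDER BY monster_id").fetchall()
print(rows)  # both Necroborg! rows get an atk stat
```

If only one monster should carry that name, a UNIQUE constraint on monsters.name would stop the duplicate from being created in the first place.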
I have a table like this:
ID Severity WorkItemSK
23636 3-Minor 695119
23636 3-Minor 697309
23647 2-Major 695081
23647 2-Major 694967
Here, several WorkItems share the same ID. How can I get each unique ID with its highest WorkItemSK?
So the result would look like this:
ID Severity WorkItemSK
23636 3-Minor 697309
23647 2-Major 695081
Help the noob :) Could you give me a clue which SQL commands I should use (again, I am a noob)? Or an example query?
Thank you in advance!
Assuming that Severity can change depending on the WorkItemSK, you'll want to use the following query:
Select T.ID, T.Severity, T.WorkItemSK
From Table T
Join
(
Select ID, Max(WorkItemSK) As WorkItemSK
From Table
Group By ID
) D On T.WorkItemSK = D.WorkItemSK And T.ID = D.ID
The last Join condition of T.ID = D.ID may or may not be needed, depending on whether WorkItemSK can appear multiple times in your table.
Otherwise, you can just use this:
Select ID, Severity, Max(WorkItemSK) As WorkItemSK
From Table
Group by ID, Severity
But if you have different Severity values per ID, you'll see duplicate IDs.
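A quick check of the join-to-max query against the sample rows from the question, using sqlite3. "Table" in the answer is a placeholder (TABLE is a reserved word), so the sketch uses the hypothetical name work_items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_items (ID INTEGER, Severity TEXT, WorkItemSK INTEGER)")
conn.executemany(
    "INSERT INTO work_items VALUES (?, ?, ?)",
    [(23636, "3-Minor", 695119), (23636, "3-Minor", 697309),
     (23647, "2-Major", 695081), (23647, "2-Major", 694967)],
)

# Derived table D holds the highest WorkItemSK per ID; joining back recovers Severity.
rows = conn.execute("""
    SELECT T.ID, T.Severity, T.WorkItemSK
    FROM work_items T
    JOIN (SELECT ID, MAX(WorkItemSK) AS WorkItemSK
          FROM work_items
          GROUP BY ID) D
      ON T.WorkItemSK = D.WorkItemSK AND T.ID = D.ID
    ORDER BY T.ID
""").fetchall()
print(rows)  # matches the expected output in the question
```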
Use select with GROUP BY: SELECT id,MAX(WorkItemSK) FROM table GROUP BY id;
Relevant schema:
courier(courid:int,courname:str)
city(cid:int,cname:str,zid:int) #zid belonging to some table named zone
courierservice(courid:int,cid:int)
So the obvious relationship is couriers 'serving' cities. I've been trying to get all couriers serving both of the cities with cname A and B,
i.e. an intersection.
I could get a workaround using:
select courname from courier where courid in (select courid from courierservice
where cid=(select cid from city where cname='A')) and
courid in (select courid from courierservice where cid=(select cid from city where cname='B'));
But this looks a little heavy.
According to the documentation, it should work with the following ALL subquery:
select * from courier where courid in (select courid from courierservice
where cid = all (select cid from city where cname='A' or cname='B'));
But it returns an empty set.
Is there something missing?
It's meaningless to use = ALL: how can a single courierservice.cid simultaneously be equal to more than one city.cid? ALL is only ever used with a comparison operator that can match more than one value, such as >= or <>: see Subqueries with ALL.
However, you would do well to rewrite your query using JOIN (to be both more concise and more performant):
SELECT courname FROM courier NATURAL JOIN (
SELECT courid
FROM courierservice JOIN city USING (cid)
WHERE cname IN ('A', 'B')
GROUP BY courid
HAVING SUM(cname='A') AND SUM(cname='B')
) t
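A runnable check of the GROUP BY/HAVING query with sqlite3 and made-up couriers. The comparisons cname = 'A' and cname = 'B' evaluate to 1 or 0 in both MySQL and SQLite, so each SUM counts matching rows, and HAVING keeps only couriers with at least one row for each city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courier (courid INTEGER PRIMARY KEY, courname TEXT);
    CREATE TABLE city (cid INTEGER PRIMARY KEY, cname TEXT, zid INTEGER);
    CREATE TABLE courierservice (courid INTEGER, cid INTEGER);
    -- Hypothetical data: only FastShip serves both A and B.
    INSERT INTO courier VALUES (1, 'FastShip'), (2, 'SlowPost'), (3, 'CityOnly');
    INSERT INTO city VALUES (10, 'A', 1), (20, 'B', 1), (30, 'C', 2);
    INSERT INTO courierservice VALUES (1, 10), (1, 20), (2, 10), (2, 30), (3, 20);
""")

both = conn.execute("""
    SELECT courname FROM courier NATURAL JOIN (
        SELECT courid
        FROM courierservice JOIN city USING (cid)
        WHERE cname IN ('A', 'B')
        GROUP BY courid
        HAVING SUM(cname = 'A') AND SUM(cname = 'B')
    ) t
""").fetchall()
print(both)
```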
Try replacing cid = ALL with cid IN and it will no longer return an empty set. Note, though, that IN matches couriers serving either city, not necessarily both.
= ALL (subquery) is true only if the value equals every value the subquery returns, which can never happen when the subquery returns two different cids. If you want to know whether the value is in the set, use IN instead of = ALL.
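To see the difference between the IN rewrite and the "both cities" requirement, here is a sketch with sqlite3 and made-up data. SQLite does not support ALL subquery comparisons, so the = ALL version itself is not run; the sketch contrasts the IN version with the question's original two-IN workaround.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courier (courid INTEGER PRIMARY KEY, courname TEXT);
    CREATE TABLE city (cid INTEGER PRIMARY KEY, cname TEXT, zid INTEGER);
    CREATE TABLE courierservice (courid INTEGER, cid INTEGER);
    -- Hypothetical data: only FastShip serves both A and B.
    INSERT INTO courier VALUES (1, 'FastShip'), (2, 'SlowPost'), (3, 'CityOnly');
    INSERT INTO city VALUES (10, 'A', 1), (20, 'B', 1), (30, 'C', 2);
    INSERT INTO courierservice VALUES (1, 10), (1, 20), (2, 10), (2, 30), (3, 20);
""")

# cid IN (...): a single service row for city A *or* city B is enough.
either = conn.execute("""
    SELECT courname FROM courier
    WHERE courid IN (SELECT courid FROM courierservice
                     WHERE cid IN (SELECT cid FROM city WHERE cname = 'A' OR cname = 'B'))
    ORDER BY courid
""").fetchall()

# Both cities: one membership test per city, as in the question's workaround.
both = conn.execute("""
    SELECT courname FROM courier
    WHERE courid IN (SELECT courid FROM courierservice
                     WHERE cid = (SELECT cid FROM city WHERE cname = 'A'))
      AND courid IN (SELECT courid FROM courierservice
                     WHERE cid = (SELECT cid FROM city WHERE cname = 'B'))
    ORDER BY courid
""").fetchall()
print(either)
print(both)
```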