Union All Query takes too long - mysql

This question has been asked multiple times, I am sure, but every case is different.
I have MySQL setup on a strong computer with 2GB RAM, it does not do too much so the computer is sufficient.
The following query has been built as a view :
create view view_orders as
select distinct
tbl_orders_order.order_date AS sort_col,
tbl_orders_order.order_id AS order_id,
_utf8'website' AS src,tbl_order_users.company AS company,
tbl_order_users.phone AS phone,
tbl_order_users.full_name AS full_name,
time_format(tbl_orders_order.order_date,_utf8'%H:%i') AS c_time,
date_format(tbl_orders_order.order_date,_utf8'%d/%m/%Y') AS c_date,
tbl_orders_order.comments AS comments,
tbl_orders_order.tmp_cname AS tmp_cname,
tbl_orders_order.tmp_pname AS tmp_pname,
count(tbl_order_docfiles.docfile_id) AS number_of_files,
(case tbl_orders_order.status when 1 then _utf8'completed' when 2 then _utf8'hc' when 0 then _utf8'not-completed' when 3 then _utf8'hc-canceled' end) AS status,
tbl_orders_order.employee_name AS employee_name,
tbl_orders_order.status_date AS status_date,
tbl_orders_order.cancel_reason AS cancel_reason
from
tbl_orders_order left join tbl_order_users on tbl_orders_order.user_id = tbl_order_users.user_id
left join
tbl_order_docfiles on tbl_order_docfiles.order_id = tbl_orders_order.order_id
group by
tbl_orders_order.order_id
union all
select distinct tbl_h.h_date AS sort_col,
(case tbl_h.sub_oid when 0 then tbl_h.order_number else concat(tbl_h.order_number,_utf8'-',tbl_h.sub_oid) end) AS order_id,
(case tbl_h.type when 1 then _utf8'פקס' when 2 then _utf8'email' end) AS src,_utf8'' AS company,
_utf8'' AS phone,_utf8'' AS full_name,time_format(tbl_h.h_date,_utf8'%H:%i') AS c_time,
date_format(tbl_h.h_date,_utf8'%d/%m/%Y') AS c_date,_utf8'' AS comments,tbl_h.client_name AS tmp_cname,
tbl_h.project_name AS tmp_pname,
tbl_h.quantity AS number_of_files,
_utf8'completed' AS status,
tbl_h.computer_name AS employee_name,
_utf8'' AS status_date,
_utf8'' AS cancel_reason
from tbl_h;
The query used UNION; then I read an article about UNION ALL, and now it uses that.
The query alone takes about 3 seconds to execute (UNION took 4.5-5.5 seconds).
Each part run separately runs in seconds.
The application sorts and selects on this view, which makes its processing time even longer: about 6 seconds when the query is cached, about 12 seconds or more if the data has changed.
I see no other way to combine these two results, as both need to be displayed to the user sorted together, and I guess I am doing something wrong.
Of course, both tables use primary keys.
UPDATE!!!!
It didn't help, I got the utf8/case/date_format out of the union query, and removed distincts, now query takes 4 seconds (even longer).
query without case/date/utf8 (only union) was shortened to 2.3 seconds (0.3 seconds improvement).
create view view_orders as
select *,
(CASE src
WHEN 1 THEN
_utf8'fax'
WHEN 2 THEN
_utf8'mail'
WHEN 3 THEN
_utf8'website'
END) AS src,
time_format(order_date,'%H:%i') AS c_time,
date_format(order_date,'%d/%m/%Y') AS c_date,
(CASE status
WHEN 1 THEN
_utf8'completed'
WHEN 2 THEN
_utf8'hc handling'
WHEN 0 THEN
_utf8'not completed'
WHEN 3 THEN
_utf8'canceled'
END) AS status
FROM
(
select
o.order_date AS sort_col,
o.order_id,
3 AS src,
u.company,
u.phone,
u.full_name,
o.order_date,
o.comments,
o.tmp_cname,
o.tmp_pname,
count(doc.docfile_id) AS number_of_files,
o.status,
o.employee_name,
o.status_date,
o.cancel_reason
from
tbl_orders_order o
LEFT JOIN
tbl_order_users u ON u.user_id = o.user_id
LEFT JOIN
tbl_order_docfiles doc ON doc.order_id = o.order_id
GROUP BY
o.order_id
union all
select
h.h_date AS sort_col,
(case h.sub_oid when 0 then h.order_number else concat(h.order_number,'-',h.sub_oid) end) AS order_id,
h.type as src,
'' AS company,
'' AS phone,
'' AS full_name,
h.h_date,
'' AS comments,
h.client_name AS tmp_cname,
h.project_name AS tmp_pname,
h.quantity AS number_of_files,
1 AS status,
h.computer_name AS employee_name,
'' AS status_date,
'' AS cancel_reason
from tbl_h h
)

Think about your use of the UNION and DISTINCT keywords. Can your query really result in duplicate rows? If yes, the optimal query for removing duplicates would probably be of this form:
SELECT ... -- No "DISTINCT" here
UNION
SELECT ... -- No "DISTINCT" here
There is probably no need for DISTINCT in the two subqueries. If duplicates are impossible anyway, try using this form instead. This will be the fastest execution of your query (without further optimising the subqueries):
SELECT ... -- No "DISTINCT" here
UNION ALL
SELECT ... -- No "DISTINCT" here
Rationale: Both UNION and DISTINCT apply a "UNIQUE SORT" operation on your intermediate result sets. Depending on how much data your subqueries return, this can be very expensive. That's one reason why omitting DISTINCT and replacing UNION by UNION ALL is much faster.
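To make the difference concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, and the tables web_orders and fax_orders are made up for the illustration. It only shows the semantics (UNION removes duplicates, UNION ALL keeps every row), not the timing.

```python
import sqlite3

def union_row_counts():
    """Return (rows from UNION, rows from UNION ALL) for two small hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE web_orders (order_id INTEGER, src TEXT);
        CREATE TABLE fax_orders (order_id INTEGER, src TEXT);
        INSERT INTO web_orders VALUES (1, 'website'), (2, 'website');
        -- (2, 'website') is deliberately duplicated across both tables
        INSERT INTO fax_orders VALUES (2, 'website'), (3, 'fax');
    """)
    # UNION has to compare rows to drop duplicates
    deduplicated = conn.execute("""
        SELECT order_id, src FROM web_orders
        UNION
        SELECT order_id, src FROM fax_orders
    """).fetchall()
    # UNION ALL simply appends the second result to the first
    everything = conn.execute("""
        SELECT order_id, src FROM web_orders
        UNION ALL
        SELECT order_id, src FROM fax_orders
    """).fetchall()
    conn.close()
    return len(deduplicated), len(everything)

print(union_row_counts())
```

If your two subqueries can never produce the same row, both forms return the same data and UNION ALL just skips the comparison work.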
UPDATE Another idea, if you do have to remove duplicates: Remove duplicates first in an inner query, and format dates and codes only afterwards in an outer query. That will accelerate the "UNIQUE SORT" operation because comparing 32/64-bit integers is less expensive than comparing varchars:
SELECT a, b, date_format(c), case d when 1 then 'completed' else '...' end
FROM (
SELECT a, b, c, d ... -- No date format here
UNION
SELECT a, b, c, d ... -- No date format here
) AS t -- MySQL requires an alias on a derived table

It may be related to the UNION triggering a character set conversion. For example cancel_reason in the one query is defined as utf8, but in the other it is not specified.
Check if there is a very high cpu spike when you run this query, this would indicate conversion.
Personally I would have done a union of the raw data first, and then applied the case and conversion statements. But I am not sure that that would make a difference in the performance.

Can you try this one:
SELECT
o.order_date AS sort_col,
o.order_id AS order_id,
_utf8'website' AS src,
u.company AS company,
u.phone AS phone,
u.full_name AS full_name,
time_format(o.order_date,_utf8'%H:%i') AS c_time,
date_format(o.order_date,_utf8'%d/%m/%Y') AS c_date,
o.comments AS comments,
o.tmp_cname AS tmp_cname,
o.tmp_pname AS tmp_pname,
COALESCE(d.number_of_files, 0) AS number_of_files,
( CASE o.status WHEN 1 THEN _utf8'completed'
WHEN 2 THEN _utf8'hc'
WHEN 0 THEN _utf8'not-completed'
WHEN 3 THEN _utf8'hc-canceled'
END ) AS status,
o.employee_name AS employee_name,
o.status_date AS status_date,
o.cancel_reason AS cancel_reason
FROM
tbl_orders_order AS o
LEFT JOIN
tbl_order_users AS u
ON o.user_id = u.user_id
LEFT JOIN
( SELECT order_id
, COUNT(*) AS number_of_files
FROM tbl_order_docfiles
GROUP BY order_id
) AS d
ON d.order_id = o.order_id
UNION ALL
SELECT
tbl_h.h_date AS sort_col,
...
FROM tbl_h
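The idea in the query above is to count the files per order in a derived table first and only then join, so the outer query needs no GROUP BY. Below is a small sketch of that pattern, assuming simplified stand-in tables (orders, docfiles) and using Python's sqlite3 module rather than MySQL; whether it is actually faster on your data is something only EXPLAIN and a timing test can tell.

```python
import sqlite3

def file_counts_per_order():
    """Count doc files per order via a pre-aggregated derived table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, comments TEXT);
        CREATE TABLE docfiles (docfile_id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO orders VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO docfiles VALUES (10, 1), (11, 1), (12, 3);
    """)
    rows = conn.execute("""
        SELECT o.order_id,
               COALESCE(d.number_of_files, 0) AS number_of_files
        FROM orders AS o
        LEFT JOIN (
            -- aggregate once, before the join
            SELECT order_id, COUNT(*) AS number_of_files
            FROM docfiles
            GROUP BY order_id
        ) AS d ON d.order_id = o.order_id
        ORDER BY o.order_id
    """).fetchall()
    conn.close()
    return rows

print(file_counts_per_order())
```

Order 2 has no files, so the LEFT JOIN yields NULL for it and COALESCE turns that into 0.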

Related

How to count the number of rows with a join in Talend on Oracle

I have 3 tables:
supplier(id_supp, name, adress, ...)
Customer(id_cust, name, adress, ...)
Order(id_order, ref_cust, ref_supp, date_order...)
I want to make a job that counts the number of orders by Supplier, for last_week, last_two_weeks with Talend
select
supp.name,
(
select
count(*)
from
order
where
date_order between sysdate-7 and sysdate
and ref_supp=id_supp
) as week_1,
(
select
count(*)
from
order
where
date_order between sysdate-14 and sysdate-7
and ref_supp=id_supp
) as week_2
from supplier supp
The reason I'm doing this is that my query took too much time.
You need a join between supplier and order to get supplier names. I show an inner join, but if you need ALL suppliers (even those with no orders in the order table) you may change it to a left outer join.
Other than that, you should only have to read the order table once and get all the info you need. Your query does more than one pass (read EXPLAIN PLAN for your query), which may be why it is taking too long.
NOTE: sysdate has a time-of-day component (and perhaps the date_order value does too); the way you wrote the query may or may not do exactly what you want it to do. You may have to surround sysdate by trunc().
select s.name,
count(case when o.date_order between sysdate - 7 and sysdate then 1 end)
as week_1,
count(case when o.date_order between sysdate - 14 and sysdate - 7 then 1 end)
as week_2
from supplier s inner join order o
on s.id_supp = o.ref_supp
group by s.name -- needed so the counts are computed per supplier
;
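Here is a runnable sketch of the same single-pass idea (conditional counts with a GROUP BY on the supplier). It is only an illustration: it uses Python's sqlite3 module instead of Oracle, passes a fixed date in place of sysdate, and names the table orders because order is a reserved word. The sample rows are invented.

```python
import sqlite3

def weekly_order_counts(now="2024-01-10"):
    """Count orders per supplier for the last week and the week before, in one pass."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (id_supp INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id_order INTEGER PRIMARY KEY, ref_supp INTEGER, date_order TEXT);
        INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES
            (1, 1, '2024-01-09'), (2, 1, '2024-01-08'),
            (3, 1, '2024-01-01'), (4, 2, '2023-12-30');
    """)
    rows = conn.execute("""
        SELECT s.name,
               -- CASE yields NULL outside the range, and COUNT ignores NULLs
               COUNT(CASE WHEN o.date_order BETWEEN date(:now, '-7 days') AND :now
                          THEN 1 END) AS week_1,
               COUNT(CASE WHEN o.date_order BETWEEN date(:now, '-14 days') AND date(:now, '-7 days')
                          THEN 1 END) AS week_2
        FROM supplier AS s
        INNER JOIN orders AS o ON s.id_supp = o.ref_supp
        GROUP BY s.name
        ORDER BY s.name
    """, {"now": now}).fetchall()
    conn.close()
    return rows

print(weekly_order_counts())
```

Note that BETWEEN is inclusive at both ends, so an order dated exactly 7 days ago would be counted in both weeks; adjust one of the boundaries if that matters for your report.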

Adding blank rows to display of result set returned by MySQL query

I am storing hourly results in a MySQL database table which take the form:
ResultId,CreatedDateTime,Keyword,Frequency,PositiveResult,NegativeResult
349,2015-07-17 00:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 01:00:00,Homer Simpson,3.0,4.0,-2.0
349,2015-07-17 01:00:00,Homer Simpson,1.0,1.0,-1.0
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,8.0,3.0,-2.0
349,2015-07-17 05:00:00,Homer Simpson,1.0,0.0,0.0
Where there might be several results for a given hour, but none for certain hours.
If I want to produce averages of the hourly results, I can do something like this:
SELECT ItemCreatedDateTime AS 'Created on',
KeywordText AS 'Keyword', ROUND(AVG(KeywordFrequency), 2) AS 'Average frequency',
ROUND(AVG(PositiveResult), 2) AS 'Average positive result',
ROUND(AVG(NegativeResult), 2) AS 'Average negative result'
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime BETWEEN '2015-07-13 00:00:00' AND '2015-07-19 23:59:00'
GROUP BY KeywordText, CreatedDateTime
ORDER BY KeywordText, CreatedDateTime
However, the results only include the hours where data exists, e.g.:
349,2015-07-17 01:00:00,Homer Simpson,2.0,2.5,-1.5
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,4.5,1.5,-1.0
But I need to show blank rows for the missing hours, e.g.
349,2015-07-17 01:00:00,Homer Simpson,2.0,2.5,-1.5
349,2015-07-17 02:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 03:00:00,Homer Simpson,0.0,0.0,0.0
349,2015-07-17 04:00:00,Homer Simpson,1.0,1.0,0.0
349,2015-07-17 05:00:00,Homer Simpson,4.5,1.5,-1.0
Short of inserting blanks into the results before they are presented, I am uncertain of how to proceed: can I use MySQL to include the blank rows at all?
SQL in general has no knowledge about the data, so you have to add that yourself. In this case you will have to supply the unused hours somehow. This can be done by inserting empty rows or, a bit differently, by counting the hours and adjusting your average for that.
Counting the hours and adjusting the average:
Count all hours with data (A)
Calculate the number of hours in the period (B)
Calculate the avg as you already did, multiply by A divide by B
Example code to get the hours:
SELECT COUNT(*) AS number_of_records_with_data,
(TO_SECONDS('2015-07-19 23:59:00')-TO_SECONDS('2015-07-13 00:00:00'))/3600
AS number_of_hours_in_interval
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime
BETWEEN '2015-07-13 00:00:00' AND '2015-07-19 23:59:00'
GROUP BY KeywordText, CreatedDateTime;
And just integrate it with the rest of your query.
You can't use MySQL for that. You'll have to do this with whatever you're using later to process the results. Iterate over the range of hours/dates you're interested in, and for those where MySQL returned some data, use that data. For the rest, just add null/zero values.
Small update after some discussions with my stackoverflow colleagues:
Instead of "you can't" I should have written "you shouldn't": as other users have shown, there are ways to do this. But I still believe that for different tasks we should use tools that were created with such tasks in mind. And by that I mean that while it's probably possible to tow a car with an F-16, it's still better to just call a tow truck ;) That's what tow trucks are made for.
Although you already have accepted an answer I want to demonstrate how you can generate a datetime series in the query and use that to solve your problem.
This query uses a combination of cross joins together with basic arithmetic and date functions to generate a series of all hours between 2015-07-16 00:00:00 AND 2015-07-18 23:59:00.
Generating this type of data on the fly isn't the best option though; if you already had a table with the numbers 0-31 then all the union queries would be unnecessary.
See this SQL Fiddle to see how it could look using a small number table.
Sample SQL Fiddle with a demo of the query below
select
c.createddate as "Created on",
c.Keyword,
coalesce(ROUND(AVG(KeywordFrequency), 2),0.0) AS 'Average frequency',
coalesce(ROUND(AVG(PositiveResult), 2),0.0) AS 'Average positive result',
coalesce(ROUND(AVG(NegativeResult), 2),0.0) AS 'Average negative result'
from (
select
q.createddate + interval d day + interval t hour as createddate,
d.KeywordText AS 'Keyword'
from (
select distinct h10*10+h1 d from (
select 0 as h10
union all select 1 union all select 2 union all select 3
) d10 cross join (
select 0 as h1
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
) d1
) days cross join (
select distinct t10*10 + t1 t from (
select 0 as t10 union all select 1 union all select 2
) h10 cross join (
select 0 as t1
union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9
) h1
) hours
cross join
-- use the following line to set the start date for the series
(select '2015-07-16 00:00:00' createddate) q
-- or use the line below to use the dates in the table
-- (select distinct cast(CreatedDateTime as date) CreatedDate from results) q
cross join (select distinct KeywordText from results) d
) c
left join results r on r.CreatedDateTime = c.createddate AND ResultsNo = 349 and r.KeywordText = c.Keyword
where c.createddate BETWEEN '2015-07-16 00:00:00' AND '2015-07-18 23:59:00'
GROUP BY c.createddate, Keyword
ORDER BY c.createddate, Keyword;
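For comparison, this is roughly what the numbers-table variant mentioned above could look like. It is a sketch only: it runs on Python's sqlite3 module instead of MySQL (so the date arithmetic uses SQLite's datetime() rather than "+ interval"), the table and column names (results, numbers, created, frequency) are simplified stand-ins, and only the frequency column from the question's sample data is included.

```python
import sqlite3

def hourly_averages():
    """Average frequency per hour, with 0.0 for hours that have no rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE results (created TEXT, keyword TEXT, frequency REAL);
        INSERT INTO results VALUES
            ('2015-07-17 01:00:00', 'Homer Simpson', 3.0),
            ('2015-07-17 01:00:00', 'Homer Simpson', 1.0),
            ('2015-07-17 04:00:00', 'Homer Simpson', 1.0),
            ('2015-07-17 05:00:00', 'Homer Simpson', 8.0),
            ('2015-07-17 05:00:00', 'Homer Simpson', 1.0);
        CREATE TABLE numbers (n INTEGER PRIMARY KEY);
    """)
    # a permanent helper table holding 0..23, one row per hour of the day
    conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(24)])
    rows = conn.execute("""
        SELECT h.slot,
               COALESCE(ROUND(AVG(r.frequency), 2), 0.0) AS avg_frequency
        FROM (
            -- build one timestamp per hour from the numbers table
            SELECT datetime('2015-07-17 00:00:00', '+' || n || ' hours') AS slot
            FROM numbers
        ) AS h
        LEFT JOIN results AS r ON r.created = h.slot
        WHERE h.slot BETWEEN '2015-07-17 01:00:00' AND '2015-07-17 05:00:00'
        GROUP BY h.slot
        ORDER BY h.slot
    """).fetchall()
    conn.close()
    return rows

for row in hourly_averages():
    print(row)
```

The hours 02:00 and 03:00 have no matching rows, so the LEFT JOIN leaves their average NULL and COALESCE fills in 0.0, which is the "blank row" the question asks for.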
I came up with an idea for adding rows with null values at the end of your MySQL query's result.
Just run this query (in the limit add any number of empty rows you want), and ignore the last column:
SELECT ItemCreatedDateTime AS 'Created on',
KeywordText AS 'Keyword',
ROUND(AVG(KeywordFrequency), 2) AS 'Average frequency',
ROUND(AVG(PositiveResult), 2) AS 'Average positive result',
ROUND(AVG(NegativeResult), 2) AS 'Average negative result',
null
FROM Results
WHERE ResultsNo = 349 AND CreatedDateTime BETWEEN '2015-07-13 00:00:00' AND
'2015-07-19 23:59:00'
GROUP BY KeywordText, CreatedDateTime
UNION
SELECT * FROM (
SELECT null a, null b, null c, null d, null e,
(@cnt := @cnt + 1) f
FROM (SELECT null FROM Results LIMIT 23) empty1
LEFT JOIN (SELECT * FROM Results LIMIT 23) empty2 ON FALSE
JOIN (SELECT @cnt := 0) empty3
) empty
ORDER BY KeywordText, CreatedDateTime

mysql increment variable using case

I have two tables marks and exams.
In the marks table I have studentid, mark1, mark2 and examid-foreign key from exams for different exams.
I want to get distinct student id and their number of failures in one single query.
The condition for failure is mark1+mark2 <50 or mark1<30. For example, if the student with studentid 1 has 15 entries (15 exams) in the marks table and failed 6 of them, I want to get '1' and '6' in two columns, and similarly for all students. For this I wrote a query using CASE, given below:
select
distinct t1.studentid,
(@arrear:=
case
when (t1.mark1+t1.mark2) <50 OR t1.mark1 < 30
then @arrear+1 else @arrear
end) as failures
from marks t1, exams t2,
(select @arrear := 0) r
where t1.examid = t2.examid group by t1.studentid;
But the above query failed to give correct result. How can I modify the query to get correct result?
Try this. You don't need to use variables to help you.
select
m.studentid,
sum(case when m.mark1 + m.mark2 < 50 or m.mark1 < 30 then 1 else 0 end) as failures
from
marks m inner join exams e
on
m.examid = e.examid
group by
m.studentid
The CASE expression works out whether the result is a failure and returns 1 for a fail and 0 otherwise. Summing it (grouped by studentid) gives you the number of fails per studentid.
Oh, and the explicit INNER JOIN ... ON syntax keeps the join condition separate from any filters, which is easier to read than the comma-separated table list :)
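If you want to try the SUM(CASE ...) approach without touching your database, here is a small sketch on Python's sqlite3 module (not MySQL; the aggregate works the same way in both). The rows in marks and exams are invented test data.

```python
import sqlite3

def failures_per_student():
    """Count failed exams per student with SUM(CASE ...), no user variables needed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE exams (examid INTEGER PRIMARY KEY);
        CREATE TABLE marks (studentid INTEGER, examid INTEGER, mark1 INTEGER, mark2 INTEGER);
        INSERT INTO exams VALUES (1), (2), (3);
        INSERT INTO marks VALUES
            (1, 1, 20, 20),  -- fail: total 40 is below 50
            (1, 2, 40, 30),  -- pass
            (1, 3, 25, 40),  -- fail: mark1 is below 30
            (2, 1, 35, 20);  -- pass
    """)
    rows = conn.execute("""
        SELECT m.studentid,
               SUM(CASE WHEN m.mark1 + m.mark2 < 50 OR m.mark1 < 30
                        THEN 1 ELSE 0 END) AS failures
        FROM marks AS m
        INNER JOIN exams AS e ON m.examid = e.examid
        GROUP BY m.studentid
        ORDER BY m.studentid
    """).fetchall()
    conn.close()
    return rows

print(failures_per_student())
```

Student 2 still appears with 0 failures because the ELSE 0 branch contributes a row to the sum instead of filtering the student out.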
You don't need the variable @arrear. You can get your info using only a plain aggregate query.
Try this:
select
distinct t1.studentid,
sum(
case
when (t1.mark1+t1.mark2) <50 OR t1.mark1 < 30
then 1
else 0
end
) as failures
from marks t1, exams t2
where t1.examid = t2.examid group by t1.studentid;

One MySQL query to get AVG by different Groupings?

Wondering if there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.
Well, the following would do two out of three:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end) with rollup
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
No there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping) but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.
Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators, you just need a rowset that returns exactly 3 rows with distinct values. I just used the UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.
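Here is a runnable sketch of the "three copies of the rows" approach. It is an approximation of the query above rather than the exact MySQL statement: it runs on Python's sqlite3 module, so IF() is written as a CASE expression, and the table reports with its three rows is invented for the demo.

```python
import sqlite3

def averages_by_grouping():
    """Average req_secs per customer, per report name, and overall, in one query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reports (cust_id INTEGER, rpt_name TEXT, req_secs REAL);
        INSERT INTO reports VALUES (1, 'sales', 10), (1, 'stock', 20), (2, 'sales', 30);
    """)
    rows = conn.execute("""
        SELECT g.gkey,
               CASE g.gkey
                    WHEN 'cust_id'  THEN CAST(t.cust_id AS TEXT)
                    WHEN 'rpt_name' THEN t.rpt_name
                    ELSE ''
               END AS gval,
               AVG(t.req_secs) AS avg_req_secs
        FROM (SELECT 'cust_id' AS gkey
              UNION ALL SELECT 'rpt_name'
              UNION ALL SELECT 'total') AS g
        CROSS JOIN reports AS t   -- every row appears once per grouping key
        GROUP BY g.gkey, gval
        ORDER BY g.gkey, gval
    """).fetchall()
    conn.close()
    return rows

for row in averages_by_grouping():
    print(row)
```

Each copy of the table is collapsed by a different key: the 'cust_id' copy by customer, the 'rpt_name' copy by report name, and the 'total' copy into a single row because its gval is always the empty string.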

Average on a count() in same query

I'm currently working on an assignment which requires me to find the average number of resources for each module. The current table looks like this:
ResourceID ModulID
1 1
2 7
3 2
4 4
5 1
6 1
So basically, I'm trying to figure out how to get the average number of resources. The only
relevant test data here is for module 1, which has 3 different resources connected to it. But I need to display all of the results.
This is my code:
select avg(a.ress) GjSnitt, modulID
from
(select count(ressursID) as ress
from ressursertiloppgave
group by modulID) as a, ressursertiloppgave r
group by modulID;
Obviously it isn't working, but I'm currently at loss on what to change at this point. I would really appreciate any input you guys have.
This is the query you are executing, written in a slightly less obtuse syntax.
SELECT
avg(a.ress) as GjSnitt
, modulID
FROM
(SELECT COUNT(ressursID) as ress
FROM ressursertiloppgave
GROUP BY modulID) as a
CROSS JOIN ressursertiloppgave r -- cross joins are very, very rare!
GROUP BY modulID;
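You are cross joining the derived table (4 rows, one per module) with the base table (6 rows), making 4x6 = 24 rows in total, and then condensing this down to 4 groups. Because every group contains all four counts, each module is shown with the same average, so the outcome is wrong.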
This is why you should never use implicit joins.
Rewrite the query to:
SELECT AVG(a.rcount) FROM
(select count(*) as rcount
FROM ressursertiloppgave r
GROUP BY r.ModulID) a
If you want the individual rowcount and the average at the bottom do:
SELECT r1.ModulID, count(*) as rcount
FROM ressursertiloppgave r1
GROUP BY r1.ModulID
UNION ALL
SELECT 'avg = ', AVG(a.rcount) FROM
(select count(*) as rcount
FROM ressursertiloppgave r2
GROUP BY r2.ModulID) a
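To check the rewritten queries against the sample table from the question, here is a small sketch using Python's sqlite3 module (SQLite instead of MySQL, but these two statements are plain standard SQL). The column names follow the sample table header, which is an assumption since the original query uses ressursID.

```python
import sqlite3

def resource_stats():
    """Return (resource count per module, average of those counts)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ressursertiloppgave (ResourceID INTEGER PRIMARY KEY, ModulID INTEGER);
        INSERT INTO ressursertiloppgave VALUES (1, 1), (2, 7), (3, 2), (4, 4), (5, 1), (6, 1);
    """)
    # inner step on its own: one row per module with its resource count
    per_module = conn.execute("""
        SELECT r.ModulID, COUNT(*) AS rcount
        FROM ressursertiloppgave AS r
        GROUP BY r.ModulID
        ORDER BY r.ModulID
    """).fetchall()
    # outer step: average over the per-module counts
    average = conn.execute("""
        SELECT AVG(a.rcount)
        FROM (SELECT COUNT(*) AS rcount
              FROM ressursertiloppgave AS r
              GROUP BY r.ModulID) AS a
    """).fetchone()[0]
    conn.close()
    return per_module, average

print(resource_stats())
```

With the six sample rows, module 1 has 3 resources and modules 2, 4 and 7 have 1 each, so the average is 6 / 4 = 1.5.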
I got the solution
SELECT AVG(counter)
FROM
(
SELECT COUNT(column to count) AS counter FROM table GROUP BY column to group by
) AS counter
Note that the alias counter is used twice: once for the COUNT column and once for the derived table at the end of the inner SELECT.