How to convert sub query to joins?

I am using this query on my OpenCart site (MySQL):
SELECT MIN(tmp.date_added) AS date_start,
MAX(tmp.date_added) AS date_end,
COUNT(tmp.order_id) AS `orders`,
SUM(tmp.products) AS products,
SUM(tmp.tax) AS tax,
SUM(tmp.total) AS total
FROM
( SELECT o.order_id,
( SELECT SUM(op.quantity)
FROM `oc_order_product` op
WHERE op.order_id = o.order_id
GROUP BY op.order_id
) AS products,
( SELECT SUM(ot.value)
FROM `oc_order_total` ot
WHERE ot.order_id = o.order_id
AND ot.code = 'tax'
GROUP BY ot.order_id
) AS tax,
o.total,
o.date_added
FROM `oc_order` o
WHERE o.order_status_id > '0'
AND DATE(o.date_added) >= '2015-03-01'
AND DATE(o.date_added) <= '2016-04-19'
GROUP BY o.order_id
) tmp
GROUP BY WEEK(tmp.date_added)
ORDER BY tmp.date_added DESC
LIMIT 0,60
Queries like this make my site very slow. Is there an easy way to convert this query from subqueries to joins?
Here is the output of the above query (image not reproduced):

WEEK will have a hiccup around the first of the year -- there will be two partial weeks.
We are now in "week" 16 of 2016. That corresponds to slightly different days of 2015; did you want them combined?
Because of those hiccups with WEEK, you had better change the final ORDER BY to WEEK(tmp.date_added) DESC
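If the same week number from 2015 and 2016 should not be combined, one option is to group on YEARWEEK() instead of WEEK(). This is a swapped-in technique, not something the original query does. It is a minimal sketch against oc_order only, with the per-order subqueries left out for brevity; the alias year_week is hypothetical.

```sql
-- Sketch: group by year and week together so that week 16 of 2015
-- and week 16 of 2016 land in separate rows.
SELECT YEARWEEK(o.date_added) AS year_week,   -- e.g. 201616
       MIN(o.date_added)      AS date_start,
       MAX(o.date_added)      AS date_end,
       COUNT(*)               AS `orders`,
       SUM(o.total)           AS total
FROM `oc_order` o
WHERE o.order_status_id > 0
GROUP BY YEARWEEK(o.date_added)
ORDER BY year_week DESC
LIMIT 60;
```

Ordering by the grouped expression also avoids sorting on a non-aggregated date_added value.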
The FROM ( SELECT ... ) is probably fine. Is that what you are asking about?
The two ( SELECT SUM ... ) AS ... are probably optimal, or nearly so. Is that what you are asking about?
However, you probably do need some indexes:
oc_order_total: INDEX(code, order_id) -- in that order
oc_order_product: INDEX(order_id)
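Change DATE(o.date_added) >= '2015-03-01' to o.date_added >= '2015-03-01' so that INDEX(date_added) can be used. Do the same for the upper bound, but if date_added is a DATETIME write it as o.date_added < '2016-04-20' so that the whole of the last day is still included.
If o.order_status_id > '0' can only ever match the value 1, change it to o.order_status_id = 1 so that INDEX(order_status_id, date_added) can be used.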
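The index and date-range suggestions above could look roughly like this. The index names are hypothetical, and some of these indexes may already exist in your install, so check with SHOW INDEX FROM before adding them. The sketch assumes date_added is a DATETIME column.

```sql
-- Indexes suggested in the answer (names are made up for illustration)
ALTER TABLE `oc_order_total`   ADD INDEX idx_code_order (code, order_id);
ALTER TABLE `oc_order_product` ADD INDEX idx_order (order_id);
ALTER TABLE `oc_order`         ADD INDEX idx_status_date (order_status_id, date_added);

-- Inner filter with the DATE() wrapper removed, so the range on
-- date_added can be resolved from an index.
SELECT o.order_id, o.total, o.date_added
FROM `oc_order` o
WHERE o.order_status_id > 0
  AND o.date_added >= '2015-03-01'
  AND o.date_added <  '2016-04-19' + INTERVAL 1 DAY;  -- keeps all of 2016-04-19
```

Running EXPLAIN on the full query before and after shows whether the optimizer actually picks the new indexes.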

Related

A SUM() function with arithmetic across 4 tables

This is sample data from the table pengiriman_supply (image not reproduced).
This is the data for data_barang (image not reproduced).
This is the data for data_supplier and the table masuk (image not reproduced).
With 3 tables the sum is not a problem, but the problem appears when I use 4 tables together with a subtraction such as (sum(table1.a) - ifnull(table2.b)). Here is the result with just the sum (image not reproduced),
and this is the result with the subtraction (image not reproduced).
The code is like this:
SELECT DISTINCT
row_number() over(
order by pengiriman_supply.po_nomor desc) as no,
pengiriman_supply.po_nomor as PO,
data_supplier.nama_supplier,
data_barang.nama_barang,
((sum( pengiriman_supply.jumlah ))- (sum( COALESCE ( masuk.terima, 0 )) over ( PARTITION BY masuk.refrence ))) as total
FROM
pengiriman_supply
LEFT JOIN masuk ON pengiriman_supply.po_nomor = masuk.refrence
INNER JOIN data_supplier ON data_supplier.id_supplier = pengiriman_supply.idsupplier
INNER JOIN data_barang ON data_barang.idbarang = pengiriman_supply.idbarang
WHERE
pengiriman_supply.tanggal between date_sub(curdate(), interval 60 day) and curdate()
GROUP BY
pengiriman_supply.po_nomor,masuk.po_nomor,data_supplier.nama_supplier
ORDER BY
GROUP_CONCAT(DISTINCT pengiriman_supply.po_nomor) DESC
This is the SQL statement I have come up with, but I could not get it to group by pengiriman_supply.po_nomor alone. Can I group by just pengiriman_supply.po_nomor?
Can the rows for number 31194 be combined into one group?

It seems you need to include ifnull(masuk.terima,0) inside sum():
SELECT
pengiriman_supply.po_nomor AS po,
data_supplier.nama_supplier,
data_barang.nama_barang,
Sum((pengiriman_supply.jumlah)-ifnull(masuk.terima,0)) as total
FROM
pengiriman_supply
INNER JOIN data_barang ON pengiriman_supply.idbarang = data_barang.idbarang
INNER JOIN data_supplier ON pengiriman_supply.idsupplier = data_supplier.id_supplier
LEFT JOIN masuk ON masuk.refrence = pengiriman_supply.po_nomor
GROUP BY
pengiriman_supply.po_nomor
ORDER BY
po DESC
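One thing to watch with a join like the one above: if masuk can hold several rows for the same refrence (an assumption, since the sample data is not visible here), each pengiriman_supply row is repeated once per matching masuk row and jumlah is counted more than once. A minimal sketch that aggregates each side per PO before joining, so every PO number comes out as exactly one row. The aliases jumlah_total and terima_total are hypothetical.

```sql
-- Sketch: pre-aggregate both tables per PO, then subtract.
SELECT ps.po_nomor AS po,
       ps.jumlah_total - COALESCE(m.terima_total, 0) AS total
FROM (
    SELECT po_nomor, SUM(jumlah) AS jumlah_total
    FROM pengiriman_supply
    WHERE tanggal BETWEEN DATE_SUB(CURDATE(), INTERVAL 60 DAY) AND CURDATE()
    GROUP BY po_nomor
) ps
LEFT JOIN (
    SELECT refrence, SUM(terima) AS terima_total
    FROM masuk
    GROUP BY refrence
) m ON m.refrence = ps.po_nomor
ORDER BY ps.po_nomor DESC;
```

Supplier and item names can be joined back onto this result if each PO maps to a single supplier and item.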

Using GROUP BY and ORDER BY in the same query

SELECT *
FROM
(
SELECT com_jobcard.job_card_num,
sum( worked_qty ),employee.emp_name
FROM timer_completed
INNER JOIN process ON process.id = timer_completed.process_id
INNER JOIN com_jobcard ON com_jobcard.id = timer_completed.job_card_id
INNER JOIN employee ON employee.id = timer_completed.employee_id
AND process.id = '611'
AND timer_completed.group_id = '60'
AND timer_completed.report_date = DATE_ADD(CURDATE(), INTERVAL -1 DAY)
ORDER BY com_jobcard.id DESC
) AS tmp_table
GROUP BY com_jobcard.job_card_num
In this code I'm using GROUP BY, but I need the result in descending order of com_jobcard.id. If I use the above query it returns:
#1054 - Unknown column 'com_jobcard.job_card_num' in 'group statement'
Please help me.

Use
GROUP BY tmp_table.job_card_num

Two things:
1) Every column in the subquery needs a name (alias) so that the outer query can refer to it through tmp_table. This will clear error #1054:
sum( worked_qty ) as 'WorkedTotal'
2) An ORDER BY inside a subquery is only honoured when it is combined with a row limit (TOP in SQL Server, LIMIT in MySQL); otherwise the outer query is free to ignore it. Put the ORDER BY in the outer query, next to the GROUP BY.
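Putting those points together, the nested form could be repaired roughly as follows. This is a sketch under assumptions: worked_qty is taken to be a column of timer_completed (it is unqualified in the question), the filter on process is applied through timer_completed.process_id, and the aliases job_card_id and worked_total are hypothetical.

```sql
-- Sketch: name every inner column, then group and order in the OUTER query
-- using the derived-table alias tmp_table.
SELECT tmp_table.job_card_num,
       SUM(tmp_table.worked_qty)  AS worked_total,
       MAX(tmp_table.job_card_id) AS job_card_id
FROM (
    SELECT com_jobcard.id AS job_card_id,
           com_jobcard.job_card_num,
           timer_completed.worked_qty
    FROM timer_completed
    INNER JOIN com_jobcard ON com_jobcard.id = timer_completed.job_card_id
    WHERE timer_completed.process_id  = 611
      AND timer_completed.group_id    = 60
      AND timer_completed.report_date = CURDATE() - INTERVAL 1 DAY
) AS tmp_table
GROUP BY tmp_table.job_card_num
ORDER BY job_card_id DESC;
```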

You're unnecessarily nesting your query here. You can order an aggregate query result set by putting the ORDER BY after the GROUP BY. Also, like Gordon pointed out in his comment, you're abusing the nonstandard MySQL extension to GROUP BY. This will make you crazy unless you learn about it. https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
Try refactoring your query like this:
SELECT com_jobcard.job_card_num,
sum( worked_qty),
employee.emp_name,
com_jobcard.id
FROM timer_completed
JOIN process ON process.id = timer_completed.process_id
JOIN com_jobcard ON com_jobcard.id = timer_completed.job_card_id
JOIN employee ON employee.id = timer_completed.employee_id
AND process.id = '611'
AND timer_completed.group_id = '60'
AND timer_completed.report_date = DATE_ADD(CURDATE(), INTERVAL -1 DAY)
GROUP BY com_jobcard.job_card_num, employee.emp_name, com_jobcard.id
ORDER BY com_jobcard.id DESC
Besides being simpler than your proposed query, this handles GROUP BY correctly and yields the order you've specified.

SQL request optimization

I have an SQL request that takes 100% of my VM's CPU while it is running. I want to know how to optimize it:
SELECT g.name AS hostgroup
, h.name AS hostname
, a.host_id
, s.display_name AS servicename
, a.service_id
, a.entry_time AS ack_time
, ( SELECT ctime
FROM logs
WHERE logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY logs.log_id DESC
LIMIT 1) AS start_time
, ar.acl_res_name AS timeperiod
, a.state AS state
, a.author
, a.acknowledgement_id AS ack_id
FROM centstorage.acknowledgements a
LEFT JOIN centstorage.hosts h ON a.host_id = h.host_id
LEFT JOIN centstorage.services s ON a.service_id = s.service_id
LEFT JOIN centstorage.hosts_hostgroups p ON a.host_id = p.host_id
LEFT JOIN centstorage.hostgroups g ON g.hostgroup_id = p.hostgroup_id
LEFT JOIN centreon.hostgroup_relation hg ON a.host_id = hg.host_host_id
LEFT JOIN centreon.acl_resources_hg_relations hh ON hg.hostgroup_hg_id = hh.hg_hg_id
LEFT JOIN centreon.acl_resources ar ON hh.acl_res_id = ar.acl_res_id
WHERE ar.acl_res_name != 'All Resources'
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
ORDER BY a.acknowledgement_id ASC
The problem is in this part:
(SELECT ctime FROM logs
WHERE logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY logs.log_id DESC
LIMIT 1) AS start_time
The table logs is really huge, and some friends told me to use a buffer table/database, but I am pretty new to these things and I don't know how to do it.
Here is an EXPLAIN EXTENDED of the query (screenshot not reproduced):
It seems that it examines only 2 rows of the table logs, so why does it take so much time? (There are 560000 rows in the table logs.)
The indexes of these tables were posted as screenshots (not reproduced): centstorage.acknowledgements, centstorage.hosts, centstorage.services, centstorage.hosts_hostgroups, centstorage.hostgroups, centreon.hostgroup_relation, centreon.acl_resources_hg_relations, centreon.acl_resources.

For SQL Server it is possible to cap the maximum degree of parallelism of a query using MAXDOP.
For example, you can add this at the end of your query:
option (maxdop 2)
MySQL has no direct equivalent that I know of, since it generally runs a single query on one thread, so this hint does not carry over as is.
The idea only helps if the execution time itself is not critical.

Create a temporary table from the WHERE condition on acknowledgements. Its schema holds every column required in the final result, plus the columns used to join with your 7 other tables:
-- Placeholder columns are typed NULLs; a bare '' would create a zero-length column.
CREATE TEMPORARY TABLE __tempacknowledgements AS SELECT CAST(NULL AS CHAR(255)) AS hostgroup
, CAST(NULL AS CHAR(255)) AS hostname
, a.host_id
, CAST(NULL AS CHAR(255)) AS servicename
, a.service_id
, a.entry_time AS ack_time
, CAST(NULL AS UNSIGNED) AS start_time
, CAST(NULL AS CHAR(255)) AS timeperiod
, a.state AS state
, a.author
, a.acknowledgement_id AS ack_id
FROM centstorage.acknowledgements a
WHERE YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id IS NOT NULL
ORDER BY a.acknowledgement_id ASC;
Or create the table with proper column definitions.
Then fill in the placeholder fields from each table that was left-joined; you can use an inner join in the UPDATE. You should write 7 different UPDATE statements. Two examples are given below.
-- Fill hostname from hosts
UPDATE __tempacknowledgements a JOIN centstorage.hosts h USING(host_id)
SET a.hostname=h.name;
-- Fill servicename from services
UPDATE __tempacknowledgements a JOIN centstorage.services s USING(service_id)
SET a.servicename=s.display_name;
In a similar way, update start_time with ctime from logs using a join with logs; this is the 8th UPDATE statement.
Select the final result from the temp table.
Drop the temp table.
A stored procedure can be written for this.

Turn LEFT JOIN into JOIN unless you have a real need for LEFT.
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
Do you have any rows where a.service_id IS NULL? If not, get rid of that test.
As already mentioned, that date comparison does not optimize. Since entry_time holds a Unix timestamp, here is what to use instead:
AND a.entry_time >= UNIX_TIMESTAMP(CONCAT(LEFT(CURDATE(), 7), '-01'))
AND a.entry_time < UNIX_TIMESTAMP(CONCAT(LEFT(CURDATE(), 7), '-01') + INTERVAL 1 MONTH)
And add one of these (depending on my above comment):
INDEX(entry_time)
INDEX(service_id, entry_time)
The correlated subquery is hard to optimize. This index (on logs) may help:
INDEX(type, host_id, service_id, status)
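On the LEFT JOIN point: the condition ar.acl_res_name != 'All Resources' in the WHERE clause already discards every row where ar is NULL, so the joins leading to acl_resources behave like inner joins anyway. A reduced sketch of just that chain, using the table and column names from the question and omitting the other columns for brevity:

```sql
-- Sketch: the ACL chain written as plain JOINs. Rows without a matching
-- acl_resources entry were already being removed by the WHERE clause.
SELECT a.acknowledgement_id,
       ar.acl_res_name AS timeperiod
FROM centstorage.acknowledgements a
JOIN centreon.hostgroup_relation hg
  ON hg.host_host_id = a.host_id
JOIN centreon.acl_resources_hg_relations hh
  ON hh.hg_hg_id = hg.hostgroup_hg_id
JOIN centreon.acl_resources ar
  ON ar.acl_res_id = hh.acl_res_id
WHERE ar.acl_res_name != 'All Resources';
```

Writing them as JOIN gives the optimizer more freedom in choosing the join order.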

WHERE ... IN can be a time killer!
Instead of
logs.status IN (1, 2, 3)
use
logs.status=1 or logs.status=2 or logs.status=3

I have SLIGHTLY reformatted the query for my readability reference and better seeing the relations between the tables... otherwise ignore that part.
SELECT
g.name AS hostgroup,
h.name AS hostname,
a.host_id,
s.display_name AS servicename,
a.service_id,
a.entry_time AS ack_time,
( SELECT
ctime
FROM
logs
WHERE
logs.host_id = a.host_id
AND logs.service_id = a.service_id
AND logs.ctime < a.entry_time
AND logs.status IN (1, 2, 3)
AND logs.type = 1
ORDER BY
logs.log_id DESC
LIMIT 1) AS start_time,
ar.acl_res_name AS timeperiod,
a.state AS state,
a.author,
a.acknowledgement_id AS ack_id
FROM
centstorage.acknowledgements a
LEFT JOIN centstorage.hosts h
ON a.host_id = h.host_id
LEFT JOIN centstorage.services s
ON a.service_id = s.service_id
LEFT JOIN centstorage.hosts_hostgroups p
ON a.host_id = p.host_id
LEFT JOIN centstorage.hostgroups g
ON p.hostgroup_id = g.hostgroup_id
LEFT JOIN centreon.hostgroup_relation hg
ON a.host_id = hg.host_host_id
LEFT JOIN centreon.acl_resources_hg_relations hh
ON hg.hostgroup_hg_id = hh.hg_hg_id
LEFT JOIN centreon.acl_resources ar
ON hh.acl_res_id = ar.acl_res_id
WHERE
ar.acl_res_name != 'All Resources'
AND YEAR(FROM_UNIXTIME( a.entry_time )) = YEAR(CURDATE())
AND MONTH(FROM_UNIXTIME( a.entry_time )) = MONTH(CURDATE())
AND a.service_id is not null
ORDER BY
a.acknowledgement_id ASC
I would first recommend starting with your "acknowledgements" table and having an index on at least ( entry_time, acknowledgement_id ). Next, update your WHERE clause. Because you are running a function to convert the Unix timestamp to a date and then grabbing the YEAR (and MONTH) respectively, I don't believe it is utilizing the index, as it has to compute that for every row. To alleviate that, remember that a Unix timestamp is nothing but a number representing seconds from a specific point in time. If you are looking for a specific month, pre-compute the starting and ending Unix times and query for that range. Something like...
and a.entry_time >= UNIX_TIMESTAMP( '2015-10-01' )
and a.entry_time < UNIX_TIMESTAMP( '2015-11-01' )
This way, it accounts for all seconds within the month up to 23:59:59 on Oct 31, just before November 1st.
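To avoid hard-coding the month, the two boundaries can be computed once from CURDATE(). This is a sketch; the user variables @month_start and @month_end are hypothetical names, and only a reduced column list is shown.

```sql
-- Sketch: first second of the current month and of the next month,
-- both as Unix timestamps, computed once up front.
SET @month_start = UNIX_TIMESTAMP(DATE_FORMAT(CURDATE(), '%Y-%m-01'));
SET @month_end   = UNIX_TIMESTAMP(DATE_FORMAT(CURDATE(), '%Y-%m-01') + INTERVAL 1 MONTH);

-- entry_time is compared as a bare column, so an index on it can be used.
SELECT a.acknowledgement_id, a.host_id, a.service_id, a.entry_time
FROM centstorage.acknowledgements a
WHERE a.entry_time >= @month_start
  AND a.entry_time <  @month_end
  AND a.service_id IS NOT NULL
ORDER BY a.acknowledgement_id ASC;
```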
Then, without my glasses to see all the images more clearly, and short time this morning, I would ensure you have at least the following indexes on each table respectively
table                        index
logs                         ( host_id, service_id, type, status, ctime, log_id )
acknowledgements             ( entry_time, acknowledgement_id, host_id, service_id )
hosts                        ( host_id, name )
services                     ( service_id, display_name )
hosts_hostgroups             ( host_id, hostgroup_id )
hostgroups                   ( hostgroup_id, name )
hostgroup_relation           ( host_host_id, hostgroup_hg_id )
acl_resources_hg_relations   ( hg_hg_id, acl_res_id )
acl_resources                ( acl_res_id, acl_res_name )
Finally, your correlated sub-query field is going to be a killer as it is processed for every row, but hopefully the other index optimization ideas will help performance.
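As a sketch, the two indexes that matter most for the correlated subquery could be added and checked like this. The index names are hypothetical, the literal ids in the EXPLAIN are placeholders to be replaced with real values, and logs is left unqualified as in the question.

```sql
-- Composite index covering the subquery's equality and range columns
ALTER TABLE logs
  ADD INDEX idx_logs_lookup (host_id, service_id, type, status, ctime, log_id);

-- Index for the month range on the driving table
ALTER TABLE centstorage.acknowledgements
  ADD INDEX idx_ack_entry (entry_time, acknowledgement_id, host_id, service_id);

-- Check how a single probe into logs is resolved (1 and 1 are placeholder ids)
EXPLAIN
SELECT l.ctime
FROM logs l
WHERE l.host_id = 1
  AND l.service_id = 1
  AND l.type = 1
  AND l.status IN (1, 2, 3)
  AND l.ctime < UNIX_TIMESTAMP()
ORDER BY l.log_id DESC
LIMIT 1;
```

If the EXPLAIN output still shows a filesort, the ORDER BY on log_id is the likely cause, because the range conditions on status and ctime come before log_id in the index.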

HAVING clause not working after server update

We have just made a major upgrade in mysql version from 5.0.51 to 5.6.22 and I have just noticed that one of my queries no longer works properly.
SELECT
p.id AS product_id,
p.code,
p.description,
p.unitofmeasure,
p.costprice,
p.packsize,
vc.rateinpercent,
CASE
WHEN Sum(sales.qty) IS NULL THEN 0
ELSE Sum(sales.qty)
END AS sold,
CASE
WHEN stock.stocklevel IS NULL THEN 0
ELSE stock.stocklevel
END AS stocklevel,
sum(sales.qty) - stock.stocklevel AS diff,
CEIL((sum(sales.qty) - stock.stocklevel) / p.packsize) AS amt
FROM products p
LEFT JOIN
( SELECT
col.product_id,
col.quantity AS qty
FROM customerorderlines col
LEFT JOIN customerorders co
ON co.id = col.customerorder_id
WHERE co.orderdate >= '2014-12-01 00:00:00'
AND co.orderdate <= '2015-02-09 23:59:59'
AND co.location_id IN (1,2,3,7)
) sales
ON sales.product_id = p.id
LEFT JOIN
( SELECT
product_id,
location_id,
Sum(stocklevel) AS stocklevel
FROM stock
WHERE location_id IN (1,2,3,7)
GROUP BY product_id
) stock
ON stock.product_id = p.id
LEFT JOIN vatcodes vc
ON vc.id = p.purchasevatcode_id
WHERE p.supplier_id IN (137)
AND p.currentstatus_v = 1
GROUP BY p.id
HAVING sold > stocklevel
ORDER BY sold DESC
On the old server, the HAVING clause filtered out all results with negative values, giving a result as follows (image not reproduced):
Instead, I am getting the following result on the new server (image not reproduced):
Basically, it's filtering out some of the negative results but not all of them. (The datasets are a few days old, which is why the 'Freeze Gel Spray' qty, sold and stock numbers are slightly different.)
Hindsight is a wonderful thing, but I didn't expect there to be any major changes for queries between server updates, so I didn't test or check anything. Luckily this is one of only two or three queries that use HAVING, so if I have to rewrite a couple of queries, so be it. Any ideas as to why this happens, though? If it wasn't working at all, fair enough, but why would it only work partially?
Thanks in advance for any insight,
R

I take it you've tried EXPLAIN on the query to find out what it's doing?
Try making the calculated field names unique from the underlying field names so you can be sure what you're filtering on. I've seen some screwy results when the calculated fields have the same names as underlying physical fields.
Having your subqueries return the same format of results (i.e. both summed/grouped) helps to see what's going on.
I haven't tested this query but it may help. If you post the table structures and perhaps some fake data that shows the error, that would help diagnose
SELECT
p.id AS product_id,
p.code,
p.description,
p.unitofmeasure,
p.costprice,
p.packsize,
vc.rateinpercent,
sales.totalSold,
stock.totalStock,
sales.totalSold - stock.totalStock AS diff,
CEIL((sales.totalSold - stock.totalStock) / p.packsize) AS amt
FROM products p
LEFT JOIN (
SELECT
col.product_id,
IFNULL( SUM(col.quantity), 0) AS totalSold
FROM customerorderlines col
LEFT JOIN customerorders co
ON co.id = col.customerorder_id
WHERE co.orderdate >= '2014-12-01 00:00:00'
AND co.orderdate <= '2015-02-09 23:59:59'
AND co.location_id IN (1,2,3,7)
GROUP BY product_id
) sales
ON sales.product_id = p.id
LEFT JOIN (
SELECT
product_id,
IFNULL( SUM(stocklevel), 0) AS totalStock
FROM stock
WHERE location_id IN (1,2,3,7)
GROUP BY product_id
) stock
ON stock.product_id = p.id
LEFT JOIN vatcodes vc
ON vc.id = p.purchasevatcode_id
WHERE p.supplier_id IN (137)
AND p.currentstatus_v = 1
GROUP BY p.id
HAVING totalSold > totalStock
ORDER BY totalSold DESC
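A variation on the same idea, offered as an untested sketch: because each derived table already returns one row per product, the outer GROUP BY and HAVING can be dropped and the comparison moved into WHERE, with COALESCE applied after the LEFT JOIN so products without sales or stock count as 0. The aliases qty_sold, qty_in_stock, total_sold and total_stock are hypothetical and chosen so they do not collide with any real column name.

```sql
SELECT p.id AS product_id,
       p.code,
       p.description,
       COALESCE(sales.total_sold, 0)  AS qty_sold,
       COALESCE(stock.total_stock, 0) AS qty_in_stock
FROM products p
LEFT JOIN (
    -- units sold per product in the date range
    SELECT col.product_id, SUM(col.quantity) AS total_sold
    FROM customerorderlines col
    JOIN customerorders co ON co.id = col.customerorder_id
    WHERE co.orderdate >= '2014-12-01 00:00:00'
      AND co.orderdate <= '2015-02-09 23:59:59'
      AND co.location_id IN (1, 2, 3, 7)
    GROUP BY col.product_id
) sales ON sales.product_id = p.id
LEFT JOIN (
    -- stock on hand per product across the same locations
    SELECT product_id, SUM(stocklevel) AS total_stock
    FROM stock
    WHERE location_id IN (1, 2, 3, 7)
    GROUP BY product_id
) stock ON stock.product_id = p.id
WHERE p.supplier_id = 137
  AND p.currentstatus_v = 1
  AND COALESCE(sales.total_sold, 0) > COALESCE(stock.total_stock, 0)
ORDER BY qty_sold DESC;
```

With no alias sharing a name with an underlying column, the filter no longer depends on how a given MySQL version resolves ambiguous names in HAVING.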

Help calculating average per day

The daily_average column is always returning zero. The default timestamp values are for the past week. Any thoughts on what I'm doing wrong here in getting the average order value per day?
SELECT
SUM(price+shipping_price) AS total_sales,
COUNT(id) AS total_orders,
AVG(price+shipping_price) AS order_total_average,
(SELECT
SUM(quantity)
FROM `order_product`
INNER JOIN `order` ON (
`order`.id = order_product.order_id AND
`order`.created >= '.$startTimestamp.' AND
`order`.created <= '.$endTimestamp.' AND
`order`.type_id = '.$type->getId().' AND
`order`.fraud = 0
)
) as total_units,
SUM(price+shipping_price)/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') as daily_average
FROM `order`
WHERE created >= '.$startTimestamp.' AND
created <= '.$endTimestamp.' AND
fraud = 0 AND
type_id = '.$type->getId().'

You're using aggregate functions (SUM, COUNT, AVG) without an aggregate command (group by). I think your SQL is more complicated than it needs to be (no need for the inner select).
Here's a SQL command that should work (hard to test without test data ;))
SELECT
COUNT(id) total_orders,
SUM(finalprice) total_sales,
AVG(finalprice) order_average,
SUM(units) total_units,
SUM(finalprice)/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') daily_average
FROM (
SELECT
o.id id,
o.price+o.shipping_price finalprice,
SUM(p.quantity) units
FROM `order` o INNER JOIN order_product p ON p.order_id=o.id
WHERE o.created>='.$startTimestamp.'
AND o.created<='.$endTimestamp.'
AND o.fraud=0
AND o.type_id='.$type->getId().'
GROUP BY p.order_id
) t;

Does casting one of the elements in the division work for you?
SELECT
SUM(price+shipping_price) AS total_sales,
COUNT(id) AS total_orders,
AVG(price+shipping_price) AS order_total_average,
(SELECT
SUM(quantity)
FROM `order_product`
INNER JOIN `order` ON (
`order`.id = order_product.order_id AND
`order`.created >= '.$startTimestamp.' AND
`order`.created <= '.$endTimestamp.' AND
`order`.type_id = '.$type->getId().' AND
`order`.fraud = 0
)
) as total_units,
CAST(SUM(price+shipping_price) AS float)/DATEDIFF('.$endTimestamp.', '.$startTimestamp.') as daily_average
FROM `order`
WHERE created >= '.$startTimestamp.' AND
created <= '.$endTimestamp.' AND
fraud = 0 AND
type_id = '.$type->getId().'
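Another thing worth checking is what DATEDIFF() receives. It expects date or datetime values, so if $startTimestamp and $endTimestamp are Unix timestamps (an assumption based on the variable names and on how created is compared), the divisor will not be a day count. A minimal sketch of the daily average under that assumption, with hypothetical user variables standing in for the PHP values and the type_id filter left out:

```sql
-- Hypothetical sample range: one week, expressed as Unix timestamps
SET @start_ts = UNIX_TIMESTAMP('2015-01-01 00:00:00');
SET @end_ts   = UNIX_TIMESTAMP('2015-01-08 00:00:00');

SELECT SUM(price + shipping_price) AS total_sales,
       -- convert to datetimes before DATEDIFF; GREATEST avoids dividing by 0
       SUM(price + shipping_price)
         / GREATEST(DATEDIFF(FROM_UNIXTIME(@end_ts), FROM_UNIXTIME(@start_ts)), 1)
         AS daily_average
FROM `order`
WHERE created >= @start_ts
  AND created <= @end_ts
  AND fraud = 0;
```

If created is stored as a DATETIME instead, the FROM_UNIXTIME() calls are unnecessary and the bounds should be passed as date strings.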
