Subquery's WHERE needs fields from main query - mysql

I have been struggling with this for the whole day and would like to ask for help making this query work. My problem is that in the WHERE clauses of the INNER JOIN subquery I need to use matching values from each GC table row being processed, and obviously the subquery doesn't know anything about the main query, which is why it fails. Hopefully you'll catch the idea of what I am trying to accomplish here:
SET @now=100; # unix datetime
SELECT a.id, b.maxdate
FROM GC AS a
INNER JOIN (
SELECT 0 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE dt_active_from > a.dt_lastused AND dt_active_from <= @now
GROUP BY id_group
UNION ALL
SELECT 1 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE id_group <> 2 AND dt_active_from > a.dt_lastused AND dt_active_from <= @now
UNION ALL
SELECT 2 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE id_group <> 1 AND dt_active_from > a.dt_lastused AND dt_active_from <= @now
GROUP BY id_group
) AS b
ON a.id_group = b.id_group
LEFT JOIN GCMsg AS c
ON a.id = c.id_gc
WHERE a.is_active = 1 AND a.dt_lastused < @now AND c.id_gc IS NULL
ORDER BY a.id
Thank you

Okay, I hope I have understood your original SQL now. You want all GC with the last appropriate max date. What you consider appropriate depends both on gc.dt_lastused and on gc.id_group. So rather than joining the tables together, you should select the max date per record in a subquery:
select id,
(
select max(dt_active_to)
from gcdeals
where dt_active_from > gc.dt_lastused and dt_active_from <= @now
and
(
(gc.id_group = 0)
or
(gc.id_group = 1 and gcdeals.id_group <> 2)
or
(gc.id_group = 2 and gcdeals.id_group <> 1)
)
) as maxdate
from gc
where is_active = 1 and dt_lastused < @now
and id not in (select id_gc from gcmsg)
order by id;
EDIT: Here is the same statement using a join, offering to select max(dt_active_from) and min(dt_active_to) in one pass:
select gc.id, max(gcd.dt_active_from), min(gcd.dt_active_to)
from gc
left outer join gcdeals gcd
on gc.id = gcd.id_gc
and gcd.dt_active_from > gc.dt_lastused and gcd.dt_active_from <= @now
and
(
(gc.id_group = 0)
or
(gc.id_group = 1 and gcd.id_group <> 2)
or
(gc.id_group = 2 and gcd.id_group <> 1)
)
where gc.is_active = 1 and gc.dt_lastused < @now
group by gc.id
order by gc.id;
You see, once you have found out how to select the desired value in a subselect, it is not too hard to change it into a join. You get what you are looking for in two steps. If, on the other hand, you start by thinking in joins, the same task can be quite abstract.
As to the execution plan: Say GC has 1,000 active records and there are usually about 10 appropriate matches in GCDeals. Then the first statement selects 1,000 records and uses a loop on each record to access the GCDeals aggregate value. The second statement would just join 1,000 GC records with 10 GCDeals records each, thus getting 10,000 records, then aggregate them to make it 1,000 records again. Maybe the loops are faster, maybe the join; it depends. But say GC has one million active records and on each record you expect 1,000 GCDeals matches. Then the first statement may be quite slow, having to loop so many times, but the second statement will create a billion intermediate records, which can cause memory problems and either lead to very slow execution, too, or even to an insufficient-memory error. So it's just good to know that both techniques are available.
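To see the two techniques side by side, here is a minimal sketch that runs both statement shapes against a tiny in-memory SQLite database through Python's sqlite3 module. The sample rows and the value 100 for "now" are invented for illustration, the GCMsg filter is left out, and the join condition on gcd.id_gc is omitted so that both statements use exactly the same matching rules. It is only meant to show that the correlated subselect and the outer join with GROUP BY return the same max date per GC row, not to say anything about MySQL performance.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gc (id INTEGER PRIMARY KEY, id_group INTEGER,
                 dt_lastused INTEGER, is_active INTEGER);
CREATE TABLE gcdeals (id INTEGER PRIMARY KEY, id_group INTEGER,
                      dt_active_from INTEGER, dt_active_to INTEGER);
INSERT INTO gc VALUES (1, 0, 10, 1), (2, 1, 10, 1), (3, 2, 10, 1), (4, 1, 95, 1);
INSERT INTO gcdeals VALUES
  (1, 1, 20, 200),
  (2, 2, 30, 300),
  (3, 0, 40, 150),
  (4, 1, 120, 999);  -- starts after "now", so it must never match
""")

# Shared matching rule: which deals count for which GC row.
MATCH = """
      d.dt_active_from > gc.dt_lastused
  AND d.dt_active_from <= :now
  AND (   (gc.id_group = 0)
       OR (gc.id_group = 1 AND d.id_group <> 2)
       OR (gc.id_group = 2 AND d.id_group <> 1))
"""

# Technique 1: correlated scalar subquery in the select list.
subquery_sql = f"""
SELECT gc.id,
       (SELECT MAX(d.dt_active_to) FROM gcdeals d WHERE {MATCH}) AS maxdate
FROM gc
WHERE gc.is_active = 1 AND gc.dt_lastused < :now
ORDER BY gc.id
"""

# Technique 2: outer join on the same rule, then aggregate per GC row.
join_sql = f"""
SELECT gc.id, MAX(d.dt_active_to) AS maxdate
FROM gc
LEFT JOIN gcdeals d ON {MATCH}
WHERE gc.is_active = 1 AND gc.dt_lastused < :now
GROUP BY gc.id
ORDER BY gc.id
"""

params = {"now": 100}
subquery_rows = conn.execute(subquery_sql, params).fetchall()
join_rows = conn.execute(join_sql, params).fetchall()

print(subquery_rows)
print(join_rows)
```

GC row 4 has no matching deal, so both variants report NULL (None in Python) for it rather than dropping the row.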

Related

Conditionally counting while also grouping by

I am trying to join two tables
ad_data_grouped
adID, adDate (date), totalViews
This is data that has already been grouped by both adID and adDate.
The second table is
leads
leadID, DateOfBirth, adID, state, createdAt(dateTime)
What I'm struggling with is joining these two tables so I can have a column that counts the number of leads that share the same adID and where adDate = createdAt.
The problem I'm running into is that the counts are all the same for all groupings of adID. I have a few other things I'm trying to do, but they are based on similar conditional counting.
Query: (I know the temp table is probably overkill, but I'm trying to break this up into small pieces where I can understand what each piece does)
CREATE TEMPORARY TABLE ad_stats_grouped
SELECT * FROM `ad_stats`
LIMIT 0;
INSERT INTO ad_stats_grouped(AdID, adDate, DailyViews)
SELECT
AdID,
adDate,
sum(DailyViews)
FROM `ad_stats`
GROUP BY adID, adDate;
SELECT
ad_stats_grouped.adID,
ad_stats_grouped.adDate,
COUNT(case when ad_stats_grouped.adDate = Date(Leads.CreatedAt) THEN 1 ELSE 0 END)
FROM `ad_stats_grouped` INNER JOIN `LEADS` ON
ad_stats_grouped.adID = Leads.AdID
GROUP BY adID, adDate;
The problem with your original query is the logic in the COUNT(). This aggregate function takes into account all non-null values, so it counts both the 0s and the 1s. One solution would be to change COUNT() to SUM().
But I think that the query can be further improved by moving the date condition to the ON part of a left join:
select
g.adid,
g.addate,
count(l.adid)
from `ad_stats_grouped` g
left join `leads` l
on g.adid = l.adid
and l.createdat >= g.addate
and l.createdat < g.addate + interval 1 day
group by g.adid, g.addate;
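A small sketch of the difference, using an in-memory SQLite database from Python. The rows are made up for illustration, and SQLite's date(x, '+1 day') stands in for MySQL's x + interval 1 day. It runs the original COUNT(CASE ...) shape, the SUM fix, and the left join with the date range in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad_stats_grouped (adID INTEGER, adDate TEXT, DailyViews INTEGER);
CREATE TABLE leads (leadID INTEGER, adID INTEGER, createdAt TEXT);
-- Hypothetical data: ad 1 has leads on two days, ad 2 has no leads at all.
INSERT INTO ad_stats_grouped VALUES
  (1, '2020-01-01', 10), (1, '2020-01-02', 20), (2, '2020-01-01', 5);
INSERT INTO leads VALUES
  (100, 1, '2020-01-01 09:00:00'),
  (101, 1, '2020-01-01 17:30:00'),
  (102, 1, '2020-01-02 08:15:00');
""")

def conditional(agg):
    # Same query shape as the question, with the aggregate swapped in.
    sql = f"""
    SELECT g.adID, g.adDate,
           {agg}(CASE WHEN g.adDate = date(l.createdAt) THEN 1 ELSE 0 END)
    FROM ad_stats_grouped g
    INNER JOIN leads l ON g.adID = l.adID
    GROUP BY g.adID, g.adDate
    ORDER BY g.adID, g.adDate
    """
    return conn.execute(sql).fetchall()

count_rows = conditional("COUNT")  # counts every joined row: 0 is not NULL
sum_rows = conditional("SUM")      # adds up only the 1s

# Left join with the date range in ON: ads without leads survive with 0.
left_rows = conn.execute("""
    SELECT g.adID, g.adDate, COUNT(l.adID)
    FROM ad_stats_grouped g
    LEFT JOIN leads l
      ON g.adID = l.adID
     AND l.createdAt >= g.adDate
     AND l.createdAt < date(g.adDate, '+1 day')
    GROUP BY g.adID, g.adDate
    ORDER BY g.adID, g.adDate
""").fetchall()

print(count_rows)
print(sum_rows)
print(left_rows)
```

With the inner join, ad 2 disappears from the result entirely; the left join keeps it with a count of 0, which is usually what a per-ad report wants.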

SQL Count on JOIN query is taking forever to execute?

I'm trying to run a count query on a two-table join. The e_amazing_client table has a million rows and m_user has just 50 rows, BUT the count query is taking forever!
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` >= '2018-11-18')) AND (`e`.`id` >= 1)
I don't know what is wrong with this query?
First, I'm guessing that this is sufficient:
SELECT COUNT(*) AS `count`
FROM e_amazing_client e
WHERE e.date_created >= '2018-11-11' AND e.id >= 1;
If user has only 50 rows, I doubt it is creating duplicates. The comparisons on date_created are redundant.
For this query, try creating an index on e_amazing_client(date_created, id).
Maybe you wanted this:
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` <= '2018-11-18')) AND (`e`.`id` >= 1)
to check between dates?
Also, do you really need
AND (`e`.`id` >= 1)
If id is what an id usually is in a table, is there ever a case where it is < 1?
Your query is pulling ALL records that pass the date conditions, because the only other condition in your WHERE clause is id >= 1. You have no clause in there for a specific user. Your original query also has a second date condition, >= 2018-11-18, so as written only records on or after 2018-11-18 are counted. You MAY have meant that you only wanted the count WITHIN the week 11/11 to 11/18, in which case the conditions SHOULD have been >= 11-11 and <= 11-18.
As for the count, you are getting ALL people (assuming no entry has an ID less than 1) and thus a single count within that date range. If you want it per user, as you indicated, you need to group by the cx_hc_user_id (user) column to see who has the most, or make the user part of the WHERE clause to get one person.
SELECT
e.cx_hc_user_id,
count(*) countPerUser
from
e_amazing_client e
WHERE
e.date_created >= '2018-11-11'
AND e.date_created <= '2018-11-18'
group by
e.cx_hc_user_id
You can order by the count descending to get the user with the highest count, but I'm still not sure what you are asking.
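The effect of the two date conditions can be checked on a handful of rows. The sketch below uses an in-memory SQLite database from Python with invented rows; only the table and column names come from the question. It counts with the conditions as originally written (both >=), with the week range (>= and <=), and per user within the range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE e_amazing_client (
  id INTEGER PRIMARY KEY, cx_hc_user_id INTEGER, date_created TEXT);
-- Hypothetical rows spread around the 2018-11-11 .. 2018-11-18 week.
INSERT INTO e_amazing_client (cx_hc_user_id, date_created) VALUES
  (1, '2018-11-10'), (1, '2018-11-12'), (2, '2018-11-15'),
  (2, '2018-11-17'), (2, '2018-11-20'), (3, '2018-11-25');
""")

# As written in the question: two >= conditions joined by AND.
as_written = conn.execute("""
    SELECT COUNT(*) FROM e_amazing_client
    WHERE date_created >= '2018-11-11' AND date_created >= '2018-11-18'
""").fetchone()[0]

# Only the later bound: same result, so the earlier bound has no effect.
later_bound_only = conn.execute("""
    SELECT COUNT(*) FROM e_amazing_client
    WHERE date_created >= '2018-11-18'
""").fetchone()[0]

# The week range that was probably intended.
in_week = conn.execute("""
    SELECT COUNT(*) FROM e_amazing_client
    WHERE date_created >= '2018-11-11' AND date_created <= '2018-11-18'
""").fetchone()[0]

# Count per user within the week, highest count first.
per_user = conn.execute("""
    SELECT cx_hc_user_id, COUNT(*) AS countPerUser
    FROM e_amazing_client
    WHERE date_created >= '2018-11-11' AND date_created <= '2018-11-18'
    GROUP BY cx_hc_user_id
    ORDER BY countPerUser DESC, cx_hc_user_id
""").fetchall()

print(as_written, later_bound_only, in_week, per_user)
```

If date_created holds a time of day as well, an upper bound written as < '2018-11-19' would include all of 11/18; that depends on the column type, which the question does not state.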

Select most recent record grouped by 3 columns

I am trying to return the price of the most recent record grouped by ItemNum and FeeSched, Customer can be eliminated. I am having trouble understanding how I can do that reasonably.
The issue is that I am joining about 5 tables containing hundreds of thousands of rows to end up with this result set. The initial query takes about a minute to run, and there has been some trouble with timeout errors in the past. Since this will run on a client's workstation, it may run even slower, and I have no access to modify server settings to increase memory / timeouts.
Here is my data:
Customer  Price  ItemNum  FeeSched  Date
5         70.75  01202    12        12-06-2017
5         70.80  01202    12        06-07-2016
5         70.80  01202    12        07-21-2017
5         70.80  01202    12        10-26-2016
5         82.63  02144    61        12-06-2017
5         84.46  02144    61        06-07-2016
5         84.46  02144    61        07-21-2017
5         84.46  02144    61        10-26-2016
I don't have access to create temporary tables or views, and there is no such thing as a @variable in C-tree, but in most ways it acts like MySQL. I wanted to use something like GROUP BY ItemNum, FeeSched and select MAX(Date). The issue is that unless I put Price into the GROUP BY I get an error.
I could run the query again only selecting ItemNum, FeeSched, Date and then doing an INNER JOIN, but with the query taking a minute to run each time, it seems there is a better way that maybe I don't know.
Here is my query I am running, it isn't really that complicated of a query other than the amount of data it is processing. Final results are about 50,000 rows. I can't share much about the database structure as it is covered under an NDA.
SELECT DISTINCT
CustomerNum,
paid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.primfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
UNION ALL
SELECT DISTINCT
CustomerNum,
secpaid as Price,
ItemNum,
n.pdate as newest
from admin.fullproclog as f
INNER JOIN (
SELECT
id,
itemId,
MAX(TO_CHAR(pdate, 'MM-DD-YYYY')) as pdate
from admin.fullproclog
WHERE pdate > timestampadd(sql_tsi_year, -3, NOW())
group by id, itemId
) as n ON n.id = f.id AND n.itemId = f.itemId AND n.pdate = f.pdate
LEFT join (SELECT itemId AS linkid, ItemNum FROM admin.itemlist) AS codes ON codes.linkid = f.itemId AND ItemNum >0
INNER join (SELECT DISTINCT parent_id,
MAX(ins1.feesched) as CustomerNum
FROM admin.customers AS p
left join admin.feeschedule AS ins1
ON ins1.feescheduleid = p.secfeescheduleid
left join admin.group AS c1
ON c1.insid = ins1.feesched
WHERE status =1
GROUP BY parent_id)
AS ip ON ip.parent_id = f.parent_id
WHERE CustomerNum >0 AND ItemNum >0
It seemed quite simple after I had read the first three paragraphs, but I got a little confused after reading the whole question.
Whatever you have done to get the data posted above, once you have the data in that form it is easy to retrieve "the most recent record grouped by ItemNum and FeeSched".
How to:
First, sort the whole result set by Date DESC.
Second, select the fields you need from the sorted result set and group by ItemNum, FeeSched without any aggregate functions.
So, the query might be something like this:
SELECT t.Price, t.ItemNum, t.FeeSched, t.Date
FROM (SELECT * FROM table ORDER BY Date DESC) AS t
GROUP BY t.ItemNum, t.FeeSched;
How it works:
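When your data is grouped and you select columns without aggregate functions, MySQL (with ONLY_FULL_GROUP_BY disabled) returns one row per group, typically the first one it encounters. Since you have sorted all rows before grouping, the first row would be "the most recent record". Note that MySQL does not guarantee which row of a group is picked, and other databases reject such a query, so check the result on your system.
Contact me if you run into any problems or errors with this approach.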
You can also try like this:
Select Price, ItemNum, FeeSched, Date from table where Date IN (Select MAX(Date) from table group by ItemNum, FeeSched,Customer);
The inner query returns the maximum date for each group (ItemNum, FeeSched, Customer), and the IN condition fetches only the records whose date is one of those maximum dates.
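For comparison, here is a sketch of the join-on-max pattern the question itself mentions: compute MAX(Date) per ItemNum and FeeSched in a derived table and join it back on all three columns, so each date is tied to its own group. It runs on an in-memory SQLite database from Python; the table name prices is hypothetical, the rows are the sample data from the question, and the dates are rewritten as YYYY-MM-DD so that MAX() on text sorts chronologically. Whether C-tree accepts the same syntax is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "prices" is a hypothetical name for the result set shown in the question.
CREATE TABLE prices (Customer INTEGER, Price REAL, ItemNum TEXT,
                     FeeSched INTEGER, Date TEXT);
INSERT INTO prices VALUES
  (5, 70.75, '01202', 12, '2017-12-06'),
  (5, 70.80, '01202', 12, '2016-06-07'),
  (5, 70.80, '01202', 12, '2017-07-21'),
  (5, 70.80, '01202', 12, '2016-10-26'),
  (5, 82.63, '02144', 61, '2017-12-06'),
  (5, 84.46, '02144', 61, '2016-06-07'),
  (5, 84.46, '02144', 61, '2017-07-21'),
  (5, 84.46, '02144', 61, '2016-10-26');
""")

latest = conn.execute("""
    SELECT p.ItemNum, p.FeeSched, p.Price, p.Date
    FROM prices p
    INNER JOIN (
        -- newest date per (ItemNum, FeeSched) group
        SELECT ItemNum, FeeSched, MAX(Date) AS max_date
        FROM prices
        GROUP BY ItemNum, FeeSched
    ) m
      ON  m.ItemNum  = p.ItemNum
      AND m.FeeSched = p.FeeSched
      AND m.max_date = p.Date
    ORDER BY p.ItemNum, p.FeeSched
""").fetchall()

print(latest)
```

If two rows of one group share the newest date, both are returned, so a tie-breaker column would be needed to force a single row per group.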

finding change between records in MySQL

I have a table where I am storing the number of barrels held in each of many tanks. I am storing values here every night at midnight, and at the beginning and end of any operator-initiated transfer.
What I want to return is the difference in the number of barrels since the previous event record for that specific tank. I have the correct ID from the self join to get the previous record number; however, the barrels value is incorrect.
Here is what I currently have.
SELECT
inventory.id,
MAX(inventory2.id) AS id2,
inventory.tankname,
inventory.barrels,
inventory.eventstamp,
inventory2.barrels
FROM
inventory
LEFT JOIN
inventory inventory2 ON inventory2.tankname = inventory.tankname AND inventory2.eventstamp < inventory.eventstamp
GROUP BY
inventory.id,
inventory.tankname,
inventory.barrels,
inventory.eventstamp
ORDER BY
inventory.tankname,
inventory.eventstamp
That returns the following
Just use correlated subqueries:
SELECT i.*,
(SELECT i2.id
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_id,
(SELECT i2.barrels
FROM inventory i2
WHERE i2.tankname = i.tankname AND
i2.eventstamp < i.eventstamp
ORDER BY i2.eventstamp DESC
LIMIT 1
) as prev_barrels
FROM inventory i
ORDER BY i.tankname, i.eventstamp;
Your query doesn't work because you have columns in the SELECT that are not in the GROUP BY and are not aggregated. That shouldn't be allowed in any database; it is unfortunate that MySQL does allow it.
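A minimal sketch of the correlated-subquery approach, run on an in-memory SQLite database from Python. The inventory rows are invented, and the outer query that subtracts prev_barrels from barrels is an addition for illustration, since the question asks for the change between records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (id INTEGER PRIMARY KEY, tankname TEXT,
                        barrels INTEGER, eventstamp TEXT);
-- Hypothetical readings for two tanks.
INSERT INTO inventory VALUES
  (1, 'A', 100, '2020-01-01 00:00:00'),
  (2, 'B',  50, '2020-01-01 00:00:00'),
  (3, 'A',  80, '2020-01-01 10:00:00'),
  (4, 'A', 120, '2020-01-02 00:00:00'),
  (5, 'B',  65, '2020-01-02 00:00:00');
""")

changes = conn.execute("""
    SELECT id, tankname, barrels, prev_barrels,
           barrels - prev_barrels AS change
    FROM (
        SELECT i.*,
               -- most recent earlier reading for the same tank
               (SELECT i2.barrels
                FROM inventory i2
                WHERE i2.tankname = i.tankname
                  AND i2.eventstamp < i.eventstamp
                ORDER BY i2.eventstamp DESC
                LIMIT 1) AS prev_barrels
        FROM inventory i
    ) t
    ORDER BY tankname, eventstamp
""").fetchall()

for row in changes:
    print(row)
```

The first reading of each tank has no predecessor, so prev_barrels and change are NULL for those rows.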

optimize Mysql: get latest status of the sale

In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 sec and returns 1011 records.
Because I have to filter the sales by the name of the state, I would have to repeat the subquery in a WHERE clause, so I have decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1011 records), but the problem is that it takes 0.0753 sec.
Reviewing the possibilities, I have found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time. Why does it work better? Is there any way to use this clause with the joins? I hope you can help.
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because WHERE conditions are applied before the subselects in the SELECT list are evaluated, so if there is no record to go with the sale, it doesn't bother processing the subselect. That saves you some time, which is why it works better.
Will it work in your join syntax? I don't really know without having your tables to test against, but you can always just add it to the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you, is this even necessary? You have a query returning in under 8 hundredths of a second. Unless this query is being run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.