Taking sum from master and child table - mysql

I hope that I am able to explain the situation as much as possible :)
We need to take sum from Master and child records of MySQL tables. The current query is as follows:
select sum(
abs(ifnull(dt.N_RETAIL_PRICE,0 ) * ifnull(dt.N_QTY_NO ,0) )
+ ifnull(st.shipping_total,0 ) + ifnull(st.TaxAmount,0 ) - abs(ifnull(st.discount ,0))
) Total
FROM inv_store_transaction st
inner join inv_store_transaction_det dt
on st.OID = dt.INV_STORE_TRANSACTION
where st.INV_TRANSACTION_TYPE = 35
and st.INV_STORES = 1
The issue we suspect is that if the detail table has more than one row per master row, the master's columns get summed that many times.
E.g. if the detail has, say, 3 rows, then its master row's values are counted 3 times in the sum.
To summarize, we need to take a grand total of all Invoices that fall under the given condition.
Any help appreciated.

The solution to this problem is to pre-aggregate the detail data:
select sum(dt.dtamt) + sum(st.shipping_total) + sum(st.TaxAmount) -
       sum(abs(st.discount)) as Total
FROM inv_store_transaction st inner join
     (select dt.INV_STORE_TRANSACTION,
             sum(abs(coalesce(dt.N_RETAIL_PRICE, 0) * coalesce(dt.N_QTY_NO, 0))) as dtamt
      from inv_store_transaction_det dt
      group by dt.INV_STORE_TRANSACTION
     ) dt
     on st.OID = dt.INV_STORE_TRANSACTION
where st.INV_TRANSACTION_TYPE = 35 and st.INV_STORES = 1
You don't need to test for NULL inside sum(): the aggregate skips NULL values, unless every value in the group is NULL (in which case the sum itself is NULL).
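A quick illustration of that NULL handling (my snippet, not part of the original answer):
SELECT SUM(x) FROM (SELECT 1 AS x UNION ALL SELECT NULL) t; -- returns 1, the NULL is skipped
SELECT SUM(x) FROM (SELECT CAST(NULL AS SIGNED) AS x) t;    -- returns NULL, every input is NULL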

Related

improve sql query with 2 EXISTS sub queries

I have this query (mysql):
SELECT `budget_items`.*
FROM `budget_items`
WHERE (budget_category_id = 4
AND ((is_custom_for_family = 0)
OR (is_custom_for_family = 1
AND custom_item_family_id = 999))
AND ((EXISTS
(SELECT 1
FROM balance_histories
WHERE balance_histories.budget_item_id = budget_items.id
AND balance_histories.family_id = 999
AND payment_date >= '2021-02-01'
AND payment_date <= '2021-02-28' ))
OR (EXISTS
(SELECT 1
FROM budget_lines
WHERE family_id = 999
AND budget_id = 188311
AND budget_item_id = budget_items.id
AND amount > 0))))
It runs multiple times on app start and takes more than 10 seconds in total.
I have indexes on:
balance_histories table: budget_item_id, family_id (tried also payment_date)
budget_lines table: family_id, budget_id, budget_item_id
How can I improve the speed? Query or maybe mysql (8) configuration.
(schema screenshots of the balance_histories and budget_lines tables omitted)
I would start this query in reverse of what you have. Assuming you could have years of data, while your EXISTS queries look more narrowly at a date range or at specific budget lines, start there: that set will probably be much smaller. Once you have the DISTINCT IDs, go back to the budget items by qualified ID plus the additional criteria.
To help optimize the queries, I would have these indexes:
balance_histories ( family_id, payment_date, budget_item_id )
budget_lines ( family_id, budget_id, amount )
budget_items ( id, budget_category_id, is_custom_for_family, custom_item_family_id )
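Expressed as DDL, that would be something like the following (index names are my own, illustrative only):
CREATE INDEX ix_bh_family_date_item ON balance_histories (family_id, payment_date, budget_item_id);
CREATE INDEX ix_bl_family_budget_amount ON budget_lines (family_id, budget_id, amount);
CREATE INDEX ix_bi_id_cat_custom ON budget_items (id, budget_category_id, is_custom_for_family, custom_item_family_id);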
select
bi.*
from
-- pre-query a list of DISTINCT IDs from the balance history
-- and budget lines that qualify. THEN join to the rest.
( select distinct
bh.budget_item_id id
from
balance_histories bh
where
bh.family_id = 999
AND bh.payment_date >= '2021-02-01'
AND bh.payment_date <= '2021-02-28'
UNION
select
bl.budget_item_id
FROM
budget_lines bl
WHERE
bl.family_id = 999
AND bl.budget_id = 188311
AND bl.amount > 0 ) PQ
JOIN budget_items bi
on PQ.id = bi.id
AND bi.budget_category_id = 4
AND ( bi.is_custom_for_family = 0
OR
( bi.is_custom_for_family = 1
AND bi.custom_item_family_id = 999 )
)
Feedback
As with many SQL queries, there are typically multiple ways to get a solution. Sometimes EXISTS works well, sometimes not so much. You need to consider the cardinality of your data, and that is what I was aiming for. Look at what you were asking for first: get budget items in category 4 where is_custom_for_family is 0, or is 1 and the family is 999. You were correct in your balance of AND/OR. However, that goes through EVERY record, and if you have millions of rows, that is what you are scanning through. Only after scanning every row does it run a secondary query (for each record that qualified) against the histories for the specific date range OR family/budget.
My guess is that the number of records returned from your two EXISTS queries would be very small, so starting with a DISTINCT list of just those IDs from the union gives you that very small subset. Once that ID list is found, it becomes a direct match to the budget_items table, with the final filtering on category ID / family / custom-item considerations.
Having indexes that better match the context of your WHERE clause will optimize pulling the data. I have answered several other questions with similar resolutions, clarifying the indexes and the reasoning behind them.

SQL Create variables from 2 different tables

I have 2 different tables called observations and intervals.
observations:
id  type    date
1   recess  03.05.2011 17:00
2   recess  03.06.2011 12:00
intervals:
id  observation_id  value
1   1               5
2   1               8
3   2               4
4   2               4
I want a view that will display:
observation_id  percent_positive
1               .5
2               0
where percent_positive is (count of intervals with value = 5) / (total number of intervals for that observation).
I know
Select observation_id, Count(*) from intervals where value = 5 Group by observation_id
will give me (observation 2 drops out entirely, since it has no rows with value = 5):
1  1
and
Select observation_id, Count(*) from intervals Group by observation_id
will give me:
1  2
2  2
So how do I combine these to create a view with the percent_positive variable I'm looking for?
You can use joins to fetch data from two tables that share a common column. For more detail, please read Multiple values from multiple Tables.
This gave me your desired result. I'm not proficient enough in SQL to say whether it is the optimal way of solving the issue, though.
SELECT
    o.id AS obs,
    (SELECT COUNT(*) FROM intervals WHERE observation_id = o.id AND value = 5)
    / (SELECT COUNT(*) FROM intervals WHERE observation_id = o.id) AS percent
FROM observations o
JOIN intervals i ON o.id = i.observation_id
GROUP BY o.id;
SELECT
i.observation_id,
SUM(IF(i.value=5,1,0)) / counts.num as 'percent_positive'
FROM intervals i
inner join (
select observation_id, count(1) as num from intervals group by observation_id
) counts on counts.observation_id = i.observation_id
group by i.observation_id
order by i.observation_id
;
That oughta get you close, can't actually run to test at the moment. I'm not sure about the significance of the value 5 meaning positive, so the i.value=5 test might need to be modified to do what you want. This will only include observation IDs with intervals that refer to them; if you want ALL observations then you'll need to join that table (select from that table and left join the others, to be precise). Of course the percentage for those IDs will be 0 divided by 0 anyway...
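For what it's worth, MySQL treats boolean expressions as 0/1, so the whole ratio can also be computed in one pass (a sketch, assuming value = 5 is what counts as positive):
SELECT observation_id, AVG(value = 5) AS percent_positive
FROM intervals
GROUP BY observation_id
ORDER BY observation_id;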

Number duplicate records on the MySQL table

Have a table with a similar schema:
id  control  code  amount
1   200      12    300
2   400      12    300
3   200      12    300
4   100      10    400
5   100      10    400
6   500      13    500
Trying to list the duplicates of records on a UI.
Using the following query I can retrieve the duplicate records and show them on the UI.
select * from mwt group by control,code,amount having count(id) > 1;
id  control  code  amount
1   200      12    300
4   100      10    400
Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.
On the UI, the user will click a check-box adjacent to a record, and the corresponding duplicate records should be populated in the UI. To make things easier I am trying to populate another column named dup_id; using this dup_id it is possible to filter the results in the UI, which is in JSON format.
How to create a result set similar to the one shown below?
id  control  code  amount  dup_id
1   200      12    300     1
2   400      12    300
3   200      12    300     1
4   100      10    400     4
5   100      10    400     4
6   500      13    500
This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...
SELECT x.*
, y.dup_id
FROM my_table x
LEFT
JOIN
( SELECT MIN(id) dup_id
, control
, code
, amount
FROM my_table
GROUP
BY control
, code
, amount
HAVING COUNT(*) > 1
) y
ON y.control = x.control
AND y.code = x.code
AND y.amount = x.amount;
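An aside for current readers: these answers predate MySQL 8. With window functions, the same MIN(id)-per-group idea can be written in one pass (my sketch, not from the original answers):
SELECT id, control, code, amount,
       CASE WHEN COUNT(*) OVER (PARTITION BY control, code, amount) > 1
            THEN MIN(id) OVER (PARTITION BY control, code, amount)
       END AS dup_id
FROM my_table
ORDER BY id;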
Depending on how accurate the order has to be, you could do something like this.
This is getting all the unique control / code / amount with a count, to get a flag to know if that is a duplicate row, and ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.
Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND it is a duplicate row. Then sets user variables to store the previous values of control / code / amount.
The outer query then orders the results back in to id order.
SELECT sub3.id,
sub3.control,
sub3.code,
sub3.amount,
sub3.dup_id
FROM
(
SELECT sub2.id,
sub2.control,
sub2.code,
sub2.amount,
@cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
@control:=control,
@code:=code,
@amount:=amount,
IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
FROM
(
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count
FROM mwt
LEFT OUTER JOIN
(
SELECT control, code, amount, COUNT(id) AS id_count
FROM mwt
GROUP BY control,code,amount
HAVING id_count > 1
) sub1
ON mwt.control = sub1.control
AND mwt.code = sub1.code
AND mwt.amount = sub1.amount
ORDER BY mwt.control, mwt.code, mwt.amount
) sub2
CROSS JOIN
(
SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
) sub0
) sub3
ORDER BY id
Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).
EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id
FROM mwt
LEFT OUTER JOIN
(
SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
FROM
(
SELECT MIN(id) AS id, control, code, amount
FROM mwt
GROUP BY control,code,amount
HAVING COUNT(id) > 1
ORDER BY id
) sub1
CROSS JOIN
(
SELECT @cnt:=0
) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id
Do you really need a dup_id column? I hope this can be achieved with a simple query like the one below:
select id
     , control
     , code
     , amount
from mwt
where control = ?  -- control of the selected record
  and code    = ?  -- code of the selected record
  and amount  = ?  -- amount of the selected record
  and id     <> ?  -- id of the selected record
You can very well omit the last condition (id <> ?) if the requirement is to list the duplicates including the selected record.

MySQL GROUP BY DateTime +/- 3 seconds

Suppose I have a table with 3 columns:
id (PK, int)
timestamp (datetime)
title (text)
I have the following records:
1, 2010-01-01 15:00:00, Some Title
2, 2010-01-01 15:00:02, Some Title
3, 2010-01-02 15:00:00, Some Title
I need to do a GROUP BY on records that are within 3 seconds of each other. For this table, rows 1 and 2 would be grouped together.
There is a similar question here: Mysql DateTime group by 15 mins
I also found this: http://www.artfulsoftware.com/infotree/queries.php#106
I don't know how to convert these methods into something that will work for seconds. The trouble with the method on the SO question is that it seems to me that it would only work for records falling within a bin of time that starts at a known point. For instance, if I were to get FLOOR() to work with seconds, at an interval of 5 seconds, a time of 15:00:04 would be grouped with 15:00:01, but not grouped with 15:00:06.
Does this make sense? Please let me know if further clarification is needed.
EDIT: For the set of numbers, {1, 2, 3, 4, 5, 6, 7, 50, 51, 60}, it seems it might be best to group them {1, 2, 3, 4, 5, 6, 7}, {50, 51}, {60}, so that each grouping row depends on if the row is within 3 seconds of the previous. I know this changes things a bit, I'm sorry for being wishywashy on this.
I am trying to fuzzy-match logs from different servers. Server #1 may log an item, "Item #1", and Server #2 will log that same item, "Item #1", within a few seconds of server #1. I need to do some aggregate functions on both log lines. Unfortunately, I only have title to go on, due to the nature of the server software.
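An aside for readers on MySQL 8+ (which did not exist when this was asked): the "within 3 seconds of the previous row" grouping from the edit above is a classic gaps-and-islands problem that window functions handle directly. A sketch, assuming a table my_table(id, timestamp, title) and ignoring the per-title matching for brevity: flag each row that starts a new chain, then take a running sum of the flags as the group id.
SELECT id, `timestamp`, title,
       SUM(new_chain) OVER (ORDER BY `timestamp`, id) AS chain_id
FROM (
    SELECT id, `timestamp`, title,
           -- LAG is NULL for the first row, so the CASE falls through to 1 there too
           CASE WHEN TIMESTAMPDIFF(SECOND,
                    LAG(`timestamp`) OVER (ORDER BY `timestamp`, id),
                    `timestamp`) <= 3
                THEN 0 ELSE 1 END AS new_chain
    FROM my_table
) x
ORDER BY `timestamp`, id;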
I'm using Tom H.'s excellent idea but doing it a little differently here:
Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and find the rows that match those times.
Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:
SELECT DISTINCT Timestamp
FROM Table a
LEFT JOIN Table b
ON (b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
AND b.Timestamp < a.Timestamp)
WHERE b.Timestamp IS NULL
And then for each row, we can find the largest chain-starting timestamp that is less than our timestamp with Query #2:
SELECT Table.id, MAX(StartOfChains.TimeStamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartofChains
ON Table.Timestamp >= StartOfChains.TimeStamp
GROUP BY Table.id
Once we have that, we can GROUP BY it as you wanted.
SELECT COUNT(*) --or whatever
FROM Table
JOIN ([query #2]) GroupingQuery
ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime
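Substituting query #1 into query #2 and that into the final query gives one runnable statement (my composition of the three steps above; Table is backquoted because TABLE is a reserved word):
SELECT GroupingQuery.ChainStartTime, COUNT(*) AS rows_in_chain
FROM `Table`
JOIN (
    SELECT t.id, MAX(StartOfChains.Timestamp) AS ChainStartTime
    FROM `Table` t
    JOIN (
        SELECT DISTINCT a.Timestamp
        FROM `Table` a
        LEFT JOIN `Table` b
               ON b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
              AND b.Timestamp < a.Timestamp
        WHERE b.Timestamp IS NULL
    ) StartOfChains ON t.Timestamp >= StartOfChains.Timestamp
    GROUP BY t.id
) GroupingQuery ON `Table`.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime;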
I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!
Now that I think that I understand your problem, based on your comment response to OMG Ponies, I think that I have a set-based solution. The idea is to first find the start of any chains based on the title. The start of a chain is going to be defined as any row where there is no match within three seconds prior to that row:
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
Now we can assume that any non-chain starters belong to the chain starter that appeared before them. Since MySQL doesn't support CTEs, you might want to throw the above results into a temporary table, as that would save you the multiple joins to the same subquery below.
SELECT
SQ1.my_id,
COUNT(*) -- You didn't say what you were trying to calculate, just that you needed to group them
FROM
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ1
INNER JOIN My_Table MT3 ON
MT3.title = SQ1.title AND
MT3.my_time >= SQ1.my_time
LEFT OUTER JOIN
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ2 ON
SQ2.title = SQ1.title AND
SQ2.my_time > SQ1.my_time AND
SQ2.my_time <= MT3.my_time
WHERE
SQ2.my_id IS NULL
This would look much simpler if you could use CTEs or if you used a temporary table. Using the temporary table might also help performance.
Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.
EDIT: Changed the queries to handle exact matches by timestamp.
Warning: Long answer. This should work, and is fairly neat, except for one step in the middle where you have to be willing to run an INSERT statement over and over until it doesn't do anything since we can't do recursive CTE things in MySQL.
I'm going to use this data as the example instead of yours:
id Timestamp
1 1:00:00
2 1:00:03
3 1:00:06
4 1:00:10
Here is the first query to write:
SELECT a.id as aid, b.id as bid
FROM Table a
JOIN Table b
ON (ABS(TIMESTAMPDIFF(SECOND, a.Timestamp, b.Timestamp)) <= 3) -- within 3 seconds
It returns:
aid bid
1 1
1 2
2 1
2 2
2 3
3 2
3 3
4 4
Let's create a nice table to hold those things that won't allow duplicates:
CREATE TABLE
Adjacency
( aid INT(11)
, bid INT(11)
, PRIMARY KEY (aid, bid) -- important for later
)
Now the challenge is to find something like the transitive closure of that relation.
To do so, let's find the next level of links. By that I mean: since we have 1 2 and 2 3 in the Adjacency table, we should add 1 3:
INSERT IGNORE INTO Adjacency(aid,bid)
SELECT adj1.aid, adj2.bid
FROM Adjacency adj1
JOIN Adjacency adj2
ON (adj1.bid = adj2.aid)
This is the non-elegant part: You'll need to run the above INSERT statement over and over until it doesn't add any rows to the table. I don't know if there is a neat way to do that.
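One way to automate that repetition is a stored procedure that loops until the INSERT stops adding rows (my sketch, not part of the original answer; ROW_COUNT() reports how many rows the last INSERT actually added, and INSERT IGNORE does not count the duplicates it skips):
DELIMITER //
CREATE PROCEDURE close_adjacency()
BEGIN
    REPEAT
        INSERT IGNORE INTO Adjacency (aid, bid)
        SELECT adj1.aid, adj2.bid
        FROM Adjacency adj1
        JOIN Adjacency adj2 ON adj1.bid = adj2.aid;
    UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;
CALL close_adjacency();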
Once this is over, you will have a transitively-closed relation like this:
aid bid
1 1
1 2
1 3 --added
2 1
2 2
2 3
3 1 --added
3 2
3 3
4 4
And now for the punchline:
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
returns:
aid Neighbors
1 1,2,3
2 1,2,3
3 1,2,3
4 4
So
SELECT DISTINCT Neighbors
FROM (
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
) Groupings
returns
Neighbors
1,2,3
4
Whew!
I like @Chris Cunningham's answer, but here's another take on it.
First, my understanding of your problem statement (correct me if I'm wrong):
You want to look at your event log as a sequence, ordered by the time of the event,
and partition it into groups, defining the boundary as an interval of
more than 3 seconds between two adjacent rows in the sequence.
I work mostly in SQL Server, so I'm using SQL Server syntax. It shouldn't be too difficult to translate into MySQL SQL.
So, first our event log table:
--
-- our event log table
--
create table dbo.eventLog
(
id int not null ,
dtLogged datetime not null ,
title varchar(200) not null ,
primary key nonclustered ( id ) ,
unique clustered ( dtLogged , id )
)
Given the above understanding of the problem statement, the following query should give you the upper and lower bounds your groups. It's a simple, nested select statement with 2 group by to collapse things:
The innermost select defines the upper bound of each group. That upper boundary defines a group.
The outer select defines the lower bound of each group.
Every row in the table should fall into one of the groups so defined, and any given group may well consist of a single date/time value.
[edited: the upper bound is the lowest date/time value where the interval is more than 3 seconds]
select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
You could then pull rows from the event log and tag them with the group to which they belong thus:
select *
from ( select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
) period
join dbo.EventLog t on t.dtLogged >= period.dtFrom
and t.dtLogged <= coalesce( period.dtThru , t.dtLogged )
order by period.dtFrom , period.dtThru , t.dtLogged
Each row is tagged with its group via the dtFrom and dtThru columns returned. You could get fancy and assign an integral row number to each group if you want.
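Picking up that last aside: a dense_rank() over the group's start time would assign that integer id (same SQL Server syntax, reusing the period derivation from above as a sketch):
select groupId = dense_rank() over ( order by period.dtFrom ) ,
       t.*
from ( select dtFrom = min( t.dtFrom ) ,
              dtThru = t.dtThru
       from ( select dtFrom = t1.dtLogged ,
                     dtThru = min( t2.dtLogged )
              from dbo.EventLog t1
              left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
                                       and datediff(second,t1.dtLogged,t2.dtLogged) > 3
              group by t1.dtLogged
            ) t
       group by t.dtThru
     ) period
join dbo.EventLog t on t.dtLogged >= period.dtFrom
                   and t.dtLogged <= coalesce( period.dtThru , t.dtLogged )
order by period.dtFrom , t.dtLogged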
Simple query:
SELECT * FROM time_history GROUP BY ROUND(UNIX_TIMESTAMP(time_stamp)/3);
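Note that this groups rows into fixed 3-second buckets anchored at the Unix epoch, not into chains, which is exactly the binning behavior the asker was worried about. You can inspect the bucket keys directly (illustrative):
SELECT time_stamp, ROUND(UNIX_TIMESTAMP(time_stamp)/3) AS bucket
FROM time_history
ORDER BY time_stamp;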

sql query to get duplicate records with different dates

I need to get records that are duplicates of each other but have different dates.
table Sites:
field id
reference
created
Every day we add a lot of records, so I need a function that extracts all existing records that are duplicated by the rows just added, in order to send some notifications.
The condition I can't get right is that the difference between the current day's records and the older data in the table should be between one day and 4 days.
Is there any simple query to do that without using a transaction?
I'm not sure I totally understand what you mean by duplicate records, but here's a basic date query:
SELECT fieldId, reference, created, DATE(created) AS the_date
FROM Sites
WHERE DATE(created)
BETWEEN DATE(DATE_SUB(NOW(), INTERVAL 3 DAY))
AND DATE(NOW())
I'm making several assumptions, such as:
You don't want the "first" row returned
Duplicates don't carry the date forward (the next one after the initial 4 days is not a duplicate)
The 4 days means +4 days, so Day 5 is included
So, my code is:
with originals as (
select s1.*
from sites as s1
where 0 = (
select count(*)
from sites as s2
where s1.field_id = s2.field_id
and s1.reference = s2.reference
and s1.created <> s2.created
and DATEDIFF(DAY,s2.created, s1.created) between 1 and 4
)
)
select s1.*
from sites as s1
inner join originals as o
on s1.field_id = o.field_id
and s1.reference = o.reference
and s1.created <> o.created
where DATEDIFF(DAY,o.created, s1.created) between 1 and 4
order by 1,2,3;
Here it is in a fiddle: http://sqlfiddle.com/#!3/9b407/20
This could be simpler if some conditions are relaxed.
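Note the query above uses SQL Server syntax, matching the fiddle. Since the question is tagged MySQL, here is a rough MySQL 8 adaptation (my sketch: MySQL's DATEDIFF takes two arguments and returns days, WITH needs 8.0+, and NOT EXISTS replaces the zero-count test):
with originals as (
    select s1.*
    from sites as s1
    where not exists (
        select 1
        from sites as s2
        where s2.field_id  = s1.field_id
          and s2.reference = s1.reference
          and s2.created  <> s1.created
          and datediff(s1.created, s2.created) between 1 and 4
    )
)
select s1.*
from sites as s1
inner join originals as o
        on s1.field_id  = o.field_id
       and s1.reference = o.reference
       and s1.created  <> o.created
where datediff(s1.created, o.created) between 1 and 4
order by 1, 2, 3;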
Thanks a lot to everyone who tried to help me. I found this solution after a lot of testing:
SELECT `id`,`reference`,count(`config_id`) as c,`created` FROM `sites`
where datediff(date(current_date()),date(`created`)) < 4
group by `reference`
having c > 1
Thanks a lot for your help.