I have a table called stocktransfer
Rows truncated to keep it simple here.
As per the image, the problem with the data is that the same transaction number appears under two different invoice numbers, which is incorrect under our business logic.
Duplicate transaction numbers are expected as long as they fall under the same invoice number.
I wrote this query, but it does not help, since it also flags the expected duplicates.
Select strefno,sttran,STDATE,count(sttran)
From postfrh
Group By sttran,strefno,STDATE
Having Count(sttran) >1
Order By sttran
Can anyone please help with how to write a query to find duplicated transaction numbers where the invoice numbers are different too?
strefno > TransactionNumber
sttran > InvoiceNumber
STDATE > Date
SELECT strefno,
       sttran,
       STDATE,
       ROW_NUMBER() OVER (PARTITION BY strefno
                          ORDER BY STDATE) AS `rowNumber`
FROM postfrh
WHERE strefno IN
      (SELECT strefno
       FROM postfrh
       GROUP BY strefno
       -- a transaction number appearing under more than one distinct invoice number
       HAVING COUNT(DISTINCT sttran) > 1)
ORDER BY strefno;
You are probably looking for something like this. I don't have the exact table so I cannot be sure.
select distinct a.tnum
from postfrh as a
join postfrh as b
  on a.tnum = b.tnum
 and b.inum != a.inum
(tnum = transaction number, inum = invoice number)
There are several ways to approach the problem, but the above query works by joining two instances of the table: the first join condition keeps only row pairs that share a transaction number, and the second filters out pairs that also share the same invoice number. DISTINCT collapses the repeated matches so each offending transaction number is listed once.
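An equivalent aggregate approach - a sketch using the same assumed column names - counts the distinct invoice numbers per transaction number and keeps those with more than one:

select tnum
from postfrh
group by tnum
having count(distinct inum) > 1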
I am trying to learn a better way of achieving the desired result of a select query - details below. Thank you in advance.
MySQL version: 5.7
Table:
id int(11)
product_number int(8)
service_group int(4)
datetime datetime
value int(6)
Indexes on all but the value column.
The MySQL table has the following data:
id,product_number, service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,0
2,1234,1,2022-02-10 00:01:30,25
3,1234,1,2022-02-10 00:02:30,11
4,1234,2,2022-02-10 01:00:30,0
5,1234,2,2022-02-10 01:01:30,65
6,1234,2,2022-02-10 01:02:30,55
In essence, the value for each product within a service group is wrongly recorded: the correct value for the "current" row is actually recorded against the next row for the same product and service group. The correct output should look like this:
id,product_number, service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,25
2,1234,1,2022-02-10 00:01:30,11
3,1234,1,2022-02-10 00:02:30,0
4,1234,2,2022-02-10 01:00:30,65
5,1234,2,2022-02-10 01:01:30,55
6,1234,2,2022-02-10 01:02:30,0
The query below seems to be a hugely inefficient way of returning the correct results - what would be a better way to go about this in MySQL? Thank you.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(
Select b.value FROM products b
Where b.product_number=a.product_number AND b.service_group=a.service_group
AND b.datetime>a.datetime
Order by b.datetime ASC
Limit 1
) AS corrected_value
FROM products a
If there's no skipped id (the numbers are in sequence), then you could probably use a simple select like the two below.
1.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b Where b.id = a.id+1)
FROM products a
2.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
b.value
FROM products a
INNER JOIN products b ON b.id = a.id+1
Note that both SQL 1 and 2 assume your ID is the primary key, as I see it's an incrementing value.
Either way, you need to run an EXPLAIN on each query so you can analyze which one is the most efficient.
More importantly, if the data is "wrongly recorded", I suggest fixing it: put your service into maintenance mode and repair the data with an UPDATE, for example the sketch below.
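A minimal sketch of such a one-off repair, under the same sequential-id assumption as SQL 1 and 2. The derived table is materialized first, so the pre-update values are read; the product and service group match keeps values from leaking across groups, and rows with no successor default to 0. Test it on a copy of the table first:

UPDATE products a
LEFT JOIN (SELECT id, product_number, service_group, value FROM products) b
       ON b.id = a.id + 1
      AND b.product_number = a.product_number
      AND b.service_group = a.service_group
SET a.value = IFNULL(b.value, 0);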
Edit:
based on your comment "Hi, Gunawan. Thank you for your suggestion. Unfortunately IDs will not be in sequences to support the proposed approach."
You could alter the subquery in (1) a bit - adding a product and service group match so values do not leak across groups - to
Select b.value
FROM products b
Where b.product_number = a.product_number
AND b.service_group = a.service_group
AND b.id > a.id
Order by b.id asc Limit 1
so it becomes
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b
 Where b.product_number = a.product_number
 AND b.service_group = a.service_group
 AND b.id > a.id
 Order by b.id asc Limit 1) AS next_value
FROM products a
I think what you need in THIS case is the window function LEAD() - an example and clarification can be found here.
In summary, LEAD() looks at the NEXT possible record for the given column in question, LAG() looks at the prior.
So in this example, I am asking for the LEAD() of the record (the next one in line) and getting the VALUE column of that record. The 1 represents how many records to skip ahead, in this case 1. The last parameter 0 is what should be returned if no such record exists.
The ORDER BY inside the OVER clause identifies what counts as the "next" record, in this case the datetime sequence.
I included both the value and the NEXTVALUE so you can see the results; you can remove the extra column once you see and understand how it works.
But since each service group stands on its own, and you don't want to carry over a value from another group, you need the PARTITION BY clause as well. So I added that as an additional column... both so you can see how the results work with or without it, and the impact on the query you need.
select
t.id,
t.product_number,
t.service_group,
t.datetime,
t.value,
-- lead(t.value, 1, 0)
--     over (order by t.datetime) as NextValue,
-- The commented sample above supplies a default of 0 but has no
-- PARTITION BY, so it would carry a value across service groups.
-- The call below partitions per service group; without the ", 1, 0"
-- default it returns NULL where the service group changes, since
-- there is no next row in the partition.
lead(t.value)
    over ( PARTITION BY t.service_group
           order by t.datetime ) as NextValuePerServiceGroup
from
products t
order by
t.service_group,
t.datetime
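One caveat: LEAD() and the OVER clause require MySQL 8.0 or later, while the question states MySQL 5.7. On 5.7, a sketch in the spirit of the question's own correlated subquery - with the group match and a 0 default, and ideally an index on (product_number, service_group, datetime, value) to keep each lookup cheap - would be:

SELECT a.id,
       a.product_number,
       a.service_group,
       a.datetime,
       IFNULL((SELECT b.value
               FROM products b
               WHERE b.product_number = a.product_number
                 AND b.service_group = a.service_group
                 AND b.datetime > a.datetime
               ORDER BY b.datetime ASC
               LIMIT 1), 0) AS corrected_value
FROM products a;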
I have an orders table that contains the orders_id, customers_email_address and date_purchased. I want to write a SQL query that will, for each line in the table, add a new field called 'repeat_order_count' that shows how many times this customer ordered before and including this order.
For example, if John ordered once before this order, the repeat_order_count would be 2 for this order, or in other words, this is the second time John has ordered. The next order row I encounter for John will have a 3, and so on. This will allow me to create a line graph that shows the number of orders placed by repeat customers over time. I can now go to a specific time in the past and figure out how many orders were placed by repeat customers during that time period:
SELECT
*
FROM orders
WHERE repeat_order_count > 1
AND date_purchased = January 2014 --(simplifying things here)
I'm also able to determine now WHEN a customer became a repeat customer.
I can't figure out the query to solve this. Or perhaps there may be an easier way to do this?
One approach to retrieving the specified result would be to use a correlated subquery in the SELECT list. This assumes that the customer identifier is customers_email_address, and that date_purchased is a DATETIME or TIMESTAMP (or other canonical format), and that there are no duplicated values for the same customer (that is, the customer doesn't have two or more orders with the same date_purchased value.)
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, ( SELECT COUNT(1)
FROM orders p
WHERE p.customers_email_address = s.customers_email_address
AND p.date_purchased < s.date_purchased
) AS previous_order_count
FROM orders s
ORDER
BY s.customers_email_address
, s.date_purchased
The correlated subquery will return 0 for the "first" order for a customer, and 1 for the "second" order. If you want to include the current order in the count, replace the < comparison operator with <= operator.
FOLLOWUP
For performance of that query, we need to be particularly concerned with the performance of the correlated subquery, since that is going to be executed for every row in the table. (A million rows in the table means a million executions of that subquery.) Having a suitable index available is going to be crucial.
For the query in my answer, I'd recommend trying an index like this:
ON orders (customers_email_address, date_purchased, orders_id)
With that index in place, we'd expect EXPLAIN to show the index being used by both the outer query, to satisfy the ORDER BY (No "Using filesort" in the Extra column), and as a covering index (no lookups to the pages in the underlying table, "Using index" shown in the Extra column.)
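A sketch of creating that index and verifying the plan (the index name is illustrative):

ALTER TABLE orders
  ADD INDEX ix_cust_date_id (customers_email_address, date_purchased, orders_id);

EXPLAIN
SELECT s.orders_id
     , s.customers_email_address
     , s.date_purchased
     , ( SELECT COUNT(1)
         FROM orders p
         WHERE p.customers_email_address = s.customers_email_address
           AND p.date_purchased < s.date_purchased
       ) AS previous_order_count
FROM orders s
ORDER
   BY s.customers_email_address
    , s.date_purchased;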
The answer I gave demonstrated just one approach. It's also possible to return an equivalent result using a join pattern, for example:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.date_purchased <= s.date_purchased
GROUP
BY s.customers_email_address
, s.date_purchased
, s.orders_id
ORDER
BY s.customers_email_address
, s.date_purchased
, s.orders_id
(This query is based on some additional information provided in a comment, which wasn't available before: orders_id is UNIQUE in the orders table.)
If we are guaranteed that the orders_id of a "previous" order is always less than the orders_id of any later order, then it would be possible to use that column in place of the date_purchased column. We'd want a suitable index available:
... ON orders (customers_email_address, orders_id, date_purchased)
NOTE: The order of the columns in the index is important. With that index, we could do:
SELECT s.orders_id
, s.customers_email_address
, s.date_purchased
, COUNT(p.orders_id)
FROM orders s
JOIN orders p
ON p.customers_email_address = s.customers_email_address
AND p.orders_id <= s.orders_id
GROUP
BY s.customers_email_address
, s.orders_id
ORDER
BY s.customers_email_address
, s.orders_id
Again, we'd want to review the output from EXPLAIN to verify that the index is being used for both the join operation and the GROUP BY operation.
NOTE: With the inner join, we need to use a <= comparison, so we get at least one matching row back. We could either subtract 1 from that result, if we wanted a count of only "previous" orders (not counting the current order), or we could use an outer join operation with a < comparison, so we could get a row back with a count of 0.
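On MySQL 8.0 and later, a window function can return the same running count without a join or correlated subquery - a sketch, relying on orders_id being unique as noted above:

SELECT s.orders_id
     , s.customers_email_address
     , s.date_purchased
     , ROW_NUMBER() OVER ( PARTITION BY s.customers_email_address
                           ORDER BY s.date_purchased, s.orders_id ) AS repeat_order_count
FROM orders s
ORDER
   BY s.customers_email_address
    , s.date_purchased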
When you are inserting into your orders table, you can populate your OrderCount column with a correlated subquery (IFNULL below is the MySQL equivalent of ISNULL; @currentCustomer stands in for the current customer's identifier).
eg:
select
col1,
col2,
(ifnull((select count(*) from orders where custID = @currentCustomer), 0) + 1),
col4
Note that you wouldn't be adding the field when the 2nd order is processed; the field would already exist, and you would just be populating it.
I have a table with user transactions. I need to select users whose total transactions exceed 100 000 in a single day. Currently what I'm doing is gathering all user ids and executing
SELECT sum ( amt ) as amt from users where date = date("Y-m-d") AND user_id=id;
for each id, and checking whether the amt > 100k or not.
Since it's a large table, it's taking a lot of time to execute. Can someone suggest an optimised query?
This will do:
SELECT SUM(amt) AS amt, user_id FROM users
WHERE date = date("Y-m-d")
GROUP BY user_id
HAVING SUM(amt) > 100000; -- 100 000 (one lakh)
What about filtering the records first and then applying the sum, like below:
select SUM(amt), user_id from (
SELECT amt, user_id from users where date = date("Y-m-d")
) tmp
group by user_id having sum(amt) > 100000
What datatype is amt? If it's anything but a basic integral type (e.g. int, long, number, etc.) you should consider converting it. Decimal types are faster than they used to be, but integral types are faster still.
Consider adding indexes on the date and user_id fields, if you haven't already; for example, the sketch below.
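A sketch of such an index (the name is illustrative; appending amt makes it a covering index for this query, so no table lookups are needed):

ALTER TABLE users ADD INDEX ix_users_date_user_amt (date, user_id, amt);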
You can combine the aggregation and filtering in a single query...
SELECT user_id, SUM(amt) AS amt
FROM users
WHERE date = date(...)
GROUP BY user_id
HAVING amt > 100000
The only optimization that can be done in your query is to put an index (such as the primary key) on the user_id column to speed up filtering.
As for the other answers that suggest applying GROUP BY to pre-filtered records, that won't have any effect: the WHERE clause is executed first in SQL's logical query-processing phases anyway.
You could use MySQL subqueries to let MySQL handle all the iteration. For example, you could structure your query like this:
select user_data.user_id, user_data.total_amt from
(
select user_id, sum(amt) as total_amt from users where date = date("Y-m-d") group by user_id
) as user_data
where user_data.total_amt > 100000;
I am working on a query that needs to output 'total engagements' by users in columns: a 1eng column showing how many users have one engagement, a 2eng column showing how many users have two engagements, likewise 3eng, and so on. Note that the display should be as shown below. I have an engagements table which has a userID column. So I get distinct users like this
select count(distinct userID) from engagements
and I get engagements as
select count(*) from engagements
Engagements here refers to a user having liked, replied to, or shared the content.
Please help, thanks! I have used CASE and IF but have been unable to display the results in the form below:
1eng 2eng 3eng
100 200 100
Consider returning the results in rows and pivoting them afterwards in your application.
To return the desired results in rows, you could use the following query:
SELECT
engagementCount,
COUNT(*) AS userCount
FROM (
SELECT
userID,
COUNT(*) AS engagementCount
FROM engagements
GROUP BY userID
) AS s
GROUP BY engagementCount
;
Basically, you first group the engagements rows by userID and get the row counts per userID. Afterwards, you use the counts as the grouping criterion and count how many users were found with that count.
If you insist on returning the columnar view in SQL, you'll need to resort to dynamic SQL because of the indefinite number of columns in the final result set. You'd probably need to store the results of the inner SELECT temporarily, scan it to build the list of count expressions for every engagementCount value and ultimately construct a query of this kind:
SELECT
COUNT(engagementCount = 1 OR NULL) AS `1eng`,
COUNT(engagementCount = 2 OR NULL) AS `2eng`,
COUNT(engagementCount = 3 OR NULL) AS `3eng`,
...
FROM temporary_storage
;
Or use SUM(engagementCount = value) instead of COUNT(engagementCount = value OR NULL). (For me, the latter expresses the intention more explicitly, hence why I've suggested it first, but, in case you happen to prefer the SUM technique, there should be no discernible difference in performance between the two. The OR NULL trick is explained here.)
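If you do go the dynamic route, here is a sketch of building and running that query entirely in MySQL, using GROUP_CONCAT and a prepared statement; the derived table replaces the temporary storage, and group_concat_max_len may need raising if there are many distinct counts:

SET @sql = (
    SELECT GROUP_CONCAT(DISTINCT
               CONCAT('SUM(engagementCount = ', engagementCount, ') AS `',
                      engagementCount, 'eng`')
               ORDER BY engagementCount)
    FROM (
        SELECT COUNT(*) AS engagementCount
        FROM engagements
        GROUP BY userID
    ) AS s
);

SET @sql = CONCAT('SELECT ', @sql,
                  ' FROM (SELECT userID, COUNT(*) AS engagementCount',
                  ' FROM engagements GROUP BY userID) AS s');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;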
Just can't wrap my head around the proper syntax for this one. Below is my query, with a plain-English explanation of my subquery in the spot where I think I'd want it to execute.
mysql_query("INSERT INTO donations(
tid,
email,
amount,
ogrequest,
total
)
VALUES (
'".esc($p->ipn_data['txn_id'])."',
'".esc($p->ipn_data['pay_email'])."',
".(float)$amount.",
'".esc(http_build_query($_POST))."',
Here I want to select the row with the max date, get the value of the "total" column in that row, and add $amount to that value to form the new "total" for my newly inserted row.
)");
Can anyone help a bro out?
The real answer is you should not be storing the total in a column in this table. It isn't really any useful information. What you should be storing is the current date, and then calculating the total via SUM and GROUP BY. If it's something that you need to access often, then cache the value elsewhere.
Why do you need the total in any of the rows before the last one? It is just wasted data, and it can be easily regenerated from the table.
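For instance, a minimal sketch of regenerating it on demand, using the column names from the INSERT above and the date column referenced elsewhere in this thread:

-- overall total
SELECT SUM(amount) AS total FROM donations;

-- or per day, as suggested
SELECT DATE(date) AS day, SUM(amount) AS daily_total
FROM donations
GROUP BY DATE(date);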
Why do you want to store the total in this column? What value does this data add to your schema? The important thing to note here is that the total is NOT a property of the individual transaction; it is a property of an aggregated subset of individual transactions.
Also - make sure you are using DECIMAL and not FLOAT for your monetary column types in MySQL, if you aren't already. FLOAT values can introduce rounding errors depending on what you are doing, which is not something worth risking when money is involved.
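A sketch of that conversion (the precision and scale are assumptions; size them for your amounts):

ALTER TABLE donations
    MODIFY amount DECIMAL(10,2),
    MODIFY total  DECIMAL(10,2);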
I don't have access to a MySQL server to verify what I created, but try this:
INSERT INTO donations
(
tid,
email,
amount,
ogrequest,
total
)
SELECT
'".esc($p->ipn_data['txn_id'])."',
'".esc($p->ipn_data['pay_email'])."',
".(float)$amount.",
'".esc(http_build_query($_POST))."',
total + '".esc($amount)."'
FROM
donations
ORDER BY date DESC
LIMIT 1
Instead of using a direct "INSERT INTO (...) VALUES (...)" I used "INSERT INTO (...) SELECT ...". The SELECT statement retrieves the row with the highest date (ORDER BY date DESC LIMIT 1); that row's total field is then added to the value of $amount.
mysql_query("INSERT INTO donations(
tid,
email,
amount,
ogrequest,
total
)
VALUES (
'".esc($p->ipn_data['txn_id'])."',
'".esc($p->ipn_data['pay_email'])."',
".(float)$amount.",
'".esc(http_build_query($_POST))."',
(select max(total) from donations) + ".(float)$amount."
)");
Your subquery could look like this:
SELECT total
FROM donations
WHERE tid = <x>
ORDER BY date DESC
LIMIT 1
This of course requires that you have a date column in your table. If you run this one (without the outer query you already have), it should come back with a single row, single column result containing the value of latest total for tid = <x>.
If there's not already a row for txn = <x> in the table, then it will obviously return no row at all. When used as a subquery for your INSERT statement, you should probably check for NULL and replace it with a numeric 0 (zero). This is what IFNULL() can do for you.
Combining this and what you already have:
mysql_query("INSERT INTO donations(
tid,
email,
amount,
ogrequest,
total
)
VALUES (
'".esc($p->ipn_data['txn_id'])."',
'".esc($p->ipn_data['pay_email'])."',
".(float)$amount.",
'".esc(http_build_query($_POST))."',
IFNULL((SELECT total
FROM donations
WHERE tid = '".esc($p->ipn_data['txn_id'])."'
ORDER BY date DESC
LIMIT 1), 0) + ".esc($p->ipn_data['value'])."
)");