SQL Query Still having duplicates after group by - mysql

SELECT *
FROM `eBayorders`
WHERE (`OrderIDAmazon` IS NULL
OR `OrderIDAmazon` = "null")
AND `Flag` = "True"
AND `TYPE` = "GROUP"
AND (`Carrier` IS NULL
OR `Carrier` = "null")
AND LEFT(`SKU`, 1) = "B"
AND datediff(now(), `TIME`) < 4
AND (`TrackingInfo` IS NULL
OR `TrackingInfo` = "null")
AND `STATUS` = "PROCESSING"
GROUP BY `Name`,
`SKU`
ORDER BY `TIME` ASC LIMIT 7
I am trying to make sure that no name and no SKU appears more than once in the result. I group by name and then by SKU, but I ran into a result that has the same name with different SKUs, which I don't want. How can I fix this query so that the result set always contains distinct names and distinct SKUs?
For example say I have an Order:
Name: Ben Z, SKU : B000334, oldest
Name: Ben Z, SKU : B000333, second oldest
Name: Will, SKU: B000334, third oldest
Name: John, SKU: B000036, fourth oldest
The query should return only:
Name: Ben Z, SKU : B000334, oldest
Name: John, SKU: B000036, fourth oldest
This is because all of the Names should only have one entry in the set along with SKU.

There are two problems here.
The first is that the ANSI standard says that if you have a GROUP BY clause, the only things you can put in the SELECT clause are items listed in GROUP BY or items that use an aggregate function (SUM, COUNT, MAX, etc). The query in your question selects all the columns in the table, even those not in the GROUP BY. If you have multiple records that match a group, the database doesn't know which record to use for those extra columns.
MySql is dumb about this. A sane database server would throw an error and refuse to run that query. Sql Server, Oracle and Postgresql will all do that. MySql, at least in older versions or whenever the ONLY_FULL_GROUP_BY SQL mode is turned off, will make a guess about which data you want. (That mode is part of the default sql_mode starting with MySQL 5.7.5.) It's not usually a good idea to let your DB server make guesses about data.
But that doesn't explain the duplicates... just why the bad query runs at all. The reason you have duplicates is that you group on both Name and SKU. So, for example, for Ben Z's record you want to see just the oldest SKU. But when you group on both Name and SKU, you get a separate group for { Ben Z, B000334 } and { Ben Z, B000333 }... that's two rows for Ben Z, but it's what the query asked for, since SKU is also part of what determines a group.
If you only want to see one record per person, you need to group by just the person fields. This may mean building that part of the query first, to determine the base record set you need, and then JOINing to this original query as part of your full solution.
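To make the grouping behaviour above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The `orders` table, its three columns and the integer `time` values are assumptions made up for illustration; they mirror the four example rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the eBayorders table.
conn.execute("CREATE TABLE orders (name TEXT, sku TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ben Z", "B000334", 1),  # oldest
        ("Ben Z", "B000333", 2),
        ("Will", "B000334", 3),
        ("John", "B000036", 4),
    ],
)

# Grouping on both columns creates one group per (name, sku) pair,
# so Ben Z shows up twice.
by_name_and_sku = conn.execute(
    "SELECT name, sku FROM orders GROUP BY name, sku ORDER BY name, sku"
).fetchall()

# Grouping on name only, then joining back to fetch the full oldest row.
oldest_per_name = conn.execute(
    """
    SELECT o.name, o.sku, o.time
    FROM orders AS o
    JOIN (SELECT name, MIN(time) AS first_time
          FROM orders
          GROUP BY name) AS firsts
      ON o.name = firsts.name AND o.time = firsts.first_time
    ORDER BY o.time
    """
).fetchall()

print(by_name_and_sku)
print(oldest_per_name)
```

Note that this only guarantees one row per name. Will's row still shares a SKU with Ben Z's, so making SKUs unique as well needs a further step.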

SELECT T1.*
FROM eBayorders T1
JOIN
( SELECT `Name`,
`SKU`,
max(`TIME`) AS MAX_TIME
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING"
GROUP BY `Name`,
`SKU`) AS dedupe ON T1.`Name` = dedupe.`Name`
AND T1.`SKU` = dedupe.`SKU`
AND T1.`Time` = dedupe.`MAX_TIME`
ORDER BY `TIME` ASC LIMIT 7
Your database platform should have complained because your original query had items in the select list which were not present in the group by (generally not allowed). The above should resolve it.
An even better option is the following, if your database supports window functions (MySQL only does from version 8.0 onward):
SELECT *
FROM
( SELECT *,
row_number() over (partition BY `Name`, `SKU`
ORDER BY `TIME` ASC) AS dedupe_rank
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL OR `OrderIDAmazon` = "null") AND `Flag` = "True" AND `TYPE` = "GROUP" AND (`Carrier` IS NULL OR `Carrier` = "null") AND LEFT(`SKU`, 1) = "B" AND datediff(now(), `TIME`) < 4 AND (`TrackingInfo` IS NULL OR `TrackingInfo` = "null") AND `STATUS` = "PROCESSING" ) T
WHERE dedupe_rank = 1
ORDER BY T.`TIME` ASC LIMIT 7

You are trying to obtain a result set which doesn't have repeats in either the SKU or the Name column.
You might have to add a subquery to your query to accomplish that. The inner query would group by Name and the outer query would group by SKU, so that you won't have repeats in either column.
Try this :
SELECT *
FROM
(SELECT *
FROM eBayorders
WHERE (`OrderIDAmazon` IS NULL
OR `OrderIDAmazon` = "null")
AND `Flag` = "True"
AND `TYPE` = "GROUP"
AND (`Carrier` IS NULL
OR `Carrier` = "null")
AND LEFT(`SKU`, 1) = "B"
AND datediff(now(), `TIME`) < 4
AND (`TrackingInfo` IS NULL
OR `TrackingInfo` = "null")
AND `STATUS` = "PROCESSING"
GROUP BY Name) AS by_name
GROUP BY `SKU`
ORDER BY `TIME` ASC LIMIT 7

With this approach you just filter out rows that do not contain the largest/latest value for TIME.
SELECT SKU, Name
FROM eBayOrders o
WHERE NOT EXISTS (SELECT 0 FROM eBayOrders WHERE Name = o.name and Time > o.Time)
GROUP BY SKU, Name
Note: If two records have exactly the same Name and Time values, you may still end up getting duplicates, because the logic you have specified does not provide any way to break up a tie.
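The tie problem mentioned in the note can be reproduced with a small sketch, again using Python's sqlite3 module in place of MySQL. The `orders` table and especially its auto-incrementing `id` column are assumptions; the original table may not have a usable tie-breaker column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the id column is assumed to exist as a tie-breaker.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, sku TEXT, time INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (name, sku, time) VALUES (?, ?, ?)",
    [
        ("Ben Z", "B000334", 5),
        ("Ben Z", "B000333", 5),  # same name and time as the row above
        ("John", "B000036", 4),
        ("John", "B000037", 2),
    ],
)

# Keep a row only if no row with the same name has a later time.
latest = conn.execute(
    """
    SELECT name, sku FROM orders AS o
    WHERE NOT EXISTS (SELECT 1 FROM orders AS newer
                      WHERE newer.name = o.name AND newer.time > o.time)
    ORDER BY name, sku
    """
).fetchall()

# Same idea, but ties on time are broken by the larger id.
tie_broken = conn.execute(
    """
    SELECT name, sku FROM orders AS o
    WHERE NOT EXISTS (SELECT 1 FROM orders AS newer
                      WHERE newer.name = o.name
                        AND (newer.time > o.time
                             OR (newer.time = o.time AND newer.id > o.id)))
    ORDER BY name, sku
    """
).fetchall()

print(latest)      # Ben Z appears twice because of the tie
print(tie_broken)  # one row per name
```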

Related

Make a select with max and min passing condition to each of the two

When a post is accessed, I need, in addition to returning the information of this post, to return the previous one and the next one, if they exist.
I would like to know if there is a way to select MAX(id) and MIN(id) in a single query/select, passing a separate condition to each of them. Here is what I'm trying to do in Laravel, along with the equivalent SQL to make it easier to follow:
Laravel:
$query = Post::query();
$query = $query->from('posts')->select(DB::raw('MAX(id), MIN(id)'))->whereRaw("id < {$id} and id > {$id}")->first();
SQL:
select MAX(id), MIN(id) from `posts` where id < 5 and id > 5 limit 1
The id variable holds the post id; in this example its value is 5. The query is meant to get the MAX and MIN relative to this id, but I also need the info of the post that the user accessed.
The DB has posts number 4 and number 6. That is, I need to get the information from posts number 4, 5 and 6 in this example.
The where condition will never be true, but I cannot use OR. The first condition is for MAX and the second for MIN; if I use OR, MAX simply returns the biggest id in the DB.
In other words, I need the max and min relative to a given value: if the id is 5, I need the largest existing id below that value and the smallest existing id above it. With the data I have in the DB, that would be ids 4, 5 and 6.
Is it possible in a single query, or do I really have to run more than one?
Yes, you can do it with case-when
select MAX(
CASE
WHEN id < 5 THEN id
ELSE NULL
END
), MIN(
CASE
WHEN id > 5 THEN id
ELSE NULL
END
)
from `posts`
where id <> 5
EDIT
Laravel equivalent, as shared by Gabriel Edu in the comment-section:
$query = Post::query();
$query = $query->from('posts')->
select(DB::raw("MAX(CASE WHEN id < {$id} THEN id ELSE null END), MIN(CASE WHEN id > {$id} THEN id ELSE null END)"))->first();
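The conditional aggregation above can be tried out with a minimal sketch in Python's sqlite3 module, standing in for MySQL. The `posts` table contents and the `neighbours` helper are hypothetical. The id is passed as a bound parameter rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(i, f"post {i}") for i in (1, 4, 5, 6, 9)],
)

# Each aggregate only sees the ids that pass its own CASE condition;
# rows failing the condition contribute NULL, which MAX/MIN ignore.
NEIGHBOURS_SQL = """
SELECT MAX(CASE WHEN id < :id THEN id END) AS prev_id,
       MIN(CASE WHEN id > :id THEN id END) AS next_id
FROM posts
WHERE id <> :id
"""


def neighbours(post_id):
    """Return (previous id, next id); either may be None at the edges."""
    return conn.execute(NEIGHBOURS_SQL, {"id": post_id}).fetchone()


middle = neighbours(5)
last = neighbours(9)
print(middle, last)
```

The full rows for the three posts can then be fetched with a second query such as `WHERE id IN (prev_id, 5, next_id)`, skipping any value that came back as NULL.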
The LAG() and LEAD() functions in MySQL (8.0 and later) return the preceding and succeeding value of a row within its partition.
Try this:
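-- The window must be ordered by id; with ORDER BY NULL the neighbours are arbitrary
SELECT ID,
LAG(id) OVER (ORDER BY id) ONE_SHIFT_FORWARD,
LEAD(id) OVER (ORDER BY id) ONE_SHIFT_BACKWARD
FROM POSTS
ORDER BY ID ASC;
-- Same query wrapped in a derived table (MySQL requires the alias) to pick one post
SELECT *
FROM ( SELECT ID,
LAG(id) OVER (ORDER BY id) ONE_SHIFT_FORWARD,
LEAD(id) OVER (ORDER BY id) ONE_SHIFT_BACKWARD
FROM POSTS
ORDER BY ID ASC) AS neighbours
WHERE id = 5;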
You may use lead and lag to access the values before and after the current row.
You may then use those to select the post with a given id and the values before and after in a single select.
The following query
select *
from (
select
p.*,
lead(id) over(order by id) _lead,
lag(id) over(order by id) _lag
from post p
) x
where 23 in (id, _lead, _lag);
results in
id  text         _lead  _lag
15  fifteen      23     10
23  twentythree  24     15
24  twentyfour   50     23
With the following setup:
Schema (MySQL v8.0)
create table post (
id integer,
text varchar(50)
);
insert into post(id, text)
values
( 10, 'ten'),
( 15, 'fifteen'),
( 23, 'twentythree'),
( 24, 'twentyfour'),
( 50, 'fifty');
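As a hedged sketch of how the lead/lag query behaves at the edges of the table, the same setup can be run through Python's sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when it gained window functions). The `with_neighbours` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO post VALUES (?, ?)",
    [(10, "ten"), (15, "fifteen"), (23, "twentythree"),
     (24, "twentyfour"), (50, "fifty")],
)

# Window functions need SQLite >= 3.25 (an assumption about the runtime).
NEIGHBOURS_SQL = """
SELECT *
FROM (SELECT p.*,
             LEAD(id) OVER (ORDER BY id) AS _lead,
             LAG(id)  OVER (ORDER BY id) AS _lag
      FROM post AS p) AS x
WHERE :id IN (id, _lead, _lag)
ORDER BY id
"""


def with_neighbours(post_id):
    """Return the post plus whichever neighbouring posts exist."""
    return conn.execute(NEIGHBOURS_SQL, {"id": post_id}).fetchall()


first_post = with_neighbours(10)   # no previous post, so only two rows
middle_post = with_neighbours(23)  # previous, current and next
print(first_post)
print(middle_post)
```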

SQL pivot output result set of a COUNT in MySQL without UNION nor GROUP BY

I'm working with a couple of million rows, so I don't want to use a UNION to get my query output in the shape I need.
For design purposes I need this query's result returned in a specific layout so that a pie chart can be populated automatically.
Query:
SELECT
COUNT(IF( b IS NULL, id , NULL)) AS 'not_assigned',
COUNT(IF(b IS NOT NULL, id, NULL)) AS 'assigned'
FROM table
WHERE
OverType = "abc"
AND Type = "def"
AND Sub_Type = "ghi"
AND Date BETWEEN "2022-12-01" AND "2022-12-25"
AND Client LIKE '%john%';
Result set:
not_assigned assigned
1000 500
So I would like to transform the output as this:
Count
not_assigned 1000
assigned 500
Any advice for a MySQL version 5.0?
You may aggregate by a CASE expression:
SELECT
CASE WHEN b IS NULL THEN 'not_assigned' ELSE 'assigned' END AS category,
COUNT(*) AS cnt
FROM yourTable
WHERE
OverType = 'abc' AND
Type = 'def' AND
Sub_Type = 'ghi' AND
Date BETWEEN '2022-12-01' AND '2022-12-25' AND
Client LIKE '%john%'
GROUP BY 1;
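A small sketch of the CASE-based grouping, run through Python's sqlite3 module rather than MySQL 5.0. The `tickets` table and its rows are made up for illustration. It also shows one caveat worth knowing for a pie chart: a category with no rows simply does not appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: b is NULL while a row is not assigned.
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, b TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, None), (2, "alice"), (3, None), (4, "bob"), (5, None)],
)

PIVOT_SQL = """
SELECT CASE WHEN b IS NULL THEN 'not_assigned' ELSE 'assigned' END AS category,
       COUNT(*) AS cnt
FROM tickets
GROUP BY category
ORDER BY category
"""

# One output row per category instead of one column per category.
counts = conn.execute(PIVOT_SQL).fetchall()

# If a category has no rows at all, it is missing from the result.
conn.execute("DELETE FROM tickets WHERE b IS NOT NULL")
counts_without_assigned = conn.execute(PIVOT_SQL).fetchall()

print(counts)
print(counts_without_assigned)
```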

SQL Infinite loading

I have a database with 1 million records. It works fine, with around 1.2s response time for simple queries using JOIN, GROUP BY, ORDER BY, etc., and there are no problems with that. I'm trying to simplify my queries using table aliases, but when I execute a simple query with two or more table aliases, the request never ends and MariaDB stops responding; I have to restart the service manually.
What is going wrong?
Here is the structure:
CREATE TABLE `values` (
`id` mediumint(11) UNSIGNED NOT NULL,
`date` int(11) NOT NULL DEFAULT '0',
`indexVar` int(11) NOT NULL,
`value` float NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Data:
exemple
Working query:
SELECT
v.date,
v.value
FROM
`values` AS v
WHERE
v.date > 1548460800 AND v.indexVar = 6 OR v.indexVar = 2
expected result
Infinite loading query:
SELECT DISTINCT
v.date,
v1.value,
v2.value
FROM
`values` AS v,
`values` AS v1,
`values` AS v2
WHERE
v.date > 1548460800 AND v1.indexVar = 6 AND v2.indexVar = 2
expected result
You aren't including any join conditions in your query.
If the values table has 1 million rows, then including it twice gives you a result set with 1 million * 1 million = 1 trillion rows. You are applying some conditions, but you're still going to wind up with a huge number of results. (And you're including the values table three times!)
Let's say you have a table with a million rows, and each row is just an integer from 1 to 1 million. If you do select value from values where value > 900000 then you'll get 100,000 rows. But if you say select value from values v, values v2 where v.value > 900000 then for each of the 100,000 rows matched by the v.value > 900000 condition you'll get all million rows from v2. Even if you apply the same filter to v2 (i.e., v2.value > 900000) the query will still return 100,000 v2 rows for each of the 100,000 matching v rows: 10 billion rows in all.
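The row-count arithmetic can be checked on a scaled-down version of that example. This is only a sketch with Python's sqlite3 module: the hypothetical `nums` table holds 100 rows instead of a million, and the filter is `value > 90` instead of `value > 900000`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (value INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(1, 101)])


def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]


# One copy of the table: 10 rows pass the filter.
single = count("SELECT COUNT(*) FROM nums AS v WHERE v.value > 90")

# Two copies, no join condition: every matching v row pairs with all 100 v2 rows.
cross = count("SELECT COUNT(*) FROM nums AS v, nums AS v2 WHERE v.value > 90")

# Filtering both copies still multiplies: 10 * 10.
cross_filtered = count(
    "SELECT COUNT(*) FROM nums AS v, nums AS v2 "
    "WHERE v.value > 90 AND v2.value > 90"
)

# With a join condition each v row meets exactly one v2 row again.
joined = count(
    "SELECT COUNT(*) FROM nums AS v "
    "JOIN nums AS v2 ON v2.value = v.value WHERE v.value > 90"
)

print(single, cross, cross_filtered, joined)
```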
If date is the primary key of the table, then you must make sure that all the date values in each result row are the same:
select v.date, v1.value, v2.value
from `values` v, `values` v1, `values` v2
where v.date = v1.date and v.date = v2.date
and v1.indexVar = 6 and v2.indexVar = 2
or better yet:
select v.date, v1.value, v2.value
from `values` v
inner join `values` v1 on (v1.date = v.date)
inner join `values` v2 on (v2.date = v.date)
where v1.indexVar = 6 and v2.indexVar = 2
If the primary key is id then just do the same with id. (You said you wanted to align rows based on the date, so not sure which column is most significant.)
You could also use conditional aggregation: put the filter in a CASE WHEN inside an aggregate function such as MAX and group by date, which collapses the matching rows into one row per date
SELECT date
, max(case when indexVar = 6 then value end) v1_value
, max(case when indexVar = 2 then value end) v2_value
FROM `values`
WHERE
date > 1548460800 AND (indexVar = 6 OR indexVar = 2)
group by date
You should also add a proper composite index:
create index idx1 on `values` (indexVar, date)
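Here is a minimal sketch of that conditional aggregation, using Python's sqlite3 module in place of MariaDB. The table is called `readings` here (a hypothetical name chosen to avoid quoting the reserved word `values`), and the sample rows are invented. It also shows why the parentheses around the two indexVar tests matter: AND binds tighter than OR, so without them the date filter only applies to indexVar = 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the `values` table.
conn.execute("CREATE TABLE readings (date INTEGER, indexVar INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (50, 2, 0.5),   # older than the cut-off date used below
        (100, 6, 1.5),
        (100, 2, 2.5),
        (200, 6, 3.5),
        (200, 2, 4.5),
        (200, 7, 9.0),  # a different indexVar, never selected
    ],
)

PIVOT_SQL = """
SELECT date,
       MAX(CASE WHEN indexVar = 6 THEN value END) AS v1_value,
       MAX(CASE WHEN indexVar = 2 THEN value END) AS v2_value
FROM readings
WHERE {condition}
GROUP BY date
ORDER BY date
"""

# Date filter applies to both indexVar values.
pivoted = conn.execute(
    PIVOT_SQL.format(condition="date > 60 AND (indexVar = 6 OR indexVar = 2)")
).fetchall()

# Without parentheses this reads as (date > 60 AND indexVar = 6) OR indexVar = 2,
# so the old indexVar = 2 row slips through.
unparenthesized = conn.execute(
    PIVOT_SQL.format(condition="date > 60 AND indexVar = 6 OR indexVar = 2")
).fetchall()

print(pivoted)
print(unparenthesized)
```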

Avoid Subquery returned more than 1 value error in a table valued function

Is there a way to rewrite this query without getting the error "Subquery returned more than 1 value"?
This query is used in a LEFT JOIN in a table-valued function. Per the requirements, I need to pull two scenario IDs by default (if the parameter value is NULL or empty).
DECLARE @pScenarioName AS VARCHAR(30)
select
externalID,
PropertyAssetId,
LeaseID,
BeginDate
from ae11.dbo.ivw_Leases
WHERE PropertyAssetID IN
(select ID from AE11.dbo.PropertyAssets where scenarioID IN
(CASE WHEN isnull(@pScenarioName, '') = ''
THEN (select top 2 ID from rvw_Scenarios where Name like '[0-9][0-9][0-9][0-9]%'
AND LEN(Name) = 8
order by Name desc)
ELSE
(select ID from aex.dbo.rvw_Scenarios
where [Name] IN (@pScenarioName))
END)
)
I haven't tested this, but I use a similar approach when dealing with parameters. Of course, this won't necessarily work if the order of the ID is crucial in your second subquery.
SELECT ExternalID
,PropertyAssetId
,LeaseID
,BeginDate
FROM ae11.dbo.ivw_Leases
WHERE PropertyAssetID IN
(SELECT ID
FROM AE11.dbo.PropertyAssets
WHERE scenarioID IN
-- TOP ... ORDER BY is wrapped in a derived table because ORDER BY
-- is not allowed directly inside one branch of a UNION
(SELECT ID
FROM (SELECT TOP 2 ID
FROM rvw_Scenarios
WHERE ISNULL(@pScenarioName,'') = ''
AND Name LIKE '[0-9][0-9][0-9][0-9]%'
AND LEN(Name) = 8
ORDER BY Name DESC) AS latest
UNION ALL
SELECT ID FROM aex.dbo.rvw_Scenarios
WHERE (@pScenarioName IS NOT NULL)
AND [Name] IN (@pScenarioName)))
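The idea of replacing the CASE with two mutually exclusive UNION ALL branches can be sketched outside SQL Server too. The example below uses Python's sqlite3 module, so several things are translated and should be read as assumptions: LIMIT replaces TOP, GLOB replaces the bracketed LIKE pattern, COALESCE replaces ISNULL, and the `scenarios` table with its rows is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for rvw_Scenarios.
conn.execute("CREATE TABLE scenarios (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO scenarios VALUES (?, ?)",
    [(1, "2020BASE"), (2, "2021BASE"), (3, "2022BASE"), (4, "Custom"), (5, "1999")],
)

SCENARIO_IDS_SQL = """
SELECT id
FROM (SELECT id
      FROM scenarios
      WHERE COALESCE(:name, '') = ''             -- branch 1: no name given
        AND name GLOB '[0-9][0-9][0-9][0-9]*'
        AND LENGTH(name) = 8
      ORDER BY name DESC
      LIMIT 2) AS latest
UNION ALL
SELECT id
FROM scenarios
WHERE COALESCE(:name, '') <> ''                  -- branch 2: explicit name
  AND name = :name
"""


def scenario_ids(name):
    """Return the matching scenario ids; only one branch can produce rows."""
    rows = conn.execute(SCENARIO_IDS_SQL, {"name": name}).fetchall()
    return sorted(row[0] for row in rows)


default_ids = scenario_ids(None)
named_ids = scenario_ids("Custom")
print(default_ids, named_ids)
```

Because exactly one of the two guards is true for any parameter value, the subquery can return several ids without tripping the scalar-subquery restriction that the CASE expression ran into.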

Why is order by not working in my Union query?

I'm trying to order by month by doing this query between 3 tables :
SELECT NULL AS `inState`, NULL AS `outState`, mb.`isDuplicate`, mb.`questStatus`, mb.state, mb.`subState`, mb.`recomputedOn`, c.`TSsubmitOn`, c.`submittedOn`, mb.week, mb.month
FROM metric_backlog mb INNER JOIN `CR` c ON c.crdbid = mb.crdbid
WHERE (mb.`productName` = 'ecc' AND mb.`releaseName`
IN ('6.7.3', '6.5.0', '6.7.0', '6.7.1', '6.6.0', '6.7.2', '6.2.0', '6.1.0')) AND mb.month = '1101'
UNION ALL
SELECT mi.`inState`, mi.`outState`, NULL AS sq, NULL AS ee, NULL AS yy, NULL AS qq, NULL AS xx, NULL AS mer, NULL AS yi, mi.week, mi.month as monthh
FROM metric_inout mi INNER JOIN `CR` c ON c.crdbid = mi.crdbid
WHERE mi.month = '1101' AND mi.month != "NULL" AND mi.month IS NOT NULL AND
mi.`productName` = 'ecc' AND mi.`releaseName`
IN ('6.7.3', '6.5.0', '6.7.0', '6.7.1', '6.6.0', '6.7.2', '6.2.0', '6.1.0')
ORDER BY mi.month
I get the error : Unknown column mi.month in order clause
thanks!
Use the unqualified column name in the ORDER BY:
ORDER BY month
and make sure the column is called month in the first SELECT, since the names used in the second query are ignored:
SELECT NULL AS `inState`, NULL AS `outState`, mb.`isDuplicate`, mb.`questStatus`, mb.state, mb.`subState`, mb.`recomputedOn`, c.`TSsubmitOn`, c.`submittedOn`, mb.week, mb.month as month
With UNION ALL, the result columns take their names from the first SELECT statement. There are no longer any table names associated with these names. So you select: inState, outState, isDuplicate, questStatus, state, subState, recomputedOn, TSsubmitOn, submittedOn, week, and month. (Names from the second and any further SELECT statements are completely irrelevant, by the way.)
Hence you cannot order by mi.month. Once UNION ALL has been applied, the table-qualified field "mi.month" is no longer available; only the unioned field "month" is, so you can only order by month.
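The naming rule can be observed with a short sketch in Python's sqlite3 module (SQLite rather than MySQL, so error messages and edge cases may differ). The two tables are heavily simplified, hypothetical versions of metric_backlog and metric_inout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the two metric tables.
conn.execute("CREATE TABLE metric_backlog (month TEXT)")
conn.execute("CREATE TABLE metric_inout (in_state TEXT, month TEXT)")
conn.executemany("INSERT INTO metric_backlog VALUES (?)", [("1103",), ("1101",)])
conn.execute("INSERT INTO metric_inout VALUES ('open', '1102')")

cursor = conn.execute(
    """
    SELECT NULL AS in_state, mb.month AS month
    FROM metric_backlog AS mb
    UNION ALL
    SELECT mi.in_state AS ignored_name, mi.month AS monthh
    FROM metric_inout AS mi
    ORDER BY month          -- the output column, not mi.month
    """
)

# The result columns are named after the first SELECT only.
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
print(rows)
```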