I need some help for a MySQL query.
Let's assume we have a table with the columns "id", "unique_id" and "consecutive_id". The numbers in column "id" are NOT always consecutive, while we have to keep consecutiveness in column "consecutive_id". Basically every row should get its own consecutive number, but sometimes there may occur rows that should share the same consecutive number. Such rows have the same value in column "unique_id". I need a query to find the first ID of a row that has more than one row with the same consecutive ID and a unique ID which is not part of another row.
I created a little fiddle at https://www.db-fiddle.com/f/hy8SACLyM2D65H2ZY31c2f/0 to demonstrate my issue. As you can see, IDs 3 and 5 have the same consecutive number (2). That's okay as they share the same unique ID. IDs 9, 10, 12 and 14 also have the same consecutive number (4), but only IDs 9 and 10 have an identical unique ID. Therefore in this case the query should find ID 12.
Can you please help me with developing a solution for this?
Thank you so very much for your help in advance.
All the best,
Marianne
You can use COUNT(DISTINCT unique_id) to find the values of consecutive_id that have different unique_id.
SELECT consecutive_id
FROM test
GROUP BY consecutive_id
HAVING COUNT(DISTINCT unique_id) > 1
You can then join this with your original table, group by both unique_id and consecutive_id, and get the rows that just have a count of 1, which means they're not equal to the rest of the group.
Since there can be multiple outliers, you need another level of subquery that just gets the minimum outlier for each consecutive ID.
SELECT consecutive_id, MIN(id) AS id
FROM (
    SELECT a.consecutive_id, MIN(a.id) AS id
    FROM test AS a
    JOIN (
        SELECT consecutive_id
        FROM test
        GROUP BY consecutive_id
        HAVING COUNT(DISTINCT unique_id) > 1
    ) AS b
    ON a.consecutive_id = b.consecutive_id
    GROUP BY a.consecutive_id, a.unique_id
    HAVING COUNT(*) = 1
) AS x
GROUP BY consecutive_id
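To try the query without a MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are an assumption modelled on the description in the question (the actual fiddle data may differ), and SQLite stands in for MySQL here.

```python
import sqlite3

# Hypothetical rows (id, unique_id, consecutive_id) modelled on the question's
# description; the real fiddle data may differ.
rows = [
    (1, "a", 1),
    (3, "b", 2), (5, "b", 2),
    (7, "c", 3),
    (9, "d", 4), (10, "d", 4), (12, "e", 4), (14, "f", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, unique_id TEXT, consecutive_id INTEGER)"
)
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

query = """
SELECT consecutive_id, MIN(id) AS id
FROM (
    -- one row per (consecutive_id, unique_id) pair that occurs exactly once
    SELECT a.consecutive_id, MIN(a.id) AS id
    FROM test AS a
    JOIN (
        -- consecutive ids shared by more than one unique_id
        SELECT consecutive_id
        FROM test
        GROUP BY consecutive_id
        HAVING COUNT(DISTINCT unique_id) > 1
    ) AS b ON a.consecutive_id = b.consecutive_id
    GROUP BY a.consecutive_id, a.unique_id
    HAVING COUNT(*) = 1
) AS x
GROUP BY consecutive_id
"""

outliers = conn.execute(query).fetchall()
print(outliers)
```

With these assumed rows, IDs 12 and 14 are both outliers for consecutive number 4, and the outer MIN(id) keeps only ID 12.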
I need help in constructing an MySQL Statement where I need to find previous rows in the same table.
My data looks like this:
history_id (auto increment), object_id (exists multiple times), timestamp, ...
example:
1, 2593, 2018-08-07 09:37:21
2, 2593, 2018-08-07 09:52:54
3, 15, 2018-08-07 10:41:15
4, 2593, 2018-08-07 09:57:36
Some properties of this data:
the higher the auto increment gets the later the timestamp is for the same object id
it is possible that there is only one row for one object_id at all
the combination of object_id and timestamp is always unique, no duplicates are possible
For every row I need to find the immediately preceding row with the same object_id.
I found this post: https://dba.stackexchange.com/questions/24014/how-do-i-get-the-current-and-next-greater-value-in-one-select and worked through the examples but I was not able to solve my problem.
I just tested around a bit and got to this point:
SELECT
i1.history_id,
i1.object_id,
i1.timestamp AS state_time,
i2.timestamp AS previous_time
FROM
history AS i1
LEFT JOIN (
select timestamp as timestamp,history_id as history_id,object_id as object_id
from history
group by object_id
) AS i2 on i2.object_id = i1.object_id and i2.history_id < i1.history_id
Now I only need to cut down the subquery so that I get only the highest matching history_id for each row, but it does not work when I use LIMIT 1, because then I get only one value overall.
Do you have any idea how to solve this problem? Or do you know better and more efficient techniques?
Performance matters here because I have 3.1 million rows, and the table keeps growing.
Thank you!
The best direction is to use a window function. A simple LAG(timestamp) with a proper ORDER BY clause would do the job. See here: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag
But if all you need is
to cut of the subquery that I only get the highest value of history_id for each row but its not working when I use limit 1
Then change subquery from
select timestamp as timestamp,history_id as history_id,object_id as object_id
from history
group by object_id
to
select object_id as object_id, MAX(history_id) as history_id, MAX(timestamp) as timestamp
from history
group by object_id
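In general, you should not SELECT more columns than you have in the GROUP BY clause, unless they are wrapped in an aggregate function. Note that this subquery returns a single row per object_id (its latest), so joined on i2.history_id < i1.history_id it does not give every row its own predecessor; for that, LAG() is the better fit.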
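As a rough illustration of the LAG() approach, the sketch below runs it on the four sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0, and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The table name history comes from the question's own query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (history_id INTEGER PRIMARY KEY, object_id INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, 2593, "2018-08-07 09:37:21"),
        (2, 2593, "2018-08-07 09:52:54"),
        (3, 15, "2018-08-07 10:41:15"),
        (4, 2593, "2018-08-07 09:57:36"),
    ],
)

# LAG() looks one row back inside each object_id partition, ordered by history_id.
query = """
SELECT history_id,
       object_id,
       timestamp AS state_time,
       LAG(timestamp) OVER (PARTITION BY object_id ORDER BY history_id) AS previous_time
FROM history
ORDER BY history_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Rows that are the first for their object_id get NULL (None in Python) as previous_time. On a large table, an index on (object_id, history_id) would likely help the partitioned ordering, though that is worth checking with EXPLAIN on the real data.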
I have a very big subquery:
(SELECT Id, Count [...] FROM Something) Counts
I want to create a score for each Id that is the count divided by the max count.
I tried:
SELECT Id, Count/(SELECT MAX(Count)) AS Score
FROM (SELECT Id, Count [...] FROM Something) Counts
But this only returns the first row!
If I do a GROUP BY Id, all scores are equal to 1 (because the maximum is taken for each Id, and not for all Ids).
Do you know what I can do please? I know that in some contexts we can embed a subquery in a WITH clause, but this is not valid in MySQL.
I believe this is what you need:
Select Id, (Count/(SELECT MAX(Count) FROM Something)) As Score FROM Something
Explanation:
I believe you want to take the max of all counts in the table. To do so, the subquery has to run over the entire table rather than being limited to a specific Id or to a group. When you grouped by Id, assuming each Id is unique, the query effectively returned Id, Count/Count. As you know, any non-zero number divided by itself is 1.
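A small sketch of the same idea, run against SQLite from Python. The table counts and the column cnt are hypothetical stand-ins for the asker's subquery and its Count column. One difference to keep in mind: SQLite divides integers as integers, so the sketch multiplies by 1.0, whereas MySQL's / operator already returns a fractional result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the asker's "Counts" subquery.
conn.execute("CREATE TABLE counts (id INTEGER PRIMARY KEY, cnt INTEGER)")
conn.executemany("INSERT INTO counts VALUES (?, ?)", [(1, 2), (2, 4), (3, 1)])

# The scalar subquery scans the whole table, so every row is divided by the
# same global maximum. The 1.0 * forces floating-point division in SQLite.
scores = conn.execute(
    """
    SELECT id, 1.0 * cnt / (SELECT MAX(cnt) FROM counts) AS score
    FROM counts
    ORDER BY id
    """
).fetchall()
print(scores)
```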
Let's say I have a table with a column of ages.
Here is the list of ages
1
2
3
1
1
3
I want the SQL to count how many of age 1s, how many of 2s and 3s.
The code:
Select count(age) as age1 where age = '1';
Select count(age) as age2 where age = '2';
Select count(age) as age3 where age = '3';
Should work but would there be a way to just display it all using only 1 line of code?
This is an instance where the GROUP BY clause really shines:
SELECT age, COUNT(age)
FROM table_name
GROUP BY age
Just an additional tip:
You shouldn't use single quotes here in your query:
WHERE age = '1';
This is because age is an INT column, and integer literals are written without quotes. MySQL will implicitly convert the quoted string to the correct data type for you, and the overhead is negligible here. But imagine you were doing a JOIN of two tables with millions of rows: then the overhead introduced would be something to consider.
Try this if the count is limited to three ages. Using aggregate functions without a GROUP BY results in a single row, so you can use SUM() with a condition: in MySQL the condition evaluates to a boolean (1 or 0), and summing it gives the count of rows matching your criteria.
Select SUM(age = '1') as age1,
SUM(age = '2') as age2,
SUM(age = '3') as age3
from table
SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
FROM YourTable
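Both single-row variants can be checked side by side with a short sketch. It uses an in-memory SQLite database from Python (where a comparison also evaluates to 0 or 1), and the table name people is a hypothetical placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO people VALUES (?)", [(1,), (2,), (3,), (1,), (1,), (3,)])

# Variant 1: sum the 0/1 result of each comparison.
bool_sum = conn.execute(
    "SELECT SUM(age = 1) AS age1, SUM(age = 2) AS age2, SUM(age = 3) AS age3 FROM people"
).fetchone()

# Variant 2: the portable CASE form of the same conditional aggregation.
case_sum = conn.execute(
    """
    SELECT SUM(CASE WHEN age = 1 THEN 1 ELSE 0 END) AS age1,
           SUM(CASE WHEN age = 2 THEN 1 ELSE 0 END) AS age2,
           SUM(CASE WHEN age = 3 THEN 1 ELSE 0 END) AS age3
    FROM people
    """
).fetchone()

print(bool_sum, case_sum)
```

The CASE form is the safer choice if the query may ever run on a database that does not treat comparisons as numbers.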
If your query should group by only one column (age in this case), you can use COUNT together with GROUP BY:
SELECT age, Count(1) as qty
FROM [yourTable]
GROUP BY age
Remember you must include any additional column in your group by condition.
Select age as Age_Group, count(age) as Total_count from table1 group by age;
select age, count(age) from SomeTable group by age
http://sqlfiddle.com/#!2/b40da/2
The group by clause works like this:
When using aggregate functions, like the count function, without a group by clause, the function will apply to the entire dataset determined by the from and where clauses. A count will for instance count the number of rows in the result set, and a sum over a specific column will sum that column across all the rows in the result set.
What the group by clause allows us to do, is to divide the result set determined by the from and where clause into partitions, so that the aggregate functions no longer applies to the result set as a whole, but rather within each partition of the result set.
When you specify a column to group by, what you are saying is something like "for each distinct value of column x in the result set, create a partition containing any row in the result set with this particular value in column x". Then, instead of yielding one result covering the entire resultset, aggregate functions will yield one result for each distinct value of column x in the result set.
With your example input of:
1
2
3
1
1
3
let's analyze the above query. As always, we should look at the from clause and the where clause first. The from clause tells us that we are selecting from SomeTable and only this, and the lack of a where clause tells us that we are selecting from the full contents of SomeTable.
Next, we'll look at the group by clause. It's present, and it groups by the age column, which is the only column in our example. The presence of the group by clause changes our dataset completely! Instead of selecting from the entire row set of SomeTable, we are now selecting from a set of partitions, one for each distinct value of the age-column in our original result set (which was every row in SomeTable).
At last, we'll look at the select-clause. Now, since we are selecting from partitions and not regular rows, the select-clause has fewer options for what it can contain, actually it only has 2: The column that it is grouped by, or an aggregate function.
Now, in our example we only have one column, but consider that we had another column, like here:
http://sqlfiddle.com/#!2/d5479/2
Now, imagine that in our data set we have two rows, both with age='1', but with different values in the other column. If we were to include this other column in a query that is grouped by the age-column (which we now know will return one row for each partition over the age-column), which value should be presented in the result? It makes no sense to include any column other than the one you grouped by. (I'll leave multiple columns in the group by clause out of this, in my experience one usually just wants one..)
But back to our select-clause: knowing our dataset has the distinct values {1, 2, 3} in the age-column, we should expect to get 3 rows in our result set. The first thing to be selected is the age-column, which will yield the values [1, 2, 3] in the three rows. Next in the select-list is an aggregate function, count(age), which we now know will count the number of rows in each partition. So, for the row in the result where age='1', it will count the number of rows with age='1', for the row where age='2' it will count the number of rows where age='2', and so on.
The result would look something like this:
age count(age)
1 3
2 1
3 2
(of course you are free to override the name of the second column in the result, with the as-operator..)
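The partitioning described above can be mimicked in a few lines of Python, purely as an illustration: the sketch first builds the partitions by hand, then runs the same GROUP BY query against an in-memory SQLite table (named SomeTable as in the query above) and compares the two.

```python
import sqlite3
from collections import defaultdict

ages = [1, 2, 3, 1, 1, 3]

# Manual version: one partition per distinct age, then one count per partition.
partitions = defaultdict(list)
for age in ages:
    partitions[age].append(age)
manual_counts = sorted((age, len(members)) for age, members in partitions.items())

# SQL version: GROUP BY builds the same partitions, COUNT runs once per partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (age INTEGER)")
conn.executemany("INSERT INTO SomeTable VALUES (?)", [(a,) for a in ages])
sql_counts = conn.execute(
    "SELECT age, COUNT(age) FROM SomeTable GROUP BY age ORDER BY age"
).fetchall()

print(manual_counts)
print(sql_counts)
```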
And that concludes today's lesson.
I want to get the distinct values of a particular column, but duplicates are not handled properly if more columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help: I want ShoppingSessionId to stay distinct even when I select all the columns from the table, together with the WHERE clause.
How can I do that?
The DISTINCT keyword applies to the entire row, never to a column.
Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSession is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSession, it means that some ShoppingSession values are associated with many different values of the added column.
If you want ShoppingSession to remain distinct after including that third column, you should decide which values of the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
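To confirm that the HAVING and WHERE variants return the same rows, here is a small sketch against an in-memory SQLite database. The sample rows and the third column ProductId are hypothetical, and the table is named tbl_ShoppingCart without the dbo. schema prefix, which SQLite does not have. The sketch checks results only; it says nothing about the execution plan of any particular server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ProductId is a hypothetical stand-in for "YourThirdColumn".
conn.execute(
    "CREATE TABLE tbl_ShoppingCart (ShoppingSessionId TEXT, userid INTEGER, ProductId INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_ShoppingCart VALUES (?, ?, ?)",
    [("s1", 7, 10), ("s1", 7, 11), ("s2", 7, 12), ("s3", 8, 13)],
)

# Filter after grouping.
having_rows = conn.execute(
    """
    SELECT ShoppingSessionId, userid, MAX(ProductId) AS ProductId
    FROM tbl_ShoppingCart
    GROUP BY ShoppingSessionId, userid
    HAVING userid = 7
    ORDER BY ShoppingSessionId
    """
).fetchall()

# Filter before grouping.
where_rows = conn.execute(
    """
    SELECT ShoppingSessionId, userid, MAX(ProductId) AS ProductId
    FROM tbl_ShoppingCart
    WHERE userid = 7
    GROUP BY ShoppingSessionId, userid
    ORDER BY ShoppingSessionId
    """
).fetchall()

print(having_rows)
print(where_rows)
```

Each session appears once, with the two ProductId values of session s1 collapsed to their maximum.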
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.
I have this table:
id name bid
1 Test1 5.50
2 Test2 5.50
3 Test3 5.49
I want to select the row with the highest bid. If the highest bid is equal on another row, then it should randomly select one of the highest bid rows.
I tried:
SELECT name,max(bid) FROM table ORDER BY rand()
The output:
id name bid
1 Test1 5.50
My problem is that id "2" is never displayed because for some reason my query is only selecting id "1"
SELECTing name and MAX(bid) in the same query makes no sense: you are asking for the highest bid aggregated across all the rows, plus a name that's not aggregated, so it's not at all clear which row's name you'll be picking. MySQL typically picks the “right” answer you meant (one of the rows that owned the maximum bid) but it's not guaranteed, fails in most other databases, and is invalid in ANSI SQL.
To get a highest-bid row, order by bid and pick only the first result. If you want to ensure you get a random highest-bid row rather than just an arbitrary one, add a random factor to the order clause:
SELECT name, bid
FROM table
ORDER BY bid DESC, RAND()
LIMIT 1
SELECT name,bid
FROM table
WHERE bid=(SELECT max(bid) FROM table)
ORDER BY RAND()
LIMIT 1
should do the trick, though there may be a more optimized query.
That's because you're using an aggregate function, which collapses everything into a single row. You need a sub-select:
SELECT *
FROM table
WHERE bid = (SELECT MAX(bid) FROM table)
ORDER BY rand()
LIMIT 1;
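But be aware that ORDER BY RAND() can be slow, because a random value has to be generated for every candidate row before sorting. If you have only a few results, as here where only the highest-bid rows reach the sort, the performance implications may not be significant enough to bother changing.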
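A quick way to see the sub-select approach pick among the tied rows is the sketch below. It runs on SQLite from Python, so RANDOM() stands in for MySQL's RAND(), and the table is called bids (a hypothetical name, since table itself is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bids (id INTEGER PRIMARY KEY, name TEXT, bid REAL)")
conn.executemany(
    "INSERT INTO bids VALUES (?, ?, ?)",
    [(1, "Test1", 5.50), (2, "Test2", 5.50), (3, "Test3", 5.49)],
)

# Keep only the rows holding the maximum bid, then shuffle just those.
query = """
SELECT name, bid
FROM bids
WHERE bid = (SELECT MAX(bid) FROM bids)
ORDER BY RANDOM()  -- SQLite spelling; MySQL uses RAND()
LIMIT 1
"""

pick = conn.execute(query).fetchone()

# Repeat the draw to see which names can come up at all.
seen = {conn.execute(query).fetchone()[0] for _ in range(200)}
print(pick, seen)
```

Only Test1 and Test2 can ever be drawn, since Test3 is filtered out before the random ordering is applied.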