get distinct values as array by user_id - mysql

I want to get a list of distinct values for each user limited by 3 values per user:
id, user_id, value
1, 1, a
2, 1, b
3, 2, c
4, 1, b
5, 1, d
6, 1, e
expected result:
user_id, values
1, [a,b,d]
2, [c]
Is there some way to do this with GROUP BY user_id and DISTINCT?

Edit (based on comments):
We can use user-defined variables to assign a row number to each distinct value within a user_id partition. We then filter this result set to keep at most 3 rows per user_id.
SELECT
  dt2.user_id,
  dt2.value
FROM
(
  SELECT
    @rn := CASE WHEN @ui = dt.user_id THEN @rn + 1
                ELSE 1
           END AS row_no,
    @ui := dt.user_id AS user_id,
    dt.value
  FROM
  (
    SELECT DISTINCT
      user_id,
      value
    FROM your_table
    ORDER BY user_id
  ) AS dt
  CROSS JOIN (SELECT @rn := 0, @ui := NULL) AS user_init_vars
) AS dt2
WHERE dt2.row_no <= 3
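The scan performed by the query above can be pictured as one pass over the de-duplicated, sorted rows, with a row counter that resets whenever user_id changes. The following is a minimal Python sketch of that logic, not MySQL itself; the function name `top_n_per_user` and the in-memory `rows` list are assumptions made for illustration.

```python
def top_n_per_user(rows, n=3):
    """Keep at most n distinct values per user_id (rows are (user_id, value) tuples)."""
    # Equivalent of SELECT DISTINCT ... ORDER BY user_id.
    # dict.fromkeys drops duplicates and keeps first-seen order; sorted() is stable.
    distinct = sorted(dict.fromkeys(rows), key=lambda r: r[0])

    rn, prev_user = 0, None  # row counter and previous-user variable
    result = {}
    for user_id, value in distinct:
        rn = rn + 1 if user_id == prev_user else 1  # reset on a new partition
        prev_user = user_id
        if rn <= n:  # same role as WHERE row_no <= 3
            result.setdefault(user_id, []).append(value)
    return result


rows = [(1, "a"), (1, "b"), (2, "c"), (1, "b"), (1, "d"), (1, "e")]
print(top_n_per_user(rows))  # {1: ['a', 'b', 'd'], 2: ['c']}
```

Which three values survive depends on the order of the rows within a user. The sketch keeps first-seen order; in SQL you would have to add an explicit ordering column to make that choice deterministic.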
Previous question's answer:
GROUP_CONCAT(DISTINCT ...) collects all the unique values for a user_id into one comma-separated string.
We can then use the SUBSTRING_INDEX() function to keep the string up to the 3rd comma, which leaves at most 3 values.
Finally, we can use the CONCAT() function to enclose the resulting string in square brackets.
VALUES is a reserved keyword in MySQL, so consider giving the result column a different name.
Try the following:
SELECT user_id,
CONCAT('[',
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT value), ',', 3),
']') AS user_values
FROM your_table
GROUP BY user_id
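To make the string handling concrete, here is a small Python sketch of what the three functions do together. It is an illustration under assumptions, not MySQL behavior in every detail: `substring_index` only mimics SUBSTRING_INDEX for a positive count, `user_values` is a hypothetical helper, and the values are kept in first-seen order, whereas MySQL does not guarantee an order unless ORDER BY is given inside GROUP_CONCAT.

```python
def substring_index(text, delim, count):
    """Rough equivalent of MySQL SUBSTRING_INDEX(text, delim, count) for count > 0."""
    return delim.join(text.split(delim)[:count])


def user_values(rows, limit=3):
    """Build '[v1,v2,v3]' per user_id from (user_id, value) rows."""
    grouped = {}
    for user_id, value in rows:
        values = grouped.setdefault(user_id, [])
        if value not in values:  # DISTINCT
            values.append(value)
    return {
        user_id: "[" + substring_index(",".join(values), ",", limit) + "]"
        for user_id, values in grouped.items()
    }


rows = [(1, "a"), (1, "b"), (2, "c"), (1, "b"), (1, "d"), (1, "e")]
print(user_values(rows))  # {1: '[a,b,d]', 2: '[c]'}

# Pitfall: a value that itself contains the separator is cut in the middle.
print(user_values([(3, "x,y"), (3, "z")], limit=1))  # {3: '[x]'}
```

The second call shows why this approach only suits values that cannot contain a comma. In MySQL the concatenated string is also truncated at group_concat_max_len (1024 bytes by default), so long groups need that setting raised.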

Related

how can i group by field value?

How can I group by one field so that each group starts at value 0?
For example:
select * from t;
id, check_id, user_name
1, 0, user_a
2, 1, user_a
3, 2, user_a
1, 0, user_a
2, 1, user_a
3, 3, user_a
1, 0, user_b
2, 1, user_b
3, 3, user_b
Group by check_id, starting a new group at each value 0:
user_name, check_info
user_a, 0-1-2
user_a, 0-1-3
user_b, 0-1-3
How can I write this GROUP BY?
Well, I read in the question: group by one field, starting at value 0.
Then you can try this.
select user_name, group_concat(distinct check_id order by check_id asc separator '-') check_info
from (
  select id, check_id, user_name,
    case when check_id = 0 then
      @rn := @rn + 1
    else
      @rn := @rn
    end as unique_id
  from t
  inner join (select @rn := 0) as tmp
  order by user_name
) as tbl
group by user_name, unique_id
This starts a new group at every record whose check_id is 0, with the rows ordered by user_name.
This will give you what you want....maybe. It does work but is relying on the records coming back in the appropriate order when selected from the table (and that is NOT certain to occur).
SELECT user_name, GROUP_CONCAT(check_id ORDER BY grouping, check_id SEPARATOR '-')
FROM
(
  SELECT id, check_id, user_name, @grouping:=if(id > @prev_id, @grouping, @grouping + 1) AS grouping, @prev_id:=id
  FROM t
  CROSS JOIN
  (
    SELECT @grouping:=0, @prev_id:=0
  ) sub0
) sub1
GROUP BY user_name, grouping
It works by returning the rows and using variables to assign a grouping to them (so when the id gets smaller it adds one to the grouping value), then does a GROUP BY on the user name and the grouping value.
But really you need to have the grouping value somehow stored with your data in advance.
Provided that id is an auto-increment field, then you can use:
SELECT user_name,
       GROUP_CONCAT(check_id ORDER BY check_id SEPARATOR '-') AS check_info
FROM (
  SELECT id, check_id, user_name,
         @grp := IF (@uname = user_name,
                     IF (check_id = 0, @grp + 1, @grp),
                     IF (@uname := user_name, @grp + 1, @grp + 1)) AS grp
  FROM mytable
  CROSS JOIN (SELECT @grp := 0, @uname := '') AS vars
  ORDER BY id) AS t
GROUP BY user_name, grp
Variables are used to identify slices of consecutive records, within each user_name partition, starting by 0.
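The slicing idea can be sketched outside the database as a single ordered pass: open a new group whenever the user changes or check_id is 0, and append to the current group otherwise. This is a minimal Python illustration, assuming the rows arrive in the intended order (which is exactly what the SQL versions cannot take for granted without a reliable ordering column); `check_info` is a hypothetical helper name.

```python
def check_info(rows):
    """rows: (check_id, user_name) tuples, already in the order they should be read."""
    groups = []  # list of (user_name, [check_ids]) in order of appearance
    prev_user = None
    for check_id, user_name in rows:
        # A new slice starts on a new user or whenever check_id is 0.
        if user_name != prev_user or check_id == 0:
            groups.append((user_name, []))
        groups[-1][1].append(check_id)
        prev_user = user_name
    # Equivalent of GROUP_CONCAT(check_id ORDER BY check_id SEPARATOR '-')
    return [(user, "-".join(str(c) for c in sorted(ids))) for user, ids in groups]


rows = [
    (0, "user_a"), (1, "user_a"), (2, "user_a"),
    (0, "user_a"), (1, "user_a"), (3, "user_a"),
    (0, "user_b"), (1, "user_b"), (3, "user_b"),
]
for user, info in check_info(rows):
    print(user, info)
# user_a 0-1-2
# user_a 0-1-3
# user_b 0-1-3
```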

MySQL Query get the last N rows per Group

Suppose that I have a database which contains the following columns:
VehicleID|timestamp|lat|lon|
I may have multiple times the same VehicleId but with a different timestamp. Thus VehicleId,Timestamp is the primary key.
Now I would like to have as a result the last N measurements per VehicleId or the first N measurements per vehicleId.
How I am able to list the last N tuples according to an ordering column (e.g. in our case timestamp) per VehicleId?
Example:
|VehicleId|Timestamp|
1|1
1|2
1|3
2|1
2|2
2|3
5|5
5|6
5|7
In MySQL, this is most easily done using variables:
select t.*
from (select t.*,
             (@rn := if(@v = VehicleId, @rn + 1,
                        if(@v := VehicleId, 1, 1)
                       )
             ) as rn
      from table t cross join
           (select @v := -1, @rn := 0) params
      order by VehicleId, timestamp desc
     ) t
where rn <= 3;
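On MySQL 8.0 and later, window functions offer an alternative to user variables for this kind of per-group numbering. The sketch below runs a ROW_NUMBER() query of that shape against an in-memory SQLite database from Python, purely so that the example is self-contained and runnable. The table name `positions` and the column name `ts` are assumptions, and SQLite 3.25 or newer is required for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (VehicleId INTEGER, ts INTEGER, PRIMARY KEY (VehicleId, ts))"
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (5, 5), (5, 6), (5, 7)],
)

# Number the rows of each vehicle from newest to oldest, then keep the first N.
LAST_N = """
SELECT VehicleId, ts
FROM (
    SELECT VehicleId, ts,
           ROW_NUMBER() OVER (PARTITION BY VehicleId ORDER BY ts DESC) AS rn
    FROM positions
) AS ranked
WHERE rn <= ?
ORDER BY VehicleId, ts DESC
"""

last_two = conn.execute(LAST_N, (2,)).fetchall()
print(last_two)  # [(1, 3), (1, 2), (2, 3), (2, 2), (5, 7), (5, 6)]
```

Switching `ORDER BY ts DESC` to `ORDER BY ts ASC` inside the window gives the first N measurements per vehicle instead of the last N.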

What are the subquery equivalents of SQL aggregate functions MAX/MIN/AVG/COUNT

Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rna as "AVG"
from (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as signed) as "COUNT",
       (case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
     ) n left outer join
     (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
     on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
from the_table t2
where t2.pk_column <> t1.pk_column);
For getting the max you need to replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK it only needs to be unique)
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used some_column and the_table because column and table are reserved words.
SET @t1=0, @t2=0, @t3=0, @T4=0;
COUNT:
Select @t1:=@t1+1 as CNT from table
order by @t1:=@t1+1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The queries for the MIN and MAX functions in Gordon's answer are spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
, IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
, IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
, v.cc
, v.tc
, e.ne
FROM ( SELECT @rn := @rn + 1 AS rn
, @cc := @cc + (t.col IS NOT NULL) AS cc
, @tc := @tc + IFNULL(t.col,0) AS tc
FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
LEFT
JOIN `foo` t
ON 1=1
) v
LEFT
JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
ON 1=1
ORDER BY v.rn DESC
LIMIT 1
) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value such as 0 for a COUNT, we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user-defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We'll deal with that: we have e.ne, which we can check to know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter; we have to divide by the number of rows where col was not null. We use the @cc user-defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So we do a conditional test to check if t.col IS NULL, to avoid accidentally wiping out the accumulation. Our accumulator is going to be 0 if there aren't any rows that are not null, but that's not a problem, because we check @cc to see whether any rows were included. We need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.
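The corner cases discussed in this answer are easier to see with the accumulators written out as ordinary variables. The following Python sketch mirrors the three counters (rows seen, non-NULL values seen, running total) and applies the same final checks; it illustrates the logic only, the name `emulate_aggregates` is an assumption, and None stands in for SQL NULL.

```python
def emulate_aggregates(values):
    """Return (COUNT(*), SUM(col), AVG(col)) using running counters only."""
    rn = 0  # rows seen (row counter)
    cc = 0  # non-NULL values seen
    tc = 0  # running total of the non-NULL values
    for v in values:
        rn += 1
        if v is not None:  # skip NULLs so they cannot wipe out the total
            cc += 1
            tc += v
    total = tc if cc > 0 else None         # SUM is NULL when nothing was added
    average = tc / cc if cc > 0 else None  # also avoids dividing by zero
    return rn, total, average


print(emulate_aggregates([]))                           # (0, None, None)
print(emulate_aggregates([None]))                       # (1, None, None)
print(emulate_aggregates([2, 3, 5, 7, 11, 13, 17, 19]))  # (8, 77, 9.625)
```

Unlike the SQL version, a Python loop over an empty list naturally yields a count of 0, so no extra "is the table empty" probe is needed here; that probe exists only because the SQL query must still produce one row.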

Get column name which has the max value in a row sql

I have a table in my database where I store categories for news articles, and each time a user reads an article it increments the value in the associated column. Like this:
Now I want to execute a query where I can get the column names with the 4 highest values for each record. For example for user 9, it would return this:
I've tried several things, searched a lot but don't know how to do it. Can anyone help me?
This should do it:
select
userid,
max(case when rank=1 then name end) as `highest value`,
max(case when rank=2 then name end) as `2nd highest value`,
max(case when rank=3 then name end) as `3rd highest value`,
max(case when rank=4 then name end) as `4th highest value`
from
(
select userID, @rownum := @rownum + 1 AS rank, name, amt from (
  select userID, Buitenland as amt, 'Buitenland' as name from newsarticles where userID = 9 union
  select userID, Economie, 'Economie' from newsarticles where userID = 9 union
  select userID, Sport, 'Sport' from newsarticles where userID = 9 union
  select userID, Cultuur, 'Cultuur' from newsarticles where userID = 9 union
  select userID, Wetenschap, 'Wetenschap' from newsarticles where userID = 9 union
  select userID, Media, 'Media' from newsarticles where userID = 9
) amounts, (SELECT @rownum := 0) r
order by amt desc
limit 4
) top4
group by userid
Demo: http://www.sqlfiddle.com/#!2/ff624/11
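The query above first unpivots the category columns into (name, amount) rows and then ranks them. If the row is fetched into application code instead, the same idea takes only a few lines. This is a hedged Python sketch: `top_categories` is a hypothetical helper, and the read counts for user 9 are made-up sample numbers, not data from the question.

```python
def top_categories(row, n=4):
    """row: mapping of category column name -> read count for one user."""
    # Sort the (name, count) pairs by count, highest first.
    # sorted() is stable, so ties keep the original column order.
    ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical counts for one user
user_9 = {
    "Buitenland": 7,
    "Economie": 2,
    "Sport": 11,
    "Cultuur": 0,
    "Wetenschap": 5,
    "Media": 3,
}
print(top_categories(user_9))  # ['Sport', 'Buitenland', 'Wetenschap', 'Media']
```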
A very simple way of doing this is shown below
select userId,
  substring_index(four_highest,',',1) as 'highest value',
  substring_index(substring_index(four_highest,',',2),',',-1) as '2nd highest value',
  substring_index(substring_index(four_highest,',',3),',',-1) as '3rd highest value',
  substring_index(four_highest,',',-1) as '4th highest value'
from
(
select userid, convert(group_concat(val) using utf8) as four_highest from
(
select userId,Buitenland as val,'Buitenland' as col from test where userid=9 union
select userId,Economie as val,'Economie' as col from test where userid=9 union
select userId,Sport as val ,'Sport' as col from test where userid=9 union
select userId,Cultuur as val,'Cultuur' as col from test where userid=9 union
select userId,Wetenschap as val,'Wetenschap' as col from test where userid=9 union
select userId,Media as val,'Media' as col from test where userid=9 order by val desc limit 4
) inner_query
)outer_query;
PL/SQL, maybe? Set user_id, query your table, store the returned row in an nx2 array of column names and values (where n is the number of columns) and sort the array based on the values.
Of course, the correct thing to do is redesign your database in the manner that @octern suggests.
This will get you started with the concept of grabbing the highest value from multiple columns on a single row (modify for your specific tables - I created a fake one).
create table fake
(
id int Primary Key,
col1 int,
col2 int,
col3 int,
col4 int
);
insert into fake values (1, 5, 9, 27, 10);
insert into fake values (2, 3, 5, 1, 20);
insert into fake values (3, 89, 9, 27, 6);
insert into fake values (4, 17, 40, 1, 20);
SELECT *,(SELECT Max(v)
FROM (VALUES (col1), (col2), (col3), (col4) ) AS value(v))
FROM fake

With MySQL, how can I generate a column containing the record index in a table?

Is there any way I can get the actual row number from a query?
I want to be able to order a table called league_girl by a field called score; and return the username and the actual row position of that username.
I'm wanting to rank the users so i can tell where a particular user is, ie. Joe is position 100 out of 200, i.e.
User Score Row
Joe 100 1
Bob 50 2
Bill 10 3
I've seen a few solutions on here but I've tried most of them and none of them actually return the row number.
I have tried this:
SELECT position, username, score
FROM (SELECT @row := @row + 1 AS position, username, score
FROM league_girl GROUP BY username ORDER BY score DESC)
As derived
...but it doesn't seem to return the row position.
Any ideas?
You may want to try the following:
SELECT l.position,
l.username,
l.score,
@curRow := @curRow + 1 AS row_number
FROM league_girl l
JOIN (SELECT @curRow := 0) r;
The JOIN (SELECT @curRow := 0) part allows the variable initialization without requiring a separate SET command.
Test case:
CREATE TABLE league_girl (position int, username varchar(10), score int);
INSERT INTO league_girl VALUES (1, 'a', 10);
INSERT INTO league_girl VALUES (2, 'b', 25);
INSERT INTO league_girl VALUES (3, 'c', 75);
INSERT INTO league_girl VALUES (4, 'd', 25);
INSERT INTO league_girl VALUES (5, 'e', 55);
INSERT INTO league_girl VALUES (6, 'f', 80);
INSERT INTO league_girl VALUES (7, 'g', 15);
Test query:
SELECT l.position,
l.username,
l.score,
@curRow := @curRow + 1 AS row_number
FROM league_girl l
JOIN (SELECT @curRow := 0) r
WHERE l.score > 50;
Result:
+----------+----------+-------+------------+
| position | username | score | row_number |
+----------+----------+-------+------------+
| 3 | c | 75 | 1 |
| 5 | e | 55 | 2 |
| 6 | f | 80 | 3 |
+----------+----------+-------+------------+
3 rows in set (0.00 sec)
SELECT @i:=@i+1 AS iterator, t.*
FROM tablename t,(SELECT @i:=0) foo
Here is the structure of the template I used:
select
/*this is a row number counter*/
( select @rownum := @rownum + 1 from ( select @rownum := 0 ) d2 )
as rownumber,
d3.*
from
( select d1.* from table_name d1 ) d3
And here is my working code:
select
( select @rownum := @rownum + 1 from ( select @rownum := 0 ) d2 )
as rownumber,
d3.*
from
( select year( d1.date ), month( d1.date ), count( d1.id )
from maindatabase d1
where ( ( d1.date >= '2013-01-01' ) and ( d1.date <= '2014-12-31' ) )
group by YEAR( d1.date ), MONTH( d1.date ) ) d3
You can also use
SELECT @curRow := ifnull(@curRow,0) + 1 Row, ...
to initialise the counter variable.
Assuming MySQL supports it, you can easily do this with a standard SQL subquery:
select
(select count(*) from league_girl l1 where l2.score > l1.score and l1.id <> l2.id) as position,
username,
score
from league_girl l2
order by score;
For large amounts of displayed results, this will be a bit slow and you will want to switch to a self join instead.
If you just want to know the position of one specific user when ordering by score, you can simply count the rows in your table whose score is higher than that user's score. The user's position is that count + 1.
Assuming that your table is league_girl and your primary field is id, you can use this:
SELECT count(id) + 1 as rank from league_girl where score > <your_user_score>
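As a quick check of this counting approach, the sketch below loads the sample rows from the test case earlier in this thread into an in-memory SQLite database and looks up a user's rank. It is an illustration under assumptions: SQLite stands in for MySQL, `rank_of` is a hypothetical helper, and COUNT(*) is used because the sample table has no id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE league_girl (position INT, username VARCHAR(10), score INT)")
conn.executemany(
    "INSERT INTO league_girl VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 25), (3, "c", 75), (4, "d", 25),
     (5, "e", 55), (6, "f", 80), (7, "g", 15)],
)


def rank_of(username):
    """Rank = number of rows with a strictly higher score, plus one."""
    row = conn.execute(
        """
        SELECT COUNT(*) + 1
        FROM league_girl
        WHERE score > (SELECT score FROM league_girl WHERE username = ?)
        """,
        (username,),
    ).fetchone()
    return row[0]


print(rank_of("f"))  # 1
print(rank_of("e"))  # 3
print(rank_of("b"), rank_of("d"))  # 4 4
```

Users with equal scores share a rank (here b and d are both 4th), which differs from a plain row counter. For a username that does not exist, the subquery yields NULL, no row matches, and the function returns 1, so existence should be checked separately.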
I found the original answer incredibly helpful but I also wanted to grab a certain set of rows based on the row numbers I was inserting. As such, I wrapped the entire original answer in a subquery so that I could reference the row number I was inserting.
SELECT * FROM
(
SELECT *, @curRow := @curRow + 1 AS "row_number"
FROM db.tableName, (SELECT @curRow := 0) r
) as temp
WHERE temp.row_number BETWEEN 1 and 10;
Having a subquery in a subquery is not very efficient, so it would be worth testing whether you get a better result by having your SQL server handle this query, or fetching the entire table and having the application/web server manipulate the rows after the fact.
Personally my SQL server isn't overly busy, so having it handle the nested subqueries was preferable.
I know the OP is asking for a MySQL answer, but the other answers did not work for me:
Most of them fail with ORDER BY.
Or they are simply very inefficient and make the query very slow on a large table.
So, to save time for others like me: just index the rows after retrieving them from the database.
Example in PHP:
$users = UserRepository::loadAllUsersAndSortByScore();
foreach($users as $index=>&$user){
$user['rank'] = $index+1;
}
Example in PHP using offset and limit for paging:
$limit = 20; //page size
$offset = 3; //page number
$users = UserRepository::loadAllUsersAndSortByScore();
foreach($users as $index=>&$user){
$user['rank'] = $index+1+($limit*($offset-1));
}