Query membership of group on date with open-ended memberships - mysql

Given the following table structure for tracking membership of given groups:
+----+----------+----------------+--------------+
| id | group_id | in_group_begin | in_group_end |
+----+----------+----------------+--------------+
|  1 |       10 | 2019-01-01     | 2019-02-01   |
|  1 |       11 | 2019-02-02     | 2019-03-01   |
|  1 |       12 | 2019-03-01     | NULL         |
|  2 |       10 | 2019-01-01     | NULL         |
+----+----------+----------------+--------------+
(Where in_group_end being NULL signifies this is their current group)
How would I form a query that would tell me, for example, what group_id each member was associated with on a given date?
... in_group_end IS NULL will give me their current group, not necessarily the group they were in
... in_group_end IS NULL OR in_group_end >= '{$date_str}' could give me multiple options
Ideally I would like something I can use in a joined query, e.g. with a table storing a person's name, address, etc., from which I expect only one row back.
Would some kind of IF statement in the JOIN do it? Or a GROUP BY in a sub-query?

Consider the following logic, which would find all matches for 2019-01-15:
SELECT group_id
FROM yourTable
WHERE
    '2019-01-15' >= in_group_begin AND
    ('2019-01-15' <= in_group_end OR in_group_end IS NULL);
The WHERE clause treats an input date as a match if it lies between the start and end dates (inclusive), or if it is on or after the start date and there is no end date. Also, because the columns are compared directly rather than wrapped in a function, the WHERE clause as written can make use of an index.
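To try the predicate above without a MySQL server, here is a minimal sketch in Python. It assumes SQLite (from the standard library) as a stand-in for MySQL, a table named membership, and dates stored as ISO strings; groups_on is a hypothetical helper, not something from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table name `membership` is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership "
    "(id INT, group_id INT, in_group_begin TEXT, in_group_end TEXT)"
)
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2019-01-01", "2019-02-01"),
        (1, 11, "2019-02-02", "2019-03-01"),
        (1, 12, "2019-03-01", None),  # NULL end date = current group
        (2, 10, "2019-01-01", None),
    ],
)


def groups_on(date_str):
    """Return (id, group_id) pairs whose membership interval covers date_str."""
    return conn.execute(
        """
        SELECT id, group_id
        FROM membership
        WHERE ? >= in_group_begin
          AND (? <= in_group_end OR in_group_end IS NULL)
        ORDER BY id, group_id
        """,
        (date_str, date_str),
    ).fetchall()


print(groups_on("2019-01-15"))
print(groups_on("2019-03-01"))
```

Note that in the sample data member 1 leaves group 11 and joins group 12 on the same day (2019-03-01), so an inclusive comparison returns two rows for that date. If exactly one row per member is required, the intervals themselves must not share a boundary day.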

Let's say you want to search for a date 2019-01-03.
SELECT
id,
group_id
FROM membership
WHERE '2019-01-03' BETWEEN in_group_begin AND IFNULL(in_group_end, CURRENT_DATE);
If you have a separate users table that stores user details, and its id is referenced by the id column of the membership table, you can use the following query:
SELECT
u.id,
u.name,
u.address,
m.group_id
FROM users u
INNER JOIN membership m ON u.id = m.id
WHERE '2019-01-03' BETWEEN m.in_group_begin AND IFNULL(m.in_group_end, CURRENT_DATE);

Assuming there is a table users from which you want the user's details returned, join it to your table tablename like this:
select u.*, t.group_id
from users u inner join (
select
id, group_id, in_group_begin,
coalesce(in_group_end, current_date) in_group_end
from tablename
) t on t.id = u.id and @date between t.in_group_begin and t.in_group_end
Replace @date with the date you search for, or set it beforehand with SET @date = '2019-01-03';
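The join can be sketched the same way. This is only an illustration under assumptions: SQLite replaces MySQL, the users table and the names in it are made up, and the open end is coalesced to the searched date itself rather than CURRENT_DATE, so that an open-ended membership also matches a date later than today.

```python
import sqlite3

# Assumptions: SQLite instead of MySQL; `users` and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INT, name TEXT);
    CREATE TABLE membership
        (id INT, group_id INT, in_group_begin TEXT, in_group_end TEXT);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO membership VALUES
        (1, 10, '2019-01-01', '2019-02-01'),
        (1, 11, '2019-02-02', '2019-03-01'),
        (1, 12, '2019-03-01', NULL),
        (2, 10, '2019-01-01', NULL);
    """
)


def users_with_group(date_str):
    """Return (user id, name, group_id) for the membership covering date_str."""
    return conn.execute(
        """
        SELECT u.id, u.name, m.group_id
        FROM users u
        JOIN membership m
          ON m.id = u.id
         AND ? BETWEEN m.in_group_begin AND COALESCE(m.in_group_end, ?)
        ORDER BY u.id, m.group_id
        """,
        (date_str, date_str),
    ).fetchall()


print(users_with_group("2019-02-15"))
```

Putting the date condition in the ON clause keeps the query easy to turn into a LEFT JOIN later, should users without any membership on that date need to appear as well.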

Joining table to itself with multiple join criteria logic

I'm trying to understand the logic behind the syntax below. Based on the following question, table and syntax:
Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
Column + Data Type:
id: int | user_id: int | item: varchar |created_at: datetime | revenue: int
SELECT DISTINCT(a1.user_id)
FROM amazon_transactions a1
JOIN amazon_transactions a2 ON a1.user_id=a2.user_id
AND a1.id <> a2.id
AND a2.created_at::date-a1.created_at::date BETWEEN 0 AND 7
ORDER BY a1.user_id
Why does the table need to be joined to itself in this case?
How does 'AND a1.id <> a2.id' portion of syntax contribute to the join?
You are looking for users that have two records in that table whose dates are at most 7 days apart.
To accomplish this, you treat the table as if it were two different (but identical) tables, because you have to match a row in the first table with a row in the second.
Of course, you don't want to match a row with itself, so
AND a1.id <> a2.id
accomplishes that
The table needs to be joined with itself because you have just one table and you want to find returning users by comparing the time between transaction dates for the same user.
The AND a1.id <> a2.id portion of the syntax prevents a transaction from being paired with itself, i.e. rows with the same id are excluded from the joined result.
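A small experiment may make the role of a1.id <> a2.id clearer. The sketch below assumes SQLite in place of the PostgreSQL syntax in the question (julianday(date(...)) replaces the ::date subtraction) and uses a handful of illustrative rows; returning_users is a hypothetical helper.

```python
import sqlite3

# Assumption: SQLite stands in for the question's database; rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazon_transactions "
    "(id INT, user_id INT, item TEXT, created_at TEXT, revenue INT)"
)
conn.executemany(
    "INSERT INTO amazon_transactions (id, user_id, created_at) VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-05 15:33:22"),
        (2, 2, "2020-01-05 16:33:22"),
        (3, 1, "2020-01-08 18:33:22"),
        (4, 1, "2020-01-22 17:33:22"),
        (5, 2, "2020-02-05 15:33:22"),
        (6, 2, "2020-03-05 15:33:22"),
    ],
)


def returning_users(exclude_self):
    """Run the self-join with or without the a1.id <> a2.id condition."""
    self_filter = "AND a1.id <> a2.id" if exclude_self else ""
    sql = f"""
        SELECT DISTINCT a1.user_id
        FROM amazon_transactions a1
        JOIN amazon_transactions a2
          ON a1.user_id = a2.user_id
         {self_filter}
         AND julianday(date(a2.created_at)) - julianday(date(a1.created_at))
             BETWEEN 0 AND 7
        ORDER BY a1.user_id
    """
    return [row[0] for row in conn.execute(sql)]


print(returning_users(exclude_self=True))
print(returning_users(exclude_self=False))
```

Without the condition, every row pairs with itself at a distance of 0 days, so every user who has bought anything at all would count as a returning user.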
There are two scenarios I can think of, depending on the id column values. Are the id values generated in chronological order? If so, to answer your first question, we can use a join but don't have to. Here is how to achieve your goal using a correlated subquery, with sample data created.
create table amazon_transactions(id int , user_id int , item varchar(20),created_at datetime , revenue int);
insert amazon_transactions (id,user_id,created_at) values
(1,1,'2020-01-05 15:33:22'),
(2,2,'2020-01-05 16:33:22'),
(3,1,'2020-01-08 18:33:22'),
(4,1,'2020-01-22 17:33:22'),
(5,2,'2020-02-05 15:33:22'),
(6,2,'2020-03-05 15:33:22');
select * from amazon_transactions;
-- sample set:
| id | user_id | item | created_at | revenue |
+------+---------+------+---------------------+---------+
| 1 | 1 | NULL | 2020-01-05 15:33:22 | NULL |
| 2 | 2 | NULL | 2020-01-05 16:33:22 | NULL |
| 3 | 1 | NULL | 2020-01-08 18:33:22 | NULL |
| 4 | 1 | NULL | 2020-01-22 17:33:22 | NULL |
| 5 | 2 | NULL | 2020-02-05 15:33:22 | NULL |
| 6 | 2 | NULL | 2020-03-05 15:33:22 | NULL |
-- Here is the answer using a correlated subquery:
select distinct user_id
from amazon_transactions t
where datediff(
    -- next transaction of the same user, assuming ids increase over time
    (select created_at from amazon_transactions
     where user_id = t.user_id and id > t.id
     order by id limit 1),
    created_at
) <= 7
;
-- result:
| user_id |
+---------+
| 1 |
However, what if the id values are NOT based on transaction time? Then the id values are of no help for our requirement. In this case a JOIN is more capable than a correlated subquery, and we need to order each user's rows by transaction time in order to build the join condition. To answer your second question, the AND a1.id <> a2.id portion of the syntax prevents a transaction from being paired with itself. However, to my understanding the matching scope is broader than necessary. We only care whether CONSECUTIVE transactions are within 7 days of each other, but AND a1.id <> a2.id pairs every transaction with every other one. For instance, we want to check the gap between transaction1 and transaction2, and between transaction2 and transaction3, NOT between transaction1 and transaction3.
Note: by using the user variable row_id trick, we can number each row in (user_id, created_at) order and use that number to match consecutive transactions for each user, thus avoiding the wasteful comparison of non-adjacent transactions.
select distinct t1.user_id
from
    (select user_id, created_at, @row_id := @row_id + 1 as row_id
     from amazon_transactions, (select @row_id := 0) t
     order by user_id, created_at) t1
join
    (select user_id, created_at, @row_num := @row_num + 1 as row_num
     from amazon_transactions, (select @row_num := 0) t
     order by user_id, created_at) t2
on t1.user_id = t2.user_id and t2.row_num - t1.row_id = 1 and datediff(t2.created_at, t1.created_at) <= 7
;
-- result
| user_id |
+---------+
| 1 |
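For comparison, the "consecutive transactions only" idea can also be sketched without user variables. This is a different technique from the one above: a correlated subquery picks each row's next purchase by created_at. It assumes SQLite instead of MySQL, ids that are deliberately shuffled to show they play no role, and no two purchases of one user at exactly the same timestamp.

```python
import sqlite3

# Assumptions: SQLite instead of MySQL; ids are shuffled on purpose.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazon_transactions (id INT, user_id INT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO amazon_transactions VALUES (?, ?, ?)",
    [
        (6, 1, "2020-01-05 15:33:22"),
        (2, 2, "2020-01-05 16:33:22"),
        (3, 1, "2020-01-08 18:33:22"),
        (4, 1, "2020-01-22 17:33:22"),
        (5, 2, "2020-02-05 15:33:22"),
        (1, 2, "2020-03-05 15:33:22"),
    ],
)


def users_with_close_consecutive_purchases(max_days=7):
    """Users whose next purchase (by time) falls within max_days calendar days."""
    sql = """
        SELECT DISTINCT t.user_id
        FROM amazon_transactions t
        WHERE julianday(date((
                  SELECT MIN(n.created_at)      -- the next purchase in time
                  FROM amazon_transactions n
                  WHERE n.user_id = t.user_id
                    AND n.created_at > t.created_at
              ))) - julianday(date(t.created_at)) <= ?
        ORDER BY t.user_id
    """
    return [row[0] for row in conn.execute(sql, (max_days,))]


print(users_with_close_consecutive_purchases())
```

A user's last purchase has no successor, so the subquery yields NULL, the comparison is NULL, and the row is simply filtered out.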

Additional condition for equal values using MAX() function

I have a simple database for auctions. It includes a table that contains the bids.
+---------+---------+--------+------------+
| item_id | user_id | amount | time       |
+---------+---------+--------+------------+
|       3 |       2 |    500 | 1540152972 |
|       3 |       4 |    500 | 1540151466 |
+---------+---------+--------+------------+
At the end of the auction I need to find which users won which items (highest amount). I've considered the following query for that
SELECT item_id, user_id, MAX(amount)
FROM auction_bids
GROUP BY item_id
Which appears to work fine, until multiple users have made a bid with the same amount.
In that case I need to retrieve the earliest one (i.e: the lowest time value).
How do I work this into my GROUP BY query?
Return a row if no other row with the same item_id has a higher amount, or the same amount and an earlier time.
SELECT item_id, user_id, amount
FROM auction_bids a1
WHERE NOT EXISTS (select 1 from auction_bids a2
where a2.item_id = a1.item_id
and (a2.amount > a1.amount
or (a2.amount = a1.amount and a2.time < a1.time)))
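Here is a quick way to check the NOT EXISTS query against the tie in the question. It is a sketch only: SQLite is assumed in place of MySQL, and the third and fourth rows are made-up additions so that a losing bid and a second item are present.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auction_bids (item_id INT, user_id INT, amount INT, time INT)"
)
conn.executemany(
    "INSERT INTO auction_bids VALUES (?, ?, ?, ?)",
    [
        (3, 2, 500, 1540152972),
        (3, 4, 500, 1540151466),  # same amount, earlier time: should win
        (3, 5, 400, 1540150000),  # hypothetical lower bid
        (7, 2, 100, 1540150500),  # hypothetical second item
    ],
)

winners = conn.execute(
    """
    SELECT item_id, user_id, amount
    FROM auction_bids a1
    WHERE NOT EXISTS (
        SELECT 1
        FROM auction_bids a2
        WHERE a2.item_id = a1.item_id
          AND (a2.amount > a1.amount
               OR (a2.amount = a1.amount AND a2.time < a1.time))
    )
    ORDER BY item_id
    """
).fetchall()

print(winners)
```

If two bids could share both amount and time, both would still be returned, so a further tie-breaker would be needed in that case.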
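No, this is filtering, not aggregation. The following returns every bid whose amount equals the maximum for its item, so tied bids still all appear: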
SELECT ab.*
FROM auction_bids ab
WHERE ab.amount = (SELECT MAX(ab2.amount)
FROM auction_bids ab2
WHERE ab2.item_id = ab.item_id
);

MAX function in MySQL does not return proper key value

I have a table called tbl_user_sal:
| id | user_id | salary | date |
| 1 | 1 | 1000 | 2014-12-01 |
| 2 | 1 | 2000 | 2014-12-02 |
Now I want to get the id of the maximum date. I used the following query:
SELECT MAX(date) AS from_date, id, user_id, salary
FROM tbl_user_sal
WHERE user_id = 1
But it gave me this output:
| id | user_id | salary | from_date |
| 1 | 1 | 2000 | 2014-12-02 |
Which is correct as far as the max date being 2014-12-02, but the corresponding id is not correct. This happens for other records as well. I used order by to check but that was not successful either. Can anyone shed some light on this?
Note: It's not necessarily the case that the max date has the max id. A record can have the max date while its id is an older one.
If you only want to retrieve that information for a single user, which you seem to, because of your WHERE clause, just use ORDER BY and LIMIT:
SELECT *
FROM tbl_user_sal
WHERE user_id = 1
ORDER BY date DESC
LIMIT 1
If you want to do that for every user, however, you will have to get a little bit fancier. Something like this should do it:
SELECT MIN(t2.id) AS id, user_id, date
-- find max date for each user_id
FROM (SELECT user_id, MAX(date) AS date
      FROM tbl_user_sal
      GROUP BY user_id) AS t1
-- join ids for each max date/user_id combo
JOIN tbl_user_sal AS t2
USING (user_id, date)
-- limit to 1 id for every user_id
GROUP BY
user_id, date
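The "max date per user, then join back" approach can be tried out as follows. This is a sketch under assumptions: SQLite replaces MySQL, and the two rows for user 2 are invented so that the latest date does not carry the highest id.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; user 2's rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_user_sal (id INT, user_id INT, salary INT, date TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_user_sal VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1000, "2014-12-01"),
        (2, 1, 2000, "2014-12-02"),
        (3, 2, 1500, "2014-12-05"),  # latest date, but not the highest id
        (4, 2, 1200, "2014-12-03"),
    ],
)

latest_rows = conn.execute(
    """
    SELECT t2.id, t2.user_id, t2.salary, t2.date
    FROM (SELECT user_id, MAX(date) AS max_date
          FROM tbl_user_sal
          GROUP BY user_id) AS t1
    JOIN tbl_user_sal AS t2
      ON t2.user_id = t1.user_id
     AND t2.date = t1.max_date
    ORDER BY t2.user_id
    """
).fetchall()

print(latest_rows)
```

Every selected column comes from the joined row itself, which is what keeps id and salary consistent with the maximum date.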
You are missing a GROUP BY clause. Try this:
select max(awrd_date) as from_date,awrd_id
from tbl_user_sal
where awrd_user_id = 106
group by awrd_id
What I believe you should do here is have a subquery that pulls the max date, and your outer query looks for the row with that date.
It looks like this:
SELECT *
FROM myTable
WHERE date = (SELECT MAX(date) FROM myTable);
Additional things may need to be added if you want to search for a specific user_id, or get the largest date for each user_id, but this gives your expected results for this example here.

SQL complicated select statement

I am trying to create a SELECT statement, but I am not really sure how to accomplish it.
I have 2 tables, user and group. Each user has a userid and each group has an ownerid that specifies who owns the group. Each group also has a name, and inside the user table there is a column group designating which group that person belongs to (excuse the annoying structure, I did not create it). I am trying to find all rows in group where the owner of that group does not have group (inside the user table) set to the name of that group. If this helps:
User
|-----------------------|
| id | username | group |
|----|----------|-------|
| 0 | Steve | night |
| 1 | Sally | night |
| 2 | Susan | sun |
| 3 | David | xray |
|-----------------------|
Group
|---------------------|
| ownerid | name |
|---------|-----------|
| 1 | night |
| 3 | bravo |
| 2 | sun |
|---------------------|
Where the SQL statement would return the group row for bravo because bravo's owner does not have his group set to bravo.
This is a join back to the original table and then a comparison of the values:
select g.*
from `group` g join
     user u
     on g.ownerid = u.id
where g.name <> u.group;
If the values can be NULL, then the logic would need to take that into account.
An anti-join is a familiar pattern:
SELECT g.*
FROM `Group` g
LEFT
JOIN `User` u
ON u.group = g.name
AND u.id = g.ownerid
WHERE u.id IS NULL
Let's unpack that a bit. We're going to start with returning all rows from Group. Then, we're going to "match" each row in Group with a row (or rows) from User. To be considered a "match", the User.id has to match the Group.ownerid, and the User.group value has to match the Group.name.
The "trick" is to eliminate all rows where we found a match (that's what the WHERE clause does), and that leaves us with only those rows from Group that didn't have a match.
Another way to obtain an equivalent result using a NOT EXISTS predicate
SELECT g.*
FROM `Group` g
WHERE NOT EXISTS
( SELECT 1
FROM `User` u
WHERE u.group = g.name
AND u.id = g.ownerid
)
This uses a correlated subquery; depending on the MySQL version and the available indexes, it may not perform as fast as a join.
Note that these have the potential to return a slightly different result than the query from Gordon Linoff, if you had a row in Group with an ownerid value that wasn't in the User table.
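The difference between the two approaches shows up with a group whose owner is missing from User. The sketch below assumes SQLite in place of MySQL (it accepts the same backtick quoting, and needs it for the group column too) and adds one hypothetical group row, ghost, owned by a non-existent user 9.

```python
import sqlite3

# Assumptions: SQLite instead of MySQL; the 'ghost' group is a made-up row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE `User` (id INT, username TEXT, `group` TEXT);
    CREATE TABLE `Group` (ownerid INT, name TEXT);
    INSERT INTO `User` VALUES
        (0, 'Steve', 'night'), (1, 'Sally', 'night'),
        (2, 'Susan', 'sun'),   (3, 'David', 'xray');
    INSERT INTO `Group` VALUES
        (1, 'night'), (3, 'bravo'), (2, 'sun'), (9, 'ghost');
    """
)

# Anti-join: keep groups for which no matching owner row was found.
anti_join = conn.execute(
    """
    SELECT g.ownerid, g.name
    FROM `Group` g
    LEFT JOIN `User` u
      ON u.`group` = g.name
     AND u.id = g.ownerid
    WHERE u.id IS NULL
    ORDER BY g.ownerid
    """
).fetchall()

# Inner join plus comparison: the owner must exist for a row to come back.
inner_join = conn.execute(
    """
    SELECT g.ownerid, g.name
    FROM `Group` g
    JOIN `User` u ON g.ownerid = u.id
    WHERE g.name <> u.`group`
    ORDER BY g.ownerid
    """
).fetchall()

print(anti_join)
print(inner_join)
```

Which behaviour is right depends on whether a group with an unknown owner should be reported as mismatched.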
SELECT G.*
FROM `Group` AS G
WHERE G.Name NOT IN (SELECT DISTINCT U.Group FROM User AS U)

MySQL conditionally populate column 3 based on DISTINCT involving 2 other columns in one table

I had a good read through similar topics, but I can't quite a) find one to match my scenario, or b) understand the others well enough to fit / tailor / tweak them to my situation.
I have a table, the important fields being:
+------+------+--------+--------+
| ID | Name | Price |Status |
+------+------+--------+--------+
| 1 | Fred | 4.50 | |
| 2 | Fred | 4.50 | |
| 3 | Fred | 5.00 | |
| 4 | John | 7.20 | |
| 5 | John | 7.20 | |
| 6 | John | 7.20 | |
| 7 | Max | 2.38 | |
| 8 | Max | 2.38 | |
| 9 | Sam | 21.00 | |
+------+------+--------+--------+
ID is an auto-incrementing value as records get added throughout the day.
NAME is a Primary Key field, which can repeat 1 to 3 times in the whole table.
Each NAME will have a PRICE value, which may or may not be the same per NAME.
There is also a STATUS field that needs to be populated based on the following, which is actually the part I am stuck on.
Status = 'Y' if a given name has only one distinct price attached to it.
Status = 'N' if a given name has multiple distinct prices attached to it.
Using the table above, IDs 1, 2 and 3 should be 'N', whilst 4, 5, 6, 7, 8 and 9 should be 'Y'.
I think this may well involve some form of combination of JOINs, GROUPs, and DISTINCTs but I am at a loss on how to put that into the right order for SQL.
In order to get the count of distinct Price values per name, we must use a GROUP BY on the Name field, but since you also want to display all names ungrouped but with an additional Status field, we must first create a subselect in the FROM clause which groups by the name and determines whether the name has multiple price values or not.
When we GROUP BY Name in the subselect, COUNT(DISTINCT price) will count the number of distinct price values for each particular name. Without the DISTINCT keyword, it would simply count the number of rows where price is not null.
In conjunction with that, we use a CASE expression to insert N into the Status column if there is more than one distinct Price value for the particular name, otherwise, it will insert Y.
The subselect only returns one row per Name, so to get all names ungrouped, we join that subselect to the main table on the condition that the subselect's Name = the main table's Name:
SELECT
b.ID,
b.Name,
b.Price,
a.Status
FROM
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) a
INNER JOIN
tbl b ON a.Name = b.Name
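To see the subselect and join at work on the sample rows from the question, here is a small sketch. It assumes SQLite as a stand-in for MySQL and keeps the table name tbl used above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INT, Name TEXT, Price REAL, Status TEXT)")
conn.executemany(
    "INSERT INTO tbl (ID, Name, Price) VALUES (?, ?, ?)",
    [
        (1, "Fred", 4.50), (2, "Fred", 4.50), (3, "Fred", 5.00),
        (4, "John", 7.20), (5, "John", 7.20), (6, "John", 7.20),
        (7, "Max", 2.38), (8, "Max", 2.38),
        (9, "Sam", 21.00),
    ],
)

rows = conn.execute(
    """
    SELECT b.ID, b.Name, b.Price, a.Status
    FROM (
        -- one row per name: 'N' when more than one distinct price exists
        SELECT Name,
               CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
        FROM tbl
        GROUP BY Name
    ) a
    INNER JOIN tbl b ON a.Name = b.Name
    ORDER BY b.ID
    """
).fetchall()

status_by_id = {row[0]: row[3] for row in rows}
print(status_by_id)
```

Only Fred has two distinct prices (4.50 and 5.00), so only IDs 1 to 3 come out as 'N'.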
Edit: In order to facilitate an update, you can incorporate this query using JOINs in the UPDATE like so:
UPDATE
tbl a
INNER JOIN
(
SELECT Name, CASE WHEN COUNT(DISTINCT Price) > 1 THEN 'N' ELSE 'Y' END AS Status
FROM tbl
GROUP BY Name
) b ON a.Name = b.Name
SET
a.Status = b.Status
Assuming you have an unfilled Status column in your table.
If you want to update the status column, you could do:
UPDATE mytable s
SET status = (
SELECT IF(COUNT(DISTINCT price)=1, 'Y', 'N') c
FROM (
SELECT *
FROM mytable
) s1
WHERE s1.name = s.name
GROUP BY name
);
Technically, it should not be necessary to have this:
FROM (
SELECT *
FROM mytable
) s1
but there is a MySQL limitation that prevents you from selecting from the table you're updating in a subquery. By wrapping the table in a derived table (a subquery in the FROM clause), we force MySQL to create a temporary table, and then the update becomes possible.