Limit query results based on value of multiple columns - mysql

I am using MySQL 5.6 and I have a table structured like this:
| user_id | email_1 | email_2 | email_3 |
| 1 | abc@test.com | | |
| 2 | xyz@test.com | | joe@test.com |
| 3 | | test@test.com | bob@joh.com |
| 4 | | | x@y.com |
I want to fetch the first n email addresses from this table.
For example, if I want to fetch the first 5, then only the first 3 rows should be returned.

This makes certain assumptions about the uniqueness of data, that might not be true...
SELECT DISTINCT x.* FROM my_table x
JOIN
(SELECT user_id, 1 email_id,email_1 email FROM my_table WHERE email_1 IS NOT NULL
UNION ALL
SELECT user_id, 2 email_id,email_2 email FROM my_table WHERE email_2 IS NOT NULL
UNION ALL
SELECT user_id, 3 email_id,email_3 email FROM my_table WHERE email_3 IS NOT NULL
ORDER BY user_id, email_id LIMIT 5
) y
ON y.user_id = x.user_id
AND CASE WHEN y.email_id = 1 THEN y.email = x.email_1
WHEN y.email_id = 2 THEN y.email = x.email_2
WHEN y.email_id = 3 THEN y.email = x.email_3
END;

You want to return as many rows as necessary to get five emails. So you need a running total of the email count.
select user_id, email_1, email_2, email_3
from
(
select
user_id, email_1, email_2, email_3,
coalesce(
sum((email_1 is not null) + (email_2 is not null) + (email_3 is not null))
over (order by user_id rows between unbounded preceding and 1 preceding)
, 0) as cnt_prev
from mytable
) counted
where cnt_prev < 5 -- take the row if the previous row has not reached the count of 5
order by user_id;
SUM() OVER is a window function, so this needs MySQL 8.0 or later; it will not run on MySQL 5.6.
The email count relies on MySQL evaluating boolean expressions as integers: true is 1 and false is 0. Thus (email_1 is not null) + (email_2 is not null) + (email_3 is not null) counts the emails in the row.
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ac415e71733699547196ae01cb1caf13
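To follow the running-total logic without a MySQL 8.0 server at hand, here is a minimal Python sketch of the same idea. It is not the answer's SQL, only the rule it encodes: keep a row while fewer than n emails were counted in the rows before it. The function name rows_for_first_n_emails is made up for illustration, and None stands in for NULL, which assumes the blank cells in the question are NULL rather than empty strings.

```python
# Sample rows from the question; None plays the role of NULL (an assumption).
rows = [
    (1, "abc@test.com", None, None),
    (2, "xyz@test.com", None, "joe@test.com"),
    (3, None, "test@test.com", "bob@joh.com"),
    (4, None, None, "x@y.com"),
]


def rows_for_first_n_emails(rows, n):
    """Return rows in user_id order while fewer than n emails precede them."""
    selected = []
    cnt_prev = 0  # emails counted in all earlier rows, like the windowed SUM
    for row in sorted(rows, key=lambda r: r[0]):
        if cnt_prev >= n:
            break  # the earlier rows already supply n emails
        selected.append(row)
        # Each non-NULL email column adds 1, like (email_x is not null) in MySQL.
        cnt_prev += sum(email is not None for email in row[1:])
    return selected


result = rows_for_first_n_emails(rows, 5)
print([row[0] for row in result])
```

With n = 5 this keeps user_id 1, 2 and 3, which matches the outcome the question asks for.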

Related

First Unique Sql row

I have a MySQL table of user orders with columns such as:
user_id | timestamp | is_order_Parent | Status |
1 | 10-02-2020 | N | C |
2 | 11-02-2010 | Y | D |
3 | 11-02-2020 | N | C |
1 | 12-02-2010 | N | C |
1 | 15-02-2020 | N | C |
2 | 15-02-2010 | N | C |
I want to count the number of new customers per day. A new customer is one who places a non-parent order with status C, and a user who has been counted on one day must not be counted again on other days.
The ideal result table would be:
Timestamp: Day | Distinct values of User ID
10-02-2020 | 1
11-02-2010 | 1
12-02-2010 | 0 <--- already counted user_id = 1 above, so no need to count it here
15-02-2010 | 1
The table name is cscart_orders.
If you are running MySQL 8.0, you can do this with window functions and aggregation:
select timestamp, sum(timestamp = timestamp0) new_users
from (
select
t.*,
min(case when is_order_parent = 'N' and status = 'C' then timestamp end) over(partition by user_id) timestamp0
from mytable t
) t
group by timestamp
The window min() computes the timestamp when each user became a "new user". Then, the outer query aggregates by date, and counts how many new users were found on that date.
A nice thing about this approach is that it does not require enumerating the dates separately.
You can use two levels of aggregation:
select first_timestamp, count(*)
from (select t.user_id, min(timestamp) as first_timestamp
from t
where is_order_parent = 'N' and status = 'C'
group by t.user_id
) t
group by first_timestamp;
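As a rough check of the two-level aggregation, the sketch below runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the dates are rewritten as ISO strings with a single year so that MIN() on text sorts chronologically; that normalization is an assumption, since the question's sample mixes 2010 and 2020.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, timestamp TEXT, is_order_parent TEXT, status TEXT)"
)
# Illustrative data modelled on the question, with dates normalized (assumption).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "2020-02-10", "N", "C"),
        (2, "2020-02-11", "Y", "D"),
        (3, "2020-02-11", "N", "C"),
        (1, "2020-02-12", "N", "C"),
        (1, "2020-02-15", "N", "C"),
        (2, "2020-02-15", "N", "C"),
    ],
)

new_users = conn.execute(
    """
    SELECT first_timestamp, COUNT(*)
    FROM (SELECT user_id, MIN(timestamp) AS first_timestamp
          FROM t
          WHERE is_order_parent = 'N' AND status = 'C'
          GROUP BY user_id) AS firsts
    GROUP BY first_timestamp
    ORDER BY first_timestamp
    """
).fetchall()
print(new_users)
```

Note that 2020-02-12 does not appear at all: this form only returns days on which at least one user placed a first qualifying order, so a zero row like the one in the question's ideal table would need a separate list of dates.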

MySQL GROUP BY while keeping certain rows by column content

There is a table with duplicated rows. See rows 1 and 2:
id full_name email status active
1 John Doe john@mail.com ok 1
2 John Doe john@mail.com null 1
3 Ricky Duke rick@mail.com null 1
4 Jane Doe jane@mail.com block 1
I need to select distinct rows, but not arbitrarily: for each set of duplicates I want the one row whose 'status' is NOT NULL.
My query is:
SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND (status = 'ok' OR status IS NULL)
GROUP BY email
That query picks one of the duplicate rows arbitrarily, without prioritizing the 'status' field.
How can I prioritize rows whose 'status' is NOT NULL, and select rows with NULL only when no row with 'ok' status is present?
You can use row_number():
select s.*
from (select s.*,
row_number() over (partition by email order by (status is not null) desc) as seqnum
from subscribers s
where active = 1
) s
where seqnum = 1;
You could filter with a correlated subquery that does conditional ordering, and gives a lowest priority to null statuses:
select t.*
from mytable t
where t.id = (
select id
from mytable t1
where
t1.full_name = t.full_name
and t1.email = t.email
and t1.active = t.active
order by status is null, status
limit 1
)
This defines duplicates as records that have the same full_name, email and active. You might want to adapt that to your actual definition of duplicates.
Demo on DB Fiddle:
id | full_name | email | status | active
-: | :--------- | :------------ | :----- | :-----
1 | John Doe | john@mail.com | ok | 1
3 | Ricky Duke | rick@mail.com | null | 1
4 | Jane Doe | jane@mail.com | block | 1
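The conditional ordering can also be tried locally. The sketch below runs the correlated-subquery approach against an in-memory SQLite database from Python, as a stand-in for MySQL. The table name mytable follows the answer, and the sample rows come from the question, with Python None used for the null statuses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(id INTEGER, full_name TEXT, email TEXT, status TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Doe", "john@mail.com", "ok", 1),
        (2, "John Doe", "john@mail.com", None, 1),
        (3, "Ricky Duke", "rick@mail.com", None, 1),
        (4, "Jane Doe", "jane@mail.com", "block", 1),
    ],
)

kept = conn.execute(
    """
    SELECT t.id, t.full_name, t.status
    FROM mytable t
    WHERE t.id = (
        SELECT t1.id
        FROM mytable t1
        WHERE t1.full_name = t.full_name
          AND t1.email = t.email
          AND t1.active = t.active
        ORDER BY t1.status IS NULL, t1.status  -- non-NULL statuses sort first
        LIMIT 1
    )
    ORDER BY t.id
    """
).fetchall()
print(kept)
```

The expression status IS NULL evaluates to 0 for rows with a status and 1 for rows without one, so within each group of duplicates a row with a status is always picked ahead of a NULL one.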
(SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND status IS NOT NULL
GROUP BY email)
UNION ALL
(SELECT full_name, email
FROM `subscribers`
WHERE active = 1 AND status IS NULL AND
email not in (SELECT distinct email
FROM `subscribers`
WHERE active = 1 AND status IS NOT NULL)
GROUP BY email);

Create new columns for duplicate row values based on column ID duplicate in sql

I have a table with 2 columns: the first is called ID and the second is called TRACKING. The ID column has duplicates. I want to consolidate each set of duplicates into one row, with each TRACKING value from the duplicate rows placed into its own new column, so that no duplicate IDs remain.
I have tried a few suggested things where all of the values would be concatenated into one column but I want these TRACKING values for the duplicate IDs to be in separate columns. The code below did not do what I intended it to.
SELECT ID, TRACKING =
STUFF((SELECT DISTINCT ', ' + TRACKING
FROM #t b
WHERE b.ID = a.ID
FOR XML PATH('')), 1, 2, '')
FROM #t a
GROUP BY ID
I am looking to take this:
| ID | TRACKING |
-----------------
| 5 | 13t3in3i |
| 5 | g13g13gg |
| 3 | egqegqgq |
| 2 | 14y2y24y |
| 2 | 42yy44yy |
| 5 | 8i535i35 |
And turn it into this:
| ID | TRACKING | TRACKING1 | TRACKING2 |
-----------------
| 5 | 13t3in3i | g13g13gg | 8i535i35 |
| 3 | egqegqgq | | |
| 2 | 14y2y24y | 42yy44yy | |
One (relatively) painful way to do this in MySQL is to use correlated subqueries:
select i.id,
(select t.tracking
from t
where t.id = i.id
order by t.tracking
limit 0, 1 -- LIMIT offset, count: skip 0 rows, return 1
) as tracking_1,
(select t.tracking
from t
where t.id = i.id
order by t.tracking
limit 1, 1 -- skip 1 row, return 1
) as tracking_2,
(select t.tracking
from t
where t.id = i.id
order by t.tracking
limit 2, 1 -- skip 2 rows, return 1
) as tracking_3
from (select distinct id from t
) i;
As bad as this looks, it will probably have surprisingly decent performance with an index on (id, tracking).
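To see the pivot produce one row per id, here is a small sketch that runs the correlated-subquery approach on the question's sample data in an in-memory SQLite database from Python. SQLite accepts the same LIMIT offset, count form as MySQL, where the first number is how many rows to skip and the second is how many to return. The helper nth_tracking is a hypothetical convenience for building the three subqueries and is not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, tracking TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (5, "13t3in3i"), (5, "g13g13gg"), (3, "egqegqgq"),
        (2, "14y2y24y"), (2, "42yy44yy"), (5, "8i535i35"),
    ],
)


def nth_tracking(offset):
    """Scalar subquery for the tracking value at a 0-based offset within i.id."""
    return (
        "(SELECT t.tracking FROM t WHERE t.id = i.id "
        f"ORDER BY t.tracking LIMIT {offset}, 1)"
    )


query = (
    f"SELECT i.id, {nth_tracking(0)}, {nth_tracking(1)}, {nth_tracking(2)} "
    "FROM (SELECT DISTINCT id FROM t) i ORDER BY i.id"
)
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

A subquery that finds no row at its offset yields NULL, which is why ids with fewer than three tracking values get empty trailing columns. The values within a row come out in sorted order, not in insertion order, because the table itself has no ordering column.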
By the way, your original code with stuff() puts everything into one column. The MySQL equivalent is group_concat():
select id, group_concat(tracking)
from t
group by id;
with test_tbl as
(
select 5 id, 'goog' tracking,'goog' tracking1
union all
select 5 id, 'goog1','goo'
union all
select 2 , 'yahoo','yah'
union all
select 2, 'yahoo1','ya'
union all
select 3,'azure','azu'
), modified_tbl as
(
select id,array_agg(concat(tracking)) Tracking,array_agg(concat(tracking1)) Tracking1 from test_tbl group by 1
)
select id, tracking[safe_offset(0)] Tracking_1,tracking1[safe_offset(0)] Tracking_2, tracking[safe_offset(1)] Tracking_3,tracking1[safe_offset(1)] Tracking_4 from modified_tbl where array_length(Tracking) > 1

How to limit a query by column value

Following query...
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2)
...gives me the following result:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 1 | 1 |
| 5 | 1 |
| 4 | 1 |
| 6 | 1 |
| 4 | 2 |
| 2 | 2 |
| 1 | 2 |
| 5 | 2 |
+----------+---------+
Now, I want to modify the above query so that I only get for example two rows for each user_id, eg:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 4 | 2 |
| 5 | 2 |
+----------+---------+
I am thinking about something like this, which of course does not work:
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2) LIMIT 2 by user_id
Ideally, this should work with offsets as well because I want to use it for paginations.
For performance reasons it is essential to use the WHERE user_id IN (1, 2) part of the query.
One method -- assuming you have at least two rows for each user -- would be:
(select min(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
) union all
(select max(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
);
Admittedly, this is not a "general" solution, but it might be the simplest solution for what you want.
If you want the two biggest or smallest, then an alternative also works:
select t.*
from t
where t.user_id in (1, 2) and
t.event_id >= (select t2.event_id
from t t2
where t2.user_id = t.user_id
order by t2.event_id desc
limit 1, 1
);
Here is a more general example for this kind of problem. Note that I tested it on SQL Server only and could not try it on MySQL, so please let me know how it works there.
CREATE TABLE mytable
(
number INT,
score INT
)
INSERT INTO mytable VALUES ( 1, 100)
INSERT INTO mytable VALUES ( 2, 100)
INSERT INTO mytable VALUES ( 2, 120)
INSERT INTO mytable VALUES ( 2, 110)
INSERT INTO mytable VALUES ( 3, 120)
INSERT INTO mytable VALUES ( 3, 150)
SELECT *
FROM mytable m
WHERE
(
SELECT COUNT(*)
FROM mytable m2
WHERE m2.number = m.number AND
m2.score >= m.score
) <= 2
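Since the answer was not tried on MySQL, here is a quick way to exercise the counting query: the sketch below runs it against an in-memory SQLite database from Python. This only shows that the standard-SQL form behaves as described on the answer's own sample data; it is not a MySQL test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (number INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 100), (2, 100), (2, 120), (2, 110), (3, 120), (3, 150)],
)

# Keep a row when at most 2 rows in its group have a score >= its own,
# i.e. when it is among the two highest scores for that number.
top_two = conn.execute(
    """
    SELECT m.number, m.score
    FROM mytable m
    WHERE (SELECT COUNT(*)
           FROM mytable m2
           WHERE m2.number = m.number
             AND m2.score >= m.score) <= 2
    ORDER BY m.number, m.score DESC
    """
).fetchall()
print(top_two)
```

One caveat of counting with >=: rows tied on score count each other, so a group with scores 120, 110, 110 would keep only the 120 row, because each 110 row sees three scores at or above its own.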
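How about this?
SELECT event_id, user_id
FROM (
SELECT event_id, user_id, row_number() OVER (PARTITION BY user_id) AS row_num
FROM EventUser WHERE user_id in (1,2)) AS ranked WHERE row_num <= n;
Here n is the number of rows you want per user. MySQL requires the alias on the derived table, and row_number() needs MySQL 8.0 or later. Add an ORDER BY inside OVER (...) if you need a deterministic choice of rows.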
A late answer, but it may help: this uses a derived table and a cross join.
For the example in this post the query will be this:
SELECT
@row_number:=CASE
WHEN @user_no = user_id
THEN
@row_number + 1
ELSE
1
END AS num,
@user_no:=user_id userid, event_id
FROM
EventUser,
(SELECT @user_no:=0,@row_number:=0) as t
group by user_id,event_id
having num < 3;
More information in this link.

Return latest entry result providing there is a review or close entry

I have a table that holds the answers to a question which is asked at entry to the system, at review periods and then at closure. The client can be opened and closed multiple times during their life on the system.
I am trying to get the latest 'entry' result from the table which also has either an associated 'review' or 'close' result.
This is my table (I have just included 1 user but the actual table has thousands of users):
row | user_id | answer | type | date_entered |
----+---------+--------+--------+--------------+
1 | 12 | 3 | entry | 2016-03-13 |
2 | 12 | 1 | review | 2016-03-14 |
3 | 12 | 7 | review | 2016-03-16 |
4 | 12 | 7 | close | 2016-03-17 |
5 | 12 | 8 | entry | 2016-03-20 |
6 | 12 | 2 | review | 2016-03-21 |
7 | 12 | 3 | close | 2016-03-22 |
8 | 12 | 1 | entry | 2016-03-28 |
So for this table the query would just return row 5 because the 'entry' on row 8 doesn't have any 'review' or 'closure' records after it.
Hopefully that makes sense.
SELECT a.*
FROM my_table a
JOIN
( SELECT x.user_id
, MAX(x.date_entered) date_entered
FROM my_table x
JOIN my_table y
ON y.user_id = x.user_id
AND y.date_entered > x.date_entered
AND y.type IN ('review','close')
WHERE x.type = 'entry'
GROUP
BY x.user_id
) b
ON b.user_id = a.user_id
AND b.date_entered = a.date_entered;
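To check this join against the sample data, the sketch below loads the question's rows into an in-memory SQLite database from Python and runs the same query shape. SQLite stands in for MySQL, and the question's row column is named row_id here, which is an assumption made to stay clear of keyword clashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(row_id INTEGER, user_id INTEGER, answer INTEGER, type TEXT, date_entered TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 12, 3, "entry", "2016-03-13"),
        (2, 12, 1, "review", "2016-03-14"),
        (3, 12, 7, "review", "2016-03-16"),
        (4, 12, 7, "close", "2016-03-17"),
        (5, 12, 8, "entry", "2016-03-20"),
        (6, 12, 2, "review", "2016-03-21"),
        (7, 12, 3, "close", "2016-03-22"),
        (8, 12, 1, "entry", "2016-03-28"),
    ],
)

latest = conn.execute(
    """
    SELECT a.row_id, a.answer, a.type, a.date_entered
    FROM my_table a
    JOIN (SELECT x.user_id, MAX(x.date_entered) AS date_entered
          FROM my_table x
          JOIN my_table y
            ON y.user_id = x.user_id
           AND y.date_entered > x.date_entered
           AND y.type IN ('review', 'close')
          WHERE x.type = 'entry'
          GROUP BY x.user_id) b
      ON b.user_id = a.user_id
     AND b.date_entered = a.date_entered
    """
).fetchall()
print(latest)
```

The inner join finds, per user, the latest entry date that has some later review or close; the entry on row 8 has none after it, so row 5 is the one returned.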
Basically, you can separate your query into two sub-queries. The first gets the latest review or close record id. The second keeps the entries with row_id < found_id and takes the latest of them.
SELECT *
FROM my_table
WHERE type = 'entry'
AND row_id < (SELECT Max(row_id)
FROM my_table
WHERE ( type = 'review'
OR type = 'close' ))
ORDER BY row_id DESC
LIMIT 1
Be careful: the subquery may match no rows, in which case MAX() yields NULL and the query returns nothing.
I could think of several ways of doing it. But first a note: your date_entered field seems to be just a date. To tell which occurs "later" I'm going to use row because e.g. if both entry and review occurred on the same date, it's not possible to tell from the date_entered which one was later.
I just list a couple of solutions. The first one might be more efficient, but you should measure.
Here's a join against a subquery:
SELECT
m1.*
FROM
mytable m1
JOIN (SELECT
row, user_id
FROM
mytable
WHERE
type IN ('review', 'close') AND
user_id = 12
ORDER BY row DESC LIMIT 1) m2 ON m1.user_id = m2.user_id
WHERE
m1.user_id = 12 AND
m1.type = 'entry' AND
m1.row < m2.row
ORDER BY
m1.row DESC LIMIT 1
Here's a subquery for max:
SELECT
*
FROM
mytable
WHERE
row = (SELECT
MAX(m1.row)
FROM
mytable m1,
mytable m2
WHERE
m1.user_id = m2.user_id AND
m1.type = 'entry' AND
m2.type IN ('review', 'close') AND
m1.row < m2.row) -- an aggregate such as MAX() is not allowed in WHERE; any later review/close row qualifies