I am trying to figure out how to count all instances where a student is online without counting duplicate instances.
For example, in the example below, I want a column that counts only instances where a student is logged in. So, if Student A logs in at 5 AM, count = 1. Student B logs in at 7 AM, count = 2. If Student A logs off at some point and logs back in at 8 AM, the count should still be 2, not 3.
Thank you!
Student  Time   Desired Column (Count)
A        5 AM   1
B        7 AM   2
A        8 AM   2
C        9 AM   3
D        10 AM  4
E        11 AM  5
D        12 PM  5
I am mainly trying to track the activity and only count when someone is logged in. If those students appear multiple times, we can assume they logged off at some point and logged back in. It's basically a unique running count. Not sure how to write this in SQL. I hope this makes sense.
One option, use the exists operator with a correlated subquery to check if the student has logged in before:
SELECT Student, Time_,
SUM(flag) OVER (ORDER BY Time_) AS expected_count
FROM
(
SELECT *,
CASE
WHEN EXISTS(SELECT 1 FROM table_name D WHERE D.Student = T.Student AND D.Time_<T.Time_)
THEN 0 ELSE 1
END AS flag
FROM table_name T
) D
ORDER BY Time_
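As a cross-check, the same running unique count can be sketched with a window function instead of EXISTS. This is only an illustration against an in-memory SQLite database (window functions need SQLite 3.25 or later). The table name logins and the integer hour column are assumptions standing in for table_name and Time_.

```python
import sqlite3

# Assumed sample data: the question's table, with times stored as hours of the day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (student TEXT, hour INTEGER)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [("A", 5), ("B", 7), ("A", 8), ("C", 9), ("D", 10), ("E", 11), ("D", 12)],
)

query = """
SELECT student, hour,
       SUM(first_seen) OVER (ORDER BY hour) AS running_unique
FROM (
    SELECT student, hour,
           -- 1 only on a student's earliest login, 0 on every later one
           CASE WHEN ROW_NUMBER() OVER (PARTITION BY student ORDER BY hour) = 1
                THEN 1 ELSE 0 END AS first_seen
    FROM logins
) AS flagged
ORDER BY hour
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner query marks each student's first login, and the outer running SUM only grows on those marked rows, so repeat logins leave the count unchanged.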
I tried to write a query that selects rows with steps that both user 1 and user 2 did, with combined number of times they did the step (i.e., if user 1 did step 1 3 times and user 2 did 1 time then the count should show 4 times.)
When I use the condition user_id = 1 and user_id = 2, there is no error, but the query returns nothing when it should return some rows.
There are two tables, step and step_taken.
Table step has columns id and title.
Table step_taken has columns id, user_id (who performed the step), and step_id.
I want to find the steps that both users (ids 1 and 2) did,
along with the combined number of times they performed each step.
For example, if user 1 did the step named meditation 2 times
and user 2 did the step named meditation 3 times,
the result I want looks like this:
------------------------------
title | number_of_times
------------------------------
meditation| 5
------------------------------
here is my sql query
select title, count(step_taken.step_id)as number_of_times
from step join step_taken
on step.id = step_taken.step_id
where user_id = 1 and user_id=2
group by title;
It returns nothing, but it should return the rows for steps that both user 1 and user 2 did.
When I write the same query with only user_id = 1 or only user_id = 2, it shows the expected rows.
How can I fix my query so it shows the information I want?
thanks in advance :)
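user_id cannot be 1 and 2 at the same time on a single row, so that WHERE clause never matches. Select the rows for either user with IN, then keep only the steps that both users took with HAVING:
select step.title, count(*) as number_of_times
from step join step_taken
on step.id = step_taken.step_id
where step_taken.user_id in (1, 2)
group by step.id, step.title
-- keep only steps performed by both users
having count(distinct step_taken.user_id) = 2;
note: count(*) adds up the rows of both users, so meditation done 2 times by user 1 and 3 times by user 2 gives 5.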
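To see why the original condition returns nothing, and what one way of requiring both users looks like, here is a small sketch on an in-memory SQLite database. The sample rows (including the extra yoga step) are made up for illustration. The table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE step (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE step_taken (id INTEGER PRIMARY KEY, user_id INTEGER, step_id INTEGER);
INSERT INTO step VALUES (1, 'meditation'), (2, 'yoga');
-- user 1: meditation x2 and yoga x1; user 2: meditation x3
INSERT INTO step_taken (user_id, step_id) VALUES
    (1, 1), (1, 1), (2, 1), (2, 1), (2, 1), (1, 2);
""")

# The question's condition: no single row has both user ids, so nothing matches.
original = conn.execute("""
    SELECT title, COUNT(step_taken.step_id) AS number_of_times
    FROM step JOIN step_taken ON step.id = step_taken.step_id
    WHERE user_id = 1 AND user_id = 2
    GROUP BY title
""").fetchall()

# Take rows of either user, then require that both users appear in each group.
both_users = conn.execute("""
    SELECT step.title, COUNT(*) AS number_of_times
    FROM step JOIN step_taken ON step.id = step_taken.step_id
    WHERE step_taken.user_id IN (1, 2)
    GROUP BY step.id, step.title
    HAVING COUNT(DISTINCT step_taken.user_id) = 2
""").fetchall()

print(original)
print(both_users)
```

With this sample data the yoga step is filtered out, because only user 1 performed it.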
I have a data set like this:
User Date Status
Eric 1/1/2015 4
Eric 2/1/2015 2
Eric 3/1/2015 4
Mike 1/1/2015 4
Mike 2/1/2015 4
Mike 3/1/2015 2
I'm trying to write a query in which I will retrieve users whose MOST RECENT transaction status is a 4. If it's not a 4 I don't want to see that user in the results. This dataset could have 2 potential results, one for Eric and one for Mike. However, Mike's most recent transaction was not a 4, therefore:
The return result would be:
User Date Status
Eric 3/1/2015 4
As this record is the only record for Eric that has a 4 as his latest transaction date.
Here's what I've tried so far:
SELECT
user, MAX(date) as dates, status
FROM
orders
GROUP BY
status,
user
This would get me a unique record for every user for every status type. This would be a subquery, and the parent query would look like:
SELECT
user, dates, status
WHERE
status = 4
GROUP BY
user
However, this is clearly flawed as I don't want status = 4 records IF their most recent record is not a 4. I only want status = 4 when the latest date is a 4. Any thoughts?
SELECT user, date
, actualOrders.status
FROM (
SELECT user, MAX(date) as date
FROM orders
GROUP BY user) AS lastOrderDates
INNER JOIN orders AS actualOrders USING (user, date)
WHERE actualOrders.status = 4
;
-- Since USING is being used, there is not a need to specify source of the
-- user and date fields in the SELECT clause; however, if an ON clause was
-- used instead, either table could be used as the source of those fields.
Also, if it is not too late, you may want to rethink the field names: user and date are both SQL keywords in many dialects.
SELECT user, date, status
FROM orders AS o
WHERE status = 4
-- keep only each user's most recent row
AND date = (SELECT MAX(date) FROM orders AS i WHERE i.user = o.user)
The easiest way is to include your order table a second time in a subquery in your from clause in order to retrieve the last date for each user. Then you can add a where clause to match the most recent date per user, and finally filter on the status.
select orders.*
from orders,
(
select ord_user, max(ord_date) ord_date
from orders
group by ord_user
) latestdate
where orders.ord_status = 4
and orders.ord_user = latestdate.ord_user
and orders.ord_date = latestdate.ord_date
Another option is to use a window function with an OVER (PARTITION BY ...) clause:
Oracle SQL query: Retrieve latest values per group based on time
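A minimal sketch of that window-function approach, run against an in-memory SQLite database (3.25 or later) rather than Oracle or MySQL. The column names user_name and order_date are assumptions chosen to avoid keyword clashes, and the dates are stored as ISO strings on the assumption that the sample dates are one month apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_name TEXT, order_date TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Eric", "2015-01-01", 4), ("Eric", "2015-02-01", 2), ("Eric", "2015-03-01", 4),
        ("Mike", "2015-01-01", 4), ("Mike", "2015-02-01", 4), ("Mike", "2015-03-01", 2),
    ],
)

query = """
SELECT user_name, order_date, status
FROM (
    SELECT user_name, order_date, status,
           -- rn = 1 marks the most recent row of each user
           ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY order_date DESC) AS rn
    FROM orders
) AS ranked
WHERE rn = 1 AND status = 4
"""
latest_fours = conn.execute(query).fetchall()
print(latest_fours)
```

The status filter is applied only after the latest row per user has been picked, which is what excludes Mike.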
I have a mysql table-
User Value
A 1
A 12
A 3
B 4
B 3
B 1
C 1
C 1
C 8
D 34
D 1
E 1
F 1
G 56
G 1
H 1
H 3
C 3
F 3
E 3
G 3
I need to run a query that returns the second distinct value that every user has.
That is, among the values accessed by every user, rank them by number of occurrences and pick the second one.
In the data above, 1 and 3 are accessed by each user. 1 occurs more often than 3, so the second distinct value is 3.
So I thought first I will get all distinct user.
create table temp AS Select distinct user from table;
Then I will have an outer query-
Select value from table where value in (...)
Programmatically, I could iterate through each user's values with something like a Map, but I couldn't work out how to write that as a Hive query.
This will return the second most frequent value among those that span all users. With the data as posted there is no such value, which I expect is due to a typo in the data. In real data you will likely have multiple ties that you need to decide how to handle.
Select value as second_distinct from
(select value, rank() over (order by occurrences desc) as rank
from
(SELECT value, unique_users, max(count_users) as count_users, count(value) as occurrences
from
(select value, size(collect_set(user) over (partition by value))
as count_users from my_table
) t
left outer join
(select count(distinct user) as unique_users from my_table
) t2 on (1=1)
where unique_users=count_users
group by value, unique_users
) t3
) t4
where rank = 2;
This works. It returns NULL because there is only one value that every user visited (the value 1). Value 3 is not a solution because not every user has seen that value in your data. I expect you intended 3 to be returned, but it doesn't span all the users (user D did not see value 3).
Not sure how #invoketheshell's answer was marked correct; it doesn't run and it needs 6 MR jobs. This will get you there in 4 and is less code.
Query:
select value
from (
select value, value_count, rank() over (order by value_count desc) rank
from (
select value, count(value) value_count
from (
select value, num_users, max(num_users) over () max_users
from (
select value
, size(collect_set(user) over (partition by value)) num_users
from db.table ) x ) y
where num_users = max_users
group by value ) z ) f
where rank = 2
Output:
3
EDIT: Let me clarify my solution as there seems to be some confusion. The OP's example says
"So as above 1 & 3 is being accessed by each User ... "
As my comment below the question suggests, in the example given, user D never accesses value 3. I assumed this was a typo, added that row to the dataset, and then added another 1 as well so that there are more 1's than 3's. So my code correctly returns 3, which was the desired output. If you run this script on the actual dataset it will also produce the correct output, which is nothing, because there isn't a "2nd Distinct". The only time it could produce an incorrect value is if there were no single number that was accessed by all users, which illustrates the point I was trying to make to #invoketheshell: if there is no single number that every user has accessed, running a query with 6 map-reduce jobs is an absurd way to find that out. Since we are using Hive, I believe it is fair to assume that if this were a "real-world" problem, it would most likely be executed on at least hundreds of TBs of data (probably more). In the interest of preserving time and resources, it would behoove an individual to at least check that one number had been accessed by all users before running a massive query whose analysis hinges on that assumption being true.
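The behaviour described above can be checked on a small scale. The sketch below uses SQLite rather than Hive, so COUNT(DISTINCT ...) stands in for size(collect_set(...)), and it makes no claim about the number of map-reduce jobs. The column name user_name and the helper second_distinct are assumptions for illustration.

```python
import sqlite3

# The rows exactly as posted in the question.
DATA = [
    ("A", 1), ("A", 12), ("A", 3), ("B", 4), ("B", 3), ("B", 1),
    ("C", 1), ("C", 1), ("C", 8), ("D", 34), ("D", 1), ("E", 1),
    ("F", 1), ("G", 56), ("G", 1), ("H", 1), ("H", 3), ("C", 3),
    ("F", 3), ("E", 3), ("G", 3),
]

QUERY = """
WITH per_value AS (
    SELECT value,
           COUNT(*) AS occurrences,
           COUNT(DISTINCT user_name) AS num_users
    FROM my_table
    GROUP BY value
),
ranked AS (
    -- rank only the values that every user has accessed
    SELECT value, RANK() OVER (ORDER BY occurrences DESC) AS rnk
    FROM per_value
    WHERE num_users = (SELECT COUNT(DISTINCT user_name) FROM my_table)
)
SELECT value FROM ranked WHERE rnk = 2
"""

def second_distinct(rows):
    """Load rows into a fresh in-memory table and return the rank-2 values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (user_name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    return [r[0] for r in conn.execute(QUERY)]

as_posted = second_distinct(DATA)             # only value 1 spans all users
with_d3 = second_distinct(DATA + [("D", 3)])  # now 1 and 3 both span all users
print(as_posted, with_d3)
```

On the data as posted the result is empty, and adding the single row (D, 3) is enough for 3 to come back, since 1 then occurs 9 times and 3 occurs 8 times.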
I have a table with two columns of importance, customer ID# and timestamp. Whenever a customer orders something, five rows are created with the customer ID # and the timestamp of when it went through.
If there are more than five rows, it means our system hasn't processed the order correctly and there could be a problem. I was asked to look through the log to find the customer IDs of anyone who received more than 5, as well as how many times they received an incorrect amount and the number they received each time (when it was not 5).
I want it to show me, whenever the same customer ID (in column "ID") has more than 5 rows with the same timestamp (column "stamp") it will tell me 1. the person's customer ID 2. how many times this irregularity has happened to that customer ID, and 3. how many rows were in each irregularity (was it 6 or 7... or more? etc.) (if #2 was 3 times, I would like #3 to be an array like { 7, 8, 6 })
I don't know if this is possible... but any help at all will be appreciated. Thanks!
This should get you most of the way there:
SELECT `CustomerID`, `Timestamp`, COUNT(1)
FROM
OrderItems
GROUP BY
`CustomerID`, `Timestamp`
HAVING
COUNT(1) > 5
This will get you the IDs and Timestamps with more than 5 rows. I am making the assumption that the timestamps for all 5 (or more) rows are identical.
SELECT A.ID, A.TIMESTAMP
FROM "TABLE" A
WHERE
(SELECT COUNT(B.ID)
FROM "TABLE" B
WHERE B.ID = A.ID
AND B.TIMESTAMP = A.TIMESTAMP) > 5
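Neither query above covers parts 2 and 3 of the question (how many irregularities per customer, and how large each one was). One possible sketch is to group a second time over the irregular (ID, stamp) pairs. It is shown here on an in-memory SQLite database with made-up rows. The table name order_rows is an assumption, and GROUP_CONCAT is the SQLite/MySQL aggregate, so other databases would need their own equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_rows (id INTEGER, stamp TEXT)")
# Customer 1: one normal order (5 rows) and two irregular ones (7 and 6 rows).
# Customer 2: one normal order.
sample = [(1, "t1")] * 5 + [(1, "t2")] * 7 + [(1, "t3")] * 6 + [(2, "t1")] * 5
conn.executemany("INSERT INTO order_rows VALUES (?, ?)", sample)

query = """
SELECT id,
       COUNT(*) AS times_irregular,
       GROUP_CONCAT(row_count) AS sizes
FROM (
    -- one row per (customer, timestamp) that has more than 5 rows
    SELECT id, stamp, COUNT(*) AS row_count
    FROM order_rows
    GROUP BY id, stamp
    HAVING COUNT(*) > 5
) AS irregular
GROUP BY id
"""

# GROUP_CONCAT does not guarantee an order, so sort the sizes in Python.
summary = {
    cid: (times, sorted(int(n) for n in sizes.split(",")))
    for cid, times, sizes in conn.execute(query)
}
print(summary)
```

Customer 2 does not appear at all, because none of their timestamps exceeds 5 rows.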
Given the following data:
visit_id
1
1
1
2
3
3
4
5
is it possible using only sql (mysql's dialect actually, and no loops in another programming language) to output:
total visits | number of visitor ids
1            | 3
2            | 1
3            | 1
i.e. to break down the data into the number of times they occur? So in the example above, there are 3 visit ids that only occur once (2,4,5), one visit id that occurs twice (3), and one that occurs three times (1).
thanks
Of course, it's called grouping.
select visit_id, count(visit_id) from visits group by visit_id
Building on František's answer
select acc.visitCount as total_visits,
count(acc.visitCount) as number_of_visitor_ids
from (
select visit_id,
count(visit_id) as visitCount
from visits
group by visit_id
) acc
group by acc.visitCount
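The two-level grouping can be tried on the question's sample data. This sketch uses an in-memory SQLite database instead of MySQL, and the ORDER BY is an addition to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [(1,), (1,), (1,), (2,), (3,), (3,), (4,), (5,)],
)

query = """
SELECT acc.visitCount AS total_visits,
       COUNT(acc.visitCount) AS number_of_visitor_ids
FROM (
    -- inner level: how many times each visit_id occurs
    SELECT visit_id, COUNT(visit_id) AS visitCount
    FROM visits
    GROUP BY visit_id
) AS acc
GROUP BY acc.visitCount
ORDER BY total_visits
"""
histogram = conn.execute(query).fetchall()
print(histogram)
```

The inner query counts occurrences per visit_id, and the outer query counts how many ids share each occurrence count.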