Given the following data:
visit_id
1
1
1
2
3
3
4
5
Is it possible, using only SQL (MySQL's dialect, with no loops in another programming language), to output:
total visits   number of visitor ids
1              3
2              1
3              1
i.e. to break down the data into the number of times they occur? So in the example above, there are 3 visit ids that only occur once (2,4,5), one visit id that occurs twice (3), and one that occurs three times (1).
thanks
Of course, it's called grouping.
select visit_id, count(visit_id) from visits group by visit_id
Building on František's answer
select acc.visitCount as total_visits,
       count(acc.visitCount) as number_of_visitor_ids
from (
    select visit_id,
           count(visit_id) as visitCount
    from visits
    group by visit_id
) acc
group by acc.visitCount
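To check the nested grouping against the sample data from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for MySQL here, which is an assumption on my part; the query itself uses nothing MySQL-specific. The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# In-memory database holding the sample visit_id values from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (visit_id INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [(v,) for v in [1, 1, 1, 2, 3, 3, 4, 5]],
)

# Inner query: occurrences per visit_id.
# Outer query: how many visit_ids share each occurrence count.
query = """
SELECT acc.visitCount        AS total_visits,
       COUNT(acc.visitCount) AS number_of_visitor_ids
FROM (
    SELECT visit_id, COUNT(visit_id) AS visitCount
    FROM visits
    GROUP BY visit_id
) AS acc
GROUP BY acc.visitCount
ORDER BY acc.visitCount
"""
rows = conn.execute(query).fetchall()
for total_visits, number_of_ids in rows:
    print(total_visits, number_of_ids)
```

This prints the three rows asked for: 1 3, 2 1, 3 1.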
I am trying to figure out how to count all instances where a student is online without counting duplicate instances.
For example, in the screenshot below, I want to see a column counting only instances where a student is logged in. So, if Student A is logged in at 5 AM, count = 1. Student B logged in at 7, Count = 2. At some point student A logged off and logged back on at 8 am, the count should be 2, not 3.
Thank you!
Student  Time   Desired Column (Count)
A        5 AM   1
B        7 AM   2
A        8 AM   2
C        9 AM   3
D        10 AM  4
E        11 AM  5
D        12 PM  5
I am mainly trying to track the activity and only count when someone is logged in. If those students appear multiple times, we can assume they logged off at some point and logged back in. It's basically a unique running count. Not sure how to write this in SQL. I hope this makes sense.
One option, use the exists operator with a correlated subquery to check if the student has logged in before:
SELECT Student, Time_,
       SUM(flag) OVER (ORDER BY Time_) AS expected_count
FROM
(
    SELECT *,
           CASE
               WHEN EXISTS(SELECT 1 FROM table_name D WHERE D.Student = T.Student AND D.Time_ < T.Time_)
               THEN 0 ELSE 1
           END AS flag
    FROM table_name T
) D
ORDER BY Time_
See demo.
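For anyone who wants to try the running distinct count locally, here is a rough sketch using Python's sqlite3 module. The table name logins, the aliases prev and flagged, and storing the time as an integer hour so it sorts correctly are all my assumptions, not part of the original data. It also needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: Time_ is stored as an hour of the day (assumption).
conn.execute("CREATE TABLE logins (Student TEXT, Time_ INTEGER)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [("A", 5), ("B", 7), ("A", 8), ("C", 9), ("D", 10), ("E", 11), ("D", 12)],
)

# flag is 1 only for a student's first login; the window SUM accumulates the flags.
query = """
SELECT Student, Time_,
       SUM(flag) OVER (ORDER BY Time_) AS expected_count
FROM (
    SELECT T.*,
           CASE
               WHEN EXISTS (SELECT 1 FROM logins prev
                            WHERE prev.Student = T.Student
                              AND prev.Time_ < T.Time_)
               THEN 0 ELSE 1
           END AS flag
    FROM logins T
) AS flagged
ORDER BY Time_
"""
rows = conn.execute(query).fetchall()
counts = [r[2] for r in rows]
print(counts)
```

The counts come out as 1, 2, 2, 3, 4, 5, 5, matching the desired column in the question.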
select * from "Test"."EMP"
id
1
2
3
4
5
Select SUM(1) FROM "Test"."EMP";
Select SUM(2) FROM "Test"."EMP";
Select SUM(3) FROM "Test"."EMP";
Why is the output of these queries the following?
5
10
15
Also, I don't understand why the table name is written like this: "Test"."EMP"
Your table has 5 records. The statement select 1 from test.emp returns 5 rows, each with the value 1:
id
1
1
1
1
1
This is because the database engine returns the constant once for each existing row, without reading the contents of any column. The same happens for select <any static value> from test.emp.
For 2 (and likewise for 3) you get:
id
2
2
2
2
2
Hence each query returns 5 rows of the static value, and the sum of those values is the static value multiplied by the number of rows in the table.
Side note: count(1) and count(*) return the same result, and there is no reason to prefer count(1) for performance. MySQL's InnoDB documentation, for example, states explicitly that the two are handled the same way.
"Test"."EMP" means schema (or database) name, then table name. Double quotes are the standard SQL identifier delimiter (PostgreSQL and Oracle use them); MySQL uses backticks by default, as in `Test`.`EMP`, and accepts double quotes only in ANSI_QUOTES mode. Qualifying the name this way makes sure you query the EMP table in Test, whatever the current default database is. Read more about identifier qualifiers.
As for SUM(x), the x gets repeated once for each row present in the table. So SUM(1) on 5 rows is 1+1+1+1+1, SUM(2) on 5 rows is 2+2+2+2+2, and so on.
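A quick way to convince yourself is to run it. The sketch below uses Python's sqlite3 module, with an attached in-memory database named Test so that the qualified "Test"."EMP" name resolves; that setup is my assumption for illustration, since the original engine is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database under the schema name Test (assumption),
# so the qualified name "Test"."EMP" can be used as in the question.
conn.execute("ATTACH DATABASE ':memory:' AS Test")
conn.execute('CREATE TABLE "Test"."EMP" (id INTEGER)')
conn.executemany('INSERT INTO "Test"."EMP" VALUES (?)', [(i,) for i in range(1, 6)])

# SUM(n) adds the constant n once per row, so it equals n * COUNT(*).
sums = conn.execute(
    'SELECT SUM(1), SUM(2), SUM(3), COUNT(*) FROM "Test"."EMP"'
).fetchone()
print(sums)
```

With 5 rows the three sums are 5, 10 and 15, i.e. the constant times the row count.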
I have a data like,
ID Name ItemA ItemB ItemC
OXZ234 Adam 4 4 5
OXZ234 Adam 1 2 3
OXZ345 Tarzen 6 7 8
OXDER2 William 9 8 2
OXDER2 William 0 8 0
I need to find how much food each person eats. For example, from the first two records I can say that Adam, with ID OXZ234, ate ItemA-5, ItemB-6 and ItemC-8. For a small amount of data this kind of manual calculation is manageable, but I have a million records like this. So initially I need to find the records that have the same ID and name and differ only in the item counts.
I have tried the query to find duplicate records by grouping all columns like below,
select ID,Name,ItemA,ItemB,ItemC, COUNT(*)
from DATA_REFRESH
group by ID,Name,ItemA,ItemB,ItemC
having COUNT(*) > 1
But now I have to identify the records where only the item columns differ.
So the expected output is like,
OXZ234 Adam 2
OXDER2 William 2
OXZ345 Tarzen 1
Any suggestion would be helpful!
You want SUM
select ID,
Name,
sum(ItemA) as ItA,
sum(ItemB) as ItB,
sum(ItemC) as ItC,
count(ID) as Occurrences -- Counts the number of entries per person
from DATA_REFRESH
group by ID,Name
having count(ID) >1 -- restricts this so only those with more than one entry appear
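Here is a small sketch of that SUM query run against the sample rows from the question, using Python's sqlite3 module as a stand-in engine (my assumption; the question does not say which DBMS is in use). The ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_REFRESH "
    "(ID TEXT, Name TEXT, ItemA INTEGER, ItemB INTEGER, ItemC INTEGER)"
)
conn.executemany(
    "INSERT INTO DATA_REFRESH VALUES (?, ?, ?, ?, ?)",
    [
        ("OXZ234", "Adam", 4, 4, 5),
        ("OXZ234", "Adam", 1, 2, 3),
        ("OXZ345", "Tarzen", 6, 7, 8),
        ("OXDER2", "William", 9, 8, 2),
        ("OXDER2", "William", 0, 8, 0),
    ],
)

# Per-person totals, restricted to people with more than one entry.
query = """
SELECT ID, Name,
       SUM(ItemA) AS ItA,
       SUM(ItemB) AS ItB,
       SUM(ItemC) AS ItC,
       COUNT(ID)  AS Occurrences
FROM DATA_REFRESH
GROUP BY ID, Name
HAVING COUNT(ID) > 1
ORDER BY ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adam comes out as 5, 6, 8 over 2 entries, as computed by hand in the question; Tarzen is filtered out by the HAVING clause because he has only one entry.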
Hi, You can have a simple query without having clause,
select ID,Name,COUNT(*)
from DATA_REFRESH
group by ID,Name order by COUNT(*) desc ;
Simply try like this,
select ID,Name,COUNT(*)
from Sample_Check
group by ID,Name
having COUNT(*) > 1
I am trying to query a dataset from a single table, which contains quiz answers/entries from multiple users. I want to pull out the highest scoring entry from each individual user.
My data looks like the following:
ID TP_ID quiz_id name num_questions correct incorrect percent created_at
1 10154312970149546 1 Joe 3 2 1 67 2015-09-20 22:47:10
2 10154312970149546 1 Joe 3 3 0 100 2015-09-21 20:15:20
3 125564674465289 1 Test User 3 1 2 33 2015-09-23 08:07:18
4 10153627558393996 1 Bob 3 3 0 100 2015-09-23 11:27:02
My query looks like the following:
SELECT * FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`
ORDER BY `correct` DESC
In my mind, what that should do is get the two users from the IN clause, order them by the number of correct answers and then group them together, so I should be left with the 2 highest scores from those two users.
In reality it's giving me two results, but the one from Joe gives me the lower of the two values (2), with Bob first with a score of 3. Swapping to ASC ordering keeps the scores the same but places Joe first.
So, how could I achieve what I need?
You're after the groupwise maximum, which can be obtained by joining the grouped results back to the table:
SELECT * FROM entries NATURAL JOIN (
SELECT TP_ID, MAX(correct) correct
FROM entries
WHERE TP_ID IN ('10153627558393996', '10154312970149546')
GROUP BY TP_ID
) t
Of course, if a user has multiple records with the maximal score, it will return all of them; should you only want some subset, you'll need to express the logic for determining which.
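As a runnable illustration of the groupwise maximum, here is a sketch using Python's sqlite3 module with the sample rows from the question. I have written the join with an explicit ON condition instead of NATURAL JOIN so the matching columns are visible; that is a stylistic variation of mine, not a change of technique, and the alias best is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (ID INTEGER, TP_ID TEXT, quiz_id INTEGER, name TEXT, "
    "num_questions INTEGER, correct INTEGER, incorrect INTEGER, "
    "percent INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "10154312970149546", 1, "Joe", 3, 2, 1, 67, "2015-09-20 22:47:10"),
        (2, "10154312970149546", 1, "Joe", 3, 3, 0, 100, "2015-09-21 20:15:20"),
        (3, "125564674465289", 1, "Test User", 3, 1, 2, 33, "2015-09-23 08:07:18"),
        (4, "10153627558393996", 1, "Bob", 3, 3, 0, 100, "2015-09-23 11:27:02"),
    ],
)

# Find each user's best score, then join back to fetch the full matching rows.
query = """
SELECT e.ID, e.name, e.correct
FROM entries AS e
JOIN (
    SELECT TP_ID, MAX(correct) AS correct
    FROM entries
    WHERE TP_ID IN ('10153627558393996', '10154312970149546')
    GROUP BY TP_ID
) AS best
  ON best.TP_ID = e.TP_ID AND best.correct = e.correct
ORDER BY e.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Joe's higher-scoring entry (ID 2) and Bob's entry (ID 4) are returned, each with 3 correct answers.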
MySQL has historically been quite lax about GROUP BY clauses (since 5.7.5 the default ONLY_FULL_GROUP_BY mode rejects queries like yours), but as a rule of thumb you should follow the rule that other DBMSs enforce:
In a GROUP BY query, each selected column should either be part of the GROUP BY clause or be wrapped in an aggregate function.
For your query I would suggest:
SELECT `TP_ID`,`name`,max(`correct`) FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`,`name`
Since your table seems quite denormalized, the name part of the GROUP BY could be omitted here, but it might be necessary in other cases.
ORDER BY is only used to specify in which order the results are returned but does nothing about what results are returned - so you need to apply the max()-function to get the highest number of right answers.
My prob in brief:
I have two tables namely category and product.
table: category
id category
1 cat1
2 cat2
3 cat3
4 cat4
table: product
id productName category
1 test1 1
2 test2 2
3 test3 3
4 test4 4
5 test5 2
6 test6 2
My prob is:
I need products which are inserted last in every category.
How to solve this.
thanks in advance
You could add a create_time timestamp that is set when a new product is added, and retrieve the newest time per category with something like:
select max(create_time),category from product group by category
This is a variation of one of the most-common SQL questions asked here, the per-group maximum. See eg. this question for a variety of approaches.
The one I often use is a null-self-left-join, to select the row which has no value above it:
SELECT p0.*
FROM product AS p0
LEFT JOIN product AS p1 ON p1.category=p0.category AND p1.id>p0.id
WHERE p1.id IS NULL
This is assuming that ids are allocated in order so the highest is the most recent. Normally it wouldn't be a good idea to rely on identity as an ordering mechanism; you'd typically add an added timestamp to each row to work on instead.
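Here is a minimal sketch of the null-self-left-join on the product table from the question, run through Python's sqlite3 module (SQLite is my stand-in for MySQL; the join syntax is the same). It relies on the same assumption as above, namely that a higher id means a later insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, productName TEXT, category INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "test1", 1),
        (2, "test2", 2),
        (3, "test3", 3),
        (4, "test4", 4),
        (5, "test5", 2),
        (6, "test6", 2),
    ],
)

# Keep only rows for which no row with a higher id exists in the same category.
query = """
SELECT p0.*
FROM product AS p0
LEFT JOIN product AS p1
       ON p1.category = p0.category AND p1.id > p0.id
WHERE p1.id IS NULL
ORDER BY p0.category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For category 2, which has three products, only test6 (id 6) survives the filter.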
(Note that this and many other per-group maximum queries can return more than one row when two rows have identical values in the ordering column. If this is a problem, it can be avoided by using a UNIQUE ordering column; as a primary key, id is already that. You can get a single row even when there are two maxima using SQL:2003's useful but rather ugly ROW_NUMBER() OVER windowing functions, but these are only supported by MySQL from version 8.0 onwards.)
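On an engine that has window functions (MySQL gained them in 8.0), the ROW_NUMBER() approach can be sketched as follows. This again uses Python's sqlite3 module as a stand-in and needs SQLite 3.25 or later; the alias ranked and the column name rn are hypothetical names of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, productName TEXT, category INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "test1", 1),
        (2, "test2", 2),
        (3, "test3", 3),
        (4, "test4", 4),
        (5, "test5", 2),
        (6, "test6", 2),
    ],
)

# Number the rows within each category from newest (highest id) to oldest,
# then keep only the first row of each category.
query = """
SELECT id, productName, category
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY id DESC) AS rn
    FROM product AS p
) AS ranked
WHERE rn = 1
ORDER BY category
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because ROW_NUMBER() assigns distinct numbers even to tied rows, this always yields exactly one row per category.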