Get count of Groupby (SQLAlchemy)

I am new to SQLAlchemy and need some help with a query.
I have a usertraffic table with 200,000 records; it has a user_id and a date-time stamp with the time of each visit.
I am using this query to group visits by user and show the count:
session.query(UserTraffic.user_id,func.count(UserTraffic.X)).group_by(UserTraffic.user_id).all()
It returns data as (user ID, number of visits):
(49386, 1L), (49387, 2L), (49388, 5L), (49389, 3L), (49390, 4L), (49391, 3L), (49392, 2L)
What I want is the count of these counts, for example:
(x,y), (1,1), (2,2), (3,2), (4,1), (5,1)
(because the number of users who have only 1 visit is 1, and the number of users who have 3 visits is 2)
where x is the number of visits and y is the number of users with that many visits.
Can you please help me?
Thanks in advance.

Just add another GROUP BY on top of it. The easiest way is to use a subquery:
subq = (session
.query(UserTraffic.user_id, func.count(UserTraffic.X).label("cnt_X"))
.group_by(UserTraffic.user_id)
).subquery("subq")
q = (session
.query(subq.c.cnt_X, func.count(subq.c.user_id).label("cnt_users"))
.group_by(subq.c.cnt_X)
.order_by(subq.c.cnt_X)
).all()
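The same two-level grouping can be checked without SQLAlchemy. The following is only a minimal sketch using Python's built-in sqlite3 module; the table layout, the visited_at column name, and the dates are assumptions made up for illustration, and the per-user visit counts are taken from the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per visit (the column names are assumptions).
conn.execute("CREATE TABLE usertraffic (user_id INTEGER, visited_at TEXT)")

# Visits per user, copied from the sample output in the question.
visits_per_user = {49386: 1, 49387: 2, 49388: 5, 49389: 3,
                   49390: 4, 49391: 3, 49392: 2}
for user_id, n in visits_per_user.items():
    conn.executemany(
        "INSERT INTO usertraffic VALUES (?, ?)",
        [(user_id, f"2020-01-{day:02d}") for day in range(1, n + 1)],
    )

# Inner query: visits per user. Outer query: users per visit count.
histogram = conn.execute(
    """
    SELECT cnt_x, COUNT(user_id) AS cnt_users
    FROM (
        SELECT user_id, COUNT(*) AS cnt_x
        FROM usertraffic
        GROUP BY user_id
    ) AS subq
    GROUP BY cnt_x
    ORDER BY cnt_x
    """
).fetchall()
print(histogram)  # [(1, 1), (2, 2), (3, 2), (4, 1), (5, 1)]
```

The printed pairs match the (x, y) pairs the question asks for.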

Related

SQL: Getting an order's status timestamps in a nested query

I'm working on an Order table which has all the details regarding the orders that were allocated. A sample of the data is:
Order ID Order Status Action Date
23424 ALC 1571467792094280
23424 PIK 1571467792999990
23424 PAK 1571469792999990
23424 SHP 1579967792999990
33755 ALC 1581467792238640
33755 PIK 1581467792238640
33755 PAK 1581467792238640
33755 SHP 1581467792238640
In the table I have the order ID, status, and action_date (the action date is updated whenever the order status changes and holds the timestamp of that update; action_date is Unix time).
I'm trying to write a query that gives me the Order ID, ALC_AT, PIK_AT, PAK_AT, SHP_AT.
Basically, all the timestamp updates for an order ID in one row. I know it can be done with a nested query, but I'm unable to figure out how to do it.
Any help would be highly appreciated.
Edit (sample result, as requested):
Order ID Order Status ALC_AT PIK_AT PAK_AT SHP_AT
23424 SHP 1571467792094280 1571467792999990 1571469792999990 1579967792999990
I am not sure how this is done in MySQL, but below is how it would be done in Oracle.
You can search for how to pivot in MySQL to do the same there.
select *
from (select order_id,
status,
action_date
from order_table)
pivot (max(action_date)
for status in ('ALC' as ALC_AT, 'PIK' as PIK_AT, 'PAK' as PAK_AT, 'SHP' as SHP_AT))
Hope this helps.
EDIT for mysql:
select *
from (select "order.order_number",
"shipment.status",
from_unixtime("action_date"/1000000) as "action_date"
from order_table
where "order.order_number" = '2019-10-19-N2-6411')
pivot (max("action_date")
for "shipment_status" in ( 'ALC' AS 'ALC_AT', 'PIK' AS 'PIK_AT', 'PAK'
AS 'PAK_AT', 'SHP' AS 'SHP_AT'))
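MySQL has no PIVOT operator, so one common alternative is conditional aggregation: one MAX(CASE ...) column per status, grouped by order. The sketch below runs that SQL against the sample rows from the question using Python's built-in sqlite3 module; the table name order_status and the lower-case column names are assumptions, and the SELECT itself uses only standard SQL that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the rows are the sample data from the question.
conn.execute(
    "CREATE TABLE order_status (order_id INTEGER, status TEXT, action_date INTEGER)"
)
conn.executemany(
    "INSERT INTO order_status VALUES (?, ?, ?)",
    [
        (23424, "ALC", 1571467792094280),
        (23424, "PIK", 1571467792999990),
        (23424, "PAK", 1571469792999990),
        (23424, "SHP", 1579967792999990),
        (33755, "ALC", 1581467792238640),
        (33755, "PIK", 1581467792238640),
        (33755, "PAK", 1581467792238640),
        (33755, "SHP", 1581467792238640),
    ],
)

# One output column per status; CASE yields NULL for non-matching rows,
# and MAX ignores NULLs, so each column keeps only its own status timestamp.
pivoted = conn.execute(
    """
    SELECT order_id,
           MAX(CASE WHEN status = 'ALC' THEN action_date END) AS alc_at,
           MAX(CASE WHEN status = 'PIK' THEN action_date END) AS pik_at,
           MAX(CASE WHEN status = 'PAK' THEN action_date END) AS pak_at,
           MAX(CASE WHEN status = 'SHP' THEN action_date END) AS shp_at
    FROM order_status
    GROUP BY order_id
    ORDER BY order_id
    """
).fetchall()
for row in pivoted:
    print(row)
```

Each order comes back as a single row with its four timestamps side by side.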

GROUP BY id showing newest entry of same ID

I know this has already been asked and answered a thousand times, but I can't seem to resolve it.
I am trying to group and order this query so I can join it as a subquery to a bigger query:
SELECT * FROM `ggorderlog`
WHERE `GGTITLE` LIKE '%Reklamation%'
ORDER BY `GGDATE` DESC, `ggorderlog`.`GGORDERID` DESC
This is the result from the ggorderlog table and the query above
GGTITLE GGOXID GGDATE GGORDERID GGTITLE User
Reklamation uniqueid1 2018.12.7 16:20:00 1 Reklamation created Max Mustermann
Reklamation uniqueid2 2018.12.7 16:24:00 1 Reklamation finished Maxine Musterfrau
Reklamation uniqueid3 2018.12.7 16:22:00 2 Reklamation created Max mustermann
What I want is for this table to show only the latest entry for every GGORDERID, to give an overview of the user who last worked on the ticket and its status.
Like this:
GGTITLE GGOXID GGDATE GGORDERID GGTITLE User
Reklamation uniqueid2 2018.12.7 16:24:00 1 Reklamation finished Maxine Musterfrau
Reklamation uniqueid3 2018.12.7 16:22:00 2 Reklamation created Max mustermann
I tried a standard GROUP BY with ORDER BY, but MySQL seems to do the GROUP BY first and return an arbitrary row per group.
I tried this, but it still always shows an arbitrary date:
Select * from (
Select *
from ggorderlog as b
where GGTITLE like '%Reklamation%'
ORDER BY b.GGDATE DESC
) b2
group by b2.GGORDERID
I tried a lot of other suggestions, such as a LEFT JOIN of the table to itself, or GROUP_CONCAT followed by splitting the result again, but nothing seems to work.
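Can you try this? The subquery finds the latest GGDATE per GGORDERID, and the join keeps only the rows that match it:
SELECT gg.*
FROM ggorderlog gg
INNER JOIN
(SELECT ggorderid, MAX(ggdate) AS maxggdate
FROM ggorderlog
WHERE ggtitle LIKE '%Reklamation%'
GROUP BY ggorderid) groupedgg
ON gg.ggorderid = groupedgg.ggorderid
AND gg.ggdate = groupedgg.maxggdate
WHERE gg.ggtitle LIKE '%Reklamation%'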
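To see the "latest row per group" join in action, here is a small sketch using Python's built-in sqlite3 module with the three sample rows from the question. Some details are assumptions for illustration: the user column is called username, the dates are stored as ISO-formatted text so that MAX compares them chronologically, and the join is made on ggorderid plus the maximum ggdate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema based on the columns shown in the question.
conn.execute(
    "CREATE TABLE ggorderlog "
    "(ggoxid TEXT, ggdate TEXT, ggorderid INTEGER, ggtitle TEXT, username TEXT)"
)
conn.executemany(
    "INSERT INTO ggorderlog VALUES (?, ?, ?, ?, ?)",
    [
        ("uniqueid1", "2018-12-07 16:20:00", 1, "Reklamation created", "Max Mustermann"),
        ("uniqueid2", "2018-12-07 16:24:00", 1, "Reklamation finished", "Maxine Musterfrau"),
        ("uniqueid3", "2018-12-07 16:22:00", 2, "Reklamation created", "Max mustermann"),
    ],
)

# The derived table holds the newest date per order;
# joining back on (ggorderid, ggdate) returns the full row for that date.
latest = conn.execute(
    """
    SELECT gg.ggoxid, gg.ggorderid, gg.ggtitle, gg.username
    FROM ggorderlog AS gg
    INNER JOIN (
        SELECT ggorderid, MAX(ggdate) AS maxggdate
        FROM ggorderlog
        WHERE ggtitle LIKE '%Reklamation%'
        GROUP BY ggorderid
    ) AS groupedgg
      ON gg.ggorderid = groupedgg.ggorderid
     AND gg.ggdate = groupedgg.maxggdate
    WHERE gg.ggtitle LIKE '%Reklamation%'
    ORDER BY gg.ggorderid
    """
).fetchall()
for row in latest:
    print(row)
```

If two rows of the same order share the exact same latest timestamp, both are returned, so a tie-breaker column would be needed in that case.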

Get user rank in a score table

I'm using Laravel and this could be my user table:
id|score
1|10
2|13
3|15
4|7
5|11
A user can see a page with two ranks: rank A and rank B.
RANK A
The first 10 users by score, with two possible scenarios: the user who sees this rank is in the first 10 users or not.
User in the first 10: get list of 10 users
User not in the first 10: get list of 11 users (the first 10 + current user with his position among all users)
RANK B
The first 10 users by score in a given group of IDs (the group may contain 1, 10, or 0 IDs), with the same two scenarios: the user is in the first 10 or not.
User in the first 10: get list of 10 users
User not in the first 10: get list of 11 users (the first 10 + current user with his position in the group of IDs)
Is there any way to do it with Eloquent? Otherwise, how can I do it in MySQL?
To get the first rank, you could do this with Eloquent:
$users = User::orderBy('score', 'DESC')->limit(10)->get();
$currentUser = Auth::user();
$currentUserId = $currentUser->id;
if (!$users->contains('id', $currentUserId)) {
$users->push($currentUser);
}
Since $users is ordered by score, if the current user is not in the top 10, his score is no better than that of the last entry in the list, so it makes sense to append him to the end.
For the second rank:
$idsFilter = [1, 2, 3, 4, 5];
$users = User::whereIn('id', $idsFilter)->orderBy('score', 'DESC')->limit(count($idsFilter))->get();
$currentUser = Auth::user();
$currentUserId = $currentUser->id;
if (!$users->contains('id', $currentUserId)) {
$users->push($currentUser);
}
return $users;
Since you want a fixed list of IDs to rank, it only makes sense to show those IDs, and to add the current user if he's not part of the top X IDs you asked for.
To get the position of the user in rank A, you could add a method on the User model such as:
public function getPosition()
{
// Position = number of users with a strictly higher score, plus one.
return static::where('score', '>', $this->score)->count() + 1;
}
For rank B, apply the same process with an additional filter on the IDs.
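The position calculation is just "count the users with a strictly higher score, then add one". The sketch below shows that logic with Python's built-in sqlite3 module and the sample table from the question; the function name position and its ids parameter are hypothetical helpers introduced only for illustration, not part of Laravel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, score INTEGER)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, 10), (2, 13), (3, 15), (4, 7), (5, 11)],
)


def position(conn, user_id, ids=None):
    """Rank of a user: 1 + number of users with a strictly higher score.

    If `ids` is given, only users in that group are counted (rank B).
    """
    (score,) = conn.execute(
        "SELECT score FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    sql = "SELECT COUNT(*) + 1 FROM users WHERE score > ?"
    params = [score]
    if ids is not None:
        placeholders = ", ".join("?" for _ in ids)
        sql += f" AND id IN ({placeholders})"
        params.extend(ids)
    return conn.execute(sql, params).fetchone()[0]


print(position(conn, 3))                 # 1 (highest score overall)
print(position(conn, 4))                 # 5 (lowest score overall)
print(position(conn, 1, ids=[1, 4, 5]))  # 2 (only user 5 scores higher in the group)
```

Users with equal scores get the same position under this definition, because only strictly higher scores are counted.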

how can I tell if the last x rows of 'state' = 1

I need help with a SQL query.
I have a table with a 'state' column. 0 means closed and 1 means opened.
Different users want to be notified after there have been x consecutive 1 events.
With an SQL query, how can I tell if the last x rows of 'state' = 1?
If, for example, you want to check whether the last 5 consecutive rows have a state equal to 1, here is how you could do it (this relies on state being either 0 or 1, so the sum equals 5 only if all 5 rows are 1):
SELECT IF(SUM(x.state) = 5, 1, 0) AS is_consecutive
FROM (
SELECT state
FROM table
WHERE Processor = 3
ORDER BY Status_datetime DESC
LIMIT 5
) as x
If is_consecutive = 1, then yes, the last 5 consecutive rows all have state = 1.
Edit: As suggested in the comments, you have to use ORDER BY in your query to get the last n rows.
And for more accuracy, since you have a timestamp column, you should use Status_datetime to order the rows.
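The check generalizes to any x by passing the number of rows as a parameter. The following is a minimal sketch with Python's built-in sqlite3 module; the events table, its lower-case column names, the sample rows, and the helper last_n_all_open are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: state is 0 (closed) or 1 (opened).
conn.execute(
    "CREATE TABLE events (processor INTEGER, state INTEGER, status_datetime TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (3, 0, "2020-01-01 10:00:00"),
        (3, 1, "2020-01-01 10:01:00"),
        (3, 1, "2020-01-01 10:02:00"),
        (3, 1, "2020-01-01 10:03:00"),
    ],
)


def last_n_all_open(conn, processor, n):
    """True if the n most recent rows for `processor` all have state = 1."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(state), 0) = ?
        FROM (
            SELECT state
            FROM events
            WHERE processor = ?
            ORDER BY status_datetime DESC
            LIMIT ?
        )
        """,
        (n, processor, n),
    ).fetchone()
    return bool(row[0])


print(last_n_all_open(conn, 3, 3))  # True: the three newest rows are all 1
print(last_n_all_open(conn, 3, 4))  # False: the fourth newest row is 0
```

Because the sum can only reach n when n rows exist and each is 1, the same comparison also returns False when fewer than n rows are available.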
You should be able to use something like this in SQL Server (replace the 10 in both the TOP and the HAVING with the value of x you want to check for):
SELECT Processor, SUM(Status) AS OpenCount
FROM
(
SELECT TOP 10 Processor, Status
FROM YourTable
WHERE Processor = 3
ORDER BY DateTime DESC
) AS LastRows
GROUP BY Processor
HAVING SUM(Status) >= 10

Are GROUP BY and/or UNION appropriate? Or should I just use a nested SELECT?

I am interested in "cross-referencing" two columns and returning two pieces of information.
The columns are saddr, daddr, sbytes, dbytes.
I would like to find DISTINCT saddr and match them with DISTINCT daddr, then SUM the sbytes and dbytes.
I would also like to simply find the count of records that exist per saddr per daddr (given a daddr, N records match this saddr).
For those of you who may be interested in context, I am using a package called argus, and its client rasqlinsert to build a database of network traffic flows.
Thanks,
Matt
[edit]
Sample data:
SELECT saddr,daddr,sbytes,dbytes FROM argus.argus2012K17 limit 5;
'01:80:c2:00:00:0a', '20:fd:f1:74:36:96', 194, 0
'01:80:c2:00:00:0a', '20:fd:f1:74:36:b6', 194, 0
'192.168.100.11', '212.243.210.210', 120, 120
'192.168.100.11', '212.243.210.210', 422, 3667
'192.168.100.23', '99.248.99.240', 132, 0
Desired result:
saddr, daddr, how many records found where they both exist, sum of all sbytes in these records, sum of all dbytes in these records
'01:80:c2:00:00:0a', '20:fd:f1:74:36:96', 2, 388, 0
'192.168.100.11', '212.243.210.210', 2, 542, 3787
'192.168.100.23', '99.248.99.240', 1, 132, 0
I think I'm having the most trouble wrapping my head around the "where they both exist" aspect of the query.
[edit2]
I've concluded that I just need to spend time reading and gain understanding of GROUP BY and perform a nested query to get the info I'd like. However, if anyone has any more input it would be appreciated.
[edit 3]
Solution:
SELECT saddr, daddr, SUM(sbytes), SUM(dbytes), count(saddr) FROM argus.argus2012K17 GROUP BY saddr, daddr;
Returns:
SELECT saddr, daddr, SUM(sbytes), SUM(dbytes), count(saddr) FROM argus.argus2012K17 where saddr='01:80:c2:00:00:0a' GROUP BY saddr, daddr;
'01:80:c2:00:00:0a', '20:fd:f1:74:36:96', 326114, 0, 1681
'01:80:c2:00:00:0a', '20:fd:f1:74:36:b6', 326114, 0, 1681
Hell yea.
SELECT stime, saddr, daddr, SUM(sbytes), SUM(dbytes), count(saddr) FROM argus.argus2012K17 WHERE stime BETWEEN 1337187600 AND 1337187700 GROUP BY saddr, daddr;
There is nothing wrong with using these constructs, supposing they give you the results you want. Simulating them with nested SELECTs will give you either the same or worse performance.
I think you simply need this:
SELECT saddr, daddr, SUM(sbytes), SUM(dbytes) FROM argus.argus2012K17 GROUP BY saddr, daddr
To also get the combinations that have no matching records, you need a driving table and a GROUP BY. SQL cannot produce rows with a count of 0 using GROUP BY alone:
select driver.saddr, driver.daddr, coalesce(tsum.sumbytes, 0) as bytes
from (select s.saddr, d.daddr
from (select distinct saddr from t) s cross join
(select distinct daddr from t) d
) driver left outer join
(select saddr, daddr, sum(bytes) as sumbytes
from t
group by saddr, daddr
) tsum
on driver.saddr = tsum.saddr and driver.daddr = tsum.daddr
This statement gets all combinations of saddr and daddr. It then joins this to the sum of the bytes. The outer select produces 0 when no sum is present.
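The driving-table idea can be tried on a tiny data set. This is only a sketch using Python's built-in sqlite3 module; the table t with a single bytes column and the three sample rows are assumptions made up for illustration, not data from the argus table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature flow table.
conn.execute("CREATE TABLE t (saddr TEXT, daddr TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "x", 10), ("a", "x", 5), ("b", "y", 7)],
)

# driver: every (saddr, daddr) combination, whether or not it occurs in t.
# tsum:   the summed bytes for the pairs that do occur.
# The LEFT JOIN keeps all driver rows; COALESCE turns missing sums into 0.
rows = conn.execute(
    """
    SELECT driver.saddr, driver.daddr, COALESCE(tsum.sumbytes, 0) AS bytes
    FROM (
        SELECT s.saddr, d.daddr
        FROM (SELECT DISTINCT saddr FROM t) AS s
        CROSS JOIN (SELECT DISTINCT daddr FROM t) AS d
    ) AS driver
    LEFT OUTER JOIN (
        SELECT saddr, daddr, SUM(bytes) AS sumbytes
        FROM t
        GROUP BY saddr, daddr
    ) AS tsum
      ON driver.saddr = tsum.saddr AND driver.daddr = tsum.daddr
    ORDER BY driver.saddr, driver.daddr
    """
).fetchall()
print(rows)  # [('a', 'x', 15), ('a', 'y', 0), ('b', 'x', 0), ('b', 'y', 7)]
```

The pairs ('a', 'y') and ('b', 'x') never occur in t, yet they appear with 0 bytes, which a plain GROUP BY would not produce.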