I have an application that needs to query a MySQL database and retrieve a list of users that may be sharing an IP address, and I am having trouble turning the concept of what I want to do into a functional query.
The situation is that I have a table containing known IP information for users. Each time a user logs in, a timestamped entry is created containing their user ID and the IP address they logged in from.
Initially, I used the following query to return rows representing IP addresses that were shared:
SELECT ip, GROUP_CONCAT(DISTINCT account ORDER BY timestamp SEPARATOR ' ')
FROM known_ips
GROUP BY ip
HAVING COUNT(1) > 1
However, many users have dynamic IP addresses, so this list contains many duplicate entries (one for each IP address a user shares with others).
What I would like to do is have each row returned be a unique group of users that have shared any ip address with one another at any point.
For example, if Bob and Jane had shared IP address 192.168.0.1, and Bob and Fred had shared IP address 192.168.0.2, I would want the row to return 'Bob Fred Jane'. (The program takes the results of this query and performs some operations with it; essentially it needs a list of accounts on which to take action.)
What I can't figure out on my own is how to do this aggregation (or whether it is even possible). I initially tried using the original query as a CTE (with a WITH clause) and then grouping on that, but I hit a stumbling block: I couldn't figure out how to logically carry out the operation "compare the string-delimited list of users in group 1 to see if any exist in group 2". I figure that sort of string comparison is not what SQL is about anyway (and I could do it in the program instead).
Does anyone know of any technique whereby I can represent the logic of what I am trying to achieve here in MySQL? Or, should I accept the solution I already reached and then do the aggregation in the client application?
Edit:
In response to the request for sample data and output, here is a contrived example of the data:
Account  IP           Timestamp
Bob      192.168.0.1  2014-02-12 08:00
Bob      192.168.0.1  2014-02-12 09:30
Bob      192.168.0.2  2014-02-12 10:00
Mary     192.168.0.1  2014-03-12 07:00
Bob      192.168.0.2  2014-03-12 08:00
Jim      192.168.0.4  2014-03-12 08:30
Ted      192.168.0.2  2014-03-12 09:00
Jim      192.168.0.5  2014-04-12 08:30
Bob      192.168.0.3  2014-04-12 09:30
Andy     192.168.0.6  2014-04-12 10:30
Paul     192.168.0.6  2014-04-12 11:30
From this sample data, I would expect exactly two rows returned:
Bob Mary Ted
Andy Paul
I am indifferent to the ordering of accounts in the list, despite my use of ORDER BY timestamp earlier.
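Since the transitive grouping ("anyone who shared an IP with anyone in the group") is really a connected-components problem, doing it in the client application is a reasonable choice. Here is a minimal sketch in Python using union-find, assuming the rows come from something like SELECT DISTINCT account, ip FROM known_ips (table and column names taken from the question):

```python
from collections import defaultdict

# Sample (account, ip) pairs, as fetched from the known_ips table.
rows = [
    ("Bob", "192.168.0.1"), ("Bob", "192.168.0.2"),
    ("Mary", "192.168.0.1"), ("Jim", "192.168.0.4"),
    ("Ted", "192.168.0.2"), ("Jim", "192.168.0.5"),
    ("Bob", "192.168.0.3"), ("Andy", "192.168.0.6"),
    ("Paul", "192.168.0.6"),
]

def shared_groups(rows):
    """Union-find over accounts, linking accounts that share an IP."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Seed every account, then union all accounts seen on the same IP.
    for account, _ in rows:
        find(account)
    by_ip = defaultdict(list)
    for account, ip in rows:
        by_ip[ip].append(account)
    for accounts in by_ip.values():
        for other in accounts[1:]:
            union(accounts[0], other)

    # Collect components; keep only groups where an IP was actually shared.
    groups = defaultdict(set)
    for account in parent:
        groups[find(account)].add(account)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

print(shared_groups(rows))  # [['Andy', 'Paul'], ['Bob', 'Mary', 'Ted']]
```

Union-find keeps the grouping near-linear in the number of rows, so it scales fine even for large result sets pulled client-side.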
Related
Here's my situation: I have a table with a large number of records, and I need to pull out several records for each name in the database; note that TOP will not work for my use case. My end user wants the report formatted so that each user shows up only once, with up to 3 different dates shown per user.
Table format:

AutoID  Enum  TNum  Date    Comments
1       25    18    2/2/22
2       25    18    1/2/21  Blah
3       18    18    1/2/21
4       18    18    1/2/20
5       25    17    1/2/22
6       25    17    1/2/20
Now, the Enum and TNum fields are foreign keys to other tables; I have created a join that pulls the correct information from those tables. In the end my query provides this output:
RecordID  Training  CompletedDate  FirstName  LastName  Location
2821      MaP       1/1/21         David      Simpson   123 Sesame St.
2822      MaP       1/2/22         Fuller     Dough     123 Sesame St.
2825      GHI       1/1/20         David      Simpson
The two blank fields represent information that is pulled but may or may not be needed in some future report.
So, to my question: how do I get a report from this query's output to look like this:

Place           LastName  FirstName  Training  FirstCutoff  SecondCutoff  ThirdCutoff  Comments
123 Sesame St.  David     Simpson    MaP       1/1/20       1/1/21
123 Sesame St.  John      Dough      MaP       1/1/22
I was originally planning on joining my query to itself using WHERE clauses, but when I tried that it just added two extra columns with the same date. In addition, the records are not necessarily identical; locations may differ, but the report needs the most recent location along with the name of the trainee. To add more complexity, a number of people in the company have effectively the same name as far as the database is concerned, so rejoining on the name is out. I did pull the Enum in my query, so I can join on that if needed.
Is there an easier way to do this, or do I need to sort out a multiple self-joining query?
I have a project I am working on where I will have to do this. One of the suggestions I received was to use a pivot query. It wouldn't work in my case, but it might for yours. Here is a good example:
Pivot Columns
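Since the maximum is a fixed three dates per person, conditional aggregation over row-numbered rows is one way to pivot this. The following is a sketch only; the table shape and column names are assumptions based on the question, and it uses SQLite from Python to illustrate (MySQL 8+ supports the same ROW_NUMBER() window function). Joining on Enum avoids the duplicate-name problem:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trainings (          -- hypothetical shape of the joined query output
    enum INTEGER,                 -- person key (safe to join on, unlike names)
    training TEXT,
    completed_date TEXT,          -- ISO dates so they sort correctly
    first_name TEXT, last_name TEXT, location TEXT
);
INSERT INTO trainings VALUES
    (25, 'MaP', '2020-01-01', 'David', 'Simpson', '123 Sesame St.'),
    (25, 'MaP', '2021-01-01', 'David', 'Simpson', '123 Sesame St.'),
    (18, 'MaP', '2022-01-01', 'John',  'Dough',   '123 Sesame St.');
""")

# Number each person's completions per training, then spread the first
# three dates across columns with conditional aggregation.
rows = conn.execute("""
    SELECT location, last_name, first_name, training,
           MAX(CASE WHEN rn = 1 THEN completed_date END) AS first_cutoff,
           MAX(CASE WHEN rn = 2 THEN completed_date END) AS second_cutoff,
           MAX(CASE WHEN rn = 3 THEN completed_date END) AS third_cutoff
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY enum, training
                                     ORDER BY completed_date) AS rn
        FROM trainings
    )
    GROUP BY enum, training
    ORDER BY last_name
""").fetchall()
for r in rows:
    print(r)
```

Each person/training pair comes out on one row, with missing second and third dates left as NULL.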
I have some columns in a MySQL table whose field values are separated by commas, fields like IP address, running_port_ids, dns_range, subnet, etc. A cron job runs every hour to check whether the ports on the appliance are in use; if ports are used, the running_port_ids (like 2,3,7) are inserted against each appliance as comma-separated values.
How can I process the data to get a report of which ports are less used (I have a static list of port IDs), in ascending order like below, grouping by address, running_port_ids, and insert date over a one-month date range?
address     port  usage%
10.2.1.3    3     1
10.3.21.22  2     20
There are now thousands of records in the table with comma-separated running_port_ids. Are there any methods available in MySQL to do this?
Any help much appreciated.
If you can convert your data model to an n:m relation (or "link table"), i.e. normalize your data model, this is pretty easy using grouping (or "aggregate") functions. So I'd advise revising your data model and introducing a table containing one row for each of the ports, instead of storing this de-normalized in a text column.
A typical example would be: "student has many classes", and a property of this relation is "attendance":
Student
id  name
1   John
2   Jane

Course
id  name
1   Engineering
2   Databases

Class
id  courseid  date                 room
1   1         2015-08-05 10:00:00  301
2   1         2015-08-13 10:00:00  301
3   1         2015-09-03 10:00:00  301

StudentClass
studentid  classid  attendance
1          1        TRUE
1          2        FALSE
1          3        NULL
2          1        TRUE
2          2        TRUE
2          3        NULL
In this case, you can see the relation between student and class is normalized: every value is stored vertically instead of horizontally. This way, you can easily query things like "How many classes did John miss?" or "How many students did not miss any class?". NULL in the example shows that we cannot yet say anything about the attendance (as the date is in the future), but we do know that they should attend.
This is the way you should keep track of properties of a relation between two things. I can't really make out what you're trying to build, but I'm pretty sure you need a similar model.
Hope this helps.
So I was looking for a possible solution to my problem but could not find it anywhere. I have a log table that logs users' visits (campaign name, IPs, useragent string, hostname etc etc). What I'm trying to get is a list of shared IPs that were seen across the campaigns I define.
So, here is my table, for example:
Log
-------------------------------------------------------------------
id ip campaignName UserName
-------------------------------------------------------------------
1 173.45.87.2 UK-Test John
2 12.45.76.53 Go-4 John
3 173.45.87.2 Robo-s John
4 67.55.33.77 Wrangles John
5 3.25.233.53 Stan-Die John
6 173.45.87.2 StartMa John
7 123.45.67.23 Fresh.Goal John
8 54.23.57.86 Ert56 John
9 173.45.87.2 Yuoit John
Desired output should be:
173.45.87.2
As this is the only IP that appears across multiple of John's campaigns.
I forgot to mention that I know the UserName and all his campaigns; it's just the shared IPs across campaigns that I'm looking for.
Thanks for all helpers
SELECT ip
FROM Log
WHERE UserName = 'John'
GROUP BY ip
HAVING COUNT(DISTINCT campaignName) > 1
I have to model over a relational database the following scenario.
Imagine you have a number (say 10,000) of persons.
Imagine each of those persons may, or may not, offer a given service inside a timespan of a given day. Let's call these services "Answer phone", "Answer email", and "Answer SMS".
I have 48 timespans a day (00:00 - 00:30, 00:30 - 01:00, 01:00 - 01:30, etc.)
I have to schedule 7 week days (1 to 7)
Each service can be overlapped to another.
I'm currently thinking about a structure like this:
id | user_id | day | t00 | t05 | t10 | [... more timespans ...] | service_type
x  | 001     | 1   | 1   | 1   | 0   | ...                      | 'answer_phone'
y  | 001     | 1   | 1   | 1   | 1   | ...                      | 'answer_email'
z  | 002     | 1   | 0   | 0   | 1   | ...                      | 'answer_phone'
And so on. About the t* columns:
every t* column is a boolean value
t00 means "service is ON from 00:00 to 00:29"
t05 means "service is ON from 00:30 to 00:59"
t10 means "service is ON from 01:00 to 01:29"
and so on. So, at rows "x" and "y" I've modeled that user 001 will answer the phone between 00:00 and 00:59, while answering emails from 00:00 to 01:29, on Monday.
After thinking about it for a while, this approach seems straightforward enough, but I fear it will suffer performance and disk-space issues when dealing with thousands of users.
In fact, for 10k users I would have (10k * how_many_services * 7 days) rows, which means 210,000 records. Not that much, but users may grow, or new services may be added.
Can you suggest a better approach?
This is a terrible design. It's not normalized at all.
I would imagine there's a 1:many relationship between a user and their activity schedules. I'd model it that way.
If you don't know what normal forms are and why they're important, you shouldn't be doing relational modeling. Get someone who understands it to help you.
I would have separate tables for TIMES, SERVICES, USERS and ACTIONS.
TIMES would contain just the time splits (including a textual description of the time period)
SERVICES would have the service types such as answer_phone, answer_email etc. This allows for easy future expansion.
USERS would have any info on the users of the system. Such as userID, name, department, whatever.
The ACTIONS table would be for linking all the above tables together using foreign keys.
An entry in the actions table would have its own primary key, user_FK, time_FK, service_FK.
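A sketch of that normalized layout (table and column names are illustrative, exercised here with SQLite from Python). One row in ACTIONS marks one user offering one service in one timespan, so only "on" slots are stored at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times    (id INTEGER PRIMARY KEY, label TEXT);   -- 48 half-hour slots
CREATE TABLE services (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE actions (                 -- one row per (user, day, slot, service)
    id INTEGER PRIMARY KEY,
    user_fk INTEGER REFERENCES users(id),
    day INTEGER,                       -- 1..7
    time_fk INTEGER REFERENCES times(id),
    service_fk INTEGER REFERENCES services(id)
);
INSERT INTO times VALUES (1, '00:00-00:29'), (2, '00:30-00:59'), (3, '01:00-01:29');
INSERT INTO services VALUES (1, 'answer_phone'), (2, 'answer_email');
INSERT INTO users VALUES (1, 'user001'), (2, 'user002');
-- user001 answers the phone 00:00-00:59 and email 00:00-01:29 on Monday (day 1)
INSERT INTO actions (user_fk, day, time_fk, service_fk) VALUES
    (1, 1, 1, 1), (1, 1, 2, 1),
    (1, 1, 1, 2), (1, 1, 2, 2), (1, 1, 3, 2),
    (2, 1, 3, 1);
""")

# Who answers the phone on Monday between 00:00 and 00:29?
names = [r[0] for r in conn.execute("""
    SELECT u.name
    FROM actions a
    JOIN users u    ON u.id = a.user_fk
    JOIN services s ON s.id = a.service_fk
    WHERE a.day = 1 AND a.time_fk = 1 AND s.name = 'answer_phone'
""")]
print(names)  # ['user001']
```

Because absent slots simply have no row, the table grows with actual availability rather than with (users × services × slots × days), and adding a new service is just one row in SERVICES.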
One of my coworkers is working on a SQL query. After several joins (workers to accounts to tasks), she's got some information sort of like this:
Worker     Account  Date        Task_completed
Bob Smith  12345    01/01/2010  Received
Bob Smith  12345    01/01/2010  Received
Bob Smith  12345    01/01/2010  Processed
Sue Jones  23456    01/01/2010  Received
...
Ultimately what she wants is something like this - for each date, for each account, how many tasks did each worker complete for that account?
Worker     Account  Date        Received_count  Processed_count
Bob Smith  12345    01/01/2010  2               1
... and there are several other statuses to count.
Getting one of these counts is pretty easy:
SELECT
    worker, account, date,
    COUNT(Task_completed)
FROM
    (the subselect)
WHERE
    Task_completed = 'Received'
GROUP BY
    worker, account, date
But I'm not sure the best way to get them all. Essentially we want multiple COUNTs using different GROUP BYs. The best thing I can figure out is to copy and paste the subquery several times, change the WHERE to "Processed", etc, and join all those together, selecting just the count from each one.
Is there a more obvious way to do this?
SELECT worker, account, date,
       SUM(task_completed = 'Received') AS received_count,
       SUM(task_completed = 'Processed') AS processed_count
FROM mytable
GROUP BY worker, account, date
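The boolean-sum shorthand above relies on MySQL treating comparisons as 0/1; the portable spelling of the same idea is SUM(CASE WHEN ... THEN 1 ELSE 0 END). A quick check with the sample rows, using SQLite from Python:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (worker TEXT, account TEXT, date TEXT, task_completed TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [
    ('Bob Smith', '12345', '01/01/2010', 'Received'),
    ('Bob Smith', '12345', '01/01/2010', 'Received'),
    ('Bob Smith', '12345', '01/01/2010', 'Processed'),
    ('Sue Jones', '23456', '01/01/2010', 'Received'),
])

# One pass over the table, one column per status -- no repeated subqueries.
rows = conn.execute("""
    SELECT worker, account, date,
           SUM(CASE WHEN task_completed = 'Received'  THEN 1 ELSE 0 END) AS received_count,
           SUM(CASE WHEN task_completed = 'Processed' THEN 1 ELSE 0 END) AS processed_count
    FROM mytable
    GROUP BY worker, account, date
    ORDER BY worker
""").fetchall()
print(rows)
```

Each additional status to count is just one more SUM(CASE ...) column, instead of another copy of the subquery joined back in.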