MySQL: Select Where Distinct if 1st entry, not Distinct within query

I have a table containing the columns Email, Ip, State, City, TimeStamp, Id.
I need to count rows where Email and IP are both distinct, grouped by State and City.
When I run this MySQL query,
select State, City, count(distinct Email), count(distinct IP)
from table
group by State, City
it gives me the distinct count of each column separately, but not the two combined (AND).
I need a count of distinct Email AND distinct IP, grouped by State, City.
And the distincts can't be evaluated within the group: it has to be the first instance of the Email and the first instance of the IP in the entire database. So if I expand the query and add a date parameter, even though I'm selecting a specific date range, the uniqueness check still has to run against the whole database.
So if I need
select state, city, count(distinct IP), count(distinct Email)
from table
where timestamp > '2014-12-01'
group by state, city
what type of query is this? And how can I accomplish it?
My gut tells me I need to use CONCAT as suggested, but also another select inside: select the distinct IPs across the whole database, then select the specific criteria from that inner select.

This can help a bit to get a "distinct (A && B)" count per group:
SELECT C, D, COUNT(DISTINCT CONCAT(A,'_',B))
FROM table
GROUP BY C, D

We struggled to do this on a production server and found that the required query was too resource intensive. So we created a table that is updated the first time an item occurs, and then we check the counts with a join like so:
select count(a.State) from tablename a
inner join table_update u
on a.id = u.id
WHERE a.parameters..
and u.first_email = 1
and u.first_ip = 1
We couldn't find a single-table query that wouldn't bring our server down with 400,000 records. It's not a classy answer, but it's what we had to use.
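As a rough illustration of the "first instance in the whole table" idea discussed above, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The table name visits and the sample rows are hypothetical. The two derived tables find the lowest Id for each Email and for each Ip across the entire table, and the date filter is applied only afterwards, so a row counts only if it is the first appearance of both its Email and its Ip. Whether this performs acceptably on a large MySQL table is not something this sketch can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE visits (
    Id INTEGER PRIMARY KEY, Email TEXT, Ip TEXT,
    State TEXT, City TEXT, TimeStamp TEXT)""")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a@x.com", "1.1.1.1", "TX", "Austin", "2014-11-20"),  # before the cutoff
        (2, "a@x.com", "2.2.2.2", "TX", "Austin", "2014-12-02"),  # repeated email
        (3, "b@x.com", "3.3.3.3", "TX", "Austin", "2014-12-03"),  # both new
        (4, "c@x.com", "3.3.3.3", "TX", "Austin", "2014-12-04"),  # repeated IP
        (5, "d@x.com", "4.4.4.4", "CA", "Fresno", "2014-12-05"),  # both new
    ],
)

query = """
SELECT v.State, v.City, COUNT(*) AS first_time_rows
FROM visits v
-- first occurrence of every email in the whole table
JOIN (SELECT Email, MIN(Id) AS first_id FROM visits GROUP BY Email) fe
  ON fe.first_id = v.Id
-- first occurrence of every IP in the whole table
JOIN (SELECT Ip, MIN(Id) AS first_id FROM visits GROUP BY Ip) fi
  ON fi.first_id = v.Id
WHERE v.TimeStamp > '2014-12-01'
GROUP BY v.State, v.City
ORDER BY v.State, v.City
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only rows 3 and 5 survive here: row 1 is outside the date range, row 2 repeats an email, and row 4 repeats an IP.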

Related

MYSQL max(timestamp) returns different results than max(id)

I am creating a query for a log system. The table contains 100,000 rows or so, and I would like to remove duplicates across the following columns and return only the latest entry.
Columns in which to avoid duplicates:
user
ip
time_accessed
mainlocation
secondlocation
thirdlocation
did_user_have_access
The purpose of this is to see which portions of the site a user has visited. We do not need to know that they have visited a particular page 100 times; we only need to know that they visited it once.
The table has the following columns:
id
user
ip
time_accessed
mainlocation
secondlocation
thirdlocation
task
did_user_have_access
My question is, why do the following queries return such drastically different results? The MAX(id) query returns 450 results and the MAX(time_accessed) query returns 835. Shouldn't they return the same amount?
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `id` IN (SELECT MAX(id) AS id
FROM `log_table`
GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `time_accessed` IN (SELECT MAX(`time_accessed`) AS time_accessed
FROM `log_table`
GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
Without knowing how you populate the two fields you're applying MAX() to, this can hardly be answered, only guessed at.
Although... does this matter, as long as you get a proper result?
Also, you don't have to split it into two queries: since you're grouping by exactly the fields you expect to be de-duplicated, you are guaranteed to get unique combinations with MAX() in the main query:
SELECT DISTINCT mainlocation, secondlocation, thirdlocation,
ip, user, did_user_have_access,
MAX(`time_accessed`) AS last_accessed
FROM log_table
GROUP BY mainlocation, secondlocation, thirdlocation,
ip, user, did_user_have_access
In other words, each six-tuple will be unique, each with its own last_accessed.
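One plausible reason for the gap between the two counts in the question (a guess, since the real data is not shown) is that id is unique per row while time_accessed is not. The IN list built from MAX(time_accessed) contains timestamps from every group, so a row that is not the latest in its own group still matches if its timestamp equals some other group's maximum. The sketch below reproduces that effect with Python's built-in sqlite3 standing in for MySQL, using a hypothetical cut-down log_table with only two grouping columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE log_table (
    id INTEGER PRIMARY KEY, user TEXT, mainlocation TEXT, time_accessed TEXT)""")
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?, ?)",
    [
        (1, "ann", "home",  "2015-01-01 10:00:00"),
        (2, "bob", "admin", "2015-01-01 10:00:00"),
        (3, "bob", "admin", "2015-01-01 11:00:00"),  # bob's latest row
        (4, "ann", "home",  "2015-01-01 11:00:00"),  # same time as bob's latest
        (5, "ann", "home",  "2015-01-01 12:00:00"),  # ann's latest row
    ],
)

# Filter on the highest id per group: ids are unique, so one row per group.
by_id = conn.execute("""
    SELECT DISTINCT user, mainlocation, time_accessed FROM log_table
    WHERE id IN (SELECT MAX(id) FROM log_table GROUP BY user, mainlocation)
""").fetchall()

# Filter on the latest time per group: row 4 also matches, because its
# timestamp equals the maximum of a *different* group (bob/admin).
by_time = conn.execute("""
    SELECT DISTINCT user, mainlocation, time_accessed FROM log_table
    WHERE time_accessed IN (SELECT MAX(time_accessed) FROM log_table
                            GROUP BY user, mainlocation)
""").fetchall()

print(len(by_id), len(by_time))
```

With this sample data the id-based query returns two rows and the time-based query returns three, which is the same kind of mismatch on a small scale.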

SQL Query sorting rows by duplicate name keeping lowest in result

I've got a table with 11 columns and I want to create a query that removes the rows with duplicate names in the Full Name column but keeps the row with the lowest value in the Result column. Currently I have this.
SELECT
MIN(sql363686.Results2014.Result),
sql363686.Results2014.Temp,
sql363686.Results2014.Full Name,
sql363686.Results2014.Province,
sql363686.Results2014.BirthDate,
sql363686.Results2014.Position,
sql363686.Results2014.Location,
sql363686.Results2014.Date
FROM
sql363686.Results2014
WHERE
sql363686.Results2014.Event = '50m Freestyle'
AND sql363686.Results2014.Gender = 'M'
AND sql363686.Results2014.Agegroup = 'Junior'
GROUP BY
sql363686.Results2014.Full Name
ORDER BY
sql363686.Results2014.Result ASC ;
At first glance it seems to work fine and I get all the correct values, but I seem to be getting a different (wrong) value in the Position column than what I have in my database table. All other values seem to be right. Any ideas on what I'm doing wrong?
I'm currently using dbVisualizer connected to a MySQL database. Also, my knowledge of and experience with SQL is the bare minimum.
Use group by and a join:
select r.*
from sql363686.Results2014 r
join (select fullname, min(result) as minresult
from sql363686.Results2014
group by fullname
) rr
on rr.fullname = r.fullname and rr.minresult = r.result;
You have fallen into the trap of the nonstandard MySQL extension to GROUP BY.
(I'm not going to work with all those fully qualified column names; it's unnecessary and verbose.)
I think you're looking for each swimmer's best time in a particular event, and you're trying to pull that from a so-called denormalized table. It looks like your table has these columns.
Result
Temp
FullName
Province
BirthDate
Position
Location
Date
Event
Gender
Agegroup
So, the first step is to locate the best time in each event for each swimmer. To do this we need to make a couple of assumptions.
A person is uniquely identified by FullName, BirthDate, and Gender.
An event is uniquely identified by Event, Gender, Agegroup.
This subquery will get the best time for each swimmer in each event.
SELECT MIN(Result) BestResult,
FullName,BirthDate, Gender,
Event, Agegroup
FROM Results2014
GROUP BY FullName,BirthDate, Gender, Event, Agegroup
This gets you a virtual table with each person's fastest result in each event (using the definitions of person and event mentioned earlier).
Now the challenge is to go find out the circumstances of each person's best time. Those circumstances include Temp, Province, Position, Location, Date. We'll do that with a JOIN between the original table and our virtual table, like this
SELECT resu.Event,
resu.Gender,
resu.Agegroup,
resu.Result,
resu.Temp,
resu.FullName,
resu.Province,
resu.BirthDate,
resu.Position,
resu.Location,
resu.Date
FROM Results2014 resu
JOIN (
SELECT MIN(Result) BestResult,
FullName,BirthDate, Gender,
Event, Agegroup
FROM Results2014
GROUP BY FullName,BirthDate, Gender, Event, Agegroup
) best
ON resu.Result = best.BestResult
AND resu.FullName = best.FullName
AND resu.BirthDate = best.BirthDate
AND resu.Gender = best.Gender
AND resu.Event = best.Event
AND resu.Agegroup = best.Agegroup
ORDER BY resu.Agegroup, resu.Gender, resu.Event, resu.FullName, resu.BirthDate
Do you see how this works? You need an aggregate query that pulls the best times. Then you need to use the column values in that aggregate query in the ON clause to go get the details of the best times from the detail table.
If you want to report on just one event you can include an appropriate WHERE clause right before ORDER BY as follows.
WHERE resu.Event = '50m Freestyle'
AND resu.Gender = 'M'
AND resu.Agegroup = 'Junior'
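To make the aggregate-then-join pattern above concrete, here is a small sketch run through Python's built-in sqlite3 as a stand-in for MySQL. The table is a hypothetical cut-down Results2014 with only five columns, and the swimmers and times are invented. The point is that Position and Location come from the same row as the best Result, rather than from an arbitrary row in the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Results2014 (
    FullName TEXT, Event TEXT, Result REAL, Position INTEGER, Location TEXT)""")
conn.executemany(
    "INSERT INTO Results2014 VALUES (?, ?, ?, ?, ?)",
    [
        ("Sam Lee",   "50m Freestyle", 27.90, 4, "Toronto"),
        ("Sam Lee",   "50m Freestyle", 26.45, 1, "Ottawa"),   # Sam's best
        ("Raj Patel", "50m Freestyle", 26.80, 2, "Ottawa"),   # Raj's best
        ("Raj Patel", "50m Freestyle", 27.10, 3, "Toronto"),
    ],
)

best_rows = conn.execute("""
    SELECT resu.FullName, resu.Result, resu.Position, resu.Location
    FROM Results2014 resu
    JOIN (
        -- aggregate step: best time per swimmer per event
        SELECT FullName, Event, MIN(Result) AS BestResult
        FROM Results2014
        GROUP BY FullName, Event
    ) best
      ON  resu.FullName = best.FullName
      AND resu.Event    = best.Event
      AND resu.Result   = best.BestResult
    WHERE resu.Event = '50m Freestyle'
    ORDER BY resu.Result
""").fetchall()
print(best_rows)
```

If a swimmer has two rows with the same best Result, this join returns both, so a tie-breaker would be needed in that case.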

SQL performance of a large number of sum()s

Within my J2EE web application, I need to generate a bar chart representing the percentage of users in the system with specific alerts. (EDIT: I forgot to mention that the graph only deals with alerts associated with the first situation of each user, hence the min(date).)
A simplified (but structurally similar) version of my database schema is as follows :
users { id, name }
situations { id, user_id, date }
alerts { id, situation_id, alertA, alertB }
where users to situations are 1-n, and situations to alerts are 1-1.
I've omitted datatypes but the alerts (alertA and B) are booleans. In my actual case, there are many such alerts (30-ish).
So far, this is what I have come up with :
select sum(alerts.alertA), sum(alerts.alertB)
from alerts, (
select id, min(date)
from situations
group by user_id) as situations
where situations.id = alerts.situation_id;
and then divide these sums by
select count(users.id) from users;
This seems far from ideal.
Your recommendations/advice as to how to improve this query would be most appreciated (or maybe I need to re-think my database schema)...
Thanks,
Anthony
PS. I was also thinking of using a trigger to refresh a chart specific table whenever the alerts table is updated but I guess that's a subject for a different query (if it turns out to be problematic).
First, think about your schema again. You will have a lot of different alerts and you probably don't want to add a separate column for every one of them.
Consider changing your alerts table to something like { id, situation_id, type, value } where type would be (A,B,C,....) and value would be your boolean.
Your task to calculate the percentages would then split up into:
(1) Count the total number of users:
SELECT COUNT(id) AS total FROM users
(2) Find the "first" situation for each user:
SELECT situations.id, situations.user_id
-- selects the minimum date for every user_id
FROM (SELECT user_id, MIN(date) AS min_date
FROM situations
GROUP BY user_id) AS first_situation
-- gets the situations.id for user with minimum date
JOIN situations ON
first_situation.user_id = situations.user_id AND
first_situation.min_date = situations.date
-- limits number of situations per user to 1 (possible min_date duplicates)
GROUP BY situations.user_id
(3) Count users for whom an alert is set in at least one of the situations in the subquery:
SELECT
alerts.type,
COUNT(situations.user_id)
FROM ( ... situations.user_id, situations.id ... ) AS situations
JOIN alerts ON
situations.id = alerts.situation_id
WHERE
alerts.value = 1
GROUP BY
alerts.type
Put those three steps together to get something like:
SELECT
alerts.type,
COUNT(situations.user_id)/users.total
FROM (SELECT situations.id, situations.user_id
FROM (SELECT user_id, MIN(date) AS min_date
FROM situations
GROUP BY user_id) AS first_situation
JOIN situations ON
first_situation.user_id = situations.user_id AND
first_situation.min_date = situations.date
GROUP BY situations.user_id
) AS situations
JOIN alerts ON
situations.id = alerts.situation_id
JOIN (SELECT COUNT(id) AS total FROM users) AS users
WHERE
alerts.value = 1
GROUP BY
alerts.type
All queries written from my head without testing. Even if they don't work exactly like that, you should still get the idea!
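Since the queries above are untested, here is a small runnable sketch of the same three steps, using Python's built-in sqlite3 as a stand-in for MySQL and the normalized alerts layout { id, situation_id, type, value } suggested in this answer. The sample data is invented. As a deliberate variation, COUNT(DISTINCT user_id) is used instead of the extra GROUP BY so that a user with two situations on the same first date is still counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE situations (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
CREATE TABLE alerts (id INTEGER PRIMARY KEY, situation_id INTEGER,
                     type TEXT, value INTEGER);
INSERT INTO users VALUES (1, 'u1'), (2, 'u2'), (3, 'u3'), (4, 'u4');
-- user 1 has two situations; user 4 has none
INSERT INTO situations VALUES
    (1, 1, '2020-01-01'), (2, 1, '2020-02-01'),
    (3, 2, '2020-01-05'), (4, 3, '2020-01-07');
INSERT INTO alerts (situation_id, type, value) VALUES
    (1, 'A', 1), (1, 'B', 0),
    (2, 'A', 0), (2, 'B', 1),   -- not a first situation, must be ignored
    (3, 'A', 1), (3, 'B', 1),
    (4, 'A', 0), (4, 'B', 0);
""")

shares = conn.execute("""
    SELECT a.type,
           COUNT(DISTINCT s.user_id) * 1.0 / (SELECT COUNT(*) FROM users) AS share
    FROM (SELECT user_id, MIN(date) AS min_date
          FROM situations
          GROUP BY user_id) AS first_situation
    JOIN situations s
      ON s.user_id = first_situation.user_id
     AND s.date = first_situation.min_date
    JOIN alerts a ON a.situation_id = s.id
    WHERE a.value = 1
    GROUP BY a.type
    ORDER BY a.type
""").fetchall()
print(shares)
```

Alert types that are never set in any first situation do not appear in the result at all, so a chart would need to fill those in as zero.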

MySQL GROUP BY and COUNT

I have a small problem regarding a count after grouping some elements from a MySQL table.
I have an orders table in which each order has several rows grouped by a code (named codcomanda). I have to write a query that counts the number of orders per customer and lists only the name and the number of orders.
This is what I came up with (this might be dumb... I'm not a pro programmer):
SELECT a.nume, a.tel, (
SELECT COUNT(*) AS `count`
FROM (
SELECT id AS `lwtemp`
FROM lw_comenzi_confirmate AS yt
WHERE status=1 AND yt.tel LIKE a.tel
GROUP BY yt.codcomanda
) AS b
) AS numar_comenzi
FROM lw_comenzi_confirmate AS a
WHERE status=1
GROUP BY tel;
nume = NAME
tel = PHONE (which is the distinct identifier for clients since there's no login system)
The problem with the above query is that I don't know how to match a.tel with the row that the outer select is on. If I replace it with a number that is in the db, it works.
Can anyone help me with how to refer to that value?
Or maybe suggest another way to get this done?
If any more info is needed, I'll provide it asap.
Please, correct me if I'm wrong in my understanding of your schema:
lw_comenzi_confirmate contains nume and tel of the customer;
lw_comenzi_confirmate contains order details (same table);
one order can have several entries in the lw_comenzi_confirmate table, order is distinguished by codcomanda field.
First, I highly recommend reading about Normalisation and fixing your database design.
The following should do the job for you:
SELECT nume, tel, count(DISTINCT codcomanda) AS cnt
FROM lw_comenzi_confirmate
WHERE status = 1
GROUP BY nume, tel
ORDER BY nume, tel;
You can test this query on SQL Fiddle.
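For a quick local check of the COUNT(DISTINCT codcomanda) query, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. The column set is reduced to what the query needs and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE lw_comenzi_confirmate (
    id INTEGER PRIMARY KEY, nume TEXT, tel TEXT,
    codcomanda TEXT, status INTEGER)""")
conn.executemany(
    "INSERT INTO lw_comenzi_confirmate VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "0711", "C1", 1),  # order C1, first row
        (2, "Ana", "0711", "C1", 1),  # order C1, second row
        (3, "Ana", "0711", "C2", 1),
        (4, "Dan", "0722", "C3", 1),
        (5, "Dan", "0722", "C4", 0),  # status 0, filtered out
    ],
)

orders_per_customer = conn.execute("""
    SELECT nume, tel, COUNT(DISTINCT codcomanda) AS cnt
    FROM lw_comenzi_confirmate
    WHERE status = 1
    GROUP BY nume, tel
    ORDER BY nume, tel
""").fetchall()
print(orders_per_customer)
```

Ana's three rows collapse to two orders because two of them share the code C1.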

Combining count(*) with SQL statements into one table

I have two tables in my MySQL database (table1 and table2). I want to write a SQL query that outputs some summary stats in a nicely formatted report.
As an example, consider a first SQL query that counts the users over 57 years of age in the first table
SELECT count(*) AS OlderThan57
FROM table1
WHERE age >57
And from the second table we want to get the number of users that are female
SELECT count(*) AS FemaleUsers
FROM table2
WHERE gender = "female"
Now I want to have an output like the following
Number of Female users from table 2: 514
Number of users over the age of 57 from table 1: 918
What is the best way of generating such a report?
I would offer to expand one level from Adrian's answer... return as two separate fields so you could place them separately in a report, or align / format the number, etc
SELECT 'Number of users over the age of 57 from table 1:' as Msg,
count(*) as Entries
FROM table1
WHERE age >57
UNION ALL
SELECT 'Number of Female users from table 2:' as Msg,
count(*) as Entries
FROM table2
WHERE gender = "female"
You might have to force both "Msg" columns to the same padded length, otherwise one might get truncated. Again, just another option...
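The two-column UNION ALL report can be tried out with a small sketch like the one below, which uses Python's built-in sqlite3 as a stand-in for MySQL. The contents of table1 and table2 are invented, and single quotes are used for the string literal because that is the portable SQL form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (age INTEGER);
CREATE TABLE table2 (gender TEXT);
INSERT INTO table1 VALUES (60), (58), (40);
INSERT INTO table2 VALUES ('female'), ('female'), ('male'), ('female');
""")

report = conn.execute("""
    SELECT 'Number of users over the age of 57 from table 1:' AS Msg,
           COUNT(*) AS Entries
    FROM table1
    WHERE age > 57
    UNION ALL
    SELECT 'Number of Female users from table 2:' AS Msg,
           COUNT(*) AS Entries
    FROM table2
    WHERE gender = 'female'
""").fetchall()

# Keeping label and count as separate columns leaves formatting to the caller.
for msg, entries in report:
    print(msg, entries)
```

Each branch of the UNION ALL pairs its label with the table it actually counts, which is easy to get crossed when the queries are copied around.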
You could always try the WITH ROLLUP directive when using a GROUP BY:
SELECT COUNT(*), gender FROM table1 GROUP BY gender WITH ROLLUP
If you want to get a bit crazy you can always make a series of IFs that handles the logic for one or more things at a time:
SELECT COUNT(*), IF(gender='female', 'female', IF(age>57, 'older_than_57', '*')) AS switch FROM table1 GROUP BY switch WITH ROLLUP
SELECT CONCAT('Number of users over the age of 57 from table 1: ', count(*))
FROM table1
WHERE age >57
UNION ALL
SELECT CONCAT('Number of Female users from table 2: ', count(*))
FROM table2
WHERE gender = "female"
I don't have a MySQL database to check it, so you might have to cast count(*) to a string.
A union is your only option if you don't have a normalized database. Your other option is to better standardize/normalize your db so you can run much more efficient queries.