MySQL selecting unique values - mysql

I am really stuck with an sql query... I hope someone can help shed some light for me.
Here is what my table looks like
mysql> show fields from france_data;
+----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| email | varchar(45) | YES | | NULL | |
| name | varchar(45) | YES | | NULL | |
| lastname | varchar(45) | YES | | NULL | |
| quality | varchar(45) | YES | | NULL | |
| country | varchar(45) | YES | | NULL | |
| state | varchar(45) | YES | | NULL | |
| year | varchar(45) | YES | | NULL | |
| owner | varchar(45) | YES | | NULL | |
+----------+-------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)
Here's the catch: I have duplicate data in my table. I would like to pull out all the data from this table, de-duplicated by email.
I ran a simple count like this:
mysql> select count(*) from france_data;
and this is the result set:
+----------+
| count(*) |
+----------+
| 2405259 |
+----------+
1 row in set (0.01 sec)
Now I tried to run a count like this:
mysql> select count(*) from france_data group by email;
Just to see how many unique records I have. Unfortunately this times out.
Does anyone know how I can do a count of unique rows and a select of the same type?

Please try this
SELECT COUNT(DISTINCT email) FROM france_data

If you're looking at unique rows based on email, a simple
select count(distinct email) from france_data
should do the trick.
If you're also looking to see each email's duplicate count, try this:
select email, count(*) as cnt from france_data group by email order by cnt desc;
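For anyone who wants to try these queries without a MySQL server, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The cut-down table and its rows are made up for illustration, and the "keep the lowest id per email" rule in the last query is only one assumed way to pick a single row per email. On a table with millions of rows, an index on email is usually what keeps this kind of query from timing out.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the table is a cut-down,
# hypothetical version of france_data with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE france_data (id INTEGER PRIMARY KEY, email TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO france_data (email, name) VALUES (?, ?)",
    [
        ("a@example.com", "Ann"),
        ("a@example.com", "Ann B."),
        ("b@example.com", "Bob"),
        ("c@example.com", "Cy"),
        ("a@example.com", "A."),
    ],
)

# Number of distinct emails.
unique_count = conn.execute(
    "SELECT COUNT(DISTINCT email) FROM france_data"
).fetchone()[0]

# How many times each email occurs, most duplicated first.
dup_counts = conn.execute(
    "SELECT email, COUNT(*) AS cnt FROM france_data "
    "GROUP BY email ORDER BY cnt DESC, email"
).fetchall()

# One full row per email: keep the row with the lowest id in each group
# (an assumed tie-breaking rule, not something the question specifies).
one_per_email = conn.execute(
    "SELECT f.id, f.email, f.name FROM france_data AS f "
    "JOIN (SELECT MIN(id) AS id FROM france_data GROUP BY email) AS k "
    "ON k.id = f.id ORDER BY f.id"
).fetchall()

print(unique_count)
print(dup_counts)
print(one_per_email)
```

The same three statements should carry over to MySQL unchanged, since they only use COUNT, MIN, GROUP BY and a join.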

Please try this
SELECT count(DISTINCT email) FROM france_data;

Related

Counting and ordering and joining

My previous question gave me the answer that I could take
mysql> describe taps;
+------------+-----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-----------+------+-----+-------------------+-------+
| tag | int(11) | NO | | NULL | |
| station | int(11) | NO | | NULL | |
| time_Stamp | timestamp | NO | | CURRENT_TIMESTAMP | |
+------------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
and use the query
SELECT tag
, COUNT(DISTINCT station) as `visit_count`
FROM taps
GROUP
BY tag
ORDER
BY COUNT(DISTINCT station) DESC
to get the visitors ordered by the number of stations they have visited.
Now I want to add
mysql> describe visitors;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| tag_id | int(11) | NO | | NULL | |
| name | text | NO | | NULL | |
| email | text | NO | | NULL | |
| phone | text | NO | | NULL | |
+--------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
And, instead of getting the visitor's tag_id, I want to get his name, email and phone. I know that it involves a JOIN, but just can't figure it out :-(
[Update] Just to be clear, I want to output an HTML table, ordered by whoever visited the most stations, showing name, email & phone
SELECT v.name
, v.email
, v.phone
, COUNT(DISTINCT t.station) as `visit_count`
FROM taps as t JOIN visitors as v ON t.tag = v.tag_id
GROUP
BY v.tag_id, v.name, v.email, v.phone
ORDER
BY COUNT(DISTINCT t.station) DESC
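Here is a small sketch of the join plus the HTML table the question asks for, using Python's sqlite3 module as a stand-in for MySQL. The sample rows, the example.com addresses and the bare-bones table markup are all assumptions for illustration.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taps (tag INTEGER, station INTEGER);
CREATE TABLE visitors (tag_id INTEGER, name TEXT, email TEXT, phone TEXT);
INSERT INTO taps VALUES (1, 10), (1, 11), (1, 11), (2, 10),
                        (3, 10), (3, 11), (3, 12);
INSERT INTO visitors VALUES
  (1, 'Ann', 'ann@example.com', '555-0101'),
  (2, 'Bob', 'bob@example.com', '555-0102'),
  (3, 'Cy',  'cy@example.com',  '555-0103');
""")

# Join each tap to its visitor, then count distinct stations per visitor.
rows = conn.execute("""
SELECT v.name, v.email, v.phone, COUNT(DISTINCT t.station) AS visit_count
FROM taps AS t
JOIN visitors AS v ON v.tag_id = t.tag
GROUP BY v.tag_id, v.name, v.email, v.phone
ORDER BY visit_count DESC
""").fetchall()

# Render the result as a plain HTML table, escaping every cell.
html_rows = []
for row in rows:
    cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
    html_rows.append(f"<tr>{cells}</tr>")
table_html = "<table>" + "".join(html_rows) + "</table>"

print(table_html)
```

Grouping by the visitor's key and listing every selected column in GROUP BY keeps the query valid even when MySQL runs with ONLY_FULL_GROUP_BY enabled.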

Calculate average of values between 2 columns sql

I have a table called validation_errors that looks like this:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| link | varchar(200) | NO | MUL | NULL | |
| message | varchar(500) | NO | | | |
| explanation | mediumtext | NO | | NULL | |
| type | varchar(50) | NO | | | |
| subtype | varchar(50) | NO | | | |
| message_id | varchar(50) | NO | | | |
+-------------+--------------+------+-----+---------+----------------+
The links table looks like this:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| link | varchar(200) | NO | PRI | NULL | |
| visited | tinyint(1) | NO | | 0 | |
| validated | tinyint(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+-------+
I wish to calculate the average number of validation errors per page per topdomain.
I have a query that can fetch the amount of pages per topdomain:
SELECT substr(link, - instr(reverse(link), '.')) as domain , count(*) as count
FROM links
GROUP BY domain
ORDER BY count desc
limit 30;
And have a sql query that can fetch the amount of validation errors per top domain:
SELECT substr(link, - instr(reverse(link), '.')) as domain ,count(*) as count
FROM validation_errors
GROUP BY domain
ORDER BY count desc
limit 30;
What I now need to do is combine them into one query and divide the results of one column by the other, and I can't figure out how to do it.
Any help would be greatly appreciated.
First, use substring_index(), rather than your construct. Here is the query to join them together:
select domain, sum(numviews) as numviews, sum(numerrors) as numerrors,
sum(numerrors) / nullif(sum(numviews), 0) as error_rate
from ((SELECT substring_index(link, '.', -1) as domain , count(*) as numviews, 0 as numerrors
FROM links
GROUP BY domain
) UNION ALL
(SELECT substring_index(link, '.', -1) as domain , 0, count(*)
FROM validation_errors
GROUP BY domain
)
) d
GROUP BY domain;
With both variables, I don't know which 30 you want to choose, so I haven't included an order by.
Note that this doesn't use a join, it uses union all with aggregation. This ensures that you will get all domains, even those with no views and those with no errors.
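To see the UNION ALL plus aggregation pattern at work, here is a sketch in Python with sqlite3. SQLite has no substring_index(), so the example registers a rough Python stand-in under that name. That helper, the bare host names used as link values, and the sample rows are all assumptions. SQLite also divides integers as integers, hence the extra 1.0 * factor, which the MySQL query does not need.

```python
import sqlite3

def substring_index(s, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (non-zero count only)."""
    parts = s.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])

conn = sqlite3.connect(":memory:")
conn.create_function("substring_index", 3, substring_index)
conn.executescript("""
CREATE TABLE links (link TEXT PRIMARY KEY);
CREATE TABLE validation_errors (id INTEGER PRIMARY KEY, link TEXT);
INSERT INTO links VALUES ('a.com'), ('b.com'), ('c.org');
INSERT INTO validation_errors (link)
VALUES ('a.com'), ('a.com'), ('b.com'), ('x.net');
""")

# Each branch fills one counter and leaves the other at 0; the outer
# GROUP BY then adds the two branches together per domain.
rows = conn.execute("""
SELECT domain,
       SUM(numviews) AS numviews,
       SUM(numerrors) AS numerrors,
       1.0 * SUM(numerrors) / NULLIF(SUM(numviews), 0) AS error_rate
FROM (
  SELECT substring_index(link, '.', -1) AS domain,
         COUNT(*) AS numviews, 0 AS numerrors
  FROM links GROUP BY domain
  UNION ALL
  SELECT substring_index(link, '.', -1) AS domain, 0, COUNT(*)
  FROM validation_errors GROUP BY domain
) AS d
GROUP BY domain
ORDER BY domain
""").fetchall()

for row in rows:
    print(row)
```

The 'net' domain has errors but no pages, so NULLIF turns the zero divisor into NULL and the rate comes back as None instead of raising a division error.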

mysql query sub selecting group by

I have a table described below
mysql> describe payments;
+----------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(128) | NO | | NULL | |
| email | varchar(128) | NO | | NULL | |
| txn_id | varchar(19) | NO | | NULL | |
| payment_status | varchar(20) | NO | | NULL | |
| auth | varchar(40) | NO | | NULL | |
| expired_at | datetime | YES | | NULL | |
+----------------+--------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
It is possible that I could have 2 entries such as:
name: chris|expires at: 2012-01-01|email: me#chrismuench.com
name: chris|expires at: 2014-01-01|email: me#chrismuench.com
I want to do a query that finds all expired users based on downloads. But there could be multiple entries for the same email address. In the above case this person should NOT show up in the expired list.
It seems like I want to do a GROUP BY email but somehow filter out expiration if it is > NOW()
It would seem that for each unique user you want the maximum date and to check whether that is before NOW() right? So:
SELECT email, MAX(expired_at) as latest_expired FROM payments GROUP BY email HAVING MAX(expired_at) < NOW();
Here is what you would have to do. I haven't used datetime functions in a bit so you'll have to double check it for correctness. I just assumed NOW() was a function.
Find the set of unexpired users, then subtract it from the entire set of users.
select * from payments where email not in (select email from payments where expired_at > NOW());
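Both approaches can be compared side by side in a quick sketch with Python's sqlite3 module. A column alias from the SELECT list is not visible in WHERE, so the filter on the per-email maximum has to go into HAVING. The sample rows are made up, and a fixed timestamp parameter stands in for NOW() so the result is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (
  id INTEGER PRIMARY KEY, name TEXT, email TEXT, expired_at TEXT
);
INSERT INTO payments (name, email, expired_at) VALUES
  ('chris', 'chris@example.com', '2012-01-01 00:00:00'),
  ('chris', 'chris@example.com', '2014-01-01 00:00:00'),
  ('dana',  'dana@example.com',  '2011-06-01 00:00:00');
""")

# Assumed 'current time'; in MySQL this would be NOW().
now = "2013-01-01 00:00:00"

# Approach 1: latest expiry per email, filtered after grouping.
by_having = conn.execute(
    "SELECT email, MAX(expired_at) AS latest_expired FROM payments "
    "GROUP BY email HAVING MAX(expired_at) < ?",
    (now,),
).fetchall()

# Approach 2: everyone, minus those with any unexpired entry.
by_not_in = conn.execute(
    "SELECT DISTINCT email FROM payments WHERE email NOT IN "
    "(SELECT email FROM payments WHERE expired_at > ?)",
    (now,),
).fetchall()

print(by_having)
print(by_not_in)
```

One difference to watch for, since expired_at is nullable: an email whose only rows have a NULL expired_at is skipped by the HAVING version but returned by the NOT IN version.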

How to code this SELECT statement?

Given this table :
mysql> describe activity;
+---------------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------------+-------------+------+-----+---------+-------+
| user_id | varchar(16) | NO | | NULL | |
| login_time | int(11) | NO | | NULL | |
| last_activity_time | int(11) | NO | | NULL | |
| last_activity_description | text | YES | | NULL | |
| logout_time | int(11) | NO | | NULL | |
+---------------------------+-------------+------+-----+---------+-------+
5 rows in set (0.01 sec)
I want to select the most recent last_activity_time (standard Unix timestamp) for each user who is logged in (i.e. has one or more rows where logout_time is not zero).
I tried
SELECT user_id, login_time, MAX(last_activity_time)
FROM activity
WHERE logout_time="0";
...but that found only a single entry with two users logged in, probably because I am selecting for MAX(last_activity_time)
What I want is something like
SELECT all unique user_ids
SELECT each of those which has one or more entries where `logout_time` != 0
SELECT the maximum value of `logout_time` for each of those
all in one single SELECT statement. How can I do that?
SELECT user_id, MAX(logout_time)
FROM activity
WHERE logout_time <> "0"
GROUP BY user_id;
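The key point is that MAX() without GROUP BY collapses everything into a single row, while GROUP BY user_id yields one row per user. Here is a sketch with Python's sqlite3 module and made-up rows. It runs two variants, because the question can be read either way: the first assumes that logout_time = 0 means "still logged in" (as in the asker's own attempt), the second is the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (
  user_id TEXT, login_time INTEGER,
  last_activity_time INTEGER, logout_time INTEGER
);
INSERT INTO activity VALUES
  ('alice', 100, 150,  0),
  ('alice',  10,  40, 50),
  ('bob',   200, 260,  0),
  ('bob',   205, 290,  0),
  ('carol',  20,  30, 35);
""")

# Variant 1 (assumption: logout_time = 0 means still logged in):
# most recent activity per logged-in user.
logged_in = conn.execute("""
SELECT user_id, MAX(last_activity_time) AS last_seen
FROM activity
WHERE logout_time = 0
GROUP BY user_id
ORDER BY user_id
""").fetchall()

# Variant 2: latest logout per user with at least one non-zero logout_time.
logged_out = conn.execute("""
SELECT user_id, MAX(logout_time) AS last_logout
FROM activity
WHERE logout_time <> 0
GROUP BY user_id
ORDER BY user_id
""").fetchall()

print(logged_in)
print(logged_out)
```

Since logout_time is an int(11) column, comparing against the number 0 rather than the string "0" avoids an implicit type conversion.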

Mysql update all rows based on select from another table

I have two tables:
mysql> describe ipinfo.ip_group_country;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| ip_start | bigint(20) | NO | PRI | NULL | |
| ip_cidr | varchar(20) | NO | | NULL | |
| country_code | varchar(2) | NO | MUL | NULL | |
| country_name | varchar(64) | NO | | NULL | |
+--------------+-------------+------+-----+---------+-------+
mysql> describe logs.logs;
+----------------------+------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+------------+------+-----+---------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ts | timestamp | NO | | CURRENT_TIMESTAMP | |
| REMOTE_ADDR | tinytext | NO | | NULL | |
| COUNTRY_CODE | char(2) | NO | | NULL | |
+----------------------+------------+------+-----+---------------------+----------------+
I can select country code using ip address from first table:
mysql> SELECT country_code FROM ipinfo.`ip_group_country` where `ip_start` <= INET_ATON('74.125.45.100') order by ip_start desc limit 1;
+--------------+
| country_code |
+--------------+
| US |
+--------------+
In logs.logs, I have all the REMOTE_ADDR (ip address) set, but all COUNTRY_CODE entries are empty. Now, I want to populate COUNTRY_CODE appropriately using the ipinfo table. How can I do this?
thanks!
Try
UPDATE logs.logs AS l
SET l.COUNTRY_CODE = (
SELECT c.country_code
FROM ipinfo.ip_group_country AS c
WHERE c.ip_start <= INET_ATON(l.REMOTE_ADDR)
ORDER BY c.ip_start DESC
LIMIT 1
)
WHERE l.COUNTRY_CODE = ''
If it fails saying the column types must match, you'll have to alter your logs.logs table so that the REMOTE_ADDR column is the same type (varchar(20)) as the ip_cidr column.
In a single-table update you use update t1 set c1=x where y.
In a multi-table update you use update t1, t2 set t1.c1=t2.c2 where t1.c3=t2.c4
Here's the relevant documentation http://dev.mysql.com/doc/refman/5.0/en/update.html
What you're looking for is something along the lines of (edited):
update logs.logs as l, ipinfo.ip_group_country as c set l.COUNTRY_CODE=c.country_code where c.ip_start <= INET_ATON(l.REMOTE_ADDR) order by c.ip_start asc
Edit: you're right, the max() in the original answer I provided could not work. The query above should, although it will likely be less efficient than something like the approach in the answer provided below.
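A correlated-subquery update of this kind can be tried out with Python's sqlite3 module. This is only a sketch: SQLite has no INET_ATON(), so a Python stand-in is registered under that name, the two tables are cut down and live in one database, and the range rows (including the placeholder code 'ZZ') are made up. Like the lookup query in the question, it checks only the start of each range, ordering by ip_start descending so that LIMIT 1 picks the closest one.

```python
import sqlite3
import ipaddress

def inet_aton(addr):
    """Stand-in for MySQL's INET_ATON: dotted IPv4 string -> integer."""
    return int(ipaddress.IPv4Address(addr))

conn = sqlite3.connect(":memory:")
conn.create_function("INET_ATON", 1, inet_aton)
conn.executescript("""
CREATE TABLE ip_group_country (
  ip_start INTEGER PRIMARY KEY, country_code TEXT
);
CREATE TABLE logs (
  id INTEGER PRIMARY KEY, REMOTE_ADDR TEXT,
  COUNTRY_CODE TEXT NOT NULL DEFAULT ''
);
-- Range starts: 1.0.0.0, 74.125.0.0 and 192.168.0.0 (hypothetical data).
INSERT INTO ip_group_country VALUES
  (16777216, 'AU'), (1249705984, 'US'), (3232235520, 'ZZ');
INSERT INTO logs (REMOTE_ADDR)
VALUES ('74.125.45.100'), ('1.0.0.5'), ('192.168.1.1');
""")

# For every log row that still has an empty code, look up the range with
# the largest ip_start that is not above the row's address.
conn.execute("""
UPDATE logs
SET COUNTRY_CODE = (
  SELECT c.country_code
  FROM ip_group_country AS c
  WHERE c.ip_start <= INET_ATON(logs.REMOTE_ADDR)
  ORDER BY c.ip_start DESC
  LIMIT 1
)
WHERE COUNTRY_CODE = ''
""")

codes = [row[0] for row in conn.execute(
    "SELECT COUNTRY_CODE FROM logs ORDER BY id"
)]
print(codes)
```

The update filters on an empty string rather than NULL because COUNTRY_CODE is declared NOT NULL in the question's schema, so the unfilled rows are empty rather than NULL.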