I have a table similar to:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
I need to count the domains that share the same Number and whose File name ends with _1. I tried the following:
select count(domain) as 'sum domains', file
from table
group by Number
having
count(Number) >1 and File like '%\_1';
It gives me:
sum domains | file
------------------------------
2 | aaa.com
2 | bbb.com
I expected to see the following:
sum domains | file
------------------------------
1 | aaa.com
2 | bbb.com
Number 111 appears with one File ending in _1 and one ending in _2, so it should count as 1 only. How can I apply both of the conditions stated above correctly?
As documented under SELECT Syntax:
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization.
In other words, it is applied after the grouping operation has been performed (in contrast with WHERE, which is performed before any grouping operation). See WHERE vs HAVING.
Therefore, your current query first forms the resultset from the following:
SELECT COUNT(domain) AS `sum domains`, file
FROM `table`
GROUP BY Number
See it on sqlfiddle:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
| 1 | eee.com_1 |
As you can see, the values selected for the file column are merely one of the values from each group—as documented under MySQL Extensions to GROUP BY:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Your current query then proceeds to filter these results according to your HAVING clause:
HAVING COUNT(Number) > 1 AND file LIKE '%\_1'
With the values of file selected above, every single group matches on the second criterion; and the first two groups match on the first criterion. Therefore the results of the complete query are:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
Following your comments above, you want to filter the records on file before grouping and then filter the resulting groups for those containing more than one match. Therefore use WHERE and HAVING respectively (and select Number instead of file to identify each group):
SELECT Number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%\_1'
GROUP BY Number
HAVING `sum domains` > 1
See it on sqlfiddle:
| NUMBER | SUM DOMAINS |
------------------------
| 222 | 2 |
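For readers who want to check the WHERE-then-HAVING behaviour without a MySQL server, here is a minimal sketch that runs the same query shape from Python against an in-memory SQLite database. It rests on a few assumptions: the table name domains and the alias sum_domains are hypothetical (chosen to avoid the reserved word table and the space in the alias), and the underscore is escaped with an explicit ESCAPE '!' clause because SQLite, unlike MySQL, has no default LIKE escape character.

```python
import sqlite3

# The five sample rows from the question: (domain, file, number).
SAMPLE_ROWS = [
    ("aaa.com", "aaa.com_1", 111),
    ("bbb.com", "bbb.com_1", 222),
    ("ccc.com", "ccc.com_2", 111),
    ("ddd.com", "ddd.com_1", 222),
    ("eee.com", "eee.com_1", 333),
]


def shared_numbers(rows):
    """Return (number, count) for numbers shared by more than one *_1 file."""
    conn = sqlite3.connect(":memory:")
    # "domains" is a hypothetical table name used only for this sketch.
    conn.execute("CREATE TABLE domains (domain TEXT, file TEXT, number INTEGER)")
    conn.executemany("INSERT INTO domains VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT number, COUNT(*) AS sum_domains
        FROM domains
        WHERE file LIKE '%!_1' ESCAPE '!'   -- row filter, applied before grouping
        GROUP BY number
        HAVING COUNT(*) > 1                 -- group filter, applied after grouping
        ORDER BY number
        """
    ).fetchall()
    conn.close()
    return result


print(shared_numbers(SAMPLE_ROWS))
```

On the sample rows this should leave a single group, number 222 with a count of 2, matching the result table above.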
I am using the following:
having ( SUM(qty) > 4 AND SUM(qty) < 15 )
You cannot have the file name in the SELECT list if it is not also in the GROUP BY. You have to get your GROUP BY result, then JOIN back to the original table and add the filter logic like so:
SELECT *
FROM
(
select count(domain) as sum_domains, Number
from `table`
group by Number
having
count(Number) > 1
) result
join `table` t on result.Number = t.Number
WHERE file like '%\_1'
Try the nested query below:
select count(domain) as 'sum domains', domain as fileName
from
(select domain, file from tableName
group by Number
having count(Number) >1) as temp
WHERE file like '%\_1';
I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:
| id | item | def | time_flagged | time_resolved | status |
+----+------+-----+--------------+---------------+---------+
| 1 | 1 | foo | 1519338608 | 1519620669 | MISSED |
| 2 | 1 | bar | 1519338608 | (NULL) | OPEN |
| 3 | 2 | bar | 1519338608 | 1519620669 | IGNORED |
| 4 | 1 | foo | 1519620700 | (NULL) | OPEN |
For each distinct def, for each unique price, I want to get the "latest" row (IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time | status |
+-----+------+------------+---------+
| foo | 1 | 1519620700 | OPEN |
| bar | 1 | 1519338608 | OPEN |
| bar | 2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions but I don't see any that involve my specific circumstance of needed "nested grouping" across two columns.
You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
select MAX(IFNULL(time_resolved, time_flagged))
from flags B
where A.item = B.item and A.def = B.def
)
I know it's not the best approach, but it might work for you.
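As a quick check of the correlated-subquery idea, the following sketch loads the four sample rows from the question into an in-memory SQLite database and runs the same query shape from Python. SQLite stands in for MySQL here as an assumption made for convenience; IFNULL and MAX behave the same way in both for this query. The function name latest_per_def_item is hypothetical.

```python
import sqlite3

# Sample rows from the question:
# (id, item, def, time_flagged, time_resolved, status)
FLAG_ROWS = [
    (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
    (2, 1, "bar", 1519338608, None, "OPEN"),
    (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
    (4, 1, "foo", 1519620700, None, "OPEN"),
]


def latest_per_def_item(rows):
    """Return the most recent flag for every (def, item) combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flags (id INTEGER, item INTEGER, def TEXT, "
        "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.def, a.item,
               IFNULL(a.time_resolved, a.time_flagged) AS time,
               a.status
        FROM flags a
        WHERE IFNULL(a.time_resolved, a.time_flagged) = (
            -- latest "time" among rows with the same def and item
            SELECT MAX(IFNULL(b.time_resolved, b.time_flagged))
            FROM flags b
            WHERE b.item = a.item AND b.def = a.def
        )
        ORDER BY a.def DESC, a.item
        """
    ).fetchall()
    conn.close()
    return result


for row in latest_per_def_item(FLAG_ROWS):
    print(row)
```

Row 1 drops out because row 4 has a later time for the same (foo, 1) combination, which is the behaviour the question asks for.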
Do you mean 'for each unique Def and each unique Item'? If so, a group by of multiple columns seems like it would work (shown as a temp table t) joined back to the original table to grab the rest of the data:
select
f.def,
f.item,
t.time,
f.status
from
flags f
join (select
def,
item,
max(ifnull(time_resolved, time_flagged)) as time
from flags
group by def, item) t
on
f.def=t.def and
f.item=t.item and
ifnull(f.time_resolved, f.time_flagged)=t.time
Depending on your version of MySQL (window functions require 8.0 or later), you can use a window function:
SELECT def, item, time, status
FROM (
SELECT
def,
item,
COALESCE(time_resolved, time_flagged) AS time,
status,
RANK() OVER(PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) MyRank -- Rank each (def, item) combination by "time"
FROM MyTable
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If a (def, item) combo can have several rows with the same "time" value, change RANK() to ROW_NUMBER(). This guarantees you only get one row per grouping.
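The difference between the two ranking functions on tied rows can be seen with a small sketch. It assumes an in-memory SQLite database (version 3.25 or later, for window-function support) in place of MySQL, and adds one hypothetical row, id 5, that ties with row 4 on both the (def, item) pair and the time. The function name latest_flags is also hypothetical.

```python
import sqlite3

# Question's sample rows plus a hypothetical row 5 that ties with row 4.
# (id, item, def, time_flagged, time_resolved, status)
TIED_ROWS = [
    (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
    (2, 1, "bar", 1519338608, None, "OPEN"),
    (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
    (4, 1, "foo", 1519620700, None, "OPEN"),
    (5, 1, "foo", 1519620700, None, "IGNORED"),
]


def latest_flags(rows, ranker="ROW_NUMBER"):
    """Keep the top-ranked row(s) per (def, item) using RANK or ROW_NUMBER."""
    if ranker not in ("RANK", "ROW_NUMBER"):
        raise ValueError("ranker must be 'RANK' or 'ROW_NUMBER'")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flags (id INTEGER, item INTEGER, def TEXT, "
        "time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)", rows)
    # ranker is restricted to two known names above, so formatting it in is safe.
    sql = f"""
        SELECT def, item, time, status
        FROM (
            SELECT def, item,
                   COALESCE(time_resolved, time_flagged) AS time,
                   status,
                   {ranker}() OVER (
                       PARTITION BY def, item
                       ORDER BY COALESCE(time_resolved, time_flagged) DESC
                   ) AS rn
            FROM flags
        ) src
        WHERE rn = 1
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(len(latest_flags(TIED_ROWS, "RANK")))        # tied rows both rank 1
print(len(latest_flags(TIED_ROWS, "ROW_NUMBER")))  # one row per (def, item)
```

With ROW_NUMBER, which of the tied rows survives is not determined by this ORDER BY alone; adding a tie-breaker column such as id to the ORDER BY would make the choice repeatable.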
select t.def, t.item, a.time, t.status
from flags t
join (select
def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
from flags
group by def, item) a
on t.def = a.def and
t.item = a.item and
COALESCE(t.time_resolved, t.time_flagged) = a.time
I have a table of customers:
id | name | email
--------------------------
1 | Rob | spam@email.com
2 | Jim | spam@email.com
3 | Dave | ham@email.com
4 | Fred | eggs@email.com
5 | Ben | ham@email.com
6 | Tom | ham@email.com
I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)
The desired result:
original_id | duplicate_id | email
-------------------------------------------
1 | 2 | spam@email.com
3 | 5 | ham@email.com
3 | 6 | ham@email.com
My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?
We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:
SELECT
email,
GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1
Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id values when concatenating, the original id will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.
select
orig.original_id,
t.id as duplicate_id,
orig.email
from t
inner join (select min(id) as original_id, email
from t
group by email
having count(*)>1) orig on orig.email = t.email
where t.id <> orig.original_id
The subquery finds the minimal id for every email that has duplicates; this is treated as the original.
We then join back on email and return every other id for that email as a duplicate.
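The self-join approach can be tried on the six sample customers with a short sketch. It assumes an in-memory SQLite database instead of MySQL and a table named customers (a hypothetical name, since the question does not give one).

```python
import sqlite3

# Sample customers from the question: (id, name, email).
CUSTOMERS = [
    (1, "Rob", "spam@email.com"),
    (2, "Jim", "spam@email.com"),
    (3, "Dave", "ham@email.com"),
    (4, "Fred", "eggs@email.com"),
    (5, "Ben", "ham@email.com"),
    (6, "Tom", "ham@email.com"),
]


def find_duplicates(rows):
    """Return (original_id, duplicate_id, email) for every duplicate email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT orig.original_id, t.id AS duplicate_id, orig.email
        FROM customers t
        JOIN (
            -- lowest id per email, only for emails seen more than once
            SELECT MIN(id) AS original_id, email
            FROM customers
            GROUP BY email
            HAVING COUNT(*) > 1
        ) orig ON orig.email = t.email
        WHERE t.id <> orig.original_id
        ORDER BY orig.original_id, t.id
        """
    ).fetchall()
    conn.close()
    return result


for row in find_duplicates(CUSTOMERS):
    print(row)
```

Fred's address appears only once, so the HAVING clause removes it from the derived table and it never reaches the join.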
UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer
I have a problem which I'm having trouble putting the pieces together for.
The table in question houses online highscores for a game, and the in-game highscore list is getting saturated with the same name over and over again as people have aimed higher. What I'd like to do is only keep one score per name such that more users can see their name in the top scores (or rather, multiply all their scores except their highest by -1, so I keep the data but the negative scores won't be loaded).
The flow would be:
Select all of the same name from names column
Update where highscore is not the largest value in the first selection by multiplying score by -1.
Repeat for all different names.
The key problems I haven't found solutions for are selecting the names one by one without the need to type each name in, and then updating all but their top score within the selection.
Any help will be greatly appreciated!
Thanks,
TFS
Assuming your table looks like:
------------------------------------
| player_name | date | score |
------------------------------------
| Max | 01-03-2015 | 100 |
| Daniel | 27-02-2015 | 150 |
| Max | 24-02-2015 | 200 |
| Daniel | 26-02-2015 | 100 |
------------------------------------
You can do a SELECT statement with an ANY predicate containing a GROUP BY to get the highest score for each group, which in this case is the player, along with the date of that score.
SELECT
player_name, date, score AS "max_score"
FROM
highscores t1
WHERE
(t1.player_name, t1.score) = ANY(SELECT t2.player_name, max(t2.score) FROM highscores t2 GROUP BY t2.player_name);
This should give you the following result:
----------------------------------------
| player_name | date | max_score |
----------------------------------------
| Daniel | 27-02-2015 | 150 |
| Max | 24-02-2015 | 200 |
----------------------------------------
Note that if you just wanted the player_name and MAX(score), you could have used the simple GROUP BY subquery contained inside the ANY expression on its own. However, since you also need the date, this won't work, as SQL has no way of knowing which date from the group to include. According to the SQL standard, the SELECT list may contain only columns mentioned in the GROUP BY (here player_name) or expressions based on aggregate functions. Hence you cannot include the date in a simple GROUP BY query and expect to get the result you want.
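The question also asks how to flip every score except each player's best to a negative value. One possible sketch of that update is shown below, run from Python against an in-memory SQLite database and using the assumed highscores table from above. Note the assumption: SQLite accepts a correlated subquery on the table being updated, whereas MySQL rejects that form (error 1093), so on MySQL the per-player maximum would have to come from a joined derived table instead. The function name negate_lower_scores is hypothetical.

```python
import sqlite3

# Assumed sample data from the answer: (player_name, date, score).
HIGHSCORES = [
    ("Max", "01-03-2015", 100),
    ("Daniel", "27-02-2015", 150),
    ("Max", "24-02-2015", 200),
    ("Daniel", "26-02-2015", 100),
]


def negate_lower_scores(rows):
    """Multiply every score except each player's best by -1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE highscores (player_name TEXT, date TEXT, score INTEGER)")
    conn.executemany("INSERT INTO highscores VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        UPDATE highscores
        SET score = -score
        WHERE score > 0              -- skip rows that were already negated
          AND score < (
              SELECT MAX(h2.score)
              FROM highscores h2
              WHERE h2.player_name = highscores.player_name
          )
        """
    )
    result = conn.execute(
        "SELECT player_name, score FROM highscores ORDER BY player_name, score"
    ).fetchall()
    conn.close()
    return result


print(negate_lower_scores(HIGHSCORES))
```

Because the comparison is a strict less-than, a player with two equal top scores keeps both of them positive; whether that is wanted depends on how the in-game list treats ties.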
I have a table called visits which contains the following columns:
link_id, id, browser, country, referer
This records visits to a given link and stores the browser, country and referer of whoever visited that link.
Now I need to show statistics for each link.
I used the following query to get all the browsers:
SELECT browser, COUNT(browser) FROM visits GROUP BY browser
Which produced something like
browser | COUNT(browser)
------------------------------------
Internet Explorer | 5
Chrome | 3
Now this worked as expected for browsers only but I'm looking for a way to count all occurrences of referers, browsers and countries in one single query.
Is there a way to do this?
Counting the occurrences of values across several different columns can easily be done in just one query.
Keep in mind that SELECT COUNT(column) on its own returns a single column holding a single number. Once you GROUP BY a column, you get two columns per distinct value: Value, Count. To count across different fields, you'll need three: Field, Value, Count; and if you want to count different fields in different tables, you'll need four: Table, Field, Value, Count.
Observe how I am using UNION below for two different tables:
SELECT
"Table1" AS TableName,
"Field1" AS Field,
Field1 AS Value,
COUNT(Field1) AS COUNT
FROM Table1
GROUP BY Value
UNION
SELECT
"Table2" as TableName,
"Field2" as Field,
Field2 as Value,
COUNT(Field2) AS COUNT
FROM Table2
GROUP BY Value
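You'll notice I use aliases such as "Table1" AS TableName: the result of a UNION takes its column headers from the first SELECT, so that is where the names need to be set; repeating the aliases in the later SELECTs just keeps each part readable.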
So you can visualize what this returns, take a look:
+-------------------+----------------+----------+--------+
| TableName | Field | Value | COUNT |
+-------------------+----------------+----------+--------+
| ItemFee | PaymentType | | 228 |
| ItemFee | PaymentType | All | 1 |
| ItemFee | PaymentType | PaidOnly | 1 |
| Person | Presenter | | 692258 |
| Person | Presenter | N | 590 |
| Person | Presenter | Y | 8103 |
+-------------------+----------------+----------+--------+
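Applied to the single visits table from the question, the same Field, Value, Count layout might look like the sketch below. The three visit rows are made up for illustration, the alias cnt is hypothetical, and an in-memory SQLite database is assumed in place of MySQL. UNION ALL is used rather than UNION since the per-field result sets cannot contain identical rows that would need de-duplicating.

```python
import sqlite3

# Hypothetical visits, for illustration only:
# (link_id, id, browser, country, referer)
VISITS = [
    (1, 1, "Chrome", "US", "google.com"),
    (1, 2, "Chrome", "DE", "google.com"),
    (1, 3, "Firefox", "US", "bing.com"),
]


def count_all_fields(rows):
    """Count occurrences of browser, country and referer in one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (link_id INTEGER, id INTEGER, "
        "browser TEXT, country TEXT, referer TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT 'browser' AS field, browser AS value, COUNT(*) AS cnt
        FROM visits GROUP BY browser
        UNION ALL
        SELECT 'country', country, COUNT(*)
        FROM visits GROUP BY country
        UNION ALL
        SELECT 'referer', referer, COUNT(*)
        FROM visits GROUP BY referer
        ORDER BY 1, 2   -- sort by field name, then value
        """
    ).fetchall()
    conn.close()
    return result


for row in count_all_fields(VISITS):
    print(row)
```

To get the statistics for one link only, a WHERE link_id = ? condition would be added to each of the three SELECTs.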
I have written a statement that uses GROUP BY. Now I need the details of every record, but I need to get them in one statement.
For example, I wrote the following query to find the files that end with _1 and share the same number:
SELECT number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%_1'
GROUP BY number HAVING `sum domains` > 1
So, if I have the following table:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
qqq.com | qqq.com_1 | 333
The result of the query is (each number that is shared by more than one file ending with _1, and the count of those files):
number | sum domains
------------------------
222 | 2
333 | 2
What I need to do is to print out the file names. I need:
number | file
------------------------
222 | bbb.com_1
222 | ddd.com_1
333 | eee.com_1
333 | qqq.com_1
How can I do this, since the GROUP BY clause does not let me print the file(s)?
You can JOIN your query back against the main table as a subquery, to get the original rows and filenames:
SELECT
main.number,
main.file
FROM
`table` AS main
/* Joined against your query as a derived table */
INNER JOIN (
SELECT number, COUNT(*) AS `sum domains`
FROM `table`
WHERE RIGHT(file, 2) = '_1'
GROUP BY number
HAVING `sum domains` > 1
/* Matching `number` against the main table, and limiting to rows with _1 */
) as subq ON main.number = subq.number AND RIGHT(main.file, 2) = '_1'
http://sqlfiddle.com/#!2/cb05b/6
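If the fiddle is unavailable, the join-back can be reproduced locally with a sketch like the one below. It assumes an in-memory SQLite database rather than MySQL, a hypothetical table name domains, and substr(file, -2) as a stand-in for RIGHT(file, 2), which SQLite does not provide.

```python
import sqlite3

# The six sample rows from the question: (domain, file, number).
DOMAIN_ROWS = [
    ("aaa.com", "aaa.com_1", 111),
    ("bbb.com", "bbb.com_1", 222),
    ("ccc.com", "ccc.com_2", 111),
    ("ddd.com", "ddd.com_1", 222),
    ("eee.com", "eee.com_1", 333),
    ("qqq.com", "qqq.com_1", 333),
]


def files_sharing_number(rows):
    """List (number, file) for *_1 files whose number is shared by 2+ such files."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE domains (domain TEXT, file TEXT, number INTEGER)")
    conn.executemany("INSERT INTO domains VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT main.number, main.file
        FROM domains AS main
        JOIN (
            -- numbers that have more than one file ending in _1
            SELECT number
            FROM domains
            WHERE substr(file, -2) = '_1'
            GROUP BY number
            HAVING COUNT(*) > 1
        ) AS subq
          ON main.number = subq.number
         AND substr(main.file, -2) = '_1'
        ORDER BY main.number, main.file
        """
    ).fetchall()
    conn.close()
    return result


for row in files_sharing_number(DOMAIN_ROWS):
    print(row)
```

Number 111 is dropped because only one of its two files ends in _1, so its group fails the HAVING test inside the derived table.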
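Note that I have replaced your LIKE '%_1' with RIGHT(file, 2) = '_1'. Unescaped, the _ in a LIKE pattern is a wildcard that matches any single character, so the RIGHT() comparison is also the stricter test. It is hard to tell which will be faster without a benchmark, though.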
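The effect of the underscore wildcard can be seen by running the different predicates side by side. This sketch assumes SQLite instead of MySQL, so the escape character is declared explicitly with ESCAPE '!' and substr(file, -2) stands in for RIGHT(file, 2). The file name zzz.com-1 is a made-up value that ends in 1 but not in _1.

```python
import sqlite3

# Two names in the style of the question plus one hypothetical near-miss.
FILES = ["aaa.com_1", "ccc.com_2", "zzz.com-1"]

# Three ways of asking "does the file name end with _1?"
PREDICATES = {
    "unescaped_like": "file LIKE '%_1'",
    "escaped_like": "file LIKE '%!_1' ESCAPE '!'",
    "suffix_compare": "substr(file, -2) = '_1'",
}


def compare_predicates(files):
    """Return the files matched by each predicate, keyed by predicate name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE f (file TEXT)")
    conn.executemany("INSERT INTO f VALUES (?)", [(name,) for name in files])
    matches = {}
    for label, predicate in PREDICATES.items():
        # predicate comes from the fixed dictionary above, not from user input
        cursor = conn.execute(f"SELECT file FROM f WHERE {predicate} ORDER BY file")
        matches[label] = [row[0] for row in cursor]
    conn.close()
    return matches


for label, found in compare_predicates(FILES).items():
    print(label, found)
```

Only the unescaped pattern accepts zzz.com-1, because its _ matches the hyphen; the escaped LIKE and the suffix comparison agree with each other.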