How to print every row behind a GROUP BY result - MySQL

I have written a query that uses GROUP BY. Now I need the details of every record behind each group, and I need to get them in a single statement.
For example, I wrote the following query to find the numbers shared by more than one file ending in _1:
SELECT number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%_1'
GROUP BY number HAVING `sum domains` > 1
So, if I have the following table:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
qqq.com | qqq.com_1 | 333
The result of the query is each number that is shared by more than one file ending in _1, together with the count of those files:
number | sum domains
------------------------
222 | 2
333 | 2
What I need to do is to print out the file names. I need:
number | file
------------------------
222 | bbb.com_1
222 | ddd.com_1
333 | eee.com_1
333 | qqq.com_1
How can I do this, given that the GROUP BY clause does not let me print the individual files?

You can JOIN your query back against the main table as a subquery, to get the original rows and filenames:
SELECT
  main.number,
  main.file
FROM
  `table` AS main
  /* Joined against your query as a derived table */
  INNER JOIN (
    SELECT number, COUNT(*) AS `sum domains`
    FROM `table`
    WHERE RIGHT(file, 2) = '_1'
    GROUP BY number
    HAVING `sum domains` > 1
  /* Matching `number` against the main table, and limiting to rows with _1 */
  ) AS subq ON main.number = subq.number AND RIGHT(main.file, 2) = '_1'
http://sqlfiddle.com/#!2/cb05b/6
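Note that I have replaced your LIKE '%_1' with RIGHT(file, 2) = '_1'. In LIKE, an unescaped _ matches any single character, so '%_1' also matches any file whose name simply ends in 1, while RIGHT(file, 2) = '_1' compares the last two characters literally. It is hard to tell which will be faster without a benchmark, though.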
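To try the join-back idea without a MySQL server, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name `files` and the alias `sum_domains` are assumptions made for this example, and SQLite's substr(file, -2) stands in for MySQL's RIGHT(file, 2).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; `files` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (domain TEXT, file TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("aaa.com", "aaa.com_1", 111),
        ("bbb.com", "bbb.com_1", 222),
        ("ccc.com", "ccc.com_2", 111),
        ("ddd.com", "ddd.com_1", 222),
        ("eee.com", "eee.com_1", 333),
        ("qqq.com", "qqq.com_1", 333),
    ],
)

# The derived table finds numbers shared by more than one "_1" file;
# joining it back to the base table recovers the individual rows.
# substr(file, -2) is SQLite's counterpart of MySQL's RIGHT(file, 2).
query = """
SELECT main.number, main.file
FROM files AS main
INNER JOIN (
    SELECT number, COUNT(*) AS sum_domains
    FROM files
    WHERE substr(file, -2) = '_1'
    GROUP BY number
    HAVING COUNT(*) > 1
) AS subq
    ON main.number = subq.number
   AND substr(main.file, -2) = '_1'
ORDER BY main.number, main.file
"""
rows = conn.execute(query).fetchall()
for number, file in rows:
    print(number, file)
```

With the sample table from the question, this should list the four files under 222 and 333 and leave out 111, which has only one file ending in _1.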

Related

Select most recent row based on distinct combination of two columns

I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:
| id | item | def | time_flagged | time_resolved | status |
+----+------+-----+--------------+---------------+---------+
| 1 | 1 | foo | 1519338608 | 1519620669 | MISSED |
| 2 | 1 | bar | 1519338608 | (NULL) | OPEN |
| 3 | 2 | bar | 1519338608 | 1519620669 | IGNORED |
| 4 | 1 | foo | 1519620700 | (NULL) | OPEN |
For each distinct def, for each unique price, I want to get the "latest" row (IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time | status |
+-----+------+------------+---------+
| foo | 1 | 1519620700 | OPEN |
| bar | 1 | 1519338608 | OPEN |
| bar | 2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions but I don't see any that involve my specific circumstance of needed "nested grouping" across two columns.
You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
  select MAX(IFNULL(time_resolved, time_flagged))
  from flags B
  where A.item = B.item and A.def = B.def
)
I know it's not the best approach, but it might work for you.
Do you mean 'for each unique Def and each unique Item'? If so, a GROUP BY over multiple columns seems like it would work (shown as a derived table t), joined back to the original table to grab the rest of the data:
select
  f.def,
  f.item,
  t.time,
  f.status
from
  flags f
  join (select
          def,
          item,
          max(ifnull(time_resolved, time_flagged)) as time
        from flags
        group by def, item) t
  on
    f.def = t.def and
    f.item = t.item and
    ifnull(f.time_resolved, f.time_flagged) = t.time
If your version of MySQL supports window functions (8.0 and later), you can use one:
SELECT def, item, time, status
FROM (
  SELECT
    def,
    item,
    COALESCE(time_resolved, time_flagged) AS time,
    status,
    RANK() OVER (PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) AS MyRank -- Rank rows within each (def, item) combination by "time"
  FROM flags
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If a (def, item) combo can have several rows with the same "time" value, change RANK() to ROW_NUMBER(). This guarantees you get only one row per grouping.
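As a rough check of the window-function approach, the sketch below loads the four sample flags into an in-memory SQLite database from Python and keeps the row with ROW_NUMBER() = 1 per (def, item). It assumes the bundled SQLite is 3.25 or later, which is when window functions were added. The aliases `ranked` and `rn` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE flags (
           id INTEGER PRIMARY KEY,
           item INTEGER,
           def TEXT,
           time_flagged INTEGER,
           time_resolved INTEGER,
           status TEXT
       )"""
)
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
        (2, 1, "bar", 1519338608, None, "OPEN"),
        (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
        (4, 1, "foo", 1519620700, None, "OPEN"),
    ],
)

# Number the rows inside each (def, item) partition, newest first,
# then keep only the first row of every partition.
query = """
SELECT def, item, time, status
FROM (
    SELECT def,
           item,
           COALESCE(time_resolved, time_flagged) AS time,
           status,
           ROW_NUMBER() OVER (
               PARTITION BY def, item
               ORDER BY COALESCE(time_resolved, time_flagged) DESC
           ) AS rn
    FROM flags
) AS ranked
WHERE rn = 1
ORDER BY def, item
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Row 1 drops out because row 4 has the same def and item with a later time. The other three rows are each alone in their partition.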
select f.def, f.item, a.time, f.status
from flags f
join (select
        def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
      from flags
      group by def, item) a
on f.def = a.def and
   f.item = a.item and
   COALESCE(f.time_resolved, f.time_flagged) = a.time

Skipping row for each unique column value

I have a table from which I would like to extract all of the column values for all rows. However, the query needs to be able to skip the first entry for each unique value of id_customer. It can be assumed that there will always be at least two rows containing the same id_customer.
I've compiled some sample data which can be found here: http://sqlfiddle.com/#!9/c85b73/1
The results I would like to achieve are something like this:
id_customer | id_cart | date
----------- | ------- | -------------------
1 | 102 | 2017-11-12 12:41:16
2 | 104 | 2015-09-04 17:23:54
2 | 105 | 2014-06-05 02:43:42
3 | 107 | 2011-12-01 11:32:21
Please let me know if any more information or a better explanation is required. I expect it's quite a niche solution.
One method is:
select c.*
from carts c
where c.date > (select min(c2.date) from carts c2 where c2.id_customer = c.id_customer);
If your data is large, you want an index on carts(id_customer, date).
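Here is a small sketch of that correlated subquery, run against in-memory SQLite from Python. The linked fiddle data is not reproduced in the question, so the rows for carts 101, 103 and 106 (each customer's earliest cart) are hypothetical values chosen to be consistent with the expected output. The index name is also an assumption. Note that "first" is interpreted here as "earliest date".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE carts (id_customer INTEGER, id_cart INTEGER, date TEXT)")
# Composite index suggested in the answer; the name is made up for this sketch.
conn.execute("CREATE INDEX idx_carts_customer_date ON carts (id_customer, date)")
conn.executemany(
    "INSERT INTO carts VALUES (?, ?, ?)",
    [
        (1, 101, "2017-01-05 09:00:00"),  # hypothetical earliest cart
        (1, 102, "2017-11-12 12:41:16"),
        (2, 103, "2013-02-10 08:15:00"),  # hypothetical earliest cart
        (2, 104, "2015-09-04 17:23:54"),
        (2, 105, "2014-06-05 02:43:42"),
        (3, 106, "2010-03-20 10:00:00"),  # hypothetical earliest cart
        (3, 107, "2011-12-01 11:32:21"),
    ],
)

# Keep every cart that is newer than the customer's earliest cart.
# ISO-formatted timestamps compare correctly as plain strings.
query = """
SELECT c.id_customer, c.id_cart, c.date
FROM carts AS c
WHERE c.date > (
    SELECT MIN(c2.date)
    FROM carts AS c2
    WHERE c2.id_customer = c.id_customer
)
ORDER BY c.id_customer, c.id_cart
"""
remaining = conn.execute(query).fetchall()
for row in remaining:
    print(row)
```

If two carts of one customer share the earliest timestamp, both are skipped, since the comparison is strictly greater than the minimum.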

mysql returns wrong results with random duplicate values

I need to return the best 5 scores in each category from a table. So far I have tried the query below, following an example from this site: selecting top n records per group
query:
select
subject_name,substring_index(substring_index
(group_concat(exams_scores.admission_no order by exams_scores.score desc),',',value),',',-1) as names,
substring_index(substring_index(group_concat(score order by score desc),',',value),',',-1)
as orderedscore
from exams_scores,students,subjects,tinyint_asc
where tinyint_asc.value >=1 and tinyint_asc.value <=5 and exam_id=2
and exams_scores.admission_no=students.admission_no and students.form_id=1 and
exams_scores.subject_code=subjects.subject_code group by exams_scores.subject_code,value;
I get the top n as I need, but my problem is that it's returning duplicates at random, and I don't know where they are coming from.
As you can see, English and Mathematics have duplicates which should not be there:
+------------------+-------+--------------+
| subject_name | names | orderedscore |
+------------------+-------+--------------+
| English | 1500 | 100 |
| English | 1500 | 100 |
| English | 2491 | 100 |
| English | 1501 | 99 |
| English | 1111 | 99 |
|Mathematics | 1004 | 100 |
| Mathematics | 1004 | 100 |
| Mathematics | 2722 | 99 |
| Mathematics | 2734 | 99 |
| Mathematics | 2712 | 99 |
+-----------------------------------------+
I have checked table and no duplicates exist
to confirm there are no duplicates in the table:
select * from exams_scores
having(exam_id=2) and (subject_code=121) and (admission_no=1004);
result :
+------+--------------+---------+--------------+-------+
| id | admission_no | exam_id | subject_code | score |
+------+--------------+---------+--------------+-------+
| 4919 | 1004 | 2 | 121 | 100 |
+------+--------------+---------+--------------+-------+
1 row in set (0.00 sec)
same result for English.
If I run the query about 5 times, I sometimes end up with another field having duplicate values.
Can anyone tell me why my query is behaving this way? I tried adding distinct inside
group_concat(distinct(exams_scores.admission_no))
but that didn't work.
You're grouping by exams_scores.subject_code, value. If you add them to your selected columns (...as orderedscore, exams_scores.subject_code, value from...), you should see that all rows are distinct with respect to these two columns you grouped by. Which is the correct semantics of GROUP BY.
Edit, to clarify:
First, the SQL server removes some rows according to your WHERE clause.
Afterwards, it groups the remaining rows according to your GROUP BY clause.
Finally, it selects the columns you specified, either by directly returning a column's value or by performing a GROUP_CONCAT on some of the columns and returning their accumulated value.
If you select columns not included in the GROUP BY clause, the returned results for these columns are arbitrary. The SQL server reduces all rows that are equal with respect to the GROUP BY columns to one single row, and for the remaining columns the results are pretty much undefined (hence the "randomness" you're experiencing). What should the server choose as a value for such a column? It can only pick one arbitrarily from all the reduced rows.
In fact, some SQL servers won't perform such a query and return an SQL error, since the result for those columns would be undefined, which is something you don't want to have in general. With these servers (I believe MSSQL is one of them), the only non-aggregated columns you can have in your SELECT clause are more or less those that are part of your GROUP BY clause.
Edit 2: This, finally, means that you have to refine your GROUP BY clause to obtain the grouping that you want.

SELECT from Union x 3 using filter of another table

Background
I have a web application which must remove entries from other tables, located through a selection of 'tielists' (tielist 1, tielist 2, tielist 3, ...). My result set is going to be huge unless I filter it through another table using a user_id. Can someone please help me structure my statement as needed? TY!
Tables
name: cars_belonging_to_user
--------------------------------
id | user_id | make     | model
--------------------------------
1  | 1       | Toyota   | Camry
2  | 1       | Infinity | Q55
3  | 1       | DMC      | DeLorean
4  | 2       | Acura    | RSX
Okay, now the three 'tielists':
name: tielist_one
------------------------------
id | id_of_car | id_x | id_y
------------------------------
1  | 1         | 12   | 22
2  | 2         | 23   | 32
name: tielist_two
------------------------------
id | id_of_car | id_x | id_z
------------------------------
1  | 3         | 32   | 22
name: tielist_three
------------------------------
id | id_of_car | id_x | id_a
------------------------------
1  | 4         | 45   | 2
Result Set and Code
echo name_of_tielist_table
// I can structure if statements to echo result sets based upon the name
// Future Methodology: if car_id is in tielist_one, delete id_x from x_table, delete id_y from y_table...
// My output should be a double select base:
--SELECT * tielists from WHERE car_id is 1... output name of tielist... then
--SELECT * from specific_tielist where car_id is 1.....delete x_table, delete y_table...
Considering the list will be massive, and the tielist equally long, I must filter the results where car_id(id) = $variable && user_id = $id....
Side Notes
Only one car id will appear once in any single tielist..
This select statement MUST be filtered with user_id = $variable... (and remember, i'm looking for which car id too)
I MUST HAVE THE NAME of the tielist it comes from able to be echo'd into a variable...
I will only be looking for one single id_of_car at any given time, because this select will be contained in a foreach loop.
I was thinking a union all items would do the trick to select the row, but how can I get the name of the tielist the row is in, and how can the filter be used from the user_id row
If you want performance, I would suggest left outer join instead of union all. This will allow the query to make efficient use of indexes for your purpose.
Based on what you say, a car is in exactly one of the lists. This is important for this method to work. Here is the SQL:
select cu.*,
       coalesce(tl1.id_x, tl2.id_x, tl3.id_x) as id_x,
       tl1.id_y, tl2.id_z, tl3.id_a,
       (case when tl1.id is not null then 'One'
             when tl2.id is not null then 'Two'
             when tl3.id is not null then 'Three'
        end) as TieList
from Cars_Belonging_To_User cu left outer join
     TieList_One tl1
     on cu.id = tl1.id_of_car left outer join
     TieList_Two tl2
     on cu.id = tl2.id_of_car left outer join
     TieList_Three tl3
     on cu.id = tl3.id_of_car;
You can then add a where clause to filter as you need.
If you have an index on id_of_car for each tielist table, then the performance should be quite good. If the where clause uses an index on the first table, then the joins and where should all be using indexes, and the query will be quite fast.
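The sketch below shows one way the left-join query could be combined with the user_id and car id filter from the question, using an in-memory SQLite database from Python. The helper name `find_tielist` and the choice to return the tielist table name as a string are assumptions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cars_belonging_to_user (id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT, model TEXT);
    CREATE TABLE tielist_one   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_y INTEGER);
    CREATE TABLE tielist_two   (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_z INTEGER);
    CREATE TABLE tielist_three (id INTEGER PRIMARY KEY, id_of_car INTEGER, id_x INTEGER, id_a INTEGER);

    INSERT INTO cars_belonging_to_user VALUES
        (1, 1, 'Toyota', 'Camry'), (2, 1, 'Infinity', 'Q55'),
        (3, 1, 'DMC', 'DeLorean'), (4, 2, 'Acura', 'RSX');
    INSERT INTO tielist_one   VALUES (1, 1, 12, 22), (2, 2, 23, 32);
    INSERT INTO tielist_two   VALUES (1, 3, 32, 22);
    INSERT INTO tielist_three VALUES (1, 4, 45, 2);
    """
)

QUERY = """
SELECT cu.id, cu.make, cu.model,
       COALESCE(tl1.id_x, tl2.id_x, tl3.id_x) AS id_x,
       tl1.id_y, tl2.id_z, tl3.id_a,
       CASE WHEN tl1.id IS NOT NULL THEN 'tielist_one'
            WHEN tl2.id IS NOT NULL THEN 'tielist_two'
            WHEN tl3.id IS NOT NULL THEN 'tielist_three'
       END AS tielist
FROM cars_belonging_to_user AS cu
LEFT JOIN tielist_one   AS tl1 ON cu.id = tl1.id_of_car
LEFT JOIN tielist_two   AS tl2 ON cu.id = tl2.id_of_car
LEFT JOIN tielist_three AS tl3 ON cu.id = tl3.id_of_car
WHERE cu.user_id = ? AND cu.id = ?
"""


def find_tielist(connection, user_id, car_id):
    """Return the joined row for one car of one user, or None if there is no match."""
    return connection.execute(QUERY, (user_id, car_id)).fetchone()


print(find_tielist(conn, 1, 3))
print(find_tielist(conn, 1, 4))  # car 4 belongs to user 2, so nothing is returned
```

The last column of the returned row names the tielist table, which the calling code can branch on before deleting from the related tables. The bound parameters take the place of the $variable and $id placeholders from the question.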

How can I make two condition in having clause

I have a table similar to:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
I need to count the Domains that share the same Number and whose File name ends with _1. I tried the following:
select count(domain) as 'sum domains', file
from table
group by Number
having
count(Number) >1 and File like '%\_1';
It gives me:
sum domains | file
------------------------------
2 | aaa.com
2 | bbb.com
I expected to see the following:
sum domains | file
------------------------------
1 | aaa.com
2 | bbb.com
Number 111 has one File ending in _1 and one ending in _2, so it should count as 1 only. How can I apply the two conditions that I stated earlier correctly?
As documented under SELECT Syntax:
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization.
In other words, it is applied after the grouping operation has been performed (in contrast with WHERE, which is performed before any grouping operation). See WHERE vs HAVING.
Therefore, your current query first forms the resultset from the following:
SELECT COUNT(domain) AS `sum domains`, file
FROM `table`
GROUP BY Number
See it on sqlfiddle:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
| 1 | eee.com_1 |
As you can see, the values selected for the file column are merely one of the values from each group—as documented under MySQL Extensions to GROUP BY:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Your current query then proceeds to filter these results according to your HAVING clause:
HAVING COUNT(Number) > 1 AND file LIKE '%\_1'
With the values of file selected above, every single group matches on the second criterion; and the first two groups match on the first criterion. Therefore the results of the complete query are:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
Following your comments above, you want to filter the records on file before grouping and then filter the resulting groups for those containing more than one match. Therefore use WHERE and HAVING respectively (and select Number instead of file to identify each group):
SELECT Number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%\_1'
GROUP BY Number
HAVING `sum domains` > 1
See it on sqlfiddle:
| NUMBER | SUM DOMAINS |
------------------------
| 222 | 2 |
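To see the difference between filtering before and after grouping, the following sketch runs both variants on the question's five rows in an in-memory SQLite database from Python. The table name `files` is an assumption. Unlike MySQL, SQLite has no default LIKE escape character, so the ESCAPE clause is spelled out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (domain TEXT, file TEXT, number INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("aaa.com", "aaa.com_1", 111),
        ("bbb.com", "bbb.com_1", 222),
        ("ccc.com", "ccc.com_2", 111),
        ("ddd.com", "ddd.com_1", 222),
        ("eee.com", "eee.com_1", 333),
    ],
)

# HAVING alone: groups are counted over all files, whatever their suffix.
having_only_rows = conn.execute(
    """
    SELECT number, COUNT(*) AS sum_domains
    FROM files
    GROUP BY number
    HAVING COUNT(*) > 1
    ORDER BY number
    """
).fetchall()

# WHERE first, then HAVING: only files ending in a literal "_1" are counted.
where_then_having_rows = conn.execute(
    r"""
    SELECT number, COUNT(*) AS sum_domains
    FROM files
    WHERE file LIKE '%\_1' ESCAPE '\'
    GROUP BY number
    HAVING COUNT(*) > 1
    ORDER BY number
    """
).fetchall()

print(having_only_rows)
print(where_then_having_rows)
```

Without the WHERE clause, 111 passes the count test because ccc.com_2 is counted too. With the WHERE clause, only 222 remains.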
I am using the following:
having ( SUM(qty) > 4 AND SUM(qty) < 15 )
You cannot have the file name in the SELECT list if it is not also in the GROUP BY. You have to get your GROUP BY result, then JOIN back to the original table and add the filter logic, like so:
SELECT *
FROM
(
  select count(domain) as sum_domains, Number
  from `table`
  group by Number
  having count(Number) > 1
) result
join `table` t on result.Number = t.Number
WHERE file like '%\_1'
Try the nested query below:
select count(domain) as 'sum domains', domain as fileName
from
(select domain, file from tableName
group by Number
having count(Number) >1) as temp
WHERE file like '%\_1';