How to find items with multiple different names - mysql

I'm having trouble with a very simple sql query. I want to identify all items that have more than one name. Here's what I'm currently doing:
select group_concat(distinct name) names
from table
group by master_id
having names like '%,%'
Unfortunately, a lot of names contain a comma, so the above doesn't work well. What would be the correct way to write this query?

Here is a correct version of your query:
SELECT
master_id,
GROUP_CONCAT(DISTINCT name) names
FROM yourTable
GROUP BY master_id
HAVING COUNT(DISTINCT name) > 1;
COUNT(DISTINCT name) in the HAVING clause counts exactly what GROUP_CONCAT(DISTINCT name) concatenates: each distinct name in the group is one logical item, regardless of whether the name itself contains a comma.

The correct solution would be:
… HAVING COUNT(name) > 1
In a query using GROUP BY, aggregate functions like COUNT(), MIN(), and MAX() (as well as GROUP_CONCAT(), as well as a few others) can be used to operate on all values of a column in the grouped rows.
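You could also include COUNT(name) in the select list to return the number of names for each master_id. Note that COUNT(name) counts every non-NULL row, so if the same name can be stored more than once for a master_id, use COUNT(DISTINCT name) instead.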
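To compare the three HAVING conditions discussed above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name items and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (master_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "Smith, John"),  # a single name that contains a comma
        (2, "Widget"),
        (2, "Widget"),       # the same name stored twice
        (3, "Gadget"),
        (3, "Gizmo"),        # two genuinely different names
    ],
)

def master_ids(having):
    """Return the master_ids whose group satisfies the given HAVING condition."""
    sql = (
        "SELECT master_id FROM items "
        f"GROUP BY master_id HAVING {having} ORDER BY master_id"
    )
    return [row[0] for row in conn.execute(sql)]

by_comma = master_ids("GROUP_CONCAT(DISTINCT name) LIKE '%,%'")
by_count = master_ids("COUNT(name) > 1")
by_count_distinct = master_ids("COUNT(DISTINCT name) > 1")

print(by_comma)           # wrongly includes master_id 1
print(by_count)           # wrongly includes master_id 2
print(by_count_distinct)  # only master_id 3 has more than one name
```

Only the COUNT(DISTINCT name) condition isolates the item that really has two different names.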

MySQL employee database

I am having trouble designing a query that does the following:
List employee names, employee numbers, and their respective total earningPerProject using the following database schema:
department(primary key(deptName), deptName, deptCity)
employee(primary key(empNum), empName, empCity)
project(primary key(projectNum), projectName, budget)
worksOn(foreign key(empNum), foreign key(projectNum), deptNum, jobTitle, startDate, earningPerProject)
I am able to display the employee names and employee numbers but when it comes to the total of the earningPerProject for each employee I am lost.
Some employees are listed more than once, and I realize I have to use the aggregate functions SUM() and COUNT(), but I haven't figured out a way to do this successfully.
Here is what I have so far:
SELECT DISTINCT(empName), employee.empNum, earningPerProject FROM employee, worksOn
WHERE worksOn.empNum = employee.empNum;
Could someone assist me with some hints or example queries. I am not sure how I would go about doing this.
Here you must use the GROUP BY clause and SUM() to compute the total earningPerProject for each employee.
DISTINCT is not necessary. In your code you used DISTINCT(empName) which looks like you want to eliminate duplicate employee names in the result. It is possible to have two employees with the same name so retrieving only unique names could leave some employees out of your results. This is why we use things like empNum as a primary key instead of names. You actually want to retrieve the distinct combos of empNum and empName.
You are correct that there can be duplicate empNum in the worksOn table because a given employee could work on multiple projects. The GROUP BY will group together all rows having the same empNum and empName and combine them into a single row thus eliminating the need for DISTINCT. (More below)
Here I have modified your query to include the SUM() and GROUP BY.
SELECT employee.empNum, employee.empName, SUM(worksOn.earningPerProject)
FROM employee, worksOn
WHERE employee.empNum = worksOn.empNum
GROUP BY employee.empNum, employee.empName;
JOIN
The syntax used in your FROM clause (FROM employee, worksOn), where the tables to be joined are listed on the same line and separated by commas, is known as an implicit join. SQL-92 introduced the explicit JOIN syntax as an alternative (see Join (SQL)); the implicit form is still valid SQL but is generally discouraged.
Best practice dictates that you switch to using the new syntax known as the explicit join by using the JOIN keyword with the added ON keyword to describe the link between the tables.
The new JOIN syntax is functionally equivalent to the old implicit join syntax. Both produce the same results.
SELECT employee.empNum, employee.empName, SUM(worksOn.earningPerProject)
FROM employee
JOIN worksOn ON employee.empNum = worksOn.empNum
GROUP BY employee.empNum, employee.empName;
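To check that the implicit and explicit join forms return the same totals, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The sample rows are invented, and only the columns needed for the query are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empNum INTEGER PRIMARY KEY, empName TEXT, empCity TEXT);
    CREATE TABLE worksOn (empNum INTEGER, projectNum INTEGER, earningPerProject REAL);
    -- two different employees share the name 'Ann'
    INSERT INTO employee VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome'), (3, 'Ann', 'Lima');
    INSERT INTO worksOn VALUES (1, 10, 100.0), (1, 11, 250.0), (2, 10, 75.0), (3, 12, 40.0);
""")

implicit = conn.execute("""
    SELECT employee.empNum, employee.empName, SUM(worksOn.earningPerProject)
    FROM employee, worksOn
    WHERE employee.empNum = worksOn.empNum
    GROUP BY employee.empNum, employee.empName
    ORDER BY employee.empNum
""").fetchall()

explicit = conn.execute("""
    SELECT employee.empNum, employee.empName, SUM(worksOn.earningPerProject)
    FROM employee
    JOIN worksOn ON employee.empNum = worksOn.empNum
    GROUP BY employee.empNum, employee.empName
    ORDER BY employee.empNum
""").fetchall()

print(implicit)
print(explicit)
```

Both queries return one row per employee, and the two employees named Ann stay separate because the grouping includes empNum.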
DISTINCT
DISTINCT is a SQL keyword that eliminates duplicate result rows based on the expressions in your SELECT list. If you request only one expression (SELECT empCity FROM employee) it returns the unique values for that expression (it only shows each city once). If you request more than one expression it returns the unique combinations of those expressions.
Many database engines use GROUP BY to calculate DISTINCT results so using them together is usually redundant.
Your query includes some unfortunately legal SQL syntax. You put parentheses around empName which gave SELECT DISTINCT (empName), employee.empNum, .... This syntax is misleading because DISTINCT is a keyword and not a function and the parentheses here are not used by DISTINCT. When DISTINCT is used it applies to all expressions in the SELECT. In this case removing the parentheses does not change the meaning though it does make it more clear.
These three queries are equivalent:
SELECT DISTINCT empName, employee.empNum, ...
SELECT DISTINCT (empName), employee.empNum, ...
SELECT DISTINCT empName, (employee.empNum), ...
Parentheses in SQL can be used to group expressions and are typically used to force the order of evaluation when dealing with operators such as <, >, =, *, /. Placing parentheses around a single expression does not change its value. While you thought you were using DISTINCT for just empName you really were just wrapping the expression empName in parentheses which effectively did nothing.
You can test this by running this query
SELECT empName FROM employee
and this query
SELECT (empName) FROM employee
and you will see the same results.
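The same check can be run for DISTINCT itself. This is a minimal sketch with Python's built-in sqlite3 module standing in for MySQL; the three employee rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empNum INTEGER PRIMARY KEY, empName TEXT);
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Ann'), (3, 'Bob');
""")

# DISTINCT applies to the whole select list, with or without the parentheses.
plain = conn.execute(
    "SELECT DISTINCT empName, empNum FROM employee ORDER BY empNum"
).fetchall()
parens = conn.execute(
    "SELECT DISTINCT (empName), empNum FROM employee ORDER BY empNum"
).fetchall()

# With a single expression, DISTINCT returns each name once.
names_only = conn.execute(
    "SELECT DISTINCT empName FROM employee ORDER BY empName"
).fetchall()

print(plain)
print(parens)
print(names_only)
```

The first two results are identical and still contain both employees named Ann, which shows that the parentheses did not restrict DISTINCT to empName.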

Why is the following SQL query erroneous?

/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name;
I know that every non-aggregated function must appear in group by if it appears in select. However this query still runs in mySQL.
should it be:
/* erroneous query */
select dept name, ID, avg (salary)
from instructor
group by dept name, ID;
Because I ran the both queries and they give the exact same answers!
MySQL will allow you to leave non-aggregated columns out of your GROUP BY, which is just a terrible idea to me. This can produce very unpredictable results. See the MySQL documentation on its handling of GROUP BY.
it should be:
select `dept name`, ID, AVG(salary)
from instructor
group by `dept name`
Now it would be more instructive to show the columns defined in your table, but you CANNOT have spaces in a column name unless the name is quoted. In MySQL that means backticks, as above; square brackets are the SQL Server equivalent.
From the MySQL documentation on this particular point:
In standard SQL, a query that includes a GROUP BY clause cannot refer
to non-aggregated columns in the select list that are not named in the
GROUP BY clause. For example, this query is illegal in standard SQL
because the name column in the select list does not appear in the
GROUP BY ...
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group.
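So, roughly speaking, when every omitted column has a single value per group, the result is the same as if those columns had been added to the GROUP BY.
However, note that the two forms are not equivalent in general. Have a look at this example.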
SELECT name, address, MAX(age) FROM customers GROUP BY name, address;
might give you something different from:
SELECT name, address, MAX(age) FROM customers GROUP BY name;
Check this Fiddle.
Your statement that "I know that every non-aggregated function must appear in group by if it appears in select" is, as far as I know, correct. I am not a SQL guru, but that's my understanding too. I would have expected a syntax error if your statement does not meet that condition. However, if both queries give the same result, one possibility is that the ID field (or whichever field is missing from the GROUP BY list) has the same value in all rows of each group. Try different values and see. It may also help to write "as" explicitly for aliases rather than relying on blanks.
MySQL extends the use of GROUP BY so you can select nonaggregated columns, not named in the group by clause:
SELECT dept_name, ID, avg(salary)
FROM instructor
GROUP BY dept_name;
The previous query is perfectly valid in MySQL when this extension is enabled, while other DBMSs will raise an error because ID is not present in the GROUP BY clause.
However, if there is more than one ID for a dept_name, the value of ID returned by MySQL is indeterminate.
You can configure MySQL to disable this extension by enabling the ONLY_FULL_GROUP_BY SQL mode, which is on by default since MySQL 5.7.5.
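To see why the two GROUP BY lists are not interchangeable, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here: like MySQL with the extension enabled, it tolerates a bare ID column, but the value it picks is arbitrary. The instructor rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER, dept_name TEXT, salary REAL);
    INSERT INTO instructor VALUES
        (1, 'Physics', 90000), (2, 'Physics', 70000), (3, 'Music', 40000);
""")

# One row per department; ID is taken from an arbitrary row of each group.
loose = conn.execute("""
    SELECT dept_name, ID, AVG(salary) FROM instructor
    GROUP BY dept_name ORDER BY dept_name
""").fetchall()

# One row per (department, ID) pair, so the averages are per instructor.
strict = conn.execute("""
    SELECT dept_name, ID, AVG(salary) FROM instructor
    GROUP BY dept_name, ID ORDER BY dept_name, ID
""").fetchall()

print(loose)
print(strict)
```

The two queries agree only when every department has a single ID, which would explain why the asker saw identical results.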

Scope of COUNT(DISTINCT ..) when used with GROUP BY

I'm doing something like follows (Example, getting distinct people named "Mark" by State):
Select count(distinct FirstName) FROM table
GROUP BY State
I think the group by query organization is done first, such that the distinct is only relative to each "group by"? Basically, can "Mark" show up as a "distinct" count in each group? This would "scope" my distinct expression to the group by rows only, I believe...
This may actually depend on where DISTINCT is used. For example, SELECT DISTINCT COUNT( would be different than SELECT COUNT(DISTINCT.
In this case, it will work as you want and get a count of distinct names in each group (even if the names are not distinct across groups).
Your understanding is correct. Group by says, essentially, to take a group of rows and aggregate them into one row (based on the criteria). All aggregation functions -- including count(distinct) -- summarize values in this group.
As a note, you are using the word "scope". Just so you know, this has a particular meaning in SQL: it refers to the portions of the query where a column or table alias is understood by the compiler.
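A quick way to convince yourself is to run it on a tiny table. This sketch uses Python's built-in sqlite3 module rather than MySQL, and the table name people and its rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (FirstName TEXT, State TEXT);
    INSERT INTO people VALUES
        ('Mark', 'TX'), ('Mark', 'TX'), ('Mary', 'TX'), ('Mark', 'CA');
""")

# DISTINCT is evaluated inside each group, so 'Mark' counts once per state.
per_state = conn.execute("""
    SELECT State, COUNT(DISTINCT FirstName) FROM people
    GROUP BY State ORDER BY State
""").fetchall()

# Without GROUP BY the whole table is a single group.
overall = conn.execute(
    "SELECT COUNT(DISTINCT FirstName) FROM people"
).fetchone()[0]

print(per_state)
print(overall)
```

Mark contributes to the count in both CA and TX, while the ungrouped query counts him only once.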

mysql ORDER BY MIN() not matching up with id

I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via GROUP BY or HAVING operators, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id — which one should be included?
MySQL returns an arbitrary value from the group for such fields, which I find an error-prone approach. Most other RDBMSs raise an error for such a query.
The rule is: if you use aggregates, then all columns should be either arguments of aggregate functions or arguments of GROUP BY operator.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
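Here is a minimal sketch of the subquery approach, run with Python's built-in sqlite3 module instead of MySQL. The table is called hits here (a made-up name that avoids the reserved word), the rows are invented, and a bound parameter stands in for '$domain'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (id INTEGER PRIMARY KEY, domain TEXT, hit_count INTEGER);
    INSERT INTO hits VALUES
        (1, 'www.bestbuy.com', 50),
        (2, 'www.bestbuy.com', 7),
        (3, 'www.example.com', 3);
""")

domain = "www.bestbuy.com"

# The subquery finds the smallest hit_count for the domain; the outer
# query then returns the row (or rows, on a tie) holding that value.
rows = conn.execute(
    """
    SELECT id, domain, hit_count
    FROM hits
    WHERE domain = ?
      AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = ?)
    """,
    (domain, domain),
).fetchall()

print(rows)
```

The id returned belongs to the row that actually holds the minimum, which is what the grouped query could not guarantee.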
SELECT
id,
hit_count
FROM
`table`
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM `table` WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had the same question. Please see the question below.
min(column) is not returning me correct data of other columns
You are using GROUP BY, which means each row in the result represents a group of values.
One of those values is the group key (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example the following table:
F1 | F2
1  | aa
1  | bb
1  | cc
2  | gg
2  | hh
If you group by F1: SELECT F1,F2 from T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the software which algorithm to apply to the group. Some examples:
MIN
MAX
COUNT
SUM
etc etc
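Applied to the example table above, the aggregates give one well-defined value per group. This is a small sketch using Python's built-in sqlite3 module in place of MySQL, with the table T filled exactly as shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (F1 INTEGER, F2 TEXT);
    INSERT INTO T VALUES (1, 'aa'), (1, 'bb'), (1, 'cc'), (2, 'gg'), (2, 'hh');
""")

# Every selected column is either the grouping key or an aggregate,
# so each output value is fully determined by the data.
summary = conn.execute("""
    SELECT F1, MIN(F2), MAX(F2), COUNT(*)
    FROM T
    GROUP BY F1
    ORDER BY F1
""").fetchall()

print(summary)
```

Each group collapses to a single row, and there is no room left for the database to pick an arbitrary F2.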
The simplest approach is to keep your query and add the DESC keyword after GROUP BY domain. Note that this relies on the MySQL-specific GROUP BY ... DESC extension, which was removed in MySQL 8.0.13:
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
Explanation:
The idea is that MySQL tends to pick the first record of each group for the nonaggregated column, and that the DESC keyword makes it pick the last one instead. This behavior is not guaranteed, because the value chosen for a nonaggregated column is indeterminate.
For testing purposes, use this query, which only adds group_concat.
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains group by id:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?
As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns, including those by which you are not grouping) it gives you one indeterminate record from each group.
It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.
Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL, MySQL lets you get away with it and returns only one value for all columns not mentioned in the group by clause. Almost any other database would not perform this query. As a rule, any column, that is not part of the grouping condition must be used with an aggregate function.
As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a MySQL statement something like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
However, using a statement like the following would yield the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with one column of field "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with columns of all fields) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with columns of all fields) grouped by ID
Lesson: Do not use single quotes around the column name. A quoted 'ID' is a string literal, so every row gets the same grouping value and ends up in a single group. Remove the quotes and MySQL will group by the column's value. You can, however, use backtick escapes, e.g. `ID`.
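The difference between grouping by a string literal and grouping by a column can be reproduced with a few rows. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, on the assumption that it treats the quoted 'ID' in GROUP BY as a constant in the same way; the forms rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forms (ID INTEGER, FORM TEXT);
    INSERT INTO forms VALUES (1, 'SomeForm1'), (2, 'SomeForm2'), (2, 'SomeForm3');
""")

# 'ID' in single quotes is a constant string: every row lands in one group.
quoted = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY 'ID'"
).fetchall()

# ID without quotes is the column: one group per distinct value.
unquoted = conn.execute(
    "SELECT ID, COUNT(*) FROM forms GROUP BY ID ORDER BY ID"
).fetchall()

print(quoted)
print(unquoted)
```

The quoted version collapses the whole table into a single row, which matches the "returns only one row" symptom described above.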
Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this-
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`
SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work
The above result is kind of correct, but not quite.
All columns you select that are not part of the GROUP BY clause have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often they are used together with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.
A GROUP BY clause should only be required if you apply group functions such as MAX, MIN, AVG, or SUM in the query expressions. Your query does not use any such function, which means you do not actually need a GROUP BY clause. If you use one anyway, you will receive only one record from each group of results.
Hence the output of your query is as expected.
Query result is perfect; it will return only one row.