How Group By works with Duplicates - mysql

I am trying to get some insight into how some SQL statements work. Right now I am looking into GROUP BY and want to know how it chooses what to show/return when the grouped rows contain duplicate data.
Consider the following example:
CREATE TABLE customers
(
FirstName VARCHAR(50),
LastName VARCHAR(50),
MobileNo VARCHAR(15)
);
INSERT INTO customers VALUES ('Niraj','Yadav',989898);
INSERT INTO customers VALUES ('Chetan','Gadodia',959595);
INSERT INTO customers VALUES ('Chetan','Gadodia',959590);
INSERT INTO customers VALUES ('Atul','Kokam',42424242);
INSERT INTO customers VALUES ('Atul','Kokam',42424246);
INSERT INTO customers VALUES ('Vishal','Parte',9394452);
INSERT INTO customers VALUES ('Vishal','Parte',939445);
INSERT INTO customers VALUES ('Vishal','Parte',9394451);
INSERT INTO customers VALUES ('Jinendra','Jain',12121);
INSERT INTO customers VALUES ('Jinendra','Jain',121212);
If I run this query...
SELECT *
FROM customers
GROUP BY FirstName;
I get the following results:
FirstName LastName MobileNo
--------- -------- ----------
Atul Kokam 42424242
Chetan Gadodia 959595
Jinendra Jain 12121
Niraj Yadav 989898
Vishal Parte 9394452
So, my question is: is there any reason why it returns these particular records? How does it determine what to get? I'm using MySQL.

In other databases, your query would not be allowed exactly because the results are unpredictable in this case.
Notice what the MySQL documentation has to say for this case:
MySQL Handling of GROUP BY
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
I should also mention that Gordon Linoff recently pointed out to me that, starting in version 5.7 of MySQL, a query like yours, where unpredictable results are possible, is no longer allowed by default.
Info on that: MySQL 5.7: only_full_group_by Improved, Recognizing Functional Dependencies, Enabled by Default!
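One way to make the result predictable is to list every non-aggregated column in GROUP BY and wrap the rest in an aggregate. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, loads a subset of the question's rows, and picks MIN(MobileNo) purely for illustration (any aggregate that states your intent would do).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "FirstName VARCHAR(50), LastName VARCHAR(50), MobileNo VARCHAR(15))"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Niraj", "Yadav", "989898"),
        ("Chetan", "Gadodia", "959595"),
        ("Chetan", "Gadodia", "959590"),
        ("Vishal", "Parte", "9394452"),
        ("Vishal", "Parte", "939445"),
        ("Vishal", "Parte", "9394451"),
    ],
)

# Every selected column is either grouped or aggregated, so the database
# has no freedom to pick an arbitrary row. MobileNo is text, so MIN()
# compares the values as strings.
rows = conn.execute(
    """
    SELECT FirstName, LastName, MIN(MobileNo) AS MobileNo, COUNT(*) AS n
    FROM customers
    GROUP BY FirstName, LastName
    ORDER BY FirstName
    """
).fetchall()

for row in rows:
    print(row)
```

The same SELECT is valid in MySQL with ONLY_FULL_GROUP_BY enabled, because no column is left for the server to choose freely.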

Related

Why can't I directly use a column from another table after NOT IN?

I have two tables. One is customers and another one is orders. This is the raw data to create the two tables:
Create table If Not Exists Customers (Id int, Name varchar(255));
Create table If Not Exists Orders (Id int, CustomerId int);
insert into Customers (Id, Name) values ('1', 'Joe');
insert into Customers (Id, Name) values ('2', 'Henry');
insert into Customers (Id, Name) values ('3', 'Sam');
insert into Customers (Id, Name) values ('4', 'Max');
insert into Orders (Id, CustomerId) values ('1', '3');
insert into Orders (Id, CustomerId) values ('2', '1');
And now I have to write a SQL query to find all customers who never ordered anything.
This is the correct answer:
select
name as customer
from
customers
where
customers.id
not in
(select
customerid
from
orders);
This is my answer:
select
name as customer
from
customers
where
customers.id
not in
orders.customerid;
And the MySQL response to my code is "Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'orders.customerid' at line 8".
What's wrong with my code?
Your code is missing the reference to the orders table. For the SQL to be valid, this reference must be declared in a proper FROM clause (for example, inside a subquery) or in a proper JOIN clause.
As an alternative to NOT IN, you could use a LEFT JOIN and check for NULL:
select name as customer
from customers
left join orders on orders.customerid = customers.id
where orders.customerid is null
If you don't declare the reference to the orders table, the database engine simply can't know which table the column value must be retrieved from. The table declaration must be explicit and placed in the proper clause (FROM or JOIN).
IN, which is not a function BTW, needs a "set" or "list" as its second operand, which can typically be obtained from a subquery (or by giving a bunch of comma-separated literals). In your case the subquery needs to get all customer IDs from the orders. So you might try:
SELECT c.name AS customer
FROM customers c
WHERE c.id NOT IN (SELECT o.customerid
FROM orders o);
But instead of using NOT IN, NOT EXISTS and a correlated subquery might be a better, possibly more performant solution here, especially if an index on (orders.customerid) exists:
SELECT c.name AS customer
FROM customers c
WHERE NOT EXISTS (SELECT *
FROM orders o
WHERE o.customerid = c.id);
You ask "why" about a language that's almost a half-century old. With respect, some of those language-design decisions are hard to recover. To get a perfect answer you need to track down Codd, Chamberlin, and Boyce. Why ask why?
At a more abstract level: SQL is fundamentally a set-manipulation language. Tables contain sets of rows. SELECT queries make their own sets -- they're even called result sets -- from other sets of rows.
In the SQL world, a column is not a set by itself. You can make a set from a column by saying things like this.
SELECT customerid FROM orders
And, the NOT IN() operation takes a set with just one column as its parameter. So you can give NOT IN() and IN() sets but not columns.
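To compare the three formulations suggested in the answers above (NOT IN with a subquery, NOT EXISTS, and LEFT JOIN ... IS NULL), here is a small sketch. It runs against Python's built-in sqlite3 module rather than MySQL, which is an assumption made only to keep it self-contained. The extra order row with a NULL customerid is hypothetical and not part of the question's data; it is added to show a well-known pitfall: once the subquery returns a NULL, NOT IN matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customerid INTEGER);
    INSERT INTO customers VALUES (1, 'Joe'), (2, 'Henry'), (3, 'Sam'), (4, 'Max');
    INSERT INTO orders VALUES (1, 3), (2, 1);
    """
)

NOT_IN = "SELECT name FROM customers WHERE id NOT IN (SELECT customerid FROM orders)"
NOT_EXISTS = (
    "SELECT c.name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customerid = c.id)"
)
LEFT_JOIN = (
    "SELECT c.name FROM customers c "
    "LEFT JOIN orders o ON o.customerid = c.id "
    "WHERE o.customerid IS NULL"
)


def names(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# With the question's data, all three queries agree.
not_in = names(NOT_IN)
not_exists = names(NOT_EXISTS)
left_join = names(LEFT_JOIN)

# Hypothetical extra row: an order whose customerid is NULL.
conn.execute("INSERT INTO orders VALUES (3, NULL)")

# "x NOT IN (3, 1, NULL)" is never true, so NOT IN now returns no rows,
# while NOT EXISTS is unaffected.
not_in_with_null = names(NOT_IN)
not_exists_with_null = names(NOT_EXISTS)

print(not_in, not_exists, left_join)
print(not_in_with_null, not_exists_with_null)
```

This NULL behaviour is another reason, besides performance, to prefer NOT EXISTS when the subquery column is nullable.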

sql min function and other column for sqlite

In SQLite I see that queries like the following return the correct result:
CREATE TABLE users(id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
salary INTEGER NOT NULL);
insert into users (user_id, salary) values (1, 42000);
insert into users (user_id, salary) values (2, 39000);
insert into users (user_id, salary) values (3, 50000);
sqlite> SELECT user_id, MAX(salary) FROM users;
3|50000
sqlite> SELECT user_id, MIN(salary) FROM users;
2|39000
but it looks like MySQL, for example, behaves differently:
sql min function and other column
and returns 1|50000.
Is this an SQLite extension, or is MySQL wrong in this case and SQLite's behaviour is the standard for SQL implementations?
The result is not "correct" in SQLite. SQLite extends its functionality to support these types of non-standard queries. This is clearly an extension of functionality, and one I wish it did not do.
The queries are non-standard because there is an unaggregated column in the SELECT (user_id) but the query is an aggregation query (because of the MIN()/MAX()). The more recent versions of MySQL with the default settings correctly reject this query as not syntactically correct. Older versions of MySQL return a value of user_id from an arbitrary row. SQLite has extended the definition of SQL for this special case and brings back the value of user_id that corresponds to the maximum or minimum salary.
In both databases, the better approach is:
SELECT user_id, salary as max_salary
FROM users
ORDER BY salary DESC
LIMIT 1;
and:
SELECT user_id, salary as min_salary
FROM users
ORDER BY salary ASC
LIMIT 1;
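A quick way to check the ORDER BY ... LIMIT 1 approach is to run it against the question's table. The sketch below uses Python's built-in sqlite3 module. The third query, which filters on a scalar subquery, is an additional variant that is not in the answer above; it is included as an assumption about what you might want when several users share the top salary, since LIMIT 1 returns only one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO users (user_id, salary) VALUES (1, 42000);
    INSERT INTO users (user_id, salary) VALUES (2, 39000);
    INSERT INTO users (user_id, salary) VALUES (3, 50000);
    """
)

# Standard SQL: sort, then keep the first row.
highest = conn.execute(
    "SELECT user_id, salary FROM users ORDER BY salary DESC LIMIT 1"
).fetchone()
lowest = conn.execute(
    "SELECT user_id, salary FROM users ORDER BY salary ASC LIMIT 1"
).fetchone()

# Variant (not from the answer): returns every user tied for the maximum.
all_highest = conn.execute(
    "SELECT user_id, salary FROM users "
    "WHERE salary = (SELECT MAX(salary) FROM users) "
    "ORDER BY user_id"
).fetchall()

print(highest, lowest, all_highest)
```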

filtering from sql

I use MySQL for my app.
Say I have a table of all registered users for my app.
Say I have a set of users at hand and I want to select from my database only the ones that are registered.
For example, my database has user1, user2, ..., user100
and the input user set is user3, user5, user10, user999, user2000, so the output of the query should be user3, user5 and user10 only.
Thank you in advance
You seem to want in:
select t.*
from t
where user_id in ('user3', 'user5', 'user10', 'user999', 'user2000')
This will return only the matching users.
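When the candidate list comes from application code, the IN list is usually built with one placeholder per value instead of concatenating the values into the SQL string. A minimal sketch, assuming a table named t with a user_id column as in the query above, and using Python's built-in sqlite3 module in place of a MySQL driver (MySQL drivers typically use %s rather than ? as the placeholder):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT PRIMARY KEY)")
# Register user1 .. user100, as in the question's example.
conn.executemany(
    "INSERT INTO t (user_id) VALUES (?)",
    [(f"user{i}",) for i in range(1, 101)],
)


def registered_users(conn, candidates):
    """Return the subset of candidates that exist in table t."""
    if not candidates:
        # An empty IN () list is a syntax error in MySQL, so skip the query.
        return set()
    placeholders = ", ".join("?" for _ in candidates)
    sql = f"SELECT user_id FROM t WHERE user_id IN ({placeholders})"
    return {row[0] for row in conn.execute(sql, list(candidates))}


candidates = ["user3", "user5", "user10", "user999", "user2000"]
found = registered_users(conn, candidates)
print(sorted(found))
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as parameters, which avoids quoting problems and SQL injection.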
The format in which the user passes these values is very important here. I am assuming that the values arrive as separate rows. In that case, you could make use of the code below (SQL Server syntax).
DECLARE @MyTableVar TABLE
(User_ID VARCHAR(32) PRIMARY KEY)
INSERT @MyTableVar VALUES ('user3')
INSERT @MyTableVar VALUES ('user5')
INSERT @MyTableVar VALUES ('user10')
INSERT @MyTableVar VALUES ('user999')
INSERT @MyTableVar VALUES ('user2000')
-- IN keeps the users that are registered; NOT IN would return the unregistered ones
SELECT *
FROM @MyTableVar
WHERE User_ID IN (SELECT USER_ID FROM database.schema.table_name)
If your user is passing values in the same row you can convert them to multiple rows using CROSS APPLY. Example can be seen here
Kartheek

SQL SERVER 2012 IF EXIST and IF NOT EXIST

I am trying to write a query that checks whether a meter exists and, if it does, performs an update; if not, it performs an insert into the monthly data table. The problem is that I am confused about the syntax and I am not sure how to do it!
This is the database design. Let's say I have the meter number 2012345 how do I do that? Thank you
In SQL Server (using just name as a demo, you'll of course want more fields)
MERGE meters AS target
USING (SELECT '2012345') AS source (meternumber)
ON (target.meternumber = source.meternumber)
WHEN MATCHED THEN
UPDATE SET name='MeterUpdate#1', meternumber=source.meternumber
WHEN NOT MATCHED THEN
INSERT (name, meternumber) VALUES ('MeterInsert#1', source.meternumber);
An SQLfiddle to test with.
In MySQL, create a unique index on meters(meternumber):
CREATE UNIQUE INDEX bop ON meters(meternumber);
then insert/update using:
INSERT INTO meters (name, meternumber) VALUES ('MeterInsert#1', '2012345')
ON DUPLICATE KEY UPDATE name='MeterUpdate#1', meternumber='2012345';
Another SQLfiddle.
SQL Server has MERGE http://msdn.microsoft.com/en-us/library/bb522522(v=sql.105).aspx that you can use as an UPSERT. Here's an example copied from the docs (Example A):
MERGE dbo.FactBuyingHabits AS Target
USING (SELECT CustomerID, ProductID, PurchaseDate FROM dbo.Purchases) AS Source
ON (Target.ProductID = Source.ProductID AND Target.CustomerID = Source.CustomerID)
WHEN MATCHED THEN
UPDATE SET Target.LastPurchaseDate = Source.PurchaseDate
WHEN NOT MATCHED BY TARGET THEN
INSERT (CustomerID, ProductID, LastPurchaseDate)
VALUES (Source.CustomerID, Source.ProductID, Source.PurchaseDate)
OUTPUT $action, Inserted.*, Deleted.*;
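If neither MERGE nor ON DUPLICATE KEY UPDATE is available, the "update if it exists, otherwise insert" logic from the question can also be written out explicitly in application code. The following is only a sketch under several assumptions: the meters table and its two columns mirror the demo above, the helper name upsert_meter is made up, and Python's built-in sqlite3 module stands in for the real database. It updates first and inserts only when the update touched no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meters (meternumber TEXT PRIMARY KEY, name TEXT)")


def upsert_meter(conn, meternumber, name):
    """Update the meter if it exists, otherwise insert it (hypothetical helper)."""
    # "with conn" wraps both statements in one transaction:
    # commit on success, rollback if an exception is raised.
    with conn:
        cur = conn.execute(
            "UPDATE meters SET name = ? WHERE meternumber = ?",
            (name, meternumber),
        )
        if cur.rowcount == 0:
            # No existing row was updated, so the meter is new.
            conn.execute(
                "INSERT INTO meters (meternumber, name) VALUES (?, ?)",
                (meternumber, name),
            )
            return "inserted"
        return "updated"


first = upsert_meter(conn, "2012345", "MeterInsert#1")
second = upsert_meter(conn, "2012345", "MeterUpdate#1")
stored = conn.execute("SELECT meternumber, name FROM meters").fetchall()
print(first, second, stored)
```

On a server with concurrent writers, two sessions can both see "no row" and both try to insert, so a unique key on meternumber is still needed; the single-statement MERGE and ON DUPLICATE KEY UPDATE forms shown above avoid the two-step pattern altogether.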

SQL Insert into table only if record doesn't exist [duplicate]

I want to run a set of queries to insert some data into an SQL table, but only if certain criteria are met. The table has 4 fields: id (primary), fund_id, date and price.
I have 3 fields in the query: fund_id, date and price.
So my query would go something like this:
INSERT INTO funds (fund_id, date, price)
VALUES (23, '2013-02-12', 22.43)
WHERE NOT EXISTS (
SELECT *
FROM funds
WHERE fund_id = 23
AND date = '2013-02-12'
);
So I only want to insert the data if a record matching the fund_id and date does not already exist. If the above is correct it strikes me as quite an inefficient way of achieving this as an additional select statement must be run each time.
Is there a better way of achieving the above?
Edit: For clarification neither fund_id nor date are unique fields; records sharing the same fund_id or date will exist but no record should have both the same fund_id and date as another.
This might be a simple solution to achieve this:
INSERT INTO funds (fund_id, date, price)
SELECT 23, DATE('2013-02-12'), 22.5
FROM dual
WHERE NOT EXISTS (SELECT 1
FROM funds
WHERE fund_id = 23
AND date = DATE('2013-02-12'));
P.S. Alternatively (if ID is a primary key):
INSERT INTO funds (ID, date, price)
VALUES (23, DATE('2013-02-12'), 22.5)
ON DUPLICATE KEY UPDATE ID = 23; -- or whatever you need
see this Fiddle.
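The INSERT ... SELECT ... WHERE NOT EXISTS pattern can be tried out without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite has no dual table and lets the SELECT omit FROM entirely. The column names follow the question (fund_id, date, price), and the helper name insert_if_absent is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE funds ("
    "id INTEGER PRIMARY KEY, fund_id INTEGER, date TEXT, price REAL)"
)

# Named parameters let the same values appear in both the SELECT list
# and the NOT EXISTS check.
INSERT_IF_ABSENT = """
INSERT INTO funds (fund_id, date, price)
SELECT :fund_id, :date, :price
WHERE NOT EXISTS (
    SELECT 1 FROM funds WHERE fund_id = :fund_id AND date = :date
)
"""


def insert_if_absent(conn, fund_id, date, price):
    """Insert the row unless the (fund_id, date) pair already exists."""
    with conn:  # commit on success, rollback on error
        conn.execute(
            INSERT_IF_ABSENT,
            {"fund_id": fund_id, "date": date, "price": price},
        )


insert_if_absent(conn, 23, "2013-02-12", 22.43)
insert_if_absent(conn, 23, "2013-02-12", 99.99)  # same pair: skipped
insert_if_absent(conn, 23, "2013-02-13", 22.5)   # new date: inserted

rows = conn.execute(
    "SELECT fund_id, date, price FROM funds ORDER BY date"
).fetchall()
print(rows)
```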
Although the answer I originally marked as chosen is correct and achieves what I asked there is a better way of doing this (which others acknowledged but didn't go into). A composite unique index should be created on the table consisting of fund_id and date.
ALTER TABLE funds ADD UNIQUE KEY `fund_date` (`fund_id`, `date`);
Then when inserting a record add the condition when a conflict is encountered:
INSERT INTO funds (`fund_id`, `date`, `price`)
VALUES (23, DATE('2013-02-12'), 22.5)
ON DUPLICATE KEY UPDATE `price` = `price`; -- this keeps the price what it was (no change to the table)
or:
INSERT INTO funds (`fund_id`, `date`, `price`)
VALUES (23, DATE('2013-02-12'), 22.5)
ON DUPLICATE KEY UPDATE `price` = 22.5; -- this updates the price to the new value
This will perform much better than a sub-query, and the structure of the table is superior. It comes with the caveat that the columns in your unique key should not contain NULL values: MySQL treats NULLs in a unique index as distinct from one another, so rows with a NULL in those columns are never detected as duplicates.
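The effect of the composite unique index can be sketched with Python's built-in sqlite3 module. Two assumptions to note: SQLite's INSERT OR IGNORE is used as a rough analogue of MySQL's ON DUPLICATE KEY UPDATE `price` = `price` (keep the existing row), and the helper name insert_if_absent is made up. The last two calls illustrate the NULL caveat, which SQLite shares with MySQL: NULLs in a unique index are treated as distinct, so such rows are inserted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE funds (
        id INTEGER PRIMARY KEY,
        fund_id INTEGER,
        date TEXT,
        price REAL
    );
    CREATE UNIQUE INDEX fund_date ON funds (fund_id, date);
    """
)


def insert_if_absent(fund_id, date, price):
    """Insert the row; the unique index silently rejects duplicates."""
    with conn:  # commit on success, rollback on error
        conn.execute(
            "INSERT OR IGNORE INTO funds (fund_id, date, price) VALUES (?, ?, ?)",
            (fund_id, date, price),
        )


insert_if_absent(23, "2013-02-12", 22.43)
insert_if_absent(23, "2013-02-12", 99.99)  # duplicate pair: ignored
insert_if_absent(23, "2013-02-13", 22.5)   # different date: inserted

rows = conn.execute(
    "SELECT fund_id, date, price FROM funds "
    "WHERE fund_id IS NOT NULL ORDER BY date"
).fetchall()

# NULL caveat: both rows are inserted because NULL never equals NULL.
insert_if_absent(None, "2013-02-12", 1.0)
insert_if_absent(None, "2013-02-12", 1.0)
null_rows = conn.execute(
    "SELECT COUNT(*) FROM funds WHERE fund_id IS NULL"
).fetchone()[0]

print(rows, null_rows)
```

Declaring fund_id and date as NOT NULL closes that gap.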
Assuming you cannot modify the DDL (to create a unique constraint), or are limited to writing DML, left join your values against the whole table and insert only when no matching row is found (that is, when the joined columns are NULL):
FIDDLE
insert into funds (fund_id, date, price)
select
T.*
from
(select 23 fund_id, '2013-02-12' date, 22.43 price) T
left join
funds on funds.fund_id = T.fund_id and funds.date = T.date
where
funds.fund_id is null