How can I exclude duplicate records from sql query - sql-server-2008

I am trying to create a report that basically pulls all data from a sql table. The problem is this table was used for registration purposes and there are cases where the same employee registered for two different events and so there are multiple entries for them in the table. I really don't need to know which events they registered for, just that they did register. Is there any way to modify my query so that only one record per employee is selected? I tried to use DISTINCT but that still pulled every single record.

The explanation for what you are seeing most likely is that the records you are getting back are not duplicates. For example, if you used SELECT *, then you would be getting back all columns, and even though the employee name, id, etc. columns could be identical for multiple records, the other columns might be distinct.
Without knowing your table structure, I cannot give an exact answer. But here are two ways to deal with this. First, you can use SELECT DISTINCT with only columns corresponding to the employee, e.g.
SELECT DISTINCT firstName,
lastName,
id
FROM yourTable
This has the disadvantage that it throws away information from all the other columns. If you want to retain those other columns, then a second option would be to group records by employee, and then use aggregate functions on the other columns, e.g.
SELECT firstName,
lastName,
id,
MIN(signupTime)
FROM yourTable
GROUP BY firstName,
lastName,
id
In this query, I assumed there is a timestamp column called signupTime, which recorded the time for each employee signup (including possible duplicate signups). This query would retain the first signup time, along with the employee id and name.
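To see the difference between the two options, here is a minimal sketch that runs them against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, and the eventName column and the sample rows are assumptions made up for illustration; yourTable, signupTime and the name columns follow the answer above.

```python
import sqlite3

# Hypothetical registration table; eventName and the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, firstName TEXT, lastName TEXT, "
    "eventName TEXT, signupTime TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "Picnic", "2015-03-02 09:00"),
        (1, "Ann", "Lee", "Gala", "2015-03-01 14:30"),
        (2, "Bob", "Ray", "Picnic", "2015-03-03 11:15"),
    ],
)

# DISTINCT over every column: the event columns differ, so nothing collapses.
all_columns = conn.execute("SELECT DISTINCT * FROM yourTable").fetchall()

# DISTINCT over the employee columns only: one row per employee.
employees = conn.execute(
    "SELECT DISTINCT firstName, lastName, id FROM yourTable ORDER BY id"
).fetchall()

# GROUP BY keeps one row per employee plus the earliest signup time.
first_signups = conn.execute(
    "SELECT firstName, lastName, id, MIN(signupTime) AS firstSignup "
    "FROM yourTable GROUP BY firstName, lastName, id ORDER BY id"
).fetchall()

print(len(all_columns), employees, first_signups)
```

The first query still returns every registration, which matches what the question describes; the other two return one row per employee.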

Use ROW_NUMBER() for this purpose
;WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) RN
FROM Table
)
SELECT *
FROM CTE
WHERE RN=1
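As a rough check of the ROW_NUMBER() approach, the sketch below runs the same pattern in SQLite from Python. It assumes a SQLite build with window function support (3.25 or later), and the Registrations table, its Event column and the sample rows are hypothetical; the answer's "Table" is only a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the answer's placeholder "Table".
conn.execute("CREATE TABLE Registrations (ID INTEGER, Name TEXT, Event TEXT)")
conn.executemany(
    "INSERT INTO Registrations VALUES (?, ?, ?)",
    [(1, "Ann", "Picnic"), (2, "Bob", "Picnic"), (3, "Ann", "Gala")],
)

# Number the rows within each Name, lowest ID first, and keep only row 1.
rows = conn.execute(
    """
    WITH CTE AS (
        SELECT ID, Name, Event,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY ID) AS RN
        FROM Registrations
    )
    SELECT ID, Name, Event
    FROM CTE
    WHERE RN = 1
    ORDER BY ID
    """
).fetchall()

print(rows)
```

Unlike SELECT DISTINCT on a few columns, this keeps every column of the chosen row, and the ORDER BY inside OVER decides which registration survives.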

You should group the records by full name, employee ID, or whichever column identifies an employee. Here is a simple template showing how to use the GROUP BY clause in SQL queries:
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
I hope it works for you.

From reading the question, I assume you just want each employee name to show up once, no matter how many events they have registered for.
For example:
$query_employee = mysql_query("SELECT employee_name FROM tbl GROUP BY employee_name");

Related

GROUP BY clause in MySQL groups records with different values

The MySQL GROUP BY clause groups records even when they have different values.
However, I would like it to behave as in DB2 SQL, so that records are not grouped unless they contain exactly the same information.
Currently in MySQL for:
id Name
A Amanda
A Ana
GROUP BY id would return one record, chosen arbitrarily (unless aggregate functions are used, of course).
However, in DB2 SQL the same GROUP BY id would not group those rows: it returns 2 records and never picks one of the values arbitrarily when grouping without aggregate functions.
First, id is a bad name for a column that is not the primary key of a table. But that is not relevant to your question.
This query:
select id, name
from t
group by id;
returns an error in almost any database other than MySQL. The problem is that name is not in the group by and is not the argument of an aggregation function. The failure is ANSI-standard behavior, not honored by MySQL.
A typical way to write the query is:
select id, max(name)
from t
group by id;
This should work in all databases (assuming name is not some obscure type where max() doesn't work).
Or, if you want each name, then:
select id, name
from t
group by id, name;
or the simpler:
select distinct id, name
from t;
In MySQL, you can get the ANSI standard behavior by setting ONLY_FULL_GROUP_BY for the database/session. MySQL will then return an error, as DB2 does in this case.
Recent versions of MySQL (5.7.5 and later) have ONLY_FULL_GROUP_BY set by default.
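The three portable queries above can be tried against the two-row example from the question. This is a minimal sketch using SQLite from Python as a stand-in engine; the table name t and the rows come from the thread, everything else is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("A", "Amanda"), ("A", "Ana")])

# One row per id, with the name chosen deterministically by MAX().
one_per_id = conn.execute("SELECT id, MAX(name) FROM t GROUP BY id").fetchall()

# Grouping by both columns keeps rows that differ in name apart.
grouped = conn.execute(
    "SELECT id, name FROM t GROUP BY id, name ORDER BY name"
).fetchall()

# DISTINCT over both columns gives the same rows as the query above.
distinct = conn.execute(
    "SELECT DISTINCT id, name FROM t ORDER BY name"
).fetchall()

print(one_per_id, grouped, distinct)
```

The last two queries give the "do not group unless everything matches" behavior the question asks for, without depending on any server setting.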
GROUP BY in MySQL groups the records according to the listed fields. Think of it this way: one row per group is returned and the others do not show up. This is useful, for example, to count how many times each id is repeated in the table:
select count(id), id from `table` group by id
To achieve your purpose, however, you can group by multiple fields, something along the lines of:
select id, name from `table` group by id, name
I do not think there is an automated way to do this, but using
GROUP BY id, name
would give you the solution you are looking for.

Mysql combine two results and group them by field

I have been trying but it seems I am missing something. I want to combine two results from two tables by a common field.
I would like to group results from these two queries by customer field.
SELECT errors.customer, count(errors.customer) as err_count,severity from errors group by customer,severity;
SELECT customer,sum(size) as Tot_size,count(customer) as Policy_count from backup group by customer;
I have tried this.
SELECT errors.customer, count(errors.customer) as err_count,severity from errors group by customer,severity union all SELECT customer,count(customer) as Policy_count ,sum(size) as Tot_size from backup group by customer;
But for some reason some columns are missing.
You should follow the requirements for union:
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
Apparently, the above items are not satisfied in your query.
Try something like this:
SELECT q1.customer, Tot_size, Policy_count, err_count, severity
FROM ( SELECT customer, SUM(size) AS Tot_size, COUNT(customer) AS Policy_count
FROM backup GROUP BY customer ) q1
LEFT JOIN ( SELECT customer, COUNT(customer) AS err_count, severity
FROM errors GROUP BY customer, severity ) q2 ON q1.customer = q2.customer
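Here is a small sketch of the join-of-two-aggregates idea, run in SQLite from Python rather than MySQL. The table and column names follow the question; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE backup (customer TEXT, size INTEGER)")
conn.execute("CREATE TABLE errors (customer TEXT, severity TEXT)")
# Illustrative data: acme has two policies and three errors, zeta has no errors.
conn.executemany(
    "INSERT INTO backup VALUES (?, ?)",
    [("acme", 10), ("acme", 5), ("zeta", 7)],
)
conn.executemany(
    "INSERT INTO errors VALUES (?, ?)",
    [("acme", "high"), ("acme", "high"), ("acme", "low")],
)

# Aggregate each table on its own, then join the two summaries on customer.
rows = conn.execute(
    """
    SELECT q1.customer, q1.Tot_size, q1.Policy_count, q2.err_count, q2.severity
    FROM (SELECT customer, SUM(size) AS Tot_size, COUNT(customer) AS Policy_count
          FROM backup GROUP BY customer) AS q1
    LEFT JOIN (SELECT customer, COUNT(customer) AS err_count, severity
               FROM errors GROUP BY customer, severity) AS q2
           ON q1.customer = q2.customer
    ORDER BY q1.customer, q2.severity
    """
).fetchall()

for row in rows:
    print(row)
```

Because the error summary has one row per customer and severity, the backup totals repeat on each severity row, and customers without errors still appear with NULL (None in Python) in the error columns thanks to the LEFT JOIN.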
Your first query contains three columns and your second one contains two columns.
In order to use the UNION operator your two queries need to have the same amount of columns, and the columns should be compatible.
In your case the second query lacks a third column. If there is no corresponding column to use you can set a default such as
"'n/a' as severity "
if it should be textual or
"0 as severity "
for a numerical value.
Cheers Martin

ms-access remove fails on the same date from the same person when there is a pass

Thank all of you for your help. My Access database is coming along nicely.
My new question is:
I have three fields that are to be considered to get the result I am looking for.
Date, EMPID and the RESULTS field.
The date is simply a date with no time.
The EMPID is a unique employee identifier.
The Results field states either pass or fail.
What I am trying to do is on any single date (there may be many dates, but each is to be considered separately) an employee may test many times and have multiple failures (there can only ever be one passing result). If on the same date the same employee passes, then all fails are to be removed. If there is no pass, then leave one fail.
Thank You,
Query 1: QueryPass
SELECT *
FROM Table1
WHERE Results="Pass";
Query 2: QueryFail
SELECT *
FROM Table1
WHERE EmpID & ResDate Not In (SELECT EmpID & ResDate FROM QueryPass);
Query 3: QueryReport
SELECT EmpID, ResDate, Results FROM QueryFail
UNION SELECT EmpID, ResDate, Results FROM QueryPass;
NOTE: Date is a reserved word in Access, should avoid using reserved words as names for anything.
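The three-query idea above can be sketched as a single statement. The example below uses SQLite from Python instead of Access, so the syntax differs slightly: NOT EXISTS replaces the concatenated EmpID & ResDate comparison, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (EmpID INTEGER, ResDate TEXT, Results TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "Fail"),
        (1, "2020-01-01", "Fail"),
        (1, "2020-01-01", "Pass"),  # a pass hides that day's fails
        (2, "2020-01-01", "Fail"),
        (2, "2020-01-01", "Fail"),  # no pass: keep a single fail
        (1, "2020-01-02", "Fail"),
    ],
)

# Passes, plus fails that have no pass for the same employee and date.
# UNION (not UNION ALL) collapses the repeated fail rows into one.
report = conn.execute(
    """
    SELECT EmpID, ResDate, Results FROM Table1 WHERE Results = 'Pass'
    UNION
    SELECT f.EmpID, f.ResDate, f.Results
    FROM Table1 AS f
    WHERE f.Results = 'Fail'
      AND NOT EXISTS (SELECT 1 FROM Table1 AS p
                      WHERE p.EmpID = f.EmpID
                        AND p.ResDate = f.ResDate
                        AND p.Results = 'Pass')
    ORDER BY EmpID, ResDate
    """
).fetchall()

print(report)
```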
Since there can be multiple rows per person (and all in the same table), and if someone passes, it will be the 'last' record for that person, the following will give you what you need.
SELECT Last(Table1.[TestDate]) AS LastOfTestDate,
Table1.[EMPID], Last(Table1.[TestResult]) AS LastOfTestResult
FROM Table1
GROUP BY Table1.[EMPID];

mysql ORDER BY MIN() not matching up with id

I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via GROUP BY or HAVING operators, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group the database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id: which one should be included?
MySQL (when ONLY_FULL_GROUP_BY is not enabled) takes such fields from an arbitrary row of the group, which I find an error-prone approach. In most other RDBMSes you will get an error for such a query.
The rule is: if you use aggregates, then all columns should be either arguments of aggregate functions or arguments of GROUP BY operator.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
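A minimal sketch of the subquery approach, run in SQLite from Python. The table is named hits here (a hypothetical name, chosen to avoid the reserved word), the sample rows are invented, and the domain is passed as a bound parameter instead of being interpolated into the string like '$domain'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "hits" is a hypothetical name for the question's table.
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, domain TEXT, hit_count INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        (1, "www.bestbuy.com", 40),
        (2, "www.bestbuy.com", 12),
        (3, "www.example.com", 5),
    ],
)


def least_hit_rows(connection, domain):
    """Return the rows of `domain` whose hit_count equals that domain's minimum."""
    return connection.execute(
        """
        SELECT id, domain, hit_count
        FROM hits
        WHERE domain = ?
          AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = ?)
        ORDER BY id
        """,
        (domain, domain),
    ).fetchall()


result = least_hit_rows(conn, "www.bestbuy.com")
print(result)
```

If several rows tie for the minimum, all of them are returned, so add a LIMIT or a tie-breaking rule if exactly one id is needed.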
SELECT
id,
hit_count
FROM
table
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM table WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had same question. Please see that question below.
min(column) is not returning me correct data of other columns
You are using a GROUP BY, which means each row in the result represents a group of values.
One of those values is the group name (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example the following table:
F1 | F2
1 | aa
1 | bb
1 | cc
2 | gg
2 | hh
If you group by F1: SELECT F1,F2 from T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the software which algorithm to apply to the group. Several examples:
MIN
MAX
COUNT
SUM
etc etc
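Applied to the F1/F2 table above, explicit aggregates make every output column well defined. This is a small sketch using SQLite from Python as a stand-in engine; the table name T and the rows come from the example, the rest is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "aa"), (1, "bb"), (1, "cc"), (2, "gg"), (2, "hh")],
)

# Each non-grouped column goes through an aggregate, so nothing is arbitrary.
summary = conn.execute(
    "SELECT F1, MIN(F2), MAX(F2), COUNT(*) FROM T GROUP BY F1 ORDER BY F1"
).fetchall()

print(summary)
```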
The simplest way: your query is OK, just modify it by adding the DESC keyword after GROUP BY domain
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
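Explanation:
When you use GROUP BY with an aggregate function, MySQL usually selects the first record of the group, but if you add the DESC keyword it will select the lowest or last record of that group. Note that MySQL does not guarantee which row is chosen, and the GROUP BY ... DESC syntax is no longer accepted in MySQL 8.0.
For testing purposes, use this query, which only has group_concat added.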
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains, group by id:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)

Is it wrong to have count(*) as part of the select query

select id, first_name, count(*) from users;
The users table contains 10 entries, but the above select query shows only a single row. Is it wrong to mix count(*) as part of the above query?
COUNT is a function that aggregates. You can't mix it into your normal query.
If you want to receive the ten entries just do a normal select:
SELECT id, first_name FROM users;
and to get the number of entries:
SELECT COUNT(id) FROM users;
It's because you are using an aggregate function in the select part of the query.
To return the 10 records, you just need id and first_name in the query.
E.g.:
SELECT id, first_Name
FROM users
If you wanted to get a count of the records in the table, then you could use
SELECT (Count(id))
FROM [users]
It's not "wrong", but it is meaningless without a "group by" clause - most databases will reject that query, as aggregate functions should include a group by if you're including other columns.
Not sure exactly what you're trying to achieve with this?
select id, first_name,(select count(*) from users) AS usercount from users;
will give each individual user and the total count but again, not sure why you would want it.
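For comparison, here is a minimal sketch of the two well-defined variants, run in SQLite from Python as a stand-in for MySQL. The users table and its columns follow the question; the three sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)

# A bare aggregate collapses the whole table into a single row.
only_count = conn.execute("SELECT COUNT(*) FROM users").fetchall()

# A scalar subquery repeats the total next to every user row instead.
with_total = conn.execute(
    "SELECT id, first_name, (SELECT COUNT(*) FROM users) AS usercount "
    "FROM users ORDER BY id"
).fetchall()

print(only_count, with_total)
```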
select id, first_name from users,(select count(*) as total from users) as t;
COUNT is an aggregate function, and it will always give you the count of all records in the table unless it is used in combination with GROUP BY.
If you combine it with ordinary columns in one query, it takes priority in deciding the final output, which is why your query returns a single row.
If you want to return all 10 records, you should just write -
select id,first_name from users
If you need number of rows in a table, you can use MySQL's SQL_CALC_FOUND_ROWS clause. Check MySQL docs to see how it's used.