Remove Dupes with checks on secondary fields first - mysql

I have a table with a field (Name) I'd like to create a unique index on, however it seems there are existing duplicates. I don't want to just get rid of dupes since some might have information in other fields that I need. Essentially I have:
ID
ParentID
Name
Code
RelatedID
So Goal 1 is I want to keep the record that has values in the secondary fields other than ID and Name. In most cases this will be only one of the dupes.
Goal 2 is in case two identical Names both have values but in different fields I want to 'merge' those since it is remotely possible one duplicate will have values in one key field and one in the other.
Finally Goal 3 is in the case that two names both have values in a key field I'd probably want to manually review those first.
It seems to me my first step as I read this would be Goal 3: manually review duplicates where the Name field is identical and more than one record has a non-NULL/non-empty value in a key field.
Once I address this, the goal would be to 'merge' the remaining records, i.e. keep one record with Name and any non-NULL/non-empty key fields from the others.
Any thoughts much appreciated.

Sounds like a solid plan - hope you have a development environment you can dry run it in.
Here is some code that may help you along
Starting with Goal 3.
This statement should help you find which records need to be reviewed.
SELECT *
FROM (
SELECT name,
GROUP_CONCAT(DISTINCT parentID) AS parentID,
GROUP_CONCAT(DISTINCT code) AS code,
GROUP_CONCAT(DISTINCT RelatedID) AS RelatedID
FROM foo
GROUP BY name
HAVING COUNT(*)>1) as summarized
WHERE parentID LIKE '%,%'
OR code LIKE '%,%'
OR RelatedID LIKE '%,%';
Anything that comes up in that query you will probably have to manually fix after figuring out why there are multiple values for the same field.
Once those fixes are in place, it's time for the merge. I would create a holding / temporary table with the correct values. MAX ignores NULLs, so it should take care of the logic to choose non-null values.
CREATE TABLE foo_values
SELECT name, MAX(parentID) AS parentID, MAX(code) AS code, MAX(RelatedID) AS RelatedID
FROM foo
GROUP BY name
HAVING COUNT(*)>1;
In theory, you now have the merged values. You can remove the duplicate name rows using whatever technique you are most comfortable with (see here) while adding your unique index. Finally, update the secondary fields by JOINing back to foo_values.
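The merge steps described above can be sketched end to end. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the keep_id column, the index name idx_foo_name, and the sample rows are assumptions that do not come from the question. It keeps the lowest id per name, which is one arbitrary choice among several.

```python
import sqlite3


def merge_duplicates(conn):
    """Collapse rows sharing a name into one row, keeping non-NULL secondary values."""
    with conn:  # commits when the block exits without an error
        # Holding table: one row per duplicated name with the merged values.
        # keep_id (hypothetical) records which row survives the clean-up.
        conn.execute("""
            CREATE TEMP TABLE foo_values AS
            SELECT name, MIN(id) AS keep_id,
                   MAX(parentID) AS parentID, MAX(code) AS code,
                   MAX(RelatedID) AS RelatedID
            FROM foo GROUP BY name HAVING COUNT(*) > 1
        """)
        # Remove every duplicate except the surviving row.
        conn.execute("""
            DELETE FROM foo
            WHERE name IN (SELECT name FROM foo_values)
              AND id NOT IN (SELECT keep_id FROM foo_values)
        """)
        # Copy the merged values onto the surviving row.
        conn.execute("""
            UPDATE foo SET
                parentID  = (SELECT v.parentID  FROM foo_values v WHERE v.keep_id = foo.id),
                code      = (SELECT v.code      FROM foo_values v WHERE v.keep_id = foo.id),
                RelatedID = (SELECT v.RelatedID FROM foo_values v WHERE v.keep_id = foo.id)
            WHERE id IN (SELECT keep_id FROM foo_values)
        """)
        conn.execute("DROP TABLE foo_values")
        # Only succeeds once no duplicate names remain.
        conn.execute("CREATE UNIQUE INDEX idx_foo_name ON foo (name)")


def merge_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, parentID INTEGER, "
                 "name TEXT, code TEXT, RelatedID INTEGER)")
    conn.executemany("INSERT INTO foo VALUES (?, ?, ?, ?, ?)", [
        (1, None, "alpha", None, None),   # duplicate with no extra data
        (2, 7,    "alpha", None, None),   # duplicate carrying parentID
        (3, None, "alpha", "A1", None),   # duplicate carrying code
        (4, 9,    "beta",  "B1", 3),      # already unique
    ])
    merge_duplicates(conn)
    return conn.execute(
        "SELECT id, parentID, name, code, RelatedID FROM foo ORDER BY id").fetchall()


if __name__ == "__main__":
    print(merge_demo())
```

In MySQL the UPDATE would more likely be written as a multi-table UPDATE ... JOIN, as the answer suggests; the correlated subqueries here are used only because SQLite accepts them.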

Related

MS-Access show only items that meet multiple criteria

I am new to Access and I am looking for a solution that is beyond the ability of the others in my company and may be beyond what access can do.
I have the following fields.
Date: Last Name: First Name: Test1: Test2: Test3:
I am looking for the following to happen.
On any single date a user may test multiple times.
If the user passes all three tests do not show any records with fails or any duplicate passes.
If the user fails any of the three tests, but has multiple failed records only show one.
If the user has the statement "NotUsed" in any field, but a pass in any other keep a single record for that date.
Thank You,
First, you need a primary key column in order to be able to easily and unambiguously identify each record. In Access this is easily achievable with an AutoNumber column. Also, in the table designer, click the key symbol for this column. This creates a primary key index. A primary key is a must for every table.
Let us call this column TestID and let's assume that the table is named tblTest.
The problem is that your condition refers to several records; however, SQL expects a WHERE clause that specifies the conditions for each single record. So let’s try to reformulate the conditions:
Keep the record with the most passes for each user.
Keep records with "NotUsed" in any test field.
The first condition can be achieved like this:
SELECT First(TestID)
FROM
(SELECT TestID, [Last Name], [First Name] FROM tblTest
ORDER BY IIf(Test1='pass',1,0) + IIf(Test2='pass',1,0) + IIf(Test3='pass',1,0) DESC)
GROUP BY [Last Name], [First Name]
This gives you the TestID for each user with the most passes. Now, this is not the final result yet, but you can use this query as a subquery in the final query
SELECT * FROM tblTest
WHERE
Test1='NotUsed' OR Test2='NotUsed' OR Test3='NotUsed' OR
TestID IN ( <place the first query here...> )
Is this what you had in mind?
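The "keep the record with the most passes" idea can also be written as a correlated subquery with an explicit ordering and tie-break. The sketch below is an assumption-laden illustration, not the Access query from this answer: it runs on Python's sqlite3 module, renames the columns to LastName, FirstName and TestDate (no spaces), and groups by date as well as by user because the question talks about a single date.

```python
import sqlite3

# For each user and date, pick the TestID with the most passes;
# ties are broken by the lowest TestID so the result is deterministic.
BEST_RECORD_SQL = """
SELECT t.TestID
FROM tblTest AS t
WHERE t.TestID = (
    SELECT b.TestID
    FROM tblTest AS b
    WHERE b.LastName = t.LastName
      AND b.FirstName = t.FirstName
      AND b.TestDate = t.TestDate
    ORDER BY (CASE WHEN b.Test1 = 'pass' THEN 1 ELSE 0 END)
           + (CASE WHEN b.Test2 = 'pass' THEN 1 ELSE 0 END)
           + (CASE WHEN b.Test3 = 'pass' THEN 1 ELSE 0 END) DESC,
             b.TestID
    LIMIT 1
)
ORDER BY t.TestID
"""


def best_records(conn):
    """Return the TestID of the best record per user and date."""
    return [row[0] for row in conn.execute(BEST_RECORD_SQL)]


def best_records_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTest (TestID INTEGER PRIMARY KEY, TestDate TEXT, "
                 "LastName TEXT, FirstName TEXT, Test1 TEXT, Test2 TEXT, Test3 TEXT)")
    conn.executemany("INSERT INTO tblTest VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, "2020-01-01", "Doe", "Jo", "pass", "fail", "pass"),  # 2 passes
        (2, "2020-01-01", "Doe", "Jo", "pass", "pass", "pass"),  # 3 passes
        (3, "2020-01-01", "Doe", "Jo", "pass", "pass", "pass"),  # duplicate pass
        (4, "2020-01-01", "Ray", "Al", "fail", "fail", "pass"),  # 1 pass
        (5, "2020-01-01", "Ray", "Al", "fail", "fail", "pass"),  # duplicate fail
    ])
    return best_records(conn)


if __name__ == "__main__":
    print(best_records_demo())
```

Access SQL has no LIMIT clause, so a port would need TOP 1 in the subquery or a different formulation.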
Another thought is about normalization. Your table is not normalized. You are using your table like an Excel sheet. As your database grows you'll get more and more into trouble.
You have two kinds of non-normalization.
One relates to the fact that each user's first name and last name might occur in several records. If, in future, you want to add more columns, like user address and phone number, then you will have to repeat these entries for each user record. It will become increasingly difficult to keep this information synchronized over all the records. The way to go is to have at least two tables: a user table and a test table where the user table has a UserID as primary key and the test table has this UserID as foreign key. Now a user can have many test records but still always has only one unique user record.
The other one (non-normalization) occurs because you have 3 Test fields in a single record. This is less of a problem if your tests always have the same structure and always require 3 tests per date, but even here you have to fall back to the "NotUsed" entries. There are several ways to normalize this, because a database can have different degrees of normalization. The three ways:
Only one test table with the fields: TestID (PK), UserID (FK), Date, Result, TestNumber.
A test day table with the fields: TestDayID (PK), UserID (FK), Date + a test result table with the fields: TestResultID (PK), TestDayID (FK), Result, TestNumber
Then you can combine the two previous with this addition: Instead of having a TestNumber field, introduce a lookup table containing information on test types with the fields: TestTypeID (PK), TestNo, Description and in the other tables replace the column TestNumber with a column TestTypeID (FK).
See: How to normalize a table using Access - Part 1 of 4 or look at many other articles on this subject.

Convert One to Many to One to One Relationship MySQL

I have two MySQL tables we can call Foo and Bar.
Both tables have a column called PrizeGroupId. the goal is to create a one-to-one relationship between these columns, and I have created stored procedures to add/edit Foo that update the corresponding row via the one-to-one relationship in Bar.
The problem lies in the fact that the data wasn't always structured this way and I need to write a script to convert the data from its previous state (which I'm about to describe) to a one-to-one relationship based off of PrizeGroupId.
Previously, multiple rows in Foo could have the same PrizeGroupId such that there was a one-to-many relationship between entries in Bar to Foo based off of PrizeGroupId. The script that I need to write has to break apart every one-to-many instance of this nature into many (almost identical) one-to-one relationships between Foo and Bar.
In principle, I want to:
Iterate through Foo
See if the current row's PrizeGroupId is not unique in Foo.
Assign it a unique value (perhaps the current item's primary key)
Add a row in Bar with the new PrizeGroupId. Copy over all of the old row's other data into this new row such that it is "nearly identical".
After all is said and done, remove the old one-to-many row from Bar.
I understand the problem and how I could do this in pseudocode in a programming language, however I am still learning MySQL and am not sure how to go about solving a problem of this nature.
Help in the form of MySQL code, or steps I can take or read about to solve this problem, would be appreciated. Even a pointer to reading or an SO question on this kind of problem would help, as I had a difficult time finding resources on my own.
What you are asking for is not that hard. Some of your thinking is getting in the way. First, one almost never iterates in SQL. SQL is not that kind of language. Everything in SQL is done via sets of something.
Your approach can be:
Identify the set of rows where the PrizeGroupId is already unique and move them to a new copy of the table.
To create a table, you can use "create table foo2 like foo;". Very useful.
To identify the rows where PrizeGroupId is already unique, use something like:
create table test_30602977 (id int primary key, other int);
insert into test_30602977 values (1, 1), (2, 2), (3, 2);
select other, count(*) as count from test_30602977 group by other having count = 1;
The rows left in the original table do not have a unique PrizeGroupId. Change the PrizeGroupId value so that they are unique.
Merge the two sets to reconstruct the table with the original rows and with PrizeGroupId unique.
One reason that this is hard is because if you had created the tables with the one-to-one join, you would have used the pk to join the tables. The pk is already unique so why use something else. Once you have the tables separated and the PrizeGroupId is unique, you might want to think about setting the pk of foo to the pk of bar and then removing the PrizeGroupId column.
What is required is for the Bar table to contain a single record for each record in Foo, and for the PrizeGroupId field in Foo to be unique. As Foo.Id is already unique, it makes sense to use that as the foreign key.
Running a query like SELECT Foo.id, Bar.* FROM Foo INNER JOIN Bar USING (PrizeGroupId) will give us a single record for each record in Foo, along with the data from the corresponding record from Bar. So, if we were to replace the data in Bar with what is returned by this query, and then use Foo.Id for PrizeGroupId, we'd achieve what is required.
Create a temporary table with the same structure as Bar - something like CREATE TABLE Bar_copy LIKE Bar
Fill the temporary table with one record for each record in Foo, joined to the corresponding record in Bar - you'll need to list all the columns in Bar, for example - INSERT INTO Bar_copy (id, field1, field2, field3) SELECT f.id, b.field1, b.field2, b.field3 FROM Foo AS f INNER JOIN Bar AS b USING (PrizeGroupId)
Clear the existing PrizeGroupID field from Foo - UPDATE Foo SET PrizeGroupId = NULL
Empty the existing Bar table (for example with DELETE FROM Bar) and refill it with the records from the temporary table - INSERT INTO Bar (id, field1, field2, field3) SELECT id, field1, field2, field3 FROM Bar_copy
Update the foreign key values in Foo - UPDATE Foo SET PrizeGroupId = id
Obviously, take a back-up first!
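The steps above can be sketched as one set-based script. This is a minimal illustration on Python's sqlite3 module rather than MySQL, and the schema is hypothetical: Foo(id, name, PrizeGroupId) and Bar(PrizeGroupId, prize) are assumptions, since the question does not list the real columns. It also skips the intermediate "set PrizeGroupId to NULL" step and only renumbers Foo rows that ended up with a Bar row.

```python
import sqlite3


def split_one_to_many(conn):
    """Give every Foo row its own Bar row, keyed by Foo.id."""
    with conn:  # commits when the block exits without an error
        # One copy of the Bar data per Foo row, re-keyed to Foo.id.
        conn.execute("""
            CREATE TEMP TABLE Bar_copy AS
            SELECT f.id AS PrizeGroupId, b.prize
            FROM Foo AS f
            JOIN Bar AS b ON b.PrizeGroupId = f.PrizeGroupId
        """)
        # Replace the shared rows with the per-Foo copies.
        conn.execute("DELETE FROM Bar")
        conn.execute("INSERT INTO Bar (PrizeGroupId, prize) "
                     "SELECT PrizeGroupId, prize FROM Bar_copy")
        # Point each Foo row at its own Bar row.
        conn.execute("UPDATE Foo SET PrizeGroupId = id "
                     "WHERE id IN (SELECT PrizeGroupId FROM Bar)")
        conn.execute("DROP TABLE Bar_copy")


def split_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, name TEXT, PrizeGroupId INTEGER)")
    conn.execute("CREATE TABLE Bar (PrizeGroupId INTEGER PRIMARY KEY, prize TEXT)")
    conn.executemany("INSERT INTO Foo VALUES (?, ?, ?)",
                     [(1, "a", 10), (2, "b", 10), (3, "c", 20)])  # 1 and 2 share group 10
    conn.executemany("INSERT INTO Bar VALUES (?, ?)", [(10, "gold"), (20, "silver")])
    split_one_to_many(conn)
    foo = conn.execute("SELECT id, PrizeGroupId FROM Foo ORDER BY id").fetchall()
    bar = conn.execute("SELECT PrizeGroupId, prize FROM Bar ORDER BY PrizeGroupId").fetchall()
    return foo, bar


if __name__ == "__main__":
    print(split_demo())
```

No row-by-row iteration is needed: the join produces the whole new Bar contents in one statement, which is the set-based approach both answers recommend.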

Can't add date to archive

I have duplicated a table to create an archive table, and for some reason I can't get the append query to work.
This is the SQL code:
INSERT INTO tblArc
SELECT tblCostumer.*
FROM tblCostumer, tblArc
WHERE (((tblArc.num)=[Enter Client Number you'd like to move to the archive]));
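When I enter the customer number, it says "You are about to append 0 row(s)" instead of appending 1 row.
That FROM clause gives you a cross join between the two tables, which is probably not what you want. The WHERE clause also filters on tblArc.num, so if tblArc is empty (or does not yet contain that number) the query has no rows to append ...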
FROM tblCostumer, tblArc
Instead SELECT only from tblCostumer based on its primary key. For example, if the primary key is tblCostumer.num ...
INSERT INTO tblArc
SELECT tblCostumer.*
FROM tblCostumer
WHERE tblCostumer.num=[Enter Client Number you'd like to move to the archive];
And if the structures of the two tables are not the same, list the specific fields instead of ...
INSERT INTO tblArc
SELECT tblCostumer.*

Inserting into a table from an incompatible table

I have a MySQL table called Person, and one day I accidentally deleted someone from this table. I have a backup table, called PersonBak, so I was going to restore my deletion from the backup. However, in the course of moving forward on my application I renamed all the fields in Person, except for the primary key, PersonID. Now Person and PersonBak have the same data, but only one matching column name.
Is there any way to restore my missing person to Person from PersonBak without doing a lot of work? I have quite a few columns. Of course I could just do the work now, but I can imagine this coming up again.
Is there some way to tell MySQL that these are really the same table, with the columns in the same order, just different column names? Or any way at all to do this without writing out specifics of which columns in PersonBak match which ones in Person?
If the column datatypes are the same between the tables, the column count is the same, and they are all in the same order, then MySQL will do all of the work for you:
INSERT INTO t1 SELECT * FROM t2;
The column names are ignored. The server uses ordinal position only, to decide how to line up the from/to columns.
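A small demonstration of that positional mapping, restricted to the one deleted row. It is a sketch under assumptions: it runs on Python's sqlite3 module (which also lines up INSERT ... SELECT * columns by position), and every column name other than PersonID, as well as the sample data, is made up for illustration.

```python
import sqlite3


def restore_person(conn, person_id):
    """Copy one row from PersonBak into Person, matching columns by position."""
    with conn:
        conn.execute(
            "INSERT INTO Person SELECT * FROM PersonBak WHERE PersonID = ?",
            (person_id,),
        )


def restore_demo():
    conn = sqlite3.connect(":memory:")
    # Same column order and types, different names (hypothetical columns).
    conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, "
                 "FullName TEXT, EmailAddress TEXT)")
    conn.execute("CREATE TABLE PersonBak (PersonID INTEGER PRIMARY KEY, "
                 "name TEXT, email TEXT)")
    conn.executemany("INSERT INTO PersonBak VALUES (?, ?, ?)", [
        (1, "Ann", "ann@example.com"),
        (5, "Bob", "bob@example.com"),
    ])
    conn.execute("INSERT INTO Person VALUES (1, 'Ann', 'ann@example.com')")  # 5 was deleted
    restore_person(conn, 5)
    return conn.execute("SELECT * FROM Person ORDER BY PersonID").fetchall()


if __name__ == "__main__":
    print(restore_demo())
```

The WHERE clause matters: without it the statement would try to re-insert rows that still exist in Person and fail on the primary key.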
What about this:
insert into Person(id, col11, col12) (select id, col21, col22 from personBak where id=5)
person schema:
columns (id, col11, col12)
personBak schema:
columns (id, col21, col22)
Look at MySQL's INSERT ... SELECT (MySQL's SELECT ... INTO writes to variables or files, not tables); you can specify the field names and build the insert statement that way

MySQL: LIKE Query Help?

I have a column in my table called student_id, and I am storing the student IDs associated with a particular record in that column, delimited with a | character. Here are a couple sample entries of the data in that column:
243|244|245
245|1013|289|1012
549|1097|1098|245|1099
I need to write a SQL query that will return records that have a student_id of 245. Any help will be greatly appreciated.
Don't store multiple values in the student_id field, as having exactly one value for each row and column intersection is a requirement of First Normal Form. This is a Good Thing for many reasons, but an obvious one is that it avoids having to deal with cases like a student_id of "1245", which a naive search for 245 would also match.
Instead, it would be much better to have a separate table for storing the student IDs associated with the records in this table. For example (you'd want to add proper constraints to this table definition as well),
CREATE TABLE mytable_student_id (
mytable_id INTEGER,
student_id INTEGER
);
And then you could query using a join:
SELECT * FROM mytable JOIN mytable_student_id
ON (mytable.id=mytable_student_id.mytable_id) WHERE mytable_student_id.student_id = 245
Note that since you didn't post any schema details regarding your original table other than that it contains a student_id field, I'm calling it mytable for the purpose of this example (and assuming it has a primary key field called id -- having a primary key is another requirement of 1NF).
@Donut is totally right about First Normal Form: if you have a one-to-many relation you should use a separate table; other solutions lead to ad-hoccery and unmaintainable code.
But if you're faced with data that are in fact stored like that, one common way of doing it is this:
WHERE CONCAT('|',student_id,'|') LIKE '%|245|%'
Again, I agree with Donut, but this is the proper query to use if you can't do anything about the data for now.
WHERE student_id like '%|245|%' or student_id like '%|245' or student_id like '245|%' or student_id = '245'
This takes care of 245 being at the start, middle or end of the string, or being the only value. But if you aren't stuck with this design, please, please do what Donut recommends.
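To see why wrapping the column in delimiters is enough, here is a small runnable check. It is a sketch only: it uses Python's sqlite3 module, where string concatenation is written with || instead of MySQL's CONCAT(), and the table name records plus the last three sample rows are assumptions added to cover the edge cases.

```python
import sqlite3


def find_by_student(conn, student_id):
    """Return ids of rows whose pipe-delimited student_id list contains student_id."""
    # Wrapping both the column and the search value in '|' means the value
    # matches only as a whole list element, wherever it sits in the list.
    pattern = "%|{}|%".format(int(student_id))
    rows = conn.execute(
        "SELECT id FROM records WHERE '|' || student_id || '|' LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


def like_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, student_id TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?)", [
        (1, "243|244|245"),             # 245 at the end
        (2, "245|1013|289|1012"),       # 245 at the start
        (3, "549|1097|1098|245|1099"),  # 245 in the middle
        (4, "1245|300"),                # must not match
        (5, "245"),                     # only value
        (6, "12|2450"),                 # must not match
    ])
    return find_by_student(conn, 245)


if __name__ == "__main__":
    print(like_demo())
```

A pattern with a leading wildcard generally cannot use an index, which is one more practical reason to prefer the separate table recommended above.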