Consolidate/Merge Duplicate Records into one - ms-access

I have an Access table containing over 100,000 records. My problem is that many of the records have duplicate information. I would like to merge/combine the duplicate records into one record.
I have a field (CommonField) that can be used to identify the duplicates (sometimes more than two records). Each field needs to be considered on an individual basis. For instance:
If the date fields are not equal, I would prefer to keep the most recent date.
If the count fields are not equal, I would prefer to keep the larger value.
If the company names are not equal, I would prefer to keep both names unless one is within the other.
Here is a sample of the data:
Existing records:
+------------------+-------------+-------+-------+------------------+-----------+------------+--------+-----------------------------+
| ID | CommonField | First | Last | Email | Date | Currency | Count | Company |
| 1 | AA123 | John | | | | $465,000 | | ABC Company Ltd |
| 2 | AA123 | John | | John#gmail.com | 1-Mar-78 | $465,000 | 87,000 | ABC Company |
| 3 | AA123 | | Doe | | 14-Mar-78 | $465,000 | 88,000 | |
| 4 | BB456 | Dave | Smith | | 1-Apr-92 | $1,200,000 | 5,000 | Carter Company |
| 5 | BB456 | | Smith | Dave#aol.com | 1-Apr-92 | $1,200,000 | 5,000 | Simpson Ltd |
| 6 | CC568 | | | Jane#hotmail.com | 1-Sep-05 | $60,000 | | Woods Holdings |
| 7 | CC568 | | Woods | Jane#hotmail.com | | | 40,000 | Woods |
| 8 | CC568 | Jane | Woods | | 1-Sep-05 | | | |
| 9 | DD211 | Bob | Burns | Bob#gmail.com | 5-Aug-01 | $678,100 | 21,400 | |
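+------------------+-------------+-------+-------+------------------+-----------+------------+--------+-----------------------------+
Desired result:
+------------------+-------------+-------+-------+------------------+-----------+------------+--------+-----------------------------+
| ID | CommonField | First | Last | Email | Date | Currency | Count | Company |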
| 10 | AA123 | John | Doe | John#gmail.com | 14-Mar-78 | $465,000 | 88,000 | ABC Company Ltd |
| 11 | BB456 | Dave | Smith | Dave#aol.com | 1-Apr-92 | $1,200,000 | 5,000 | Carter Company, Simpson Ltd |
| 12 | CC568 | Jane | Woods | Jane#hotmail.com | 1-Sep-05 | $60,000 | 40,000 | Woods Holdings |
| 13 | DD211 | Bob | Burns | Bob#gmail.com | 5-Aug-01 | $678,100 | 21,400 | |
+------------------+-------------+-------+-------+------------------+-----------+------------+--------+-----------------------------+
I am interested in hearing your suggestions as to the best way of tackling this project.

Ugly.
I think for the company name field, you may need another table to combine names. I'd start by making a new table from a group by query on both the common id and the company name. Add an extra field for the standardized name to that table, then use a find duplicates query to look at all the common ids with more than one name and manually assign a standardized name.
Then you can bring both the original data table and the company names table into a group by query and pull the standardized name into the final result. For the date and count fields, you can use max(date) and max(count). This should also work for the first, last and email text fields, but you will want to manually examine the results pretty carefully.
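The grouping step described above can be sketched as follows. This is only a minimal illustration, not the Access implementation itself. It assumes Python's built-in sqlite3 module in place of Access, snake_case column names, ISO-formatted date strings (so that MAX picks the most recent date), and a hypothetical company_names lookup table holding the manually assigned standardized names. The rows come from the AA123 and DD211 groups in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    common_field TEXT,
    first_name TEXT,
    last_name TEXT,
    email TEXT,
    rec_date TEXT,      -- assumption: dates stored as ISO strings
    rec_count INTEGER,
    company TEXT
);
INSERT INTO records VALUES
    (1, 'AA123', 'John', NULL,  NULL,             NULL,         NULL,  'ABC Company Ltd'),
    (2, 'AA123', 'John', NULL,  'John#gmail.com', '1978-03-01', 87000, 'ABC Company'),
    (3, 'AA123', NULL,   'Doe', NULL,             '1978-03-14', 88000, NULL),
    (9, 'DD211', 'Bob',  'Burns', 'Bob#gmail.com', '2001-08-05', 21400, NULL);

-- hypothetical lookup table, filled in by hand after reviewing the duplicates
CREATE TABLE company_names (common_field TEXT PRIMARY KEY, standard_name TEXT);
INSERT INTO company_names VALUES ('AA123', 'ABC Company Ltd');
""")

# One output row per common_field: MAX() for each field, plus the
# standardized company name pulled in from the lookup table.
merged = conn.execute("""
    SELECT r.common_field,
           MAX(r.first_name), MAX(r.last_name), MAX(r.email),
           MAX(r.rec_date), MAX(r.rec_count),
           n.standard_name
    FROM records AS r
    LEFT JOIN company_names AS n ON n.common_field = r.common_field
    GROUP BY r.common_field, n.standard_name
    ORDER BY r.common_field
""").fetchall()

for row in merged:
    print(row)
```

MAX skips NULLs, which is why the blank name and email fields fill in from whichever duplicate has a value. Where two duplicates hold different non-blank text, MAX simply picks the one that sorts last, so those groups still need the manual check mentioned above.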

Related

How to transpose a table in mysql?

I want to change the format of the table that I have. I've heard about the pivot statement but don't know how to apply it to my situation.
I thought about rearranging the table once I export it to a CSV file, but that would take time and an extra step.
This is what I have so far:
+------+------------+-----------+-------+
| name | date | breakfast | lunch |
+------+------------+-----------+-------+
| Mike | 21/02/2019 | 1 | 0 |
| Adam | 21/02/2019 | 1 | 0 |
| Liam | 13/05/2019 | 1 | 1 |
+------+------------+-----------+-------+
I would like to change the format to:
+------------+-----------+-------+
| days | breakfast | lunch |
+------------+-----------+-------+
| 21/02/2019 | | |
| Mike | 1 | 0 |
| Adam | 1 | 0 |
| 13/05/2019 | | |
| Liam | 1 | 1 |
+------------+-----------+-------+
It does not have to be exactly the same; I just want the export to show the names and meals grouped by date.
I tried to use GROUP BY but I got it wrong. Please help.

Mysql table records are displayed in a crooked manner

I have created a table in MySQL but when I display the table, some records are displayed in a crooked manner.
Here's the table displayed:
select * from air_passenger_profile;
+------------+----------+------------+-----------+---------------------------------+---------------+---------------------+
| profile_id | password | first_name | last_name | address | mobile_number | email_id |
+------------+----------+------------+-----------+---------------------------------+---------------+---------------------+
| PFL001 | PFL001 | LATHA | SANKAR | 123 BROAD CROSS ST,CHENNAI-48 | 9876543210 | LATHA#GMAIL.COM |
| PFL002 | PFL002 | ARUN | PRAKASH | 768 2ND STREET,BENGALURU-20 | 8094564243 | ARUN#AOL.COM |
| PFL003 | PFL003 | AMIT | VIKARAM | 43 5TH STREET,KOCHI-84 | 9497996990 | AMIT#AOL.COM |
| PFL004 | PFL004 | AARTHI | RAMESH | 343 6TH STREET,HYDERABAD-76 | 9595652530 | AARTHI#GMAIL.COM |
| PFL005 | PFL005 | SIVA | KUMAR | 125 8TH STREET,CHENNAI-46 | 9884416986 | SIVA#GMAIL.COM |
| PFL006 | PFL006 | RAMESH | BABU | 109 2ND CROSS ST,KOCHI-12 | 9432198760 | RAMESH#GMAIL.COM |
| PFL007 | PFL007 | GAYATHRI | RAGHU | 23 2ND CROSS ST,BENGALURU-12 | 8073245678 | GAYATHRI#GMAIL.COM |
| PFL008 | PFL008 | GANESH | KANNAN | 45 3RD ST,HYDERABAD-21 | 9375237890 | GANESH#GMAIL.COM |
+------------+----------+------------+-----------+---------------------------------+---------------+---------------------+
The values were stored in your database with leading or trailing spaces. At the point where you insert your variables into the database, you could use PHP's trim() function, or MySQL's TRIM(), to store them without the spaces.
To correct your current values:
UPDATE air_passenger_profile
SET first_name = TRIM(first_name),
    last_name = TRIM(last_name),
    address = TRIM(address),
    email_id = TRIM(email_id);
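Both approaches can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of MySQL (SQLite's TRIM also strips leading and trailing spaces), Python's str.strip() in place of PHP's trim(), a cut-down three-column version of the table, and made-up padding around the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE air_passenger_profile "
    "(profile_id TEXT, first_name TEXT, last_name TEXT)"
)

# Existing rows that were stored with stray spaces (padding is made up).
conn.executemany(
    "INSERT INTO air_passenger_profile VALUES (?, ?, ?)",
    [("PFL001", "  LATHA", "SANKAR  "), ("PFL002", "ARUN ", " PRAKASH")],
)

# Fix the values already in the table.
conn.execute(
    "UPDATE air_passenger_profile "
    "SET first_name = TRIM(first_name), last_name = TRIM(last_name)"
)

# Prevent the problem for new rows by trimming before the insert.
new_row = ("PFL003", " AMIT ", " VIKARAM")
conn.execute(
    "INSERT INTO air_passenger_profile VALUES (?, ?, ?)",
    tuple(value.strip() for value in new_row),
)

rows = conn.execute(
    "SELECT first_name, last_name FROM air_passenger_profile ORDER BY profile_id"
).fetchall()
print(rows)
```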

MYSQL: Select ids from table where multiple values match a single column at least twice

I've got a table that looks like this:
+----+--------------------------------+
| id | slug |
+----+--------------------------------+
| 1 | gift |
| 1 | psychological-manipulation |
| 1 | christmas |
| 1 | giving |
| 1 | the-town-santa-forgot |
| 1 | santa-claus |
| 1 | mp3 |
| 1 | christmas |
| 2 | entertainment-culture |
| 2 | christmas |
| 2 | culture |
| 2 | literature |
| 2 | christmas-music |
| 2 | christmas-window |
| 2 | broadcasting-nec |
| 2 | how-the-grinch-stole-christmas |
| 2 | the-polar-express |
| 2 | banker |
| 2 | christmas |
| 2 | potter |
| 2 | christmas-eve |
| 2 | bailey |
| 2 | its-a-wonderful-life |
| 2 | the-polar-express |
| 2 | disney |
| 2 | tim-burton |
| 2 | a-christmas-carol |
| 2 | the-nightmare-before-christmas |
| 2 | chuck-jones |
+----+--------------------------------+
I want to get unique ids from the table where at least two of a list of slugs match for a given id.
For example, let's say I've got the slug values of:
gift
christmas
giving
I would want all unique ids that have a matching record for at least 2 of those.
i.e. only an id that had both the gift AND christmas slug or the giving AND christmas slug or the gift AND giving slug, etc...
You can use the distinct modifier to count the number of different slugs per ID:
SELECT id
FROM mytable
WHERE slug IN ('gift', 'christmas', 'giving')
GROUP BY id
HAVING COUNT(DISTINCT slug) >= 2
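To see why DISTINCT matters here, the query can be run against a few of the rows from the question. This is a minimal sketch that assumes Python's built-in sqlite3 module in place of MySQL and the table name mytable from the answer. Note that id 2 carries the christmas slug twice but none of the other wanted slugs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, slug TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "gift"), (1, "psychological-manipulation"), (1, "christmas"),
        (1, "giving"), (1, "christmas"),
        (2, "christmas"), (2, "culture"), (2, "christmas"), (2, "disney"),
    ],
)

wanted = ("gift", "christmas", "giving")

# Counting distinct slugs: id 2 only matches 'christmas', so it is excluded.
with_distinct = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM mytable WHERE slug IN (?, ?, ?) "
        "GROUP BY id HAVING COUNT(DISTINCT slug) >= 2 ORDER BY id",
        wanted,
    )
]

# Counting rows instead: the duplicated 'christmas' row lets id 2 slip through.
without_distinct = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM mytable WHERE slug IN (?, ?, ?) "
        "GROUP BY id HAVING COUNT(*) >= 2 ORDER BY id",
        wanted,
    )
]

print(with_distinct, without_distinct)
```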

merge two tables with partial relationship

I have some messy data in text files (2 tables). I'd like to merge it into 1 table but there are duplication issues. My data looks like the following:
Status table
+-----------+---------+
| Last Name | Status  |
+-----------+---------+
| Jones     | On Time |
| Jones     | On Time |
| Jones     | On Time |
| Jones     | On Time |
| Jones     | Missing |
| Hoinski   | On Time |
| Hoinski   | Late    |
| Hoinski   | Late    |
| Hoinski   | Missing |
+-----------+---------+
Risk table
+-----------+--------+
| Last Name | Risk   |
+-----------+--------+
| Jones     | High   |
| Jones     | High   |
| Jones     | Low    |
| Jones     | Medium |
| Jones     | Medium |
| Jones     | Medium |
| Jones     | Medium |
| Smith     | Low    |
| Smith     | Medium |
| Smith     | Medium |
| Smith     | Medium |
| Hoinski   | High   |
| Hoinski   | High   |
| Hoinski   | Low    |
+-----------+--------+
How can I use SQL to aggregate these two tables into one table? Is it possible? I know I do not have a proper relationship (it is many to many), so it doesn't quite make sense. But what if I aggregate the data using GROUP BY statements on the [Last Name] field?
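You're correct that GROUP BY will resolve your problem. Every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate, so select the grouped column rather than *:
SELECT Status.[Last Name]
FROM Status
INNER JOIN Risk ON Status.[Last Name] = Risk.[Last Name]
GROUP BY Status.[Last Name]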
Resolve the duplicates with DISTINCT:
SELECT Distinct S.[Last Name] , S.Status, R.Risk
FROM Status S
INNER JOIN Risk R
ON R.[Last Name] = S.[Last Name]
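The effect of DISTINCT on this many-to-many join can be checked against the sample data. This is a minimal sketch, assuming Python's built-in sqlite3 module in place of the original database and a snake_case last_name column instead of [Last Name]. Smith drops out because the inner join only keeps names that appear in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (last_name TEXT, status TEXT)")
conn.execute("CREATE TABLE risk (last_name TEXT, risk TEXT)")

conn.executemany(
    "INSERT INTO status VALUES (?, ?)",
    [("Jones", "On Time")] * 4 + [("Jones", "Missing")]
    + [("Hoinski", "On Time"), ("Hoinski", "Late"),
       ("Hoinski", "Late"), ("Hoinski", "Missing")],
)
conn.executemany(
    "INSERT INTO risk VALUES (?, ?)",
    [("Jones", "High")] * 2 + [("Jones", "Low")] + [("Jones", "Medium")] * 4
    + [("Smith", "Low")] + [("Smith", "Medium")] * 3
    + [("Hoinski", "High")] * 2 + [("Hoinski", "Low")],
)

# The plain join pairs every status row with every risk row per name.
raw_count = conn.execute(
    "SELECT COUNT(*) FROM status s "
    "INNER JOIN risk r ON r.last_name = s.last_name"
).fetchone()[0]

# DISTINCT keeps each (name, status, risk) combination once.
distinct_rows = conn.execute(
    "SELECT DISTINCT s.last_name, s.status, r.risk FROM status s "
    "INNER JOIN risk r ON r.last_name = s.last_name "
    "ORDER BY s.last_name, s.status, r.risk"
).fetchall()

print(raw_count, len(distinct_rows))
```

The result is every status/risk combination that occurs per name, not a one-to-one pairing, since the source tables carry no key that ties a particular status row to a particular risk row.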

How are foreign keys stored when there is no ID column?

I'm wondering if you could theoretically drop the ID column when splitting up tables in MySQL.
Let's say I have a table with this information:
+-----------+-----------+ +-----------+---------+
| Name | CountryID | | CountryID | Country |
+-----------+-----------+ +-----------+---------+
| Theressa | 1 | | 1 | America |
| Chiquita | 1 | +-----------+---------+
| Harlan | 1 |
| Vanda | 1 |
| Dudley | 1 |
| Catherine | 1 |
| Tad | 1 |
| Darcey | 1 |
| Antonette | 1 |
| Renetta | 1 |
| Arla | 1 |
| Emery | 1 |
| Alla | 1 |
| Antonetta | 1 |
+-----------+-----------+
+-----------+---------+ +---------+
| Name | Country | | Country |
+-----------+---------+ +---------+
| Theressa | America | | America |
| Chiquita | America | +---------+
| Harlan | America |
| Vanda | America |
| Dudley | America |
| Catherine | America |
| Tad | America |
| Darcey | America |
| Antonette | America |
| Renetta | America |
| Arla | America |
| Emery | America |
| Alla | America |
| Antonetta | America |
+-----------+---------+
Could the tables below take less space than the ones above? Would the foreign key "understand" that the "America" in the Person table is linked to the one in the Country table, and so take less space than the ID version, since it already references the other table?
I'm confusing myself, so I hope some of you understand what my question is.
Imagine an intelligent database that replaces the string "America" with an arbitrary pointer. That solution would still need a column holding a numeric ID, so it would amount to the same thing.
Use the int ID. It will save space, even if space is not a real issue. It will make indexes work faster and save you a lot of problems. Let's say that someone writes "America" vs "AMERICA". Strings are more complex than they seem. Countries have different names in different languages, and a name may contain special characters: "España" vs "Spain".
P.S. If you want a legible primary key, use the ISO codes. Example: "US".
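The ISO-code suggestion can be sketched like this. It is a minimal illustration under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, and the country and person table and column names are hypothetical. With the foreign key enforced, a differently cased code is rejected instead of silently creating a second spelling of the same country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE country (
    code TEXT PRIMARY KEY,   -- legible key: ISO country code
    name TEXT NOT NULL
);
CREATE TABLE person (
    name TEXT NOT NULL,
    country_code TEXT NOT NULL REFERENCES country(code)
);
INSERT INTO country VALUES ('US', 'America');
INSERT INTO person VALUES ('Theressa', 'US'), ('Chiquita', 'US');
""")

# A differently cased code does not match any country row, so it is refused.
try:
    conn.execute("INSERT INTO person VALUES ('Harlan', 'us')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The display name lives in one place and is joined in when needed.
rows = conn.execute(
    "SELECT p.name, c.name FROM person p "
    "JOIN country c ON c.code = p.country_code ORDER BY p.name"
).fetchall()
print(rejected, rows)
```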