I need to compare two tables inside a Microsoft Access database.
Since this is a one-time task, I didn't want to look for dedicated software; instead I wanted to write queries that show me the rows that exist in only one of the tables.
So I created two queries: query 1 shows the rows of table 1 that are not present in table 2, and query 2 shows the rows of table 2 that are not present in table 1.
In both queries I do a LEFT JOIN/RIGHT JOIN on all of the primary key columns and only return the rows of one table, when the primary key columns in the other table are NULL.
This worked as I expected, except for one row.
Checking the content of this one row in both tables, I couldn't find a difference: there were no leading or trailing spaces, no TABs in place of spaces, and no other differences of that kind. Believe me, I tried.
Because I didn't know what else to do, I replaced some non-ASCII characters (the German umlauts ä and ü) with ASCII counterparts, and suddenly my queries showed no differences anymore.
Restoring the original characters one by one, I found that the JOINs don't seem to work when there's an "ä" present in the joined column. If the column contains an "ü" instead of an "ä" (same place in the string), the JOINs work.
How is that possible? Is that a known bug of Access? Or is it a feature? How can I rely on working JOINs? Is there a setting I have to change?
Related
I have 1.7 million records in an Access table, sorted A to Z. The records are not unique; some values are repeated. I want to make them unique by numbering the repeats: if a value occurs 4 times, the first occurrence should get "-1" appended to the record value, the second "-2", and so on. That way, identical records become unique. All identical records are next to each other because of the sorting. In Excel I do this with an IF function (if this cell's value <> the value of the cell above, then 1, else the repeat number above plus 1), but in Access I don't know what to do (I'm a beginner).
Finally, I want to add a column to the original table containing (original record value - repeat number).
I appreciate your help.
Note about sort order:
Sort order in a relational database is not concrete like in a spreadsheet. There is no concept of rows being "next to each other", unless in context of an index. An index is largely a tool for the database to handle the data more efficiently (and to aid in defining uniqueness). The order itself is still largely dynamic because the order of a particular query can be specified differently from the index (or from storage order) and this does not change how the data is actually stored. Being "next to each other" is essentially a useless concept in SQL queries, unless you mean "next to each other numerically", for instance with an AutoNumber field or with the "repeat numbers" you want to add. Unlike in a spreadsheet, you cannot refer to the row "just above this row" or the "row offset by 2 from the 'current' row".
Solution
Regardless of whether or not you will use the AutoNumber column later, add a Long Integer AutoNumber column anyway. This column is named [ID] in the example code. Why? Because until you add something that lets the database differentiate between the rows, standard SQL has no reliable way to reference individual duplicates. Even though you say that there are other differentiating columns, your own description rules out using them as a reliable key for referring to specific rows. (Even without such a differentiating column, Access can technically distinguish between rows. Iterating through a DAO.Recordset object in VBA would work, but it is perhaps not very elegant or efficient.)
Also add a new integer column for counting repeats, which below is named [DupeIndex]. A separate field is preferred (necessary?) because it allows continued reference to the original, unaltered duplicate values. If the reference number were directly updated, it would no longer match other fields and so would not be easily detected as a duplicate anymore. The following solution relies on grouping of ALL duplicate values, even those already "marked" with a [DupeIndex] number.
You should also realize that, when comparing different data sets, having separate fields allows more flexibility in matching the data. Appending the values to the reference number complicates comparison, since you likely want to compare not only rows with the same duplication index but all possible combinations. For example, comparing record 123-1 in one set to 123-4 in another... how do you select such rows in an automated fashion? You don't want to have to manually code all combinations, but that's what you'll end up doing if you don't keep them separate like {123,1} and {123,4}.
Create and save this as a named query [Duplicates]. This query is referenced by later queries. It could instead be embedded as a subquery, but my preference is to use saved queries for easier visualization and debugging in Access:
SELECT Data.RefNo, Count(Data.ID) AS Dupes, Max(Data.DupeIndex) AS IndexMax
FROM Data
GROUP BY Data.RefNo
HAVING Count(Data.ID) > 1
Execute the following to create a temporary table with new duplicate index values:
SELECT D1.ID, D1.RefNo,
    IIf([Duplicates].[IndexMax] Is Null,0,[Duplicates].[IndexMax])
    + 1
    + (SELECT Count(D2.ID) FROM Data As D2
        WHERE D2.[RefNo]=[D1].[RefNo]
            And [D2].[DupeIndex] Is Null
            And [D2].[ID]<[D1].[ID]) AS NewIndex
INTO TempIndices
FROM Data AS D1 INNER JOIN Duplicates ON D1.RefNo = Duplicates.RefNo
WHERE (D1.DupeIndex Is Null);
Execute the update query to set the new duplicate index values:
UPDATE Data
INNER JOIN TempIndices ON Data.ID = TempIndices.ID
SET Data.DupeIndex = [NewIndex]
Optionally remove the AutoNumber field and now assign the combined [RefNo] and new [DupeIndex] as primary key. The temporary table can also be deleted.
Comments about the queries:
The solution assumes that [DupeIndex] is Null for unprocessed duplicates.
The solution correctly handles existing duplicate index numbers, updating only duplicate rows that do not yet have a unique index.
Access has rather strict conditions for UPDATE queries, namely that updates are not based on circular references and that joins will not produce multiple updates for the same row, etc. The temporary table is necessary in this case, since the query determining the new index values refers multiple times in subqueries to the very column that is being updated. (If the update is attempted using joins on the subqueries, for example, Access complains that "Operation must use an updatable query.")
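As a rough illustration of the numbering idea only (not the Access solution itself), here is a minimal Python sketch using the standard sqlite3 module. It keeps the [Data], [ID], [RefNo] and [DupeIndex] names from above, but the sample values are made up, and it assumes a first run where every [DupeIndex] is still Null. The two steps mirror the temporary table followed by the update:

```python
import sqlite3

def assign_dupe_indices(conn):
    """Number rows that share a RefNo as 1, 2, 3, ... in order of their ID."""
    # Step 1: compute the new index for every duplicated row
    # (this plays the role of the TempIndices table).
    rows = conn.execute(
        """
        SELECT D1.ID,
               1 + (SELECT COUNT(*) FROM Data AS D2
                    WHERE D2.RefNo = D1.RefNo AND D2.ID < D1.ID) AS NewIndex
        FROM Data AS D1
        WHERE D1.RefNo IN (SELECT RefNo FROM Data
                           GROUP BY RefNo HAVING COUNT(*) > 1)
        """
    ).fetchall()
    # Step 2: write the computed values back, one row per ID.
    conn.executemany(
        "UPDATE Data SET DupeIndex = ? WHERE ID = ?",
        [(new_index, row_id) for row_id, new_index in rows],
    )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
    "RefNo TEXT, DupeIndex INTEGER)"
)
# Made-up sample values: "A" occurs three times, "C" twice, "B" once.
conn.executemany(
    "INSERT INTO Data (RefNo) VALUES (?)",
    [("A",), ("A",), ("B",), ("A",), ("C",), ("C",)],
)
updated = assign_dupe_indices(conn)
result = conn.execute("SELECT RefNo, DupeIndex FROM Data ORDER BY ID").fetchall()
print(result)
```

Rows whose [RefNo] occurs only once keep a Null [DupeIndex], just as the HAVING clause in [Duplicates] leaves them out.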
I have two tables whose shared columns do not match exactly (differences in capitalization, or the presence of extra characters such as commas and spaces). How can I merge these two tables based on their shared column (in R, KNIME, Excel Power Query or SQL)?
In your example Result table it's not clear where the row
gene1 | go3 | 14
comes from, because there's no entry for go3 in Table2. I'm assuming that's a mistake and you meant Table2 to include the row
go3 | 14
If that's correct, here's how to do this in KNIME:
The two Table Creator nodes just create the two tables with column names as shown in your example - replace these with your actual data sources. Cell Splitter splits column Goes using a comma as the delimiter. The Unpivoting node is configured like this:
and the Joiner like this:
All other settings were left as default. Add nodes to reorder and filter the columns in the Joiner output if you need to. Note that you'll see different Goes_Arr[n] columns depending on how many different values of Goes there are - the Enforce exclusion and Enforce inclusion settings make sure that Unpivoting handles this correctly.
This workflow should cope with whitespace between the commas, but I think you also mention differences in capital letters - if you need to handle these, pass each table through a Case Converter node to make them consistent.
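For readers who prefer to see the logic spelled out, the same split, unpivot, normalize and join steps can be sketched in plain Python. This is only an illustration of the idea, not what the KNIME nodes do internally; the table layout (a gene with a comma-separated Goes string, and a go term with a value) and all sample values other than go3 | 14 are assumptions:

```python
def split_and_join(table1, table2):
    """Join comma-separated terms in table1 to table2, ignoring case and spaces."""
    # Normalize the join key of table2 (the Case Converter step).
    lookup = {go.strip().lower(): value for go, value in table2}
    result = []
    for gene, goes in table1:
        # Split on commas (Cell Splitter) and emit one row per term (Unpivoting).
        for go in goes.split(","):
            key = go.strip().lower()
            # Keep only terms that have a partner in table2 (Joiner, inner join).
            if key in lookup:
                result.append((gene, key, lookup[key]))
    return result

# Hypothetical sample data with inconsistent case and whitespace.
table1 = [("gene1", "GO1, go3"), ("gene2", "go2")]
table2 = [("go1", 10), ("Go2", 12), ("go3", 14)]

merged = split_and_join(table1, table2)
for row in merged:
    print(row)
```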
Pivoting and unpivoting are hard to understand (IMHO - especially given the cryptic descriptions of their KNIME nodes) but very powerful. I recommend taking time to play around with these nodes to figure out how they work.
I have searched Stack Overflow and cannot find a way to match two fields based on only the last three letters of the second field. I am not great at VBS but can get by. Here are the details:
Microsoft Access 2010. This is an aircraft registration database. The field in the first database has a consistent length of 4 letters, e.g.
FABC
GPJR
IDTC
GPPC
The intended join field in the second database can look like this:
CFABC
C-FABC
ABC
C-GPJR
GPJR
PJR
CGPJR
I just need to take the last three letters in the first table and the last three letters in the second table and match them. It will likely be a make table query.
Any help would be appreciated.
Jeff
The syntax is as such:
SELECT Table1.ID AS ID1, Table2.ID AS ID2
FROM Table1, Table2
WHERE Right([Table1].[ID],3) = Right([Table2].[ID],3);
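To try the idea outside Access, here is a small sketch using Python's sqlite3 module and the sample registrations from the question. SQLite has no Right() function, so substr(x, -3) is used as the equivalent; the table and column names (Table1, Table2, ID) follow the query above and are assumptions about the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID TEXT)")
conn.execute("CREATE TABLE Table2 (ID TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?)",
    [("FABC",), ("GPJR",), ("IDTC",), ("GPPC",)],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?)",
    [("CFABC",), ("C-FABC",), ("ABC",), ("C-GPJR",), ("GPJR",), ("PJR",), ("CGPJR",)],
)

# substr(x, -3) returns the last three characters, like Right(x, 3) in Access.
matches = conn.execute(
    """
    SELECT Table1.ID, Table2.ID
    FROM Table1
    INNER JOIN Table2 ON substr(Table1.ID, -3) = substr(Table2.ID, -3)
    ORDER BY Table1.ID, Table2.ID
    """
).fetchall()

for first_id, second_id in matches:
    print(first_id, "<->", second_id)
```

Keep in mind that matching on three letters only can pair up registrations that differ in their first letter, so the result is worth a manual check.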
My first MySQL project.
I am migrating a FileMaker DB to MySQL and having trouble with how to efficiently handle duplicate field (column) names from 3 left joined tables, combined with the fact that each table is supplying a large number of columns (50+). I understand the concept of aliasing columns. Is there a better way than to create several hundred alias lines to handle each column from each table? I've searched the site and not found a discussion of handling a large number of columns, which is common in FileMaker DBs... perhaps not in MySQL.
Current code is below, where I created the aliases for only ONE (WebPublish) of the ~50 fields for each of the 3 joined tables:
$query = "SELECT
Artwork.WebPublish as Artwork_WebPublish,
Artist.WebPublish as Artist_WebPublish,
Location.WebPublish as Location_WebPublish
FROM Review
LEFT JOIN Artwork ON Review._kf_ArtworkID = Artwork.__kp_ArtworkID
LEFT JOIN Artist ON Review._kf_ArtistID = Artist.__kp_ArtistID
LEFT JOIN Location ON Review._kf_LocationID = Location.__kp_LocationID
WHERE __kp_ReviewID = ?";
This query produces the desired response for one column from each joined table:
Array
(
[Artwork_WebPublish] => Yes
[Artist_WebPublish] => No
[Location_WebPublish] => Maybe
)
The question is whether I need to expand the aliases the long way to include 49 times more data.
Thanks for your help.
No, there's no SQL syntax for giving column aliases in a "batch" mode, for example applying the table name as a prefix to all columns (by the way, SQLite does support that feature by default).
One way to solve this is to refer to columns by ordinal position instead of by name, in whatever language you use to fetch the results.
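As a sketch of the ordinal-position approach, the following uses Python's sqlite3 module rather than PHP and MySQL, with a heavily simplified, hypothetical version of the schema (only an id and a WebPublish column per table). In PHP the same idea corresponds to fetching numerically indexed rows instead of associative arrays:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: the real tables have 50+ columns each.
conn.executescript(
    """
    CREATE TABLE Artwork (id INTEGER PRIMARY KEY, WebPublish TEXT);
    CREATE TABLE Artist (id INTEGER PRIMARY KEY, WebPublish TEXT);
    CREATE TABLE Review (id INTEGER PRIMARY KEY, artwork_id INTEGER, artist_id INTEGER);
    INSERT INTO Artwork VALUES (1, 'Yes');
    INSERT INTO Artist VALUES (1, 'No');
    INSERT INTO Review VALUES (1, 1, 1);
    """
)

cur = conn.execute(
    """
    SELECT Artwork.WebPublish, Artist.WebPublish
    FROM Review
    LEFT JOIN Artwork ON Review.artwork_id = Artwork.id
    LEFT JOIN Artist ON Review.artist_id = Artist.id
    WHERE Review.id = ?
    """,
    (1,),
)
row = cur.fetchone()

# Rows come back as tuples, so each value is addressed by its position in the
# SELECT list; no aliases are needed even though both columns share a name.
artwork_publish = row[0]
artist_publish = row[1]
print(artwork_publish, artist_publish)
```

The trade-off is that the code now depends on the column order of the SELECT list, so adding or reordering columns means updating the positions.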
Another solution is to define your tables with distinct column names so you avoid the name conflict. Some SQL identifiers, for example constraint names, are already required to be unique within the database they reside in, not only unique within a table. It may be a naming convention you want to use to apply the same rule to column names.
I have three tables called: users, facilities, and staff_facilities.
users contains average user data, the most important fields in my case being users.id, users.first, and users.last.
facilities also contains a fair amount of data, but none of it is necessarily pertinent to this example except facilities.id.
staff_facilities consists of staff_facilities.id (int,auto_inc,NOT NULL), staff_facilities.users_id (int,NOT NULL), and staff_facilities.facilities_id (int,NOT NULL). (That's a mouthful!)
staff_facilities references the ids for the other two tables, and we are calling this table to look up users' facilities and facilities' users.
This is my select query in PHP:
SELECT users.id, users.first, users.last FROM staff_facilities LEFT JOIN users ON staff_facilities.users_id=users.id WHERE staff_facilities.facilities_id=$id ORDER BY users.last
This query works great on our development server, but when I drop it into the client's production environment, blank rows often appear in the result set. Our development server is using the replicated tables and data that already exist on the client's production server, but the hardware and software vary quite a bit.
These rows are devoid of any information, including the three id fields that require NOT NULL values to be entered into the database. Running the query through the MySQL management tools on the backend returns the same results. Searching the table for NULL fields has not turned up anything.
The other strange thing is that the number of empty rows is changing based on the varying results caused by the WHERE clause id check. It's usually around one to three empty rows, but they are consistent when using the same parameter.
I've many times dealt with the returning of nearly duplicate rows due to LEFT JOINS, but I've never had this happen before. As far as displaying the information goes, I can easily hide it from the end user. My concern is primarily that this problem will be compounded as time passes and the number of records grows larger. As it sits, this system has just been installed, and we already have 2000+ records in the staff_facilities table.
Any insight or direction would be appreciated. I can provide further, more detailed examples and information as well.
You are only selecting columns from the table on the right side of the join. Because you did a LEFT JOIN, every staff_facilities row that passes the WHERE clause is returned, even when it matches no row in users, and for those rows every users column is NULL. Since you aren't returning any columns from the left table, you see no data at all in those rows. If you only want rows that have a matching user, use an INNER JOIN instead.
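The effect is easy to reproduce. The sketch below uses Python's sqlite3 module with a cut-down version of the two tables and made-up sample rows, one of which points at a user id that does not exist (an assumption about what the production data looks like):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE staff_facilities (
        id INTEGER PRIMARY KEY,
        users_id INTEGER NOT NULL,
        facilities_id INTEGER NOT NULL
    );
    INSERT INTO users VALUES (1, 'Ann', 'Lee');
    INSERT INTO staff_facilities VALUES (1, 1, 7);
    INSERT INTO staff_facilities VALUES (2, 99, 7);  -- there is no user 99
    """
)

query = """
    SELECT users.id, users.first, users.last
    FROM staff_facilities
    {join} JOIN users ON staff_facilities.users_id = users.id
    WHERE staff_facilities.facilities_id = ?
    ORDER BY users.last
"""

# LEFT JOIN keeps the orphaned staff_facilities row; its users columns are NULL.
left_rows = conn.execute(query.format(join="LEFT"), (7,)).fetchall()
# INNER JOIN drops rows without a matching user.
inner_rows = conn.execute(query.format(join="INNER"), (7,)).fetchall()

print(left_rows)
print(inner_rows)
```

Adding staff_facilities.users_id to the SELECT list of the LEFT JOIN version is a quick way to see which ids have no matching user.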