How can I efficiently compare string values from two tables in Access? - ms-access

I have created a VBA function in Access 2010 to compare a list of terms in one table against a list of terms in another table. If the values are alike (not necessarily an exact match), I sum the value from a column from the second table for each match. TableA has approximately 150 terms. TableB has approximately 50,000 terms with a related count (integer).
Example tables:
TableA       TableB
---------    ----------
ID           ID
Term         Term
             Count
I have a simple SQL query which calls a VBA function to compare the terms and SUM the count if they have a fuzzy match.
SQL:
SELECT TableA.[Term], TermCheck(TableA.[Term]) AS [Term Count] FROM TableA ORDER BY 2 DESC;
VBA:
Option Compare Database

Public Function TermCheck(Term As Variant) As Long
    Dim rst As DAO.Recordset
    Dim ttl As Long

    Set rst = CurrentDb.OpenRecordset("TableB", dbOpenDynaset)
    ttl = 0
    With rst
        Do While Not .EOF
            ' Fields(1) is Term, Fields(2) is Count
            If .Fields(1) Like "*" & Term & "*" Then
                ttl = ttl + .Fields(2)
            End If
            .MoveNext
        Loop
    End With
    rst.Close
    Set rst = Nothing
    ' No need to close CurrentDb: it refers to the database that is already open.
    TermCheck = ttl
End Function
The issue I have is that it uses about 50% of my CPU and I'd like to make it as lightweight as possible. Is there a more efficient way to accomplish this task using Access? Moving to a purely SQL alternative is not an option at this point, although it would make me happier. I'm not an Access or VBA guru, but feel that I'm missing something obvious in my query that would improve performance.
EDIT:
The expected result would list out all terms in TableA with a sum of the count column from TableB where a fuzzy match occurred.
Example Data:
TableA
-------------
ID  Term
1   blah
2   foo
3   bar
4   zooba

TableB
-------------
ID  Term   Count
1   blah   16
2   blah2  9
3   foo    7
4   food   3
5   bar    3
Example result:
Term Count
---------------------
blah 25
foo 10
bar 3
zooba 0

SELECT
    TableA.Term,
    Nz(subq.SumOfCount, 0) AS [Count]
FROM
    TableA
LEFT JOIN
    (
        SELECT a.Term, Sum(b.Count) AS SumOfCount
        FROM TableA AS a, TableB AS b
        WHERE b.Term ALike '%' & a.Term & '%'
        GROUP BY a.Term
    ) AS subq
    ON TableA.Term = subq.Term;
Edit: I used ALike and the standard ANSI wildcard character. That allows the query to run correctly regardless of whether it is run from SQL-89 or SQL-92 mode. If you prefer the * wildcard, use this version of the WHERE clause:
WHERE b.Term Like '*' & a.Term & '*'
Note that it will only do the matching correctly when run from SQL-89 mode.
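As a rough cross-check of the join logic (outside Access itself), here is the same shape of query run against SQLite from Python. The Count column is renamed Cnt because COUNT is a function name, and SQLite uses % and || where Access SQL uses * and &:

```python
import sqlite3

# In-memory database seeded with the example data from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (ID INTEGER, Term TEXT);
    CREATE TABLE TableB (ID INTEGER, Term TEXT, Cnt INTEGER);
    INSERT INTO TableA VALUES (1,'blah'),(2,'foo'),(3,'bar'),(4,'zooba');
    INSERT INTO TableB VALUES (1,'blah',16),(2,'blah2',9),(3,'foo',7),
                              (4,'food',3),(5,'bar',3);
""")

# Same shape as the Access query: LEFT JOIN onto a subquery that sums
# TableB.Cnt for every fuzzy (substring) match, with 0 where nothing matched.
rows = con.execute("""
    SELECT TableA.Term, COALESCE(subq.SumOfCnt, 0) AS Cnt
    FROM TableA
    LEFT JOIN (
        SELECT a.Term, SUM(b.Cnt) AS SumOfCnt
        FROM TableA AS a, TableB AS b
        WHERE b.Term LIKE '%' || a.Term || '%'
        GROUP BY a.Term
    ) AS subq
    ON TableA.Term = subq.Term
    ORDER BY TableA.ID
""").fetchall()
print(rows)  # [('blah', 25), ('foo', 10), ('bar', 3), ('zooba', 0)]
```

This reproduces the expected result table above: one row per TableA term, summing 16+9 for blah, 7+3 for foo, and 0 for zooba.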

On these lines?
SELECT ta.ID, tb.Term, ta.Term, tb.Count
FROM ta, tb
WHERE ta.Term Like "*" & tb.term & "*";
ID  tb.Term  ta.Term  Count
2   hat      hat      2
3   hat      the hat  2
3   the hat  the hat  4
4   mat      mat      6
5   mat      matter   6
5   matter   matter   8

I typically build an expression using IIf:
TestFlag: IIf([TableA]![Term] = [TableB]![Term], "Same", "Different")

Related

MS Access Restart Number Sequence

I'm trying to do a sequence count in MS Access where the count sequence resets based on another field. In the example below, I'm trying to figure out ColB:
ColA ColB
4566 1
5677 1
5677 2
5677 3
8766 1
8766 2
1223 1
I think it might have something to do with the DCount() function, but I'm unsure. Would very much appreciate the help. Thanks!
Calculating a group sequence number in an Access query is a fairly common topic. It requires a unique identifier field; an AutoNumber should serve.
Using DCount():
SELECT *, DCount("*", "table", "ColA=" & [ColA] & " AND ID<" & ID) + 1 AS GrpSeq FROM table;
Or with correlated subquery:
SELECT *, (SELECT Count(*) FROM table AS D WHERE D.ColA=table.ColA AND D.ID<table.ID)+1 AS GrpSeq FROM table;
An alternative to calculating in query is to use RunningSum property of textbox on a Report with Sorting & Grouping settings.
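The correlated-subquery version is easy to sanity-check outside Access. A minimal SQLite/Python sketch of the same idea, assuming an AutoNumber-style ID column:

```python
import sqlite3

# Sample data from the question, with an autoincrementing ID as the
# unique identifier the answer calls for.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (ID INTEGER PRIMARY KEY AUTOINCREMENT, ColA INTEGER);
    INSERT INTO t (ColA) VALUES (4566),(5677),(5677),(5677),(8766),(8766),(1223);
""")

# Correlated subquery: count earlier rows in the same ColA group, plus one.
rows = con.execute("""
    SELECT ColA,
           (SELECT COUNT(*) FROM t AS d
            WHERE d.ColA = t.ColA AND d.ID < t.ID) + 1 AS GrpSeq
    FROM t
    ORDER BY ID
""").fetchall()
print(rows)
# [(4566, 1), (5677, 1), (5677, 2), (5677, 3), (8766, 1), (8766, 2), (1223, 1)]
```

The GrpSeq column matches the ColB the question asks for: it restarts at 1 each time ColA changes.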

MySQL match area code only when given the full number

I have a database that lists a few area codes, area code + office code combinations, some whole numbers, and an action. I want it to return a result based on the digits given, but I am not sure how to accomplish it. I have some MySQL knowledge, but it's not very deep.
Here is an example:
match      | action
-----------|--------
234        | goto 1
333743     | goto 2
8005551212 | goto 3
234843     | goto 4
I need to query the database with a full 10 digit number -
query 8005551212 gives "goto 3"
query 2345551212 gives "goto 1"
query 3337431212 gives "goto 2"
query 2348431212 gives "goto 4"
This would be similar to the LIKE selection, but I need to match against the database value instead of the query value. Matching the full number is easy,
SELECT * FROM database WHERE `match` = 8005551212;
The number to query will always be 10 digits, so I am not sure how to format the SELECT statement to differentiate a match of 234XXXXXXX from 234843XXXX, as I can only have one match returned. Basically, if it does not match all 10 digits, it should check 6 digits, then 3 digits.
I hope this makes sense. I do not have any other way to format the number, and it has to be accomplished with a single SQL query returned over an ODBC connection in Asterisk.
Try this
SELECT `match`, action FROM mytable WHERE '8005551212' LIKE CONCAT(`match`, '%')
The issue is that you will get two rows in some cases, given your data, so order by the length of the match and keep the best one:
SELECT action
FROM mytable
WHERE '8005551212' LIKE CONCAT(`match`, '%')
ORDER BY LENGTH(`match`) DESC LIMIT 1
That should get the row with the most digits matched. Note that MATCH is a reserved word in MySQL, so the column name needs backticks.
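A minimal sketch of this longest-prefix idea, using SQLite from Python (table and column names taken from the question; double quotes used where MySQL would use backticks for the reserved word match):

```python
import sqlite3

# Routing table from the question: a prefix (or full number) and an action.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE routes ("match" TEXT, action TEXT);
    INSERT INTO routes VALUES ('234','goto 1'),('333743','goto 2'),
                              ('8005551212','goto 3'),('234843','goto 4');
""")

def lookup(number):
    # Longest prefix wins: order the candidate matches by prefix length
    # and keep only the first row.
    row = con.execute("""
        SELECT action FROM routes
        WHERE ? LIKE "match" || '%'
        ORDER BY LENGTH("match") DESC
        LIMIT 1
    """, (number,)).fetchone()
    return row[0] if row else None

print(lookup('2348431212'))  # goto 4 -- 234843 beats the shorter 234
print(lookup('2345551212'))  # goto 1
```

This reproduces all four example lookups from the question, including the tricky case where both 234 and 234843 match and the longer prefix must win.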
try this:
SELECT * FROM (
    SELECT 3 AS score, r.* FROM mytable r WHERE `match` LIKE CONCAT(SUBSTRING('1234567890',1,3),'%')
    UNION ALL
    SELECT 6 AS score, r.* FROM mytable r WHERE `match` LIKE CONCAT(SUBSTRING('1234567890',1,6),'%')
    UNION ALL
    SELECT 10 AS score, r.* FROM mytable r WHERE `match` LIKE CONCAT(SUBSTRING('1234567890',1,10),'%')
) AS tmp
ORDER BY score DESC
LIMIT 1;
What ended up working -
SELECT `function`,`destination`
FROM reroute
WHERE `group` = '${ARG2}'
AND `name` = 0
AND '${ARG1}' LIKE concat(`match`,'%')
ORDER BY length(`match`) DESC LIMIT 1

How to index a wide table of booleans

My question is about building indexes when your client is using a lot of little fields.
Consider a search of the following (I can't change it; this is what the client is providing):
SKU zone1 zone2 zone3 zone4 zone5 zone6 zone7 zone8 zone9 zone10 zone11
A123 1 1 1 1 1 1 1 1
B234 1 1 1 1 1 1 1
C345 1 1 1 1 1 1
But it is much wider, and there are many more categories than just Zone.
The user will be looking for skus that match at least one of the selected zones. I intend to query this with (if the user checked "zone2, zone4, zone6")
select SKU from TABLE1 where (1 IN (zone2,zone4,zone6))
Is there any advantage to indexing with a multi tiered index like so:
create index zones on table1 (zone1,zone2,zone3,zone4,zone5,zone6,zone7,zone8,zone9,zone10,zone11)
Or will that only be beneficial when the user checked zone1?
Thanks,
Rob
You should structure the data as:
create table SKUZones (
    Sku int not null,
    zone varchar(255)
)
It would be populated with one row for each place where a SKU has a 1. Queries can then take great advantage of an index on SKUZones(zone). A query such as:
select SKU
from SKUZones
where zone in ('zone2', 'zone4', 'zone6');
will readily take advantage of an index. However, if the data is not structured in a way appropriate for a relational database, then it is much harder to make queries efficient.
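A minimal sketch of this normalized design, using SQLite from Python with hypothetical (Sku, zone) rows, since the question's 0/1 flags per column aren't fully specified:

```python
import sqlite3

# One row per SKU/zone pair that was flagged 1, with an index on zone
# so the IN (...) lookup can use it.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SKUZones (Sku TEXT NOT NULL, zone TEXT NOT NULL);
    CREATE INDEX idx_zone ON SKUZones (zone);
    INSERT INTO SKUZones VALUES
        ('A123','zone1'),('A123','zone2'),('A123','zone4'),
        ('B234','zone3'),('B234','zone6'),
        ('C345','zone7');
""")

# "SKUs that match at least one of the selected zones":
rows = con.execute("""
    SELECT DISTINCT Sku FROM SKUZones
    WHERE zone IN ('zone2', 'zone4', 'zone6')
    ORDER BY Sku
""").fetchall()
print(rows)  # [('A123',), ('B234',)]
```

DISTINCT matters here: a SKU flagged in several of the selected zones would otherwise appear once per matching row.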
One approach you could take if you can add a column to the table is the following:
Add a new column called zones or something similar.
Use a trigger to populate it with values for each "1" in the columns (so "zone3 zone4 zone5 . . ." for the first row in your data).
Build a full text index on the column.
Run your query using MATCH ... AGAINST.
Indexing boolean values is almost always useless.
What if you use a SET datatype? Or BIGINT UNSIGNED?
Let's talk through how to do it with an INT of a suitable size, named zones:
zone1 is the bottom bit (1<<0 = 1)
zone2 is the next bit (1<<1 = 2)
zone3 is the next bit (1<<2 = 4)
zone4 is the next bit (1<<3 = 8)
etc.
where (1 IN (zone2,zone4,zone6)) becomes
where (zones & 42) != 0.
To check for all 3 zones being set: where (zones & 42) = 42.
As for indexing, no index will help this design; there will still be a table scan.
If there are 11 zones, then SMALLINT UNSIGNED (2 bytes) will suffice. This will be considerably more compact than other designs, hence possibly faster.
For this query, you can have a "covering" index, which helps some:
select SKU from TABLE1 where (zones & 42) != 0   -- with INDEX(zones, SKU)
(Edit)
42 = 32 | 8 | 2 = zone6 | zone4 | zone2 -- where | is the bitwise OR operator.
& is the bitwise AND operator. See http://dev.mysql.com/doc/refman/5.6/en/non-typed-operators.html
(zones & 42) = 42 effectively checks that all 3 of those bits are "on".
(zones & 42) = 0 effectively checks that all 3 of those bits are "off".
In both cases, it is ignoring the other bits.
42 could be represented as ((1<<5) | (1<<3) | (1<<1)). Because of precedence rules, I recommend using more parentheses than you might think necessary.
1 << 5 means "shift" 1 by 5 bits.
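The bit arithmetic itself is easy to sanity-check in plain code. A small Python sketch, with hypothetical zone assignments per SKU:

```python
# Each zone is one bit: zone1 = 1<<0, zone2 = 1<<1, ..., zone11 = 1<<10.
def zone_mask(*zone_numbers):
    mask = 0
    for n in zone_numbers:
        mask |= 1 << (n - 1)
    return mask

# The user checked "zone2, zone4, zone6":
wanted = zone_mask(2, 4, 6)
print(wanted)  # 42 = 2 | 8 | 32

# Hypothetical zone sets per SKU, for illustration only.
skus = {
    'A123': zone_mask(1, 2, 3),
    'B234': zone_mask(4, 5),
    'C345': zone_mask(7, 8),
}

# Equivalent of: WHERE (zones & 42) != 0  -- any selected zone set
any_match = [sku for sku, zones in skus.items() if zones & wanted]
print(any_match)  # ['A123', 'B234']

# Equivalent of: WHERE (zones & 42) = 42  -- all selected zones set
all_match = [sku for sku, zones in skus.items() if zones & wanted == wanted]
print(all_match)  # []
```

A123 matches via zone2, B234 via zone4, and C345 (zones 7 and 8) matches nothing, which mirrors the two WHERE forms described above.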

Conditional counting of consecutive rows

For a given current client I am trying to find how many consecutive years they have renewed a policy with us. My thinking on how to do this is to match a field in the current row with the previous row. I'm trying to write a function for this, but if there is an easier way please let me know. Here is what I have for the function:
Option Compare Database

' Renewal Count: returns count of consecutive renewals
Public Function RenewCount(strLocationID As Integer, _
                           strQuoteID As Integer, _
                           strOriginalQuoteID As Variant) As String
    Static strLastLocationID As Integer
    Static strLastQuoteID As Integer
    Static strCount As Integer

    If strLocationID = strLastLocationID And strOriginalQuoteID = strLastQuoteID Then
        strCount = strCount + 1
    Else
        strLastLocationID = strLocationID
        strLastQuoteID = strQuoteID
        strCount = 0
    End If
    RenewCount = strCount
End Function
Here is a little sample of the data
LocationID QuoteID OriginalQuoteID
2 1094117
2 1125718 1094117
2 1148296 1125718
2 1176466 1148296
5 1031892
5 1044976 1031892
5 1059216 1044976
5 1077463 1059216
There are also dates for each policy that i can manipulate as well.
My idea would be to have the following and just find the max of the last column for each location.
LocationID QuoteID OriginalQuoteID Renewal_Count
2 1125718 1094117 0
2 1148296 1125718 1
2 1176466 1148296 2
5 1031892 0
5 1044976 1031892 1
5 1059216 1044976 2
5 1077463 1059216 3
5 1098124 1077463 4
5 1100215 0
5 1198714 1100215 1
5 1254125 1198714 2
Any help would be appreciated. Thanks
I forgot to mention that this has been sorted on LocationID and that OriginalQuoteID will be null for any new policy. When I try to run the function I get #Num! in the column for a majority of the rows. What I want is, for a given QuoteID, the number of consecutive renewals there have been. So in the above, for LocationID 5 and QuoteID 1254125 there have been 2 renewals.
This might work (it appears to in my tests): create a cartesian product of the table on itself, link the QuoteID to the OriginalQuoteID, and count.
SELECT T1.LocationID,
T1.QuoteID,
T1.OriginalQuoteID
FROM Table1 T1, Table1 T2
WHERE T1.QuoteID = T2.OriginalQuoteID
To get the count per LocationID (add 1 to count the row with the null field):
SELECT COUNT(*)+1
FROM Table1 T1, Table1 T2
WHERE T1.QuoteID = T2.OriginalQuoteID
GROUP BY T1.LocationID
or if you prefer joins:
SELECT COUNT(*)+1
FROM Table1 T1 INNER JOIN Table1 T2 ON T1.QuoteID = T2.OriginalQuoteID
GROUP BY T1.LocationID
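The self-join count can be sketched in SQLite from Python with the sample rows from the question (the +1 counts the original policy, whose OriginalQuoteID is null):

```python
import sqlite3

# Renewal chains from the question: each renewal's OriginalQuoteID
# points at the QuoteID of the policy it replaced.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Policies (LocationID INT, QuoteID INT, OriginalQuoteID INT);
    INSERT INTO Policies VALUES
        (2, 1094117, NULL),    (2, 1125718, 1094117),
        (2, 1148296, 1125718), (2, 1176466, 1148296),
        (5, 1031892, NULL),    (5, 1044976, 1031892),
        (5, 1059216, 1044976), (5, 1077463, 1059216);
""")

# Self-join QuoteID onto OriginalQuoteID: each joined row is one renewal
# link, so COUNT(*) + 1 is the chain length per location.
rows = con.execute("""
    SELECT T1.LocationID, COUNT(*) + 1 AS ChainLen
    FROM Policies T1
    INNER JOIN Policies T2 ON T1.QuoteID = T2.OriginalQuoteID
    GROUP BY T1.LocationID
    ORDER BY T1.LocationID
""").fetchall()
print(rows)  # [(2, 4), (5, 4)]
```

Each location has a four-policy chain: three renewal links plus the original null-OriginalQuoteID policy, hence the +1.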

Record edited twice if edited column has index

TableA has one column 'fielda' of type Long.
There are three records in the table with values 3,4 and 5 respectively.
After running the code below the values should be 18, 19 and 20. This is the case if there isn't an index on fielda, but if there is, then the values will be 33, 19 and 20.
One record gets edited twice. Is this a bug in DAO or is this normal behaviour?
Dim MyDB As DAO.Database
Dim rs As DAO.Recordset
Dim s1 As String

s1 = "select * from tableA"
Set MyDB = OpenDatabase(DBAddress)
Set rs = MyDB.OpenRecordset(s1)
If Not rs.BOF Or Not rs.EOF Then
    rs.MoveFirst
    Do While Not rs.EOF
        rs.Edit
        rs.Fields("fielda").Value = rs.Fields("fielda").Value + 15
        rs.Update
        rs.MoveNext
    Loop
End If
While I was unable to recreate the behaviour you describe, I can offer one possible explanation. As you step through the records you may hit the same record more than once if the Recordset periodically checks for changes that may have been made to the underlying table by other users.
Say your Recordset starts out as
3 4 5
and you update the first record so the table now contains
18 4 5
if the Recordset then tries to "refresh" itself and the index on [fielda] controls the order in which the records appear in the Recordset it could end up being
3 4 5 18
and if it continues updating until .EOF the final result could be
3 19 20 33
Possible workarounds would be to
create the Recordset with a SQL statement that includes an ORDER BY clause on some other field so the order of the records will not change as you modify them, or
apply the update via SQL, e.g. UPDATE tableA SET fielda = fielda + 15
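The second workaround is easy to demonstrate outside DAO: a single set-based UPDATE touches each row exactly once, so an index on the updated column cannot cause a row to be revisited mid-loop. A small SQLite/Python sketch with the values from the question:

```python
import sqlite3

# tableA with an index on fielda, as in the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (fielda INTEGER);
    CREATE INDEX idx_fielda ON tableA (fielda);
    INSERT INTO tableA VALUES (3), (4), (5);
""")

# One set-based statement instead of a row-by-row Edit/Update loop.
con.execute("UPDATE tableA SET fielda = fielda + 15")

rows = [r[0] for r in con.execute("SELECT fielda FROM tableA ORDER BY fielda")]
print(rows)  # [18, 19, 20]
```

Every row is incremented exactly once, matching the 18, 19, 20 the question expects rather than the 33, 19, 20 produced by the double-edited record.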