I am trying to create a function which, after having search the entire active worksheet, will return the total number of cells that contain a certain string. A lot like how the "x cell(s) found" within Find and Replace works.
I have this so far:
Function FINDIST(stringToFind)
Dim counter As Integer: counter = 0
For Each Cell In ActiveSheet.UsedRange.Cells
If InStr (Cell, stringToFind) > 0
Then counter = counter + 1
End If
Next
End Function
Another way of doing this:
Function FINDIST(stringToFind) As Long
FINDIST = Evaluate("SUM(IFERROR(SEARCH(" & Chr(34) _
& "*" & stringToFind & "*" & Chr(34) & "," _
& ActiveSheet.UsedRange.Address & ",1),0))")
End Function
This searches for stringToFind in every cell in used range, and returns an array with a 1 if that string is found in a cell and error if it is not found. The error is casted to zero with the IFERROR part, and the SUM sums the resulting binary array.
This will count only once the occurrence of stringToFind within each cell, even if it occurs more than once, but looking at your code I assume that this is what you are looking for.
I hope it helps!
UPDATE
Out of curiosity, I did some testing to see how the two approaches compare (read from range directly vs using evaluate). Here is the code I used:
Option Explicit
Private Declare Function GetTickCount Lib "kernel32.dll" () As Long
Sub test()
Dim ticks As Long
Range("A1:AA100000").Value = "adlrkjgalbabyajglakrjg"
ticks = GetTickCount
FINDIST1 ("baby")
Debug.Print "Read from range: ", GetTickCount - ticks
ticks = GetTickCount
FINDIST ("baby")
Debug.Print "Evaluate: ", GetTickCount - ticks
End Sub
Function FINDIST(stringToFind) As Long
FINDIST = Evaluate("SUM(IFERROR(SEARCH(" & Chr(34) _
& "*" & stringToFind & "*" & Chr(34) & "," _
& ActiveSheet.UsedRange.Address & ",1),0))")
End Function
Function FINDIST1(stringToFind) As Long
Dim counter As Long: counter = 0
Dim c As Range
Dim firstAddress As String
With ActiveSheet.UsedRange
Set c = .Find(stringToFind, LookIn:=xlValues, LookAt:=xlPart)
If Not c Is Nothing Then
firstAddress = c.Address
Do
counter = counter + 1
Set c = .FindNext(c)
Loop While Not c Is Nothing And c.Address <> firstAddress
End If
End With
FINDIST1 = counter
End Function
UPDATE 2
Chris Nielsen made two very good points in the comments below:
ActiveSheet.Evaluate is faster than Application.Evaluate. The link to the text by Charles Williams in the comment explains this behavior.
Good old Variant array will perform better than any other method.
For completeness, I post the version of the variant array method that I tested:
Function FINDIST_looping(stringToFind) As Long
Dim vContents, lRow As Long, lCol As Long, lCounter As Long
vContents = ActiveSheet.UsedRange.Value2
For lRow = LBound(vContents, 1) To UBound(vContents, 1)
For lCol = LBound(vContents, 2) To UBound(vContents, 2)
lCounter = IIf(InStr(vContents(lRow, lCol), stringToFind), _
lCounter + 1, lCounter)
Next lCol
Next lRow
FINDIST_looping = lCounter
End Function
Doug Glancy made another very good point, namely that COUNTIF can be used instead of SEARCH. This leads to a non-array formula solution and should dominate my original formula, performance-wise.
Here is Doug's formula:
FINDIST_COUNTIF = ActiveSheet.Evaluate("COUNTIF(" _
& ActiveSheet.Cells.Address & "," & Chr(34) & "*" _
& stringToFind & "*" & Chr(34) & ")")
In fact, Doug's point implies that no Evaluate() is necessary. We can call Countif from the WorksheetFunction object. Therefore, if the goal is to call this function from a spreadsheet, there is no need to use Evaluate() or to wrap it up in a UDF - it is a typical COUNTIF application with wildcards.
Results:
Read from range: 247,495 ms (~ 4 mins 7 secs)
Application.Evaluate: 3,261 ms (~ 3.2 secs)
Variant Array: 1,706 ms (~ 1.7 secs)
ActiveSheet.Evaluate: 1,257 ms (~ 1.3 secs)
ActiveSheet.Evaluate (DG): 602 ms (~ 0.6 secs)
WorksheetFunction.CountIf (DG):550 ms (~ 0.55 secs)
It appears that Application.Evaluate is about 75 times faster compared to using Range.Find()(?!) Also, the original code (with Integer changed to Long) runs in ~8 seconds.
Also, it seems that Activesheet.Evaluate is actually faster than the Variant array in this particular case. The difference between calling CountIf as a WorksheetFunction method vs Evaluateing it seems quite small.
CAVEAT: the frequency by which stringToFind is found within the UsedRange might affect the relative performance of the several methods. I ran the Activesheet.Evaluate and Variant Array methods with the above range (A1:AA100000) but with only the ten first cells having the matching string.
Results (average of 6 runs, variance pretty much minimal):
Activesheet.Evaluate: 920 ms (~ 1. sec)
Variant Array: 1654 ms (~ 1.7 secs)
This is interesting - it seems that ActiveSheet.Evaluate in this case has a slightly better performance than variant arrays (unless I have done something horrible in the looping code, in which case please let me know). Also, the Variant method's performance is actually.. invariant with respect to the string's frequency.
runs were made on EXCEL 2010 under Win7.
Working with Tony Dallimore's advice of using Find and also changing your return type to Long.
MSDN article: http://msdn.microsoft.com/en-us/library/office/ff839746(v=office.15).aspx
Function FINDIST(stringToFind) As Long
Dim counter As Long: counter = 0
Dim c As Range
Dim firstAddress As String
With ActiveSheet.UsedRange
Set c = .Find(stringToFind, LookIn:=xlValues, , LookAt:=xlPart)
If Not c Is Nothing Then
firstAddress = c.Address
Do
counter = counter + 1
Set c = .FindNext(c)
Loop While Not c Is Nothing And c.Address <> firstAddress
End If
End With
FINDIST = counter
End Function
Find is generally quicker than a coded equivalent but I've not speed tested against anything else and will be interested to here if it is quick or slow.
Related
I'm trying to compare two different columns in Access 2016. Both contain people's names, but not in the same order in both.
Name
Harris, Keisha
Banaghan, John
Garcia Silveira, Ana
NameFormat2
Keisha Melinda Harris
John C. Banaghan
Ana Lucia Garcia Silveira
Currently, when comparing these two columns, they all flag, as they are not exact matches.
The names in the first column are always shorter than the FULL name that lives in the second column.
Is there a way, be it a query or VBA or any other option, to say that if all of the names in the first column are contained in the second column, then it's a match and to move on?
Any help would be greatly appreciated!
Thanks,
Robert
Not sure if this is the best way, but you could create a VBA function that does the check by splitting the two fields into arrays and looping them. Something like this seems to work:
Function fCompareNames(strName1 As String, strName2 As String) As Boolean
On Error GoTo E_Handle
Dim astrName1() As String
Dim astrName2() As String
Dim intLoop1 As Integer
Dim intLoop2 As Integer
Dim intNames As Integer
Dim intMatch As Integer
strName1 = Replace(strName1, ",", "")
strName2 = Replace(strName2, ",", "")
astrName1 = Split(strName1, " ")
astrName2 = Split(strName2, " ")
intNames = UBound(astrName1) - LBound(astrName1) + 1
For intLoop1 = LBound(astrName1) To UBound(astrName1)
For intLoop2 = LBound(astrName2) To UBound(astrName2)
If astrName1(intLoop1) = astrName2(intLoop2) Then
intMatch = intMatch + 1
Exit For
End If
Next intLoop2
Next intLoop1
If intMatch = intNames Then fCompareNames = True
fExit:
On Error Resume Next
Exit Function
E_Handle:
MsgBox Err.Description & vbCrLf & vbCrLf & "fCompareNames", vbOKCancel + vbCritical, "Error: " & Err.Number
Resume fExit
End Function
You may need to add some extra Replaces to deal with things like "." in the name.
In the examples given, it returns true for the first two (Keisha Harris and John Banaghan) and false for the last (Ana Silveira) as there appears to be a spelling mistake in the second instance of her surname (Silbeira).
Regards,
I'm doing a Boolean comparison of a single precision number passed as an argument to a single precision field in a DAO recordset. The argument is 0.85 and the field is 0.85, yet VBA says the first is less than the second. I've looked at these in the Watch window, in break mode, with Debug.Print statements and they show up as the same number everywhere. Yet VBA doesn't return TRUE when I compare A = B.
The pertinent code is below my signature.
TIA,
Keith
Dim SQL As String
SQL = "SELECT cmpAttainPct, cmpPayPct, cmpPayAmt" _
& " FROM tblCompPlans" _
& " WHERE [cmpPerson]=""" & DMName & """" _
& " AND [cmpPeriod]=""" & BiPeriod & """" _
& " ORDER BY [cmpAttainPct];"
Dim rsCompPlan As DAO.Recordset
Set rsCompPlan = db.OpenRecordset(SQL)
' Round the percent to two decimal places.
' Didn't use the VBA Round() function which uses Banker's rounding,
' That rounds .5 either up or down, whichever will result in
' an even number." (Microsoft Round() help)
' I.E.: .645 = .64 and .655 = .66
' This does standard 4/5 rounding.
AttainPct = Int(AttainPct * 1000) / 1000
Dim ThousandsPlaceValue As Single
ThousandsPlaceValue = (AttainPct * 100) - Int((AttainPct * 100))
AttainPct = (Int(AttainPct * 100) / 100)
AttainPct = AttainPct + IIf(ThousandsPlaceValue < 0.5, (-0.01), (0.01))
...
Do Until .EOF
Debug.Print AttainPct & " attainment equals " & !cmpAttainPct & " comp tier (" & (AttainPct = !cmpAttainPct) & ")"
Select Case True
Case (AttainPct < !cmpAttainPct)
TempAmt = PrevAmt
TierFound = True
.MoveLast
Case (AttainPct = !cmpAttainPct)
' Equal - tier found.
TempAmt = !cmpPayAmt
TierFound = True
.MoveLast
Case Else
PrevAmt = !cmpPayAmt
End Select
First, never use Single for calculations except if there is a very specific and good reason for doing so. If Currency can hold your values (four decimals maximum) use, else Decimal, and Double as the last option.
Second, Int does not do a 4/5 rounding, it rounds down. The simplest bug-free and true 4/5 rounding towards zero you can do, is with Format:
RoundedValue = CCur(Format(ValueToRound, "0.00"))
For a full set of rounding functions, see:
CodeProject: Rounding Values
or:
Experts-Exchange: Rounding values
With AttainPct rounded and converted to, say, Currency, you can reliably use:
Case (AttainPct < CCur(!cmpAttainPct))
and:
Case (AttainPct = CCur(!cmpAttainPct))
Computers are not very good at comparing floating point numbers for equality. Instead of
if a = b
use
if abs(a-b) < 0.0001
set the second part to as small as is necessary to make sure a small enough variation still counts as "equal".
how do I remove duplication of objects in a Collection? This is what I tried:
dim unique_students as Collection
dim no_duplicate_student as cls_Student
dim no_duplication as boolean
For Each student as cls_Student In list_Student 'list_Students = original unsorted collection
no_duplication = True
Dim s As cls_Student
For Each s In unique_students
If s.name = student.name Then
no_duplication = False 'Duplication found
Exit For
End If
next s
If no_duplication Then
'Inserted into new sorted collection if no values matches that of the sorted collection
Set no_duplicate_student = New clsOverlap
no_duplicate_student.name = student.name
unique_students.Add no_duplicate_student
End If
Next student
This however still takes a long time (if list_Student.Count > 5000, then it'll take 30min+ to run). Is there a more efficient way (if possible, decrease time complexity) of removing duplication in a Collection?
Add the student names to a dictionary, which has the .Exists method to check if an item is already in the dictionary.
You can get some ideas from CollectionToDictionary in Collection And Dictionary Procedures
Something like this in your For Each student loop:
If Dict.Exists(Key:=student.name) Then
' is duplicate!
Else
Dict.Add Key:=student.name, Item:=student.name
' you could also do Item:=student if you want the de-duplicated list in a dictionary
End If
The inner loop is not needed. The function will probably run almost instantaneously.
I usually use a dictionary like what Andre451 suggested. Alternatively you could use an ArrayList like this. I'm not sure if there's much of a performance difference between the two but this method also produces a sorted list if that's desirable. The dictionary though can carry key/value pairs, so it just depends on what you're going after.
Sub Demo()
Set AL = CreateObject("System.Collections.ArrayList")
AL.Add "A"
AL.Add "B"
AL.Add "A"
AL.Add "A"
AL.Add "C"
'Sorting allows sequential comparisons to determine uniqueness
'You could also do something similar to the dictionary method with ArrayList.Contains
'but the evluation of ArrayList.Contains runs slower than this
AL.Sort
For i = 0 To AL.Count - 2
If AL(i) <> AL(i + 1) Then
'Prints unique values
Debug.Print AL(i)
End If
Next
If AL(i) <> AL(i - 1) Then
'Prints last value if unique by comparing to one before it
Debug.Print AL(i)
End If
End Sub
Edit: After testing I confirmed that the dictionary method is about twice as fast at 7.7 seconds versus 13 seconds per million. However, at the OP count of 5000 the difference is only 40 vs 80 ms.
Testing Code Here...
Public Declare Function GetTickCount Lib "kernel32.dll" () As Long
Sub DictionaryDemo()
Set D = CreateObject("Scripting.Dictionary")
Set AL = CreateObject("System.Collections.ArrayList")
For i = 0 To 10 ^ 6
AL.Add Round(Rnd * 10, 0)
Next
Start = GetTickCount
For i = 0 To AL.Count - 1
If Not (D.Exists(AL(i))) Then
D.Add AL(i), ""
Debug.Print AL(i)
End If
Next
Debug.Print GetTickCount - Start
End Sub
Sub ArrayListDemo()
Set AL = CreateObject("System.Collections.ArrayList")
For i = 0 To 10 ^ 6
AL.Add Round(Rnd * 10, 0)
Next
'Sorting allows sequential comparisons to determine uniqueness
Start = GetTickCount
AL.Sort
For i = 0 To AL.Count - 2
If AL(i) <> AL(i + 1) Then
'Prints unique values
Debug.Print AL(i)
End If
Next
If AL(i) <> AL(i - 1) Then
'Prints last value if unique by comparing to one before it
Debug.Print AL(i)
End If
Debug.Print GetTickCount - Start
End Sub
Edit Again: Ok, so I find this very interesting. What's most important appears to be the actual type itself. So for example, the testing above creates an ArrayList from which unique values are to be derived. If this is changed to a basic integer array Dim AL(10 ^ 6) As Integer, then the time is slashed from 7.7 to 0.8 seconds. Likewise, the ArrayList method can be slashed from 13 seconds down to 0.5 seconds simply by adding the line A = AL.ToArray after the sorting operation and looping through the array A instead.
This makes sense, since the memory allocation for arrays allows them to be processed really quick. It's also why some people prefer to create their own sorting and uniqueness algorithms rather than go with a less efficient but easy to use method employing a Dictionary or ArrayList as originally suggested here. The dictionary and ArrayLists are still powerful tools and, as mentioned above, they can still extract unique values from a 1 million length in a fraction of a second, but it's worth noting that when it comes to raw efficiency a simple array is wicked fast at looping.
The code below will extract unique values from a 1 million length array in about 0.3 seconds. It's not much different than the OP, but it's a lot more efficient. This is because looping through a collection is horribly slow, not because there was anything inefficient with the basic strategy. Also, notice that the efficiency will decrease as the number of unique values increases (this test only used the 10 unique of 1-10).
Sub ArrayDemo()
Dim A(10 ^ 6) As Integer
Dim B(10) As Integer
For i = 0 To 10 ^ 6
A(i) = Round(Rnd * 10, 0)
Next
Start = GetTickCount
k = 0
For i = 0 To 10 ^ 6
For j = 0 To k
If B(j) = A(i) Then GoTo nxt
Next
B(k) = A(i)
Debug.Print B(k)
k = k + 1
nxt:
Next
Debug.Print GetTickCount - Start
End Sub
I'm working on combining two excel worksheets. Before I start, I'd like to mention that I also have mysql workbench, so I'm open to working on this issue in either sql or vba (I should learn both). I'm working with .bed files, which are lists of genomic coordinates. In short, the data is indexed by chromosome number (ie:chr2) and then has a numerical start and stop location on the chromosome. These numerical locations can span a large range (ie:100-10,000) or be a single position (ie: 999-1000). I have a list of coordinates that cover a large range, and in a separate file I have a list of single positions.
Example of a file with ranges:
chromosome start stop
chr1 4561 6321
chr3 9842 11253
Example of file with single positions:
chromosome start stop
chr1 5213 5214
chr3 10254 10255
I would like to combine these worksheets such that if a location in my list of single positions is found within the range in my list of ranges, the locations for both are listed in the same row. The lists are 1000s of locations long, so I'd also like this program to loop through every row. Using the example data listed above, I'd like my output to look like the following:
Example of desired output:
chromosome start stop chromosome start stop
chr1 4561 6321 chr1 5213 5214
chr3 9842 11253 chr3 10254 10255
There is a high probability that multiple single positions will fall within a single range, and I would like these to be listed as separate rows.
I appreciate any help I can get! Thank you in advance. I am eager to learn!
Here's a basic outline which queries two tables on sheets named "Ranges" and "Positions", and outputs the results on a sheet named"Results"
The input tables should have headers, and start in the top-left cell (A1)
Sub SqlJoin()
Dim oConn As New ADODB.Connection
Dim oRS As New ADODB.Recordset
Dim sPath
Dim sSQL As String, wb As Workbook
Set wb = ThisWorkbook
sSQL = " select a.chromosome, a.start, a stop," & _
" b.chromosome, b.start, b.stop " & _
" from <ranges_table> a, <positions_table> b" & _
" where b.start >= a.start and b.stop <= a.stop"
sSQL = Replace(sSQL, "<ranges_table>", _
Rangename(wb.Worksheets("Ranges").Range("A1").CurrentRegion))
sSQL = Replace(sSQL, "<positions_table>", _
Rangename(wb.Worksheets("Positions").Range("A1").CurrentRegion))
If wb.Path <> "" Then
sPath = wb.FullName
Else
MsgBox "The workbook must be saved first!"
Exit Sub
End If
oConn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" & sPath & "';" & _
"Extended Properties='Excel 12.0;HDR=Yes;IMEX=1';"
oRS.Open sSQL, oConn
If Not oRS.EOF Then
wb.Worksheets("Results").Range("A2").CopyFromRecordset oRS
Else
MsgBox "No records found"
End If
oRS.Close
oConn.Close
End Sub
Function Rangename(r As Range) As String
Rangename = "[" & r.Parent.Name & "$" & _
r.Address(False, False) & "]"
End Function
The following code is working fine, however it is not rounding the stored values to the nearest pence, for example 8.025 is coming up instead of 8.01 can anyone suggest a fix?
Public Function Fs_Update_AccInvoices_Nexum() As Boolean
Dim adoRsInvoiceDCID As New ADODB.Recordset
Dim adoRsNexumInvoices As New ADODB.Recordset
On Error Resume Next
adoRsInvoiceDCID.Open "SELECT * FROM [tInvoiceDCID] where Issued=0" _
, CurrentProject.Connection, 2, 2
While Not adoRsInvoiceDCID.EOF
adoRsNexumInvoices.Open "SELECT * FROM [tPrintInvoiceNumbersNexum] " _
& " WHERE InvoiceID=" & adoRsInvoiceDCID("InvoiceID") _
, CurrentProject.Connection, 2, 2
If Not adoRsNexumInvoices.EOF Then
DoCmd.SetWarnings off
DoCmd.RunSQL "Update [Acc Invoices t Nexum] " _
& " SET [Total Due] = Round((Fees/0.8)+(VAT/0.8)+OutLays,2)" _
& " Fees = Round(Fees/0.8,2), VAT = Round(Vat/0.8,2)" _
& " WHERE Invoice=" & adoRsNexumInvoices("PrintingasINVOICE")
End If
adoRsNexumInvoices.Close
adoRsInvoiceDCID.MoveNext
Wend
adoRsInvoiceDCID.Close
End Function
Cheers
Ross
Quick note:
I've noticed some inaccuracies in vba's rounding function which the format function doesn't fix. In my particular case, I was trying to round the number 3687.23486
round(3687.23486) = 3687.23
format(3687.23486, "#.00") = 3687.23
under the traditional round to nearest rules, this should result in 3687.24
I've seen several custom functions posted to various forums to address rounding problems, but none worked for me, so I wrote my own.
Function trueRound(ByVal varNumber As Variant, ByVal intDecimals As Integer) As Double
If IsNull(varNumber) Then
trueRound = 0
Exit Function
End If
Dim decimals As Integer, testNumber As Double
decimals = 0
If InStr(varNumber, ".") > 0 Then decimals = Int(Len(varNumber)) - Int(Len(Fix(varNumber)) + 1)
If decimals = 0 Or intDecimals > decimals Then
trueRound = varNumber
Exit Function
End If
Do Until Len(varNumber) - Len(Fix(varNumber)) - 1 <= intDecimals
testNumber = varNumber * 10 ^ (decimals - 1)
varNumber = Round(testNumber, 0) / 10 ^ (decimals - 1)
decimals = decimals - 1
Loop
trueRound = varNumber
End Function
I hashed it out pretty quick, so there's no error handling, and a null value passed to the function results in 0, which may not be ideal for all situations. I use this regularly in some pretty large queries, hope it can help someone else.
"The Round function performs round to even, which is different from round to larger."
--Microsoft
Debug.Print Round(19.955, 2)
'Answer: 19.95
Debug.Print Format(19.955, "#.00")
'Answer: 19.96
See also How to Round in MS Access, VBA