Image scraping in access - ms-access

I have generated a Data Matrix barcode using Access VBA. Now I need to scrape the generated picture into my form each time. Here is the code. The first half generates the barcode; the second half, which should load the picture into the Access form, is not working.
Private Sub Command24_Click()
Dim IE As New InternetExplorer
IE.Visible = True
IE.Navigate ("https://barcode.tec-it.com/en/DataMatrix?data=" & Forms!QRcodes!Text)
Do
DoEvents
Loop Until IE.ReadyState = READYSTATE_COMPLETE
StatusBar = "loading webpage...."
Dim img As Object
Dim html As HTMLDocument
Set html = IE.Document
Dim elementcol As Object, link As Object
Set elementcol = html.getElementsByTagName("img")
Dim doc As Object
Forms!QRcodes!OLEBound34 = elementcol
'Set img = ele.getElementsByTagName("img")
Set Tables!QRcodes!Matrix = elementcol
MsgBox "Getting Code"
End Sub

Grab the skeleton for the full URL to only the generated picture (as displayed on the site) - for example:
https://barcode.tec-it.com/barcode.ashx?data=12345678&code=DataMatrix&multiplebarcodes=false&translate-esc=false&unit=Fit&dpi=96&imagetype=Gif&rotation=0&color=%23000000&bgcolor=%23ffffff&qunit=Mm&quiet=0&dmsize=Default
Then you can use my code and demo published in one of my articles:
Show pictures directly from URLs in Access forms and reports
The two main functions download pictures directly; this one caches the download:
' Download a file or a page with public access from the web as a cached file of Internet Explorer.
' Returns the full path of the cached file if success, an empty string if not.
'
' Examples:
'
' Download a file:
' Url = "https://www.codeproject.com/script/Membership/ProfileImages/%7Ba82bcf77-ba9f-4ec3-bbb3-1d9ce15cae23%7D.jpg"
' Result = DownloadCacheFile(Url)
' Result -> C:\Users\UserName\AppData\Local\Microsoft\Windows\INetCache\IE\B2IHEJQZ\{a82bcf77-ba9f-4ec3-bbb3-1d9ce15cae23}[2].png
'
' Download a page:
' Url = "https://www.codeproject.com/Tips/1022704/Rounding-Values-Up-Down-By-Or-To-Significant-Figur?display=Print"
' Result = DownloadCacheFile(Url)
' Result -> C:\Users\UserName\AppData\Local\Microsoft\Windows\INetCache\IE\B2IHEJQZ\Rounding-Values-Up-Down-By-Or-To-Significant-Figur[1].htm
'
' 2017-05-25. Gustav Brock, Cactus Data ApS, CPH.
'
Public Function DownloadCacheFile( _
ByVal Url As String) _
As String
Const BufferLength As Long = 1024
Const BindFDefault As Long = 0
Const ErrorNone As Long = 0
Dim FileName As String
Dim LocalFileName As String
Dim Result As Long
' Create buffer for name of downloaded and/or cached file.
FileName = Space(BufferLength - 1) & vbNullChar
' Download file or page.
' Return name of cached file in parameter FileName.
Result = URLDownloadToCacheFile(0, Url & vbNullChar, FileName, BufferLength, BindFDefault, 0)
' Trim file name.
LocalFileName = Split(FileName, vbNullChar)(0)
DownloadCacheFile = LocalFileName
End Function
I tested with a few URLs, and the codes were displayed at once:
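As a minimal sketch of how the picture URL could be built from the form's text, the helper below swaps the `data` value into the URL skeleton shown above. The names `EncodeAscii` and `BuildDataMatrixUrl` are hypothetical, and the encoder assumes the barcode data is plain ASCII text.

```vba
Option Explicit

' Percent-encodes every character except unreserved ASCII ones.
' Assumption: Text holds plain ASCII characters only.
Public Function EncodeAscii(ByVal Text As String) As String
    Dim Index As Long
    Dim Character As String
    Dim Encoded As String

    For Index = 1 To Len(Text)
        Character = Mid$(Text, Index, 1)
        If Character Like "[A-Za-z0-9._~-]" Then
            Encoded = Encoded & Character
        Else
            Encoded = Encoded & "%" & Right$("0" & Hex$(Asc(Character)), 2)
        End If
    Next
    EncodeAscii = Encoded
End Function

' Builds the URL of the generated picture only, using the
' query string of the example URL with the data value replaced.
Public Function BuildDataMatrixUrl(ByVal Data As String) As String
    Const BaseUrl As String = "https://barcode.tec-it.com/barcode.ashx"
    Const Options As String = _
        "&code=DataMatrix&multiplebarcodes=false&translate-esc=false" & _
        "&unit=Fit&dpi=96&imagetype=Gif&rotation=0&color=%23000000" & _
        "&bgcolor=%23ffffff&qunit=Mm&quiet=0&dmsize=Default"

    BuildDataMatrixUrl = BaseUrl & "?data=" & EncodeAscii(Data) & Options
End Function
```

In the form's button click event, something like `Me!Picture0.Picture = DownloadCacheFile(BuildDataMatrixUrl(Nz(Me!Text.Value)))` could then display the code. Here `Picture0` is a hypothetical name for an unbound Image control, and `DownloadCacheFile` is assumed to sit in a standard module together with its API declaration from the article.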

Related

Importing data from a hyperlink on a webpage in excel using macros

I want to import some data from the website https://www.amfiindia.com/nav-history-download. On this page there is a link, "Download Complete NAV Report in Text Format", which gives me the required data. The link is not static, so I cannot hard-code it in VBA to download my data. How can I download data from a hyperlink on a webpage using Excel?
My approach is to first read the hyperlink into a variable and then use that variable to get the data:
First, get the hyperlink using the getElementsByTagName function as shown below.
Then use that as the URL to get the data.
But I get a type mismatch error when assigning my hyperlink to website, which is a String.
I don't know the type of href. The Watch window shows Variant, but declaring it as Variant still gives the error.
Kindly help me with this.
Sub webscraping()
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim price As Variant
Dim cellAddress As String
Dim rowNumber As Long
Dim ie As InternetExplorer
Dim ht As HTMLDocument
Dim hr As MSHTML.IHTMLElement
'Dim Hra As MSHTML.IHTMLElement
Set ie = New InternetExplorer
ie.Visible = True
ie.Navigate ("https://www.amfiindia.com/nav-history-download")
Do Until ie.ReadyState >= 4
DoEvents
Loop
Set ht = ie.Document
'MsgBox ht.getElementById("navhistorydownload")
Set hr = ht.getElementsByTagName("a")(18).href
' Website to go to.
website = StrConv(hr, vbUnicode)
' Create the object that will make the webpage request.
Set request = CreateObject("MSXML2.XMLHTTP")
' Where to go and how to go there - probably don't need to change this.
request.Open "GET", website, False
' Get fresh data.
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
' Send the request for the webpage.
request.send
' Get the webpage response data into a variable.
response = StrConv(request.responseBody, vbUnicode)
' Put the webpage into an html object to make data references easier.
html.body.innerHTML = response
' Get the price from the specified element on the page.
'price = html.getElementstagName("a").Item(0).innerText
cellAddress = Range("A" & Rows.Count).End(xlUp).Address
rowNumber = Range(cellAddress).Row
ThisWorkbook.Sheets(1).Cells(rowNumber + 1, 1) = response
' MsgBox rowNumber
' MsgBox cellAddress
' Output the price into a message box.
'MsgBox price
End Sub
If you don't know the type then you can use
?typename(ht.getElementsByTagName("a")(18).href)
in the immediate window.
It should be a string and declared as such.
Rather than indexing into an anchor collection I would grab by css selector
ht.querySelector(".nav-hist-dwnld a").href
This specifies the parent node with class name nav-hist-dwnld and then asks for the first child a tag.
The line website = StrConv(hr, vbUnicode) is not required; use the extracted href directly.
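Put together, the points above could look like the following sketch. It uses late binding so no library references are needed, declares the href as a String, and assumes the page still exposes the link under the `.nav-hist-dwnld` class and that Internet Explorer automation is available on the machine. The procedure name `DownloadNavReport` is hypothetical.

```vba
Option Explicit

Public Sub DownloadNavReport()
    Dim ie As Object
    Dim link As Object
    Dim website As String   ' href is a String, so declare it as such
    Dim request As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.Navigate "https://www.amfiindia.com/nav-history-download"
    Do While ie.Busy Or ie.ReadyState <> 4
        DoEvents
    Loop

    ' First <a> element inside the element with class nav-hist-dwnld.
    Set link = ie.Document.querySelector(".nav-hist-dwnld a")
    If Not link Is Nothing Then
        website = link.href
    End If
    ie.Quit

    If website = "" Then
        MsgBox "Download link not found."
        Exit Sub
    End If

    ' Use the extracted href directly; no StrConv is needed.
    Set request = CreateObject("MSXML2.XMLHTTP")
    request.Open "GET", website, False
    request.send

    ' Show the start of the response for inspection.
    Debug.Print Left$(request.responseText, 500)
End Sub
```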

Return URL From First Search Result

I have an Excel workbook of around 25,000 company keywords from which I'd like to get the company website URL.
I am looking to run a VBA script which can run these keywords as a Google search, and pull the URL of the first result into a spreadsheet.
I found a similar thread.
I found the results of this to be hit-and-miss; some keywords return the URL in the next column, others remain blank.
It also seemed to pull the URL of Google's optimised sub-links in the first search result rather than the main website URL: Google Search Result example
I then found the below code here which I ran on a sample list of 1,000 keywords. The author of this blog stipulates that this code works for Mozilla Firefox.
I tested IE code that he has also written but this did not achieve the same results (it was adding hyperlinks consisting of descriptive text from the search results rather than the raw URL).
The Firefox code worked until the 714th row, then returned an error message:
"Run time error 91: object variable or with block variable not set"
Spreadsheet layout showing successful results and row at which macro stopped
Sub GoogleURL()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object
Dim html As Object
Dim objResultDiv As Object
Dim objH As Object
lastRow = Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To lastRow
url = "https://www.google.co.uk/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH = objResultDiv.getelementsbytagname("h3")(0)
Cells(i, 2).Value = objH.innerText
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH = objResultDiv.getelementsbytagname("cite")(0)
Cells(i, 3).Value = objH.innerText
DoEvents
Next
End Sub
Firefox is a third-party browser and outside the scope of Microsoft support, so I can only help you check the VBA code for Internet Explorer.
You said that the VBA code given in this link for the IE browser writes the description as a hyperlink, whereas you need the description and the link stored in separate columns.
I modified that sample code to meet your requirement.
Here is the modified code:
Option Explicit
Const TargetItemsQty = 1 ' results for each keyword
Sub GWebSearchIECtl()
Dim objSheet As Worksheet
Dim objIE As Object
Dim x As Long
Dim y As Long
Dim strSearch As String
Dim lngFound As Long
Dim st As String
Dim colGItems As Object
Dim varGItem As Variant
Dim strHLink As String
Dim strDescr As String
Dim strNextURL As String
Set objSheet = Sheets("Sheet1")
Set objIE = CreateObject("InternetExplorer.Application")
objIE.Visible = True ' for debug or captcha request cases
y = 1 ' start searching for the keyword in the first row
With objSheet
.Select
.Range(.Columns("B:B"), .Columns("B:B").End(xlToRight)).Delete ' clear previous results
.Range(.Columns("C:C"), .Columns("C:C").End(xlToRight)).Delete ' clear previous results
.Range("A1").Select
Do Until .Cells(y, 1) = ""
x = 2 ' start writing results from column B
.Cells(y, 1).Select
strSearch = .Cells(y, 1) ' current keyword
With objIE
lngFound = 0
.navigate "https://www.google.com/search?q=" & EncodeUriComponent(strSearch) ' go to first search results page
Do
Do While .Busy Or Not .READYSTATE = 4: DoEvents: Loop ' wait IE
Do Until .document.READYSTATE = "complete": DoEvents: Loop ' wait document
Do While TypeName(.document.getelementbyid("res")) = "Null": DoEvents: Loop ' wait [#res] element
Set colGItems = .document.getelementbyid("res").getElementsByClassName("g") ' collection of search result [.g] items
For Each varGItem In colGItems ' process each item in collection
If varGItem.getelementsbytagname("a").Length > 0 And varGItem.getElementsByClassName("st").Length > 0 Then ' must have hyperlink and description
strHLink = varGItem.getelementsbytagname("a")(0).href ' get first hyperlink [a] found in current item
strDescr = GetInnerText(varGItem.getElementsByClassName("st")(0).innerHTML) ' get first description [span.st] found in current item
lngFound = lngFound + 1
'Debug.Print (strHLink)
'Debug.Print (strDescr)
With objSheet ' put result into cell
.Cells(y, x).Value = strDescr
.Hyperlinks.Add .Cells(y, x + 1), strHLink
.Cells(y, x).WrapText = True
x = x + 1 ' next column
End With
If lngFound = TargetItemsQty Then Exit Do ' continue with next keyword - necessary quantity of the results for current keyword found
End If
DoEvents
Next
If TypeName(.document.getelementbyid("pnnext")) = "Null" Then Exit Do ' continue with next keyword - no [a#pnnext.pn] next page button exists
strNextURL = .document.getelementbyid("pnnext").href ' get next page url
.navigate strNextURL ' go to next search results page
Loop
End With
y = y + 1 ' next row
Loop
End With
objIE.Quit
' google web search page contains the elements:
' [div#res] - main search results block
' [div.g] - each result item block within [div#res]
' [a] - hyperlink anchor(s) within each [div.g]
' [span.st] - description(s) within each [div.g]
' [a#pnnext.pn] - hyperlink anchor to the next search results page
End Sub
Function EncodeUriComponent(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
End If
EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function
Function GetInnerText(strText As String) As String
Static objHtmlfile As Object
If objHtmlfile Is Nothing Then
Set objHtmlfile = CreateObject("htmlfile")
objHtmlfile.Open
objHtmlfile.Write "<body></body>"
End If
objHtmlfile.body.innerHTML = strText
GetInnerText = objHtmlfile.body.innerText
End Function
Output in IE 11 browser:
You can run it on your side to see how it behaves with a large amount of data.
If you run into performance issues, I suggest trying it with a smaller batch of keywords.
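Run-time error 91, as reported in the question, is raised when a method is called on an object variable that is Nothing. In the XMLHTTP macro this would happen if a response contains no element with id `rso` (for instance a block or consent page), so `objResultDiv` stays unset. A minimal sketch of a guard, with the hypothetical helper name `FirstInnerText`, could be:

```vba
Option Explicit

' Returns the inner text of the first TagName element below Container,
' or an empty string if Container is Nothing or holds no such element.
Public Function FirstInnerText( _
    ByVal Container As Object, _
    ByVal TagName As String) _
    As String

    Dim Items As Object

    If Container Is Nothing Then Exit Function

    Set Items = Container.getElementsByTagName(TagName)
    If Items Is Nothing Then Exit Function
    If Items.Length = 0 Then Exit Function

    FirstInnerText = Items(0).innerText
End Function
```

Inside the loop, `Cells(i, 2).Value = FirstInnerText(html.getElementById("rso"), "h3")` would then leave the cell blank for such rows instead of stopping the macro.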

Downloading png to disk creates broken file

In my project, I want to download a png file from a url and save it to disk.
I have a url to an image, and I can load it in my web browser without any problem.
But when I use Access to download this file and save it, it saves "a" file, but it doesn't seem to have any image. Every file it creates is 167kb, and I cannot view them with my graphics viewers (XnViewMP, for example).
When I load the created file with PE Studio, it says the MZ signature is missing (I'm not sure that means anything).
I've tried this with an .ico file on my local webserver, too, and have the same issue.
Here is the code I'm using to download the image file.
Public Function DownloadFile(whaturl As String, whatdestination As String) As Boolean
Dim newfilepath
Dim success As Boolean
Dim WinHttpReq: Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
WinHttpReq.Open "GET", whaturl, False
WinHttpReq.Send
If WinHttpReq.Status = 200 Then
Dim oStream: Set oStream = CreateObject("ADODB.Stream")
oStream.type = 1 '1 is binary
oStream.Open
oStream.Write WinHttpReq.ResponseBody
oStream.SaveToFile whatdestination, 2 ' 1 = no overwrite, 2 = overwrite
oStream.Close
success = True
Else
success = False
End If
DownloadFile = success
End Function
Can someone tell me why this code fails to download the png file correctly?
You are making it too hard. Use a function like this:
Option Compare Database
Option Explicit
' API declarations.
'
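#If VBA7 Then
' Office 2010 and later, including 64-bit.
Private Declare PtrSafe Function URLDownloadToFile Lib "Urlmon" Alias "URLDownloadToFileA" ( _
ByVal pCaller As LongPtr, _
ByVal szURL As String, _
ByVal szFileName As String, _
ByVal dwReserved As Long, _
ByVal lpfnCB As LongPtr) _
As Long
#Else
Private Declare Function URLDownloadToFile Lib "Urlmon" Alias "URLDownloadToFileA" ( _
ByVal pCaller As Long, _
ByVal szURL As String, _
ByVal szFileName As String, _
ByVal dwReserved As Long, _
ByVal lpfnCB As Long) _
As Long
#End If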
' Download a file or a page with public access from the web.
' Returns 0 if success, error code if not.
'
' If parameter NoOverwrite is True, no download will be attempted
' if an existing local file exists, thus this will not be overwritten.
'
' Examples:
'
' Download a file:
' Url = "https://www.codeproject.com/script/Membership/ProfileImages/%7Ba82bcf77-ba9f-4ec3-bbb3-1d9ce15cae23%7D.jpg"
' FileName = "C:\Test\CodeProjectProfile.jpg"
' Result = DownloadFile(Url, FileName)
'
' Download a page:
' Url = "https://www.codeproject.com/Tips/1022704/Rounding-Values-Up-Down-By-Or-To-Significant-Figur?display=Print"
' FileName = "C:\Test\CodeProject1022704.html"
' Result = DownloadFile(Url, FileName)
'
' Error codes:
' -2146697210 "file not found".
' -2146697211 "domain not found".
' -1 "local file could not be created."
'
' 2004-12-17. Gustav Brock, Cactus Data ApS, CPH.
' 2017-05-25. Gustav Brock, Cactus Data ApS, CPH. Added check for local file.
' 2017-06-05. Gustav Brock, Cactus Data ApS, CPH. Added option to not overwrite the local file.
'
Public Function DownloadFile( _
ByVal Url As String, _
ByVal LocalFileName As String, _
Optional ByVal NoOverwrite As Boolean) _
As Long
Const BindFDefault As Long = 0
Const ErrorNone As Long = 0
Const ErrorNotFound As Long = -1
Dim Result As Long
If NoOverwrite = True Then
' Page or file should not be overwritten.
' Check that the local file exists.
If Dir(LocalFileName, vbNormal) <> "" Then
' File exists. Don't proceed.
Exit Function
End If
End If
' Download file or page.
' Return success or error code.
Result = URLDownloadToFile(0, Url & vbNullChar, LocalFileName & vbNullChar, BindFDefault, 0)
If Result = ErrorNone Then
' Page or file was retrieved.
' Check that the local file exists.
If Dir(LocalFileName, vbNormal) = "" Then
Result = ErrorNotFound
End If
End If
DownloadFile = Result
End Function
The function is taken from my article: Show pictures directly from URLs in Access forms and reports
(If you don't have an account, browse for the link: Read the full article)
Full code is also at GitHub: VBA.PictureUrl
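To check whether a saved file really is a PNG, its first eight bytes can be compared with the fixed PNG signature (89 50 4E 47 0D 0A 1A 0A). The sketch below is not part of the article's code; the function name `IsPngFile` is hypothetical. A failed check would suggest that something other than the picture, such as an HTML error page, was written to disk.

```vba
Option Explicit

' Returns True if the file starts with the 8-byte PNG signature.
Public Function IsPngFile(ByVal FileName As String) As Boolean
    Dim Signature(0 To 7) As Byte
    Dim Expected As Variant
    Dim FileNumber As Integer
    Dim Index As Long

    If Len(FileName) = 0 Then Exit Function
    If Dir(FileName, vbNormal) = "" Then Exit Function
    If FileLen(FileName) < 8 Then Exit Function

    Expected = Array(&H89, &H50, &H4E, &H47, &HD, &HA, &H1A, &HA)

    ' Read the first eight bytes of the file.
    FileNumber = FreeFile
    Open FileName For Binary Access Read As #FileNumber
    Get #FileNumber, 1, Signature
    Close #FileNumber

    For Index = 0 To 7
        If Signature(Index) <> Expected(Index) Then Exit Function
    Next

    IsPngFile = True
End Function
```

For example, `Debug.Print IsPngFile("C:\Test\picture.png")` in the Immediate window, where the path is only an illustration.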

External Images in Access Form

I have a dataset with an Image URL that links to a picture of the item in each record. I'd like to create a form so I can look at the data associated with the item and see a picture of it. I'm fairly new to Access and can't seem to figure out how to get the external image to show up.
Thanks for the help!
You can download the pictures transparently to the browser cache and then load them from the cache to be displayed:
' Download (picture) file from a URL of a hyperlink field to a
' (temporary) folder, and return the full path to the downloaded file.
'
' This can be used as the control source for a bound picture control.
' If no Folder is specified, the user's IE cache folder is used.
'
' Typical usage in the RecordSource for a form or report where Id is
' the unique ID and Url is the hyperlink field holding the URL to
' the picture file to be displayed:
'
' - to a cached file where parameter Id is not used:
'
' Select *, UrlContent(0, [Url]) As Path From SomeTable;
'
' - or, where Id is used to create the local file name:
'
' Select *, UrlContent([Id], [Url], "d:\somefolder") As Path From SomeTable;
'
' Then, set ControlSource of the bound picture control to: Path
'
' 2017-05-28. Gustav Brock, Cactus Data ApS, CPH.
'
Public Function UrlContent( _
ByVal Id As Long, _
ByVal Url As String, _
Optional ByVal Folder As String) _
As Variant
Const NoError As Long = 0
Const Dot As String = "."
Const BackSlash As String = "\"
Dim Address As String
Dim Ext As String
Dim Path As String
Dim Result As String
' Strip leading and trailing octothorpes from URL string.
Address = HyperlinkPart(Url, acAddress)
' If Address is a zero-length string, Url was not wrapped in octothorpes.
If Address = "" Then
' Use Url as is.
Address = Url
End If
If Folder = "" Then
' Import to IE cache.
Result = DownloadCacheFile(Address)
Else
If Right(Folder, 1) <> BackSlash Then
' Append a backslash.
Folder = Folder & BackSlash
End If
' Retrieve extension of file name.
Ext = StrReverse(Split(StrReverse(Address), Dot)(0))
' Build full path for downloaded file.
Path = Folder & CStr(Id) & Dot & Ext
If DownloadFile(Address, Path) = NoError Then
Result = Path
End If
End If
UrlContent = Result
End Function
The helper functions and the full explanation and a demo can be found here:
Show pictures directly from URLs in Access forms and reports
Code is also at GitHub: VBA.PictureUrl

I'm trying to extract data in the tag <a data-params> from Amazon.com website

I have a set of Amazon URLs (for example https://www.amazon.com/dp/B01LTIORC8) in column A in Excel, and I'm trying to extract the "B00M4L4MFC" part of href="/dp/B00M4L4MFC/ref=dp_cerb_1" into column B, using the HTML tag below.
<a data-params="/gp/cerberus/log/click/mid/ATVPDKIKX0DER/asin/B01LTIORC8/sub/B00M4L4MFC/pos/1/dev/WEB" class="a-link-normal cerberus-asin" href="/dp/B00M4L4MFC/ref=dp_cerb_1">
I found the below code online :
Sub GetAboutUsLinks()
Dim internet As Object
Dim html As Object
Dim myLinks As Object
Dim myLink As Object
Dim result As String
Dim myURL As String
Dim LastRow As Integer
Set internet = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
LastRow = Cells(Rows.Count, 1).End(xlUp).Row
'Loop through all the web links on the worksheet one by one and then do some things
For i = 2 To LastRow
'Get the link from the worksheet and assign it to the variable
myURL = Sheet1.Cells(i, 1).Value
'Now go to the website
internet.navigate myURL
'Keep the internet explorer visible
internet.Visible = True
'Ensure that the web page has downloaded completely
While internet.ReadyState <> 4
DoEvents
Wend
'Get the data from the web page that is in the links and assign it to the variable
result = internet.document.body.innerHTML
'create a new html file
Set html = internet.document
MsgBox html.DocumentElement.innerHTML
'CreateObject (“htmlfile”)
'now place all the data extracted from the web page into the new html document
html.body.innerHTML = result
Set myLinks = html.getElementsByTagName("a")
'loop through the collected links and get a specific link defined by the conditions
For Each myLink In myLinks
If Right$(myLink, 9) = "ref=dp_cerb_1" Then
Sheet1.Cells(i, 2).Value = myLink
End If
'go to the next link
Next myLink
'once the last web link on the sheet has been visited close the internet explorer
If i = LastRow Then
internet.Quit
End If
'go to the next web link on the worksheet
Next i
End Sub
If you have already downloaded the data (and it is sitting in the cells of your sheet), then you do not need to fire up the browser (nice syntax there by the way, GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}"), I will blog that!)
See this blog post for opening HTML already downloaded and parsing it.
Can you give this a try?
Sub GetAboutUsLinks()
Dim internet As Object
Dim html As Object
Dim myLinks As Object
Dim myLink As Object
Dim result As String
Dim myURL As String
Dim LastRow As Integer
Dim varAuxMyLink As Variant
Const LNG_INDEX_POSITION_ToGetMyLink = 2
Const STR_BREAKING_CHARACTER = "/"
Set internet = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
LastRow = Cells(Rows.Count, 1).End(xlUp).Row
'Loop through all the web links on the worksheet one by one and then do some things
For i = 2 To LastRow
'Get the link from the worksheet and assign it to the variable
myURL = Sheet1.Cells(i, 1).Value
'Now go to the website
internet.navigate myURL
'Keep the internet explorer visible
internet.Visible = True
'Ensure that the web page has downloaded completely
While internet.ReadyState <> 4
DoEvents
Wend
'Get the data from the web page that is in the links and assign it to the variable
result = internet.document.body.innerHTML
'create a new html file
Set html = internet.document
MsgBox html.DocumentElement.innerHTML
'CreateObject (“htmlfile”)
'now place all the data extracted from the web page into the new html document
html.body.innerHTML = result
Set myLinks = html.getElementsByTagName("a")
'loop through the collected links and get a specific link defined by the conditions
For Each myLink In myLinks
If Right$(myLink, 13) = "ref=dp_cerb_1" Then
' Split the link into an array on the "/" character
varAuxMyLink = VBA.Split(myLink, STR_BREAKING_CHARACTER)
' Check that the variable is an array
If VBA.IsArray(varAuxMyLink) Then
' Get the third element of the array
Sheet1.Cells(i, 2).Value = varAuxMyLink(LBound(varAuxMyLink) + LNG_INDEX_POSITION_ToGetMyLink)
' Reset the variable
Let varAuxMyLink = Empty
End If
End If
'go to the next link
Next myLink
'once the last web link on the sheet has been visited close the internet explorer
If i = LastRow Then
internet.Quit
End If
'go to the next web link on the worksheet
Next i
End Sub
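Taking the third element after splitting on "/" only works for a relative href such as /dp/B00M4L4MFC/ref=dp_cerb_1; if the browser returns an absolute URL, the segments shift. As an alternative sketch (not the code above), the hypothetical helper `AsinFromHref` below takes whatever follows the "/dp/" marker, assuming the link always contains that marker:

```vba
Option Explicit

' Returns the path segment that follows "/dp/" in a link,
' or an empty string if the marker is not present.
Public Function AsinFromHref(ByVal Href As String) As String
    Const Marker As String = "/dp/"

    Dim StartPos As Long
    Dim Rest As String

    StartPos = InStr(1, Href, Marker, vbTextCompare)
    If StartPos = 0 Then Exit Function

    ' Text after the marker, cut at the next slash.
    Rest = Mid$(Href, StartPos + Len(Marker))
    AsinFromHref = Split(Rest & "/", "/")(0)
End Function
```

Inside the loop this could be used as `Sheet1.Cells(i, 2).Value = AsinFromHref(myLink.href)`.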