I must write a program that periodically reads a web page and copies certain data from a table on that page to an Excel spreadsheet. I don't know where to start or what programming language is suitable for this project. I know a little C++ and Matlab programming. Can anyone offer advice to point me in the right direction or suggest open source projects which do something similar?
I can use wget (Linux) or fget1 (MATLAB) to download the web pages, but I don't know how to save certain data from the source of these pages into Excel.
I will assume you have room to learn C#. Since you have to extract the table from a web page, you need a library or framework for web browsing, such as WatiN. Once you have the table, it's a matter of saving it to an Excel spreadsheet. For convenience, you can write CSV (comma-separated text), which Excel can open directly. Hope it helps.
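To illustrate the "extract the table, then write CSV" idea independently of C#/WatiN, here is a minimal Python sketch using only the standard library. It assumes the page has already been downloaded (for example with wget) and that its cells have explicit closing tags; nested tables, or a page with several tables where only one is wanted, would need extra filtering.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside a cell (including inside <b>, <a>, ...) is kept
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)  # quotes cells that contain commas
    return out.getvalue()


if __name__ == "__main__":
    sample = ("<table><tr><th>Name</th><th>Price</th></tr>"
              "<tr><td>Widget</td><td>1,50</td></tr></table>")
    print(table_to_csv(sample))
```

Writing the result to a `.csv` file (with `newline=""` when opening it) gives a file Excel can open without any extra library.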
I used the following VB.NET code (with Html-Agility-Pack) to parse multiple HTML tables from a saved web page into a DataTable (the tables must have the same structure) and save it to an XML file:
Imports System.Collections.Generic
Imports System.Data
Imports System.IO
Imports System.Linq
Imports System.Net
Imports HtmlAgilityPack

Public Sub ParseHtmlTable(ByVal HtmlFilePath As String)
    ' Read the saved HTML file through a file:// request
    Dim req As FileWebRequest = CType(WebRequest.Create("file:///" & HtmlFilePath), FileWebRequest)
    req.Method = "GET"
    Dim htmlText As String
    Using res As FileWebResponse = CType(req.GetResponse(), FileWebResponse)
        Using webStreamReader As New StreamReader(res.GetResponseStream())
            htmlText = webStreamReader.ReadToEnd()
        End Using
    End Using

    Dim htmldoc As New HtmlDocument()
    htmldoc.LoadHtml(htmlText)

    ' Every row of every table on the page
    Dim nodes As HtmlNodeCollection = htmldoc.DocumentNode.SelectNodes("//table/tr")
    If nodes Is Nothing Then Return

    Dim dtTable As New DataTable("Table1")
    ' The first row holds the <th> header cells
    Dim Headers As List(Of String) = nodes(0).Elements("th").Select(Function(x) x.InnerText.Trim()).ToList()
    For Each Hr In Headers
        dtTable.Columns.Add(Hr)
    Next

    ' Data rows; header rows have no <td> cells and are skipped
    For Each node As HtmlNode In nodes
        Dim Row = node.Elements("td").Select(Function(x) x.InnerText.Trim()).ToArray()
        If Row.Length > 0 Then dtTable.Rows.Add(Row)
    Next

    dtTable.WriteXml("G:\1.xml", XmlWriteMode.WriteSchema)
End Sub
After that, import the file into Excel.
Read this article to learn how to import XML into Excel.
Hope it helps.
Related
The XML file should be created once an hour. Once created, it needs to be sent through an API, which uses an API key for authentication. Before the XML file can be created, the program must connect to a database to get the data. I need this functionality in VB (.NET).
' At the top of the file:
Imports System.Collections
Imports System.Data
Imports System.IO
Imports System.Xml.Linq

Dim msgList As XElement
Dim xDoc As XDocument
Dim sw As New StringWriter()
Dim list As String = ""
Dim xmitList As String = ""
Dim xmitArray As New ArrayList
Dim xmlWriter As StringWriter = New StringWriter()
Dim SQLStatement As String = ""
Dim SQLErr As String = ""
Dim SQLResult As String = ""
Dim SQLResultTable As New DataTable
'********************
' Create the XML Document (XDeclaration sets the header information, and the XElement is the first node)
xDoc = New XDocument(
    New XDeclaration("1.0", "UTF-16", "yes"),
    New XElement("MessageList")
)
' Get the root node (<MessageList>) so we can add child nodes to it when getting the data to send.
' xDoc.Root is typed As XElement, so no cast is needed (FirstNode is an XNode).
msgList = xDoc.Root
This is as far as I got with creating the XML file. Before this step, the program needs to connect to the database to grab the data, and then start the process.
I tried creating the XML file but am still stuck on completing it. Apart from this, two major steps remain: connecting to the database and sending the XML file with an HTTP request.
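A minimal Python sketch of the three remaining steps (query the database, build the XML, POST it with the API key), repeated once an hour. It is not VB.NET; it only shows the flow. All specifics are hypothetical assumptions: sqlite3 stands in for the real database, the table is `messages(id, body)`, the endpoint is `https://example.com/api/messages`, and the key is sent in an `X-Api-Key` header. Check the real API's documentation for how it expects the key.

```python
import sqlite3
import time
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://example.com/api/messages"  # hypothetical endpoint
API_KEY_HEADER = "X-Api-Key"                   # hypothetical header name


def fetch_rows(conn):
    """Step 1: read the data to send (hypothetical 'messages' table)."""
    return conn.execute("SELECT id, body FROM messages ORDER BY id").fetchall()


def build_xml(rows):
    """Step 2: build <MessageList><Message id="...">body</Message>...</MessageList>."""
    root = ET.Element("MessageList")
    for msg_id, body in rows:
        ET.SubElement(root, "Message", id=str(msg_id)).text = body
    # xml_declaration requires Python 3.8+
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_request(xml_bytes, api_key):
    """Step 3: a POST request carrying the XML and the API key header."""
    return urllib.request.Request(
        API_URL,
        data=xml_bytes,
        method="POST",
        headers={"Content-Type": "application/xml", API_KEY_HEADER: api_key},
    )


def run_hourly(conn, api_key):
    """Repeat steps 1-3 every hour until the process is stopped."""
    while True:
        req = build_request(build_xml(fetch_rows(conn)), api_key)
        with urllib.request.urlopen(req) as resp:
            print("sent, HTTP status", resp.status)
        time.sleep(3600)
```

Usage would be `run_hourly(sqlite3.connect("data.db"), "your-api-key")`. A scheduler (cron, Windows Task Scheduler) running a single pass each hour is often more robust than a sleeping loop. In VB.NET the same pieces would likely be `SqlConnection`/`SqlCommand`, the `XDocument` you already started, and `HttpClient`.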
We are using the Azure APIM service with linked App Insights to log requests for analysis.
Because we are behind an Imperva WAF, the requests arrive from Imperva IPs, and the client's IP is carried in a request header, which we can see in the App Insights logs (Analytics).
So we get the data in CSV format, something like this:
"2019-10-06T17:21:20.2264252Z","|key",,"OPTIONS /Endpoint/","https://APIM/query",True,200,"0.3565","<250ms",request,"{""ApimanagementServiceName"":""APIM_name"",""ApimanagementRegion"":""Country"",""HTTP Method"":""OPTIONS"",""API Name"":""API"",""Cache"":""None"",""Request-Incap-Client-IP"":""2a:23c4:1c4c:6c00:a458_IP""}","{""Client Time (in ms)"":0,""Response Size"":334,""Request Size"":0}","OPTIONS /EndPoint/",Key,,,,,,,PC,,,"0.0.0.0",,,"Country",,"APIM Country","APIM Country","key","App Insights","key","apim:0.81.21.0","key",1
Now I want to extract the IP from the "Request-Incap-Client-IP" element, which is stored in the JSON object that starts with "ApimanagementServiceName".
I looked for help on the web, and all the answers talk about macros and custom code.
In my opinion, Excel should have a function to parse JSON and get the value from specific columns; I mean, the solution should be straightforward and simple.
Use this custom library to parse, read, and write JSON files and streams:
https://github.com/VBA-tools/VBA-JSON
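' Requires JsonConverter.bas and a reference to Microsoft Scripting Runtime
Dim fso As New FileSystemObject
Dim JsonTS As TextStream
Dim Json As Dictionary
Dim JsonText As String
Dim startPos As Long, endPos As Long
Set JsonTS = fso.OpenTextFile("yourfile.csv", ForReading)
JsonText = JsonTS.ReadAll
JsonTS.Close
'isolate the first JSON field of the CSV: it starts with "{ and ends with }"
startPos = InStr(JsonText, """{")
endPos = InStr(startPos, JsonText, "}""")
JsonText = Mid$(JsonText, startPos + 1, endPos - startPos)
'CSV escapes quotes by doubling them; undo that before parsing
JsonText = Replace(JsonText, """""", """")
Set Json = JsonConverter.ParseJson(JsonText)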
Then you can retrieve the value like this:
Dim ipValue as String
ipValue = Json("Request-Incap-Client-IP")
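Outside Excel, the same extraction can be sketched with Python's standard library. The point it shows is the order of operations: a CSV parser first undoes the doubled quotes (`""` becomes `"`), and only then is the field plain JSON. This sketch assumes one request per CSV line, as in the sample above, and identifies the right field by the `ApimanagementServiceName` key.

```python
import csv
import io
import json


def client_ips(csv_text, key="Request-Incap-Client-IP"):
    """Yield the client IP from the embedded JSON field of each CSV record."""
    for record in csv.reader(io.StringIO(csv_text)):
        for field in record:
            # csv.reader has already turned "" into " inside quoted fields
            if field.startswith("{") and "ApimanagementServiceName" in field:
                yield json.loads(field).get(key)
                break
```

For a whole export, `list(client_ips(open("yourfile.csv", newline="").read()))` returns one IP (or `None` if the header was missing) per record.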
My objective is to run a search for some data and return the results into an Excel table. I'm using the newsapi.org service and VBA to do this.
I'm sending an XMLHttpRequest to newsapi.org and successfully receiving a JSON response, which I can save to a file on my desktop. However, I cannot import that response into Excel: I receive run-time error 13: Type mismatch.
Bizarrely when I change my source to a different JSON file, it works. e.g. http://jsonplaceholder.typicode.com/users
So I'm assuming the issue is somewhere around the type of the JSON response I am receiving.
Public Sub xmlhttptutorial()
Dim xmlhttp As Object
Dim myurl As String
Dim JSON As Object
Dim myFile As String
Dim i As Integer
Dim ws As Worksheet
Set xmlhttp = CreateObject("MSXML2.XMLHTTP")
Set ws = Sheet2
myFile = "C:\Users\A0781525\Desktop\myFile.txt"
myurl = "https://newsapi.org/v2/everything?q=Ashley%20Madison%20Data%20Breach&"
xmlhttp.Open "GET", myurl, False
xmlhttp.Send
Set JSON = JsonConverter.ParseJson(xmlhttp.ResponseText)
Open myFile For Output As #1: Print #1, xmlhttp.ResponseText: Close #1
i = 2
For Each Item In JSON
Range("A2").Value = Item("articles")("0:")("source")("id:")
Range("A2").Value = Item("articles")("0:")("source")("name")
Range("A2").Value = Item("articles")("0:")("title")
i = i + 1
Next
End Sub
The break occurs at line:
Range("A2").Value = Item("articles")("0:")("source")("id:")
A sample of the JSON file output I receive:
{"status":"ok","totalResults":16,"articles":[{"source":{"id":"mashable","name":"Mashable"},"author":"Jack Morse","title":"Porn site leaks over a million users' private info","description":"The great thing about the internet is that no one has to know you have a serious thing for hentai pornography. Unless, that is, the porn site you have an account on leaks your personal information. Over a million Luscious.net account holders faced that unexpe…","url":"https://mashable.com/article/porn-site-leaks-users-data/","urlToImage":"https://mondrian.mashable.com/2019%252F08%252F20%252F24%252F62fc9aa277d54b2092a39393d2202a62.856fe.jpg%252F1200x630.jpg?signature=MBXieHs3n4uvowiVyV4K8cCO4j4=","publishedAt":"2019-08-20T22:36:24Z","content":"The great thing about the internet is that no one has to know you have a serious thing for hentai pornography. Unless, that is, the porn site you have an account on leaks your personal information. \r\nOver a million Luscious.net account holders faced that unex… [+2840 chars]"}
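You are parsing the JSON incorrectly, probably because of a misunderstanding of how it is structured. The top level is an object (a Dictionary in JsonConverter) whose "articles" key holds an array (a Collection) of article objects. Looping with For Each over the Dictionary itself gives you its keys, which are plain strings, so Item("articles") fails with a type mismatch. Loop over JSON("articles") instead.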
Try something like:
i = 2
'Cells.Clear
For Each item In JSON("articles")
Cells(i, 1).Value = item("source")("id")
Cells(i, 2).Value = item("source")("name")
Cells(i, 3).Value = item("title")
i = i + 1
Next
The problem is the way you are trying to access the parsed JSON elements.
Without knowing the exact structure of the JSON, the best I can do is assume that what you need is this:
Debug.Print JSON("articles")(1)("source")("id")
To access the first article's id.
or this
For Each item In JSON("articles")
Debug.Print item("source")("id")
Next item
to loop through them
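The same navigation, sketched in Python against the sample response shape shown in the question: top-level object, then the "articles" array, then each article's nested "source" object. Note that Python lists are 0-based, whereas JsonConverter collections are 1-based, which is why the VBA uses `JSON("articles")(1)` for the first article.

```python
import json


def article_rows(response_text):
    """Return (source id, source name, title) for each article, like the VBA loop."""
    data = json.loads(response_text)      # top level: an object (dict)
    rows = []
    for article in data["articles"]:      # "articles": an array (list) of objects
        source = article["source"]        # "source": a nested object
        rows.append((source["id"], source["name"], article["title"]))
    return rows
```

A `null` source id (which some sources may have) comes back as `None`, so check for it before writing to a cell.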
I'm trying to scrape elements from an XMLHTTP response.
I'm not too bad with VBA, but relatively new to data scraping.
I have previously been using IE (Internet Explorer automation).
I can import the HTML into a cell, but I would like to import specifically the name, id, price and stock level.
The code I'm using to import the data is
Private Sub HTML_VBA_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code
sURL = "https://www.superdrug.com/Make-Up/Lips/Lip-Kits/Flower-Beauty-Mix-N%27-Matte-Lipstick-Duo-Tickled-Pink-687/p/769466"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
'Get webpage data into Excel
sh02.Cells(1, 1) = sPageHTML
End Sub
Thanks in advance for any help received.
Ian
You cannot reliably extract the information from an XMLHTTP request issued against the URL you show, because the content is loaded by JavaScript, and that JavaScript does not run when you use XMLHTTP.
I'm not sure how sustainable the token is (the value used doesn't seem to matter), but you can join the product ID, which is the end of your URL, with the AJAX token present in the page, issue an XMLHTTP request with these as parameters, and parse the JSON response for the items of interest. I use JsonConverter.bas. After downloading and installing the .bas file, go to VBE > Tools > References and add a reference to Microsoft Scripting Runtime.
Some testing seems to indicate that any number can be used after the hyphen in place of the token, so you could generate a random number on the fly.
It's worth noting that you can comma-separate multiple products in the query string and thus do a bulk request. You would then need a For Each loop over the collection of dictionaries returned.
Option Explicit

' Requires JsonConverter.bas and a reference to Microsoft Scripting Runtime
Public Sub GetInfo()
    ' <product id>-<token>
    Const URL As String = "https://www.superdrug.com/micrositeProduct/bulk/769466-1548702898380"
    Dim json As Object, title As String, price As String, stocking As String, id As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .Send
        ' The response is a JSON array; take its first (1-based) element
        Set json = JsonConverter.ParseJson(.responseText)(1)
    End With
    title = json("name")
    price = json("price")("formattedValue") 'json("price")("value") for the raw number
    stocking = json("stockLevel")
    id = json("code")
    Debug.Print title, price, stocking, id
End Sub
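A minimal Python sketch of the same request logic, split into building the URL and reading the response; the network call itself is left out. It relies only on the observations in this answer, not on any documented Superdrug API: the path is `<product id>-<token>`, any number seems to be accepted as the token, and the response is a JSON array of product objects with `name`, `price.formattedValue`, `stockLevel` and `code`. Looping over the array is what the For Each over the returned collection would do for a bulk request.

```python
import json
import random

BULK_URL = "https://www.superdrug.com/micrositeProduct/bulk/"


def bulk_url(product_id, token=None):
    """Build '<base><product id>-<token>'.

    Per the answer's testing (an observation, not documented behavior),
    any number seems to work as the token, so a random 13-digit one is
    generated when none is given.
    """
    if token is None:
        token = random.randint(10**12, 10**13 - 1)
    return f"{BULK_URL}{product_id}-{token}"


def parse_products(response_text):
    """Return one dict per product in the JSON array, with the fields the VBA reads."""
    products = []
    for item in json.loads(response_text):
        products.append({
            "title": item["name"],
            "price": item["price"]["formattedValue"],
            "stocking": item["stockLevel"],
            "id": item["code"],
        })
    return products
```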
If you use a browser instead, the JSON string is present in one of the script tags (as its .innerHTML), and you can easily extract it from there.
I am using Lotus Notes forms as .html files, and I send values to the server as JSON using AngularJS. Now I also want to upload files. How can I send files to the server and extract them using LotusScript?
Can someone please help me?
I want something like the post below, but that is done in ASP.NET. I want to do the same using Lotus Notes.
File uploading angular js ASP .NET
index.html
<span ng-if="quests.type == '17'">
<input type="file" file-upload multiple id='{{quests.id}}'/>
</span>
<button type="button" ng-click="submitForm();">Submit</button>
The button above triggers the code below.
Angular Code to post to server
var email = document.getElementById("email").value;
var message = {"requesttype": "saveForm", "email": email, "username": username};
$http.post("http://test.com/ajaxprocess?openagent", message).success(success).error(failure);
The agent mentioned above (LotusScript) parses this JSON and saves the document as shown below.
ajaxprocess Agent code
Dim sess As New NotesSession
Dim docContext As NotesDocument
Dim doc As NotesDocument
Dim fieldname As String
Dim fieldtype As String
'getting document context
Set docContext = sess.DocumentContext
If docContext.hasItem("REQUEST_CONTENT") Or docContext.hasItem("REQUEST_CONTENT_000") Then
    'using OpenNTF LotusScript classes to parse the request into a JSON object
    Set userDataInfo = getJSONObjectFromDocument(docContext, "")
    'getting the fields array sent as a JSON array
    Dim fieldsobj As JSONArray
    Set fieldsobj = userDataInfo.GetItemValue("fields")
    'create the target document before writing items to it
    Set doc = sess.CurrentDatabase.CreateDocument
    ForAll Field In fieldsobj.Items
        fieldtype = Field.mGetItemValue("type")(0)
        fieldname = Field.mGetItemValue("Fieldname")(0)
        Call doc.ReplaceItemValue(fieldname, Field.mGetItemValue("value")(0))
    End ForAll
    Call doc.Save(True, False)
End If
Everything works fine except file attachments. How can I send files to the server with JSON and save them using LotusScript, or is there another workaround?
I finally found a tip and built the following solution, which takes the Base64 string and converts it into an attachment in LotusScript:
http://www-10.lotus.com/ldd/bpmpblog.nsf/dx/creating-a-mime-email-with-attachment?opendocument&comments
Dim s As New NotesSession
Dim stream As NotesStream
Dim body As NotesMIMEEntity
Dim header As NotesMIMEHeader
Dim StringInBase64 As String
StringInBase64 = getbase64() 'your base64 string
s.ConvertMIME = False ' work with the raw MIME entities
Dim db As NotesDatabase
Set db = s.CurrentDatabase
Dim tempdoc As NotesDocument
Set tempdoc = db.CreateDocument()
Set stream = s.CreateStream
Call stream.WriteText(StringInBase64)
Set body = tempdoc.CreateMIMEEntity
Set header = body.CreateHeader("content-disposition")
Call header.SetHeaderVal({attachment;filename="Onchange.xlsx"}) ' file name and type should be configurable
Call body.SetContentFromText(stream, "", ENC_BASE64) ' the stream already holds Base64 text
Call stream.Close
tempdoc.Form = "Attachment"
Call tempdoc.Save(True, False)
s.ConvertMIME = True ' restore conversion
This works as expected. Thanks, all, for the time you spent.
Here is the code for multiple attachments, an enhancement of Vijayakumar's code.
Dim session As New NotesSession
Dim db As NotesDatabase
Dim doc As NotesDocument
Dim stream As NotesStream
Dim body As NotesMIMEEntity
Dim child As NotesMIMEEntity
Dim header As NotesMIMEHeader
Dim topString As Variant
Dim i As Integer

session.ConvertMIME = False ' work with the raw MIME entities
Set db = session.CurrentDatabase
Set doc = db.CreateDocument
' The parent entity is multipart/mixed; each file becomes one child part
Set body = doc.CreateMIMEEntity
Set header = body.CreateHeader("Content-Type")
Call header.SetHeaderVal("multipart/mixed")

' BASE64 holds one Base64 string per file, separated by commas
topString = Split(BASE64, ",")
For i = 0 To UBound(topString)
    Set child = body.CreateChildEntity()
    Set header = child.CreateHeader("Content-Disposition")
    Call header.SetHeaderVal({attachment; filename=test} & CStr(i) & {.jpg}) 'file name and type should be configurable
    Set header = child.CreateHeader("Content-ID")
    Call header.SetHeaderVal("test" & CStr(i) & ".jpg")
    Set stream = session.CreateStream()
    Call stream.WriteText(topString(i))
    ' The stream already holds Base64 text, hence ENC_BASE64
    Call child.SetContentFromText(stream, "image/jpeg", ENC_BASE64)
    Call stream.Close()
Next

doc.Form = "Attachment"
Call doc.Save(True, False)
session.ConvertMIME = True ' Restore conversion
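A minimal Python sketch of the data flow both LotusScript agents rely on, seen from the client side: each file is Base64-encoded and the strings are joined with commas, which is safe because the standard Base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains no comma. One assumption worth checking: if the AngularJS side reads files with `FileReader.readAsDataURL`, each value starts with a `data:<type>;base64,` prefix that itself contains a comma, so it must be stripped per file before joining. The `BASE64` key in the JSON message is hypothetical; the other keys mirror the `message` object in the question.

```python
import base64
import json


def encode_files(blobs):
    """Client side: Base64-encode each file and join the strings with commas."""
    return ",".join(base64.b64encode(b).decode("ascii") for b in blobs)


def build_message(email, blobs):
    """JSON body shaped like the question's `message`, plus a hypothetical BASE64 key."""
    return json.dumps({"requesttype": "saveForm", "email": email,
                       "BASE64": encode_files(blobs)})


def strip_data_url(value):
    """Remove a 'data:<type>;base64,' prefix such as readAsDataURL produces."""
    return value.split(",", 1)[1] if value.startswith("data:") else value


def decode_files(joined):
    """Server side: what Split(BASE64, ",") plus ENC_BASE64 decoding amounts to."""
    return [base64.b64decode(part) for part in joined.split(",")]
```

Because the commas only ever appear between files, splitting on them on the server recovers exactly one Base64 string per attachment, even when the original file contents contain commas.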