LibXML C++ XPathEval Errors - html

For starters, I'm seeing two types of problems with the functionality of my code. First, I can't seem to find the correct element with the function xmlXPathEvalExpression. Second, I am receiving errors similar to:
HTML parser error : Unexpected end tag : a
This happens for what appears to be all tags in the page.
For some background, the HTML is fetched by CURL and fed into the parsing function immediately after. For the sake of debugging, the return statements have been replaced with printf.
std::string cleanHTMLDoc(std::string &aDoc, std::string &symbolString) {
    std::string ctxtID = "//span[id='" + symbolString + "']";
    htmlDocPtr doc = htmlParseDoc((xmlChar*) aDoc.c_str(), NULL);
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result = xmlXPathEvalExpression((xmlChar*) ctxtID.c_str(), context);
    if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        xmlXPathFreeObject(result);
        xmlXPathFreeContext(context);
        xmlFreeDoc(doc);
        printf("[ERR] Invalid XPath\n");
        return "";
    }
    else {
        int size = result->nodesetval->nodeNr;
        for (int i = size - 1; i >= 0; --i) {
            printf("[DBG] %s\n", result->nodesetval->nodeTab[i]->name);
        }
        return "";
    }
}
The parameter aDoc contains the HTML of the page, and symbolString contains the id of the item we're looking for; in this case yfs_l84_aapl. I have verified that this is an element on the page in the style span[id='yfs_l84_aapl'] or <span id="yfs_l84_aapl">.
From what I've read, the errors fed out of the HTML Parser are due to a lack of a namespace, but when attempting to use the XHTML namespace, I've received the same error. When instead using htmlParseChunk to write out the DOM tree, I do not receive these errors due to options such as HTML_PARSE_NOERROR. However, the htmlParseDoc does not accept these options.
For the sake of information, I am compiling with Visual Studio 2015 and have successfully compiled and executed programs with this library before. My apologies for the poorly formatted code. I recently switched from writing Java in Eclipse.
Any help would be greatly appreciated!
[Edit]
It's not a pretty answer, but I made what I was looking to do work. Instead of looking through the DOM by my (assumed) incorrect XPath expression, I moved through tag by tag to end up where I needed to be, and hard-coded in the correct entry in the nodeTab attribute of the nodeSet.
The code is as follows:
std::string StockIO::cleanHTMLDoc(std::string htmlInput) {
    std::string ctxtID = "/html/body/div/div/div/div/div/div/div/div/span/span";
    xmlChar* xpath = (xmlChar*) ctxtID.c_str();
    htmlDocPtr doc = htmlParseDoc((xmlChar*) htmlInput.c_str(), NULL);
    xmlXPathContextPtr context = xmlXPathNewContext(doc);
    xmlXPathObjectPtr result = xmlXPathEvalExpression(xpath, context);
    std::string text;
    if (result == NULL || xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        printf("[ERR] Invalid XPath\n");
    }
    else if (result->nodesetval->nodeNr > 1) {
        // Hard-coded: the second matching node holds the value we want.
        xmlNodePtr nodePtr = result->nodesetval->nodeTab[1];
        // xmlNodeListGetString returns a copy that the caller must release with xmlFree.
        xmlChar* content = xmlNodeListGetString(doc, nodePtr->children, 1);
        if (content != NULL) {
            text = (char*) content;
            xmlFree(content);
        }
    }
    // Release the libxml2 objects on every path, not only on the error path.
    xmlXPathFreeObject(result);
    xmlXPathFreeContext(context);
    xmlFreeDoc(doc);
    return text;
}
I will leave this question open in hopes that someone will help elaborate upon what I did wrong in setting up my XPath expression.
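A likely cause of the empty node set in the first snippet: inside an XPath predicate, id names a child element, while @id names an attribute. So //span[id='yfs_l84_aapl'] asks for spans that contain an <id> child element with that text, which matches nothing on a page where id is an attribute. Below is a minimal sketch of building the attribute form. Note that spanByIdXPath is a hypothetical helper name, not part of libxml2; it only produces the expression string that would then be passed to xmlXPathEvalExpression.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not a libxml2 function): builds an XPath 1.0
// expression that selects <span> elements by their id attribute.
// The '@' makes "id" an attribute test; without it, XPath looks for a
// child element named <id> instead.
std::string spanByIdXPath(const std::string& id) {
    // XPath 1.0 string literals have no escape syntax, so pick a quote
    // character that does not occur in the value.
    if (id.find('\'') == std::string::npos) {
        return "//span[@id='" + id + "']";
    }
    if (id.find('"') == std::string::npos) {
        return "//span[@id=\"" + id + "\"]";
    }
    // Both quote characters present: this sketch does not handle that case.
    throw std::invalid_argument("id contains both quote characters");
}
```

With symbolString set to yfs_l84_aapl this yields //span[@id='yfs_l84_aapl']. As for the parser noise: those "Unexpected end tag" messages usually just mean the page's markup is not well formed, and probably have nothing to do with namespaces. htmlParseDoc takes no option flags, but libxml2 also provides htmlReadMemory, whose last argument accepts flags such as HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING, which may be a simpler way to silence them than htmlParseChunk.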

Related

Write a program to load Item Manufacturer tab and print each row details in a text/csv file

I'm new to Oracle Agile PLM. Can you help me solve this? I referred to the Agile SDK guide, but it does not cover this at all. Can you also help make this an efficient search?
In many documentation sets, you will not find a code sample for every feature. Instead, you are given one sample that can be adapted with little modification to similar scenarios. In this case, you have a code sample for reading the BOM table, as below:
private void printBOM(IItem item, int level) throws APIException {
    ITable bom = item.getTable(ItemConstants.TABLE_BOM);
    Iterator i = bom.getReferentIterator();
    while (i.hasNext()) {
        IItem bomItem = (IItem)i.next();
        System.out.print(indent(level));
        System.out.println(bomItem.getName());
        printBOM(bomItem, level + 1);
    }
}
private String indent(int level) {
    if (level <= 0) {
        return "";
    }
    char c[] = new char[level*2];
    Arrays.fill(c, ' ');
    return new String(c);
}
All you need to do is change TABLE_BOM to TABLE_MANUFACTURERS, update relevant attribute names and fetch data from required cells.
Hope this helps.
Also, here's the link for latest documentation: SDK Developer Guide

MFC: Using CHtmlView with memory string via about: or data:?

I am trying out CHtmlView to display HTML from memory variables. After dealing with the various exceptions you get in debug mode, I have it working for very small strings via the about: URI.
Example:
Navigate(_T("about:<html><head></head><body>Hello</body></html>"))
works for small items but not larger strings. Does anyone know the documented limitation for about: ?
Now I found a new item that is supposed to be available for IE, the data: scheme, but when I try
Navigate(_T("data:text/html, <html><head></head><body>Hello</body></html>"))
it doesn't work; it comes up with the fancy "webpage can't be displayed" page. Does anyone know why CHtmlView doesn't support data:, and whether there is any other trick for displaying HTML from a memory variable in CHtmlView?
One option for setting HTML content directly is to read it from memory using IStream.
MFC's CHtmlEditCtrl uses a similar method to set the document's HTML content, except that MFC uses CStreamOnCString.
You may need to set the content to UTF8 for compatibility. To use UTF8, change CString to CStringA in the code below, and pass a UTF8 string to the function: SetHTMLContent(htmlview, u8"<html>...")
HRESULT SetHTMLContent(CHtmlView* htmlview, CString html)
{
    if(!html.GetLength()) return E_FAIL;
    CComPtr<IDispatch> disp = htmlview->GetHtmlDocument();
    if(!disp)
    {
        //not initialized, try again
        htmlview->Navigate(_T("about:"));
        disp = htmlview->GetHtmlDocument();
        if(!disp)
            return E_NOINTERFACE;
    }
    CComQIPtr<IHTMLDocument2> doc2 = disp;
    if(!doc2) return E_NOINTERFACE;
    int charsize = sizeof(html.GetAt(0));
    IStream *istream = SHCreateMemStream(
        reinterpret_cast<const BYTE*>(html.GetBuffer()), charsize * html.GetLength());
    HRESULT hr = E_FAIL;
    if(istream)
    {
        CComQIPtr<IPersistStreamInit> psi = doc2;
        if(psi)
            hr = psi->Load(istream);
        istream->Release();
    }
    html.ReleaseBuffer();
    return hr;
}
Usage:
CString str = _T("<html><head></head><body>Hello</body></html>");
SetHTMLContent(m_chtmlview, str);

JSON Parse error: Unrecognized token '<'

I am getting this error (as per Safari's Web inspector) but I cannot see why. Most reports of this error suggest that it is reading a HTML tag somewhere ... but I cannot see it.
var oReq = new XMLHttpRequest(); //New request object
oReq.onload = function() {
document.getElementById("myConsole").innerHTML = this.responseText;
myData = JSON.parse(this.responseText);
...
The third line of code dumps the responseText onto my webpage (into a DIV called 'myConsole'). This shows what I believe to be standard JSON ... and contains no '<' characters.
The fourth line of code tries to parse the responseText and gives the '<' token error.
The php data source looks like this:
$rowCount = 0;
do {
    $rowCount += 1;
    $dbCurrentRow = $resultSet->fetch_assoc();
    $seats[$rowCount]['room'] = $dbCurrentRow['Room'];
    $seats[$rowCount]['seat'] = $dbCurrentRow['Seat'] * 1;
    $seats[$rowCount]['x'] = $dbCurrentRow['x'] * 1;
    $seats[$rowCount]['y'] = $dbCurrentRow['y'] * 1;
    $seats[$rowCount]['name'] = "Joe Bloggs";
    $seats[$rowCount]['adno'] = "01234";
    $seats[$rowCount]['ev6'] = true;
    $seats[$rowCount]['eal'] = true;
    $seats[$rowCount]['dpLast'] = "LS";
    $seats[$rowCount]['dpCurrent'] = "WA";
    $seats[$rowCount]['dpTarget'] = "TG";
    $seats[$rowCount]['ma'] = 2 * 1;
} while ($rowCount < $resultSet->num_rows);
echo json_encode($seats);
and the JSON output is this:
{"1":{"room":"35","seat":1,"x":0,"y":0,"name":"Joe Bloggs","adno":"01234","ev6":true,"eal":true,"dpLast":"LS","dpCurrent":"WA","dpTarget":"TG","ma":2},
"2":{"room":"35","seat":2,"x":30,"y":60,"name":"Joe Bloggs","adno":"01234","ev6":true,"eal":true,"dpLast":"LS","dpCurrent":"WA","dpTarget":"TG","ma":2},
"3":{"room":"35","seat":3,"x":60,"y":0,"name":"Joe Bloggs","adno":"01234","ev6":true,"eal":true,"dpLast":"LS","dpCurrent":"WA","dpTarget":"TG","ma":2},
"4":{"room":"35","seat":4,"x":90,"y":90,"name":"Joe Bloggs","adno":"01234","ev6":true,"eal":true,"dpLast":"LS","dpCurrent":"WA","dpTarget":"TG","ma":2}}
I do not believe it to be a server timing issue, since the 'myConsole' dump precedes the error and works fine. The JSON does not look faulty, even with a 2D array. The strange thing is that if I take the JSON output, save it as 'testDataSample.php', and link my main page directly to it, the same output works flawlessly.
//oReq.open("get", "testDataSample.php", false); //Text JSON output works fine
oReq.open("get", "getData.php", false); // Live from Server ... '<' error
oReq.send();
Any suggestions as to what is wrong, or how I would track this down would be most welcome.
Thank you.
Thank you raghav710 :-)
The console log showed it ... I had some comments at the top of the dataSource.php file which were being included in the echo.
When written to my web page they were ignored and invisible, which means I could not see them and could not see the difference between the two outputs; parsing the comments as JSON caused the choke.
I have removed all of the comments at the top of my datasource.php and it worked instantly.
Thank you again.

aws-sdk-cpp: how to use CurlHttpClient?

I need to make signed requests to AWS ES, but am stuck at the first hurdle in that I cannot seem to be able to use CurlHttpClient. Here is my code (verb, path, and body defined elsewhere):
Aws::Client::ClientConfiguration clientConfiguration;
clientConfiguration.scheme = Aws::Http::Scheme::HTTPS;
clientConfiguration.region = Aws::Region::US_EAST_1;
auto client = Aws::MakeShared<Aws::Http::CurlHttpClient>(ALLOCATION_TAG, clientConfiguration);
Aws::Http::URI uri;
uri.SetScheme(Aws::Http::Scheme::HTTPS);
uri.SetAuthority(ELASTIC_SEARCH_DOMAIN);
uri.SetPath(path);
Aws::Http::Standard::StandardHttpRequest req(uri, verb);
req.AddContentBody(body);
auto res = client->MakeRequest(req);
Aws::Http::HttpResponseCode resCode = res->GetResponseCode();
if (resCode == Aws::Http::HttpResponseCode::OK) {
    Aws::IOStream &body = res->GetResponseBody();
    rejoiceAndBeMerry();
}
else {
    gotoPanicStations();
}
When executed, the code throws a bad_function_call deep from within the sdk mixed up with a lot of shared_ptr this and allocate that. My guess is that I am just using the SDK wrong, but I've been unable to find any examples that use the CurlHttpClient directly such as I need to do here.
How can I use CurlHttpClient?
You shouldn't be using the HTTP client directly, but rather the wrappers supplied with the aws-cpp-sdk-es package. Like previous answer(s), I would recommend evaluating the test cases shipped with the library to see how the original authors intended the API to be used (at least until the documentation catches up).
How can I use CurlHttpClient?
You're on the right track with managed shared resources and helper functions. You just need to create a static factory/client to reference. Here's a generic example.
using namespace Aws::Client;
using namespace Aws::Http;
static std::shared_ptr<HttpClientFactory> MyClientFactory; // May not be needed
static std::shared_ptr<HttpClient> MyHttpClient;
// ... jump ahead to method body ...
ClientConfiguration clientConfiguration;
MyHttpClient = CreateHttpClient(clientConfiguration);
Aws::String uri("https://example.org");
std::shared_ptr<HttpRequest> req(
    CreateHttpRequest(uri,
                      verb, // i.e. HttpMethod::HTTP_POST
                      Utils::Stream::DefaultResponseStreamFactoryMethod));
// req is a shared_ptr, so its members are reached with -> rather than .
req->AddContentBody(body); //<= remember `body' should be `std::shared_ptr<Aws::IOStream>',
                           // and can be created with `Aws::MakeShared<Aws::StringStream>("")';
req->SetContentLength(body_size);
req->SetContentType(body_content_type);
std::shared_ptr<HttpResponse> res = MyHttpClient->MakeRequest(*req);
HttpResponseCode resCode = res->GetResponseCode();
if (resCode == HttpResponseCode::OK) {
    Aws::StringStream resBody;
    resBody << res->GetResponseBody().rdbuf();
    rejoiceAndBeMerry();
} else {
    gotoPanicStations();
}
I encountered exactly the same error when trying to download from S3 using CurlHttpClient.
I fixed it by instead modelling my code after the integration test found in the cpp sdk:
aws-sdk-cpp/aws-cpp-sdk-s3-integration-tests/BucketAndObjectOperationTest.cpp
Search for the test called TestObjectOperationsWithPresignedUrls.

How to find specific value in a large object in node.js?

I've parsed a website using htmlparser, and I would like to find a specific value inside the parsed object, for example the string "$199", and keep tracking that element (by periodic parsing) to see whether the value is still "$199" or has changed.
After some painful searching by eye, I found that the string is located somewhere like this:
price = handler.dom[3].children[3].children[3].children[5].children[1].
children[3].children[3].children[5].children[0].children[0].raw;
So I'd like to know whether there are methods which are less painful? Thanks!
A tree based recursive search would probably be easiest to get the node you're interested in.
I've not used htmlparser and the documentation seems a little thin, so this is just an example to get you started and is not tested:
function getElement(el,val) {
    if (el.children && el.children.length > 0) {
        for (var i = 0, l = el.children.length; i<l; i++) {
            var r = getElement(el.children[i],val);
            if (r) return r;
        }
    } else {
        if (el.raw == val) {
            return el;
        }
    }
    return null;
}
Call getElement(handler.dom[3],'$199') and it'll go through all the children recursively until it finds an element without any children, and then compare its raw value with '$199'. Note that this is a straight comparison; you might want to swap it for a regexp or similar.