Extract HTML comments using C++ std::sregex_token_iterator - html

I'm trying to extract the comments section from HTML source. It is sort of working but not quite.
<html><body>Login Successful!</body><!-- EXTRACT-THIS --></html>
Here's my code so far:
#include <string>
#include <iostream>
#include <sstream>
#include <fstream>
#include <regex>
using namespace std;
int main()
{
string s =
"<html><body>Login Successful!</body><!-- EXTRACT-THIS --></html>";
// Regular expression to extract from HTML comment
// <!-- comment -->
regex r("[<!--\r\n\t][\r\n\t-->]");
for (sregex_token_iterator it = sregex_token_iterator(
s.begin(),
s.end(),
r,
-1);
it != sregex_token_iterator(); ++it)
{
cout << "TOKEN: " << (string) *it << endl;
}
return 0;
}
I guess my main question is that is there a way to improve my regex expression?

Let's start with a std::string that contains more than one comment section:
string s = "<html><body>Login Successful!</body><!-- EXTRACT-THIS --><p>Test</p><!-- XXX --></html>";
Removing the Comments and Printing the HTML tags
If you want to remove the HTML comments from this string, you can do it like this:
regex r("(<\\!--[^>]*-->)");
// split the string using the regular expression
sregex_token_iterator iterator = sregex_token_iterator(s.begin(), s.end(), r, -1);
sregex_token_iterator end;
for (; iterator != end; ++iterator)
{
cout << "TOKEN: " << (string) *iterator << endl;
}
This code prints:
TOKEN: <html><body>Login Successful!</body>
TOKEN: <p>Test</p>
TOKEN: </html>
Removing the HTML Tags and Printing the Comments
If you want to extract the comments from the string, you can use the std::sregex_iterator like this:
regex r("(<\\!--[^>]*-->)");
std::sregex_iterator next(s.begin(), s.end(), r);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
This code prints:
<!-- EXTRACT-THIS -->
<!-- XXX -->
Parsing Comment Tags Manually
Another option is to find and iterate through the opening and closing tags manually. We can use the std::string::find() and std::string::substr() methods:
const std::string OPEN_TAG = "<!--";
const std::string CLOSE_TAG = "-->";
auto posOpen = s.find(OPEN_TAG, 0);
while (posOpen != std::string::npos) {
auto posClose = s.find(CLOSE_TAG, posOpen);
std::cout << s.substr(posOpen, posClose - posOpen + CLOSE_TAG.length()) << '\n';
posOpen = s.find(OPEN_TAG, posClose + CLOSE_TAG.length());
}

Related

Boost library write_json put extra new line at the end

I am learning from http://www.cochoy.fr/boost-property-tree/.
Instead of write_json to stdout, I tried to save it in a string.
std::stringstream ss;
boost::property_tree::json_parser::write_json(ss, oroot, false);
std::cout <<" begin json string" << std::endl;
std::cout << ss.str() << std::endl;
std::cout << "after json string" << std::endl;
output:
begin json string
{"height":"320","some":{"complex":{"path":"bonjour"}},"animals":{"rabbit":"white","dog":"brown","cat":"grey"},"fish":"blue","fish":"yellow","fruits":["apple","raspberry","orange"],"matrix":[["1","2","3"],["4","5","6"],["7","8","9"]]}
after json string
According to the output above, there is a new empty line at the end. How to get rid of the new line? Because with the new line it is not a valid JSON string.
The newline is not explicitly mentioned in the JSON RFC-7159 but it is defined as part of the POSIX standard for a line.
Incase you're interested in where the newline comes from you can take a look at the write_json_internal source code, we can see that there is an stream << std::endl; near the end of the method. Note that ...::write_json references write_json_internal.
// Write ptree to json stream
template<class Ptree>
void write_json_internal(std::basic_ostream<typename Ptree::key_type::value_type> &stream,
const Ptree &pt,
const std::string &filename,
bool pretty)
{
if (!verify_json(pt, 0))
BOOST_PROPERTY_TREE_THROW(json_parser_error("ptree contains data that cannot be represented in JSON format", filename, 0));
write_json_helper(stream, pt, 0, pretty);
stream << std::endl;
if (!stream.good())
BOOST_PROPERTY_TREE_THROW(json_parser_error("write error", filename, 0));
}

c++ File IO, html conversion

Here is a brief discription of the books programming project I am trying to do...
Write a program that reads in a C++ source file and converts all ‘<’ symbols to “<” and all ‘>’ symbols to “>” . Also add the tag <PRE> to the beginning of the file and </PRE> to the end of the file. This tag preserves whitespace and formatting in the HTML document. Your program should create a new file with the converted output. To implement this, you should write a function ‘convert’ that takes the input and output streams as parameters.
I am having issues trying to get the program to work correctly. What's happening is the program will create a new file with .html but it is not converting anything in the file. (i.e. adding <PRE> to the beginning and </PRE> to the end and converting all '<' symbols to &lt and '>' to &gt).
I've been messing with it for a while now and I'm honestly not sure where I am going wrong. I'm super new to programming in general and even more new to c++ so please be nice haha.
Here is my code, any help is greatly appreciated!!
Thanks!
Scott
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <cstdlib>
using namespace std;
// main function
int main() {
// Input file to convert
string filename;
// Output file with .html on the end
string outputname;
char c;
int i;
ifstream inStream;
ofstream outStream;
cout << "Enter filename you woudl like to convert: " << endl;
cin >> filename;
// Open the input file
inStream.open(filename.c_str());
if (inStream.fail()) {
cout << "I/O failure opening file." << endl;
exit(1);
}
// Create the output file
outputname = filename + ".html";
outStream.open(outputname.c_str());
// First, output the <PRE> tag
outStream << "<PRE>" << endl;
// Loop through the input file intil nothing else to get
while (!inStream.eof()) {
inStream.get(c); // Get one character
// Output < or > or original char
if (c == '<') {
outStream << "<";
}
else if (c=='>') {
outStream << ">";
}
else outStream << c;
}
// Output end /PRE tag
outStream << "</PRE>" << endl;
inStream.close();
outStream.close();
cout << "Conversion done. Results are in file " << outputname << endl;
}

qserialport does not send a char to arduino

I'm having a trouble in trying to send a char (i.e. "R") from my qt5 application on WIN7 to comport which is connected to an Arduino.
I intend to blink a led on Arduino and my arduino part works OK.
Here is my qt code:
#include <QTextStream>
#include <QCoreApplication>
#include <QtSerialPort/QSerialPortInfo>
#include <QSerialPort>
#include <iostream>
#include <QtCore>
QT_USE_NAMESPACE
using namespace std;
QSerialPort serial;
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
QTextStream out(stdout);
QList<QSerialPortInfo> serialPortInfoList = QSerialPortInfo::availablePorts();
out << QObject::tr("Total number of ports available: ") << serialPortInfoList.count() << endl;
foreach (const QSerialPortInfo &serialPortInfo, serialPortInfoList) {
out << endl
<< QObject::tr("Port: ") << serialPortInfo.portName() << endl
<< QObject::tr("Location: ") << serialPortInfo.systemLocation() << endl
<< QObject::tr("Description: ") << serialPortInfo.description() << endl
<< QObject::tr("Manufacturer: ") << serialPortInfo.manufacturer() << endl
<< QObject::tr("Vendor Identifier: ") << (serialPortInfo.hasVendorIdentifier() ? QByteArray::number(serialPortInfo.vendorIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Product Identifier: ") << (serialPortInfo.hasProductIdentifier() ? QByteArray::number(serialPortInfo.productIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Busy: ") << (serialPortInfo.isBusy() ? QObject::tr("Yes") : QObject::tr("No")) << endl;
}
serial.setPortName("COM5");
serial.open(QIODevice::ReadWrite);
serial.setBaudRate(QSerialPort::Baud9600);
serial.setDataBits(QSerialPort::Data8);
serial.setParity(QSerialPort::NoParity);
serial.setStopBits(QSerialPort::OneStop);
serial.setFlowControl(QSerialPort::NoFlowControl);
if(!serial.isOpen())
{
std::cout<<"port is not open"<<endl;
//serial.open(QIODevice::ReadWrite);
}
if(serial.isWritable()==true)
{
std::cout<<"port writable..."<<endl;
}
QByteArray data("R");
serial.write(data);
serial.flush();
std::cout<<"value sent!!! "<<std::endl;
serial.close();
return 0;
}
My source code consists of two parts,
1- serialportinfolist .... which works just fine
2- opening and writing data... I get no issue when running the code and the display shows the result as if nothing has gone wrong!
HOWEVER, the led on the board does not turn on when I run this code.
I test this with Arduino Serial Monitor and it turns on but cant turn on from Qt.
Are you waiting for cr lf (0x0D 0x0A) in your arduino code?
QByteArray ba;
ba.resize(3);
ba[0] = 0x5c; //'R'
ba[1] = 0x0d;
ba[2] = 0x0a;
Or append it to your string with
QByteArray data("R\r\n");
Or
QByteArray data("R\n");
I think I have found a partial solution but it is still incomplete.
When I press debug the first time, qt does not send any signal to Arduino, but when I press debug for the second time it behaves as expected.
So, is'nt it so weird that one has to run it twice to get it working???
Let me know if the problem exists somewhere else,
any help...

std::find with type T** vs T*[N]

I prefer to work with std::string but I like to figure out what is going wrong here.
I am unable to understand out why std::find isn't working properly for type T** even though pointer arithmetic works on them correctly. Like -
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
But it works fine, for the types T*[N].
#include <iostream>
#include <algorithm>
int main( int argc, const char ** argv )
{
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
const char ** cmdPtr = std::find(argv+1, argv+argc, "Hello") ;
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
if( cmdPtr == argv+argc )
std::cout << "String not found" << std::endl;
if( testPtr != testAr+2 )
std::cout << "String found: " << *testPtr << std::endl;
return 0;
}
Arguments passed: Hello World
Output:
Hello World
String not found
String found: Hello
Thanks.
Comparing types of char const* amounts to pointing to the addresses. The address of "Hello" is guaranteed to be different unless you compare it to another address of the string literal "Hello" (in which case the pointers may compare equal). Your compare() function compares the characters being pointed to.
In the first case, you're comparing the pointer values themselves and not what they're pointing to. And the constant "Hello" doesn't have the same address as the first element of argv.
Try using:
const char ** cmdPtr = std::find(argv+1, argv+argc, std::string("Hello")) ;
std::string knows to compare contents and not addresses.
For the array version, the compiler can fold all literals into a single one, so every time "Hello" is seen throughout the code it's really the same pointer. Thus, comparing for equality in
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
yields the correct result

Parsing with Boost::Spirit (V2.4) into container

I just started to dig into Boost::Spirit, latest version by now -- V2.4.
The essense of my problem is following:
I would like to parse strings like "1a2" or "3b4".
So the rule I use is:
(double_ >> lit('b') >> double_)
| (double_ >> lit('a') >> double_);
The attribute of the rule must be "vector <double>". And I'm reading it into the container.
The complete code:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <cstring>
int main(int argc, char * argv[])
{
using namespace std;
using namespace boost::spirit;
using namespace boost::spirit::qi;
using boost::phoenix::arg_names::arg1;
char const * first = "1a2";
char const * last = first + std::strlen(first);
vector<double> h;
rule<char const *, vector<double>()> or_test;
or_test %= (double_ >> lit('b') >> double_)
| (double_ >> lit('a') >> double_);
if (parse(first, last, or_test,h)) {
cout << "parse success: ";
for_each(h.begin(), h.end(), (cout << arg1 << " "));
cout << "end\n";
} else cout << "parse error\n" << endl;
return 0;
}
I'm compiling it with g++ 4.4.3. And it returns "1 1 2". While I expect "1 2".
As far as I understand this happens because parser:
goes to the first alternative
reads a double_ and stores it in the container
then stops at "a", while expecting lit("b")
goes to the second alternative
reads two more doubles
My question is -- Is this a correct behavior, and if yes -- why?
That's expected behavior. During backtracking Spirit does not 'unmake' changes to attributes. Therefore, you should use the hold[] directive explicitly forcing the parser to hold on to a copy of the attribute (allowing to roll back any attribute change):
or_test =
hold[double_ >> lit('b') >> double_)]
| (double_ >> lit('a') >> double_)
;
This directive needs to be applied to all alternatives modifying the attribute, except the last one.