c++ File IO, html conversion - html

Here is a brief description of the book's programming project I am trying to do...
Write a program that reads in a C++ source file and converts all ‘<’ symbols to “&lt;” and all ‘>’ symbols to “&gt;”. Also add the tag <PRE> to the beginning of the file and </PRE> to the end of the file. This tag preserves whitespace and formatting in the HTML document. Your program should create a new file with the converted output. To implement this, you should write a function ‘convert’ that takes the input and output streams as parameters.
I am having issues trying to get the program to work correctly. What's happening is that the program creates a new file ending in .html, but it does not convert anything in the file (i.e. adding <PRE> to the beginning and </PRE> to the end, and converting all '<' symbols to &lt; and all '>' symbols to &gt;).
I've been messing with it for a while now and I'm honestly not sure where I am going wrong. I'm super new to programming in general and even more new to c++ so please be nice haha.
Here is my code, any help is greatly appreciated!!
Thanks!
Scott
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <cstdlib>
using namespace std;
// main function
int main() {
// Input file to convert
string filename;
// Output file with .html on the end
string outputname;
char c;
int i;
ifstream inStream;
ofstream outStream;
cout << "Enter filename you would like to convert: " << endl;
cin >> filename;
// Open the input file
inStream.open(filename.c_str());
if (inStream.fail()) {
cout << "I/O failure opening file." << endl;
exit(1);
}
// Create the output file
outputname = filename + ".html";
outStream.open(outputname.c_str());
// First, output the <PRE> tag
outStream << "<PRE>" << endl;
// Loop through the input file until there is nothing left to read
while (inStream.get(c)) { // get() fails at end of file, which ends the loop
// Output &lt; or &gt; or the original char
if (c == '<') {
outStream << "&lt;";
}
else if (c == '>') {
outStream << "&gt;";
}
else outStream << c;
}
// Output end /PRE tag
outStream << "</PRE>" << endl;
inStream.close();
outStream.close();
cout << "Conversion done. Results are in file " << outputname << endl;
}
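The exercise asks for a function named convert that takes the input and output streams as parameters. Here is a minimal sketch of what that could look like, not necessarily the book's intended solution. The helper convert_string is a hypothetical addition that only exists to try the function on an in-memory string. Using in.get(c) as the loop condition means the body never runs with a stale character once the last read has failed.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out`, wrapped in <PRE>...</PRE>, replacing '<' and '>'
// with their HTML entities.
void convert(std::istream& in, std::ostream& out) {
    out << "<PRE>\n";
    char c;
    // get() returns the stream, which tests false as soon as a read fails.
    while (in.get(c)) {
        if (c == '<') {
            out << "&lt;";
        } else if (c == '>') {
            out << "&gt;";
        } else {
            out << c;
        }
    }
    out << "</PRE>\n";
}

// Hypothetical helper (not part of the exercise): runs convert on a string.
std::string convert_string(const std::string& source) {
    std::istringstream in(source);
    std::ostringstream out;
    convert(in, out);
    return out.str();
}
```

Because convert only depends on std::istream and std::ostream, the same function should work unchanged with the ifstream and ofstream objects opened in main.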

Related

Boost library write_json put extra new line at the end

I am learning from http://www.cochoy.fr/boost-property-tree/.
Instead of write_json to stdout, I tried to save it in a string.
std::stringstream ss;
boost::property_tree::json_parser::write_json(ss, oroot, false);
std::cout <<" begin json string" << std::endl;
std::cout << ss.str() << std::endl;
std::cout << "after json string" << std::endl;
output:
begin json string
{"height":"320","some":{"complex":{"path":"bonjour"}},"animals":{"rabbit":"white","dog":"brown","cat":"grey"},"fish":"blue","fish":"yellow","fruits":["apple","raspberry","orange"],"matrix":[["1","2","3"],["4","5","6"],["7","8","9"]]}
after json string
According to the output above, there is a new empty line at the end. How to get rid of the new line? Because with the new line it is not a valid JSON string.
RFC 7159 does not mention this newline explicitly, but its grammar permits insignificant whitespace, including a newline, after the top-level value (JSON-text = ws value ws), so the string is still valid JSON. The trailing newline also matches the POSIX definition of a line.
In case you're interested in where the newline comes from, take a look at the write_json_internal source code: there is a stream << std::endl; near the end of the function. Note that ...::write_json calls write_json_internal.
// Write ptree to json stream
template<class Ptree>
void write_json_internal(std::basic_ostream<typename Ptree::key_type::value_type> &stream,
const Ptree &pt,
const std::string &filename,
bool pretty)
{
if (!verify_json(pt, 0))
BOOST_PROPERTY_TREE_THROW(json_parser_error("ptree contains data that cannot be represented in JSON format", filename, 0));
write_json_helper(stream, pt, 0, pretty);
stream << std::endl;
if (!stream.good())
BOOST_PROPERTY_TREE_THROW(json_parser_error("write error", filename, 0));
}
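If the newline is still unwanted, one option is to strip it from the string after writing. The sketch below assumes that approach; strip_trailing_newlines and fake_write_json are hypothetical names, and fake_write_json only imitates the stream << std::endl; behaviour shown above so that the example does not need Boost.

```cpp
#include <sstream>
#include <string>

// Removes any trailing '\n' or '\r' characters from `text`.
std::string strip_trailing_newlines(std::string text) {
    while (!text.empty() && (text.back() == '\n' || text.back() == '\r')) {
        text.pop_back();
    }
    return text;
}

// Stand-in for write_json (assumption): writes compact JSON followed by
// std::endl, like write_json_internal does.
std::string fake_write_json() {
    std::stringstream ss;
    ss << "{\"height\":\"320\"}" << std::endl;
    return ss.str();
}
```

With the real library the call would presumably be strip_trailing_newlines(ss.str()) after write_json(ss, oroot, false).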

Extract HTML comments using C++ std::sregex_token_iterator

I'm trying to extract the comments section from HTML source. It is sort of working but not quite.
<html><body>Login Successful!</body><!-- EXTRACT-THIS --></html>
Here's my code so far:
#include <string>
#include <iostream>
#include <sstream>
#include <fstream>
#include <regex>
using namespace std;
int main()
{
string s =
"<html><body>Login Successful!</body><!-- EXTRACT-THIS --></html>";
// Regular expression to extract from HTML comment
// <!-- comment -->
regex r("[<!--\r\n\t][\r\n\t-->]");
for (sregex_token_iterator it = sregex_token_iterator(
s.begin(),
s.end(),
r,
-1);
it != sregex_token_iterator(); ++it)
{
cout << "TOKEN: " << (string) *it << endl;
}
return 0;
}
I guess my main question is: is there a way to improve my regular expression?
Let's start with a std::string that contains more than one comment section:
string s = "<html><body>Login Successful!</body><!-- EXTRACT-THIS --><p>Test</p><!-- XXX --></html>";
Removing the Comments and Printing the HTML tags
If you want to remove the HTML comments from this string, you can do it like this:
regex r("(<\\!--[^>]*-->)");
// split the string using the regular expression
sregex_token_iterator iterator = sregex_token_iterator(s.begin(), s.end(), r, -1);
sregex_token_iterator end;
for (; iterator != end; ++iterator)
{
cout << "TOKEN: " << (string) *iterator << endl;
}
This code prints:
TOKEN: <html><body>Login Successful!</body>
TOKEN: <p>Test</p>
TOKEN: </html>
Removing the HTML Tags and Printing the Comments
If you want to extract the comments from the string, you can use the std::sregex_iterator like this:
regex r("(<\\!--[^>]*-->)");
std::sregex_iterator next(s.begin(), s.end(), r);
std::sregex_iterator end;
while (next != end) {
std::smatch match = *next;
std::cout << match.str() << "\n";
next++;
}
This code prints:
<!-- EXTRACT-THIS -->
<!-- XXX -->
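If only the text inside each comment is wanted, a capture group can pick it out. This is a sketch under a couple of assumptions: comment_texts is a hypothetical helper name, and the pattern uses a lazy [\s\S]*? so that a comment may contain '>' or line breaks, which differs from the [^>]* pattern used above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text between "<!--" and "-->" for every comment in `html`,
// with surrounding whitespace removed.
std::vector<std::string> comment_texts(const std::string& html) {
    // Group 1 is the comment body; the lazy quantifier stops at the first "-->".
    static const std::regex comment(R"(<!--\s*([\s\S]*?)\s*-->)");
    std::vector<std::string> texts;
    for (std::sregex_iterator it(html.begin(), html.end(), comment), end;
         it != end; ++it) {
        texts.push_back((*it)[1].str());
    }
    return texts;
}
```

Called on the string s from above, this should return the two entries EXTRACT-THIS and XXX.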
Parsing Comment Tags Manually
Another option is to find and iterate through the opening and closing tags manually. We can use the std::string::find() and std::string::substr() methods:
const std::string OPEN_TAG = "<!--";
const std::string CLOSE_TAG = "-->";
auto posOpen = s.find(OPEN_TAG, 0);
while (posOpen != std::string::npos) {
auto posClose = s.find(CLOSE_TAG, posOpen + OPEN_TAG.length());
if (posClose == std::string::npos) break; // unterminated comment: stop searching
std::cout << s.substr(posOpen, posClose - posOpen + CLOSE_TAG.length()) << '\n';
posOpen = s.find(OPEN_TAG, posClose + CLOSE_TAG.length());
}

Writing and reading strange letters to a txt-file using libcurl

I have a C++ program that downloads the HTML code of a web page and saves it as a text file using the libcurl library. The problem is that the Danish alphabet contains the following special letters: Æ æ Ø ø Å å.
When I try to read the HTML code line by line, all these characters look like "�". I have tried to read/write the file as wide characters. That did not work.
I have also tried to write a sentence containing "æ", "ø" and "å" to another text file and read it again. That, for some reason, worked.
So my question is: why do the strange letters look like "�" when they are in the downloaded HTML code, but not when I write my own sentence? And how do I fix the HTML output?
My code is as follows:
#include <iostream>
#include <string>
#include<fstream>
#include<curl/curl.h>
using namespace std;
static size_t write_data (char * ptr, size_t size, size_t nmemb,void *stream)
{
size_t written = fwrite (ptr, size, nmemb, (FILE *) stream);
cout << static_cast <const void *> (ptr);
return written;
//string myString (ptr, nbytes);
}
//string myString (ptr, nbytes);
int main ()
{
// Writing weird characters to a text-file!
ofstream myfile;
myfile.open ("example.txt");
myfile << "Print strange letter: æ ø og å!";
myfile.close ();
// Reading weird characters from the text-file!
std::ifstream wif ("example.txt");
if (wif.is_open ())
{
std::string wline;
while (std::getline (wif, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open file" << endl ;
wif.close ();
cout << endl << endl;
// Download the HTML-code from a webpage and save it.
CURL *curl_handle;
static const char *pagefilename = "example2.txt";
const char *charUrl = "http://politiken.dk/forbrugogliv/sundhedogmotion/ECE3406716/mindst-85000-offentligt-ansatte-maa-slet-ikke-ryge-i-arbejdstiden/"; // An article from the danish newspaper "Politiken"
FILE *pagefile;
curl_global_init (CURL_GLOBAL_ALL);
curl_handle = curl_easy_init ();
curl_easy_setopt (curl_handle, CURLOPT_URL, charUrl); // HERE IS THE URL PASSED!
curl_easy_setopt (curl_handle, CURLOPT_VERBOSE, 0L);
curl_easy_setopt (curl_handle, CURLOPT_NOPROGRESS, 0L);
curl_easy_setopt (curl_handle, CURLOPT_WRITEFUNCTION, write_data);
pagefile = fopen (pagefilename, "wb");
if (pagefile)
{
curl_easy_setopt (curl_handle, CURLOPT_WRITEDATA, pagefile);
curl_easy_perform (curl_handle);
fclose (pagefile);
}
curl_easy_cleanup (curl_handle);
//Reading the HTML-code
ifstream webIn ("example2.txt");
if (webIn.is_open ())
{
std::string wline;
while (getline (webIn, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open example2.txt" << endl;
return 0;
}
And my output is
Print strange letter: æ ø og å!
>>HTML-CODE CONTAINING "�" instead of æ ø å <<
I don't know if it is relevant, but my OS is Linux Mint 17.3 and I've set the language and region of my system to "English, Denmark UTF-8".
Thanks in advance! I will really appreciate any help or hints :)
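The commented-out string myString (ptr, nbytes); line suggests collecting the page in a std::string instead of a file. Here is a minimal sketch of such a callback, using the parameter list libcurl documents for CURLOPT_WRITEFUNCTION. The name append_to_string is hypothetical, and it assumes the pointer given to CURLOPT_WRITEDATA is the address of a std::string. The callback copies the received bytes verbatim, so it does not change the page's character encoding in any way.

```cpp
#include <cstddef>
#include <string>

// Write callback with the signature libcurl expects.
// Assumption: `userdata` points at a std::string supplied via CURLOPT_WRITEDATA.
std::size_t append_to_string(char* ptr, std::size_t size, std::size_t nmemb,
                             void* userdata) {
    const std::size_t bytes = size * nmemb;  // number of bytes in this chunk
    static_cast<std::string*>(userdata)->append(ptr, bytes);
    return bytes;  // returning fewer bytes than received signals an error
}
```

It would be registered with curl_easy_setopt (curl_handle, CURLOPT_WRITEFUNCTION, append_to_string) together with curl_easy_setopt (curl_handle, CURLOPT_WRITEDATA, &body), where body is a std::string that outlives the transfer.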

qserialport does not send a char to arduino

I'm having trouble sending a char (i.e. "R") from my Qt 5 application on Windows 7 to the COM port that is connected to an Arduino.
I intend to blink an LED on the Arduino, and the Arduino part works OK.
Here is my qt code:
#include <QTextStream>
#include <QCoreApplication>
#include <QtSerialPort/QSerialPortInfo>
#include <QSerialPort>
#include <iostream>
#include <QtCore>
QT_USE_NAMESPACE
using namespace std;
QSerialPort serial;
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
QTextStream out(stdout);
QList<QSerialPortInfo> serialPortInfoList = QSerialPortInfo::availablePorts();
out << QObject::tr("Total number of ports available: ") << serialPortInfoList.count() << endl;
foreach (const QSerialPortInfo &serialPortInfo, serialPortInfoList) {
out << endl
<< QObject::tr("Port: ") << serialPortInfo.portName() << endl
<< QObject::tr("Location: ") << serialPortInfo.systemLocation() << endl
<< QObject::tr("Description: ") << serialPortInfo.description() << endl
<< QObject::tr("Manufacturer: ") << serialPortInfo.manufacturer() << endl
<< QObject::tr("Vendor Identifier: ") << (serialPortInfo.hasVendorIdentifier() ? QByteArray::number(serialPortInfo.vendorIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Product Identifier: ") << (serialPortInfo.hasProductIdentifier() ? QByteArray::number(serialPortInfo.productIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Busy: ") << (serialPortInfo.isBusy() ? QObject::tr("Yes") : QObject::tr("No")) << endl;
}
serial.setPortName("COM5");
serial.open(QIODevice::ReadWrite);
serial.setBaudRate(QSerialPort::Baud9600);
serial.setDataBits(QSerialPort::Data8);
serial.setParity(QSerialPort::NoParity);
serial.setStopBits(QSerialPort::OneStop);
serial.setFlowControl(QSerialPort::NoFlowControl);
if(!serial.isOpen())
{
std::cout<<"port is not open"<<endl;
//serial.open(QIODevice::ReadWrite);
}
if(serial.isWritable()==true)
{
std::cout<<"port writable..."<<endl;
}
QByteArray data("R");
serial.write(data);
serial.flush();
std::cout<<"value sent!!! "<<std::endl;
serial.close();
return 0;
}
My source code consists of two parts,
1- serialportinfolist .... which works just fine
2- opening and writing data... I get no issue when running the code and the display shows the result as if nothing has gone wrong!
HOWEVER, the LED on the board does not turn on when I run this code.
When I test this with the Arduino Serial Monitor the LED turns on, but I can't turn it on from Qt.
Are you waiting for cr lf (0x0D 0x0A) in your arduino code?
QByteArray ba;
ba.resize(3);
ba[0] = 0x52; //'R'
ba[1] = 0x0d;
ba[2] = 0x0a;
Or append it to your string with
QByteArray data("R\r\n");
Or
QByteArray data("R\n");
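To double-check which bytes actually go over the wire, the framing can be sketched without Qt. frame_command is a hypothetical helper, and whether the Arduino sketch really waits for CR LF is an assumption taken from the question above.

```cpp
#include <string>

// Hypothetical helper: builds a one-character command terminated by CR LF.
std::string frame_command(char command) {
    std::string frame(1, command);
    frame += "\r\n";  // 0x0D 0x0A
    return frame;
}
```

For 'R' the three bytes are 0x52, 0x0D and 0x0A, which match the values in the QByteArray version.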
I think I have found a partial solution, but it is still incomplete.
When I press debug the first time, Qt does not send any signal to the Arduino, but when I press debug a second time it behaves as expected.
So, isn't it weird that one has to run it twice to get it working?
Let me know if the problem lies somewhere else;
any help is appreciated.

Parsing with Boost::Spirit (V2.4) into container

I just started to dig into Boost::Spirit, using the latest version as of now, V2.4.
The essence of my problem is the following:
I would like to parse strings like "1a2" or "3b4".
So the rule I use is:
(double_ >> lit('b') >> double_)
| (double_ >> lit('a') >> double_);
The attribute of the rule must be "vector <double>". And I'm reading it into the container.
The complete code:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <algorithm>
#include <string>
#include <vector>
#include <cstring>
int main(int argc, char * argv[])
{
using namespace std;
using namespace boost::spirit;
using namespace boost::spirit::qi;
using boost::phoenix::arg_names::arg1;
char const * first = "1a2";
char const * last = first + std::strlen(first);
vector<double> h;
rule<char const *, vector<double>()> or_test;
or_test %= (double_ >> lit('b') >> double_)
| (double_ >> lit('a') >> double_);
if (parse(first, last, or_test,h)) {
cout << "parse success: ";
for_each(h.begin(), h.end(), (cout << arg1 << " "));
cout << "end\n";
} else cout << "parse error\n" << endl;
return 0;
}
I'm compiling it with g++ 4.4.3, and it prints "1 1 2", while I expect "1 2".
As far as I understand, this happens because the parser:
goes to the first alternative
reads a double_ and stores it in the container
then stops at "a", while expecting lit("b")
goes to the second alternative
reads two more doubles
My question is -- Is this a correct behavior, and if yes -- why?
That's expected behavior. During backtracking, Spirit does not 'unmake' changes to attributes. Therefore, you should use the hold[] directive, which explicitly forces the parser to work on a copy of the attribute so that any attribute change can be rolled back when the alternative fails:
or_test =
hold[double_ >> lit('b') >> double_]
| (double_ >> lit('a') >> double_)
;
This directive needs to be applied to all alternatives modifying the attribute, except the last one.
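The effect can be illustrated without Spirit. The sketch below is a hand-written analogy, not Spirit's actual implementation: parse_pair is a hypothetical stand-in for double_ >> lit(sep) >> double_ that appends each number as soon as it is read, and the two drivers try the 'b' alternative first, once directly on the result and once on a copy that is only committed on success.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Parses "<number><sep><number>" from the start of `in`, appending each
// number to `out` immediately, so a late failure leaves partial results.
bool parse_pair(const std::string& in, char sep, std::vector<double>& out) {
    std::size_t pos = 0;
    auto read_number = [&](double& value) {
        const char* begin = in.c_str() + pos;
        char* end = nullptr;
        value = std::strtod(begin, &end);
        if (end == begin) return false;  // no number at this position
        pos += static_cast<std::size_t>(end - begin);
        return true;
    };
    double v = 0;
    if (!read_number(v)) return false;
    out.push_back(v);
    if (pos >= in.size() || in[pos] != sep) return false;
    ++pos;
    if (!read_number(v)) return false;
    out.push_back(v);
    return true;
}

// No rollback: the failed first alternative leaves its number behind.
std::vector<double> parse_without_hold(const std::string& in) {
    std::vector<double> out;
    if (!parse_pair(in, 'b', out)) {
        parse_pair(in, 'a', out);
    }
    return out;
}

// Rollback: the first alternative fills a copy that is committed on success
// and discarded on failure.
std::vector<double> parse_with_hold(const std::string& in) {
    std::vector<double> out;
    std::vector<double> scratch = out;
    if (parse_pair(in, 'b', scratch)) {
        out.swap(scratch);
    } else {
        parse_pair(in, 'a', out);
    }
    return out;
}
```

For the input "1a2", parse_without_hold yields 1 1 2 and parse_with_hold yields 1 2. The copy-and-commit step is the price of the rollback, which may be why the directive has to be requested explicitly.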