I have a C++ program that downloads the HTML code of a web page and saves it as a text file using libcurl. The problem is that the Danish alphabet has the following special letters: Æ æ Ø ø Å å.
When I try to read the HTML code line by line, all of these characters show up as "�". I have tried reading and writing the file as wide characters, but that did not work.
I have also tried writing a sentence containing "æ", "ø" and "å" to another text file and reading it back. That, for some reason, worked.
So my question is: why do the special letters show up as "�" when they are in the downloaded HTML code, but not when I write my own sentence? And how do I fix the HTML output?
My code is as follows:
#include <iostream>
#include <string>
#include<fstream>
#include<curl/curl.h>
using namespace std;
static size_t write_data (char *ptr, size_t size, size_t nmemb, void *stream) // libcurl passes the received bytes as char *
{
size_t written = fwrite (ptr, size, nmemb, (FILE *) stream);
cout << static_cast <const void *> (ptr);
return written;
//string myString (ptr, nbytes);
}
//string myString (ptr, nbytes);
int main ()
{
// Writing weird characters to a text-file!
ofstream myfile;
myfile.open ("example.txt");
myfile << "Print strange letter: æ ø og å!";
myfile.close ();
// Reading weird characters from the text-file!
std::ifstream wif ("example.txt");
if (wif.is_open ())
{
std::string wline;
while (std::getline (wif, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open file" << endl ;
wif.close ();
cout << endl << endl;
// Download the HTML-code from a webpage and save it.
CURL *curl_handle;
static const char *pagefilename = "example2.txt";
const char *charUrl = "http://politiken.dk/forbrugogliv/sundhedogmotion/ECE3406716/mindst-85000-offentligt-ansatte-maa-slet-ikke-ryge-i-arbejdstiden/"; // An article from the danish newspaper "Politiken"
FILE *pagefile;
curl_global_init (CURL_GLOBAL_ALL);
curl_handle = curl_easy_init ();
curl_easy_setopt (curl_handle, CURLOPT_URL, charUrl); // HERE IS THE URL PASSED!
curl_easy_setopt (curl_handle, CURLOPT_VERBOSE, 0L);
curl_easy_setopt (curl_handle, CURLOPT_NOPROGRESS, 0L);
curl_easy_setopt (curl_handle, CURLOPT_WRITEFUNCTION, write_data);
pagefile = fopen (pagefilename, "wb");
if (pagefile)
{
curl_easy_setopt (curl_handle, CURLOPT_WRITEDATA, pagefile);
curl_easy_perform (curl_handle);
fclose (pagefile);
}
curl_easy_cleanup (curl_handle);
//Reading the HTML-code
ifstream webIn ("example2.txt");
if (webIn.is_open ())
{
std::string wline;
while (getline (webIn, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open example2.txt" << endl;
return 0;
}
And my output is
Print strange letter: æ ø og å!
>>HTML-CODE CONTAINING "�" instead of æ ø å <<
I don't know if it is relevant, but my OS is Linux Mint 17.3 and I've set the language and region of my system to "English, Denmark UTF-8".
Thanks in advance! I will really appreciate any help or hints :)
Related
I am learning from http://www.cochoy.fr/boost-property-tree/.
Instead of write_json to stdout, I tried to save it in a string.
std::stringstream ss;
boost::property_tree::json_parser::write_json(ss, oroot, false);
std::cout <<" begin json string" << std::endl;
std::cout << ss.str() << std::endl;
std::cout << "after json string" << std::endl;
output:
begin json string
{"height":"320","some":{"complex":{"path":"bonjour"}},"animals":{"rabbit":"white","dog":"brown","cat":"grey"},"fish":"blue","fish":"yellow","fruits":["apple","raspberry","orange"],"matrix":[["1","2","3"],["4","5","6"],["7","8","9"]]}
after json string
According to the output above, there is an empty line at the end. How do I get rid of the newline? With the newline it is not a valid JSON string.
RFC 7159 does not require the newline, but it permits insignificant whitespace (including line feeds) after the value, so the output is still valid JSON. POSIX, for its part, defines a line as ending with a newline character.
In case you're interested in where the newline comes from, take a look at the write_json_internal source code: there is a stream << std::endl; near the end of the function. Note that ...::write_json calls write_json_internal.
// Write ptree to json stream
template<class Ptree>
void write_json_internal(std::basic_ostream<typename Ptree::key_type::value_type> &stream,
const Ptree &pt,
const std::string &filename,
bool pretty)
{
if (!verify_json(pt, 0))
BOOST_PROPERTY_TREE_THROW(json_parser_error("ptree contains data that cannot be represented in JSON format", filename, 0));
write_json_helper(stream, pt, 0, pretty);
stream << std::endl;
if (!stream.good())
BOOST_PROPERTY_TREE_THROW(json_parser_error("write error", filename, 0));
}
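If the string really must not end in a newline, one option is to trim it after serialization. The following is a minimal sketch, not part of Boost: strip_trailing_newlines is a hypothetical helper name, and it assumes the only unwanted characters are trailing '\n' or '\r'.

```cpp
#include <string>

// Hypothetical helper (not a Boost function): removes every trailing
// '\n' or '\r' character from an already serialized JSON string.
std::string strip_trailing_newlines(std::string json)
{
    while (!json.empty() && (json.back() == '\n' || json.back() == '\r'))
        json.pop_back();
    return json;
}
```

With the code from the question, this would be called as strip_trailing_newlines(ss.str()) before printing or storing the result.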
Here is a brief description of the book's programming project I am trying to do...
Write a program that reads in a C++ source file and converts all '<' symbols to "&lt;" and all '>' symbols to "&gt;". Also add the tag <PRE> to the beginning of the file and </PRE> to the end of the file. This tag preserves whitespace and formatting in the HTML document. Your program should create a new file with the converted output. To implement this, you should write a function 'convert' that takes the input and output streams as parameters.
I am having issues trying to get the program to work correctly. What's happening is that the program creates a new file ending in .html, but it is not converting anything in the file (i.e. adding <PRE> to the beginning and </PRE> to the end, and converting all '<' symbols to &lt; and '>' to &gt;).
I've been messing with it for a while now and I'm honestly not sure where I am going wrong. I'm super new to programming in general and even newer to C++, so please be nice haha.
Here is my code, any help is greatly appreciated!!
Thanks!
Scott
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#include <cstdlib>
using namespace std;
// main function
int main() {
// Input file to convert
string filename;
// Output file with .html on the end
string outputname;
char c;
int i;
ifstream inStream;
ofstream outStream;
cout << "Enter filename you would like to convert: " << endl;
cin >> filename;
// Open the input file
inStream.open(filename.c_str());
if (inStream.fail()) {
cout << "I/O failure opening file." << endl;
exit(1);
}
// Create the output file
outputname = filename + ".html";
outStream.open(outputname.c_str());
// First, output the <PRE> tag
outStream << "<PRE>" << endl;
// Loop through the input file until there is nothing else to get.
// Testing the result of get() avoids handling the last character twice,
// which happens when looping on !inStream.eof().
while (inStream.get(c)) {
// Output &lt; or &gt; or the original char
if (c == '<') {
outStream << "&lt;";
}
else if (c == '>') {
outStream << "&gt;";
}
else outStream << c;
}
// Output end /PRE tag
outStream << "</PRE>" << endl;
inStream.close();
outStream.close();
cout << "Conversion done. Results are in file " << outputname << endl;
}
I'm having trouble sending a char (i.e. "R") from my Qt5 application on Windows 7 to a COM port that is connected to an Arduino.
I intend to blink an LED on the Arduino, and the Arduino part works OK.
Here is my Qt code:
#include <QTextStream>
#include <QCoreApplication>
#include <QtSerialPort/QSerialPortInfo>
#include <QSerialPort>
#include <iostream>
#include <QtCore>
QT_USE_NAMESPACE
using namespace std;
QSerialPort serial;
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
QTextStream out(stdout);
QList<QSerialPortInfo> serialPortInfoList = QSerialPortInfo::availablePorts();
out << QObject::tr("Total number of ports available: ") << serialPortInfoList.count() << endl;
foreach (const QSerialPortInfo &serialPortInfo, serialPortInfoList) {
out << endl
<< QObject::tr("Port: ") << serialPortInfo.portName() << endl
<< QObject::tr("Location: ") << serialPortInfo.systemLocation() << endl
<< QObject::tr("Description: ") << serialPortInfo.description() << endl
<< QObject::tr("Manufacturer: ") << serialPortInfo.manufacturer() << endl
<< QObject::tr("Vendor Identifier: ") << (serialPortInfo.hasVendorIdentifier() ? QByteArray::number(serialPortInfo.vendorIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Product Identifier: ") << (serialPortInfo.hasProductIdentifier() ? QByteArray::number(serialPortInfo.productIdentifier(), 16) : QByteArray()) << endl
<< QObject::tr("Busy: ") << (serialPortInfo.isBusy() ? QObject::tr("Yes") : QObject::tr("No")) << endl;
}
serial.setPortName("COM5");
serial.open(QIODevice::ReadWrite);
serial.setBaudRate(QSerialPort::Baud9600);
serial.setDataBits(QSerialPort::Data8);
serial.setParity(QSerialPort::NoParity);
serial.setStopBits(QSerialPort::OneStop);
serial.setFlowControl(QSerialPort::NoFlowControl);
if(!serial.isOpen())
{
std::cout<<"port is not open"<<endl;
//serial.open(QIODevice::ReadWrite);
}
if(serial.isWritable()==true)
{
std::cout<<"port writable..."<<endl;
}
QByteArray data("R");
serial.write(data);
serial.flush();
std::cout<<"value sent!!! "<<std::endl;
serial.close();
return 0;
}
My source code consists of two parts:
1- serialportinfolist .... which works just fine
2- opening and writing data... I get no errors when running the code, and the output looks as if nothing has gone wrong!
However, the LED on the board does not turn on when I run this code.
When I test this with the Arduino Serial Monitor the LED turns on, but I can't turn it on from Qt.
Are you waiting for CR LF (0x0D 0x0A) in your Arduino code?
QByteArray ba;
ba.resize(3);
ba[0] = 0x52; //'R'
ba[1] = 0x0d;
ba[2] = 0x0a;
Or append it to your string with
QByteArray data("R\r\n");
Or
QByteArray data("R\n");
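To make the byte values explicit: in ASCII, 'R' is 0x52, carriage return is 0x0D and line feed is 0x0A. The sketch below builds such a framed command in plain C++ with no Qt dependency; frame_command is a hypothetical helper name, and whether the Arduino sketch expects "\r\n", "\n" or no terminator at all is an assumption that has to be checked against the Arduino code.

```cpp
#include <string>

// Hypothetical helper: returns the command character followed by a
// line terminator (CR LF by default).
std::string frame_command(char command, const std::string &terminator = "\r\n")
{
    std::string frame(1, command);  // one-character string holding the command
    frame += terminator;
    return frame;
}
```

The result could then be handed to Qt with something like QByteArray(frame.data(), int(frame.size())) before calling serial.write().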
I think I have found a partial solution, but it is still incomplete.
When I press debug the first time, Qt does not send any signal to the Arduino, but when I press debug a second time it behaves as expected.
Isn't it weird that one has to run it twice to get it working?
Let me know if the problem lies somewhere else.
Any help is appreciated.
I'm planning to get data from this website:
http://www.gpw.pl/akcje_i_pda_notowania_ciagle
(it's the site of the main stock exchange in Poland).
I've got a program written in C++ that downloads the source of the page to a file.
But the problem is that it doesn't contain the thing I'm interested in
(the stock values, of course).
If you compare the source of the page to what the "View element" option (RMB -> View element) shows,
you can see that "View element" does contain the stock values:
<td>75.6</td>
<tr class="even red">
etc etc...
The downloaded source of the page doesn't have this information.
So I've got 2 questions:
1) Why is the source of the page different from what "View element" shows?
2) How do I modify my program so that it downloads the right code?
#include <string>
#include <iostream>
#include "curl/curl.h"
#include <cstdlib>
using namespace std;
// Write any errors in here
static char errorBuffer[CURL_ERROR_SIZE];
// Write all expected data in here
static string buffer;
// This is the writer call back function used by curl
static size_t writer(char *data, size_t size, size_t nmemb,
string *buffer)
{
// What we will return; libcurl expects the number of bytes handled as a size_t
size_t result = 0;
// Do we have a buffer to write into?
if (buffer != NULL)
{
// Append the data to the buffer
buffer->append(data, size * nmemb);
// How much did we write?
result = size * nmemb;
}
return result;
}
// You know what this does..
void usage()
{
cout <<"curltest: \n" << endl;
cout << "Usage: curltest url\n" << endl;
}
/*
* The old favorite
*/
int main(int argc, char* argv[])
{
if (argc > 1)
{
string url(argv[1]);
cout<<"Retrieving "<< url << endl;
// Our curl objects
CURL *curl;
CURLcode result;
// Create our curl handle
curl = curl_easy_init();
if (curl)
{
// Now set up all of the curl options
curl_easy_setopt(curl, CURLOPT_ERRORBUFFER, errorBuffer);
curl_easy_setopt(curl, CURLOPT_URL, argv[1]);
curl_easy_setopt(curl, CURLOPT_HEADER, 0L); // these options take a long
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writer);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &buffer);
// Attempt to retrieve the remote page
result = curl_easy_perform(curl);
// Always cleanup
curl_easy_cleanup(curl);
// Did we succeed?
if (result == CURLE_OK)
{
cout << buffer << "\n";
exit(0);
}
else
{
cout << "Error: [" << result << "] - " << errorBuffer;
exit(-1);
}
}
}
return 0;
}
Because the values are filled in using JavaScript.
"View source" shows you the raw source for the page, while "View Element" shows you the state the document tree is in at the moment.
There's no simple way to fix it, because you need to either execute the JavaScript or port it to C++ (and it would probably make you unpopular at the exchange).
When I save the page as an HTML file (File > Save As), I get a file containing all the data displayed in the browser, including what was not found in the page source (I use Chrome).
So I suggest adding one step to your code:
1) Download the page with a JavaScript-enabled browser that supports a command line or some sort of API (if curl can't do it, maybe wget or lynx/links/links2/elinks on Linux can help you?).
2) Parse the data.
I prefer to work with std::string, but I'd like to figure out what is going wrong here.
I am unable to understand why std::find isn't working properly for the type T**, even though pointer arithmetic works on it correctly. For example:
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
But it works fine for the type T*[N].
#include <iostream>
#include <algorithm>
int main( int argc, const char ** argv )
{
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
const char ** cmdPtr = std::find(argv+1, argv+argc, "Hello") ;
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
if( cmdPtr == argv+argc )
std::cout << "String not found" << std::endl;
if( testPtr != testAr+2 )
std::cout << "String found: " << *testPtr << std::endl;
return 0;
}
Arguments passed: Hello World
Output:
Hello World
String not found
String found: Hello
Thanks.
Comparing values of type char const* compares addresses, not the characters being pointed to. The address of "Hello" is guaranteed to differ from the address of any other string, unless you compare it to the address of another string literal "Hello" (in which case the pointers may compare equal). Your compare() function compares the characters being pointed to.
In the first case, you're comparing the pointer values themselves and not what they're pointing to. And the constant "Hello" doesn't have the same address as the first element of argv.
Try using:
const char ** cmdPtr = std::find(argv+1, argv+argc, std::string("Hello")) ;
std::string knows to compare contents and not addresses.
For the array version, the compiler can fold identical literals into a single one, so every time "Hello" appears in the code it may really be the same pointer. Thus, comparing for equality in
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
yields the expected result, although the standard does not guarantee that identical literals are merged.