How to read non-ASCII characters with WinInet

How can I use InternetReadFile to read a paste from Pastebin that contains non-ASCII characters?
For example, this paste has Russian characters, which are non-ASCII. How do I output the string to the console?
std::string request(const char * URL)
{
char buffer[MAX];
DWORD read;
LPWSTR lpszText = new WCHAR[MAX];
HINTERNET hSession = InternetOpen(NULL,
INTERNET_OPEN_TYPE_PRECONFIG,
NULL,NULL,0);
HINTERNET conn = InternetOpenUrl(hSession,
URL,
NULL,
0,
INTERNET_FLAG_RELOAD,
0);
InternetReadFile(conn, buffer, sizeof(buffer),&read);
::MultiByteToWideChar(CP_UTF8,
MB_ERR_INVALID_CHARS,
buffer,read,
lpszText,
sizeof(lpszText));
std::wcout << lpszText << std::endl;
InternetCloseHandle(conn);
InternetCloseHandle(hSession);
return std::string(buffer, read);
}

Boost library write_json puts an extra newline at the end

I am learning from http://www.cochoy.fr/boost-property-tree/.
Instead of writing the JSON to stdout, I tried to save it in a string.
std::stringstream ss;
boost::property_tree::json_parser::write_json(ss, oroot, false);
std::cout << "begin json string" << std::endl;
std::cout << ss.str() << std::endl;
std::cout << "after json string" << std::endl;
output:
begin json string
{"height":"320","some":{"complex":{"path":"bonjour"}},"animals":{"rabbit":"white","dog":"brown","cat":"grey"},"fish":"blue","fish":"yellow","fruits":["apple","raspberry","orange"],"matrix":[["1","2","3"],["4","5","6"],["7","8","9"]]}
after json string
According to the output above, there is an extra empty line at the end. How can I get rid of the newline? With the newline, I believe it is not a valid JSON string.
The trailing newline does not make the output invalid JSON: RFC 7159 allows insignificant whitespace, including line feeds, after the top-level value. A terminating newline is also part of the POSIX definition of a line.
In case you're interested in where the newline comes from, take a look at the write_json_internal source code: there is a stream << std::endl; near the end of the function. Note that ...::write_json calls write_json_internal.
// Write ptree to json stream
template<class Ptree>
void write_json_internal(std::basic_ostream<typename Ptree::key_type::value_type> &stream,
const Ptree &pt,
const std::string &filename,
bool pretty)
{
if (!verify_json(pt, 0))
BOOST_PROPERTY_TREE_THROW(json_parser_error("ptree contains data that cannot be represented in JSON format", filename, 0));
write_json_helper(stream, pt, 0, pretty);
stream << std::endl;
if (!stream.good())
BOOST_PROPERTY_TREE_THROW(json_parser_error("write error", filename, 0));
}

Extract all URLs from HTML in C

How can I extract all URLs from an HTML document using only the C standard library?
I am trying to do it with sscanf(), but valgrind reports an error (and I am not even sure the code will meet my requirement once it is debugged, so if there are other ways, please tell me). I stored the HTML content in a string, and it contains multiple URLs, both absolute and relative (e.g. http://www.google.com, //www.google.com, /a.html, a.html and so on). I want to extract them one by one and store each of them in a separate string.
I am also thinking about using strstr(), but then I have no idea how to get the second URL.
My code (I skip the asserts here) using sscanf:
int
main(int argc, char* argv[]) {
char *remain_html = (char *)malloc(sizeof(char) * 1001);
char *url = (char *)malloc(sizeof(char) * 101);
char *html = "navigation"
"search";
printf("html: %s\n\n", html);
sscanf(html, "<a href=\"%s", remain_html);
printf("after first href tag: %s\n\n", remain_html);
sscanf(remain_html, "%s\">", url);
printf("first web: %s\n\n", url);
sscanf(remain_html, "<a href=\"%s", remain_html);
printf("after second href tag: %s\n\n", remain_html);
free(remain_html);
free(url);
}
Valgrind reports: Conditional jump or move depends on uninitialised value(s).
If anybody could help, thank you so much!
Valgrind is warning you about uninitialized data being used in a test. Since your program only calls sscanf and printf, the problem is very probably in one of the sscanf calls.
If I change your program a little to print the result of each sscanf, showing how many elements it got:
int
main(int argc, char* argv[]) {
char *remain_html = (char *)malloc(sizeof(char) * 1001);
char *url = (char *)malloc(sizeof(char) * 101);
char *html = "<A class=\"mw-jump-link\" HREF=\"#mw-head\">Jump to navigation</a>"
"<a class=\"mw-jump-link\" href=\"#p-search\">Jump to search</a>";
printf("html: %s\n\n", html);
printf("%d\n", sscanf(html, "<a href=\"%s", remain_html));
printf("after first href tag: %s\n\n", remain_html);
printf("%d\n", sscanf(remain_html, "%s\">", url));
printf("first web: %s\n\n", url);
printf("%d\n", sscanf(remain_html, "<a href=\"%s", remain_html));
printf("after second href tag: %s\n\n", remain_html);
free(remain_html);
free(url);
}
The execution is:
pi@raspberrypi:/tmp $ ./a.out
html: <A class="mw-jump-link" HREF="#mw-head">Jump to navigation</a><a class="mw-jump-link" href="#p-search">Jump to search</a>
0
after first href tag:
-1
first web:
-1
after second href tag:
pi@raspberrypi:/tmp $
So the first sscanf got nothing (0 elements), which means it does not set remain_html. That buffer is therefore uninitialized when the next sscanf reads it, which is undefined behavior.
Because of the format
"<a href=\"%s"
the first sscanf expects a string starting with
<a href="
but html starts with
<A class=
which is different, so it stops at the second character and does not set remain_html.
Using sscanf is not the right approach. Instead, search for the prefix <a href=" (which may be in uppercase), for instance with strcasestr, then extract the URL up to the closing ".
Example:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
/* in case you do not have that function */
char * strcasestr(char * haystack, char *needle)
{
while (*haystack) {
char * ha = haystack;
char * ne = needle;
while (tolower(*ha) == tolower(*ne)) {
if (!*++ne)
return haystack;
ha += 1;
}
haystack += 1;
}
return NULL;
}
int main(int argc, char* argv[]) {
char *html = "<a href=\"http://www.google.com\">navigation</a>"
"<a href=\"/a.html\">search</a>";
char * begin = html;
char * end;
printf("html: %s\n", html);
while ((begin = strcasestr(begin, "<a href=\"")) != NULL) {
begin += 9; /* bypass the header */
end = strchr(begin, '"');
if (end != NULL) {
printf("found '%.*s'\n", (int) (end - begin), begin);
begin = end + 1;
}
else {
puts("invalid url");
return -1;
}
}
}
Compilation and execution:
pi@raspberrypi:/tmp $ gcc -Wall a.c
pi@raspberrypi:/tmp $ ./a.out
html: <a href="http://www.google.com">navigation</a><a href="/a.html">search</a>
found 'http://www.google.com'
found '/a.html'
pi@raspberrypi:/tmp $
Note: I know the second argument of strcasestr is in lowercase here, so tolower(*ne) is unnecessary and *ne would be enough, but I gave a definition of the function that also works outside the current context.

Writing and reading strange letters to a txt-file using libcurl

I have a C++ program that downloads the HTML code of a webpage and saves it as a text file using libcurl. The problem is that the Danish alphabet has the following special letters: Æ æ Ø ø Å å.
When I try to read the HTML code line by line, all these characters look like "�". I have tried to read and write the file as wide characters. That did not work.
I have also tried writing a sentence containing "æ", "ø" and "å" to another text file and reading it back. That, for some reason, worked.
So my question is: why do the special letters look like "�" when they are in the downloaded HTML code but not when I write my own sentence? And how do I fix the HTML output?
My code is as follows:
#include <iostream>
#include <string>
#include<fstream>
#include<curl/curl.h>
using namespace std;
static size_t write_data (string * ptr, size_t size, size_t nmemb,void *stream)
{
size_t written = fwrite (ptr, size, nmemb, (FILE *) stream);
cout << static_cast <const void *> (ptr);
return written;
//string myString (ptr, nbytes);
}
//string myString (ptr, nbytes);
int main ()
{
// Writing weird characters to a text-file!
ofstream myfile;
myfile.open ("example.txt");
myfile << "Print strange letter: æ ø og å!";
myfile.close ();
// Reading weird characters from the text-file!
std::ifstream wif ("example.txt");
if (wif.is_open ())
{
std::string wline;
while (std::getline (wif, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open file" << endl ;
wif.close ();
cout << endl << endl;
// Download the HTML-code from a webpage and save it.
CURL *curl_handle;
static const char *pagefilename = "example2.txt";
const char *charUrl = "http://politiken.dk/forbrugogliv/sundhedogmotion/ECE3406716/mindst-85000-offentligt-ansatte-maa-slet-ikke-ryge-i-arbejdstiden/"; // An article from the danish newspaper "Politiken"
FILE *pagefile;
curl_global_init (CURL_GLOBAL_ALL);
curl_handle = curl_easy_init ();
curl_easy_setopt (curl_handle, CURLOPT_URL, charUrl); // HERE IS THE URL PASSED!
curl_easy_setopt (curl_handle, CURLOPT_VERBOSE, 0L);
curl_easy_setopt (curl_handle, CURLOPT_NOPROGRESS, 0L);
curl_easy_setopt (curl_handle, CURLOPT_WRITEFUNCTION, write_data);
pagefile = fopen (pagefilename, "wb");
if (pagefile)
{
curl_easy_setopt (curl_handle, CURLOPT_WRITEDATA, pagefile);
curl_easy_perform (curl_handle);
fclose (pagefile);
}
curl_easy_cleanup (curl_handle);
//Reading the HTML-code
ifstream webIn ("example2.txt");
if (webIn.is_open ())
{
std::string wline;
while (getline (webIn, wline))
{
cout << wline << endl;
}
}
else
cout << "Could not open example2.txt" << endl;
return 0;
}
And my output is
Print strange letter: æ ø og å!
>>HTML-CODE CONTAINING "�" instead of æ ø å <<
I don't know if it is relevant, but my OS is Linux Mint 17.3 and I've set the language and region of my system to "English, Denmark UTF-8".
Thanks in advance! I will really appreciate any help or hints :)

assignment makes pointer from integer without a cast

Hello all, I have written a C program which connects to a MySQL server and executes a SQL query from a text file that contains only one query.
#include <mysql.h>
#include <stdio.h>
main() {
MYSQL *conn;
MYSQL_RES *res;
MYSQL_ROW row;
char *server = "127.0.0.1";
char *user = "root";
char *password = "PASSWORD"; /* set me first */
char *database = "har";
conn = mysql_init(NULL);
char ch, file_name[25];
char *ch1;
FILE *fp;
printf("Enter the name of file you wish to see ");
gets(file_name);
fp = fopen(file_name,"r"); // read mode
if( fp == NULL )
{
perror("Error while opening the file.\n");
exit(0);
}
while( ( ch = fgetc(fp) ) != EOF )
printf("%c",ch);
ch1=ch;
/* Connect to database */
if (!mysql_real_connect(conn, server,
NULL , NULL, database, 0, NULL, 0)) {
fprintf(stderr, "%s\n", mysql_error(conn));
exit(0);
}
printf("%c",ch);
/* send SQL query */
if (mysql_query(conn, ch1)) {
fprintf(stderr, "%s\n", mysql_error(conn));
exit(0);
}
res = mysql_use_result(conn);
/* output table name */
printf("MySQL Tables in mysql database:\n");
while ((row = mysql_fetch_row(res)) != NULL)
printf("%s \n", row[0]);
/* close connection */
mysql_free_result(res);
mysql_close(conn);
fclose(fp);
}
I am unable to understand where I have gone wrong.
Thanks in advance.
This is the line causing the problem:
ch1=ch;
ch1 is a pointer to a character, whereas ch is a character.
Do you intend to store the bytes read from fp in a char array pointed to by ch1? What you are actually doing is reading one character with fgetc on each iteration of the while loop, storing it in ch, and printing it.
Then, when the while loop finishes, you assign a char to a char pointer. I am not sure what you are trying to do with this, but it is definitely what causes the problem.
You're going wrong in a lot of ways:
You don't declare the return type or arguments for main.
You're using gets. Never ever use gets, don't even think about it. Use fgets instead.
fgetc returns an int, not a char, so your ch should be an int. You won't be able to reliably recognize EOF until you fix this.
You're declaring char ch and char *ch1 but assigning ch to ch1. That's where the error in your title is coming from.
Your code appears to be trying to feed your SQL to MySQL one byte at a time, and that's not going to do anything useful. I think you mean to use fgets to read the SQL file one line at a time so that you can feed each line to MySQL as a single SQL statement.
You should spend some time reading about your compiler's warning switches.

std::find with type T** vs T*[N]

I prefer to work with std::string, but I would like to figure out what is going wrong here.
I am unable to understand why std::find isn't working properly for type T**, even though pointer arithmetic works on it correctly. For example:
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
But it works fine for the type T*[N].
#include <iostream>
#include <algorithm>
int main( int argc, const char ** argv )
{
std::cout << *(argv+1) << "\t" <<*(argv+2) << std::endl;
const char ** cmdPtr = std::find(argv+1, argv+argc, "Hello") ;
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
if( cmdPtr == argv+argc )
std::cout << "String not found" << std::endl;
if( testPtr != testAr+2 )
std::cout << "String found: " << *testPtr << std::endl;
return 0;
}
Arguments passed: Hello World
Output:
Hello World
String not found
String found: Hello
Thanks.
Comparing values of type char const* amounts to comparing addresses. The address of "Hello" is guaranteed to be different unless you compare it to the address of another string literal "Hello" (in which case the pointers may compare equal). Your compare() function compares the characters being pointed to.
In the first case, you're comparing the pointer values themselves and not what they're pointing to. And the constant "Hello" doesn't have the same address as the first element of argv.
Try using:
const char ** cmdPtr = std::find(argv+1, argv+argc, std::string("Hello")) ;
std::string knows to compare contents and not addresses.
For the array version, the compiler is allowed (but not required) to fold identical literals into a single one, so every time "Hello" is seen throughout the code it may really be the same pointer. When that happens, comparing for equality in
const char * testAr[] = { "Hello", "World" };
const char ** testPtr = std::find(testAr, testAr+2, "Hello");
yields the correct result.