Scraped HTML is not written at the beginning of text file - html

Currently, I'm scraping the HTML code of a page, and writing it to a text file.
My question is: why are there blank spaces or empty lines at the beginning? The HTML written to the .txt file does not start at the beginning of the file, so the first '<' is not at position 0 of the file.
After a few runs, my HTML is always written a few lines down inside the text file.
Can anyone tell me why?
Below is my code. I'm using Visual C++.
UINT32 LOG(wstring log, UINT32 flag)
{
    wfstream file (LOG_FILE, ios_base::app);
    file << log;
    file.close();
    return 1;
}
My problem is that the HTML copied to my text file always starts a couple of lines down, and only then does the first '<' appear. What I want is for the HTML's first '<' to be written at position 0 of my text file :)

Here is the fuller version of the function, which checks the flag and the file before writing:
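UINT32 LOG(wstring log, UINT32 flag)
{
    if (flag == 0)
    {
        // Append mode: every run adds to whatever is already in the file.
        wfstream file (LOG_FILE, ios_base::app);
        if (file.is_open())
        {
            file << log << endl;
            file.close();
            wcout << endl << log << endl;
            return 0;
        }
        else wcout << "\nUnable to open LOG file\n";
    }
    // Reached when the file could not be opened or when flag is not 0.
    return 1;
}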
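One possible explanation, offered as a guess rather than a diagnosis: ios_base::app appends to whatever earlier runs left in the file, and a scraped page often begins with whitespace or blank lines of its own. The sketch below assumes both are in play. The names write_html_at_start and strip_leading_whitespace are hypothetical, and the file path is passed as a parameter instead of using LOG_FILE.

```cpp
#include <cwctype>
#include <fstream>
#include <string>

// Hypothetical helper: drop spaces, tabs and newlines that come before the
// first non-whitespace character (normally the opening '<').
std::wstring strip_leading_whitespace(const std::wstring& text)
{
    std::size_t i = 0;
    while (i < text.size() && std::iswspace(text[i]))
        ++i;
    return text.substr(i);
}

// Hypothetical variant of LOG: truncates the file instead of appending, so
// output from earlier runs cannot push the HTML further down.
// Returns 0 on success and 1 if the file could not be opened.
unsigned write_html_at_start(const char* path, const std::wstring& html)
{
    std::wofstream file(path, std::ios_base::out | std::ios_base::trunc);
    if (!file.is_open())
        return 1;
    file << strip_leading_whitespace(html);
    return 0;
}
```

If the log really has to accumulate several pages, truncating is not an option, and only the whitespace stripping applies.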

Related

How to read a specific HTML part in a C++ program?

I am a complete beginner. I have a C++ command-line program that reads an HTML/XML input file through an ifstream file("C:/.......").
The main problem is that I want to take some text from this file and put it into a variable. The difficulty is that I only want part of the file, for instance stored in var1: the text between the <name> tags, or whichever tag I choose.
I already tried using getline with a condition and moving the file cursor, but I only get either all of the text or nothing.
// Here is the part of the code that I'm sure of
string info(""), line(""), system("");
ifstream file("C:/Users/[...]/file.xml");
if (file.is_open())
{
    while (getline(file, line))
    {
        cout << line << endl;
    }
    file.close();
}
else
    cout << "file is not open" << endl;
Then I use the variable that holds the text. Sorry for any English or code mistakes, and thanks in advance for any clues.
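A minimal sketch of one way to do this, assuming the file is small enough to read into a single string and that the opening tag has no attributes. The helper names text_between_tags and read_whole_file are hypothetical. This is plain string searching, not real HTML/XML parsing.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the text between the first "<tag>" and the
// following "</tag>" in content, or an empty string if either is missing.
std::string text_between_tags(const std::string& content, const std::string& tag)
{
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::size_t start = content.find(open);
    if (start == std::string::npos)
        return "";
    const std::size_t begin = start + open.size();
    const std::size_t end = content.find(close, begin);
    if (end == std::string::npos)
        return "";
    return content.substr(begin, end - begin);
}

// Hypothetical helper: reads a whole file into one string
// (empty if the file cannot be opened).
std::string read_whole_file(const std::string& path)
{
    std::ifstream file(path.c_str());
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}
```

With these, something like std::string var1 = text_between_tags(read_whole_file(path), "name"); would fill var1, where path is the location of the input file.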

How to load .txt file into separate .html files? c++

I need to read an HTML file and then separate specific parts of it into individual HTML files.
For example:
<html lang="en">
<head></head>
<body>
<ul>something 123</ul>
<p>something else 123</p>
<p>blabla</p>
<table>example</table>
</body>
</html>
Everything between <ul> and </ul> should be saved in another HTML file, same with everything between <p> and </p>.
I need to use <fstream> library, and I do not know how to use vectors, so I need to do this probably without them unless there's a simple solution.
The main problem, for now, is, how to read a file until a string is found?
I mean, for example - string table = "<table>" is found and then the program is saving everything after <table> until it finds string end_table = "</table>".
Thanks for your help.
You can use find to locate the beginning and ending body tag with the following:
#include <iostream>
#include <string>
using namespace std;
int main(int argc, char* argv[]) {
    string line = "some line with <body> in it";
    string bodytag = "<body>";
    if(line.find(bodytag) != string::npos) {
        cout << "found" << endl;
    }
    return 0;
}
Then just read lines in from the file until you find the <body> tag and output them until you find the </body> tag. You might need to modify this if content that needs to be saved appears after the opening body tag or before the closing body tag on the same line. Your input doesn't contain this, so this isn't likely a problem.

Adding images in html page using c++ file handling

I have created an HTML file using ofstream in C++. I have a folder on my system that contains 10 JPEG images.
Now I want to add those 10 images to my HTML page using a loop.
So far I have added one image to my HTML page.
My question is: how do I add the 10 JPEG images, which are in the same folder but have different names, using one loop?
#include <cstdio>
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
    ifstream folder;   // note: opening a directory with ifstream does not list its files
    ofstream myfile;
    char buffer[500] = {0};
    folder.open("/home/tanmay/exp");
    myfile.open("MY_FILE.html");
    myfile << "<!DOCTYPE html>" << endl;
    myfile << "<html><body>" << endl;
    snprintf(buffer, sizeof buffer, "00_FrontView.jpg");
    // Quotes inside the HTML must be escaped as \" or they vanish from the output.
    myfile << "<table align=\"center\" border=\"2\" width=\"600\" height=\"50\">"
           << "<tr><td><center><a href=\"" << buffer << "\" target=\"_self\">"
           << "<img src=\"" << buffer << "\" width=\"150\" height=\"100\"/>"
           << "</a></center></td></tr>" << endl;
    myfile << "</table></body></html>";
    myfile.close();
    folder.close();
    return 0;
}

CGI wont display variables through HTML in c (Eclipse)

I have used a fifo pipe to read in some data (weather data) into a char variable. The console will display this variable correctly. However, when I try to display it through HTML on the CGI page, it simply does not display. Code below -
int main(void) {
    int fd;
    char *myfifo = "pressure.txt";
    char buff[BUFFER];
    long fTemp;
    //open and read message
    fd = open(myfifo, O_RDONLY);
    read(fd, buff, BUFFER);
    printf("Received: %s\n", buff);
    close(fd);
    printf("Content-type: text/html\n\n");
    puts("<HTML>");
    puts("<BODY>");
    printf("Data is: %s", buff);
    puts("</BODY>");
    puts("</HTML>");
    return EXIT_SUCCESS;
}
As you can see, in the console it displays correctly:
Received: 2014-08-13 16:54:57
25.0 DegC, 1018.7 mBar
Content-type: text/html
<HTML>
<BODY>
Data is 2014-08-13 16:54:57
25.0 DegC, 1018.7 mBar
</BODY>
</HTML>
logout
On the CGI web page, however, it displays "Data is" but not the weather data.
Two important things when writing a CGI program:
The program will be run by the webserver, which is normally started as a different user (the 'www' user, for example).
It's possible that the program is started from within another directory, which can cause different behaviour if you don't specify the full path of a file you want to open.
Since both these things can cause problems, it can be helpful
to add some debug information. Of course, it's always a good idea
to check return values of functions you use.
To make it easier to display debug or error messages, I'd first
move the following code up, so that all output that comes after
it will be rendered by the browser:
printf("Content-type: text/html\r\n\r\n");
puts("<HTML>");
puts("<BODY>");
It may be useful to know what the webserver uses as the directory
from which the program is started. The getcwd
call can help here. Let's use a buffer of size BUFFER to store
the result in, and check if it worked:
char curpath[BUFFER];
if (getcwd(curpath, BUFFER) == NULL)
    printf("Can't get current path: %s<BR>\n", strerror(errno));
else
    printf("Current path is: %s<BR>\n", curpath);
The getcwd function returns NULL in case of an error, and sets the value
of errno to a number which indicates what went wrong. To convert this
value to something readable, the strerror
function is used. For example, if BUFFER was not large enough to be
able to store the path, you'll see something like
Can't get current path: Numerical result out of range
The open call returns a negative number
if it didn't work, and sets errno again. So, to check if this worked:
fd = open(myfifo, O_RDONLY);
if (fd < 0)
    printf("Can't open file: %s<BR>\n", strerror(errno));
In case the file can be found, but the webserver does not have permission
to open it, you'll see
Can't open file: Permission denied
If the program is started from another directory than you think, and
it's unable to locate the file, you would get:
Can't open file: No such file or directory
Adding such debug info should make it more clear what's going on, and more
importantly, what's going wrong.
To make sure the actual data is read without problems as well, the return
value of the read function should be
checked and appropriate actions should be taken. If read fails,
a negative number is returned. To handle this:
numread = read(fd, buff, BUFFER);
if (numread < 0)
    printf("Error reading from file: %s<BR>\n", strerror(errno));
Any other value indicates success and is the number of bytes that were
read. If BUFFER bytes really were read, it's not at all certain that the
last byte in buff is a 0, which printf needs in order to know where the
string ends. To make sure it is in fact null-terminated, the last byte in
buff is set to 0:
if (numread == BUFFER)
    buff[BUFFER-1] = 0;
Note that this actually overwrites one of the bytes that were read in this
case.
If fewer bytes were read, it's still not certain that the last byte that was
read was a 0, but now we can place our own 0 after the bytes that were read
so none of them are overwritten:
else
    buff[numread] = 0;
To make everything work, you may need the following additional include files:
#include <unistd.h>
#include <string.h>
#include <errno.h>
The complete code of what I described is shown below:
int main(void)
{
    int fd, numread;
    char *myfifo = "pressure.txt";
    char buff[BUFFER];
    char curpath[BUFFER];
    long fTemp;

    // Let's make sure all text output (even error/debug messages)
    // will be visible in the web page
    printf("Content-type: text/html\r\n\r\n");
    puts("<HTML>");
    puts("<BODY>");

    // Some debug info: print the current path
    if (getcwd(curpath, BUFFER) == NULL)
        printf("Can't get current path: %s<BR>\n", strerror(errno));
    else
        printf("Current path is: %s<BR>\n", curpath);

    // Open the file
    fd = open(myfifo, O_RDONLY);
    if (fd < 0)
    {
        // An error occurs, let's see what it is
        printf("Can't open file: %s<BR>\n", strerror(errno));
    }
    else
    {
        // Try to read 'BUFFER' bytes from the file
        numread = read(fd, buff, BUFFER);
        if (numread < 0)
        {
            printf("Error reading from file: %s<BR>\n", strerror(errno));
        }
        else
        {
            if (numread == BUFFER)
            {
                // Make sure the last byte in 'buff' is 0, so that the
                // string is null-terminated
                buff[BUFFER-1] = 0;
            }
            else
            {
                // Fewer bytes were read, make sure a 0 is placed after
                // them
                buff[numread] = 0;
            }
            printf("Data is: %s<BR>\n", buff);
        }
        close(fd);
    }

    puts("</BODY>");
    puts("</HTML>");
    return EXIT_SUCCESS;
}

Open an image (jpg, gif, xbm) in binary mode

I'm working on a server/client assignment. My server should send HTML, .txt, .jpg, .gif, and .xbm files to the client so that they open in a web browser when I go to localhost:8080/some address... My HTML and .txt files open like a charm! However, the .jpg and .gif files won't display at all. The server does locate the file, but when it comes to reading or displaying the .jpg or .gif, it cannot be displayed because it contains errors, although when I open the image in a regular browser it works fine. Our instructions are "When you open an image file for reading, make sure to open it in binary mode, since an image file contains binary data." I did just that, but since the image is not displaying I wonder if I'm doing it correctly. Here is my C code for reading a file, which I use for HTML, .txt, and image formats.
void file_read(char *filename){
    // filename is actually the path
    size_in = strlen(filename);
    // Test output to see how the path was read. A path requested by a web
    // browser starts with an extra '/' character, which we need to ignore.
    printf("printf in file_read function %d\n", size_in);
    wtg = filename + 1;
    printf("printf in file_read function %s\n", wtg);
    // open the file in binary mode ("rb")
    fp = fopen(wtg, "rb");
    if (fp == NULL){
        printf("Could not open the file\n");
        exit(0);
    }
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);
    rewind(fp);
    // allocate memory to contain the whole file:
    linehtml = calloc(1, len + 1);
    if (linehtml == NULL){
        fputs("Memory Error", stderr);
        exit(2);
    }
    // copy the file into the buffer:
    if (1 != fread(linehtml, len, 1, fp)){
        fputs("entire read fails", stderr);
        exit(1);
    }
    fclose(fp);
    /*if (result != len) {
        printf("file length is %d\n\n", len);
        fputs ("Reading Error",stderr);
        exit(3);
    };*/
}
Is there something wrong? Is the buffer not long enough? Any ideas?