I am new to C++, and for a class I have been assigned to write a parser for HTML files in C++. The program takes a file name as input and outputs that file's contents, along with the number of lines, characters, tags, links, and comments, and the percentage of characters that are in tags.
I have most of the program complete; I am just stumbling on one part: how to count the number of tags in the HTML file. Below is what I have so far. My issue in particular is with lines 106-109, the part that starts with "if(fileChar == TAG)".
Other questions related to this topic either aren't answered or use libraries I am not allowed to use.
Since this is for a class, ideally I am looking for a method that does not involve libraries other than the ones listed in the header files. Any help would be much appreciated, as I am currently banging my head against a wall :)
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main ()
{
const char TAG = '<', //marks the beginning of a tag
LINK = 'a', //marks the beginning of a link
COMMENT = '!'; //marks the beginning of a comment
char fileChar; //individual characters from the file
int charNum=0, //total characters in the file
tagNum=0, //total tags in the file
linkNum=0, //total links in the file
commentNum=0, //total comments in the file
tagChars=0, //number of chars in tags
lineNum=0, //number of lines in file
charPercent=0; //percent of chars in tags
int count = 0; //for counting
string fileName; //name of file
ifstream inFile;
//take in user input
cout << "========================================" << endl;
cout << " Welcome to the HTML File Analyzer!" << endl;
cout << "========================================" << endl << endl;
cout << "Please enter a valid file name (with no spaces): " << endl;
cin >> fileName;
inFile.open(fileName.c_str()); //opens the file
if(inFile) //tests if file is open
cout << "file IS open" << endl;
else
cout << "file NOT open" << endl;
while (!inFile) //error checking to ensure file exists
{
inFile.clear(); //clear false file
cout << endl << "Re-enter a valid filename: " << endl;
cin >> fileName;
inFile.open (fileName.c_str());
}
//display contents of file
cout << "========================================" << endl;
cout << " Contents of the File " << endl;
cout << "========================================" << endl << endl;
std::string line;
while(inFile) //print out contents of the file
{
getline(inFile, line);
cout << line << endl;
lineNum++; //add to line counter
const int size=line.length();
charNum = charNum + size;
cout << "The total number of characters entered is: " << charNum << endl;
}
inFile.open(fileName.c_str()); //reopen file
while(inFile)
{
if (fileChar == TAG)
{
tagNum++;
}
}
cout << "========================================" << endl;
cout << " End of Contents of File " << endl;
cout << "========================================" << endl << endl;
inFile.open(fileName.c_str());
while(inFile) //count chars
{
charNum = charNum + 1;
}
cout << "========================================" << endl;
cout << " Content Analysis " << endl;
cout << "========================================" << endl << endl;
cout << "Number of Lines: " << lineNum << endl;
cout << "Number of Tags: " << tagNum << endl;
cout << "Number of Comments: " << commentNum << endl;
cout << "Number of Links: " << linkNum << endl;
cout << "Number of Chars in File: " << charNum << endl;
cout << "Number of Chars in Tags: " << tagChars << endl;
cout << "Percent of Chars in Tags: " << charPercent << endl;
inFile.close ();
return (0);
}
Assuming you are dealing with valid HTML5, there are five cases to distinguish when you see a < character outside of a comment:
it is the start of a comment and is followed by !--, or
it is the start of a DOCTYPE and is followed by !DOCTYPE, or
it is the start of a CDATA section and is followed by ![CDATA[, or
it is an end tag and is followed by /, or
it is a start tag and is followed by a tag name.
// Needs #include <cctype> for std::isalnum.
while (inFile.get(fileChar)) { // get() keeps whitespace, unlike operator>>
    if (fileChar != TAG) continue; // We are only interested in potential tag or comment starts.
    if (!inFile.get(fileChar)) break;
    if (fileChar == '!') {
        char after1, after2;
        if (!inFile.get(after1) || !inFile.get(after2)) break;
        if (after1 == '-' && after2 == '-') {
            // This is the start of a comment.
            // We start eating chars until we see '-->' pass by.
            std::string history = "  "; // the two most recently read characters
            while (inFile.get(fileChar)) {
                if (history == "--" && fileChar == '>') {
                    // end of comment, stop this inner loop.
                    commentNum++;
                    break;
                }
                // Shift history and copy current character to recent history
                history[0] = history[1];
                history[1] = fileChar;
            }
        }
    } else if (fileChar == '/') {
        // This is a closing tag. Do nothing.
    } else {
        // This is the start of a tag. Read until the first non-letter, non-digit.
        std::string tagName(1, fileChar); // the name starts with the character just read
        while (inFile.get(fileChar)) {
            if (std::isalnum(static_cast<unsigned char>(fileChar))) {
                tagName.append(1, fileChar);
            } else {
                break; // end of the tag name
            }
        }
        tagNum++;
        if (tagName == "a") linkNum++;
    }
}
Note that this is a very naïve implementation that only implements part of the specification. It will probably break if you feed it malformed HTML. It definitely does not handle CDATA blocks (it will treat their contents as HTML instead of unparsed character data). I am not sure what you mean by "percent chars in tags", but that might be something you can track in the last else branch.
Finally, note that I wrote it as a single block. You are of course encouraged to factor it into smaller functions (read_comment or read_tag_name, for example) to improve legibility.
I'm a newbie in C++ and have been searching for an answer to this question without success. I'm trying to compress an HTML string body with zlib using gzip compression. The compression completes successfully, but when I try to concatenate the result to another string it isn't appended at all, leaving the body blank. Here is the compression code:
void compress_message(){
z_stream zs;
char outbuffer[32768];
long strlen = content.size();
int ret;
string outstring;
memset(&zs, 0, sizeof(zs));
if (deflateInit2(&zs,
Z_BEST_COMPRESSION,
Z_DEFLATED,
MOD_GZIP_ZLIB_WINDOWSIZE + 16,
MOD_GZIP_ZLIB_CFACTOR,
Z_DEFAULT_STRATEGY) != Z_OK
) {
throw(std::runtime_error("deflateInit2 failed while compressing."));
}
content = "";
zs.next_in = (Bytef*) content.data();
zs.avail_in = strlen;
do {
//cout << "compressing..." << endl;
zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
zs.avail_out = sizeof(outbuffer);
ret = deflate(&zs, Z_FINISH);
if (content.size() < zs.total_out) {
// append the block to the output string
content.append(outbuffer,
zs.total_out - content.size());
}
} while (ret == Z_OK);
deflateEnd(&zs);
//cout << "original: " << content << endl;
if (ret != Z_STREAM_END) { // an error occurred that was not EOF
ostringstream oss;
oss << "Exception during zlib compression: (" << ret << ") " << zs.msg;
throw(std::runtime_error(oss.str()));
}
}
base64 conversion function:
string tobase64(string data) {
string code64,
output = "";
int i = 0,
j = 0;
unsigned int slen = data.size();
const char* bytes_to_encode;
code64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";
unsigned char char_array_3[3],
char_array_4[4];
bytes_to_encode = data.c_str();
while(slen--){
char_array_3[i++] = bytes_to_encode[j++];
if(3 == i){
char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
char_array_4[3] = char_array_3[2] & 0x3f;
for(i = 0; i < 4; i++){
output += code64[char_array_4[i]];
}
i = 0;
}
}
if (i)
{
for(j = i; j < 3; j++)
char_array_3[j] = '\0';
char_array_4[0] = ( char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
for (j = 0; (j < i + 1); j++)
output += code64[char_array_4[j]];
while((i++ < 3))
output += '=';
}
return output;
}
The string is successfully compressed, but when I concatenate it to the response string like this:
string final_content = ("HTTP/1.1 200 OK\r\n"+headers+"\r\n\r\n");
content = tobase64(content);
final_content += content;
I also tried
final_content.append(content);
Both give the same result: the body is never concatenated. When printed, final_content only shows:
HTTP/1.1 200 OK
...
Content-type: text/html
and the compressed string is never included. I have been trying to find answers online and couldn't figure it out. Please help. I'm working on Ubuntu 18.04.
I am using a WebSocket server to pass some information from a program written in C++ to an HTML page.
I am using the WebSocket server sample from POCO.
My problem is that the HTML page is generated by the WebSocket server, whereas I want the web page to be independent of the server, so that the web client can call the WebSocket server for data whenever it fires an event such as a button click.
I tried to call the WebSocket server from the HTML page, but I think the problem is in the server: how can a server know that a certain page is calling it?
I used this script in my HTML page to call the WebSocket server whenever I click a button.
function myFunction()
{
var soc_di="ws://127.0.0.1:9980/ws"
try {
soc_di.onopen = function() {
//document.getElementById("wsdi_statustd").style.backgroundColor = "#40ff40";
document.getElementById("demo").textContent = " websocket connection opened ";
}
soc_di.onmessage =function got_packet(msg) {
//document.getElementById("number").textContent = msg.data + "\n";
document.getElementById("des").textContent = msg.data + "\n";
}
soc_di.onclose = function(){
//document.getElementById("wsdi_statustd").style.backgroundColor = "#ff4040";
document.getElementById("demo").textContent = " websocket connection CLOSED ";
}
} catch(exception) {
alert('<p>Error' + exception);
}
}
And here is the code where the WebSocket server generates the HTML page:
class PageRequestHandler: public HTTPRequestHandler
/// Return a HTML document with some JavaScript creating
/// a WebSocket connection.
{
public:
void handleRequest(HTTPServerRequest& request, HTTPServerResponse& response)
{
response.setChunkedTransferEncoding(true);
response.setContentType("text/html");
std::ostream& ostr = response.send();
ostr << "<html>";
ostr << "<head>";
ostr << "<title>WebSocketServer</title>";
ostr << "<script type=\"text/javascript\">";
ostr << "function WebSocketTest()";
ostr << "{";
ostr << " if (\"WebSocket\" in window)";
ostr << " {";
ostr << " var ws = new WebSocket(\"ws://" << request.serverAddress().toString() << "/ws\");";
ostr << " ws.onopen = function()";
ostr << " {";
ostr << " ws.send(\"Hello, world!\");";
ostr << " };";
ostr << " ws.onmessage = function(evt)";
ostr << " { ";
ostr << " var msg = evt.data;";
ostr << " alert(\"Message received: \" + msg);";
ostr << " ws.close();";
ostr << " };";
ostr << " ws.onclose = function()";
ostr << " { ";
ostr << " alert(\"WebSocket closed.\");";
ostr << " };";
ostr << " }";
ostr << " else";
ostr << " {";
ostr << " alert(\"This browser does not support WebSockets.\");";
ostr << " }";
ostr << "}";
ostr << "</script>";
ostr << "</head>";
ostr << "<body>";
ostr << " <h1>WebSocket Server</h1>";
ostr << " <p>Run WebSocket Script</p>";
ostr << "</body>";
ostr << "</html>";
}
};
Have a look at the WebSocket example.
In the example, the RequestHandlerFactory checks whether the browser sent a request to upgrade to WebSocket. If it finds such a request, the example returns a WebSocketRequestHandler, which then reads the WebSocket stream using Poco::Net::WebSocket.
You probably need another HTTPRequestHandler class that is responsible for handling WebSockets.
class WebSocketRequestHandler: public HTTPRequestHandler
/// Handle a WebSocket connection.
{
public:
void handleRequest(HTTPServerRequest& request, HTTPServerResponse& response)
{
Application& app = Application::instance();
try
{
WebSocket ws(request, response);
app.logger().information("WebSocket connection established.");
char buffer[1024];
int flags;
int n;
do
{
n = ws.receiveFrame(buffer, sizeof(buffer), flags);
app.logger().information(Poco::format("Frame received (length=%d, flags=0x%x).", n, unsigned(flags)));
ws.sendFrame(buffer, n, flags);
}
while (n > 0 || (flags & WebSocket::FRAME_OP_BITMASK) != WebSocket::FRAME_OP_CLOSE);
app.logger().information("WebSocket connection closed.");
}
catch (WebSocketException& exc)
{
app.logger().log(exc);
switch (exc.code())
{
case WebSocket::WS_ERR_HANDSHAKE_UNSUPPORTED_VERSION:
response.set("Sec-WebSocket-Version", WebSocket::WEBSOCKET_VERSION);
// fallthrough
case WebSocket::WS_ERR_NO_HANDSHAKE:
case WebSocket::WS_ERR_HANDSHAKE_NO_VERSION:
case WebSocket::WS_ERR_HANDSHAKE_NO_KEY:
response.setStatusAndReason(HTTPResponse::HTTP_BAD_REQUEST);
response.setContentLength(0);
response.send();
break;
}
}
}
};
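The decision the factory makes, as described above, can be illustrated without POCO. The sketch below is only a stand-in: Headers, HandlerKind and chooseHandler are hypothetical names, and a real POCO factory would instead derive from HTTPRequestHandlerFactory and return handler objects. It shows how a server tells an ordinary page request from a WebSocket request: the browser sends an "Upgrade: websocket" header when a script opens a WebSocket.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical stand-in for the headers of an incoming HTTP request.
typedef std::map<std::string, std::string> Headers;

// Hypothetical tag for the two handlers used in the sample.
enum HandlerKind { PAGE_HANDLER, WEBSOCKET_HANDLER };

// Lower-cases a copy of s so header names and values compare case-insensitively.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}

// Picks the WebSocket handler only when the request asks to upgrade to WebSocket.
HandlerKind chooseHandler(const Headers& headers) {
    for (Headers::const_iterator it = headers.begin(); it != headers.end(); ++it) {
        if (toLower(it->first) == "upgrade" && toLower(it->second) == "websocket")
            return WEBSOCKET_HANDLER;
    }
    return PAGE_HANDLER;
}
```

On the browser side, the upgrade request is triggered by creating the socket object, as the generated page in the question does with new WebSocket("ws://.../ws"). The server does not need to know which page issued it.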
I am writing an HTTP web server. When I send a text file with the same content as the HTML file, the browser shows it correctly, but when I send the HTML file itself, the browser shows the HTML page for a second and then the "the connection was reset" error appears.
I have noticed that the text file is bigger than the HTML file, but I have no idea why:
text size = 286 bytes
HTML size = 142 bytes
And this is the HTML code:
<!DOCTYPE html>
<html>
<body>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
<p>This is a paragraph.</p>
</body>
</html>
this is my code:
char sendBuffer[500];
FILE *sendFile = fopen("foo.html", "r");
fseek(sendFile, 0L, SEEK_END);
int sz = ftell(sendFile);
fseek(sendFile, 0L, SEEK_SET);
string s1;
s1="HTTP/1.1 200 OK\nContent-length: " + to_string(sz) + "\n";
std::vector<char> writable(s1.begin(), s1.end());
writable.push_back('\0');
strcpy(sendBuffer,(const char *)&writable[0]);
int c=send(connected,(const char*)&sendBuffer,strlen(&writable[0]),0);
printf("\nSent : %s\n",sendBuffer);
strcpy(sendBuffer,"Content-Type: text/html\n\n");
c=send(connected,(const char*)&sendBuffer,strlen("Content-Type: text/html\n\n"),0);
printf("\nSent : %s\n",sendBuffer);
char send_buffer[300];
while( !feof(sendFile) )
{
int numread = fread(send_buffer, sizeof(unsigned char), 300, sendFile);
if( numread < 1 ) break; // EOF or error
char *send_buffer_ptr = send_buffer;
do {
int numsent = send(connected, send_buffer_ptr, numread, 0);
if( numsent < 1 ) // 0 if disconnected, otherwise error
{
if( numsent < 0 ) {
if( WSAGetLastError() == WSAEWOULDBLOCK )
{
fd_set wfd;
FD_ZERO(&wfd);
FD_SET(connected, &wfd);
timeval tm;
tm.tv_sec = 10;
tm.tv_usec = 0;
if( select(0, NULL, &wfd, NULL, &tm) > 0 )
continue;
}
}
break; // timeout or error
}
send_buffer_ptr += numsent;
numread -= numsent;
}
while( numread > 0 );
}
Here is the other part of code that is used just before the code above:
int sock, connected, bytes_recieved , _true = 1 , portNumber;
char send_data [1024] , recv_data[1024];
struct sockaddr_in server_addr,client_addr;
int sin_size;
time_t t = time(NULL);
struct tm tm = *localtime(&t);
char date[50];
if ((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
{
perror("Unable to create the Socket");
exit(1);
}
if (setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(const char*)&_true,sizeof(int)) == -1) {
perror("Unable to Setsockopt");
exit(1);
}
char *server_address="127.1.1.1";
portNumber=8080;
server_addr.sin_family = AF_INET;
server_addr.sin_port = htons(portNumber);
server_addr.sin_addr.s_addr = inet_addr("127.1.1.1");//inet_pton(AF_INET,"127.0.0.1",&server_addr.sin_addr);//INADDR_ANY;
string host=server_address+':'+to_string(portNumber);
memset(&(server_addr.sin_zero),0,8);//sockaddr_in zero padding is needed
if (bind(sock, (struct sockaddr *)&server_addr, sizeof(struct sockaddr))==-1) //bind the socket to a local address
{
perror("Unable to bind");
exit(1);
}
if (listen(sock, 5) == -1) //listen to the socket with the specified waiting queue size
{
perror(" Listen");
exit(1);
}
cout << "MyHTTPServer waiting on port 8080" << endl;
fflush(stdout);
sin_size = sizeof(struct sockaddr_in);
connected = accept(sock, (struct sockaddr *)&client_addr,&sin_size);
cout<< "I got a connection from (" << inet_ntoa(client_addr.sin_addr) << "," << ntohs(client_addr.sin_port) << ')' << endl;
You have two important problems that I can see.
You are passing the send parameters incorrectly. This line (very important)
int c=send(connected,(const char*)&sendBuffer,strlen(&writable[0]),0);
should be
int c=send(connected,(const char*) sendBuffer,strlen(&writable[0]),0);
/* ^
* No ampersand
*/
since the sendBuffer array already decays to a pointer to its first element, so the address-of operator is not needed.
You are also passing the first parameter of select incorrectly. From the manual:
nfds is the highest-numbered file descriptor in any of the three sets, plus 1
so in your case it should be
if (select(connected + 1, NULL, &wfd, NULL, &tm) > 0)
Also, you are calling select after send; you must call it before, to check whether it is possible to write to the file descriptor.
Your code is a bit more complicated than the task requires, so I propose the following solution, with the problems mentioned above fixed and a few other things improved:
string text;
stringstream stream;
FILE *sendFile = fopen("foo.html", "r");
if (sendFile == NULL) /* check if the file was opened */
return;
fseek(sendFile, 0L, SEEK_END);
/* you can use a stringstream, it's cleaner */
stream << "HTTP/1.1 200 OK\nContent-length: " << ftell(sendFile) << "\n";
fseek(sendFile, 0L, SEEK_SET);
text = stream.str();
/* you don't need a vector and strcpy to a char array, just call the .c_str() member
* of the string class and the .length() member for its length
*/
send(connected, text.c_str(), text.length(), 0);
std::cout << "Sent : " << text << std::endl;
text = "Content-Type: text/html\n\n";
send(connected, text.c_str(), text.length(), 0);
std::cout << "Sent : " << text << std::endl;
while (feof(sendFile) == 0)
{
int numread;
char sendBuffer[500];
numread = fread(sendBuffer, sizeof(unsigned char), sizeof(sendBuffer), sendFile);
if (numread > 0)
{
char *sendBuffer_ptr;
sendBuffer_ptr = sendBuffer;
do {
fd_set wfd;
timeval tm;
FD_ZERO(&wfd);
FD_SET(connected, &wfd);
tm.tv_sec = 10;
tm.tv_usec = 0;
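/* first call select, and if the descriptor is writeable, call send */
if (select(1 + connected, NULL, &wfd, NULL, &tm) > 0)
{
int numsent;
numsent = send(connected, sendBuffer_ptr, numread, 0);
if (numsent <= 0)
{
fclose(sendFile); /* send failed or the peer disconnected */
return;
}
sendBuffer_ptr += numsent;
numread -= numsent;
}
else
{
/* timeout or select error: stop instead of looping forever */
fclose(sendFile);
return;
}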
} while (numread > 0);
}
}
/* don't forget to close the file. */
fclose(sendFile);
A halfway answer. First off, even if “using \n works”, it breaches the standard. One should use CRLF. Use CRLF. Period.
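As a small illustration of the CRLF point, here is a sketch of a header builder in which every header line ends in "\r\n" and an empty line closes the header. The function name buildHeader is hypothetical, and the set of headers is just the minimum used in the question.

```cpp
#include <string>

// Hypothetical helper: builds a minimal response header for an HTML body
// of bodyLength bytes, terminating every line with CRLF.
std::string buildHeader(unsigned long bodyLength) {
    std::string h;
    h += "HTTP/1.1 200 OK\r\n";
    h += "Content-Length: " + std::to_string(bodyLength) + "\r\n";
    h += "Content-Type: text/html\r\n";
    h += "\r\n";  // the empty line marks the end of the header
    return h;
}
```

The result can be passed to send in one call using its c_str() and length() members, followed by the body.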
As for the rest of the code: I doubt this is going to change much, but I would restructure it a bit. There is too much going on in the send function.
I have separated the sending of data into its own function. You could also consider separating the header sending into its own function, if you find a good way to structure it. When you expand to sending text, HTML, and so on, you definitely should separate the header into its own function. Doing it at an early stage would be helpful.
This is only meant as a crude start.
int send_data(int soc, const char *buf, size_t len)
{
    while (len > 0) {
        /* Use iharob code or similar here */
        ssize_t sent = send(soc, buf, len, 0);
        if (sent <= 0)
            return 2; /* Return something <> 0 on error or disconnect. */
        buf += sent;
        len -= sent;
    }
    return 0;
}
int send_file(int soc, const char *fn)
{
char buf[500];
FILE *fh;
long sz;
size_t len;
int err = 0;
if (!(fh = fopen(fn, "r"))) {
perror("fopen");
return 1;
}
fseek(fh, 0L, SEEK_END);
sz = ftell(fh);
fseek(fh, 0L, SEEK_SET);
/* Consider adding Date + Server here. */
int hlen = snprintf(buf, sizeof(buf),
"HTTP/1.1 200 OK\r\n"
"Content-length: %ld\r\n"
"Content-Type: text/html\r\n"
"Server: FooBar/0.0.1\r\n"
"\r\n", sz
);
/* snprintf returns an int; check it before storing it in the unsigned len. */
if (hlen < 0 || (size_t)hlen >= sizeof(buf)) {
err = 3;
fprintf(stderr, "Error writing header.\n");
goto fine;
}
len = (size_t)hlen;
/* Debug print. */
fprintf(stderr, "Header[%d]:\n'%s'\n", hlen, buf);
if ((err = send_data(soc, buf, len)) != 0) {
fprintf(stderr, "Error sending header.\n");
goto fine;
}
while (!feof(fh)) {
len = fread(buf, sizeof(char), 500, fh);
if (len < 1)
break;
if ((err = send_data(soc, buf, len))) {
fprintf(stderr, "Error sending file.\n");
goto fine;
}
}
if ((err = ferror(fh))) {
fprintf(stderr, "Error reading file.\n");
perror("fread");
}
fine:
fclose(fh);
return err;
}
Is there an alternative version of std::find_if that returns an iterator over all found elements, instead of just the first one?
Example:
bool IsOdd (int i) {
return ((i % 2) == 1);
}
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
std::vector<int>::iterator it = find_if(v.begin(), v.end(), IsOdd);
for(; it != v.end(); ++it) {
std::cout << "odd: " << *it << std::endl;
}
You can just use a for loop:
for (std::vector<int>::iterator it = std::find_if(v.begin(), v.end(), IsOdd);
it != v.end();
it = std::find_if(++it, v.end(), IsOdd))
{
// ...
}
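A complete version of that loop, as a sketch: it collects the matches into a second vector so the result can be inspected. The function name collectOdd is hypothetical, and IsOdd is written here with != 0 so that negative odd numbers also match, which differs slightly from the predicate in the question.

```cpp
#include <algorithm>
#include <vector>

// Predicate: true for odd values, including negative ones.
bool IsOdd(int i) { return (i % 2) != 0; }

// Collects every element matching IsOdd by restarting find_if
// just past each hit until the end of the vector is reached.
std::vector<int> collectOdd(const std::vector<int>& v) {
    std::vector<int> found;
    for (std::vector<int>::const_iterator it = std::find_if(v.begin(), v.end(), IsOdd);
         it != v.end();
         it = std::find_if(++it, v.end(), IsOdd)) {
        found.push_back(*it);
    }
    return found;
}
```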
Alternatively, you can put your condition and action into a functor (performing the action only if the condition is true) and just use std::for_each.
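A minimal sketch of the functor approach, assuming the action is "append the value to a result vector". CollectIfOdd and oddsViaForEach are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical functor: the condition (oddness) and the action (recording
// the value) live together, so std::for_each can simply visit every element.
struct CollectIfOdd {
    std::vector<int>* out;
    explicit CollectIfOdd(std::vector<int>* o) : out(o) {}
    void operator()(int i) const {
        if (i % 2 != 0) out->push_back(i);  // act only when the condition holds
    }
};

// Returns all odd elements of v, in their original order.
std::vector<int> oddsViaForEach(const std::vector<int>& v) {
    std::vector<int> found;
    std::for_each(v.begin(), v.end(), CollectIfOdd(&found));
    return found;
}
```

The functor holds a pointer to the output vector because std::for_each takes its function object by value.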
There isn't one in the STL, but Boost offers this functionality:
boost::algorithm::find_all
Always try to come up with plain STL usage first; you can go for Boost as well. Here is a simplified form of the answer by Charles mentioned above.
vec_loc = find_if(v3.begin(), v3.end(), isOdd);
if (vec_loc != v3.end())
{
cout << "odd elem. found at " << (vec_loc - v3.begin()) << " and elem found is " << *vec_loc << endl;
++vec_loc;
}
for (;vec_loc != v3.end();vec_loc++)
{
vec_loc = find_if(vec_loc, v3.end(), isOdd);
if (vec_loc == v3.end())
break;
cout << "odd elem. found at " << (vec_loc - v3.begin()) << " and elem found is " << *vec_loc << endl;
}