How to detect "​" (a combination of Unicode characters) in a C++ string

I am trying to detect certain combinations of Unicode characters (like ​) in order to clean up a string. A single Unicode character is detected, but a combination of them is not.
I use these strings to build an HTML page from another HTML page that needs to be cleaned up. I only want to clean strings that consist entirely of this kind of Unicode character, which is not even visible when the HTML page is shown in a browser.
Below is the sample code:
void detect_Unicode(string& str) {
    if (!str.empty() && str.find_first_not_of(" \t\n\r\f\v\u00A0\u00C2\u00E2\u20AC\u2039") == string::npos)
        str.assign(" ");
}
Input string:
1. " ​ ​ " ;
2. "are   there is something    ​ combination ​"
3. " Â Â "
4. "​   ​"
5 . "Â Â â â"
Expected Output:
1. " "
2. "are   there is something    ​ combination ​"
3. " "
4. " "
5. " "
Please let me know other ways too.

OK, following on from the comments above, I think it's highly likely that the input string is in UTF-8 (after all, in an HTML context, what else would it be?).
On that basis, I humbly submit this:
#include <string>
#include <codecvt>
#include <locale>

std::string narrow (const std::wstring& ws)
{
    std::wstring_convert <std::codecvt_utf8 <wchar_t>, wchar_t> convert;
    return convert.to_bytes (ws);
}

std::wstring widen (const std::string& s)
{
    std::wstring_convert <std::codecvt_utf8 <wchar_t>, wchar_t> convert;
    return convert.from_bytes (s);
}

std::string detect_Unicode (const std::string& s)
{
    std::wstring ws = widen (s);
    if (ws.empty() || ws.find_first_not_of (L" \t\n\r\f\v\u00A0\u00C2\u00E2\u20AC\u2039") != std::wstring::npos)
        return " ";
    return s;
}
#include <iostream>
int main ()
{
std::cout << narrow (L"\u00A0 \u00C2 \u00E2 \u20AC \u2039\n\n");
std::cout << "0.\t\"" << detect_Unicode (u8"abcde") << "\"\n";
std::cout << "1.\t\"" << detect_Unicode (u8" ​ ​ ") << "\"\n";
std::cout << "2.\t\"" << detect_Unicode (u8"are   there is something    ​ combination ​") << "\"\n";
std::cout << "3.\t\"" << detect_Unicode (u8" Â Â ") << "\"\n";
std::cout << "4.\t\"" << detect_Unicode (u8"​   ​") << "\"\n";
std::cout << "5.\t\"" << detect_Unicode (u8"Â Â â â") << "\"\n";
}
Output:
 ⠀ ‹
0. " "
1. " ​ ​ "
2. " "
3. " Â Â "
4. "​   ​"
5. "Â Â â â"
Now this is not the output the OP expects, but I think that's simply because the logic (as opposed to the implementation) of detect_Unicode() looks flawed. The point here is that converting the input string to a wide string means that you can use standard basic_string operations on it reliably, because there are no multibyte issues now.
An alternative, slightly radical, implementation of detect_Unicode() might be:
for (auto wide_char : ws)
{
    if (wide_char > 0xff)
        return " ";
}
return s;
But really, now you have a wide string to hand in detect_Unicode, anything is possible, so go wild OP.
Other notes:
std::wstring_convert and std::codecvt_utf8 are deprecated in C++17 (std::codecvt itself is not), but since the standard library offers no other obvious choice you might as well run with them. You can always change the implementations of narrow and widen if it comes to it.
Depending on platform, std::wstring might not be the best choice (on Windows, for example, wchar_t is only 16 bits wide), but it's probably fine. You could also look at std::u16string and std::u32string.
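As one of the "other ways" asked for, here is a minimal sketch that avoids the deprecated converters by decoding UTF-8 by hand. The function names (next_code_point, is_invisible, collapse_if_invisible) and the exact set of code points treated as invisible are assumptions for illustration, not part of the question. It also assumes the input is valid UTF-8 holding real no-break or zero-width spaces; if the data is actually mojibake (visible Â, â, €, ‹ characters), those code points would have to be added to the set.

```cpp
#include <cstddef>
#include <string>

// Decodes the UTF-8 sequence starting at s[i] and advances i past it.
// Returns U+FFFD for a malformed or truncated sequence.
char32_t next_code_point(const std::string& s, std::size_t& i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i++]);
    int extra = 0;
    char32_t cp = 0;
    if (lead < 0x80)              return lead;
    else if ((lead >> 5) == 0x06) { extra = 1; cp = lead & 0x1F; }
    else if ((lead >> 4) == 0x0E) { extra = 2; cp = lead & 0x0F; }
    else if ((lead >> 3) == 0x1E) { extra = 3; cp = lead & 0x07; }
    else                          return 0xFFFD;

    while (extra-- > 0) {
        if (i >= s.size()) return 0xFFFD;
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (c & 0x3F);
        ++i;
    }
    return cp;
}

// Assumed set of "invisible" code points; extend as needed.
bool is_invisible(char32_t cp)
{
    switch (cp) {
    case ' ': case '\t': case '\n': case '\r': case '\f': case '\v':
    case 0x00A0: // no-break space
    case 0x200B: // zero width space
    case 0x200C: // zero width non-joiner
    case 0x200D: // zero width joiner
    case 0xFEFF: // zero width no-break space (BOM)
        return true;
    default:
        return false;
    }
}

// Returns " " if s is non-empty and made up only of invisible code points,
// otherwise returns s unchanged.
std::string collapse_if_invisible(const std::string& s)
{
    if (s.empty())
        return s;
    for (std::size_t i = 0; i < s.size(); ) {
        if (!is_invisible(next_code_point(s, i)))
            return s;
    }
    return " ";
}
```

Calling collapse_if_invisible on a string that mixes ordinary text with zero-width spaces leaves it untouched, which matches case 2 of the expected output in the question.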

What is wrong with print (2 & 2) >> 1?

I am wondering what happens in this piece of code.
Why is the result incorrect only when the expression is printed directly, and why is the newline ignored?
user#host_09:22 AM: perl
print 2 >> 1, "\n";
print 2 & 2, "\n";
print (2 & 2) >> 1, "\n";
1
2
2user#host_09:22 AM: perl
$a = (2 & 2) >> 1;
print "$a\n";
1
Perl interprets the parentheses as enclosing the arguments of the print function call, as you can verify with
perl -MO=Deparse,-p -e 'print (2 & 2) >> 1'
Output:
(print(2) >> 1);
The canonical way is to precede the left parenthesis with a +:
print +(2 & 2) >> 1
When you run it with warnings enabled, it becomes clear(er):
perl -we'print (2 & 2), "\n"'
says
print (...) interpreted as function at -e line 1.
Useless use of a constant ("\n") in void context at -e line 1.
It works out print (2&2) as a function call to print† and duly prints 2 (no newline!), and then it keeps evaluating the comma operator, with "\n" in void context next, which it also warns us about.
With >> 1 also there, the return value 1 of print (2&2) (for success) is bit-shifted to 0, which disappears into the void, and we get another "Useless use of ... in void context."
One fix is to add a + since what follows it must be an expression
perl -we'print +(2 & 2) >> 1, "\n"'
Or, make a proper call to print, with parentheses around the whole thing
perl -we'print((2 & 2) >> 1, "\n")'
Both print a line with 1.
This is mentioned in print, and more fully documented in Terms and List operators and in Symbolic Unary operators, both in perlop. For another, related, example see this post.
† It also warns about it as it is likely an error -- with a space before parens; no space, no warning.

Print Universal Characters (html) stored in a string c++

I am trying to print a string with universal character names stored in it. If I initialize the string as follows:
string test = "\u000D\u000A\u000D\u000Aclass Solution {\u000D\u000Apublic:\u000D\u000A
cout << test << endl;
it would print out the output I want:
class Solution {
but if I get the same string from Curl result, like following:
curl_easy_setopt(curl, CURLOPT_WRITEDATA, addressof(test));
res = curl_easy_perform(curl);
cout << test << endl;
it would print out:
\u000D\u000A\u000D\u000Aclass Solution {\u000D\u000Apublic:\u000D\u000A
I tried converting test to a const char * with c_str and printing that, but it still does not work. I searched for a few hours but unfortunately could not find the answer.
I am using a Mac and compile with
clang++ -std=c++11 -Wall -Wextra -lcurl
Can anyone help me with this?
Thanks a lot!
In case anyone else faces the same issue: I got this done by using the ICU library with its UnicodeString converter.
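The difference between the two cases is that the compiler translates \u000D inside a string literal at compile time, whereas the data received at run time contains the six characters backslash, u, 0, 0, 0, D literally. As an alternative to ICU, here is a minimal hand-written sketch of a decoder. The function names are assumptions for illustration; it handles only four-digit \uXXXX escapes, does not combine surrogate pairs, and assumes the server sends the escapes literally (as JSON responses commonly do).

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of cp (assumed <= 0xFFFF) to out.
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Reads exactly four hex digits starting at text[pos].
// Returns false (leaving value unspecified) if they are not all present.
bool parse_hex4(const std::string& text, std::size_t pos, unsigned long& value)
{
    if (pos + 4 > text.size())
        return false;
    value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        const char c = text[i];
        unsigned digit = 0;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        value = value * 16 + digit;
    }
    return true;
}

// Replaces every literal \uXXXX sequence with the UTF-8 bytes of that
// code point; everything else is copied through unchanged.
std::string decode_unicode_escapes(const std::string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ) {
        unsigned long cp = 0;
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == 'u'
            && parse_hex4(text, i + 2, cp)) {
            append_utf8(out, cp);
            i += 6; // backslash, 'u', and four hex digits
        } else {
            out += text[i++];
        }
    }
    return out;
}
```

With this in place, printing decode_unicode_escapes(test) instead of test would turn the received \u000D\u000A sequences into real carriage return and line feed characters.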

MySQL connection through C++ in Ubuntu

I have installed the necessary packages:
sudo apt-get install mysql-server
sudo apt-get install libmysqlcppconn-dev
here is my code:
/* Standard C++ includes */
#include <stdlib.h>
#include <iostream>
/*
Include directly the different
headers from cppconn/ and mysql_driver.h + mysql_util.h
(and mysql_connection.h). This will reduce your build time!
*/
#include <mysql_connection.h>
#include <cppconn/driver.h>
#include <cppconn/exception.h>
#include <cppconn/resultset.h>
#include <cppconn/statement.h>
using namespace std;
int main(void)
{
cout << endl;
cout << "Running 'SELECT 'Hello World!' AS _message'..." << endl;
try {
sql::Driver *driver;
sql::Connection *con;
sql::Statement *stmt;
sql::ResultSet *res;
/* Create a connection */
driver = get_driver_instance();
con = driver->connect("tcp://127.0.0.1:11840", "root", "n");
/* Connect to the MySQL test database */
con->setSchema("test");
stmt = con->createStatement();
res = stmt->executeQuery("SELECT 'Hello World!' AS _message"); // replace with your statement
while (res->next()) {
cout << "\t... MySQL replies: ";
/* Access column data by alias or column name */
cout << res->getString("_message") << endl;
cout << "\t... MySQL says it again: ";
/* Access column data by numeric offset, 1 is the first column */
cout << res->getString(1) << endl;
}
delete res;
delete stmt;
delete con;
} catch (sql::SQLException &e) {
cout << "# ERR: SQLException in " << __FILE__;
//cout << "(" << __FUNCTION__ << ") on line "
// << __LINE__ << endl;
cout << "# ERR: " << e.what();
cout << " (MySQL error code: " << e.getErrorCode();
cout << ", SQLState: " << e.getSQLState() << " )" << endl;
}
cout << endl;
return EXIT_SUCCESS;
}
I compiled using:
sudo g++ -Wall -I/usr/include/cppconn -o testapp mysql_connect1.cpp -L/usr/lib -lmysqlcppconn
It compiled successfully, but when I run:
./testapp
Running 'SELECT 'Hello World!' AS _message'...
# ERR: SQLException in mysql_connect1.cpp# ERR:
Can't connect to MySQL server on '127.0.0.1' (111)
(MySQL error code: 2003, SQLState: HY000 )
I get the error above.
I made the following changes, as suggested here:
run the command vim /etc/mysql/my.cnf
comment out bind-address = 127.0.0.1 using the # symbol
restart the MySQL server.
But it still didn't work.
Output of netstat:
root#knils-HP:/home/knils# sudo netstat -tap | grep mysql
tcp 0 0 *:mysql *:* LISTEN 11840/mysqld
root#knils-HP:/home/knils#
Please can someone help me with this?
Are you sure your server runs on 11840 and the user account you are using is allowed to connect from localhost?
con = driver->connect("tcp://127.0.0.1:11840", "root", "n");
The 11840 in your netstat output looks like a process ID rather than a port: with -p, the last column is PID/program name.
MySQL listens on port 3306 by default, unless you changed that yourself.
Can you try:
con = driver->connect("tcp://127.0.0.1:3306", "root", "n");
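The (111) in the error message is the Linux errno for "connection refused", meaning nothing is listening on the address and port that was tried. As a quick way to check a port independently of the MySQL connector, here is a minimal sketch using POSIX sockets. The function name can_connect_tcp is an assumption for illustration, and the sketch only handles IPv4 addresses.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Returns true if a TCP connection to ipv4_address:port can be opened.
// Any failure (bad address, refused connection, no socket) yields false.
bool can_connect_tcp(const char* ipv4_address, std::uint16_t port)
{
    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4_address, &addr.sin_addr) != 1) {
        close(fd);
        return false;
    }

    const bool ok =
        connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) == 0;
    close(fd);
    return ok;
}
```

Calling can_connect_tcp("127.0.0.1", 3306) and can_connect_tcp("127.0.0.1", 11840) before driver->connect would show which of the two ports actually accepts connections.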

RegEx match for multiline in Perl

I have the following data line that I need to parse in Perl:
my $string='Upper Left ( 440720.000, 3751320.000) (117d38\'28.21"W, 33d54\'8.47"N)';
Here is my perl script:
if ($string=~ m/Upper Left\s+[(]\s+\d{1,6}[.]\d{1,3}[\,]\s+\d{1,6}[.]\d{1,3}[)]\s+[(](\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})/ig) {
$upperLeft="lat=". $1. 'd'. $2. "'". $3. ".". $4. '"W long='. $5. 'd'. $6. "'". $7. ".". $8. '"W';
print $upperLeft. "\n";
}
However, this expression fails to extract 117d38'28.21" as lat and 33d54'8.47" as long. Note the spaces and '(' in the input $string, which I took into account when writing this regular expression.
What am I doing wrong in extracting (117d38'28.21"W, 33d54'8.47"N) into 8 fields? Any help is appreciated.
You had several issues. The main one is that your regex only parses up to lat, not lon.
What changed:
m/Upper Left\s+[(]\s+\d{1,6}[.]\d{1,3}[\,]\s+\d{1,6}[.]\d{1,3}[)]\s+[(](\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})/ig
m/Upper Left\s+[(]\s+\d{1,6}[.]\d{1,3}[\,]\s+\d{1,7}[.]\d{1,3}[)]\s+[(](\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})"([WE])[\,]\s(\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})"([NS])/ig
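In the second pattern, the second coordinate allows up to seven digits (\d{1,7}) because your test number has seven digits before the decimal point. At the end of the pattern: (1) added a group to deal with W/E (([WE])). (2) Added groups to extract the lon number. (3) Added a group to deal with N/S (([NS])).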
Your code, corrected:
if ($string=~ m/Upper Left\s+[(]\s+\d{1,6}[.]\d{1,3}[\,]\s+\d{1,7}[.]\d{1,3}[)]\s+[(](\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})"([WE])[\,]\s(\d{1,3})d(\d{1,2})['](\d{1,2})[.](\d{1,2})"([NS])/ig) {
$upperLeft = "lat=" . $1 . 'd' . $2 . "'" . $3 . "." . $4 . '"' . $5 . " long=" . $6 . 'd' . $7 . "'" . $8 . "." . $9 . '"' . $10;
print $upperLeft. "\n";
}
Output:
lat=117d38'28.21"W long=33d54'8.47"N
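For comparison, the same extraction can be sketched in C++ with std::regex. This is an illustration under assumptions, not a translation of the Perl above: the function name extract_dms_fields is made up, and the pattern is a simplified variant that accepts any run of digits and dots for the two projected coordinates and is lenient about whitespace.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the ten captured fields (degrees, minutes, seconds, fraction,
// hemisphere for each of the two angles), or an empty vector if the line
// does not match.
std::vector<std::string> extract_dms_fields(const std::string& line)
{
    static const std::regex pattern(
        R"re(Upper Left\s*\(\s*[0-9.]+\s*,\s*[0-9.]+\s*\)\s*\(\s*)re"
        R"re((\d{1,3})d(\d{1,2})'(\d{1,2})\.(\d{1,2})"([WE])\s*,\s*)re"
        R"re((\d{1,3})d(\d{1,2})'(\d{1,2})\.(\d{1,2})"([NS])\s*\))re",
        std::regex::icase);

    std::vector<std::string> fields;
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        // match[0] is the whole match; the groups start at index 1.
        for (std::size_t i = 1; i < match.size(); ++i)
            fields.push_back(match[i].str());
    }
    return fields;
}
```

For the sample line from the question, fields 0 to 4 hold 117, 38, 28, 21 and W, and fields 5 to 9 hold 33, 54, 8, 47 and N.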

Importing XPM graphics into an HTML5 canvas

Is this possible?
I am trying to port an old professor's demo game into a web-playable format for fun, and he had set up all the graphics in the XPM format.
Is there some way to load XPM files directly into an HTML5 canvas? I could probably get by with loading them into an image editor and converting...but I'd rather stay as true to the original source as possible.
You could probably write some sort of parser for XPM in JavaScript and render canvas pixels using a similar approach to this question. However, I think it'd be more efficient just to use something like ImageMagick and do a one-off conversion:
mogrify -format png *.xpm
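To give an idea of what such a parser involves, here is a minimal sketch of the parsing logic, written in C++ (XPM data is natively a C string array); the same steps would carry over to JavaScript. The names Image, parse_color and parse_xpm are assumptions for illustration. The sketch assumes each colour line has a single "c" entry whose value is either #rrggbb or none, and it stores each pixel as a packed 0xRRGGBBAA value.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Image {
    int width = 0;
    int height = 0;
    std::vector<std::uint32_t> rgba; // one 0xRRGGBBAA value per pixel, row by row
};

// "#rrggbb" becomes opaque 0xRRGGBBFF; "none" and anything else becomes
// fully transparent (0). Non-hex digits after '#' make std::stoul throw.
std::uint32_t parse_color(const std::string& text)
{
    if (text.size() == 7 && text[0] == '#')
        return (static_cast<std::uint32_t>(
                    std::stoul(text.substr(1), nullptr, 16)) << 8) | 0xFFu;
    return 0;
}

Image parse_xpm(const std::vector<std::string>& lines)
{
    if (lines.empty())
        throw std::runtime_error("empty XPM data");

    // Header: "<width> <height> <colors> <chars per pixel>"
    Image img;
    int colors = 0;
    int cpp = 0;
    std::istringstream header(lines[0]);
    if (!(header >> img.width >> img.height >> colors >> cpp)
        || img.width <= 0 || img.height <= 0 || colors <= 0 || cpp <= 0)
        throw std::runtime_error("bad XPM header");
    if (lines.size() < static_cast<std::size_t>(1 + colors + img.height))
        throw std::runtime_error("XPM data is truncated");

    // Colour table: the first cpp characters are the key, then "c <color>".
    const std::size_t key_len = static_cast<std::size_t>(cpp);
    std::map<std::string, std::uint32_t> palette;
    for (int i = 1; i <= colors; ++i) {
        const std::string& line = lines[i];
        if (line.size() < key_len)
            throw std::runtime_error("bad XPM colour line");
        std::istringstream rest(line.substr(key_len));
        std::string kind, value;
        rest >> kind >> value;
        palette[line.substr(0, key_len)] = parse_color(value);
    }

    // Pixel rows: look up each key; unknown keys become transparent.
    for (int y = 0; y < img.height; ++y) {
        const std::string& row = lines[1 + colors + y];
        if (row.size() < key_len * static_cast<std::size_t>(img.width))
            throw std::runtime_error("XPM pixel row is too short");
        for (int x = 0; x < img.width; ++x) {
            auto it = palette.find(row.substr(x * key_len, key_len));
            img.rgba.push_back(it == palette.end() ? 0u : it->second);
        }
    }
    return img;
}
```

The resulting pixel values could then be written to whatever drawing surface is at hand; in a browser the equivalent step would be filling the canvas pixel data.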
I made a little plugin to do this, there's a lot to improve but maybe it can help you... you can see the demo here: http://cortezcristian.com.ar/xpm2canvas/
You can also play with the demo in this fiddle: http://jsfiddle.net/crisboot/aXt3G/
<script src="./js/libs/jquery-1.7.1.min.js"></script>
<script src="./js/jquery.xpm2canvas.js"></script>
<script>
var pseudoXMP = [
/* <Values> */
/* <width/cols> <height/rows> <colors> <char on pixel>*/
"40 40 6 1",
/* <Colors> */
" c none",
". c #ffffff",
"X c #dadab6",
"o c #6c91b6",
"O c #476c6c",
"+ c #000000",
/* <Pixels> */
" ",
" ",
" ",
" . .X..XX.XX X ",
" .. .....X.XXXXXX XX ",
" ... ....X..XX.XXXXX XXX ",
" .. ..........X.XXXXXXXXXXX XX ",
" .... ........X..XX.XXXXXXXXX XXXX ",
" .... ..........X.XXXXXXXXXXX XXXX ",
" ooOOO..ooooooOooOOoOOOOOOOXX+++OO++ ",
" ooOOO..ooooooooOoOOOOOOOOOXX+++OO++ ",
" ....O..ooooooOooOOoOOOOOOOXX+XXXX++ ",
" ....O..ooooooooOoOOOOOOOOOXX+XXXX++ ",
" ..OOO..ooooooOooOOoOOOOOOOXX+++XX++ ",
" ++++..ooooooooOoOOOOOOOOOXX+++ +++ ",
" +++..ooooooOooOOoOOOOOOOXX+++ + ",
" ++..ooooooooOoOOOOOOOOOXX+++ ",
" ..ooooooOooOOoOOOOOOOXX+++ ",
" ..ooooooooOoOOOOOOOOOXX+++ ",
" ..ooooooOooOOoOOOOOOOXX+++ ",
" ..ooooooooOoOOOOOOOOOXX+++ ",
" ..oooooOooOOoOOOOOOXX+++ ",
" ..oooooooOoOOOOOOOOXX+++ ",
" ..ooooOooOOoOOOOOXX+++ ",
" ..ooooooOoOOOOOOOXX++++ ",
" ..o..oooOooOOoOOOOXX+XX+++ ",
" ...o..oooooOoOOOOOXX++XXX++ ",
" ....OO..ooOooOOoOOXX+++XXXX++ ",
" ...oo..+..oooOoOOOXX++XXooXXX++ ",
" ...ooo..++..OooOOoXX+++XXooOXXX+ ",
" ..oooOOXX+++....XXXX++++XXOOoOOXX+ ",
" ..oooOOXX+++ ...XXX+++++XXOOooOXX++ ",
" ..oooOXXX+++ ..XX+++ +XXOOooOXX++ ",
" .....XXX++++ XXXXXXX++ ",
" ....XX++++ XXXXXXX+ ",
" ...XX+++ XXXXX++ ",
" ",
" ",
" ",
" "];
$(document).ready(function(){
$('#xmp2canvas').xpm2canvas({xpm:pseudoXMP});
});
</script>
IIRC, drawing an image file onto a canvas goes through an img element: you set its src attribute and draw the loaded image via the rendering context. As such, presumably XPM files only stand a chance of working if the browser in question can decode them.
The best way to check this would be to test it. The accepted answer for this question contains some code that should help:
importing image on canvas html5