In C, how can one convert HTML strings to C strings?

Is there a common routine or library available?
e.g. &#39; has to become '.

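This isn't particularly hard, assuming you only care about &#xx; style entities. Note that the routine below reads the two digits as hexadecimal; standard HTML writes decimal references as &#NN; and hexadecimal ones as &#xNN;, so adapt the digit parsing to the form your input uses. The bare-bones, let-everyone-else-worry-about-the-memory-management, mechanical, what's-a-regex way: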
int hex_to_value(char hex) {
if (hex >= '0' && hex <= '9') { return hex - '0'; }
if (hex >= 'A' && hex <= 'F') { return hex - 'A' + 10; }
if (hex >= 'a' && hex <= 'f') { return hex - 'a' + 10; }
return -1;
}
void unescape(char* dst, const char* src) {
// Write the translated version of the text at 'src', to 'dst'.
// All sequences of '&#xx;', where x is a hex digit, are replaced
// with the corresponding single byte.
enum mode { NONE, AND, AND_HASH, AND_HASH_EX, AND_HASH_EX_EX };
char first_hex, second_hex;
int translated; // int, so the -1 failure value survives even where char is unsigned
enum mode m = NONE;
while (*src) {
char c = *src++;
switch (m) {
case NONE:
if (c == '&') { m = AND; }
else { *dst++ = c; m = NONE; }
break;
case AND:
if (c == '#') { m = AND_HASH; }
else { *dst++ = '&'; *dst++ = c; m = NONE; }
break;
case AND_HASH:
translated = hex_to_value(c);
if (translated != -1) { first_hex = c; m = AND_HASH_EX; }
else { *dst++ = '&'; *dst++ = '#'; *dst++ = c; m = NONE; }
break;
case AND_HASH_EX:
translated = hex_to_value(c);
if (translated != -1) {
second_hex = c;
translated = hex_to_value(first_hex) << 4 | translated;
m = AND_HASH_EX_EX;
} else {
*dst++ = '&'; *dst++ = '#'; *dst++ = first_hex; *dst++ = c;
m = NONE;
}
break;
case AND_HASH_EX_EX:
if (c == ';') { *dst++ = translated; }
else {
*dst++ = '&'; *dst++ = '#';
*dst++ = first_hex; *dst++ = second_hex; *dst++ = c;
}
m = NONE;
break;
}
}
*dst = '\0'; // terminate the output string
}
Tedious, and way more code than seems reasonable, but not hard :)
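For comparison, here is a minimal sketch that handles both the decimal (&#39;) and the hexadecimal (&#x27;) form of numeric character references. The names decode_numeric_entities and digit_value are made up for this example, and it is deliberately limited: only values in the 7-bit ASCII range are translated, named entities such as &amp; are left alone, and anything malformed is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Value of digit c in the given base (10 or 16), or -1 if c is not a digit. */
static int digit_value(char c, int base) {
    if (c >= '0' && c <= '9') return c - '0';
    if (base == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (base == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copies src to dst, replacing &#NNN; and &#xHH;
 * references whose value is in 1..127 with the corresponding byte.
 * dst needs room for strlen(src) + 1 bytes; the output never grows. */
void decode_numeric_entities(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    while (*src) {
        if (src[0] == '&' && src[1] == '#') {
            const char *p = src + 2;
            int base = 10;
            if (*p == 'x' || *p == 'X') { base = 16; p++; }
            const char *first = p;
            long value = 0;
            int d;
            /* Stop accumulating once the value leaves the ASCII range. */
            while (value < 128 && (d = digit_value(*p, base)) >= 0) {
                value = value * base + d;
                p++;
            }
            /* Accept only: at least one digit, a closing ';', value 1..127. */
            if (p != first && *p == ';' && value > 0 && value < 128) {
                *dst++ = (char)value;
                src = p + 1;
                continue;
            }
        }
        *dst++ = *src++;   /* ordinary character, or a reference we skip */
    }
    *dst = '\0';
}
```

Calling decode_numeric_entities(out, "it&#39;s") with a large enough out buffer leaves "it's" in out. Code points above 127 would need an output encoding decision (UTF-8, for instance), which this sketch leaves out.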

I'd try to parse the number out from the string and then convert it to a number using atoi and then cast it to a character.
This is something I wrote in ~20 seconds so it's completely contrived:
char html[] = "&#39;";
char* pch = &html[2];
int n = 0;
char c = 0;
pch[2] = '\0';
n = atoi(pch);
c = n;
Now c is '. Also, I don't really know about HTML strings, so I might be missing something.

There is "GNU recode" - command line program and a library.
http://recode.progiciels-bpi.ca/index.html
Among other things it can encode/decode HTML characters.

C# How to count how many anagrams are in a given string

I have to calculate how many anagrams a given word has.
I have tried using factorials, permutations, and the possibilities for each letter in the word.
This is what I have done:
static int DoAnagrams(string a, int x)
{
int anagrams = 1;
int result = 0;
x = a.Length;
for (int i = 0; i < x; i++)
{ anagrams *= (x - 1); result += anagrams; anagrams = 1; }
return result;
}
Example: for aabv I have to get 12; for aaab I have to get 4
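As already stated in a comment, there is a formula for calculating the number of distinct anagrams:
#anagrams = n! / (c_1! * c_2! * ... * c_k!)
where n is the length of the word, k is the number of distinct characters, and c_i is the number of times the i-th distinct character occurs. For aabv this gives 4! / (2! * 1! * 1!) = 12, and for aaab it gives 4! / (3! * 1!) = 4.
So first of all, you will need to calculate the factorial (note that an int result overflows for words longer than 12 characters):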
int fac(int n) {
int f = 1;
for (int i = 2; i <=n; i++) f*=i;
return f;
}
and you will also need to count the characters in the word
Dictionary<char, int> countChars(string word){
var r = new Dictionary<char, int>();
foreach (char c in word) {
if (!r.ContainsKey(c)) r[c] = 0;
r[c]++;
}
return r;
}
Then the anagram count can be calculated as follows
int anagrams(string word) {
int ac = fac(word.Length);
var cc = countChars(word);
foreach (int ct in cc.Values)
ac /= fac(ct);
return ac;
}
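The same formula can also be evaluated without ever forming a full factorial, which postpones overflow. The following is only a sketch in C (count_anagrams is a made-up name, and the word is treated as a sequence of bytes): it builds the result as a product of binomial coefficients, multiplying and dividing one factor at a time.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of distinct arrangements of the bytes in
 * word, i.e. n! / (c_1! * ... * c_k!), computed incrementally. */
unsigned long long count_anagrams(const char *word) {
    assert(word != NULL);
    unsigned counts[UCHAR_MAX + 1] = {0};
    for (const unsigned char *p = (const unsigned char *)word; *p; p++)
        counts[*p]++;

    unsigned long long result = 1;
    unsigned long long placed = 0;   /* letters accounted for so far */
    for (int c = 0; c <= UCHAR_MAX; c++) {
        /* Multiply by C(placed + counts[c], counts[c]), one step at a time. */
        for (unsigned i = 1; i <= counts[c]; i++) {
            placed++;
            result = result * placed / i;   /* this division is always exact */
        }
    }
    return result;
}
```

With this, count_anagrams("aabv") evaluates to 12 and count_anagrams("aaab") to 4, matching the examples in the question. Very long words can still overflow the 64-bit intermediate product.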
Answer with Code
This is written in C#, so it may not apply to the language you desire, but you didn't specify a language.
This works by generating every possible permutation of the string, adding every duplicate found in the list to another list, then removing those duplicates from the original list. After that, the count of the original list is the number of unique anagrams the string has.
private static List<string> anagrams = new List<string>();
static void Main(string[] args)
{
string str = "AAAB";
char[] charArry = str.ToCharArray();
Permute(charArry, 0, str.Count() - 1);
List<string> copyList = new List<string>();
for(int i = 0; i < anagrams.Count - 1; i++)
{
List<string> anagramSublist = anagrams.GetRange(i + 1, anagrams.Count - 1 - i);
var perm = anagrams.ElementAt(i);
if (anagramSublist.Contains(perm))
{
copyList.Add(perm);
}
}
foreach(var copy in copyList)
{
anagrams.Remove(copy);
}
Console.WriteLine(anagrams.Count);
Console.ReadKey();
}
static void Permute(char[] arry, int i, int n)
{
int j;
if (i == n)
{
var temp = string.Empty;
foreach(var character in arry)
{
temp += character;
}
anagrams.Add(temp);
}
else
{
for (j = i; j <= n; j++)
{
Swap(ref arry[i], ref arry[j]);
Permute(arry, i + 1, n);
Swap(ref arry[i], ref arry[j]); //backtrack
}
}
}
static void Swap(ref char a, ref char b)
{
char tmp;
tmp = a;
a = b;
b = tmp;
}
Final Notes
I know this isn't the cleanest or the best solution. It is simply the one that carries over best across the three object-oriented languages I know, without being too complex. It is simple to understand and simple to port to another language, so it's the answer I've decided to give.
EDIT
Here's a new answer based on the comments on this answer.
static void Main(string[] args)
{
var str = "abaa";
// Product of the factorials of each distinct character's count.
var denominator = 1;
List<char> countedCharacters = new List<char>();
foreach(var character in str)
{
if(!countedCharacters.Contains(character))
{
// Divide by count! per character; dividing by the factorial of the summed
// duplicate counts gives wrong results for words such as "aabb".
denominator *= factorial(str.Count(f => (f == character)));
countedCharacters.Add(character);
}
}
Console.WriteLine("The number of possible anagrams is: " + (factorial(str.Count()) / denominator));
Console.ReadLine();
int factorial(int num)
{
if(num <= 1)
return 1;
return num * factorial(num - 1);
}
}

Fuzzy matching of an OCR output in text file

I have a question regarding partial matching of two strings.
I have a string that I need to validate. To be more specific, I have the output of an OCR reading, which of course contains some mistakes. I need to check whether the string is really there, but since it can be recognized incorrectly, a 70% match is enough.
Is it possible to do that in UiPath? The string is in a Notepad (.txt) file, so any ideas would be helpful.
Try scoring each word detected by the OCR against a base word; fuzzyness is the minimum similarity, a double between 0 and 1 (0.7 for a 70% match). The code below is written in C++ style:
list<string> Search(string word, list<string> wordList, double fuzzyness) {
list<string> foundWords;
for (string s : wordList) {
int levenshteinDistance = LevenshteinDistance(word, s);
int length = max(word.length(), s.length());
double score = 1.0 - (double)levenshteinDistance / length;
if (score > fuzzyness) foundWords.push_back(s);
}
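if (foundWords.size() > 1) {
// Tighten the threshold in small steps until at most one candidate is left.
// Stepping by 1.0 and waiting for exactly one match could loop forever.
for (double d = fuzzyness + 0.05; d < 1.0 && foundWords.size() > 1; d += 0.05) {
foundWords = Search(word, wordList, d);
}
}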
return foundWords;}
int LevenshteinDistance(string src, string dest) {
std::vector<vector<int>> d;
d.resize((int)src.size() + 1, std::vector<int>((int)dest.size() + 1, 0));
int i, j, cost;
std::vector<char> str1(src.begin(), src.end());
std::vector<char> str2(dest.begin(), dest.end());
for (i = 0; i <= str1.size(); i++) d[i][0] = i;
for (j = 0; j <= str2.size(); j++) d[0][j] = j;
for (i = 1; i <= str1.size(); i++) {
for (j = 1; j <= str2.size(); j++) {
if (str1[i - 1] == str2[j - 1]) cost = 0;
else cost = 1;
d[i][j] = min(d[i - 1][j] + 1, min(d[i][j - 1] + 1, d[i - 1][j - 1] + cost));
if ((i > 1) && (j > 1) && (str1[i - 1] == str2[j - 2]) && (str1[i - 2] == str2[j - 1])) d[i][j] = min(d[i][j], d[i - 2][j - 2] + cost);
}
}
return d[str1.size()][str2.size()];}
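To make the 70% check itself concrete, here is a small sketch in C. It uses the plain Levenshtein distance (no transposition step, unlike the code above), and the names levenshtein and similarity are chosen for this example only. The score is 1 minus the distance divided by the longer length, so a candidate passes when similarity(expected, ocr_text) >= 0.7.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Plain Levenshtein distance using two rolling rows.
 * Returns -1 if memory allocation fails. */
int levenshtein(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    int *prev = malloc((lb + 1) * sizeof *prev);
    int *curr = malloc((lb + 1) * sizeof *curr);
    if (!prev || !curr) { free(prev); free(curr); return -1; }

    for (size_t j = 0; j <= lb; j++) prev[j] = (int)j;
    for (size_t i = 1; i <= la; i++) {
        curr[0] = (int)i;
        for (size_t j = 1; j <= lb; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            int best = prev[j - 1] + cost;                       /* substitute */
            if (prev[j] + 1 < best) best = prev[j] + 1;          /* delete */
            if (curr[j - 1] + 1 < best) best = curr[j - 1] + 1;  /* insert */
            curr[j] = best;
        }
        int *tmp = prev; prev = curr; curr = tmp;   /* reuse the rows */
    }
    int d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* Similarity in [0, 1]; 1.0 means the strings are identical. */
double similarity(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    size_t longest = la > lb ? la : lb;
    if (longest == 0) return 1.0;       /* two empty strings match */
    int d = levenshtein(a, b);
    if (d < 0) return 0.0;              /* allocation failed: report no match */
    return 1.0 - (double)d / (double)longest;
}
```

For instance, "invoice" against the OCR misread "lnvoice" differs by one substitution in seven characters, so it scores about 0.86 and passes a 0.7 threshold.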

using shared memory in cuda gives memory write error

I had a kernel which works fine as
__global__ static void CalcSTLDistance_Kernel(Integer ComputeParticleNumber)
{
const Integer TID = CudaGetTargetID();
const Integer ID = TID;
if(ID >= ComputeParticleNumber)
{
return ;
}
CDistance NearestDistance;
Integer NearestID = -1;
NearestDistance.Magnitude = 1e8;
NearestDistance.Direction = make_Scalar3(0,0,0);
if(c_daOutputParticleID[ID] < -1)
{
c_daSTLDistance[ID] = NearestDistance;
c_daSTLID[ID] = NearestID;
return;
}
Scalar3 TargetPosition = c_daParticlePosition[ID];
Integer TriangleID;
Integer CIDX, CIDY, CIDZ;
Integer CID = GetCellID(&CONSTANT_BOUNDINGBOX,&TargetPosition,CIDX, CIDY, CIDZ);
Integer Range = 1;
if(CID >=0 && CID < c_CellNum)
{
for(Integer k = -Range; k <= Range; ++k)
{
for(Integer j = -Range; j <= Range; ++j)
{
for(Integer i = -Range; i <= Range; ++i)
{
Integer MCID = GetCellID(&CONSTANT_BOUNDINGBOX,CIDX +i, CIDY + j,CIDZ + k);
if(MCID < 0 || MCID >= c_CellNum)
{
continue;
}
unsigned int TriangleNum = c_daCell[MCID].m_TriangleNum;
for(unsigned int l = 0; l < TriangleNum; ++l)
{
TriangleID = c_daCell[MCID].m_TriangleID[l];
if( TriangleID >= 0 && TriangleID < c_TriangleNum && TriangleID != NearestID)// No need to calculate again for the same triangle
{
CDistance Distance ;
Distance.Magnitude = CalcDistance(&c_daTriangles[TriangleID], &TargetPosition, &Distance.Direction);
if(Distance.Magnitude < NearestDistance.Magnitude)
{
NearestDistance = Distance;
NearestID = TriangleID;
}
}
}
}
}
}
}
c_daSTLDistance[ID] = NearestDistance;
c_daSTLID[ID] = NearestID;
}
Here c_daParticlePosition is a float3 array in constant memory. I want to use shared memory, so I declared a float3 shared memory array and tried to copy the constant data into it. However, the kernel fails with an unknown error, and cuda-memcheck reports the error shown after the code.
The kernel is launched with 255 threads and 2 blocks.
Shared memory code:
__global__ static void CalcSTLDistance_Kernel(Integer ComputeParticleNumber)
{
//const Integer TID = CudaGetTargetID();
const Integer ID =CudaGetTargetID();
extern __shared__ float3 s[];
/*if(ID >= ComputeParticleNumber)
{
return ;
}*/
s[ID] = c_daParticlePosition[ID];
__syncthreads();
CDistance NearestDistance;
Integer NearestID = -1;
NearestDistance.Magnitude = 1e8;
NearestDistance.Direction.x = 0;
NearestDistance.Direction.y = 0;
NearestDistance.Direction.z = 0;//make_Scalar3(0,0,0);
//if(c_daOutputParticleID[ID] < -1)
//{
// c_daSTLDistance[ID] = NearestDistance;
// c_daSTLID[ID] = NearestID;
// return;
//}
//Scalar3 TargetPosition = c_daParticlePosition[ID];
Integer TriangleID;
Integer CIDX, CIDY, CIDZ;
Integer CID = GetCellID(&CONSTANT_BOUNDINGBOX,&s[ID],CIDX, CIDY, CIDZ);
if(CID >=0 && CID < c_CellNum)
{
//Integer Range = 1;
for(Integer k = -1; k <= 1; ++k)
{
for(Integer j = -1; j <= 1; ++j)
{
for(Integer i = -1; i <= 1; ++i)
{
Integer MCID = GetCellID(&CONSTANT_BOUNDINGBOX,CIDX +i, CIDY + j,CIDZ + k);
if(MCID < 0 || MCID >= c_CellNum)
{
continue;
}
unsigned int TriangleNum = c_daCell[MCID].m_TriangleNum;
for(unsigned int l = 0; l < TriangleNum; ++l)
{
TriangleID = c_daCell[MCID].m_TriangleID[l];
/*if(c_daTrianglesParameters[c_daTriangles[TriangleID].ModelIDNumber].isDrag)
{
continue;
}*/
if( TriangleID >= 0 && TriangleID < c_TriangleNum && TriangleID != NearestID)// No need to calculate again for the same triangle
{
CDistance Distance ;
Distance.Magnitude = CalcDistance(&c_daTriangles[TriangleID], &s[ID], &Distance.Direction);
if(Distance.Magnitude < NearestDistance.Magnitude)
{
NearestDistance = Distance;
NearestID = TriangleID;
}
}
}
}
}
}
}
c_daSTLDistance[ID] = NearestDistance;
c_daSTLID[ID] = NearestID;
}
Error reported by cuda-memcheck:
Invalid __shared__ write of size 4
========= at 0x00000128 in CalcSTLDistance_Kernel(int)
========= by thread (159,0,0) in block (0,0,0)
========= Address 0x0000077c is out of bounds
You may find useful info on how to work with shared memory in this article. Focus especially on the static shared memory and dynamic shared memory sections.
Based on that article, you should find that you are simply writing out of bounds of your array s, exactly as the error message says. To fix the issue, you can:
either specify the size of the shared memory array s at compile time, if you know it in advance, such as __shared__ float3 s[123456];
or use a dynamically sized s array, which is basically what you are doing at the moment, but ALSO specify the third kernel launch parameter, as in CalcSTLDistance_Kernel<<<gridSize, blockSize, sharedMemorySizeInBytes>>>. If you will be using an array of 123456 float3s, then use int sharedMemorySizeInBytes = 123456 * sizeof(float3).
The number 123456 is only a placeholder: the array has to fit within the per-block shared memory limit of your GPU. Also keep in mind that shared memory is allocated per block, so if CudaGetTargetID() returns a global thread index, s needs one element per thread in the block and should be indexed by threadIdx.x rather than by ID.

Divide function

I need to write the divide function in the Jack language.
My code is:
function int divide(int x, int y) {
var int result;
var boolean neg;
let neg = false;
if(((x>0) & (y<0)) | ((x<0) & (y>0))){
let neg = true;
let x = Math.abs(x);
let y = Math.abs(y);
}
if (y>x){
return 0;
}
let result = Math.divide(x, y+y);
if ((x-(2*result*y)) < y) {
if (neg){
return -(result + result);
} else {
return (result + result);
}
} else {
if (neg){
return -(result + result + 1);
} else {
return (result + result + 1);
}
}
}
This algorithm is sub-optimal, since each multiplication operation also requires O(n) addition and subtraction operations.
Can I compute the product 2*result*y without any multiplication?
Thanks
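Here's an implementation of (unsigned) restoring division (x/y). I don't actually know Jack, so I'm not 100% sure about this; treat the hexadecimal constants and the ^ (XOR) operator as pseudocode, since Jack itself does not provide them.
var int r;
var int i;
let r = 0;
let i = 0;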
while (i < 16)
{
let r = r + r;
if ((x & 0x8000) = 0x8000) {
let r = r + 1;
}
if ((y ^ 0x8000) > (r ^ 0x8000)) { // this is an unsigned comparison
let x = x + x;
}
else {
let r = r - y;
let x = x + x + 1;
}
let i = i + 1;
}
return x;
You should be able to turn that into signed division.
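As a cross-check of the idea, here is the same restoring division sketched in C on 16-bit values, followed by one possible signed wrapper. The function names are invented for this example. The remainder is kept in a wider type so that doubling it cannot overflow, which the 16-bit pseudocode above does not guard against when y is very large.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 16-bit restoring division using only shifts, additions,
 * subtractions and comparisons. y must be nonzero. */
uint16_t divide_u16(uint16_t x, uint16_t y) {
    assert(y != 0);
    uint32_t r = 0;                      /* running remainder, kept wide */
    for (int i = 0; i < 16; i++) {
        /* Shift the top bit of x into the remainder. */
        r = (r << 1) | (uint32_t)(x >> 15);
        x = (uint16_t)(x << 1);
        if (r >= y) {                    /* divisor fits: subtract, set bit */
            r -= y;
            x |= 1;
        }
    }
    return x;                            /* x now holds the quotient */
}

/* Signed wrapper: divide the magnitudes, then restore the sign.
 * The quotient truncates toward zero; -32768 / -1 does not fit in
 * int16_t and is not handled here. */
int16_t divide_i16(int16_t x, int16_t y) {
    int negative = (x < 0) != (y < 0);
    uint16_t ax = (uint16_t)(x < 0 ? -(int)x : (int)x);
    uint16_t ay = (uint16_t)(y < 0 ? -(int)y : (int)y);
    int q = divide_u16(ax, ay);
    return (int16_t)(negative ? -q : q);
}
```

For example, divide_u16(100, 7) yields 14 and divide_i16(-100, 7) yields -14. In Jack the doubling would be written as r + r and x + x, as in the answer above.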

An error with dec to bin

I have been debugging this function, but I don't know why it returns 99 when I pass 4 to it.
This is a function to convert from decimal to binary.
I have tried printing exp, res and the other variables at each step with cout and multiplying them by hand, but I still can't see the problem. It doesn't make sense.
int DecToBinary(long num) {
if(num == 0) {
return 0;
}
else if(num == 1) {
return 1;
}
int exp = 0;
int res = 0;
for (; num != 0; exp++){
res = res+num%2*pow(10,exp);
num = num/2;
}
return res;
}
Thank you guys.
if(num == 0) {
return 0;
}
else if(num == 0) {
return 1;
}
You know the second branch will never be executed, right?
Furthermore:
pow(10,exp);
this yields a floating-point number. Be prepared for rounding errors. Even better: don't use pow() at all (you don't need floating-point numbers for working with integers). Simply do the division step by step, accumulating the result in a variable.
int dec2bin(int n)
{
int r = 0, tp = 1;
while (n) {
r += (n % 2) * tp;
n >>= 1;
tp *= 10;
}
return r;
}
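One limitation of packing binary digits into a decimal int is range: a 32-bit int can hold at most ten such digits, so inputs above 1023 overflow. A possible alternative, sketched here in C with a made-up function name, is to write the digits into a character buffer instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the binary digits of n into buf and
 * returns buf, or NULL if size is too small for digits plus '\0'. */
char *to_binary_string(unsigned long n, char *buf, size_t size) {
    assert(buf != NULL);
    /* Count the digits needed; at least one, so that 0 becomes "0". */
    size_t digits = 1;
    for (unsigned long t = n >> 1; t != 0; t >>= 1) digits++;
    if (size < digits + 1) return NULL;

    buf[digits] = '\0';
    /* Fill from the least significant digit backwards. */
    for (size_t i = digits; i-- > 0; ) {
        buf[i] = (char)('0' + (n & 1u));
        n >>= 1;
    }
    return buf;
}
```

With a buffer of at least four bytes, to_binary_string(4, buf, sizeof buf) leaves "100" in buf.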