Convert text string to binary ASCII in an Arduino sketch

I am writing a sketch for Arduino that aims to convert a text string into binary 7-bit or 8-bit ASCII. For example, "Hello world" would become this 8-bit ASCII binary stream:
0100100001100101011011000110110001101111001000000111011101101111011100100110110001100100
As you can see, this is standard 7-bit ASCII with each character padded by a leading zero to make it 8 bits. I don't mind which bit length I use as long as it's consistent once I've started. I've spent a couple of hours trying to work out a method to achieve this, to no avail. The closest I have is something like this:
char text[] = "Hello world";
which when printed to the monitor like this:
Serial.println(text[0], BIN);
Gives me 1001000. However, this isn't padded at all (so a value of 0 would simply print as 0, not 00000000), and obviously this doesn't provide me with anything to work with, just something to look at! Does anyone have any advice for me?

You can use this as a starting point:
char inputChar = 'H';

// Print the binary representation of 'inputChar' as 8 characters
// of '1's and '0's, MSB first.
for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1) {
    if (inputChar & bitMask) {
        Serial.print('1');
    } else {
        Serial.print('0');
    }
}
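
Extended to a whole string, a minimal sketch could look like this (the string and the 9600 baud rate are just placeholders; adapt as needed):

// Print every character of a string as padded 8-bit binary, MSB first.
char text[] = "Hello world";

void setup() {
    Serial.begin(9600);
    for (int i = 0; text[i] != '\0'; i++) {
        for (uint8_t bitMask = 0x80; bitMask != 0; bitMask >>= 1) {
            Serial.print((text[i] & bitMask) ? '1' : '0');
        }
    }
    Serial.println();
}

void loop() {
    // nothing to do here
}

Instead of printing, you can collect the '1'/'0' characters into a buffer, or keep the raw bits, depending on what you need the stream for.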

What encoding does Facebook use in JSON files from its data export?

I've used the Facebook feature to download all my data. The resulting zip file contains meta information in JSON files. The problem is that Unicode characters in the strings in these JSON files are escaped in a weird way.
Here's an example of such a string:
"nejni\u00c5\u00be\u00c5\u00a1\u00c3\u00ad bod: 0 mnm Ben\u00c3\u00a1tky\n"
When I try to parse the string, for example with JavaScript's JSON.parse(), and print it out I get:
"nejnižší bod: 0 mnm Benátky\n"
While it should be
"nejnižší bod: 0 mnm Benátky\n"
I can see that \u00c5\u00be should somehow correspond to ž but I can't figure out the general pattern.
I've been able to figure out these characters so far:
'\u00c2\u00b0' : '°',
'\u00c3\u0081' : 'Á',
'\u00c3\u00a1' : 'á',
'\u00c3\u0089' : 'É',
'\u00c3\u00a9' : 'é',
'\u00c3\u00ad' : 'í',
'\u00c3\u00ba' : 'ú',
'\u00c3\u00bd' : 'ý',
'\u00c4\u008c' : 'Č',
'\u00c4\u008d' : 'č',
'\u00c4\u008f' : 'ď',
'\u00c4\u009b' : 'ě',
'\u00c5\u0098' : 'Ř',
'\u00c5\u0099' : 'ř',
'\u00c5\u00a0' : 'Š',
'\u00c5\u00a1' : 'š',
'\u00c5\u00af' : 'ů',
'\u00c5\u00be' : 'ž',
So what is this weird encoding? Is there any known tool that can correctly decode it?
The encoding is valid UTF-8: each \u00XX escape is really one UTF-8 byte. The problem is, JavaScript strings don't use UTF-8, they use UTF-16, so JSON.parse has turned each byte into a separate character. You have to reinterpret those characters as bytes and decode them as UTF-8:
function decode(s) {
    let decoder = new TextDecoder();
    // Each character in 's' is really one byte; collect the byte values.
    let bytes = s.split('').map(c => c.charCodeAt(0));
    return decoder.decode(new Uint8Array(bytes));
}

let s = "nejni\u00c5\u00be\u00c5\u00a1\u00c3\u00ad bod: 0 mnm Ben\u00c3\u00a1tky\n";
s = decode(s);
console.log(s);
https://developer.mozilla.org/docs/Web/API/TextDecoder
You can use a regular expression to find runs of these mis-escaped characters, encode them back into Latin-1 bytes, and then decode those bytes as UTF-8.
The following code should work in Python 3.x:
import re
re.sub(r'[\xc2-\xf4][\x80-\xbf]+', lambda m: m.group(0).encode('latin1').decode('utf8'), s)
The JSON file itself is UTF-8, but the strings inside it were UTF-8 encoded first and then each byte was written out as its own \u00XX escape sequence.
This command fixes a file like this in Emacs:
;; Requires Emacs 27+ (json-parse-string/json-serialize) and dash.el (-->).
(defun k/format-facebook-backup ()
  "Normalize a Facebook backup JSON file."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((inhibit-read-only t)
          (size (point-max))
          bounds str)
      (while (search-forward "\"\\u" nil t)
        (message "%.f%%" (* 100 (/ (point) size 1.0)))
        (setq bounds (bounds-of-thing-at-point 'string))
        (when bounds
          ;; Parse the escaped string, reread its characters as raw
          ;; bytes, decode those bytes as UTF-8, then write it back.
          (setq str (--> (json-parse-string (buffer-substring (car bounds)
                                                              (cdr bounds)))
                         (string-to-list it)
                         (apply #'unibyte-string it)
                         (decode-coding-string it 'utf-8)))
          (setf (buffer-substring (car bounds) (cdr bounds))
                (json-serialize str))))))
  (save-buffer))
Thanks to Jen's excellent question and Shawn's comment.
Basically Facebook seems to take each individual byte of the UTF-8 representation of the string, then export it to JSON as if each byte were an individual Unicode code point.
What we need to do is take the last two hex digits of each \uXXXX escape (e.g. c3 from \u00c3), treat them as bytes, and read the whole sequence as a UTF-8 string.
This is how I do it in Ruby:
require 'json'
require 'uri'
bytes_re = /((?:\\\\)+|[^\\])(?:\\u[0-9a-f]{4})+/
txt = File.read('export.json').gsub(bytes_re) do |bad_unicode|
  $1 + eval(%Q{"#{bad_unicode[$1.size..-1].gsub('\u00', '\x')}"}).to_json[1...-1]
end
good_data = JSON.load(txt)
With bytes_re we catch all sequences of bad Unicode characters.
Then, for each such sequence, we replace '\u00' with '\x' (e.g. \xc3), put quotes around it, and use Ruby's built-in string parsing so that the \xc3\xbe... escapes are converted into actual bytes. Those bytes then either remain as proper UTF-8 in the JSON or get correctly re-escaped by the #to_json method.
The [1...-1] strips the surrounding quotes inserted by #to_json.
I wanted to explain the code because the question is not Ruby-specific and readers may be using another language.
I guess somebody could do it with a sufficiently ugly sed command.
Just adding the general rule for how to get from something like '\u00c5\u0098' to 'Ř'. Putting together the last two hex digits from the \u parts gets you c5 and 98, which are the two bytes of the UTF-8 representation. UTF-8 encodes such a code point in two bytes as 110xxxxx 10xxxxxx, where the x's are the actual bits of the character code. Mask off the x parts with &, concatenate them, and read the result as a number: (0xC5 & 0x1F) << 6 | (0x98 & 0x3F) = (0x05 << 6) | 0x18 = 0x158, which is the code point for 'Ř'.
My JavaScript implementation:
function fixEncoding(s) {
    var reg = /\\u00([a-f0-9]{2})\\u00([a-f0-9]{2})/gi;
    return s.replace(reg, function(a, m1, m2) {
        var b1 = parseInt(m1, 16); // first UTF-8 byte (110xxxxx)
        var b2 = parseInt(m2, 16); // second UTF-8 byte (10xxxxxx)
        var maskedb1 = b1 & 0x1F;
        var maskedb2 = b2 & 0x3F;
        var result = (maskedb1 << 6) | maskedb2;
        return String.fromCharCode(result);
    });
}

Denary to binary conversion program

How does this denary-to-binary program work? I am finding it hard to comprehend what is happening in the code.
Can someone explain lines 6 onwards?
Number = int(input("Hello. \n\nPlease enter a number to convert: "))
if Number < 0:
    print("Can't be less than 0")
else:
    Remainder = 0
    String = ""
    while Number > 0:
        Remainder = Number % 2
        Number = Number // 2
        String = str(Remainder) + String
    print(String)
The idea is to separate out the last digit of the binary number, stick it at the front of a string buffer, and then remove it from Number. The method is general and can be used for other bases as well.
Start by looking at it as a dec -> dec "conversion" to understand the principle.
Let's say you have the number 174 (base 10). If you want to parse out each individual piece (read: "digit") of it, you can calculate the number modulo the base (10), then do an integer division to "remove" that digit from the number. I.e. 174 % 10 and 174 // 10 => (Number) 17 | 4 (Remainder). On the next iteration you have 17 from the division, and the same procedure splits it up into 1 | 7. On the iteration after that you get 0 | 1, and then Number is 0, which is the exit condition for the loop (while Number > 0).
In each iteration of the loop you take the remainder (which is always a single digit for the base you use; that's a basic property of how bases work), convert it to a string, and concatenate it with the string from the previous iterations (note the order in the code!). Once you've divided your way down to zero, you have the converted number.
As mentioned before, this works for any base: you can use base 16 to convert to hex (though you'll need to translate digits above 9 into letters), base 8 for octal, and so on; see the sketch below.
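To make that digit translation concrete, here is a small C sketch of the same repeated-division loop, generalized to any base up to 16 (the function name and buffer size are just illustrative):

#include <stdio.h>

/* Convert 'number' to the given base by repeated division,
   exactly like the Python loop above. */
void to_base(unsigned int number, unsigned int base, char *out)
{
    const char digits[] = "0123456789abcdef"; /* letters for digits above 9 */
    char buf[33];
    int i = 0;

    if (number == 0)
        buf[i++] = '0';
    while (number > 0) {
        buf[i++] = digits[number % base]; /* remainder = current digit */
        number /= base;                   /* integer-divide it away */
    }
    /* Digits come out least-significant first, so reverse them. */
    for (int j = 0; j < i; j++)
        out[j] = buf[i - 1 - j];
    out[i] = '\0';
}

int main(void)
{
    char s[33];
    to_base(174, 2, s);
    printf("%s\n", s); /* prints 10101110 */
    to_base(174, 16, s);
    printf("%s\n", s); /* prints ae */
    return 0;
}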
Python code for converting denary into binary
denary = int(input('Denary: '))
binary = [0, 0, 0, 0]  # four fixed bits, so this only handles 0-15
while denary > 0:
    for n, i in enumerate(binary):
        if denary // (2 ** (3 - n)) >= 1:
            binary[n] = 1
            denary -= 2 ** (3 - n)
            print(denary)  # shows what is left after each subtraction
print(binary)

Trying to copy the "QUERY_STRING" char pointer into a char[] variable, getting wrong result

I am working with FastCGI, trying to generate a dynamic HTML webpage.
I am able to get the QUERY_STRING easily enough, but I am having trouble trying to copy it into a char array.
If there is an even shorter way of just getting the value from QUERY_STRING, please advise, because I am a little in over my head.
char *queryString = getenv(ENV_VARS[7]);
char newDeviceName[64];
strncpy( newDeviceName, *queryString, sizeof(*queryString) -1);
printf("------- %c ------------", newDeviceName);
This compiles with only warnings, but once I try to load the webpage, the characters are some weird Chinese-looking characters -> �ፙ�
Thank you in advance.
EDIT: More of my code
const char *ENV_VARS[] = {
"DOCUMENT_ROOT",
"HTTP_COOKIE",
"HTTP_HOST",
"HTTP_REFERER",
"HTTP_USER_AGENT",
"HTTPS",
"PATH",
"QUERY_STRING",
"REMOTE_ADDR",
"REMOTE_HOST",
"REMOTE_PORT",
"REMOTE_USER",
"REQUEST_METHOD",
"REQUEST_URI",
"SCRIPT_FILENAME",
"SCRIPT_NAME",
"SERVER_ADMIN",
"SERVER_NAME",
"SERVER_PORT",
"SERVER_SOFTWARE"
};
int main(void)
{
    char deviceName[] = ADAPTERNAME;
    time_t t;

    /* Initializes random number generator */
    srand((unsigned) time(&t));

    while (FCGI_Accept() >= 0) {
        printf("Content-type: text/html \r\n\r\n");
        printf("");
        printf("<html>\n");
        printf("<script src=\"/js/scripts.js\"></script>");
        /* CODE CODE CODE */
        printf("<p> hi </p>");
        printf("<p> hi </p>");

        char *queryString = getenv(ENV_VARS[7]);
        char newDeviceName[64];
        if (queryString == NULL)
            printf("<p> +++++ERROR++++++ </p>");
        else {
            strcpy(newDeviceName, queryString);
            newDeviceName[sizeof(newDeviceName) - 1] = 0;
            printf("<p> ------- %s ------------ </p> ", newDeviceName);
        }
SOLVED: Amateur mistake. For some reason none of my new edits went into effect until after I restarted my lighttpd server.
Your program has undefined behavior. Read those warnings issued by the compiler. They're important.
Don't dereference the pointer when you're passing the string to strncpy(). When you do that, you're now passing a single char. That's converted to a pointer when it's given to strncpy() (which is where you probably get your warning, i.e. passing a char to a function that expects a char*).
You also can't get the size of an array that has decayed to a pointer using sizeof. You're just getting the size of the pointer (which is probably either 8 or 4 bytes depending on your system). Since you don't know the length of the string anyway, it might even be better to just use strcpy() instead of strncpy().
Here's what your code probably should look like:
char *queryString = getenv(ENV_VARS[7]);
char newDeviceName[64];
strcpy( newDeviceName, queryString);
printf("------- %s ------------", newDeviceName); /* use %s to print strings */
The length on your strncpy is wrong [too short], the second argument is wrong, and the format string is incorrect.
Try this:
strncpy( newDeviceName, queryString, sizeof(newDeviceName) - 1);
newDeviceName[sizeof(newDeviceName) - 1] = 0;
printf("------- %s ------------", newDeviceName);
In the call to strncpy, it expects a char * for the second argument, but you pass it a char.
Also, the size is not correct. *queryString is a char and has size 1. Using sizeof(queryString) is not correct either, because it will return the size of a pointer. What you actually want is the size of the destination buffer.
In the printf call the %c format specifier expects a char but you pass it a char *. You should instead use %s which expects a char * pointing to a null terminated string.
So what you want to do is this:
strncpy( newDeviceName, queryString, sizeof(newDeviceName) -1);
newDeviceName[sizeof(newDeviceName) - 1] = 0;
printf("------- %s ------------", newDeviceName);
What you want is
strncpy(newDeviceName, queryString, sizeof(newDeviceName)-1);
newDeviceName[63] = '\0'; // Guarantee NUL terminator
printf("----- %s -----", newDeviceName);
So multiple problems:
*queryString just gets you the first character, which strncpy tries to treat as a pointer.
sizeof(*queryString) is the size of a char (i.e. 1)
%c prints a single character, not the string
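
Putting those three fixes together, with a check for getenv() returning NULL (it does when the variable is unset), a corrected standalone version might look like this sketch:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* getenv() returns NULL if QUERY_STRING is not set. */
    const char *queryString = getenv("QUERY_STRING");
    char newDeviceName[64] = "";

    if (queryString != NULL) {
        /* Bounded copy, then force NUL termination. */
        strncpy(newDeviceName, queryString, sizeof(newDeviceName) - 1);
        newDeviceName[sizeof(newDeviceName) - 1] = '\0';
    }
    printf("------- %s ------------\n", newDeviceName);
    return 0;
}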

Extracting integers from a query string

I am creating a program that can make MySQL transactions through C and HTML.
I have this query string
query = -id=103&-id=101&-id=102&-act=Delete
Extracting "Delete" by sscanf isn't that hard, but I need help extracting the integers and putting them in an array of int id[]. The number of -id entries can vary depending on how many checkboxes were checked in the html form.
I've been searching for hours but haven't found any applicable solution; or I just did not understand them. Any ideas?
Thanks
You can use strstr and atoi to extract the numbers in a loop, like this:
char *query = "-id=103&-id=101&-id=102&-act=Delete";
char *ptr = strstr(query, "-id=");
if (ptr) {
    ptr += 4;
    int n = atoi(ptr);
    printf("%d\n", n);
    for (;;) {
        ptr = strstr(ptr, "&-id=");
        if (!ptr) break;
        ptr += 5;
        int n = atoi(ptr);
        printf("%d\n", n);
    }
}
You want to use strtok, or a better solution, to tokenize this string with & and = as delimiters.
Take a look at cplusplus.com for more information and an example.
This is the output you would get from the strtok example there:
Splitting string "- This, a sample string." into tokens:
This
a
sample
string
Once you figure out how to split them, the next hurdle is to convert the numbers from strings to ints. For this you need to look at atoi or its safer, more robust cousin strtol.
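
For example, a minimal sketch of the strtok/strtol approach (the array size is arbitrary; note that strtok modifies its input, so the query must live in a writable buffer):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char query[] = "-id=103&-id=101&-id=102&-act=Delete";
    int id[16];
    int count = 0;

    /* Split on '&'; each token then looks like "-name=value". */
    for (char *tok = strtok(query, "&"); tok != NULL; tok = strtok(NULL, "&")) {
        if (strncmp(tok, "-id=", 4) == 0 && count < 16)
            id[count++] = (int)strtol(tok + 4, NULL, 10);
    }

    for (int i = 0; i < count; i++)
        printf("id[%d] = %d\n", i, id[i]); /* 103, 101, 102 */
    return 0;
}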
Most likely I would write a small lexical scanner to tackle the task: analyze the string one character at a time, according to a regular expression representing the set of possible inputs.
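
For this particular input, such a scanner might look like the following (purely illustrative):

#include <stdio.h>

int main(void)
{
    const char *p = "-id=103&-id=101&-id=102&-act=Delete";
    int id[16];
    int count = 0;

    while (*p) {
        /* Recognize the token "-id=" ... */
        if (p[0] == '-' && p[1] == 'i' && p[2] == 'd' && p[3] == '=') {
            p += 4;
            int n = 0;
            /* ... then scan the run of digits that follows. */
            while (*p >= '0' && *p <= '9')
                n = n * 10 + (*p++ - '0');
            if (count < 16)
                id[count++] = n;
        } else {
            p++;
        }
    }

    for (int i = 0; i < count; i++)
        printf("id[%d] = %d\n", i, id[i]);
    return 0;
}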

What do the functions lowByte() and highByte() do?

I've made this small experimental program in Arduino to see how the functions lowByte() and highByte() work. What exactly are they supposed to return when passed a value?
On entering the character '9' in the serial monitor it prints the following:
9
0
218
255
How does that come about? Also, the last two lines (218 and 255) are printed for every value I enter. Why is this happening?
int i = 12;

void setup()
{
    Serial.begin(9600);
}

void loop()
{
    if (Serial.available())
    {
        i = Serial.read() - '0';   // conversion of character to number, e.g. '9' becomes 9
        Serial.print(lowByte(i));  // send the low byte
        Serial.print(highByte(i)); // send the high byte
    }
}
If you have this data:
10101011 11001101 // original
// highByte() gets:
10101011
// lowByte() gets:
11001101
An int is a 16-bit integer on AVR-based Arduinos such as the Uno, so you are reading its high half and its low half as a byte each.
The serial buffer actually contains "9\n" (the monitor appends a newline). Reading '9' gives '9' - '0' = 9, hence the 9 (low byte) and 0 (high byte). The next read is '\n' (ASCII 10), and 10 - '0' = 10 - 48 = -38, which as a 16-bit int is 0xFFDA: low byte 0xDA = 218, high byte 0xFF = 255. That is why those two 'funny' numbers appear for every value you enter.
Serial.print needs to be told to send raw bytes if that's what you want to see. Try:
Serial.write(lowByte(i));
(The old Serial.print(value, BYTE) form did the same thing, but it was removed in Arduino 1.0; Serial.write() is the current way to send a raw byte.)
In addition to Rafalenfs' answer, should you provide a larger data type:
00000100 10101011 11001101 // original
// highByte() will NOT return 00000100, but will return:
10101011
// lowByte() will still return:
11001101
highByte() returns the second-lowest byte (as specified by the documentation: https://www.arduino.cc/reference/en/language/functions/bits-and-bytes/highbyte/).
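
To see the two halves concretely, a quick sketch (the value -38 is chosen to match the question's output) splits a 16-bit int with lowByte()/highByte() and reassembles it with word():

// Split a 16-bit value into bytes and put it back together.
void setup() {
    Serial.begin(9600);

    int value = -38;              // what '\n' - '0' produced; 0xFFDA in 16 bits
    byte lo = lowByte(value);     // 0xDA = 218
    byte hi = highByte(value);    // 0xFF = 255

    Serial.println(lo);           // prints 218
    Serial.println(hi);           // prints 255
    Serial.println(word(hi, lo)); // prints 65498 (0xFFDA reassembled, unsigned)
}

void loop() {}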