Internationalization: Are IP Addresses entered in the same format for all cultures? - language-agnostic

I know that IPv4 addresses are written using the dot-decimal notation. What I don't know is if each of the numeric characters entered for an IP Address is always an Arabic numeral (such as 1,2,3...) or if a non-ASCII numeral character is generally accepted for IP Address input?
For instance, if I had an IPv4 address input field localized for a Chinese culture, would it be reasonable for me to expect only Arabic numerals to be entered for each octet? Or should I also expect non-ASCII characters that are also numeric?
This field would be purely for IP Address entry and would not accept host names.

The ASCII digits 0-9 have the same code points in Unicode. So yes, an IP address should have the same format for all cultures.
The dotted-octet format is an international standard; it should not deviate based on culture.

a. The dot notation for numeric IP addresses is just shorthand / easy read for a direct binary equivalent.
b. It's a programmatic identifier, not a URI/text based field - so no question of internationalization.
So yes, Arabic numerals all the way.
I think you would be in trouble the day you could type C# (or anything else) in another language and still have it work seamlessly. Ex. HelloWorld() and नमस्तेविश्व() -- Hindi
But so will we all :)
EDIT: Saw the comment about decimal points - don't let it be an input. Again, since it's a programmatic identifier, the dots are a fixed part of the notation.

The IP address format is standardized and must be the same for all cultures, since network devices actually depend on this exact format. So the obvious answer is: yes, you can expect regular numbers and dots only.
There are some other conventions as well, for instance you might use octal IP Addresses, but frankly, I don't see any reason to accept anything but typical IP Address format.
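To make the "ASCII digits and dots only" advice concrete, here is a minimal Python sketch of an input check. The function name is_dotted_decimal_ipv4 and the decision to reject leading zeros (so that octal-looking input is refused) are my own assumptions, not part of any standard API.

```python
import re

# One octet in plain decimal, 0-255, ASCII digits only, no leading zeros.
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_IPV4 = re.compile(rf"{_OCTET}(?:\.{_OCTET}){{3}}")


def is_dotted_decimal_ipv4(text: str) -> bool:
    """Return True only for four dot-separated ASCII decimal octets."""
    return _IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_dotted_decimal_ipv4("192.168.0.1"))
    # Devanagari digits are "digits" to str.isdigit(), but not to [0-9].
    print("१९२".isdigit(), is_dotted_decimal_ipv4("१९२.१६८.०.१"))
```

The explicit [0-9] classes matter here: \d and str.isdigit() also accept digits from other scripts, which is exactly what this field should refuse.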

Related

Ways MySQL/MariaDB could silently be changing values when storing

I'm searching for cases in MySQL/MariaDB where the value transmitted when storing will differ from the value that can be retrieved later on. I'm only interested in fields with non-binary string data types like VARCHAR and *TEXT.
I'd like to get a more comprehensive understanding on how much a stored value can be trusted. This would especially be interesting for cases where the output just lacks certain characters (like with the escape character example below) as this is specifically dangerous when validating.
So, this boils down to: Can you create an input string (and/or define an environment) where this doesn't output <value> in the second statement?
INSERT INTO t SET v = <value>, id = 1; -- success
SELECT v FROM t WHERE id = 1;
Things I can think of:
strings containing escaping (\a → a)
truncated if too long
character encoding of the table not supporting the input
Whether something fails silently probably also depends on how strict the SQL mode is (as with the last two examples).
Thanks a lot in advance for your input!
You can trust that databases do what the standards prescribe. With strings and integers it is simple, because the database saves the binary representation of that number or character in your chosen character set.
Fractional values stored as double- or single-precision floating point are different, because many of them can't be stored exactly, so the database keeps the nearest binary approximation; see decimal representation.
That also follows the standards, but you have to account for it.
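The floating-point point can be illustrated without a database. This is only a sketch in Python: roundtrip_single is a hypothetical helper that squeezes a value through 4-byte IEEE 754 storage, which I am assuming is comparable to what a single-precision column does.

```python
import struct
from decimal import Decimal


def roundtrip_single(x: float) -> float:
    """Pack x into 4-byte single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    print(roundtrip_single(0.5) == 0.5)    # 0.5 is exactly representable
    print(roundtrip_single(0.95) == 0.95)  # 0.95 is not
    # The exact value a binary double holds for the literal 0.1:
    print(Decimal(0.1))
```

Values like 0.95 come back slightly changed, while an exact decimal type keeps the digits that were sent.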

Convert comma to dot in Python or MySQL

I have a Python script which collects data and sends it to my MySQL table.
I noticed that the "Cost" is sometimes 0,95, which results in 0 in my table since my table uses "0.95" instead of "0,95".
I assume the best solution is to convert the , to . in my Python script by using:
variable.replace(",", ".")
However, couldn't one solution be to change format in my MySQL table? So that I store numbers in this format:
1100
0,95
0,1
150000
My Django Model
cost = models.DecimalField(max_digits=10, decimal_places=4, default=None)
Any feedback on how to best solve this issue?
Thanks
Your first instinct is correct: convert the "unusual" (comma-decimal) input into the standard format that MySQL uses by default (dot-decimal) at the first point where you receive it.
There are lots of ways to write numbers
Be careful, though, that you don't get stung by people using commas as thousands separators like "3,203,907.23", or the European form "3.203.907,23", the Swiss "3'203'907,23" or even this form, which is widely used in India: "32,03,907.71" (yes, I did mean to type only two digits there!)
To make your life easier, the rule for currencies is relatively simple:
Where a dot or comma is followed by exactly two digits at the end of the string, that character is acting as the decimal separator.
Once you know which is the decimal separator, you can safely remove all other non-digits from the string, change the decimal separator you found to . then use any standard library string-to-number conversion.
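The rule above can be sketched in Python roughly as follows. The name parse_amount is hypothetical, and the sketch applies only the two-digit rule as stated: it ignores signs, and an input such as "0,1" (one decimal digit) is not recognised as a fraction and would be read as 1, so treat it as a starting point rather than a complete parser.

```python
import re
from decimal import Decimal

# A dot or comma followed by exactly two digits at the end of the string.
_DECIMAL_TAIL = re.compile(r"[.,]([0-9]{2})$")


def parse_amount(text: str) -> Decimal:
    """Normalise a currency string to a Decimal using the two-digit rule."""
    text = text.strip()
    tail = _DECIMAL_TAIL.search(text)
    if tail:
        whole, fraction = text[: tail.start()], tail.group(1)
    else:
        whole, fraction = text, ""
    # Everything else that is not a digit is a grouping separator: drop it.
    digits = re.sub(r"[^0-9]", "", whole)
    if not digits and not fraction:
        raise ValueError(f"no digits found in {text!r}")
    return Decimal(f"{digits or '0'}.{fraction or '0'}")


if __name__ == "__main__":
    for sample in ["0,95", "1100", "3,203,907.23", "3.203.907,23",
                   "3'203'907,23", "32,03,907.71"]:
        print(sample, "->", parse_amount(sample))
```

Returning a Decimal rather than a float keeps the value exact, which fits a Django DecimalField.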
Storage format isn't presentation format
Yes, you can tell MySQL to use comma as its decimal separator, but doing that will break so much of your code - including the parts of the framework that read from the database and expect dot-decimal numbers - that you'll regret doing it that way very quickly...
There's a general principle at work here: you should do your data storage and processing using a format that is easy to process, interchangeable with other systems, and understood by other software developers.
Consider what happens if you need to allow a different framework to access your MySQL database to generate reports... whoever develops that software (and it may be you) will be glad that the numbers are all stored the way numbers are "always" stored in databases.
Convert on the way in, re-convert on the way out
Where you need to accept input in a different format, convert that input into your standardised format as early as possible.
When you need to use an output format, do the conversion to that format as late as possible.
The idea is to keep as much of your system "unexceptional" as possible. A programmer who has to remember what numeric format will be in force at the time a given method is called is not a happy programmer.
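For the "way out" half, a minimal Python sketch might look like this. format_comma_decimal is a hypothetical helper name, and it assumes the only presentation change needed is swapping the decimal separator (no thousands grouping).

```python
from decimal import Decimal


def format_comma_decimal(value: Decimal, places: int = 2) -> str:
    """Render a stored Decimal with a comma as the decimal separator."""
    dotted = f"{value:.{places}f}"   # standard dot-decimal text
    return dotted.replace(".", ",")  # presentation-only conversion


if __name__ == "__main__":
    print(format_comma_decimal(Decimal("0.95")))
    print(format_comma_decimal(Decimal("1100")))
```

The stored value never changes; only the string handed to the user does, and only at the last moment.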
P.S.
The option you're talking about in MySQL is an example of this pattern: it doesn't change how numeric data is stored. All that changes is how you pass numbers to MySQL and how it presents them back to you.

What is the likely meaning of this character sequence? A&#C

I'm working on an application that imports data from a CSV file. I am told that the data in the CSV file comes from SAP, which I am totally unfamiliar with.
My client indicates that there is an issue. One column of data in the CSV file contains postal addresses. Sometimes, the system doesn't see a valid address. Here is a slightly fictionalized example:
1234 MAIN ST A&#C HOUSTON
As you can see, there is a street number, a street name, and a city, all in capital letters. There is no state or zip code specified. In the CSV file, all addresses are assumed to be in the same state.
Normally, where there is text between the street name and city, it is an apartment number or letter. In the above example, we get errors when we try to use the address with other services, such as Google geolocation. One suggested fix is to simply strip out these special characters, but I believe that there must be a better way.
I want to know what this A&#C means. It looks like some sort of escape sequence, but it isn't in a format I'm familiar with. Please tell me what this strange character sequence means.
I'm not totally sure, but I doubt there's a "canonical" escape sequence that looks like this. In the ABAP environment, # is used to replace non-printable characters. It might be that the data was improperly sanitized when importing into the SAP system in the first place, and when writing to the output file, some non-printable character was replaced by #. Another explanation might be that one of the fields contained a non-ASCII Unicode character (like,   ) and the export program failed to convert that to the selected target codepage. It's hard to tell without examining the actual source dataset. Of course, it might also be some programming error or a weird custom field separator...

MySQL database stores only English text correctly, not other languages

I have a comment section and submission form that any of my members can use.
If a member posts in English, I receive an email update and the comment is posted in English with no problem. But if they use a language other than English, for example Thai, then all the words, say สวัสดี, appear as ??????
I don't know why. I checked my php.ini file and the Unicode/encoding settings are set to UTF-8, and the MySQL collation is set to UTF-8 as well. I made sure the meta tag is set to UTF-8 in the .html/.php files too, but the problem persists.
Any suggestions as to what else I missed in the configuration?
Make sure you are using multibyte safe string functions or you might be losing your UTF-8 encoding.
From the PHP mbstring manual:
While there are many languages in which every necessary character can be represented by a one-to-one mapping to an 8-bit value, there are also several languages which require so many characters for written communication that they cannot be contained within the range a mere byte can code (A byte is made up of eight bits. Each bit can contain only two distinct values, one or zero. Because of this, a byte can only represent 256 unique values (two to the power of eight)). Multibyte character encoding schemes were developed to express more than 256 characters in the regular bytewise coding system.
When you manipulate (trim, split, splice, etc.) strings encoded in a multibyte encoding, you need to use special functions since two or more consecutive bytes may represent a single character in such encoding schemes. Otherwise, if you apply a non-multibyte-aware string function to the string, it probably fails to detect the beginning or ending of the multibyte character and ends up with a corrupted garbage string that most likely loses its original meaning.
mbstring provides multibyte specific string functions that help you deal with multibyte encodings in PHP. In addition to that, mbstring handles character encoding conversion between the possible encoding pairs. mbstring is designed to handle Unicode-based encodings such as UTF-8 and UCS-2 and many single-byte encodings for convenience.
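The corruption the manual describes is not specific to PHP. As a rough illustration, here is a Python sketch (not PHP, and not mbstring itself) of what happens when a UTF-8 string is cut at a byte offset instead of a character offset; the variable names are made up for the example.

```python
text = "สวัสดี"                    # 6 code points
raw = text.encode("utf-8")        # each of these Thai characters takes 3 bytes

# Byte-wise cut: 4 bytes is one whole character plus a fragment of the next.
broken = raw[:4]
# Character-wise cut: slice the decoded string instead.
safe = text[:2]

if __name__ == "__main__":
    print(len(text), len(raw))
    print(broken.decode("utf-8", errors="replace"))  # ends in a replacement char
    print(safe)
```

A byte-oriented function behaves like the first cut; a multibyte-aware one behaves like the second.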
I just found out what was causing the problem: in php.ini, the line mbstring.internal_encoding was set to something else. I set it to UTF-8 and, like magic, everything now works!

Querying Chinese addresses in Googlemap API geocoding

I'm following the demo code from the article phpsqlgeocode.html.
In the db, I inserted some Chinese addresses, which are UTF-8 encoded. I found that after calling urlencode() on the Chinese address, the output of the address is wrong. Like this one:
http://maps.google.com.tw/maps/geo?output=csv&key=ABQIAAAAfG3KxFZXjEslq8VNxMBpKRR08snBovzCxLQZ9DWwpnzxH-ROPxSAS9Q36m-6OOy0qlwTL6Ht9qp87w&q=%3F%3F%3F%3F%3F%3F%3F%3F%3F132%3F
Then it outputs 200,5,59.3266963,18.2733433 (I can't query this through PHP, but through the browser instead).
This address is actually located in Taichung, Taiwan, but it turns out to be in Sweden, Europe. But when I paste the Chinese address (such as 台中市西屯區智惠街131巷56號58號60號) in the URL, the result turns out to be fine!
How do I make sure it sends out the original Chinese address? How do I avoid urlencode()? I found that removing urlencode() doesn't change anything.
(I've changed the MAPS_HOST from maps.google.com to maps.google.com.tw.)
(I'm sure my key is right, and geocoding of other, English addresses works fine.)
q=%3F%3F%3F%3F%3F%3F%3F%3F%3F132%3F
decodes to:
?????????132?
so something has corrupted the string already before URL-encoding. This could happen if you try to convert the Chinese characters into an encoding that doesn't support Chinese characters, such as Latin-1.
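Here is a small Python sketch of that failure mode (Python rather than PHP, purely as an illustration). The address is a made-up one chosen so that it produces the same pattern as the query above; it is not necessarily the asker's actual data.

```python
from urllib.parse import quote, unquote

address = "台中市西屯區智惠街132號"  # illustrative address, an assumption

# Forcing the text into Latin-1 replaces every unsupported character with "?".
lossy = address.encode("latin-1", errors="replace")
# URL-encoding afterwards just escapes the question marks.
bad_query = quote(lossy)
# Encoding the intact string as UTF-8 keeps the characters recoverable.
good_query = quote(address.encode("utf-8"))

if __name__ == "__main__":
    print(lossy)
    print(bad_query)
    print(unquote(good_query) == address)
```

The damage happens at the Latin-1 step, so dropping or keeping the URL-encoding call makes no difference.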
You need to ensure that you're using UTF-8 consistently throughout your application. In particular, you will need to ensure the tables in the database are stored using a UTF-8 character set; in MySQL terms, a UTF-8 collation. Otherwise, the default collation for MySQL is Latin-1. You'll also want to ensure your connection to the database uses UTF-8 by calling `mysql_set_charset('utf8')`.
(I am guessing from your question that you're using PHP.)