Why is "Warning: Implicit string type conversion from AnsiString to UnicodeString" here while both are Strings? - freepascal

Here I get this warning: Warning: Implicit string type conversion from "AnsiString" to "UnicodeString"
....
{$mode DelphiUnicode}
{$H+}
....
Function THeader.ToHtml(Constref input: String): String;
Begin
Result := Format('<h%d>%s</h%d>', [FLevel, Chunk(input), FLevel]); // <--- HERE !
End;
My project settings include -MDelphiUnicode. My Lazarus version is 2.2.2.
As I understand it means that if Chunk() returns symbols outside of ASCII (Unicode), then the Result will be problematic. Right? What to do with this warning? Sure, I can cast the Format() result to String. But why is it required? I see that Format's prototype is:
// somewhere in the sysstrh.inc ...
Function Format (Const Fmt : String; const Args : Array of const) : String;
so it already returns a String (which is magically UnicodeString in my case, as I think). What is the problem actually here? And how to work in the correct way with such library functions like Format() (for instance, GetOptionValue() of TCustomApplication)?
ps. I read FreePascal Wiki about Unicode and String types, but I still cannot understand the reason of this warning :)

There are multiple reasons to do so.
The exact codepage of an AnsiString is under the control of the RTL, which can query the OS for it without the compiler knowing the details. In Lazarus applications it is generally set to UTF-8, but the compiler doesn't know that.
So calling an AnsiString Format() could corrupt strings, and repeated conversions are of course not ideal for performance either.
DelphiUnicode mode is a work in progress, and I would not recommend using it (yet) out of habit, only if you really know what you are doing (and by that I mean knowing the state of it in FPC, not that it works in Delphi).
The original plan was to migrate to UnicodeString fully, but since Windows now allows UTF-8 as a native 1-byte codepage (see the tick box in the Application tab of the project options), progress on that migration is glacial.
In short, consider arranging your code as much as possible so that the string type doesn't matter, and then use UTF-8 AnsiStrings in Lazarus for Unicode.
Or ignore the warnings, or disable them with the -vm option, which lets you suppress specific hints/warnings by message number.

Related

Is everything really a string in TCL?

And what is it, if it isn't?
Everything I've read about TCL states that everything is just a string in it. There can be some other types and structures inside of an interpreter (for performance), but at TCL language level everything must behave just like a string. Or am I wrong?
I'm using an IDE for FPGA programming called Vivado. TCL automation is actively used there. (TCL version is still 8.5, if it helps)
Vivado's TCL scripts rely on some kind of "object oriented" system. Web search doesn't show any traces of this system elsewhere.
In this system objects are usually obtained from internal database with "get_*" commands. I can manipulate properties of these objects with commands like get_property, set_property, report_property, etc.
But these objects seem to be something more than just a string.
I'll try to illustrate:
> set vcu [get_bd_cells /vcu_0]
/vcu_0
> puts "|$vcu|"
|/vcu_0|
> report_property $vcu
Property Type Read-only Value
CLASS string true bd_cell
CONFIG.AXI_DEC_BASE0 string false 0
<...>
> report_property "$vcu"
Property Type Read-only Value
CLASS string true bd_cell
CONFIG.AXI_DEC_BASE0 string false 0
<...>
But:
> report_property "/vcu_0"
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property {/vcu_0}
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property /vcu_0
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> puts |$vcu|
|/vcu_0|
> report_property [string range $vcu 0 end]
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
So, my question is: what exactly is this "valid first class Tcl object"?
Clarification:
This question might seem like asking for help with Vivado scripting, but it is not. (I was even in doubt about adding [vivado] to tags.)
I can just live and script with these mystic objects.
But it would be quite useful (for me, and maybe for others) to better understand their inner workings.
Is this "object system" a dirty hack? Or is it a perfectly valid TCL usage?
If it's valid, where can I read about it?
If it is a hack, how is it (or can it be) implemented? Where exactly does the string end and the object start?
Related:
A part of this answer can be considered as an opinion in favor of the "hack" version, but it is quite shallow in a sense of my question.
A first class Tcl value is a sequence of characters, where those characters are drawn from the Basic Multilingual Plane of the Unicode specification. (We're going to relax that BMP restriction in a future version, but that's not yet in a version we'd recommend for use.) All other values are logically considered to be subtypes of that. For example, binary strings have the characters come from the range [U+000000, U+0000FF], and integers are ASCII digit sequences possibly preceded by a small number of prefixes (e.g., - for a negative number).
In terms of implementation, there's more going on. For example, integers are usually implemented using 64-bit binary values in the endianness that your system uses (but can be expanded to bignums when required) inside a value boxing mechanism, and the string version of the value is generated on demand and cached while the integer value doesn't change. Floating point numbers are IEEE double-precision floats. Lists are internally implemented as an array of values (with smartness for handling allocation). Dictionaries are hash tables with linked lists hanging off each of the hash buckets. And so on. THESE ARE ALL IMPLEMENTATION DETAILS! As a programmer, you can and should typically ignore them totally. What you need to know is that if two values are the same, they will have the same string, and if they have the same string, they are the same in the other interpretation. (Values with different strings can also be equal for other reasons: for example, 0xFF is numerically equal to 255 — hex vs decimal — but they are not string equal. Tcl's true natural equality is string equality.)
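As a rough illustration of the "string version generated on demand and cached" idea described above, here is a minimal C++ sketch. The BoxedInt class and its conversions() counter are hypothetical names invented for this example. It only shows the caching pattern and is not how Tcl's own value type is laid out.

```cpp
#include <string>

// Hypothetical illustration only; this is NOT Tcl's real value structure.
class BoxedInt {
public:
    explicit BoxedInt(long long v) : value_(v) {}

    long long asInt() const { return value_; }

    // The string form is built the first time it is requested, then reused.
    const std::string& asString() const {
        if (!hasString_) {
            cached_ = std::to_string(value_);
            hasString_ = true;
            ++conversions_;
        }
        return cached_;
    }

    // Changing the integer invalidates the cached string.
    void set(long long v) {
        value_ = v;
        hasString_ = false;
    }

    // How many times the string form had to be regenerated.
    int conversions() const { return conversions_; }

private:
    long long value_;
    mutable std::string cached_;
    mutable bool hasString_ = false;
    mutable int conversions_ = 0;
};
```

Asking for asString() twice in a row performs only one conversion. A second conversion happens only after set() changes the value.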
True mutable entities are typically represented as named objects: only the name is a Tcl value. This is how Tcl's procedures, classes, I/O system, etc. all work. You can invoke operations on them, but you can only see inside to a limited extent.
Vivado TCL is not TCL. Vivado does not really document the language they call TCL, but refers you to the real TCL language documentation. Where Vivado TCL and TCL differ, you are left on your own without help. TCL was a poor choice for a scripting language given the very large databases, so they had to bastardize it to get it half functional. You are better off getting help on the Xilinx forums than in general TCL forums. Why they went with TCL rather than Python is beyond anyone's comprehension.

Converting an "HTML entity" emoticon code in UTF16 (in c++)

I'm currently writing my own DrawTextEx() function that supports emoticons. With this function, a callback is called every time an emoticon is found in the text, giving the caller the opportunity to replace the text segment containing the emoticon with an image. For example, the Unicode chars 0x3DD8 0x00DE found in a text will be replaced by a smiling face image while the text is drawn. This function already works fine.
Now I want to implement an image library on the caller side. I receive a text segment like 0x3DD8 0x00DE in my callback function, and my idea is to use this code as a key in a map containing all the Unicode combinations, each one linked to a structure containing the image to draw. I found a good package on the http://emojione.com/developers/ website. All the packages available on this site contain files whose names are hexadecimal codes. So I can iterate through the files contained in the package and build my map automatically.
However I found that these codes are part of another standard, and are in fact a set of items named "HTML entity", apparently used in the web development, as can be seen on the http://graphemica.com/%F0%9F%98%80 website. So, to be able to use these files, I need a solution to convert the HTML entity values contained in their names into a UTF16 code. For example, in the case of the above mentioned smiling face, I need to convert the 0x1f600 HTML entity code to the 0x3DD8 0x00DE UTF16 code.
A brute force approach would be to write a map that converts these codes, adding each of them to my code one by one. But as the Unicode standard contains, in the most optimistic scenario, more than 1800 combinations for the emoticons, I want to know if there is an existing solution, such as a known API or function, that I can use to do the job. Or is there a known trick to do that? (like e.g. "character + ('a' - 'A')" to convert an uppercase char to lower)
Regards
For example, the Unicode chars 0x3DD8 0x00DE found in a text will be replaced by a smiling face image
The character U+1F600 Grinning Face 😀 is represented by the UTF-16 code unit sequence 0xD83D, 0xDE00.
(Graphemica swapping the order of the bytes for each code unit is super misleading; ignore that.)
I found that these codes are part of another standard, and are in fact a set of items named "HTML entity", apparently used in the web development
HTML has nothing to do with it. They're plain Unicode characters—just ones outside the Basic Multilingual Plane, above U+FFFF, which is why it takes more than one UTF-16 code unit to represent them.
HTML numeric character references like &#x1F600; (often incorrectly referred to as entities) are a way of referring to characters by code point number, but the escape string is only effective in an HTML (or XML) document, and we're not in one of those.
So:
I need to convert the 0x1f600 HTML entity code to the 0x3DD8 0x00DE UTF16 code.
sounds more like:
I need to convert representations of U+1F600 Grinning Face: from the code point number 0x1F600 to the UTF-16 code unit sequence 0xD83D, 0xDE00
Which in C# would be:
string face = Char.ConvertFromUtf32(0x1F600); // "😀" aka "\uD83D\uDE00"
or in the other direction:
int codepoint = Char.ConvertToUtf32("\uD83D\uDE00", 0); // 0x1F600
(the name ‘UTF-32’ is poorly-chosen here; we are talking about an integer code point number, not a sequence of four-bytes-per-character.)
Or is there a known trick to do that? (like e.g. "character + ('a' - 'A')" to convert an uppercase char to lower)
In C++ things are more annoying; there's not (that I can think of) anything that directly converts between code points and UTF-16 code units. You could use various encoding functions/libraries to convert between UTF-32-encoded byte sequences and UTF-16 code units, but that can end up more faff than just writing the conversion logic yourself. eg in most basic form for a single character:
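#include <string>

std::wstring fromCodePoint(int codePoint) {
    if (codePoint < 0x10000) {
        return std::wstring(1, static_cast<wchar_t>(codePoint));
    }
    // Split the value above 0xFFFF into a high and a low surrogate.
    // The explicit casts avoid a narrowing error in the braced initializer.
    const wchar_t codeUnits[2] = {
        static_cast<wchar_t>(0xD800 + ((codePoint - 0x10000) >> 10)),
        static_cast<wchar_t>(0xDC00 + ((codePoint - 0x10000) & 0x3FF))
    };
    return std::wstring(codeUnits, 2);
}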
This is assuming the wchar_t type is based on UTF-16 code units, same as C#'s string type is. On Windows this is probably true. Elsewhere it is probably not, but on platforms where wchar_t is based on code points, you can just pull each code point out of the string as a character with no further processing.
(Optimisation and error handling left as an exercise for the reader.)
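For the file-name use case in the question, a possible sketch that avoids the wchar_t size assumption by using char16_t is shown here. The function names are made up for this example, and it assumes each file name holds a single hexadecimal code point such as "1f600". Names that join several code points would need to be split first.

```cpp
#include <stdexcept>
#include <string>

// Parse a hexadecimal code point such as "1f600" (assumed file-name format).
unsigned long parseCodePoint(const std::string& hex) {
    return std::stoul(hex, nullptr, 16);
}

// Encode one code point as UTF-16 code units.
std::u16string toUtf16(unsigned long cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::out_of_range("not a valid Unicode code point");
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
    return out;
}

// Decode a surrogate pair back to its code point (no validation here).
unsigned long fromSurrogates(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<unsigned long>(high) - 0xD800) << 10)
         + (static_cast<unsigned long>(low) - 0xDC00);
}
```

With these helpers, toUtf16(parseCodePoint("1f600")) yields the two code units 0xD83D, 0xDE00, which could then serve as the map key.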
I'm using the RAD Studio compiler, and fortunately it provides an implementation for the ConvertFromUtf32 and ConvertToUtf32 functions mentioned by bobince. I tested them and they do exactly what I needed.
For those who don't use the Embarcadero products, the fromCodePoint() implementation provided by bobince also works well. For information, here is the ConvertFromUtf32() function as implemented in RAD Studio, translated into C++:
std::wstring ConvertFromUtf32(unsigned c)
{
    const unsigned unicodeLastChar = 1114111;
    const wchar_t minHighSurrogate = 0xD800;
    const wchar_t minLowSurrogate = 0xDC00;
    const wchar_t maxLowSurrogate = 0xDFFF;

    // is UTF32 value out of bounds?
    if (c > unicodeLastChar || (c >= minHighSurrogate && c <= maxLowSurrogate))
        throw "Argument out of range - invalid UTF32 value";

    std::wstring result;

    // is UTF32 value a 16 bit value that can fit inside a wchar_t?
    if (c < 0x10000)
        result = wchar_t(c);
    else
    {
        // do divide in 2 chars
        c -= 0x10000;

        // convert code point value to UTF16 string
        result = wchar_t((c / 0x400) + minHighSurrogate);
        result += wchar_t((c % 0x400) + minLowSurrogate);
    }

    return result;
}
Thanks to bobince for his response, which pointed me in the right direction and helped me to solve this problem.
Regards

Regex offset in string

I am currently using a regular expression to find some data in a given string. I wish to find the position of the matching pattern in the string.
Is it possible to find the offset of a Regex in a given string with FreePascal ?
In current versions there are two regex units. One is only in newer versions, but is the most commonly used one (Sorokin's regexpr). The older unit, regex, is faster but more limited, iirc.
I don't use regular expressions much, so I don't have example syntax for you. There is some information here in the wiki http://wiki.freepascal.org/Regexpr though
Of course you could also try to create a header for the perl pcre library. (or recycle a Delphi one)
However, to find the offset of a simple substring, one can use the standard POS() function. There is a replace function too.
Here is an example using the standard RegExpr unit.
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  regexpr;
var
  s: string;
  e: TRegExpr;
begin
  s := 'abcdefg';
  e := TRegExpr.Create;
  try
    e.Expression := '[c-f]+';
    // Exec returns False when nothing matches, so check before reading Match*
    if e.Exec(s) then
    begin
      WriteLn(e.Match[0]);    // cdef
      WriteLn(e.MatchPos[0]); // 3 (1-based offset of the match)
      WriteLn(e.MatchLen[0]); // 4
    end;
  finally
    e.Free;
  end;
  ReadLn;
end.

Type Safe vs Static Typing?

If a language is type-safe, does that mean one could automatically assume that it's statically typed, since you would have to check types at compile time?
C, for example, is statically typed and not type safe, while Haskell is statically typed and type safe. Most (all?) dynamically typed languages are type safe, as they have means of checking types at runtime to make sure they're the right thing. Additionally, these languages assume that because you have chosen to incur the performance penalty of including runtime type information, you would want to use that information as effectively as possible, and so generally do not allow interpreting a chunk of memory as the wrong type.
Dynamically typed languages have an additional measure of type safety, which is coercion. For example, if you type [] + [] in javascript, it will see that the operands to + are arrays and cannot be added directly, and so will convert them both to strings, giving the result of "" (the empty string).
Some languages, like javascript, will usually coerce other things to strings, while PHP for example will coerce strings to numbers to compare them.
EDIT: Type safety means not being allowed to interpret a chunk of memory holding something of type A as something of type B. As an example of type unsafety, C++ has the reinterpret_cast operator, which means "convert anything to anything else even if it doesn't make sense to do so." For example,
float a = 6.2f;
int b = reinterpret_cast<int&>(a); // reinterpret_cast<int>(a) would not even compile
// b now contains the float's raw bits, which are garbage when read as an int
For a much more complete explanation of type safety, see this answer.
I would hesitate to call a dynamic-typed language type-safe, however rigorously it checks types at runtime, because runtime might be too late to do anything about the error!
You could justifiably call such a language strongly typed, but I wouldn't call it type-safe.
Catching the error at compile time gives you a chance to fix it...
For a good example of a type safe language, look at SPARK.
In SPARK, indexing off the end of an array is a type error (each array has a new type for its index, and you have a value that isn't compatible with that type)
You would normally prove there are no such errors before even compiling the program...

What is Type-safe?

What does "type-safe" mean?
Type safety means that the compiler will validate types while compiling, and throw an error if you try to assign the wrong type to a variable.
Some simple examples:
// Fails, Trying to put an integer in a string
String one = 1;
// Also fails.
int foo = "bar";
This also applies to method arguments, since you are passing explicit types to them:
int AddTwoNumbers(int a, int b)
{
return a + b;
}
If I tried to call that using:
int Sum = AddTwoNumbers(5, "5");
The compiler would throw an error, because I am passing a string ("5"), and it is expecting an integer.
In a loosely typed language, such as javascript, I can do the following:
function AddTwoNumbers(a, b)
{
return a + b;
}
if I call it like this:
Sum = AddTwoNumbers(5, "5");
Javascript automatically converts the 5 to a string, and returns "55". This is due to javascript using the + sign for string concatenation. To make it type-aware, you would need to do something like:
function AddTwoNumbers(a, b)
{
return Number(a) + Number(b);
}
Or, possibly:
function AddOnlyTwoNumbers(a, b)
{
if (isNaN(a) || isNaN(b))
return false;
return Number(a) + Number(b);
}
If I call the first, unchecked AddTwoNumbers (the one without Number()) like this:
Sum = AddTwoNumbers(5, " dogs");
Javascript automatically converts the 5 to a string, and appends them, to return "5 dogs".
Not all dynamic languages are as forgiving as javascript (in fact, a dynamic language does not imply a loosely typed language; see Python); some of them will actually give you a runtime error on invalid type casting.
While it's convenient, it opens you up to a lot of errors that can be easily missed, and only identified by testing the running program. Personally, I prefer to have my compiler tell me if I made that mistake.
Now, back to C#...
C# supports a language feature called covariance; this basically means that you can substitute a child type where a base type is expected and not cause an error, for example:
public class Foo : Bar
{
}
Here, I created a new class (Foo) that subclasses Bar. I can now create a method:
void DoSomething(Bar myBar)
And call it using either a Foo, or a Bar as an argument, both will work without causing an error. This works because C# knows that any child class of Bar will implement the interface of Bar.
However, you cannot do the inverse:
void DoSomething(Foo myFoo)
In this situation, I cannot pass Bar to this method, because the compiler does not know that Bar implements Foo's interface. This is because a child class can (and usually will) be much different than the parent class.
Of course, now I've gone way off the deep end and beyond the scope of the original question, but it's all good stuff to know :)
Type-safety should not be confused with static / dynamic typing or strong / weak typing.
A type-safe language is one where the only operations that one can execute on data are the ones that are condoned by the data's type. That is, if your data is of type X and X doesn't support operation y, then the language will not allow you to execute y(X).
This definition doesn't set rules on when this is checked. It can be at compile time (static typing) or at runtime (dynamic typing), typically through exceptions. It can be a bit of both: some statically typed languages allow you to cast data from one type to another, and the validity of casts must be checked at runtime (imagine that you're trying to cast an Object to a Consumer - the compiler has no way of knowing whether it's acceptable or not).
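The Object-to-Consumer cast mentioned above can be sketched in C++ with dynamic_cast, which the compiler accepts but which is only resolved at run time. The Object, Consumer and Producer types below are hypothetical stand-ins invented for this example.

```cpp
#include <string>

// Hypothetical class hierarchy, used only to illustrate a run-time checked cast.
struct Object {
    virtual ~Object() = default;
};

struct Consumer : Object {
    std::string name() const { return "consumer"; }
};

struct Producer : Object {};

// Returns the consumer's name, or an empty string when the cast is refused.
std::string describe(Object& obj) {
    // The compiler cannot know the dynamic type of obj, so the check
    // is deferred to run time; a failed pointer cast yields nullptr.
    if (auto* c = dynamic_cast<Consumer*>(&obj))
        return c->name();
    return "";
}
```

Passing a Consumer yields "consumer", while passing a Producer yields an empty string instead of letting the program treat the object as something it is not.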
Type-safety does not necessarily mean strongly typed, either - some languages are notoriously weakly typed, but still arguably type safe. Take Javascript, for example: its type system is as weak as they come, but still strictly defined. It allows automatic casting of data (say, strings to ints), but within well defined rules. There is to my knowledge no case where a Javascript program will behave in an undefined fashion, and if you're clever enough (I'm not), you should be able to predict what will happen when reading Javascript code.
An example of a type-unsafe programming language is C: reading / writing an array value outside of the array's bounds has an undefined behaviour by specification. It's impossible to predict what will happen. C is a language that has a type system, but is not type safe.
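To make the out-of-bounds point concrete, here is a small C++ sketch (C++ rather than C, so that a checked alternative is available in the standard library). The checkedRead helper is a name invented for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true when index i can be read safely; the value is stored in out.
// Writing v[i] with i >= v.size() would also compile, but its behaviour
// is undefined, which is exactly the unsafety described above.
bool checkedRead(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i); // at() checks the bound and throws std::out_of_range
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

The checked version trades a little speed for a defined outcome: an invalid index is reported instead of reading whatever happens to sit next to the array.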
Type safety is not just a compile time constraint, but a run time constraint. I feel even after all this time, we can add further clarity to this.
There are 2 main issues related to type safety. Memory** and data type (with its corresponding operations).
Memory**
A char typically requires 1 byte per character, or 8 bits (depends on language, Java and C# store unicode chars which require 16 bits).
An int requires 4 bytes, or 32 bits (usually).
Visually:
char: |-|-|-|-|-|-|-|-|
int : |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type safe language does not allow an int to be inserted into a char at run-time (this should throw some kind of class cast or out of memory exception). However, in a type unsafe language, you would overwrite existing data in 3 more adjacent bytes of memory.
int >> char:
|-|-|-|-|-|-|-|-| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?| |?|?|?|?|?|?|?|?|
In the above case, the 3 bytes to the right are overwritten, so any pointers to that memory (say 3 consecutive chars) which expect to get a predictable char value will now have garbage. This causes undefined behavior in your program (or worse, possibly in other programs depending on how the OS allocates memory - very unlikely these days).
** While this first issue is not technically about data type, type safe languages address it inherently and it visually describes the issue to those unaware of how memory allocation "looks".
Data Type
The more subtle and direct type issue is where two data types use the same memory allocation. Take an int vs an unsigned int. Both are 32 bits. (Just as easily could be a char[4] and an int, but the more common issue is uint vs. int).
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
|-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-| |-|-|-|-|-|-|-|-|
A type unsafe language allows the programmer to reference a properly allocated span of 32 bits, but when the value of an unsigned int is read into the space of an int (or vice versa), we again have undefined behavior. Imagine the problems this could cause in a banking program:
"Dude! I overdrafted $30 and now I have $65,506 left!!"
...'course, banking programs use much larger data types. ;) LOL!
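The figure in the joke corresponds to a 16-bit unsigned balance wrapping around. A minimal C++ sketch, assuming a hypothetical 16-bit account type (the function names are invented for this example):

```cpp
#include <cstdint>

// Hypothetical 16-bit balance, chosen only to match the numbers in the joke.
std::uint16_t withdraw(std::uint16_t balance, std::uint16_t amount) {
    // The subtraction is carried out in int and may go negative;
    // converting back to uint16_t wraps modulo 65536.
    return static_cast<std::uint16_t>(balance - amount);
}

// A checked variant that refuses to overdraw instead of wrapping.
bool tryWithdraw(std::uint16_t& balance, std::uint16_t amount) {
    if (amount > balance)
        return false;
    balance = static_cast<std::uint16_t>(balance - amount);
    return true;
}
```

Here withdraw(0, 30) returns 65506, while tryWithdraw leaves the balance untouched and reports failure.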
As others have already pointed out, the next issue is computational operations on types. That has already been sufficiently covered.
Speed vs Safety
Most programmers today never need to worry about such things unless they are using something like C or C++. Both of these languages allow programmers to easily violate type safety at run time (direct memory referencing) despite the compilers' best efforts to minimize the risk. HOWEVER, this is not all bad.
One reason these languages are so computationally fast is they are not burdened by verifying type compatibility during run time operations like, for example, Java. They assume the developer is a good rational being who won't add a string and an int together and for that, the developer is rewarded with speed/efficiency.
Many answers here conflate type-safety with static-typing and dynamic-typing. A dynamically typed language (like smalltalk) can be type-safe as well.
A short answer: a language is considered type-safe if no operation leads to undefined behavior. Many consider the requirement of explicit type conversions necessary for a language to be strictly typed, as automatic conversions can sometimes lead to well defined but unexpected/unintuitive behaviors.
A programming language that is 'type-safe' means the following things:
You can't read from uninitialized variables
You can't index arrays beyond their bounds
You can't perform unchecked type casts
An explanation from a liberal arts major, not a comp sci major:
When people say that a language or language feature is type safe, they mean that the language will help prevent you from, for example, passing something that isn't an integer to some logic that expects an integer.
For example, in C#, I define a function as:
void foo(int arg)
The compiler will then stop me from doing this:
// call foo
foo("hello world")
In other languages, the compiler would not stop me (or there is no compiler...), so the string would be passed to the logic and then probably something bad will happen.
Type safe languages try to catch more at "compile time".
On the down side, with type safe languages, when you have a string like "123" and you want to operate on it like an int, you have to write more code to convert the string to an int, or when you have an int like 123 and want to use it in a message like, "The answer is 123", you have to write more code to convert/cast it to a string.
To get a better understanding do watch the below video which demonstrates code in type safe language (C#) and NOT type safe language ( javascript).
http://www.youtube.com/watch?v=Rlw_njQhkxw
Now for the long text.
Type safety means preventing type errors. A type error occurs when data of one type is UNKNOWINGLY assigned to another type and we get undesirable results.
For instance, JavaScript is NOT a type safe language. In the code below, “num” is a numeric variable and “str” is a string. JavaScript allows me to do “num + str”; now GUESS whether it will do arithmetic or concatenation.
For the code below the result is “55”, but the important point is the confusion about which kind of operation it will perform.
This happens because JavaScript is not a type safe language. It allows one type of data to be set to another type without restrictions.
<script>
var num = 5; // numeric
var str = "5"; // string
var z = num + str; // arithmetic or concat ????
alert(z); // displays “55”
</script>
C# is a type safe language. It does not allow one data type to be assigned to another data type. The code below does not allow the “+” operator on different data types.
Concept:
To put it very simply, type safe means what the name says: it makes sure that the type of a variable stays safe, for example:
no wrong data type, e.g. you can't save or initialize a variable of string type with an integer
out-of-bound indexes are not accessible
only the specific memory location is allowed
So it is all about the safety of the types of your storage in terms of variables.
Type-safe means that programmatically, the type of data for a variable, return value, or argument must fit within a certain criteria.
In practice, this means that 7 (an integer type) is different from "7" (a quoted character of string type).
PHP, Javascript and other dynamic scripting languages are usually weakly-typed, in that they will convert a (string) "7" to an (integer) 7 if you try to add "7" + 3, although sometimes you have to do this explicitly (and Javascript uses the "+" character for concatenation).
C/C++/Java will not understand that, or will concatenate the result into "73" instead. Type-safety prevents these types of bugs in code by making the type requirement explicit.
Type-safety is very useful. The solution to the above "7" + 3 would be to type cast (int) "7" + 3 (equals 10).
Try this explanation on...
TypeSafe means that variables are statically checked for appropriate assignment at compile time. For example, consider a string or an integer. These two different data types cannot be cross-assigned (ie, you can't assign an integer to a string nor can you assign a string to an integer).
For non-typesafe behavior, consider this:
object x = 89;
int y;
if you attempt to do this:
y = x;
the compiler throws an error that says it can't convert a System.Object to an Integer. You need to do that explicitly. One way would be:
y = Convert.ToInt32( x );
The assignment above is not typesafe. A typesafe assignment is where the types can directly be assigned to each other.
Non typesafe collections abound in ASP.NET (eg, the application, session, and viewstate collections). The good news about these collections is that (minimizing multiple server state management considerations) you can put pretty much any data type in any of the three collections. The bad news: because these collections aren't typesafe, you'll need to cast the values appropriately when you fetch them back out.
For example:
Session[ "x" ] = 34;
works fine. But to assign the integer value back, you'll need to:
int i = Convert.ToInt32( Session[ "x" ] );
Read about generics for ways that facility helps you easily implement typesafe collections.
C# is a typesafe language but watch for articles about C# 4.0; interesting dynamic possibilities loom (is it a good thing that C# is essentially getting Option Strict: Off... we'll see).
Type-Safe is code that accesses only the memory locations it is authorized to access, and only in well-defined, allowable ways.
Type-safe code cannot perform an operation on an object that is invalid for that object. The C# and VB.NET language compilers always produce type-safe code, which is verified to be type-safe during JIT compilation.
Type-safe means that the set of values that may be assigned to a program variable must fit well-defined and testable criteria. Type-safe variables lead to more robust programs because the algorithms that manipulate the variables can trust that the variable will only take one of a well-defined set of values. Keeping this trust ensures the integrity and quality of the data and the program.
For many variables, the set of values that may be assigned to a variable is defined at the time the program is written. For example, a variable called "colour" may be allowed to take on the values "red", "green", or "blue" and never any other values. For other variables those criteria may change at run-time. For example, a variable called "colour" may only be allowed to take on values in the "name" column of a "Colours" table in a relational database, where "red", "green", and "blue" are three values for "name" in the "Colours" table, but some other part of the computer program may be able to add to that list while the program is running, and the variable can take on the new values after they are added to the Colours table.
Many type-safe languages give the illusion of "type-safety" by insisting on strictly defining types for variables and only allowing a variable to be assigned values of the same "type". There are a couple of problems with this approach. For example, a program may have a variable "yearOfBirth" which is the year a person was born, and it is tempting to type-cast it as a short integer. However, it is not a short integer. This year, it is a number that is less than 2009 and greater than -10000. However, this set grows by 1 every year as the program runs. Making this a "short int" is not adequate. What is needed to make this variable type-safe is a run-time validation function that ensures that the number is always greater than -10000 and less than the next calendar year. There is no compiler that can enforce such criteria because these criteria are always unique characteristics of the problem domain.
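A run-time validation function of the kind described above might look like the following C++ sketch. The YearOfBirth class is a hypothetical wrapper invented for this example; the bounds are the ones given in the paragraph, and the current year is passed in as a parameter so that the check stays testable.

```cpp
#include <stdexcept>

// Hypothetical wrapper: a year of birth that is validated when constructed.
class YearOfBirth {
public:
    // currentYear is supplied by the caller (an assumption of this sketch).
    YearOfBirth(int year, int currentYear)
        : year_(checked(year, currentYear)) {}

    int value() const { return year_; }

private:
    // Accepts years greater than -10000 and less than the next calendar year.
    static int checked(int year, int currentYear) {
        if (year <= -10000 || year > currentYear)
            throw std::out_of_range("year of birth outside the allowed range");
        return year;
    }

    int year_;
};
```

A value such as YearOfBirth(1980, 2009) is accepted, while YearOfBirth(2010, 2009) throws, so code that holds a YearOfBirth can rely on the range having been checked.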
Languages that use dynamic typing (or duck-typing, or manifest typing) such as Perl, Python, Ruby, SQLite, and Lua don't have the notion of typed variables. This forces the programmer to write a run-time validation routine for every variable to ensure that it is correct, or endure the consequences of unexplained run-time exceptions. In my experience, programmers in statically typed languages such as C, C++, Java, and C# are often lulled into thinking that statically defined types is all they need to do to get the benefits of type-safety. This is simply not true for many useful computer programs, and it is hard to predict if it is true for any particular computer program.
The long & the short.... Do you want type-safety? If so, then write run-time functions to ensure that when a variable is assigned a value, it conforms to well-defined criteria. The down-side is that it makes domain analysis really difficult for most computer programs because you have to explicitly define the criteria for each program variable.
Type Safety
In modern C++, type safety is very important. Type safety means that you use the types correctly and, therefore, avoid unsafe casts and unions. Every object in C++ is used according to its type and an object needs to be initialized before its use.
Safe Initialization: {}
The compiler protects from information loss during type conversion. For example,
int a{7};   // OK
int b{7.5}; // compiler error: narrowing conversion would lose information
Unsafe Initialization: = or ()
The compiler doesn't protect from information loss during type conversion.
int a = 7;   // OK
int b = 7.5; // compiles, but information is lost: b becomes 7
int c(7);    // OK
int d(7.5);  // compiles, but information is lost: d becomes 7