What are the differences between Null, Zero and Blank in SQL? - mysql

Can someone please explain the differences between Null, Zero and Blank in SQL to me?

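Zero is a number value. It is a definite value with precise mathematical properties. (You can do arithmetic on it ...)
NULL means the absence of any value. You can't do anything useful with it except test for it (with IS NULL or IS NOT NULL). Arithmetic and ordinary comparisons (=, <, ...) involving NULL produce NULL.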
Blank is ill-defined. It means different things in different contexts to different people. For example:
AFAIK, there is no SQL or MySQL specific technical meaning for "blank". (Try searching the online MySQL manual ...)
For some people "blank" could mean a zero length string value: i.e. one with no characters in it ('').
For some people "blank" could mean a non-zero length string value consisting of only non-printing characters (SPACE, TAB, etc). Or maybe consisting of just a single SPACE character.
In some contexts (where character and string are different types), some people could use "blank" to mean a non-printing character value.
Some people could even use "blank" to mean "anything that doesn't show up when you print or display it".
And then there are meanings that are specific to (for example) ORM mappings.
The point is that "blank" does not have a single well-defined meaning. At least not in (native) English IT terminology. It is probably best to avoid it ... if you want other IT professionals to understand what you mean. (And if someone else uses the term and it is not obvious from the context, ask them to say precisely what they mean!)
We cannot say anything generally meaningful about how ZERO / NULL / BLANK are represented, how much memory they occupy or anything like that. All we can say is that they are represented differently from each other, and that the actual representation is implementation- and context-dependent.
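The behavioural differences (as opposed to the representation) can be shown with a single query. This is a sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL; that substitution is an assumption, although the NULL rules shown are standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    """
    SELECT
        0 + 1,         -- zero takes part in arithmetic
        NULL + 1,      -- arithmetic with NULL yields NULL
        NULL = NULL,   -- comparing with NULL is unknown (NULL), not true
        NULL IS NULL,  -- IS NULL is the reliable test: true (1)
        '' = '',       -- an empty string is an ordinary value: true (1)
        LENGTH(''),    -- of length 0
        '' IS NULL     -- and it is not NULL: false (0)
    """
).fetchone()
print(row)  # (1, None, None, 1, 1, 0, 0)
```

Python shows SQL NULL as None, so the second and third columns are the ones where NULL "swallowed" the expression.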

You can relate the NULL, BLANK and ZERO cases to a childbirth scenario (a real-life example):
NULL case: the child is not born yet.
BLANK case: the child is born, but we haven't given him a name.
ZERO case: the child is born and his age is defined as zero.
See how this data will look in a database table:
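A minimal sketch of such a table, again using Python's sqlite3 as a stand-in for MySQL. The `children` table, its columns and the sample rows are hypothetical; the point is that each case needs a different predicate to find it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO children (id, name, age) VALUES (?, ?, ?)",
    [
        (1, None, None),  # NULL case: not born yet, nothing is known
        (2, "", 3),       # BLANK case: born, but the name is an empty string
        (3, "Sam", 0),    # ZERO case: born, and the age is the number 0
    ],
)

def ids(where):
    # `where` is trusted demo input here, never user input.
    sql = f"SELECT id FROM children WHERE {where} ORDER BY id"
    return [r[0] for r in conn.execute(sql)]

print(ids("name IS NULL"))  # [1]
print(ids("name = ''"))     # [2]
print(ids("age = 0"))       # [3]
print(ids("name = NULL"))   # []  ('= NULL' is never true)
print(conn.execute("SELECT COUNT(*), COUNT(name) FROM children").fetchone())  # (3, 2)
```

Note that COUNT(name) skips the NULL row but counts the empty-string row.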
Also, NULL is the absence of a value: in many storage engines a NULL field needs no space for the value itself (only a null flag), whereas an empty field holds an actual empty value that does take up storage.

Could you be more accurate about blank?
From what I understand of your question:
"Blank" is the lack of a value. This is a human concept. In SQL, a field has to contain something anyway, so there is a special marker that means "no value has been set for this field": NULL.
If Blank is "", then it is a string, an empty one.
Zero: well, Zero is 0 ... It is a number.
To sum up:
NULL --> no value set
Blank ("") --> empty string
Zero --> Number equal to 0
Please try to be more precise next time you post a question on Stack Overflow!
If I were you, I would check some resources about it, for example:
https://www.tutorialspoint.com/sql/sql-null-values.htm

NULL means the field has no value at all, not even a garbage value.
ZERO is an integer value.
BLANK is simply an empty string value.

Related

How can I determine which regular expressions from a list possibly overlap

I have a table of regular expressions that are in an MySQL table that I match text against.
Is there a way, using MySQL or any other language (preferably Perl), to take this list of expressions and determine which of them MAY overlap? This should be independent of whatever text may be supplied to the expressions.
All of the expressions have anchors.
Here is an example of what I am trying to get:
Expressions:
^a$
^b$
^ab
^b.*c
^batch
^catch
Result:
'^b.*c' and '^batch' MAY overlap
Thoughts?
Thanks,
Scott
Further explanation:
I have a list of user-created regexes and an imported list of strings that are to be matched against the regexes. In this case the strings are "clean" data (i.e., they are not user-created but imported from another source, and they must not change).
When a user adds to the list of regexes, I do not want any collisions with either the existing list of strings or any future strings (which cannot be guessed ahead of time; the only constraints are that they consist of printable ASCII characters and are no longer than 255 characters).
A brute-force method would be to create a "rainbow" table of all of the permutations of strings and each time a regex is added run all of the regexes against the rainbow table. However I'd like to avoid this (I'm not even sure of the cost) and so was wondering aloud as to the possibility of an algorithm that would AT LEAST show which regexes in a list MAY collide.
I will punt on full REs. Even limiting to BREs and/or MySQL-pre-8.0 will be challenging. Here are some thoughts.
If a pattern is end-anchored and contains no + or *, then calculate its length. The fixed length can be used as a discriminator. It could also be used to tone down the "brute force" approach, perhaps by an order of magnitude.
Anything followed by + or * gets turned into .* for simplicity. (Re the "may collide" rule.)
Any RE with explicit characters (including those followed by +) becomes a discriminator in some situations. E.g., ^a.*b$ vs ^a.*c$.
For those anchored at the end, reverse the pattern and test it that way. (I don't know how difficult reversing is.)
If you can say that a particular character must be at a given position, then use it as a discriminator: ^a.b.*c$ -- a in pos 1; b in pos 3; c at end. Perhaps this can be extended to character classes: ^\w may match, but ^\d and ^a.*\d$ can't.
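A minimal sketch of the "may collide" idea, in Python rather than Perl. It treats each pattern as a tiny automaton and does a breadth-first search over the product of two automata for a string both accept. Assumptions: only literals, '.', '^', '$' and the quantifiers * + ? are supported; every quantified atom is widened to '.*' as suggested above; the 255-character length limit is ignored; the alphabet is printable ASCII. Because of the widening, a reported pair only MAY overlap, while a pair with no witness is definitely disjoint.

```python
from collections import deque
from itertools import combinations

# Assumption from the question: subject strings are printable ASCII.
ALPHABET = [chr(code) for code in range(32, 127)]
ANY = object()   # '.'  : exactly one arbitrary character
STAR = object()  # '.*' : zero or more arbitrary characters


def parse(pattern):
    """Turn a restricted regex into a list of items: a literal char, ANY or STAR.

    Quantified atoms (X*, X+, X?) are widened to '.*', a safe over-approximation
    for a "MAY overlap" test. Classes, groups, | and escapes are rejected.
    """
    anchored = pattern.startswith("^")
    body = pattern[1:] if anchored else pattern
    end_anchored = body.endswith("$")
    if end_anchored:
        body = body[:-1]
    items = [] if anchored else [STAR]    # no ^ : anything may precede
    for ch in body:
        if ch in "*+?":
            if not items:
                raise ValueError(f"nothing to repeat in {pattern!r}")
            items[-1] = STAR              # X*, X+, X? each match a subset of .*
        elif ch == ".":
            items.append(ANY)
        elif ch in "\\[](){}|^$":
            raise ValueError(f"unsupported syntax {ch!r} in {pattern!r}")
        else:
            items.append(ch)
    if not end_anchored:
        items.append(STAR)                # no $ : anything may follow
    return items


def advance(items, state, ch):
    """States reachable from `state` by consuming the character `ch`."""
    if state == len(items):
        return []
    item = items[state]
    if item is STAR:
        return [state]                    # '.*' loops on any character
    if item is ANY or item == ch:
        return [state + 1]
    return []


def overlap_witness(p1, p2):
    """Return a string both (widened) patterns match, or None if they are disjoint."""
    a, b = parse(p1), parse(p2)
    parent = {(0, 0): None}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        if i == len(a) and j == len(b):   # both automata accept
            chars, node = [], (i, j)
            while parent[node] is not None:
                node, ch = parent[node]
                chars.append(ch)
            return "".join(reversed(chars))
        moves = []
        if i < len(a) and a[i] is STAR:   # '.*' may match the empty string
            moves.append(((i + 1, j), ""))
        if j < len(b) and b[j] is STAR:
            moves.append(((i, j + 1), ""))
        for ch in ALPHABET:               # both must accept the same character
            for ni in advance(a, i, ch):
                for nj in advance(b, j, ch):
                    moves.append(((ni, nj), ch))
        for nxt, ch in moves:
            if nxt not in parent:
                parent[nxt] = ((i, j), ch)
                queue.append(nxt)
    return None


def may_overlap(patterns):
    """Every pair of patterns that MAY match a common string."""
    return [(p, q) for p, q in combinations(patterns, 2)
            if overlap_witness(p, q) is not None]


print(may_overlap(["^a$", "^b$", "^ab", "^b.*c", "^batch", "^catch"]))
# [('^b.*c', '^batch')]
```

On the question's example this reproduces the expected result, and overlap_witness also returns a concrete string that both patterns accept, which is handy when explaining a rejection to the user.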

Rounding with Access

With Microsoft Access 2010, I have two Single fields:
A = 1.1
B = 2.1
I create a query where I have defined C=A*B
Microsoft Access says that C = 2.30999994277954
but in reality C = 2.31
How can I get the right result (2.31)?
Slightly off results from operations on decimal values can happen if your numeric field size is Single or Double rather than Decimal. Single and Double (floating-point) numbers are binary approximations of the "true" decimal numbers: very close, but they should not be relied upon if exact results are required. A related Stack Overflow question has more information about this issue: Access comparing floating-point numbers "incorrectly"
If it's possible to modify the underlying table's design, you should change the field size property for the "A" and "B" fields from single to decimal. After changing the field size BUT BEFORE saving the table, you will also need to adjust the Scale property for "A" and "B" from 0 to whatever number of places to the right of the decimal point you might require. You will likely still have a notice about losing data, but if you adjust the field properties correctly before saving the table, this shouldn't be a problem. You should probably make a copy of the table before doing this so that you can verify that there was no data loss. After saving your table and verifying the changes did not result in data loss, your query should represent A * B accurately.
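The effect can be reproduced outside Access. This Python sketch assumes Single is a 4-byte IEEE 754 single-precision value; it rounds numbers to that precision with the standard struct module, then repeats the multiplication with decimal.Decimal, which plays the role of the Decimal field size:

```python
import struct
from decimal import Decimal

def to_single(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = to_single(1.1)    # actually stored as 1.10000002384185791015625
b = to_single(2.1)    # actually stored as 2.099999904632568359375
c = to_single(a * b)  # product rounded back to single precision
print(f"{c:.15g}")    # 2.30999994277954, the value Access shows

# Decimal arithmetic works in base 10, so 1.1 and 2.1 are exact.
print(Decimal("1.1") * Decimal("2.1"))  # 2.31
```

The error is already present in the stored inputs, before any multiplication happens, which is why changing the field size (rather than rounding the query result) is the real fix.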

Truncates Long Text/Memo string to 255 characters when it is a primary key field or "Indexed: Yes (no-duplicates) allowed"

I created a table in MS Access 2013 with only one column, of type "Long Text" (formerly called Memo), and made it the primary key of the table. I stored a long string of more than 255 characters, and then I tried to store another string whose first 255 characters were the same as the stored string but whose remaining characters were different; MS Access gave a "duplicate data" error. I changed the characters after the 255th position in the new string, trying different combinations, and all of them gave the error. But when I change any character within the first 255 characters, there is no error. So I concluded that MS Access checks only the first 255 characters of a "Long Text" column when checking for duplicates. Is that right? What else could be the reason?
Stored string (256 characters):
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelectr
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect1
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect2
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect123
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec1
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec2
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec3
Please notice the difference in the last few characters of the samples above. The first stored string has 256 characters. Even if the column is not the primary key, the problem remains the same when its Indexed property is set to "Yes (No Duplicates)" in the table design.
As @HansUp stated in the comments, Access (specifically the Jet/ACE database engine) uses only the first 255 characters of a Memo/Long Text field to build its index. Hence, it uses only the first 255 characters to enforce No Duplicates.
@HansUp's advice to use a different database engine that provides better support for long strings and full-text search is probably the best approach, but I understand there are often other considerations that may limit you to solving your problem in Access.
As such, here is an Access-only approach to solving your problem. This assumes the requirement you listed in the comments is valid; i.e., you need to store unique strings of between 400 and 1000 characters.
Alternative 1:
Keep your initial Memo/Long Text field: Notes
Create four text fields (not Memo/Long Text) of 250 characters max: Notes1, Notes2, Notes3, Notes4
Set all four text fields: Required -> True and Allow Zero Length -> True (this is required to ensure the unique index is enforced for strings less than 751 characters)
Create a unique index and add all four text fields to that index
Don't ignore nulls in your index
When you store the values, you will need to store them in the Notes field and also split the string among the four smaller NotesX fields
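The splitting step could look like this Python sketch, assuming the four 250-character fields described above; the helper name `split_notes` is hypothetical. Unused parts become empty strings rather than None, matching the Required / Allow Zero Length settings:

```python
FIELD_SIZE = 250   # max length of each NotesX text field
FIELD_COUNT = 4    # Notes1..Notes4

def split_notes(text):
    """Split text into four 250-character chunks for Notes1..Notes4.

    Unused chunks are '' (not None) so the unique index still compares them.
    """
    if len(text) > FIELD_SIZE * FIELD_COUNT:
        raise ValueError("text longer than 1000 characters will not fit")
    return [text[i * FIELD_SIZE:(i + 1) * FIELD_SIZE] for i in range(FIELD_COUNT)]

parts = split_notes("x" * 600)
print([len(p) for p in parts])  # [250, 250, 100, 0]
```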
Alternative 2:
Keep your current setup and enforce the uniqueness at code level. Every time you update or insert a note, do a search on all notes that match the first 255 characters, read the value and perform the comparison in code.
Alternative 3 (thanks to @HansUp for suggesting this in the comments):
Keep your initial Memo/Long Text field: Notes
Create a text field large enough to hold the hash as a hexadecimal string (64 characters for a 256-bit hash, 128 characters for a 512-bit hash): NotesHash
Add a unique index to your NotesHash field
Every time the memo field is changed, re-compute the hash value and attempt to store it in the table
Notes for this method:
As the pigeonhole principle easily proves, there is the possibility that two different strings will generate the same hash (a collision). However, using a good hashing algorithm will make the actual probability approach zero.
This site offers some VB6/VBA/VBScript implementations of various hashing algorithms. I can't vouch for their correctness, but they passed the eye test for me. Use at your own risk, but it's at least a good starting point.
Really, you can use any deterministic function that returns a string of 255 characters or fewer given an arbitrarily large input. The difference between a crappy hash algorithm and a good one is how well it minimizes collisions. For that reason, I would suggest you use one based on a popular standard.
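As an illustration of the NotesHash idea, here is a Python sketch standing in for the VBA you would actually run in Access. SHA-256 from the standard hashlib module yields a 64-character hex string, well within a 255-character Text field. The helper name `notes_hash` and the UTF-8 encoding are assumptions; whatever encoding you choose must be used consistently. Two strings that share their first 255 characters, like those in the question, get different hashes:

```python
import hashlib

def notes_hash(text):
    """Hex SHA-256 of the note text (64 characters) for the NotesHash column."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

prefix = "L" * 255            # identical first 255 characters
h1 = notes_hash(prefix + "r")
h2 = notes_hash(prefix + "1")
print(len(h1), h1 != h2)      # 64 True
```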
And yes, I still highly recommend @HansUp's solution to simply use a different db engine.

A name for a template-matching parameter

In my template-matching code I need the user to pass a floating-point parameter, which specifies whether the algorithm should concentrate only on the best matches (thus work faster) or analyse even low-probability areas (making it slower).
The parameter is linear and normalized so that possible values are in the range [0, 1]. It doesn't matter whether the number of resulting matches increases or decreases as the parameter grows, as this can easily be changed. The default value is around 0.5. At one end of the range the algorithm should return only one match (and work fast), whereas the other end should mean lots of possible matches and a long processing time.
What name should I choose for this parameter such that it makes sense to the end-user? I've been thinking about MatchingQuality or MatchingDepth but neither seems appropriate and self-explanatory.
I would probably call it matchingAccuracy, matchingPrecision or something like that.
How about MatchThresholdCoefficient?
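To show why a threshold-style name reads naturally, here is a hypothetical Python sketch; the function `select_matches` and its scoring model are assumptions, not taken from the question. Scores are normalized to [0, 1] and the parameter is a coefficient on the best score, so 1.0 keeps only the best match(es) and 0.0 keeps every candidate:

```python
def select_matches(scores, match_threshold=0.5):
    """Keep candidates scoring at least match_threshold * best score.

    match_threshold = 1.0 -> only the best match(es)
    match_threshold = 0.0 -> every candidate, including low-probability ones
    """
    if not 0.0 <= match_threshold <= 1.0:
        raise ValueError("match_threshold must be in [0, 1]")
    if not scores:
        return []
    cutoff = max(scores) * match_threshold
    return sorted((s for s in scores if s >= cutoff), reverse=True)

print(select_matches([0.9, 0.5, 0.2], 1.0))  # [0.9]
print(select_matches([0.9, 0.5, 0.2], 0.5))  # [0.9, 0.5]
print(select_matches([0.9, 0.5, 0.2], 0.0))  # [0.9, 0.5, 0.2]
```

This sketch only filters a finished list; in a real matcher the speed-up would come from using the cutoff to prune the search early.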

What is a 'value' in the context of programming?

Can you suggest a precise definition for a 'value' within the context of programming without reference to specific encoding techniques or particular languages or architectures?
[Previous question text, for discussion reference: "What is value in programming? How to define this word precisely?"]
I just happened to be glancing through Pierce's "Types and Programming Languages" - he slips a reasonably precise definition of "value" in a programming context into the text:
[...] defines a subset of terms, called values, that are possible final results of evaluation
This seems like a rather tidy definition - i.e., we take the set of all possible terms, and the ones that can possibly be left over after all evaluation has taken place are values.
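A tiny illustration of that definition, as a Python sketch: a hypothetical term language with only numbers and addition, evaluated one small step at a time until no rule applies. Numbers are the values; an `Add` term can never be a final result:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Add:
    left: "Term"
    right: "Term"

Term = Union[Num, Add]

def is_value(t):
    """Values are the terms evaluation can end in: here, only numerals."""
    return isinstance(t, Num)

def step(t):
    """One small-step reduction of a non-value term."""
    if isinstance(t, Add):
        if not is_value(t.left):
            return Add(step(t.left), t.right)
        if not is_value(t.right):
            return Add(t.left, step(t.right))
        return Num(t.left.n + t.right.n)
    raise ValueError("values do not step")

def evaluate(t):
    """Keep stepping until a value is left over; that is the final result."""
    while not is_value(t):
        t = step(t)
    return t

term = Add(Num(1), Add(Num(2), Num(3)))
print(evaluate(term))  # Num(n=6)
```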
Based on the ongoing comments about "bits" being an unacceptable definition, I think this one is a little better (although possibly still flawed):
A value is anything representable on a piece of possibly-infinite Turing machine tape.
Edit: I'm refining this some more.
A value is a member of the set of possible interpretations of any possibly-infinite sequence of symbols.
That is equivalent to the earlier definition based on Turing machine tape, but it actually generalises better.
Here, I'll take a shot: A value is a piece of stored information (in the information-theoretical sense) that can be manipulated by the computer.
(I won't say that a value has meaning; a random number in a register may have no meaning, but it's still a value.)
In short, a value is what is assigned to a variable (the object that contains the value).
For example: type = boolean; name = help; variable = a storage location; value = what is stored in that location.
Further breakdown:
X = 2; where X is a variable while 2 is the value stored in X.
Have you checked the article in wikipedia?
In computer science, a value is a sequence of bits that is interpreted according to some data type. It is possible for the same sequence of bits to have different values, depending on the type used to interpret its meaning. For instance, the value could be an integer or floating point value, or a string.
Read the Wiki
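The point about interpretation is easy to check. In this Python sketch, the same four bytes (produced with the standard struct module) are read once as a little-endian 32-bit unsigned integer and once as a 32-bit float:

```python
import struct

raw = struct.pack("<f", 1.0)          # the 4 bytes of the float 1.0
as_int = struct.unpack("<I", raw)[0]  # same bits read as an unsigned integer
as_float = struct.unpack("<f", raw)[0]
print(raw.hex(), as_int, as_float)    # 0000803f 1065353216 1.0
```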
Value: what we call the "contents" stored in a variable.
Variables: containers for storing data values.
Example: think of a folder named "Movies" (the variable). Inside it are its contents, namely Pirates of the Caribbean, Fantastic Beasts and La La Land; these are what we call its values.