Given a GUID, if I shorten it, can we assume it will still be unique? - language-agnostic

Given that GUIDs are (more or less) unique, what happens if we shorten one with some code? The code basically just converts a GUID into a Base64 string and trims it a bit.
It takes a standard GUID like this:
c9a646d3-9c61-4cb7-bfcd-ee2522c8f633
And converts it into this smaller string:
00amyWGct0y_ze4lIsj2Mw
Can I now assume that the shortened GUID is just as unique as its previous (normal) form?

This is a reversible transform: you can get the original GUID back with an inverse function. That means it is exactly as "unique"; there is a different shortened string for every GUID. The final substring step in the encode function removes the Base64 padding characters ==. This loses no information, because every GUID is the same length and therefore every GUID has the same padding. The decode function re-appends "==" before passing the string to the Base64 decoder.
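As a minimal sketch of that round trip in Go (the question is language-agnostic; the helper names shorten/restore are illustrative, and the byte layout assumes .NET's mixed-endian Guid.ToByteArray() order, which is what produces the example output above):

package main

import (
    "encoding/base64"
    "fmt"
)

// shorten encodes the GUID's 16 raw bytes as unpadded, URL-safe Base64,
// yielding a fixed 22-character string.
func shorten(guid [16]byte) string {
    return base64.RawURLEncoding.EncodeToString(guid[:])
}

// restore inverts shorten, recovering the original 16 bytes.
func restore(short string) ([16]byte, error) {
    var guid [16]byte
    b, err := base64.RawURLEncoding.DecodeString(short)
    if err != nil {
        return guid, err
    }
    if len(b) != len(guid) {
        return guid, fmt.Errorf("expected 16 bytes, got %d", len(b))
    }
    copy(guid[:], b)
    return guid, nil
}

func main() {
    // Raw bytes of c9a646d3-9c61-4cb7-bfcd-ee2522c8f633 in the mixed-endian
    // layout .NET's Guid.ToByteArray() produces (which is why the encoded
    // form doesn't visually resemble the hex form).
    g := [16]byte{
        0xd3, 0x46, 0xa6, 0xc9, 0x61, 0x9c, 0xb7, 0x4c,
        0xbf, 0xcd, 0xee, 0x25, 0x22, 0xc8, 0xf6, 0x33,
    }
    s := shorten(g)
    fmt.Println(s) // 00amyWGct0y_ze4lIsj2Mw
    g2, err := restore(s)
    fmt.Println(err == nil && g2 == g) // true: nothing was lost
}

Since the transform is a bijection on 16-byte values, any collision among the short strings would imply a collision among the original GUIDs.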

Related

Alternative ways to extract the contents of a JSON string

Consider the following query:
select '"{\"foo\":\"bar\"}"'::json;
This will return a single record of a single element containing a JSON string. See:
test=# select json_typeof('"{\"foo\":\"bar\"}"'::json);
 json_typeof
-------------
 string
(1 row)
It is possible to extract the contents of the string as follows:
=# select ('"{\"foo\":\"bar\"}"'::json) #>>'{}';
     json
---------------
 {"foo":"bar"}
(1 row)
From this point onward, the result can be cast as a JSON object:
test=# select json_typeof((('"{\"foo\":\"bar\"}"'::json) #>>'{}')::json);
 json_typeof
-------------
 object
(1 row)
This way seems magical.
I define no path within the extraction operator, yet what is returned is not what I passed. This seems like passing no index to an array accessor, and getting an element back.
I worry that I will confuse the next maintainer to look at this logic.
Is there a less magical way to do this?
But you did define a path. The empty path '{}' addresses the root element; "root" is just another path. And that is exactly what the #>> operator is for:
Extracts JSON sub-object at the specified path as text.
Rendering as text effectively resolves the escape sequences in the string. When casting back to json, the special meaning of the double-quotes (no longer escaped) kicks in. Nothing magic there, and there is no better way to do it.
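For what it's worth, the same two-step decoding can be done outside the database. Here is a purely illustrative Go sketch, where the first json.Unmarshal plays the role of #>> '{}' (resolving the escapes in the string) and the second parses the result:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    raw := []byte(`"{\"foo\":\"bar\"}"`)
    // First unmarshal: the payload is a JSON *string*...
    var inner string
    if err := json.Unmarshal(raw, &inner); err != nil {
        panic(err)
    }
    // ...whose contents are themselves JSON. Second unmarshal parses them.
    var obj map[string]string
    if err := json.Unmarshal([]byte(inner), &obj); err != nil {
        panic(err)
    }
    fmt.Println(obj["foo"]) // bar
}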
If you expect it to confuse future maintainers, add comments explaining what you are doing there.
Maybe, in the spirit of clarity, you might use the equivalent function json_extract_path_text() instead. The manual:
Extracts JSON sub-object at the specified path as text. (This is functionally equivalent to the #>> operator.)
Now, the function has a VARIADIC parameter, and you typically enter path elements one by one, as the example in the manual demonstrates:
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}',
'f4', 'f6') → foo
You cannot enter the "root" path this way. But (something the manual does not mention at this point) you can alternatively provide an actual array after the keyword VARIADIC. See:
Pass multiple values in single parameter
So this does the trick:
SELECT json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}')::json;
And since we are all about being explicit, use the verbose SQL standard cast syntax:
SELECT cast(json_extract_path_text('"{\"foo\":\"bar\"}"'::json, VARIADIC '{}') AS json);
Any clearer yet? (I would personally prefer your shorter original, but I may be biased, being a "native speaker" of Postgres...)
The real question is why you have that odd JSON literal, with escapes, as a JSON string in the first place.

Deal with long numbers in scientific notation in a JSON string - FreeMarker

I have a JSON string that contains a long number, but in scientific notation (like 1.559101974041E12 instead of 1559101974041). Because of this, I am not able to parse it using ?eval, as this value would have to be in double quotes in order to get parsed.
One solution I considered was putting double quotes around such numbers with a regex and then evaluating, after which some FreeMarker method could convert the value into a long. But this approach is very risky and could alter other values as well.
I'm not sure how your template looks, but if you have a variable s that contains the string "1.559101974041E12" (the quotation marks aren't part of the string value itself), then you can parse it with s?number. s?eval doesn't work because scientific notation is not part of the FreeMarker syntax (but ?number can parse more formats).
If you re-print the number in the template, note that depending on locale and configuration settings, it might look like 1,559,101,974,041. You can prevent that with ?c (for example, ${s?number?c}), in which case it will always look like 1559101974041.
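FreeMarker aside, the conversion itself is ordinary parse-then-truncate logic. A purely illustrative Go sketch (the variable names are made up):

package main

import (
    "fmt"
    "strconv"
)

func main() {
    s := "1.559101974041E12"
    // Parse the scientific-notation literal as a float...
    f, err := strconv.ParseFloat(s, 64)
    if err != nil {
        panic(err)
    }
    // ...then take the integral value. 1.559101974041E12 is well below
    // 2^53, so it is exactly representable in a float64 and nothing is lost.
    n := int64(f)
    fmt.Println(n) // 1559101974041
}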

Char.IsDigit vs NumberFormat.NativeDigits

I'm working on optimizing a JSON parser that I built in Visual Basic .NET. I do not follow the EBNF verbatim; for example, with numbers I match an optional positive sign, and for boolean/null values I don't limit matches to lower case. However, I have a question about whether I should use Char.IsDigit or NumberFormat.NativeDigits for matching the digits in a number.
Currently I use Char.IsDigit, because I'm iterating through each character in the source and it is easy to test the current Char value directly. However, since I'm already using the NumberFormat class to check for the optional positive/negative signs, I was wondering whether there is any benefit to checking if the current character is in the NativeDigits collection.
The downside I can see is that, because I am iterating through each Char in the String, I would have to convert each Char to a String to check whether it appears in the NativeDigits collection; and since Strings are immutable in .NET, I try to create as few String instances as possible.

Why is the format string of a struct field always lower case?

When encoding/decoding structs with JSON in Go, almost all of the code out there uses the same field name, but with the initial letter in lower case. Why is this?
Since the names are the same, and JSON certainly works with any case, why add this duplicate tag:
Name string `json:"name"`
Why not just use Name string? In the other case, adding the format string makes sense because the JSON name differs from the Go field name:
Name string `json:"MyName"`
The encoding/json documentation says:
The encoding of each struct field can be customized by the format string stored under the "json" key in the struct field's tag. The format string gives the name of the field, possibly followed by a comma-separated list of options. The name may be empty in order to specify options without overriding the default field name.
Applications specify a lowercase name in the tag to produce a lowercase name in the JSON.
This struct
type Example struct {
    Name1 string
    Name2 string `json:"name1"`
}
encodes as:
{
  "Name1": "1",
  "name1": "2"
}
playground example
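For reference, here is a self-contained, runnable version of that example (the field values "1" and "2" are chosen to match the output above):

package main

import (
    "encoding/json"
    "fmt"
)

type Example struct {
    Name1 string                // no tag: the Go field name is used as-is
    Name2 string `json:"name1"` // tag overrides the name in the JSON
}

func main() {
    b, _ := json.MarshalIndent(Example{Name1: "1", Name2: "2"}, "", "  ")
    fmt.Println(string(b))
    // {
    //   "Name1": "1",
    //   "name1": "2"
    // }
}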
JSON only requires that field names be valid strings. Lowercase names are not required in JSON. That said, it is very common practice to start field names with a lowercase letter in JSON.
Name string `json:"name" db:"SomeName"`
Keep in mind that a tag such as `json:"name" db:"SomeName"` is used to adjust de/serialization, whether to JSON or to a database.
The naming depends on the output: if the database column is SomeName, then you must define db:"SomeName".
As for why almost all applications use lower case: if you encounter source code whose JSON output uses only lower-case names, that is simply to keep the output consistent.
Note that case also has a separate effect on the Go field itself: a lower-case field name acts as a private (unexported) variable, while an upper-case name acts as a public (exported) one that can be accessed from other packages.
When encoding/decoding structs with JSON, almost all of the code out there uses the same field name, but with the initial letter in lower case. Why is this?
Because JavaScript traditionally prefers camelCase for variable and function names, JSON (originating in the JavaScript world) naturally followed suit.
Of course this is not an enforced standard, and there are many competing conventions. But since the question is why this is common, that seems the most likely answer.
You are, of course, free to use any casing system you want for JSON key names, and you most certainly will find examples of any casing system (including lack of system) in use in real software.

Regex to match a username

I am trying to create a regex to validate usernames, which should match the following:
Only one special char (._-) allowed and it must not be at the extremes of the string
The first character cannot be a number
All the other characters allowed are letters and numbers
The total length should be between 3 and 20 chars
This is for an HTML validation pattern, so sadly it must be one big regex.
So far this is what I've got:
^(?=(?![0-9])[A-Za-z0-9]+[._-]?[A-Za-z0-9]+).{3,20}
But the positive lookahead can be satisfied in a way that allows more than one special character, which is not what I wanted, and I don't know how to correct that.
You should split your regex into two parts (not two expressions!) to make your life easier:
First, match the format the username needs to have:
^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$
Now we just need to validate the length constraint. In order not to disturb the already-found pattern, you can use a non-consuming match that only validates the number of characters (it's literally a hack for creating an AND pattern in a regular expression): (?=^.{3,20}$)
The regex will only try to match the valid format if the length constraint holds. The check is non-consuming, so after it succeeds the engine is still at the start of the string.
So, all together:
(?=^.{3,20}$)^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$
Debugger Demo
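As a side note, if you also validate server-side in Go, the regexp package (RE2) has no lookaheads. A hedged sketch of the same two-part idea checks the length in code and keeps only the format pattern as a regex (validUsername is an illustrative helper name):

package main

import (
    "fmt"
    "regexp"
)

// Format: a letter first, then letters/digits, with at most one of . _ -
// strictly inside the name. Length is checked separately because RE2
// does not support lookaheads.
var usernameFormat = regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$`)

func validUsername(s string) bool {
    // The format pattern only admits ASCII, so len(s) equals the
    // character count whenever the pattern matches.
    return usernameFormat.MatchString(s) && len(s) >= 3 && len(s) <= 20
}

func main() {
    for _, s := range []string{"ab", "a.b", "1abc", "a..b", "abc_def"} {
        fmt.Println(s, validUsername(s)) // false, true, false, false, true
    }
}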
I think you need to use ? instead of +, so the special character is matched only once or not at all:
^(?=(?![0-9])[A-Za-z0-9]?[._-]?[A-Za-z0-9]+).{3,20}