mysql check varchar length of json?

Is there any way to find the character (varchar) length of a JSON input?
For example, SELECT JSON_LENGTH('[1, 2, {"a": 3}]'); gives 3 as output.
But is there any way to find the number of characters in this JSON? For instance, the JSON input '[1, 2, {"a": 3}]' has 16 characters; is there a function that can return this?
We need this to avoid storing very big JSON objects, as a safety check before storing any JSON in the database.
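For the raw character count, a minimal sketch (the table my_table and column doc below are hypothetical): MySQL's CHAR_LENGTH() counts the characters of a string, and a JSON value can be cast to CHAR first; MySQL 5.7.22+ also has JSON_STORAGE_SIZE(), which reports the bytes used to store the value rather than the character count.
-- character count of the JSON text itself
SELECT CHAR_LENGTH('[1, 2, {"a": 3}]');   -- 16
-- for a JSON column, cast to CHAR before counting characters (hypothetical table/column)
SELECT CHAR_LENGTH(CAST(doc AS CHAR)) FROM my_table;
-- bytes used to store the JSON value (MySQL 5.7.22+)
SELECT JSON_STORAGE_SIZE(doc) FROM my_table;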

Related

ERROR: malformed array literal in PostgreSQL

I want to filter on an integer array in PostgreSQL, but when I execute the query below it gives me a malformed array literal error.
select * from querytesting where 1111111111 = any((jsondoc->>'PhoneNumber')::integer[]);
Screenshot for reference: https://i.stack.imgur.com/Py3Z2.png
any(x) wants a PostgreSQL array as x. (jsondoc->>'PhoneNumber'), however, is giving you a text representation of a JSON array. A PostgreSQL array would look like this as text:
'{1,2,3}'
but the JSON version you get from ->> would look like:
'[1,2,3]'
You can't mix the two types of array.
You could use a JSON operator instead:
jsondoc->'PhoneNumber' @> 1111111111::text::jsonb
Using -> instead of ->> gives you a JSON (jsonb) array rather than text. Then you can check whether the number you're looking for is in that array with the containment operator @>. The double cast (::text::jsonb) is needed to convert the PostgreSQL number to a JSON number for the @> operator.
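For concreteness, a minimal sketch of the corrected query, assuming jsondoc is a jsonb column and PhoneNumber holds a JSON array of numbers (the sample data here is hypothetical; only the table and field names come from the question):
-- hypothetical setup: jsondoc is jsonb, PhoneNumber is a JSON array of numbers
create table querytesting (id serial primary key, jsondoc jsonb);
insert into querytesting (jsondoc)
values ('{"PhoneNumber": [1111111111, 2222222222]}');
-- jsonb containment: is 1111111111 an element of the PhoneNumber array?
select *
from querytesting
where jsondoc->'PhoneNumber' @> 1111111111::text::jsonb;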
As an aside, storing phone numbers as numbers might not be the best idea. You don't do arithmetic on phone numbers, so they're not really numbers at all; they're strings that happen to contain digit characters. Normalizing the phone number format to international standards and then treating them as strings will probably serve you better in the long term.

How can I validate JSON schema in Spark 2.x?

I'm using Spark Streaming (written in Scala) to read messages from Kafka.
The messages are all Strings in JSON format.
I define the expected schema in a local variable expectedSchema
and then parse the Strings in the RDD to JSON:
spark.sqlContext.read.schema(schema).json(rdd.toDS())
The problem: Spark will process all the records/rows as long as they contain some of the fields I try to read, even if the actual JSON format (i.e. schema) of the input row (String) doesn't match my expectedSchema.
Assume the expected schema looks like this (in JSON): {"a": 1, "b": 2, "c": 3}
and input row looks like this: {"a": 1, "c": 3}
Spark will process the input without failing.
I tried using the solution described here: How do I apply schema with nullable = false to json reading
but assert(readJson.schema == expectedSchema) never fails, even when I deliberately send input rows with the wrong JSON schema.
Is there a way for me to verify that the actual schema of a given input row matches my expected schema?
Is there a way for me to insert a null value to "fill" fields missing from rows with a "corrupt" schema?

Elastic Search JSON String:JSONObject indexing

Can Elasticsearch index values such as
"key": [
14.0,
"somestring"
]
If I try to ingest this data, I get this error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [Bad Request(400) - [WriteFailureException; nested: MapperParsingException[failed to parse [FIXMessage.key]]; nested: NumberFormatException[For input string: "somestring"]; ]]; Bailing out..
First, is the above valid JSON? If so, why is Elasticsearch not able to index it?
Elasticsearch is trying to convert "somestring" to a number, and failing.
Your JSON is valid, and Elasticsearch will happily index multiple values for the same field from an array of values, but they all have to be of the same type, and that type must match the type of the corresponding field in the mapping.
If the field doesn't exist in the mapping, Elasticsearch will create it using a sensible default type based on the first value it sees for that field. So if you tried to index what you posted above, and the "key" field didn't exist, Elasticsearch would create the field, look at the first value it found (14.0) and decide to use the float type. Then it would try to index the second value, "somestring", fail to convert it to a float, and send back an error.
Make sense?
You might find this post helpful.
It depends on what you plan to do with the data.
If you only want to store and retrieve the full array, you can simply store the JSON array as a string.
When you read the array back from Elasticsearch, you then convert the string back into a JSON array.
If you also want to search inside this array, you can split it into multiple arrays, separated by type.

Changing array type representation to use square brackets? Possible?

The answer to my question here: Detecting column changes in a postgres update trigger has me converting rows in my database into their hstore equivalent. It's clever, but leads to serialization / deserialization issues with array column types.
I have a few array-typed columns, so for example:
select hstore_to_json(hstore(documents.*)) from documents where id=283;
gives me (abbreviated form):
{"id": "283", "tags": "{potato,rutabaga}", "reply_parents": "{7}"}
What I'd really like is
"tags": ["potato", "rutabaga"], "reply_parents": [7]
as this is well-formed JSON. Technically, the first response is also well-formed JSON, as the array has been stringified and is sent down the wire as "{potato,rutabaga}". This requires me to fiddle with the parsing of responses I get, and while that's not the end of the world, it is a bit of a pain if it turns out to be unnecessary.
Calling row_to_json on the row first converts the array types into their proper JSON-array representation, but it doesn't seem like there's a set-subtraction operator on json objects (hstore - hstore in the question linked above is how I'm sending "these columns changed" events down my websocket wire). So, any suggestions on how to get this working properly in the database are welcome (either futzing with the way arrays get stringified into hstore, or doing set subtraction on json objects).
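For comparison, a minimal sketch of the two serializations side by side, assuming documents has text[] tags and integer[] reply_parents columns (output abbreviated):
-- hstore path: arrays are flattened into PostgreSQL array literals inside strings
select hstore_to_json(hstore(documents.*)) from documents where id=283;
-- {"id": "283", "tags": "{potato,rutabaga}", "reply_parents": "{7}"}
-- row_to_json path: arrays come out as proper JSON arrays
select row_to_json(documents.*) from documents where id=283;
-- {"id": 283, "tags": ["potato","rutabaga"], "reply_parents": [7]}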
If you cannot find any natural solution, you can always trust in regex.
-- rewrites stringified PostgreSQL array literals ("{a,b}") into JSON arrays (["a", "b"])
create or replace function hj(json text)
returns text language plpgsql immutable
as $$
begin
  return
    regexp_replace(
      regexp_replace(
        regexp_replace(
          -- 1. quote each element that precedes a comma inside the braces
          json, '([^"{]+?),', '"\1", ', 'g'),
        -- 2. quote the trailing element and turn the closing }" into "]
        '([^"{ ]+?)}"', '"\1"]', 'g'),
      -- 3. turn the opening "{" sequence into ["
      '"{"', '["', 'g');
end $$;
select hj('{"id": "283", "tags": "{potato,rutabaga}", "reply_parents": "{7}"}');
-- gives:
-- {"id": "283", "tags": ["potato", "rutabaga"], "reply_parents": ["7"]}

mochijson2 or mochijson

I'm encoding some data using mochijson2.
But I found that it behaves strangely on strings, which are just lists in Erlang.
Example:
mochijson2:encode("foo").
[91,"102",44,"111",44,"111",93]
Where "102", "111", "111" are $f, $o, $o encoded as strings
44 are commas and 91 and 93 are square brakets.
Of course if I output this somewhere I'll get string "[102,111,111]" which is obviously not that what I what.
If I try
mochijson2:encode(<<"foo">>).
[34,<<"foo">>,34]
So again I get a list of two double quotes with the binary part in between, which can be turned into a binary with list_to_binary/1.
Here is the question: why is it so inconsistent? I understand that there is a problem distinguishing an Erlang list that should be encoded as a JSON array from an Erlang string that should be encoded as a JSON string, but can it at least output a binary when I pass it a binary?
And the second question:
It looks like mochijson outputs everything nicely (because it uses a special tuple, {array, ...}, to designate arrays):
mochijson:encode(<<"foo">>).
"\"foo\""
What's the difference between mochijson2 and mochijson? Performance? Unicode handling? Anything else?
Thanks
My guess is that the decision in mochijson2 is that it treats a binary as a string, and a list of integers as a list of integers. (Un?)fortunately, strings in Erlang are in fact lists of integers.
As a result your "foo", in other words your [102,111,111], is translated into text representing "[102,111,111]". In the second case your <<"foo">> string becomes "foo".
Regarding the second question: mochijson seems to always return a string, whereas mochijson2 returns iodata. Iodata is basically a recursive list of strings, binaries, and iodatas (in fact iolists). If you only intend to send the result "through the wire", it is more efficient to just nest them in a list than to convert them to a flat string.