QuickSight parseJson with dots in key - json

In AWS QuickSight, I'm trying to create a Calculated Field based on some JSON data. The data has dots in some of the keys, e.g.:
{"foo.bar": "baz"}
In order to parse this, I'd need something like the bracket notation, since the dot notation won't work (unless there's a way to escape the dot?):
parseJson({the_column}, '$["foo.bar"]')
Trying this out, I'm getting errors saying that the syntax is incorrect.
So, my question: What is the correct syntax to achieve this?
Note:
I can use replace to at least be able to parse it, e.g. parseJson(replace({the_field}, "foo.bar", "whatever"), "$.whatever"). But this does not seem optimal, and it is prone to errors such as accidentally replacing the wrong string.

Related

Remove JSON keys with wildcards from a MySQL field

I have a MySQL 8.0.20 database with a table that describes metadata about uploaded image files. One column contains a JSON object with a whole bunch of auto-generated data that I'm trying to clean up.
This JSON object sometimes contains one or more variable key names that match a specific pattern. Something like
{
"image_name": "P10043983",
"image_size": "60138",
"image_original_exifdata": "{
'FileName':'P10043983.jpg',
'MimeType':'image/jpeg',
'UndefinedTag:0xA435':'\u0000\u0000\u0000\u0000\u0000\u0000'
}"
}
That UndefinedTag:0xA435 (with many permutations) is the problem. It's referring to various image Exif details like lens type, GPS data, etc. It's stuff that I'm not interested in and that these cameras mostly don't provide, so I've ended up with a table full of long strings of useless characters just taking up space. I want those JSON fields gone for performance and cleanliness.
Is there a way to run a SQL query that would use wildcards or regular expressions to find (and, ideally, remove) all of these pesky variable keys? I'd like to avoid manually making a list of all of the possible "UndefinedTag" keys to search against, and I also didn't like the results when I just treated the whole thing as a string and did REGEXP_REPLACE calls (it sometimes left trailing commas that broke my JSON and were difficult for me to avoid/resolve).
I know some of the JSON functions like JSON_SEARCH() accept wildcards, but it explicitly says the search path can't end in a wildcard (so no UndefinedTag:0x** allowed). Many of the functions I'm after (e.g., JSON_REMOVE()) don't accept wildcards at all. Hell, I've even had trouble finding known keys, and I suspect that silly colon in the key name might have something to do with it.
So, how can I clean up my table and remove the many forms of this UndefinedTag problem? Maybe it's easier to just go back to the regex_replace plan and deal instead with the trailing commas?
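One way to sidestep the trailing-comma problem is to do the cleanup in application code rather than in SQL: parse the JSON, drop the unwanted keys, and serialize it again. The following is only a sketch of that idea in Python, not a MySQL solution. The helper name strip_keys and the key pattern are hypothetical, and it assumes the Exif data is valid JSON (the single-quoted sample above would need converting first).

```python
import json
import re

# Hypothetical pattern for the unwanted keys; adjust it to the real key names.
UNDEFINED_TAG = re.compile(r"UndefinedTag:0x[0-9A-Fa-f]+$")


def strip_keys(value, pattern):
    """Return a copy of value without any dict key that matches pattern."""
    if isinstance(value, dict):
        return {
            key: strip_keys(child, pattern)
            for key, child in value.items()
            if not pattern.match(key)
        }
    if isinstance(value, list):
        return [strip_keys(child, pattern) for child in value]
    return value


# Assumed input: the Exif data as valid JSON text.
exif = json.loads(
    '{"FileName": "P10043983.jpg", "MimeType": "image/jpeg", '
    '"UndefinedTag:0xA435": "\\u0000\\u0000"}'
)
cleaned = json.dumps(strip_keys(exif, UNDEFINED_TAG))
print(cleaned)
```

Because the serializer rebuilds the text from the filtered structure, there are no stray commas to clean up afterwards.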

Regex for matching with and without quotes for dynamic JSON

I have the following text strings:
"Name":"John"}]
"Age":36
"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"
"Gender":null
I need to get two groups (key value pair) from this such that the output would be only:
Key|Value
Name|John
Age|36
Address|ABC,PQR234[]/.,#ANYCHARACTERS
The requirement is to have a single regex to grab everything in the double quotes if the double quotes are present. If not, take the value without the quotes.
In the example above, 36 and null are the ones without quotes, and they need to be captured as well.
I have tried a lot but have failed to do so.
UPDATE:
I don't know why I am getting down votes for this question. Yes this is JSON that I am trying to parse but there is a reason behind why I am doing this and not using any document parser.
I am supposed to use Talend for getting a dynamic JSON converted into Key Value Pair. What I mean by dynamic is the fields of the JSON can vary and hence I do not have a fixed schema and hence cannot use a document parser (which demands a fixed structure of JSON). I am devising a solution to get around this using Normalizer (on comma) and then extracting the key value pair which will be in double quotes using Regular Expressions. I tried many things on my own and since I am not an expert in Regular expressions, I have come here to get inputs.
If you know any better solution to this, I would be very happy to get your inputs.
How about this?
/"?([^\n"]*)"?:"?([^\n"]*)"?/
Explained in detail at:
https://regex101.com/r/UM0rl2/1/
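To see what that pattern captures, here is a small Python sketch that runs it over the four sample strings from the question. The helper name split_pair is made up for illustration, and the pattern is copied as-is, without the surrounding slashes.

```python
import re

# The suggested pattern: optional quotes around the key and around the value.
PAIR = re.compile(r'"?([^\n"]*)"?:"?([^\n"]*)"?')


def split_pair(line):
    """Return (key, value) for the first match in line, or None."""
    match = PAIR.search(line)
    return (match.group(1), match.group(2)) if match else None


lines = [
    '"Name":"John"}]',
    '"Age":36',
    '"Address":"ABC,PQR234[]/.,#ANYCHARACTERS"',
    '"Gender":null',
]
pairs = [split_pair(line) for line in lines]
for key, value in pairs:
    print(f"{key}|{value}")
```

One caveat: an unquoted value directly followed by a closing brace, such as "Age":36}, would be captured with the brace included, since the value group only stops at a quote or a newline.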

Parsing XML into JSON - trying to replace all "$" key values with regex or stop it all together

So I have been taking some XML and parsing it through xml2js package to get JSON. The parsing worked fine though I keep getting "$" for the list of attributes. For example:
<AddresseeInformation name="SOME COMPANY LTD"></AddresseeInformation>
Becomes
"AddresseeInformation":[{"$":{"name":"SOME COMPANY LTD"}}]
I don't want this, as MongoJS complains that the key $ must not start with '$' when I try to upload it to the DB, so I am going to have to change every instance of "$" or figure out how to stop it from happening.
Here is the regex I have tried to change every instance:
JSONstring.replace('"$"'/g, '"init"');
.replace(/"$"/g, '"init"');
.replace(/'"$"'/g, '"init"');
None of that worked - so I am sending it out to you guys - also if anyone knows how to stop the XML attributes being parsed with a key of "$" I would love you forever.
The xml2js README, under "Options", seems to suggest that creating the parser with new Parser({attrkey: "init"}) does what you want.
For the RegExp solution, the problem is that $ has a special meaning and you have to escape it: /"\$"/g. (Although technically it should probably be /"\$"\s*:/g in case a value happens to be "$". Or /(?<!\\)"\$"\s*:/g if you really want to be careful and exclude keys ending in "$. And RegExp probably isn't the best solution anyways.)
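As a rough illustration of both routes, here is a Python sketch (not xml2js or JavaScript code). It shows the escaped-dollar regex next to a structural rename that walks the parsed data. The function name rename_key is hypothetical, and the sample fragment is wrapped in braces so that it parses as JSON.

```python
import json
import re

# Assumed input: the fragment from the question, wrapped in {} to be valid JSON.
raw = '{"AddresseeInformation":[{"$":{"name":"SOME COMPANY LTD"}}]}'

# Regex route: the dollar sign is escaped so it is matched literally,
# and the trailing colon restricts the match to keys.
renamed_text = re.sub(r'"\$"\s*:', '"init":', raw)


def rename_key(value, old, new):
    """Return a copy of value with every dict key equal to old renamed to new."""
    if isinstance(value, dict):
        return {
            (new if key == old else key): rename_key(child, old, new)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [rename_key(child, old, new) for child in value]
    return value


# Structural route: parse first, then rename keys; string values are never touched.
renamed_obj = rename_key(json.loads(raw), "$", "init")
print(renamed_text)
```

The structural version cannot accidentally rewrite a value that happens to be "$", which is the failure mode the regex variants try to guard against.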

Elixir - Capitalized keys in structs

I am trying to write a CLI client in Elixir for an API so that I can log in to the API system, fetch the data I need for my calculation and then log out. I have defined a Packet.Login struct that is supposed to be my internal data structure that I end up with after parsing the JSON I receive.
I am using Poison to parse the JSON. The problem is that the API returns capitalised properties, so Poison gives me capitalized keys, and I can't seem to access them when printing or parsing because the dot syntax is treated as an alias. If I try to use another syntax,
packet[:Token]
it still does not work and instead gives me an error, this time about Packet.Login not implementing the Access behaviour. I can understand that part, but not the first issue. And I'm trying to keep the code stupid simple.
defmodule Packet.Login do
  defstruct [:Data, :Token]
end
defimpl String.Chars, for: Packet.Login do
  def to_string(packet) do
    "Packet:\n---Token:\t\t#{packet.Token}\n---Data:\t#{packet.Data}"
  end
end
loginPacket = Poison.decode!(json, as: %Packet.Login{})
IO.puts "#{loginPacket}"
When trying to compile the above I get this:
** (CompileError) lib/packet.ex:31: invalid alias: "packet.Token". If you wanted to define an alias, an alias must expand to an atom at compile time but it did not, you may use Module.concat/2 to build it at runtime. If instead you wanted to invoke a function or access a field, wrap the function or field name in double quotes
(elixir) expanding macro: Kernel.to_string/1
Is there a way for me to fix this somehow? I have thought of parsing the map and de-capitalizing all fields first, but I would rather not.
Why can't I have capitalized keys for a struct? It seems like I can though, as long as I don't try to use them.
In order to access a field of a map which is an atom starting with an uppercase letter, you need to either put the key in quotes, e.g. foo."Bar" or use the bracket syntax, e.g. foo[:Bar]. foo.Bar in Elixir is parsed as an alias. With structs, you cannot use the bracket syntax, so the easiest way is to use quotes around the field name. In your code, you'll therefore need to change:
"Packet:\n---Token:\t\t#{packet.Token}\n---Data:\t#{packet.Data}"
to:
"Packet:\n---Token:\t\t#{packet."Token"}\n---Data:\t#{packet."Data"}"
I could not find this documented clearly anywhere but Elixir's source mentions this in some places and also uses this syntax to access some functions in :erlang which have names that are not valid identifiers in Elixir, e.g. :erlang."=<".
Fun fact: you can define functions in Elixir that can only be called with this quote syntax as well:
iex(1)> defmodule Foo do
...(1)> def unquote(:"!##")(), do: :ok
...(1)> end
iex(2)> Foo."!##"()
:ok

Using regex to extract data from structured data

The problem I'm facing here is that I have a blob of text which contains structured data (in the form of a JSON payload), and I'm interested in extracting the value of one of the keys for a specific JSON instance. Picture the structured data inside as the following:
"Item 1": {"key1":"item1_key1_value", "key2":"item1_key2_value", "key3":"item1_key3_value"}, "Item 2": {"key1":"item2_key1_value", "key2":"item2_key2_value", "key3":"item2_key3_value"}
What I would like to do is use a regex to grab item1_key2_value, for instance. The keys all have the same name but the items are different. So I know which key for which Item I need, but am not quite sure of the regex to retrieve that value. I've tried a few approaches to some basic matching but was wondering if any other more experienced regex users could direct me a bit here and explain what I'm doing wrong.
1(.)(?=item1_key2_value.) will match a chunk of data from here but I'm not sure of the best way to reduce it to the value that I need.
The grammar for JSON is clearly specified at http://www.json.org. If you scroll down a little to where it says "A string is a sequence of", you will find the proper string structure.
Assuming the string follows the correct JSON structure, you could use
"key2"\s*:\s*"((\\.|[^\\"])*)"
where \s means whitespace and * means 0 or more times. \\ means a slosh (backslash) character and can be followed by . (any character). If it does not encounter a slosh, then it instead looks for [^\\"], which means neither a slosh nor a quote.
If you want to be a little more strict to the exact JSON form, you could try
"key2"\s*:\s*"((\\["\\/bfnrtu]|[^\\"])*)"
which you can see follows the string form on the webpage more closely.
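Both patterns match the first "key2" in the text, whichever item it belongs to. To pick the value for one particular item, the item name can be put in front of the pattern. Here is a minimal Python sketch of that, assuming each item is a flat object with no nested braces. The helper name value_for is made up for illustration.

```python
import re


def value_for(text, item, key):
    """Return the string value of key inside the object named item, or None."""
    pattern = (
        r'"' + re.escape(item) + r'"\s*:\s*\{'   # "Item 1": {
        r'[^{}]*?'                               # stay inside this object
        r'"' + re.escape(key) + r'"\s*:\s*'      # "key2":
        r'"((?:\\.|[^\\"])*)"'                   # the quoted value, escapes allowed
    )
    match = re.search(pattern, text)
    return match.group(1) if match else None


blob = (
    '"Item 1": {"key1":"item1_key1_value", "key2":"item1_key2_value", '
    '"key3":"item1_key3_value"}, '
    '"Item 2": {"key1":"item2_key1_value", "key2":"item2_key2_value", '
    '"key3":"item2_key3_value"}'
)
print(value_for(blob, "Item 1", "key2"))
print(value_for(blob, "Item 2", "key2"))
```

The [^{}]*? part keeps the search from running past the closing brace of the chosen item, so a missing key returns None instead of a value from the next item.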