Fetching XML attributes through JSON - json

I am using the ClusterPoint database and I have imported an XML document which contains tags and attributes:
<db>
<document>
<id>1</id>
<name lang="en">John</name>
<name lang="ru">Джон</name>
</document>
<document>
<id>2</id>
<name lang="en">Bill</name>
<name lang="ru">Билл</name>
</document>
</db>
When I use the JSON REST API to retrieve my documents I get the following:
"documents": [
{
"id": "1",
"name": [
"John",
"Джон"
]
},
{
"id": "2",
"name": [
"Bill",
"Билл"
]
}]
Is it possible to somehow get the attribute?

I stumbled on the same problem some time ago too. I would not be surprised if there is no way to get attributes back in JSON (as of now), because the documentation does not list such a scenario.
It's not just Clusterpoint that has this kind of problem. For example, if you took this particular XML, loaded it in PHP with simplexml and then encoded it to JSON, the attributes would be dropped too. XML does not map to JSON 1:1, and things that are ambiguous require special care or they get dropped.
There is no unambiguous way to convert
<name lang="en">John</name>
<name lang="ru">Джон</name>
and there is no single common standard for how to do it. It's clear that this should result in an array, but how should it be structured? We could convert each element to an object and then store the objects in an array. But what keys do we give the values? Something like this might be plausible:
{ "name": [
{"lang": "en", "name": "John"},
{"lang": "ru", "name": "Джон"}
]}
But again we run into the problem that it's hard to convert the data back, because we do not know whether this came from attributes. We might use a different setup:
{ "name": [
{"#lang": "en", "name": "John"},
{"#lang": "ru", "name": "Джон"}
]}
Here # would indicate that this came from an attribute. Or like this:
{ "name": [
{"#lang": "en", "#value": "John"},
{"#lang": "ru", "#value": "Джон"}
]}
to indicate that #value always holds the value that was in the tag.
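If you end up having to do such a mapping yourself, outside of Clusterpoint, here is a minimal sketch in Go of one way to keep the attributes and emit that last shape. The #lang/#value keys are just the convention proposed above, not anything Clusterpoint itself produces:
package main

import (
    "encoding/json"
    "encoding/xml"
    "fmt"
)

// Name keeps both the lang attribute and the element text.
type Name struct {
    Lang  string `xml:"lang,attr" json:"#lang"`
    Value string `xml:",chardata" json:"#value"`
}

type Document struct {
    ID    string `xml:"id" json:"id"`
    Names []Name `xml:"name" json:"name"`
}

type DB struct {
    Documents []Document `xml:"document" json:"documents"`
}

func main() {
    data := []byte(`<db>
  <document>
    <id>1</id>
    <name lang="en">John</name>
    <name lang="ru">Джон</name>
  </document>
</db>`)

    var db DB
    if err := xml.Unmarshal(data, &db); err != nil {
        panic(err)
    }

    // Prints the "#lang"/"#value" shape discussed above.
    out, _ := json.MarshalIndent(db, "", "  ")
    fmt.Println(string(out))
}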
To sum up: it looks like there is currently no way to get attributes in JSON. If there is, the Clusterpoint folks should add detailed info to their docs. However, if there really is no way to get attributes, then I would bet that future versions will have some documented standard that Clusterpoint will use.
While there is no documented standard for how Clusterpoint converts XML attributes to JSON, if you really need attributes, just use XML. Or, if you want JSON, you can reshape your documents into something that does not require attributes.

Related

Add data to a json file using Talend

I have the following JSON:
[
{
"date": "29/11/2021",
"Name": "jack",
},
{
"date": "30/11/2021",
"Name": "Adam",
},
"date": "27/11/2021",
"Name": "james",
}
]
Using Talend, I want to add 2 fields to each object, to have something like:
[
{
"company": "AMA",
"service": "BI",
"date": "29/11/2021",
"Name": "jack",
},
{
"company": "AMA",
"service": "BI",
"date": "30/11/2021",
"Name": "Adam",
},
"company": "AMA",
"service": "BI",
"date": "27/11/2021",
"Name": "james",
}
]
Currently I use 3 components (tJSONDocOpen, tFixedFlowInput, tJSONDocOutput), but I can't find the right configuration of the components to get the job done.
If you are not comfortable with JSON, just do these steps:
In the metadata, create a File JSON like this, then drop it into your job as a tFileInputJson.
[screenshots: the job design and the mapping]
In your tFileOutputJson, don't forget to replace the name of the data block "Data" with "".
What you need to do here, following Talend practice, is read your JSON, extract each object from it, add your properties, and finally rebuild your JSON in a file.
An efficient way to do this is with the tMap component, like this.
The first tFileInputJSON has to specify which properties to read from the JSON, by setting your 2 objects in the mapping field.
Then the tMap simply adds 2 columns to your main stream; here is an example with hard-coded string values. Depending on your needs, this component also lets you assign dynamic data to your 2 new columns; it's a powerful tool for manipulating the structure of a data stream.
You will find more information about this component in the official documentation: https://help.talend.com/r/en-US/7.3/tmap/tmap, especially the "tMap scenarios" part.
Note
If you are comfortable with Java, you can use a tJavaRow instead of the tMap. With it, you can set up your 2 new columns with whatever Java code you want, as long as you have defined the output schema of the component.
// Pass the existing columns through unchanged.
output_row.Name = input_row.Name;
output_row.date = input_row.date;
// Add the two new hard-coded columns defined in the output schema.
output_row.company = "AMA";
output_row.service = "BI";
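Outside of Talend, the underlying transformation is simply: read the array, add two constant fields to every object, and write the result back out. Purely for illustration, here is a minimal sketch of that step in Go, using the sample data from the question (in the real job, Talend's components handle the file input and output):
package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    input := []byte(`[
  {"date": "29/11/2021", "Name": "jack"},
  {"date": "30/11/2021", "Name": "Adam"},
  {"date": "27/11/2021", "Name": "james"}
]`)

    // Read the array of objects.
    var rows []map[string]interface{}
    if err := json.Unmarshal(input, &rows); err != nil {
        panic(err)
    }

    // Add the two constant fields to every object.
    for _, row := range rows {
        row["company"] = "AMA"
        row["service"] = "BI"
    }

    // Rebuild the JSON (in a real job this would be written to a file).
    out, _ := json.MarshalIndent(rows, "", "  ")
    fmt.Println(string(out))
}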

Reflect vs Regex to check for empty JSON array properties in Golang

I receive a JSON array from our client in which the objects are empty:
[
{},{},{},{},{}
]
Normally it looks like this, e.g.:
[
{"Name": "foo", "Text": "Costumer"},
{"Name": "foo", "Text": "Employer"},
{"Name": "foo", "Text": "Costumer"},
{"Name": "foo", "Text": "Emplopyer"},
{"Name": "foo", "Text": "Employer"}
]
According to my teacher, there are 2 possible ways to check for those empty properties:
the regexp package and the reflect package.
Which should I use for performance?
And please explain why you would choose that package over the other.
The most performant and error-proof way would be to manually parse the JSON tokens yourself with json's Decoder.Token and related methods.
This avoids the json package's normal use of reflection entirely (since you're not unmarshaling into an arbitrary struct), and it avoids error-prone regular expressions. It will likely out-perform a regex, too, but a benchmark would be necessary to be sure.
It will, however, be somewhat verbose and arguably ugly code.
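Purely as an illustration of that token-based approach, here is a minimal sketch; the function and variable names are my own, and the sample input mirrors the question:
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// countEmptyObjects streams the top-level array with Decoder.Token and
// counts the objects that contain no keys, without reflection or regexes.
func countEmptyObjects(input string) (int, error) {
    dec := json.NewDecoder(strings.NewReader(input))

    // Consume the opening '[' of the array.
    if _, err := dec.Token(); err != nil {
        return 0, err
    }

    empty := 0
    for dec.More() {
        // Consume the opening '{' of the next object.
        if _, err := dec.Token(); err != nil {
            return 0, err
        }
        // If there is nothing before the closing '}', the object is empty.
        if !dec.More() {
            empty++
        }
        // Skip everything up to and including the matching '}'.
        depth := 1
        for depth > 0 {
            tok, err := dec.Token()
            if err != nil {
                return 0, err
            }
            switch tok {
            case json.Delim('{'), json.Delim('['):
                depth++
            case json.Delim('}'), json.Delim(']'):
                depth--
            }
        }
    }
    return empty, nil
}

func main() {
    n, err := countEmptyObjects(`[{},{},{"Name":"foo","Text":"Customer"}]`)
    if err != nil {
        panic(err)
    }
    fmt.Println(n) // prints 2
}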

JSON schema reference another element in the document

Is there a way to express a reference to another element in the same JSON document using JSON Schema? The title might be a bit confusing: I'm not looking for the "$ref" attribute, which references another type; I'm curious whether there is a way to reference another element in the document using a specified field. I know this can be enforced with XSD for XML documents, but I'm not sure about JSON.
I want to do something like this:
{
"people": [
{ "id": "1", "name": "A" },
{ "id": "2", "name": "B" },
{ "id": "3", "name": "C" }
],
"chosenOne": "1" // I want the schema to enforce a person ID here
}
I have been looking at the draft v4 schema definition (http://json-schema.org/draft-04/schema) but didn't find anything that looks like what I'm trying to do. Did I just miss it?
What you want is to describe a reference ($ref) inside the object your schema is describing,
kind of like this:
{
"people": []
"chosenOne": { $ref: "#1"}
}
(Or maybe a JSON Pointer, if you want the value of the id: https://json-spec.readthedocs.io/pointer.html.)
I know of no direct way to do this, but you might be able to use the pattern or enum keywords to force it to be one of the allowed values. Kind of like this:
"properties": {
"chosenOne"
"type": "string",
"oneOf": ["1","2","3"]
]
},
}
Similarly, you could force the value of the property to follow a reference pattern. That said, since there is no reference value type (http://www.tutorialspoint.com/json/json_data_types.htm), only number or string, you can't guarantee the meaning of the value. You can only guarantee that it follows some kind of reference pattern.
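Since the schema can only constrain the shape of the value, the actual cross-reference check usually ends up in application code after parsing. A minimal sketch of such a check in Go (the struct and field names are assumptions based on the question's document, not part of any schema standard):
package main

import (
    "encoding/json"
    "fmt"
)

type Person struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

type Doc struct {
    People    []Person `json:"people"`
    ChosenOne string   `json:"chosenOne"`
}

func main() {
    data := []byte(`{
  "people": [
    {"id": "1", "name": "A"},
    {"id": "2", "name": "B"},
    {"id": "3", "name": "C"}
  ],
  "chosenOne": "1"
}`)

    var doc Doc
    if err := json.Unmarshal(data, &doc); err != nil {
        panic(err)
    }

    // Enforce "chosenOne must be an existing person id" in code, because
    // JSON Schema has no cross-element reference check.
    valid := false
    for _, p := range doc.People {
        if p.ID == doc.ChosenOne {
            valid = true
            break
        }
    }
    fmt.Println("chosenOne is valid:", valid) // true
}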
If you need more than what JSON Schema can give you, you might want to look at OData, for example. OData has some extra constructs: you can describe an entity set and then define a navigation property to that set.
It does, however, force you to follow the OData structure, so you aren't as free as you would be with a regular JSON schema.

Representing child objects in JSON for a mobile API

I am designing a JSON API for a mobile app and have to decide how to show child objects from the server to the client. Typically the request from the client to the server will be a single request to sync and the response will include all the objects that need updating. What is the best way to show the objects?
Option A - Nested Children:
{ "articles": [
{ "id" : 1,
"title": "This is the first article",
"comments": [
{"id": "1",
"article_id" : "1",
"title": "A comment on the first article"
}]
},
{ "id" : 2,
"title": "This is the second article",
"comments": [
{"id": "2",
"article_id" : "2",
"title": "A comment on the second article"
}]
}]}
Option B - All Objects on Their Own:
{ "articles": [
{ "id" : 1,
"title": "This is the first article",
}
{ "id" : 2,
"title": "This is the second article",
}]
"comments": [
{"id": "1",
"article_id" : "1",
"title": "A comment on the first article"
},
{"id": "2",
"article_id" : "2",
"title": "A comment on the second article"
}]}
On the client side I can handle either format and build the relationship based on the article_id field, so I am not too sure why I would nest the children, other than that it looks nicer. However, when I think about writing tests for the client side, especially for the mapping of JSON to objects, it seems easier to represent and map each object on its own. I am a beginner here, so any thoughts would be helpful.
PS. I am building the server using Rails/Grape and the clients with RestKit/Coredata (iOS) and probably RoboSpice/ORMLite (Android).
That's very subjective. There isn't one correct answer for that. It really depends on whatever approach is more suited to your task and data. You say this is a request used to sync data. How is the data represented and stored on the client side? If flat, like a relational database, then the flat output is probably easier to use. On the other hand, if the client will use the relationships a lot, it's probably better to use the nested structure.
From an API design standpoint, I'd have the endpoint for the articles collection accept a query parameter like expand, with a level number or named entities, and have it add the nested children accordingly. So, for instance, GET /api/articles?expand=comments would generate output with nested comments, and GET /api/articles?expand=1 would generate output with all immediate children. That way, clients can easily get the nested output if they need it, or they can query the endpoints for articles and comments separately and concatenate the output if they need the flat data.
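Purely for illustration, here is roughly what that expand switch could look like as a plain HTTP handler. This is a sketch in Go rather than the question's Rails/Grape stack, and all names and data are made up:
package main

import (
    "encoding/json"
    "net/http"
    "strconv"
)

type Comment struct {
    ID        string `json:"id"`
    ArticleID string `json:"article_id"`
    Title     string `json:"title"`
}

type Article struct {
    ID       int       `json:"id"`
    Title    string    `json:"title"`
    Comments []Comment `json:"comments,omitempty"` // only filled in when expanded
}

// articlesHandler returns the flat layout (Option B) by default and the
// nested layout (Option A) when the client asks for ?expand=comments.
func articlesHandler(w http.ResponseWriter, r *http.Request) {
    articles := []Article{
        {ID: 1, Title: "This is the first article"},
        {ID: 2, Title: "This is the second article"},
    }
    comments := []Comment{
        {ID: "1", ArticleID: "1", Title: "A comment on the first article"},
        {ID: "2", ArticleID: "2", Title: "A comment on the second article"},
    }

    resp := map[string]interface{}{}
    if r.URL.Query().Get("expand") == "comments" {
        // Nest each comment under its parent article.
        byArticle := map[string][]Comment{}
        for _, c := range comments {
            byArticle[c.ArticleID] = append(byArticle[c.ArticleID], c)
        }
        for i := range articles {
            articles[i].Comments = byArticle[strconv.Itoa(articles[i].ID)]
        }
    } else {
        // Keep the flat layout and ship the comments alongside the articles.
        resp["comments"] = comments
    }
    resp["articles"] = articles

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(resp)
}

func main() {
    http.HandleFunc("/api/articles", articlesHandler)
    http.ListenAndServe(":8080", nil)
}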

Can you put comments in Avro JSON schema files?

I'm writing my first Avro schema, which uses JSON as the schema language. I know you cannot put comments into plain JSON, but I'm wondering if the Avro tool allows comments. E.g. Perhaps it strips them (like a preprocessor) before parsing the JSON.
Edit: I'm using the C++ Avro toolchain
Yes, but it is limited. In the schema, Avro data types 'record', 'enum', and 'fixed' allow for a 'doc' field that contains an arbitrary documentation string. For example:
{"type": "record", "name": "test.Weather",
"doc": "A weather reading.",
"fields": [
{"name": "station", "type": "string", "order": "ignore"},
{"name": "time", "type": "long"},
{"name": "temp", "type": "int"}
]
}
From the official Avro spec:
doc: a JSON string providing documentation to the user of this schema (optional).
https://avro.apache.org/docs/current/spec.html#schema_record
An example:
https://github.com/apache/avro/blob/33d495840c896b693b7f37b5ec786ac1acacd3b4/share/test/schemas/weather.avsc#L2
Yes, you can use C-style comments in an Avro JSON schema: /* something */ or // something. The Avro tools ignore these expressions during parsing.
EDIT: It only works with the Java API.
According to the current (1.9.2) Avro specification, attributes that are not defined in the spec are permitted as metadata.
This allows you to add comments like this:
{
"type": "record",
"name": "test",
"comment": "This is a comment",
"//": "This is also a comment",
"TODO": "As per this comment we should remember to fix this schema" ,
"fields" : [
{
"name": "a", "type": "long"
},
{
"name": "b", "type": "string"
}
]
}
No, you can't in the C++ or the C# version (as of 1.7.5). If you look at the code, they just shove the JSON into the JSON parser without any comment preprocessing - a bizarre programming style. Documentation and language support appear to be pretty sloppy...