I am getting data as a JSON array, but there is a very large amount of it and the response time is very long. Is there a parameter I can add to the URL, something like data==4, to get only 4 objects inside the 'products' array?
This is the url:
https://fr.openfoodfacts.org/cgi/search.pl?action=process&tagtype_0=categories&tag_contains_0=contains&tag_0=breakfast_cereals&tagtype_1=nutrition_grades&tag_contains_1=contains&tag_1=A&additives=without&ingredients_from_palm_oil=without&json=1
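For reference: the Open Food Facts search API documents page_size and page parameters for paging, so appending them to the URL should limit the result count. A minimal Python sketch, assuming those parameters apply to this endpoint:

import requests

# Ask the API for a single page of 4 products instead of everything.
# page_size / page are the documented Open Food Facts paging parameters.
url = ("https://fr.openfoodfacts.org/cgi/search.pl"
       "?action=process"
       "&tagtype_0=categories&tag_contains_0=contains&tag_0=breakfast_cereals"
       "&tagtype_1=nutrition_grades&tag_contains_1=contains&tag_1=A"
       "&additives=without&ingredients_from_palm_oil=without"
       "&json=1"
       "&page_size=4&page=1")

products = requests.get(url).json()["products"]
print(len(products))  # should print 4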
I'm trying to implement a very simple single GET call, and the response returns text with a bunch of IDs separated by newlines (like a single-column CSV). I want to save each one as a row in a dataset.
I understand that in general the REST connector saves each response as a new row in an Avro file, which works well for JSON responses that can then be parsed in code.
However, in my case I need it to save the response as a txt or CSV file, to which I can then apply a schema, getting each ID in its own row. How can I achieve this?
By default, the Data Connection REST connector will place each response from the API as a row in the output dataset. If you know the format of your response, and it's something that would usually be parsed as one row per newline (CSV, for example), you can try setting outputFileType to the correct format (undefined by default).
For example (for more details, see the REST API Plugin documentation):
type: rest-source-adapter2
outputFileType: csv
restCalls:
  - type: magritte-rest-call
    method: GET
    path: '/my/endpoint/file.csv'
If you don't know the format, or the above doesn't work regardless, you'll need to parse the response in transforms to split it into separate rows. This can be done by treating the response as a string column; in that case, exploding after splitting on newline (\n) might be useful: F.explode(F.split(F.col("response"), r'\n'))
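A hedged transform sketch (assuming the raw response lands in a string column called response, which is a placeholder for your actual schema):

from pyspark.sql import functions as F

# 'response' stands in for whatever column holds the raw API response.
def split_ids(df):
    # one row per newline-separated id
    ids = df.select(F.explode(F.split(F.col("response"), r"\n")).alias("id"))
    return ids.where(F.col("id") != "")  # drop any trailing empty line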
I'm using Azure Data Factory and trying to convert a JSON file that is an array of JSON objects into separate JSON files, each containing one element. E.g. the input:
[
{"Animal":"Cat","Colour":"Red","Age":12,"Visits":[{"Reason":"Injections","Date":"2020-03-15"},{"Reason":"Check-up","Date":"2020-01-02"}]},
{"Animal":"Dog","Colour":"Blue","Age":1,"Visits":[{"Reason":"Check-up","Date":"2020-02-08"}]},
{"Animal":"Guinea Pig","Colour":"Green","Age":5,"Visits":[{"Reason":"Injections","Date":"2019-12-01"},{"Reason":"Check-up","Date":"2020-02-26"}]}
]
However, I've tried using Data Flow to split this array into single files, each containing one element of the JSON array, but cannot work it out. Ideally I would also want to name each file dynamically, e.g. Cat.json, Dog.json and Guinea Pig.json.
Is Data Flow the correct tool for this within Azure Data Factory (version 2)?
Data Flows should do this for you. Your JSON snippet above will generate 3 rows, and each of those rows can be sent to a single sink. Set the sink as a JSON sink with no filename in the dataset. In the Sink transformation, use the 'File Name Option' of 'As Data in Column'. Before that, add a Derived Column that sets a new column called 'filename' with this expression:
Animal + '.json'
Use the column name 'filename' as data in column in the sink.
You'll get a separate file for each row.
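If you ever need the same split outside Data Flows, a plain Python sketch does the equivalent (file names and paths here are illustrative):

import json

with open("animals.json") as f:        # the array from the question
    records = json.load(f)

for rec in records:
    # one file per element, named after the Animal field
    with open(f"{rec['Animal']}.json", "w") as out:   # Cat.json, Dog.json, ...
        json.dump(rec, out)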
In my MS Azure Data Factory, I have a REST API connection to a nested JSON dataset.
The source "Preview data" shows all the data (7 orders from the online store).
In the "Copy Data" activity, the "Mapping" tab is where I map JSON fields to the sink SQL table columns. If I select None under "Collection Reference", all 7 orders are copied over.
But if I want the nested metadata and select the meta field as the "Collection Reference", I do get the nested data as multiple order lines, each with one metadata point, but only for 1 order, not all 7.
I think I know the reason for my problem: one of the fields in the nested metadata is both a string and an array. But I still don't have a solution.
[screenshot of the metadata structure]
Your sense is right: it is caused by your nested metadata structure. Based on the statement of the Collection Reference property:
If you want to iterate and extract data from the objects inside an array field with the same pattern and convert them to one row per object, specify the JSON path of that array to do a cross-apply. This property is supported only when hierarchical data is the source.
"Same pattern" is the key point here, I think. However, as your screenshot shows, the objects inside your metadata array do not share the same pattern.
My workaround is to use Azure Blob Storage as a transition: REST API ---> Azure Blob Storage ---> your sink dataset. Inside the Blob Storage dataset, you can flatten the incoming JSON data with the "Cross-apply nested JSON array" setting.
You could refer to this blog to learn about this feature, then copy the flattened data into your destination.
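To illustrate what the cross-apply does, here is a hedged Python sketch; the order and meta field names are invented, since the real schema is only visible in the screenshot:

import json

with open("orders.json") as f:          # hypothetical dump of the REST response
    orders = json.load(f)

rows = []
for order in orders:
    for item in order.get("meta", []):  # cross-apply: one row per array element
        if isinstance(item, dict):      # the mixed string/array field needs a guard
            rows.append({"order_id": order.get("id"), **item})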
My input file contains line-delimited JSON objects (one object per line). Not every key is guaranteed to exist in every object.
After reading in the input file, how can I convert it to CSV where the header is the combined list of all possible keys? If a particular object doesn't have a key, that column is just empty.
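A minimal sketch with pandas, which unions the keys across objects for you (file names are placeholders):

import pandas as pd

# read_json(lines=True) parses line-delimited JSON and unions the keys;
# objects missing a key get NaN, which to_csv writes as an empty cell.
df = pd.read_json("input.jsonl", lines=True)
df.to_csv("output.csv", index=False)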
I'm fetching data from a database, putting it into a TreeMap, and putting the TreeMap into a JSONObject, since I need a JSON object returned to the AJAX call, where I will display all the data I fetched into the TreeMap. I want all the data sorted alphabetically, which is why I used a TreeMap. The problem is that I am not getting the data back in sorted order. This is the pattern of the data I need to sort: "Mumbai-Navi Mumbai-Andheri Branch", "Mumbai-Navi Mumbai-Bandra Branch", etc.
Help me out, guys.
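A note on the likely cause: JSON objects are unordered by definition, so a JSONObject will generally not preserve the TreeMap's ordering; the order has to be reimposed when serializing, or on the client side. A Python illustration of the serialization-side fix (the data is invented):

import json

branches = {"Mumbai-Navi Mumbai-Bandra Branch": 2,
            "Mumbai-Navi Mumbai-Andheri Branch": 1}

# sort_keys=True sorts keys alphabetically at serialization time,
# independent of the insertion order of the underlying map
print(json.dumps(branches, sort_keys=True))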