Converting JSON txt file to XML

We're constructing a network of data, and part of that involves modifying a search query on a public website to pull all of the data we want. That data, however, is stored in a JSON text file when pulled.
Ultimately we want this data stored in an Access database, so the next step, we thought, was to convert it to XML so we could build an Excel sheet to import. We found a formatting tool (http://jsonformatter.org). When running the tool we received the following error:
“Microsoft Access has encountered an error processing the XML schema in file ‘Data.xml’,
A document must contain exactly one root element”
I've no idea what this entails or where to start debugging. Are there alternatives we might consider?

The error says that the generated XML contains more than one root element. Have you validated the XML the tool produced? I looked at the website. I would have asked this via a comment, but I don't have enough rep, so please post some of your JSON and XML.
If I am reading your issue correctly, you are converting JSON to XML format and then to Excel?
I would suggest writing some code to consume the JSON and export the XML file to import; a minimal sketch follows.
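For example, a minimal sketch using only Python's standard library. It assumes the pulled file holds a JSON array of flat records and that the keys are valid XML element names; the file names and the Records/Record element names are placeholders, not anything Access requires.

    import json
    import xml.etree.ElementTree as ET

    # Read the JSON text file produced by the search query (placeholder name).
    with open("Data.txt", "r", encoding="utf-8") as f:
        records = json.load(f)  # assumed to be a list of dicts

    # Wrap everything in a single root element so the XML has exactly one root.
    root = ET.Element("Records")
    for record in records:
        row = ET.SubElement(root, "Record")
        for key, value in record.items():
            # Assumes each JSON key is a valid XML element name.
            field = ET.SubElement(row, key)
            field.text = "" if value is None else str(value)

    ET.ElementTree(root).write("Data.xml", encoding="utf-8", xml_declaration=True)

Because all records sit under one root element, the "exactly one root element" error should no longer apply to the generated file.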

Related

Malformed records are detected in schema inference parsing json

I have a really frustrating error trying to parse basic JSON read from Blob Storage using a dataset within ADF.
My JSON is below:
[{"Bid":0.197514880839,"BaseCurrency":"AED"}
,{"Bid":0.535403560434,"BaseCurrency":"AUD"}
,{"Bid":0.351998712241,"BaseCurrency":"BBD"}
,{"Bid":0.573128306234,"BaseCurrency":"CAD"}
,{"Bid":0.787556605631,"BaseCurrency":"CHF"}
,{"Bid":0.0009212964,"BaseCurrency":"CLP"}
,{"Bid":0.115389497248,"BaseCurrency":"DKK"}
]
I have tried all 3 JSON source settings and every one of them gives the error
Malformed records are detected in schema inference. Parse Mode: FAILFAST
The 3 settings being:
Single Document
Array Of Documents
Document Per Line
Can anyone help? I just need this to be a list of objects, that's it!
Paul
It should work with the JSON setting "Array of documents".
We face this issue when the JSON file has UTF-8 BOM encoding; the ADF Data Flow is unable to parse such files. If you specify the encoding as UTF-8 without BOM while creating the file, it will work.
In my case, I am using a copy activity to merge and create the JSON file and have specified the encoding as UTF-8 without BOM, and that resolved my issue.
Note: For some reason, we can't use a dataset that has "UTF-8 without BOM" encoding in the Data Flow. In that case, you can create two datasets: one with the default UTF-8 encoding (which will be used in the Data Flow) and one with UTF-8 without BOM (which will be used in the copy activity sink while creating the file).
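If you want to confirm locally whether a file carries a UTF-8 BOM before handing it to the Data Flow, a quick sketch (the file name is just a placeholder):

    import codecs

    path = "currencies.json"  # placeholder name for the file to check

    with open(path, "rb") as f:
        raw = f.read()

    if raw.startswith(codecs.BOM_UTF8):
        # Strip the three BOM bytes and rewrite the file as plain UTF-8.
        with open(path, "wb") as f:
            f.write(raw[len(codecs.BOM_UTF8):])
        print("BOM found and removed.")
    else:
        print("No BOM found.")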
Thank you.

ADF Merge-Copying JSON files in Copy Data Activity creates error for Mapping Data Flow

I am trying to do some optimization in ADF. The setup: a third-party tool copies one JSON file per object to a blob storage container, and these files feed a Mapping Data Flow. The individual files written by the third-party tool work great. If I copy these files to a different blob folder using an Azure Copy Data activity, the MDF can no longer parse the files and gives the error: "JSON parsing error, unsupported encoding or multiline." I started with the Merge Files copy behavior, but the outcome is the same regardless of which copy behavior I choose.
2ND EDIT: After another day's work, I have found that the Copy Activity's Merge Files from JSON to JSON definitely adds an EOL character to each single JSON object as it gets imported into the merge file, and that the MDF definitely fails with those EOL characters in the merge file. If I remove all EOL characters from the merge file, the same MDF works. For me, this is a bug: the copy activity is adding a character that breaks the MDF. There also seems to be a second issue in some of my data that doesn't fail as an individual file but breaks the MDF when the files are concatenated, but I have tested the basic behavior on 1-5000 files and been able to repeat the fail/success results.
I took the original file and the copied file and ran them through all sorts of tests. Here is what I eventually found when I dumped them into Notepad++:
Copied file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\r\n
Original file:
{"CustomerMasterData":{"Customer":[{"ID":"123456","name":"Customer Name",}]}}\n
If I change the copied file from ending with \r\n to \n, the MDF can read the file again. What is going on here? And how do I change the file write behavior or the MDF settings so that I can concatenate or copy files without the CRLF?
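Stripping the CRs from a downloaded copy of the merged file can be scripted in a couple of lines of Python; a rough sketch, with placeholder file names:

    # Rewrite a downloaded copy of the merged file with Unix (LF) line endings
    # before re-uploading it to blob storage. File names are placeholders.
    with open("merged.json", "rb") as f:
        data = f.read()

    with open("merged_lf.json", "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))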
EDIT: NEW INFORMATION -- On further review, it seems like maybe the minification/whitespace removal is the culprit. If I download the file created by the ADF copy and format it using a JSON formatter, it works. Maybe the CRLF -> LF change masked something else. I'm not sure what to do at this point, but it's super frustrating.
Other possibly relevant information:
Both the source and sink JSON datasets are set to use UTF-8 (not default(UTF-8), although I tried that). Would a different encoding fix this?
I have tried remapping schemas, creating new datasets, and creating new Mapping Data Flows, and I still get the same error.
EDITED for clarity based on comments:
In the case of a single JSON element in a file, I can get this to work; data preview returns the same success or failure as the pipeline when run.
In the case of multiple documents merged by ADF, I get the failure described in the edits above instead.
Repro: Create any valid JSON as a single file, put it in blob storage, use it as a source in a Mapping Data Flow, and do any sink operation. Create a second file with the same schema and get them both to run in the same flow using wildcard paths. Then use a Copy Activity with Merge Files as the sink copy behavior and Array of Objects as the file pattern, and try to make your MDF use this new file. If it fails, download the file created by ADF, run it through a formatter (I have used both VS Code -> "Format Document" from the standard VS Code JSON extension, and the VS 2019 "Unminify" command) and re-upload it. It should work now.
I don't know if you already solved the problem; I came across the exact same problem 3 days ago and, after several tries, found a solution:
In the Copy Data activity, under sink settings, use "Set of objects" (instead of "Array of objects") as the File Pattern, so that the merged JSON has each of the original small JSON documents written on its own line.
In the MDF, after setting up the wildcard paths with the *.json pattern, under JSON Settings select "Document per line" as the Document form.
After that you should be good to go; at least it solved my problem. The CRLF automatically written by the "Array of objects" setting in the Copy Data activity should only be a default, and MSFT should provide an option to omit it in the settings in the future.
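To picture what the "Document per line" form expects, here is a tiny sketch that writes one complete JSON document per line with LF endings, which (as I understand the answer above) is roughly the shape the "Set of objects" pattern produces. The records are placeholders modeled on the sample in the question.

    import json

    documents = [
        {"CustomerMasterData": {"Customer": [{"ID": "123456", "name": "Customer Name"}]}},
        {"CustomerMasterData": {"Customer": [{"ID": "654321", "name": "Another Customer"}]}},
    ]

    # One complete JSON document per line, no enclosing array, Unix (LF) endings.
    with open("merged.json", "w", encoding="utf-8", newline="\n") as f:
        for doc in documents:
            f.write(json.dumps(doc) + "\n")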
According to my test:
1. The Copy Data activity can't change Unix (LF) line endings to Windows (CRLF).
2. The MDF can parse both Unix (LF) and Windows (CRLF) files.
Maybe there is something else wrong.
By the way, I see there is a comma after "name":"Customer Name" in your original file; I deleted it before my test.

How to create a BQ-schema from XSD

I need some guidance on how to proceed with a problem.
Our integration team receives XML files, which are converted to JSON and sent to Pub/Sub. We then ingest the JSON files (or are supposed to) into BigQuery.
The problem is that the XML files do not include all possible objects or values all the time, so I can't create a correct schema in BigQuery to receive the JSON files. I have the XSD file, along with an extension file that gives me all possible objects, but I don't know how to convert this into a correct BigQuery schema.
Do you have any suggestions on how to create a BigQuery schema from XSD files? I was thinking that if I create an XML file with dummy data (including all objects, and more than one object for the repeated ones) with the help of the XSD, maybe that XML file could be converted to JSON and then run through BigQuery's schema auto-detection.
Any suggestions?
Thanks,
Cris
If you have the XSD schema files, you can convert them to a valid JSON schema. There are a few tools that can help you accomplish this.
Keep in mind that the tools are general-purpose and not built for the particular case of BigQuery, so you'll have to tune the result to get a valid BigQuery schema. For this, check the components of a BigQuery schema, and for quick reference the sample provided in the documentation; a sketch of the target shape follows.
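As a reference point for that tuning, a BigQuery schema boils down to a list of fields, each with a name, a type, and a mode, where nested or repeating XML elements map to RECORD fields with mode REPEATED. A minimal sketch with the Python client library; the project, dataset, table, and field names are made up:

    from google.cloud import bigquery

    # Hypothetical fields; RECORD + REPEATED covers nested/repeating elements.
    schema = [
        bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("created_at", "TIMESTAMP", mode="NULLABLE"),
        bigquery.SchemaField(
            "items",
            "RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("sku", "STRING", mode="NULLABLE"),
                bigquery.SchemaField("quantity", "INTEGER", mode="NULLABLE"),
            ],
        ),
    ]

    client = bigquery.Client()
    table = bigquery.Table("my-project.my_dataset.orders", schema=schema)
    client.create_table(table)  # creates the table with the explicit schema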

What does a JSON file do?

I went through previous posts on SO and some of the answers say that a JSON file is used to send data from server to client.
Well, that seems to be okay, but then we can create package.json, Apidoc.json, and manifest.json, which do not involve any interaction between client and server.
So can someone tell me what actually is a JSON file?
JSON stands for JavaScript Object Notation. It is used to describe a data structure in a simple format. It can be a plain text file, which may be used to pass data from the server to a client, but it could equally be used to hold and consume that data at the same layer, e.g. you could have a configuration file at the client side which is read and interpreted by your application.
Note also that JSON does not need to be held in a file; you could create a string variable with JSON data in it and pass this from one method to another without ever storing it in a file.
The tag definition in Stack Overflow can be found here https://stackoverflow.com/tags/json/info and further information can be found here https://www.json.org/.
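To make that concrete, a toy sketch in Python: the same JSON can sit in a client-side configuration file or live only in a string passed between methods (the file name and keys are made up):

    import json

    # JSON held in a configuration file, read and interpreted by the application.
    with open("config.json", "w", encoding="utf-8") as f:
        f.write('{"theme": "dark", "retries": 3}')

    with open("config.json", "r", encoding="utf-8") as f:
        config = json.load(f)

    # The same data never touching a file: just a string handed between methods.
    payload = json.dumps(config)
    print(json.loads(payload)["retries"])  # 3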
JSON is a file format, just like CSV. Just because CSV is used with Microsoft Excel does not mean that is all it is used for, and the same goes for JSON: just because it is common to get info from a server in JSON format does not mean that is all JSON is used for. Do some googling before asking a question like this on Stack Overflow.
Here is an intro to JSON: JSON Intro (W3Schools).

Migrating from Lighthouse to Jira - Problems Importing Data

I am trying to find the best way to import all of our Lighthouse data (which I exported as JSON) into JIRA, which wants a CSV file.
I have a main folder containing many subdirectories, JSON files and attachments; the total size is around 50 MB. JIRA allows importing CSV data, so I was thinking of trying to convert the JSON data to CSV, but all converters I have seen online will only handle a single file rather than recursing through an entire folder structure and producing the CSV equivalent that can then be imported into JIRA.
Does anybody have any experience of doing this, or any recommendations?
Thanks, Jon
The JIRA CSV importer assumes a denormalized view of each issue, with all the fields available in one line per issue. I think the quickest way would be to write a small Python script to read the JSON and emit the minimum CSV (something like the sketch below). That should get you issues and comments. Keep track of which Lighthouse ID corresponds to each new issue key. Then write another script to add things like attachments using the JIRA SOAP API; for JIRA 5.0 the REST API is a better choice.
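A rough sketch of that first script. It assumes the Lighthouse export has one ticket.json per ticket under a tickets directory and that each ticket.json holds a single JSON object; the paths and field names are placeholders that will need adjusting to match the actual export.

    import csv
    import glob
    import json

    # Placeholder columns; adjust to whatever each ticket.json actually contains.
    FIELDS = ["number", "title", "state", "body"]

    with open("jira_import.csv", "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in glob.glob("lighthouse_export/tickets/*/ticket.json"):
            with open(path, "r", encoding="utf-8") as f:
                ticket = json.load(f)
            # Some exports nest the data under a "ticket" key; fall back to the top level.
            data = ticket.get("ticket", ticket)
            writer.writerow({field: data.get(field, "") for field in FIELDS})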
We just went through a Lighthouse to JIRA migration and ran into this. The best thing to do in your script is to start at the top-level export directory and loop through each ticket.json file. You can then build a master CSV or JSON file, containing all tickets, to import into JIRA.
In Ruby (which is what we used), it would look something like this:
require 'json'

# Walk every exported ticket.json under the Lighthouse export directory.
Dir.glob("path/to/lighthouse_export/tickets/*/ticket.json") do |ticket|
  JSON.parse(File.read(ticket)).each do |data|
    # access ticket data and add it to a CSV
  end
end