Extracting JSON data to a PDF file

I am trying to extract the data I receive from a REST client in JSON format into a PDF file. I know that I need to format it in columns/sections, so first I need to convert it to a text format. Is there a way to do that in Ruby? If so, does anyone have an example?
Here is the format of the JSON data that I am getting from the REST API:
{"id"=>123456, "documentKey"=>"xyz", "globalId"=>"xyz", "itemType"=>1234,
"project"=>123, "createdDate"=>"2015-02-20T00:11:56.000+0000",
"modifiedDate"=>"2015-02-20T00:11:56.000+0000",
"lastActivityDate"=>"2016-03-02T16:23:52.000+0000",
"createdBy"=>1234, "modifiedBy"=>12342,
"fields"=>{"name"=>"Introduction",
"globalId"=>"Text",
"documentKey"=>"Text-2",
"description"=>"Some introduction"
}
}

Check out Prawn. It does not just 'do' this for you; you will still have to figure out how to transform the hierarchical JSON data into flat, text-like data. You will have to make decisions like whether to display timestamps, show empty values, etc.
Here is a very crude example:
require 'prawn'
# The hash as received from the REST API
data = {"id"=>123456, "documentKey"=>"xyz", "globalId"=>"xyz", "itemType"=>1234,
        "project"=>123, "createdDate"=>"2015-02-20T00:11:56.000+0000",
        "modifiedDate"=>"2015-02-20T00:11:56.000+0000",
        "lastActivityDate"=>"2016-03-02T16:23:52.000+0000",
        "createdBy"=>1234, "modifiedBy"=>12342,
        "fields"=>{"name"=>"Introduction", "globalId"=>"Text",
                   "documentKey"=>"Text-2", "description"=>"Some introduction"}}
Prawn::Document.generate('example.pdf') do
  text "Project: #{data['project']}"
  text "Item Type: #{data['itemType']}"
  text "Description: #{data['fields']['description']}"
end
For anything more advanced, I would check the Prawn manual.
The other quick option is to create an HTML template and convert that to PDF; there are multiple gems for this as well, such as wicked_pdf or PDFKit.

Reading JSON in Azure Synapse

I'm trying to understand the code for reading a JSON file in Synapse Analytics. Here's the code provided by the Microsoft documentation:
Query JSON files using serverless SQL pool in Azure Synapse Analytics
select top 10 *
from openrowset(
    bulk 'https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.jsonl',
    format = 'csv',
    fieldterminator = '0x0b',
    fieldquote = '0x0b'
) with (doc nvarchar(max)) as rows
go
I wonder why the format = 'csv'. Is it trying to convert JSON to CSV to flatten the file?
Why they didn't just read the file as a SINGLE_CLOB, I don't know.
When you use SINGLE_CLOB, the entire file is imported as one value, and the content of the file in doc is not well formed as a single JSON document. Using SINGLE_CLOB would mean more work after the openrowset before we could use the content as JSON (since it is not valid JSON, we would need to parse the value ourselves). It can be done, but it would probably require more work.
The format of the file is multiple JSON-like strings, each on a separate line: "line-delimited JSON", as the document calls it.
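To see what that means outside of Synapse, here is a minimal Python sketch (assuming the file is still publicly reachable) that parses the first line as a complete JSON document on its own:
import json
import urllib.request

url = ("https://pandemicdatalake.blob.core.windows.net/public/curated/"
       "covid-19/ecdc_cases/latest/ecdc_cases.jsonl")

# Each line of the .jsonl file is a self-contained JSON document, so it can
# be parsed line by line; no enclosing [] array is expected anywhere.
with urllib.request.urlopen(url) as f:
    first_row = json.loads(f.readline())
print(sorted(first_row.keys()))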
By the way, if you check the history of the document on GitHub, you will find that originally this was not the case. As far as I remember, the file originally contained a single JSON document with an array of objects (it was wrapped with [] after loading). Someone named "Ronen Ariely" in fact found this issue in the document, which is why you can see my name in the list of authors of the document :-)
I wonder why the format = 'csv'. Is it trying to convert json to csv to flatten the hierarchy?
(1) JSON is not a data type in SQL Server; there is no data type named JSON. What we have in SQL Server are tools, such as functions, that work on text and provide support for strings in JSON-like format. Therefore, we do not CONVERT to JSON or from JSON.
(2) The format parameter has nothing to do with JSON. It specifies that the content of the file is a comma-separated values file. You can (and should) use it whenever your file is well formatted as comma-separated values (a format commonly known as CSV).
In this specific sample in the document, the values in the CSV file are strings, each of which is in valid JSON format. Only after the file has been read using openrowset do we start to parse the content of the text as JSON.
Notice that only after the heading "Parse JSON documents" does the document start to speak about parsing the text as JSON.

Is there any way to load deformed JSON into a Python object?

I am getting JSON data after hitting an API.
When I try to load that JSON into Python using json.loads(response.text), I get a delimiter error.
When I checked, a few fields in the JSON do not have "," separating them.
{
"id":"142379",
"label":"1182_Mailer_PROD",
"location":"Bangalore, India",
"targetType":"HTTPS performance",
"frequency":"15",
"fails":"2764",
"totalUptime":"85.32"
"tests":[
{"date":"09-24-2019 09:31","status":"Could not resolve: mailer.accenture.com (DNS server returned answer with no data)","responseTime":"0.000","dnsTime":"0.000","connectTime":"0.000","redirectTime":"0.000","firstbyteTime":"0.000","lastbyteTime":"0.000","pingLoss":"0.00","pingMin":"0.000","pingAvg":"0.000","pingMax":"0.000","size":"0","md5hash":"(null)"}
]
}
,
{
"id":"158651",
"label":"11883-GR dd-WSP",
"location":"Chicago, IL",
"targetType":"Performance transaction",
"frequency":"15",
"fails":"5919",
"totalUptime":"35.14"
,"tests":[
{"date":"09-24-2019 09:26","status":"Keywords not found - Working","responseTime":"0.669","stepresults":[
{"stepid":"1","date":"09-24-2019 09:26","status":"OK","responseTime":"0.453","dnsTime":"0.000","connectTime":"0.025","redirectTime":"0.264","firstbyteTime":"0.141","lastbyteTime":"0.024","size":"22351","md5hash":"ca002cf662980511a9faa88286f2ee96"},
{"stepid":"2","date":"09-24-2019 09:26","status":"Keywords not found - Working","responseTime":"0.216","dnsTime":"0.000","connectTime":"0.023","redirectTime":"0.000","firstbyteTime":"0.171","lastbyteTime":"0.022","size":"22457","md5hash":"38327404e4f2392979aa7dfa27118f4e"}
]}]
}
This is a small chunk of data from the response; as you can see, "totalUptime":"85.32" doesn't have a comma after it.
Could you please let me know how we can load the data into a Python object even though the JSON is deformed?
Deformed JSON is not JSON, so obviously you can't load it with a standard procedure. There are only two ways to load it:
Create your own parser
Modify the input to conform to the JSON standard
Both possibilities require you to define what format you want to import. If it is OK for your format not to have commas, then you have to define what your delimiters are.
From the example you posted, it is difficult to make any definitive assessment of how the input format is defined, so you will probably have to write a rudimentary parser and adapt it by trial and error to the input you are trying to parse.
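If the input is regular enough, the second option can be as simple as patching the text before handing it to json.loads. A rough sketch, assuming the only defects are a missing comma between a string value and the next key, a top-level run of objects with no enclosing array, and no string values spanning lines:
import json
import re

# Trimmed stand-in for response.text, with the same two defects as the post:
# a missing comma after "totalUptime", and bare objects at the top level.
raw = '''{
"totalUptime":"85.32"
"tests":[]
}
,
{
"totalUptime":"35.14"
,"tests":[]
}'''

# Insert the comma that is missing between a "value" and the following "key".
patched = re.sub(r'"(\s*\n\s*)"', r'",\1"', raw)

# The top level is a comma-separated run of objects, so wrap it in [].
data = json.loads("[" + patched + "]")
print(data[0]["totalUptime"])  # -> 85.32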

How to convert a string into JSON format on the Flink side?

I receive a number as a string which looks like, for example, 12.
What I want to do is convert those strings into JSON format and write them to a text file in my local directory.
However, I haven't found a proper solution, apart from manually changing the strings so that they look like JSON, which I think is a tedious and laborious task.
The completed JSON format is shown below.
{"temperature": 12}
Are there any libraries that solve my issue?
Thanks.
Check out Gson. In particular, if you have a Java class with a single "temperature" field, then see this for how to convert it to JSON.

What format does BigQuery JSON export use?

I was trying to load data from Google's JSON export, but it looks like it's not valid JSON (ECMA-404, RFC 7159, RFC 4627). Here is what I was expecting:
[{},{},{}]
But here is what it's giving:
{}{}{}
Here's an example output from clicking the "Download as JSON" button on a four-row query result:
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OjUuEMAV","c3":"Luxembourg","c4":"French - Parisian","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
{"c0":"001U0000016lf5jIAA","c1":"Tim Burton's Corpse Bride","c2":"a0KU000000OkQ8IMAV","c3":"Luxembourg","c4":"German","c5":"Sub & Audio","c21":null,"c22":"2025542.0"}
Is there a reason why BigQuery uses this export format for JSON? Are there other Google services that depend on this format, or why would it push a non-standard JSON format? (Maybe I'm just misunderstanding the JSON Lines format.) Note, this is from the web UI, not the API, which gives valid JSON.
BigQuery reads and outputs newline-delimited JSON, because traditional JSON doesn't adapt well to the needs of big data.
See:
http://specs.okfnlabs.org/ndjson/
https://en.wikipedia.org/wiki/JSON_streaming#Line-delimited_JSON
The output of "Download as JSON" shown in the question is compatible with the JSON input that BigQuery can read.
Note that the web UI also offers a view of the query results as JSON, and those results are formatted as a traditional JSON object. I'm not sure what the design decision behind this incompatible output was, but results in that form cannot be imported back into BigQuery.
So in general, this format is incompatible with BigQuery:
[{},{},{}]
While this is compatible with BigQuery:
{}
{}
{}
Why is this less traditional JSON format the best choice in the big data world? Encapsulating a trillion rows within [...] defines a single object with a trillion rows, which is hard to parse and handle. Newline-delimited JSON solves this problem, with each row being an independent object.
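For example, a short Python sketch (the file name is hypothetical) that consumes a newline-delimited export one row at a time:
import json

# Every line is an independent JSON object, so a file of any size can be
# streamed row by row; no surrounding [...] ever has to fit in memory.
with open("results.json") as f:  # hypothetical "Download as JSON" output
    for line in f:
        if line.strip():
            row = json.loads(line)
            print(row["c1"])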

Python: Dump JSON Data Following Custom Format

I'm working on some Python code for my local billiard hall, and I'm running into problems with JSON encoding. When I dump my data into a file, I obviously get all the data on a single line. However, I want my data to be dumped into the file following the format I want. For example (I had to use a picture to get the point across):
[Image: my custom JSON format]
I've looked up questions about custom JSONEncoders, but it seems they all deal with data types that aren't JSON serializable. I never found a solution for my specific need, which is having everything laid out in the manner I want. Basically, I want each list element on a separate row, but all of a dict's items on the same row. Do I need to write my own custom encoder, or is there some other approach I should take? Thanks!
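One way to get that layout without a custom JSONEncoder is to dump each dict compactly and arrange the rows by hand. A minimal sketch; the games list and its field names are invented for illustration:
import json

# Hypothetical data: one dict per game, each of which should stay on one row.
games = [
    {"player": "Alice", "score": 57, "table": 3},
    {"player": "Bob", "score": 42, "table": 1},
]

# json.dumps with no indent keeps each dict on a single line; joining the
# dumped rows with ",\n" gives every list element its own row in the array.
with open("scores.json", "w") as f:
    rows = ",\n    ".join(json.dumps(game) for game in games)
    f.write("[\n    " + rows + "\n]\n")
This writes each dict on its own row inside the list, which matches the "lists vertical, dicts horizontal" layout without subclassing json.JSONEncoder.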