What should be the schema for the JSON in Big Query? - json

I have the following JSON and i have to import it to Big Query. What schema should i specify for the below JSON? What should be the field names of the table? I am using BigQuery WebUI.
{
"users": {
"userid1mohan": {
"password": "123456",
"username": "mohan"
},
"userid2kutubuddin": {
"password": "234567",
"username": "kutubuddin"
},
"userid3pankaj": {
"password": "345678",
"username": "pankaj"
},
"userid4vivek": {
"password": "456789",
"username": "vivek"
}
}
}

Please note that BigQuery will easily ingest CSVs and newline delimited JSONs, but not a plain JSON file like the one provided in the question.
Find a specification on the newline delimited JSON format here: http://dataprotocols.org/ndjson/
For a use case like this one, the nljson would need to look like:
{"username":"kutubuddin","password":"456789"}
{"username":"pankaj","password":"312231"}
{"username":"vivek","password":"123h1"}
So you'll need to transform the json object you have into multiple json objects, one in each line, before ingesting it into BigQuery.

Related

Azure Data Factory - convert Json Array to Json Object

I retrieve data using Azure Data Factory from an OnPremise database and the output I get is as follows:
{
"value":[
{
"JSON_F52E2B61-18A1-11d1-B105-XXXXXXX":"{\"productUsages\":[{\"customerId\":3552,\"productId\":120,\"productionDate\":\"2015-02-10\",\"quantity\":1,\"userName\":\"XXXXXXXX\",\"productUsageId\":XXXXXX},{\"customerId\":5098,\"productId\":120,\"productionDate\":\"2015-04-07\",\"quantity\":1,\"userName\":\"ZZZZZZZ\",\"productUsageId\":ZZZZZZ}]}"
}
]
}
The entire value array is being serialized into a JSON and I end up with:
[{
"productUsages":
[
{
"customerId": 3552,
"productId": 120,
"productionDate": "2015-02-10",
"quantity": 1,
"userName": "XXXXXXXX",
"productUsageId": XXXXXX
},
{
"customerId": 5098,
"productId": 120,
"productionDate": "2015-04-07",
"quantity": 1,
"userName": "ZZZZZZZ",
"productUsageId": ZZZZZZZ
}
]
}]
I need to have a Json Object at a root level, not Json Array ([] replaced with {}). What's the easiest way to achieve that in Azure Data Factory?
Thanks
In ADF When you read any Json file it will read as array of Objects by default :
Sample data While reading Json data:
Data preview:
But when you want to move data to sink in Json format you have option called Set of objects you need to select that:
Sample data While storing in sink in form of Json data:
Output

Sending a request without specifying the fields in the body

I'm trying to send several PUT request to a specific endpoint.
I am using a CSV file within the columns and values, the name of the different columns reference the different variables inside the body, do you get me?
This is the endpoint:
{{URL_API}}/products/{{sku}} --sku is the id of the product, i created that variable because i use it to pass the reference
This is the body that i use:
{
"price":"{{price}}",
"tax_percentage":"{{tax_percentage}}",
"store_code":"{{store_code}}",
"markup_top":"{{markup_top}}",
"status":"{{status}}",
"group_prices": [
{
"group":"{{class_a}}",
"price":"{{price_a}}",
"website":"{{website_a}}"
}
]
}
I don't want to use the body anymore.. sometimes some products have more tan 1 group of prices, i mean:
"group_prices": [
{
"group":"{{class_a}}",
"price":"{{price_a}}",
"website":"{{website_a}}"
},
{
"group":"{{class_b}}",
"price":"{{price_b}}",
"website":"{{website_b}}"
}
Is it possible to create something like this in the CSV file?
sku,requestBody
99RE345GT, {JSON Payload}
How should i declare the {JSON Payload}?
Can you help me?
EDIT:
This is the CSV file i used:
sku,price,tax_percentage,store_code,markup_top,status,class_a,price_a,website_a
95LB645R34ER,147000,US-21,B2BUSD,1.62,1,CLASS A,700038.79,B2BUSD
I want to pass the JSON within the CSV file, i mean
sku,requestBody
95LB645R34ER,{"price":"147000","tax_percentage":"US-21","store_code":"B2BUSD","markup_top":"1.62","status":"1","group_prices":
[{ "group":"CLASS A","price":"700038.79","website":"B2BUSD"}]}
Is it okay?Should i specify anything on the request body or not? I read the documentation posted in POSTMAN website but i did not find an example like this.
As you're using JSON data as a payload, I would use a JSON file rather than a CSV file in the Collection Runner. Use this as a template and save it as data.json, it's in the correct format accepted in the Runner.
[
{
"sku": "95LB645R34ER",
"payload": {
"price": "147000",
"tax_percentage": "US-21",
"store_code": "B2BUSD",
"markup_top": "1.62",
"status": "1",
"group_prices": [
{
"group": "CLASS A",
"price": "700038.79",
"website": "B2BUSD"
}
]
}
},
{
"sku": "MADEUPSKU",
"payload": {
"price": "99999",
"tax_percentage": "UK-99",
"store_code": "BLAH",
"markup_top": "9.99",
"status": "5",
"group_prices": [
{
"group": "CLASS B",
"price": "88888.79",
"website": "BLAH"
}
]
}
}
]
In the pre-request Script of the request, add this code. It's creating a new local variable from the data under the payload key in the data file. The data needs to be transformed into a string so it's using JSON.stringify() to do this.
pm.variables.set("JSONpayload", JSON.stringify(pm.iterationData.get('payload'), null, 2));
In the Request Body, add this:
{{JSONpayload}}
In the Collection Runner, select the Collection you would like to run and then select the data.json file that you previously created. Run the Collection.

Need to relationalise a nested JSON string using Pyspark

I am new to Pyspark and need guidance to perform the following task.
A sample data in the form of a JSON string has been given,
{
"id": "1234",
"location": "znd",
"contact": "{\"phone\": [{\"number\":\"12345\",\"code\":\"111\",\"altno\":\"No\"},{\"number\":\"55656\",\"code\":\"222\",\"altno\":\"Yes\"}]}"
}
This needs to be rationalized as follows, as seen below one row of input will get translated to 2 rows.
{id: "1234", "location": "znd","number": "12345", "code": "111","altno":"No"}
{id: "1234", "location": "znd","number": "55656", "code": "222","altno":"No"}
I have tried to use the explode function but as this is a JSON string, explode does not work on it.
I have read the data into a DF and tried to enforce a struct type to later use explode, but that does not work either.

Reading complex json data without iteration

I am working with some data and often the data is nested and i am required to perform some CRUD operations based on the structure of the data i have. For instance i have this json structure
{
"_id": "KnNLkJEhrDsvWedLu",
"createdAt": {
"$date": "2016-10-13T11:24:13.843Z"
},
"services": {
"password": {
"bcrypt": "$2a$30$1/cniPwPNCuwZ/MQDPQkLej..cAATkoGX.qD1TS4iHgf/pwZYE.j."
},
"email": {
"verificationTokens": [
{
"token": "qxe_T9IS7jW7gntpK0Q7UQ35RJ9jO9m2lclnokO3z87",
"address": "drwho#gmail.com",
"when": {
"$date": "2016-10-13T11:24:14.428Z"
}
}
]
},
"resume": {
"loginTokens": []
}
},
"username": "doctorwho",
"emails": [
{
"address": "drwho#gmail.com",
"verified": false
}
],
"persodata": {
"lastlogin": {
"$date": "2016-10-13T11:29:36.816Z"
},
"fname": "Doctor",
"lname": "Who",
"mobile": "+4480000000",
"identity": "1",
"email": "drwho#gmail.com",
"gender": null
}
}
I have several data sets with such complex structure. I need to read the data, edit and also delete. Before i get to iteration, i was wondering how i can read the data without iteration then iterate when i absolutely have to.
What are the rules i should keep in mind when reading such complex json structures to enable me read any complex structure i come across?.
I am currently using javascript but i am looking for rules that apply in other languages as well.
Parsing Json in JavaScript should be easy. http://www.json.org/js.html.
"Since JSON is a proper subset of JavaScript, the compiler will correctly parse the text and produce an object structure". Just follow the examples on that page.
If you want to use another language, in Java you could use Jackson or Gson to map those json strings to objects. Then using them becomes easy. Both libraries are annotation based, and wouldn't be difficult to implement.

Access object returned from Newtonsoft json DeserializeObject

Should be a no brainer, but I'm can't seem to access the elements returned from Newtonsoft's json deserializer.
Example json:
{
"ns0:Test": {
"xmlns:ns0": "http:/someurl",
"RecordCount": "6",
"Record": [{
"aaa": "1",
"bbb": "2",
},
{
"aaa": "1",
"bbb": "2",
}]
}
}
var result = Newtonsoft.Json.JsonConvert.DeserializeObject<dynamic>(somestring);
Stripping out the json up to the Record text, i can access the data without issue.
i.e. result.Recordcount
If i leave the json as shown above, can someone enlighten me how to access Recordcount?
All inputs appreciated. Thanks!
For those JSON properties that have punctuation characters or spaces (such that they cannot be made into valid C# property names), you can use square bracket syntax to access them.
Try this:
int count = result["ns0:Test"].RecordCount;