I have a large JSON file that looks similar to the code below. Is there any way I can iterate through each object, look for the field "element_type" (it is not present in all objects in the file, if that matters), and write each object with the same element type to its own file? For example, each user would end up in a file called user.json and each book in a file called book.json.
I thought about using JavaScript, but to my knowledge JS can't write to files. I also tried Linux command-line tools: removing all newlines, inserting a newline after each "},", and then iterating through each line to find the element type and write it to a file. This worked for most of the data; however, for objects like the "problem_type" one below, it inserted a newline in the middle of the data because of the nested JSON in the "times" element. I've run out of ideas at this point.
{
"data": [
{
"element_type": "user",
"first": "John",
"last": "Doe"
},
{
"element_type": "user",
"first": "Lucy",
"last": "Ball"
},
{
"element_type": "book",
"name": "someBook",
"barcode": "111111"
},
{
"element_type": "book",
"name": "bookTwo",
"barcode": "111111"
},
{
"element_type": "problem_type",
"name": "problem object",
"times": "[{\"start\": \"1230\", \"end\": \"1345\", \"day\": \"T\"}, {\"start\": \"1230\", \"end\": \"1345\", \"day\": \"R\"}]"
}
]
}
I would recommend Java for this purpose. It sounds like you're running on Linux, so it should be a good fit.
You'll have no problems writing to files, and you can use a library like json-lib (http://json-lib.sourceforge.net/) to gain access to classes such as JSONArray and JSONObject. With those you can iterate through the data in your JSON file, check what's in "element_type", and write each object to the matching file.
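The same idea (parse the file with a real JSON parser, group by "element_type", write one file per group) can be sketched in Python with only the standard library. This is a minimal sketch rather than the Java/json-lib approach recommended above; the function name split_by_element_type and the output directory are made up for illustration, and it assumes the whole file fits in memory.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_element_type(source_path, out_dir):
    """Group the objects under "data" by "element_type"; write one file per type."""
    with open(source_path, encoding="utf-8") as f:
        document = json.load(f)  # a real parser copes with nested/escaped JSON

    groups = defaultdict(list)
    for obj in document["data"]:
        element_type = obj.get("element_type")
        if element_type is None:
            continue  # objects without the field are skipped
        groups[element_type].append(obj)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for element_type, objects in groups.items():
        # produces e.g. user.json, book.json, problem_type.json
        with open(out_dir / f"{element_type}.json", "w", encoding="utf-8") as f:
            json.dump(objects, f, indent=2)

    # number of objects written per element type
    return {name: len(objects) for name, objects in groups.items()}
```

Because the parser treats the "times" value as an ordinary string, the escaped JSON inside it is written out unchanged instead of being split in the middle.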
I have the following JSON:
[
{
"date": "29/11/2021",
"Name": "jack"
},
{
"date": "30/11/2021",
"Name": "Adam"
},
{
"date": "27/11/2021",
"Name": "james"
}
]
Using Talend, I want to add two fields to each object to get something like:
[
{
"company": "AMA",
"service": "BI",
"date": "29/11/2021",
"Name": "jack"
},
{
"company": "AMA",
"service": "BI",
"date": "30/11/2021",
"Name": "Adam"
},
{
"company": "AMA",
"service": "BI",
"date": "27/11/2021",
"Name": "james"
}
]
Currently I use 3 components (tJSONDocOpen, tFixedFlowInput, tJSONDocOutput), but I can't find the right configuration of the components to get the job done.
If you are not comfortable with JSON, just follow these steps:
In the metadata, create a File JSON entry like this, then drop it into your job as a tFileInputJSON.
Your job design and mapping would be:
In your tFileOutputJSON, don't forget to change the name of the data block from "Data" to "".
What you need to do, according to Talend practices, is read your JSON, extract each object from it, add your properties, and finally rebuild your JSON in a file.
An efficient way to do this is to use the tMap component, like this.
The first tFileInputJSON has to specify which properties it reads from the JSON; set your 2 fields in its mapping field.
Then the tMap simply adds 2 columns to your main stream; here is an example with hard-coded string values. Depending on your needs, this component also lets you assign dynamic data to the 2 new columns. It's a powerful tool for manipulating the structure of a data stream.
You will find more information about this component in the official documentation: https://help.talend.com/r/en-US/7.3/tmap/tmap, especially the "tMap scenarios" part.
Note
Instead of the tMap, if you are comfortable with Java, you can use a tJavaRow. With it, you can set your 2 new columns with whatever Java code you want, as long as you have defined the output schema of the component.
output_row.Name = input_row.Name;
output_row.date = input_row.date;
output_row.company = "AMA";
output_row.service = "BI";
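For comparison outside Talend, the read / add properties / rebuild flow described above can be sketched in a few lines of Python. This is only an illustration of the transformation, not a Talend configuration; the helper name add_fields is made up, and the input is the sample from the question with the syntax corrected.

```python
import json


def add_fields(records, extra):
    """Return new records with the extra fields placed before the existing ones."""
    # If a record already has one of the extra keys, the record's own value wins.
    return [{**extra, **record} for record in records]


source = """
[
  {"date": "29/11/2021", "Name": "jack"},
  {"date": "30/11/2021", "Name": "Adam"},
  {"date": "27/11/2021", "Name": "james"}
]
"""

records = json.loads(source)
enriched = add_fields(records, {"company": "AMA", "service": "BI"})
print(json.dumps(enriched, indent=2))
```

Putting the extra fields first in the merged dictionary is what gives the "company", "service", "date", "Name" order shown in the expected output.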
I'm pretty new to Spark, and to teach myself I have been using small JSON files, which work perfectly. I'm using PySpark with Spark 2.2.1. However, I don't get how to read in a single data line instead of the entire JSON file. I have been looking for documentation on this, but it seems pretty scarce. I have to process a single large (larger than my RAM) JSON file (a Wikidata dump: https://archive.org/details/wikidata-json-20150316) and want to do this in chunks or line by line. I thought Spark was designed to do just that, but I can't find out how to do it, and when I request the top 5 observations in a naive way I run out of memory. I have tried an RDD:
SparkRDD= spark.read.json("largejson.json").rdd
SparkRDD.take(5)
and a DataFrame:
SparkDF= spark.read.json("largejson.json")
SparkDF.show(5,truncate = False)
So in short:
1) How do I read in just a fraction of a large JSON file? (Show first 5 entries)
2) How do I filter a large JSON file line by line to keep just the required results?
Also: I don't want to predefine the data schema for this to work.
I must be overlooking something.
Thanks
Edit: With some help I have gotten a look at the first observation but it by itself is already too huge to post here so I'll just put a fraction of it here.
[
{
"id": "Q1",
"type": "item",
"aliases": {
"pl": [{
"language": "pl",
"value": "kosmos"
}, {
"language": "pl",
"value": "\\u015bwiat"
}, {
"language": "pl",
"value": "natura"
}, {
"language": "pl",
"value": "uniwersum"
}],
"en": [{
"language": "en",
"value": "cosmos"
}, {
"language": "en",
"value": "The Universe"
}, {
"language": "en",
"value": "Space"
}],
...etc
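That's very similar to the question "Select only first line from files under a directory in pyspark".
Hence something like this should work:
def read_firstline(filename):
    # open in binary mode and return only the first line of the file
    with open(filename, 'rb') as f:
        return f.readline()

# files is a list of filenames
# map keeps each first line as one element; flatMap would iterate over the
# returned line and split it into individual characters/bytes
rdd_of_firstlines = sc.parallelize(files).map(read_firstline)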
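For the "first 5 entries" and "filter line by line" parts of the question, a plain-Python sketch without Spark is shown below. It rests on one loud assumption: that the dump is laid out as a JSON array with exactly one object per line, each followed by a comma. If the file is pretty-printed across many lines, as in the fragment in the question, this will not work as written. The names iter_entities, first_n and filter_entities are made up for illustration.

```python
import json
from itertools import islice


def iter_entities(path):
    """Yield one parsed object per line without loading the whole file."""
    # ASSUMPTION: one JSON object per line, wrapped in "[" ... "]" lines.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")  # drop the separator after each object
            if line in ("", "[", "]"):
                continue  # skip the array brackets and blank lines
            yield json.loads(line)


def first_n(path, n):
    """Parse only the first n objects; the rest of the file is never read."""
    return list(islice(iter_entities(path), n))


def filter_entities(path, predicate):
    """Lazily keep only the objects for which predicate(obj) is true."""
    return (entity for entity in iter_entities(path) if predicate(entity))
```

For example, first_n("largejson.json", 5) would return the first five objects, and filter_entities("largejson.json", lambda e: e.get("type") == "item") walks the file one line at a time. Under the same one-object-per-line assumption, sc.textFile("largejson.json") gives an RDD of lines that could be cleaned and parsed with the same logic.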
After downloading a Parse class, I found that it stores a file-type column as:
{ "results": [
{
"createdAt": "2015-10-27T15:06:37.324Z",
"file": {
"__type": "File",
"name": "uniqueidentifier1-filename.ext",
"url": "http://files.parsetfss.com/example-file-url.png"
},
"objectId": "8eBlOHHchQ",
"updatedAt": "2015-10-27T15:06:37.324Z"
},
{
"createdAt": "2015-10-27T14:35:02.853Z",
"file": {
"__type": "File",
"name": "uniqueidentifier2-filename.ext",
"url": "http://files.parsetfss.com/example-file-url.png"
},
"objectId": "B2tg7tBsHL",
"updatedAt": "2015-10-27T14:35:02.853Z"
}] }
For an app, I need to construct a JSON object like this locally and then manually upload it to the Parse app. So I first save the file to Parse, get the file name and file URL via file.name() and file.url(), and then construct an object like this:
object.file.name = file.name();
object.file.url = file.url();
This works fine and sets the url and name keys as expected. However, after this if I do
object.file['__type'] = 'file'
the object.file object gets converted into some weird Parse file object, and console.log(object) gives (notice the extra underscore and no __type key)
file: b.File
_name: "uniqueidentifier1-filename.ext"
_url: "http://files.parsetfss.com/example-file-url.png"
but console.log(object.file) correctly gives
Object {url: "http://files.parsetfss.com/example-file-url.png", name: "uniqueidentifier1-filename.ext", __type: "File"}
Saving the object to a text file gives the same result as console.log(object). However, I want the text file to match how Parse actually stores it, so that I can then upload the text file to a Parse class.
In JavaScript, call the toJSON() function on your PFObject; it returns a JSON object suitable for saving on Parse.
I am using Qt 5.3. I have read various materials online describing how to write a JSON file, but none of them describes it systematically, step by step.
It would be really helpful if someone could explain the step-by-step process of writing a JSON file in simple language, since I am new to Qt.
In my case I have an existing JSON file, "LOM.json", with some content. How do I add new data to it?
{
"LOM": [
{
"LOM ID": 1,
"Source": "Open Internet",
"Content": "Complete Reference Java.pdf",
"Difficulty Level": "Hard",
"Type": "Text",
"Length": "Long",
"Topic-Proficiency": [
{
"Topic": "Programming",
"Proficiency": "E2"
},
{
"Topic": "Java",
"Proficiency": "E3"
}
]
},
{
"LOM ID": 2,
"Source": "Open Internet",
"Content": "www.LatexTutorial.com",
"Difficulty Level": "Medium",
"Type": "WebCourse",
"Length": "Medium",
"Topic-Proficiency": [
{
"Topic": "Latex",
"Proficiency": "E2"
}
]
}
]
}
Thanks.
You can't directly insert data into the middle of the document. You would need to read the document and write it out again. Let's look at how we'd go about this.
Assuming the current JSON you posted is in memory as a QByteArray, you create a QJsonDocument:-
QJsonDocument doc = QJsonDocument::fromJson(data); // where data is the current JSON
Suppose we want to add another LOM object to the array. We get the root object and, from it, the "LOM" value, which should be the array:-
QJsonObject rootObj = doc.object();
QJsonValue lomObj = rootObj.value("LOM");
if(!lomObj.isArray())
{
// array expected - handle error
}
QJsonArray lomArray = lomObj.toArray();
Now we have the array, we can create a new object
QJsonObject newObject;
newObject["LOM ID"] = 3;
newObject["Source"] = "Open Internet";
newObject["Content"] = "Some other content";
//etc...
And add this to the array
lomArray.push_back(newObject);
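Finally, put the modified array back into the root object (toArray() returned a copy), create a new document from it and get a byte array of the data to write to the file
rootObj["LOM"] = lomArray; // replace the old array with the extended one
QJsonDocument newDoc(rootObj);
QByteArray finalData = newDoc.toJson();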
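The read, modify, write-back cycle itself is not specific to Qt. As a language-neutral illustration, here is the same sequence sketched in Python rather than C++; the function name append_lom is hypothetical, and the file is assumed to have the {"LOM": [...]} layout from the question.

```python
import json


def append_lom(path, new_entry):
    """Read the file, append new_entry to the "LOM" array, write it back."""
    # 1. Read the whole document into memory.
    with open(path, encoding="utf-8") as f:
        root = json.load(f)

    # 2. Check that "LOM" really is an array before touching it.
    lom = root.get("LOM")
    if not isinstance(lom, list):
        raise ValueError('expected "LOM" to be an array')

    # 3. Modify the in-memory structure.
    lom.append(new_entry)

    # 4. Write the complete document out again, replacing the old content.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(root, f, indent=4)
    return len(lom)
```

A call such as append_lom("LOM.json", {"LOM ID": 3, "Source": "Open Internet", "Content": "Some other content"}) would mirror the Qt steps above. In Python the list is modified in place, so there is no separate step for putting the array back into the root object.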
I finally got it done.
Actually, the mistake was that I was declaring the QJsonObject and QJsonArray as pointer types; that's why I couldn't insert the QJsonObject into the QJsonArray.
As for writing to an already existing JSON file: first the file is opened and its content is read into a QJsonArray or QJsonObject. Next, the changes are appended to the data that was read (in the QJsonObject or QJsonArray), and finally the new value is inserted into the document, replacing the previous one.
Thanks to #merlin069 and to this post: "Qt modifying a JSON file".
I am a real dummy with HTML and JavaScript, so please excuse any dumbness.
I am using a D3 tree diagram, but I need to load a JSON file instead of writing the data inside the JS script. The name of the file to load will be chosen by the user in a select tag. Here's the D3 code
First, how can I load/read a JSON file, let's say exampleNodes.json?
And then, how can I pass the value chosen in the select tag so that it reads the appropriate JSON?
Thanks for your patience, and help. Thank you.
in code
var treeData = [
{
"name": "Top Level",
"parent": "null",
"children": [
{
"name": "Level 2: A",
"parent": "Top Level",
"children": [
{
"name": "Son of A",
"parent": "Level 2: A"
},
{
"name": "Daughter of A",
"parent": "Level 2: A"
}
]
},
{
"name": "Level 2: B",
"parent": "Top Level"
}
]
}
];
You have to save it in a data.json file, like this:
{
"treeData" : [ ... your data array ...]
}
After that, the d3.json() callback will receive this object:
d3.json("data.json",function(json){
// do your coding
// or all code put inside one function and call it after data loaded
});
If you are using Google Chrome, reading the JSON will give you an error: for security reasons, Chrome does not allow a page to read files from the file system. You can get the data that way in Firefox. To make it run in Chrome, serve your code from a local server, e.g. WampServer or Apache Tomcat.
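If neither of those servers is installed, any small static file server does the job. As one option (not the one named above), here is a minimal sketch using Python's standard library. The function name serve_directory and the default port 8000 are arbitrary choices for illustration.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve the files in `directory` at http://127.0.0.1:<port>/ in the background."""
    # SimpleHTTPRequestHandler serves static files; `directory` picks the root folder.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Run the server on a daemon thread so the caller is not blocked.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop it
```

With the page and data.json in the same folder, calling serve_directory(".") from an interactive session and opening http://127.0.0.1:8000/ in the browser lets d3.json("data.json", ...) load the file over HTTP instead of from the file system. Running python -m http.server in that folder gives the same result without writing any code.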