How to store JSON data on local machine?

This question has plagued me for months now, and no matter how many articles and topics I read, I've gotten no good information...
I want to send a request to a server which returns a JSON file. I want to take those results and load them into tables on my local machine. Preferably Access or Excel so I can sort and manipulate the data.
Is there a way to do this...? Please help!!

Google comes up with this: json2excel.
Or write your own little application.
EDIT
I decided to be nice and write a Python 3 application for you. Use it on the command line like this: python3 jsontoxml.py infile1.json infile2.json, and it will output infile1.json.xml and infile2.json.xml.
#!/usr/bin/env python3
import json
import re
import sys
from xml.dom.minidom import parseString

if len(sys.argv) < 2:
    print("Need to specify at least one file.")
    sys.exit()

indent = " " * 4


def parseitem(item, document):
    # Dicts become nested elements, lists are flattened into repeated
    # children, and everything else becomes text content.
    if isinstance(item, dict):
        parsedict(item, document)
    elif isinstance(item, list):
        for listitem in item:
            parseitem(listitem, document)
    else:
        document.append(str(item))


def parsedict(jsondict, document):
    # Each key becomes an element wrapping its (recursively converted) value.
    for name, value in jsondict.items():
        document.append("<%s>" % name)
        parseitem(value, document)
        document.append("</%s>" % name)


for infile in sys.argv[1:]:
    with open(infile) as f:
        orig = json.load(f)

    document = []
    parsedict(orig, document)

    # Pretty-print, then collapse text-only elements onto a single line.
    # http://stackoverflow.com/questions/749796/pretty-printing-xml-in-python/3367423#3367423
    xmlcontent = parseString("".join(document)).toprettyxml(indent)
    xmlcontent = re.sub(r">\n\s+([^<>\s].*?)\n\s+</", r">\g<1></", xmlcontent, flags=re.DOTALL)

    with open(infile + ".xml", "w") as outfile:
        outfile.write(xmlcontent)
Sample input
{"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": {
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": 250,
        "vOffset": 250,
        "alignment": "center"
    },
    "text": {
        "data": "Click Here",
        "size": 36,
        "style": "bold",
        "name": "text1",
        "hOffset": 250,
        "vOffset": 100,
        "alignment": "center",
        "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
    }
}}
Sample output
<widget>
    <debug>on</debug>
    <window title="Sample Konfabulator Widget">
        <name>main_window</name>
        <width>500</width>
        <height>500</height>
    </window>
    <image src="Images/Sun.png" name="sun1">
        <hOffset>250</hOffset>
        <vOffset>250</vOffset>
        <alignment>center</alignment>
    </image>
    <text data="Click Here" size="36" style="bold">
        <name>text1</name>
        <hOffset>250</hOffset>
        <vOffset>100</vOffset>
        <alignment>center</alignment>
        <onMouseUp>
            sun1.opacity = (sun1.opacity / 100) * 90;
        </onMouseUp>
    </text>
</widget>

It's probably overkill, but MongoDB uses JSON-style documents as its native format. That means you can insert your JSON data directly, with little or no modification. It can handle JSON data on its own, without you having to jump through hoops to force your data into a more RDBMS-friendly format.
It is open-source software and available for most major platforms. It can also handle extremely large amounts of data and scale across multiple servers.
Its command shell is probably not as easy to use as Excel or Access, but it can do sorting etc. on its own, and there are bindings for most programming languages (e.g. C, Python and Java) if you find that you need to do more tricky stuff.
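For a rough idea of what that looks like from Python, here is a minimal sketch using the pymongo driver (the database, collection, and file names are made up for the example):

import json

from pymongo import MongoClient  # pip install pymongo

# Connect to a local MongoDB instance on the default port.
client = MongoClient("localhost", 27017)
collection = client["mydb"]["widgets"]  # hypothetical database / collection names

# Load the JSON returned by the server and insert it as-is;
# nested objects and arrays are stored without any flattening.
with open("response.json") as f:  # hypothetical file name
    doc = json.load(f)
collection.insert_one(doc)

# Sort directly on a nested field when querying.
for result in collection.find().sort("widget.window.width", 1):
    print(result)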
EDIT:
For importing/exporting data from/to other more common formats MongoDB has a couple of useful utilities. CSV is supported, although you should keep in mind that JSON uses structured objects and it is not easy to come up with a direct mapping to a table-based model like CSV, especially on a schema-free database like MongoDB.
Converting JSON to CSV or any other RDBMS-friendly format comes close to (if it does not outright enter) the field of Object-Relational Mapping, which in general is neither simple nor something that can be easily automated.
The MongoDB tools, for example, allow you to create CSV files, but you have to specify which field will be in each column, implicitly assuming that there is in fact some kind of schema in your data.
MongoDB allows you to store and manipulate structured JSON data without having to go through a cumbersome mapping process that can be very frustrating. You would have to modify your way of thinking, moving a bit away from the conventional tabular view of databases, but it allows you to work on the data as it is meant to be worked on, rather than trying to force the tabular model on it.

JSON (like XML) is a tree rather than a literal table of elements. You will need to populate the table by hand (essentially doing a stack of SQL LEFT JOINs) or populate a bunch of tables and manage the joins by hand.
Or is the JSON flat-packed? It MAY be possible to do what you're asking; I'm just pointing out that there's no guarantee.
If it's a quick kludge and the data is flat-packed, then a quick script to read the JSON, dump it to CSV, and then open it in Excel will probably be easiest.
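For the flat-packed case, such a script could be as small as the following sketch (the file names are placeholders, and it assumes the response is an array of objects that all share the same keys):

import csv
import json

# Assumes a flat array of objects with identical keys,
# e.g. [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}].
with open("response.json") as f:  # hypothetical file name
    rows = json.load(f)

with open("response.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)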

Storing the data in Access or Excel cannot be done easily, I think. You would essentially have to parse the JSON string with a programming language that supports it (PHP, Node.js, and Python all have native support) and then use a library to output an Excel sheet with the data.
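As a rough sketch of that approach in Python (assuming the pandas and openpyxl packages are installed; the file names are placeholders):

import json

import pandas as pd  # pip install pandas openpyxl

with open("response.json") as f:  # hypothetical file name
    data = json.load(f)

# json_normalize flattens nested objects into dotted column names
# (e.g. widget.window.title), which fits a spreadsheet layout.
df = pd.json_normalize(data)
df.to_excel("response.xlsx", index=False)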
Another option, depending on how versed you are in programming languages, is to use something like the Elasticsearch search engine or the CouchDB database, both of which support JSON input natively. You could then use them to query the content in various ways.

I've done something like that before: turning JSON into an HTML table, which also means you can turn it into CSV.
However, here are some things you need to know:
1) The JSON data must be well formatted into a predefined structure, e.g.
[
    ['col1', 'col2', 'col3'],
    [data11, data12, data13],
    ...
]
2) You have to parse the data row by row and column by column, and you have to take care of missing data or mismatched columns where possible. Of course, you also have to be aware of the data types (see the sketch below).
3) My experience is that if you have ridiculously large data, then doing this will kill the client's browser. You have to progressively fetch the formatted HTML or CSV data from the server.
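Here is a minimal Python sketch of that row-by-row conversion (it assumes the array-of-rows structure from point 1, with the first row holding the column names, and just prints an HTML table):

import html
import json

# Assumes the structure from point 1: a list of rows whose first row
# holds the column names, e.g. [["col1", "col2"], ["a", 1], ["b", 2]].
with open("rows.json") as f:  # hypothetical file name
    rows = json.load(f)

header, data = rows[0], rows[1:]
parts = ["<table>"]
parts.append("<tr>" + "".join("<th>%s</th>" % html.escape(str(c)) for c in header) + "</tr>")
for row in data:
    # Pad short rows so missing trailing columns still produce a cell.
    row = list(row) + [""] * (len(header) - len(row))
    parts.append("<tr>" + "".join("<td>%s</td>" % html.escape(str(c)) for c in row) + "</tr>")
parts.append("</table>")
print("\n".join(parts))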
As suggested by nightcracker above, try the Google tool. :)

Related

User Titles in Firebase Realtime Database

So I am importing a JSON file of (example) users into the Realtime Database and have noticed a slight issue.
When added, each user is keyed by its position in the JSON file rather than by anything in the data. Since the file starts with user 4, that user ends up under the key 1.
The user JSON is formatted as such:
{
    "instagram": "null",
    "invited_by_user_profile": "null",
    "name": "Rohan Seth",
    "num_followers": 4187268,
    "num_following": 599,
    "photo_url": "https://clubhouseprod.s3.amazonaws.com:443/4_b471abef-7c14-43af-999a-6ecd1dd1709c",
    "time_created": "2020-03-17T07:51:28.085566+00:00",
    "twitter": "rohanseth",
    "user_id": 4,
    "username": "rohan"
}
Is there some easy way to make it so the titles in Firebase are the user IDs of each user instead of the numbers currently used?
When you import a JSON into the Firebase Realtime Database, it uses whatever keys exist in the JSON. There is no support to remap the keys during the import.
But of course you can do this with some code:
For example, you can change the JSON before you import it, to have the keys you want.
You can read the JSON in a small script, and then insert the data into Firebase through its API.
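As a minimal sketch of the first option in Python (it assumes the export is a JSON array of user objects like the one above; the file names are placeholders):

import json

with open("users.json") as f:  # hypothetical input file
    users = json.load(f)

# Build an object keyed by each record's user_id instead of by array index.
keyed = {str(user["user_id"]): user for user in users}

with open("users_keyed.json", "w") as f:  # import this file instead
    json.dump(keyed, f, indent=2)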
I'd recommend against using sequential numeric IDs as keys though. To learn why, have a look at this blog post: Best Practices: Arrays in Firebase.

Methods to convert CSV to unique JSON

I need to convert a .CSV to a specific .JSON format, and I decided to use the csvtojson package from NPM since it seemed to be designed for this sort of thing.
First, a little background on my problem. I have .CSV data that looks similar to this:
scan_type, date/time, source address, source-lat, source-lng, dest address, dest-lat, dest-lng
nexpose,2016-07-18 18:21:44,1008,40.585260,-10.124120,10.111.131.4,10.844880,-10.933360
I have a plain csv-to-json converter here, but there is a problem. It outputs a rather plain file, and I need to split the source/destination and add some special formatting that I will show later on.
[
    {
        "scan_type": "nexpose",
        "date/time": "2026-07-28 28:22:44",
        "source_address": 2008,
        "source-lat": 10.58526,
        "source-lng": -105.08442,
        "dest_address": "11.266.282.0",
        "dest-lat": 11.83388,
        "dest-lng": -111.82236
    }
]
The first thing I need to do is be able to separate the "source" values, from the "destination" values. Here is an example of what I want the "source" values to look like:
(var destination will be exactly the same format)
var source = {
    id: "99.58926-295.09492",
    "source-lat": 49.59926,
    "source-lng": -209.98942,
    "source_address": 2009,
    x: {
        valueOf: function() {
            var latlng = [
                49.58596,
                -209.08442
            ];
            var xy = map.FUNCTION_FOR_CONVERTING_LAT_LNG_TO_X_Y(latlng);
            return xy[0]; // xy.x
        }
    },
    y: {
        valueOf: function() {
            var latlng = [
                49.58596,
                -209.08442
            ];
            var xy = map.FUNCTION_FOR_CONVERTING_LAT_LNG_TO_X_Y(latlng);
            return xy[1]; // xy.y
        }
    }
};
So, my question is, how should I approach converting my data? Should I convert everything with csvtojson? Or should I convert it from the plain .JSON file I generated?
Does anyone have any advice, or similar examples, they could share on how to approach this problem?
I do a lot of work with parsing CSV data, and as I'm sure you have seen, CSV is very hard to parse and work with correctly: there are a huge number of edge cases that can break even the most rugged of parsers (although it looks like your dataset is fairly plain, so that isn't a huge concern). Not to mention you could potentially run into corruption by performing operations while reading from disk, so it is a much better idea to get the data from CSV into a JSON file and then make any manipulations to a JSON object loaded from that "plain" JSON file.
tl;dr: convert your data from the plain .JSON file
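As a rough sketch of that flow (shown here in Python just to illustrate the shape of the manipulation; the field names come from the sample above, and the x/y valueOf wrappers would still be added on the JavaScript side):

import json

# Load the "plain" JSON produced by the csv-to-json step (file name is a placeholder).
with open("plain.json") as f:
    records = json.load(f)

pairs = []
for rec in records:
    # Split each flat record into separate source / destination objects.
    source = {
        "id": "%s-%s" % (rec["source-lat"], rec["source-lng"]),
        "source-lat": rec["source-lat"],
        "source-lng": rec["source-lng"],
        "source_address": rec["source_address"],
    }
    dest = {
        "id": "%s-%s" % (rec["dest-lat"], rec["dest-lng"]),
        "dest-lat": rec["dest-lat"],
        "dest-lng": rec["dest-lng"],
        "dest_address": rec["dest_address"],
    }
    pairs.append({"source": source, "dest": dest})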

Parsing a json column in tsv file to Spark RDD

I'm trying to port an existing Python (PySpark) script to Scala in an effort to improve performance.
I'm having trouble with something troublingly basic though -- how to parse a json column in Scala?
Here is the Python version
# Each row in file is tab separated, example:
# 2015-10-10 149775392 {"url": "http://example.com", "id": 149775392, "segments": {"completed_segments": [321, 4322, 126]}}
action_files = sc.textFile("s3://my-s3-bucket/2015/10/10/")
actions = (action_files
    .map(lambda row: json.loads(row.split('\t')[-1]))
    .filter(lambda a: a.get('url') != None and a.get('segments') != None and a.get('segments').get('completed_segments') != None)
    .map(lambda a: (a['url'], {"url": a['url'], "action_id": a["id"], "completed_segments": a["segments"]["completed_segments"]}))
    .partitionBy(100)
    .persist())
Basically, I'm just trying to parse the JSON column and then transform it into a simplified version that I can process further in Spark SQL.
As a new Scala user, I'm finding that there are dozens of JSON parsing libraries for this simple task, and it doesn't look like there is one in the stdlib. From what I've read so far, it looks like the language's strong typing is what makes this simple task a bit of a chore.
I'd appreciate any push in the right direction!
PS. By the way, if I'm missing something obvious that is making the PySpark version crawl, I'd love to hear about it! I'm porting a Pig Script from Hadoop/MR, and performance dropped from 17min with MR to over 5 and a half hours on Spark! I'm guessing it is serialization overhead to and from Python....
If your goal is to pass data to SparkSQL anyway and you're sure that you don't have malformed fields (I don't see any exception handling in your code) I wouldn't bother with parsing manually at all:
val raw = sqlContext.read.json(action_files.flatMap(_.split("\t").takeRight(1)))

val df = raw
  .withColumn("completed_segments", $"segments.completed_segments")
  .where($"url".isNotNull && $"completed_segments".isNotNull)
  .select($"url", $"id".alias("action_id"), $"completed_segments")
Regarding your Python code:
don't use != to compare to None. The correct way is to use is / is not. It is semantically correct (None is a singleton) and significantly faster. See also PEP 8.
don't duplicate data unless you have to. Emitting url twice means higher memory usage and subsequent network traffic.
if you plan to use Spark SQL, checks for missing values can be performed on a DataFrame, the same as in Scala. I would also persist a DataFrame rather than an RDD.
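For reference, a rough PySpark equivalent of the Scala snippet above might look like this (an untested sketch that assumes the same sqlContext and action_files as in the question):

from pyspark.sql.functions import col

# Parse the JSON column directly into a DataFrame, as in the Scala version.
raw = sqlContext.read.json(action_files.map(lambda row: row.split('\t')[-1]))

df = (raw
    .withColumn("completed_segments", col("segments.completed_segments"))
    .where(col("url").isNotNull() & col("completed_segments").isNotNull())
    .select(col("url"), col("id").alias("action_id"), col("completed_segments")))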
On a side note, I am rather skeptical about serialization being a real problem here. There is an overhead, but a real impact shouldn't be anywhere near what you've described.

Explaining JSON (structure) .. to a business user

Suppose you have some data you would want business users to contribute to, which will end up being represented as JSON. Data represents a piece of business logic your program knows how to handle.
As expected, JSON has nested sections, data has categorizations, some custom rules may optionally be introduced etc.
It so happens that you already have a vision of what "a perfect" JSON should look like. That JSON is your starting point.
Question:
Is there a way one can take a (reasonably complex) JSON and present it in a (non-JSON) format, that would be easy for a non-technical person to understand?
If possible, could you provide an example?
What do you think of this?
http://www.codeproject.com/script/Articles/ArticleVersion.aspx?aid=90357&av=126401
Or, make your own using Ext JS for the visualization part. After all, JSON is a lingua franca on the web these days.
Apart from that, you could use XML instead of JSON, given that there are more "wizard" type tools for XML.
And finally, if when you say "business users" you mean "people who are going to laugh at you when you show them code," you should stop thinking about this as "How do I make people in suits edit JSON" and start thinking about it as "How do I make a GUI that makes sense to people, and I'll make it spit out JSON later."
Show them as key/value pairs. If a value has sub-sections, then show them as drill-downs in a tree structure. An HTML mockup which parses a JSON object in your system would help the understanding.
I picked this example from the JSON site:
{
    "name": "Jack (\"Bee\") Nimble",
    "format": {
        "type": "rect",
        "width": 1920,
        "height": 1080,
        "interlace": false,
        "frame rate": 24
    }
}
Name and format would be the tree nodes.
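Here is a minimal Python sketch of that key/value tree view, just to show the mapping (a real mockup would render it as HTML rather than print text):

import json

def print_tree(value, name="root", depth=0):
    pad = "    " * depth
    if isinstance(value, dict):
        # Dict keys become tree nodes; their values are rendered underneath.
        print("%s%s" % (pad, name))
        for key, child in value.items():
            print_tree(child, key, depth + 1)
    elif isinstance(value, list):
        print("%s%s" % (pad, name))
        for i, child in enumerate(value):
            print_tree(child, "[%d]" % i, depth + 1)
    else:
        # Leaves are shown as simple key: value pairs.
        print("%s%s: %s" % (pad, name, value))

print_tree(json.loads('{"name": "Jack", "format": {"type": "rect", "width": 1920}}'))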

Efficient Portable Database for Hierarchical Dataset - Json, Sqlite or?

I need to make a file that contains a hierarchical dataset. The dataset in question is a file-system listing (directory names, file name/sizes in each directory, sub-directories, ...).
My first instinct was to use JSON and flatten the hierarchy using paths so the parser doesn't have to recurse so much. As seen in the example below, each entry is a path ("/", "/child01", "/child01/gchild01", ...) and its files.
{
    "entries":
    [
        {
            "path": "/",
            "files":
            [
                {"name":"File1", "size":1024},
                {"name":"File2", "size":1024}
            ]
        },
        {
            "path": "/child01",
            "files":
            [
                {"name":"File1", "size":1024},
                {"name":"File2", "size":1024}
            ]
        },
        {
            "path": "/child01/gchild01",
            "files":
            [
                {"name":"File1", "size":1024},
                {"name":"File2", "size":1024}
            ]
        },
        {
            "path": "/child02",
            "files":
            [
                {"name":"File1", "size":1024},
                {"name":"File2", "size":1024}
            ]
        }
    ]
}
Then I thought that repeating the keys over and over ("name", "size") for each file kind of sucks. So I found this article about how to use JSON as if it were a database - http://peter.michaux.ca/articles/json-db-a-compressed-json-format
Using that technique I'd have a JSON table like "Entry" with columns "Id", "ParentId", "EntryType", "Name", "FileSize", where "EntryType" would be 0 for Directory and 1 for File.
So, at this point, I'm wondering if SQLite would be a better choice. I'm thinking that the file size would be a LOT smaller than a JSON file, but it might only be negligible if I use the compressed JSON-DB format from the article. Besides size, are there any other advantages that you can think of?
I think a JavaScript object as the data source, loaded as a file stream into the browser and then used in JavaScript logic in the browser, would consume the least time and have good performance, but only up to a limited hierarchy size.
Also, not storing the hierarchy anywhere else and keeping it only as a JSON file badly limits your data source's use in the project to client-side technologies, or forces conversions to other technologies.
If you are building a pure JavaScript-based application (an HTML, JS, CSS-only app), then you could keep it as a JSON object alone and limit your hierarchy sizes; you could split bigger hierarchies into multiple files of linked JSON objects.
If you will have server-side code like PHP in your project, then considering manageability of the code and scaling, you should ideally store the data in an SQLite DB and create your JSON hierarchies at runtime, for a limited number of levels, as AJAX loads from your page.
If this is the only data your application stores then you can do something really simple like just store the data in an easy to parse/read text file like this:
File1:1024
File2:1024
child01
    File1:1024
    File2:1024
    gchild01
        File1:1024
        File2:1024
child02
    File1:1024
    File2:1024
Files get File:Size and directories get just their name. Indentation gives structure. For something slightly more standard but just as easy to read, use YAML.
http://www.yaml.org/
Both can benefit from decreased file size (but decreased user readability) by gzipping the file.
And if you have more data to store, then use SQLite. SQLite is great.
Don't use JSON for data persistence. It's wasteful.
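For the SQLite route, here is a minimal sketch of the Entry table proposed in the question, using Python's built-in sqlite3 module (the database file name and sample rows are made up):

import sqlite3

conn = sqlite3.connect("listing.db")  # hypothetical file name
conn.execute("""
    CREATE TABLE IF NOT EXISTS Entry (
        Id        INTEGER PRIMARY KEY,
        ParentId  INTEGER REFERENCES Entry(Id),
        EntryType INTEGER NOT NULL,   -- 0 = directory, 1 = file
        Name      TEXT NOT NULL,
        FileSize  INTEGER
    )
""")

# Root directory, one child directory, and a file inside it.
root = conn.execute(
    "INSERT INTO Entry (ParentId, EntryType, Name) VALUES (NULL, 0, '/')").lastrowid
child = conn.execute(
    "INSERT INTO Entry (ParentId, EntryType, Name) VALUES (?, 0, 'child01')", (root,)).lastrowid
conn.execute(
    "INSERT INTO Entry (ParentId, EntryType, Name, FileSize) VALUES (?, 1, 'File1', 1024)", (child,))
conn.commit()

# List the files in /child01.
for name, size in conn.execute(
        "SELECT Name, FileSize FROM Entry WHERE ParentId = ? AND EntryType = 1", (child,)):
    print(name, size)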