Multiple JSON files, parse, and load into tables - json

I'm a real beginner when it comes to this, so I apologize in advance.
The long and short of what I am looking for is a fairly simple concept - I want to pull JSON data off a server, parse it, and load it into Excel, Access, or some other type of tables. Basically, I want to be able to store the data so I can filter, sort, and query it.
To make matters a little more complicated, the server will only return truncated results with each JSON response, so it will be necessary to make multiple requests to the server.
Are there tools out there or code available which will help me do what I am looking for? I am completely lost, and I have no idea where to start.
(please be gentle)

I'm glad to see this question because I'm doing something very similar! Based on what I've been through, a lot of it comes down to how those tables are designed (and linked together) in the first place, and then the mapping between those tables and the JSON objects at different depths or positions in the original JSON file. Once the mapping rules are clear, the code can be done by simply hard-coding the mapping (I mean: if you get a JSON object under a certain parent, you save its data into certain table(s)), provided you're using a reasonably high-level JSON parsing library.
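To make the "hard-coded mapping" idea concrete, here is a minimal Python sketch. The document shape, table names, and columns are all made up for illustration; the point is only that each JSON object is routed to a table based on where it sits under its parent.

```python
import json
import sqlite3

# Hypothetical document shape: a customer object plus a nested list of orders.
doc = json.loads("""
{
  "customer": {"id": 1, "name": "Alice"},
  "orders": [
    {"order_id": 10, "total": 25.50},
    {"order_id": 11, "total": 8.00}
  ]
}
""")

conn = sqlite3.connect("example.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

# Hard-coded mapping: the object under "customer" goes to the customers table,
# each object under "orders" goes to the orders table, linked by customer id.
cust = doc["customer"]
conn.execute("INSERT OR REPLACE INTO customers VALUES (?, ?)", (cust["id"], cust["name"]))
for order in doc["orders"]:
    conn.execute(
        "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
        (order["order_id"], cust["id"], order["total"]),
    )
conn.commit()
conn.close()
```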

OK, as I have to dash home from the office now:
Assuming that you are going to use Excel to parse the data, you are going to need:
1. Some JSON parser, e.g. JSON Parser for VBA
2. Some code to download the JSON
3. A loop of VBA code that loops through each file and parses it into a sheet (a rough sketch of that loop is below).
Is this OK for a starter? If you are struggling, let me know and I will try and knock something up a little better over the weekend.
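If it helps to see the overall download-parse-load loop before committing to VBA, here is a minimal Python sketch of the same idea. The endpoint URL, the "page" parameter, and the output file are assumptions; the structure (request until the server stops returning results, then write everything to a table Excel or Access can open) is the same in any language.

```python
# Sketch only: the URL and the "page" parameter are made up.
# Requires the third-party "requests" package (pip install requests).
import csv
import requests

BASE_URL = "https://example.com/api/records"  # hypothetical endpoint

rows = []
page = 1
while True:
    # The server truncates results, so keep asking for the next page
    # until it returns an empty batch.
    resp = requests.get(BASE_URL, params={"page": page})
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break
    rows.extend(batch)
    page += 1

# Write everything to a CSV that Excel or Access can open, filter, sort, and query.
with open("records.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=sorted({k for r in rows for k in r}))
    writer.writeheader()
    writer.writerows(rows)
```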

Related

Azure Data Factory finding "malformed" records in valid JSON

I'm developing an ADF pipeline that reads in a JSON file from ADLS, removes two entities that I don't need, and writes the resulting file back to ADLS. This requires a data flow, but functionality-wise this is simple stuff.
I point the source at the JSON file and it accepts it. I import the schema and it loads without incident. I go to "Inspect" and it shows me the entities and complex objects all in the correct positions; everything's good.
When I go to "Preview Data" it tells me there are "malformed records" in the source. Well, no, there aren't, and 3 separate online JSON validation engines have confirmed that. The document does not have badly formed records.
Before you ask, I have tried all three document types (single document, one document per line, and array of documents) and all of them get the same response. I've also toggled "Allow schema drift" and "Validate schema"; neither produces any better results.
Has anyone else run into this? And is there a work-around? Or should I abandon all hope that ADF will ever be able to successfully read in valid JSON?
UPDATE: Trailing CR/LF characters caused the issue. I've seen this in other engagements; now we'll build something to trim off the trailing characters and we'll be good to go.
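The actual fix would live in the pipeline, but as a generic illustration of the trimming step described above, the pre-processing can be as small as this Python sketch (file names are made up):

```python
# Strip trailing CR/LF (and other trailing whitespace) that can make a
# downstream reader flag otherwise-valid JSON as malformed.
path = "input.json"  # hypothetical file name

with open(path, "rb") as f:
    data = f.read()

trimmed = data.rstrip(b"\r\n \t")

with open("input_trimmed.json", "wb") as f:
    f.write(trimmed)
```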

Uploading Data into Redis

I am working on a school project where we need to create a website and use Redis to search a database; in my case it will be a movie database. I have a JSON file with the names and ratings of 100 movies. I would like to load this dataset into Redis instead of entering the entire dataset manually. The JSON file is saved on my desktop and I am using Ubuntu 20.04.
Is there a way to do it?
I have never used Redis, so my question might be very silly. I've been looking all over the internet and cannot find exactly what needs to be done. I might be googling the wrong question; maybe that's why I cannot find the answer.
Any help would be appreciated.
Write an appropriate program to do the job. There's no one-size-fits-all process because how your data is structured in redis is up to you; once you decide on that, it should be easy to write a program to parse the JSON and insert the data.
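As one illustration of what such a program could look like, here is a minimal Python sketch using the redis-py client. The JSON layout (a list of objects with "name" and "rating" keys) and the key design (one hash per movie plus a sorted set for ratings) are assumptions, since that structure is up to you.

```python
# Minimal sketch using the redis-py client (pip install redis).
# Assumes movies.json is a list of objects like {"name": "...", "rating": 8.4}.
import json
import redis

with open("movies.json", "r", encoding="utf-8") as f:
    movies = json.load(f)

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

for i, movie in enumerate(movies, start=1):
    key = f"movie:{i}"
    # One hash per movie holds its fields...
    r.hset(key, mapping={"name": movie["name"], "rating": movie["rating"]})
    # ...and a sorted set lets you query and sort by rating later.
    r.zadd("movies:by_rating", {key: float(movie["rating"])})
```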

Apache Spark-Get All Field Names From Nested Arbitrary JSON Files

I have run into a somewhat perplexing issue that has plagued me for several months now. I am trying to create an Avro schema (a schema-enforced format for serializing arbitrary data, basically, as I understand it) to eventually convert some complex JSON files (arbitrary and nested) to Parquet in a pipeline.
I am wondering if there is a reasonable way to get the superset of field names I need for this use case while staying in Apache Spark instead of dropping down to Hadoop MapReduce?
I think Apache Arrow, which is under development, might eventually help avoid this by treating JSON as a first-class citizen, but it is still a ways off.
Any guidance would be sincerely appreciated!
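One way this is commonly approached while staying in Spark (a sketch, not necessarily the asker's eventual solution): let spark.read.json infer a schema that is the union of the fields across all input files, then walk that schema recursively to collect the nested field names. The input path below is made up.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StructType

spark = SparkSession.builder.appName("field-names").getOrCreate()

# Schema inference merges fields seen across all the matched JSON files.
df = spark.read.json("hdfs:///data/raw/*.json")  # hypothetical path

def field_names(schema, prefix=""):
    """Recursively flatten a StructType into dotted field names."""
    names = []
    for field in schema.fields:
        full = f"{prefix}{field.name}"
        dtype = field.dataType
        # Unwrap arrays so arrays of structs are descended into as well.
        while isinstance(dtype, ArrayType):
            dtype = dtype.elementType
        if isinstance(dtype, StructType):
            names.extend(field_names(dtype, prefix=full + "."))
        else:
            names.append(full)
    return names

print(sorted(field_names(df.schema)))
```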

Get tweet JSON into SQL table with nodejs

I'm getting some tweets from the Twitter API with Node.js and saving some of the data in MySQL. Gradually, as I require more and more data from each tweet, it's become obvious I should just save the whole tweet each time. I'm looking for the cleanest way to save a whole tweet straight from a JSON object to a new row in my db.
I was surprised there aren't more Node modules or anything pre-written for creating a database table ready for tweets and mapping a tweet's JSON schema directly to it - can anyone help? At the moment I'm using the 'mysql' module for queries and have thought about combining it with 'json-sql' for building the query, but this seems like such a common task that there should be an even simpler way.
Is anyone aware of another process? Thanks!
I eventually moved over to using MongoDB, it works really well with NodeJS and I'm really enjoying it.
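The question is Node-specific, but to illustrate the general "just store the whole tweet" idea, here is a minimal Python sketch that writes the raw JSON into a single MySQL JSON column. The connection details and table name are made up, and the same pattern carries over to the 'mysql' module in Node.

```python
# Store each tweet as-is in a JSON column, keyed by its id.
import json
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="tweets_db")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS tweets (
        id BIGINT PRIMARY KEY,
        payload JSON NOT NULL
    )
""")

def save_tweet(tweet: dict) -> None:
    # Keep the tweet id as a real column for lookups; everything else stays in
    # the JSON payload and can be queried later with MySQL's JSON functions.
    cur.execute(
        "REPLACE INTO tweets (id, payload) VALUES (%s, %s)",
        (int(tweet["id_str"]), json.dumps(tweet)),
    )
    conn.commit()
```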

Dynamic JSON file vs API

I am designing a system with 30,000 objects or so and can't decide between two approaches: either have a JSON file pre-computed for each object and get the data by pointing to the URL of that file (I think Twitter does something similar), or have a PHP/Perl/whatever else script that produces the JSON object on the fly when requested, say from a database, and sends it back. Is one more suitable than the other? I guess if it takes a long time to generate the JSON data it is better to have the JSON files already built. What if generating is as quick as accessing a database? Although I suppose one would have a dedicated table in the database specifically for that. The data doesn't change very often, so updating is not a constant thing. In that respect the data is static for all intents and purposes.
Anyways, any thought would be much appreciated!
Alex
You might want to try MongoDB, which retrieves objects as JSON and is highly scalable and easy to set up.
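For the pre-computed-file option the question describes, the batch job can be as small as this Python sketch (the table, columns, and output directory are assumptions). Since the data rarely changes, it only needs to re-run when the data does, and the web server simply serves the files statically.

```python
# Pre-compute one static JSON file per object from a database table.
import json
import sqlite3
from pathlib import Path

out_dir = Path("public/objects")
out_dir.mkdir(parents=True, exist_ok=True)

conn = sqlite3.connect("site.db")
conn.row_factory = sqlite3.Row

for row in conn.execute("SELECT id, name, description FROM objects"):
    # One file per object, e.g. public/objects/123.json, served as a static URL.
    path = out_dir / f"{row['id']}.json"
    path.write_text(json.dumps(dict(row)), encoding="utf-8")
```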