Tracking changes in JSON files

One of the resources my app uses is data in two JSON files that are pulled from a third party and that are constantly updated with fresh content.
Each of these files has a specific structure that doesn't change.
However, sometimes the third party makes structural changes that can break my app.
My question is: how can I monitor their structure so I can detect changes as they occur?
Thanks!

For this you can use JSON Schema and validate the files against it. If the third party provides a schema, you're good: you just need to validate against it. If not, you have to generate one from a known-good JSON file; there are online generators for that.
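For example, a minimal sketch in Ruby using the json-schema gem (one option among several; the file names here are placeholders for your two data files and the schema you generated):

require 'json'
require 'json-schema'

# Load the schema generated from a known-good copy of the file,
# then validate the freshly pulled file against it.
schema = JSON.parse(File.read('feed_schema.json'))
data   = JSON.parse(File.read('feed.json'))

errors = JSON::Validator.fully_validate(schema, data)
if errors.empty?
  puts 'Structure unchanged: the file still matches the schema.'
else
  # Any structural change made by the third party shows up here.
  puts errors
end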

Related

Rails model concept with multiple sources

I have a document management system. A data set can be run through a program (itself another kind of file) and turned into images, a different kind of data, or even a new data set. I have to keep track of this "lineage".
If I were thinking in MySQL terms directly, I would add a "source" column and link each file to the file it was created from.
I can't think of a logical way to do this within the confines of Ruby on Rails. Any ideas/hints/tips?
What you are looking for is a graph database. You can try Neo4j: www.neo4j.org/
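If you would rather stay inside Rails, the "source" column described in the question maps onto a self-referential ActiveRecord association. A minimal sketch, with a hypothetical model name:

class Artifact < ActiveRecord::Base
  # source_id points at the record this artifact was created from
  # (on Rails 5+ add `optional: true` so root records can have no source).
  belongs_to :source, class_name: 'Artifact', foreign_key: 'source_id'
  has_many :derivatives, class_name: 'Artifact', foreign_key: 'source_id'
end

# Walking the lineage back to the original data set:
#   artifact.source, artifact.source.source, ...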

Should I be merging Core Data and syncing with a website?

I don't quite understand syncing between apps and the web or what I'm meant to do with the data.
You have a website which talks to your iPhone app; the app downloads, say, a JSON file, but what changes is it meant to make to the Core Data database?
What checks should I be doing?
Should I be merging, replacing, or inserting data from the JSON file into Core Data, or assume that the JSON file is always up to date and always do a replace?
This also gets confusing when the app's data is changed. How do you know which is the up to date version and which is not?
Perhaps I am over-confusing it?
What checks should I be doing? Should I be merging, replacing, or inserting data from the JSON file into Core Data, or assume that the JSON file is always up to date and always do a replace?
Only you can really answer that. It depends on what your app does and what the web site does. It depends on what the data is, who updates it, and where and how they do so. There is no single right answer to your question. Define what your app needs to do and how it interacts with this web site, and the answers will follow from that. Is the JSON up to date? It might be, or it might not; that's not a technical question, it's a question of when and how data can be changed. It's part of the process that drives your app and site.

Rails upload a file and render it as an HTML page

I am building a website with Ruby on Rails 3. I need to provide a page where each of my clients can edit his pricing info for the application. I am quite confused about how to do this. The pricing page needs to be displayed as an HTML table with different columns containing the pricing info.
I am thinking of different ways to do this.
1) Allow the client to create and upload an HTML page, then save it as a file in my public directory and render it when the client clicks on the pricing link.
2) The clients may not have much technical knowledge, so instead let the client upload other formats like Word or Excel, then parse the file and store it as an HTML file in the public directory.
3) Provide the client with some real-time editing tools wherein the client could edit in a fixed format, and afterwards save the file and render it later.
Also, I wouldn't like to store this info in my database. There would be quite a large number of clients, so managing all this data in my database would become cumbersome. Storing it all as plain HTML files and rendering them later would be ideal for me.
There might be other, better ways of doing this as well. Could you please suggest which might be better, or any other option that could suit my needs? Basically I want my clients to have a mechanism where they can provide their pricing details, edit them later, and display them back as an HTML table, all without using a database backend. Any suggestions would be much appreciated.
A good way is Excel (CSV format).
You can handle Excel/CSV data with PHP; I think this is the best solution for your requirement.
Try this:
http://php.net/manual/en/function.fgetcsv.php
If you are giving the user the authority to change or edit the content and you have to use CSV or Excel, please see these links:
Importing CSV and Excel
Exporting CSV and Excel
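Since the question is about Rails, here is a minimal sketch of the same idea using Ruby's standard CSV library instead; the file path and columns are assumed for illustration:

require 'csv'

# Assumes the client uploaded a pricing CSV with a header row,
# e.g. "Plan,Monthly Price,Annual Price".
table = CSV.read('public/pricing/client_1.csv', headers: true)

# Turn the rows into an HTML table without touching the database.
html = "<table>\n"
html << '<tr>' << table.headers.map { |h| "<th>#{h}</th>" }.join << "</tr>\n"
table.each do |row|
  html << '<tr>' << row.fields.map { |v| "<td>#{v}</td>" }.join << "</tr>\n"
end
html << '</table>'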
If you really don't want to use a database, then you can use YAML as structured storage.
e.g. (you could most probably come up with a better structure):
SMS_Pack:
  Sl_No:
    1: 10000
    2: 25000
    3: 50000
You can read those .yml files and parse them as hashes. It should be fairly easy to represent that hash as an HTML table.
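For example, a minimal sketch that reads the file above and turns the SMS_Pack section into table rows (the file name is assumed):

require 'yaml'

# Parse the pricing file into nested hashes.
pricing = YAML.safe_load(File.read('pricing.yml'))

# Build HTML table rows from the SMS_Pack section shown above.
rows = pricing['SMS_Pack']['Sl_No'].map do |sl_no, price|
  "<tr><td>#{sl_no}</td><td>#{price}</td></tr>"
end
html = "<table>#{rows.join}</table>"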
For the creation, I'm sure you can come up with some dynamic form input, or just let the client send this kind of file (which might not be the best solution).
But it just might be easier to manage all of this information within a database.

Dynamic JSON file vs API

I am designing a system with 30,000 objects or so and can't decide between the two options: either have a JSON file precomputed for each one and get the data by pointing to the URL of the file (I think Twitter does something similar), or have a PHP/Perl/whatever script that produces the JSON object on the fly when requested, say from a database, and sends it back. Is one more suited than the other? I guess if it takes a long time to generate the JSON data, it is better to have the JSON files already built. What if generating is as quick as accessing the database? Although I suppose one would have a dedicated table in the database specifically for that. The data doesn't change very often, so updating is not a constant thing. In that respect the data is static for all intents and purposes.
Anyways, any thought would be much appreciated!
Alex
You might want to try MongoDB, which retrieves objects as JSON and is highly scalable and easy to set up.
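On the precompute-versus-generate trade-off in the question, one simple pattern is to generate on demand and cache the result as a static JSON file; a rough Ruby sketch, where load_object_from_db is a hypothetical database lookup:

require 'json'
require 'fileutils'

CACHE_DIR = 'public/objects' # hypothetical path served directly by the web server

# Generate the JSON on the first request, then serve the cached file afterwards.
def object_json(id)
  path = File.join(CACHE_DIR, "#{id}.json")
  return File.read(path) if File.exist?(path)

  FileUtils.mkdir_p(CACHE_DIR)
  json = load_object_from_db(id).to_json   # hypothetical database lookup
  File.write(path, json)
  json
end

# When an object changes (rarely, per the question), just delete its cached file.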

automatic web crawler

I'm writing a crawler which needs to get data from many websites. The problem is that every website has a different structure. How can I easily write a crawler which correctly downloads data from many different websites? If the structure of a website changes, will I need to rewrite the crawler, or are there other methods?
What approaches and tools can be used to improve the quality of the data mined by an automatic web crawler when many websites with different structures are involved?
Thank You!
I presume you want to query the data in some way, in which case you should store it in a flexible data store. A relational database would not be fit for purpose, as it has a strict schema; something like MongoDB lets you store semi-structured data without having to define a schema up front, while still providing a powerful query language.
The same goes for how you represent the data in the crawler code. Don't map the data to classes where the structure is defined up front; use flexible data structures that can change at runtime. If you are using Java, de-serialise the data into HashMaps. In other languages these might be called dictionaries or hashes.
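For example, the same idea in Ruby: parse into plain hashes and read fields defensively, so a structural change upstream doesn't break deserialisation:

require 'json'

# No fixed classes: whatever the site returns ends up as nested hashes/arrays.
record = JSON.parse('{"product": {"name": "Widget", "price": {"amount": 9.99}}}')

# dig returns nil instead of raising when a key moves or disappears.
name  = record.dig('product', 'name')
price = record.dig('product', 'price', 'amount')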
If you're scraping data from websites that actually want to allow you to do that, chances are they will provide some sort of webservice to allow you to query their data in a structured way.
Otherwise, you're on your own, and you might even be violating their terms of use.
If the websites provide no APIs, then you're out in the cold and have to write a separate extraction module for each data format you encounter. If a website changes its format, you have to update your format module. A standard thing to do is to have a plugin for every website you're crawling and a testing framework which does regression testing with data you've already collected. When a test fails, you know something went wrong and you can investigate whether you have to update your format plugin or whether there is another issue.
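A rough Ruby sketch of that plugin-plus-regression-test setup (the extractor class, site key, and patterns are all hypothetical):

# One extractor "plugin" per site.
class ExampleComExtractor
  def extract(html)
    {
      'title' => html[%r{<h1>(.*?)</h1>}m, 1],
      'price' => html[%r{class="price">([^<]+)<}, 1],
    }
  end
end

PLUGINS = { 'example.com' => ExampleComExtractor.new }

# Regression test: re-run the current plugin on a page saved earlier and
# compare with the fields extracted when that page was first collected.
def regression_ok?(site, saved_html, expected_fields)
  PLUGINS.fetch(site).extract(saved_html) == expected_fields
end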
Without knowing what kind of data you're collecting it will be very difficult to try to hypothesize about ways to improve the "quality" of the data that was mined.
Maybe you could find out whether the website lets you access the data via an API; if so, you can feed that kind of structured data into your website directly. If not, you may need plugins for that. Or you could turn to other web crawlers that offer API access, like Octoparse, and hook their API into your own web crawler.