Failsafe modification of a JSON file on Linux

Serialising JSON to a file is a convenient way to store arbitrary data structures persistently. At work, I see this happening a lot, and I think it's understandable to do this instead of using something like SQLite because it's just so damn easy.
The problem is that when you modify the file programmatically, you might end up corrupting it: you might lose your data, or the software might be unable to proceed after the data has been corrupted. E.g. the file could be only partially written due to an abrupt power failure or a crash. Also, if multiple processes modify the file, some locking is needed, but that is rarely the case.
A few years back, I came up with what I believe is a failsafe approach to modifying JSON files on Linux, but I am not a database expert and people say that "you should never write your own database". Thus, I'd appreciate some feedback from database experts on the matter. Is this really a failsafe approach?
For the single-consumer, single-producer case, it goes like this:
1. Read the JSON file and parse it.
2. Change a node in the object tree.
3. Serialise the new object tree to a file on the same filesystem.
   E.g. if the path to the old file is /etc/example/config.json, then the new file should be at /etc/example/config.json.tmp.
   It's important to keep the temporary file on the same filesystem (and not on e.g. /tmp), because rename() cannot move a file across filesystems, and within one filesystem it is atomic with regard to filesystem operations.
   This means that after a rename(), the file is guaranteed to be complete.
   If the system experiences a power failure at the time of the rename(), the change may be lost, but the old file will still be complete and not corrupted.
4. Run fdatasync() or fsync() on the new file.
   This can take a while, so don't run it in the main loop.
   When this call returns, the file is guaranteed to have been written to persistent storage.
5. (optional) Read the new file and verify that it's valid JSON and that it fits the schema.
6. Rename the new file to the name of the old file using the rename() system call (see the sketch below).
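Here is a minimal sketch of the procedure in Python (the path is the example from above; the key, the value and the lack of error handling are just for illustration):

    import json
    import os

    def update_json(path, key, value):
        # Steps 1-2: read the existing file and change a node in the object tree.
        with open(path, "r") as f:
            data = json.load(f)
        data[key] = value

        # Step 3: serialise to a temporary file on the same filesystem.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
            f.flush()
            # Step 4: make sure the data has reached persistent storage.
            os.fsync(f.fileno())

        # Step 5 (optional): re-read tmp_path here and validate it against the schema.

        # Step 6: atomically replace the old file with the new one.
        os.rename(tmp_path, path)

    update_json("/etc/example/config.json", "some_key", "some_value")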
We almost never share files between processes, but for the multiple-producer case one might use something like flock() to serialise the writers. Read locks are not necessary because of the rename() logic described above: the file at the original path is guaranteed to always be complete.
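With multiple producers, each writer could take an exclusive lock around the read-modify-rename sequence, roughly like this (the .lock file name is just a convention for the sketch):

    import fcntl

    def update_json_locked(path, key, value):
        # Serialise concurrent writers; readers need no lock because the file
        # at `path` is always a complete JSON document thanks to rename().
        with open(path + ".lock", "w") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                update_json(path, key, value)  # the sketch above
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)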

Related

SSIS Cache Manager

Is it possible to use an SSIS Cache Manager with anything other than a Lookup? I would like to use similar data across multiple data flows.
I haven't been able to find a way to cache this data in memory in a cache manager and then reuse it in a later flow.
Nope, the Cache Connection Manager was built specifically to serve Lookups, which originally only allowed an OLE DB connection to be used.
However, if you have a set of data you want to be static for the life of a package run and usable across data flows, or even other packages, as a table-like entity, perhaps you're looking for a Raw File. It's a tight, binary representation of the data stored to disk. Since it's stored to disk, you will pay a write and subsequent read performance penalty, but the files are likely right-sized so that any penalty is offset by your specific needs.
The first step is to define the data that will go into a Raw File and connect a Raw File Destination. That involves creating a Raw File Connection Manager, where you define where the file lives and the rules about the data in it (recreate, append, etc.). At this point, run the data flow task so the file is created and populated.
The next step is that everywhere you want to use the data, you patch in a Raw File Source. It behaves much like any other data source in your toolkit at this point.

TIMEOUT in Laravel

So, I have to read an Excel file in which each row contains some data that I want to write to my database. I pass the whole file to Laravel, it reads the file and formats it into an array, and then I make a new insertion (or update) in my database.
The thing is, the input Excel file can contain thousands of rows, and it takes a while to complete, giving a timeout error in some cases.
When I run this locally I use the set_time_limit(0); function so the timeout doesn't occur, and it works pretty well. But on the remote server this function is disabled for security reasons, and my code crashes because of a timeout.
Can somebody help with how to solve this problem? Or maybe suggest a better way to approach it?
A nice way to handle tasks that take a long time is to make use of so-called jobs.
You can make a job called ImportExcel and dispatch it when someone sends you a file.
Take a good look at the docs, they have some great examples on how to do this.
You can take care of this using the following steps:
1. Take the CSV file and store it temporarily in storage:
You can store the large CSV when the user uploads it. If it's something that is not uploaded from the frontend, just make sure it is saved so it can be processed in the next step.
2. Then dispatch a job which can be queued:
You can create a job which handles this asynchronously. You can use Supervisor to manage queues, timeouts, etc.
3. Use a package like thephpleague:
Using this package (or a similar one), you can chunk the records or read them one at a time. It is really, really helpful for keeping your memory usage under the limit. Plus, it offers different methods for reading the data from files.
4. Once the file is processed, you can delete it from the temporary storage:
Just some teardown cleanup activity.

Options for storing and retrieving small objects (node.js), is a database necessary?

I am in the process of building my first live node.js web app. It contains a form that accepts data regarding my client's current stock. When submitted, an object is made and saved to an array of current stock. This stock is then permanently displayed on their website until the entry is modified or deleted.
It is unlikely that there will ever be more than 20 objects stored at any time, and these will only be updated perhaps once a week. I am not sure whether it is necessary to use MongoDB to store these, or whether there is a simpler, more appropriate alternative. Perhaps the objects could be stored in a JSON file instead? Or would this have too big an impact on page load times?
You could potentially store in a JSON file or even in a cache of sorts such as Redis but I still think MongoDB would be your best bet for a live site.
Storing something in a JSON file is not scalable, so if you end up storing a lot more data than originally planned (this often happens) you may find you run out of storage on your server's hard drive. Also, if you end up scaling and putting your app behind a load balancer, then you will need to make sure there are matching copies of that JSON file on each server. Furthermore, it is easy to run into race conditions when updating a JSON file: if two processes are trying to update the file at the same time, you are going to potentially lose data. Technically speaking, a JSON file would work, but it's not recommended.
Storing in memory (i.e. Redis) has a similar implication: the data is only available on that one server. Also, the data is not persistent, so if your server restarted for whatever reason, you'd lose what was stored in memory.
For all intents and purposes, MongoDB is your best bet.
The only way to know for sure is to test it with a load test. But as you probably read HTML and JS files from the file system when serving web pages anyway, the extra load of reading a few JSON files shouldn't be a problem.
If you want to go the simpler way, i.e. a JSON file, use the nedb API, which is plenty fast as well.

Looking to make MySQL db records readable as CSV-like lines in text file via standard file io interface Linux device driver (working with legacy code)

I want to be able to read from a real live proper MySQL database using standard file access routines. I don't mean reading the MySQL database's own underlying private files. What I mean is implementing a file-based Linux device driver that "presents" a MySQL database as a file. In other words, the text file is a "View" of the MySQL database. The MySQL records are presented in our homegrown custom variation of the CSV format that the legacy code was originally written to understand.
Background
I have some legacy code that reads from a text file that contains a very large table of data, each line being a separate record. New records (lines) need to be added but there is contention for the file among the team, there is also an overhead in deployment of the legacy code and this file to many systems when releasing the software to them. The text file itself also needs to be version controlled.
Rather than modify the legacy code to call a MySQL database version of these records directly, I thought it would be better to leave it untouched. This would avoid the risks of modifying the code and ease deployment; moreover, modifying the code would cause a lot of overhead in de-risking, design discussions, additional testing, etc.
So what I'm looking to do is write a file-based device driver such that this makes the MySQL database appear as a file to the legacy code, with the data within the format that the legacy code expects. That way the legacy code is not changed and can work oblivious that the file is really an underlying database. Contention is removed because the individual records in the database can now be updated/added to separately (via MySQL, or even better a separate web admin interface that guides and validates data entry from the user for individual records) and deployment effort is much reduced without having to up-issue the whole file on all the systems that use it.
The device driver would contain routines to internally translate standard file read operations into MySQL queries to the MySQL database and contain routines to return the MySQL results and translate these into the text format for returning back to the file read operation.
This is for a Linux/Unix platform.
Has this been done and what are your thoughts?
This kind of thing has been done before - an obvious example being the dynamic view filing system in ClearCase which provided (maybe still does?) a virtualised view onto a version control repository. Behind the scenes it implemented an object cache and used RPC to fetch objects from other hosts if necessary, and made extensive use of both local and remote databases.
It's fairly clear that you are going to implement the bulk of your filing system in user space, but you will need a (small) kernel-resident portion. Unless there's a really good reason to do otherwise, FUSE is what you're looking for - it will provide the kernel-resident part for you. All you'll need to write is glue to turn file operations into SQL requests.
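For illustration, the user-space glue could look roughly like this in Python (this assumes the third-party fusepy package; fetch_rows() is a hypothetical stand-in for the real MySQL query, and the mount point and file name are made up):

    import errno
    import stat
    from fuse import FUSE, FuseOSError, Operations  # pip install fusepy

    def fetch_rows():
        # Hypothetical placeholder: swap in a real MySQL query (e.g. via
        # mysql-connector) that returns each record as a tuple of strings.
        return [("1", "widget", "42"), ("2", "gadget", "7")]

    class TableAsFile(Operations):
        FILENAME = "/records.csv"

        def _render(self):
            # Serialise the query result into the CSV-like line format
            # that the legacy code expects.
            return "".join(",".join(row) + "\n" for row in fetch_rows()).encode()

        def getattr(self, path, fh=None):
            if path == "/":
                return {"st_mode": stat.S_IFDIR | 0o755, "st_nlink": 2}
            if path == self.FILENAME:
                return {"st_mode": stat.S_IFREG | 0o444, "st_nlink": 1,
                        "st_size": len(self._render())}
            raise FuseOSError(errno.ENOENT)

        def readdir(self, path, fh):
            return [".", "..", self.FILENAME[1:]]

        def read(self, path, size, offset, fh):
            return self._render()[offset:offset + size]

    if __name__ == "__main__":
        # The mount point is arbitrary; the legacy code would then read
        # /mnt/legacy_view/records.csv like any other text file.
        FUSE(TableAsFile(), "/mnt/legacy_view", foreground=True)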

Storing simple data on the server, db or json?

I have an app and it needs to store a few simple objects that change, and it needs to be able to persist them. Should I save the data to a db like mongo and read it in on startup or should I just save to a json file?
Thanks for any tips.
Either way can work; which is best "depends" on your situation.
Things to consider:
Flat files are not (or not easily) queryable.
You will have to manage flat files yourself (e.g. folders, file names, etc.).
Is it possible that there will be a collision? Are you going to read & write often?
If you are only persisting a few objects, is a database like mongo overkill?
Is something like redis a better solution?
I'm currently working on a project that involves several data stores. MySQL, flat-files, mongo, et cetera.
The flat files work really well for storing data that will be pulled up later. It's suitable for data that isn't read or written often.
Mongo has modules like Mongoose that take away a lot of complexities. But the documentation for both Mongoose and Mongo can be a bit dry and confusing.
Think about how the data will be used, how it will evolve, and whether you need to query it. If it just needs to be pulled up at the app's startup, then a flat file can work. If it is going to grow into larger objects, then flat files can still be the solution. But if there's a chance of additional objects in the future, then flat files will become harder to manage, and it will be a pain to add them into a working system.
And overall, if you want any type of querying (other than manually opening files and inspecting them) then go with Mongoose or a similar database.
It depends; Mongo will be easier if you want to alter the data. It is basically a storage engine for JSON, lol. It depends on your use case. If you think this will grow, you might as well use Mongo. If your storage needs will never change, use a JSON file.