The git repo for my Django app includes several .tsv files which contain the initial entries to populate my app's database. During app setup, these items are imported into the app's SQLite database. The SQLite database is not stored in the app's git repo.
During normal app usage, I plan to add more items to the database by using the admin panel. However I also want to get these entries saved as fixtures in the app repo. I was thinking that a JSON file might be ideal for this purpose, since it is text-based and so will work with the git version control. These files would then become more fixtures for the app, which would be imported upon initial configuration.
How can I configure my app so that any time I add new entries to the Admin panel, a copy of that entry is saved in a JSON file as well?
I know that you can use the manage.py dumpdata command to dump the entire database to JSON, but I do not want the entire database, I just want JSON for new entries of specific database tables/models.
I was thinking that I could try to hack the save method on the model to try and write a JSON representation of the item to file, but I am not sure if this is ideal.
Is there a better way to do this?
Overriding save method for something that can go wrong or that can take more than it should is not recommended. You usually override save when changes are simple and important.
You can use signals but in your case it's too much work. You can instead write a function to do this for you but still not exactly after you saved the data to database. You can do it right away but it's too much process unless it's so important for your file to be updated.
I recommend using something like celery to run a function in the background separated from all of your django functions. You can call it on every data update or each hour for example and edit your backup file. You can even create a table to monitor the update process.
Which solution is the best is highly depended you and how important the data is. And keep in mind that editing a file can be a heavy process too so creating a backup like everyday might be a better idea anyway.
Related
I have a csv file which contains very detailed data of the products that my company sells and it gets updated daily.
And I want my rails to import the data from the csv file then update my database (MYSQL) if there are any new changes found.
What's the best way to achieve this? Some people mentioned about MYSQL for excel. Would this be the way to go about it?
I will appreciate if someone can give me a guidance on this. Thank you.
I'm not gonna walk through the details, specifically because you have no details, nothing attempted at all so I'm gonna stick with a overview.
From a systems point of view I would (assuming your rails app is live and not local):
have the CSV file live in place where you (or whoever needs to) can update it and is also fetchable to the application (dropbox, s3 bucket, your own server, wtv).
have a daily cron rake task, that downloads the CSV file
parse the CSV file and decide what to update.
The trickiest part will be to decide what to update from the CSV and it will depend on how it can change itself. Like if only new lines can be added, or lines removed, if columns in lines can be changed, etc.
I have about 10GB worth of data that I would like to import to Parse. the data is currently in JSON format which is great for importing data using the parse importer.
However I have no unique identifier to these objects. Of course they have unique properties e.g. a url, the ids pointing to specific objects need to be constant.
What would be the best way to edit the large amount of data -in bulk- on their server without running into request issues (as I'm currently on the free pricing model) and without taking too much time to alter the data.
Option 1
Import the data once and export the data in JSON with the newly assigned objectIds. Then edit them locally matching the url then replace the class with the new edited data. Any new editions will receive a new objectId by Parse.
How much downtime between import and export will there be as I would need to delete the class and recreate it? Are there any other concerns with this methodology?
Option 2
Query for the URL or array of URLs and then edit the data then re-save. This means the data will persist indefinitely but as the edit will consist of hundreds of thousands of objects will this most likely over run the request limit?
Option 3
Is there a better option I am missing?
The best option is to upload to Parse then edit through their normal channels. Using various hacks it is possible to stay below the 30pings/second offered as part of the free tier. You can iterate over the data using background jobs (written in Javascript) -- you may need to slow down your processing so you don't hit limits. The super hacky way is to download from the table to a client (iOS/Android) app and then push back up to Parse. If you do this in batch (not a synchronous for loop, by the way), then the latency alone will keep you under the 30ping/sec limit.
I'm not sure why you're worried about downtime. If the data isn't already uploaded to Parse, can't you upload it, pull it down and edit it, and re-upload it -- taking as long as you'd like? Do this in a separate table from any you are using in production, and you should be just fine.
I have a node.js application that uses a simple JSON file as the model. Since this is an MVP with very limited data storage needs, I don't want to spend time designing and configuring a MongoDB database. Instead, I simply read from and write to a JSON file stored in /data directory of my Node JS application.
However, on Heroku, the JSON file appears to get reset (to the original file I'd deployed to Heroku) every so often. I don't know why this happens or how to turn off this behavior. Any help would be really appreciated, I need to fix this problem within the next four hours.
Heroku uses an ephemeral file system, so that's why it's going to vanish (every 24 hours, or thereabouts).
If you want to store something, you have to use an external backing store. Adding a free tier MongoDB database shouldn't take more than a few minutes. See here or here for examples.
I have a production Rails app that serves data from a set of tables that are built from a LOAD DATA LOCAL INFILE MYSQL import of CSV files, via a ruby script. The tables are consistently named and the schema does not change. The script drops/creates the tables and schema, then loads the data.
However I want to re-do how I manage data changes. I need a suggestion on how to manage new published data over time, since the app is in production, so I can (1) push data updates frequently without breaking the application servicing user requests and (2) make the new set of data "testable" before it is live (with the ability to roll back to the previous tables/data if something went wrong).
What I'm thinking is keeping a table of "versions" and creating a record each time a new rebuild is done. The latest version ID could be stuck into the database.yml, and each model could specify a table name from database.yml. A script could move the version forward or backward to make sure everything is ok on the new import, without destroying the old version.
Is that a good approach? Any patterns like this already? It seems similar to Rails' migrations somewhat. Any plugins or gems that help with this sort of data management?
UPDATE/current solution: I ended up creating database.yml configuration and creating the tables at import time there. The data doesn't change based on the environment, so it is a "peer" to the environment-specific config. Since there are only four models to update, I added the database connection explicitly:
establish_connection Rails.configuration.database_configuration["other_db"]
This way migrations and queries work as normal with Rails. To I can keep running imports, I update the database name in the separate config for each import. I could manually specify the previous database version this way and restart the app if there was a problem.
config = YAML.load_file(File.join("config/database.yml"))
config["other_db"]["database"] = OTHER_DB_NAME
File.open(path, 'w'){|f| f.write(config.to_yaml)}
One option would be to use soft deletes or an "is active" column. If you need to know when records were replaced/deleted, you can also add columns for date imported and date deleted. When you load new data, default "is active" to false. Your application can preview the newly loaded data by using different queries than the production application, and when you're ready to promote the new data, you can do it in a single transaction so the production application gets the changes atomically.
This would be simpler than trying to maintain multiple tables, but there would be some complexity around separating previously deleted rows and incoming rows that were just imported but haven't been made active.
I need to make a backup system for my rails app but this has to be a little special: It doesn't have to back up all the database info and files in a single file or folder but it has to back up the database info and attachment files per user. I mean, every one of this backups should be able to regenerate all the information and files for one single user.
My questions are:
Is this possible? What's the best way to do it? And, if it's impossible or a bad idea at all, why is it?
Note: The database is a MySQL one.
Note2: I used Paperclip for the users uploads.
Im guessing you have an app that backs up data, when a user clicks on something right? I'm thinking get all the info connected to the user(depends on how you did your user model, so maybe you should have a get_all_info method) then write it out in sql format to a file, which you save as .sql. (either using File.new or Logger.new)
I would dump the entire user object and related objects into a single xml file dump. As you go through the creation of the XML grab out all the files and write the XML + all files into one directory, then compress them.
I think there are definitely use cases to have a feature like this, but be sure to have it run in a background process and only when needed in order to not bog down the web server. Take a look at http://github.com/tobi/delayed_job or http://github.com/defunkt/resque.