So I have a MySQL table that needs to be updated on a daily basis, and the structure looks like the following:
siteid user proportion
1000 1 0.1
1000 2 0.5
1000 3 0.4
The problem is that other parts of the code need to access this table while it's being updated, and the update might take several minutes every day. This is what happens each day:
A task runs around 8am EST every day to update the above table based on yesterday's data. This update might take up to half an hour.
Users should have access to this table anytime to get the most recent updates
I have come up with the following ideas, but I'm pretty sure none of them would work for me:
Create a temp table to access while updating the main table: this does not sound like a good idea since there are five of those tables and it is not feasible to switch between the temp and main tables.
Put a halt on all operations while the table is being updated: this is impossible since the other code accessing the tables must always be up and running.
I would be very grateful for any help toward any possible solutions for this. I know this is broad and there might not be any right or wrong answers; I'm mostly interested in your experiences with similar situations.
NOTES:
These tables have millions of rows stored in them.
I am using MySQL here and, as mentioned before, I am not able to lock the tables since everything must be live.
The backend code is written in Python.
You could maybe accomplish this the following way:
Have two tables with the same structure, names ending in _A and _B.
Have a view that can be toggled on the fly to point to _A or _B (In Oracle we would use a synonym).
In your update procedure:
1. Copy data from _A to _B.
2. Point view to _A (update the view).
3. Do the updates to _B.
4. Point view to _B (update the view).
Next time you run, you'll repeat the process, but the _A references become _B and vice versa. You might want a way to track which tables are "current" (_A or _B).
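To make this concrete, here is a minimal sketch of the swap, assuming a pair of tables proportions_A / proportions_B and a view named proportions (all names are illustrative); in MySQL, CREATE OR REPLACE VIEW repoints the view quickly:
-- 1. Refresh the offline copy
TRUNCATE TABLE proportions_B;
INSERT INTO proportions_B SELECT * FROM proportions_A;
-- 2. Keep readers on _A while _B is rebuilt
CREATE OR REPLACE VIEW proportions AS SELECT * FROM proportions_A;
-- 3. Run the long daily update against proportions_B here ...
-- 4. Switch readers to the freshly updated table
CREATE OR REPLACE VIEW proportions AS SELECT * FROM proportions_B;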
Related
I am hosting a forum with "forum gold".
People trade this a lot, gift it, award people with it to "thank" or "like" posts, or to increase reputation.
However, I am concerned that there might be some exploit that allows people to hack gold into their forum account, so I added logging on EVERY forum gold transaction.
It works well. I can perform sum queries to assure that no unknown sources are introducing forum gold into the system, and to ensure that all forum gold awarded to users are accounted for.
However, it totally blew up. Within just a couple of days, I have more than 100,000 entries in the table. I also got mail from my web host about a slow MySQL query warning, for what is just a simple SELECT of a single record from that table, with no joins, ordering, or even functions like DATE_ADD().
So I want to completely export AND empty the table with the logs. Now, I normally back up the rest of my database via the "export" feature in phpMyAdmin. However, this table is highly active: anywhere from 10 to 50 new rows are added every second, and I want to keep the integrity and accuracy of my computations by not losing any records.
Is there an "atomic" way I can export then delete all records, with no transactions getting in between?
Okay, so I just ended up:
creating a new TEMP table,
selecting everything from the LOG table,
inserting it into the new TEMP table,
then deleting from LOG every record that also exists in the TEMP table,
exporting the TEMP table
doing a global replace of "INSERT INTO `temp`" with "INSERT INTO `log`" in the export
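In SQL, the approach looks roughly like this, assuming the table is called log with a primary key id (both names are assumptions); because the DELETE joins against the snapshot, rows inserted after the copy are left untouched:
CREATE TABLE temp LIKE log;              -- same structure as the log table
START TRANSACTION;                       -- optional, if the table is InnoDB
INSERT INTO temp SELECT * FROM log;      -- snapshot the current rows
DELETE log FROM log
  JOIN temp ON temp.id = log.id;         -- remove only the snapshotted rows
COMMIT;
-- temp can now be exported at leisure and dropped afterwards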
During an upgrade from Magento 1.5 to 1.7, unfortunately we had to reinstall Magento (don't ask), and now I need to get all of the old order information into the live upgrade. I've researched several scenarios. The first would be identifying the corresponding tables in the database and migrating those tables over, but I have three issues with that. One: I already did a little bit of that, and because of the discrepancies between 1.5 and 1.7 it caused several hours of debugging fun. Two: I can't figure out specifically which of these freaking tables needs updating (I was going to just replace all sales_ tables). Three: since the upgrade, other orders have been placed, and as you know, the new install started order IDs all over again, and I don't want those entries to get replaced.
My other choice is to attempt to build an extension like this one: http://www.magentocommerce.com/magento-connect/dataflow-batch-import-export-orders-to-csv-xml.html. I already started, but alas I am already stuck on the OAuth process.
Before I waste any more time, I'd like some advice. What would be the best way to go about this process?
Update 1-17
I have tried UNION queries on the applicable tables, but of course I get the error "#1062 - Duplicate entry '1' for key 'PRIMARY'" since there are clashing primary keys. Is there a query to increment the primary keys of the new orders so they follow after the IDs of the old orders? I tried to do this on individual columns via UPDATE sales_flat_invoice_grid SET increment_id = (increment_id+6150) or similar, but the IDs are mapped to the IDs in other tables! Please help! I'm afraid I'm going to have to tell my boss that we need to buy that extension.
Continued from comment above ^^
Hmmm... if that's the case, the way I've done exports/imports in the past (see my post HERE) is to use the MySQL Workbench EER Modeling tool to create a diagram of the order storage system. I would select only one table at first, and MySQL will tell you what other tables are tied to that table. I repeat this process so that my EER diagram has no unnecessary tables and isn't missing any. I end up with exactly the number of tables I need to understand the flow.
Next, do the same for your 1.7 setup... and compare. You will need to generate a SQL query that will not only INSERT the 1.5 orders into the 1.7 database; you may also need to create your own link table in case an ID is already in use, linking X to Y by way of Z. It's messy, and a little confusing, but I spent 3 weeks writing a SQL/PHP script that pulled 1) product data, 2) category data, 3) customer data and 4) order data. I also wrote one for admin backend users and preferences. This was before I discovered the aforementioned tool.
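As a very rough illustration of the ID problem (not a drop-in solution), the general idea is to add an offset to the old IDs on the way in so they land above the new install's IDs; the table and columns below are only an example from the sales_flat_order area, and every dependent table would need the same offset applied to its foreign keys:
-- pick an offset above the highest order ID already in the 1.7 database
SET @offset := (SELECT MAX(entity_id) FROM new_db.sales_flat_order);
-- copy the 1.5 orders across with the offset applied (column list shortened)
INSERT INTO new_db.sales_flat_order (entity_id, increment_id, customer_id, grand_total)
SELECT entity_id + @offset, increment_id, customer_id, grand_total
FROM old_db.sales_flat_order;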
EDIT: I might add, that extension is only $200.00. How much development time is it going to cost your employer for you to develop your way out of this? Depending on your hourly cost (I know with what I charge, $200 would be a steal), it may make more sense to get the extension.
I'm pretty sure this particular quirk isn't a duplicate so here goes.
I have a table of services. In this table, I have about 40 rows with the following columns:
Services:
id_Services -- primary key
Name -- name of the service
Cost_a -- for variant a of service
Cost_b -- for variant b of service
Order -- order service is displayed in
The user can go into an admin tool and update any of this information - including deleting multiple rows, adding a row, editing info, and changing the order they are displayed in.
My question is this: since I will never know how many rows will be coming in from a submission (there could be 1 more or 100% fewer), I was wondering how to address this in my query.
Upon submission, every value is resubmitted. I'd hate to do it this way, but the easiest way I can think of is to truncate the table and reinsert everything... but that seems a little... uhhh... bad! What is the best way to accomplish this?
RE-EDIT: For example, I start with 40 rows and update with 36. I still have to do something with the values in rows 37-40. How can I do this? Are there any MySQL tricks or functions that will do this for me?
Thank you very much for your help!
You're slightly limited by the use case; you're doing insertion/update/truncation that's presented to the user as a batch operation, but in the back-end you'll have to do these in separate statements.
Watch out for concurrency: use transactions if you can.
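As a hedged sketch, one submission could be applied in a single transaction along these lines, assuming the rows come back from the admin form with their id_Services values (new rows without an ID would be plain INSERTs); the sample values are made up, and Order needs backticks because it is a reserved word:
START TRANSACTION;
-- update existing rows / insert new ones that came back from the form
INSERT INTO Services (id_Services, Name, Cost_a, Cost_b, `Order`)
VALUES (1, 'Basic service', 25.00, 40.00, 1),
       (2, 'Extended service', 55.00, 80.00, 2)
ON DUPLICATE KEY UPDATE
    Name = VALUES(Name),
    Cost_a = VALUES(Cost_a),
    Cost_b = VALUES(Cost_b),
    `Order` = VALUES(`Order`);
-- remove the rows the user deleted, i.e. ids not present in this submission
DELETE FROM Services WHERE id_Services NOT IN (1, 2);
COMMIT;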
I was wondering what would be the best solution for dynamically archiving rows. For instance, when a user marks a task as completed, that task needs to be archived yet still be accessible.
What would be best practice for achieving this? Should I just leave it all in the same table and leave completed tasks out of the queries? I'm afraid that over time the table will become huge (1,000,000 rows in a year or less). Or should I create another table, e.g. task_archive, and query that table whenever data is needed from it?
I know similar questions have been asked before, but most of them were about archiving thousands of rows simultaneously. I just need to know the best method (and why) to archive one row at a time once it's been marked completed.
For speed and ease of use, I would generally leave the row in the same table (and flag it as completed) and then later move it to an archive table. This way the user doesn't incur the delay of making that move on the spot; the move can happen as a batch process during non-busy periods.
When that move should happen depends on your application. For example, if they have a dashboard widget that shows "Recently Completed Tasks" that shows all of the tasks completed in the past week (and lets them drill in to see details), it might make sense to move the rows to the archive a week after they've been completed. Or if they frequently need to look at tasks from the current semester (for an academic app) but rarely for previous semesters, make the batch move happen at the end of the semester.
If the table is indexed, 1,000,000 rows shouldn't be that big a deal, honestly.
You could use a trigger to capture that the task was marked completed, remove the row from the current table, and insert it into the archive table.
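A rough sketch of that trigger, assuming a task table with a completed flag and a task_archive table (all names are assumptions); note that a MySQL trigger cannot modify the table it fires on, so the delete from the live table would still have to happen in a separate statement or batch job:
DELIMITER //
CREATE TRIGGER trg_task_completed
AFTER UPDATE ON task
FOR EACH ROW
BEGIN
    -- fires only when the row flips to completed
    IF NEW.completed = 1 AND OLD.completed = 0 THEN
        INSERT INTO task_archive (id, name, completed_at)
        VALUES (NEW.id, NEW.name, NOW());
    END IF;
END//
DELIMITER ;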
Or, you could create a stored procedure that performs the archive. For example:
DELIMITER //
CREATE PROCEDURE sp_markcompleted(IN taskid INT)
BEGIN
    START TRANSACTION;
    INSERT INTO newtable SELECT * FROM oldtable WHERE id = taskid;  -- copy to archive
    DELETE FROM oldtable WHERE id = taskid;                         -- remove from live table
    COMMIT;
END//
DELIMITER ;
I have a model Post which has an expiry_date. I want to know the best way to manage scalability in this case. There are 2 options:
1. Whenever I want to SELECT from the table, I need to include WHERE expiry_date > NOW(). If the Post table grows like a monster, I will be in trouble. Imagine after 3 years or more. The indexes will be huge too.
2. Have a trigger, cron job, or a plugin (if one exists) that goes through the table and moves expired items to a new table Post_Archive. That way, I maintain only current Posts in my main table, which means that after 3 years I won't be as badly off as with option 1.
If you need to archive data on a continuous basis (your #2), then a good option is Maatkit.
http://www.maatkit.org/
It can "nibble" away at data in chunks rather than running mass queries which consume lots of resources (and avoiding polluting your key cache).
So yes, you would run a Maatkit job from cron.
In the meantime, if you also want to do #1, you could implement a view which conveniently wraps up the "WHERE expiry_date > NOW()" condition so you don't have to repeat it all over your code.
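Something along these lines, as a minimal sketch (the view name is just an example):
CREATE VIEW active_posts AS
SELECT * FROM Post WHERE expiry_date > NOW();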
A cron job sounds good to me, and it can be done by feeding a simple script directly to the mysql command, e.g., roughly:
CREATE TEMPORARY TABLE Moving
SELECT * FROM Post WHERE expiry_date <= NOW();
INSERT INTO Post_Archive
SELECT * FROM Moving;
DELETE FROM Post
WHERE id IN (SELECT id FROM Moving);
DROP TEMPORARY TABLE Moving;