I have a legacy system with MySQL as the backend and Python as the primary programming language.
We recently hit a scenario where we need to display a dashboard with information from the MySQL database. The data in the table changes every second.
Think of it as similar to a bidding application where people bid constantly. Every time a user bids, a record goes into the database. When a user updates his bid, it updates the previous value.
I also have a few clients who monitor this dashboard, which updates the statistics.
I need to order this data in real time, as people bid in real time.
I'd prefer not to run queries against MySQL, because at any second I may have a few thousand clients querying the database, which will create load on the database.
Please advise.
If you need to collect and order data in realtime you should be looking at the atomic ordered map and ordered list operations in Aerospike.
I have examples of using KV-ordered maps at rbotzer/aerospike-cdt-examples.
You could use a similar approach with the user's ID being the key, the bid being a list with the structure [1343, { foo: bar, ts: 1234, this: that} ]. The bid amount in cents (integer) is the first element of the list, all the other information is in a map in the second element position.
This will allow you to update a user's bid with a single map operation, get back the user's bid with a single operation, order by rank (on the bid amount) to get the top bids in order, get all the bids in a particular range, etc. You would have one record per item, with all the bids in this KV-ordered map.
Related
I am building an order management system for an online store and would like to store information about the Product being ordered.
If I use a Foreign Key relationship to the Product, when someone changes the price, brand, supplier etc. of the Product or deletes it, the Order will be affected as well. I want the order management system to be able to display the state of the Product when it was ordered even if it is altered or deleted from the database afterwards.
I have thought about it long and hard and have come up with ideas such as storing a JSON string representation of the object; creating a duplicate Product whose foreign key I then use for the Order etc. However, I was wondering if there is a best practice or what other people use to handle this kind of situation in commercial software?
PS: I also have other slightly more complex situations, for instance, I would like the data for a User object attached to the Order to change as the User changes but then never get deleted when the User is deleted. An answer to the above question would definitely give me a good starting point.
This price-change problem is commonly handled in RDBMS (SQL) commerce applications by doing two things.
1) Inserting rows into an order_detail table when an order is placed. Each row of that table contains the particulars of the item as sold: item_id, item_count, unit_price, total_price, unit_weight, total_weight, tax_status, and so forth. So the app captures what actually was sold, and at what price. A later price change doesn't mess up sales records. You really have to do this.
2) A price table containing item_id, price, start_date, and end_date. You retrieve the current price with something like this:
SELECT item.item, price.price
FROM item
JOIN price ON price.item_id = item.item_id
AND price.start_date <= NOW()
AND (price.end_date > NOW() OR price.end_date IS NULL)
This approach allows you to keep track of historical prices, and also to set up future price changes. But you still copy the price into the order_detail table.
The point is: once you've accepted an order, its details cannot change in the future. You copy the actual customer data (name, shipping address, etc.) from your current customer table into a separate order table when you accept the order, and (as mentioned above) the details of each item into an order_detail table.
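A rough sketch of that order_detail capture, using the columns from point 1 (the types, and the orders and item tables it references, are illustrative):

CREATE TABLE order_detail (
    order_detail_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id        INT NOT NULL,
    item_id         INT NOT NULL,
    item_count      INT NOT NULL,
    unit_price      DECIMAL(10,2) NOT NULL,   -- price as sold, copied at order time
    total_price     DECIMAL(10,2) NOT NULL,
    unit_weight     DECIMAL(10,3),
    total_weight    DECIMAL(10,3),
    tax_status      VARCHAR(20),
    FOREIGN KEY (order_id) REFERENCES orders (order_id),
    FOREIGN KEY (item_id)  REFERENCES item (item_id)
);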
Your auditors will hate you if you don't do this. Ask me how I know that sometime.
I would recommend creating attributes on the Order model and extracting the data you need, one by one, into those attributes while you are saving the model. Then implement a historical data table where you store JSONFields (or some other representation) of the Product when it is created or updated, so people can refer to the historical data table if need be. This is more efficient than storing a full-fledged representation of the Product in the Order object, because the time taken to create the historical data is essentially charged to the admin creating the Product rather than to the customer creating the Order. You can even create historical data objects in the background using threads etc. when you get to those advanced levels.
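As a rough illustration of that historical data table (table and column names are mine, and the JSON column needs MySQL 5.7+):

CREATE TABLE product_history (
    product_history_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id         INT NOT NULL,
    snapshot           JSON NOT NULL,           -- full Product state at this point in time
    recorded_at        DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);

-- written whenever the Product is created or updated (e.g. from a save signal)
INSERT INTO product_history (product_id, snapshot)
VALUES (42, JSON_OBJECT('name', 'Widget', 'price', 19.99, 'brand', 'Acme'));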
While it is hard to answer your question without seeing your models.py, at the very least I will suggest archiving the results. You can add a boolean field called historical which defaults to False. When an order is made, you set the previous order's (or orders') historical value to True in your viewset or view function.
Here, historical=True means the record has been archived. You can filter on this historical column to display what you want, when you want it. Sorry, this is just a high-level outline.
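In plain SQL terms (table and column names are illustrative), that amounts to something like:

ALTER TABLE orders ADD COLUMN historical BOOLEAN NOT NULL DEFAULT FALSE;

-- when a new order supersedes older ones for the same user
UPDATE orders SET historical = TRUE WHERE user_id = 42 AND historical = FALSE;

-- current (non-archived) records
SELECT * FROM orders WHERE historical = FALSE;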
I want to revisit a project I made that stores user data in a database and improve the way the data is stored. I originally went about it the hard way and stored user data in JSON format within a MySQL field, which makes it difficult to perform CRUD actions. The reason I did this was to keep all of a user's data within that user's field, and I was reasonably new to this.
I didn't want to store the data mixed with other users' data, as I thought there might be issues as the number of users increased. For example:
If I had 1000 users with 500 rows of data each, that's 500,000 rows to sort through when reading the data and displaying it on a web page. Is there a risk of mixing the data up, or of performance issues?
I basically just want a user table that stores the user's id, name, and credentials, and then another table that stores data from a user's activity (a run). So there would be at least 5 fields for each event: time, location, date, duration, etc. This would be saved for different events (runs), which could end up in the hundreds over a period of time.
My question is: should I design the tables as above, or would it be better to have a table for each user? Or are there other options that I have not explored?
Given the information shared, I believe the design below may be suitable.
Create a table called User_Details with columns id (auto increment), user id, name, and credentials.
Now create a User_Activity table with these columns: id, user_id, event name, data (a JSON field).
Explanation:
The User_Activity table will store the event data for each user, related to the User_Details table through the user_id field. The data column, which is a JSON field, will hold all the fields for the event. Because it is a JSON field, the database will let you dump any number of fields for the event, which may or may not be structured. You can then map this in your middle layer as required.
Also, if you have a finite number of events, you can create a table called user_event_types with columns id and event name, and then in the user_activity table refer to that id instead of the event name.
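A minimal DDL sketch of that layout (types are illustrative, the JSON type needs MySQL 5.7+, and I've used the auto-increment id as the user id):

CREATE TABLE user_details (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    credentials VARCHAR(255) NOT NULL
);

CREATE TABLE user_activity (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    event_name VARCHAR(50) NOT NULL,
    data       JSON NOT NULL,                -- time, location, date, duration, etc.
    FOREIGN KEY (user_id) REFERENCES user_details (id)
);

-- all runs for one user
SELECT event_name, data FROM user_activity WHERE user_id = 42;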
The title is somewhat hard to understand, so here is the explanation:
I am building a system that deals with retail transactions, meaning purchases. I have a database of products, where each product has an ID that is also known to the POS system. When a customer makes a purchase, the data is sent to the back-end for parsing and is saved. Everything is fine and dandy until there is a change to a product's name, since my client wants to see the name of the product as it was when it was purchased.
How do I save this data while also keeping a nice, normalized database?
Solutions I could think of are:
De-normalization, where we correlate the incoming data with the info we have in the database and then save only the final text values, not IDs.
Versioning, where we keep multiple versions of every product and save each transaction with the ID of the product version that was current when it came in. The problem with this one is that as our retail store chain grows, and more and more changes happen to the products, the complexity of the whole product will greatly increase.
Any thoughts on this?
This is called a slowly changing dimension.
Either solution that you mention works. My preference is the second, versioning. I would have a product table that has an effdate and enddate on each record. You can easily find the current record (where enddate is null) or the record at any point in time.
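A rough sketch of that versioned product table and the point-in-time lookup (names follow the effdate/enddate convention above; the timestamp is just an example):

CREATE TABLE product_version (
    product_version_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id         INT NOT NULL,
    name               VARCHAR(100) NOT NULL,
    price              DECIMAL(10,2) NOT NULL,
    effdate            DATETIME NOT NULL,
    enddate            DATETIME NULL               -- NULL = current version
);

-- current version
SELECT * FROM product_version WHERE product_id = 42 AND enddate IS NULL;

-- version in effect at the time of a given transaction
SELECT * FROM product_version
WHERE product_id = 42
  AND effdate <= '2024-03-01 10:15:00'
  AND (enddate > '2024-03-01 10:15:00' OR enddate IS NULL);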
The first method always strikes me as more "quick-and-dirty", but it also works. It just gets cumbersome when you have more fields and more objects you are trying to track. In general, though, it wins on performance.
If the name has to be the name as it was originally, the easiest, simplest and most reliable way to do that is to save the name of the product in the invoice line item record.
You should still link to the product with a ProductID, of course.
If you want to keep a history of name changes, you can do that in a separate table if you wish:
ProductNameID
ProductID
Date
Description
And store a ProductNameID with the invoice line item.
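Sketched out (table and column names are illustrative, based on the list above):

CREATE TABLE invoice_line_item (
    invoice_line_item_id INT AUTO_INCREMENT PRIMARY KEY,
    invoice_id           INT NOT NULL,
    product_id           INT NOT NULL,              -- still link to the product
    product_name         VARCHAR(200) NOT NULL,     -- the name as it was when sold
    quantity             INT NOT NULL,
    unit_price           DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);

-- optional history of name changes; if you go this route, store a
-- product_name_id on the invoice line item instead of the copied name
CREATE TABLE product_name (
    product_name_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id      INT NOT NULL,
    date            DATETIME NOT NULL,
    description     VARCHAR(200) NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);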
We have a requirement in our application where we need to store references for later access.
Example: a user can commit an invoice at any time, and all the references this invoice contains (customer address, calculated amount of money, product descriptions) and its calculations should be stored over time.
We need to hold the references somehow, but what if, for example, the product name changes? Somehow we need to copy everything so it is documented for later and not affected by future changes. Even when products are deleted, they still need to be reviewable later as part of the stored invoice.
What is the best practice here regarding database design? Also, what is the most flexible approach, e.g. when the user wants to edit his invoice later and restore it from the DB?
Thank you!
Here is one way to do it:
Essentially, we never modify or delete the existing data. We "modify" it by creating a new version. We "delete" it by setting the DELETED flag.
For example:
If a product's price changes, we insert a new row into PRODUCT_VERSION, while old orders stay connected to the old PRODUCT_VERSION and the old price.
When a buyer changes their address, we simply insert a new row into CUSTOMER_VERSION and link new orders to that, while keeping the old orders linked to the old version.
If a product is deleted, we don't really delete it - we simply set the PRODUCT.DELETED flag, so all the orders historically made for that product stay in the database.
If a customer is deleted (e.g. because (s)he requested to be unregistered), set the CUSTOMER.DELETED flag.
Caveats:
If the product name needs to be unique, that can't be enforced declaratively in the model above. You'll either need to "promote" the NAME from PRODUCT_VERSION to PRODUCT, make it a key there and give up the ability to "evolve" a product's name, or enforce uniqueness on only the latest PRODUCT_VERSION (probably through triggers).
There is a potential problem with the customer's privacy. If a customer is deleted from the system, it may be desirable to physically remove their data from the database, and just setting CUSTOMER.DELETED won't do that. If that's a concern, either blank out the privacy-sensitive data in all the customer's versions, or alternatively disconnect existing orders from the real customer and reconnect them to a special "anonymous" customer, then physically delete all the customer versions.
This model uses a lot of identifying relationships. This leads to "fat" foreign keys and could be a bit of a storage problem, since MySQL doesn't support leading-edge index compression (unlike, say, Oracle); on the other hand, InnoDB always clusters the data on the PK, and this clustering can be beneficial for performance. It also makes some JOINs unnecessary.
Equivalent model with non-identifying relationships and surrogate keys would look like this:
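The original diagram isn't reproduced here, but a rough DDL sketch of that surrogate-key variant might look like the following (table and column names are illustrative):

CREATE TABLE product (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    deleted    BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE product_version (
    product_version_id INT AUTO_INCREMENT PRIMARY KEY,
    product_id         INT NOT NULL,
    name               VARCHAR(100) NOT NULL,
    price              DECIMAL(10,2) NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (product_id)
);

CREATE TABLE customer (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    deleted     BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE customer_version (
    customer_version_id INT AUTO_INCREMENT PRIMARY KEY,
    customer_id         INT NOT NULL,
    name                VARCHAR(100) NOT NULL,
    address             VARCHAR(200) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);

CREATE TABLE orders (
    order_id            INT AUTO_INCREMENT PRIMARY KEY,
    customer_version_id INT NOT NULL,     -- the customer data as it was at order time
    order_date          DATETIME NOT NULL,
    FOREIGN KEY (customer_version_id) REFERENCES customer_version (customer_version_id)
);

CREATE TABLE order_item (
    order_item_id      INT AUTO_INCREMENT PRIMARY KEY,
    order_id           INT NOT NULL,
    product_version_id INT NOT NULL,      -- the product data as it was at order time
    quantity           INT NOT NULL,
    FOREIGN KEY (order_id)           REFERENCES orders (order_id),
    FOREIGN KEY (product_version_id) REFERENCES product_version (product_version_id)
);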
You could add a column in the product table indicating whether or not it is being sold. Then when the product is "deleted" you just set the flag so that it is no longer available as a new product, but you retain the data for future lookups.
To deal with name changes, you should be using ID's to refer to products rather than using the name directly.
You've opened up an eternal debate between the purist and practical approach.
From a normalization standpoint of your database, you "should" keep all the relevant data. In other words, say a product name changes, save the date of the change so that you could go back in time and rebuild your invoice with that product name, and all other data as it existed that day.
A "de"normalized approach is to view that invoice as a "moment in time", recording in the relevant tables data as it actually was that day. This approach lets you pull up that invoice without any dependancies at all, but you could never recreate that invoice from scratch.
The problem you're facing is, as I'm sure you know, a result of database normalization. One of the approaches to resolving this can be taken from Business Intelligence techniques - archiving the data in a de-normalized state in a Data Warehouse.
Normalized data:
Orders table
    OrderId
    CustomerId

Customers table
    CustomerId
    Firstname
    etc.

Items table
    ItemId
    ItemName
    ItemPrice

OrderDetails table
    ItemDetailId
    OrderId
    ItemId
    ItemQty
    etc.
When queried and stored de-normalized, the data warehouse table looks like:
    OrderId
    CustomerId
    CustomerName
    CustomerAddress
    (other Customer fields)
    ItemDetailId
    ItemId
    ItemName
    ItemPrice
    (other OrderDetail and Item fields)
Typically, there is either some sort of scheduled job that pulls data from the normalized tables into the Data Warehouse on a schedule, OR, if your design allows, it could be done when an order reaches a certain status (such as Shipped). It could also be that the records are stored at each change of status (with a field called OrderStatus tracking the current status), so the fully de-normalized data is available for each step of the order/fulfillment process. When and how to archive the data into the warehouse will vary based on your needs.
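The scheduled job itself might boil down to something like this (the order_warehouse table, and the Lastname, Address and OrderStatus columns, are stand-ins for whatever your tables actually hold):

INSERT INTO order_warehouse
    (OrderId, CustomerId, CustomerName, CustomerAddress,
     ItemDetailId, ItemId, ItemName, ItemPrice, ItemQty)
SELECT o.OrderId, c.CustomerId,
       CONCAT(c.Firstname, ' ', c.Lastname), c.Address,
       od.ItemDetailId, i.ItemId, i.ItemName, i.ItemPrice, od.ItemQty
FROM Orders o
JOIN Customers c     ON c.CustomerId = o.CustomerId
JOIN OrderDetails od ON od.OrderId   = o.OrderId
JOIN Items i         ON i.ItemId     = od.ItemId
WHERE o.OrderStatus = 'Shipped'
  AND o.OrderId NOT IN (SELECT OrderId FROM order_warehouse);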
There is a lot of overhead involved in the above, but the other common approach I'm aware of carries even MORE overhead.
The other approach would be to make the tables read-only. If a customer wants to change their address, you don't edit their existing address; you insert a new record.
So if my address is AddressId 12 when I first order on your site in January, and then I move on July 4, I get a new AddressId tied to my account. (Say AddressId 123123, because your site is very successful and has attracted a ton of customers.)
Orders I placed before July 4 would have AddressId 12 associated with them, and orders placed on or after July 4 have AddressId 123123.
Repeat that pattern with every table that needs to retain historical data.
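A minimal sketch of that insert-only pattern (names are illustrative):

CREATE TABLE customer_address (
    AddressId  INT AUTO_INCREMENT PRIMARY KEY,
    CustomerId INT NOT NULL,
    Address    VARCHAR(200) NOT NULL,
    CreatedOn  DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- moving house: never UPDATE, always INSERT a new row
INSERT INTO customer_address (CustomerId, Address)
VALUES (7, '742 Evergreen Terrace');

-- each order stores the AddressId that was current when it was placed,
-- so old orders keep pointing at the old address row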
I do have a third approach, but it is hard to search for. I use it in one app only, and it actually works out pretty well in this single instance, which had some pretty specific business needs for reconstructing the data exactly as it was at a specific point in time. I wouldn't use it unless I had similar business needs.
At a specific status, serialize the data into an XML document, or some other document format you can use to reconstruct the data. This allows you to save the data as it was at the time it was serialized, retaining the original table structure and relations.
When you have time-sensitive data, you use things like the Product and Customer tables as lookup tables and store the information directly in your Orders/OrderDetails tables.
So the order table might contain the customer name and address, and the details would contain all relevant information about the product, especially the price (you never want to rely on the product table for price information beyond the initial lookup at the time of the order).
This is NOT denormalizing; the data changes over time but you need the historical value, so you must store it at the time the record is created or you will lose data integrity. You don't want your financial reports to suddenly indicate you sold 30% more last year because of price updates. That's not what you sold.
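In practice that means something along these lines when the order line is written (table and column names are illustrative):

-- copy the current product name and price into the order line at order time
INSERT INTO order_details (order_id, product_id, product_name, unit_price, quantity)
SELECT 1001, p.product_id, p.product_name, p.price, 3
FROM products p
WHERE p.product_id = 42;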
I am creating a new DB in MySQL for an application and wondered if anyone could provide some advice on the following setup. I'll try to simplify things as best I can.
This DB is designed to store alerts which are related to specific items created by a user. In turn there is the need to store notes related to the items and/or alerts. At first I considered the following structure...
USERS table - to store basic app user info (e.g. user_id, name, email) - this is the only bit I'm fairly certain does not need to be changed
ITEMS table: contains info on particular item (4 fields or so). Contains user_id to indicate which user created/owns this item
ALERTS table: contains info on the alert, item_id to indicate which item the alert is related to, contains user_id to indicate which user created alert
NOTES table: contains note info, user_id of note owner, item_id if associated with an item, alert_id if associated with alert
Relationships:
An item does not always have an alert associated with it
An item or alert does not always have a note associated with it
An alert is always associated with an item. More than one alert can be associated with the same item.
A note is always associated with an item or alert. More than one note can be associated with the same item or alert.
Once first created, item info is unlikely to be updated by a user.
For argument's sake, let's say that each user will create an average of 10 items, each item will have an average of 2 alerts associated with it, and there will be an average of 2 notes per item/alert.
Very common queries that will be run:
1) Return all items created by a particular user, with any associated alerts and notes. Given a user_id, this query would span 3 tables.
2) Checking each day for alerts that need to be sent to a user's email address: WHERE alert date == today, return the user's email address, item name and any associated notes. This would require a query spanning 4 tables, which is why I'm wondering if I need to take a different approach...
Option 1) One table to cover items, alerts and notes, with a user_id owner for each row. Every time you add a note to an item or alert you are repeating the alert and/or item info. Seems a bit wasteful, but item and alert info won't be large.
Option 2) I don't foresee the need to query notes (famous last words?), so how about serializing note data so that multiple notes are stored in one row in either the item or alert table (or just a combined alert/item table)?
Option 3) Anything else you can think of? I'm asking this question as each option I've considered doesn't feel quite right.
I appreciate that this is currently a small project, so performance shouldn't be of great concern and I should just go with the 4 tables. It's more that my common queries will end up being relatively complex, which makes me think I need to re-evaluate the structure.
I would say that the common wisdom is to normalize to start and denormalize only when performance data suggest that it's necessary.
Make sure that your tables are indexed properly, with foreign key relationships for JOINs.
If you think you'll end up with a lot of data, this might be a good time to think about a partitioning strategy. Partitioning your fast-growing tables by time would be a good first step.
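To make that concrete, your daily-alert query (#2) over the normalized four-table layout stays fairly simple; a hedged sketch, assuming columns such as alerts.alert_date, users.email and notes.note_text:

SELECT u.email, i.item_name, n.note_text
FROM alerts a
JOIN items i ON i.item_id = a.item_id
JOIN users u ON u.user_id = a.user_id
LEFT JOIN notes n ON n.alert_id = a.alert_id
WHERE a.alert_date = CURDATE();

-- index alerts.alert_date so the daily scan stays cheap, and index the
-- foreign-key columns (item_id, user_id, alert_id) used in the JOINs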
Four tables is not complex. I commonly write report queries that hit 15 or more tables in a database structure that has hundreds of tables (most with millions of records) and I wouldn't even say our dbs are anything more than medium sized (a typical db in our system might have around 200 gigs of data, so not large at all as databases go). Because they are properly indexed, they still run fast unless I am doing very complex calculations. Normalize, don't even consider denormalizing until you are an experienced database designer who knows better than to worry about the number of tables.