Storing incremental prices in MongoDB - mysql

I'm using MongoDB and MySQL for different aspects of an e-commerce site.
One of the features is 'bidding'. The price goes up with each bid.
There are several ways I could do this. I could have a single column that is updated with the new 'price', or I could store each new price separately and get the latest price based on the date, which requires an ORDER BY. Also, each new price will be based on the current high price, so I'll need to know the current high price.
I'd like to keep this in the MongoDB portion, but I'm not sure of the best way to handle this.
Any suggestions would be great!
Thank you!

You can atomically update documents in MongoDB. There's an $inc operator, so you can atomically update a document's "max price" while also $pushing the last bidder, the date, and the price increase to an array, for example. This way you'll never be in danger of having an inconsistent auction document. Using safe mode (acknowledged writes) is necessary too.
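As a rough sketch of what such an update could look like, the snippet below only builds the filter and update documents in plain Python, so it runs without a database. The field names (`price`, `bids`, `bidder`, `increment`, `at`) and the helper `build_bid_update` are made up for illustration. Matching on the expected price is an extra assumption on my part, so that a bid only applies if it was based on the current high price.

```python
from datetime import datetime, timezone

def build_bid_update(auction_id, bidder, expected_price, increment, when=None):
    """Return (filter, update) documents for one atomic bid (hypothetical schema)."""
    when = when or datetime.now(timezone.utc)
    # Matching on the expected price means the update matches nothing if
    # someone else has bid in the meantime, so every bid builds on the
    # current high price.
    query = {"_id": auction_id, "price": expected_price}
    update = {
        # Raise the price and record the bid in the same single-document update.
        "$inc": {"price": increment},
        "$push": {"bids": {"bidder": bidder, "increment": increment, "at": when}},
    }
    return query, update

query, update = build_bid_update("auction-1", "alice", 100, 5)
```

With PyMongo, this pair would be passed as `collection.update_one(query, update)`. A `modified_count` of 0 on the result would mean the price had already changed, and the bid should be retried against the new price.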
Splitting bids into separate documents which you then assemble to find the current price is another solution. It really depends on how much state you're tracking with the bids.

Related

MySQL - Partitioning vs multiple table suggestion for a use case

We have around 30,000 customers, and each customer has multiple products. We currently store all the products in a single table partitioned by KEY(customerid). I would like your suggestions on whether separate tables for each customer would be more beneficial than partitioning, or whether we should continue to use partitioning with the current (HASH) type or a different type.
The number of products per customer varies: a few customers have more than 1M products, while some have as few as a few hundred. This may result in unevenly sized partitions.
If a customer account is deleted, all products of that customer are deleted too. With separate tables, this would be quite convenient.
All customers are disjointed. So there is no query to access cross-customer products.
The number of customers is quite large (around 30k), and I am not sure it's a good idea to have so many tables.
Is any other partitioning scheme better than what we are currently using?
Thank you for your inputs.
Generally I would go with the single-table solution that you already have; it's the simple, straightforward way to go.
You don't mention your motivation for wanting to change your setup.
How many entries do you have in your products table?
Are you experiencing performance issues with your current setup? If not I might be inclined to call this a case of "premature optimization".
If you ARE experiencing performance issues I would start by analyzing those first (profiling) to determine whether they are caused by your single products table design being a bottleneck.
Practical advice I can offer: make sure you are using the InnoDB storage engine and not MyISAM, since InnoDB allows row-level locks.
The downside to your proposal of having one table for each customer is maintenance and complexity. If you ever want to change the schema of the product tables, it will be a much more complicated and error-prone task than before. You might have to write a script to batch the changes to all those tables, and what if the script crashes halfway? Then half of your customers have a changed table schema and the other half don't. As I mentioned, if you do not currently have a performance problem, you would be adding this complexity and maintenance without gaining anything.
You state that "All customers are disjointed. So there is no query to access cross-customer products." However, it might not stay that way forever. Imagine that in 2 months you need to extract a list of all customers who own a specific product of type x. That would be a simple SQL query in your current setup; in the multi-table setup you would have to write a script or small program that iterates over all customers and makes a product query for each one. So what was 1 query before is now 30,000 queries.
What you propose is a simple form of sharding. If you decide to go that way, you may want to look into sharding, since there are other approaches than the somewhat aggressive one of giving every customer a dedicated table. E.g. use a hash of each customer id as the sharding key, so every customer is part of either group A or group B. Products owned by A-customers are in ProductTableA, products owned by B-customers are in ProductTableB. (In a real implementation you may want to hash to a value between 0 and 255 and then keep a reference list saying that 0-127 are table A and 128-255 are table B. That way, if you ever decide to scale up and add one more table, you don't have to recalculate all your hashes; you just update your reference list.)
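The hash-plus-reference-list idea could be sketched roughly like this. The choice of SHA-256 and the names `BUCKET_RANGES`, `bucket_for` and `table_for` are assumptions for the example; any stable hash would do.

```python
import hashlib

# Hypothetical reference list: inclusive bucket ranges mapped to table names.
# Adding a table later means editing this list, not rehashing customers.
BUCKET_RANGES = [
    (0, 127, "ProductTableA"),
    (128, 255, "ProductTableB"),
]

def bucket_for(customer_id: int) -> int:
    """Map a customer id to a stable bucket in 0..255."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return digest[0]  # first byte of the digest is already in 0..255

def table_for(customer_id: int) -> str:
    """Look up which product table holds this customer's rows."""
    bucket = bucket_for(customer_id)
    for low, high, table in BUCKET_RANGES:
        if low <= bucket <= high:
            return table
    raise LookupError(f"no table configured for bucket {bucket}")
```

A stable hash is used instead of Python's built-in `hash()` because the bucket has to be the same every time the program runs.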

Same table or different table?

My application tracks purchases and sales of inventory. I can't decide if I should use separate tables with auto-increment or the same table with a distinguishing type field with a manual auto-increment id. The tables would store close to identical data. I'm worried that combining the two tables would make it harder to visualize inventory movement in the future. I understand that this is purely for human comfort but I'm not sure if there are other performance related issues as well. I would like to hear opinions from both ends.
Suppose I decide to combine my sales invoice and purchase order tables into the same table. There is a single difference in the columns required: purchase orders store the tax paid, while sales invoices use a bool for whether the order is taxed. I have two options:
Use two fields - one bool and one decimal
Use the same field and type cast the values on the application level
Does anyone know if the second would cause more problems?
Thanks
If you're using a single-table design you probably need two columns, one for each purpose, where some records may use column A, and some column B, but the rest are shared.
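A minimal sketch of that two-column layout, using SQLite from Python so it runs on its own. The table and column names (`document`, `doc_type`, `tax_paid`, `is_taxed`) are invented for the example, and the CHECK constraint is my own addition to stop a row from using the wrong column for its type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE document (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        doc_type  TEXT NOT NULL CHECK (doc_type IN ('purchase', 'sale')),
        total     NUMERIC NOT NULL,
        tax_paid  NUMERIC,   -- used by purchase orders only
        is_taxed  INTEGER,   -- used by sales invoices only (0 or 1)
        CHECK (
            (doc_type = 'purchase' AND tax_paid IS NOT NULL AND is_taxed IS NULL) OR
            (doc_type = 'sale' AND is_taxed IN (0, 1) AND tax_paid IS NULL)
        )
    )
""")
conn.execute("INSERT INTO document (doc_type, total, tax_paid) VALUES ('purchase', 100, 8.25)")
conn.execute("INSERT INTO document (doc_type, total, is_taxed) VALUES ('sale', 150, 1)")

def rejects_mixed_row():
    """A sale that also sets tax_paid should violate the CHECK constraint."""
    try:
        conn.execute(
            "INSERT INTO document (doc_type, total, tax_paid, is_taxed) "
            "VALUES ('sale', 10, 1.0, 1)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Compared with one shared field that is typecast in the application, this keeps each value in a column of the right type, and the database itself can reject rows that mix the two.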
The real question is if a purchase order and an invoice are really the same thing or not when speaking in terms of data and relationships. Normally a purchase order is related to an invoice, but an invoice may not have an associated purchase order. In some systems you will have a complex arrangement between multiple purchase orders and multiple invoices, it depends on the nature of what's being sold and how it's packaged.
If you're dealing with fairly granular things, like large, expensive items that can be tracked individually, then your system can get into a lot of detail. If it's tracking inexpensive items sold in large quantities it gets pretty hard to manage that. Things get even more complicated if you're dealing with things that are made-to-order.
I'd do a lot more research about the types of situations you're likely to encounter, order those by probability, and then test your schema against the ugliest cases you're likely to encounter.

Database design to get as many stats as possible

I have to structure a MySQL database for work and haven't done that in years. I'd love to get some ideas from you. So here's the task:
I have a couple of "shops" that have, depending on the day of the week and year, different opening hours, which could change further down the line. The shops have space for a given number of people (which could change later as well).
A few times a day we count the number of people in the shop.
We want to compare the utilized capacity between shops. I myself would like to use dc.js to be able to get as many stats as possible from the data.
We also have two different methods of counting our users:
By hand. Reliable, but time consuming.
Light barrier. Automatic, but very inaccurate.
I'd like to get a better approximation of the usercount using the light barrier data and some machine learning algorithm.
Anyway, do you have any tips on how to design the DB as efficiently as possible for my tasks? I was thinking:
SHOP(Id, Name)
OPENINGHOURS(Id, ShopId, MaxUsers, Date, Open, Close)
MANUALUSERCOUNT(Id, ShopId, Time, Count)
AUTOUSERCOUNT(Id, ShopId, Time, Count)
Does this structure make sense (at all and for my tasks)?
Thank you!
For an application of this size, I see no problem with this at all. One question, though: what does the "time" column in the usercount tables refer to?
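To show why the meaning of that column matters, here is a small sketch of a utilization query against the proposed tables, using SQLite from Python. It assumes that "time" is a full timestamp, so each count can be matched to the opening-hours row for its date. The sample rows and the lower-case column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE openinghours (
        id        INTEGER PRIMARY KEY,
        shop_id   INTEGER NOT NULL REFERENCES shop(id),
        max_users INTEGER NOT NULL,
        date      TEXT NOT NULL,   -- 'YYYY-MM-DD'
        open      TEXT NOT NULL,   -- 'HH:MM'
        close     TEXT NOT NULL    -- 'HH:MM'
    );
    CREATE TABLE manualusercount (
        id      INTEGER PRIMARY KEY,
        shop_id INTEGER NOT NULL REFERENCES shop(id),
        time    TEXT NOT NULL,     -- assumed full timestamp 'YYYY-MM-DD HH:MM'
        count   INTEGER NOT NULL
    );
    INSERT INTO shop VALUES (1, 'North'), (2, 'South');
    INSERT INTO openinghours VALUES
        (1, 1, 50, '2024-01-08', '09:00', '18:00'),
        (2, 2, 20, '2024-01-08', '10:00', '16:00');
    INSERT INTO manualusercount VALUES
        (1, 1, '2024-01-08 12:00', 25),
        (2, 2, '2024-01-08 12:00', 15);
""")

# Utilization = people counted / capacity valid on the day of the count.
rows = conn.execute("""
    SELECT s.name, c.time, 1.0 * c.count / o.max_users AS utilization
    FROM manualusercount AS c
    JOIN shop AS s ON s.id = c.shop_id
    JOIN openinghours AS o
      ON o.shop_id = c.shop_id AND o.date = date(c.time)
    ORDER BY s.name
""").fetchall()
```

If "time" only held a time of day, the join to the right day's MaxUsers would not be possible without an extra date column.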

Need a database design advice - query vs. additional column

I have following tables:
Customer(customer_id) - 1000 rows (1000 customers)
Invoice(invoice_id, customer_id) - 1000000 rows (1000 invoices per customer)
Charge(charge_id, invoice_id, charge_amount) - 20000000 rows (20 charges per invoice)
Now, I am trying to produce a customer's invoice with its total charge amount.
The resulting table would look something like this:
Customer_name | invoice_id | charge_total
test          | 1          | $1000
test          | 2          | $1200
test          | 3          | $900
...
My question is, what is the best practice for database design for this case?
I am pondering over two options below:
Just run everything through a query?
Add "charge_total" column in Invoice table to save query processing time (20 times faster)
Thanks everybody!
There are two ways to look at this question. The database purist will say that derived or computed data is redundant and violates 3rd Normal Form. This is a concern in transactional systems where data is being edited, since normalization prevents you from falling into the trap of having self-conflicting data.
On the other hand, there is a practical view which says that data which is written once and never updated is not subject to update and delete anomalies anyway, so redundancy costs disk space, but is not otherwise a risk.
As a rule, I always design databases to be normalized first and then introduce redundancy on a limited basis, after careful examination of the competing risks.
This is hard to answer - do you know that you have a performance problem? I'd not optimize unless I really, really had to.
And even then, I would consider an "invoice archive" table to hold the computed values. Logically, there's nothing wrong in calculating summaries and storing them in a table to reflect the amount that was actually invoiced - including tax, shipping etc. This means you can store an archive version of the invoice data without having to worry about.
I'd not want to store it in the main "invoice" table unless invoices are immutable - you create it, and nothing ever changes from the moment it's created. That doesn't work if you have a business process in which invoices are created in advance and items are added to it over time.
This decision comes down to the tradeoff of speed for your users vs additional complexity in your database that makes your code more susceptible to errors. It reminds me of this discussion:
https://stackoverflow.com/questions/211414/is-premature-optimization-really-the-root-of-all-evil
In your case, since you've already done the performance testing, I feel like denormalizing your database like you suggest is a good thing.
One thing to keep in mind is how often the data that affects the value of "charge_total" changes. For example, if an item is returned, does that charge get taken off the invoice at a later date? If things do change often, you'll have to account for the overhead of having those change events update the "charge_total" field.
First you should check if the performance without an additional column is sufficient in your case. If it is not, then, and not before (!), you should check if your "20 times faster" guess is really correct. Try to add a view to your database for your charge_total and test how your DB system handles that view. I don't know MySQL well enough, but some modern DB systems are able to do internal caching of view data as long as the source data does not change.
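A view along those lines might look roughly like the sketch below. It uses SQLite from Python so it runs on its own. The view name `invoice_total`, the `customer_name` column and the sample rows are assumptions, with amounts stored as whole numbers to keep the example simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    CREATE TABLE charge (
        charge_id     INTEGER PRIMARY KEY,
        invoice_id    INTEGER NOT NULL REFERENCES invoice(invoice_id),
        charge_amount INTEGER NOT NULL
    );
    -- The total is derived on read; nothing redundant is stored.
    CREATE VIEW invoice_total AS
        SELECT c.customer_name, i.invoice_id,
               COALESCE(SUM(ch.charge_amount), 0) AS charge_total
        FROM invoice AS i
        JOIN customer AS c ON c.customer_id = i.customer_id
        LEFT JOIN charge AS ch ON ch.invoice_id = i.invoice_id
        GROUP BY c.customer_name, i.invoice_id;
    INSERT INTO customer VALUES (1, 'test');
    INSERT INTO invoice VALUES (1, 1), (2, 1);
    INSERT INTO charge VALUES (1, 1, 600), (2, 1, 400), (3, 2, 1200);
""")

rows = conn.execute("SELECT * FROM invoice_total ORDER BY invoice_id").fetchall()
```

Timing this view against the real data volume would show whether the stored column is needed at all.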
When you have done that, and you are sure the additional column charge_total is a solution for a problem you really have, then you should make sure that the redundant data is kept consistent. You can do this on the DB side (using triggers), or on the client side, when the one-and-only process that changes the charges table is under your control.
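For the trigger route, a minimal sketch could look like this. It is written against SQLite from Python so it runs on its own; MySQL trigger syntax differs (it requires FOR EACH ROW, for instance), and the trigger names are made up. Only inserts and deletes are covered here, so updates to an existing charge would need a similar trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_id   INTEGER PRIMARY KEY,
        charge_total INTEGER NOT NULL DEFAULT 0   -- redundant, maintained by triggers
    );
    CREATE TABLE charge (
        charge_id     INTEGER PRIMARY KEY,
        invoice_id    INTEGER NOT NULL REFERENCES invoice(invoice_id),
        charge_amount INTEGER NOT NULL
    );
    CREATE TRIGGER charge_after_insert AFTER INSERT ON charge BEGIN
        UPDATE invoice SET charge_total = charge_total + NEW.charge_amount
        WHERE invoice_id = NEW.invoice_id;
    END;
    CREATE TRIGGER charge_after_delete AFTER DELETE ON charge BEGIN
        UPDATE invoice SET charge_total = charge_total - OLD.charge_amount
        WHERE invoice_id = OLD.invoice_id;
    END;
""")

conn.execute("INSERT INTO invoice (invoice_id) VALUES (1)")
conn.execute("INSERT INTO charge (invoice_id, charge_amount) VALUES (1, 600), (1, 400)")
conn.execute("DELETE FROM charge WHERE charge_amount = 400")  # e.g. a returned item
total = conn.execute(
    "SELECT charge_total FROM invoice WHERE invoice_id = 1"
).fetchone()[0]
```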
Making charge_total a calculated column in the invoice table would probably be the easiest way I can think of. It would save you from doing that calculation each time you ran the query to get the values, which I'm assuming happens more frequently than adding a charge.
Nowadays disk space is cheap so you do not have to worry about size. If the extra column improves the performance, just go with it.

Implementing A Ranking System

I've seen several questions on how to secure and prevent abuse of ranking systems (like starring movies, products, etc.) but nothing on actually implementing one. To simplify this question, security is not a concern to me: the people accessing this system are all trusted, and abuse of the ranking system, if it were to happen, is trivial and easier to revert than to cause. Anyway, I'm curious how to store the votes.
One thought is to have a votes table, that logs each vote, and then either immediately, at scheduled times, or on every load of the product (this seems inefficient, but maybe not) the votes are tallied and a double between 0 and 5 is updated into the product's entry in the product table.
Alternatively, I store in the products table a total score and a number of votes, and just divide that out when I display, and add the vote to total and increment number when someone votes.
Or is there a better way to do it that I haven't thought of? I'd kind of like to just have a 'rating' field in the product table, but can't think of a way to update votes without some additional data.
Again, data integrity is important, but by no means necessary, any thoughts?
I would keep a "score" with your products but would also keep a vote table to see who voted for what. When somebody votes, insert the vote and update the product score.
This allows quick sorting and you also have a table to be able to recalculate the scores from and to stop people double-voting.
There is no need to wait to write the vote and update the scores. That will introduce problems and, if it's acting like a traditional system (lots more reads than writes), gives you no benefit.
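One way to sketch "insert vote, update product score" is shown below, using SQLite from Python so it runs on its own. The table layout, the `cast_vote` and `rating` helpers, and storing the score as a running total plus a vote count (as suggested in the question) are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id  INTEGER PRIMARY KEY,
        score_total INTEGER NOT NULL DEFAULT 0,
        vote_count  INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE vote (
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        user_id    INTEGER NOT NULL,
        stars      INTEGER NOT NULL CHECK (stars BETWEEN 0 AND 5),
        PRIMARY KEY (product_id, user_id)   -- one vote per user per product
    );
    INSERT INTO product (product_id) VALUES (1);
""")

def cast_vote(product_id, user_id, stars):
    """Insert the vote and update the stored score in one transaction."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO vote VALUES (?, ?, ?)", (product_id, user_id, stars))
            conn.execute(
                "UPDATE product SET score_total = score_total + ?, "
                "vote_count = vote_count + 1 WHERE product_id = ?",
                (stars, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate vote or stars out of range

def rating(product_id):
    """Average rating from the stored totals, or None if there are no votes."""
    total, count = conn.execute(
        "SELECT score_total, vote_count FROM product WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return total / count if count else None
```

Because the vote table keeps every vote, the stored totals can always be recalculated from it if they ever drift.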
You mean you'll store the votes separately in a table and then update the respective ranking of the product in the product table with a defined strategy?
That seems like an inefficient way of storing it. Maybe there is a reason behind it, but why would you not want to store all votes in one table and have those votes reference the respective product? This gives you a real-time count.
In the UI you'll calculate an average of all the votes, rounded to the nearest integer, to show. That would suffice, wouldn't it? Or am I missing something?
I agree with Oli. In addition, you can cache your score. So you update the product score in the cache and your application always picks up the cache value. Thus even on a page refresh, you would get the latest score without hitting the database.