MySQL for a payment system?

I'm trying to create a payment system for my website. The website is a marketplace for 3D printing blueprints. Users buy credits on my website. When a user purchases a 3D printing blueprint uploaded by another user, it creates a new tuple (row) in the 'purchased' table while deducting credits in the user credit table. Here's the important part: my gut tells me to use the event scheduler to mark rows of 'purchased' as paid every month and wire the sum of the money earned to each seller. My worry is that the table will grow indefinitely as the months pass.
Is this the right implementation?
Or can I somehow create a new table each month that holds transactions for only this month?
Is there a NoSQL equivalent to this?
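For reference, the flow described above boils down to something like the following sketch (all table and column names here are made up for illustration):

    -- Hypothetical purchase flow: deduct credits and record the
    -- sale in one transaction so neither can happen without the other.
    START TRANSACTION;

    -- Deduct the price, but only if the buyer can afford it. The
    -- application should check the affected row count and issue
    -- ROLLBACK instead of COMMIT if no row matched.
    UPDATE user_credits
       SET credits = credits - 50
     WHERE user_id = 123
       AND credits >= 50;

    -- Record the purchase (user 123 buys blueprint 456 for 50 credits).
    INSERT INTO purchased (buyer_id, blueprint_id, price, purchased_at)
    VALUES (123, 456, 50, NOW());

    COMMIT;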

Stripe.com or Braintree.com might be good options for you.
It is not advisable to create or roll your own payments implementation. These established services not only handle the PCI compliance aspect of payments, but they also have direct support for the use case you're asking about.
In an effort to answer your question further: it's probably not going to be an issue from the standpoint of performing inserts into this MySQL table, or in terms of iterating across it for batch processing. Querying, on the other hand, will become more onerous as the data set gets very large.
You can use partitioning in MySQL and partition based on date, but I doubt this is something you should spend your time on at this point. Wait until your site blows up and is super popular, then come back and update your schema and configuration to meet your actual usage demands.
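For what it's worth, if the table ever does grow to the point where this matters, a date-based setup might look roughly like the following (the schema is a hypothetical sketch, not a recommendation to do this now):

    -- Hypothetical monthly RANGE partitioning on the purchase date.
    -- MySQL requires every unique key (including the primary key) to
    -- contain the partitioning column, hence the composite PK.
    CREATE TABLE purchased (
        id           BIGINT NOT NULL AUTO_INCREMENT,
        buyer_id     INT NOT NULL,
        blueprint_id INT NOT NULL,
        price        DECIMAL(10,2) NOT NULL,
        purchased_at DATETIME NOT NULL,
        PRIMARY KEY (id, purchased_at)
    )
    PARTITION BY RANGE (TO_DAYS(purchased_at)) (
        PARTITION p2013_01 VALUES LESS THAN (TO_DAYS('2013-02-01')),
        PARTITION p2013_02 VALUES LESS THAN (TO_DAYS('2013-03-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

Old months can then be archived and dropped cheaply with ALTER TABLE ... DROP PARTITION, which gets you the effect of the "new table each month" idea without actually creating new tables.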
It's worth noting that you'll also want to take regular backups of something as important as payment information. Typically you'd also see at least one replica for something this critical.
Again, I don't think you should try to solve this yourself. Just pay for a service that does this for you and focus on building the best 3D blueprint marketplace.

Related

How to: log and analyze clicks, pageviews and sessions to optimize conversion

We have a medium-size e-commerce site. We sell books. On said site we have promotions, user recommendations, regular book pages, related books, et cetera. Quite similar to amazon.com, except of course for the volume of the site.
We have a traditional LAMP setup, where the M still stands for MariaDB.
TPTB want to log and analyze user behaviour in order to optimize conversion.
Bottom line, each click has to be logged, I think. (I fear)
This will add up to a few million clicks every month. The system has to be able to go back in time at least 3 years.
Questions the system might be asked are: given a page (e.g. the homepage) and clicks on a promotional banner, which color of said banner gives the best conversion? Now split that question by new versus returning customers (multi-dimensional or A/B testing). Or: given a view of books A and B, which books do users buy next? The range of queries is going to be very wide, so aggregating the data in advance would be pointless.
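For concreteness, the first of those questions might translate to something like this, assuming a hypothetical clicks table with one row per click:

    -- Hypothetical columns: page, element, banner_color,
    -- is_returning_customer, and a 0/1 converted flag per click.
    SELECT banner_color,
           is_returning_customer,
           COUNT(*)                  AS clicks,
           SUM(converted)            AS conversions,
           SUM(converted) / COUNT(*) AS conversion_rate
    FROM   clicks
    WHERE  page = 'homepage'
      AND  element = 'promo_banner'
    GROUP  BY banner_color, is_returning_customer;

It's exactly this kind of ad-hoc grouping over the raw rows that I'm worried about at our volumes.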
I have serious doubts about MySQL's ability to provide a good platform for storing, analyzing and querying this data. We could store the rows, feeding them to MySQL via RabbitMQ to avoid delays, but querying and analyzing this data efficiently in MySQL might not work well once we're at 50M rows.
There have been a number of articles about using MongoDB to store analytical data. But all the posts seem to increment a counter in a document (pre-aggregating the data), which is not good enough for us.
The big question is: is there any database (or other system) that is particularly well suited to storing and analyzing data like this? Might MySQL still do the trick? Am I correct in my assessment that MongoDB probably won't add any value here?
If I understand correctly, you only want aggregated reports done, say, once a day (as opposed to "live")? If that's the case, I would suggest Hadoop, as it allows you to run massive Map/Reduce jobs that do these aggregations for you and then present you with a report. At this amount of data, any "live" solution will just not work.
If you don't want to mess with the complexity of Hadoop and Map/Reduce, then perhaps MongoDB might work. It has quite a powerful aggregation framework that can be tasked with many aggregations in a sort-of-live environment. It's not really meant to run at every pageview, but it's also not a "let's do this once a day" kind of thing. It depends a little on your data aggregation requirements whether the aggregation framework can help you; if it can't, MongoDB also supports Map/Reduce for more complex tasks (at a slower pace). MongoDB is quite a good fit, as it gives you high write performance - and if one node doesn't suffice, you can always shard for even higher write throughput.
If your primary concern is to offer recommendations based on past user choices, you may also consider a graph database like Neo4j or FlockDB.
Those databases would allow you to build relationships between buyers and the items they bought (which should be a lot less data to store, since you will have far fewer user data redundancies), which you can use for triadic closure processing - in other words, finding out what similar users bought that user 'A' has not bought yet.
I cannot say I have done it yet, but I am also seriously looking into this.
Otherwise, MongoDB, in addition to the Map/Reduce paradigm, now has (as of v2.4.6) an aggregation pipeline framework that I have found very powerful.

Building an online administration service: what database strategy should I go for?

I'm building up an online (paid) service used for business administration purposes. The database is structured like so:
I have a contacts table filled with persons, contact info and the like. Then I have a few other tables holding information about payments, agreements and appointments. There are also statistics, like how much money was transferred this month, how many hours' worth of appointments there were this month, and the like.
I'm using MySQL (but could also go for MSSQL or some other system if necessary), and I have had no formal training in any programming language whatsoever (yet).
I'm building a WPF application for access to this database. I'm also planning on building an app so users can access their data, plan new appointments and register payments on the go.
I'm going to go for a login system to verify their right to log in and use my service.
My question is about how to structure this. I'm not an SQL expert nor have I had any formal training in SQL or any other programming language. What I do know though is that my client-side app is almost out of the alpha stage.
So far I have come up with two ways to structure this.
1. Users get a separate database.
My original idea was to give each user a separate database; this makes it easier to provide people with statistics. It also makes it easier to spread the workload across multiple, separate servers. People would log in to a master/main server where their login information is stored, fetch their server info, and programmatically be 'redirected' to their own database. Spreading these databases also makes it easier to provide individual backups to users.
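As a sketch (names are illustrative, not a finished design), the master server would only need a small routing table:

    -- Hypothetical routing table on the master/main server.
    CREATE TABLE user_databases (
        user_id INT          NOT NULL PRIMARY KEY,
        db_host VARCHAR(255) NOT NULL,
        db_name VARCHAR(64)  NOT NULL
    );

    -- At login: fetch the user's server info...
    SELECT db_host, db_name FROM user_databases WHERE user_id = 123;
    -- ...then the application opens a connection to that host/database.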
The downside of this is the sheer quantity of databases I'd have to manage. I'm planning on ending up with hundreds of thousands of users. Let's just say I want the system to be able to serve a practically unlimited number of users.
2. Everything is stored in one database.
It's also possible to store everything in one database. This would make the database structure somewhat more complicated (while making the system as a whole a lot simpler). I'd have to add 'AND consumer_ID = ...' to every query (which is of course possible, though it should be a bound parameter rather than string concatenation like "AND consumer_ID=" + MyID, to avoid SQL injection) and add a few tables to handle statistics per user.
It would be simpler to provide every user with the same database updates. Maintenance would be easier.
The downside of this is that it makes it harder to spread the workload across separate servers. I'd have to build something to make separate servers mirror each other, and I'd have to make sure that the workload is automatically divided between the servers, instead of simply going for: fill a server with X databases, then a new server, fill it, and so on.
I don't have the luxury of hiring someone with SQL training.
The most important thing for me now is that the system can be easily maintained while still being safe and reliable. I'm an amateur developer, going to college next year. I don't want to spend 50% of my time maintaining the database.
I think I've covered most of the details you might need; if you need any more, please ask.
I thank you in advance :)
Just go with solution 2. The downside of spreading the workload over many servers is addressed by "partitioning"; look here for a starting point: http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html
Partitioning would allow you, for example, to put all information of a table for consumers with even IDs on one server and all others on the second. Or whatever you want...
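As a hypothetical illustration of that even/odd idea, using hash partitioning on the consumer ID:

    -- Sketch only: spread one table's rows over two partitions by
    -- consumer ID. The primary key must include the partitioning column.
    CREATE TABLE appointments (
        id          BIGINT   NOT NULL AUTO_INCREMENT,
        consumer_id INT      NOT NULL,
        starts_at   DATETIME NOT NULL,
        PRIMARY KEY (id, consumer_id),
        KEY (consumer_id, starts_at)
    )
    PARTITION BY HASH (consumer_id)
    PARTITIONS 2;

Note that MySQL partitioning splits data within one server; actually placing partitions on separate servers (sharding) is something you would still have to build at the application level.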
But I wouldn't start that complicated: do you need it now? It burdens you (either way) with a big additional overhead! You can also look into the NoSQL database world for solutions that can be spread across as many servers as you want with little effort. You lose SQL and its ACID features in most cases; if you need those, NoSQL is not an option.

Database structure for storing Bank-like accounts and transactions

We're in the process of adding a bank-like sub-system to our own shop.
We already have customers, so each will be given a sort of account and transactions of some kind will be possible (adding to the account or subtracting from it).
So we at least need an account entity and a transaction entity, and operations will then have to recalculate overall balances.
How would you structure your database to handle this?
Is there any standard design banks have to use that I could model mine on?
By the way, we're on MySQL but will also look at some NoSQL solution for a performance boost.
I don't imagine you would need NoSQL for any speed boost, since it's unlikely you'll need much (or any) parallelism, and I'm not sure how schema-free you need to be. The exception is when you start getting into complex business requirements for analysis across many millions of customers and hundreds of millions of transactions, like profitability - and even then that's a data-warehousing-style problem which you probably wouldn't run on your transactional schema in the first place if it had gotten that large.
In relational designs, I would tend to avoid any design that requires balance recalculation, because then you end up with balance-repair programs and the like. With proper indexing and a simple enough design, you can do a simple SUM over the transactions (positive and negative) to get a balance. With a good, consistent sign convention on the transactions (no ambiguity about whether to add or subtract - always add the values) and appropriate constraints (with a limited number of transaction types, you can specify via constraints that all deposits are positive and all withdrawals are negative), you can let the database ensure there are no anomalies like negative deposits.
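A minimal sketch of what I mean, with hypothetical names (note that depending on your MySQL version, CHECK constraints may be parsed but not enforced, in which case a trigger gives the same guarantee):

    -- Ledger of signed amounts: deposits positive, withdrawals negative.
    CREATE TABLE account_transactions (
        id         BIGINT        NOT NULL AUTO_INCREMENT PRIMARY KEY,
        account_id INT           NOT NULL,
        txn_type   ENUM('deposit','withdrawal') NOT NULL,
        amount     DECIMAL(12,2) NOT NULL,
        created_at DATETIME      NOT NULL,
        KEY (account_id, created_at),
        CONSTRAINT chk_sign CHECK (
            (txn_type = 'deposit'    AND amount > 0) OR
            (txn_type = 'withdrawal' AND amount < 0)
        )
    );

    -- The balance is just a SUM; adding a created_at predicate gives
    -- the point-in-time view discussed below.
    SELECT COALESCE(SUM(amount), 0) AS balance
    FROM   account_transactions
    WHERE  account_id = 42;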
Even if you want to cache the balance in some way, you could still rely on such a simple mechanism, augmented with a trigger on the transaction table that updates an account summary table.
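For example (again hypothetical, building on the ledger sketch above):

    -- Cached balance kept in sync by a trigger; the transaction
    -- table remains the source of truth.
    CREATE TABLE account_balances (
        account_id INT           NOT NULL PRIMARY KEY,
        balance    DECIMAL(12,2) NOT NULL DEFAULT 0
    );

    DELIMITER //
    CREATE TRIGGER trg_txn_insert
    AFTER INSERT ON account_transactions
    FOR EACH ROW
    BEGIN
        INSERT INTO account_balances (account_id, balance)
        VALUES (NEW.account_id, NEW.amount)
        ON DUPLICATE KEY UPDATE balance = balance + NEW.amount;
    END//
    DELIMITER ;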
I'm not a big fan of putting any of this in a middle layer outside of the database. Your basic accounting should be simple enough that it can be handled within the database engine at speed, so that anyone, or any part of the application, executing a query is going to get the same answer without any client-code logic getting involved. The database can then ensure integrity at a level slightly above referential integrity (accounts with a non-zero balance might not be allowed to be closed, balances might not be allowed to go negative, etc.) using a combination of constraints, triggers and stored procedures, in increasing order of complexity as required. I'm not talking about all your business logic, just prohibiting low-level situations you feel the database should never get into due to bad client programming or a failure to do things in the right order or call things with the right parameters.
In real banking (i.e. COBOL apps), the database schema (usually non-relational and non-normalized - a lot of these things predate SQL) typically has things like 12 monthly buckets of past balances which are updated and shifted when the account rolls over. Some of the databases these systems use are hierarchical. And this is where the code is really important, because everything gets done in code. Again, it's old-fashioned and subject to all kinds of problems (probably a lot like what NatWest is going through), and NoSQL is a trend back towards this code-is-king way of looking at things. After a long time working with these systems, I've just come to dislike cached balances, and I dislike systems without point-in-time accountability - i.e. the ability to ignore transactions after a certain date and see EXACTLY what things looked like at a certain date/time.
I'm sure someone has "standard" patterns of bank-like database design, but I'm not aware of them despite having built several accounting-like systems over the years - accounts and transactions are just not that complex and once you get beyond that concept, everything gets highly customized.
For instance, in some cases, you might recognize earnings on contracts on some kind of schedule according to GAAP and contracts which are paid over time. In banking you have a lot of interest-related things with different interest rates for cost of funds etc. Everything just gets unique once you start mixing the business needs in with just the basics of the accounting of ins and outs of money.
You don't say whether or not you have a middle tier in your app, between the UI and the database. If you do, you have a choice as to where you'll mark transactions and recalculate balances. If this database is wholly owned by the one application, you can move the calculations to the middle tier and just use the database for persistence.
Spring is a framework that has a nice annotation-based way to declare transactions. It's based on POJOs, as an alternative to EJBs. It's a three-legged stool of dependency injection, aspect-oriented programming, and great libraries. Perhaps it can help you with both structuring and implementing your app.
If you do have a middle tier, and it's written in an object-oriented language, I'd recommend having a look at Martin Fowler's "Analysis Patterns". It's been around for a long time, but the chapter on financial systems is as good today as it was when it was first written.

Online Security and Storing Monetary Values in a Database

I'm building an e-commerce app for an online store, and I'm planning to have a credit system so customers can earn credits to purchase products.
Handling credits I have no problems with, but I'm uneasy about the idea of storing values with actual monetary value in my MySQL database.
Currently I'm planning to have a table for Credits with a foreign key that links it to a user, so I can figure out a user's amount of Credits by a single JOIN.
I just wanted to ask if there are things that I should be careful with, lest I leave vulnerabilities that could be avoided.
Thanks!
First, as Fernando mentions in the comments, use DECIMAL to store the value.
Have audit trails; that way you can go back and determine why a value is what it is.
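One sketch that combines both points (hypothetical names; an append-only ledger doubles as its own audit trail):

    CREATE TABLE credit_entries (
        id         BIGINT        NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id    INT           NOT NULL,
        amount     DECIMAL(10,2) NOT NULL,  -- positive = earned, negative = spent
        reason     VARCHAR(255)  NOT NULL,  -- why the value changed
        created_by VARCHAR(64)   NOT NULL,  -- which user/process wrote the row
        created_at DATETIME      NOT NULL,
        KEY (user_id, created_at)
    );

    -- A user's current credits is the sum of their entries:
    SELECT COALESCE(SUM(amount), 0) AS credits
    FROM   credit_entries
    WHERE  user_id = 7;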
Your biggest challenge will be making sure your system is secure, not so much how the data is stored (although that obviously feeds into security as well). Make sure the app is tested, perhaps with a proper pen-testing tool to start with. Make sure the production machine is locked down and audited.
It (almost) goes without saying that reliable and tested backups are extremely important when dealing with something of value.
I'm also assuming that you are not handling credit card transactions directly? Just in case you are, I urge you to reconsider and use a third party because there is a lot that can go wrong for you (or your customers). Plus, you don't want to have to deal with PCI-DSS.

DB design and optimization considerations for a social application

The usual case. I have a simple app that will allow people to upload photos and follow other people. As a result, every user will have something like a "wall" or an "activity feed" where he or she sees the latest photos uploaded from his/her friends (people he or she follows).
Most of the functionality is easy to implement. However, when it comes to this activity feed history, things can easily turn into a mess for pure performance reasons.
I have come to the following dilemma:
I can easily design the activity feed as a normalized part of the database, which will save me write cycles but will enormously increase the complexity of selecting those results for each user (for each photo uploaded within a certain time period, select a certain number whose uploaders I am following; or, for each person I follow, select their photos).
An optimization option could be to introduce a series of threshold constraints which, for instance, would allow me to order the people I follow by the date of their last upload, or even exclude some to save cycles, and for each user select only the last 5 (for example) uploaded photos.
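For illustration, the normalized read might look like the sketch below (hypothetical photos/follows tables); it's cheap to write, but the join plus sort is what I expect to get expensive as the site grows:

    -- Latest photos from everyone user 42 follows.
    SELECT p.id, p.user_id, p.uploaded_at
    FROM   photos  p
    JOIN   follows f ON f.followee_id = p.user_id
    WHERE  f.follower_id = 42
      AND  p.uploaded_at > NOW() - INTERVAL 7 DAY
    ORDER  BY p.uploaded_at DESC
    LIMIT  50;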
The second approach is to introduce a completely denormalized schema for the activity feed, in which every row represents a notification for one of my followers. This means that every time I upload a photo, the DB will put n rows in this "drop bucket", n being the number of people who follow me - i.e. lots of write cycles. If I have such a table, though, I could easily apply optimization techniques such as clever indexing, as well as pruning entries older than a certain period of time (a queue).
Yet a third approach that comes to mind is a differently denormalized schema, where the server-side application takes some of the complexity off the DB. I saw that some social apps, such as FriendFeed, rely heavily on storing serialized objects such as JSON objects in the DB.
I am definitely still mastering the skill of scalable DB design, so I am sure there are many things I've missed or have yet to learn. I would highly appreciate it if someone could at least point me in the right direction.
If your application is successful, then it's a good bet that you'll have more reads than writes - I only upload a photo once (write), but each of my friends reads it whenever they refresh their feed. Therefore you should optimize for fast reads, not fast writes, which points in the direction of a denormalized schema.
The problem here is that the amount of data you create could quickly get out of hand if you have a large number of users. Very large tables are hard for the DB to query, so again there's a potential performance issue. (There's also the question of having enough storage, but that's much more easily solved.)
If, as you suggest, you can delete rows after a certain amount of time, then this could be a good solution. You can reduce that amount of time (up to a point) as you grow and run into performance issues.
Regarding storing serialized objects, it's a good option if these objects are immutable (you won't change them after writing) and you don't need to index them or query on them. Note that if you denormalize your data, it probably means that you have a single table for the activity feed. In that case I see little gain in storing blobs.
If you're going the serialized objects way, consider using some NoSQL solution, such as CouchDB - they're better optimized for handling that kind of data, so in principle you should get better performance for the same hardware setup.
Note that I'm not suggesting that you move all your data to NoSQL - only for that part where it's a better solution.
Finally, a word of caution, spoken from experience: building an application that can scale is hard and takes time better spent elsewhere. You should spend your time worrying about how to get millions of users to your app before you worry about how you're going to serve those millions - the first is the more difficult problem. When you get to the point that you're hugely successful, you can re-architect and rebuild your application.
There are many options you can take:
Add more hardware, memory, CPU - enter cloud hosting. How does 24GB of memory sound? Most of your frequently accessed DB information can fit in memory alone.
Choose a host with expandable SSDs.
Use an event-based system in your application to write the "history" of all users. It would look like: id, user_id, event_name, date, event_parameters - an example would be: 1, 8, CHANGED_PROFILE_PICTURE, 26-03-2011 12:34, <id of picture>. Most importantly, this table lives in memory, so there's no longer any need to worry about write performance. After the records are older than, say, 3 days, they can be purged into another (non-memory) table and included in the query results only if the user chooses to go back that far. Having everything in one table saves you from doing multiple queries and SELECTs to build up this information.
Consider using InnoDB for the history/feeds table.
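A hypothetical version of that events table (names illustrative), including the purge into a colder archive table:

    CREATE TABLE user_events (
        id         BIGINT       NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id    INT          NOT NULL,
        event_name VARCHAR(64)  NOT NULL,
        created_at DATETIME     NOT NULL,
        parameters VARCHAR(255) NULL,  -- e.g. the ID of the new picture
        KEY (user_id, created_at)
    ) ENGINE=InnoDB;

    -- Periodic purge, assuming a user_events_archive table with the
    -- same columns exists:
    INSERT INTO user_events_archive
    SELECT * FROM user_events
    WHERE  created_at < NOW() - INTERVAL 3 DAY;

    DELETE FROM user_events
    WHERE  created_at < NOW() - INTERVAL 3 DAY;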
Good resources to read:
Exploring the software behind Facebook, the world’s largest site
Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Caching & Performance: Lessons from Facebook
I would probably start with a normalized schema so that you can write quickly and compactly. Then use non-transactional (non-locking) reads to pull the information back out, making sure to use a cursor so that you can process the results as they come back, as opposed to waiting for the entire result set. Since it doesn't sound like the information has any particularly critical implications, you don't really need to worry about a lot of the concerns that would normally push you away from non-transactional reads.
These kinds of problems are why NoSQL solutions are used these days. What I did in my previous projects is really simple: I keep the user->wall and user->history lists, which contain purely feed IDs, in memory stores (my favorite is Redis). So on every insert I do one insert operation on the database and (n * read optimization) insert operations in the memory store. I design the memory store to optimize my reads: if I want to filter a user's history (or wall) for videos, I push the feed ID to a list like user::{userid}::wall::videos.
Of course you could build the system purely on memory stores as well, but it's nice to have two systems each doing what they do best.
Edit:
Check out these applications to get an idea:
http://retwis.antirez.com/
http://twissandra.com/
I'm reading more and more about NoSQL solutions and people suggesting them; however, no one ever mentions the drawbacks of such a choice.
The most obvious one for me is the lack of transactions - imagine losing a few records every now and then (there are reports that this happens often).
But what I'm surprised by is that no one mentions MySQL being used as NoSQL - here's a link for some reading.
In the end, no matter what solution you choose (relational database or NoSQL storage), they scale in a similar manner - by sharding data across the network (naturally there are more choices, but this is the most obvious one). Since NoSQL does less work (there's no SQL layer, so CPU cycles aren't wasted on interpreting SQL), it's faster, but it can hit the ceiling too.
As Elad already pointed out, building an app that's scalable from the get-go is a painful process. It's better to spend your time focusing on making it popular and then scale out.