design guidelines for heavy traffic - web-traffic

I am working on a website for movie reviews. In the current design, the front end (jQuery, HTML, CSS) connects with the model (individual PHP scripts which deal with a MySQL database) for basic storage and retrieval. This trivial design will work for small traffic.
My concern is how to tackle problems related to heavy traffic, with many requests coming in at the same time. What design changes should I make to the model part to handle heavy traffic and make the system scalable?
PS: please let me know if you need more info.
Thank you

http://redis.io/ - Redis keeps your data memory-resident, with lazy writes back to disk. This greatly improves DB performance, provided you pour in a bucket of RAM first. Memcached is also a popular tool, but not as feature-packed.
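
To make the caching idea concrete, here is a minimal cache-aside sketch in Python. It is only an illustration under stated assumptions: a plain in-process dictionary stands in for Redis or Memcached, and `TTLCache`, `fetch_review_from_db`, `get_review`, and the 60-second expiry are hypothetical names and values, not anything from the original setup.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (a stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it so the caller reloads
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


stats = {"db_calls": 0}  # counts how often the "database" is actually hit


def fetch_review_from_db(movie_id):
    """Hypothetical slow database lookup."""
    stats["db_calls"] += 1
    return {"movie_id": movie_id, "rating": 4}


cache = TTLCache(ttl_seconds=60.0)


def get_review(movie_id):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"review:{movie_id}"
    review = cache.get(key)
    if review is None:
        review = fetch_review_from_db(movie_id)
        cache.set(key, review)
    return review
```

Calling `get_review(7)` twice in a row hits the database only once; the second call is served from memory until the entry expires.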

Related

Session storage preferences in node.js

I have a node.js application that uses a MySQL database. I wanted to know what would be a good place for storing the sessions?
My application is actually a final project for one of my courses, but it could be a real-world application later, as we are rewriting software that is currently used by the university. I can use MySQL for the session store, but I want to build my application using the most reliable approach or best practice for my situation.
I have read many posts/answers/forums, and the opinion is divided. Using another technology like Memcached/MemcacheDB or Redis, just for session store, would it be a recommended approach? Or should I just stick to MySQL, and later deal with scaling if the server load increases?
Even if the application is later used in the real world, it would only be used by undergraduate students and faculty at the university, so the user base is limited.
As of now, I'm leaning towards MySQL for the session store.
I am replying under the assumption that you are using MySQL throughout the whole application.
If the application will be used only within your university, it probably will not have scaling issues. SQL databases are not bad: they can handle quite a lot of data efficiently, as long as you are careful from the start and write efficient queries. Be careful with joins, because they can really kill the server. You need to analyze your application quite thoroughly. For example, why do you think that you will have scaling or performance issues with the sessions and not somewhere else in your application? Do a bit of load testing, get some metrics, and try to understand whether you need it or not.
If you are a student, though, and you don't have prior experience with Redis, I would go with Redis, because it is good to work with a new technology and gain a bit more experience :)

mobile application and database interaction design

I'm involved in a project that's going to be developing a mobile application which will interact with a remote database. I'm wondering what the best way to design the interaction between these devices and the database would be, to make the best use of battery life, bandwidth, etc.
Should we have a server side application/set of scripts which do all the database interaction and then send the data back to our application as XML in a response, or should we have the devices querying the database directly? Or is there another better way to approach this?
My feeling atm is that the former will be a bit more work, but will reduce the workload on the devices, saving power etc, which would be a better way to go.
Thanks!
What do you mean by "have the devices querying the database directly"? It is normally considered a bad idea (for security reasons) to expose your database server directly to the Internet. The standard approach is to provide web services that connect to the database and return data as JSON or some other format.
Performance is a big issue with these devices, and I have found that they are very slow to process XML, so you may be better off using JSON. I also suggest staying clear of data sets and data tables. Collections of plain business objects run much faster.
You might want to have a look at this:
What are the most valuable .Net Compact Framework Tips, Tricks, and Gotcha-Avoiders?
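
As a rough sketch of the web-service approach described above, the server-side code can run the query itself and hand the device a compact JSON payload built from plain objects. The example below is in Python purely for illustration; the `items` table, its columns, and the function names are hypothetical, and an in-memory SQLite database stands in for the real server-side database.

```python
import json
import sqlite3


def make_demo_db():
    """In-memory SQLite stands in for the real server-side database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items (id, name, price) VALUES (?, ?, ?)",
        [(1, "widget", 2.5), (2, "gadget", 10.0)],
    )
    return conn


def items_as_json(conn, max_price):
    """What a web-service endpoint would do: query, then serialise plain objects."""
    rows = conn.execute(
        # Parameterised query: the device never sends raw SQL to the server.
        "SELECT id, name, price FROM items WHERE price <= ? ORDER BY id",
        (max_price,),
    ).fetchall()
    payload = [{"id": r[0], "name": r[1], "price": r[2]} for r in rows]
    # Compact separators keep the response small, which helps on mobile links.
    return json.dumps(payload, separators=(",", ":"))
```

The device only ever sees the JSON string returned by something like `items_as_json(conn, 5.0)`; the database credentials and the SQL stay on the server.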

SQL Server vs. NoSQL

So I have a website that could eventually get some pretty high traffic. My DB implementation is in SQL Server 2008 at the moment. I really only have 2 tables and a few stored procs. Most of the DB could be re-designed to work without joining (although it wouldn't make sense when I can join so easily within SQL Server).
I heard that sites like Digg and Facebook use NoSQL databases for a lot of their basic data access. Is this something worth looking into, or will SQL Server not really slow me down that badly?
I use paging on my site (although this might change in the future), and I also use AJAX'd data access for most of the "live" stuff, so it doesn't really seem to be a performance hindrance at the moment, but I'm afraid it will be as the data starts expanding exponentially.
Am I going to gain a lot of performance by moving to NoSQL? Honestly, right now I don't even completely understand NoSQL, so any tips on how it would help me are welcome.
Thanks guys.
Actually Facebook uses a relational database at its core, see SOCC Keynote Address: Building Facebook: Performance at Massive Scale. And so do many other web-scale sites, see Why does Quora use MySQL as the data store instead of NoSQLs such as Cassandra, MongoDB, CouchDB etc?. There is also a discussion of how to scale SQL Server to web-scale size, see How do large-scale sites and applications remain SQL-based? which is based on MySpace's architecture (more details at Scale out SQL Server by using Reliable Messaging). I'm not saying that NoSQL doesn't have its use cases, I just want to point out that there are many shades of gray between white and black.
If you're afraid that your current solution will not scale, then perhaps you should look at which factors prevent scalability in your current solution. Test data is cheap to produce: load the 'exponentially increased' data volume, run your test harness, and see where it cracks. None of the NoSQL solutions will bring magic off-the-shelf scalability; they all require you to understand how to use them effectively and deploy them correctly. And they also require you to test with large volumes if you want to ensure success at scale. Same for traditional relational solutions.
SQL Server scales pretty well. For example, Stack Overflow used it to serve you this very page. Facebook and Google might use a form of NoSQL, but even if you make it really big you're unlikely to rise to that level.
With a simple table structure and data that fits on one server, it doesn't matter much what platform you use. There are several possible reasons to need to move to NoSQL:
Data scaling - SQL works best when all the data fits on one server (up to a few TB). The reason a lot of NoSQL stores don't have join is that they were designed not to require all the objects to be on one server.
Performance scaling - NoSQL stores do tend to be faster at handling high traffic, but not necessarily by enough to matter. You can improve SQL performance quite a lot with replication and caching as long as you aren't running into data size issues. Writes generally do have to run on the one server, but in most cases you will need to improve read performance long before write performance becomes an issue.
Complex data access - some types of queries simply don't fit well into a relational model. Graph and set stores work quite differently from relational databases so are a better fit for some applications.
Easier development - If you don't already have a SQL database and all the code to support it, using a schemaless datastore can save quite a bit of development time.
I don't think you have to move your database from SQL to NoSQL unless and until you are serving thousands of TB of data. If you normalize your tables properly and set up a proper archiving mechanism, it should work.
If you still have questions about what to choose and how, then check this. Let's assume that you have decided to move to a NoSQL database: there are a lot of players in the market. Just have a look at the list; the choice again depends on your needs and the type of data you have.
Am I going to gain a lot of performance by moving to NoSQL?
It depends.
Check out this article for 7 reasons when you DON'T want to use NoSQL. If none is your case, then read further.
The main advantage of document-based NoSQL for traditional enterprise needs is cheaper hosting at high scale, due to lower CPU usage when querying denormalised data (the most frequent kind of request). Key points:
JOINs and GROUP BYs in SQL queries are CPU-intensive, whereas a denormalised data structure implies fewer or no JOINs, and hence less stress on the CPU.
CPU is the most expensive resource in the cloud, while storage is the cheapest. Denormalised data trades higher storage for lower CPU.
How to get there?
1. Master DDD (Domain-Driven Design).
2. Gain a good understanding of CQRS (Command Query Responsibility Segregation) and eventual consistency.
3. Understand your domain and business processes.
4. Design a model that is tuned to the access patterns.
5. Review.
6. Repeat steps 3 - 5.
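
The storage-for-CPU trade mentioned in this answer can be sketched roughly as follows. This is a toy illustration, not a benchmark: SQLite stands in for both the relational store and the document store, and the `posts`, `comments`, and `post_docs` tables and all function names are hypothetical. The JOIN is paid once on the write side, and reads become a single primary-key lookup of a pre-built document.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    CREATE TABLE post_docs (post_id INTEGER PRIMARY KEY, doc TEXT);
    INSERT INTO posts VALUES (1, 'Hello');
    INSERT INTO comments VALUES (1, 1, 'first'), (2, 1, 'second');
""")


def read_normalised(post_id):
    """Read model assembled on every request with a JOIN."""
    rows = conn.execute(
        "SELECT p.title, c.body FROM posts p "
        "LEFT JOIN comments c ON c.post_id = p.id "
        "WHERE p.id = ? ORDER BY c.id",
        (post_id,),
    ).fetchall()
    return {"title": rows[0][0], "comments": [r[1] for r in rows if r[1] is not None]}


def rebuild_document(post_id):
    """Write side: pay for the JOIN once and store the result as one document."""
    doc = json.dumps(read_normalised(post_id))
    conn.execute("INSERT OR REPLACE INTO post_docs VALUES (?, ?)", (post_id, doc))


def read_denormalised(post_id):
    """Read side: a single primary-key lookup, no JOIN."""
    row = conn.execute(
        "SELECT doc FROM post_docs WHERE post_id = ?", (post_id,)
    ).fetchone()
    return json.loads(row[0])
```

The catch is that `rebuild_document` has to run whenever a post or its comments change, and the title and comment text are now stored twice, which is where the eventual-consistency step in the list comes in.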

What database systems should a startup company consider?

Right now I'm developing the prototype of a web application that aggregates a large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use the NHibernate ORM layer to interact with the DB. I've got tables defined for users, roles, submissions, tags, notifications, etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once our database grows to a significant size. I feel that it may struggle to perform join operations fast enough.
This has made me think about non-relational database systems such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with any of them. I've read some good reviews of MongoDB and it looks interesting. I'm happy to spend the time to learn if one turns out to be the way to go. I'd much appreciate anyone offering points or issues to consider when going with a non-relational DBMS.
The other answers here have focused mainly on the technical aspects, but I think there are important points to be made that focus on the startup company aspect of things:
Availability of talent. MySQL is very common and you will probably find it easier (and more importantly, cheaper) to find developers for it, compared to the more rarefied database systems. This larger developer base will also mean more tutorials, a more active support community, etc.
Ease of development. Again, because MySQL is so common, you will find it is the db of choice for a great many systems / services. This common ground may make any external integration a little easier.
You are preparing for a situation that may never exist, and is manageable if it does. Very few businesses (never mind startups) come close to MySQL's limits, and with all due respect (and I am just guessing here), the likelihood that your startup will ever hit the sort of data throughput needed to cripple a properly structured, well-resourced MySQL db is almost zero.
Basically, don't spend your time ( == money) worrying about which db to use, as MySQL can handle a lot of data, is well proven and well supported.
Going back to the technical side of things... Something that will have a far greater impact on the speed of your app than the choice of db is how efficiently data can be cached. An effective cache can dramatically reduce db load and speed up the general responsiveness of an app. I would spend your time investigating caching solutions and making sure you are developing your app in such a way that it can make the best use of those solutions.
FYI, my caching solution of choice is memcached.
So far no one has mentioned PostgreSQL as an alternative to MySQL on the relational side. Be aware that the MySQL libs are pure GPL, not LGPL. That might force you to release your code if you link to them, although maybe someone with more legal experience could better explain the implications. On the other hand, linking to a MySQL library is not the same as just connecting to the server and issuing commands; you can do that with closed source.
PostgreSQL is usually the best free replacement for Oracle, and its BSD-style license should be more business friendly.
Since you prefer a non relational database, consider that the transition will be more dramatic. If you ever need to customize your database, you should also consider the license type factor.
There are three things that really have a deep impact on which one is your best database choice and you do not mention:
The size of your data or if you need to store files within your database.
A huge number of reads and very few (even restricted) writes. In that case, more than a database, you need a directory such as LDAP.
The importance of data distribution and/or replication. Most relational databases can be replicated more or less well, but because of their concept/design they do not handle data distribution as well... but will you handle so much data that it does not fit on one server, or have access rights that need special separate/extra servers?
However, most people will go for a non-relational database just because they do not like learning SQL.
What do you think is a significant amount of data? MySQL, like most relational database engines, can handle rather large amounts of data, given proper indexes and a sane database schema.
Why don't you test how MySQL behaves with a larger amount of data in your setup? Write some scripts that load realistic data into a MySQL test database, generate some load on the system, and see if it is fast enough.
Only when it is not fast enough should you start considering first optimizing the database and then changing to a different database engine.
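
A script of the kind suggested here could look roughly like the sketch below. It is written against Python's built-in `sqlite3` module only so that it runs anywhere; for a real test you would point the same idea at a MySQL test database, and the absolute timings from SQLite will not transfer. The `submissions` table, the row count, and the function names are all assumptions made for illustration.

```python
import random
import sqlite3
import time


def build_test_db(n_rows, seed=0):
    """Fill an in-memory table with n_rows of synthetic submissions."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
    )
    rows = ((rng.randrange(1000), "text %d" % i) for i in range(n_rows))
    conn.executemany("INSERT INTO submissions (user_id, body) VALUES (?, ?)", rows)
    return conn


def time_query(conn, user_id, repeats=20):
    """Return the average number of seconds per lookup of one user's submissions."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(
            "SELECT COUNT(*) FROM submissions WHERE user_id = ?", (user_id,)
        ).fetchone()
    return (time.perf_counter() - start) / repeats


conn = build_test_db(50_000)
before = time_query(conn, 42)
# Try one optimisation at a time and measure again.
conn.execute("CREATE INDEX idx_submissions_user ON submissions (user_id)")
after = time_query(conn, 42)
print(f"no index: {before:.6f}s  with index: {after:.6f}s")
```

The point is the workflow rather than the numbers: generate data at the volume you are worried about, measure, change one thing, and measure again.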
Be careful with NHibernate: it is easy to build a solution that is nice and easy to code with, but performs badly with large amounts of data. For example, whether to use lazy or eager fetching with associations should be carefully considered. I don't mean that you shouldn't use NHibernate, but make sure that you understand how NHibernate works, for example what the "n + 1 selects" problem means.
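
For readers who have not met the "n + 1 selects" problem, here is a small sketch of it in plain SQL issued from Python. This is not NHibernate code; it only mimics what lazy loading of an association does under the hood. The `users` and `submissions` tables and the helper names are hypothetical, and SQLite is used so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO submissions VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")

query_log = []


def run(sql, params=()):
    """Execute a query and log it so the number of round trips is visible."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


def load_lazy():
    """N + 1 pattern: one query for the users, then one more query per user."""
    result = {}
    for user_id, name in run("SELECT id, name FROM users ORDER BY id"):
        rows = run(
            "SELECT body FROM submissions WHERE user_id = ? ORDER BY id", (user_id,)
        )
        result[name] = [body for (body,) in rows]
    return result


def load_eager():
    """Eager pattern: a single JOIN fetches users and submissions together."""
    result = {}
    rows = run(
        "SELECT u.name, s.body FROM users u "
        "LEFT JOIN submissions s ON s.user_id = u.id ORDER BY u.id, s.id"
    )
    for name, body in rows:
        result.setdefault(name, [])
        if body is not None:
            result[name].append(body)
    return result
```

With three users, `load_lazy()` issues four queries and `load_eager()` issues one, for the same result. With thousands of users the lazy version issues thousands of round trips, which is why the fetching strategy matters.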
Measure, don't assume.
Relational databases and NoSQL databases can both scale enormously, if the application is written right in each case, and if the system it runs on is properly tuned.
So, if you have a use case for NoSQL, code to it. Or, if you're more comfortable with relational, code to that. Then, measure how well it performs and how it scales, and if it's OK, go with it, if not, analyse why.
Only once you understand your performance problem should you go searching for exotic technology, unless you're comfortable with that technology or want to try it for some other reason.
I'd suggest you try out each db and pick the one that makes it easiest to develop your application. Go to http://try.mongodb.org to try MongoDB with a simple tutorial. Don't worry as much about speed since at the beginning developer time is more valuable than the CPU time.
I know that many MongoDB users have been able to ditch their ORM and their caching layer. Mongo's data model is much closer to the objects you work with than relational tables, so you can usually just directly store your objects as-is, even if they contain lists of nested objects, such as a blog post with comments. Also, because Mongo is fast enough for most sites as-is, you can avoid dealing with the complexities of caching and generally deliver a more real-time site. For example, Wordnik.com reported 250,000 reads/sec and 100,000 inserts/sec with a 1.2TB / 5 billion object DB.
There are a few ways to connect to MongoDB from .Net, but I don't have enough experience with that platform to know which is best:
Norm: http://wiki.github.com/atheken/NoRM/
MongoDB-CSharp: http://github.com/samus/mongodb-csharp
Simple-MongoDB: http://code.google.com/p/simple-mongodb/
Disclaimer: I work for 10gen on MongoDB so I am a bit biased.

Design of the recommendation engine database?

I am currently working on recommendation systems, especially for audio files, but I am a beginner at this subject. I am trying to design the database first, with MySQL, but I can't decide how to do it. It is basically a system in which users create a profile, then search for music, and the system recommends music similar to what they liked.
Which database should I use? (MySQL comes to mind as a first guess.)
It is a web project, with a mobile side to follow. Which technologies should I use? (PHP, Android platform...)
What are the pitfalls of this project?
How should I design the database for a system like that?
Any relational database should be good for storing the raw data like lists of songs, lists of users, and users' song preferences.
I think that you'll find that relational databases (and SQL) are not that great for storing the various data structures that your recommender will be constructing. Your recommendation engine will probably be creating data that doesn't really need to be in tables, and manipulating it for storage in a relational database may just be wasted work.
Just be aware of what you are doing and don't spend time putting stuff into a SQL database if it feels wrong. Maybe look into using a document oriented database like MongoDB.
The recommender that I recently wrote is actually a Java server process that reads in the raw data from MySQL, does all of its work in-memory, and provides recommendation data to my application via an HTTP API. I didn't even bother storing the recommendation data permanently since it can be regenerated.
Go read "Programming Collective Intelligence". They have a number of fine algorithms for recommendations in Chapter 2, "Making Recommendations".
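
To give a flavour of what such a recommender does, here is a minimal user-based sketch in Python that uses a Euclidean-distance similarity score. It is written in the spirit of that kind of algorithm and is not code taken from the book; the ratings data, user names, and track names are made up for illustration.

```python
from math import sqrt

# Hypothetical listening data: user -> {track: rating from 1 to 5}
ratings = {
    "ana":  {"track_a": 5, "track_b": 4, "track_c": 1},
    "ben":  {"track_a": 5, "track_b": 5, "track_d": 4},
    "cara": {"track_a": 1, "track_c": 5, "track_e": 2},
}


def similarity(prefs, a, b):
    """Euclidean-distance similarity in (0, 1]; 0.0 if the users share no tracks."""
    shared = [t for t in prefs[a] if t in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][t] - prefs[b][t]) ** 2 for t in shared))
    return 1.0 / (1.0 + dist)


def recommend(prefs, user):
    """Rank tracks the user has not rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue  # nothing in common, so this user tells us nothing
        for track, score in prefs[other].items():
            if track not in prefs[user]:
                totals[track] = totals.get(track, 0.0) + score * sim
                weights[track] = weights.get(track, 0.0) + sim
    ranked = [(totals[t] / weights[t], t) for t in totals]
    return sorted(ranked, reverse=True)
```

Here `recommend(ratings, "ana")` ranks `track_d` above `track_e`, because ben, whose tastes are closest to ana's, rated `track_d` highly. Everything lives in plain dictionaries, which fits the earlier point that the recommender's working data does not need to be forced into tables.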
Well, this is a vague question and a half, but I'll do my best to answer:
MySQL is a solid database, and so is PostgreSQL. Both are free and open source. MySQL is more widely supported and a little easier to use, but Postgres has some very cool features and functionality that are worth taking a gander at. WikiVS has a good comparison of the two.
Smartphones have better and better browsers. Use PHP or ASP.NET (whatever you're comfortable with), and then build out a mobile site which looks better at the smaller resolutions.
There are a lot. First and foremost, how good is your recommendation algorithm? Secondly, storing audio files can eat up storage space quickly. What's your plan for scaling? Thirdly, how well do you know database design? Can you design a large, hefty database and index it properly? If not, you need to start reading everything you can on indices and database design. Fourthly, it's a software project, and those always have pitfalls. The best you can do is post here when problems arise and we can always see what the fine people of StackOverflow can do to help.