MySQL sharding case study link or paper - mysql

I have been going through the book "High Performance MySQL", and it's a really nice book. My only concern is the MySQL sharding part. There is a lot of theory, but the practical implementation is lacking, and some aspects are treated as a black box (for example, arranging shards on nodes). It would be great if somebody could point me to a case study article or paper so that I can understand it properly.
Thanks in advance!!

I found one link (http://tumblr.github.com/assets/2011-11-massively_sharded_mysql.pdf). Please share more if anybody has any. Thanks.

Yes, "sharding" is a design/development pattern rather than a database feature of any kind. I would describe it as the database having outsourced its scale-out capability to the application.
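To make the "outsourced to the application" point concrete, here is a minimal sketch of application-side shard routing. It assumes hash-based sharding on a user id; the node names and the functions `shard_for` and `route` are hypothetical and not taken from any product mentioned here.

```python
import zlib

# Hypothetical shard list; in a real deployment these would be connection settings.
SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def shard_for(user_id, shards=SHARDS):
    """Pick the shard that owns a row by hashing the sharding key."""
    # crc32 is stable across processes, so every app server agrees on the mapping.
    key = str(user_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]

def route(user_ids, shards=SHARDS):
    """Group keys by owning shard so that each shard receives a single query."""
    plan = {}
    for uid in user_ids:
        plan.setdefault(shard_for(uid, shards), []).append(uid)
    return plan
```

Note that with plain modulo hashing, changing the number of shards remaps most keys, which is one reason the placement of shards on nodes deserves its own design discussion.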
I work for ScaleBase (http://www.scalebase.com), which makes a complete scale-out solution, an "automatic sharding machine" if you like: it analyzes the data and SQL stream, splits the data across DB nodes, load-balances reads, and aggregates results at runtime, so you won't have to!
No code changes are needed; everything continues to work as with one database. Your application or any other client tool (mysql, mysqldump, phpMyAdmin...) connects to the ScaleBase controller (which looks and feels like MySQL), a proxy to a grid of "shards" that automates command routing, parallelizes cross-database queries, and merges results exactly as if they came from one database. ORDER, GROUP, LIMIT, and aggregate functions are supported!
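For readers wondering what "merging results as if they came from one database" involves, here is a generic scatter-gather sketch. It is not ScaleBase's implementation; the function names are hypothetical, and it assumes each shard has already returned its rows sorted by the ORDER BY key.

```python
import heapq
from itertools import islice

def merged_top_n(shard_results, n, key=lambda row: row[0]):
    """Emulate ORDER BY ... LIMIT n over per-shard results already sorted by `key`."""
    # heapq.merge streams the sorted inputs lazily; islice stops after n rows.
    return list(islice(heapq.merge(*shard_results, key=key), n))

def merged_count(shard_counts):
    """COUNT(*) across shards is the sum of the per-shard counts."""
    return sum(shard_counts)

def merged_avg(shard_sums_and_counts):
    """AVG must be rebuilt from per-shard (sum, count) pairs, not from per-shard averages."""
    total = sum(s for s, _ in shard_sums_and_counts)
    count = sum(c for _, c in shard_sums_and_counts)
    return total / count
```

Each shard only needs to return its own first n rows for the LIMIT case, since no row beyond a shard's top n can appear in the global top n.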
Also, please visit my blog, http://database-scalability.blogspot.com/, all about scalability...
My company, ScaleBase, held a webinar not so long ago, specifically about sharding and data distribution. Surprisingly, it's not (yet?) listed at http://www.scalebase.com/resources/webinars/. I'll see if they can upload it, or I'll attach the slides here, or similar. Stay tuned!
Hope I helped...
Doron

Related

MySql vs NoSql - Social network comments and notifications data structure and implementation

I am finding it really tough to work out how a social networking site (Facebook being a reference) manages comments and notifications for its users.
How would they actually store the comments data? also how would a notification be stored and sent to all the users that. An example scenario: a friend comments on my status, and everyone who has liked my status, including me, gets a notification for it. Each user also has their own read/unread state, so I guess a notification reference is stored for each user. But then there would be a lot of redundant notification information. If we use a separate table/collection to store these with a reference to the actual notification, that would create real-time scalability issues. So how would you decide which trade-off to make? My brain crashes when I think about all this. There is too much to figure out, with not a lot of help available on the web.
Now, how would each notification be sent to all the users who are supposed to receive it, and what would the data structure look like?
I have read a lot of implementations that suggest using MySQL. My understanding was that, given the kind (size) of data involved, it would be better to use NoSQL for scalability.
So how does MySQL work well for such use cases, and why is a NoSQL database like Mongo not suggested anywhere for such an implementation, when these are meant to be heavily scalable?
I know this is a lot of questions in one, but I am not looking for a complete answer here; insights on particular points would also be a great help as I build my own application.
The question is extremely broad, but I'll try to answer it to the best of my ability.
How would they actually store the comments data? also how would a notification be stored and sent to all the users that.
I generally don't like answering questions like this because it appears as if you did very little research before coming to SO. It also seems like you're confused with application and database roles. I'll at least start you off with some material/ideas and let you decide on your own.
There is no "silver bullet" for a backend design, especially when it comes to databases. SQL databases are generally very good at most database functionality, and rightfully so; it's a technology that is very mature and has stood the test of time for a reason. Most NOSQL solutions are specialized for particular purposes. For instance: if you were logging a lot of information, you might want to look at Cassandra. If you were dealing with a lot of relational data, you would want to use something like Neo4j (or PostgreSQL/MySQL for RDBMS). If you were dealing with a lot of real-time data, you might want to look at Redis.
It's dumb to ask NOSQL vs SQL for a few reasons:
NOSQL is a bad term in general. And it doesn't mean "No SQL". It means "Not Only SQL". Unfortunately the term has come to cover even databases that are polar opposites of each other.
Only you know your application's full functionality. Even if I knew the basics of what you wanted to achieve, I still couldn't give you a definitive answer. Nor can anyone else. It's highly subjective, and again, only YOU know EXACTLY what your application should do.
The biggest reason: It's 2014. Why one database? Ten years ago "DatabaseX vs DatabaseY" would have been a practical question. Now, you can configure many application frameworks to reliably use multiple databases in a matter of minutes. Moral of the story: Use each database for its specialized purpose. More on polyglot persistence here.
As far as Facebook goes: a five minute Google search reveals what backend technologies they've used in the past, and it's not that difficult to research some of their current backend solutions. You're not Facebook. You don't need to prepare for a billion users right now. Start with simple, proven technologies. This will let you naturally scale your application. When those technologies start to become a bottleneck, then be worried about scalability.
I hope this helped you with starting your coding journey, but please use Stack Overflow as a last resort if you're having trouble with code. Not an immediate go-to.

Is RavenDB just a frontend for Access?

I've started using Raven for my latest project. When my boss learned about it, he mentioned that it's based on Access and that he has had a very bad experience with multiple users and Access. Now I have to either switch or prove to him that he is wrong.
No, it isn't. The confusion is because RavenDB can use ESENT for data storage and ESENT used to be called Jet Blue. It was called Jet Blue because it was originally developed to replace the Jet Red engine which was/is used in Access. The Wikipedia entry is quite accurate about the history and differences.
Laurion's answer is correct, but I also wanted to point out that in Raven you can swap out the ESENT storage engine for another that Oren developed called Munin.
From Ayende's blog post about Munin.
Raven.Munin is the actual implementation of a low level managed storage for RavenDB. I split it out of the RavenDB project because I intend to make use of it in additional projects.
At its core, Munin provides high performance transactional, non relational, data store written completely in managed code. The main point in writing it was to support the managed storage in RavenDB, but it is going to be used for Raven MQ as well, and probably a bunch of other stuff as well. I’ll post about Raven MQ in the future, so don’t bother asking about it.
Munin is a low level api, not something that you are likely to use directly. And it was explicitly modeled to give me an interface similar in capability to what Esent gives me, but in purely managed code.

geo spatial application: mySql vs CouchDB vs others

I am developing an application on Google Maps and checking out various options to store and retrieve spatial information within a bounding box.
Initially I thought MySql was not a good option, but after checking http://dev.mysql.com/doc/refman/5.6/en/spatial-analysis-functions.html and http://code.google.com/apis/maps/articles/phpsqlsearch.html, looks like I can use MySql and it does support my use cases.
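As a rough illustration of the bounding-box use case, here is a small sketch of the two checks involved: a cheap rectangle test (the same idea as a `WHERE lat BETWEEN ... AND lng BETWEEN ...` clause) and a haversine distance for ranking results. The function names and the mean Earth radius constant are choices made for this example, and the box test assumes the box does not cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def in_bounding_box(lat, lng, south, west, north, east):
    """True if the point lies inside the box (box must not cross longitude 180)."""
    return south <= lat <= north and west <= lng <= east

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A common pattern is to filter by the box first, since that can use ordinary or spatial indexes, and compute the exact distance only for the rows that survive.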
I was also evaluating node.js and CouchDB with GeoCouch. With modules like socket.io, geo, etc., this also looks like a good choice. Check out the book "Getting Started with GEO, CouchDB, and Node.js". My application would be a single-page application, and I do not foresee needing an RDBMS anytime in the future.
I have also seen this: http://nodeguide.com/convincing_the_boss.html. It makes me a little apprehensive about whether to go with node.js and GeoCouch:
If the architecture for your next apps reads like the cookbook of
NoSQL ingredients, please pause for a second and read this.
Yes, Redis, CouchDB, MongoDB, Riak, Casandra, etc. all look really
tempting, but so did that red apple Eve couldn't resist. If you're
already taking a technological risk with using node.js, you shouldn't
multiply it with more technology you probably don't fully understand
yet.
Sure, there are legitimate use cases for choosing a document oriented
database. But if you are trying to build a business on top of your
software, sticking to conservative database technology (like postgres
or mysql) might just outweigh the benefits of satisfying your inner
nerd and impressing your friends.
What is your opinion?
GeoCouch sounds like a good solution in your case. If you want an easy installation, you can have a look at Couchbase Single Server, which is basically CouchDB with GeoCouch included (check out the Developer Preview for 2.0).

MySQL implementation with CUDA

I am a senior undergrad majoring in CS. At the moment I am taking a Computer Architecture class, and we need to do a project. I want to do something related to CUDA, where the performance of the computation will show a moderate increase compared to a serial implementation.
I am really interested in databases, so I decided to do something related to SQL. I only have experience with MySQL, and I could not find anything about how to work with MySQL using CUDA. The only research I could find about SQL on CUDA uses SQLite. I am not sure what to do or how to gather information on this subject, so I decided to ask for your opinions.
Best
Just in case someone ends up on this page: PG-Strom is a foreign data wrapper module for the PostgreSQL database.
You might want to look at an implementation of the SQL language that runs on the GPU and uses CUDA.
It is open source, so you can look at the algorithms for joins, sorts, and groupings.
Link :
http://sourceforge.net/projects/alenka/
Really? Google found this from NVIDIA:
http://forums.nvidia.com/index.php?showtopic=100342
They have a guide. Is that not suitable? It's certainly not for the faint of heart.
http://developer.download.nvidia.com/compute/cuda/1_1/NVIDIA_CUDA_Programming_Guide_1.1.pdf
Contrary to everyone else, and while I'm not sure exactly how, I think a GPU can work with MySQL; I don't know why everyone says it couldn't. If there is CPU workload in MySQL, then whatever that CPU is doing could, at some level, be done by a GPU if someone took the time to implement it. For example, it could work on UPDATING separate rows on separate threads. One could either allocate each table to a block, or leave it free-form and let the end user decide.
Or at least someone could edit the driver to make communication more efficient.
It looks like there is now a solution for querying data using SQL on GPUs in Python:
https://developer.nvidia.com/blog/beginners-guide-to-querying-data-using-sql-on-gpus-in-python/
I wonder if there are other possibilities using different programming languages, though.

Best Environment For Large Web Application

We are developing a web application that will have a database with over 5 million documents, in various languages. The site is planned to have more than 3 million visits per month (hopefully more).
We need a stable and scalable solution.
We are now using Java EE on the JBoss application server with a PostgreSQL database, but we would like to know whether this fits the problem or whether there is a better solution, because the project is at the beginning and changes are still viable.
Also, since many of us don't have a lot of experience with this type of project, the opinions of those who do will be very useful!
I hope I made myself clear. Please let me know if you need more information.
Thanks in advance.
The architectural design considerations of your solution are probably more important than the choice of "platform". In other words, how are you going to make your application scale? Do you need to store distributed sessions? Do you need real-time database synchronization or something a little less up to date? How will you do request load balancing, or handle failover? Can the business logic work over a distributed set of nodes/sites or whatever you envisage?
Once you have a design that suits your purposes, the choice of implementation platform can be a better-informed decision. Whether it's Java, .NET, Rails or whatever doesn't really matter. They all have their strengths and weaknesses, as do the members of your team. Use their strengths to guide this part of your decision-making process. Don't try to learn a new technology in tandem with building what sounds like a fairly serious site.
I've used JBoss on a pretty large distributed ebook delivery system with tens of thousands of page views per day and it never missed a beat. Likewise I think Stack Overflow is a more than adequate example of the capabilities of the ASP.NET platform with regards to the numbers you are mentioning.
Hope that helps.
Personally, I would not take responsibility for proposing my own solution to a team without first asking for advice elsewhere, just as chaKa is doing. What I would not do is rely on a single source of help when making the final decision.
You may need to consider the following criteria:
How much time do you have? What is the development plan? Do you have to start right away, or will you be given time to learn?
Do you need a framework? Are you expected to deliver quickly? How many requirements do you have? All of this affects whether the solution will be framework-based or built from scratch.
Will you support the project as well? How many people will do it? You also need to know whether the project will grow slowly or should be deployed quickly and forgotten.
What skills does your team have? What are they good at?
What would make you excited and want to do your best implementing the solution?
I believe there is more to think about...