Multi-room chat logging in a Rails/MySQL app

I'm building a browser game application in Rails. It's chat-based, and I'm currently using MySQL for the database. However, I'm running into a problem when it comes to chat logging for the games.
The application goals dictate that there will be multiple rooms at any given time in which people are playing the chat-based game. Conversation will be pretty much constant, and there are a number of other actions, such as private messages and game actions, which must be logged as well. Players who join the game after other players must be able to see the chat log from before they joined, and games must be available to review.
My first thought was, on game start, to create a table named after the game identifier and store everything there. Then when someone joins the game, we could just read the log back to them. Once the game had been over for a certain time, a script would take the table's contents, parse them into an XML object, store that in a database for game review, and delete the table to keep things running lean.
I created a model called Message, with a matching table containing the columns I want to store in the game tables: id, timestamp, sender, target (for PMs and actions), type of message, and content. Then I set the initializer for the Message object to set the table name to 'game_#{game_id}'. Rails, however, is throwing tantrums: I get an "undefined method 'has_key?'" error when I try to initialize the object. It occurs to me that the method I'm using may be a bit un-Rails-ian, and that it possibly defeats the purpose of working in Rails to pass up the excellent object/db management features it offers.
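For what it's worth, that error usually means ActiveRecord's initializer received something other than an attributes hash; table names belong at the class level, not in initialize. A minimal sketch of the per-game-table idea, assuming the game_<id> tables already exist (the for_game factory name is made up):

```ruby
# Hedged sketch: set table_name on a class, never per instance.
class Message < ActiveRecord::Base
  self.abstract_class = true # Message itself has no table

  # Build (and memoize) a subclass bound to one game's table.
  # `for_game` is a made-up factory name; the game_<id> table must exist.
  def self.for_game(game_id)
    @per_game ||= {}
    @per_game[game_id] ||= Class.new(self) do
      self.table_name = "game_#{game_id}"
    end
  end
end

# Usage:
#   Message.for_game(42).create!(sender: "alice", content: "hi")
#   Message.for_game(42).order(:created_at).each { |m| ... }
```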
I've considered other alternatives, such as temporarily keeping all the messages in the main messages table and just querying them by game ID, but I'm unsure whether a MySQL table is up to the task of speedily serving this data while accepting constant writes, especially if we get a dozen or more games going at once averaging a message or two per second. It was suggested to me that a NoSQL solution like a MongoDB capped collection would be an excellent option for the temporary storage from a performance standpoint, but that would mean giving up all the ActiveRecord goodness that Rails offers.
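For scale: a dozen games at two messages per second each is on the order of 25 inserts per second, which is light load for MySQL on modest hardware. A hedged sketch of the single-table variant, with an index chosen so backlog reads stay cheap (column names are guesses from the question):

```ruby
class CreateMessages < ActiveRecord::Migration[7.0]
  def change
    create_table :messages do |t|
      t.integer  :game_id,      null: false
      t.string   :sender
      t.string   :target        # PMs and game actions
      t.string   :message_type  # "type" alone is reserved for STI in Rails
      t.text     :content
      t.datetime :created_at,   null: false
    end
    # "All messages for game X, in order" becomes one index range scan,
    # even while other games are writing constantly.
    add_index :messages, [:game_id, :created_at]
  end
end

# Late joiners replay the backlog with:
#   Message.where(game_id: game.id).order(:created_at)
```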
Is there a reliable and relatively fast way to meet these constraints: logged messages that can be quickly stored and fetched while the game is ongoing, then moved to some low-overhead form for review? Would any of the above ideas be workable, or is there a whole separate option I've overlooked?

Related

How to make sure a MySQL database takes values only from my game?

I have coded a simple snake-like game in Java. I want to store users' high scores in a database, so I have created a MySQL database. The problem I'm facing is having the program connect to the MySQL database without leaking the login information, which would allow users to mess with the database. I have looked into it, and everyone suggests a web service sitting between the game and the SQL database. However, the problem is still there: anyone can feed the service data that can again mess up the database.
One option I've thought of is having the jar file uploaded along with the data to the web service. The web service then computes its hash (I was looking into SHA-512) and compares it to the hash value it's supposed to get, and if they match, it proceeds. But then someone can just reverse engineer my game, change the code a bit, and send the original game file while sending a different NAME+SCORE to the database. Also, having people constantly upload files to the web service would be a huge strain, since I'm running this on a home network that can only handle so much.
I could encrypt a file with the password in it, but since the program is able to decrypt it, surely a user will be able to decrypt it and get the information as well.
Basically, any way I provide the program with a key to my database, someone will be able to reverse engineer it and use that same access to mess with the database if they want to.
There must be a way to store information from my game in a MySQL database while making sure nobody can change things except the program I've made; somehow hiding the details behind a service or something. How the hell do other people do it? Can I have some guidance? Any ideas are welcome.
Typically, applications shouldn't make direct connections to a database; instead, they make calls to a server that has DB access.
In fact, you cannot get a guarantee here. The application can always be decompiled and its behavior changed.
If you accept this limit, you can guarantee a certain level of security by using a middle layer (a web service). There you can guarantee a secure transport layer (SSL) and trustworthy login through a user certificate placed in the app. You can also add a digital signature to the sent data.
This gives you a fairly reliable system; breaking it would require decompiling the application.
This is an age-old problem, and there is no easy solution to it. For some scenarios it can be solved; for others it cannot.
The basic rule is, anything you do in the client is pretty much useless in terms of security. As you said, anybody can decompile and analyse your client-side code, obtain any secrets and so on. The user has full control of the client, and therefore, you cannot trust data (scores) sent by the client.
So you need a server-side solution, and that's where it starts to get interesting. What you can actually do is send the whole flow of the game (all the events) to the server, so you can fully (or at least partly) reconstruct what happened, server-side. The score doesn't even need to be sent then; you can calculate it from how the game went.
For single-player skill games this doesn't help on its own: the user can still construct a flow as if he were playing really well. But on the one hand, that's much harder than just faking a score, and on the other hand, there are games where coming up with a solution is the whole point. Especially for problems that are NP-hard, finding the solution is the trick; verifying it is easy.
For multiplayer games this helps too, because if all players submit the flow, the flows received must match; otherwise there is a cheater (or all the players are cheaters, or the same cheater is playing as all the players, but you should be able to prevent that by other means).
If you have the whole flow of the game, you can also implement other anti-cheating techniques; for example, in a skill game you can look for actions that appear super-human. It will never be perfect, and the difficulty is then to avoid marking legitimate players as cheaters.
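To make the replay idea concrete, here is a deliberately tiny sketch (in Ruby for brevity, though the question is Java). The event format and the per-event cap are made-up stand-ins for real game rules:

```ruby
# Recompute the score from the raw event flow instead of trusting a
# client-supplied total.
def replay_score(events)
  events.sum do |event|
    points = event.fetch(:points)
    raise "impossible event" if points > 10 # reject super-human moves
    points
  end
end

# The client submits what happened, never the final number:
replay_score([{ points: 3 }, { points: 7 }]) # => 10
```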
It depends a little...
One example that will at least make it a little harder for someone who wants to mess with your game could look like this:
If your game procedurally generates content, store and transmit the initial seed. Then don't just transmit a score and a name:
store and transmit all the player decisions that lead to a specific score, and
let your server replay the game based on the provided seed and decisions, taking the score that was calculated on the server.
So, to forge a score, you need to actually solve the game (or build an automated approach for it).
This approach makes it harder to fake a score, but it does not bind a name and a score together (you could submit the same score with a different name later...)
To address that, you can take things a little further (a sketch follows below):
Online scores require online games...
When a game is started, ask for the player's name.
Post that name to the server to acquire an HMAC (Hash-based Message Authentication Code) secured token containing an ID (think of a GUID) and the server-generated starting seed.
The server stores the name and ID, and gives out a limited number of tokens per timespan and IP.
The token is valid for a timespan x; after that, you can delete it if it was not "used" within that window.
When the player finishes the game, provide this token, together with the player's decisions, to the server.
The server then knows who the player is, and can recreate the final game state and find the score.
Systems like this do not prevent all forms of cheating, but they are usually not that hard to implement and demand at least a little more effort from an attacker, thus filtering out the script kiddies.
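A hedged sketch of the token half (in Ruby for brevity; Java's javax.crypto.Mac offers the same HmacSHA256 primitive). The secret, field names, and expiry window are all assumptions:

```ruby
require "openssl"
require "json"
require "securerandom"

SECRET = "server-only secret, never shipped in the jar" # assumption

# Game start: hand out a signed token with an ID and the starting seed.
def issue_token
  body = JSON.generate(id: SecureRandom.uuid,
                       seed: SecureRandom.random_number(2**32),
                       issued_at: Time.now.to_i)
  { body: body, mac: OpenSSL::HMAC.hexdigest("SHA256", SECRET, body) }
end

# Score submission: accept only tokens we signed, and only fresh ones.
def verify_token(token, max_age: 3600)
  expected = OpenSSL::HMAC.hexdigest("SHA256", SECRET, token[:body])
  raise "forged token" unless OpenSSL.secure_compare(expected, token[:mac])
  data = JSON.parse(token[:body])
  raise "expired token" if Time.now.to_i - data["issued_at"] > max_age
  data # replay the submitted decisions against data["seed"] from here
end
```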

Database Design Best Practice for a Webchat

I'm designing a webchat that uses Pusher to handle the async calls. It's for a game/tool I'm building. There can be any number of games, and each game has channels (for Pusher); each channel opens a new session after being inactive for six hours.
The messages need to be stored by session for later access. So what would the best database design be? Should I store the messages individually, or as something like JSON in a MySQL text field?
Currently there's a table for games, game sessions, channels, chat sessions (linked 1:1 to game sessions), and one for the messages (with the messages in JSON format in a text field). But I'm not sure that's the best way to store what will amount to anywhere from 100 to 2,000 messages in a single session, and storing each message in its own row sounds impractical.
Edit: In terms of what I plan to do with it: I just need the messages archived so users can view them by session later.
So does that mean the best way would be to store each message in its own row with a unique id, and just filter them by session when loading it?
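For what it's worth, 2,000 rows per session is tiny by MySQL standards; per-row storage is the conventional choice and keeps messages queryable. A hedged sketch as a Rails-style migration (the schema names are assumptions; plain SQL works the same way):

```ruby
class CreateChatMessages < ActiveRecord::Migration[7.0]
  def change
    create_table :chat_messages do |t|
      t.references :chat_session, null: false, index: false
      t.string     :sender
      t.text       :body
      t.datetime   :created_at, null: false
    end
    # One index range scan loads a whole session's archive in order.
    add_index :chat_messages, [:chat_session_id, :created_at]
  end
end

# Loading a session later:
#   ChatMessage.where(chat_session_id: session.id).order(:created_at)
```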

Almost Real-Time RESTful Achievements Web Service that Scales: How can I reduce the number of calls?

I am building a real-time achievements web service.
My current idea is to have an achievements collection in MongoDB as well as a players collection. The achievements collection would store the achievement definitions (the list can be modified to add new achievements); each definition contains a list of stats and thresholds (goals to complete the achievement). The players collection would hold documents composed of the playerID plus a dict keyed by achievement, whose values are the stats (progress) and status information (completed or not).
When a client posts new stats, I would get the list of achievements and find those that use those stats in their progression by fetching the achievements collection. Then I would need to fetch the players collection to find which achievements are already completed and remove those from my current list of achievements to process. Then I would fetch the players collection again to get the other stats and compute the new progress. I would need to update the progress of the achievement in the players collection. If an achievement is complete, I would send a callback to the client so it can see it "live".
My problem is that I need the service to work under high pressure: hundreds of thousands of players sending new stats very frequently (number of kills and the like; potentially thousands of stats with thousands of achievements), and my current idea seems to make WAY TOO MANY CALLS to the database.
I thought of switching to a MySQL database instead, but I am not very good with them, so I am not sure whether things would be better that way (could views speed things up?). Redis seems too costly for a big database.
Is there a better flow / design pattern I should use instead?
Is there a way to design the schemas so they stay quick under heavy load?
Should I use MySQL instead? And if yes, what is the key element that would help me speed things up? (So I can read up on it and design something better.)
I have never used NoSQL but have used SQL a lot, so my thoughts may be biased or too SQL-centric.
Having said that, here is my idea. Overall, I think two DB calls are needed per new stat.
“When a client posts new stats, I would get the list of achievements and find those that use those stats in their progression by fetching the achievements collection.”
If the achievements collection is small enough, you could cache it in memory when your service is initialized.
If not, I think you should go the "MySQL" approach and not do this step alone, but join it to the next step. Either way, we save one trip to the DB.
“Then I would need to fetch the players collection to find which achievements are already completed...”
This is the first trip to the DB.
“...and remove those from my current list of achievements to process.”
I believe this is not DB-related but logic inside your program. Please correct me if I am wrong.
“Then I would fetch the players collection again to get the other stats and compute the new progress.”
You can get this information from your first DB trip and keep it in memory, so no further DB trip is needed.
“I would need to update the progress of the achievement in the players collection.”
This is the second DB trip, the update.
“If an achievement is complete, I would send a callback to the client so it can see it "live".”
And this is not related to the DB.
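Putting the two trips together, a hedged sketch with the Ruby mongo driver. The collection layout, field names, and single-stat definitions are all assumptions based on the question:

```ruby
require "mongo"

client = Mongo::Client.new(["127.0.0.1:27017"], database: "game") # assumed deployment

# Startup, not per-stat: cache achievement definitions in memory,
# grouped by the stat each one watches.
DEFS = client[:achievements].find.to_a.group_by { |a| a["stat"] }

def record_stat(client, player_id, stat, value)
  # Trip 1: one read returns progress and completion flags together.
  player = client[:players].find(_id: player_id).first || {}

  new_total  = player.dig("progress", stat).to_i + value
  candidates = (DEFS[stat] || []).reject { |a| player.dig("done", a["_id"].to_s) }
  completed  = candidates.select { |a| new_total >= a["threshold"] }

  # Trip 2: one write bumps progress and marks completions atomically.
  update = { "$inc" => { "progress.#{stat}" => value } }
  update["$set"] = completed.to_h { |a| ["done.#{a['_id']}", true] } if completed.any?
  client[:players].update_one({ _id: player_id }, update)

  completed # push these to the client as the "live" callback
end
```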
If this is still too many DB calls and you would like to make it a single trip, my only idea is to switch to MySQL and create a stored procedure that handles the logic.
That way you make only one DB call per stat, which is unavoidable, and you push all the load down to the DB layer so that it scales there.
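A rough sketch of that single-trip variant (schema and procedure are assumptions; the app would still need to record which completions it has already announced):

```ruby
require "mysql2"

db = Mysql2::Client.new(host: "localhost", username: "app", database: "game",
                        flags: Mysql2::Client::MULTI_STATEMENTS) # CALL returns result sets

# One round trip per stat: bump progress, then report any achievement
# whose threshold was just crossed and isn't marked completed yet.
db.query(<<~SQL)
  CREATE PROCEDURE record_stat(IN p_player INT, IN p_stat VARCHAR(64), IN p_value INT)
  BEGIN
    INSERT INTO progress (player_id, stat, total)
    VALUES (p_player, p_stat, p_value)
    ON DUPLICATE KEY UPDATE total = total + p_value;

    SELECT a.id
    FROM achievements a
    JOIN progress p ON p.stat = a.stat AND p.player_id = p_player
    LEFT JOIN completed c ON c.achievement_id = a.id AND c.player_id = p_player
    WHERE c.player_id IS NULL AND p.total >= a.threshold;
  END
SQL

# Per incoming stat, a single call:
#   db.query("CALL record_stat(42, 'kills', 3)")
```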

How did Facebook or Twitter implement their subscribe system

I'm working on an SNS-like mobile app project where users upload content and see updates from their subscribed topics or friends on their homepage.
I store user content in MySQL and build the user-specific homepage data by first querying who and what the user has subscribed to, and then querying the content table with a 'WHERE userid IN (....) OR topic IN (....)' clause.
I suspect this will become quite slow as the content table piles up, or when a user subscribes to tons of users or topics. Our newly released app is already gaining thousands of new users each week, and more over time, so scalability has to be a concern for us right now.
So I wonder how Facebook or Twitter handle this subscription problem with their enormous numbers of users. Do they maintain a list for each user? I tried to search, but all I found was how to interact with Facebook or Twitter, not how they actually implement this feature.
I've noticed that on Facebook you see only updates, rather than history, in your feed. That means subscribing to a new user won't dump lots of outdated content into your feed, as it would with my current method.
How does Facebook design its database, and how does it dispatch new content to subscribed users?
My backend is currently PHP+MySQL, and I don't mind introducing other backend technologies such as Redis or JMS if that's the way it should be done.
Sounds like you guys are still at a pretty early stage. There are N number of ways to solve this, all depending on which level of DAUs you think you'll hit in the near term, how much money you have to spend on hardware, the time you have to build it, etc.
You can try an interim table that queues up newly introduced items, with metadata on what each entails (which topic, friend user_id list, etc.). Then use a queue-consumer system like RabbitMQ/GearMan to manage consumption of this growing list and figure out who should process each item. Build the queue-consumer program in Scala, or on a J2EE stack like Maven/Tomcat: something that can persist. If you really want to stick with PHP, build a PHP REST API that can live in php5-fpm's memory, managed by the FastCGI process manager and fronted by a proxy like nginx, initiated by curl calls at an appropriate interval from a cron-executed script.
[EDIT] It's probably better not to use a DB for a queueing system; use a cache server like Redis instead. It outperforms a DB in many ways, and it can persist to disk (look up RDB and AOF). It's not very fault tolerant, though: if a job fails all of a sudden, you might lose the job record. Most likely you won't care about these crash edge cases. Also look up php-resque!
To prep the notifications to go out efficiently, I'm assuming you're already denormalizing the tables. I'd imagine a "user_topic" table mapping each topic to the users who subscribed to it. Create another table, "notification_meta", describing where users prefer receiving notifications (SMS/push/email/in-app) and the metadata needed to push to those channels (mobile client approval keys for APNS/GCM, email addresses, user auth-tokens). Use JSON blobs for the two fields in notification_meta, so each user has a single row. This saves I/O hits on the DB.
Use user_id as the primary key for "notification_meta" and user_id + topic_id as the PK for "user_topic". DO NOT add an auto-increment "id" field to either; it's pretty useless in this use case (it takes up space, CPU, index memory, etc.). With both fields in the PK, queries on user_topic can be served entirely from the index, and the only disk hit is on "notification_meta" during the JOIN.
So if a user subscribes to 2 topics, there'll be two entries in "user_topic", and each user will always have a single row in "notification_meta".
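Sketched as MySQL DDL, issued here through the mysql2 gem (types and the JSON columns are assumptions; JSON requires MySQL 5.7+):

```ruby
require "mysql2"

db = Mysql2::Client.new(host: "localhost", username: "app", database: "sns") # assumptions

# Composite PK, no auto-increment id, exactly as described above.
db.query(<<~SQL)
  CREATE TABLE user_topic (
    user_id  INT NOT NULL,
    topic_id INT NOT NULL,
    PRIMARY KEY (user_id, topic_id)  -- covering PK: subscriber lookups stay in the index
  ) ENGINE=InnoDB
SQL

db.query(<<~SQL)
  CREATE TABLE notification_meta (
    user_id  INT NOT NULL,
    channels JSON,                   -- SMS/push/email/in-app preferences as one blob
    tokens   JSON,                   -- APNS/GCM keys, email addresses, auth-tokens
    PRIMARY KEY (user_id)            -- one row per user
  ) ENGINE=InnoDB
SQL
```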
There are more ways to scale: dynamically creating a new table for each new topic, sharding to different MySQL instances based on user_id, partitioning, etc. There are N ways to do it, especially in MySQL. Good luck!

Why should you fan out when building an Activity Feed?

I'm looking into the logistics of building an Activity Feed, similar to that of Facebook, or Twitter's timeline.
There are tons of answers here on Stack Overflow, on Quora, and in other articles I've found on Google that describe fanning out on read or write. It all makes sense. You record all the activity in one main activity table/collection, and then, at some point, write a copy of that data to separate, appropriate tables for each user.
What I don't completely understand is why there is a need for a fan-out at all. That is, why is there a need to record the activity on individual user feeds? Is there a reason you can't just use one activity table/collection? It would have appropriate indexes and carry the acting user's ID. Then, when someone wants to see their activity stream, just query the activity table for the users the current user is following.
I understand this may not be as efficient, since activities outnumber the actual objects in the database several times over. That is, there might be 100 posts in a database but over 1,000 actions on those posts, so queries on the activity table/collection may be slow once row counts get pretty high.
But wouldn't this work? Can't you just scale the database so it can handle the queries more efficiently? Is there really a need for fanning out?
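For reference, the "one table, no fan-out" approach the question describes boils down to a single join (sketched via the mysql2 gem; table and column names are assumptions):

```ruby
require "mysql2"

db = Mysql2::Client.new(host: "localhost", username: "app", database: "feed") # assumptions

# One query serves the feed, provided follows(follower_id) and
# activities(actor_id, created_at) are indexed.
feed = db.query(<<~SQL)
  SELECT a.*
  FROM activities a
  JOIN follows f ON f.followed_id = a.actor_id
  WHERE f.follower_id = 42           -- the viewing user
  ORDER BY a.created_at DESC
  LIMIT 50
SQL
```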
It's not always necessary to fan out; the decision depends on many factors.
For example, Twitter does both, while Facebook follows fan-out-on-load (i.e., it builds the feed at read time).
As you can imagine, Facebook's activity stream is much more complex than Twitter's. FB needs to apply a lot of filters/privacy settings on a per-user/group basis, so it makes sense for them to pull and build the stream on the fly. Their TAO graph infrastructure (a graph layer on top of MySQL, plus caching) makes it easy for them to build and fetch feeds quite fast for each user.