I'm new to MongoDB.
I'm making a simple application about bank accounts; an account can transfer money to others.
I designed the Account collection like this:
account
{
    name: A,
    age: 24,
    money: 100
}
account
{
    name: B,
    age: 22,
    money: 300
}
Assuming user A transfers $100 to user B, there are two operations:
1) decrease user A's balance by $100 // update on document A
2) increase user B's balance by $100 // update on document B
I have read that atomicity applies only to a single document, not to multiple documents.
I have an alternative design:
Bank
{
    name:
    address:
    Account: [
        {
            name: A,
            age: 22,
            money: SS
        },
        {
            name: B,
            age: 23,
            money: S1S
        }
    ]
}
I have some questions:
If I use the latter design, how can I write a transactional query (can I use the findAndModify() function)?
Does MongoDB support transaction operations like MySQL (InnoDB)?
Some people tell me that using MySQL for this project is the best way, and to use MongoDB only to save transaction information (in an extra collection named Transaction_money). If I use both MongoDB and MySQL (InnoDB), how can I make the operations below atomic (fail or succeed as a whole)?
1) -$100 for user A
2) +$100 for user B
3) save transaction information like
transaction
{
    sender: A,
    receiver: B,
    money: 100,
    date: 05/04/2013
}
Thanks so much.
I am not sure if this is what you are looking for:
db.bank.update({name : "name"},{ "$inc" : {'Account.0.money' : -100, 'Account.1.money' : 100}})
The update() operation satisfies the ACI properties of ACID. Durability (D) depends on the MongoDB and application configuration used when making the query.
You may prefer findAndModify(), which won't yield its lock on a page fault.
MongoDB provides transaction-like atomicity only within a single document.
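For reference, here is roughly the same single-document $inc update expressed with pymongo (just a sketch; the connection string and database name are assumptions, while the bank collection and Account fields come from the question):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["test"]                                # assumed database name

# Both balance changes live inside the same bank document, so this single
# update is atomic: readers never see only one of the two $inc's applied.
db.bank.update_one(
    {"name": "name"},
    {"$inc": {"Account.0.money": -100, "Account.1.money": 100}},
)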
I can't understand why, if your application requirements are this simple, you are trying to use MongoDB. No doubt it's a good data store, but I guess MySQL would satisfy all your requirements.
Just FYI: there is a doc which addresses exactly the problem you are trying to solve: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
But I wouldn't recommend using this, because a single query (transferring money) gets turned into a sequence of queries.
Hope this helps.
If I use the latter design, how can I write a transactional query (can I use the findAndModify() function)?
There are a lot of misconceptions about what findAndModify does; it is not a transaction. That being said, it is atomic, which is quite different.
The reason for two-phase commits and transactions in this sense is so that if something goes wrong you can fix it (or at least have a 99.99% chance that corruption hasn't occurred).
The problem with findAndModify is that it has no such transactional behaviour. Not only that, but MongoDB only provides atomicity at the single-document level, which means that if your operation needs to change multiple documents you could actually end up with an inconsistent in-between state in your database data. This, of course, won't do for money handling.
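To make that in-between state concrete, here is a small pymongo sketch under a one-document-per-account design (the collection and field names are assumptions for illustration, not anything MongoDB prescribes):

from pymongo import MongoClient

db = MongoClient()["test"]  # assumed local connection and database name

# Each update_one call is atomic on its own document...
db.accounts.update_one({"name": "A"}, {"$inc": {"money": -100}})
# ...but if the process crashes right here, A has been debited while B has
# not been credited, leaving the data in an inconsistent in-between state.
db.accounts.update_one({"name": "B"}, {"$inc": {"money": 100}})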
It should be noted that MongoDB is not great in these scenarios and you are trying to use MongoDB away from its purpose. With this in mind, it is clear you have not researched your question well, as your next question shows:
Does MongoDB support transaction operations like MySQL (InnoDB)?
No, it does not.
With all that background info aside let's look at your schema:
Bank
{
    name:
    address:
    Account: [{
        name: A,
        age: 22,
        money: SS
    },{
        name: B,
        age: 23,
        money: S1S
    }]
}
It is true that you could get a transactional query here, whereby the document would never be able to exist in an in-between state, only one state or the other; as such, no inconsistencies would exist.
But then we have to talk about the real world. A document in MongoDB is limited to 16 MB. I do not think you would fit an entire bank into one document, so this schema is badly planned and unusable.
Instead you would require (maybe) a document per account holder in your bank, with a subdocument for their accounts. With this you now have the problem that inconsistencies can occur.
MongoDB, as @Abhishek states, does support client-side two-phase commits, but these are not going to be as good as server-side transactions within the database itself, where the mongod can take safety precautions to ensure that the data is consistent at all times.
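For a rough idea of what that client-side two-phase commit looks like, here is a heavily condensed pymongo sketch (no error handling or recovery; the collection and field names are assumptions loosely following the tutorial linked above, not code taken from it):

from pymongo import MongoClient

db = MongoClient()["test"]  # assumed local connection and database name

# 1) record the intent in a transactions collection
txn = {"sender": "A", "receiver": "B", "amount": 100, "state": "initial"}
txn_id = db.transactions.insert_one(txn).inserted_id

# 2) switch the transaction to pending
db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "pending"}})

# 3) apply the debit and credit, marking each account with the pending transaction
db.accounts.update_one(
    {"name": "A", "pendingTransactions": {"$ne": txn_id}},
    {"$inc": {"money": -100}, "$push": {"pendingTransactions": txn_id}},
)
db.accounts.update_one(
    {"name": "B", "pendingTransactions": {"$ne": txn_id}},
    {"$inc": {"money": 100}, "$push": {"pendingTransactions": txn_id}},
)

# 4) mark the transaction applied, then clean up the markers
db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "applied"}})
db.accounts.update_many({}, {"$pull": {"pendingTransactions": txn_id}})
db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "done"}})

# If anything fails part-way, a recovery job has to find transactions stuck in
# "pending" or "applied" and either finish them or roll them back.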
So coming back to your last question:
Some people tell me that using MySQL for this project is the best way, and to use MongoDB only to save transaction information (in an extra collection named Transaction_money). If I use both MongoDB and MySQL (InnoDB), how can I make the operations below atomic (fail or succeed as a whole)?
I would personally go with something a bit more robust than MySQL; I hear MSSQL is quite good for this.
I am stuck in a situation where I initialised a namespace with a default-ttl of 30 days. There are about 5 million records with that (30-day) TTL value. My actual requirement is that the TTL should be zero (0), but the 30-day TTL was applied without my realising it.
So now I want to update the previous (old) 5 million records with the new TTL value (zero).
I've checked/tried "set-disable-eviction true", but it is not working; data is still being removed according to the (old) TTL value.
How do I get around this? (And can I retrieve the data that was already removed? How?)
Someone please help me.
First, eviction and expiration are two different mechanisms. You can disable evictions in various ways, such as the set-disable-eviction config parameter you've used. You cannot disable the cleanup of expired records. There's a good knowledge base FAQ: What are Expiration, Eviction and Stop-Writes?.
Unfortunately, the expired records that have been cleaned up are gone if their void time is in the past. If those records were merely evicted (i.e. removed before their void time due to crossing the namespace high-water mark for memory or disk), you can cold restart your node, and those records with a future TTL will come back. They won't return if they were durably deleted or if their TTL is in the past (such records get skipped).
As for resetting TTLs, the easiest way would be to do this through a record UDF that is applied to all the records in your namespace using a scan.
The UDF for your situation would be very simple:
ttl.lua
function to_zero_ttl(rec)
    local rec_ttl = record.ttl(rec)
    if rec_ttl > 0 then
        record.set_ttl(rec, -1)  -- -1 means the record never expires
        aerospike:update(rec)
    end
end
In AQL:
$ aql
Aerospike Query Client
Version 3.12.0
C Client Version 4.1.4
Copyright 2012-2017 Aerospike. All rights reserved.
aql> register module './ttl.lua'
OK, 1 module added.
aql> execute ttl.to_zero_ttl() on test.foo
Using a Python script would be easier if you have more complex logic, with filters etc.
import time
import aerospike
from aerospike_helpers.operations import operations

# assumes `client` is a connected aerospike.Client and that
# `namespace` and `set_name` are defined elsewhere
zero_ttl_operation = [operations.touch(-1)]  # -1 = never expire

query = client.query(namespace, set_name)
query.add_ops(zero_ttl_operation)

policy = {}
job = query.execute_background(policy)
print(f'executing job {job}')

# poll until the background job finishes
while True:
    response = client.job_info(job, aerospike.JOB_SCAN, policy={'timeout': 60000})
    print(f'job status: {response}')
    if response['status'] != aerospike.JOB_STATUS_INPROGRESS:
        break
    time.sleep(0.5)
This is for Aerospike v6 and Python SDK v7.
Suppose I have a resource called Person. I can update Person entities by doing a POST to /data/Person/{ID}. Suppose for simplicity that a person has three properties, first name, last name, and age.
GET /data/Person/1 yields something like:
{ id: 1, firstName: "John", lastName: "Smith", age: 30 }.
My question is about updates to this person and the semantics of the services that do this. Suppose I wanted to update John; he's now 31. In terms of design approach, I've seen APIs work in one of two ways:
Option 1:
POST /data/Person/1 with { id: 1, age: 31 } does the right thing. Implicitly, any property that isn't mentioned isn't updated.
Option 2:
POST /data/Person/1 with the full object that would have been received by GET -- all properties must be specified, even if many don't change, because the API (in the presence of a missing property) would assume that its proper value is null.
Which option is correct from a recommended design perspective? Option 1 is attractive because it's short and simple, but has the downside of being ambiguous in some cases. Option 2 has you sending a lot of data back and forth even if it's not changing, and doesn't tell the server what's really important about this payload (only the age changed).
Option 1 - updating a subset of the resource - is now formalised in HTTP as the PATCH method. Option 2 - updating the whole resource - is the PUT method.
In real-world scenarios, it's common to want to upload only a subset of the resource. This is better for performance of the request and modularity/diversity of clients.
For that reason, PATCH is now more useful than PUT in a typical API (imo), though you can support both if you want to. There are a few corner cases where a platform may not support PATCH, but I believe they are rare now.
If you do support both, don't just make them interchangeable. The difference with PUT is that if it receives a subset, it should assume the whole resource was uploaded, and so apply default values to the omitted properties, or return an error if they are required; PATCH, by contrast, just ignores the omitted properties.
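A quick illustration of the difference using Python's requests library (the URL is hypothetical; the payload is the example person from the question):

import requests

BASE = "https://api.example.com"  # hypothetical host

# PATCH: send only what changed; omitted properties are left untouched.
requests.patch(f"{BASE}/data/Person/1", json={"age": 31})

# PUT: send the full representation; omitted properties would be defaulted
# (or the request rejected if they are required).
requests.put(
    f"{BASE}/data/Person/1",
    json={"id": 1, "firstName": "John", "lastName": "Smith", "age": 31},
)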
When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate the JSON upfront as it is, before doing anything with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and validate it on the go while performing the business logic, possibly using some exception handling to deal with invalid data.
The 1st approach seems more robust than the 2nd, but probably more expensive (in time), because every request will be validated (and hopefully most of them are valid, so the validation is somewhat redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks into the business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that will map JSON to objects, which is very powerful and performs that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could also validate on the front end; if you are using JavaScript, you can use json2.js.
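The same idea sketched in Python, purely for illustration (pydantic is my choice here, not something the answer mentions): a library that maps the JSON onto a typed object and does the validation for you.

import json
from pydantic import BaseModel, ValidationError  # assumes pydantic is installed

class Person(BaseModel):
    firstName: str
    lastName: str
    age: int

raw = '{"firstName": "John", "lastName": "Smith", "age": "thirty"}'

try:
    person = Person(**json.loads(raw))  # parsing plus type checks, up front
except ValidationError as err:
    print(err)  # reject with a 400 before any business logic runs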
In regards to comparing your methods, here is a Pro / Cons list.
Method #1: Upon Request
Pros
The business logic's integrity is maintained. As you mentioned, trying to validate while processing business logic could result in checks flagging data as invalid that is actually valid (and vice versa), or the validation could inadvertently impact the business logic negatively.
As Norbert mentioned, catching the errors beforehand will improve efficiency. The logical question this poses is: why spend the time processing if there are errors in the first place?
The code will be cleaner and easier to read. Keeping validation and business logic separated results in cleaner, easier to read and maintain code.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
It is theoretically more efficient, saving processing and compute time by doing validation and business logic at the same time.
Cons
In reality, the processing time saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error is found.
Data integrity can be compromised. It is possible that the JSON data ends up corrupted when processed this way.
The code is not as clear. When reading the business logic, it may not be apparent what is happening, because validation logic is mixed in.
What it really boils down to is accuracy vs. speed. They generally have an inverse relationship: as you become more accurate and validate your JSON, you may have to compromise somewhat on speed. This is really only noticeable on large data sets, as computers are very fast these days. It is up to you to decide what is more important, given how accurate you think your data will be when you receive it, and whether that extra second or so is crucial. In some cases it does matter (e.g. in stock market and healthcare applications, milliseconds matter) and both are highly important. In those cases, as you increase one, for example accuracy, you may have to preserve speed by getting a more performant machine.
Hope this helps.
The first approach is more robust, but does not have to be noticeably more expensive. It becomes much less expensive when you are able to abort processing early because of errors: your business logic usually takes more than 90% of the resources in a request, so if your error rate is 10%, you are already resource neutral. If you optimise the validation so that the checks from the business process are performed upfront, an error rate much lower than that (say 1 in 20 to 1 in 100) is enough to stay resource neutral.
For an example of an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: every part of the JSON can be mapped onto an object. This object is typed or contains typed data.
Sample object (Java used as the example language):
public class SomeInnerDataFromJson {
    String name;
    String address;
    int houseNumber;
    String buildingType;

    // Getters and setters
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    // etc.
}
The data parsed by GSON, using the model provided, is already type-checked.
This is the first point where your code can abort.
After this exit point, assuming the data conformed to the model, you can validate whether the data is within certain limits. You can also write that into the model.
Assume for this that buildingType must be one of:
Single family house
Multi family house
Apartment
You can check the data during parsing by creating a setter which checks the data, or you can check it after parsing in a first pass of your business rule application. The benefit of checking the data first is that your later code will need less exception handling, so it will be shorter and easier to understand.
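Here is the setter-style check sketched in Python for comparison (the class is hypothetical; the allowed values are the building types listed above):

ALLOWED_BUILDING_TYPES = {"Single family house", "Multi family house", "Apartment"}

class Building:
    def __init__(self, building_type):
        self.building_type = building_type  # routed through the setter below

    @property
    def building_type(self):
        return self._building_type

    @building_type.setter
    def building_type(self, value):
        # Reject invalid values while the object is being built, so the later
        # business logic needs no extra exception handling.
        if value not in ALLOWED_BUILDING_TYPES:
            raise ValueError(f"unknown building type: {value!r}")
        self._building_type = value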
I would definitely go for validation before processing.
Let's say you receive some JSON data with 10 variables, of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach($data as $varName => $varValue){
$varType = gettype($varValue);
if(!$this->isTypeValid($varName, $varType)){
// return validation error
}
}
// continue processing
Think of the scenario where you are directly processing the data and then the 10th value turns out to be of an invalid type. The processing of the previous nine variables was a waste of resources, since you end up returning a validation error response anyway. On top of that, you have to roll back any changes already persisted to your storage.
I only used variable types in my example, but I would suggest full validation (length, max/min values, etc.) of all variables before processing any of them.
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly will you need to think about the second option.
Assuming that you are dealing with a few hundred KB at most per JSON payload, you can just go for option one.
Here are some steps you could follow:
1) Go for a JSON parser like GSON that will convert your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you can be sure that the JSON is perfectly valid.)
2) Of course, the objects constructed by GSON in step 1 may not be in a functionally valid state. For example, functional checks like mandatory fields and limit checks still have to be done. For this, you could define a validateState method which recursively validates the state of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState(){
    // Assume this validateState is part of the Customer class.
    if(age < 12 || age > 150)
        throw new IllegalArgumentException("Age should be in the range 12 to 150");
    if(age < 18 && (guardianId == null || guardianId.trim().equals("")))
        throw new IllegalArgumentException("Guardian id is mandatory for minors");
    for(Account a : this.getAccounts()){
        a.validateState(); // throws appropriate exceptions if there is any inconsistency in state
    }
}
The answer depends entirely on your use case.
If you expect all calls to originate from trusted clients, then the upfront schema validation should be implemented so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements, but in general I'd always go for #1.
A few considerations:
For consistency I'd use method #1; for performance, #2. However, when using #2 you have to take into account that rolling back in the case of invalid input may become complicated in the future as the logic changes.
JSON validation should not take that long. In Python you can use ujson for parsing JSON strings, which is an ultra-fast C implementation of the json Python module.
For validation, I use the jsonschema Python module, which makes JSON validation easy.
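A minimal jsonschema sketch (the schema and payload here are made up for illustration):

import json
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "sender": {"type": "string"},
        "receiver": {"type": "string"},
        "money": {"type": "number", "minimum": 0},
    },
    "required": ["sender", "receiver", "money"],
}

payload = json.loads('{"sender": "A", "receiver": "B", "money": 100}')

try:
    validate(instance=payload, schema=schema)
except ValidationError as err:
    print(err.message)  # return a validation error response here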
Another approach:
If you use jsonschema, you can validate the JSON request in steps. I'd perform an initial validation of the most common/important parts of the JSON structure, and validate the remaining parts along the business logic path. This allows you to write simpler and therefore more lightweight JSON schemas.
The final decision:
If (and only if) this decision is critical, I'd implement both solutions, time-profile them under correct and wrong input conditions, and weight the results by the wrong-input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if (1c * CR) + (1w * WR) <= (2c * CR) + (2w * WR):
    choose method 1
else:
    choose method 2
I set up my Rails application twice: one working with MongoDB (Mongoid as the mapper) and the other with MySQL and ActiveRecord. Then I wrote a rake task which inserts some test data into both databases (100,000 entries).
I measured how long each database takes with the Ruby Benchmark module. I did some testing with 100 and 10,000 entries, where MongoDB was always faster than MySQL (taking about 1/3 of the time). The weird thing is that inserting the 100,000 entries takes about three times longer in MongoDB than in MySQL. I have no idea why MongoDB behaves like this. The only thing I know is that the CPU time is much lower than the total time. Is it possible that MongoDB starts some sort of garbage collection while it's inserting the data? At the beginning it's fast, but the more data MongoDB inserts, the slower it gets. Any idea on this?
To get some idea of the read performance of the two databases, I thought about measuring the time from when the database receives a search query to when it returns the result. As I need precise measurements, I don't want to include the time Rails spends processing my query on the way from the controller to the database.
How do I do the measurement directly at the database and not in the Rails controller? Is there any gem/tool which would help me?
Thanks in advance!
EDIT: Updated my question according to my current situation.
If your base goal is to measure database performance at the DB level, I would recommend you get familiar with the benchRun method in MongoDB.
To do the kind of thing you want to do, you can get started with the example on the linked page; here is a variant with explanations:
// skipped dropping the table and reinitializing as I'm assuming you have your test dataset
// your database is called test and collection is foo in this code
ops = [
// this sets up an array of operations benchRun will run
{
// possible operations include find (added in 2.1), findOne, update, insert, delete, etc.
op : "find" ,
// your db.collection
ns : "test.foo" ,
// different operations have different query options - this matches based on _id
// using a random value between 0 and 100 each time
query : { _id : { "#RAND_INT" : [ 0 , 100 ] } }
}
]
for ( x = 1; x<=128; x*=2){
// actual call to benchRun, each time using different number of threads
res = benchRun( { parallel : x , // number of threads to run in parallel
seconds : 5 , // duration of run; can be fractional seconds
ops : ops // array of operations to run (see above)
} )
// res is a json object returned, easiest way to see everything in it:
printjson( res )
print( "threads: " + x + "\t queries/sec: " + res.query )
}
If you put this in a file called testing.js you can run it from mongo shell like this:
> load("testing.js")
{
"note" : "values per second",
"errCount" : NumberLong(0),
"trapped" : "error: not implemented",
"queryLatencyAverageMs" : 69.3567923734754,
"insert" : 0,
"query" : 12839.4,
"update" : 0,
"delete" : 0,
"getmore" : 0,
"command" : 128.4
}
threads: 1 queries/sec: 12839.4
and so on.
I found the reason why MongoDB gets slower when inserting many documents.
Many to many relations are not recommended for over 10,000 documents when using MRI due to the garbage collector taking over 90% of the run time when calling #build or #create. This is due to the large array appending occurring in these operations.
http://mongoid.org/performance.html
Now I would like to know how to measure the query performance of each database. My main concerns are the measurement of the query time and the throughput. This measurement should be made directly at the database, so that nothing can distort the result.
I'm not really a C/C++ person, so I was hoping someone could direct me to the files that contain the main calculations of the game.
I am specifically interested in how things are calculated when deciding whether the player 'wins' or 'loses' (generally speaking) during events like running/standing/etc.
In other words, winning/losing is based on many factors: what are they? What are the formulae?
You didn't reference the source, so I googled DopeWars and found this:
http://dopewars.sourceforge.net/
Looking into the source, serverside.h/serverside.c seem to be what you are looking for. But keep in mind that a lot of the limits are already predefined in dopewars.c. Take a look at the drug prices in this struct:
struct DRUG DefaultDrug[] = {
  /* The names of the default drugs, and the messages displayed when they
   * are specially cheap or expensive */
  {N_("Acid"), 1000, 4400, TRUE, FALSE,
   N_("The market is flooded with cheap home-made acid!")},
  {N_("Cocaine"), 15000, 29000, FALSE, TRUE, ""},
};
Note: the sample struct is not complete. Please review the source to see the full listing.
The actual functionality that validates the actions chosen by the player lives in serverside.c.
It is up to the "server" (game engine) to validate the player's choice and the next step to be taken, and to communicate it back to the client. The client in this case can be a GUI or a curses (command-line) client. It is the client's responsibility to update the screen and to gather new input for the server (be it typed characters or mouse clicks).