When fetching multiple keys, I can see that the client puts the request into one long string and sends it to the connected Couchbase server (the protocol seems to include the vbucket of each key as well).
So, one network call from the client with all the keys and their vbuckets.
How does the server respond to this request?
If the connected server has all of the requested values, then I expect it to simply return them.
However, if there are several servers in the cluster, there is a chance that the connected server does not have a requested key. What does the server do in this situation? Since the request includes the vbucket information, I would guess that the connected server could ask that key's master server for the value, but this is just my guess; I would like to know how the server actually responds in this situation.
Also, what happens if the key exists but the server fails to return the value due to "server busy" or some other error?
Your help is always appreciated.
There are two different ways this can happen, either with moxi or without moxi.
Without Moxi (Smart Client)
When the client makes a connection to Couchbase, it will first get a list of all of the servers in the cluster and the vbucket map. It then makes a connection to each server in the cluster. When you do a multi-operation, the client consults the vbucket map that it holds and figures out which server each key's vbucket belongs to. If we have three servers, then the client will put together up to three multi-operations and send each one to the corresponding server that contains all of the keys in that multi-operation. Each server responds to the client, and the client puts all of the results together into one set of results.
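To make the splitting concrete, here is a rough sketch in C of what a smart client does internally. The vbucket map, server count, and hash function below are made up for illustration; a real SDK hashes the key with CRC32 and gets the 1024-entry map from the cluster configuration.

```c
#include <stdio.h>
#include <stdint.h>

#define NUM_VBUCKETS 1024
#define NUM_SERVERS  3

/* Illustrative vbucket map: vbucket_map[v] is the index of the server
 * that currently owns vbucket v. In reality this comes from the
 * cluster configuration, not from your code. */
static int vbucket_map[NUM_VBUCKETS];

/* Placeholder hash; the real client hashes the key with CRC32 and
 * reduces it to the vbucket count. */
static uint32_t toy_hash(const char *key)
{
    uint32_t h = 2166136261u;                 /* FNV-1a, purely illustrative */
    while (*key)
        h = (h ^ (uint8_t)*key++) * 16777619u;
    return h;
}

int main(void)
{
    const char *keys[] = { "user:1", "user:2", "cart:9", "session:42" };
    int n = sizeof(keys) / sizeof(keys[0]);

    /* Pretend the vbuckets are striped across the servers. */
    for (int v = 0; v < NUM_VBUCKETS; v++)
        vbucket_map[v] = v % NUM_SERVERS;

    /* Group the multi-get into one batch per server: one network round
     * trip per server that owns at least one of the requested keys. */
    for (int s = 0; s < NUM_SERVERS; s++) {
        printf("batch for server %d:", s);
        for (int i = 0; i < n; i++) {
            uint32_t vb = toy_hash(keys[i]) % NUM_VBUCKETS;
            if (vbucket_map[vb] == s)
                printf(" %s(vb=%u)", keys[i], (unsigned)vb);
        }
        printf("\n");
    }
    return 0;
}
```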
With Moxi
In this case the client doesn't know about the cluster or the vbucket map, but moxi does. The client will send all keys to moxi and then moxi will take care of splitting them up and sending them to the appropriate servers.
Server Down Scenario:
If a server is down or busy, then all keys in that server's multi-operation will fail. The client should return the keys that it could get from the other servers and alert you of the error.
Rebalancing Scenario:
During a rebalance there is a small chance that a request will be sent to the wrong server. In this case the client should retry the operation on the correct server. During rebalance each client should receive a "fast-forward" vbucket map that says where all of the vbuckets will be after the rebalance. It will use the server in this vbucket map for the retry.
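For what it's worth, the retry looks roughly like this in C. send_get(), current_map, and fast_forward_map are hypothetical stand-ins for whatever your client library provides; the only real constant here is the memcached binary protocol's "not my vbucket" status (0x0007), which is what triggers the retry against the fast-forward map.

```c
#include <stdint.h>

/* Sketch of the retry path during a rebalance. The maps and send_get()
 * are hypothetical; 0x0007 is the memcached binary protocol status for
 * "vbucket belongs to another server". */
#define STATUS_OK             0x0000
#define STATUS_NOT_MY_VBUCKET 0x0007

extern int current_map[];       /* vbucket -> server, pre-rebalance   */
extern int fast_forward_map[];  /* vbucket -> server, post-rebalance  */
extern int send_get(int server, const char *key, uint16_t vbucket);

int get_with_retry(const char *key, uint16_t vbucket)
{
    int status = send_get(current_map[vbucket], key, vbucket);
    if (status == STATUS_NOT_MY_VBUCKET) {
        /* The vbucket has already moved; retry against the server the
         * fast-forward map says will own it after the rebalance. */
        status = send_get(fast_forward_map[vbucket], key, vbucket);
    }
    return status;
}
```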
Related
I was wondering if I could see any example source code (Language: C) of a client that uses client-side Moxi.
I've seen the architecture, but I have no idea how to write it in code.
Also, from the get_callback function, if I need to return the CAS value and the data received, is there a suggested way to do this?
And what is this vbucket map thing? What does it represent, and how do I configure it?
Client-side moxi means that you set up a moxi server on your client machine and then just tell the client to connect to moxi on localhost. This means that if moxi is running on localhost port 11211, you tell your client to connect to localhost port 11211 and moxi will handle communication with the server. You don't need to write any special code to do this.
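For example (a minimal sketch, assuming moxi is listening on localhost:11211 and that you use libmemcached; any plain memcached client works the same way):

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libmemcached/memcached.h>

int main(void)
{
    /* Talk to the local moxi exactly as if it were a memcached server;
     * moxi forwards the operations to the Couchbase cluster. */
    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "localhost", 11211);

    memcached_return_t rc = memcached_set(memc, "greeting", strlen("greeting"),
                                          "hello", strlen("hello"),
                                          (time_t)0, (uint32_t)0);
    if (rc != MEMCACHED_SUCCESS)
        fprintf(stderr, "set failed: %s\n", memcached_strerror(memc, rc));

    size_t value_len;
    uint32_t flags;
    char *value = memcached_get(memc, "greeting", strlen("greeting"),
                                &value_len, &flags, &rc);
    if (value) {
        printf("got: %.*s\n", (int)value_len, value);
        free(value);
    }

    memcached_free(memc);
    return 0;
}
```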
Also, from the get_callback function, if I need to return the CAS value and the data received, is there a suggested way to do this?
I'm not very familiar with the C API, but there is probably a gets-style call that returns the CAS ID in the callback.
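The usual way to get values out of a C callback is to pass a pointer to your own result struct as the operation cookie and copy the CAS and data into it. The sketch below follows the libcouchbase 2.x style (lcb_set_get_callback / lcb_get_resp_t) as far as I recall it; check your header for the exact signature before relying on it.

```c
#include <stdlib.h>
#include <string.h>
#include <libcouchbase/couchbase.h>   /* assuming a libcouchbase 2.x-style API */

/* Result slot owned by the caller; a pointer to it is passed as the
 * operation "cookie" so the callback has somewhere to copy CAS + data. */
struct get_result {
    lcb_cas_t   cas;
    char       *data;
    size_t      ndata;
    lcb_error_t status;
};

/* Registered with: lcb_set_get_callback(instance, get_callback);
 * Signature and field names follow libcouchbase 2.x as far as I recall;
 * check <libcouchbase/couchbase.h> for the exact definitions you have. */
static void get_callback(lcb_t instance, const void *cookie,
                         lcb_error_t error, const lcb_get_resp_t *resp)
{
    struct get_result *out = (struct get_result *)cookie;
    out->status = error;
    if (error == LCB_SUCCESS) {
        out->cas   = resp->v.v0.cas;
        out->ndata = resp->v.v0.nbytes;
        /* Copy the value: the response buffer is only valid inside the
         * callback, so hand the caller its own allocation. */
        out->data = malloc(out->ndata);
        memcpy(out->data, resp->v.v0.bytes, out->ndata);
    }
    (void)instance;
}
```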
And what is this vbucket map thing? What does it represent, and how do I configure it?
A vbucket map is a map of vbuckets to servers. In Couchbase Server there are 1024 vbuckets that your data can hash into. Vbuckets are spread around the cluster, and the map tells the client which server to send a request to. With that said, you shouldn't ever touch the vbucket map in your code. The map is obtained from the cluster and managed by either the client-side SDK or, in your case, moxi.
...or any other type of realtime data feed from server to client. I'm talking about a bunch of realtime data pushed from the server to the client, i.e., an informational update every second.
Does the server magically push the data to the client, or does the client need to continuously poll the server for updates? And what protocol does this usually run over (HTTP, raw socket communication, etc.)?
In server-side financial applications used by brokers, banks, etc., market data (quotes, trades, and so on) is transmitted over TCP via some application-level protocol, which most probably won't be HTTP. Of course, there's no polling: the client establishes a TCP connection with the server, and the server pushes data to the client. One common approach to distributing market data is FIX.
Thomson Reuters has a bunch of cryptic proprietary protocols, dating from mainframe days, for distributing such data.
HTTP can be used with SOAP/RESTful services to transmit or request data of not-so-large volume, like business news.
UPDATE: Actually, even FIX is not enough in some cases, as it has a big overhead because of its "text" nature. Most brokers and exchanges transmit high-volume streams, such as quotes, using binary-format protocols (FAST or something proprietary).
In a simple case (a minimal C sketch follows these steps):
Create a server with a listening socket.
On the client, connect to the server's socket.
Have the client do a while(data = recv(socket)) (pseudocode)
When the server has something exciting to tell the client, it simply send(...)s on the socket.
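Here is roughly what the client side of those steps looks like with plain POSIX sockets; the address, port, and buffer size are placeholders, and error handling is trimmed.

```c
/* Minimal client side of the push pattern above (POSIX sockets). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(9000);                 /* placeholder port */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        perror("connect");
        return 1;
    }

    /* The "while (data = recv(socket))" step: block until the server
     * pushes something, then handle it; no polling involved. */
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        fwrite(buf, 1, (size_t)n, stdout);         /* handle the update */
    }

    close(fd);
    return 0;
}
```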
You can even implement this pattern over HTTP (there is no real upper time limit on an HTTP socket). The server need not even read from the socket; it can simply keep writing to the firehose.
Usually a TCP socket is employed: messages arrive in order and delivery is guaranteed. If latency matters more and dropped or out-of-order messages are not an issue, UDP can be used.
I have an app that consists of a "Manager" and a "Worker". Currently, the worker always initiates the connection, says something to the manager, and the manager sends the response.
Since there is a LOT of communication between the Manager and the Worker, I'm considering keeping a socket open between the two and doing the communication over it. I'm also hoping to initiate the interaction from both sides - enabling the manager to say something to the worker whenever it wants.
However, I'm a little confused as to how to deal with "collisions". Say, the manager decides to say something to the worker, and at the same time the worker decides to say something to the manager. What will happen? How should such situation be handled?
P.S. I plan to use Netty for the actual implementation.
"I'm also hoping to initiate the interaction from both sides - enabling the manager to say something to the worker whenever it wants."
Simple answer. Don't.
Learn from existing protocols: have a client and a server, and things will work out nicely. The Worker can be the server and the Manager can be the client. The Manager can make numerous requests; the Worker responds to the requests as they arrive.
Peer-to-peer can be complex, with no real value for the added complexity.
I'd go for a persistent bi-directional channel between server and client.
If all you'll have is one server and one client, then there's no collision issue: if the server accepts a connection, it knows it's the client, and vice versa. Both can read and write on the same socket.
Now, if you have multiple clients and your server needs to send a request specifically to client X, then you need handshaking!
When a client boots, it connects to the server. Once this connection is established, the client identifies itself as being client X (the handshake message). The server now knows it has a socket open to client X and every time it needs to send a message to client X, it reuses that socket.
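As a bare-bones illustration of that handshake idea in plain C (the Netty tutorial below does the real work), the sketch treats the first line a client sends as its ID and keeps the socket in a table keyed by that ID so the server can push to client X later. The port, table size, and message format are arbitrary.

```c
/* Bare-bones handshake sketch (POSIX sockets, single-threaded, error
 * handling trimmed). The first line each client sends is its ID; the
 * server keeps the socket keyed by that ID so it can later push
 * messages to a specific client. */
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MAX_CLIENTS 16

struct client { char id[32]; int fd; };
static struct client clients[MAX_CLIENTS];
static int nclients = 0;

/* Reuse the already-open socket for client `id`, if we have one. */
static int push_to(const char *id, const char *msg)
{
    for (int i = 0; i < nclients; i++)
        if (strcmp(clients[i].id, id) == 0)
            return (int)send(clients[i].fd, msg, strlen(msg), 0);
    return -1;  /* unknown client */
}

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(9001),   /* placeholder */
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 8);

    while (nclients < MAX_CLIENTS) {
        int cfd = accept(lfd, NULL, NULL);

        /* Handshake: the client identifies itself first. */
        char id[32] = {0};
        ssize_t n = recv(cfd, id, sizeof(id) - 1, 0);
        if (n <= 0) { close(cfd); continue; }
        id[strcspn(id, "\r\n")] = '\0';

        strncpy(clients[nclients].id, id, sizeof(clients[nclients].id) - 1);
        clients[nclients].fd = cfd;
        nclients++;

        push_to(id, "welcome\n");  /* server-initiated message to client X */
    }
    return 0;
}
```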
Lucky you, I've just written a tutorial (sample project included) on this precise problem. Using Netty! :)
Here's the link: http://bruno.linker45.eu/2010/07/15/handshaking-tutorial-with-netty/
Notice that in this solution, the server does not attempt to connect to the client. It's always the client who connects to the server.
If you were thinking about opening a socket every time you wanted to send a message, reconsider: persistent connections avoid the overhead of connection establishment and consequently speed up data transfer considerably.
I think you need to read up on sockets...
You don't really get these kinds of problems. The main thing to work out is how to responsively handle both receiving and sending; generally this is done by threading your communications, and depending on the app you can take a number of approaches to this.
The correct link to the Handshake/Netty tutorial mentioned in brunodecarvalho's response is http://bruno.factor45.org/blag/2010/07/15/handshaking-tutorial-with-netty/
I would add this as a comment to his question but I don't have the minimum required reputation to do so.
If you feel like reinventing the wheel and don't want to use middleware...
Design your protocol so that the other peer's answers to your requests are always easily distinguishable from requests initiated by the other peer. Then choose your network I/O strategy carefully. Whatever code is responsible for reading from the socket must first determine whether the incoming data is a response to something you sent or a new request from the peer (by looking at the data's header and at whether you've issued a request recently). You also need to maintain proper queueing so that the responses you send to the peer's requests are kept separate from the new requests you issue.
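One way to make that distinction cheap is a small fixed header on every message with a type and a correlation ID; the layout below is just an illustration, not a standard.

```c
/* Illustrative wire header: every message starts with this, so the
 * reader can tell a reply to its own request apart from a brand-new
 * request coming from the peer. Layout and fields are only an example;
 * __attribute__((packed)) is a GCC/Clang extension. */
#include <stdint.h>

enum msg_type {
    MSG_REQUEST  = 1,   /* peer should answer with MSG_RESPONSE         */
    MSG_RESPONSE = 2    /* carries the correlation_id of the request    */
};

struct msg_header {
    uint8_t  type;            /* enum msg_type                          */
    uint32_t correlation_id;  /* echoes the request's id in a response  */
    uint32_t body_length;     /* bytes that follow this header          */
} __attribute__((packed));

/* Reader side, in pseudocode (complete_pending/handle_peer_request are
 * hypothetical):
 *   if (hdr.type == MSG_RESPONSE) complete_pending(hdr.correlation_id, body);
 *   else                          handle_peer_request(&hdr, body);          */
```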
I have a client/server connection over a TCP socket, with the server writing to the client as fast as it can.
Looking at my network activity, the production client receives data at around 2.5 Mb/s.
A new lightweight client that I wrote just to read and benchmark the rate achieves about 5.0 Mb/s (which is probably around the maximum speed at which the server can transmit).
I was wondering what governs the rates here, since the client sends no data to the server to tell it about any rate limits.
In TCP it is the client. If the server's TCP window is full, it needs to wait until more ACKs arrive from the client. This is hidden from you inside the TCP stack, but TCP provides guaranteed delivery, which also means that the server can't send data faster than the rate at which the client is processing it.
TCP has flow control and it happens automatically. Read about it at http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control
When the pipe fills up due to flow control, the server's socket write operations won't complete until the flow control is relieved.
The server is writing data at 5.0 Mb/s, but if your client is the bottleneck here, then the server has to wait until the data in its send buffer has actually been sent to the client, or until enough space is freed up to put in more data.
As you said, the lightweight client was able to receive at 5.0 Mb/s, so it is the post-receive processing in your production client that you have to check. If you are receiving data and then processing it before you read more data, this could be the bottleneck.
It is better to receive data asynchronously: as soon as one receive completes, ask the socket to start receiving again, while you process the received data on a separate thread-pool thread. This way your client is always available to receive incoming data, and the server can send at full speed.
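A rough sketch of that receive/process split using POSIX threads is below; a single hand-off slot stands in for a real queue, conn_fd is assumed to be an already-connected socket, and error handling and shutdown are trimmed.

```c
/* Sketch of "keep the socket drained, process elsewhere". */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static char  *pending     = NULL;   /* one buffer waiting to be processed */
static size_t pending_len = 0;

static void *process_loop(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending == NULL)
            pthread_cond_wait(&cond, &lock);
        char *buf = pending; size_t len = pending_len;
        pending = NULL;                     /* slot free: reader may refill */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);

        /* ...expensive processing happens here, off the receive path... */
        printf("processed %zu bytes\n", len);
        free(buf);
    }
    return NULL;
}

void receive_loop(int conn_fd)
{
    pthread_t worker;
    pthread_create(&worker, NULL, process_loop, NULL);

    char buf[64 * 1024];
    ssize_t n;
    while ((n = recv(conn_fd, buf, sizeof(buf), 0)) > 0) {
        char *copy = malloc((size_t)n);
        memcpy(copy, buf, (size_t)n);

        pthread_mutex_lock(&lock);
        while (pending != NULL)             /* worker still holds last one  */
            pthread_cond_wait(&cond, &lock);
        pending = copy; pending_len = (size_t)n;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);        /* back to recv() immediately   */
    }
}
```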
Situation: The server calls listen() (but not accept()!). The client sends a SYN to the server. The server gets the SYN, and then sends a SYN/ACK back to the client. However, the client now hangs up / dies, so it never sends an ACK back to the server. The connection is in the SYN_SENT state.
Now another client sends a SYN, gets a SYN/ACK back from the server, and sends back an ACK. This connection is now in the ESTABLISHED state.
Now the server finally calls accept(). What happens? Does accept() block on the first, faulty connection, until some kind of timeout occurs? Does it check the queue for any ESTABLISHED connections and return those, first?
Well, what you're describing here is a typical syn-flood attack ( http://en.wikipedia.org/wiki/SYN_flood ) when executed more than once.
Looking, for example, at http://lkml.indiana.edu/hypermail/linux/kernel/0307.0/1258.html, there are two separate queues: a SYN queue and an established queue. The first connection will remain in the SYN queue (since it's in the SYN_RCVD state), while the second connection will be in the established queue, which is where accept() will get it from. A netstat should still show the first one in the SYN_RCVD state.
Note: see also my comment, it is the client who will be in the SYN_SENT state, the server (which we are discussing) will be in the SYN_RCVD state.
You should note that in some implementations, the half open connection (the one in the SYN_RCVD state), may not even be recorded on the server. Implementations may use SYN cookies, in which they encode all of the information they need to complete establishing the connection into the sequence number of the SYN+ACK packet. When the ACK packet is returned, with the sequence number incremented, they can decrement it and get the information back. This can help protect against SYN floods by not allocating any resources on the server for these half-open connections; thus no matter how many extra SYN packets a client sends, the server will not run out of resources.
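To make the "encode it all in the sequence number" idea concrete, here is a rough illustration only: the bit layout and the hash below are invented for clarity, while real stacks (e.g. Linux) pack a coarse timestamp, an encoded MSS, and a keyed cryptographic hash of the connection's addresses and ports into the 32-bit initial sequence number.

```c
/* Rough illustration of the SYN-cookie idea: squeeze everything the
 * server needs into the 32-bit ISN it sends in the SYN+ACK, so no
 * per-connection state is kept for half-open connections. */
#include <stdint.h>

static uint32_t toy_hash(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport, uint32_t t)
{
    /* NOT cryptographic; stand-in only. */
    return (saddr * 2654435761u) ^ (daddr * 40503u) ^
           ((uint32_t)sport << 16 | dport) ^ (t * 2246822519u);
}

uint32_t make_syn_cookie(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport,
                         uint32_t minute_counter, uint8_t mss_index)
{
    uint32_t h = toy_hash(saddr, daddr, sport, dport, minute_counter);
    return ((minute_counter & 0x1f) << 27) |    /* 5 bits of coarse time  */
           (((uint32_t)mss_index & 0x7) << 24) |/* 3 bits: which MSS      */
           (h & 0x00ffffff);                    /* 24 bits of hash        */
}

/* When the final ACK arrives, the server recomputes the hash from the
 * packet's addresses and ports, checks it against (ack_seq - 1), and if
 * it matches it can rebuild the connection without having stored it. */
```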
Note that SCTP implements a 4-way handshake, with cookies built into the protocol, to protect against SYN floods while allowing more information to be stored in the cookie, and thus not having to limit the protocol features supported because the size of the cookie is too small (in TCP, you only get 32 bits of sequence number to store all of the information).
So to answer your question, the user-space accept() will only ever see fully established connections, and will have no notion of the half-open connections that are purely an implementation detail of the TCP stack.
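That user-space view is just the ordinary listen()/accept() loop; a minimal sketch (arbitrary port, error handling trimmed):

```c
/* Minimal user-space view of the discussion above: accept() only ever
 * returns connections that have completed the three-way handshake;
 * half-open entries never reach this code. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(9002);            /* arbitrary port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));

    listen(lfd, 128);  /* backlog bounds the queue of completed connections */

    for (;;) {
        /* Blocks until an ESTABLISHED connection is waiting; it does not
         * block on, or even see, a peer stuck in SYN_RCVD. */
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0) { perror("accept"); continue; }

        const char msg[] = "hello\n";
        send(cfd, msg, sizeof(msg) - 1, 0);
        close(cfd);
    }
}
```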
You have to remember that listen(), accept(), et al, are not under the hood protocol debugging tools. From the accept man page: "accept - accept a connection on a socket". Incomplete connections aren't reported, nor should they be. The application doesn't need to worry about setup and teardown of sockets, or retransmissions, or fragment reassembly, or ...
If you are writing a network application, covering the things that you should be concerned about is more than enough work. If you have a working application but are trying to figure out problems then use a nice network debugging tool, tools for inspecting the state of your OS, etc. Do NOT try to put this in your applications.
If you're trying to write a debugging tool, then you can't accomplish what you want by using application level TCP/IP calls. You'll need to drop down at least one level.