Are protobufs sent as binary data over WebSockets faster than JSON sent as text data over WebSockets? On paper this seems to be true, even taking into account the small overhead generated by handling bytes on both sides. Has anyone actually had a chance to try this and obtained concrete results? Thanks!
So I've made a small project to research this, and I've got some results. You can find the project here; more information is in the README and in the results package.
To answer the question: YES, protocol buffers are faster than JSON over 100,000 messages sent as ping-pong (no processing on them other than marshalling and unmarshalling). But the difference is not as notable as I would have expected.
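To see where the size difference comes from, here is a minimal sketch. Real protobuf needs generated message classes and uses varint encoding, so this hand-rolled ByteBuffer layout (the `id`/`price`/`symbol` fields are invented for the example) only illustrates the general point: a binary frame carries just the values, while JSON text repeats the field names in every message.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class WireSizeSketch {
    // Text framing: field names travel with every message.
    static int jsonSize(int id, double price, String symbol) {
        String json = "{\"id\":" + id + ",\"price\":" + price
                + ",\"symbol\":\"" + symbol + "\"}";
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    // Binary framing: just the values, in a fixed layout
    // (a stand-in for a real protobuf encoding).
    static int binarySize(int id, double price, String symbol) {
        byte[] sym = symbol.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 1 + sym.length);
        buf.putInt(id).putDouble(price).put((byte) sym.length).put(sym);
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println("json=" + jsonSize(123456, 99.25, "MSFT")
                + " binary=" + binarySize(123456, 99.25, "MSFT"));
    }
}
```

The gap narrows once you gzip the JSON, which may be part of why the measured difference is smaller than expected.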
Related
There's a similar question about streaming large results, but the answer just points at the docs and no clear conclusion emerges.
I believe that merely treating a full result set as a stream still takes a lot of memory on the JDBC driver side.
I am wondering if there's any clear cut pattern, or best practice, for making it work, especially on the jdbc driver side.
And in particular I am not sure that setFetchSize(Integer.MIN_VALUE) is a very good idea, as it seems far from optimal if it means each row is sent over the wire on its own.
I believe libraries like jOOQ and Slick already take care of that, and I am curious how to accomplish it both with and without them.
Thanks!
I am wondering if there's any clear cut pattern, or best practice, for making it work, especially on the jdbc driver side.
The best practice is not to do synchronous streaming but rather to fetch in moderately sized chunks. However, avoid using OFFSET (also see). If you're doing a batch process, this can be facilitated by first selecting and pushing the data into a temporary table (i.e. turn the original results you want into a table first, and then select chunks from that table; databases are really fast at copying data internally).
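A runnable JDBC example needs a live database, so here is the chunked-fetch pattern (keyset pagination, which avoids OFFSET's rescanning) sketched against an in-memory list standing in for the table. With real JDBC, `fetchChunk` would run something like `SELECT id, payload FROM t WHERE id > ? ORDER BY id LIMIT ?`; the table name and column names here are assumptions for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedFetch {
    // Simulated table of row ids 1..rowCount, already sorted by id.
    // In real code this is a SELECT ... WHERE id > ? ORDER BY id LIMIT ?
    static List<Integer> fetchChunk(int rowCount, int afterId, int limit) {
        List<Integer> chunk = new ArrayList<>();
        for (int id = afterId + 1; id <= rowCount && chunk.size() < limit; id++) {
            chunk.add(id);
        }
        return chunk;
    }

    static int processAll(int rowCount, int chunkSize) {
        int processed = 0;
        int lastId = 0; // keyset cursor: remember the last key seen
        while (true) {
            List<Integer> chunk = fetchChunk(rowCount, lastId, chunkSize);
            if (chunk.isEmpty()) break;        // no more rows
            processed += chunk.size();         // stand-in for real processing
            lastId = chunk.get(chunk.size() - 1);
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(processAll(1000, 128));
    }
}
```

Each chunk is a short, self-contained query, so no connection is pinned for the duration of the whole scan.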
Synchronous streaming in general does not scale (think iterator). It does not scale well for batch processing, and it certainly does not scale for handling many clients. This is why the drivers vary and do so many different things: it is fairly difficult to figure out how many resources to load, because it is a pull model. Async streaming (a push model) would probably help, but unfortunately the JDBC standard does not support async streaming.
You might have noticed that this is one of the reasons why many of the wrappers around JDBC, such as Spring JDBC, do not return Iterators (along with the fact that the resource also needs to be manually cleaned up). Some of the wrappers provide iterators, but really they just turn the results into a list.
The Scala version you link to is rather disturbing given that it's upvoted, considering the stateful nature of managing a ResultSet; it's very un-Scala-like. I'm not sure those folks realize they have to consume the iterator and close the connection/ResultSet properly, which requires a fair amount of imperative programming.
While it may seem inefficient to let the database decide how much to buffer, just remember that most database connections are extremely heavy memory-wise (at least on Postgres they are). So if you take a long time streaming and have many clients, you're going to have to create more connections and put a serious burden on the database. Not to mention that the default buffers have probably been highly optimized (i.e. the result-set size the client ends up with).
Finally, for batch processing, chunks can be processed in parallel, which is obviously more efficient than a synchronous pipeline, and the job can be restarted (without having to rework already-processed data) if a problem occurs.
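Once the data sits in a temp table, independent id ranges can be farmed out to a thread pool; a minimal sketch (the chunk "work" is a stand-in, and the pool size of 4 is an arbitrary assumption):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunks {
    // Stand-in for real work over rows with start <= id < end.
    static int processChunk(int start, int end) {
        return end - start;
    }

    static int processAll(int rows, int chunkSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (int s = 0; s < rows; s += chunkSize) {
            final int start = s, end = Math.min(s + chunkSize, rows);
            results.add(pool.submit(() -> processChunk(start, end)));
        }
        int total = 0;
        // A chunk that fails surfaces here and can be retried alone,
        // without redoing the chunks that already succeeded.
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(10000, 1000));
    }
}
```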
I just discovered vert.x and played a little with it. I love it, however there is one thing that puzzles me: SQL support. Better said, asynchronous SQL support.
I know there is a Postgres mod, and even a MySQL one, but from what I've read on the GitHub page it is still in an early phase.
Also, one should use the event bus to send the query from the requesting verticle to the worker verticle, which would perform the request and then transfer the results back.
However, this doesn't sound like a good idea to me. The communication would be done in text mode, which implies a lot of serialization/deserialization. Plus, the results would need to be read completely and then sent to the requesting verticle, which can sometimes be overkill.
And now for my questions:
What is the best way to use MySQL from a vert.x verticle?
Is using the event bus OK for transferring this much information in text mode? (Or maybe I'm missing something and there is actually a way to send binary data through the event bus.)
So I'm trying to deconstruct the messages passed in server-client interaction in a fairly old Halo game over LAN. I've been conducting tests with Wireshark and large packets, but I am confused as to which type of data I should be analysing. In a chat-message packet that was all 'a' characters, I received this:
fe:fe:00:03:3a:00:11:19:39:1a:28:0d:b9:20:9d:7b:b8:59:52:90:e3:3e:93:7b:b8:59:52:90:e3:3e:93:7b:b8:59:52:90:e3:3e:93: [SNIP]
And in a message with all but the first 3 letters being 'a', I received this:
fe:fe:00:21:64:00:68:8f:02:6d:5f:ab:a7:cb:d0:78:0f:e9:6d:55:89:13:72:7b:b8:59:52:90:e3:3e:93:7b:b8:59:52:90:e3:3e:93: [SNIP]
Now, I can see some similarities between the packets at some stages (probably the a's), and I've come to the conclusion that this:
7b:b8:59:52:90:e3
might be an 'a' character, but I have no way of proving it. How can I turn the strange string above back into a readable character, namely 'a'? Is it possible?
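One cheap sanity check before reaching for a debugger is to count how often the suspected unit actually repeats in the dump. Note that the block that visibly repeats in your first capture is 8 bytes (`7b:b8:59:52:90:e3:3e:93`), not 6, which may suggest each repeat encodes a block of several plaintext characters rather than a single 'a'. A small sketch over the dump from the question:

```java
public class PatternCount {
    // The (truncated) hex dump from the all-'a' chat message.
    static final String DUMP =
            "fe:fe:00:03:3a:00:11:19:39:1a:28:0d:b9:20:9d:"
            + "7b:b8:59:52:90:e3:3e:93:7b:b8:59:52:90:e3:3e:93:"
            + "7b:b8:59:52:90:e3:3e:93:";

    // Count (possibly overlapping) occurrences of needle in haystack.
    static int count(String haystack, String needle) {
        int n = 0;
        for (int i = haystack.indexOf(needle); i >= 0;
                i = haystack.indexOf(needle, i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(DUMP, "7b:b8:59:52:90:e3:3e:93"));
    }
}
```

If the repeat count tracks the number of identical characters (or groups of them) in the message, that supports the guess; if it doesn't, the encoding probably mixes in position or state.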
Thanks for reviewing this question!
Protocol is UDP.
You just have to attach OllyDbg to the process you are trying to understand and set breakpoints at WSARecv (or recv): http://msdn.microsoft.com/de-de/library/windows/desktop/ms741688(v=vs.85).aspx
The next packet you receive will hit your breakpoint; follow the memory pointer to see it in a separate window and step over the call to WSARecv. You should now see a filled buffer. Set a memory breakpoint at the top of the newly arrived data, and if you press play you should get to the decryption function, if the data is encrypted. (It's possible that you will have to reverse a bit more to get to that point.) I hope this is a starting point for you to get into reverse-engineering assembly :)
Maybe my tutorial will help a bit; it's for another game, but I think it shows some of the ideas:
http://blog.praty.net/?p=315
Greetz defragger
Guessing the protocol by looking at network dumps is very inefficient. I recommend decompiling pieces of the game using modern tools such as the Hex-Rays Decompiler, and then combining knowledge of the data structures used in the networking modules with debugging the live app using OllyDbg.
Let's say you have thousands of devices, all sending in data all the time. Would a message queue be a good data-collection tool for this data?
Obviously, it depends:
- Do you need to process every set of data, regardless of how old it is?
- Will data arrive in a steady stream, or be bursty?
- Can a single application process all the data, or will it need to be load balanced?
- Do you have any need for the "messaging" features, such as topics?
- If your devices would be the clients, is there a client implementation that works on them?
If you have a stream of data that can be processed by a single application, and you are tolerant of occasional lost data, I would keep it simple and post the data via REST or equivalent. I would only look at messaging once you need scalability, durability, fault tolerance, or the ability to level out load over time.
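The "level out load" property is easy to demonstrate in-process with a bounded buffer: a bursty producer blocks when the buffer is full (backpressure) while one consumer drains it at its own pace. A broker such as RabbitMQ adds durability and fan-out on top of the same idea; this sketch (queue capacity 100 is an arbitrary assumption) just shows the core mechanism:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadLeveling {
    static int run(int total) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(100); // bounded buffer
        AtomicInteger consumed = new AtomicInteger();

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    queue.take();               // blocks until data arrives
                    consumed.incrementAndGet(); // stand-in for real processing
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();

        // Bursty producer: put() blocks when the buffer is full, applying
        // backpressure instead of dropping data or overwhelming the consumer.
        for (int i = 0; i < total; i++) queue.put(i);

        consumer.join();
        return consumed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1000));
    }
}
```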
You can't go wrong design-wise with employing queues, but as Chris (other responder) stated, it may not be worth your effort in terms of infrastructure as web servers are pretty good at handling reasonable load.
In the "real world" I have seen commercial instruments report status into a queue for processing, so it is certainly a valid solution.
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Very large HTTP request vs many small requests
I need a 2D array (as JSON) to be sent from server to client. It would be around 400x400 in size, with each entry around 4 characters of text. That makes it around 640 KB of data.
Which of the following extreme approaches is better?
I make one large HTTP request for all the data in one go.
I make 400 requests, each asking for a single row (around 1.6 KB).
I believe the optimal approach would be somewhere in the middle. Could anyone give me an idea what the optimal single-request size for this data might be?
Thanks,
When making a request you always have to deal with some overhead (like the DNS lookup and opening and closing the connection), so it might be wiser to make one big request.
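A back-of-envelope comparison makes the point; the 500-byte per-request overhead is an illustrative assumption (headers, and in the worst case a fresh handshake), not a measured figure:

```java
public class RequestOverhead {
    static int totalBytes(int requests, int payloadPerRequest, int overheadPerRequest) {
        return requests * (payloadPerRequest + overheadPerRequest);
    }

    public static void main(String[] args) {
        int payload = 640 * 1024;  // total data in bytes
        int overhead = 500;        // assumed per-request cost (headers etc.)
        System.out.println("1 request:    "
                + totalBytes(1, payload, overhead) + " bytes");
        System.out.println("400 requests: "
                + totalBytes(400, payload / 400, overhead) + " bytes");
    }
}
```

With these assumptions the 400-request variant ships roughly 200 KB of pure overhead, before even counting the extra round-trip latency.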
Also, you might get better gzip/deflate compression with one big request.
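The compression win is easy to check. This sketch builds a JSON-ish 400x400 grid of 4-character entries (the "abcd" filler is an assumption; real data with more variety will compress less) and runs it through DEFLATE, the algorithm behind gzip:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressSketch {
    // Compress input with DEFLATE and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        byte[] out = new byte[8192];
        int total = 0;
        while (!d.finished()) total += d.deflate(out);
        d.end();
        return total;
    }

    public static void main(String[] args) {
        // Build a 400x400 grid of 4-char entries as a JSON array of arrays.
        StringBuilder sb = new StringBuilder("[");
        for (int r = 0; r < 400; r++) {
            if (r > 0) sb.append(',');
            sb.append('[');
            for (int c = 0; c < 400; c++) {
                if (c > 0) sb.append(',');
                sb.append("\"abcd\"");
            }
            sb.append(']');
        }
        sb.append(']');
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length + " bytes raw -> "
                + deflatedSize(raw) + " bytes deflated");
    }
}
```

One request also lets the compressor exploit redundancy across the whole grid, which 400 tiny responses cannot.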
Depends on the application and the effect you wish to achieve. Here are two scenarios:
If you are dealing with a GUI, then perhaps chunking is a good idea: a small chunk can update the visuals, giving the human an illusion of speed. Here you want to chunk the data logically, per the GUI's update requirements. You can apply the same concept to prioritizing in any other pseudo-real-time scenario.
If, on the other hand, you are just dumping this data, then don't chunk, since 100 six-byte requests are significantly more time-consuming overall than one 600-byte request.
Generally speaking, however, packet chunking and delivery at the transport layer (TCP) is FAR more optimized than whatever you could come up with at the application layer (HTTP). Multiple requests/chunks mean multiple fragments.
It is generally futile to try to do transport-layer optimizations using an application-layer protocol, and IMHO it defeats the purpose of both :-)
If you have real time requirements for whatever reason, then you should take control of the transport itself and do optimization there (but that does not seem to be the case).
Happy coding
Definitely go with one request, and if you enable gzip compression on the server you won't be sending anything near 640 KB.
All the page-speed optimisation tools (e.g. YSlow, Google Page Speed) recommend reducing the number of requests to speed up page-load times.
A small number of HTTP requests would be better, so make one request.