Which is better: WebSocket-Node or ws? And is there a standard interface for Node.js websockets?

I want to move away from socket.io to regular websockets to take advantage of the binary data transfers and get rid of the base64 encoding.
There seem to be two main WebSocket libraries for Node.js, both on GitHub:
Worlize/WebSocket-Node
einaros/ws
Both seem to be getting regular updates, and both claim to support the RFC 6455 standard.
Does anyone have experience with either or both of these libraries and can share impressions or recommendations? Or does anyone know where I can find a recent comparison of them?
Further, are there any plans for an official server-side WebSocket interface standard? These two libraries seem to have different APIs. I did find this, but it is clearly for the client side only, and it is significantly newer than the RFC itself.
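For reference, the client-side standard covers usage along these lines (a browser-side sketch; it says nothing about how a server should expose connections):

    // Standard browser WebSocket API: binary frames without base64.
    var ws = new WebSocket('ws://localhost:8080');
    ws.binaryType = 'arraybuffer';
    ws.onopen = function () { ws.send(new Uint8Array([1, 2, 3])); };
    ws.onmessage = function (event) { console.log('received', event.data); };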
I have been looking through every variation of Google search I can think of, plus many related Stack Overflow questions, but none seem to answer my question, and even the top Google results on the subject are several years out of date. Some related but insufficient Stack Overflow threads include:
which-websocket-library-to-use-with-node-js
are-websockets-really-meant-to-be-handled-by-web-servers
web-sockets-server-side-implementation-for-nodejs

einaros/ws works great. However, Websocket-Node comes with routing support, which is quite handy for non-trivial implementations.
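For concreteness, a minimal binary echo server with einaros/ws looks something like this (a sketch, assuming ws is installed via npm):

    // Minimal binary echo server using einaros/ws.
    var WebSocket = require('ws');
    var wss = new WebSocket.Server({ port: 8080 });

    wss.on('connection', function (socket) {
      socket.on('message', function (data) {
        // Binary frames arrive as Node Buffers -- no base64 round-trip needed.
        socket.send(data, { binary: true });
      });
    });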

Headlight status with OBD-II

I'm developing an Android app that uses an ELM327 device for OBD-II communications, and I'd like to be able to get the status of the headlights. Specifically, I would like to know if the driver has turned them on or not, but it would also be moderately useful to be able to tell what lights are on (mains vs brights vs DRLs and so on) and whether or not any of the bulbs are out. I was under the impression that there were ways of figuring out whether the headlights were on over OBD-II, but I can't find anything to confirm that, and the API I'm using (the pires obd-java-api on GitHub) doesn't have anything in it either. Can I actually do any of this?
All of the standardized OBD PIDs are defined in the ISO 15031-5 standard; part of the list can be found on Wikipedia and here as well. All other PIDs are vehicle-specific, and you cannot generalize (or often even find) them.
The PIDs you are searching for are likely not standardized, and even where they exist they may not be supported by every vehicle.
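For contrast, standardized PIDs have a well-defined request/response format. Engine RPM (mode 01, PID 0x0C) is a typical example; here is an illustrative decoding sketch in plain Java (not tied to any particular OBD library):

    // An ELM327 reply to the request "010C" looks like "41 0C 1A F8".
    // Per ISO 15031-5, rpm = (256 * A + B) / 4 for the two data bytes A and B.
    static int parseRpm(String response) {
        String[] parts = response.trim().split("\\s+");
        int a = Integer.parseInt(parts[2], 16); // parts[0] = "41" (reply), parts[1] = "0C" (PID)
        int b = Integer.parseInt(parts[3], 16);
        return (256 * a + b) / 4;
    }
    // parseRpm("41 0C 1A F8") == 1726

There is no comparable standardized PID for headlight state, which is why a generic library like obd-java-api has nothing for it.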

MySQL vs NoSQL - Social network comments and notifications: data structure and implementation

I am finding it really tough to figure out how a social networking site (Facebook being the reference) manages comments and notifications for its users.
How would they actually store the comments data? And how would a notification be stored and sent to all the relevant users? An example scenario: a friend comments on my status, and everyone who has liked my status, including me, gets a notification. Each user also has their own read/unread state, so I guess a notification reference is stored for each user. But then there would be a lot of redundancy in the notification information. If we use a separate table/collection to store these with a reference to the actual notification, that would create real-time scalability issues. So how do you decide on that trade-off? My brain crashes when I think about all this: too much stuff to figure out, with not a lot of help available on the web.
And how would each notification be delivered to all the users who are supposed to receive it, and what would the data structure look like?
I have read about a lot of implementations that suggest using MySQL. My understanding was that, given the kind (and size) of data involved, it would be better to use a NoSQL store for scalability.
So how does MySQL work well for such use cases, and why is a NoSQL store like Mongo not suggested anywhere for such an implementation, when these systems are meant to be heavily scalable?
I know, that is a lot of questions in one. But I am not looking for a complete answer here; insights on particular points would also be a great help for building my own application.
The question is extremely broad, but I'll try to answer it to the best of my ability.
How would they actually store the comments data? And how would a notification be stored and sent to all the relevant users?
I generally don't like answering questions like this because it appears as if you did very little research before coming to SO. It also seems like you're confusing application and database roles. I'll at least start you off with some material/ideas and let you decide on your own.
There is no "silver bullet" for a backend design, especially when it comes to databases. SQL databases are generally very good at most database functionality, and rightfully so; it's a mature technology that has stood the test of time for a reason. Most NoSQL solutions are specialized for particular purposes. For instance: if you were logging a lot of information, you might want to look at Cassandra. If you were dealing with highly connected data, you would want something like Neo4j (or PostgreSQL/MySQL on the relational side). If you were dealing with a lot of real-time data, you might want to look at Redis.
It's dumb to ask NoSQL vs SQL, for a few reasons:
NoSQL is a bad term in general. It doesn't mean "no SQL"; it means "not only SQL". Unfortunately, the term has come to cover even the most polar opposites of databases.
Only you know your application's full functionality. Even if I knew the basics of what you wanted to achieve, I still couldn't give you a definitive answer. Nor can anyone else. It's highly subjective, and again, only YOU know EXACTLY what your application should do.
The biggest reason: It's 2014. Why one database? Ten years ago "DatabaseX vs DatabaseY" would have been a practical question. Now, you can configure many application frameworks to reliably use multiple databases in a matter of minutes. Moral of the story: Use each database for its specialized purpose. More on polyglot persistence here.
As far as Facebook goes: a five-minute Google search reveals what backend technologies they've used in the past, and it's not that difficult to research some of their current backend solutions. You're not Facebook. You don't need to prepare for a billion users right now. Start with simple, proven technologies; these will let you scale your application naturally. When those technologies start to become a bottleneck, then worry about scalability.
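To make the read/unread redundancy question concrete: one common relational shape stores the notification payload once and gives each recipient only a lightweight pointer row. A sketch (table and column names are illustrative, not any site's actual design):

    -- One row per notification event; the payload is stored once.
    CREATE TABLE notification_event (
      id         BIGINT AUTO_INCREMENT PRIMARY KEY,
      actor_id   BIGINT      NOT NULL,  -- who commented
      verb       VARCHAR(32) NOT NULL,  -- 'comment', 'like', ...
      object_id  BIGINT      NOT NULL,  -- the status acted on
      created_at DATETIME    NOT NULL
    );

    -- One small row per recipient: per-user read state, no payload duplication.
    CREATE TABLE notification_recipient (
      event_id BIGINT   NOT NULL,
      user_id  BIGINT   NOT NULL,
      read_at  DATETIME NULL,
      PRIMARY KEY (event_id, user_id)
    );

The fan-out (deciding who gets a notification_recipient row) is application logic, and that is the part that dominates scalability, whichever database you pick.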
I hope this helped you get started on your coding journey, but please use Stack Overflow as a last resort when you're having trouble with code, not as an immediate go-to.

MySQL: which API to use?

I'm just getting started with interfacing to MySQL from a C++ app. The app is pretty simple: it's a Linux web server, and the C++ code retrieves JavaScript from a local database to return to the client via Apache and Ajax. The database will contain no more than a few thousand short JavaScript programs.
Question: any advice on which API I should use? I'm just reading through the docs on dev.mysql.com, and there doesn't seem to be any good reason to choose one of libmysql, Connector/C, Connector/C++, MySQL++, or Connector/ODBC over the others. Thanks.
With no more than a few thousand rows, chances are you should pick your API based on your language preferences, not the other way round - so go ahead and choose whatever fits your mood.
If your app's performance stands or falls on the performance differences between the MySQL connectors, you should be quite busy fixing your design elsewhere.
I personally prefer portability, so I tend to use ODBC a lot, accepting the small performance hit, but others may think differently. If you never, ever want to use a different RDBMS, stay away from ODBC - without the portability benefit it's quite ugly.
I would just use the raw C API. Seems to be the simplest way with the least overhead.
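To give a sense of what the raw C API looks like, here is a minimal sketch (error handling omitted; host, credentials, and schema are placeholders):

    #include <mysql.h>
    #include <stdio.h>

    int main(void) {
        /* Connect, run one query, print the first column of the first row. */
        MYSQL *conn = mysql_init(NULL);
        mysql_real_connect(conn, "localhost", "user", "pass", "mydb", 0, NULL, 0);

        mysql_query(conn, "SELECT script FROM programs WHERE id = 42");
        MYSQL_RES *res = mysql_store_result(conn);
        MYSQL_ROW row = mysql_fetch_row(res);
        if (row != NULL)
            printf("%s\n", row[0]);

        mysql_free_result(res);
        mysql_close(conn);
        return 0;
    }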

First write code using the API, then the actual API - does this approach have a name, and is it valid for the API design process?

The standard way of working on a new API (library, class, whatever) usually looks like this:
you think about what methods the API user will need
you implement the API you suspect the user will need
So basically you are trying to guess what your API should look like. This very often leads to over-engineered stuff: huge APIs that you think the user will need, where it is very possible that a great part of your code won't be used at all.
Some time ago, maybe even a few years ago, I read an article that promoted writing the client code first. I don't remember where I found it, but the author pointed out several advantages, like a better understanding of how the API will be used, what it should provide, and what is basically unnecessary. I think the idea was that it goes along with the Scrum methodology and user stories, but at the implementation level.
Just out of curiosity, for my latest private project I started not with the actual API (some kind of toolkit library) but with the client code that would use it. Of course my code is all in red, because the classes, methods and properties do not exist, and I can forget about help from IntelliSense; but what I noticed is that after a few days of coding my application "has" all the basic functionality, and my library API "is" a lot smaller than I imagined when starting the project.
I am not saying that if somebody took my library and started using it, it wouldn't lack some features, but I think this approach helped me realize that my original idea of the API was somewhat flawed: I usually try to cover all the bases and provide methods "just in case". And sometimes that bites me badly, because I make some stupid mistake in the basic functions while being more focused on code that somebody might possibly need.
So what I would like to ask: have you ever tried this approach when you needed to create a new API, and did it help you? Is it a recognized technique with a name?
So basically you're trying to guess what your API should look like.
And that's the biggest problem with designing anything this way: there should be no (well, minimal) guesswork in software design. Designing an API based on assumptions rather than actual information is dangerous, for several reasons:
It's directly counter to the principle of YAGNI: in order to get anything done, you have to assume what the user is going to need, with no information to back up those assumptions.
When you're done, and you finally get around to using your API, you'll invariably find that it sucks to use (poor user experience), because you weren't thinking about how the library is used (UX), you were thinking about what the library must do (features).
An API, by definition, is an interface for users (i.e., developers). Designing it as anything else just makes for a bad design, without fail.
Writing sample code is like designing a GUI before writing the backend: a Good Thing. It forces you to think about user experience and practical effects of design decisions without getting bogged down in useless theorising and assumption.
And contrary to Gabriel's answer, this is not bottom-up design: it's top-down. Rather than design the concrete backend of your library and then force an abstract interface on top of it, you first design the interface and then worry about the implementation.
Generally speaking, the idea of designing the concrete implementation first and abstracting from it afterwards is called bottom-up design. Test-Driven Development uses a principle similar to the one you describe to support better design: first you write a test, which is a use of the code you are going to write afterwards. It is important to proceed stepwise, because you have to prove that the API is implementable. An important part of each step is refactoring, which allows you to design a more concise API and to reuse parts of your code.
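As a small illustration of that test-first loop (all names are hypothetical; JUnit 4 shown):

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class ChartTest {
        @Test
        public void addedSeriesIsCounted() {
            // Chart does not exist yet; the compiler errors below become
            // the to-do list that drives the shape of the API.
            Chart chart = new Chart(400, 300);
            chart.addSeries("visits", new int[] { 3, 7, 4 });
            assertEquals(1, chart.seriesCount());
        }
    }

Only once the test compiles and fails do you write the smallest Chart that makes it pass, and then refactor.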

Reliably extracting identity fields from scanned documents / images?

I have to pull two pre-printed (not hand-written) fields out of a paper form, such that it can be automatically routed after being scanned. The fields contain batch and item identifiers, like "GG-9192" or "EPN/245G".
I've tried the following software:
Tesseract-OCR
Cuneiform
Canon ImageRunner built-in OCR
Asprise OCR Java API (demo)
I've tried the following settings:
Scanning at resolutions of 300dpi and 600dpi
Trying different fonts, including OCR-A and OCR-B.
In all cases output was pretty much all over the place. I can kick back documents for which I can't properly extract the necessary information, but I'm thinking it's going to be at least half of them. I considered some sort of fuzzy logic based on known values in a database, but sometimes these identifiers can differ by a single character, like "123G" and "123C".
Is this a lost cause? Perhaps OCR just isn't mature enough to handle a requirement of this nature? What other techniques might you recommend? Barcodes?
Edit: the containing application is in Java, so recommendations for which there are free or cheap Java-based APIs would help.
Edit 2: if anyone is interested... without any special tuning, Cuneiform for Linux and the Canon ImageRunner worked best, with Tesseract-OCR and the Asprise Java API producing the worst results... none of the four was acceptable for anything but standard document-search-grade OCR. I'm beginning to think this isn't going to work out.
If you have control over the fields, why use a human-readable format in the first place? For scanning, it seems like a QR code or something similar would be best: it is marked for orientation and has built-in error correction.
http://en.wikipedia.org/wiki/QR_Code
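Since the containing application is in Java, the open-source ZXing library is a natural fit for both generating and reading such codes. A minimal generation sketch (identifier, file name, and size are placeholders; uses ZXing's core and javase modules):

    import com.google.zxing.BarcodeFormat;
    import com.google.zxing.client.j2se.MatrixToImageWriter;
    import com.google.zxing.common.BitMatrix;
    import com.google.zxing.qrcode.QRCodeWriter;
    import java.nio.file.Paths;

    public class MakeLabel {
        public static void main(String[] args) throws Exception {
            // Encode the batch/item identifier directly; QR error correction
            // removes the "123G" vs "123C" single-character ambiguity.
            BitMatrix matrix = new QRCodeWriter()
                    .encode("GG-9192", BarcodeFormat.QR_CODE, 200, 200);
            MatrixToImageWriter.writeToPath(matrix, "PNG", Paths.get("label.png"));
        }
    }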
I started digging into products, starting with Tomato's suggestion. I tried ABBYY and CVISION. Both have products that can automate OCR:
CVISION Maestro Recognition Server 4.0
ABBYY Recognition Server 2.0
In addition, ABBYY has SDKs for various platforms, and CVISION has an SDK that appears to work with at least VB/VC++.
I haven't tried either SDK yet, and I'm not sure one is necessary for my project; all I need is to extract the text from incoming PDFs. I did, however, try CVISION's server product, and with the OCR on its most accurate settings it worked really well. I haven't tried ABBYY's server product yet, because I have to go through a reseller to get a trial. I'm in the process of doing so, but if it gets annoying I'll probably just go with CVISION. I did try ABBYY's FineReader standalone product, and it worked very well, so I assume their server product would too.