Implementing cross-thread/process queues in Perl - mysql

What is the most efficient way of implementing queues to be read by another thread/process?
I'm thinking of using a basic MySQL table, polled in a loop with a sleep between polls. This sounds like the most scalable option (the queue doesn't even have to be on the same server), but it might result in too many queries to the DB.
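One common way to limit the query count of a polling loop is to back off while the queue is empty and return to fast polling once work shows up. Below is a minimal sketch of that idea in Python. `fetch_job` and `handle` are hypothetical callables (for instance, a function that runs a SELECT against the queue table and returns a row or `None`), and the delay values are arbitrary assumptions.

```python
import time


def poll(fetch_job, handle, min_delay=0.1, max_delay=5.0,
         sleep=time.sleep, max_polls=None):
    """Poll for jobs, doubling the sleep while the queue stays empty.

    fetch_job: hypothetical callable returning a job, or None if the queue is empty.
    handle:    hypothetical callable that processes one job.
    max_polls: optional cap on iterations (None means poll forever).
    """
    delay = min_delay
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()
        if job is not None:
            handle(job)
            delay = min_delay  # work is flowing, so poll eagerly again
        else:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # idle: back off up to max_delay
```

With these defaults an idle consumer settles at one query every five seconds, at the cost of up to five seconds of extra latency for the first job after a quiet period.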

You have several options, and it really depends on what you are trying to get the system to do.
Fork child processes, and communicate with them through their stdin/stdout pipes.
Create a named pipe or Unix domain socket on the file system, like /tmp/mysql.sock (which is a Unix domain socket). This is basically using sockets to communicate across processes.
Set up a message broker. I'd recommend giving ActiveMQ a try, and using the Stomp client for Perl. This is probably your most scalable solution.
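The first option above (a child process driven through its stdin/stdout pipes) can be sketched as follows. This is an illustration in Python, not Perl, and the one-job-per-line protocol and the `done:` reply prefix are assumptions made up for the example.

```python
import subprocess
import sys

# Hypothetical worker: reads one job per line on stdin and
# answers with one result line on stdout.
WORKER_SOURCE = r"""
import sys
for line in sys.stdin:
    print("done:" + line.strip(), flush=True)
"""


def run_jobs(jobs):
    """Send each job to a child process over a pipe and collect its replies."""
    child = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    results = []
    try:
        for job in jobs:
            child.stdin.write(job + "\n")
            child.stdin.flush()  # make sure the child sees the job now
            results.append(child.stdout.readline().strip())
    finally:
        child.stdin.close()      # EOF tells the worker loop to exit
        child.stdout.close()
        child.wait()
    return results
```

Calling `run_jobs(["a", "b"])` sends two jobs and waits for each reply in turn. A real worker would also need a way to report failures.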

This is one of those things that is simple to write yourself to your exact specifications. I wrote a toy one here:
http://github.com/jrockway/app-queue
I am not sure it compiles anymore, as AnyEvent::Subprocess has changed significantly since I wrote it. But you can steal the ideas.
Basically, I think an RPC-style infrastructure is the best. You have a server that handles keeping the data. Then clients connect and add data or remove data via RPC calls. This gives you ultimate flexibility with the semantics. You can be "transactional" so that if a client takes data and then never says "hey, I am done with it", you can assume the client died and give the job to another client. You can also ensure that each job is only run once.
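The "transactional" behaviour described above can be sketched with a lease: a job that has been taken goes back to the queue if the client does not confirm it in time. The following is a minimal in-memory sketch in Python, not the code from the linked repository. The class name, method names, and the 30-second default are assumptions, and the RPC layer and persistence are left out.

```python
import itertools
import time


class LeaseQueue:
    """In-memory job queue where unacknowledged jobs are handed out again."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._ids = itertools.count(1)
        self._pending = []   # (job_id, payload) pairs, oldest first
        self._leased = {}    # job_id -> (payload, lease deadline)

    def put(self, payload):
        job_id = next(self._ids)
        self._pending.append((job_id, payload))
        return job_id

    def _requeue_expired(self):
        now = self._clock()
        expired = [jid for jid, (_, deadline) in self._leased.items()
                   if deadline <= now]
        for jid in sorted(expired):
            payload, _ = self._leased.pop(jid)
            self._pending.append((jid, payload))

    def take(self):
        """Lease the oldest job, or return None if nothing is available."""
        self._requeue_expired()
        if not self._pending:
            return None
        job_id, payload = self._pending.pop(0)
        self._leased[job_id] = (payload, self._clock() + self._lease_seconds)
        return job_id, payload

    def done(self, job_id):
        """Acknowledge a job; returns False if it was not leased any more."""
        return self._leased.pop(job_id, None) is not None
```

Note that a lease on its own gives at-least-once delivery: a slow client may still be working when its job is handed to someone else. Running each job exactly once additionally requires rejecting late results, which is what the `False` return from `done` allows the server to detect.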
Anyway, making a queue work with a relational database table involves a bit of effort. You should use something like KiokuDB for the persistence. (You can physically store the data in MySQL if you desire, but KiokuDB provides a nicer Perl API on top of it.)

In PostgreSQL you could use the NOTIFY/LISTEN combination. After running the LISTEN command(s), the consumer only needs to wait on the PG connection socket.

Related

Reliability Android when connection is off

I'm developing an app where I store my data in an online DB using HTTP POST and GET.
I need to implement some reliability to my software, so if the user presses the button, and there is no connection, the data should be stored in something (file? sqlite?) and then when the connection is again on, send the HTTP request to send data.
Any advice or pieces of code to show me how to do this?
Thanks.
Sounds good and pretty straightforward to me. Just go for it.
You use a local sqlite db as a "cache". To keep it simple, do not put any logic about that into your app's normal code. Just use the local db. Then, separately, you code a synchronizer. It checks for the online connection and synchronizes the local sqlite database with a remote database, maybe mysql.
This should be perfectly fine for all applications that do not require immediate exchange of the data with other processes all the time.
There is one catch, though: the low performance of sqlite on bigger data sets. That is an issue with all single file database solutions. So this approach probably is only valid for small data sets in total, or if you can reduce the usage of the local database to only a part of the total data, maybe only the time critical stuff.
Another workaround might be to use joins over two separate databases, the local and the remote one. But such things really boost the complexity of code, so think thrice if that really is required.
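The local-cache-plus-synchronizer idea from this answer can be sketched as follows. This is an illustration in Python rather than Android code. The `outbox` table, the function names, and the `send` callable (standing in for the HTTP request) are all assumptions.

```python
import sqlite3


def open_outbox(path=":memory:"):
    """Open the local database and make sure the outbox table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save_locally(conn, payload):
    """What the normal application code does: just write to the local db."""
    conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
    conn.commit()


def synchronize(conn, send):
    """Push queued rows to the server, oldest first.

    send: hypothetical callable performing the HTTP request; it is assumed
    to raise OSError (e.g. ConnectionError) when the network is down.
    Returns the number of rows delivered.
    """
    sent = 0
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            send(payload)
        except OSError:
            break  # still offline: keep the remaining rows for the next run
        # Delete only after a successful send, so nothing is lost on failure.
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        conn.commit()
        sent += 1
    return sent
```

Because a row is deleted only after `send` returns, a crash between the two steps can lead to a duplicate request, so the server side should tolerate receiving the same payload twice.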

What is the best way to use Web database using Delphi?

Hi all.
I'm using dbExpress with C++ Builder (Delphi) 2007, and MySQL, Firebird, ...
I'd like to make a Win32 application which uses a database located on my web server.
I tried using dbExpress (TSQLConnection for MySQL), but it's very slow...
I also tried a local database with upload/download using Indy,
but it was not good and a little complicated.
So what is the best way to use a web-based database from a Win32 application?
Do you have any experience with this? Any document or comment would be much appreciated.
Thanks a lot.
Database connections via an Internet link (using a VPN or not) are slow - you are perfectly right. The main reason IMHO is the "ping" delay of every request, which is very low on a local network, and much higher via Internet. So direct connection is not a good idea.
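A rough back-of-the-envelope calculation shows why the per-request delay dominates. The numbers below (500 sequential requests, 0.5 ms on a LAN, 80 ms over the Internet) are illustrative assumptions, not measurements.

```python
def total_wait_seconds(round_trips, latency_ms):
    """Time spent purely waiting on the network for sequential requests."""
    return round_trips * latency_ms / 1000.0


# Assumed figures: a form that issues 500 small queries one after another.
lan_wait = total_wait_seconds(500, 0.5)  # 0.25 seconds on a local network
wan_wait = total_wait_seconds(500, 80)   # 40.0 seconds over the Internet
```

The same workload that feels instant on a local network spends most of a minute waiting once every request has to cross the Internet, regardless of how fast the server executes each query.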
In latest versions of Delphi, you have the DataSnap components, which is the new "standard" (or Embarcadero recommended) way of doing remote access (including web access). Even if it was found at first to be a bit limited, the latest versions are perfectly usable, and are becoming a key product for cross-platform application building with Delphi. But it is not available for Delphi 2007.
A much more mature product (and one available for Delphi 2007) is Data Abstract:
Data Abstract is a framework for building database-driven applications
using the multi-tier data access model, for a variety of platforms.
Of course, this is not free, but this is a proven and efficient solution.
You may also take a look at our Client-Server ORM, which can connect to any DB, and is able to implement a RESTful SOA architecture with Delphi 2007, even without using the ORM part - that is, you can use your existing DBExpress-based source code, and expose easily some web interfaces to the data. It is Open Source, and uses JSON as communication format over a secured authentication mechanism. There is a lot of documentation included (more than 700 pages of PDF), which also tries to introduce to the SOA world.
Take a look at Datasnap: info
You need a data access library, which offers features:
Thread safety. In general, you will need to use a dedicated connection for each thread.
Connection pooling. To make connection creation (which thread safety requires, as noted above) fast, there must be a connection pool.
Fast SQL command execution, result set opening, and fetching.
Tracing. With any library you may run into performance issues. You need a tool to see what is going wrong. For that you will need to see and analyze the client and server communication.
Result set caching and the ability to read it simultaneously from different threads. You may have a few read-only tables, which you will fetch once and cache in your application. But you will need a mechanism to read this data from threads. Kind of InMemTable cloning.
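The thread-safety and pooling points in the list above can be illustrated with a very small pool. This is a generic sketch in Python using sqlite3, not how any Delphi library implements it. The class name and the `timeout` parameter are assumptions.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool that lends each connection to one thread at a time."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False allows handing a connection to another
            # thread; the pool ensures it is never used by two threads at once.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=None):
        """Borrow a connection; raises queue.Empty if none frees up in time."""
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back
```

A worker thread would wrap its database work in `with pool.connection() as conn:`, so opening a connection costs a queue operation instead of a new network handshake.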
My answer is biased, but you may consider AnyDAC. It has all these and many other features.
PS: dbExpress should work too. Try to find first the reason for your performance issue, and not a different library. Because the same may happen with other library ...
DB applications over a slow link need a different approach than those using a fast link. You have to be careful about how much data you move around, and about how many roundtrips your application performs.
Usually an approach where the needed subset is cached on the client, modified, and then applied to the database is preferable (provided, of course, that changes do not need to be seen immediately and the chances of conflicts are low).
No middleware will help you much if the application is not designed with handling a slow link in mind.

How does a LAMP developer get started using a Redis/Node.js Solution?

I come from the cliche land of PHP and MySQL on Dreamhost. BUT! I am also a JavaScript genie and I've been dying to get on the Node.js train. In my reading I've inadvertently discovered a NoSQL solution called Redis!
With my shared web host and limited server experience (I know how to install Linux on one of my old Dells and do some basic server admin), how can I get started using Redis and Node.js? And the next best question is -- what does one even use Redis for? In what situation would Redis be better suited than MySQL? And does Node.js remove the necessity for Apache? If so, why do developers recommend using the NGINX server?
Lots of questions, but there doesn't seem to be a solid source out there with this info all in one place!
Thanks again for your guidance and feedback!
NoSQL is just an inadequate buzz word.
I'll attempt to answer the latter part of the question.
Redis is a key-value store database system. Speed is its primary objective, so most of its use comes from event driven implementations (as it goes over in its reddit tutorial).
It excels at areas like logging, message transactions, and other reactive processes.
Node.js on the other hand is mainly for independent HTTP transactions. It is basically used to serve content (much like a web server, but Node.js really wouldn't be necessarily public facing) very fast which makes it useful for backend business logic applications.
For example, having a C program calculate stock values and having Node.js serve the content for another internal application to retrieve or using Node.js to serve a web page one is developing so one's coworkers can view it internally.
It really excels as a middleman between applications.
Redis
Redis is an in-memory datastore : All your data are stored in the memory meaning that a huge database means huge memory usage, but with really fast access and lookup.
It is also a key-value store: you don't have any relationships, or queries to retrieve your data. You can only set a key-value pair, and retrieve it by its key. (Redis also provides useful types such as sets and hashes.)
These particularities make Redis really well suited for storing sessions in a web application, creating indexes on a database, and handling real-time data like analytics.
So if you need something that will "replace" MySQL for storing your basic application models, I suggest you try something like MongoDB, Riak or CouchDB, which are document stores.
Document stores manage your data as something analogous to JSON objects (I know it's a huge shortcut).
Read this article if you want to know more about popular nosql databases.
Node.js
Node.js provides asynchronous I/O for the V8 JavaScript engine.
When you run a node server, it listens on a port on your machine (e.g. 3000). It does not do any sort of Domain name resolution and Virtual Host handling so you have to use a http server with a proxy such as Apache or nginx.
Choosing nginx over Apache in production is a matter of performance, and I find it easier to use. But I suggest you use the one you're the most comfortable with.
To get started, just install them and start playing with them. HowToNode
You can get a free plan from https://redistogo.com/ - it is a hosted redis database instance.
Quick intro to redis data types and basic commands is available here - http://redis.io/topics/data-types-intro.
A good comparison of when to use what is here - http://playbook.thoughtbot.com/choosing-platforms/databases/

How to Guarantee Message delivery with Celery?

I have a python application where I want to start doing more work in the background so that it will scale better as it gets busier. In the past I have used Celery for doing normal background tasks, and this has worked well.
The only difference between this application and the others I have done in the past is that I need to guarantee that these messages are processed, they can't be lost.
For this application I'm not too concerned about speed for my message queue; I need reliability and durability first and foremost. To be safe I want to have two queue servers, in different data centers in case something goes wrong, one a backup of the other.
Looking at Celery, it looks like it supports a bunch of different backends, some with more features than others. The two most popular look like redis and RabbitMQ, so I took some time to examine them further.
RabbitMQ:
Supports durable queues and clustering, but the problem with the way they have clustering today is that if you lose a node in the cluster, all messages in that node are unavailable until you bring that node back online. It doesn't replicate the messages between the different nodes in the cluster; it just replicates the metadata about the message, and then goes back to the originating node to get the message. If that node isn't running, you are S.O.L. Not ideal.
The way they recommend getting around this is to set up a second server and replicate the file system using DRBD, and then run something like pacemaker to switch the clients to the backup server when needed. This seems pretty complicated. Anyone know of a better way?
Redis:
Supports a read slave, which would give me a backup in case of emergencies, but it doesn't support a master-master setup, and I'm not sure if it handles active failover between master and slave. It doesn't have the same features as RabbitMQ, but looks much easier to set up and maintain.
Questions:
What is the best way to set up celery so that it will guarantee message processing?
Has anyone done this before? If so, would you mind sharing what you did?
A lot has changed since the OP! There is now an option for high-availability aka "mirrored" queues. This goes pretty far toward solving the problem you described. See http://www.rabbitmq.com/ha.html.
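Broker-side mirroring covers the loss of a broker node. Separately, a few Celery settings influence whether a task survives a worker crash. The fragment below is a sketch of a `celeryconfig.py`-style module, assuming Celery 4 or later (lowercase setting names) with RabbitMQ as the broker. The broker URL is a placeholder, and the values are a starting point to verify against your Celery version, not a tested recipe.

```python
# celeryconfig.py (sketch): settings that trade throughput for delivery safety.

# Placeholder URL; host, credentials and vhost are assumptions.
broker_url = "amqp://user:password@rabbit.example.com:5672/myvhost"

# Acknowledge a message only after the task has finished, so a task that was
# running when the worker died is redelivered instead of being lost.
task_acks_late = True

# Requeue the message if the worker process executing it is killed.
task_reject_on_worker_lost = True

# Reserve one message per worker process at a time, so fewer unacknowledged
# messages sit in a worker that may crash.
worker_prefetch_multiplier = 1

# Store messages on disk at the broker (this is already Celery's default).
task_default_delivery_mode = "persistent"

# Ask the AMQP transport to wait for the broker to confirm each publish.
broker_transport_options = {"confirm_publish": True}
```

With late acknowledgement a task can run more than once after a crash, so tasks configured this way should be idempotent.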
You might want to check out IronMQ, it covers your requirements (durable, highly available, etc) and is a cloud native solution so zero maintenance. And there's a Celery broker for it: https://github.com/iron-io/iron_celery so you can start using it just by changing your Celery config.
I suspect that Celery bound to existing backends is the wrong solution for the reliability guarantees you need.
Given that you want a distributed queueing system with strong durability and reliability guarantees, I'd start by looking for such a system (they do exist) and then figuring out the best way to bind to it in Python. That may be via Celery & a new backend, or not.
I've used Amazon SQS for this purpose and got good results. You will keep receiving a message until you delete it from the queue, and it allows your app to grow as large as you need.
Is using a distributed rendering system an option? These are normally reserved for HPC, but a lot of the concepts are the same. Check out Qube or Deadline Render. There are other, open source solutions as well. All have failover in mind, given the high degree of complexity and risk of failure in renders that can take hours per image sequence frame.

We're using JDBC+XMLRPC+Tomcat+MySQL to execute potentially large MySQL queries. What is a better way?

I'm working on a Java based project that has a client program which needs to connect to a MySQL database on a remote server. This was implemented as follows:
Use JDBC to write the SQL queries to be executed which are then hosted as a servlet using Apache Tomcat and made accessible via XML-RPC. The client code uses XML-RPC to remotely execute these JDBC based functions. This allows us to keep our MySQL database non-public, restricts use to the pre-defined functions, and allows Tomcat to manage the database transactions (which I've been told is better than letting MySQL do it alone, but I really don't understand why). However, this approach requires a lot of boiler-plate code, and Tomcat is a huge memory hog on our server.
I'm looking for a better way to do this. One way I'm considering is to make the MySQL database publicly accessible, re-writing the JDBC based code as stored procedures, and restricting public use to these procedures only. The problem I see with this are that translating all the JDBC code to stored procedures will be difficult and time consuming. I'm also not too familiar with MySQL's permissions. Can one grant access to a stored procedure which performs select statements on a table, but also deny arbitrary select statements on that same table?
Any other ideas are welcome, as are thoughts and/or suggestions on the stored procedure solution.
Thank you!
You can probably get the RAM upgraded in your server for less than the cost of even a few days development time, so don't write any code if that's all you're getting from the exercise. Also, just because the memory is used inside of tomcat, it doesn't mean that tomcat itself is using it. The memory could be used up by data or by technical flaws in your code.
If you've tried additional RAM and it is being eaten up, then that smells like a coding issue, so I'd suggest using a profiler, or log data to try and work out what the root cause is before changing anything. If the cause is large data sets then using the database directly will only delay the inevitable, instead you'd need to look at things like paging, summarisation, client side caching, or redesigning clients to reduce the use of expensive queries. Using a profiler, or simply reviewing the code base, will also tell you if something is creating too many objects (especially strings, or XML nodes) or leaking memory.
Boilerplate code can be avoided by refactoring creatively, and it's good to avoid repetition. It's unclear how much structure you might already have, but with a little work it's easy to centralise boilerplate JDBC calls. There is no fundamental reason JDBC code should be repeated. Perhaps you could tell us what code is being repeated?
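As an illustration of centralising that kind of boilerplate, here is a sketch in Python with sqlite3 rather than Java with JDBC. The same shape (one helper that owns commit/rollback, one that owns cursor cleanup) carries over. The helper names are assumptions.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit if the block succeeds, roll back if it raises."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def query_all(conn, sql, params=()):
    """Run a parameterised query and return all rows, closing the cursor."""
    cursor = conn.execute(sql, params)
    try:
        return cursor.fetchall()
    finally:
        cursor.close()
```

Each data-access function then shrinks to its SQL and parameters, for example `query_all(conn, "SELECT n FROM t WHERE n > ?", (10,))`, while the error handling lives in one place.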
Finally, I'll venture that there are many good reasons to put a web tier over your database. Flexibility (of deployment), compatibility, control (over the SQL) and security are all good reasons to keep the web tier.
MySQL 5.0.3+ does have an execute privilege that you can set (without setting select privileges) that should allow you to get the functionality you seek.
However, note this mysql bug report with JDBC (well and a lot of other drivers).
When calling the [procedure] with JDBC, I get "java.sql.SQLException: Driver requires
declaration of procedure to either contain a '\nbegin' or '\n' to follow argument
declaration, or SELECT privilege on mysql.proc to parse column types."
the workaround is:
See "noAccessToProcedureBodies" in /J 5.0.3 for a somewhat hackish, non-JDBC compliant
workaround.
I am sure you could implement your solution without much boiler-plate, esp. using something like Spring's remoting. Also, how much memory is Tomcat eating? I frankly believe that if it's just doing what you are describing, it could work in less than 128mb (conservative guess).
Your alternative is the "correct by the book" way of solving the problem. I say build a prototype and see how it works. The major problems you could have are:
MySQL having some important gotcha in this regard
MySQL's Stored Procedure support being too primitive and forcing you to do a lot of work
Some other strange hiccup
I'm probably one of those MySQL haters, so the situation might be better than I think.