I have a scenario in which I need to process (in SQL Server) messages that arrive as .xml files in a folder, in real time.
I started investigating SQL Service Broker for my queuing needs. Basically, I want the Service Broker to pick up my .xml files and place them in a queue as they arrive in the folder. But, SQL Service Broker does not support "Monolog" conversations, at least not in the current version. It supports only a dialog between an initiator and a target service.
I can use MSMQ but then I will have two things to maintain - the .Net Code for file processing in MSMQ and the SQL Server T-SQL stored procs. What options do I have left?
Thanks.
You'll want to leverage the FileSystemWatcher to monitor the directory. Your implementation can simply respond to new files and use the event to queue processing of the file(s) (which could be implemented in Service Broker if that makes your life better).
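For illustration, a minimal sketch of the watch-then-queue idea, shown here with Node's fs.watch rather than the .NET FileSystemWatcher; the enqueueForProcessing helper is a hypothetical stand-in for whatever hands the file to your queue:

import * as fs from "fs";
import * as path from "path";

const inboxDir = "/data/xml-inbox"; // assumed drop folder

// Hypothetical placeholder for whatever hands the file to your queue
// (e.g. a stored proc that does a Service Broker SEND).
function enqueueForProcessing(filePath: string): void {
  console.log(`queued ${filePath}`);
}

fs.watch(inboxDir, (eventType, filename) => {
  // a newly created file typically shows up as a "rename" event
  if (eventType === "rename" && filename && filename.endsWith(".xml")) {
    const fullPath = path.join(inboxDir, filename);
    if (fs.existsSync(fullPath)) {
      enqueueForProcessing(fullPath);
    }
  }
});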
As the other posters have mentioned, you've really got things backwards: Service Broker responds to messages; someone must send a message for it to respond to. It is not a generic service host process. Depending on the feature set and scale-out/scale-up requirements, you might want to look at BizTalk, as this is a very common pattern implemented with it and it has TONS of infrastructure to support all the orthogonal "cost of doing business" components needed to make the thing reliable and actually work.
Once you're done writing/debugging all the required code on your own you'll often find you've spent more $ than the licenses cost. Again though, it's all about requirements.
None. The whole idea is broken - since you have to pick up the files from a directory, the use of Service Broker simply does not make sense to start with. You need a listening process, so you can have the listening process do the processing, too.
I'm finishing a system at work that makes calls to a MySQL server. Those calls' arguments reveal information that I need to keep private, like vote(idUser, idCandidate). There's no information in the db that relates those two, of course, nor in "the visible part" of the back end. Even though I think this can't be done, I wanted to make sure that it is impossible to trace this sort of call (calls that were made, or calls being made at the moment) with a log or something while the system is in production and being used, just as it is impossible in most languages unless you specifically "debug" in a certain way. I hope the question is clear enough. Thanks.
How do I log thee? Let me count the ways.
MySQL query log. I can enable this per-session and send everything to a log file.
I can set up a slave server and have insertions sent to me by the master. This is a significant intervention and would leave a wide trace.
On the server, unbeknownst to both the web app and the MySQL log, I can intercept communications between the two. I need administrative access to the machine, of course.
On the server, again with administrative access, I can both log the query calls and inject logging instrumentation into the SQL interface (the legitimate one is the MySQL Audit Plugin, but several alternatives have been developed for various purposes over the years).
What can you do? You can have the applications use a secure protocol, just for starters.
Then you need to secure your machine so that administrator tricks do not work, so that even if the logs are activated nobody can read them, and so that you are notified of any new or modified file and can delete it promptly.
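To make the "secure protocol" suggestion concrete, here is a minimal sketch of a TLS-encrypted MySQL connection using the mysql2 Node driver; the host, credentials and CA path are placeholders:

// Minimal sketch: connect to MySQL over TLS so query arguments cannot be read
// off the wire. Host, credentials and CA path are placeholders.
import * as fs from "fs";
import mysql from "mysql2/promise";

async function connectSecurely() {
  return mysql.createConnection({
    host: "db.example.internal",
    user: "app_user",
    password: process.env.DB_PASSWORD,
    database: "votes",
    ssl: {
      ca: fs.readFileSync("/etc/ssl/certs/db-ca.pem"),
      rejectUnauthorized: true, // refuse to talk to an impostor server
    },
  });
}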
Just for fun, I'm designing a few web applications using a microservices architecture. I'm trying to determine the best way to do configuration management, and I'm worried that my approach for configuration may have some enormous pitfalls and/or something better exists.
To frame the problem, let's say I have an authentication service written in C++, an identity service written in Rust, an analytics service written in Haskell, some middle tier written in Scala, and a frontend written in JavaScript. There would also be the corresponding identity DB, auth DB, analytics DB (maybe a Redis cache for sessions), etc... I'm deploying all of these apps using Docker Swarm.
Whenever one of these apps is deployed, it necessarily has to discover all the other applications. Since I use Docker Swarm, discovery isn't an issue as long as all the nodes share the requisite overlay network.
However, each application still needs the upstream services' host_addr, maybe a port, the credentials for some DB or sealed service, etc...
I know docker has secrets which enable apps to read the configuration from the container, but I would then need to write some configuration parser in each language for each service. This seems messy.
What I would rather do is have a configuration service, which maintains knowledge about how to configure all other services. So, each application would start with some RPC call designed to get the configuration for the application at runtime. Something like:
int main() {
    AppConfig cfg = configClient.getConfiguration("APP_NAME");
    // do application things... and pass around cfg
    return 0;
}
The AppConfig would be defined in an IDL, so the class would be instantly available and language agnostic.
This seems like a good solution, but maybe I'm really missing the point here. Even at scale, tens of thousands of nodes can be served easily by a few configuration services, so I don't foresee any scaling issues. Again, it's just a hobby project, but I like thinking about the "what-if" scenarios :)
How are configuration schemes handled in microservices architecture? Does this seem like a reasonable approach? What do the major players like Facebook, Google, LinkedIn, AWS, etc... do?
Instead of building a custom configuration management solution, I would use one of these existing ones:
Spring Cloud Config
Spring Cloud Config is a config server written in Java offering an HTTP API to retrieve the configuration parameters of applications. Obviously, it ships with a Java client and a nice Spring integration, but as the server is just an HTTP API, you may use it with any language you like (a sketch of calling it from a non-Java service follows at the end of this section). The config server also features symmetric / asymmetric encryption of configuration values.
Configuration Source: The externalized configuration is stored in a Git repository which must be made accessible to the Spring Cloud Config server. The properties in that repository are then accessible through the HTTP API, so you can even consider implementing an update process for configuration properties.
Server location: Ideally, you make your config server accessible through a domain (e.g. config.myapp.io), so you can implement load-balancing and fail-over scenarios as needed. Also, all you need to provide to all your services then is just that exact location (and some authentication / decryption info).
Getting started: You may have a look at this getting started guide for centralized configuration on the Spring docs or read through this Quick Intro to Spring Cloud Config.
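For illustration, a minimal sketch of pulling properties from the config server's HTTP API from a non-Java service; the server address, application name and profile are assumptions:

// Minimal sketch: fetch configuration from a Spring Cloud Config server over
// its HTTP API. The server address, application name ("identity-service")
// and profile ("production") are placeholders for your own setup.
interface ConfigServerResponse {
  name: string;
  profiles: string[];
  propertySources: { name: string; source: Record<string, unknown> }[];
}

async function fetchConfig(app: string, profile: string): Promise<Record<string, unknown>> {
  const res = await fetch(`https://config.myapp.io/${app}/${profile}`);
  if (!res.ok) throw new Error(`config server returned ${res.status}`);
  const body = (await res.json()) as ConfigServerResponse;
  // Property sources are listed highest-precedence first, so apply them in
  // reverse and let later assignments win.
  const merged: Record<string, unknown> = {};
  for (const ps of [...body.propertySources].reverse()) {
    Object.assign(merged, ps.source);
  }
  return merged;
}

// usage: const cfg = await fetchConfig("identity-service", "production");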
Netflix Archaius
Netflix Archaius is part of the Netflix OSS stack and "is a Java library that provides APIs to access and utilize properties that can change dynamically at runtime".
While limited to Java (which does not quite match the polyglot context you have described), the library is capable of using a database as a source for the configuration properties.
confd
confd keeps local configuration files up-to-date using data stored in external sources (etcd, consul, dynamodb, redis, vault, ...). After configuration changes, confd restarts the application so that it can pick up the updated configuration file.
In the context of your question, this might be worthwhile to try, as confd makes no assumptions about the application and requires no special client code. Most languages and frameworks support file-based configuration, so confd should be fairly easy to add on top of existing microservices that currently use env variables and did not anticipate decentralized configuration management.
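For illustration, a sketch of the plain file-based loading that would sit on the application side while confd keeps the file current; the path and fields are assumptions:

// Minimal sketch of the file-based configuration that confd would keep
// up to date. The path and the fields are assumptions for illustration.
import * as fs from "fs";

interface AppConfig {
  authServiceUrl: string;
  dbHost: string;
  dbPort: number;
}

function loadConfig(path = "/etc/myapp/config.json"): AppConfig {
  const raw = fs.readFileSync(path, "utf8");
  return JSON.parse(raw) as AppConfig;
}

const cfg = loadConfig();
console.log(`connecting to ${cfg.dbHost}:${cfg.dbPort}`);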
I don't have a good solution for you, but I can point out some issues for you to consider.
First, your applications will presumably need some bootstrap configuration that enables them to locate and connect to the configuration service. For example, you mentioned defining the configuration service API with IDL for a middleware system that supports remote procedure calls. I assume you mean something like CORBA IDL. This means your bootstrap configuration will not be just the endpoint to connect to (specified perhaps as a stringified IOR or a path/in/naming/service), but also a configuration file for the CORBA product you are using. You can't download that CORBA product's configuration file from the configuration service, because that would be a chicken-and-egg situation. So, instead, you end up having to manually maintain a separate copy of the CORBA product's configuration file for each application instance.
Second, your pseudo-code example suggests that you will use a single RPC invocation to retrieve all the configuration for an application in a single go. This coarse level of granularity is good. If, instead, an application used a separate RPC call to retrieve each name=value pair, then you could suffer major scalability problems. To illustrate, let's assume an application has 100 name=value pairs in its configuration, so it needs to make 100 RPC calls to retrieve its configuration data. I can foresee the following scalability problems:
Each RPC might take, say, 1 millisecond round-trip time if the application and the configuration server are on the same local area network, so your application's start-up time is 1 millisecond for each of 100 RPC calls = 100 milliseconds = 0.1 second. That might seem acceptable. But if you now deploy another application instance on another continent with, say, a 50 millisecond round-trip latency, then the start-up time for that new application instance will be 100 RPC calls at 50 milliseconds latency per call = 5 seconds. Ouch!
The need to make only 100 RPC calls to retrieve configuration data assumes that the application will retrieve each name=value pair once, cache that information in, say, an instance variable of an object, and then later on access the name=value pair via that local cache (a caching wrapper along these lines is sketched below). However, sooner or later somebody will call x = cfg.lookup("variable-name") from inside a for-loop, and this means the application will be making an RPC every time around the loop. Obviously, this will slow down that application instance, but if you end up with dozens or hundreds of application instances doing that, then your configuration service will be swamped with hundreds or thousands of requests per second, and it will become a centralised performance bottleneck.
You might start off writing long-lived applications that do 100 RPCs at start-up to retrieve configuration data, and then run for hours or days before terminating. Let's assume those applications are CORBA servers that other applications can communicate with via RPC. Sooner or later you might decide to write some command-line utilities to do things like: "ping" an application instance to see if it is running; "query" an application instance to get some status details; ask an application instance to gracefully terminate; and so on. Each of those command-line utilities is short-lived; when they start up, they use RPCs to obtain their configuration data, then do the "real" work by making a single RPC to a server process to ping/query/kill it, and then they terminate. Now somebody will write a UNIX shell script that calls those ping and query commands once per second for each of your dozens or hundreds of application instances. This seemingly innocuous shell script will be responsible for creating dozens or hundreds of short-lived processes per second, and each of those short-lived processes will make numerous RPC calls to the centralised configuration server to retrieve name=value pairs one at a time. That sort of shell script can put a massive load on your centralised configuration server.
I am not trying to discourage you from designing a centralised configuration server. The above points are just warning about scalability issues you need to consider. Your plan for an application to retrieve all its configuration data via one coarse-granularity RPC call will certainly help you to avoid the kinds of scalability problems I mentioned above.
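To sketch the "one coarse call, then local lookups" pattern; fetchAllConfig below is a hypothetical stand-in for that single RPC:

// Sketch of "fetch everything once, then look up locally": one coarse-grained
// call to the configuration service at start-up, after which lookups never
// leave the process.
class ConfigCache {
  private values = new Map<string, string>();

  async load(appName: string): Promise<void> {
    const all = await fetchAllConfig(appName); // one round trip, not one per key
    for (const [key, value] of Object.entries(all)) {
      this.values.set(key, value);
    }
  }

  lookup(key: string): string {
    const value = this.values.get(key);
    if (value === undefined) {
      throw new Error(`missing config key: ${key}`);
    }
    return value; // no RPC here, so calling this inside a loop is harmless
  }
}

// hypothetical single RPC to the configuration service
async function fetchAllConfig(appName: string): Promise<Record<string, string>> {
  return { db_host: "db.internal", db_port: "5432" };
}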
To provide some food for thought, you might want to consider a different approach. You could store each application's configuration files on a web server. A shell start script "wrapper" for an application can do the following (a rough sketch follows the list):
Use wget or curl to download "template" configuration files from the web server and store the files on the local file system. A "template" configuration file is a normal configuration file but with some placeholders for values. A placeholder might look like ${host_name}.
Also use wget or curl to download a file containing search-and-replace pairs, such as ${host_name}=host42.pizza.com.
Perform a global search-and-replace of those search-and-replace terms on all the downloaded template configuration files to produce the configuration files that are ready to use. You might use UNIX shell tools like sed or a scripting language to perform this global search-and-replace. Alternatively, you could use a templating engine like Apache Velocity.
Execute the actual application, using a command-line argument to specify the path/to/downloaded/config/files.
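A rough sketch of those steps, written as a small script rather than shell + sed (either would do); the URLs, file names and placeholder syntax are assumptions:

// Sketch of the download-and-substitute wrapper described above, done as a
// small script instead of shell + sed. URLs, file names and placeholder
// syntax are assumptions for illustration.
import * as fs from "fs";

async function renderConfig(): Promise<void> {
  // 1. download the "template" configuration file
  const template = await (await fetch("https://config-host.example/app.conf.template")).text();

  // 2. download the search-and-replace pairs, e.g. "${host_name}=host42.pizza.com"
  const pairs = await (await fetch("https://config-host.example/app.replacements")).text();

  // 3. perform a global search-and-replace on the template
  let rendered = template;
  for (const line of pairs.split("\n")) {
    const idx = line.indexOf("=");
    if (idx < 0) continue;
    const placeholder = line.slice(0, idx).trim(); // e.g. ${host_name}
    const value = line.slice(idx + 1).trim();      // e.g. host42.pizza.com
    rendered = rendered.split(placeholder).join(value);
  }
  fs.writeFileSync("/tmp/app.conf", rendered);

  // 4. the wrapper would now exec the real application, passing /tmp/app.conf
}

renderConfig().catch((err) => { console.error(err); process.exit(1); });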
I am trying to use Ajax to load data from a MySQL server on my LAN in a Chrome App.
I am proposing Ajax because I need the Chrome App to pick up any update in the SQL database instantly.
Since this app is only used on the LAN, I presume there is no need to maintain a web server (aka running Apache). Can anyone provide some hints, as this answer I found on the forum does not help me (an absolute newbie) much.
https://developer.chrome.com/extensions/xhr
Thank you.
YY
Since this app is only used on the LAN, I presume there is no need to maintain a web server (aka running Apache).
AJAX refers to making an HTTP request to... something.
Something that can answer HTTP requests is called a web server.
So, you do need some sort of web server. It may be a component of MySQL server, but it's still a web server.
That said, it doesn't look like MySQL has a supported HTTP interface. There is an HTTP Plugin that provides a REST API, but it's experimental. Therefore, you would need a separate server application that does what you need (a rough sketch follows).
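For illustration, a minimal sketch of such a separate server application: a small Express endpoint that queries MySQL with the mysql2 driver and returns JSON for the Chrome App to fetch; host, credentials, table and column names are placeholders:

// Sketch of a small web server sitting between the Chrome App and MySQL.
// Uses Express and the mysql2 driver; host, credentials, table and column
// names are placeholders.
import express from "express";
import mysql from "mysql2/promise";

const app = express();
const pool = mysql.createPool({
  host: "192.168.1.10", // LAN MySQL server (placeholder)
  user: "app_user",
  password: process.env.DB_PASSWORD,
  database: "inventory",
});

app.get("/items", async (_req, res) => {
  const [rows] = await pool.query("SELECT id, name, updated_at FROM items");
  res.json(rows); // the Chrome App can now fetch http://<this-host>:3000/items
});

app.listen(3000);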
That said,
I am proposing Ajax because I need the Chrome App to pick up any update in the SQL database instantly.
AJAX is not a magic bullet. It works well for requesting data, but it is not adapted to receiving updates initiated by the server you're talking to. It's a request-response cycle, and while there are some techniques for using it to push data, they are hacks.
WebSockets evolved to cover the bidirectional, persistent communication needs. However, this again would require a web server to sit as a proxy between your DB and your app - this time, WebSockets-capable.
That said, building a Chrome App allows you to connect to a database directly - since Chrome Apps are capable of using chrome.sockets API. You would need a JavaScript library specifically adapted to the task, but those probably exist.
That said, and noting that I'm not an expert on databases:
Databases are not designed to notify you about updates. You need to poll them to see if the data has changed. You will not get it instantaneously no matter what interface you use. You'll need to periodically monitor it for changes.
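So the "periodically monitor" part ends up looking something like the sketch below: poll whatever HTTP endpoint fronts the database and react when the payload changes (URL and interval are placeholders):

// Sketch of simple client-side polling: ask the intermediate web server for
// fresh data every few seconds. URL and interval are placeholders.
let lastSeen = "";

async function poll(): Promise<void> {
  const res = await fetch("http://192.168.1.10:3000/items");
  const body = await res.text();
  if (body !== lastSeen) {
    lastSeen = body;
    console.log("data changed, re-render the app");
  }
}

setInterval(() => { poll().catch(console.error); }, 5000); // every 5 seconds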
Considering this, depending on what you're ultimately trying to do, you may be choosing the wrong instrument.
There's a lot of "buts" here, and it seems like a complex task. You should re-evaluate your readiness as an "absolute newbie" to undertake it.
I have a realtime HTML5 canvas game that runs off a node backend. Players are connected via Websocket (socket.io). The problem is sometimes I need to deploy new code (hotfixes for instance) and restart the server but I don't want to disconnect players.
My idea for this was to divide the websocket server and application server into separately deployable components and add a message queue in the middle to decouple the 2 components. That way if the application server was rebooting there would just be a short delay while the messages bunch up but nothing would be lost. Is this a good strategy? Is there an alternative?
It's very possible for websocket-based applications to be restarted without the user noticing anything (that's the case for my chat server, for example).
To make that possible, the solution isn't to have a websocket application that is isolated and never restarted. In fact, that would be very optimistic (are you sure you could ensure its API never changes?).
A solution is
to ensure the client reconnects if disconnected (this is standard if you use socket.io for websocketing)
to make the server ask the client for its id (or session id) on client-initiated reconnection (a client-side sketch follows this list)
to persist the state of the application. This is usually done with a database. If your server has no state other than the queue between clients (which is a little unlikely), then you might look for an existing persistent queue implementation or build your own over a fast local storage (Redis comes to mind)
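For illustration, a client-side sketch of the first two points using socket.io; the URL, event names and session-id handling are assumptions:

// Client-side sketch of points 1 and 2: socket.io reconnects automatically,
// and on every (re)connection the client re-sends its session id so the
// server can restore its state. URL and event names are placeholders.
import { io } from "socket.io-client";

const socket = io("https://game.example.com"); // auto-reconnect is on by default
const sessionId = window.localStorage.getItem("sessionId");

socket.on("connect", () => {
  // fires on the first connection and again after every reconnection
  socket.emit("resume", { sessionId });
});

socket.on("disconnect", () => {
  // the game can pause rendering here; socket.io keeps retrying in the background
  console.log("connection lost, waiting for reconnect...");
});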
I have a service that accepts callbacks from a provider.
Motivation: I do not want to EVER lose any callbacks (unless of course my network becomes unreachable).
Let's suppose the impossible happens and my MySQL server becomes unreachable for some time.
I want to fall back to a secondary persistence store once I've retried several times and failed.
What are my options? Queues, in-memory cache?
You say you're receiving "callbacks" - you've not made clear what they are. What is the protocol? Is it over a network?
If it were HTTP, then I would say the best way is that if your application is unable to write the data into permanent storage, it should return an error ("Try again later" if that exists in the protocol) to the caller, who should try again later.
An asynchronous process like a callback should always be able to cope with failures downstream and queue its requests.
I've worked with a payment provider where this has been the case (Paypal). If you're unable to completely process the request, just send an error back to the caller.
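To make the "return an error and let the caller retry" idea concrete, a minimal sketch with Express; persistCallback is a hypothetical stand-in for the MySQL write:

// Sketch of the "tell the caller to try again later" approach over HTTP.
// persistCallback stands in for the MySQL write; if it fails, answer 503
// with a Retry-After hint instead of dropping the callback.
import express from "express";

const app = express();
app.use(express.json());

app.post("/callbacks", async (req, res) => {
  try {
    await persistCallback(req.body); // hypothetical: INSERT into MySQL
    res.status(200).send("ok");
  } catch (err) {
    console.error("storage unavailable:", err);
    res.status(503).set("Retry-After", "60").send("try again later");
  }
});

async function persistCallback(payload: unknown): Promise<void> {
  // placeholder for the real database write
}

app.listen(8080);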
I recommend some sort of job queue server. I personally use Starling and have had great results with it. It speaks the memcache protocol so it is easy to use as a persistent queue.
Starling on Github
I've put a queue in SQLite for this before. Though, in my case, it was to protect against loss of the network link to the MySQL server — the data was locally-generated.
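A sketch of that kind of local SQLite fallback queue, assuming Node and the better-sqlite3 package; the table layout and the writeToMysql callback are made up:

// Sketch of a local SQLite fallback queue: if the MySQL write fails, park the
// payload on local disk and drain it later. Uses better-sqlite3; the table
// layout and writeToMysql helper are assumptions.
import Database from "better-sqlite3";

const db = new Database("fallback-queue.db");
db.exec(`CREATE TABLE IF NOT EXISTS pending (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  payload TEXT NOT NULL,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP
)`);

export function enqueueLocally(payload: object): void {
  db.prepare("INSERT INTO pending (payload) VALUES (?)").run(JSON.stringify(payload));
}

export async function drainQueue(writeToMysql: (p: object) => Promise<void>): Promise<void> {
  const rows = db.prepare("SELECT id, payload FROM pending ORDER BY id").all() as { id: number; payload: string }[];
  for (const row of rows) {
    await writeToMysql(JSON.parse(row.payload)); // retry the original write
    db.prepare("DELETE FROM pending WHERE id = ?").run(row.id);
  }
}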
You can have a backup MySQL server and switch your connection to that one in case the primary breaks down. If it's only going to be a fail-over store, you could probably run it locally on the application server.
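For illustration, a sketch of the primary-then-backup connection idea with the mysql2 driver; hosts, credentials and the timeout are placeholders:

// Sketch of falling back to a backup MySQL server when the primary is down.
// Uses the mysql2 driver; hosts, credentials and the timeout are placeholders.
import mysql from "mysql2/promise";

const hosts = ["mysql-primary.internal", "127.0.0.1"]; // backup could run locally

async function connectWithFailover() {
  let lastError: unknown;
  for (const host of hosts) {
    try {
      return await mysql.createConnection({
        host,
        user: "app_user",
        password: process.env.DB_PASSWORD,
        database: "callbacks",
        connectTimeout: 3000, // give up on the primary quickly instead of hanging
      });
    } catch (err) {
      lastError = err;
      console.warn(`cannot reach ${host}, trying the next one`);
    }
  }
  throw lastError;
}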