How to monitor a POP, SMTP and Exchange Server for mail activity

We need to write a .NET (C#) application that monitors all mail activity through a POP, SMTP and Exchange Server (2007 and later) and essentially grabs the mail for archiving into a document management system. I realise that the way to monitor each type of server will probably be different, so I'd like to know the best (most elegant and reliable) way to achieve this.
Thanks.

Many countries have rather narrow regulations for what such a system must do, and what it must not do, in order to be in compliance with the law. If you are developing a product for a company in SA that wants to sell it internationally, I would suggest that you need a more targeted approach.
Depending on the legal framework, your solution will have to intercept and archive all emails, or just a subset.
For instance, some countries do not allow the company to store private emails of employees, in which case the archival process needs to be configurable with rules that the employee can control.
If the intent is to archive each and every email, then the network-level approach that Jimmy Chandra suggested is better, because it is easier to deploy.

I don't think you need to worry about POP, right? It isn't used for sending mail (unless you need to monitor access to emails too).
Regarding Exchange, versions 2000 onwards have journaling support (I don't know about earlier ones), so a copy of each mail is placed in a designated mailbox as it is sent/received (there are several different options depending on the Exchange version; check it out). Then you can read that mailbox, or set a rule to forward its contents to an external SMTP server that your app listens to.
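For the "read that mailbox" part, here is a minimal sketch using the EWS Managed API, which covers Exchange 2007 SP1 and later (the journal mailbox name, credentials, and EWS URL below are hypothetical placeholders, and error handling is omitted):

    using System;
    using Microsoft.Exchange.WebServices.Data;

    var service = new ExchangeService(ExchangeVersion.Exchange2007_SP1)
    {
        // Hypothetical journal-mailbox credentials and endpoint.
        Credentials = new WebCredentials("journal-reader", "password", "example.local"),
        Url = new Uri("https://mail.example.local/EWS/Exchange.asmx")
    };

    // Pull the newest items from the journal mailbox's inbox.
    FindItemsResults<Item> page = service.FindItems(WellKnownFolderName.Inbox, new ItemView(50));
    foreach (Item item in page)
    {
        item.Load(); // fetches subject, body, attachments, etc.
        Console.WriteLine($"Archiving: {item.Subject}");
        // ... hand the item off to the document management system here ...
    }

In practice you would run this on a schedule (or use EWS notifications) and delete or move items after archiving so the journal mailbox doesn't grow without bound.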
For other SMTP servers, you could get a similar result with forwarding rules and the like, and some may have custom journaling support as Exchange does.


They have won. I cannot use a custom SMTP server for my user registration

I have run my own SMTP server with exim4 for a while to send ecommerce registration and transaction confirmations.
I followed all the good rules: SPF, DKIM, DMARC, SMTP encryption. Google's Postmaster Tools said the IP has a good reputation and there is no spam traffic.
I reached an antispam test result of 10/10 using https://www.mail-tester.com/ and similar tools.
But none of it was enough: my ecommerce registration messages are sent to spam by Gmail, and it is the same with some other well-known providers. I have come to understand there is nothing I can do about it.
So, what is a good solution for user registration emails for an ecommerce site? How can I reduce the number of messages delivered to the spam folder?
I'm afraid you haven't provided enough information to identify the problem.
Hundreds of factors can contribute to deliverability outcomes; it's not as simple as setting up authentication, reverse DNS, etc. In addition, no free mail tester can accurately tell you how your deliverability will be at Gmail, Outlook, Yahoo, etc., because mail testers don't have the same data those ISPs use to make spam filtering decisions.
If you want more reliable deliverability, it's generally easiest to use a transactional email service to send email, rather than trying to run an SMTP server yourself. There are occasional exceptions to this, but because most senders will have an easier time using one of these services, it's almost always the first thing to try.
(How do email services provide better deliverability? A number of ways, but I think the biggest difference is that they can more closely manage the email sending reputation of their IP ranges. Having anti-spam systems built into the service enables them to resolve problems that much faster, compared to hosting providers, which tend to have fewer tools and less data available to stop spammers, so their IP ranges' sending reputation is usually worse.)
Here is something you can try that may help.
Use a third-party delivery provider (AWS SES, SendGrid, Mailgun) with a good IP reputation; see the sketch after these suggestions. Here's a list of places where you can check their IP reputation: https://www.helloinbox.email/#reputation (Talos and Barracuda).
Use a subdomain to send transactional emails (e.g. email.example.com).
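Combining the two suggestions, here is a minimal sketch of sending through SES's SMTP interface from .NET (the endpoint is region-specific, and the credentials and addresses below are placeholders):

    using System.Net;
    using System.Net.Mail;

    // Hypothetical SES SMTP credentials; the host varies by AWS region.
    var client = new SmtpClient("email-smtp.us-east-1.amazonaws.com", 587)
    {
        EnableSsl = true,
        Credentials = new NetworkCredential("SES_SMTP_USER", "SES_SMTP_PASSWORD")
    };

    // from (verified subdomain sender), to, subject, body
    var message = new MailMessage(
        "no-reply@email.example.com",
        "customer@example.org",
        "Confirm your registration",
        "Click the link below to confirm your account...");

    client.Send(message);

You would still need to verify the sending subdomain with the provider and publish the SPF/DKIM records it gives you, so the provider's reputation works in your favour.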
Let me know if that doesn't work.

Which hook to limit the number of messages a user can send per day?

We want to use ejabberd in the context of a web application with fairly unique business rules. We'd therefore need every chat message (not a protocol message, but a message one user sends to another) to go through our web application first, and then have the web application deliver the message to ejabberd on behalf of the user (if our business rules allow the message to be sent).
The web application is also the one providing the contact lists (called rosters in ejabberd, if I understand correctly). We need to be, and remain, the single source of truth to ease maintenance.
To us, ejabberd's added value would be delivering chat messages in near real-time to clients and enabling cool things such as presence indicators. Web clients will maintain a direct connection to ejabberd through WebSocket, but this connection will have to be read-only as far as chat messages are concerned, and read-write as far as presence messages are concerned.
The situation is similar with regard to audio and video calls. While this time the call per se will be managed directly by ejabberd, to take advantage of built-in STUN, TURN, etc., and will not need to go through our web app, we have custom business logic to manage who is able to call whom, when, how often, etc. (so, in other words, we have custom business logic to authorize the call or not, and we'd like to keep all the business logic centralized in the web app).
My question is: what are the proper hooks we'd need to look into to achieve what we are after? I spent an hour or so in the documentation, but I couldn't find what I am after, so hopefully someone can provide me pointers. In an ideal world, we'd like to expose API endpoints from our web app that ejabberd hooks can hit. However, the first question is: which relevant hooks does ejabberd offer, and where are these hooks documented?
Any help would be greatly appreciated, thank you!
When a client sends a packet to ejabberd, it triggers the user_send_packet hook, providing the packet and the state of the client's session process. Several modules use that hook, for example mod_service_log.
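If you take that route, the other half is the web-app side: a custom module's user_send_packet callback can call out to your application and drop or relay the stanza based on the reply. Below is a minimal sketch of such an authorization endpoint in ASP.NET Core (the route, payload shape, and quota rule are all hypothetical, not part of any ejabberd API):

    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    // Hypothetical endpoint a custom ejabberd module (registered on the
    // user_send_packet hook) could POST each chat stanza to before delivery.
    app.MapPost("/chat/authorize", (ChatMessage msg) =>
    {
        // Stand-in for the real business rules (per-day quota, blocklists, ...).
        bool allowed = msg.Body.Length <= 2000;
        return Results.Json(new { allow = allowed });
    });

    app.Run();

    record ChatMessage(string From, string To, string Body);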

In which domains are message oriented middleware like AMQP useful?

What problems does MOM (message-oriented middleware) solve? Scalability? Integration?
In which domains are they typically used, and in which domains are they typically not used?
For example, say, is Google using such a solution for its main search engine or to power Gmail?
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Does it make any sense when you're writing a webapp where you control the server side and have a homogeneous environment (say, tens of Amazon EC2 instances, all running Linux + Java JVMs) and where the clients are, well, web browsers?
Does it make sense for desktop apps that need to communicate with a server?
Or is it 'only' for big enterprise stuff, where you typically have a happy mix of countless different systems that need to communicate one way or another?
I'm a bit confused as to what they're useful for, and I think that with examples of where they're appropriate and where they're not, I could better understand their use.
This is a great question.
The main uses of messaging are: scaling, offloading work, integration, monitoring, event handling, routing, networking, push, mobility, buffering, queueing, task sharing, alerts, management, logging, batch, data delivery, pubsub, multicast, audit, scheduling, ... and more. Basically: anything where you need data but don't want to make a database request. (Caching is another, longer story).
Another way of looking at this is to notice that many applications used to be built by assuming that users (people) would perform actions that would be fulfilled by executing a transaction on a database (including reads, writes). But today, many actions are not user-initiated. Instead they are application-initiated. For example "tell me when the book that I want to buy is in stock". The best way to solve this class of problems is with messaging of some sort. Whether you call it middleware or web push or real time salad dressing does not matter. It's all messaging.
When you enable applications to initiate or react to events, then it is much easier to scale because your architecture can be based on loosely coupled components. It is also much easier to integrate those components if your messaging is based on a stable, scalable, serviceable tool, preferably using open standard APIs and protocols.
I hope this helps. We try to maintain a list of useful links about messaging here.
Please get in touch with questions and comments on any of this; we are dead easy to find.
To address your specific questions:
In which domains are they typically used, and in which domains are they typically not used?
Like databases, messaging systems crop up everywhere.
For example, say, is Google using such a solution for its main search engine or to power Gmail?
Google uses a lot of home-grown technology, but a lot of their open-source contributions and known use cases suggest that messaging is (or should be) central to some of their main services.
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Very much so.
An example use case is scaling web page requests. When the user makes a web request, the web server puts it onto a queue for background processing. This means that the web server can keep working while the request is processed. It also means that the web server does not need to know how the request is handled, making system maintenance, upgrade and rollback much simpler because the main parts are 'decoupled'.
So, anyway, the web request gets processed by a back end service, or possibly by many services, eg 'look up book titles', 'draw shopping cart', 'get advertisement', 'check user account'... Finally all the results get put onto another queue, ready for collection and user response by the web server. Typically the system will include a timeout of around 100ms so that any late requests just get thrown away. The user sees anything that got processed in the time interval. This is one reason why some large ecommerce sites have pages that appear to load in stages.
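As a rough illustration of that queue-offloading pattern with the RabbitMQ .NET client (the queue name and payload are hypothetical; in practice the publisher and the worker would be separate processes):

    using System;
    using System.Text;
    using RabbitMQ.Client;
    using RabbitMQ.Client.Events;

    var factory = new ConnectionFactory { HostName = "localhost" }; // hypothetical broker
    using var connection = factory.CreateConnection();
    using var channel = connection.CreateModel();

    channel.QueueDeclare(queue: "web-requests", durable: true, exclusive: false, autoDelete: false);

    // Web server side: enqueue the request and return to the user immediately.
    var payload = Encoding.UTF8.GetBytes("{\"action\":\"lookup-book-titles\",\"user\":42}");
    channel.BasicPublish(exchange: "", routingKey: "web-requests", basicProperties: null, body: payload);

    // Back-end worker side: consume and process requests as they arrive.
    var consumer = new EventingBasicConsumer(channel);
    consumer.Received += (_, ea) =>
    {
        Console.WriteLine($"Processing: {Encoding.UTF8.GetString(ea.Body.ToArray())}");
    };
    channel.BasicConsume(queue: "web-requests", autoAck: true, consumer: consumer);

The point is the decoupling: the publisher never learns who handles the request, so workers can be added, upgraded, or rolled back independently.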
There are many more use cases...
Does it make any sense when you're writing a webapp where you control the server side and have a homogeneous environment (say, tens of Amazon EC2 instances, all running Linux + Java JVMs) and where the clients are, well, web browsers?
Definitely. If you have an unknown, or unbounded, number of users, server side instances, and application latencies, then it makes sense to use messaging, even if just as a scalable substrate for non-blocking RPC.
Does it make sense for desktop apps that need to communicate with a server?
In lots of cases. One very common case is when the server pushes events to the desktop app, e.g. game events, tweets, price feeds in finance, system alerts...
Or is it 'only' for big enterprise stuff, where you typically have a happy mix of countless different systems that need to communicate one way or another?
Definitely not only for those 'legacy integration' cases but they are important too. At RabbitMQ, the biggest customers we have in terms of pure scale or message volume are cloud providers and big web application providers.
I will give just one answer, from prior experience: take a look at the middleware employed by big companies. Middleware has one purpose: to glue disconnected systems (written in disparate languages) together so that they can interact with one another and streamline the business process. Entera, which I have experience with, creates a middle layer in which a Unix box running processes written in C interacts with the mainframe system (DB2, COBOL) via a front-end written in PowerBuilder (I am not naming the company!).
From the description I have given, Entera is middleware that handles a number of things: smooth integration of the flow of data regardless of endian format, and the ability for different languages to talk to the middleware broker (a broker is a CORBA- or DCE-like process, conforming to The Open Group standards, that listens on a particular port) specified by an IDL, which makes a remote process appear to be local. If you understand the terminology used in Remoting under Microsoft's .NET Framework, you are not far off the mark! The middleware generates stubs which are linked at compile time, and it manages the creation of the process, hosting it off a port, and multi-threading at run-time. Modern front-ends (such as .NET, Java, PowerBuilder, even the unspeakable VB6... ok, VB.NET for the purists out there) can interact with it by opening a connection to the specified port on a particular IP address and, using the generated stubs, interact with it directly.
Obviously, from what was described, you can see how legacy systems can have new life breathed into them, and thus the process gains scalability. The major downside is the cost factor, which can run into thousands of dollars. Big companies who use mainframes as their back-end processing systems for billing/invoicing, and who generate huge revenue, can obviously afford such an expensive product; to them it would seem like throwing pennies into a pool of water, because middleware prolongs the business process, breathes new life into it, and can extend the business a good number of years into the future without it worrying about the 'legacy' tag attached to it.
Incidentally, I carried this out as part of my thesis for my BSc in Information Systems, which covered this commercial front-end. There was an open-source version of the middleware available on SourceForge called FreeDCE, but development efforts have declined or stopped.
Edit:
@cocotwo: That is exactly what middleware does; as you said, it is a plumbing tool. Message-oriented middleware is not really heard of, AFAIK, because I would imagine the processes (functions) would need to be called as if they were locally visible within the application domain of the front-end, to make them easy to interact with.
Using messages may have advantages over RPC calls in that the messages are queued in a safe holding area in the event that a network disconnection occurs; there may be some data caching going on within that aspect to allow the front-end to continue regardless. It would be useful in instances like 'updating the status of a particular billing/invoice number': a one-way write of data to the back-end via the middleware.
OK, big companies have advanced systems infrastructure, with technicians around the clock to ensure smooth delivery of the data flow, so that would have to be factored in. The company I worked with had an IBM Global Support contract to fulfill in order to ensure maximum uptime of 99% with six nines after the decimal point, with hot-swapping, balanced clusters, and mirroring systems in place.
Whereas with RPC, if a disconnection occurs, the front-end would have to be restarted, or would have to handle the disconnection event. It really depends whether the message-queueing middleware handles each message in real time and passes results back to the front-end immediately.
This is where each (message-queueing and RPC-related middleware) has its strengths and weaknesses, along with the cost-mitigation factors such as support, maximum uptime, development effort, and training. That last one is a biggie here, as middleware is really proprietary (despite following The Open Group layouts/standards) and complex to set up and glue together via scripts.
Good answers and discussion here. Our consulting team has two preferred "messaging" solutions: RabbitMQ, and NXTera, a high-speed RPC middleware that is the contemporary version of the Entera mentioned above. My partners and I have developed several solutions using RabbitMQ; it is the best tool available in that space right now. Additionally, I happen to work for the company that makes NXTera/Entera.
From experience I can clearly say that both of these products meet the need for reliability and low maintenance discussed above. There are situations where a messaging service like RabbitMQ is the right choice: where publish/subscribe, certified delivery, queuing, or store-and-forward are required.
In other cases, RPCs (remote procedure calls) are the best and fastest solution for transactional and distributed processing for enterprise or cloud-based applications. When it is right to use an RPC, but SOAP/.NET (yes, these are RPC implementations) is too slow, expensive, or complex, a lightweight, high-speed RPC middleware like NXTera/Entera is the right choice for us.
There is some use-case overlap between RPC middleware and message-oriented middleware, and where there is, you can use either successfully. But both are strong and dependable choices.
The large companies I work with use both RPC and MoM side by side. As far as Internet companies go, Google (Protocol Buffers) and Facebook (Thrift) show that RPCs have a role to play in modern web and cloud-based development.

What is the Royal Mail's PAF Address Database?

I'm struggling to understand what you would get from the Royal Mail if you bought their PAF file dataset of UK addresses.
I was expecting that PAF was some form of database which you would host yourself, and the Royal Mail provide APIs into that database.
However, after reading this, I'm presuming that all you get is a series of files containing the data. I can't find any obvious information regarding an API.
Are there any libraries available to help you handle these files, especially from Java?
Do you have to parse the file yourself and stick it in your own database, so you can do quick lookups from an application?
If all this is true, why would you ever bother buying this off the Royal Mail? Aren't all the third-party providers, with their web-based APIs, just far simpler to use, in terms of both programming and data maintenance?
Apologies if I've missed the obvious on this one, but I find the Royal Mail site lacking in information. I'm beginning to think that I've misunderstood their PAF file offering.
The Postcode Address File (PAF) is a set of data files provided by Royal Mail that contain all addresses in the UK. My understanding is that it's normally updated every three months.
I'm aware of two companies that have products that supply APIs into the PAF data: QAS and Capscan. With these you're able to search addresses to find missing postcodes or vice versa. APIs include both web-based solutions and native calls.
Why would you buy direct from Royal Mail? Because you want to write your own query tools rather than rely on third-party products, or because you want to do data-mining that other products can't provide.
Could you import it into a SQL database? Yes, but only after you'd written your own PAF file parser.
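To give an idea of what that parser involves: the raw PAF records are fixed-width, so parsing is mostly slicing columns out of each line and bulk-loading them. A rough sketch in C# (the file name and field offsets below are made up; the real record layout is documented in Royal Mail's PAF programmer documentation):

    using System;
    using System.IO;

    // Hypothetical fixed-width offsets; consult the official PAF record
    // layout before using anything like this for real.
    foreach (string line in File.ReadLines("paf-main-file.dat"))
    {
        string postcode     = line.Substring(0, 7).Trim();
        string thoroughfare = line.Substring(7, 30).Trim();
        string buildingNo   = line.Substring(37, 4).Trim();

        // ... accumulate rows and bulk-insert, e.g. with SqlBulkCopy ...
        Console.WriteLine($"{postcode} | {buildingNo} {thoroughfare}");
    }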
Why use these over web-based tools? Because you're sitting behind an intranet, have limited internet access from your servers, face restrictive licensing from any web-based solution, etc.
It's all in Wikipedia:
http://en.wikipedia.org/wiki/Postcode_Address_File
Check out www.PostcodeAnywhere.co.uk, a web-service-based lookup site. A desktop lookup app is also available. The decision is likely to come down to lookup volume, ease of use, costs, etc., but for low-to-medium volumes it offers simple implementation in a few minutes with 'automatic' maintenance built in.
I've subsequently found this page where you can order a sample data set. It states:
Please be aware that Raw Data contains no software and the data must be processed for use in IT applications. If you do not wish to program PAF or Postzon then we can supply it to you in a pre-written application known as UK Addresses on CD
The UK Addresses on CD page goes on about something called "UK Addresses Utilities", and it states:
The UK Addresses CD also contains a set of dynamic link libraries and provides the ability to interrogate the address datasets programmatically through a .NET 2.0(+) DLL.
I have written something in C# that can parse these files into SQL Server:
https://github.com/Telexx/Royal-Mail-PAF-Parser/

How to implement a single sign-on authentication server?

I want to implement a discrete remote authentication server that handles login for many sites. Somewhat similar to OpenID.
Basically, I have site-1 and site-2 and they're both reliant on the same user database, which is on a separate auth-site. So, auth-site handles user authentication for them, and during this process, makes information on the authenticating user available to the requesting system.
Each site can be on a completely separate domain name, on completely separate machines.
This is all via HTTP(S), there can be no direct database access.
There's one last quirk: once a user has logged in to site-1, when accessing any other site reliant on auth-site, that site must treat the user as already authenticated.
This whole business must be entirely fuss-free to the end-user. It should work like a simple everyday login form.
As a concrete example, say we're talking about stackoverflow.com and serverfault.com, and they both authenticate via authentic-overflow-server-stack.com. Again, once logged in to either site, I can go to the other and do my business without logging in again.
What I'd like to know is the general interaction mechanism between the sites in this scenario.
In my particular setup, I'm using Rails, but I'm not looking for code[1], just general best practice and guidance, so feel free to answer in pseudo-code or any generally readable language. OTOH, bear in mind that I'll have decent MVC, REST, and meta-programming in my toolkit.
[1]: unless you happen to know an existing tiny neat free MIT/BSD-licensed app/plugin/generator that handles this.
It sounds like (especially with the emphasis on fuss-free) you want something like what the Wikimedia Foundation is doing. Basically, you log on to en.wikipedia.org, then that server communicates with other servers (e.g. en.wikinews.org) and gets authentication tokens. Those tokens are embedded into images, e.g. http://en.wikinews.org/wiki/Special:AutoLogin?token=xxxxxxxxxxxxxxx , and when your browser visits that URL (as an img src) it gets an authentication cookie for Wikinews. Of course, the source code is available for your review at http://www.mediawiki.org/wiki/Extension:CentralAuth .
OpenID is also a good choice, but it does require that the user "consciously" visit two domains. An example of one entity with two domains doing this is Canonical. E.g., if you go to https://help.ubuntu.com/community/UserPreferences they will redirect you to Launchpad (https://login.launchpad.net/+openid) for authentication.
Note that Wikipedia is doing this over HTTP, but you can do it all over HTTPS to ensure the img src tokens aren't intercepted.
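To make the img-src flow concrete, here is a rough sketch of what the auto-login endpoint on the second site might look like (ASP.NET Core; the token store, cookie, and route are placeholders, not how MediaWiki's CentralAuth is actually implemented):

    using System.Collections.Concurrent;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Http;

    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    // Placeholder for one-time tokens minted by the auth site and shared
    // out of band (e.g. via a signed server-to-server request).
    var pendingTokens = new ConcurrentDictionary<string, string>(); // token -> username

    // site-1 embeds <img src="https://site-2.example/auto-login?token=...">;
    // when the browser fetches it, site-2 redeems the token and sets its own cookie.
    app.MapGet("/auto-login", (string token, HttpContext ctx) =>
    {
        if (!pendingTokens.TryRemove(token, out var user))
            return Results.Unauthorized();

        ctx.Response.Cookies.Append("session", user, new CookieOptions
        {
            HttpOnly = true,
            Secure = true // serve over HTTPS so tokens and cookies aren't intercepted
        });
        return Results.Ok();
    });

    app.Run();

Removing the token on first use (TryRemove) is what makes it one-time: a replayed img request fails.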
Looks like CAS is good enough for me, and has ruby implementations, along with dozens of other lesser languages, e.g. one that rhymes with femoral bone rage.
http://code.google.com/p/rubycas-server/
http://code.google.com/p/rubycas-client/
It sounds like you actually want to use the OpenID protocol itself. There's no reason you can't restrict the authentication provider to only your own server and take some shortcuts that make the authentication process transparent. Also, the OpenID protocol supports what you describe: logging in to one service implies logging in to all of them.