Bulk email download from outlook365 - exchangewebservices

I am trying to download/access on daily basis, all the emails exchanged over outlook365 by employees of an organization, who obviously uses outlook365. After download finishes I'll be running some background jobs on these emails.
I've option of doing this via EWS APIs, but the throttling policies are turning out to be pain and affecting the predictability of the system, because of throttling policies. Daily no of emails to be accessed could range from 0.1- 1 million or above.
I am exploring upcoming graph as of now, to see if it helps solving this. I also have another way out by routing these emails to lets say AWS SES or apache james and accessing/downloading from there, thus avoiding throttling all together. But I am trying to avoid additional servers in deployment as of now.
My question -
Has anybody experienced this issue and what was if at all any reliable way around while using outlook supported email APIs?

I've option of doing this via EWS APIs, but the throttling policies are turning out to be pain and affecting the predictability of the system, because of throttling policies. Daily no of emails to be accessed could range from 0.1- 1 million or above.
Inefficiency code is more likely the cause of the throttling then blaming the API (eg if your not using batching, requesting more properties then you need etc) so the first thing you should do make sure you optimize the code as all the client based API's are throttling similarly. 0.1- 1 million over the time-span of day isn't that many emails to process in my experience with EWS especially if your using Impersonation where throttling cost would be dispersed across the mailboxes your accessing.

Related

Can events written via EWS API not make it to the mailbox, despite being accepted successfully by the API?

I have an application which writes lots (millions) of calendar entries to mailboxes for large organisations. Occasionally, Office 365 EWS API will accept a batch of entries, return success (not error) codes, and the entries fail to make into mailboxes.
Microsoft support don't (appear to) have a public-facing API support team so the usual Microsoft support routes just say either "third-party application" or "we don't have a support team you can speak to" ... so I'm a bit stuck. This does not appear to be a failure of the app, as I can see from the trace that it writes successfully and is given a change key back. And this only happens over a short period of time (say, all writes in a 30 min window have this problem).
I'm a bit stuck as to where to go here, as there's no error, just occasional and undesirable behaviour. It could even not be the API that's at fault, and could be just a sync error between EWS and mailbox stores. But, as it's Office 365, I can't see this.
Application is .Net 3.5 if it helps; very stable installs, runs fine for years, just occasionally has this problem... with just one customer...
I'm a bit stuck as to where to go here, as there's no error, just occasional and undesirable behaviour. It could even not be the API that's at fault, and could be just a sync error between EWS and mailbox stores. But, as it's Office 365, I can't see this.
EWS is just an API to access the Mail Store there is no sync involved and no cache, if your getting an ItemId returned then it must at some point have been written to the Exchange Store. DAG's https://learn.microsoft.com/en-us/exchange/high-availability/database-availability-groups/database-availability-groups?view=exchserver-2019#:~:text=A%20DAG%20is%20a%20group,affect%20individual%20servers%20or%20databases.&text=For%20example%2C%20you%20can't,servers%20in%20the%20same%20DAG. are Exchanges way of ensuring redundancy.
How are you determining that the appointments aren't in the Mailbox (or haven't been deleted or modified by another mail client). Most likely it will be another mail client (eg the IOS client has been cause of many issue of the past years). I would suggest calendar logging or auditing maybe be useful to see what might be happening if you can reproduce the issue.

Any known limits to calls per minute for webapps?

Im getting inconsistent behaviours when sending POST requests to a google apps script deployed as webapp.
I have a desktop app sending POST calls to a GAS webapp. This calls may be totally variable in their cadence, from 1 in several minutes, to bursts of dozens per second.
In my tests I have found requests seemingly lost, requests that don't progress along the webapp internal logic flow (like script instances that get cut or interrupeted (?)), while others work flawlessly. There is no evident pattern.
However, trying things around, I found that if I space the calls, adding a pause between requests, everything normalizes.
Are there stablished and known limits for this? The only option I have to solve this is to introduce this artificial intervals between calls? I have not found information on this in the GAS quotas page.
Any help and ideas would be appreciated.
Confirming in the answer: there is no evident or documented per-minute limit on the number of requests to a GAS webapp.
The issue I'm experimenting is related to concurrency. Even when they are coming from the same source, fast paced requests can produce concurrency issues when accesing storage services like Cache or Properties.
This should be handled using the Lock Service.

Salesforce: Google maps query status 620 G_GEO_TOO_MANY_QUERIES

In Salesforce I have created a future method that makes a Google Maps geocode callout. I know there is a limit of 2,500 requests per day but our instance has made no more than 100 requests today. Maybe the same number of requests yesterday. Yet the response code is 620 G_GEO_TOO_MANY_QUERIES.
Could it be that Google is seeing the IP address of the instance of Salesforce and aggregating all of these requests as coming from one location. So other companies that are sharing the address are causing my instance to hit this limit?
If not can anyone suggest another cause?
This discussion suggests that it is the shared origin from salesforce that is messing it up.
This only makes sense though if you are doing the geocode lookup from the server and not from a client. If you would do it fromt the client it would use the ip from the client and you are dealing with the local clients lookup limits (see below).
If you are doing it on the server you might also have to check if you are actually doing something that is legal and not breaking the ToS from google. In this discussion you will get some background on that and also a solution if you need to fix this on the server (buy a licence)
To be complete the G_GEO_TOO_MANY_QUERIES can mean one of 2 things:
You exceeded the daily limit (too many in a day)
You exceeded the speed limit (too many request is too short period)
Google has not specified an exact limit as far as I know but they don't seem to enjoy automated lookups. If you look at various libraries and plugins you'll see that all of them force a delay between each request and often they add a little randomness to the delay. You could experiment with this if it makes any difference
Did this end up working for you? we're hitting a server-side 620 and all configuration looks a.o.k... we have a premier license and upgraded to 250k requests per day.

Use of messaging like RabbitMQ in web application?

I would like to learn what are the scenarios/usecases/ where messaging like RabbitMQ can help consumer web applications.
Are there any specific resources to learn from?
What web applications currently are making use of such messaging schemes and how?
In general, a message bus (such as RabbitMQ, but not limited to) allows for a reliable queue of job processing.
What this means to you in terms of a web application is the ability to scale your app as demand grows and to keep your UI quick and responsive.
Instead of forcing the user to wait while a job is processed they can request a job to be processed (for example, clicking a button on a web page to begin transcoding a video file on your server) which sends a message to your bus, let's the backend service pick it up when it's turn in the queue comes up, and maybe notify the user that work has/will begin. You can then return control to the UI, so the user can continue working with the application.
In this situation, your web interface does zero heavy lifting, instead just giving the user visibility into stages of the process as you see fit (for example, the job could incrementally update database records with the state of process which you can query and display to your user).
I would assume that any web application that experiences any kind of considerable traffic would have this type of infrastructure. While there are downsides (network glitches could potentially disrupt message delivery, more complex infrastructure, etc.) the advantages of scaling your backend become increasingly evident. If you're using cloud services this type of infrastructure makes it trivial to add additional message handlers to process your jobs by subscribing to the job queue and just picking off messages to process.
I just did a Google search and came up with the following:
Reddit.com
Digg.com
Poppen.De
That should get you started, at least.

In which domains are message oriented middleware like AMQP useful?

What problem do MOM (Message Oriented Middleware) solve? Scalability? Integration?
In which domain are they typically used and in which domains are they typically not used?
For example, say, is Google using such solution for it's main search engine or to power GMail?
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Does it make sense for desktop apps that need to communicate with a server?
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
I'm a bit confused as to what they're useful for and I think that with example of where they're appropriate and where they're not appropriate I could better understand their use.
This is a great question.
The main uses of messaging are: scaling, offloading work, integration, monitoring, event handling, routing, networking, push, mobility, buffering, queueing, task sharing, alerts, management, logging, batch, data delivery, pubsub, multicast, audit, scheduling, ... and more. Basically: anything where you need data but don't want to make a database request. (Caching is another, longer story).
Another way of looking at this is to notice that many applications used to be built by assuming that users (people) would perform actions that would be fulfilled by executing a transaction on a database (including reads, writes). But today, many actions are not user-initiated. Instead they are application-initiated. For example "tell me when the book that I want to buy is in stock". The best way to solve this class of problems is with messaging of some sort. Whether you call it middleware or web push or real time salad dressing does not matter. It's all messaging.
When you enable applications to initiate or react to events, then it is much easier to scale because your architecture can be based on loosely coupled components. It is also much easier to integrate those components if your messaging is based on a stable, scalable, serviceable tool, preferably using open standard APIs and protocols.
I hope this helps. We try to maintain a list of useful links about messaging here
Please get in touch with questions and comments on any of this, we are dead easy to find.
To address your specific questions:
In which domain are they typically used and in which domains are they typically not used?
Like databases, messaging systems crop up everywhere.
For example, say, is Google using such solution for it's main search engine or to power GMail?
Google uses a lot of home grown technology, but a lot of their open source contributions and known use cases suggest that messaging is (or should be) central to some of the main services.
What about big websites like Walmart, eBay, FedEx (pretty much a Java shop) and buy.com (pretty much an MS shop)? Does MOM solve a need there?
Very much so.
An example use case is scaling web page requests. When the user makes a web request, the web server puts it onto a queue for background processing. This means that the web server can keep working while the request is processed. It also means that the web server does not need to know how the request is handled, making system maintenance, upgrade and rollback much simpler because the main parts are 'decoupled'.
So, anyway, the web request gets processed by a back end service, or possibly by many services, eg 'look up book titles', 'draw shopping cart', 'get advertisement', 'check user account'... Finally all the results get put onto another queue, ready for collection and user response by the web server. Typically the system will include a timeout of around 100ms so that any late requests just get thrown away. The user sees anything that got processed in the time interval. This is one reason why some large ecommerce sites have pages that appear to load in stages.
There are many more use cases...
Does it make any sense when you're writing a Webapp where you control the server-side and have an homogenous environment (say tens of Amazon EC2 instances all running Linux + Java JVMs) there and where the clients are, well, Web browsers?
Definitely. If you have an unknown, or unbounded, number of users, server side instances, and application latencies, then it makes sense to use messaging, even if just as a scalable substrate for non-blocking RPC.
Does it make sense for desktop apps that need to communicate with a server?
In lots of cases. One very common case is when the server pushes events to the desktop app, eg game event, tweets, price feeds in finance, system alerts....
Or is it 'only' for big enterprise stuff where you typically have a happy mix of countless of different systems that needs to communicate in a way or another?
Definitely not only for those 'legacy integration' cases but they are important too. At RabbitMQ, the biggest customers we have in terms of pure scale or message volume are cloud providers and big web application providers.
I will answer only one answer, from prior experience - take a look at this middle-ware that is employed by big companies here - middle-ware has one purpose - to glue dis-connected systems (written in disparate languages) together so that they can interact with one another and streamline the business process - Entera as I have had experience with, creates a middle layer in which the unix box using processes written in C, interact with the mainframe system (DB2, COBOL) via a front-end written in PowerBuilder (I am not naming the company!).
From the description I have given, Entera is a middle-ware which hosts a number of things - smooth integration of the flow of data regardless of the endian format, ability for different languages to talk to the middle-ware broker (a broker is a CORBA or DCE like process, that conforms to 'The Open Group) that listens on a particular port) and is specified by an IDL which makes a process appear to be local - if you understand the terminology used in Remoting under Microsoft's .NET Framework, you are not far off the mark! The middle-ware generates stubs which are linked at compile-time and manages the creation of the process, hosting it off a port, multi-threading at run-time, and also, the modern front-ends (such as .NET, Java, PowerBuilder even the unspeakable VB6...ok...VB.NET for the purists out there) can interact by opening a connection to the specified port on a particular IP address, and using the stubs generated, can interact with it directly.
Obviously, from what was described you can see how the legacy systems can have new life breathed into it and thus scalability of the process, the major downside of this is the cost factor which can run into thousdands of dollars. Big companies who uses mainframes as their back-end processing systems for billing/invoicing, who generate a huge revenue can obviously afford such an expensive product - to them it would seem like throwing pennies into a pool of water...because of the use of middle-ware which prolongs the business process, and breathe new life into it, can extend the business by a good number of years into the future without worrying about 'legacy' tag attached to it.
Incidentally, I carried this out as part of my thesis for my BSc. in Information Systems which covered this commercial front-end. There was an open source version of the middle-ware available on sourceforge called FreeDCE, but development efforts have declined or stopped.
Edit:
#cocotwo: That is exactly what middle-ware does as you said it is a plumbing tool...message oriented middle-ware is not really heard of AFAIK because I would imagine, the processes (functions) would need to be called as if they are locally visible within the application domain of the front-end to make it easy to interact with.
Using messages may have its advantages over RPC calls in that the messages are queued in a safe-holding area in the event that a network disconnection occurs - there may be some data caching going on within that aspect to allow the front-end to continue regardless...it would be useful in the instances of 'updating a status of a particular billing/invoice number' - a one-way write-data to the back-end via the middle-ware.
Ok, big companies would have advanced systems infrastructure in that technicians are constantly around the clock to ensure a smooth delivery of data-flow so that would have to be factored in. The company that I worked with had IBM Global Support contract to fulfill in order to ensure a maximum uptime 99% with 6 nine's after the decimal point...with hot-swapping/balanced-clusters/mirroring systems in place...
Whereas with RPC, if the disconnection occurs, the front-end would have to be restarted or would have to handle the disconnection event. It really depends if the message-queueing middle-ware handles each message in real-time and pass back results to the front-end immediately...
This is where each (Message-queueing and RPC related middle-ware) have their strengths and weaknesses...and also the cost mitigation factor such as support, maximum up-time, development efforts and training - that's a biggie here as middle-ware are really proprietary (despite following the 'The Open Group' layout/standards) and complex to setup and to glue the whole thing together via scripts.
Good answers and discussion here. Our consulting team has two preferred "messaging" solutions: RabittMQ and NXTera a high speed RPC middleware, the contemporary version of Entera mentioned above. My partners and I have developed several solutions using RabittMQ, it is the best tool available in that space right now. Additionally, I happen to work for the company that makes NXTera/Entera.
From experience I can clearly say that both of these products meet the need for reliability and low maintenance as discussed above. There are situations where a messaging service, like RabittMQ, is the right choice -- where Publish and subscribe, certified delivery, Queuing or store-and-forward are required.
In other cases, RPC's (remote procedure calls) are the best and fastest solutions for transactional and distributed processing for enterprise or cloud-based applications. When it is right to use an RPC, but SOAP/.NET (yes these are RPC implementations) are too slow, expensive or complex, a lightwieght high speed RPC middleware like NXTera/Entera is the right choice for us.
There is some use case overlap between RPC middleware and message oriented middleware, and where there are you can use either successfully. But both are strong and dependable choices.
The large companies I work with use both RPC and MoM side-by-side. As far as Internet companies, Google (Protocol Buffers) and Facebook (Thrift) show that RPC's have a roll to play in modern web and cloud-based development.