I've set up a solution that creates rapid fire PDF reports. Currently it seems I can't get Reporting Services to use all the resources it has available to it. The system doesn't appear to be IO bound, CPU bound, or memory bound. Any suggestions on trying to figure out why it's running so?
The application isn't network IO bound, and it is multi-threaded to 2 times the number of processors.
SQL Server Reporting Services limits the number of reports run to 2 simultaneous ad-hoc reports and 2 simultaneous web reports. This is a hard limit imposed by the server.
Robin Day is probably right, however if you are using a processor that supports hyper threading you may get a performance benefit by turning this off in the BIOS. You can try an a A/B performance test.
You could also check the SQL instance (when you say reporting service you mean SSRS right?) has not a got processor affinity set.
Is this a case of not using a multi threaded approach? Is the machine using 100% of one core of a processor and that's the bottleneck?
EDIT: Sorry for stating the obvious, was just an idea before you mentioned that it was already multi threaded. I'm afraid I can't offer any more suggestions.
Any suggestions on trying to figure out why it's running so?
a) There's an API to restrict a whole process to one CPU: test that using GetProcessAffinityMask.
b) 'Thread state' and 'Thread wait reason' are two of the performance counters ... maybe you can read this to see why threads, we you think ought to be running, aren't.
All the threads of your application are fighting for a single lock. Use a profiler to see if there is a congestion somewhere.
If you have four cores, that would explain why you see 25% overall CPU usage.
Maybe the server can't deliver more data over the network (so it's network IO bound)?
Related
We're trying to develop a Natural Language Processing application that has a user facing component. The user can call models through an API, and get the results back.
The models are pretrained using Keras with Theano. We use GPUs to speed up the training. However, prediction is still sped up significantly by using the GPU. Currently, we have a machine with two GPUs. However, at runtime (e.g. when running the user facing bits) there is a problem: multiple Python processes sharing the GPUs via CUDA does not seem to offer a parallelism speed up.
We're using nvidia-docker with libgpuarray (pygpu), Theano and Keras.
The GPUs are still mostly idle, but adding more Python workers does not speed up the process.
What is the preferred way of solving the problem of running GPU models behind an API? Ideally we'd utilize the existing GPUs more efficiently before buying new ones.
I can imagine that we want some sort of buffer before sending it off to the GPU, rather than requesting a lock for each HTTP call?
This is not an answer to your more general question, but rather an answer based on how I understand the scenario you described.
If someone has coded a system which uses a GPU for some computational task, they have (hopefully) taken the time to parallelize its execution so as to benefit from the full resources the GPU can offer, or something close to that.
That means that if you add a second similar task - even in parallel - the total amount of time to complete them should be similar to the amount of time to complete them serially, i.e. one after the other - since there are very little underutilized GPU resources for the second task to benefit from. In fact, it could even be the case that both tasks will be slower (if, say, they both somehow utilize the L2 cache a lot, and when running together they thrash it).
At any rate, when you want to improve performance, a good thing to do is profile your application - in this case, using the nvprof profiler or its nvvp frontend (the first link is the official documentation, the second link is a presentation).
We are writing a real-time game in EventMachine/Ruby. We're using ActiveRecord with MySQL for storing the game objects.
When we start the server we plan to load all the game objects into memory. This will allow for us to avoid any blocking/slow SQL queries with ActiveRecord.
However, we still need to persist the data in the database in case the server crashes, of course.
What are our options for doing so? I could use EM.Defer but I have no idea how many concurrent players that could handle since the thread pool is limited to 20.
Currently I'm thinking using Resque with Redis would be the best bet. Do everything with the objects in memory, and whenever there is a save that needs to occur for the database, fire off a job and add it to the Resque queue.
Any advice?
Threadpool size can be tweaked - see EventMachine.threadpool_size
Each server process (apache...) will spawn its own EventMachine reactor and its own EM.defer threadpool, so if you use a forking server (a mongrel farm, passenger, ...) you don't need to go crazy on the threadpool size
See EM-Synchrony by Ilja Grigorik (https://github.com/igrigorik/em-synchrony) - you should be able to simplify your code with it
Afaik, mysql has a non-blocking driver that you can use freely with EM, EM::Synchrony supports it http://www.igvita.com/2010/04/15/non-blocking-activerecord-rails/ - this would mean you don't need EM.defer at all!
Take a look at Thin - https://github.com/macournoyer/thin/ - it's non-blocking EM-based webserver that supports Rails
Having said all this, writing evented code is a bitch - forget about stack traces and make sure you're running benchmark tests often as anything blocking your reactor will block the entire application.
Also, this all applies to MRI Ruby ONLY. If you mean to use jruby... You're bound to get into trouble as thread-safety of eventmachine seems to be largely due to GIL of MRI Ruby and standard patterns don't work (many aspects of it can be made to work with this fork https://github.com/WebtehHR/eventmachine/tree/v1.0.3_w_fix which fixes some issues EM has with JRuby)
Unfortunately, guys from https://github.com/eventmachine/eventmachine are kind of not very active, the project currently has 200+ issues and almost 60 open pull requests which is why I've had to use a separate fork to continue playing with my current project - this still means EM is an awesome project just don't expect problems you encounter to be quickly fixed so do your best to not go out of the trodden path of EM use.
Another problem with JRuby is that EM::Synchrony imposes a heavy performance penalty because JRuby doesn't have implemented fibers as of 1.7.8 but rather maps them to Java native threads which are MUCH slower
Also, have you considered messaging with something like RabbitMQ (it has a synchronous https://github.com/ruby-amqp/bunny, and evented driver https://github.com/ruby-amqp/amqp) as a possibility to communicate game objects between clients and perhaps reduce overhead on the database / distributed memory store that you had in mind?
Redis/Resque seem good, but if all the jobs will need to do is simple persistance, and if there will be A LOT of such calls, you might want to consider beanstalkd - it has A LOT faster but simpler queue then Resque and you can probably make this even faster if you don't really need activerecord to dump attribute hashes into the database, see delayed_jobs vs resque vs beanstalkd?
A couple years and a failed project later, some thoughts:
avoid eventmachine if any way possible, there's a plethora of opportunities to peg your CPU nowadays with YARV/MRI Ruby on a IO constrained application and without wasting memory.
My favorite approach for a web application at this time is use Puma with multiple processes and threads.
Have in mind that GIL in YARV only affects the Ruby interpreter code, not the IO operations, meaning that on a IO constrained application you can add threads and see better utilization of a single core,
add more processes and you see better utilization of many cores :) On Heroku 1x worker we run 2 processes with 4 threads each and this pegs our CPU potential to the top in benchmark meaning the application is no longer IO bound, but CPU bound and doing so without unacceptable memory losses.
When we needed super-fast responses we were troubled by the DB write operation times which did not affect the response to client, so we did asynchronous database writes using sidekiq / resque,
In hindsight you could even do celluloid or concurrent-ruby for asynchronous IO reads/writes (think DB writes, cache visits etc), it's less overhead and infrastructure but harder to debug and problem solve in production - my worst nightmare being an async operation failing silently with no error trace in our Errors console (an exception in exception handling for example)
End result is that your application experiences the same sort of benefits you used to get from using eventmachine (elimination of the IO bound, full utilization of CPU without huge memory footprint, parallel non-blocking IO) without resorting to writing reactor code which is a complete bitch to do as explained in my 2013 post
I'm talking about a library that would allow me to log events from different machines and would align these events on a "global" time axis with sufficiently high precision.
Actually, I'm asking because I've written such a thing myself in the course of a cluster computing project, I found it terrifically useful, and I was surprised that I couldn't find any analogues.
Therefore, the point is whether something like this exists (and I better contribute to it) or nothing exists (and I better write an open-source analogue of my solution).
Here are the features that I'd expect from such a library:
Independence on the clock offset between different machines
Timing precision on the order of at least milliseconds, preferably microseconds
Scalability to thousands of concurrent logging processes, with at least several megabytes of aggregated logs per second
Soft real-time operation (t.i. I don't want to collect 200 big logs from 200 machines and then compute clock offsets and merge them - I want to see what happens "live", perhaps with a small lag like 10s)
Facebook's contribution in the matter is called 'Scribe'.
Excerpt:
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups.
...
Scribe is implemented as a thrift service using the non-blocking C++ server. The installation at facebook runs on thousands of machines and reliably delivers tens of billions of messages a day.
The API is Thrift-based, so you have a good platform coverage, but in case you're looking for simple integration for Java you may want to have a look at Digg's log4j appender for Scribe.
You could use log4j/log4net targeting a central syslog daemon. log4j has a builtin SyslogAppender, and in log4net you can do it as shown here. log4cpp docs here.
There are Windows implementations of Syslog around if you don't have a Unix system to hand for this.
Use Chukwa, Its Open source and Large scale Log Monitoring System
How to measure current load of MySQL server? I know I can measure different things like CPU usage, RAM usage, disk IO etc but is there a generic load measure for example the server is at 40% load etc?
mysql> SHOW GLOBAL STATUS;
Found here.
The notion of "40% load" is not really well-defined. Your particular application may react differently to constraints on different resources. Applications will typically be bound by one of three factors: available (physical) memory, available CPU time, and disk IO.
On Linux (or possibly other *NIX) systems, you can get a snapshot of these with vmstat, or iostat (which provides more detail on disk IO).
However, to connect these to "40% load", you need to understand your database's performance characteristics under typical load. The best way to do this is to test with typical queries under varying amounts of load, until you observe response times increasing dramatically (this will mean you've hit a bottleneck in memory, CPU, or disk). This load should be considered your critical level, which you do not want to exceed.
is there a generic load measure for example the server is at 40% load ?
Yes! there is:
SELECT LOAD_FILE("/proc/loadavg")
Works on a linux machine. It displays the system load averages for the past 1, 5, and 15 minutes.
System load averages is the average number of processes that are either in a runnable or uninterruptable state. A process in a runnable state is either using the CPU or waiting to use the CPU.
A process in uninterruptable state is waiting for some I/O access, eg waiting for disk. The averages are taken over the three time intervals. Load averages are not normalized for the number of
CPUs in a system, so a load average of 1 means a single CPU system is loaded all the time while on a 4 CPU system it means it was idle 75% of the time.
So if you want to normalize you need to count de number of cpu's also.
you can do that too with
SELECT LOAD_FILE("/proc/cpuinfo")
see also 'man proc'
with top or htop you can follow the usage in Linux realtime
On linux based systems the standard check is usually uptime, a load index is returned according to metrics described here.
aside from all the good answers on this page (SHOW GLOBAL STATUS, VMSTAT, TOP...) there is also a very simple to use tool written by Jeremy Zawodny, it is perfect for non-admin users. It is called "mytop". more info # http://jeremy.zawodny.com/mysql/mytop/
Hi friend as per my research we have some command like
MYTOP: open source program written using PERL language
MTOP: also an open source program written on PERL, It works same as MYTOP but it monitors the queries which are taking longer time and kills them after specific time.
Link for details of above command
It just hit me the other day. What actually happens when I tell the computer to go into Stand-By or to Hibernate?
More spesifically, what implications, if any, does it have on code that is running? For example if an application is compressing some files, encoding video files, checking email, running a database query, generating reports or just processing lots of data or doing complicated math stuff. What happens? Can you end up with a bug in your video? Can the database query fail? Can data processing end up containing errors?
I'm asking this both out of general curiosity, but also because I started to wonder if this is something I should think about when I program myself.
You should remember that the OS (scheduler) freezes your program about a gazillion times each second. This means that your program can already function pretty well when the operating system freezes it. There isn't much difference, from your point of view, between stand-by, hibernate and context switching.
What is different is that you'll be frozen for a long time. And this is the only thing you need to think about. In most cases, this shouldn't be a problem.
If you have a network connection you'll probably need to re-establish it, and similar issues. But this just means checking for errors in all IO operations, which I'm sure you're already doing... :-)
My initial thought is that as long as your program and its eco-system is contained within the pc that is going on stand - by or hibernation, then, upon resume your program should not be affected.
However, if you are say updating a record in some database hosted on a separate machine then hibernation / stand - by will be treated as a timeout.
If your program is dependent on such a change in "power status" you can listen to WM_POWERBROADCAST Message as mentioned on msdn
Stand-By keeps your "state" alive by keeping it in RAM. As a consequence if you lose power you'll lose your stored "state".
But it makes it quicker to achieve.
Hibernation stores your "state" in virtual RAM on the hard disk, so if you lose power you can still come back three days later. But it's slower.
I guess a limitation with Stand-By is how much RAM you've got, but I'm sure virtual RAM must be employed by Stand-By when it runs out of standard RAM. I'll look that up though and get back!
The Wikipedia article on ACPI contains the details about the different power savings modes which are present in modern PCs.
Here's the basic idea, from how I understand things:
The basic idea is to keep the current state of the system persisted, so when the machine is brought back into operation, it can resume at the state it was before the machine was put into sleep/standby/hibernation, etc. Think of it as serialization for your PC.
In standby, the computer will keep feeding power to the RAM, as the main memory is volatile memory that needs constant refreshing to hold on to its state. This means that the hard drives, CPU, and other components can be turned off, as long as there is enough power to keep the DRAM refreshed to keep its contents from disappearing.
In hibernation, the main memory will also be turned off, so the contents must be copied to permanent storage, such as a hard drive, before the system power is turned off. Other than that, the basic premise of hiberation is no different from standby -- to store the current state of the machine to restore at a later time.
With that in mind, it's probably not too likely that going into standby or hibernate will cause problems with tasks that are executing at the moment. However, it may not be a good idea to allow network activity to stop in the middle of execution, as depending on the protocol, your network connection could timeout and be unable to resume upon returning the system to its running state.
Also, there may be some machines that just have flaky power-savings drivers which may cause it to go to standby and never come back, but that's completely a different issue.
There are some implications for your code. Hibernation is more than just a context switch from the scheduler. Network connections will be closed, network drives or removable media might be disconnected during the hibernation, ...
I dont think your application can be notified of hibernation (but I might be wrong). What you should do is handle error scenarios (loss of network connectivity for example) as gracefully as possible. And note that those error scenario can occur during normal operation as well, not only when going into hibernation ...