max time a report should take to generate report - reporting-services

I am just curious to know how long, in minutes, does the reporting service take to generate report when it returns 1MB of data. Maybe using views and table is properly index. SSRS reporting and server side generation.

Report generation time has two components:
- Data Acquisition time
- Render Time
So for 1 Mb of data, how many records (rows) are we talking? How many pages will the report have? How many controls per page? Does the report use charting? These are the factors that will determine generation time.
For most reports, data acquisition time is the most significant factor. Your report is never going to run faster than the raw data acquisition. So if you are using SQL, the report can't generate faster than the time required to run the query. I have seen queries that return far more than 1Mb of data very quickly. I have also seen queries that return very little data, that run for a long time.
On the render side, there are a couple of things that that can cause a report to run be slow. The first is in report aggregation. If a report needs to receive all of the records prior to starting rendering, then its performance will suffer. In particular, depending on the reporting tool. With large data sets (more than 10,000 records), you can have significant improvements in rendering by doing aggregation at the source (DB). The other is charting, which typically involves heavy rendering overhead and aggregation.
Most reporting systems allow you to build in timers or logging that will help you to performance tune the report. It is best to build timers into the report that will tell you what percentage of time the report is spending getting the data, and what percentage is spent rendering. When you have this information, you will know where to focus your energies.
If you are really trying to evaluate the performance of the reporting tool, the best way is to build a report that either reads a flat file or generates the data through code. In other words, eliminate the impact of the database and see how fast your reporting tool can generate pages.
Hope this helps.

How long is acceptable? Depends on what it's doing, how much it's run, things like that. Anything below 30 seconds would be fine if it's run once every day or two. If it's run once a week or once a month that number could be a lot higher.

The report itself is generally very fast, if you're seeing a hangup you may want to check the execution time of the query which generates the data. A complex query can take a long time, even if it only returns a little data...

I've found, when using BIRT and other reporting systems that the best improvements tend to come by offloading most of the work to the database at the back end.
In other words, don't send lots of data across the wire and sort or group it locally. The database is almost certainly going to outperform you with its SQL orderby and groupby clauses and optimizing indexes (among other things).
That way, you get faster extraction of the data you want AND less network traffic.

As several said already, a general question like this really can't be answered. However, I wrote up Turbo-charge Your Report Speed – General Rules & Guidelines (disclaimer - I'm the CTO at Windward Reports, a competitor of SSRS). I think that will help you look for what you can do to speed up the process.
And with all the caveats about the specifics matter a lot, on a 3GHz workstation we generally see 7-30 pages/second. Keep in mind this is numbers for Windward, not SSRS.

Related

Multiple large MySQL SELECT queries - better to run in parallel or in a queue?

I have looked up answers to this question a bunch and couldn't find a specific answer - sorry in advance if I missed something! Also, I'm a SQL optimization noob.
I have an analytics dashboard which pulls data based on users' requests from a large database.
Each page the user loads runs a number of different queries to populate different parts of the page (different charts, tables, etc). Some of these pages can take quite some time to load as the user might request several years of data.
Currently, each part of the page pings off one SELECT query to the SQL server but as there are several parts of the page, those queries end up running in parallel.
Would it be faster to run these queries in a queue - to allow the server to process one query at a time? Or to keep everything in parallel, as is?
The added benefit of running them one at a time is that we could run the queries to fill in the "above-the-fold" part of the page first...
Hope that all makes sense and take it easy on me please :)
I also say "it depends", but I lean toward parallelism.
Probably should not have more parallelism than the number of CPU cores.
I rarely see a system that chews up all the CPU cores -- unless it does not have good enough indexes. That is, fix the indexes before asking the question.
If the data is bigger than can be cached, it may be faster to queue, since you may have a choke point -- I/O.
If the table(s) are continually being changed, turn off the Query Cache.
Is your goal to get some results on the page early (a likely Human Interface goal), add a small delay in all but one AJAX callee (not caller).
If multiple pages could be computing at the same time, things get more complex. For example, you can't really control the parallelism.
Let's see the queries. Perhaps we can speed them up enough to obviate the question.
There is no right answer to this question. Up to a point, running parallel SELECT queries is (generally) going to be faster than one running query. Whether that point is 2 queries or 200 depends on the nature of the queries, the hardware configuration, the data, and the speeds of various components.
The situation becomes even more complex when you consider how many different users may be involved and whether or not the data is being updated. You can get into really bad situations with parallel queries and updates if the locks start cascading. Of course, this can happen with multiple simultaneous users as well.
My guess is that you want a throttling mechanism that will run, say, n queries at a time and put the rest into a queue.

using JPA ( hibernate) VS stored procedures

I am working on a project using ZK Framework, Hibernate, Spring and Mysql.
I need to generate some charts from Mysql database, but after I calculate the number of objects that I need to calculate the values of those charts I found it more than 1400 objects and same numbers of queries and transactions.
So i thought if using stored procedures in Mysql to calculate those values and save them in a separate tables (using an architecture close to Data Warehouse), and then use my web application to just read the values of those tables and display them as charts.
I want to know in terms of speed and performance, which of those methods is better?
And thank you
No way to tell, really, without many more details. However:
What you want to do is called Denormalisation. This is a recognised technique for speeding up reporting and making it easier. (If it doesn't, your denormalisation has failed!) When it works it has the following advantages:
Reports run faster
Report code is easier to write
On the other hand:
Report Data is out of date, containing only data as at the time you
last did the calculations
An extreme form of doing this is to take the OLTP database (a standard database) and export it into an Analysis database (aka a Cube or an OLAP database).
One of the problems of Denormalisation is that a) it is usually a significant effort, b) it adds extra code which adds complexity and thus increases support costs, and c) it might not make enough (or any) difference. Because of this, it is usual not to do it until you know you have a problem. This will happen when you have done your reports on the basic database and have found that they either are too difficult to write and/or run too slowly. I would strongly suggest that only when you reach that point do you go for Denormalisation.
There can be times when you don't need to do that, but I've only seen 1 such example in over 25 years of development; and that decision was helped by a desire to use an OLAP database by Management for political purposes.

What would be the best DB cache to use for this application?

I am about 70% of the way through developing a web application which contains what is essentially a largeish datatable of around 50,000 rows.
The app itself is a filtering app providing various different ways of filtering this table such as range filtering by number, drag and drop filtering that ultimately performs regexp filtering, live text searching and i could go on and on.
Due to this I coded my MySQL queries in a modular fashion so that the actual query itself is put together dynamically dependant on the type of filtering happening.
At the moment each filtering action (in total) takes between 250-350ms on average. For example:-
The user grabs one end of a visual slider, drags it inwards, when he/she lets go a range filtering query is dynamically put together by my PHP code and the results are returned as a JSON response. The total time from the user letting go of the slider until the user has recieved all data and the table is redrawn is between 250-350ms on average.
I am concerned with scaleability further down the line as users can be expected to perform a huge number of the filtering actions in a short space of time in order to retrieve the data they are looking for.
I have toyed with trying to do some fancy cache expiry work with memcached but couldn't get it to play ball correctly with my dynamically generated queries. Although everything would cache correctly I was having trouble expiring the cache when the query changes and keeping the data relevent. I am however extremely inexperienced with memcached. My first few attempts have led me to believe that memcached isn't the right tool for this job (due to the highly dynamic nature of the queries. Although this app could ultimately see very high concurrent usage.
So... My question really is, are there any caching mechanisms/layers that I can add to this sort of application that would reduce hits on the server? Bearing in mind the dynamic queries.
Or... If memcached is the best tool for the job, and I am missing a piece of the puzzle with my early attempts, can you provide some information or guidance on using memcached with an application of this sort?
Huge thanks to all who respond.
EDIT: I should mention that the database is MySQL. The siite itself is running on Apache with an nginx proxy. But this question is related purely to speeding up and reducing the database hits, of which there are many.
I should also add that the quoted 250-350ms roundtrip time is fully remote. As in from a remote computer accessing the website. The time includes DNS lookup, Data retrieval etc.
If I understand your question correctly, you're essentially asking for a way to reduce the number of queries against the database eventhough there will be very few exactly the same queries.
You essentially have three choices:
Live with having a large amount of queries against your database, optimise the database with appropriate indexes and normalise the data as far as you can. Make sure to avoid normal performance pitfalls in your query building (lots of ORs in ON-clauses or WHERE-clauses for instance). Provide views for mashup queries, etc.
Cache the generic queries in memcached or similar, that is, without some or all filters. And apply the filters in the application layer.
Implement a search index server, like SOLR.
I would recommend you do the first though. A roundtrip time of 250~300 ms sounds a bit high even for complex queries and it sounds like you have a lot to gain by just improving what you already have at this stage.
For much higher workloads, I'd suggest solution number 3, it will help you achieve what you are trying to do while being a champ at handling lots of different queries.
Use Memcache and set the key to be the filtering query or some unique key based on the filter. Ideally you would write your application to expire the key as new data is added.
You can only make good use of caches when you occasionally run the same query.
A good way to work with memcache caches is to define a key that matches the function that calls it. For example, if the model named UserModel has a method getUser($userID), you could cache all users as USER_id. For more advanced functions (Model2::largerFunction($arg1, $arg2)) you can simply use MODEL2_arg1_arg2 - this will make it easy to avoid namespace conflicts.
For fulltext searches, use a search indexer such as Sphinx or Apache Lucene. They improve your queries a LOT (I was able to do a fulltext search on a 10 million record table on a 1.6 GHz atom processor, in less than 500 ms).

SQL Assemblies vs Application code for complicated queries on large XML columns

I have a table with a few relational columns and one XML column which sometimes holds a fairly large chunk of data. I also have a simple webservice which uses the database. I need to be able to report on things like all the instances of a certain element within the XML column, a list of all the distinct values for a certain element, things like that.
I was able to get a list of all the distinct values for an element, but didn't get much further than that. I ended up writing incredibly complex T-SQL code to do something that seems pretty simple in C#: go through all the rows in this table, and apply this ( XPath | XQuery | XSLT ) to the XML column. I can filter on the relational columns to reduce the amount of data, but this is still a lot of data for some of the queries.
My plan was to embed an assembly in SQL Server (I'm using 2008 SP2) and have it create an indexed view on the fly for a given query (I'd have other logic to clean this view up). This would allow me to keep the network traffic down, and possibly also allow me to use tools like Excel and MSRS reports as a cheap user interface, but I'm seeing a lot of people saying "just use application logic rather than SQL assemblies". (I could be barking entirely up the wrong tree here, I guess).
Grabbing the big chunk of data to the web service and doing the processing there would have benefits as well - I'm less constrained by the SQL Server environment (since I don't live inside it) and my setup process is easier. But it does mean I'm bringing a lot of data over the network, storing it in memory while I process it, then throwing some of it away.
Any advice here would be appreciated.
Thanks
Edit:
Thanks guys, you've all been a big help. The issue was that we were generating a row in the table for a file, and each file could have multiple results, and we would doing this each time we ran a particular build job. I wanted to flatten this out into a table view.
Each execution of this build job checked thousands of files for several attributes, and in some cases each of these tests these were generating thousands of results (MSIVAL tests were the worst culprit).
The answer (duh!) is to flatten it out before it goes into the database! Based on your feedback, I decided to try creating a row for each result for each test on each file, and the XML just had the details of that one result - this made the query much simpler. Of course, we now have hundreds of thousands of rows each time we run this tool but the performance is much better. I now have a view which creates a flattened version of one of the classes of results that are emitted by the build job - this returns >200,000 and takes <5 seconds, compared to around 3 minutes for the equivalent (complicated) query before I went the flatter route, and between 10 and 30 minutes for the XML file processing of the old (non-database) version.
I now have some issues with the number of times I connect, but I have an idea of how to fix that.
Thanks again! +1's all round
I suggest using the standard xml tools in TSQL. (http://msdn.microsoft.com/en-us/library/ms189075.aspx). If you don't wish to use this I would recommend processing the xml on another machine.
SQLCLR is perfect for smaller functions, but with the restrictions on the usable methods it tends to become an exercise in frustration once you are trying to do more advanced things.
What you're asking about is really a huge balancing act and it totally depends on several factors. First, what's the current load on your database? If you're running this on a database that is already under heavy load, you're probably going to want to do this parsing on the web service. XML shredding and querying is an incredibly expensive procedure in SQL Server, especially if you're doing it on un-indexed columns that don't have a schema defined for them. Schemas and indexes help with this processing overhead, but they can't eliminate the fact that XML parsing isn't cheap. Secondly, the amount of data you're working with. It's entirely possible that you just have too much data to push over the network. Depending on the location of your servers and the amount of data, you could face insurmountable problems here.
Finally, what are the relative specs of your machines? If your web service machine has low memory, it's going to be thrashing data in and out of virtual memory trying to parse the XML which will destroy your performance. Maybe you're not running the most powerful database hardware and shredding XML is going to be performance prohibitive for the CPU you've got on your database machine.
At the end of the day, the only way to really know is to try both ways and figure out what makes sense for you. Doing the development on your web services machine will almost undoubtedly be easier as LINQ to XML is a more elegant way of parsing through XML than XQuery shoehorned into T-SQL is. My indication, given the information you provided in your question, is that T-SQL is going to perform better for you in the long run because you're doing XML parsing on every row or at least most rows in the database for reporting purposes. Pushing that kind of information over the network is just ugly. That said, if performance isn't that important, there's something to be said about taking the easier and more maintainable route of doing all the parsing on the application server.

How many queries are too many?

I have to run one time 10 mysql queries for one person in one page. Is it very bad? I have quite good hosting, but still, can it break or something? Thank you very much.
Drupal sites typically make anywhere from 150 to 400+ queries per request. The total time spent querying the database is still under 1s - it's not the number that kills the server, but the quality/complexity of the queries (and possibly the size of the dataset they search through).
I can't tell what queries you're talking about but on most sites 10 is not much at all.
If you're concerned with performance, you can always see how long your queries take to execute in a database management program, such as MySQL Workbench.
10 fast queries can be better than 1 slow one. Define what's acceptable in terms of response time, throughput, in normal and peek traffic conditions, and measure if these 10 queries are a problem or not (i.e. don't respect your expectations).
If they are, then try to change your design and find a better solution.
How many queries are too many?
I will rephrase your question:
Is my app fast enough?
Come up with a business definition of "fast enough" for your application (based on business/user requirements), come up with a way to model all your usage scenarios and expected load, create simulations of that load and profile (trace/time) it.
This approach amounts to an educated guess. Anything short of it is pure speculation, and worthless.
If your application is already in production, and is working well in most cases, you can get feedback from users to determine pain points. From there, you can model those pain points and corresponding load, and profile.
Document your results. Once you make improvements to your application, you have a tool to determine if the optimizations you made achieved your goals.
When new to development as I assume you are. I recommend focusing on the most logical and obvious way to avoid over-processing. That is usually the avoidance of repeating a query by caching its first execution and checking for cached results before running queries.
After that don't spend too much time thinking about the number of queries and focus on well-written code. That means a good use of classes, methods and functions. While still having much to learn, you do not want to over-complicate every interaction with the database.
Enjoy what you are doing and keep it neat. That will result in easier to debug code which in itself can lead to better performance when you have the knowledge to take your code further. The performance of an application can be improved very quickly if the original work is well-written.
It depends on how much CPU cycles will the sum of the queries use.
1 query can consume way more CPU cycles than 100. It all depends on their contents.
You could begin by optimizing them following this guide: http://beginner-sql-tutorial.com/sql-query-tuning.htm
I think its not a problem. 10 Queries are not so much for a site. Less is better no question but when you have 3000 - 5000 then you should think about your structure.
And when you go in one query through a table with millions of rows without an index then are 10 to much.
I have seen a Typo3 site with a lot of extensions that make 7500 requests with the cache. This happens when you install and install and don't look at what happens.
But you can look that you make logical JOIN's over the tables that you have less queries.
Well there are big queries and small trivial queries. Which ones are yours? Generally, you should try to fetch the data in as few queries as possible. The heavier the load is on the database server the harder it will be to serve the clients as the traffic increases.
Just to add a bit of a different perspective to the other good answers:
First, to concur, the type and complexity of queries you are making will matter more 99% of the time than the number of queries.
However, in the rare situation where there is high latency on the network path to your database server (i.e. the db server is remote or such, not saying this is a logical or sane setup, but I have seen it done) then you want to minimize the number of queries done, because every single time you talk to the database server the network transmission time will take an order of magnitude or two longer than it takes to compute the query. This situation can really kill your page loading times, and so you'd really want to minimize the number of queries (actually, you just want to change your server setup...).