So I'm working on a data mining project where we're looking at code elements and their relationships and changes to these things over time. What we want is to ask some questions about how often related elements are changed. I've set it up as a view, but it's taking like 10 min to run. I believe the problem is that I'm having to do a lot of subtraction, concatenation, and string comparisons to compare entries (for our window size), but I don't know a good way to fix this. The query looks like
select aw.same
, rw.k
, count(distint concat_ws(',', r1.id, r2.id)) as num
from deltamethoddeclaration dmd1
join revision r1
on r1.id=FKrevID
join methodinvocation mi
on mi.FKcallerID = dmd1.FKMDID
join deltamethoddeclaration dmd2
on mi.FKcalleeID = dmd2.FKMDID
join revision r2
on r2.id = dmd2.FKrevID
join revisionwindow rw
join authorwindow aw
where (dmd1.FKrevID - dmd2.FKrevID) < rw.k
and (dmd2.FKrevID - dmd1.FKrevID) < rw.k
and case aw.same
when 1 then
r1.author = r2.author
when 0 then
r1.author <> r2.author
else
1=1
end
group by aw.same
, rw.k
;
Ok, so revisionwindow stores the revision windows we're interested in (10, 20, 50, 100) and authorwindow stores which author types we want (same, different, and don't care). Part of the problem is, we could have the same revision pair with different elements matching, so the only hack i could come up with was that ugly count(distinct concat()) thing. This should return a table with 12 rows, one for each combination of the author and revision windows. The entries under 'num' are the unique pairs of revisions related in the manner specified (in this case, both change methods and one of the methods calls the other). It works perfectly, it's just crazy slow (~10 min running time). I'm basically looking for any advice or help to make this work better without sacrificing accuracy.
where (dmd1.FKrevID - dmd2.FKrevID) < rw.k
The most damaging about this statement is the less than operator < not the arithmetic. B-trees cannot use this and forces a full table scan every time, any time. Gory details why this true: http://explainextended.com/2010/05/19/things-sql-needs-determining-range-cardinality/
I doubt your CASE statement can be optimized by the backend and <> operator suffers from the same problem as above. I would think about ways to join with = operators, perhaps breaking up the query and using UNION statements so you can always use indexes.
Your not using EXPLAIN. You need to start using it to optimize queries. You have no idea what indexes are being used and what are not, or if your condition is selective enough where they would even be helpful (if its not very selective see the last point) http://dev.mysql.com/doc/refman/5.0/en/explain.html
Since this a data mining application you have a great opportunity to use temp tables of intermediate values. Since the data is probably dumped at periodic intervals (or maybe even only once!) it is easy to rebuild the long running temp table every so often without running the risk of data corruption (Or it may just not matter since you looking for aggregate patterns.)
I have taken queries that were running over 60 minutes and reduced them to less than 100 ms (instant) by building temp tables that cached the hard stuff. If you are not able to use any of the ideas above, this is probably the lowest lying fruit. Take all the 'hard stuff' - case joins and non equality joins and do it one place. Then add an index to your temp table :-) The trick is to make it general enough that you can query the temp table so you still have flexibility to ask different questions.
I suspect the two joins (join revisionwindow rw) and (join authorwindow aw) that do not have an ON condition but use the WHERE, cause this.
How many records do these two tables have? MySQL probably does first a CROSS JOIN on these and only later checks the complex (WHERE) conditions.
But please post the results of EXPLAIN.
--EDIT--
Oops, I missed your last paragraph which explains that the two tables have 4 and 3 rows.
Can you try this:
(where the concat has been replaced
and the where clauses have been moved as JOIN ON ...)
select aw.same
, rw.k
, count(distint r1.id, r2.id) as num
from deltamethoddeclaration dmd1
join revision r1
on r1.id = dmd1.FKrevID
join methodinvocation mi
on mi.FKcallerID = dmd1.FKMDID
join deltamethoddeclaration dmd2
on mi.FKcalleeID = dmd2.FKMDID
join revision r2
on r2.id = dmd2.FKrevID
join revisionwindow rw
on (dmd1.FKrevID - dmd2.FKrevID) < rw.k
and (dmd2.FKrevID - dmd1.FKrevID) < rw.k
join authorwindow aw
on case aw.same
when 1 then
r1.author = r2.author
when 0 then
r1.author <> r2.author
else
1=1
end
group by aw.same
, rw.k
;
Related
I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns of which I can filter down to 3-rows (in this example) and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its most simple aspect you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous provided you have an index on your master table on the given indexed_col in the first position (in case you had a compound index of many fields)…
Now, if I am understanding you correctly on your different lookup columns (21 in total), you have just simplified them for redundancy in this post, but actually doing something in the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the Master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled and queried died after running a query after more than 24 hrs.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
By having your master table in the primary position, where clause and its criteria directly on the master table, you are good. You know the relationships of the tables. Do the query in the exact order I presented it to you. Don't try to think for me on this and try to optimize based on a subsidiary table that may have smaller record count and somehow think that will help the query faster... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence still needed a longer duration than just 3 records.
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` l
ON l.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = value;
The indexes that you want are master_table(value, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.
Both SQL, return the same results. The first my joins are on the subqueries the second the final queryis a join with a temporary that previously I create/populate them
SELECT COUNT(*) totalCollegiates, SUM(getFee(c.collegiate_id, dateS)) totalMoney
FROM collegiates c
LEFT JOIN (
SELECT collegiate_id FROM collegiateRemittances r
INNER JOIN remittances r1 USING(remittance_id)
WHERE r1.type_id = 1 AND r1.name = remesa
) hasRemittance ON hasRemittance.collegiate_id = c.collegiate_id
WHERE hasRemittance.collegiate_id IS NULL AND c.typePayment = 1 AND c.active = 1 AND c.exentFee = 0 AND c.approvedBoard = 1 AND IF(notCollegiate, c.collegiate_id NOT IN (notCollegiate), '1=1');
DROP TEMPORARY TABLE IF EXISTS hasRemittance;
CREATE TEMPORARY TABLE hasRemittance
SELECT collegiate_id FROM collegiateRemittances r
INNER JOIN remittances r1 USING(remittance_id)
WHERE r1.type_id = 1 AND r1.name = remesa;
SELECT COUNT(*) totalCollegiates, SUM(getFee(c.collegiate_id, dateS)) totalMoney
FROM collegiates c
LEFT JOIN hasRemittance ON hasRemittance.collegiate_id = c.collegiate_id
WHERE hasRemittance.collegiate_id IS NULL AND c.typePayment = 1 AND c.active = 1 AND c.exentFee = 0 AND c.approvedBoard = 1 AND IF(notCollegiate, c.collegiate_id NOT IN (notCollegiate), '1=1');
Which will have better performance for a few thousand records?
The two formulations are identical except that your explicit temp table version is 3 sql statements instead of just 1. That is, the overhead of the back and forth to the server makes it slower. But...
Since the implicit temp table is in a LEFT JOIN, that subquery may be evaluated in one of two ways...
Older versions of MySQL were 'dump' and re-evaluated it. Hence slow.
Newer versions automatically create an index. Hence fast.
Meanwhile, you could speed up the explicit temp table version by adding a suitable index. It would be PRIMARY KEY(collegiate_id). If there is a chance of that INNER JOIN producing dups, then say SELECT DISTINCT.
For "a few thousand" rows, you usually don't need to worry about performance.
Oracle has a zillion options for everything. MySQL has very few, with the default being (usually) the best. So ignore the answer that discussed various options that you could use in MySQL.
There are issues with
AND IF(notCollegiate,
c.collegiate_id NOT IN (notCollegiate),
'1=1')
I can't tell which table notCollegiate is in. notCollegiate cannot be a list, so why use IN? Instead simply use !=. Finally, '1=1' is a 3-character string; did you really want that?
For performance (of either version)
remittances needs INDEX(type_id, name, remittance_id) with remittance_id specifically last.
collegiateRemittances needs INDEX(remittance_id) (unless it is the PK).
collegiates needs INDEX(typePayment, active, exentFee , approvedBoard) in any order.
Bottom line: Worry more about indexes than how you formulate the query.
Ouch. Another wrinkle. What is getFee()? If it is a Stored Function, maybe we need to worry about optimizing it?? And what is dateS?
It depends actually. You'll have to test performance of every option. On my website I had 2 tables with articles and comments to them. It turned out it's faster to call comment counts 20 times for each article, than using a single union query. MySQL (like other DBs) caches queries, so small simple queries can run amazingly fast.
I did not saw that you have tagged the question as mysql so I initialy aswered for Oracle. Here is what I think about mySQL.
MySQL
There are two options when it comes to temporary tables Memory or Disk. And for Disk you can have MyIsam - non transactional and InnoDB transactional. Of course you can expect better performance for non transactional type of storage.
Additionaly you need to figure out how big resultset are you dealing with. For small resultset the memory option would be faster for large resultset the disk option would be faster.
Again at the end as in my original answer you need to figure out what performance is good enough and go for the most descriptive and easy to read option.
Oracle
It depends on what kind of temporary table you are dealing with.
You can have session based temporary tables - data is held until logout, or transaction based - data is held until commit . On top of this they can support transaction logging or not support it. Depending on configuration you can get better performance from a temporary table.
As everything in the world performance is relative therm. Most probably for few thousand records it will not do significant difference between the two queries. In which case I would go not for the most performant on but for the most easier to read and understand one.
I have an SQL query(see below) that returns exactly what I need but when ran through phpMyAdmin takes anywhere from 0.0009 seconds to 0.1149 seconds and occasionally all the way up to 7.4983 seconds.
Query:
SELECT
e.id,
e.title,
e.special_flag,
CASE WHEN a.date >= '2013-03-29' THEN a.date ELSE '9999-99-99' END as date
CASE WHEN a.date >= '2013-03-29' THEN a.time ELSE '99-99-99' END as time,
cat.lastname,
FROM e_table as e
LEFT JOIN a_table as a ON (a.e_id=e.id)
LEFT JOIN c_table as c ON (e.c_id=c.id)
LEFT JOIN cat_table as cat ON (cat.id=e.cat_id)
LEFT JOIN m_table as m ON (cat.name=m.name AND cat.lastname=m.lastname)
JOIN (
SELECT DISTINCT innere.id
FROM e_table as innere
LEFT JOIN a_table as innera ON (innera.e_id=innere.id AND
innera.date >= '2013-03-29')
LEFT JOIN c_table as innerc ON (innere.c_id=innerc.id)
WHERE (
(
innera.date >= '2013-03-29' AND
innera.flag_two=1
) OR
innere.special_flag=1
) AND
innere.flag_three=1 AND
innere.flag_four=1
ORDER BY COALESCE(innera.date, '9999-99-99') ASC,
innera.time ASC,
innere.id DESC LIMIT 0, 10
) AS elist ON (e.id=elist.id)
WHERE (a.flag_two=1 OR e.special_flag) AND e.flag_three=1 AND e.flag_four=1
ORDER BY a.date ASC, a.time ASC, e.id DESC
Explain Plan:
The question is:
Which part of this query could be causing the wide range of difference in performance?
To specifically answer your question: it's not a specific part of the query that's causing the wide range of performance. That's MySQL doing what it's supposed to do - being a Relational Database Management System (RDBMS), not just a dumb SQL wrapper around comma separated files.
When you execute a query, the following things happen:
The query is compiled to a 'parametrized' query, eliminating all variables down to the pure structural SQL.
The compilation cache is checked to find whether a recent usable execution plan is found for the query.
The query is compiled into an execution plan if needed (this is what the 'EXPLAIN' shows)
For each execution plan element, the memory caches are checked whether they contain fresh and usable data, otherwise the intermediate data is assembled from master table data.
The final result is assembled by putting all the intermediate data together.
What you are seeing is that when the query costs 0.0009 seconds, the cache was fresh enough to supply all data together, and when it peaks at 7.5 seconds either something was changed in the queried tables, or other queries 'pushed' the in-memory cache data out, or the DBMS has other reasons to suspect it needs to recompile the query or fetch all data again. Probably some of the other variations have to do with used indexes still being cached freshly enough in memory or not.
Concluding this, the query is ridiculously slow, you're just sometimes lucky that caching makes it appear fast.
To solve this, I'd recommend looking into 2 things:
First and foremost - a query this size should not have a single line in its execution plan reading "No possible keys". Research how indexes work, make sure you realize the impact of MySQL's limitation of using a single index per joined table, and tweak your database so that each line of the plan has an entry under 'key'.
Secondly, review the query in itself. DBMS's are at their fastest when all they have to do is combine raw data. Using programmatic elements like CASE and COALESCE are by all means often useful, but they do force the database to evaluate more things at runtime than just take raw table data. Try to eliminate such statements, or move them to the business logic as post-processing with the retrieved data.
Finally, never forget that MySQL is actually a rather stupid DBMS. It is optimized for performance in simple data fetching queries such as most websites require. As such it is much faster than SQL Server and Oracle for most generic problems. Once you start complicating things with functions, cases, huge join or matching conditions etc., the competitors are frequently much better optimized, and have better optimization in their query compilers. As such, when MySQL starts becoming slow in a specific query, consider splitting it up in 2 or more smaller queries just so it doesn't become confused, and do some postprocessing in PHP or whatever language you are calling with. I've seen many cases where this increased performance a LOT, just by not confusing MySQL, especially in cases where subqueries were involved (as in your case). Especially the fact that your subquery is a derived table, and not just a subquery, is known to complicate stuff for MySQL beyond what it can cope with.
Lets start that both your outer and inner query are working with the "e" table WITH a minimum requirement of flag_three = 1 AND flag_four = 1 (regardless of your inner query's (( x and y ) or z) condition. Also, your outer WHERE clause has explicit reference to the a.Flag_two, but no NULL which forces your LEFT JOIN to actually become an (INNER) JOIN. Also, it appears every "e" record MUST have a category as you are looking for the "cat.lastname" and no coalesce() if none found. This makes sense at it appears to be a "lookup" table reference. As for the "m_table" and "c_table", you are not getting or doing anything with it, so they can be removed completely.
Would the following query get you the same results?
select
e1.id,
e1.Title,
e1.Special_Flag,
e1.cat_id,
coalesce( a1.date, '9999-99-99' ) ADate,
coalesce( a1.time, '99-99-99' ) ATime
cat.LastName
from
e_table e1
LEFT JOIN a_table as a1
ON e1.id = a1.e_id
AND a1.flag_two = 1
AND a1.date >= '2013-03-29'
JOIN cat_table as cat
ON e1.cat_id = cat.id
where
e1.flag_three = 1
and e1.flag_four = 1
and ( e1.special_flag = 1
OR a1.id IS NOT NULL )
order by
IF( a1.id is null, 2, 1 ),
ADate,
ATime,
e1.ID Desc
limit
0, 10
The Main WHERE clause qualifies for ONLY those that have the "three and four" flags set to 1 PLUS EITHER the ( special flag exists OR there is a valid "a" record that is on/after the given date in question).
From that, simple order by and limit.
As for getting the date and time, it appears that you only want records on/after the date to be included, otherwise ignore them (such as they are old and not applicable, you don't want to see them).
The order by, I am testing FIRST for a NULL value for the "a" ID. If so, we know they will all be forced to a date of '9999-99-99' and time of '99-99-99' and want them pushed to the bottom (hence 2), otherwise, there IS an "a" record and you want those first (hence 1). Then, sort by the date/time respectively and then the ID descending in case many within the same date/time.
Finally, to help on the indexes, I would ensure your "e" table has an index on
( id, flag_three, flag_four, special_flag ).
For the "a" table, index on
(e_id, flag_two, date)
My SQL Query with all the filters applied is returning 10 lakhs (one million) records . To get all the records it is taking 76.28 seconds .. which is not acceptable . How can I optimize my SQL Query which should take less time.
The Query I am using is :
SELECT cDistName , cTlkName, cGpName, cVlgName ,
cMmbName , dSrvyOn
FROM sspk.villages
LEFT JOIN gps ON nVlgGpID = nGpID
LEFT JOIN TALUKS ON nGpTlkID = nTlkID
left JOIN dists ON nTlkDistID = nDistID
LEFT JOIN HHINFO ON nHLstGpID = nGpID
LEFT JOIN MEMBERS ON nHLstID = nMmbHhiID
LEFT JOIN BNFTSTTS ON nMmbID = nBStsMmbID
LEFT JOIN STATUS ON nBStsSttsID = nSttsID
LEFT JOIN SCHEMES ON nBStsSchID = nSchID
WHERE (
(nMmbGndrID = 1 and nMmbAge between 18 and 60)
or (nMmbGndrID = 2 and nMmbAge between 18 and 55)
)
AND cSttsDesc like 'No, Eligible'
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
GROUP BY cDistName , cTlkName, cGpName, cVlgName ,
DATE_FORMAT(dSrvyOn , '%m-%Y')
I have searched on the forum and outside and used some of the tips given but it hardly makes any difference . The joins that i have used in above query is left join all on Primary Key and Foreign key . Can any one suggest me how can I modify this sql to get less execution time ....
You are, sir, a very demanding user of MySQL! A million records retrieved from a massively joined result set at the speed you mentioned is 76 microseconds per record. Many would consider this to be acceptable performance. Keep in mind that your client software may be a limiting factor with a result set of that size: it has to consume the enormous result set and do something with it.
That being said, I see a couple of problems.
First, rewrite your query so every column name is qualified by a table name. You'll do this for yourself and the next person who maintains it. You can see at a glance what your WHERE criteria need to do.
Second, consider this search criterion. It requires TWO searches, because of the OR.
WHERE (
(MEMBERS.nMmbGndrID = 1 and MEMBERS.nMmbAge between 18 and 60)
or (MEMBERS.nMmbGndrID = 2 and MEMBERS.nMmbAge between 18 and 55)
)
I'm guessing that these criteria match most of your population -- females 18-60 and males 18-55 (a guess). Can you put the MEMBERS table first in your list of LEFT JOINs? Or can you put a derived column (MEMBERS.working_age = 1 or some such) in your table?
Also try a compound index on (nMmbGndrID,nMmbAge) on MEMBERS to speed this up. It may or may not work.
Third, consider this criterion.
AND DATE_FORMAT(dSrvyOn , '%m-%Y') < DATE_FORMAT('2012-08-01' , '%m-%Y' )
You've applied a function to the dSrvyOn column. This defeats the use of an index for that search. Instead, try this.
AND dSrvyOn >= '2102-08-01'
AND dSrvyOn < '2012-08-01' + INTERVAL 1 MONTH
This will, if you have an index on dSrvyOn, do a range search on that index. My remark also applies to the function in your ORDER BY clause.
Finally, as somebody else mentioned, don't use LIKE to search where = will do. And NEVER use column LIKE '%something%' if you want acceptable performance.
You claim yourself you base your joins on good and unique indexes. So there is little to be optimized. Maybe a few hints:
try to optimize your table layout, maybe you can reduce the number of joins required. That probably brings more performance optimization than anything else.
check your hardware (available memory and things) and the server configuration.
use mysqls explain feature to find bottle necks.
maybe you can create an auxilliary table especially for this query, which is filled by a background process. That way the query itself runs faster, since the work is done before the query in background. That usually works if the query retrieves data that must not neccessarily be synchronous with every single change in the database.
check if an RDBMS is really the right type of database. For many purposes graph databases are much more efficient and offer better performance.
Try adding an index to nMmbGndrID, nMmbAge, and cSttsDesc and see if that helps your queries out.
Additionally you can use the "Explain" command before your select statement to give you some hints on what you might do better. See the MySQL Reference for more details on explain.
If the tables used in joins are least use for updates queries, then you can probably change the engine type from INNODB to MyISAM.
Select queries in MyISAM runs 2x faster then in INNODB, but the updates and insert queries are much slower in MyISAM.
You can create Views in order to avoid long queries and time.
Your like operator could be holding you up -- full-text search with like is not MySQL's strong point.
Consider setting a fulltext index on cSttsDesc (make sure it is a TEXT field first).
ALTER TABLE articles ADD FULLTEXT(cSttsDesc);
SELECT
*
FROM
table_name
WHERE MATCH(cSttsDesc) AGAINST('No, Eligible')
Alternatively, you can set a boolean flag instead of cSttsDesc like 'No, Eligible'.
Source: http://devzone.zend.com/26/using-mysql-full-text-searching/
This SQL has many things that are redundant that may not show up in an explain.
If you require a field, it shouldn't be in a table that's in a LEFT JOIN - left join is for when data might be in the joined table, not when it has to be.
If all the required fields are in the same table, it should be the in your first FROM.
If your text search is predictable (not from user input) and relates to a single known ID, use the ID not the text search (props to Patricia for spotting the LIKE bottleneck).
Your query is hard to read because of the lack of table hinting, but there does seem to be a pattern to your field names.
You require nMmbGndrID and nMmbAge to have a value, but these are probably in MEMBERS, which is 5 left joins down. That's a redundancy.
Remember that you can do a simple join like this:
FROM sspk.villages, gps, TALUKS, dists, HHINFO, MEMBERS [...] WHERE [...] nVlgGpID = nGpID
AND nGpTlkID = nTlkID
AND nTlkDistID = nDistID
AND nHLstGpID = nGpID
AND nHLstID = nMmbHhiID
It looks like cSttsDesc comes from STATUS. But if the text 'No, Eligible' matches exactly one nBStsSttsID in BNFTSTTS then find out the value and use that! If it is 7, take out LEFT JOIN STATUS ON nBStsSttsID = nSttsID and replace AND cSttsDesc like 'No, Eligible' with AND nBStsSttsID = '7'. This would see a massive speed improvement.
I'm looking to take the total time a user worked on each batch at his workstation, the total estimated work that was completed, the amount the user was paid, and how many failures the user has had for each day this year. If I can join all of this into one query then I can use it in excel and format things nicely in pivot tables and such.
EDIT: I've realized that is only possible to do this in multiple queries so I have narrowed my scope down to this:
SELECT batch_log.userid,
batches.operation_id,
SUM(TIME_TO_SEC(ramses.batch_log.time_elapsed)),
SUM(ramses.tasks.estimated_nonrecurring + ramses.tasks.estimated_recurring),
DATE(start_time)
FROM batch_log
JOIN batches ON batch_log.batch_id=batches.id
JOIN ramses.tasks ON ramses.batch_log.batch_id=ramses.tasks.batch_id
JOIN protocase.tblusers on ramses.batch_log.userid = protocase.tblusers.userid
WHERE DATE(ramses.batch_log.start_time) > "2011-01-01"
AND protocase.tblusers.active = 1
GROUP BY userid, batches.operation_id, start_time
ORDER BY start_time, userid ASC
The cross join was causing the problem.
No, in general a Having clause is used to filter the results of your Group by - for example, only reporting those who were paid for more than 24 hours in a day (HAVING SUM(ramses.timesheet_detail.paidTime) > 24). Unless you need to perform filtering of aggregate results, you shouldn't need a having clause at all.
Most of those conditions should be moved into a where clause, or as part of the joins, for two reasons - 1) Filtering should in general be done as soon as possible, to limit the work the query needs to perform. 2) If the filtering is already done, restating it may cause the query to perform additional, unneeded work.
From what I've seen so far, it appears that you're trying to roll things up by the day - try changing the last column in the group by clause to date(ramses.batch_log.start_time), or you're grouping by (what I assume is) a timestamp.
EDIT:
About schema names - yes, you can name them in the from and join sections. Often, too, the query may be able to resolve the needed schemas based on some default search list (how or if this is set up depends on your database).
Here is how I would have reformatted the query:
SELECT tblusers.userid, operations.name AS name,
SUM(TIME_TO_SEC(batch_log.time_elapsed)) AS time_elapsed,
SUM(tasks.estimated_nonrecurring + tasks.estimated_recurring) AS total_estimated,
SUM(timesheet_detail.paidTime) as hours_paid,
DATE(start_time) as date_paid
FROM tblusers
JOIN batch_log
ON tblusers.userid = batch_log.userid
AND DATE(batch_log.start_time) >= "2011-01-01"
JOIN batches
ON batch_log.batch_id = batches.id
JOIN operations
ON operations.id = batches.operation_id
JOIN tasks
ON batches.id = tasks.batch_id
JOIN timesheet_detail
ON tblusers.userid = timesheet_detail.userid
AND batch_log.start_time = timesheet_detail.for_day
AND DATE(timesheet_detail.for_day) = DATE(start_time)
WHERE tblusers.departmentid = 8
GROUP BY tblusers.userid, name, DATE(batch_log.start_time)
ORDER BY date_paid ASC
Of particular concern is the batch_log.start_time = timesheet_detail.for_day line, which is comparing (what are implied to be) timestamps. Are these really equal? I expect that one or both of these should be wrapped in a date() function.
As for why you may be getting unexpected data - you appear to have eliminated some of your join conditions. Without knowing the exact setup and use of your database, I cannot give the exact reason for your results (or even able to say they are wrong), but I think the fact that you join to the operations table without any join condition is probably to blame - if there are 2 records in that table, it will double all of your previous results, and it looks like there may be 12. You also removed operations.name from the group by clause, which may or may not give you the results you want. I would look into the rest of your table relationships, and see if there are any further restrictions that need to be made.