Counting rows in multiple tables causes large delay - mysql

I have 3 tables with mainly string data and a unique id column:
categories ~45 rows
clientfuncs ~800 rows
serverfuncs ~600 rows
All tables have a unique auto-increment primary key column 'id'.
I try to count the rows in one query:
SELECT COUNT(categories.id), COUNT(serverfuncs.id), COUNT(clientfuncs.id) FROM categories, serverfuncs, clientfuncs
It takes 1.5 - 1.7 s.
And when I try
SELECT COUNT(categories.id), COUNT(serverfuncs.id) FROM categories, serverfuncs
or
SELECT COUNT(categories.id), COUNT(clientfuncs.id) FROM categories, clientfuncs
or
SELECT COUNT(clientfuncs.id), COUNT(serverfuncs.id) FROM clientfuncs, serverfuncs
each of these takes 0.005 - 0.01 s (as it should).
Can someone explain the reason for this?

You're doing a cross join of 45*800*600 rows; you'll notice that when you check the result of the counts :-)
Try this instead:
SELECT
(SELECT COUNT(*) FROM categories),
(SELECT COUNT(*) FROM serverfuncs),
(SELECT COUNT(*) FROM clientfuncs);
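To see the difference in the results as well as in the timing, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL and uses much smaller tables than in the question; only the table names come from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the tables are scaled down
# to 4, 6 and 8 rows instead of ~45, ~600 and ~800.
conn = sqlite3.connect(":memory:")
sizes = {"categories": 4, "serverfuncs": 6, "clientfuncs": 8}
for table, n in sizes.items():
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.executemany(
        f"INSERT INTO {table} (id) VALUES (?)",
        [(i,) for i in range(1, n + 1)],
    )

# Comma-separated tables form a cross join, so every COUNT sees 4 * 6 * 8 rows.
cross_counts = conn.execute(
    "SELECT COUNT(categories.id), COUNT(serverfuncs.id), COUNT(clientfuncs.id) "
    "FROM categories, serverfuncs, clientfuncs"
).fetchone()

# Scalar subqueries count each table on its own.
separate_counts = conn.execute(
    "SELECT (SELECT COUNT(*) FROM categories), "
    "(SELECT COUNT(*) FROM serverfuncs), "
    "(SELECT COUNT(*) FROM clientfuncs)"
).fetchone()

print(cross_counts)     # (192, 192, 192)
print(separate_counts)  # (4, 6, 8)
```

With the real table sizes the cross join has to walk roughly 21.6 million combined rows, which is where the 1.5 s goes.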

The queries compute a Cartesian product, since no join condition is applied:
query 1: 800*600*45 = 21.6 million rows
query 2: 45*600 = 27,000 rows
query 3: 45*800 ...

It's because your query is joining the tables (the commas in the last part of the query are shorthand for a join) rather than counting them individually. So your queries with only two tables will be quicker.

First of all, do you really want to use three tables in the FROM clause to compute counts that are specific to each table? This makes the SELECT statement produce a Cartesian product of the three tables, 45 x 800 x 600 rows in total, and the counts are computed from that. Hence each categories.id value is counted many times over, and the same goes for the other counts. If you use only the first two tables in the FROM clause, the Cartesian product contains only 45 x 800 rows, far fewer than the three tables produce. Hence the queries with two tables are much faster. Primary keys are of no use in this case.
Better to use three separate statements to get the count from each table.
If you still insist on getting the counts in one shot, you may use the following syntax:
SELECT (SELECT COUNT(categories.id) FROM categories),
(SELECT COUNT(serverfuncs.id) FROM serverfuncs),
(SELECT COUNT(clientfuncs.id) FROM clientfuncs);
if your RDBMS supports SELECT statements without a FROM clause. This gives the correct counts and should be very fast.

Related

Mysql join query with where condition and distinct records

I have two tables called tc_revenue and tc_rates.
tc_revenue contains: code, revenue, startDate, endDate
tc_rate contains: code, tier, payout, startDate, endDate
Now I need to get the records where code = 100, and the records should be unique.
I have used this query:
SELECT *
FROM task_code_rates
LEFT JOIN task_code_revenue ON task_code_revenue.code = task_code_rates.code
WHERE task_code_rates.code = 105;
But I am getting repeated records. Please help me find the correct solution.
E.g., in this example every record is repeated 2 times.
Thanks
Use a GROUP BY on whatever fields you need to be unique. For example, if you want one row per code and tier, then:
SELECT * FROM task_code_rates LEFT JOIN task_code_revenue ON task_code_revenue.code = task_code_rates.code
WHERE task_code_rates.code = 105
GROUP BY task_code_rates.code, task_code_rates.tier
If code admits duplicates in both tables and you join using only code, then you will get the Cartesian product of all matching rows from one table and all matching rows from the other.
If you have 5 records with code 100 in the first table and 2 records with code 100 in the second table, you'll get 5 times 2 results: all combinations of matching rows from the left and the right.
Unless you have duplicates inside one (or both) tables, all 10 results will differ in the columns coming from one table, the other, or both.
But if you were expecting to get two combined rows plus three rows from the first table with NULLs for the second table's columns, this will not happen.
This is how joins work. And anyway, how should the database decide which rows to combine if it didn't generate all combinations and let you decide in the WHERE clause?
Maybe you need to add more criteria to the ON clause, such as also matching dates?
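As a rough illustration of the 5 x 2 case and of what an extra ON criterion does, here is a minimal sketch. It assumes SQLite (Python's sqlite3 module) instead of MySQL; the table names rates and revenue, the sample dates and the idea of matching on startDate are hypothetical and not taken from the question's real data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (code INTEGER, tier INTEGER, startDate TEXT)")
conn.execute("CREATE TABLE revenue (code INTEGER, revenue REAL, startDate TEXT)")

# 5 rows with code 100 on the left, 2 rows with code 100 on the right.
conn.executemany(
    "INSERT INTO rates VALUES (100, ?, ?)",
    [(i, f"2020-01-0{i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO revenue VALUES (100, ?, ?)",
    [(10.0, "2020-01-01"), (20.0, "2020-01-02")],
)

# Joining on code alone pairs every left row with every right row.
code_only = conn.execute(
    "SELECT * FROM rates LEFT JOIN revenue ON revenue.code = rates.code "
    "WHERE rates.code = 100"
).fetchall()

# Also matching the dates leaves one result row per left row.
code_and_date = conn.execute(
    "SELECT * FROM rates LEFT JOIN revenue "
    "ON revenue.code = rates.code AND revenue.startDate = rates.startDate "
    "WHERE rates.code = 100"
).fetchall()

# Column 3 is revenue.code; it is NULL (None) where no revenue row matched.
unmatched = [row for row in code_and_date if row[3] is None]

print(len(code_only))      # 10
print(len(code_and_date))  # 5
print(len(unmatched))      # 3
```

Whether dates are the right extra criterion depends on what the rows in the two tables actually mean.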

Specific where clause in Mysql query

So I have a MySQL table with over 9 million records. They are call records. Each record represents 1 individual call. The columns are as follows:
CUSTOMER
RAW_SECS
TERM_TRUNK
CALL_DATE
There are others but these are the ones I will be using.
So I need to count the total number of calls for a certain week in a certain Term Trunk. I then need to sum up the number of seconds for those calls. Then I need to count the total number of calls that were below 7 seconds. I always do this in 2 queries and combine them, but I was wondering if there is a way to do it in one. I'm new to MySQL, so I'm sure my syntax is horrific, but here is what I do...
Query 1:
SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*)
FROM Mytable
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY CUSTOMER;
Query 2:
SELECT CUSTOMER, COUNT(*)
FROM Mytable2
WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') AND RAW_SECS < 7
GROUP BY CUSTOMER;
Is there any way to combine these two queries into one? Or maybe just a better way of doing it? I appreciate all the help!
There are 2 ways of achieving the expected outcome in a single query:
Conditional counting: use a CASE expression or the IF() function within COUNT() (or SUM()) to count only specific records.
Self join: left join the table to itself on the table's id field and, in the join condition, restrict the right-hand alias to calls shorter than 7 seconds.
The advantage of the 2nd approach is that you may be able to use indexes to speed it up, while the condition inside a conditional count cannot use an index.
SELECT m1.CUSTOMER, SUM(m1.RAW_SECS), COUNT(m1.CUSTOMER), COUNT(m2.CUSTOMER)
FROM Mytable m1
LEFT JOIN Mytable m2 ON m1.id = m2.id AND m2.RAW_SECS < 7
WHERE m1.TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2')
GROUP BY m1.CUSTOMER;
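The first approach, conditional counting, is not shown above. A minimal sketch of it follows, assuming SQLite (Python's sqlite3 module) in place of MySQL and a handful of made-up call records; the column names follow the question, while the customer names and the 'Othertrunk' value are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Mytable (CUSTOMER TEXT, RAW_SECS INTEGER, TERM_TRUNK TEXT)"
)
conn.executemany(
    "INSERT INTO Mytable VALUES (?, ?, ?)",
    [
        ("acme", 5, "Mytrunk1"),
        ("acme", 30, "Mytrunk2"),
        ("acme", 3, "Mytrunk1"),
        ("bolt", 10, "Mytrunk1"),
        ("bolt", 2, "Othertrunk"),  # filtered out by the WHERE clause
    ],
)

# One pass over the table: total seconds, total calls, and calls under 7 s.
# The CASE expression yields 1 for short calls and 0 otherwise, so SUM counts them.
result = conn.execute(
    "SELECT CUSTOMER, SUM(RAW_SECS), COUNT(*), "
    "SUM(CASE WHEN RAW_SECS < 7 THEN 1 ELSE 0 END) "
    "FROM Mytable "
    "WHERE TERM_TRUNK IN ('Mytrunk1', 'Mytrunk2') "
    "GROUP BY CUSTOMER ORDER BY CUSTOMER"
).fetchall()

print(result)  # [('acme', 38, 3, 2), ('bolt', 10, 1, 0)]
```

The same SELECT should carry over to MySQL unchanged, since CASE inside SUM is standard SQL; the week filter on CALL_DATE would go into the WHERE clause.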

How do I query for the sum of averages?

I have data that resemble stock data that is being updated every hour. So there are 24 entries every day for each stock. (just using stock as an example). But sometimes, the data may not be updated.
For example, let's assume we have 3 stocks, A, B, C. And assume that we gather data at various intervals during the day for each stock. The data would look something like this...
row A B C
1 3 4 5
2 3.5 4.1 5
3 2.9 3.8 4.3
What I want is to sum up the average value of each stock for this time period or
Avg(A) + Avg(B) + Avg(C)
In reality I have hundreds of stocks and hundreds of thousands of rows. I need this to calculate for a single day.
I tried this (stock names are in an array - stocks = array('A','B','C'))
SELECT SUM(AVG(stock_price)) FROM table WHERE date = [mydate] AND stock_name IN () ('".implode("','", $stocks)."') GROUP BY stock_name
but that didn't work. Can someone provide some insight?
Thanks, in advance.
Calculate the per-stock averages in a sub-query, then sum them in the main query.
SELECT SUM(average_price) AS total_averages
FROM (SELECT AVG(price) AS average_price
FROM table
WHERE <conditions>
GROUP BY stock_name) AS averages
One way to do it, use an inline view as a rowsource:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM ( SELECT AVG(t.stock_price) AS avg_stock_price
FROM table t
WHERE t.date = [mydate]
AND t.stock_name IN ('a','b','c')
GROUP BY t.stock_name
) a
You can run just the query from the inline view (aliased as a) to verify the results it returns. The outer query runs against the set of rows returned by the inline view query. (MySQL refers to the inline view (aliased as a) as a "derived table".)
The outer query is effectively like this:
SELECT SUM(a.avg_stock_price) AS sum_avg_stock_price
FROM a
The "trick" is that "a" isn't a regular table; it's a set of rows returned by a query. But in terms of relational algebra theory, it works the same: it's a set of rows. If a were a regular table, we could write:
SELECT b.col
FROM (
SELECT col FROM a
) b
We don't want to do that in MySQL when we don't have to, because of the inefficient way that MySQL processes that. MySQL first runs the inner query (the query in the inline view). MySQL creates a temporary MyISAM table, and inserts the rows returned by the query into the temporary MyISAM table. MySQL then runs the outer query, against that temporary table (which MySQL refers to as a "derived table") to return the result. Creating and populating a temporary table that's a copy of a regular table is a lot of overhead, especially with large sets.
What makes this powerful is that the inline view query can include JOINs, a WHERE clause, aggregates, GROUP BY, whatever. As long as it returns a set of rows (with appropriate column names), we can wrap the query in parens and reference it in another query as if it were a table.
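To check the approach against the sample data from the question, here is a minimal sketch. It assumes SQLite (Python's sqlite3 module) rather than MySQL; the table name prices and the column price_date are hypothetical, and one extra row on a different date is added only to show the date filter at work.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (stock_name TEXT, price_date TEXT, stock_price REAL)"
)
day = "2020-01-01"
rows = [
    ("A", day, 3.0), ("B", day, 4.0), ("C", day, 5.0),
    ("A", day, 3.5), ("B", day, 4.1), ("C", day, 5.0),
    ("A", day, 2.9), ("B", day, 3.8), ("C", day, 4.3),
    ("A", "2020-01-02", 100.0),  # other day, excluded by the WHERE clause
]
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)

# Inner query: one average per stock. Outer query: the sum of those averages.
total = conn.execute(
    "SELECT SUM(a.avg_stock_price) FROM ("
    "  SELECT AVG(t.stock_price) AS avg_stock_price"
    "  FROM prices t"
    "  WHERE t.price_date = ? AND t.stock_name IN ('A', 'B', 'C')"
    "  GROUP BY t.stock_name"
    ") a",
    (day,),
).fetchone()[0]

# Avg(A) + Avg(B) + Avg(C) = 9.4/3 + 11.9/3 + 14.3/3 = 35.6/3
print(round(total, 4))  # 11.8667
```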

Valid SQL without JOIN?

I came across the following SQL statement and I was wondering if it was valid:
SELECT COUNT(*)
FROM
registration_waitinglist,
registration_registrationprofile
WHERE
registration_registrationprofile.activation_key = "ALREADY_ACTIVATED"
What do the two tables separated by a comma mean?
When you SELECT data from multiple tables you obtain the Cartesian product of all the tuples from these tables.
This means you get each row from the first table paired with all the rows from the second table. Most of the time, it is not what you want. If you really want it, then it's clearer to use the CROSS JOIN notation:
SELECT * FROM A CROSS JOIN B;
In this context, it means that you are going to be joining every row from registration_waitinglist to every row in registration_registrationprofile.
It's called a Cartesian join.
That query is 'syntactically' correct, meaning it will run. What the query will return is the entire product of every row in registration_waitinglist x registration_registrationprofile.
For example, if there were 2 rows in waitinglist and 3 rows in profile, the product would contain 6 rows.
As a practical matter, this is almost always a logical error and not intended. With rare exceptions, there should be either join criteria or criteria in the WHERE clause.
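The 2 x 3 example can be reproduced with a minimal sketch, assuming SQLite (Python's sqlite3 module) in place of MySQL. The table names come from the question; the columns other than activation_key and the sample keys are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows and extra columns are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registration_waitinglist (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE registration_registrationprofile "
    "(id INTEGER PRIMARY KEY, activation_key TEXT)"
)
conn.executemany(
    "INSERT INTO registration_waitinglist (id) VALUES (?)", [(1,), (2,)]
)
conn.executemany(
    "INSERT INTO registration_registrationprofile (activation_key) VALUES (?)",
    [("ALREADY_ACTIVATED",), ("abc123",), ("def456",)],
)

# The comma form and the explicit CROSS JOIN build the same 2 x 3 product.
comma_count = conn.execute(
    "SELECT COUNT(*) FROM registration_waitinglist, "
    "registration_registrationprofile"
).fetchone()[0]
cross_count = conn.execute(
    "SELECT COUNT(*) FROM registration_waitinglist "
    "CROSS JOIN registration_registrationprofile"
).fetchone()[0]

# The WHERE clause only filters the product: 1 matching profile x 2 waitinglist rows.
filtered_count = conn.execute(
    "SELECT COUNT(*) FROM registration_waitinglist, "
    "registration_registrationprofile "
    "WHERE registration_registrationprofile.activation_key = ?",
    ("ALREADY_ACTIVATED",),
).fetchone()[0]

print(comma_count, cross_count, filtered_count)  # 6 6 2
```

So the count from the original statement is the number of activated profiles multiplied by the number of waiting-list rows, which is rarely what was meant.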

Speed up SELECT DISTINCT using keys

If I use the SELECT DISTINCT query on a table with 100 rows where 98 entries of the table are identical and the other 2 are identical, would it still go through all 100 rows just to return the 2 distinct results?
Is there a way to use indexing/keys etc so that instead of going through all 100 rows, it would instead go through 2 rows?
####EDIT#####
so I added this index:
KEY `column` (`column`(1)),
but then when I do
EXPLAIN SELECT DISTINCT column FROM tablename
it's still saying that it's going through all rows rather than just distinct ones
Creating an index on the column or set of columns being queried with DISTINCT can speed up the query. Rather than looking through every row, the server may be able to read just the distinct entries from the index. With only 100 rows, though, the difference may not even be detectable.
I am working on almost the same thing. I am trying to get the distinct values from a table with 400 million rows.
I even have a key on that attribute. It is still doing a full scan; the only difference is that it is a full index scan rather than a full table scan.
I have only 10 distinct values, but the query had not returned after 5 minutes, so I killed it.