mysql group by performance tuning - mysql

select field1,count(*) from table where $condition group by field1
select field2,count(*) from table where $condition group by field2
Basically that's what I'm doing the job now,is there a way to optimize the performance so that MySQL doesn't need to search two times to group by for the where clause?

If the table is large and $condition is 'rare', it might help to create a temporary table in memory. This way, you group it twice but filter it only once.
CREATE TEMPORARY TABLE temp ENGINE=MEMORY
select field1,field2 from table where $condition;
select field1,count(*) from temp group by field1;
select field1,count(*) from temp group by field2;

No magic bullet here... the key upon which the aggregation takes place is distinct, so SQL needs to iterated over two different lists. To make this a bit more evident, ask yourself how you would like the data returned: all field1 first, then all field 2, or intertwined, or possibly "pivoted" (but how?..)
To avoid an extra "trip" to the server we could get these two results set returned together, or we could even group the two, using UNION ALL (and being careful to add a prefix column to to know what is what), but this latter solution would end up taxing the server a bit more if anything.

Related

Pull records from one table where 1 variable exists in a second table? Very large tables

I am completely new to database coding, and I've tried Googling but cannot seem to figure this out. I imagine there's a simple solution. I have a very large table with MemberIDs and a few other relevant variables that I want to pull records from (table1). I also have a second large table of distinct MemberIDs (table2). I want to pull rows from table 1 where the MemberID exists in table2.
Here’s how I tried to do it, and for some reason I suspect this isn’t working correctly, or there may be a much better way to do this.
proc sql;
create table tablewant as select
MemberID, var1, var2, var3
from table1
where exists (select MemberID from table2)
;
quit;
Is there anything wrong with the way I’m doing this? What's the best way to solve this when working with extremely large tables (over 100 million records)? Would doing some sort of join be better? Also, do I need to change
where exists (select MemberID from table2)
to
where exists (select MemberID from table2 where table1.MemberID = table2.MemberID)
?
You want to implement a "semi-join". You second solution is correct:
select MemberID, var1, var2, var3
from table1
where exists (
select 1 from table2 where table1.MemberID = table2.MemberID
)
Notes:
There's no need to select anything special in the subquery since it's not checking for values, but for row existence instead. For example, 1 will do, as well as *, or even null. I tend to use 1 for clarity.
The query needs to access table2 and this should be optimized specially for such large tables. You should consider adding the index below, if you haven't created it already:
create index ix1 on table2 (MemberID);
The query does not have a filtering criteria. That means that the engine will read 100 million rows and will check each one of them for the matching rows in the secondary table. This will unavoidably take a long time. Are you sure you want to read them all? Maybe you need to add a filtering condition, but I don't know your requirements in this respect.

Subquery for fetching table name

I have a query like this :
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo WHERE group='BEST'
Basically, the table name for the main query is fetched through the subquery.
I get an error that #1054 - Unknown column 'group' in 'where clause'
When I investigate (removing the where clause), I find that the query only returns the subquery result at all times.
Subquery table adm_linkedfields has structure id | name | linktable
Currently am using MySQL with PDO but the query should be compatible with major DBs (viz. Oracle, MSSQL, PgSQL and MySQL)
Update:
The subquery should return the name of the table for the main query. In this case it will return tbl_company
The table tbl_company for the main query has this structure :
id | name | group
Thanks in advance.
Dynamic SQL doesn't work like that, what you created is an inline-view, read up on that. What's more, you can't create a dynamic sql query that will work on every db. If you have a limited number of linktables you could try using left-joins or unions to select from all tables but if you don't have a good reason you don't want that.
Just select the tablename in one query and then make another one to access the right table (by creating the query string in php).
Here is an issue:
SELECT * FROM (SELECT linktable FROM adm_linkedfields WHERE name = 'company') as cbo
WHERE group='BEST';
You are selecting from DT which contains only one column "linktable", then you cant put any other column in where clause of outer block. Think in terms of blocks the outer select is refering a DT which contains only one column.
Your problem is similar when you try to do:
create table t1(x1 int);
select * from t1 where z1 = 7; //error
Your query is:
SELECT *
FROM (SELECT linktable
FROM adm_linkedfields
WHERE name = 'company'
) cbo
WHERE group='BEST'
First, if you are interested in cross-database compatibility, do not name columns or tables after SQL reserved words. group is a really, really bad name for a column.
Second, the from clause is returning a table containing a list of names (of tables, but that is irrelevant). There is no column called group, so that is the problem you are having.
What can you do to fix this? A naive solution would be to run the subquery, run it, and use the resulting table name in a dynamic statement to execute the query you want.
The fundamental problem is your data structure. Having multiple tables with the same structure is generally a sign of a bad design. You basically have two choices.
One. If you have control over the database structure, put all the data in a single table, linktable for instance. This would have the information for all companies, and a column for group (or whatever you rename it). This solution is compatible across all databases. If you have lots and lots of data in the tables (think tens of millions of rows), then you might think about partitioning the data for performance reasons.
Two. If you don't have control over the data, create a view that concatenates all the tables together. Something like:
create view vw_linktable as
select 'table1' as which, t.* from table1 t union all
select 'table2', t.* from table2 t
This is also compatible across all databases.

How can I optimize large table in mysql?

I have a table with nearly 30 M records and size is 6.6 GB. I need to query some data from it and use group by and order by. It takes me too long to query the data, I lost connection to DB so many times...
I have index on all necessary fields as key and composite key. What else can I do to make it faster for the query?
Example query:
select id, max(price), avg(order) from table group by id, date order by id, location.
use EXPLAIN query, where query is your query. For example: EXPLAIN select * from table group by id, date order by id, location.
You'll see a table where MySQL analyses your query and shows which indices it looks for. Possibly you don't have sufficient (god enough) indices.
I don't think you can. With no filter (WHERE clause) and AVG the entire tables has to be read.
The only thing I can think of is to have a new table with ID, AVG_ORDER, MAX_PRICE (or whatever you need) and update that using a trigger or stored procedure when you insert/update new rows.
an index on ID,PRICE index might help you if you didn't need that pesky average.
Indexing isn't going to do you any good. You're averaging a column, so you have to read every row in the table. That's going to take time.

MYSQL: how to find entries corresponding to MIN() of costly function

I am running a complicated and costly query to find the MIN() values of a function grouped by another attribute. But I don't just need the value, I need the entry that produces it + the value.
My current pseudoquery goes something like this:
SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att;
but I want a.* and MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) as my result.
The only way I can think of is using this ugly beast:
SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2)
FROM (prefiltering) as a1
WHERE COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) =
(SELECT MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) FROM (prefiltering) as a GROUP BY a.group_att)
But now I am executing the prefiltering_query 2 times and have to run the costly function twice. This is ridiculous and I hope that I am doing something seriously wrong here.
Possible solution?:
Just now I realize that I could create a temporary table containing:
(SELECT a1.*, COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) as complex FROM (prefiltering) as a1)
and then run the MIN() as subquery and compare it at greatly reduced cost. Is that the way to go?
A problem with your temporary table solution is that I can't see any way to avoid using it twice in the same query.
However, if you're willing to use an actual permanent table (perhaps with ENGINE = MEMORY), it should work.
You can also move the subquery into the FROM clause, where it might be more efficient:
CREATE TABLE temptable ENGINE = MEMORY
SELECT a1.*,
COSTLY_FUNCTION(a1.att1,a1.att2,$v1,$v2) AS complex
FROM prefiltering AS a1;
CREATE INDEX group_att_complex USING BTREE
ON temptable (group_att, complex);
SELECT a2.*
FROM temptable AS a2
NATURAL JOIN (
SELECT group_att, MIN(complex) AS complex
FROM temptable GROUP BY group_att
) AS a3;
DROP TABLE temptable;
(You can try it without the index too, but I suspect it'll be faster with it.)
Edit: Of course, if one temporary table won't do, you could always use two:
CREATE TEMPORARY TABLE temp1
SELECT *, COSTLY_FUNCTION(att1,att2,$v1,$v2) AS complex
FROM prefiltering;
CREATE INDEX group_att_complex ON temp1 (group_att, complex);
CREATE TEMPORARY TABLE temp2
SELECT group_att, MIN(complex) AS complex
FROM temp1 GROUP BY group_att;
SELECT temp1.* FROM temp1 NATURAL JOIN temp2;
(Again, you may want to try it with or without the index; when I ran EXPLAIN on it, MySQL didn't seem to want to use the index for the final query at all, although that might be just because my test data set was so small. Anyway, here's a link to SQLize if you want to play with it; I used CONCAT() to stand in for your expensive function.)
You can use the HAVING clause to get columns in addition to that MIN value. For example:
SELECT a.*, COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2) FROM (prefiltering) as a GROUP BY a.group_att HAVING MIN(COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2)) = COSTLY_FUNCTION(a.att1,a.att2,$v1,$v2);

What is the most efficient MySQL query to find all entries that start with a number?

In a database that has over 1 million entries, occasionally we need to find all rows that have a column name that starts with a number.
This is what currently is being used, but it just seems like there may be a more efficient manner in doing this.
SELECT * FROM mytable WHERE name LIKE '0%' OR name LIKE '1%' OR name ...
etc...
Any suggestions?
select * from table where your_field regexp '^[0-9]'
Hey,
you should add an index with a length of 1 to the field in the db. The query will then be significantly faster.
ALTER TABLE `database`.`table` ADD INDEX `indexName` ( `column` ( 1 ) )
Felix
My guess is that the indexes on the table aren't being used efficiently (if at all)
Since this is a char field of some type, and if this is the primary query on this table, you could restructure your indexes (and my mysql knowledge is a bit short here, somebody help out) such that this table is ordered (clustered index in ms sql) by this field, thus you could say something like
select * from mytable where name < char(57) and name > char(47)
Do some testing there, I'm not 100% on the details of how mysql would rank those characters, but that should get you going.
Another option is to have a new column that gives you a true/false on "starts_with_number". You could setup a trigger to populate that column. This might give the best and most predictable results.
If you're not actually using each and every field in the rows returned, and you really want to wring every drop of efficiency out of this query, then don't use select *, but instead specify only the fields you want to process.
I'm thinking...
SELECT * FROM myTable WHERE IF( LEFT( name, 1) = '#', 1,0)