MySQL: How to speed up that join? - mysql

I have 2 tables and I need to join them. Unfortunately, I don't have ids that I could use; the only criteria are some varchar columns. This is the "on" part of the join:
join sales_flat_order_address sfoa on
concat(lower(trim(sfoa.firstname)), lower(trim(sfoa.lastname))) = concat(lower(trim(so.firstname)), lower(trim(so.lastname)))
I know that this is not the fastest way, but is there a way to speed this up a bit more? Or maybe a workaround I can't think of right now?
Thanks!

Indexing firstname and lastname columns would be a start.
First of all, though, I would run your query with EXPLAIN in front of it to check whether the indexes you already have, and assume are being used, are actually being used.
Secondly, joining on varchar columns is never going to be 'super' quick compared to joining on an int, for example.
http://dev.mysql.com/doc/refman/5.5/en/explain.html

You could try to use this:
ON trim(sfoa.firstname) = trim(so.firstname)
AND trim(sfoa.lastname) = trim(so.lastname)
Of course, you could also try to index firstname and lastname in both tables.

You could add an "and" between the first name and last name comparisons to avoid the concat. You can also use indexes on these columns (firstname and lastname) to speed things up a lot (especially if you use the comparison a lot).
join sales_flat_order_address sfoa on
trim(sfoa.firstname) = trim(so.firstname) and trim(sfoa.lastname) = trim(so.lastname)

This query is essentially doing a cross join on the table, and then matching the condition. To fix this, you could:
Add a new column into each table called full name.
Give this the value of something like:
concat(lower(trim(firstname)), lower(trim(lastname))).
Build an index on the value.
Actually, you could do this on only one of the tables, and MySQL will use the index for the comparison (still requiring a full table scan on the first table).
You could also have a "full names" table, and use a foreign key from each of these tables to get the full name.
Indexing the names independently won't affect the query. The names are being accessed inside functions, which generally turns off the ability to use an index.
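The precomputed-column idea above can be sketched as follows. This is a minimal illustration, not the poster's actual schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the column name `fullname_key`, the helper `norm`, and the sample rows are all assumptions, and Python's `strip()` removes all surrounding whitespace whereas MySQL's `TRIM()` removes only spaces by default.

```python
import sqlite3

def norm(first, last):
    # Hypothetical helper: same idea as the original ON clause
    # (trim, lower-case, concatenate), computed once at write time.
    return first.strip().lower() + last.strip().lower()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE so   (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, fullname_key TEXT);
    CREATE TABLE sfoa (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, fullname_key TEXT);
""")

# Made-up sample data with inconsistent case and padding.
orders = [(1, " Ann ", "Lee"), (2, "Bob", "Stone")]
addresses = [(10, "ann", "LEE "), (11, "Carl", "Young")]
conn.executemany("INSERT INTO so VALUES (?, ?, ?, ?)",
                 [(i, f, l, norm(f, l)) for i, f, l in orders])
conn.executemany("INSERT INTO sfoa VALUES (?, ?, ?, ?)",
                 [(i, f, l, norm(f, l)) for i, f, l in addresses])

# Index the stored key so the join compares plain column values.
conn.execute("CREATE INDEX idx_sfoa_fullname_key ON sfoa (fullname_key)")

matches = conn.execute("""
    SELECT so.id, sfoa.id
    FROM so
    JOIN sfoa ON sfoa.fullname_key = so.fullname_key
    ORDER BY so.id
""").fetchall()
print(matches)
```

Because the join condition no longer wraps the columns in functions, the database is free to use the index on the stored key. The cost is that the key has to be kept in sync whenever a name changes.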

At a minimum, you should create an index on firstname and lastname in both tables. Also, run an explain on your query. That can show you inefficiencies.

Related

SQL way to index across tables?

Is there a way to create a multi column index across tables?
For example, if I had the following tables:
Foo (Table Name)
FooID (PK)
BarID (FK)
FooName
Bar (Table Name)
BarID (PK)
BarName
I can do a
SELECT *
FROM Foo
LEFT JOIN Bar ON Foo.BarID = Bar.BarID
WHERE
FooName < "Bob"
AND BarName > "Smith";
In this case, I want a multi column index against Foo.FooName then Bar.BarName.
I did some research but wasn't able to find anything, perhaps I'm not using the right terms. My question may depend on the SQL engine, in which case I'm interested in MySQL specifically, but I am interested in any other engines as well.
Doing the multi column index on Foo with the Foreign Key doesn't help, as the underlying value of its Name is what I want for the speed.
Came across my own post years later and figured I could add some details, in case others have similar issues. As Mark B pointed out, each index is per table; however, we can set things up to make this efficient, see below.
There are a couple of different things going on here, so we can use indexes to help accomplish what we need. We need an index to help filter the main table, then an index that works well for the join and filter of the 2nd table. To help accomplish this, we can create the following 2 indexes:
CREATE INDEX idx_fooname ON Foo (FooName);
CREATE INDEX idx_barid_barname ON Bar (BarID, BarName);
Once those indexes are in place, a query can be used like:
SELECT *
FROM Foo USE INDEX(idx_fooname)
LEFT JOIN Bar USE INDEX (idx_barid_barname) ON Foo.BarID = Bar.BarID
WHERE
FooName < "Bob"
AND BarName > "Smith";
Smells like "over-normalization". Might it be worth moving those two fields into the same table?
Akiban was an Engine that could do cross-table JOINs, etc. But it no longer exists.
"Materialized Views" do not exist in MySQL (unless you implement them yourself).
As xQbert mentioned, you can use materialized views (or indexed views if you use Microsoft SQL Server).
But first you have to change your LEFT JOIN to an INNER JOIN, because materialized/indexed views can't handle outer joins. The left join does not seem to make sense in your query anyway, because the condition WHERE ... BarName > "Smith" discards every row that has no matching Bar.
This link might help you if you use SQL Server: https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/
After you have created a materialized/indexed view, you can query the view directly (I'm not sure if the query optimizer will use it automatically).
Be aware that the materialized/indexed view will reduce your performance when you INSERT, DELETE or UPDATE rows in the underlying tables (as every index does, too). The best idea is to add only the really necessary fields to the materialized/indexed view.
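The point about the LEFT JOIN can be checked with a small sketch. It assumes the Foo/Bar layout from the question, uses Python's built-in sqlite3 module rather than MySQL, and the sample rows are made up for illustration. A Foo row without a matching Bar gets a NULL BarName, and NULL > 'Smith' is not true, so the WHERE clause removes it either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bar (BarID INTEGER PRIMARY KEY, BarName TEXT);
    CREATE TABLE Foo (FooID INTEGER PRIMARY KEY, BarID INTEGER, FooName TEXT);
    INSERT INTO Bar VALUES (1, 'Taylor'), (2, 'Jones');
    -- FooID 3 has no Bar at all, so a LEFT JOIN yields BarName = NULL for it.
    INSERT INTO Foo VALUES (1, 1, 'Alice'), (2, 2, 'Adam'), (3, NULL, 'Amy');
""")

query = """
    SELECT Foo.FooID, Bar.BarName
    FROM Foo {join} Bar ON Foo.BarID = Bar.BarID
    WHERE Foo.FooName < 'Bob' AND Bar.BarName > 'Smith'
    ORDER BY Foo.FooID
"""

left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

# Both variants return the same rows because the filter on Bar.BarName
# rejects the NULL-extended rows that only the LEFT JOIN produces.
print(left_rows == inner_rows)
```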

Optimized SELECT query in MySQL

I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume, when I use SELECT query with WHERE clause, it loops through the very first row until it matches my requirement.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id which I want to search for is 53750. What I need is: the search can start from 50000 so that it can save time for searching 49999 rows.
I don't know the exact term since I am not expert of SQL!
You need to create an index : http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Combined with an index on the id field, that's the fastest way to do it, as far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
would be a great first step if the index has not been generated automatically. You can also use:
EXPLAIN <your query here>
to see which form of the query works best in this case. Note that if you later change the WHERE clause to filter on some other column that keeps recurring in your queries, it is a good idea to put an index on that column as well.
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, such as range and WHERE clause optimization. The documentation is pretty informative about this; read the section "Optimizing SELECT Statements". Creating an index on the column you evaluate is very helpful for performance too.
One possible solution: you can create a view and then query the view. Here are details on creating a view and reading data from it:
http://www.w3schools.com/sql/sql_view.asp
Now split that huge number of rows across many views (i.e. rows 1-10000 in one view, rows 10001-20000 in another),
then query the relevant view.
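To see what an index changes for a lookup by id, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in rather than MySQL itself: SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN, and the `payload` column, the row count, and the `plan` helper are assumptions made for the example. The names `table_1` and `index_1` follow the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is deliberately not declared as a primary key, so no index exists yet.
conn.execute("CREATE TABLE table_1 (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 100001)),
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM table_1 WHERE id = 53750"

plan_before = plan(lookup)   # describes a scan of the whole table
conn.execute("CREATE INDEX index_1 ON table_1 (id)")
plan_after = plan(lookup)    # describes a search that uses index_1

print(plan_before)
print(plan_after)
```

With the index in place the engine goes to the matching entry directly instead of reading every row, which is the "jump to a particular row" behaviour the question asks for.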
I am pretty sure that any SQL database with a little self-respect does not start looping from the first row to find the desired row. But I am not sure how they make it work, so I can't give an exact answer.
You could check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, for example one using a numeric data type? Do you have indexes on the other columns that are used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server, and how to configure file growth. There's a lot you can do to tune your server.
You could try and split your tables in partitions
More about alter tables to add partitions
Selecting from a specific partition
In your case you could create a partition on ID for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2. How to do this is explained quite well in the MySQL documentation.
You may try something as simple as this:
query = "SELECT * FROM tblname LIMIT 50000, 1"
I just tried it with phpMyAdmin. The 50000 is the offset of the starting row, and the second number is how many rows to return (a count of 0 returns nothing).
EDIT:
But if I were you I wouldn't use this one, because the server still has to step through records 1 to 49999 to reach the starting row.

What do the dots mean in this SQL query?

I'm new to MySQL. Can anyone explain the lines below, which I got from the jqGrid demo? What is the meaning of a.id? What is the meaning of these dots?
$SQL = "SELECT a.id, a.invdate, b.name, a.amount,a.tax,a.total,a.note FROM invheader a, clients b WHERE a.client_id=b.client_id ORDER BY $sidx $sord LIMIT $start , $limit";
You can find the example here:
http://trirand.com/blog/jqgrid/jqgrid.html
in the advanced>Multi select
You've asked several questions here. To address the dots:
In the FROM clause, a is used as an alias for the invheader table. This means you can reference that table by the short alias a instead of the full table name.
Therefore, a.id refers to the id column of the invheader table.
It is generally considered bad practice to simply give your tables the aliases a, b, c, etc. and I would recommend you use something more useful.
I suggest you read some basic MySQL tutorials, as this is a fundamental principle.
The dot (.) separates a name from its enclosing scope. So Songs.songId means: first find the table named Songs, and then find the field named songId in that table.
In my opinion, dot notation reads from left to right: a.id means you fetch data from the table "a" (here an alias name), and then ".id" selects the column id from that table. If this is wrong, please comment on the wrong statement. Thank you.
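A small sketch of the alias-and-dot notation, assuming a cut-down version of the invheader and clients tables (only the columns needed here, with made-up rows) and using Python's built-in sqlite3 module instead of MySQL. The two queries differ only in whether the short aliases or the full table names appear before the dot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invheader (id INTEGER PRIMARY KEY, client_id INTEGER, amount REAL);
    INSERT INTO clients VALUES (1, 'Acme');
    INSERT INTO invheader VALUES (100, 1, 25.0);
""")

# "invheader a" declares a as an alias; a.id is then "column id of invheader".
aliased = conn.execute("""
    SELECT a.id, b.name, a.amount
    FROM invheader a, clients b
    WHERE a.client_id = b.client_id
""").fetchall()

# The same query with the full table names in front of each dot.
spelled_out = conn.execute("""
    SELECT invheader.id, clients.name, invheader.amount
    FROM invheader, clients
    WHERE invheader.client_id = clients.client_id
""").fetchall()

print(aliased == spelled_out)
```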

Do i really need to include table names or AS in JOINS if columns are different?

I noticed the other day that I can do joins in MySQL just as easily with
SELECT peeps, persons, friends FROM tablea JOIN tableb USING (id) WHERE id = ?
instead of
SELECT a.peeps, a.persons, b.friends FROM tablea a JOIN tableb b USING (id) WHERE id = ?
It only works if there are no matching column names. Why should I use the second form rather than the first?
No, you don't need to, but in my humble opinion you really should. It's almost always better in my experience to be explicit with what you're trying to do.
Consider the feelings of the poor guy (or girl) who has to come behind you and try to figure out what you were trying to accomplish and in which tables each column resides. Explicitly stating the source of the column allows one to look at the query and glean that information without deep knowledge of the schema.
Query 1 will work (as long as there are no ambiguous column names).
Query 2 will:
- be clearer
- be more maintainable (think of someone who doesn't know the database schema by heart)
- survive the addition of an ambiguous column name to one of the tables
So, don't be lazy because of that pitiful few saved keystrokes.
It's not necessary if you have no duplicate column names. If you do, the query will fail.
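The failure mode described above can be reproduced with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (the exact error message differs between engines), and the table contents plus the later-added duplicate column are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, peeps TEXT, persons TEXT);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, friends TEXT);
    INSERT INTO tablea VALUES (1, 'p', 'q');
    INSERT INTO tableb VALUES (1, 'f');
""")

unqualified = "SELECT peeps, persons, friends FROM tablea JOIN tableb USING (id) WHERE id = ?"
qualified = "SELECT a.peeps, a.persons, b.friends FROM tablea a JOIN tableb b USING (id) WHERE id = ?"

# Works while every selected column name exists in only one table.
rows_before = conn.execute(unqualified, (1,)).fetchall()

# Later schema change: tableb gains a column with a name tablea already uses.
conn.execute("ALTER TABLE tableb ADD COLUMN peeps TEXT")

try:
    conn.execute(unqualified, (1,)).fetchall()
    unqualified_error = None
except sqlite3.OperationalError as exc:
    unqualified_error = str(exc)   # the engine cannot tell which peeps is meant

# The qualified query is unaffected by the new column.
rows_after = conn.execute(qualified, (1,)).fetchall()

print(unqualified_error)
print(rows_after)
```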

MYSQL: How to refer to the 1st column

SELECT first_name FROM user WHERE FIRST_COLUMN = '10'
What I need to know is how to reference "FIRST_COLUMN" in MySQL syntax. The first column can have any name, so I need the query to be flexible and always use the 1st column of whatever table it runs against. Thanks.
The short answer is : you can't. (in a portable way, there are 'tricks' to look up the name of the first column and similar workarounds)
Many projects use as convention to name the first column ID and use that as the primary key.
From the query it looks like you are using it as a primary key.
I recommend reading an introduction to relational databases as this is a rather strange request in the context of a relational database.
The relational model does not care one tiny little bit what order columns are in within a table, nor (without an ordering clause) what order rows are returned.
Your requirement makes little sense. What if the first column were a varchar or a date?
The whole point of having named columns is that you reference them by name.
Now DBMS' often contain metadata in system tables, like DB2's sysibm.systables and sysibm.syscolumns, but you need to extract not just the names but all the other metadata as well (column type, size, nullable, and so on) in order to use them properly. We'd probably understand better what you were after if you told us the reason behind doing this.
SELECT COLUMN_NAME FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'tablename'
AND ORDINAL_POSITION = 1;
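The metadata lookup above gives you a column name, which then has to be spliced into a second query. A minimal sketch of that two-step approach follows. It runs against SQLite through Python's built-in sqlite3 module, so `PRAGMA table_info` is swapped in for MySQL's information_schema.COLUMNS, and the `user` table layout, the sample rows, and the `first_column` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id TEXT, first_name TEXT);
    INSERT INTO user VALUES ('10', 'Dana'), ('11', 'Eli');
""")

def first_column(table):
    # PRAGMA table_info returns one row per column as (cid, name, type, ...);
    # the column with the smallest cid is the first one declared.
    # `table` must come from trusted code: identifiers cannot be bound as parameters.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return min(rows, key=lambda r: r[0])[1]

col = first_column("user")

# Step 2: build the real query around the looked-up column name.
# The value is still passed as a bound parameter.
names = conn.execute(
    f'SELECT first_name FROM user WHERE "{col}" = ?', ("10",)
).fetchall()

print(col, names)
```

As the answers point out, this is a workaround rather than something the relational model encourages; naming the column explicitly is usually the better design.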