Is there a way to create a multi column index across tables?
For example, if I had the following tables:
Foo (Table Name)
FooID (PK)
BarID (FK)
FooName
Bar (Table Name)
BarID (PK)
BarName
I can do a
SELECT *
FROM Foo
LEFT JOIN Bar ON Foo.BarID = Bar.BarID
WHERE
FooName < "Bob"
AND BarName > "Smith";
In this case, I want a multi column index against Foo.FooName then Bar.BarName.
I did some research but wasn't able to find anything, perhaps I'm not using the right terms. My question may depend on the SQL engine, in which case I'm interested in MySQL specifically, but I am interested in any other engines as well.
Doing the multi column index on Foo with the Foreign Key doesn't help, as the underlying value of its Name is what I want for the speed.
Came across my own post years later and figured I could add some details, in case others have similar issues. As Mark B pointed out, each index belongs to a single table; however, we can set things up so that the query is still efficient, see below.
There are a couple of different things going on here, so we can use separate indexes for each of them. We need one index to help filter the main table, then another index that works well for both the join and the filter on the second table. To accomplish this, we can create the following 2 indexes:
CREATE INDEX idx_fooname ON Foo (FooName);
CREATE INDEX idx_barid_barname ON Bar (BarID, BarName);
Once those indexes are in place, a query can be used like:
SELECT *
FROM Foo USE INDEX(idx_fooname)
LEFT JOIN Bar USE INDEX (idx_barid_barname) ON Foo.BarID = Bar.BarID
WHERE
FooName < "Bob"
AND BarName > "Smith";
Smells like "over-normalization". Might it be worth moving those two fields into the same table?
Akiban was an Engine that could do cross-table JOINs, etc. But it no longer exists.
"Materialized Views" do not exist in MySQL (unless you implement them yourself).
As xQbert mentioned, you can use materialized views (or indexed views if you use Microsoft SQL Server).
But first you have to change your LEFT JOIN to an INNER JOIN, because materialized/indexed views can't handle outer joins. The LEFT JOIN does not seem to make sense in your query anyway: the condition WHERE ... BarName > "Smith" discards every row where Bar did not match (BarName is NULL there), so the result is the same as with an INNER JOIN.
That link might help you if you use SQL Server: https://www.simple-talk.com/sql/learn-sql-server/sql-server-indexed-views-the-basics/
After you have created a materialized/indexed view, you can query the view directly (I'm not sure if the query optimizer will use it automatically).
Be aware that the materialized/indexed view will reduce your performance when you INSERT, DELETE or UPDATE rows in the underlying tables (as every index does, too). The best idea is to add only the really necessary fields to the materialized/indexed view.
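To illustrate the point about the LEFT JOIN, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are made up. It runs the query from the question once with LEFT JOIN and once with INNER JOIN and compares the results.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bar (BarID INTEGER PRIMARY KEY, BarName TEXT);
    CREATE TABLE Foo (FooID INTEGER PRIMARY KEY, BarID INTEGER, FooName TEXT);
    INSERT INTO Bar VALUES (1, 'Taylor'), (2, 'Jones');
    -- The third Foo row has no matching Bar row at all.
    INSERT INTO Foo VALUES (1, 1, 'Alice'), (2, 2, 'Anna'), (3, NULL, 'Abe');
""")

query = """
    SELECT Foo.FooName, Bar.BarName
    FROM Foo
    {join} JOIN Bar ON Foo.BarID = Bar.BarID
    WHERE FooName < 'Bob' AND BarName > 'Smith'
    ORDER BY Foo.FooID
"""

# Same query, only the join type differs.
left_rows = conn.execute(query.format(join="LEFT")).fetchall()
inner_rows = conn.execute(query.format(join="INNER")).fetchall()

print(left_rows)
print(inner_rows)
```

The unmatched Foo row gets a NULL BarName from the LEFT JOIN, and NULL > 'Smith' is not true, so the WHERE clause removes it again. Both variants return the same rows.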
I have 2 tables and I need to join them. Unfortunately, I don't have ids that I could use; the only criteria are some varchar columns. This is the "on" part of the join:
join sales_flat_order_address sfoa on
concat(lower(trim(sfoa.firstname)), lower(trim(sfoa.lastname))) = concat(lower(trim(so.firstname)), lower(trim(so.lastname)))
I know that this is not the fastest way, but is there a way to speed this up a bit more? Or maybe a workaround I can't think of right now?
Thanks!
Indexing firstname and lastname columns would be a start.
First of all though I would run your query with EXPLAIN in front of it and see if there are any indexes you may already have and think are being used which actually might not be being used.
Secondly, JOINing on varchar is never going to be 'super' quick compared to joining on an int for example.
http://dev.mysql.com/doc/refman/5.5/en/explain.html
You could try to use this:
ON trim(sfoa.firstname) = trim(so.firstname)
AND trim(sfoa.lastname) = trim(so.lastname)
Of course, you could also try to index firstname and lastname in both tables.
You could put an "and" between the first name and last name comparisons so as to avoid the concat. You can also use indexes on these columns (firstname and lastname) to speed things up a lot (especially if you run the comparison often).
join sales_flat_order_address sfoa on
trim(sfoa.firstname) = trim(so.firstname) and trim(sfoa.lastname) = trim(so.lastname)
This query is essentially doing a cross join on the table, and then matching the condition. To fix this, you could:
Add a new column into each table called full name.
Give this the value of something like:
concat(lower(trim(firstname)), lower(trim(lastname))).
Build an index on the value.
Actually, you could do this on only one of the tables, and MySQL will use the index for the comparison (still requiring a full table scan on the first table).
You could also have a "full names" table, and use a foreign key from each of these tables to get the full name.
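The steps above (new column, normalized value, index) could look roughly like this. It is only a sketch: it runs on SQLite through Python's sqlite3 module rather than on MySQL, the table name sales_order and the column name fullname are assumptions (the question only shows the alias so), and SQLite's || operator replaces MySQL's concat().

```python
import sqlite3

# SQLite stands in for MySQL; sales_order and fullname are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_order (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE sales_flat_order_address (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    INSERT INTO sales_order VALUES (1, ' John', 'SMITH '), (2, 'Mary', 'Jones');
    INSERT INTO sales_flat_order_address VALUES (10, 'john', 'Smith'), (11, 'Peter', 'Pan');

    -- 1. Add the new column to each table.
    ALTER TABLE sales_order ADD COLUMN fullname TEXT;
    ALTER TABLE sales_flat_order_address ADD COLUMN fullname TEXT;

    -- 2. Store the normalized name once (in MySQL: concat(lower(trim(...)), lower(trim(...)))).
    UPDATE sales_order SET fullname = lower(trim(firstname)) || lower(trim(lastname));
    UPDATE sales_flat_order_address SET fullname = lower(trim(firstname)) || lower(trim(lastname));

    -- 3. Index the stored value so the join compares plain columns.
    CREATE INDEX idx_so_fullname ON sales_order (fullname);
    CREATE INDEX idx_sfoa_fullname ON sales_flat_order_address (fullname);
""")

# The join condition no longer wraps the columns in functions.
matches = conn.execute("""
    SELECT so.id, sfoa.id
    FROM sales_order so
    JOIN sales_flat_order_address sfoa ON sfoa.fullname = so.fullname
""").fetchall()

print(matches)
```

The stored value has to be kept in sync whenever firstname or lastname changes, for example in the application code that writes those rows.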
Indexing the names independently won't affect the query. The names are being accessed inside functions, which generally turns off the ability to use an index.
At a minimum, you should create an index on firstname and lastname in both tables. Also, run an explain on your query. That can show you inefficiencies.
I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume, when I use SELECT query with WHERE clause, it loops through the very first row until it matches my requirement.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id which I want to search for is 53750. What I need is: the search can start from 50000 so that it can save time for searching 49999 rows.
I don't know the exact term since I am not an SQL expert!
You need to create an index : http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Along with indexing the id field. That's the fastest way to do so. As far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
Would be a great first step if it has not been generated automatically. You can also use:
EXPLAIN <your query here>
To see which kind of query works best in this case. Note that if you change the WHERE clause at some point in the future and another column keeps turning up in it, it is a good idea to put an index on that column as well.
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
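If you want to see the effect for yourself, here is a small sketch. It uses Python's sqlite3 module instead of MySQL, with a made-up payload column and only 100,000 rows, and asks the engine for its query plan before and after the index is created. SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN here.

```python
import sqlite3

# SQLite stands in for MySQL; the payload column and row count are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 100001)),
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM table_1 WHERE id = 53750"

plan_before = plan(lookup)   # no index yet: the whole table is scanned
conn.execute("CREATE INDEX index_1 ON table_1 (id)")
plan_after = plan(lookup)    # with the index: a direct search on id

row = conn.execute(lookup).fetchone()

print(plan_before)
print(plan_after)
print(row)
```

The query text is identical in both cases. Only the plan changes, which is why you never have to tell the database where to start looking.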
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, such as range optimization and WHERE clause optimization. The documentation is pretty informative about this; read the section "Optimizing SELECT Statements". Creating an index on the column you evaluate is very helpful for performance too.
One possible solution: you can create a view and then query the view. Here are the details of creating a view and reading data from it:
http://www.w3schools.com/sql/sql_view.asp
You would split that huge number of rows across many views (i.e. rows 1-10000 in one view, rows 10001-20000 in another view),
then query the view that contains the row you need.
I am pretty sure that any SQL database with a little self-respect does not start looping from the first row to get the desired row. But I am also not sure how they make it work, so I can't give an exact answer.
You could check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, for example one with a numeric data type? Do you have indexes on the other columns that are used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server, and how to configure file growth. There's a lot you can do to tune your server.
You could try and split your tables in partitions
More about alter tables to add partitions
Selecting from a specific partition
In your case you could create a partition on ID for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2. How to do this is explained quite well in the MySQL documentation.
You could try something as simple as this:
query = "SELECT * FROM tblname LIMIT 50000, 1";
I just tried it in phpMyAdmin. Here 50000 is the starting row (the offset) and 1 is the number of rows to return; a row count of 0 would return nothing.
EDIT:
But if I were you I wouldn't use this, because it still has to step through records 1 to 49999 before it reaches the starting row.
I have a table, tblNoComp, that has two columns, both foreign keys pointing to tblPackage.ID. The purpose of tblNoComp is to store which packages are not compatible with each other, by simply storing the ID of those packages in two columns, OneID and TwoID.
May not be the best way of storing it, but since multiple packages aren't compatible with others, it seemed to be the most logical.
Attempting to create a view that shows the tblPackage.Name for the two side by side - I have the following, but unsure how to get the TwoID Package Name..
SELECT tblNoComp.OneID, tblPackages.Package,tblNoComp.TwoID,tblPackages.Package
FROM tblNoComp, tblPackages
WHERE (tblNoComp.OneID = tblPackages.PID)
Currently the second tblPackages.Package is simply showing OneID name, not TwoID.. Not sure how to resolve?
Thank you!
--Apologies if a simple question, I've searched for an hour but haven't quite been able to describe my problem correctly.
The code you have in your comment:
SELECT
tblNoComp.OneID,
tblPackages.Package AS OneIDPackageName,
tblNoComp.TwoID,
tblPackages.Package AS TwoIDPackageName
FROM
tblNoComp
LEFT JOIN tblPackages
ON tblNoComp.OneID = tblPackages.PID
Is aliasing the columns instead of the tables. The idea behind the aliasing is to JOIN the same table twice as two different tables, using two different aliases. You're only joining it once and trying to use it twice.
You probably intend something more like this:
SELECT
tblNoComp.OneID,
tblOnePackages.Package AS OneIDPackageName,
tblNoComp.TwoID,
tblTwoPackages.Package AS TwoIDPackageName
FROM
tblNoComp
LEFT JOIN tblPackages AS tblOnePackages
ON tblNoComp.OneID = tblOnePackages.PID
LEFT JOIN tblPackages AS tblTwoPackages
ON tblNoComp.TwoID = tblTwoPackages.PID
(Note that I don't have a MySQL syntax checker handy, so this may need to be tweaked in order to run properly.)
Note that the same table is joined twice on two different keys, and that each time it's given a different alias so that it can be referenced within the SELECT clause as two separate tables.
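For anyone who wants to try the double join without a MySQL server at hand, here is a small sketch using Python's sqlite3 module. The package names are invented, and the short aliases nc, p1 and p2 are just one possible choice; the table and column names follow the query above.

```python
import sqlite3

# SQLite stands in for MySQL; the sample packages are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblPackages (PID INTEGER PRIMARY KEY, Package TEXT);
    CREATE TABLE tblNoComp (OneID INTEGER, TwoID INTEGER);
    INSERT INTO tblPackages VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO tblNoComp VALUES (1, 2), (3, 1);
""")

# tblPackages is joined twice, once per foreign key, under two different aliases.
rows = conn.execute("""
    SELECT nc.OneID,
           p1.Package AS OneIDPackageName,
           nc.TwoID,
           p2.Package AS TwoIDPackageName
    FROM tblNoComp AS nc
    LEFT JOIN tblPackages AS p1 ON nc.OneID = p1.PID
    LEFT JOIN tblPackages AS p2 ON nc.TwoID = p2.PID
    ORDER BY nc.OneID
""").fetchall()

for row in rows:
    print(row)
```

Each alias resolves its own ID column, so the two name columns can differ within the same row.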
Let's look at two tables, for example tables Post and User
In Post table, column user_id is foreign key.
Table Post
-id
-user_id
-post
Table User
-userID
-username
If we want to get poster's username, we have to use join query and get it from User table.
Wouldn't it be easier that we add extra column in Post table for storing posters' usernames in order to simplify SQL queries. In this case, Post table would have user_id and username columns (username column is redundant) but that would eliminate join queries for catching posters' usernames.
What is the best choice, to store usernames in Post table or not
Creating a view is an option: then you just query that view. However, views in MySQL don't offer as much benefit in terms of performance as they once did.
It might be that just running the join is fine. Depending on the amount of data you will eventually store you may need to consider other means to improve performance such as partitioning tables etc etc. - but that is for down-the-line
Adding redundant data to the DB is not a good choice.
If you really have to simplify your select queries then create a view for that case. That way the data is in one place and you don't need to join every time. But a simple join is actually not that big a deal.
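A rough sketch of the view approach, run on SQLite through Python's sqlite3 module instead of MySQL. The view name PostWithUser and the sample rows are assumptions for illustration; the tables follow the question.

```python
import sqlite3

# SQLite stands in for MySQL; PostWithUser and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (userID INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE Post (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES User(userID), post TEXT);
    INSERT INTO User VALUES (1, 'alice');
    INSERT INTO Post VALUES (100, 1, 'Hello world');

    -- The join is written once, inside the view.
    CREATE VIEW PostWithUser AS
        SELECT Post.id, Post.post, User.username
        FROM Post
        JOIN User ON User.userID = Post.user_id;
""")

before = conn.execute("SELECT id, post, username FROM PostWithUser").fetchall()

# The username lives only in User, so a rename needs a single UPDATE.
conn.execute("UPDATE User SET username = 'alice_2' WHERE userID = 1")
after = conn.execute("SELECT id, post, username FROM PostWithUser").fetchall()

print(before)
print(after)
```

With a redundant username column in Post, the same rename would also have to touch every post of that user, which is where the copies tend to drift apart.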
I noticed the other day that I can do joins in MySQL just as easily with
SELECT peeps, persons, friends FROM tablea JOIN tableb USING (id) WHERE id = ?
instead of
SELECT a.peeps, a.persons, b.friends FROM tablea a JOIN tableb b USING (id) WHERE id = ?
The first form only works if the tables have no matching column names (apart from the join column). Why should I use the second rather than the first?
No, you don't need to, but in my humble opinion you really should. It's almost always better in my experience to be explicit with what you're trying to do.
Consider the feelings of the poor guy (or girl) who has to come behind you and try to figure out what you were trying to accomplish and in which tables each column resides. Explicitly stating the source of the column allows one to look at the query and glean that information without deep knowledge of the schema.
Query 1 will work (as long as there are no ambiguous column names).
Query 2 will
be clearer
be more maintainable (think of someone who doesn't know the database schema by heart)
survive the addition of an ambiguous column name to one of the tables
So, don't be lazy just for the sake of those pitiful few saved keystrokes.
It's not necessary if you have no duplicate column names. If you do, the query will fail.
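Here is a small sketch of that failure mode, using Python's sqlite3 module as a stand-in for MySQL. The sample row is made up, and the extra friends column on tablea is a hypothetical schema change. The unqualified query works at first and breaks once the column is added, while the qualified one keeps working.

```python
import sqlite3

# SQLite stands in for MySQL; the data and the later schema change are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, peeps TEXT, persons TEXT);
    CREATE TABLE tableb (id INTEGER PRIMARY KEY, friends TEXT);
    INSERT INTO tablea VALUES (1, 'p', 'q');
    INSERT INTO tableb VALUES (1, 'f');
""")

unqualified = "SELECT peeps, persons, friends FROM tablea JOIN tableb USING (id) WHERE id = ?"
qualified = "SELECT a.peeps, a.persons, b.friends FROM tablea a JOIN tableb b USING (id) WHERE id = ?"

# With unique column names, the short form works.
before = conn.execute(unqualified, (1,)).fetchall()

# Someone later adds a friends column to tablea as well.
conn.execute("ALTER TABLE tablea ADD COLUMN friends TEXT")

try:
    conn.execute(unqualified, (1,))
    error = None
except sqlite3.Error as exc:
    # friends now exists in both tables, so the unqualified name cannot be resolved.
    error = str(exc)

# The qualified form is unaffected by the new column.
after = conn.execute(qualified, (1,)).fetchall()

print(before)
print(error)
print(after)
```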