I have a question about SQL indexing in my theory class. It asks me to choose which columns should be indexed to optimize these 2 queries. (This is a 'paper' question, so I am not given a database to test these indexes with EXPLAIN.)
1, First query.
SELECT BranchName, No_Of_Copies
FROM BOOK as B, BOOK_COPIES as BC, LIBRARY_BRANCH as LB
WHERE B.BookId = BC.BookId and BC.BranchId=LB.BranchId and title ="The Lost Tribe";
I have the answer of this one, which is that BOOK.title, BOOK_COPIES.BranchId and LIBRARY_BRANCH.BranchId should be used for indexing. However, I don't really understand why BOOK.BookId and BOOK_COPIES.BookId are not chosen for indexing.
2, Second query
SELECT B.cardNo, Name, Address, COUNT(BookId,BranchId)
FROM BORROWER as B, BOOK_LOANS as BL
WHERE (BL.CardNo=B.CardNo)
GROUP BY B.CardNo, Name, Address
HAVING COUNT(BL.BookId, BranchId)>5;
Would it be optimized if I create index on BOOK_LOANS.CardNo, BORROWER.CardNo, Name and Address ?
That class needs to be updated. Using commas in JOIN is antiquated; the new style uses JOIN .. ON
The question is ambiguous -- what table contains title? I'll assume it is B.
Since the only filtering is on title, the Optimizer will pick B as the first table to look at:
B needs INDEX(title)
From B, it can reach for BC:
BC needs INDEX(BookId)
Similarly:
LB needs INDEX(BranchId)
If you are using MySQL, be aware that a PRIMARY KEY is an index. And every table needs a PRIMARY KEY. Also a PRIMARY KEY is necessarily unique. So, when I said "needs", you may find that there is already a PRIMARY KEY satisfying the need.
More: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
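The three indexes above, together with a JOIN .. ON version of the query, might look roughly like this. It is only a sketch: the index names are invented, and it assumes that title is in BOOK, No_Of_Copies in BOOK_COPIES and BranchName in LIBRARY_BRANCH, which the question does not spell out.

```sql
-- Index names are made up. Skip any index whose column is already
-- the PRIMARY KEY of its table.
CREATE INDEX idx_book_title     ON BOOK (title);
CREATE INDEX idx_copies_bookid  ON BOOK_COPIES (BookId);
CREATE INDEX idx_branch_id      ON LIBRARY_BRANCH (BranchId);

-- Same query written with JOIN .. ON instead of commas.
SELECT LB.BranchName, BC.No_Of_Copies
  FROM BOOK AS B
  JOIN BOOK_COPIES AS BC    ON BC.BookId = B.BookId
  JOIN LIBRARY_BRANCH AS LB ON LB.BranchId = BC.BranchId
 WHERE B.title = 'The Lost Tribe';
```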
I will quibble with the schema -- why is the "number of books" not simply a column in Books?
As for query 2, it is even less clear which table each column might be in.
Do be aware that an INDEX can contain only columns from a single table.
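To illustrate the single-table rule for query 2, here is a guess at what the indexes could look like. It assumes Name and Address are columns of BORROWER, and that COUNT(BookId, BranchId) is meant to count loan rows; neither is confirmed by the question, and the index names are invented.

```sql
-- One index per table; no index can mix BORROWER and BOOK_LOANS columns.
CREATE INDEX idx_loans_cardno    ON BOOK_LOANS (CardNo);
CREATE INDEX idx_borrower_cardno ON BORROWER (CardNo);  -- not needed if CardNo is the PRIMARY KEY

-- Assumed intent: borrowers with more than 5 loans.
SELECT B.CardNo, B.Name, B.Address, COUNT(*) AS loans
  FROM BORROWER AS B
  JOIN BOOK_LOANS AS BL ON BL.CardNo = B.CardNo
 GROUP BY B.CardNo, B.Name, B.Address
HAVING COUNT(*) > 5;
```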
I have two tables - books and images. The books table has many columns - including id (primary key), name (which is not unique), releasedate, etc. The images table has two columns - id (which is not unique, i.e. one book id may have multiple images associated with it, and we need all those images; this column has a non-unique index), and poster (which is the unique primary key; all images lie in the same bucket, hence cannot have duplicate names). My requirement is: given a book name, find all images associated with it (along with the year of release and the bucketname for each image, the bucketname being just a number in this case).
I am running this query:
select books.id,poster,bucketname,year(releasedate) from books
inner join images where images.bookId = books.id and books.name = "<name>";
A sample result set may look like this:
As you can see there are two results matching - one with id 2 and year 1989, having 5 images, other one with id 261009, year 2013 and one image.
The problem is, the query is extremely slow. It takes around .14 seconds from MySQL console itself, under zero load (in production there may be several concurrent requests and they may be queued, leading to further delay), which is unacceptable for autocomplete. Can anyone tell me how to optimize the query by adding correct indices/keys to the tables? If it is not possible from MySQL, suggestions regarding a proper Redis schema would be useful as well.
Edit: Approx no. of rows in images - 480k, in books - 285k. In future, autocomplete will show result for book authors as well as book names, hence the query will need to expand to take into account a separate table authors where each author will have an id and name, just like a book.
For optimal performance, you want suitable covering indexes available. For example:
... on `books` (`name`,`id`,`releasedate`)
... on `images` (`bookid`,`poster`,`bucketname`)
We want name as the leading column in the index, because of the equality predicate in the WHERE clause. We want id and releasedate also included in the index to make it a "covering index", so the query can be satisfied from the index, without a need to visit pages of the underlying table to retrieve values.
We want bookid as the leading column because of the reference in the ON clause. Again, having poster and bucketname available right in the index makes it a "covering" index.
Use EXPLAIN to see the query execution plan.
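As a rough sketch of the two covering indexes described above: the index names books_ix1 and images_ix1 are made up for illustration, and the second column list assumes bucketname lives in images, which the question does not state outright.

```sql
CREATE INDEX books_ix1  ON books  (name, id, releasedate);
CREATE INDEX images_ix1 ON images (bookid, poster, bucketname);

-- Check the plan with a literal in place of the book name.
EXPLAIN
SELECT b.id, i.poster, i.bucketname, YEAR(b.releasedate)
  FROM books b
  JOIN images i
    ON i.bookid = b.id
 WHERE b.name = 'some book name';
```

If MySQL can satisfy the query from the indexes alone, the Extra column of the EXPLAIN output normally shows "Using index" for each table.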
Also, note that the inner join operation won't return a row from books if a matching row in images is not found. If we want to return a row from books even when no image is available, we could use an outer join.
I'd write the query like this:
SELECT b.id
     , i.poster
     , i.bucketname
     , YEAR(b.releasedate)
  FROM books b
  LEFT
  JOIN images i
    ON i.bookid = b.id
 WHERE b.name = ?
Suppose I have two tables, patient and person.
The MySQL query is like below.
select fname , lname
from patient p
left join per on (per.person_id=p.person_id)
where p.account_id=2 and (per.fname like 'will%' OR per.lname like 'will%' ).
In the case of this query, how will MySQL use an index created on (p.account_id, p.person_id)?
person_id is a foreign key from the person table in the patient table.
I suspect you do not want LEFT. With LEFT JOIN, you are asking for account #2 whether or not he is named 'will'.
SELECT fname, lname
FROM patient p
JOIN per ON per.person_id = p.person_id
WHERE p.account_id = 2
AND (per.fname LIKE 'will%' OR per.lname LIKE 'will%')
will find the full name of account #2 if it is a 'will', else return nothing.
You have not said what indexes you have, so we cannot explain your existing indexes. Please provide SHOW CREATE TABLE for each table.
For either version of the query, these indexes are the only useful ones:
p: INDEX(account_id) -- if it is not already the PRIMARY KEY
per: INDEX(person_id) -- again, if it is not already the PRIMARY KEY.
A PRIMARY KEY is a UNIQUE index.
The first index (or PK) would be a quick lookup to find the row(s) with account_id=2. The second would make the join work well. No index is useful for "will" because of the OR.
The query will look at patient first, then per, using "Nested Loop Join".
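The two indexes could be added and checked along these lines. This is a sketch: the index names are invented, and it assumes the table is called person and is aliased as per in the query.

```sql
-- Skip either index if the column is already the PRIMARY KEY.
-- An existing index that starts with account_id, such as the
-- (account_id, person_id) index from the question, can also serve
-- the account_id = 2 lookup.
ALTER TABLE patient ADD INDEX idx_patient_account (account_id);
ALTER TABLE person  ADD INDEX idx_person_person_id (person_id);

EXPLAIN
SELECT per.fname, per.lname
  FROM patient p
  JOIN person per ON per.person_id = p.person_id
 WHERE p.account_id = 2
   AND (per.fname LIKE 'will%' OR per.lname LIKE 'will%');
```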
Please also provide EXPLAIN SELECT ..., so we can discuss the things I am guessing about.
Here is my Database structure (basic relations):
I'm attempting to formulate a one-line query that will populate the clients_ID, Job_id, tech_id, & Part_id and return back all the work orders present. Nothing more nothing less.
Thus far I've struggled to generate this Query:
SELECT cli.client_name, tech.tech_name, job.Job_Name, w.wo_id, w.time_started, w.part_id, w.job_id, w.tech_id, w.clients_id, part.Part_name
FROM work_orders as w, technicians as tech, clients as cli, job_types as job, parts_list as part
LEFT JOIN technicians as techy ON tech_id = techy.tech_name
LEFT JOIN parts_list party ON part.part_id = party.Part_Name
LEFT JOIN job_types joby ON job_id = joby.Job_Name
LEFT JOIN clients cliy ON clients_id = cliy.client_name
Apparently, once all the joining happens it does not even populate the correct foreign key values according to their reference.
[some values came out as the actual foreign key id, not even the corresponding value.]
It just goes on about 20-30 times depending on largest row of a table that I have (one of the above).
I only have two work orders created, So ideally it should return just TWO Records, and columns, and fields with correct information. What could I be doing wrong? Haven't been with MySQL too long but am learning as much as I can.
Your join conditions are wrong. Join on tech_id = tech_id, not tech_id = tech_name. Looks like you do this for all your joins, so they all need to be fixed.
I really don't follow the text of your question, so I am basing my answer solely on your query.
Edit
Replying to your comment here. You said you want to "load up" the tech name column. I assume you mean you want tech name to be part of your result set.
The SELECT part of the query is what determines the columns that are in the result set. As long as the table where the column lives is referenced in the FROM/JOIN clauses, you can SELECT any column from that table.
Think of a JOIN statement as a way to "look up" a value in one table based on a value in another table. This is a very simplified definition, but it's a good way to start thinking about it. You want tech name in your result set, so you look it up in the Technicians table, which is where it lives. However, you want to look it up by a value that you have in the Work Orders table. The key (which is actually called a foreign key) that you have in the Work Orders table that relates it to the Technicians table is the tech_id. You use the tech_id to look up the related row in the Technicians table, and by doing so can include any column in that table in your result set.
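Putting that together, the corrected query might look something like the following. It assumes each lookup table has a key column with the same name as the foreign key in work_orders (tech_id, part_id, job_id, clients_id); only the question's schema can confirm that. It also drops the comma-separated table list from the original FROM clause, since tables listed that way with no join condition are combined in every possible pairing, which would explain the extra rows.

```sql
SELECT w.wo_id, w.time_started,
       cli.client_name, tech.tech_name, job.Job_Name, part.Part_name
  FROM work_orders AS w
  LEFT JOIN technicians AS tech ON tech.tech_id    = w.tech_id
  LEFT JOIN parts_list  AS part ON part.part_id    = w.part_id
  LEFT JOIN job_types   AS job  ON job.job_id      = w.job_id
  LEFT JOIN clients     AS cli  ON cli.clients_id  = w.clients_id;
```

With two rows in work_orders and unique ids in the lookup tables, this should return one row per work order.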
I have these tables:
IdToName:
Id Name
1 A
2 B
RawData:
Son Father
B A
I want to create a new table called Data, in which instead of string, I will have Id's, i.e.:
Data:
Son Father
2 1
I do this using this query:
INSERT INTO `Data`
SELECT L.`ID`, P.`ID`
FROM `IdToName` L,
`IdToName` P,
`RawData` T
WHERE T.Father = P.Name
AND T.Son = L.Name
I have keys on RawData's son and father and on IdToName's Name. This query takes about 7 minutes for 2,800,000 lines. Does anyone have any idea how I can improve the performance for this?
Check the time of the query alone. I strongly suspect that what you have is really "MySQL performance issues in INSERT", and not "in SELECT". 7000 inserts per second is quite a lot, it might be the physical limit of your machine.
uhm, and btw [edit]: we don't know the exact shape and content of your tables (and of memory), but I don't think in your case any index can help.
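One way to time the SELECT on its own, as a sketch: it is the same join written with JOIN .. ON, using COUNT(*) so that 2.8 million rows are not sent to the client. The count is only a stand-in for the real SELECT, so treat the timing as an approximation.

```sql
-- Run the join without the INSERT to see how much of the 7 minutes
-- is spent reading rather than writing.
SELECT COUNT(*)
  FROM RawData T
  JOIN IdToName L ON L.Name = T.Son
  JOIN IdToName P ON P.Name = T.Father;
```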
The only apparent reason why that would be slow is the lack of proper indexes.
Please put a UNIQUE index on Id in table IdToName, and a plain INDEX on each of the two columns in table RawData.
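In MySQL that would be roughly the following; the index names are invented. Note that the question says the RawData keys already exist, in which case the second statement is redundant.

```sql
ALTER TABLE IdToName ADD UNIQUE INDEX uq_idtoname_id (Id);

ALTER TABLE RawData
  ADD INDEX idx_rawdata_son (Son),
  ADD INDEX idx_rawdata_father (Father);
```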
Is there any way to create a functioning index for this query and get rid of "filesort"?
SELECT id, title FROM recipes use index (topcat) where
(topcat='$cid' or topcat2='$cid' or topcat3='$cid')
and approved='1' ORDER BY id DESC limit 0,10;
I created index "topcat" (columns: topcat1+topcat2+topcat3+approved+id) but still get "Using where; Using filesort".
I can create one more column, let's say "all_topcats", to store topcat numbers in an array - 1,5,7 - and then run the query "... where $cid IN ()...". But the problem is that in this case the "all_topcats" column will be "varchar" while the "approved" and "id" columns are int, and the index will not be used anyway.
Any ideas? Thanks.
You might improve performance for that query if you reordered the columns in the index:
approved, topcat1, topcat2, topcat3, id
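Rebuilding the index in that order could be done like this. It is a sketch that reuses the index name topcat and the column names from the index definition in the question; adjust them if the table actually uses topcat rather than topcat1.

```sql
ALTER TABLE recipes DROP INDEX topcat;
ALTER TABLE recipes ADD INDEX topcat (approved, topcat1, topcat2, topcat3, id);
```

Re-run EXPLAIN on the query afterwards to see whether "Using filesort" is still reported.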
It would be useful to know what the table looks like and why you have three columns named like that. It might be easier to organise a good query if you had a subsidiary table to store the topcat values, with a link back to the main table, but without knowing why you have it set up like that it's hard to know whether that would be sensible.
Can you post the CREATE TABLE?
Edit in response to user message
Your table doesn't sound like it's well-designed. The following design would be better: Add two new tables, Category and Category_Recipe (a cross-referencing table). Category will contain a list of your categories and Category_Recipe will contain two columns, one a foreign key to Category and one a foreign key to the existing Recipe table. A row of Category_Recipe is a statement "this recipe is in this category". You will then be able to very simply write a query that will search for recipes in a given category. You also have the ability to put a recipe in arbitrarily many categories, rather than being limited to 3. Look up "database normalisation" and "foreign keys".
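A minimal sketch of that design follows. The column names and types are assumptions; in particular, recipe_id must have the same type as the id column of the existing recipes table, and 5 stands in for the category being searched.

```sql
CREATE TABLE Category (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

-- One row per "this recipe is in this category" statement.
CREATE TABLE Category_Recipe (
  category_id INT UNSIGNED NOT NULL,
  recipe_id   INT UNSIGNED NOT NULL,  -- assumed to match recipes.id
  PRIMARY KEY (category_id, recipe_id),
  FOREIGN KEY (category_id) REFERENCES Category (id),
  FOREIGN KEY (recipe_id)   REFERENCES recipes (id)
);

-- Latest 10 approved recipes in one category.
SELECT r.id, r.title
  FROM Category_Recipe AS cr
  JOIN recipes AS r ON r.id = cr.recipe_id
 WHERE cr.category_id = 5
   AND r.approved = '1'
 ORDER BY r.id DESC
 LIMIT 10;
```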