I was asked the question below in an interview and could not answer it. Could anyone please help?
A primary school teacher wants to store the first name, last name, date of birth, gender (0 = female, 1 = male) and home phone number of each of her pupils in a MySQL database. She came up with the following table definition:
CREATE TABLE pupil(
pupil_id INT NOT NULL AUTO_INCREMENT,
first_name CHAR(50),
last_name CHAR(50),
date_of_birth CHAR(50),
gender INT,
phone_number CHAR(50),
PRIMARY KEY (pupil_id)
)ENGINE=MyISAM DEFAULT CHARSET=utf8;
She frequently runs the following queries:
select * from pupil where pupil_id = 2;
select * from pupil where first_name = 'John';
select * from pupil where first_name = 'John' and last_name = 'Doe';
What changes would you make to this table, and why?
My answer would have been to add two indexes: one on first_name and one on (first_name, last_name).
If you frequently query according to the first name or the first and last names, it might be a good idea to index them:
CREATE INDEX pupil_names_ind ON pupil (first_name, last_name);
Having said that, you should really run a benchmark first. If the table has just a couple of hundred rows, most of them will be cached anyway, and indexing it would be wasted effort.
You only need one index:
CREATE INDEX pupils_by_name ON pupil (first_name, last_name);
The first query (by id) is already optimised thanks to the primary key.
The second query (by first name) is handled by the pupils_by_name index, because first_name is the leftmost column of that index. A query on last_name alone could not use this index, because the columns used in the WHERE clause must form a leftmost prefix of the index columns.
The third query fits the pupils_by_name index perfectly.
in addition
date_of_birth CHAR(50) should be date
gender INT should be tinyint
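Taken together, the changes suggested above might look like the following sketch. It assumes the existing date_of_birth strings are already valid 'YYYY-MM-DD' values (otherwise they would need converting before the type change), and it reuses the index name pupils_by_name from above.

```sql
-- Store the date as a real DATE and the 0/1 gender flag in a single byte.
-- Assumption: existing date_of_birth values are valid 'YYYY-MM-DD' strings.
ALTER TABLE pupil
  MODIFY date_of_birth DATE,
  MODIFY gender TINYINT;

-- One composite index serves both name queries (leftmost-prefix rule).
CREATE INDEX pupils_by_name ON pupil (first_name, last_name);

-- Check which index each query uses; look at the "key" column of the output.
EXPLAIN SELECT * FROM pupil WHERE first_name = 'John';
EXPLAIN SELECT * FROM pupil WHERE first_name = 'John' AND last_name = 'Doe';
EXPLAIN SELECT * FROM pupil WHERE last_name = 'Doe';  -- not a leftmost prefix
```

The first two statements should be able to use pupils_by_name, while the last one should not.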
I would create an index on first_name and last_name (in that order; the order is important). More info here: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
Then I would change the date_of_birth datatype to DATE.
You should add an index on the columns first_name and last_name.
Syntax: CREATE INDEX `ind` ON `table`(`col`);
This will make searches on the indexed columns fast. See the article "When to use indexes?" and MySQL's docs. Basically, you use indexes on frequently searched columns to boost performance. But remember one thing: too many indexes slow down insertions into the table. So use indexes where they pay off, and they will definitely speed up a query.
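Since every extra index adds work on INSERT, it may be worth checking what already exists before adding more. A small sketch, where idx_first_name is a hypothetical single-column index that a composite (first_name, last_name) index would make redundant:

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM pupil;

-- idx_first_name is a hypothetical name: a single-column index on first_name
-- is redundant once a composite index starting with first_name exists.
DROP INDEX idx_first_name ON pupil;
```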
This is my first question in stackoverflow and I am delighted to be part of this community because it has helped me many times.
I'm not an expert in SQL and MySQL but I'm working in a project that needs large tables (million rows). I have a problem when doing a join and I don't understand why it takes so long. Thanks in advance:)
Here are the tables:
CREATE TABLE IF NOT EXISTS tabla_maestra(
id int UNIQUE,
codigo_alta char(1),
nombre varchar(100),
empresa_apellido1 varchar(150),
apellido2 varchar(50),
tipo_via varchar(20),
nombre_via varchar(100),
numero_via varchar(50),
codigo_via char(5),
codigo_postal char(5),
nombre_poblacion varchar(100),
codigo_ine char(11),
nombre_provincia varchar(50),
telefono varchar(250) UNIQUE,
actividad varchar(100),
estado char(1),
codigo_operadora char(3)
);
CREATE TABLE IF NOT EXISTS tabla_actividades_empresas(
empresa_apellido1 varchar(150),
actividad varchar(100)
);
Here is the query I want to do:
UPDATE tabla_maestra tm
INNER JOIN tabla_actividades_empresas tae
ON (tm.nombre!='' AND tae.empresa_apellido1=tm.empresa_apellido1)
SET tm.actividad=tae.actividad;
This query takes too long, so before executing it I tried to test how long this simpler query takes:
SELECT COUNT(*) FROM tabla_maestra tm
INNER JOIN tabla_actividades_empresas tae
ON (tm.nombre!='' AND tae.empresa_apellido1=tm.empresa_apellido1);
It is still taking too long, and I don't understand why. Here are the indexes I use:
CREATE INDEX cruce_nombre
USING HASH
ON tabla_maestra (nombre);
CREATE INDEX cruce_empresa_apellido1
USING HASH
ON tabla_maestra (empresa_apellido1);
CREATE INDEX index_actividades_empresas
USING HASH
ON tabla_actividades_empresas(empresa_apellido1);
If I use the EXPLAIN statement, these are the results:
http://oi59.tinypic.com/2zedoy0.jpg
I would be so grateful to receive any answer that could help me. Thanks a lot,
Dani.
A join involving half a million rows -- as your query plan shows -- is bound to take some time. The count(*) query is quicker because it doesn't need to read the tabla_maestra table itself, but it still needs to scan all the rows of index cruce_empresa_apellido1.
It might help some if you made index index_actividades_empresas a unique index (supposing that that's indeed appropriate) or if instead you drop that index and make column empresa_apellido1 a primary key of table tabla_actividades_empresas.
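A rough sketch of the second option, assuming empresa_apellido1 contains no NULLs or duplicates in tabla_actividades_empresas (the ALTER fails otherwise):

```sql
-- Replace the secondary index with a primary key on the join column.
DROP INDEX index_actividades_empresas ON tabla_actividades_empresas;
ALTER TABLE tabla_actividades_empresas
  ADD PRIMARY KEY (empresa_apellido1);

-- Re-check the plan for the test query afterwards.
EXPLAIN
SELECT COUNT(*)
FROM tabla_maestra tm
INNER JOIN tabla_actividades_empresas tae
  ON tae.empresa_apellido1 = tm.empresa_apellido1
WHERE tm.nombre != '';
```

For an INNER JOIN, moving the tm.nombre filter from ON to WHERE does not change the result; it only separates the join condition from the row filter.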
If even that does not give you sufficient performance, then the only other thing I see to do is to give table tabla_actividades_empresas a synthetic primary key of integer type, and to change the corresponding column of tabla_maestra to match. That should help because comparing an integer to an integer is faster than comparing a string to a string, even when you can filter out (most) mismatches via a hash.
I agree with the others (John Bollinger, for example) about the lack of primary keys. A primary key is highly advisable for IDs (I noticed you worry about IDs being repeated, but a primary key handles that too; I mean MySQL's AUTO_INCREMENT).
Why do you join on tabla_actividades_empresas.empresa_apellido1 instead of referencing tabla_maestra's ID?
If you did, you could define a foreign key for it, e.g. tabla_actividades_empresas.maestra_id.
Joins work better when tables are associated through non-string types.
You can also filter the tables in a subquery before joining them. For example:
UPDATE (SELECT * FROM tabla_maestra WHERE nombre != '') AS tm
INNER JOIN tabla_actividades_empresas AS tae
ON tae.empresa_apellido1 = tm.empresa_apellido1
SET tm.actividad = tae.actividad;
I have not tested it, but it seems like a good pattern to follow.
Also, do you need to update every row each time? If not, you could update only the rows that are still missing a value: use a LEFT JOIN to find the rows that need updating, then apply the UPDATE with the INNER JOIN to just those. Does that make sense? I'm no expert, but it may be worth thinking about.
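One way to restrict the update to rows that still lack a value, assuming unfilled rows have actividad set to NULL (adjust the condition if they hold an empty string instead):

```sql
UPDATE tabla_maestra AS tm
INNER JOIN tabla_actividades_empresas AS tae
  ON tae.empresa_apellido1 = tm.empresa_apellido1
SET tm.actividad = tae.actividad
WHERE tm.nombre != ''
  AND tm.actividad IS NULL;  -- assumption: NULL marks "not yet filled in"
```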
EDIT
You may test some subquery too:
UPDATE tabla_maestra AS main, tabla_actividades_empresas AS aggr
SET main.actividad = aggr.actividad
WHERE main.empresa_apellido1 = aggr.empresa_apellido1
AND main.nombre <> ''
Don't forget to try adjusting the relationship.
Thank you so much for your answers.
The fact is that 'tabla_maestra' is a table that contains information about enterprises, but doesn't contain the values for the 'actividad' field (the activity of the enterprise). Moreover, the 'id' field is still empty (I will fill it in the future; it is difficult to explain why, but it has to be done this way).
I need to add the activity of each enterprise by joining with an auxiliary table, 'tabla_actividades_empresas', which contains the activity for each enterprise name. I only have to do it once. Then I will be able to drop the table 'tabla_actividades_empresas' because I won't need it any more.
The only way to join them is by the field 'empresa_apellido1', that is to say, the name of the enterprise.
I have made the field 'tabla_actividades_empresas.empresa_apellido1' unique, but it doesn't improve the performance.
It doesn't make sense to define a foreign key on 'tabla_actividades_empresas', because the field 'empresa_apellido1' is unique only in 'tabla_actividades_empresas', not in 'tabla_maestra' (in that table an enterprise name can appear many times, because enterprises can have different offices in different places). That is to say, 'tabla_actividades_empresas' doesn't contain repeated enterprises, but 'tabla_maestra' has repeated enterprise names.
By the way, what do you mean by "adjusting the relationship"? I have tried your subqueries with the EXPLAIN statement; they don't use the indexes correctly, and the performance is worse.
I am working in a project. In my project database, I have student and trainer. I need to use auto-increment with alpha-numeric for student id and trainer id.
For example:
student id should be automatically incremented as STU1,STU2....
trainer id should be automatically incremented as TRA1,TRA2....
I am using MySQL as my DB.
If it is possible, please give solution for other databases like oracle, Sql server.
MySQL does not have any built in functionality to handle this. If the value you want to add on the front of the auto incremented id is always the same, then you should not need it at all and just add it to the front in your SELECT statement:
SELECT CONCAT('STU', CAST(student_id AS CHAR)) AS StudentID,
CONCAT('TRA', CAST(trainer_id AS CHAR)) AS TrainerID
FROM MyTable
Otherwise the following would work for you:
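-- Note: MySQL allows only one AUTO_INCREMENT column per table, and it must be indexed,
-- so in practice students and trainers would each need their own table.
CREATE TABLE MyTable (
student_id int unsigned not null auto_increment,
student_id_adder char(3) not null,
trainer_id int unsigned not null auto_increment,
trainer_id_adder char(3) not null
)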
The SELECT to pull them together might look like the following:
SELECT CONCAT(student_id_adder, CAST(student_id AS CHAR)) AS StudentID,
CONCAT(trainer_id_adder, CAST(trainer_id AS CHAR)) AS TrainerID
FROM MyTable
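If the prefix really is fixed, the CONCAT from the first query could also be wrapped in a view so callers never build the string themselves. A sketch with hypothetical names (student, student_with_code, student_code):

```sql
CREATE TABLE student (
  student_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(50),
  last_name  VARCHAR(50),
  PRIMARY KEY (student_id)
);

-- The view exposes the numeric key with its 'STU' prefix attached.
CREATE VIEW student_with_code AS
SELECT student_id,
       CONCAT('STU', student_id) AS student_code,
       first_name,
       last_name
FROM student;

SELECT student_code, first_name FROM student_with_code;
```

A trainer table and view would follow the same pattern with a 'TRA' prefix.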
You are mixing two different concepts here. The autoincrement feature is for ID based database tables.
You can build a student table where each student gets an ID, which can be a number or something else and will probably be printed in the student card. Such a table would look like this:
Table student
student_card_id
first_name
last_name
...
There can be other tables using the student_card_id. Now some people say this is good. Students are identified by their card IDs, and these will never change. They use this natural key as the primary key in the table. Others, however, say that there should be a technical ID for each table, so if one day you decide to use different student numbers (e.g. STUDENT01 instead of STU01), then you would not have to update the code in all referencing tables. You would use an additional technical ID as shown here:
Table student
id
student_card_id
first_name
last_name
...
You would use the ID as primary key and should use the auto increment feature with it. So student STU01 may have the technical ID 18654; it just doesn't matter, for it's only a technical reference. The student card will still contain STU01. The student won't even know that their database record has number 18654.
Don't mix these two concepts. Decide whether you want your tables to be ID based or natural key based. In either case you must think of a way to generate the student card numbers. I suggest you write a function for that.
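Such a function could be as small as the sketch below. The name student_card_number and the format ('STU' plus a number padded to at least two digits) are assumptions based on the STU01 example; where the running number n comes from is left to the application.

```sql
-- Formats a running number as a card id and keeps all digits past 99.
CREATE FUNCTION student_card_number(n INT UNSIGNED)
RETURNS VARCHAR(13) DETERMINISTIC
RETURN CONCAT('STU', LPAD(n, GREATEST(2, CHAR_LENGTH(n)), '0'));

SELECT student_card_number(7);    -- 'STU07'
SELECT student_card_number(123);  -- 'STU123'
```

The GREATEST call is there because LPAD truncates its input when the target length is shorter than the value.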
How can I search for a person's full name if the first name is stored in a column called "first_name" and the last name is stored in a column called "last_name"?
Note that this table has a couple of million records, so I need an efficient way to do the search. I am using MySQL Server.
The columns first_name and last_name are both of type VARCHAR(80).
So far I have tried the following, which works but is slow because the CONCAT_WS call prevents the indexes from being used:
SELECT first_name, phone FROM people
WHERE CONCAT_WS(' ',first_name, last_name) like '%John Smith%'
I have also tried adding a full-text index on (first_name, last_name)
and then running this query:
SELECT *
FROM people
WHERE MATCH(first_name, last_name) AGAINST('John Smith')
but it is still not very fast. Is there a better approach to this problem?
Thanks
Try searching each field independently:
WHERE first_name = 'James' and last_name = 'Hetfield';
I'd also add a composite index for both, using last name first as it would have a higher cardinality (more unique rows), meaning that searching on the last name 'Hetfield' should be faster, than searching on a first name 'James'.
ALTER TABLE `some_table` ADD key (`last_name`, `first_name`);
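If the application can split the search text into its two parts before querying, the composite index above can serve an exact or prefix match. A sketch, assuming the input has already been split into 'John' and 'Smith' and using a made-up index name idx_last_first on the people table from the question:

```sql
ALTER TABLE people ADD KEY idx_last_first (last_name, first_name);

-- Equality on the leading column plus a prefix match on the second column
-- can both be resolved from the (last_name, first_name) index.
SELECT first_name, phone
FROM people
WHERE last_name = 'Smith'
  AND first_name LIKE 'John%';
```

A pattern with a leading wildcard, such as LIKE '%John%', cannot use the index this way.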
I would create a FULLTEXT index on (first_name, last_name) and use MATCH()
WHERE MATCH(first_name, last_name) AGAINST ('John Smith')
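A sketch of that setup; the index name ft_people_name is made up, and FULLTEXT on InnoDB tables assumes MySQL 5.6 or later. Boolean mode with + requires both words to be present, which is usually closer to a full-name search than the default natural-language mode:

```sql
ALTER TABLE people
  ADD FULLTEXT INDEX ft_people_name (first_name, last_name);

SELECT first_name, last_name, phone
FROM people
WHERE MATCH(first_name, last_name)
      AGAINST('+John +Smith' IN BOOLEAN MODE);
```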
Actually, if search performance matters, you should change your approach (millions of records or more need a more efficient solution). You could use a NoSQL store or a dedicated full-text search solution for partial matches on these columns.
I'm developing an online reservation system where people can reserve items based on availability for a particular hour of the day. For that I'm using two tables: Item and Reservation.
Item:(InnoDB)
-------------------------
id INT (PRIMARY)
category_id MEDIUMINT
name VARCHAR(20)
make VARCHAR(20)
availability ENUM('1','0')
Reservation:(InnoDB)
-------------------------
id INT (PRIMARY)
date DATE
Item_id INT
slot VARCHAR(50)
SELECT Item.id,Item.category,Item.make,Item.name,reservation.slot
FROM Item
INNER JOIN reservation ON Item.id=reservation.Item_id AND Item.category_id=2
AND Item.availability=1 AND reservation.date = DATE(NOW());
I'm using the above query to display all the items under a particular category with free timeslots which a user can reserve on a particular date.
slot field in reservation table contains string(ex:0:1:1:1:0:0:0:0:0:0:0:0:1:1:1:0:0:0:0:0:0:0:0:0) where 1 means that hour is reserved and 0 means available.
availability in the Item table shows whether that item is available for reservation or not (it may be down for servicing).
First of all, is my table structure fine? Secondly, what is the best way to optimize my query (multi-column indexing etc.)?
thanks,
ravi.
Put a foreign key constraint and index on your FK, this should speed things up a little. You appear to mix INT for the item ID and MEDIUMINT for the FK, not sure this is what nature intended.
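In MySQL that could look like the sketch below (the constraint name fk_reservation_item is made up, and every existing Item_id is assumed to reference an existing item). InnoDB creates an index on the referencing column automatically if none exists, so the constraint also gives the join an index to use.

```sql
ALTER TABLE reservation
  ADD CONSTRAINT fk_reservation_item
  FOREIGN KEY (Item_id) REFERENCES Item (id);
```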
Points to remember while choosing an Attribute on which an Index will be created
A column that is frequently used in a SELECT list and a WHERE clause.
A column in which data will be accessed in sequence by a range of values.
A column that will be used with the GROUP By or ORDER BY clause to sort data.
A column used in a join, such as the FOREIGN KEY column.
A column that is used as a PRIMARY KEY.
Try to create indexes on numeric values. You can introduce a surrogate key if there is no numeric PK.
Why are you putting all of those conditions in the join clause? Why not:
SELECT Item.id,Item.category,Item.make,Item.name,reservation.slot
FROM Item INNER JOIN reservation ON Item.id=reservation.Item_id
WHERE Item.category_id=2 AND Item.availability=1 AND reservation.date = DATE(NOW());
I'm not enough of a SQL expert to say whether this will make it faster, but it looks more obvious to me.
I would suggest creating an index on Reservation.item_id. That should help you improve query performance.
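Since the query also filters on the reservation date and on the item's category and availability, composite indexes may help further. A sketch with made-up index names, using the table names as spelled in the query:

```sql
-- Join column first, then the date filter.
CREATE INDEX idx_reservation_item_date ON reservation (Item_id, date);

-- Lets MySQL find the available items of one category without a full scan.
CREATE INDEX idx_item_category_avail ON Item (category_id, availability);

EXPLAIN
SELECT Item.id, Item.category_id, Item.make, Item.name, reservation.slot
FROM Item
INNER JOIN reservation ON Item.id = reservation.Item_id
WHERE Item.category_id = 2
  AND Item.availability = '1'      -- compare the ENUM by its string value
  AND reservation.date = CURDATE();
```

Whether the optimizer picks these indexes depends on the data distribution, so the EXPLAIN output is the thing to check.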
I am interested to know what people think about (AND WHY) the following 3 different conventions for naming database table primary keys in MySQL?
-Example 1-
Table name: User,
Primary key column name: user_id
-Example 2-
Table name: User,
Primary key column name: id
-Example 3-
Table name: User,
Primary key column name: pk_user_id
Just want to hear ideas and perhaps learn something in the process :)
Thanks.
I would go with option 2. To me, "id" by itself seems sufficient.
Since the table is User, the column "id" within "user" indicates that it is the identifier for User.
However, I must add that naming conventions are all about consistency.
There is usually no right or wrong as long as there is a consistent pattern and it is applied across the application. That is probably the more important factor in how effective the naming conventions will be and how far they go towards making the application easier to understand and hence maintain.
I always prefer the option in example 1, in which the table name is (redundantly) used in the column name. This is because I prefer to see ON user.user_id = history.user_id than ON user.id = history.user_id in JOINs.
However, the weight of opinion on this issue generally seems to run against me here on Stackoverflow, where most people prefer example 2.
Incidentally, I prefer UserID to user_id as a column naming convention. I don't like typing underscores, and the use of the underscore as the common SQL single-character-match character can sometimes be a little confusing.
ID is the worst PK name you can have in my opinion. TablenameID works much better for reporting so you don't have to alias a bunch of columns named the same thing when doing complex reporting queries.
It is my personal belief that columns should only be named the same thing if they mean the same thing. The customer ID does not mean the same thing as the order ID, and thus they should conceptually have different names. When you have many joins and a complex data structure, it is also easier to maintain when the PK and FK have the same name. It is harder to spot an error in a join when you have ID columns. For instance, suppose you joined to four tables, all of which have an ID column. In the last join you accidentally used the alias for the first table and not the third one. If you used OrderID, CustomerID etc. instead of ID, you would get an error because the first table doesn't contain that column. If you use ID, it would happily join incorrectly.
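A small sketch of that failure mode, using hypothetical tables customer, orders and order_item:

```sql
CREATE TABLE customer   (customer_id   INT PRIMARY KEY);
CREATE TABLE orders     (order_id      INT PRIMARY KEY, customer_id INT);
CREATE TABLE order_item (order_item_id INT PRIMARY KEY, order_id INT);

-- Wrong alias in the last join (c instead of o): MySQL rejects the query
-- with an "unknown column" error because customer has no order_id column.
SELECT oi.order_item_id
FROM customer c
JOIN orders o      ON o.customer_id = c.customer_id
JOIN order_item oi ON oi.order_id   = c.order_id;
```

Had every table used a plain id column, the equivalent mistake (oi.order_id = c.id) would run without complaint and return wrong rows.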
I tend to go with the first option, user_id.
If you go with id, you usually end up with a need to alias excessively in your queries.
If you go with more_complicated_id, then you either must abbreviate, or you run out of room, and you get tired of typing such long column names.
2 cents.
I agree with @InSane and like just Id. And here's why:
If you have a table called User, and a column dealing with the user's name, do you call it UserName or just Name? The "User" seems redundant. If you have a table called Customer, and a column called Address, do you call the column CustomerAddress?
Though I have also seen where you would use UserId, and then if you have a table with a foreign key to User, the column would also be UserId. This allows for the consistency in naming, but IMO, doesn't buy you that much.
In response to Tomas' answer, there will still be ambiguity assuming that the PK for the comment table is also named id.
In response to the question, Example 1 gets my vote. [table name]_id would actually remove the ambiguity.
Instead of
SELECT u.id AS user_id, c.id AS comment_id FROM user u JOIN comment c ON u.id=c.user_id
I could simply write
SELECT user_id, comment_id FROM user u JOIN comment c ON u.user_id=c.user_id
There's nothing ambiguous about using the same ID name in both WHERE and ON. It actually adds clarity IMHO.
I've always appreciated Justinsomnia's take on database naming conventions. Give it a read: http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/
I would suggest example 2. That way there is no ambiguity between foreign keys and primary keys, as there is in example 1. You can do for instance
SELECT * FROM user, comment WHERE user.id = comment.user_id
which is clear and concise.
The third example is redundant in a design where all id's are used as primary keys.
OK so forget example 3 - it's just plain silly, so it's between 1 and 2.
the id for PK school of thought (2)
drop table if exists customer;
create table customer
(
id int unsigned not null auto_increment primary key, -- my names are id, cid, cusid, custid ????
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
id int unsigned not null auto_increment primary key, -- my names are id, oid, ordid
cid int unsigned not null -- hmmm what shall i call this ?
)engine=innodb;
insert into orders (cid) values (1),(2),(1),(1),(2);
-- so if i do a simple give me all of the customer orders query we get the following output
select
c.id,
o.id
from
customer c
inner join orders o on c.id = o.cid;
id id1 -- big fan of column names like id1, id2, id3 : they are sooo descriptive
== ===
1 1
2 2
1 3
1 4
2 5
-- so now i have to alias my columns like so:
select
c.id as cid, -- shall i call it cid or custid, customer_id whatever ??
o.id as oid
from
customer c
inner join orders o on c.id = o.cid; -- cid here but id in customer - where is my consistency ?
cid oid
== ===
1 1
2 2
1 3
1 4
2 5
the tablename_id prefix for PK/FK name school of thought (1)
(feel free to use an abbreviated form of tablename i.e cust_id instead of customer_id)
drop table if exists customer;
create table customer
(
cust_id int unsigned not null auto_increment primary key, -- pk
name varchar(255) not null
)engine=innodb;
insert into customer (name) values ('cust1'),('cust2');
drop table if exists orders;
create table orders
(
order_id int unsigned not null auto_increment primary key,
cust_id int unsigned not null
)engine=innodb;
insert into orders (cust_id) values (1),(2),(1),(1),(2);
select
c.cust_id,
o.order_id
from
customer c
inner join orders o on c.cust_id = o.cust_id; -- ahhhh, cust_id is cust_id is cust_id :)
cust_id order_id
======= ========
1 1
2 2
1 3
1 4
2 5
so you see the tablename_ prefix or abbreviated tablename_prefix method is ofc the most
consistent and easily the best convention.
I don't disagree with what most of the answers note - just be consistent. However, I just wanted to add that one benefit of the redundant approach with user_id allows for use of the USING syntactic sugar. If it weren't for this factor, I think I'd personally opt to avoid the redundancy.
For example,
SELECT *
FROM user
INNER JOIN subscription ON user.id = subscription.user_id
vs
SELECT *
FROM user
INNER JOIN subscription USING(user_id)
It's not a crazy significant difference, but I find it helpful.