If I want to read a table's id from a MySQL database in order to add it to a write operation, would I need to perform two queries? Or is there a way to perform just one when using MySQL? As a rule of thumb, you rarely need two queries but should never exceed two queries for a single operation, correct?
Can I use one query?
When you say "table id" I suppose you mean the id column of a table... There is no need to use two queries. You can use one query, and you can even insert multiple records at once if you wish (recommended).
An example: insert two products from a products list as new entries of an order (with 37 as order_id) into an orders table. Each product_id (2 and 3, respectively) is read from the products table based on the specified product_code value ('6587' and '9678', respectively).
INSERT INTO orders (
order_id,
product_id
) VALUES (
37,
(SELECT id FROM products WHERE product_code = '6587')
), (
37,
(SELECT id FROM products WHERE product_code = '9678')
);
Where the tables have the following structures:
CREATE TABLE `products` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`product_code` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `orders` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(11) DEFAULT NULL,
`product_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and the products table has the following values:
INSERT INTO `products` (`id`, `product_code`)
VALUES
(1,'1234'),
(2,'6587'),
(3,'9678'),
(4,'5676');
The result in the orders table will look like this:
id order_id product_id
--------------------------
1 37 2
2 37 3
Rules of thumb
I have also never heard of such a rule regarding the number of queries needed for an operation. Anyway, here are some main rules that I strictly follow:
If you have the chance to achieve a specific data-access operation using only one query - even if it becomes very complex - then don't hesitate to do it. Put the database engine to work as much as possible whenever you have the chance, even if it seems easier to "chunk" the db operations and use the features of some programming language to run them.
Make use of well-designed indexes. They are very powerful for speed optimization. Use EXPLAIN to check that they are actually used (see the sketch after this list).
Design your tables in such a way that no redundant data is to be found in them. For example, in my example above, product_code should be saved only in the products table, even if it might seem to make sense to save it in the orders table as well.
"Standardize" your own naming rules across the tables in a/all database(s).
Good luck!
I do not think that there are such rules of thumb.
You should use as many queries as you like to achieve your goal. What you end up with will depend on the actions to be done, on the performance needed, on the maintainability needed (think of co-workers who might have to change it), and on other requirements or preferences.
If it really would be possible to do every "single operation" with one query, we probably would not need transactions.
My advice would be to solve your problem with as many queries as you need, so that you and your co-workers some years in the future will still understand what has been done, and to look into transactions (the MySQL manual and a myriad of tutorials on the net explain them quite well).
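For instance, a minimal transaction sketch, reusing the orders/products tables from the answer above:
START TRANSACTION;
INSERT INTO orders (order_id, product_id)
VALUES (37, (SELECT id FROM products WHERE product_code = '6587'));
COMMIT;
Everything between START TRANSACTION and COMMIT succeeds or fails as a unit, which is what multi-query operations usually need.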
Related
I am new to MySQL and databases overall. Is it possible to create a table where a column is the sum of two other columns from two other tables?
For instance, if I have this books table:
CREATE TABLE `books` (
`book_id` int(100) NOT NULL,
`book_name` varchar(20) NOT NULL,
`book_author` varchar(20) NOT NULL,
`book_co-authors` varchar(20) NOT NULL,
`book_edition` tinyint(4) NOT NULL,
`book_creation` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`book_amount` int(100) NOT NULL COMMENT 'Amount of book copies in both University libraries'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
How can I make the column book_amount be the sum of the two book_amount columns from the library1 and library2 tables where book_id = book_id?
Library1 :
CREATE TABLE `library1` (
`book_id` int(11) NOT NULL,
`book_amount` int(11) NOT NULL,
`available_amount` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
You can define a column with whatever type you want, so long as it's valid, and then populate it as you will with data from other tables. This is generally called "denormalizing" as under ideal circumstances you'll want that data stored in other tables and computed on demand so there's never a chance of your saved value and the source data falling out of sync.
You can also define a VIEW which is like a saved query that behaves as if it's a table. This can do all sorts of things, like dynamically query other tables, and presents the result as a column. A "materialized view" is something some databases support where the view is automatically updated and saved based on some implicit triggers. A non-materialized view is slower, but in your case the speed difference might not be a big deal.
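For illustration, a minimal sketch of such a (non-materialized) view over the tables from the question, assuming a library2 table that mirrors library1; the name book_totals is made up:
CREATE VIEW book_totals AS
SELECT l1.book_id,
       l1.book_amount + l2.book_amount AS book_amount
FROM library1 l1
JOIN library2 l2 ON l1.book_id = l2.book_id;
Querying it then looks just like querying a table, e.g. SELECT book_amount FROM book_totals WHERE book_id = 1;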
So you have options in how you represent this.
One thing to note is that you should use plain INT as your default "integer" type, not wonky things like INT(100). The number on an integer column is only a display-width hint (used with ZEROFILL), not a storage size or precision; an INT never needs more than 11 characters to display, so INT(100) is wildly out of line.
Not directly, however there are a few ways to achieve what you're after.
Either create a pseudo column in your SELECT clause which adds the other two columns:
select *, columna+columnb AS `addition` from books
Don't forget to swap out columna and columnb for the names of the columns, and addition for the name you'd like the pseudo column to have.
Alternatively, you could use a view to auto-add the pseudo field in the same way. However, views do not have indexes, so performing lookups on them and joining them can get rather slow very easily.
You could also use triggers to set the values upon insert and update, or simply calculate the value within the language that inserts into the DB.
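As a rough sketch of that trigger route (again assuming a library2 table mirroring library1; the trigger name is made up), something like this would keep books.book_amount in sync after updates to library1:
CREATE TRIGGER library1_after_update
AFTER UPDATE ON library1
FOR EACH ROW
UPDATE books
SET book_amount = NEW.book_amount +
    (SELECT book_amount FROM library2 WHERE book_id = NEW.book_id)
WHERE book_id = NEW.book_id;
A matching trigger on library2, plus AFTER INSERT variants, would be needed for full coverage.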
The following query will work if the library1 and library2 tables have a schema similar to the books table:
INSERT INTO books
SELECT l1.book_id,
       l1.book_name,
       l1.book_author,
       l1.`book_co-authors`,
       l1.book_edition,
       l1.book_creation,
       (l1.book_amount + l2.book_amount)
FROM library1 l1
INNER JOIN library2 l2
  ON l1.book_id = l2.book_id;
I want to update a statistics count in MySQL.
The SQL is as follow:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema :
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it caused a DEADLOCK exception when running in multiple processes (e.g. Map-Reduce).
I've read some materials online and am confused about those locks. I know InnoDB uses row-level locks. I could just use a table lock to solve the business problem, but that is a little extreme. I found some possible solutions:
change REPLACE INTO to transaction with SELECT id FOR UPDATE and UPDATE
change REPLACE INTO to INSERT ... ON DUPLICATE KEY UPDATE
I have no idea which is practical and better. Can someone explain it or offer some links for me to read and study? Thank you!
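For reference, the second option applied to the schema above would look roughly like this (count = VALUES(count) mirrors the overwrite semantics of the original REPLACE):
INSERT INTO record_amount (source, owner, day_time, count)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE count = VALUES(count);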
Are you building a summary table, one source row at a time? And effectively doing UPDATE ... count = count+1? Throw away the code and start over. Map-Reduce for that is like using a sledgehammer on a thumbtack.
INSERT INTO summary (source, owner, day_time, count)
SELECT source, owner, day_time, COUNT(*)
FROM raw
GROUP BY source, owner, day_time
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
A single statement approximately like that will do all the work at virtually disk I/O speed. No SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.
Further improvements:
Get rid of the AUTO_INCREMENT; turn the UNIQUE into PRIMARY KEY.
day_time -- is that a DATETIME truncated to an hour? (Or something like that.) Use DATETIME; you will have much more flexibility in querying.
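A sketch of the revised summary table under those two suggestions, with the column sizes copied from the schema above:
CREATE TABLE record_amount (
  owner varchar(50) NOT NULL,
  source varchar(50) NOT NULL,
  day_time datetime NOT NULL,
  count int NOT NULL,
  PRIMARY KEY (owner, source, day_time)
) ENGINE=InnoDB;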
To discuss further, please elaborate on the source data (the CREATE TABLE, number of rows, frequency of processing, etc.) and other details. If this is really a data-warehouse application with a summary table, I may have more suggestions.
If the data is coming from a file, do LOAD DATA to shovel it into a temp table raw so that the above INSERT..SELECT can work. If it is of manageable size, make raw Engine=MEMORY to avoid any I/O for it.
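A hedged sketch of that staging step; the file path, delimiter, and column list are assumptions:
CREATE TEMPORARY TABLE raw (
  source varchar(50) NOT NULL,
  owner varchar(50) NOT NULL,
  day_time datetime NOT NULL
) ENGINE=MEMORY;
LOAD DATA LOCAL INFILE '/tmp/feed.tsv'
INTO TABLE raw
FIELDS TERMINATED BY '\t'
(source, owner, day_time);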
If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.
I've been thinking about keeping a history in the following table structure:
`id` bigint unsigned not null auto_increment,
`userid` bigint unsigned not null,
`date` date not null,
`points_earned` int unsigned not null,
primary key (`id`),
key `userid` (`userid`),
key `date` (`date`)
This will allow me to do something like SO does with its Reputation Graph (where I can see my rep gain since I joined the site).
Here's the problem, though: I just ran a simple calculation:
SELECT SUM(DATEDIFF(`lastclick`,`registered`)) FROM `users`
The result was as near as makes no difference 25,000,000 man-days. If I intend to keep one row per user per day, that's a [expletive]ing large table, and I'm expecting further growth. Even if I exclude days where a user doesn't come online, that's still huge.
Can anyone offer any advice on maintaining such a large amount of data? The only queries that will be run on this table are:
SELECT * FROM `history` WHERE `userid`=?
SELECT SUM(`points_earned`) FROM `history` WHERE `userid`=? AND `date`>?
INSERT INTO `history` VALUES (null,?,?,?)
Would the ARCHIVE engine be of any use here, for instance? Or do I just not need to worry because of the indexes?
Assuming it's MySQL:
For history tables you should consider partitioning. You can set the partition rule that works best for you, and looking at the queries you have, there are two choices (see the sketch after this list):
a. partition by date (1 partition = 1 month for example)
b. partition by user (let's say you have 300 partitions and 1 partition = 100,000 users)
This will help you a lot if you use partition pruning.
You could use a composite index on (userid, date); it will be used by the first two queries.
Avoid INSERT statements; when you have huge data, use LOAD DATA (this will not work if the table is partitioned).
And most important ... the best engine for huge volumes of data is MyISAM
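As a sketch of option (a) combined with the composite-index suggestion (monthly partitions; the partition names are made up, and note that MySQL requires the partitioning column in every unique key, hence the two-column primary key):
CREATE TABLE history (
  id bigint unsigned NOT NULL AUTO_INCREMENT,
  userid bigint unsigned NOT NULL,
  date date NOT NULL,
  points_earned int unsigned NOT NULL,
  PRIMARY KEY (id, date),
  KEY user_date (userid, date)
)
PARTITION BY RANGE (TO_DAYS(date)) (
  PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
  PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);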
I have 2 tables:
Dictionary - Contains roughly 36,000 words
CREATE TABLE IF NOT EXISTS `dictionary` (
`word` varchar(255) NOT NULL,
PRIMARY KEY (`word`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Datas - Contains roughly 100,000 rows
CREATE TABLE IF NOT EXISTS `datas` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`hash` varchar(32) NOT NULL,
`data` varchar(255) NOT NULL,
`length` int(11) NOT NULL,
`time` int(11) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `hash` (`hash`),
KEY `data` (`data`),
KEY `length` (`length`),
KEY `time` (`time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=105316 ;
I would like to somehow select all the rows from datas where the column data contains 1 or more words.
I understand this is a big ask; it would need to match all of these rows together in every possible combination, so it needs the best optimization.
I have tried the below query, but it just hangs for ages:
SELECT `datas`.*, `dictionary`.`word`
FROM `datas`, `dictionary`
WHERE `datas`.`data` LIKE CONCAT('%', `dictionary`.`word`, '%')
AND LENGTH(`dictionary`.`word`) > 3
ORDER BY `length` ASC
LIMIT 15
I have also tried something similar to the above with a LEFT JOIN and an ON clause that contained the LIKE condition.
This is actually not an easy problem. What you are trying to perform is called Full Text Search, and relational databases are not the best tools for such a task. If this is some kind of core functionality, consider using solutions dedicated to this kind of operation, like Sphinx Search Server.
If this is not a "mission critical" system, you can try something else. I can see that the datas.data column isn't really long, so you can create a structure dedicated to your task and keep maintaining it during operational use. For example, create a table:
CREATE TABLE dictionary_datas (
  datas_id int(11) NOT NULL,
  word varchar(255) NOT NULL,
  FOREIGN KEY (datas_id) REFERENCES datas (ID),
  FOREIGN KEY (word) REFERENCES dictionary (word)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Now, any time you insert, delete, or simply modify the datas or dictionary tables, you update dictionary_datas, recording which datas_id contains which words (basically a many-to-many relation). Of course this will degrade your performance, so if you have a high transactional load on your system, you can do it periodically instead. For example, set up a cron job which runs every night at 03:00 and refreshes the table. To simplify the task you can add a TO_CHECK flag to the datas table and refresh the mapping only for records having 1 there (switching the value to 0 after dictionary_datas is refreshed). Remember, by the way, to refresh the whole datas table after any update to the dictionary table. 36,000 and 100,000 are not big numbers in terms of data processing.
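A hedged sketch of that nightly refresh, reusing the LIKE matching from the question; the TO_CHECK flag is the one suggested above:
DELETE FROM dictionary_datas
WHERE datas_id IN (SELECT ID FROM datas WHERE TO_CHECK = 1);
INSERT INTO dictionary_datas (datas_id, word)
SELECT d.ID, w.word
FROM datas d
JOIN dictionary w ON d.data LIKE CONCAT('%', w.word, '%')
WHERE d.TO_CHECK = 1;
UPDATE datas SET TO_CHECK = 0 WHERE TO_CHECK = 1;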
Once you have this table you can just query it like:
SELECT datas_id, count(*) AS words_num FROM dictionary_datas GROUP BY datas_id HAVING count(*) > 3;
To speed up the query (and yet slow down its updates) you can create a composite index on its columns datas_id, word (in EXACTLY that order). If you decide to refresh the data periodically, you should remove the index before the refresh, then refresh the data, and finally recreate the index afterwards - this way it will be faster.
I'm not sure if I understood your problem, but I think this could be a solution. Also, I know people don't like regular expressions, but this works for me to select rows whose value has more than one word:
SELECT * FROM datas WHERE data REGEXP "([a-z] )+"
Have you tried this?
select *
from dictionary, datas
where position(word in data) > 0
;
This is very inefficient, but might be good enough for you.
For better performance, you could try placing a FULLTEXT index on your text column data and using MATCH ... AGAINST instead of POSITION.
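A rough sketch of that idea in MySQL terms (InnoDB supports FULLTEXT indexes as of MySQL 5.6; note that AGAINST takes a constant string, so this fits per-word lookups rather than a join against the whole dictionary):
ALTER TABLE datas ADD FULLTEXT INDEX ft_data (data);
SELECT * FROM datas
WHERE MATCH(data) AGAINST('example' IN NATURAL LANGUAGE MODE);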
I have a table Items which stores fetched book data from Amazon. This Amazon data is inserted into Items as users browse the site, so any INSERT that occurs needs to be efficient.
Here's the table:
CREATE TABLE IF NOT EXISTS `items` (
`Item_ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Item_ISBN` char(13) DEFAULT NULL,
`Title` varchar(255) NOT NULL,
`Edition` varchar(20) DEFAULT NULL,
`Authors` varchar(255) DEFAULT NULL,
`Year` char(4) DEFAULT NULL,
`Publisher` varchar(50) DEFAULT NULL,
PRIMARY KEY (`Item_ID`),
UNIQUE KEY `Item_Data` (`Item_ISBN`,`Title`,`Edition`,`Authors`,`Year`,`Publisher`),
KEY `ISBN` (`Item_ISBN`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT AUTO_INCREMENT=1 ;
Normalizing this table would presumably mean creating tables for Titles, Authors, and Publishers. My concern with doing this is that the insert would become too complex. To insert a single Item, I'd have to (a code sketch follows the list):
Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID.
Check for the Authors in Authors to SELECT Authors_ID, otherwise insert it and use mysql_insert_id() to get Authors_ID.
Check for the Title in Titles to SELECT Title_ID, otherwise insert it and use mysql_insert_id() to get Title_ID.
Use those IDs to finally insert the Item (which may in fact be a duplicate, so this whole process would have been a waste).
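For reference, each check-or-insert step would look roughly like this sketch; the publishers table and its name column are hypothetical, and INSERT IGNORE assumes a UNIQUE key on name:
INSERT IGNORE INTO publishers (name) VALUES ('Some Publisher');
SELECT Publisher_ID FROM publishers WHERE name = 'Some Publisher';
Since INSERT IGNORE skips the insert when the row already exists (which makes mysql_insert_id() unreliable in that case), the follow-up SELECT is what reliably fetches the id either way.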
Does that argue against normalization for this table?
Note: The goal of Items is not to create a comprehensive database of books, so that a user would say "Show me all the books by Publisher X." The Items table is just used to cache Items for my users' search results.
Considering your goal, I definitely wouldn't normalize this.
You've answered your own question - don't normalize it!
YES, you should normalize it if you don't think it is already. However, as far as I can tell it's already in 5th Normal Form anyway - at least it seems to be, based on the "obvious" interpretation of those column names and if you ignore the nullable columns. Why do you doubt it? I'm not sure why you want to allow NULLs for some of those columns, though.
1. Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID
There is no "Publisher_ID" in your table. Normalization has nothing to do with inventing a new "Publisher_ID" attribute. Substituting a "Publisher_ID" in place of Publisher certainly wouldn't make it any more normalized than it already is.
The only place where I can see normalization being useful in your case is if you want to store information about each author.
However, where normalization could help you is in saving space, especially if there is a lot of repetition in terms of publishers and authors (that is, if you normalize individual authors into their own table).
So if you are dealing with tens of millions of rows, normalization will show an impact in terms of space (and even performance). If you don't face that situation (which I believe should be the case here), you don't need to normalize.
PS: Also think of the future... will there ever be a need? Databases are long-term infrastructure... never design them keeping only the now in mind.