I have 30 tables in a database, all InnoDB, each with its own structure. What I want to do is add the columns below to EVERY table.
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text COLLATE utf8_unicode_ci NOT NULL,
`description` text COLLATE utf8_unicode_ci NOT NULL,
`categoryId` int(11) NOT NULL,
`imageId` int(11) NOT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`createdId` int(11) NOT NULL,
`updatedId` int(11) NOT NULL,
`allowedEmployeeIds` text COLLATE utf8_unicode_ci NOT NULL,
In a programming language (assume PHP) the normal approach would be to create an abstract class, put all of the common variables there, and then inherit from it. What about MySQL? How should I do this?
For design reasons, it's not possible for me to create a commonData table and use foreign keys / joins. Please note that I am writing the CREATE statements from scratch, so there is no need to update existing tables.
I'd think this through for a while -- it's complicated, it will add a lot to what you're trying to do, and it will mean a lot of extra work later.
But if you really want to do this, the way to do it is not to put this in the table definition, but to create a script that can add these columns to a table.
Something like:
ALTER TABLE table_name
ADD `id` int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
ADD `name` text COLLATE utf8_unicode_ci NOT NULL,
ADD `description` text COLLATE utf8_unicode_ci NOT NULL,
ADD `categoryId` int(11) NOT NULL,
ADD `imageId` int(11) NOT NULL,
ADD `created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
ADD `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD `createdId` int(11) NOT NULL,
ADD `updatedId` int(11) NOT NULL,
ADD `allowedEmployeeIds` text COLLATE utf8_unicode_ci NOT NULL;
Then simply execute this script for each table you have.
If you want to write a script to do this for all tables, use the command:
SHOW TABLES;
to list all the tables. Then loop through them and execute the ALTER script on each one.
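If you'd rather generate the script inside MySQL itself, you can build the statements from information_schema instead of looping in application code. A minimal sketch (it assumes your schema is named mydb; extend the CONCAT with the remaining ADD clauses from the script above):
SELECT CONCAT('ALTER TABLE `', table_name, '` ',
              'ADD `categoryId` int(11) NOT NULL, ',
              'ADD `imageId` int(11) NOT NULL;') AS stmt
FROM information_schema.tables
WHERE table_schema = 'mydb' AND table_type = 'BASE TABLE';
Run that query, then execute each generated statement.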
I think the most practical way to do this would be to use a MySQL GUI to duplicate the tables. For example, on a Mac you could use Sequel Pro and duplicate each table as needed, which would only take a few seconds, rather than writing the CREATE TABLE statements in a shell script or otherwise.
MySQL has no concept of table inheritance. You will need to add all those columns to every table. I would write a script to generate the ALTER TABLE ... ADD ... statements for every table you care about, then run the result against your database.
I'm skeptical you can't find some other way to organize your data that doesn't require adding the same exact columns to 30 tables.
I know this is an old topic, but the solutions presented show how to create the required columns at table creation time, which is really just a typing-saving device.
I was thinking about this more from a design point of view. In this case, a second table could be defined with the common columns. For example:
CREATE TABLE article
:
:
CREATE TABLE classified_ad
:
:
CREATE TABLE details
:
`details_id` int(11) NOT NULL AUTO_INCREMENT,
`name` text COLLATE utf8_unicode_ci NOT NULL,
`description` text COLLATE utf8_unicode_ci NOT NULL,
:
`categoryId` int(11) NOT NULL,
`allowedEmployeeIds` text COLLATE utf8_unicode_ci NOT NULL,
`parent_id` int(11) NOT NULL,
`parent_table` enum('classified_ad', 'article')
The one issue with this would be the performance hit from forcing a join, so for simple lookups there probably won't be much benefit to this schema.
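To make the join cost concrete, fetching an article together with its common data would look something like this (a sketch; it assumes article has an `id` primary key, which the elided definition above doesn't show):
SELECT a.*, d.name, d.description
FROM article a
JOIN details d ON d.parent_id = a.id AND d.parent_table = 'article';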
However, if you need to report across the details, for example on allowed employees, then this design does see benefits:
SELECT * FROM details WHERE allowedEmployeeIds = '1';
compared to, say, a UNION when doing it with separate tables.
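For comparison, with the columns duplicated into each table, the same report needs a UNION over every table involved. A sketch (the column names are illustrative):
SELECT id, name FROM article WHERE allowedEmployeeIds = '1'
UNION ALL
SELECT id, name FROM classified_ad WHERE allowedEmployeeIds = '1';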
Related
I'm experiencing a strange issue where my existing table rows (RDS MySQL) are being overwritten. I'm running a SPA (Vuetify). When a user POSTs data, it overwrites an existing table row rather than creating a new row.
The weird thing is it happens only sometimes, seemingly at random. Sometimes it will function correctly, other times it overwrites existing data. I cannot link anything in the logs to these events, nor connect it to a specific error.
We have two DATETIME fields that sometimes get incorrect timestamps; other times the timestamp comes in blank as 0000-00-00 00:00:00.
The issue seems to have come out of nowhere. Has anyone experienced anything like this?
CREATE TABLE media (
id int(11) NOT NULL AUTO_INCREMENT,
content_id int(11) DEFAULT NULL,
type enum('image','video','pdf','link') COLLATE utf8_unicode_ci DEFAULT NULL,
title varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
url varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
created_at datetime NOT NULL,
updated_at datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=132 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I made an app in which polls are sent to users via push notifications, and they have a short time to answer. We now have a deal with a news agency, and chances are that up to 100,000 people will answer the polls sent by this company in a short period of time (5 minutes, for example).
I have a MySQL database stored on Amazon RDS. Polls are stored in an InnoDB table:
CREATE TABLE `polls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`categoryId` int(11) NOT NULL,
`question` text CHARACTER SET utf8 NOT NULL,
`expiresAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sentAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` int(11) NOT NULL,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
`text1` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`text2` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`special` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
When people start voting, we increment the value of opt1 or opt2 by 1. For example, if someone votes for option 1:
UPDATE polls SET opt1 = opt1 + 1 WHERE id = 4644;
How can I configure MySQL to ensure it can support this load of traffic? I tried to go through the official docs, but I cannot find a clear overview of the steps I should take. Obviously I can buy a bigger database instance on AWS, but I want to be sure I am not making a scalability mistake here.
By the way, all select queries (when people just read the polls) are sent to a replicated database on AWS.
Many thanks for your help, please ask for more information if I forgot something.
I'd create a separate table for the poll results in order to have rows as short as possible for the update statement to work with.
CREATE TABLE `pollResults` (
`pollId` int(11) NOT NULL AUTO_INCREMENT,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
PRIMARY KEY (`pollId`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
In your polls table, I would put all the text columns at the end of the table, but this might not be a big deal.
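With that split, the hot vote increment only touches the narrow pollResults row. Your earlier example would become (assuming each poll gets a matching pollResults row when it is created):
UPDATE pollResults SET opt1 = opt1 + 1 WHERE pollId = 4644;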
I have a table `house`:
CREATE TABLE `house` (
`idhouse` int(11) NOT NULL AUTO_INCREMENT,
`type` mediumint(2) DEFAULT NULL,
`address` varchar(5) DEFAULT NULL,
`county` varchar(5) ...
Now I have the ads functionality, so I want to bring houses into ads.
Method 1 (directly add columns for ads):
CREATE TABLE `house` (
`idhouse` int(11) NOT NULL AUTO_INCREMENT,
`type` mediumint(2) DEFAULT NULL,
`address` varchar(5) DEFAULT NULL,
`county` varchar(5) ...
`ad_type` mediumint(2) DEFAULT NULL,
`ad_urgency` int(11) DEFAULT NULL,
`ad_status` int(11) DEFAULT NULL,
Method 2 (normalization, split the ad columns into a separate Ads table):
CREATE TABLE `house` (
`idhouse` int(11) NOT NULL AUTO_INCREMENT,
`type` mediumint(2) DEFAULT NULL,
`address` varchar(5) DEFAULT NULL,
`county` varchar(5) ...
CREATE TABLE `Ads` (
`idAds` int(11) NOT NULL AUTO_INCREMENT,
`idhouse` int(11) NOT NULL,
`ad_type` mediumint(2) DEFAULT NULL,
`ad_urgency` int(11) DEFAULT NULL,
`ad_status` int(11) DEFAULT NULL,
I'll do many more SELECT operations (90%) than INSERT, UPDATE, or DELETE (10%).
SELECT operations will ALL be based on variables such as ad_type, ad_urgency, and ad_status.
I'm taking performance into consideration a lot.
Which method should I use?
Is method 1 (SELECT without joining) faster than method 2 (SELECT with joining)?
If faster, by how much? A lot?
Normalization has a lot of advantages.
It helps you avoid redundancies.
It makes your database structure flexible.
It helps you avoid anomalies.
Complex queries are usually easier.
It minimizes the amount of data you store.
...and a few more.
The speed of queries cannot be easily determined by the data structure alone; it is affected by many different aspects like database configuration, server hardware, indexing, data load and much more.
But since less data usually means faster queries (with or without joins): go for the normalized approach. The database admin taking over the system will thank you.
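Since all your SELECTs filter on ad_type, ad_urgency, and ad_status, an index on those columns keeps the join in method 2 cheap. A minimal sketch (the index name and the literal filter values are made up):
CREATE INDEX idx_ads_filter ON Ads (ad_type, ad_urgency, ad_status);
SELECT h.*
FROM house h
JOIN Ads a ON a.idhouse = h.idhouse
WHERE a.ad_type = 1 AND a.ad_status = 2;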
This question is looking for a generic answer to the broad problem of index creation on a MySQL database.
Let's take this table example :
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) NOT NULL DEFAULT '0',
`author_id` int(11) unsigned NOT NULL,
`modificator_id` int(11) unsigned DEFAULT NULL,
`category_id` int(11) unsigned DEFAULT NULL,
`title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`headline` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`url_alias` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`priority` mediumint(11) unsigned NOT NULL DEFAULT '50',
`publication_date` datetime NOT NULL,
`creation_date` datetime NOT NULL,
`modification_date` datetime NOT NULL,
PRIMARY KEY (`id`)
);
On such a table there is a wide range of queries that could be performed on different criteria:
category_id
published
publication_date
e.g.:
SELECT id FROM article WHERE NOT published AND category_id = '2' ORDER BY publication_date;
On many tables you can see a wide range of state fields (like published here), date fields, or reference fields (like author_id or category_id). What strategy should be used when creating indexes?
This can be broken down into the following points:
Should you make an index on every field that can be used in a query (either in a WHERE clause or an ORDER BY), even if that leads to a lot of indexes per table?
Should you also make an index on fields that have only a small set of values, like booleans or enums? That only reduces the scope of the scan by a factor of n (where n is the number of distinct values, assuming every value is used equally often).
I've read that MySQL prior to 5.0 used only one index per query. How does the system pick it? By choosing the most restrictive one?
How is an OR condition processed?
How much is this going to slow down inserts?
Does the choice of InnoDB vs. MyISAM change anything about this problem?
I know the EXPLAIN statement can be used to find out whether a query is optimized, but a bit of concrete theory would really be more constructive than a purely empirical approach!
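For instance, for the sample query above I would imagine a composite index with the equality columns first and the sort column last, checked with EXPLAIN (just a sketch of what I mean, not a conclusion):
ALTER TABLE article ADD INDEX idx_pub_cat_date (published, category_id, publication_date);
EXPLAIN SELECT id FROM article WHERE published = 0 AND category_id = 2 ORDER BY publication_date;
(I rewrote NOT published as published = 0, since wrapping the column in an expression can prevent index use.)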
I'm trying to dedup a table, where I know there are 'close' (but not exact) rows that need to be removed.
I have a single table, with 22 fields, and uniqueness can be established through comparing 5 of those fields. Of the remaining 17 fields (including the unique key), there are 3 fields that cause each row to be unique, meaning the usual dedup method will not work.
I was looking at the multi-table delete method outlined here: http://blog.krisgielen.be/archives/111 but I can't make sense of the final line of code (AND M1.cd*100+M1.track > M2.cd*100+M2.track), as I am unsure what the cd*100 part achieves...
Can anyone assist me with this? I suspect I could do better exporting the whole thing to Python, doing something with it, then re-importing it, but then (1) I'm stuck with knowing how to dedup the data anyway, and (2) I had to break the records into chunks to be able to import them into MySQL, as it was timing out after 300 seconds, so it turned into a whole debacle to get the data into MySQL in the first place... (I am very novice at both MySQL and Python.)
The table is a dump of some 40 log files from some testing. The test set for each log is some 20,000 files. The repeating values are either the test conditions, the file name/parameters or the results of the tests.
SHOW CREATE TABLE:
CREATE TABLE `t1` (
`DROID_V` int(1) DEFAULT NULL,
`Sig_V` varchar(7) DEFAULT NULL,
`SPEED` varchar(4) DEFAULT NULL,
`ID` varchar(7) DEFAULT NULL,
`PARENT_ID` varchar(10) DEFAULT NULL,
`URI` varchar(10) DEFAULT NULL,
`FILE_PATH` varchar(68) DEFAULT NULL,
`NAME` varchar(17) DEFAULT NULL,
`METHOD` varchar(10) DEFAULT NULL,
`STATUS` varchar(14) DEFAULT NULL,
`SIZE` int(10) DEFAULT NULL,
`TYPE` varchar(10) DEFAULT NULL,
`EXT` varchar(4) DEFAULT NULL,
`LAST_MODIFIED` varchar(10) DEFAULT NULL,
`EXTENSION_MISMATCH` varchar(32) DEFAULT NULL,
`MD5_HASH` varchar(10) DEFAULT NULL,
`FORMAT_COUNT` varchar(10) DEFAULT NULL,
`PUID` varchar(15) DEFAULT NULL,
`MIME_TYPE` varchar(24) DEFAULT NULL,
`FORMAT_NAME` varchar(10) DEFAULT NULL,
`FORMAT_VERSION` varchar(10) DEFAULT NULL,
`INDEX` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`INDEX`)
) ENGINE=MyISAM AUTO_INCREMENT=960831 DEFAULT CHARSET=utf8
The only unique field is the primary key, `INDEX`.
Unique records can be established by looking at DROID_V, Sig_V, SPEED, NAME and PUID.
Of the ~900,000 rows, I have about 10,000 dups that are either a single duplicate of a record, or have up to 6 repetitions of the record.
Row examples: As Is
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
5;"v37";"slow";"12766";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"193977"
5;"v37";"slow";"12768";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"193978"
5;"v37";"slow";"12769";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"193979"
5;"v37";"slow";"12770";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"193980"
Row Example: As It should be
5;"v37";"slow";"10266";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/7";"image/tiff";"Tagged Ima";"3";"191977"
5;"v37";"slow";"10268";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/8";"image/tiff";"Tagged Ima";"4";"191978"
5;"v37";"slow";"10269";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/9";"image/tiff";"Tagged Ima";"5";"191979"
5;"v37";"slow";"10270";;"file:";"V1-FL425817.tif";"V1-FL425817.tif";"BINARY_SIG";"MultipleIdenti";"20603284";"FILE";"tif";"2008-11-03";;;;"fmt/10";"image/tiff";"Tagged Ima";"6";"191980"
Please note, you can see from the index column at the end that I have cut out some other rows; I have only identified a very small set of repeating rows. Please let me know if you need any more 'noise' from the rest of the DB.
Thanks.
I figured out a fix. Using the count function, I was using a COUNT(*) that just returned everything in the table; by using COUNT(DISTINCT NAME) I am able to weed out the dup rows that fit the dup criteria (as set out by the field selection in a WHERE clause).
Example:
SELECT `PUID`, `DROID_V`, `SIG_V`, `SPEED`, COUNT(DISTINCT NAME) AS Hit
FROM sourcelist, main_small
WHERE sourcelist.SourcePUID = 'MyVariableHere' AND main_small.NAME = sourcelist.SourceFileName
GROUP BY `PUID`, `DROID_V`, `SIG_V`, `SPEED`
ORDER BY `DROID_V` ASC, `SIG_V` ASC, `SPEED`;
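If you then want to delete the duplicates rather than just count them, a multi-table DELETE in the spirit of the linked article could look like the sketch below. It assumes you keep the row with the lowest `INDEX` in each duplicate group and that the five key fields are non-NULL in the affected rows; test it on a copy of the table first.
DELETE dup
FROM t1 AS keep
JOIN t1 AS dup
  ON keep.DROID_V = dup.DROID_V
  AND keep.Sig_V = dup.Sig_V
  AND keep.SPEED = dup.SPEED
  AND keep.NAME = dup.NAME
  AND keep.PUID = dup.PUID
  AND keep.`INDEX` < dup.`INDEX`;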