EDIT: To avoid confusion over the two meanings of the word "tables": every time I refer to "100 tables", I'm referring to 100 physical tables in a single business that are available for booking each day.
I've come to the conclusion that for a table-booking system such as the one I'm trying to develop, a single MySQL table with a unique index made up of tableid and date will suffice, meaning I can have my table reservations in a single table and according to my research store at least +100 years into the future without any performance issues. Please correct me if I'm wrong.
Further explained: I have a set of let's say 100 bookable tables just to not run out of tables (for this project I will rarely require more than 30, but you never know). Each table is numbered 1-100 and the combination of table number (tableid) and the date is a unique entry in the database. I.e. you can only have the row of table 4 on date 2014-06-18 once. That's fine, and I can just generate 100 rows for each day for the next 100 years, yes?
I use a BIGINT as Primary Key for each row with Auto-Increment starting at 1.
Now - what is the easiest solution to generating all these rows in MySQL? Each row just needs to have INSERT INTO tables (TABLEID,DATE) VALUES ([id],[date]) as the rest of the fields populate by default. Dates can just start from today as this project has no business in the past. I did some Googling but cannot really figure out the difference between a script and a stored procedure, and the variable declaration for each seems to be different and confuses me a bit.
This should be fairly simple, though. The question is also aimed at whether this approach is good practice or not.
Thanks
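As a rough sketch of the row generation described above: 100 tables a day for 100 years is only about 3.65 million rows, which can be produced by looping over dates and table numbers. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the `booked` column and the `generate_slots` helper are made up for illustration, and in MySQL the same loop could just as well live in a stored procedure.

```python
import sqlite3
from datetime import date, timedelta


def generate_slots(conn, first_day, days, table_count):
    """Insert one row per (tableid, date) pair; other columns take their defaults."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tables (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            tableid INTEGER NOT NULL,
            date    TEXT    NOT NULL,
            booked  INTEGER NOT NULL DEFAULT 0,  -- hypothetical defaulted column
            UNIQUE (tableid, date)               -- one row per table per day
        )
        """
    )
    rows = (
        (tableid, (first_day + timedelta(days=offset)).isoformat())
        for offset in range(days)
        for tableid in range(1, table_count + 1)
    )
    with conn:  # a single transaction keeps the bulk insert fast
        conn.executemany("INSERT INTO tables (tableid, date) VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
generate_slots(conn, date(2014, 6, 18), days=365, table_count=100)
total = conn.execute("SELECT COUNT(*) FROM tables").fetchone()[0]
print(total)  # 36500 rows for one year of 100 tables
```

The unique index on (tableid, date) is what rejects a second row for table 4 on 2014-06-18.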
Related
I'm currently designing the database architecture for a product that I'm in the process of building. I'm simply drawing out everything in an Excel file before I begin creating everything in MySQL.
Currently, I have two different tables that are almost identical to one another.
TABLE A that contains the most recent values of each data point for each user.
TABLE B that contains daily records of each data point for each user.
My reasoning for creating TABLE A, instead of relying solely on TABLE B, is that the number of rows in TABLE B will grow every day by the number of customers I have. For instance, with 20,000 customers, TABLE B will grow by 20,000 rows every single day. By creating TABLE A, I'll only ever have to search through 20,000 records to find the most recent values of each data point for each user, since I'll be updating these values every day. With TABLE B alone, I'd have to search through an ever-growing number of rows for the most recent insertion for each user.
Is this acceptable or good practice?
Or should I just forget about TABLE A to reduce "bloat" in my database?
This is not the right approach. You basically have two reasonable options:
Use indexes on the history table to access the most recent day's records.
Use table partitioning to store each day in a separate partition.
You can manage two tables, but that is a lot of trouble and there are built-in methods to handle this situation.
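To illustrate the first option, here is a minimal sketch of a single history table with an index on the date column, queried for the most recent day's records. It uses Python's sqlite3 module purely as a runnable stand-in for MySQL, and the table, column, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE history (
        user_id     INTEGER NOT NULL,
        recorded_on TEXT    NOT NULL,
        value       INTEGER NOT NULL,
        PRIMARY KEY (user_id, recorded_on)
    );
    -- Index that leads with the date, so "latest day" lookups avoid a full scan.
    CREATE INDEX idx_history_day ON history (recorded_on, user_id);
    """
)

# Three users, three days of records each.
rows = [
    (user_id, f"2014-06-{day:02d}", user_id * 10 + day)
    for day in (16, 17, 18)
    for user_id in (1, 2, 3)
]
conn.executemany("INSERT INTO history VALUES (?, ?, ?)", rows)

# Most recent day's records for every user.
latest = conn.execute(
    """
    SELECT user_id, value
    FROM history
    WHERE recorded_on = (SELECT MAX(recorded_on) FROM history)
    ORDER BY user_id
    """
).fetchall()
print(latest)  # [(1, 28), (2, 38), (3, 48)]
```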
In situations where I need both "current" data and a "history", that is what I do -- One table with the current data and one with history. They are possibly indexed differently for the different usage, etc.
I would think through what is different between "history" and "current", then make the tables different not identical.
When a new record comes in (or 20K rows in your case), I will at least put it into Current. I may also write it to History, thereby keeping it complete (at the cost of a small redundancy). Or I may move the row(s) to History when the next row(s) come into Current.
I see no need for PARTITIONing unless I intend to purge 'old' data. In that case, I would use PARTITION BY RANGE(TO_DAYS(..)) and choose weekly/monthly/whatever such that the number of partitions does not exceed about 50. (If you pick 'daily', History will slow down after a few months, just because of the partitioning.)
The 20K rows each day -- Are many of them unchanged since yesterday? That is probably not the proper way to do things. Please elaborate on what happens each day. You should avoid having duplicate rows in History (except for the date).
Question from a total MySQL newbie. I'm trying to build a table containing information about machine parts (screws, brackets, cylinders, etc.), where each part belongs to a machine. The database will be designed so that whenever the client reads from the table, all of the parts from one specified machine will be selected. I'm trying to figure out the fastest way in which all rows falling under a certain category can be read from the disk.
Sorting the table is not an option as many people might be adding rows to the table at once. Using a table for each machine is not practical either, since new machines might be created. I expect it to have to handle lots of INSERT and SELECT operations, but almost no DELETE operations. I've come up with a plan to quickly identify each part belonging to any machine, and I've come to ask if it's practical:
Each row containing the data for a machine part will contain the row number of the previous part and the next part for the same machine. A separate table will contain the row number of the last part of each machine that appears on the table. A script could follow the list of these 'pointers,' skipping to different parts of the table until all of the parts were found.
TL;DR
Would this approach of searching a row by its row number be any faster than searching instead by an integer primary key (since a primary key does not necessarily indicate a position on the table)? How much faster would it be? Would it yield noticeable performance improvements over using an index?
This would be a terrible approach. Selecting rows which match some criteria is a fundamental feature of MySQL (or any other DB engine really...).
Just create a column called machine_id in your parts table and give an id to each machine.
You could put your machines in a machines table and use their primary key in the machine_id field of the parts table.
Then all you have to do to retrieve ALL parts of machine 42 is:
SELECT * FROM parts WHERE machine_id = 42;
If your database is massive, you may also consider indexing the machine_id column for better performance.
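Put together, the layout could look like the sketch below. It runs against Python's sqlite3 module as a stand-in for MySQL; the `name` columns, the index name, and the sample data are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE machines (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE parts (
        id         INTEGER PRIMARY KEY,
        machine_id INTEGER NOT NULL REFERENCES machines (id),
        name       TEXT NOT NULL
    );
    -- The index lets the engine jump straight to one machine's parts.
    CREATE INDEX idx_parts_machine ON parts (machine_id);
    """
)
conn.executemany(
    "INSERT INTO machines (id, name) VALUES (?, ?)",
    [(41, "lathe"), (42, "press")],
)
conn.executemany(
    "INSERT INTO parts (machine_id, name) VALUES (?, ?)",
    [(42, "screw"), (41, "bracket"), (42, "cylinder")],
)

# All parts of machine 42, in insertion order.
parts_of_42 = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM parts WHERE machine_id = ? ORDER BY id", (42,)
    )
]
print(parts_of_42)  # ['screw', 'cylinder']
```

New rows can be inserted in any order; the index, not the physical row position, is what keeps the lookup fast.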
I have a basic question about database designing.
I have a lot of files which I have to read and insert into a database. Each file has a few thousand lines, and each line has about 30 fields (of these types: small int, int, big int, varchar, json). Of course I use multiple threads along with bulk inserts to increase insert speed (in the end I have 30-40 million records).
After inserting, I want to run some sophisticated analysis, and performance is important to me.
I have now parsed the fields of each line and am ready to insert, and I see 3 approaches:
1- One big table:
In this case I create one big table with 30 columns and store all of the file fields in it. The result is a single huge table on which I run all of my analysis.
2- A fairly large table (A) and some little tables (B)s
In this case I create some small tables made up of the columns whose values repeat heavily once they are separated from the other columns. These small tables have only a few hundred or a few thousand records instead of 30 million. In the fairly large table (A), I omit the columns I moved to another table and use a foreign key in their place. I end up with a table (A) with 20 columns and 30 million records, plus some tables (B) with 2-3 columns and 100-50,000 records each. To analyse table A, I then have to use joins, for example in SELECT statements.
3- just a fairly large table
In this case I create a fairly large table like table A in the case above (with 20 columns), but instead of using foreign keys, I use a mapping between source columns and destination columns (similar to foreign keys, with a small difference). For example, say I have 3 columns c1, c2, c3. In case 2, I would put them in another table B and use a foreign key to access them. Here, I instead assign a specific number to each distinct combination of c1, c2, c3 at insert time and keep the relation between the combination and its assigned value in the program code. So this table is exactly like table A in case 2, but there is no need for joins in SELECT statements.
While insert time is important, the analysis time matters more to me, so I would like your opinion on which of these cases is better. I would also be glad to see other solutions.
From a design perspective 30 to 40 million is not that bad a number. Performance is fully dependent on how you would design your DB to be.
If you are using SQL Server, then you could consider putting the large table on a separate database filegroup. I have worked on a similar case where we had around 1.8 billion records in a single table.
If your analysis is not going to look at the entire data set in one shot, you could consider partitioning the data by rows (horizontal partitioning), using a partition scheme based on your need. For example, splitting the data into yearly partitions will help if your analysis is limited to a year's worth of data.
The major thing would be de-normalization /normalization based on your need and of course non clustered/clustered indexing of the data. Again this will depend on what sort of analysis queries you would be using.
A single thread can INSERT one row at a time and finish 40M rows in a day or two. With LOAD DATA, you can do it in perhaps an hour or less.
But is loading the real question? For doing grouping, summing, etc, the question is about SELECT. For "analytics", the question is not one of table structure. Have a single table for the raw data, plus one or more "Summary tables" to make the selects really fast for your typical queries.
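A summary table can be as simple as the sketch below: the raw rows stay in one table, and a second table holds pre-aggregated counts and sums that typical queries read instead. This uses Python's sqlite3 module as a stand-in for MySQL, and every table and column name here is hypothetical, since the real fields have not been described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_events (
        id       INTEGER PRIMARY KEY,
        day      TEXT    NOT NULL,
        category TEXT    NOT NULL,
        amount   INTEGER NOT NULL
    );
    CREATE TABLE daily_summary (
        day          TEXT    NOT NULL,
        category     TEXT    NOT NULL,
        row_count    INTEGER NOT NULL,
        total_amount INTEGER NOT NULL,
        PRIMARY KEY (day, category)
    );
    """
)
conn.executemany(
    "INSERT INTO raw_events (day, category, amount) VALUES (?, ?, ?)",
    [
        ("2014-06-17", "a", 5),
        ("2014-06-17", "a", 7),
        ("2014-06-17", "b", 1),
        ("2014-06-18", "a", 10),
    ],
)

# Build the summary once after loading; analysis queries then read this small table.
conn.execute(
    """
    INSERT INTO daily_summary (day, category, row_count, total_amount)
    SELECT day, category, COUNT(*), SUM(amount)
    FROM raw_events
    GROUP BY day, category
    """
)
summary = conn.execute(
    "SELECT day, category, row_count, total_amount "
    "FROM daily_summary ORDER BY day, category"
).fetchall()
print(summary)
# [('2014-06-17', 'a', 2, 12), ('2014-06-17', 'b', 1, 1), ('2014-06-18', 'a', 1, 10)]
```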
Until you give more details about the data, I cannot give more details about a custom solution.
Partitioning (vertical or horizontal) is unlikely to help much in MySQL. (Again, details needed.)
Normalization shrinks the data, which leads to faster processing. But, it sounds like the dataset is so small that it will all fit in RAM?? (I assume your #2 is 'normalization'?)
Beware of over-normalization.
I have to store 2 dates for almost every table in the database, e.g. tbl_clients, tbl_users, tbl_employers, tbl_position, tbl_payments, tbl_quiz, tbl_email_reminder, etc.
Most of the time I store "date_created" and "date_modified", and sometimes a few extra dates.
What would be the best approach to storing dates in a MySQL database, performance-wise? (The site might have a lot of customers later, maybe 500,000+.)
Option 1: Add 2 columns for dates to each table.
Option 2: Create table "tbl_dates" exclusively for dates.
I was thinking option 2 would be faster, as I only need the dates displayed on one specific page, e.g. "report.php". Am I right?
Also, how many columns at most should I put in "tbl_dates" without making it too slow?
For the general case (a row creation and a row modification timestamp) I would put them in the same table as they relate to. Otherwise, you'll find that the consequent joins you will need will slow down your queries more than the simple approach.
In any case, you don't want to get into the habit of building "general tables" to which many tables can JOIN - this is because ideally you would create foreign keys for each relationship, but this won't work if some rows belong to tbl_clients, some to tbl_users... (etc).
Admittedly your MySQL engine may prevent you from using foreign keys - depending on which one you're using - but (for me at least) the point stands.
I have a MySQL table which needs to store several bitfields...
notification.id -- autonumber int
association.id -- BIT FIELD 1 -- stores one or more association ids (which are obtained from another table)
type.id -- BIT FIELD 2 -- stores one or more types that apply to this notification (again, obtained from another table)
notification.day_of_week -- BIT FIELD 3 -- stores one or more days of the week
notification.target -- where to send the notification -- data type is irrelevant, as we'll never index or sort on this field, but will probably store an email address.
My users will be able to configure their notifications to trigger on one or more days, in one or more associations, for one or more types. I need a quick, indexable way to store this data.
Bit fields 1 and 2 can expand to have more values than they do presently. Currently 1 has values as high as 125, and 2 has values as high as 7, but both are expected to go higher.
Bit field 3 stores days of the week, and as such, will always have only 7 possible values.
I'll need to run a script frequently (every few minutes) that scans this table based on type, association, and day, to determine if a given notification should be sent. Queries need to be fast, and the simpler it is to add new data, the better. I'm not above using joins, subqueries, etc as needed, but I can't imagine these being faster.
One last requirement -- if I have 1000 different notifications stored in here, with 125 association possibilities, 7 types, and 7 days of the week, the combination of records is too high for my taste if just using integers, and storing multiple copies of the row, instead of using bit fields, so it seems like using bit fields is a requirement.
However, from what I've heard, if I wanted to select everything from a particular day of the week, say Tuesday (b0000100 in a bit field, perhaps), bit fields are not indexed such that I can do...
SELECT * FROM `mydb`.`mytable` WHERE `notification.day_of_week` & 4 = 4;
This, from my understanding, would not use an index at all.
Any suggestions on how I can do this, or something similar, in an indexable fashion?
(I work on a pretty standard LAMP stack, and I'm looking for specifics on how the MySQL indexing works on this or a similar alternative.)
Thanks!
There's no "good" way (that I know of) to accomplish what you want to.
Note that the BIT datatype is limited to a size of 64 bits.
For bits that can be statically defined, MySQL provides the SET datatype, which is in some ways the same as BIT, and in other ways it is different.
For days of the week, for example, you could define a column
dow SET('SUN','MON','TUE','WED','THU','FRI','SAT')
There's no built-in way (that I know of) of getting the internal bit representation back out, but you can add 0 to the column, or cast it to unsigned, to get a decimal representation.
SELECT dow+0, CONVERT(dow,UNSIGNED), dow, ...
1 1 SUN
2 2 MON
3 3 SUN,MON
4 4 TUE
5 5 SUN,TUE
6 6 MON,TUE
7 7 SUN,MON,TUE
It is possible for MySQL to use a "covering index" to satisfy a query with a predicate on a SET column, when the SET column is the leading column in the index. (i.e. EXPLAIN shows 'Using where; Using index') But MySQL may be performing a full scan of the index, rather than doing a range scan. (And there may be differences between the MyISAM engine and the InnoDB engine.)
SELECT id FROM notification WHERE FIND_IN_SET('SUN',dow)
SELECT id FROM notification WHERE (dow+0) MOD 2 = 1
BUT... this usage is non-standard, and can't really be recommended. For one thing, this behavior is not guaranteed, and MySQL may change this behavior in a future release.
I've done a bit more research on this, and realized there's no way to get the indexing to work as I outlined above. So, I've created an auxiliary table (somewhat like the WordPress meta table format) which stores entries for day of week, etc. I'll just join these tables as needed. Fortunately, I don't anticipate having more than ~10,000 entries at present, so it should join quickly enough.
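For anyone curious, the auxiliary-table layout looks roughly like the sketch below. It is written against Python's sqlite3 module as a stand-in for MySQL, and the `notification_day` table name, the 0-6 day numbering, and the sample addresses are assumptions for the example. The association and type values would get the same kind of table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notification (
        id     INTEGER PRIMARY KEY,
        target TEXT NOT NULL
    );
    -- One row per (notification, day) instead of one bit per day.
    CREATE TABLE notification_day (
        notification_id INTEGER NOT NULL REFERENCES notification (id),
        day_of_week     INTEGER NOT NULL,  -- assumed: 0 = Sunday ... 6 = Saturday
        PRIMARY KEY (day_of_week, notification_id)  -- index leads with the day
    );
    """
)
conn.executemany(
    "INSERT INTO notification (id, target) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
)
conn.executemany(
    "INSERT INTO notification_day (notification_id, day_of_week) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (3, 5)],
)

# Everything scheduled for Tuesday (day 2): an equality lookup on the index.
tuesday = conn.execute(
    """
    SELECT n.id, n.target
    FROM notification AS n
    JOIN notification_day AS d ON d.notification_id = n.id
    WHERE d.day_of_week = ?
    ORDER BY n.id
    """,
    (2,),
).fetchall()
print(tuesday)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

The day filter becomes a plain equality on an indexed column, which is the part the bitwise `& 4 = 4` predicate could not offer.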
I'm still interested in a better answer if anyone has one!