Subtable-type structure in MySQL [closed] - mysql

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I would like to know if it is possible to have the following database behaviour:
Create a USER table, with primary key USER_ID
Data comes in from external source: e.g. "USER_ID, TIMESTAMP, DATA"
For each user in the USER table, create a table to store only data entries pertinent to USER_ID, and store all incoming Data with the correct USER_ID into that table
When querying all the data entries for a specific USER_ID, just return all rows from that table.
I could of course do this all in one table "ALLDATALOG" and then search for all entries in ALLDATALOG that contain USER_ID, but my concern is that as the ALLDATALOG table grows, searches will take too long.

You should not split your tables like that. You will want an index on the USER_ID column in your data log table. Searches do become slower as data size increases, but your strategy will not necessarily mitigate that. It will however make your application more complex to write, harder to debug, and quite likely actually slow it down.
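Here is a minimal sketch of the single-table approach with an index on the user column. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name alldatalog follows the question. The column names ts and data and the index name idx_alldatalog_user are assumptions. The query plan printed at the end should show an index SEARCH rather than a full table SCAN.

```python
import sqlite3

# SQLite stands in for MySQL; the schema and indexing idea carry over.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY
);
-- One shared log table instead of one table per user.
CREATE TABLE alldatalog (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    ts      TEXT    NOT NULL,
    data    TEXT
);
-- Composite index: find a user's rows quickly, already sorted by ts.
CREATE INDEX idx_alldatalog_user ON alldatalog (user_id, ts);
""")

conn.executemany("INSERT INTO users (user_id) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO alldatalog (user_id, ts, data) VALUES (?, ?, ?)",
    [(1, "2015-01-01T10:00", "a"),
     (2, "2015-01-01T10:01", "b"),
     (1, "2015-01-01T10:02", "c")],
)

query = "SELECT ts, data FROM alldatalog WHERE user_id = ? ORDER BY ts"
rows = conn.execute(query, (1,)).fetchall()
print(rows)  # [('2015-01-01T10:00', 'a'), ('2015-01-01T10:02', 'c')]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + query, (1,)))
print(plan)  # mentions idx_alldatalog_user
```

With a B-tree index like this, finding one user's rows does not require reading the rest of the table. That is why a single large table usually stays fast as it grows.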
You should also consider unpacking that data blob into additional columns and tables as appropriate in order to take advantage of the relational nature of the database.
How many rows do you expect the table to hold over time? Thousands? Millions? Billions? At what rate do you expect rows to be added?

Suggestion 1: As answered by Jonathan, do not split the tables. It won't help you increase the speed.
Suggestion 2: If you still want to split the tables, put the logic in your PHP code. Check whether the table for a particular user already exists. If it does, insert the values into it. If it doesn't, create it first. It should be quite straightforward. If you want me to share code for this, let me know.
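For completeness, here is a minimal sketch of the "one table per user, created on demand" logic that Suggestion 2 describes. Both answers advise against this design. The sketch is in Python with sqlite3 rather than PHP. The helper insert_for_user and the table-name pattern datalog_user_<id> are hypothetical. CREATE TABLE IF NOT EXISTS is available in MySQL as well.

```python
import sqlite3

def insert_for_user(conn, user_id, ts, data):
    """Hypothetical helper: insert into a per-user table, creating it on first use."""
    # Table names cannot be bound as query parameters, so the id is coerced to
    # a non-negative int before it is pasted into the SQL (guards against injection).
    uid = int(user_id)
    if uid < 0:
        raise ValueError("user_id must be non-negative")
    table = f"datalog_user_{uid}"
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, data TEXT)"
    )
    conn.execute(f"INSERT INTO {table} (ts, data) VALUES (?, ?)", (ts, data))
    return table

conn = sqlite3.connect(":memory:")
insert_for_user(conn, 7, "2015-01-01T10:00", "a")
insert_for_user(conn, 7, "2015-01-01T10:05", "b")
insert_for_user(conn, 8, "2015-01-01T10:06", "c")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['datalog_user_7', 'datalog_user_8']
```

Every query that spans users now has to discover and UNION these tables. This is the extra complexity the first answer warns about.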

Related

Should I use another column to show whether LONGTEXT contains data? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I have a (probably very basic) question about how to tell whether a field in the database has been populated.
I'm working with Laravel and MySQL as the DB engine.
The situation is as follows:
I have a field in the database that stores information (in this case a LONGTEXT field holding a large amount of data). This field is populated automatically by a cron job.
When listing the records of that table, I need to know whether this field contains data or not.
At first I thought of adding another field (column) to the same table that tells me whether the field is empty. I think this would be a correct way to do it, but I could also avoid the extra column by simply checking whether the field itself is empty. However, I'm not sure whether that would be the right approach, or whether it could affect the application's performance (I don't know exactly how MySQL performs this comparison, or whether it could be optimised by using the new field).
I hope I have explained myself correctly.
Schematically, the options are:
Option 1:
Have a single field (very large amount of information).
When obtaining the list with the records, check in the corresponding search if the field in question contains information.
Option 2:
Have two fields: one of them contains the information and the other is a boolean that indicates if the first one contains information.
When obtaining the list of records, look at the boolean.
The aim of the question is to follow good practice, optimise the search, and minimise the performance impact of the process.
Thank you very much in advance.
It takes extra work for MySQL to retrieve the contents of a LONGTEXT or any other BLOB / CLOB column. You'll incur that work even if your query says:
SELECT id FROM tbl WHERE `longtext` IS NOT NULL /* slow! */
or
SELECT id FROM tbl WHERE CHAR_LENGTH(`longtext`) >= 1 /* slow! */
So, yes, you should also use another column to indicate whether the LONGTEXT column is populated if you need to run a lot of queries like that.
You could consider using a generated column like this one for the purpose:
textlen BIGINT GENERATED ALWAYS AS (CHAR_LENGTH(`longtext`)) STORED
The generated column will get its value at the time you INSERT or UPDATE the row. Then WHERE textlen >= 1 will be fast. You can even put an index on it.
Go for the length rather than the Boolean value. It doesn't take significantly more space, it gives you slightly more information, and it gives you a nice way to sanity-check your logic during testing.
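Here is a minimal sketch of the generated-length-column idea. It uses Python's sqlite3 as a stand-in for MySQL and needs SQLite 3.31 or later, which added generated columns. SQLite's length() plays the role of CHAR_LENGTH(). The column name body stands in for the LONGTEXT column. The table and index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl (
    id      INTEGER PRIMARY KEY,
    body    TEXT,  -- stands in for the LONGTEXT column
    -- Computed on INSERT/UPDATE; NULL when body is NULL, 0 when body is ''.
    textlen INTEGER GENERATED ALWAYS AS (length(body)) STORED
);
-- Index the small column so "is it populated?" never touches the big one.
CREATE INDEX idx_tbl_textlen ON tbl (textlen);
""")
conn.executemany("INSERT INTO tbl (id, body) VALUES (?, ?)",
                 [(1, "x" * 5000), (2, None), (3, ""), (4, "hello")])

populated = [r[0] for r in conn.execute(
    "SELECT id FROM tbl WHERE textlen >= 1 ORDER BY id")]
print(populated)  # [1, 4]
```

Both a NULL value and an empty string fall out of `textlen >= 1`. This is one way the length carries slightly more information than a Boolean flag would.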

Database Design Pattern for Multiple Large Lists [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
Consider a trip itinerary. There are 20 possible stops on a tour. A standard tour involves stops 1 through 20 in order. However, each user may create their own tour consisting of 5 or more stops in any order with possibility for repeats. What is the most efficient way to model this in a database?
If we use a join table
user_id, stop_id, order
we would have millions of records very quickly but we could easily pull the stop & user attributes on queries.
If we stored the stops as an array,
user_id, stop_id_array_in_order
we have a much smaller, non-normalized table and we cannot easily access the stop attributes.
Are there other options that allow for accessing of parent attributes while minimizing table size?
I would define the entities and create tables for them with the relations between them in separate tables as you described in the first example:
users table
tours table
stops table
tours_users table (a User can go to a Tour more than once)
stops_order table: stop_id, order, tours_users_id
To check a given user's tours, query the tours_users table. If you also need the stops, you can easily join tours_users with stops_order on tours_users_id.
If the tables are indexed correctly, there should be no problem with performance, and you will be using the relational database engine as it is meant to be used.
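Here is a minimal sketch of that schema and the join, using Python's sqlite3 as a stand-in for MySQL. The table names come from the answer. The position column is called stop_order here because ORDER is a reserved word. The sample data, including the ids and names, is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tours (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stops (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- A user can take the same tour more than once, so each run gets its own id.
CREATE TABLE tours_users (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    tour_id INTEGER NOT NULL REFERENCES tours(id)
);
-- One row per stop visit; repeats are allowed, position is explicit.
CREATE TABLE stops_order (
    tours_users_id INTEGER NOT NULL REFERENCES tours_users(id),
    stop_order     INTEGER NOT NULL,
    stop_id        INTEGER NOT NULL REFERENCES stops(id),
    PRIMARY KEY (tours_users_id, stop_order)  -- also serves the join below
);
""")
conn.executemany("INSERT INTO stops (id, name) VALUES (?, ?)",
                 [(i, f"Stop {i}") for i in range(1, 21)])
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO tours (id, name) VALUES (1, 'custom')")
conn.execute("INSERT INTO tours_users (id, user_id, tour_id) VALUES (10, 1, 1)")
# A custom tour: any order, with stop 3 visited twice.
conn.executemany(
    "INSERT INTO stops_order (tours_users_id, stop_order, stop_id) VALUES (10, ?, ?)",
    list(enumerate([3, 7, 3, 12, 1], start=1)),
)

itinerary = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM tours_users tu
    JOIN stops_order so ON so.tours_users_id = tu.id
    JOIN stops s        ON s.id = so.stop_id
    WHERE tu.user_id = ?
    ORDER BY so.stop_order
""", (1,))]
print(itinerary)  # ['Stop 3', 'Stop 7', 'Stop 3', 'Stop 12', 'Stop 1']
```

Stop attributes stay in one place (stops). Each itinerary row costs only three integers.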
You're thinking that saving some space will help you. It won't. It's also arguable how much space you'd actually save.
You'd also be using an unordered data structure - that's something you don't want. You want an ordered structure (a table) that can relate to other records - and that's exactly why we normalize tables: so we can extrapolate all kinds of data without altering physical location. The other benefit is that ordered structures can be indexed, which reduces the time it takes to find records. The tradeoff is the space spent on the index records.
However, millions, billions - even trillions of rows are ok. Just imagine how difficult it would be querying a structure where an array is saved as a comma separated list in a column (or multiple columns). It would be a nightmare to write a query, and performance would go down linearly as amount of records goes up.
TL;DR: keep it normalized.
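To make the comma-separated-list problem concrete, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. It finds every user whose tour includes stop 3. The table tours_csv and its data are hypothetical. MySQL offers FIND_IN_SET() for this kind of lookup, but as with the LIKE pattern here, an ordinary index cannot serve it, so the query plan reports a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tours_csv (user_id INTEGER PRIMARY KEY, stops TEXT NOT NULL);
INSERT INTO tours_csv VALUES (1, '3,7,3,12,1'), (2, '13,2'), (3, '1,2,3');
""")

# Wrap both sides in commas so that stop 3 does not match stop 13.
csv_query = ("SELECT user_id FROM tours_csv "
             "WHERE ',' || stops || ',' LIKE '%,' || ? || ',%' ORDER BY user_id")
users_with_stop_3 = [r[0] for r in conn.execute(csv_query, (3,))]
print(users_with_stop_3)  # [1, 3]

# No index can help with a pattern like this, so every row is read.
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + csv_query, (3,)))
print(plan)  # a SCAN of tours_csv
```

In the normalized design, the same question is an indexed lookup on stop_id in the link table.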

Dynamically create tables or not [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have reviewed some Q&As, but thought something specific to my situation would help me get off the fence.
I have an app that calculates pricing based on several different formulas and hundreds of different material types.
user A may use formula A and materials A, B, C
user B uses formula A and materials A, B, C, and also wants to add a material that no one else uses: material unique_A
When user A is on the app, he doesn't want to see user B's unique material.
I was thinking of using a separate table of materials for each user so that it would be "faster? more efficient?" to grab the list of materials, instead of setting up some sort of on/off mechanism that fetches only the materials the user wants from one global table.
Which way is better? One table or a unique table for each user?
You can have a table of all materials.
materials = (id, name, other attributes...)
and a table of users:
myusers = (id, name, etc....)
then you can have a table that basically represents the many to many relationship between these two:
user_materials = (user_id, material_id)
You can then select the specific materials used by a user by joining these tables. Application-wise, this arrangement is better than trying to create a table for each user, which would make queries difficult. This way you also have an answer to the question: which users are using material A?
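Here is a minimal sketch of the three tables and both lookups, using Python's sqlite3 as a stand-in for MySQL. The table names follow the answer. The composite primary key, the extra index on material_id, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE materials (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE myusers   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Many-to-many link: one row per (user, material) pair.
CREATE TABLE user_materials (
    user_id     INTEGER NOT NULL REFERENCES myusers(id),
    material_id INTEGER NOT NULL REFERENCES materials(id),
    PRIMARY KEY (user_id, material_id)
);
-- Secondary index for the reverse lookup (which users use a material?).
CREATE INDEX idx_um_material ON user_materials (material_id);

INSERT INTO materials VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'unique_A');
INSERT INTO myusers   VALUES (1, 'user A'), (2, 'user B');
INSERT INTO user_materials VALUES (1, 1), (1, 2), (1, 3),
                                  (2, 1), (2, 2), (2, 3), (2, 4);
""")

def materials_for(user_id):
    """Materials visible to one user; unique_A only shows up for user B."""
    return [r[0] for r in conn.execute("""
        SELECT m.name FROM user_materials um
        JOIN materials m ON m.id = um.material_id
        WHERE um.user_id = ? ORDER BY m.id""", (user_id,))]

def users_of(material_name):
    """Answers 'which users are using material X?'."""
    return [r[0] for r in conn.execute("""
        SELECT u.name FROM materials m
        JOIN user_materials um ON um.material_id = m.id
        JOIN myusers u ON u.id = um.user_id
        WHERE m.name = ? ORDER BY u.id""", (material_name,))]

print(materials_for(1))  # ['A', 'B', 'C']
print(materials_for(2))  # ['A', 'B', 'C', 'unique_A']
print(users_of('A'))     # ['user A', 'user B']
```

Adding a material for one user is a single INSERT into user_materials. No new table is needed.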
Unless you have very few users, each with their own stable, unchanging items, I don't see any sense in doing this. Plus, you will most likely not run into performance issues if you are talking about a domain of users and materials. It's not like there are millions of either, right?
One "best practice" for databases is to reduce duplication of information. Variations of that principle exist in just about every field of theory.
That means your approach of a unique table per user would not be a good idea.
Not only would it duplicate data, but maintaining such a database would become a gigantic task as the number of users increases.
I would prefer to have a global table of materials, a table of users, and a table recording which user wants which materials.
The 'one-table-approach' can be considered better because it reduces complexity, both in database and in the code which should access the database, and duplication of information.

Modifying MySQL table [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
How do I modify a MySQL table into which hundreds of records are inserted every second, without any downtime, data loss, or errors?
For example: adding a new field.
Thanks
Gut feel says that you should avoid modifying the table. An alternative would be to add your column to a new table and link to the original table to maintain referential integrity, so the original table remains untouched.
Another, fairly typical approach would be to create a new table with the added column, swap it out with the old table and then add the data back to the new table. Not a great solution though.
MySQL gurus will likely disagree with me.
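Well, if there are only inserts, I think the best approach would be:
CREATE TABLE the_table_copy LIKE the_table;  -- copies the structure only; existing rows stay in the old table
ALTER TABLE the_table_copy ADD new_field VARCHAR(60);
-- Swap both names in one statement: MySQL performs a multi-table RENAME atomically,
-- so there is no moment when the_table does not exist.
RENAME TABLE the_table TO the_table_backup, the_table_copy TO the_table;
But first make a copy of the table and test on it to check that this is fast enough. Then do it on the live table! :)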
Well, I would lock the table, add the field, unlock the table. There would then be a little delay on the inserts, but it wouldn't be much of a hassle I guess.
When you say "add a field", do you mean creating new columns, or just updating values in a column? Those are quite different things. If you are merely updating a column, change your data model to support multiple versions of the same row in the same table, and keep appending the new rows at the end of the table. This gives you the best read-write concurrency.
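Here is a minimal sketch of the "multiple versions of the same row" idea. It assumes an append-only table in which an update becomes an INSERT of a new version, and readers take the highest version. Python's sqlite3 stands in for MySQL. The table and column names (readings, item_id, version, value) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical append-only table: an "update" inserts a new version row.
CREATE TABLE readings (
    item_id INTEGER NOT NULL,
    version INTEGER NOT NULL,
    value   TEXT,
    PRIMARY KEY (item_id, version)
);
""")

def write(item_id, value):
    # Concurrent writers could compute the same next version; the PRIMARY KEY
    # would reject the duplicate, so a real system would retry or serialize this.
    (nxt,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM readings WHERE item_id = ?",
        (item_id,)).fetchone()
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (item_id, nxt, value))

def latest(item_id):
    # The newest version wins; older rows remain for history.
    row = conn.execute(
        "SELECT value FROM readings WHERE item_id = ? ORDER BY version DESC LIMIT 1",
        (item_id,)).fetchone()
    return row[0] if row else None

write(1, "old")
write(1, "new")
write(2, "x")
print(latest(1), latest(2))  # new x
```

Writers never modify existing rows, so they don't contend with readers for the same rows. Old versions can be purged later in batches.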
Other than that, you can take a look at the "Handler Socket" method.
http://yoshinorimatsunobu.blogspot.com/search/label/handlersocket
If you only need to add a column, there is no reason for that to require downtime. The table will be locked for the duration of the ALTER TABLE statement (milliseconds when MySQL can apply the change as a metadata-only operation, but longer if the table has to be rebuilt). Any DML submitted during this time will have to wait, so it could briefly impact performance, but it should not cause any exceptions, unless the code somewhere is explicitly checking for locks.

What is the most efficient way to store a sort-order on a group of records in a database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Assume PHP/MySQL, but I don't necessarily need actual code; I'm just interested in the theory behind it.
A good use-case would be Facebook's photo gallery page. You can drag and drop a photo on the page, which fires an Ajax event to save the new sort order. I'm implementing something very similar.
For example, I have a database table "photos" with about a million records:
photos
id : int,
userid : int,
albumid : int,
sortorder : int,
filename : varchar,
title : varchar
Let's say I have an album with 100 photos. I drag/drop a photo into a new location and the Ajax event fires off to save on the server.
Should I be passing the entire array of photo ids back to the server and updating every record? Assume input validation via "WHERE userid=loggedin_id", so malicious users can only mess with the sort order of their own photos.
Should I be passing the photo id, its previous sortorder index and its new sortorder index, retrieve all records between these 2 indices, sort them, then update their orders?
What happens if there are thousands of photos in a single gallery and the sort order is changed?
What about just using an integer column that defines the order? By default you assign multiples of 1000, like 1000, 2000, 3000..., and if you move 3000 between 1000 and 2000, you change it to 1500. So in most cases you don't need to update the other numbers at all. I use this approach and it works well. You could also use a floating-point (DOUBLE) column, but then you have no control over precision and rounding errors, so it's better not to.
So the algorithm would look like this: say you move B to the position after A. First run a SELECT to get the order value of the record next to A. If it is at least 2 higher than the order of A, you just set the order of B to fit in between. But if it's only 1 higher (there is no space after A), you select the records bordering B to see how much space there is on that side, divide it by 2, and then add this value to the order of all the records between A and B. That's it!
(Note that you should use transaction/locking for any algorithm which contains more than a single query, so this applies to this case too. The easiest way is to use InnoDB transaction.)
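Here is a minimal in-memory sketch of the gapped-integer idea, with no database involved. The function move_after and the dict layout are hypothetical. When there is room, it places the moved item at the midpoint of its new neighbours. When there is no room, it renumbers the whole list with fresh gaps of 1000. That fallback is a simpler substitute for the partial shift described in the answer, swapped in for clarity. The returned set is the rows that would need an UPDATE, which in MySQL should run inside one InnoDB transaction together with the SELECT.

```python
GAP = 1000  # initial spacing between sort keys, as suggested in the answer

def move_after(orders, item, after=None):
    """Move `item` so it sorts directly after `after` (None = to the front).

    `orders` maps photo id -> integer sort key and is updated in place.
    Returns the ids whose key changed, i.e. the rows that would need an UPDATE.
    """
    others = sorted((p for p in orders if p != item), key=orders.get)
    pos = 0 if after is None else others.index(after) + 1
    lo = 0 if after is None else orders[after]
    # Upper bound: the key of the new successor, or a virtual slot past the end.
    hi = orders[others[pos]] if pos < len(others) else lo + 2 * GAP
    if hi - lo >= 2:
        # Common case: there is room, so only the moved row changes.
        orders[item] = (lo + hi) // 2
        return {item}
    # No room left: renumber everything in the new order with fresh gaps.
    others.insert(pos, item)
    changed = set()
    for i, p in enumerate(others, start=1):
        if orders[p] != i * GAP:
            orders[p] = i * GAP
            changed.add(p)
    return changed

orders = {"A": 1000, "B": 2000, "C": 3000}
print(move_after(orders, "C", "A"), orders["C"])  # {'C'} 1500
```

Most moves touch a single row. Renumbering only happens after repeated inserts into the same gap have used it up.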
Store it as a linked list: sortorder is a foreign key reference to the next photo_id in the set.
This would probably be a 'linked list' construct.
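Here is a minimal sketch of reading a linked-list ordering back out. It assumes each photo row stores the id of the next photo (None for the last one), as the answer describes for sortorder. The function order_from_links is hypothetical and walks the chain in application code. Moving one photo changes at most three rows: its old predecessor, its new predecessor, and the photo itself.

```python
def order_from_links(next_of):
    """Rebuild display order from a linked list.

    `next_of` maps photo id -> id of the next photo (None for the last one),
    i.e. what the answer stores in the sortorder column.
    """
    referenced = {n for n in next_of.values() if n is not None}
    heads = [p for p in next_of if p not in referenced]
    if len(heads) != 1:
        raise ValueError("list must have exactly one head")
    order, seen, cur = [], set(), heads[0]
    while cur is not None:
        if cur in seen:
            raise ValueError("cycle detected")
        seen.add(cur)
        order.append(cur)
        cur = next_of[cur]
    if len(order) != len(next_of):
        raise ValueError("some photos are not reachable from the head")
    return order

print(order_from_links({1: 3, 3: 2, 2: None}))  # [1, 3, 2]
```

The tradeoff is on the read side. Fetching photos "in order" means following the chain rather than a simple ORDER BY, and a single bad pointer breaks the list. That is why the sketch validates it.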
To me, the second method of updating is the way to go (update only the range that changes). You mention "What happens if there are thousands of photos in a single gallery ...", and to me that is never going to happen. Let's take your Facebook example: Facebook doesn't show thousands of photos on one page; they split them up into about 10-20 per page.
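Here is a minimal sketch of "update only the range that changes", using Python's sqlite3 as a stand-in for MySQL. It assumes the album's sortorder values are dense (1..n) and that the photos table has the question's id, albumid and sortorder columns. The function move_photo is hypothetical. In a real app the WHERE clauses would also carry the userid check from the question.

```python
import sqlite3

def move_photo(conn, album_id, photo_id, new_pos):
    """Move one photo to new_pos, rewriting only the rows in between.

    Assumes dense sortorder values (1, 2, ..., n) and a connection in
    autocommit mode, so the BEGIN/COMMIT below delimit one transaction.
    """
    conn.execute("BEGIN")
    try:
        (old_pos,) = conn.execute(
            "SELECT sortorder FROM photos WHERE id = ? AND albumid = ?",
            (photo_id, album_id)).fetchone()
        if new_pos < old_pos:
            # Moving earlier: rows in [new_pos, old_pos) each get +1.
            conn.execute(
                "UPDATE photos SET sortorder = sortorder + 1 "
                "WHERE albumid = ? AND sortorder >= ? AND sortorder < ?",
                (album_id, new_pos, old_pos))
        elif new_pos > old_pos:
            # Moving later: rows in (old_pos, new_pos] each get -1.
            conn.execute(
                "UPDATE photos SET sortorder = sortorder - 1 "
                "WHERE albumid = ? AND sortorder > ? AND sortorder <= ?",
                (album_id, old_pos, new_pos))
        conn.execute("UPDATE photos SET sortorder = ? WHERE id = ?",
                     (new_pos, photo_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
# No UNIQUE index on (albumid, sortorder): the shifting UPDATEs pass through
# duplicate values mid-statement.
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, albumid INTEGER, sortorder INTEGER)")
conn.executemany("INSERT INTO photos VALUES (?, 1, ?)", [(i, i) for i in range(1, 6)])

def album_order(album_id):
    return [r[0] for r in conn.execute(
        "SELECT id FROM photos WHERE albumid = ? ORDER BY sortorder", (album_id,))]

move_photo(conn, 1, 5, 2)
print(album_order(1))  # [1, 5, 2, 3, 4]
```

The number of rows touched equals the distance moved plus one. For the typical drag within a page of 10-20 photos, that stays small.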
The way I'd do this in a nonrelational database is to store a list of photo IDs on the 'album' entity/record, in the order desired. Reordering the photos results in reordering the list, and only a single database write.
Some SQL databases (e.g., PostgreSQL) have native array data types, but MySQL doesn't. You could serialize the list as a string or binary value in MySQL.
3rd-normal-form trained database gurus will scream at you that this is a terrible approach, but RDBMSes are optimized for OLAP type queries, where query flexibility is more important than read performance. Webapps are best written with a 'write heavy, read light' strategy in mind, and this sort of denormalization is exactly in line with that.
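Here is a minimal sketch of keeping the ordering as one serialized list on the album row, using Python's sqlite3 as a stand-in for MySQL. The answer only says "a string or binary". JSON is one choice made here for readability. The column name photo_order and the helpers reorder and photo_order are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, photo_order TEXT NOT NULL)")
conn.execute("INSERT INTO albums VALUES (1, ?)", (json.dumps([11, 12, 13, 14]),))

def reorder(album_id, photo_ids):
    # A drag-and-drop reorder is a single write of the whole list.
    conn.execute("UPDATE albums SET photo_order = ? WHERE id = ?",
                 (json.dumps(photo_ids), album_id))

def photo_order(album_id):
    # One row read returns the full ordering; decode it in the application.
    (raw,) = conn.execute("SELECT photo_order FROM albums WHERE id = ?",
                          (album_id,)).fetchone()
    return json.loads(raw)

reorder(1, [13, 11, 12, 14])
print(photo_order(1))  # [13, 11, 12, 14]
```

The database can no longer check that these ids exist or belong to the album. The application takes over that responsibility, which is the cost the normalization-minded answers object to.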