Should I use another column to show whether LONGTEXT contains data? [closed] - mysql

I have a (probably very basic) question about how to know whether a field in the database has been populated with data.
I'm working with Laravel and MySQL as the DB engine.
The situation is as follows:
I have a field in the database that stores information (in this case it is a LONGTEXT field with a large amount of information in it). This field is populated in an automated way (by a CRON job).
When listing the records of that table, I need to know whether the field in question contains information or not.
At first I thought of adding another field (column) to the same table that indicates whether the first field is empty or not. Although I consider that a correct way to do it, I also think I could save that column by simply checking whether the field in question is empty. However, I'm not sure this is the right way to do it, or whether it could affect the application's performance (I don't know exactly how MySQL performs this comparison, or whether it could be optimised by making use of the new field).
I hope I have explained myself clearly.
Schematically, the options are:
Option 1:
Have a single field (containing a very large amount of information).
When obtaining the list of records, check in the query whether the field in question contains information.
Option 2:
Have two fields: one of them contains the information and the other is a boolean that indicates if the first one contains information.
When obtaining the list of records, look at the boolean.
The aim of the question is to follow good practice, to optimise the search, and to minimise the impact on the performance of the process.
Thank you very much in advance.

It takes extra work for MySQL to retrieve the contents of a LONGTEXT or any other BLOB / CLOB column. You'll incur that work even if your query only says:
SELECT id FROM tbl WHERE longtext IS NOT NULL /* slow! */
or
SELECT id FROM tbl WHERE CHAR_LENGTH(longtext) >= 1 /* slow! */
So, yes, you should also use another column to indicate whether the LONGTEXT column is populated if you need to run a lot of queries like that.
You could consider using a generated column like this for the purpose:
textlen BIGINT GENERATED ALWAYS AS (CHAR_LENGTH(longtext)) STORED
The generated column will get its value at the time you INSERT or UPDATE the row. Then WHERE textlen >= 1 will be fast. You can even put an index on it.
Go for the length rather than the Boolean value. It doesn't take significantly more space, it gives you slightly more information, and it gives you a nice way to sanity-check your logic during testing.
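For concreteness, here is a minimal sketch of how that generated column and an index on it might be added; the table and column names (tbl, longtext, id) simply follow the examples above and are assumptions, not your real schema:
ALTER TABLE tbl
  ADD COLUMN textlen BIGINT
    GENERATED ALWAYS AS (CHAR_LENGTH(longtext)) STORED,
  ADD INDEX idx_textlen (textlen);

-- existence check that no longer has to read the LONGTEXT contents
SELECT id FROM tbl WHERE textlen >= 1;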

Related

What is a good way to save multiple file upload paths to a database? [closed]

When I save multiple files, I convert their paths to a JSON string and save them in a file column of VARCHAR type.
Example:
title | content                | file
game  | darksoul, M&B, skyrim  | ["/upload/test1.jpg","/upload/test2.jpg","/upload/test3.jpg"]
book  | bookbookbookbookbook   | ["/upload/test5.jpg","/upload/test10.jpg"]
And when loading multiple files, I convert the string back to JSON and display it on screen.
I don't think this method is very good.
So I am going to change the file column from VARCHAR to the JSON type, so that the value doesn't have to be converted when loading.
However, I think there might be a better way to save/load multiple files in the DB.
If you know one, please tell me!
Thank you.
Please let me know if you have any further questions or if my explanation seems strange.
I'll fix it right away.
You could indeed use the JSON datatype, and that would work okay.
MySQL is a relational database though, which means that data is placed in tables with the same properties and then linked to rows in other tables via identifiers (typically primary keys).
So what you could do is make another table called fileuploads with the following columns (a sketch follows the list):
id (primary key, autoincrement)
path (the path to the uploaded file eg "/upload/test1.jpg")
foreign_id (this must contain the id to the table you have shown in the question)
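A minimal sketch of that table, assuming the existing table shown in the question is called posts and has an integer primary key id (both names are assumptions):
CREATE TABLE fileuploads (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- primary key, autoincrement
  path       VARCHAR(255) NOT NULL,                 -- e.g. "/upload/test1.jpg"
  foreign_id INT UNSIGNED NOT NULL,                 -- id of the row the file belongs to
  PRIMARY KEY (id),
  KEY idx_foreign (foreign_id),
  CONSTRAINT fk_fileuploads_post FOREIGN KEY (foreign_id) REFERENCES posts (id)
);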
The advantage of doing this is that you have a strict data structure. If you want to add extra information on the different file uploads you will be forced to ensure that the information is added to all the existing rows.
If you were to add extra information in the JSON, that is also possible, but you would have a loose data structure which could cause bugs in your application code (you are not guaranteed that the "columns" exist).
You furthermore have the option to make use of the standard MySQL features for tables such as indexes, partitions etc.
If I were you I would change the data type to JSON for now, and then if demands increase in the future I would refactor to a separate table right away.
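If you take the JSON route for now, here is a sketch of the column change and a read; the table name posts is an assumption, and the JSON type requires MySQL 5.7.8 or later:
ALTER TABLE posts MODIFY file JSON;

-- pull the first stored path out of the JSON array
SELECT JSON_UNQUOTE(JSON_EXTRACT(file, '$[0]')) AS first_path FROM posts;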

Storing Sub-Questions in a database for a Survey/Questionnaire System

I am currently in the process of creating my tables for a survey/questionnaire system. As I got to the questions table, I think I came across a slight issue that could impact the whole application if I continue. Within my questions table I have a column called "subBelongsToQuestion", an integer value identifying which parent question (if any) a sub-question belongs to. Then in my answers table I have a column called "responseRevealSubQuestion", an integer value identifying which sub-question to reveal when the trigger answer stored in that column matches the "response" column value.
So for example, if a user answered yes to a question such as "Do you like cheese?", then a sub-question would appear saying "What do you like about cheese?".
I want to turn this idea into a database design, and I wasn't sure whether I should continue with the approach I am using or change it. The goal is that if a user deletes a question that contains sub-questions, the application can run the required code to also delete the sub-questions and their trigger answers.
Usually for survey apps you don't use sub-questions; you define flow conditions.
Imagine you have these questions in your DB:
Q_ID | Question
1    | Do you like cheese?
2    | What do you like about cheese?
3    | Do you like meat?
4    | What do you like about meat?
5    | ...
Then you have a flow table to check after each answer:
Q_FROM | Q_VALUE | Q_TO
1      | NO      | 3
3      | NO      | 5
In this case you only take a detour for a NO answer; otherwise you continue with the next question.
After you finish each question you run:
SELECT Q_to
FROM FlowTable
WHERE Q_from = #CurrentQuestion
AND Q_value = #CurrentAnswer
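A minimal sketch of that flow table, assuming the questions table already exists and Q_ID is its primary key (names follow the example above):
CREATE TABLE FlowTable (
  Q_FROM  INT NOT NULL,           -- question that was just answered
  Q_VALUE VARCHAR(50) NOT NULL,   -- answer value that triggers the detour
  Q_TO    INT NOT NULL,           -- question to jump to
  PRIMARY KEY (Q_FROM, Q_VALUE)
);

-- the two flow rules from the example above
INSERT INTO FlowTable (Q_FROM, Q_VALUE, Q_TO) VALUES (1, 'NO', 3), (3, 'NO', 5);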

Updating the dependent table based on the primary table

I am trying to build tables for a multiple-choice question system where each question has an unbounded number of choices to select from (not a fixed number of choices). The number of choices varies from question to question. I am trying to build a database which stores the questions as well as the choices.
Table Question {   // Though just two fields are shown, there are many fields in the table actually
    questionId;
    question;
}

Table Choices {
    choiceId;
    questionId;
    choice;
}
One could argue that we could store the choices directly in the Question table by adding a field, but this duplicates the other field data. For example, if we have 10 choices for a single question, we would have 10 rows in the Question table with a lot of duplication. So I have separated the tables into Question and Choices.
The main problem is that we do not know what the questionId is until the question is actually created, so we cannot use the questionId from the Question table while entering data into the Choices table. Any suggestion on how to do this?
Your structure would be able to handle the requirement you are looking for. In the Choices table you can use a primary key combining questionId and choiceId, so that you can use choiceIds starting from 1 for each question rather than trying to find out at which ID the choices start for each question.
As for your problem of not knowing which questionId was generated, assuming your questionId is an auto_increment column, you can use the last_insert_id function of whatever programming language you are using (or MySQL's LAST_INSERT_ID()) to find out which questionId was generated by the last insert. Since you will have multiple entries for the choices, it would be hard to do this in a single SQL INSERT command.
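A minimal sketch of that flow in plain MySQL, trimmed to the fields shown in the question:
INSERT INTO Question (question) VALUES ('Do you like cheese?');

-- LAST_INSERT_ID() returns the auto_increment value generated by this connection's
-- last insert, so it is safe even with concurrent inserts from other connections
SET @qid = LAST_INSERT_ID();

INSERT INTO Choices (choiceId, questionId, choice) VALUES
  (1, @qid, 'Yes'),
  (2, @qid, 'No');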
If you are using Entity Framework...
You should save the Question (even if the "question" field is empty) and get its ID...
If the user cancels everything, just remove that question by ID...

Subtable type structure in MySQL [closed]

I would like to know if it is possible to have the following database behaviour:
Create a USER table, with primary key USER_ID
Data comes in from an external source, e.g. "USER_ID, TIMESTAMP, DATA"
For each user in the USER table, create a table to store only the data entries pertinent to that USER_ID, and store all incoming data with the correct USER_ID in that table
When querying all the data entries for a specific USER_ID, just return all rows from that table.
I could of course do this all in one table "ALLDATALOG" and then search for all entries in ALLDATALOG that contain USER_ID, but my concern is that as the ALLDATALOG table grows, searches will take too long.
You should not split your tables like that. You will want an index on the USER_ID column in your data log table. Searches do become slower as data size increases, but your strategy will not necessarily mitigate that. It will however make your application more complex to write, harder to debug, and quite likely actually slow it down.
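A minimal sketch of the single-table approach with that index; the column names and types are assumptions based on the "USER_ID, TIMESTAMP, DATA" format mentioned in the question:
CREATE TABLE ALLDATALOG (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  ts      DATETIME NOT NULL,
  data    TEXT,
  PRIMARY KEY (id),
  KEY idx_user_ts (user_id, ts)   -- keeps per-user lookups fast as the table grows
);

-- all entries for one user, oldest first
SELECT * FROM ALLDATALOG WHERE user_id = 42 ORDER BY ts;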
You should also consider unpacking that data blob into additional columns and tables as appropriate in order to take advantage of the relational nature of the database.
How many rows do you expect the table to hold over time? Thousands? Millions? Billions? At what rate do you expect rows to be added?
Suggestion 1: As answered by Jonathan, do not split the tables. It won't help you increase the speed.
Suggestion 2: If you still want to split the tables, put the logic in your PHP code: check whether the table for a particular user already exists; if it does, insert the values into it, and if it doesn't, create a new one. It should be quite straightforward. If you want me to share code for this, let me know.
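If you did go the per-user-table route anyway, the existence check can be pushed into SQL itself; a sketch, with the table-name pattern datalog_<USER_ID> purely as an assumption:
-- created only if it does not already exist for this user
CREATE TABLE IF NOT EXISTS datalog_42 (
  ts   DATETIME NOT NULL,
  data TEXT
);

INSERT INTO datalog_42 (ts, data) VALUES (NOW(), 'example payload');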

What is the most efficient way to store a sort-order on a group of records in a database? [closed]

Assume PHP/MYSQL but I don't necessarily need actual code, I'm just interested in the theory behind it.
A good use-case would be Facebook's photo gallery page. You can drag and drop a photo on the page, which fires an Ajax event to save the new sort order. I'm implementing something very similar.
For example, I have a database table "photos" with about a million records:
photos
id : int,
userid : int,
albumid : int,
sortorder : int,
filename : varchar,
title : varchar
Let's say I have an album with 100 photos. I drag/drop a photo into a new location and the Ajax event fires off to save on the server.
Should I be passing the entire array of photo IDs back to the server and updating every record? Assume input validation by "WHERE userid=loggedin_id", so malicious users can only mess with the sort order of their own photos.
Should I be passing the photo id, its previous sortorder index and its new sortorder index, retrieve all records between these 2 indices, sort them, then update their orders?
What happens if there are thousands of photos in a single gallery and the sort order is changed?
What about just using an integer column which defines the order? By default you assign numbers in multiples of 1000, like 1000, 2000, 3000..., and if you move 3000 between 1000 and 2000 you change it to 1500. So in most cases you don't need to update the other numbers at all. I use this approach and it works well. You could also use a double, but then you don't have control over precision and rounding errors, so I'd rather not use it.
So the algorithm would look like this: say you move B to the position after A. First perform a SELECT to see the order of the record next to A. If it is at least +2 higher than the order of A, you just set the order of B to fit in between. But if it is only +1 higher (there is no space after A), you select the records bordering B to see how much space is on that side, divide it by 2, and add this value to the order of all the records between A and B. That's it!
(Note that you should use transactions/locking for any algorithm which contains more than a single query, so this applies to this case too. The easiest way is to use InnoDB transactions.)
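A rough sketch of the common (gap available) case of that algorithm, using the photos table from the question; the ids and the album id are made up:
-- A currently has sortorder 2000; find the order of the record right after A
SELECT MIN(sortorder) FROM photos
 WHERE albumid = 1 AND sortorder > 2000;          -- say this returns 4000

-- the gap is at least 2, so B simply gets the midpoint; only one row is updated
UPDATE photos SET sortorder = 3000 WHERE id = 7;  -- 7 = id of photo B

-- if the gap had been only 1, you would renumber the rows between A and B
-- inside a transaction before placing B, as described above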
Store it as a linked list: sortorder is a foreign key reference to the next photo_id in the set.
This would probably be a 'linked list' construct.
To me the second method of updating is the way to go (update only the range that changes). You mention "What happens if there are thousands of photos in a single gallery ...", and to me that is never going to happen. Let's take your Facebook example: Facebook doesn't show thousands of photos on one page; they split it up into about 10-20 per page.
The way I'd do this in a nonrelational database is to store a list of photo IDs on the 'album' entity/record, in the order desired. Reordering the photos results in reordering the list, and only a single database write.
Some SQL databases (e.g., PostgreSQL) have native list datatypes, but MySQL doesn't. You could serialize the list as a string or binary on MySQL.
3rd-normal-form trained database gurus will scream at you that this is a terrible approach, but RDBMSes are optimized for OLAP type queries, where query flexibility is more important than read performance. Webapps are best written with a 'write heavy, read light' strategy in mind, and this sort of denormalization is exactly in line with that.