My server is using a MySQL DB, connecting to it via the C++ connector. I'm nearing production and I've been spending some time trying to break things as part of hardening the server.
One action item I had was to see what would happen if I execute a statement with a string that is longer than the column's VARCHAR length. For example, if I have a column defined as VARCHAR(4) and then try to set it to the string "hello".
This of course throws an exception with the error code 1406 (Data too long for column).
What I was wondering was if there was a good or standard way to defend against this? Obviously one thing is to check the string length and truncate manually. I can do this, but there are many tables and several columns using VARCHAR, so my worry is having to update server code whenever one of those columns has its length increased (i.e., code maintainability).
Note that the server does do some validation up front. I'm just trying to defend against a subtle bug or corner case that lets something slip through.
A couple of other options on the table are to disable strict mode, so it will give a warning and truncate, or to convert the VARCHAR columns to TEXT.
I was wondering a few things.
Is there a recommended method to handle this situation?
What are the disadvantages of disabling strict?
Is it worth it (and is it possible) to query the DB at runtime for the VARCHAR lengths? Note that I'm using the C++ connector. I suppose I could also write a tool that runs before compiling and extracts the VARCHAR lengths from the SQL code used to generate the tables. But that then makes me wonder if I'm over-engineering this.
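For context, the runtime query I have in mind would go against information_schema; a minimal sketch (the schema and table names are placeholders):

-- Fetch the declared maximum length of every VARCHAR column in one table.
SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH
FROM   information_schema.COLUMNS
WHERE  TABLE_SCHEMA = 'my_database'
  AND  TABLE_NAME   = 'my_table'
  AND  DATA_TYPE    = 'varchar';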
I'm just sorting through the possible approaches now and thought I'd seek advice from those with more experience with MySQL.
As an experienced database engineer I would recommend a combination of the following two strategies:
1) If you know that there is a chance, however small, that the data for your VARCHAR(4) could go higher than 4, then make the VARCHAR field larger than 4. For example, if you expect that the field can go as high as 8 characters, then set the field to VARCHAR(10). The beauty of using a VARCHAR field instead of a CHAR is that a VARCHAR only uses whatever storage it needs.
2) If there is a real issue with data constantly being larger than the VARCHAR field length, then you should write your own exception handler to trap the 1406 error. For the handler to work properly you will need to come up with a strategy for exactly how you want to handle the exception. For example, you could send an error to the user and ask them to fix the problem, you could accept the data but truncate it so it fits into the field, or you could send the error to a log file to be fixed at a later time.
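If you would rather keep that policy next to the data instead of in the C++ code, one option is a stored procedure with a handler for error 1406. This is only a sketch, with assumed table and column names (items.name as a VARCHAR(4)), that falls back to a truncated insert:

DELIMITER //
CREATE PROCEDURE insert_item(IN p_name VARCHAR(255))
BEGIN
    -- If p_name is too long for items.name, error 1406 fires and the handler
    -- retries with the value truncated to the assumed column length of 4.
    DECLARE EXIT HANDLER FOR 1406
        INSERT INTO items (name) VALUES (LEFT(p_name, 4));

    INSERT INTO items (name) VALUES (p_name);
END //
DELIMITER ;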
Related
I am using EF to update a field in my MySQL DB and ran across the issue of attempting to save data that is not allowed by the column's character set. For example, ^âÂêÊîÎôÔûÛŵŷ contains characters outside the column's latin1 character set.
Running an update/insert with the above example I get the exception:
The database update did not take place due to..Incorrect string value
I know what the problem is, but I don't want to keep the characters. The data being provided usually comes via the UI, which would often control what is passed in; however, it is also callable via an API, which allows callers to send whatever data they like. In the above case, I would like to drop those characters or just replace them with a question mark, basically ignore them.
This system already exists in an older language and the rule to (silently..) ignore them exists there. I need the error not to be raised and for it to save what it can. I have seen how I can modify the statements for this, or how I can modify the string data coming in, but I have thousands of these. Is there another method to achieve this?
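To illustrate the substitution behaviour I'm after: MySQL's CONVERT(... USING ...) seems to do exactly this, replacing characters that have no latin1 mapping with a question mark (if I understand it correctly):

SELECT CONVERT('^âÂêÊîÎôÔûÛŵŷ' USING latin1);
-- the unmappable characters come back as '?'

But wrapping every statement in CONVERT is exactly the kind of per-statement change I'm hoping to avoid.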
I'm trying to do a Sqoop export from HDFS to MySQL and getting a mapper error because of the different date formats between the input file and MySQL. The input file has dates in mm/dd/yyyy format, whereas the target column is a DATE; I guess MySQL expects yyyy-mm-dd.
Because of this I'm getting the error:
caused by: java.lang.RuntimeException: Can't parse input data: '2/18/2019'
My limitation is that the source comes from a different provider and we cannot request that they change it. So in this situation, what options do I have? Any suggestions?
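For what it's worth, I know MySQL itself can parse the provider's format; one workaround I'm considering (purely my assumption, with made-up table and column names) is exporting into a staging table that has a VARCHAR date column and converting afterwards:

-- %c and %e accept single-digit month/day, so '2/18/2019' becomes 2019-02-18.
INSERT INTO target_table (event_date, other_col)
SELECT STR_TO_DATE(event_date_raw, '%c/%e/%Y'), other_col
FROM   staging_table;

Whether that sidesteps the mapper error I haven't verified.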
edit
Unfortunately this answer may not be for you. If you are using a program that you don't have control over the source for, this won't help you.
I'll leave it up only because it is a common question that I see with people new to rdbms programming.
Original answer
Why are you treating dates and times as strings? For that matter, why are you building SQL for each row? On the MySQL side there is a better way to handle that.
Most RDBMSs support the concept of a prepared statement, although the implementation differs by vendor. Java has support through JDBC for every major vendor's flavor of prepared statement, so you don't need to worry about the implementation details.
Every time you execute SQL, the database engine goes through several phases before the data is applied or returned. The first and most time-consuming phase, called the "prepare" phase, is to analyze the SQL string and compute the ideal access path for completing it. 50 to 80 percent of the SQL "execution" time is spent in this prepare phase.
A simple optimization is to recognize that the ideal access path in a mature database rarely varies, which allows the programmer to prepare the statement once, get back a handle to the access path, and then pass only the handle and its parameters across the wire from the application to the database. This minimizes the overhead of access path computation, data type conversions, and network communication, while automatically protecting against SQL injection attacks and taking care of such administrivia as date formatting.
In Java, this is represented with the PreparedStatement class.
Always use prepared statements. Used properly, they will eliminate 50 to 80% of the overhead of each database call. They also let you work with native Java types directly, simply passing the values into the execution through the PreparedStatement.
Using PreparedStatement also eliminates much of the need to sanitize inputs. By its nature, you don't need to worry about special characters, apart from those the target will reject (for example, dropping a character with a code point greater than 127 into a database that was built for ASCII only, on a platform that enforces the character set).
If you need to take input as a String and convert it to a Date, use Java's DateFormat class.
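For anyone who wants to see the mechanism the JDBC driver is wrapping, MySQL exposes the same prepare/execute split directly in SQL. A minimal sketch with made-up table and column names:

-- Prepared once: the server parses the statement and picks an access path a single time.
PREPARE ins FROM 'INSERT INTO events (event_date, note) VALUES (?, ?)';

-- Executed many times: only the parameter values travel with each call,
-- and the date is bound as a value rather than spliced into the SQL text.
SET @d = '2019-02-18', @n = 'first row';
EXECUTE ins USING @d, @n;

SET @d = '2019-03-01', @n = 'second row';
EXECUTE ins USING @d, @n;

DEALLOCATE PREPARE ins;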
I need your help with the DataStage 11.7 tool. I am reading an AES-encrypted column from my source, and the column's type is nvarchar. When we start our job and read data from the source, the job runs successfully and exactly the same data is moved to my target database, into a column of the same type.
The problem occurs when I query the data to check whether my source and target values are the same: the query does not return any result. Visually the source and target values look identical, but the SQL statement returns nothing. The target database is Vertica.
The column values are alphanumeric mixed with special characters, like �D�&7��x��d$�Q
I'm not at all sure this is even properly possible via DataStage - treating encrypted data as a varchar. Some DBs have internal keys that go with the data and require decrypting before extracting. I'm assuming that decrypting, transporting, landing and then re-encrypting is not an option.
But if I had to take a stab in the dark.
The very first thing I'd check is that the character set and collation is the same on both databases on a table level. A difference can result in blank results on the target side.
Also check that the NLS map in DataStage (the map for stages and the collation locale) is set accordingly. What that setting should be, I don't know, but making it the same in DataStage and in both DBs would be ideal; Google it. You need to comment on what is already set in the DBs, and run tests. I'm not sure the DataStage default of ISO-8859-1 will work.
Please post your solution if you find one.
If I have a table with some varchar columns, whose lengths will obviously be limited, then I have to show an error on the front end whenever insertion of a too-long value fails. For example, if the limit on the name column is 20 but someone enters a name that is 30 characters long, I should notify them of the error.
This gets to be a lot of work when the application becomes big.
What I would like, to make life a bit easier and skip handling individual limits at every step of the user's journey, is to just carry on with the normal functioning of the application, but show a warning that their data was not saved in its entirety because it was too long. So if MySQL provides some method that lets me ask whether all data was saved in full, or whether some strings were truncated because their respective varchar fields were too short (or maybe a property on the MySQLi object that I can check), then my main data-saving method could always check that after any inserts or updates have executed, and just issue a warning on the next page load.
Does MySQL provide such functionality?
Sure you can. MySQL throws a warning when data is truncated (note that this applies when strict mode is disabled; with strict mode on, the statement fails with an error instead).
You can check whether any warning occurred by checking @@warning_count:
SELECT @@warning_count;
Or
SHOW COUNT(*) WARNINGS;
To check what warning has occurred:
SHOW WARNINGS [LIMIT [offset,] row_count]
More info:
http://dev.mysql.com/doc/refman/5.0/en/show-warnings.html
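A quick worked example (the table is made up, and strict mode is turned off for the session so truncation is downgraded from an error to a warning):

SET SESSION sql_mode = '';              -- non-strict: oversized values are truncated with a warning
CREATE TABLE t (name VARCHAR(4));
INSERT INTO t (name) VALUES ('hello');  -- 5 characters into VARCHAR(4)

SELECT @@warning_count;                 -- 1
SHOW WARNINGS;                          -- Warning 1265: Data truncated for column 'name' at row 1
SELECT name FROM t;                     -- 'hell'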
I'm working on implementing and designing my first database and have a lot of columns with names and addresses and the like.
It seems logical to place a CHECK constraint on these columns so that the DB only accepts values from an alphanumeric range (disallowing any special characters).
I am using MySQL which, as far as I can tell, doesn't support user-defined types. Is there an easy way to do this?
It seems worthwhile to prevent bad data from entering the DB, but should this complex checking be offloaded to the application instead?
You can't do it with a CHECK constraint if you're using mysql (the question is tagged with mysql, so I presume this is the case) - mysql doesn't support check constraints. They are allowed in the syntax (to be compatible with DDL from other databases), but are otherwise ignored.
You could add a trigger to the table that fires on insert and update and checks the data for compliance, but if you find a problem there's no way to raise an exception from a MySQL stored proc.
I have used a workaround of hitting a table that doesn't exist, but has a name that conveys the meaning you want, e.g.
update invalid_characters set col1 = 1;
and hope that the person reading the "table invalid_characters does not exist" message gets the idea.
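Fleshed out, the workaround looks something like this (table and column names are made up; the only thing that matters is that invalid_characters genuinely does not exist):

DELIMITER //
CREATE TRIGGER people_bi BEFORE INSERT ON people
FOR EACH ROW
BEGIN
    -- Allow only letters, digits and spaces.
    IF NEW.name REGEXP '[^A-Za-z0-9 ]' THEN
        -- Deliberately reference a missing table; the resulting
        -- "table invalid_characters does not exist" error aborts the insert.
        UPDATE invalid_characters SET col1 = 1;
    END IF;
END //
DELIMITER ;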
There are several settings that allow you to change how MySQL handles certain situations, but those aren't enough for your case.
I would stick with data validation on the application side, but if you need validation on the database side, you have two options:
CREATE PROCEDURE that validates and inserts the data, and either does nothing or raises an error by calling SIGNAL (a minimal sketch follows this list)
CREATE TRIGGER ... BEFORE INSERT, which validates the data and stops the insert, as suggested in this Stack Overflow answer
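A minimal sketch of the first option, a procedure that validates and then inserts, with made-up table and column names:

DELIMITER //
CREATE PROCEDURE add_person(IN p_name VARCHAR(100))
BEGIN
    -- Reject anything outside letters, digits and spaces with a custom error.
    IF p_name REGEXP '[^A-Za-z0-9 ]' THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'name may only contain letters, digits and spaces';
    END IF;

    INSERT INTO people (name) VALUES (p_name);
END //
DELIMITER ;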