I am trying to find text within text using MySQL. I have a field of values that is somewhat unstructured, but the data entry fortunately is delimited by new lines. I'm trying to see if I can pull the value for "Education", which would be basically a substring that starts after "Education:" and ends with \n new line character in data below:
'Children: 5
Education: College
Employment: Homemaker
Marital Status: Married'
I've looked at the MID function, but since the values for education vary, the length isn't standard. I have searched the MySQL string functions and have not found one that lets me extract the text between two positions, where one of them is defined by a regex match -- REGEXP simply reports whether there is a match, not its position.
SELECT id,MID(value,POSITION('Education:' IN value),30)
FROM client_data
The code performed as expected, but because it uses a fixed length rather than the position of the \n new line character, the results were either truncated or included extra characters from the subsequent text.
I'm guessing there is a way to do this that I'm just not finding.
You can use REGEXP_SUBSTR (available in MySQL 8.0 and later) to get the actual string that matched the regular expression:
REGEXP_SUBSTR(value, '^Education:.*', 1, 1, 'm')
This gets you the Education line. Then you just need to strip the Education: prefix from that string (wrap the result in TRIM() if you also want to drop the space after the colon):
REGEXP_REPLACE(
REGEXP_SUBSTR(value, '^Education:.*', 1, 1, 'm'),
'^Education:', '')
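To sanity-check the pattern outside the database, the same two steps can be sketched in Python. This is only an illustration: Python's re module is not the regex engine MySQL uses, and the function name extract_field is made up for this example.

```python
import re

def extract_field(text, label):
    """Return the text after 'label:' on its own line, or None if absent."""
    # re.MULTILINE makes ^ and $ match at every line, like the 'm' flag above;
    # '.' does not cross newlines, so the match stops at the end of the line.
    match = re.search(rf"^{re.escape(label)}:(.*)$", text, flags=re.MULTILINE)
    if match is None:
        return None
    # strip() plays the role of TRIM(): it drops the space after the colon.
    return match.group(1).strip()

sample = (
    "Children: 5\n"
    "Education: College\n"
    "Employment: Homemaker\n"
    "Marital Status: Married"
)
print(extract_field(sample, "Education"))
```

Because the match is anchored to a line start and ends at the line break, the varying length of the value no longer matters.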
It happens occasionally that users erroneously enter text with a trailing space in a text column, which is hard to spot visually. This can later cause problems when this text field has to be matched against another where the trailing space is not present. Is it possible in MySQL to enforce that a text string cannot contain a certain character (space in this case)?
Thankful for feedback!
There are a number of ways to achieve what you want:
Check the user input in your application and reject it if it contains a space. If your primary worry is the quality of the user inputs, then this is probably the best way to do this.
You can remove spaces (or just starting / trailing spaces) from the user input either in the application logic or using sql.
If you opt for removing all spaces from the user input in sql, then use the replace() function. If you just want to remove the starting and trailing spaces, then use the trim() function to achieve the desired results.
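If the check lives in the application, it might look like the following Python sketch. The function names are hypothetical, and the cleaning function only mirrors what trim() and replace() would do on the SQL side.

```python
def has_edge_whitespace(value):
    """True if the value starts or ends with whitespace."""
    return value != value.strip()

def clean_input(value, remove_all_spaces=False):
    """Mimic trim() by default, or replace(value, ' ', '') when asked."""
    if remove_all_spaces:
        return value.replace(" ", "")
    # strip(" ") removes only spaces, which is what trim() does by default.
    return value.strip(" ")

user_text = "some value "
if has_edge_whitespace(user_text):
    # Either reject the input here or store the cleaned version instead.
    user_text = clean_input(user_text)
print(repr(user_text))
```

Rejecting the input keeps the user informed; silently cleaning it is more forgiving. Which one fits depends on the application.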
Using a MySQL function, a simple way is based on trim():
select
trim(' try with trim ')
, length(trim(' try with trim '))
, length(' try with trim ')
from dual;
I have a CSV file, which has two columns and 4500 rows. In one column, I have several phrases that are surrounded in quotation marks. I need to delete all the text that comes before and after the quotations marks.
For example:
How would you say "Hello, my Friend" when speaking outside?
should become "Hello, my Friend"
I also have several rows that have the word NULL in the second column. I need these rows deleted in full.
What's the best way of doing something like this? I have been looking at regular expressions, but I'm not sure if they are flexible enough to do what I want to do, or how you would use them on a CSV file (I need the table structure to remain).
EDIT:
1) At the moment I am just using Apple Numbers, but I know that won't do it, so I am open to any suggestions. It must support Kanji characters.
2) I have removed all the NULL rows, so that is no longer needed (I simply added a column of numbers, sorted the table so all the NULLs were together, deleted them and then sorted back by the column of numbers).
Find a text editor that supports regular expression search and replace.
Something like this would match ,NULL in the second column: ^.*,NULL.*$. Replace it with "DELETEMEDELETEME" to mark the line, or with an empty string, or find a way to have it also match '\n' or '\r' so that the line break is caught and the entire line is removed.
Stripping out parts of the quoted string might work like this:
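^(.*,){n}(.*)(\".*\")(.*)(,.*)$ replaced with \1\3\5, where n is the number of columns preceding the one you want to edit. Repeat (.*,) if {n} is not available. Bear in mind that a repeated group usually captures only its last repetition, so for n greater than 1 you may need to wrap it in an extra group and adjust the backreferences. It will depend on the regex flavor of your tool.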
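If a script is an option, a CSV-aware parser avoids having to count commas by hand. The following Python sketch assumes the phrase sits in the first column, the NULL marker in the second, and that quotation marks inside a field are escaped the usual CSV way (doubled). The name clean_rows and the sample data are made up for illustration.

```python
import csv
import io
import re

QUOTED = re.compile(r'"[^"]*"')  # first double-quoted phrase, quotes included

def clean_rows(rows, text_col=0, null_col=1):
    """Yield rows with text_col cut down to its quoted phrase, skipping NULL rows."""
    for row in rows:
        if row[null_col].strip() == "NULL":
            continue  # drop the whole row
        match = QUOTED.search(row[text_col])
        if match:
            row = list(row)
            row[text_col] = match.group(0)
        yield row

# In-memory demo; for a real file use open(path, newline="", encoding="utf-8").
source = io.StringIO(
    '"How would you say ""Hello, my Friend"" when speaking outside?",こんにちは\r\n'
    '"Ignore ""this"" row",NULL\r\n'
)
cleaned = list(clean_rows(csv.reader(source)))
print(cleaned)
```

Reading and writing with an explicit UTF-8 encoding keeps Kanji intact, and writing the rows back with csv.writer preserves the table structure.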
I have a table called media with a column called accounts_used in which the rows appear in the following format
68146, 67342, 60577, 61506, 67194, 67034, 63484, 49113, 61518, 66971, 67511,
67351, 63621, 67725, 63638, 68141, 66114, 67262, 67537, 67537, 61765, 63701,
67087, 62641, 61294, 67063, 67049, 67038, 67170, 67147, 67289, 61264, 67091,
63690, 63505, 63505, 49172, 52313, 67070, 66945, 67234, 62265, 61368, 67870,
67211, 67586, 49240, 67538, 67538, 67809, 67183, 67164, 62712, 67519, 66895,
67693, 60266, 60266, 67593, 67031, 67137, 62570, 60682, 61195, 67569, 67569,
67069, 62082, 67345, 61748, 61553, 52029, 66877, 62630, 67196, 67196, 67196,
67196, 67196, 67196, 66873, 63677, 68174, 67127, 63594, 67107, 60419, 66601,
68156, 67203, 68161, 60233, 66586, 52654, 63570, 66887, 67191, 60877, 52108,
67131, 61784, 67566, 67162, 67073, 67092, 67064, 60133, 66907, 67559, 66846,
60490, 60347, 66558, 48737, 61539, 67236, 68135, 67238 , 63656, 67585, 67512
If the row has a comma at the end I want to remove this, so for example if the row looks like the following
1,2,3,4,5,6,
I want to replace it to just this
1,2,3,4,5,6
Is this possible to do using just a simple query?
It is a bad idea to store lists of ids in rows. But, you are doing it. You can fix this by doing:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 1)
where accounts_used like '%,';
Instead, you should have a MediaAccounts table, with one row per "media" and account pair.
EDIT:
Possibly, the row ends with a ', ' rather than just a comma:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 2)
where accounts_used like '%, ';
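For the MediaAccounts suggestion, the existing strings would first have to be split into one (media, account) pair per row. A rough Python sketch of that step follows. The function name is hypothetical, and dropping repeated ids is an assumption that may not suit your data.

```python
def split_account_ids(media_id, accounts_used):
    """Turn one comma-separated string into (media_id, account_id) pairs."""
    pairs = []
    seen = set()
    for piece in accounts_used.split(","):
        piece = piece.strip()
        if not piece:
            continue  # skips the empty piece left by a trailing comma
        account_id = int(piece)
        if account_id not in seen:  # assumption: keep each account once
            seen.add(account_id)
            pairs.append((media_id, account_id))
    return pairs

print(split_account_ids(7, "68146, 67342, 67342, 67238 , 63656,"))
```

Each resulting pair would become one row in the new table, which makes stray commas and spaces a non-issue.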
We faced a similar string-replacement issue with a large dataset of bibliographic entries, where we also needed to trim extraneous punctuation from a large number of strings stored in the database, which had been imported verbatim from another system. Many of the records in our dataset also contained Unicode characters, so we needed an SQL query that would find the records needing an update and then update them in a way that was Unicode (multibyte character) compatible under MySQL.
In testing with our dataset, I found that searching for the records to update using MySQL's LEFT() and RIGHT() substring functions performed better than a LIKE pattern match. Additionally, MySQL's LENGTH() function returns the number of bytes in a string rather than the number of characters. The distinction matters for string fields that may contain multibyte character sequences, because MySQL's substring functions count characters, not bytes. Using LENGTH() therefore did not work in our case, where many of the strings under test contained multibyte characters. These requirements resulted in an UPDATE query with the format presented below:
UPDATE media
SET accounts_used = LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1)
WHERE RIGHT(accounts_used, 1) = ',';
The query selects records in the media table where the accounts_used column ends with a comma. The WHERE RIGHT(accounts_used, 1) = ',' clause performs the filtering: RIGHT() returns a substring of the specified length starting from the right of the provided string/column. LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1) then performs the trim, dropping the last character from the accounts_used value: LEFT() returns a substring of the specified length starting from the left of the provided string/column.
Here the use of the multibyte-aware CHAR_LENGTH() function, rather than the basic LENGTH() function, was important in our case due to the many records in our dataset that contained multibyte characters. If you are only dealing with ASCII or another single-byte character set, then LENGTH() would work perfectly; in that case CHAR_LENGTH() and LENGTH() return the same count and can be used interchangeably. When dealing with data that could contain multibyte characters, or if in doubt, use CHAR_LENGTH() instead, as it will return an accurate character count in either case.
Please note that the column and field names used in the example query above match those noted in the original question, and should be modified as needed to suit your own dataset needs.
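The byte versus character distinction is easy to see outside the database. The Python sketch below imitates the UPDATE logic on plain strings, assuming UTF-8 encoded text. The function name and sample values are made up for the demonstration.

```python
def trim_trailing_comma(value, count_bytes=False):
    """Drop one trailing comma, measuring length in bytes or in characters."""
    if not value.endswith(","):  # same filter as RIGHT(value, 1) = ','
        return value
    # count_bytes=True imitates LENGTH(); the default imitates CHAR_LENGTH().
    length = len(value.encode("utf-8")) if count_bytes else len(value)
    return value[:length - 1]  # same idea as LEFT(value, length - 1)

ascii_ids = "1,2,3,"
kanji = "日本語,"

print(trim_trailing_comma(ascii_ids))                 # comma removed
print(trim_trailing_comma(kanji))                     # comma removed
print(trim_trailing_comma(kanji, count_bytes=True))   # comma still there
```

With the byte count, the requested prefix is longer than the string itself, so nothing is cut off and the trailing comma survives.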
I have a database table with a primary key called PremiseID.
Its MySQL column definition is CHAR(10).
The data that goes into the column is always 10 digits, which is either a 9-digit number followed by a space, like '113091000 ' or a 9-digit number followed by a letter, like '113091000A'.
I've tried writing one of these values into a table in a test MySQL database table t1. It has three columns
mainid integer
parentid integer
premiseid char(10)
If I insert a row that has the following values: 1,1,'113091000 ' and try to read the row back, the '113091000 ' value is truncated, so it reads '113091000'; that is, the space is removed. If I insert a number like '113091000A', that value is retained.
How can I get the CHAR(10) field retain the space character?
I have a programmatic way around this problem. It would be to take the len('113091000'), realize it's nine characters, and then realize a length of 9 implies there is a space suffix for that number.
To quote from the MySQL reference:
The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.
So there's no way around it. If you're using MySQL 5.0.3 or greater, then using VARCHAR is probably the best way to go (the overhead is only 1 extra byte):
VARCHAR values are not padded when they are stored. Handling of trailing spaces is version-dependent. As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.
If you're using MySQL < 5.0.3, then I think you just have to check returned lengths, or use a character other than a space.
Probably the most portable solution would be to just use CHAR and check the returned length.
Q: How can I get the CHAR(10) field retain the space character?
Actually, that space is retained and stored. It's the retrieval of the value that's removing the spaces. (The removal of the trailing spaces on returned values is a documented "feature".)
One option (as a workaround) is to modify your SQL query to append trailing spaces to the returned value, e.g.
SELECT RPAD(premiseid,10,' ') AS premiseid FROM t1
That will return your value as a character string with a length of 10 characters, padded with spaces if the value is shorter than 10 characters, or truncated to 10 characters if it's longer.
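If the padding has to happen in application code instead, the same behaviour can be sketched in Python. The helper name rpad is chosen here only to mirror the SQL function, and it handles a single-character pad only.

```python
def rpad(value, length, pad=" "):
    """Pad on the right to length, or truncate if value is longer."""
    # ljust() accepts exactly one fill character, so pad must be one character.
    return value.ljust(length, pad)[:length]

print(repr(rpad("113091000", 10)))    # value as read back, space restored
print(repr(rpad("113091000A", 10)))   # already 10 characters, unchanged
```

This is the length check from the question turned into a helper: a 9-character value gets its space suffix back.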
A standard CHAR(10) column will always have trailing spaces to pad out the string to the required length of 10 characters. As such, any deliberately trailing spaces will be blended in and, typically, stripped by your database adapter.
If possible, convert to a VARCHAR(10) column if you want to preserve the trailing spaces. You can do this with the ALTER TABLE statement.
Though Gordon's answer may still be right by itself, later versions than the one mentioned offer a solution.
In your code run SET sql_mode = 'PAD_CHAR_TO_FULL_LENGTH';
With this session setting, CHAR(10) values are returned padded to their full length, whereas a VARCHAR column returns trailing spaces only if they were entered in the first place. If you don't need the spaces, you can always rtrim().