Finding text within text delimited by new line character - mysql

I am trying to find text within text using MySQL. I have a field of values that is somewhat unstructured, but the data entry fortunately is delimited by new lines. I'm trying to see if I can pull the value for "Education", which would be basically a substring that starts after "Education:" and ends with \n new line character in data below:
'Children: 5
Education: College
Employment: Homemaker
Marital Status: Married'
I've looked at the MID function, but since the values for education vary, the length isn't standard. I have searched MySQL string functions, and I have not found a solution that will allow me to search between two positions, including one that is defined by a regex character -- the REGEX simply provides a match, not a position.
SELECT id,MID(value,POSITION('Education:' IN value),30)
FROM client_data
The code runs, but because it uses a fixed length rather than the position of the \n new line character, the results are either truncated or include extra characters from the subsequent text.
I'm guessing there is a way to do this that I'm just not finding.

You can use REGEXP_SUBSTR (available in MySQL 8.0 and later) to get the actual string that matched the regular expression:
REGEXP_SUBSTR(value, '^Education:.*', 1, 1, 'm')
This gets you the whole Education line. Then you just need to strip the "Education:" label and the space after it from that string:
REGEXP_REPLACE(
REGEXP_SUBSTR(value, '^Education:.*', 1, 1, 'm'),
'^Education: *', '')
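If it helps to check the pattern outside the database, the same idea can be sketched in Python. This is only an illustration, not part of the MySQL answer: the function name extract_field and the sample string are assumptions, and re.MULTILINE plays the role of the 'm' match type above.

```python
import re

def extract_field(text, label):
    """Return the value on the line that starts with `label:`, or None."""
    # re.MULTILINE makes ^ and $ match at every line boundary,
    # which is what the 'm' match type does in REGEXP_SUBSTR.
    pattern = r"^" + re.escape(label) + r":[ \t]*(.*?)[ \t]*\r?$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None

# Sample value copied from the question (newline-delimited entries).
sample = (
    "Children: 5\n"
    "Education: College\n"
    "Employment: Homemaker\n"
    "Marital Status: Married"
)

print(extract_field(sample, "Education"))  # College
```

The capture group stops at the end of the line, so the length of the value no longer matters.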

Related

Using a calculated field to pull x characters before a specific identifier Access

I have a field, "ID", that is formatted as ###"-"##. For example 545R-T67. I have another field, "Name", in the same table that pulls the information before the dash into its own field.
Right now the field is calculated to pull the first four characters with Left([ID],4), but the current edition of the data needs to include IDs with two letters after the three numbers.
I am wondering how I can update the field to pull all characters before the dash instead of just a set amount. Is there a function in Access that makes this possible?
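You can combine Left with InStr to do this. InStr returns the position of the character you're looking for (the dash), so subtract 1 to stop just before it.
Left([ID], InStr([ID], "-") - 1)
Note this will return everything before the first dash. If there's no dash at all, InStr returns 0 and Left is called with a length of -1, which is an error, so guard against that case if some IDs have no dash.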
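For comparison, the same "everything before the first dash" logic can be sketched in Python. The function name text_before_dash is made up for this illustration, and returning an empty string when there is no dash is a choice made here, not something Access does for you.

```python
def text_before_dash(value):
    """Return the text before the first "-" in value."""
    # str.partition splits at the first dash only. When there is no dash,
    # sep is "" and this sketch returns an empty string.
    head, sep, _tail = value.partition("-")
    return head if sep else ""

print(text_before_dash("545R-T67"))  # 545R
```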

Delete all characters before and after quotation marks

I have a CSV file, which has two columns and 4500 rows. In one column, I have several phrases that are surrounded by quotation marks. I need to delete all the text that comes before and after the quotation marks.
For example:
How would you say "Hello, my Friend" when speaking outside?
should become "Hello, my Friend"
I also have several rows that have the word NULL in the second column. I need these rows deleted in full.
What's the best way of doing something like this? I have been looking at regular expressions, but I'm not sure if they are flexible enough to do what I want to do, or how you would use them on a CSV file (I need the table structure to remain).
EDIT:
1) At the moment I am just using Apple Numbers, but I know that won't do it, so I am open to any suggestions. It must support Kanji characters.
2) I have removed all the NULL rows, so that is no longer needed (I simply added a column of numbers, sorted the table so all the NULLs were together, deleted them and then sorted back by the column of numbers).
Find a text editor that supports regular expression search and replace.
Something like this would match ,NULL in the second column: ^.*,NULL.*$. Replace it with "DELETEMEDELETEME" to mark the line, or with an empty string, or find a way to have it also match '\n' or '\r' so that the line break is caught and the entire line is removed.
Stripping out parts of the quoted string might work like this:
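^(.*,){n}(.*)(\".*\")(.*)(,.*)$ replaced with \1\3\5 where n is the number of columns preceding the one you want to edit. Repeat (.*,) if {n} is not available. In most regex flavors a repeated group only keeps its last repetition, so for n greater than 1 the group numbers in the replacement need adjusting. It will depend on the regex flavor of your tool.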
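If a script is an option instead of a text editor, here is a rough Python sketch of both steps (dropping NULL rows and keeping only the quoted phrase). It assumes the file is UTF-8 and that embedded quotation marks are escaped the usual CSV way, by doubling them; the function name clean_rows and the sample rows are invented for the illustration. The csv module keeps the column structure intact, and Kanji is handled like any other text.

```python
import csv
import io
import re

# A quotation mark, anything that is not a quotation mark, a closing mark.
QUOTED = re.compile(r'"[^"]*"')

def clean_rows(rows, text_col=0, null_col=1):
    """Drop rows whose null_col is NULL; reduce text_col to its quoted phrase."""
    for row in rows:
        if row[null_col] == "NULL":
            continue  # skip the whole row
        match = QUOTED.search(row[text_col])
        if match:
            row = list(row)
            row[text_col] = match.group(0)  # keep the phrase with its quotes
        yield row

# Stand-in for the real file; use open(path, newline="", encoding="utf-8").
source = io.StringIO(
    '"How would you say ""Hello, my Friend"" when speaking outside?",greeting\n'
    'Some other phrase,NULL\n'
    '"「""こんにちは""」と言います",挨拶\n'
)
cleaned = list(clean_rows(csv.reader(source)))

out = io.StringIO()
csv.writer(out).writerows(cleaned)  # writes the rows back as valid CSV
print(cleaned)
```

Rows with no quoted phrase are passed through unchanged, which may or may not be what you want.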

Force mySQL queries to be characters not numeric in R

I'm using RODBC to interface R with a MySQL database and have encountered a problem. I need to join two tables based on unique ID numbers (IDNUM below). The issue is that the ID numbers are 20 digit integers and R wants to round them. OK, no problem, I'll just pull these IDs as character strings instead of numeric using CAST(blah AS CHAR).
But R sees the incoming character strings as numbers and thinks "hey, I know these are character strings... but these character strings are just numbers, so I'm pretty sure this guy wants me to store this as numeric, let me fix that for him" then converts them back into numeric and rounds them. I need to force R to take the input as given and can't figure out how to make this happen.
Here's the code I'm using (Interval is a vector that contains a beginning and an ending timestamp, so this code is meant to only pull data from a chosen timeperiod):
test = sqlQuery(channel, paste("SELECT CAST(table1.IDNUM AS CHAR),PartyA,PartyB FROM
table1, table2 WHERE table1.IDNUM=table2.IDNUM AND table1.Timestamp>=",Interval[1],"
AND table2.Timestamp<",Interval[2],sep=""))
You will most likely want to read the documentation for the function you are using at ?sqlQuery, which includes notes about the following two relevant arguments:
as.is which (if any) columns returned as character should be
converted to another type? Allowed values are as for read.table. See
‘Details’.
and
stringsAsFactors logical: should columns returned as character and
not excluded by as.is and not converted to anything else be converted
to factors?
In all likelihood you want to specify the columns in question in as.is.
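To see why the IDs have to stay as character strings: R's numeric type is a double-precision float, which cannot represent every 20-digit integer exactly. The short Python sketch below shows the same effect with a made-up ID; it illustrates the rounding only, not RODBC itself.

```python
# Hypothetical 20-digit ID, as it would arrive from CAST(IDNUM AS CHAR).
id_text = "12345678901234567890"

# Converting to a double (what a numeric column becomes) loses low digits.
as_double = float(id_text)
round_trip = int(as_double)

print(round_trip == int(id_text))  # False: the value was rounded
print(id_text)                     # the string form is still exact
```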

How can I read a CSV with quoted commas as a flat file in BusinessObjects Data Services Designer?

I'm trying to get SAP BusinessObjects Data Services Designer 12.2.3.1 to read a CSV file that contains rows like:
"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"
The results I'm getting with column delimiter set to Comma, however, read that line as seven columns rather than six:
"00501" "P" 0 0 "Nassau-Suffolk NY" "SUFFOLK"
What additional options do I need in order to read the file as-is, without external preprocessing? (If this isn't possible, please say so and I'll stop getting grey matter all over this nice brick wall. Thanks!)
Solution to load data with double quote:
The solution was to set the Text delimiter to ".
Text: Denotes the start and end of a text string. All characters (including those specified as column delimiters) between the first and second occurrence of this character is a single text string. The treatment of the row characters is defined by the "Row within text string" setting.
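The effect of a text delimiter can be seen with any CSV parser. As a sketch outside Data Services, Python's csv module takes the equivalent setting as quotechar; the line below is the one from the question.

```python
import csv
import io

line = '"00501","P",0,0,"Nassau-Suffolk, NY","SUFFOLK"\n'

# Splitting on commas alone ignores the quotes and breaks the city field.
naive = line.strip().split(",")

# With the double quote as text delimiter, the comma inside it is kept.
parsed = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(len(naive), len(parsed))  # 7 6
print(parsed[4])                # Nassau-Suffolk, NY
```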

VBA Trim() function truncating text oddly!

I'm trying to trim extraneous white space at the end of a memo field in MS Access. I've tried doing it a number of ways:
1) an update query with the field being updated to Trim([fieldname]). For some reason, that doesn't do anything. The whitespace is still there.
2) an update using a Macro function in which the field contents are passed as a String and then processed using the Trim() function and passed back. This one is really bizarre, in that it seems to truncate the text in the field at completely random places (different for each record). Sometimes 366 characters, sometimes 312, sometimes 280.
3) same as above but with RTrim()
How can I possibly be messing up such a simple function?! Any help much appreciated. Would like to keep my hair.
-Sam
According to this article:
Both Text and Memo data types store only the characters entered in a field; space characters for unused positions in the field aren't stored.
As hypoxide suggested, they may not in fact be spaces.
Edit
I suspect that the last character in the field is a carriage return or linefeed character. If this is the case, then Trim (or any variation of Trim, such as RTrim or LTrim) won't work, since they only remove space characters. As 'onedaywhen' suggested in the comment, try using the ASC function to determine the actual character code of the last character in the memo field. You can use something like the following in a query to do this:
ASC(Right(MyFieldName,1))
Compare the result of the query to the Character Set to determine the actual character that ends the memo field. (Space = 32, Linefeed = 10, Carriage Return = 13).
You may have to test the last character and if it is a linefeed or carriage return remove the character and then apply the trim function to the rest of the string.
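The check-then-strip idea can be sketched in Python as follows. This is an illustration of the logic only, not Access code; the function names and the sample memo text are made up.

```python
def last_char_code(text):
    """Return the character code of the last character, like ASC(Right(x,1))."""
    return ord(text[-1]) if text else None

def trim_trailing(text):
    """Remove trailing spaces, carriage returns and linefeeds."""
    return text.rstrip(" \r\n")

memo = "Some memo text  \r\n"  # hypothetical field value ending in CR+LF

print(last_char_code(memo))       # 10, a linefeed rather than a space (32)
print(repr(trim_trailing(memo)))  # 'Some memo text'
```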
This may date me, but does Access have different character types for fixed vs. variable lengths? In SQL, CHAR(10) will always be 10 chars long, padded if necessary, while VARCHAR(10) will be the actual size, up to 10. Trimming a CHAR(10) will just put the blanks back.