MySQL LOAD DATA INFILE non consistent field - mysql

I have a file with four fields per line that looks like this:
<uri> <uri> <uri> <uri> .
:_non-spaced-alphanumeric <uri> "25"^^<uri:integer> <uri> .
:_non-spaced-alphanumeric <uri> "Hello"@en <uri> .
:_non-spaced-alphanumeric <uri> "just text in quotes" <uri> .
...
and this sql script:
LOAD DATA LOCAL INFILE 'data-0.nq'
IGNORE
INTO TABLE btc.btc_2012
FIELDS
TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"'
LINES
TERMINATED BY '.\n'
(subject,predicate,object,provenance);
The third field in the examples can be of any of the formats seen above. I don't really care about the 3rd value unless it's a uri, which is parsed fine by the script anyway. But if it's not then the fourth field consists of the part of the third after the quotation plus the fourth itself.
Is there a way I can get it working without manipulating the file, which by the way is 17GB?

Yes, there's a way to work with this. Have the data fields loaded into MySQL user variables, and then assign expressions to the actual columns.
For example, in place of:
(subject,predicate,object,provenance)
do something like this:
(subject, predicate, @field3, @field4)
SET object = CASE WHEN @field3 LIKE '"%"_%' THEN ... ELSE @field3 END
, provenance = CONCAT(CASE WHEN @field3 LIKE '"%"_%' THEN ... ELSE '' END,@field4)
That's just an outline. Obviously, those ... need to be replaced with appropriate expressions that return the portions of the field values you want assigned to the columns. (That will be some combination of SUBSTRING, SUBSTRING_INDEX, INSTR, LOCATE, REPLACE, and other string functions, and you may need additional WHEN branches to handle variations.)
(I'm not exactly clear on what conditions you need to check.)
If this is running on Unix or Linux, another option would be to use a named pipe: have an external program read the file, perform the required manipulation, and write to the named pipe, and run that program in the background while LOAD DATA reads from the pipe.
e.g.
> mkfifo /tmp/mydata.pipe
> myprogram <myfile >/tmp/mydata.pipe 2>/tmp/mydata.err &
mysql> LOAD DATA LOCAL INFILE '/tmp/mydata.pipe' ...
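The answer does not say what myprogram would contain. As a minimal sketch only, assuming every input line has exactly the four-field shape shown in the question and that tab-separated output is acceptable (tab and newline are the LOAD DATA default field and line terminators), a Python filter could look like the following. The names split_quad and convert are hypothetical, and dropping the literal's suffix (language tag or datatype) is an assumption about what you want stored.

```python
import re
import sys

# subject, predicate, object, provenance, then the closing " ."
# The object is either a quoted literal with an optional suffix
# (language tag or datatype) or a single token without spaces.
QUAD = re.compile(r'^(\S+) (\S+) ("(?:[^"\\]|\\.)*"\S*|\S+) (\S+) \.$')


def split_quad(line):
    """Return (subject, predicate, object, provenance), or None if the
    line does not have the assumed shape."""
    match = QUAD.match(line.rstrip("\n"))
    if match is None:
        return None
    subject, predicate, obj, provenance = match.groups()
    if obj.startswith('"'):
        # Assumption: keep only the text between the first and the last
        # double quote and drop whatever follows the closing quote.
        obj = obj[1:obj.rindex('"')]
    return subject, predicate, obj, provenance


def convert(src, dst, err):
    """Copy src to dst as tab-separated fields; unparsable lines go to err."""
    for line in src:
        fields = split_quad(line)
        if fields is None:
            err.write(line)
        else:
            dst.write("\t".join(fields) + "\n")
```

To use it as the filter in the pipe example, the script would end with a call such as convert(sys.stdin, sys.stdout, sys.stderr). Because the file is streamed line by line, its 17GB size does not need to fit in memory.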
FOLLOWUP
With an input line like this:
abc def "Hello"@en klm .
given FIELDS TERMINATED BY ' ' OPTIONALLY ENCLOSED BY '"'
field1 = 'abc'
field2 = 'def'
field3 = '"Hello"@en'
field4 = 'klm'
To test for the case when field3 contains double quotes, with the first double quote as the first character in the string, we could use something like this:
LIKE '"%"%'
That says the first character has to be a double quote, followed by zero, one, or more characters, followed by another double quote, followed again by zero, one, or more characters.
To get the portion of the field3 before the second double quote:
SUBSTRING_INDEX(@field3,'"',2)
To get rid of the leading double quote from that, i.e. to return what's between the double quotes in field3, you could do something like this:
SUBSTRING_INDEX(SUBSTRING_INDEX(@field3,'"',2),'"',-1)
To get the portion of field3 following the last double quote:
SUBSTRING_INDEX(@field3,'"',-1)
(These expressions assume that there are at most two double quotes in field3.)
To get the value for the third column:
CASE
-- when field3 starts with a double quote and contains another double quote
WHEN @field3 LIKE '"%"%'
-- return what's between the double quotes in field3
THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@field3,'"',2),'"',-1)
-- otherwise return the entirety of field3
ELSE @field3
END
To get the value to be prepended to the fourth column, when field3 contains two double quotes:
CASE
-- when field3 starts with a double quote and contains another double quote
WHEN @field3 LIKE '"%"%'
-- return what's after the last double quote in field3
THEN SUBSTRING_INDEX(@field3,'"',-1)
-- otherwise return an empty string
ELSE ''
END
To prepend that to field4, use the CONCAT function with the CASE expression above and field4.
And these are the values we'd expect to have inserted into the table:
column1 = 'abc'
column2 = 'def'
column3 = 'Hello'
column4 = '@enklm'
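One way to sanity-check these expressions without touching the 17GB file is to model them outside MySQL. The sketch below is a rough Python stand-in, not MySQL itself. The helper substring_index imitates SUBSTRING_INDEX for a non-empty delimiter, starts_quoted imitates the LIKE '"%"%' test, and split_fields (a hypothetical name) combines the two CASE expressions with the CONCAT. It uses the integer sample from the question and assumes field3 and field4 arrive split as described above.

```python
def substring_index(text, delim, count):
    """Rough model of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        # everything to the left of the count-th delimiter
        return delim.join(parts[:count])
    # everything to the right of the count-th delimiter from the end
    return delim.join(parts[count:])


def starts_quoted(field):
    """Model of: field LIKE '"%"%' (leading quote, second quote later on)."""
    return field.startswith('"') and '"' in field[1:]


def split_fields(field3, field4):
    """Return (object, provenance) as the CASE/CONCAT expressions would."""
    if starts_quoted(field3):
        obj = substring_index(substring_index(field3, '"', 2), '"', -1)
        suffix = substring_index(field3, '"', -1)
    else:
        obj, suffix = field3, ""
    return obj, suffix + field4


print(split_fields('"25"^^<uri:integer>', '<uri>'))  # ('25', '^^<uri:integer><uri>')
print(split_fields('<uri>', '<uri>'))                # ('<uri>', '<uri>')
```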
ANOTHER FOLLOWUP
If the LOAD DATA isn't recognizing the line delimiter because it's not recognizing the field delimiters, then you'd have to ditch the field delimiters, and do the parsing yourself. Load the whole line into a user variable, and parse away.
e.g.
LINES TERMINATED BY '.\n'
(@line)
SET subject
    = SUBSTRING_INDEX(@line,' ',1)
  , predicate
    = SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',2),' ',-1)
  , object
    = CASE
      WHEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1) LIKE '"%'
      THEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,'"',2),'"',-1)
      ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1)
      END
  , provenance
    = CASE
      WHEN SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',3),' ',-1) LIKE '"%'
      THEN SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(@line,'"',-1),' ',2),' ',-1)
      ELSE SUBSTRING_INDEX(SUBSTRING_INDEX(@line,' ',4),' ',-1)
      END
This will work for all the lines in your example data, where fields are delimited by a single space, except when the quoted text in the third field itself contains additional double quotes.
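The whole-line expressions can be checked against the sample lines in the same spirit. The following is a sketch in Python, not MySQL: substring_index is a rough model of SUBSTRING_INDEX for a non-empty delimiter, and parse_line and nth_space_field are hypothetical names that mirror the SET clause. Each test line is written as LOAD DATA would hand it over, that is, with the '.\n' terminator already removed and the trailing space still present.

```python
def substring_index(text, delim, count):
    """Rough model of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = text.split(delim)
    return delim.join(parts[:count] if count > 0 else parts[count:])


def nth_space_field(line, n):
    """Model of SUBSTRING_INDEX(SUBSTRING_INDEX(line,' ',n),' ',-1)."""
    return substring_index(substring_index(line, " ", n), " ", -1)


def parse_line(line):
    """Return (subject, predicate, object, provenance) like the SET clause."""
    subject = substring_index(line, " ", 1)
    predicate = nth_space_field(line, 2)
    third = nth_space_field(line, 3)
    if third.startswith('"'):  # model of: ... LIKE '"%'
        obj = substring_index(substring_index(line, '"', 2), '"', -1)
        tail = substring_index(line, '"', -1)  # text after the last quote
        provenance = substring_index(substring_index(tail, " ", 2), " ", -1)
    else:
        obj = third
        provenance = nth_space_field(line, 4)
    return subject, predicate, obj, provenance


for sample in ('<s> <p> <o> <g> ',
               ':_b <p> "25"^^<uri:integer> <g> ',
               ':_b <p> "just text in quotes" <g> '):
    print(parse_line(sample))
```

For a quoted literal, the provenance is taken from the text after the last double quote, which is why additional double quotes inside the literal would throw the object extraction off.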
NOTE: The functions available in SQL for string manipulation lead to clumsy and awkward syntax; SQL wasn't specifically designed for easy string manipulation.

Related

Unable to insert and save an escaped double quote in MySQL

I am trying to save JSON values in MySQL, but every time a value contains double quotes, MySQL removes the backslash from the escaped \", even though the query is written properly.
For example:
INSERT into JSON_VALUES SET
ID = 150,
RESULT = '[{"ID":"150","VALUE":"THIS IS A \"TEST\" THAT IS IGNORED","DATE":"2021-08-26"}]'
After executing the query the inserted value in MySQL looks like this:
[{
"ID":"150",
"VALUE":"THIS IS A "TEST" THAT IS IGNORED",
"DATE":"2021-08-26"
}]
whereas "TEST" was supposed to be saved as \"TEST\".
Since TEST is not properly escaped, the JSON value has a syntax error and becomes unreadable.
How do I force MySQL to preserve escaped content, or more precisely escaped double quotes?
I had the same issue some time ago. I had to use \\\" instead of \". In your case that would be:
INSERT into JSON_VALUES SET
ID = 150,
RESULT = '[{"ID":"150","VALUE":"THIS IS A \\\"TEST\\\" THAT IS IGNORED","DATE":"2021-08-26"}]'
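The reason the extra backslashes help is that MySQL itself processes backslash escapes inside a quoted string literal (unless the NO_BACKSLASH_ESCAPES SQL mode is enabled), so one level of escaping is consumed before the value is stored. The Python sketch below is a simplified model of that step, not MySQL's actual parser. sql_unescape and sql_escape are hypothetical helper names, and the short JSON snippets stand in for the full value from the question.

```python
import json


def sql_unescape(literal_body):
    """Simplified model of how MySQL reads the inside of a quoted string
    literal when backslash escapes are enabled."""
    special = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t", "Z": "\x1a"}
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == "\\" and i + 1 < len(literal_body):
            nxt = literal_body[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # \% and \_ keep their backslash
            else:
                out.append(special.get(nxt, nxt))  # \" -> ", \\ -> \
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def sql_escape(text):
    """Illustration only: double backslashes and escape single quotes so
    that sql_unescape(sql_escape(text)) == text."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


single = r'{"V":"A \"TEST\""}'      # what the question sent
triple = r'{"V":"A \\\"TEST\\\""}'  # what the answer suggests

print(sql_unescape(single))  # {"V":"A "TEST""}   (no longer valid JSON)
print(sql_unescape(triple))  # {"V":"A \"TEST\""} (valid JSON again)
print(json.loads(sql_unescape(triple)))  # {'V': 'A "TEST"'}
```

In this model \\" gives the same stored value as \\\", since only the backslash itself has to survive the first round of unescaping.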

Progress ABL format decimal number without leading characters

I just want to format a decimal number for output to a simple CSV formatted file.
I feel like I'm stupid, but I can't find a way to do it without leading zeroes or spaces. Of course I can simply trim the leading spaces, but there has to be a proper way to just format it like that, isn't there?
Example
define variable test as decimal.
define variable testString as character.
test = 12.3456.
testString = string(test, '>>>>>9.99').
message '"' + testString + '"' view-as alert-box. /* " 12.35" */
I tried using >>>>>9.99 and zzzzz9.99 for the number format, but both format the string with leading spaces. I actually have no idea what the difference is between using > and z.
The SUBSTITUTE() function will do what you describe wanting:
define variable c as character no-undo.
c = substitute( "&1", 1.23 ).
display "[" + c + "]".
(Toss in a TRUNCATE( 1.2345, 2 ) if you really only want 2 decimal places.)
Actually, this also works:
string( truncate( 1.2345, 2 )).
If you are creating a CSV file you might want to think about using EXPORT. EXPORT format removes leading spaces and omits decorations like ",". The SUBSTITUTE() function basically uses EXPORT format to make its substitutions. The STRING() function uses EXPORT format when no other format is specified.
The EXPORT statement will format your data for you. Here is an example:
DEFINE VARIABLE test AS DECIMAL NO-UNDO.
DEFINE VARIABLE testRound AS DECIMAL NO-UNDO.
DEFINE VARIABLE testString AS CHARACTER NO-UNDO.
test = 12.3456.
testRound = ROUND(test, 2).
testString = STRING(test).
OUTPUT TO VALUE("test.csv").
EXPORT DELIMITER "," test testRound testString.
OUTPUT CLOSE.
Here is the output:
12.3456,12.35,"12.3456"
The EXPORT statement's default delimiter is a space so you have to specify a comma for your CSV file. Since the test and testRound variables are decimals, they are not in quotes in the output. testString is character so it is in quotes.

Turn a Comma Delimited String to a List of Strings with Single Quotes

I have a string like this:
'111,222,333,444'
What I want to do is to turn it into something like this:
'111','222','333','444'
I can write a function to split the string into a temp table and loop through each row to add quotes. But I don't really want to use a cursor to do this. Is there an easier way?
Could you not just use REPLACE to replace each , with a ','?
This assumes the string has the initial and final single quotes in them.
REPLACE(TheString, ',', ''',''')
If not, you could just add them.
'''' + REPLACE(TheString, ',', ''',''') + ''''

how to get rid of newline space while using substring_index

I have a field with a value like the one below:
utf8: "\xE2\x9C\x93"
id: "805265"
plan: initial
acc: "123456"
last: "1234"
doc: "1281468479"
validation: field
commit: Accept
I used the query below to extract the acc value:
select SUBSTRING_INDEX(SUBSTRING_INDEX(columnname, 'acc: "', -1),'last',1) as acc from table_name;
I am able to retrieve the acc value, but when I export the result to a CSV file, the field includes the newline that comes before last. How do I get rid of it?
I would expect you would want to strip out the end quote as well. But to answer your specific question, you can just update your SUBSTRING_INDEX delimiter to include the newline, i.e. select SUBSTRING_INDEX(SUBSTRING_INDEX(columnname, 'acc: "', -1),'\nlast',1) as acc from table_name;.
Or, if you prefer, you can use the REPLACE function to strip out any unwanted characters.

Mysql - how to strip certain characters from the end of the string

I have strings like this :
column:
----------
word[1]
word[2]
word
word[2]
word
word[3]
Where word is a random string of variable length.
How would I remove the square brackets with numbers in them from the end of these strings in a MySQL table?
Does MySQL allow regexes?
update test
set name = SUBSTRING_INDEX(name,'[',1)
where name=name
You could use the following select:
IF(RIGHT(myColumn, 1) = ']', LEFT(myColumn, CHAR_LENGTH(myColumn) - 3), myColumn)
RIGHT(myColumn, 1) = ']' checks whether your entry ends with a closing bracket.
LEFT(myColumn, CHAR_LENGTH(myColumn) - 3) returns the string without its last three characters, i.e. without a single-digit suffix such as [1], if there is one.
myColumn returns the full string if there is no bracket.
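On the regex question: MySQL 8.0 and later provide REGEXP_REPLACE, so a pattern-based removal is possible there, and it would also cover numbers with more than one digit. As a hedged sketch, the pattern can be tried out in Python first. strip_trailing_index is a hypothetical name, and the pattern assumes the suffix is always an opening bracket, one or more digits, and a closing bracket at the very end of the string.

```python
import re

# "[", one or more digits, "]" anchored at the end of the string
TRAILING_INDEX = re.compile(r"\[[0-9]+\]$")


def strip_trailing_index(value):
    """Remove a trailing [number] suffix, if the value has one."""
    return TRAILING_INDEX.sub("", value)


for name in ("word[1]", "word", "word[12]", "wo[3]rd"):
    print(name, "->", strip_trailing_index(name))
```

Unlike SUBSTRING_INDEX(name,'[',1), this leaves a bracket in the middle of the string untouched.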