I have exported some sql statements, and now need to replay them. The issue arises when I play this to mysql where actual java code is included in the VALUES. This code includes back slashes, quotes, double quotes - sometimes causing mysql to mis-interpret escaping and quoting:
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
');
Above is not the full text, but the quoted bit here, cause the insert to fail. A work around is to enclose the complex text value in back quotes, e.g.
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
`'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
'`);
The question I have is, how can I sed precisely these statements to add back quotes to encapsulate where required? I want to replace 'package with `'package and '); with '`); The closing bracket seems the most complicated since many more statements will match this.
The below python script can help you. The script will read the sql dump file and add backtick to the matched regex and prints it to output file. Input file and output file should be given as a commandline argument.
#!/usr/bin/env python3
import re
import os
import sys
if len(sys.argv) < 3:
print('Provide input and output file to begin processing')
sys.exit(1)
else:
if not os.path.isfile(sys.argv[1]):
print('Input file does not exit')
sys.exit(1)
ifile = sys.argv[1]
ofile = sys.argv[2]
# Compiling the regex
rex = re.compile(r"""(INSERT[\s\S]+?)('package[\s\S]+?')(\);)""")
# Reading the file as a string
dataFile = open(ifile, 'r').read()
fp = open(ofile, 'w')
# Substituting it
print(re.sub(rex, r'\1`\2`\3', dataFile), end='', file=fp)
fp.close()
Output
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
`'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
'`);
Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
K = get input character
if ( <<pattern, K>> is NOT in
the string table ){
output the code for pattern
add <<pattern, K>> to the string table
pattern = K
}
else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
local dict_size = 256
local dictionary = {}
w = ""
result = {}
for c in uncompressed do
-- while c is in the function compress
local wc = w + c
if dictionary[wc] == true then
w = wc
else
dictionary[w] = ""
-- Add wc to the dictionary.
dictionary[wc] = dict_size
dict_size = dict_size + 1
w = c
end
-- Output the code for w.
if w then
dictionary[w] = ""
end
end
return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...
I try to capture lines with calculations in my text document
and execute them.
I use this in my function:
for i in range(startline,endline)
let calculation = getline(i)
...
let out = eval(calculation)
...
endfor
sometimes something goes wrong and I receive this message:
Error detected while processing function....
Line ...
E488: Trailing Characters
Line .. is the line-nr in my function.
I would like to know also which calculation it concerns (which line in my text doc):
If Error detected = echo calculation
How can I check if there is an error message and echo the variable "calculation"?
There are two ways to handle script errors inside a function:
The first is suppressing the error via :silent!. Two downsides: You have to manually check for success, and any normal output from the evaluated script is suppressed, too (unless you do contortions with :unsilent).
let v:errmsg = ''
silent! let out = eval(calculation)
if v:errmsg != ''
" error
endif
I would recommend the second way via try...catch, which avoids the issues with the output and having to explicitly check for an error:
try
let out = eval(calculation)
catch /^Vim\%((\a\+)\)\=:E/
" v:exception contains what is normally in v:errmsg, but with extra
" exception source info prepended, which we cut away.
let v:errmsg = printf("Line: %d\nCalculation: %s\nError: %s", i, calculation, substitute(v:exception, '^Vim\%((\a\+)\)\=:', '', ''))
echohl ErrorMsg
echomsg v:errmsg
echohl None
endtry
Is there a way in XQuery to do something like a tail() function?
What I'm trying to accomplish is to get the contents of a file (using "xdmp:filesystem-file($path)") and then display only the last 100 lines. I can't seem to find a good way to do this. Any ideas?
Thank you.
In plain XQuery, this can be accomplished by splitting into lines and getting the desired number of lines from the end of the sequence, then rejoining them, if necessary, i.e.
declare function local:tail($content as xs:string, $number as xs:integer)
{
let $linefeed := "
"
let $lines := tokenize($content, $linefeed)
let $tail := $lines[position() > last() - $number]
return string-join($tail, $linefeed)
};
A pure and short XPath 2.0 solution -- can be used not only in XQuery but in XSLT or in any other PL hosting XPath 2.0:
for $numLines in count(tokenize(., '
'))
return
tokenize(., '
')[position() gt $numLines -100]
Or:
for $numLines in count(tokenize(., '
'))
return
subsequence(tokenize(., '
'), $numLines -100 +1)
if your xdmp:file-xx is a nature of text file then you could use something like
let $f := 'any file system path'
return fn:tokenize(xdmp:filesystem-file($f), '[\n\r]+')[ fn:last() - 2 to fn:last()]
here
i have used newline & carriage return as my token splitter. if you need something else to tokenize u could. but simple log file tailing then this solution works fine.
given example tails last 2 lines of a given file. if you want more than alter fn:last()-2 to fn:last() - x
Given a string like this:
a,"string, with",various,"values, and some",quoted
What is a good algorithm to split this based on commas while ignoring the commas inside the quoted sections?
The output should be an array:
[ "a", "string, with", "various", "values, and some", "quoted" ]
Looks like you've got some good answers here.
For those of you looking to handle your own CSV file parsing, heed the advice from the experts and Don't roll your own CSV parser.
Your first thought is, "I need to handle commas inside of quotes."
Your next thought will be, "Oh, crap, I need to handle quotes inside of quotes. Escaped quotes. Double quotes. Single quotes..."
It's a road to madness. Don't write your own. Find a library with an extensive unit test coverage that hits all the hard parts and has gone through hell for you. For .NET, use the free FileHelpers library.
Python:
import csv
reader = csv.reader(open("some.csv"))
for row in reader:
print row
If my language of choice didn't offer a way to do this without thinking then I would initially consider two options as the easy way out:
Pre-parse and replace the commas within the string with another control character then split them, followed by a post-parse on the array to replace the control character used previously with the commas.
Alternatively split them on the commas then post-parse the resulting array into another array checking for leading quotes on each array entry and concatenating the entries until I reached a terminating quote.
These are hacks however, and if this is a pure 'mental' exercise then I suspect they will prove unhelpful. If this is a real world problem then it would help to know the language so that we could offer some specific advice.
Of course using a CSV parser is better but just for the fun of it you could:
Loop on the string letter by letter.
If current_letter == quote :
toggle inside_quote variable.
Else if (current_letter ==comma and not inside_quote) :
push current_word into array and clear current_word.
Else
append the current_letter to current_word
When the loop is done push the current_word into array
The author here dropped in a blob of C# code that handles the scenario you're having a problem with:
CSV File Imports in .Net
Shouldn't be too difficult to translate.
What if an odd number of quotes appear
in the original string?
This looks uncannily like CSV parsing, which has some peculiarities to handling quoted fields. The field is only escaped if the field is delimited with double quotations, so:
field1, "field2, field3", field4, "field5, field6" field7
becomes
field1
field2, field3
field4
"field5
field6" field7
Notice if it doesn't both start and end with a quotation, then it's not a quoted field and the double quotes are simply treated as double quotes.
Insedently my code that someone linked to doesn't actually handle this correctly, if I recall correctly.
Here's a simple python implementation based on Pat's pseudocode:
def splitIgnoringSingleQuote(string, split_char, remove_quotes=False):
string_split = []
current_word = ""
inside_quote = False
for letter in string:
if letter == "'":
if not remove_quotes:
current_word += letter
if inside_quote:
inside_quote = False
else:
inside_quote = True
elif letter == split_char and not inside_quote:
string_split.append(current_word)
current_word = ""
else:
current_word += letter
string_split.append(current_word)
return string_split
I use this to parse strings, not sure if it helps here; but with some minor modifications perhaps?
function getstringbetween($string, $start, $end){
$string = " ".$string;
$ini = strpos($string,$start);
if ($ini == 0) return "";
$ini += strlen($start);
$len = strpos($string,$end,$ini) - $ini;
return substr($string,$ini,$len);
}
$fullstring = "this is my [tag]dog[/tag]";
$parsed = getstringbetween($fullstring, "[tag]", "[/tag]");
echo $parsed; // (result = dog)
/mp
This is a standard CSV-style parse. A lot of people try to do this with regular expressions. You can get to about 90% with regexes, but you really need a real CSV parser to do it properly. I found a fast, excellent C# CSV parser on CodeProject a few months ago that I highly recommend!
Here's one in pseudocode (a.k.a. Python) in one pass :-P
def parsecsv(instr):
i = 0
j = 0
outstrs = []
# i is fixed until a match occurs, then it advances
# up to j. j inches forward each time through:
while i < len(instr):
if j < len(instr) and instr[j] == '"':
# skip the opening quote...
j += 1
# then iterate until we find a closing quote.
while instr[j] != '"':
j += 1
if j == len(instr):
raise Exception("Unmatched double quote at end of input.")
if j == len(instr) or instr[j] == ',':
s = instr[i:j] # get the substring we've found
s = s.strip() # remove extra whitespace
# remove surrounding quotes if they're there
if len(s) > 2 and s[0] == '"' and s[-1] == '"':
s = s[1:-1]
# add it to the result
outstrs.append(s)
# skip over the comma, move i up (to where
# j will be at the end of the iteration)
i = j+1
j = j+1
return outstrs
def testcase(instr, expected):
outstr = parsecsv(instr)
print outstr
assert expected == outstr
# Doesn't handle things like '1, 2, "a, b, c" d, 2' or
# escaped quotes, but those can be added pretty easily.
testcase('a, b, "1, 2, 3", c', ['a', 'b', '1, 2, 3', 'c'])
testcase('a,b,"1, 2, 3" , c', ['a', 'b', '1, 2, 3', 'c'])
# odd number of quotes gives a "unmatched quote" exception
#testcase('a,b,"1, 2, 3" , "c', ['a', 'b', '1, 2, 3', 'c'])
Here's a simple algorithm:
Determine if the string begins with a '"' character
Split the string into an array delimited by the '"' character.
Mark the quoted commas with a placeholder #COMMA#
If the input starts with a '"', mark those items in the array where the index % 2 == 0
Otherwise mark those items in the array where the index % 2 == 1
Concatenate the items in the array to form a modified input string.
Split the string into an array delimited by the ',' character.
Replace all instances in the array of #COMMA# placeholders with the ',' character.
The array is your output.
Heres the python implementation:
(fixed to handle '"a,b",c,"d,e,f,h","i,j,k"')
def parse_input(input):
quote_mod = int(not input.startswith('"'))
input = input.split('"')
for item in input:
if item == '':
input.remove(item)
for i in range(len(input)):
if i % 2 == quoted_mod:
input[i] = input[i].replace(",", "#COMMA#")
input = "".join(input).split(",")
for item in input:
if item == '':
input.remove(item)
for i in range(len(input)):
input[i] = input[i].replace("#COMMA#", ",")
return input
# parse_input('a,"string, with",various,"values, and some",quoted')
# -> ['a,string', ' with,various,values', ' and some,quoted']
# parse_input('"a,b",c,"d,e,f,h","i,j,k"')
# -> ['a,b', 'c', 'd,e,f,h', 'i,j,k']
I just couldn't resist to see if I could make it work in a Python one-liner:
arr = [i.replace("|", ",") for i in re.sub('"([^"]*)\,([^"]*)"',"\g<1>|\g<2>", str_to_test).split(",")]
Returns ['a', 'string, with', 'various', 'values, and some', 'quoted']
It works by first replacing the ',' inside quotes to another separator (|),
splitting the string on ',' and replacing the | separator again.
Since you said language agnostic, I wrote my algorithm in the language that's closest to pseudocode as posible:
def find_character_indices(s, ch):
return [i for i, ltr in enumerate(s) if ltr == ch]
def split_text_preserving_quotes(content, include_quotes=False):
quote_indices = find_character_indices(content, '"')
output = content[:quote_indices[0]].split()
for i in range(1, len(quote_indices)):
if i % 2 == 1: # end of quoted sequence
start = quote_indices[i - 1]
end = quote_indices[i] + 1
output.extend([content[start:end]])
else:
start = quote_indices[i - 1] + 1
end = quote_indices[i]
split_section = content[start:end].split()
output.extend(split_section)
output += content[quote_indices[-1] + 1:].split()
return output