Grooviest way to clean malformed JSON in memory

Working with some JSON files, I ran into some malformed ones that include C-style comments. Assume I don't have ownership of these files and changing them is not an option; I need to analyze the JSON data in an automated fashion. JsonSlurper dies when it sees these comments, so I wrote a method to remove the offending lines:
def filterComments(String raw) {
    def filtered = ""
    raw.eachLine { line ->
        def tl = line.trim()
        if (!(tl.startsWith("//") || tl.startsWith("/**") || tl.startsWith("*"))) {
            filtered += line
        }
    }
    return filtered
}
I really like Groovy and have turned to it as my maintenance tool of choice, but I am not the "Grooviest" developer, and this is an example of that. I would like a more Groovy way of accomplishing this.
Some additional notes: this is run as a script. If there is a way to make JsonSlurper disregard the comments instead of using this utility method, that solution would be considered more valuable. Thanks in advance!

Here's one way to do it:
def json = '''
// A comment
Foo f = new Foo(); // this is a comment
/*
* Multiline comment
*
*/
'''
def filterComments(str) {
    str?.replaceAll(/(\/\/|\/\*|\*).*\n?/, '')?.trim()
}
assert filterComments(json) == 'Foo f = new Foo();'
This removes everything from the first //, /* or * to the end of the line, so whole comment lines disappear and trailing // comments are stripped.
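If you would rather not pre-process the text at all: from Groovy 2.3 onward, JsonSlurper can be switched to its LAX parser type, which is documented to tolerate C-style comments (worth verifying against the Groovy version you actually run). A minimal sketch under that assumption:
import groovy.json.JsonSlurper
import groovy.json.JsonParserType

// LAX mode relaxes strict JSON parsing; among other things it accepts
// // and /* */ comments, so no comment-stripping step is needed.
def slurper = new JsonSlurper().setType(JsonParserType.LAX)
def data = slurper.parseText('''
{
    // a comment the strict parser would choke on
    "name": "example", /* trailing block comment */
    "count": 3
}
''')
assert data.name == 'example'
assert data.count == 3
If that works in your environment, it answers the second half of the question without any string manipulation.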

My take:
def filterComments(str) {
    str.readLines().findAll { !(it ==~ /^\s*(\*|\/\*\*|\/\/).*/) }.join('\n')
}
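Either version drops straight into a script; a hypothetical usage (the file name is only a placeholder):
import groovy.json.JsonSlurper

// Strip the comment lines first, then parse as usual.
def raw = new File('settings.json').text   // placeholder path
def data = new JsonSlurper().parseText(filterComments(raw))
println data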

Confused about this nested function

I am reading the Python Cookbook, 3rd Edition, and came across the topic discussed in 2.6, "Searching and Replacing Case-Insensitive Text," where the authors discuss a nested function like the one below:
def matchcase(word):
    def replace(m):
        text = m.group()
        if text.isupper():
            return word.upper()
        elif text.islower():
            return word.lower()
        elif text[0].isupper():
            return word.capitalize()
        else:
            return word
    return replace
If I have some text like below:
text = 'UPPER PYTHON, lower python, Mixed Python'
and I print the value of 'text' before and after, the substitution happens correctly:
x = matchcase('snake')
print("Original Text:",text)
print("After regsub:", re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE))
The last "print" command shows that the substitution correctly happens but I am not sure how this nested function "gets" the:
PYTHON, python, Python
as the word that needs to be substituted with:
SNAKE, snake, Snake
How does the inner function replace get its value 'm'?
When matchcase('snake') is called, word takes the value 'snake'.
I am not clear on what the value of 'm' is.
Can anyone help me understand this clearly?
Thanks.
When you pass a function as the second argument to re.sub, according to the documentation:
it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
The matchcase() function itself returns the replace() function, so when you do this:
re.sub('python', matchcase('snake'), text, flags=re.IGNORECASE)
what happens is that matchcase('snake') returns replace, and then every non-overlapping occurrence of the pattern 'python' as a match object is passed to the replace function as the m argument. If this is confusing to you, don't worry; it is just generally confusing.
Here is an interactive session with a much simpler nested function that should make things clearer:
In [1]: def foo(outer_arg):
   ...:     def bar(inner_arg):
   ...:         print(outer_arg + inner_arg)
   ...:     return bar
   ...:
In [2]: f = foo('hello')
In [3]: f('world')
helloworld
So f = foo('hello') is assigning a function that looks like the one below to a variable f:
def bar(inner_arg):
    print('hello' + inner_arg)
f can then be called like this f('world'), which is like calling bar('world'). I hope that makes things clearer.

Using Groovy in Confluence

I'm new to Groovy and coding in general, but I've come a long way in a very short amount of time. I'm currently working in Confluence to create a tracking tool, which connects to a MySQL database. We've had some great success with this, but have hit a wall with using Groovy and the Run Macro.
Currently, we can use Groovy to populate fields within the Run Macro, which works really well for drop-down options. Example:
{groovy:output=wiki}
import com.atlassian.renderer.v2.RenderMode

def renderMode = RenderMode.suppress(RenderMode.F_FIRST_PARA)
def getSql = "select * from table where x = y"
def getMacro = "{sql-query:datasource=testdb|table=false} ${getSql} {sql-query}"
def get = subRenderer.render(getMacro, context, renderMode)

def runMacro = """
{run:id=test|autorun=false|replace=name::Name, type::Type:select::${get}|keepRequestParameters = true}
{sql:datasource=testdb|table=false|p1=\$name|p2=\$type}
insert into table1 (name, type) values (?, ?)
{sql}
{run}
"""
out.println runMacro
{groovy}
We've also been able to use Groovy within the Run Macro, example:
{run:id=test|autorun=false|replace=name::Name, type::Type:select::${get}|keepRequestParameters = true}
{groovy}
def checkSql = "{select * from table where name = '\$name' and type = '\$type'}"
def checkMacro = "{sql-query:datasource=testdb|table=false} ${checkSql} {sql-query}"
def check = subRenderer.render(checkMacro, context, renderMode)
if (check == "") {
    println("This information does not exist.")
} else {
    println(checkMacro)
}
{groovy}
{run}
However, we can't seem to get both scenarios to work together: Groovy inside a Run Macro inside Groovy.
We need to be able to get the variables out of the Run Macro form so that we can perform other functions, like checking the DB for duplicates before inserting data.
My first thought is to bypass the Run Macro and create a simple form in Groovy, but I haven't had much luck finding good examples. Can anyone help steer me in the right direction for creating a simple form in Groovy that would replace the Run Macro? Or have suggestions on how to get the rendered variables out of the Run Macro?
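One rough sketch of the "simple form" idea, with the caveat that output=html and the request binding used below are assumptions to verify against your Confluence/Groovy plugin setup: emit a plain HTML form from the Groovy macro and read the submitted values back on the next render.
{groovy:output=html}
// Sketch only. ASSUMPTIONS: the macro accepts output=html, and a 'request'
// binding (or something similar) exposes the submitted HTTP parameters --
// both need to be confirmed for your plugin version.
def name = request?.getParameter('name')
def type = request?.getParameter('type')

if (name && type) {
    // The values are plain Groovy variables here, so a duplicate check
    // can run before any {sql} insert is rendered.
    out.println "Received: ${name} / ${type}"
} else {
    out.println '''
        <form method="post">
            Name: <input type="text" name="name"/>
            Type: <input type="text" name="type"/>
            <input type="submit" value="Save"/>
        </form>
    '''
}
{groovy}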

Excluding Content From SQL Bulk Insert

I want to import my IIS logs into SQL for reporting using BULK INSERT, but the comment lines - the ones that start with a # - cause a problem because those lines do not have the same number of fields as the data lines.
If I manually delete the comments, I can perform a bulk insert.
Is there a way to perform a bulk insert while excluding lines based on a match, such as: any line that begins with a "#"?
Thanks.
The approach I generally use with BULK INSERT and irregular data is to push the incoming data into a temporary staging table with a single VARCHAR(MAX) column.
Once it's in there, I can use more flexible decision-making tools like SQL queries and string functions to decide which rows I want to select out of the staging table and bring into my main tables. This is also helpful because BULK INSERT can be maddeningly cryptic about why it fails on a specific file.
The only other option I can think of is using pre-upload scripting to trim comments and other lines that don't fit your tabular criteria before you do your bulk insert.
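That pre-filtering step can be a very short script. A sketch in Groovy (directory names are placeholders) that copies each .log file while dropping the '#' comment lines:
def source = new File('C:/logs/raw')      // placeholder input directory
def target = new File('C:/logs/clean')    // placeholder output directory
target.mkdirs()

// Copy every .log file, skipping the comment lines, so the bulk insert
// only ever sees rows with the expected number of fields.
source.eachFileMatch(~/.*\.log/) { src ->
    new File(target, src.name).withWriter { writer ->
        src.eachLine { line ->
            if (!line.startsWith('#')) {
                writer.writeLine(line)
            }
        }
    }
}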
I recommend using logparser.exe instead. LogParser has some pretty neat capabilities on its own, but it can also be used to format the IIS log to be properly imported by SQL Server.
Microsoft has a tool called "PrepWebLog" (http://support.microsoft.com/kb/296093) which strips out these hash/pound characters; however, I'm running it now (using a PowerShell script for multiple files) and am finding its performance intolerably slow.
I think it'd be faster if I wrote a C# program (or maybe even a macro).
Update: PrepWebLog just crashed on me. I'd avoid it.
Update #2, I looked at PowerShell's Get-Content and Set-Content commands but didn't like the syntax and possible performance. So I wrote this little C# console app:
if (args.Length == 2)
{
    string path = args[0];
    string outPath = args[1];
    Regex hashString = new Regex("^#.+\r\n", RegexOptions.Multiline | RegexOptions.Compiled);

    foreach (string file in Directory.GetFiles(path, "*.log"))
    {
        string data;
        using (StreamReader sr = new StreamReader(file))
        {
            data = sr.ReadToEnd();
        }

        string output = hashString.Replace(data, string.Empty);

        using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
        {
            sw.Write(output);
        }
    }
}
else
{
    Console.WriteLine("Source and Destination Log Path required or too many arguments");
}
It's pretty quick.
Following up on what PeterX wrote, I modified the application to handle large log files, since anything sufficiently large would create an out-of-memory exception. Also, since we're only interested in whether the first character of a line is a hash, we can just use the StartsWith() method on each line as it is read.
class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            string path = args[0];
            string outPath = args[1];
            string line;

            foreach (string file in Directory.GetFiles(path, "*.log"))
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
                    {
                        while ((line = sr.ReadLine()) != null)
                        {
                            if (!line.StartsWith("#"))
                            {
                                sw.WriteLine(line);
                            }
                        }
                    }
                }
            }
        }
        else
        {
            Console.WriteLine("Source and Destination Log Path required or too many arguments");
        }
    }
}

What is the preferable way to pass arguments from function to function?

I'm creating a function, but it is getting bigger, so I need to break it down into smaller parts. Here's an illustration:
def myfunc(x, y, z):
    out = (x * y * z) + val   # 'val' is assumed to be defined elsewhere
    return out

a, b, c = 1, 2, 3
A = myfunc(a, b, c)
print A
Let's say I want to separate (break down) the (x*y*z) into another function like this:
def myotherfunc(x, y, z):
    return x * y * z

def myfunc(x, y, z):
    out = myotherfunc(x, y, z) + val
    return out
That is a simple breakdown, but I have another workflow, as follows:
def myotherfunc(x, y, z):
    return x * y * z

def myfunc(xx):
    out = xx + val
    return out

a, b, c = 1, 2, 3
A = myfunc(myotherfunc(a, b, c))
print A
Same result, but for a more complex programming case, which workflow is preferable, and why?
The answer should also depend on (potential) reuse: is it likely, e.g., that myotherfunc would in the future be called with more than three parameters? If so, prefer the second option.

Grails: can I make a validator apply to create only (not update/edit)

I have a domain class, one of whose fields must hold a date no earlier than the day the instance is created.
class myClass {
    Date startDate
    String iAmGonnaChangeThisInSeveralDays

    static constraints = {
        iAmGonnaChangeThisInSeveralDays(nullable: true)
        startDate(validator: {
            def now = new Date()
            def roundedDay = DateUtils.round(now, Calendar.DATE)
            def checkAgainst
            if (roundedDay > now) {
                Calendar cal = Calendar.getInstance()
                cal.setTime(roundedDay)
                cal.add(Calendar.DAY_OF_YEAR, -1) // <--
                checkAgainst = cal.getTime()
            } else {
                checkAgainst = roundedDay
            }
            return (it >= checkAgainst)
        })
    }
}
So several days later, when I change only the string and call save, the save fails because the validator rechecks the date, which is now in the past. Can I set the validator to fire only on create, or is there some way to detect whether we are creating or editing/updating?
@Rob H
I am not entirely sure how to use your answer. I have the following code causing this error:
myInstance.iAmGonnaChangeThisInSeveralDays = "nachos"
myInstance.save()
if (myInstance.hasErrors()) {
    println "This keeps happening because of the stupid date problem"
}
You can check if the id is set as an indicator of whether it's a new non-persistent instance or an existing persistent instance:
startDate(validator: { date, obj ->
    if (obj.id) {
        // don't check existing instances
        return
    }
    def now = new Date()
    ...
})
One option might be to specify which properties you want to be validated. From the documentation:
The validate method accepts an optional List argument which may contain the names of the properties that should be validated. When a List is passed to the validate method, only the properties defined in the List will be validated.
Example:
// when saving for the first time:
myInstance.startDate = new Date()
if (myInstance.validate() && myInstance.save()) { ... }

// when updating later:
myInstance.iAmGonnaChangeThisInSeveralDays = 'New Value'
myInstance.validate(['iAmGonnaChangeThisInSeveralDays'])
if (myInstance.hasErrors() || !myInstance.save(validate: false)) {
    // handle errors
} else {
    // handle success
}
This feels a bit hacky, since you're bypassing some built-in Grails goodness. You'll want to be cautious that you aren't bypassing any necessary validation on the domain that would normally happen if you were to just call save(). I'd be interested in seeing others' solutions if there are more elegant ones.
Note: I really don't recommend using save(validate: false) if you can avoid it. It's bound to cause some unforeseen negative consequence down the road unless you're very careful about how you use it. If you can find an alternative, by all means use it instead.