Parsing nested JSON input in Scalding

I have some JSON input that I need to parse and process (this is the first time I am using JSON). My input is as follows:
{"id":"id2","v":2, "d":{"Location":"JPN"}}
{"id":"id1","v":1, "d":{"Location":"USA"}}
{"id":"id2","v":1, "d":{"Location":"JPN"}}
{"id":"id1","v":2, "d":{"Location":"USA"}}
My goal is to write a Scalding script that groups the input by the Location field and outputs the count. So in the above example, "JPN" and "USA" should each have a count of 2.
Scalding provides a class called JsonLine. My script is as follows:
class ParseJsonLine(args: Args) extends Job(args) {
  JsonLine(args("input"), ('id, 'v, 'd)).read
    .groupBy('d) { _.size }
    .write(args("output"))
}
The above code compiles ok, but at runtime generates the following error:
Caused by: java.lang.ClassCastException: scala.collection.immutable.Map$Map1 cannot be cast to java.lang.Comparable
Basically, I am not sure how to reference the Location field. "d.Location" did not work, and grouping by the complex structure "d" produces the ClassCastException above.
I did not find many examples of parsing nested JSON input in Scalding. Also, I am not sure whether there is something better than JsonLine for nested input.
I would appreciate your help.
thanks

Perhaps using Symbol?
Take a look at the unit tests: https://github.com/twitter/scalding/blob/0.11.0/scalding-json/src/test/scala/com/twitter/scalding/JsonLineTest.scala
JsonLine(args("input"), ('id, 'v, Symbol("d.Location"))).read
  .groupBy(Symbol("d.Location")) { _.size }
  .write(args("output"))
Note: Learner here so feel free to improvise/correct/educate.
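Putting it together, a complete job might look like the sketch below (the class name CountByLocation, the 'count field name, and the Tsv output source are illustrative choices, not from the original question):

import com.twitter.scalding._

// Sketch: read JSON lines, project the nested "d.Location" field,
// group on it, and write one count per location.
class CountByLocation(args: Args) extends Job(args) {
  JsonLine(args("input"), ('id, 'v, Symbol("d.Location"))).read
    .groupBy(Symbol("d.Location")) { _.size('count) }
    .write(Tsv(args("output")))
}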

Related

What mechanism works to show component ID in chisel3 elaboration

Chisel throws an exception with an elaboration error message. The following is a result from my code, as an example.
chisel3.core.Binding$ExpectedHardwareException: data to be connected 'chisel3.core.Bool#81' must be hardware, not a bare Chisel type. Perhaps you forgot to wrap it in Wire(_) or IO(_)?
This exception message is interesting because the 81 after chisel3.core.Bool# looks like an ID, not a hashcode.
Indeed, the Data type extends the HasId trait, which has an _id field, and
the _id field seems to generate a unique ID for each component.
I thought the Data type overrides toString to build a string of the form type#ID, but it does not override it. That is why $node in the code below should not be able to use the ID.
throw Binding.ExpectedHardwareException(s"$prefix'$node' must be hardware, " +
"not a bare Chisel type. Perhaps you forgot to wrap it in Wire(_) or IO(_)?")
Instead of toString, a toNamed method exists in Data. However, that method seems to be called to generate FIRRTL code, not to convert a component into a string.
Why can the Data type show its ID?
If it is not the ID but actually the hashcode, then this question comes from my misunderstanding.
I think you should take a look at Chisel PR #985. It changes the way Data's toString method is implemented. I'm not sure it answers your question directly, but it's possible this will make the meaning and location of the error clearer. If not, you should comment on it.
Scala classes come with a default toString method of the form className#hashCode.
As you noted, chisel3.core.Bool#81 sure looks like it is using the _id rather than the hashCode. That's because in the most recently published version of Chisel (3.1.6), the hashcode was the id! You can see this if you inspect the source file at the tag for that version: https://github.com/freechipsproject/chisel3/blob/dc4200f8b622e637ec170dc0728c7887a7dbc566/chiselFrontend/src/main/scala/chisel3/internal/Builder.scala#L81
This is no longer the case on master, which is probably the source of any confusion! As Chick noted, we have just changed the .toString method to be more informative than the default; expect more informative representations in 3.2.0!
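As a plain-Scala sketch of the mechanism described above (a standalone mimic, not Chisel's actual source; the object IdDemo and its HasId/Bool are made up here), overriding hashCode to return a per-instance id makes any hashcode-based string print that id:

// Sketch only: a HasId-like trait whose hashCode returns a per-instance
// id, mimicking what the answer says Chisel 3.1.6 did.
object IdDemo {
  private var counter = 0

  trait HasId {
    val _id: Int = { counter += 1; counter }
    override def hashCode: Int = _id
  }

  class Bool extends HasId

  def main(args: Array[String]): Unit = {
    val b = new Bool
    // A message built from the hashcode now shows the id, e.g. "IdDemo$Bool#1"
    println(s"${b.getClass.getName}#${b.hashCode}")
  }
}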

Correct way to use Mockito for JdbcOperation

I am new to Mockito and trying to cover following source code:
jdbcOperations.update(insertRoleQuery,new Object[]{"menuName","subMenuName","subSubMenuName","aa","bb","cc","role"});
This query takes 7 string parameters. I have written a Mockito test case for the code, and it covers the source code, but I am not sure whether it's the correct way or not.
when(jdbcOperations.update(Mockito.anyString(), new Object[]{Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString()})).thenThrow(runtimeException);
Please suggest whether I am doing it the right way or not.
Thanks
As per the docs, you can use either exact values or argument matchers, but not both at the same time:
Warning on argument matchers:
If you are using argument matchers, all arguments have to be provided
by matchers.
If you do mix them, as in your sample, Mockito will complain with something similar to
org.mockito.exceptions.misusing.InvalidUseOfMatchersException:
Invalid use of argument matchers!
2 matchers expected, 1 recorded:
-> at MyTest.shouldMatchArray(MyTest.java:38)
This exception may occur if matchers are combined with raw values:
//incorrect:
someMethod(anyObject(), "raw String");
When using matchers, all arguments have to be provided by matchers.
For example:
//correct:
someMethod(anyObject(), eq("String by matcher"));
For more info see javadoc for Matchers class.
In your case you don't seem to care about the array contents, so you can just use any():
when(jdbcOperations.update(anyString(), any())).thenThrow(runtimeException);
If you want to at least check the number of parameters, you can use either
org.mockito.Mockito's argThat(argumentMatcher):
when(jdbcOperations.update(anyString(), argThat((Object[] array) -> array.length == 7))).thenThrow(runtimeException);
or org.mockito.hamcrest.MockitoHamcrest's argThat(hamcrestMatcher):
when(jdbcOperations.update(anyString(), argThat(arrayWithSize(7)))).thenThrow(runtimeException);
If you're interested in matching certain values, you can use AdditionalMatchers.aryEq(expectedArray), or just Mockito.eq(expectedArray), which has a special implementation for arrays, but I feel that the first one expresses your intent more clearly.
when(jdbcOperations.update(anyString(), aryEq(new Object[]{"whatever"}))).thenThrow(runtimeException);

MXNet initialization error on label variable

When I call module.fit(), I get the following error:
ValueError: Unknown initialization pattern for labelidx.
The symbol "labelidx" is the name I'm using for my label data -- I didn't want to use softmax_label because I'm not using softmax output, but that seems to be the default for a lot of things. It seems to be trying to initialize labelidx as a parameter, which is a mistake. How can I tell it that this is an input, not a learned parameter?
I figured this out.
When constructing the Module object, you need to tell it the names of the data (data_names) and labels (label_names). Each of these should be a list of string names. The defaults are data_names=('data',) and label_names=('softmax_label',). Otherwise it assumes everything else is a learned parameter and will try to initialize it, leading to this error. Docs: http://mxnet.io/api/python/module.html#mxnet.module.module.Module
So in my case it needs Module(label_names=('labelidx',), ...)

JSON Parsing Query in AS3

I'm creating an application that uses a License Plate Recognition System.
The API that I'm using is REST-based and returns JSON to my application, which I parse and which basically looks like this:
{"plate":
{"data_type": "alpr_results", "epoch_time": 1469660951857, "img_height": 288, "img_width": 432, "results":
[{"plate": "MBR527D", "confidence": 88.891518.....
This is what my parse looks like when I load it into ActionScript:
var ThePlate:Object = JSON.parse(e.target.data)
The issue I'm having is that I'm unable to trace the plate entitled "MBR527D" within results, basically because I'm a noob when it comes to JSON.
This is what I try when I attempt to trace the plate, and I know I'm doing something wrong:
trace(ThePlate.results.plate);
It returns "undefined"; however, when I try to trace the image height:
trace(ThePlate.img_height);
It returns the 288 just fine, so I know I'm making a basic error, but I would appreciate any help you guys have! Thanks!
I'm unable to trace the Plate entitled "MBR527D" within results
That's because it's not (directly) in it. results is an array whose first element is an object, and that object has a property named "plate" holding the desired value:
"results": [{"plate": "MBR527D",
trace(ThePlate.results.plate);
Try
trace(ThePlate.results[0].plate);
instead.
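For comparison, the same shape can be sketched with plain Maps and Lists in Scala (illustrative only, not AS3): since results is an array of objects, you must index into it before reading plate.

// Sketch (Scala, not AS3): the parsed JSON modeled as nested Maps/Lists.
val thePlate: Map[String, Any] = Map(
  "img_height" -> 288,
  "results" -> List(Map("plate" -> "MBR527D", "confidence" -> 88.89))
)

// "results" is a list of objects, so take element 0 first:
val results = thePlate("results").asInstanceOf[List[Map[String, Any]]]
println(results(0)("plate"))  // prints MBR527D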

Readtimearray function in Julia TimeSeries package

I would like to read a CSV file of the following form with readtimearray:
"","ES1 Index","VG1 Index","TY1 Comdty","RX1 Comdty","GC1 Comdty"
"1999-01-04",1391.12,3034.53,66.515625,86.2,441.39
"1999-01-05",1404.86,3072.41,66.3125,86.17,440.63
"1999-01-06",1435.12,3156.59,66.4375,86.32,441.7
"1999-01-07",1432.32,3106.08,66.25,86.22,447.67
"1999-01-08",1443.81,3093.46,65.859375,86.36,447.06
"1999-01-11",1427.84,3005.07,65.71875,85.74,449.5
"1999-01-12",1402.33,2968.04,65.953125,86.31,442.92
"1999-01-13",1388.88,2871.23,66.21875,86.52,439.4
"1999-01-14",1366.46,2836.72,66.546875,86.73,440.01
However, here's what I get when I evaluate readtimearray("myfile.csv"):
ERROR: `convert` has no method matching convert(::Type{UTF8String}, ::Float64)
in push! at array.jl:460
in readtimearray at /home/juser/.julia/v0.3/TimeSeries/src/readwrite.jl:25
What is it that I am not seeing?
That looks like a bug in readtimearray.
Empty lines are removed but, to identify them, the code only looks at the first column. Since the header has an empty string in the first column, it is removed...
Changing the header of your file to
"date","ES1 Index","VG1 Index","TY1 Comdty","RX1 Comdty","GC1 Comdty"
addresses the problem.
You're using convert, which is meant for use with Julia types (see the doc for more info). You can parse the strings using Date:
d = Date("1999-04-01", "yyyy-mm-dd")
# ...
array_of_dates = map(x -> Date(x, "yyyy-mm-dd"), array_of_strings)