Subsetting JSON data

Please consider this dataset:
type Deck = JsonProvider<"...">
let dt = Deck.GetSamples()
dt
[{"collectible":true,"health":4,"artist":"Zoltan Boros","type":"MINION","cost":1,"attack":2},
{"collectible":true,"health":8,"artist":"James Ryman","type":"MINION","cost":8,"attack":8},
{"collectible":true,"health":3,"artist":"Warren Mahy", "type":"LAND","cost":2,"attack":2}]
I am trying to build a function capable of extracting certain info from it and, eventually, storing it in a smaller dataset. Given a list-like dataset deck, it should keep only the cards whose value for a given key equals a given value.
let rec filter deck key value =
    let rec aux l1 l2 l3 =
        match l1 with
        | [] -> []
        | x::xs when x.l2 = l3 -> x::(aux xs key value)
    aux deck key value
For example,
filter dt type minion
should subset the deck into a smaller one containing only the first and second cards. I think I have made a few steps toward the concept, but it still does not work, throwing an error of this kind:
FS0072: Lookup on object of indeterminate type based on information prior to
this program point. A type annotation may be needed prior to this program point to
constrain the type of the object. This may allow the lookup to be resolved.
How should I define the type of key? I tried key : string and key : string list, without success.

Are you trying to re-implement filter?
#if INTERACTIVE
#r @"..\packages\FSharp.Data\lib\net40\FSharp.Data.dll"
#endif
open FSharp.Data
[<Literal>]
let jsonFile = @"C:\tmp\test.json"
type Json = JsonProvider<jsonFile>
let deck = Json.Load(jsonFile)
deck |> Seq.filter (fun c -> c.Type = "MINION")
Gives me:
val it : seq<Json.Root> = seq
[{ "collectible": true, "health": 4, "artist": "Zoltan Boros", "type": "MINION", "cost": 1, "attack": 2 };
{ "collectible": true, "health": 8, "artist": "James Ryman", "type": "MINION", "cost": 8, "attack": 8 }]

You actually need to annotate the type of l1: setting l1 : SomeType list should give you what you want.
Annotating key doesn't help, because type inference runs top to bottom and x.l2 is resolved before aux is ever called with key as an argument.
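As an aside, x.l2 in F# always means "the member literally named l2"; no annotation on key can turn it into a runtime property lookup. A minimal sketch of one way around this, assuming the deck is kept as a list, is to pass a selector function instead of a key name:

// Sketch: filter by a selector function rather than by a member name held in a variable.
let filterBy (selector: 'a -> 'b) (value: 'b) (deck: 'a list) =
    deck |> List.filter (fun card -> selector card = value)

// Hypothetical usage with the provider-generated Type property:
// dt |> List.ofSeq |> filterBy (fun c -> c.Type) "MINION"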


Read and store game state as CSV

Thanks to the great help from Tenfour04, I've got wonderful code for handling CSV files.
However, I am having trouble with the following questions.
How to call these functions?
How to initialize 2-dimensional array variables?
Below is the code that finally worked.
MainActivity.kt
package com.surlofia.csv_tenfour04_1

import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import java.io.File
import java.io.IOException
import com.surlofia.csv_tenfour04_1.databinding.ActivityMainBinding

var chk_Q_Num: MutableList<Int> = mutableListOf(
    0,
    1, 2, 3, 4, 5,
    6, 7, 8, 9, 10,
    11, 12, 13, 14, 15,
    16, 17, 18, 19, 20,
)

var chk_Q_State: MutableList<String> = mutableListOf(
    "z",
    "a", "b", "c", "d", "e",
    "f", "g", "h", "i", "j"
)

class MainActivity : AppCompatActivity() {
    private lateinit var binding: ActivityMainBinding

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // setContentView(R.layout.activity_main)
        binding = ActivityMainBinding.inflate(layoutInflater)
        val view = binding.root
        setContentView(view)

        // Load saved data at game startup. It will be invalid if performed by other activities.
        val filePath = filesDir.path + "/chk_Q.csv"
        val file = File(filePath)
        binding.fileExists.text = isFileExists(file).toString()

        if (isFileExists(file)) {
            val csvIN = file.readAsCSV()
            for (i in 0 .. 10) {
                chk_Q_Num[i] = csvIN[i][0].toInt()
                chk_Q_State[i] = csvIN[i][1]
            }
        }

        // Game Program Run
        val csvOUT = mutableListOf(
            mutableListOf("0", "OK"),
            mutableListOf("1", "OK"),
            mutableListOf("2", "OK"),
            mutableListOf("3", "Not yet"),
            mutableListOf("4", "Not yet"),
            mutableListOf("5", "Not yet"),
            mutableListOf("6", "Not yet"),
            mutableListOf("7", "Not yet"),
            mutableListOf("8", "Not yet"),
            mutableListOf("9", "Not yet"),
            mutableListOf("10", "Not yet")
        )

        var tempString = ""
        for (i in 0 .. 10) {
            csvOUT[i][0] = chk_Q_Num[i].toString()
            csvOUT[i][1] = "OK"
            tempString = tempString + csvOUT[i][0] + "-->" + csvOUT[i][1] + "\n"
        }
        binding.readFile.text = tempString

        // and save Data
        file.writeAsCSV(csvOUT)
    }

    // https://www.techiedelight.com/ja/check-if-a-file-exists-in-kotlin/
    private fun isFileExists(file: File): Boolean {
        return file.exists() && !file.isDirectory
    }

    @Throws(IOException::class)
    fun File.readAsCSV(): List<List<String>> {
        val splitLines = mutableListOf<List<String>>()
        forEachLine {
            splitLines += it.split(", ")
        }
        return splitLines
    }

    @Throws(IOException::class)
    fun File.writeAsCSV(values: List<List<String>>) {
        val csv = values.joinToString("\n") { line -> line.joinToString(", ") }
        writeText(csv)
    }
}
chk_Q.csv
0,0
1,OK
2,OK
3,Not yet
4,Not yet
5,Not yet
6,Not yet
7,Not yet
8,Not yet
9,Not yet
10,Not yet
1. How to call these functions?
The code below seems to work well.
Did I call these functions in the right way?
Or are there better ways to achieve this?
read
if (isFileExists(file)) {
    val csvIN = file.readAsCSV()
    for (i in 0 .. 10) {
        chk_Q_Num[i] = csvIN[i][0].toInt()
        chk_Q_State[i] = csvIN[i][1]
    }
}
write
file.writeAsCSV(csvOUT)
2. How to initialize 2-dimensional array variables?
val csvOUT = mutableListOf(
    mutableListOf("0", "OK"),
    mutableListOf("1", "OK"),
    mutableListOf("2", "OK"),
    mutableListOf("3", "Not yet"),
    mutableListOf("4", "Not yet"),
    mutableListOf("5", "Not yet"),
    mutableListOf("6", "Not yet"),
    mutableListOf("7", "Not yet"),
    mutableListOf("8", "Not yet"),
    mutableListOf("9", "Not yet"),
    mutableListOf("10", "Not yet")
)
I would like to know a clever way to use a for loop instead of writing specific values one by one.
For example, something like below.
val csvOUT = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
    csvOUT[i][0] = i
    csvOUT[i][1] = "OK"
}
But this gave me the following error message:
Not enough information to infer type variable T
It would be great if you could provide an example of how to execute this for beginners.
----- Added on June 15, 2022. -----
[Question 1]
Regarding initialization, I got a "keeps stopping" error when I executed the following code.
The application is forced to terminate.
Why is this?
val csvOUT: MutableList<MutableList<String>> = mutableListOf(mutableListOf())
for (i in 0 .. 10) {
    csvOUT[i][0] = "$i"
    csvOUT[i][1] = "OK"
}
[Error Message]
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.surlofia.csv_endzeit_01/com.surlofia.csv_endzeit_01.MainActivity}: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
In my opinion there are basically two parts to your question. First, you need an understanding of the Kotlin type system, including generics. Second, you want some knowledge about approaches to the problem at hand.
type-system and generics
The function mutableListOf you're using is generic and thus needs a single type parameter T, as can be seen from its definition, taken from the documentation:
fun <T> mutableListOf(): MutableList<T>
Most of the time the Kotlin compiler is quite good at type-inference, that is guessing the type used based on the context. For example, I do not need to provide a type explicitly in the following example, because the Kotlin compiler can infer the type from the usage context.
val listWithInts = mutableListOf(3, 7)
The inferred type is MutableList<Int>.
However, sometimes this might not be what one desires. For example, I might want to allow null values in my list above. To achieve this, I have to tell the compiler that it should not only allow Int values to the list but also null values, widening the type from Int to Int?. I can achieve this in at least two ways.
providing a generic type parameter
val listWithNullableInts = mutableListOf<Int?>(3, 7)
defining the expected return type explicitly
val listWithNullableInts: MutableList<Int?> = mutableListOf(3, 7)
In your case the compiler does NOT have enough information to infer the type from the usage context. Thus you either have to provide it that context, e.g. by passing values of a specific type to the function or using one of the two options named above.
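Applied to your declaration, either route gives the compiler the missing context (a quick sketch):

// option 1: provide the generic type parameter explicitly
val csvOUT1 = mutableListOf<MutableList<String>>()
// option 2: declare the expected type explicitly
val csvOUT2: MutableList<MutableList<String>> = mutableListOf()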
initialization of multidimensional arrays
There are questions and answers on creating multi-dimensional arrays in Kotlin on StackOverflow already.
One solution to your problem at hand might be the following.
val csvOUT: MutableList<MutableList<String>> = MutableList(11) { mutableListOf("", "") }
for (i in 0 .. 10) {
    csvOUT[i][0] = "$i"
    csvOUT[i][1] = "OK"
}
You help the Kotlin compiler by defining the expected type explicitly, and pre-populating the outer and inner lists ensures that every index you assign to actually exists (see the note on IndexOutOfBoundsException below); the values are then set as Strings in your 2D list.
If the dimensions are fixed, you might want to use fixed-size Arrays instead.
val csvArray = Array(11) { index -> arrayOf("$index", "OK") }
In both solutions, however, you convert the Int index to a String.
If the only information you want to store for each level is a String, you might as well use a simple List<String> and use the index of each entry as the level number, e.g.:
val csvOut = List(11) { "OK" }
val levelThree = csvOut[2] // first index of List is 0
This would also work with more complicated data structures instead of Strings. You simply would have to adjust your fun File.writeAsCSV(values: List<List<String>>) to accept a different type as the values parameter.
Assuming a simple data class, you might end up with something along the lines of:
data class LevelState(val state: String, val timeBeaten: Instant?)

val levelState = List(11) { LevelState("OK", Instant.now()) }

fun File.writeAsCSV(values: List<LevelState>) {
    val csvString = values
        .mapIndexed { index, levelState -> "$index, ${levelState.state}, ${levelState.timeBeaten}" }
        .joinToString("\n")
    writeText(csvString)
}
If you prefer a more "classical" imperative approach, you can populate your 2-dimensional Array / List using a loop like for in.
val list: MutableList<MutableList<String>> = mutableListOf() // list is now []
for (i in 0..10) {
    val innerList: MutableList<String> = mutableListOf()
    innerList.add("$i")
    innerList.add("OK")
    innerList.add("${Instant.now()}")
    list.add(innerList)
    // list is after the first iteration [ ["0", "OK", "2022-06-15T07:03:14.315Z"] ]
}
The syntax listName[index] = value is just syntactic sugar for the operator overload of the set operator, see the documentation on MutableList for example.
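In other words, the bracket assignment is just a call to set (a quick sketch):

val row = mutableListOf("0", "OK")
row[1] = "Not yet"    // syntactic sugar for ...
row.set(1, "Not yet") // ... this explicit call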
You cannot access an index that has not been populated before, e.g. during the List's initialization or by using add; otherwise you're greeted with an IndexOutOfBoundsException.
If you want to use the set operator, one option is to use a pre-populated Array as such:
val array: Array<Array<String>> = Array(11) {
    Array(3) { "default" }
} // array is [ ["default", "default", "default"], ... ]
array[1][2] = "myValue"
However, I wouldn't recommend this approach, as it might leave behind potentially invalid initial data if one forgets to replace a value.

How to get map keys from Arrow dataset

What is the recommended approach to obtain a unique list of map keys from an Arrow dataset?
For a dataset with schema containing:
...
PARQUET:field_id: '19'
detail: map<string, struct<reported: bool, incidents_per_month: int32>>
...
Sample data:
"detail": {"a": {"reported": true, "incidents_per_month: 3}, "b": {"reported": true, "incidents_per_month: 3}},
"detail": {"c": {"reported": false, "incidents_per_month: 3}}
What is the right approach to obtaining a list of unique map keys for field detail? i.e. a,b,c
Current (slow) approach:
map_data = dataset.column('detail')
map_keys = list(set([key for chunk in map_data.iterchunks() for key in chunk.keys.unique().tolist()]))
You already found the .keys attribute of a MapArray. This gives an array of all keys, of which you can take the unique values.
But a dataset (Table) can consist of many chunks, and then accessing the data of a column gives a ChunkedArray which doesn't have this keys attribute. For that reason, you loop over the different chunks, and combine the unique values of all of those.
For now, looping over the chunks is still needed I think, but calculating the overall uniques can be done a bit more efficiently with pyarrow:
import pyarrow as pa

# set-up small example
map_type = pa.map_(pa.string(), pa.struct([('reported', pa.bool_()), ('incidents_per_month', pa.int32())]))
values = [
    [("a", {"reported": True, "incidents_per_month": 3}), ("b", {"reported": True, "incidents_per_month": 3})],
    [("c", {"reported": False, "incidents_per_month": 3})]
]
dataset = pa.table({'detail': pa.array(values, map_type)})
# then creating a chunked array of keys
map_data = dataset.column('detail')
keys = pa.chunked_array([chunk.keys for chunk in map_data.iterchunks()])
# and taking the unique of those in one go:
>>> keys.unique()
<pyarrow.lib.StringArray object at 0x7fbc578af940>
[
"a",
"b",
"c"
]
For optimal efficiency, it would still be good to avoid the python loop of pa.chunked_array([chunk.keys for chunk in map_data.iterchunks()]), and for this I opened https://issues.apache.org/jira/browse/ARROW-12564 to track this enhancement.
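Depending on your pyarrow version, one possible shortcut (a sketch, not necessarily faster) is to concatenate the chunks first and take the keys of the single resulting MapArray; note that combine_chunks copies the data into one array, so this trades memory for avoiding the Python-level loop:

# Sketch: combine all chunks into one MapArray, then dedupe its keys.
unique_keys = dataset.column('detail').combine_chunks().keys.unique()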

Decoding a thing or a list of things in json

I'm trying to parse JSON-LD, and one of the possible constructs is
"John" : {
"type": "person",
"friend": [ "Bob", "Jane" ],
}
I would like to decode into records of type
type alias Triple =
{ subject: String, predicate: String, object: String }
so the example above becomes:
Triple "John" "type" "person"
Triple "John" "friend" "Bob"
Triple "John" "friend" "Jane"
But "friend" in the JSON object could also be just a string:
"friend": "Mary"
in which case the corresponding triple would be
Triple "John" "friend" "Mary"
Any idea?
First, you'll need a way to list all key/value pairs from a JSON object. Elm offers the Json.Decode.keyValuePairs function for this purpose. It gives you a list of key names which you'll use for the predicate field, but you'll also have to describe a decoder for it to use for the values.
Since your values are either a string or a list of strings, you can use Json.Decode.oneOf to help. In this example, we'll convert a string to a singleton list (e.g. "foo" becomes ["foo"]), because that makes it easier to map over later.
stringListOrSingletonDecoder : Decoder (List String)
stringListOrSingletonDecoder =
    JD.oneOf
        [ JD.string |> JD.map (\s -> [ s ])
        , JD.list JD.string
        ]
Since the output of keyValuePairs will be a list of (String, List String) values, we'll need a way to flatten those into a List (String, String) value. We can define that function like this:
flattenSnd : ( a, List b ) -> List ( a, b )
flattenSnd ( key, vals ) =
    List.map (\val -> ( key, val )) vals
Now you can use these two functions to split up an object into a triple. This accepts a string argument which is the key to look up in your calling function (e.g. we need to look up the wrapping "John" key).
itemDecoder : String -> Decoder (List Triple)
itemDecoder key =
    JD.field key (JD.keyValuePairs stringListOrSingletonDecoder)
        |> JD.map
            (List.map flattenSnd
                >> List.concat
                >> List.map (\( a, b ) -> Triple key a b)
            )
See a working example here on Ellie.
Note that the order of keys may not match how you listed them in the input JSON, but that is just how JSON works: an object is a lookup table, not an ordered list.
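For instance, a sketch of running the decoder on the question's example, wrapped in a complete JSON object (key order in the output may differ, as noted above):

decoded : Result JD.Error (List Triple)
decoded =
    JD.decodeString (itemDecoder "John")
        """{ "John": { "type": "person", "friend": ["Bob", "Jane"] } }"""

-- decoded == Ok [ Triple "John" "type" "person", Triple "John" "friend" "Bob", Triple "John" "friend" "Jane" ]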

Elm - Decode Json with dynamic keys

I'd like to decode a Json file that would look like this:
{ "result": [
    {"id": 1, "model": "online", "app_label": "some_app_users"},
    {"id": 2, "model": "rank", "app_label": "some_app_users"}
]}
or like this:
{ "result": [
    {"id": 1, "name": "Tom", "skills": {"key": "value", ...}},
    {"id": 2, "name": "Bob", "skills": {"key": "value", ...}}
]}
Basically, the content under result is a list of dicts with the same keys - but I don't know these keys in advance and I don't know their value types (int, string, dict, etc.).
The goal is to show databases tables content; the Json contains the result of the SQL query.
My decoder looks like this (not compiling):
tableContentDecoder : Decode.Decoder (List dict)
tableContentDecoder =
Decode.at [ "result" ] (Decode.list Decode.dict)
I use it like this:
Http.send GotTableContent (Http.get url tableContentDecoder)
I'm getting that error:
Function list is expecting the argument to be:
Decode.Decoder (Dict.Dict String a)
But it is:
Decode.Decoder a -> Decode.Decoder (Dict.Dict String a)
What's the correct syntax to use the dict decoder? Will that work? I couldn't find any universal Elm decoder...
Decode.list is a function that takes a value of type Decoder a and returns a value of type Decoder (List a). Decode.dict is also a function that takes a value of type Decoder a, and it returns a decoder of type Decoder (Dict String a). This tells us two things:
We need to pass a decoder value to Decode.dict before we pass it to Decode.list
A Dict may not fit your use case, as Dicts can only map between two fixed types and do not support nested values like "skills": {"key": "value", ...}
Elm doesn't provide a universal decoder. The motivation for this has to do with Elm's guarantee of "no runtime errors". When dealing with the outside world, Elm needs to protect its runtime from the possibility of external failures and mistakes. Elm's primary mechanism for doing this is types. Elm only lets data in that is correctly described, and by doing so eliminates the possibility of errors that a universal decoder would introduce.
Since your primary goal is to display content, something like Dict String String might work, but it depends on how deeply nested your data is. You could implement this with a small modification to your code: Decode.at [ "result" ] <| Decode.list (Decode.dict Decode.string).
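Spelled out, that modification might look like this (a sketch; note it fails on non-string values such as "id": 1 unless they arrive serialized as strings):

import Dict exposing (Dict)
import Json.Decode as Decode

tableContentDecoder : Decode.Decoder (List (Dict String String))
tableContentDecoder =
    Decode.at [ "result" ] (Decode.list (Decode.dict Decode.string))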
Another possibility is using Decode.value and Decode.andThen to test for values that indicate which table we are reading from.
It's important that our decoder has a single consistent type, which means we would need to represent our possible results as a sum type.
-- represents the different possible tables
type TableEntry
    = ModelTableEntry ModelTableFields
    | UserTableEntry UserTableFields
    | ...

-- we will use this alias as a constructor with `Decode.map3`
type alias ModelTableFields =
    { id : Int
    , model : String
    , appLabel : String
    }

type alias UserTableFields =
    { id : Int
    , ...
    }

tableContentDecoder : Decoder (List TableEntry)
tableContentDecoder =
    Decode.value
        |> Decode.andThen
            (\value ->
                let
                    tryAt field =
                        Decode.decodeValue
                            (Decode.at [ "result" ] <|
                                Decode.list <|
                                    Decode.at [ field ] Decode.string
                            )
                            value
                in
                -- check the results of various attempts and use
                -- the appropriate decoder based on results
                case ( tryAt "model", tryAt "name", ... ) of
                    ( Ok _, _, ... ) ->
                        decodeModel

                    ( _, Ok _, ... ) ->
                        decodeUser

                    ...

                    ( _, _, ..., _ ) ->
                        Decode.fail "I don't know what that was!"
            )

-- example decoder for ModelTableEntry
-- Others can be constructed in a similar manner, but you might
-- want to use NoRedInk/Json.Decode.Pipeline for more complex data
decodeModel : Decoder (List TableEntry)
decodeModel =
    Decode.list <|
        Decode.map3
            (\id model appLabel -> ModelTableEntry (ModelTableFields id model appLabel))
            (Decode.field "id" Decode.int)
            (Decode.field "model" Decode.string)
            (Decode.field "app_label" Decode.string)

decodeUser : Decoder (List TableEntry)
decodeUser =
    ...
It is fair to say that this is a lot more work than most other languages would make you do to parse JSON. However, this comes with the benefit of being able to use outside data without worrying about exceptions.
One way of thinking about it is that Elm makes you do all the work upfront. Where other languages might let you get up and running faster but, do less to help you get to a stable implementation.
I couldn't figure out how to get the Decode.dict to work, so I changed my JSON and split the columns and results:
data = {
    'columns': [column.name for column in cursor.description],
    'results': [[str(column) for column in record] for record in cursor.fetchall()]
}
I also had to convert all the results to String to make it simple; the JSON will have "id": "1", for example.
With the JSON done that way, the Elm code is really simple:
type alias QueryResult =
    { columns : List String, results : List (List String) }

tableContentDecoder : Decode.Decoder QueryResult
tableContentDecoder =
    Decode.map2
        QueryResult
        (Decode.field "columns" (Decode.list Decode.string))
        (Decode.field "results" (Decode.list (Decode.list Decode.string)))

Expanding a JSON column in R

I am reading in a data table from a CSV file. Some elements in the CSV are in JSON format, so one of the columns has JSON formatted data, for example:
user_id tv_sec action_info
1: 47074 1426791420 {"foo": {"bar":12345,"baz":309}, "type": "type1"}
2: 47074 1426791658 {"foo": {"bar":23409,"baz":903}, "type": "type2"}
3: 47074 1426791923 {"foo": {"bar":97241,"baz":218}, "type": "type3"}
I would like to flatten out the action_info column and add the data as columns, as follows:
user_id tv_sec bar baz type
1: 47074 1426791420 12345 309 type1
2: 47074 1426791658 23409 903 type2
3: 47074 1426791923 97241 218 type3
I am not sure how to achieve this. I found a library to convert strings to JSON in R (RJSONIO) but I'm having a hard time figuring out what to do next. When I experiment with just trying to convert all rows in the action_info column to JSON with the command userActions[,.(fromJSON(action_info))] I basically get a data table with what seems like all the values accumulated in some way that's not entirely clear to me. For example, running that with my (non-example) data I get:
V1
1: 2.188603e+12,2.187628e+12,2.186202e+12,1.164000e+03
2: type1
Warning messages:
1: In if (is.na(encoding)) return(0L) :
the condition has length > 1 and only the first element will be used
2: In if (is.na(i)) { :
the condition has length > 1 and only the first element will be used
So, I'm trying to figure out:
how to operate on the column to convert it from JSON to values (I think I am doing this correctly though, but I'm not certain)
how to get the values and create columns out of them in either the current or new data table.
Rather ugly but should work:
library(dplyr)
library(data.table)
lapply(as.character(df$action_info), RJSONIO::fromJSON) %>%
lapply(function(e) list(bar=e$foo[1], baz=e$foo[2], type=e$type)) %>%
rbindlist() %>%
cbind(df) %>%
select(-action_info)
Data:
library(data.table)
df <- data.table(structure(list(user_id = c(47074L, 47074L, 47074L), tv_sec = c(1426791420L,
1426791658L, 1426791923L), action_info = c("{\"foo\": {\"bar\":12345,\"baz\":309}, \"type\": \"type1\"}",
"{\"foo\": {\"bar\":23409,\"baz\":903}, \"type\": \"type2\"}",
"{\"foo\": {\"bar\":97241,\"baz\":218}, \"type\": \"type3\"}"
)), .Names = c("user_id", "tv_sec", "action_info"), row.names = c(NA,
-3L), class = "data.frame"))
Here's one way to do it with data_table:
df[, c('bar', 'baz', 'type') := as.list(unlist(fromJSON(action_info[1]))),
   by = action_info]
How it works:
The by=action_info essentially makes sure we just call fromJSON once per unique action_info (once per row in your case); this is because fromJSON doesn't work on vectorised input.
The fromJSON(action_info[1]) parses the action_info JSON string into an R object (the [1] is on the off chance that you have multiple rows with the same action_info, since fromJSON doesn't work on vector input).
The unlist flattens the nested "foo: {bar...}" (do fromJSON(df$action_info[1]) and unlist(fromJSON(df$action_info[1])) to see what I mean).
The as.list converts the result back into a list, with one element per "column" (data.table needs this to do the multiple assignment)
Then the c('bar', 'baz', 'type'):= assigns the output back out to the columns.
Note we don't match by name, so 'bar' is always the first part of the JSON, 'baz' is always the second, etc. If your action_info could have a {bar: ..., baz: ...} as well as a {baz: ..., bar: ...}, the baz of the second will be assigned to the bar column. If you want to be cleverer and assign by name, you could do as.list(...)[c('foo.bar', 'foo.baz', 'type')] to ensure the elements are in the right order before assigning, as sketched below.
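A sketch of that name-safe variant (foo.bar and foo.baz are the names unlist gives the nested elements):

df[, c('bar', 'baz', 'type') :=
       as.list(unlist(fromJSON(action_info[1])))[c('foo.bar', 'foo.baz', 'type')],
   by = action_info]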