jq produces memory overflow

I have a JSON file where a time series is stored under the data key and an object id is stored under the info key:
{"info":
{"id": "abc"},
"data": [
[10, 5, 3],
[12, 6, 4],
# 5000 list items
]
}
I would like to flatten the json and produce something similar to:
[
{"id": "abc", "time": 10, "x": 5, "y": 3},
{"id": "abc", "time": 12, "x": 6, "y": 4},
# the rest of 5000 points
]
I'm running a jq query that seems to work well for producing a series of items:
"{time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}"
When I try to put the same expression into a list to create a list of dicts, I'm hitting a memory overflow limit:
"[{time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}]"
Is there anything else I can do differently? Many thanks in advance.

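@peak has already pointed out the problem with your query: each .data[] inside the object constructor is a separate generator, so jq emits one object for every combination of time, x and y values instead of one object per row. Here is a solution based on that insight, which iterates over .data[] only once (x and y are taken from .[1] and .[2] to match your desired output):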
[ (.data[] | {time: .[0], x: .[1], y: .[2]}) + {id: .info.id} ]
See it online on jqplay.org
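To see why the first query runs out of memory and the fixed one does not, here is a small Python sketch that mimics both filters on a three-row document. It is an illustration, not jq: the helper names naive_query and flatten_series are hypothetical, and itertools.product stands in for the way jq's object constructor combines several generators (jq's output order may differ, but the count is the same).

```python
import itertools


def naive_query(doc):
    # Mimics {time: .data[][0], x: .data[][2], y: .data[][1], item: .info.id}:
    # each field is an independent generator over *all* rows, so the object
    # constructor yields one object per combination (a Cartesian product).
    times = [row[0] for row in doc["data"]]
    xs = [row[2] for row in doc["data"]]
    ys = [row[1] for row in doc["data"]]
    return [
        {"time": t, "x": x, "y": y, "item": doc["info"]["id"]}
        for t, x, y in itertools.product(times, xs, ys)
    ]


def flatten_series(doc):
    # Mimics [ (.data[] | {time: .[0], x: .[1], y: .[2]}) + {id: .info.id} ]:
    # a single pass over the rows, one object per row.
    item_id = doc["info"]["id"]
    return [{"time": t, "x": x, "y": y, "id": item_id} for t, x, y in doc["data"]]


doc = {"info": {"id": "abc"}, "data": [[10, 5, 3], [12, 6, 4], [14, 7, 5]]}
print(len(naive_query(doc)))   # 27 objects for 3 rows (3 ** 3)
print(flatten_series(doc)[0])  # {'time': 10, 'x': 5, 'y': 3, 'id': 'abc'}
```

With the question's 5000 rows, the first shape would have to produce 5000³ = 125,000,000,000 objects, while the second produces 5000.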


Get category of movie from json struct using spark scala

I have a DataFrame df_movies with a genres column whose values look like JSON:
|genres |
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]
How can I extract the value of the first 'name' field?
way #1
df_movies.withColumn("genres_extract", regexp_extract(col("genres"), """ 'name': (\w+)""", 1)).show(false)
way #2
df_movies.withColumn("genres_extract", regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1))
Expected: Action
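You can use the get_json_object function:
import org.apache.spark.sql.functions.{col, get_json_object}
import spark.implicits._ // for toDF; assumes a SparkSession named spark
Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
.toDF("genres")
.withColumn("genres_extract", get_json_object(col("genres"), "$[0].name"))
.show()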
+--------------------+--------------+
| genres|genres_extract|
+--------------------+--------------+
|[{"id": 28, "name...| Action|
+--------------------+--------------+
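If you want to check what the path $[0].name selects without a Spark session, this Python sketch approximates it for a single value: parse the string, take element 0, read its name, and return None wherever get_json_object would return null (invalid JSON, empty array, missing key). first_genre_name is a hypothetical helper, not Spark's implementation. Note that the question's single-quoted values are not valid JSON, so a strict JSON parser such as json.loads rejects them.

```python
import json


def first_genre_name(genres_json):
    """Rough equivalent of get_json_object(col, "$[0].name") for one value."""
    try:
        genres = json.loads(genres_json)
    except json.JSONDecodeError:
        return None  # malformed input -> null
    if not isinstance(genres, list) or not genres:
        return None  # no element 0 -> null
    first = genres[0]
    if not isinstance(first, dict):
        return None  # element 0 has no fields -> null
    return first.get("name")  # missing key -> null


row = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]'
print(first_genre_name(row))  # Action
```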
Another possibility is to use the from_json function together with a user-defined schema. This "unwraps" the JSON structure into a DataFrame that contains all of the data, so you can use it however you want.
Something like the following:
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._
import spark.implicits._ // for toDF; assumes a SparkSession named spark
val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
.toDF("genres")
// Creating the necessary schema for the from_json function
val moviesSchema = ArrayType(
new StructType()
.add("id", StringType)
.add("name", StringType)
)
// Parse the JSON string with our schema, explode the array to get one row
// per JSON object, then select parsedMovies.* to unwrap its fields
// into separate columns
val parsedDf = df
.withColumn("parsedMovies", explode(from_json(col("genres"), moviesSchema)))
.select("parsedMovies.*")
parsedDf.show(false)
+---+---------+
| id| name|
+---+---------+
| 28| Action|
| 12|Adventure|
| 37| Western|
+---+---------+
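The explode step is what turns one row holding an array into one row per array element. As a rough illustration outside Spark, this hypothetical Python helper applies the same idea to a list of genre strings, producing one (id, name) tuple per JSON object. It assumes double-quoted, valid JSON input, like the answer's Seq.

```python
import json


def explode_genres(genre_strings):
    """Mimic explode(from_json(col("genres"), schema)).select("parsedMovies.*"):
    every element of every JSON array becomes its own (id, name) row."""
    rows = []
    for raw in genre_strings:
        for genre in json.loads(raw):
            # Missing fields become None, much like null columns in Spark.
            rows.append((genre.get("id"), genre.get("name")))
    return rows


genres_column = [
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]'
]
for row in explode_genres(genres_column):
    print(row)  # (28, 'Action'), then (12, 'Adventure'), then (37, 'Western')
```

As with explode (as opposed to explode_outer), an empty array contributes no rows.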

How to explode/unwrap all documents using jq?

I have a very large file that looks like this:
[
{"a": 4, "b": [1,2,3]},
{"a": 6, "b": [7,8,9]}
]
and I would like to transform it to
{"a": 4, "b": 1}
{"a": 4, "b": 2}
{"a": 4, "b": 3}
{"a": 6, "b": 7}
{"a": 6, "b": 8}
{"a": 6, "b": 9}
using jq. The filter .[] | {a: .a, b: .b[]} would work for a smaller input. Given the size of the file, I want to use --stream. Could anyone give me a pointer on how to use streaming to solve this problem?
If the "very large file" fits into memory, just decompose the array with .[] and build your objects with {a, b: .b[]}, which iterates over the elements of b:
jq -c '.[] | {a, b: .b[]}'
{"a":4,"b":1}
{"a":4,"b":2}
{"a":4,"b":3}
{"a":6,"b":7}
{"a":6,"b":8}
{"a":6,"b":9}
If it doesn't, but a single array item would, use the --stream flag to read the file as a stream of path/value events, keep only the item level with truncate_stream at depth 1, reassemble each array item with fromstream, and then build the final objects as above:
jq --stream -cn 'fromstream(1 | truncate_stream(inputs)) | {a, b: .b[]}'
{"a":4,"b":1}
{"a":4,"b":2}
{"a":4,"b":3}
{"a":6,"b":7}
{"a":6,"b":8}
{"a":6,"b":9}
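To make the event format less opaque, here is a Python sketch that emulates, on an in-memory document, the [path, leaf] and closing [path] events that --stream produces, together with 1 | truncate_stream(...) and fromstream(...). It only illustrates the mechanism and saves no memory, since the whole document is already loaded. The function names are hypothetical and merely mirror the jq builtins; the edge-case behaviour (empty containers, scalars at the truncated level) is assumed to follow jq's manual rather than being tested against jq.

```python
def tostream(value, path=()):
    """Yield jq-style stream events: [path, leaf] for scalars (and empty
    containers), and a closing [path] after the last child of a container."""
    if isinstance(value, dict) and value:
        keys = list(value)
        for key in keys:
            yield from tostream(value[key], path + (key,))
        yield [list(path + (keys[-1],))]
    elif isinstance(value, list) and value:
        for index, item in enumerate(value):
            yield from tostream(item, path + (index,))
        yield [list(path + (len(value) - 1,))]
    else:
        yield [list(path), value]


def truncate_stream(depth, events):
    """Like `depth | truncate_stream(events)`: drop the first `depth` path
    components and discard events whose path is not longer than `depth`."""
    for event in events:
        if len(event[0]) > depth:
            yield [event[0][depth:]] + event[1:]


def _setpath(root, path, leaf):
    # Build nested lists/dicts on demand, similar to jq's setpath.
    if not path:
        return leaf
    key = path[0]
    if isinstance(key, int):
        root = [] if root is None else root
        while len(root) <= key:
            root.append(None)
        root[key] = _setpath(root[key], path[1:], leaf)
    else:
        root = {} if root is None else root
        root[key] = _setpath(root.get(key), path[1:], leaf)
    return root


def fromstream(events):
    """Reassemble complete top-level values from (truncated) stream events."""
    current = None
    for event in events:
        path = event[0]
        if len(event) == 2:
            if not path:  # a top-level scalar is complete on its own
                yield event[1]
                continue
            current = _setpath(current, path, event[1])
        elif len(path) == 1:  # closing event of a top-level container
            yield current
            current = None


doc = [{"a": 4, "b": [1, 2, 3]}, {"a": 6, "b": [7, 8, 9]}]
for item in fromstream(truncate_stream(1, tostream(doc))):
    for b in item["b"]:  # the {a, b: .b[]} step
        print({"a": item["a"], "b": b})  # {'a': 4, 'b': 1}, ..., {'a': 6, 'b': 9}
```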

Mathematica, combine ContourPlot3D and ListPointPlot3D

I would like to combine a 3-dimensional function plot with some 3D Points. Some lines that work separately are:
D3Plot= ContourPlot3D[x^2+y^2+z^2== 2, {x, 0, 2}, {y, 0, 2}, {z, 0,2}, ColorFunction -> Function[{x, y, z}, Hue[1*(1 - z)]]]
and:
atest3D = {{1, 1, 1}, {2, 1, 1}, {1, 2, 1}, {1, 1, 2}, {2, 2, 2}};
However, I run into problems when combining them:
Show[atest3D,D3Plot,AxesOrigin -> {0, 0, 0}, PlotRange -> {{0, 3}, {0, 3}, {0, 3}}]
Is there any way to get this to work or some other way to show these two plots together?
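Show can only combine graphics objects, and atest3D is just a list of coordinates, so wrap the points in Graphics3D first. Something like this?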
Show[D3Plot, Graphics3D[{Red, PointSize[0.1], Point[atest3D]}], PlotRange -> All]
Is there also a way to keep the points visible even when they are on the other side of the surface?

Merge several json arrays in circe

Let's say we have 2 json arrays. How to merge them into a single array with circe? Example:
Array 1:
[{"id": 1}, {"id": 2}, {"id": 3}]
Array 2:
[{"id": 4}, {"id": 5}, {"id": 6}]
Needed:
[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}, {"id": 6}]
I've tried deepMerge, but it only keeps the contents of the argument, not of the calling object.
Suppose we've got the following set-up (I'm using circe-literal for convenience, but your Json values could come from anywhere):
import io.circe.Json, io.circe.literal._
val a1: Json = json"""[{"id": 1}, {"id": 2}, {"id": 3}]"""
val a2: Json = json"""[{"id": 4}, {"id": 5}, {"id": 6}]"""
Now we can combine them like this:
for { a1s <- a1.asArray; a2s <- a2.asArray } yield Json.fromValues(a1s ++ a2s)
Or:
import cats.std.option._, cats.syntax.cartesian._
(a1.asArray |@| a2.asArray).map(_ ++ _).map(Json.fromValues)
Both of these approaches give you an Option[Json] that will be None if either a1 or a2 doesn't represent a JSON array. It's up to you to decide what should happen in that situation: .getOrElse(a2) or .getOrElse(a1.deepMerge(a2)) might be reasonable choices, for example.
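For comparison, here is the same logic sketched in Python, where json.loads stands in for circe's parsing and None plays the role of an empty Option: the arrays are concatenated only when both values really are arrays. concat_arrays is a hypothetical helper.

```python
import json


def concat_arrays(a1, a2):
    """Return a1 ++ a2 if both JSON values are arrays, otherwise None
    (mirroring the Option[Json] returned by the circe snippets)."""
    if isinstance(a1, list) and isinstance(a2, list):
        return a1 + a2
    return None


a1 = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}]')
a2 = json.loads('[{"id": 4}, {"id": 5}, {"id": 6}]')
print(json.dumps(concat_arrays(a1, a2)))
# [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}, {"id": 6}]
print(concat_arrays(a1, {"id": 7}))  # None: the second value is not an array
```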
As a side note, the current contract of deepMerge says the following:
Null, Array, Boolean, String and Number are treated as values, and values from the argument JSON completely replace values from this JSON.
This isn't set in stone, though, and it might not be unreasonable to have deepMerge concatenate JSON arrays. If you want to open an issue, we can do some more thinking about it.
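A Python sketch of the quoted contract, written from the documentation text rather than from circe's source (deep_merge is a hypothetical name): objects are merged key by key, recursively, and every other kind of value from the argument replaces the receiver's value. This is why merging two arrays keeps only the second one.

```python
def deep_merge(this, that):
    """Merge JSON-like values following the quoted deepMerge contract."""
    if isinstance(this, dict) and isinstance(that, dict):
        merged = dict(this)
        for key, value in that.items():
            # Recurse where both sides have the key; otherwise take the argument's value.
            merged[key] = deep_merge(this[key], value) if key in this else value
        return merged
    # Null, arrays, booleans, strings and numbers: the argument wins outright.
    return that


print(deep_merge([{"id": 1}], [{"id": 4}]))  # [{'id': 4}]
print(deep_merge({"a": {"x": 1}, "b": [1]}, {"a": {"y": 2}, "b": [2]}))
# {'a': {'x': 1, 'y': 2}, 'b': [2]}
```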

Looping through all json elements using Unity Boomlagoon Json

I'm using Boomlagoon Json in my Unity project. My JSON file has several lines, and so far I can only get Boomlagoon to read the first one. Is there a way to write a loop that goes through and parses the entire JSON file?
Here is my json:
{"type": 1, "squads": [{"player_id": 1, "squad": [1, 2, 3, 4]}, {"player_id": 2, "squad": [6, 7, 8, 9]}], "room_number": 1, "alliance_id": 1, "level": 1}
{"type": 2, "squads": [{"player_id": 2, "squad": [1, 2, 3, 4]}, {"player_id": 3, "squad": [6, 7, 8, 9]}], "room_number": 2, "alliance_id": 1, "level": 1}
{"type": 3, "squads": [{"player_id": 3, "squad": [1, 2, 3, 4]}, {"player_id": 4, "squad": [6, 7, 8, 9]}], "room_number": 3, "alliance_id": 1, "level": 1}
And when I do a loop like this:
foreach (KeyValuePair<string, JSONValue> pair in emptyObject) { ... }
it only gives me results for the first entry (in this example type:1). Thanks.
Your file actually contains three separate JSON objects, one per line, and the parser stops once the first object ends. You need to parse each line separately to get all of the data.
As an aside, you'll notice that if you paste your JSON into the validator at jsonlint.com it'll give you a parsing error where the second object begins.
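The line-by-line approach, sketched in Python rather than C# with Boomlagoon: parse_json_lines is a hypothetical helper, json.loads stands in for whatever parser you use, and the records are shortened versions of the ones in the question. Parsing the whole text as one document fails with an "Extra data" error at the start of the second object, the same problem jsonlint reports.

```python
import json


def parse_json_lines(text):
    """Parse text that holds one JSON object per line (JSON Lines)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


data = (
    '{"type": 1, "room_number": 1}\n'
    '{"type": 2, "room_number": 2}\n'
    '{"type": 3, "room_number": 3}\n'
)

try:
    json.loads(data)  # treating the file as a single JSON document
except json.JSONDecodeError as err:
    print("whole-file parse failed:", err.msg)  # Extra data

for record in parse_json_lines(data):
    print(record["type"], record["room_number"])
```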