Loading lines of JSON from Amazon S3 to DynamoDB - json

I have some output from my apache-spark (PySpark) code that looks like this (very simple JSON objects, one per line):
{'id': 1, 'value1': 'blah', 'value2': 1, 'value3': '2016-07-19 19:35:13'}
{'id': 2, 'value1': 'yada', 'value2': 1, 'value3': '2016-07-19 19:35:13'}
{'id': 3, 'value1': 'blah', 'value2': 2, 'value3': '2016-07-19 19:35:13'}
{'id': 4, 'value1': 'yada', 'value2': 2, 'value3': '2016-07-19 19:35:13'}
{'id': 5, 'value1': 'blah', 'value2': 3, 'value3': '2016-07-19 19:35:13'}
{'id': 6, 'value1': 'yada', 'value2': 4, 'value3': '2016-07-19 19:35:13'}
I want to write them to a DynamoDB table as documents. I don't want to convert this to the Map format (if I can avoid it). Any ideas on how to pull this off? There is very little documentation on the formatting issue.
There is some new DocumentClient() thing, but I can't use it from the CLI. For example, feeding one of the above lines as an item to the aws cli 'put-item' command gives an error:
aws dynamodb put-item --table-name mytable --item file://item.txt
Parameter validation failed:
Invalid type for parameter Item.......

A JSON string, such as the following, can't be put-itemed directly in DynamoDB:
{'id': 1, 'value1': 'blah', 'value2': 1, 'value3': '2016-07-19 19:35:13'}
It needs to have a format like:
{"id": {"N": 1}, "value1": {"S": "blah"}, "value2": {"N": 1}, "value3": {"S": "2016-07-19 19:35:13"}}
That is because, in the former, DynamoDB has no way to know the data types of id, value1, etc. Note that in the typed format even numbers are passed as strings.
As I see it, you have two options:
Transform your data from the former to the latter using some utility, for example jq (a programmatic version of this transform is sketched after this list).
Use AWS Data Pipeline.
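
If the load can run from Python (for example, right after the PySpark job) rather than through the CLI, boto3 gives you a programmatic version of the transform, and its resource-level API even avoids the typed format altogether. The table name and input path below are assumptions, so treat this as a minimal sketch rather than a drop-in script:

import ast
import boto3
from boto3.dynamodb.types import TypeSerializer

# Option 1, transform programmatically: TypeSerializer emits the typed
# wire format (note that numbers become strings, e.g. {"N": "1"}).
serializer = TypeSerializer()
item = {'id': 1, 'value1': 'blah', 'value2': 1, 'value3': '2016-07-19 19:35:13'}
typed_item = {k: serializer.serialize(v) for k, v in item.items()}
# typed_item == {'id': {'N': '1'}, 'value1': {'S': 'blah'}, ...}

# Option 2, skip the transform: the resource-level Table API infers
# DynamoDB types from plain Python values, so no {"S"}/{"N"} wrapping.
table = boto3.resource('dynamodb').Table('mytable')           # table name assumed
with open('part-00000') as f, table.batch_writer() as batch:  # path assumed
    for line in f:
        # The sample lines use single quotes, so they are Python dict
        # literals rather than strict JSON; ast.literal_eval parses them.
        batch.put_item(Item=ast.literal_eval(line.strip()))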

Related

Get category of movie from json struct using spark scala

I have a df_movies with a genres column whose values look like JSON:
|genres                                                                                        |
|[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]|
How can I extract the value of the first 'name' field?
way #1:
df_movies.withColumn("genres_extract", regexp_extract(col("genres"), """ 'name': (\w+)""", 1)).show(false)
way #2:
df_movies.withColumn("genres_extract", regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1))
Expected: Action
You can use the get_json_object function:
import org.apache.spark.sql.functions.{col, get_json_object}

Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
  .toDF("genres")
  .withColumn("genres_extract", get_json_object(col("genres"), "$[0].name"))
  .show()
+--------------------+--------------+
| genres|genres_extract|
+--------------------+--------------+
|[{"id": 28, "name...| Action|
+--------------------+--------------+
Another possibility is using the from_json function together with a self-defined schema. This allows you to "unwrap" the JSON structure into a dataframe with all of the data in there, so that you can use it however you want!
Something like the following:
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

val df = Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
  .toDF("genres")

// Creating the necessary schema for the from_json function
val moviesSchema = ArrayType(
  new StructType()
    .add("id", StringType)
    .add("name", StringType)
)

// Parsing the json string into our schema, exploding the column to make one row
// per json object in the array and then selecting the wanted columns,
// unwrapping the parsedMovies column into separate columns
val parsedDf = df
  .withColumn("parsedMovies", explode(from_json(col("genres"), moviesSchema)))
  .select("parsedMovies.*")

parsedDf.show(false)
+---+---------+
| id| name|
+---+---------+
| 28| Action|
| 12|Adventure|
| 37| Western|
+---+---------+

JSON_Extract error: Invalid JSON text in argument 1 to function json_extract: "Missing a name for object member."

I'm trying to extract "name" from JSON column "Value":
Table t:

id | Value
---+--------------------------------------------------------------------------------------------------------------------
 1 | [{'id': 116298, 'name': 'Data Analysis', 'language': 'en'}, {'id': 5462, 'name': 'Visualization', 'language': '00'}]
My query is:
select
json_extract(t.value,'$name')
from t
Also tried:
select
JSON_SEARCH(t.value, 'all', 'name')
from t
The error I get is:
Data truncation: Invalid JSON text in argument 1 to function
json_extract: "Missing a name for object member." at position 2.
What am I missing?
Appreciate your help!
Check your JSON value. I copied your Value to a JSON validator and it does not like single quotes. So if I run the following I get no results:
SET @mapJSON = "[{'id': 116298, 'name': 'Data Analysis', 'language': 'en'}, {'id': 5462, 'name': 'Visualization', 'language': '00'}]";
SELECT JSON_SEARCH(@mapJSON, "all", "name") AS t FROM DUAL WHERE JSON_VALID(@mapJSON)=1; -- no result returned
The other problem is that JSON_SEARCH searches for a value, so if you run the following SQL you will get ["$[0].name", "$[2].name"] (the paths of the first and third objects in the array whose name key has the value "Data Analysis").
SET @mapJSON = '[{"id": 116298, "name": "Data Analysis", "language": "en"}, {"id": 5462, "name": "Visualization", "language": "00"}, {"id": 988, "name": "Data Analysis", "language": "es"}]';
SELECT JSON_SEARCH(@mapJSON, "all", "Data Analysis") AS t FROM DUAL WHERE JSON_VALID(@mapJSON)=1
Since your Value is an array of objects:
[
{'id': 116298, 'name': 'Data Analysis', 'language': 'en'},
{'id': 5462, 'name': 'Visualization', 'language': '00'}
]
...each Value should be a single object such as:
{'id': 116298, 'name': 'Data Analysis', 'language': 'en'}
...in which case you should get "Data Analysis" when you run:
SET @mapJSON = '{"id": 116298, "name": "Data Analysis", "language": "en"}';
SELECT json_extract(@mapJSON,'$.name') AS t FROM DUAL WHERE JSON_VALID(@mapJSON)=1
[FYI: I'm using MySQL v8]
You are using the wrong syntax:
select value->"$.name"
from t
as explained in the MySQL documentation.
You can add a where condition like this
select value->"$.name"
from t
WHERE JSON_EXTRACT(value, "$.name") = 'Meital'
The -> operator serves as an alias for the JSON_EXTRACT() function when used with two arguments, a column identifier on the left and a JSON path on the right that is evaluated against the JSON document (the column value). You can use such expressions in place of column identifiers wherever they occur in SQL statements.

Mathematica, combine ContourPlot3D and ListPointPlot3D

I would like to combine a 3-dimensional function plot with some 3D points. These lines work separately:
D3Plot = ContourPlot3D[x^2 + y^2 + z^2 == 2, {x, 0, 2}, {y, 0, 2}, {z, 0, 2}, ColorFunction -> Function[{x, y, z}, Hue[1*(1 - z)]]]
and:
atest3D = {{1, 1, 1}, {2, 1, 1}, {1, 2, 1}, {1, 1, 2}, {2, 2, 2}};
However, I run into problems when combining them:
Show[atest3D,D3Plot,AxesOrigin -> {0, 0, 0}, PlotRange -> {{0, 3}, {0, 3}, {0, 3}}]
Is there any way to get this to work or some other way to show these two plots together?
Something like this?
Show[D3Plot, Graphics3D[{Red, PointSize[0.1], Point[atest3D]}], PlotRange -> All]
Is there also any way to make the points always viewable, also if they are on the other side of the surface?

Merge several json arrays in circe

Let's say we have 2 json arrays. How to merge them into a single array with circe? Example:
Array 1:
[{"id": 1}, {"id": 2}, {"id": 3}]
Array 2:
[{"id": 4}, {"id": 5}, {"id": 6}]
Needed:
[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}, {"id": 6}]
I've tried deepMerge, but it only keeps the contents of the argument, not of the calling object.
Suppose we've got the following set-up (I'm using circe-literal for convenience, but your Json values could come from anywhere):
import io.circe.Json, io.circe.literal._
val a1: Json = json"""[{"id": 1}, {"id": 2}, {"id": 3}]"""
val a2: Json = json"""[{"id": 4}, {"id": 5}, {"id": 6}]"""
Now we can combine them like this:
for { a1s <- a1.asArray; a2s <- a2.asArray } yield Json.fromValues(a1s ++ a2s)
Or:
import cats.std.option._, cats.syntax.cartesian._
(a1.asArray |@| a2.asArray).map(_ ++ _).map(Json.fromValues)
Both of these approaches are going to give you an Option[Json] that will be None if either a1 or a2 doesn't represent a JSON array. It's up to you to decide what should happen in that situation: .getOrElse(a2) or .getOrElse(a1.deepMerge(a2)) might be reasonable choices, for example.
As a side note, the current contract of deepMerge says the following:
Null, Array, Boolean, String and Number are treated as values, and values from the argument JSON completely replace values from this JSON.
This isn't set in stone, though, and it might not be unreasonable to have deepMerge concatenate JSON arrays—if you want to open an issue we can do some more thinking about it.

Looping through all json elements using Unity Boomlagoon Json

I'm using Boomlagoon Json in my Unity project. My JSON file has several lines in it, and so far I can only get Boomlagoon to read the first one. Is there a way I can make a loop that will go through and parse the entire JSON file?
Here is my json:
{"type": 1, "squads": [{"player_id": 1, "squad": [1, 2, 3, 4]}, {"player_id": 2, "squad": [6, 7, 8, 9]}], "room_number": 1, "alliance_id": 1, "level": 1}
{"type": 2, "squads": [{"player_id": 2, "squad": [1, 2, 3, 4]}, {"player_id": 3, "squad": [6, 7, 8, 9]}], "room_number": 2, "alliance_id": 1, "level": 1}
{"type": 3, "squads": [{"player_id": 3, "squad": [1, 2, 3, 4]}, {"player_id": 4, "squad": [6, 7, 8, 9]}], "room_number": 3, "alliance_id": 1, "level": 1}
And when I do a loop like this:
foreach (KeyValuePair<string, JSONValue> pair in emptyObject) { ... }
it only gives me results for the first entry (in this example type:1). Thanks.
Your file actually contains 3 JSON objects, and what happens when you parse it is that parsing stops once the first object ends. You need to parse each line separately to get all of the data, as sketched below.
As an aside, you'll notice that if you paste your JSON into the validator at jsonlint.com it'll give you a parsing error where the second object begins.
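
For illustration, here is the line-by-line idea sketched in Python (the file name is hypothetical); in Unity the same pattern applies, splitting the text on newlines and handing each line to Boomlagoon's parser on its own:

import json

# Each line of the file holds one complete JSON object (one room).
rooms = []
with open('squads.json') as f:      # hypothetical path
    for line in f:
        line = line.strip()
        if line:                    # skip blank lines
            rooms.append(json.loads(line))

for room in rooms:
    print(room['type'], room['room_number'],
          [squad['player_id'] for squad in room['squads']])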