Reading Parquet dataset using PyArrow filter is not working

I want to implement the query below using a PyArrow filter:
'(salary == 150280.17 or country == "Finland" ) and (first_name == "Amanda" or last_name == "Gray")'
dataset = pq.ParquetDataset(
    parquit_file,
    use_legacy_dataset=False,
    filters=[
        ([("salary", "==", 150280.17)], [("country", "==", "Canada")]),
        ([("first_name", "==", "Amanda")], [("last_name", "==", "Gray")]),
    ],
)
dataset.read().to_pandas()
but it gives me this error:
ValueError: not enough values to unpack (expected 3, got 1)

The filters should be a list[tuple] or a list[list[tuple]]:
dataset = pq.ParquetDataset(
    parquet_file,
    use_legacy_dataset=False,
    filters=[
        [
            ("salary", "==", 150280.17),
            ("country", "==", "Canada"),
        ],
        [
            ("first_name", "==", "Amanda"),
            ("last_name", "==", "Gray"),
        ],
    ],
)
dataset.read().to_pandas()
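You had an extra []: each predicate tuple was wrapped in its own list, so PyArrow tried to unpack a one-element list as (column, op, value) and failed. Note that the outer list is combined with OR and each inner list with AND, so the filter above selects (salary and country) or (first_name and last_name), which is not the same as the query in the question.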
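PyArrow documents these tuple filters as disjunctive normal form, so a query shaped like (A or B) and (C or D) has to be expanded into OR-ed groups of AND-ed predicates before it is passed as filters. The sketch below does that expansion with the standard library only. to_dnf is a hypothetical helper name, not part of PyArrow, and the country value follows the code above rather than the query text.

```python
from itertools import product


def to_dnf(*or_groups):
    """Expand an AND of OR-groups into DNF (a list of AND-lists).

    Hypothetical helper, not a PyArrow function. Each argument is a list
    of (column, op, value) tuples that are OR-ed together; the groups
    themselves are AND-ed.
    """
    # Picking one predicate from every group gives one AND-clause;
    # the outer list then ORs those clauses together.
    return [list(combo) for combo in product(*or_groups)]


filters = to_dnf(
    [("salary", "==", 150280.17), ("country", "==", "Canada")],
    [("first_name", "==", "Amanda"), ("last_name", "==", "Gray")],
)

for clause in filters:
    print(clause)
```

The result has four AND-clauses and is in the list[list[tuple]] shape that the filters argument expects.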

Related

Parsing nested JSON and collecting data in a list

I am trying to parse a nested JSON and trying to collect data into a list under some condition.
Input JSON as below:
[
{
"name": "Thomas",
"place": "USA",
"items": [
{"item_name":"This is a book shelf", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"parts", "level":2},
{"item_name":"market place", "level":3},
{"item_name":"books", "level":1},
{"item_name":"pens", "level":1},
{"item_name":"pencils", "level":1}
],
"descriptions": [
{"item_name": "Books"}
]
},
{
"name": "Samy",
"place": "UK",
"items": [
{"item_name":"This is a cat house", "level":1},
{"item_name":"Introduction", "level":1},
{"item_name":"dog house", "level":3},
{"item_name":"cat house", "level":1},
{"item_name":"cat food", "level":2},
{"item_name":"cat name", "level":1},
{"item_name":"Samy", "level":2}
],
"descriptions": [
{"item_name": "cat"}
]
}
]
I am reading json as below:
with open('test.json', 'r', encoding='utf8') as fp:
    data = json.load(fp)
for i in data:
    if i['name'] == "Thomas":
        # collect "item_name" values in a list (my_list) if "level": 1
        # my_list = []
Expected output:
my_list = ["This is a book shelf", "Introduction", "books", "pens", "pencils"]
Since it's a complex nested JSON, I am not able to collect the data into a list as described above. Please let me know how to collect the data from the nested JSON.

Try:
import json
with open("test.json", "r", encoding="utf8") as fp:
    data = json.load(fp)
my_list = [
    i["item_name"]
    for d in data
    for i in d["items"]
    if d["name"] == "Thomas" and i["level"] == 1
]
print(my_list)
This prints:
['This is a book shelf', 'Introduction', 'books', 'pens', 'pencils']
Or without list comprehension:
my_list = []
for d in data:
    if d["name"] != "Thomas":
        continue
    for i in d["items"]:
        if i["level"] == 1:
            my_list.append(i["item_name"])
print(my_list)

Once the data is loaded, we iterate over the outermost list of objects.
For each object whose name equals "Thomas", we apply filter with a lambda to its items list, keeping only the items with level == 1.
This gives us a list of item objects that have level 1.
To extract item_name from each of them we use a list comprehension, so final_list ends up with the values you expect:
["This is a book shelf", "Introduction", "books", "pens", "pencils"]
import json
def get_final_list():
    with open('test.json', 'r', encoding='utf8') as fp:
        data = json.load(fp)
    final_list = []
    for obj in data:
        if obj.get("name") == "Thomas":
            x = list(filter(lambda item: item['level'] == 1, obj.get("items")))
            final_list = final_list + x
    final_list = [i.get("item_name") for i in final_list]
    return final_list
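Both answers hard-code the name and the level. A minimal sketch of a reusable version, assuming the same structure as the input JSON above, could look like this. collect_item_names is a hypothetical helper, and the inline data is a trimmed copy of the sample so the snippet runs without test.json.

```python
import json

SAMPLE = """
[
  {"name": "Thomas", "items": [
    {"item_name": "This is a book shelf", "level": 1},
    {"item_name": "parts", "level": 2},
    {"item_name": "books", "level": 1}
  ]},
  {"name": "Samy", "items": [
    {"item_name": "cat house", "level": 1},
    {"item_name": "cat food", "level": 2}
  ]}
]
"""


def collect_item_names(data, name, level):
    """Return item_name values for the given person and level."""
    return [
        item["item_name"]
        for person in data
        if person.get("name") == name
        # .get with a default tolerates records that lack "items"
        for item in person.get("items", [])
        if item.get("level") == level
    ]


data = json.loads(SAMPLE)
thomas_level_1 = collect_item_names(data, "Thomas", 1)
print(thomas_level_1)
```

Passing the name and level as arguments keeps the traversal in one place, so the same function can be called for "Samy" or for level 2.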

Dash Tabulator : "movableRowsConnectedTables" is not working

I’m trying to use the "movableRowsConnectedTables" built-in functionality as explained in the tabulator.js examples
It doesn’t seem to work as expected:
import dash
from dash import html
import dash_bootstrap_components as dbc
import dash_tabulator
columns = [
    {"title": "Name", "field": "name"},
]
options_from = {
    'movableRows': True,
    'movableRowsConnectedTables': "tabulator_to",
    'movableRowsReceiver': "add",
    'movableRowsSender': "delete",
    'height': 200,
    'placeholder': 'No more Rows',
}
options_to = {
    'movableRows': True,
    'height': 200,
    'placeholder': 'Drag Here',
}
data = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
layout = html.Div(
    [
        dbc.Row(
            [
                dbc.Col(
                    [
                        html.Header('DRAG FROM HERE'),
                        dash_tabulator.DashTabulator(
                            id='tabulator_from',
                            columns=columns,
                            options=options_from,
                            data=data,
                        ),
                    ],
                    width=6,
                ),
                dbc.Col(
                    [
                        html.Header('DROP HERE'),
                        dash_tabulator.DashTabulator(
                            id='tabulator_to',
                            columns=columns,
                            options=options_to,
                            data=[],
                        ),
                    ],
                    width=6,
                ),
            ]
        )
    ]
)
app = dash.Dash(external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = dbc.Container(layout, fluid=True)
if __name__ == '__main__':
    app.run_server(debug=True)
Is it also possible to get callbacks when elements were dropped?
It would be great to have this functionality inside dash!

I'm not familiar with dash_tabulator, but the table you're sending to may also need a 'movableRowsConnectedTables': "tabulator_from" option.

Finding differences in .JSON files using Powershell Compare-Object, problems with square bracket values

I am trying to compare two .json files using
# Compare two files
$BLF = (Get-Content -Path C:\Users\...\Documents\Android1.json)
$CLF = (Get-Content -Path C:\Users\...\Documents\Android2.json)
$aUnsorted = Compare-Object -ReferenceObject $BLF -DifferenceObject $CLF -IncludeEqual
I have written code to compare regular variables such as the below "passcodeBlockSimple", and register the value in one file vs the other. I would like to do the same with values that are defined inside brackets:
"variableName": [
"99"
]
I have a solution for when the two files have the same value, by adding onto the value until it finds the end-bracket ]. That gives "variableName": ["99"]. See the variable "roleScopeTagIds" below as an example of such a variable.
I would like to find the value when they are different, but the output from Compare-Object lists the differences line by line. In the example below I lose the information that the differing value belongs to the variable "restrictedApps" (File 1 has the value "1" and File 2 has the value "2").
InputObject SideIndicator
----------- -------------
"#odata.type": "#microsoft.graph.iosCompliancePolicy", ==
"roleScopeTagIds": [ ==
"0" ==
], ==
"version": 1, ==
"passcodeMinutesOfInactivityBeforeScreenTimeout": 1, ==
"managedEmailProfileRequired": false, ==
"restrictedApps": [ ==
], ==
"passcodeBlockSimple": false, =>
"2" =>
"passcodeBlockSimple": true, <=
"1" <=
The files are, File 1:
"#odata.type": "#microsoft.graph.iosCompliancePolicy",
"roleScopeTagIds": [
"0"
],
"version": 1,
"passcodeBlockSimple": true,
"passcodeMinutesOfInactivityBeforeScreenTimeout": 1,
"managedEmailProfileRequired": false,
"restrictedApps": [
"1"
],
And File 2:
"#odata.type": "#microsoft.graph.iosCompliancePolicy",
"roleScopeTagIds": [
"0"
],
"version": 1,
"passcodeBlockSimple": false,
"passcodeMinutesOfInactivityBeforeScreenTimeout": 1,
"managedEmailProfileRequired": false,
"restrictedApps": [
"2"
],
So my question is: how can I find the value of a variable when it differs between the files and the Compare-Object result looks like the one above?
(I have to use the Compare-Object function)

I don't fully understand the question, because your JSON files are invalid, but this code should resolve your issue:
Json1:
{
"#odata.type": "#microsoft.graph.iosCompliancePolicy",
"roleScopeTagIds": [
"0"
],
"version": 1,
"passcodeBlockSimple": true,
"passcodeMinutesOfInactivityBeforeScreenTimeout": 1,
"managedEmailProfileRequired": false,
"restrictedApps": [
"1"
]
}
Json2:
{
"#odata.type": "#microsoft.graph.iosCompliancePolicy",
"roleScopeTagIds": [
"0"
],
"version": 1,
"passcodeBlockSimple": false,
"passcodeMinutesOfInactivityBeforeScreenTimeout": 1,
"managedEmailProfileRequired": false,
"restrictedApps": [
"2"
]
}
[Array]$FileSoll = Get-Content C:\Users\Alex\Desktop\1.json
[Array]$FileIst = Get-Content C:\Users\Alex\Desktop\2.json
$FileSoll = $($FileSoll | ConvertFrom-Json | ConvertTo-Json) -split ([Environment]::NewLine)
$FileIst = $($FileIst | ConvertFrom-Json | ConvertTo-Json) -split ([Environment]::NewLine)
$diff = Compare-Object -ReferenceObject $FileSoll -DifferenceObject $FileIst
Returns the differences:
InputObject SideIndicator
----------- -------------
"passcodeBlockSimple": false, =>
"2" =>
"passcodeBlockSimple": true, <=
"1" <=
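The line-based diff still loses the property name for array values, because only the element line differs. One way around that is to compare the parsed objects key by key. The sketch below shows the idea in Python purely as an illustration, since the question itself requires Compare-Object. diff_top_level is a hypothetical helper, and the two documents are shortened copies of Json1 and Json2 above.

```python
import json

JSON1 = '{"version": 1, "passcodeBlockSimple": true, "restrictedApps": ["1"]}'
JSON2 = '{"version": 1, "passcodeBlockSimple": false, "restrictedApps": ["2"]}'


def diff_top_level(left, right):
    """Map each differing top-level key to its (left, right) values."""
    changed = {}
    for key in sorted(set(left) | set(right)):
        # .get returns None for a key that exists on one side only
        if left.get(key) != right.get(key):
            changed[key] = (left.get(key), right.get(key))
    return changed


changes = diff_top_level(json.loads(JSON1), json.loads(JSON2))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Because whole values are compared, an array such as restrictedApps is reported together with its key instead of as an anonymous element line.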

Access child token in JSON

In the JSON below I am trying to access the second array in the header. Basically "Node", "Percentage", "Time" and "File System" need to be captured, as I will have to insert them into SQL. My code returns the complete header array.
JObject jsonObject = JObject.Parse(jsonString);
List<string> childTokens = new List<string>();
foreach (var childToken in jsonObject.Children<JProperty>())
childTokens.Add(childToken.Name);
foreach (string childToken in childTokens)
{
if (jsonObject[childToken] is JObject)
{
JObject jObject = (JObject)jsonObject[childToken];
var jProperty = jObject.Children<JProperty>();
try
{
if (jProperty.LastOrDefault(x => x.Name == "header") != null)
{
foreach (var headerValue in jProperty.LastOrDefault(x => x.Name == "header").Value.Children())
table.Columns.Add("[" + headerValue.ToString() + "]");
table.Columns.Add("[ID]");
table.Columns.Add("[comments]");
}
JSON sample:
"DISK" : {
"alarm_count" : 5,
"column_width" : [
12,
14,
16,
14
],
"header" : [
[
"",
"Max Disk Usage",
3
],
[
"Node",
"Percentage",
"Time",
"File System"
]
] }
The header can contain any number of arrays, and I don't want to hardcode the count. I should always be able to pick the last array in the header child token. Please advise, thanks.

Assuming your json snippet was valid and is part of an object like:
{
"DISK": {
"alarm_count": 5,
"column_width": [
12,
14,
16,
14
],
"header": [
[
"",
"Max Disk Usage",
3
],
[
"Node",
"Percentage",
"Time",
"File System"
]
]
}
}
You could do this:
JObject obj = ...;
var secondHeader = obj["DISK"]["header"].Last();
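The same "take the last array, however many there are" idea can be sketched outside Json.NET. The Python below is only an illustration of the logic, assuming the object shape shown above. last_header is a hypothetical helper, and the document is a shortened copy of the sample.

```python
import json

DOC = """
{"DISK": {"alarm_count": 5,
          "header": [["", "Max Disk Usage", 3],
                     ["Node", "Percentage", "Time", "File System"]]}}
"""


def last_header(section):
    """Return the last array under "header", or [] if there is none."""
    headers = section.get("header") or []
    return headers[-1] if headers else []


parsed = json.loads(DOC)
# Apply the helper to every top-level section instead of hardcoding "DISK"
columns = {
    name: last_header(section)
    for name, section in parsed.items()
    if isinstance(section, dict)
}
print(columns)
```

Indexing from the end means the code does not need to know how many header rows precede the column names.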

Flatten a JSON document using jq

I'm considering the following array of JSON objects:
[
{
"index": "index1",
"type": "type1",
"id": "id1",
"fields": {
"deviceOs": [
"Android"
],
"deviceID": [
"deviceID1"
],
"type": [
"type"
],
"country": [
"DE"
]
}
},
{
"index": "index2",
"type": "type2",
"id": "id2",
"fields": {
"deviceOs": [
"Android"
],
"deviceID": [
"deviceID2"
],
"type": [
"type"
],
"country": [
"US"
]
}
}
]
and I would like to flatten it to get:
[
{
"index": "index1",
"type": "type",
"id": "id1",
"deviceOs": "Android",
"deviceID": "deviceID1",
"country": "DE"
},
{
"index": "index2",
"type": "type",
"id": "id2",
"deviceOs": "Android",
"deviceID": "deviceID2",
"country": "US"
}
]
I'm trying to work with jq but I fail to flatten the "fields". How should I do it? At the moment I'm interested in command-line tools, but I'm open to other suggestions as well.

This one was a tricky one to craft.
map
(
with_entries(select(.key != "fields"))
+
(.fields | with_entries(.value = .value[0]))
)
Let's break it down and explain the bits of it
For every item in the array...
map(...)
Create a new object containing the values for all except the fields property.
with_entries(select(.key != "fields"))
Combine that with...
+
Each of the fields projecting each of the values to the first item of each array
(.fields | with_entries(.value = .value[0]))
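For readers who prefer to see the same steps outside jq, here is a minimal Python sketch of the breakdown above: copy everything except fields, then merge in each field with its first array element. flatten_record is a hypothetical helper, and the record is the first object from the question.

```python
def flatten_record(record):
    """Merge "fields" into the record, keeping the first array element."""
    # Step 1: everything except the "fields" property
    flat = {k: v for k, v in record.items() if k != "fields"}
    # Step 2: each field projected to the first item of its array
    for key, values in record.get("fields", {}).items():
        # Like .value[0] in the jq filter: None when the list is empty
        flat[key] = values[0] if values else None
    return flat


records = [
    {
        "index": "index1",
        "type": "type1",
        "id": "id1",
        "fields": {
            "deviceOs": ["Android"],
            "deviceID": ["deviceID1"],
            "type": ["type"],
            "country": ["DE"],
        },
    }
]
flattened = [flatten_record(r) for r in records]
print(flattened)
```

As in the jq version, a key inside fields (here type) overrides the top-level key of the same name.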

There's a tool called gron and you can pipe that json in, to get something like this.
$ gron document.json
json = [];
json[0] = {};
json[0].fields = {};
json[0].fields.country = [];
json[0].fields.country[0] = "DE";
json[0].fields.deviceID = [];
json[0].fields.deviceID[0] = "deviceID1";
json[0].fields.deviceOs = [];
json[0].fields.deviceOs[0] = "Android";
json[0].fields.type = [];
json[0].fields.type[0] = "type";
json[0].id = "id1";
json[0].index = "index1";
json[0].type = "type1";
json[1] = {};
json[1].fields = {};
json[1].fields.country = [];
json[1].fields.country[0] = "US";
json[1].fields.deviceID = [];
json[1].fields.deviceID[0] = "deviceID2";
json[1].fields.deviceOs = [];
json[1].fields.deviceOs[0] = "Android";
json[1].fields.type = [];
json[1].fields.type[0] = "type";
json[1].id = "id2";
json[1].index = "index2";
json[1].type = "type2";

You can use this filter:
[.[] | {index: .index, type: .type, id: .id, deviceOs: .fields.deviceOs[],deviceID: .fields.deviceID[],country: .fields.country[]}]
You can test here https://jqplay.org

Here are some variations that start by merging .fields into the containing object with + and then flattening the array elements. First we take care of .fields with
.[]
| . + .fields
| del(.fields)
that leaves us with objects which look like
{
"index": "index1",
"type": [
"type"
],
"id": "id1",
"deviceOs": [
"Android"
],
"deviceID": [
"deviceID1"
],
"country": [
"DE"
]
}
then we can flatten the keys multiple ways.
One way is to use with_entries
| with_entries( .value = if .value|type == "array" then .value[0] else .value end )
another way is to use reduce and setpath
| . as $v
| reduce keys[] as $k (
{};
setpath([$k]; if $v[$k]|type != "array" then $v[$k] else $v[$k][0] end)
)

Similar to @jq170727's answer:
jq 'map(. + (.fields | with_entries(.value |= .[])) | del(.fields))'
(assuming no field inside .fields is itself called .fields).
The |with_entries(.value|=.[]) part is to flatten the value arrays in .fields -- beware only the first item is preserved. .value|=join(", ") could be used to join multiple string values into one.
