I'm trying to import a JSON file with the following format into Excel:
[
[1,2,3,4],
[5,6,7,8]
]
I want to get a spreadsheet with 2 rows and 4 columns, where each row contains the contents of the inner array as separate column values, e.g.
Column A    Column B    Column C    Column D
1           2           3           4
5           6           7           8
Although this would seem to be an easy problem to solve, I can't seem to find the right PowerQuery syntax, or locate an existing answer that covers this scenario. I can easily import as a single column with 8 values, but can't seem to split the inner array into separate columns.
Assuming the JSON looks like
[
[1,2,3,4],
[5,6,7,8]
]
then this code in Power Query
let
    Source = Json.Document(File.Contents("C:\temp\j.json")),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Added Custom" = Table.AddColumn(#"Converted to Table", "Custom", each Text.Combine(List.Transform([Column1], each Text.From(_)), ",")),
    ColumnTitles = List.Transform({1 .. List.Max(Table.AddColumn(#"Added Custom", "count", each List.Count(Text.PositionOfAny([Custom], {","}, Occurrence.All)) + 1)[count])}, each "Column." & Text.From(_)),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Added Custom", "Custom", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), ColumnTitles),
    #"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter", {"Column1"})
in
    #"Removed Columns"
generates the desired two-row, four-column table.
It converts the JSON to a list of lists, converts that to a table of lists, combines each inner list into comma-separated text, dynamically creates column names by counting the maximum number of commas, and then splits the text into separate columns.
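For intuition, the same transformation can be sketched in plain Python (outside Power Query); the inline JSON string here is just the example from the question, and the column-naming scheme mirrors the ColumnTitles step:

```python
import json

# The outer JSON array becomes rows; each inner array becomes column values.
raw = "[[1,2,3,4],[5,6,7,8]]"
rows = json.loads(raw)

# Derive column names dynamically from the widest row, mirroring the
# ColumnTitles step in the M code.
width = max(len(r) for r in rows)
columns = ["Column." + str(i) for i in range(1, width + 1)]

print(columns)  # ['Column.1', 'Column.2', 'Column.3', 'Column.4']
print(rows)     # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that in Python the comma round-trip from the M code is unnecessary, since each inner JSON array is already a list of separate values.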
I am geocoding using OpenRefine. I pulled data from OpenStreetMap into my dataset.
I am adding a "column based on this column" for the coordinates. I want to check that the display_name contains "Rheinland-Pfalz" and, if it does, extract the latitude and longitude, i.e. pair.lat + ',' + pair.lon. I want to do this iteratively but I don't know how. I have tried the following:
if(display_name[0].contains("Rheinland-Pfalz"), with(value.parseJson()[0], pair, pair.lat + ',' + pair.lon),"nothing")
but I want to do this for each index [0] up to however many there are. I would appreciate if anyone could help.
Edit: Thanks for your answer b2m.
How would I extract the display_name corresponding to the coordinates that we get? I want the output to be display_name lat,lon for each match (i.e. each entry containing "Rheinland-Pfalz"), because I have a different column containing a piece of string that I want to match against the matches generated already.
For example, using b2m's code and incorporating the display_name in the output we get 2 matches:
Schaumburg, Balduinstein, Diez, Rhein-Lahn-Kreis, Rheinland-Pfalz, Deutschland 50.33948155,7.9784308849342604
Schaumburg, Horhausen, Flammersfeld, Landkreis Altenkirchen, Rheinland-Pfalz, Deutschland 52.622319,14.5865283
For each row, I have another string in a different column; here the entry is "Rhein-Lahn-Kreis". I want to filter the two matches above to keep only those containing the string from that other column (the entry differs for each row). I hope this is clear, and I would greatly appreciate any help.
Assuming we have the following json data
[
{"display_name": "BW", "lat": 0, "lon": 1},
{"display_name": "NRW 1", "lat": 2, "lon": 3},
{"display_name": "NRW 2", "lat": 4, "lon": 5}
]
You can extract the combined lat and lon elements with forEach and filter, using the following GREL expression, e.g. in the "Add column based on this column" dialog.
forEach(
filter(
value.parseJson(), geodata, geodata.display_name.contains("NRW")
), el, el.lat + "," + el.lon)
.join(";")
This will result in a new field with the value 2,3;4,5.
You can then split the new multi valued field on the semicolon ";" to obtain separated values (2,3 and 4,5).
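For readers less familiar with GREL, the filter-then-forEach-then-join logic above can be sketched in plain Python (using the same sample data from this answer):

```python
import json

# Filter entries whose display_name contains "NRW", join each match's
# lat/lon with a comma, then join the matches with semicolons.
data = json.loads(
    '[{"display_name": "BW", "lat": 0, "lon": 1},'
    ' {"display_name": "NRW 1", "lat": 2, "lon": 3},'
    ' {"display_name": "NRW 2", "lat": 4, "lon": 5}]'
)

matches = [d for d in data if "NRW" in d["display_name"]]
result = ";".join(str(d["lat"]) + "," + str(d["lon"]) for d in matches)
print(result)  # 2,3;4,5
```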
Another approach would be to split the JSON Array elements into separate rows, avoiding the forEach and filter functions.
I get the following error when trying to split a column by a space delimiter in Power Query in Data Factory:
UserQuery : Expression.Error: An error occurred invoking 'Table.AddColumn': We can't get the expression for the specified value.
What is causing this and how would I go about resolving it?
Many thanks
The PowerQuery itself is :
let
    Source = dedupedetipscsv,
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "Candidate", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"Candidate.1", "Candidate.2"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(Table.TransformColumnTypes(#"Split Column by Delimiter", {{"ApprovedDate", type text}}, "en-GB"), "ApprovedDate", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, true), {"ApprovedDate.1", "ApprovedDate.2"})
in
    #"Split Column by Delimiter1"
Note: Power Query will split the column into as many columns as needed. The name of the new columns will contain the same name as the original column. A suffix that includes a dot and a number that represents the split sections of the original column will be appended to the name of the new columns.
The Table.AddColumn step might be referring to a variable that is a list. You need to refer to #"Renamed Columns", which is the last step that results in a table.
See the documentation: Split columns by delimiter into columns.
An alternative to splitting by length and by position is the following M script workaround:
Table.AddColumn(Source, "First characters", each Text.Start([Email], 7), type text)
Table.AddColumn(#"Inserted first characters", "Text range", each Text.Middle([Email], 4, 9), type text)
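For reference, Text.Start and Text.Middle are plain substring operations. A rough Python equivalent of the two expressions above, using a hypothetical sample email value:

```python
# Text.Start(x, 7) is x[:7]; Text.Middle(x, 4, 9) is the 9 characters
# starting at offset 4, i.e. x[4:4 + 9].
email = "john.doe@example.com"  # hypothetical sample value

first_characters = email[:7]    # Text.Start([Email], 7)
text_range = email[4:4 + 9]     # Text.Middle([Email], 4, 9)

print(first_characters)  # john.do
print(text_range)        # .doe@exam
```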
I have a table containing other tables in its values. These other tables can be formatted either as CSV or JSON.
Can you please tell me how I can import this data into individual tables in PowerBI? I've tried with the PowerQuery GUI but so far unsuccessfully, perhaps there will be a need to use the code in the advanced editor.
I can't just parse this data outside PowerBI because of company guidelines prohibiting the use of scripts, so everything must be done within PowerBI - though the PowerQuery code is allowed.
csv:
"id,normdist,poissondist,binomial\r\n1,0.00013383,0.033689735,0.009765625\r\n2,0.004431848,0.084224337,0.043945313\r\n3,0.053990967,0.140373896,0.1171875\r\n4,0.241970725,0.17546737,0.205078125\r\n5,0.39894228,0.17546737,0.24609375\r\n6,0.241970725,0.146222808,0.205078125\r\n7,0.053990967,0.104444863,0.1171875\r\n8,0.004431848,0.065278039,0.043945313\r\n9,0.00013383,0.036265577,0.009765625\r\n10,1.49E-06,0.018132789,0.000976563\r\n"
json (by row)
[{"id":1,"normdist":0.0001,"poissondist":0.0337,"binomial":0.0098},{"id":2,"normdist":0.0044,"poissondist":0.0842,"binomial":0.0439},{"id":3,"normdist":0.054,"poissondist":0.1404,"binomial":0.1172},{"id":4,"normdist":0.242,"poissondist":0.1755,"binomial":0.2051},{"id":5,"normdist":0.3989,"poissondist":0.1755,"binomial":0.2461},{"id":6,"normdist":0.242,"poissondist":0.1462,"binomial":0.2051},{"id":7,"normdist":0.054,"poissondist":0.1044,"binomial":0.1172},{"id":8,"normdist":0.0044,"poissondist":0.0653,"binomial":0.0439},{"id":9,"normdist":0.0001,"poissondist":0.0363,"binomial":0.0098},{"id":10,"normdist":1.49e-06,"poissondist":0.0181,"binomial":0.001}]
Let's say that the data is in the CSV version but just a string in a database so that it looks like this in the query editor:
In order to expand this into a table, we need to split it into rows and columns. The Home tab has a Split Column tool; we'll use it with the By Delimiter option from the dropdown:
That is, we use "\r\n" to split the cell into multiple rows.
Now our column looks like this:
Remove any blank rows and use the Split Column tool again. This time, you can leave the defaults, since it will automatically guess that you want to split by comma and expand into columns.
If you promote the headers and clean up the column types, the final result should be
Full M Query for this example that you can paste into the Advanced Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZFJbsMwDEXvkrUacBKHA/QUcRYpshHQ2EXc+6OS4mzkC3Th3Qf5+ThdLqdyT/PyfNzL+pt+lrKuy9z1V5mXR7l9T9NzmmZMcAYAZHZuklk9jHMPh2lWyi8n9ZAIo4s37UIkzNa0cEhm5Je1kzJHQGhLowAbe2jTaOi2MaUGSDAMjFpLtCxqHUmQwRzf3VuWw6P29MEoCsFvoo5EUaol4HukjVPW5URceZzSx801kzlw7DeP8ZxKmrPZ/pwICc8Snx/QRgZ05Ard6rtzQ56u6Xjm8czjmf/wmdc/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Split by \r\n into Rows" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Column1", Splitter.SplitTextByDelimiter("\r\n", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Filtered Blank Rows" = Table.SelectRows(#"Split by \r\n into Rows", each [Column1] <> null and [Column1] <> ""),
#"Split into Columns" = Table.SplitColumn(#"Filtered Blank Rows", "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Split into Columns", [PromoteAllScalars=true]),
#"Filtered Repeated Headers" = Table.SelectRows(#"Promoted Headers", each ([id] <> "id")),
#"Changed Type" = Table.TransformColumnTypes(#"Filtered Repeated Headers",{{"id", Int64.Type}, {"normdist", type number}, {"poissondist", type number}, {"binomial", type number}})
in
#"Changed Type"
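As a sanity check of the same parse outside Power BI, the embedded CSV string can be split on \r\n into rows and then on commas into columns in a few lines of Python (the string below is a shortened version of the example above):

```python
import csv
import io

# Shortened version of the embedded CSV string from the question.
raw = ("id,normdist,poissondist,binomial\r\n"
       "1,0.00013383,0.033689735,0.009765625\r\n"
       "2,0.004431848,0.084224337,0.043945313\r\n")

# The csv module handles both the \r\n row splitting and the comma
# column splitting (including any quoted fields).
rows = [r for r in csv.reader(io.StringIO(raw)) if r]  # drop blank rows
header, data = rows[0], rows[1:]

print(header)     # ['id', 'normdist', 'poissondist', 'binomial']
print(len(data))  # 2
```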
Below is Python code to convert a CSV file to JSON format.
I have two fields, 'recommendation' and 'rating'. Based on the recommendation value I need to set the value of the rating field: if recommendation is 1 then rating = 1, and vice versa. With the answer I got, I'm getting output for only one record instead of all the records; I think it's being overridden. Do I need to create a separate list and append each record entry to it to get the output for all records?
here's the updated code:
def main(input_file):
    csv_rows = []
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile, delimiter='|')
        title = reader.fieldnames
        for row in reader:
            entry = OrderedDict()
            for field in title:
                entry[field] = row[field]
            [c.update({'RATING': c['RECOMMENDATIONS']}) for c in reader]
            csv_rows.append(entry)
    with open(json_file, 'w') as f:
        json.dump(csv_rows, f, sort_keys=True, indent=4, ensure_ascii=False)
        f.write('\n')
I want to create the nested format like the below:
"rating": {
    "user_rating": {
        "rating": 1
    },
    "recommended": {
        "rating": 1
    }
}
After you've read the file in using csv.DictReader, you'll have a list of dicts. Since you want to set the values now, it's a simple dict manipulation. There are several ways, of which one is:
[c.update({'rating': c['recommendation']}) for c in read_csvDictReader]
Hope that helps.
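A minimal, self-contained version of that fix sets rating from recommendation while reading each row, inside the loop rather than by iterating the reader a second time. The field names and sample data here are assumptions based on the question:

```python
import csv
import io
import json
from collections import OrderedDict

# Sample pipe-delimited input; field names are assumptions from the question.
sample = "recommendation|name\n1|alpha\n0|beta\n"

csv_rows = []
reader = csv.DictReader(io.StringIO(sample), delimiter="|")
for row in reader:
    entry = OrderedDict(row)
    entry["rating"] = row["recommendation"]  # set rating per row, in the loop
    csv_rows.append(entry)

print(json.dumps(csv_rows, indent=4))
```

Iterating the reader inside the loop (as in the question's code) exhausts it after the first record, which is why only one record came through.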
The API output from Smartsheet returns rows and columns as separate objects that are independent of each other.
This results in one set of records for the columns (a list of field names) and another set of records for the rows (records with a single field of values from various fields).
Is there a way to return a single list of JSON (with rows and columns resulting in a single list of records)?
This is the code I'm using in the Query Editor that returns separate Rows and Columns
= Web.Contents(
"https://api.smartsheet.com/1.1/sheet/[SHEET_ID]",
[
Headers =
[
#"Authorization" = "Bearer YOUR_API_TOKEN"
]
]
)
I used the sample data on their site to come up with this set of transformations:
let
    Source = Json.Document(File.Contents("D:\testdata\foo.json")),
    ColumnIds = List.Transform(Source[columns], each Text.From([id])),
    ColumnNames = List.Transform(Source[columns], each [title]),
    Table = Table.FromList(Source[rows], Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    Expanded = Table.ExpandRecordColumn(Table, "Column1", {"rowNumber", "cells"}, {"rowNumber", "cells"}),
    Mapped = Table.TransformColumns(Expanded, {"cells",
        each Record.Combine(List.Transform(_, each Record.AddField([], Text.From([columnId]), [value])))}),
    Result = Table.ExpandRecordColumn(Mapped, "cells", ColumnIds, ColumnNames)
in
    Result
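The core of the Mapped and Result steps — turning each row's cells list into a record keyed by columnId, then renaming keys to the column titles — can be sketched in plain Python. The structure mirrors Smartsheet's API output (columns carry id/title, rows carry cells keyed by columnId), but the concrete ids and values below are made up:

```python
# Made-up sample data in the shape of Smartsheet's API output.
columns = [{"id": 101, "title": "Name"}, {"id": 102, "title": "City"}]
rows = [
    {"rowNumber": 1, "cells": [{"columnId": 101, "value": "Alice"},
                               {"columnId": 102, "value": "Berlin"}]},
    {"rowNumber": 2, "cells": [{"columnId": 102, "value": "Mainz"},
                               {"columnId": 101, "value": "Bob"}]},
]

# Map column ids to titles, then build one flat record per row.
id_to_title = {c["id"]: c["title"] for c in columns}
records = [
    {id_to_title[cell["columnId"]]: cell["value"] for cell in row["cells"]}
    for row in rows
]
print(records)
```

This yields one plain record per row regardless of the order in which cells appear, which is the same "single list of records" the M query produces.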