I am looking for a pure Excel solution to unpack nested JSON data into rows.
I have an Excel file which looks like this:
| id | Attributes |
|----|------------|
| 1  | {"salt":0.8,"sodium":0.3,"origin":"CH"} |
| 2  | {"region":[{"value":"Schweiz","language":"de"}]} |
And I want to extract it into the following format:
| id | salt | sodium | origin | region |
|----|------|--------|--------|--------|
| 1  | 0.8  | 0.3    | CH     |            |
| 2  |      |        |        | Schweiz,de |
While doing something like this is easy in Python, how can it be done in a pure Excel way?
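For reference, here is a minimal sketch of the Python route alluded to above, using pandas; the file name and the comma-joining rule for list values are assumptions for illustration.

```python
import json
import pandas as pd

# Read the two-column sheet (file name is an assumption).
df = pd.read_excel("attributes.xlsx")

# Parse each Attributes cell into a dict, then flatten into one column per key.
parsed = df["Attributes"].apply(json.loads)
flat = pd.json_normalize(parsed.tolist())

# Collapse list-of-dict values such as "region" into "Schweiz,de"-style strings.
for col in flat.columns:
    flat[col] = flat[col].apply(
        lambda v: ",".join(str(x) for d in v for x in d.values())
        if isinstance(v, list) else v
    )

result = pd.concat([df[["id"]], flat], axis=1)
print(result)
```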
I want to check whether converting an EDI 852 formatted file into a CSV file through ADF or Logic Apps is possible.
EDI 852 documents are NOT STANDARDIZED.
Although the various available segments and outer control structures are (ISA/IEA, GS, GE), the internals are NOT.
One company's 852 can be wildly different from another's.
You'll need the company's EDI 852 specification before it can be parsed properly, regardless of what tools you have available.
Assuming you have an Integration Account linked to the Logic App, you can use the built-in BizTalk 852-to-XML parsing and then XSLT to transform to CSV (or to a flat XML structure, which you can convert to JSON in the Logic App using json(xml(body('Convert_852'))) in a Compose action and then turn into CSV with a Create CSV Table action).
If you don't have an Integration Account it'll be quite a bit harder, but it should still be possible in a Logic App: first split the 852 into an array of strings on the segment terminator (the character at position 105 of the ISA segment, possibly with a CR and/or LF appended), then split each of those strings into elements on the element separator at position 3 of the ISA segment.
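That splitting logic, shown as a minimal Python sketch rather than Logic Apps actions (positions are zero-based, per the fixed-width ISA header):

```python
def split_852(raw: str) -> list[list[str]]:
    """Split a raw X12 852 interchange into segments, then into elements."""
    elem_sep = raw[3]     # element separator: position 3 of the ISA segment
    seg_term = raw[105]   # segment terminator: position 105 of the ISA segment
    segments = [s.strip("\r\n") for s in raw.split(seg_term) if s.strip("\r\n")]
    return [seg.split(elem_sep) for seg in segments]
```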
The header data will be in the first two or three elements of the XQ segment (3 if the 852 represents a date range, 2 if there's only a handling code and reporting date).
Line-level data is arranged like this:
LIN : product data
ZA : product activity type (e.g. qty sold, qty on order, qty lost, etc.)
SDQ : location/quantity reporting (optional)
So you'd need to consolidate SDQ data into an array inside each ZA object, consolidate the ZA objects into an array inside their parent LIN object, and then roll all of the LIN objects up into an array inside the top-level 852 object, probably using nested For Each loops containing Switch and Append To Array Variable actions.
You can then use Select actions inside more For Each loops to flatten the data to CSV, or just push the resulting JSON to ADF or Cosmos DB.
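Again purely to illustrate the shape of that consolidation (it reuses the split_852 sketch above; real 852 layouts vary, so treat the structure as an assumption):

```python
def roll_up(segments: list[list[str]]) -> dict:
    """Group SDQ under ZA and ZA under LIN, mirroring the nested loops."""
    lines = []
    for seg in segments:
        tag = seg[0]
        if tag == "LIN":
            lines.append({"LIN": seg[1:], "ZA": []})
        elif tag == "ZA" and lines:
            lines[-1]["ZA"].append({"ZA": seg[1:], "SDQ": []})
        elif tag == "SDQ" and lines and lines[-1]["ZA"]:
            lines[-1]["ZA"][-1]["SDQ"].append(seg[1:])
    return {"lines": lines}
```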
I have a JSON file with two columns, eventid and sectionname, where sectionname is dynamic in nature, as shown in the input diagram.
I need output like the one shown. Which transformation can I perform? Since sectionname is dynamic (instead of 301, a 501 may also arrive in the future) and I don't want my stream to fail, is there any way to do this in PySpark or Scala?
df_target = (df_source.set_index(static_columns)    # list of the static column names
             .rename_axis("sectionname", axis=1)    # name for the new key column
             .stack()                               # pivot the dynamic columns into rows
             .reset_index())
Where df_source is your DataFrame in pandas and static_columns is a list of the columns that should stay fixed.
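A minimal runnable example of that pattern (the column names and sample values are assumptions based on the question):

```python
import pandas as pd

df_source = pd.DataFrame({"eventid": [1, 2], "301": [10, 20], "501": [30, 40]})

df_target = (df_source.set_index(["eventid"])       # static columns
             .rename_axis("sectionname", axis=1)    # name for the former column axis
             .stack()                               # one row per (eventid, sectionname)
             .rename("value")
             .reset_index())
print(df_target)  # columns: eventid, sectionname, value
```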
I am struggling with a CSV file scraped from a crowdfunding website.
My goal is to load all of the information as separate columns, but I found that some information is mixed into a single column when I load the file using 1) R, 2) Stata, and 3) Python.
Since the real data is really dirty, here is an abbreviated version of the dataset:
| ID     | Pledge | creator |
|--------|--------|---------|
| 000001 | 13.7   | {"urls":{"web":{"user":"www.kickstarter.com/profile/731"}}, "name":John","id":709510333} |
| 000002 | 26.4   | {"urls":{"web":{"user":"www.kickstarter.com/profile/759"}}, "name":Kellen","id":703514812} |
| 000003 | 7.6    | {"urls":{"web":{"user":"www.kickstarter.com/profile/7522"}}, "name":Jach","id":609542647} |
My goal is to extract "name" and "id" as separate columns, but they are mixed in with the URLs in the creator column.
Is there any way I can extract the names (John, Kellen, Jach) and the ids as separate columns?
I prefer R, but Stata and Python would also be helpful!
Thank you so much for considering this.
If you want to extract the name and id without any other values, you can simply replace the code that sets the creator column with the line below (substitute creator with whatever variable holds the dictionary):
{"name": creator["name"], "id": creator["id"]}
Also, if the JSON data is not formatted correctly (like a missing quote), you can try using regular expressions.
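A quick Python sketch of that regex route (the patterns are assumptions tuned to the sample rows above, tolerating the missing opening quote; this is not a general JSON parser):

```python
import pandas as pd

df = pd.DataFrame({"creator": [
    '{"urls":{"web":{"user":"www.kickstarter.com/profile/731"}}, "name":John","id":709510333}',
    '{"urls":{"web":{"user":"www.kickstarter.com/profile/759"}}, "name":Kellen","id":703514812}',
]})

# "name" may be missing its opening quote, so the quote is optional.
df["name"] = df["creator"].str.extract(r'"name":"?([^",]+)"?', expand=False)
df["id"] = pd.to_numeric(df["creator"].str.extract(r'"id":(\d+)', expand=False))
print(df[["name", "id"]])
```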
I have multiple CSV files, all with the same format. There are columns with an object and columns with their values.
For example:
| Animals | Age |
|---------|-----|
| dog     | 2   |
| cat     | 4   |
| dog     | 6   |
...etc.
And I want to calculate the mean age of all the dogs across all the CSV files.
Which language will be easier to use for this calculation? Any help regarding the implementation?
I did it with Java and the opencsv library.
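For comparison, the same calculation is only a few lines in Python with pandas (the file pattern and column names are assumptions matching the example above):

```python
import glob
import pandas as pd

# Read every CSV in the folder into one DataFrame (path pattern is an assumption).
df = pd.concat(
    (pd.read_csv(path) for path in glob.glob("data/*.csv")),
    ignore_index=True,
)

# Mean age of all dogs across all files.
print(df.loc[df["Animals"] == "dog", "Age"].mean())
```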
I am trying to load a flat file that mixes multiple data sets. The flat file looks like this:
1999XX9999
2XXX99
1999XX9999
2XXX99
3XXXXX999.99
1999XX9999
The first character of every row defines the record type of the line. I want to create a Script Component in the data flow, parse the raw rows (as below), and save the three outputs (1, 2, 3) to three different tables. Is this possible?
Table1(col1, col2, col3):
999, XX, 9999
999, XX, 9999
999, XX, 9999
Table2(col1, col2):
XXX, 99
XXX, 99
Table3(col1, col2):
XXXXX, 999.99
Is there any other way in SSIS if a Script Component cannot do it? Or is the best solution to write a program that splits the file into three files and then load those with SSIS?
It is possible, and you probably should use a script transformation to create a maintainable solution.
You won't be able to completely parse your input file into columns using a flat file source and connection manager. Read each line in full as a single column and use string functions in the script transformation to parse it into the desired columns.
Now to distribute records to different destinations, you can either:
- Define multiple outputs on your transformation and use a condition on the first character of each line to determine the output to which you send the columns.
- Only use the script transformation to parse the line into columns, and use a Conditional Split transformation to logically divide your records over multiple data paths.
Both methods are logically similar; only the implementation differs.
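The Script Component itself is written in C# or VB, but purely to illustrate the row-routing logic of the multiple-outputs approach, here is a Python sketch (the field widths are assumptions read off the sample rows, and the file name is hypothetical):

```python
# Column layout per record type, as (start, end) slices after the type character.
LAYOUTS = {
    "1": [(1, 4), (4, 6), (6, 10)],   # 999, XX, 9999  -> Table1
    "2": [(1, 4), (4, 6)],            # XXX, 99        -> Table2
    "3": [(1, 6), (6, 12)],           # XXXXX, 999.99  -> Table3
}

tables = {"1": [], "2": [], "3": []}
with open("mixed.txt") as f:          # file name is an assumption
    for line in f:
        line = line.rstrip("\r\n")
        rec_type = line[0]            # first character decides the output
        tables[rec_type].append([line[a:b] for a, b in LAYOUTS[rec_type]])
```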