I have imported JSON data from a Hive database. The structure looks like the attached. The JSON data was dumped to Hive without normalizing. Is it possible to parse the data? For example, in the attached image, the mentionedlocations column lists several places, and I want each of them in a separate row.
You can use the Json.Document function to read the column as JSON.
I'd suggest creating a custom column with this formula:
Record.ToTable(Json.Document([mentionedlocations]))
and then expanding that column to get the multiple rows you want.
Putting these together:
= Table.ExpandTableColumn(
    Table.AddColumn(PreviousStep, "Custom",
        each Record.ToTable(Json.Document([mentionedlocations]))),
    "Custom", {"Name"}, {"locations"})
This takes the PreviousStep in the query and adds a Custom column that converts the JSON text into a table. It then expands the Name column of each nested table in the Custom column and renames it to locations.
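To make the parse-and-expand logic concrete outside Power Query, here is a minimal Python sketch of the same idea. It assumes, as the Record.ToTable call above does, that each mentionedlocations cell holds a JSON object whose keys are the location names. The sample rows and the expand_locations helper are invented for illustration.

```python
import json

def expand_locations(rows, column="mentionedlocations"):
    """Return one output row per key of the JSON object stored in `column`."""
    expanded = []
    for row in rows:
        parsed = json.loads(row[column])  # JSON text -> dict (the "record")
        rest = {k: v for k, v in row.items() if k != column}
        for name in parsed:  # one output row per field name
            expanded.append({**rest, "locations": name})
    return expanded

# Hypothetical sample data; the real column contents are not shown in the question.
rows = [
    {"id": 1, "mentionedlocations": '{"Paris": 2, "Berlin": 1}'},
    {"id": 2, "mentionedlocations": '{"Tokyo": 5}'},
]
result = expand_locations(rows)
```

If the column actually holds a JSON array rather than an object, Json.Document returns a list instead of a record, and the formula above would need adjusting accordingly.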
An easier way:
Right-click on the column > Transform > JSON
Then click the expand button at the top right of the same column; Power BI will use the JSON element names as the column names.
I am taking data from a web source and have expanded all my columns. When everything is split and expanded, my columns look like:
result.col1 | result.col2 | result.col3 | result.col4
val1        | val2        | Record      | val4
Here we can see that result.col3 contains another JSON record.
What I am trying to accomplish is this: how can I create a custom column that reads each Record and outputs a value based on one of its items?
For example the JSON record has:
link: https://google.com
value: 444333
I want the custom column's row to display "A" if the value inside the record is "444333".
You could duplicate the result.col3 column, expand the copy, build your custom column from the expanded fields, and then remove the expanded columns.
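The rule the custom column has to implement can be sketched in Python as follows. This only illustrates the logic and is not Power Query code; the label_for helper and the second sample row are hypothetical, and the comparison assumes the value may arrive either as text or as a number.

```python
def label_for(record, target="444333", label="A"):
    """Return `label` when the nested record's value matches `target`, else None."""
    if not isinstance(record, dict):
        return None  # the cell does not hold a record
    # Compare as text so both "444333" and 444333 match.
    return label if str(record.get("value")) == target else None

# Hypothetical rows shaped like the expanded web result in the question.
rows = [
    {"result.col1": "val1", "result.col3": {"link": "https://google.com", "value": "444333"}},
    {"result.col1": "val5", "result.col3": {"link": "https://google.com", "value": "111"}},
]
for row in rows:
    row["custom"] = label_for(row["result.col3"])
```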
We've got a use case to display a grid in a Power BI dashboard that contains a JSON object in one of its columns. The JSON data depends on the value of another column.
Based on the category selection, the JSON content should be expanded to a table (always one row, multiple columns).
Please find the table below.
Select the JsonData column, go to the Transform tab and click Parse > JSON.
As a result, your text becomes records. At the right of the JsonData column header there is now a double arrow; click on it. You do not need the original column name as a prefix, so clear that checkbox. Click OK.
You are now one level deeper (your column has changed to DynamicProperties). Click the double arrow again.
The end result is a table with all the properties from your JSON.
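The two expand clicks amount to flattening one level of nesting. A minimal Python sketch of that flattening is below, assuming the JSON text is an object with a nested DynamicProperties object. The sample JSON and the flatten_once helper are made up for illustration, since the original table is not shown.

```python
import json

def flatten_once(json_text, nested_key="DynamicProperties"):
    """Parse a JSON object and lift the fields of one nested object to the top level."""
    record = json.loads(json_text)
    nested = record.pop(nested_key, None) or {}
    return {**record, **nested}  # one row, one column per property

# Hypothetical sample value for the JsonData column.
row = flatten_once('{"Category": "Car", "DynamicProperties": {"Doors": 4, "Fuel": "Diesel"}}')
```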
I have a couple of questions about the task on which I am stuck and any answer would be greatly appreciated.
I have to extract data from a flat file (CSV) as an input and load the data into the destination table with a specific format based on position.
For example, if I have order_id,Total_sales,Date_Ordered with some data in it, I have to extract the data and load it in a table like so:
The first field has a fixed length of 2 with numeric as a datatype.
total_sales is inserted into the column of total_sales in the table with a numeric datatype and length 10.
Date_Ordered is stored as a datetime in a format different from that of the flat file, like ccyy-mm-dd.hh.mm.ss.xxxxxxxx (here the x positions have to be filled with zeros).
Maybe I don't have the right idea to solve this - any solution would be appreciated.
I have tried using the following ways:
I used a flat file source to read the CSV file and fed it into an OLE DB destination whose table has fixed data types. The columns are loaded, but I have to pad them with zeros: in the date, and in any column where the value does not use the full length, the value has to be preceded by zeros.
For example, if Orderid has a length of 4 and the flat file contains an order id like 201, it has to be changed to 0201 when it is loaded into the table.
I also tried another way: a flat file source plus a variable that takes the entire row as input, which I then tried to split with derived columns. I was partly successful, but in the end the data type of the derived column was fixed to Boolean, and I am not able to change it to the data type I want.
Please give me some suggestions on how to handle this issue...
Assuming you have a CSV file in the following format:
order_id,Total_sales,Date_Ordered
1,123.23,01/01/2010
2,242.20,02/01/2010
3,34.23,3/01/2010
4,9032.23,19/01/2010
I would start by creating a Flat File Source (inside a Data Flow Task), but rather than making it fixed width, set the format to Delimited. Tick "Column names in the first data row". On the Columns tab, make sure the row delimiter is set to "{CR}{LF}" and the column delimiter is set to "Comma(,)". Finally, on the Advanced tab, set the data types of the columns to integer, decimal and date respectively.
You mention that you want to pad the numeric data types with leading zeros when storing them in the database. Numeric data types in databases do not hold leading zeros. So you have two options: either hold the data in the types they have in the target system (int, decimal and dateTime), or use the Derived Column control to convert them to strings. If you decide to store them as strings, adding an expression like
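RIGHT("00000" + (DT_WSTR, 5) [order_id], 5)
to the Derived Column control left-pads the order id with zeros to a width of 5 (the resulting string column has length 5), so an order id of 1 becomes "00001". Without the RIGHT call, the expression would always prepend five zeros and produce "000001".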
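As a rough illustration of the target formats, the padding and the date layout from the question can be sketched in Python. This is not SSIS code. The pad_number and format_timestamp helpers are hypothetical, the dd/mm/yyyy source format is inferred from the sample file, and the eight trailing zeros follow the ccyy-mm-dd.hh.mm.ss.xxxxxxxx pattern given in the question.

```python
from datetime import datetime

def pad_number(value, width):
    """Left-pad the text form of `value` with zeros to exactly `width` characters."""
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{text!r} does not fit in {width} characters")
    return text.zfill(width)

def format_timestamp(text, source_format="%d/%m/%Y"):
    """Convert a date string to ccyy-mm-dd.hh.mm.ss.xxxxxxxx with the x's as zeros."""
    parsed = datetime.strptime(text, source_format)
    return parsed.strftime("%Y-%m-%d.%H.%M.%S.") + "0" * 8

order_id = pad_number(201, 4)
stamp = format_timestamp("19/01/2010")
```

Raising an error when a value is longer than the target width is a design choice made here; silently truncating would hide bad input.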
Create your target within a Data Flow Destination and make the table/field mappings accordingly (or let SSIS create a new table / mappings for you).
I have a flat file that I need to parse in SSIS. Part of this parsing is to chop off a load of extra text at the bottom of the file. To help do this, I added a row number to each row using a Script Transformation.
In the Script Transformation (ST) under Inputs and Outputs I have an Input Column defined called Column256_in (it has a length of 256) and its ID is 59.
For Output columns I have defined Column256_out, it has an ID of 68 and a MappedColumnID of 59, there is another Output Col called rowCount.
The ST contains script code that calculates the row number for each row.
When I run the SSIS package with a Data Grid after the Script Transformation, I get the following:
Column256_in contains the data from the original text file.
rowCount is populated correctly. ( I did something right today!)
Column256_out is empty --> I thought that the MappedColumnID of 59 would populate this column with the data from Column256_in.
What does the MappedColumnID attribute do on the output column?
Thanks for your assistance.
KD
MappedColumnID is just an alternative way of identifying the columns instead of using their names.
From MSDN:
The use of these properties is not required. These properties provide an easier way for developers to associate related columns, such as input and output columns, in custom data flow components.
I am generating an XSD file based on the columns in my XML, and I give them all the type "xs:string". Then I try to import the file into my database from .NET using SQL bulk import, but some fields are too small. I get the message "type of column x in table y is too small to hold data".
What type should I use for a large amount of text (so that sqlbulk.execute generates a text field in the database)? The current one creates an nvarchar(1000) field, and the data in some fields is bigger.
The trick was to add sql:datatype="nvarchar(4000)"