We have unstructured data stored as JSON in one of our MySQL tables, alongside structured data. We would like to extract it, but we are not sure how, because the JSON documents can contain any property (there are no common properties).
Could you please help me extract all properties without specifying the property names?
SQL cannot dynamically append more columns to its result set after the query begins executing and it examines data in rows. The columns must be fixed in the select-list at the time the SQL query is parsed, before the query begins executing and examining data. So you must spell out the columns in the select-list of the query. This means you must know the names of all properties in advance.
You could do a query to fetch all property names:
SELECT JSON_KEYS(mydata) FROM MyTable;
This returns arrays of keys per row. There will be a lot of duplication. In your client application, you would write code to parse the result, and form a list of distinct keys.
Then you could use that list to form a second SQL query, with one column in the select-list for each key you noted in the first step.
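Here is a minimal sketch of that two-step approach in Python. It assumes the first query's results arrive as one JSON array string per row (or None where JSON_KEYS returned NULL). The function names are made up for illustration, and since the keys get pasted into SQL text, the sketch only accepts simple identifier-like keys and refuses anything else.

```python
import json
import re

SAFE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def distinct_keys(json_keys_rows):
    """Collect distinct property names from JSON_KEYS() output, keeping first-seen order."""
    seen = []
    for raw in json_keys_rows:
        if raw is None:  # JSON_KEYS() yields NULL for NULL or non-object documents
            continue
        for key in json.loads(raw):
            if key not in seen:
                seen.append(key)
    return seen

def build_select(keys, column="mydata", table="MyTable"):
    """Build the second query with one select-list column per key."""
    parts = []
    for key in keys:
        # Keys are interpolated into SQL, so reject anything that is not a plain identifier.
        if not SAFE_KEY.fullmatch(key):
            raise ValueError("unsupported key: " + repr(key))
        parts.append(f"JSON_UNQUOTE(JSON_EXTRACT({column}, '$.{key}')) AS `{key}`")
    return "SELECT " + ", ".join(parts) + f" FROM {table}"
```

You would run the JSON_KEYS query first, pass its rows to distinct_keys, then execute the string returned by build_select as the second query.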
The alternative is to forget about returning properties in separate columns. Just return the JSON documents from the database as-is. Then explode the JSON after you fetch it in the result set, and process it in application code that way.
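A rough sketch of that alternative, assuming each fetched value is a JSON object serialized as a string (the explode name and the None placeholder for missing properties are my own choices, not anything MySQL gives you):

```python
import json

def explode(documents):
    """Turn JSON object strings into (columns, rows), filling missing properties with None."""
    parsed = [json.loads(doc) for doc in documents if doc is not None]
    columns = []
    for obj in parsed:
        for key in obj:
            if key not in columns:
                columns.append(key)
    rows = [[obj.get(col) for col in columns] for obj in parsed]
    return columns, rows
```

The column list is only known after all documents have been read, which is exactly the thing SQL cannot do for you inside a single query.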
One way or the other, you need to write application code, either before running your query or after running your query.
Welcome to "flexible" database design! :-)
I have created an SSIS package and I used Merge Join to join a Dimension with the result of another Merge Join and I got the following error:
Both inputs of the transformation must contain at least one sorted column, and those columns must have matching metadata
I have found that the issue is related to the data type of the two sorted columns. I converted both of them to "INT" and now everything works fine.
The message is pretty clear. SSIS merge operations require that the data being compared is sorted, so that comparisons are faster.
Make sure that you are retrieving ordered data from your database using the ORDER BY clause (if on SQL), then tell SSIS about it in the source's Advanced Editor: set IsSorted to True on the output and set SortKeyPosition on each sorted column.
If you can't have the data ordered at the source, you can add a Sort operation in SSIS which will sort the merging columns (before the actual merge). You will have to do this on both flows before the merge. Please be advised that this component blocks the data flow until all rows are sorted.
The Merge error message will go away once you join both data flows with sorted columns.
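To see why both inputs have to be sorted on the join column, here is a small sketch of a sort-merge inner join in Python. It is not how SSIS implements Merge Join internally, just the general technique; the function name and the dict-per-row format are assumptions for illustration.

```python
def merge_join(left, right, key):
    """Inner join two lists of dicts that are ALREADY sorted ascending by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left is behind, advance it
        elif lk > rk:
            j += 1          # right is behind, advance it
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result
```

Each input is read once from top to bottom, which only works if equal keys line up in the same order on both sides. The keys also have to be comparable with each other, which is the same idea as the "matching metadata" part of the error.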
I've built a job that copies data from MySQL table a to MySQL table b.
The table columns are the same, except that sometimes a new column is added to table a.
I want to copy all the columns from a to b, but only those that exist in table b. I was able to put a specific select statement in the query, listing the columns that exist in table b, like:
select column1,column2,column3... from table a
The issue is that if I add a new column to b that matches a, the Talend job schema in the tMysqlInput component has to be changed as well, because I work with the built-in schema type.
Is there a way to force the schema columns while the job is running?
If you are using a subscription version of Talend, you can use the dynamic column type. You can define a single column for your input of type "Dynamic" and map it to a column of the same type in your output component. This will dynamically get columns from table a and map them to the same columns in table b. Here's an example.
If you are using Talend Open Studio, things get a little trickier as Talend expects a list of columns for the input and output components that need to be defined at design time.
Here's a solution I put together to work around this limitation.
The idea is to list all of table a's columns that are present in table b, then convert that into a comma-separated list of columns (in my example id,Theme,name) and store it in a global variable COLUMN_LIST. A second output of the tMap builds the same list of columns, but this time puts single quotes between the columns (so that they can be passed as parameters to the CONCAT function later) and adds single quotes at the beginning and end, like so: "'", id,"','",Theme,"','",name,"'", and stores it in a global variable CONCAT_LIST.
On the next subjob, I query table a using the CONCAT function, giving it the list of columns to be concatenated CONCAT_LIST, thus retrieving each record in a single column like so 'value1', 'value2',..etc
Then at last I execute an INSERT query against table b, by specifying the list of columns given by the global variable COLUMN_LIST, and the values to be inserted as a single string resulting from the CONCAT function (row6.values).
This solution is generic: if you replace your table names with context variables, you can use it to copy data from any MySQL table to another table.
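Outside of Talend, the two lists described above could be built like this. This is only a Python sketch of the string logic (in the job itself it happens in the tMap); the build_lists name and the plain column-name lists it takes as input are assumptions.

```python
QUOTE = '"' + "'" + '"'      # the SQL string literal "'"
SEP = '"' + "','" + '"'      # the SQL string literal "','"

def build_lists(columns_a, columns_b):
    """Return (COLUMN_LIST, CONCAT_LIST) for the columns of a that also exist in b."""
    in_b = set(columns_b)
    common = [c for c in columns_a if c in in_b]
    if not common:
        raise ValueError("tables share no columns")
    column_list = ",".join(common)
    parts = [QUOTE]
    for index, col in enumerate(common):
        if index:
            parts.append(SEP)   # closes one quoted value and opens the next
        parts.append(col)
    parts.append(QUOTE)
    concat_list = ", ".join(parts)
    return column_list, concat_list
```

For id, Theme and name this gives COLUMN_LIST id,Theme,name and a CONCAT_LIST that can be dropped straight into CONCAT(...). One thing to keep in mind: MySQL's CONCAT returns NULL if any argument is NULL, and values containing single quotes would break the generated INSERT, so nullable or free-text columns need extra handling.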
I am using a SSIS Data Flow Task to transfer data from one table to another. Column A in Table A contains a number, the last 3 digits of which I want to store in Column B of Table B.
First I'm trying to grab all of the data in Column A and store in a variable via a simple SELECT statement SELECT COLUMN_A FROM TABLE_A. However, the variable stores the statement as a string when I want the result set of the query. I have set the EvaluateAsExpression property to False but to no avail.
Secondly I want to be able to use the result of this query in the Derived Column of my Data Flow to extract the last 3 digits and store the values in Column_B in the destination. The expression I have is:
(DT_STR,3,1252)RIGHT(#User::[VariableName],3)
I want to store this as a string hence the (DT_STR,3,1252) data type.
All I'm getting so far in Column_B of Table_B is the last 3 characters of the SELECT statement, "E_A". There is a lot of useful information on the web, including YouTube videos, for things like setting file paths and server names as parameters or variables, but I can't see much that is relevant to the specifics of my query.
I have used an Execute SQL Task to insert row counts from flat files but, in this example, I want to use the Derived Column tool instead.
What am I doing wrong? Any help is gratefully appreciated.
I prefer to do all the work in SQL if you aren't doing anything else with that number.
select right(cast(ColA as varchar(20)),3) from tableA
-- you can add another cast if you want it to be an int
Use that query in an Execute SQL Task with ResultSet set to Single row.
Map the result to a variable.
In a Derived Column in the data flow, you can then assign that variable to the new column.
Thanks KeithL, that's one solution I will use in future, but I found another.
I dropped the variable and in the Expression box of the Transformation Editor did:
(DT_STR,3,1252)RIGHT((DT_STR,3,1252)Column_A,3)
In my question, I failed to cast Column_A from Table_A as a string. The first use of (DT_STR,3,1252) simply sets the destination column as a string, so that it does not use the same data type as the source, which in my case was int.
It's the 2nd use of (DT_STR,3,1252) that actually casts Column_A from int to a string.
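The same logic written out in Python, purely as an illustration of what went wrong and what the fix does (the function name is made up):

```python
def last_three_digits(value):
    """Cast the number to a string first, then take the rightmost three characters."""
    return str(value)[-3:]

# The original symptom: the variable held the query text, not its result,
# so taking the rightmost three characters operated on the SELECT statement itself.
statement = "SELECT COLUMN_A FROM TABLE_A"
symptom = statement[-3:]
```

Here symptom is "E_A", matching what ended up in Column_B, while last_three_digits applied to each Column_A value gives the intended result.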
I have a job in Talend that is designed to bring together some data from different databases: one is a MySQL database and the other an MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stackoverflow question and while I was typing this the page started telling me I need at least 10 reputation to post more than 2 pictures/links here and I think I need 4 pics, so I'm just going to write it out in words here and post the whole thing complete with illustrations on my blog in case you need more info (quite likely, I should think!)
As you can see, I've got some data coming out of the table and getting filtered by tFilterRow_1 to only show the rows I'm interested in.
The next step is to limit it to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And of course, if your data has a lot of duplicate values, you might also want to get a tUniqueRow in there to remove them.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata - you know how to do that, right?). Use another tMap component, feeding into a tContextLoad widget. tContextLoad always has two columns in its schema, so map the output of tAggregateRow_1 to the "value" column and enter the name of the variable in the "key". In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable, ready for retrieval. So open up a new input and embed the variable in the SQL code like this:
"SELECT distinct MY_COLUMN
from MY_SECOND_TABLE where the_selected_row in ("+
context.MyList+")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
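If it helps to see the string handling on its own, here is a rough Python equivalent of what the tMap, tAggregateRow and query steps produce between them. The function names are made up, the table and column names are the placeholders from the query above, and the batching helper is an extra I have added as an assumption, since a list of tens of thousands of values may need to be split across several queries.

```python
def build_in_list(values):
    """Dedupe, wrap each value in single quotes, and join with commas."""
    unique = list(dict.fromkeys(values))  # keeps first-seen order
    if not unique:
        raise ValueError("an empty IN () list is not valid SQL")
    # Double any embedded single quote so the literal stays well-formed.
    return ",".join("'" + str(v).replace("'", "''") + "'" for v in unique)

def build_query(values):
    """Embed the list in the where clause, as the context variable does in the job."""
    return ("SELECT distinct MY_COLUMN from MY_SECOND_TABLE "
            "where the_selected_row in (" + build_in_list(values) + ")")

def chunks(values, size):
    """Yield successive batches so each query gets a shorter IN list."""
    for start in range(0, len(values), size):
        yield values[start:start + size]
```

Calling build_query once per batch from chunks keeps each statement to a manageable length.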
I am pretty new to SSIS and BI in general, so first of all sorry if this is a newbie question.
I have my source data for the fact table in a csv, so I want to match the ids against the surrogate keys in lookup tables.
The data structure in the csv is like this
... userId, OriginStationId, DestinyStationId,..
What I am trying to accomplish is to match the data against my lookup table. So what I am doing is
Reading Lookup data using OLE DB Source
Reading my csv file
Sorting both inputs by the same field
Doing a left join by Id, in order to get the SK
This way, if there is no match (aka can't find the surrogate key) I can redirect that to a rejected csv and handle it later.
Something like this:
(sorry for the spanish!)
I am doing this for each dimension, so I can handle each one with different error codes.
Since OriginStationId and DestinyStationId are two values from the same dimension (they both match against the same lookup table), I wanted to know if there's a way to avoid reading the data from the table twice (I mean, not using two OLE DB sources to read the same table twice).
I tried adding a second output to the sort but I am not allowed to. The same goes for adding another output from the OLE DB Source.
I see there's a "cache option"; is that the best way to go? (Although it would imply creating another OLE DB source anyway, right?)
The third option I thought of was joining by the two fields, but since there is only one field in the lookup table (the same field), I get an error when I try to map both columns from my csv against the same column in my lookup table:
There are columns missing with the sort order 2 to 2
What is the best way to go about this?
Or am I thinking about this incorrectly?
If something is not clear, let me know and I'll update my question.
Any time you wish you could have multiple outputs from a component that only allows one, all you have to do is follow that component with the Multicast component, whose sole purpose is to split a Data Flow stream into multiple outputs.
Gonzalo
I have just used this article on how to derive columns when building a data warehouse:- How to Populate a Fact Table using SSIS (part 1).
Using this I built a simple package that reads a CSV file with two columns that are used to derive separate values from the same CodeTable. The CodeTable has two fields Id and Description.
The Data Flow has two "Lookup" tasks. The first one joins the attribute Lookup1 against the Description to derive its Id. The second joins the attribute Lookup2 against the Description to derive a different Id.
Here is the Data Flow:-
Note the "Data Conversion" was required to convert the string attributes from the CSV file into "Unicode string [DT_WSTR]" so they could be joined to the nvarchar(50) description attribute in the table.
Here is the Data Conversion:-
Here is the first Lookup (the second one joins "Copy of Lookup2" to the Description):-
Here is the Data Viewer output with the two derived Ids CodeTableFirstId and CodeTableSecondId:-
Hopefully I understand your problem and this is of use to you.
Cheers John
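As a language-neutral illustration of the idea behind the two lookups (one copy of the reference data, used for two different columns, with unmatched rows sent to a reject output), here is a small Python sketch. It is not SSIS code; the resolve_keys name, the _SK suffix and the dict used as the lookup cache are assumptions, while the column names come from the question.

```python
def resolve_keys(rows, lookup, fields):
    """Resolve several columns against ONE lookup dict; split rows into matched and rejected."""
    matched, rejected = [], []
    for row in rows:
        missing = [f for f in fields if row[f] not in lookup]
        if missing:
            # Keep track of which column failed so it can be handled later.
            rejected.append({**row, "missing": missing})
        else:
            keys = {f + "_SK": lookup[row[f]] for f in fields}
            matched.append({**row, **keys})
    return matched, rejected
```

The lookup table is loaded once into the dict and consulted once per field, which is the same effect as reading the table a single time and feeding it to both lookups.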