pdi spoon ms-access concat - ms-access

Suppose I have this table named table1:
| f1| f2 |
--------------
| 1 | str1 |
| 1 | str2 |
| 2 | str3 |
| 3 | str4 |
| 3 | str5 |
I wanted to do something like:
Select f1, group_concat(f2) from table1 group by f1
That is MySQL syntax, but I am working with MS Access. The result I want is:
| 1 | str1,str2|
| 2 | str3 |
| 3 | str4,str5|
So I searched for a function in MS Access that would do the same, and found one.
The problem is that every day I have to download an MS Access database, create the concatenation function in it, and then create a new table with the concatenated values.
I want to incorporate that process into the Pentaho Data Integration (PDI) Spoon transformations that I run after all this work.
So what I want is a way to define an MS Access function in PDI Spoon, or some way to combine steps that would emulate group_concat from MySQL.

Simple: query the data from Access and use the "Group by" step to do your group_concat. It has an option to concatenate fields separated by "," or by any string of your choice.
Don't forget that the stream must be sorted by whatever you're grouping by, unless you use the "Memory Group by" step.
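The sort-then-group behaviour described above can be illustrated outside PDI. The following is only a minimal Python sketch of the same logic, not PDI code; the function name group_concat and the sample rows are chosen here for illustration. Like the "Group by" step, itertools.groupby only merges adjacent rows, which is why the input is sorted first.

```python
from itertools import groupby
from operator import itemgetter


def group_concat(rows, separator=","):
    """Concatenate the value of (key, value) rows for each key."""
    # groupby only merges adjacent rows with the same key, so the rows
    # must be sorted by the grouping key first (same rule as "Group by").
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, separator.join(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


# Rows of table1 from the question, deliberately out of order.
table1 = [(1, "str1"), (3, "str4"), (1, "str2"), (2, "str3"), (3, "str5")]
result = group_concat(table1)
print(result)
```

Skipping the sort would produce one output row per run of equal keys, which is the failure mode the answer warns about.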

A simple way is to move your MS Access data into MySQL with the same structure (MySQL DB structure = MS Access DB structure), then execute "Select f1, group_concat(f2) from table1 group by f1" there. In detail:
Create transformation A to transfer your MS Access data to MySQL.
Create transformation B to execute Select f1, group_concat(f2) from table1 group by f1.
Create a job that executes transformations A and B (transformation A must run before B).

Related

mySQL - Reiteratively Count rows that have particular CSV string

2-column MySQL Table:
| id| class |
|---|---------|
| 1 | A,B |
| 2 | B,C,D |
| 3 | C,D,A,G |
| 4 | E,F,G |
| 5 | A,F,G |
| 6 | E,F,G,B |
Requirement is to generate a report/output which tells which individual CSV value of class column is in how many rows.
For example, A is present in 3 rows (with id 1,3,5), and C is present in 2 rows (with id 2,3), and G is in 4 rows (3,4,5,6) so the output report should be
A - 3
B - 3
C - 2
...
...
G - 4
Essentially, column id can be ignored.
The approach I can think of: first pick all the values of the class column and split them on commas, then build a distinct list of the unique values (A, B, C, ...), and then count how many rows contain each value from that distinct list.
While I know basic SQL queries, this is way too complex for me. I have been unable to find a CSV split function in MySQL that fits. (I am new to SQL, so I don't know much.)
I did get an alternative approach to work: download the class column values to a file and feed it to a Perl script, which builds a distinct array of A, B, C, ..., then reads the downloaded CSV file again for each element of the distinct array, increments the count, and finally publishes the report. But this is in Perl, which would be a separate execution, while the client needs it as a SQL report.
Help will be appreciated.
Thanks
You could use a split-string-into-rows function to get the distinct values, and then use the COUNT function to find the number of occurrences of each.
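To make the split-then-count logic concrete, here is a small Python sketch of it. It is not the MySQL function the answer refers to, only an illustration of the same two steps; the function name count_class_rows is made up for this example, and the input is the class column from the question.

```python
from collections import Counter


def count_class_rows(class_values):
    """Return {value: number of rows whose CSV string contains it}."""
    counts = Counter()
    for value in class_values:
        # Split the CSV string into rows of single values. Using a set
        # means a value repeated inside one row is still counted once.
        tokens = {t.strip() for t in value.split(",") if t.strip()}
        counts.update(tokens)
    return dict(sorted(counts.items()))


# The class column from the question, one string per table row.
classes = ["A,B", "B,C,D", "C,D,A,G", "E,F,G", "A,F,G", "E,F,G,B"]
report = count_class_rows(classes)
for name, n in report.items():
    print(f"{name} - {n}")
```

In SQL the same shape applies: first turn each CSV string into one row per value, then GROUP BY the value and COUNT.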

MS Access: How to compare and filter data in a column

I am a new user and this is my first question.
I have recently started working with MS Access, and I am having trouble filtering for the maximum of one column grouped by the data in another column.
Let me explain the situation with some test data:
The table consists of column A, which is short text, and column B, which is an integer.
Test Data
With a query, I want to keep only AA-02, BB-04 and CC-06.
I can compare values in a column very easily in Excel, but I am having problems doing it in Access.
Thanks for your time in advance.
Best Regards,
M.ER
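Assuming you want the last instance of column B for each value of column A, this is a simple SQL totals query. Note that Last depends on the order in which records are returned, so if you want the largest value per group, use Max(Test.ColumnB) instead.
In the SQL view of the Query Designer: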
SELECT Test.ColumnA, Last(Test.ColumnB) AS ColumnB
FROM Test
GROUP BY Test.ColumnA;
Result:
| ColumnA | ColumnB |
| AA | 2 |
| BB | 4 |
| CC | 6 |
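The totals query can be tried out without Access. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is an approximation under two assumptions: the six input rows are hypothetical (the original test data is not shown), and MAX is used in place of Last because SQLite has no Last aggregate.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table "Test".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ColumnA TEXT, ColumnB INTEGER)")

# Hypothetical rows: two values of ColumnB per value of ColumnA.
conn.executemany(
    "INSERT INTO Test (ColumnA, ColumnB) VALUES (?, ?)",
    [("AA", 1), ("AA", 2), ("BB", 3), ("BB", 4), ("CC", 5), ("CC", 6)],
)

# One output row per ColumnA, carrying the largest ColumnB of the group.
rows = conn.execute(
    "SELECT ColumnA, MAX(ColumnB) AS ColumnB "
    "FROM Test GROUP BY ColumnA ORDER BY ColumnA"
).fetchall()
print(rows)
```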

More than 255 Fields in Access 2000/2010

I am converting a 20-year-old system from DBase IV to Access 2010, via Access 2000, to make it more suitable for Windows 10. However, the database has about 350 fields in one table, because it is a parameters table, and both MS Access 2000 and MS Access 2010 complain about it. I have repaired the database to remove the internal count problem, but I am rather surprised that Windows 10 software would have such a low limit. Does anyone know how to bypass it? Obviously I can break the table into two, but that seems rather archaic.
When you start to run up against limitations such as this, it reeks of poor database design.
Given that you state that the table in question is a 'parameters' table, with so many parameters, have you considered structuring the table such that each parameter occupies its own record?
For example, consider the following approach, where ParamName is the primary key for the table:
+----------------+------------+
| ParamName (PK) | ParamValue |
+----------------+------------+
| Param1 | Value1 |
| Param2 | Value2 |
| ... | |
| ParamN | ValueN |
+----------------+------------+
Alternatively, if there is the possibility that each parameter may have multiple values, you can simply add one additional field to differentiate between multiple values for the same parameter, e.g.:
+----------------+--------------+------------+
| ParamName (PK) | ParamID (PK) | ParamValue |
+----------------+--------------+------------+
| Param1 | 1 | Value1 |
| Param1 | 2 | Value2 |
| Param1 | 3 | Value3 |
| Param2 | 1 | Value2 |
| ... | ... | ... |
| ParamN | 1 | Value1 |
| ParamN | N | ValueN |
+----------------+--------------+------------+
I had a similar problem: we have more than 300 fields in one Contact table on SQL Server, linked to Access. You probably do not need to display 255 fields on one form; that would not be user friendly. You can split the form into several sub-forms, each with its own underlying query that stays under the limit. All sub-forms would be linked by the ID.
Sometimes splitting tables as suggested above is not the best idea because of performance.
As Lee Mac described, changing the structure of the "parameters" table really would be your better choice. You could then define a constant for each parameter name and use those constants in code, to prevent accidental misspellings if a name is used in many places.
Then you could create a function (or functions) that takes the name of the parameter setting you are looking for, queries the table using it as the key, and returns the value. I am not a VB/Access developer, but I would think you can't overload functions so that a single function name returns different data types such as string, int, date, etc. So you may want functions something like the ones below.
The samples are in C#, but the principle would be the same.
public int GetAppParmInt( string whatField )
public DateTime GetAppParmDate( string whatField )
public string GetAppParmString( string whatField )
etc...
Then you could get a value by calling the function whose sole purpose is to query the parameters table for that one key and return the value as stored.
Hopefully a combination of the solutions offered here can help with your upgrade. Expanding a bit on Lee Mac's answer, your parameter table could have one column per data type you store, corresponding to the "GetAppParm[type]" functions:
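ParmsTable
+------+-----------------+---------+------------+--------------+
| PkID | ParmDescription | ParmInt | ParmDate   | ParmString   |
+------+-----------------+---------+------------+--------------+
| 1    | CompanyName     |         |            | Your Company |
| 2    | StartFiscalYear |         | 2019-06-22 |              |
| 3    | CurrentQuarter  | 4       |            |              |
| 4... | etc.            |         |            |              |
+------+-----------------+---------+------------+--------------+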
Then you don't have to worry about changing / data conversions all over the place. They are stored in the proper data type you expect and return that type.
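As a rough illustration of the typed-getter idea, here is a Python sketch that uses the built-in sqlite3 module in place of Access. The table and column names follow ParmsTable above; the function names get_app_parm_int, get_app_parm_date and get_app_parm_string and the helper _get are hypothetical Python counterparts of the C# signatures, and storing the date as ISO text is an assumption made for SQLite.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE ParmsTable (
           PkID INTEGER PRIMARY KEY,
           ParmDescription TEXT UNIQUE NOT NULL,
           ParmInt INTEGER,
           ParmDate TEXT,      -- ISO date text (assumption for SQLite)
           ParmString TEXT)"""
)
conn.executemany(
    "INSERT INTO ParmsTable (ParmDescription, ParmInt, ParmDate, ParmString) "
    "VALUES (?, ?, ?, ?)",
    [
        ("CompanyName", None, None, "Your Company"),
        ("StartFiscalYear", None, "2019-06-22", None),
        ("CurrentQuarter", 4, None, None),
    ],
)


def _get(column, what_field):
    # 'column' is only ever one of the fixed names passed in below,
    # never user input, so formatting it into the SQL text is safe here.
    row = conn.execute(
        f"SELECT {column} FROM ParmsTable WHERE ParmDescription = ?",
        (what_field,),
    ).fetchone()
    if row is None:
        raise KeyError(what_field)
    return row[0]


def get_app_parm_int(what_field):
    return _get("ParmInt", what_field)


def get_app_parm_date(what_field):
    return date.fromisoformat(_get("ParmDate", what_field))


def get_app_parm_string(what_field):
    return _get("ParmString", what_field)


print(get_app_parm_string("CompanyName"), get_app_parm_int("CurrentQuarter"))
```

Each getter reads only the column of its own type, so callers never convert values themselves.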

How to join a group of tables with the same suffix?

I am no MySQL expert and I really need some help figuring this out. I currently have over 60 tables that I want to combine into a single table. None of the data in those tables overlaps, so I need the rows of all the tables in a single one. They all have the same schema (if that is the correct term), i.e. the same format, and their names all end in the same suffix, '_dir'.
What I thought could work is something like this:
Get all tables with the same suffix.
For each table in the table list, insert its rows into main_table.
I don't know how to do this in MySQL, or whether it's even possible. I know I can use
SELECT *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%_dir%'
to get the list of all the tables, but how can I use this to iterate over every table?
Here is an example of input data:
table 1:
| NAME | INST_NAME | Drop
| data 1 | 'this is an example instance1 | 1.5
| data 1 | 'this is an example of instance2| 2.0
table 2:
| NAME | INST_NAME | DROP
| data 2 | 'this is an example instance1 | 3.0
| data 2 | 'this is an example of instance2| 4.0
Output table:
| NAME | INST_NAME | DROP
| data 1 | 'this is an example instance1 | 1.5
| data 1 | 'this is an example of instance2| 2.0
| data 2 | 'this is an example instance1 | 3.0
| data 2 | 'this is an example of instance2| 4.0
Note that I have to do this for over 60 tables, not just 2. There are also other tables with different information in the same database, so I can't just join all the tables in there.
You really need to fix your data structure. You should not be storing data in multiple tables with the same structure; that information should all go into a single table. Then you wouldn't have this issue.
For now, you can construct a view with all the data. You can generate the code for the view with something like this:
SELECT CONCAT('CREATE VIEW vw_dir AS ',
GROUP_CONCAT(REPLACE('SELECT NAME, INST_NAME, `DROP` FROM [T]', '[T]', TABLE_NAME)
SEPARATOR ' UNION ALL '
)
) as create_view_sql
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE '%_dir%';
Then take the SQL, run it, and you'll have a view called vw_dir. The next time you add a table, you'll need to drop the view and then recreate it.
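To show what the generated statement looks like, here is a small Python sketch that builds the same kind of string from a list of table names. It is only an illustration of the string assembly, not a replacement for the query above; the function name build_view_sql and the sample table names are hypothetical.

```python
def build_view_sql(all_tables, suffix="_dir", view_name="vw_dir"):
    """Build a CREATE VIEW statement that UNION ALLs every suffixed table."""
    # Keep only tables ending in the suffix, and skip the view itself,
    # whose name also ends in "_dir".
    tables = [t for t in all_tables if t.endswith(suffix) and t != view_name]
    selects = [
        f"SELECT NAME, INST_NAME, `DROP` FROM `{name}`" for name in tables
    ]
    return f"CREATE VIEW {view_name} AS " + " UNION ALL ".join(selects)


# Hypothetical table list, as it might come from INFORMATION_SCHEMA.TABLES.
sql = build_view_sql(["a_dir", "users", "b_dir", "vw_dir"])
print(sql)
```

The DROP column is quoted with backticks because DROP is a reserved word in MySQL.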
With this solved, you can now start thinking about how to get all the data into a single table, without having the intermediate tables cluttering up your database.

Handle missing source columns dynamically in SSIS

I have a small SSIS question. I'm extracting data from a MySQL table with a varying column list to a SQL Server table with a fixed column list.
source table: Test(mysql server)
id | name | sal | deptno | loc | referby
1 | abc | 100 |10 | hyd | xyz
2 | mnc | 200 |20 |chen | pqr
First I configure the MySQL source table, then I drag and drop an OLE DB Destination and configure the target table. After that, the package works fine and the data looks like this:
Target table : Test (sql server )
id | name | sal |deptno | loc |referby
1 | abc | 100 |10 | hyd | xyz
2 | mnc | 200 |20 |chen | pqr
The second time I run the package, a column has been removed from the source table's schema, so the package fails. I open the MySQL source configuration and edit the query to return a placeholder for the missing column:
select id,'null' as name,sal,deptno,loc,referby from test
I rerun the package and the data looks like this.
Target table : Test (sql server )
id | name | sal |deptno | loc |referby
1 | null | 100 |10 | hyd | xyz
2 | null | 200 |20 |chen | pqr
I always truncate the target table and load data.
The target table has an unchanging list of columns, while the source table's column list can vary. I do not want to keep editing the query to account for possible missing columns. How can I handle this at the package level?
A couple ideas:
Use dynamic SQL. Replace your straightforward SELECT ... with a query that iterates through the target table's column list (perhaps fetched via SHOW COLUMNS), builds a SELECT query that inserts NULL for the missing columns, and then executes it via PREPARE and EXECUTE.
The query-generating query would need to produce a SELECT statement containing the fixed set of columns your target table expects to see. If an expected column doesn't exist in the source, the query-generating query should insert the placeholder NULL AS ColumnName in the query.
(I'm not a MySQL expert so I'm unsure of MySQL's exact capabilities in this regard but in theory this approach sounds workable.)
Use a Script Component as the data source. Configure this component with the output columns you expect. Have the component query the source database (maybe using a simple SELECT * FROM ....) and then copy only the relevant columns that exist from source to output row buffer. With this approach, columns that don't exist will automatically be outputted into the data flow as null/their default value because the Script Component won't have set them to a value.
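The query-generating idea can be sketched in a few lines. The Python below is only an illustration of how such a SELECT could be assembled, not SSIS or MySQL code; the function name build_select is hypothetical, and the column lists are taken from the question with name assumed to be the dropped column.

```python
def build_select(table, expected_columns, source_columns):
    """Build a SELECT that yields every expected column, using
    NULL AS <column> for any column missing from the source."""
    available = {c.lower() for c in source_columns}
    parts = [
        col if col.lower() in available else f"NULL AS {col}"
        for col in expected_columns
    ]
    return f"SELECT {', '.join(parts)} FROM {table}"


# Fixed column list expected by the target table.
expected = ["id", "name", "sal", "deptno", "loc", "referby"]
# Columns currently present in the source (assumes "name" was dropped).
source_now = ["id", "sal", "deptno", "loc", "referby"]

query = build_select("test", expected, source_now)
print(query)
```

Because the output always has the same column list in the same order, the destination mapping does not need to change when a source column disappears.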
SSIS is very rigid when it comes to dynamic sources like this. I think your best bet would be to explore BIML which could generate a new package for you each time you need to "refresh" the schema.
http://www.sqlservercentral.com/stairway/100550/