My source task has the following columns, read from a CSV file:
Sale_id, sale_date, order_no, sale_amt
This is followed by a Lookup task that looks into the sales SQL table (which has the same column names), and the join is on the order_no column.
The issue is that the order_no data in the SQL sales table has values like 'ABC-12345' and 'WXYZ-32111' (a couple of characters are prepended to the order number), whereas the CSV has '12345' without any characters prepended.
Hence I cannot do a lookup, as there is no direct match. Is there any way to remove the characters and the hyphen from the SQL sales table data (temporarily) in order to perform the lookup join?
1st Data Flow Task - use a Flat File Source feeding into the Lookup, and in the Lookup use an OLE DB connection via a SQL command with the following query:
select Sale_id, sale_date, order_no, sale_amt,
    substring(order_no, charindex('-', order_no) + 1, len(order_no)) as [key]
from sales -- your SQL sales table
Use [key] as the join column in your Lookup transformation.
The functions should provide the numeric values you are looking for.
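As a quick sanity check, the expression can be evaluated against the sample values from the question (the VALUES derived table here is purely for illustration):
SELECT SUBSTRING(v.order_no, CHARINDEX('-', v.order_no) + 1, LEN(v.order_no)) AS [key]
FROM (VALUES ('ABC-12345'), ('WXYZ-32111')) AS v(order_no);
-- Returns: 12345 and 32111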
Problem restatement
In your source, you have order numbers coming in that look like numbers. You need to be able to look up against a table that has a text string prepended to its order numbers. We can assume the numeric part of the database's order number is unique.
Setup
I created a simplified version of your table and populated it with data:
DROP TABLE IF EXISTS dbo.so_66446302;
CREATE TABLE dbo.so_66446302
(
sales_id int
, order_no varchar(20)
);
INSERT INTO dbo.so_66446302
(sales_id, order_no)
VALUES
(1, 'ABC-12345')
, (2, 'WXYZ-32111')
, (3, 'REALLY_LONG-654321');
A critical piece in using Lookup components is getting the data types to agree. I'm going to assume that the order number from the file is a DT_STR and not an integer.
By default, people pick a table (here, dbo.so_66446302) in the Lookup component's Connection tab, but if you check "Use results of a SQL query" instead, you'll have what you're looking for.
This is a similar query to the one Jatin shows above, but I find that "showing my work" along the way helps me debug when things go sideways; that's why this query has the intermediate CROSS APPLY steps.
SELECT
    S.*
    , D0.order_no_as_string_digits
FROM
    dbo.so_66446302 AS S
    CROSS APPLY
    (
        -- Length of the string less where we find the first dash
        SELECT LEN(S.order_no) - CHARINDEX('-', S.order_no, 1)
    ) D(dash_location)
    CROSS APPLY
    (
        SELECT RIGHT(S.order_no, D.dash_location)
    ) D0(order_no_as_string_digits);
The results of that query are

sales_id    order_no              order_no_as_string_digits
--------    ------------------    -------------------------
1           ABC-12345             12345
2           WXYZ-32111            32111
3           REALLY_LONG-654321    654321
Now you can match the derived order number in the database to the one in your file by dragging the columns together. Check any/all columns that you need to retrieve from the database and send the data to the intended destination.
Related
I have a CSV file with identifying and date information and then several dozen data columns (sample below).
In the spirit of relational DBs (and also to avoid creating a table with 100+ columns), it seems preferable to load the dataset after I've pivoted the data so that the names of the data columns are included as row entries instead (sample below; the data values aren't consistent with the first table, partly to demonstrate the desired layout):
It occurs to me that I could load the data into a placeholder table in MySQL, pivot within MySQL, and insert into a new table, but I wonder whether there's a more efficient way to do this.
If you want to do this with MySQL, loading the data to an intermediate table, then unpivoting it into the target table seems like the relevant strategy.
To load the file into the staging table, you can use the LOAD DATA INFILE syntax.
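A minimal sketch of that load (the file path and CSV format options here are assumptions; mytable is the staging table used in the query below):
LOAD DATA INFILE '/path/to/data.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row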
Then, you can unpivot with union all:
insert into targettable (
ticker, dimension, item, calendardate, datekey, reportperiod, astupdated, value
)
select ticker, dimension, 'accoci', calendardate, datekey, reportperiod, astupdated, accoci
from mytable
union all
select ticker, dimension, 'assets', calendardate, datekey, reportperiod, astupdated, assets
from mytable
union all
select ticker, dimension, 'assetsavg', calendardate, datekey, reportperiod, astupdated, assetsavg
from mytable
...
Disclaimer: I am not convinced that unpivoting this dataset will eventually make things better or easier. That ultimately depends on how you want to consume the data, which you did not describe.
I have a query where fields are as follows:
UniqueID | RefNum | FirstName | Surname | Aim |.....
UniqueID - is a unique field (no duplicates)
RefNum - contains duplicates
What I'm trying to do is create a new query (based on the above, or amend this one) to extract only records with a unique RefNum (i.e. remove duplicates from the RefNum field).
The way I did it was to select 'Group By' for RefNum in the Query Design View and 'First' for the rest of the fields. It achieves what I need.
The problem is that if I switch to the Datasheet View (and subsequently export it to Excel to be sent out), the field names are 'FirstOfUniqueID', 'FirstOfFirstName', 'FirstOfSurname', etc. Is there a way of keeping the original field names (not prefixing them with 'FirstOf'), or is there another way of achieving this?
The query designer automatically assigns an alias for a field expression which is based on an aggregate function. So, if you switch from Design View to SQL View for your query, you will see something like this included in the SELECT field list ...
First(FirstName) AS FirstOfFirstName
You can change the alias to something else, and you have a lot of flexibility. However, at least in some cases, when you attempt to re-use the base field name as the alias, Access complains about a "circular reference". I don't know whether that would happen here, but you can try it like this ...
First(FirstName) AS [FirstName]
Whether or not that does what you want, I suggest you consider a different query strategy that almost completely avoids the field name alias issue. First test this query to confirm it returns suitable RefNum/UniqueID pairs. If your base query is named Query1 ...
SELECT q1.RefNum, Min(q1.UniqueID) AS MinOfUniqueID
FROM Query1 AS q1
GROUP BY q1.RefNum
Assuming that one returns the correct rows, join it back to the base query to select only the base query rows which match ...
SELECT q.*
FROM
Query1 AS q
INNER JOIN
(
SELECT q1.RefNum, Min(q1.UniqueID) AS MinOfUniqueID
FROM Query1 AS q1
GROUP BY q1.RefNum
) AS sub
ON q.UniqueID = sub.MinOfUniqueID
If you switch the view of your query to SQL View, you will see, for example, AS FirstOfFirstName.
Change this to AS FirstName, and do the same for the other fields.
If you prefer doing this in Design View, you can do so by adding FirstName: in front of the field name, and so on.
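A sketch of what the edited SQL might look like (field names taken from the question, and the base query name Query1 from the previous answer; as noted there, Access may complain about a circular reference when an alias matches the base field name, in which case pick a slightly different alias):
SELECT q.RefNum,
    First(q.UniqueID) AS [UniqueID],
    First(q.FirstName) AS [FirstName],
    First(q.Surname) AS [Surname]
FROM Query1 AS q
GROUP BY q.RefNum;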
I am working with Excel spreadsheets that I'm importing into MS Access. They include a client name, date of birth, some other personal information, and order information. The same clients often have multiple, unique orders. I am creating a table that is just unique clients (which I'll link to the order table later) and so when I import data from Excel I would like to delete duplicate client records, preserving one. I would like to match them on Name and Date of Birth. The issue I'm running into is that some client names are strings that don't match exactly.
For example:
Name           DOB
----           ---
DOE,JOHN       1/1/1960
DOE,JOHN L     1/1/1960
JOHNSON,PAT    12/1/1945
SMITH,BETTY    2/1/1935
In the above set I'd like to limit it to just three records and remove an excess John Doe record.
I basically would like to only look at the client name before the space.
I wouldn't be opposed to losing the middle initial totally, so if there's a way to just chop it off, that'd work too. How can I achieve this?
Sounds like your easiest option is in fact to cut off any middle initials.
You'll want to proceed as follows.
Use SELECT DISTINCT when all is said and done.
If you use the InStr function, you can search for the space after the first name.
You can then select only what's to the left of that space with the Left function (the position of the space minus 1, so as not to include the space itself). You'll get an error if a space isn't found, so add an IIf statement to simply output the name unchanged in that case.
After reviewing the data, you'll need to remove column 1 (in the example below) and insert the Expr1 code directly into the IIf statement, so in the end you'll only have two columns: DOB and Expr2 (or rename it AS Name), as shown in the sketch after the example.
Here's an example:
SELECT DISTINCT
Table1.Name,
Table1.DOB,
InStr(1,[Table1].[Name]," ",1) AS Expr1,
IIf([expr1]>0,Left([Table1].[Name],[Expr1]-1),[Table1].[Name]) AS Expr2
FROM Table1;
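Folding Expr1 into the IIf as described, the final two-column query might look like this (a sketch; the ClientName alias is my own choice, to avoid re-using the base column name):
SELECT DISTINCT
    IIf(InStr(1,[Table1].[Name]," ",1)>0,
        Left([Table1].[Name],InStr(1,[Table1].[Name]," ",1)-1),
        [Table1].[Name]) AS ClientName,
    Table1.DOB
FROM Table1;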
Wayne beat me to it..
I have a database "warehouse" including tables of daily inventory records, one table for each day.
Now I need to check the historical change in the inventory level. The output should show the inventory for each day, given certain criteria.
I am not sure how to describe it, so I created a simplified sample of the schema, its tables, and the expected output.
The schema "warehouse" has a list of tables:
Each table contains the same columns for product ID and inventory; below is table 101:
For each table, I need to run a query:
select count(*) as num_of_product_with_inventory from [table name]. After I have the query result from each table, I should have an output like the one below:
Can anyone show me what the query should look like to get the final output? I only know basic queries and have no clue how to put these together. Thank you!
The data model you have is making your work harder than it should be.
If you must keep it, you will need to use a stored procedure or do the loop in your code (not in SQL).
But what you should really do is change the data model.
Creating a table per day is not recommended at all!
It mixes DATA with METADATA. The table structure should represent the different types of data that you store, while the fact that you had different inventory on date X vs. date Y should live in your data.
So, I recommend creating one table with the columns date, product_id, and warehouse_inventory. If it gets too big, you can partition it by date (week/month/...). Then you can easily get your data with something like:
SELECT date, COUNT(*) AS num_of_products_with_inventory
FROM daily_inventory i
WHERE i.date BETWEEN '<some date>' AND '<some date>'
GROUP BY date;
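For completeness, a minimal sketch of that consolidated table (the composite primary key is my assumption):
CREATE TABLE daily_inventory (
    date                DATE NOT NULL,
    product_id          INT  NOT NULL,
    warehouse_inventory INT  NOT NULL,
    PRIMARY KEY (date, product_id)  -- one row per product per day
);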
I want to create a MySQL table with three columns: id, name, name_id.
What I want to achieve is that whenever a user enters a name into the database,
the system should generate a unique id for the name automatically.
e.g.
If the name is JJ then name_id should be 1, and if the name is DD then name_id should be 2;
also, if the name JJ is repeated in the database then name_id should again be 1.
The name_id values should be assigned according to name sorting,
i.e. A should get 1 and B should get 2.
How can this be achieved with a SQL script or triggers?
What about the following?
INSERT INTO tbl (name, name_id)
SELECT newname, COALESCE((SELECT name_id FROM tbl WHERE name = newname),
       (SELECT COALESCE(MAX(name_id), 0) + 1 FROM tbl)) -- inner COALESCE covers the very first insert, when tbl is empty
This is assuming that the column id takes care of itself, i.e. is auto_incremented.
newname can of course also be a string constant, which you will have to work into your command.
The command above works best for individual inserts ("by a user"). If you want to carry out a bulk import, it can be quite costly, since for each new value the table tbl is scanned twice. For that case a different logic should be applied:
First find all name/name_id pairs by means of a grouped select, and INNER JOIN the results with the import list. For the remaining items (those without existing name_ids), find the highest #i = max(name_id) of all records and then import the sorted list with an autonumbering mechanism (#i := #i + 1) for name_id in place, as sketched below.
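A sketch of that bulk logic in MySQL (import_list is a hypothetical staging table holding the incoming names; user-variable autonumbering works in classic MySQL but is deprecated in 8.0, where ROW_NUMBER() would be the alternative):
-- 1) Names that already exist in tbl: reuse their existing name_id.
INSERT INTO tbl (name, name_id)
SELECT il.name, t.name_id
FROM import_list AS il
INNER JOIN (SELECT name, MIN(name_id) AS name_id
            FROM tbl
            GROUP BY name) AS t ON t.name = il.name;

-- 2) Remaining (new) names: number them upward from the current maximum,
--    in sorted order, using a user variable as the autonumbering mechanism.
SET @i := (SELECT COALESCE(MAX(name_id), 0) FROM tbl);
INSERT INTO tbl (name, name_id)
SELECT s.name, (@i := @i + 1)
FROM (SELECT DISTINCT il.name
      FROM import_list AS il
      LEFT JOIN tbl AS t ON t.name = il.name
      WHERE t.name IS NULL
      ORDER BY il.name) AS s;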
Create a SQL function that returns the name_id when you pass the name as a parameter. One way would be to add up all the character values, but that alone won't do, because different arrangements of the same characters would give the same sum for different names; maybe concatenating the primary index onto the end of the sum would do the job. I think you can define suitable logic in a SQL function to achieve the result.