Parameterized OLEDB source query - ssis

I am creating an ETL in SSIS in which I want my data source to be a restricted query, like select * from table_name where id='Variable'. This variable is what I defined as a user-created variable.
I do not understand how I can have my source query interact with the SSIS scoped Variable.
The only present options are
Table
Table from variable
SQL Command
SQL command from a variable
What I want is to have a SQL statement having a variable as parameter

Simple. Choose SQL command as the Data Access Mode. Enter your query with a question mark as a parameter placeholder. Then click the Parameters button and map your variable to Parameter0 in the Set Query Parameters dialog:
More information is available on MSDN.

An inferior alternative to @Edmund's approach is to use an Expression on another Variable to build your string. Assuming you have @[User::FirstName] already defined, you would then create another variable, @[User::SourceQuery].
In the properties for this variable, set EvaluateAsExpression to True and then set an Expression like "SELECT FirstName, LastName FROM Person.Person WHERE FirstName = '" + @[User::FirstName] + "'" The double quotes are required because we are building an SSIS String.
There are two big reasons this approach should not be employed.
Caching
This approach is going to bloat your plan cache in SQL Server with N copies of essentially the same query. The first time it runs and the value is "Edmund" SQL Server will create an execution plan and save it (because it can be expensive to build them). You then run the package and the value is "Bill". SQL Server checks to see if it has a plan for this. It doesn't, it only has one for Edmund and so it creates another copy of the plan, this time hard coded to Bill. Lather-rinse-repeat and watch your available memory dwindle until it unloads some plans.
By using the parameter approach, when the plan is submitted to SQL Server, it should be creating a parameterized version of the plan internally and assumes that all parameters supplied will result in equal costing executions. Generally speaking, this is the desired behaviour.
If your database is optimized for ad-hoc workload (it's a setting turned off by default), that should be mitigated as every plan is going to get parameterized.
SQL Injection
The other big nasty you will run into with building your own string is that you open yourself up to SQL Injection attacks or at the least, you can get runtime errors. It's as simple as having a value of "d'Artagnan." That single quote will cause your query to fail resulting in package failure. Changing the value to "';DROP TABLE Person.Person;--" will result in great pain.
You might think it's trivial to safe quote everything but the effort of implementing it consistently everywhere you query is beyond what your employer is paying you. All the more so since there is native functionality provided to do the same thing.
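To see both of the reasons above (plan cache bloat and broken or injectable statement text) side by side on SQL Server, here is an added T-SQL example; it is not code from either answer or from any SSIS package. It uses a constructed dbo.PersonDemo table standing in for Person.Person, builds the statement text the expression approach would produce for the value d'Artagnan, runs the parameterized form that the OLE DB provider submits when you map a variable to a ? placeholder, and finishes with a plan-cache query so you can count how many entries each shape leaves behind.

```sql
-- Constructed demo table standing in for Person.Person
CREATE TABLE dbo.PersonDemo (FirstName nvarchar(50) NOT NULL, LastName nvarchar(50) NOT NULL);
INSERT INTO dbo.PersonDemo (FirstName, LastName)
VALUES (N'Edmund', N'Smith'), (N'Bill', N'Jones'), (N'd''Artagnan', N'Dumas');

DECLARE @FirstName nvarchar(50) = N'd''Artagnan';   -- the value that breaks string building

-- 1. The expression approach: the value is spliced into the statement text.
--    The embedded quote ends the literal early, so the text is not valid T-SQL
--    and EXEC raises a syntax error that the CATCH block reports.
DECLARE @adhoc nvarchar(max) =
    N'SELECT FirstName, LastName FROM dbo.PersonDemo WHERE FirstName = '''
    + @FirstName + N'''';
PRINT @adhoc;
BEGIN TRY
    EXEC (@adhoc);
END TRY
BEGIN CATCH
    PRINT N'Ad hoc text failed: ' + ERROR_MESSAGE();
END CATCH;

-- 2. The parameter approach: the statement text never changes and the value
--    travels separately, so the quote is just data. This is the shape the
--    OLE DB provider sends for a "?" placeholder mapped to an SSIS variable.
EXEC sp_executesql
    N'SELECT FirstName, LastName FROM dbo.PersonDemo WHERE FirstName = @p0',
    N'@p0 nvarchar(50)',
    @p0 = @FirstName;

-- 3. Plan cache check. Run steps 1 and 2 with several different values first:
--    the ad hoc text leaves one entry per distinct literal, while the
--    sp_executesql shape keeps one entry whose usecounts climbs.
SELECT cp.objtype, cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.PersonDemo%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';
```

Step 2 is exactly what the Parameters dialog sets up for you: the query text in the OLE DB source stays fixed and Parameter0 carries the variable's value, so neither the quoting failure nor the per-value plan copies occur.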

When using OLEDB Connection manager (with SQL Server Native Client 11.0 provider in my case) you can catch an error like this:
Parameters cannot be extracted from the SQL command. The provider
might not help to parse parameter information from the command. In
that case, use the "SQL command from variable" access mode, in which
the entire SQL command is stored in a variable.
So you need to explicitly specify the database name in the OLEDB Connection manager properties. Otherwise SQL Server Native Client can use a different database name than you mean (e.g. master in MSSQL Server).
For some cases you can explicitly specify database name for each database object used in query, e.g.:
select Name
from MyDatabase.MySchema.MyTable
where id = ?

No columns returned SSIS

I am implementing a SSIS package and currently trying to do the following.
Truncate the destination table
Fetch the data by executing the stored procedure and insert it into the destination table.
I have created an Execute SQL task to address step 1 and a data flow with an OLE DB source and OLE DB destination to address the second point. It has been working successfully so far but isn't working for one of my stored procedures that uses temp tables.
When I edit the oledb source and click the preview button, I get the error no column returned
I know that SSIS has an issue with generating columns while executing stored procedures that depend on temp tables. I have converted the stored proc to use table variables and it's now able to return columns in SSIS when I do a preview. The only downside is that the stored procedure is taking a longer time to execute. It's taking 1 hour 15 mins as compared to 15 mins while using temp tables.
I did see a suggestion to use SET FMTONLY before executing the stored procedure as an alternate solution to changing to temp table variables but that didn't seem to work as I am getting syntax or permission denied error.
Could somebody tell me a solution to my problem which does not compromise on the performance.
Sounds like you've already read all the approaches to using Temp tables in SSIS, including the IF 1=0... trick? If you haven't seen that one yet, google it.
You say that using Table Variables causes your stored procedure to take about 5 times longer than using Temp Tables. The most likely reason for that is that you are indexing your temp tables but not your table variables. If you didn't know that table variables can be indexed, they can. You might try that.
Finally, a solution that you haven't mentioned is that you can replace your temporary table with a real table that gets truncated when you're done using it.
Short comment:
Try EXEC WITH RESULT SETS and specify the metadata yourself for a proc with temp tables; or use the Script Component as a source and specify the Output columns yourself.
Long comment:
Technically speaking, it is the driver/database you are using in SSIS that would decide the behavior when working with temp tables.
Metadata is an important factor when using SSIS's pipeline components. By metadata, I mean the names of the columns, their data types etc that a pipeline component uses. When designing a data flow, someone/something should provide this metadata to the components that require it.
In most cases, SSIS automatically retrieves the metadata. Components that do not connect to an external data source, like Conditional Split etc, get their metadata from the other components they are connected to. For the pipeline components that connect to an external data source (like Oledb source, oledb destination, Lookup etc.), SSIS provides a mechanism to get this metadata without human involvement. This mechanism involves the driver connecting to the database and retrieving the metadata of the output. If the driver/database is capable of returning the metadata, then that metadata is used. If the driver/database is incapable, then you get the errors you are seeing. The rest of my comments are based on the assumption that you are using a SQL Server database in your question.
When working with a SQL Server database in SSIS, typically, we use the native client drivers provided by Microsoft. When trying to get the metadata, these drivers try to get the metadata without actually executing the SQL Statement (actual execution can have side effects; and also, might take more than a few seconds/minutes/hours; and you don't want side effects and long wait times during package design time.) So to get the metadata, the driver relies on the metadata of the actual objects used in the sql command. If the command uses a physical table or view, SQL Server already has the metadata available and can supply it to the driver. If it is a temp table, SQL Server does not have the metadata until it can create the temp table. If using FMT ONLY option, you can use it in such a way to create the temp tables, but avoid any heavy processing/side effects and thus be able to retrieve metadata without penalties. Post 2012, these native client drivers rely on some newer functionality to retrieve metadata than the drivers before 2012. In 2012 and after, the driver uses the sp_describe_first_result_set proc to retrieve metadata. So, whether you can get metadata or not is determined by the ability of the sp_describe_first_result_set proc.
So while SSIS can automatically get the metadata (because of the driver/database), it does not automatically get the metadata in some cases (again because of the driver/database). In cases involving the second scenario, some other process (typically a human) can help the driver infer metadata or provide the metadata to the component directly.
To help the driver, in case of SQL Server 2012 and after, you can use the WITH RESULT SETS clause to specify the output metadata. When this clause is present, the driver will use it and doesn't try to query the metadata from system objects; and thus avoid the error which you would otherwise get. If you are using the drivers that came with SQL Server 2008, you can use FMT ONLY. This option is at the driver/database level.
Another option could be to use a Script Component as the Source and in the Output columns, you can specify the columns/metadata. SSIS would not try to retrieve metadata from the datasource in this case, but would rely on the definitions you provided in the Output section of the Script Component.
As you can see, both options involve a human (or some other process) specifying the metadata instead of SSIS trying to retrieve the metadata in an automated fashion. I would prefer the first option if working with SQL Server and the second option if working with databases like MySql.
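As an added illustration of the metadata problem and the WITH RESULT SETS fix for SQL Server 2012 and later (this is not code from the answer above or from the asker's package): a constructed procedure dbo.TempTableDemo stages its output in a temp table, the first sp_describe_first_result_set call shows what the native client asks for behind the Preview button and why it fails, and the second shows the declared-shape command text that can be pasted into the OLE DB source as its SQL command. Procedure and column names are constructed for the example.

```sql
-- Constructed procedure that stages its result in a temp table (not the asker's proc)
CREATE PROCEDURE dbo.TempTableDemo
AS
BEGIN
    SET NOCOUNT ON;
    CREATE TABLE #work (OrderId int NOT NULL, CustomerName nvarchar(50) NOT NULL);
    INSERT INTO #work (OrderId, CustomerName) VALUES (1, N'Edmund'), (2, N'Bill');
    SELECT OrderId, CustomerName FROM #work;
END;
GO

-- What a 2012+ native client asks SQL Server behind the Preview button: describe the
-- first result set WITHOUT executing the batch. #work does not exist until the proc
-- actually runs, so this call raises an error and SSIS reports that no columns were returned.
EXEC sp_describe_first_result_set N'EXEC dbo.TempTableDemo';
GO

-- Declaring the shape up front removes the need to look inside the proc. This call
-- succeeds, and the same text is what you put in the OLE DB source as its SQL command:
-- the data flow then sees two typed columns without the temp table being touched.
EXEC sp_describe_first_result_set
    N'EXEC dbo.TempTableDemo WITH RESULT SETS ((OrderId int, CustomerName nvarchar(50)))';
GO
```

The column list in WITH RESULT SETS must match the procedure's SELECT in column count, and each value must be convertible to the declared type; if the procedure's output later changes, the clause has to change with it or the package fails at run time rather than at design time.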

Is there any replacement for the InstrRev function in Microsoft Access 2016 for a calculated column?

I am attempting to create several calculated columns in a table with different parts of a parsed filename. Using the InstrRev function is critical to isolate the base file name or extension, but InstrRev is not supported in calculated columns.
I know that there are other ways to solve my problem that don't use calculated columns, but does anyone have a valid calculated column formula that could help me?
Access lets you use VBA functions (including user-defined functions) directly from within a SQL query - however they only work within an Access context - if you have another frontend for a JET (now ACE) database - or inside a computed/calculated column, they won't work - as you've just discovered.
Unfortunately Access (JET and ACE) have only a very meagre and anaemic selection of built-in functions, and the platform has now lagged-behind SQL Server (and even the open-source SQLite) significantly - Access 2016 has not made significant changes to its SQL implementation since Access 2000 (16 years of stagnation!) whereas SQL Server 2016's T-SQL language is so evolved it's almost unrecognizable compared to SQL Server 2000.
JET and ACE support the standard ODBC functions ( https://msdn.microsoft.com/en-us/library/bb208907(v=office.12).aspx ) however none of these perform a "reverse index-of" operation. Also absent is any form of pattern-matching function - though the LIKE operator works, it only returns a boolean result, not a character index.
In short: what you want to do is impossible.
This has been discovered by many people before you:
https://social.msdn.microsoft.com/Forums/office/en-US/6cf82b1b-8e74-4ac8-9997-61cad8bb9310/access-database-engine-incompatible-with-instrrev?forum=accessdev
He maintains a list of DAO/Jet/etc reserved words - and on that list you will see the InstrRev is a VBA() function, and is not a part of the Jet/Ace Engines.
using InStrRev() and similar functions in Jet/ACE queries outside of Access
As you have discovered, SQL queries executed from within Access can use many VBA functions that are not natively supported by the Jet/ACE dialect of SQL
That said, computed/calculated columns are only really of use in stored VIEW objects ("Queries" objects in Access parlance) - which in turn are used for user convenience, not for any programming advantage - especially as these are scalar functions that are evaluated for every row of data that the engine processes (making them potentially very expensive and inefficient to run).
...so the only real solution is to abandon computed/calculated columns and perform this processing in your own application code - but the advantage is that your program will likely be significantly faster.
...or don't use Access and switch to a different DBMS with better active support, such as SQLite (for an in-process database), SQL Server (now with LocalDb for in-process support), or VistaDB (proprietary, but 100% Managed code). Note that Access also supports acting as a front-end for a SQL Server "backend" data-store - where you could create a VIEW that performs this operation, then query the view from your Access code or other consuming client.
There is a workaround if you must: Create a duplicate column that contains the string-reversed value of your original column, then you can evaluate the ODBC LOCATE or JET SQL InStr functions on it and get the result you want (albeit, reversed) - but this would require double the storage space.
e.g.
RowId, FileName , FileNameRev
1 , 'Foo.txt', 'txt.ooF'
2 , 'Bar.txt', 'txt.raB'
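The reversed-column workaround carried through to the base name and extension, as an added example (not the answer's own code): a constructed FileList table with the FileName and FileNameRev columns from the table above, an UPDATE that fills the reversed copy from inside Access (StrReverse is a VBA function, so it is available to a query run by Access but not to a calculated column or an external ODBC client), and a query that recovers both parts using only InStr, Len, Left and Mid, including the case of a name with no dot. Table name and rows are constructed; each statement is a separate Access query.

```sql
-- Constructed table for the example
CREATE TABLE FileList (RowId LONG, FileName TEXT(255), FileNameRev TEXT(255));
INSERT INTO FileList (RowId, FileName) VALUES (1, 'Foo.txt');
INSERT INTO FileList (RowId, FileName) VALUES (2, 'Bar.txt');
INSERT INTO FileList (RowId, FileName) VALUES (3, 'archive.tar.gz');
INSERT INTO FileList (RowId, FileName) VALUES (4, 'README');

-- Fill the reversed copy. StrReverse is VBA, so run this from within Access;
-- it has to be re-run (or done by the application) whenever FileName changes.
UPDATE FileList SET FileNameRev = StrReverse(FileName);

-- InStr on the reversed string finds the LAST dot of the original name, which is
-- what InStrRev would have given. When there is no dot InStr returns 0, so BaseName
-- becomes the whole name and Extension becomes an empty string (Mid past the end).
SELECT RowId, FileName,
       Left(FileName, Len(FileName) - InStr(FileNameRev, ".")) AS BaseName,
       Mid(FileName, Len(FileName) - InStr(FileNameRev, ".") + 2) AS Extension
FROM FileList;
```

For 'archive.tar.gz' the reversed string 'zg.rat.evihcra' has its first dot at position 3, so BaseName is the first 11 characters ('archive.tar') and Extension starts at position 13 ('gz'); the arithmetic is the same for every row, which is why it can live in a saved query rather than in application code.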
Avoid any calculated field. It's a "super user" feature only, that will cause you nothing but trouble. Calculated fields - or expressions - belong in a query.
So create a simple select query:
Select
*,
InStrRev([FieldToCheck], "YourMatchingString") As StringMatch
From
YourTable
Save the query, and then use this whenever you need the table values and this expression.

What is the best tool to use to transfer Data from Reporting Database to another?

I have a reporting database and have to transfer data from that to another server where we run some other reports or functions on the data. What is the best way to transfer data periodically, like monthly or bi-weekly? I can use SSIS, but is there any way I can put some where clause on what rows should be extracted from the source database? Like I only want to extract data for the current month. Please do let me know.
Thanks,
Vivek
For scheduling periodic extractions, I'd leave that to SQL Agent.
As for restricting the results by some condition, that's an easy thing. Instead of this (and you should always use SQL Command or SQL Command From Variable over Table Name/Table Name From Variable as they are faster)
Add a parameter. If you use the OLE DB connection manager, your indicator for a variable is ?. ADO.NET will be @parameterName
Now, wire the filter up by clicking the Parameters... button. With OLE DB, it's ordinal position starting at 0. If you wanted to use the same parameter twice, you will have to list it each time or use the ADO.NET connection manager.
The biggest question you will have to answer is how do I identify what row(s) need to go. Possibilities are endless: query into the target database and find most recent modified date for a table or highest key value. You could create a local table that tracks what's been sent and query that. You could perform an incremental load / ETL Instrumentation to identify new/updated/unchanged rows, etc.
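An added T-SQL sketch of the two filter patterns the answer mentions, written for this page rather than taken from the answer or from the asker's packages: the current-month filter, where the same OLE DB ? placeholder has to be listed twice, and an incremental load driven by a small watermark table on the destination server. Table, column and variable names (dbo.Sales, dbo.ExtractWatermark, User::RunDate, User::LastModified, User::ExtractUntil) are constructed.

```sql
-- Pattern A: current month only, as an OLE DB Source "SQL command".
-- The OLE DB connection manager needs one mapping per "?", so map User::RunDate to
-- both Parameter0 and Parameter1. The CAST tells the provider the parameter's type;
-- without it, parameter extraction can fail for a "?" nested inside a function call.
SELECT SaleId, CustomerId, Amount, ModifiedDate
FROM dbo.Sales
WHERE ModifiedDate >= DATEADD(month, DATEDIFF(month, 0, CAST(? AS datetime)), 0)       -- first day of the month
  AND ModifiedDate <  DATEADD(month, DATEDIFF(month, 0, CAST(? AS datetime)) + 1, 0);  -- first day of next month

-- Pattern B: incremental load. A constructed watermark table on the destination server
-- records how far each source table has been copied.
CREATE TABLE dbo.ExtractWatermark
(
    SourceTable  sysname      NOT NULL PRIMARY KEY,
    LastModified datetime2(0) NOT NULL
);
INSERT INTO dbo.ExtractWatermark (SourceTable, LastModified) VALUES (N'dbo.Sales', '1900-01-01');

-- Execute SQL Task on the destination (ResultSet = Single row) -> User::LastModified
SELECT LastModified FROM dbo.ExtractWatermark WHERE SourceTable = N'dbo.Sales';

-- Execute SQL Task on the source (ResultSet = Single row) -> User::ExtractUntil.
-- Taking the upper bound from the source server's own clock avoids skew between the two servers.
SELECT SYSDATETIME() AS ExtractUntil;

-- OLE DB Source on the source: Parameter0 = User::LastModified, Parameter1 = User::ExtractUntil.
-- The upper bound means rows modified while the load runs are picked up by the next run.
SELECT SaleId, CustomerId, Amount, ModifiedDate
FROM dbo.Sales
WHERE ModifiedDate > ? AND ModifiedDate <= ?;

-- Execute SQL Task on the destination, run only after the data flow succeeds:
-- Parameter0 = User::ExtractUntil
UPDATE dbo.ExtractWatermark SET LastModified = ? WHERE SourceTable = N'dbo.Sales';
```

Pattern B only works when ModifiedDate is maintained reliably on the source (a trigger or application code that sets it on every insert and update); if it is not, the answer's other suggestions (highest key value, or a full comparison against the target) are the fallback.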

MySQL Injection - Use SELECT query to UPDATE/DELETE

I've got one easy question: say there is a site with a query like:
SELECT id, name, message FROM messages WHERE id = $_GET['q'].
Is there any way to get something updated/deleted in the database (MySQL)? Until now I've never seen an injection that was able to delete/update using a SELECT query, so, is it even possible?
Before directly answering the question, it's worth noting that even if all an attacker can do is read data that he shouldn't be able to, that's usually still really bad. Consider that by using JOINs and SELECTing from system tables (like mysql.innodb_table_stats), an attacker who starts with a SELECT injection and no other knowledge of your database can map your schema and then exfiltrate the entirety of the data that you have in MySQL. For the vast majority of databases and applications, that already represents a catastrophic security hole.
But to answer the question directly: there are a few ways that I know of by which injection into a MySQL SELECT can be used to modify data. Fortunately, they all require reasonably unusual circumstances to be possible. All example injections below are given relative to the example injectable query from the question:
SELECT id, name, message FROM messages WHERE id = $_GET['q']
1. "Stacked" or "batched" queries.
The classic injection technique of just putting an entire other statement after the one being injected into. As suggested in another answer here, you could set $_GET['q'] to 1; DELETE FROM users; -- so that the query forms two statements which get executed consecutively, the second of which deletes everything in the users table.
In mitigation
Most MySQL connectors - notably including PHP's (deprecated) mysql_* and (non-deprecated) mysqli_* functions - don't support stacked or batched queries at all, so this kind of attack just plain doesn't work. However, some do - notably including PHP's PDO connector (although the support can be disabled to increase security).
2. Exploiting user-defined functions
Functions can be called from a SELECT, and can alter data. If a data-altering function has been created in the database, you could make the SELECT call it, for instance by passing 0 OR SOME_FUNCTION_NAME() as the value of $_GET['q'].
In mitigation
Most databases don't contain any user-defined functions - let alone data-altering ones - and so offer no opportunity at all to perform this sort of exploit.
3. Writing to files
As described in Muhaimin Dzulfakar's (somewhat presumptuously named) paper Advanced MySQL Exploitation, you can use INTO OUTFILE or INTO DUMPFILE clauses on a MySQL select to dump the result into a file. Since, by using a UNION, any arbitrary result can be SELECTed, this allows writing new files with arbitrary content at any location that the user running mysqld can access. Conceivably this can be exploited not merely to modify data in the MySQL database, but to get shell access to the server on which it is running - for instance, by writing a PHP script to the webroot and then making a request to it, if the MySQL server is co-hosted with a PHP server.
In mitigation
Lots of factors reduce the practical exploitability of this otherwise impressive-sounding attack:
MySQL will never let you use INTO OUTFILE or INTO DUMPFILE to overwrite an existing file, nor write to a folder that doesn't exist. This prevents attacks like creating a .ssh folder with a private key in the mysql user's home directory and then SSHing in, or overwriting the mysqld binary itself with a malicious version and waiting for a server restart.
Any halfway decent installation package will set up a special user (typically named mysql) to run mysqld, and give that user only very limited permissions. As such, it shouldn't be able to write to most locations on the file system - and certainly shouldn't ordinarily be able to do things like write to a web application's webroot.
Modern installations of MySQL come with --secure-file-priv set by default, preventing MySQL from writing to anywhere other than a designated data import/export directory and thereby rendering this attack almost completely impotent... unless the owner of the server has deliberately disabled it. Fortunately, nobody would ever just completely disable a security feature like that since that would obviously be - oh wait never mind.
4. Calling the sys_exec() function from lib_mysqludf_sys to run arbitrary shell commands
There's a MySQL extension called lib_mysqludf_sys that - judging from its stars on GitHub and a quick Stack Overflow search - has at least a few hundred users. It adds a function called sys_exec that runs shell commands. As noted in #2, functions can be called from within a SELECT; the implications are hopefully obvious. To quote from the source, this function "can be a security hazard".
In mitigation
Most systems don't have this extension installed.
If you say you use mysql_query that doesn't support multiple queries, you cannot directly add DELETE/UPDATE/INSERT, but it's possible to modify data under some circumstances. For example, let's say you have the following function
DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `testP`()
RETURNS int(11)
LANGUAGE SQL
NOT DETERMINISTIC
MODIFIES SQL DATA
SQL SECURITY DEFINER
COMMENT ''
BEGIN
DELETE FROM test2;
return 1;
END //
Now you can call this function in SELECT :
SELECT id, name, message FROM messages WHERE id = NULL OR testP()
(id = NULL is always NULL (FALSE), so testP() always gets executed.)
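The four routes described in the long answer above and the testP example in this one each depend on something being present on the server: a data-modifying stored function, a loadable UDF such as sys_exec, a writable secure_file_priv location, or an over-privileged application account. The following MySQL statements are an added audit checklist (not code from either answer) that a defender can run as an administrative user to confirm which of those preconditions actually hold; 'webapp'@'localhost' and appdb.messages are constructed names.

```sql
-- (1) Stored functions a SELECT can call to change data, like testP above.
--     SECURITY_TYPE = DEFINER means the function runs with the creator's rights,
--     not the caller's, which is what makes it usable from an injected SELECT.
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, SQL_DATA_ACCESS, SECURITY_TYPE, DEFINER
FROM information_schema.ROUTINES
WHERE ROUTINE_TYPE = 'FUNCTION'
  AND SQL_DATA_ACCESS = 'MODIFIES SQL DATA';

-- (2) Loadable (UDF) functions register in mysql.func; sys_exec from
--     lib_mysqludf_sys would show up here. An empty result means route 4 is closed.
SELECT name, dl, type FROM mysql.func;

-- (3) INTO OUTFILE / INTO DUMPFILE can only write below this directory;
--     NULL disables them entirely, an empty string means no restriction.
SHOW VARIABLES LIKE 'secure_file_priv';

-- (4) The account the web application connects with. It should hold neither FILE
--     (needed for route 3) nor CREATE ROUTINE (needed to add a function for route 2),
--     and only the table privileges its own queries need.
SHOW GRANTS FOR 'webapp'@'localhost';

-- A least-privilege grant for that account, for comparison with the output of (4).
-- The query in the question only reads messages, so SELECT on that table is enough.
REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'webapp'@'localhost';
GRANT SELECT ON appdb.messages TO 'webapp'@'localhost';
```

None of this replaces parameterized queries in the application; it limits what an injection can reach if one slips through, which is the layer the mitigations in the answer above are describing.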
It depends on the DBMS connector you are using. Most of the time your scenario should not be possible, but under certain circumstances it could work. For further details you should take a look at chapter 4 and 5 from the Blackhat-Paper Advanced MySQL Exploitation.
Yes it's possible.
$_GET['q'] would hold 1; DELETE FROM users; --
SELECT id, name, message FROM messages WHERE id = 1; DELETE FROM users; -- whatever here');

Have I found an SQL injection bug in SQL server?

So I was playing with my MS SQL Server 2008 app to see how good it is protected against SQL injections. The app lets users to create views in the database.
Now consider the following:
create view dbo.[]]; drop database foo--] as select 1 as [hi!]
This creates a view with a name of ]; drop database foo--. It is valid and you can select from it (returns the number 1, obviously).
Strange thing #1:
In SQL Management Studio, the query SELECT [hi!] FROM [dbo].[]]; drop database foo--] is red-underlined as incorrect, claiming that the object name is not valid. Nevertheless, it executes and returns the 1.
Strange thing #2:
Call to OBJECT_ID(']; drop database foo--') yields NULL (which means the object does not exist), but the following query returns information about the view properly:
select * from sys.objects where name = ']; drop database foo--';
Are those bugs or am I missing a point?
You're missing the point. SQL Server can't protect itself against SQL injection - if somebody has direct access to your database then you've already been pwned. It's your application that needs to protect against SQL injection by parameterizing queries, and preventing these kinds of statements from ever making it to the database.
1: that only means the IntelliSense parser is not up to par with the finer details of SQL syntax. While it may be an IntelliSense bug, it is not an injection vector.
2: object_id() accepts multipart names, so it needs the name in quotes if ambiguous: select object_id('[]]; drop database foo--]')
That's like using your key to get into your car and then saying "hey there's a security hole, I'm allowed to steal the radio"
It seems the problem is that you are yourself causing SQL injection by accepting user input and using it as SQL statement text.
The fact that you "properly escaped" the ] (by substituting with ]]) really doesn't matter - it's you allowing the user input to be used as anything else but a value by definition means you allow SQL injection.
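Since the application in the question builds CREATE VIEW statements from user-supplied names, the relevant defence is on the application side, and the two answers above describe it. As an added T-SQL example (not code from the question or either answer), this shows how QUOTENAME keeps the name from the question inside a single identifier when it is spliced into DDL, and why OBJECT_ID needs the same quoting while sys.objects does not; the view name is the one from the question and the rest is constructed.

```sql
-- The name from the question, as a user would have typed it
DECLARE @UserSuppliedName sysname = N']; drop database foo--';

-- QUOTENAME doubles every ] and wraps the whole value in brackets, so the DDL
-- contains one bracketed identifier and nothing after it is parsed as a statement.
-- Splicing @UserSuppliedName in raw is the injection the answers describe.
DECLARE @sql nvarchar(max) =
    N'CREATE VIEW dbo.' + QUOTENAME(@UserSuppliedName) + N' AS SELECT 1 AS [hi!];';
PRINT @sql;
EXEC sp_executesql @sql;

-- OBJECT_ID parses its argument as a (possibly multipart) object name, so it has to
-- be quoted the same way; sys.objects.name holds the bare identifier, so it is compared raw.
SELECT OBJECT_ID(N'dbo.' + QUOTENAME(@UserSuppliedName))                        AS object_id_quoted,
       OBJECT_ID(@UserSuppliedName)                                               AS object_id_raw,
       (SELECT name FROM sys.objects WHERE name = @UserSuppliedName)              AS catalog_name;

-- Clean up the constructed view
EXEC sp_executesql N'DROP VIEW dbo.[]]; drop database foo--];';
```

QUOTENAME returns NULL for input longer than 128 characters, which makes the whole @sql NULL and the EXEC a no-op, so check for NULL before executing. In practice an application that lets users name views should also restrict names to an allow-list of characters before any of this; quoting makes the identifier safe to splice, it does not make an arbitrary name a good idea.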