I'm building a SQL statement like the one below in a Rails app:
bank_ids = params[:bank_ids] # comes from end user or simply, is a user input.
sql_string = "SELECT * FROM users WHERE bank_id IN (#{bank_ids});"
Is the SQL statement above vulnerable to an injection attack, given that the input bank_ids is end-user controlled?
Take as an example a table with a boolean column that tells whether a user is an admin (it might not be designed this way in practice, but it's an example):
Table "public.users"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------+--------------------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('users_id_seq'::regclass) | plain | |
name | character varying | | | | extended | |
admin | boolean | | | | plain | |
bank_id | integer | | | | plain | |
If you receive something like this:
'1) or id IN (select id from users where admin = true'
and it's interpolated afterwards, then the injected subquery asking for the admin users will retrieve data that otherwise wouldn't appear. The query would be executed as built:
select * from users where bank_id IN (1) or id IN (select id from users where admin = true)
It's better to rely on the ORM you have at hand and let it do the sanitization and proper bindings for you (it's one of the reasons these tools exist). ActiveRecord, for example, will bind the passed values for you without much effort:
User.where(bank_id: '1) or id IN (select id from users where admin = true')
# ... SELECT "users".* FROM "users" WHERE "users"."bank_id" = $1 [["bank_id", 1]]
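The same principle applies outside Rails: any driver's bound parameters keep user input as data. A minimal sketch using Python's standard sqlite3 module (the table and values are illustrative, not taken from the app above) shows the malicious string matching nothing instead of being parsed as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, admin BOOLEAN, bank_id INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 1, 1), (2, 0, 2)")

payload = "1) or id IN (select id from users where admin = true"

# Bound parameter: the payload is compared as a value, never parsed as SQL.
rows = conn.execute("SELECT * FROM users WHERE bank_id IN (?)", (payload,)).fetchall()
print(rows)  # [] -- no bank_id equals the malicious string
```

For a real list of ids, pass the array itself (e.g. `User.where(bank_id: bank_ids)` in ActiveRecord), which expands to a safely quoted IN list.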
Data to work with:
+-------------+-------------+
| user | host |
+-------------+-------------+
| user1 | host1 | -
| user1 | ip1 | -
| user1 | host2 | *
| user2 | host2 | -
| user2 | ip2 | -
| unknown | unknown | +
| user1 | unknown | +
| unknown | host | +
+-------------+-------------+
The symbols to the right of the table mean:
- : do not show
+ : show, as they are unknown
* : show, because a user can only connect to one host; had I authorised it, I would supply the user/host pair to the call as well and it would not show.
That's how I want things to work, anyway.
This is where I am at, with help from my earlier question; as there is now a further condition, a new question needs asking.
Current procedure
USE mysql;
DROP PROCEDURE IF EXISTS ShowUsers;
DELIMITER $
CREATE PROCEDURE `ShowUsers`(
IN KnownUsers varchar(500),
IN KnownHosts varchar(500)
)
BEGIN
SELECT
user,host
FROM
user
WHERE
NOT FIND_IN_SET(host, KnownHosts)
AND
NOT FIND_IN_SET(user, KnownUsers)
ORDER BY user, host ASC;
END $
DELIMITER ;
Calling the procedure like this
# known users and known hostnames or ips to match and exclude from results.
SET @Usernames = 'user1,user2';
SET @Hostnames = 'host1,host2,ip1,ip2';
CALL ShowUsers(@Usernames, @Hostnames);
Intended Result:
+-------------+-------------+
| user | host |
+-------------+-------------+
| user1 | host2 | *
| unknown | unknown | +
| user1 | unknown | +
| unknown | host | +
+-------------+-------------+
I want to be able to supply multiple user:host pairs of (known legitimate credentials) and return results that do not match, so return only suspect/illegitimate credentials in the query results.
I have created a fiddle https://www.db-fiddle.com/f/xb7dWXbkokHGbcPdzR7BUa/4 Hopefully you can see where I am going with this.
From what I could understand of your problem statement, you will need several string operations to satisfy your conditions (explanation in the inline comments below):
Query
SELECT
`user`,`host`
FROM
tbl
WHERE
-- NOT condition to avoid returning a one-to-one mapping between `user` and `host`:
-- if `user` exists in @Usernames, and the position of the
-- `user` matches the position of the `host` in @Hostnames
NOT (
FIND_IN_SET(`user`, @Usernames) > 0
-- host and user are at the same position in the lists
AND FIND_IN_SET(`user`, @Usernames) = FIND_IN_SET(`host`, @Hostnames)
)
AND
-- NOT condition to handle a `host` at the end of the @Hostnames list
-- that has no corresponding `user` mapped to it
NOT (
FIND_IN_SET(`host`, @Hostnames) > CHAR_LENGTH(@Usernames)
- CHAR_LENGTH(REPLACE(@Usernames, ',', ''))
+ 1
);
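To see what the two NOT conditions do, here is a small Python simulation of FIND_IN_SET (1-based position in a comma list, 0 if absent) applied to the sample data. This only models the MySQL logic above; it is not production code:

```python
def find_in_set(value, csv):
    """Mimic MySQL FIND_IN_SET: 1-based position in a comma list, 0 if absent."""
    items = csv.split(",")
    return items.index(value) + 1 if value in items else 0

usernames = "user1,user2"
hostnames = "host1,host2,ip1,ip2"

rows = [("user1", "host1"), ("user1", "ip1"), ("user1", "host2"),
        ("user2", "host2"), ("user2", "ip2"), ("unknown", "unknown"),
        ("user1", "unknown"), ("unknown", "host")]

# Same count as the CHAR_LENGTH/REPLACE trick in the SQL version.
user_count = usernames.count(",") + 1

result = [(u, h) for u, h in rows
          # exclude exact positional user/host pairs
          if not (find_in_set(u, usernames) > 0
                  and find_in_set(u, usernames) == find_in_set(h, hostnames))
          # exclude known hosts beyond the last user position
          and not (find_in_set(h, hostnames) > user_count)]
print(result)  # matches the Intended Result table
```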
Result
| user | host |
| ------- | ------- |
| user1 | host2 |
| unknown | unknown |
| user1 | unknown |
| unknown | host |
View on DB Fiddle
Caveat: the above query will not work when there is no user in the @Usernames list. For brevity, I avoided making the conditions more complex to handle that. Moreover, I doubt that in your practical use case you would have a situation where there are no users in the list.
This construct works (but not efficiently for huge tables):
WHERE (user, host) NOT IN ( ('u1', 'h1'), ('u2', 'h2'), ... )
For further discussion, see "row constructor".
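The row-constructor approach can be tried out with Python's standard sqlite3 module, since SQLite (3.15+) also supports row values; the known pairs below are the ones from the question's data, not a generic API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user TEXT, host TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)", [
    ("user1", "host1"), ("user1", "ip1"), ("user1", "host2"),
    ("user2", "host2"), ("user2", "ip2"), ("unknown", "unknown"),
    ("user1", "unknown"), ("unknown", "host"),
])

# Row-constructor NOT IN: exclude the exact known (user, host) pairs.
rows = conn.execute("""
    SELECT user, host FROM user
    WHERE (user, host) NOT IN (VALUES ('user1','host1'), ('user1','ip1'),
                                      ('user2','host2'), ('user2','ip2'))
""").fetchall()
print(rows)  # only the suspect pairs remain
</```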
I have a table of users rights. These rights can be Write or Read. I have a SQL view which merges profile_user table and right_user table. This view give me the following results:
+-------+-----------+---------------+
| email | right | write_or_read |
+-------+-----------+---------------+
| admin | dashboard | write |
+-------+-----------+---------------+
| admin | dashboard | read |
+-------+-----------+---------------+
| admin | log | read |
+-------+-----------+---------------+
How can I deduplicate the dashboard rows to keep only the most important right (the write right)?
I want to write an SQL query which gives me the following result:
+-------+-----------+---------------+
| admin | dashboard | write |
+-------+-----------+---------------+
| admin | log | read |
+-------+-----------+---------------+
I read this question, but the answer only covers using a numeric id field to pick the right record.
In this case a simple max() works, because 'write' happens to sort after 'read' alphabetically (note that right is a reserved word in MySQL, so it needs backticks):
select email, `right`, max(write_or_read)
from user_rights
group by email, `right`;
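The reason max() works here is purely lexical: 'write' > 'read' as strings. A quick Python sketch of the same grouping makes that explicit (sample rows taken from the question):

```python
rows = [("admin", "dashboard", "write"),
        ("admin", "dashboard", "read"),
        ("admin", "log", "read")]

best = {}
for email, right, perm in rows:
    key = (email, right)
    # Keep the lexicographically greatest permission, like SQL MAX().
    best[key] = max(best.get(key, ""), perm)

print(best)  # {('admin', 'dashboard'): 'write', ('admin', 'log'): 'read'}
```

If the desired priority ever stops matching alphabetical order, this trick silently breaks.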
You can use a CASE expression to list the conditions in priority order. This works for more than two values as well:
SELECT email, `right`, CASE
WHEN SUM(write_or_read = 'write') > 0 THEN 'write'
WHEN SUM(write_or_read = 'read') > 0 THEN 'read'
-- more conditions
END AS permission
FROM t
GROUP BY email, `right`;
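The CASE approach amounts to an explicit priority list, which keeps working when the ordering is not alphabetical. A sketch of that idea (the PRIORITY list is illustrative; extend it with further values as needed):

```python
# Explicit priority, mirroring the CASE expression: lower index wins.
PRIORITY = ["write", "read"]  # add more values here, highest priority first

rows = [("admin", "dashboard", "write"),
        ("admin", "dashboard", "read"),
        ("admin", "log", "read")]

best = {}
for email, right, perm in rows:
    key = (email, right)
    if key not in best or PRIORITY.index(perm) < PRIORITY.index(best[key]):
        best[key] = perm

print(best)  # {('admin', 'dashboard'): 'write', ('admin', 'log'): 'read'}
```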
There is a table named customer, like this:
| Name | Age | Balance |
------------------------------
| Kevin | 25 | 150000 |
| Bob | 33 | 350000 |
| Anna | 27 | 200000 |
Simply, to select the "Name" column we can use:
SELECT Name FROM customer
Now, I want to do that by using a variable like this:
SET @temp = 'Name';
SELECT @temp FROM customer;
The result I get:
| @temp |
-----------
| Name |
The result I want is same like the normal select:
| Name |
----------
| Kevin |
| Bob |
| Anna |
I am expecting this to run the same as "SELECT Name FROM customer"; basically, to run the SELECT using the variable's value as the column name.
I also used a function's return value to do the same thing, but I get a similar result. For example, there is a function called CustName(Value):
SELECT CustName(A) -- returns: 'Name'
FROM customer;
This will give me result:
| CustName(A) |
-----------------
| Name |
Is there any way to make MySQL run "Name" as a column, just as when I write "SELECT Name FROM customer"?
What you're looking for is dynamic SQL. It's generally not a fabulous idea, as you're trying to vary a part of the query that the database wants to be fixed, for performance reasons. You'll also struggle to make use of your SQL in a client app: if it's expecting a string such as a username, but the user supplied 'birthday' as the thing to select, your client gets a date instead.
If you're hell bent on doing it, this SO post gives more detail: How To have Dynamic SQL in MySQL Stored Procedure
I must ask you to consider, though, that this is a broken solution you've devised to some other problem. It might be better to post that other problem, as solving it may prove more productive.
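If you do go the dynamic-SQL route, the one safety measure that always applies is whitelisting the identifier before interpolating it, since column names cannot be bound as parameters. A hedged sketch (the ALLOWED_COLUMNS set and build_query helper are made up for illustration):

```python
# Safe-ish dynamic column selection: whitelist the identifier first.
ALLOWED_COLUMNS = {"Name", "Age", "Balance"}  # the known columns of `customer`

def build_query(column: str) -> str:
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # Identifiers cannot be bound as parameters, so interpolate only
    # after validating against the whitelist.
    return f"SELECT `{column}` FROM customer"

print(build_query("Name"))  # SELECT `Name` FROM customer
```

Anything not in the whitelist, including injection attempts, is rejected before it ever reaches the database.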
I am trying to get the eNum (employee number) of whoever masters two values (MySQL and Python) in the same attribute column. The closest I have got is below, but the eNum is duplicated, and I want each eNum only once. I think I am messing up the WHERE clause... I don't know...
mysql> select * from employee_expert;
+------+---------+
| eNum | package |
+------+---------+
| E246 | Excel |
| E246 | MySQL |
| E246 | Python |
| E246 | Word |
| E403 | Jave |
| E403 | MySQL |
| E892 | Excel |
| E892 | PHP |
| E892 | Python |
+------+---------+
mysql> SELECT eNum, package
FROM employee_expert
WHERE (package = 'MySQL' OR package = 'Python') AND (package = 'MySQL' OR package = 'Python')
GROUP BY package;
+------+---------+
| eNum | package |
+------+---------+
| E246 | MySQL |
| E246 | Python |
+------+---------+
The WHERE clause contains an unnecessary duplication of the condition package = 'MySQL' OR package = 'Python'. Using WHERE (package = 'MySQL' OR package = 'Python') once is enough. Or, to make it more readable, you can write WHERE package IN ('MySQL', 'Python').
Your query selects the employees that know 'MySQL' or 'Python' or both.
It looks like you want to select the employees that know both 'MySQL' and 'Python'. You can use a self-join for this purpose:
SELECT f.eNum
FROM employee_expert f # 'f' from 'first'
INNER JOIN employee_expert s USING(eNum) # 's' from 'second'
WHERE f.package = 'MySQL'
AND s.package = 'Python'
Unfortunately, this approach does not scale very well if you need to match a larger set of languages. A better approach is to use the original query and group the results by eNum, like this:
SELECT eNum, COUNT(DISTINCT package) AS nbLangs
FROM employee_expert
WHERE package IN ('MySQL', 'Python') # <------------------------------------+
GROUP BY eNum # Make one entry for each employee |
HAVING nbLangs = 2 # Replace '2' with the number of items in this list --+
This query counts the number of known languages for each employee that knows at least one language in the list, then keeps only those who know all of them.
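The GROUP BY / HAVING logic can be traced in plain Python against the question's data; this is only a model of what the SQL does, using a set per employee to stand in for COUNT(DISTINCT package):

```python
from collections import defaultdict

rows = [("E246", "Excel"), ("E246", "MySQL"), ("E246", "Python"),
        ("E246", "Word"), ("E403", "Jave"), ("E403", "MySQL"),
        ("E892", "Excel"), ("E892", "PHP"), ("E892", "Python")]

wanted = {"MySQL", "Python"}

langs = defaultdict(set)
for enum, package in rows:
    if package in wanted:          # WHERE package IN (...)
        langs[enum].add(package)   # COUNT(DISTINCT package) via a set

# HAVING nbLangs = 2: keep employees knowing every wanted language.
result = [enum for enum, known in langs.items() if len(known) == len(wanted)]
print(result)  # ['E246']
```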
I think the problem is in the design itself. Come to think of it, an employee can master MANY packages and a package can be mastered by MANY employees: it's a many-to-many relationship, which in database terms produces a junction table, employee_package for example, whose primary key is composed of the primary keys of the two tables:
+------+------------+
| eNum | package_id |
+------+------------+
| E246 | 1 |
| E246 | 2 |
| E246 | 3 |
| E892 | 1 |
+------+------------+
Then your query will be something like:
SELECT DISTINCT e.eNum from employees e JOIN employee_package ep on ep.eNum = e.eNum
WHERE ep.package_id = 1 OR ep.package_id = 2
-- let's say that id 1 is for MySQL and id 2 is for Python
I'm thinking of using Redis to cache some user data snapshots in order to speed up access to that data (one of the reasons is that my MySQL tables suffer from lock contention), and I'm looking for the best way to import, in one step, a table like this (which may contain from a few records to millions):
mysql> select * from mytable where snapshot = 1133;
+------+--------------------------+----------------+-------------------+-----------+-----------+
| id | email | name | surname | operation | snapshot |
+------+--------------------------+----------------+-------------------+-----------+-----------+
| 2989 | example-2989@example.com | fake-name-2989 | fake-surname-2989 | 2 | 1133 |
| 2990 | example-2990@example.com | fake-name-2990 | fake-surname-2990 | 10 | 1133 |
| 2992 | example-2992@example.com | fake-name-2992 | fake-surname-2992 | 5 | 1133 |
| 2993 | example-2993@example.com | fake-name-2993 | fake-surname-2993 | 5 | 1133 |
| 2994 | example-2994@example.com | fake-name-2994 | fake-surname-2994 | 9 | 1133 |
| 2995 | example-2995@example.com | fake-name-2995 | fake-surname-2995 | 7 | 1133 |
| 2996 | example-2996@example.com | fake-name-2996 | fake-surname-2996 | 1 | 1133 |
+------+--------------------------+----------------+-------------------+-----------+-----------+
into the Redis key-value store.
I can have many "snapshots" to load into Redis, and the basic access pattern is (in SQL-like syntax):
select * from mytable where snapshot = ? and id = ?
These snapshots can also come from other tables, so the "globally unique ID per snapshot" is the snapshot column, e.g.:
mysql> select * from my_other_table where snapshot = 1134;
+------+--------------------------+----------------+-------------------+-----------+-----------+
| id | email | name | surname | operation | snapshot |
+------+--------------------------+----------------+-------------------+-----------+-----------+
| 2989 | example-2989@example.com | fake-name-2989 | fake-surname-2989 | 1 | 1134 |
| 2990 | example-2990@example.com | fake-name-2990 | fake-surname-2990 | 8 | 1134 |
| 2552 | example-2552@example.com | fake-name-2552 | fake-surname-2552 | 5 | 1134 |
+------+--------------------------+----------------+-------------------+-----------+-----------+
The snapshots loaded into Redis never change; they are available only for a week, via a TTL.
Is there a way to load this kind of data (rows and columns) into Redis in one step, combining redis-cli --pipe and HMSET?
What is the best model to use in redis in order to store/get this data (thinking at the access pattern)?
I have found redis-cli --pipe (Redis Mass Insertion, and also MySQL to Redis in One Step), but I can't figure out the best way to achieve my requirements (load all rows/columns from MySQL in one step, and the best Redis model for this) using HMSET.
Thanks in advance
Cristian.
Model
To be able to query your data from Redis the same way as:
select * from mytable where snapshot = ?
select * from mytable where id = ?
You'll need the model below.
Note: select * from mytable where snapshot = ? and id = ? does not make a lot of sense here, since it's the same as select * from mytable where id = ?.
Key type and naming
[Key Type] [Key name pattern]
HASH d:{id}
ZSET d:ByInsertionDate
SET d:BySnapshot:{id}
Note: I used d: as a namespace but you may want to rename it with the name of your domain model.
Data insertion
Insert a new line from Mysql into Redis:
hmset d:2989 id 2989 email example-2989@example.com name fake-name-2989 ... snapshot 1134
zadd d:ByInsertionDate {current_timestamp} d:2989
sadd d:BySnapshot:1134 d:2989
Another example:
hmset d:2990 id 2990 email example-2990@example.com name fake-name-2990 ... snapshot 1134
zadd d:ByInsertionDate {current_timestamp} d:2990
sadd d:BySnapshot:1134 d:2990
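One way to combine HMSET with redis-cli --pipe is to dump each MySQL row as the three commands above, encoded in the raw RESP protocol that the Redis mass-insertion guide recommends piping in. A hedged Python sketch (the helper names are made up, and ts stands in for current_timestamp, which is not a table column):

```python
def resp_command(*args):
    """Encode one command in the RESP protocol expected by redis-cli --pipe."""
    parts = [f"*{len(args)}\r\n"]
    for arg in args:
        s = str(arg)
        parts.append(f"${len(s.encode('utf-8'))}\r\n{s}\r\n")
    return "".join(parts)

def row_to_commands(row, ts):
    """Turn one MySQL row (as a dict) into the three commands of the model above."""
    key = f"d:{row['id']}"
    fields = [str(x) for pair in row.items() for x in pair]
    yield resp_command("HMSET", key, *fields)
    yield resp_command("ZADD", "d:ByInsertionDate", ts, key)
    yield resp_command("SADD", f"d:BySnapshot:{row['snapshot']}", key)

row = {"id": 2989, "email": "example-2989@example.com", "snapshot": 1134}
payload = "".join(row_to_commands(row, ts=1700000000))
# Feed the payload for the whole table into:  redis-cli --pipe
```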
Cron
Here is the algorithm that must be run each day or week, depending on your requirements:
for key_name in redis(ZREVRANGEBYSCORE d:ByInsertionDate -inf {timestamp_one_week_ago})
// retrieve the snapshot id from d:{id}
val snapshot_id = redis(hget {key_name} snapshot)
// remove the hash (d:{id})
redis(del key_name)
// remove the hash entry from the set
redis(srem d:BySnapshot:{snapshot_id} {key_name})
// clean the zset from expired keys
redis(zremrangebyscore d:ByInsertionDate -inf {timestamp_one_week_ago})
Usage
select * from my_other_table where snapshot = 1134; will be either:
{snapshot_id} = 1134
for key_name in redis(smembers d:BySnapshot:{snapshot_id})
print(redis(hgetall {key_name}))
or write a Lua script to do this directly on the Redis side. Finally:
select * from my_other_table where id = 2989; will be:
{id} = 2989
print(redis(hgetall d:{id}))
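The whole model can be sanity-checked without a Redis server by mocking the three key types with plain Python containers; this only mirrors the access patterns above, it is not a client implementation:

```python
# In-memory stand-ins: HASH -> dict, ZSET -> dict of scores, SET -> set.
hashes = {}             # d:{id} -> row fields
by_insertion_date = {}  # d:ByInsertionDate, member -> score
by_snapshot = {}        # d:BySnapshot:{id} -> set of members

def insert(row, ts):
    key = f"d:{row['id']}"
    hashes[key] = dict(row)        # HMSET d:{id} ...
    by_insertion_date[key] = ts    # ZADD d:ByInsertionDate ts key
    by_snapshot.setdefault(f"d:BySnapshot:{row['snapshot']}", set()).add(key)  # SADD

insert({"id": 2989, "name": "fake-name-2989", "snapshot": 1134}, ts=1)
insert({"id": 2990, "name": "fake-name-2990", "snapshot": 1134}, ts=2)

# select * from my_other_table where snapshot = 1134  ->  SMEMBERS + HGETALL
snapshot_rows = [hashes[k] for k in by_snapshot["d:BySnapshot:1134"]]

# select * from my_other_table where id = 2989  ->  HGETALL d:2989
one = hashes["d:2989"]
print(len(snapshot_rows), one["name"])
```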
Import
This part is quite easy: just read the table and follow the above model. Depending on your requirements, you may want to import all (or part) of your data with an hourly/daily/weekly cron.