How to handle non-ASCII characters in a WHERE clause - MySQL

I am having a problem with non-ASCII characters in a WHERE clause, in queries against Oracle, MySQL, and Snowflake.
SELECT * FROM TABLE WHERE col = 'Niño Pobre, Niño Rico';
This query returns no results.
Is there a way to handle non-ASCII characters in a WHERE clause?
Thanks.

Maurcin and user3278684 made comments about the Snowflake data warehouse.
In Snowflake, when working with data in multiple languages, the COLLATE function (and the related COLLATION function) is very helpful.
https://docs.snowflake.net/manuals/sql-reference/functions/collate.html
https://docs.snowflake.net/manuals/sql-reference/functions/collation.html
Considerations, limitations, and the built-in functions that support collation are listed here: https://docs.snowflake.net/manuals/sql-reference/collation.html#limited-support-for-collation-in-built-in-functions
So say, for instance, you have a table called feedback with two columns:
| id | feedback_string |
| 1 | 'Niño Pobre, Niño Rico' |
SELECT COLLATE(feedback_string, 'sp') FROM feedback
WHERE feedback_string LIKE '%Niño Pobre, Niño Rico%';
If you want a table whose strings are compared using the rules of a specific language, you can create the same table in Snowflake like this:
CREATE TABLE feedback (id NUMBER, feedback_string VARCHAR(100) COLLATE 'sp');
INSERT INTO feedback (id, feedback_string) VALUES (1, 'Niño Pobre, Niño Rico');
Then you can search with LIKE, but be aware that a search for N will treat it as close to ñ.

Related

MySQL Remove characters from column headers

All my column headers in a MySQL database are prefixed with a number, 1_X, 2_X, etc... which makes bringing the data into IDL impossible using just a basic select statement to bring in the entire table. I'm not sure but I see two possible ways:
1) Bring in the table with column name aliases. Can I use TRIM or SUBSTRING_INDEX to remove/replace the first two characters?
2) Create a routine that uses the information schema to loop through the column headers, delete the first two characters of each, create a new table with those headers, and copy the data in.
If there weren't so many different tables (all with 1_X, 2_X, etc...) there'd be no problem manually selecting 1_X AS X but that's not feasible. It would be great to be able to use TRIM/SUBSTRING on column headers in the select statement.
Thanks.
It's not possible to use functions in a SQL statement to alter the identifier assigned to a column being returned. The SQL way of specifying the identifier for the column in a resultset is to use the expr AS alias approach.
Rather than trim off the leading digit characters, you could prepend the identifiers with another valid character. (Trimming off leading characters seems like it would potentially lead to another problem, duplicate and/or zero length column names.)
You could just use a SQL statement to generate the SELECT list for you.
(NOTE: the GROUP_CONCAT function is limited by some system/session variables: group_concat_max_len and max_allowed_packet. It's easy enough to adjust these higher, though changing the global max_allowed_packet may require MySQL to be restarted.)
To get the SELECT list back all on one line (assuming you won't overflow the GROUP_CONCAT limits), something like:
SELECT c.table_schema
, c.table_name
, GROUP_CONCAT(
CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
ORDER BY c.ordinal_position
) AS select_list_expr
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
Or, you could even get back a whole SELECT statement, if you wrapped that GROUP_CONCAT expression (which produces the select list) in another CONCAT
Something like this:
SELECT CONCAT('SELECT '
, GROUP_CONCAT(
<select_list_expr>
)
, ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
) AS stmt
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
You could use a more clever expression for <select_list_expr>, to check for leading "digit" characters, and assign an alias to just those columns that need it, and leave the other columns unchanged, though that again introduces the potential for returning duplicate column names.
That is, if you already have columns named '1_X' and 'x1_X' in the same table. But a carefully chosen leading character may avoid that problem...
The <select_list_expr> could be more clever by doing a conditional test for leading digit character, something like this:
SELECT CONCAT('SELECT '
, GROUP_CONCAT(
CASE
WHEN c.column_name REGEXP '^[[:digit:]]'
THEN CONCAT('t.`',c.column_name,'` AS `x',c.column_name,'`')
ELSE CONCAT('t.`',c.column_name,'`')
END
)
, ' FROM `',c.table_schema,'`.`',c.table_name,'` t;'
) AS stmt
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
GROUP BY c.table_schema, c.table_name
Again, there's a potential for generating "duplicate" column names with this approach. The conditional test "c.column_name REGEXP" could be extended to check for other "invalid" leading characters as well.
As a side note, at some point, someone thought it a "good idea" to name columns with leading digit characters. Just because something is allowed doesn't mean it's a good idea.
Then again, maybe all that rigamarole isn't necessary, and just wrapping the column names in backticks would be sufficient for your application.
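The same select-list generation can also be done client-side once the column names are known. The following is a minimal Python sketch, not part of the original answer: the function names `quote_ident` and `build_select` and the default prefix `x` are illustrative assumptions. It mirrors the CASE logic above and raises an error when prefixing would produce a duplicate output column name.

```python
import re


def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_select(schema, table, columns, prefix="x"):
    """Build a SELECT that aliases only the columns starting with a digit.

    `columns` is a list of column names in ordinal order, for example as
    read from information_schema.columns. `prefix` is a hypothetical choice.
    """
    select_list = []
    seen = set()
    for name in columns:
        # Columns with a leading digit get the prefix; others are unchanged.
        output_name = prefix + name if re.match(r"[0-9]", name) else name
        key = output_name.lower()  # compare names case-insensitively
        if key in seen:
            raise ValueError(f"duplicate output column name: {output_name}")
        seen.add(key)
        expr = "t." + quote_ident(name)
        if output_name != name:
            expr += " AS " + quote_ident(output_name)
        select_list.append(expr)
    return (
        "SELECT " + ", ".join(select_list)
        + " FROM " + quote_ident(schema) + "." + quote_ident(table) + " t;"
    )


stmt = build_select("mydatabase", "tbl_client", ["1_X", "2_Y", "name"])
print(stmt)
```

A table that already contains both 1_X and x1_X would raise a ValueError here instead of silently returning two columns with the same name.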
I think you can follow option 2. However, this will not be a quick solution.
Another way around this could be:
1) Generate a schema script for the tables you want to correct.
2) Open the script in Notepad++ or any editor that supports find and replace with regular expressions.
3) Search for the expression [0-9]+_ and replace it with an empty string.
4) Create the new tables using this script and copy the data into them.
This may sound like a manual approach, but you only need to do it once for all of your tables.
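The search-and-replace step can also be scripted. Below is a small Python sketch under the assumption that identifiers in the schema script are backtick-quoted (as in typical MySQL schema dumps); the function name `strip_numeric_prefixes` and the sample DDL are hypothetical. Anchoring the pattern on the opening backtick keeps it from touching digits elsewhere in the script, although unusual scripts could still need manual review.

```python
import re


def strip_numeric_prefixes(ddl):
    """Remove a leading "<digits>_" from every backtick-quoted identifier."""
    # The pattern matches a backtick followed by digits and an underscore,
    # and keeps only the backtick.
    return re.sub(r"`[0-9]+_", "`", ddl)


# Hypothetical schema script fragment.
ddl = "CREATE TABLE `tbl_client` (`1_X` int(11) NOT NULL, `2_Y` varchar(10));"
cleaned = strip_numeric_prefixes(ddl)
print(cleaned)
```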
Look into a strategy of doing two selects: one for the column names, then one for the data with column aliases. You might have to fall back on some scripting language, like PHP, for help.
First, get the column names:
show columns from tbl_client;
+-------------------------------+-----------------------------------+------+-----+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------------+-----------------------------------+------+-----+---------------------+-----------------------------+
| 1_X | int(11) | NO | PRI | NULL | auto_increment |
Then, loop through the results and create a list of column aliases.
Then create your new select:
SELECT 1_X as NEW_COLUMN_NAME_FOR_FIELD_1 FROM tbl_client;

Full JOIN MySQL Query is returning empty

So here is a MySQL Query:
SELECT TestSite . * , LoggedCarts . *
FROM TestSite, LoggedCarts
WHERE TestSite.email = 'LoggedCarts.Bill-Email'
LIMIT 0 , 30
It is returning an empty result set, when it should be returning four results based on the tables below.
First Table: LoggedCarts - Column: Bill-Email
casedilla#hotmail.com
crazyandy#theholeintheground.com
Second Table: TestSite - Column: email
samuel#lipsum.com
taco#flavoredkisses.com
honeybadger#dontcare.com
casedilla#hotmail.com
messingwith#sasquatch.com
The goal is to get a MySQL statement that returns the rows in Table: TestSite that don't match the rows in Table: LoggedCarts.
Note: I understand that the use of a hyphen in a column name requires special care when constructing a query, involving backticks to tell MySQL there are special characters. I would change the column names to match up, however the Table: LoggedCarts has data fed via post from a Yahoo Shopping Cart and without heavy preparation before insertion setting the name to anything but the key sent in the post data is daunting.
However, if it turns out rebuilding the data prior to insertion is easier than using a JOIN statement or for some reason using two columns with different names as the comparison columns just doesn't work, I will go through and rebuild the database and PHP code.
Single quotes indicate a string literal. You need to use backticks for identifiers. Also, each component of an identifier must be quoted individually.
SELECT TestSite . * , LoggedCarts . *
FROM TestSite, LoggedCarts
WHERE TestSite.email = LoggedCarts.`Bill-Email`
LIMIT 0 , 30
From the manual:
If any components of a multiple-part name require quoting, quote them individually rather than quoting the name as a whole. For example, write `my-table`.`my-column`, not `my-table.my-column`.
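When identifiers are assembled in application code, the rule quoted from the manual can be applied mechanically. This is a minimal Python sketch; the helper names `quote_ident` and `qualified_name` are hypothetical and not part of any MySQL client library.

```python
def quote_ident(name):
    """Backtick-quote one identifier component, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts):
    """Quote each component individually, then join the components with dots."""
    return ".".join(quote_ident(part) for part in parts)


column = qualified_name("LoggedCarts", "Bill-Email")
print(column)
```

Quoting the whole dotted name as one string would instead make MySQL look for a single identifier that contains a literal dot.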
With a bit of research inspired by some of the hints given, I found the solution I was looking for here: SELECT * WHERE NOT EXISTS
It does exactly what I need it to do, and as a bonus, I like the shorthand syntax that lets you give the table name an alias and use the alias throughout the statement.
SELECT *
FROM TestSite e
WHERE NOT EXISTS
(
SELECT null
FROM LoggedCarts d
WHERE d.`Bill-Email` = e.email
)
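To try the NOT EXISTS pattern without a MySQL server, here is a self-contained Python sketch using an in-memory SQLite database, which also accepts backtick-quoted identifiers. The email addresses are hypothetical sample data, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestSite (email TEXT);
    CREATE TABLE LoggedCarts (`Bill-Email` TEXT);
    INSERT INTO TestSite (email) VALUES
        ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO LoggedCarts (`Bill-Email`) VALUES
        ('b@example.com'), ('z@example.com');
""")

# Rows of TestSite that have no matching Bill-Email in LoggedCarts.
unmatched = [
    row[0]
    for row in conn.execute("""
        SELECT e.email
        FROM TestSite e
        WHERE NOT EXISTS (
            SELECT NULL
            FROM LoggedCarts d
            WHERE d.`Bill-Email` = e.email
        )
        ORDER BY e.email
    """)
]
print(unmatched)
```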

MySQL - Characters matching

How would I get MySQL to be more strict with character matching?
A quick example of what I mean: say I have a table with a single column `name`. In this column, I have two names: 'Jorge' and 'Jorgé'. The only difference between these names is the ´ over the e. If I run the query SELECT * FROM table WHERE name = 'Jorge', it will return
+--------+
| name |
+--------+
| Jorge |
| Jorgé |
+--------+
and if I run the query SELECT * FROM table WHERE name = 'Jorgé', it returns the same result table. How would I set MySQL to be stricter, so that it would not return both names?
Thanks ahead.
Quick Edit: I'm using the UTF-8 character encoding
If you want to make sure that no similar characters (like e and é) are considered the same, you should use the utf8_bin collation on that column. I assume that you're using utf8_general_ci now, which will consider some similar characters to be the same. utf8_bin only matches on the exact same characters.
@G-Nugget is correct, but since you are looking at Spanish stuff you might also be interested in utf8_spanish_ci or utf8_spanish2_ci. They correspond to modern and traditional Spanish. "ñ" is considered a separate letter, and in traditional Spanish "ch" and "ll" are also treated as separate letters.
More here: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html
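The effect of a collation on = can be illustrated outside MySQL. The Python sketch below registers a custom collation in an in-memory SQLite database; the collation name `accent_ci` and the `fold` helper are assumptions made for this example, and the folding is only a rough approximation of an accent- and case-insensitive collation, not MySQL's actual utf8_general_ci rules.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining accents and case so that 'Jorgé' folds to 'jorge'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_ci(a, b):
    """Collation callback: return -1, 0, or 1 after folding both strings."""
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_ci)
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Jorge",), ("Jorgé",)])

# Loose comparison: both names compare equal to 'Jorge'.
loose = [r[0] for r in conn.execute(
    "SELECT name FROM people WHERE name = 'Jorge' COLLATE accent_ci ORDER BY rowid")]

# Binary comparison: only the exact string matches.
strict = [r[0] for r in conn.execute(
    "SELECT name FROM people WHERE name = 'Jorge' COLLATE BINARY")]

print(loose, strict)
```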

Searching for multiple keywords using SQL Server stored procedure

I'm going to search my database (SQL Server 2008) using a stored procedure. My users can enter keyword(s) in a textbox (keywords can be separated using , for instance).
Currently I'm using something like this:
keyword LIKE N'%' + @SearchQuery + N'%'
(keyword is an nvarchar column in my table, and @SearchQuery is the input to my stored procedure)
It works fine, but what if the user types several keywords: apple,orange, banana?
Should I limit the number of keywords? How should I write my stored procedure if I have more than one keyword? How should I pass the user input to the stored procedure: pass apple, orange, banana as a whole phrase and parse it in the stored procedure, or separate the keywords first and send 3 keywords? How can I query these 3 keywords? With a loop?
What are best practices for performing such queries?
thanks
Do the parsing of the keywords in your application. SQL is not the best place for string manipulation.
Send the keywords as a table-valued parameter (e.g. http://www.mssqltips.com/sqlservertip/2112/table-value-parameters-in-sql-server-2008-and-net-c/ ), then you aren't limited to a fixed number of keywords.
Add the wildcards in the stored procedure. A table-valued parameter is read-only, so this assumes the keywords have been copied into a temp table, #keywords:
update #keywords set keyword = '%'+keyword+'%'
Then filter your results by joining your source data to this table,
e.g.:
SELECT result
FROM source
INNER JOIN #keywords keywords
ON source.keyword LIKE keywords.keyword
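The application-side parsing suggested above could look like the following Python sketch. The function name `parse_keywords` is hypothetical, and dropping empty entries and case-insensitive duplicates is an assumption about what the application wants rather than something the answer specifies.

```python
def parse_keywords(raw, separator=","):
    """Split user input into trimmed, non-empty, de-duplicated keywords."""
    seen = set()
    keywords = []
    for token in raw.split(separator):
        keyword = token.strip()
        key = keyword.casefold()  # treat 'Apple' and 'apple' as the same
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords


keywords = parse_keywords("apple,orange, banana,,Apple")
print(keywords)
```

Each element of the resulting list would then become one row of the table-valued parameter.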
It depends on:
* How big your database is.
* How often users will search for something.
* How precise the results are that users expect.
LIKE is no performance demon, especially when the pattern starts with %.
Maybe you should try full-text search?
If you would like to stay with LIKE (it will only work well for small tables), I would try something like this:
Split the input on the , character (inserting the tokens into a table, as podiluska suggested, is a good idea).
Build a query for each token and UNION all the results. Or run the query in a loop for each token and insert the results into a temporary table.
If you need precise results (i.e. only records that match all 3 words), you can select the best-matching rows from the temporary results built above.
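As a rough sketch of the "any keyword" versus "all keywords" distinction, the Python example below builds a parameterized WHERE clause and runs it against an in-memory SQLite table. It combines the LIKE conditions with OR or AND in a single query rather than using UNION or a temporary table, which is a simplification of the approach described above. The names `build_like_filter` and `search`, the sample rows, and the `?` placeholder style are assumptions.

```python
import sqlite3


def build_like_filter(column, tokens, require_all=False):
    """Return (where_clause, params) with one LIKE condition per token.

    `column` is interpolated into the SQL text, so it must be a trusted
    identifier; the tokens themselves are passed as bound parameters.
    """
    if not tokens:
        raise ValueError("at least one token is required")
    joiner = " AND " if require_all else " OR "
    clause = joiner.join(f"{column} LIKE ?" for _ in tokens)
    params = [f"%{token}%" for token in tokens]
    return clause, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (keyword TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?)",
    [("apple pie",), ("orange and apple juice",), ("banana bread",), ("plain toast",)],
)


def search(tokens, require_all=False):
    clause, params = build_like_filter("keyword", tokens, require_all)
    sql = f"SELECT keyword FROM source WHERE {clause} ORDER BY keyword"
    return [row[0] for row in conn.execute(sql, params)]


any_match = search(["apple", "orange"])
all_match = search(["apple", "orange"], require_all=True)
print(any_match, all_match)
```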
You could split the string of keywords into rows (here by casting it to XML) and then use the result as you like. The keyword list can even contain numbers or other characters, like %$<> or whatever you want; just remember that the comma is the separator.
DECLARE @CommaSeparatorString VARCHAR(MAX),
@CommaSeparatorXML XML
DECLARE @handle INT
SELECT @CommaSeparatorString = 'apple,orange,banana'
SELECT @CommaSeparatorString = REPLACE(REPLACE(@CommaSeparatorString,'<','$^%'),'>','%^$')
SELECT @CommaSeparatorXML = CAST('<ROOT><i>' + REPLACE(@CommaSeparatorString, ',', '</i><i>') + '</i></ROOT>' AS XML)
SELECT REPLACE(REPLACE(c.value('.', 'VARCHAR(100)'),'$^%','<'),'%^$','>') AS ID
FROM (SELECT @CommaSeparatorXML AS CommaXML) a
CROSS APPLY CommaXML.nodes('//i') x(c)
Result:
ID
------
apple
orange
banana

Searching for codes in a MySQL database

I have a database that stores a large number of codes; these codes are used to validate submission of a form. Whenever I run the following query, I get zero rows back:
SELECT * FROM `codes` WHERE `voucher` = 'JTBLYNQ9HA'
But when I run the following query, it brings back the single row with the code in it:
SELECT * FROM `codes` WHERE `voucher` LIKE CONVERT( _utf8 '%JTBLYNQ9HA%' USING latin1 ) COLLATE latin1_swedish_ci LIMIT 0 , 30
What am I doing wrong that causes the first query to fail, or is it best practice to use the second query?
Thanks for the help
The two queries are not equivalent. The first one is looking for a code whose voucher is exactly "JTBLYNQ9HA", the second one is looking for a code whose voucher contains that string (for instance, "ABCDEFGJTBLYNQ9HAHIJKLM").
The character set conversion and COLLATE are almost certainly irrelevant.
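One way such a difference can arise is when the stored value carries extra characters around the code. The Python sketch below reproduces that situation in an in-memory SQLite database. The stray trailing newline is purely a hypothetical cause chosen for illustration; the original question does not confirm what the stored value actually contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (voucher TEXT)")

# Hypothetical stored value: the code followed by a stray newline.
conn.execute("INSERT INTO codes VALUES (?)", ("JTBLYNQ9HA\n",))

# Exact match: the stored string is not identical, so no row is found.
exact = conn.execute(
    "SELECT COUNT(*) FROM codes WHERE voucher = ?", ("JTBLYNQ9HA",)
).fetchone()[0]

# Substring match: the code is contained in the stored string.
contains = conn.execute(
    "SELECT COUNT(*) FROM codes WHERE voucher LIKE ?", ("%JTBLYNQ9HA%",)
).fetchone()[0]

# Comparing the stored length with the expected length exposes the extra character.
stored_length = conn.execute(
    "SELECT LENGTH(voucher) FROM codes WHERE voucher LIKE ?", ("%JTBLYNQ9HA%",)
).fetchone()[0]

print(exact, contains, stored_length)
```

If the stored length is larger than the ten characters of the code, cleaning the data is usually a better fix than switching every lookup to LIKE.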