Find all characters in a table column of a MySQL database?

Is there any easy way to find out all characters used in a specific column of a table in MySQL?
For example, these records:
"title"
"DP&E"
"UI/O"
"B,B#M"
All the characters used in the "title" column would be: DPEUIOBM&/#,

I'm not aware of any means to do this easily using MySQL. The best you'll be able to do is to test each potential character one by one with EXISTS queries. This will also be very slow, since it will read your whole table once for every candidate character that is not present.
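A minimal sketch of the probe-per-character approach follows. It uses Python's built-in sqlite3 module so it runs on its own, with SQLite's instr() standing in for the MySQL check. The table and column names are placeholders. In MySQL, INSTR() and LOCATE() are case-sensitive only when a binary string is involved, so under a case-insensitive collation a lowercase candidate may also be reported as present.

```python
import sqlite3
import string

def chars_present(conn, table, column, candidates):
    """Probe each candidate character with its own EXISTS query.

    `table` and `column` are interpolated into the SQL, so they must be
    trusted identifiers; the character itself is passed as a parameter.
    """
    sql = f"SELECT EXISTS (SELECT 1 FROM {table} WHERE instr({column}, ?) > 0)"
    present = []
    for ch in candidates:
        # An absent character forces a full scan; a present one can stop early.
        (found,) = conn.execute(sql, (ch,)).fetchone()
        if found:
            present.append(ch)
    return present

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (title TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?)",
                     [("DP&E",), ("UI/O",), ("B,B#M",)])
    alphabet = string.ascii_letters + string.digits + string.punctuation
    print("".join(chars_present(conn, "yourtable", "title", alphabet)))
```

The number of queries equals the size of the candidate alphabet, which is why this gets slow on a large table.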
If you can, create a temporary table that aggregates the data you need into one huge text field, dump it, and load it into a compatible table in PostgreSQL. That lets you extract the needed data with a query like this:
select distinct regexp_split_to_table(yourfield, '') as letter
from yourtable;
It'll still be very slow, but at least you'll go through the data a single time.
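The split-then-DISTINCT idea can also be tried without PostgreSQL. The sketch below is an assumption, not the answer's method. It runs against an in-memory SQLite database from Python and replaces regexp_split_to_table with a recursive CTE that peels off one character per step. MySQL 8.0 also supports WITH RECURSIVE, so a similar query might be adapted there, but that is untested here. In MySQL the recursion depth per value is capped by cte_max_recursion_depth, and DISTINCT under a case-insensitive collation would merge 'a' and 'A'.

```python
import sqlite3

def distinct_chars(conn, table, column):
    """Split every value into single characters and return the distinct set.

    The recursive CTE plays the role of PostgreSQL's regexp_split_to_table(col, '').
    `table` and `column` must be trusted identifiers.
    """
    sql = f"""
        WITH RECURSIVE pieces(rest, ch) AS (
            SELECT {column}, NULL FROM {table}          -- seed: whole value, no char yet
            UNION ALL
            SELECT substr(rest, 2), substr(rest, 1, 1)  -- peel off the first character
            FROM pieces
            WHERE rest <> ''                            -- stop when nothing is left
        )
        SELECT DISTINCT ch FROM pieces WHERE ch IS NOT NULL
    """
    return sorted(row[0] for row in conn.execute(sql))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (title TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?)",
                     [("DP&E",), ("UI/O",), ("B,B#M",)])
    print("".join(distinct_chars(conn, "yourtable", "title")))
```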

Related

In MySQL, is there a programmatic way to check data integrity before altering the encoding/collation of a column?

I have a table with one column whose encoding is cp1252 and collation is latin1_swedish_ci, and I need to change it to utf8_general_ci.
I'd like to check that I won't end up with garbled characters in any of the rows after the conversion.
This column stores domain names, and I'm not sure whether any of the rows contain Swedish characters.
I've been researching this, but I haven't found a way to check data integrity before changing the collation.
My best guess so far is to write a script that checks each row for characters outside the English alphabet, but I'm fairly sure there's a better way to do this.
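The script idea in the question might start as small as this Python sketch. It assumes the rows have already been fetched as (id, value) pairs. It also assumes that a clean domain name uses only ASCII letters, digits, hyphens, and dots. Internationalized names stored as Unicode rather than punycode will be flagged too, which is arguably what you want to review before a character-set change.

```python
import re

# Characters expected in a plain ASCII host name: letters, digits, hyphen, dot.
ALLOWED = re.compile(r"[A-Za-z0-9.-]*")

def suspicious_rows(rows):
    """Return the (id, value) pairs whose value contains any other character."""
    return [(row_id, value) for row_id, value in rows
            if value is not None and not ALLOWED.fullmatch(value)]

if __name__ == "__main__":
    sample = [(1, "example.com"), (2, "räksmörgås.se"), (3, "my-site.se")]
    for row_id, value in suspicious_rows(sample):
        print(row_id, value)
```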
Any help would be great!
UPDATE
I've found multiple rows with garbage like this:
ÜZìp;ìê+ØeÞ{/e¼ðP;
Is there a way to get rid of that junk without examining the rows one by one?
The canonical way for this is to try it out:
Use SHOW CREATE TABLE to create an identically-structured testing table
Use INSERT INTO ... SELECT ... to populate the testing table with the primary key and relevant column(s) of the original
Try out conversion, noting necessary steps to fix problems
Rinse and repeat

mysql: change all columns of a table from varchar to decimal

I have a table in which all columns are of type varchar, but the data are actually numbers, so decimal(6,2) would be more appropriate. How can I change all the columns from varchar to decimal when there are a lot of them?
Thanks a lot in advance for your help.
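You can change an individual column to decimal with ALTER TABLE tablename MODIFY COLUMN columnname DECIMAL(6,2);. With strict SQL mode disabled, strings that can be converted to numbers will be converted and the others will be changed to zero (with a warning). With strict mode enabled, the default since MySQL 5.7, the ALTER fails with an error instead.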
If you want to be certain of doing this non-destructively, you could instead use ALTER TABLE ... ADD COLUMN to add a decimal column, then UPDATE tablename SET decimalcolumn = textcolumn, and then use a SELECT to check for any rows where textcolumn and decimalcolumn aren't equal (the comparison performs type conversion, so "5" and 5.00 are equal, as you'd want).
I don't know of a way to automatically apply the same conversion to multiple columns at once, though you could do it in PHP or another programming language by selecting a row from the table, looping over the columns that are returned, and running MODIFY for each one. If there are only a few columns, it's probably easier to do it by hand.
MySQL's ALTER TABLE statement supports changing multiple columns at once. In fact, doing as many changes to a table's schema as you can in one statement is preferred and highly recommended. This is because MySQL copies the whole table to do a schema change, but only does the copy once per ALTER TABLE statement. This is an important time saver when modifying a very large table!
That said, you can rehearse your changes in a couple of ways.
Firstly, I would use a development database to test all this, not a production one. You can then use CREATE TABLE ... LIKE ... to create a structurally identical table and then use INSERT INTO ... SELECT * FROM ... to copy the data. Now you can experiment with ALTER TABLE ... MODIFY COLUMN ... DECIMAL(6,2). If you do this on one column and get the message 0 Warnings, then that column will convert without incident and you can test the next. If you do get warnings, then SHOW WARNINGS will show a number of them so you know what problem MySQL encountered.
Depending on how well you know the data, you can also run a number of different SELECTs to find out how much of it might be out of range or unconvertible (e.g. blank or text instead of numbers).
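The range and convertibility checks can also be rehearsed outside the database. This sketch assumes the varchar values have already been fetched into Python. It uses the decimal module to classify each value against DECIMAL(6,2), whose largest magnitude is 9999.99. It only flags values. How MySQL itself rounds, truncates, or rejects them depends on the server's SQL mode, so treat the labels as a rough guide.

```python
from collections import Counter
from decimal import Decimal, InvalidOperation

# DECIMAL(6,2): 6 digits in total, 2 after the point, so |value| <= 9999.99.
LIMIT = Decimal("9999.99")
CENT = Decimal("0.01")

def classify(value):
    """Label a varchar value by how it would fit a DECIMAL(6,2) column."""
    if value is None:
        return "null"
    try:
        number = Decimal(value.strip())
    except InvalidOperation:
        return "unconvertible"      # blank, text, "1,000", etc.
    if not number.is_finite():
        return "unconvertible"      # "NaN", "Infinity"
    if abs(number) > LIMIT:
        return "out of range"
    if number != number.quantize(CENT):
        return "will be rounded"    # more than two decimal places
    return "ok"

if __name__ == "__main__":
    sample = ["5", "12.30", "", "abc", "10000", "12.345", None]
    print(Counter(classify(v) for v in sample))
```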
Yes, this approach will take some time, but once you're happy with this, you can assemble all the MODIFY COLUMN clauses into the one statement and run it on the real table.
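Assembling the clauses can be scripted in any language, as the earlier answer suggests for PHP. Here is a Python sketch. It assumes you already have the list of column names, for example from information_schema.COLUMNS filtered on TABLE_SCHEMA, TABLE_NAME and DATA_TYPE = 'varchar'. The table and column names in the usage line are hypothetical. The function only builds the statement, so you can review it before running it on the real table.

```python
def build_alter(table, columns, new_type="DECIMAL(6,2)"):
    """Combine one MODIFY COLUMN clause per column into a single ALTER TABLE.

    Names are wrapped in backticks but not otherwise escaped, so they
    must come from a trusted source such as information_schema.
    """
    if not columns:
        raise ValueError("no columns to modify")
    clauses = ",\n  ".join(f"MODIFY COLUMN `{name}` {new_type}" for name in columns)
    return f"ALTER TABLE `{table}`\n  {clauses};"

if __name__ == "__main__":
    print(build_alter("prices", ["price_a", "price_b", "price_c"]))
```

MODIFY COLUMN replaces the whole column definition, so any NOT NULL or DEFAULT attributes you want to keep have to be included in new_type.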

Partitioning of a large MySQL table that uses LIKE for search

I have a table with 80 million records. The structure of the table:
id - autoincrement,
code - alphanumeric code from 5 to 100 characters,
other fields.
The most used query is
SELECT * FROM table
WHERE code LIKE '%{user-defined-value}%'
The number of queries is growing, as is the record count. Very soon I will have performance problems.
Is there any way to split the table in the parts? Or maybe some other ways to optimize the table?
The leading % in the search is the killer here. It negates the use of any index.
The only thing I can think of is to partition the table based on length of code.
For example, if the code that is entered is 10 characters long, then first search the table with 10 character codes, without the leading percent sign, then search the table with 11 character codes, with the leading percent sign, and then the table with 12 character codes, with the leading percent sign, and so on.
This saves you from searching through all of codes that are less than 10 characters long that will never match. Also, you are able to utilize an index for one of the searches (the first one).
This also will help keep the table sizes somewhat smaller.
You can use a UNION to perform all of the queries at once, though you'll probably want to create the query dynamically.
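Building that UNION dynamically might look like the Python sketch below. It makes three assumptions. First, a hypothetical naming scheme of one table per code length (codes_len_5 through codes_len_100). Second, ? placeholders as in sqlite3 and MySQL prepared statements (many Python MySQL drivers use %s instead). Third, escaping % and _ in the user's input with ! so they are matched literally.

```python
def escape_like(value, esc="!"):
    """Escape LIKE wildcards (and the escape character itself) in user input."""
    return (value.replace(esc, esc * 2)
                 .replace("%", esc + "%")
                 .replace("_", esc + "_"))

def build_length_union(user_value, max_len=100, table_fmt="codes_len_{}"):
    """Return (sql, params) that searches only the per-length tables that can match.

    Codes shorter than the input can never contain it, so those tables are skipped.
    `table_fmt` is a hypothetical naming scheme for the split tables.
    """
    n = len(user_value)
    if n == 0 or n > max_len:
        return None, []
    pattern = escape_like(user_value)
    selects, params = [], []
    for length in range(n, max_len + 1):
        table = table_fmt.format(length)
        selects.append(f"SELECT id, code FROM {table} WHERE code LIKE ? ESCAPE '!'")
        if length == n:
            # Same length as the input: no leading %, so an index on code can help.
            params.append(pattern + "%")
        else:
            params.append("%" + pattern + "%")
    return "\nUNION ALL\n".join(selects), params
```

Because each table holds codes of a single length, the partial results cannot overlap. UNION ALL therefore skips the duplicate-elimination step that a plain UNION would perform.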
You should also take a look to see if FULLTEXT indexing might be a better solution.
Some thoughts:
You can split the table into multiple smaller tables based on a certain condition, for example by ID, by code, or by any other field. This basically means keeping one type of record in one table and moving the other types into different tables.
Try MySQL Partitioning
If possible, purge older entries, or at least consider moving them to an archive table
Instead of LIKE, consider using REGEXP for regular-expression search (note that, like a LIKE with a leading %, REGEXP cannot use an index)
Rather than running SELECT *, select only the columns you need: SELECT id, code, ...
I'm not sure whether this query backs a search in your application, where a user-entered value is compared with the code column and the results are shown to the user. If it does, you can add options to the search, such as asking the user whether they want an exact match, a starts-with match, and so on. That way you don't need to run a '%...%' LIKE match every time
This should have been the first point, but I assume you have the right indexes on the table
Make more use of the query cache. To get the most from it, avoid frequent updates to the table, because every update invalidates the cached queries that use that table. The fewer the updates, the more likely MySQL can serve queries from the cache, which means quicker results. (The query cache was removed in MySQL 8.0, so this applies only to older versions.)
Hope the above helps!
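The search-mode idea from the list could be wired up as in this Python sketch, which returns a WHERE fragment and its parameter for each mode. The mode names and the `code` column are assumptions for illustration, and ! is used as the LIKE escape character so the user's % and _ are matched literally. Only the 'contains' mode needs the leading % that prevents index use.

```python
def build_condition(user_value, mode):
    """Return (where_clause, parameter) for the chosen search mode.

    'exact' and 'prefix' can use an index on `code`; 'contains' cannot,
    because its pattern starts with a wildcard.
    """
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = (user_value.replace("!", "!!")
                         .replace("%", "!%")
                         .replace("_", "!_"))
    if mode == "exact":
        return "code = ?", user_value
    if mode == "prefix":
        return "code LIKE ? ESCAPE '!'", escaped + "%"
    if mode == "contains":
        return "code LIKE ? ESCAPE '!'", "%" + escaped + "%"
    raise ValueError(f"unknown search mode: {mode!r}")

if __name__ == "__main__":
    for mode in ("exact", "prefix", "contains"):
        print(mode, build_condition("AB_12", mode))
```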

How to search for rows containing a substring?

If I store an HTML TEXTAREA in my ODBC database each time the user submits a form, what's the SELECT statement to retrieve 1) all rows which contain a given sub-string 2) all rows which don't (and is the search case sensitive?)
Edit: if LIKE "%SUBSTRING%" is going to be slow, would it be better to get everything & sort it out in PHP?
Well, you can always try WHERE textcolumn LIKE "%SUBSTRING%", but this is guaranteed to be pretty slow, because the leading % prevents the query from using an index.
It depends on the field type: a textarea usually won't be saved as VARCHAR but as some kind of TEXT field, and if you add a FULLTEXT index to it, you can use the MATCH ... AGAINST operator.
To get the columns that don't match, simply put a NOT in front of the like: WHERE textcolumn NOT LIKE "%SUBSTRING%".
Whether the search is case-sensitive depends on how you store the data, especially which COLLATION you use. By default, the search will be case-insensitive.
Updated answer to reflect question update:
Doing WHERE field LIKE "%value%" is slower than WHERE field LIKE "value%" if the column has an index, but it is still considerably faster than fetching all values and having your application filter them. Compare both scenarios:
1/ If you do SELECT field FROM table WHERE field LIKE "%value%", MySQL will scan the entire table, and only send the fields containing "value".
2/ If you do SELECT field FROM table and then have your application (in your case PHP) filter only the rows with "value" in it, MySQL will also scan the entire table, but send all the fields to PHP, which then has to do additional work. This is much slower than case #1.
Solution: Please do use the WHERE clause, and use EXPLAIN to see the performance.
Info on MySQL's full-text search is below. In older MySQL versions it is restricted to MyISAM tables, which may not suit you if you want to use a different table type. Since MySQL 5.6, InnoDB supports FULLTEXT indexes as well.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Even if WHERE textcolumn LIKE "%SUBSTRING%" is going to be slow, I think it is probably better to let the database handle it rather than have PHP handle it. If it is possible to restrict searches by some other criteria (date range, user, etc.), then you may find the substring search is OK (ish).
If you are searching for whole words, you could extract all the individual words into a separate table and use it to restrict the substring search. (So when searching for "my search string", you look up only the longest word, "search", and run the substring search only on records containing that word.)
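The word-table idea can be sketched in Python with an in-memory dictionary standing in for the separate words table. This is an illustration under three assumptions: words are runs of \w characters, matching is case-insensitive (roughly like a default _ci collation), and the search phrase consists of whole words, as the answer requires.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-cased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())

def build_word_index(rows):
    """Map each word to the set of row ids containing it (the 'separate table')."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in tokenize(text):
            index[word].add(row_id)
    return index

def phrase_search(rows, index, phrase):
    """Narrow by the longest word, then run the substring test on the candidates only."""
    words = tokenize(phrase)
    if not words:
        return []
    longest = max(words, key=len)
    candidates = index.get(longest, set())
    needle = phrase.lower()
    return sorted(r for r in candidates if needle in rows[r].lower())

if __name__ == "__main__":
    rows = {1: "this is my search string", 2: "search elsewhere"}
    print(phrase_search(rows, build_word_index(rows), "my search string"))
```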
I simply use SELECT ColumnName1, ColumnName2, ... WHERE LOCATE(substr, ColumnNameX) <> 0
to get the rows whose ColumnNameX contains the substring.
Replace <> with = to get the rows NOT containing the substring.
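To see how the LOCATE test splits rows, here is a self-contained sketch using Python's sqlite3 module. SQLite has no LOCATE, so instr(haystack, needle) stands in for it. It also returns 0 when the needle is absent, but it takes its arguments in the opposite order and is case-sensitive, whereas MySQL's LOCATE follows the collation. The `notes` table and `body` column are hypothetical. A NULL column value yields NULL, so such rows match neither the <> 0 nor the = 0 condition.

```python
import sqlite3

def split_by_substring(conn, needle):
    """Return (ids containing needle, ids not containing it) from the hypothetical `notes` table.

    Rows whose body is NULL get a NULL instr() result and appear in neither list.
    """
    has = conn.execute(
        "SELECT id FROM notes WHERE instr(body, ?) <> 0 ORDER BY id", (needle,))
    has_ids = [row[0] for row in has]
    lacks = conn.execute(
        "SELECT id FROM notes WHERE instr(body, ?) = 0 ORDER BY id", (needle,))
    lacks_ids = [row[0] for row in lacks]
    return has_ids, lacks_ids

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?, ?)",
                     [(1, "hello world"), (2, "goodbye"), (3, None)])
    print(split_by_substring(conn, "world"))
```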

How do I search part of a column?

I have a mysql table containing 40 million records that is being populated by a process over which I have no control. Data is added only once every month. This table needs to be search-able by the Name column. But the name column contains the full name in the format 'Last First Middle'.
In the sphinx.conf, I have
sql_query = SELECT Id, OwnersName,
substring_index(substring_index(OwnersName,' ',2),' ',-1) as firstname,
substring_index(OwnersName,' ',1) as lastname
FROM table1
How do I use Sphinx search to search by firstname and/or lastname? For example, I would like to be able to search for 'Smith' in the first name only.
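To check what the expressions in that sql_query return, here is a small Python mirror of MySQL's SUBSTRING_INDEX(str, delim, count). It is written from the function's documented behavior: count > 0 keeps everything left of the count-th delimiter, and count < 0 keeps everything right of it, counting from the end. On a 'Last First Middle' value, a count of 1 yields the last name, the nested expression yields the first name, and a count of 2 yields 'Last First'.

```python
def substring_index(s, delim, count):
    """Python mirror of MySQL SUBSTRING_INDEX(str, delim, count).

    count > 0: everything left of the count-th delimiter, counting from the left.
    count < 0: everything right of the |count|-th delimiter, counting from the right.
    Assumes a non-empty delimiter.
    """
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])

if __name__ == "__main__":
    owner = "Smith John Paul"  # 'Last First Middle'
    print(substring_index(owner, " ", 1))                            # Smith
    print(substring_index(substring_index(owner, " ", 2), " ", -1))  # John
    print(substring_index(owner, " ", 2))                            # Smith John
```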
Per-row functions in SQL queries are always a bad idea for tables that may grow large. If you want to search on part of a column, it should be extracted out to its own column and indexed.
I would suggest, if you have power over the schema (as opposed to the population process), inserting new columns called OwnersFirstName and OwnersLastName along with an update/insert trigger which extracts the relevant information from OwnersName and populates the new columns appropriately.
This means the expense of figuring out the first name is only done when a row is changed, not every single time you run your query. That is the right time to do it.
Then your queries become blindingly fast. And, yes, this breaks 3NF, but most people don't realize that it's okay to do that for performance reasons, provided you understand the consequences. And, since the new columns are controlled by the triggers, the data duplication that would be cause for concern is "clean".
Most of the problems people have with databases come down to query speed. Wasting a bit of disk space to gain a large performance improvement is usually okay.
If you have absolutely no power over even the schema, another possibility is to create your own database with the "correct" schema and populate it periodically from the real database. Then query yours. That may involve a fair bit of data transfer every month however so the first option is the better one, if allowed.
Judging by the other answers, I may have missed something... but to restrict a search in Sphinx to a specific field, make sure you're using the extended (or extended2) match mode, and then use the following query string: #firstname Smith.
You could use substring to get the parts of the field that you want to search in, but that will slow down the process. The query can not use any kind of index to do the comparison, so it has to touch each record in the table.
The best approach is not to store several values in the same field, but to put the name components in three separate fields. When you store more than one value in a field, there are almost always problems accessing the data. I see this over and over in different forums...
This is an intractable problem, because full names can contain prefixes, suffixes, middle names or no middle name, composite first and last names with and without hyphens, etc. There is no reasonable way to do this with 100% reliability.