mySQL convert integer to text in SELECT query - mysql

I want to convert an integer to text in a mySQL select query. Here's what a table looks like:
Languages
--------
1,2,3
I want to convert each integer to a language (e.g., 1 => English, 2 => French, etc.)
I've been reading up on CONVERT and CAST functions in mySQL, but they mostly seem to focus on converting various data types to integers. And also I couldn't find anything that dealt with the specific way I'm storing the data (multiple numbers in one field).
How can I convert the integers to text in a mySQL query?
UPDATE
Here's my mySQL query:
SELECT u.id, ulp.userid, ulp.languages, ll.id, ll.language_detail
FROM users AS u
JOIN user_language_profile AS ulp ON (ulp.userid = u.id)
JOIN language_detail AS ll ON (ulp.languages = ll.id)

Use either:
MySQL's ELT() function:
SELECT
ELT(Languages
, 'English' -- 1
, 'French' -- 2
-- etc.
)
FROM table_name
A CASE expression:
SELECT
CASE Languages
WHEN 1 THEN 'English'
WHEN 2 THEN 'French'
-- etc.
END
FROM table_name
Although, if possible, I would be tempted to either JOIN with a lookup table (as @Mr.TAMER says) or change the data type of the column to ENUM('English','French',...).
UPDATE
From your comments, it now seems that each field contains a set (perhaps even using the SET data type?) of languages and you want to replace the numeric values with strings?
First, read Bill Karwin's excellent answer to "Is storing a delimited list in a database column really that bad?".
In this case, I suggest you normalise your database a tad: create a new language-entity table wherein each record associates the PK of the entities in the existing table with a single language. Then you can use a SELECT query (joining on that new table) with GROUP_CONCAT aggregation to obtain the desired list of language names.
Without such normalisation, your only option is to do string-based search & replace (which would not be particularly efficient); for example:
SELECT CONCAT_WS(',',
IF(FIND_IN_SET('1', Languages), 'English', NULL),
IF(FIND_IN_SET('2', Languages), 'French' , NULL),
-- etc.
)
FROM table_name
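If you do normalise as suggested above, the existing delimited values have to be split into one row per language before they can be loaded into the new table. A minimal sketch of that step in Python, assuming the rows have already been fetched as (primary key, Languages) pairs; the function name and the sample data are hypothetical:

```python
def explode_languages(rows):
    """Turn (entity_id, '1,2,3') rows into one (entity_id, language_id) pair per language."""
    pairs = []
    for entity_id, languages in rows:
        for token in languages.split(","):
            token = token.strip()
            if token:  # skip empty fragments, e.g. an empty field or a trailing comma
                pairs.append((entity_id, int(token)))
    return pairs


# Hypothetical sample data: (primary key, delimited Languages value)
rows = [(10, "1,2,3"), (11, "2"), (12, "")]
pairs = explode_languages(rows)
print(pairs)
```

Each resulting pair would become one record in the new language-entity table, which GROUP_CONCAT can then aggregate back into a list of names.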

Why don't you make a number-language table and, when SELECTing, get the language associated with that number that you selected.
This is better in case you want to add a new language. You will only insert it into the table instead of changing all the queries in your code, and also easier if others are using your code (they won't be happy debugging and editing all the queries).

From your other comments, are you saying that the languages field is a literal string embedded with commas?
From an SQL perspective, that's a pretty unworkable design. A variable number of languages should be stored in another table.
However, if you're stuck with what you've got, you might be able to construct a regexp replacement algorithm, but it seems terribly fragile, and I wouldn't recommend it. If you've got more than 9 languages, the following will be broken, and you would need the Regexp UDF, which introduces a bunch of complexity.
Assuming the simple case:
SELECT REPLACE(
REPLACE(
REPLACE(Languages, '1', 'English'),
'2', 'French'),
N, DESCRIPTION)
and so on. But I repeat: this is an awful data design. If it's possible to fix it to something like:
person person_lang language
========== ============ =========
person_id -----< person_id
... lang_id >----- lang_id
lang_desc
Then I strongly suggest you do so.
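To see why the nested REPLACE breaks down once there are more than 9 languages, here is a small Python sketch that mimics it and compares it with replacing whole tokens. The mapping, including language 12, is made up for illustration:

```python
# Hypothetical mapping; id 12 exists only to show the multi-digit problem
LANGUAGES = {"1": "English", "2": "French", "12": "German"}


def naive_replace(value):
    # Mimics nested REPLACE() calls: each substitution scans the whole string
    for number, name in (("1", "English"), ("2", "French")):
        value = value.replace(number, name)
    return value


def replace_by_token(value):
    # Split on the delimiter first, so '12' is never treated as '1' followed by '2'
    return ",".join(LANGUAGES.get(token, token) for token in value.split(","))


print(naive_replace("1,12"))     # English,EnglishFrench
print(replace_by_token("1,12"))  # English,German
```

The token-wise version is what the application would have to do if the column cannot be normalised.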

Related

How to replace MySQL enum values for sorting

I have a mysql table with an enum field storing the state of elements, e.g.:
draft
inactive
published
These states get translated into the user's locale in the application, and since the translations can differ greatly, it is not possible to sort records by state in the Mysql query, since the order of the enum values will not match the order of the translated strings.
For example:
SELECT state FROM records ORDER BY state ASC
Would give the following results for english, german and french:
draft » Draft / Entwurf / Ébauche
inactive » Inactive / Inaktiv / Inactif
published » Published / Freigeschaltet / Publié
As Mysql sorts by the enum values, using this order in the application makes it seem like the sorting by state is jumbled.
Of course it is possible to do the sorting by state afterwards in the application using the translated strings, but it would remove a layer of complexity to be able to do this directly in the query - as well as improve application performance.
One solution I found would be to use a CASE statement in the query:
SELECT
CASE state
WHEN 'draft' THEN 'Entwurf'
WHEN 'inactive' THEN 'Inaktiv'
WHEN 'published' THEN 'Freigeschaltet'
END
FROM
records
ORDER BY
state ASC
Are there better/faster ways to sort an enum by custom strings translated in the application?
You can use ORDER BY FIELD(state, opt1, opt2, opt3, ...):
https://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_field
As it's just a comma-separated list of values, you should be able to have the application pass the order you want.
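A rough sketch of how the application might build that list, assuming it already holds the translations for the current locale. The helper names are hypothetical, the accent-stripping key is only a crude stand-in for proper locale-aware collation, and the %s placeholders assume a driver that uses that parameter style:

```python
import unicodedata


def sort_key(text):
    # Strip accents and case so that 'Ébauche' sorts under 'E'
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def field_order_clause(column, translations):
    """Return an ORDER BY FIELD(...) fragment and its parameters, ordered by translated label."""
    ordered = sorted(translations, key=lambda value: sort_key(translations[value]))
    placeholders = ", ".join(["%s"] * len(ordered))
    # The column name is interpolated directly, so it must come from trusted code, not user input
    return f"ORDER BY FIELD({column}, {placeholders})", ordered


german = {"draft": "Entwurf", "inactive": "Inaktiv", "published": "Freigeschaltet"}
clause, params = field_order_clause("state", german)
print(clause)  # ORDER BY FIELD(state, %s, %s, %s)
print(params)  # ['draft', 'published', 'inactive']
```

The enum values themselves are passed as query parameters, so only the order changes from one locale to the next.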
Probably the quickest solution you can get would be to create a translation table along the lines of:
TABLE myEnum
lang VARCHAR(2)
keyValue VARCHAR(32)
localeValue VARCHAR(32)
Then fill the table with your translations of the enum, and use this table in a join in your SQL Query:
SELECT r.state FROM records r, myEnum e WHERE e.lang = 'de' and e.keyValue = r.state ORDER BY e.localeValue ASC
You'll probably not want to use those table or column names, of course.

find_in_set and regex in a select query

Is there any way to check for a regex expression in a comma separated values column?
I have a column named storeId with values such as EMP_0345,00345,OPS, and I need to get only the store IDs that contain no alphabetic characters (digits only).
I am able to match a valid store ID with REGEXP '^[0-9]+$', but how do I get at the individual values in a comma-separated column?
You violated a dogma of database design: never store more than one value in a single field if you need to access the values separately.
Even if you manage to REGEX your way around this, you will run into massive performance trouble. The correct way to tackle this is to move the contents of the CSV column into rows of a join table, then simply match against the single values in that join table.
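Until the schema is fixed, the filtering can be done on the application side once the column has been fetched. A minimal Python sketch, with a hypothetical function name and the sample value from the question:

```python
import re


def numeric_store_ids(csv_value):
    """Return only the comma-separated entries that consist entirely of digits."""
    tokens = (token.strip() for token in csv_value.split(","))
    return [token for token in tokens if re.fullmatch(r"[0-9]+", token)]


print(numeric_store_ids("EMP_0345,00345,OPS"))  # ['00345']
```

The same split is what you would use to populate the join table, with one row per value.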
Your database design is flawed - given your current request.
Whether you should (try to) convert that column into several columns in the current (or a different) table, or rather into rows in a different table does primarily depend on whether or not there is some structure in that column's data.
With some inherent structure, you could use something like
SELECT
storeId
, SUBSTRING_INDEX(storeId, ',', 1) AS some_column
, SUBSTRING_INDEX(SUBSTRING_INDEX(storeId, ',', 2), ',', -1) AS store_id
, SUBSTRING_INDEX(storeId, ',', -1) AS another_column
FROM T
WHERE storeId REGEXP '^[^,]+,[0-9]+,[^,]+$'
;
to separate the values (and potentially populate newly added columns). The WHERE clause lets you differentiate between sets of rows with specific arrangements of values in the column in question.
See in action / more detail: SQLFiddle.
Please comment if and as adjustment / further detail is required, or update your request to provide more detailed input.

MySQL WHERE, LIMIT and pagination

I have tables: documents, languages and document_languages. Documents exist in one or more languages and this relationship is mapped in document_languages.
Imagine now I want to display the documents and all of its languages on a page, and paginate my result set to show 10 records on each page. There will be a WHERE statement, specifying which languages should be retrieved (ex: en, fr, it).
Even though I only want to display 10 documents on the page (LIMIT 10), I have to return more than 10 records if a document has more than one language (which most do).
How can you combine the WHERE statement with the LIMIT in a single query to get the records I need?
Use sub query to filter only documents records
select * from
(select * from documents limit 0,10) as doc,
languages lan,
document_languages dl
where doc.docid = dl.docid
and lan.langid = dl.langid
Check sub query doc as well
http://dev.mysql.com/doc/refman/5.0/en/from-clause-subqueries.html
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
You can add a little counter to each row counting how many unique documents you're returning and then return just 10. You just specify what document_id to start with and then it returns the next coming 10.
SELECT document_id,
if (@storedDocumentId <> document_id,(@docNum:=@docNum+1),@docNum),
@storedDocumentId:=document_id
FROM document, document_languages,(SELECT @docNum:=0) AS document_count
where @docNum<10
and document_id>=1234
and document.id=document_languages.document_id
order by document_id;
I created these tables:
create table documents (iddocument int, name varchar(30));
create table languages (idlang char(2), lang_name varchar(30));
create table document_languages (iddocument int, idlang char(2));
Make a basic query using the GROUP_CONCAT function to transpose the language results into a single column:
select d.iddocument, group_concat(dl.idlang)
from documents d, document_languages dl
where d.iddocument = dl.iddocument
group by d.iddocument;
And finally set the number of the documents with LIMIT option:
select d.iddocument, group_concat(dl.idlang)
from documents d, document_languages dl
where d.iddocument = dl.iddocument
group by d.iddocument limit 10;
You can check more info about GROUP_CONCAT here: http://dev.mysql.com/doc/refman/5.0/es/group-by-functions.html
Hmmmm... so, if you post your query (SQL statement), it might be easier to spot the error. Your outermost LIMIT statement should "do the trick." As Rakesh said, you can use subqueries. However, depending on your data, you may (probably) just want to use simple JOINs (e.g. where a.id = b.id...).
This should be fairly straightforward in MySQL. In the unlikely case that you're doing something "fancy," you can always pull the datasets into variables to be parsed by an external language (e.g., Python). In the case that you're literally just trying to limit screen output (interactive session), check-out the "pager" command (I like "pager less;").
Lastly, check-out using the UNION statement. I hope that something, here, is useful. Good luck!

MySQL IN() clause multiple returns

I have a special data environment where I need to be returned data in a certain way to populate a table.
This is my current query:
SELECT
bs_id,
IF(bs_board = 0, 'All Boards', (SELECT b_name FROM certboards WHERE b_id IN (REPLACE(bs_board, ';', ',')))) AS board
FROM boardsubs
As you can see I have an if statement then a special subselect.
The reason I have this is that the field bs_board is a varchar field containing multiple row IDs like so:
1;2;6;17
So, the query as it stands works, but it only returns the first matched b_name. I need it to return all matches. For instance, if the field contained 1;2, it should return two boards, Board 1 and Board 2, in the same column. Later I can deal with adding a <br> in between each result.
But the problem I am dealing with is that it has to come back in a single column both name, or all names since the field can contain as many as the original editor selected.
This will not work the way you're thinking it will work.
Let's say bs_board is '1;2;3'
In your query, REPLACE(bs_board, ';', ',') will resolve to '1,2,3', which is a single literal string. This makes your final subquery:
SELECT b_name FROM certboards WHERE b_id IN ('1,2,3')
which is equivalent to:
SELECT b_name FROM certboards WHERE b_id = '1,2,3'
The most correct solution to the problem is to normalize your database. Your current system of storing multiple values in a single field is exactly what you should never do with an RDBMS, and this is exactly why. The database is not designed to handle this kind of field. You should have a separate table with one row for each bs_board, and then JOIN the tables.
There are no good solutions to this problem. It's a fundamental schema design flaw. The easiest way around it is to fix it with application logic. First you run:
SELECT bs_id, bs_board FROM boardsubs
From there you parse the bs_board field in your application logic and build the actual query you want to run:
SELECT bs_id,
IF(bs_board = 0, 'All Boards', (SELECT b_name FROM certboards WHERE b_id IN (<InsertedStringHere>))) AS board
FROM boardsubs
There are other ways around the problem, but you will have problems with sorting order, matching, and numerous other problems. The best solution is to add a table and move this multi-valued field to that table.
The b_id IN (REPLACE(bs_board, ';', ',')) will result in b_id IN ('1,2,6,7') which is different from b_id IN (1,2,6,7) which is what you are looking for.
To make it work either parse the string before doing the query, or use prepared statements.
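A minimal sketch of the "parse the string first" route in Python. The function name is hypothetical, and the %s placeholders assume a MySQL driver that uses that parameter style:

```python
def build_board_query(bs_board):
    """Split a '1;2;6;17' style value and build a parameterised IN (...) query."""
    ids = [int(part) for part in bs_board.split(";") if part.strip()]
    if not ids:
        raise ValueError("bs_board contains no board ids")
    # One placeholder per id, so the driver sees separate values, not one string
    placeholders = ", ".join(["%s"] * len(ids))
    sql = f"SELECT b_name FROM certboards WHERE b_id IN ({placeholders})"
    return sql, ids


sql, params = build_board_query("1;2;6;17")
print(sql)     # SELECT b_name FROM certboards WHERE b_id IN (%s, %s, %s, %s)
print(params)  # [1, 2, 6, 17]
```

Converting each part with int() also rejects anything that is not a number before it gets near the query.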

SQL - Comparing text(combinations) on 100million table

I have a problem.
I have a table that has around 80-100 million records in it. In that table I have a varchar field that stores from 3 up to 16 different "combinations". A combination is a 4-digit number, a colon and a char (A-E). For example:
'0001:A/0002:A/0005:C/9999:E'. In this case there are 4 different combinations (they can go up to 16). This field is in every row of the table, never a null.
Now the problem: I have to go through the table, find every row, and see if they are similar.
Example rows:
0001:A/0002:A/0003:C/0005:A/0684:A/0699:A/0701:A/0707:A/0709:A/0710:D/0711:C/0712:A/0713:A
0001:A/0002:A/0003:C
0001:A/0002:A/0003:A/0006:C
0701:A/0709:A/0711:C/0712:A/0713:A
As you can see, each of these rows is similar to the others (in some way). The thing that needs to be done here is when you send '0001:A/0002:A/0003:C' via program(or parameter in SQL), that it checks every row and see if they have the same "group". Now the catch here is that it has to go both ways and it has to be done "quick", and the SQL needs to compare them somehow.
So when you send '0001:A/0002:A/0003:C/0005:A/0684:A/0699:A/0701:A/0707:A/0709:A/0710:D/0711:C/0712:A/0713:A' it has to find all fields where there are 3-16 same combinations and return the rows. This 3-16 can be specified via parameter, but the problem is that you would need to find all possible combinations, because you can send '0002:A/0711:C/0713:A', and as you can see you can send 0002:A as the first parameter.
But you cannot have indexing because a combination can be on any place in a string, and you can send different combinations that are not "attached" (there could be a different combination in the middle).
So, sending '0001:A/0002:A/0003:C/0005:A/0684:A/0699:A/0701:A/0707:A/0709:A/0710:D/0711:C/0712:A/0713:A' has to return all fields that has the same 3-16 fields
and it has to go both ways, if you send "0001:A/0002:A/0003:C" it has to find the row above + similar rows(all that contain all the parameters).
Some things/options I tried:
Doing LIKE for all send combinations is not practical + too slow
Giving a field full-index search isn't an option(don't know why exactly)
One of the few things that could work would be making some "hash" type of encoding for fields, calculating it via program, and searching for all same "hashes"(Don't know how would you do that, given that the hash would generate different combinations for similar texts, maybe some hash that would be written exactly for that
Making a new field, calculating/writing(can be done on insert) all possible combinations and checking via SQL/program if they have the same % of combinations, but I don't know how you can store 10080 combinations(in case of 16) into a "varchar" effectively, or via some hash code + knowing then which of them are familiar.
There is another catch, this table is in usage almost 24/7, doing combinations to check if they are the same in SQL is too slow because the table is too big, it can be done via program or something, but I don't have any clue on how could you store this in a new row that you would know somehow that they are the same. It is a possibility that you would calculate combinations, storing them via some hash code or something on each row insert, calculating "hash" via program, and checking the table like:
SELECT * FROM TABLE WHERE ROW = "a346adsad"
where the parameter would be sent via program.
This script would need to be executed really fast, under 1 minute, because there could be new inserts into the table, that you would need to check.
The whole point of this would be to see if there are any similar combinations in SQL already and blocking any new combination that would be "similar" for inserting.
I have been dealing with that problem for 3 days now without any possible solution, the thing that was the closest is different type of insert/hash like, but I don't know how could that work.
Thank you in advance for any possible help, or if this is even possible!
it checks every row and see if they have the same "group".
IMHO if the group is a basic element of your data structure, your database structure is flawed: it should have each group in its own cell to be normalized. The structure you described makes it clear that you store a composite value in the field.
I'd tear up the table into 3:
one for the "header" information of the group sequences
one for the groups themselves
a connecting table between the two
Something along these lines:
CREATE TABLE GRP_SEQUENCE_HEADER (
ID BIGINT PRIMARY KEY,
DESCRIPTION TEXT
);
CREATE TABLE GRP (
ID BIGINT PRIMARY KEY,
GROUP_TXT CHAR(6)
);
CREATE TABLE GRP_GRP_SEQUENCE_HEADER (
GROUP_ID BIGINT,
GROUP_SEQUENCE_HEADER_ID BIGINT,
GROUP_SEQUENCE_HEADER_ORDER INT, /* For storing the order in the sequence */
PRIMARY KEY(GROUP_ID, GROUP_SEQUENCE_HEADER_ID)
);
(of course, add the foreign keys, and most importantly the indexes necessary)
Then you only have to break up the input into groups, and execute a simple query on a properly indexed table.
Also, you would probably save on the disk space too by not storing duplicates...
A sample query for finding the "similar" sequences' IDs:
SELECT ggsh.GROUP_SEQUENCE_HEADER_ID,COUNT(1)
FROM GRP_GRP_SEQUENCE_HEADER ggsh
JOIN GRP g ON ggsh.GROUP_ID=g.ID
WHERE g.GROUP_TXT IN (<groups to check for from the sequence>)
GROUP BY ggsh.GROUP_SEQUENCE_HEADER_ID
HAVING COUNT(1) BETWEEN 3 AND 16 --lower and upper boundaries
This returns all the header IDs that the current sequence is similar to.
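Breaking the input up into groups is a simple split on the application side. A sketch in Python, assuming the '/' separator and the NNNN:X format described in the question; the function name is hypothetical:

```python
import re

# Four digits, a colon and a letter A-E, as described in the question
GROUP = re.compile(r"[0-9]{4}:[A-E]")


def split_sequence(sequence):
    """Split '0001:A/0002:A/0003:C' into its groups, rejecting malformed ones."""
    groups = sequence.split("/")
    malformed = [group for group in groups if not GROUP.fullmatch(group)]
    if malformed:
        raise ValueError(f"malformed groups: {malformed}")
    return groups


print(split_sequence("0001:A/0002:A/0003:C"))  # ['0001:A', '0002:A', '0003:C']
```

The resulting list is what would be bound to the IN (...) list of the query above.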
EDIT
Rethinking it a bit more, you could even break up the group into the two parts, but as I seem to understand, you always have full groups to deal with, so it doesn't seem to be necessary.
EDIT2 Maybe if you want to speed the process up even more, I'd recommend to translate the sequences using bijection into numeric data. For example, evaluate the first 4 numbers to be an integer, shift it by 4 bits to the left (multiply by 16, but quicker), and add the hex value of the character in the last place.
Examples:
0001:A --> 1 as integer, A is 10, so 1*16+10 =26
...
0002:B --> 2 as integer, B is 11, so 2*16+11 =43
...
0343:D --> 343 as integer, D is 13, so 343*16+13 =5501
...
9999:E --> 9999 as integer, E is 14, so 9999*16+14 =159998 (max value, if I understood correctly)
Numerical values are handled more efficiently by the DB, so this should result in an even better performance - of course with the new structure.
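The translation described above could be sketched like this in Python, assuming the NNNN:X format from the question; the function names are hypothetical:

```python
def encode(group):
    """Map 'NNNN:X' to an integer: the number shifted left by 4 bits plus the hex value of X."""
    number, letter = group.split(":")
    return (int(number) << 4) + int(letter, 16)


def decode(value):
    """Inverse of encode(): recover 'NNNN:X' from the integer."""
    return f"{value >> 4:04d}:{value & 0xF:X}"


print(encode("0001:A"))  # 26
print(encode("9999:E"))  # 159998
print(decode(5501))      # 0343:D
```

Because the letter always fits in the low 4 bits, the mapping is reversible, and the values fit comfortably in a 32-bit integer column.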
So basically you want to execute a complex string manipulation on 80-100 million rows in less than a minute! Ha, ha, good one!
Oh wait, you're serious.
You cannot hope to do these searches on the fly. Read Joel Spolsky's piece on getting Back to Basics to understand why.
What you need to do is hive off those 80-100 million strings into their own table, broken up into those discrete tokens, i.e. '0001:A/0002:A/0003:C' is broken up into three records (perhaps of two columns; you're a bit vague about the relationship between the numeric and alphabetic components of the tokens). Those records can be indexed.
Then it is simply a matter of tokenizing the search strings and doing a select joining the search tokens to the new table. Not sure how well it will perform: that rather depends on how many distinct tokens you have.
As people have commented you would benefit immensely from normalizing your data, but can you not cheat and create a temp table with the key and exploding out your column on the "/", so you go from
KEY | "0001:A/0002:A/0003:A/0006:C"
KEY1| "0001:A/0002:A/0003:A"
to
KEY | 0001:A
KEY | 0002:A
KEY | 0003:A
KEY | 0006:C
KEY1| 0001:A
KEY1| 0002:A
KEY1| 0003:A
Which would allow you to develop a query something like the following (not tested):
SELECT
t1.key
, t2.key
, COUNT(*)
FROM
temp_table t1
, temp_table t2
, ( SELECT t3.key, COUNT(*) AS cnt FROM temp_table t3 GROUP BY t3.key) t4
WHERE
t1.combination IN (
SELECT
t5.combination
FROM
temp_table t5
WHERE
t5.key = t2.key)
AND t1.key <> t2.key
AND t4.key = t1.key
GROUP BY
t1.key
, t2.key
, t4.cnt
HAVING
COUNT(*) = t4.cnt
So return the two keys where key1 is a proper subset of key?
I guess I can recommend to build special "index".
It will be quite big but you will achieve superspeedy results.
Let's consider this task as searching a set of symbols.
There are design conditions.
The symbols are made by pattern "NNNN:X", where NNNN is number [0001-9999] and X is letter [A-E].
So we have 5 * 9999 = 49995 symbols in alphabet.
Maximum length of words with this alphabet is 16.
We can build for each word set of combinations of its symbols.
For example, the word "abcd" will have next combinations:
abcd
abc
ab
a
abd
acd
ac
ad
bcd
bc
b
bd
cd
c
d
As symbols are sorted in words we have only 2^N-1 combinations (15 for 4 symbols).
For 16-symbols word there are 2^16 - 1 = 65535 combinations.
So we create an additional index-organized table like this one:
create table spec_ndx(combination varchar2(100), original_value varchar2(100))
Performance will be excellent, at the price of overhead: in the worst case, each record in the original table will have 65535 "index" records.
So for a 100-million-row table we would get a table of roughly 6.5 trillion rows.
But if the values are short, the size of the "special index" drops drastically.
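A small Python sketch of how those combinations could be generated and counted, using itertools from the standard library; the function name is hypothetical:

```python
from itertools import combinations


def all_subsets(symbols):
    """Return every non-empty combination of the sorted symbols (2^N - 1 of them)."""
    ordered = sorted(symbols)
    return [
        combo
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


print(len(all_subsets("abcd")))                          # 15
print(len(all_subsets(["0001:A", "0002:A", "0003:C"])))  # 7

# Worst case from the text: 16 symbols per row, 100 million rows
print((2**16 - 1) * 100_000_000)                         # 6553500000000
```

For a three-symbol value only 7 index rows are needed, which is why short values keep the index manageable.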