MySQL flexible conversion to numeric data

I have a MySQL database with several categorical columns. When searching the database, it would be nice to convert the categorical data, and data spread across multiple columns, into a single numeric variable I could use for sorting.
Ideally this conversion would be a function and not just another data table, since the mapping itself may change. It could probably be as simple as the following code, but I’m not sure what the best way to do something like this in SQL would be. Thanks in advance.
a = 0
if b == "val1" {
    a += 1
}
if c == 2 { a += 2 }
if c == 1 { a += 1 }
return a
Here a is the numeric column, and b and c are the values I’m mapping to a. Here is the same example in table form, with everything joined in case these columns are in different tables:
+---+------+------+
| a | c    | b    |
+---+------+------+
| 0 | 3    | xxxx |
| 1 | 1    | xxxx |
| 2 | 2    | xxxx |
| 1 | 3    | val1 |
| 3 | 2    | val1 |
| 2 | 1    | val1 |
+---+------+------+
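A minimal sketch of one SQL-side option, assuming the mapping can be written as a sum of CASE expressions, one per rule in the pseudocode (b is the string column, c the numeric one). It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MySQL so the example runs on its own; the table name t is hypothetical, and CASE itself is standard SQL that MySQL also accepts.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INTEGER, b TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO t (c, b) VALUES (?, ?)",
    [(3, "xxxx"), (1, "xxxx"), (2, "xxxx"), (3, "val1"), (2, "val1"), (1, "val1")],
)

# One CASE per rule from the pseudocode; their sum is the score a:
# b = 'val1' adds 1, c = 2 adds 2, c = 1 adds 1, anything else adds 0.
SCORE_SQL = """
SELECT c, b,
       (CASE WHEN b = 'val1' THEN 1 ELSE 0 END)
     + (CASE c WHEN 2 THEN 2 WHEN 1 THEN 1 ELSE 0 END) AS a
FROM t
ORDER BY a DESC, c
"""
rows = conn.execute(SCORE_SQL).fetchall()
for row in rows:
    print(row)  # (c, b, a)
```

Because the mapping may change, the same expression could be wrapped in a view or a MySQL stored function (CREATE FUNCTION), so the rules are edited in one place and queries simply ORDER BY the derived score.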

Related

SQL query to find set of doc_ids where there is maximum intersection of ent_ids

I have a table with O(1M) rows with columns doc_id and ent_id where (doc_id, ent_id) is the primary key.
+--------+--------+
| doc_id | ent_id |
+--------+--------+
| 1      | a      |
| 1      | b      |
| 1      | x      |
| 1      | y      |
| 2      | a      |
| 3      | a      |
| 3      | x      |
| 3      | y      |
| 4      | x      |
| 4      | y      |
+--------+--------+
My question is: how do I efficiently find a set of doc_ids (say, the top 1000 or 5000 doc_ids) with the maximum intersection of ent_ids among the selected doc_ids?
For example, in the table above:
Say I need the top 2 doc_ids with the maximum intersection among their ent_ids. The result would be doc_ids = {1,3}, with common ent_ids = {a,x,y} (common ent_ids count = 3).
Say I need the top 3 doc_ids with the maximum intersection among their ent_ids. The result would be doc_ids = {1,3,4}, with common ent_ids = {x,y} (common ent_ids count = 2).
Footnote: if it isn't possible to do this efficiently in SQL, any pointers toward an alternative approach in application code would also help, e.g. export to CSV -> some data structure (an inverted index?) or library + Python code -> result set.
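If application code is acceptable, here is a rough sketch of the inverted-index idea from the footnote, assuming the (doc_id, ent_id) pairs have already been exported (e.g. to CSV) and loaded as tuples. Checking every k-subset exactly blows up combinatorially, so this uses a greedy heuristic instead: it seeds with the doc that has the most ent_ids, then repeatedly adds the doc sharing the most ent_ids with the current common set. It is not guaranteed to find the optimum, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_indexes(pairs):
    """Map doc -> set of ent_ids, plus the inverted index ent -> set of doc_ids."""
    doc_to_ents = defaultdict(set)
    ent_to_docs = defaultdict(set)
    for doc_id, ent_id in pairs:
        doc_to_ents[doc_id].add(ent_id)
        ent_to_docs[ent_id].add(doc_id)
    return doc_to_ents, ent_to_docs


def greedy_max_intersection(pairs, k):
    """Greedy heuristic (NOT guaranteed optimal).

    Returns (sorted chosen doc_ids, set of their common ent_ids)."""
    doc_to_ents, ent_to_docs = build_indexes(pairs)
    # Seed with the doc that has the most ent_ids (ties: smallest doc_id).
    seed = min(doc_to_ents, key=lambda d: (-len(doc_to_ents[d]), d))
    chosen = {seed}
    common = set(doc_to_ents[seed])
    while len(chosen) < k:
        # Use the inverted index to count how many of the current common
        # ent_ids each not-yet-chosen doc contains.
        overlap = Counter()
        for ent in common:
            for doc in ent_to_docs[ent]:
                if doc not in chosen:
                    overlap[doc] += 1
        if not overlap:
            break  # no unchosen doc shares any ent_id with the common set
        best = min(overlap, key=lambda d: (-overlap[d], d))
        chosen.add(best)
        common &= doc_to_ents[best]
    return sorted(chosen), common


PAIRS = [(1, "a"), (1, "b"), (1, "x"), (1, "y"), (2, "a"),
         (3, "a"), (3, "x"), (3, "y"), (4, "x"), (4, "y")]
print(greedy_max_intersection(PAIRS, 2))
print(greedy_max_intersection(PAIRS, 3))
```

On the sample table this greedy pass reproduces both expected answers ({1,3} with {a,x,y}, and {1,3,4} with {x,y}); on other data, trying several different seeds could improve the result at modest extra cost.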

MySQL Substring between two DIFFERENT strings where the second needle comes AFTER the first

I have to extract certain data from a MySQL column. The table looks like so:
+----+---------------------+------------------------+
| id | time                | data                   |
+----+---------------------+------------------------+
| 1  | 2016-10-28 00:12:01 | a Q1!! AF3 !! ext!! z  |
| 2  | 2016-10-28 02:19:02 | z !!3F2 !AF66-2!! !!a  |
| 3  | 2016-10-28 11:35:03 | AF!a !!! pl6 f !!! dd  |
+----+---------------------+------------------------+
I want to grab the string in column data between the characters AF and the NEXT occurrence of !!. So ideally the query SELECT id, [something] AS x FROM tbl would return:
+----+------+
| id | x    |
+----+------+
| 1  | 3    |
| 2  | 66-2 |
| 3  | !a   |
+----+------+
Thoughts on how to do this? All the other questions I see don't quite relate, as they don't deal with finding the first occurrence of the second needle (!!) AFTER the first needle (AF).
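There may be faster ways to do this, but this is a good start. Note that substring_index() with a count of -1 keeps the text after the last AF, which is the same as the first AF when each value contains only one AF, as in the sample data:
select id, substring_index(substring_index(data, 'AF', -1), '!!', 1) as x from tbl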
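To see what the nested SUBSTRING_INDEX calls do, here is a small Python emulation of MySQL's documented SUBSTRING_INDEX(str, delim, count) behaviour (an approximation written for this example, not a MySQL API), applied to the three sample rows. It also shows a hypothetical first-AF variant, since a count of -1 anchors on the last AF.

```python
from typing import Optional


def substring_index(s: str, delim: str, count: int) -> str:
    """Approximation of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # left of the count-th delim
    if count < 0:
        return delim.join(parts[count:])   # right of the count-th delim from the end
    return ""


def between_last_af(data: str) -> str:
    # Mirrors the answer's query: text after the LAST 'AF', cut at the next '!!'.
    return substring_index(substring_index(data, "AF", -1), "!!", 1)


def between_first_af(data: str) -> Optional[str]:
    # Hypothetical variant anchored on the FIRST 'AF'; None if there is no 'AF'.
    pos = data.find("AF")
    if pos == -1:
        return None
    return substring_index(data[pos + 2:], "!!", 1)


ROWS = {
    1: "a Q1!! AF3 !! ext!! z",
    2: "z !!3F2 !AF66-2!! !!a",
    3: "AF!a !!! pl6 f !!! dd",
}
for row_id, data in ROWS.items():
    print(row_id, repr(between_last_af(data)), repr(between_first_af(data)))
```

Rows 1 and 3 come back with a trailing space ('3 ' and '!a '), so wrapping the expression in TRIM() may be needed to get exactly 3 and !a. In MySQL, a first-AF version could be built from LOCATE() and SUBSTRING(), but it would need a guard for rows that contain no AF, because LOCATE() returns 0 in that case.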

mysql how to split a column into multiple columns with ambiguous number

Hello, I want to split a resulting column into multiple columns, just like in the linked question. But the number of columns is not fixed:
Example
COL1                 | OTHER COLUMNS
----------------------------------------
this,will,split      | some value
also,this            | some value
this,is,four,columns | some value
I want to turn it into something like this:
COL1 | COL2 | COL3  | COL4    | OTHER
----------------------------------------
this | will | split | NULL    | some value
also | this | NULL  | NULL    | some value
this | is   | four  | columns | some value
Edit:
It looks similar to this question, but it isn't the same:
Can you split/explode a field in a MySQL query?
I want the results in one row; I don't want something like this:
RESULT
-----
this
will
split
...
In that question you can see there is a specific number of columns, but in my case there isn't. :(
How to split a resulting column in multiple columns
I think you can create a separate relational table and add multiple entries to it. That way you don't need to think about columns; you just add one row per value.
E.g.:
Table 1:
ID | COL1                 | OTHER COLUMNS
----------------------------------------
1  | this,will,split      | some value
2  | also,this            | some value
3  | this,is,four,columns | some value
Table 2:
ID | Table1_id | value
-------------------------
1  | 1         | this
2  | 1         | will
3  | 1         | split
4  | 2         | also
5  | 2         | this
6  | 3         | this
7  | 3         | is
8  | 3         | four
9  | 3         | columns
Please check this; I think it fixes your problem.
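A sketch of both layouts in Python, using the sample rows from Table 1: the normalized Table 2 (one row per value, each with its own sequential ID) and, for comparison, the wide layout the question asks for, padded with None (NULL) up to the longest list. A single SQL query always returns a fixed set of columns, so producing a data-dependent number of columns in MySQL would need dynamic SQL (PREPARE/EXECUTE) or application code along these lines; the variable names are illustrative.

```python
# Table 1 from the answer: (ID, COL1, OTHER COLUMNS)
table1 = [
    (1, "this,will,split", "some value"),
    (2, "also,this", "some value"),
    (3, "this,is,four,columns", "some value"),
]

# Normalized Table 2: one row per list element as (ID, Table1_id, value),
# with a sequential ID so every row has a unique key.
table2 = []
for t1_id, col1, _other in table1:
    for value in col1.split(","):
        table2.append((len(table2) + 1, t1_id, value))

# Wide layout from the question: pad each split list with None (NULL)
# up to the longest list, so the column count is decided by the data.
split_lists = [col1.split(",") for _id, col1, _other in table1]
width = max(len(parts) for parts in split_lists)
wide = [parts + [None] * (width - len(parts)) for parts in split_lists]

for row in table2:
    print(row)
for parts, (_id, _col1, other) in zip(wide, table1):
    print(parts + [other])
```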

export mysql table with data spread over multiple rows to csv

I have a MySQL table that is filled with inputs from a web form on my website. The form has fields for first name, last name, email, phone, address, etc., and when a user submits the form, the data is stored in a MySQL table in a rather strange way.
My table looks like this:
submission# | value           | field     | tstamp     | and | many | more | columns
====================================================================================
1           | john#server.com | email     | 1448898875 |     |      |      |
1           | john            | firstname | 1448898875 |     |      |      |
1           | doe             | lastname  | 1448898875 |     |      |      |
1           | london          | city      | 1448898875 |     |      |      |
2           | jane#aol.com    | email     | 1448898870 |     |      |      |
2           | jane            | firstname | 1448898870 |     |      |      |
2           | doe             | lastname  | 1448898870 |     |      |      |
2           | new york        | city      | 1448898870 |     |      |      |
3           | tim#aol.com     | email     | 1448838571 |     |      |      |
3           | tim             | firstname | 1448838571 |     |      |      |
3           | smith           | lastname  | 1448838571 |     |      |      |
3           | paris           | city      | 1448838571 |     |      |      |
I need to export this data to a CSV file so I can import it into a newsletter script on another server, but that script expects the data in a different format:
submission#,email,firstname,lastname,city,tstamp,.....
1,john#server.com,john,doe,london,1448898875,,,,
2,jane#aol.com,jane,doe,new york,1448898870,,,,
Exporting to CSV is not the problem, but how do I get all the data for one submission# into one row? Can anyone point me in the right direction on how to accomplish this with SQL?
You can get the desired output by concatenating the field contents into a single comma-separated field with the concat() and group_concat() functions. Ordering the values inside group_concat() with field() keeps them in the column order the newsletter script expects; ordering by `field` alphabetically would put city first. Replace `table` with your actual table name.
The only problem arises if any of the properties is missing for a particular submission. In that case you will need a helper table that lists all properties, and you need to left join on that table. Since this is not the case for your sample data, I'm not providing the code for this scenario.
select concat(submission, ',', group_concat(`value` order by field(`field`, 'email', 'firstname', 'lastname', 'city')), ',', tstamp)
from `table` group by submission, tstamp
If you need the field names in the first row, write a separate query that concatenates the field names with commas and combine the two queries with UNION.
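As an alternative to group_concat(), a common way to turn rows into columns is conditional aggregation: one MAX(CASE WHEN field = '...' THEN value END) per field. This is a different technique from the answer's, sketched here with Python's sqlite3 (as a stand-in for MySQL) and the csv module; the table name submissions is hypothetical. A missing property simply becomes an empty CSV cell, so no helper table is needed, and the query's column names double as the CSV header row.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL table
conn.execute(
    "CREATE TABLE submissions (submission INTEGER, value TEXT, field TEXT, tstamp INTEGER)"
)
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?, ?, ?)",
    [
        (1, "john#server.com", "email", 1448898875), (1, "john", "firstname", 1448898875),
        (1, "doe", "lastname", 1448898875), (1, "london", "city", 1448898875),
        (2, "jane#aol.com", "email", 1448898870), (2, "jane", "firstname", 1448898870),
        (2, "doe", "lastname", 1448898870), (2, "new york", "city", 1448898870),
        (3, "tim#aol.com", "email", 1448838571), (3, "tim", "firstname", 1448838571),
        (3, "smith", "lastname", 1448838571), (3, "paris", "city", 1448838571),
    ],
)

# One MAX(CASE ...) per field pivots the key/value rows into columns;
# MAX ignores the NULLs produced for non-matching rows.
PIVOT_SQL = """
SELECT submission,
       MAX(CASE WHEN field = 'email' THEN value END) AS email,
       MAX(CASE WHEN field = 'firstname' THEN value END) AS firstname,
       MAX(CASE WHEN field = 'lastname' THEN value END) AS lastname,
       MAX(CASE WHEN field = 'city' THEN value END) AS city,
       tstamp
FROM submissions
GROUP BY submission, tstamp
ORDER BY submission
"""
cur = conn.execute(PIVOT_SQL)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow([col[0] for col in cur.description])  # header from column names
writer.writerows(cur.fetchall())  # None (a missing field) is written as ""
csv_text = out.getvalue()
print(csv_text)
```

The SELECT uses only standard SQL, so it should also run on MySQL, with the export done by whichever client you already use.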

mysql: how to split list field

I have a table that contains only an id and a field whose data is a list, e.g.:
----------------
| id | data    |
| 1  | a,b,c,d |
| 2  | a,b,k,m |
----------------
I guess putting a list into a single field is not a good design, so how can I redesign it?
In my view, you need two tables (i.e. Master and Transaction tables) only when some details stay the same for every record while others change. In your case, if nothing else related to your id field stays the same, you can carry on with one table with the following structure:
-------------
| id | data |
| 1  | a    |
| 1  | b    |
| 1  | c    |
| 1  | d    |
| 2  | a    |
| 2  | b    |
| 2  | k    |
| 2  | m    |
-------------
BUT if there are other attributes related to the id field that stay the same for records with the same id, you will have to use two tables.
Consider the following case, with 3 fields: id, name and data.
Your current table looks something like this:
---------------------------
| id | name     | data    |
| 1  | testname | a,b,c,d |
| 2  | remy     | a,b,c,d |
---------------------------
Your new table structure should look like this:
Table 1: Master
-----------------
| id | name     |
| 1  | testname |
| 2  | remy     |
-----------------
Table 2: Transaction
-------------
| id | data |
| 1  | a    |
| 1  | b    |
| 1  | c    |
| 1  | d    |
| 2  | a    |
| 2  | b    |
| 2  | k    |
| 2  | m    |
-------------
For better database management we might need to normalize the data.
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships. You can find more in the links below:
3 Normal Forms Database Tutorial
Database normalization
If you have only those two fields in your table, then you need only one table, as below:
id | data
with a composite primary key, PRIMARY KEY(id, data), so that there can't be any duplicate data for the same ID.
The data would look like this:
id | data
1 | a
1 | b
1 | c
1 | d
2 | a
2 | b
2 | k
2 | m
You will need another table in a ONE-to-MANY relationship.
For example, you could have another table, datamapping, with data and ID columns, where the ID column is a FOREIGN KEY referencing the ID column of the data table.
So according to your example there would be 4 entries for ID = 1 in the datamapping table.
You will need two tables with a foreign key.
Table 1
id
Table 2
id
datavalue
So the data looks like:
Table 1:
id
1
2
3
Table 2:
id | datavalue
1 | a
1 | b
1 | c
1 | d
2 | a
2 | b
2 | k
2 | m
You are correct, this is not a good database design. The data field violates the principle of atomicity and therefore 1NF (first normal form), which can lead to problems in maintaining and querying the data.
To normalize your design, split the original table in two. There are two basic strategies: a non-identifying relationship, where the child table has its own key and the parent's id is just a foreign key, or an identifying relationship, where the parent's id is part of the child's primary key.
NOTE: if you only have id in the parent table, there are no other FKs on it, and a parent cannot exist without at least one child (i.e. data could not have been empty in the original design), you can dispense with the parent table altogether.
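A minimal sketch of the identifying-relationship variant, using Python's sqlite3 in-memory database as a stand-in for MySQL: the child's primary key (id, data) includes the parent's id, the list is migrated by splitting on commas, and the foreign key and composite key then reject orphans and duplicates. The table names original, parent and child are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas MySQL's InnoDB enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Original, non-1NF design: a comma-separated list in one column.
conn.execute("CREATE TABLE original (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO original VALUES (?, ?)", [(1, "a,b,c,d"), (2, "a,b,k,m")])

# Identifying relationship: the child's primary key includes the parent's id.
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER NOT NULL,"
    " data TEXT NOT NULL,"
    " PRIMARY KEY (id, data),"
    " FOREIGN KEY (id) REFERENCES parent (id))"
)

# Migrate: one parent row per id, one child row per list element.
for row_id, data in conn.execute("SELECT id, data FROM original").fetchall():
    conn.execute("INSERT INTO parent (id) VALUES (?)", (row_id,))
    conn.executemany(
        "INSERT INTO child (id, data) VALUES (?, ?)",
        [(row_id, item) for item in data.split(",")],
    )

# Querying is now a plain WHERE instead of string matching inside a list.
ids_with_k = [r[0] for r in conn.execute("SELECT id FROM child WHERE data = 'k'")]
print(ids_with_k)
```

Dropping the parent table (per the NOTE above) leaves exactly the single-table design with PRIMARY KEY(id, data) from the earlier answer.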