MySQL structure for 2D array lookup? - mysql

I'm trying to create a MySQL database for lookups in a big dataset. I have a set of n samples and a list of genes in my data collection, and I would like to be able to look up a value by pairing one sample with one gene. The table below shows the data for genes, samples and values, and I would like to know the best way to set this up in MySQL for fast lookup.
In my data I have about 35,000 different genes and a number of samples that may vary between 2,000 and 15,000. (I don't have a specific number, because I haven't finished my data yet.)
|-------|---------|---------|
|       | sample1 | sample2 |
|-------|---------|---------|
| gene1 | value1  | value2  |
| gene2 | value3  | value4  |
|-------|---------|---------|
What is the best way to store this data in a MySQL database? The "easiest" way I thought of right away is to put it in the database as one row per gene/sample pair, as in the following output, but that seems like it would be a big disaster. How can I approach this correctly?
|-------|---------|--------|
| gene1 | sample1 | value1 |
| gene1 | sample2 | value2 |
| gene2 | sample1 | value3 |
| gene2 | sample2 | value4 |
|-------|---------|--------|
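For what it's worth, the one-row-per-pair layout can be tried out quickly before committing to it. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `expression`, the helper `lookup` and the numeric values are made up for illustration. The idea it shows is a composite primary key on (gene, sample), so that a single-pair lookup is an index search rather than a table scan.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# One row per (gene, sample) pair; the composite primary key doubles as the lookup index.
conn.execute(
    """
    CREATE TABLE expression (
        gene   TEXT NOT NULL,
        sample TEXT NOT NULL,
        value  REAL NOT NULL,
        PRIMARY KEY (gene, sample)
    )
    """
)

# Hypothetical values standing in for value1..value4 from the question.
rows = [
    ("gene1", "sample1", 1.0),
    ("gene1", "sample2", 2.0),
    ("gene2", "sample1", 3.0),
    ("gene2", "sample2", 4.0),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)


def lookup(gene, sample):
    """Return the stored value for one gene/sample pair, or None if absent."""
    row = conn.execute(
        "SELECT value FROM expression WHERE gene = ? AND sample = ?",
        (gene, sample),
    ).fetchone()
    return row[0] if row else None


print(lookup("gene2", "sample1"))
```

In MySQL the same shape would presumably be a table with a primary key on the gene and sample columns; whether it performs well at the sizes mentioned in the question would have to be measured.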

Related

JSON extract multiple columns PostgreSQL

I had an earlier question: PostgreSQL trim text field with regex (or else). I got a wonderful answer from a_horse_with_no_name. Now I have an additional question regarding this issue.
Here is the rextester: https://rextester.com/SUWG96428. The goal is to have each of the ids in a separate column. Is that possible at all?
Like this:
+---+----+-------+-------+
| | id | ids_1 | ids_2 |
+---+----+-------+-------+
| 1 | 1 | 4202 | 4203 |
| 2 | 2 | 4204 | |
| 3 | 3 | 4201 | |
+---+----+-------+-------+
Yep, you can modify your query like:
select
    t.id,
    right(((the_column::json->'itemID')->>0)::varchar, 4) as col1,
    right(((the_column::json->'itemID')->>1)::varchar, 4) as col2,
    right(((the_column::json->'itemID')->>2)::varchar, 4) as col3
from the_table t;
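To make the logic of that query easier to follow, here is a rough Python equivalent. It is only a sketch: the sample JSON strings and the helper name `split_item_ids` are invented for illustration, and the real data in the rextester may look different. Like the query, it takes up to three elements of the `itemID` array, keeps the last four characters of each, and leaves missing positions empty.

```python
import json


def split_item_ids(raw_json, width=3, tail=4):
    """Return `width` columns holding the last `tail` characters of each itemID entry.

    Positions without an array element are filled with None, mirroring the NULL
    that the SQL query yields for a missing index.
    """
    item_ids = json.loads(raw_json).get("itemID", [])
    columns = [str(value)[-tail:] for value in item_ids[:width]]
    return columns + [None] * (width - len(columns))


# Hypothetical rows; the "item-" prefix is an assumption, not taken from the question.
rows = [
    (1, '{"itemID": ["item-4202", "item-4203"]}'),
    (2, '{"itemID": ["item-4204"]}'),
]

for row_id, raw in rows:
    print(row_id, split_item_ids(raw))
```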

getting the new row id from pySpark SQL write to remote mysql db (JDBC)

I am using pyspark-sql to create rows in a remote mysql db, using JDBC.
I have two tables, parent_table(id, value) and child_table(id, value, parent_id), so each row of parent_table may have as many rows in child_table associated with it as needed.
Now I want to create some new data and insert it into the database. I'm following the code guidelines here for the write operation, but I would like to be able to do something like:
# Each element must be a tuple so that Spark can infer a one-column schema.
parentDf = sc.parallelize([(5,), (6,), (7,)]).toDF(('value',))
parentWithIdDf = parentDf.write.mode('append') \
    .format("jdbc") \
    .option("url", "jdbc:mysql://" + host_name + "/" + db_name) \
    .option("dbtable", table_name) \
    .option("user", user_name) \
    .option("password", password_str) \
    .save()
# The assignment above is wrong, as pyspark.sql.DataFrameWriter#save doesn't return anything.
I would like a way for the last line of code above to return a DataFrame with the new row ids for each row so I can do
childDf = parentWithIdDf.flatMap(lambda x: [[8, x[0]], [9, x[0]]])
childDf.write.mode('append')...
meaning that at the end I would have in my remote database:
parent_table
| id | value |
|----|-------|
| 1  | 5     |
| 2  | 6     |
| 3  | 7     |
child_table
| id | value | parent_id |
|----|-------|-----------|
| 1  | 8     | 1         |
| 2  | 9     | 1         |
| 3  | 8     | 2         |
| 4  | 9     | 2         |
| 5  | 8     | 3         |
| 6  | 9     | 3         |
As noted in the first code snippet above, pyspark.sql.DataFrameWriter#save doesn't return anything according to its documentation, so how can I achieve this?
Am I doing something completely wrong? It looks like there is no way to get data back from a Spark action (which save is), while I would like to use this action as a transformation, which leads me to think I may be approaching all of this in the wrong way.
A simple answer is to use a timestamp plus an auto-increment number to create a unique ID. This only works if only one server is running at any given time.
:)
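A minimal sketch of that idea, in plain Python rather than Spark so it can be run anywhere. Everything named here (`make_id_generator`, the fixed `start_ms`, the factor of 1,000,000) is an assumption for illustration. The point is that once the IDs are generated on the client, the child rows can reference their parents without reading anything back from the database.

```python
import itertools
import time


def make_id_generator(start_ms=None):
    """Yield unique integer IDs built from a millisecond timestamp plus a counter.

    Only safe when a single process generates IDs, as the answer points out.
    """
    base = int(time.time() * 1000) if start_ms is None else start_ms
    for n in itertools.count():
        # Scale the timestamp so the counter occupies the low six decimal digits.
        yield base * 1_000_000 + n


# A fixed start_ms keeps the example reproducible; normally it would be left as None.
next_id = make_id_generator(start_ms=1_700_000_000_000)

# Parent rows get their IDs before anything is written.
parent_rows = [(next(next_id), value) for value in [5, 6, 7]]

# Each parent gets two children (values 8 and 9), as in the question.
child_rows = [
    (next(next_id), child_value, parent_id)
    for parent_id, _ in parent_rows
    for child_value in (8, 9)
]

print(parent_rows)
print(child_rows)
```

With the IDs known up front, both lists could then be turned into DataFrames and written with mode('append'), assuming the id columns accept explicit values.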

MySQL flexible conversion to numeric data

I have a MySQL database with several categorical columns. When searching the database, it would be nice to convert the categorical data, spread across multiple columns, into one numeric variable I could use for sorting.
Ideally this conversion would be a function and not just another data table, since the mapping itself may change. It could probably be as simple as the following code, but I'm not sure what the best way to do something like this in SQL would be. Thanks in advance.
a = 0
if b == "val1" { a += 1 }
if c == 2 { a += 2 }
if c == 1 { a += 1 }
return a
Here a is the numeric column, and b and c are the values I'm mapping to a. The same example in table form, with everything joined in case these columns are in different tables:
+---+------+------+
| a | c | b |
+---+------+------+
| 0 | 3 | xxxx |
| 1 | 1 | xxxx |
| 2 | 2 | xxxx |
| 1 | 3 | val1 |
| 3 | 2 | val1 |
| 2 | 1 | val1 |
+---+------+------+
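One way this mapping might be written directly in SQL is with CASE expressions. The sketch below runs the query through Python's built-in sqlite3 module as a stand-in for MySQL; the table name `items` is made up, and b is taken to be the text column and c the numeric one, as in the pseudocode. CASE is standard SQL, so the SELECT itself should carry over to MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("xxxx", 3), ("xxxx", 1), ("xxxx", 2), ("val1", 3), ("val1", 2), ("val1", 1)],
)

# Each CASE contributes one term of the score, mirroring one `if` in the pseudocode.
SCORE_SQL = """
SELECT b, c,
       (CASE WHEN b = 'val1' THEN 1 ELSE 0 END)
     + (CASE c WHEN 2 THEN 2 WHEN 1 THEN 1 ELSE 0 END) AS a
FROM items
ORDER BY rowid
"""

scores = [row[2] for row in conn.execute(SCORE_SQL)]
print(scores)  # [0, 1, 2, 1, 3, 2]
```

Because the mapping lives in the query text rather than in a lookup table, changing it means editing the CASE branches; wrapping the expression in a view or stored function would be one option if it is reused in many places.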

Cross table with multiselect

I have a table with two text columns filled with strings:
CREATE TABLE [tbl_text]
(
    [directoryName] nvarchar(200),
    [text1] nvarchar(200),
    [text2] nvarchar(200)
)
The strings are built like the following:
| Text1 | Text2 |
|------------|----------|
|tz1 tz3 tz2 | al1 al2 |
| tz1 tz3 | al1 al3 |
| tz2 | al3 |
| tz3 tz2 | al1 al2 |
Now I want to count how many times each Text1 value occurs together with each Text2 value, resulting in the following:
| Text1 | al1 | al2 | al3 |
|-------|------|------|------|
| tz1 | 2 | 1 | 1 |
| tz2 | 2 | 2 | 1 |
| tz3 | 3 | 2 | 1 |
I tried solving it with an SQL query like this:
TRANSFORM Count(tt.directoryName) AS Value
SELECT tt.Text1
FROM tbl_text as tt
GROUP BY tt.Text1
PIVOT tt.Text2;
This works fine if the fields contain only one value each, like the third row (the complete data source would have to be in that one-value style).
But in my case I'm using the strings for a multiselect...
If I apply this query to a data source with " " between the values, the result is completely messed up.
Any suggestions for how the query should look to get this result?
You'll have to split the strings inside Text1/Text2 before you can do anything with them. In VBA, you'd loop over a recordset, use the Split() function and insert the results into a temp table.
In SQL Server there are more powerful options available.
Coming from here: Split function equivalent in T-SQL?,
you should read this page:
http://www.sommarskog.se/arrays-in-sql-2005.html#tablelists
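To illustrate the split-then-count step independently of VBA or T-SQL, here is a small Python sketch using the four example rows from the question. It splits both columns on whitespace, counts every (Text1 value, Text2 value) pair, and arranges the counts as a cross table. The variable names are made up for illustration.

```python
from collections import Counter

# The example rows from the question: (Text1, Text2).
rows = [
    ("tz1 tz3 tz2", "al1 al2"),
    ("tz1 tz3", "al1 al3"),
    ("tz2", "al3"),
    ("tz3 tz2", "al1 al2"),
]

# Split both multiselect strings and count every resulting pair.
counts = Counter()
for text1, text2 in rows:
    for left in text1.split():
        for right in text2.split():
            counts[(left, right)] += 1

# Arrange the pair counts as a cross table: one row per Text1 value.
row_keys = sorted({pair[0] for pair in counts})
col_keys = sorted({pair[1] for pair in counts})
crosstab = {r: [counts[(r, c)] for c in col_keys] for r in row_keys}

print(col_keys)
for r in row_keys:
    print(r, crosstab[r])
```

The output matches the desired table in the question (tz1: 2, 1, 1; tz2: 2, 2, 1; tz3: 3, 2, 1). In the database, the split pairs would play the role of the temp table, and the original TRANSFORM/PIVOT query could then run against that table.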

MySQL recursive update based on values in the same table

I am having trouble implementing the following structure in MySQL.
Table1:
ID | Val
1 | 10
2 | 20
Table2:
ID | LeftTableType | LeftID | LeftVal | RightTableType | RightID | RightVal | Operation | Result
1 | Table1 | 1 | (10) | Table1 | 2 | (20) | + | (30)
2 | Table2 | 1 | (30) | Table1 | 2 | (20) | + | (50)
I tried to use a trigger system where an update to Table1 would update the values of Table2. Unfortunately, I needed to then update subsequent values of Table2, which caused a recursive trigger system that MySQL did not like.
I have also been looking into nested sets and tree structures. It seems like they might be what I am looking for, or at least very close.
Is there something obvious that I am missing to implement something like this? This seems like it might lead me to a messy mixture of cursors, recursion, triggers, procedures, and tree structures.
Any hints would be greatly appreciated!
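For reference, here is a rough sketch of the recalculation described above, done in application code instead of triggers. It is only an illustration under a strong assumption: a Table2 row may refer only to Table2 rows with a smaller ID, so processing rows in ID order always finds the operands already computed. The function name `compute_results` and the "-" and "*" operations are hypothetical; only "+" appears in the example.

```python
import operator

# Only "+" appears in the question; "-" and "*" are assumed for illustration.
OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

table1 = {1: 10, 2: 20}

# (ID, LeftTableType, LeftID, RightTableType, RightID, Operation)
table2_rows = [
    (1, "Table1", 1, "Table1", 2, "+"),
    (2, "Table2", 1, "Table1", 2, "+"),
]


def compute_results(table1, table2_rows):
    """Recompute every Table2 result in ID order.

    Assumes a row only references Table2 rows with a smaller ID.
    """
    results = {}

    def operand(table, row_id):
        # Table1 operands are stored values; Table2 operands are earlier results.
        return table1[row_id] if table == "Table1" else results[row_id]

    for row_id, left_table, left_id, right_table, right_id, op in sorted(table2_rows):
        results[row_id] = OPERATIONS[op](
            operand(left_table, left_id), operand(right_table, right_id)
        )
    return results


print(compute_results(table1, table2_rows))  # {1: 30, 2: 50}

# After changing a Table1 value, one pass refreshes all dependent results.
table1[1] = 15
print(compute_results(table1, table2_rows))  # {1: 35, 2: 55}
```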