Insert and fetch strings and matrices to/from MySQL with MATLAB

I need to store data in a database. I have installed and configured a MySQL database (and an SQLite database) for use from MATLAB. However, I cannot store and retrieve anything other than scalar numeric values.
% create an empty database called test_database with MySQL Workbench.
% connect to it in Matlab
conn=database('test_database','root','XXXXXX','Vendor','MySQL');
% create a table to store values
create_test_table=['CREATE TABLE test_table (testID NUMERIC PRIMARY KEY, test_string VARCHAR(255), test_vector BLOB, test_scalar NUMERIC)'];
curs=exec(conn,create_test_table)
Result is good so far (curs.Message is an empty string)
% create a new record
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',[1,2],1})
% try to read out the new record
sqlquery='SELECT * FROM test_table';
data_to_view=fetch(conn,sqlquery)
Result is bad:
data_to_view =
1 NaN NaN 1
From the documentation for "fetch" I would expect:
data_to_view =
1×4 table
testID    test_string    test_vector    test_scalar
______    ___________    ___________    ___________
  1        'string1'     1x2 double          1
Until I learn how to read blobs I'd even be willing to accept:
data_to_view =
1×4 table
testID    test_string    test_vector    test_scalar
______    ___________    ___________    ___________
  1        'string1'         NaN             1
I get the same thing with an sqlite database. How can I store and then read out strings and blobs and why isn't the data returned in table format?

MATLAB does not document that the default option for SQLite and MySQL database retrieval is to attempt to return everything as a numeric array. You only need this line:
setdbprefs('DataReturnFormat','cellarray')
or
setdbprefs('DataReturnFormat','table')
in order to get results with differing datatypes. However, now my result is:
data_to_view =
1×4 cell array
{[2]} {'string1'} {11×1 int8} {[1]}
If instead I input:
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',typecast([1,2],'int8'),1})
Then I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {16×1 int8} {[1]}
which I can convert like so:
typecast(data_to_view{3},'double')
ans =
1 2
Unfortunately this does not work for SQLite. I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {' �? #'} {[1]}
and I can't convert the third part correctly:
typecast(unicode2native(data_to_view{1,3}),'double')
ans =
0.0001 2.0000
So I still need to learn how to read an SQLite BLOB in MATLAB, but that is a different question.
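For reference, the same round trip is straightforward in Python's sqlite3, where a BLOB comes back as raw bytes that can be reinterpreted as doubles. This is only a sketch of the byte-level concept (table and column names mirror the question), not the missing MATLAB incantation:

import sqlite3
import struct

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test_table (testID NUMERIC PRIMARY KEY, test_vector BLOB)')

# Store [1.0, 2.0] as 16 raw little-endian bytes (two 8-byte doubles),
# the same layout MATLAB's typecast([1,2],'int8') produces.
blob = struct.pack('<2d', 1.0, 2.0)
conn.execute('INSERT INTO test_table VALUES (?, ?)', (1, blob))

row = conn.execute('SELECT test_vector FROM test_table').fetchone()
print(struct.unpack('<2d', row[0]))  # (1.0, 2.0)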


Extracting a list of dicts from a Pandas column

I have a list of dictionaries within a pandas column to designate landing pages for a particular keyword.
keyword   | 07-31-2019 | landing_pages
cloud api | 50         | [{'url': 'www.example.com', 'date': '07-31-2019'}, {'url' ... ]
database  | 14         | [{'url': 'www.example.com/2', 'date': '08-30-2019'} ... ]
(There are actually many date columns, but I've only shown one as an example.)
My issue is that I already have columns for each date, so I want to extract the landing pages as a list and have that as a new column.
keyword   | 07-31-2019 | landing_pages
cloud api | 50         | www.example.com, www.example.com/other
database  | 14         | www.example.com/2, www.example.com/3
So far, I've tried using json_normalize, which gave me a new table of dates and landing pages. I've tried getting the values with list comprehension, but that gave me the wrong result as well. One way I can think of is to use loops to solve the problem, but I'm concerned that's not efficient. How can I do this efficiently?
Use a generator with join to extract the url values (if the data are dictionaries):
df['landing_pages'] = df['landing_pages'].apply(lambda x: ', '.join(y['url'] for y in x))
print (df)
     keyword  07-31-2019      landing_pages
0  cloud api          50    www.example.com
1   database          14  www.example.com/2
If that does not work because the values are string reprs of dictionaries, parse them first:
import ast
df['landing_pages'] = df['landing_pages'].apply(
    lambda x: ', '.join(y['url'] for y in ast.literal_eval(x)))
EDIT: If you want the url with the most recent date, build a DataFrame that tags each dict with its originating index, convert the date strings to datetimes, use DataFrameGroupBy.idxmax to get the index of the maximum datetime per group, select those rows with DataFrame.loc, and assign the url column back to the original DataFrame:
# tag each dict with the index of the row it came from
L = [dict(x, **{'i': k}) for k, v in df['landing_pages'].items() for x in v]
df1 = pd.DataFrame(L)
# parse the date strings so idxmax compares real datetimes
df1['date'] = pd.to_datetime(df1['date'])
# pick the row with the latest date per original index and align the urls back
df['url by max date'] = df1.loc[df1.groupby('i')['date'].idxmax()].set_index('i')['url']
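A minimal end-to-end check of that flow, using made-up sample data (column names follow the question):

import pandas as pd

df = pd.DataFrame({
    'keyword': ['cloud api', 'database'],
    'landing_pages': [
        [{'url': 'www.example.com', 'date': '07-31-2019'},
         {'url': 'www.example.com/other', 'date': '08-30-2019'}],
        [{'url': 'www.example.com/2', 'date': '08-30-2019'},
         {'url': 'www.example.com/3', 'date': '07-31-2019'}],
    ],
})

L = [dict(x, **{'i': k}) for k, v in df['landing_pages'].items() for x in v]
df1 = pd.DataFrame(L)
df1['date'] = pd.to_datetime(df1['date'])
df['url by max date'] = df1.loc[df1.groupby('i')['date'].idxmax()].set_index('i')['url']

print(df[['keyword', 'url by max date']])
# row 0 gets www.example.com/other (08-30 beats 07-31),
# row 1 gets www.example.com/2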

SQL query to bind multiple elements to a specific one

I am going to convert my current PostgreSQL database into a MongoDB version. For example, I have a table to record tweets, and another table to record the multiple hashtags used by a specific tweet. What I want to do is use SQL to get a table like the one below, and then export it as a .csv file so that I can import it into MongoDB.
Example:
2018-04-02 18:12:32 This plane has no outlet for me to charge my p... [{'tag': 'GucciGarden', 'airline': 'American A...
The problem I met is that I can get a .csv file containing a JSON-like array such as "[{'tag': 'GucciGarden', 'airline': 'American A...", but it is a String type. When I import it into MongoDB, the quotes are kept, which breaks the import.
And here is my SQL code:
SELECT tweets.tweet_id, tweets.text,
       (SELECT array_to_json(array_agg(row_to_json(d)))
        FROM (
            SELECT tags.tag
            FROM tags
            WHERE tags.tweet_id = tweets.tweet_id
        ) d
       ) AS Tags
FROM tweets
Here is the result that I import into MongoDB:
{
"_id" : ObjectId("5ac59c272221ade1185ec241"),
"tweet_id" : 9.80869021435351e+17.0,
"created_at" : "2018-04-02 18:06:13",
"text" : "RT #MiraSorvino: Brad Myles shares #Delta that awareness is working- 9,000 #humantrafficking cases identified by #polarisproject National H��",
"screen_name" : "MMexville",
"favorite_count" : 0.0,
"retweet_count" : 40.0,
"source" : "the public",
"tags" : "[{'tag': 'humantrafficking', 'airline': 'Delta Air Lines'}]"}
This is because [{'tag': is not valid JSON; you should have used double quotes and cast to json, e.g.:
Let's say, something like your sample:
t=# create table c (i int, t text, j text);
CREATE TABLE
t=# insert into c values(1,'text',$$[{'tag': 'GucciGarden'}]$$);
INSERT 0 1
t=# select * from c;
i | t | j
---+------+--------------------------
1 | text | [{'tag': 'GucciGarden'}]
(1 row)
and then something like your query:
t=# select to_json(c) from (select i,t,replace(j,$$'$$,'"')::json j from c) c;
to_json
-------------------------------------------------
{"i":1,"t":"text","j":[{"tag": "GucciGarden"}]}
(1 row)
Of course you will get false-positive replacements of single quotes: e.g. 'tag': 'Gucci's Garden' will break the query logic, so you will have to make a more sophisticated replacement, probably with regular expressions, to be neater.
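If the export can pass through a script, a simpler route (my sketch, not part of the original answer) is to let Python parse the single-quoted reprs and re-serialize them as real JSON; this assumes the CSV cells are genuine Python reprs, as str() on a list of dicts produces:

import ast
import json

# str() of a list of dicts is the kind of value that ends up in the CSV cell;
# note how repr switches to double quotes around the embedded apostrophe.
cell = str([{'tag': "Gucci's Garden", 'airline': 'Delta Air Lines'}])
print(cell)   # [{'tag': "Gucci's Garden", 'airline': 'Delta Air Lines'}]

parsed = ast.literal_eval(cell)  # tolerates the apostrophe that breaks replace()
print(json.dumps(parsed))        # [{"tag": "Gucci's Garden", "airline": "Delta Air Lines"}]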

Dynamic SQL query to populate column with values in other columns

I'm trying to write a SQL query for a data quality report that presents failed values from multiple columns in a single column. Please see the example below.
FACT TABLE
Ac_Nm  INAmt  Ast    Rcs
123    100    5000   NA
456    200    -200   Yes
789    -300   1000   No
DESIRED OUTPUT (POPULATE VAL COLUMN)
Ac_Nm  Is_Clm  Val
123    RCS     NA
456    Ast     -200
789    InAmt   -300
How do I write a SQL query to populate the Val column? I've got the rest of the data quality report query written.
In the above example I have a fact table where data quality issues have been identified in various columns (negative values, 'NA' values where there should be a Yes/No response, etc.). I'd like to know how to write a dynamic SQL query that returns the failed value from the fact table depending on the account number and the column name. In the first row, the desired output lists the account number (123) and the issue column name (RCS), with the Val column holding the value causing the issue (NA). I just need to know how to write a SQL query that populates the Val column from the account number and issue column.
You could do it using CASE expressions, assuming only one column is going to have a "bad" value, as follows:
SELECT Ac_Nm,
       CASE WHEN INAmt < 0 THEN 'INAmt'
            WHEN Ast < 0 THEN 'Ast'
            WHEN Rcs = 'NA' THEN 'RCS'
            ELSE NULL
       END AS Is_Clm,
       CASE WHEN INAmt < 0 THEN CONVERT(INAmt, CHAR)
            WHEN Ast < 0 THEN CONVERT(Ast, CHAR)
            WHEN Rcs = 'NA' THEN Rcs
            ELSE NULL
       END AS Val
FROM fact_table;
Then, to filter out the NULL values, wrap the query in a subquery and select from it, as sketched below. If you need a hand adapting that, give me a shout.
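Here is a minimal runnable sketch of that wrapping, driven from Python's sqlite3 purely for illustration (SQLite spells the cast CAST(... AS TEXT) where MySQL uses CONVERT(..., CHAR); table and column names are from the question):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE fact_table (Ac_Nm INT, INAmt INT, Ast INT, Rcs TEXT);
    INSERT INTO fact_table VALUES (123, 100, 5000, 'NA'),
                                  (456, 200, -200, 'Yes'),
                                  (789, -300, 1000, 'No');
""")

# Wrap the CASE query in a subquery so rows where nothing failed
# (Is_Clm IS NULL) can be filtered out in the outer WHERE clause.
wrapped = """
SELECT Ac_Nm, Is_Clm, Val
FROM (
    SELECT Ac_Nm,
           CASE WHEN INAmt < 0 THEN 'INAmt'
                WHEN Ast < 0 THEN 'Ast'
                WHEN Rcs = 'NA' THEN 'RCS' END AS Is_Clm,
           CASE WHEN INAmt < 0 THEN CAST(INAmt AS TEXT)
                WHEN Ast < 0 THEN CAST(Ast AS TEXT)
                WHEN Rcs = 'NA' THEN Rcs END AS Val
    FROM fact_table
) q
WHERE Is_Clm IS NOT NULL;
"""
for row in conn.execute(wrapped):
    print(row)
# (123, 'RCS', 'NA')
# (456, 'Ast', '-200')
# (789, 'INAmt', '-300')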

MySQL: compare a mixed field containing letters and numbers

I have a field in a MySQL database that contains data like the following:
Q16
Q32
L16
Q4
L32
L64
Q64
Q8
L1
L4
Q1
And so forth. What I'm trying to do is pull out, let's say, all the values that start with Q, which is easy:
field_name LIKE 'Q%'
But then I want to filter, let's say, all the values whose number is higher than 32. I'm supposed to get only 'Q64'; however, I also get Q4, Q8 and so forth, because I'm comparing them as strings, so the characters are compared digit by digit rather than as integers.
As this makes perfect sense, I'm struggling to find a way to perform this operation without pulling all the data out of the database, stripping out the Qs, and parsing everything to integers.
I did play around with the CAST operator; however, it only works if the value is stored as a string AND contains only digits. The parsing fails if there's another character in there.
Extract the number from the string and coerce it to a number with * 1 (or with CAST):
select * from your_table
where substring(field_name, 1, 1) = 'Q'
and substring(field_name, 2) * 1 > 32
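For a quick sanity check of the logic outside the database, the same filter is easy to replicate in Python (sample values are from the question):

values = ['Q16', 'Q32', 'L16', 'Q4', 'L32', 'L64', 'Q64', 'Q8', 'L1', 'L4', 'Q1']

# Same idea as the query: the first character must be 'Q',
# and the numeric remainder must compare as an integer, not a string.
matches = [v for v in values if v.startswith('Q') and int(v[1:]) > 32]
print(matches)  # ['Q64']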

Perform MySQL select on unsorted digits

I am working on an application that requires me to validate whether 3 randomly generated digits match a 3-digit string that has been entered into a database from user input. I also need to preserve the exact order in which the user enters the string, so sorting on input is not an option.
For example, the randomly generated digits may be 6 4 0, and in the database a string may show as '406'.
Is there an easy way this can be accomplished in a single query without enumerating the options or adding an extra column/view?
Maybe you could try:
create table y (z varchar(10));
insert into y values ('406');
insert into y values ('604');
insert into y values ('446');
insert into y values ('106');
insert into y values ('123');
and then
SELECT * FROM y
WHERE FIND_IN_SET(Substring('640',1,1), MAKE_SET(7, Substring(z,1,1), Substring(z,2,1), Substring(z,3,1)))
  AND FIND_IN_SET(Substring('640',2,1), MAKE_SET(7, Substring(z,1,1), Substring(z,2,1), Substring(z,3,1)))
  AND FIND_IN_SET(Substring('640',3,1), MAKE_SET(7, Substring(z,1,1), Substring(z,2,1), Substring(z,3,1)));
returns
406
604
Alternatively, sum the three random digits and compare against the same sum computed over the stored string. Something like:
SELECT * FROM Triplets
WHERE (Ascii(Substring(Number, 1, 1)) - 48)
    + (Ascii(Substring(Number, 2, 1)) - 48)
    + (Ascii(Substring(Number, 3, 1)) - 48) = MySumOfNumber
Easy is a state of mind, isn't it: the storage requirement of an extra "CheckSum" int versus the high cost of a query like this. Bear in mind that equal sums do not guarantee the same digits (6+4+0 equals 5+5+0), so the checksum can only narrow candidates, not confirm a match.
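If the check can run in application code instead of SQL, a simpler alternative (a sketch of my own, not from the answers above) is to compare sorted copies of the two digit strings; the stored value keeps its original order in the database:

def digits_match(random_digits: str, stored: str) -> bool:
    # Sorting both copies compares the multiset of digits while
    # leaving the stored value untouched in the database.
    return sorted(random_digits) == sorted(stored)

print(digits_match('640', '406'))  # True
print(digits_match('640', '446'))  # False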