How can I get missing values recorded as NULL when importing from csv

I have multiple large CSV files, each of which has missing values in many places. When I import a CSV file into SQLite, I would like the missing values recorded as NULL, because another application expects missing data to be indicated by NULL. My current method does not produce the desired result.
An example CSV file (test.csv) is:
12|gamma|17|delta
67||19|zeta
96|eta||theta
98|iota|29|
The first line is complete; each of the other lines has (or is meant to show!) a single missing item. When I import using:
.headers on
.mode column
.nullvalue NULL
CREATE TABLE t (
id1 INTEGER PRIMARY KEY,
a1 TEXT,
n1 INTEGER,
a2 TEXT
);
.import test.csv t
SELECT
id1, typeof(id1),
a1, typeof(a1),
n1, typeof(n1),
a2, typeof(a2)
FROM t;
the result is
id1   typeof(id1)  a1      typeof(a1)  n1  typeof(n1)  a2      typeof(a2)
----  -----------  ------  ----------  --  ----------  ------  ----------
12    integer      gamma   text        17  integer     delta   text
67    integer              text        19  integer     zeta    text
96    integer      eta     text            text        theta   text
98    integer      iota    text        29  integer             text
So the missing values have been stored as empty strings (type text) rather than NULL. I would appreciate some guidance on how to ensure that all missing values become NULL.

The sqlite3 shell imports values as text, and there does not seem to be a way to make it treat empty values as NULLs.
However, you can update the table yourself after the import, setting empty strings to NULL, like this:
UPDATE t SET a1=NULL WHERE a1='';
Repeat for each column.
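Rather than typing one UPDATE per column, the statements can be generated from the table definition. A minimal sketch using Python's standard-library sqlite3 module, assuming the rows are already in the table with empty fields stored as '' (the helper name blank_to_null and the sample rows copied from test.csv are illustrative). It reads column names from PRAGMA table_info and skips the primary-key column, since an INTEGER PRIMARY KEY can never hold ''.

```python
import sqlite3

# Sample rows as .import would store them: empty fields arrive as ''
ROWS = [("12", "gamma", "17", "delta"), ("67", "", "19", "zeta"),
        ("96", "eta", "", "theta"), ("98", "iota", "29", "")]


def blank_to_null(conn: sqlite3.Connection, table: str) -> None:
    """Run UPDATE ... SET col = NULL WHERE col = '' for every non-key column."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')
               if row[5] == 0]
    for col in columns:
        conn.execute(f"""UPDATE "{table}" SET "{col}" = NULL WHERE "{col}" = ''""")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id1 INTEGER PRIMARY KEY, a1 TEXT, n1 INTEGER, a2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    blank_to_null(conn, "t")
    for row in conn.execute(
            "SELECT id1, typeof(a1), typeof(n1), typeof(a2) FROM t ORDER BY id1"):
        print(row)
```

Running it prints one row per id, with typeof() reporting null exactly where the CSV field was empty.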
You can also create a trigger for such updates:
CREATE TRIGGER trig_a1 AFTER INSERT ON t WHEN new.a1='' BEGIN
UPDATE t SET a1=NULL WHERE rowid=new.rowid;
END;
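Instead of one trigger per column, a single AFTER INSERT trigger can clean every column at once with SQLite's built-in NULLIF(x, y), which returns NULL when x equals y and x otherwise. A minimal sketch, run through Python's sqlite3 module so it can be checked; the trigger name t_blank_to_null is hypothetical, and the rows are added with ordinary INSERT statements as a stand-in for .import.

```python
import sqlite3

# Same rows as test.csv; empty fields arrive as '' just as they do with .import
ROWS = [("12", "gamma", "17", "delta"), ("67", "", "19", "zeta"),
        ("96", "eta", "", "theta"), ("98", "iota", "29", "")]

SCHEMA = """
CREATE TABLE t (id1 INTEGER PRIMARY KEY, a1 TEXT, n1 INTEGER, a2 TEXT);
CREATE TRIGGER t_blank_to_null AFTER INSERT ON t BEGIN
  UPDATE t
     SET a1 = NULLIF(a1, ''), n1 = NULLIF(n1, ''), a2 = NULLIF(a2, '')
   WHERE rowid = new.rowid;
END;
"""


def make_db() -> sqlite3.Connection:
    """In-memory database with table t and the single clean-up trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    for row in conn.execute(
            "SELECT id1, typeof(a1), typeof(n1), typeof(a2) FROM t ORDER BY id1"):
        print(row)
```

Non-empty values pass through NULLIF unchanged, so 17 in n1 stays an integer.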

For cases where you cannot update after the import, because the import itself fails when an empty string (text columns) or 0 (integer columns) is inserted instead of NULL, see my answer to another Stack Overflow question.

Related

insert and fetch strings and matrices to/from MySQL with Matlab

I need to store data in a database. I have installed and configured a MySQL database (and an SQLite database) in Matlab. However I cannot store and retrieve anything other than scalar numeric values.
% create an empty database called test_database with MySQL Workbench.
% connect to it in Matlab
conn=database('test_database','root','XXXXXX','Vendor','MySQL');
% create a table to store values
create_test_table=['CREATE TABLE test_table (testID NUMERIC PRIMARY KEY, test_string VARCHAR(255), test_vector BLOB, test_scalar NUMERIC)'];
curs=exec(conn,create_test_table)
Result is good so far (curs.Message is an empty string)
% create a new record
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',[1,2],1})
% try to read out the new record
sqlquery='SELECT * FROM test_table';
data_to_view=fetch(conn,sqlquery)
Result is bad:
data_to_view =
1 NaN NaN 1
From the documentation for "fetch" I would expect:
data_to_view =
1×4 table
testID test_string test_vector test_scalar
_____________ ___________ ______________ ________
1 'string1' 1x2 double 1
Until I learn how to read blobs I'd even be willing to accept:
data_to_view =
1×4 table
testID test_string test_vector test_scalar
_____________ ___________ ______________ ________
1 'string1' NaN 1
I get the same thing with an sqlite database. How can I store and then read out strings and blobs and why isn't the data returned in table format?
Matlab does not document that the default options for SQLite and MySQL database retrieval are to attempt to return everything as a numeric array. One only needs this line:
setdbprefs('DataReturnFormat','cellarray')
or
setdbprefs('DataReturnFormat','table')
in order to get results with differing datatypes. However, now my result is:
data_to_view =
1×4 cell array
{[2]} {'string1'} {11×1 int8} {[1]}
If instead I input:
datainsert(conn,'test_table',{'testID','test_string','test_vector','test_scalar'},{1,'string1',typecast([1,2],'int8'),1})
Then I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {16×1 int8} {[1]}
which I can convert like so:
typecast(data_to_view{3},'double')
ans =
1 2
Unfortunately this does not work for SQLite. I get:
data_to_view =
1×4 cell array
{[2]} {'string1'} {' �? #'} {[1]}
and I can't convert the third part correctly:
typecast(unicode2native(data_to_view{1,3}),'double')
ans =
0.0001 2.0000
So I still need to learn how to read an SQLite blob in Matlab but that is a different question.
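The byte-level idea behind typecast, and one plausible way it breaks, can be sketched in Python with struct and the standard-library sqlite3 module. This is only an analogy, not MATLAB's behavior: it assumes little-endian doubles (what typecast produces on x86 machines), the helper names doubles_to_blob and blob_to_doubles are hypothetical, and the last step assumes, without evidence from the question, that the driver hands the BLOB back decoded as text. In that case re-encoding the text (the role unicode2native plays) cannot restore the original 16 bytes.

```python
import sqlite3
import struct


def doubles_to_blob(values):
    """Like typecast(values, 'int8'): the raw little-endian bytes of each double."""
    return struct.pack(f"<{len(values)}d", *values)


def blob_to_doubles(raw):
    """Like typecast(bytes, 'double'): reinterpret every 8 bytes as one double."""
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (testID NUMERIC PRIMARY KEY, "
                 "test_string VARCHAR(255), test_vector BLOB, test_scalar NUMERIC)")
    conn.execute("INSERT INTO test_table VALUES (?, ?, ?, ?)",
                 (1, "string1", doubles_to_blob([1.0, 2.0]), 1))
    blob = conn.execute("SELECT test_vector FROM test_table").fetchone()[0]
    print(type(blob).__name__, blob_to_doubles(blob))   # bytes [1.0, 2.0]

    # Assumed failure mode: the BLOB is decoded as UTF-8 text on the way out.
    # The byte 0xF0 (from 1.0) starts an incomplete sequence and becomes U+FFFD,
    # so converting the text back to bytes yields 18 bytes instead of 16.
    mangled = blob.decode("utf-8", errors="replace").encode("utf-8")
    print(len(blob), len(mangled))                      # 16 18
```

Read back as raw bytes, the BLOB round-trips exactly; once it has passed through a text decoding, the doubles can no longer be recovered.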

mySQL column without a one-size-fits-all precision for DECIMAL

When I define a table to store decimal values I use a statement like this:
CREATE TABLE myTable (
myKey INT NOT NULL,
myValue DECIMAL(10,2) NOT NULL,
PRIMARY KEY (myKey)
);
However, this results in every myValue being stored with a one-size-fits-all precision of (10,2). For instance
45.6 becomes 45.60
21 becomes 21.00
17.008 becomes 17.01
But what if each record has a myValue of different precision? I need 45.6 to remain 45.6, 21 to remain 21, and 17.008 to remain 17.008. Otherwise the precision of measurement is being lost. There's a big difference between 21 and 21.00.
If you don't need to do greater-than/less-than comparisons, store the value as a VARCHAR(..).
The strings '21' and '21.00' would have identical values, but present different "precision".
When needing the numeric value, add zero (col + 0).
This does not allow for "negative precision", such as "1.2M" being represented as 1200000. If you need that, then Norbert's approach is probably better.
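Python's decimal.Decimal draws the same line between a value and its written precision, which makes the idea easy to check outside MySQL. A minimal sketch, where Decimal(stored) stands in for the numeric conversion (col + 0 in the answer) and the helper digits_after_point is hypothetical:

```python
from decimal import Decimal


def digits_after_point(text: str) -> int:
    """How many decimal places were written in a stored string such as '21.00'."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


if __name__ == "__main__":
    for stored in ["45.6", "21", "21.00", "17.008"]:
        # The string keeps the precision; Decimal(stored) gives the numeric value
        print(stored, Decimal(stored), digits_after_point(stored))
    print(Decimal("21") == Decimal("21.00"))   # True: same value, different precision
```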
You can store with high precision and exact recall by following a different way of storing the data:
Create a table with two columns:
CREATE TABLE precise (value BIGINT, decimaldot INT);
Use code to determine where the dot goes: decimaldot is the number of digits before the decimal point, so for your 21 value it is 2 (counting from 1). The stored value would be:
INSERT INTO precise values (21,2);
When retrieved, this gives exactly 21 (putting the dot back after the second digit of 21 leaves 21, since no digits follow).
Value 17.008 would also have decimaldot at 2:
INSERT INTO precise values (17008,2);
Etc.
Larger values can be stored by using a VARCHAR(4000) instead of a biginteger, or by using blob fields.
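The encoding and decoding steps of this two-column scheme can be sketched in Python, reading decimaldot as the number of digits before the decimal point (which is what both examples use). The sketch handles only non-negative values, and encode/decode are hypothetical names; the resulting pairs are what would go into INSERT INTO precise values (...).

```python
from typing import Tuple


def encode(text: str) -> Tuple[int, int]:
    """'17.008' -> (17008, 2): all digits as one integer, plus digits before the dot."""
    whole, _, frac = text.partition(".")
    digits = whole + frac
    # A BIGINT cannot keep leading zeros, so e.g. '0.05' would not round-trip
    if len(digits) > 1 and digits.startswith("0"):
        raise ValueError(f"leading zero would be lost: {text!r}")
    return int(digits), len(whole)


def decode(value: int, decimaldot: int) -> str:
    """(17008, 2) -> '17.008': put the dot back after `decimaldot` digits."""
    digits = str(value)
    whole, frac = digits[:decimaldot], digits[decimaldot:]
    return f"{whole}.{frac}" if frac else whole


if __name__ == "__main__":
    for text in ["45.6", "21", "21.00", "17.008"]:
        pair = encode(text)
        print(text, pair, decode(*pair))
```

Note that 21 and 21.00 become different pairs, (21, 2) and (2100, 2), which is exactly the precision the question wants to keep.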

select int column and compare it with Json array column

This is the row in the option column of table oc_cart:
20,228,27,229
Why is no result found when the value is 228, but a result is found when the value is 20? With 228 there is no result:
select 1 from dual
where 228 in (select option as option from oc_cart)
and a result is found when I change the value to 20:
select 1 from dual
where 20 in (select option as option from oc_cart)
The option column data type is TEXT
In SQL, these two expressions are different:
WHERE 228 in ('20,228,27,229')
WHERE 228 in ('20','228','27','229')
The first example compares the integer 228 to a single string value, whose leading numeric characters can be converted to the integer 20. That's what happens: 228 is compared to 20, and the comparison fails.
The second example compares the integer 228 to a list of four values, each can be converted to different integers, and 228 matches the second integer 228.
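The difference between the two forms can be mimicked in a few lines of Python. This is a simplified sketch of the implicit string-to-number conversion (it only looks at a leading integer, whereas the real conversion also handles decimals and exponents); the function name leading_number is hypothetical.

```python
import re


def leading_number(s: str) -> int:
    """Use only the leading integer of a string, 0 if there is none (simplified)."""
    m = re.match(r"\s*([+-]?\d+)", s)
    return int(m.group(1)) if m else 0


option = "20,228,27,229"

# WHERE 228 IN ('20,228,27,229'): one string, so only its leading 20 counts
print(228 == leading_number(option))   # False
print(20 == leading_number(option))    # True

# WHERE 228 IN ('20','228','27','229'): four strings, each converted separately
print(228 in [leading_number(v) for v in option.split(",")])   # True
```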
Your subquery is returning a single string, not a list of values. If your oc_cart.option holds a single string, you can't use the IN( ) predicate in the way you're doing.
A workaround is this:
WHERE FIND_IN_SET(228, (SELECT option FROM oc_cart WHERE...))
But this is awkward. You really should not be storing strings of comma-separated numbers if you want to search for an individual number in the string. See my answer to Is storing a delimited list in a database column really that bad?
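For reference, the lookup FIND_IN_SET performs can be sketched in Python: split the list on commas and return the 1-based position of an exact match, 0 when there is none or the list is empty, and NULL (here None) when either argument is NULL. In the query above the number 228 is first converted to the string '228'; this sketch takes strings directly and is not MySQL's implementation.

```python
from typing import Optional


def find_in_set(needle: Optional[str], strlist: Optional[str]) -> Optional[int]:
    """Sketch of MySQL FIND_IN_SET(str, strlist)."""
    if needle is None or strlist is None:
        return None          # NULL in, NULL out
    if strlist == "":
        return 0             # empty list: not found
    items = strlist.split(",")
    # 1-based position of the first exact match, 0 when absent
    return items.index(needle) + 1 if needle in items else 0


print(find_in_set("228", "20,228,27,229"))   # 2
print(find_in_set("22", "20,228,27,229"))    # 0: no partial matches
```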

I need a trigger to create id's in my sql database with a string and some zeros

I'm currently using this trigger which adds id's with 3 zeros and two zeros and then the id from the sequences table.
BEGIN
INSERT INTO sequences VALUES (NULL);
SET NEW.deelnemernr = CONCAT('ztmr16', LPAD(LAST_INSERT_ID(), 3, '0'));
END
I changed the 3 to 4, but then it didn't increment the id anymore, resulting in a duplicate id error. It stayed at ztmr16000. So what can I do to add more zeros and still get the id from the sequences table?
The MySQL LPAD function limits the number of characters returned to the specified length.
The specification of what you are trying to achieve is a bit unclear.
If I need a fixed length string with leading zeros, my approach would be to prepend a boatload of zeros to my value, and then take the rightmost string, effectively lopping off extra zeros from the front.
To format a non-negative integer value val into a string that is ten characters in length, with the leading characters as zeros, I'd do something like this:
RIGHT(CONCAT('000000000',val),10)
As a demonstration:
SELECT RIGHT(CONCAT('000000000','123456789'),10) --> 0123456789
SELECT RIGHT(CONCAT('000000000','12345'),10) --> 0000012345
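The same prepend-then-take-the-right trick, as a minimal Python sketch (pad_left_zeros is a hypothetical name). For values of up to ten digits it agrees with Python's own str.zfill(10); longer values silently lose their leading digits.

```python
def pad_left_zeros(val: int, width: int = 10) -> str:
    """Mimic RIGHT(CONCAT('000000000', val), 10) for a non-negative integer val."""
    s = "0" * (width - 1) + str(val)   # prepend width-1 zeros
    return s[-width:]                  # keep only the rightmost `width` characters


print(pad_left_zeros(123456789))     # 0123456789
print(pad_left_zeros(12345))         # 0000012345
print(pad_left_zeros(12345678901))   # 2345678901 (an 11-digit value loses its first digit)
```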
Also, I'd be cognizant of the maximum length allowed in the column I was populating, and be sure that the length of the value I was generating didn't exceed that, to avoid data truncation.
If the value isn't being truncated when it's inserted into the column, then I think the behavior you observe is due to the value returned from LAST_INSERT_ID() reaching 1000 or more.
Note that for a non-negative integer value val, the expression
LPAD(val,3,'0')
will allow at most 1000 distinct values. LPAD (as I noted earlier) restricts the length of the returned string, in this example to three characters. As a demonstration of the behavior:
SELECT LPAD( 21,3,'0') --> 021
SELECT LPAD( 321,3,'0') --> 321
SELECT LPAD( 54321,3,'0') --> 543
SELECT LPAD( 54387,3,'0') --> 543
There's nothing illegal about doing that. But you're going to be in trouble if you depend on it to generate "unique" values.
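LPAD's cut-off can be reproduced in Python to show where the duplicates come from. A minimal sketch (mysql_lpad is a hypothetical helper and assumes a non-empty pad string): once the id reaches 1000, LPAD(1000, 3, '0') returns '100', the same string id 100 already produced.

```python
def mysql_lpad(value, length: int, padstr: str) -> str:
    """Sketch of MySQL LPAD(str, len, padstr), assuming a non-empty padstr."""
    s = str(value)
    if len(s) >= length:
        return s[:length]                      # too long: cut to `length` characters
    missing = length - len(s)
    return (padstr * missing)[:missing] + s    # too short: pad on the left


print(mysql_lpad(21, 3, "0"), mysql_lpad(54321, 3, "0"))   # 021 543

# LPAD(LAST_INSERT_ID(), 3, '0') for ids 1..1000: id 1000 repeats id 100's value
padded = [mysql_lpad(i, 3, "0") for i in range(1, 1001)]
print(len(padded), len(set(padded)))                      # 1000 999
```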
FOLLOWUP
As stated, the specification ...
"adds id's with 3 zeros and two zeros and then the id from the sequences table."
is very unclear. What exactly do you want to achieve? Consider providing some examples. There doesn't seem to be an issue with concatenating something to those first six fixed characters ('ztmr16'). The issue seems to be with getting the id value "formatted" to your specification.
This is just a guess of what you are trying to achieve:
id value  formatted return
--------  ----------------
1         0001
9         0009
22        0022
99        0099
333       0333
4444      4444
55555     55555
666666    666666
You could achieve that with something like this:
BEGIN
DECLARE v_id BIGINT;
INSERT INTO sequences VALUES (NULL);
SELECT LAST_INSERT_ID() INTO v_id;
IF ( v_id <= 9999 ) THEN
SET NEW.deelnemernr = CONCAT('ztmr16',LPAD(v_id,4,'0'));
ELSE
SET NEW.deelnemernr = CONCAT('ztmr16',v_id);
END IF;
END
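The IF/ELSE is there because LPAD truncates. A Python sketch of the same formatting rule (deelnemernr is just a function named after the column) shows that a formatter which pads but never truncates produces identical strings without the branch; it is a way to check the rule, not a replacement for the trigger.

```python
def deelnemernr(v_id: int) -> str:
    """Same rule as the trigger's IF/ELSE: four-digit padding up to 9999, raw id above."""
    if v_id <= 9999:
        return "ztmr16" + str(v_id).rjust(4, "0")
    return "ztmr16" + str(v_id)


# A pad-but-never-truncate format ({:04d}) gives the same strings without a branch
assert all(deelnemernr(i) == f"ztmr16{i:04d}" for i in range(0, 200_000, 7))

print(deelnemernr(22), deelnemernr(55555))   # ztmr160022 ztmr1655555
```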

How to pick up a date from a long string Name column in Oracle

I have a table with columns 'ID' and 'File_Name':
Table
ID File_Name
123 ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip
456 ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip
All I need is to pick up the first date given in the file name.
Required:
ID Date
123 03012014
456 08012014
Here's one method, assuming the date is always the 8 characters after the second _.
It finds the position of the first _, then searches for the second _ starting one position after the first, and finally takes the 8 characters after the second _:
SELECT Id
, substr(File_name, instr(File_name,'_',instr(File_name,'_')+1)+1,8) as "Date"
FROM Table
or
A more elegant way is to use the REGEXP_INSTR function, which eliminates the need for nested INSTR calls:
SELECT Id, substr(File_name, REGEXP_INSTR(File_name,'_',1,2)+1, 8) as "Date"
FROM Table;
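Both queries locate the second underscore and take the next 8 characters. The same steps in Python, as a minimal sketch (nth_index and first_date are hypothetical helpers). Here str.find returning -1 plays the role of REGEXP_INSTR returning 0, and in that case both simply fall back to the first 8 characters of the name.

```python
def nth_index(s: str, ch: str, n: int) -> int:
    """0-based index of the n-th occurrence of ch in s, or -1 if there is none."""
    pos = -1
    for _ in range(n):
        pos = s.find(ch, pos + 1)
        if pos == -1:
            break
    return pos


def first_date(file_name: str) -> str:
    # substr(File_name, REGEXP_INSTR(File_name, '_', 1, 2) + 1, 8)
    second = nth_index(file_name, "_", 2)
    return file_name[second + 1:second + 9]


print(first_date("ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip"))   # 03012014
```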
Why don't you simply put the date in a separate column? Then you can query the (indexed) date. In theory, the date is a property of the file, and storing it separately helps with avoiding errors, maintainability and so on. What's in the zip files? Excel sheets, I suppose :-)
Use a much simplified call to REGEXP_SUBSTR( ):
with tbl(ID, File_name) as (
  select 123, 'ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip' from dual
  union
  select 456, 'ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip' from dual
)
select ID,
       REGEXP_SUBSTR(File_name, '_(\d{8})_', 1, 1, NULL, 1) "Date"
from tbl;
        ID Date
---------- ----------------------------------------------------
       123 03012014
       456 08012014
For 11g, see the Oracle documentation for the parameters to REGEXP_SUBSTR( ).
EDIT: Making this a virtual column would be another way to handle it. Thanks to Epicurist's post for the idea. The virtual column will contain a date value holding the filename date once the ID and filename are committed. Add it like this:
alter table X_TEST add (filedate date generated always as (TO_DATE(REGEXP_SUBSTR(Filename, '_(\d{8})_', 1, 1, NULL, 1), 'MMDDYYYY')) virtual);
So now just insert the ID and Filename, commit, and there's your filedate. Note that it's read-only.
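The regular expression and the virtual column's date conversion can be mirrored in Python to check the pattern against sample names. A sketch, assuming the dates are MMDDYYYY as the virtual column's format mask says; the function names are hypothetical, m.group(1) corresponds to REGEXP_SUBSTR's final subexpression argument, and strptime's "%m%d%Y" to the 'MMDDYYYY' mask.

```python
import re
from datetime import date, datetime
from typing import Optional

DATE_IN_NAME = re.compile(r"_(\d{8})_")   # same pattern as the REGEXP_SUBSTR call


def file_date_text(file_name: str) -> Optional[str]:
    """First 8-digit run between underscores, like REGEXP_SUBSTR(..., 1, 1, NULL, 1)."""
    m = DATE_IN_NAME.search(file_name)
    return m.group(1) if m else None   # None where Oracle returns NULL


def file_date(file_name: str) -> Optional[date]:
    """Like TO_DATE(..., 'MMDDYYYY') in the virtual column."""
    text = file_date_text(file_name)
    return datetime.strptime(text, "%m%d%Y").date() if text else None


for name in ["ROSE1234_LLDAtIInstance_03012014_04292014_190038.zip",
             "ROSE1234_LLDAtIInstance_08012014_04292014_190038.zip"]:
    print(file_date_text(name), file_date(name))
```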