MySQL import of non-CSV data with field identifiers

I'm trying to import a non-csv data file into MySQL.
1) The data fields are newline delimited, and the field identifier is at the start of each line.
2) Some fields have multiple entries
3) Not every record has every field populated
4) Some blank lines exist inside fields and need to be filtered out
5) Records are generally delimited by blank lines, but also by "Number X"
Here's an example of the file showing three records as they appear:
Number 1
ARTIST BOOM JEFF=SINGER
BACKING MUSICIANS=BAND
COMP BOOM JEFF
DATE 1980
TIME 3.23
FIELD3 FRONT ROW
NOTE LIVE RECORDING
Number 2
ARTIST JOHN LEE=VOCAL
COMP JOHN LEE
TIME 4.20
ID 000000230682
PUBLISHER BLAHBLAH
FIELD3 DAY I RODE THE TRAIN
Number 3
ARTIST BURT DAN=NARRATOR
JOHNS RY=DRUMS
STUDIO BAND=ORCHESTRA
FREE DAN=DIRECTOR
COMP JOHNS RY
DATE 1934
DUR 2.32
ID 000055332
PUBLISHER WEEWAH
SHELF 86000002
FIELD3 EVE OF THE WAR
NOTE FROM HE NARRATION "NO MORE THAT IN
THE FIRST YEARS OF THE SEVENTEENTH CENTURY .."
What's the best way to import this data into MySQL?
Can LOAD DATA INFILE be used to read it in? Or should I write a script to strip the field identifiers and convert it to csv format which can then be read in using LOAD DATA INFILE?

I would rather use sed to convert those to INSERT .. SET ... statements like:
INSERT INTO RECORDS SET
ARTIST="BOOM JEFF=SINGER~BACKING MUSICIANS=BAND" ,
COMP="BOOM JEFF" ,
DATE="1980" ,
TIME="3.23" ,
FIELD3="FRONT ROW" ,
NOTE="LIVE RECORDING"
replacing in-record newlines with ~ (for example), and then analysing the data with SQL.
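Because fields span multiple lines, a short script is easier than a line-oriented sed command here. The sketch below (Python rather than sed; the table name RECORDS and column names follow the INSERT example above, and the FIELDS set is an assumption you would adjust to the real file) builds such INSERT ... SET statements, joining continuation lines and repeated fields with ~:

```python
import re

# Known field identifiers; any other first word is treated as a
# continuation of the previous field. Adjust to match the real file.
FIELDS = {"ARTIST", "COMP", "DATE", "TIME", "DUR", "ID",
          "PUBLISHER", "SHELF", "FIELD3", "NOTE"}

def records(lines):
    """Yield one dict per record; records start at 'Number N' lines."""
    record = {}
    last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines inside fields are filtered out
        m = re.match(r"Number (\d+)", line)
        if m:
            if record:
                yield record
            record = {"NUMBER": m.group(1)}
            last = None
            continue
        name, _, value = line.partition(" ")
        if name in FIELDS:
            # a repeated field gets appended with ~ instead of overwritten
            record[name] = record[name] + "~" + value if name in record else value
            last = name
        elif last:
            # continuation of the previous field (e.g. a wrapped NOTE)
            record[last] += "~" + line
    if record:
        yield record

def to_insert(record, table="RECORDS"):
    """Render one record as an INSERT ... SET statement."""
    cols = ", ".join('%s="%s"' % (k, v.replace('"', '\\"'))
                     for k, v in record.items())
    return "INSERT INTO %s SET %s;" % (table, cols)
```

Feeding the three sample records through this produces one INSERT per record, with multi-entry fields like the third record's ARTIST collapsed into a single ~-joined value.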

From what I see, your best bet is a script that parses the data line by line, similar to the following (using PHP):
$lines = explode("\n", file_get_contents('file.name'));
$record = null;
//go through all the lines
foreach ($lines as $line) {
    //if the line is not empty, add the field to the record
    if (trim($line)) {
        //I am only processing the field name - you'll have to do the same for equal signs
        $pos = strpos($line, ' ');
        $fieldName = substr($line, 0, $pos);
        $fieldValue = substr($line, $pos + 1);
        //a "Number X" line also starts a new record, even without a preceding blank line
        if ($fieldName == 'Number' && $record) {
            insertRecord($record);
            $record = null;
        }
        $record[$fieldName] = $fieldValue;
    }
    //if it is a blank line and we have a record, save it
    else if ($record) {
        //insert the record into the database
        insertRecord($record);
        //empty the record as the next line is a new record
        $record = null;
    }
}
//do not forget the last record if the file does not end with a blank line
if ($record) {
    insertRecord($record);
}

function insertRecord($record) {
    //TODO: implement an SQL INSERT, e.g. via a prepared PDO statement
}

Related

How to convert csv to Json using expression transformation in informatica?

I have a csv file that I am converting to JSON array format. Below are the row-wise operations in the expression transformation for the two fields.
region
country
Json(output port): '{'||'"region": '||'"'||Region||'",'||'"Country": '||'"'||Country||'"'||'},'
output:
{"region": "Australia and Oceania", "Country": "Tuvalu"},
This output is saved in a text file with session file properties as fixed width.
second mapping expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,1)
V_JSON_end(variable port):instr(JSON,'}',1,10)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
output:
[{"region": "Australia and Oceania","Country": "Tuvalu"},
{"region": "Central America and the Caribbean","Country": "Grenada"}]
When I try to fetch the next 10 records as JSON, it pulls 20 records instead of ten.
below is the expression:
JSON(input)
V_JSON_start(variable port):INSTR(JSON,'{',1,11)
V_JSON_end(variable port):instr(JSON,'}',1,20)
O_Json(output port):'['||substr(JSON,V_JSON_start,V_JSON_end)||']'
Kindly look into this and point out where I am going wrong.
input: flat file (csv with two fields, region and country)
expected output: (5 sessions, each session 10 records in JSON format)
eg., [{"region":"value","country":"value"},
{"region":"value","country":"value"}]
session1 (csv to json) --> session2, session3, session4, session5, session6 (all parallel sessions using the file of the 1st session, 5 records in JSON format)
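One thing worth checking: Informatica's SUBSTR takes a length as its third argument, not an end position, which would explain pulling extra records when the end offset grows. A rough Python sketch of the same extraction, with plain string methods standing in for INSTR/SUBSTR:

```python
def nth_index(s, sub, n):
    """0-based position of the n-th occurrence of sub in s (like INSTR)."""
    pos = -1
    for _ in range(n):
        pos = s.find(sub, pos + 1)
        if pos == -1:
            return -1
    return pos

def chunk(json_concat, first, last):
    """Records first..last (1-based) from a '{...},{...},' string, wrapped in [ ]."""
    start = nth_index(json_concat, '{', first)   # like INSTR(JSON, '{', 1, first)
    end = nth_index(json_concat, '}', last)      # like INSTR(JSON, '}', 1, last)
    # SUBSTR's third argument is a length, not an end position, so the
    # equivalent SUBSTR call needs (end - start + 1) rather than end itself
    return '[' + json_concat[start:end + 1] + ']'
```

With records 11-20, passing the raw end position as the length reads roughly twice as far as intended, matching the 20-records symptom described above.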

Loading CSV Neo4j "Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1"

I am using grades.csv data from the link below,
https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html
I noticed that all the strings in the csv file were wrapped in "", which caused this
error message:
Neo.ClientError.Statement.SemanticError: Cannot merge node using null property value for Test1
so I removed the "" from the headers.
the code I was trying to run:
LOAD CSV WITH HEADERS FROM 'file:///grades.csv' AS row
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
error message:
Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Any, Map, Node, Relationship, Point, Duration, Date, Time, LocalTime, LocalDateTime or DateTime but was List<String> (line 2, column 24 (offset: 65))
"MERGE (t:Test1 {Test1: row.Test1})
Basically, you cannot merge a node using a null property value. In your case, Test1 must be null for one or more lines in your file. If you don't see blank values for Test1, please check whether there is a blank line at the end of the file.
You can also handle the null check before the MERGE (note that in Cypher a WHERE filter must follow a WITH or MATCH clause, so pass the row through WITH first), like
LOAD CSV ...
WITH row
WHERE row.Test1 IS NOT NULL
MERGE (t:Test1 {Test1: row.Test1})
RETURN count(t);
The issues are:
The file is missing a comma after the Test1 value in the row for "Airpump".
The file has white spaces between the values in each row. (Search for the regexp ", +" and replace with ",".)
Your query should work after fixing the above issues.
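The whitespace and blank-line cleanup described above can be scripted; a small Python sketch, assuming the file is small enough to read and rewrite whole (the missing comma after "Airpump" still has to be fixed by hand):

```python
import re

def clean_csv(text):
    """Tidy a CSV string for LOAD CSV:
    - collapse the space padding after commas (the ", +" regexp from above)
    - drop blank lines, which make row properties null and break MERGE
    """
    text = re.sub(r", +", ",", text)
    lines = [l for l in text.splitlines() if l.strip()]
    return "\n".join(lines) + "\n"
```

Run the file through this once before placing it in the Neo4j import directory.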

Max date row using Talend Data Integration

Using Talend Data Integration, the following data is input from a CSV file:
Time A B C
18:22.0 -0.30107087 3.79899955 6.52000999
18:23.5 -9.648633 0.84515321 1.50116444
18:25.0 -6.01184034 7.66982508 4.42568207
18:25.9 -9.22426033 3.12263751 5.10443783
18:26.7 -9.00698662 4.03901815 0.01316811
18:27.4 -4.31255579 6.25724602 5.02961922
18:28.2 -2.67013335 7.5932107 5.41628265
18:28.8 -1.76213241 6.26981592 7.44536877
18:29.5 -2.18590617 5.58567238 6.23928976
18:30.3 0.42078096 3.1429882 8.46290493
18:30.9 0.36391866 3.02926373 8.86752415
18:31.6 0.35673606 3.07176089 8.93396378
18:32.4 0.35374331 3.05081153 8.93994904
18:33.0 0.38187516 3.05799413 8.89745235
18:33.7 0.32920274 3.03644633 8.9315691
18:34.4 0.37529111 3.07475352 8.93575954
18:35.0 0.40342298 3.07654929 8.86393356
18:35.7 0.35254622 3.05260706 8.9267807
How do I extract only the max date row (18:35.7 0.35254622 3.05260706 8.9267807) and load it into a JSON?
You can achieve this by sorting your file on the Time column (a reverse alphanumeric sort puts the most recent Time on top), then taking the first row and writing it to a JSON file. Like so:
tSampleRow_1 config :
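Outside Talend, the same sort-then-take-first logic is a short script; a rough Python sketch, assuming a comma-separated file with the headers Time, A, B, C from the sample above:

```python
import csv
import json

def max_time_row(path):
    """Return the row with the greatest Time value as a dict.
    Comparing the times as strings works here because the sample
    times all share the same mm:ss.t width; parse them properly
    if yours vary in width."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda r: r["Time"])

def write_max_row(src, dest):
    """Write only the max-Time row to dest as a JSON object."""
    with open(dest, "w") as f:
        json.dump(max_time_row(src), f)
```

This mirrors the tSortRow + take-first-row approach: sort key on Time, keep one row, serialize it.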

MySql Separate values in one col to many

I am retrieving data from a MySQL db. All the data is in one column. I need to separate this into several columns. The structure of this column is as follows:
{{product ID=001 |Country=Netherlands |Repository Link=http://googt.com |Other Relevant Information=test }} ==Description== this are the below codes: code 1 code2 ==Case Study== case study 1 txt case study 2 txt ==Benefits== ben 1 ben 2 === Requirements === (empty col) === Architecture === *arch1 *arch2
So I want cols like: Product ID, Country, Repository Link, Architecture etc.....
If you are planning on simply parsing out the output of your column, it will depend on the language you are using.
However, in general the procedure for doing this is as follows.
1, Pull the output into a string.
2, Find a delimiter (in your case it appears '|' will do).
3, You have two options here (again depending on language):
   A, Split each segment into an array, then run the array through a looping
      structure to print out each section, or use the array to manipulate the
      data individually (your choice).
   B, In the simple string method, you can either create a new string, or
      replace all instances of '|' with '\n' (newline char) so that you can
      display all data.
I recommend the array conversion as this will allow you to easily interact with the data in a simple manner.
This is often done nowadays with JSON and similar formats, which are stored in single fields for various reasons.
Here is an example done in php making use of explode()
$unparsed = "this | is | a | string that is | not: parsed";
$parsed = explode("|", $unparsed);
echo $parsed[2]; // " a " - the surrounding spaces survive, so trim() if needed
echo $parsed[4]; // " not: parsed"
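For the specific column shown in the question, the {{...}} block is itself a set of key=value pairs split on |, so the named fields can be pulled out directly. A rough Python sketch (field names taken from the sample; the ==Section== parts after the template would need similar handling):

```python
import re

def parse_template(col):
    """Parse the '{{key=value |key=value ...}}' prefix of the column
    into a dict, e.g. {'product ID': '001', 'Country': 'Netherlands'}."""
    m = re.search(r"\{\{(.*?)\}\}", col)
    fields = {}
    if m:
        for part in m.group(1).split("|"):
            # split each segment on its FIRST '=' so URLs etc. stay intact
            key, sep, value = part.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
    return fields
```

Once parsed into a dict, each key maps naturally onto its own output column (Product ID, Country, Repository Link, and so on).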

Ordering invoice numbers and letters in MySQL

I'm having problems with SQL sorting results of a query from my MySQL database. I need a way to sort invoice numbers mixed with letters and a multi digit number.
Format is: ${optional-prefix}${number part}${optional-postfix} and they are all stored in Varchar(32). It is not an option to change the number format, because the values are imported from multiple systems.
What I want to sort: (unsorted)
IoCustTextNoNumber
Io-700
IO39ABC
IO-137-kk
IO-037-kk
201-ib
201
38-kk
036
12
11-KE
IO-37-kk
00001342
IO-36-kk
11-KEk
13
035
37-kk
200
Io-701
Expected result: (sorted)
11-KE
11-KEk
12
13
035
036
37-kk
38-kk
200
201
201-ib
00001342
IO-36-kk
IO-037-kk
IO-37-kk
IO-137-kk
Io-700
Io-701
IO39ABC
IoCustTextNoNumber
Can anyone help me with a solution?
MySQL is not going to do that natively. You can build a custom sort in something like PHP, where you do a loop and assign things to a position. Or, you can select all values that begin with "IO" and then update those to move the "IO" prefix into another column.
In PHP you could do something like:
foreach ($data as $row) {
    $test = strpos($row, '-'); // if this finds a dash, the value sorts towards the front
    if ($test === false) { // no dash - does it begin with a digit?
        if (ctype_digit($row[0])) {
            // do whatever you need
        }
    }
}
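The "assign things to a position" idea amounts to a natural-sort key: split each value into (prefix, number, postfix) and compare the number part numerically. A rough Python sketch (it approximates, but does not exactly reproduce, the expected ordering above - e.g. the IO39ABC placement):

```python
import re

def invoice_key(value):
    """Sort key for '${optional-prefix}${number}${optional-postfix}' values:
    group by prefix, compare the number part numerically, then the postfix,
    so '12' < '035' < '37-kk' < '00001342'."""
    prefix, digits, postfix = re.match(r"(\D*)(\d*)(.*)", value).groups()
    number = int(digits) if digits else float("inf")  # no number at all sorts last
    return (prefix.rstrip("-").lower(), number, postfix.lower())

invoices = ["201", "38-kk", "036", "12", "00001342",
            "13", "035", "37-kk", "200", "201-ib"]
print(sorted(invoices, key=invoice_key))
```

The same decomposition could also be precomputed into extra columns (prefix, numeric part, postfix) at import time, so MySQL can ORDER BY them directly instead of sorting in application code.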