A shorter non-repeating alphanumeric code than UUID in MySQL

Is it possible for a MySQL database to generate a 5 or 6 character code comprised of only numbers and letters when I insert a record? If so, how?
Just like goo.gl, bit.ly and jsfiddle do it. For example:
http://bit.ly/3PKQcJ
http://jsfiddle.net/XzKvP
cZ6ahF, 3t5mM, xGNPN, xswUdS...
So UUID_SHORT() will not work because it returns a value like 23043966240817183
Requirements:
Must be unique (non-repeating)
Can be but not required to be based off of primary key integer value
Must scale (grow by one character when all possible combinations have been used)
Must look random. (item 1234 cannot be BCDE while item 1235 is BCDF)
Must be generated on insert.
Would greatly appreciate code examples.

Try this:
SELECT LEFT(UUID(), 6);

I recommend using Redis for this task, actually. It has all the features that make it a good fit here; foremost, it is very fast at checking whether a value already exists in a large collection.
We will create two sets, buffered_ids and used_ids. A cronjob will run every 5 minutes (or whatever interval you like) and check the length of buffered_ids, keeping it above, say, 5000 entries. When you need an id, pop it from buffered_ids and add it to used_ids.
Redis sets are collections of unique items. Think of one as a hash where the keys are unique and all the values are "true".
Your cronjob, in bash:
# log(): integer logarithm -- "log N X" prints floor(log base N of X)
log(){ local x=$1 n=2 l=-1; if [ "$2" != "" ]; then n=$x; x=$2; fi; while ((x)); do ((l+=1, x/=n)); done; echo $l; }

# grow the id length as the pool of used ids grows
scale=$(redis-cli SCARD used_ids)
scale=$(log 16 "$scale")
scale=$((scale + 6))

# keep the buffer topped up to 5000 candidate ids
while [ "$(redis-cli SCARD buffered_ids)" -lt 5000 ]; do
    # ${1:-$scale}: the first script argument, if given, overrides the computed length
    uuid=$(tr -cd "[:alnum:]" < /dev/urandom | head -c "${1:-$scale}")
    if [ "$(redis-cli SISMEMBER used_ids "$uuid")" -eq 1 ]; then
        continue
    fi
    redis-cli SADD buffered_ids "$uuid"
done
To grab the next uid for use in your application (in pseudocode, because you did not specify a language):
$uid = redis('SPOP buffered_ids');
redis('SADD used_ids ' . $uid);
Edit: actually, there's a race condition there. To safely pop a value, add it to used_ids first, then remove it from buffered_ids.
$uid = redis('SRANDMEMBER buffered_ids');
redis('SADD used_ids ' . $uid);
redis('SREM buffered_ids ' . $uid);
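Note that Redis also has an atomic SMOVE command, which moves a member from one set to another in a single step, so the ordering of SADD/SREM stops mattering. In the same pseudocode style:
$uid = redis('SRANDMEMBER buffered_ids');
redis('SMOVE buffered_ids used_ids ' . $uid);
SMOVE returns 0 if another client claimed $uid first, in which case you grab a new SRANDMEMBER and retry.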


Gnuplot one-liner to generate a titled line for each row in a CSV file

I've been trying to figure out gnuplot but haven't been getting anywhere, for seemingly two reasons: my lack of understanding of gnuplot's set commands, and the layout of my data file. I've decided the best option is to ask for help.
The hope is to get this working as a gnuplot one-liner.
Example rows from my CSV data file (MyData.csv):
> _TitleRow1_,15.21,15.21,...could be more, could be less
> _TitleRow2_,16.27,16.27,101,55.12,...could be more, could be less
> _TitleRow3_,16.19,16.19,20.8,...could be more, could be less
...(over 100 rows)
Contents of MyData.csv rows will always be a string in the first column for the title, followed by an undetermined number of decimal values. (Each row gets appended to periodically, so the column count must stay open-ended.)
What I'd like to happen is to generate a line graph showing a line for each row in the csv, using the first column as a row title, and the following numbers generating the actual line.
This is what I'm trying:
gnuplot -e 'set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"'
Which results in:
set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"
^
line 0: Bad data on line 2 of file MyData.csv
This looks like an amazing tool and I'm looking forward to learning more about it. Thanks in advance for any hints/assistance!
Your datafile format is very unfortunate for gnuplot, which prefers data in columns.
You can, however, also plot rows (which is not straightforward in gnuplot, but see an example here). This requires a strict matrix, and the problem with your data is that you have a variable column count.
Actually, your CSV is not a "correct" CSV, because a CSV should have the same number of columns for all rows, i.e. if one row has fewer fields than the row with the most, the line should be padded with as many ,,, as needed. That's basically what the script below does.
With this you can plot rows with the option matrix (check help matrix). However, you will get some warnings ("warning: matrix contains missing or undefined values") which you can ignore.
Alternatively, you could transpose your data (with a variable column count this is maybe not straightforward). External tools can do it easily (see the awk sketch below); with gnuplot alone it would be a bit cumbersome, and you would first have to fill your shorter rows as in the example below.
Maybe there is a simpler and better gnuplot-only solution which I am currently not aware of.
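For illustration, such an external transpose can be sketched in awk (untested; it assumes comma separators, writes transposed.csv, and the 100 in the plot command stands in for your actual row count). Rows become columns, missing cells become empty fields, and gnuplot then treats empty fields as missing points:
awk -F, '{ for (i=1; i<=NF; i++) a[i,NR]=$i; if (NF>maxf) maxf=NF }
END { for (i=1; i<=maxf; i++) { row=a[i,1]; for (j=2; j<=NR; j++) row=row "," a[i,j]; print row } }' MyData.csv > transposed.csv
gnuplot -e 'set datafile separator ","; set key autotitle columnhead noenhanced; plot for [i=1:100] "transposed.csv" u i w lp'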
Data: SO73099645.dat
_TitleRow1_, 1.2, 1.3
_TitleRow2_, 2.2, 2.3, 2.4, 2.5
_TitleRow3_, 3.2, 3.3, 3.4
Script:
### plotting rows with variable columns
reset session
FILE = "SO73099645.dat"
getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1
set datafile separator "\t"
colCount = 0
myNaNs = myHeaders = ''
stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput
do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }
set table $Data
plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table
unset table
set datafile separator ","
stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput
myHeader(n) = word(myHeaders,n)
set key noenhanced
plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
### end of script
As "one-liner":
FILE = "SO/SO73099645.dat"; getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1; set datafile separator "\t"; colCount = 0; myNaNs = myHeaders = ''; stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput; do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }; set table $Data; plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table; unset table; set datafile separator ","; stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput; myHeader(n) = word(myHeaders,n); set key noenhanced; plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
Result: a plot with one line per CSV row, keyed by the row titles.

Powershell array value in MySQL insert is outputting entire array

First off, I am new to PowerShell; this is my first ever attempt. I'm in the "I can program in other languages, so I can hack through this" camp on this project. I have spent a few days now trying to solve this one, and I know it is something stupid. I'm just stuck.
# Insert into master table
$query = "INSERT INTO ``master``(``diff``, ``key``) VALUES ('$diff', '$key')"
Invoke-MySqlQuery -Query $query
$query
This works fine; the test output displays:
INSERT INTO `master`(`diff`, `key`) VALUES ('248', 'k000002143200000000000000680006080005500900030082670009461000500000091000000000000')
It also inputs into the MySQL DB just fine.
The following, which uses the exact same format, is not working for me. Like I said at the top, I know it is some stupid formatting thing I'm missing.
# Insert into answers table
$query = "INSERT INTO ``puzzles``(``id``, ``p1``, ``p2``, ``p3``, ``p4``, ``p5``, ``p6``, ``p7``, ``p8``, ``p9``, ``p10``, ``p11``, ``p12``, ``p13``, ``p14``, ``p15``, ``p16``, ``p17``, ``p18``, ``p19``, ``p20``, ``p21``, ``p22``, ``p23``, ``p24``, ``p25``, ``p26``, ``p27``, ``p28``, ``p29``, ``p30``, ``p31``, ``p32``, ``p33``, ``p34``, ``p35``, ``p36``, ``p37``, ``p38``, ``p39``, ``p40``, ``p41``, ``p42``, ``p43``, ``p44``, ``p45``, ``p46``, ``p47``, ``p48``, ``p49``, ``p50``, ``p51``, ``p52``, ``p53``, ``p54``, ``p55``, ``p56``, ``p57``, ``p58``, ``p59``, ``p60``, ``p61``, ``p62``, ``p63``, ``p64``, ``p65``, ``p66``, ``p67``, ``p68``, ``p69``, ``p70``, ``p71``, ``p72``, ``p73``, ``p74``, ``p75``, ``p76``, ``p77``, ``p78``, ``p79``, ``p80``, ``p81``) VALUES ('$id', '$p[0]', '$p[1]', '$p[2]', '$p[3]', '$p[4]', '$p[5]', '$p[6]', '$p[7]', '$p[8]', '$p[9]', '$p[10]', '$p[11]', '$p[12]', '$p[13]', '$p[14]', '$p[15]', '$p[16]', '$p[17]', '$p[18]', '$p[19]', '$p[20]', '$p[21]', '$p[22]', '$p[23]', '$p[24]', '$p[25]', '$p[26]', '$p[27]', '$p[28]', '$p[29]', '$p[30]', '$p[31]', '$p[32]', '$p[33]', '$p[34]', '$p[35]', '$p[36]', '$p[37]', '$p[38]', '$p[39]', '$p[40]', '$p[41]', '$p[42]', '$p[43]', '$p[44]', '$p[45]', '$p[46]', '$p[47]', '$p[48]', '$p[49]', '$p[50]', '$p[51]', '$p[52]', '$p[53]', '$p[54]', '$p[55]', '$p[56]', '$p[57]', '$p[58]', '$p[59]', '$p[60]', '$p[61]', '$p[62]', '$p[63]', '$p[64]', '$p[65]', '$p[66]', '$p[67]', '$p[68]', '$p[69]', '$p[70]', '$p[71]', '$p[72]', '$p[73]', '$p[74]', '$p[75]', '$p[76]', '$p[77]', '$p[78]', '$p[79]', '$p[80]')"
$query
$p[0]
$p[1]
$p[2]
$p[3]
$p[4]
$p[5]
Invoke-MySqlQuery -Query $query
This is the output I get on one of the runs:
INSERT INTO `puzzles`(`id`, `p1`, `p2`, `p3`, `p4`, `p5`, `p6`, `p7`, `p8`, `p9`, `p10`, `p11`, `p12`, `p13`, `p14`, `p15`, `p16`, `p17`, `p
18`, `p19`, `p20`, `p21`, `p22`, `p23`, `p24`, `p25`, `p26`, `p27`, `p28`, `p29`, `p30`, `p31`, `p32`, `p33`, `p34`, `p35`, `p36`, `p37`, `p
38`, `p39`, `p40`, `p41`, `p42`, `p43`, `p44`, `p45`, `p46`, `p47`, `p48`, `p49`, `p50`, `p51`, `p52`, `p53`, `p54`, `p55`, `p56`, `p57`, `p
58`, `p59`, `p60`, `p61`, `p62`, `p63`, `p64`, `p65`, `p66`, `p67`, `p68`, `p69`, `p70`, `p71`, `p72`, `p73`, `p74`, `p75`, `p76`, `p77`, `p
78`, `p79`, `p80`, `p81`) VALUES ('2596', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[0]', '000002143
200000000000000680006080005500900030082670009461000500000091000000000000[1]', '0000021432000000000000006800060800055009000300826700094610005
00000091000000000000[2]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[3]', '0000021432000000000000006
80006080005500900030082670009461000500000091000000000000[4]', '00000214320000000000000068000608000550090003008267000946100050000009100000000
0000[5]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[6]', '00000214320000000000000068000608000550090
0030082670009461000500000091000000000000[7]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[8]', '00000
2143200000000000000680006080005500900030082670009461000500000091000000000000[9]', '000002143200000000000000680006080005500900030082670009461
000500000091000000000000[10]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[11]', '0000021432000000000
00000680006080005500900030082670009461000500000091000000000000[12]', '0000021432000000000000006800060800055009000300826700094610005000000910
00000000000[13]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[14]', '00000214320000000000000068000608
0005500900030082670009461000500000091000000000000[15]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[1
6]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[17]', '000002143200000000000000680006080005500900030
082670009461000500000091000000000000[18]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[19]', '0000021
43200000000000000680006080005500900030082670009461000500000091000000000000[20]', '0000021432000000000000006800060800055009000300826700094610
00500000091000000000000[21]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[22]', '00000214320000000000
0000680006080005500900030082670009461000500000091000000000000[23]', '00000214320000000000000068000608000550090003008267000946100050000009100
0000000000[24]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[25]', '000002143200000000000000680006080
005500900030082670009461000500000091000000000000[26]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[27
]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[28]', '0000021432000000000000006800060800055009000300
82670009461000500000091000000000000[29]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[30]', '00000214
3200000000000000680006080005500900030082670009461000500000091000000000000[31]', '00000214320000000000000068000608000550090003008267000946100
0500000091000000000000[32]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[33]', '000002143200000000000
000680006080005500900030082670009461000500000091000000000000[34]', '000002143200000000000000680006080005500900030082670009461000500000091000
000000000[35]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[36]', '0000021432000000000000006800060800
05500900030082670009461000500000091000000000000[37]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[38]
', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[39]', '00000214320000000000000068000608000550090003008
2670009461000500000091000000000000[40]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[41]', '000002143
200000000000000680006080005500900030082670009461000500000091000000000000[42]', '000002143200000000000000680006080005500900030082670009461000
500000091000000000000[43]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[44]', '0000021432000000000000
00680006080005500900030082670009461000500000091000000000000[45]', '0000021432000000000000006800060800055009000300826700094610005000000910000
00000000[46]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[47]', '00000214320000000000000068000608000
5500900030082670009461000500000091000000000000[48]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[49]'
, '000002143200000000000000680006080005500900030082670009461000500000091000000000000[50]', '000002143200000000000000680006080005500900030082
670009461000500000091000000000000[51]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[52]', '0000021432
00000000000000680006080005500900030082670009461000500000091000000000000[53]', '0000021432000000000000006800060800055009000300826700094610005
00000091000000000000[54]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[55]', '00000214320000000000000
0680006080005500900030082670009461000500000091000000000000[56]', '00000214320000000000000068000608000550090003008267000946100050000009100000
0000000[57]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[58]', '000002143200000000000000680006080005
500900030082670009461000500000091000000000000[59]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[60]',
'000002143200000000000000680006080005500900030082670009461000500000091000000000000[61]', '0000021432000000000000006800060800055009000300826
70009461000500000091000000000000[62]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[63]', '00000214320
0000000000000680006080005500900030082670009461000500000091000000000000[64]', '00000214320000000000000068000608000550090003008267000946100050
0000091000000000000[65]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[66]', '000002143200000000000000
680006080005500900030082670009461000500000091000000000000[67]', '000002143200000000000000680006080005500900030082670009461000500000091000000
000000[68]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[69]', '0000021432000000000000006800060800055
00900030082670009461000500000091000000000000[70]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[71]',
'000002143200000000000000680006080005500900030082670009461000500000091000000000000[72]', '00000214320000000000000068000608000550090003008267
0009461000500000091000000000000[73]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[74]', '000002143200
000000000000680006080005500900030082670009461000500000091000000000000[75]', '000002143200000000000000680006080005500900030082670009461000500
000091000000000000[76]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[77]', '0000021432000000000000006
80006080005500900030082670009461000500000091000000000000[78]', '0000021432000000000000006800060800055009000300826700094610005000000910000000
00000[79]', '000002143200000000000000680006080005500900030082670009461000500000091000000000000[80]')
0
0
0
0
0
2
In the $query variable, $p[X] does not work like the bare $p[X] below; it's spitting out the entire array. Obviously it's not working when it sends it to MySQL either; I get the ID correct and that's it.
Thank you all for the help in advance; thumping my head on the keyboard hurts after a while!
Ian
Bonus question! Any tips on this would be great as well. None of the 0s need to be there; I actually want them as NULL. Any tips on a clean way to do that would be wonderful. Full disclosure: I have not even looked into this part yet!
EDIT - In response to @Mathias R. Jessen
I removed all the other code not having to do with $p, to make it easy on the eyes.
# Set the key variable
$key = (Get-Content Puzzles.txt)[$loop] # ....73.146...8.......6......5..2..8.8..5.9..1.923..........273.........8..3..1.46 #Extreme
$key = $key.Replace(".", "0") # 000073014600080000000600000050020080800509001092300000000002730000000008003001046 #Extreme
$key = -join('k', $key.Substring(0,81)) # k000073014600080000000600000050020080800509001092300000000002730000000008003001046
$check = (Invoke-MySqlQuery -Query "SELECT COUNT(*) FROM ``master`` WHERE ``key`` = '$key'")
$check = $check.'COUNT(*)'
# Check in master table if key exists, i.e. not a new puzzle
IF ($check -eq 0) {
$p = $key.Substring(1,81) # 000073014600080000000600000050020080800509001092300000000002730000000008003001046
# Insert into answers table
$query = "INSERT INTO ``puzzles``(``id``, ``p1``, ``p2``, ``p3``, ``p4``, ``p5``, ``p6``, ``p7``, ``p8``, ``p9``, ``p10``, ``p11``, ``p12``, ``p13``, ``p14``, ``p15``, ``p16``, ``p17``, ``p18``, ``p19``, ``p20``, ``p21``, ``p22``, ``p23``, ``p24``, ``p25``, ``p26``, ``p27``, ``p28``, ``p29``, ``p30``, ``p31``, ``p32``, ``p33``, ``p34``, ``p35``, ``p36``, ``p37``, ``p38``, ``p39``, ``p40``, ``p41``, ``p42``, ``p43``, ``p44``, ``p45``, ``p46``, ``p47``, ``p48``, ``p49``, ``p50``, ``p51``, ``p52``, ``p53``, ``p54``, ``p55``, ``p56``, ``p57``, ``p58``, ``p59``, ``p60``, ``p61``, ``p62``, ``p63``, ``p64``, ``p65``, ``p66``, ``p67``, ``p68``, ``p69``, ``p70``, ``p71``, ``p72``, ``p73``, ``p74``, ``p75``, ``p76``, ``p77``, ``p78``, ``p79``, ``p80``, ``p81``) VALUES ('$id', '$p[0]', '$p[1]', '$p[2]', '$p[3]', '$p[4]', '$p[5]', '$p[6]', '$p[7]', '$p[8]', '$p[9]', '$p[10]', '$p[11]', '$p[12]', '$p[13]', '$p[14]', '$p[15]', '$p[16]', '$p[17]', '$p[18]', '$p[19]', '$p[20]', '$p[21]', '$p[22]', '$p[23]', '$p[24]', '$p[25]', '$p[26]', '$p[27]', '$p[28]', '$p[29]', '$p[30]', '$p[31]', '$p[32]', '$p[33]', '$p[34]', '$p[35]', '$p[36]', '$p[37]', '$p[38]', '$p[39]', '$p[40]', '$p[41]', '$p[42]', '$p[43]', '$p[44]', '$p[45]', '$p[46]', '$p[47]', '$p[48]', '$p[49]', '$p[50]', '$p[51]', '$p[52]', '$p[53]', '$p[54]', '$p[55]', '$p[56]', '$p[57]', '$p[58]', '$p[59]', '$p[60]', '$p[61]', '$p[62]', '$p[63]', '$p[64]', '$p[65]', '$p[66]', '$p[67]', '$p[68]', '$p[69]', '$p[70]', '$p[71]', '$p[72]', '$p[73]', '$p[74]', '$p[75]', '$p[76]', '$p[77]', '$p[78]', '$p[79]', '$p[80]')"
Invoke-MySqlQuery -Query $query
}
You need to use the subexpression operator $() to evaluate the collection at that index during string interpolation. Without this operator, the content of your entire collection is being printed, as well as your literal index syntax.
Your first example works as expected because you're only interpolating simple variables, without doing any additional work to them.
Here's a simple example from the command line:
C:\> $arr = 1,2,3,4
Outside of a string:
C:\> $arr[0]
1
During string interpolation, without the subexpression operator:
C:\> "$arr[0]"
1 2 3 4[0]
During string interpolation, with the subexpression operator:
C:\> "$($arr[0])"
1
This means that your example would become something like this:
...VALUES ('$id', '$($p[0])', '$($p[1])'...
Note that $id is working correctly because it is a simple variable. You only need to use the subexpression operator for additional work like evaluating indexes, properties, etc.
This concept is also sometimes called variable expansion, if you would like to research further.
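As a further sketch building on that idea (assuming $id and $p are set as in your script): since all 81 columns follow the same pattern, you can generate the column and value lists in a loop. This also handles your bonus question by emitting an unquoted NULL whenever the character is '0':
$cols = for ($i = 1; $i -le 81; $i++) { "``p$i``" }
$vals = for ($i = 0; $i -lt 81; $i++) {
    if ($p[$i] -eq '0') { 'NULL' }    # bonus: store NULL instead of '0'
    else { "'$($p[$i])'" }
}
$query = "INSERT INTO ``puzzles``(``id``, $($cols -join ', ')) VALUES ('$id', $($vals -join ', '))"
Parameterized queries would be safer in general, but for fixed single-digit data this matches your existing approach.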

How to convert data from a custom format to CSV?

I have a file whose content is as below. I have only included a few records here, but there are around 1000 records in a single file:
Record type : GR
address : 62.5.196
ID : 1926089329
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
inID : 101
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:51:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 61.5.196
ID : 1926089327
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
--------------------------------------------------------------------
Record type : GR
address : 63.5.196
ID : 1926089328
time : Sun Aug 10 09:53:47 2014
Time zone : + 16200 seconds
address [1] : 61.5.196
PN ID : 412 1
---------- Container #1 (start) -------
intID : 100
---------- Container #1 (end) -------
timerecorded: Sun Aug 10 09:55:47 2014
Uplink data volume : 502838
Downlink data volume : 3133869
Change condition : Record closed
My goal is to convert this to a CSV or txt file like below:
Record type| address |ID | time | Time zone| address [1] | PN ID
GR |61.5.196 |1926089329 |Sun Aug 10 09:53:47 2014 |+ 16200 seconds |61.5.196 |412 1
Any guidance on the best way to start this would be great. The sample I provided should give a clear idea, but in words: I want to read the header of each record once and put the data under the output header.
Thanks for your time and any help or suggestions.
What you're doing is creating an Extract/Transform script (the ET part of an ETL). I don't know which language you're intending to use, but essentially any language can be used. Personally, unless this is a massive file, I'd recommend Python as it's easy to grok and easy to write with the included csv module.
First, you need to understand the format thoroughly.
How are records separated?
How are fields separated?
Are there any fields that are optional?
If so, are the optional fields important, or do they need to be discarded?
Unfortunately, this is all headwork: there's no magical code solution to make this easier. Then, once you have figured out the format, you'll want to start writing code. This is essentially a series of data transformations:
Read the file.
Split it into records.
For each record, transform the fields into an appropriate data structure.
Serialize the data structure into the CSV.
If your file is larger than memory, this can become more complicated; instead of reading and then splitting, for example, you may want to read the file sequentially and create a Record object each time the record delimiter is detected. If your file is even larger, you might want to use a language with better multithreading capabilities to handle the transformation in parallel; but those are more advanced than it sounds like you need to go at the moment.
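For example, here is a minimal Python sketch of that pipeline (untested; the file names are placeholders, it assumes records are separated by lines made up entirely of dashes as in the sample, and it hard-codes the seven fields asked for):
import csv

# the seven output columns requested; any other fields are ignored
FIELDS = ["Record type", "address", "ID", "time", "Time zone", "address [1]", "PN ID"]

with open("your_data_file.txt") as src, open("your_csv_file.csv", "w", newline="") as dst:
    writer = csv.DictWriter(dst, fieldnames=FIELDS, delimiter="|",
                            extrasaction="ignore", restval="")
    writer.writeheader()
    record = {}
    for raw in src:
        line = raw.strip()
        if line and set(line) == {"-"}:            # all-dash line: record separator
            if record:
                writer.writerow(record)
            record = {}
        elif ":" in line and not line.startswith("-"):
            key, _, value = line.partition(":")    # split on the first colon only
            record[key.strip()] = value.strip()
    if record:                                     # in case the file has no trailing separator
        writer.writerow(record)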
This is a simple PHP script that will read a text file containing your data and write a csv file with the results. If you are on a system which has command line PHP installed, just save it to a file in some directory, copy your data file next to it renaming it to "your_data_file.txt" and call "php whatever_you_named_the_script.php" on the command line from that directory.
<?php
$text = file_get_contents("your_data_file.txt");
$matches;
preg_match_all("/Record type[\s\v]*:[\s\v]*(.+?)address[\s\v]*:[\s\v]*(.+?)ID[\s\v]*:[\s\v]*(.+?)time[\s\v]*:[\s\v]*(.+?)Time zone[\s\v]*:[\s\v]*(.+?)address \[1\][\s\v]*:[\s\v]*(.+?)PN ID[\s\v]*:[\s\v]*(.+?)/su", $text, $matches, PREG_SET_ORDER);
$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    if (fputcsv($csv_file, array("Record type","address","ID","time","Time zone","address [1]","PN ID"), "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    foreach ($matches as $match) {
        $clean_values = array();
        for ($i = 1; $i < 8; $i++) {
            $clean_values[] = trim($match[$i]);
        }
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
This script assumes that your data records are always formatted similar to the examples you have posted and that all values are always present. If the data file may have exceptions to those rules, the script probably has to be adapted accordingly. But it should give you an idea of how this can be done.
Update
Adapted the script to deal with the full format provided in the updated question. The regular expression now matches single data lines (extracting their values) as well as the record separator made up of dashes. The loop has changed a bit and now fills a buffer array field by field until a record separator is encountered.
<?php
$text = file_get_contents("your_data_file.txt");

// this will match whole lines
// only if they either start with an alpha-num character
// or are completely made of dashes (record separator)
// it also extracts the values of data lines one by one
$regExp = '/(^\s*[a-zA-Z0-9][^:]*:(.*)$|^-+$)/m';
$matches;
preg_match_all($regExp, $text, $matches, PREG_SET_ORDER);

$csv_file = fopen("your_csv_file.csv", "w");
if ($csv_file) {
    // in case the number or order of fields changes, adapt this array as well
    $column_headers = array(
        "Record type",
        "address",
        "ID",
        "time",
        "Time zone",
        "address [1]",
        "PN ID",
        "inID",
        "timerecorded",
        "Uplink data volume",
        "Downlink data volume",
        "Change condition"
    );
    if (fputcsv($csv_file, $column_headers, "|") === FALSE) {
        echo "could not write headers to csv file\n";
    }
    $clean_values = array();
    foreach ($matches as $match) {
        // first entry will contain the whole line
        // remove surrounding whitespace
        $whole_line = trim($match[0]);
        if (strpos($whole_line, '-') !== 0) {
            // this match starts with something else than -
            // so it must be a data field, store the extracted value
            $clean_values[] = trim($match[2]);
        } else {
            // this match is a record separator, write csv line and reset buffer
            if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
                echo "could not write data to csv file\n";
            }
            $clean_values = array();
        }
    }
    if (!empty($clean_values)) {
        // there was no record separator at the end of the file
        // write the last entry that is still in the buffer
        if (fputcsv($csv_file, $clean_values, "|") === FALSE) {
            echo "could not write data to csv file\n";
        }
    }
    fclose($csv_file);
} else {
    die("could not open csv file\n");
}
Doing the data extraction using regular expressions is one possible method mostly useful for simple data formats with a clear structure and no surprises. As syrion pointed out in his answer, things can get much more complicated. In that case you might need to write a more sophisticated script than this one.

awk set elements in array

I have a large .csv file to process and my elements are arranged randomly like this:
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
where the fields LOCAL, REMOTE, MLOCAL or MREMOTE are arranged like this:
when they appear as a pair (e.g. MLOCAL/MREMOTE): if the 3rd field is MLOCAL and the 4th field is MREMOTE, then the 5th and 7th fields hold the value and date of MLOCAL, and the 6th and 8th hold the value and date of MREMOTE
when they appear singly (only LOCAL or only REMOTE): the 4th and 5th fields hold the value and date of the 3rd field.
Now, I have split these rows using:
nawk 'BEGIN{
while (getline < "'"$filedata"'")
split($0,ft,",");
name=ft[1];
ID=ft[2]
?=ft[3]
?=ft[4]
....................
but because I can't find a pattern for the 3rd and 4th fields, I'm pretty stuck on how to keep assigning variable names to the array elements for further processing.
Now, I tried to use a "case" statement, but it isn't working in awk or nawk (only gawk's switch works as expected). I also tried this:
if ( ft[3] == "MLOCAL" && ft[4]!= "MREMOTE" )
{
MLOCAL=ft[3];
MLOCAL_qty=ft[4];
MLOCAL_TIMESTAMP=ft[5];
}
else if ( ft[3] == MLOCAL && ft[4] == MREMOTE )
{
MLOCAL=ft[3];
MREMOTE=ft[4];
MOCAL_qty=ft[5];
MREMOTE_qty=ft[6];
MOCAL_TIMESTAMP=ft[7];
MREMOTE_TIMESTAMP=ft[8];
}
else if ( ft[3] == MREMOTE && ft[4] != MOCAL )
{
MREMOTE=ft[3];
MREMOTE_qty=ft[4];
MREMOTE_TIMESTAMP=ft[5];
..........................................
but it's not working either.
So, if you have any idea how to handle this, I would be grateful for a hint on finding a pattern that covers all the possible situations above.
EDIT
I don't know how to thank you for all this help. Now, what I have to do is more complex than I wrote above; I'll try to describe it as simply as I can, otherwise I'll make you guys pretty confused.
My output should be like the following:
NAME,UNIQUE_ID,VOLUME_ALLOCATED,MLOCAL_VALUE,MLOCAL_TIMESTAMP,MLOCAL_limit,LOCAL_VALUE,LOCAL_TIMESTAMP,LOCAL_limit,MREMOTE_VALUE,MREMOTE_TIMESTAMP,REMOTE_VALUE,REMOTE_TIMESTAMP
(where MLOCAL_limit and LOCAL_limit are the result of subtracting MLOCAL_VALUE or LOCAL_VALUE from VOLUME_ALLOCATED)
So, in my output file, fields position should be arranged like:
4th field = MLOCAL_VALUE, 5th field = MLOCAL_TIMESTAMP, 7th field = LOCAL_VALUE,
8th field = LOCAL_TIMESTAMP, 10th field = MREMOTE_VALUE, 11th field = MREMOTE_TIMESTAMP, 12th field = REMOTE_VALUE, 13th field = REMOTE_TIMESTAMP
Now, an example would be this:
for the following input: name,ID,VOLUME_ALLOCATED,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
name,ID,VOLUME_ALLOCATED,REMOTE,234455,19/12/2012
I should process this line and the output should be this:
name,ID,VOLUME_ALLOCATED,33222,22/10/2012,MLOCAL_LIMIT, ,,,56,18/10/2012,,
The 7th, 8th, 9th, 12th, and 13th fields are empty because there is no info for LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_limit, REMOTE_VALUE, and REMOTE_TIMESTAMP
OR
name,ID,VOLUME_ALLOCATED,,,,,,,,,234455,9/12/2012
The 4th, 5th, 6th, 7th, 8th, 9th, 10th, and 11th fields should be empty because there is no info about: MLOCAL_VALUE, MLOCAL_TIMESTAMP, MLOCAL_LIMIT, LOCAL_VALUE, LOCAL_TIMESTAMP, LOCAL_LIMIT, MREMOTE_VALUE, MREMOTE_TIMESTAMP
VOLUME_ALLOCATED is retrieved from another csv file (called "info.csv") based on the ID field, which is processed earlier in the script like:
info.csv
VOLUME_ALLOCATED,ID,CLIENT
5242881,64,subscriber
567743,24,visitor
data.csv
NAME,64,MLOCAL,341993,23/10/2012
NAME,24,LOCAL$REMOTE,2347$4324,19/12/2012$18/12/2012
Now, my code is this:
#! /usr/bin/bash
input="info.csv"
filedata="data.csv"
outfile="out"
nawk 'BEGIN{
while (getline < "'"$input"'")
{
split($0,ft,",");
volume=ft[1];
id=ft[2];
client=ft[3];
key=id;
volumeArr[key]=volume;
clientArr[key]=client;
}
close("'"$input"'");
while (getline < "'"$filedata"'")
{
gsub(/\$/,","); # substitute the $ separator with comma
split($0,ft,",");
volume=volumeArr[id]; # Get the volume from the volumeArr, using "id" as key
segment=clientArr[id]; # Get the client mode from the clientArr, using "id" as key
NAME=ft[1];
id=ft[2];
here I'm stuck, I can't find the right way to set the rest of the
fields since I don't know how to handle the 3rd and 4th fields.
? =ft[3];
? =ft[4];
Sorry if I've made you confused, but this is my current situation right now.
Thanks
You didn't provide the expected output from your sample input but here's a start to show how to get the values for the 2 different formats of input line:
$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{
    delete value   # or use split("",value) if your awk can't delete arrays
    if ($4 ~ /LOCAL|REMOTE/) {
        value[$3] = $5
        date[$3]  = $7
        value[$4] = $6
        date[$4]  = $8
    }
    else {
        value[$3] = $4
        date[$3]  = $5
    }
    print
    for (type in value) {
        printf "%15s%15s%15s\n", type, value[type], date[type]
    }
}
$ awk -f tst.awk file
xxxxxx,xx,MLOCAL,MREMOTE,33222,56,22/10/2012,18/10/2012
        MREMOTE             56     18/10/2012
         MLOCAL          33222     22/10/2012
xxxxxx,xx,MREMOTE,MLOCAL,33222,56,22/10/2012,18/10/2012
        MREMOTE          33222     22/10/2012
         MLOCAL             56     18/10/2012
xxxxxx,xx,MLOCAL,341993,22/10/2012
         MLOCAL         341993     22/10/2012
xxxxxx,xx,MREMOTE,9356828,08/10/2012
        MREMOTE        9356828     08/10/2012
xxxxxx,xx,LOCAL,REMOTE,19316,15253,22/10/2012,22/10/2012
         REMOTE          15253     22/10/2012
          LOCAL          19316     22/10/2012
xxxxxx,xx,REMOTE,LOCAL,1865871,383666,22/10/2012,22/10/2012
         REMOTE        1865871     22/10/2012
          LOCAL         383666     22/10/2012
xxxxxx,xx,REMOTE,1180306134,19/10/2012
         REMOTE     1180306134     19/10/2012
and if you post the expected output we could help you more.
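Following up on your EDIT, here is an untested sketch of how the same value/date arrays could feed the fixed 13-column layout you describe (assumptions: info.csv is passed first so volumes can be looked up by id, the $ separator is rewritten to a comma as in your script, and the limit columns are volume minus the MLOCAL/LOCAL value):
$ cat tst2.awk
BEGIN { FS = OFS = "," }
NR == FNR {                     # first file (info.csv): remember volume by id
    volumeArr[$2] = $1
    next
}
{
    gsub(/\$/, ",")             # split the LOCAL$REMOTE style pairs into fields
    delete value; delete date   # or split("",...) if your awk can't delete arrays
    if ($4 ~ /LOCAL|REMOTE/) {
        value[$3] = $5; date[$3] = $7
        value[$4] = $6; date[$4] = $8
    }
    else {
        value[$3] = $4; date[$3] = $5
    }
    vol    = volumeArr[$2]
    mlimit = ("MLOCAL" in value) ? vol - value["MLOCAL"] : ""
    llimit = ("LOCAL"  in value) ? vol - value["LOCAL"]  : ""
    print $1, $2, vol, value["MLOCAL"], date["MLOCAL"], mlimit,
          value["LOCAL"], date["LOCAL"], llimit,
          value["MREMOTE"], date["MREMOTE"], value["REMOTE"], date["REMOTE"]
}
$ awk -f tst2.awk info.csv data.csv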

How do I cleanly extract MySQL enum values in Perl?

I have some code which needs to ensure some data is in a mysql enum prior to insertion in the database. The cleanest way I've found of doing this is the following code:
sub enum_values {
    my ( $self, $schema, $table, $column ) = @_;

    # don't eval to let the error bubble up
    my $columns = $schema->storage->dbh->selectrow_hashref(
        "SHOW COLUMNS FROM `$table` like ?",
        {},
        $column
    );
    unless ($columns) {
        X::Internal::Database::UnknownColumn->throw(
            column => $column,
            table  => $table,
        );
    }

    my $type = $columns->{Type} or X::Panic->throw(
        details => "Could not determine type for $table.$column",
    );
    unless ( $type =~ /\Aenum\((.*)\)\z/ ) {
        X::Internal::Database::IncorrectTypeForColumn->throw(
            type_wanted => 'enum',
            type_found  => $type,
        );
    }
    $type = $1;

    require Text::CSV_XS;
    my $csv = Text::CSV_XS->new;
    $csv->parse($type) or X::Panic->throw(
        details => "Could not parse enum CSV data: " . $csv->error_input,
    );
    return map { /\A'(.*)'\z/; $1 } $csv->fields;
}
We're using DBIx::Class. Surely there is a better way of accomplishing this? (Note that the $table variable is coming from our code, not from any external source. Thus, no security issue).
No need to be so heroic. Using a reasonably modern version of DBD::mysql, the hash returned by DBI's column_info method contains a pre-split version of the valid enum values in the key mysql_values:
my $sth = $dbh->column_info(undef, undef, 'mytable', '%');
while (my $col_info = $sth->fetchrow_hashref)
{
    if ($col_info->{'TYPE_NAME'} eq 'ENUM')
    {
        # The mysql_values key contains a reference to an array of valid enum values
        print "Valid enum values for $col_info->{'COLUMN_NAME'}: ",
              join(', ', @{$col_info->{'mysql_values'}}), "\n";
    }
    ...
}
I'd say using Text::CSV_XS may be overkill, unless you have weird things like commas in enums (a bad idea anyway if you ask me). I'd probably use this instead.
my #fields = $type =~ / ' ([^']+) ' (?:,|\z) /msgx;
Other than that, I don't think there are shortcuts.
I spent part of the day asking the #dbix-class channel over on MagNet the same question and came across this lack of answer. Since I found the answer and nobody else seems to have done so yet, I'll paste the transcript below the TL;DR here:
my $cfg = Config::Simple->new( $rc_file );
my $mysql = $cfg->get_block('mysql');
my $dsn =
    "DBI:mysql:database=$mysql->{database};" .
    "host=$mysql->{hostname};port=$mysql->{port}";
my $schema =
    DTSS::CDN::Schema->connect( $dsn, $mysql->{user}, $mysql->{password} );
my $valid_enum_values =
    $schema->source('Cdnurl')->column_info('scheme')->{extra}->{list};
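With that list in hand, the pre-insertion check reduces to a hash lookup (a quick sketch; $scheme is a stand-in for whatever incoming value you need to validate):
my %valid = map { $_ => 1 } @{$valid_enum_values};
die "'$scheme' is not a valid scheme\n" unless $valid{$scheme};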
And now the IRC log of me beating my head against a wall:
14:40 < cj> is there a cross-platform way to get the valid values of an
enum?
15:11 < cj> it looks like I could add 'InflateColumn::Object::Enum' to the
__PACKAGE__->load_components(...) list for tables with enum
columns
15:12 < cj> and then call values() on the enum column
15:13 < cj> but how do I get dbic-dump to add
'InflateColumn::Object::Enum' to
__PACKAGE__->load_components(...) for only tables with enum
columns?
15:20 < cj> I guess I could just add it for all tables, since I'm doing
the same for InflateColumn::DateTime
15:39 < cj> hurm... is there a way to get a column without making a
request to the db?
15:40 < cj> I know that we store in the DTSS::CDN::Schema::Result::Cdnurl
class all of the information that I need to know about the
scheme column before any request is issued
15:42 <@ilmari> cj: for Pg and mysql Schema::Loader will add the list of
valid values to the ->{extra}->{list} column attribute
15:43 <@ilmari> cj: if you're using some other database that has enums,
patches welcome :)
15:43 <@ilmari> or even just a link to the documentation on how to extract
the values
15:43 <@ilmari> and a willingness to test if it's not a database I have
access to
15:43 < cj> thanks, but I'm using mysql. if I were using sqlite for this
project, I'd probably oblige :-)
15:44 <@ilmari> cj: to add components to only some tables, use
result_components_map
15:44 < cj> and is there a way to get at those attributes without making a
query?
15:45 < cj> can we do $schema->resultset('Cdnurl') without having it issue
a query, for instance?
15:45 <@ilmari> $result_source->column_info('colname')->{extra}->{list}
15:45 < cj> and $result_source is $schema->resultset('Cdnurl') ?
15:45 <@ilmari> dbic never issues a query until you start retrieving the
results
15:45 < cj> oh, nice.
15:46 <@ilmari> $schema->source('Cdnurl')
15:46 <@ilmari> the result source is where the result set gets the results
from when they are needed
15:47 <@ilmari> names have meanings :)