What I did
Stored a UUID as BINARY(16) in Node.js using
const uuid = Buffer.from('myEditedUuid');
(A follow-up to How do I fetch binary columns from MySQL in Rust?)
What I want to do
I want to fetch said UUID in Rust using the mysql crate (https://docs.rs/mysql/20.0.0/mysql/).
I am currently reading said UUID into a Vec<u8>:
#[derive(Debug, PartialEq, Eq, Serialize)]
pub struct Policy {
    sub: String,
    contents: Option<String>,
}

#[derive(Debug, PartialEq, Eq, Serialize)]
pub struct RawPolicy {
    sub: Option<Vec<u8>>,
    contents: Option<String>,
}

// Fetch the policies themselves.
let policies: Vec<RawPolicy> = connection.query_map(
    "SELECT sub, contents FROM policy",
    |(sub, contents)| RawPolicy { sub, contents },
)?;

// Convert the UUID bytes to a string.
let processed: Vec<Policy> = policies
    .into_iter()
    .map(|policy| {
        let sub = policy.sub.unwrap();
        let sub_string = String::from_utf8(sub).unwrap();
        Policy {
            sub: sub_string,
            contents: policy.contents,
        }
    })
    .collect();
What my problem is
In Node, I would receive a Buffer from said database and use something like uuidBuffer.toString('utf8');
So in Rust I try to use String::from_utf8(), but said Vec does not seem to contain valid UTF-8:
panicked at 'called `Result::unwrap()` on an `Err` value: FromUtf8Error { bytes: [17, 234, 79, 61, 99, 181, 10, 240, 164, 224, 103, 175, 134, 6, 72, 71], error: Utf8Error { valid_up_to: 1, error_len: Some(1) } }'
My question is
Is using Vec<u8> the correct way of fetching BINARY columns, and if so, how do I convert them back to a string?
Edit1:
Node displays a Buffer's bytes in base 16 (Buffer.from('abcd') => <Buffer 61 62 63 64>).
Fetching my UUID (written with Buffer.from()) in Rust gives me the Vec<u8> [17, 234, 79, 61, 99, 181, 10, 240, 164, 224, 103, 175, 134, 6, 72, 71], which throws the UTF-8 error above.
Vec does not seem to be allowed by the mysql crate for anything else.
The solution is simple:
You need to convert the BINARY to hex either in your database query or in your code. So either use the hex crate (https://docs.rs/hex/0.4.2/hex/) or rewrite your query.
Rewriting the query
let policies: Vec<RawPolicy> = connection.query_map(
    "SELECT HEX(sub), contents FROM policy",
    |(sub, contents)| RawPolicy { sub, contents },
)?;
HEX() converts sub to a string of hex digits on the server, so the resulting Vec<u8> contains only ASCII and can be converted using
let sub = policy.sub.unwrap();
let sub_string = String::from_utf8(sub).unwrap();
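Alternatively, here is a minimal sketch of the hex-crate route (assuming hex = "0.4" in Cargo.toml): keep the original SELECT sub query and encode the raw bytes in Rust.

// Hex-encode the raw BINARY(16) bytes in Rust instead of in SQL.
let sub: Vec<u8> = policy.sub.unwrap();
let sub_string = hex::encode(sub); // e.g. "11ea4f3d63b50af0a4e067af86064847"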
from_utf8_lossy can be used
let input = [17, 234, 79, 61, 99, 181, 10, 240, 164, 224, 103, 175, 134, 6, 72, 71];
let output = String::from_utf8_lossy(&input); // "\u{11}�O=c�\n��g��\u{6}HG"
Invalid byte sequences will be replaced by �.
The output "\u{11}�O=c�\n��g��\u{6}HG" is the same as the Node.js output "\u0011�O=c�\n��g��\u0006HG".
Unless this string is to be sent to a JavaScript runtime, it can be kept that way.
But if this string is to be sent to a JavaScript runtime (browser or Node.js), then the Rust code point notation \u{x} should be substituted with its JavaScript equivalent \uXXXX, as sketched below.
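A minimal sketch of such a substitution, covering only control characters (js_escape is a hypothetical helper; full JSON escaping needs more cases):

// Render a &str with JavaScript-style \uXXXX escapes for control characters.
fn js_escape(s: &str) -> String {
    s.chars()
        .map(|c| {
            if (c as u32) < 0x20 {
                format!("\\u{:04x}", c as u32) // e.g. '\u{11}' becomes "\u0011"
            } else {
                c.to_string()
            }
        })
        .collect()
}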
from_utf16_lossy can be used as well
If some of the values are valid UTF-16 code units rather than UTF-8 bytes, they will be converted; where they are not, the same � replacement character is used to render them.
let input: &[u16] = &[17, 234, 79, 61, 99, 181, 10, 240, 164, 224, 103, 175, 134, 6, 72, 71];
println!("{}", String::from_utf16_lossy(input));
I have a CSV file that looks something like the one below, i.e. not in Prolog format:
james,facebook,intel,samsung
rebecca,intel,samsung,facebook
Ian,samsung,facebook,intel
I am trying to write a Prolog predicate that reads the file and returns a list that looks like
[[james,facebook,intel,samsung],[rebecca,intel,samsung,facebook],[Ian,samsung,facebook,intel]]
to be used further in other predicates.
I am still a beginner and have found some good information on SO and modified it to see if I can get what I want, but I'm stuck because I only generate a list that looks like this
[[(james,facebook,intel,samsung)],[(rebecca,intel,samsung,facebook)],[(Ian,samsung,facebook,intel)]]
which means that when I take the head of an inner list I get (james,facebook,intel,samsung) and not james.
Here is the code being used (seen on SO and modified):
stream_representations(Input, Lines) :-
    read_line_to_codes(Input, Line),
    (  Line == end_of_file
    -> Lines = []
    ;  atom_codes(FinalLine, Line),
       term_to_atom(LineTerm, FinalLine),
       Lines = [[LineTerm] | FurtherLines],
       stream_representations(Input, FurtherLines)
    ).

main(Lines) :-
    open('file.txt', read, Input),
    stream_representations(Input, Lines),
    close(Input).
The problem lies with term_to_atom(LineTerm,FinalLine).
First we read a line of the CSV file into a list of character codes in
read_line_to_codes(Input,Line).
Let's simulate input with atom_codes/2:
?- atom_codes('james,facebook,intel,samsung',Line).
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...].
Then we recompose the original atom read in into FinalLine (this seems wasteful; there must be a way to hoover up a line into an atom directly):
?- atom_codes('james,facebook,intel,samsung',Line),
atom_codes(FinalLine, Line).
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung'.
Then we try to map the atom in FinalLine to a term, LineTerm, using term_to_atom/2:
?- atom_codes('james,facebook,intel,samsung',Line),
atom_codes(FinalLine, Line),
term_to_atom(LineTerm,FinalLine).
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm = (james, facebook, intel, samsung).
You see the problem here: LineTerm is not quite a list, but a nested term using the functor ',' to separate elements:
?- atom_codes('james,facebook,intel,samsung',Line),
atom_codes(FinalLine, Line),
term_to_atom(LineTerm,FinalLine),
write_canonical(LineTerm).
','(james,','(facebook,','(intel,samsung)))
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
LineTerm = (james, facebook, intel, samsung).
This ','(james,','(facebook,','(intel,samsung))) term will thus also appear in the final result, just written differently as (james,facebook,intel,samsung) and packed into a list:
[(james,facebook,intel,samsung)]
You do not want this term; you want a list. You could use atomic_list_concat/2 to create a new atom that can be read as a list:
?- atom_codes('james,facebook,intel,samsung',Line),
atom_codes(FinalLine, Line),
atomic_list_concat(['[',FinalLine,']'],ListyAtom),
term_to_atom(LineTerm,ListyAtom),
LineTerm = [V1,V2,V3,V4].
Line = [106, 97, 109, 101, 115, 44, 102, 97, 99|...],
FinalLine = 'james,facebook,intel,samsung',
ListyAtom = '[james,facebook,intel,samsung]',
LineTerm = [james, facebook, intel, samsung],
V1 = james,
V2 = facebook,
V3 = intel,
V4 = samsung.
But that's rather barbaric.
We should do this whole processing in fewer steps:
Read a line of comma-separated strings from the input.
Transform it directly into a list of either atoms or strings.
DCGs seem like the correct solution. Maybe someone can add a two-liner; in the meantime, a non-DCG sketch follows.
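A minimal sketch using SWI-Prolog's built-ins split_string/4 and atom_string/2 (the predicate name line_to_list is my own):

% Turn one line (a code list from read_line_to_codes/2) into a list of atoms.
line_to_list(Line, Atoms) :-
    atom_codes(LineAtom, Line),
    split_string(LineAtom, ",", "", Fields),   % ["james","facebook",...]
    maplist(atom_string, Atoms, Fields).       % [james,facebook,...]

In stream_representations/2 this replaces the term_to_atom/2 call, and the clause becomes Lines = [Atoms | FurtherLines] (note: no extra brackets around Atoms), which yields the desired list of lists.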
I have a CSV file (4.7 million characters) that I am struggling to import into a spreadsheet.
It seems the line delimiter is just a space, and yet there are also spaces after every comma.
What can I do to correctly organize this data in a spreadsheet?
I have tried using Google sheets import and Microsoft Excel import.
Example of current CSV
73, 5/11/2018,Vet Check,Result:Pregnant Multiple, , 73, 5/19/2018,Move To String/Pen,Move To:16, , 73, 5/22/2018,Mastitis,Treat. Name:Spectramast, Treat. Type:Intramammary, Comments:4 Times, Move To:1673, 5/25/2018,Move To String/Pen,Move To:10, , 73, 5/28/2018,Move To String/Pen,Move To:11, , 73, 7/20/2018,Vet Check,Result:OK - Confirmed PG, ,
Where the line breaks should be:
73, 5/11/2018,Vet Check,Result:Pregnant Multiple, ,
73, 5/19/2018,Move To String/Pen,Move To:16, ,
73, 5/22/2018,Mastitis,Treat. Name:Spectramast, Treat. Type:Intramammary, Comments:4 Times, Move To:16
73, 5/25/2018,Move To String/Pen,Move To:10, ,
73, 5/28/2018,Move To String/Pen,Move To:11, ,
73, 7/20/2018,Vet Check,Result:OK - Confirmed PG, ,
It seems that you could apply this kind of regex: https://regex101.com/r/HU13Um/2
The pattern matches the start of each record (a two-digit ID, a comma, optional spaces, then the digits and slash that begin a date) and inserts a newline in front of it, while tail -n +2 drops the resulting empty first line. So using sed and tail, if you run
<input sed -r 's/([0-9]{2}, *[0-9]+\/)/\n\1/g' | tail -n +2 >output
you will have
73, 5/11/2018,Vet Check,Result:Pregnant Multiple, ,
73, 5/19/2018,Move To String/Pen,Move To:16, ,
73, 5/22/2018,Mastitis,Treat. Name:Spectramast, Treat. Type:Intramammary, Comments:4 Times, Move To:16
73, 5/25/2018,Move To String/Pen,Move To:10, ,
73, 5/28/2018,Move To String/Pen,Move To:11, ,
73, 7/20/2018,Vet Check,Result:OK - Confirmed PG, ,
The following query, executed against an old MySQL database, should reveal a single UTF-8 character: 山 ('yama', mountain).
select convert(sc_cardname using binary) as cn
from mtg.mtg_cdb_set_cards where setcardid = 214400
Instead it yields the following 15-byte array:
[195, 165, 194, 177, 194, 177, 195, 168, 226, 128, 158, 226, 128, 176, 32]
What are these values and how do I get from there to a character identity?
For reference, the expected binary array would be the following:
[229, 177, 177]
Update: the following code fixes the yama problem, but I don't know why:
var iconv = new Iconv('utf8', 'ISO-8859-1'); // convert from UTF-8 to Latin-1
shortBuffer = buffer.slice(0, -9);           // keep only the first six bytes
result = iconv.convert(shortBuffer).toString('utf8');
The answer was this: everything was actually encoded in latin1. The UTF-8 bytes of 山 (0xE5 0xB1 0xB1) had been reinterpreted as three Latin-1 characters and re-encoded as UTF-8, producing the doubled bytes above; the Iconv call simply reverses that second encoding. Changing the connection properties to latin1 solved the problem.
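For reference, a minimal sketch of that connection-level fix, assuming the mysqljs/mysql driver (other drivers spell the option differently; credentials are placeholders):

const mysql = require('mysql');

// Declare the connection character set as latin1 so the driver does not
// re-encode the stored UTF-8 bytes a second time on the way out.
const connection = mysql.createConnection({
  host: 'localhost',
  user: 'user',
  password: 'password',
  database: 'mtg',
  charset: 'LATIN1_SWEDISH_CI'
});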
I have an old VB6 program which queries an Access 2000 database. I have a fairly long query which looks something like this:
Select * from table where key in ( 0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 19, 20, 21, 24, 27, 29, 30, 35, 38, 39, 40, 42, 43, 44, 46, 47, 49, 50, 53, 56, 59, 60, 61, 63, 64, 65, 66, 67, 68, 72, 76, 80, 84, 86, 89, 90, 91, 93, 94, 98, 99, 10041, 10042, 10045, 10046, 10047, 10049, 10057, 10060, 10089, 32200, 32202, 32203, 32204, 32205, 32207, 32214, 32245, 32303, 32314, 32403, 32405, 32414, 32415, 32503, 32703, 32803, 32903, 33003, 33014, 33102, 33103, 33303, 33403, 33405, 33601, 33603, 33604, 33614, 33705, 33714, 33901, 33903, 33914, 34001, 34105, 34114, 34203, 34303, 34401, 34501, 34601, 34603, 34604, 34605, 34803, 41001, 41005, 41007, 41013, 42001, 42005, 42007, 42013, 43001, 43002, 44001, 44007, 46001, 46007, 99999, 9999999)
However, when I look at the RecordSource of the data object, it seems that the query is being truncated to this (which is obviously not syntactically valid and throws an error):
Select * from table where key in ( 0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 19, 20, 21, 24, 27, 29, 30, 35, 38, 39, 40, 42, 43, 44, 46, 47, 49, 50, 53, 56, 59, 60, 61, 63, 64, 65, 66, 67, 68, 72, 76, 80, 84, 86, 89, 90, 91, 93, 94, 98, 99, 100
My data source looks like this:
Begin VB.Data dtaList
Caption = "dtaList"
Connect = "Access 2000;"
DatabaseName = ""
DefaultCursorType= 0 'DefaultCursor
DefaultType = 2 'UseODBC
Exclusive = 0 'False
Height = 345
Left = 960
Options = 0
ReadOnly = 0 'False
RecordsetType = 1 'Dynaset
RecordSource = ""
Top = 4440
Visible = 0 'False
Width = 2295
End
I've tried running the full query in the access database itself which works fine.
Is this a limitation in the VB.Data object, or is there some other explanation? Is there any way I can get around this issue?
Unfortunately I am unable to upgrade to a newer version of Access.
The truncated version of the SQL statement you posted is 246 characters long, so it appears that something along the line is limiting the length of the SQL string to somewhere around 255 characters. As you have discovered by pasting the query into Access itself, the actual size limit of an Access query string is much larger (around 64,000 characters, I believe).
I remember running across a similar issue years ago but my problem was an INSERT statement that was writing some rather long strings to the database. The workaround in that case was to use a parameter query (which I realize, in hindsight, that I should have been using anyway). It greatly shortened the length of the SQL string because the parameters were passed separately. Unfortunately that workaround probably wouldn't help you because even if you dynamically created a parameterized version of the query it wouldn't be all that much shorter than the current SQL string.
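For illustration, a hypothetical DAO parameter query of the kind described (table, field, and variable names are placeholders):

Dim db As DAO.Database
Dim qd As DAO.QueryDef

Set db = DBEngine.OpenDatabase("C:\data\mydb.mdb")
Set qd = db.CreateQueryDef("", _
    "PARAMETERS pText TEXT(255); INSERT INTO [someTable] ([someField]) VALUES ([pText])")
qd.Parameters("pText").Value = someLongString
qd.Execute dbFailOnError
qd.Close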
Another workaround would be to write all of those numbers for the IN clause as rows in a temporary table named something like [inValues], and then use the query
SELECT [table].*
FROM [table]
INNER JOIN [inValues]
    ON [table].[key] = [inValues].[key]
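A sketch of how [inValues] might be created and filled (Access SQL; the names are placeholders, and each statement could be run from VB6 via the DAO Database.Execute method):

CREATE TABLE [inValues] ([key] LONG)

INSERT INTO [inValues] ([key]) VALUES (0)
INSERT INTO [inValues] ([key]) VALUES (1)

...and so on, one INSERT per value from the original IN (...) list.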