MySQL Entity Framework: Can't create index

I'm using MySql.Data.Entity version 6.10.8 with Entity Framework.
I'm using "code first" to let the MySQL provider create the database structure.
Whenever a migration contains the creation of an index, it fails when running the update-database command. The error message and stack trace are as follows:
System.FormatException: Input string was not in a correct format.
   at System.Number.ParseDouble(String value, NumberStyles options, NumberFormatInfo numfmt)
   at System.Convert.ToDouble(String value)
   at MySql.Data.Entity.MySqlMigrationSqlGenerator.Generate(CreateIndexOperation op)
   at MySql.Data.Entity.MySqlMigrationSqlGenerator.<.ctor>b__22_4(MigrationOperation op)
   at MySql.Data.Entity.MySqlMigrationSqlGenerator.Generate(IEnumerable`1 migrationOperations, String providerManifestToken)
   at System.Data.Entity.Migrations.DbMigrator.GenerateStatements(IList`1 operations, String migrationId)
   at System.Data.Entity.Migrations.Infrastructure.MigratorBase.GenerateStatements(IList`1 operations, String migrationId)
   at System.Data.Entity.Migrations.DbMigrator.ExecuteOperations(String migrationId, VersionedModel targetModel, IEnumerable`1 operations, IEnumerable`1 systemOperations, Boolean downgrading, Boolean auto)
   at System.Data.Entity.Migrations.DbMigrator.ApplyMigration(DbMigration migration, DbMigration lastMigration)
   at System.Data.Entity.Migrations.Infrastructure.MigratorLoggingDecorator.ApplyMigration(DbMigration migration, DbMigration lastMigration)
   at System.Data.Entity.Migrations.DbMigrator.Upgrade(IEnumerable`1 pendingMigrations, String targetMigrationId, String lastMigrationId)
   at System.Data.Entity.Migrations.Infrastructure.MigratorLoggingDecorator.Upgrade(IEnumerable`1 pendingMigrations, String targetMigrationId, String lastMigrationId)
   at System.Data.Entity.Migrations.DbMigrator.UpdateInternal(String targetMigration)
   at System.Data.Entity.Migrations.DbMigrator.<>c__DisplayClasse.b__d()
   at System.Data.Entity.Migrations.DbMigrator.EnsureDatabaseExists(Action mustSucceedToKeepDatabase)
   at System.Data.Entity.Migrations.Infrastructure.MigratorBase.EnsureDatabaseExists(Action mustSucceedToKeepDatabase)
   at System.Data.Entity.Migrations.DbMigrator.Update(String targetMigration)
   at System.Data.Entity.Migrations.Infrastructure.MigratorBase.Update(String targetMigration)
   at System.Data.Entity.Migrations.Design.ToolingFacade.UpdateRunner.RunCore()
   at System.Data.Entity.Migrations.Design.ToolingFacade.BaseRunner.Run()
Input string was not in a correct format.
How to repeat:
Use Swedish Windows (or any other language that does not use "." as the decimal separator).
Create a migration file with an index, such as:
CreateTable(
    "dbo.AspNetRoles",
    c => new
        {
            Id = c.String(nullable: false, maxLength: 128, storeType: "nvarchar"),
            Name = c.String(nullable: false, maxLength: 256, storeType: "nvarchar"),
        })
    .PrimaryKey(t => t.Id)
    .Index(t => t.Name, unique: true, name: "RoleNameIndex"); // This line causes the exception
Run update-database.
(A bug report has been sent to MySql: https://bugs.mysql.com/bug.php?id=92561)

This error is caused by MySqlMigrationSqlGenerator.Generate(CreateIndexOperation op), which checks the version of the database by converting a string to a double. However, it does so without specifying an IFormatProvider, so the current culture is used. Since Swedish uses "," as the decimal separator while the version number is separated by ".", the conversion fails.
This can be avoided by overriding MySqlMigrationSqlGenerator.Generate(CreateIndexOperation op). Use the code from this answer: https://stackoverflow.com/a/51756143/1037864 (which is an answer to a different question).


How to incorporate projected columns in scanner into new dataset partitioning

Let's say I load a dataset:
myds = ds.dataset('mypath', format='parquet', partitioning='hive')
myds.schema
# On/Off_Peak: string
# area: string
# price: decimal128(8, 4)
# date: date32[day]
# hourbegin: int32
# hourend: int32
# inflation: string rename to Inflation
# Price_Type: string
# Reference_Year: int32
# Case: string
# region: string rename to Region
My end goal is to resave the dataset with the following projection:
projection = {
    'Region': ds.field('region'),
    'Date': ds.field('date'),
    'isPeak': pc.equal(ds.field('On/Off_Peak'), ds.scalar('On')),
    'Hourbegin': ds.field('hourbegin'),
    'Hourend': ds.field('hourend'),
    'Inflation': ds.field('inflation'),
    'Price_Type': ds.field('Price_Type'),
    'Area': ds.field('area'),
    'Price': ds.field('price'),
    'Reference_Year': ds.field('Reference_Year'),
    'Case': ds.field('Case'),
}
I make a scanner:
scanner = myds.scanner(columns=projection)
Now I try to save my new dataset with:
ds.write_dataset(scanner, 'newpath',
                 partitioning=['Reference_Year', 'Case', 'Region'],
                 partitioning_flavor='hive',
                 format='parquet')
but I get
KeyError: 'Column Region does not exist in schema'
I can work around this by changing my partitioning to ['Reference_Year', 'Case', 'region'] to match the non-projected columns (and then later renaming all those directories), but is there a way to do it directly?
Suppose my partitioning needed compute for more than just a column rename. Would I have to save a non-partitioned dataset in one step to get the new column and then do another save operation to create the partitioned dataset?
EDIT: this bug has been fixed in pyarrow 10.0.0
It looks like a bug to me. It's as if write_dataset is looking at the dataset_schema rather than the projected_schema
I think you can get around it by calling to_reader on the scanner.
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.Table.from_arrays(
    [
        pa.array(['a', 'b', 'c'], pa.string()),
        pa.array(['a', 'b', 'c'], pa.string()),
    ],
    names=['region', 'Other'],
)
table_dataset = ds.dataset(table)
columns = {
    'Region': ds.field('region'),
    'Other': ds.field('Other'),
}
scanner = table_dataset.scanner(columns=columns)
ds.write_dataset(
    scanner.to_reader(),  # the reader carries the projected schema
    'newpath',
    partitioning=['Region'], partitioning_flavor='hive',
    format='parquet')
I've reported the issue here
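For reference, in pyarrow 10.0.0 and later (where, per the edit above, the bug is fixed) the question's original call should work without the to_reader workaround. A minimal sketch reusing the question's myds, projection, and newpath names:

# pyarrow >= 10.0.0: partition columns are resolved against the
# scanner's projected schema, so the renamed 'Region' column is found
scanner = myds.scanner(columns=projection)
ds.write_dataset(scanner, 'newpath',
                 partitioning=['Reference_Year', 'Case', 'Region'],
                 partitioning_flavor='hive',
                 format='parquet')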

Useful way to convert string to dictionary using python

I have the below string as input:
'name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'
I wrote a function which converts it to a dictionary using Python:
def str_2_json(string):
    str_arr = string.split(',')
    # str_arr[0] = 'name SP2'
    # str_arr[1] = ' status Online'
    json_data = {}
    for i in str_arr:
        # collapse whitespace
        stripped_str = " ".join(i.split())  # i.strip()
        subarray = stripped_str.split(' ')
        # subarray[0] = 'name'
        # subarray[1] = 'SP2'
        key = subarray[0]    # key: 'name'
        value = subarray[1]  # value: 'SP2'
        json_data[key] = value
        # json_data['name'] = 'SP2'
        # json_data['status'] = 'Online'
    return json_data
The caller turns the returned dictionary into JSON (it uses jsonify).
Is there a simpler/more elegant way to do this?
You can do this with a regex:
import re

def parseString(s):
    # each match captures a key (a non-whitespace run) and a value (everything up to the next comma)
    return dict(re.findall(r'(?:(\S+) ([^,]+)(?:, )?)', s))

sample = "name SP1, status Offline, size 4764771 MB, free 2406182 MB, path /dev/sdb, log 230 MB, port 5660, guid a48134c00cda2c37005b30b0e40e3ed6, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sdb /dev/sdc /dev/sdd, dare 0"
parseString(sample)
Output:
{'name': 'SP1',
'status': 'Offline',
'size': '4764771 MB',
'free': '2406182 MB',
'path': '/dev/sdb',
'log': '230 MB',
'port': '5660',
'guid': 'a48134c00cda2c37005b30b0e40e3ed6',
'clusterUuid': '-8650609094877646407--116798096584060989',
'disks': '/dev/sdb /dev/sdc /dev/sdd',
'dare': '0'}
Your approach is good, except for a couple of odd things:
You aren't creating JSON anything, so to avoid confusion I suggest you don't name your returned dictionary json_data or your function str_2_json. JSON (JavaScript Object Notation) is just that: a standard for denoting an object as text. The objects themselves have nothing to do with JSON.
You can use i.strip() instead of joining the split string (not sure why you did it this way, since you commented out i.strip()).
Some of your values contain multiple spaces (e.g. "size 4764771 MB" or "disks /dev/sde /dev/sdf /dev/sdg"). With your code, you lose everything after the second space in such strings. To avoid this, use stripped_str.split(' ', 1), which limits how many times the string is split.
Other than that, you could create a dictionary in one line using the dict() constructor and a generator expression:
def str_2_dict(string):
    data = dict(item.strip().split(' ', 1) for item in string.split(','))
    return data
print(str_2_dict('name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'))
Outputs:
{
'name': 'SP2',
'status': 'Online',
'size': '4764771 MB',
'free': '2576353 MB',
'path': '/dev/sde',
'log': '210 MB',
'port': '5660',
'guid': '7478a0141b7b9b0d005b30b0e60f3c4d',
'clusterUuid': '-8650609094877646407--116798096584060989',
'disks': '/dev/sde /dev/sdf /dev/sdg',
'dare': '0'
}
This is probably the same (practically, in terms of efficiency / time) as writing out the full loop:
def str_2_dict(string):
    data = dict()
    for item in string.split(','):
        key, value = item.strip().split(' ', 1)
        data[key] = value
    return data
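One caveat that applies to both versions: if an entry contains no space at all, the key, value unpacking raises ValueError. A defensive sketch, assuming such malformed entries should simply be skipped (str_2_dict_safe is a hypothetical name):

def str_2_dict_safe(string):
    data = {}
    for item in string.split(','):
        parts = item.strip().split(' ', 1)
        if len(parts) != 2:
            continue  # no value present; skip the malformed entry
        key, value = parts
        data[key] = value
    return data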
Assuming these fields cannot contain internal commas, you can use re.split to both split and remove surrounding whitespace. It looks like you have different types of fields that should be handled differently. I've added a guess at a schema handler based on field names that can serve as a template for converting the various fields as needed.
And as noted elsewhere, there is no JSON here, so don't use that name.
import re

test = 'name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0'

def decode_data(string):
    str_arr = re.split(r"\s*,\s*", string)
    data = {}
    for entry in str_arr:
        values = re.split(r"\s+", entry)
        key = values.pop(0)
        # schema processing
        if key in ("disks",):  # multivalue keys (note the tuple; a bare ("disks") would be a substring test)
            data[key] = values
        elif key in ("size", "free"):  # convert to int bytes using the 2nd value
            multiplier = {"MB": 10**6, "MiB": 2**20}  # todo: expand as needed
            data[key] = int(values[0]) * multiplier[values[1]]
        else:
            data[key] = " ".join(values)
    return data

decoded = decode_data(test)
for kv in sorted(decoded.items()):
    print(kv)
import json
json_data = json.loads(string)
(Note: json.loads only works on valid JSON text; the input in this question is not JSON, so this raises json.JSONDecodeError.)
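If JSON text is actually the end goal, the parsed dictionary can be serialized afterwards with the standard json module. A minimal sketch reusing the str_2_dict function defined above:

import json

result = str_2_dict('name SP2, status Online, size 4764771 MB, free 2576353 MB, path /dev/sde, log 210 MB, port 5660, guid 7478a0141b7b9b0d005b30b0e60f3c4d, clusterUuid -8650609094877646407--116798096584060989, disks /dev/sde /dev/sdf /dev/sdg, dare 0')
print(json.dumps(result, indent=2))  # dict -> JSON string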

JSON data deserialize using Regular Expressions

I'm facing an issue while fetching keys and values from the data using regular expressions when the JSON contains \ and ".
{
    "KeyOne": "Value One",
    "KeyTwo": "Value \\ two",
    "KeyThree": "Value \" Three",
    "KeyFour": "ValueFour\\"
}
This is sample data; from it I want to read the keys and values. How can I achieve this with regular expressions?
Note: I'm deserializing this JSON data in the server side(SAP ABAP).
On releases earlier than 7.2 (from memory) you can use the class /ui2/cl_json.
On 7.3 or later, use the kernel sXML reader/writer, which supports JSON. It is orders of magnitude faster than /ui2/cl_json.
You can use the identity-transformation approach where the source structure is known and you can create that structure in ABAP, or it already has an ABAP equivalent defined. Otherwise just traverse the JSON document.
The example string was easily parsed.
EDIT: added sample code.
REPORT zjsondemo.

CLASS lcl DEFINITION CREATE PUBLIC.
  PUBLIC SECTION.
    METHODS json_stru_known.
    METHODS json_stru_traverse.
ENDCLASS.

CLASS lcl IMPLEMENTATION.
  METHOD json_stru_known.
    DATA l_src_json TYPE string.
    DATA l_mara TYPE mara.

    WRITE: / 'DEMO 1 Known structure Identity transformation'.
    l_src_json = `{"MARA":{"MATNR":"012345678", "MATKL": "DUMMY" }}`.
    WRITE: / 'Convert to MARA -> ', l_src_json.
    CALL TRANSFORMATION id SOURCE XML l_src_json
                           RESULT mara = l_mara.
    WRITE: / 'MARA - MATNR', l_mara-matnr,
           / '       MATKL', l_mara-matkl.

    TYPES:
      BEGIN OF lty_foo_bar,
        KeyOne   TYPE string,
        KeyTwo   TYPE string,
        KeyThree TYPE string,
        KeyFour  TYPE string,
      END OF lty_foo_bar.

    DATA:
      lv_json_string TYPE string,
      ls_data        TYPE lty_foo_bar.

    " In this example we use upper-case attribute names because we
    " map to an SAP target structure, which has upper-case names.
    " If you need lower-case attribute names you cannot map straight
    " to an SAP type; use the traverse technique instead. See example 2.
    lv_json_string = |\{| &&
      |"KEYONE":"Value One",| &&
      |"KEYTWO": "Value \\\\ two", | &&
      |"KEYTHREE": "Value \\" Three", | &&
      |"KEYFOUR": "ValueFour\\\\" | &&
      |\}|.
    lv_json_string = `{"JUNK":` && lv_json_string && `}`.
    CALL TRANSFORMATION id SOURCE XML lv_json_string
                           RESULT junk = ls_data.
    WRITE: / ls_data-keyone, ls_data-keytwo, ls_data-keythree, ls_data-keyfour.
  ENDMETHOD.

  METHOD json_stru_traverse.
    DATA l_src_json TYPE string.
    DATA lo_node TYPE REF TO if_sxml_node.
    DATA: lif_element       TYPE REF TO if_sxml_open_element,
          lif_element_close TYPE REF TO if_sxml_close_element,
          lif_value_node    TYPE REF TO if_sxml_value,
          l_val             TYPE string,
          l_attr            TYPE if_sxml_attribute=>attributes,
          l_att_val         TYPE string.
    FIELD-SYMBOLS: <attr> LIKE LINE OF l_attr.

    WRITE: / 'DEMO 2 Traverse any JSON document'.
    l_src_json = `{"MATNR":"012345678", "MATKL": "DUMMY", "SOMENODE": "With this value" }`.
    WRITE: / 'Parse as JSON with 3 nodes -> ', l_src_json.
    DATA(reader) = cl_sxml_string_reader=>create( cl_abap_codepage=>convert_to( l_src_json ) ).

    lo_node = reader->read_next_node( ). " {
    IF lo_node IS INITIAL.
      EXIT.
    ENDIF.

    DO 3 TIMES.
      lif_element ?= reader->read_next_node( ).
      l_attr = lif_element->get_attributes( ).
      LOOP AT l_attr ASSIGNING <attr>.
        l_att_val = <attr>->get_value( ).
        WRITE: / 'Attribute:', l_att_val.
      ENDLOOP.
      lif_value_node ?= reader->read_next_node( ).
      l_val = lif_value_node->get_value( ).
      WRITE: '=>', l_val.
      lif_element_close ?= reader->read_next_node( ).
    ENDDO.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  DATA lo_lcl TYPE REF TO lcl.
  CREATE OBJECT lo_lcl.
  lo_lcl->json_stru_known( ).
  lo_lcl->json_stru_traverse( ).
The SAP system is supplied with many example programs. Search for demo*json.
See also the SAP documentation on JSON parsing.
As @mrzasa and @joanis said in their comments: do not use regular expressions to parse JSON!
For small objects or when performance is not a concern, you can use /ui2/cl_json:
TYPES:
  BEGIN OF lty_foo_bar,
    KeyOne   TYPE string,
    KeyTwo   TYPE string,
    KeyThree TYPE string,
    KeyFour  TYPE string,
  END OF lty_foo_bar.

DATA:
  lv_json_string TYPE string,
  ls_data        TYPE lty_foo_bar.

lv_json_string = |\{| &&
  |"KeyOne":"Value One",| &&
  |"KeyTwo": "Value \\\\ two", | &&
  |"KeyThree": "Value \\" Three", | &&
  |"KeyFour": "ValueFour\\\\" | &&
  |\}|.

/ui2/cl_json=>deserialize(
  EXPORTING
    json = lv_json_string
  CHANGING
    data = ls_data ).
ls_data-KeyOne contains 'Value One' and so on.
For larger objects and/or better performance, check the sXML approach from @phil soady's answer above. The correct handling of upper- and lower-case letters still causes headaches in ABAP anyway.

Why do I always get a "trailing characters" error when trying to parse data with serde_json?

I have a server that receives requests in JSON format. When trying to parse the data I always get a "trailing characters" error. This happens only when getting the JSON from Postman:
let type_of_request = parsed_request[1];
let content_of_msg: Vec<&str> = msg_from_client.split("\r\n\r\n").collect();
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
let res: serde_json::Value = serde_json::from_str(content_of_msg[1]).unwrap();
println!("The username is: {}", res["username"]);
When getting the data from Postman, this happens:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 60)', src\libcore\result.rs:997:5
but when the string is defined inside Rust:
let j = "{\"username\":\"user\",\"password\":\"password\",\"email\":\"dwadwad\"}";
let res: serde_json::Value = serde_json::from_str(j).unwrap();
println!("The username is: {}", res["username"]);
it works like a charm:
The username is: "user"
EDIT: Apparently, as I read the message into a buffer and turned it into a string, it kept all the NUL characters the buffer had, which are of course the trailing characters.
Looking at the serde json code, one finds the following comment above the relevant ErrorCode enum element:
/// JSON has non-whitespace trailing characters after the value.
TrailingCharacters,
So as the error code implies, you've got some trailing character which is not whitespace. In your snippet, you say:
println!("{}", content_of_msg[1]);
// Will print "{"username":"user","password":"password","email":"dwadwad"}"
If you literally copy and pasted the printed output here, I'd note that I wouldn't expect the output to be wrapped in the leading and trailing quotation marks. Did you include these yourself or were they part of what was printed? If they were printed, I suspect that's the source of your problem.
Edit:
In fact, I can nearly recreate this using a raw string with leading/trailing quotation marks in Rust:
extern crate serde_json;

#[cfg(test)]
mod tests {
    #[test]
    fn test_serde() {
        let s =
            r#""{"username":"user","password":"password","email":"dwadwad"}""#;
        println!("{}", s);
        let _res: serde_json::Value = serde_json::from_str(s).unwrap();
    }
}
Running it via cargo test yields:
test tests::test_serde ... FAILED
failures:
---- tests::test_serde stdout ----
"{"username":"user","password":"password","email":"dwadwad"}"
thread 'tests::test_serde' panicked at 'called `Result::unwrap()` on an `Err` value: Error("trailing characters", line: 1, column: 4)', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_serde
Note that my printed output also includes leading/trailing quotation marks and I also get a TrailingCharacter error, albeit at a different column.
Edit 2:
Based on your comment that you've added the wrapping quotations yourself, you've got a known good string (the one you've defined in Rust), and one which you believe should match it but doesn't (the one from Postman).
This is a data problem and so we should examine the data. You can adapt the below code to check the good string against the other:
#[test]
fn test_str_comp() {
    // known good string we'll compare against
    let good =
        r#"{"username":"user","password":"password","email":"dwadwad"}"#;
    // lengthened string, additional characters;
    // also 'n' and 'a' in "username" are transposed
    let bad =
        r#"{"useranme":"user","password":"password","email":"dwadwad"}abc"#;
    let good_size = good.chars().count();
    let bad_size = bad.chars().count();

    for (idx, (c1, c2)) in (0..)
        .zip(good.chars().zip(bad.chars()))
        .filter(|(_, (c1, c2))| c1 != c2)
    {
        println!(
            "Strings differ at index {}: (good: `{}`, bad: `{}`)",
            idx, c1, c2
        );
    }

    if good_size < bad_size {
        let trailing = bad.chars().skip(good_size);
        println!(
            "bad string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    } else if good_size > bad_size {
        let trailing = good.chars().skip(bad_size);
        println!(
            "good string contains extra characters: `{}`",
            trailing.collect::<String>()
        );
    }
    assert!(false);
}
For my example, this yields the failure:
test tests::test_str_comp ... FAILED
failures:
---- tests::test_str_comp stdout ----
Strings differ at index 6: (good: `n`, bad: `a`)
Strings differ at index 7: (good: `a`, bad: `n`)
bad string contains extra characters: `abc`
thread 'tests::test_str_comp' panicked at 'assertion failed: false', src/lib.rs:52:9
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
failures:
tests::test_str_comp

Converting a MongoDB Query String with Datetime field into Dict in Python

I have a string as follows:
s = "query : {'$and': [{'$or': [{'Component': 'pfr'}, {'Component': 'ng-pfr'}, {'Component': 'common-flow-table'}, {'Component': 'media-mon'}]}, {'Submitted-on': {'$gte': datetime.datetime(2016, 2, 21, 0, 0)}}, {'Submitted-on': {'$lte': datetime.datetime(2016, 2, 28, 0, 0)}}]}"
which is a MongoDB query stored in a string. How do I convert it into a dict or JSON format in Python?
Your format is not standard, so you need a hack to parse it.
import json

s = " query : {'names' :['abc','xyz'],'location':'India'}"
key, value = s.strip().split(':', 1)
r = value.replace("'", '"')  # JSON requires double quotes
data = {
    key: json.loads(r)
}
From your comment: the datetime gives problems. Then I present to you the hack of hacks: the eval function.
import datetime
import json

s = " query : {'names' :['abc','xyz'],'location':'India'}"
key, value = s.strip().split(':', 1)
# we can leave out the replacing; single quotes are fine for eval
data = {
    key: eval(value)
}
NB: eval, especially on unsanitized input, is very unsafe.
NB: these hacks will break on some inputs; the first one, for example, breaks when a value or key contains a quote character.
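A somewhat safer variant of the eval hack is to strip the builtins and expose only the datetime module that the query text actually needs; this narrows, but does not eliminate, eval's attack surface. A sketch, with the input shortened from the question's string:

import datetime

s = ("query : {'Submitted-on': "
     "{'$gte': datetime.datetime(2016, 2, 21, 0, 0)}}")  # shortened from the question

key, value = s.strip().split(':', 1)
# empty __builtins__ plus an explicit whitelist of allowed names
data = {key.strip(): eval(value, {"__builtins__": {}}, {"datetime": datetime})}
print(data)
# {'query': {'Submitted-on': {'$gte': datetime.datetime(2016, 2, 21, 0, 0)}}}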