Rails i18n API Triple Dashes/Hyphens, Ellipses and Newlines - mysql

I am using the Rails i18n API with a database backend. I seed a MySQL database with:
Translation.find_or_create_by(locale: 'en', key:'key1', value: 'value1')
However, after seeding, the data is saved in the database as:
locale: en
key: key1
value: --- value1\n...\n
All columns are varchar(255) and 'utf8_unicode_ci'.
I could not find an explanation for this in the Rails i18n documentation.
Because of that problem, I cannot use the find_or_create_by() method. It cannot match on the serialized value column, so it adds duplicate entries.
Is there any solution for that?
Translation model and backend setup:
Translation = I18n::Backend::ActiveRecord::Translation

if Translation.table_exists?
  # Store translations in the database via the ActiveRecord backend.
  I18n.backend = I18n::Backend::ActiveRecord.new
  I18n::Backend::ActiveRecord.send(:include, I18n::Backend::Memoize)
  I18n::Backend::Simple.send(:include, I18n::Backend::Memoize)
  I18n::Backend::Simple.send(:include, I18n::Backend::Pluralization)
  # Chain: look up the YAML files first (Simple), then fall back to the database backend.
  I18n.backend = I18n::Backend::Chain.new(I18n::Backend::Simple.new, I18n.backend)
end

What you're seeing in your value column is the value serialized to YAML (that's done by I18n::Backend::ActiveRecord::Translation), which is required, among other things, for pluralization.
#find_or_create_by doesn't work nicely when the value stored in the database needs serialization.
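You can reproduce that stored representation in plain Ruby: the --- is the YAML document start marker and ... is the document end marker. A minimal sketch (whether the trailing ... appears depends on your Psych version):
require "yaml"

"value1".to_yaml                 # => "--- value1\n...\n"
YAML.load("--- value1\n...\n")   # => "value1"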
To do a simple seed, try:
Translation.create_with(value: 'value1').find_or_create_by(locale: 'en', key: 'key1')
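create_with only applies its attributes when a new record is actually created, so find_or_create_by matches on locale and key alone and re-running the seed stays idempotent. A sketch of a fuller seed loop (the keys and values here are placeholders):
{ 'key1' => 'value1', 'key2' => 'value2' }.each do |key, value|
  Translation.create_with(value: value).find_or_create_by(locale: 'en', key: key)
end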

How can I query for multiple values after a wildcard?

I have a JSON object like so:
{
  _id: "12345",
  identifier: [
    {
      value: "1",
      system: "system1",
      text: "text!"
    },
    {
      value: "2",
      system: "system1"
    }
  ]
}
How can I use the XDevAPI SearchConditionStr to look for the specific combination of value + system in the identifier array? Something like this, but this doesn't seem to work...
collection.find("'${identifier.value}' IN identifier[*].value && '${identifier.system} IN identifier[*].system")
By using the IN operator, what happens underneath the covers is basically a call to JSON_CONTAINS().
So, if you call:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system1')
.execute()
What gets executed, in the end, is (simplified):
JSON_CONTAINS('["1", "2"]', '"2"') AND JSON_CONTAINS('["system1", "system1"]', '"system1"')
In this case, both those conditions are true, and the document will be returned.
The atomic unit is the document (not a slice of that document). So, in your case, regardless of the values of value and/or system, you are still looking for the same document (the one whose _id is '12345'). With such a statement, the document is returned if all the search values are part of it, and it is not returned if any one of them is not.
For instance, the following would not yield any results:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system2')
.execute()
EDIT: Potential workaround
I don't think the CRUD API will allow you to perform this kind of "cherry-picking", but you can always use SQL. In that case, one strategy that comes to mind is to use JSON_SEARCH() to retrieve the array of paths (i.e. the array indexes) matching each value in the scope of identifier[*].value and identifier[*].system, and then use JSON_OVERLAPS() to check that at least one index appears in both sets.
session.sql(`select * from collection WHERE json_overlaps(json_search(json_extract(doc, '$.identifier[*].value'), 'all', ?), json_search(json_extract(doc, '$.identifier[*].system'), 'all', ?))`)
.bind('2', 'system1')
.execute()
In this case, the result set will only include documents where the identifier array contains at least one JSON object element whose value is equal to '2' and whose system is equal to 'system1'. The filter is effectively applied to individual array items, not in aggregate as with a basic IN operation.
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js

Ruby Faker Library with JSONB in PostgreSQL db

My question is whether PostgreSQL actually stores JSON data in a jsonb column with escaped quotation marks like this?
The content in the column is stored as:
"{\"Verdie\":\"Barbecue Ribs\",\"Maurice\":\"Pappardelle alla Bolognese\",\"Vincent\":\"Tiramisù\"}"
I can't work out whether this is a feature of PostgreSQL or an artifact of how I'm seeding my Rails database with Faker data:
# seeds.rb
require "faker"

10.times do
  con = Connector.create(
    user_id: 1,
    name: Faker::Company.name,
    description: Faker::Company.buzzword
  )
  rand(6).times do
    con.connectors_data.create(
      version: Faker::Number.number(digits: 5),
      metadata: Faker::Json.shallow_json(width: 3, options: { key: "Name.first_name", value: "Food.dish" }),
      comment: Faker::Lorem.sentence
    )
  end
end
It's due to what you're using to seed the data.
This is a classic "double encoding" issue.
When dealing with JSON columns, you need to remember that the database adapter (the pg gem) will automatically serialize Ruby hashes, arrays, numbers, strings and booleans into JSON. If you feed the adapter something you have already converted into JSON, it will store it as a string, hence the escaped quotes. "JSON strings" in Ruby are not a distinct type, and the adapter has no way to know that you, for example, intended to store the JSON object {"foo": "bar"} and not the string "{\"foo\": \"bar\"}".
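The effect is easy to reproduce in plain Ruby (a minimal sketch, independent of Rails and the pg adapter):
require "json"

hash = { "Verdie" => "Barbecue Ribs" }
hash.to_json          # => '{"Verdie":"Barbecue Ribs"}'        (encoded once: a JSON object)
hash.to_json.to_json  # => '"{\"Verdie\":\"Barbecue Ribs\"}"'  (encoded twice: a JSON string)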
This is also what commonly happens when serialize or store are used on JSON columns out of ignorance.
The result is that you get garbage data that can't be queried without unwrapping the string on every row, which is extremely inefficient, or you have to fix the entire table by unwrapping the string value and re-parsing it as jsonb:
UPDATE table_name SET column_name = (column_name #>> '{}')::jsonb;
Which can also be very costly.
While you could do:
rand(6).times do
  con.connectors_data.create(
    version: Faker::Number.number(digits: 5),
    metadata: JSON.parse(Faker::Json.shallow_json(width: 3, options: { key: "Name.first_name", value: "Food.dish" })),
    comment: Faker::Lorem.sentence
  )
end
it's very smelly: you serialize to JSON only to immediately parse it back, and the underlying method that Faker::Json uses to generate the hash is not public, so you might want to look around for a better alternative.
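One such alternative is to skip Faker::Json entirely and build the hash yourself, so the adapter performs the one and only encoding. A sketch, assuming the faker gem's Faker::Name and Faker::Food generators (any generators would do):
rand(6).times do
  con.connectors_data.create(
    version: Faker::Number.number(digits: 5),
    # A plain Ruby hash; the pg adapter serializes it to JSON exactly once.
    metadata: Array.new(3) { [Faker::Name.first_name, Faker::Food.dish] }.to_h,
    comment: Faker::Lorem.sentence
  )
end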

Deedle - how to use 'ParseExact' within the Frame.ReadCsv schema

I have a CSV file of data in the form
21.06.2016 23:00:00.349, 153.461, 153.427
21.06.2016 23:00:00.400, 153.460, 153.423
etc
The initial step of creating a frame involves the optional inclusion of a 'schema' to specify or rename column headers and specify types:
let df = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/data/GBPJPY.csv", hasHeaders=true, inferTypes=false, schema="TS (DateTimeOffset), Bid (float(3)), Ask (float(3))")
I would like the first column of string values to be ParseExact'ed to DateTimeOffset with the format
"dd.MM.yyyy HH:mm:ss.fff"
(I'm assuming the use of System.Globalization.CultureInfo.InvariantCulture).
How do I express the schema such that it will parse the datetime string in that first Frame.ReadCsv("file.csv", schema = ........ )? Or is this not possible to accomplish within the schema statement?

Rails query objects by key value of hash saved to column?

I have 2 objects, Visitors and Events. Visitors have multiple Events. An event stores parameters like this...
#<Event id: 5466, event_type: "Visit", visitor_token: "c26a6098-64bb-4652-9aa0-e41c214f42cb", contact_id: 657, data: {"url"=>"http://widget.powerpress.co/", "title"=>"Home (light) | Widget"}, created_at: "2015-12-17 14:51:53", updated_at: "2015-12-17 14:51:53", website_id: 2>
As you can see, there is a serialized text column called data that stores a hash with more data.
I need to find out if a visitor has visited a certain page, which would be very simple if the url parameter were its own column, or if the hash were an hstore column; however, it wasn't originally set up that way and it's part of the saved hash.
Here are my attempted Rails queries...
visitor.events.where("data -> url = :value", value: 'http://widget.powerpress.co/')
visitor.events.where("data like ?", "{'url' => 'http://widget.powerpress.co/'}")
visitor.events.where("data -> :key LIKE :value", :key => 'url', :value => "%http://widget.powerpress.co/%")
How does one properly query postgres to find objects that have a hash that contains a key with a specific value?
I suspect you're not looking for the right string. It should be "url"=>"http://widget.powerpress.co/", so:
visitor.events.where("data like ?", '%"url"=>"http://widget.powerpress.co/"%')
Check the exact stored value directly in the DB.
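For example, to look at the raw stored text without any deserialization Rails performs on read (a sketch; the events table name and the row id are assumptions taken from the question):
# Fetch the raw serialized column straight from the database.
Event.connection.select_value("SELECT data FROM events WHERE id = 5466")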
If you are storing the hash in a text column, try the following (note that this loads and filters every row in Ruby, and eval is unsafe on untrusted data):
visitor.events.select { |ve| eval(ve.data)["url"] == "http://widget.powerpress.co/" }
Hope it helps!
It worked for me.
visitor.events.select { |n| n.data && n.data['url'] == "http://widget.powerpress.co/"}

Parse complex JSON string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use a 'Loader'-based approach since I use HCatLoader to load data from Hive.)
But the problem is the structure of my data: each value of the map contains the value part of a complex JSON document. For example:
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING but MAP<STRING, STRING>!):
CREATE TABLE temp_table
(
  user_id BIGINT COMMENT 'user ID.',
  complex_data MAP<STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY (created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains the following (the values I want to get are marked with two *s, so basically #'d'#'f' from each element of PARSED_STRING(complex_data#'c')):
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD 'temp_table' USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data per each row
-- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSES AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK
I guessed that "attempt1" failed because the value doesn't contain a full JSON document. However, when I CONCAT as in "attempt2", I generate an additional \ mark (so each line starts with {\"key\":). I'm not sure whether these additional marks break the parsing or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a Jython UDF.
I know that I could solve it using Java or other languages.
But I think Jython with jyson is the simplest answer to this issue.