Ruby CSV: how to write nil fields as \N - mysql

I want to write a CSV with Ruby's default CSV library in order to use MySQL's fast import, LOAD DATA INFILE.
Currently, when I pass nil for a field, it is written as ...;;...; instead, I want ...;\N;... (a capital N for NULL, not to be confused with the \n newline).
CSV.open(product_updates_file_name, "wb", {col_sep: ";", headers: false, force_quotes: false}) do |product_csv|
  product_csv << ["foo", nil, "bar"]
end
The empty field currently leads to the decimal column being loaded as 0.00 in the database instead of NULL.
I know I could set it to NULL afterwards, but there are millions of rows and several columns affected, so I strongly prefer to write the CSV the way MySQL expects it:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Try overriding nil's to_s method like so:
class << nil
  def to_s
    "my nil placeholder text"
  end
end
All code using to_s will use this implementation for the value nil.
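For instance, a minimal sketch using \N as the placeholder (one caveat, as an assumption to verify: this only helps if your CSV version actually calls to_s on nil fields rather than special-casing them as empty, so test it against your Ruby first):
class << nil
  def to_s
    '\N'  # what MySQL's LOAD DATA INFILE reads as NULL
  end
end

nil.to_s      #=> "\\N" (a two-character string: backslash and N)
"x: #{nil}"   #=> "x: \\N" (string interpolation also goes through to_s)
Bear in mind that this changes nil.to_s process-wide.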

You could modify the CSV methods:
require 'csv'

class Array
  alias :old_to_csv :to_csv
  # Extend to_csv for usage like ["foo", nil, "bar"].to_csv(col_sep: ";")
  def to_csv(options = {})
    self.map { |s| s.nil? ? '\N' : s }.old_to_csv(options)
  end
end
class CSV
  alias :old_push :<<
  def <<(data)
    case data
    when Array
      old_push(data.map { |s| s.nil? ? '\N' : s })
    else
      old_push(data)
    end
  end
end
# Test code:
puts ["foo", nil, "bar"].to_csv(col_sep: ";") #=> foo;\N;bar
CSV.open('test.csv', "wb",
  {col_sep: ";", headers: false, force_quotes: false }
) do |product_csv|
  product_csv << ["foo", nil, "bar"]
end
#-> Creates test.csv containing 'foo;\N;bar'
This only works if you insert arrays; if you insert other kinds of objects, you must adapt the logic.
Remark: my first idea was to use a converter, but that worked only for parsing a CSV, not for writing one.
CSV::Converters[:nil_N] = lambda { |s| s.nil? ? '\N' : s }
p CSV.parse('foo;;bar', col_sep: ";", converters: :nil_N)
#-> [["foo", "\\N", "bar"]]
Perhaps somebody else knows a way to use converters to build csv-files.
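As an aside: on newer Rubies (2.1+, if I remember correctly), the standard library CSV supports a write_nil_value option that substitutes a string for nil fields on output, which would make both patches unnecessary; worth verifying against your Ruby version:
require 'csv'

# write_nil_value replaces nil fields with the given string when writing
CSV.open('test.csv', 'wb', col_sep: ';', write_nil_value: '\N') do |csv|
  csv << ['foo', nil, 'bar']
end
#-> test.csv contains 'foo;\N;bar'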


In Elixir, how can I extract a lambda to a named function when the lambda is in a closure?

I have the following closure:
def get!(Item, id) do
  Enum.find(
    @items,
    fn(item) -> item.id == id end
  )
end
As I believe this looks ugly and difficult to read, I'd like to give this a name, like:
def get!(Item, id) do
  defp has_target_id?(item), do: item.id = id
  Enum.find(@items, has_target_id?/1)
end
Unfortunately, this results in:
== Compilation error in file lib/auction/fake_repo.ex ==
** (ArgumentError) cannot invoke defp/2 inside function/macro
(elixir) lib/kernel.ex:5238: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:4155: Kernel.define/4
(elixir) expanding macro: Kernel.defp/2
lib/auction/fake_repo.ex:28: Auction.FakeRepo.get!/2
Assuming it is possible, what is the correct way to do this?
The code you posted has an enormous number of syntax errors/glitches. I would suggest you start by getting accustomed to the syntax, rather than trying to make Elixir better by inventing things that nobody uses.
Here is a correct version that does what you want. The task might be accomplished with an anonymous function, although I hardly see a reason to make perfectly idiomatic Elixir look ugly.
defmodule Foo do
  @items [%{id: 1}, %{id: 2}, %{id: 3}]

  def get!(id) do
    has_target_id? = fn item -> item.id == id end
    Enum.find(@items, has_target_id?)
  end
end

Foo.get! 1
#⇒ %{id: 1}
Foo.get! 4
#⇒ nil
You can do this:
def get!(Item, id) do
  Enum.find(
    @items,
    &compare_ids(&1, id)
  )
end

defp compare_ids(%Item{} = item, id) do
  item.id == id
end
But, that's equivalent to:
Enum.find(
  @items,
  fn item -> compare_ids(item, id) end
)
and may not pass your "looks ugly and difficult to read" test.
I was somehow under the impression Elixir supports nested functions?
Easy enough to test:
defmodule A do
  def go do
    def greet do
      IO.puts "hello"
    end
    greet()
  end
end
Same error:
$ iex a.ex
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
** (ArgumentError) cannot invoke def/2 inside function/macro
(elixir) lib/kernel.ex:5150: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:3906: Kernel.define/4
(elixir) expanding macro: Kernel.def/2
a.ex:3: A.go/0
Wouldn't:
defp compare_ids(item, id), do: item.id == id
be enough? Is there any advantage to including %Item{}, or to making separate function clauses for the true and false conditions?
What you gain by specifying the first parameter as:
func(%Item{} = item, target_id)
is that only an Item struct will match the first parameter. Here is an example:
defmodule Item do
  defstruct [:id, :name, :description]
end

defmodule Dog do
  defstruct [:id, :name, :owner]
end

defmodule A do
  def go(%Item{} = item), do: IO.inspect(item.id, label: "id: ")
end
In iex:
iex(1)> item = %Item{id: 1, name: "book", description: "old"}
%Item{description: "old", id: 1, name: "book"}
iex(2)> dog = %Dog{id: 1, name: "fido", owner: "joe"}
%Dog{id: 1, name: "fido", owner: "joe"}
iex(3)> A.go item
id: : 1
1
iex(4)> A.go dog
** (FunctionClauseError) no function clause matching in A.go/1
The following arguments were given to A.go/1:
# 1
%Dog{id: 1, name: "fido", owner: "joe"}
a.ex:10: A.go/1
iex(4)>
You get a function clause error if you call the function with a non-Item, and the earlier an error occurs, the better, because it makes debugging easier.
Of course, by preventing the function from accepting other structs, you make the function less general--but because it's a private function, you can't call it from outside the module anyway. On the other hand, if you wanted to call the function on both Dog and Item structs, then you could simply specify the first parameter as:
     |
     V
func(%{} = thing, target_id)
then both an Item and a Dog would match--but not non-maps.
What you gain by specifying the first parameter as:
     |
     V
func(%Item{id: id}, target_id)
is that you let erlang's pattern matching engine extract the data you need, rather than calling item.id as you would need to do with this definition:
func(%Item{}=item, target_id)
In erlang, pattern matching in a parameter list is the most efficient/convenient/stylish way to write functions. You use pattern matching to extract the data that you want to use in the function body.
Going even further, if you write the function definition like this:
          same variable name
               |           |
               V           V
func(%Item{id: target_id}, target_id)
then erlang's pattern matching engine not only extracts the value for the id field from the Item struct, but also checks that the value is equal to the value of the target_id variable in the 2nd argument.
Defining multiple function clauses is a common idiom in erlang, and it is considered good style because it takes advantage of pattern matching rather than logic inside the function body. Here's an erlang example:
get_evens(List) ->
    get_evens(List, []).

get_evens([Head|Tail], Results) when Head rem 2 == 0 ->
    get_evens(Tail, [Head|Results]);
get_evens([Head|Tail], Results) when Head rem 2 =/= 0 ->
    get_evens(Tail, Results);
get_evens([], Results) ->
    lists:reverse(Results).

Python Json dumps not printing values

I am trying out a simple program
import json

class unified_response():
    trinitiversion = "3"
    preprocess = []

if __name__ == '__main__':
    ur = unified_response()
    preprocessValDict = dict()
    preprocessValDict["input"] = "some string"
    preprocessValDict["correct"] = " correct some string"
    ur.preprocess.append(preprocessValDict)
    s = json.dumps(unified_response.__dict__)
    print s
    s = json.dumps(ur.__dict__)
    print s
The first print statement prints
{"preprocess": [{"input": "some string", "correct": " correct some string"}], "trinitiversion": "3", "__module__": "__main__", "__doc__": null}
The second print statement prints
{}
Why is the second object not printing any values?
This is not related to the json module at all.
ur.__dict__ is an empty dictionary, since only instance attributes are stored in the instance.
The unified_response class only has class attributes, hence ur.__dict__ is an empty dict, which json.dumps transforms into an empty JSON object ({}).
Compare the outputs of print unified_response.__dict__ and print ur.__dict__.
As a side note, regarding
ur.preprocess.append(preprocessValDict)
accessing (and especially modifying) class attributes through an instance is considered bad practice, as it can lead to hard-to-find bugs.

Seeding Phoenix Repo from json response

I'm trying to seed the database for my Phoenix application using the response from another API. I don't understand how to parse the response to create new objects from it. I'm using HTTPoison and Poison as of right now.
seed.ex
alias NewsApplication.Article

HTTPoison.start
url = "rails_application.com/articles/index"

case HTTPoison.get(url) do
  {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
    articles = Poison.decode!(body["articles"])
    Enum.fetch(articles, 1, fn(a) -> IO.puts a end)
  {:ok, %HTTPoison.Response{status_code: 404}} ->
    IO.puts "Not found :("
  {:error, %HTTPoison.Error{reason: reason}} ->
    IO.inspect reason
end
web/model/article.ex
updated
defmodule NewsApplication.Article do
  use Ecto.Model

  schema "articles" do
    field :label, :string
    field :slug, :string
    field :full_path, :string
    field :content_cache, :string
    field :position, :integer, default: 0
    field :children_count, :integer, default: 0
    field :is_published, :boolean, default: false
    field :is_shared, :boolean, default: false
    field :featured, :boolean, default: false
    field :score, :integer
    timestamps
  end
end
sample response
{"articles":
[{
"article":{
"id":436,
"updated":"2015-08-14T11:51:21.931Z",
"title":"Celebrating It's 50th Issue",
"numberOfViews":0,
"numberOfFavorites":2,
"imageURLs":["http://.../images/1549/original/axye.png"],
"tags":["Company News"],
"isFeatured":false,
"isPublished":true,
"published":"2015-07-28T17:00:00.000Z"
}
}]
}
It looks like you're almost there, but are missing a couple of things.
You'll want to iterate through each of the article responses using Enum.each and save these to the database like this:
Article.changeset(%Article{}, article)
|> Repo.insert
Putting it all together, it'd be something like this:
Enum.each articles_json, fn(article) ->
  Article.changeset(%Article{}, article)
  |> Repo.insert
end
This won't handle any errors on Repo.insert if the changeset is invalid, but it'll work otherwise.
I'd recommend having a read through the Phoenix docs for models for further reference – specifically have a look at how they're structuring create actions.

CSV record with spaces is not saving to the database

I am trying to save the below kind of CSV record into a DB:
9,Lambert,Kent D,Senator
But it is not being saved to the DB; the transaction is rolled back with this error:
{"state_senate_district_id"=>"9", "last_name"=>"Lambert", "first_name"=>"Kent D", "tag"=>"Senator"}
(0.2ms) BEGIN
(0.1ms) ROLLBACK
["First name should contain only alphabets"]
So there is a space in first_name ("Kent D"), spaces are apparently not allowed, and hence the record is not saved to the DB.
Below is the code that parses the CSV:
hash = {}
CSV.foreach('Senator.csv', headers: :first_row) do |line|
  hash['state_senate_district_id'] = line[0]
  hash['last_name'] = line[1]
  hash['first_name'] = line[2]
  hash['tag'] = line[3]
  puts hash
  senator = Senator.new(hash)
  unless senator.save
    err = senator.errors.full_messages
    p err
    File.open("errors", "a") do |csv|
      err.each do |c|
        csv << "\n"
        csv << "||||||"
        csv << [c]
      end
    end
  end
end
You probably have a validation rule in the Senator model that is preventing the first_name field from having a space. Remove that validation or change it so that it allows spaces.
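For example, if the Senator model contains a format validation like the hypothetical one below, widening the regex to permit spaces fixes the import:
class Senator < ActiveRecord::Base
  # Hypothetical culprit (letters only, so "Kent D" fails):
  # validates :first_name, format: { with: /\A[a-zA-Z]+\z/,
  #                                  message: "should contain only alphabets" }

  # Letters and spaces allowed instead:
  validates :first_name, format: { with: /\A[a-zA-Z ]+\z/,
                                   message: "should contain only alphabets" }
end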

Logstash indexing JSON arrays

Logstash is awesome. I can send it JSON like this (multi-lined for readability):
{
  "a": "one",
  "b": {
    "alpha": "awesome"
  }
}
And then query for that line in kibana using the search term b.alpha:awesome. Nice.
However I now have a JSON log line like this:
{
  "different": [
    {
      "this": "one",
      "that": "uno"
    },
    {
      "this": "two"
    }
  ]
}
And I'd like to be able to find this line with a search like different.this:two (or different.this:one, or different.that:uno)
If I was using Lucene directly I'd iterate through the different array, and generate a new search index for each hash within it, but Logstash currently seems to ingest that line like this:
different: {this: one, that: uno}, {this: two}
Which isn't going to help me searching for log lines using different.this or different.that.
Has anyone got any thoughts as to a codec, filter, or code change I can make to enable this?
You can write your own filter (copy & paste, rename the class name and the config_name, and rewrite the filter(event) method) or modify the current JSON filter (source on GitHub).
You can find the JSON filter (Ruby class) source code in the path logstash-1.x.x\lib\logstash\filters, named json.rb. The JSON filter parses the content as JSON as follows:
begin
  # TODO(sissel): Note, this will not successfully handle json lists
  # like your text is '[ 1,2,3 ]' JSON.parse gives you an array (correctly)
  # which won't merge into a hash. If someone needs this, we can fix it
  # later.
  dest.merge!(JSON.parse(source))

  # If no target, we target the root of the event object. This can allow
  # you to overwrite @timestamp. If so, let's parse it as a timestamp!
  if !@target && event[TIMESTAMP].is_a?(String)
    # This is a hack to help folks who are mucking with @timestamp during
    # their json filter. You aren't supposed to do anything with
    # "@timestamp" outside of the date filter, but nobody listens... ;)
    event[TIMESTAMP] = Time.parse(event[TIMESTAMP]).utc
  end

  filter_matched(event)
rescue => e
  event.tag("_jsonparsefailure")
  @logger.warn("Trouble parsing json", :source => @source,
               :raw => event[@source], :exception => e)
  return
end
You can modify the parsing procedure to modify the original JSON:
json = JSON.parse(source)
if json.is_a?(Hash)
  json.each do |key, value|
    if value.is_a?(Array)
      value.each_with_index do |object, index|
        # modify as you need
        object["index"] = index
      end
    end
  end
end
# save modified json
# ...
dest.merge!(json)
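For example, applied to the array from the question, that index-tagging loop rewrites the elements like this (standalone Ruby, outside the filter):
require 'json'

json = JSON.parse('{"different":[{"this":"one","that":"uno"},{"this":"two"}]}')
json.each do |key, value|
  if value.is_a?(Array)
    value.each_with_index { |object, index| object["index"] = index }
  end
end
json
# => {"different"=>[{"this"=>"one", "that"=>"uno", "index"=>0},
#                   {"this"=>"two", "index"=>1}]}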
Then you can modify your config file to use your new/modified JSON filter; place it in \logstash-1.x.x\lib\logstash\config.
This is my elastic_with_json.conf, using the modified json.rb filter:
input {
  stdin {
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
If you want to use your new filter, you can configure it with its config_name:
class LogStash::Filters::Json_index < LogStash::Filters::Base
  config_name "json_index"
  milestone 2
  # ...
end
and configure it:
input {
  stdin {
  }
}
filter {
  json_index {
    source => "message"
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout {
  }
}
Hope this helps.
For a quick and dirty hack, I used the Ruby filter and the code below; no need to use the out-of-the-box 'json' filter anymore.
input {
  stdin {}
}
filter {
  grok {
    match => ["message", "(?<json_raw>.*)"]
  }
  ruby {
    init => "
      def parse_json obj, pname=nil, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        obj = obj.to_hash unless obj.is_a? Hash
        obj.each {|k,v|
          p = pname.nil? ? k : pname
          if v.is_a? Array
            v.each_with_index {|oo,ii|
              parse_json_array(oo,ii,p,event)
            }
          elsif v.is_a? Hash
            parse_json(v,p,event)
          else
            p = pname.nil? ? k : [pname,k].join('.')
            event[p] = v
          end
        }
      end

      def parse_json_array obj, i, pname, event
        obj = JSON.parse(obj) unless obj.is_a? Hash
        pname_ = pname
        if obj.is_a? Hash
          obj.each {|k,v|
            p = [pname_,i,k].join('.')
            if v.is_a? Array
              v.each_with_index {|oo,ii|
                parse_json_array(oo,ii,p,event)
              }
            elsif v.is_a? Hash
              parse_json(v,p,event)
            else
              event[p] = v
            end
          }
        else
          n = [pname_, i].join('.')
          event[n] = obj
        end
      end
    "
    code => "parse_json(event['json_raw'].to_s,nil,event) if event['json_raw'].to_s.include? ':'"
  }
}
output {
  stdout { codec => rubydebug }
}
Test json structure
{"id":123, "members":[{"i":1, "arr":[{"ii":11},{"ii":22}]},{"i":2}], "im_json":{"id":234, "members":[{"i":3},{"i":4}]}}
and this is what's output:
{
  "message" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
  "@version" => "1",
  "@timestamp" => "2014-07-25T00:06:00.814Z",
  "host" => "Leis-MacBook-Pro.local",
  "json_raw" => "{\"id\":123, \"members\":[{\"i\":1, \"arr\":[{\"ii\":11},{\"ii\":22}]},{\"i\":2}], \"im_json\":{\"id\":234, \"members\":[{\"i\":3},{\"i\":4}]}}",
  "id" => 123,
  "members.0.i" => 1,
  "members.0.arr.0.ii" => 11,
  "members.0.arr.1.ii" => 22,
  "members.1.i" => 2,
  "im_json" => 234,
  "im_json.0.i" => 3,
  "im_json.1.i" => 4
}
The solution I liked is the ruby filter, because it doesn't require us to write another filter. However, that solution creates fields at the "root" of the JSON, and it's hard to keep track of how the original document looked.
I came up with something similar that's easier to follow and is a recursive solution so it's cleaner.
ruby {
  init => "
    def arrays_to_hash(h)
      h.each do |k,v|
        # If v is nil, an array is being iterated and the value is k.
        # If v is not nil, a hash is being iterated and the value is v.
        value = v || k
        if value.is_a?(Array)
          # 'value' is replaced with 'value_hash' later.
          value_hash = {}
          value.each_with_index do |v, i|
            value_hash[i.to_s] = v
          end
          h[k] = value_hash
        end
        if value.is_a?(Hash) || value.is_a?(Array)
          arrays_to_hash(value)
        end
      end
    end
  "
  code => "arrays_to_hash(event.to_hash)"
}
It converts arrays to hashes, with each key being the index number. More details: http://blog.abhijeetr.com/2016/11/logstashelasticsearch-best-way-to.html
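For instance, tracing arrays_to_hash by hand on the structure from the question (as plain Ruby, outside Logstash):
event = { "different" => [ { "this" => "one", "that" => "uno" },
                           { "this" => "two" } ] }
arrays_to_hash(event)
# event is now:
# { "different" => { "0" => { "this" => "one", "that" => "uno" },
#                    "1" => { "this" => "two" } } }
# so the documents can be searched as different.0.this, different.0.that, different.1.this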