Activerecord query with group on multiple columns returning a hash with array as a key - mysql

I wrote an ActiveRecord query to fetch count of some data after grouping by two columns col_a and col_b
result = Sample.where(through: ['col_a', 'col_b'], status: [1, 5]).where("created_at > ?", 1.month.ago).group(:status, :through).count
This returns:
{[1, "col_a"]=>7, [1, "col_b"]=>7, [5, "col_a"]=>4, [5, "col_b"]=>1}
Now my question is, how do I access the values in this hash?
Doing something like results[1, "col_a"] throws an error (wrong no. of arguments).
I know I can do this by writing a loop and extracting the values one by one.
However I want to know if there is a more idiomatic way to access the values, something similar to results[1], maybe?

results[[1, "col_a"]]
# => 7

Four possible ways (I'm sure there are others):
# fetch one value at a time
results[[1, "col_a"]]
# => 7
# fetch all the values
results.values
# => [7, 7, 4, 1]
# loop through keys and values
results.each do |key, value|
puts key
puts value
end
# => [1, "col_a"], 7....
# convert results into a more usable hash
results.map! { |k,v| { k.join("_") => v } }.reduce({}, :merge)
results['1_col_a']
# => 7
Another heavier option, especially if this is a query you will do often, is to wrap the results into a new Ruby object. Then you can parse and use the results in a more idiomatic way and define an accessor simpler than [1,'col_a'].
class SampleGroupResult
attr_reader key, value
def initialize(key, value)
#key = key
#value = value
end
end
results.map { |k,v| SampleGroupResult.new(k,v) }

Related

Elixir - JasonHelpers - How can I send a keyword list to json_map?

I have a data structure that I want to convert to json and preserve the key order.
For example:
%{ x: 1, a: 5} should be converted to "{\"x\": 1, \"a\": 5}"
Poison does it without any problem. But when I upgrade to Jason, it changes to "{\"a\": 5, \"x\": 1}".
So I use JasonHelpers json_map to preserve the order like this:
Jason.Helpers.json_map([x: 1, a: 5])
It creates a fragment with correct order.
However, when I use a variable to do this:
list = [x: 1, a: 5]
Jason.Helpers.json_map(list)
I have an error:
** (Protocol.UndefinedError) protocol Enumerable not implemented for {:list, [line: 15], nil} of type Tuple.
....
QUESTION: How can I pass a pre-calculated list into Jason.Helpers.json_map ?
The calculation is complicated, so I don't want to repeat the code just to use json_map, but use the function that returns a list.
json_map/1 is a macro, from its docs:
Encodes a JSON map from a compile-time keyword.
It is designed for compiling JSON at compile-time, which is why it doesn't work with your runtime variable.
Support for encoding keyword lists was added to the Jason library a year ago, but it looks like it hasn't been pushed to hex yet. I managed to get it work by pulling the latest code from github:
defp deps do
[{:jason, git: "https://github.com/michalmuskala/jason.git"}]
end
Then by creating a struct that implements Jason.Encoder (adapted from this solution by the Jason author):
defmodule OrderedObject do
defstruct [:value]
def new(value), do: %__MODULE__{value: value}
defimpl Jason.Encoder do
def encode(%{value: value}, opts) do
Jason.Encode.keyword(value, opts)
end
end
end
Now we can encode objects with ordered keys:
iex(1)> Jason.encode!(OrderedObject.new([x: 1, a: 5]))
"{\"x\":1,\"a\":5}"
I don't know if this is part of the public API or just an implementation detail, but it appears you have some control of the order when implementing the Jason.Encoder protocol for a struct.
Let's say you've defined an Ordered struct:
defmodule Ordered do
#derive {Jason.Encoder, only: [:a, :x]}
defstruct [:a, :x]
end
If you encode the struct, the "a" key will be before the "x" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"a\":5,\"x\":1}"
Let's reorder the keys we pass in to the :only option:
defmodule Ordered do
#derive {Jason.Encoder, only: [:x, :a]}
defstruct [:a, :x]
end
If we now encode the struct, the "x" key will be before the "a" key:
iex> Jason.encode!(%Ordered{a: 5, x: 1})
"{\"x\":1,\"a\":5}"

In Elixir, How can I extract a lambda to a named function when the lambda is in a closure?

I have the following closure:
def get!(Item, id) do
Enum.find(
#items,
fn(item) -> item.id == id end
)
end
As I believe this looks ugly and difficult to read, I'd like to give this a name, like:
def get!(Item, id) do
defp has_target_id?(item), do: item.id = id
Enum.find(#items, has_target_id?/1)
end
Unfortunately, this results in:
== Compilation error in file lib/auction/fake_repo.ex ==
** (ArgumentError) cannot invoke defp/2 inside function/macro
(elixir) lib/kernel.ex:5238: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:4155: Kernel.define/4
(elixir) expanding macro: Kernel.defp/2
lib/auction/fake_repo.ex:28: Auction.FakeRepo.get!/2
Assuming it is possible, what is the correct way to do this?
The code you posted has an enormous amount of syntax errors/glitches. I would suggest you start with getting accustomed to the syntax, rather than trying to make Elixir better by inventing the things that nobody uses.
Here is the correct version that does what you wanted. The task might be accomplished with an anonymous function, although I hardly see a reason to make a perfectly looking idiomatic Elixir look ugly.
defmodule Foo do
#items [%{id: 1}, %{id: 2}, %{id: 3}]
def get!(id) do
has_target_id? = fn item -> item.id == id end
Enum.find(#items, has_target_id?)
end
end
Foo.get! 1
#⇒ %{id: 1}
Foo.get! 4
#⇒ nil
You can do this:
def get!(Item, id) do
Enum.find(
#items,
&compare_ids(&1, id)
)
end
defp compare_ids(%Item{}=item, id) do
item.id == id
end
But, that's equivalent to:
Enum.find(
#items,
fn item -> compare_ids(item, id) end
)
and may not pass your looks ugly and difficult to read test.
I was somehow under the impression Elixir supports nested functions?
Easy enough to test:
defmodule A do
def go do
def greet do
IO.puts "hello"
end
greet()
end
end
Same error:
$ iex a.ex
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
** (ArgumentError) cannot invoke def/2 inside function/macro
(elixir) lib/kernel.ex:5150: Kernel.assert_no_function_scope/3
(elixir) lib/kernel.ex:3906: Kernel.define/4
(elixir) expanding macro: Kernel.def/2
a.ex:3: A.go/0
wouldn't:
defp compare_ids(item, id), do: item.id == id
be enough? Is there any advantage to including %Item{} or making
separate functions for returning both true and false conditions?
What you gain by specifying the first parameter as:
func(%Item{} = item, target_id)
is that only an Item struct will match the first parameter. Here is an example:
defmodule Item do
defstruct [:id, :name, :description]
end
defmodule Dog do
defstruct [:id, :name, :owner]
end
defmodule A do
def go(%Item{} = item), do: IO.inspect(item.id, label: "id: ")
end
In iex:
iex(1)> item = %Item{id: 1, name: "book", description: "old"}
%Item{description: "old", id: 1, name: "book"}
iex(2)> dog = %Dog{id: 1, name: "fido", owner: "joe"}
%Dog{id: 1, name: "fido", owner: "joe"}
iex(3)> A.go item
id: : 1
1
iex(4)> A.go dog
** (FunctionClauseError) no function clause matching in A.go/1
The following arguments were given to A.go/1:
# 1
%Dog{id: 1, name: "fido", owner: "joe"}
a.ex:10: A.go/1
iex(4)>
You get a function clause error if you call the function with a non-Item, and the earlier an error occurs, the better, because it makes debugging easier.
Of course, by preventing the function from accepting other structs, you make the function less general--but because it's a private function, you can't call it from outside the module anyway. On the other hand, if you wanted to call the function on both Dog and Item structs, then you could simply specify the first parameter as:
|
V
func(%{}=thing, target_id)
then both an Item and a Dog would match--but not non-maps.
What you gain by specifying the first parameter as:
|
V
func(%Item{id: id}, target_id)
is that you let erlang's pattern matching engine extract the data you need, rather than calling item.id as you would need to do with this definition:
func(%Item{}=item, target_id)
In erlang, pattern matching in a parameter list is the most efficient/convenient/stylish way to write functions. You use pattern matching to extract the data that you want to use in the function body.
Going even further, if you write the function definition like this:
same variable name
| |
V V
func(%Item{id: target_id}, target_id)
then erlang's pattern matching engine not only extracts the value for the id field from the Item struct, but also checks that the value is equal to the value of the target_id variable in the 2nd argument.
Defining multiple function clauses is a common idiom in erlang, and it is considered good style because it takes advantage of pattern matching rather than logic inside the function body. Here's an erlang example:
get_evens(List) ->
get_evens(List, []).
get_evens([Head|Tail], Results) when Head rem 2 == 0 ->
get_evens(Tail, [Head|Results]);
get_evens([Head|Tail], Results) when Head rem 2 =/= 0 ->
get_evens(Tail, Results);
get_evens([], Results) ->
lists:reverse(Results).

Inserting several "new" items into the database with DBIC

I'm working in a bioinformatics project that requires me to read genomic data (nothing too fancy, just think of it as strings) from various organisms and insert it into a database. Each read belongs to one organism, and can contain from 5000 to 50000 thousand genes, which I need to process and analyze prior to storage.
The script currently doing this is written in perl and, after all calculations, stores the results in a hash likie this:
$new{$id}{gene_name} = $id;
$new{$id}{gene_database_source} = $gene_database_source
$new{$id}{product} = $product;
$new{$id}{sequence} = $sequence;
$new{$id}{seqlength} = $seqlength;
$new{$id}{digest} = $digest;
$new{$id}{mw} = $mw;
$new{$id}{iep} = $iep;
$new{$id}{tms} = $tms;
After all genes are read and, the insertions are made looping through the hash into an eval{} statement.
eval {
foreach my $id (keys %new) {
my $rs = $schema->resultset('Genes')->create(
{
gene_name => $new{$id}{gene_name},
gene_product => $new{$id}{product},
sequence => $new{$id}{sequence},
gene_protein_length => $new{$id}{seqlength},
digest => $new{$id}{digest},
gene_isoelectric_point => $new{$id}{iep},
gene_molecular_weight => $new{$id}{mw},
gene_tmd_count => $new{$id}{tms},
gene_species => $species,
species_code => $spc,
user_id => $tdruserid,
gene_database_source => $new{$id}{gene_database_source}
}
);
};
While this "works", it has at least two problems I'd like to solve:
The eval statement is intended to "failsafe" the insertions: if one of the insertions fail, the eval dies and no insertion is done. This is clearly not how eval works. I'm pretty sure all insertions made
until failure point will be done and there's no rollback whatsoever.
The script needs to loop twice through very large datasets (one while reading and creating the hashes, and once again when reading
the hashes and performing the insertions). This makes the process' performance rather poor.
Instead of creating the hashes, I'd been thinking of using the new directive of DBIX $schema->new({..stuff..}); and then doing a massive insert transaction. That would solve the double iteration and the eval would either work (or not) with a single transaction, which would do the expected behaviour of < either all insertions or none > ... Is there a way to do this?
You can create your massive transaction by using a TxnScopeGuard in DBIC. In the most basic form, that would be as follows.
eval { # or try from Try::Tiny
my $guard = $schema->txn_scope_guard;
foreach my $id ( keys %new ) {
my $rs = $schema->resultset('Genes')->create(
{
gene_name => $new{$id}{gene_name},
gene_product => $new{$id}{product},
sequence => $new{$id}{sequence},
gene_protein_length => $new{$id}{seqlength},
digest => $new{$id}{digest},
gene_isoelectric_point => $new{$id}{iep},
gene_molecular_weight => $new{$id}{mw},
gene_tmd_count => $new{$id}{tms},
gene_species => $species,
species_code => $spc,
user_id => $tdruserid,
gene_database_source => $new{$id}{gene_database_source}
}
);
}
$guard->commit;
}
You create a scope guard object, and when you're done setting up your transaction, you commit it. If the object goes out of scope, i.e. because something died, it will rollback the transaction automatically.
The eval can catch the die, and your program will not crash. You had that part correct, but you're also right that your code will not undo previous inserts. Note that Try::Tiny's try provides nicer syntax. But it's not needed here.
Transaction in this case means that all queries are collected and run at the same time.
Note that this will still insert one row per INSERT statement only!
If you want to instead create larger INSERT statements, like the following, you need populate, not new.
INSERT INTO foo (bar, baz) VALUES
(1, 1),
(2, 2),
(3, 3),
...
The populate method lets you pass in an array reference with multiple rows at one time. This is supposed to be way faster than inserting one at a time.
$schema->resultset("Artist")->populate([
[ qw( artistid name ) ],
[ 100, 'A Formally Unknown Singer' ],
[ 101, 'A singer that jumped the shark two albums ago' ],
[ 102, 'An actually cool singer' ],
]);
Translated to your loop, that would be as follows. Note that the documentation claims that it's faster if you run it in void context.
eval {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
map {
[
$new{$_}{gene_name}, $new{$_}{product},
$new{$_}{sequence}, $new{$_}{seqlength},
$new{$_}{digest}, $new{$_}{iep},
$new{$_}{mw}, $new{$_}{tms},
$species, $spc,
$tdruserid, $new{$_}{gene_database_source},
]
} keys %new
],
);
}
Like this the scope guard is not needed. However, I would advise you to not do more than 1000 rows per statement though. Processing it in chunks might be a good idea for performance reasons. In that case, you'd loop over the keys 1000 at a time. List::MoreUtils has a nice natatime function for that.
use List::MoreUtils 'natatime';
eval {
my $guard = $schema->txn_scope_guard;
my $it = natatime 1_000, keys %new;
while ( my #keys = $it->() ) {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
map {
[
$new{$_}{gene_name}, $new{$_}{product},
$new{$_}{sequence}, $new{$_}{seqlength},
$new{$_}{digest}, $new{$_}{iep},
$new{$_}{mw}, $new{$_}{tms},
$species, $spc,
$tdruserid, $new{$_}{gene_database_source},
]
} #keys
],
);
}
$guard->commit;
}
Now it will do 1000 rows per insertion, and run all those queries in one big transaction. If one of them fails, none will be done.
The script needs to loop twice through very large datasets (one while reading and creating the hashes, and once again when reading the hashes and performing the insertions). This makes the process' performance rather poor.
You're not showing how you create the data, besides this assignment.
$new{$id}{gene_name} = $id;
$new{$id}{gene_database_source} = $gene_database_source
$new{$id}{product} = $product;
If that's all there is to it, nothing is stopping you from using the approach I've shown above directly where you're processing the data the first time and building the hash. The following code is incomplete, because you're not telling us where the data is coming from, but you should get the gist.
eval {
my $guard = $schema->txn_scope_guard;
# we use this to collect rows to process
my #rows;
# this is where your data comes in
while ( my $foo = <DATA> ) {
# here you process the data and come up with your variables
my ( $id, $gene_database_source, $product, $sequence, $seqlength,
$digest, $mw, $iep, $tms );
# collect the row so we can insert it later
push(
#rows,
[
$id, $gene_database_source, $product, $sequence, $seqlength,
$digest, $mw, $iep, $tms,
]
);
# only insert if we reached the limit
if ( scalar #rows == 1000 ) {
$schema->resultset('Genes')->populate(
[
[
qw(
gene_name gene_product sequence
gene_protein_length digest gene_isoelectric_point
gene_molecular_weight gene_tmd_count gene_species
species_code user_id gene_database_source
)
],
\#rows,
],
);
# empty the list of values
#rows = ();
}
}
$guard->commit;
}
Essentially we collect up to 1000 rows directly as array references while we process them, and when we've reached the limit, we pass them to the database. We then reset our row array and start over. Again, all of this is wrapped in a transaction, so it will only be committed if all the inserts are fine.
There is more information on transactions in DBIC in the cookbook.
Please note that I have not tested any of this code.

csv empty strings handling and values appending

With a csv of ~50 rows (stars) and ~30 columns (name, magnitudes and distance), that has some empty string values (''), I am trying to do two things in which all the help so far hasn't been useful. (1) I need to parse empty strings as 0.0, so I can (2) append each row in a list of lists (what I called s).
In other words:
- s is a list of stars (each one has all its parameters)
- d is a particular parameter for all the stars (distance), which I obtain correctly.
Big issue is with s. My try:
with open('stars.csv', 'r') as mycsv:
csv_stars = csv.reader(mycsv)
next(csv_stars) #skip header
stars = list(csv_stars)
s = [] # star
d = [] # distances
for row in stars:
row[row==''] = '0'
s.append(float(row)) #stars
d.append(arcsec*AU*float(row[30]))
I can't think of a better syntax, and so I get the error
s.append(float(row)) # stars
TypeError: float() argument must be a string or a number
From s I would obtain later the magnitudes for all the stars, separately. But first things first...
#cwasdwa Please look at below code. it will give you an idea. I am sure there might be better way. This solution is based on what I have understood from your code.
with open('stars.csv', 'r') as mycsv:
csv_stars = csv.reader(mycsv)
next(csv_stars) #skip header
stars = list(csv_stars)
s = [] # star
d = [] # distances
for row in stars:
newRow = [] #create new row array to convert all '' to 0.0
for x in row:
if x =='':
newRow.append(0.0)
else:
newRow.append(x)
s.append(newRow) #stars
if row[30] == '':
value = 0.0
else:
value = row[30]
d.append(arcsec*AU*float(value))

Postgres JSON data type Rails query

I am using Postgres' json data type but want to do a query/ordering with data that is nested within the json.
I want to order or query with .where on the json data type. For example, I want to query for users that have a follower count > 500 or I want to order by follower or following count.
Thanks!
Example:
model User
data: {
"photos"=>[
{"type"=>"facebook", "type_id"=>"facebook", "type_name"=>"Facebook", "url"=>"facebook.com"}
],
"social_profiles"=>[
{"type"=>"vimeo", "type_id"=>"vimeo", "type_name"=>"Vimeo", "url"=>"http://vimeo.com/", "username"=>"v", "id"=>"1"},
{"bio"=>"I am not a person, but a series of plants", "followers"=>1500, "following"=>240, "type"=>"twitter", "type_id"=>"twitter", "type_name"=>"Twitter", "url"=>"http://www.twitter.com/", "username"=>"123", "id"=>"123"}
]
}
For any who stumbles upon this. I have come up with a list of queries using ActiveRecord and Postgres' JSON data type. Feel free to edit this to make it more clear.
Documentation to the JSON operators used below: https://www.postgresql.org/docs/current/functions-json.html.
# Sort based on the Hstore data:
Post.order("data->'hello' DESC")
=> #<ActiveRecord::Relation [
#<Post id: 4, data: {"hi"=>"23", "hello"=>"22"}>,
#<Post id: 3, data: {"hi"=>"13", "hello"=>"21"}>,
#<Post id: 2, data: {"hi"=>"3", "hello"=>"2"}>,
#<Post id: 1, data: {"hi"=>"2", "hello"=>"1"}>]>
# Where inside a JSON object:
Record.where("data ->> 'likelihood' = '0.89'")
# Example json object:
r.column_data
=> {"data1"=>[1, 2, 3],
"data2"=>"data2-3",
"array"=>[{"hello"=>1}, {"hi"=>2}],
"nest"=>{"nest1"=>"yes"}}
# Nested search:
Record.where("column_data -> 'nest' ->> 'nest1' = 'yes' ")
# Search within array:
Record.where("column_data #>> '{data1,1}' = '2' ")
# Search within a value that's an array:
Record.where("column_data #> '{array,0}' ->> 'hello' = '1' ")
# this only find for one element of the array.
# All elements:
Record.where("column_data ->> 'array' LIKE '%hello%' ") # bad
Record.where("column_data ->> 'array' LIKE ?", "%hello%") # good
According to this http://edgeguides.rubyonrails.org/active_record_postgresql.html#json
there's a difference in using -> and ->>:
# db/migrate/20131220144913_create_events.rb
create_table :events do |t|
t.json 'payload'
end
# app/models/event.rb
class Event < ActiveRecord::Base
end
# Usage
Event.create(payload: { kind: "user_renamed", change: ["jack", "john"]})
event = Event.first
event.payload # => {"kind"=>"user_renamed", "change"=>["jack", "john"]}
## Query based on JSON document
# The -> operator returns the original JSON type (which might be an object), whereas ->> returns text
Event.where("payload->>'kind' = ?", "user_renamed")
So you should try Record.where("data ->> 'status' = 200 ") or the operator that suits your query (http://www.postgresql.org/docs/current/static/functions-json.html).
Your question doesn't seem to correspond to the data you've shown, but if your table is named users and data is a field in that table with JSON like {count:123}, then the query
SELECT * WHERE data->'count' > 500 FROM users
will work. Take a look at your database schema to make sure you understand the layout and check that the query works before complicating it with Rails conventions.
JSON filtering in Rails
Event.create( payload: [{ "name": 'Jack', "age": 12 },
{ "name": 'John', "age": 13 },
{ "name": 'Dohn', "age": 24 }]
Event.where('payload #> ?', '[{"age": 12}]')
#You can also filter by name key
Event.where('payload #> ?', '[{"name": "John"}]')
#You can also filter by {"name":"Jack", "age":12}
Event.where('payload #> ?', {"name":"Jack", "age":12}.to_json)
You can find more about this here