Kibana: how can I sort by the relative frequency of one of the possible values in a chart? - bar-chart

I'm working on the following visualization: the frequency of each specific job build status result, per job. Here's what it looks like:
The question is: how can I sort by one of the specific values, say, by "success"?
I've made dozens of attempts at the issue and re-read the docs, but I'm still failing. Let's say that the field name is buildStatus.

Have you tried ordering your Elasticsearch output as described in the documentation?
You should also have a look at this Kibana blog post; maybe it will help you.
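For illustration, here is a minimal sketch of such an ordered aggregation using the Python Elasticsearch client, assuming a jobName field next to the buildStatus field from the question (the index name, the jobName field and the client itself are assumptions, not anything from the question):
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
# One bucket per job, ordered by how many "success" builds fall into it.
# "jobs" (index) and "jobName" (field) are placeholders.
body = {
    "size": 0,
    "aggs": {
        "per_job": {
            "terms": {
                "field": "jobName",
                # order the buckets by the sub-aggregation below
                "order": {"success_count": "desc"},
            },
            "aggs": {
                "success_count": {"filter": {"term": {"buildStatus": "success"}}}
            },
        }
    },
}
result = es.search(index="jobs", body=body)
for bucket in result["aggregations"]["per_job"]["buckets"]:
    print(bucket["key"], bucket["success_count"]["doc_count"])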

Batch file to verify a .csv file

I am hoping someone can point me in the right direction in relation to the scenario I am faced with.
Essentially, I am given a CSV file each day containing 200+ lines of payment information.
As the payment reference is input by the user at source, it isn't always in the format I need.
The process is currently done manually and can take considerable time, so I was hoping to come up with a batch file to isolate the references I require, based on a set of parameters.
Each reference should be 11 digits in length, numeric only, and start with either 1, 2 or 3.
I have attached a basic example with this post.
It may be that this isn't possible in batch, but any ideas would be appreciated.
Thanks in advance :-)
I'm not too sure about batch, but Python and regex can help you out here.
Here is a great tutorial on using CSVs with Python.
Once you have that down, you could use a regex to filter out the correct values.
Here is an expression to help you out: ^[123][0-9]{10}$ (note that inside a character class the | is matched literally, so [1|2|3] would wrongly accept a reference starting with a pipe).
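Putting the two together, a minimal Python sketch might look like this (payments.csv and the assumption that the reference sits in the first column are mine, since the example file isn't shown):
import csv
import re

# A valid reference is 11 characters: digits only, starting with 1, 2 or 3.
VALID_REFERENCE = re.compile(r"^[123][0-9]{10}$")
with open("payments.csv", newline="") as f:
    for row in csv.reader(f):
        reference = row[0]  # assumed: the reference is the first column
        if VALID_REFERENCE.match(reference):
            print(reference)  # or write the valid rows to an output file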

Store and export raw tokens in a Solr index

I'd like to store the tokens generated by Solr during indexing, such as those produced by the DictionaryCompoundWordTokenFilter, and then export them, hopefully using the CSVResponseWriter. Is there a way to do that?
I know it's possible to use the Analysis tool to provide values and see how they are tokenized, but I am unaware of how to do this for the entirety of the index, or at least on a per-query basis.
Let's see. I think what you want is to store, alongside the original content of some field, the field value after it has gone through some analysis chain, right?
You would think copyField would help, but it doesn't: if you store the copy, it is the original field value that gets stored. You need to use an update processor. Look at this talk Erik Hatcher gave, minutes 7:30 to 20:00 approximately, and you will see exactly this case explained very well, with examples and all.
Once you have that stored in the index, you can return it and do anything you like.
One way to look at this: index your document content into a field "mytext" with your DictionaryCompoundWordTokenFilter or any other analysis that fits your needs. Then you can facet on "mytext" with q=*:*. Your query would look like this: http://localhost:8983/solr/collection1/select?q=*%3A*&start=0&rows=1&wt=xml&indent=true&facet=true&facet.field=mytext
That should give you all the tokens that went into "mytext". But I am not 100% sure of your expectation from what you said in the question. Let me know if this helps.
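As a small sketch of the export side, the same facet query can be issued from Python and the full token list pulled out of the JSON response (the collection1 core and mytext field come from the URL above; the requests library and the json response writer are my assumptions):
import requests

params = {
    "q": "*:*",
    "rows": 0,
    "wt": "json",
    "facet": "true",
    "facet.field": "mytext",
    "facet.limit": -1,  # -1 asks Solr for every term in the field
}
resp = requests.get("http://localhost:8983/solr/collection1/select", params=params)
resp.raise_for_status()
# Solr returns facet counts as a flat list: [term1, count1, term2, count2, ...]
flat = resp.json()["facet_counts"]["facet_fields"]["mytext"]
for token, count in zip(flat[0::2], flat[1::2]):
    print(token, count)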

Formatting data for use with JSON and D3.js

I have the following data in my MySQL database. These three columns are a subset of a table that I have selected using a query.
Value Date Time
230.8 13/08/08 15:01:22+22
233.7 13/08/08 15:13:12+22
234.5 13/08/08 15:40:33+22
I want to represent this data on a graph of Value versus Date & Time, in chronological order. What format do I need to put the above data into before using JSON? I've had a look at a few tutorials (like this one: http://www.d3noob.org/2013/02/using-mysql-database-as-source-of-data.html), and when I apply the same logic I don't seem to get any graph at all.
Or will JSON and D3.js not work for my requirement? Do I need to look at something else, like some other JavaScript library?
Your question is a little bit vague, but I'll try to address a few of your topics to help you get started.
Firstly, I would suggest finding the visualization that fits your needs. From the data subset that you showed in the question, I would suggest maybe this one. It is interesting because if you have multiple values for different times in a given day, you could construct various time series graphs and compare them interactively. There are other options, so you should explore and find a good starting point to improve and adapt to your needs.
Regarding the origin/format of the data: if you are able to extract the data you showed into a variable (with PHP, for example), you can then manipulate it and build a structure from it. It doesn't necessarily have to be JSON and/or CSV, as long as you can handle it with d3.js's API functions. It isn't very difficult, but it does require you to read up on and understand the topic. First understand how to query for your needs with MySQL; then I would suggest starting here if you decide to go with JSON.
The example visualization I mentioned above uses a CSV file as a data source. Another option could be, for instance, to build a CSV file (or a data structure, i.e. an array) to feed into d3.js, as sketched below. There are various questions covering "how to create CSV with PHP", so you shouldn't have much difficulty finding the info you need.
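Here is a minimal Python sketch of that route (Python rather than PHP purely for illustration; data.csv and the column names are my own placeholders) that writes the rows from the question into a file d3.csv can load:
import csv

rows = [
    ("230.8", "13/08/08", "15:01:22+22"),
    ("233.7", "13/08/08", "15:13:12+22"),
    ("234.5", "13/08/08", "15:40:33+22"),
]
with open("data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["value", "date", "time"])  # header row gives d3.csv named fields
    writer.writerows(rows)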
Either way, after you feel comfortable with what you know about these topics, start breaking your problem into smaller tasks and find the answer to one question at a time. If you need to, post more questions here on SO and include your attempts at coding a solution; this will definitely get you all the help you might need.
In Python it would look like this:
import json

# One labelled entry per row: (value, date, time).
output = json.dumps([
    'data',
    {'data_1': ('230.8', '13/08/08', '15:01:22+22')},
    {'data_2': ('233.7', '13/08/08', '15:13:12+22')},
    {'data_3': ('234.5', '13/08/08', '15:40:33+22')},
])
print(output)
More information about Python and JSON can be found here.
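As a side note, a flat list of objects, one per row, is usually easier to consume from d3 than the nested structure above (the key names here are just suggestions, not anything d3 requires):
import json

records = [
    {"value": 230.8, "date": "13/08/08", "time": "15:01:22+22"},
    {"value": 233.7, "date": "13/08/08", "time": "15:13:12+22"},
    {"value": 234.5, "date": "13/08/08", "time": "15:40:33+22"},
]
print(json.dumps(records))
Each object then maps directly to one point on the graph.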

What is the "Rails way" to do correlated subqueries?

I asked nearly the same question before, probably in the wrong way, so I apologize for both the near duplicate and the lousy original phrasing. I feel like my problem now is that I am attempting to fight Rails, which is, of course, a losing battle. Accordingly, I am looking for the idiomatic Rails way to do this.
I have a table containing rows of user data which is scraped from a third-party site periodically. The old data is just as important as the new data; the old data is, in fact, probably used more often. There are no performance concerns about referencing the new data, because only a couple of people will ever use my service (I keep my standards realistic). But thousands of users are scraped periodically (i.e., way too often). I have named the corresponding models "User" and "UserScrape".
Table users has columns: id, name, email
Table user_scrapes has columns: id, user_id, created_at, address_id, awesomesauce_preference
Note: These are not the real models - user_scrapes has a lot more columns - but you probably get the point
At any given time, I want to find the most recent user_scrapes values (the data retrieved from the external source) for a given user. I want to find out what my current awesomesauce_preference is, because lately it's probably 'lamesauce', but before, it was 'saucy_sauce'.
I want to have a convenient method that allows me to access the newest scraped data for each user in such a way that I can combine it with separate WHERE clauses to narrow it down further. That's because in at least a dozen parts of my code, I need to deal with the data from the latest scrape.
What I have done so far is this horrible hack: select the latest user_scrapes for each user with a regular find_by_sql correlated sub-query, then pluck out the ids of those scrapes, then put an additional where clause on every query that needs the latest data.
This is already an issue performance-wise because I don't want to buffer over a million integers (yes, a lot of pages get scraped very often) then try to pass the MySQL driver a list of these and have it miraculously execute a perfect query plan. In my benchmark it took almost as long as it did for me to write this post, so I lied before. Performance is sort of an issue, but not really.
My question
So with my UserScrape class, how can I make a method called current, as in UserScrape.find(1337).current.where(address_id: 1234).awesomesauce_preference, when I live at addresses 1234 and 1235 and I want to find out what my awesomesauce_preference is at my latest address?
I think what you are looking for are scopes:
http://guides.rubyonrails.org/active_record_querying.html#scopes
In particular, you can probably use:
scope :current, -> { order("user_scrapes.created_at DESC").limit(1) }
Update:
Scopes are meant to return an ActiveRecord::Relation, so that you can continue chaining methods if you wish. There is nothing to prevent you (last I checked, anyway) from writing this instead, however:
scope :current, -> { order("user_scrapes.created_at DESC").first }
This returns just the one object, and is not chainable, but it may be a more useful function ultimately.
UserScrape.where(address_id: 1234).current.awesomesauce_preference

Best approach to construct complex MySQL joins and groups?

I find that when trying to construct complex MySQL joins and groups between many tables I usually run into strife and have to spend a lot of 'trial and error' time to get the result I want.
I was wondering how other people approach the problems. Do you isolate the smaller blocks of data at the end of the branches and get these working first? Or do you start with what you want to return and just start linking tables on as you need them?
Also wondering if there are any good books or sites about approaching the problem.
I don't work in MySQL, but I do frequently write extremely complex SQL, and here's how I approach it.
First, there is no substitute whatsoever for thoroughly understanding your database structure.
Next I try to break up the task into chunks.
For instance, suppose I'm writing a report concerning the details of a meeting (the company I work for does meeting planning). I will need to know the meeting name and sales rep, the meeting venue and dates, the people who attended, and the speaker information.
First I determine which of the tables will have the information for each field in the report. Now I know what I will have to join together, but not exactly how as yet.
So first I write a query to get the meetings I want. This is the basis for all the rest of the report, so I start there. The rest of the report can probably be done in any order, although I prefer to work through the parts that should have one-to-one relationships first, so next I'll add the joins and the fields that will get me all the sales-rep-associated information.
Suppose I only want one rep per meeting (if there are multiple reps, I only want the main one), so I check to make sure that I'm still returning the same number of records as when I just had meeting information. If not, I look at my joins and decide which one is giving me more records than I need. In this case it might be the address table, as we store multiple addresses for each rep. I then adjust the query to get only one. This may be easy (you may have a field that indicates the specific unique address you want, and so only need to add a where condition) or you may need some grouping and aggregate functions to get what you want.
Then I go on to the next chunk, working first through all the chunks that should have a one-to-one relationship to the central data (in this case the meeting). Run the query and check the data after each addition.
Finally I move to those records which might have a one-many relationship and add them. Again I run the query and check the data. For instance, I might check the raw data for a particular meeting and make sure what my query is returning is exactly what I expect to see.
Suppose that when adding one of these joins I find the number of distinct meetings has dropped. Oops: there is no matching data in one of the tables I just added, and I need to change that join to a left join.
Another time I may find too many records returned. Then I look to see if my where clause needs more filtering or if I need an aggregate function to get the data I want. Sometimes I will temporarily add other fields to the report to see what is causing the duplicated data. This helps me know what needs to be adjusted.
The real key is to work slowly, to understand your data model, and to check the data after every new chunk is added, making sure it returns the results you think it should. The sketch below shows one way to automate that check.
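As a rough Python sketch of that check (the meetings/reps/addresses table names, the connection details, and the mysql-connector-python dependency are all assumptions made up for illustration):
import mysql.connector  # pip install mysql-connector-python

# Each step adds one join; the distinct-meeting count should stay constant.
steps = {
    "base": "SELECT COUNT(DISTINCT m.id) FROM meetings m",
    "with rep": "SELECT COUNT(DISTINCT m.id) FROM meetings m"
                " JOIN reps r ON r.id = m.rep_id",
    "with address": "SELECT COUNT(DISTINCT m.id) FROM meetings m"
                    " JOIN reps r ON r.id = m.rep_id"
                    " JOIN addresses a ON a.rep_id = r.id",
}
conn = mysql.connector.connect(user="user", password="secret", database="mydb")
cur = conn.cursor()
for label, sql in steps.items():
    cur.execute(sql)
    (count,) = cur.fetchone()
    print(label, count)  # a drop or a jump flags a join that needs attention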
Sometimes, if I'm returning a lot of data, I will temporarily put an additional where clause on the query to restrict it to a few items I can easily check. I also strongly suggest using order by, because it will help you see whether you are getting duplicated records.
Well, the best way to break down your MySQL query is to run the EXPLAIN command, as well as to look at the MySQL documentation for optimization with the EXPLAIN command.
MySQL provides some great free GUI tools as well; the MySQL Query Browser is what you need to use.
Running the EXPLAIN command will break down how MySQL interprets your query and display its complexity. It might take some time to decode the output, but that's another question in itself.
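Using it is just a matter of prefixing the query with EXPLAIN. Here is a small Python sketch that prints the plan for a hypothetical join (the connection details and table names are placeholders, and mysql-connector-python is assumed):
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(user="user", password="secret", database="mydb")
cur = conn.cursor()
cur.execute("EXPLAIN SELECT m.id, r.name FROM meetings m JOIN reps r ON r.id = m.rep_id")
print([col[0] for col in cur.description])  # id, select_type, table, type, key, rows, ...
for row in cur.fetchall():
    print(row)  # one plan row per table in the join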
As for a good book I would recommend: High Performance MySQL: Optimization, Backups, Replication, and More
I haven't used them myself, so I can't comment on their effectiveness, but perhaps a GUI-based query builder such as dbForge or Code Factory might help?
And while using Venn diagrams to think about MySQL joins doesn't necessarily help with the SQL itself, they can help you visualise the data you are trying to pull back (see Jeff Atwood's post).