Migrating from graphite to graph-explorer - vimeo

The graphite-webapp does not encourage ad-hoc graphing. Graphiti et al. are just fancy UIs that, while they improve the UI/UX, do little about the inherent linear metric search that plagues the graphite-webapp. Correct me if I am wrong, but the only option I have come across that encourages ad-hoc graphing is Graph-Explorer, so I am assuming that Graph-Explorer is the only way ahead.
I currently have some 1000 distinct metrics, named in the following fashion:
stats.beta.pluto.ip-10-0-1-81.helios.pa.v4.reminder.total
stats.beta.pluto.ip-10-0-1-81.helios.pa.v4.reminder.failed
stats.beta.pluto.ip-10-0-1-81.helios.pa.v4.reminder.delivered
stats.dev.ganglia.ip-10-0-3-40.ink.web.pi.notification.android.total
stats.dev.ganglia.ip-10-0-3-40.ink.web.pi.notification.android.failed
stats.dev.ganglia.ip-10-0-3-40.ink.web.pi.notification.android.delivered
I understand that these will become:
metric=stats.env=dev.role=ganglia.server=ip-10-0-3-40.application=ink.endpoint=web.src=pi.metric=notification.what=total
Where do I insert unit and target_type tags?
Similarly, I have 500 timers.
How do I go about migrating from 'proto1' to 'proto2'?
Also where exactly does Carbon-Tagger come into the stack?
Do I rename my metrics at the source level?
Do I modify the structured_metrics/plugins/statsd.py file, given that we have a fixed hierarchy across our distributed infrastructure?
Anything I am missing?
What will I have to change in my statsd? I quote the carbon-tagger documentation: "aggregators like statsd will need proto2 support."

The structured metrics plugins set the tags for proto1 ("old style") metrics; see https://github.com/vimeo/graph-explorer/wiki/Structured-Metrics
If you want to stick with proto1, you just have to create a plugin to tag your metrics; see https://github.com/vimeo/graph-explorer/wiki/Structured-Metrics#writing-your-own-plugins and the existing plugins for examples.
You can basically ignore carbon-tagger if you want to stick with proto1, so 3 is not needed, but otherwise yes. The statsd plugin just converts statsd's internal metrics to proto2.
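As a rough illustration of what such a tagging plugin has to do, here is a standalone Python sketch. It is not the structured_metrics plugin API: the function name tag_metric, the tag name event, and the unit and target_type values are all assumptions made for the example, and the regex only encodes the fixed hierarchy shown in the question.

```python
import re

# Fixed hierarchy from the question:
# stats.<env>.<role>.<server>.<application>.<endpoint>.<src>.<event...>.<what>
PATTERN = re.compile(
    r"^stats\."
    r"(?P<env>[^.]+)\."
    r"(?P<role>[^.]+)\."
    r"(?P<server>[^.]+)\."
    r"(?P<application>[^.]+)\."
    r"(?P<endpoint>[^.]+)\."
    r"(?P<src>[^.]+)\."
    r"(?P<event>.+)\."          # may span several segments, e.g. notification.android
    r"(?P<what>total|failed|delivered)$"
)


def tag_metric(name):
    """Return a tag dict for a dotted metric name, or None if it does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    tags = match.groupdict()
    # Placeholder values (assumptions): set these to what the counters really measure.
    tags["unit"] = "Msg"
    tags["target_type"] = "count"
    return tags


if __name__ == "__main__":
    print(tag_metric("stats.beta.pluto.ip-10-0-1-81.helios.pa.v4.reminder.failed"))
```

Because unit and target_type are attached after the match, they do not have to appear anywhere in the old metric name, which is presumably why a plugin can add them without renaming anything at the source.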

Status of in-place `rfft` and `irfft` in Julia

So I'm doing some hobby-related stuff which involves taking Fourier transforms of large real arrays which barely fit in memory, and was curious to see if there was an in-place version of rfft and irfft that saved RAM, since RAM consumption is important to me. These transforms are possible despite the input-vs-output-type mismatch, and require an extra row of padding.
In Implement in-place rfft! and irfft!, Tim Holy said he was working on an in-place rfft! and irfft! that made use of a buffer-containing RCpair object, but then Steven Johnson said that he was implementing something equivalent using A_mul_B!(y, plan, x), which he elaborated on here.
Things get a little weird from then on. In the documentation for both 0.3.0 and 0.4.0 there is no mention of A_mul_B!, although A_mul_B is listed. But when I try entering them into Julia, I get
A_mul_B!
A_mul_B! (generic function with 28 methods)
A_mul_B
ERROR: A_mul_B not defined
which suggests that the situation is actually the opposite of what the documentation currently describes.
So since A_mul_B! seems to exist, but isn't documented anywhere, I tried to guess how to test it in-place as follows:
A = rand(Float32, 10, 10);
p = plan_rfft(A);
A_mul_B!(A,p,A)
which resulted in
ERROR: `A_mul_B!` has no method matching A_mul_B!(::Array{Float32,2}, ::Function, ::Array{Float32,2})
So...
Are in-place real FFTs still a work in progress? Or am I using A_mul_B! wrong?
Is there a mismatch between the 0.3.0 documentation and 0.3.0's function library?
That pull request from Steven Johnson is listed as open, not merged; that means the work hasn't been finished yet. The one from me is closed, but if you want the code you can grab it by clicking on the commits.
The docs indeed omit mention of A_mul_B!. A_mul_B is equivalent to A*B, and so isn't exported independently now. A_mul_B! would be used like this: instead of C = A*B, you could say A_mul_B!(C, A, B).
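To make the calling convention concrete, here is a small analogy in plain Python rather than Julia. The function mul_into is hypothetical and only mirrors the argument order described above: the preallocated output comes first and is overwritten instead of a new matrix being returned.

```python
def mul_into(out, a, b):
    """Write the matrix product a*b into the preallocated list-of-lists `out`.

    `out` must not be the same object as `a` or `b`, because entries of the
    inputs are still being read while `out` is being overwritten.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    for i in range(rows):
        for j in range(cols):
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[0, 0], [0, 0]]   # allocated once by the caller, reused on every call
    mul_into(c, a, b)
    print(c)
```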
Can you please edit the docs to fix these issues? (You can edit files here in your web browser.)

What data format is this?

I was checking the AJAX response of a share trading site, and below is what showed up in the Response tab of Firebug's XHR section. Can anyone explain to me what format this is and how it is parsed?
<ST=tat>
<SI=0>
<TB=txtSearch>
<560v=Tata Motors Ltdv=TATMOT>
<566v=Tata Steel Ltdv=TATSTE>
<3199v=Ashram Online.com Ltdv=ASHONL>
<4866v=Kreon Finnancial Services Ltdv=KREFIN>
<552v=Tata Chemicals Ltdv=TATCHE>
<554v=Tata Power Company Ltdv=TATPOW>
<2986v=Tata Metaliks Ltdv=TATMET>
<300v=Tata Sponge Iron Ltdv=TATSPO>
<121v=Tata Coffee Ltdv=TATCOF>
<2295v=Tata Communications Ltdv=TATCOM>
<0v=Time In Milli-Secondsv=0>
I think what we are dealing with here is some proprietary format, likely an Eldritch SGML Horror of some sort.
Banking in general has all sorts of Eldritch horrors running about.
On a related note, this is very much not XML.
Edit:
A quick analysis* indicates that this is a format consisting of a series of statements bracketed by <>, with the parts of each statement separated by = or v=. The = seems to indicate a parameter to a control statement, which is identified by a two-letter code (<ST=tat>), while v= seems to indicate an assignment or coupling of some kind (short for "value"?), or perhaps just a field separator.
<ST appears to be short for "search term"; <TB appears to be short for "(source) table". The meaning of <SI eludes me. It is possible that <TB terminates the metadata section, but it's equally possible that the metadata section has a fixed number of terms.
As nothing refers to the number of fields in each statement in the data section, and they are all of the same length (3 fields), it is likely that the number of fields is fixed, but it might derive from the value of <TB, or even <SI, in some way.
What is abundantly clear, however, is that this data is not intended for consumption by applications other than the one that supplies it.
*Caveat: Without a much larger sample it's impossible to tell if this analysis is valid.
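Under the same caveat, the analysis above could be turned into a parser along these lines. This is only a sketch based on the single sample in the question: the function name parse_response is made up, and it assumes that v= never occurs inside a field value.

```python
import re


def parse_response(text):
    """Split a response into control statements and data records."""
    controls, records = {}, []
    # Every statement is whatever sits between one "<" and the next ">".
    for body in re.findall(r"<([^<>]*)>", text):
        if "v=" in body:
            # Data statement: fields separated by the literal "v=".
            records.append(body.split("v="))
        else:
            # Control statement: two-letter code, "=", parameter.
            code, _, value = body.partition("=")
            controls[code] = value
    return controls, records


if __name__ == "__main__":
    sample = "<ST=tat>\n<SI=0>\n<TB=txtSearch>\n<560v=Tata Motors Ltdv=TATMOT>\n"
    controls, records = parse_response(sample)
    print(controls)
    print(records)
```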
It is not a commonly used "web format".
It is probably a proprietary format used by that site and will be parsed by their custom JavaScript.

When could CSV records *not* have the same number of fields?

I am storing a series of events to a CSV file; each event type comes with a different set of data.
To illustrate, say I have two events (there will be many more):
Running, which has a data set containing speed and incline.
Sleeping, which has a data set containing snores.
There are two options to store this data in CSV records:
Option A
Storing each possible item of data in its own field...
speed, incline, snores
therefore...
15mph, 20%,
, , 12
16mph, 20%,
14mph, 20%,
Option B
Storing each event in its own record...
event, value1...
therefore...
running, 15mph, 20%
sleeping, 12
running, 16mph, 20%
running, 14mph, 20%
Without a specific CSV specification, the consensus seems to be:
Each record "should" contain the same number of comma-separated fields.
Context
There are a number of events which each have a large & different set of data values.
CSV data is to be of use to other developers (I will/could/should/won't use either structure).
The 'other developers' are expected to be toward the novice end of the spectrum and/or using resource-limited systems. CSV is accessible.
The CSV format is being provided non-exclusively, as a feature rather than a requirement. However, if the application provides a CSV file, it should be provided in the correct manner from now on.
Question
Would it be valid, in this case, to go with Option B?
Thoughts
Option B maintains a level of human readability, which is an advantage if the CSV is read by a human rather than a program. Neither method is more complex to parse with a custom parser, but will Option B void the usefulness of the CSV format with other libraries, frameworks, applications et al.? With Option A, future changes/versions to the data set of an individual event may break the CSV structure (zombie , , to maintain forwards compatibility), whereas Option B will fail gracefully.
edit
This may be aimed at students and frameworks like OpenFrameworks, Plask, Processing et al., where CSV is easier to implement.
The "other frameworks, libraries and applications" I've used all handle CSV parsing differently, so trying to conform to one or many of these standards might over-complicate your end result. My recommendation would be to keep it simple and use what works for your specific task. If human readability is a requirement, then CSV in the form of Option B would work fine. Otherwise, you may want to consider JSON or XML.
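As one data point, Python's standard csv module reads records of differing lengths without complaint, so Option B can be consumed by dispatching on the first field. The sketch below assumes the two events from the question; the FIELDS table and the function name read_events are made up for the example.

```python
import csv
import io

# Expected value names per event type (taken from the question's two events).
FIELDS = {
    "running": ("speed", "incline"),
    "sleeping": ("snores",),
}


def read_events(text):
    """Parse Option B records: the first field names the event, the rest are its values."""
    events = []
    # skipinitialspace drops the blank that follows each comma in the sample data.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        kind, values = row[0], row[1:]
        names = FIELDS[kind]
        if len(values) != len(names):
            raise ValueError(f"{kind}: expected {len(names)} values, got {len(values)}")
        events.append((kind, dict(zip(names, values))))
    return events


if __name__ == "__main__":
    sample = "running, 15mph, 20%\nsleeping, 12\nrunning, 16mph, 20%\n"
    for event in read_events(sample):
        print(event)
```

The per-event length check is what gives the "fail gracefully" behaviour: a record whose field count does not match its event type is reported instead of being silently misread.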
As you say, there is no "CSV Standard" with regard to contents. The real answer depends on what you are doing and why. You mention "other frameworks, libraries and applications". The one thing I've learnt is "Don't over-engineer", i.e. don't write reams of code today on the assumption that you will plug it into some other framework tomorrow.
I'd say option B is fine, unless you have specific requirements to use other apps etc.
< edit >
Having re-read your context, I'd probably pick one output format and use it, and forget about having multiple formats:
Having multiple output formats is a source of inconsistency (e.g. bug in one format but not another).
Having multiple formats means more code that needs to be
tested
documented
supported
< /edit >
Is there any reason you can't use XML? Yes, it's slightly more difficult to parse, at least for novices, but if so they probably need the practice. File size would be much greater, of course, but it's compressible.

Tesseract-Job: how to parse an image in order to get the information out of it

Good morning.
First of all: this is the most impressive community I have ever seen!
For several days I have been thinking about a three-part job:
a. getting
b. parsing
c. storing a number of pages.
Two days ago I thought that getting the pages would be the major task. That is not the case; I now guess that the parsing will be the heroic part, because each of the pages to be parsed is a PNG image.
So the question is: once I have fetched them all, how do I parse them? I guess there are some Perl modules out there that can help with this.
I think this job can only be done with some OCR embedded. Question: is there a Perl module that can be used to support this task?
BTW: see the result pages.
BTW: since I expect to find all 790 result pages within the range between Id=0 and Id=100000, I thought I could fetch them with a loop:
http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html
I thought I could go the Perl way, but I am not very sure:
I was trying to use LWP::UserAgent on the URLs above with different query arguments, and I am wondering whether LWP::UserAgent provides a way to loop through the query arguments. I am not sure that LWP::UserAgent has a method for that. I have sometimes heard that it is easier to use Mechanize, but is it really easier?
But to be frank, the first task, GETTING all the pages, is not very difficult compared with the parsing. How can the parsing be done?
Any ideas or suggestions?
I look forward to hearing from you...
zero
You do not need a Perl module, you only need the system function.
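use File::Slurp qw(read_file);  # provides read_file
# Tesseract appends ".txt" to the output base name, so pass "foo", not "foo.txt".
system(qw[ tesseract.exe foo.png foo ]) == 0 or die "tesseract failed: $?";
my $text = read_file('foo.txt');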
You may need to preprocess the images to help Tesseract, say using ImageMagick like:
system qw[ convert.exe -resize 200% image.jpg foo.png ];
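For the fetching half of the question, the loop over Id values does not need special library support, since the query string can simply be rebuilt for each Id. Here is a minimal sketch in Python rather than Perl. The parameter names come from the URLs in the question, the helper names detail_url and candidate_urls are made up, and using InterfaceLanguage=1 for every page is an assumption.

```python
from urllib.parse import urlencode

BASE = "http://www.foundationfinder.ch/ShowDetails.php"


def detail_url(record_id, language="1"):
    """Build the detail-page URL for one Id."""
    query = urlencode({"Id": record_id, "InterfaceLanguage": language, "Type": "Html"})
    return f"{BASE}?{query}"


def candidate_urls(first_id, last_id):
    """Yield one URL per Id in the inclusive range first_id..last_id."""
    for record_id in range(first_id, last_id + 1):
        yield detail_url(record_id)


if __name__ == "__main__":
    for url in candidate_urls(927, 929):
        print(url)
```

Each URL can then be fetched with any HTTP client. Most Ids in the range will presumably not correspond to one of the 790 pages, so the fetching code has to tolerate misses.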

Listing of All MySQL Data Types and Syntax For All Settings

I'm looking for a listing of all MySQL data types and the available settings for each option for each data type.
After a bit of googling I couldn't find anything quite like that.
Here you can find a quick summary of MySQL data types, with range, attributes and default value.
For completeness' sake, don't forget the MySQL documentation.
Although the list is broken across multiple pages, often with a lot of commentary in between, it's a useful resource when you need to check some aspect of a particular type. There are also overviews of the basic types, but again, there's a lot of cruft mixed in with it.
If anyone ever needs them as a JSON array:
["TINYINT[(M)]", "SMALLINT[(M)]", "MEDIUMINT[(M)]", "INT[(M)]", "BIGINT[(M)]", "FLOAT(p)", "FLOAT[(M,D)]", "DOUBLE[(M,D)]", "DECIMAL[(M,[D])]", "BIT[(M)]", "CHAR[(M)]", "VARCHAR(M)", "TINYTEXT", "TEXT", "MEDIUMTEXT", "LONGTEXT", "BINARY[(M)]", "VARBINARY(M)", "TINYBLOB", "BLOB", "MEDIUMBLOB", "LONGBLOB", "ENUM(\"A1\",\"A2\",...)", "SET(\"A1\",\"A2\",...)", "DATE", "DATETIME", "TIME", "TIMESTAMP", "YEAR"]