The problem of metrics/report generation - MySQL

We store a lot of metrics from our service (approx 80 million events).
We have to generate reports based on the data.
My question is rather general: what tools do you use for your metrics/reporting needs? Is there anything you would recommend?
We use Apache to write the log files, back-process them into the DB, and run a daily MySQL script to generate the reports.
Many thanks,

SenSage. Expensive and worth it.

It really depends on the type of reports you are generating.
If you are handling things at the level of mean/count/sum/deviation/min/max of a population, there's a variety of tools that can handle what you want. Open-source tools like Graphite/Statsd work well (Statsd can also give you things like the quantiles of a population). There are commercial tools like my company's Instrumental product which provide some of the same features, as well as niceties like a good UI and a well-documented API.
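If you go the Graphite/Statsd route, the instrumentation side is tiny. Here is a minimal sketch using the Python statsd client (the daemon address and metric names are illustrative):

```python
import statsd

# Assumes a statsd daemon on localhost:8125 (the default),
# flushing aggregated metrics through to Graphite.
client = statsd.StatsClient("localhost", 8125, prefix="myservice")

def handle_event():
    client.incr("events")               # count of events
    with client.timer("handle_time"):   # statsd derives mean/count/percentiles
        ...  # actual event handling goes here
```

Statsd aggregates on a flush interval, so you get counts, timings, and quantiles without having to store each of the 80 million raw events yourself.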
Depending on the nature of your data, there are other services that offer more specific report types, like cohort analysis (see Mixpanel). Services like Splunk can also let you create some great ad hoc reports over live data, generated from your application logs and other data sinks.


Amazon MWS and Microsoft Access for a DB Layman?

I have some experience in MS Access, but mostly only as an offline DB tool.
I have begun working with both Seller and Vendor Central at my new company, and am in charge of scrubbing the vast amounts of data for trends and whatnot. At the moment our company is relying solely on exporting reports directly from Seller Central and cross-referencing documents. I was hoping to get us started with a rudimentary database hooked into Seller Central directly. Our company already has an MWS Developer ID, and I see an MWS Access Key and whatnot.
I'm surprised not to find any resources on how I should actually connect MWS to Access. I feel confident that I can find some success by dabbling with the API once it's connected, but I can't actually find any references on how to establish that connection.
Any resources you guys can forward me? Maybe I'm searching for the wrong terms. Everything I search just comes up with data service companies advertising their tools.
Well, the interface to MWS is going to be web-service based, and Access unfortunately does not have a built-in web services interface.
So, your choices are:
Write some VBA code to consume the MWS web services. Web services are just that: a web API (likely REST services, and REST is, at heart, just a fancy term for requesting a given URL).
So, what should you search for? Something like: how can I consume web-based data in Access?
See, for example, this answer on SO:
Making a SOAP request from Access 2007
The main issue is that Access does not have really good tools for consuming web data.
However, most web "store" applications tend to have a user area from which you can export the daily sales or other data to, say, CSV. You can then import that data into Access (or Excel).
They often have a report area as well: you generate a report, then download it in some format like XML or CSV (and again, import it into Access or Excel).
Don't want to import the data manually? Then you have to code out the web requests, and that can be painful.
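To give a flavour of what "coding out the web requests" means, here is a minimal sketch in Python rather than VBA (the URL and fields are invented; a real MWS call also needs signed authentication parameters, per Amazon's documentation):

```python
import csv
import urllib.request

# Hypothetical report endpoint; MWS itself requires your access key,
# a request signature, timestamps, etc.
URL = "https://example.com/reports/daily-sales.csv"

with urllib.request.urlopen(URL) as resp:
    lines = resp.read().decode("utf-8").splitlines()

for order_id, sku, amount in csv.reader(lines[1:]):  # skip the header row
    print(order_id, sku, amount)
```

The same pattern in VBA means creating an MSXML2.XMLHTTP object, sending the GET, and parsing the response by hand.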
This unfortunately means you can't use, say, a linked table (ODBC) the way you can when pointing Access at some database.
So, you have to start writing web interface code (it will be SOAP or REST).
Believe it or not, there was a SOAP add-in toolkit for Access 2003, but no one used it, so they dropped it. (Of course, 17 years later, a truckload of people finally get it and now see the need to consume web data!)
So, what is your question really about, and what do you need to learn? You are asking how one consumes web services.
Well, using a tool designed to work with web services helps a lot (that's why I suggest Visual Studio and .NET). If they have a WSDL for you, then you can point Visual Studio at it, and it will crank out a set of methods and properties for you (it creates a class; then again, have you written class objects in VBA? VBA does support creating classes, but the SOAP toolkit, no longer available, would have written this code for you).
So, if you want to go beyond their built-in reporting tools (which let you export and download the data in some format like CSV for use with Access or Excel), then you have to write code to make the web calls.
This is not much different from the past. If you wanted some data from the accounting system? Well, you could usually have the accounting package do some export that spits out a CSV file of some sort, and you would then import it into Access.
However, if you had better skills, you might link to the database from Access using ODBC and then write SQL queries against that data. So, it really comes down to skill level. Some people could not be bothered to learn, say, SQL and how to write a query, so they would just export the data out of accounting and import it into Access.
The problem is that now you can't link to a web site and run SQL queries against its data; you have to use web service calls (at least if you want to make some of this process automatic).
So, you might be just fine exporting data files from the MWS services and importing them into Excel or Access. That way you aren't writing any code; you just use the Access GUI to import the data.
But some want to just hit a button in Access and see all the orders and sales from today, with Access pulling that data from the web site in one click.
For simple data pulls? You could make the web call from Access. But for complex web interfaces? Then you need tools that support web interfacing (say, Visual Studio and .NET).
For a simple data pull, I'll use VBA and MSXML.
But if the parameters and the data call are complex? Then I write it in .NET, and THEN expose that code as a library that MS Access can consume.
So, once you have signed up for MWS and whatever web services you need, they will supply you with the web calls and the documentation. You are then free to use your programming tools of choice to interface with them. But this can be quite a bit of work. You might use VBA, but .NET is much better for this type of work (and also quite a bit more difficult to code).
As a developer who has done this, I would write a "sync" program that connects to MWS, pulls back your data, and then inserts that into MS Access. In my case, it was a C# .NET Core app with SQL Server and I used the available MWS SDK that Amazon provides for free to handle all the API calls to MWS. You can create a schedule so your app pulls the data on an interval, or make it manual where you push a button to sync it into your system.
Of course you can use Java or PHP instead of C#, or you can roll your own MWS API calls. Or, as you mention, there are several third-party vendors with out-of-the-box solutions.
I haven't used MS Access in 20 years or so, so I'm not sure about calling MWS directly from it. I would guess it could be done, but it's probably too much work; I could be wrong, though. A .NET app can insert into MS Access no problem, and it can also handle the HTTP calls to MWS for you.
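To make the "sync program" idea concrete, here is a rough Python sketch (the answer above used C#/.NET; the MWS call is a placeholder function, and the insert into Access goes through pyodbc with the Access ODBC driver; the database path and table are hypothetical):

```python
import pyodbc

def fetch_orders_from_mws():
    """Placeholder: call MWS (via the SDK or signed HTTP requests)
    and return rows as (order_id, purchase_date, amount) tuples."""
    return [("111-2223334-5556667", "2020-01-15", 42.50)]

# Connect to a local Access database via the Access ODBC driver.
conn = pyodbc.connect(
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\seller_central.accdb;"
)
cur = conn.cursor()
for order_id, purchase_date, amount in fetch_orders_from_mws():
    cur.execute(
        "INSERT INTO Orders (OrderId, PurchaseDate, Amount) VALUES (?, ?, ?)",
        order_id, purchase_date, amount,
    )
conn.commit()
```

Run it from Windows Task Scheduler for the interval-based pull, or wire it to a button for the manual sync.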

Recommendation for BigQuery Reporting/BI Tool

I work for a web hosting company looking to integrate different data sources with BigQuery. The question now is: what would be an ideal reporting/BI tool for getting the data out of BigQuery, so that retrieval, analysis, and reporting can be done properly, quickly, and easily?
I'm looking into the options suggested by Google here: https://cloud.google.com/bigquery/partners/ but I was wondering if someone out there with more hands-on experience could make a recommendation.
The company works with a MySQL-based billing system (with client, support, and service data), which is the main source of information, along with chat, CMS, and in-house-developed systems that provide the other sources of information needed to maintain the web infrastructure the business depends on.
Thank you.
It's really hard to answer this; it depends on the personnel you have at hand.
For idea validation we mostly use Data Studio.
Some of our people know Tableau, but once you are outside GCP everything becomes a slow process, with queries and interface updates taking 30-60 seconds, since those tools relay the data and store their own copies of it.
We have wired some data into Elasticsearch as well, and we use Kibana on top of it.
But once it's all validated, we consolidate the reports into our own dashboards, mainly because we are mostly developers and can do the programming ourselves. If you have a data analyst or data scientist with their own tools, let them use what they are comfortable with.
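If you do go the "own dashboards" route, the query side is small. A sketch with the google-cloud-bigquery Python client (the project, dataset, and column names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up default GCP credentials

QUERY = """
    SELECT DATE(created_at) AS day, COUNT(*) AS tickets
    FROM `my-project.billing.support_tickets`  -- hypothetical table
    GROUP BY day
    ORDER BY day DESC
    LIMIT 30
"""

for row in client.query(QUERY).result():
    print(row.day, row.tickets)  # feed these into your own charting layer
```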
Always iterate and version; as a developer, you should be driven by a good product manager who tells you exactly which charts to build.

Is Alteryx an ETL tool? How does it differ from SSIS? [closed]

My client wants me to implement an ETL process using Alteryx, as they have a license for it. I am confused about whether Alteryx is an ETL tool or not; I believe Alteryx is commonly used to prepare data for the Tableau data visualization tool.
Please advise: is it an ETL tool or not, and how does it differ from SSIS?
Thanks,
Alteryx is a data preparation / advanced analytics application. People use it in many different ways because it combines data preparation, spatial analytics, and predictive tools.
I work with many clients who choose to use Alteryx purely for its ETL capabilities, moving data from one database to another. For example, I have worked with one client who used Alteryx to automate loads from MySQL into their Amazon Redshift database, another who feeds SQL data into the Tableau data engine, and many other examples involving a range of data inputs (Alteryx supports everything from custom APIs to Excel).
If you're already working with SSIS then you'll find Alteryx a breath of fresh air, to be honest. I worked with SSIS in a past life and have since found Alteryx much faster to develop with. It is more forgiving of changes to data and allows tighter integration of many different data sources. The new in-database tools give a much tighter integration with SQL than was previously possible, allowing the work to be done inside the database.
Finally, compared to SSIS, I think you'll find Alteryx very simple to learn. The online training videos on their site will give you as much introduction as you need.
I think you'll enjoy the experience.
Chris
Alteryx can be used for ETL as long as you have an Alteryx Server. I've used it for a number of use cases, especially moving data between cloud services and databases.
Some things that in my personal opinion make it clearly superior to SSIS:
- If the input has column names (from a database, or from a CSV file with headers), it handles unexpected new columns or column-order changes automatically, without requiring you to change the flows at all.
- You can build flows as "macros" which you can then unit test completely independently of your source/destination databases (try that in SSIS...).
- You can drop a Browse tool anywhere in the flow and effectively debug.
- Built-in assertions using "Test" tools.
- Flows are runnable from the command line on a server; the easiest way I've found (besides using Alteryx's own scheduler) is to save the flow as an "App" (.yxwz) and then run it from the command line using the Alteryx engine executable, passing it parameters via an XML file (see the sketch after this list). You can save a sample XML parameter file from your flow by hitting the magic-wand button after saving it as an app: this brings up a panel that lets you set the variables, and that panel has a handy "save" button which generates an XML file in the right format.
- Within the flows themselves, you can parameterise things like environment settings via Action tools or module-level parameters (User.*); you can then, for example, set a database server on an input using %User.[Your variable name]% in the field.
- Error logs are generally excellent (they identify the tool that failed, with useful error messages), and the command line returns useful errorlevel numbers, so it is pretty trivial to schedule flows with some third-party scheduler (or just use Alteryx Server's own scheduler).
- And obviously, if you need to do any serious data manipulation, pivoting, etc., it's hands down the easiest tool I've used.
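As a sketch of the command-line point above: wrapping the Alteryx engine executable from Python and acting on its errorlevel (the install path, flow, and parameter file names here are hypothetical):

```python
import subprocess

cmd = [
    r"C:\Program Files\Alteryx\bin\AlteryxEngineCmd.exe",  # engine executable
    r"C:\flows\nightly_load.yxwz",        # the flow saved as an App
    r"C:\flows\nightly_load_params.xml",  # parameter file from the wand button
]

result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
    # Non-zero errorlevel: surface the engine log to your scheduler/alerting.
    print(result.stdout)
    raise SystemExit(f"Alteryx flow failed with errorlevel {result.returncode}")
```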
Yes, Alteryx is an ETL and data wrangling tool, but it does a lot more than pure ETL. Alteryx wraps up pre-baked connectivity options (Experian, Tableau, etc.) alongside a host of embedded features (like data mining, geospatial, and data cleansing) to provide a suite of tools within one product.
If all you are looking for is basic A-to-B ETL mapping, and you don't need the additional features that Alteryx has, a cheaper product like SSIS would tend to be more than sufficient.
Alteryx is a data mining workbench, and ETL is often a big part of the data mining process. Alteryx has plenty of ETL tools/capabilities, and much more too. I haven't used SSIS in ages, certainly not since acquiring Alteryx.
Cate
Alteryx has three basic capabilities: ETL, advanced analytics, and reporting.
The part I like best is the advanced analytics, but the ETL is there too. So I consider it a complete analytics tool that runs from ETL all the way up to reporting. I have even used it to connect to data stored on magnetic tape.

Can Tableau be used in customer-facing and SaaS web applications?

I was hoping someone could help me answer a couple of questions regarding Tableau. I am not that familiar with the platform, but I have a client who is looking for a reporting/analytics/data visualization platform that they could use for many of their internal apps (for their employees) and external (customer-facing, with login) applications.
The driver is that each of their internal teams has used many disparate technologies such as SSRS, Crystal, and custom ASP.NET controls (Kendo/Telerik, etc.), but now they have the opportunity to choose a common platform that could serve most or all of their future reporting and data visualization needs for enterprise and customer-facing solutions.
They are looking for a platform that provides everything from simple grids with basic filter/sort/group, all the way to rich charting and ad-hoc reporting with slicing and dicing of data.
They will not always be creating dashboards in these apps since they are customer-facing, but they may want to have dashboards for internal (intranet) apps. They will definitely want the ability to build true internal BI dashboards to report on data from all these online apps across all customers, to whom they provide their SaaS/customer-facing web apps.
One of our main concerns revolves around security of data, as some of these customer-facing web apps are multi-tenant, so we'd need to ensure that data is always filtered by the client tenant ID. We also have a very customized security model, with data-driven roles and permissions that may prevent showing certain types of data (e.g., SSN, salary, etc.).
Does Tableau fit this model? Can it meet most or all of these requirements, or is it meant more for internal data?
It should be quite possible by setting up a reverse proxy to front-end your multi-tenant web application. There is a document on how to set up Apache as a reverse proxy for Tableau, with or without SSL.
I am familiar with configuring Apache as a reverse proxy, so the details here assume the Apache web server; the references below cover how to set up the reverse proxy rules.
There may be documentation for front-ending with IIS or Nginx as well, so you should do some googling yourself.
You need to harden your web server configuration by limiting access from outside the firewall to read-only pages, while internal users can access all pages. Since you mentioned that external users are only allowed read-only access, I presume external traffic will consist almost entirely of GET requests, plus a few PUT/POST requests when users choose to use filters. So you can block external users from any request except GET, with exceptions for the pages that allow applying filters and grouping.
In your multi-tenant application, make sure you refer to the Tableau URLs via the Apache server URL that is exposed to the outside world; if any URL not configured in Apache is used, users will receive an access-denied error. You need to create a role that gives external users read-only access to the Tableau pages. To address multi-tenancy, you need to set a cookie or something similar to identify the tenant, and something similar again to identify the user. To filter out SSNs and other sensitive information you can use mod_proxy_html, which rewrites content, and you can also use Apache's mod_security module to block SSNs and credit card numbers.
References:
Configuring Apache Server as Proxy with Tableau
Apache mod Proxy documentation
Blocking POST requests
mod_security FAQs
Yes to most of your questions -- with just a little fine print.
First, remember Tableau is primarily about visualizing data, so it is great for publishing read-only interactive views of data. If you want to allow end users to edit data, you'll have to do that by other means. Fortunately, the Tableau JavaScript API lets your custom JavaScript code interact closely with Tableau. So if your needs are mostly about visualization, but you want to be able to trigger some custom code to modify data in some of your apps, you should be fine. But Tableau is not designed for creating custom CRUD apps as a rule.
The great thing about Tableau Server is that many people can learn to use it and publish their own visualizations, even if they don't know how to program. That doesn't mean they will win visualization design awards the first time, or that they shouldn't learn something about how databases work if they want good performance. But it does mean the people who know their data best can learn to design and publish their own visualizations without having to wait three months in a backlog queue so the one IT guy can change the color of a button or add a field. It still would be good to have system, database, and visualization folks help train, organize data, set governance and security rules, optimize, etc., but business users can learn to be the ones with hands-on control over how their information is presented. That's a good thing.
The security question has several moving parts, and there are usually good answers from Tableau depending on what you're trying to accomplish. Tableau Server does support multi-tenancy using sites. There is a fairly flexible permissions and group policy system. It can use SAML for authentication, and it has several features for providing access to data specific to the user or tenant. It works with almost every database, and in some cases you can push your security enforcement down to the database server (SQL Server, for instance). There is a trusted ticket feature where you can defer some authorization decisions to another server, say a web portal server, which is useful when Tableau visualizations are embedded in some other web page.
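For the trusted ticket flow just mentioned, the portal-side handshake is small. A minimal sketch (the server URL and view path are hypothetical; /trusted is Tableau Server's trusted authentication endpoint, and the portal's IP must be whitelisted on the Tableau side):

```python
import requests

TABLEAU_SERVER = "https://tableau.example.com"  # hypothetical

def embed_url_for(username: str, view_path: str) -> str:
    """Exchange an already-authenticated username for a single-use ticket."""
    resp = requests.post(f"{TABLEAU_SERVER}/trusted", data={"username": username})
    ticket = resp.text.strip()
    if ticket == "-1":  # Tableau returns -1 when it refuses to issue a ticket
        raise PermissionError(f"No trusted ticket issued for {username}")
    # Redeem the short-lived ticket in the embedded view URL.
    return f"{TABLEAU_SERVER}/trusted/{ticket}/{view_path}"

print(embed_url_for("alice", "views/SalesDashboard/Overview"))
```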
Most security use cases can be supported out of the box, but there are some complex custom access control situations that are currently tricky to implement in Tableau Server. Nothing you've listed sounds out of the normal swim lane, but the only way to know whether your security model is too complex is to dive into the details. Hopefully they will release a custom access control API for users who want to extend it.
At a high level, you certainly can use Tableau to build customer-facing dashboards. You can build and deploy them quickly and, as others have mentioned, you can iframe them and customize most of it with the JavaScript API. But it doesn't provide the complete flexibility of user interaction that you can get with other technologies; the alternative is hand-coding your own framework and using a charting library.
For simple dashboards, Tableau would be the obvious choice if you have already bought core licenses. But looking at what's going on in the industry, Tableau will not be able to fulfill every need.
If you use Tableau:
1. Building charts/tables/visualizations is super simple and efficient.
2. You can expose granular, low-level data to customers: thanks to Tableau's proprietary columnar database engine, you can potentially expose millions of records via a dashboard.
3. You can use Tableau's security and access control mechanisms.
4. As another user mentioned, you can use the trusted ticketing mechanism to integrate easily with other applications (portals, etc.).
Challenges with the Tableau approach:
1. If you have late-arriving transactions (in the Internet world it's very common to mark a click as fraudulent only after a few days), you have to do a full refresh of the extracts, which means that if you are showing, say, 13 months' worth of data, you have to refresh all of it, all the time. And with big data, the business wants all the data all the time, which means you end up extracting millions of records throughout the day.
2. Very little flexibility in user interactions like menus, drop-downs, etc.; you have to work with what Tableau provides.
3. If you have multiple charts on the same dashboard page, there is no user-friendly way to download the underlying data.
4. Many other challenges in laying out visualizations on the dashboard page, as there is no easy way to control the canvas with pixel-level precision, manage white space, etc.
You should analyze your use case very carefully and be sure that Tableau is the right product before you invest in it.
Tableau's primary power comes from its desktop tool for data visualization/exploration and not from pre-built dashboards.
Best of luck.
Since Tableau Public is also based on Tableau, I assume that you can publish your dashboards publicly using your own Tableau infrastructure.

Comparison of reporting engines for Force.com

I'm investigating options for reporting on data in a custom Salesforce application, since the built-in reporting tool is a bad joke.
The requirements are that the data needs to be accessible on demand through the Salesforce website (likely through a web tab, Visualforce page, etc.), and the tool must be able to do arbitrary joins of the tables, like any other relational database reporting tool. It is a huge plus to be able to give much of the specific report-design power to the end user as well. Ideally it would play well with Oracle if an external DBMS is required, though this is not a strict requirement.
I hear good things about MS SQL Reporting Services, and there has been some talk around here about Crystal Reports. I'd be much obliged for any thoughts and opinions on the various options and approaches out there.
It may be worth looking at tools similar to Teiid. What this does is provide a standard SQL/JDBC interface over any data source, including Salesforce. That means you can then use any reporting tool on top of it, and it also allows you to join across data sources.
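As a rough illustration of that idea from Python, using the jaydebeapi bridge to load Teiid's JDBC driver (the VDB name, schemas, credentials, and jar path are all hypothetical):

```python
import jaydebeapi

conn = jaydebeapi.connect(
    "org.teiid.jdbc.TeiidDriver",                     # Teiid's JDBC driver class
    "jdbc:teiid:reporting_vdb@mm://teiid-host:31000",  # hypothetical VDB/host
    ["reportuser", "secret"],
    jars="/opt/teiid/teiid-jdbc.jar",
)
cur = conn.cursor()
# One SQL statement joining a Salesforce-backed view to an Oracle-backed view.
cur.execute("""
    SELECT a.Name, SUM(o.amount)
    FROM sf.Account a JOIN ora.orders o ON o.account_id = a.Id
    GROUP BY a.Name
""")
print(cur.fetchall())
```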
I'm glad you call the current Salesforce tool a joke! :)
As for reporting, we use Pentaho from the open source world, which is a very powerful tool, but it does take some learning. Of course, the final decision won't just come down to functionality, but to cost too, and this is where Pentaho is likely to win hands down. Pentaho plays very well with Oracle, and also with MySQL (and many more databases).
Finally, you probably want to nail down your requirements a bit more. Do you need plain reporting, dashboards, more advanced analysis? Data mining? How far do you need to go?