My client wants me to implement an ETL process using Alteryx, as they have a license for it. I am unsure whether Alteryx is an ETL tool or not. I believe Alteryx is commonly used to prepare data for the Tableau data visualization tool.
Please advise whether it's an ETL tool or not. How does it differ from SSIS?
Thanks,
Alteryx is a data preparation / advanced analytics application. People use it in many different ways because it supports data preparation, spatial analytics and predictive analytics.
I work with many clients who choose to use Alteryx purely for its ETL capabilities moving data from one database to another, e.g. I have worked with one client who has used Alteryx to automate their loads into their Amazon Redshift database from MySQL, another who is using SQL -> Tableau data engine, and many other examples involving a range of data inputs (Alteryx supports everything from custom APIs -> Excel).
If you're already working with SSIS then you'll find Alteryx a breath of fresh air, to be honest. I worked with SSIS in a past life and have since found Alteryx to be much faster to develop with. It is more forgiving of changes to data and allows tighter integration of many different data sources. The new in-database tools give a much tighter integration with SQL than was previously possible, allowing the work to be done inside the database.
Finally, compared to SSIS, I think you'll find Alteryx very simple to learn. The online training videos on their site will give you as much introduction as you need.
Enjoy, I think you'll enjoy the experience.
Chris
Alteryx can be used for ETL as long as you have an Alteryx Server. I've used it for a number of use-cases especially between cloud & database.
Some things that in my personal opinion make it clearly superior to SSIS:
If input has column names (from database or from csv file with headers), it handles unexpected new columns or column order changes automatically, without requiring you to change the flows at all.
You can build flows as "macros" which you can then unit test completely independently of your source/destination databases (try that in SSIS..)
Ability to drop a browse tool anywhere in the flow and effectively debug.
Built-in assertions using "Test" tools.
Flows are runnable from the command line on a server. The easiest way I've found (besides using Alteryx's own scheduler) is to save the flow as an "App" and then run it from the command line using the Alteryx engine executable, passing it parameters via an xml file. You can save a sample xml parameter file from your flow by hitting the magic wand button (after saving the flow as a .yxwz (app)). This brings up a panel that lets you set the variables, and that panel has a handy "save" button which generates an xml file in the right format.
Within the flows themselves, parameterise things like environment settings either via action tools or module level parameters (User.*) - you can then for example set a database server on an input using %User.[Your variable name]% in the field.
Error logs are generally excellent (identify the tool that failed, useful error messages), and command line throws useful errorlevel numbers, so pretty trivial to schedule with some third party scheduler (or just use the Alteryx Server's own scheduler).
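The point about exit codes can be sketched as a small wrapper that a third-party scheduler could call. This is only an illustration, not Alteryx's documented interface: `run_job` is a hypothetical helper, the command list is whatever engine command line you actually use, and the demo substitutes the Python interpreter so the snippet runs anywhere.

```python
import subprocess
import sys


def run_job(command):
    """Run a command-line job and return (exit_code, stderr_text).

    `command` is a list such as [engine_executable, app_path, params_xml];
    the real executable name and arguments are left to the caller.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stderr


if __name__ == "__main__":
    # Stand-in for a real workflow run: a child process that exits with code 3.
    code, err = run_job([sys.executable, "-c", "import sys; sys.exit(3)"])
    if code != 0:
        print(f"job failed with exit code {code}")
```

A scheduler only needs the non-zero exit code to mark the run as failed; the captured stderr can be written to its own log.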
Obviously if you need to do any serious data manipulation, pivoting etc, then it's hands down the easiest tool I've used.
Yes, Alteryx is an ETL and data wrangling tool, but it does a lot more than pure ETL. Alteryx wraps up pre-baked connectivity options (Experian / Tableau etc.) alongside a host of embedded features (like data mining, geospatial, data cleansing) to provide a suite of tools within one product.
If all you are looking for is basic a->b ETL mapping, and you don't need the additional features that Alteryx has, a cheaper product like SSIS would tend to be more than sufficient.
Alteryx is a data mining workbench, and ETL is often a big part of the data mining process. Alteryx has plenty of ETL tools/capabilities, and much more too. I haven't used SSIS in ages, certainly not since acquiring Alteryx.
Cate
Alteryx has three basic capabilities: ETL, advanced analytics and reporting.
The part I like best is the advanced analytics, but ETL is also there. So I consider it a complete analytics tool that covers everything from ETL to reporting. I used to connect it to data stored on magnetic tapes.
Related
I work for a web hosting company looking to integrate different data sources with BigQuery. The question now is what would be an ideal reporting/BI tool for getting the data out of BigQuery so that retrieval, analysis and reporting are fast and easy.
I'm looking into the options suggested by Google here: https://cloud.google.com/bigquery/partners/ but I was wondering if someone out there has more hands-on experience and could make a recommendation.
The company works with a MySQL-based billing system (with client, support and service data), which is the main source of information, along with chat, CMS and in-house systems that provide other data used to maintain the web infrastructure the business depends on.
Thank you.
It's really hard to answer this. Depends on the personnel you have at hand.
For idea validation we mostly use Data Studio.
Some of our personnel know Tableau, but once you are outside GCP everything becomes slow: queries and interface updates take 30-60 seconds, because those tools all relay the data and store their own copy of it.
We have wired some data to ElasticSearch as well, and we use Kibana.
But once it's all validated, we consolidate the reports into our own dashboards, mainly because we are mostly developers and can do the programming. If you have a data analyst or data scientist with their own tools, let them use what they are comfortable with.
Always iterate and version. As a developer you should be driven by a good product manager who tells you exactly which charts to build.
As a front-end developer I have limited knowledge of databases, but recently we started to develop a CRM application.
My question is: how feasible is it to migrate from one database to another? Let's say our application now supports MySQL, but later a customer comes along with IBM's DB2 or SQLite. What do we need to take care of during development to support easy migration?
How will the cloud help to solve my problem?
Just keep your data model separate from actual database calls and you should be good. Use a database abstraction layer in your model to make calls to the database. You'll only have to change the bottom layer for specific databases.
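A minimal sketch of that separation, assuming a hypothetical `CustomerStore` interface and using the standard-library `sqlite3` module as the only backend. The class, table, and function names are illustrative and not taken from any particular framework.

```python
import sqlite3
from abc import ABC, abstractmethod


class CustomerStore(ABC):
    """Interface the application codes against (hypothetical name)."""

    @abstractmethod
    def add(self, name): ...

    @abstractmethod
    def names(self): ...


class SqliteCustomerStore(CustomerStore):
    """SQLite-backed bottom layer; a MySQL or DB2 version would
    implement the same two methods with its own driver."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer "
            "(id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL)"
        )

    def add(self, name):
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))

    def names(self):
        rows = self._conn.execute("SELECT name FROM customer ORDER BY name")
        return [row[0] for row in rows]


def register_customers(store, names):
    """Application logic: depends only on the CustomerStore interface."""
    for name in names:
        store.add(name.strip())
    return store.names()
```

Driver-specific details, such as the `?` placeholder style used by `sqlite3`, stay inside the backend class, so swapping databases means writing one new class rather than touching `register_customers`.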
Some best practices:
Avoid DBMS-specific features, data types and SQL/DDL constructs; keep to the SQL-92 standard. Test against e.g. SQLite, which keeps rather close to the standard.
Use an Entity Relationship Modeling tool that supports exporting DDL files for all targeted DBMS, or to standard SQL. Or write and maintain your DDL scripts by hand. Vendor specific tools usually don't do this.
Use the existing SQL abstraction layer that comes with your language/toolkit/environment, or implement one with an eye on portability (which reinvents the wheel another time).
Keep the logic in your application; the DB is for data only. Avoid triggers, stored procedures etc.
Generally apply the KISS principle to your data storage.
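As a rough illustration of the practices above, the sketch below keeps the schema to plain, widely supported SQL, tests it against an in-memory SQLite database, and puts validation in application code instead of a trigger. The `contact` and `note` tables and the `add_note` helper are made up for the example.

```python
import sqlite3

# Schema restricted to widely supported SQL: plain types, explicit keys,
# no vendor-specific auto-increment syntax, triggers, or stored procedures.
PORTABLE_SCHEMA = """
CREATE TABLE contact (
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255)
);
CREATE TABLE note (
    id INTEGER NOT NULL PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contact (id),
    body VARCHAR(1000) NOT NULL
);
"""


def build_test_db():
    """Create an in-memory SQLite database from the portable schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PORTABLE_SCHEMA)
    return conn


def add_note(conn, note_id, contact_id, body):
    """Validation lives in the application, not in a trigger."""
    if not body.strip():
        raise ValueError("note body must not be empty")
    exists = conn.execute(
        "SELECT COUNT(*) FROM contact WHERE id = ?", (contact_id,)
    ).fetchone()[0]
    if not exists:
        raise ValueError(f"unknown contact {contact_id}")
    conn.execute(
        "INSERT INTO note (id, contact_id, body) VALUES (?, ?, ?)",
        (note_id, contact_id, body),
    )
```

Passing against SQLite does not guarantee the same DDL runs unchanged on MySQL or DB2, but it catches many vendor-specific constructs early.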
You may get more help on specific questions about general/abstract issues (not the implementation details, which belong here) over at Programmers.
I am currently doing a project in Ruby on Rails and I have been presented with a dilemma.
The dilemma is that the users of my system will be uploading an Excel spreadsheet. Should I read straight from this spreadsheet into my front-end, or should I load it into my MySQL database and serve the front-end from there?
I have asked numerous people about this issue and have researched on-line to no avail.
Any help would be much appreciated.
The Excel file is not a database. If you need to allow it as source input, parse it, copy the data into a real database and connect to it.
The database is more flexible and efficient for querying and processing information.
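A hedged sketch of the parse-then-load approach. To stay dependency-free it assumes the uploaded sheet has already been converted to CSV text, uses `sqlite3` as a stand-in for MySQL, and invents the `sku` and `quantity` columns purely for illustration.

```python
import csv
import io
import sqlite3


def load_upload(conn, csv_text):
    """Parse uploaded rows and copy them into a table; returns rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock "
        "(sku VARCHAR(50) NOT NULL, quantity INTEGER NOT NULL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["sku"].strip(), int(r["quantity"])) for r in reader]
    with conn:  # one transaction for the whole upload
        conn.executemany("INSERT INTO stock (sku, quantity) VALUES (?, ?)", rows)
    return len(rows)


def total_quantity(conn, sku):
    """Once loaded, the front end queries the database, not the file."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM stock WHERE sku = ?", (sku,)
    ).fetchone()
    return row[0]
```

Because the whole upload is inserted in one transaction, a malformed row (for example a non-numeric quantity) raises before anything is written.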
I can think of two benefits, or rather options, of having them upload the excel spreadsheet for processing by your back end.
1) would be for your tracking purposes (who sent what and here is what the back-end did with it...). In fact consider that other formats/versions could be introduced, would it be important to keep them to identify what went wrong? "How can we handle this new format"?
2) On the other side, the front-end way that is, you offload processing from the back-end, but that means the browser app could get fairly complex, and depending on your Excel file (that is, if it has many relationships), sending that data up to the server could be complex. However, if it is simply a flat spreadsheet, say simple rows without totals/tax calc/..., then there might be an advantage in loading it into the browser and then sending these rows up to the server, if offloading processing is of any importance.
However point 2 really is diluted by point 1, which to me would be of greater importance for future migration of this service. So I personally would choose uploading it and processing on the back end.
Update
As you clarified in the comments, you are asking about using Excel on the backend as a database. I would agree with Simone Carletti's answer here. I would just add that a real database gives you much more flexibility, more tools, and more performance. The difference is between loading a file, parsing it into some structure and then saving it (unless you are using some .NET framework, and even then), versus a database (MySQL, MongoDB...) that gives you much more flexibility in structuring and querying, at the cost of the headache of managing DB connections. You might just want to write a sample of both to evaluate; the DB solution will probably win you over.
I'm investigating options for reporting on data in a custom Salesforce application, since the built-in reporting tool is a bad joke.
The requirements are that the data needs to be accessible on-demand through the Salesforce website (likely through a web-tab, visualforce page, etc.), and must be able to do arbitrary joins of the tables, like ANY other relational database reporting tool. It is a huge plus to be able to give much of the specific report-design power to the end user, as well. Ideally it would play well with Oracle if an external DBMS is required, though this is not a strict requirement.
I hear good things about MS SQL Reporting Services, and there has been some talk around here about Crystal Reports. I'd be much obliged to get any thoughts and opinion on the various options and approaches out there.
It may be worth looking at tools similar to Teiid. It provides a standard SQL/JDBC interface to any data source, including Salesforce. That means you can then use any reporting tool. It also allows you to join across data sources etc.
I'm glad you call the current Salesforce tool a joke! :)
As for reporting, we use Pentaho from the open source world, which is a very powerful tool but does take some learning. Of course, the final decision won't just come down to functionality but to cost too, and this is where Pentaho is likely to win hands down. Pentaho plays very well with Oracle and MySQL (and many more databases).
Finally you probably want to nail down your requirements a bit more. Do you need plain reporting, dashboards, more advanced analysis? Data mining? How far do you need to go..
After looking at a lot of questions, I found no real answer for this.
I redesigned a database for our customer.
With Microsoft Access I found a good tool for getting old table data into my new, well-formed database structure. It is really easy but takes a lot of time (because the old data has to be handled with a lot of care).
Are there any open source tools that offer facilities like those of Microsoft Access?
To clear it up: I "just" want to reorganize old Firebird database data in a new "best-practice" way.
Edit:
It would be really nice if I could get a log file or something similar to have some documentation of the changes.
Update:
After checking some of the tools on that Wikipedia page, I found no real logging mechanism.
How do you document the changes to a database? Simply by writing them down?
Result:
So I didn't get a real answer, and I am still searching for a nice tool. Thank you for the hints and your thoughts on this question. I want to award the bounty to Kenneth Cochran because he pointed me to ETL. Thank you!
Talend's Open Source ETL supports Firebird. Very cool tool.
http://www.talend.com/download.php?src=DataGovernanceBlog
It sounds like what you're asking for is an ETL (extract, transform, load) tool.
Wikipedia has a list of open source tools that may help with this. I've not used any of them personally.
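For readers new to the term, the three ETL stages can be sketched in a few lines. This is a toy example, not how any of the listed tools work internally: `sqlite3` stands in for Firebird, and the `old_customer` and `customer` tables are invented for illustration.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the legacy table."""
    return conn.execute("SELECT full_name, city FROM old_customer").fetchall()


def transform(rows):
    """Transform: trim whitespace, normalise city case, drop duplicates."""
    cleaned = {(name.strip(), city.strip().title()) for name, city in rows}
    return sorted(cleaned)


def load(conn, rows):
    """Load: write the cleaned rows into the new structure."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(full_name VARCHAR(100) NOT NULL, city VARCHAR(100) NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO customer (full_name, city) VALUES (?, ?)", rows
        )
    return len(rows)
```

Chaining the stages as `load(conn, transform(extract(conn)))` moves the data in one pass; a dedicated ETL tool adds connectors, scheduling and error handling around the same idea.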
Well, I used the Pentaho suite for doing ETL using their Kettle tool.
It's quite easy to use and should be more than enough for what you intend.
And it's open source.
Take a look at it.
I advise you to use a tool like IBExpert or Database Workbench, which are the best tools for Firebird.
For migrating Firebird 1.5 to Firebird 2.1: you just have to make a backup of your database with the Firebird 1.5 server and restore it with the Firebird 2.1 server.
I've used Excel in the past to document data model changes. Each worksheet was named after the application version in order to sync with our tags in CVS. Everything was logged in it: columns that were removed as well as minor alterations to datatypes, like varchar(10) to varchar(20), along with a note describing why the change was made.
Personally, I've only ever scripted things like these as DDL/DML scripts broken into a script that dealt with table creation, constraint dropping, index drops, DML script(s), constraint application, index application, and removing orphaned tables.
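One possible way to combine scripted migrations with the change log asked about in the question is to have the runner record each step as it is applied. The sketch below is only an assumption about how that could look: `sqlite3` stands in for Firebird, and the step list, table names and `change_log` layout are all illustrative.

```python
import sqlite3
from datetime import datetime, timezone

# Ordered migration steps: (description, SQL). Contents are illustrative.
STEPS = [
    ("create new person table",
     "CREATE TABLE person (id INTEGER NOT NULL PRIMARY KEY, "
     "name VARCHAR(100) NOT NULL)"),
    ("copy data from legacy table",
     "INSERT INTO person (id, name) SELECT id, name FROM legacy_person"),
    ("drop orphaned legacy table",
     "DROP TABLE legacy_person"),
]


def migrate(conn, steps):
    """Run each step in order and record it in a change-log table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS change_log "
        "(step INTEGER, description VARCHAR(200), applied_at VARCHAR(40))"
    )
    for number, (description, sql) in enumerate(steps, start=1):
        conn.execute(sql)
        conn.execute(
            "INSERT INTO change_log (step, description, applied_at) "
            "VALUES (?, ?, ?)",
            (number, description, datetime.now(timezone.utc).isoformat()),
        )
    conn.commit()
    rows = conn.execute("SELECT description FROM change_log ORDER BY step")
    return [row[0] for row in rows]
```

Keeping the step list in version control alongside the application gives the same traceability as the spreadsheet approach, with the log written automatically.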
If you want a basic ETL tool that is client-based (and cheap at $300), look at Advanced Query Tool. It mainly queries any type of ODBC connection (including Excel files set up that way), but also has some extended features, including moving data, and it has a command line interface. http://www.querytool.com/
I've used it instead of Informatica for one-off jobs, but I've also used it for a few months to extract from Excel to another file for business users, scheduled from my desktop.