How to handle very large data with Tableau (MySQL)

All,
I am using Tableau 9.0 for data analysis. My data set is very large, containing 100 billion records.
I want to filter the data first. But when I try to add a filter on a specific column in Tableau, it keeps running... forever. The reason is that Tableau wants to display every value of that field, in ascending order, before letting me make a selection, e.g. to pick only one or two values to filter on...
But it keeps running because of the 100 billion records. How can I solve this? Can I switch off this behavior (displaying all of the field's values)? How do you filter such large data sets?
Thank you in advance

Pause Auto-Updates via the toolbar pause button before dragging a field to the filter shelf (or doing anything else that you don't want to trigger a query refresh). Then either hit Refresh or turn Auto-Updates back on when you actually want to run a query.
For a discrete dimension filter, you can enter a custom value list to avoid the query that fills the list of items in the filter dialog.

You can improve your performance by considering the following tips:
Use a custom SQL query in Tableau to filter the data source down to a smaller working set, e.g. selecting only the columns and date range you actually need (filtering at the backend is an added advantage).
Hide unwanted fields in the data source pane.
Publish your data set to Tableau Server and then connect Tableau to the server-hosted TDE extract.

I don't feel Tableau is the right tool for such a large data set, but check out this article on performance:
http://kb.tableau.com/articles/knowledgebase/database-query-performance

Related

How can I get Access to list available values in column filters when using a non-Access backend?

I'm using MS-Access (365) as a frontend to a Postgres table backend. The communication mechanism between them is ODBC. That seems to work fine.
Before I migrated away from the MS-Access backend, a list of checkable options would appear in a column's filter options when filtering column data. For example, when you clicked the little caret next to the column name/header, you would see "Sort A to Z" and "Sort Z to A" (as before), then "Text Filters" (as before, with options for specifying the filtering), and then the value options: a "Select All" toggle and each value with a checkbox next to it. The user could select/deselect to filter based on those values. But I no longer see that list of checkable values when filtering.
The column values are constrained to values in a different table which contains the valid values for that column (a traditional primary/foreign key relationship) and that works fine in the pulldown as a means for a user to pick one of the valid values when editing a record/column.
Since the values of the column are constrained in a predictable, or at least queryable way, I would think that there might be a way to use that to restore those checkable boxes in the filtering. I looked at the "Design View" of the table and tried a few things to see if I could get this to work. No luck.
Any ideas?
You can try this setting:
File -> Options -> Current Database -> Filter lookup options.
The real problem here is that you really should not be loading up forms with SO MANY records in the first place. You can try the above setting, and of course increase the 1,000-row threshold, but this is going to cause performance issues, since it suggests that forms and views are loading large recordsets and THEN you are applying filters to that data. So keep this pulling of data in mind: it is preferable to provide some kind of search form and let the user search BEFORE you pull large amounts of data.
So, all in all, this is a less than ideal option from a performance point of view. But if the data pulls are not too large, then this certainly is a "nice" feature - just not a great one from a performance point of view.

SSRS - filter existing dataset

I have a report that uses parameters. The parameters default to all available values, so by default the report contains all possible data.
I want the user to then be able to deselect some of the values in the parameters, and to refresh the charts in the report, so they can drill down to the data that interests them.
But each time the report is refreshed, it runs the query again, slowing down the process.
Is there a way to allow the user to filter the data in the charts, without re-running the query?
I did find this, but it seems that he also didn't get a solution, or I didn't understand how the solution would work.
http://social.msdn.microsoft.com/Forums/en-US/0f905bdb-b8f2-4d9d-ac5b-e85d2f94f0cf/textbox-action-to-filter-existing-dataset-rather-than-rerun-query
To keep the query from running again, two high-level things must happen:
1) Make sure that your filters (parameters) are not included in the query. The query needs to be identical no matter what the user has selected for a filter. This is done by moving the filters into the report: set them up as filters on the tablix, or on the row groups that are displaying the data. For example, instead of putting WHERE Region = @Region in the dataset query, return all rows and add a tablix filter that compares Fields!Region.Value against Parameters!Region.Value.
2) Set up caching for the dataset. The easiest way to do this is to pull the dataset out of the report and create a "Shared Dataset." When you upload that to SSRS, define the dataset caching: maybe set it to last an hour. Connect the report to the shared dataset as well.
The full details could fill an article, such as http://www.mssqltips.com/sqlservertip/1919/how-to-enable-caching-in-sql-server-reporting-services-ssrs/ (written for an older version of SSRS, but these concepts haven't changed much).

Google Refine and fetching data from Freebase for a large data set to create a column from URL not working

I have a Google Refine project with 36k rows of data. I would like to add another column by fetching JSON data from a Freebase URL. I was able to get it working on a small dataset, but when I ran it on this project it took a few hours to process, and then most of the results were blank (though I did get some results with data). Is there a way to limit the number of rows the data will be fetched for, or a better way of getting the data from the URL?
Thank You!
If you're adding data from Freebase, you'd probably be better off using "Add column from Freebase" rather than "Add column by fetching URLs."
Facets are one of the most powerful Google Refine features, and they can be used to control all kinds of things. In this case, you could use a facet to select a subset of your data and run the fetch only on that subset (and then repeat with a different subset).
The next version of Refine will include better error reporting on the results of URL fetches to help debug problems like this, but in the meantime make sure that you're respecting all the limits of the remote site as far as total number of requests, requests per second, etc.

Query, Display, and Filter Large Database Lists

I am trying to determine the best method of collecting a large list from a database and then displaying and filtering the results on the client side. Let me give a quick example:
Example: I've got a database with customer data that currently contains around 2,000 records, and that number is constantly increasing. On my website I have a page from which I want to query said database on information such as name, email, phone number, etc., and of course display the results (when a user types in Smith, it returns all records containing the name Smith). I am planning on using AJAX so that I can query the database and display the results on the fly, similar to how Google does it: as the user searches, results start showing up on the page as they are found.
Possible Solutions:
Unfortunately I am stumped on how to go about implementing something like this. I am considering using a ValueList pattern: when the user first loads the page, I would query the database, store every record in a collection, and then search that collection and display the results on my JSP page - essentially building an in-memory copy of the database in Java. The thing I like about the ValueList pattern is that I take one huge hit on page load and dump the entire database into objects stored in a list. But what if the database is larger, say 2,000,000 records?
Or should I be using a simple DAO pattern without the ValueList and query the database for each individual search? This would result in a LOT of database queries, especially considering that I plan on returning results as the user types in the search box.
Edit: The more I think about this, the more it is an AJAX question. My biggest concern should be how to query my database while the user is typing. Do I set some sort of listener that waits for the user to stop typing and then performs the query?
I would use Solr for this type of task.
Fields which you are going to use for searching should be indexed with Solr.
Then you make an AJAX query to Solr and get the results. You can set the sort order and the number of items per page, and show results only for the current page.
Solr has a lot of other features that can be useful for you.
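To address the Edit about listening for the user to stop typing: the usual pattern is to debounce the input so the query fires only after a short pause. Below is a minimal browser-side sketch in TypeScript; the /solr/customers/select endpoint, the customers core, and the name field are hypothetical placeholders for whatever your schema uses, while q, rows, start, and wt are standard Solr query parameters (query string, page size, result offset, and response format).
```typescript
// Minimal debounced search-as-you-type against a Solr core.
// The core name ("customers") and the indexed "name" field are
// placeholders; adjust them to match your own schema.

const DEBOUNCE_MS = 300; // how long the user must pause before we query
const PAGE_SIZE = 10;    // Solr "rows": number of items per page

const input = document.querySelector<HTMLInputElement>('#search')!;
let timer: number | undefined;

input.addEventListener('input', () => {
  // Restart the timer on every keystroke, so the query only fires
  // once the user has stopped typing for DEBOUNCE_MS.
  window.clearTimeout(timer);
  timer = window.setTimeout(() => runSearch(input.value, 0), DEBOUNCE_MS);
});

async function runSearch(term: string, page: number): Promise<void> {
  const trimmed = term.trim();
  if (trimmed === '') return;

  const params = new URLSearchParams({
    q: `name:${trimmed}*`,           // simple single-token prefix query
    rows: String(PAGE_SIZE),         // items per page
    start: String(page * PAGE_SIZE), // offset of the first result
    wt: 'json',                      // JSON response format
  });

  const resp = await fetch(`/solr/customers/select?${params}`);
  const data = await resp.json();
  renderResults(data.response.docs); // Solr wraps hits in response.docs
}

function renderResults(docs: unknown[]): void {
  // Placeholder: replace with real DOM updates for your results list.
  console.log(docs);
}
```
Because start and rows are derived from the page number, paging through results reuses the same query with a different offset.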

Displaying search results dynamically as user interacts with controls

I have a website and want to display search results dynamically, meaning that as the user interacts with controls and selects options, the search results are populated in real time - i.e. the user doesn't need to click the search button.
The data is stored in a MySQL relational database.
Now I know this is likely to lead to a large server load for a user base above a certain size - are there any ways to mitigate this?
Max.
One way to mitigate the server load is to introduce a slight timer delay before posting back to the server after each control changes. If you give the user three seconds or so to provide additional input, they may have time to add another search parameter, which can eliminate an extraneous query or two.
Also, I always like to cap the number of results returned; a server-side sketch of that follows.
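To make the result cap concrete, here is a minimal server-side sketch in TypeScript, assuming a Node.js backend with the mysql2 driver; the customers table, its columns, and the connection settings are hypothetical placeholders. The user's term is bound through a placeholder rather than concatenated into the SQL, and the trailing-wildcard LIKE pattern can use an index on the name column (a leading % could not).
```typescript
// Server-side search sketch: parameterized prefix LIKE plus a hard row cap.
// The table, columns, and connection settings are placeholders.

import mysql from 'mysql2/promise';

const MAX_RESULTS = 50; // hard cap on rows returned per query

const pool = mysql.createPool({
  host: 'localhost',
  user: 'app',
  password: 'secret',
  database: 'crm',
});

export async function searchCustomers(term: string) {
  // The ? placeholder keeps user input out of the SQL text.
  // MAX_RESULTS is a server-side constant, so inlining it is safe.
  const [rows] = await pool.execute(
    `SELECT id, name, email
       FROM customers
      WHERE name LIKE ?
      LIMIT ${MAX_RESULTS}`,
    [`${term}%`], // trailing wildcard: prefix match, index-friendly
  );
  return rows;
}
```
Combined with the client-side delay, each pause in typing produces at most one bounded query.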