Tableau Deep Dive: Improving Performance with Data Extracts – Part 1

Readers:

It has been a while since I posted a new Tableau Deep Dive. I have to really admire the folks that blog on various Tableau topics every week. Researching, writing, and validating these technical blogs posts takes a lot of time and I wish I could devote more time to posting these.

This week, I will have a multi-part series discussing Tableau Data Extracts. Data Extracts are saved subsets of a data that you can use to improve the performance or to take advantage of Tableau functionality not available or supported in your original data. [2]

In Part 1, I will provide an overview of Tableau Extracts, the characteristics of an extract, the underlying server architecture, benefits of Tableau Extracts, and when not to use a Tableau Extract.

I hope you enjoy this series about data extracts in Tableau.

Best regards,

Michael

Data Extracts in Tableau

Video Book Cover PageMuch of the content, datasets,  and Tableau Workbooks I am using for this blog post series comes from Tim Messar’s video book, Mastering Tableau 10. I cite Tim as the primary source for this blog post in the “sources” section at the end of this post (see video book cover, right).

Most of the introductory portions of Part 1 come from Joshua Milligan’s excellent book, Learning Tableau 10 – Second Edition,

Tableau Extracts are saved subsets of data that you can use to improve performance or to take advantage of Tableau functionality not available or supported in your original data. When you extract your data to create an extract, you can reduce the total amount of data by using filters and defining other limits.

After you create an extract you can refresh it with data from the original data. When refreshing the data, you have the option to either do a full refresh, which replaces all of the extract contents, or you can an incremental refresh, which only adds rows that are new since the previous refresh.

Extracts can do the following:

Improve performance. For file-based data sources such as Excel or Access, a full extract takes advantage of the Tableau data engine. For large data sources, a filtered extract can limit the load on the server when you only need a subset of data.

Take advantage of Tableau functionality that is not available in the original data source, such as the ability to compute Count Distinct.

Provide offline access to your data. If your data source is not available (for example, because you are traveling), you can extract the data to a local data source.

A columnar database is a database management system (DBMS) that stores data in columns instead of rows. The goal of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query. [5]

 

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

The process of setting up a materialized view is sometimes called materialization. This is a form of caching the results of a query, similar to memoization of the value of a function in functional languages, and it is sometimes described as a form of precomputation. As with other forms of precomputation, database users typically use materialized views for performance reasons, i.e. as a form of optimization. [6]

Performance [4]

Extracts are very efficient. In fact, extracts are generally faster than most live connections, except for a few extremely efficient columnar databases. This is a result of several factors, such as:

  • Extracts are columnar and are extremely efficient for executing the query.
  • Extracts are structured so that they can be loaded quickly into memory without additional processing and moved between memory and disk storage so the size is not limited to the amount of RAM available.
  • Many calculated fields are materialized in the extract. The pre-calculated value stored in the extract can often be read faster than executing the calculation every time the query is executed.

You may choose to use the extracts to increase performance over traditional databases. To maximize your performance gain, consider the following:

  • Prior to creating the extract, hide unused fields. If you have created all desired visualizations, you can click on the Hide Unused Fields button on the extract dialog to hide all the fields that are not used in any view or calculation.
  • If possible, use a subset of data from the original source. For example, if you have historical data for the last 10 years, but only need the last two years for analysis, then filter the extract by the date field.
  • Optimize an extract after creating or editing calculated fields, or deleting or hiding fields.
  • Store extracts on solid state disks or drives that are defragmented regularly.

Portability and Security [4][7]

Let’s say that your data is hosted on an SQL Server accessible only from inside your office network. Normally, you’d have to be onsite or using a VPN to work with the data. With an extract, you can take the data with you and work offline.

A Tableau Data Extract (.tde) file contains data extracted from the source. When you save a workbook, you may save it as a Tableau Workbook (.twb) file or a Tableau Packaged Workbook (.twbx) file. A workbook (.twb) contains definitions for all the connections, fields, visualizations, and dashboards but does not contain any data or external files, such as images. When you save a packaged workbook (.twbx), any extracts and external files are packaged together in a single file with the workbook.

Below is a diagram I created for some internal Tableau training I did at work to illustrate Tableau Desktop File Types.

Tableau Desktop File Types

A packaged workbook file (.twbxusing extracts can be opened with Tableau Desktop or Tableau Reader, and published to Tableau Public or Tableau Online.

Server Architecture [3]

NOTE: This architecture is based on the current TDE engine for Tableau. I will discuss, in a future post, how this will work under the new v10.5 engine, Hyper.Tableau Server - Data Extract 2

TIP

A packaged workbook file (.twbx) is really just a compressed zip file. If you rename the extension from .twbx to .zip, you can access the contents as you would any other zip file. You may also consider associating the .twbx extension with your zip utility so you won’t have to rename the files.

There are a couple of security considerations to keep in mind when using an extract:

  • The extract is made using a single set of credentials. Any security layers that limit what data can be accessed according to the credentials used will not be effective after the extract is created. An extract does not require a username or password. All data in an extract can be read by anyone.
  • Any data for visible (non-hidden) fields contained in an extract file (.tde) or in an extract contained a packaged workbook (.twbx) can be accessed even if the data is not shown in the visualization. Be very careful when distributing extracts or packaged workbooks containing sensitive or proprietary data.

In his book, Learning Tableau 10 – Second Edition, Joshua Milligan notes a story of an employee who sent a packaged workbook containing HR data to others in the company. Even though none of his dashboards displayed sensitive data, the extract contained it. It wasn’t long before everyone in the company knew everyone else’s salary and the original individual was no longer an employee.

When to use an extract [4]

You should consider various factors when determining whether or not to use an extract. In some cases, you won’t have an option (for example, OLAP requires a live connection and some cloud-based data sources require an extract). In other cases, you’ll want to evaluate your options.

In general, use an extract when:

  • You need a better performance than you can get with the live connection.
  • You need the data to be portable.
  • You are using legacy (JET driver) connections to Excel, Access, or text files and you need to use functions, such as COUNTD (count distinct), that are not supported by the JET driver but are supported when the data is extracted.
  • You want to share a packaged workbook. This is especially true if you want to share a packaged workbook with someone who uses the free Tableau Reader, which can only read packaged workbooks with extracts.

In general, do not use an extract when:

  • You have sensitive data. However, you may hide sensitive fields prior to creating the extract, in which case they are no longer part of the extract.
  • You need to manage security based on the login credentials. (However, if you are using Tableau Server, you may still use extracted connections hosted on Tableau Server that are secured by login.
  • You need to see changes in the source data updated in real-time.
  • The volume of data makes the time required to build the extract impractical. The number of records that can be extracted in a reasonable amount of time will depend on factors, such as the data types of fields, the number of fields, the speed of the data source, and the network bandwidth.

Next: Creating a Tableau Data Extract

Source(s):

[1] I will rely heavily on the fantastic Tableau video book written by Tim Messar titled Mastering Tableau 10 (see cover image below). Click here to purchase your own copy of this book.

Video Book Cover Page

[2] Tableau Software, Extract Your Data, Tableau Software, Online Help, All Tableau Help > Tableau Help > Connect to and Prepare Data > Set Up Data Sources > Extract Your Data, http://onlinehelp.tableau.com/current/pro/desktop/en-us/extracting_data.html.

[3] Raby, Chris, Tableau Desktop & Server, The Information Lab, February 5, 2013, https://www.slideshare.net/chrisraby/tableau-desktop-server.

[4] Milligan, Joshua N., Learning Tableau 10 – Second Edition, Packt Publishing, 2016.

[5] Rouse, Margaret, Columnar Database Definition, TechTarget, September 2010, http://searchdatamanagement.techtarget.com/definition/columnar-database.

[6] Wikipedia, Materialized View, https://en.wikipedia.org/wiki/Materialized_view.

[7] Sandberg, Michael, Tableau Boot Camp Additional Reference Material, City of Glendale, Arizona, September 28th, 2017.

 

 

 

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s