Jun 11, 2020
Temporal Table Series Part 3: Integrating Into Existing Data Architecture
Data architecture decisions can feel like they belong solely in the realm of IT, but architecture choices ultimately impact your business more broadly. Before committing to a particular platform or design, it’s important for your organization to understand the long-term impacts of the architecture decisions you make today. Developing modern data solutions that can evolve and support your business as it changes is critical to fully leveraging your data.
In our first blog post, we introduced temporal tables and discussed four use cases in which temporal tables can be particularly helpful. In our second blog post, we showed how to correct historical data in temporal tables and walked through best practices for making changes. In this third blog post, we’ll discuss long-term considerations your organization should understand before using temporal tables in your data architecture, including the SQL Server implementation of temporal tables versus platform-independent options, as well as potential downstream impacts.
Platform Dependence
The four use cases discussed in our first blog post in this series (historical trend analysis, point-in-time reporting, slowly changing dimensions, and compliance auditing) have been around since long before the introduction of temporal table functionality. Historically, these cases were addressed through well-known, well-established modeling and coding patterns. One of them in particular, slowly changing dimensions, has a traditional solution that looks almost identical to the physical implementation of temporal tables in SQL Server. Organizations should weigh the trade-offs when deciding whether to introduce temporal tables into a solution architecture.
Temporal tables were introduced over a span of several years, starting with the ANSI SQL:2011 standard and followed by their addition to SQL Server in 2016. This gap led to a lack of uniformity in temporal table syntax between SQL Server and other RDBMSs. The current and expected future state of any application should be evaluated before implementing temporal tables, as transferring code between systems could incur significant implementation time. In the next section, we’ll discuss what a platform-independent technique can look like.
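For reference, here is a minimal sketch of the SQL Server syntax for a system-versioned temporal table (SQL Server 2016 and later; the table and column names are illustrative):

```sql
-- System-versioned temporal table, SQL Server 2016+ syntax.
-- Table and column names here are illustrative.
CREATE TABLE dbo.Customer
(
    CustomerID   INT          NOT NULL PRIMARY KEY CLUSTERED,
    CustomerName VARCHAR(100) NOT NULL,
    City         VARCHAR(50)  NULL,
    -- System-managed period columns required for system versioning
    ValidFrom    DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo      DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CustomerHistory));
```

This clause-level syntax (GENERATED ALWAYS, PERIOD FOR SYSTEM_TIME, SYSTEM_VERSIONING) is exactly the kind of code that may not transfer directly to another RDBMS.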
Platform Independence
Slowly changing dimensions (SCD) is a concept popularized by Ralph Kimball in the various editions of his Data Warehouse Toolkit. While there may be slightly differing definitions, the most common types generally understood are:
Type 1 – Overwrite keeping only current state—very common.
Type 2 – Add new row and keep all change states with effective dating—common in a normalized data warehouse but less common in a dimensional mart, though still widely used. Most closely resembles the temporal table.
Type 3 – Add new attribute to track prior state in addition to current state—very rarely used.
A common problem with highly volatile Type 2 dimensions is the rapid growth of the dimension, resulting in decreased performance in both loading and querying. To address this performance impact, a Type 4 SCD was defined. While there are differing definitions of Type 4, the most commonly established one is a combination of a Type 1 table containing the current state and a Type 2 table containing all historical values.
The common definition of Type 4 involves introducing a separate history dimension alongside the base dimension table. The difference between this common Type 4 approach and SQL Server temporal tables is the persistence of the surrogate key value in temporal tables. However, modifying the Type 4 design to use a persistent surrogate key (either an identity or a hash value), following a hub-style modeling approach, would give you a single link to any facts: join to the base table for current-state querying, or to the history table on a date field for historical querying.
Below is a sample of how a Type 4 model might look and be populated in this scenario:
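A minimal, platform-independent sketch of the pattern (all table and column names, including the fact_sales table, are illustrative):

```sql
-- Type 4 SCD with a persistent surrogate key.
-- Base table: one row per customer, current state only.
CREATE TABLE dim_customer (
    customer_key  CHAR(32)     NOT NULL PRIMARY KEY, -- persistent surrogate, e.g., hash of the natural key
    customer_id   INT          NOT NULL,             -- natural/business key from the source system
    customer_name VARCHAR(100) NOT NULL,
    city          VARCHAR(50)
);

-- History table: every prior state of each customer, effective-dated.
CREATE TABLE dim_customer_history (
    customer_key   CHAR(32)     NOT NULL,
    customer_id    INT          NOT NULL,
    customer_name  VARCHAR(100) NOT NULL,
    city           VARCHAR(50),
    effective_from TIMESTAMP    NOT NULL,
    effective_to   TIMESTAMP    NOT NULL,
    PRIMARY KEY (customer_key, effective_from)
);

-- Load pattern on change: copy the outgoing base row into history with its
-- effective dates, then overwrite the base row with the new state.
-- Facts link on customer_key and join to whichever table the query requires.

-- Current-state query:
SELECT f.sales_amount, c.city
FROM fact_sales f
JOIN dim_customer c ON c.customer_key = f.customer_key;

-- Historical (point-in-time) query:
SELECT f.sales_amount, h.city
FROM fact_sales f
JOIN dim_customer_history h
  ON h.customer_key = f.customer_key
 AND f.order_date >= h.effective_from
 AND f.order_date <  h.effective_to;
```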
If the customer dimension above were used for current marketing purposes, you could query the base table, which in the sample holds only three entries. To analyze trends in customer changes over time, you would instead query the history table, which in this case holds 18 entries and grows much more rapidly than the base table. Given modern marketing trends, where a customer table may hold tens or hundreds of millions of customers, this difference can have a significant performance impact, letting you optimize the current view of the customer while still supporting a historical view. Additionally, this approach would work with essentially no changes whether the structure remained on SQL Server or moved to another RDBMS platform. Examples of modern columnar cloud databases where this might be used include Snowflake, Google BigQuery, and Amazon Redshift.
Downstream Impacts
As a final step, organizations should examine the potential downstream impacts of using temporal tables in a data architecture. As detailed in part one of this series, querying and updating temporal tables differs slightly from working with a typical database table. These differences affect a broader group of roles than just database developers, so those other audiences should be consulted on database design.
Reading From Temporal Tables
Querying these tables can be confusing at first, particularly the syntax used to retrieve historical data. If a database is used by multiple groups, some of whom are unfamiliar with temporal tables and their unique syntax, training may be required to prepare them to query the data properly.
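For example, historical data in SQL Server is retrieved through the FOR SYSTEM_TIME clause, shown here against the illustrative Customer table from earlier:

```sql
-- Current rows only: query the temporal table as usual.
SELECT CustomerID, CustomerName, City
FROM dbo.Customer;

-- State of the data at a specific point in time.
SELECT CustomerID, CustomerName, City
FROM dbo.Customer
    FOR SYSTEM_TIME AS OF '2020-01-01T00:00:00';

-- Every version of a single row, current and historical.
SELECT CustomerID, CustomerName, City, ValidFrom, ValidTo
FROM dbo.Customer
    FOR SYSTEM_TIME ALL
WHERE CustomerID = 42
ORDER BY ValidFrom;
```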
Another consideration is the downstream use of these tables in software like Power BI and Tableau. Both tools allow you to select tables from a database, but neither imports a full temporal table (current rows plus history) by default. To bring both the active and historical data into a business intelligence tool, a view or a special query statement would need to be written.
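One possible approach is a view that unions the current and history tables so the BI tool can import a single object. A sketch, using the illustrative names from above:

```sql
-- Combine current and historical rows into one importable object.
CREATE VIEW dbo.vCustomerAllVersions
AS
SELECT CustomerID, CustomerName, City, ValidFrom, ValidTo,
       1 AS IsCurrent            -- row is the live version
FROM dbo.Customer
UNION ALL
SELECT CustomerID, CustomerName, City, ValidFrom, ValidTo,
       0 AS IsCurrent            -- row came from the history table
FROM dbo.CustomerHistory;
```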
ETL Design
Inserting data into temporal tables can complicate the design of extract, transform, load (ETL) processes. By default, temporal tables send changed or deleted records to the history table, but they do not account for records deleted in the source system (or ‘Type-1 Deletes’). ETL developers need to be certain of data source behavior in order to develop statements against temporal tables properly. The load logic must clearly indicate when a record was deleted from the source system and thereby pushed into the history table, rather than simply updated and pushed to history.
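One way to make source deletes explicit is a soft-delete flag that is stamped before the row is removed, so the final flagged version lands in history. A sketch, assuming a full extract in a hypothetical staging table stg.Customer and an IsDeleted column on the temporal table:

```sql
-- 1. Flag rows that have disappeared from the source. The UPDATE itself
--    pushes the previous (unflagged) version into the history table.
UPDATE c
SET    c.IsDeleted = 1
FROM   dbo.Customer AS c
WHERE  c.IsDeleted = 0
  AND NOT EXISTS (SELECT 1 FROM stg.Customer AS s
                  WHERE s.CustomerID = c.CustomerID);

-- 2. Remove the flagged rows. System versioning moves the final, flagged
--    version into history, clearly marking the record as source-deleted.
DELETE c
FROM   dbo.Customer AS c
WHERE  c.IsDeleted = 1;
```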
Table Maintenance
Some database management tasks require an increased level of effort when using temporal tables. For typical maintenance like fixing historical records or updating values, system versioning must be turned off, the changes applied directly, and the table restored to its original state afterward, which takes the table offline during the change. Other tasks, like table defragmentation, also require additional effort with temporal tables.
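The pattern looks roughly like this (illustrative names; see part two of this series for the full procedure):

```sql
BEGIN TRANSACTION;

-- Detach the history table so it can be modified directly.
ALTER TABLE dbo.Customer SET (SYSTEM_VERSIONING = OFF);

-- ...apply corrections to dbo.Customer / dbo.CustomerHistory here...

-- Re-enable versioning against the same history table.
ALTER TABLE dbo.Customer
    SET (SYSTEM_VERSIONING = ON
        (HISTORY_TABLE = dbo.CustomerHistory, DATA_CONSISTENCY_CHECK = ON));

COMMIT TRANSACTION;
```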
Consider Long-Term Impacts
When investing in modern data architecture, your organization should weigh the impact of committing to specific platforms against taking a platform-agnostic approach. Be sure to consider the downstream impacts that particular architecture patterns may have. Bring together your technical experts, data analysts, and business users to ensure you’re capturing the right data and making it easy to consume.
If you have questions about using temporal tables in your data architecture, we’d love to help! Feel free to reach out to us at findoutmore@credera.com.