What is a temporal data warehouse?
What is a temporal data warehouse?
What is a temporal data warehouse?
A temporal data warehouse is a specialized type of a data warehouse designed to manage and analyse time-varying data. Unlike traditional data warehouses, which typically store the most current or aggregated state of data, a temporal data warehouse retains historical changes, enabling users to track how data evolves over time.
A common approach in data warehousing is to keep the history of record activity in a persistent staging area, but use only the current records of the staging tables to feed the subsequent layers of the data warehouse. All non-staging tables in such a data warehouse contain only the current state of information.
In a temporal data warehouse as described here, the changes that are present in the staging tables are propagated to other layers and are visible in the serving layer (e.g. facts and dims). As a result, the users of the data warehouse can see how the information they care about has changed over time.
A temporal data warehouse is a specialized type of a data warehouse designed to manage and analyse time-varying data. Unlike traditional data warehouses, which typically store the most current or aggregated state of data, a temporal data warehouse retains historical changes, enabling users to track how data evolves over time.
A common approach in data warehousing is to keep the history of record activity in a persistent staging area, but use only the current records of the staging tables to feed the subsequent layers of the data warehouse. All non-staging tables in such a data warehouse contain only the current state of information.
In a temporal data warehouse as described here, the changes that are present in the staging tables are propagated to other layers and are visible in the serving layer (e.g. facts and dims). As a result, the users of the data warehouse can see how the information they care about has changed over time.
A temporal data warehouse is a specialized type of a data warehouse designed to manage and analyse time-varying data. Unlike traditional data warehouses, which typically store the most current or aggregated state of data, a temporal data warehouse retains historical changes, enabling users to track how data evolves over time.
A common approach in data warehousing is to keep the history of record activity in a persistent staging area, but use only the current records of the staging tables to feed the subsequent layers of the data warehouse. All non-staging tables in such a data warehouse contain only the current state of information.
In a temporal data warehouse as described here, the changes that are present in the staging tables are propagated to other layers and are visible in the serving layer (e.g. facts and dims). As a result, the users of the data warehouse can see how the information they care about has changed over time.
Why would you want
a temporal data warehouse?
Why would you want
a temporal data warehouse?
Why would you want
a temporal data warehouse?
Short answer – it gives you time travel capability. In our version of a temporal data warehouse you can travel back in time to see how the data in the facts, the dimensions or even the data marts would look like if you were to run the ETL at any moment in the past and this is simply available in all the tables in the data warehouse after the daily refresh.
A non-temporal data warehouse is good enough for most purposes. The management reporting, campaign scoring, capacity management – these are the types of use cases in which the general picture is usually sufficient and sub-permille erroneous data does not normally cause substantial harm.
On the other hand, if you use the data warehouse for auditing and compliance purposes, it might be crucial to make sure that every datapoint is correctly represented in your serving layer. You might recognize questions like “On the screen in the application XYZ I see THIS and I expect to see THIS in the FCT_XYZ table as well, but I see SOMETHING ELSE instead.” It used to bother me to have to reply with “The data has changed and I can no longer see what you saw”. In the temporal data warehouse, you are able to see the timeline of all the changes of the source data at any ETL stage, which makes answering these types of questions possible.
The ability to answer such questions becomes absolutely crucial when your source systems struggle with data quality issues. You can easily track down any suspicious changes in the data mart to the staging tables even if the change is already in the past.
You can also build checks to show if your data warehouse represents the source correctly to a single data point, because your data warehouse can tell you how many records were there in the database and in the serving-layer tables at any time in the past.
Another benefit of a temporal data warehouse is the ability to serve consistent data in a scenario where the data is refreshed throughout the day from different sources at different moments. Let’s say that we need to refresh financial data every hour for the finance department, but the logistics needs to have it near-real time. In a regular data warehouse, all is fine until we need to present some logistic information in the reports for finance – if we just combine our serving layer tables, we will present finance with an inconsistent view. In a temporal data warehouse, we can just select the logistic data that was active at the time of the last financial data refresh.
This is an extract from our whitepaper – read the whole thing here.
Short answer – it gives you time travel capability. In our version of a temporal data warehouse you can travel back in time to see how the data in the facts, the dimensions or even the data marts would look like if you were to run the ETL at any moment in the past and this is simply available in all the tables in the data warehouse after the daily refresh.
A non-temporal data warehouse is good enough for most purposes. The management reporting, campaign scoring, capacity management – these are the types of use cases in which the general picture is usually sufficient and sub-permille erroneous data does not normally cause s ubstantial h arm.
On the other hand, if you use the data warehouse for auditing and compliance purposes, it might be crucial to make sure that every datapoint is correctly represented in your serving layer. You might recognize questions like “On the screen in the application XYZ I see THIS and I expect to see THIS in the FCT_XYZ table as well, but I see SOMETHING ELSE instead.” It used to bother me to have to reply with “The data has changed and I can no longer see what you saw”. In the temporal data warehouse, you are able to see the timeline of all the changes of the source data at any ETL stage, which makes answering these types of questions possible.
The ability to answer such questions becomes absolutely crucial when your source systems struggle with data quality issues. You can easily track down any suspicious changes in the data mart to the staging tables even if the change is already in the past.
You can also build checks to show if your data warehouse represents the source correctly to a single data point, because your data warehouse can tell you how many records were there in the database and in the serving-layer tables at any time in the past.
Another benefit of a temporal data warehouse is the ability to serve consistent data in a scenario where the data is refreshed throughout the day from different sources at different moments. Let’s say that we need to refresh financial data every hour for the finance department, but the logistics needs to have it near-real time. In a regular data warehouse, all is fine until we need to present some logistic information in the reports for finance – if we just combine our serving layer tables, we will present finance with an inconsistent view. In a temporal data warehouse, we can just select the logistic data that was active at the time of the last financial data refresh.
This is an extract from our whitepaper – read the whole thing here.
Short answer – it gives you time travel capability. In our version of a temporal data warehouse you can travel back in time to see how the data in the facts, the dimensions or even the data marts would look like if you were to run the ETL at any moment in the past and this is simply available in all the tables in the data warehouse after the daily refresh.
A non-temporal data warehouse is good enough for most purposes. The management reporting, campaign scoring, capacity management – these are the types of use cases in which the general picture is usually sufficient and sub-permille erroneous data does not normally cause substantial harm.
On the other hand, if you use the data warehouse for auditing and compliance purposes, it might be crucial to make sure that every datapoint is correctly represented in your serving layer. You might recognize questions like “On the screen in the application XYZ I see THIS and I expect to see THIS in the FCT_XYZ table as well, but I see SOMETHING ELSE instead.” It used to bother me to have to reply with “The data has changed and I can no longer see what you saw”. In the temporal data warehouse, you are able to see the timeline of all the changes of the source data at any ETL stage, which makes answering these types of questions possible.
The ability to answer such questions becomes absolutely crucial when your source systems struggle with data quality issues. You can easily track down any suspicious changes in the data mart to the staging tables even if the change is already in the past.
You can also build checks to show if your data warehouse represents the source correctly to a single data point, because your data warehouse can tell you how many records were there in the database and in the serving-layer tables at any time in the past.
Another benefit of a temporal data warehouse is the ability to serve consistent data in a scenario where the data is refreshed throughout the day from different sources at different moments. Let’s say that we need to refresh financial data every hour for the finance department, but the logistics needs to have it near-real time. In a regular data warehouse, all is fine until we need to present some logistic information in the reports for finance – if we just combine our serving layer tables, we will present finance with an inconsistent view. In a temporal data warehouse, we can just select the logistic data that was active at the time of the last financial data refresh.
This is an extract from our whitepaper – read the whole thing here.