Technical Design Template for Data Engineers
The purpose of this template is to capture the technical design elements before onboarding a data feed or asset to the data platform.
Hey there! I work for an airline and I am part of the “Data Solutions Development” team. We use AWS analytics services and Snowflake in combination to build and deploy data pipelines into production.
In this blog post, I would like to emphasize the importance of writing a technical design before implementing a data pipeline for a particular data feed or asset. It's also an essential skill for data engineers.
As developers, our first instinct may be to start writing code as soon as we have some information at hand. But that can be a terrible idea if you haven’t thought your solution through.
When you write the specs in detail, you get a chance to think the solution through early, and you get a better view of its scope before you develop anything.
The design document also serves as documentation for the team and saves you from repeatedly explaining your design to multiple teammates and stakeholders. Newer team members unfamiliar with the existing setup can use it to onboard themselves.
It's also very important to have the design reviewed by your peers, so the whole team can collaboratively add anything important that is missing. Having everyone on the same page limits the obstacles that may arise in the future. The more teammates review your work, the better.
Different development teams will have different standards and conventions for a technical design template, depending on their situation. What I am going to share with you is one way we have used it in our team. I compiled the following list of categories based on my own experience of working in different organizations.
Without further delay, here is the template:
- Define what is in scope and what is out of scope, so it is clear what needs to be done (in scope) and what can be left undone (out of scope).
- Explain how out-of-scope items will be handled in the future
- Add an architecture diagram
- Briefly explain the data flow, step by step
Proposed Tech Stack
- Define the proposed technologies or tools to be used, with a short rationale for each
Design Decisions and Tradeoffs
- Specify the design decisions made by the solution design or architecture team that must be adopted or implemented in the technical design
- List any constraints or limitations that are still unclear or unknown
- List all the reusable components that can be used for development
- List dependencies, if any
Data Integration Pattern
- Specify whether the integration is streaming or batch, and describe its characteristics (data volume, variety, velocity, veracity)
- Attach relevant documentation and links to be followed
- Specify whether it is an inbound or outbound integration
- Specify what integration pattern is used (e.g. push vs. pull)
- Specify what technologies are used to establish the interface between two systems
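To make this section easier to review, the answers can also be recorded in a small structured form. The following is only a minimal sketch of one way to do that; the class name `IntegrationSpec`, its field names, and the example values (`example_feed`, SFTP, 2.5 GB/day) are all assumptions made for illustration, not something our team prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    BATCH = "batch"
    STREAMING = "streaming"


class Direction(Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


class Transfer(Enum):
    PUSH = "push"
    PULL = "pull"


@dataclass(frozen=True)
class IntegrationSpec:
    """Hypothetical record of the integration pattern for one data feed."""
    feed_name: str
    mode: Mode                        # streaming or batch
    direction: Direction              # inbound or outbound
    transfer: Transfer                # push vs. pull
    interface_technology: str         # technology connecting the two systems
    expected_daily_volume_gb: float   # rough volume characteristic
    documentation_links: Tuple[str, ...] = ()

    def summary(self) -> str:
        """Return a one-line description suitable for a design document."""
        return (
            f"{self.feed_name}: {self.direction.value} {self.mode.value} feed, "
            f"{self.transfer.value} via {self.interface_technology}, "
            f"~{self.expected_daily_volume_gb} GB/day"
        )


# Example values are invented for illustration only.
spec = IntegrationSpec(
    feed_name="example_feed",
    mode=Mode.BATCH,
    direction=Direction.INBOUND,
    transfer=Transfer.PULL,
    interface_technology="SFTP",
    expected_daily_volume_gb=2.5,
)
print(spec.summary())
```

Using enums instead of free text keeps the answers limited to the choices the template asks about, which makes several feed designs easier to compare.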
Data Processing Pattern
- Specify whether this data feed will be loaded into the data lake, the data warehouse, or both. Data can also be processed for other purposes and stored elsewhere.
- State the reasons why it needs to be loaded and, if it is loaded into both, what advantage or outcome that brings.
Data Layers & Storage
- Specify each data layer, the significance of each layer, and why this data feed needs to be stored in it. For example, data lake layers can be raw, transformed and aggregated, and data warehouse layers can be Stage, Raw, Data Vault, Business Vault and Data Mart. It's not mandatory to load data into every layer.
- Specify whether data is transient or permanent in each layer. For example, in a data warehouse the Stage layer is always transient, and sometimes Raw is skipped entirely if you treat your data lake as the raw layer
- Add an entity relationship model for the source data
- Design a Data Vault or dimensional model if required
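The layer decisions above can be written down as a simple per-layer plan. This is a minimal sketch under my own assumptions: the `LayerPlan` class, the `describe` helper, and the reasons given for each layer are hypothetical examples, and the layer names are the ones mentioned in the bullets above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerPlan:
    """Hypothetical record of how one feed uses one storage layer."""
    platform: str    # "data lake" or "data warehouse"
    layer: str       # e.g. raw, Stage, Data Vault
    load: bool       # False if the feed skips this layer
    transient: bool  # True if data is not kept permanently in this layer
    reason: str      # why the feed is (or is not) stored here


def describe(plan: List[LayerPlan]) -> List[str]:
    """Turn the plan into one readable line per layer."""
    lines = []
    for p in plan:
        if not p.load:
            status = "skipped"
        elif p.transient:
            status = "transient"
        else:
            status = "permanent"
        lines.append(f"{p.platform}/{p.layer}: {status} ({p.reason})")
    return lines


# Example plan; the reasons are invented for illustration only.
plan = [
    LayerPlan("data lake", "raw", True, False, "keep source files as received"),
    LayerPlan("data warehouse", "Stage", True, True, "landing area for loads"),
    LayerPlan("data warehouse", "Raw", False, False, "data lake acts as the raw layer"),
    LayerPlan("data warehouse", "Data Vault", True, False, "integrated history"),
]

for line in describe(plan):
    print(line)
```

Listing skipped layers explicitly, rather than leaving them out, records that skipping was a deliberate design decision.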
Estimation of cost
- List cost estimates for loading the data feed into the data lake, the data warehouse, or both. This helps you foresee the running costs before implementation
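A rough estimate is often just storage plus compute. The sketch below shows that arithmetic only; the function name `estimate_monthly_cost` and every number in the example are placeholder assumptions, not real AWS or Snowflake prices. Replace them with the rates from your own contract or the vendor's pricing pages.

```python
def estimate_monthly_cost(
    storage_gb: float,
    storage_rate_per_gb: float,
    compute_hours: float,
    compute_rate_per_hour: float,
) -> dict:
    """Return a rough monthly cost breakdown (storage + compute).

    All rates are inputs on purpose: this sketch does not know any
    vendor's actual pricing.
    """
    storage = storage_gb * storage_rate_per_gb
    compute = compute_hours * compute_rate_per_hour
    return {
        "storage": round(storage, 2),
        "compute": round(compute, 2),
        "total": round(storage + compute, 2),
    }


# Placeholder figures for illustration only, not real prices.
estimate = estimate_monthly_cost(
    storage_gb=500,
    storage_rate_per_gb=0.02,
    compute_hours=30,
    compute_rate_per_hour=3.0,
)
print(estimate)
```

If the feed is loaded into both the data lake and the data warehouse, running the estimate once per platform and listing the two results side by side makes the cost of loading into both visible.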
Estimation of work
- List estimates, in person-days, of how long the implementation will take
Privacy and Security
- Does the pipeline store or transmit sensitive information, such as financial or personal information (identified by the business as sensitive) or payment card data like credit/debit card details?
- Does the pipeline introduce any architectural changes? Any platform changes? Any new tech stack in the platform? Any new connectivity/interface classified as ‘Secret’?
- If so, do we need explicit approval from the Security Team?
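The questions above can be captured as a yes/no checklist. This is a minimal sketch: the `SecurityChecklist` class and its field names are hypothetical, and the rule that any single "yes" means approval is needed is my own simplifying assumption. Your security team's actual policy decides the real answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SecurityChecklist:
    """Hypothetical yes/no answers to the privacy and security questions."""
    handles_sensitive_data: bool
    handles_payment_card_data: bool
    changes_architecture: bool
    changes_platform: bool
    introduces_new_tech: bool
    adds_secret_interface: bool

    def triggered(self) -> List[str]:
        """Return the names of all questions answered 'yes'."""
        return [name for name, value in vars(self).items() if value]

    def needs_security_approval(self) -> bool:
        """Assumption: any 'yes' answer means the Security Team is consulted."""
        return bool(self.triggered())


# Example answers, invented for illustration only.
checklist = SecurityChecklist(
    handles_sensitive_data=True,
    handles_payment_card_data=False,
    changes_architecture=False,
    changes_platform=False,
    introduces_new_tech=True,
    adds_secret_interface=False,
)
print(checklist.triggered())
print(checklist.needs_security_approval())
```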
Thanks to my team members Hans and Janne who reviewed and contributed to this template.
I hope this template helps you start writing technical specs in your own work. Thank you very much for reading.
Disclaimer: The opinions expressed in my articles are my own and will not necessarily reflect those of my employer (past or present) or indeed any client I have worked with.