Technical Design Template for Data Engineers

Scope

  • Define what is in scope and what is out of scope, so that it is clear what needs to be done (in scope) and what can be left undone (out of scope).
  • Explain how out of scope items will be handled in the future

Technical Architecture

Architecture Design

  • Add an architecture diagram
  • Briefly explain the data flow, step by step

Proposed Tech Stack

  • Define the proposed technologies or tools, with a short rationale for each

Design Decisions and Tradeoffs

  • Specify the design decisions taken by the solution design or architecture team that must be adopted or implemented during technical design

Design Constraints

  • List the constraints or limitations, and flag any that are still unclear or unknown

Reusable Components

  • List all the reusable components that can be used for development

Dependencies

  • List dependencies, if any

Data Management

Data Integration Pattern

  • Specify whether the integration is streaming or batch, and describe its characteristics (data volume, variety, velocity, veracity)
  • Attach relevant documentation and links to be followed
  • Specify whether it is an inbound or outbound integration
  • Specify what integration pattern is used (e.g. push vs. pull)
  • Specify what technologies are used to establish the interface between two systems
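
The points above could be recorded in a small, machine-readable form next to the design document. The following is a minimal sketch in Python; the class `IntegrationSpec`, its field names and the example values are hypothetical and not part of the template.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    BATCH = "batch"
    STREAMING = "streaming"


class Direction(Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


class Pattern(Enum):
    PUSH = "push"
    PULL = "pull"


@dataclass
class IntegrationSpec:
    """Hypothetical record of one data feed's integration pattern."""
    feed_name: str
    mode: Mode
    direction: Direction
    pattern: Pattern
    interface_technologies: list[str]  # tools used between the two systems
    characteristics: dict[str, str] = field(default_factory=dict)
    documentation_links: list[str] = field(default_factory=list)

    def missing_characteristics(self) -> list[str]:
        """Return the characteristics from the checklist that are not filled in yet."""
        expected = ["volume", "variety", "velocity", "veracity"]
        return [name for name in expected if name not in self.characteristics]


# Example values are placeholders, not recommendations.
spec = IntegrationSpec(
    feed_name="example_feed",
    mode=Mode.BATCH,
    direction=Direction.INBOUND,
    pattern=Pattern.PULL,
    interface_technologies=["SFTP"],
    characteristics={"volume": "about 1 GB per day", "velocity": "daily"},
)
print(spec.missing_characteristics())
```

A record like this makes it easy to spot which characteristics of a feed are still undocumented before the design is reviewed.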

Data Processing Pattern

  • Specify whether this data feed will be loaded into the data lake, the data warehouse or both. Data can also be processed for other purposes and stored elsewhere.
  • State why it needs to be loaded and, if it is loaded into both, what advantage or outcome that brings.

Data Layers & Storage

  • Specify each data layer, its significance, and why this data feed needs to be stored in it. For example, data lake layers can be raw, transformed and aggregated, and data warehouse layers can be Stage, Raw, Data Vault, Business Vault and data mart. It is not mandatory to load data into every layer.
  • Specify whether data is transient or permanent in each layer. For example, in a data warehouse the stage layer is always transient, and Raw is sometimes skipped entirely if you treat your data lake as the raw layer
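
One way to document the layers is a simple table of layer, retention and reason. The sketch below shows this in Python; the `Layer` class, the `summarize` helper and the reasons given are hypothetical illustrations. The plan assumes the data lake is treated as the raw layer, so the warehouse Raw layer is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    platform: str    # "data lake" or "data warehouse"
    name: str
    loaded: bool     # is this feed stored in the layer at all?
    transient: bool  # True if the data is cleared after each load
    reason: str      # why the feed needs (or skips) this layer


# Hypothetical plan for a single feed.
layers = [
    Layer("data lake", "raw", True, False, "unchanged copy of the source"),
    Layer("data lake", "transformed", True, False, "cleaned and typed data"),
    Layer("data warehouse", "Stage", True, True, "landing area for each load"),
    Layer("data warehouse", "Raw", False, False, "data lake acts as raw"),
    Layer("data warehouse", "Data Vault", True, False, "integrated history"),
    Layer("data warehouse", "data mart", True, False, "reporting model"),
]


def summarize(plan):
    """Return one line per layer: skipped, transient or permanent, plus the reason."""
    lines = []
    for layer in plan:
        if not layer.loaded:
            status = "skipped"
        elif layer.transient:
            status = "transient"
        else:
            status = "permanent"
        lines.append(f"{layer.platform} / {layer.name}: {status} ({layer.reason})")
    return lines


for line in summarize(layers):
    print(line)
```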

Data Model

  • Add an entity-relationship model for the source data
  • Design a Data Vault or dimensional model, if required

Practicalities

Estimation of cost

  • List cost estimates for loading the data feed into the data lake, the data warehouse or both. This helps to anticipate the cost
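
A rough estimate can be as simple as storage cost plus compute cost per target. The sketch below shows the arithmetic in Python; the function `monthly_cost` is hypothetical and every figure is a made-up placeholder to be replaced with your own volumes and your vendor's actual prices.

```python
def monthly_cost(stored_gb, storage_price_per_gb, compute_hours, compute_price_per_hour):
    """Estimate a monthly cost as storage cost plus compute cost."""
    storage = stored_gb * storage_price_per_gb
    compute = compute_hours * compute_price_per_hour
    return storage + compute


# All numbers below are hypothetical placeholders, not real prices.
lake = monthly_cost(
    stored_gb=500, storage_price_per_gb=0.02,
    compute_hours=30, compute_price_per_hour=0.5,
)
warehouse = monthly_cost(
    stored_gb=200, storage_price_per_gb=0.04,
    compute_hours=60, compute_price_per_hour=2.0,
)

print(f"data lake:      {lake:.2f} per month")
print(f"data warehouse: {warehouse:.2f} per month")
print(f"both:           {lake + warehouse:.2f} per month")
```

Listing the two targets separately shows what loading into both actually costs compared with loading into only one.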

Estimation of work

  • List estimates of how long the implementation will take, in person-days

Privacy and Security

  • Does the pipeline store or transmit sensitive information, such as financial or personal information (identified by the business as sensitive) or payment card data (credit/debit card details)?
  • Does the pipeline introduce any architectural changes, platform changes, or new technologies into the platform? Does it add any new connectivity or interface classified as ‘Secret’?
  • If so, do we need explicit approval from the Security Team?
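
The questions above can be kept as a yes/no checklist per pipeline. The following Python sketch is one hypothetical way to do that; the class `SecurityChecklist` and its field names are assumptions. It only flags when the approval question should be raised with the Security Team and does not decide the outcome.

```python
from dataclasses import dataclass


@dataclass
class SecurityChecklist:
    """Hypothetical yes/no answers to the privacy and security questions."""
    handles_sensitive_data: bool
    architectural_changes: bool
    platform_changes: bool
    new_tech_stack: bool
    new_secret_interface: bool

    def triggered(self):
        """Return the names of the questions that were answered 'yes'."""
        return [name for name, answer in vars(self).items() if answer]

    def should_ask_security_team(self):
        """True if at least one question was answered 'yes'."""
        return bool(self.triggered())


# Example answers are placeholders.
checklist = SecurityChecklist(
    handles_sensitive_data=True,
    architectural_changes=False,
    platform_changes=False,
    new_tech_stack=True,
    new_secret_interface=False,
)

if checklist.should_ask_security_team():
    print("Raise with the Security Team:", ", ".join(checklist.triggered()))
```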

Data Platform Architect at Finnair

Venkata Gowri Sai Rakesh kumar Varanasi
