About

About Me

Data Engineer with 6+ years of experience working in industries such as telecommunications, telematics and semi-state/government bodies. My interests include building data pipelines, experimenting with new automation tools and trying to keep up with the never-ending stream of cloud-based data transformation tools/platforms.

Advocate of web minimalism. Modern websites are a bloated mess of trackers, popups and advertisements (not to mention the pain of GDPR popups for those of us in the EU). Most websites are a pain to visit; a return to simple, intuitive UIs is required. Hence, this website.


Skill Stack

Tools/languages that I have the most experience with:

  • Database Technologies - Teradata, Postgres.
  • Teradata Tools & Utilities (TTU) - bteq, fastload, mload and fastexport tools for building data pipelines on top of Teradata.
  • SQL - Procedural SQL, functions and triggers.
  • Bash/Korn shell - Unix shell scripting including core utilities such as sed, grep, gawk, cut, paste etc.
  • Linux Administration - Certified RHCSA.
  • Python scripting
  • Apache Airflow - version 2.0+, developing DAGs according to best practices.
  • Apache Spark - Developing pyspark jobs for batch or streaming.
  • CI/CD - I prefer a stack like CircleCI, Docker and Artifactory, with GitHub for version control.
  • AWS - I'm familiar with a lot of the core services on AWS, too many to list here. I've used EMR for running pyspark workloads, EKS for deploying container-based microservices and AWS Glue for running pyspark ETL jobs.

Tools/languages that I'm currently trying to learn:

  • GitHub Actions - I'm familiar with the basics but I'd like to use this a lot more, particularly for MLOps.
  • Google Cloud Platform - I'm certified as a Professional Data Engineer on GCP but I haven't built anything substantial on it yet.
  • MLOps - Creating automated pipelines for deploying machine learning into production. This is arguably the most important part of the ML development lifecycle: how to deploy a model into production, monitor it and achieve high availability.