This repository contains the scripts, and the instructions for running them, that constitute my (James Mbewu) submission to the Data Engineer portion of the City of Cape Town - Data Science Unit Code Challenge. The challenge is a skills assessment used for shortlisting for positions in the City of Cape Town's Data Science Unit.
Code has primarily been written in R and the reports are in R Markdown. The code was developed on Ubuntu 20.04.2 LTS using RStudio Version 1.4.1106.
The Code Challenge for Data Engineers consists of 4 tasks, which have been attempted and reported on here:
- Data Extraction
- Initial Data Transformation
- Further Data Transformations
- Data Loading Tasks
The full description of the tasks is available in CHALLENGESPECS.md. Tasks in that specification that are not listed above were not attempted, as they were intended for applicants to other positions (Data Scientist and Data Analyst).
For Data Extraction, an R Markdown report that includes the code can be found at data_extraction_report.Rmd. The code can be run from the R Markdown file or from data_extraction.R, which has been included separately for convenience.
For Initial Data Transformation, an R Markdown report that includes the code can be found at initial_data_transformation_report.Rmd. The code in this R Markdown file is not meant to be run (the parallelisation does not seem to work when run from R Markdown). The code that should be run has been included separately as initial_data_transformation.R.
For Further Data Transformations, an R Markdown report that includes the code can be found at further_data_transformations_report.Rmd. The code can be run from the R Markdown file or from further_data_transformations.R, which has been included separately for convenience.
For Data Loading Tasks, an R Markdown report that includes the code can be found at data_loading_tasks_report.Rmd. The code can be run from the R Markdown file or from data_loading_tasks.R, which has been included separately for convenience.
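As a minimal sketch of how the standalone scripts might be run in one go, the snippet below sources each one in turn. The helper name run_task_scripts is hypothetical and not part of this repository. The run order, and the assumption that the working directory is the repository root, are also guesses; only the script file names come from this README.

```r
# Script names are taken from this README; the order is an assumption.
task_scripts <- c(
  "data_extraction.R",
  "initial_data_transformation.R",
  "further_data_transformations.R",
  "data_loading_tasks.R"
)

# Hypothetical helper: source each script that exists, in the order given,
# and return the names of the scripts that were actually run.
run_task_scripts <- function(scripts, dir = ".") {
  ran <- character(0)
  for (script in scripts) {
    path <- file.path(dir, script)
    if (!file.exists(path)) {
      # Skip missing files instead of stopping the whole run.
      warning("Skipping missing script: ", path)
      next
    }
    # source() evaluates the script in the global environment by default,
    # which mirrors running it interactively in RStudio.
    source(path)
    ran <- c(ran, script)
  }
  invisible(ran)
}

# Example call, assuming the working directory is the repository root:
# run_task_scripts(task_scripts)
```

A single script can also be run from a terminal with Rscript, for example Rscript data_extraction.R. The reports that are meant to be run can be knitted with rmarkdown::render(), provided the rmarkdown package is installed.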
You can contact me at james.mbewu @ gmail.com for any queries.