Background and Motivation
There have been challenges organizing the knowledge and experience accumulated while implementing Apache Spark across clients and projects. Documenting these experiences runs into two recurring problems:
1. Consistency of format. This breaks down because of attempts to branch out to different languages and "shiny things" that are not relevant to the task at hand.
2. Usability. Documentation that is not available when needed is no good. Some material also needs to stay private, so there must be a way to keep it private, as well as a way to share material with others via slides and slidedocs.
Goals
- Create a consistent format for documenting Apache Spark implementations and administration.
- Concentrate on the Python language.
- Pick one theme and work only with it. PyData looks like the winner.
- Make a website map of the future state with stubs for each future article. These stubs should contain some pseudocode with a `pass` statement to prime the site.
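A stub article's pseudocode could look like the following minimal sketch (the title, file name, and function name are placeholders, not decided yet):

```python
"""Stub: <future article title goes here>.

Each stub carries a module docstring and a placeholder function
so the site builds cleanly before the real content is written.
"""


def demo() -> None:
    """Placeholder example; replace with a working Spark snippet."""
    pass
```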
Target Architecture
- Operating System: Ubuntu 22.04.2 LTS
- Java: Zulu 8.70.0.23-CA-linux64
- Scala: 2.12.15
- Python: 3.10.6
- R: 4.2.2
- Delta Lake: 2.4.0
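Because every article will assume the versions above, a small guard at the top of example scripts can catch mismatches early. This is an illustrative sketch (the function name and the warning text are assumptions, and only the Python version from the spec is checked here):

```python
import sys

# Target Python version from the architecture spec (3.10.x).
REQUIRED_PYTHON = (3, 10)


def meets_minimum(actual: tuple, required: tuple) -> bool:
    """Return True when the running version is at least the required one.

    Tuple comparison is element-wise, so (3, 11) >= (3, 10) is True.
    """
    return actual >= required


if not meets_minimum(sys.version_info[:2], REQUIRED_PYTHON):
    print(f"Warning: examples target Python {REQUIRED_PYTHON}, "
          f"found {sys.version_info[:2]}")
```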
Initial Tasks (backlog)
- Set up a Python 3.10 `venv` environment.
- Create a `hello world` Python script with a docstring for the "Docs" section.
- Create a “Tutorials” section and then migrate the Docsy material from the old site.
- Write stubs for SkyBot, DBeaver and Postgres.
- Write article for EM.
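The `hello world` backlog item could start from a sketch like this, assuming the point is that every script in the "Docs" section opens with a docstring that documentation tooling can pick up (the file name and function name are placeholders):

```python
"""hello_world.py: minimal starter script for the "Docs" section.

A module-level docstring like this one is what documentation
generators pick up, so every script in the section should open
with one describing its purpose.
"""


def main() -> str:
    """Return the classic greeting; kept trivial on purpose."""
    return "Hello, world!"


if __name__ == "__main__":
    print(main())
```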