Background and Motivation

Organizing the knowledge and experience accumulated while implementing Apache Spark across clients and projects has been difficult. Documenting these experiences raises two challenges:

  1. Consistency of format. Formats drift because of attempts to branch out to different languages and “shiny things” that are not necessarily relevant to the task at hand.
  2. Usability. Documentation that is not available to me when I need it is no good. Some things may also need to stay private, so there must be a way to keep them private. There should also be a way to share things with others via slides and slidedocs.

Goals

  1. Create a consistent format for documenting Apache Spark implementations and administration.
  2. Concentrate on the Python language.
  3. Pick one theme and only work with it. PyData looks like the winner.
  4. Make a website map of the future state with stubs for each future article. These stubs should contain some pseudocode with a pass statement to prime the state.
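
As a rough illustration of goal 4, a stub for a future article might look like the sketch below. The article topic and the function names (`create_spark_session`, `load_source_data`) are hypothetical placeholders chosen for this example, not part of any agreed outline. Each function holds only a docstring and a `pass` statement, so the file runs as-is and can be filled in later.

```python
"""Stub article: loading source data with Spark (hypothetical topic)."""


def create_spark_session():
    """Placeholder: build the Spark session used by this article."""
    pass  # to be implemented when the article is written


def load_source_data():
    """Placeholder: read the input data set into a DataFrame."""
    pass  # to be implemented when the article is written


def main():
    # Outline of the future article, one call per planned section.
    create_spark_session()
    load_source_data()


if __name__ == "__main__":
    main()
```

Keeping the stubs importable and runnable means the site map can be checked for broken files before any real content exists.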

Target Architecture

Initial Tasks (backlog)