Extending Jupyter with Google Cloud Storage file system backend

Egor Bulychev, source{d}

About me

About source{d}

Plan

egorbu.github.io/pydata_2017_barcelona/index.html
(view this on your device)
  1. Motivation
  2. Google Cloud side
  3. Jupyter side
  4. Workflow

Motivation

  1. Easy sharing of notebooks and results
  2. Easy deployment
  3. Persistent notebooks out of the box
  4. Use as a reference for S3, Azure, etc.
  5. Make the community's life easier
  6. As a result: less pain

Google Cloud

Google Cloud: Storage & Databases

Google Cloud Storage: Data Model

Bucket
- Buckets are the basic containers that hold your data.
- Everything that you store in Google Cloud Storage must be contained in a bucket.
Blob
- A wrapper around Cloud Storage's concept of an ``Object``
- Objects are the individual pieces of data that you store in Google Cloud Storage.

Google Cloud Storage: Data Model Notice

Google Cloud Storage uses a flat namespace to store objects.
.
├── foo/bar/image.png
└── foo/text.txt
However, it's possible to work with objects as if they are stored in a virtual hierarchy, as a convenience.
.
└── foo
    ├── bar
    │   └── image.png
    └── text.txt
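
The virtual hierarchy can be emulated with nothing more than a prefix and a delimiter. The sketch below is plain Python over a list of object names and does not call Google Cloud; the function name `list_directory` is made up for illustration.

```python
def list_directory(object_names, prefix="", delimiter="/"):
    """Emulate one directory level over a flat list of object names."""
    files, subdirs = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter: the object lives in a "subdirectory".
            subdirs.add(prefix + head + delimiter)
        elif rest:
            # No delimiter left: the object sits directly at this level.
            files.append(name)
    return sorted(files), sorted(subdirs)


objects = ["foo/bar/image.png", "foo/text.txt"]
print(list_directory(objects))          # ([], ['foo/'])
print(list_directory(objects, "foo/"))  # (['foo/text.txt'], ['foo/bar/'])
```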

Google Cloud Storage: Python API

Client

google
└── cloud
    └── storage
        └── Client
            ├── create_bucket
            ├── get_bucket
            └── list_buckets
3 methods are used out of 7
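
A minimal sketch of how the three Client methods might be combined. It takes the client as an argument, so any object with `list_buckets`, `get_bucket` and `create_bucket` will do; the helper name `get_or_create_bucket` and the bucket name in the usage comment are assumptions, not part of the talk.

```python
def get_or_create_bucket(client, name):
    """Return the bucket called `name`, creating it if it is missing.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    existing = {bucket.name for bucket in client.list_buckets()}
    if name in existing:
        return client.get_bucket(name)
    return client.create_bucket(name)


# Possible usage with the real library (requires credentials):
#   from google.cloud import storage
#   bucket = get_or_create_bucket(storage.Client(), "my-notebooks")
```

Listing every bucket just to test for existence is wasteful on large projects; it is used here only because it stays within the three methods shown on the slide.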

Google Cloud Storage: Python API

Bucket

google
└── cloud
    └── storage
        └── Bucket
            ├── blob
            ├── copy_blob
            ├── delete
            ├── delete_blobs
            ├── get_blob
            ├── list_blobs
            └── rename_blob
7 methods are used out of 40
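
A sketch of how `list_blobs` and `rename_blob` could serve directory-style operations. The helper names are hypothetical, and `bucket` is assumed to behave like `google.cloud.storage.Bucket`. Note that the iterator returned by `list_blobs` only fills its `prefixes` attribute once it has been consumed.

```python
def _as_prefix(path):
    """Turn 'foo' or '/foo/' into 'foo/'; the root becomes ''."""
    path = path.strip("/")
    return path + "/" if path else ""


def list_dir(bucket, path):
    """Return (file names, subdirectory prefixes) one level below `path`."""
    prefix = _as_prefix(path)
    iterator = bucket.list_blobs(prefix=prefix, delimiter="/")
    # Skip a placeholder object named exactly like the directory itself.
    files = [blob.name for blob in iterator if blob.name != prefix]
    # `prefixes` is populated only after the iterator has been exhausted.
    dirs = sorted(iterator.prefixes)
    return files, dirs


def rename_dir(bucket, old_path, new_path):
    """Rename every object under `old_path`; a flat namespace has no real directories."""
    old_prefix, new_prefix = _as_prefix(old_path), _as_prefix(new_path)
    for blob in list(bucket.list_blobs(prefix=old_prefix)):
        bucket.rename_blob(blob, new_prefix + blob.name[len(old_prefix):])
```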

Google Cloud Storage: Python API

Blob

google
└── cloud
    └── storage
        └── Blob
            ├── content_type
            ├── download_as_string
            ├── exists
            ├── updated
            └── upload_from_string
5 methods are used out of 38
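
A sketch of a text round trip through a blob, assuming `bucket` behaves like `google.cloud.storage.Bucket`. The helper names `write_text` and `read_text` are made up for illustration.

```python
def write_text(bucket, name, text):
    """Store `text` as a UTF-8 object called `name`."""
    blob = bucket.blob(name)
    blob.upload_from_string(text.encode("utf-8"), content_type="text/plain")
    return blob


def read_text(bucket, name):
    """Fetch the object called `name` and decode it as UTF-8."""
    blob = bucket.blob(name)
    if not blob.exists():
        raise FileNotFoundError(name)
    # download_as_string returns bytes despite its name.
    return blob.download_as_string().decode("utf-8")
```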

Jupyter API:

ContentsManager

notebook
└── services
    └── contents
        └── manager
            └── ContentsManager
                ├── delete_file
                ├── dir_exists
                ├── file_exists
                ├── get
                ├── is_hidden
                ├── rename_file
                └── save
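
The three predicate methods can be illustrated over a flat set of object names. This is a sketch only: the class `FlatNamespacePredicates` is hypothetical and deliberately does not inherit from `ContentsManager`, so it runs without Jupyter installed. A real backend would subclass `ContentsManager` and query the bucket instead of a set.

```python
class FlatNamespacePredicates:
    """file_exists / dir_exists / is_hidden over a flat set of object names."""

    def __init__(self, object_names):
        self.names = set(object_names)

    def file_exists(self, path):
        path = path.strip("/")
        return bool(path) and path in self.names

    def dir_exists(self, path):
        path = path.strip("/")
        if not path:
            return True  # the root always exists
        prefix = path + "/"
        # A directory exists if any object name starts with its prefix.
        return any(name.startswith(prefix) for name in self.names)

    def is_hidden(self, path):
        # Treat any dot-prefixed path component as hidden.
        return any(part.startswith(".") for part in path.strip("/").split("/"))
```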

Jupyter API:

Checkpoints

notebook
└── services
    └── contents
        └── checkpoints
            └── Checkpoints
                ├── create_checkpoint
                ├── delete_checkpoint
                ├── list_checkpoints
                └── rename_checkpoint

Jupyter API:

GenericCheckpointsMixin

notebook
└── services
    └── contents
        └── checkpoints
            └── GenericCheckpointsMixin
                ├── create_file_checkpoint
                ├── create_notebook_checkpoint
                ├── get_file_checkpoint
                └── get_notebook_checkpoint
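
A sketch of the four mixin methods backed by an in-memory dict, keeping a single checkpoint per path. The class name, the fixed checkpoint id and the storage layout are assumptions; a Cloud Storage backend would write to blobs instead. The class does not inherit from the Jupyter base classes, so it runs standalone.

```python
import datetime


class InMemoryCheckpoints:
    """One checkpoint per path, stored in a dict (illustration only)."""

    CHECKPOINT_ID = "checkpoint"

    def __init__(self):
        self._store = {}  # path -> model fragment

    def _save(self, path, fragment):
        self._store[path] = fragment
        now = datetime.datetime.now(datetime.timezone.utc)
        return {"id": self.CHECKPOINT_ID, "last_modified": now}

    def _load(self, checkpoint_id, path, expected_type):
        if checkpoint_id != self.CHECKPOINT_ID or path not in self._store:
            raise KeyError((checkpoint_id, path))
        fragment = self._store[path]
        if fragment["type"] != expected_type:
            raise ValueError("checkpoint is not a " + expected_type)
        return fragment

    def create_file_checkpoint(self, content, format, path):
        return self._save(path, {"type": "file", "content": content, "format": format})

    def create_notebook_checkpoint(self, nb, path):
        return self._save(path, {"type": "notebook", "content": nb})

    def get_file_checkpoint(self, checkpoint_id, path):
        return self._load(checkpoint_id, path, "file")

    def get_notebook_checkpoint(self, checkpoint_id, path):
        return self._load(checkpoint_id, path, "notebook")
```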

Jupyter:

Base model

model = {
    "name": None,
    "path": None,
    "type": None,
    "created": None,
    "last_modified": None,
    "content": None,
    "format": None,
    "mimetype": None
}
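
A small sketch of a helper that fills in the path-derived fields of the base model. The function name `base_model` is an assumption; the keys are the ones listed above.

```python
def base_model(path):
    """Build the base model for `path`; only name and path are known up front."""
    path = path.strip("/")
    return {
        "name": path.rsplit("/", 1)[-1],  # last path component
        "path": path,
        "type": None,
        "created": None,
        "last_modified": None,
        "content": None,
        "format": None,
        "mimetype": None,
    }


print(base_model("foo/bar/image.png")["name"])  # image.png
```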

Jupyter:

File model

model["type"] = "file"
model["mimetype"] = "text/plain" or "application/octet-stream"
model["format"] = "text" or "base64"
model["content"] = content
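
One way to choose between the two formats is to try decoding the raw bytes as UTF-8 and fall back to base64. This is a sketch under that assumption, not necessarily the rule used in the project; `file_model` is a hypothetical name.

```python
import base64


def file_model(path, raw):
    """Build a file model from raw bytes: text if valid UTF-8, base64 otherwise."""
    path = path.strip("/")
    model = {"name": path.rsplit("/", 1)[-1], "path": path, "type": "file"}
    try:
        model["content"] = raw.decode("utf-8")
        model["format"] = "text"
        model["mimetype"] = "text/plain"
    except UnicodeDecodeError:
        model["content"] = base64.b64encode(raw).decode("ascii")
        model["format"] = "base64"
        model["mimetype"] = "application/octet-stream"
    return model
```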

Jupyter:

Notebook model

model["type"] = "notebook"
model["mimetype"] = "application/x-ipynb+json"
model["format"] = "json"
model["content"] = content of notebook

Jupyter:

Directory model

model["type"] = "directory"
model["mimetype"] = "application/x-directory"
model["format"] = "json"
model["content"] = list(content-free-models of files/nb/dirs)
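
A sketch of assembling a directory model from already-listed file and subdirectory paths. Child entries carry no content, matching the "content-free" note above. The function names and the `.ipynb` suffix test for detecting notebooks are assumptions.

```python
def _stub(path, type_):
    """Content-free model for one entry."""
    path = path.strip("/")
    return {
        "name": path.rsplit("/", 1)[-1],
        "path": path,
        "type": type_,
        "created": None,
        "last_modified": None,
        "content": None,
        "format": None,
        "mimetype": None,
    }


def directory_model(path, file_paths, dir_paths):
    """Directory model whose content is a list of content-free child models."""
    model = _stub(path, "directory")
    model["mimetype"] = "application/x-directory"
    model["format"] = "json"
    children = [_stub(p, "directory") for p in dir_paths]
    children += [
        _stub(p, "notebook" if p.endswith(".ipynb") else "file") for p in file_paths
    ]
    model["content"] = children
    return model
```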

Jupyter Google Storage Contents Manager:

Task summary

What to implement:
- 2 classes and 3 models.
- at least 17 methods.
What to use:
- 3 classes.
- 15 methods out of 85 in these 3 classes.

Workflow:

Workflow: Initialization

Google Cloud

Workflow: Initialization

Google Cloud Keyfile

Workflow: Initialization

Jupyter

Workflow:

Navigation

Workflow:

Create file / folder

Workflow:

Create file / folder: Notebook

Workflow:

Create file / folder: file

Workflow:

Create file / folder: directory

Workflow:

Delete file / folder: delete_file(path)
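
A sketch of what deletion might look like on a flat namespace: a file is a single blob, while a folder is every blob sharing its prefix. `bucket` is assumed to behave like `google.cloud.storage.Bucket`; the helper name `delete_path` is hypothetical.

```python
def delete_path(bucket, path):
    """Delete a file, or every object under a folder; return the number removed."""
    name = path.strip("/")
    if not name:
        raise ValueError("refusing to delete the bucket root")
    blob = bucket.get_blob(name)  # returns None when the object is missing
    if blob is not None:
        bucket.delete_blobs([blob])
        return 1
    # Not a single object: treat the path as a folder prefix.
    blobs = list(bucket.list_blobs(prefix=name + "/"))
    if not blobs:
        raise FileNotFoundError(path)
    bucket.delete_blobs(blobs)
    return len(blobs)
```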

Workflow:

Rename file / folder

Workflow:

Open file

Workflow: debug

Workflow: Optimization

Questions?

project on github
(view this on your device)