Why Jupyter Notebook?

Jupyter Notebook is a convenient way to explore and process data on WEkEO. You can use Jupyter Notebooks in two ways: directly from the WEkEO portal, or by hosting them on your own virtual machines. Both options are described below, along with how to access WEkEO data from a notebook.

 

Using Jupyter Notebooks from the WEkEO portal

With this option, you can create and work with Jupyter notebooks directly from the WEkEO portal, without needing a virtual machine. The Harmonised Data Access API can be used from within the notebook to query and access the data you need for processing.

 

Creating a Jupyter Notebook from the portal

  • Select the "Services" tab on the portal home page and then select "Jupyter Notebook".
  • A login window asking for your user credentials is shown.
  • If you are already logged in, a web page with a "My Server" button is shown instead. This is the JupyterHub home page.

 

[Figure: the "My Server" button]

 

  • Click the "My Server" button to start the server and access your workspace.
  • A default folder called "work" is available to all users. It contains a notebook, "sample.ipynb", showing how to query and access datasets using the Harmonised Data Access API.

 

Sample Notebook

     

    Harmonised Data Access (HDA) API in Jupyter Notebook

    This section walks through the "sample.ipynb" notebook, which demonstrates how to use the HDA API to query and access datasets.

    Initializations: The first step is to initialize the variables as shown below.

    # Modules used by the snippets in this section
    import json
    import time
    import urllib.parse

    import requests

    # HDA-API endpoint
    apis_endpoint = "https://apis.wekeo.eu"

    # Data broker address
    broker_address = apis_endpoint + "/databroker/0.1.0"

    # Terms and conditions
    acceptTandC_address = apis_endpoint + "/dcsi-tac/0.1.0/termsaccepted/Copernicus_General_License"

    # Access-token address
    accessToken_address = apis_endpoint + '/token'

    # We are going to use a Sentinel-3 dataset
    dataset_id = "EO:EUM:DAT:SENTINEL-3:SR_2_WAT___"

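    /token: The next step in using the API is to get an access token.

    # `headers` and `data` hold the credential payload for the token request;
    # they are not defined in this excerpt and must be set beforehand.
    # Note: verify=False disables TLS certificate verification.
    print("Step-1: Getting an access token. This token is valid for one hour only.")
    response = requests.post(accessToken_address, headers=headers, data=data, verify=False)
    access_token = json.loads(response.text)['access_token']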
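
The token request above relies on headers and data values that this page does not show. The following is a minimal sketch of how they might be built, assuming the /token endpoint accepts HTTP Basic credentials together with a client-credentials grant; that scheme, the helper name build_token_request and the placeholder credentials are assumptions, so check "sample.ipynb" for the exact form.

```python
import base64


def build_token_request(username, password):
    """Return (headers, data) for a POST to the /token endpoint.

    Assumption: the endpoint takes HTTP Basic credentials plus a
    client-credentials grant. This is not confirmed by this page.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    api_key = base64.b64encode(credentials).decode("ascii")
    headers = {"Authorization": "Basic " + api_key}
    data = {"grant_type": "client_credentials"}
    return headers, data


# Placeholder credentials, for illustration only.
headers, data = build_token_request("my_user", "my_password")
```

The returned pair can then be passed to requests.post as in the snippet above. Avoid storing real credentials in a notebook that you share.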
    

    /querymetadata: With the access token, you can query the dataset of interest.

    In the code snippet below, the dataset collection stored in dataset_id is queried; any other collection ID, such as "EO:MO:DAT:OCEANCOLOUR_ARC_CHL_L3_NRT_OBSERVATIONS_009_047", can be queried in the same way. The response lists the parameters that can be used to query the collection, and also indicates whether the user has accepted the terms and conditions for the queried dataset.

    headers = {
        'Authorization': 'Bearer ' + access_token,
    }

    # Percent-encode the dataset ID so that it is safe to use in the URL path
    encoded_dataset_id = urllib.parse.quote(dataset_id)

    response = requests.get(broker_address + '/querymetadata/' + encoded_dataset_id, headers=headers)
    

    The complete list of dataset IDs offered via WEkEO is available at https://www.wekeo.eu/dataset-navigator/extended?query=&filter=distribution__Harmonized%20Data%20Access

    /termsaccepted: When accessing the Copernicus data for the first time, you need to accept the terms and conditions of the Copernicus General License. This can be done as shown below.

    response = requests.get(acceptTandC_address, headers=headers)
    
    isTandCAccepted = json.loads(response.text)['accepted']
    # Compare by value: the flag may arrive as a JSON boolean or as a string
    if str(isTandCAccepted).lower() == 'false':
        print("Accepting Terms and Conditions of Copernicus_General_License")
        response = requests.put(acceptTandC_address, headers=headers)
    else:
        print("Copernicus_General_License Terms and Conditions already accepted")

     

    /datarequest: You can construct a query by setting values for all the required (mandatory) parameters, as reported in the response to the previous step, and submit it as a data request as shown below.

    # Example query for Sentinel-3 data. This query is constructed based on the response of the metadata query
    data = {
        "datasetId": "EO:EUM:DAT:SENTINEL-3:SR_2_WAT___",
        "stringChoiceValues": [
            {
                "name": "sat",
                "value": "Sentinel-3A"
            }
        ],
        "dateRangeSelectValues": [
            {
                "name": "dtrange",
                "start": "2018-04-08T00:03:47.526Z",
                "end": "2018-10-22T20:08:09.499Z"
            }
        ],
        "equi7GridSelectValues": [
            {
                "name": "zone",
                "zone": "EU",
                "tiles": [
                    "EU-054_018"
                ]
            }
        ]
    }
    response = requests.post(broker_address + '/datarequest', headers=headers, json=data, verify=False)
    job_id = json.loads(response.text)['jobId']
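
The request body above can also be assembled programmatically, which helps when the date range comes from datetime objects. The sketch below is illustrative only: the helper names to_hda_timestamp and build_data_request are hypothetical, the parameter names "sat" and "dtrange" are taken from the example query, and the tile selection is left out for brevity.

```python
from datetime import datetime, timezone


def to_hda_timestamp(moment):
    """Format an aware datetime as UTC with millisecond precision and a 'Z' suffix."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_data_request(dataset_id, satellite, start, end):
    """Assemble a /datarequest body with one string choice and one date range."""
    return {
        "datasetId": dataset_id,
        "stringChoiceValues": [{"name": "sat", "value": satellite}],
        "dateRangeSelectValues": [
            {
                "name": "dtrange",
                "start": to_hda_timestamp(start),
                "end": to_hda_timestamp(end),
            }
        ],
    }


query = build_data_request(
    "EO:EUM:DAT:SENTINEL-3:SR_2_WAT___",
    "Sentinel-3A",
    datetime(2018, 4, 8, 0, 3, 47, 526000, tzinfo=timezone.utc),
    datetime(2018, 10, 22, 20, 8, 9, 499000, tzinfo=timezone.utc),
)
```

The resulting dictionary can be passed as the json argument of the POST request. Which parameters a dataset actually requires depends on its metadata response.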
    

     

    /datarequest/status: The job id (returned in the previous step) can be used to query the job status as shown below.

    isComplete = False
    while not isComplete:
        response = requests.get(broker_address + '/datarequest/status/' + job_id, headers=headers)
        # Decode the status document once and read both fields from it
        status = json.loads(response.text)
        results = status['resultNumber']
        isComplete = status['complete']
        print("Has the Job " + job_id + " completed ?  " + str(isComplete))
        # sleep for 2 seconds before checking the job status again
        if not isComplete:
            time.sleep(2)
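
The loop above runs until the job completes, however long that takes. One possible variation, sketched here, adds an overall timeout. The function name wait_for_job and its default values are hypothetical; the only thing taken from the API is the "complete" field of the status document.

```python
import time


def wait_for_job(fetch_status, interval=2.0, timeout=600.0):
    """Poll fetch_status() until it reports completion or the timeout expires.

    fetch_status is any callable returning the decoded status document,
    for example a function wrapping the GET on /datarequest/status/<job_id>.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("complete"):
            return status
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("job did not complete within %.0f seconds" % timeout)
        time.sleep(interval)
```

Passing the HTTP call in as a callable keeps the polling logic independent of the HTTP library, so it can be tried out with a stub before it is pointed at the real endpoint.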

    Example response:

    {
       "jobId":"341552be-7ce4-470d-8c32-7e6a31c836f0",
       "complete":true,
       "status":"COMPLETED",
       "message":null,
       "resultNumber":729327,
       "created":"2018-11-21T15:11:38.208"
    }

    /datarequest/jobs/{jobID}/result: The query results are paginated. The page number and page size parameters can be used to fetch only the results you need. Pages are numbered from 0 (i.e. the first page is page 0).

    In the example below, each page contains 5 results and we request the 3rd page (page index 2 with zero-based numbering).

    params = {'page':'2', 'size':'5'}
    response = requests.get(broker_address + '/datarequest/jobs/' + job_id + '/result', headers=headers, params = params)
    results = json.loads(response.text)

    Example response:

    {
        "content": [
            {
                "additionalMetadata": null,
                "externalUri": "/olda-service-download/products/S3A_SR_2_WAT____20181012T092106_20181012T100910_20181107T012523_2884_036_364______MAR_O_NT_003.SEN3",
                "fileName": "S3A_SR_2_WAT____20181012T092106_20181012T100910_20181107T012523_2884_036_364______MAR_O_NT_003.SEN3",
                "fileSize": 44833000
            },
            {
                "additionalMetadata": null,
                "externalUri": "/olda-service-download/products/S3A_SR_2_WAT____20181011T190833_20181011T195312_20181106T110855_2678_036_355______MAR_O_NT_003.SEN3",
                "fileName": "S3A_SR_2_WAT____20181011T190833_20181011T195312_20181106T110855_2678_036_355______MAR_O_NT_003.SEN3",
                "fileSize": 34644000
            }
        ],
        "number": 3,
        "numberOfElements": 2,
        "size": 2,
        "totalElements": 110
    }
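
To walk through every page rather than a single one, the page index can be advanced until all reported elements have been seen. The following is a rough sketch under the assumption that each page carries the "content" and "totalElements" fields shown in the example response; the name iter_all_results and the fetch_page callable are hypothetical.

```python
def iter_all_results(fetch_page, size=5):
    """Yield every result entry, requesting pages 0, 1, 2, ... in turn.

    fetch_page(page, size) is any callable returning one decoded result page
    with the 'content' and 'totalElements' fields.
    """
    page = 0
    seen = 0
    while True:
        document = fetch_page(page, size)
        content = document.get("content", [])
        for entry in content:
            yield entry
        seen += len(content)
        # Stop on an empty page or once every reported element has been seen.
        if not content or seen >= document.get("totalElements", 0):
            break
        page += 1
```

Here fetch_page would wrap the GET request on /datarequest/jobs/{jobID}/result with the page and size parameters. For jobs with a very large number of results, consider processing entries as they are yielded instead of collecting them all in a list.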

    /datarequest/result/{jobID}?externalUri={path}: The download link for each product can be constructed by iterating through the results, as shown below. The code also shows that the product name (file name) can be retrieved before the product is downloaded.

    for result in results['content']:
        externalUri = result['externalUri']
        product_size = result['fileSize']/(1024*1024)
        product_name = result['fileName']
        download_url = broker_address + '/datarequest/result/' + job_id + '?externalUri=' + urllib.parse.quote(externalUri) +"&access_token="+access_token
        print("Download link for " + product_name + "(" + "{:.2f}".format(product_size) + " MB) :")
        print(download_url)
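
The loop above only prints the links. A possible way to fetch a product from within the notebook is sketched below using the Python standard library. It assumes the link can be retrieved with a plain GET request, which this page does not confirm; the function names build_download_url and download_product are hypothetical.

```python
import shutil
import urllib.parse
import urllib.request


def build_download_url(broker_address, job_id, external_uri, access_token):
    """Compose the download link in the same way as the loop above."""
    return (
        broker_address + "/datarequest/result/" + job_id
        + "?externalUri=" + urllib.parse.quote(external_uri)
        + "&access_token=" + access_token
    )


def download_product(download_url, target_path):
    """Stream the response body to target_path without loading it into memory."""
    with urllib.request.urlopen(download_url) as response, open(target_path, "wb") as target:
        shutil.copyfileobj(response, target)
    return target_path
```

Streaming to disk matters because products can be tens of megabytes, as the file sizes in the example response show. Keep in mind that the link embeds the access token, so it should not be shared or logged.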

    Hosting Jupyter Notebooks in your Virtual Machine

    In order to install Jupyter Notebooks in your VM, follow these steps:

    • Make sure you have either Python 3.3 or greater, or Python 2.7 (run python --version or python3 --version to be sure)
    • Make sure you have pip or pip3, for Python 2 and Python 3, respectively. (run pip --version or pip3 --version to be sure). If you don't, run sudo apt install python3-pip (Ubuntu)
    • Install Jupyter Notebook: pip3 install jupyter (or pip install jupyter)

     

    Run Jupyter and tell it to listen on all network interfaces:

    $ jupyter notebook --ip=0.0.0.0
    
    [I 15:26:41.689 NotebookApp] Serving notebooks from local directory: /home/linux
    [I 15:26:41.690 NotebookApp] 0 active kernels
    [I 15:26:41.690 NotebookApp] The Jupyter Notebook is running at:
    [I 15:26:41.690 NotebookApp] http://xxx:8888/?token=yyy
    [I 15:26:41.690 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [W 15:26:41.690 NotebookApp] No web browser found: could not locate runnable browser.
    [C 15:26:41.691 NotebookApp]
    
        Copy/paste this URL into your browser when you connect for the first time,
        to login with a token:
            http://xxx:8888/?token=yyy
    

     

    Jupyter is configured by default to listen on port 8888. To reach it on the standard web ports, forward incoming requests on port 80 (http) or 443 (https) to port 8888. We recommend using Apache or NGINX as a reverse proxy for this purpose.

     

    Accessing WEkEO data from your notebook

    Refer to this guide for more information and examples.