DataFlow API walkthrough

Suhas Somnath, Oak Ridge National Laboratory, 4/6/2022

0. Prepare to use DataFlow’s API:

  1. Install the ordflow Python package from PyPI:

pip install ordflow

  2. Generate an API key from DataFlow's web interface.

Note: API keys are not reusable across DataFlow servers (e.g. a facility-local deployment vs. the central server at https://dataflow.ornl.gov). You will need an API key for the specific DataFlow instance you are communicating with.

[1]:
api_key = "eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo1LCJjcmVhdGVkX2F0IjoiMjAyMi1wNS0wMlQwOTo1ODoxMi0wNDowMCIsImV4cCI6MTY4Mjk4NTYwMH0.jYqV0YNn1dO_8bdQGvVY5MFqfX_xR1DxRKNZANuemuU"
  3. Encrypt the password(s) necessary to activate Globus endpoints securely.

Here, the two Globus endpoints (the DataFlow server and the destination) use the same authentication (ORNL's XCAMS).

Note: Your passwords must be encrypted by the specific deployment of DataFlow (central or facility-local) that you intend to use.

[2]:
enc_pwd = "V5yYQFuavTo83XQ9BFA04azG--5LiXo6OOA3cFPqhm--Hg3wpLrSO0wIswtbFdsz1A=="
  4. Import the API class from the ordflow package.

[3]:
from ordflow import API

1. Instantiate the API

Instantiate the API object with your personal API key:

[4]:
api = API(api_key)
Using server at: https://dataflow.ornl.gov/api/v1 as default
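
If you are working with a facility-local deployment rather than the central server, the client needs to be pointed at that server when it is instantiated. The keyword argument below is an assumed name for illustration only; check the ordflow documentation for the exact parameter:

[ ]:
# Hypothetical: point the client at a facility-local DataFlow deployment.
# "server_url" is an assumed parameter name, not necessarily the real one;
# consult the ordflow documentation before using it.
# api = API(api_key, server_url="https://dataflow.my-facility.ornl.gov/api/v1")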

2. Check default settings

Pay attention primarily to the globus.destination_endpoint setting, since this is the only setting that can be changed or has any significant effect.

[5]:
response = api.settings_get()
response
[5]:
{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
 'transport': {'protocol': 'globus'}}
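
The settings come back as a plain dictionary, so individual values can be read directly. For example, to pull out the currently configured destination endpoint ID (keys taken from the response shown above):

[ ]:
# Read the currently configured destination Globus endpoint ID from the
# settings response returned above.
destination_endpoint = response['globus']['destination_endpoint']
destination_endpoint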

3. Update a default setting

Here, we will switch the destination endpoint to olcf#dtn for illustration purposes:

[6]:
response = api.settings_set("globus.destination_endpoint",
                            "ef1a9560-7ca1-11e5-992c-22000b96db58")
response
[6]:
{'globus': {'destination_endpoint': 'ef1a9560-7ca1-11e5-992c-22000b96db58'},
 'transport': {'protocol': 'globus'}}

Switching the destination endpoint back to cades#CADES-OR, which is the default:

[7]:
response = api.settings_set("globus.destination_endpoint",
                            "57230a10-7ba2-11e7-8c3b-22000b9923ef")
response
[7]:
{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
 'transport': {'protocol': 'globus'}}
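
A quick way to make sure a settings change took effect is to read the settings back and compare against the endpoint ID that was just set. This is a minimal sketch built only on the settings_get() and settings_set() calls shown above:

[ ]:
# Minimal sketch: set the destination endpoint and confirm the change by
# re-reading the settings. Uses only calls demonstrated above.
def set_and_verify_destination(api, endpoint_id):
    api.settings_set("globus.destination_endpoint", endpoint_id)
    current = api.settings_get()['globus']['destination_endpoint']
    if current != endpoint_id:
        raise RuntimeError("Destination endpoint was not updated as expected")
    return current

# set_and_verify_destination(api, "57230a10-7ba2-11e7-8c3b-22000b9923ef")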

4. List and view registered instruments

Contact a DataFlow server administrator to add an instrument for you.

[8]:
response = api.instrument_list()
response
[8]:
[{'id': 2,
  'name': 'Asylum Research Cypher West',
  'description': 'AR Cypher located in building 8610 in room JG 55. This instrument is capable of Band Excitation and General-mode based measurements in addition to common advanced AFM measurements.',
  'instrument_type': None}]
[9]:
response = api.instrument_info(2)
response
[9]:
{'id': 2,
 'name': 'Asylum Research Cypher West',
 'description': 'AR Cypher located in building 8610 in room JG 55. This instrument is capable of Band Excitation and General-mode based measurements in addition to common advanced AFM measurements.',
 'instrument_type': None}
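
Because instrument_list() returns a list of dictionaries, an instrument ID can be looked up by name instead of being hard-coded. The field names below come from the responses shown above; the helper itself is just an illustration:

[ ]:
# Illustrative helper: find an instrument ID by a case-insensitive name match,
# using the 'id' and 'name' fields shown in the responses above.
def find_instrument_id(api, name_fragment):
    for instrument in api.instrument_list():
        if name_fragment.lower() in instrument['name'].lower():
            return instrument['id']
    return None

# find_instrument_id(api, "Cypher")  # would return 2 for the listing above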

5. Check to see if Globus endpoints are active:

[10]:
response = api.globus_endpoints_active("57230a10-7ba2-11e7-8c3b-22000b9923ef")
response
[10]:
{'source_activation': {'code': 'AutoActivationFailed'},
 'destination_activation': {'code': 'AutoActivationFailed'}}

6. Activate one or both endpoints as necessary:

Because the destination wasn’t already activated, we can activate that specific endpoint.

Note: An encrypted password is used in place of the plain-text password for security reasons.

[11]:
response = api.globus_endpoints_activate("syz",
                                         enc_pwd,
                                         encrypted=True,
                                         endpoint="destination")
response
[11]:
{'status': 'ok'}
[12]:
response = api.globus_endpoints_active()
response
[12]:
{'source_activation': {'code': 'AutoActivated.CachedCredential'},
 'destination_activation': {'code': 'AlreadyActivated'}}
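
In an automated acquisition script, the check and the activation can be combined so that an endpoint is only activated when it is not already usable. The interpretation of the activation codes below is an assumption based on the responses shown above, and the username is a placeholder:

[ ]:
# Sketch: activate the destination endpoint only if the activation check
# indicates it is not usable. The code interpretation is an assumption based
# on the responses above; "my_username" is a placeholder.
status = api.globus_endpoints_active()
dest_code = status['destination_activation']['code']
if not dest_code.startswith(('AutoActivated', 'AlreadyActivated')):
    api.globus_endpoints_activate("my_username",
                                  enc_pwd,
                                  encrypted=True,
                                  endpoint="destination")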

7. Create a measurement Dataset

This creates a directory at the destination Globus Endpoint:

[13]:
response = api.dataset_create("My new dataset with nested metadata",
                               metadata={"Sample": "PZT",
                                         "Microscope": {
                                             "Vendor": "Asylum Research",
                                             "Model": "MFP3D"
                                             },
                                         "Temperature": 373
                                        }
                              )
response
[13]:
{'id': 12,
 'name': 'My new dataset with nested metadata',
 'creator': {'id': 5, 'name': 'Suhas Somnath'},
 'dataset_files': [],
 'instrument': None,
 'metadata_field_values': [{'id': 13,
   'field_value': 'PZT',
   'field_name': 'Sample',
   'metadata_field': None},
  {'id': 14,
   'field_value': 'Asylum Research',
   'field_name': 'Microscope-Vendor',
   'metadata_field': None},
  {'id': 15,
   'field_value': 'MFP3D',
   'field_name': 'Microscope-Model',
   'metadata_field': None},
  {'id': 16,
   'field_value': '373',
   'field_name': 'Temperature',
   'metadata_field': None}]}

Getting the dataset ID programmatically to use later on:

[14]:
dataset_id = response['id']
dataset_id
[14]:
12

8. Upload data file(s) to Dataset

[16]:
response = api.file_upload("./AFM_Topography.PNG", dataset_id)
response
using Globus since other file transfer adapters have not been implemented
[16]:
{'id': 9,
 'name': 'AFM_Topography.PNG',
 'file_length': 162,
 'file_type': '',
 'created_at': '2022-05-02 15:07:04 UTC',
 'relative_path': '',
 'is_directory': False}

Upload another data file to the same dataset:

[17]:
response = api.file_upload("./measurement_configuration.txt", dataset_id, relative_path="foo/bar")
response
using Globus since other file transfer adapters have not been implemented
[17]:
{'id': 10,
 'name': 'measurement_configuration.txt',
 'file_length': 162,
 'file_type': '',
 'created_at': '2022-05-02 15:07:08 UTC',
 'relative_path': 'foo/bar',
 'is_directory': False}
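
The same call can be wrapped in a short loop to push every file under a local directory into the dataset, preserving the directory layout via relative_path. This is a sketch that uses only the file_upload() signature demonstrated above; the local directory name is a placeholder:

[ ]:
import os

# Sketch: upload every file under a local directory, preserving the
# sub-directory structure via the relative_path argument shown above.
local_root = "./measurement_session"   # placeholder path
for dirpath, _, filenames in os.walk(local_root):
    rel_dir = os.path.relpath(dirpath, local_root)
    for fname in filenames:
        if rel_dir == ".":
            api.file_upload(os.path.join(dirpath, fname), dataset_id)
        else:
            api.file_upload(os.path.join(dirpath, fname), dataset_id,
                            relative_path=rel_dir)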

9. Search for a Dataset:

[18]:
response = api.dataset_search("nested")
response
[18]:
{'total': 1,
 'has_more': False,
 'results': [{'id': 12,
   'created_at': '2022-05-02T15:03:49Z',
   'name': 'My new dataset with nested metadata',
   'dataset_files': [{'id': 9,
     'name': 'AFM_Topography.PNG',
     'file_length': 162,
     'file_type': '',
     'created_at': '2022-05-02 15:07:04 UTC',
     'relative_path': '',
     'is_directory': False},
    {'id': 10,
     'name': 'measurement_configuration.txt',
     'file_length': 162,
     'file_type': '',
     'created_at': '2022-05-02 15:07:08 UTC',
     'relative_path': 'foo/bar',
     'is_directory': False},
    {'id': 11,
     'name': 'foo',
     'file_length': None,
     'file_type': None,
     'created_at': '2022-05-02 15:07:08 UTC',
     'relative_path': '',
     'is_directory': True},
    {'id': 12,
     'name': 'bar',
     'file_length': None,
     'file_type': None,
     'created_at': '2022-05-02 15:07:08 UTC',
     'relative_path': 'foo',
     'is_directory': True}],
   'metadata_field_values': [{'id': 13,
     'field_value': 'PZT',
     'field_name': 'Sample',
     'metadata_field': None},
    {'id': 14,
     'field_value': 'Asylum Research',
     'field_name': 'Microscope-Vendor',
     'metadata_field': None},
    {'id': 15,
     'field_value': 'MFP3D',
     'field_name': 'Microscope-Model',
     'metadata_field': None},
    {'id': 16,
     'field_value': '373',
     'field_name': 'Temperature',
     'metadata_field': None}]}]}

Parsing the response to get the ID of the dataset of interest:

[20]:
dset_id = response['results'][0]['id']
dset_id
[20]:
12
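
If a search returns more than one dataset, the full results list can be scanned instead of only taking the first entry. The keys below are exactly those in the search response shown above:

[ ]:
# Summarize every dataset returned by the search, using keys from the
# search response structure shown above.
for dset in response['results']:
    n_items = len(dset['dataset_files'])
    print(f"{dset['id']}: {dset['name']} ({n_items} files / directories)")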

10. View this Dataset:

This view shows both the files and metadata contained in a dataset:

[21]:
response = api.dataset_info(dset_id)
response
[21]:
{'id': 12,
 'name': 'My new dataset with nested metadata',
 'creator': {'id': 5, 'name': 'Suhas Somnath'},
 'dataset_files': [{'id': 9,
   'name': 'AFM_Topography.PNG',
   'file_length': 162,
   'file_type': '',
   'created_at': '2022-05-02 15:07:04 UTC',
   'relative_path': '',
   'is_directory': False},
  {'id': 10,
   'name': 'measurement_configuration.txt',
   'file_length': 162,
   'file_type': '',
   'created_at': '2022-05-02 15:07:08 UTC',
   'relative_path': 'foo/bar',
   'is_directory': False},
  {'id': 11,
   'name': 'foo',
   'file_length': None,
   'file_type': None,
   'created_at': '2022-05-02 15:07:08 UTC',
   'relative_path': '',
   'is_directory': True},
  {'id': 12,
   'name': 'bar',
   'file_length': None,
   'file_type': None,
   'created_at': '2022-05-02 15:07:08 UTC',
   'relative_path': 'foo',
   'is_directory': True}],
 'instrument': None,
 'metadata_field_values': [{'id': 13,
   'field_value': 'PZT',
   'field_name': 'Sample',
   'metadata_field': None},
  {'id': 14,
   'field_value': 'Asylum Research',
   'field_name': 'Microscope-Vendor',
   'metadata_field': None},
  {'id': 15,
   'field_value': 'MFP3D',
   'field_name': 'Microscope-Model',
   'metadata_field': None},
  {'id': 16,
   'field_value': '373',
   'field_name': 'Temperature',
   'metadata_field': None}]}
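
The metadata_field_values list can be folded back into a flat dictionary for easier programmatic use. Note that nested fields were flattened by DataFlow into hyphenated names (e.g. Microscope-Vendor) and that all values come back as strings:

[ ]:
# Rebuild a flat metadata dictionary from the dataset_info() response above.
# Nested fields come back flattened into hyphenated names, and values are strings.
metadata = {field['field_name']: field['field_value']
            for field in response['metadata_field_values']}
metadata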

11. View files uploaded via DataFlow:

We’re not using DataFlow here but just viewing the destination file system.

Datasets are sorted by date:

[ ]:
! ls -hlt ~/dataflow/untitled_instrument/

There may be more than one dataset per day; here we only have one:

[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/

Viewing the root directory of the dataset we just created:

[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/

We will very soon be able to specify root-level metadata, which will be stored in metadata.json.

We can also see the nested directories (foo/bar) where we uploaded the second file:

[ ]:
!ls -hlt  ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/

Looking at the innermost directory, bar:

[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/bar
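
The same directory tree can also be walked from Python rather than the shell. The path below matches the example listings above and should be adjusted to your own instrument and dataset:

[ ]:
import os

# Walk the dataset directory created by DataFlow and print every file in it.
# Adjust the root path to match your own instrument name and dataset.
root = os.path.expanduser("~/dataflow/untitled_instrument/2022-04-06/"
                          "135750_atomic_force_microscopy_scan_of_pzt")
for dirpath, dirnames, filenames in os.walk(root):
    for fname in filenames:
        print(os.path.join(dirpath, fname))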