DataFlow API walkthrough¶
Suhas Somnath, Oak Ridge National Laboratory, 4/6/2022
0. Prepare to use DataFlow’s API:¶
Install the ordflow python package from PyPI via:
pip install ordflow
Generate an API key from DataFlow’s web interface.
Note: API keys are not reusable across DataFlow servers (e.g. facility-local deployments and the central server at https://dataflow.ornl.gov). You will need an API key for the specific DataFlow instance you are communicating with.
[1]:
api_key = "eyJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo1LCJjcmVhdGVkX2F0IjoiMjAyMi1wNS0wMlQwOTo1ODoxMi0wNDowMCIsImV4cCI6MTY4Mjk4NTYwMH0.jYqV0YNn1dO_8bdQGvVY5MFqfX_xR1DxRKNZANuemuU"
Encrypt the password(s) necessary to activate Globus endpoints securely.
Here, the two Globus endpoints (DataFlow server and destination) use the same authentication (ORNL’s XCAMS).
Note: You will need to get your passwords encrypted by the specific deployment of DataFlow (central / facility-local) that you intend to use.
[2]:
enc_pwd = "V5yYQFuavTo83XQ9BFA04azG--5LiXo6OOA3cFPqhm--Hg3wpLrSO0wIswtbFdsz1A=="
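Rather than hardcoding secrets in a notebook, you may prefer to load them from environment variables. A minimal sketch; the variable names DATAFLOW_API_KEY and DATAFLOW_ENC_PWD are illustrative, not required by ordflow:
[ ]:
import os

# Assumes these variables were exported in your shell beforehand;
# the names are illustrative placeholders, not part of ordflow
api_key = os.environ["DATAFLOW_API_KEY"]
enc_pwd = os.environ["DATAFLOW_ENC_PWD"]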
Import the API class from the ordflow package.
[3]:
from ordflow import API
1. Instantiate the API¶
Instantiate the API object with your personal API key:
[4]:
api = API(api_key)
Using server at: https://dataflow.ornl.gov/api/v1 as default
2. Check default settings¶
Pay attention primarily to the globus.destination_endpoint setting, since this is the only setting that can be changed / has any significant effect.
[5]:
response = api.settings_get()
response
[5]:
{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
'transport': {'protocol': 'globus'}}
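The response is a plain nested dictionary, so individual settings can be read out directly, e.g. the current destination endpoint:
[ ]:
# Read the current destination endpoint out of the settings response
destination = response['globus']['destination_endpoint']
print(destination)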
3. Update a default setting¶
Here, we will switch the destination endpoint to olcf#dtn for illustration purposes:
[6]:
response = api.settings_set("globus.destination_endpoint",
"ef1a9560-7ca1-11e5-992c-22000b96db58")
response
[6]:
{'globus': {'destination_endpoint': 'ef1a9560-7ca1-11e5-992c-22000b96db58'},
'transport': {'protocol': 'globus'}}
Switching the destination endpoint back to cades#CADES-OR, which is the default:
[7]:
response = api.settings_set("globus.destination_endpoint",
"57230a10-7ba2-11e7-8c3b-22000b9923ef")
response
[7]:
{'globus': {'destination_endpoint': '57230a10-7ba2-11e7-8c3b-22000b9923ef'},
'transport': {'protocol': 'globus'}}
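A simple way to guard against typos in endpoint IDs is to keep the default in a named constant and verify the round trip. A minimal sketch using the same calls as above:
[ ]:
CADES_OR = "57230a10-7ba2-11e7-8c3b-22000b9923ef"  # cades#CADES-OR, the default

response = api.settings_set("globus.destination_endpoint", CADES_OR)
# Confirm the server recorded the value we sent
assert response['globus']['destination_endpoint'] == CADES_OR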
4. List and view registered instruments¶
Contact a DataFlow server administrator to add an instrument for you.
[8]:
response = api.instrument_list()
response
[8]:
[{'id': 2,
'name': 'Asylum Research Cypher West',
'description': 'AR Cypher located in building 8610 in room JG 55. This instrument is capable of Band Excitation and General-mode based measurements in addition to common advanced AFM measurements.',
'instrument_type': None}]
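Rather than hardcoding the instrument ID (2, below), you can pick it out of the list response. A minimal sketch, assuming the first entry is the instrument you want:
[ ]:
# Take the ID of the first (and, here, only) registered instrument
instrument_id = response[0]['id']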
[9]:
response = api.instrument_info(2)
response
[9]:
{'id': 2,
'name': 'Asylum Research Cypher West',
'description': 'AR Cypher located in building 8610 in room JG 55. This instrument is capable of Band Excitation and General-mode based measurements in addition to common advanced AFM measurements.',
'instrument_type': None}
5. Check to see if Globus endpoints are active:¶
[10]:
response = api.globus_endpoints_active("57230a10-7ba2-11e7-8c3b-22000b9923ef")
response
[10]:
{'source_activation': {'code': 'AutoActivationFailed'},
'destination_activation': {'code': 'AutoActivationFailed'}}
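The activation codes can be inspected programmatically to decide whether manual activation is needed. A sketch based on the response shown above:
[ ]:
# 'AutoActivationFailed' means the endpoint still needs manual activation
needs_activation = (response['destination_activation']['code']
                    == 'AutoActivationFailed')
print(needs_activation)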
6. Activate one or both endpoints as necessary:¶
Because the destination wasn’t already activated, we can activate that specific endpoint.
Note: An encrypted password is being used in place of the conventional password for safety reasons.
[11]:
response = api.globus_endpoints_activate("syz",
enc_pwd,
encrypted=True,
endpoint="destination")
response
[11]:
{'status': 'ok'}
[12]:
response = api.globus_endpoints_active()
response
[12]:
{'source_activation': {'code': 'AutoActivated.CachedCredential'},
'destination_activation': {'code': 'AlreadyActivated'}}
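Putting the two calls together, an activate-only-if-needed pattern might look like the sketch below; the username "syz" is the same placeholder used above:
[ ]:
status = api.globus_endpoints_active()
if status['destination_activation']['code'] == 'AutoActivationFailed':
    # Activate only the destination endpoint, using the encrypted password
    api.globus_endpoints_activate("syz", enc_pwd,
                                  encrypted=True,
                                  endpoint="destination")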
7. Create a measurement Dataset¶
This creates a directory at the destination Globus Endpoint:
[13]:
response = api.dataset_create("My new dataset with nested metadata",
metadata={"Sample": "PZT",
"Microscope": {
"Vendor": "Asylum Research",
"Model": "MFP3D"
},
"Temperature": 373
}
)
response
[13]:
{'id': 12,
'name': 'My new dataset with nested metadata',
'creator': {'id': 5, 'name': 'Suhas Somnath'},
'dataset_files': [],
'instrument': None,
'metadata_field_values': [{'id': 13,
'field_value': 'PZT',
'field_name': 'Sample',
'metadata_field': None},
{'id': 14,
'field_value': 'Asylum Research',
'field_name': 'Microscope-Vendor',
'metadata_field': None},
{'id': 15,
'field_value': 'MFP3D',
'field_name': 'Microscope-Model',
'metadata_field': None},
{'id': 16,
'field_value': '373',
'field_name': 'Temperature',
'metadata_field': None}]}
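Note how the nested metadata was flattened into field names joined with "-" (e.g. Microscope-Vendor). The flattened key/value pairs can be recovered from the response as a dictionary:
[ ]:
# Rebuild a flat {field_name: field_value} dictionary from the response
metadata = {m['field_name']: m['field_value']
            for m in response['metadata_field_values']}
print(metadata)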
Getting the dataset ID programmatically to use later on:
[14]:
dataset_id = response['id']
dataset_id
[14]:
12
8. Upload data file(s) to Dataset¶
[16]:
response = api.file_upload("./AFM_Topography.PNG", dataset_id)
response
using Globus since other file transfer adapters have not been implemented
[16]:
{'id': 9,
'name': 'AFM_Topography.PNG',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:04 UTC',
'relative_path': '',
'is_directory': False}
Upload another data file to the same dataset, this time to a nested relative path:
[17]:
response = api.file_upload("./measurement_configuration.txt", dataset_id, relative_path="foo/bar")
response
using Globus since other file transfer adapters have not been implemented
[17]:
{'id': 10,
'name': 'measurement_configuration.txt',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': 'foo/bar',
'is_directory': False}
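Multiple files can be uploaded to the same dataset with a simple loop. A minimal sketch; the file paths and the "scans" relative path are hypothetical placeholders:
[ ]:
# The file paths below are hypothetical placeholders
for path in ["./scan_001.ibw", "./scan_002.ibw"]:
    api.file_upload(path, dataset_id, relative_path="scans")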
9. Search Dataset:¶
[18]:
response = api.dataset_search("nested")
response
[18]:
{'total': 1,
'has_more': False,
'results': [{'id': 12,
'created_at': '2022-05-02T15:03:49Z',
'name': 'My new dataset with nested metadata',
'dataset_files': [{'id': 9,
'name': 'AFM_Topography.PNG',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:04 UTC',
'relative_path': '',
'is_directory': False},
{'id': 10,
'name': 'measurement_configuration.txt',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': 'foo/bar',
'is_directory': False},
{'id': 11,
'name': 'foo',
'file_length': None,
'file_type': None,
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': '',
'is_directory': True},
{'id': 12,
'name': 'bar',
'file_length': None,
'file_type': None,
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': 'foo',
'is_directory': True}],
'metadata_field_values': [{'id': 13,
'field_value': 'PZT',
'field_name': 'Sample',
'metadata_field': None},
{'id': 14,
'field_value': 'Asylum Research',
'field_name': 'Microscope-Vendor',
'metadata_field': None},
{'id': 15,
'field_value': 'MFP3D',
'field_name': 'Microscope-Model',
'metadata_field': None},
{'id': 16,
'field_value': '373',
'field_name': 'Temperature',
'metadata_field': None}]}]}
Parsing the response to get the ID of the dataset of interest:
[20]:
dset_id = response['results'][0]['id']
dset_id
[20]:
12
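If a search might match several datasets (or none), guard before indexing into the results. A sketch using the 'total' and 'results' keys shown above:
[ ]:
if response['total'] == 0:
    print('No datasets matched')
else:
    # IDs of every matching dataset
    matching_ids = [dset['id'] for dset in response['results']]
    print(matching_ids)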
10. View this Dataset:¶
This view shows both the files and metadata contained in a dataset:
[21]:
response = api.dataset_info(dset_id)
response
[21]:
{'id': 12,
'name': 'My new dataset with nested metadata',
'creator': {'id': 5, 'name': 'Suhas Somnath'},
'dataset_files': [{'id': 9,
'name': 'AFM_Topography.PNG',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:04 UTC',
'relative_path': '',
'is_directory': False},
{'id': 10,
'name': 'measurement_configuration.txt',
'file_length': 162,
'file_type': '',
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': 'foo/bar',
'is_directory': False},
{'id': 11,
'name': 'foo',
'file_length': None,
'file_type': None,
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': '',
'is_directory': True},
{'id': 12,
'name': 'bar',
'file_length': None,
'file_type': None,
'created_at': '2022-05-02 15:07:08 UTC',
'relative_path': 'foo',
'is_directory': True}],
'instrument': None,
'metadata_field_values': [{'id': 13,
'field_value': 'PZT',
'field_name': 'Sample',
'metadata_field': None},
{'id': 14,
'field_value': 'Asylum Research',
'field_name': 'Microscope-Vendor',
'metadata_field': None},
{'id': 15,
'field_value': 'MFP3D',
'field_name': 'Microscope-Model',
'metadata_field': None},
{'id': 16,
'field_value': '373',
'field_name': 'Temperature',
'metadata_field': None}]}
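The dataset_files listing mixes files and directories; to pull out just the files with their paths, filter on is_directory:
[ ]:
# Reconstruct relative file paths, skipping directory entries
files = ['/'.join(filter(None, [f['relative_path'], f['name']]))
         for f in response['dataset_files'] if not f['is_directory']]
print(files)  # ['AFM_Topography.PNG', 'foo/bar/measurement_configuration.txt']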
11. View files uploaded via DataFlow:¶
Here we’re not using DataFlow itself, just viewing the destination file system directly.
Datasets are organized into directories by date:
[ ]:
! ls -hlt ~/dataflow/untitled_instrument/
There may be more than one dataset per day; here we only have one:
[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/
Viewing the root directory of the dataset we just created:
[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/
We will very soon be able to specify root-level metadata, which will be stored in metadata.json.
We can also see the nested directories foo/bar where we uploaded the second file:
[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/
Looking at the innermost directory, bar:
[ ]:
!ls -hlt ~/dataflow/untitled_instrument/2022-04-06/135750_atomic_force_microscopy_scan_of_pzt/foo/bar