Cloud Materials Database (CMDB)

User ViewPoint

Usage Case Diagram Actors

General Description

The Cloud Materials Data Base (CMDB) offers homogeneous access to heterogenous data storage. Processes, linked to a CHADA and MODA definition, will be represented, and their different steps connected to material data. Upon uploading a certain file, the CMDB offers different data processing options to extract and generate semantic and relational data from the files. All data is then stored and interconnected for future retrieval.

Model

Roles

System admin: Has full control over the system and its management.
Data scientist: Generates data that will be stored in the CMDB.
Data consumer:: Browses the data stored in the CMDB.

Mockups

Activity 1

Create a new knowledge item

Add files to a knowledge item

Add metadata to a kItem

Activity 2

Browse all available knowledge items

View a specific knowledge item

Functional ViewPoint

General architecture

Implementation ViewPoint

Architecture of Toolkits

The CMDB offers a homogeneous storage solution for heterogeneous data. Relational, graph and object storage solutions are available and orchestrated via a central backend. Users (both human and machines) interact with the platform via the GUI or the Python SDK, with the user authentication component checking their identity. Internal data processing tools allow the pre-processing of the data to extract relevant knowledge prior to its storage.

Required components

Hardware componentes

The only hardware requirement for deployment of the toolkit is an internet reachable server with enough available resources for handling the requests and sufficient storage for all the datasets that might be sent to the CMDB.

Data Storage

The core functionality of this toolkit is data storage. Three types of databases are required:

Relational: Tabular data extracted from the processed uploaded data.
Graph: Semantic data and metadata extracted from the raw file or defined by users.
Object storage: Storage of experimental raw files and other datasets.

Implementation Map

Data upload: The raw data coming for instance from a machine in a lab, is uploaded via the GUI, and additional metadata inputted in the upload form. This is sent to the backend, which may carry out additional data processing operations to extract relevant information from within the raw files. All data is then stored in the different databases, with unique identifiers to ensure consistency.

Data query via SDK: In this example, the GUI is circumvented, and the backend contacted directly. This allows people and machines (other DiMAT toolkits, for instance) to interact with toolkits via Python scripts. The method, in this case a query, is translated into the appropriate HTTP REST API request and sent to the backend, which gathers the relevant data from the different storages and provides a response.