EIDF Data Science Cloud service for research

EPCC's Amy Krause gives an overview of the Edinburgh International Data Facility's Portal, which enables data scientists, engineers, and researchers to access the Facility's Data Science Cloud.

The Edinburgh International Data Facility (EIDF) is a high-powered data analytics and storage service that supports research and data-driven innovation. Built and managed by EPCC, EIDF underpins the Data Driven Innovation initiative (DDI) of the Edinburgh and South East Scotland City Region Deal. EIDF's Data Science Cloud supports a wide range of projects, from PhD students testing their ideas on a GPU-accelerated machine to academic and commercial collaborations that prototype large-scale data processing clusters built on Kubernetes.

The EIDF Portal is a self-service platform for managing EIDF cloud resources. Project managers can create and destroy virtual machines (VMs) within their project's quota, choosing options such as VM size, operating system, and software tooling. The Portal offers a simple web form with a prepared configuration, developed in consultation with our users to cover the most common scenarios.

The Portal was launched in October 2022 and now has almost 400 users. It was first rolled out to researchers at the University of Edinburgh, but projects are now joining from further afield, including partners from industry, Scottish Government, and students of EPCC’s MSc in High Performance Computing with Data Science.

The EPCC SAFE

When designing the Portal, the challenge was to create a web interface that would enable non-technical users with no experience of managing compute infrastructure to create a data science desktop with a single click in their browser. The desktop also had to integrate with existing EPCC infrastructure.

The solution is the EIDF Portal interacting with EPCC’s central machine and user management system (SAFE), in combination with a database, a task scheduling system, and a sophisticated job engine that manages cloud resources within EIDF using hundreds of lines of Ansible playbooks.

The EPCC SAFE is a software framework designed to support resource management, accounting, reporting, and usage monitoring on advanced computing facilities. The EIDF Portal acts as an alternative web interface for the SAFE: it uses a custom HTTP interface to the SAFE to manage projects, user accounts, and machines for the EIDF Data Science Cloud (DSC) and the soon-to-be-released EIDF GPU Service.

EIDF-hosted VMs are automatically registered in the SAFE, with user accounts and access permissions managed via the web interface of the EIDF Portal.

Because unique identifiers are used across all EPCC systems, user management is set up to allow cross-mounting of file systems in the future, so that users can share their data between EIDF data science services and HPC systems.

Authentication for all web components is handled by OpenID Connect (OIDC) with the SAFE as the identity provider. This way users need only one login for any EPCC-managed service with a web interface, such as the SAFE, the Portal, and the Virtual Desktop Infrastructure (VDI). User accounts, permissions, and project membership are passed as claims in the user info, from which OIDC clients can infer roles and permissions, for example the virtual machines that a user is permitted to log in to.
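To illustrate how an OIDC client might infer roles and permissions from claims, here is a minimal Python sketch. The claim names `projects` and `machines`, and the fields inside them, are assumptions made for illustration only; the actual claims issued by the SAFE are not described in this article. Only `sub` is a standard OIDC claim.

```python
from typing import Any


def vms_with_login(userinfo: dict[str, Any]) -> list[str]:
    """Return the sorted names of VMs the user may log in to.

    Assumes a hypothetical "machines" claim: a list of objects with a
    "name" and a list of "permissions".
    """
    machines = userinfo.get("machines", [])
    return sorted(
        m["name"] for m in machines if "login" in m.get("permissions", [])
    )


def is_project_manager(userinfo: dict[str, Any], project: str) -> bool:
    """Check a hypothetical "projects" claim for a manager role."""
    return any(
        p.get("code") == project and p.get("role") == "manager"
        for p in userinfo.get("projects", [])
    )


# Made-up user info, standing in for what a client might receive.
example_userinfo = {
    "sub": "user-0001",  # standard OIDC subject identifier
    "projects": [{"code": "eidf000", "role": "manager"}],
    "machines": [
        {"name": "eidf000-vm1", "permissions": ["login", "sudo"]},
        {"name": "eidf000-vm2", "permissions": []},
    ],
}

print(vms_with_login(example_userinfo))
print(is_project_manager(example_userinfo, "eidf000"))
```

The point of the design is that each client derives what it needs from the same user info, so no service has to keep its own copy of the permission data.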

What’s under the hood?

The EIDF Data Science Cloud is built on OpenStack and currently has a capacity of around 30 hypervisors, with over 8500 virtual cores and almost 60 TB of DRAM. We now have over 50 active projects using the cloud, with more joining every week.

The EIDF Portal web interface is implemented using the Django web framework and is backed by a MariaDB database. It interacts with Celery, a task framework, to launch asynchronous requests using a RabbitMQ messaging service as a broker, and uses Rundeck and AWX automation services for the execution and monitoring of complex infrastructure build requests.

When a user creates a virtual machine by clicking a button in the web interface, the backend submits a Celery task which verifies and builds the request and adds it to a queue. Each project has its own task queue. As tasks arrive at the head of the queue, they call out to the automation services' webhooks, which launch and monitor a sequence of bespoke Ansible playbooks. As an example, the job deploying an EIDF-hosted virtual machine goes through the following steps: the instance is launched in OpenStack, then it is registered as a machine in the SAFE and enrolled in the EIDF authentication and authorisation system, and finally connections to access the VM are added to the Guacamole remote desktop gateway. The job engine sends notifications to the Portal API as a job progresses, to indicate that the job has started running and whether it has completed successfully or failed. Jobs may also report custom updates, such as the IP address of a newly deployed virtual machine.
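The flow above can be sketched in plain Python. This is a simplified model that uses only the standard library, not the Portal's actual Celery, Rundeck, or AWX code: the `JobEngine` class, the step names, and the status strings are all hypothetical, chosen to mirror the steps and notifications described in the text.

```python
from collections import defaultdict, deque
from typing import Callable

# Hypothetical step names mirroring the VM deployment steps in the text.
DEPLOY_VM_STEPS = [
    "launch_instance_in_openstack",
    "register_machine_in_safe",
    "enrol_in_auth_system",
    "add_guacamole_connections",
]


class JobEngine:
    """Toy job engine with one FIFO queue per project."""

    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self.queues: dict[str, deque] = defaultdict(deque)
        self.notify = notify  # stands in for calls to the Portal API

    def submit(self, project: str, job_id: str, steps: list[str]) -> None:
        """Add a job to the back of the project's queue."""
        self.queues[project].append((job_id, steps))

    def run_next(self, project: str, run_step: Callable[[str], None]) -> bool:
        """Run the job at the head of the queue; return False if it is empty."""
        queue = self.queues[project]
        if not queue:
            return False
        job_id, steps = queue.popleft()
        self.notify(job_id, "running")
        for step in steps:
            try:
                run_step(step)  # in reality: a playbook run via a webhook
            except Exception as exc:
                self.notify(job_id, f"failed: {step}: {exc}")
                return True
        self.notify(job_id, "succeeded")
        return True


events: list[tuple[str, str]] = []
engine = JobEngine(notify=lambda job_id, status: events.append((job_id, status)))
engine.submit("eidf000", "job-1", DEPLOY_VM_STEPS)
engine.run_next("eidf000", run_step=lambda step: None)  # every step succeeds
print(events)
```

Keeping a separate queue per project means a long-running build in one project does not hold up requests from another, which is one plausible reason for the per-project queues mentioned above.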

Access to the EIDF Data Science Cloud

We ask researchers who wish to host and process their project’s data on the DSC (and, very soon, the EIDF GPU Service) to complete an application form in which they outline their project and resource requirements and describe the data they want to store and process. The application is reviewed by EPCC and, if it is accepted, the project is assigned a quota: the maximum number and size of virtual machines, as well as storage, that the project may host (or, for container workloads, container sizes and limits).
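As an illustration of how such a quota could constrain a VM request, here is a small Python sketch. The `Quota` and `VMRequest` fields and the specific limits are assumptions for the example; the article does not specify how the Portal represents or enforces quotas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    """Hypothetical per-project limits."""
    max_vms: int
    max_vcpus_per_vm: int
    max_ram_gb_per_vm: int


@dataclass(frozen=True)
class VMRequest:
    """Hypothetical size of a requested VM."""
    vcpus: int
    ram_gb: int


def quota_violations(quota: Quota, vms_in_use: int, request: VMRequest) -> list[str]:
    """Return a list of reasons the request exceeds the quota (empty if none)."""
    problems = []
    if vms_in_use + 1 > quota.max_vms:
        problems.append("VM count would exceed quota")
    if request.vcpus > quota.max_vcpus_per_vm:
        problems.append("vCPUs exceed per-VM limit")
    if request.ram_gb > quota.max_ram_gb_per_vm:
        problems.append("RAM exceeds per-VM limit")
    return problems


quota = Quota(max_vms=2, max_vcpus_per_vm=16, max_ram_gb_per_vm=64)
print(quota_violations(quota, vms_in_use=1, request=VMRequest(vcpus=8, ram_gb=32)))
print(quota_violations(quota, vms_in_use=2, request=VMRequest(vcpus=32, ram_gb=32)))
```

Returning every violation at once, rather than stopping at the first, lets a web form show the user everything that needs to change in a single pass.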

Applications are generally reviewed within a week. We assess whether the project is suitable for the EIDF Data Science Cloud or whether another HPC platform would be a better fit. We also check that the EIDF DSC has enough capacity to cover the resources requested by the applicant, that the request is reasonable for what they want to do, and finally that they have the appropriate licence for the data they want to store and process.

Coming soon... EIDF GPU Service

The EIDF GPU Service is expected to be available for general access in July 2023.

Links

EIDF Portal, including link to application process: https://portal.eidf.ac.uk

EPCC home: https://www.epcc.ed.ac.uk

EIDF Services: https://www.ed.ac.uk/edinburgh-international-data-facility/services

Access to EIDF: https://www.ed.ac.uk/edinburgh-international-data-facility/access

EIDF GPU Service: https://www.ed.ac.uk/edinburgh-international-data-facility/services/computing/gpu-service