An “edge-to-cloud” strategy is being tested as a workaround for traditionally slow data transmission between the ISS and Earth.
Microsoft and Hewlett Packard Enterprise (HPE) recently completed a 200GB genomics experiment using the HPE Spaceborne Computer-2 and Azure, Microsoft's cloud computing platform. The experiment simulated how the National Aeronautics and Space Administration (NASA) could monitor astronaut health in the presence of increased radiation exposure.
The technology addressed a long-standing concern: the limited bandwidth between Earth and the International Space Station (ISS). The solution pairs the edge readiness and processing power of the Spaceborne Computer-2 with the capabilities of Azure, showcasing what can be accomplished by combining edge and cloud technology.
The experiment involved regularly monitoring blood samples from astronauts with a gene sequencer onboard the ISS. Gene sequencing generates a great deal of data. The Spaceborne Computer-2 supports the maximum network speeds available on the ISS, but it is allotted only two hours of communication bandwidth a week to transmit data to Earth, at a maximum download speed of 250 kilobytes per second.
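A back-of-the-envelope calculation shows why these constraints rule out downlinking the raw dataset. The sketch below assumes decimal units (1 GB = 10⁹ bytes) and uses the article's figures of 250 KB/s and two hours of link time per week:

```python
# Rough estimate of how long the full 200GB dataset would take to
# downlink at 250 KB/s with two hours of bandwidth per week.
# Decimal units (1 GB = 1e9 bytes) are assumed for simplicity.

DATASET_BYTES = 200e9            # full genomics dataset
LINK_SPEED = 250e3               # bytes per second
WEEKLY_WINDOW = 2 * 3600         # seconds of link time per week

transfer_seconds = DATASET_BYTES / LINK_SPEED     # 800,000 seconds
weeks_needed = transfer_seconds / WEEKLY_WINDOW   # ≈ 111 weeks

print(f"{weeks_needed:.0f} weeks to downlink the raw data")
```

At roughly 111 weeks, over two years, for a single sample, processing onboard and sending only the reduced results is the only practical option.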
Another issue is that the data from the gene sequencing must be compared with the National Institutes of Health (NIH) dbSNP database. The dbSNP is a free public archive for genetic variation within different species, including Homo sapiens. The dbSNP is continuously being updated and peer reviewed by scientists around the world. As a result, it would be extremely difficult to have a synchronous copy of the dbSNP available on the ISS.
How Improved Transmissions with the ISS Work
The current setup for the experiment involves the Spaceborne Computer-2, which runs Red Hat Enterprise Linux 7.4, performing the first step. The Spaceborne Computer-2 compares extracted gene sequences with reference DNA segments, noting only the differences, or mutations. If a perfect match cannot be found for a sequence, the sequence is treated as a potential mutation. The Spaceborne Computer-2 then downlinks a compressed output folder containing only the mutations to an HPE ground station.
In other words, data generated by the Spaceborne Computer-2 on the ISS, which simulates a gene sequencer reading a full human genome, produces a dataset (200GB) that is too large to send to Earth for processing. So instead, the Spaceborne Computer-2 compares the simulated gene sequence with a known human genome. It then isolates the sequences that have a potential mutation. This reduces the dataset from 200GB to 13MB, which can be sent to Earth in two minutes.
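The reduction step above can be illustrated with a toy sketch. This is not the actual onboard pipeline (which works on real sequencer output against a full reference genome); the function name and data here are purely illustrative of the idea of keeping only the mismatches:

```python
# Illustrative sketch of the onboard reduction: compare a sequenced
# read against the reference and keep only positions that differ.
# The function name and toy sequences are hypothetical, not the
# actual Spaceborne Computer-2 pipeline.

def find_mutations(reference: str, read: str):
    """Return (position, reference_base, observed_base) for every mismatch."""
    return [
        (i, ref, obs)
        for i, (ref, obs) in enumerate(zip(reference, read))
        if ref != obs
    ]

reference = "GATTACAGATTACA"
read      = "GATTACTGATAACA"
mutations = find_mutations(reference, read)
# Only the mismatches need to be downlinked, not the full read:
# a few bytes instead of the whole sequence.
print(mutations)
```

Applied at genome scale, discarding everything that matches the reference is what shrinks the dataset from 200GB to 13MB.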
The reduced dataset is copied to Azure using AzCopy, a command-line utility that copies files to and from Azure storage accounts. An event-driven, serverless function written in Python then retrieves the data and sends it to the Microsoft Genomics service, a software program hosted on Azure.
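The event-driven handoff can be sketched in plain Python. The real pipeline uses Azure Functions triggered by storage events; the handler name, event fields beyond the standard Event Grid blob-created shape, and the request format below are assumptions for illustration:

```python
# Stdlib-only sketch of the event-driven handoff: when a "blob created"
# event arrives for the downlinked mutation file, a handler forwards
# the payload to the genomics-processing step. The handler name and
# request format are hypothetical; the real pipeline runs on Azure
# Functions with a storage trigger.

import json

def handle_blob_created(event: dict) -> dict:
    """Simulate the serverless function: read the event, load the
    reduced dataset, and build a request for the genomics service."""
    payload = json.loads(event["data"])          # the reduced mutation set
    return {
        "service": "genomics",                   # downstream target
        "sample_id": event["subject"],           # which upload to process
        "mutations": payload["mutations"],
    }

# A toy event mimicking the Event Grid blob-created event type.
event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "astronaut-001.vcf.gz",
    "data": json.dumps({"mutations": [{"pos": 6, "ref": "A", "alt": "T"}]}),
}
request = handle_blob_created(event)
```

The design point is that no polling is needed: the upload itself triggers the next stage of processing.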
The Microsoft Genomics service is a cloud implementation of the open-source Burrows-Wheeler Aligner (BWA) and Genome Analysis Toolkit (GATK), software that maps sequences against a reference genome and identifies variants. Microsoft adapted this software for the cloud.
A second serverless function hosted in Azure Functions retrieves the Variant Call Format (VCF) records, a standard text format used to store gene sequence variations. The function uses the location of each mutation to query the dbSNP database hosted by the NIH and writes that information to a JavaScript Object Notation (JSON) file. Microsoft Power BI, a business analytics service, then presents the data on the clinical significance, or health impacts, of the mutated genes in a format that is easy to explore.
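The annotation step can be sketched as follows. The dbSNP lookup is stubbed with a local table here, since the real function queries the live NIH service; all names and the clinical-significance values are illustrative:

```python
# Illustrative sketch of the second function: parse Variant Call
# Format (VCF) lines, look up each mutation's clinical significance,
# and emit the result as JSON. The lookup table stands in for a real
# dbSNP query; all data shown is hypothetical.

import json

# Stand-in for a dbSNP query, keyed by (chromosome, position).
CLINICAL_SIGNIFICANCE = {
    ("chr1", 12345): "benign",
    ("chr7", 67890): "pathogenic",
}

def annotate_vcf(vcf_lines):
    records = []
    for line in vcf_lines:
        if line.startswith("#"):               # skip VCF header lines
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "clinical_significance": CLINICAL_SIGNIFICANCE.get(
                (chrom, int(pos)), "unknown"),
        })
    return json.dumps(records, indent=2)

vcf = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t12345\trs001\tA\tG",
    "chr7\t67890\trs002\tC\tT",
]
report = annotate_vcf(vcf)
```

The resulting JSON is what a dashboard tool such as Power BI can then load for exploration.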
How NASA and the Public Can Use This Data
Understanding how astronaut health is affected by increased radiation exposure is necessary for missions in which astronauts travel beyond the ISS’s low Earth orbit into and beyond the Van Allen Belts. The Van Allen Belts are donut-shaped zones of radiation that surround Earth. Exposure to high levels of radiation can result in skin burns and radiation sickness. Long-term exposures to radiation can lead to cancer and heart disease.
| Test | Small | Medium | Large | Full human genome |
|---|---|---|---|---|
| Raw data examined | 500KB | 6MB | 150MB | 182GB |
| Downloaded to Earth | 4KB | 40KB | 900KB | 13MB |
| Run time on ISS | 20 seconds | 2 minutes | 1 hour | 78 minutes |
| Download time from ISS | <1 second | 2 seconds | 17 seconds | 1:56 minutes |
This table shows processing times and data volumes for each test run. (Image credit: Microsoft.)
The entire experiment was coded by 10 volunteers from the Azure Space team and its parent organization, the Azure Special Capabilities, Infrastructure and Innovation Team. David Weinstein, principal software engineering manager at Azure Space, led the three-day development effort, which consisted of a one-day hackathon and two days of cleanup.
Team members wrote the major ISS- and Azure-based software components in Python and bash using Visual Studio Code, GitHub and the Python libraries for Azure Functions, and Azure Blob Storage.
To support the development of additional experiments by others, Weinstein's team at Azure Space published its Azure Resource Manager (ARM) templates, which contain the simulated ISS and ground station environments the team used for development and testing. ARM templates are JSON files that define a project's infrastructure and configuration; they enable anyone to easily create their own mock space station in Azure.
The experiment also relied on other code, including Docker containers (stand-alone software packages that run applications), serverless functions and the Microsoft Genomics service. Developers packaged both the ISS and ground station environments into an ARM template. They simulated the latency, or lag, between the ISS and the ground station by deploying the Spaceborne Computer-2 environment to an Azure data center in Australia and the ground station environment to a data center in Virginia.
There are currently no shareable plans to publish the results, although Microsoft encourages feedback. The experiment used Python, Docker containers, Visual Studio Code, serverless functions, and the Microsoft Genomics service, all of which are available today. The code behind Microsoft's open-source components is publicly available, and anyone with a basic computer science skillset can use it as a template.
“The mission for the template is to enable organizations who build, launch and operate spacecraft and satellites by enabling more opportunities for everyone, similar to how open source on Azure has helped democratize cloud computing. By integrating open-source technology that runs on the same tools and languages as a regular computer, Microsoft is being incredibly cost effective. [This] also leads to more accessibility across the industry,” said a Microsoft spokesperson.
How Spaceborne Computer-2 Was Adapted for the Experiment
The Spaceborne Computer-2 contains the HPE Edgeline EL4000 Converged Edge System and the HPE ProLiant DL360 Gen10 server. Mark Fernandez, principal investigator for Spaceborne Computer-2 at HPE, said that the HPE team made some changes to the hardware of the Spaceborne Computer-2 to help the experiment run smoothly.
“Several things were done, in addition to including as much commercially available and supported hardware as possible such as redundant power supplies, DC power feeds, network switches and water-cooling, RAID controllers (hardware devices or software programs to manage hard disk drives or solid-state drives) for the disks that would be used in any harsh or edge environment,” said Fernandez.
Fernandez added that HPE included a large Scratch storage space for datasets and files created at the edge. Scratch space is space on a hard disk drive that is dedicated to temporary data storage.
“Scratch space allows reading from one set of disks while processing and writing in parallel on a different set of disks. Also, Spaceborne Computer-2 consists of two identical lockers (digital storage spaces) each containing a pair of servers. One server is focused on Graphics Processing Unit (GPU)-enabled Artificial Intelligence/Machine Learning and other analytics processing. The other server is dedicated to traditional data processing. We plan experiments that use the lockers and the servers in parallel to maximize the benefits while simultaneously avoiding congestion,” said Fernandez.
Fernandez said that HPE is using the data collected from the experiment to better serve the next set of researchers and experiments onboard the ISS.
“This includes software and data upload techniques and schedules, on-board disk space usage, temporary file storage and networking demands and requirements, ISS connectivity and availability; and results delivery mechanism and timings. We are continually updating our ConOps (concept of operations) to better serve the community with the most “Earth-like” experience from the edge of the edge,” noted Fernandez.