Supplementary Regulations for Data Protection of FlowHub Platform
Data security is a critical issue in computing, and we invest considerable effort in it for two main reasons.
First, FlowHub Platform is a cloud computing platform that uses existing tools to process many types of data, including genomic data, and much of this data contains personal information. Both the value of the user data itself and the privacy of the individuals it describes therefore require particular protection.
Second, FlowHub Platform is an online platform built on cloud servers supplied by a cloud partner. Besides valuable data, the platform hosts analysis pipelines from different organizations and users, and the technical results and business value produced by those analyses also need protection. FlowHub Platform should therefore provide the highest level of privacy protection.
For these reasons, FlowHub Platform is designed and built to very strict security and privacy requirements. This document describes the security measures adopted in building FlowHub Platform to achieve a high level of platform security while also providing a user access control system.
Legal policy for special data
Genomic data calls for special care. Several regulations and frameworks bear on it, including the U.S. Health Insurance Portability and Accountability Act (HIPAA), the Clinical Laboratory Improvement Amendments (CLIA), and ISO/IEC 27001:2013, but none of them sets out specific requirements, guidelines, or rules for processing genomic data on cloud platforms. Nevertheless, throughout the design and development of FlowHub Platform we have strictly followed the principles and spirit of these regulations and standards.
The approach taken
FlowHub Platform's security design covers two areas: infrastructure and business logic. At the infrastructure level, we have adopted security measures common in the cloud computing industry, including data encryption, identity authentication, API rate limiting, VPC protection, firewall protection, and vulnerability protection. At the business logic level, the main question is how to support collaboration and permission management among platform users with logic that is both easy to manage and secure, so that the datasets and pipelines on the system stay safe. To this end, FlowHub Platform adopts several purpose-built design concepts, including object de-identification, fine-grained access control, and a file sharing mechanism. We use our cloud service partner, Alibaba Cloud, as the example throughout.
Infrastructure construction
1. Encryption
On FlowHub Platform, all information processed by the system is encrypted both in transit and at rest. No data link in the system transmits unencrypted data, and for links that offer encryption (such as HTTPS), encrypted SSL/TLS transmission is enforced.
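A minimal client-side sketch of "encryption is enforced" using only Python's standard `ssl` and `urllib.parse` modules. The helper names `make_client_context` and `require_https` are hypothetical, and the document does not state FlowHub Platform's actual TLS settings, so the TLS 1.2 minimum is an assumption.

```python
import ssl
import urllib.parse


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context that refuses unverified or weak connections."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Assumed policy: reject SSLv3, TLS 1.0 and TLS 1.1.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def require_https(url: str) -> str:
    """Reject any URL that would send data over an unencrypted link."""
    scheme = urllib.parse.urlsplit(url).scheme.lower()
    if scheme != "https":
        raise ValueError(f"refusing non-HTTPS URL: {url}")
    return url
```

A caller could then open connections with `urllib.request.urlopen(require_https(url), context=make_client_context())`, so that a plain-HTTP URL fails before any data leaves the process.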
On FlowHub Platform, data "at rest" corresponds to three storage states: 1) temporary storage used by ECS compute instances; 2) a first-level cache made up of ECS instances and multiple temporary disks; 3) Alibaba Cloud Object Storage Service (OSS).
FlowHub Platform encrypts all data at rest. By default, data uploaded to the first-level cache is encrypted with the industry-standard AES-256 algorithm. The same data is also synchronized to Alibaba Cloud OSS, where it is encrypted on the server side. During computation, all data and files on temporary disks are encrypted with AES-256. Infrequently accessed data is removed from the first-level cache or moved on to cloud archive storage, where it is likewise encrypted server-side with AES-256. All data is transmitted over encrypted SSL/TLS channels.
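On OSS, server-side encryption can be requested per object with the `x-oss-server-side-encryption` header, where the value `AES256` selects AES-256 with OSS-managed keys. The sketch below shows how an upload path could always set that header and how an audit job could check it on returned object headers. `upload_headers` and `is_encrypted_at_rest` are hypothetical helpers that only build and inspect header dictionaries; they do not call the OSS SDK, and the document does not say FlowHub Platform is implemented this way.

```python
from typing import Dict, Optional

# Header used by Alibaba Cloud OSS to request/report server-side encryption.
SSE_HEADER = "x-oss-server-side-encryption"


def upload_headers(content_type: str, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build upload headers that always request AES-256 server-side encryption."""
    headers = {"Content-Type": content_type}
    if extra:
        # Drop any caller-supplied encryption header (HTTP names are case-insensitive).
        headers.update({k: v for k, v in extra.items() if k.lower() != SSE_HEADER})
    headers[SSE_HEADER] = "AES256"
    return headers


def is_encrypted_at_rest(object_headers: Dict[str, str]) -> bool:
    """Audit check on the headers returned for an object (e.g. by a HEAD request)."""
    lowered = {k.lower(): v for k, v in object_headers.items()}
    return lowered.get(SSE_HEADER) == "AES256"
```

Setting the header last, after filtering caller input, means no upload path can silently opt out of encryption.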
When data is no longer needed at a specific location (such as a compute node), or an authorized user deletes it from FlowHub Platform, it is erased according to the U.S. Department of Energy M205.1-2 standard so that it cannot be recovered. The standard uses three overwrite passes: passes 1 and 2 overwrite the data with pseudo-random values, and pass 3 overwrites it with a zero-fill pattern.
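A minimal file-level sketch of the three-pass overwrite described above; `wipe_file` is a hypothetical helper, not FlowHub Platform's erasure tool. Overwriting through the file system may not reach every physical copy of the data on SSDs, journaling or copy-on-write file systems, or virtualized cloud disks, so in those settings it is usually combined with device-level erasure or with destroying the encryption keys.

```python
import os


def wipe_file(path: str, remove: bool = True, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file in three passes, then optionally delete it.

    Passes 1 and 2 write pseudo-random bytes; pass 3 writes zeros.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pass_number in (1, 2, 3):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n) if pass_number < 3 else bytes(n))
                remaining -= n
            # Push this pass to the storage device before starting the next one.
            f.flush()
            os.fsync(f.fileno())
    if remove:
        os.remove(path)
```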
2. Authentication
Access to the Alibaba Cloud infrastructure follows the best practices recommended by Alibaba Cloud and requires strict RSA key authentication, which keeps the infrastructure well protected. At the application level, FlowHub Platform users authenticate with a user name and password. After successful authentication, the platform issues a temporary, time-limited token, which is stored securely and used to access the system for a short period. These measures reduce the chance that an attacker gains access to the system through brute-force guessing.
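A minimal sketch of issuing and verifying a time-limited token, signed with HMAC-SHA256 from Python's standard library. The token format, the `issue_token`/`verify_token` names, and the 15-minute default lifetime are assumptions, not FlowHub Platform's actual scheme; a production system would also store passwords with a slow key-derivation function and support token revocation.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional


def _b64encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64decode(text: str) -> bytes:
    # Restore the padding stripped by _b64encode.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id: str, key: bytes, ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Return a signed token that identifies the user and expires after ttl_seconds."""
    issued_at = time.time() if now is None else now
    claims = {"sub": user_id, "exp": int(issued_at) + ttl_seconds, "nonce": secrets.token_hex(8)}
    payload = json.dumps(claims, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return f"{_b64encode(payload)}.{_b64encode(signature)}"


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the token is authentic and unexpired, else None."""
    try:
        payload_part, signature_part = token.split(".")
        payload = _b64decode(payload_part)
        signature = _b64decode(signature_part)
    except ValueError:  # wrong shape or invalid base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        return None
    return claims["sub"]
```

Because the expiry is inside the signed payload, a stolen token stops working on its own after the short lifetime, and a client cannot extend it without the server key.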
3. API rate limit
FlowHub Platform limits the rate of requests to the system. All access to the system, including actions in the front-end web pages, goes through the API, and each user's API access is capped at a maximum rate. This reduces the risk that malicious users disrupt the system with denial-of-service attacks.
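The document does not name the rate-limiting algorithm. A token bucket is one common choice: each user gets a bucket that refills at a steady rate up to a burst capacity, and a request is admitted only if a token is available. `TokenBucketLimiter` and its parameters are hypothetical; this version is single-process and not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-user token bucket: refills `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = float(capacity)
        self.clock = clock
        # user_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        """Consume one token for this user's request; False means reject (e.g. HTTP 429)."""
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[user_id] = (tokens, now)
        return allowed
```

The injectable `clock` makes the limiter testable without sleeping; keying buckets by user means one user's burst cannot exhaust another user's allowance.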
4. VPC protection
All Alibaba Cloud server instances used by FlowHub Platform run inside an Alibaba Cloud virtual private cloud (VPC). When FlowHub Platform uses Alibaba Cloud resources, the VPC provides it with a dedicated virtual network that is logically isolated from the rest of Alibaba Cloud. Within the VPC, FlowHub Platform uses its own IP address ranges, subnets, route tables, and network gateways.
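A minimal sketch of carving dedicated, non-overlapping subnets (called vSwitches in Alibaba Cloud) out of a private VPC range with Python's `ipaddress` module. `plan_subnets`, the tier names, and the example CIDR are hypothetical; FlowHub Platform's actual network layout is not described in the document.

```python
import ipaddress
from typing import Dict, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def plan_subnets(vpc_cidr: str, new_prefix: int, names: List[str]) -> Dict[str, Network]:
    """Assign consecutive, non-overlapping subnets of a private VPC range to named tiers."""
    vpc = ipaddress.ip_network(vpc_cidr)
    if not vpc.is_private:
        raise ValueError(f"{vpc} is not a private address range")
    candidates = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for name in names:
        subnet = next(candidates, None)
        if subnet is None:
            raise ValueError(f"{vpc} has no room for subnet '{name}'")
        plan[name] = subnet
    return plan
```

Giving each tier its own subnet is what later lets firewall rules refer to "traffic from the compute subnet" rather than to individual addresses.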
5. Firewall protection
FlowHub Platform uses Alibaba Cloud ECS security groups to control access to its computing resources. A security group acts as a virtual firewall that controls inbound and outbound network traffic for FlowHub Platform instances. Each FlowHub Platform instance is associated only with the security groups it needs, and by configuring the rules of each security group, FlowHub Platform precisely controls communication between instances.
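A simplified model of how security-group ingress rules decide whether traffic reaches an instance: inbound traffic is denied unless some rule explicitly allows it. Real Alibaba Cloud security groups offer more, such as rule priorities, deny rules, egress rules, and rules that reference other security groups. `IngressRule`, `is_allowed`, and the example ports and CIDRs are illustrative assumptions, not FlowHub Platform's configuration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    """One allow rule: protocol, inclusive port range, and permitted source CIDR."""
    protocol: str
    port_from: int
    port_to: int
    source_cidr: str

    def matches(self, protocol: str, port: int, source_ip: str) -> bool:
        return (
            self.protocol == protocol
            and self.port_from <= port <= self.port_to
            and ipaddress.ip_address(source_ip) in ipaddress.ip_network(self.source_cidr)
        )


def is_allowed(rules: Iterable[IngressRule], protocol: str, port: int, source_ip: str) -> bool:
    """Default deny: inbound traffic passes only if some rule explicitly allows it."""
    return any(rule.matches(protocol, port, source_ip) for rule in rules)
```

For example, a compute-node group might allow SSH only from a bastion subnet and a service port only from the web tier, so every other source is rejected by default.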
6. Vulnerability protection
FlowHub Platform uses some third-party open-source libraries and software. As with any software, vulnerabilities may be discovered in them over time. The FlowHub Platform team acts proactively and conducts regular vulnerability assessments; when a potential risk is detected, it takes prompt remedial action to keep the system protected. Alibaba Cloud also provides vulnerability scanning services to its users, and we actively follow the security guidelines and recommendations it provides to strengthen the system's vulnerability protection.
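One building block of a regular vulnerability assessment is matching the dependency inventory against published advisories. A minimal sketch, assuming advisories are available as a mapping from package name to the first fixed version and that every earlier version is affected; the package names in any example data are hypothetical, and real version strings (pre-releases, epochs) need a full parser such as `packaging.version.Version`.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '3.4.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(installed: Dict[str, str], advisories: Dict[str, str]) -> List[str]:
    """Flag installed packages older than the first fixed version in an advisory.

    `advisories` maps package name -> first version that contains the fix.
    """
    findings = []
    for name, version in sorted(installed.items()):
        fixed = advisories.get(name)
        if fixed is not None and parse_version(version) < parse_version(fixed):
            findings.append(f"{name} {version} (fixed in {fixed})")
    return findings
```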
Business logic level security
1. Object de-identification
Every entity in the FlowHub Platform system is identified by a UUID, a 128-bit value that ensures uniqueness in practice. The UUID reveals nothing about the entity it identifies. For example, knowing a file's UUID gives no information about the file, such as its name, metadata, owner, creation date, or project. Likewise, holding a project's UUID reveals nothing about that project. Although the space of UUID values is finite, FlowHub Platform uses only an extremely sparse subset of it. In practice, therefore, users cannot obtain information about other users or projects by guessing UUIDs or by deriving them from the UUIDs they already hold.
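The document does not say which UUID version FlowHub Platform uses. The sketch below assumes random (version 4) UUIDs, which carry 122 random bits; time-based version 1 UUIDs embed a timestamp and node identifier and would leak information. It also bounds the chance that blind guessing hits an issued ID. `new_object_id` and `guess_probability` are hypothetical names.

```python
import uuid

# A version 4 UUID has 128 bits, 6 of which are fixed version/variant bits.
RANDOM_BITS_V4 = 122


def new_object_id() -> str:
    """Random ID built from os.urandom, so it encodes nothing about the object."""
    return str(uuid.uuid4())


def guess_probability(issued_ids: int, guesses: int = 1) -> float:
    """Union-bound estimate of the chance that any of `guesses` random guesses hits an issued ID."""
    return min(1.0, guesses * issued_ids / 2 ** RANDOM_BITS_V4)
```

For example, with one billion issued IDs and a trillion guesses, `guess_probability(10**9, 10**12)` is about 1.9e-16, which is what "extremely sparse subset" means in practice.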
2. Fine-grained access control
FlowHub Platform applies fine-grained access control. There are six permission types: "manage", "upload", "view", "modify", "run", and "share". Each can be set separately for each user in each project. Files are grouped by project, and each project member can hold different permissions on those files. Different users (members) can therefore be given different permissions, so that each can access only the information relevant to their work. These permissions also govern the sharing of data within the group.
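A minimal sketch of per-project, per-user permissions using Python's `enum.Flag`, with one flag per permission type listed above. `Permission` and `ProjectACL` are hypothetical names, and the sketch assumes the six permissions are independent (the document does not say whether, for example, "manage" implies the others).

```python
from enum import Flag, auto
from typing import Dict, Tuple


class Permission(Flag):
    """The six per-project permission types."""
    NONE = 0
    MANAGE = auto()
    UPLOAD = auto()
    VIEW = auto()
    MODIFY = auto()
    RUN = auto()
    SHARE = auto()


class ProjectACL:
    """Stores permissions per (project, user) pair, so each project is configured independently."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Permission] = {}

    def grant(self, project: str, user: str, perms: Permission) -> None:
        key = (project, user)
        self._grants[key] = self._grants.get(key, Permission.NONE) | perms

    def revoke(self, project: str, user: str, perms: Permission) -> None:
        key = (project, user)
        self._grants[key] = self._grants.get(key, Permission.NONE) & ~perms

    def allowed(self, project: str, user: str, perms: Permission) -> bool:
        """True only if the user holds every requested permission in this project."""
        held = self._grants.get((project, user), Permission.NONE)
        return (held & perms) == perms
```

Keying grants by project as well as user means a member who can run pipelines in one project gains nothing in another project unless explicitly granted there.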