We will shortly be deprecating the Java-based download client, also known as API V2.
The Java client began its life at the back end of 2014 and has since seen many developments that have served our users well. During its tenure, it enabled distribution of over 6.5 million files totalling nearly 15 petabytes of data to researchers around the globe.
Recent increases in genomic and phenotypic data file sizes, the number of datasets archived in the EGA, the number of users downloading data, and the diversity of download locations have meant that at times the Java client struggled to distribute data efficiently. In response, our team of developers have built a new download client with more robust features and better functionality and reliability. Enter, stage left, our shiny new API V3.1!
The new API V3.1 will replace both API V2 and API V3. The download client for the new API V3.1 is written in Python, and the code is openly available on GitHub. This Python client makes HTTPS calls to the EGA AAI (https://ega.ebi.ac.uk:8443/) and to the EGA Data API (https://ega.ebi.ac.uk:8052), both of which must be accessible from the client location. In addition to general updates to improve performance, the API V3.1 has three new features enhancing the EGA distribution services.
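Since both endpoints must be reachable from wherever the client runs, a quick connectivity check can save some troubleshooting. The following is a minimal sketch, not part of the EGA client: the helper names `host_and_port` and `is_reachable` are hypothetical, and the check only confirms that a TCP connection can be opened to each host and port, not that authentication or downloads will succeed.

```python
import socket
from urllib.parse import urlsplit

# The two endpoints named in this post.
ENDPOINTS = [
    "https://ega.ebi.ac.uk:8443/",  # EGA AAI
    "https://ega.ebi.ac.uk:8052",   # EGA Data API
]


def host_and_port(url):
    """Extract (host, port) from a URL, defaulting to 443 for HTTPS."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or 443


def is_reachable(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port can be opened.

    This is only a network-level probe (an assumption of this sketch);
    it does not validate TLS certificates or credentials.
    """
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable(url)` for each entry in `ENDPOINTS` from the machine that will run the downloads gives a first indication of whether a firewall is blocking ports 8443 or 8052.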
First, requested data are now downloaded over secure HTTPS connections rather than HTTP. Because the connection itself is encrypted, the files can be delivered unencrypted. Instead of separately requesting, downloading, and decrypting data using the Java client, data can now be requested and downloaded using a single command without the need for decrypting. The API V3.1 also automatically verifies each file’s unencrypted MD5 after download to ensure the file has downloaded correctly.
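The client performs the MD5 check itself, but the same kind of check can be reproduced by hand when auditing files later. Below is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and it assumes the expected checksum is already known to the caller (how the client obtains it is not described here).

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum.

    The expected value is normalised (whitespace stripped, lower-cased)
    before comparison.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch means the file on disk differs from what was expected, and the download should be repeated.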
Second, the new API supports segmenting and automatic resuming. To enable quicker download speeds, the API breaks files into up to four segments and downloads them in parallel. With the new resume feature, downloads - both segmented and continuous stream - will automatically resume if they encounter any errors or the connection is interrupted. These features result in both a faster and more reliable download experience.
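To make the idea concrete, here is a minimal sketch of how a file could be divided into up to four byte ranges, and how a partially downloaded segment could be resumed from where it stopped. It illustrates the general technique only: the equal-size split and the function names are assumptions, not the client's actual implementation.

```python
def split_into_segments(file_size, max_segments=4):
    """Split a file of file_size bytes into at most max_segments
    contiguous (start, end) byte ranges, with end exclusive."""
    if file_size <= 0:
        return []
    count = min(max_segments, file_size)
    base, extra = divmod(file_size, count)
    segments = []
    start = 0
    for index in range(count):
        # Spread any remainder over the first `extra` segments.
        length = base + (1 if index < extra else 0)
        segments.append((start, start + length))
        start += length
    return segments


def remaining_range(segment, bytes_already_written):
    """Given a segment and the number of bytes already saved for it,
    return the (start, end) range still to fetch, or None if complete."""
    start, end = segment
    resume_from = min(start + bytes_already_written, end)
    return (resume_from, end) if resume_from < end else None
```

Each range can be fetched in parallel, and after an interruption only the output of `remaining_range` needs to be requested again rather than the whole segment.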
Finally, the new API V3.1 implements the GA4GH-compliant htsget protocol for supporting requests over genomic ranges. This exciting new feature means that for data files with accompanying index files (e.g. .crai for CRAM files), users can download specific regions of interest rather than the entire file, saving both time and storage space.
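For readers unfamiliar with htsget, the sketch below shows the general shape of a range request as defined by the GA4GH htsget specification: the client asks for a region, receives a JSON "ticket" listing one or more URLs, and then fetches and concatenates those blocks. The base URL and file identifier used here are hypothetical placeholders, and the helper names are illustrative rather than taken from the EGA client.

```python
import json
from urllib.parse import urlencode


def build_htsget_url(base_url, file_id, reference_name, start, end, fmt="CRAM"):
    """Build an htsget reads request URL for a genomic range.

    In the htsget specification, start is 0-based inclusive and
    end is 0-based exclusive.
    """
    query = urlencode({
        "format": fmt,
        "referenceName": reference_name,
        "start": start,
        "end": end,
    })
    return f"{base_url.rstrip('/')}/reads/{file_id}?{query}"


def block_urls(ticket_json):
    """Parse an htsget ticket and return (url, headers) pairs, in the
    order the blocks must be fetched and concatenated."""
    ticket = json.loads(ticket_json)
    return [
        (entry["url"], entry.get("headers", {}))
        for entry in ticket["htsget"]["urls"]
    ]
```

For example, `build_htsget_url("https://htsget.example.org", "EGAF00000000001", "chr1", 100, 200)` produces a request for a 100-base window on chr1; both the host and the accession in that call are made up for illustration.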
The new features of the Python-based API V3.1 represent significant improvements in security, speed, reliability, and flexibility that are sure to provide a better data download experience. To take full advantage of these features, users must update to Python 3.X. Because the new API changes the underlying data structures, upgrading to the latest version of the Python client is also mandatory. Updating the client to the latest version can be achieved by running the following command: pip3 install pyega3 -U
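Before and after upgrading, it can be useful to confirm which interpreter and client version are actually in use. This is a minimal sketch using only the standard library; the function names are hypothetical, and `importlib.metadata` requires Python 3.8 or later.

```python
import sys
from importlib import metadata


def running_python3():
    """Return True if the current interpreter is Python 3 or newer."""
    return sys.version_info >= (3,)


def installed_version(package="pyega3"):
    """Return the installed version string of a package,
    or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

If `installed_version()` returns `None`, the client is not installed in the current environment and the pip3 command above should be run first.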
We are excited for users to try out the new features and take advantage of the growing range of EGA distribution services for human biomedical omics data. If you have any questions or feedback, please email the EGA Helpdesk team. We'd love to hear from you!