System Architecture

Overview

From a user’s perspective, interaction with the BV-BRC systems happens primarily through the BV-BRC website, using a standard modern web browser. The BV-BRC website server software is designed to be hosted by industry-standard application containers and can be deployed in several different configurations. Both the server software and the client interface (the browser application) rely on several additional systems to build and deploy them and to provide their querying and analysis services.

Direct support for the website application is provided by several different databases and services. These services typically support the interactive capabilities of the website and its users. For example, Solr database instances provide all the scientific querying capabilities against the BV-BRC data. The BV-BRC website and application aggregate the data and capabilities of the BV-BRC services and present them interactively to the user.

Several other key components are critical to the BV-BRC project but do not directly support the production BV-BRC website. Examples include the data analysis services, software, and databases used to collect, analyze, and annotate BV-BRC data before it is released and deployed to the production website. Software and scripts for managing the data, the services, and database loading and extraction are also required.

Software Architecture

The Software Architecture section of this document describes the general use and interaction of the components that make up the BV-BRC website and its direct and indirect components. Some of these components within BV-BRC’s architecture are from third-party sources, so their architecture and deployment process will not be detailed here except where relevant to the understanding of the overall architecture described by the BV-BRC Systems Documentation.

BV-BRC Website

The Browser Application

The user’s web browser hosts the entire BV-BRC website, which is more accurately described as a Web Application. Logically, the set of pages that the BV-BRC Web Server provides makes up the entirety of the BV-BRC Web Application. A user’s state is maintained across all pages at any one time, and from the user’s perspective, they are navigating through the interactive space of BV-BRC. Each page can therefore be considered the host for one or more individual applications that communicate with the web server to provide an interactive experience.

The browser application is written in ECMAScript (JavaScript, JS) and uses the Dojo Toolkit framework. It communicates with the server via HTTP requests (AJAX). The browser application is part of the BV-BRC Server application and is intermingled with other content on pages generated by the BV-BRC Server. However, the browser application runs in the user’s web browser, on a different network endpoint from the server; it may be restarted at any time (whenever a page reloads) and may be composed of “mashup” data from external sites. It therefore requires consideration independent of the server side of the BV-BRC website.

The browser application is fully supported by modern, up-to-date web browsers. Specific UI functions may degrade if the underlying browser does not support them. Instead of requiring all users to conform to a specific set of browsers, we prefer to provide the best support possible for modern browsers, and aim to support older versions via fallback mechanisms or degraded functionality in certain areas of the application. Browsers that are currently known to work are Chrome, Firefox, Safari, and IE7+. Some applications (pages) may require Adobe Flash for full functionality.

Source Code: https://github.com/BV-BRC/BV-BRC-Web

Web Application Server

This component serves the web content to client browsers. It currently consists of an Express.js application running on a Node.js web server. It serves HTML, CSS, JavaScript, and images to client browsers. The bulk of the user interface is implemented in the Browser Application, which is built on the Dojo Toolkit framework and on many other libraries that implement individual features. The files are stored in the BV-BRC GitHub repository linked below.

Source Code: https://github.com/BV-BRC/BV-BRC-Web

Static Content

Static content refers to electronic documents that contain the web application’s main Use Case / Tutorial, command line interface Use Case / Tutorial, Quick Reference Guides, and BV-BRC news. The contents of these documents are served independently of the main web server software and are publicly accessible. This site provides an RSS feed, which the main website application consumes and displays on its front page. Files are converted to HTML using the Python-based Sphinx documentation generator. The files are stored in the BV-BRC GitHub repository linked below.

Source Code: https://github.com/BV-BRC/BV-BRC-Docs
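
As a rough illustration of how a front page might consume the RSS feed mentioned above, the following sketch parses an RSS 2.0 document with Python's standard library and returns the newest item titles and links. The sample feed content and the function name latest_news are assumptions made for illustration; the actual website consumes the feed in its own JavaScript code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the real feed's items and URLs will differ.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>Release notes</title><link>https://example.org/news/1</link></item>
    <item><title>Workshop announced</title><link>https://example.org/news/2</link></item>
  </channel>
</rss>"""


def latest_news(feed_xml, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items[:limit]
```

Calling latest_news(SAMPLE_FEED, limit=1) returns only the first item in document order, which in most feeds is the most recent entry.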

Workspace

The Workspace is an online document-based data store where data is organized into user-owned directories, analogous to Dropbox or Google Drive. Any top-level directory may be shared with multiple users to enable collaborative work on uploaded data (another feature similar to Dropbox or Google Drive).

Source Code: https://github.com/PATRIC3/Workspace

Workspace API:

The Workspace is connected to the rest of the BV-BRC tools and website via a programmatic JSON-RPC API.

The API has 11 commands:

  • create: allows for the creation of a directory or a data object itself

  • get: allows for retrieval of an object from the workspace

  • ls: list the objects present in a particular directory of the workspace

  • copy: copy an object from one location to another

  • delete: delete an object

  • set_permissions: set permissions on a top-level directory to share with another user

  • list_permissions: list permissions currently set for a top-level directory

  • get_download_url: allows for retrieval of a RESTful URL to download an object

  • get_archive_url: allows for retrieval of a RESTful URL to download an archive of multiple objects

  • update_metadata: allows for the manipulation of metadata associated with an object or directory in the Workspace

  • update_auto_meta: an internal function enabling the update of automated-metadata for an object

The associated resource is located here: https://p3.theseed.org/services/Workspace
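
To make the shape of such a call concrete, here is a minimal sketch that builds a JSON-RPC request body for the ls command. Only the service URL and the command name come from this document. The "Workspace." method prefix, the JSON-RPC 1.1 envelope, the "paths" argument name, and the example directory are assumptions and should be checked against the Workspace source code.

```python
import json
from itertools import count

# Service URL as given in this document.
WORKSPACE_URL = "https://p3.theseed.org/services/Workspace"

_request_ids = count(1)


def build_rpc_request(command, params):
    """Build a JSON-RPC request body for one Workspace command.

    Assumptions (not confirmed by this document): methods are namespaced
    as "Workspace.<command>", arguments travel as a single object inside
    the params list, and the envelope uses the JSON-RPC 1.1 "version" key.
    """
    return {
        "id": str(next(_request_ids)),
        "method": f"Workspace.{command}",
        "version": "1.1",
        "params": [params],
    }


# Hypothetical directory path and argument name, for illustration only.
request = build_rpc_request("ls", {"paths": ["/user@example.org/home"]})
body = json.dumps(request)
```

The resulting body would be sent as an HTTP POST to WORKSPACE_URL together with the user's authentication token; the sketch leaves the transport out so that it runs without network access.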

Data formats:

Objects of any type may be stored in the workspace, but most typically objects are simple text files, often stored in JSON format. Additionally, all objects are assigned a type (e.g., Genome, Model, FeatureSet), and this type indicates how the object is treated when viewed on the BV-BRC website, as well as the handling of the object by automated processing scripts built into the workspace. The types accepted by the workspace are configurable and completely extensible.

Database structure:

The workspace uses MongoDB to store the directory structure, directory permissions, object lists, and object metadata. The objects themselves are stored either in Shock (typically for very large objects) or in a simple file-system. Because of its connection to Shock, the workspace supports federated data storage, which enables the handling of big data.

Object processing:

When an object is saved to the workspace, it always undergoes a processing step, the specific actions of which depend on the type of the object. This step computes automated metadata for the object to facilitate object query and summary, but it can also handle other tasks as needed (e.g., indexing in Solr).
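
One way to picture this type-dependent step is a lookup table from object type to a metadata function. The sketch below is an illustration only: the function names, the "contigs" field, and the "size" metadata key are hypothetical and are not taken from the Workspace implementation.

```python
import json


def genome_metadata(data):
    """Hypothetical summary for a Genome object stored as JSON text."""
    doc = json.loads(data)
    return {"contigs": len(doc.get("contigs", []))}


def default_metadata(data):
    """Types without a dedicated processor get no extra metadata."""
    return {}


# Maps an object type to the function computing its automated metadata.
PROCESSORS = {"Genome": genome_metadata}


def process_on_save(obj_type, data):
    """Compute automated metadata for an object at save time."""
    processor = PROCESSORS.get(obj_type, default_metadata)
    metadata = {"size": len(data.encode("utf-8"))}  # size in bytes
    metadata.update(processor(data))
    return metadata
```

Registering a new type then amounts to adding one entry to PROCESSORS, which mirrors the configurable, extensible type list described under Data formats.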

Download service:

To support transparent and efficient downloading of data files from the workspace, the Download Service allows the BV-BRC website to provide URL-based access to private files in the workspace. Access to these URLs does not require a password; to ensure privacy, the URLs contain un-guessable hashes and are valid only for a short time.
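
The idea of a short-lived, un-guessable download URL can be sketched as follows. This is not the Download Service's implementation: it substitutes random tokens from Python's secrets module for the hashes mentioned above, keeps them in an in-memory dictionary, and uses a hypothetical five-minute lifetime.

```python
import secrets
import time

# Hypothetical lifetime; the document only says "a short time".
TOKEN_LIFETIME_SECONDS = 300

# In-memory stand-in for whatever store the real service uses.
_tokens = {}


def create_download_token(object_path, now=None):
    """Issue a random token that maps to a workspace object."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe text
    _tokens[token] = (object_path, now + TOKEN_LIFETIME_SECONDS)
    return token


def resolve_download_token(token, now=None):
    """Return the object path for a valid token, or None if unknown or expired."""
    now = time.time() if now is None else now
    entry = _tokens.get(token)
    if entry is None:
        return None
    object_path, expires_at = entry
    if now >= expires_at:
        del _tokens[token]  # expired tokens are discarded on first use
        return None
    return object_path
```

Because the token itself is the only credential, its length and randomness stand in for the password, and the expiry limits how long a leaked URL remains useful.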

Data API

The data API supports querying, retrieval, and indexing of public BV-BRC data and of private annotated data. It provides a REST interface to the rich data that BV-BRC offers. Data can be retrieved directly by ID, or it can be queried using either Resource Query Language (RQL) syntax or Solr syntax. When a query is submitted to the API, it is modified and then forwarded to the backend data source (Solr) so that only data visible to the user is returned. Users can view public data, any data they own, and any data that another user has shared with them.

Source Code: https://github.com/PATRIC3/p3_api
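
The visibility rule described above (public data, owned data, shared data) could be enforced by appending a filter to every Solr request. The sketch below shows one possible form. The field names public, owner, and user_read are assumptions for illustration and are not confirmed by this document; fq is Solr's standard filter-query parameter.

```python
def visibility_filter(user=None):
    """Return a Solr filter expression limiting results to visible records.

    The field names `public`, `owner`, and `user_read` are assumptions
    made for this sketch, not confirmed names from the BV-BRC schema.
    """
    if user is None:
        return "public:true"  # anonymous users see public data only
    # Quote the user name and escape characters that would end the quote.
    quoted = '"' + user.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"public:true OR owner:{quoted} OR user_read:{quoted}"


def restrict_query(params, user=None):
    """Copy Solr query parameters and append the visibility filter as an fq."""
    restricted = dict(params)
    filters = list(restricted.get("fq", []))
    filters.append(visibility_filter(user))
    restricted["fq"] = filters
    return restricted
```

Keeping the restriction in a filter query rather than rewriting the user's q string leaves the original query untouched and lets Solr cache the filter separately.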

Data API:

The data API has two functions for each data type:

  • get()

  • query()

The associated resources are, respectively:

In addition to the API for querying and retrieving data, there is also an API endpoint for submitting new data to the system to be indexed in the database.
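
The two access patterns, retrieval by ID and querying, can be sketched as URL builders. The base URL below is a placeholder because this document does not state the real endpoint, and the path layout (data type followed by an ID, or by a query string) is an assumption. The eq(), and(), and limit() operators are standard RQL.

```python
from urllib.parse import quote

# Placeholder base URL; this document does not state the real endpoint.
API_BASE = "https://data.example.org/api"


def get_url(data_type, record_id):
    """URL for retrieving one record of a data type by its ID."""
    return f"{API_BASE}/{quote(data_type)}/{quote(str(record_id), safe='')}"


def query_url(data_type, conditions, limit=25):
    """URL for an RQL query: eq() terms joined with and(), plus limit()."""
    terms = [
        f"eq({quote(field, safe='')},{quote(str(value), safe='')})"
        for field, value in conditions
    ]
    parts = []
    if len(terms) == 1:
        parts.append(terms[0])
    elif terms:
        parts.append("and(" + ",".join(terms) + ")")
    parts.append(f"limit({limit})")
    return f"{API_BASE}/{quote(data_type)}/?{'&'.join(parts)}"
```

For example, query_url("genome", [("genus", "Brucella")], limit=5) yields a URL whose query string is eq(genus,Brucella)&limit(5). The data type and field names used here are illustrative.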

Command-line Interface (CLI)

BV-BRC is an integration of different types of data and software tools that support research on bacterial pathogens. The typical biologist seeking access to the BV-BRC data and tools will usually explore the web-based user interface. However, there are many instances in which programmatic or command-line interfaces are more suitable, especially for querying data or submitting jobs in batch mode. For users who want command-line access to BV-BRC, we provide the tools described in this document. We call these tools the P3-scripts. They are intended to run on your machine and access the services provided by BV-BRC over the network.

Source Code and Client Application: https://github.com/PATRIC3/PATRIC-distribution/

Databases

BV-BRC data is stored in Solr and indexed in its entirety (all fields) as BV-BRC releases data. Solr then provides read-only search services to both the server side and the browser side of BV-BRC via HTTP requests. A standard Solr 6 installation can host the BV-BRC data, but Solr can be deployed in a number of different ways, and the choice can have a dramatic impact on performance for many BV-BRC activities. The performance of the Solr service depends heavily on memory. At a minimum, it is important to be able to fit the entire set of data indexes into memory. Caches and other tunable parameters can require additional memory. In any deployment, this physical limit on available resources is likely to be one of the key factors defining Solr configuration and performance.

Source Code: https://github.com/PATRIC3/patric_solr_cloud
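
The memory requirement above can be turned into a quick sizing check. The sketch below treats whatever memory is not taken by the Solr JVM heap or reserved for the operating system as available for holding index files. The function name and the 8 GB reserve are illustrative assumptions, and real sizing should also account for the caches mentioned above.

```python
def index_fits_in_memory(index_gb, ram_gb, jvm_heap_gb, os_reserve_gb=8.0):
    """Check whether the on-disk index fits in the memory left for caching.

    Memory not taken by the Solr JVM heap or reserved for the operating
    system is treated as available for holding index files. The 8 GB
    reserve is an arbitrary illustrative default.
    """
    available_gb = ram_gb - jvm_heap_gb - os_reserve_gb
    return index_gb <= available_gb, available_gb
```

As a worked example with hypothetical numbers, a host with 760 GB of RAM and a 64 GB heap leaves 760 - 64 - 8 = 688 GB, so a 500 GB index would fit while a 900 GB index would not.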

User Service

The user service provides user profile management and authentication for the BV-BRC system. The user system provides a REST interface to read and modify a user’s profile. It also provides authentication services for the BV-BRC web application and related components. The backend services consume authentication tokens that are generated by the user service.

Source Code: https://github.com/PATRIC3/p3_user

Web/Proxy Server

All BV-BRC websites and web applications run behind a web server that hosts static files, proxies requests to the underlying application servers, and, in some cases, balances load among the web server instances. This component is not strictly required to deploy the BV-BRC infrastructure in its basic form, but it simplifies deployment and is the current method used for load balancing. NGINX is deployed on the hosts that contain the websites and listens on the standard HTTP and HTTPS ports (80, 443), while the underlying applications listen on otherwise unused ports. NGINX is configured to proxy requests to these local applications using name-based virtual hosting.
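
NGINX makes this routing decision from its own configuration files. The Python sketch below only illustrates the logic of name-based routing: the Host header of an incoming request selects which local application port receives the request. The host names and port numbers are hypothetical.

```python
# Hypothetical host names and ports; real deployments will differ.
UPSTREAMS = {
    "www.example.org": 3000,
    "docs.example.org": 3001,
}


def upstream_for(host_header):
    """Map a request's Host header to the local address of its application."""
    # Drop an optional ":port" suffix and normalize case before the lookup.
    hostname = host_header.split(":", 1)[0].strip().lower()
    port = UPSTREAMS.get(hostname)
    if port is None:
        return None  # unknown host: the proxy would reject or use a default
    return f"http://127.0.0.1:{port}"
```

Because only the proxy listens on ports 80 and 443, the applications behind it can be moved to other ports or hosts by changing this mapping alone.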

App Service

The BV-BRC resource supports a number of computational services (e.g., genome assembly and annotation, and model production). These services are hosted on an extensible set of computational resources at Argonne. The interface between the user’s interaction with the BV-BRC website and the computational resources is called the App Service. The App Service presents a unified view of all supported services, allowing the user to submit requests, monitor progress, and view results within a common framework on the BV-BRC website. For developers, the App Service enables the development of new applications without the need to handle the details of process execution and management.

Source Code: https://github.com/PATRIC3/app_service

App Service API:

The App Service is connected to the rest of the BV-BRC tools and website via a programmatic JSON-RPC API. The API has 6 commands:

  • enumerate_apps

  • start_app

  • query_tasks

  • query_task_summary

  • query_task_details

  • enumerate_tasks

The associated resource is: https://p3.theseed.org/services/app_service
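
A client that submits a job usually needs to wait for it to finish. The sketch below polls the query_tasks command until a task leaves its pending states. Only the command name comes from this document. The "AppService." method prefix, the JSON-RPC 1.1 envelope, the response shape, and the status names are assumptions, and the transport is passed in as a function so that the sketch runs without network access.

```python
import time


def rpc_body(command, params, request_id="1"):
    """JSON-RPC body; the "AppService." method prefix is an assumption."""
    return {
        "id": request_id,
        "method": f"AppService.{command}",
        "version": "1.1",
        "params": params,
    }


def wait_for_task(task_id, send, poll_seconds=30.0, max_polls=120):
    """Poll query_tasks until the task leaves its pending states.

    `send` is a caller-supplied function that posts a request body to the
    service and returns the decoded "result" value. The response shape
    (a one-element list holding a dict keyed by task ID, with a "status"
    field) and the status names are assumptions for this sketch.
    """
    for _ in range(max_polls):
        result = send(rpc_body("query_tasks", [[task_id]]))
        task = result[0][task_id]
        if task["status"] not in ("queued", "in-progress"):
            return task
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")
```

Injecting the send function keeps authentication and HTTP details out of the polling logic and makes the loop easy to exercise with a stub.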

Hardware Deployment

The hardware that supports the BV-BRC services, hosted at Argonne National Laboratory on behalf of the University of Chicago’s bioinformatics computing core, is as follows:

  • Production support services

    • 24 x E5-2620 CPUs

    • 256 GB RAM

  • Production support services

    • 40 x E5-2640 CPUs

    • 768 GB RAM

  • User Data Management and Compute Scheduling

    • 12 x E5-2620 CPUs

    • 256 GB RAM

  • Solr Cloud servers (x3)

    • 32 x Xeon Gold 6134 CPUs

    • 760 GB RAM

    • 5.3 TB SSD storage

  • ARAST Server and Primary Compute

    • 12 x E5-2620 CPUs

    • 256 GB RAM

  • Compute server

    • 12 x E5-2620 CPUs

    • 256 GB RAM

  • Compute servers (x3)

    • 32 x Xeon Gold 6134 CPUs

    • 786 GB RAM

  • Load-balanced / Failover Proxy Server

    • 2 systems, each with 4 CPUs, 64 GB RAM, and a 10 Gb network connection

Storage is provided to the above systems through Fibre Channel SAN storage. The Solr portion of BV-BRC and the FTP site are currently consuming approximately 10 TB of storage.