Scaling out a combinatorial algorithm for discovering carcinogenic gene combinations to thousands of GPUs

Dash, Sajal; Al-Hajri, Qais; Feng, Wu-chun; Garner, Harold R.; Anandakrishnan, Ramu

dc.contributor.author	Dash, Sajal	en
dc.contributor.author	Al-Hajri, Qais	en
dc.contributor.author	Feng, Wu-chun	en
dc.contributor.author	Garner, Harold R.	en
dc.contributor.author	Anandakrishnan, Ramu	en
dc.date.accessioned	2024-03-04T15:11:38Z	en
dc.date.available	2024-03-04T15:11:38Z	en
dc.date.issued	2021-05-01	en
dc.description.abstract	Cancer is a leading cause of death in the US, second only to heart disease. It is primarily a result of a combination of an estimated two to nine genetic mutations (multi-hit combinations). Although a body of research has identified hundreds of cancer-causing genetic mutations, we don't know the specific combination of mutations responsible for specific instances of cancer for most cancer types. An approximate algorithm for solving the weighted set cover problem was previously adapted to identify combinations of genes with mutations that may be responsible for individual instances of cancer. However, the algorithm's computational requirement scales exponentially with the number of genes, making it impractical for identifying more than three-hit combinations, even after the algorithm was parallelized and scaled up to a V100 GPU. Since most cancers have been estimated to require more than three hits, we scaled out the algorithm to identify combinations of four or more hits using 1000 nodes (6000 V100 GPUs with ≈ 48 × 10^6 processing cores) on the Summit supercomputer at Oak Ridge National Laboratory. Efficiently scaling out the algorithm required a series of algorithmic innovations and optimizations for balancing an exponentially divergent workload across processors and for minimizing memory latency and inter-node communication. We achieved an average strong scaling efficiency of 90.14% (80.96% to 97.96% for 200 to 1000 nodes), compared to a 100-node run, with 84.18% scaling efficiency for 1000 nodes. With experimental validation, the multi-hit combinations identified here could provide further insight into the etiology of different cancer subtypes and provide a rational basis for targeted combination therapy.	en
dc.description.version	Accepted version	en
dc.format.extent	Pages 837-846	en
dc.identifier.doi	https://doi.org/10.1109/IPDPS49936.2021.00093	en
dc.identifier.isbn	9781665440660	en
dc.identifier.orcid	Feng, Wu-chun [0000-0002-6015-0727]	en
dc.identifier.uri	https://hdl.handle.net/10919/118249	en
dc.publisher	IEEE	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.title	Scaling out a combinatorial algorithm for discovering carcinogenic gene combinations to thousands of GPUs	en
dc.title.serial	Proceedings - 2021 IEEE 35th International Parallel and Distributed Processing Symposium, IPDPS 2021	en
dc.type	Conference proceeding	en
dc.type.other	Conference Proceeding	en
pubs.finish-date	2021-05-21	en
pubs.organisational-group	/Virginia Tech	en
pubs.organisational-group	/Virginia Tech/Engineering	en
pubs.organisational-group	/Virginia Tech/Engineering/Computer Science	en
pubs.organisational-group	/Virginia Tech/Faculty of Health Sciences	en
pubs.organisational-group	/Virginia Tech/All T&R Faculty	en
pubs.organisational-group	/Virginia Tech/Engineering/COE T&R Faculty	en
pubs.start-date	2021-05-17	en
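
The abstract refers to two quantities that are easy to make concrete: the number of candidate h-hit gene combinations, and strong scaling efficiency relative to a baseline run. The following is a minimal sketch, not the authors' code. The function names are hypothetical, and any gene counts or timings passed in are placeholders rather than figures from the paper. Strong scaling efficiency is taken here in its usual sense, measured speedup divided by the ideal speedup implied by the increase in node count.

```python
from math import comb


def num_combinations(num_genes: int, hits: int) -> int:
    """Number of distinct unordered gene combinations of size `hits`.

    This is the search-space size an exhaustive h-hit search would face.
    """
    return comb(num_genes, hits)


def strong_scaling_efficiency(
    base_nodes: int, base_time: float, nodes: int, time: float
) -> float:
    """Strong scaling efficiency of a run relative to a baseline run.

    The problem size is fixed; only the node count changes.
    efficiency = (base_time / time) / (nodes / base_nodes)
    """
    if min(base_nodes, nodes) <= 0 or min(base_time, time) <= 0:
        raise ValueError("node counts and times must be positive")
    measured_speedup = base_time / time
    ideal_speedup = nodes / base_nodes
    return measured_speedup / ideal_speedup


if __name__ == "__main__":
    # Hypothetical gene count, chosen only to show how fast the space grows.
    genes = 20_000
    for h in (2, 3, 4):
        print(h, num_combinations(genes, h))

    # Hypothetical timings (seconds) for a 100-node baseline and a 1000-node run.
    print(strong_scaling_efficiency(100, 1000.0, 1000, 125.0))
```

Going from h to h + 1 hits multiplies the combination count by (n - h) / (h + 1) for n genes, which is why each additional hit is so much more expensive than the last.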

Files

Original bundle

Name: Feng-IPDPS17-Carcinogenesis-on-Summit.pdf
Size: 2.72 MB
Format: Adobe Portable Document Format
Description: Accepted version

License bundle

Name: license.txt
Size: 1.5 KB
Format: Plain Text

Collections

All Faculty Deposits
Scholarly Works, Computer Science
Scholarly Works, Electrical and Computer Engineering