benchmark/README.md

Benchmarking Setup Documentation

1. benchmark_{Name_of_Dataset}.R

These R scripts contain the benchmarking logic for MSstats-based proteomics data analysis. Each script runs a dataset-specific workflow: it reads the input data, processes it, computes metrics (e.g., False Discovery Rate (FDR)), and outputs them so that updates to the MSstats library can be validated.

2. config.slurm

This SLURM configuration file automates the execution of the R benchmarking script on an HPC system. It includes directives for resource allocation, job naming, and runtime limits. It ensures efficient utilization of HPC resources for running computationally intensive workflows.
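As a rough illustration of the kind of directives described above, the sketch below writes an example batch file. Every value in it (job name, partition, account, resources, time limit, email) is a placeholder chosen for this example and is not taken from the real config.slurm; the function name write_slurm_example is also hypothetical.

```shell
# Write an example SLURM batch file to the path given as $1.
# All directive values are placeholders, not the project's real settings.
write_slurm_example() {
  slurm_out_file="$1"
  cat > "$slurm_out_file" <<'EOF'
#!/bin/bash
# Job naming
#SBATCH --job-name=msstats_benchmark
# Resource allocation (partition and account names are site-specific)
#SBATCH --partition=short
#SBATCH --account=your_account
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
# Runtime limit (HH:MM:SS)
#SBATCH --time=02:00:00
# Log file; %j expands to the job ID
#SBATCH --output=benchmark_%j.out
# Email notifications
#SBATCH --mail-user=new_user_email@example.com
#SBATCH --mail-type=END,FAIL

# Load R first if your cluster uses environment modules (module name is site-specific).
# Replace the placeholder with a real dataset name before submitting.
Rscript benchmark_{Name_of_Dataset}.R
EOF
}
```

Calling `write_slurm_example config.slurm.example` produces a file that can be compared against the real config.slurm before editing it.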

3. benchmark.yml

This YAML configuration file is part of a GitHub Actions pipeline. It defines workflows for automating benchmarking tasks. The file contains instructions for setting up the R environment, pulling the required repositories, and executing the benchmarks.

Setup Instructions for New Users

1. Prerequisites

Ensure you have access to the following:

  - An HPC account with the SLURM job scheduler.
  - The required R dependencies installed (check benchmark_{Name_of_Dataset}.R for library imports).
  - A GitHub account with access to the repository containing these files.

2. Set Up HPC Environment

  1. Transfer the benchmark R scripts and the config.slurm file to your HPC environment.
  2. Modify the config.slurm file to include your job-specific parameters (e.g., email, account name, partitions).
  3. Submit the job using sbatch config.slurm.
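The transfer and submission steps above can be sketched as a small script. The login (new_user@remote_server), the remote directory name, and the helper names run and submit_benchmark are assumptions for illustration. Setting DRY_RUN=1 prints the commands instead of executing them, which is a convenient way to review them first.

```shell
# Placeholder values; replace with your own login and directory.
REMOTE="new_user@remote_server"
REMOTE_DIR="msstats_benchmark"   # hypothetical directory under the remote home

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the benchmark scripts and SLURM file, then submit the job remotely.
submit_benchmark() {
  run ssh "$REMOTE" "mkdir -p $REMOTE_DIR" || return 1
  run scp benchmark_*.R config.slurm "${REMOTE}:${REMOTE_DIR}/" || return 1
  run ssh "$REMOTE" "cd $REMOTE_DIR && sbatch config.slurm"
}
```

Run it from the directory that holds the scripts, for example `DRY_RUN=1 submit_benchmark` to preview and `submit_benchmark` to execute.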

3. Set Up GitHub Actions

  1. Place the benchmark.yml file in the .github/workflows/ directory of your repository.
  2. Configure the benchmark.yml file with appropriate paths and repository settings.
  3. Push the changes to your repository to trigger the pipeline.

4. Verify Execution

  1. Check the SLURM job output logs for successful execution of the benchmark_{Name_of_Dataset}.R script.
  2. Validate that the benchmarking metrics are generated correctly.
  3. Monitor the GitHub Actions logs to ensure the workflows execute without errors.
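For the log check in the first step, a minimal sketch is shown below. It assumes the job's output was written to a file (SLURM's default name is slurm-<jobid>.out unless --output is set) and relies on Rscript printing "Execution halted" when a script stops on an error. The function name log_ok is hypothetical, and a clean log does not by itself prove the metrics are correct.

```shell
# Succeed only if the log exists, is non-empty, and shows no R failure.
log_ok() {
  log_file="$1"
  if [ ! -s "$log_file" ]; then
    echo "missing or empty log: $log_file" >&2
    return 1
  fi
  # R reports errors on lines starting with "Error" and ends with "Execution halted".
  if grep -q -e '^Error' -e 'Execution halted' "$log_file"; then
    echo "R error found in $log_file" >&2
    return 1
  fi
  echo "no R errors in $log_file"
}
```

On the cluster, `squeue -u "$USER"` lists jobs that are still queued or running; once the job has finished, `log_ok` can be pointed at its output file.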

SSH Access Setup for a New User

Why Is SSH Needed?

In this setup, SSH is needed to connect securely to the HPC cluster, submit SLURM jobs, and transfer benchmarking scripts and results. A private key without a passphrase is essential for automation: it lets GitHub Actions authenticate and run benchmarks without manual input, so continuous integration runs without interruption. Key-based login also avoids the risks of password-based authentication while keeping access controlled.

Steps to Set Up SSH Access for a New User

1. Generate SSH Key Pair

On the new user's local machine, generate an SSH key pair (if not already created):

ssh-keygen -t rsa -b 4096 -C "new_user_email@example.com"

Example: the currently configured user is raina.ans@login-00.discovery.neu.edu.
You can check this by opening a shell through Discovery Cluster Dashboard > Clusters > Discovery Shell Access.
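Because the key is meant for unattended use, it has to be created without a passphrase. A minimal sketch of doing that non-interactively follows; the function name generate_ci_key and the key path in the usage note are assumptions for this example.

```shell
# Generate a passphrase-less RSA key pair at the given path.
# $1 = private key path (the public key is written to "$1.pub"), $2 = comment.
generate_ci_key() {
  key_file="$1"
  key_comment="$2"
  # ssh-keygen would prompt before overwriting, so stop early instead.
  if [ -e "$key_file" ]; then
    echo "refusing to overwrite $key_file" >&2
    return 1
  fi
  # -N "" sets an empty passphrase, -f the output path, -q suppresses the banner.
  ssh-keygen -q -t rsa -b 4096 -N "" -C "$key_comment" -f "$key_file"
}
```

For example, `generate_ci_key "$HOME/.ssh/id_rsa_benchmark" "new_user_email@example.com"` creates the private key and a matching .pub file. Anyone who obtains a passphrase-less private key can use it, so keep the file readable only by its owner.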

2. Copy the Public Key to the Remote Server

Manually Copy the Key

  1. SSH into the remote server using an existing account with sufficient privileges (e.g. raina.ans@login-00.discovery.neu.edu):

     ssh existing_user@remote_server

  2. Append the public key to the authorized_keys file:

     mkdir -p ~/.ssh
     echo "paste_the_public_key_here" >> ~/.ssh/authorized_keys
     chmod 600 ~/.ssh/authorized_keys
     chmod 700 ~/.ssh
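The append step can be made safe to repeat, so that running it twice does not add the same key twice. The sketch below is one way to do that; the function name add_authorized_key and its optional second argument (the .ssh directory, defaulting to ~/.ssh) are assumptions for illustration.

```shell
# Append a public key to authorized_keys only if it is not already present.
# $1 = full public key line, $2 = .ssh directory (defaults to ~/.ssh).
add_authorized_key() {
  pub_key="$1"
  ssh_dir="${2:-$HOME/.ssh}"
  auth_file="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  chmod 600 "$auth_file"
  # -x matches the whole line, -F treats the key as a fixed string.
  if ! grep -qxF -e "$pub_key" "$auth_file"; then
    printf '%s\n' "$pub_key" >> "$auth_file"
  fi
}
```

On the remote server this would be called as `add_authorized_key "paste_the_public_key_here"`.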

3. Verify the New User's SSH Access

From the new user's local machine, attempt to log in to the remote server:

ssh new_user@remote_server

If successful, the new user should be logged into the remote server.
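Since the key will later be used without a person at the keyboard, it can help to confirm that login works with no prompts at all. The sketch below uses OpenSSH's BatchMode option, which makes ssh fail instead of asking for a password; the function name check_ssh is hypothetical.

```shell
# Verify non-interactive, key-based login to the given user@host.
check_ssh() {
  ssh_target="$1"
  # BatchMode=yes disables password prompts; "true" is a no-op remote command.
  if ssh -o BatchMode=yes -o ConnectTimeout=10 "$ssh_target" true; then
    echo "key-based login to $ssh_target works"
  else
    echo "key-based login to $ssh_target failed" >&2
    return 1
  fi
}
```

For example: `check_ssh new_user@remote_server`.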

4. Add the Private Key as a Secret in the MSstats Repo

Adding a Private Key as a GitHub Secret

  1. Navigate to your GitHub repository.
  2. Go to Settings > Secrets and variables > Actions.
  3. Click New repository secret.
  4. Enter a name (e.g., SSH_PRIVATE_KEY), paste the private key as the value, and click Add secret.

Using the Secret in a GitHub Actions Workflow

To use this secret in a GitHub Actions workflow, follow the way the currently configured secret is used in the existing workflow.
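One common pattern, shown here only as a sketch, is for a workflow step to expose the secret as an environment variable (for example with `env: SSH_PRIVATE_KEY: ${{ secrets.SSH_PRIVATE_KEY }}`) and then write it to a key file with restrictive permissions before calling ssh. The function name install_ssh_key and the id_rsa file name are assumptions; the repository's actual workflow may handle this differently.

```shell
# Write the key from $SSH_PRIVATE_KEY into an owner-only key file.
# $1 = .ssh directory (defaults to ~/.ssh).
install_ssh_key() {
  key_dir="${1:-$HOME/.ssh}"
  if [ -z "${SSH_PRIVATE_KEY:-}" ]; then
    echo "SSH_PRIVATE_KEY is not set" >&2
    return 1
  fi
  mkdir -p "$key_dir"
  chmod 700 "$key_dir"
  # The subshell's umask keeps the file private from the moment it is created.
  ( umask 077; printf '%s\n' "$SSH_PRIVATE_KEY" > "$key_dir/id_rsa" )
  chmod 600 "$key_dir/id_rsa"
}
```

After `install_ssh_key`, later commands in the same job could connect with `ssh -i ~/.ssh/id_rsa new_user@remote_server`.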

Notes



MeenaChoi/MSstats documentation built on Feb. 9, 2025, 11:23 a.m.