Rules

The rules and information about HSC2024 can also be found in the document HSC2024.pdf. The document is also available on arXiv: https://arxiv.org/abs/2406.04123.

How to enter the competition

  1. Register before 1 September 2024, 23:59 (UTC+3), using this electronic form. Teams do not need to register multiple times if they decide to submit more than one algorithm to the challenge.
    (If you missed the registration deadline, please contact us via email.)
  2. Send your submission via email (hsc2024@helsinki.fi) before 6 October 2024, 23:59 (UTC+3). See the instructions below on what needs to be submitted.
  3. Make your GitHub repository public before 27 October 2024.

Rules of the competition

Scoring

The core evaluation of participants works as follows:

  • All participants start with 0 points. To advance to the next level, a group must achieve a Mean CER (character error rate; a reference computation is sketched after this list) below 0.3 on the current level. If the Mean CER of the noisy data is already below 0.3, participants must achieve a Mean CER lower than that of the noisy data. Additionally, they must pass a sanity check (see below).
  • Upon completing a level of a specific task, the team gains one point and can proceed to the next level of that task. Note that tasks can be completed independently of each other.
  • The winner is the team with the most points, i.e., the team that completes the most levels. In the event of a tie, the winner will be determined by the average Mean CER across all completed levels.
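For reference, CER is typically computed as the Levenshtein edit distance between the transcribed text and the ground-truth text, divided by the length of the ground truth. Below is a minimal Python sketch of this standard definition; the official evaluation pipeline is the organizers', so this is only illustrative:

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))  # edit distances for the empty reference prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

print(cer("hello world", "hallo world"))  # 1 edit / 11 characters ≈ 0.09

The Mean CER of a level is then, presumably, the average of the per-file CERs over that level's test set.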

The sanity check will consist of the judges listening to a few predetermined test files containing speech passed through the algorithm, which may or may not have been obtained through a different synthesis process than the training data. A group passes the sanity check only if it is clear that the recovered audio comes from the same speaker as the clean and noisy data.

Additional Rules

  • Submissions must adhere to the guidelines outlined in Section Submission.
  • Each group is allowed a maximum of three submissions.
  • Models must handle 16 kHz audio data of arbitrary length.
  • Participants should primarily use the provided dataset and avoid augmenting it with external data. If additional data is used, it must be explicitly stated, along with results demonstrating its impact on performance. Generating new data from the provided dataset (e.g., creating noisy data from clean data) is permitted. Using the OpenAI tts-1 model for anything other than testing is not allowed, as it is commercial text-to-speech software and would skew results in favour of those who purchase a subscription.
  • Although participants receive matching text for evaluation, they are prohibited from using speech recognition models during training. This includes optimizing or backpropagating through models like DeepSpeech. Parameter tuning based on the test script output is allowed. Participants are encouraged to explore and report on this approach outside of the official submission.
  • Participants are encouraged to create lightweight models. The Real-Time Factor (RTF), defined as processing time divided by audio length, must average no more than 3; a sketch of how it can be measured follows this list. All groups’ RTFs will be reported. Models achieving an RTF below 1 are particularly encouraged. Evaluation will be conducted on a modern workstation with a GPU.
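For orientation, the RTF of a single file could be measured along these lines in Python, assuming the soundfile package and a hypothetical process() callable standing in for your algorithm (the official timing setup is determined by the organizers):

import time
import soundfile as sf

def measure_rtf(process, input_path, output_path):
    """Real-time factor: processing time divided by audio length."""
    duration = sf.info(input_path).duration   # audio length in seconds
    start = time.perf_counter()
    process(input_path, output_path)          # hypothetical: your algorithm
    elapsed = time.perf_counter() - start
    return elapsed / duration

Averaging this quantity over all test files gives the RTF that is reported and compared against the limit of 3.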

Submission

The algorithms must be shared with us as a private GitHub repository by the deadline at the latest. The code should be written in Matlab or Python 3.

After the deadline there is a brief period during which we can troubleshoot the code together with the participants, to ensure that we are able to run it.

Participants can update the contents of the shared repository as many times as needed before the deadline. We will consider only the latest release of your repository on GitHub.

Your repository must contain a README.md file with at least the following sections:

  • Authors, institution, location.
  • Brief description of your algorithm and a mention of the competition.
  • Installation instructions, including any requirements.
  • Usage instructions.
  • An illustration of some examples, either audio files or spectrograms.

The repository must contain a main routine that we can run to apply your algorithm automatically to every audio file in a given directory and store the result, under the same name, in a different folder. This is the file we will run to evaluate your code. Give it an easy-to-identify name, such as main.m or main.py.

Your main routine must take three input arguments:

  • (string) Folder where the input audio files are located.
  • (string) Folder where the output audio files will be stored.
  • (string) Task ID of the form TXLY, where X is the task and Y is the level.

Below are the expected formats of the main routine in Matlab and Python.

Matlab:

function main(inputFolder, outputFolder, taskID)
    % your code comes here
end

Example of calling the function:

>> main('path/to/input/files', 'path/to/output/files', 'T1L3')

Python: The main function must be callable from the command line. To achieve this you can use sys.argv or the argparse module.

Example of calling the function:

$ python3 main.py path/to/input/files path/to/output/files T1L3
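A minimal sketch of such a main.py, assuming the soundfile package for audio I/O; the processing step is a placeholder for your algorithm:

import argparse
import os

import soundfile as sf

def main(input_folder, output_folder, task_id):
    os.makedirs(output_folder, exist_ok=True)
    for name in sorted(os.listdir(input_folder)):
        if not name.lower().endswith(".wav"):
            continue
        audio, sr = sf.read(os.path.join(input_folder, name))
        # Placeholder: apply your deconvolution algorithm here, possibly
        # selecting a model based on task_id (e.g. "T1L3").
        restored = audio
        # Write 16-bit .wav output under the same file name.
        sf.write(os.path.join(output_folder, name), restored, sr, subtype="PCM_16")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="HSC2024 main routine")
    parser.add_argument("input_folder")
    parser.add_argument("output_folder")
    parser.add_argument("task_id")
    args = parser.parse_args()
    main(args.input_folder, args.output_folder, args.task_id)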

The main routine must produce a deconvolved audio file in the output folder, with the same name, for each audio file in the input folder, saved in .wav format. The output files need not have exactly the same length as the inputs, but they should not be much longer, and they must be 16-bit, 16 kHz audio files.
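The output format can be checked quickly with soundfile (a convenience check, not part of the required interface; the file name below is a placeholder):

import soundfile as sf

info = sf.info("path/to/output/files/example.wav")
assert info.samplerate == 16000       # 16 kHz
assert info.subtype == "PCM_16"       # 16-bit samples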

The teams are allowed to use freely available Python modules or Matlab toolboxes. Toolboxes, libraries, and modules with paid licenses can also be used if the organizing committee also has the license. For example, the most common Matlab toolboxes for audio processing and deconvolution can be used (Audio Toolbox, Wavelet Toolbox, PDE Toolbox, Deep Learning Toolbox, Optimization Toolbox). For Python, we recommend using the Librosa package and/or PyTorch/Torchaudio. The teams can contact us to check whether other toolboxes and packages are available.

Finally, the competitors must make their GitHub repositories public on 27 October 2024 at the latest. In the spirit of open science, only public code can win the data challenge.