.. _parallelization: ======================= Parallelization in DELi ======================= DELi as a package does not implement any type of parallelization. It is designed to run on a single CPU core. This is in part because most analysis methods are quick enough that no parallelization is needed. The only place where parallelization could be useful is in enumeration and decoding. Yet, the ``mp`` module from python is not great and both these processes are `embarrassingly parallel `_. This means it's better to just run separate jobs on your system than to let python do it for you. There are a few hiccups to overcome. For example in decoding you need to break up you fastq file into chunks, and at the end the UMI corrected counters need to all be loaded at once to get the right counts (others the same UMIs could be double counted). For enumeration you need to figure out how to split the library building block up into subsets so each job enumerates only part of the library. These are not always trivial issues, but it is also not very hard to write a script to handle them. To help show how this works, DELi has some example scripts using `Nextflow `_ to parallelize decoding. The DELi team will continue to provide example Nextflow scripts for future tasks as well, as Nextflow works well with HPC systems (which most academic labs are limited too) *and* a cloud environment since it is file based. Running parallel decoding with Nextflow --------------------------------------- You can run the nextflow script very similar the the decoding CLI command in DELi. The largest difference is you can provide a chunk size which will determine how many jobs are needed. .. code-block:: shell nextflow run decode.nf ./example_decode.yaml --chunk_size 100000 One thing to keep in mind is that since Nextflow is file based, copies of the FASTQ files will be made. If these are large, you can need alot of disk space. This workflow also requires the raw counts for each DEL compound and their UMIs are saved and then merged at the end (all loaded into memory). This step would be required even if DELi handled parallelization internally, but with Nextflow we can limit the large memory usage to the end, limiting how long we need the big memory VM for (and saving cost).