AlphaFold 3 structure prediction script.

AlphaFold 3 source code is licensed under CC BY-NC-SA 4.0. To view a copy of
this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/

To request access to the AlphaFold 3 model parameters, follow the process set
out at https://github.com/google-deepmind/alphafold3. You may only use these
if received directly from Google. Use is subject to terms of use available at
https://github.com/google-deepmind/alphafold3/blob/main/WEIGHTS_TERMS_OF_USE.md

flags:

/app/alphafold/run_alphafold.py:
  --buckets: Token sizes, in strictly increasing order, for which to cache
    compilations. For any input with more tokens than the largest bucket size, a
    new bucket is created for exactly that number of tokens.
    (default: '256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120')
    (a comma separated list)
  --conformer_max_iterations: Optional override for the maximum number of
    iterations to run for RDKit conformer search.
    (an integer)
  --db_dir: Path to the directory containing the databases. Can be specified
    multiple times to search multiple directories in order;
    repeat this option to specify a list of values
    (default: "['/home/appadm/public_databases']")
  --flash_attention_implementation: <triton|cudnn|xla>: Flash attention
    implementation to use. 'triton' and 'cudnn' use the Triton and cuDNN flash
    attention implementations, respectively. The Triton kernel is the fastest
    and has been tested more thoroughly. The Triton and cuDNN kernels require
    Ampere GPUs or later. 'xla' uses an XLA attention implementation (no flash
    attention) and is portable across GPU devices.
    (default: 'triton')
  --[no]force_output_dir: Whether to force the output directory to be used even
    if it already exists and is non-empty. Set this to True to run the data
    pipeline and inference separately while using the same output directory.
    (default: 'false')
  --gpu_device: Optional override for the GPU device to use for inference.
    Defaults to the 1st GPU on the system. Useful on multi-GPU systems to pin
    each run to a specific GPU.
    (default: '0')
    (an integer)
  --hmmalign_binary_path: Path to the Hmmalign binary.
    (default: '/hmmer/bin/hmmalign')
  --hmmbuild_binary_path: Path to the Hmmbuild binary.
    (default: '/hmmer/bin/hmmbuild')
  --hmmsearch_binary_path: Path to the Hmmsearch binary.
    (default: '/hmmer/bin/hmmsearch')
  --input_dir: Path to the directory containing input JSON files.
  --jackhmmer_binary_path: Path to the Jackhmmer binary.
    (default: '/hmmer/bin/jackhmmer')
  --jackhmmer_n_cpu: Number of CPUs to use for Jackhmmer. Defaults to
    min(cpu_count, 8). Going beyond 8 CPUs provides very little additional
    speedup.
    (default: '8')
    (an integer)
  --jax_compilation_cache_dir: Path to a directory for the JAX compilation
    cache.
  --json_path: Path to the input JSON file.
  --max_template_date: Maximum template release date to consider. Format:
    YYYY-MM-DD. All templates released after this date will be ignored. Also
    controls whether the model coordinates of a chemical component from the CCD
    may be used if RDKit conformer generation fails and the component does not
    have ideal coordinates set: model coordinates are used as a fallback only
    for components released before this date.
    (default: '2021-09-30')
  --mgnify_database_path: Mgnify database path, used for protein MSA search.
    (default: '${DB_DIR}/mgy_clusters_2022_05.fa')
  --model_dir: Path to the model to use for inference.
    (default: '/home/appadm/models')
  --nhmmer_binary_path: Path to the Nhmmer binary.
    (default: '/hmmer/bin/nhmmer')
  --nhmmer_n_cpu: Number of CPUs to use for Nhmmer. Defaults to min(cpu_count,
    8). Going beyond 8 CPUs provides very little additional speedup.
    (default: '8')
    (an integer)
  --ntrna_database_path: NT-RNA database path, used for RNA MSA search.
    (default:
    '${DB_DIR}/nt_rna_2023_02_23_clust_seq_id_90_cov_80_rep_seq.fasta')
  --num_diffusion_samples: Number of diffusion samples to generate.
    (default: '5')
    (a positive integer)
  --num_recycles: Number of recycles to use during inference.
    (default: '10')
    (a positive integer)
  --num_seeds: Number of seeds to use for inference. If set, exactly one seed
    must be provided in the input JSON. AlphaFold 3 will then generate random
    seeds in sequence, starting from the single seed specified in the input
    JSON. The full input JSON produced by AlphaFold 3 will include the generated
    random seeds. If not set, AlphaFold 3 will use the seeds as provided in the
    input JSON.
    (a positive integer)
  --output_dir: Path to a directory where the results will be saved.
  --pdb_database_path: Path to the PDB database directory of mmCIF files, used
    for template search.
    (default: '${DB_DIR}/mmcif_files')
  --rfam_database_path: Rfam database path, used for RNA MSA search.
    (default: '${DB_DIR}/rfam_14_9_clust_seq_id_90_cov_80_rep_seq.fasta')
  --rna_central_database_path: RNAcentral database path, used for RNA MSA
    search.
    (default: '${DB_DIR}/rnacentral_active_seq_id_90_cov_80_linclust.fasta')
  --[no]run_data_pipeline: Whether to run the data pipeline on the fold inputs.
    (default: 'true')
  --[no]run_inference: Whether to run inference on the fold inputs.
    (default: 'true')
  --[no]save_embeddings: Whether to save the final trunk single and pair
    embeddings in the output.
    (default: 'false')
  --seqres_database_path: PDB sequence database path, used for template search.
    (default: '${DB_DIR}/pdb_seqres_2022_09_28.fasta')
  --small_bfd_database_path: Small BFD database path, used for protein MSA
    search.
    (default: '${DB_DIR}/bfd-first_non_consensus_sequences.fasta')
  --uniprot_cluster_annot_database_path: UniProt database path, used for protein
    paired MSA search.
    (default: '${DB_DIR}/uniprot_all_2021_04.fa')
  --uniref90_database_path: UniRef90 database path, used for MSA search. The MSA
    obtained by searching it is used to construct the profile for template
    search.
    (default: '${DB_DIR}/uniref90_2022_05.fa')

Try --helpfull to get a list of all flags.
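The --buckets flag describes a compilation cache keyed by token size. A minimal sketch of how an input might be mapped to a bucket follows, assuming each input is padded up to the smallest bucket that can hold it (the help text only states the behaviour for inputs larger than the largest bucket). The function name select_bucket is hypothetical and not part of run_alphafold.py.

```python
import bisect
from typing import Sequence

# Default value of --buckets, copied from the help text.
DEFAULT_BUCKETS = [
    int(b)
    for b in "256,512,768,1024,1280,1536,2048,2560,3072,3584,4096,4608,5120".split(",")
]


def select_bucket(num_tokens: int, buckets: Sequence[int] = DEFAULT_BUCKETS) -> int:
    """Returns the token size to compile for (hypothetical sketch).

    Assumes an input is padded up to the smallest bucket that can hold it;
    inputs longer than the largest bucket get a bucket of exactly num_tokens.
    """
    if list(buckets) != sorted(set(buckets)):
        raise ValueError("buckets must be in strictly increasing order")
    # Index of the first bucket >= num_tokens.
    idx = bisect.bisect_left(buckets, num_tokens)
    if idx < len(buckets):
        return buckets[idx]
    # Larger than every bucket: a new bucket of exactly this size.
    return num_tokens
```

Under this assumption, a 300-token input would reuse the 512-token compilation, while a 5121-token input would trigger a fresh compilation for exactly 5121 tokens.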
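The database path defaults contain a ${DB_DIR} placeholder, and --db_dir can be repeated to search several directories in order. The sketch below shows one plausible resolution rule, assuming the placeholder is substituted with each --db_dir value in turn and the first path that exists wins. The helper resolve_db_path is hypothetical.

```python
import os
from typing import Sequence


def resolve_db_path(path_with_db_dir: str, db_dirs: Sequence[str]) -> str:
    """Substitutes each --db_dir value into ${DB_DIR}, in order, and returns
    the first candidate path that exists (hypothetical sketch)."""
    for db_dir in db_dirs:
        candidate = path_with_db_dir.replace("${DB_DIR}", db_dir)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(
        f"{path_with_db_dir} not found under any of: {', '.join(db_dirs)}"
    )
```

For example, resolve_db_path("${DB_DIR}/uniref90_2022_05.fa", ["/fast_ssd/dbs", "/home/appadm/public_databases"]) would prefer a copy on the (hypothetical) faster disk and fall back to the default directory.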
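The --flash_attention_implementation help states that the Triton and cuDNN kernels need Ampere or later GPUs, while 'xla' is portable. Ampere GPUs report CUDA compute capability 8.x and later architectures report higher major versions, so a sketch of choosing a value could key on the major version. How the compute capability is obtained (for example from nvidia-smi) is left to the caller, and pick_attention_implementation is a hypothetical name.

```python
from typing import Tuple


def pick_attention_implementation(compute_capability: Tuple[int, int]) -> str:
    """Suggests a --flash_attention_implementation value (hypothetical sketch).

    Ampere GPUs report CUDA compute capability 8.x; later architectures report
    higher major versions. Older GPUs fall back to the portable 'xla' option.
    """
    major, _minor = compute_capability
    if major >= 8:
        return "triton"  # Fastest and most thoroughly tested per the help text.
    return "xla"  # No flash attention, but portable across GPU devices.
```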
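The --run_data_pipeline, --run_inference and --force_output_dir flags together allow the CPU-heavy data pipeline and the GPU inference to run as separate invocations sharing one output directory. The sketch below only builds the two argument lists. It assumes the second stage is fed the JSON written by the first stage; that file's exact name is not given in this help text, so the caller passes it in as stage2_json_path. The function two_stage_commands and the script path are illustrative.

```python
from typing import List


def two_stage_commands(
    json_path: str,
    output_dir: str,
    stage2_json_path: str,
    script: str = "run_alphafold.py",
) -> List[List[str]]:
    """Builds argv lists for a split run (hypothetical sketch).

    stage2_json_path is whatever JSON the data-pipeline stage wrote; its
    exact name is not specified here, so the caller supplies it.
    """
    stage1 = [
        "python", script,
        f"--json_path={json_path}",
        f"--output_dir={output_dir}",
        "--norun_inference",  # Data pipeline only (e.g. on a CPU machine).
    ]
    stage2 = [
        "python", script,
        f"--json_path={stage2_json_path}",
        f"--output_dir={output_dir}",
        "--norun_data_pipeline",  # Inference only (e.g. on a GPU machine).
        "--force_output_dir",  # Reuse the now non-empty output directory.
    ]
    return [stage1, stage2]
```

Each list can be passed to subprocess.run(cmd, check=True); keeping them as lists avoids shell quoting issues with paths.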
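Both --jackhmmer_n_cpu and --nhmmer_n_cpu default to min(cpu_count, 8). The '8' printed as the default is presumably the value this expression produced on the machine that generated the help text. A small sketch of the rule, with a hypothetical helper name:

```python
import os
from typing import Optional


def default_n_cpu(cpu_count: Optional[int] = None, cap: int = 8) -> int:
    """min(cpu_count, 8), as described for --jackhmmer_n_cpu and --nhmmer_n_cpu."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count, cap))
```

On a 4-core machine this gives 4; on a 64-core machine it stays at 8, since extra CPUs give little additional speedup.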
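The --max_template_date flag applies one cutoff to two decisions: which templates are kept and whether CCD model coordinates may serve as a conformer fallback. The sketch below implements both literally as worded in the help text (templates released after the date are ignored; the fallback applies only to components released before it). The Template record and both function names are hypothetical, not AlphaFold 3 types.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Template:
    """Hypothetical template record; not an AlphaFold 3 type."""
    pdb_id: str
    release_date: datetime.date


def filter_templates(templates: List[Template], max_template_date: str) -> List[Template]:
    """Drops templates released after max_template_date (YYYY-MM-DD)."""
    cutoff = datetime.date.fromisoformat(max_template_date)
    return [t for t in templates if t.release_date <= cutoff]


def may_use_ccd_model_coords(
    component_release_date: datetime.date,
    max_template_date: str,
    rdkit_succeeded: bool,
    has_ideal_coords: bool,
) -> bool:
    """True if CCD model coordinates may serve as the conformer fallback."""
    if rdkit_succeeded or has_ideal_coords:
        return False  # The fallback is only considered when both are missing.
    cutoff = datetime.date.fromisoformat(max_template_date)
    return component_release_date < cutoff  # "released before this date"
```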
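The --num_seeds flag expands a single input seed into several. The sketch below assumes "in sequence" means consecutive integers starting at the seed from the input JSON; the actual script may derive the additional seeds differently. The function expand_seeds is hypothetical.

```python
from typing import List, Optional


def expand_seeds(input_seeds: List[int], num_seeds: Optional[int]) -> List[int]:
    """Sketch of --num_seeds handling (hypothetical).

    Assumes "in sequence" means consecutive integers starting at the single
    seed from the input JSON; the actual script may derive them differently.
    """
    if num_seeds is None:
        return list(input_seeds)  # Flag not set: use the JSON seeds as given.
    if len(input_seeds) != 1:
        raise ValueError("--num_seeds requires exactly one seed in the input JSON")
    start = input_seeds[0]
    return list(range(start, start + num_seeds))
```

If each seed produces --num_diffusion_samples samples (an assumption here), then --num_seeds=3 with the default of 5 samples would yield 15 candidate structures.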