%pip install optuna
December 22, 2022
While doing the Hugging Face RL course I bumped into Optuna, and then I noticed that people use it a lot on Kaggle - that is like a badge of honor for a package: if it is used on a top ML competition site, it is worth learning!
Some vocab to get started:

objective(trial) - the function to optimize

Here is a quick summary of how to use it:
import optuna

# to suppress unnecessary output as it prints quite a lot by default
optuna.logging.set_verbosity(optuna.logging.WARNING)

# Task: with 100 trials find a minimum of the function (x-10)**2

# objective function to minimize
def objective(trial):
    # this just returns a float; internally Optuna keeps track
    # of all the values used in the trial object
    x = trial.suggest_float("x", -100, 100)
    return (x - 10)**2

# create the optimization object that will keep track of the whole process
study = optuna.create_study()
# and run the optimization with 100 trials
study.optimize(objective, n_trials=100)
# get the optimized value
study.best_params['x']  # we are pretty close
10.108796067394927
# however it won't do magic if you don't give it enough "space"
# here if you give it just 10 trials, it will usually miss quite
# substantially
def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    return (x - 10)**2

study = optuna.create_study()
study.optimize(objective, n_trials=10)
study.best_params['x']  # ... it is usually not so good
-10.114148942248292
How Optuna works internally is quite simple but ingenious: each call to a trial.suggest_*() function returns an ordinary Python value, so you can use it in your code straight away:
study = optuna.create_study()

def objective(trial):
    # trial.suggest_int returns an integer - all the magic of storing
    # what value was used in a specific trial is recorded in the trial object
    i = trial.suggest_int('x', 0, 100, step=10)
    print(f"next {i=}")
    return i

study.optimize(objective, n_trials=10)
study.best_params['x']
next i=0
next i=0
next i=90
next i=60
next i=70
next i=10
next i=10
next i=10
next i=60
next i=90
0
It is interesting at first how the numbers are drawn from the space - this is all quasi-random and duplicates are possible, especially when the space of available unique values is very limited, as it is here. This is not an implementation bug: here we deal with a single variable, but with multiple parameters it makes sense to retry similar values for one of them while varying the others at the same time. This is the default, but you can choose different strategies for drawing values.
# however with 10 trials...
def objective(trial):
    x = trial.suggest_float("x", -100, 100)
    return (x - 10)**2

study = optuna.create_study()
study.optimize(objective, n_trials=10)
print(f"After 10 trials we got {study.best_params['x']=:.2f}")  # ... it is usually not so good

# but training for another 10 iterations does the trick
study.optimize(objective, n_trials=10)
print(f"... but 10 more runs get us closer {study.best_params['x']=:.2f}")  # ok, now it is better :)
After 10 trials we got study.best_params['x']=1.35
... but 10 more runs get us closer study.best_params['x']=9.38
Or just resume hyperparameter searches after your Colab disconnects.
Optuna also allows for distributed trials:
# straight from the Optuna docs @ https://optuna.readthedocs.io/en/stable/tutorial/10_key_features/004_distributed.html
def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

if __name__ == "__main__":
    study = optuna.load_study(
        study_name="distributed-example", storage="mysql://root@localhost/example"
    )
    study.optimize(objective, n_trials=100)
It was a bit tricky for me to understand how pruning works, as it is usually hidden in handlers for specific libraries. But here is a clear example that doesn't hide anything from you:
optuna.logging.set_verbosity(optuna.logging.INFO)
import random

def objective(trial):
    x = trial.suggest_float("x", -10, 10)  # you draw the next value
    # and this is an inner loop simulating the inner loop of an
    # optimization function, like going through batches in NN training
    for i in reversed(range(10)):
        # here we just make up a number simulating an intermediate result
        # that is sent to Optuna to validate if it is worth continuing
        made_up_intermediate_value = random.randint(1, 10)
        # it is reported to Optuna
        trial.report(made_up_intermediate_value, i)
        # handle pruning based on the intermediate value
        if trial.should_prune():  # Optuna suggests to prune?
            print(f'''\
Pruning trial {trial.number} with value {made_up_intermediate_value=}\n\
because it is already less optimal than previously recorded best value''', flush=True, end='')
            # if yes, we raise an exception that is handled by the
            # `optimize` method in Optuna
            raise optuna.TrialPruned()
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=10)
study.best_params['x']
[I 2023-01-13 13:49:05,151] A new study created in memory with name: no-name-826daad7-a560-4139-8012-384a08aecae9
[I 2023-01-13 13:49:05,154] Trial 0 finished with value: 27.811950820406743 and parameters: {'x': 7.2737037099562905}. Best is trial 0 with value: 27.811950820406743.
[I 2023-01-13 13:49:05,157] Trial 1 finished with value: 81.56034995467249 and parameters: {'x': -7.0310768989458}. Best is trial 0 with value: 27.811950820406743.
[I 2023-01-13 13:49:05,162] Trial 2 finished with value: 1.8078821300442816 and parameters: {'x': 3.3445750741569924}. Best is trial 2 with value: 1.8078821300442816.
[I 2023-01-13 13:49:05,165] Trial 3 finished with value: 51.599347421072665 and parameters: {'x': 9.183268575034116}. Best is trial 2 with value: 1.8078821300442816.
[I 2023-01-13 13:49:05,168] Trial 4 finished with value: 87.26649837508509 and parameters: {'x': -7.341653942160622}. Best is trial 2 with value: 1.8078821300442816.
Pruning trial 5 with value made_up_intermediate_value=10
because it is already less optimal than previously recorded best value
[I 2023-01-13 13:49:05,170] Trial 5 pruned.
[I 2023-01-13 13:49:05,181] Trial 6 finished with value: 26.47065704510398 and parameters: {'x': -3.1449642413824392}. Best is trial 2 with value: 1.8078821300442816.
[I 2023-01-13 13:49:05,185] Trial 7 finished with value: 29.163269432320956 and parameters: {'x': 7.400302716729957}. Best is trial 2 with value: 1.8078821300442816.
Pruning trial 8 with value made_up_intermediate_value=6
because it is already less optimal than previously recorded best value
[I 2023-01-13 13:49:05,187] Trial 8 pruned.
[I 2023-01-13 13:49:05,194] Trial 9 finished with value: 38.838470030340275 and parameters: {'x': 8.232051831486984}. Best is trial 2 with value: 1.8078821300442816.
3.3445750741569924