Databricks: Pass Parameters To Notebook (Python)

Passing parameters to a Databricks notebook using Python is a common requirement when you want to create reusable and dynamic notebooks. Whether you're orchestrating workflows, running parameterized reports, or triggering notebooks from other applications, understanding how to pass parameters is crucial. In this comprehensive guide, we'll explore several methods to achieve this, complete with detailed examples and best practices.

Method 1: Using dbutils.widgets

The most straightforward and recommended method for passing parameters to a Databricks notebook is by using the dbutils.widgets utility. This utility allows you to define widgets within your notebook, which can then be populated with values when the notebook is executed. These widgets act as input parameters that your notebook can access and use in its computations. Let's dive into the details.

Defining Widgets

First, you need to define the widgets in your notebook. Databricks supports several types of widgets, including text, dropdown, combobox, and multiselect. Here’s how to define a simple text widget:

dbutils.widgets.text("param1", "", "Enter value for param1")

In this example:

  • "param1" is the name of the widget (parameter).
  • "" is the default value (empty string in this case).
  • "Enter value for param1" is the label that will be displayed to the user in the Databricks UI.

You can also define other types of widgets. For example, a dropdown widget:

dbutils.widgets.dropdown("param2", "option1", ["option1", "option2", "option3"], "Select an option")

Here:

  • "param2" is the name of the dropdown widget.
  • "option1" is the default selected option.
  • ["option1", "option2", "option3"] is the list of available options.
  • "Select an option" is the label.

Accessing Widget Values

Once the widgets are defined, you can access their values using the dbutils.widgets.get() method:

param1_value = dbutils.widgets.get("param1")
param2_value = dbutils.widgets.get("param2")

print(f"Value of param1: {param1_value}")
print(f"Value of param2: {param2_value}")

This code retrieves the values entered or selected in the widgets and stores them in the param1_value and param2_value variables. You can then use these values in your notebook's logic.
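
One detail to keep in mind: dbutils.widgets.get() always returns a string, even when the value looks numeric, so cast it before doing arithmetic. A minimal sketch, using a hypothetical num_days widget:

dbutils.widgets.text("num_days", "7", "Number of days to process")

# Widget values arrive as strings; convert explicitly before using them as numbers
num_days = int(dbutils.widgets.get("num_days"))
print(f"Processing {num_days * 24} hours of data")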

Example

Let's put it all together with a complete example:

# Define widgets
dbutils.widgets.text("input_name", "", "Enter your name")
dbutils.widgets.dropdown("greeting_type", "Hello", ["Hello", "Hi", "Greetings"], "Select greeting type")

# Get widget values
name = dbutils.widgets.get("input_name")
greeting = dbutils.widgets.get("greeting_type")

# Print a personalized greeting
print(f"{greeting}, {name}!")

When you run this notebook, Databricks will display the widgets at the top. After entering your name and selecting a greeting type, the notebook will print a personalized greeting using the provided values.
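
Widgets persist in the notebook until you remove them, which can be confusing if you rename parameters while developing. You can clean them up explicitly:

# Remove a single widget by name
dbutils.widgets.remove("input_name")

# Or clear every widget defined in the notebook
dbutils.widgets.removeAll()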

Advantages

  • User-Friendly Interface: Widgets provide an easy-to-use interface for users to input parameters directly in the Databricks UI.
  • Dynamic Notebooks: You can create notebooks that adapt their behavior based on the input parameters.
  • Integration with Workflows: Widgets are well-integrated with Databricks workflows, making it easy to pass parameters when scheduling or triggering notebooks.

Disadvantages

  • Manual Input: Requires manual input through the Databricks UI, which may not be suitable for fully automated processes.
  • Limited Types: The widget types are limited to the predefined options (text, dropdown, combobox, multiselect).

Method 2: Using %run Command

Another way to pass parameters to a Databricks notebook is by using the %run command. This command allows you to execute another notebook within the current notebook, and you can pass parameters by defining variables before running the target notebook. Here’s how it works.

Defining Variables

Before using the %run command, define the variables that you want to pass as parameters:

param1 = "value1"
param2 = 123

Executing the Target Notebook

Use the %run command, which must be in a cell by itself, followed by the path to the target notebook:

%run ./path/to/target_notebook

Because %run executes the target notebook in the same context as the calling notebook, any variables you define before running it are directly accessible in the target notebook.

Example

Parent Notebook (Passing Parameters):

param1 = "Hello"
param2 = "World"

%run ./TargetNotebook $param1=$param1 $param2=$param2

Target Notebook (Receiving Parameters):

print(f"Param1: {param1}")
print(f"Param2: {param2}")

When you run the parent notebook, it executes the target notebook in the same context, so param1 and param2 are available there and the target notebook prints their values.

Advantages

  • Simple Syntax: The %run command provides a straightforward way to pass parameters.
  • Direct Variable Access: Parameters are directly accessible as variables in the target notebook.

Disadvantages

  • Implicit Passing Only: There is no explicit parameter list; the target notebook simply uses whatever variables are defined in the parent notebook.
  • No User Interface: Does not provide a user interface for parameter input.
  • Potential for Namespace Conflicts: Be mindful of potential namespace conflicts if the target notebook uses the same variable names.
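
If you want explicit parameter passing without these caveats, dbutils.notebook.run() is a closely related alternative: it runs the target notebook as a separate, isolated job, accepts parameters as a dictionary, and the target notebook reads them through widgets. A minimal sketch, reusing the hypothetical ./TargetNotebook path from above:

# Run the target notebook with a 60-second timeout, passing parameters as a dict.
# Inside the target notebook, read them with dbutils.widgets.get("param1") etc.
result = dbutils.notebook.run("./TargetNotebook", 60, {"param1": "Hello", "param2": "World"})

# result is whatever string the target notebook returns via dbutils.notebook.exit(...)
print(result)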

Method 3: Using the Databricks Jobs API

For more advanced use cases, such as scheduling notebooks or triggering them from external applications, you can use the Databricks Jobs API. This API allows you to programmatically create, manage, and run Databricks jobs, including passing parameters to notebooks.

Creating a Job

You can create a job using the Databricks Jobs API by sending a POST request to the /api/2.1/jobs/create endpoint. The request body should be a JSON object that defines the job configuration, including the notebook path and parameters.
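
As an illustration, here is a rough sketch of that request made from Python with the requests library. The workspace URL, token, cluster ID, and notebook path are placeholders you would replace with your own values; the notebook parameters go under notebook_task.base_parameters and are read inside the notebook with dbutils.widgets.get():

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<your-personal-access-token>"  # placeholder

job_config = {
    "name": "parameterized-notebook-job",
    "tasks": [
        {
            "task_key": "run_notebook",
            "notebook_task": {
                "notebook_path": "/Users/someone@example.com/MyNotebook",
                # base_parameters are matched to widgets of the same name
                "base_parameters": {"param1": "value1", "param2": "123"},
            },
            "existing_cluster_id": "<your-cluster-id>",  # placeholder
        }
    ],
}

response = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_config,
)
response.raise_for_status()
print(response.json())  # contains the new job_id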

Here’s an example of how to create a job using the Databricks CLI:

databricks jobs create --json '{