Add-DatabricksNotebookJob
NAME Add-DatabricksNotebookJob
SYNOPSIS
Creates a Notebook Job in Databricks. The script uses the Databricks API 2.0 create job call:
https://docs.azuredatabricks.net/api/la ... tml#create
SYNTAX
Add-DatabricksNotebookJob [[-BearerToken] <String>] [[-Region] <String>] [-JobName] <String> [[-ClusterId]
<String>] [[-SparkVersion] <String>] [[-NodeType] <String>] [[-DriverNodeType] <String>] [[-MinNumberOfWorkers]
<Int32>] [[-MaxNumberOfWorkers] <Int32>] [[-Timeout] <Int32>] [[-EmailAlertsOnFailure] <String>]
[[-EmailAlertsOnStart] <String>] [[-EmailAlertsOnSuccess] <String>] [-noAlertSkippedRuns] [[-MaxRetries] <Int32>]
[[-ScheduleCronExpression] <String>] [[-Timezone] <String>] [-NotebookPath] <String> [[-NotebookParametersJson]
<String>] [[-Libraries] <String[]>] [[-PythonVersion] <String>] [[-Spark_conf] <Hashtable>] [[-CustomTags]
<Hashtable>] [[-InitScripts] <String[]>] [[-SparkEnvVars] <Hashtable>] [-RunImmediate] [[-ClusterLogPath]
<String>] [[-InstancePoolId] <String>] [<CommonParameters>]
DESCRIPTION
Creates a Notebook Job in Databricks. The script uses the Databricks API 2.0 create job call:
https://docs.azuredatabricks.net/api/la ... tml#create
If a job with the same name already exists it is updated instead of a new job being created.
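A minimal sketch of a typical call, assuming $BearerToken and $Region hold your workspace token and region; the job name, notebook path and cluster sizes below are placeholders:
PS C:\> Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region `
          -JobName "MyNotebookJob" -NotebookPath "/Shared/MyNotebook" `
          -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" `
          -MinNumberOfWorkers 2 -MaxNumberOfWorkers 2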
PARAMETERS
-BearerToken <String>
Your Databricks Bearer token to authenticate to your workspace (see User Settings in the Databricks web UI)
Required? false
Position? 1
Default value
Accept pipeline input? false
Accept wildcard characters? false
-Region <String>
Azure Region - must match the URL of your Databricks workspace, example: northeurope
Required? false
Position? 2
Default value
Accept pipeline input? false
Accept wildcard characters? false
-JobName <String>
Name of the job that will appear in the Job list. If a job with this name exists
it will be updated.
Required? true
Position? 3
Default value
Accept pipeline input? false
Accept wildcard characters? false
-ClusterId <String>
The ClusterId of an existing cluster to use. Optional.
Required? false
Position? 4
Default value
Accept pipeline input? false
Accept wildcard characters? false
-SparkVersion <String>
Spark version for cluster that will run the job. Example: 5.3.x-scala2.11
Note: Ignored if ClusterId is populated.
Required? false
Position? 5
Default value
Accept pipeline input? false
Accept wildcard characters? false
-NodeType <String>
Type of worker for cluster that will run the job. Example: Standard_D3_v2.
Note: Ignored if ClusterId is populated.
Required? false
Position? 6
Default value
Accept pipeline input? false
Accept wildcard characters? false
-DriverNodeType <String>
Type of driver for cluster that will run the job. Example: Standard_D3_v2.
If not provided the NodeType will be used.
Note: Ignored if ClusterId is populated.
Required? false
Position? 7
Default value
Accept pipeline input? false
Accept wildcard characters? false
-MinNumberOfWorkers <Int32>
Minimum number of workers for the cluster that will run the job.
Note: If Min & Max Workers are the same autoscale is disabled.
Note: Ignored if ClusterId is populated.
Required? false
Position? 8
Default value 0
Accept pipeline input? false
Accept wildcard characters? false
-MaxNumberOfWorkers <Int32>
Maximum number of workers for the cluster that will run the job.
Note: If Min & Max Workers are the same autoscale is disabled.
Note: Ignored if ClusterId is populated.
Required? false
Position? 9
Default value 0
Accept pipeline input? false
Accept wildcard characters? false
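A sketch of the autoscaling behaviour described above (all values are illustrative): because the minimum and maximum differ, the job cluster autoscales between 2 and 8 workers.
PS C:\> Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region `
          -JobName "AutoscaleJob" -NotebookPath "/Shared/Test" `
          -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" `
          -MinNumberOfWorkers 2 -MaxNumberOfWorkers 8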
-Timeout <Int32>
Timeout, in seconds, applied to each run of the job. If not set, there will be no timeout.
Required? false
Position? 10
Default value 0
Accept pipeline input? false
Accept wildcard characters? false
-EmailAlertsOnFailure <String>
A comma-separated string of email addresses that will receive an email if the job fails
Example "andrea.lewis@microsoft.com,maria.wood@microsoft.com"
Required? false
Position? 11
Default value
Accept pipeline input? false
Accept wildcard characters? false
-EmailAlertsOnStart <String>
A comma-separated string of email addresses that will receive an email when the job starts
Example "bob.orear@microsoft.com,bob.greenberg@microsoft.com"
Required? false
Position? 12
Default value
Accept pipeline input? false
Accept wildcard characters? false
-EmailAlertsOnSuccess <String>
A comma-separated string of email addresses that will receive an email if the job succeeds
Example "marc.mcdonald@microsoft.com,gordon.letwin@microsoft.com"
Required? false
Position? 13
Default value
Accept pipeline input? false
Accept wildcard characters? false
-noAlertSkippedRuns [<SwitchParameter>]
Switch.
If set, do not send an email to the recipients specified in on_failure when the run is skipped.
Required? false
Position? named
Default value False
Accept pipeline input? false
Accept wildcard characters? false
-MaxRetries <Int32>
An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it
completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry
indefinitely and the value 0 means to never retry. If not set, the default behavior will be never retry.
Required? false
Position? 14
Default value 0
Accept pipeline input? false
Accept wildcard characters? false
-ScheduleCronExpression <String>
By default, the job runs only when triggered via the Jobs UI or an API run request. You can provide a cron
schedule expression to run the job periodically. See here for how to compose a cron schedule expression:
http://www.quartz-scheduler.org/documen ... on-06.html
Required? false
Position? 15
Default value
Accept pipeline input? false
Accept wildcard characters? false
-Timezone <String>
Timezone for the Cron Schedule Expression. Required if ScheduleCronExpression is provided. See here for all possible
timezones: http://joda-time.sourceforge.net/timezones.html
Example: UTC
Required? false
Position? 16
Default value
Accept pipeline input? false
Accept wildcard characters? false
-NotebookPath <String>
Path to the Notebook in Databricks that will be executed by this Job.
Required? true
Position? 17
Default value
Accept pipeline input? false
Accept wildcard characters? false
-NotebookParametersJson <String>
Optional JSON string of key/value parameters passed to the notebook run.
Example: '{"key": "value", "name": "test2"}'
Required? false
Position? 18
Default value
Accept pipeline input? false
Accept wildcard characters? false
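If you prefer to build the parameter JSON in PowerShell rather than hand-writing the string, a hashtable piped through ConvertTo-Json produces an equivalent value; a sketch (the parameter names are placeholders):
PS C:\> $NotebookParams = @{ key = "value"; name = "test2" } | ConvertTo-Json -Compress
PS C:\> # $NotebookParams now holds a JSON string such as {"key":"value","name":"test2"}
PS C:\> # (key order may vary) and can be passed to -NotebookParametersJson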
-Libraries <String[]>
Optional. Array of JSON strings. Example: '{"pypi":{package:"simplejson"}}', '{"jar":
"DBFS:/mylibraries/test.jar"}'
Required? false
Position? 19
Default value
Accept pipeline input? false
Accept wildcard characters? false
-PythonVersion <String>
2 or 3 - defaults to 3.
Required? false
Position? 20
Default value 3
Accept pipeline input? false
Accept wildcard characters? false
-Spark_conf <Hashtable>
Hashtable.
Example @{"spark.speculation"=$true; "spark.streaming.ui.retainedBatches"= 5}
Required? false
Position? 21
Default value
Accept pipeline input? false
Accept wildcard characters? false
-CustomTags <Hashtable>
Custom Tags to set, provide hash table of tags. Example: @{CreatedBy="SimonDM";NumOfNodes=2;CanDelete=$true}
Required? false
Position? 22
Default value
Accept pipeline input? false
Accept wildcard characters? false
-InitScripts <String[]>
Init scripts to run post creation. Example: "dbfs:/script/script1", "dbfs:/script/script2"
Required? false
Position? 23
Default value
Accept pipeline input? false
Accept wildcard characters? false
-SparkEnvVars <Hashtable>
An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs
of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers.
Example: @{SPARK_WORKER_MEMORY="29000m";SPARK_LOCAL_DIRS="/local_disk0"}
Required? false
Position? 24
Default value
Accept pipeline input? false
Accept wildcard characters? false
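A sketch showing how the Spark_conf, CustomTags and SparkEnvVars hashtables combine in a single call (all values are illustrative):
PS C:\> Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region `
          -JobName "TaggedJob" -NotebookPath "/Shared/Test" `
          -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" `
          -MinNumberOfWorkers 1 -MaxNumberOfWorkers 1 `
          -Spark_conf @{"spark.speculation"=$true} `
          -CustomTags @{CreatedBy="SimonDM"} `
          -SparkEnvVars @{SPARK_LOCAL_DIRS="/local_disk0"}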
-RunImmediate [<SwitchParameter>]
Switch.
Performs a Run Now instead of creating a job. The run is submitted immediately and executes asynchronously;
the cmdlet returns a RunId.
Required? false
Position? named
Default value False
Accept pipeline input? false
Accept wildcard characters? false
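A sketch of capturing the RunId returned by -RunImmediate, here against an existing cluster (the ClusterId shown is a placeholder):
PS C:\> $RunId = Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region `
          -JobName "AdHocRun" -ClusterId "0914-123456-abcde12" `
          -NotebookPath "/Shared/Test" -RunImmediate
PS C:\> $RunId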
-ClusterLogPath <String>
DBFS Location for Cluster logs - must start with dbfs:/
Example dbfs:/logs/mycluster
Required? false
Position? 25
Default value
Accept pipeline input? false
Accept wildcard characters? false
-InstancePoolId <String>
Optional. The Id of an existing Databricks Instance Pool to use for the job cluster.
Required? false
Position? 26
Default value
Accept pipeline input? false
Accept wildcard characters? false
<CommonParameters>
This cmdlet supports the common parameters: Verbose, Debug,
ErrorAction, ErrorVariable, WarningAction, WarningVariable,
OutBuffer, PipelineVariable, and OutVariable. For more information, see
about_CommonParameters (https://go.microsoft.com/fwlink/?LinkID=113216).
INPUTS
OUTPUTS
NOTES
Author: Tadeusz Balcer
Extended: Simon D'Morias / Data Thirst Ltd
-------------------------- EXAMPLE 1 --------------------------
PS C:\>Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region -JobName "Job1" -SparkVersion
"5.3.x-scala2.11" -NodeType "Standard_D3_v2" -MinNumberOfWorkers 2 -MaxNumberOfWorkers 2 -Timeout 100 -MaxRetries
3 -ScheduleCronExpression "0 15 22 ? * *" -Timezone "UTC" -NotebookPath "/Shared/Test" -NotebookParametersJson
'{"key": "value", "name": "test2"}' -Libraries '{"pypi":{package:"simplejson"}}', '{"jar":
"DBFS:/mylibraries/test.jar"}'
The above example creates a job on a new cluster.
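A further sketch in the same format, attaching the job to an existing cluster (the ClusterId is a placeholder):
-------------------------- EXAMPLE 2 --------------------------
PS C:\>Add-DatabricksNotebookJob -BearerToken $BearerToken -Region $Region -JobName "Job2" -ClusterId
"0914-123456-abcde12" -NotebookPath "/Shared/Test" -NotebookParametersJson '{"key": "value"}'
The above example creates (or updates) a job that runs on an existing cluster; SparkVersion, NodeType and the
worker-count parameters are ignored when ClusterId is supplied.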
RELATED LINKS