
Add-DatabricksPythonJob

Sat Jan 11, 2020 9:50 am

NAME Add-DatabricksPythonJob



SYNOPSIS

Creates a Python job in Databricks. The cmdlet uses the Databricks API 2.0 create job call:

https://docs.azuredatabricks.net/api/la ... tml#create





SYNTAX

Add-DatabricksPythonJob [[-BearerToken] <String>] [[-Region] <String>] [-JobName] <String> [[-ClusterId] <String>]

[[-SparkVersion] <String>] [[-NodeType] <String>] [[-DriverNodeType] <String>] [[-MinNumberOfWorkers] <Int32>]

[[-MaxNumberOfWorkers] <Int32>] [[-Timeout] <Int32>] [[-MaxRetries] <Int32>] [[-ScheduleCronExpression] <String>]

[[-Timezone] <String>] [-PythonPath] <String> [[-PythonParameters] <String[]>] [[-Libraries] <String[]>]

[[-PythonVersion] <String>] [[-Spark_conf] <Hashtable>] [[-CustomTags] <Hashtable>] [[-InitScripts] <String[]>]

[[-SparkEnvVars] <Hashtable>] [-RunImmediate] [[-ClusterLogPath] <String>] [[-InstancePoolId] <String>]

[<CommonParameters>]





DESCRIPTION

Creates a Python job in Databricks. The cmdlet uses the Databricks API 2.0 create job call:

https://docs.azuredatabricks.net/api/la ... tml#create

If a job with this name already exists it will be updated rather than a new job being created.
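
A minimal sketch of the create-or-update behaviour, assuming $BearerToken and $Region are already populated; the job name, cluster id and script path are placeholders:

    # First call creates the job; a second call with the same -JobName updates the existing job
    Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region `
        -JobName "NightlyLoad" `
        -ClusterId "1234-567890-abc123" `
        -PythonPath "dbfs:/jobs/nightly_load.py"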





PARAMETERS

-BearerToken <String>

Your Databricks Bearer token to authenticate to your workspace (see User Settings in the Databricks WebUI)



Required? false

Position? 1

Default value

Accept pipeline input? false

Accept wildcard characters? false



-Region <String>

Azure Region - must match the URL of your Databricks workspace, example: northeurope



Required? false

Position? 2

Default value

Accept pipeline input? false

Accept wildcard characters? false



-JobName <String>

Name of the job that will appear in the Job list. If a job with this name exists

it will be updated.



Required? true

Position? 3

Default value

Accept pipeline input? false

Accept wildcard characters? false



-ClusterId <String>

The ClusterId of an existing cluster to use. Optional.



Required? false

Position? 4

Default value

Accept pipeline input? false

Accept wildcard characters? false



-SparkVersion <String>

Spark version for cluster that will run the job. Example: 5.3.x-scala2.11

Note: Ignored if ClusterId is populated.



Required? false

Position? 5

Default value

Accept pipeline input? false

Accept wildcard characters? false



-NodeType <String>

Type of worker for cluster that will run the job. Example: Standard_D3_v2.

Note: Ignored if ClusterId is populated.



Required? false

Position? 6

Default value

Accept pipeline input? false

Accept wildcard characters? false



-DriverNodeType <String>

Type of driver for cluster that will run the job. Example: Standard_D3_v2.

If not provided the NodeType will be used.

Note: Ignored if ClusterId is populated.



Required? false

Position? 7

Default value

Accept pipeline input? false

Accept wildcard characters? false



-MinNumberOfWorkers <Int32>

Minimum number of workers for the cluster that will run the job.

Note: If Min & Max Workers are the same autoscale is disabled.

Note: Ignored if ClusterId is populated.



Required? false

Position? 8

Default value 0

Accept pipeline input? false

Accept wildcard characters? false



-MaxNumberOfWorkers <Int32>

Maximum number of workers for the cluster that will run the job.

Note: If Min & Max Workers are the same autoscale is disabled.

Note: Ignored if ClusterId is populated.



Required? false

Position? 9

Default value 0

Accept pipeline input? false

Accept wildcard characters? false



-Timeout <Int32>

Timeout, in seconds, applied to each run of the job. If not set, there will be no timeout.



Required? false

Position? 10

Default value 0

Accept pipeline input? false

Accept wildcard characters? false



-MaxRetries <Int32>

An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it

completes with a FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry

indefinitely and the value 0 means to never retry. If not set, the default behavior will be never retry.



Required? false

Position? 11

Default value 0

Accept pipeline input? false

Accept wildcard characters? false



-ScheduleCronExpression <String>

By default, the job only runs when triggered via the Jobs UI or an API run request. You can provide a cron

schedule expression to run the job periodically. How to compose a cron schedule expression:

http://www.quartz-scheduler.org/documen ... on-06.html



Required? false

Position? 12

Default value

Accept pipeline input? false

Accept wildcard characters? false



-Timezone <String>

Timezone for Cron Schedule Expression. Required if ScheduleCronExpression provided. See here for all possible

timezones: http://joda-time.sourceforge.net/timezones.html

Example: UTC



Required? false

Position? 13

Default value

Accept pipeline input? false

Accept wildcard characters? false
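
A sketch of a scheduled job, reusing the cron expression from EXAMPLE 1 below; the job name, script path and cluster sizing are placeholders:

    # Runs dbfs:/jobs/report.py every day at 22:15 in the UTC timezone
    Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region `
        -JobName "NightlyReport" `
        -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" `
        -MinNumberOfWorkers 1 -MaxNumberOfWorkers 1 `
        -PythonPath "dbfs:/jobs/report.py" `
        -ScheduleCronExpression "0 15 22 ? * *" -Timezone "UTC"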



-PythonPath <String>

Path to the py script in Databricks that will be executed by this Job. Must be a DBFS location from root,

example "dbfs:/folder/file.py".



Required? true

Position? 14

Default value

Accept pipeline input? false

Accept wildcard characters? false



-PythonParameters <String[]>

Optional parameters that will be provided to script when Job is executed. Example: "val1", "val2"



Required? false

Position? 15

Default value

Accept pipeline input? false

Accept wildcard characters? false



-Libraries <String[]>

Optional. Array of JSON strings. Example: '{"pypi":{"package":"simplejson"}}',

'{"jar": "dbfs:/mylibraries/test.jar"}'



Required? false

Position? 16

Default value

Accept pipeline input? false

Accept wildcard characters? false
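
A short sketch of building the array of JSON strings before the call; the package name, jar path and other values are placeholders:

    # Each element is a plain JSON string describing one library
    $libs = @(
        '{"pypi": {"package": "simplejson"}}',    # install from PyPI
        '{"jar": "dbfs:/mylibraries/test.jar"}'   # attach a jar already uploaded to DBFS
    )
    Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region `
        -JobName "JobWithLibraries" -ClusterId "1234-567890-abc123" `
        -PythonPath "dbfs:/jobs/etl.py" -Libraries $libs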



-PythonVersion <String>

2 or 3 - defaults to 3.



Required? false

Position? 17

Default value 3

Accept pipeline input? false

Accept wildcard characters? false



-Spark_conf <Hashtable>

Hashtable.

Example @{"spark.speculation"=$true; "spark.streaming.ui.retainedBatches"= 5}



Required? false

Position? 18

Default value

Accept pipeline input? false

Accept wildcard characters? false



-CustomTags <Hashtable>

Custom Tags to set, provide hash table of tags. Example: @{CreatedBy="SimonDM";NumOfNodes=2;CanDelete=$true}



Required? false

Position? 19

Default value

Accept pipeline input? false

Accept wildcard characters? false



-InitScripts <String[]>

Init scripts to run post creation. Example: "dbfs:/script/script1", "dbfs:/script/script2"



Required? false

Position? 20

Default value

Accept pipeline input? false

Accept wildcard characters? false



-SparkEnvVars <Hashtable>

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs

of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers.

Example: @{SPARK_WORKER_MEMORY="29000m";SPARK_LOCAL_DIRS="/local_disk0"}



Required? false

Position? 21

Default value

Accept pipeline input? false

Accept wildcard characters? false
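
A sketch combining the three hashtable-style parameters (Spark_conf, CustomTags, SparkEnvVars); the keys and values are taken from the examples above or are placeholders:

    # Hashtables are passed straight through to the job cluster definition
    $conf    = @{ "spark.speculation" = $true; "spark.streaming.ui.retainedBatches" = 5 }
    $tags    = @{ CreatedBy = "SimonDM"; CanDelete = $true }
    $envVars = @{ SPARK_WORKER_MEMORY = "29000m"; SPARK_LOCAL_DIRS = "/local_disk0" }
    Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region `
        -JobName "JobWithClusterSettings" `
        -SparkVersion "5.3.x-scala2.11" -NodeType "Standard_D3_v2" `
        -MinNumberOfWorkers 2 -MaxNumberOfWorkers 4 `
        -PythonPath "dbfs:/jobs/etl.py" `
        -Spark_conf $conf -CustomTags $tags -SparkEnvVars $envVars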



-RunImmediate [<SwitchParameter>]

Switch.

Performs a Run Now task instead of creating a job. The run is triggered immediately and executes asynchronously.

Setting this option returns a RunId.



Required? false

Position? named

Default value False

Accept pipeline input? false

Accept wildcard characters? false
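
A sketch of capturing the RunId from an immediate run; the job name, cluster id and script path are placeholders:

    # -RunImmediate triggers an asynchronous run and returns a RunId instead of saving a job definition
    $runId = Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region `
        -JobName "AdHocRun" -ClusterId "1234-567890-abc123" `
        -PythonPath "dbfs:/jobs/adhoc.py" -RunImmediate
    Write-Output "Triggered run: $runId"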



-ClusterLogPath <String>

DBFS Location for Cluster logs - must start with dbfs:/

Example dbfs:/logs/mycluster



Required? false

Position? 22

Default value

Accept pipeline input? false

Accept wildcard characters? false



-InstancePoolId <String>

Optional. The ID of an existing instance pool to create the job cluster in.

Required? false

Position? 23

Default value

Accept pipeline input? false

Accept wildcard characters? false



<CommonParameters>

This cmdlet supports the common parameters: Verbose, Debug,

ErrorAction, ErrorVariable, WarningAction, WarningVariable,

OutBuffer, PipelineVariable, and OutVariable. For more information, see

about_CommonParameters (https://go.microsoft.com/fwlink/?LinkID=113216).



INPUTS



OUTPUTS



NOTES





Author: Simon D'Morias / Data Thirst Ltd



-------------------------- EXAMPLE 1 --------------------------



PS C:\>Add-DatabricksPythonJob -BearerToken $BearerToken -Region $Region -JobName "Job1" -SparkVersion

"5.3.x-scala2.11" -NodeType "Standard_D3_v2" -MinNumberOfWorkers 2 -MaxNumberOfWorkers 2 -Timeout 100 -MaxRetries

3 -ScheduleCronExpression "0 15 22 ? * *" -Timezone "UTC" -PythonPath "/Shared/TestPython.py" -PythonParameters

"val1", "val2" -Libraries '{"pypi":{"package":"simplejson"}}', '{"jar": "dbfs:/mylibraries/test.jar"}'



The above example creates a job on a new cluster.











RELATED LINKS