New-AzureDatabricksJob


NAME

New-AzureDatabricksJob



SYNOPSIS

Dynamically create a job on an Azure Databricks cluster. Returns an object defining the job and the newly assigned job ID number.





SYNTAX

New-AzureDatabricksJob -Connection <Object> -JobName <String> -JobType <String> -NotebookPath <String>
[-JobParameters <Hashtable>] [-JobLibraries <Hashtable>] [-UseExistingCluster <String>] [-NodeType <String>]
[-NumWorkers <Int32>] [-SparkVersion <String>] [<CommonParameters>]





DESCRIPTION

You can use this function to create a new defined job on your Azure Databricks cluster. It currently supports only Notebook-based jobs. You can dynamically pass in libraries to use in the job, as well as predefined parameters. Other non-required options let you change the cluster node and driver types and the total number of worker nodes, or use an existing defined cluster.
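Because the function returns the job definition along with the new job ID, you can capture the result for later use. A minimal sketch; the property name holding the ID is an assumption based on the Databricks Jobs API, which calls it job_id:

$job = New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook"
$job.job_id   # assumption: the ID property follows the Databricks Jobs API naming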





PARAMETERS

-Connection <Object>

An object that represents an Azure Databricks API connection where you want to create your job.



Required? true

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-JobName <String>

The name of the new job.



Required? true

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-JobType <String>

The type of job to run. Currently only supports "Notebook" job types.



Required? true

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-NotebookPath <String>

The path on your Azure Databricks instance where your job's notebook resides.



Required? true

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-JobParameters <Hashtable>

The parameters to pass into your notebook. Should be a hashtable (see NOTES).



Required? false

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-JobLibraries <Hashtable>

The libraries to make available to the job. Should be a hashtable keyed by library type (see NOTES).


Required? false

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-UseExistingCluster <String>

If you want this job to use a predefined Azure Databricks cluster, specify the named cluster here.



Required? false

Position? named

Default value

Accept pipeline input? false

Accept wildcard characters? false



-NodeType <String>

For dynamic job clusters, the node type to use (defaults to: Standard_DS3_v2)



Required? false

Position? named

Default value Standard_DS3_v2

Accept pipeline input? false

Accept wildcard characters? false



-NumWorkers <Int32>

For dynamic job clusters, the maximum number of workers (defaults to: 4)



Required? false

Position? named

Default value 4

Accept pipeline input? false

Accept wildcard characters? false



-SparkVersion <String>

For dynamic job clusters, the Spark version to use (defaults to: 4.2.x-scala2.11)



Required? false

Position? named

Default value 4.2.x-scala2.11

Accept pipeline input? false

Accept wildcard characters? false



<CommonParameters>

This cmdlet supports the common parameters: Verbose, Debug, ErrorAction, ErrorVariable, WarningAction, WarningVariable, OutBuffer, PipelineVariable, and OutVariable. For more information, see about_CommonParameters (https://go.microsoft.com/fwlink/?LinkID=113216).



INPUTS

None. This function does not accept pipeline input.

OUTPUTS

An object defining the new job and its newly assigned job ID number.


NOTES





A sample of the hashtables needed for this function:



$JobLibraries = @{
    'pypi' = 'simplejson=3.8.0'
}

Each key in the hashtable should be either pypi or egg. If egg, specify the path to the egg file as the value.
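For the egg case, a sketch using a hypothetical DBFS path:

$JobLibraries = @{
    'egg' = 'dbfs:/mnt/libraries/mylibrary.egg'
}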



$Parameters = @{
    'Param1' = 'X'
    'Param2' = 2
}

Each entry in the hashtable should be a key/value pair: the name of the parameter in your notebook and the value you want to pass in.
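A sketch combining both hashtables in a single call, reusing the notebook path from the examples below:

New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook" -JobLibraries $JobLibraries -JobParameters $Parameters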



Author: Drew Furgiuele (@pittfurg), http://www.port1433.com

Website: https://www.igs.com

Copyright: (c) 2019 by IGS, licensed under MIT

License: MIT https://opensource.org/licenses/MIT



-------------------------- EXAMPLE 1 --------------------------



PS C:\> New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook" -UseExistingCluster "DrewsCluster"



Defines a new job called "New Job" that runs the notebook "SomeNotebook" on the existing cluster "DrewsCluster".









-------------------------- EXAMPLE 2 --------------------------



PS C:\> New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook" -UseExistingCluster "DrewsCluster" -JobParameters $Parameters



Defines a new job called "New Job" that runs the notebook "SomeNotebook" on the existing cluster "DrewsCluster" and passes the parameters in the hashtable $Parameters to the notebook when it runs.









-------------------------- EXAMPLE 3 --------------------------



PS C:\> New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook"



Defines a new job called "New Job" that runs the notebook "SomeNotebook" on a new cluster with the default node type, number of workers, and Spark version.
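As a further sketch, the dynamic-cluster options can also be set explicitly; the node type, worker count, and Spark version below are illustrative values, not defaults:

-------------------------- EXAMPLE 4 --------------------------

PS C:\> New-AzureDatabricksJob -Connection $Connection -JobName "New Job" -JobType Notebook -NotebookPath "/Users/Drew/SomeNotebook" -NodeType "Standard_DS4_v2" -NumWorkers 8 -SparkVersion "4.2.x-scala2.11"

Defines a new job called "New Job" that runs the notebook "SomeNotebook" on a new cluster of Standard_DS4_v2 nodes with up to 8 workers.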











RELATED LINKS