Run Applications

Learn how to run the applications you have created in Data Flow, provide argument and parameter values, review the results, and diagnose and tune the runs, including providing JVM options.

Because it uses a delegation token, Data Flow automatically stops long-running batch jobs (those that run for more than 24 hours). If the application hasn't finished processing its data by then, the run can fail and the job is left unfinished. To prevent this, use one of the following options to limit the total time the application can run:
When Creating Runs using the Console
Under Advanced Options, specify the duration in Max run duration minutes.
When Creating Runs using the CLI
Pass the command-line option --max-duration-in-minutes <number>.
When Creating Runs using the SDK
Provide the optional argument max_duration_in_minutes (see the sketch after this list).
When Creating Runs using the API
Set the optional argument maxDurationInMinutes.
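
For illustration, here is a minimal sketch of setting this limit through the Python SDK when creating a run. The OCIDs, display name, and 12-hour limit are placeholders to replace with your own values; max_duration_in_minutes is the optional argument named above.

    import oci

    # Load the default OCI config (~/.oci/config).
    config = oci.config.from_file()
    client = oci.data_flow.DataFlowClient(config)

    # Placeholder OCIDs: replace with your own application and compartment.
    details = oci.data_flow.models.CreateRunDetails(
        application_id="ocid1.dataflowapplication.oc1..example",
        compartment_id="ocid1.compartment.oc1..example",
        display_name="nightly-batch-run",
        # Stop the run after 12 hours, well inside the 24-hour token limit.
        max_duration_in_minutes=720,
    )

    run = client.create_run(details).data
    print(run.id, run.lifecycle_state)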

Understand Runs

Every time a Data Flow Application is executed, a Data Flow Run is created. The Data Flow Run captures and securely stores the application's output, logs, and statistics. The output is saved so it can be viewed by anyone with the correct permissions using the UI or REST API. Runs also give you secure access to the Spark UI for debugging and diagnostics.
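
For example, the sketch below uses the Python SDK to read a finished run's state and list its captured log files. The run OCID is a placeholder, and get_run and list_run_logs are the SDK counterparts of the REST API's run operations.

    import oci

    config = oci.config.from_file()
    client = oci.data_flow.DataFlowClient(config)

    run_id = "ocid1.dataflowrun.oc1..example"  # placeholder run OCID

    # Fetch the run's lifecycle state and recorded statistics.
    run = client.get_run(run_id).data
    print(run.display_name, run.lifecycle_state)

    # List the log files Data Flow captured for this run; each entry
    # can be downloaded with client.get_run_log(run_id, log.name).
    for log in client.list_run_logs(run_id).data:
        print(log.name, log.size_in_bytes)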