Package 'cloudml'

Title: Interface to the Google Cloud Machine Learning Platform
Description: Interface to the Google Cloud Machine Learning Platform <https://cloud.google.com/ml-engine>, which provides cloud tools for training machine learning models.
Authors: Daniel Falbel [aut, cre], Javier Luraschi [aut], JJ Allaire [aut], Kevin Ushey [aut], RStudio [cph]
Maintainer: Daniel Falbel <[email protected]>
License: Apache License 2.0
Version: 0.6.1
Built: 2024-11-10 05:46:20 UTC
Source: https://github.com/cran/cloudml

Help Index


Deploy SavedModel to CloudML

Description

Deploys a SavedModel to CloudML for online prediction.

Usage

cloudml_deploy(export_dir_base, name, version = paste0(name, "_1"),
  region = NULL, config = NULL)

Arguments

export_dir_base

A string giving the path to a directory containing an exported SavedModel. Consider using tensorflow::export_savedmodel() to export this SavedModel.

name

The name for this model (required)

version

The version for this model. Versions must start with a letter and contain only letters, numbers, and underscores. Defaults to name_1.

region

The region to be used to deploy this model.

config

A list, or a YAML or JSON configuration file, as described at https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs.

See Also

cloudml_predict()

Other CloudML functions: cloudml_predict, cloudml_train
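
Examples

A minimal sketch (not run); model stands for a trained Keras/TensorFlow model, and the model name is a placeholder:

## Not run: 
library(cloudml)

# export a SavedModel from a trained model, then deploy it for online prediction
tensorflow::export_savedmodel(model, "savedmodel")
cloudml_deploy("savedmodel", name = "my_model")

## End(Not run)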


Perform Prediction over a CloudML Model.

Description

Perform online prediction with a CloudML model, usually one created with cloudml_deploy().

Usage

cloudml_predict(instances, name, version = paste0(name, "_1"),
  verbose = FALSE)

Arguments

instances

A list of instances to be predicted. When predicting a single instance, it must still be wrapped in a list.

name

The name for this model (required)

version

The version for this model. Versions must start with a letter and contain only letters, numbers, and underscores. Defaults to name_1.

verbose

Should additional information be reported?

See Also

cloudml_deploy()

Other CloudML functions: cloudml_deploy, cloudml_train
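
Examples

A minimal sketch (not run); the model name and feature values are placeholders. Note that instances must be wrapped in a list even when predicting a single instance:

## Not run: 
library(cloudml)

# each instance is itself a list of feature values
instances <- list(
  list(4.9, 3.0, 1.4, 0.2),
  list(6.3, 3.3, 6.0, 2.5)
)
predictions <- cloudml_predict(instances, name = "my_model")

## End(Not run)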


Train a model using Cloud ML

Description

Upload a TensorFlow application to Google Cloud, and use that application to train a model.

Usage

cloudml_train(file = "train.R", master_type = NULL, flags = NULL,
  region = NULL, config = NULL, collect = "ask", dry_run = FALSE)

Arguments

file

File to be used as the entry point for training.

master_type

Training master node machine type. "standard" provides a basic machine configuration suitable for training simple models with small to moderate datasets. See the documentation at https://cloud.google.com/ml-engine/docs/tensorflow/machine-types#machine_type_table for details on available machine types.

flags

Named list with flag values (see flags()) or path to YAML file containing flag values.

region

The region to be used for training.

config

A list, or a YAML or JSON configuration file, as described at https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs.

collect

Logical. If TRUE, collect the job outputs when training is completed (blocks waiting for the job to complete). The default ("ask") interactively prompts the user whether or not to collect the results.

dry_run

Triggers a local dry run of the deployment phase to validate that packages and packing work as expected.

See Also

job_status(), job_collect(), job_cancel()

Other CloudML functions: cloudml_deploy, cloudml_predict

Examples

## Not run: 
library(cloudml)

gcloud_install()
job <- cloudml_train("train.R")

## End(Not run)
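
A fuller sketch (not run), assuming train.R reads its flags via tfruns; the flag names and machine type shown are illustrative:

## Not run: 
job <- cloudml_train(
  "train.R",
  master_type = "standard_gpu",
  flags = list(learning_rate = 0.01, epochs = 10),
  collect = FALSE
)
job_status(job)

## End(Not run)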

Initialize the Google Cloud SDK

Description

Initialize the Google Cloud SDK

Usage

gcloud_init()

See Also

Other Google Cloud SDK functions: gcloud_install, gcloud_terminal
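
Examples

A minimal sketch (not run) of a typical first-time setup:

## Not run: 
library(cloudml)
gcloud_install()
gcloud_init()

## End(Not run)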


Install the Google Cloud SDK

Description

Installs the Google Cloud SDK which enables CloudML operations.

Usage

gcloud_install(update = TRUE)

Arguments

update

Attempt to update an existing installation.

See Also

Other Google Cloud SDK functions: gcloud_init, gcloud_terminal

Examples

## Not run: 
library(cloudml)
gcloud_install()

## End(Not run)

Create an RStudio terminal with access to the Google Cloud SDK

Description

Create an RStudio terminal with access to the Google Cloud SDK

Usage

gcloud_terminal(command = NULL, clear = FALSE)

Arguments

command

Command to send to terminal

clear

Clear terminal buffer

Value

Terminal id (invisibly)

See Also

Other Google Cloud SDK functions: gcloud_init, gcloud_install
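
Examples

A minimal sketch (not run); the command shown is an ordinary gcloud invocation:

## Not run: 
# open a terminal and print the active gcloud configuration
id <- gcloud_terminal("gcloud config list")

## End(Not run)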


Gcloud version

Description

Get version of Google Cloud SDK components.

Usage

gcloud_version()

Value

A list with the version of each component.
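
Examples

A minimal sketch (not run); the components present in the returned list depend on the installed SDK:

## Not run: 
str(gcloud_version())

## End(Not run)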


Copy files to / from Google Storage

Description

Use the gsutil cp command to copy data between your local file system and the cloud, copy data within the cloud, and copy data between cloud storage providers.

Usage

gs_copy(source, destination, recursive = FALSE, echo = TRUE)

Arguments

source

The file to be copied. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[FILENAME.CSV]).

destination

The location where the source file should be copied to. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[FILENAME.CSV]).

recursive

Boolean; perform a recursive copy? This must be specified if you intend to copy directories.

echo

Echo command output to console.
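
Examples

A minimal sketch (not run); the bucket and file names are placeholders:

## Not run: 
# download a file from a bucket to the working directory
gs_copy("gs://my-bucket/data.csv", "data.csv")

# upload a directory of training data (directories require recursive = TRUE)
gs_copy("data", "gs://my-bucket/data", recursive = TRUE)

## End(Not run)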


Google Storage bucket path that syncs to local storage when not running on CloudML.

Description

Refer to data within a Google Storage bucket. When running on CloudML, the bucket is read from directly. Otherwise, the bucket is automatically synchronized to a local directory.

Usage

gs_data_dir(url, local_dir = "gs", force_sync = FALSE, echo = TRUE)

Arguments

url

Google Storage bucket URL (e.g. gs://<your-bucket>).

local_dir

Local directory to synchronize Google Storage bucket(s) to.

force_sync

Force local synchronization even if the data directory already exists.

echo

Echo command output to console.

Details

This function is suitable for use in TensorFlow APIs that accept gs:// URLs (e.g. TensorFlow datasets). However, many package functions accept only local filesystem paths as input (rather than gs:// URLs). For these cases you can use the gs_data_dir_local() function, which will always synchronize gs:// buckets to the local filesystem and provide a local path interface to their contents.

Value

Path to contents of data directory.

See Also

gs_data_dir_local()
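
Examples

A minimal sketch (not run); the bucket URL is a placeholder:

## Not run: 
# resolves to the gs:// URL when running on CloudML,
# or to a synchronized local copy otherwise
data_dir <- gs_data_dir("gs://my-bucket/data")

## End(Not run)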


Get a local path to the contents of a Google Storage bucket

Description

Provides a local filesystem interface to Google Storage buckets. Many package functions accept only local filesystem paths as input (rather than gs:// URLs). For these cases the gs_data_dir_local() function will synchronize gs:// buckets to the local filesystem and provide a local path interface to their contents.

Usage

gs_data_dir_local(url, local_dir = "gs", echo = FALSE)

Arguments

url

Google Storage bucket URL (e.g. gs://<your-bucket>).

local_dir

Local directory to synchronize Google Storage bucket(s) to.

echo

Echo command output to console.

Details

If you pass a local path as the url, it will be returned unmodified. This allows you, for example, to use a training flag for the location of data that points to a local directory during development and to a Google Cloud bucket during cloud training.

Value

Local path to contents of bucket.

Note

For APIs that accept gs:// URLs directly (e.g. TensorFlow datasets) you should use the gs_data_dir() function.

See Also

gs_data_dir()
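
Examples

A minimal sketch (not run); the bucket URL and file name are placeholders:

## Not run: 
# always yields a local path, synchronizing the bucket if necessary
local_path <- gs_data_dir_local("gs://my-bucket/data")
train <- read.csv(file.path(local_path, "train.csv"))

## End(Not run)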


Synchronize content of two buckets/directories

Description

The gs_rsync() function makes the contents under destination the same as the contents under source, by copying any missing files/objects (or those whose data has changed) and, if the delete option is specified, deleting any extra files/objects. source must specify a directory, bucket, or bucket subdirectory.

Usage

gs_rsync(source, destination, delete = FALSE, recursive = FALSE,
  parallel = TRUE, dry_run = FALSE, options = NULL, echo = TRUE)

Arguments

source

The file to be copied. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[FILENAME.CSV]).

destination

The location where the source file should be copied to. This can be either a path on the local filesystem, or a Google Storage URI (e.g. gs://[BUCKET_NAME]/[FILENAME.CSV]).

delete

Delete extra files under destination not found under source. By default, extra files are not deleted.

recursive

Causes directories, buckets, and bucket subdirectories to be synchronized recursively. If you neglect to use this option, gs_rsync() will make only the top-level directory in the source and destination URLs match, skipping any subdirectories.

parallel

Causes synchronization to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection.

dry_run

Causes rsync to run in "dry run" mode, i.e., just outputting what would be copied or deleted without actually doing any copying/deleting.

options

Character vector of additional command line options to the gsutil rsync command (as specified at https://cloud.google.com/storage/docs/gsutil/commands/rsync).

echo

Echo command output to console.
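
Examples

A minimal sketch (not run); the bucket name is a placeholder:

## Not run: 
# mirror a local directory to a bucket, removing files no longer present locally
gs_rsync("runs", "gs://my-bucket/runs", delete = TRUE, recursive = TRUE)

# preview what would be copied or deleted without making changes
gs_rsync("runs", "gs://my-bucket/runs", recursive = TRUE, dry_run = TRUE)

## End(Not run)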


Cancel a job

Description

Cancel a job.

Usage

job_cancel(job = "latest")

Arguments

job

Job name or job object. Pass "latest" to indicate the most recently submitted job.

See Also

Other job management functions: job_collect, job_list, job_status, job_stream_logs, job_trials
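
Examples

A minimal sketch (not run); the explicit job name is a placeholder:

## Not run: 
# cancel the most recently submitted job
job_cancel()

# cancel a specific job by name
job_cancel("cloudml_2020_01_01_120000")

## End(Not run)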


Collect job output

Description

Collect the outputs (e.g. the fitted model) from a job. If the job has not yet finished running, job_collect() will block and wait until the job has finished.

Usage

job_collect(job = "latest", trials = "best", destination = "runs",
  timeout = NULL, view = interactive())

Arguments

job

Job name or job object. Pass "latest" to indicate the most recently submitted job.

trials

Under hyperparameter tuning, specifies which trials to download. Use "best" to download the best trial, "all" to download all trials, or a vector of trial numbers (e.g. c(1, 2) or 1).

destination

The destination directory in which model outputs should be downloaded. Defaults to runs.

timeout

Give up collecting the job after the specified number of minutes.

view

View the job results after collecting them. You can also pass "save" to save a copy of the run report at tfruns.d/view.html.

See Also

Other job management functions: job_cancel, job_list, job_status, job_stream_logs, job_trials
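
Examples

A minimal sketch (not run):

## Not run: 
# block until the most recent job finishes, then download
# the best trial's outputs into the runs directory
job_collect("latest", trials = "best", destination = "runs")

## End(Not run)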


List all jobs

Description

List existing Google Cloud ML jobs.

Usage

job_list(filter = NULL, limit = NULL, page_size = NULL,
  sort_by = NULL, uri = FALSE)

Arguments

filter

Filter the set of jobs to be returned.

limit

The maximum number of resources to list. By default, all jobs will be listed.

page_size

Some services group resource list output into pages. This flag specifies the maximum number of resources per page. The default is determined by the service if it supports paging, otherwise it is unlimited (no paging).

sort_by

A comma-separated list of resource field key names to sort by. The default order is ascending. Prefix a field with ~ for descending order on that field.

uri

Print a list of resource URIs instead of the default output.

See Also

Other job management functions: job_cancel, job_collect, job_status, job_stream_logs, job_trials
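
Examples

A minimal sketch (not run):

## Not run: 
# list at most ten jobs
job_list(limit = 10)

## End(Not run)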


Current status of a job

Description

Get the status of a job, as an R list.

Usage

job_status(job = "latest")

Arguments

job

Job name or job object. Pass "latest" to indicate the most recently submitted job.

See Also

Other job management functions: job_cancel, job_collect, job_list, job_stream_logs, job_trials
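
Examples

A minimal sketch (not run); the returned list mirrors the Cloud ML Engine jobs resource, so the state field accessed here is an assumption about that resource:

## Not run: 
status <- job_status("latest")
status$state

## End(Not run)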


Show job log stream

Description

Show logs from a running Cloud ML Engine job.

Usage

job_stream_logs(job = "latest",
  polling_interval = getOption("cloudml.stream_logs.polling", 5),
  task_name = NULL, allow_multiline_logs = FALSE)

Arguments

job

Job name or job object. Pass "latest" to indicate the most recently submitted job.

polling_interval

Number of seconds to wait between efforts to fetch the latest log messages.

task_name

If set, display only the logs for this particular task.

allow_multiline_logs

Output multiline log messages as single records.

See Also

Other job management functions: job_cancel, job_collect, job_list, job_status, job_trials
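
Examples

A minimal sketch (not run):

## Not run: 
# stream logs from the most recent job, polling every 10 seconds
job_stream_logs("latest", polling_interval = 10)

## End(Not run)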


Current trials of a job

Description

Get the hyperparameter trials for a job, as an R data frame.

Usage

job_trials(x)

Arguments

x

Job name or job object.

See Also

Other job management functions: job_cancel, job_collect, job_list, job_status, job_stream_logs
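
Examples

A minimal sketch (not run); tuning.yml stands for a hypothetical hyperparameter tuning configuration file:

## Not run: 
job <- cloudml_train("train.R", config = "tuning.yml")
trials <- job_trials(job)
head(trials)

## End(Not run)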