How to choose the right tool in IBM Watson Studio

Inge Halilovic
8 min read · Jul 19, 2019

Watson Studio provides a range of tools for preparing, analyzing, and modeling data, for users at every level of experience, from beginner to expert.

To pick the right tool, consider these factors.

The type of data you have:

  • Tabular data in delimited files or relational data in remote data sources
  • Image files
  • Textual data in documents

The type of tasks you need to do:

  • Prepare data: cleanse, shape, visualize, organize, and validate data.
  • Analyze data: identify patterns and relationships in data, and display insights.
  • Build models: build, train, test, and deploy models to classify data, make predictions, or optimize decisions.

How much automation you want:

  • Code editor tools: Use the Jupyter notebook editor or the RStudio IDE to write code to work with any type of data and do any type of task.
  • Graphical canvas tools: Use menus and drag-and-drop to visually program. Build dashboards to analyze data or build multi-step flows to prepare data, analyze data, or build models.
  • Automatic builder tools: Use these tools to build and train models with very little manual input.

Note: This information is also in the Watson Studio documentation, where we’ll keep it up to date as more tools are added to Watson Studio.

Tools for tabular or relational data

[Table: tools for tabular or relational data, by task]

Tools for textual data

[Table: tools for building a model that classifies textual data]

Tools for image data

[Table: tools for building a model that classifies images]

Jupyter notebook editor

Use the Jupyter notebook editor to create a notebook in which you run code to prepare, visualize, and analyze data, or build and train a model.

Data format: Any

Data size: Any

How you can prepare data, analyze data, or build models:

  • Write code in Python, R, or Scala
  • Include rich text and media with your code
  • Work with any kind of data in any way you want
  • Use preinstalled open source and IBM libraries and packages, or install others
  • Schedule runs of your code
  • Import a notebook from a file, a URL, or the Community
  • Share read-only copies of your notebook externally
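For example, a notebook cell that loads a delimited file and computes a summary statistic might look like this. The data and column names are invented for illustration; in a real notebook you would read a project data asset instead:

```python
import csv
import io
from statistics import mean

# Hypothetical tabular data; stands in for a CSV data asset in the project.
raw = io.StringIO("region,revenue\nEMEA,120\nAMER,95\nAPAC,140\n")

rows = list(csv.DictReader(raw))
avg_revenue = mean(float(r["revenue"]) for r in rows)
print(f"Average revenue across {len(rows)} regions: {avg_revenue:.1f}")
```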

Learn more:

Load and analyze public data sets video
Videos about notebooks
Sample notebooks
Documentation about notebooks

Data Refinery

Use Data Refinery to prepare and visualize tabular data with a graphical flow editor. You create and then run a Data Refinery flow as a set of ordered operations on data.

Data format:
Tabular: Avro, CSV, JSON, Parquet, or plain text files
Relational: Tables in relational data sources

Data size: Any

How you can prepare data:

  • Cleanse, shape, and organize data with over 60 operations
  • Save refined data as a new data set or update the original data
  • Annotate data with crowd annotation platforms
  • Profile data to validate it
  • Write R scripts to manipulate data
  • Schedule recurring operations on data

How you can analyze data:
Visualize data with over 40 types of graphs
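The idea behind a Data Refinery flow, an ordered sequence of operations applied to tabular data, can be sketched in plain Python. The operations and column names here are invented stand-ins, not Data Refinery's actual implementation:

```python
# A tiny stand-in for a Data Refinery flow: each step is an ordered
# operation applied to tabular rows (a list of dicts).
rows = [
    {"name": " Ada ", "age": "36"},
    {"name": "Grace", "age": ""},  # missing value
    {"name": "Alan", "age": "41"},
]

def trim_whitespace(rows):
    return [{k: v.strip() for k, v in r.items()} for r in rows]

def drop_missing(rows, column):
    return [r for r in rows if r[column] != ""]

def convert_to_int(rows, column):
    return [{**r, column: int(r[column])} for r in rows]

# Run the "flow" as a fixed sequence of operations.
flow = [
    trim_whitespace,
    lambda r: drop_missing(r, "age"),
    lambda r: convert_to_int(r, "age"),
]
for step in flow:
    rows = step(rows)

print(rows)  # two cleansed rows with integer ages
```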

Learn more:
Videos about Data Refinery
Documentation about Data Refinery

Streams flow editor

Use the streams flow editor to access and analyze streaming data. You can create a streams flow with a wizard or with a flow editor on a graphical canvas.

Required service:
Streaming Analytics service

Data format:
Streaming data as JSON messages
Streaming binary data

Data size: Any

How you can prepare data:

  • Ingest streaming data
  • Aggregate, filter, and process streaming data
  • Process streaming data for a model

How you can analyze data:
Run real-time analytics on streaming data
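A filter step over JSON messages, the kind of operation a streams flow applies continuously, can be sketched with the standard library. The message schema and threshold are invented; a real flow would ingest from a message hub rather than a list:

```python
import json

# Simulated stream of JSON messages, e.g. sensor readings.
messages = [
    '{"sensor": "t1", "temp": 21.5}',
    '{"sensor": "t2", "temp": 48.0}',
    '{"sensor": "t1", "temp": 22.1}',
]

def over_threshold(stream, limit):
    """Filter step: keep only readings above a temperature limit."""
    for msg in stream:
        reading = json.loads(msg)
        if reading["temp"] > limit:
            yield reading

alerts = list(over_threshold(messages, 30.0))
print(alerts)  # [{'sensor': 't2', 'temp': 48.0}]
```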

Learn more:
Streams flow Overview video
Videos about streams flows
Documentation about streams flows

Dashboard editor

Use the Dashboard editor to create a set of visualizations of analytical results on a graphical canvas.

Required service:
Cognos Dashboard Embedded service

Data format:
Tabular: CSV files
Relational: Tables in some relational data sources

Data size: Any

How you can analyze data:

  • Create graphs without coding
  • Include text, media, web pages, images, and shapes in your dashboard
  • Share interactive dashboards externally

Learn more:
Dashboards for Interactive and Informative Data Visualizations
Videos about dashboards
Documentation about dashboards

SPSS Modeler

Use SPSS Modeler to create a flow to prepare data and build and train a model with a flow editor on a graphical canvas.

Data formats:
Relational: Tables in relational data sources
Tabular: Excel files (.xls or .xlsx) or CSV files
Textual: In the supported relational tables or files

Data size: Any

How you can prepare data:

  • Use automatic data preparation functions
  • Write SQL statements to manipulate data
  • Cleanse, shape, sample, sort, and derive data

How you can analyze data:

  • Visualize data with over 40 graphs
  • Identify the natural language of a text field

How you can build models:

  • Build predictive models
  • Choose from over 40 modeling algorithms
  • Use automatic modeling functions
  • Model time series or geospatial data
  • Classify textual data
  • Identify relationships between the concepts in textual data

Learn more:
SPSS Modeler — refreshed UI for an enterprise data science powerhouse video
Documentation about SPSS Modeler

Spark MLlib modeler

Use the Spark MLlib modeler to create a flow to prepare relational data and build and train a model with a flow editor on a graphical canvas.

Required service:
Watson Machine Learning service

Data format:
Accepts data organized into named columns, such as a Spark DataFrame. The columns can store text, feature vectors, true labels, and predictions.

Data size: Any

How you can prepare data:

  • Transform data with SQL statements

How you can build models:

  • Build predictive or classification models
  • Choose from 10 Spark MLlib modeling algorithms

Learn more:
Documentation about Spark MLlib modeler

Neural network modeler

Use the Neural Network Modeler to design a neural network for text and image data with a flow editor on a graphical canvas.

Data format:
Textual: CSV files with labeled text data
Image: Image files in a PKL file. For example, a model testing signatures uses images resized to 32×32 pixels and stored as numpy arrays in a pickled format.

Data size:
Extremely large data sets

How you can build models:

  • Create a deep learning flow to design and run experiments without coding
  • Tune many hyperparameters
  • Standardize the components of a deep learning experiment for easier collaboration
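The pickled image format described above can be sketched with the standard library. The exact layout the modeler expects is an assumption here (the product documentation is authoritative), and a plain nested list stands in for a numpy array:

```python
import pickle

# Illustrative only: a 32x32 "image" as a nested list standing in for a
# numpy array, grouped by split. The (images, labels) layout is an
# assumption, not the documented input schema.
image = [[0] * 32 for _ in range(32)]
dataset = {"train": ([image], ["signature"])}

blob = pickle.dumps(dataset)
restored = pickle.loads(blob)
print(len(restored["train"][0][0]))  # 32 rows in the first image
```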

Learn more:
Neural Network Modeler and Deep Learning Experiments on Watson Studio
Videos about deep learning
Documentation about Neural Network modeler

AutoAI tool

Use the AutoAI tool to automatically analyze your tabular data and generate candidate model pipelines customized for your predictive modeling problem.

Required service:
Watson Machine Learning service

Data format:
Tabular: CSV files

Data size:
Less than 100 MB

How you can prepare data:

Automatically transform data, such as by imputing missing values

How you can build models:

  • Train a binary classification, multiclass classification, or regression model
  • View a tree infographic that shows the sequences of AutoAI training stages
  • Generate a leaderboard of model pipelines ranked by cross-validation scores
  • Save a pipeline as a model

Learn more:
Documentation about AutoAI

Synthesized Neural Network tool

Use the Synthesized Neural Network tool to fully automate the synthesis and training of a neural network with your image or text training data.

Required service:
Watson OpenScale service

Data format:
Textual: CSV files with labeled textual data (UTF-8 encoded and English-only)
Image: Image files in a compressed file plus a CSV file that labels the image files

Data size:
Extremely large data sets

How you can build models:

  • Create a deep learning flow to design and run experiments
  • Use built-in training data
  • Automatically test a series of algorithm and optimization options
  • Track, audit, and tune the model in production on a Watson OpenScale dashboard

Learn more:
Documentation about Synthesized Neural Networks

Experiment builder

Use the Experiment builder to build deep learning experiments and run hundreds of training runs. This method requires that you provide code to define the training run. You run, track, store, and compare the results in the Experiment Builder graphical interface, then save the best configuration as a model.

Data format:
Textual: CSV files with labeled textual data
Image: Image files in a PKL file. For example, a model testing signatures uses images resized to 32×32 pixels and stored as numpy arrays in a pickled format.

Data size:
Large data sets

How you can build models:

  • Write Python code to specify metrics for training runs
  • Write a training definition in Python code
  • Define hyperparameters, or choose the RBFOpt method or random hyperparameter settings
  • Find the optimal values for large numbers of hyperparameters by running hundreds or thousands of training runs
  • Run distributed training with GPUs and specialized, powerful hardware and infrastructure
  • Compare the performance of training runs
  • Save a training run as a model
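The random hyperparameter option above amounts to sampling settings, scoring each training run, and keeping the best. A minimal sketch, with a mock scoring function standing in for a real training definition:

```python
import random

random.seed(7)  # reproducible sampling for this sketch

def train_run(learning_rate, batch_size):
    """Stand-in for a training definition: returns a mock validation
    score. A real run would train a model with these hyperparameters."""
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000

# Random search: sample 50 hyperparameter settings, keep the best scorer.
best = max(
    ({"lr": random.uniform(0.001, 0.1), "batch": random.choice([32, 64, 128])}
     for _ in range(50)),
    key=lambda h: train_run(h["lr"], h["batch"]),
)
print(best)
```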

Learn more:
Neural Network Modeler and Deep Learning Experiments on Watson Studio video
Documentation about Experiment builder

Visual Recognition modeler

Use the Visual Recognition modeler to automatically train a model to classify images for scenes, objects, faces, and other content.

Required service:
Visual Recognition service

Data format:
Image: JPEG or PNG files in a .zip file, separated by class

Data size:
Small to medium data sets

How you can build models:

  • Collaborate to classify images
  • Use one of five built-in models
  • Test the model with sample images
  • Use CoreML to develop iOS apps
  • Provide as few as 10 images per class
  • Add or remove images to retrain the model
  • Use Watson Visual Recognition APIs in applications
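Packaging the training images can be sketched with the standard library: one archive per class, holding that class's JPEG or PNG files. The file names and bytes are placeholders, and the exact packaging convention should be checked against the documentation:

```python
import io
import zipfile

def zip_class(image_files):
    """Bundle one class's images into a zip. image_files maps a file
    name to its (placeholder) image bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, data in image_files.items():
            zf.writestr(name, data)
    return buffer.getvalue()

# Hypothetical "golden retriever" class with two placeholder images.
golden_zip = zip_class({"retriever1.jpg": b"...", "retriever2.jpg": b"..."})
print(len(zipfile.ZipFile(io.BytesIO(golden_zip)).namelist()))  # 2
```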

Learn more:
Get Started With Visual Recognition video
Videos about Visual Recognition
Documentation about Visual Recognition

Natural Language Classifier modeler

Use the Natural Language Classifier modeler to automatically train a model to classify text according to classes you define.

Required service:
Natural Language Classifier service

Data format:
Textual: CSV files with sample text and class names

Data size:
Small to medium data sets

How you can build models:

  • Provide as few as 3 text samples per class
  • Collaborate to classify text samples
  • Test the model with sample text
  • Add or remove test data to retrain the model
  • Classify text in eight languages other than English
  • Use Watson Natural Language Classifier APIs in applications
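The training data format described above, rows pairing a text sample with its class name, can be produced with the standard library. The sample texts and class names are invented:

```python
import csv
import io

# Each row pairs a short text sample with its class name.
samples = [
    ("What is the weather today?", "weather"),
    ("Will it rain tomorrow?", "weather"),
    ("Set an alarm for 6am", "alarm"),
]

out = io.StringIO()
csv.writer(out).writerows(samples)

training_csv = out.getvalue()
print(training_csv.splitlines()[0])  # What is the weather today?,weather
```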

Learn more:
Documentation about Natural Language Classifier modeler

RStudio IDE

Use RStudio IDE to analyze data or create Shiny applications by writing R code.

Data format:
Tabular or relational data
Textual data
Images
Unstructured data

Data size: Any

How you can prepare data, analyze data, and build models:

  • Write code in R
  • Create Shiny apps
  • Use open source libraries and packages
  • Include rich text and media with your code
  • Prepare data
  • Visualize data
  • Discover insights from data
  • Build and train a model using open source libraries

Learn more:
Overview of RStudio IDE video
Videos about RStudio
Documentation about RStudio

Inge Halilovic

I’m a content strategist at IBM. I architect the documentation for watsonx.ai and Cloud Pak for Data as a Service.