ADS Release Notes
June 9, 2020
Numerous bug fixes including:
Support for Data Flow applications and runs outside of a notebook session compartment. Support for application-level and run-level Object Storage buckets for logs and scripts.
ADS now detects small compute shapes and gives warnings when they are used for AutoML execution.
Removal of triggers in the Oracle Cloud Infrastructure Functions
Fixed an issue where DatasetFactory.open() incorrectly yielded a classification dataset for a continuous target.
Fixed an issue where LabelEncoder produced wrong results for category and object columns.
An untrusted notebook issue when running model explanation visualizations was fixed.
A warning about adaptive sampling requiring at least 1000 data points was added.
A dtype cast float to integer into
An option to specify the bucket of Data Flow logs when you create the application was added.
AutoML was upgraded to version 0.4.2. The changes include:
Reduced parallelization on low compute hardware.
Support for passing in a custom logger object in
datetime columns. AutoML should automatically infer datetime columns based on the pandas DataFrame and perform feature engineering on them. This can also be forced by using the
pipeline.fit(). The supported types are:
['categorical', 'numerical', 'datetime']
MLX was upgraded to version 1.0.7. The changes include:
Updated the feature distributions in the PDP/ICE plots (performance improvement).
All distributions are now shown as PMFs. Categorical features show the category frequency, and continuous features are computed using a NumPy histogram (with bins='auto'). They are also drawn as separate, interactive subplots.
Classification PDP: The y-axis for continuous features is now auto-scaled (not fixed to 0-1).
1-feature PDP/ICE: The x-axis for continuous features now shows the entire feature distribution, whereas the plot may show a subset depending on the partial_range parameter. For example, partial_range=[0.2, 0.8] computes the PDP between the 20th and 80th percentiles. The plot now shows the full distribution on the x-axis, but the line charts are only drawn between the specified percentiles.
2-feature PDP: The x and y axes of the plot are now auto-set to match the partial_range specified by the user. This ensures that the heatmap fills the entire plot by default. However, the entire feature distribution can be viewed by zooming out or clicking Autoscale in Plotly.
Support for plotting scatter plots using WebGL (show_in_notebook(..., use_webgl=True)) was added.
The side issues that were causing the MLX Visualization Omitted warnings in JupyterLab were fixed.
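The MLX 1.0.7 notes say that feature distributions in PDP/ICE plots are shown as PMFs: category frequencies for categorical features, and a NumPy histogram with bins='auto' for continuous features. As a rough illustration only (not MLX's code; the helper names continuous_pmf and categorical_pmf are hypothetical), normalizing counts into probabilities looks like this:

```python
import numpy as np


def continuous_pmf(values):
    """Hypothetical helper: bin a continuous feature with NumPy's 'auto'
    bin rule and normalize the counts so they sum to 1 (a PMF over bins)."""
    values = np.asarray(values, dtype=float)
    counts, bin_edges = np.histogram(values, bins="auto")
    pmf = counts / counts.sum()  # each bin's share of the data points
    return pmf, bin_edges


def categorical_pmf(values):
    """Hypothetical helper: category frequencies expressed as a PMF."""
    categories, counts = np.unique(np.asarray(values), return_counts=True)
    return categories, counts / counts.sum()


rng = np.random.default_rng(0)
pmf, edges = continuous_pmf(rng.normal(size=1_000))
categories, probabilities = categorical_pmf(["a", "b", "a", "c"])
```

Showing both feature types as probabilities puts them on a common 0-1 scale, which is one plausible reason for plotting every distribution as a PMF.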
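To make the partial_range semantics concrete, here is a minimal one-feature partial dependence sketch in NumPy. It is not MLX's implementation: the function name partial_dependence_1d, the grid_points argument, and the toy model are assumptions for illustration. The grid is restricted to the requested percentiles (partial_range=[0.2, 0.8] gives the 20th to 80th percentiles), which is why the PDP line covers only part of the full feature distribution.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature, partial_range=(0.2, 0.8), grid_points=10):
    """Hypothetical sketch of a 1-feature PDP restricted to a percentile range.

    predict: callable mapping an (n, d) array to n predictions.
    X: (n, d) data; feature: index of the column to vary.
    """
    X = np.asarray(X, dtype=float)
    # Grid endpoints come from the percentiles, e.g. 20th and 80th.
    lo, hi = np.quantile(X[:, feature], partial_range)
    grid = np.linspace(lo, hi, grid_points)
    pdp = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value          # same value for every row (one ICE point per row)
        pdp.append(predict(X_mod).mean())  # averaging the ICE curves gives the PDP
    return grid, np.array(pdp)


def toy_model(X):
    # Toy linear model y = 2 * x0 + x1, so the PDP for x0 has slope 2.
    return 2 * X[:, 0] + X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 2))
grid, pdp = partial_dependence_1d(toy_model, X, feature=0)
```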
April 30, 2020
The home folder is now backed by block volume. You can now save all your files to the /home/datascience folder, and they persist when you deactivate and activate your sessions. The block_storage folder no longer exists. Oracle Cloud Infrastructure keys can be saved directly to the ~/.oci folder, and no symbolic links are required.
Note that the ads-examples folder in the home folder is a symbolic link to the /opt/notebooks/ads-examples folder. Any changes made in ads-examples are not saved if you deactivate a notebook.
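Because ads-examples is a symbolic link into /opt/notebooks, its real location is outside the block-volume-backed home folder. A minimal sketch for checking whether a path resolves inside /home/datascience after following symbolic links (the helper name persists_across_sessions is hypothetical; the home path is taken from the notes above):

```python
import os


def persists_across_sessions(path, home="/home/datascience"):
    """Hypothetical helper: True if `path`, after resolving symbolic links,
    lives under the persistent home folder."""
    real_path = os.path.realpath(path)  # follows symlinks such as ads-examples
    real_home = os.path.realpath(home)
    return os.path.commonpath([real_path, real_home]) == real_home


# In a notebook session, for example:
# persists_across_sessions("/home/datascience/ads-examples")  # False, per the note above
```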
Each new notebook that is launched has a pre-populated accordion-style cell containing useful tips.
The following packages were added:
ADS integration with the Oracle Cloud Infrastructure Data Flow service provides a more efficient and convenient way to launch a Spark application and run Spark jobs.
show_in_notebook() has had the “head” section removed from the accordion; it is replaced with dataset “warnings”.
get_recommendations() is deprecated and replaced with suggest_recommendations(), which returns a pandas DataFrame with all the recommendations and suggested code to implement each action.
A progress indicator for Autonomous Data Warehouse reads has been added.
A new notebook is included in the ads-examples folder to demonstrate ADS and Data Flow integration.
A new notebook is included in the ads-examples folder that demonstrates advanced custom scoring functions within AutoML by implementing custom class weights.
A new version of the notebook example for deployment to Functions and API Gateway now uses Cloud Shell.
Significant improvements were made to existing ADS Notebooks.
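The new ads-examples notebook implements custom class weights through a custom AutoML scoring function. That notebook's code is not reproduced here; the following is an independent NumPy sketch of one way a scorer can weight classes. The function name class_weighted_accuracy, its signature, and the weights are assumptions, and how the scorer is passed to AutoML is not shown.

```python
import numpy as np


def class_weighted_accuracy(y_true, y_pred, class_weights):
    """Hypothetical scorer: accuracy where each sample counts with the weight
    of its true class, so errors on heavily weighted classes cost more."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    weights = np.array([class_weights[label] for label in y_true], dtype=float)
    correct = (y_true == y_pred).astype(float)
    return float((weights * correct).sum() / weights.sum())


# With class 1 weighted 3x, missing the rare class costs more than missing class 0.
y_true = [0, 0, 0, 1]
weights = {0: 1.0, 1: 3.0}
score_miss_rare = class_weighted_accuracy(y_true, [0, 0, 0, 0], weights)
score_miss_common = class_weighted_accuracy(y_true, [1, 0, 0, 1], weights)
```

The same number can be obtained with scikit-learn's accuracy_score by passing per-sample weights through its sample_weight argument.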
AutoML was updated from version 0.3.1 to 0.4.1:
More consistent handling of stratification and random state.
Fixed XGBoost crashing on AMD shapes.
Proxy models were unified across all stages of the AutoML pipeline, ensuring that leaderboard rankings are consistent.
The visual option was removed from the interface.
The default tuning metric for both binary and multi-class classification has been changed to
Fixed a bug in AutoML XGBoost where the predicted probabilities were sometimes NaN.
Fixed several corner case issues in Hyperparameter Optimization.
MLX was updated from version 1.0.0 to 1.0.3:
Added support for specifying the ‘average’ parameter in
<metric>_<average>, for example
Fixed an issue with the detailed scatter plot visualizations and cut-off feature and axis names.
Fixed an issue with the balanced sampling in the Global Feature Permutation Importance explainer.
Updated the supported scoring metrics in MLX. The PermutationImportance explainer now supports a large number of classification and regression metrics. Also, many of the metric names were changed.
Updated LIME and
Fixed an issue where
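The PermutationImportance explainer in the MLX notes above is based on permutation importance: shuffle one feature at a time and measure how much a scoring metric degrades. The sketch below illustrates that general technique in NumPy. It is not MLX's implementation; the function name permutation_importance, the n_repeats and seed arguments, and the toy model are assumptions.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Hypothetical sketch: importance of feature j = baseline score minus the
    mean score after randomly permuting column j (higher = more important).
    `metric(y_true, y_pred)` must return a score where larger is better."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to the target.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def neg_mse(y_true, y_pred):
    # Negated MSE so that larger is better, as the sketch above assumes.
    return -np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)


def toy_model(data):
    # Uses only feature 0, so feature 1 should get zero importance.
    return 3 * data[:, 0]


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 3 * X[:, 0]
scores = permutation_importance(toy_model, X, y, neg_mse)
```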
March 18, 2020
ADW access performance has been improved significantly
Major improvements were made to the performance of the ADW dataset loader. Depending on your environment, your data now loads much faster.
Change to DatasetFactory.open() with ADW
DatasetFactory.open() with format='sql' no longer requires index_col to be specified. This was confusing, since “index” means something very different in databases. Additionally, the table parameter may now be either a table or a SQL query:
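from ads.dataset.factory import DatasetFactory
# connection_string: your ADW connection string (defined elsewhere)
ds = DatasetFactory.open(
    connection_string,
    format='sql',
    table="""SELECT * FROM sh.times WHERE rownum <= 30""",
)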
No longer automatically starts an H2O cluster
ADS no longer instantiates an H2O cluster on behalf of the user. Instead, you need to import h2o yourself and start your own cluster.
Preloaded Jupyter extensions
JupyterLab now supports these extensions:
Profiling Dask APIs
With support for the Bokeh extension, you can now profile Dask operations and visualize the profiler output. For more details, see the Dask ResourceProfiler documentation.
You can use the ads.common.analyzer.resource_analyze decorator to visualize the CPU and memory utilization of operations.
During execution, it records the following information for each time step:
Time in seconds since the epoch
Memory usage in MB
% CPU usage
from ads.common.analyzer import resource_analyze
from ads.dataset.dataset_browser import DatasetBrowser

# The decorator profiles CPU and memory usage while fetch_data() runs.
@resource_analyze
def fetch_data():
    sklearn = DatasetBrowser.sklearn()
    wine_ds = sklearn.open('wine').set_target("target")
    return wine_ds

fetch_data()
The output shows two lines: one for the total CPU percentage used by all workers, and one for the total memory used.
Dask was updated to version 2.10.1, with support for Oracle Cloud Infrastructure Object Storage. Version 2.10.1 performs better than the previous version.
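The resource_analyze decorator records timestamped CPU and memory samples while the wrapped function runs. The following standard-library sketch shows one way such a decorator can be built with a background sampling thread. It is not ADS's implementation: the decorator name sample_resources, the interval argument, and the samples attribute are hypothetical. CPU percent is derived from time.process_time(), and memory is the Python heap traced by tracemalloc rather than the process's total memory.

```python
import functools
import threading
import time
import tracemalloc


def sample_resources(interval=0.1):
    """Hypothetical decorator: while the wrapped function runs, record one
    (seconds since the epoch, memory in MB, % CPU) tuple per time step."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            samples = []
            stop = threading.Event()
            started_tracing = not tracemalloc.is_tracing()
            if started_tracing:
                tracemalloc.start()

            def sampler():
                last_wall, last_cpu = time.perf_counter(), time.process_time()
                # Event.wait returns False on timeout, so this loops once per interval.
                while not stop.wait(interval):
                    wall, cpu = time.perf_counter(), time.process_time()
                    cpu_pct = 100.0 * (cpu - last_cpu) / (wall - last_wall)
                    current_bytes, _peak = tracemalloc.get_traced_memory()
                    samples.append((time.time(), current_bytes / 2**20, cpu_pct))
                    last_wall, last_cpu = wall, cpu

            thread = threading.Thread(target=sampler, daemon=True)
            thread.start()
            try:
                return func(*args, **kwargs)
            finally:
                stop.set()
                thread.join()
                if started_tracing:
                    tracemalloc.stop()
                wrapper.samples = samples  # expose the collected time series

        return wrapper

    return decorator


# Usage: sample a short CPU-bound loop every 10 ms.
@sample_resources(interval=0.01)
def busy_loop(seconds=0.2):
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


iterations = busy_loop()
```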