Introduction
API Reference
The Fiddler API provides a set of tools for sending data to and retrieving data from the Fiddler platform.
Fiddler provides a Python SDK client that allows you to connect to Fiddler directly from a Jupyter notebook or automated pipeline.
Each API is documented with a description, usage information, and code examples.
Client Setup
fdl.FiddlerApi
The API client object used to communicate with Fiddler.
In order to use the client, you'll need to provide authentication details as shown below.
For more information, see Authorizing the Client.
Usage
import fiddler as fdl
URL = 'https://app.fiddler.ai'
ORG_ID = 'my_org'
AUTH_TOKEN = 'p9uqlkKz1zAA3KAU8kiB6zJkXiQoqFgkUgEa1sv4u58'
client = fdl.FiddlerApi(
url=URL,
org_id=ORG_ID,
auth_token=AUTH_TOKEN
)
Proxy URLs
proxies = {
'http' : 'http://proxy.example.com:1234',
'https': 'https://proxy.example.com:5678'
}
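Passing proxies to the client
If your environment requires a proxy, the proxies dictionary above can be passed to the client through the documented proxies parameter. A minimal sketch (the proxy addresses above are placeholders):
client = fdl.FiddlerApi(
    url=URL,
    org_id=ORG_ID,
    auth_token=AUTH_TOKEN,
    proxies=proxies  # route API calls through the configured proxies
)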
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
url | str | None | The URL used to connect to Fiddler. |
org_id | str | None | The organization ID for a Fiddler instance. Can be found on the General tab of the Settings page. |
auth_token | str | None | The authorization token used to authenticate with Fiddler. Can be found on the Credentials tab of the Settings page. |
proxies | Optional[dict] | None | A dictionary containing proxy URLs. |
verbose | Optional[bool] | False | If True, API calls will be logged verbosely. |
Writing fiddler.ini
%%writefile fiddler.ini
[FIDDLER]
url = https://app.fiddler.ai
org_id = my_org
auth_token = p9uqlkKz1zAA3KAU8kiB6zJkXiQoqFgkUgEa1sv4u58
If you want to authenticate with Fiddler without passing this information directly into the function call, you can store it in a file named fiddler.ini, located in the same directory as your notebook or script.
Usage
client = fdl.FiddlerApi()
Projects
Projects are used to organize your models and datasets. Each project can represent a machine learning task (e.g. predicting house prices, assessing creditworthiness, or detecting fraud).
A project can contain one or more models (e.g. lin_reg_house_predict, random_forest_house_predict).
For more information on projects, click here.
client.list_projects
Usage
client.list_projects()
Response
[
'project_a',
'project_b',
'project_c'
]
Retrieves the project IDs of all projects accessible by the user.
Returns
Type | Description |
---|---|
list | A list containing the project ID string for each project. |
client.create_project
Usage
PROJECT_ID = 'example_project'
client.create_project(
project_id=PROJECT_ID
)
Response
{
'project_name': 'example_project'
}
Creates a project using the specified ID.
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | A unique identifier for the project. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
Returns
Type | Description |
---|---|
dict | A dictionary mapping project_name to the project ID string specified. |
client.delete_project
Deletes a project.
Usage
PROJECT_ID = 'example_project'
client.delete_project(
project_id=PROJECT_ID
)
Response
True
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
Returns
Type | Description |
---|---|
bool | A boolean denoting whether deletion was successful. |
Datasets
Datasets (or baseline datasets) are used for making comparisons with production data.
A baseline dataset should be sampled from your model's training set, so it can serve as a representation of what the model expects to see in production.
For more information, see Uploading a Baseline Dataset.
For guidance on how to design a baseline dataset, see Designing a Baseline Dataset.
client.list_datasets
Retrieves the dataset IDs of all datasets accessible within a project.
Usage
PROJECT_ID = "example_project"
client.list_datasets(
project_id=PROJECT_ID
)
Response
[
'dataset_a',
'dataset_b',
'dataset_c'
]
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
Returns
Type | Description |
---|---|
list | A list containing the string ID of each dataset. |
client.upload_dataset
Uploads a dataset from a pandas DataFrame.
Usage
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
client.upload_dataset(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
dataset={
'baseline': df
},
info=dataset_info
)
Response
{
'row_count': 10000,
'col_count': 20,
'log': [
'Importing dataset example_dataset',
'Creating table for example_dataset',
'Importing data file: baseline.csv'
]
}
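Uploading multiple dataset slices
The dataset argument can also map several slice names to DataFrames if you want to upload more than one split. A minimal sketch, assuming separate train and test CSV files exist (the file and slice names are illustrative):
train_df = pd.read_csv('train_split.csv')
test_df = pd.read_csv('test_split.csv')
client.upload_dataset(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    dataset={
        'train': train_df,
        'test': test_df
    },
    # from_dataframe also accepts a list of DataFrames sharing the same columns
    info=fdl.DatasetInfo.from_dataframe(df=[train_df, test_df])
)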
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset | dict | | A dictionary mapping dataset slice names to pandas DataFrames. |
dataset_id | str | | A unique identifier for the dataset. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
info | Optional[fdl.DatasetInfo] | None | The Fiddler DatasetInfo object used to describe the dataset. Click here for more information. |
size_check_enabled | Optional[bool] | True | If True, will issue a warning when a dataset has a large number of rows. |
Returns
Type | Description |
---|---|
dict | A dictionary containing information about the uploaded dataset. |
client.delete_dataset
Deletes a dataset from a project.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
client.delete_dataset(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
Response
'Dataset deleted example_dataset'
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
Returns
Type | Description |
---|---|
str | A message confirming that the dataset was deleted. |
client.get_dataset_info
Retrieves the DatasetInfo object associated with a dataset.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
dataset_info = client.get_dataset_info(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
Returns
Type | Description |
---|---|
fdl.DatasetInfo | The DatasetInfo object associated with the specified dataset. |
Models
A model is a representation of your machine learning model. Each model must have an associated dataset to be used as a baseline for monitoring, explainability, and fairness capabilities.
You do not need to upload your model artifact in order to register your model, but doing so will significantly improve the quality of explanations generated by Fiddler.
client.list_models
Retrieves the model IDs of all models accessible within a project.
Usage
PROJECT_ID = 'example_project'
client.list_models(
project_id=PROJECT_ID
)
Response
[
'model_a',
'model_b',
'model_c'
]
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
Returns
Type | Description |
---|---|
list | A list containing the string ID of each model. |
client.register_model
Registers a model without uploading an artifact. Requires a fdl.ModelInfo object containing information about the model.
For more information, see Registering a Model.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
dataset_info = client.get_dataset_info(
project_id=PROJECT_ID,
dataset_id=DATASET_ID
)
model_task = fdl.ModelTask.BINARY_CLASSIFICATION
model_target = 'target_column'
model_output = 'output_column'
model_features = [
'feature_1',
'feature_2',
'feature_3'
]
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
target=model_target,
outputs=[model_output],
model_task=model_task
)
client.register_model(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID,
model_info=model_info
)
Response
'Model successfully registered on Fiddler. \n Visit https://app.fiddler.ai/projects/example_project'
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | A unique identifier for the model. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
dataset_id | str | | The unique identifier for the dataset. |
model_info | fdl.ModelInfo | | A ModelInfo object containing information about the model. |
deployment | Optional[fdl.core_objects.DeploymentOptions] | None | A DeploymentOptions object containing information about the model deployment. |
cache_global_impact_importance | Optional[bool] | True | If True, global feature impact and global feature importance will be precomputed and cached when the model is registered. |
cache_global_pdps | Optional[bool] | False | If True, global partial dependence plots will be precomputed and cached when the model is registered. |
cache_dataset | Optional[bool] | True | If True, histogram information for the baseline dataset will be precomputed and cached when the model is registered. |
Returns
Type | Description |
---|---|
str | A message confirming that the model was registered. |
client.upload_model_package
Registers a model with Fiddler and uploads a model artifact to be used for explainability and fairness capabilities.
For more information, see Uploading a Model Artifact.
Usage
import pathlib
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
artifact_path = pathlib.Path('model_dir')
client.upload_model_package(
artifact_path=artifact_path,
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
artifact_path | pathlib.Path | None | A path to the directory containing all of the model files needed to run the model. |
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
deployment_type | Optional[str] | 'predictor' | The type of deployment for the model. Can be one of |
image_uri | Optional[str] | None | A URI of the form '/:'. If specified, the image will be used to create a new runtime to serve the model. |
namespace | Optional[str] | 'default' | The Kubernetes namespace to use for the newly created runtime. image_uri must be specified. |
port | Optional[int] | 5100 | The port to use for the newly created runtime. image_uri must be specified. |
replicas | Optional[int] | 1 | The number of replicas running the model. image_uri must be specified. |
cpus | Optional[int] | 0.25 | The number of CPU cores reserved per replica. image_uri must be specified. |
memory | Optional[str] | '128m' | The amount of memory reserved per replica. image_uri must be specified. |
gpus | Optional[int] | 0 | The number of GPU cores reserved per replica. image_uri must be specified. |
await_deployment | Optional[bool] | True | If True, will block until deployment completes. |
client.update_model
Replaces the model artifact for a model.
Usage
import pathlib
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
model_dir = pathlib.Path('model_dir')
client.update_model(
project_id=PROJECT_ID,
model_id=MODEL_ID,
model_dir=model_dir
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
model_dir | pathlib.Path | | A path to the directory containing all of the model files needed to run the model. |
force_pre_compute | bool | True | If True, re-run precomputation steps for the model. This can also be done manually by calling client.trigger_pre_computation. |
Returns
Type | Description |
---|---|
bool | A boolean denoting whether the update was successful. |
client.delete_model
Deletes a model from a project.
Without deleting production data
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.delete_model(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Deleting production data
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
client.delete_model(
project_id=PROJECT_ID,
model_id=MODEL_ID,
delete_prod=True
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
delete_prod | Optional[bool] | False | If True, production data will also be deleted. |
delete_pred | Optional[bool] | True | If True, prediction data will also be deleted. |
client.trigger_pre_computation
Runs a variety of precomputation steps for a model.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
client.trigger_pre_computation(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
dataset_id | str | | The unique identifier for the dataset. |
overwrite_cache | Optional[bool] | True | If True, will overwrite existing cached information. |
batch_size | Optional[int] | 10 | The batch size used for global PDP calculations. |
calculate_predictions | Optional[bool] | True | If True, will precompute and store model predictions. |
cache_global_pdps | Optional[bool] | True | If True, will precompute and cache partial dependence plot information. |
cache_global_impact_importance | Optional[bool] | True | If True, will precompute and cache global feature impact and global feature importance metrics. |
cache_dataset | Optional[bool] | False | If True, will precompute and cache histogram information for the baseline dataset. |
client.get_model_info
Retrieves the ModelInfo object associated with a model.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
model_info = client.get_model_info(
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
Returns
Type | Description |
---|---|
fdl.ModelInfo | The ModelInfo object associated with the specified model. |
Monitoring
client.publish_event
Publishes a single production event to Fiddler asynchronously.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
example_event = {
'feature_1': 20.7,
'feature_2': 45000,
'feature_3': True,
'output_column': 0.79,
'target_column': 1
}
client.publish_event(
project_id=PROJECT_ID,
model_id=MODEL_ID,
event=example_event,
event_id='event_001',
event_timestamp=1637344470000
)
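Updating a published event
An event that has already been published can be modified by re-publishing it with update_event=True and the same event_id. A minimal sketch, for example to attach a ground-truth label after the fact (the field name is taken from the example above):
client.publish_event(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    event={'target_column': 1},  # only the fields being updated
    event_id='event_001',
    update_event=True
)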
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
event | dict | | A dictionary mapping field names to field values. Any fields found that are not present in the model's ModelInfo object will be dropped from the event. |
event_id | Optional[str] | None | A unique identifier for the event. If not specified, Fiddler will generate its own ID, which can be retrieved using the get_slice API. |
update_event | Optional[bool] | None | If True, will only modify an existing event, referenced by event_id. If no event is found, no change will take place. |
event_timestamp | Optional[int] | None | A timestamp for when the event took place. The format of this timestamp is given by timestamp_format. If no timestamp is provided, the current time will be used. |
timestamp_format | Optional[fdl.FiddlerTimestamp] | fdl.FiddlerTimestamp.INFER | The format of the timestamp passed in event_timestamp. Can be one of |
casting_type | Optional[bool] | False | If True, will try to cast the data in event to be in line with the data types defined in the model's ModelInfo object. |
dry_run | Optional[bool] | False | If True, the event will not be published, and instead a report will be generated with information about any problems with the event. Useful for debugging issues with event publishing. |
client.publish_events_batch
Publishes a batch of events to Fiddler synchronously.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
path_to_batch = 'events_batch.csv'
client.publish_events_batch(
project_id=PROJECT_ID,
model_id=MODEL_ID,
batch_source=path_to_batch
)
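Publishing from a pandas DataFrame
Since batch_source also accepts a pandas DataFrame, a batch can be published directly from memory. A minimal sketch, assuming the DataFrame contains event_id and event_time columns to pass to the optional id_field and timestamp_field parameters (these column names are illustrative):
import pandas as pd
df_events = pd.read_csv('events_batch.csv')
client.publish_events_batch(
    project_id=PROJECT_ID,
    model_id=MODEL_ID,
    batch_source=df_events,
    id_field='event_id',
    timestamp_field='event_time'
)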
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
batch_source | Union[pd.DataFrame, str] | | Either a pandas DataFrame containing a batch of events, or the path to a file containing a batch of events. Supported file types are |
id_field | Optional[str] | None | The field containing event IDs for events in the batch. If not specified, Fiddler will generate its own IDs, which can be retrieved using the get_slice API. |
update_event | Optional[bool] | False | If True, will only modify existing events, referenced by IDs from id_field. If an ID is provided for which there is no event, no change will take place for that row. |
timestamp_field | Optional[str] | None | The field containing timestamps for events in the batch. The format of these timestamps is given by timestamp_format. If no timestamp is provided for a given row, the current time will be used. |
timestamp_format | Optional[fdl.FiddlerTimestamp] | fdl.FiddlerTimestamp.INFER | The format of the timestamps passed in timestamp_field. Can be one of |
data_source | Optional[fdl.BatchPublishType] | None | The location of the data source provided. By default, Fiddler will try to infer the value. Can be one of |
casting_type | Optional[bool] | False | If True, will try to cast the data in the events to be in line with the data types defined in the model's ModelInfo object. |
credentials | Optional[dict] | None | A dictionary containing authorization information for AWS or GCP. For AWS, the expected keys are For GCP, the expected keys are |
group_by | Optional[str] | None | The field used to group events together when computing performance metrics (for ranking models only). |
client.publish_events_batch_schema
Publishes a batch of events to Fiddler synchronously using a schema for locating fields within complex data structures.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
path_to_batch = 'events_batch.avro'
schema = {
'__static': {
'__project': PROJECT_ID,
'__model': MODEL_ID
},
'__dynamic': {
'feature_1': 'features/feature_1',
'feature_2': 'features/feature_2',
'feature_3': 'features/feature_3',
'output_column': 'outputs/output_column',
'target_column': 'targets/target_column'
}
}
client.publish_events_batch_schema(
batch_source=path_to_batch,
publish_schema=schema
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
batch_source | Union[pd.DataFrame, str] | | Either a pandas DataFrame containing a batch of events, or the path to a file containing a batch of events. Supported file types are |
publish_schema | dict | | A dictionary used for locating fields within complex or nested data structures. |
data_source | Optional[fdl.BatchPublishType] | None | The location of the data source provided. By default, Fiddler will try to infer the value. Can be one of |
credentials | Optional[dict] | None | A dictionary containing authorization information for AWS or GCP. For AWS, the expected keys are For GCP, the expected keys are |
group_by | Optional[str] | None | The field used to group events together when computing performance metrics (for ranking models only). |
client.add_monitoring_config
Adds a custom configuration for monitoring.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
monitoring_config = {
'min_bin_value': 3600,
'time_ranges': ['Day', 'Week', 'Month', 'Quarter', 'Year'],
'default_time_range': 7200
}
client.add_monitoring_config(
config_info=monitoring_config,
project_id=PROJECT_ID,
model_id=MODEL_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
config_info |
dict |
Monitoring config info for an entire org or a project or a model. | |
project_id |
Optional[str] |
None |
The unique identifier for the project. |
model_id |
Optional[str] |
None |
The unique identifier for the model. |
Explainability
client.run_model
Runs a model on a pandas DataFrame and returns the predictions.
Usage
import pandas as pd
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
df = pd.read_csv('example_data.csv')
predictions = client.run_model(
project_id=PROJECT_ID,
model_id=MODEL_ID,
df=df
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
df | pd.DataFrame | | A pandas DataFrame containing model input vectors as rows. |
log_events | bool | False | If True, the rows of df along with the model predictions will be logged as production events. |
casting_type | bool | False | If True, will try to cast the data in df to be in line with the data types defined in the model's ModelInfo object. |
Returns
Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing model predictions for the given input vectors. |
client.run_explanation
Runs a point explanation for a given input vector.
Usage
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
df = pd.read_csv('example_data.csv')
explanation = client.run_explanation(
project_id=PROJECT_ID,
model_id=MODEL_ID,
dataset_id=DATASET_ID,
df=df
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
dataset_id | str | | The unique identifier for the dataset. |
df | pd.DataFrame | | A pandas DataFrame containing a model input vector as a row. If more than one row is included, the first row will be used. |
explanations | Union[str, list] | 'shap' | A string or list of strings specifying which explanation algorithms to run. Can be one or more of |
casting_type | Optional[bool] | False | If True, will try to cast the data in df to be in line with the data types defined in the model's ModelInfo object. |
return_raw_response | Optional[bool] | False | If True, a raw output will be returned instead of explanation objects. |
Returns
Type | Description |
---|---|
Union[fdl.AttributionExplanation, fdl.MulticlassAttributionExplanation, list] | A fdl.AttributionExplanation object, fdl.MulticlassAttributionExplanation object, or list of such objects for each explanation method specified in explanations. |
client.run_feature_importance
Calculates global feature importance for a model over a specified dataset.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
feature_importance = client.run_feature_importance(
project_id=PROJECT_ID,
model_id=MODEL_ID,
dataset_id=DATASET_ID
)
With a SQL query
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
slice_query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" WHERE feature_1 < 20.0 LIMIT 100 """
feature_importance = client.run_feature_importance(
project_id=PROJECT_ID,
model_id=MODEL_ID,
dataset_id=DATASET_ID,
slice_query=slice_query
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
dataset_id | str | | The unique identifier for the dataset. |
dataset_splits | Optional[list] | None | A list of dataset splits taken from the dataset argument of upload_dataset. If specified, feature importance will only be calculated over the provided splits. Otherwise, all splits will be used. |
slice_query | Optional[str] | None | A SQL query. If specified, feature importance will only be calculated over the dataset slice specified by the query. |
**kwargs | | | Additional arguments to be passed. Can be one or more of |
Returns
Type | Description |
---|---|
dict | A dictionary containing feature importance results. |
client.get_mutual_information
Calculates the mutual information (MI) between variables over a specified dataset.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
mutual_information_features = [
'feature_1',
'feature_2',
'feature_3'
]
mutual_information = client.get_mutual_information(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
features=mutual_information_features
)
With a SQL query
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
mutual_information_features = [
'feature_1',
'feature_2',
'feature_3'
]
slice_query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" WHERE feature_1 < 20.0 LIMIT 100 """
mutual_information = client.get_mutual_information(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
features=mutual_information_features,
slice_query=slice_query
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
features | list | | A list of features for which to compute mutual information. |
normalized | Optional[bool] | False | If True, will compute normalized mutual information (NMI) instead. |
slice_query | Optional[str] | None | A SQL query. If specified, mutual information will only be calculated over the dataset slice specified by the query. |
sample_size | Optional[int] | None | If specified, only sample_size samples will be used in the mutual information calculation. |
seed | Optional[float] | 0.25 | The random seed used to sample when sample_size is specified. |
Returns
Type | Description |
---|---|
dict | A dictionary containing mutual information results. |
Analytics
client.get_slice
Retrieves a slice of data as a pandas DataFrame.
Querying a dataset
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" """
slice_df = client.get_slice(
sql_query=query,
project_id=PROJECT_ID
)
Querying production data
import pandas as pd
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
query = f""" SELECT * FROM "production.{MODEL_ID}" """
slice_df = client.get_slice(
sql_query=query,
project_id=PROJECT_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
sql_query | str | | The SQL query used to identify the slice. |
project_id | str | | The unique identifier for the project. |
columns_override | Optional[list] | None | A list of columns to include in the slice, even if they aren't specified in the query. |
Returns
Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing the slice returned by the specified query. |
Fairness
client.run_fairness
Calculates fairness metrics for a model over a specified dataset.
Usage
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
protected_features = [
'feature_1',
'feature_2'
]
positive_outcome = 1
fairness_metrics = client.run_fairness(
project_id=PROJECT_ID,
model_id=MODEL_ID,
dataset_id=DATASET_ID,
protected_features=protected_features,
positive_outcome=positive_outcome
)
With a SQL query
PROJECT_ID = 'example_project'
MODEL_ID = 'example_model'
DATASET_ID = 'example_dataset'
protected_features = [
'feature_1',
'feature_2'
]
positive_outcome = 1
slice_query = f""" SELECT * FROM "{DATASET_ID}.{MODEL_ID}" WHERE feature_1 < 20.0 LIMIT 100 """
fairness_metrics = client.run_fairness(
project_id=PROJECT_ID,
model_id=MODEL_ID,
dataset_id=DATASET_ID,
protected_features=protected_features,
positive_outcome=positive_outcome,
slice_query=slice_query
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
model_id | str | | The unique identifier for the model. |
dataset_id | str | | The unique identifier for the dataset. |
protected_features | list | | A list of protected features. |
positive_outcome | Union[str, int] | | The name or value of the positive outcome for the model. |
slice_query | Optional[str] | None | A SQL query. If specified, fairness metrics will only be calculated over the dataset slice specified by the query. |
score_threshold | Optional[float] | 0.5 | The score threshold used to calculate model outcomes. |
Returns
Type | Description |
---|---|
dict | A dictionary containing fairness metric results. |
Access Control
client.share_project
Shares a project with a user or team.
Usage
PROJECT_ID = 'example_project'
client.share_project(
project_name=PROJECT_ID,
role='READ',
user_name='user@example.com'
)
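Sharing with a team
A project can also be shared with a team rather than an individual user by passing team_name instead of user_name. A minimal sketch, assuming a team named 'example_team' exists:
client.share_project(
    project_name=PROJECT_ID,
    role='READ',
    team_name='example_team'
)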
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
role | str | | The permissions role being shared. Can be one of |
user_name | Optional[str] | | A username with which the project will be shared. Typically an email address. |
team_name | Optional[str] | | A team with which the project will be shared. |
client.unshare_project
Revokes a user's or team's access to a project.
Usage
PROJECT_ID = 'example_project'
client.unshare_project(
project_name=PROJECT_ID,
role='READ',
user_name='user@example.com'
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
role | str | | The permissions role being revoked. Can be one of |
user_name | Optional[str] | | A username for which the project permissions will be revoked. Typically an email address. |
team_name | Optional[str] | | A team for which the project permissions will be revoked. |
client.list_org_roles
Retrieves the names of all users and their permissions roles.
Usage
client.list_org_roles()
Response
{
'members': [
{
'id': 1,
'user': 'admin@example.com',
'email': 'admin@example.com',
'isLoggedIn': True,
'firstName': 'Example',
'lastName': 'Administrator',
'imageUrl': None,
'settings': {'notifyNews': True,
'notifyAccount': True,
'sliceTutorialCompleted': True},
'role': 'ADMINISTRATOR'
},
{
'id': 2,
'user': 'user@example.com',
'email': 'user@example.com',
'isLoggedIn': True,
'firstName': 'Example',
'lastName': 'User',
'imageUrl': None,
'settings': {'notifyNews': True,
'notifyAccount': True,
'sliceTutorialCompleted': True},
'role': 'MEMBER'
}
],
'invitations': [
{
'id': 3,
'user': 'newuser@example.com',
'role': 'MEMBER',
'invited': True,
'link': 'http://app.fiddler.ai/signup/vSQWZkt3FP--pgzmuYe_-3-NNVuR58OLZalZOlvR0GY'
}
]
}
Returns
Type | Description |
---|---|
dict | A dictionary of users and their roles in the organization. |
client.list_project_roles
Retrieves the names of users and their permissions roles for a given project.
Usage
PROJECT_ID = 'example_project'
client.list_project_roles(
project_name=PROJECT_ID
)
Response
{
'roles': [
{
'user': {
'email': 'admin@example.com'
},
'team': None,
'role': {
'name': 'OWNER'
}
},
{
'user': {
'email': 'user@example.com'
},
'team': None,
'role': {
'name': 'READ'
}
}
]
}
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
Returns
Type | Description |
---|---|
dict | A dictionary of users and their roles for the specified project. |
client.list_teams
Retrieves the names of all teams and the users and roles within each team.
Usage
client.list_teams()
Response
{
'example_team': {
'members': [
{
'user': 'admin@example.com',
'role': 'MEMBER'
},
{
'user': 'user@example.com',
'role': 'MEMBER'
}
]
}
}
Returns
Type | Description |
---|---|
dict | A dictionary containing information about teams and users. |
Experimental
Fiddler's experimental module offers a chance to preview functionality that hasn't been officially released.
client.experimental.tabular_similarity_search
Search for rows that are most similar to a given row.
Usage
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
example_row = pd.DataFrame({
'feature_1': [50.4],
'feature_2': [64301],
'feature_3': ['Yes'],
'output_column': [0.22],
'target_column': [0]
})
client.experimental.tabular_similarity_search(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID,
feature_point_to_match=example_row,
num_results=5
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
model_id | str | | The unique identifier for the model. |
feature_point_to_match | pd.DataFrame | | The row for which to search for similarities. |
num_results | Optional[int] | 5 | The number of search results to return. |
where_clause | Optional[str] | '' | A SQL WHERE clause used for filtering results. |
Returns
Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing the rows in the baseline dataset that are most similar to the given row. |
client.experimental.initialize_embeddings
Initialize an embeddings file for advanced NLP models.
Usage
client.experimental.initialize_embeddings(
path='embeddings.txt'
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
path | str | | The path to an embeddings file. |
client.experimental.upload_dataset_with_nlp_embeddings
Upload a dataset with NLP embeddings.
Embeddings must first be initialized with client.experimental.initialize_embeddings.
Usage
import pandas as pd
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
client.experimental.upload_dataset_with_nlp_embeddings(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
dataset={
'baseline': df
},
info=dataset_info,
text_field_to_index='text_feature'
)
Response
{
'row_count': 10000,
'col_count': 20,
'log': [
'Importing dataset example_dataset',
'Creating table for example_dataset',
'Importing data file: baseline.csv'
]
}
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | A unique identifier for the dataset. Must be a lowercase string between 2-30 characters containing only alphanumeric characters and underscores. Additionally, it must not start with a numeric character. |
dataset | dict | | A dictionary mapping dataset slice names to pandas DataFrames. |
info | Optional[fdl.DatasetInfo] | None | The Fiddler DatasetInfo object used to describe the dataset. Click here for more information. |
text_field_to_index | str | | The name of the text feature to be used for indexing. |
Returns
Type | Description |
---|---|
dict | A dictionary containing information about the uploaded dataset. |
client.experimental.nlp_similarity_search
Search for texts that are most similar to a given text.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
example_text = "Let's get together and eat."
client.experimental.nlp_similarity_search(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID,
nlp_field='text_feature',
string_to_match=example_text,
num_results=5
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
model_id | str | | The unique identifier for the model. |
nlp_field | str | '' | The feature containing NLP/text data used for searching. |
string_to_match | str | '' | The text for which to search for similarities. |
num_results | Optional[int] | 5 | The number of search results to return. |
where_clause | Optional[str] | '' | A SQL WHERE clause used for filtering results. |
Returns
Type | Description |
---|---|
pd.DataFrame | A pandas DataFrame containing the texts in the baseline dataset that are most similar to the given text. |
client.experimental.run_nlp_feature_impact
Get feature impact metrics for each word in a collection of texts.
Usage
PROJECT_ID = 'example_project'
DATASET_ID = 'example_dataset'
MODEL_ID = 'example_model'
client.experimental.run_nlp_feature_impact(
project_id=PROJECT_ID,
dataset_id=DATASET_ID,
model_id=MODEL_ID
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
project_id | str | | The unique identifier for the project. |
dataset_id | str | | The unique identifier for the dataset. |
model_id | str | | The unique identifier for the model. |
source | Optional[str] | None | The dataset split over which to compute feature impact. |
Returns
Type | Description |
---|---|
tuple | A tuple containing results of the feature impact computation. |
Objects
fdl.DatasetInfo
Stores information about a dataset.
For information on how to customize these objects, see Customizing Your Dataset Schema.
Usage
columns = [
fdl.Column(
name='feature_1',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='feature_2',
data_type=fdl.DataType.INTEGER
),
fdl.Column(
name='feature_3',
data_type=fdl.DataType.BOOLEAN
),
fdl.Column(
name='output_column',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='target_column',
data_type=fdl.DataType.INTEGER
)
]
dataset_info = fdl.DatasetInfo(
display_name='Example Dataset',
columns=columns
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
display_name | str | | A display name for the dataset. |
columns | list | | A list of fdl.Column objects containing information about the columns. |
files | Optional[list] | None | A list of strings pointing to CSV files to use. |
dataset_id | str | None | The unique identifier for the dataset. |
**kwargs | | | Additional arguments to be passed. |
fdl.DatasetInfo.from_dataframe
Constructs a DatasetInfo object from a pandas DataFrame.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
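Inferring categorical columns
The max_inferred_cardinality parameter controls when string columns are treated as categorical. A minimal sketch (the threshold of 100 is an illustrative value):
dataset_info = fdl.DatasetInfo.from_dataframe(
    df=df,
    display_name='Example Dataset',
    # string columns with fewer than 100 unique values become categorical
    max_inferred_cardinality=100
)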
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
df | Union[pd.DataFrame, list] | | Either a single pandas DataFrame or a list of DataFrames. If a list is given, all DataFrames must have the same columns. |
display_name | str | '' | A display name for the dataset. |
max_inferred_cardinality | Optional[int] | None | If specified, any string column containing fewer than max_inferred_cardinality unique values will be converted to a categorical data type. |
dataset_id | Optional[str] | None | The unique identifier for the dataset. |
Returns
Type | Description |
---|---|
fdl.DatasetInfo | A DatasetInfo object constructed from the pandas DataFrame provided. |
fdl.DatasetInfo.to_dict
Converts a DatasetInfo object to a dictionary.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
dataset_info_dict = dataset_info.to_dict()
Response
{
'name': 'Example Dataset',
'columns': [
{
'column-name': 'feature_1',
'data-type': 'float'
},
{
'column-name': 'feature_2',
'data-type': 'int'
},
{
'column-name': 'feature_3',
'data-type': 'bool'
},
{
'column-name': 'output_column',
'data-type': 'float'
},
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'files': []
}
Returns
Type | Description |
---|---|
dict | A dictionary containing information from the DatasetInfo object. |
fdl.DatasetInfo.from_dict
Converts a dictionary to a DatasetInfo object.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
dataset_info_dict = dataset_info.to_dict()
new_dataset_info = fdl.DatasetInfo.from_dict(
deserialized_json={
'dataset': dataset_info_dict
}
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | | The dictionary to be converted. |
Returns
Type | Description |
---|---|
fdl.DatasetInfo | A DatasetInfo object constructed from the dictionary. |
fdl.ModelInfo
Stores information about a model.
Usage
inputs = [
fdl.Column(
name='feature_1',
data_type=fdl.DataType.FLOAT
),
fdl.Column(
name='feature_2',
data_type=fdl.DataType.INTEGER
),
fdl.Column(
name='feature_3',
data_type=fdl.DataType.BOOLEAN
)
]
outputs = [
fdl.Column(
name='output_column',
data_type=fdl.DataType.FLOAT
)
]
targets = [
fdl.Column(
name='target_column',
data_type=fdl.DataType.INTEGER
)
]
model_info = fdl.ModelInfo(
display_name='Example Model',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION,
inputs=inputs,
outputs=outputs,
targets=targets
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
display_name | str | | A display name for the model. |
input_type | fdl.ModelInputType | | A ModelInputType object containing the input type of the model. |
model_task | fdl.ModelTask | | A ModelTask object containing the model task. |
inputs | list | | A list of Column objects corresponding to the inputs (features) of the model. |
outputs | list | | A list of Column objects corresponding to the outputs (predictions) of the model. |
target_class_order | Optional[list] | None | A list denoting the order of classes in the target. |
metadata | Optional[list] | None | A list of Column objects corresponding to any metadata fields. |
decisions | Optional[list] | None | A list of Column objects corresponding to any decision fields (post-prediction business decisions). |
targets | Optional[list] | None | A list of Column objects corresponding to the targets (ground truth) of the model. |
framework | Optional[str] | None | A string providing information about the software library and version used to train and run this model. |
description | Optional[str] | None | A description of the model. |
datasets | Optional[list] | None | A list of the dataset IDs used by the model. |
mlflow_params | Optional[fdl.MLFlowParams] | None | An MLFlowParams object containing information about MLFlow parameters. |
model_deployment_params | Optional[fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
artifact_status | Optional[fdl.ArtifactStatus] | None | An ArtifactStatus object containing information about the model artifact. |
preferred_explanation_method | Optional[fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
custom_explanation_names | Optional[list] | [] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
binary_classification_threshold | Optional[float] | None | The threshold used for classifying examples for binary classifiers. |
ranking_top_k | Optional[int] | None | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional[str] | None | The column by which to group events for certain performance metrics like MAP and NDCG. |
**kwargs | | | Additional arguments to be passed. |
fdl.ModelInfo.from_dataset_info
Constructs a ModelInfo object from a DatasetInfo object.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
dataset_info | fdl.DatasetInfo | | The DatasetInfo object from which to construct the ModelInfo object. |
target | str | | The column to be used as the target (ground truth). |
dataset_id | Optional[str] | None | The unique identifier for the dataset. |
features | Optional[list] | None | A list of columns to be used as features. |
metadata_cols | Optional[list] | None | A list of columns to be used as metadata fields. |
decision_cols | Optional[list] | None | A list of columns to be used as decision fields. |
display_name | Optional[str] | None | A display name for the model. |
description | Optional[str] | None | A description of the model. |
input_type | fdl.ModelInputType | fdl.ModelInputType.TABULAR | A ModelInputType object containing the input type for the model. |
model_task | Optional[fdl.ModelTask] | None | A ModelTask object containing the model task. |
outputs | Optional[list] | None | A list of columns containing model outputs (predictions). |
categorical_target_class_details | Optional[list] | None | A list denoting the order of classes in the target. |
model_deployment_params | Optional[fdl.ModelDeploymentParams] | None | A ModelDeploymentParams object containing information about model deployment. |
preferred_explanation_method | Optional[fdl.ExplanationMethod] | None | An ExplanationMethod object that specifies the default explanation algorithm to use for the model. |
custom_explanation_names | Optional[list] | [] | A list of names that can be passed to the explanation_name argument of the optional user-defined explain_custom method of the model object defined in package.py. |
binary_classification_threshold | Optional[float] | None | The threshold used for classifying examples for binary classifiers. |
ranking_top_k | Optional[int] | None | Used only for ranking models. Sets the top k results to take into consideration when computing performance metrics like MAP and NDCG. |
group_by | Optional[str] | None | The column by which to group events for certain performance metrics like MAP and NDCG. |
Returns
Type | Description |
---|---|
fdl.ModelInfo | A ModelInfo object constructed from the DatasetInfo object provided. |
fdl.ModelInfo.to_dict
Converts a ModelInfo object to a dictionary.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
model_info_dict = model_info.to_dict()
Response
{
'name': 'Example Model',
'input-type': 'structured',
'model-task': 'binary_classification',
'inputs': [
{
'column-name': 'feature_1',
'data-type': 'float'
},
{
'column-name': 'feature_2',
'data-type': 'int'
},
{
'column-name': 'feature_3',
'data-type': 'bool'
},
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'outputs': [
{
'column-name': 'output_column',
'data-type': 'float'
}
],
'datasets': [],
'targets': [
{
'column-name': 'target_column',
'data-type': 'int'
}
],
'custom-explanation-names': []
}
Returns
Type | Description |
---|---|
dict | A dictionary containing information from the ModelInfo object. |
fdl.ModelInfo.from_dict
Converts a dictionary to a ModelInfo object.
Usage
import pandas as pd
df = pd.read_csv('example_dataset.csv')
dataset_info = fdl.DatasetInfo.from_dataframe(
df=df
)
model_info = fdl.ModelInfo.from_dataset_info(
dataset_info=dataset_info,
features=[
'feature_1',
'feature_2',
'feature_3'
],
outputs=[
'output_column'
],
target='target_column',
input_type=fdl.ModelInputType.TABULAR,
model_task=fdl.ModelTask.BINARY_CLASSIFICATION
)
model_info_dict = model_info.to_dict()
new_model_info = fdl.ModelInfo.from_dict(
deserialized_json={
'model': model_info_dict
}
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
deserialized_json | dict | | The dictionary to be converted. |
Returns
Type | Description |
---|---|
fdl.ModelInfo | A ModelInfo object constructed from the dictionary. |
fdl.ModelInputType
Represents supported model input types.
Usage
model_input_type = fdl.ModelInputType.TABULAR
Enum Values
Enum Value | Description |
---|---|
fdl.ModelInputType.TABULAR | For tabular models. |
fdl.ModelInputType.TEXT | For text models. |
fdl.ModelTask
Represents supported model tasks.
Usage
model_task = fdl.ModelTask.BINARY_CLASSIFICATION
Enum Values
Enum Value | Description |
---|---|
fdl.ModelTask.REGRESSION | For regression models. |
fdl.ModelTask.BINARY_CLASSIFICATION | For binary classification models. |
fdl.ModelTask.MULTICLASS_CLASSIFICATION | For multiclass classification models. |
fdl.ModelTask.RANKING | For ranking models. |
fdl.DataType
Represents supported data types.
Usage
data_type = fdl.DataType.FLOAT
Enum Values
Enum Value | Description |
---|---|
fdl.DataType.FLOAT | For floats. |
fdl.DataType.INTEGER | For integers. |
fdl.DataType.BOOLEAN | For booleans. |
fdl.DataType.STRING | For strings. |
fdl.DataType.CATEGORY | For categorical types. |
fdl.Column
Represents a column of a dataset.
Usage
column = fdl.Column(
name='feature_1',
data_type=fdl.DataType.FLOAT,
value_range_min=0.0,
value_range_max=80.0
)
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
name | str | | The name of the column. |
data_type | fdl.DataType | | The DataType object corresponding to the data type of the column. |
possible_values | Optional[list] | None | A list of unique values used for categorical columns. |
is_nullable | Optional[bool] | None | If True, will expect missing values in the column. |
value_range_min | Optional[float] | None | The minimum value used for numeric columns. |
value_range_max | Optional[float] | None | The maximum value used for numeric columns. |
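Categorical columns
For categorical columns, possible_values can be used to enumerate the allowed categories. A minimal sketch, assuming 'feature_3' is a categorical feature with values 'Yes' and 'No':
categorical_column = fdl.Column(
    name='feature_3',
    data_type=fdl.DataType.CATEGORY,
    possible_values=['Yes', 'No']
)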