Databricks Integration

Neurolabs enables Databricks access to the Item Catalog and Image Recognition results for CPGs and ecosystem partners, bringing the Visual AI Layer inside the Lakehouse [1].

This guide covers how to use the Neurolabs ZIA Python SDK to store and manage image recognition results and catalog data with Databricks Notebooks & Unity Catalog. It includes a quick start, sample code snippets for data processing, Unity Catalog integration, and an end-to-end working example.

If you'd like to learn more about the reports & dashboards that Neurolabs Visual AI data enables, and how to create them using native Databricks visualisation tools, please reach out.

Quick Start

Overview of Neurolabs Databricks Integration Workflows

(Diagram: Neurolabs Databricks Integration Workflow)

For a complete end-to-end workflow, clone the Neurolabs Blueprint notebook for populating Unity Catalog with IR Results: IR Results Neurolabs Ingestion [DB Integration].ipynb

High-Level Steps

  1. Prerequisites - Review the requirements for setting up the integration
  2. Setup Client Configuration - Initialize the ZIA client and configure authentication
  3. Get IR Results - Fetch image recognition results or catalog data from Neurolabs
  4. Setup Unity Catalog - Prepare Unity Catalog structure
  5. Populate Unity Catalog - Convert and write results to Unity Catalog tables

Table of Contents

  1. Prerequisites
  2. Installation
  3. Setup Client
  4. Get IR Results
  5. Setup Unity Catalog
  6. Populate Unity Catalog
  7. Advanced Examples
  8. Troubleshooting

Prerequisites

  • A Databricks workspace with Unity Catalog enabled
  • Python 3.11+ runtime
  • Access to the Neurolabs ZIA platform and an API key
  • Unity Catalog permissions for creating/updating tables

[Optional] Install the Databricks CLI

brew tap databricks/tap
brew install databricks

We recommend storing your Neurolabs API key in Databricks Secrets at the workspace level.

databricks auth login --host <your_hostname> 

databricks secrets create-scope <scope-name>
databricks secrets put-secret --json '{
  "scope": "<scope-name>",
  "key": "<key-name>",
  "string_value": "<secret>"
}'

For production jobs, it is recommended to set up a service principal.

Installation

Install ZIA Neurolabs SDK in Databricks

In your Databricks Notebook, install the ZIA SDK with the PySpark & Pandas extras enabled. To find out more about the ZIA SDK, check the PyPI project.

# Install the ZIA SDK with the Databricks extras
%pip install "zia-sdk-python[databricks]"

# Restart Python to ensure the package is available
dbutils.library.restartPython()

Configure Neurolabs Secrets

Set up your Neurolabs API credentials using Databricks secrets or environment variables:

import os

# Option A: set the API key as an environment variable
os.environ["NEUROLABS_API_KEY"] = "your-api-key-here"
api_key = os.environ["NEUROLABS_API_KEY"]

# Option B: read the API key from a Databricks secret scope
try:
    api_key = dbutils.secrets.get(scope="neurolabs-api", key="demo-key")
except Exception as e:
    raise RuntimeError(
        "Failed to retrieve API key from Databricks secrets. "
        "Make sure the secret scope and key are set up."
    ) from e

Setup Client

Initialize ZIA Client

# Import zia-sdk dependencies
from neurolabszia import Zia

# Initialize the client with your API key
client = Zia(api_key)

# Test the connection
try:
    # Get catalog items to verify the connection
    catalog_items = await client.catalog.get_all_items()
    print(f"Successfully connected! Found {len(catalog_items)} catalog items")
except Exception as e:
    print(f"Connection failed: {e}")

Get IR Results

Fetch Image Recognition Results

With the account provided by Neurolabs and access to a Task UUID, you can now retrieve image recognition results. The SDK supports both the NLIRResult data model and raw JSON results.

# Fetch some results from a specific task
task_uuid = "your-task-uuid"
batch_size = 10
offset = 0

results = await client.result_management.get_task_results(
    task_uuid=task_uuid,
    limit=batch_size,
    offset=offset
)

# Optional: fetch the raw JSON response instead
results_json = await client.result_management.get_task_results_raw(
    task_uuid=task_uuid,
    limit=batch_size,
    offset=offset
)

print(f"Retrieved {len(results)} results from task {task_uuid}")

[Optional] Convert IR Results to Pandas DataFrame

from neurolabszia.utils import ir_results_to_dataframe

# Convert results to pandas DataFrame
df = ir_results_to_dataframe(
    results,
    include_bbox=True,
    include_alternative_predictions=True,
    include_modalities=True,  # Include realogram data
    include_shares=True  # Include share of shelf data
)

print(f"DataFrame shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")

# Display sample data
display(df.head())

Convert IR Results to Spark DataFrame

To populate Unity Catalog with the results, the first step is to convert the IR Results into a Spark DataFrame.

from pyspark.sql import SparkSession

from neurolabszia import Zia, NLIRResult
from neurolabszia.utils import to_spark_dataframe

# Create a Spark session
spark = SparkSession.builder.appName("NLIRResultsIngestion").getOrCreate()

# Convert to a Spark DataFrame
spark_df = to_spark_dataframe(
    results,
    spark,
    include_bbox=True,
    include_alternative_predictions=True,
    include_modalities=True,
    include_shares=True
)

print(f"Spark DataFrame count: {spark_df.count()}")
display(spark_df.limit(10))

Setup Unity Catalog

Create Unity Catalog Structure

Before populating with data, ensure your Unity Catalog structure is set up:

# Create the catalog and schema if they don't exist
catalog_name = "neurolabs"
schema_name = "image_recognition"

# Note: In production, these should be created by your Unity Catalog admin
print(f"Ensure Unity Catalog structure exists: {catalog_name}.{schema_name}")

spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog_name}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}")

Populate Unity Catalog

Populate Unity Catalog Table from IR Results

from neurolabszia.utils import to_spark_dataframe

# Target table name and write mode
table_name = "ir_results"  # example name; adjust to your naming convention
mode = "overwrite"  # use "append" to add new result batches to an existing table

# Convert results to a Spark DataFrame
spark_df = to_spark_dataframe(
    results,
    spark,
    include_bbox=True,
    include_alternative_predictions=True,
    include_modalities=True,
    include_shares=True
)

# Create the full table path
table_path = f"{catalog_name}.{schema_name}.{table_name}"

# Write to Unity Catalog
spark_df.write.format("delta").mode(mode).saveAsTable(table_path)

print(f"Successfully created/updated table: {table_path}")
print(f"Row count: {spark_df.count()}")

Populate Unity Catalog Table from Catalog Items

async def create_neurolabs_catalog_table(
    catalog_name: str,
    schema_name: str,
    table_name: str,
    mode: str = "overwrite"
):
    """
    Create a Unity Catalog table from catalog items.
    """
    # Fetch all catalog items
    catalog_items = await client.catalog.get_all_items()

    # Convert to a Spark DataFrame
    catalog_df = spark.createDataFrame([
        {
            "uuid": item.uuid,
            "name": item.name,
            "status": item.status.value,
            "thumbnail_url": item.thumbnail_url,
            "brand": item.brand,
            "barcode": item.barcode,
            "custom_id": item.custom_id,
            "height": item.height,
            "width": item.width,
            "depth": item.depth,
            "size": item.size,
            "container_type": item.container_type,
            "flavour": item.flavour,
            "packaging_size": item.packaging_size,
            "created_at": item.created_at,
            "updated_at": item.updated_at
        }
        for item in catalog_items
    ])

    # Create the full table path
    table_path = f"{catalog_name}.{schema_name}.{table_name}"

    # Write to Unity Catalog
    catalog_df.write.format("delta").mode(mode).saveAsTable(table_path)

    print(f"Successfully created/updated catalog table: {table_path}")
    print(f"Catalog items count: {catalog_df.count()}")

    return table_path

# Example usage (await is required because the function fetches catalog items asynchronously)
catalog_table_path = await create_neurolabs_catalog_table(
    catalog_name="neurolabs",
    schema_name="catalog",
    table_name="products",
    mode="overwrite"
)

Advanced Examples

1. Batch Processing Multiple Tasks

Coming soon.
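
Until the official example is published, here is a minimal sketch of batch processing, reusing the get_task_results and to_spark_dataframe calls shown earlier; the task_uuids list is a hypothetical placeholder and to_spark_dataframe is called with its default options:

from functools import reduce

from neurolabszia.utils import to_spark_dataframe

# Hypothetical list of task UUIDs to process in one run
task_uuids = ["task-uuid-1", "task-uuid-2"]

batch_dfs = []
for uuid in task_uuids:
    task_results = await client.result_management.get_task_results(
        task_uuid=uuid,
        limit=batch_size,
        offset=0
    )
    if task_results:
        batch_dfs.append(to_spark_dataframe(task_results, spark))

# Union the per-task DataFrames and append them to the results table
if batch_dfs:
    combined_df = reduce(lambda a, b: a.unionByName(b), batch_dfs)
    combined_df.write.format("delta").mode("append").saveAsTable(table_path)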

2. Data Quality Checks

Coming soon.
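
Until the official example is published, here is a minimal sketch of simple data quality checks on the Spark DataFrame produced earlier; it uses only standard PySpark functions and makes no assumptions about specific column names:

from pyspark.sql import functions as F

# Count nulls per column
null_counts = spark_df.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c) for c in spark_df.columns
])
display(null_counts)

# Check for fully duplicated rows
total_rows = spark_df.count()
distinct_rows = spark_df.dropDuplicates().count()
print(f"Total rows: {total_rows}, duplicate rows: {total_rows - distinct_rows}")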

3. Analytics and Insights

Coming soon.

Troubleshooting

Common Issues

  1. Authentication Errors - verify that your API key and secret scope resolve correctly (see the check after this list)

  2. Schema Mismatches - compare the incoming DataFrame schema with the target table schema (see the sketch after this list)

  3. Unity Catalog Permissions

    # Test table creation permissions
    try:
        test_df = spark.createDataFrame([{"test": "data"}])
        test_df.write.format("delta").mode("overwrite").saveAsTable("test_table")
        print("Unity Catalog permissions OK")
    except Exception as e:
        print(f"Unity Catalog permission error: {e}")

Performance Optimization

  1. Use appropriate cluster size for your data volume
  2. Enable autoscaling for variable workloads
  3. Cache frequently accessed DataFrames (see the sketch below)
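
If the same results DataFrame feeds several actions (counts, displays, writes), caching it avoids recomputing the conversion each time; a minimal sketch using standard Spark caching:

# Cache the results DataFrame before running several actions on it
spark_df.cache()
print(f"Rows: {spark_df.count()}")  # first action materialises the cache
display(spark_df.limit(10))  # reuses the cached data
spark_df.unpersist()  # release the cache when finished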

Support

For issues specific to the ZIA SDK, check the main Zia SDK README.md file or contact the development team at support@neurolabs.ai.

For Databricks-specific issues, refer to the Databricks documentation.