
Building a Modular Computer Vision Perception System: Part 4 — Image Segmentation | by Leonardo | Apr, 2025



This is the fourth article in our series exploring the design and implementation of a complete computer vision perception system. In Part 1 we covered detection, in Part 2 we explored tracking, and in Part 3 we added depth estimation. Today, we'll implement image segmentation to achieve pixel-perfect object understanding.

Imagine a self-driving car approaching a busy intersection. It detects and tracks various objects (cars, pedestrians, cyclists) and estimates their distances. But to navigate safely, it needs to know the precise boundaries of each object, distinguish between a person and their shadow, and identify drivable road surfaces down to the pixel level.

This is where image segmentation becomes essential. While detection gives us bounding boxes that roughly locate objects, segmentation provides pixel-precise masks that outline exactly which parts of the image belong to each object, enabling much finer-grained scene understanding.

In this article, we'll explore how to implement an image segmentation module using Meta AI's groundbreaking Segment Anything Model (SAM) and integrate it into our perception pipeline. SAM represents a new paradigm in segmentation: a foundation model that can segment virtually anything in an image, even objects it wasn't specifically trained to recognize.

Image segmentation provides pixel-precise masks that outline exactly which parts of the image belong to each object.

Segmentation serves several important functions in a perception system:

1. Precise object boundaries: Determining exactly which pixels belong to each object
2. Instance separation: Distinguishing between multiple instances of the same object class
3. Fine-grained analysis: Enabling detailed analysis of object parts and features
4. Scene understanding: Identifying surfaces, regions, and background elements
5. Occlusion handling: Better understanding of partial occlusions between objects

These capabilities are fundamental for applications requiring precise environmental understanding, such as autonomous vehicles, robotics, medical imaging, and augmented reality.

To understand segmentation, it's helpful to compare it with detection:
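In short: detection outputs a rectangular bounding box and a class label per object, localizing it only coarsely and cheaply, whereas segmentation outputs a per-pixel mask that captures the exact silhouette of the object, at a higher computational cost.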

This comparison highlights why segmentation is so useful: it provides a much more detailed understanding of objects and their boundaries.

There are several types of image segmentation, each serving a different purpose:

1. Semantic segmentation: Assigns a class label to each pixel (e.g., road, car, person)
2. Instance segmentation: Distinguishes between different instances of the same class (e.g., car #1, car #2)
3. Panoptic segmentation: Combines semantic and instance segmentation
4. Prompt-based segmentation: Generates masks based on prompts such as points or boxes

For our perception system, we're particularly interested in instance segmentation, since it aligns well with our detection and tracking modules.

For our implementation, we chose the Segment Anything Model (SAM), introduced by Kirillov et al. from Meta AI Research. SAM represents a major advance in segmentation technology:

1. Prompt-based: Can generate masks from various types of prompts (points, boxes, text)
2. Zero-shot performance: Works on objects and scenes it wasn't specifically trained on
3. Scale and diversity: Trained on over 1 billion masks across diverse images
4. Real-time capability: Smaller variants can run in real time on modern hardware
5. Foundation model: Serves as a base for customization and fine-tuning

SAM is particularly well suited for integration with our perception system because it can take detection results (bounding boxes) as prompts and generate precise masks from them. This creates a natural workflow: detection → tracking → segmentation.

Following our design principles of modularity and abstraction, we'll create an abstract base class for segmentation, then implement SAM-based segmentation on top of it.

First, let's define the abstract base class that establishes the interface for any segmentation implementation:

# perception/segmentation/segmenter.py

from abc import ABC, abstractmethod
from typing import List, Dict, Any, Tuple, Optional
import numpy as np

class Segmenter(ABC):
    """
    Abstract base class for image segmentation.

    All segmenter implementations should inherit from this class and
    implement the segment method.
    """

    def __init__(self, config: Dict = None):
        """
        Initialize the segmenter with configuration.

        Args:
            config: Configuration dictionary
        """
        self.config = config or {}
        self.is_initialized = False

    @abstractmethod
    def initialize(self) -> None:
        """Initialize the segmenter and load models."""
        pass

    @abstractmethod
    def segment(self, frame: np.ndarray) -> Dict[str, Any]:
        """
        Segment objects in a frame.

        Args:
            frame: Input image frame (BGR format)

        Returns:
            Dictionary containing:
                - masks: List of segmentation masks (each as a binary numpy array)
                - classes: List of class IDs for each mask
                - scores: List of confidence scores for each mask
                - class_names: List of human-readable class names
        """
        pass

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """
        Preprocess a frame for segmentation.

        Args:
            frame: Input frame

        Returns:
            Preprocessed frame
        """
        # Default preprocessing (can be overridden)
        return frame

    def segment_by_points(self,
                          frame: np.ndarray,
                          points: List[Tuple[int, int]],
                          point_labels: Optional[List[int]] = None) -> Dict[str, Any]:
        """
        Segment based on prompt points.

        Args:
            frame: Input frame
            points: List of (x, y) coordinate tuples to use as prompts
            point_labels: List of labels for points (1 for foreground, 0 for background).
                If None, all points are considered foreground.

        Returns:
            Dictionary with segmentation results
        """
        # Default implementation (should be overridden by implementations that support this)
        raise NotImplementedError("Point-based segmentation not supported by this segmenter")

    def segment_by_boxes(self,
                         frame: np.ndarray,
                         boxes: List[List[int]]) -> Dict[str, Any]:
        """
        Segment based on bounding boxes.

        Args:
            frame: Input frame
            boxes: List of [x1, y1, x2, y2] bounding boxes to use as prompts

        Returns:
            Dictionary with segmentation results
        """
        # Default implementation (should be overridden by implementations that support this)
        raise NotImplementedError("Box-based segmentation not supported by this segmenter")

    def segment_by_masks(self,
                         frame: np.ndarray,
                         masks: List[np.ndarray]) -> Dict[str, Any]:
        """
        Refine existing masks.

        Args:
            frame: Input frame
            masks: List of binary masks to refine

        Returns:
            Dictionary with refined segmentation results
        """
        # Default implementation (should be overridden by implementations that support this)
        raise NotImplementedError("Mask-based segmentation not supported by this segmenter")

This abstract class defines:

1. Core functionality through the segment() method that all implementations must provide
2. Prompt-based interfaces for different ways to guide segmentation (points, boxes, masks)
3. A consistent return format with masks, scores, classes, and class names
4. Configuration support for flexible behavior customization
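
To make the contract concrete, here is a minimal illustrative subclass (a hypothetical stub, not part of the actual codebase) showing how an implementation plugs into this interface:

import numpy as np
from typing import Dict, Any
from perception.segmentation.segmenter import Segmenter

class DummySegmenter(Segmenter):
    """Hypothetical stub: returns one mask covering the whole frame."""

    def initialize(self) -> None:
        # Nothing to load for this stub
        self.is_initialized = True

    def segment(self, frame: np.ndarray) -> Dict[str, Any]:
        if not self.is_initialized:
            self.initialize()
        # A single placeholder mask spanning the entire image
        full_mask = np.ones(frame.shape[:2], dtype=bool)
        return {
            'masks': [full_mask],
            'scores': [1.0],
            'classes': [0],
            'class_names': ['object']
        }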

Now, let's implement the Segment Anything Model (SAM) segmenter:

# perception/segmentation/segment_anything.py

import os
import sys
import numpy as np
import torch
import cv2
import logging
from typing import List, Dict, Tuple, Optional, Any
from perception.segmentation.segmenter import Segmenter

logger = logging.getLogger(__name__)

class SegmentAnythingModel(Segmenter):
    """
    Image segmentation using Meta's Segment Anything Model (SAM).

    This class implements image segmentation using the SAM architecture,
    which can segment anything in an image based on prompts.
    """

    def __init__(self, config: Dict = None):
        """
        Initialize the SAM segmenter.

        Args:
            config: Configuration with keys:
                - model_type: SAM model type ('vit_h', 'vit_l', 'vit_b')
                - checkpoint: Path to model checkpoint or 'default'
                - device: Inference device ('cuda', 'cpu')
                - points_per_side: Number of points for automatic mask generation
                - conf_threshold: Confidence threshold for predictions (0-1)
        """
        super().__init__(config)
        self.config = {
            'model_type': 'vit_b',  # Options: 'vit_h', 'vit_l', 'vit_b'
            'checkpoint': 'default',
            'device': 'cuda' if torch.cuda.is_available() else 'cpu',
            'points_per_side': 32,  # For automatic mask generation
            'conf_threshold': 0.8,
            'output_mode': 'binary_mask',  # or 'crf_refined', 'full'
            **(config or {})
        }

        self.model = None
        self.predictor = None
        self.device = None

    def initialize(self) -> None:
        """Initialize the SAM model."""
        try:
            logger.info(f"Initializing SAM segmenter on {self.config['device']}...")

            # Import SAM required libraries
            try:
                from segment_anything import sam_model_registry, SamPredictor
            except ImportError:
                logger.info("Installing segment_anything...")
                os.system('pip install git+https://github.com/facebookresearch/segment-anything.git')
                from segment_anything import sam_model_registry, SamPredictor

            # Set device
            self.device = torch.device(self.config['device'])

            # Determine checkpoint path
            checkpoint_path = self.config['checkpoint']
            if checkpoint_path == 'default':
                model_type = self.config['model_type']

                # Default checkpoints based on model type
                if model_type == 'vit_h':
                    checkpoint_path = "sam_vit_h_4b8939.pth"
                elif model_type == 'vit_l':
                    checkpoint_path = "sam_vit_l_0b3195.pth"
                else:  # vit_b
                    checkpoint_path = "sam_vit_b_01ec64.pth"

            # Check if the checkpoint exists, otherwise download it
            if not os.path.exists(checkpoint_path):
                logger.info(f"Downloading {checkpoint_path}...")
                import urllib.request
                model_urls = {
                    "sam_vit_h_4b8939.pth": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
                    "sam_vit_l_0b3195.pth": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth",
                    "sam_vit_b_01ec64.pth": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"
                }
                urllib.request.urlretrieve(model_urls[checkpoint_path], checkpoint_path)

            # Load SAM model
            sam = sam_model_registry[self.config['model_type']](checkpoint=checkpoint_path)
            sam.to(device=self.device)

            # Create predictor
            self.predictor = SamPredictor(sam)

            logger.info(f"SAM segmenter initialized with model type {self.config['model_type']}")
            self.is_initialized = True

        except Exception as e:
            logger.error(f"Failed to initialize SAM segmenter: {e}")
            raise

    def segment(self, frame: np.ndarray) -> Dict[str, Any]:
        """
        Automatically segment objects in a frame using SAM.

        Args:
            frame: Input image frame (BGR format)

        Returns:
            Dictionary with segmentation results
        """
        if not self.is_initialized:
            self.initialize()

        # Convert BGR to RGB
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Set image in predictor
        self.predictor.set_image(frame_rgb)

        try:
            # For automatic mask generation
            from segment_anything import SamAutomaticMaskGenerator

            # Create mask generator with configuration
            mask_generator = SamAutomaticMaskGenerator(
                model=self.predictor.model,
                points_per_side=self.config['points_per_side'],
                pred_iou_thresh=self.config['conf_threshold'],
                stability_score_thresh=0.95,
                crop_n_layers=1,
                crop_n_points_downscale_factor=2,
                min_mask_region_area=100  # Threshold for small regions
            )

            # Generate masks
            masks = mask_generator.generate(frame_rgb)

            # Process results
            return self._process_automatic_masks(masks, frame.shape[:2])

        except Exception as e:
            logger.error(f"Error in automatic segmentation: {e}")
            return {
                'masks': [],
                'scores': [],
                'classes': [],
                'class_names': []
            }

    def segment_by_points(self,
                          frame: np.ndarray,
                          points: List[Tuple[int, int]],
                          point_labels: Optional[List[int]] = None) -> Dict[str, Any]:
        """
        Segment based on prompt points.

        Args:
            frame: Input frame (BGR)
            points: List of (x, y) coordinate tuples to use as prompts
            point_labels: List of labels for points (1 for foreground, 0 for background).
                If None, all points are considered foreground.

        Returns:
            Dictionary with segmentation results
        """
        if not self.is_initialized:
            self.initialize()

        # Convert BGR to RGB
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Set image in predictor if not already set
        self.predictor.set_image(frame_rgb)

        # Prepare points and labels
        if point_labels is None:
            point_labels = [1] * len(points)

        # Convert to numpy arrays
        input_points = np.array(points)
        input_labels = np.array(point_labels)

        # Predict masks
        masks, scores, logits = self.predictor.predict(
            point_coords=input_points,
            point_labels=input_labels,
            multimask_output=True  # Return multiple masks
        )

        # Process results
        result = {
            'masks': list(masks),  # List of binary mask arrays (kept as numpy for downstream use)
            'scores': scores.tolist(),  # Confidence scores
            'classes': [0] * len(scores),  # Generic class ID for all masks
            'class_names': ['object'] * len(scores),  # Generic class name
            'logits': logits  # Raw logits for potential further processing
        }

        return result

    def segment_by_boxes(self,
                         frame: np.ndarray,
                         boxes: List[List[int]]) -> Dict[str, Any]:
        """
        Segment based on bounding boxes.

        Args:
            frame: Input frame (BGR)
            boxes: List of [x1, y1, x2, y2] bounding boxes to use as prompts

        Returns:
            Dictionary with segmentation results
        """
        if not self.is_initialized:
            self.initialize()

        # Convert BGR to RGB
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

        # Set image in predictor
        self.predictor.set_image(frame_rgb)

        results = {
            'masks': [],
            'scores': [],
            'classes': [],
            'class_names': []
        }

        # Process each box
        for box in boxes:
            # Convert to the array format expected by SAM
            input_box = np.array(box)

            # Predict masks for this box
            masks, scores, logits = self.predictor.predict(
                box=input_box,
                multimask_output=True
            )

            # Take the highest scoring mask
            best_idx = np.argmax(scores)

            results['masks'].append(masks[best_idx])
            results['scores'].append(float(scores[best_idx]))
            results['classes'].append(0)  # Generic class ID
            results['class_names'].append('object')

        return results

    def _process_automatic_masks(self, masks: List[Dict], image_shape: Tuple[int, int]) -> Dict[str, Any]:
        """
        Process masks from the automatic mask generator into a standardized format.

        Args:
            masks: List of mask dictionaries from the SAM automatic mask generator
            image_shape: (height, width) of the original image

        Returns:
            Standardized segmentation result dictionary
        """
        if not masks:
            return {
                'masks': [],
                'scores': [],
                'classes': [],
                'class_names': []
            }

        # Convert to standardized format
        binary_masks = []
        scores = []
        classes = []
        class_names = []

        # Sort masks by area (largest first)
        masks = sorted(masks, key=lambda x: -x['area'])

        for idx, mask_data in enumerate(masks):
            # Convert RLE to binary mask if needed
            if isinstance(mask_data['segmentation'], dict):
                from pycocotools import mask as mask_utils
                binary_mask = mask_utils.decode(mask_data['segmentation'])
            else:
                binary_mask = mask_data['segmentation']

            # Ensure mask is binary and correctly sized
            binary_mask = binary_mask.astype(bool)

            # Add to results
            binary_masks.append(binary_mask)
            scores.append(float(mask_data.get('predicted_iou', 1.0)))
            classes.append(0)  # Generic class ID since SAM does not classify
            class_names.append(f"object_{idx}")

        return {
            'masks': binary_masks,
            'scores': scores,
            'classes': classes,
            'class_names': class_names,
            'areas': [m['area'] for m in masks],
            'stability_scores': [m.get('stability_score', 1.0) for m in masks]
        }

This implementation provides several ways to segment images:

1. Automatic segmentation via the segment() method, which tries to identify all objects in the frame
2. Point-based segmentation via segment_by_points(), which generates masks from click prompts
3. Box-based segmentation via segment_by_boxes(), which refines bounding boxes into precise masks

The box-based approach is particularly important for our perception system, since it creates a natural pipeline: detection → bounding boxes → segmentation masks.

The segmentation module integrates with our YAML-based configuration system:

# config/perception_config.yaml

# Segmentation configuration
segmentation:
  enabled: true
  model_type: "vit_b"           # Options: "vit_b", "vit_l", "vit_h"
  checkpoint: "default"         # Or path to a custom checkpoint
  points_per_side: 32           # For automatic segmentation
  conf_threshold: 0.8           # Confidence threshold for segments

# Segment Anything Model configuration
segment_anything:
  model_type: "vit_b"           # Options: "vit_b", "vit_l", "vit_h"
  checkpoint: "default"         # Or path to checkpoint
  points_per_side: 32           # Higher values = more segments
  pred_iou_thresh: 0.8          # Minimum predicted IoU for a mask
  stability_score_thresh: 0.95  # Minimum stability score
  device: "cuda"
  optimize: true

This configuration approach provides control over:

1. Model selection: Choose between different SAM variants (vit_b, vit_l, vit_h)
2. Segmentation density: Control how many segments to generate
3. Quality thresholds: Set confidence thresholds for acceptable segments
4. Performance settings: Configure for different deployment environments
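
As an illustration, here is a minimal sketch of how such a YAML file could be loaded and handed to the segmenter (assuming PyYAML is installed; the series doesn't show the exact loading code):

# Hypothetical config-loading sketch, assuming PyYAML and the file above.
import yaml
from perception.segmentation.segment_anything import SegmentAnythingModel

with open('config/perception_config.yaml') as f:
    config = yaml.safe_load(f)

# Pass the SAM-specific section straight into the segmenter
seg_config = config.get('segment_anything', {})
segmenter = SegmentAnythingModel(seg_config)
segmenter.initialize()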

One of the most powerful aspects of our segmentation module is how it can enhance detection and tracking results by transforming bounding boxes into precise masks.

Here's how we match segmentation masks to detected/tracked objects:

def _match_masks_to_objects(self, objects: List[Dict], segmentation_results: Dict) -> List[Optional[np.ndarray]]:
    """
    Match segmentation masks to detection/tracking objects.

    Args:
        objects: List of detection/tracking objects
        segmentation_results: Segmentation results with masks, scores, etc.

    Returns:
        List of masks matched to objects (None for unmatched objects)
    """
    masks = segmentation_results.get('masks', [])
    scores = segmentation_results.get('scores', [1.0] * len(masks))
    class_ids = segmentation_results.get('classes', [0] * len(masks))

    # Filter masks by confidence threshold
    threshold = self.config['segment_confidence_threshold']
    valid_masks = [
        (mask, score, class_id)
        for mask, score, class_id in zip(masks, scores, class_ids)
        if score >= threshold
    ]

    if not valid_masks:
        return [None] * len(objects)

    # Unzip into separate sequences for easier processing
    valid_masks, valid_scores, valid_class_ids = zip(*valid_masks)

    # Match masks to each object
    matched_masks = []

    for obj in objects:
        if 'box' not in obj:
            matched_masks.append(None)
            continue

        # Get object properties
        box = obj['box']
        obj_class_id = obj.get('class_id', -1)

        # Calculate IoU between the object box and each mask
        best_iou = 0.0
        best_mask = None

        for i, mask in enumerate(valid_masks):
            # Skip masks with a different class ID if one is specified
            mask_class_id = valid_class_ids[i]
            if obj_class_id != -1 and mask_class_id != -1 and obj_class_id != mask_class_id:
                continue

            # Calculate IoU between box and mask
            iou = self._calculate_box_mask_iou(box, mask)

            # Update best match
            if iou > best_iou:
                best_iou = iou
                best_mask = mask

        # Add the best matching mask
        if best_iou > 0.3:  # Threshold for matching
            matched_masks.append(best_mask)
        else:
            matched_masks.append(None)

    return matched_masks

This matching process creates a one-to-one relationship between detected/tracked objects and segmentation masks, enriching our perception results with precise object boundaries.
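
The helper _calculate_box_mask_iou used above isn't shown in this excerpt; one plausible implementation (an assumption on our part, not the author's exact code) rasterizes the box and computes IoU against the mask pixels:

def _calculate_box_mask_iou(self, box: List[float], mask: np.ndarray) -> float:
    """Approximate IoU between a bounding box and a binary mask (illustrative sketch)."""
    x1, y1, x2, y2 = [int(c) for c in box]
    h, w = mask.shape[:2]
    # Clip the box to the image bounds
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return 0.0

    # Build a binary mask for the box region
    box_mask = np.zeros((h, w), dtype=bool)
    box_mask[y1:y2, x1:x2] = True

    mask_bool = mask > 0
    intersection = np.logical_and(box_mask, mask_bool).sum()
    union = np.logical_or(box_mask, mask_bool).sum()
    return float(intersection / union) if union > 0 else 0.0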

Segmentation can also improve detection results by refining bounding boxes so that they fit objects more precisely:

def _refine_boxes_with_masks(self, objects: List[Dict], masks: List[Optional[np.ndarray]]) -> List[Dict]:
    """
    Refine object bounding boxes using segmentation masks.

    Args:
        objects: List of detection/tracking objects
        masks: List of masks matched to objects

    Returns:
        List of objects with refined boxes
    """
    refined_objects = []

    for obj, mask in zip(objects, masks):
        refined_obj = obj.copy()

        # Skip if there is no mask or no box
        if mask is None or 'box' not in obj:
            refined_objects.append(refined_obj)
            continue

        # Find mask contours to get a refined bbox
        try:
            contours, _ = cv2.findContours(
                mask.astype(np.uint8),
                cv2.RETR_EXTERNAL,
                cv2.CHAIN_APPROX_SIMPLE
            )

            if contours:
                # Find the bounding rect of all contours
                all_points = np.concatenate(contours)
                x, y, w, h = cv2.boundingRect(all_points)

                # Update the box with refined coordinates
                refined_obj['box'] = [float(x), float(y), float(x + w), float(y + h)]

                # Update center and dimensions
                refined_obj['center'] = [float(x + w / 2), float(y + h / 2)]
                refined_obj['dimensions'] = [float(w), float(h)]
        except Exception as e:
            logger.warning(f"Error refining box with mask: {e}")

        refined_objects.append(refined_obj)

    return refined_objects

This refinement process can significantly improve the accuracy of bounding boxes, especially for objects with irregular shapes or partial occlusions.

Segmentation masks also enable more accurate analysis of objects, including better distance estimation from depth maps:

def _calculate_object_distance_with_mask(self, depth_map: np.ndarray, mask: np.ndarray) -> float:
    """
    Calculate the distance to an object using a segmentation mask and depth map.

    Args:
        depth_map: Depth map
        mask: Segmentation mask for the object

    Returns:
        Estimated distance
    """
    # Ensure mask is binary
    binary_mask = mask > 0

    # Check if the mask has any pixels
    if not np.any(binary_mask):
        return 1.0  # Default max distance if mask is empty

    # Extract depth values within the mask
    masked_depth = depth_map[binary_mask]

    # Optionally use the lower part of the object for better distance estimation
    if self.config['use_lower_part_for_distance']:
        # Find the lowest 30% of mask pixels (highest y values)
        y_indices = np.where(binary_mask)[0]  # Row indices
        if len(y_indices) > 10:  # Ensure enough pixels
            threshold_y = np.percentile(y_indices, 70)  # Bottom 30%
            lower_mask = np.zeros_like(binary_mask)
            lower_mask[int(threshold_y):, :] = binary_mask[int(threshold_y):, :]

            if np.any(lower_mask):
                masked_depth = depth_map[lower_mask]

    # Use the specified percentile (median by default)
    distance = float(np.percentile(masked_depth, self.config['distance_percentile']))

    return distance

This approach is more accurate than using the entire bounding box for distance estimation because it considers only the actual object pixels, ignoring background or occluding objects.

Masks also enable more accurate color extraction:

def _extract_color_from_mask(self, frame: np.ndarray, mask: np.ndarray) -> List[int]:
    """
    Extract the dominant color from a masked region of the image.

    Args:
        frame: Input image
        mask: Binary mask

    Returns:
        Dominant color as [R, G, B]
    """
    # Ensure mask is binary
    binary_mask = mask > 0

    # Check if the mask has any pixels
    if not np.any(binary_mask):
        return [0, 0, 0]  # Return black for empty masks

    # Extract the masked region
    masked_pixels = frame[binary_mask]

    # Apply k-means to the masked pixels
    pixels = np.float32(masked_pixels)

    # Use fewer clusters for smaller masks
    k = min(self.config['kmeans_clusters'], len(pixels) // 10 + 1)
    k = max(1, k)  # At least 1 cluster

    if len(pixels) < k:
        # Not enough pixels, use the average
        avg_color = np.mean(pixels, axis=0).astype(int)
        dominant_color = avg_color[::-1].tolist()  # BGR to RGB
    else:
        # Apply k-means
        criteria = self.kmeans_criteria
        _, labels, centers = cv2.kmeans(
            pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS
        )

        # Find the largest cluster
        counts = np.bincount(labels.flatten())
        dominant_cluster = np.argmax(counts)

        # Get the color of the dominant cluster
        dominant_color = centers[dominant_cluster].astype(int)
        dominant_color = dominant_color[::-1].tolist()  # BGR to RGB

    return dominant_color

This method extracts colors only from the actual object pixels, ignoring surrounding content.
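
The kmeans_criteria attribute referenced above isn't defined in this excerpt; a typical OpenCV termination criteria (our assumption, presumably set in the class's __init__) would look like:

# Hypothetical definition: stop after 10 iterations or when centers move less than 1.0
self.kmeans_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)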

The segmentation module plugs into our perception pipeline, enhancing detection, tracking, and depth results:

# pipeline/perception_pipeline.py (excerpt)

def process_frame(self, frame: np.ndarray, timestamp: float = None) -> PerceptionResult:
    """Process a single frame through the perception pipeline."""

    # Create result container
    result = PerceptionResult()
    result.frame_id = self.frame_id
    result.timestamp = timestamp or time.time()

    # 1. Object Detection
    detections = self.detector.detect(frame)
    result.detections = detections

    # 2. Object Tracking (if available)
    if self.tracker:
        tracks = self.tracker.track(detections, frame, result.timestamp)
        result.tracks = tracks

    # 3. Depth Estimation (if available)
    if self.depth_estimator:
        depth_map = self.depth_estimator.estimate_depth(frame)
        result.depth_map = depth_map

    # 4. Segmentation (if available)
    if self.segmenter:
        segmentation_result = None

        # Get objects to segment (tracks if available, otherwise detections)
        objects_to_segment = result.tracks if result.tracks else result.detections

        if self.config.get('use_box_prompts', True) and objects_to_segment:
            # Extract boxes for box-prompted segmentation
            boxes = [obj['box'] for obj in objects_to_segment if 'box' in obj]
            if boxes:
                segmentation_result = self.segmenter.segment_by_boxes(frame, boxes)
        else:
            # Automatic segmentation (no prompts)
            segmentation_result = self.segmenter.segment(frame)

        result.segmentation = segmentation_result

    # 5. Object Fusion (if available)
    if self.fusion:
        objects_to_fuse = result.tracks if result.tracks else result.detections
        fused_objects = self.fusion.fuse_objects(
            objects_to_fuse,
            result.depth_map,
            frame,
            result.segmentation if hasattr(result, 'segmentation') else None
        )
        result.fused_objects = fused_objects

    # Increment frame counter
    self.frame_id += 1

    return result

The pipeline can use either prompt-based segmentation (using bounding boxes from detection/tracking) or automatic segmentation, depending on configuration.

Visualizing segmentation masks is essential for understanding and debugging. Here's how we visualize segmentation results:

def draw_segmentation(self, frame: np.ndarray, segmentation_result: Dict) -> np.ndarray:
    """
    Draw segmentation masks on the frame.

    Args:
        frame: Input frame
        segmentation_result: Segmentation result dictionary

    Returns:
        Frame with segmentation visualization
    """
    if not segmentation_result or 'masks' not in segmentation_result:
        return frame

    # Get masks and scores
    masks = segmentation_result['masks']
    scores = segmentation_result.get('scores', [1.0] * len(masks))

    # Create a copy of the frame
    vis_frame = frame.copy()

    # Apply segmentation mask overlay
    alpha = self.config['segmentation_alpha']

    # Process each mask
    for i, (mask, score) in enumerate(zip(masks, scores)):
        # Skip low-confidence masks
        if score < self.config.get('mask_viz_threshold', 0.5):
            continue

        # Generate a color based on the mask index
        color = self.get_color_by_id(i)

        # Create a colored mask
        colored_mask = np.zeros_like(frame)
        colored_mask[mask > 0] = color

        # Blend with the original frame
        cv2.addWeighted(vis_frame, 1.0, colored_mask, alpha, 0, vis_frame)

        # Find mask contours for the outline
        binary_mask = mask.astype(np.uint8) * 255
        contours, _ = cv2.findContours(
            binary_mask,
            cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE
        )

        # Draw contour outline
        cv2.drawContours(vis_frame, contours, -1, color, 2)

    return vis_frame

This visualization shows each segmented object with a semi-transparent color overlay and outlined contours, making it easy to see the precise object boundaries.

Visualization of detected objects with precise segmentation masks, colored by object ID.

Here's a standalone example of how to use the segmentation module:

import colorsys

import cv2
import numpy as np
from perception.segmentation.segment_anything import SegmentAnythingModel

# Create segmenter
segmenter = SegmentAnythingModel({
    'model_type': 'vit_b',
    'points_per_side': 32
})

# Load an image
image = cv2.imread('test_image.jpg')

# Automatic segmentation
seg_result = segmenter.segment(image)
print(f"Found {len(seg_result['masks'])} segments")

# Visualize masks
alpha = 0.5  # Transparency for the mask overlay
vis_image = image.copy()

for i, mask in enumerate(seg_result['masks']):
    # Generate a color for this mask
    color_id = i % 20  # Cycle through 20 colors
    hue = color_id / 20
    color = [int(c * 255) for c in colorsys.hsv_to_rgb(hue, 0.8, 1.0)]
    color = [color[2], color[1], color[0]]  # Convert to BGR

    # Create a colored mask
    mask_colored = np.zeros_like(image)
    mask_colored[mask > 0] = color

    # Overlay on the image
    cv2.addWeighted(vis_image, 1.0, mask_colored, alpha, 0, vis_image)

    # Draw contour
    contours, _ = cv2.findContours(
        mask.astype(np.uint8),
        cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE
    )

    cv2.drawContours(vis_image, contours, -1, color, 2)

# Display result
cv2.imshow('Segmentation', vis_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

And here's how to use segmentation with bounding boxes from detection:

# First, run detection
from perception.detection.yolo_detector import YOLODetector
detector = YOLODetector({'model_size': 's'})
detections = detector.detect(image)

# Extract bounding boxes
boxes = [det['box'] for det in detections]

# Run box-prompted segmentation
seg_result = segmenter.segment_by_boxes(image, boxes)

# Visualize the results
vis_image = image.copy()

for i, (det, mask) in enumerate(zip(detections, seg_result['masks'])):
    # Get object info
    x1, y1, x2, y2 = [int(c) for c in det['box']]
    class_name = det['class_name']

    # Generate a color
    color = [int(c * 255) for c in colorsys.hsv_to_rgb(i / len(detections), 0.8, 1.0)]
    color = [color[2], color[1], color[0]]  # BGR

    # Draw segmentation mask
    mask_colored = np.zeros_like(image)
    mask_colored[mask > 0] = color
    cv2.addWeighted(vis_image, 1.0, mask_colored, 0.5, 0, vis_image)

    # Draw label
    cv2.putText(vis_image, class_name, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

cv2.imshow('Box-prompted Segmentation', vis_image)
cv2.waitKey(0)

Segmentation is computationally intensive, especially with larger models. Here are some optimizations to consider:

1. Model selection: SAM offers different model sizes (vit_b, vit_l, vit_h)
2. Prompt-driven segmentation: Use box prompts instead of automatic segmentation
3. Resolution control: Process at reduced resolution for faster inference
4. Batch processing: Group multiple prompts for parallel processing
5. Skip frames: Apply segmentation every N frames for video (see the frame-skipping sketch after the configuration examples below)

These options can be configured based on the deployment environment:

# High-accuracy configuration (desktop/server)
config_high_accuracy = {
    'model_type': 'vit_h',
    'device': 'cuda',
    'points_per_side': 64
}

# Balanced configuration (laptop)
config_balanced = {
    'model_type': 'vit_l',
    'device': 'cuda',
    'points_per_side': 32
}

# Resource-constrained configuration (edge device)
config_resource_constrained = {
    'model_type': 'vit_b',
    'device': 'cuda',
    'points_per_side': 16,
    'use_box_prompts': True
}
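
For video, the frame-skipping optimization mentioned above can be as simple as reusing the previous masks between segmentation updates. A rough sketch of the idea (our own illustration; segment_every_n and video_frames are hypothetical names):

# Rough frame-skipping sketch; segment_every_n and video_frames are illustrative names.
segment_every_n = 5
last_seg_result = None

for frame_idx, frame in enumerate(video_frames):
    if frame_idx % segment_every_n == 0 or last_seg_result is None:
        # Run full segmentation only every N frames
        last_seg_result = segmenter.segment(frame)
    # Reuse the most recent masks for intermediate frames
    seg_result = last_seg_result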

One of the most powerful aspects of our perception system is the combination of segmentation and depth estimation, which enables 3D scene understanding at the pixel level.

Here's how we can calculate per-pixel 3D coordinates for a segmented object:

def calculate_object_3d_points(self,
                               depth_map: np.ndarray,
                               mask: np.ndarray,
                               camera_intrinsics: np.ndarray) -> np.ndarray:
    """
    Calculate 3D points for pixels within a segmentation mask.

    Args:
        depth_map: Depth map
        mask: Binary segmentation mask
        camera_intrinsics: Camera intrinsic matrix

    Returns:
        Nx3 array of 3D points in camera coordinates
    """
    # Ensure mask is binary
    binary_mask = mask > 0

    # Get pixel coordinates within the mask
    y_indices, x_indices = np.where(binary_mask)

    # Skip if the mask is empty
    if len(y_indices) == 0:
        return np.array([])

    # Extract depth values for these pixels
    depth_values = depth_map[binary_mask]

    # Camera parameters
    fx = camera_intrinsics[0, 0]
    fy = camera_intrinsics[1, 1]
    cx = camera_intrinsics[0, 2]
    cy = camera_intrinsics[1, 2]

    # Calculate 3D coordinates
    points_3d = np.zeros((len(x_indices), 3))

    for i in range(len(x_indices)):
        x = x_indices[i]
        y = y_indices[i]
        z = depth_values[i]

        # Convert from image coordinates to camera coordinates
        points_3d[i, 0] = (x - cx) * z / fx  # X
        points_3d[i, 1] = (y - cy) * z / fy  # Y
        points_3d[i, 2] = z                  # Z

    return points_3d

This creates a dense 3D point cloud for each segmented object, which can be used for applications like object modeling, 3D reconstruction, and precise collision detection.
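
For example, with an assumed pinhole intrinsic matrix (illustrative values, not from the series), the call might look like this:

# Illustrative camera intrinsics; fx, fy, cx, cy here are made-up example values.
camera_intrinsics = np.array([
    [800.0,   0.0, 640.0],
    [  0.0, 800.0, 360.0],
    [  0.0,   0.0,   1.0]
])

# 'analyzer' stands in for whatever object owns calculate_object_3d_points;
# depth_map and mask come from the depth and segmentation modules.
points_3d = analyzer.calculate_object_3d_points(depth_map, mask, camera_intrinsics)
print(f"Object point cloud has {len(points_3d)} points")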

While segmentation is powerful, it's important to understand its limitations:

1. Computational cost: More resource-intensive than detection or tracking
2. Ambiguity: Some object boundaries may be inherently ambiguous
3. Fine details: May struggle with very fine structures such as thin objects
4. Temporal consistency: Masks may flicker or change between frames
5. Classification: SAM provides masks but not semantic classes

For applications requiring both precise segmentation and classification, consider combining SAM with a classifier or using detection results to label the segments.
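
A simple way to get class labels is to reuse the detector's labels for box-prompted masks, since segment_by_boxes returns masks in the same order as the input boxes. A small sketch of this idea (our own illustration):

# Label SAM masks with the detector's class names (box-prompted order is preserved).
boxes = [det['box'] for det in detections]
seg_result = segmenter.segment_by_boxes(image, boxes)

labeled_segments = []
for det, mask, score in zip(detections, seg_result['masks'], seg_result['scores']):
    labeled_segments.append({
        'mask': mask,
        'score': score,
        'class_name': det['class_name'],     # Borrow the label from detection
        'class_id': det.get('class_id', -1)
    })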

There are several ways to extend and improve the segmentation module:

1. Temporal consistency: Track segments across frames
2. Semantic integration: Combine with semantic segmentation for class labels
3. Hierarchical segmentation: Handle part-whole relationships
4. Interactivity: Add refinement based on user feedback
5. Domain adaptation: Fine-tune for specific environments

Here's a sketch of how temporal segment tracking could be implemented:

class TemporalSegmentTracker:
    """Track segmentation masks across frames."""

    def __init__(self, config=None):
        self.config = config or {}
        self.prev_segments = None
        self.segment_tracks = {}  # ID -> track history
        self.next_id = 0

    def track_segments(self,
                       current_masks: List[np.ndarray],
                       current_frame: np.ndarray) -> List[Dict]:
        """
        Track segmentation masks across frames.

        Args:
            current_masks: Current frame's segment masks
            current_frame: Current frame

        Returns:
            List of tracked segments with IDs
        """
        # First-frame case
        if self.prev_segments is None:
            tracked_segments = []
            for mask in current_masks:
                segment_id = self.next_id
                self.next_id += 1

                tracked_segments.append({
                    'mask': mask,
                    'id': segment_id
                })

            self.prev_segments = tracked_segments
            return tracked_segments

        # Match current masks to previous masks
        matches = self._match_masks(self.prev_segments, current_masks, current_frame)

        # Create tracked segments with preserved IDs where possible
        tracked_segments = []
        for i, mask in enumerate(current_masks):
            if i in matches:
                # Matched to a previous segment
                prev_idx = matches[i]
                segment_id = self.prev_segments[prev_idx]['id']
            else:
                # New segment
                segment_id = self.next_id
                self.next_id += 1

            tracked_segments.append({
                'mask': mask,
                'id': segment_id
            })

            # Update track history
            if segment_id not in self.segment_tracks:
                self.segment_tracks[segment_id] = []

            # Calculate centroid
            y_indices, x_indices = np.where(mask)
            if len(x_indices) > 0:
                centroid = (np.mean(x_indices), np.mean(y_indices))
                self.segment_tracks[segment_id].append(centroid)

        # Update previous segments
        self.prev_segments = tracked_segments

        return tracked_segments

    def _match_masks(self,
                     prev_segments: List[Dict],
                     current_masks: List[np.ndarray],
                     current_frame: np.ndarray) -> Dict[int, int]:
        """
        Match current masks to previous masks.

        Args:
            prev_segments: Previous frame's tracked segments
            current_masks: Current frame's masks
            current_frame: Current frame

        Returns:
            Dictionary mapping current mask index to previous segment index
        """
        # Calculate IoU matrix
        iou_matrix = np.zeros((len(current_masks), len(prev_segments)))

        for i, curr_mask in enumerate(current_masks):
            for j, prev_segment in enumerate(prev_segments):
                prev_mask = prev_segment['mask']
                iou = self._calculate_mask_iou(curr_mask, prev_mask)
                iou_matrix[i, j] = iou

        # Use the Hungarian algorithm for optimal assignment
        from scipy.optimize import linear_sum_assignment
        row_indices, col_indices = linear_sum_assignment(-iou_matrix)  # Negative for maximum IoU

        # Create matches dictionary
        matches = {}

        for i, j in zip(row_indices, col_indices):
            # Ensure IoU is above threshold
            if iou_matrix[i, j] >= 0.5:  # IoU threshold
                matches[i] = j

        return matches

    def _calculate_mask_iou(self, mask1: np.ndarray, mask2: np.ndarray) -> float:
        """Calculate IoU between two masks."""
        intersection = np.logical_and(mask1, mask2).sum()
        union = np.logical_or(mask1, mask2).sum()

        if union == 0:
            return 0.0

        return intersection / union

This tracker would maintain consistent IDs for segments across frames, enabling temporal analysis and motion tracking at the mask level.

In this article, we've explored the segmentation module of our computer vision perception system. We've seen how segmentation:

1. Enhances perception with pixel-precise object boundaries
2. Integrates with detection and tracking to refine bounding boxes
3. Combines with depth estimation for 3D scene understanding
4. Can be implemented efficiently using the Segment Anything Model

Segmentation represents the pinnacle of 2D scene understanding, providing the most detailed representation of objects possible in the image plane. When combined with the other modules in our perception system (detection, tracking, and depth estimation), it enables a comprehensive understanding of the visual world.

The complete perception system we've built throughout this series is a strong foundation for applications in robotics, autonomous vehicles, surveillance, augmented reality, and many other fields that require machines to understand the visual world.


