ADAS & Autonomous Driving

Role

Senior Product Manager

Company

diconium

Domain

Automotive Software / ADAS

Timeline

November 2023 – Present

Strategic framing

The labeled dataset is the product.

The product

Labeled datasets — structured, versioned, quality-validated
Delivered to spec, with defined quality thresholds and release cycles
Continuously improved based on model performance feedback

The customers

ML engineers on the client (OEM) side
Their success metric: model accuracy in real-world conditions
Their dependency: the quality and volume of data I deliver

My role

Own end-to-end delivery — from raw data to validated dataset
Translate OEM requirements into supplier-ready specifications
Every prioritisation call, process improvement, and supplier decision serves one outcome: data the model can learn from

Context & Challenge

Autonomous driving doesn't happen without data, and not just any data. The machine learning models that power Advanced Driver Assistance Systems (ADAS) are only as reliable as the labeled datasets they are trained on. At diconium, I own the end-to-end delivery of these labeled datasets, coordinating a global network of suppliers to ensure the right data reaches the right place, at the right quality and volume, on time.

The core challenge is one of precision at scale: ML models must perceive the environment around a vehicle with enough accuracy to make safety-critical decisions. That means the data feeding those models — camera frames, LiDAR point clouds, radar sequences — must be labeled with surgical accuracy, governed by strict technical requirements, and continuously refined as the model evolves.

How the ML Pipeline Works — and Where I Fit

The labeling workflow begins long before a single image is annotated. An AI Developer and a Requirements Specialist jointly define the technical specifications for what the model needs to learn. From there, data is collected from specially equipped test vehicles, then selected, pre-processed, and sent for labeling. My role sits across this entire chain: translating engineering requirements into supplier briefs, validating quality at each stage, and ensuring datasets are delivered within agreed timelines and thresholds.

Data preparation involves several decision points: choosing between real-world recorded data and synthetic data, applying filters to extract the most relevant edge cases, anonymizing faces and license plates, and generating pre-labels — AI-assisted detections that reduce manual effort while still requiring human validation. Each of these steps involves trade-offs between cost, speed, and accuracy, and I work closely with engineering and supplier teams to navigate them.
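As a rough illustration of the pre-labeling trade-off described above, the sketch below routes AI-generated pre-labels by confidence: high-confidence detections go to a quick human confirmation pass, the rest go to full manual annotation. The `PreLabel` fields, the queue names, and the 0.85 threshold are assumptions made for this example, not the project's actual tooling or values.

```python
from dataclasses import dataclass


@dataclass
class PreLabel:
    frame_id: str
    label: str
    confidence: float  # model confidence in [0, 1]


def triage(pre_labels, review_threshold=0.85):
    """Split pre-labels into two human work queues.

    Every pre-label still gets human attention; the threshold only
    decides how much effort each one receives.
    """
    queues = {"confirm": [], "annotate": []}
    for pl in pre_labels:
        key = "confirm" if pl.confidence >= review_threshold else "annotate"
        queues[key].append(pl)
    return queues


# Hypothetical batch of pre-labels
batch = [
    PreLabel("f001", "speed_limit_50", 0.97),
    PreLabel("f002", "warning_generic", 0.62),
    PreLabel("f003", "no_entry", 0.85),
]
queues = triage(batch)
print({name: [pl.frame_id for pl in items] for name, items in queues.items()})
```

Raising the threshold shifts work toward full annotation (slower, more expensive, safer); lowering it does the opposite.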

Case Study: Traffic Sign Recognition

Traffic sign recognition is one of the most technically demanding labeling tasks in the ADAS space — and a strong example of how I approach complex, multi-stakeholder challenges. Signs vary by country, condition, and context. A speed limit sign partially obscured by a branch, a faded warning sign, or a sign that applies only to other road users — all of these require precise, rule-based annotation that accounts for local legislation and real-world degradation.

The labeling output for traffic signs is not a single annotation but a structured set of properties: sign type and action category (restriction, warning, directional), text and numeric values (e.g. speed limit), relevance to the ego vehicle, and visible defects such as faded paint or graffiti. Getting this right requires highly trained annotators, well-maintained tooling, and clear, unambiguous requirement documentation, all of which I actively manage.
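One way to picture that structured set of properties is as a small record type with a basic consistency check. This is a minimal sketch: the class name, field names, and the single validation rule are hypothetical and do not reproduce the real annotation schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ActionCategory(Enum):
    RESTRICTION = "restriction"
    WARNING = "warning"
    DIRECTIONAL = "directional"


@dataclass
class TrafficSignLabel:
    sign_type: str                   # e.g. "speed_limit"
    action_category: ActionCategory
    relevant_to_ego: bool            # does the sign apply to the ego vehicle?
    value: Optional[str] = None      # text or numeric content, e.g. "50"
    defects: List[str] = field(default_factory=list)  # e.g. ["faded", "graffiti"]

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the record passes."""
        issues = []
        if not self.sign_type:
            issues.append("sign_type is empty")
        if self.sign_type == "speed_limit" and not (self.value or "").isdigit():
            issues.append("speed_limit needs a numeric value")
        return issues


ok = TrafficSignLabel("speed_limit", ActionCategory.RESTRICTION, True, "50", ["faded"])
bad = TrafficSignLabel("speed_limit", ActionCategory.RESTRICTION, False)
print(ok.problems())
print(bad.problems())
```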

On the model side, an effective architecture for traffic sign recognition uses two ML models in parallel: one for detection (identifying where a sign is in the camera frame, drawing on input from multiple sensors to resolve position accurately) and one for classification (determining exactly what kind of sign it is, trained on image crops rather than full-frame data). The labeled data I deliver feeds both. Bridging that technical architecture with the operational reality of supplier workflows — different tools, formats, and expertise levels across teams — is where a lot of the product management work happens.
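To show how one labeled frame can feed both models, the sketch below derives classifier training samples (image crop plus class) from full-frame annotations that carry a bounding box. The box format `(x, y, width, height)`, the dictionary keys, and the toy frame are assumptions for illustration only.

```python
def crop(frame, box):
    """Cut a (x, y, width, height) box out of a frame stored as rows of pixels."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def build_classifier_samples(frame, sign_annotations):
    """Turn full-frame detection labels into (crop, class) pairs."""
    return [(crop(frame, ann["box"]), ann["sign_class"]) for ann in sign_annotations]


# A tiny 4x6 "frame"; each pixel is just a number here.
frame = [[col + 10 * row for col in range(6)] for row in range(4)]
annotations = [{"box": (1, 1, 2, 2), "sign_class": "speed_limit_50"}]

samples = build_classifier_samples(frame, annotations)
print(samples)
```

The detection model would consume the full frame together with the boxes, while the classification model would see only the crops and their classes.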

Iteration, Feedback Loops & Prioritization

One of the core rhythms of this work is the model-data feedback loop. After each training cycle, engineers evaluate the model's performance against agreed threshold values. Where the model underperforms — producing false positives, missing detections, or misclassifying edge cases — I work with the development team to prioritize which data gaps to address first, whether through targeted extraction from the existing data bank or by commissioning new collection campaigns.
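The prioritization step can be sketched as ranking scenarios by how far they fall short of the agreed thresholds. The scenario names, metric names, and numbers below are invented for the example; the real evaluation uses whatever metrics and thresholds the engineering team has agreed.

```python
def rank_data_gaps(metrics, thresholds):
    """Return (scenario, metric, shortfall) tuples, largest shortfall first.

    `metrics` maps scenario -> {metric_name: value}; higher is better.
    """
    gaps = []
    for scenario, values in metrics.items():
        for name, target in thresholds.items():
            shortfall = target - values.get(name, 0.0)
            if shortfall > 0:
                gaps.append((scenario, name, round(shortfall, 3)))
    return sorted(gaps, key=lambda g: g[2], reverse=True)


# Hypothetical thresholds and per-scenario results
thresholds = {"recall": 0.95, "precision": 0.97}
metrics = {
    "night_rain": {"recall": 0.88, "precision": 0.96},
    "daylight_urban": {"recall": 0.97, "precision": 0.98},
    "faded_signs": {"recall": 0.91, "precision": 0.975},
}
gaps = rank_data_gaps(metrics, thresholds)
for gap in gaps:
    print(gap)
```

In practice the ranking is only one input: supplier capacity and delivery commitments also shape which gap is addressed first.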

This requires constant stakeholder alignment: engineering teams have performance targets, suppliers have capacity constraints, and the product roadmap has delivery commitments. My job is to hold all of those in balance — making data-driven decisions about where to invest labeling effort, and communicating clearly across all parties so that the iteration cycle stays efficient and no team is blocked.

Tooling & Operational Efficiency

Beyond the content of the data, I focus heavily on the health of the system that produces it. Using Databricks and SQL, I monitor dataset performance metrics and track delivery status across suppliers. I have also introduced Python-based automation for documentation workflows — reducing manual reporting overhead and giving the team faster visibility into bottlenecks.
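The kind of delivery-status query described above might look like the following. This sketch uses an in-memory SQLite database as a stand-in for Databricks, and the `deliveries` table, its columns, and the figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (supplier TEXT, frames_delivered INTEGER, frames_accepted INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        ("supplier_a", 1000, 990),
        ("supplier_a", 500, 480),
        ("supplier_b", 800, 760),
    ],
)

# Acceptance rate per supplier, lowest first, so problem areas surface at the top.
query = """
    SELECT supplier,
           SUM(frames_delivered) AS delivered,
           ROUND(100.0 * SUM(frames_accepted) / SUM(frames_delivered), 1) AS accepted_pct
    FROM deliveries
    GROUP BY supplier
    ORDER BY accepted_pct ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```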

These operational improvements aren't just about efficiency. In a domain where the quality of labeled data directly impacts vehicle safety, having reliable, real-time visibility into what's been delivered, validated, and integrated is a product requirement in itself.

Impact & Measurable Outcomes

Across the lifetime of this engagement, the data pipeline has matured from a fragile, high-effort process into a scalable, repeatable system. Below are the outcomes I can point to directly.

~2 weeks ↓ from 1–2 months

New country onboarding cycle time

Reusable specification templates per region
Supplier pre-qualification process established
Streamlined QA protocols for new geographies

>98% ↑ from 94–95%

Data quality rate — sustained across all conditions

Consistent across weather, country, and driving context
Directly reduces risk of model degradation
Maintained at scale as volumes more than doubled

>2× ↑ labeled volume

Dataset throughput growth

Volumes more than doubled from initial delivery levels
Achieved through smarter supplier coordination
Pre-labeling automation reduced manual effort per frame

Automated ↓ manual overhead

Technical specification deployment

Helper documents now generated via Python script
Previously created by hand for every spec release
Eliminated a recurring time cost and class of human error
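A minimal sketch of the idea behind generating helper documents from a structured specification rather than writing them by hand. The spec fields and the output layout are assumptions for this example; the actual script and document format are not shown here.

```python
def render_helper_doc(spec):
    """Render a plain-text helper sheet from a structured spec dictionary."""
    lines = [
        f"Specification: {spec['name']} (v{spec['version']})",
        f"Region: {spec['region']}",
        "",
        "Classes to label:",
    ]
    # Sorting keeps the output stable between releases, which makes diffs readable.
    for cls in sorted(spec["classes"]):
        lines.append(f"  - {cls}")
    return "\n".join(lines)


# Hypothetical spec release
spec = {
    "name": "traffic_signs",
    "version": "1.2",
    "region": "DE",
    "classes": ["warning", "speed_limit", "no_entry"],
}
doc = render_helper_doc(spec)
print(doc)
```

Because the document is derived from the spec, a new spec release only requires re-running the script, which removes the copy-and-paste step where errors used to creep in.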

How I Prioritise & Key Decisions

In a pipeline that feeds safety-critical systems, prioritisation isn't just a planning exercise — it's an active, ongoing negotiation between engineering ambition, supplier capacity, and delivery reality. Two recurring challenges shaped how I approach this work.

Managing last-minute requests without breaking the system

Recurring challenge — pre-test-drive urgency

Before critical test drives, stakeholders frequently raise urgent, unplanned requests to label a specific dataset — disrupting live supplier workflows and threatening broader delivery timelines.

My approach

Supplier relationships first. Years of trust-based, respectful communication mean suppliers are genuinely willing to absorb urgent work — not because they have to, but because the relationship makes it worth it.
A limited-offer policy for stakeholders. Each stakeholder can raise a last-minute request a maximum of two to three times per year. This creates a natural forcing function: if you only have two or three "emergency cards," you think carefully before playing one.
The outcome: labeling teams are protected from chaotic, unpredictable schedules — and when a situation is genuinely critical, stakeholders still get what they need, on time.

Distributing work to match team strengths, not just capacity

Ongoing — raw data allocation across supplier teams

Not all labeling suppliers are equal, and treating them as interchangeable leads to quality inconsistencies, missed deadlines, and supplier frustration. I distribute raw data deliberately, matching each batch to the team best placed to handle it.

My approach

Allocate by strength, not just availability. Each supplier team has distinct areas of expertise — domain knowledge, annotation speed, familiarity with specific sign categories or geographies. Data is routed accordingly.
Milestone awareness at all times. Allocation decisions are always made in the context of the overall program timeline — balancing individual supplier throughput against the broader delivery commitments.
The outcome: better quality at the team level, fewer corrections, and a supplier ecosystem that performs consistently because each team is set up to succeed.
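The allocation logic above can be approximated as "best skill match among suppliers that still have capacity". The sketch below is a simplified illustration; supplier names, skill tags, capacities, and the greedy strategy are all assumptions, and real allocation also weighs milestones and relationships that a script cannot capture.

```python
def allocate(batches, suppliers):
    """Assign each batch to the supplier with the best skill match and free capacity.

    Returns {batch_id: supplier_name or None}. Ties are broken by supplier name.
    """
    remaining = {name: info["capacity"] for name, info in suppliers.items()}
    plan = {}
    for batch in batches:
        candidates = [
            (len(batch["needs"] & info["skills"]), name)
            for name, info in suppliers.items()
            if remaining[name] >= batch["frames"]
        ]
        if not candidates:
            plan[batch["id"]] = None  # nobody has capacity; needs a manual decision
            continue
        _, best = max(candidates)
        plan[batch["id"]] = best
        remaining[best] -= batch["frames"]
    return plan


# Hypothetical suppliers and raw-data batches
suppliers = {
    "team_a": {"skills": {"DE", "speed_limits"}, "capacity": 1000},
    "team_b": {"skills": {"FR", "warning_signs"}, "capacity": 1500},
}
batches = [
    {"id": "b1", "needs": {"DE", "speed_limits"}, "frames": 800},
    {"id": "b2", "needs": {"FR"}, "frames": 600},
    {"id": "b3", "needs": {"DE"}, "frames": 500},
]
plan = allocate(batches, suppliers)
print(plan)
```

Note that batch `b3` lands with `team_b` despite the weaker skill match, because `team_a` is already at capacity. That is exactly the strength-versus-throughput tension the allocation has to balance.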

Phase

Data Collection

Real-world data captured by specially equipped test vehicles using cameras, LiDAR, and radar sensors. Collection parameters are scoped per technical requirements — covering specific environments, weather conditions, and traffic scenarios identified as priority gaps in the ML model.

Camera · LiDAR · Radar | Requirements-driven | Test vehicle fleet

Phase

Data Post-Processing

Raw sensor data is converted, filtered, and prepared for labeling. Key steps include frame-to-sequence conversion, preparation of 2D reference images for 3D LiDAR data, generation of AI-assisted pre-labels to reduce manual effort, and mandatory anonymization of all faces and license plates in compliance with privacy regulations.

Anonymization | Pre-labeling (AI-assisted) | Format conversion | GDPR compliance

Phase

Data Labeling

Skilled annotators label all objects, participants, and environmental elements according to strict technical specifications — including traffic signs, road markings, pedestrians, and vehicles. Outputs are structured annotation files (ALF format) validated against quality thresholds before dataset sign-off.

Traffic signs | 2D / 3D annotation | Quality validation | Supplier-managed

Phase

ML Model Training

Labeled datasets are fed into two parallel ML models: a detection model trained on full-frame camera data to locate and position objects in the scene, and a classification model trained on image crops to identify object types with precision. Both models are trained iteratively until performance meets agreed threshold values.

Detection model | Classification model | Threshold-based QA | Iterative training

Phase

Deployment — Embedded Environment

Trained models are deployed into a controlled embedded environment simulating the onboard compute of a production vehicle. Performance is benchmarked against defined parameters: detection accuracy, false-positive rates, and response latency under various simulated conditions.

Embedded compute | Controlled environment | Performance benchmarking

Phase

Review & Targeted Improvements

Engineering teams analyse model performance, identify failure cases and underperforming scenarios, and prioritise the next data labeling sprint to address gaps. Additional targeted datasets are commissioned, labeled, and re-introduced into the training cycle.

Failure analysis | Data gap identification | Sprint prioritisation

Phase

Re-deployment & Approval Gate ✓

Updated models are re-deployed in the embedded environment and re-evaluated against all performance parameters. If thresholds are met, the model receives formal approval to proceed to real-world vehicle testing. If not, the improvement cycle repeats until approval is granted.

Approval gate | Go / No-go decision | Stakeholder sign-off
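The approval gate amounts to checking every benchmarked parameter against its limit and returning a go or no-go decision. The sketch below assumes three metrics taken from the deployment phase (detection accuracy, false-positive rate, latency); the metric keys and all numeric limits are hypothetical.

```python
def approval_gate(results, limits):
    """Return ("go" | "no-go", list of failed checks).

    `limits` maps metric -> (kind, bound) where kind is "min" or "max".
    """
    failed = []
    for metric, (kind, bound) in limits.items():
        value = results[metric]
        ok = value >= bound if kind == "min" else value <= bound
        if not ok:
            failed.append(metric)
    return ("go" if not failed else "no-go", failed)


# Hypothetical limits and one evaluation run
limits = {
    "detection_accuracy": ("min", 0.95),
    "false_positive_rate": ("max", 0.02),
    "latency_ms": ("max", 50),
}
decision, failed = approval_gate(
    {"detection_accuracy": 0.96, "false_positive_rate": 0.03, "latency_ms": 41},
    limits,
)
print(decision, failed)
```

Returning the list of failed checks alongside the decision gives the next improvement cycle a concrete starting point.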

Phase

Test Vehicle Deployment & Route Testing

Approved models are deployed onto instrumented test vehicles and validated on pre-defined real-world driving routes — not a closed track, but a real traffic environment with agreed parameters. This phase captures edge cases and real-world variance that simulated environments cannot replicate.

Real-world scenario | Pre-defined route | Instrumented vehicles | Live traffic conditions

Phase

Feedback Collection & Improvement Backlog

Driving test data is analysed, edge cases are documented, and a prioritised improvement backlog is compiled. Findings feed directly back into data collection requirements, closing the loop and initiating the next cycle with a sharper focus on real-world gaps.

Edge case analysis | Backlog refinement | Requirements update

↻ Cycle repeats — continuous improvement loop

ADAS Data Labeling | Traffic Sign Recognition | ML Pipeline | Supplier Management | SQL · Databricks | Python Automation | Stakeholder Alignment | Automotive
