Cobli
Jul 2024 — Jan 2026
A dashcam product where the algorithm wasn't ours to change
Cobli is a Brazilian fleet telematics company — GPS trackers and dashcams installed in vehicles, with a SaaS platform that turns the raw data into operational intelligence for logistics and transport companies.
I joined as BU leader of the dashcam product and PM of the AI squad, responsible for the features built on top of the dashcam hardware. The dashcam itself ran an embedded computer vision algorithm supplied by the hardware vendor — detecting risky driving events like unsafe following distance, drowsiness, and driver distraction. We had no ability to modify the underlying firmware. The algorithm was a black box we depended on.
The alerts were generating noise, not trust
The false positive rate on the embedded algorithm was high enough that customers had stopped trusting the detections entirely. Operations teams were receiving hundreds of alerts per day. Safety managers were manually reviewing every flagged video to decide whether an event was real. Some accounts were marking the majority of alerts as irrelevant.
The product's core value proposition — AI-powered safety monitoring — had effectively become a manual review queue. The more alerts the algorithm produced, the more human time it consumed, and the less value it delivered.
And we couldn't fix the source. Every path that involved modifying the embedded algorithm required going through the hardware vendor — a timeline of months per iteration.
Starting with measurement, not a solution
The first thing I did was establish baseline precision through manual video labeling — building a dataset of real events and false positives that let us actually quantify what we were dealing with. Without that, every conversation about the problem was impressionistic.
I then built a prioritization framework combining three factors for each event type: precision baseline (how bad was the false positive rate?), engineering feasibility (what approaches were available to improve it?), and customer-reported importance (what did interview data and event review rates tell us about which detections customers cared most about?).
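A framework like this reduces to a weighted score per event type. The sketch below is purely illustrative: the event names, factor scores, and weights are hypothetical placeholders, not Cobli's actual data.

```python
# Illustrative three-factor prioritization score.
# Event types, factor scores (0-5), and weights are hypothetical.

EVENT_TYPES = {
    # (false-positive severity, engineering feasibility, customer importance)
    "unsafe_distance": (5, 4, 5),
    "drowsiness": (3, 2, 4),
    "distraction": (4, 3, 3),
}

WEIGHTS = (0.4, 0.3, 0.3)  # relative weight of each factor

def priority_score(factors, weights=WEIGHTS):
    """Weighted sum across the three prioritization factors."""
    return sum(f * w for f, w in zip(factors, weights))

# Rank event types by score, highest priority first
ranked = sorted(EVENT_TYPES,
                key=lambda e: priority_score(EVENT_TYPES[e]),
                reverse=True)
```

With these example numbers, unsafe distance ranks first, mirroring the outcome described below; the value of the framework is that the ranking becomes an explicit, debatable artifact rather than a gut call.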
The unsafe distance model came out first on all three dimensions — the lowest precision, the highest complaint rate in customer interviews, and technically approachable through depth-inferring models available in the cloud. We started there.
The squad ran experiments with multiple approaches and iterated through model versions. The labeled dataset we'd built for baseline measurement became the training and evaluation set for everything that followed.
A cloud validator between the firmware and the customer
The architecture we arrived at: a cloud-side validation layer that sits between the embedded algorithm and the customer-facing interface. Events flagged by the dashcam firmware are sent to the cloud validator before they're surfaced to customers. The cloud model re-evaluates each event and filters out the ones it classifies as false positives.
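The data flow can be sketched as a simple gate between firmware and customer; the names, event fields, and threshold below are assumptions for illustration, not the production code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Event:
    """A risky-driving event flagged by the dashcam firmware."""
    event_id: str
    event_type: str
    video_url: str

def surface_events(firmware_events: Iterable[Event],
                   cloud_model: Callable[[Event], float],
                   threshold: float = 0.5) -> List[Event]:
    """Re-evaluate each firmware-flagged event in the cloud; only
    events the model scores as likely real reach the customer UI."""
    return [event for event in firmware_events
            if cloud_model(event) >= threshold]
```

The key property of this design is that the firmware is untouched: the vendor's black box keeps emitting events at its usual rate, and the cloud layer absorbs the iteration speed the firmware path couldn't offer.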
The cloud validator reached 95% precision and 75% recall on the unsafe distance model — meaning the events it passed through to customers were overwhelmingly real, and it caught the majority of the false positives the firmware was generating.
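For concreteness, these two figures come straight from confusion-matrix counts; the counts below are hypothetical numbers chosen only to reproduce the reported 95%/75%.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Standard precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of what was surfaced, how much was real
    recall = tp / (tp + fn)     # of what was real, how much was surfaced
    return precision, recall

# Hypothetical counts that would yield the reported figures:
# 285 true positives, 15 false positives, 95 false negatives.
p, r = precision_recall(tp=285, fp=15, fn=95)
# p = 285 / 300 = 0.95, r = 285 / 380 = 0.75
```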
A side effect we planned for from the start: the labeling process generated a structured dataset of real driving events that didn't exist before. That dataset is the foundation for training future firmware-embedded models — the path to eventually owning what we were dependent on, rather than filtering around it indefinitely.
The numbers that followed
- ~70% reduction in false positives surfaced to customers on the validated event types
- Customer review behavior shifted — accounts that had been ignoring alerts resumed active safety management
- Labeled dataset established as the foundation for the firmware model roadmap — the long-term dependency reduction path
The UX changes to show customers only cloud-validated events were in rollout at the time I left. The labeled dataset is actively used by the ML team for future model development.
What I specifically did
- Initiated and ran the video labeling exercise to establish baseline precision — the measurement that made every subsequent decision concrete
- Built the event-type prioritization framework and gathered input from Key Account CSMs to inform the decision
- Defined the cloud validator architecture as the strategic response to the firmware dependency — the choice to build on top rather than waiting for vendor iteration
- Worked as PM of the squad through the model iterations that reached high precision and recall
Other significant work in this role
- Facial recognition precision improvement — Improved driver identification precision from ~80% to 98% in production. Root cause was poor image quality, identified through iterative dataset labeling and testing across different facial recognition models. Solution was to detect and filter low-quality images before inference. A churning enterprise account stabilized after rollout.
- UX-retention correlation analysis — Self-initiated cross-squad project: ran a 12-month cohort analysis correlating product feature usage combinations with net revenue retention. Found meaningful correlations, proposed a "breadth of usage" metric adopted company-wide. UX teams gained roadmap space in the following planning cycle.
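The low-quality-image filter described in the facial recognition bullet above can be sketched with a common blur heuristic, variance of the Laplacian: sharp images have strong local intensity changes, so a low variance flags a blurry frame. The heuristic choice, threshold, and pure-Python pixel grid here are illustrative assumptions, not Cobli's production implementation.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a grayscale image
    (a list of rows of pixel intensities). Low variance ~ blurry."""
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def passes_quality_gate(gray, blur_threshold=100.0):
    """Reject frames too blurry to identify a driver reliably;
    the threshold is a hypothetical tuning parameter."""
    return laplacian_variance(gray) >= blur_threshold
```

The point of a gate like this is that it sits in front of the recognition model: filtering unusable frames before inference raises precision without retraining or replacing the model itself.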