Anno.Ai CEO Steven Witt Interview


I first met Steven Witt a few years ago at a DataTribe event. Steven had recently sold his previous company, Onyara, and helped launch DataTribe as a co-founder. I got to interact with him on a variety of cyber investing and cyber product topics. After helping get DataTribe going with its first group of companies, he expressed interest in being a CEO again and creating a company to make it easier to train computer vision models. I was familiar with this pain point, as our investment in Pixm relied heavily on proprietary curation of logo data to detect phishing attacks. Steven recently launched Anno.Ai and closed his first round of investment, in which we participated. Steven agreed to answer a variety of questions about computer vision training and Anno.Ai's approach.
How does Anno.Ai help a development team make a better computer vision application or product?
Machine learning is in a transition phase as enterprises begin to operationalize the wealth of research advancements coming out of academia, government, and Silicon Valley. One of the challenges is that few tools exist to handle the scale of data needed to operationalize machine learning in the enterprise. Thus, we think of Anno.Ai as an AI Ops approach to facilitate the development, deployment, and iteration of machine learning models.

This seems useful both to speed up the actual development process and to help diagnose bugs that could emerge once an application is live. Do you have any examples of how teams who did this type of work in-house ran into issues when they got to scale?
I have an example of a team that couldn't get to operational scale prior to leveraging Anno. They were part of an enterprise whose proprietary data was needed to build a training dataset. Due to intellectual property concerns, they needed to keep the data on-prem for annotation by their own internal subject matter experts. As a first attempt, they used the standard open source libraries and Python scripts. That approach was fine for a handful of people building a training dataset across a few hundred images, but it couldn't scale, within the desired timeline, to the hundreds of thousands of images needed to achieve the model accuracy they were looking for. Once they switched to the Anno framework, their internal team of a few hundred produced the training data they needed within three days.
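For concreteness, the script-based workflow described above, with one JSON label file per image, might look something like this sketch (the `save_annotation` helper, file names, and box schema are all hypothetical, not Anno's format):

```python
import json
import tempfile
from pathlib import Path

def save_annotation(label_dir, image_name, boxes):
    """Write bounding-box labels for one image as a JSON sidecar file.

    `boxes` is a list of dicts like
    {"label": ..., "x": ..., "y": ..., "w": ..., "h": ...}.
    """
    record = {"image": image_name, "boxes": boxes}
    out_path = Path(label_dir) / (Path(image_name).stem + ".json")
    out_path.write_text(json.dumps(record, indent=2))
    return out_path

# Annotate a single image (hypothetical file name and label).
label_dir = tempfile.mkdtemp()
path = save_annotation(label_dir, "xray_001.png",
                       [{"label": "cavity", "x": 120, "y": 80, "w": 40, "h": 40}])
```

A setup like this is fine for a few annotators and a few hundred images, but it offers no shared work queue, review workflow, or progress tracking, which is exactly where it breaks down at enterprise scale.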

I've used the web interface to upload and tag family photos and it was very easy to use and get results. How often are you running into computer vision teams who are distributed or have even outsourced the tagging of images? 
Outsourced tagging has been the approach across industry to achieve the trifecta of scale, cost, and timeliness for the initial wave of image-based machine learning models. This is absolutely the correct economic approach when you are working on autonomous driving. We are focused on enterprises building computer vision models where this outsourcing approach is not applicable. These use cases cluster around sensitive, proprietary data and annotations that require specialized domain knowledge to create. A good example is a dentist who annotates a cavity on an X-ray.

Do customers have any concerns about proprietary, or perhaps end-user private, photos being sent to the cloud? Do you have an on-prem solution on your roadmap?
Absolutely, yes. This is a critical issue. It's one of the reasons that we designed Anno to be fully functional in a container for on-prem deployment.

Anno.Ai supports some extremely large image sizes, making it easy to work with medical, satellite, and many other high-resolution data sources. Can you tell us a bit about the applications you are trying to enable?
I would broadly describe the applications along the lines of enabling machines to become more efficient and effective assistants for humans. Let's take satellite imaging. This is an area where there is an overwhelming amount of data increasingly available as more and more satellites are launched with ever-improving resolution. As a naval imagery analyst: would you rather spend your days scanning imagery of the ocean looking for a specific ship, or would you rather use a machine learning model to track the ship, so that you can focus on generating analytic insights about changes in the ship's appearance or operating behavior?

How soon do you expect services like Ring to allow users to tag images as entities so we can have better logging with alerts like "Jerry's Car is now in the driveway" and "Jerry is in the kitchen" as well as "Unknown person at front door"? 
I am not sure about Ring or Nest, but hopefully this is a good nudge for some of our more entrepreneurial friends who have built solutions like this for their own homes! I think that personalized machine learning models retrained on your own data is where we will head with consumer tech.

What type of roadmap can you speak about publicly? Does Anno.Ai have the underlying architecture to handle video annotation?
In the next couple of months, we will start supporting basic video and audio annotation through our website. From a technical perspective, a basic video annotation approach for machine learning is to treat every few frames in a video as a standalone image. This will work for things like facial recognition in security video. However, we are also working with customers to support far more complex video-based machine learning use cases with a much longer-term development roadmap.
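The frame-sampling idea Steven describes, treating every few frames as a standalone image, can be sketched in a few lines (a rough illustration with assumed function names, not Anno.Ai's implementation):

```python
def frames_to_annotate(total_frames, stride=10):
    """Indices of the frames to label when every `stride`-th frame of a
    video is treated as a standalone image for annotation."""
    return list(range(0, total_frames, stride))

def propagate_labels(total_frames, labels_by_frame):
    """Expand sparse labels (applied only to sampled frames) to every
    frame: each frame inherits the most recent labeled frame's value."""
    full, current = {}, None
    for i in range(total_frames):
        if i in labels_by_frame:
            current = labels_by_frame[i]
        full[i] = current
    return full

# Label frames 0 and 30 of a 60-frame clip, sampling every 30 frames.
sampled = frames_to_annotate(60, stride=30)
labels = propagate_labels(60, {0: "ship", 30: "no_ship"})
```

The stride trades labeling effort against temporal resolution: a large stride means fewer images to annotate but coarser tracking, which is why this simple approach suits slow-changing scenes like security video better than fast-moving ones.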

Where can readers go to learn more? 
If you have imagery or audio machine learning training data that could be described as sensitive, proprietary, or regulated, the best way to learn more is to send us an email: .