Microsoft’s Project Oxford Gives Developers Access To Facial, Image And Speech-Recognition APIs

Microsoft quietly launched a set of new machine-learning APIs in beta under the “Project Oxford” moniker yesterday. These new APIs allow developers to add face detection and recognition features to their apps, as well as speech recognition that can understand the speaker’s intent. The project also features a vision API for automatically categorizing images and creating smart image crops that keep the subject at the center of the cropped image.

These three services are now available as a public beta. There’s also a fourth API that lets developers build custom language understanding into their applications.

Previously, Microsoft offered a set of somewhat similar APIs under the Bing brand. Bing offers speech and translator APIs, for example, but for the most part, these Bing services are more basic and search-focused than the Project Oxford tools.

To showcase Project Oxford’s Face API, Microsoft built How-Old.net. This site lets you upload photos of faces and then automatically estimates how old the person in each photo is. It’s a nice demo, and it works somewhat well, but it also relies on a number of other machine-learning services. Right out of the box, the Face API offers face detection in images, face verification to check whether two faces belong to the same person, and the ability to find similar-looking faces.

The Speech API, as the name implies, offers speech-recognition services for speech-to-text conversion, as well as a text-to-speech service that turns written text into audio. More interestingly, though, it also features intent recognition. The idea here is to allow applications to understand the speaker’s intent (order a burrito, cancel a flight, etc.). This is driven by the project’s Language Understanding Intelligent Service.

Using the image API, developers can categorize images to filter out adult content, for example, or simply to apply tags to images automatically or group them into clusters. The API also features optical character recognition capabilities and lets developers crop images automatically by recognizing what’s important in an image and keeping that in the center of the photo as you crop it.
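
To get a feel for what keeping the subject in the center involves, here is a minimal Python sketch. It is not Microsoft’s implementation and does not call the Project Oxford API. It assumes some detector has already returned a bounding box for the subject, and the function name `centered_crop` and its parameters are hypothetical, chosen only for illustration.

```python
def centered_crop(image_size, subject_box, crop_size):
    """Return a crop rectangle (left, top, right, bottom) centered on the subject.

    image_size  -- (width, height) of the full image, in pixels
    subject_box -- (left, top, right, bottom) of the detected subject
                   (assumed to come from a separate detection step)
    crop_size   -- (width, height) of the desired crop
    """
    img_w, img_h = image_size
    crop_w, crop_h = crop_size
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop must fit inside the image")

    # Center of the subject's bounding box.
    left, top, right, bottom = subject_box
    center_x = (left + right) // 2
    center_y = (top + bottom) // 2

    # Place the crop so the subject's center is the crop's center.
    x0 = center_x - crop_w // 2
    y0 = center_y - crop_h // 2

    # Shift the crop back inside the image if it spills over an edge.
    x0 = max(0, min(x0, img_w - crop_w))
    y0 = max(0, min(y0, img_h - crop_h))

    return (x0, y0, x0 + crop_w, y0 + crop_h)


if __name__ == "__main__":
    print(centered_crop((1000, 800), (600, 300, 800, 500), (400, 400)))
    # → (500, 200, 900, 600)
```

Note that when the subject sits near an edge, the crop is pushed back inside the image, so the subject ends up off-center. A real service presumably also has to decide what counts as the subject in the first place, which is the hard part this sketch leaves out.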

Even if you’re not a developer, you can give some of these features a try here.



from TechCrunch http://feedproxy.google.com/~r/Techcrunch/~3/H12gJD3wN1A/