Microsoft has developed a new image captioning algorithm that exceeds human accuracy in some limited tests. The AI system has been used to update the company’s assistant app for the visually impaired, Seeing AI, and will soon be integrated into other Microsoft products such as Word, Outlook, and PowerPoint. There it will be used for tasks such as generating alt text for images – a particularly important function for improving accessibility.
“Ideally, everyone would include alt text for all images in documents, on the web, on social media, as this allows people who are blind to access the content and participate in the conversation,” said Saqib Shaikh, a software engineering lead on Microsoft’s AI team, in a press release. “But, alas, people don’t. So, there are several apps that use image captioning to fill in alt text when it’s missing.”
These apps include Microsoft’s Seeing AI, which the company first launched in 2017. Seeing AI uses computer vision to describe the world seen through a smartphone camera for visually impaired users. It can identify household items, read and scan text, describe scenes, and even identify friends. It can also be used to describe images in other apps, including email clients, social media apps, and messaging apps like WhatsApp.
Microsoft’s new image captioning algorithm will significantly improve the performance of Seeing AI, since it can not only identify objects but also more accurately describe the relationships between them. The algorithm can look at an image and report not only which elements and objects it contains (e.g., “a person, a chair, an accordion”) but also how they interact (e.g., “a person is sitting in a chair and playing an accordion”). Microsoft says the algorithm is twice as good as its previous image captioning system, which had been in use since 2015.
The algorithm, described in a preprint paper published in September, achieved the highest scores on an image captioning benchmark known as “nocaps.” nocaps is an industry-leading benchmark for image captioning, although it has its own limitations.
The nocaps benchmark consists of more than 166,000 human-generated captions describing some 15,100 images taken from the Open Images Dataset. These images span a variety of settings, from sports to vacation snapshots, food photos, and more. (You can get a sense of the mix of images and captions by exploring the nocaps dataset.) Algorithms are tested on their ability to create captions for these images that match those written by humans.
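Benchmarks like nocaps work by comparing a model’s candidate caption against the pool of human-written reference captions for an image. The actual nocaps metrics (such as CIDEr and SPICE) are considerably more sophisticated, but a minimal unigram-recall sketch – with an illustrative function name and example captions of our own – conveys the basic idea of word-overlap scoring:

```python
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference-caption words recovered by the candidate.

    A deliberately simplified stand-in for the real nocaps metrics
    (CIDEr, SPICE), which weight n-grams and semantic content.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    matched = sum(min(count, cand[word]) for word, count in ref.items())
    return matched / sum(ref.values())

reference = "a person is sitting in a chair and playing an accordion"

# A caption that captures how the objects relate overlaps more with the
# human reference than one that merely lists the objects.
relational = unigram_recall("a person plays an accordion in a chair", reference)
listing = unigram_recall("person chair accordion", reference)
print(relational > listing)  # the relational caption scores higher
```

Even this toy metric illustrates Agrawal’s caveat below: a caption can score well on word overlap without demonstrating any real understanding of the scene.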
However, it is important to note that the nocaps benchmark captures only a small part of the complexity of image captioning as a general task. Although Microsoft states in a press release that its new algorithm “describes images as well as people do,” this is true only with respect to the very limited subset of images contained in nocaps.
As Harsh Agrawal, one of the benchmark’s creators, said via email: “Surpassing human performance on nocaps is not an indicator that image captioning is a solved problem.” Agrawal noted that the metrics used to evaluate performance on nocaps “only roughly correlate with human preferences” and that the benchmark itself “only covers a small percentage of all possible visual concepts.”
“As is the case with most benchmarks, [the] nocaps benchmark is only a rough indicator of model performance on the task,” Agrawal said. “Surpassing humans on nocaps in no way indicates that AI systems surpass humans at understanding images.”
This issue, in which performance on a specific benchmark is extrapolated to performance on the underlying task more generally, is a common one when AI capabilities are overstated. In fact, Microsoft has been criticized by researchers in the past for making similar claims about its algorithms’ ability to understand the written word.
However, image captioning is a task that has seen great improvements in recent years thanks to artificial intelligence, and Microsoft’s algorithms are undoubtedly state of the art. In addition to being integrated into Word, Outlook, and PowerPoint, the new image captioning AI will also be available as a standalone model through Microsoft’s Azure cloud AI platform.
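Microsoft has not detailed the exact API surface the new model will ship under. As a point of reference, Azure’s existing Computer Vision REST API already exposes a “describe” operation for image captioning; the sketch below builds (but does not send) a request against that operation. The endpoint hostname, key, and image URL are placeholders, and whether the new model will surface through this same route is an assumption:

```python
import json
import urllib.request

def build_describe_request(endpoint: str, key: str, image_url: str):
    """Build (without sending) a request to the Computer Vision
    'describe' operation. The v3.1 path matches Azure's public
    Computer Vision REST API; whether the new captioning model
    ships under the same route is an assumption."""
    url = f"{endpoint}/vision/v3.1/describe?maxCandidates=1"
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,  # Azure resource key
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_describe_request(
    "https://example.cognitiveservices.azure.com",  # placeholder resource
    "YOUR_KEY",
    "https://example.com/accordion.jpg",
)
print(req.full_url)
```

A real call would pass this request to `urllib.request.urlopen` and parse the JSON response, which for the describe operation includes candidate captions with confidence scores.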