OCR Support in Office 365

Last year at Ignite, Microsoft announced that OCR capability using “computer vision technology” would be coming to Office 365! This stirred up a lot of excitement, as it is a feature many people have wanted for years. Microsoft posted an article at techcommunity about the new advancement in Intelligent Search using OCR, which can be found here. Well, I am happy to say it exists, and it sure is awesome!

Supported Types

Per the techcommunity article above, the supported types are “bmp”, “png”, “jpeg”, “jpg”, “gif”, “tif”, “tiff”, “raw”, and also “arw”, “cr2”, “crw”, “erf”, “mef”, “mrw”, “nef”, “nrw”, “orf”, “pef”, “rw2”, “rw1”, “sr2”.
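If you want to check ahead of time which files in a library are even eligible for OCR, a small helper like the one below may be handy. It is only a sketch: the extension list is copied from the article as quoted above, while the function name `is_ocr_candidate` and the constant name are my own and not part of any SharePoint API.

```python
from pathlib import PurePath

# Extensions listed in the techcommunity article, as quoted above.
SUPPORTED_OCR_EXTENSIONS = frozenset({
    "bmp", "png", "jpeg", "jpg", "gif", "tif", "tiff", "raw",
    "arw", "cr2", "crw", "erf", "mef", "mrw", "nef", "nrw",
    "orf", "pef", "rw2", "rw1", "sr2",
})


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the file extension is in the supported list (hypothetical helper)."""
    # Compare case-insensitively and without the leading dot.
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_OCR_EXTENSIONS


if __name__ == "__main__":
    for name in ("HelpDesk.jpg", "Scan.PDF", "photo.NEF"):
        print(name, is_ocr_candidate(name))
```

This only looks at the file name, so it says nothing about whether SharePoint has actually processed the image yet.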

To test this capability, I uploaded a mockup design for a Help Desk Add-In I built for Office 365 to a document library.

SupportTicket

The jpg looks like this

HelpDesk

You’ll notice that inside the image, 4 tickets have been mocked up with a bit of bacon ipsum. The ticket titles are “Trouble with SharePoint”, “Computer not Working”, “No Trouble Just Saying Hi” and “Trouble with SharePoint” again. My plan for testing OCR is to search for the contents of those titles using SharePoint search!

Test: Search for Ticket values

My first test is to search for one of the tickets labeled “Trouble with SharePoint” and look at the results! Not only does the search match the values in the image, it also picks up the other ticket values.

 

SearchResult

It should be no surprise that searching for part of a ticket description returns the image in the search results as well. I decided to search for the first sentence of the description in the mockup. Here is the result!

SearchResult2

Does it work in Modern too?

In my previous examples I was using SharePoint Classic Search. If you were wondering if it works in Modern search as well, you bet!

modernsearch

How does it work?

My guess is that OCR in SharePoint uses Azure Media Services to convert the text inside image files into searchable digital text. I am assuming this because of the naming convention SharePoint uses for the field described next.

Whenever SharePoint finds text within your images, the values get stored on the item in a field called MediaServiceOCR. Take a look at the JSON response from querying for the list item.

MediaServiceOCR
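To make the lookup concrete, here is a minimal sketch of how the MediaServiceOCR field could be requested through the SharePoint REST API and read out of the JSON response. The site URL, the library title “Documents”, the item ID and the sample payload are all made up for illustration, and authentication is left out entirely, so treat it as a starting point rather than a finished client.

```python
import json
from urllib.parse import quote


def build_item_url(site_url: str, list_title: str, item_id: int) -> str:
    """Build a REST URL that selects the file name and the OCR field of one item."""
    # Single quotes inside an OData string literal are escaped by doubling them.
    title = quote(list_title.replace("'", "''"), safe="")
    return (
        f"{site_url.rstrip('/')}/_api/web/lists/getbytitle('{title}')"
        f"/items({item_id})?$select=FileLeafRef,MediaServiceOCR"
    )


def extract_ocr_text(response_body: str):
    """Return the MediaServiceOCR value, or None if the item has no such value."""
    payload = json.loads(response_body)
    # "odata=verbose" responses wrap the item in a "d" object; other modes do not.
    item = payload.get("d", payload)
    return item.get("MediaServiceOCR")


if __name__ == "__main__":
    # Hypothetical site and item, for illustration only.
    print(build_item_url("https://contoso.sharepoint.com/sites/helpdesk", "Documents", 1))

    # Hypothetical response body, shaped like a verbose OData reply.
    sample = json.dumps({"d": {"FileLeafRef": "HelpDesk.jpg",
                               "MediaServiceOCR": "Trouble with SharePoint"}})
    print(extract_ocr_text(sample))
```

Returning None when the field is missing is deliberate, since not every item ends up with a MediaServiceOCR value.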

I wasn’t able to find a default managed property for this field but that isn’t a huge problem because SharePoint automatically creates a crawled property called ows_MediaServiceOCR. Using this crawled property, I can create whatever managed property mappings that I want.

Crawled.png
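Once ows_MediaServiceOCR is mapped to a managed property, you can target the OCR text directly in a query. The sketch below builds a Search REST API URL with a KQL property restriction. The managed property name `OCRText` is hypothetical (use whatever name you mapped the crawled property to), and the site URL is made up as well.

```python
from urllib.parse import quote


def build_ocr_search_url(site_url: str, phrase: str,
                         managed_property: str = "OCRText") -> str:
    """Build a search URL that matches a phrase against one managed property."""
    # KQL property restriction with an exact phrase: Property:"some phrase"
    kql = f'{managed_property}:"{phrase}"'
    # Single quotes inside the REST string literal are escaped by doubling them.
    kql = kql.replace("'", "''")
    return (
        f"{site_url.rstrip('/')}/_api/search/query"
        f"?querytext='{quote(kql, safe='')}'"
        f"&selectproperties='Title,Path'"
    )


if __name__ == "__main__":
    # Hypothetical site URL and managed property name, for illustration only.
    print(build_ocr_search_url("https://contoso.sharepoint.com",
                               "Trouble with SharePoint"))
```

Keep in mind that a new managed property mapping only takes effect after the content has been crawled again, so results may not show up right away.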

 

Some Comments

PDFs that have already been through OCR have native support in Office 365. However, scanned documents saved as PDFs currently aren’t generating values in the MediaServiceOCR column. I’ve been testing this functionality with no success so far.

I have also noticed some inconsistencies with the OCR functionality. I tested this on multiple libraries, and on some items the MediaServiceOCR value was never created (it simply doesn’t exist). I’ll keep you posted when I find more information about this.

 

 

So your hub site settings aren’t showing up in Office 365?

Hub sites have been released to Office 365 Targeted Release tenants. With all of the excitement, everyone has been jumping into PowerShell and adding hub sites to their tenancies!

Register-SPOHubSite https://mysite.sharepoint.com/sites/Hubsite

However, some have noticed that even though the PowerShell commands appear to work, the navigation is missing or the hub site settings don’t appear in the UI. The reason is simple, and thanks to @dmadelung for reminding me… you need to make sure that your release preferences in the Office 365 admin center are set to “Targeted release for everyone”.

ReleasePreference

If you’d like more tips and tricks on hub sites, head over to Marc Anderson’s post!

Thanks and happy hubbing!