<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>azure form recognizer &#8211; Sibeesh Passion</title>
	<atom:link href="https://sibeeshpassion.com/tag/azure-form-recognizer/feed/" rel="self" type="application/rss+xml" />
	<link>https://sibeeshpassion.com</link>
	<description>My passion towards life</description>
	<lastBuildDate>Tue, 24 Aug 2021 17:21:24 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>/wp-content/uploads/2017/04/Sibeesh_Passion_Logo_Small.png</url>
	<title>azure form recognizer &#8211; Sibeesh Passion</title>
	<link>https://sibeeshpassion.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Azure Form Recognizer and Microsoft Flow to Search Scanned PDF Content in SharePoint Online</title>
		<link>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/</link>
					<comments>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/#disqus_thread</comments>
		
		<dc:creator><![CDATA[SibeeshVenu]]></dc:creator>
		<pubDate>Thu, 05 Mar 2020 13:11:24 +0000</pubDate>
				<category><![CDATA[Azure]]></category>
		<category><![CDATA[Cognitive Services]]></category>
		<category><![CDATA[Office 365]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[Azure Form Recognize]]></category>
		<category><![CDATA[azure form recognizer]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[flow]]></category>
		<category><![CDATA[microsoft flow]]></category>
		<category><![CDATA[scanned pdf content to search]]></category>
		<category><![CDATA[scanned pdf to searchable pdf]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[sharepoint online]]></category>
		<guid isPermaLink="false">https://sibeeshpassion.com/?p=14005</guid>

					<description><![CDATA[Introduction SharePoint is a huge platform and sometimes we may have to do some tricks to achieve our requirements. I was in a need to make my scanned PDF content to be searchable in the SharePoint online, which I have already achieved in a way, you can see that article here. Please consider this article as the second part of the above-mentioned article. Here in this article, we will make the Scanned PDF and images contents to be searchable in SharePoint online using the new Azure Form Recognizer and Microsoft Flow. Please keep reading. Background In our previous article, we [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Introduction</h2>



<p>SharePoint is a huge platform, and sometimes we need a few tricks to meet our requirements. I needed to make the content of my scanned PDFs searchable in SharePoint Online, which I had already achieved in one way in an earlier article. Please consider this article the second part of that one. Here we will make the content of scanned PDFs and images searchable in SharePoint Online using the new Azure Form Recognizer and Microsoft Flow. Please keep reading.</p>



<h2 class="wp-block-heading">Background</h2>



<p>In our <a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/">previous article</a>, we learned how to make scanned PDFs searchable by their content using OCR with a third-party tool, AquaForest. AquaForest is a really cool product and there are many things you can do with it, but it is expensive. As I was using it only for OCR, it was not worth the money I spent. So I had to find a different option to satisfy my requirements, and that is how Azure Form Recognizer comes into this story. If you have ever used Azure Computer Vision, you know that we use its OCR to read the content of image files; unfortunately, that doesn&#8217;t work well with PDF files. Azure Form Recognizer removes that limitation.</p>



<p>Azure Form Recognizer is part of the Cognitive Services family. If you are new to Cognitive Services, please feel free to <a href="https://sibeeshpassion.com/category/azure/cognitive-services/">read some of my articles on the topic</a>.</p>



<h2 class="wp-block-heading">Update the Document Library List</h2>



<p>As you know, SharePoint search works on the content of the list and its metadata. So my idea here is to create a new column, <strong>Metadata</strong>, in the Document Library and then save the Azure Form Recognizer result to this field. The search will then match the content of this list entry, that is, our scanned PDF will show up in the search results. Don&#8217;t worry if it sounds complex; in fact, it is quite easy.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="498" height="435" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library.png" alt="" class="wp-image-14022" srcset="/wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library.png 498w, /wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library-300x262.png 300w, /wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library-425x371.png 425w" sizes="(max-width: 498px) 100vw, 498px" /><figcaption>Metadata column in the Document Library</figcaption></figure></div>



<p>I am using a separate column here to get full control over it: I can enable multiple lines of text and allow unlimited length.</p>



<h2 class="wp-block-heading">Creating the Flow to make the scanned PDF/Image contents to be searchable</h2>



<h3 class="wp-block-heading">Setting up Azure Form Recognizer</h3>



<p>Now we need to create an Azure Form Recognizer resource, which is as simple as creating any other service in Azure. Go to the Azure Portal, search for Form Recognizer, and create one.</p>



<h3 class="wp-block-heading">Train your Form Recognizer Model </h3>



<p>Now it is time to train our model so that the Form Recognizer can give us the appropriate output. You can do this step either with the <a href="https://westeurope.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console">Web UI Console given by Microsoft</a> or with cURL.</p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>If you are running the commands in Windows 10, run it with Bash or use the Invoke-WebRequest in PowerShell.</p><cite>Where to run commands?</cite></blockquote>



<p>Using the web console is very easy, so I will use that. Before we run it, we need to upload our sample documents to Azure Blob Storage. Let&#8217;s do that now.</p>



<h4 class="wp-block-heading">Configure Azure Storage Account and Upload Blob</h4>



<p>Creating a storage account is really straightforward: search for Storage Account in the portal and fill in the form as you wish. Once the account is created, go to the resource and click Containers under the Blob service menu. We need a container to hold the sample blob files. I created one named &#8220;models&#8221; and uploaded 2 PDF files to it.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="702" height="323" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Storage-Account-with-Blobs.png" alt="" class="wp-image-14008" srcset="/wp-content/uploads/2020/03/Storage-Account-with-Blobs.png 702w, /wp-content/uploads/2020/03/Storage-Account-with-Blobs-300x138.png 300w, /wp-content/uploads/2020/03/Storage-Account-with-Blobs-425x196.png 425w" sizes="(max-width: 702px) 100vw, 702px" /><figcaption>Storage Account with Blobs</figcaption></figure></div>



<p>As described in the Microsoft documentation, we now need a SAS URL with our container name in it. Go to the Settings menu, click Shared access signature, and then click Generate SAS and connection string.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="813" height="646" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Generate-SAS-and-Connection-string.png" alt="" class="wp-image-14009" srcset="/wp-content/uploads/2020/03/Generate-SAS-and-Connection-string.png 813w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-300x238.png 300w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-768x610.png 768w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-425x338.png 425w" sizes="(max-width: 813px) 100vw, 813px" /><figcaption>Generate SAS and Connection string</figcaption></figure></div>



<p>Now copy the Blob service SAS URL and add your container name right after <em><strong>windows.net/</strong></em>, so the final SAS URL looks like the one below.</p>



<pre class="wp-block-code"><code>https://mlfit.blob.core.windows.net/models?sv=2019-02-02&amp;ss=bfqgt&amp;srt=shco&amp;sp=rwdlhfacup&amp;se=2020-03-05T16:51:40Z&amp;st=2020-03-05T08:51:40Z&amp;spr=https&amp;sig=MSN0%2BhGHDGSDGW7jH2tOTGwh8I%2Bld%2BvcYAYTFGDSGH6mUyzsCAQXVoo%3D</code></pre>



<p>Here the &#8220;<strong>models</strong>&#8221; is my container name. </p>
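
<p>If you would rather not edit the URL by hand, the same step can be sketched in a few lines of Python. This is only an illustration: the account name and the token below are placeholders, and <code>add_container_to_sas_url</code> is a helper name I made up for this example.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def add_container_to_sas_url(service_sas_url: str, container: str) -> str:
    """Insert the container name into the path of a Blob service SAS URL."""
    parts = urlsplit(service_sas_url)
    # The Blob service SAS URL has an empty path, so the container becomes
    # the whole path. The signed query string is copied over untouched.
    path = "/" + container.strip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Placeholder values: replace them with your own account name and SAS token.
service_url = "https://myaccount.blob.core.windows.net/?sv=2019-02-02&sig=PLACEHOLDER"
print(add_container_to_sas_url(service_url, "models"))
```

<p>The query string is never decoded or re-encoded here, so the signature in the token stays valid.</p>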



<h4 class="wp-block-heading">Train your model</h4>



<p>Now go to the console and fill in all the details as below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="782" height="654" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Train-Model-With-Web-Console.png" alt="" class="wp-image-14010" srcset="/wp-content/uploads/2020/03/Train-Model-With-Web-Console.png 782w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-300x251.png 300w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-768x642.png 768w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-425x355.png 425w" sizes="(max-width: 782px) 100vw, 782px" /><figcaption>Train Model With Web Console</figcaption></figure></div>



<p>Here the resource name is the name of your Form Recognizer resource, and Ocp-Apim-Subscription-Key is the key of that service; you can use either key1 or key2. In the Request body, set the source property to your SAS URL, then hit the Send button. If everything goes well, you should get output like the one below.</p>
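
<p>For readers who prefer a script over the console, here is a rough sketch of what the same request is made of. It only builds the URL, headers, and body and does not send anything. The host name pattern and the <code>v1.0-preview</code> path are my assumptions, so please copy the real request URL from the console; <code>build_train_request</code> is a made-up helper name.</p>

```python
import json


def build_train_request(resource_name, subscription_key, sas_url, prefix=None):
    """Return the URL, headers, and JSON body for a Train Model call."""
    # Assumption: host pattern and API path. Check them against the console.
    url = (
        f"https://{resource_name}.cognitiveservices.azure.com"
        "/formrecognizer/v1.0-preview/custom/train"
    )
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,  # key1 or key2
    }
    body = {"source": sas_url}
    # The source filter is optional; leave it out to train on every blob.
    if prefix is not None:
        body["sourceFilter"] = {"prefix": prefix}
    return url, headers, json.dumps(body)


# Placeholder values, for illustration only.
url, headers, body = build_train_request(
    "my-form-recognizer", "YOUR_KEY", "https://myaccount.blob.core.windows.net/models?sig=PLACEHOLDER"
)
print(url)
print(body)
```

<p>You can pass the three returned values to any HTTP client as a POST request.</p>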



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="575" height="587" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output.png" alt="" class="wp-image-14011" srcset="/wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output.png 575w, /wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output-294x300.png 294w, /wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output-425x434.png 425w" sizes="(max-width: 575px) 100vw, 575px" /><figcaption>Form Recognize Train Model Output</figcaption></figure></div>



<p>Please make a note of the <strong>modelId</strong> from the result, as we will use it in our Flow. If you get the error &#8220;No valid blobs found in the specified Azure blob container&#8221;, it is most probably because of the source filter applied in the Request body. Just remove that filter and hit the Send button again.</p>



<pre class="wp-block-code"><code>{
  "error": {
    "code": "2024",
    "innerError": {
      "requestId": "78df3a9b-ae2c-47a7-900c-8fa78f5a5a15"
    },
    "message": "No valid blobs found in the specified Azure blob container."
  }
}</code></pre>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="518" height="128" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied.png" alt="" class="wp-image-14013" srcset="/wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied.png 518w, /wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied-300x74.png 300w, /wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied-425x105.png 425w" sizes="(max-width: 518px) 100vw, 518px" /><figcaption>Request body without source filter applied</figcaption></figure></div>
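
<p>If you script the training call, the same workaround can be sketched like this, assuming the error arrives as the JSON shown above and that the request body carries the filter under a <code>sourceFilter</code> property. Both helper names are made up for this example.</p>

```python
import json


def should_retry_without_filter(response_text: str) -> bool:
    """Return True when the response is the 'No valid blobs found' error."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    # Code 2024 is the error code shown in the sample response above.
    return isinstance(error, dict) and error.get("code") == "2024"


def drop_source_filter(request_body: dict) -> dict:
    """Return a copy of the request body without the source filter."""
    return {key: value for key, value in request_body.items() if key != "sourceFilter"}


error_text = '{"error": {"code": "2024", "message": "No valid blobs found in the specified Azure blob container."}}'
request_body = {"source": "SAS_URL_PLACEHOLDER", "sourceFilter": {"prefix": "invoices/"}}

if should_retry_without_filter(error_text):
    request_body = drop_source_filter(request_body)
print(request_body)
```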



<p>You can also try out the other API calls like &#8220;Get Models&#8221;, &#8220;Get Model&#8221;, etc. As they are not relevant to this article, I am not going to try them here. One thing to note is that <strong>the more you train, the more accurate the result will be</strong>.</p>



<h3 class="wp-block-heading">Set up the Flow</h3>



<p>If you are not sure how to create a flow, please look at the &#8220;Create a Flow&#8221; section <strong><a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#create-a-flow">here</a></strong>. Once you have the basic flow with the trigger &#8220;When a file is created&#8221;, we can initialize the variables that we are going to use later.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="464" height="307" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/String-Variables-in-Flow.png" alt="" class="wp-image-14016" srcset="/wp-content/uploads/2020/03/String-Variables-in-Flow.png 464w, /wp-content/uploads/2020/03/String-Variables-in-Flow-300x198.png 300w, /wp-content/uploads/2020/03/String-Variables-in-Flow-425x281.png 425w" sizes="(max-width: 464px) 100vw, 464px" /><figcaption>String Variables in Flow</figcaption></figure></div>



<p>One variable stores the content type of the file we get, and the other stores the result of the Form Recognizer Analyze Form action. Let&#8217;s move on to the next step.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="1024" height="492" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1024x492.png" alt="" class="wp-image-14017" srcset="/wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1024x492.png 1024w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-300x144.png 300w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-768x369.png 768w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1536x738.png 1536w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-425x204.png 425w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales.png 1639w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Set the content type to string variables</figcaption></figure></div>



<p>As you can see, building a Flow is much like programming: we can use if, if-else, switch, and many more constructs. Try these out in your Flow when you get time.</p>
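
<p>To make the comparison with programming concrete, here is roughly what the content type step would look like as code. This is a sketch and not the Flow itself: I am assuming the decision is made on the file extension, and <code>content_type_for</code> is a made-up name.</p>

```python
import os

# Content types for PDF, JPEG, and PNG files.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(filename):
    """Return the content type for a file name, or None if it is not listed."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension)


print(content_type_for("Scanned-Invoice.PDF"))
print(content_type_for("notes.docx"))
```

<p>Returning <code>None</code> for anything else plays the role of the default branch of a switch: such files can simply be skipped.</p>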



<p>So now we have a dynamic value in our contenttype variable. Let&#8217;s add the Analyze Form task: search for &#8220;Form Recognizer&#8221; and then select the Analyze Form action.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="472" height="315" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-in-Flow.png" alt="" class="wp-image-14018" srcset="/wp-content/uploads/2020/03/Analyze-Form-in-Flow.png 472w, /wp-content/uploads/2020/03/Analyze-Form-in-Flow-300x200.png 300w, /wp-content/uploads/2020/03/Analyze-Form-in-Flow-425x284.png 425w" sizes="(max-width: 472px) 100vw, 472px" /><figcaption>Analyze Form in Flow</figcaption></figure></div>



<p>It will now ask you for the key of your Azure Form Recognizer and a connection name. Once you provide those, paste the <strong>modelId</strong> you got from the <em>Train Model API</em> call. In the end, this is how your Analyze Form action will look.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="475" height="167" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-Action.png" alt="" class="wp-image-14019" srcset="/wp-content/uploads/2020/03/Analyze-Form-Action.png 475w, /wp-content/uploads/2020/03/Analyze-Form-Action-300x105.png 300w, /wp-content/uploads/2020/03/Analyze-Form-Action-425x149.png 425w" sizes="(max-width: 475px) 100vw, 475px" /><figcaption>Analyze Form Action</figcaption></figure></div>



<p>You can see that we use the <strong>contenttype</strong> string variable here. Now we can append the Analyze Form result to our <strong>recogoutput</strong> variable. In the next step, we save this value to the Metadata field that we created earlier in the Document Library. Sounds good?</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="618" height="485" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata.png" alt="" class="wp-image-14020" srcset="/wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata.png 618w, /wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata-300x235.png 300w, /wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata-425x334.png 425w" sizes="(max-width: 618px) 100vw, 618px" /><figcaption>Save the Form Recognizer output to Metadata</figcaption></figure></div>
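
<p>The Flow above stores the whole Body of the result. If you ever want to keep only the recognized words, the idea can be sketched as below. Note that this is not what my Flow does, and the sample JSON is a simplified shape that I assume for illustration; the function just walks any JSON and collects every string stored under a <code>text</code> key.</p>

```python
import json


def collect_text(node):
    """Walk parsed JSON and return every string stored under a 'text' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


# Simplified, assumed sample of an Analyze Form result.
sample = json.loads(
    '{"pages": [{"keyValuePairs": [{"key": [{"text": "Name:"}], "value": [{"text": "Sibeesh Venu"}]}]}]}'
)
metadata = " ".join(collect_text(sample))
print(metadata)
```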



<p>Now we are done with the Flow creation. If you wish, you can also send a mail to let people know that the scanned PDF is now searchable.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="610" height="547" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Send-mail-from-Flow.png" alt="" class="wp-image-14021" srcset="/wp-content/uploads/2020/03/Send-mail-from-Flow.png 610w, /wp-content/uploads/2020/03/Send-mail-from-Flow-300x269.png 300w, /wp-content/uploads/2020/03/Send-mail-from-Flow-425x381.png 425w" sizes="(max-width: 610px) 100vw, 610px" /><figcaption>Send mail from Flow</figcaption></figure></div>



<p>Below are the full steps of my flow.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="492" height="738" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Full-Steps-of-the-Flow.png" alt="" class="wp-image-14023" srcset="/wp-content/uploads/2020/03/Full-Steps-of-the-Flow.png 492w, /wp-content/uploads/2020/03/Full-Steps-of-the-Flow-200x300.png 200w, /wp-content/uploads/2020/03/Full-Steps-of-the-Flow-367x550.png 367w" sizes="(max-width: 492px) 100vw, 492px" /><figcaption>Full Steps of the Flow</figcaption></figure></div>



<p>Now we can test our flow. Cool right?</p>



<h2 class="wp-block-heading">Test the Flow</h2>



<p>To test, add a PDF document and an image to your Document Library, and the flow will be triggered automatically. You can see the <a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#testing-the-flow">running status from the portal</a>. Once the flow has run, you will see a result like the one below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="476" height="684" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-Action-Result.png" alt="" class="wp-image-14024" srcset="/wp-content/uploads/2020/03/Analyze-Form-Action-Result.png 476w, /wp-content/uploads/2020/03/Analyze-Form-Action-Result-209x300.png 209w, /wp-content/uploads/2020/03/Analyze-Form-Action-Result-383x550.png 383w" sizes="(max-width: 476px) 100vw, 476px" /><figcaption>Analyze Form Action Result</figcaption></figure></div>



<p>Now go back to your Document Library and look at the Metadata column: it holds the Body data that we get from the Analyze Form action. The only thing left is to run some searches. Can&#8217;t wait to do that.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="850" height="311" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result.png" alt="" class="wp-image-14025" srcset="/wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result.png 850w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-300x110.png 300w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-768x281.png 768w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-425x156.png 425w" sizes="(max-width: 850px) 100vw, 850px" /><figcaption>Sibeesh Passion Search Result</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="854" height="304" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Microsoft-MVP-Search-Result.png" alt="" class="wp-image-14026" srcset="/wp-content/uploads/2020/03/Microsoft-MVP-Search-Result.png 854w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-300x107.png 300w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-768x273.png 768w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-425x151.png 425w" sizes="(max-width: 854px) 100vw, 854px" /><figcaption>Microsoft MVP Search Result</figcaption></figure></div>



<p>Now try it out with as many PDFs and images as you can; I will leave this to you.</p>
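
<p>I ran the searches above from the SharePoint search box. If you want to run the same kind of query from a script, a sketch of the URL for the SharePoint Search REST endpoint could look like this. The site URL is a placeholder, <code>build_search_url</code> is a made-up name, and the request still needs to be sent with proper authentication, which is not covered here.</p>

```python
from urllib.parse import quote


def build_search_url(site_url, term):
    """Build a SharePoint search query URL for the given search term."""
    # Single quotes inside the term are doubled before the term is quoted.
    escaped = term.replace("'", "''")
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{quote(escaped)}'"


# Placeholder site URL, for illustration only.
print(build_search_url("https://contoso.sharepoint.com/sites/docs", "Microsoft MVP"))
```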



<h2 class="wp-block-heading">Conclusion</h2>



<p>Thanks a lot for staying with me and reading this article. I hope you have now learned about</p>



<ul class="wp-block-list"><li>creating Azure Form Recognizer</li><li>using Azure Form Recognizer to read text from PDFs and images</li><li>training Azure Form Recognizer</li><li>using Azure Storage Account</li><li>creating a flow in SharePoint Online</li><li>creating the steps in Flow</li><li>using the connections in Flow</li><li>sending mails from Flow</li></ul>



<p>If you have learned anything else from this article, please let me know in the comment section.</p>



<h2 class="wp-block-heading">Follow me</h2>



<p>If you like this article, consider following me, haha!</p>



<ul class="wp-block-list"><li><a href="https://github.com/SibeeshVenu">GitHub</a></li><li><a href="https://medium.com/@sibeeshvenu">medium</a></li><li><a href="https://twitter.com/sibeeshvenu">Twitter</a></li></ul>



<h2 class="wp-block-heading">Your turn. What do you think?</h2>



<p>Thanks a lot for reading. Did I miss anything that you think is needed in this article? Did you find this post useful? Kindly do not forget to share your feedback.</p>



<p>Kindest Regards<br>Sibeesh Venu</p>
]]></content:encoded>
					
					<wfw:commentRss>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
