<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sharepoint online &#8211; Sibeesh Passion</title>
	<atom:link href="https://sibeeshpassion.com/tag/sharepoint-online/feed/" rel="self" type="application/rss+xml" />
	<link>https://sibeeshpassion.com</link>
	<description>My passion towards life</description>
	<lastBuildDate>Tue, 24 Aug 2021 17:21:25 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>/wp-content/uploads/2017/04/Sibeesh_Passion_Logo_Small.png</url>
	<title>sharepoint online &#8211; Sibeesh Passion</title>
	<link>https://sibeeshpassion.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Azure Form Recognizer and Microsoft Flow to Search Scanned PDF Content in SharePoint Online</title>
		<link>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/</link>
					<comments>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/#disqus_thread</comments>
		
		<dc:creator><![CDATA[SibeeshVenu]]></dc:creator>
		<pubDate>Thu, 05 Mar 2020 13:11:24 +0000</pubDate>
				<category><![CDATA[Azure]]></category>
		<category><![CDATA[Cognitive Services]]></category>
		<category><![CDATA[Office 365]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[Azure Form Recognize]]></category>
		<category><![CDATA[azure form recognizer]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[flow]]></category>
		<category><![CDATA[microsoft flow]]></category>
		<category><![CDATA[scanned pdf content to search]]></category>
		<category><![CDATA[scanned pdf to searchable pdf]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[sharepoint online]]></category>
		<guid isPermaLink="false">https://sibeeshpassion.com/?p=14005</guid>

					<description><![CDATA[Introduction SharePoint is a huge platform, and sometimes we have to use a few tricks to meet our requirements. I needed to make the content of my scanned PDF files searchable in SharePoint Online, which I have already achieved in one way; you can see that article here. Please consider this article the second part of that one. In this article, we will make the contents of scanned PDFs and images searchable in SharePoint Online using the new Azure Form Recognizer and Microsoft Flow. Please keep reading. Background In our previous article, we [&#8230;]]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Introduction</h2>



<p>SharePoint is a huge platform, and sometimes we have to use a few tricks to meet our requirements. I needed to make the content of my scanned PDF files searchable in SharePoint Online, which I have already achieved in one way; you can see that article here. Please consider this article the second part of that one. In this article, we will make the contents of scanned PDFs and images searchable in SharePoint Online using the new Azure Form Recognizer and Microsoft Flow. Please keep reading.</p>



<h2 class="wp-block-heading">Background</h2>



<p>In our <a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/">previous article</a>, we learned how to make scanned PDFs searchable by their contents using OCR with a third-party tool, AquaForest. AquaForest is a really cool product and there are many things you can do with it, but it is expensive; as I was using it just for OCR, it was not worth the money I spent. Because of that, I had to find a different option to satisfy my requirements, and that is how Azure Form Recognizer comes into this story. If you have ever used Azure Computer Vision AI, you know that it uses OCR to read the content of image files; unfortunately, that doesn&#8217;t work well with PDF files. Azure Form Recognizer removes that limitation.</p>



<p>Azure Form Recognizer is part of the Cognitive Services family. If you are new to Cognitive Services, please feel free to <a href="https://sibeeshpassion.com/category/azure/cognitive-services/">read some of my articles on the topic</a>. </p>



<h2 class="wp-block-heading">Update the Document Library List</h2>



<p>As you all know, SharePoint search works with the content of the list and its metadata. So my idea here is to create a new column, <strong>Metadata</strong>, in the Document library list and then save the Azure Form Recognizer result to this field, so that a search can match the content of this list entry; that is, our scanned PDF will show up in the search results. Don&#8217;t worry if it sounds complex; in fact, it is very easy. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img fetchpriority="high" decoding="async" width="498" height="435" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library.png" alt="" class="wp-image-14022" srcset="/wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library.png 498w, /wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library-300x262.png 300w, /wp-content/uploads/2020/03/Metadata-column-in-the-Document-Library-425x371.png 425w" sizes="(max-width: 498px) 100vw, 498px" /><figcaption>Metadata column in the Document Library</figcaption></figure></div>



<p>The reason I am using a separate column here is to get full control over the column, so that I can enable multi-line support and allow unlimited length. </p>



<h2 class="wp-block-heading">Creating the Flow to make scanned PDF/image contents searchable</h2>



<h3 class="wp-block-heading">Setting up Azure Form Recognizer</h3>



<p>Now we need to create an Azure Form Recognizer resource; it is as simple as creating any other service in Azure. Go to the Azure Portal, search for Form Recognizer, and create one.</p>



<h3 class="wp-block-heading">Train your Form Recognizer Model </h3>



<p>Now it is time to train our model so that Form Recognizer can give us the appropriate output. You can do this step either with the <a href="https://westeurope.dev.cognitive.microsoft.com/docs/services/form-recognizer-api/operations/TrainCustomModel/console">web console provided by Microsoft</a> or with cURL. </p>



<blockquote class="wp-block-quote is-layout-flow wp-block-quote-is-layout-flow"><p>If you are running the commands on Windows 10, run them with Bash or use Invoke-WebRequest in PowerShell.</p><cite>Where to run commands?</cite></blockquote>



<p>Using the web console is very easy, so I will use that. Before we run it, we need to upload our sample documents to Azure Blob storage. Let&#8217;s do that now.</p>



<h4 class="wp-block-heading">Configure Azure Storage Account and Upload Blob</h4>



<p>Creating a storage account is really straightforward: search for Storage Account in the portal, and then fill in the form as you wish. Once you create the account, go to the resource and click on Containers under the Blob service menu. Now we need to create a container so that we can save the sample blob files inside it. For now, I created a container named &#8220;models&#8221; and then uploaded 2 PDF files.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="702" height="323" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Storage-Account-with-Blobs.png" alt="" class="wp-image-14008" srcset="/wp-content/uploads/2020/03/Storage-Account-with-Blobs.png 702w, /wp-content/uploads/2020/03/Storage-Account-with-Blobs-300x138.png 300w, /wp-content/uploads/2020/03/Storage-Account-with-Blobs-425x196.png 425w" sizes="(max-width: 702px) 100vw, 702px" /><figcaption>Storage Account with Blobs</figcaption></figure></div>



<p>As mentioned in the Microsoft documentation, we now need to get the SAS URL with our container name in it. Go to the Settings menu, click on Shared access signature, and then click on Generate SAS and connection string. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="813" height="646" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Generate-SAS-and-Connection-string.png" alt="" class="wp-image-14009" srcset="/wp-content/uploads/2020/03/Generate-SAS-and-Connection-string.png 813w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-300x238.png 300w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-768x610.png 768w, /wp-content/uploads/2020/03/Generate-SAS-and-Connection-string-425x338.png 425w" sizes="(max-width: 813px) 100vw, 813px" /><figcaption>Generate SAS and Connection string</figcaption></figure></div>



<p>Now copy the Blob service SAS URL and add your container name to the URL right after <em><strong>windows.net/</strong></em>, so that the final SAS URL looks like the one below.</p>



<pre class="wp-block-code"><code>https://mlfit.blob.core.windows.net/models?sv=2019-02-02&amp;ss=bfqgt&amp;srt=shco&amp;sp=rwdlhfacup&amp;se=2020-03-05T16:51:40Z&amp;st=2020-03-05T08:51:40Z&amp;spr=https&amp;sig=MSN0%2BhGHDGSDGW7jH2tOTGwh8I%2Bld%2BvcYAYTFGDSGH6mUyzsCAQXVoo%3D</code></pre>



<p>Here, &#8220;<strong>models</strong>&#8221; is my container name. </p>
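
<p>If you prefer to build this URL in code rather than by hand, the step can be sketched as below. This is only a minimal sketch: the function name, the account name, and the shortened query string are placeholders I made up for illustration, not real values.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def add_container_to_sas_url(blob_service_sas_url, container_name):
    """Return the SAS URL with the container name placed right after the host."""
    parts = urlsplit(blob_service_sas_url)
    # The Blob service SAS URL has an empty path ("/" or ""), so the
    # container name becomes the whole path. The query string, which
    # carries the SAS token, is kept untouched.
    new_path = "/" + container_name.strip("/")
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


# Hypothetical account name and a shortened, fake query string.
service_url = "https://myaccount.blob.core.windows.net/?sv=2019-02-02"
print(add_container_to_sas_url(service_url, "models"))
```

<p>Parsing the URL instead of concatenating strings keeps the SAS token intact whether or not the copied URL ends with a slash before the question mark.</p>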



<h4 class="wp-block-heading">Train your model</h4>



<p>Now go to the console and fill in all the details as shown below.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="782" height="654" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Train-Model-With-Web-Console.png" alt="" class="wp-image-14010" srcset="/wp-content/uploads/2020/03/Train-Model-With-Web-Console.png 782w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-300x251.png 300w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-768x642.png 768w, /wp-content/uploads/2020/03/Train-Model-With-Web-Console-425x355.png 425w" sizes="(max-width: 782px) 100vw, 782px" /><figcaption>Train Model With Web Console</figcaption></figure></div>



<p>Here, the resource name is the name of your Form Recognizer resource, and Ocp-Apim-Subscription-Key is the key of that service; you can use either key1 or key2. In the Request body, set the source property to your SAS URL, and then hit the Send button. If everything goes well, you should get output like the one below.</p>
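
<p>For readers who would rather script this call than use the console, here is a rough sketch of how the same request could be put together. It only builds the headers and the JSON body and does not send anything. The function name is made up, and the <code>sourceFilter</code> field names are my assumption of what the console pre-fills, so please check them against the console before relying on them.</p>

```python
import json


def build_train_request(subscription_key, sas_url, prefix=None):
    """Build headers and JSON body for a Train Custom Model call (not sent here)."""
    headers = {
        "Content-Type": "application/json",
        # key1 or key2 of the Form Recognizer resource
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    # The source property holds the container SAS URL.
    body = {"source": sas_url}
    if prefix is not None:
        # Optional source filter; these field names are an assumption.
        body["sourceFilter"] = {"prefix": prefix, "includeSubFolders": False}
    return headers, json.dumps(body)


# Placeholder key and URL, for illustration only.
headers, body = build_train_request("your-key", "https://myaccount.blob.core.windows.net/models?sv=2019-02-02")
print(body)
```

<p>Leaving the filter out by default mirrors the simplest request body, which contains only the source property.</p>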



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="575" height="587" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output.png" alt="" class="wp-image-14011" srcset="/wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output.png 575w, /wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output-294x300.png 294w, /wp-content/uploads/2020/03/Form-Recognize-Train-Model-Output-425x434.png 425w" sizes="(max-width: 575px) 100vw, 575px" /><figcaption>Form Recognize Train Model Output</figcaption></figure></div>



<p>Please make a note of the <strong>modelId</strong> from the result, as we will use it in our Flow. If you get the error &#8220;No valid blobs found in the specified Azure blob container&#8221;, it is most probably because of the source filter we apply in the Request body; just remove it and hit the Send button again. </p>



<pre class="wp-block-code"><code>{
  "error": {
    "code": "2024",
    "innerError": {
      "requestId": "78df3a9b-ae2c-47a7-900c-8fa78f5a5a15"
    },
    "message": "No valid blobs found in the specified Azure blob container."
  }
}</code></pre>
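
<p>If you script the training call, the same fix can be sketched roughly as follows: read the error code from the response and, when it matches the one above, drop the filter before trying again. The helper names and the <code>sourceFilter</code> key are assumptions for illustration; only the error code and message come from the response shown above.</p>

```python
import json

NO_VALID_BLOBS = "2024"  # error code shown in the response above


def without_source_filter(request_body):
    """Return a copy of the request body with the source filter removed."""
    return {k: v for k, v in request_body.items() if k != "sourceFilter"}


def should_retry_without_filter(response_text, request_body):
    """True when the service reports no valid blobs and a filter was sent."""
    error = json.loads(response_text).get("error", {})
    return error.get("code") == NO_VALID_BLOBS and "sourceFilter" in request_body


# Hypothetical request body and a trimmed copy of the error response.
body = {"source": "https://myaccount.blob.core.windows.net/models", "sourceFilter": {"prefix": "docs/"}}
response = '{"error": {"code": "2024", "message": "No valid blobs found in the specified Azure blob container."}}'
if should_retry_without_filter(response, body):
    body = without_source_filter(body)
print(body)
```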



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="518" height="128" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied.png" alt="" class="wp-image-14013" srcset="/wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied.png 518w, /wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied-300x74.png 300w, /wp-content/uploads/2020/03/Request-body-without-Soure-filter-applied-425x105.png 425w" sizes="(max-width: 518px) 100vw, 518px" /><figcaption>Request body without source filter applied</figcaption></figure></div>



<p>You can also try out other API calls like &#8220;Get Models&#8221;, &#8220;Get Model&#8221;, etc. As they are not relevant to this article, I am not going to cover them. One thing to note here is that <strong>the more you train, the more accurate the result will be</strong>.</p>



<h3 class="wp-block-heading">Set up the Flow</h3>



<p>If you are not sure how to create a flow, please look at the &#8220;Create a Flow&#8221; section <strong><a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#create-a-flow">here</a></strong>. Once you have the basic flow with the connector &#8220;When a file is created&#8221;, we can initialize the variables that we are going to use later. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="464" height="307" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/String-Variables-in-Flow.png" alt="" class="wp-image-14016" srcset="/wp-content/uploads/2020/03/String-Variables-in-Flow.png 464w, /wp-content/uploads/2020/03/String-Variables-in-Flow-300x198.png 300w, /wp-content/uploads/2020/03/String-Variables-in-Flow-425x281.png 425w" sizes="(max-width: 464px) 100vw, 464px" /><figcaption>String Variables in Flow</figcaption></figure></div>



<p>One variable saves the content type of the file we get, and the other saves the result of the Form Recognizer Analyze API. Let&#8217;s move on to the next step now. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="1024" height="492" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1024x492.png" alt="" class="wp-image-14017" srcset="/wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1024x492.png 1024w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-300x144.png 300w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-768x369.png 768w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-1536x738.png 1536w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales-425x204.png 425w, /wp-content/uploads/2020/03/Set-the-content-type-to-string-varibales.png 1639w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption>Set the content type to string variables</figcaption></figure></div>



<p>As you can see, a Flow is just like the programming tasks we do: we can use if, if-else, switch, and many more constructs. Try out these functions in your Flow when you get time. </p>
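
<p>To make the comparison with programming concrete, the content type decision can be pictured as the small lookup below. This is only a sketch: the list of extensions and the content type strings are my assumptions, and the Flow in the screenshot may use different values.</p>

```python
import os

# Assumed mapping; the Flow in the screenshot may use different values.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def content_type_for(file_name):
    """Return the content type for a file name, or None if unsupported."""
    extension = os.path.splitext(file_name)[1].lower()
    return CONTENT_TYPES.get(extension)


# The value that would end up in the contenttype variable.
contenttype = content_type_for("Scanned-Invoice.PDF")
print(contenttype)
```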



<p>So now we have a dynamic value in our contenttype variable. Let&#8217;s add the Analyze Form task: just search for &#8220;Form Recognizer&#8221; and then select the Analyze Form action.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="472" height="315" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-in-Flow.png" alt="" class="wp-image-14018" srcset="/wp-content/uploads/2020/03/Analyze-Form-in-Flow.png 472w, /wp-content/uploads/2020/03/Analyze-Form-in-Flow-300x200.png 300w, /wp-content/uploads/2020/03/Analyze-Form-in-Flow-425x284.png 425w" sizes="(max-width: 472px) 100vw, 472px" /><figcaption>Analyze Form in Flow</figcaption></figure></div>



<p>Now it will ask you to enter the key of your Azure Form Recognizer and a connection name. Once you provide those, you can paste the <strong>modelId</strong> you got from the <em>Train Model API</em> call. In the end, this is what your Analyze Form action will look like. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="475" height="167" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-Action.png" alt="" class="wp-image-14019" srcset="/wp-content/uploads/2020/03/Analyze-Form-Action.png 475w, /wp-content/uploads/2020/03/Analyze-Form-Action-300x105.png 300w, /wp-content/uploads/2020/03/Analyze-Form-Action-425x149.png 425w" sizes="(max-width: 475px) 100vw, 475px" /><figcaption>Analyze Form Action</figcaption></figure></div>



<p>You can see that we use the <strong>contenttype</strong> string variable here. Now we can append the value of the Analyze Form result to our <strong>recogoutput</strong> variable. In the next step, we can save this information to the Metadata field that we created earlier for the Document library. Sounds good?</p>
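
<p>The Flow stores the output of the action as it is. If you ever want to keep only the recognized words, one possible variation, outside of Flow, is sketched below. The sample result is hypothetical and heavily simplified, so I deliberately walk the whole structure and collect every value stored under a <code>text</code> key instead of relying on a particular response layout.</p>

```python
def collect_text(node):
    """Recursively gather every string stored under a "text" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


# Hypothetical, heavily simplified Analyze Form result.
sample_result = {
    "pages": [
        {"keyValuePairs": [
            {"key": [{"text": "Name:"}], "value": [{"text": "Sibeesh Venu"}]},
        ]}
    ]
}
recogoutput = " ".join(collect_text(sample_result))
print(recogoutput)  # Name: Sibeesh Venu
```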



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="618" height="485" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata.png" alt="" class="wp-image-14020" srcset="/wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata.png 618w, /wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata-300x235.png 300w, /wp-content/uploads/2020/03/Save-the-Form-Recognizer-output-to-Metadata-425x334.png 425w" sizes="(max-width: 618px) 100vw, 618px" /><figcaption>Save the Form Recognizer output to Metadata</figcaption></figure></div>



<p>Now we are done creating the Flow. If you wish, you can also send emails to people about this conversion from the scanned PDF to the searchable PDF. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="610" height="547" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Send-mail-from-Flow.png" alt="" class="wp-image-14021" srcset="/wp-content/uploads/2020/03/Send-mail-from-Flow.png 610w, /wp-content/uploads/2020/03/Send-mail-from-Flow-300x269.png 300w, /wp-content/uploads/2020/03/Send-mail-from-Flow-425x381.png 425w" sizes="(max-width: 610px) 100vw, 610px" /><figcaption>Send mail from Flow</figcaption></figure></div>



<p>Below are the full steps of my flow.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="492" height="738" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Full-Steps-of-the-Flow.png" alt="" class="wp-image-14023" srcset="/wp-content/uploads/2020/03/Full-Steps-of-the-Flow.png 492w, /wp-content/uploads/2020/03/Full-Steps-of-the-Flow-200x300.png 200w, /wp-content/uploads/2020/03/Full-Steps-of-the-Flow-367x550.png 367w" sizes="(max-width: 492px) 100vw, 492px" /><figcaption>Full Steps of the Flow</figcaption></figure></div>



<p>Now we can test our flow. Cool, right?</p>



<h2 class="wp-block-heading">Test the Flow</h2>



<p>To test, add a PDF document and an image to your Document Library, and the flow will be triggered automatically. You can see the <a href="https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#testing-the-flow">running status from the portal</a>. Once the flow has run, you can see a result like the one below. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="476" height="684" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Analyze-Form-Action-Result.png" alt="" class="wp-image-14024" srcset="/wp-content/uploads/2020/03/Analyze-Form-Action-Result.png 476w, /wp-content/uploads/2020/03/Analyze-Form-Action-Result-209x300.png 209w, /wp-content/uploads/2020/03/Analyze-Form-Action-Result-383x550.png 383w" sizes="(max-width: 476px) 100vw, 476px" /><figcaption>Analyze Form Action Result</figcaption></figure></div>



<p>Now go back to your Document library and look at the data in the Metadata column; this is the Body data that we get from the Analyze Form action. The only thing left is to run some searches. I can&#8217;t wait to do that. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="850" height="311" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result.png" alt="" class="wp-image-14025" srcset="/wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result.png 850w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-300x110.png 300w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-768x281.png 768w, /wp-content/uploads/2020/03/Sibeesh-Passion-Search-Result-425x156.png 425w" sizes="(max-width: 850px) 100vw, 850px" /><figcaption>Sibeesh Passion Search Result</figcaption></figure></div>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="854" height="304" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Microsoft-MVP-Search-Result.png" alt="" class="wp-image-14026" srcset="/wp-content/uploads/2020/03/Microsoft-MVP-Search-Result.png 854w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-300x107.png 300w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-768x273.png 768w, /wp-content/uploads/2020/03/Microsoft-MVP-Search-Result-425x151.png 425w" sizes="(max-width: 854px) 100vw, 854px" /><figcaption>Microsoft MVP Search Result</figcaption></figure></div>



<p>Now try it out with as many PDFs and images as you can; I will leave this to you.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Thanks a lot for staying with me for so long and reading this article. I hope you have now learned about:</p>



<ul class="wp-block-list"><li>creating Azure Form Recognizer</li><li>using Azure Form Recognizer to read text from PDFs and images</li><li>training Azure Form Recognizer</li><li>using Azure Storage Account</li><li>creating a flow in SharePoint Online</li><li>creating the steps in Flow</li><li>using the connections in Flow</li><li>sending mails from Flow</li></ul>



<p>If you have learned anything else from this article, please let me know in the comment section.</p>



<h2 class="wp-block-heading">Follow me</h2>



<p>If you like this article, consider following me, haha!</p>



<ul class="wp-block-list"><li><a href="https://github.com/SibeeshVenu">GitHub</a></li><li><a href="https://medium.com/@sibeeshvenu">medium</a></li><li><a href="https://twitter.com/sibeeshvenu">Twitter</a></li></ul>



<h2 class="wp-block-heading">Your turn. What do you think?</h2>



<p>Thanks a lot for reading. Did I miss anything that you think is needed in this article? Did you find this post useful? Please do not forget to share your feedback.</p>



<p>Kindest Regards<br>Sibeesh Venu</p>
]]></content:encoded>
					
					<wfw:commentRss>https://sibeeshpassion.com/azure-form-recognizer-and-microsoft-flow-to-search-scanned-pdf-content-in-sharepoint-online/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Search Contents of a PDF File in SharePoint Online, Make them Searchable Using Microsoft Flow</title>
		<link>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/</link>
					<comments>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/#disqus_thread</comments>
		
		<dc:creator><![CDATA[SibeeshVenu]]></dc:creator>
		<pubDate>Wed, 04 Mar 2020 15:20:57 +0000</pubDate>
				<category><![CDATA[Azure]]></category>
		<category><![CDATA[Cognitive Services]]></category>
		<category><![CDATA[Office 365]]></category>
		<category><![CDATA[SharePoint]]></category>
		<category><![CDATA[aquaforest]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[flow]]></category>
		<category><![CDATA[microsoft flow]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[searchable pdf in sharepoint]]></category>
		<category><![CDATA[sharepoint]]></category>
		<category><![CDATA[sharepoint flow]]></category>
		<category><![CDATA[sharepoint online]]></category>
		<category><![CDATA[SharePoint Tips]]></category>
		<guid isPermaLink="false">https://sibeeshpassion.com/?p=13986</guid>

					<description><![CDATA[Introduction We all get stuck somewhere in our so-called &#8220;Programmer Life&#8221; over a small requirement. I was stuck with one such requirement: the content of the PDF files uploaded to my SharePoint Online was not searchable, whereas a PDF I created manually from a Word document worked fine. Let me tell you why! Typically there are 3 kinds of PDF files. Normal PDF: These are the files that you get from applications like Microsoft Word, Adobe tools, etc. The beauty of this kind of file is that its content can be searched, and you can select the [&#8230;]]]></description>
										<content:encoded><![CDATA[



<h2 class="wp-block-heading">Introduction</h2>



<p>We all get stuck somewhere in our so-called &#8220;Programmer Life&#8221; over a small requirement. I was stuck with one such requirement: the content of the PDF files uploaded to my SharePoint Online was not searchable, whereas a PDF I created manually from a Word document worked fine. Let me tell you why! Typically there are 3 kinds of PDF files.</p>



<ol class="wp-block-list"><li><strong>Normal PDF</strong>: These are the files that you get from applications like Microsoft Word, Adobe tools, etc. The beauty of this kind of file is that its content can be searched, and you can select the text, style it, copy-paste it, etc.</li><li><strong>Scanned PDF</strong>: This one is exactly the opposite of the first one, and it was the villain in my requirement. The issue with this type is that, though the content looks visually the same, you cannot search it, select it, or copy-paste from it, as in the end it is an image inserted into a PDF document. So how can we read the contents of such a file? That is where the technology called OCR (Optical Character Recognition) comes into the picture. With it, we can read the content and make it searchable. And when we do that, we introduce the third type of PDF file.</li><li><strong>Searchable/OCRed PDF</strong>: This is the type that we get as output from the OCR process. In the end, this type has two layers in it: one is the image that we get from the scanner, and the second is the text content. With these two layers, this file becomes almost equivalent to the first kind.</li></ol>



<p>Now let&#8217;s see what my requirement was and how I solved it.</p>



<h2 class="wp-block-heading">Background</h2>



<p>Technology moves fast, so start running today if you want to keep up with it. I have a OneDrive sync folder to which I save the scanned PDF files from my scanner, and once that is done the files are synced to my SharePoint Online. So far so good. But the problem is that the content of these files is not searchable. Now let&#8217;s fix that.</p>



<h2 class="wp-block-heading">Fix to make Scanned PDF files searchable</h2>



<p>We use Microsoft Flow to convert the scanned PDF to a searchable PDF file. In Flow, there are many ways to do this; I initially tried a combination of Computer Vision AI and some other services, as shown below. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="621" height="367" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Connect-to-the-services-needed.png" alt="" class="wp-image-13987" srcset="/wp-content/uploads/2020/03/Connect-to-the-services-needed.png 621w, /wp-content/uploads/2020/03/Connect-to-the-services-needed-300x177.png 300w, /wp-content/uploads/2020/03/Connect-to-the-services-needed-425x251.png 425w" sizes="(max-width: 621px) 100vw, 621px" /><figcaption>Computer Vision AI in SharePoint</figcaption></figure></div>



<p>But I was not getting the expected output when I was using them, so I decided to go with other options. <a href="https://sibeeshpassion.com/using-azure-cognitive-service-computer-vision-ai-to-read-text-from-an-image/">If you are new to OCR technology or Computer Vision AI, you can find my article here</a>. </p>



<h3 class="wp-block-heading">Create a flow</h3>



<p>The files are synced to my Document folder in SharePoint, so I needed to create a flow that gets triggered whenever a file is uploaded.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="817" height="155" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Create-a-Flow.png" alt="" class="wp-image-13988" srcset="/wp-content/uploads/2020/03/Create-a-Flow.png 817w, /wp-content/uploads/2020/03/Create-a-Flow-300x57.png 300w, /wp-content/uploads/2020/03/Create-a-Flow-768x146.png 768w, /wp-content/uploads/2020/03/Create-a-Flow-425x81.png 425w" sizes="(max-width: 817px) 100vw, 817px" /><figcaption>Create Flow</figcaption></figure></div>



<p>Click on &#8220;Create a flow&#8221;, and you will be asked to select a flow template. I selected the template &#8220;When a new file is added in SharePoint, complete a custom action&#8221;. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="585" height="594" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action.png" alt="" class="wp-image-13989" srcset="/wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action.png 585w, /wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action-295x300.png 295w, /wp-content/uploads/2020/03/When-a-new-file-is-added-in-SharePoint-complete-a-custom-action-425x432.png 425w" sizes="(max-width: 585px) 100vw, 585px" /><figcaption>When a new file is added in SharePoint, complete a custom action</figcaption></figure></div>



<p>Once you click the Continue button, you are ready to create new steps in your flow. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="499" height="265" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Add-Steps-in-Flow.png" alt="" class="wp-image-13990" srcset="/wp-content/uploads/2020/03/Add-Steps-in-Flow.png 499w, /wp-content/uploads/2020/03/Add-Steps-in-Flow-300x159.png 300w, /wp-content/uploads/2020/03/Add-Steps-in-Flow-425x226.png 425w" sizes="(max-width: 499px) 100vw, 499px" /><figcaption>Add Steps in Flow</figcaption></figure></div>



<p>Flow is a step-by-step solution, and some steps have an output that we can carry to the next step; in our flow, we use this a lot. Once you connect to the SharePoint site, we need to get the properties of the uploaded file. To do that, click on the + (plus) icon, select &#8220;Add an action&#8221;, and then search for &#8220;Get File Properties&#8221;. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="490" height="362" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Get-File-Properties-Step.png" alt="" class="wp-image-13993" srcset="/wp-content/uploads/2020/03/Get-File-Properties-Step.png 490w, /wp-content/uploads/2020/03/Get-File-Properties-Step-300x222.png 300w, /wp-content/uploads/2020/03/Get-File-Properties-Step-425x314.png 425w" sizes="(max-width: 490px) 100vw, 490px" /><figcaption> Get File Properties Step </figcaption></figure></div>



<p>Now select the Site address and the library, and then click on the ID field. You will see an option to select the output of the previous step.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="416" height="265" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/ID-of-the-file-created.png" alt="" class="wp-image-13994" srcset="/wp-content/uploads/2020/03/ID-of-the-file-created.png 416w, /wp-content/uploads/2020/03/ID-of-the-file-created-300x191.png 300w" sizes="(max-width: 416px) 100vw, 416px" /><figcaption>The ID of the file created</figcaption></figure></div>



<p>Now that we have the file properties, we need to check the file type. To do that, add a Condition control and then add the conditions to it.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="608" height="415" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image.png" alt="" class="wp-image-13995" srcset="/wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image.png 608w, /wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image-300x205.png 300w, /wp-content/uploads/2020/03/Condition-to-check-whether-PDF-or-image-425x290.png 425w" sizes="(max-width: 608px) 100vw, 608px" /><figcaption>Condition to check whether PDF or image</figcaption></figure></div>



<p>Each condition has two outputs, &#8220;Yes&#8221; and &#8220;No&#8221;. We will add all of our remaining steps to the &#8220;Yes&#8221; branch and leave the &#8220;No&#8221; branch empty for now, but you can add some tasks there if you need them.</p>
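
<p>The idea behind the condition can be sketched in a few lines of Python. This is only an illustration, not how Flow evaluates conditions: the function name <code>needs_ocr</code> and the list of extensions are assumptions, so adjust them to match the condition values you configured.</p>

```python
from pathlib import PurePosixPath

# Assumed set of extensions that should go to OCR (PDFs and common image types).
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def needs_ocr(file_name: str) -> bool:
    """Return True for the "Yes" branch (PDF or image), False for the "No" branch."""
    return PurePosixPath(file_name).suffix.lower() in OCR_EXTENSIONS


# Show which branch a few sample file names would take.
for name in ["Invoice.PDF", "scan.jpeg", "notes.docx"]:
    branch = "Yes" if needs_ocr(name) else "No"
    print(f"{name}: {branch}")
```

<p>Comparing the lower-cased extension keeps the check case-insensitive, so a file named Invoice.PDF still takes the &#8220;Yes&#8221; branch.</p>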



<p>Now, in the &#8220;Yes&#8221; branch, we can get the file and pass it to the OCR process; this is where a tool called AquaForest comes in. Please follow the steps mentioned in <a href="https://www.aquaforest.com/en/aquaforest-flow-doc.asp">this article</a> to get the required key. Once that is done, add the action &#8220;OCR PDF or Images&#8221; by searching for &#8220;AquaForest&#8221;.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="600" height="393" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images.png" alt="" class="wp-image-13996" srcset="/wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images.png 600w, /wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images-300x197.png 300w, /wp-content/uploads/2020/03/AquaForest-OCR-PDF-or-Images-425x278.png 425w" sizes="(max-width: 600px) 100vw, 600px" /><figcaption>AquaForest OCR PDF or Images</figcaption></figure></div>



<p>Give the connection a name and add the key in the next popup. There are many properties that you can set here, but the two shown below are the important ones.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="596" height="131" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/File-Content-with-OCR.png" alt="" class="wp-image-13997" srcset="/wp-content/uploads/2020/03/File-Content-with-OCR.png 596w, /wp-content/uploads/2020/03/File-Content-with-OCR-300x66.png 300w, /wp-content/uploads/2020/03/File-Content-with-OCR-425x93.png 425w" sizes="(max-width: 596px) 100vw, 596px" /><figcaption>File Content with OCR</figcaption></figure></div>



<p>The output of this step is the OCRed file. Now all we have to do is add the action called &#8220;Create File&#8221; and configure it to save that output.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="599" height="216" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Save-the-OCRed-File.png" alt="" class="wp-image-13998" srcset="/wp-content/uploads/2020/03/Save-the-OCRed-File.png 599w, /wp-content/uploads/2020/03/Save-the-OCRed-File-300x108.png 300w, /wp-content/uploads/2020/03/Save-the-OCRed-File-425x153.png 425w" sizes="(max-width: 599px) 100vw, 599px" /><figcaption>Save the OCRed File</figcaption></figure></div>
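
<p>To see how the output of each step feeds the next one, here is a minimal Python sketch of the steps built so far. It does not call SharePoint or AquaForest: <code>FileProperties</code>, <code>run_flow</code> and the three callables are hypothetical stand-ins for the connectors, and saving the result under the original file name is an assumption.</p>

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FileProperties:
    """Hypothetical stand-in for the output of the "Get File Properties" step."""
    item_id: int
    name: str
    content: bytes


def run_flow(
    item_id: int,
    get_file_properties: Callable[[int], FileProperties],
    ocr: Callable[[bytes], bytes],
    create_file: Callable[[str, bytes], None],
) -> Optional[str]:
    """Chain the steps; each step consumes the output of the previous one."""
    props = get_file_properties(item_id)      # "Get File Properties"
    if not props.name.lower().endswith((".pdf", ".png", ".jpg")):
        return None                           # "No" branch: nothing to do
    searchable = ocr(props.content)           # "OCR PDF or Images"
    create_file(props.name, searchable)       # "Create File"
    return props.name


# Usage with in-memory fakes instead of real connectors.
library = {}
result = run_flow(
    7,
    get_file_properties=lambda i: FileProperties(i, "scan.pdf", b"raw"),
    ocr=lambda data: b"ocr:" + data,
    create_file=lambda name, data: library.update({name: data}),
)
print(result, library)
```

<p>Passing the connectors in as plain functions keeps the sketch runnable without any credentials, and it mirrors the way Flow lets you pick the output of an earlier step as the input of a later one.</p>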



<p>Wow, now we have a searchable PDF in our document library. Go ahead and search for any text that appears in your newly updated PDF. If you wish, you can also add an action to send an acknowledgment mail.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="619" height="556" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow.png" alt="" class="wp-image-14000" srcset="/wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow.png 619w, /wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow-300x269.png 300w, /wp-content/uploads/2020/03/Send-mail-from-SharePoint-Flow-425x382.png 425w" sizes="(max-width: 619px) 100vw, 619px" /><figcaption>Send email step in Flow</figcaption></figure></div>



<h3 class="wp-block-heading">Testing the flow</h3>



<p>Now that we have created the flow, it is time to test it. To do that, I added a scanned document to my OneDrive folder. We can check the run status of the flow in the <a href="https://emea.flow.microsoft.com/">portal</a>.</p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="759" height="360" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Run-History-of-Flow.png" alt="" class="wp-image-14001" srcset="/wp-content/uploads/2020/03/Run-History-of-Flow.png 759w, /wp-content/uploads/2020/03/Run-History-of-Flow-300x142.png 300w, /wp-content/uploads/2020/03/Run-History-of-Flow-425x202.png 425w" sizes="(max-width: 759px) 100vw, 759px" /><figcaption>Run History of Flow</figcaption></figure></div>



<p>Below is the sample run history output of my flow. </p>



<div class="wp-block-image"><figure class="aligncenter size-large"><img decoding="async" width="837" height="611" src="https://sibeeshpassion.com/wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR.png" alt="" class="wp-image-14002" srcset="/wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR.png 837w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-300x219.png 300w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-768x561.png 768w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-315x230.png 315w, /wp-content/uploads/2020/03/Sample-Flow-Run-History-PDF-OCR-425x310.png 425w" sizes="(max-width: 837px) 100vw, 837px" /><figcaption>Sample Flow Run History PDF OCR</figcaption></figure></div>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Thanks a lot for staying with me and reading this article. I hope you have now learned about:</p>



<ul class="wp-block-list"><li>creating a flow in SharePoint Online</li><li>creating the steps in Flow</li><li>using the connections in Flow</li><li>running OCR on a PDF using Computer Vision</li><li>running OCR on a PDF using the AquaForest API</li><li>creating a new file with the OCRed output</li><li>sending mails from Flow</li></ul>



<p>If you have learned anything else from this article, please let me know in the comment section.</p>



<h2 class="wp-block-heading">Follow me</h2>



<p>If you like this article, consider following me, haha!</p>



<ul class="wp-block-list"><li><a href="https://github.com/SibeeshVenu">GitHub</a></li><li><a href="https://medium.com/@sibeeshvenu">Medium</a></li><li><a href="https://twitter.com/sibeeshvenu">Twitter</a></li></ul>



<h2 class="wp-block-heading">Your turn. What do you think?</h2>



<p>Thanks a lot for reading. Did I miss anything that you think this article needs? Did you find this post useful? Kindly do not forget to share your feedback.</p>



<p>Kindest Regards<br>Sibeesh Venu</p>
]]></content:encoded>
					
					<wfw:commentRss>https://sibeeshpassion.com/search-contents-of-a-pdf-file-in-sharepoint-online-make-them-searchable-using-microsoft-flow/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
			</item>
	</channel>
</rss>
