Sibeesh Passion

Search Contents of a PDF File in SharePoint Online, Make them Searchable Using Microsoft Flow

By SibeeshVenu
March 4, 2020

Introduction

We all get stuck somewhere in our so-called “Programmer Life” on a small requirement. I was stuck on one like that: the content of the PDF files uploaded to my SharePoint Online was not searchable, whereas a PDF I created manually from a Word document worked fine. Let me tell you why. Typically, there are three kinds of PDF files.

  1. Normal PDF: These are the files that you get from applications like Microsoft Word, Adobe tools, etc. The beauty of this type is that its content can be searched, and you can select the text, style it, copy and paste it, etc.
  2. Scanned PDF: This one is the exact opposite of the first one, and it was the villain in my requirement. Though the content looks visually the same, you cannot search, select, or copy and paste it, because in the end it is an image inserted into a PDF document. So how can we read the contents of this file? That is where the technology called OCR (Optical Character Recognition) comes into the picture. With OCR, we can read the content and make it searchable. And when we do that, we introduce the third type of PDF file.
  3. Searchable/OCRed PDF: This is the type that we get as the output of the OCR process. It has two layers: one is the image that we get from the scanner, and the other is the text content. With these two layers, the file becomes almost equal to the first kind.
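
To make the three kinds concrete, here is a minimal sketch in Python that maps the two layers described above to a kind. The names PdfKind, classify_pdf, and needs_ocr are hypothetical, and the sketch assumes you already know, from some PDF tool of your choice, whether a file has a text layer and whether its pages are scanner images.

```python
from enum import Enum


class PdfKind(Enum):
    NORMAL = "normal"          # text only, e.g. exported from Microsoft Word
    SCANNED = "scanned"        # page images only, no text to search
    SEARCHABLE = "searchable"  # page images plus an OCR text layer


def classify_pdf(has_text_layer: bool, has_page_image: bool) -> PdfKind:
    """Map the two layer flags to one of the three kinds listed above."""
    if has_text_layer and has_page_image:
        return PdfKind.SEARCHABLE
    if has_text_layer:
        return PdfKind.NORMAL
    # Assumption: anything without a text layer is treated as scanned,
    # because search has nothing to index in it.
    return PdfKind.SCANNED


def needs_ocr(kind: PdfKind) -> bool:
    """Only a scanned PDF has to go through OCR before it can be searched."""
    return kind is PdfKind.SCANNED
```

For example, classify_pdf(False, True) gives PdfKind.SCANNED, which is the only kind for which needs_ocr returns True.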

Now let’s see what my requirement was and how I solved it.

Background

Technology moves fast, so start running today if you want to keep up with it. I have a OneDrive sync folder where I save the scanned PDF files from my scanner, and from there they are synced to my SharePoint Online. So far so good. The problem is that the content of these files is not searchable. Now let’s fix that.

Fix to make Scanned PDF files searchable

We use Microsoft Flow to convert the scanned PDF into a searchable PDF file. There are many ways to do this in Flow. I initially tried a combination of Computer Vision AI and some other services, as shown below.

Computer Vision AI in SharePoint

But I was not getting the expected output with them, so I decided to go with another option. If you are new to OCR technology or Computer Vision AI, you can find my article here.

Create a flow

The files are synced to my Document folder in SharePoint, so I needed a flow that gets triggered whenever a file is uploaded.

Create Flow

Click on “Create a flow”, and you will be asked to select a flow template. I selected the template “When a new file is added in SharePoint, complete a custom action”.

When a new file is added in SharePoint, complete a custom action

Once you click the Continue button, you are ready to add new steps to your flow.
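
Flow detects the new file for you, so there is nothing to code here. Purely to illustrate the idea behind a “new file added” trigger, the sketch below compares two snapshots of a folder listing and reports what was added. The function name newly_added and the file names are hypothetical, and this is not how Flow implements its trigger.

```python
from typing import List, Set


def newly_added(previous: Set[str], current: Set[str]) -> List[str]:
    """Return the file names that exist now but did not exist before."""
    return sorted(current - previous)


# Hypothetical snapshots of the library before and after a sync.
before = {"invoice.pdf"}
after = {"invoice.pdf", "scan_001.pdf"}
added = newly_added(before, after)  # one trigger run per name in this list
```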

Add Steps in Flow

A flow is a step-by-step solution. Some steps produce an output that we can carry over to the next step, and we use this a lot in our flow. Once you are connected to the SharePoint site, we need to get the properties of the uploaded file. To do that, click on the + (plus) icon, select “Add an action”, and then search for “Get File Properties”.

Get File Properties Step

Now select the site address and the library, and then click on the ID field. You will see an option to select the output of the previous step.

The ID of the file created
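
The way one step's output feeds the next can be sketched as a small pipeline. This is only an illustration of the idea, not Flow's internals: run_steps and get_file_properties are hypothetical names, and the "ID" and "FileName" keys are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_steps(trigger_output: Dict[str, Any], steps: List[Step]) -> Dict[str, Any]:
    """Run the steps in order; each step can read the outputs of earlier ones."""
    context = dict(trigger_output)
    for step in steps:
        context.update(step(context))
    return context


def get_file_properties(context: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the "Get File Properties" action: it reads the
    # ID produced by the trigger and returns properties for the later steps.
    file_id = context["ID"]
    return {"FileName": f"scan_{file_id}.pdf"}


result = run_steps({"ID": 7}, [get_file_properties])
```

Here the trigger output plays the role of the ID you pick in the dialog, and result holds both the original ID and the properties added by the step.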

Now that we have the file, we need to check its type. To do that, add a Condition control and then add the conditions to it.

Condition to check whether PDF or image

Each condition has two outputs, “Yes” and “No”. We will add all of our remaining steps to the “Yes” branch and ignore the “No” branch for now, although you can add some tasks there if you wish.
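
The check made by the condition is roughly the following, shown as a minimal Python sketch. The exact list of extensions is an assumption (the flow only needs “PDF or image”), and should_ocr is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumed set of extensions; adjust it to the file types your scanner produces.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def should_ocr(file_name: str) -> bool:
    """Return True for the "Yes" branch, i.e. when the file is a PDF or an image."""
    return PurePosixPath(file_name).suffix.lower() in OCR_EXTENSIONS
```

The comparison is case-insensitive, so should_ocr("Scan.PDF") takes the “Yes” branch while should_ocr("notes.docx") takes the “No” branch.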

Now, in the “Yes” branch, we can get the file and pass it to the OCR process. That is where the tool called AquaForest comes into the story. Please follow the steps mentioned in this article and get the key needed. Once that is done, add the action “OCR PDF or Images” by searching for the word “AquaForest”.

AquaForest OCR PDF or Images

Give the connection a name and add the key in the next popup. There are many properties that you can set here, but the two below are the important ones.

File Content with OCR

The output of this step is the OCRed file. Now all we have to do is add the “Create File” action and configure it.

Save the OCRed File
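
The data flow of these two steps can be sketched as below. This is not the AquaForest API: the ocr argument is a hypothetical callable standing in for the “OCR PDF or Images” action, the dictionary stands in for the SharePoint library, and ocr_and_save is a name made up for this example.

```python
from typing import Callable, Dict


def ocr_and_save(
    file_content: bytes,
    target_name: str,
    ocr: Callable[[bytes], bytes],
    library: Dict[str, bytes],
) -> str:
    """Pass the file through OCR and store the result under target_name."""
    searchable_content = ocr(file_content)      # stands in for the OCR action
    library[target_name] = searchable_content   # stands in for "Create File"
    return target_name


def fake_ocr(content: bytes) -> bytes:
    # Fake OCR for illustration only: it just marks the bytes as processed.
    return content + b" [text layer]"


library: Dict[str, bytes] = {}
saved = ocr_and_save(b"scanned bytes", "scan_001.pdf", fake_ocr, library)
```

The point of the sketch is only that the output of the OCR step, not the original file content, is what goes into “Create File”.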

Wow, now we have a searchable PDF in our Document folder. Go ahead and search for any content from your newly updated PDF. If you wish, you can also add an action to send an acknowledgment email.

Send email step in Flow
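
In Flow the mail is sent by the email action itself. If you want to see what such an acknowledgment could contain, here is a small sketch that builds the message with Python's standard email library. The addresses, the subject, and the body text are assumptions, and build_acknowledgment is a hypothetical name. The message is only built here, not sent.

```python
from email.message import EmailMessage


def build_acknowledgment(sender: str, recipient: str, file_name: str) -> EmailMessage:
    """Build (but do not send) a short mail saying the OCR step has finished."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"OCR completed: {file_name}"
    message.set_content(
        f"The file {file_name} has been converted to a searchable PDF."
    )
    return message


# Placeholder addresses for illustration only.
mail = build_acknowledgment("flow@example.com", "me@example.com", "scan_001.pdf")
```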

Testing the flow

Now that the flow is created, it is time to test it. To do that, I added a scanned document to my OneDrive folder. We can check the run status of the flow in the portal.

Run History of Flow

Below is a sample run history output of my flow.

Sample Flow Run History PDF OCR

Conclusion

Thanks a lot for staying with me and reading this article. I hope you have now learned about

  • creating a flow in SharePoint Online
  • creating steps in Flow
  • using connections in Flow
  • OCRing a PDF using Computer Vision
  • OCRing a PDF using the AquaForest API
  • creating a new file with the OCRed output
  • sending emails from Flow

If you have learned anything else from this article, please let me know in the comment section.

Follow me

If you like this article, consider following me, haha!

  • GitHub
  • medium
  • Twitter

Your turn. What do you think?

Thanks a lot for reading. Did I miss anything that you think is needed in this article? Did you find this post useful? Kindly do not forget to share your feedback.

Kindest Regards
Sibeesh Venu

© Copyright Sibeesh Passion 2014-2025. All Rights Reserved.