Collect Data. Convert Scanned Documents to Text. Extracting Data from Contracts and Receipts. 📚 10 Lessons

How to extract data from scanned documents: from PDF format to images, tables, and text.

Description

In this course, we will go step by step through the main processes related to the topic “Big Data and Machine Learning”, using real data as our example.
Since the material turned out to be voluminous, I divided the course into five parts.

🎓 The second part is devoted to collecting and extracting data from scanned documents and images. In this course, you will learn how to extract data from scanned documents and images such as invoices, receipts, contracts, and any other documents in PDF or image format.

✔️ We will work with real data: two datasets of PDF files that we will convert into text and then into tabular form. We will then visualize the extracted data on the Kaggle platform using Python libraries that present it in graphical form.

✔️ During the course, we will install Python and libraries such as pandas, seaborn, and matplotlib. We will upload the extracted data to Kaggle, visualize it there in a Jupyter Notebook, and finally upload our data to GitHub.


🔎 Topics covered in this course:

📝 Lecture 1. Python. Choosing a Python IDE. Anaconda. Installing Python.

• How to convert a scanned PDF to text?

• Python or Anaconda?

• Choosing a Python IDE for beginners.

• How to install Visual Studio Code on Windows?

• How to install Python?

• How to run Python in VS Code?

📝 Lecture 2. Scanned PDF files. Convert a PDF document to images using Python.

• How to convert scanned PDF to JPEG?

• How to Install Tesseract OCR?

• What is Tesseract?

• Google OCR in Python with Tesseract.

• Extract a page from a PDF as a JPEG.

• How to convert a PDF document to images using Python?

• Convert PDF to Image using Python.

• Install Poppler and the Pillow (PIL) module.
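
The conversion step in this lecture can be sketched with the pdf2image package, a Python wrapper around Poppler that returns each PDF page as a Pillow image. This is a minimal sketch, not the course's own code: the function names `page_image_path` and `pdf_to_jpegs`, the three-digit page numbering, and the default of 300 DPI (a common choice for OCR input) are assumptions.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, out_dir):
    """Build the output file name for one page, e.g. contract_page_001.jpg."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number:03d}.jpg"


def pdf_to_jpegs(pdf_path, out_dir, dpi=300, poppler_path=None):
    """Render every page of a scanned PDF to a JPEG file and return the paths.

    Requires the pdf2image package and the Poppler utilities. On Windows, pass
    the folder that contains Poppler's binaries as `poppler_path` if it is not
    on PATH.
    """
    # Imported here so that page_image_path() works even without pdf2image.
    from pdf2image import convert_from_path

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # convert_from_path returns one PIL image per page.
    pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)
    saved = []
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "JPEG")
        saved.append(target)
    return saved
```

For example, `pdf_to_jpegs("scans/contract.pdf", "images")` (hypothetical paths) would write `images/contract_page_001.jpg`, `images/contract_page_002.jpg`, and so on. A higher DPI generally gives Tesseract more detail to work with, at the cost of larger files.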

📝 Lecture 3. Installing Tesseract. User-defined functions in Python

• Installing Tesseract for Windows

• Install PyTesseract OCR.

• Iterate over files in a given directory.

• How is try/except used in Python?

• Writing user-defined functions in Python
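
The three topics of this lecture (iterating over a directory, try/except, and user-defined functions) come together in a batch OCR loop. This is a minimal sketch, assuming pytesseract and Pillow are installed alongside the Tesseract program; `tesseract_ocr`, `ocr_folder`, and the list of image extensions are hypothetical names chosen for illustration. The OCR function is passed in as a parameter so that the loop can be tried without Tesseract.

```python
from pathlib import Path

# Hypothetical set of file types treated as scanned images.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def tesseract_ocr(image_path, lang="eng"):
    """OCR a single image with pytesseract (needs the Tesseract program installed)."""
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=lang)


def ocr_folder(folder, ocr=tesseract_ocr):
    """Return {file name: extracted text} for every image file in `folder`.

    Non-image files are skipped. A file that fails to OCR is reported and
    skipped instead of stopping the whole batch.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # e.g. PDFs or notes lying in the same folder
        try:
            results[path.name] = ocr(path)
        except Exception as error:  # e.g. a corrupt scan or a missing Tesseract binary
            print(f"Could not read {path.name}: {error}")
    return results
```

If Tesseract is not on PATH on Windows, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of `tesseract.exe` before calling `ocr_folder`.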

📝 Lecture 4. Regular Expressions in Python. Pattern matching in Python.

• What is a regular expression?

• How do you match patterns with regex?

• Online RegEx tester and debugger.

• How to use findall() in Python?

• Using Regex for Text Manipulation in Python.
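
Applied to OCR output, `re.findall` returns every non-overlapping match; when the pattern contains groups, each match comes back as a tuple of those groups. The sketch below uses a made-up receipt and hypothetical patterns (DD.MM.YYYY dates, prices with two decimals, an `Invoice No:` label); real scans will need patterns tuned to their own layout, for example in an online regex tester.

```python
import re

# Hypothetical OCR output of a scanned receipt, used only for illustration.
receipt_text = """ACME Hardware Store
Date: 12.03.2021
Invoice No: INV-20931
Screws 4.99
Hammer 15.50
TOTAL 20.49"""

# A date written as DD.MM.YYYY; the three groups capture day, month and year.
DATE_RE = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")

# A line that starts with a word and ends with a price such as 15.50.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
LINE_ITEM_RE = re.compile(r"^([A-Za-z][\w ]*?)[ \t]+(\d+\.\d{2})$", re.MULTILINE)

# The token that follows the "Invoice No:" label.
INVOICE_RE = re.compile(r"Invoice No:\s*(\S+)")

dates = DATE_RE.findall(receipt_text)        # list of (day, month, year) tuples
items = LINE_ITEM_RE.findall(receipt_text)   # list of (description, price) tuples
invoice = INVOICE_RE.search(receipt_text).group(1)

print(dates)
print(items)
print(invoice)
```

Run as-is, it prints `[('12', '03', '2021')]`, then `[('Screws', '4.99'), ('Hammer', '15.50'), ('TOTAL', '20.49')]`, then `INV-20931`.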

📝 Lecture 5. Arrays and Functions in Python. Adding data to an array.

• Add a string to an array.

• How to declare and add items to an array in Python?

• Write a function in Python.

• Save data to a pandas DataFrame.
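
In Python, the usual "array" for collecting extracted values is a list: start with an empty list, `append` one record per recognized line, and build a pandas DataFrame at the end. This is a minimal sketch; `parse_line_item` and the sample lines are hypothetical, and splitting on the last space is a simplifying assumption about the line format.

```python
import pandas as pd


def parse_line_item(line):
    """Split a 'description price' line into a dict, or return None if it is not one."""
    description, _, price = line.rpartition(" ")
    try:
        return {"item": description.strip(), "price": float(price)}
    except ValueError:  # the last word is not a number, so this is not a line item
        return None


ocr_lines = ["Screws 4.99", "Hammer 15.50", "Thank you!"]  # illustrative OCR output

rows = []  # a Python list used as a growable array
for line in ocr_lines:
    row = parse_line_item(line)
    if row is not None:
        rows.append(row)  # add one record to the end of the list

df = pd.DataFrame(rows, columns=["item", "price"])
print(df)
```

Building the DataFrame once from the finished list is generally simpler and faster than adding rows to a DataFrame inside the loop; `df.to_csv(...)` can then save the table for the later Kaggle steps.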

📝 Lecture 6. GeoPy: locating coordinates easily. Getting the latitude and longitude of a location.

• How do I convert an address to coordinates?

• How do you geocode data?

• Locate the coordinates.

• How do I find the geocode of an address?

• Install GeoPy module.

• Install the GDAL and Fiona modules.
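
Geocoding with GeoPy can be sketched as a loop over addresses that keeps the latitude and longitude of each result. This is a minimal sketch, assuming the free Nominatim (OpenStreetMap) geocoder; `make_nominatim_geocoder` and `geocode_addresses` are hypothetical helper names, and the geocode function is a parameter so that the loop can be tested offline.

```python
def make_nominatim_geocoder(user_agent, min_delay_seconds=1.0):
    """Return a rate-limited geocode function backed by OpenStreetMap's Nominatim.

    Needs the geopy package and an internet connection. Nominatim's usage policy
    asks for an identifying user_agent and at most one request per second.
    """
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    return RateLimiter(geolocator.geocode, min_delay_seconds=min_delay_seconds)


def geocode_addresses(addresses, geocode):
    """Return (address, latitude, longitude) tuples; coordinates are None if not found."""
    results = []
    for address in addresses:
        location = geocode(address)  # a geopy Location object, or None
        if location is None:
            results.append((address, None, None))
        else:
            results.append((address, location.latitude, location.longitude))
    return results
```

For example, `geocode = make_nominatim_geocoder("my-pdf-data-project")` followed by `geocode_addresses(addresses, geocode)`; the user agent string is a placeholder. Addresses that the geocoder cannot resolve come back with `None` coordinates and can be dropped before mapping.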

📝 Lecture 7. Kaggle. Jupyter Notebook. Plot data with matplotlib, seaborn, squarify.

• Visualize a dataset.

• Run a Jupyter notebook on Kaggle.

• Python Treemaps with Squarify and Matplotlib.

• How do you create a TreeMap chart?

• How to Convert Strings to Floats in a Pandas DataFrame.

• Replacing strings with numbers in Python

• Plot a DataFrame with matplotlib and seaborn.
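
Numbers extracted by OCR usually arrive as strings, often with currency symbols, spaces, or decimal commas, and must be converted before matplotlib, seaborn, or squarify can plot them. This is a minimal sketch with made-up values; the cleaning rules (euro sign, space as thousands separator, decimal comma) are assumptions about the input format.

```python
import pandas as pd

# Illustrative values as they might come out of OCR: text with decimal commas,
# currency symbols and one unreadable cell.
df = pd.DataFrame({
    "city": ["Berlin", "Munich", "Hamburg", "Cologne"],
    "amount": ["1 250,50 €", "980,00 €", "n/a", "2 100,75 €"],
})

cleaned = (
    df["amount"]
    .str.replace("€", "", regex=False)   # drop the currency symbol
    .str.replace(" ", "", regex=False)   # drop thousands separators written as spaces
    .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
)
# to_numeric turns anything that is still not a number (here "n/a") into NaN.
df["amount"] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
print(df)
```

With `errors="coerce"`, unreadable cells become `NaN` instead of raising an error; drop them (for example with `df.dropna(subset=["amount"])`) before passing the column as `sizes` to `squarify.plot`, since treemap sizes need to be real numbers.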

📝 Lecture 8. Folium. Mapping in Python. Plot Geographic Data on a Map.

• Plot Geographic Data on a Map.

• How to use folium with a Jupyter notebook?

• Placing coordinates on a map.

• How to plot data on maps in Jupyter.

• Efficiently display a map with CircleMarker().

• Mapping in Python with geopandas.

• Black & White map with Folium.
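
Placing coordinates on a Folium map can be sketched as one `CircleMarker` per point, added to a map centered on the data. This is a minimal sketch; `map_center` and `build_map` are hypothetical helpers, and the `cartodbpositron` tile set (a light, mostly grayscale basemap) is an assumption, not necessarily the black-and-white style used in the lecture.

```python
def map_center(points):
    """Average latitude and longitude of (lat, lon) pairs, used to center the map."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


def build_map(points, labels, tiles="cartodbpositron"):
    """Return a folium.Map with one CircleMarker per point (requires folium)."""
    import folium

    m = folium.Map(location=list(map_center(points)), zoom_start=6, tiles=tiles)
    for (lat, lon), label in zip(points, labels):
        folium.CircleMarker(
            location=[lat, lon],
            radius=6,           # marker size in pixels
            popup=label,        # text shown when the marker is clicked
            color="black",
            fill=True,
            fill_opacity=0.7,
        ).add_to(m)
    return m
```

In a Jupyter notebook, leaving the returned map as the last expression of a cell displays it inline, and `m.save("map.html")` writes a standalone page. Averaging coordinates is a simple centering rule that breaks down for points on both sides of the 180th meridian.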

📝 Lecture 9. GitHub. GitHub Desktop. Storing and managing code.

• What is GitHub and how do you use it?

• Upload files to GitHub.

• Install GitHub Desktop.

• Sync with a remote Git repository.

• Add a repository from your local computer to GitHub.

Topics for this course

10 Lessons, 1h 35m

Introduction

Python. PyTesseract OCR. Regular Expressions. Arrays and Functions.

Regex. Regular Expressions in Python.

Kaggle. Jupyter Notebook.

GitHub

Course Details

  • Level: Beginner
  • Categories: Big Data
  • Total Hour: 1h 35m
  • Total Lessons: 10
  • Last Update: August 11, 2021

Requirements

  • You only need a computer running Windows.
  • You do not need any prior programming experience or theoretical knowledge of Python.

Target Audience

  • Beginners who are interested in Big Data and Machine Learning using Python
  • Those who already have Python programming skills but want to practice on a hands-on, real-world data project
  • Anyone (students, developers, managers) who is interested in learning about big data
  • Anyone looking for a practical, step-by-step course for beginners
  • BIM Managers
  • BIM Coordinators
  • BIM Engineers
  • BIM Specialists
  • Professionals in the AEC industry
  • Structural Engineers
  • Architects
  • Designers