Big Data. Extract Data from PDF Drawings and Documents. PDF – Excel – Charts and Graphs. 📚 10 Lessons

Convert PDF documents to Text and Graphics. Data Visualisation. Python OCR. Practical Step-by-Step Course for Beginners.
179.00 59.00

Description

🎓 The first part of the course is devoted to collecting and extracting data from documents.

✔️ In this course, you will learn how to extract data from PDF documents, including drawings and any other files saved as PDF.
✔️ We will work with real data: two datasets of PDF files that we will convert to text and then to tabular form. We will then visualize the extracted data on the Kaggle platform using Python libraries that present it in graphical form.
✔️ During the course, we will install Python and libraries such as pandas, seaborn and matplotlib. We will upload the extracted data to Kaggle, visualize it there in a Jupyter Notebook, and finally upload our data to GitHub.
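
To give a feel for the first step (turning a folder of PDF files into clean lines of text), here is a minimal sketch, not the course's own code. It assumes the tika-python package (`pip install tika`), whose `parser.from_file()` returns a dict with `"content"` and `"metadata"` keys and which needs Java to run the Tika server; the helper names `clean_lines` and `extract_pdf_lines` are hypothetical.

```python
from pathlib import Path


def clean_lines(text):
    """Split extracted text into lines and drop blank or whitespace-only ones."""
    if not text:  # Tika may return None when a PDF has no extractable text
        return []
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract_pdf_lines(folder):
    """Return {file name: cleaned lines} for every *.pdf file in `folder`.

    Assumes tika-python is installed and Java is available for the Tika server.
    """
    from tika import parser  # imported lazily so clean_lines() works without Tika

    results = {}
    for pdf_path in sorted(Path(folder).glob("*.pdf")):  # iterate over files in the folder
        parsed = parser.from_file(str(pdf_path))  # dict with "content" and "metadata"
        results[pdf_path.name] = clean_lines(parsed.get("content"))
    return results
```

Calling `extract_pdf_lines("pdfs")` on a hypothetical `pdfs` folder would return one list of non-blank lines per PDF; scanned drawings without a text layer may come back empty unless Tika is set up with OCR (Tesseract).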

🔎 Topics covered in this course:

 📝 Lecture 2. Python. Choosing a Python IDE. Anaconda. Installing Python
  • How to convert a PDF to text?
  • Python or Anaconda?
  • What is the best Python IDE for beginners?
  • How do I install VS Code?
  • How do I install Python?
  • How to run Python in VS Code?
  • How do I choose the Python interpreter in VS Code?

 📝 Lecture 3. 1st Dataset. PDF files. Tika OCR. Extracting content and metadata
  • How do I convert a PDF to TXT in Python?
  • How can I iterate over files in a given directory?
  • How do I install Apache Tika on Windows?
  • How to split a string into a list?
  • How do I remove blank strings from a list?

 📝 Lecture 4. Regular Expression in Python. Pattern matching in Python.
  • What is regular expression with example?
  • How to match regular expression in Python?
  • How can I debug a regular expression in Python?
  • What is the regular expression for a date format?
  • How do you check whether any element of an array matches a regular expression?
  • How do I use a regular expression in a loop?

 📝 Lecture 5. Arrays and Functions in Python. Adding data to an array. Creating a function.
  • How do you add a string to an array?
  • How do you find the index of an element in a list?
  • How can I extract the date from a string?
  • How to declare and add items to an array in Python?
  • How do you write a function in Python?

 📝 Lecture 6. Pandas DataFrame. Two-dimensional, size-mutable, tabular data structure.
  • How do I install pandas for Python?
  • How do I create a pandas DataFrame?
  • How to reduce the number of columns in a pandas DataFrame?
  • How to combine column values into a list in a new column?
  • How to convert an array into a DataFrame in Python?
  • How to change column names in a pandas DataFrame?
  • How do I save a DataFrame as a CSV table?

 📝 Lecture 7. Kaggle. Jupyter Notebook. Creating an account. Plotting with matplotlib and seaborn.
  • How do I upload a file to a Kaggle kernel?
  • How do you use a Kaggle dataset?
  • How to run a Jupyter notebook using Kaggle kernels?
  • How to load a CSV file into a DataFrame in a Jupyter Notebook?
  • How to use the functions of a pandas DataFrame?
  • How do I change the date format of a column in pandas?
  • How do I convert a string to a datetime object in Python?
  • How to calculate the difference between two dates in a pandas DataFrame?
  • How do I delete a column in a pandas DataFrame?
  • How do I add columns to a pandas DataFrame?
  • How do you visualize a dataset?
  • How do you plot a DataFrame in pandas?

 📝 Lecture 8. 2nd Dataset. Task. Data from PDF. Getting data from PDF drawings.
  • Independent Work Tasks
  • Learn to Code – on real data (16 PDF files to chart)
  • A brief overview of the data in the task

 📝 Lecture 9. GitHub. GitHub Desktop. Storing and managing code.
  • What is GitHub and how do you use it?
  • What can I use GitHub for?
  • How do I upload files to GitHub?
  • How to install GitHub Desktop?
  • How to sync with a remote Git repository?
  • How do I add a repository from my local computer to GitHub?
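
The regex, array, DataFrame and date topics above fit together roughly as in the following minimal sketch. It assumes that each relevant text line holds a drawing ID followed by two dates in DD.MM.YYYY format; the sample lines, column names, the function name `lines_to_dataframe` and the output file `drawings.csv` are hypothetical, chosen only for illustration.

```python
import re

import pandas as pd

# Assumed date format: DD.MM.YYYY, e.g. 11.08.2021
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")


def lines_to_dataframe(lines):
    """Turn text lines like 'A-101 01.03.2021 15.03.2021' into a DataFrame.

    The first token is treated as the drawing ID; lines with fewer than two
    dates are skipped.
    """
    rows = []
    for line in lines:
        dates = DATE_PATTERN.findall(line)  # every date-like substring in the line
        if len(dates) >= 2:
            rows.append([line.split()[0], dates[0], dates[1]])
    df = pd.DataFrame(rows, columns=["drawing", "created", "revised"])
    # Convert the date strings to datetime objects with an explicit format
    df["created"] = pd.to_datetime(df["created"], format="%d.%m.%Y")
    df["revised"] = pd.to_datetime(df["revised"], format="%d.%m.%Y")
    # Difference between the two dates, in whole days
    df["days_between"] = (df["revised"] - df["created"]).dt.days
    return df


sample_lines = [
    "A-101 01.03.2021 15.03.2021",
    "Title block without dates",
    "A-102 10.04.2021 02.05.2021",
]
table = lines_to_dataframe(sample_lines)
table.to_csv("drawings.csv", index=False)  # save the table as CSV (hypothetical file name)
print(table)
```

With these three sample lines, the line without dates is skipped and `days_between` comes out as 14 and 22. Passing an explicit `format` to `pd.to_datetime` avoids any ambiguity between day-first and month-first dates.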

What Will I Learn?

  • How to convert a PDF to text?
  • How do I install Python?
  • How do you visualize a dataset?
  • What is GitHub and how do you use it?
  • What is regular expression?
  • How do I install VS Code?
  • How to run Python in VS Code?
  • How do you use a Kaggle dataset?
  • How do I install pandas for Python?
  • How do I convert a PDF to TXT in Python?
  • What is the best Python IDE for beginners?
  • How can I iterate over files in a given directory?
  • How to install Apache Tika on Windows?
  • How to split a string into a list?
  • How do I remove blank strings from a list?
  • How do I choose the Python interpreter in VS Code?
  • How to match regular expression in Python?
  • How can I debug a regular expression in Python?
  • What is the regular expression for a date format?
  • How do you check whether any element of an array matches a regular expression?
  • How do I use a regular expression in a loop?
  • How do you add a string to an array?
  • How do you find the index of an element in a list?
  • How can I extract the date from a string?
  • How do you write a function in Python?
  • How do I create a pandas DataFrame?
  • How to reduce the number of columns in a pandas DataFrame?
  • How to convert an array into a DataFrame in Python?
  • How to change column names in a pandas DataFrame?
  • How do I save a DataFrame as a CSV table?
  • How do I upload a file to a Kaggle kernel?
  • How to run a Jupyter notebook using Kaggle kernels?
  • How to load a CSV file into a DataFrame in a Jupyter Notebook?
  • How do I change the date format of a column in pandas?
  • How do I convert a string to a datetime object in Python?
  • How to calculate the difference between two dates in a pandas DataFrame?
  • How do I delete a column in a pandas DataFrame?
  • How do I add columns to a pandas DataFrame?
  • How do you plot a DataFrame in pandas?
  • What can I use GitHub for?
  • How do I upload files to GitHub?
  • How to install GitHub Desktop?
  • How to sync with a remote Git repository?
  • How do I add a repository from my local computer to GitHub?

Topics for this course

10 Lessons, 01h 22m

Introduction

Python. 1st Dataset. PDF files. Tika OCR. Regular Expression. Arrays and Functions

Pandas DataFrame. Kaggle. Jupyter Notebook.

Independent Work Tasks. 2nd Dataset.

GitHub. GitHub Desktop


Course Details

  • Level: Beginner
  • Categories: Big Data
  • Total Hour: 01h 22m
  • Total Lessons: 10
  • Last Update: August 11, 2021

Requirements

  • You only need a computer running Windows.
  • You do not need any special programming knowledge or theoretical knowledge of Python.

Target Audience

  • Beginners who are interested in Big Data and Machine Learning using Python
  • This course is for beginners, so you do not need any special programming knowledge.
  • Anyone (students, developers, managers) who is interested in learning about big data can take this course.
  • Designer
  • Architect
  • BIM Manager
  • BIM Engineer
  • BIM Specialist
  • Professionals in the AEC industry
  • Professionals in the construction industry