In this course, we will go step by step through the main processes related to the topic “Big data and machine learning”, using real data as an example.
Since the material turned out to be voluminous, I divided the course into five parts.
🎓 The second part is devoted to collecting and extracting data from scanned documents and images. In this course, you will learn how to extract data from invoices, receipts, contracts and any other documents in PDF or image format.
✔️ We will work on real data. We will have two data sets consisting of PDF files, which we will convert to text and then to tabular form. We will visualize the resulting data on the Kaggle platform using Python libraries, which will help us present it in graphical form.
✔️ During the course, we will install Python and libraries such as Pandas, seaborn, matplotlib and others. We will upload the resulting data to the Kaggle platform, visualize it there in a Jupyter Notebook, and at the end upload our data to GitHub.
🔎 Topics covered in this course:
• How to convert a scanned PDF to text?
• Python or Anaconda?
• Choosing a Python IDE for beginners.
• How to install Visual Studio Code on Windows?
• How to install Python?
• How to run Python in VS Code?
• How to convert scanned PDF to JPEG?
• How to Install Tesseract OCR?
• What is Tesseract?
• Google OCR in Python with Tesseract.
• Extract a page from a PDF as a JPEG.
• How to convert a PDF document to images using Python?
• Convert PDF to Image using Python.
• Install Poppler and the Pillow (PIL) module.
• Installing Tesseract for Windows
• Install PyTesseract OCR.
• Iterate over files in a given directory.
• How is try/except used in Python?
• Writing user-defined functions in Python
• What is a regular expression?
• How do you match in regex?
• Online RegEx tester and debugger.
• How to use findall in Python?
• Using Regex for Text Manipulation in Python.
• Add a string to an array.
• How to declare and add items to an array in Python?
• Write a function in Python.
• Save data to Pandas Dataframe.
• How do I convert address to coordinates?
• How do you geocode data?
• Locate the coordinates.
• How do I find the geocode of an address?
• Install GeoPy module.
• Install the GDAL and Fiona modules.
• Visualize a dataset.
• Run Jupyter notebook using Kaggle.
• Python Treemaps with Squarify and Matplotlib.
• How do you create a TreeMap chart?
• How to Convert Strings to Floats in Pandas DataFrame.
• Replacing strings with numbers in Python
• Plot a DataFrame with matplotlib and seaborn.
• Plot Geographic Data on a Map.
• How to use folium with Jupyter notebook?
• Placing coordinates on a map.
• How to plot data on maps in Jupyter.
• Efficiently display a map with CircleMarker().
• Mapping in Python with geopandas.
• Black & White map with Folium.
• GitHub and how to use it.
• Upload files to GitHub.
• Install GitHub Desktop.
• Sync with a remote Git repository.
• Adding a repository from your local computer to GitHub.
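To give a feel for the regex and tabular-data topics above, here is a minimal sketch of how text obtained from OCR might be turned into rows with `re.findall`. The sample text, the field labels and the `extract_rows` helper are hypothetical and serve only as an illustration; the course's real documents and patterns will differ.

```python
import re

# Hypothetical OCR output; in practice this text would come from Tesseract.
ocr_text = """
Invoice No: 1001
Address: 12 Main Street, Springfield
Total: 1,250.50

Invoice No: 1002
Address: 7 Oak Avenue, Riverside
Total: 980.00
"""


def extract_rows(text):
    """Collect invoice number, address and total into a list of dicts."""
    # Each findall call returns the captured group for every match, in order.
    numbers = re.findall(r"Invoice No:\s*(\d+)", text)
    addresses = re.findall(r"Address:\s*(.+)", text)
    totals = re.findall(r"Total:\s*([\d,]+\.\d{2})", text)

    rows = []
    for number, address, total in zip(numbers, addresses, totals):
        rows.append({
            "invoice": number,
            "address": address.strip(),
            # Remove the thousands separator before converting to float.
            "total": float(total.replace(",", "")),
        })
    return rows


rows = extract_rows(ocr_text)
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed directly to `pandas.DataFrame(rows)` to get a table, with each key becoming a column.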