Step-by-Step Full Course. All Parts. 📚 46 Lessons

Practical step-by-step course for beginners. Using real data, we will walk through the main processes involved in big data and machine learning.

Description

🎓 This course is an introduction to #BigData and #MachineLearning with #Python programming for absolute beginners with no background in programming.

Working step by step with real data, we will go through the main processes related to the topic “Big data and machine learning”. Since the material turned out to be voluminous, I divided the course into five parts.

📑 The first part is devoted to collecting and extracting data from documents.
✔️ In this part, you will learn how to extract data from PDF documents, drawings and any other documents in PDF format. We will work with two datasets of PDF files, which we will convert to text and then to tabular form. We will then visualize the resulting data on the Kaggle platform using Python libraries that present it in graphical form.
✔️ During the training, we will install Python and libraries such as pandas, seaborn, matplotlib and others. We will upload the resulting data to Kaggle, visualize it there in a Jupyter Notebook, and finally upload our data to GitHub.

📑 The second part is devoted to collecting and extracting data from scanned documents and images. In this part, you will learn how to extract data from scanned documents and images, invoices, receipts, contracts and any other documents in PDF or image format.
✔️ We will work on real data. We will have two datasets of PDF files, which we will convert to text and then to tabular form. We will then visualize the resulting data on the Kaggle platform using Python libraries that present it in graphical form.
✔️ During the training, we will install Python and libraries such as pandas, seaborn, matplotlib and others. We will upload the resulting data to Kaggle, visualize it there in a Jupyter Notebook, and finally upload our data to GitHub.

📑 In the third part we will consider the main options for storing big data.
✔️ In the practical lesson we will install MySQL Server on a computer and learn how to work with and edit MySQL databases.
In the fifth lesson we will take an ordinary Excel table and transfer its contents to the MySQL server.
✔️ Then we will install Apache Spark in order to work with datasets in a distributed manner. To process the data in a distributed way, we will export it from MySQL into Spark and, with the help of a Jupyter Notebook, prepare it for visualization.

📑 In the fourth part we will look at the main platforms and online tools for visualizing big data.
✔️ We will briefly look at these platforms and generate several reports in each of them. This will help you choose the platform that suits you and your data.
✔️ In the practical lesson we will upload an Excel file with our data to the Kaggle platform and, using a Jupyter Notebook, clean the data and visualize it with different Python libraries.

📑 In the fifth part we will examine in detail the basic types, terms and algorithms of machine learning. We will go through the basic concepts of machine learning that beginners need, and look more closely at algorithms such as K-means clustering (an unsupervised method), Linear Regression and others.
✔️ In the practical lessons we will predict the time and cost of construction for a new project X, based on the data that we collected on previous projects. In another lesson we will predict the construction cost and time for project X from parameters that we set for the new project.
✔️ Then we will take open data for the city of San Francisco. We will clean this raw data and display it in the form of charts and maps, and collect various interesting insights from this public information. Finally, we will prepare the data for a machine learning model and try to predict some parameters from it.

📦 Learning Resources

🔎 Topics covered in this course:

📝 Lecture 2. Python. Choosing python IDE. Anaconda. Install Python

How to convert a PDF to text?
Python or Anaconda?
What is the best Python IDE for beginners?
How do I install VS Code?
How do I install Python?
How to run Python in VS Code?
How do you choose the Python interpreter in VS Code?

📝 Lecture 3. 1st Dataset. PDF files. Tika OCR. Extracting content and metadata

How do I convert a PDF to TXT in Python?
How can I iterate over files in a given directory?
Install Apache Tika on Windows.
How to split a string into a list?
Remove blank strings from a list?
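
The list-handling questions above can be sketched in plain Python. This is only an illustration under stated assumptions: it reads small text files from a temporary folder instead of running Tika on real PDFs, and the file names and the helper names `clean_lines` and `read_text_files` are hypothetical.

```python
from pathlib import Path
import tempfile


def clean_lines(text):
    """Split text into lines and drop the blank ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_text_files(folder):
    """Map each .txt file name in `folder` to its list of non-blank lines."""
    result = {}
    for path in sorted(Path(folder).glob("*.txt")):
        result[path.name] = clean_lines(path.read_text(encoding="utf-8"))
    return result


# Demo data (hypothetical): two small text files standing in for extracted PDF text.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "permit_01.txt").write_text(
        "Project A\n\n  Start: 01.03.2020 \n\n", encoding="utf-8"
    )
    (folder / "permit_02.txt").write_text("Project B\n", encoding="utf-8")
    files = read_text_files(folder)

print(files)
```

Keeping the cleaning step in its own function makes it easy to reuse once the text comes from a real extraction tool.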

📝 Lecture 4. Regular Expression in Python. Pattern matching in Python.

What is regular expression with example?
How to match regular expression in Python?
Debug a regular expression in Python?
What is the regular expression for date format?
How do you check if an array contains a regular expression?
Create loop with regular expression.
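
As a rough sketch of the regular-expression topics above, the example below loops over a list of strings and collects dates. It assumes the dates are written as DD.MM.YYYY; the sample lines and the name `find_dates` are made up for illustration, and a different date format would need a different pattern.

```python
import re

# Assumed date format: two-digit day, two-digit month, four-digit year, dot-separated.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

# Hypothetical sample lines standing in for text extracted from a document.
lines = [
    "Permit issued 01.03.2020",
    "No date here",
    "Start 15.04.2020, end 30.09.2021",
]


def find_dates(line):
    """Return every date-like substring found in one line."""
    return DATE_PATTERN.findall(line)


# Loop over the list and apply the pattern to each element.
dates_per_line = [find_dates(line) for line in lines]

# Keep only the lines that contain at least one match.
lines_with_dates = [line for line in lines if DATE_PATTERN.search(line)]

print(dates_per_line)
print(lines_with_dates)
```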

📝 Lecture 5. Array and Function in Python. Add data to Array. Create function.

How do you add a string to an array?
How do you find the index of an element in a list?
How can I extract the date from a string?
How to declare and add items to an array in Python?
How do you write a function in Python?
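
A minimal sketch of these questions, using a Python list as the "array". The record strings and the function name `extract_date` are hypothetical, and the DD.MM.YYYY format is an assumption.

```python
import re
from datetime import datetime


def extract_date(text):
    """Return the first DD.MM.YYYY date in `text` as a date object, or None."""
    match = re.search(r"\d{2}\.\d{2}\.\d{4}", text)
    if match is None:
        return None
    return datetime.strptime(match.group(0), "%d.%m.%Y").date()


# Hypothetical records standing in for lines taken from a document.
records = [
    "Project A, approved 01.03.2020",
    "Project B, pending",
    "Project C, approved 15.04.2020",
]

# Declare an empty list and add items to it one by one.
approved = []
for record in records:
    if extract_date(record) is not None:
        approved.append(record)

# Find the index of an element in a list.
position = records.index("Project B, pending")

print(approved, position)
```

Note that `list.index` raises `ValueError` when the element is missing, so real code would usually check membership first.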

📝 Lecture 6. Pandas DataFrame. Two-dimensional size-mutable, tabular data structure.

How to install pandas on Python?
How do I create a pandas DataFrame?
How to reduce number of columns in a pandas DataFrame?
How to combine column values into a list in a new column?
How to convert array into DataFrame in Python?
How to change column names in pandas Dataframe?
How do I save a Dataframe as CSV table?

📝 Lecture 7. Kaggle. Jupyter Notebook. Create an account. Plotting with matplotlib and seaborn.

How do I upload a file to kaggle kernel?
How do you use kaggle dataset?
How to run Jupyter notebook using Kaggle kernels?
How to convert a CSV to dataframe in Python Jupyter Notebook?
How to use the functions of Pandas Dataframe?
How do I change the date format of a column in pandas?
How do I convert a string to datetime Objects in Python?
How to Calculate Difference Between Two Dates in Pandas Dataframe?
How do I delete a column in pandas DataFrame?
How do I add columns in pandas DataFrame?
How do you visualize a dataset?
How do you plot a DataFrame in pandas?

📝 Lecture 8. 2nd Dataset. Task. Data from PDF. Getting data from PDF drawings.

Independent Work Tasks
Learn to Code – on real data (16 PDF files to chart)
A brief overview of the data in the task

📝 Lecture 9. GitHub. Desktop GitHub. Store and manage code.

What is GitHub and how do you use it?
What can I use GitHub for?
How do I upload files to GitHub?
How to install GitHub Desktop?
How to sync with a remote Git repository?
How to add a repository from your local computer to GitHub?

📝 Lecture 10. Python. Choosing python IDE. Anaconda. Install Python.

• How to convert a scanned PDF to text?

• Python or Anaconda?

• Choosing a Python IDE for beginners.

• How to install Visual Studio Code on Windows?

• How to install Python?

• How to run Python in VS Code?

📝 Lecture 11. Scanned PDF files. Convert a pdf document to images using Python.

• How to convert scanned PDF to JPEG?

• How to Install Tesseract OCR?

• What is Tesseract?

• Google OCR in Python with Tesseract.

• Extract a page from a pdf as a jpeg

• How to convert a pdf document to images using python?

• Convert PDF to Image using Python.

• Install Poppler, Pillow (PIL) module.

📝 Lecture 12. Installing Tesseract. User-defined functions in Python

• Installing Tesseract for Windows

• Install PyTesseract OCR.

• Iterate over files in a given directory.

• How is try/except used in Python?

• Writing user-defined functions in Python
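
The try/except and user-defined function topics above might look like the sketch below. To stay self-contained it reads plain text files rather than calling an OCR engine; the file names and the function name `read_text_safely` are hypothetical.

```python
from pathlib import Path
import tempfile


def read_text_safely(path):
    """Return the file's text, or None if the file cannot be read or decoded."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as error:
        # A failed file is reported and skipped instead of stopping the whole loop.
        print(f"Skipping {path}: {error}")
        return None


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "invoice_01.txt"
    good.write_text("Total: 120.50", encoding="utf-8")
    missing = Path(tmp) / "invoice_02.txt"  # deliberately never created

    results = [read_text_safely(path) for path in (good, missing)]

print(results)
```

Catching only the specific exceptions you expect keeps unrelated bugs visible.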

📝 Lecture 13. Regular Expression in Python. Pattern matching in Python.

• What is regular expression?

• How do you match in regex?

• Online RegEx tester and debugger.

• Use Findall in Python?

• Using Regex for Text Manipulation in Python.

📝 Lecture 14. Array and Function in Python. Add data to Array.

• Add a string to an array.

• How to declare and add items to an array in Python?

• Write a function in Python.

• Save data to Pandas Dataframe.

📝 Lecture 15. GeoPy – easy to locate the coordinates. Get the latitude and longitude of location

• How do I convert address to coordinates?

• How do you geocode data?

• Locate the coordinates.

• How do I find the geocode of an address?

• Install GeoPy module.

• Install GDAL, Fiona module.

📝 Lecture 16. Kaggle. Jupyter Notebook. Plot data with matplotlib, seaborn, squarify.

• Visualize a dataset.

• Run Jupyter notebook using Kaggle.

• Python Treemaps with Squarify and Matplotlib.

• How do you create a TreeMap chart?

• How to Convert Strings to Floats in Pandas DataFrame.

• Replacing strings with numbers in Python

• Plot a DataFrame with matplotlib and seaborn.
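
Before plotting, text columns usually have to become numbers. The sketch below shows the idea in plain Python rather than pandas; it assumes US-style amounts such as "$1,250.50", and the names `to_float` and `STATUS_CODES` as well as the sample values are hypothetical.

```python
def to_float(text):
    """Convert a string such as '$1,250.50' to a float (assumes ',' separates thousands)."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)


# Replacing category strings with numbers via a lookup table.
STATUS_CODES = {"issued": 1, "pending": 0}

raw_costs = ["$1,250.50", " 980 ", "$12,000"]
raw_status = ["issued", "pending", "issued"]

costs = [to_float(value) for value in raw_costs]
status = [STATUS_CODES[value] for value in raw_status]

print(costs, status)
```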

📝 Lecture 17. Folium. Mapping in Python. Plot Geographic Data on a Map.

• Plot Geographic Data on a Map.

• How to use folium with Jupyter notebook?

• Placing coordinates on a map.

• How to plot data on maps in Jupyter.

• Efficiently display a map with CircleMarker().

• Mapping in Python with geopandas.

• Black & White map with Folium.

📝 Lecture 18. GitHub. Desktop GitHub. Store and manage code

• What GitHub is and how to use it.

• Upload files to GitHub.

• Install GitHub Desktop.

• Sync with a remote Git repository.

• Adding a repository from your local computer to GitHub.

📝 Lecture 19. Big Data Storage. Three ways to store digital data.

What is big data?
What storage options do we have today?
Public Cloud and Private Cloud.
Distribute your data

📝 Lecture 20. MySQL. SQL. Introduction. How does it work?

What is MySQL?
How Does MySQL Work?
Why is MySQL so Popular?

📝 Lecture 21. Installing and Launching MySQL Workbench. How to Get Started with MySQL Workbench

MySQL server setup
Initial settings
Getting Started with MySQL

📝 Lecture 22. Practice. Excel table into MySQL. Import Excel data into a MySQL database.

Import Excel data into a MySQL
Create a new MySQL table.
Most Common Queries.
SELECT, DROP, UPDATE queries in MySQL
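
The common queries listed above can be tried without a running server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made only to keep the example self-contained; the `projects` table and its rows are invented. The SQL statements shown work the same way in MySQL, although placeholder syntax differs between database drivers.

```python
import sqlite3

# In-memory SQLite database used here as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table and load a few rows (hypothetical data).
cur.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, cost REAL)")
cur.executemany(
    "INSERT INTO projects (name, cost) VALUES (?, ?)",
    [("Project A", 120000.0), ("Project B", 95000.0)],
)

# UPDATE: change one value.
cur.execute("UPDATE projects SET cost = ? WHERE name = ?", (99000.0, "Project B"))

# SELECT: read the rows back, most expensive first.
cur.execute("SELECT name, cost FROM projects ORDER BY cost DESC")
rows = cur.fetchall()

# DROP: remove the table entirely, then list the tables that are left.
cur.execute("DROP TABLE projects")
remaining = cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

conn.close()
print(rows, remaining)
```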

📝 Lecture 23. Spark. Hadoop. Data’s Distribution. A Storage System for Big Data.

What is Hadoop?
Spark vs MySQL
Spark. Analytics engine for big data processing

📝 Lecture 24. Installing and Launching Apache Spark. Download and Get Started.

Installing Apache Spark
Updating the PATH environment variable
Getting Started with Spark
Launching Apache Spark

📝 Lecture 25. Practice. Connecting Python To The Spark. Get Started with PySpark and Jupyter Notebook

Installing Anaconda On Windows
Running the Jupyter Notebook
Connecting Jupyter notebook to Spark

📝 Lecture 26. Practice. Connecting MySQL with Spark. Export Data from MySQL to Spark.

Connecting Jupyter notebook to Spark
How to set up PySpark for your Jupyter Notebook
Export Data from MySQL to Spark
Importing Spark Dataframes from MySQL on Jupyter notebooks

📝 Lecture 27. Data Visualization Tools. Power BI, Tableau, Google Data Studio, Jupyter.

What is Business Intelligence?
Data Visualization Tools
What is BI?
Jupyter Notebooks as a Custom Calculation Engine
Machine Learning Visualizations made in Python

📝 Lecture 28. Python Data Visualizations. Prepare Data for Visualizations. (Part 1/3)

Export data from Excel to Python
Uploading data to Visualizations on Kaggle
Introduction to Jupyter Notebooks
Prepare data for Visualizations

📝 Lecture 29. Python Data Visualizations. Clean data for Visualizations (Part 2/3).

Clean data for Visualizations
Use Pandas in Jupyter Notebook
Data Cleaning With Pandas

📝 Lecture 30. Python Data Visualizations. Data Visualizations in Jupyter Notebook (Part 3/3).

Visualization with Seaborn and Matplotlib
Data visualization by Heatmaps and Scatter plots
Python Treemaps with Squarify
Three-Dimensional Plotting in Matplotlib

📝 Lecture 31. Practice. Power BI. Introduction and getting started.

Pros and Cons of Power BI
Import an Excel file into Power BI
How to Get Started
Treemaps in Power BI
Creating Reports in Power BI

📝 Lecture 32. Practice. Tableau. Introduction and getting started.

Pros and Cons of Tableau
Import an Excel file into Tableau
How to Get Started in Tableau
Treemaps in Tableau
Creating Reports in Tableau
Creating Dashboards in Tableau

📝 Lecture 33. Practice. Google Data Studio. Introduction and getting started.

Pros and Cons of Google Data Studio
Import an Excel file into Google Data Studio
How to Get Started in Google Data Studio
Treemaps in Google Data Studio
Creating Reports in Google Data Studio
Creating Dashboards in Google Data Studio

📝 Lecture 34. What is machine learning? Key ML Terminology.

What is machine learning?
Key ML Terminology
Supervised Machine Learning
Unsupervised Machine Learning
Reinforcement Learning

📝 Lecture 35. Practice. Predict the price of houses. Dataset 1. Beginner’s Guide to Jupyter

Jupyter Notebooks for Data Science
Introduction to Kaggle for Beginners in Machine Learning
Supervised learning: predicting an output
Predict the price of a house

📝 Lecture 36. How does machine learning work? Prediction of construction time and cost.

Prediction of time and cost for small training dataset
K-means unsupervised Machine Learning algorithm
Understanding K-means Clustering in Machine Learning
Overview of Machine Learning Algorithms
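
To make the K-means idea concrete, here is a small one-dimensional sketch in plain Python. It is not the course's own code: the function name `k_means_1d`, the sample durations and the starting centroids are all assumptions. K-means groups unlabelled values by repeatedly assigning each value to its nearest centroid and then moving each centroid to the mean of its group.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (1-D K-means)."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value into the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical construction durations in weeks: short and long projects.
durations = [10, 12, 11, 40, 42, 41]
centroids, clusters = k_means_1d(durations, centroids=[10, 40])

print(centroids, clusters)
```

No labels are used anywhere, which is what makes the method unsupervised.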

📝 Lecture 37. Practice. Prediction of price and time. Data upload and preparation (Part 1/2)

Getting started with Machine Learning in MS Excel
A Kaggle Walkthrough – Cleaning Data
Beginner’s Guide to Jupyter Notebooks
Train, Validation Sets in Machine Learning
Splitting data into Training & Validation
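
Splitting data into training and validation sets can be sketched with the standard library alone. The function name `train_validation_split`, the 25% validation share, the fixed seed and the sample rows are assumptions for illustration; libraries such as scikit-learn provide ready-made equivalents.

```python
import random


def train_validation_split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the rows and split them into (training, validation) lists."""
    rows = list(rows)
    # A fixed seed makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(rows)
    n_validation = max(1, round(len(rows) * validation_fraction))
    return rows[n_validation:], rows[:n_validation]


# Hypothetical dataset: (floor area, construction cost) pairs.
projects = [(area, area * 1500) for area in range(100, 900, 100)]

train, validation = train_validation_split(projects)
print(len(train), len(validation))
```

The model is fitted on the training rows only, and the validation rows are kept aside to check how it behaves on data it has not seen.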

📝 Lecture 38. Practice. Prediction of price and time. Evaluation Metrics (Part 2/2)

Determining the cost and time of construction work for project X
Evaluation Metrics for Machine Learning Model
Linear Regression for Machine Learning
How our algorithm works visually
Creating and Visualizing Decision Trees
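
As an illustration of linear regression and one evaluation metric, the sketch below fits a straight line by least squares and measures the mean absolute error. The data are invented so that cost is exactly 2 × area + 10, and the names `fit_line` and `mean_absolute_error` are hypothetical; a real project would typically use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical previous projects: floor area and cost (cost = 2 * area + 10).
areas = [100, 150, 200, 250]
costs = [210, 310, 410, 510]

slope, intercept = fit_line(areas, costs)
predicted = [slope * area + intercept for area in areas]
mae = mean_absolute_error(costs, predicted)

# Predict the cost of a new "project X" with an area of 300.
cost_x = slope * 300 + intercept
print(slope, intercept, mae, cost_x)
```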

📝 Lecture 39. Workflow of a Machine Learning project. Stages of the Machine Learning Modeling

Stages of the Machine Learning Modeling Cycle
Learning Phase of Machine Learning
Inference from Model
Machine Learning Deployment Pipeline

📝 Lecture 40. Practice. Data loading and preparation for analysis (Part 1/2).

Build a Predictive Model
Training and Validation Sets: Splitting Data
Determining the “estimated cost” by parameters
Predict the “estimated cost” for arbitrary parameters
Evaluation Metrics for Machine Learning Model
Linear regression Predictive Models

📝 Lecture 41. Practice. Cost Prediction. Way to build a Predictive Model (Part 2/2).

Find Open Datasets
Loading large Datasets into Kaggle
Data visualization and analysis in Kaggle
Average postcode price on a San Francisco map
Total cost of all building permits for the postal code
Average “estimated cost” by type of housing

Topics for this course

40 Lessons, 07h 15m

Introduction

Introduction (00:03:31)

Part 1. Python. 1st Dataset. PDF files. Tika OCR. Regular Expression. Array and Function

Python. Choosing python IDE. Anaconda. Install Python. (00:05:38)

1st Dataset. PDF files. Tika OCR. Extracting content and metadata. (00:08:40)

Regular Expression in Python. Pattern matching in Python with RegEx. (00:10:33)

Array and Function in Python. Add data to Array. Create function. (00:09:48)

Python & Regular Expressions

Part 1. Pandas DataFrame. Kaggle. Jupyter Notebook.

Pandas DataFrame. Two-dimensional size-mutable, tabular data structure. (00:09:15)

Kaggle. Jupyter Notebook. Create an account. Plotting with matplotlib and seaborn. (00:11:35)

Pandas & Kaggle

Part 1. Independent Work Tasks. 2nd Dataset.

2nd Dataset. Task. Data from PDF. Getting data from PDF drawings. (00:01:52)

2nd Dataset. My solution. (00:05:39)

Part 1. GitHub. Desktop GitHub

GitHub. Desktop GitHub. Store and manage code (00:07:03)

Part 2. Python. PyTesseract OCR. Regular Expression. Array and Function.

Scanned PDF files. Convert a pdf document to images using Python. (00:10:49)

Installing Tesseract. User-defined functions in Python (00:09:04)

Python & Regular Expressions

Part 2. RegEx. Regular Expression in Python.

Regular Expression in Python. Pattern matching in Python. (00:09:02)

Array and Function in Python. Add data to Array. (00:13:21)

GeoPy – easy to locate the coordinates. Get the latitude and longitude of location (00:08:52)

Part 2. Kaggle. Jupyter Notebook.

Kaggle. Jupyter Notebook. Plot data with matplotlib, seaborn, squarify. (00:10:58)

Folium. Mapping in Python. Plot Geographic Data on a Map. (00:08:56)

Part 3. Big Data Storage and MySQL.

Big Data Storage. Three ways to store digital data. (00:05:23)

MySQL. SQL. Introduction. How does it work? (00:03:45)

Part 3. Practice. Export Excel worksheet data to a MySQL table

Installing and Launching MySQL Workbench. How to Get Started with MySQL Workbench (00:03:29)

Practice. Excel table into MySQL. Import Excel data into a MySQL database. (00:08:16)

MySQL Quiz

Part 3. A Storage System for Big Data. Hadoop.

Spark. Hadoop. Data’s Distribution. A Storage System for Big Data. (00:03:41)

Part 3. Practice. How Apache Spark makes your slow MySQL queries 10x faster.

Installing and Launching Apache Spark. Download and Get Started. (00:05:07)

Practice. Connecting Python To The Spark. Get Started with PySpark and Jupyter Notebook (00:07:11)

Practice. Connecting MySQL with Spark. Export Data from MySQL to Spark. (00:09:33)

Spark and Hadoop Quiz

Part 4. Data Visualization Tools. Introduction

Data Visualization Tools. Power BI, Tableau, Google Data Studio, Jupyter. (00:04:57)

Part 4. Practice. Data Visualization with Python. Kaggle and Jupyter Notebook.

Python Data Visualizations. Prepare Data for Visualizations. (Part 1/3) (00:11:10)

Python Data Visualizations. Clean data for Visualizations (Part 2/3). (00:04:57)

Python Data Visualizations. Data Visualizations in Jupyter Notebook (Part 3/3). (00:11:19)

Plotting in Python Quiz

Part 4. Online Data Visualization Tools. Introduction and getting started.

Practice. Power BI. Introduction and getting started. (00:07:00)

Practice. Tableau. Introduction and getting started. (00:04:35)

Practice. Google Data Studio. Introduction and getting started. (00:06:20)

Part 5. Machine Learning. An Introduction.

What is machine learning? Key ML Terminology. (00:06:45)

Practice. Predict the price of houses. Dataset 1. Beginner’s Guide to Jupyter (00:10:24)

Part 5. Practice. How does machine learning work?

How does machine learning work? Prediction of construction time and cost. (00:07:13)

Practice. Prediction of price and time. Data upload and preparation (Part 1/2) (00:08:57)

Practice. Prediction of price and time. Evaluation Metrics (Part 2/2) (00:10:40)

Machine Learning Quiz

Part 5. Workflow of a Machine Learning project.

Workflow of a Machine Learning project. Stages of the Machine Learning Modeling (00:05:11)

Part 5. Practice. San Francisco – explore Building Permits Data. Build Predictive Model.

Practice. Data loading and preparation for analysis (Part 1/2). (00:12:32)

Practice. Cost Prediction. Way to build a Predictive Model (Part 2/2). (00:14:08)
