Explore the powerful world of Random Forest (RF) modelling and evaluation in Python—now in Tamil!

Start learning machine learning in your own language, Tamil, today!

Our Website:
Visit 🔗 http://www.skillfloor.com

Our Blogs:
Visit 🔗 https://skillfloor.com/blog/

DEVELOPMENT TRAINING IN CHENNAI
https://skillfloor.com/development-training-in-chennai

DEVELOPMENT TRAINING IN COIMBATORE
https://skillfloor.com/development-training-in-coimbatore

Our Development Courses:
Certified Python Developer
Visit 🔗 https://skillfloor.com/certified-python-developer
Certified Database Developer
Visit 🔗 https://skillfloor.com/certified-data-base-developer
Certified Android App Developer
Visit 🔗 https://skillfloor.com/certified-android-app-developer
Certified iOS App Developer
Visit 🔗 https://skillfloor.com/certified-ios-app-developer
Certified Flutter Developer
Visit 🔗 https://skillfloor.com/certified-flutter-developer
Certified Full Stack Developer
Visit 🔗 https://skillfloor.com/certified-full-stack-developer
Certified Front End Developer
Visit 🔗 https://skillfloor.com/certified-front-end-developer

Our Classroom Locations:
Bangalore - https://maps.app.goo.gl/ZKTSJNCKTihQqfgx6
Chennai - https://maps.app.goo.gl/36gvPAnwqVWWoWD47
Coimbatore - https://maps.app.goo.gl/BvEpAWtdbDUuTf1G6
Hyderabad - https://maps.app.goo.gl/NyPwrN35b3EoUDHCA
Ahmedabad - https://maps.app.goo.gl/uSizg8qngBMyLhC76
Pune - https://maps.app.goo.gl/JbGVtDgNQA7hpJYj9

Our Additional Courses:
Analytics Course
https://skillfloor.com/analytics-courses
https://skillfloor.com/analytics-training-in-bangalore
Artificial Intelligence Course
https://skillfloor.com/artificial-intelligence-courses
https://skillfloor.com/artificial-intelligence-training-in-bangalore
Data Science Course
https://skillfloor.com/data-science-courses
https://skillfloor.com/data-science-course-in-bangalore
Digital Marketing
https://skillfloor.com/digital-marketing-courses
https://skillfloor.com/digital-marketing-courses-in-bangalore
Ethical Hacking
https://skillfloor.com/ethical-hacking-courses
https://skillfloor.com/cyber-security-training-in-bangalore

#randomforest #machinelearning #pythontutorial #tamilcoding #skillfloor #datascience #pythonintamil #ensemblelearning #mlalgorithm #decisiontree #datasciencetamil #machinelearningtamil #pythonprogramming #mlmodels #techeducation #codingintamil #dataanalysis #modelassessment #rfmodelling #pythoncourse
Transcript
00:00 Hello everyone. In this video, we will talk about the modeling and evaluation of a random forest in Python.
00:10 We will see how to implement a random forest step by step.
00:19 We will consider a heart disease data set.
00:22 We will predict whether a particular person has heart disease.
00:26 So, this is a classification problem.
00:28 In the back end, a random forest creates an ensemble of decision tree classifiers.
00:34 To implement it, we first load the data set.
00:37 So first, we import the libraries: NumPy, Pandas, Matplotlib, and Seaborn.
00:44 Then, we load the data set.
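As a minimal sketch of this setup (the file name heart.csv and the DataFrame name heart are assumptions; the video does not show the exact names):

```python
# Imports described in the video, plus loading the heart disease
# data set. The file name "heart.csv" is an assumption.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

heart = pd.read_csv("heart.csv")  # heart disease data set
print(heart.head())               # peek at the first few rows
```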
00:48 For example, the columns include age.
00:50 Then, sex, female and male,
00:52 encoded as 0 for female and 1 for male.
00:55 Then, chest pain type.
00:56 There are 4 different types:
00:58 0, 1, 2, 3.
00:59 0 is typical angina,
01:00 1 is atypical angina,
01:02 2 is non-anginal pain,
01:04 and 3 is asymptomatic.
01:06 So, in this case, we have chest pain type.
01:09 Then, the resting
01:10 blood pressure.
01:11 Then, the cholesterol level.
01:12 Then, the fasting blood sugar rate:
01:16 if it is above 120 mg/dl, the value is 1;
01:17 if it is lower, the value is 0.
01:21 Then, resting ECG.
01:22 So, an ECG
01:23 records the heart's PQRS wave.
01:28 Here, 0 means the wave is normal,
01:29 1 means it shows an abnormality,
01:31 and 2 means left ventricular hypertrophy according to Estes' criteria.
01:35 Then, there is the maximum heart rate achieved.
01:42 Then, whether exercise puts the person under stress, that is, exercise-induced angina;
01:46 such columns are given coded values.
01:49 Each of these is a separate column.
01:52 Based on them, for that particular person, we decide the target column:
01:58 a target of 0 means that there is no heart disease, and 1 means there is.
02:02 Then, when we say ca, we mean the number of major vessels.
02:09 Then, thal is a blood disorder, thalassemia.
02:12 So, 1 is normal,
02:15 2 is a fixed defect in the blood flow to part of the heart,
02:18 and 3 is a reversible defect,
02:20 where the issue with the blood flow can be reversed.
02:26 So, that covers each part of the data.
02:29 All of these values together make up the complete data set.
02:33 There are actually 14 columns,
02:37 and target is our final output column.
02:40 This is what we predict.
02:42 Then, heart.shape, for the basic information checks:
02:46 303 rows, 14 columns.
02:49 Then, we check for null values in the data set.
02:51 We actually have 303 rows,
02:53 and all of these values are non-null values.
02:55 Okay?
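The basic checks just described, using the heart DataFrame from the sketch above:

```python
print(heart.shape)  # (303, 14): 303 rows, 14 columns
heart.info()        # confirms every column has 303 non-null values
```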
02:56 Then, we run describe on the complete data and visualize it.
02:59 Then, the normal, basic EDA checks.
03:02 So, the EDA starts with univariate analysis:
03:05 univariate analysis looks at one column of the heart data at a time.
03:08 We look at its count values
03:11 with a histplot.
03:12 Then, bivariate analysis: cp,
03:15 that is, chest pain, plotted with respect to the target.
03:18 The target is 0 and 1,
03:19 split based on the chest pain types.
03:21 Then, across each and every column,
03:24 we analyze the correlations.
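A sketch of these EDA steps. The choice of the age column for the univariate plot is an assumption; the video only says each column is plotted:

```python
print(heart.describe())  # summary statistics for every column

sns.histplot(heart["age"])  # univariate: count values of one column
plt.show()

sns.countplot(x="cp", hue="target", data=heart)  # bivariate: chest pain vs. target
plt.show()

sns.heatmap(heart.corr(), annot=True)  # correlation between all columns
plt.show()
```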
03:26 Then, here we look at missing data.
03:28 We check whether any column has missing data;
03:30 in this data set, none of the columns do.
03:31 Then, heart.duplicated().sum().
03:33 This finds duplicate data:
03:36 rows where every one of the columns
03:37 is exactly replicated.
03:38 There is one such row, so we remove it:
03:40 heart.drop_duplicates(),
03:43 with inplace equal to True.
03:45 So, here we check again,
03:48 and the duplicate is removed.
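A sketch of those checks and the removal:

```python
print(heart.isnull().sum())      # missing values per column (all zero here)
print(heart.duplicated().sum())  # 1: one row is an exact replicate

heart.drop_duplicates(inplace=True)  # remove the duplicate row in place
print(heart.duplicated().sum())      # confirm the count is now 0
```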
03:50 Next, we move to the random forest classifier.
03:55 Let's go to the random forest classifier implementation.
03:59 So, from sklearn.ensemble, we import RandomForestClassifier.
04:03 If this were a regression problem,
04:05 we would use RandomForestRegressor.
04:07 So first, we create a random forest classifier model.
04:10 Then, we fit it on our training data
04:13 and predict on our testing data.
04:15 Then, finally, for the accuracy,
04:16 we calculate the score from y test and the y predictions.
04:19 We have 86% accuracy.
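A sketch of this baseline model. The train/test split (25% test, random_state=42) is an assumption; the video does not show how the data was split:

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X = heart.drop("target", axis=1)  # the 13 feature columns
y = heart["target"]               # 0 = no heart disease, 1 = heart disease

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42  # assumed split
)

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)     # fit on the training data
y_pred = rf.predict(X_test)  # predict on the testing data

print(accuracy_score(y_test, y_pred))  # the video reports about 86%
```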
04:22 Then, we check the classification report.
04:25 In the classification report,
04:27 we check precision, recall, F1 score, and support
04:30 for each class.
04:31 So here, we have scores around 87%.
04:33 Now, class 0 is a person not having heart disease,
04:40 and class 1 is a person having heart disease.
04:43 If we compare the two,
04:45 the model learns the people having heart disease
04:47 a little better
04:49 than the people not having heart disease.
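The classification report, continuing the sketch above:

```python
from sklearn.metrics import classification_report

# Precision, recall, F1-score, and support for class 0
# (no heart disease) and class 1 (heart disease).
print(classification_report(y_test, y_pred))
```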
04:51 Next.
04:52 So, we have 86% accuracy.
04:54 Can it be improved?
04:56 Yes, it can still be improved.
04:57 What we use is hyperparameter tuning:
05:01 we optimize the model through hyperparameter tuning.
05:03 Hyperparameter tuning is basically finding
05:06 the best parameters to provide.
05:08 For hyperparameter tuning we use
05:11 GridSearchCV or RandomizedSearchCV.
05:13 The CV stands for cross-validation.
05:14 How do we do cross-validation?
05:16 Like this:
05:17 we take the entire data
05:19 and divide it into, say, 4 folds.
05:22 Now, we split it.
05:24 The first time, we consider
05:26 one block as the
05:28 testing data,
05:30 and the rest as training data.
05:32 After that,
05:34 we run with a 75% to 25% split.
05:36 The second time, we run with
05:37 a different block as the
05:38 testing data
05:39 and the others as
05:40 training data, and so on for each fold.
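For intuition, here is that fold idea as a small sketch with scikit-learn's KFold, reusing X from the earlier sketch. GridSearchCV performs this splitting internally through its cv parameter, so this loop is illustrative only:

```python
from sklearn.model_selection import KFold

kf = KFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # each run trains on 3 blocks (75%) and tests on 1 block (25%)
    print(f"fold {fold}: {len(train_idx)} training rows, {len(test_idx)} testing rows")
```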
05:41 So, now each run has its own separate training data. Then, for the combinations of parameters we
05:50 provide, we find the set where we get higher accuracy, and that particular set is what we keep. So, this is
05:57 GridSearchCV: we give it different combinations and it finds the best parameters. So, in random
06:07 forest, what parameters do we consider? n_estimators, that is, the number of decision trees. The
06:14 number of decision trees is basically 100 by default, and we provide 100, 200, 300. Then, max features:
06:21 the feature options are auto, square root, and log2. Then, the depth of the trees: 10, 20, 30, or None. In the list
06:29 we also provide None, since sometimes we do not want to cap the maximum depth. Then, minimum sample
06:36 split: the number of samples a node needs before it is split is the minimum sample split; that is 5 to 10. Then,
06:43 the final nodes are called leaf nodes, and how small a leaf node can get
06:48 is confined by the minimum sample leaf. So, we gather all of these into
06:54 a dictionary called random_grid: n_estimators, that is, 100, 200, 300,
07:00 and so on for each parameter. So, this is the one dictionary we create.
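A sketch of that dictionary. The exact value lists for min_samples_split and min_samples_leaf are assumptions chosen so the grid has the 144 candidates mentioned below; the video only names the parameters and some of the values:

```python
random_grid = {
    "n_estimators": [100, 200, 300],         # number of decision trees
    "max_features": ["sqrt", "log2", None],  # "auto" is removed in newer scikit-learn
    "max_depth": [10, 20, 30, None],         # None leaves the depth uncapped
    "min_samples_split": [5, 10],            # samples required before a node splits
    "min_samples_leaf": [2, 4],              # minimum samples allowed in a leaf
}
# 3 * 3 * 4 * 2 * 2 = 144 parameter combinations
```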
07:07 Then, next we create a random forest classifier model. So, we create a grid search
07:11 here: the first argument we provide is the model, and the model name is rf1. So,
07:18 that is the first thing we provide. Then, the scoring
07:21 is set equal to f1. Rather than plain accuracy,
07:23 the f1 score is based on
07:28 precision and recall together,
07:30 so in the search, every model is completed and judged
07:33 on an f1 basis. Then,
07:37 param_grid is random_grid; that is, all the combinations
07:40 we want it to work through. Then, cv equal to 3:
07:43 we divide the data into 3 folds. Then, verbose: verbose equal to 2
07:48 we provide, so it prints "Fitting 3 folds for each of 144 candidates".
07:52 That line appears in the output. Then, n_jobs: n_jobs
07:56 tells the grid search how many cores to use. Now, in the output we see what is
08:00 being fitted: n_estimators as 100, so 100 trees. Then
08:06 the auto feature option, then a depth of 10. Then, 100 with
08:11 square root and 10, then 100 with log2 and 10. So,
08:20 each model is run to completion. Then the next depth: 100 with auto and 20, 100 with
08:26 square root and 20. So, all the different possible combinations
08:30 are completed and matched. Based on the f1 score,
08:34 we decide the best one. For each candidate, we
08:36 divide the entire data into
08:38 three different folds, and the particular parameters
08:41 are considered based on their f1 score,
08:43 which provides the best parameters. That is what we
08:46 get out of it. Okay. So, here
08:48 we train the grid search CV. To train it,
08:50 we call fit on the GridSearchCV object,
08:52 providing the model's training data.
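A sketch of that grid search, reusing the names from the earlier sketches:

```python
from sklearn.model_selection import GridSearchCV

rf1 = RandomForestClassifier(random_state=42)

grid_search = GridSearchCV(
    estimator=rf1,           # the model to tune
    param_grid=random_grid,  # all the combinations to try
    scoring="f1",            # judge each combination by F1 score
    cv=3,                    # 3-fold cross-validation
    verbose=2,               # prints "Fitting 3 folds for each of 144 candidates"
    n_jobs=-1,               # use all CPU cores
)

grid_search.fit(X_train, y_train)  # train the grid search on the training data
```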
08:56 Then, we extract the best parameters. So, best parameters:
08:59 how do we extract them from the model we
09:01 created? In the grid search object,
09:03 we have the best parameters inbuilt,
09:05 under a keyword, best_params_, and that is
09:07 what we extract. It will be showing the best
09:10 parameters: which
09:11 maximum depth to take, which
09:12 minimum samples split to take, which minimum
09:13 samples leaf to take,
09:15 and so on, each one separately. Okay.
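Extracting them through the built-in best_params_ attribute:

```python
best_params = grid_search.best_params_
print(best_params)  # e.g. max_depth, max_features, min_samples_leaf, ...
```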
09:20 So, based on these, we create the random forest
09:23 classifier again. So, to the random forest
09:25 classifier, we provide the best parameters:
09:27 a maximum depth of
09:29 10, max features auto, a minimum
09:31 samples leaf of 4. For this
09:33 we use the double-star operator. So, random forest
09:36 classifier of double star, and here, what we
09:39 provide is the saved best parameters. So, the double star
09:42 points at that dictionary
09:44 and unpacks it into the model's
09:46 keyword arguments.
09:47 With this double-star method, we
09:48 create the random forest classifier.
09:49 Here, again, we provide the
09:51 training data, then the testing data,
09:53 and predict.
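A sketch of that final model with the double-star unpacking just described, reusing the earlier names:

```python
# ** expands the best_params dictionary into keyword arguments.
rf_best = RandomForestClassifier(**best_params, random_state=42)
rf_best.fit(X_train, y_train)          # retrain on the training data
y_pred_best = rf_best.predict(X_test)  # predict on the testing data

print(accuracy_score(y_test, y_pred_best))  # the video reports about 88%
```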
09:57 Okay.
09:58 We predict,
09:59 then find the accuracy.
10:01 It is 88%,
10:02 where the normal model gave 86%.
10:24 So, the tuned model is the better predictor.
10:25 So, when the random forest is overfitting, we can apply hyperparameter tuning like this.
10:34 This is the modeling and evaluation of a random forest in Python.