00:00 Hello everyone. In this video, we will talk about the modeling and evaluation of a random forest in Python.
00:10 We will see how to implement a random forest step by step.
00:19 We will work with a heart disease dataset, where the task is to predict whether a particular person has heart disease.
00:26 So, this is a classification problem.
00:28 Internally, a random forest builds an ensemble of decision tree classifiers.
00:34 To implement it, we first import the libraries: NumPy, Pandas, Matplotlib, and Seaborn.
00:44 Then we load the dataset.
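A minimal sketch of this setup; the file name heart.csv and the DataFrame name heart are assumptions based on the narration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the heart disease dataset (file name assumed)
heart = pd.read_csv("heart.csv")
heart.head()
```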
00:48 The dataset has columns such as age, and sex, where 0 is female and 1 is male.
00:55 Then chest pain type, which has four values: 0 is typical angina, 1 is atypical angina, 2 is non-anginal pain, and 3 is asymptomatic.
01:09 Then resting blood pressure, cholesterol level, and fasting blood sugar, which is 1 if the fasting blood sugar is above 120 mg/dl and 0 otherwise.
01:21 Then resting ECG, which describes the PQRS wave: 0 is normal, 1 is an ST-T wave abnormality, and 2 is left ventricular hypertrophy according to Estes' criteria.
01:35 Then the maximum heart rate achieved, along with columns such as exercise-induced angina, which describe how the heart behaves under stress.
01:52 Based on these features for a particular person, we decide the target column, where 0 means the person does not have heart disease.
02:02 CA is the number of major vessels.
02:09 Then thal refers to a blood disorder (thalassemia): 1 is normal, 2 is a fixed defect where blood flow is blocked in part of the heart, and 3 is a reversible defect where the blood flow issue can be reversed.
02:29 All of these values together make up the complete dataset, which has 14 columns in total.
02:37 The target is our final output column; this is what we predict.
02:42 Then, with heart.shape, we do the basic information checks: 303 rows and 14 columns.
02:49 Then we check for null values; all 303 rows are non-null in every column.
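These checks as a short sketch, continuing with the heart DataFrame from above:

```python
print(heart.shape)  # (303, 14): 303 rows, 14 columns
heart.info()        # every column shows 303 non-null entries
```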
02:56 Then we run describe on the complete data and visualize it.
02:59 Next come the normal, basic EDA checks.
03:02 EDA starts with univariate analysis, where we look at a single column, such as age, and plot its count values with a histplot.
03:12 Then bivariate analysis: we plot CP, that is chest pain, with respect to the target, which is 0 or 1, so we see how the target varies across the chest pain types.
03:21 Then, for each and every column, we compute and analyze the correlation.
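A sketch of these EDA steps; using age for the univariate histogram is an assumption, while cp versus target follows the narration:

```python
print(heart.describe())  # summary statistics for the complete data

sns.histplot(heart["age"])  # univariate: distribution of a single column
plt.show()

sns.countplot(x="cp", hue="target", data=heart)  # bivariate: chest pain vs target
plt.show()

sns.heatmap(heart.corr(), annot=True, cmap="coolwarm")  # column correlations
plt.show()
```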
03:26 Then we look at missing data with heart.isnull().sum(); none of the columns have missing data.
03:31 Then heart.duplicated().sum() checks for duplicate data.
03:33 Here there is duplicate data, that is, a row where every column is exactly replicated, so we remove it.
03:40 So, heart.drop_duplicates(inplace=True).
03:45 When we check again, the duplicate has been removed.
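These checks in code, a brief sketch:

```python
print(heart.isnull().sum())      # no missing values in any column
print(heart.duplicated().sum())  # count of fully replicated rows
heart.drop_duplicates(inplace=True)
print(heart.duplicated().sum())  # 0 after removal
```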
03:50 Next, let's go to the random forest classifier implementation.
03:59 From sklearn.ensemble, we import RandomForestClassifier.
04:03 If this were a regression problem, we would use RandomForestRegressor instead.
04:07 First, we create a random forest classifier model, then we fit it on our training data and predict on our testing data.
04:15 Then, for the final accuracy, we compare y_test and y_pred: we get 86% accuracy.
04:22 Then we check the classification report, which gives the precision, recall, F1 score, and support for each class; here the scores are around 87%.
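A sketch of the baseline model; the 80/20 split and random_state are assumptions, since the narration does not give them:

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

X = heart.drop("target", axis=1)
y = heart["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier()
rf.fit(X_train, y_train)     # fit on the training data
y_pred = rf.predict(X_test)  # predict on the testing data

print(accuracy_score(y_test, y_pred))         # about 0.86 in the video
print(classification_report(y_test, y_pred))  # precision, recall, F1, support
```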
04:33 In the report, class 0 is the person not having heart disease, and class 1 is the person having heart disease.
04:43 If we compare the two, the model learns the class of people having heart disease better than the class not having it.
04:52 So we have 86% accuracy, and we can still improve on it.
04:57 For that, we use hyperparameter tuning, where we search for the best parameters to provide to the model.
05:08 The tools for this are GridSearchCV and RandomizedSearchCV, which use cross-validation.
05:14 How do we do cross-validation? We split the entire data into, say, 4 folds.
05:24 The first time, the first block is the testing data and the remaining blocks are the training data, so we train on 75% and test on 25%.
05:36 The second time, the second block becomes the testing data and the rest becomes the training data, and so on for each fold.
05:46 So every run has a separate training and testing split. Then, for each combination of parameters we provide, the combination that gives the higher score is selected.
05:57 That is what GridSearchCV does: it tries the different combinations and finds the best parameters.
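To make the idea concrete, here is a small cross-validation sketch on its own, reusing X and y from the earlier split; the 4 folds mirror the example in the narration:

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

scores = cross_val_score(RandomForestClassifier(), X, y, cv=4)
print(scores)         # one score per fold, each with a different test block
print(scores.mean())  # average performance across the 4 folds
```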
06:07 In a random forest, what parameters do we consider? First, n_estimators, the number of decision trees; the default is 100, and we provide 100, 200, and 300.
06:14 Then max_features, where the options are auto, sqrt, and log2.
06:21 Then the depth of the trees, max_depth: 10, 20, 30, and None; we include None in the list, since sometimes we want no maximum depth.
06:29 Then min_samples_split, the minimum number of samples needed to split a node, for example 5 to 10.
06:43 Then, for the final node, the leaf node, the minimum samples allowed in a leaf is controlled by min_samples_leaf.
06:54 With all of these, we create a dictionary called random_grid: n_estimators 100, 200, 300, and so on for the other parameters.
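The grid as described; the value lists for min_samples_split and min_samples_leaf are assumptions, chosen so the total matches the 144 candidates mentioned later:

```python
random_grid = {
    "n_estimators": [100, 200, 300],           # number of decision trees
    "max_features": ["auto", "sqrt", "log2"],  # "auto" is removed in newer scikit-learn
    "max_depth": [10, 20, 30, None],           # None means no maximum depth
    "min_samples_split": [5, 10],              # assumed values
    "min_samples_leaf": [2, 4],                # assumed values
}
# 3 * 3 * 4 * 2 * 2 = 144 parameter combinations
```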
07:07 Next, we create a random forest classifier model, rf1, and then we create the grid search.
07:11 The first argument we provide is the model, rf1.
07:18 Then scoring equal to "f1": instead of plain accuracy, the candidates are compared on the F1 score, which is based on precision and recall.
07:33 Then param_grid is random_grid, that is, all the combinations we want to try.
07:40 Then cv equal to 3, so the data is divided into 3 folds.
07:43 Then verbose: verbose equal to 2 prints progress lines such as "Fitting 3 folds for each of 144 candidates".
07:52 Then n_jobs, which controls how many processors the grid search uses in parallel.
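The grid search setup as narrated; n_jobs=-1 (use all processors) is an assumption:

```python
from sklearn.model_selection import GridSearchCV

rf1 = RandomForestClassifier()
grid_search = GridSearchCV(
    estimator=rf1,           # the model
    param_grid=random_grid,  # all combinations to try
    scoring="f1",            # compare candidates on F1 score
    cv=3,                    # 3-fold cross-validation
    verbose=2,               # print progress for each fit
    n_jobs=-1,               # assumed: run fits in parallel
)
```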
08:00 Now, what happens is: n_estimators 100 with auto features and depth 10; then 100 with sqrt and 10; then 100 with log2 and 10.
08:20 Then 100 with auto and 20; 100 with sqrt and 20; and so on, until every possible combination has been tried.
08:30 Based on the F1 score, we decide between them: the entire data is divided into three folds, each parameter combination is scored, and from that we get the best parameters.
08:48 So here we train the GridSearchCV: we provide it the model and fit it on the training data.
08:56 Then we extract the best parameters.
08:59 How do we extract them? The grid search model we created has an inbuilt attribute, best_params_, and with that keyword we extract them.
09:07 It shows the best parameters: which max_depth to take, which max_features, which min_samples_split, which min_samples_leaf, and so on.
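Fitting the grid search and extracting the winning combination:

```python
grid_search.fit(X_train, y_train)
# verbose=2 prints: Fitting 3 folds for each of 144 candidates, totalling 432 fits

best_params = grid_search.best_params_  # inbuilt attribute with the best combination
print(best_params)
# e.g. {'max_depth': 10, 'max_features': 'auto', 'min_samples_leaf': 4, ...}
```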
09:20 Based on these, we create a new random forest classifier, and here we provide the best parameters: max_depth 10, max_features auto, min_samples_leaf 4, and so on.
09:33 For this, we use the double-star unpacking: RandomForestClassifier(**best_params). The best parameters we saved are unpacked into the classifier as keyword arguments.
09:48 With the classifier created this way, we fit it on the training data, then predict on the testing data.
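The final tuned model via ** unpacking, as a sketch:

```python
rf_tuned = RandomForestClassifier(**best_params)  # unpack dict as keyword arguments
rf_tuned.fit(X_train, y_train)
y_pred_tuned = rf_tuned.predict(X_test)
print(accuracy_score(y_test, y_pred_tuned))  # about 0.88 in the video
```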
09:57 Okay. We predict and find the accuracy: 88%, compared to the normal 86% we had before tuning.
10:24 So the tuned model is the better predictor.
10:25 So, when the random forest overfits, we can apply hyperparameter tuning like this to improve it.
10:34 That is the modeling and evaluation of a random forest in Python.