DS-5 Visual Programming with Orange Tool

In this blog, we will look at more features of the Orange tool: how to split our data into training data and testing data, and how to use cross-validation.

Install and open the Orange tool, then add the File widget to the workspace. By default it loads the iris data set.

Next, add the Data Sampler widget. Data Sampler selects a subset of data instances from an input data set and outputs both the sampled data and a complementary data set. Here I set the sample size to 70%, so 70% of the instances go to the sampled output and the remaining 30% form the complementary data set.
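To make the 70/30 idea concrete, here is a minimal sketch of a fixed-proportion random split in plain Python. It is not Orange's own implementation; the function name `sample_split`, the seed, and the stand-in list of 150 row numbers (the size of the iris data set) are assumptions made for illustration.

```python
import random

def sample_split(instances, proportion=0.7, seed=42):
    """Return (sample, remaining): a random subset and its complement."""
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    indices = list(range(len(instances)))
    rng.shuffle(indices)                 # random order of row positions
    cut = round(len(instances) * proportion)
    sample = [instances[i] for i in indices[:cut]]
    remaining = [instances[i] for i in indices[cut:]]
    return sample, remaining

# Stand-in for the 150 iris instances (row numbers only, no measurements).
rows = list(range(150))
sample, remaining = sample_split(rows)
print(len(sample), len(remaining))  # 105 45
```

Every instance ends up in exactly one of the two outputs, which is what lets the complementary set act as unseen test data later on.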

Now add the Test and Score widget. This widget tests learning algorithms. It offers several sampling schemes, including the use of separate test data.

Sampling using Cross-Validation in Orange

Cross-validation splits the data into a given number of folds (usually 5 or 10). The model is trained on all folds but one and tested on the held-out fold, and this is repeated until every fold has served as test data once. Cross-validation is primarily used in applied machine learning to estimate the skill of a model on unseen data. That is, it uses a limited sample to estimate how the model is expected to perform in general when making predictions on data that was not used during training.
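The fold mechanics can be sketched in a few lines of plain Python. This is only an illustration of k-fold splitting in general, not the code Orange runs; `k_fold_indices` and its parameters are hypothetical names chosen for this example.

```python
import random

def k_fold_indices(n_instances, k=5, seed=0):
    """Yield (train, test) index lists, one pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_instances))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 150 instances and 5 folds, each round trains on 120 and tests on 30.
for fold_number, (train, test) in enumerate(k_fold_indices(150, k=5), start=1):
    print(fold_number, len(train), len(test))
```

Averaging the score over the k rounds gives the single figure that is reported as the cross-validated result.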

Split data into training data and testing data in Orange

To split the data into train and test data sets, we send the 70% sample from Data Sampler as the train data and the remaining 30% as the test data. To do this, click on the link between Data Sampler and Test and Score. In the dialog that opens, connect the Data Sample box to the Data box and the Remaining Data box to the Test Data box, as shown in the figure below.

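Now get the comparison scores of the three different algorithms by testing on the train data. To do so, double-click the Test and Score widget, choose the Test on train data option, and read the scores for all three algorithms. Note that this option scores each model on the same instances it was trained on. Because the Remaining Data output is connected as Test Data, choosing Test on test data instead scores the models on the held-out 30%.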
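As a rough illustration of what comparing learners by score means, the sketch below computes classification accuracy for two toy learners on both the train rows and separate test rows. Everything here is an assumption made for the example: the two learners, the function names, and the made-up data points are not the three algorithms or the iris measurements used in the workflow.

```python
from collections import Counter

def fit_majority(train):
    """Toy learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_nearest_neighbour(train):
    """Toy learner: predict the label of the closest training point."""
    def predict(x):
        nearest = min(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return nearest[1]
    return predict

def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

# Made-up (features, label) rows, purely for illustration.
train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((1.1, 1.3), "a"),
         ((3.0, 3.1), "b"), ((2.9, 3.3), "b")]
test = [((1.1, 1.0), "a"), ((3.1, 3.0), "b")]

for name, fit in [("majority", fit_majority), ("1-NN", fit_nearest_neighbour)]:
    model = fit(train)
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The nearest-neighbour learner always scores perfectly on its own training rows, because each row is its own closest point. This is why scores measured on the train data tend to look better than scores on held-out data.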

In this blog, we saw how to sample our data and compare different learning algorithms using the Orange tool, to find out which algorithm works best for our data set.

Thank You!!