Add splitting methods to AIRandomPartitioner
	splitTrainTestFrom:usingTargetColumn:withProportions: uses aCollectionOfProportions to split aCollection into multiple sets. The number of sets is determined by the size of aCollectionOfProportions, for example #(0.7 0.3), which commonly represents a split between a training set and a test set.

Add AIPartitionedDataSet class to hold the split data for the train and test sets.
Hernán Morales Durand committed Dec 14, 2023
1 parent 318f5a3 commit 9340ee1
Showing 1 changed file with 20 additions and 28 deletions.
48 changes: 20 additions & 28 deletions src/AI-DataPartitioners/AIRandomPartitioner.class.st
@@ -141,46 +141,38 @@ AIRandomPartitioner >> split: aCollection withSizes: aCollectionOfSizes [
 ]

 { #category : 'api' }
-AIRandomPartitioner >> splitTrainTestFrom: aDataFrame usingTargetColumn: targetCollection withProportions: aTwoElementCollectionOfProportions seed: aNumber [
-	"Answer a <AIPartitionedDataSet>. Split the receiver's data into two sets: train and test.
-	xTrain and yTrain sets are used for training and fitting the model.
-	xTest and yTest sets are used for testing the model.
-	"
-
-	| partition |
+AIRandomPartitioner >> splitTrainTestFrom: aDataFrame usingTargetColumn: targetCollection withProportions: aTwoElementCollectionOfProportions [
+	" See comment in splitTrainTestFrom:usingTargetColumn:withProportions:seed: "

-	partition := self
-		split: aDataFrame
+	^ self
+		splitTrainTestFrom: aDataFrame
+		usingTargetColumn: targetCollection
 		withProportions: aTwoElementCollectionOfProportions
-		seed: aNumber.
-	^ AIPartitionedDataSet new
-		xTrain: (partition first columnsAllBut: targetCollection);
-		yTrain: (partition first columns: targetCollection);
-
-		xTest: (partition second columnsAllBut: targetCollection);
-		yTest: (partition second columns: targetCollection);
-		yourself
+		seed: nil
 ]

 { #category : 'api' }
-AIRandomPartitioner >> splitTrainTestFrom: aDataFrame usingTargetColumn: targetCollection withProportions: aTwoElementCollectionOfProportions shuffle: aBoolean [
-	"Answer a <AIPartitionedDataSet>. Split the receiver's data into two sets: train and test.
-	xTrain and yTrain sets are used for training and fitting the model.
-	xTest and yTest sets are used for testing the model.
+AIRandomPartitioner >> splitTrainTestFrom: aDataFrame usingTargetColumn: columnName withProportions: aTwoElementCollectionOfProportions seed: aNumber [
+	"Answer a <AIPartitionedDataSet>. Split aDataFrame into four sets:
+	First we split aDataFrame into train and test, and then each one between 'x' and 'y'.
+	columnName specifies the 'y', the name of a column in aDataFrame called the 'target variable' or 'labels'.
+	'x' is a DataFrame of your features (generally representing the independent variables of a dataset).
+	x-train and y-train sets are used for training and fitting the model.
+	x-test and y-test sets are used for testing the model.
 	"

 	| partition |

 	partition := self
 		split: aDataFrame
 		withProportions: aTwoElementCollectionOfProportions
-		shuffle: aBoolean.
+		seed: aNumber.
 	^ AIPartitionedDataSet new
-		xTrain: (partition first columnsAllBut: targetCollection);
-		yTrain: (partition first columns: targetCollection);
+		xTrain: (partition first columnsAllBut: columnName);
+		yTrain: (partition first columns: columnName);

-		xTest: (partition second columnsAllBut: targetCollection);
-		yTest: (partition second columns: targetCollection);
+		xTest: (partition second columnsAllBut: columnName);
+		yTest: (partition second columns: columnName);
 		yourself
 ]
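
A usage sketch for the seeded variant, not part of the commit. It assumes that `DataFrame class >> withRows:columnNames:` is available from the Pharo DataFrame library, that `AIRandomPartitioner new` needs no further setup, and that AIPartitionedDataSet has getters (`xTrain`, `yTrain`, `xTest`, `yTest`) matching the setters used in the diff. The rows are made-up toy data. The target is passed as a collection of column names because the method forwards it to `columns:` and `columnsAllBut:`; that choice is an assumption too.

```smalltalk
"Hypothetical usage sketch; the data and variable names are illustrative only."
| data partitioner dataSet |

"Ten toy rows: two numeric feature columns and one label column."
data := DataFrame
	withRows: #(
		#(5.1 3.5 'setosa')
		#(4.9 3.0 'setosa')
		#(4.7 3.2 'setosa')
		#(7.0 3.2 'versicolor')
		#(6.4 3.2 'versicolor')
		#(6.9 3.1 'versicolor')
		#(6.3 3.3 'virginica')
		#(5.8 2.7 'virginica')
		#(7.1 3.0 'virginica')
		#(6.5 3.0 'virginica'))
	columnNames: #('sepalLength' 'sepalWidth' 'species').

"Assumption: a default-constructed partitioner is ready to use."
partitioner := AIRandomPartitioner new.

"Split 70% / 30% with a fixed seed so the partition is repeatable."
dataSet := partitioner
	splitTrainTestFrom: data
	usingTargetColumn: #('species')
	withProportions: #(0.7 0.3)
	seed: 42.

"Assumed getters: features and labels for each side of the split."
dataSet xTrain.	"training features, without 'species'"
dataSet yTrain.	"training labels, only 'species'"
dataSet xTest.	"test features"
dataSet yTest.	"test labels"
```

Calling the shorter `splitTrainTestFrom:usingTargetColumn:withProportions:` delegates to the same method with `seed: nil`, as the diff shows.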
