Seed problem #22

raufer · 2016-08-16T14:16:25Z

Hello

I'm trying to go through the week 3 lab, but there seems to be a problem with the proportions in which the data is partitioned into train, validation and test sets. I'm using the supplied seed along with the defined weights, and I get a different number of examples in each set than the lab expects. As a result, the tests that follow are bound to fail.

snippet:

weights = [.8, .1, .1]
seed = 42
raw_train_df, raw_validation_df, raw_test_df = raw_df.randomSplit(weights, seed)

n_train = raw_train_df.cache().count()
n_val = raw_validation_df.cache().count()
n_test = raw_test_df.cache().count()
print n_train, n_val, n_test, n_train + n_val + n_test
raw_df.show(1)

output:

80115 9955 9930 100000
+--------------------+
|                text|
+--------------------+
|0,1,1,5,0,1382,4,...|
+--------------------+
only showing top 1 row

The same thing happens in lab 2 (linear regression).
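
For reference, here is a rough sketch of how far the counts above are from an exact 80/10/10 split. It assumes each row is assigned to a split independently with probability equal to its weight; that is an assumption about how the sampling behaves, not something confirmed in this thread. The helper name `split_deviation` is made up for illustration, and the code is plain Python with no Spark dependency.

```python
import math

# Numbers copied from the snippet and output above.
weights = [.8, .1, .1]
observed = [80115, 9955, 9930]


def split_deviation(weights, observed):
    """For each split, return (expected count, observed - expected, z-score).

    ASSUMPTION: every row lands in a split independently with probability
    equal to that split's weight, so each count is binomially distributed.
    """
    total = sum(observed)
    rows = []
    for w, n in zip(weights, observed):
        expected = w * total
        std = math.sqrt(total * w * (1 - w))  # binomial standard deviation
        rows.append((expected, n - expected, (n - expected) / std))
    return rows


for name, (expected, diff, z) in zip(["train", "validation", "test"],
                                     split_deviation(weights, observed)):
    print("%-10s expected=%.0f diff=%+.0f z=%+.2f" % (name, expected, diff, z))
```

Under that assumption, all three counts come out within one standard deviation of the expected values, so the sizes themselves look like ordinary sampling noise. If the lab's tests check for exact counts, the mismatch may come from the environment (for example a different Spark version or partitioning) rather than from the weights or the seed, but I have not verified that.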
