Description
The examples use the MNIST dataset from Keras, which is already loaded as a numpy.ndarray. I would like to load my own RGB image dataset into a Spark DataFrame instead. PySpark provides this method:
spark.read.format("image").option("dropInvalid", True).load(path)
which loads every image under the path into a DataFrame. The DataFrame has one row per image, and each row contains the binary data of the corresponding image. You can convert that binary data to RGB matrices with NumPy, but how do you store a tensor in each row and then pass the DataFrame as input to a convolutional network in Keras?
Alternatively, is there a way to avoid providing three matrices (one per RGB channel) for each image and instead provide a single long vector of pixels?
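To make the question concrete, here is a rough NumPy-only sketch of the conversion I have in mind. It assumes each row exposes the raw bytes together with the image height, width, and channel count (the `data`, `height`, `width`, and `nChannels` fields of the image schema), and it assumes the bytes are stored in BGR order, which should be verified against real data. The helper names `decode_image_bytes` and `to_flat_vector` are hypothetical and are not part of any library.

```python
import numpy as np


def decode_image_bytes(data, height, width, n_channels):
    """Turn the raw bytes of one image row into an (H, W, C) uint8 array."""
    pixels = np.frombuffer(data, dtype=np.uint8).reshape(height, width, n_channels)
    if n_channels == 3:
        # Assumption: bytes are in BGR order, so reverse the channel axis to get RGB.
        pixels = pixels[:, :, ::-1]
    return pixels


def to_flat_vector(pixels):
    """Flatten an (H, W, C) array into one 1-D float32 vector scaled to [0, 1]."""
    return pixels.astype(np.float32).ravel() / 255.0


# Tiny stand-in for one row: a 2x2 image with 3 channels, stored as BGR bytes.
raw = bytes([
    255, 0, 0,   0, 255, 0,
    0, 0, 255,   10, 20, 30,
])
rgb = decode_image_bytes(raw, height=2, width=2, n_channels=3)
flat = to_flat_vector(rgb)

# The flat vector can be reshaped back to (H, W, C) before the first convolution.
restored = flat.reshape(2, 2, 3)
```

If each row stored a flat vector like `flat`, the network could presumably start with a Keras `Reshape` layer that restores the (height, width, channels) shape, provided every image has the same dimensions. Whether the DataFrame-based training path accepts such a column is exactly what I am unsure about.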