1.1) How many samples do we have for training and test? Do we have the same information for training and test data? How many samples do we have for each species?
Hint: You might want to use .shape, .columns, pd.unique() and .symmetric_difference().
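As a rough illustration of the calls named in the hint, here is a minimal sketch on two tiny made-up tables. The frames `train` and `test`, their column names, and all values are hypothetical stand-ins for the real data, and `value_counts()` is an extra pandas method not mentioned in the hint.

```python
import pandas as pd

# Toy stand-ins for the real train/test tables (all values are made up).
train = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "species": ["A", "A", "B", "C"],
    "margin1": [0.01, 0.02, 0.00, 0.03],
})
test = pd.DataFrame({
    "id": [5, 6],
    "margin1": [0.02, 0.01],
})

# .shape is (rows, columns), so the first entry is the number of samples.
n_train, n_test = train.shape[0], test.shape[0]

# Columns that appear in exactly one of the two tables.
only_in_one = train.columns.symmetric_difference(test.columns)

# Distinct labels, and the number of samples per label.
species = pd.unique(train["species"])
per_species = train["species"].value_counts()

print(n_train, n_test)
print(list(only_in_one))
print(len(species))
print(per_species)
```

The same four calls should carry over to the real tables once they are loaded, whatever their actual column names are.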
2.1) In general, it is a good idea to visualize one image per category to get a better feeling for the task. Write some code to show one image from each category. You might want to use plt.subplot().
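One possible way to lay out a grid with plt.subplot() is sketched below. The dictionary `examples` holds random arrays under hypothetical class names, purely as placeholders for one real image per category; the grid width `n_cols` is also an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: one random grayscale "image" per category.
examples = {f"class_{i}": rng.random((32, 32)) for i in range(6)}

n_cols = 3
n_rows = -(-len(examples) // n_cols)  # ceiling division

fig = plt.figure(figsize=(2 * n_cols, 2 * n_rows))
for i, (name, img) in enumerate(examples.items()):
    ax = plt.subplot(n_rows, n_cols, i + 1)  # subplot indices start at 1
    ax.imshow(img, cmap="gray")
    ax.set_title(name)
    ax.axis("off")
plt.tight_layout()
```

In a notebook the figure is normally rendered at the end of the cell. With many categories, a larger `n_cols` and `figsize` will likely be needed.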
3.1) Find an appropriate image size. Try several resizings and pick the smallest one at which you "can still differentiate between the images". What counts as too small should ultimately be settled by an actual test, but what makes sense visually is a good place to start. Set the image_size = (?, ?) parameter below, and note your choice.
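To get a feel for how detail disappears at smaller sizes, the sketch below shrinks a synthetic image by several factors. It uses plain block averaging in NumPy instead of a library resize function, only so that the example has no extra dependencies. The helper `downsample` and the disc-shaped test image are hypothetical and not part of the exercise code.

```python
import numpy as np

def downsample(img, factor):
    """Shrink a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical stand-in for a leaf silhouette: a filled disc on a 128 x 128 canvas.
yy, xx = np.mgrid[:128, :128]
image = ((yy - 64) ** 2 + (xx - 64) ** 2 < 40 ** 2).astype(float)

sizes = {}
for factor in (1, 2, 4, 8, 16):
    small = downsample(image, factor)
    sizes[factor] = small.shape
    print(factor, small.shape)
```

Plotting each `small` side by side for a few real leaves would show roughly where the outlines stop being distinguishable.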
So far we have learned about feed forward neural networks (FFNN), convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers.
4.1) How could the image, Margin, Shape and Texture be used for classification, i.e. what kind of network type would you use for each of them, and why?
5.1) As a part of the data loader, we should specify the shape of the images, number of classes, and the number of features for the three feature types: margin, shape and texture. Define the three variables in the code below.
6.1) Explain briefly why the batch size is important. Comment on how the batch size affects memory use, training speed, and the gradient estimates. Choose a reasonable batch size, set the batch_size = ? parameter below, and note your choice.
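The effect of batch size on the gradient estimate can be illustrated with a small simulation. This is a toy sketch under a strong assumption: per-sample gradients are modelled as independent draws from a normal distribution (mean 1.0, standard deviation 2.0, both made up), and the mini-batch gradient is their average. The function `batch_estimate_spread` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-sample gradients: true mean 1.0, standard deviation 2.0.
per_sample_grads = [random.gauss(1.0, 2.0) for _ in range(20_000)]

def batch_estimate_spread(grads, batch_size, n_batches=500):
    """Standard deviation of the mini-batch mean over many random batches."""
    means = [
        statistics.fmean(random.sample(grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(means)

spread = {b: batch_estimate_spread(per_sample_grads, b) for b in (1, 16, 256)}
for b, s in spread.items():
    print(f"batch_size={b:4d}  spread of gradient estimate={s:.3f}")
```

Under this assumption the spread shrinks roughly with the square root of the batch size, so larger batches give less noisy estimates at the cost of more memory per step.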