Task 1 Bag of Words and Simple Features
1.1 Create a baseline model for predicting wine quality using only non-text features.
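A minimal sketch of such a baseline, assuming a DataFrame with a numeric column (`price`), a categorical column (`country`), and a quality target (`points`); the column names and the tiny synthetic data are illustrative stand-ins for the real wine dataset:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny synthetic stand-in for the real wine data.
df = pd.DataFrame({
    "price": [12.0, 30.0, 8.0, 55.0, 20.0, 15.0, 40.0, 10.0],
    "country": ["FR", "US", "ES", "FR", "US", "ES", "FR", "US"],
    "points": [85, 90, 82, 93, 88, 84, 91, 83],
})

# Scale numeric features, one-hot encode categoricals, then fit a
# regularized linear regressor on the quality score.
preprocess = make_column_transformer(
    (StandardScaler(), ["price"]),
    (OneHotEncoder(handle_unknown="ignore"), ["country"]),
)
baseline = make_pipeline(preprocess, Ridge())
scores = cross_val_score(baseline, df[["price", "country"]], df["points"], cv=2)
print(scores.mean())
```

Treating quality as a regression target is one option; binarizing it into good/bad and using a classifier is equally defensible, as long as the same framing is kept for all later models.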
1.2 Create a simple text-based model using a bag-of-words approach and a linear model.
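A sketch of the text-only model, with toy review snippets and a binary good/bad label standing in for the real data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative stand-ins for the review texts and a binary quality label.
texts = [
    "crisp apple and citrus, bright acidity",
    "flabby and dull, short finish",
    "rich dark cherry with firm tannins",
    "thin and watery, bitter aftertaste",
]
labels = [1, 0, 1, 0]  # 1 = good, 0 = not

# CountVectorizer builds the bag-of-words matrix; a linear classifier
# sits on top.
bow_model = make_pipeline(CountVectorizer(), LogisticRegression())
bow_model.fit(texts, labels)
print(bow_model.predict(["bright citrus acidity"]))
```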
1.3 Tune the BoW model: try word n-grams, character n-grams, tf-idf rescaling, and possibly other variants. Be aware that you might need to adjust the regularization of the linear model for different feature sets.
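One way to organize this tuning is a single grid search over both the vectorizer settings and the regularization strength, since the best C shifts with the feature set. The toy data and grid values below are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = [
    "crisp apple and citrus, bright acidity",
    "flabby and dull, short finish",
    "rich dark cherry with firm tannins",
    "thin and watery, bitter aftertaste",
    "elegant floral nose, long mineral finish",
    "harsh and sour, unbalanced oak",
    "juicy plum and spice, silky texture",
    "musty cork taint, flat palate",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

pipe = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Vary the analyzer (word vs. character n-grams), the n-gram range,
# and the regularization C jointly.
param_grid = {
    "vect__analyzer": ["word", "char_wb"],
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)
```

Setting `use_idf=False` in `TfidfVectorizer` (or swapping in `CountVectorizer`) lets the same grid also compare raw counts against tf-idf rescaling.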
1.4 Combine the non-text features and the text features. How much does adding the non-text features improve on the bag-of-words model alone?
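A `ColumnTransformer` is one way to stitch the two feature sets together: the text column gets a vectorizer (note the scalar column name, not a list), the numeric columns get scaling, and both blocks are concatenated for the downstream linear model. Column names here are illustrative:

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in combining a text column with a numeric one.
df = pd.DataFrame({
    "description": [
        "crisp apple and citrus, bright acidity",
        "flabby and dull, short finish",
        "rich dark cherry with firm tannins",
        "thin and watery, bitter aftertaste",
    ],
    "price": [14.0, 9.0, 32.0, 7.0],
    "good": [1, 0, 1, 0],
})

preprocess = make_column_transformer(
    (CountVectorizer(), "description"),  # vectorizers take a single column
    (StandardScaler(), ["price"]),
)
combined = make_pipeline(preprocess, LogisticRegression(max_iter=1000))
combined.fit(df[["description", "price"]], df["good"])
print(combined.predict(df[["description", "price"]]))
```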
Task 2 Word Vectors
Use a pretrained word embedding (word2vec, GloVe, or fastText) to featurize the text instead of the bag-of-words model. Does this improve classification? What about combining the embedded words with the BoW features?
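A common featurization is to average the per-word vectors of each review. A real run would load pretrained vectors (for example via gensim's downloader API); the tiny 3-dimensional embedding table below is a made-up stand-in so the sketch stays self-contained:

```python
import numpy as np

# Toy embedding table; in practice this would come from a pretrained
# word2vec/GloVe/fastText model.
embeddings = {
    "crisp":  np.array([0.9, 0.1, 0.0]),
    "citrus": np.array([0.8, 0.2, 0.1]),
    "dull":   np.array([-0.7, 0.0, 0.3]),
    "watery": np.array([-0.8, 0.1, 0.2]),
}
dim = 3

def embed(text):
    """Mean of the known word vectors; zeros if no word is in the table."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X = np.vstack([embed(t) for t in ["crisp citrus wine", "dull watery finish"]])
print(X.shape)
```

Each review becomes one dense `dim`-dimensional row, which can be fed to the same linear model as before, or horizontally stacked next to the sparse BoW matrix to combine the two representations.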
Task 3 Transformers (bonus / optional)
Fine-tune a BERT model on the text data alone using the transformers library.
How does this model compare to a BoW model, and how does it compare to a model using all features?
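A sketch of the fine-tuning step using the transformers `Trainer` API, written as a function definition only, since actually running it downloads a pretrained checkpoint; model name, hyperparameters, and the dataset wrapper are illustrative:

```python
def fine_tune_bert(train_texts, train_labels, num_labels=2):
    """Fine-tune bert-base-uncased for sequence classification (sketch)."""
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels)

    # Tokenize all texts up front; pad/truncate to a common length.
    enc = tokenizer(train_texts, truncation=True, padding=True)

    class ReviewDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(train_labels)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in enc.items()}
            item["labels"] = torch.tensor(train_labels[i])
            return item

    args = TrainingArguments(output_dir="bert_wine",
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args, train_dataset=ReviewDataset())
    trainer.train()
    return trainer
```

For the comparison against the all-features model, one simple route is to treat the fine-tuned model's predictions (or its pooled hidden states) as an additional feature block next to the BoW and non-text features.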