Music Recommender System — Part 1
Creating a Recommender System using Machine Learning
Introduction
Automated recommender systems are everywhere; well-known examples include Netflix, Amazon, YouTube, and LinkedIn. In this series, let’s see how to build a recommender system from scratch using machine learning. I would like to show how to create a framework for applying different machine learning algorithms to a real-world music dataset in order to recommend playlists and songs. We will use four main approaches: content-based filtering, collaborative filtering, model-based methods, and deep neural networks.
The steps we will follow to build the recommender system are:
- Create a development environment with the necessary libraries.
- Get a real-world music dataset and explore it.
- Build and train machine learning models.
- Evaluate the effect of dataset size on the machine learning models.
- Evaluate the models using various metrics.
- Deploy the trained model for public use.
In this article, let’s get started by creating the development environment and installing all necessary libraries.
Part 1: Create Development Environment
We would like to avoid “works on my machine” situations, so we want an environment that anyone can use to run this project’s code. I have used Google Colab for other projects; it is essentially a free Jupyter notebook environment that runs entirely in the cloud. I have also used GitHub Codespaces, but for this project I would like to try Gitpod, because it provides a complete workspace: the source code, a Linux shell with root/sudo access, a file system, the full VS Code editing experience (including extensions and language support), and all the other tools and binaries that run on Linux. If you would rather set up a Codespaces development environment, click the link. Let’s see how to create and configure your first Gitpod workspace.
I assume you already have a GitHub account; if not, register for a free account here. Once you have an account, log in to GitHub in a browser and create a new repository; let’s call it “RecSys”. Once it is created, GitHub takes you to the repository’s Code page.
Next, let’s add an Open in Gitpod button (for example, to the repository’s README.md) so that anyone can start a Gitpod workspace for this project with one click. Use the Markdown snippet below, replacing <project-url> with your GitHub repository URL. Alternatively, you can install the Gitpod browser extension.
[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#<project-url>)
Example:
[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/nsanka/RecSys)
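The button is just a Markdown image link whose target is the repository URL appended to `https://gitpod.io/#`. If you create several repositories, a tiny helper can generate the snippet for you. This is a minimal sketch; `gitpod_button` is a hypothetical name, not part of any Gitpod tool, and it only does string formatting.

```python
# Hypothetical helper: builds the "Open in Gitpod" Markdown badge for a repository URL.
BADGE_IMG = "https://gitpod.io/button/open-in-gitpod.svg"


def gitpod_button(repo_url: str) -> str:
    """Return the Markdown for an Open in Gitpod button pointing at repo_url."""
    repo_url = repo_url.rstrip("/")  # drop a trailing slash, if any
    # Gitpod opens a workspace for whatever repository URL follows "https://gitpod.io/#"
    return f"[![Open in Gitpod]({BADGE_IMG})](https://gitpod.io/#{repo_url})"


if __name__ == "__main__":
    print(gitpod_button("https://github.com/nsanka/RecSys"))
```

Running it prints the same Markdown as the example above, ready to paste into the README.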
Project Specific Customization
In your repository’s main folder, click Add file -> Create new file and enter “.gitpod.yml” in the file name box. Gitpod reads this file to customize the workspace environment for anyone who opens this project. More details on configuring Gitpod workspaces can be found here.
# Custom Docker Image
#image:
#  file: .gitpod.Dockerfile

# List the start up tasks.
# Learn more https://www.gitpod.io/docs/config-start-tasks/
tasks:
  - name: Check Setup
    init: |
      python -m pip install --upgrade pip
      # Add commands to Setup Python Environment
    command: |
      clear
      echo "=============="
      echo " Welcome "
      echo "=============="
      pyenv versions
      echo ""

# List the ports to expose.
# Learn more https://www.gitpod.io/docs/config-ports/
ports:
  # jupyter
  - port: 8888
    onOpen: ignore

# Install VSCode Extensions
vscode:
  extensions:
    - ms-azuretools.vscode-docker
    - ms-python.python
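The `ports` entry only tells Gitpod how to treat port 8888; it does not start Jupyter (starting it is not covered in this article). Once you have started a Jupyter server in the workspace, a quick standard-library check can confirm that something is actually listening on that port. This is a sketch; `port_is_open` is a hypothetical helper, and it only tests TCP reachability from inside the workspace, not Gitpod’s public port forwarding.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass (refused, timeout, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8888 is the Jupyter port listed under "ports" in .gitpod.yml
    print("Jupyter reachable:", port_is_open(8888))
```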
Setup Python Environment
Gitpod workspaces come with Python preinstalled, along with pyenv, which can be used to manage Python versions. Create a new file named “requirements.txt” in the repository’s main folder to list all the Python packages this project needs, each pinned to a specific version.
# requirements.txt file
altair==4.1.0
matplotlib==3.5.0
numpy==1.19.5
openTSNE==0.6.1
pandas==1.2.5
pip==21.3.1
plotly==5.4.0
requests==2.25.1
scikit-learn==0.24.2
scipy==1.7.3
spotipy==2.19.0
streamlit==1.2.0
seaborn==0.11.2
tqdm==4.62.3
urllib3==1.26.7
wordcloud==1.8.1
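After the packages have been installed with pip (the install commands come next), it can be handy to confirm that the environment really matches these pins. The sketch below, under the assumption that every line you care about has the `name==version` form, parses the file and compares each pin with the installed version. `parse_pins` and `find_mismatches` are hypothetical names. `importlib.metadata` is only in the standard library from Python 3.8, so on the project’s Python 3.7.2 the code falls back to the `importlib_metadata` backport from PyPI, which must be installed separately.

```python
import re

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport from PyPI
    from importlib_metadata import PackageNotFoundError, version

# Matches "name==version"; comment lines, blank lines and unpinned lines do not match
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*([^\s#;]+)")


def parse_pins(text):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:  # plain string comparison of version numbers
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    try:
        with open("requirements.txt") as fh:
            pins = parse_pins(fh.read())
    except FileNotFoundError:
        print("requirements.txt not found; run this from the repository root")
    else:
        for name, (wanted, got) in find_mismatches(pins).items():
            print(f"{name}: pinned {wanted}, installed {got}")
```

An empty output means every pinned package is installed at exactly the pinned version.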
Next, add the following commands to the init section of the task in “.gitpod.yml” (tasks -> init), in place of the “# Add commands to Setup Python Environment” placeholder, to install the required Python version and then install the packages listed in “requirements.txt” with pip.
# Install Python 3.7.2
pyenv install -v 3.7.2
# Set Python 3.7.2 as default
pyenv global 3.7.2
# Install all libraries
python -m pip install -r requirements.txt
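Because `pyenv global` only changes which interpreter the `python` shim resolves to, a notebook or script may still end up on a different Python if the shell environment is not what you expect. A small guard at the top of a script can catch that early. This is a sketch; `python_matches` is a hypothetical helper, and the expected version (3, 7, 2) is simply the one installed by the commands above.

```python
import sys

EXPECTED = (3, 7, 2)  # the version installed with pyenv in the init task


def python_matches(expected=EXPECTED, actual=None):
    """Return True if the interpreter's (major, minor, micro) equals expected."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual == tuple(expected)


if __name__ == "__main__":
    found = ".".join(map(str, sys.version_info[:3]))
    print(f"Python {found}:", "OK" if python_matches() else "unexpected version")
```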
Container Specific Customization
This step is optional. If “.gitpod.yml” does not specify a Dockerfile, Gitpod creates the workspace from its standard Docker image, workspace-full (gitpod/workspace-full). This image includes tools such as Docker, Go, Java, Node.js, C/C++, Python, Ruby, Rust, and PHP, as well as Homebrew, Tailscale, Nginx, and several more.
If you want to build a custom container, click Add file -> Create new file in your repository’s main folder and enter “.gitpod.Dockerfile” in the file name box. Your custom container can contain anything you want, which is very useful if you use a framework or SDK that is not in the standard image, or if you need to install a specific package. I recommend using the standard workspace-full image as the base image and building on top of it, as shown below. To make Gitpod use this Dockerfile, uncomment the image section at the top of “.gitpod.yml”. If you would like to start from an image other than the standard one, check the pre-built container configurations for Gitpod in the workspace-images repository.
FROM gitpod/workspace-full:latest

# [Optional] Uncomment this section to install additional OS packages.
# Gitpod images run as the "gitpod" user, so apt-get needs sudo.
# RUN sudo apt-get update \
#     && sudo DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends <your-package-list-here>
Open Workspace
Everything is now set up, and we are ready to start the Gitpod workspace. Click the Open in Gitpod button; Gitpod will ask you to log in to your Gitpod account. Choose GitHub as the login provider, and your workspace will open in a new tab.
After a few seconds, you will see your workspace in the browser, as shown in the image below, which also shows how to open a terminal.
This completes the development environment setup for this project using Gitpod workspaces: we can now quickly open the project code in a predefined environment, build it, and test it.
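Code in this project will sometimes run inside the Gitpod workspace and sometimes elsewhere (for example, in Colab). A best-effort way to tell the two apart is to look at environment variables that Gitpod is assumed to set inside its workspaces, such as `GITPOD_WORKSPACE_ID` and `GITPOD_WORKSPACE_URL`. Treat those variable names as an assumption to verify with `env | grep GITPOD` in your own workspace terminal; `running_in_gitpod` and `workspace_url` are hypothetical helpers.

```python
import os


def running_in_gitpod(env=None):
    """Best-effort check, assuming Gitpod sets GITPOD_WORKSPACE_ID in workspaces."""
    env = os.environ if env is None else env
    return bool(env.get("GITPOD_WORKSPACE_ID"))


def workspace_url(env=None):
    """Return GITPOD_WORKSPACE_URL (assumed Gitpod variable), or None outside Gitpod."""
    env = os.environ if env is None else env
    return env.get("GITPOD_WORKSPACE_URL")


if __name__ == "__main__":
    print("Inside Gitpod:", running_in_gitpod())
    print("Workspace URL:", workspace_url())
```

Passing an explicit `env` mapping keeps the helpers easy to test without touching the real process environment.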
Next Step
In the next article, we will see how to get the music dataset and perform exploratory data analysis.