shree thaanu
shreeThaanu's blog

Making a Movie Recommendation iOS App: Part 1

shree thaanu
Aug 17, 2020

Have you ever wondered how Netflix suggests movies for you, or how Amazon recommends products to buy? What exactly is the tech behind it, and how does the data from your own activity help these companies keep increasing their profits?

Diving directly into the tech: the system they use is called a recommender. Recommender (or filtering) systems come in roughly two types.

  • Collaborative filtering: Collaborative filtering methods predict the target user's rating of a given item using similar users' ratings of that item. If you are trying to guess whether or not you will like a certain movie, you might ask people with similar taste what they thought of that movie. You might also ask these people what other movies they liked and gather a list of recommendations.
  • Content-based filtering: Unlike collaborative filtering methods, content-based recommender systems do not use other users' ratings at all. Instead, they use descriptive keywords associated with each item to make recommendations. This is quite useful, because the only rating history we need to make predictions is that of the target user.
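
To make the collaborative filtering idea concrete, here is a minimal sketch in plain Python. It is not the code used later in this post: the users, movies, and ratings are made up, and the helper names `pearson` and `predict_rating` are hypothetical. It predicts a missing rating as the similarity-weighted average of the ratings given by users with similar taste.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
RATINGS = {
    "alice": {"Star Wars": 5, "Alien": 4, "Titanic": 1},
    "bob": {"Star Wars": 4, "Alien": 5, "Titanic": 2, "Blade Runner": 5},
    "carol": {"Star Wars": 1, "Alien": 2, "Titanic": 5, "Blade Runner": 1},
}


def pearson(a, b):
    """Pearson correlation of two users over the movies both have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    xs = [a[m] for m in common]
    ys = [b[m] for m in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def predict_rating(ratings, user, movie):
    """Similarity-weighted average of similar users' ratings for `movie`.

    Only users with a positive correlation to `user` are consulted.
    Returns None when nobody similar has rated the movie.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        similarity = pearson(ratings[user], their_ratings)
        if similarity > 0:
            weighted_sum += similarity * their_ratings[movie]
            weight_total += similarity
    return weighted_sum / weight_total if weight_total else None


prediction = predict_rating(RATINGS, "alice", "Blade Runner")
print(prediction)
```

In this toy data alice's ratings move together with bob's and opposite to carol's, so only bob's opinion of Blade Runner counts toward the prediction.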

Enough of the definitions; let's build a real product with an available dataset. I'll make a recommender system using the collaborative filtering concept.

Language used: Python

Tool: Anaconda Navigator (Spyder)

Packages: pandas, NumPy, seaborn, Matplotlib, json, Flask.

# Import headers

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
import flask

app = flask.Flask(__name__)
app.config["DEBUG"] = True

@app.route('/', methods=['GET'])
def home():

    # Gather data
    column_names = ['user_id', 'item_id', 'rating', 'timestamp']
    df = pd.read_csv('u.data', sep='\t', names=column_names)
    movie_titles = pd.read_csv("Movie_Id_Titles")

    print(df.head())
    print(movie_titles.head())

    # Merge the ratings with the movie titles on item_id
    df = pd.merge(df, movie_titles, on='item_id')
    print(df.head())

    # Data visualisation based on rating and count
    sns.set_style('white')

    # Highest-rated movies by mean rating, regardless of how many people rated them
    print(df.groupby('title')['rating'].mean().sort_values(ascending=False).head())

    # Movies with the most ratings
    print(df.groupby('title')['rating'].count().sort_values(ascending=False).head())

    ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
    print(ratings.head())

    ratings['num of ratings'] = df.groupby('title')['rating'].count()
    print(ratings.head())

    # Distribution of the number of ratings per movie
    plt.figure(figsize=(10,4))
    ratings['num of ratings'].hist(bins=70)

    # Distribution of mean ratings: roughly Gaussian, peaking around 3,
    # with outliers at 1 and 5
    plt.figure(figsize=(10,4))
    ratings['rating'].hist(bins=70)

    # Joint plot of mean rating against number of ratings
    # Reading the plot: most movies have between 0 and 100 ratings and a mean rating between 2 and 4
    sns.jointplot(x='rating', y='num of ratings', data=ratings, alpha=0.5)

    # Time to recommend movies
    # This matrix has one row per user and one column per movie title.
    # Most cells are empty (NaN) because no user has watched every movie
    moviemat = df.pivot_table(index='user_id', columns='title', values='rating')
    print(moviemat.head())

    # Movies ordered from most to fewest ratings
    print(ratings.sort_values('num of ratings', ascending=False).head(10))

    # Start with a single movie: Star Wars (1977)
    starwars_user_ratings = moviemat['Star Wars (1977)']
    print(starwars_user_ratings.head())

    # Correlate every movie's ratings column with the Star Wars column to measure similarity
    similar_to_starwars = moviemat.corrwith(starwars_user_ratings)

    # Filter the data: remove NaN values
    corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
    corr_starwars.dropna(inplace=True)

    # This list may contain movies with very few ratings, which makes their correlation unreliable
    print(corr_starwars.sort_values('Correlation', ascending=False).head(10))

    # Keep only movies that more than 100 people have rated

    # Add the rating counts to the correlation data
    corr_starwars = corr_starwars.join(ratings['num of ratings'])
    print(corr_starwars.head())

    # Filter on the number of ratings, then take the most correlated movies
    relatedMovies = corr_starwars[corr_starwars['num of ratings'] > 100].sort_values('Correlation', ascending=False).head()
    print(relatedMovies)
    relatedMovieList = relatedMovies.index.tolist()

    # Convert the list into JSON and return it
    relevantMovieJson = json.dumps({'results': relatedMovieList})
    print(relevantMovieJson)
    return relevantMovieJson

app.run()

This returns a JSON response at localhost:5000.
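
The core of the handler is the correlation step, and it can be tried on its own without Flask or the dataset files. The sketch below is only an illustration under stated assumptions: the function name `similar_movies` and its `min_ratings` and `top_n` parameters are hypothetical, the ratings are made up, and the threshold is lowered from the 100 used above to suit the tiny table. It uses the same pandas calls as the code above (`pivot_table`, `corrwith`, `groupby`).

```python
import pandas as pd


def similar_movies(df, target, min_ratings=2, top_n=5):
    """Rank movies by how strongly their user ratings correlate with `target`.

    `df` needs the columns user_id, title and rating.
    """
    # Rows are users, columns are titles; unrated cells become NaN
    moviemat = df.pivot_table(index="user_id", columns="title", values="rating")

    # Correlate every movie column with the target movie's column
    corr = moviemat.corrwith(moviemat[target]).rename("Correlation").to_frame()
    corr = corr.dropna()

    # Attach the rating counts so sparsely rated movies can be filtered out
    corr["num of ratings"] = df.groupby("title")["rating"].count()
    corr = corr[corr["num of ratings"] >= min_ratings]

    # The target always correlates perfectly with itself, so leave it out
    corr = corr.drop(index=target, errors="ignore")
    return corr.sort_values("Correlation", ascending=False).head(top_n)


# Made-up ratings from four users for three movies
rows = [
    (1, "Star Wars (1977)", 5), (1, "Empire Strikes Back, The (1980)", 5), (1, "Liar Liar (1997)", 1),
    (2, "Star Wars (1977)", 4), (2, "Empire Strikes Back, The (1980)", 4), (2, "Liar Liar (1997)", 2),
    (3, "Star Wars (1977)", 2), (3, "Empire Strikes Back, The (1980)", 1), (3, "Liar Liar (1997)", 5),
    (4, "Star Wars (1977)", 1), (4, "Empire Strikes Back, The (1980)", 2), (4, "Liar Liar (1997)", 4),
]
toy = pd.DataFrame(rows, columns=["user_id", "title", "rating"])

result = similar_movies(toy, "Star Wars (1977)")
print(result)
```

With these toy ratings the sequel ranks first with a strongly positive correlation, while the comedy ends up with a strongly negative one.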

The next step is to build an app and connect it to this response.
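
Before building the app, it helps to know what a client has to do with the response. The real client is left for the next part; the Python sketch below only illustrates the response shape, a JSON object with a `results` list, as produced by the `json.dumps` call above. The helper names `parse_recommendations` and `fetch_recommendations` and the sample titles are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_recommendations(body):
    """Extract the list of titles from a body shaped like {"results": [...]}."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or not isinstance(payload.get("results"), list):
        raise ValueError("expected a JSON object with a 'results' list")
    return [str(title) for title in payload["results"]]


def fetch_recommendations(url="http://localhost:5000/"):
    """Fetch and parse the recommendations; the Flask server must be running."""
    with urlopen(url, timeout=5) as response:
        return parse_recommendations(response.read().decode("utf-8"))


# Parse a sample body with placeholder titles (no server needed for this part)
sample_body = json.dumps({"results": ["Movie A", "Movie B"]})
titles = parse_recommendations(sample_body)
print(titles)
```

Calling `fetch_recommendations()` while the script above is running should return the same kind of list, built from the real dataset.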

To be continued in Part 2.
