TV Shows on Netflix, Prime Video, Hulu and Disney+: A Data Analysis

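# Install the helper library used to download Kaggle datasets
!pip install opendatasets --upgrade --quiet
# Kaggle dataset URL (the slug is cut off after "hulu-an" in the original post)
dataset_url = 'https://www.kaggle.com/ruchi798/tv-shows-on-netflix-prime-video-hulu-an'
import opendatasets as od
od.download(dataset_url)
data_dir = './tv-shows-on-netflix-prime-video-hulu-and-disney'
import pandas as pd
import numpy as np
# Load the CSV into a DataFrame and preview the first rows
df = pd.read_csv('./tv-shows-on-netflix-prime-video-hulu-and-disney/tv_shows.csv')
df.head()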
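# Fill missing values; assigning the result back avoids chained in-place
# modification, which newer pandas versions warn about
df['Age'] = df['Age'].fillna('16+')
df['IMDb'] = df['IMDb'].fillna(7.3)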
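The post does not say how the fill values `'16+'` and `7.3` were chosen. One plausible approach, shown here only as a sketch under that assumption, is to use the most frequent age rating and the mean IMDb score. The `toy` frame below is hypothetical data, not the Kaggle dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the 'Age' and 'IMDb' columns
toy = pd.DataFrame({
    "Age": ["16+", None, "7+", "16+", "18+"],
    "IMDb": [7.0, 8.0, np.nan, 6.0, 7.0],
})

# Assumption: categorical gaps get the most frequent value (mode),
# numeric gaps get the column mean rounded to one decimal
age_fill = toy["Age"].mode().iloc[0]
imdb_fill = round(toy["IMDb"].mean(), 1)

filled = toy.assign(
    Age=toy["Age"].fillna(age_fill),
    IMDb=toy["IMDb"].fillna(imdb_fill),
)
print(age_fill, imdb_fill)
print(filled)
```

On the real data, the same two expressions applied to `df['Age']` and `df['IMDb']` would show whether the hard-coded constants match the mode and mean.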
# Drop columns that are not used in the analysis
df.drop('Rotten Tomatoes', axis=1, inplace=True)
df.drop('type', axis=1, inplace=True)
# Column types, non-null counts and summary statistics
df.info()
df.describe()
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
# Global plot styling
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
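# Distribution of release years; distplot is deprecated in recent seaborn
# releases, and histplot with kde=True draws a comparable plot
sns.histplot(df['Year'], kde=True)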
# Ten highest-rated shows by IMDb score
top_imdb_rated = df[['Title', 'IMDb']]
top_imdb_rated = top_imdb_rated.sort_values(by='IMDb', ascending=False)[:10]
top_imdb_rated
sns.barplot(x='IMDb', y='Title', data=top_imdb_rated)
# Ten lowest-rated shows by IMDb score
xtop_imdb_rated = df[['Title', 'IMDb']]
xtop_imdb_rated = xtop_imdb_rated.sort_values(by='IMDb', ascending=True)[:10]
xtop_imdb_rated
# Share of shows per age rating (a Series pie chart needs no x argument)
age = df['Age'].value_counts()
age
age.plot.pie(autopct='%1.2f%%')
# Count of shows available (1) and unavailable (0) on each platform
platforms = df[['Netflix', 'Hulu', 'Prime Video', 'Disney+']].apply(pd.Series.value_counts).reset_index()
platforms = platforms.T
platforms.drop('index', inplace=True)
platforms
platforms.plot.bar()
# Mean IMDb rating per platform: sum of ratings divided by number of shows
ratingdic = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    ratingdic['r_' + i] = (df[df[i] == 1].IMDb.sum()) / (df[df[i] == 1][i].sum())
rating = pd.DataFrame.from_dict(ratingdic, orient='index', columns=['Rating'])
rating
sns.barplot(x='Rating', y=rating.index, data=rating)
rating.plot.line()
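# Age-rating counts per platform (rows: platforms, columns: age ratings)
platform_agedict = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    platform_agedict['a_' + i] = df[df[i] == 1].Age.value_counts()
platform_age = pd.DataFrame.from_dict(platform_agedict, orient='index')
# Age ratings absent from a platform become 0 instead of NaN
platform_age.fillna(0, inplace=True)
platform_age
platform_age.plot.bar()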
# One pie chart per platform, placed side by side on the same figure
platform_aget = platform_age.T
fig = plt.figure()
ax1 = fig.add_axes([0, 0, 0.5, 0.5], aspect=1)
ax1.set_title('Age of Disney+')
ax1.pie(platform_aget['a_Disney+'], labels=platform_aget.index, autopct='%1.1f%%')
ax2 = fig.add_axes([0.3, 0, 0.5, 0.5], aspect=1)
ax2.set_title('Age of Hulu')
ax2.pie(platform_aget['a_Hulu'], labels=platform_aget.index, autopct='%1.1f%%')
ax3 = fig.add_axes([0.6, 0, 0.5, 0.5], aspect=1)
ax3.set_title('Age of Netflix')
ax3.pie(platform_aget['a_Netflix'], labels=platform_aget.index, autopct='%1.1f%%')
ax4 = fig.add_axes([0.9, 0, 0.5, 0.5], aspect=1)
ax4.set_title('Age of Prime Video')
# The Prime Video pie call is missing from the post; this follows the pattern above
ax4.pie(platform_aget['a_Prime Video'], labels=platform_aget.index, autopct='%1.1f%%')
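# Number of shows per release year on each platform (rows: platforms, columns: years)
yearwise = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    yearwise[i + '_year'] = df[df[i] == 1]['Year'].value_counts()
dfyearwise = pd.DataFrame.from_dict(yearwise, orient='index')
dfyearwise.fillna(0, inplace=True)
# Platforms are row labels here, so select them with .loc;
# each call returns the largest number of shows released in a single year
dfyearwise.loc['Netflix_year'].max()
dfyearwise.loc['Hulu_year'].max()
dfyearwise.loc['Prime Video_year'].max()
dfyearwise.loc['Disney+_year'].max()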

Inferences and Conclusion

1) The dataset covers TV shows released from 1901 to 2020, and the number of shows rises sharply from the beginning of the 21st century.
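A quick way to check this kind of trend numerically is to bucket release years into decades and count shows per bucket. The sketch below uses a small hypothetical `years` series in place of `df['Year']`, and the helper name `shows_per_decade` is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical release years standing in for df['Year']; not the real dataset
years = pd.Series(
    [1901, 1955, 1998, 2003, 2011, 2015, 2015, 2018, 2019, 2020],
    name="Year",
)

def shows_per_decade(year_series):
    """Count shows per decade, mapping e.g. 2015 to the 2010 bucket."""
    decades = (year_series // 10) * 10
    return decades.value_counts().sort_index()

decade_counts = shows_per_decade(years)
# Fraction of shows released in 2000 or later
share_since_2000 = (years >= 2000).mean()

print(decade_counts)
print(share_since_2000)
```

Running `shows_per_decade(df['Year'])` on the actual data would give the per-decade counts behind the distribution plot.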

References and Future Work

1) Ratings from more platforms, alongside IMDb, could be included to make the analysis more precise for users.
