TV shows on Netflix, Prime Video, Hulu and Disney+: A Data Analysis

Aanand S
Jan 9, 2021

I found this dataset interesting because during the COVID situation most of us spend our time at home, and working from home leaves a lot of free time after work. People often use that time to watch TV shows, so it is worth choosing the best possible streaming platform. I took a suitable dataset, and the report is below.

Download and import the dataset

```python
!pip install opendatasets --upgrade --quiet

import opendatasets as od
import pandas as pd
import numpy as np

dataset_url = '...'  # URL of the Kaggle dataset page
od.download(dataset_url)

# Folder created by the download
data_dir = './tv-shows-on-netflix-prime-video-hulu-and-disney'
df = pd.read_csv(data_dir + '/tv_shows.csv')
df.head()
```

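Filling the missing values

```python
# Fill missing age ratings with '16+' and missing IMDb scores with 7.3,
# assigning the result back instead of calling inplace on a column selection
df['Age'] = df['Age'].fillna('16+')
df['IMDb'] = df['IMDb'].fillna(7.3)
```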
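Before filling, it helps to see how many values each column is actually missing. The sketch below runs on a small made-up table (the rows are invented; only the `Age` and `IMDb` column names follow the dataset) and counts missing values before and after the fill.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table using the dataset's column names
df = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'Age': ['18+', np.nan, '7+', np.nan],
    'IMDb': [8.1, np.nan, 6.5, 7.0],
})

# Number of missing values in each column before filling
missing_before = df.isna().sum()
print(missing_before)

# Same fill values as above, assigned back to each column
df['Age'] = df['Age'].fillna('16+')
df['IMDb'] = df['IMDb'].fillna(7.3)

# Should be all zeros now
missing_after = df.isna().sum()
print(missing_after)
```

On the real data, running `df.isna().sum()` first shows whether `Age` and `IMDb` are the only columns that need filling.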

Dropping unwanted columns

```python
# Drop the columns that are not used in this analysis
df = df.drop(columns=['Rotten Tomatoes', 'type'])
```

Overview of the data


1) This dataset contains TV shows from 1901 to 2020

2) The highest IMDb rating of a TV show is 9.6 and the lowest is 1.0

3) 38.21% of the TV shows are on Prime Video
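These three figures can be read straight off the cleaned DataFrame. A minimal sketch, assuming the `Year`, `IMDb` and `Prime Video` columns (the last one a 0/1 flag, as the platform code later in the post treats it); the rows here are invented, so the printed numbers are not the real ones.

```python
import pandas as pd

# Hypothetical rows; the real values come from tv_shows.csv
df = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D', 'E'],
    'Year': [1901, 1999, 2010, 2015, 2020],
    'IMDb': [7.0, 9.6, 1.0, 8.2, 6.4],
    'Prime Video': [1, 0, 1, 0, 0],
})

# Earliest and latest release year
year_range = (df['Year'].min(), df['Year'].max())
# Lowest and highest IMDb rating
imdb_range = (df['IMDb'].min(), df['IMDb'].max())
# The mean of a 0/1 flag is the fraction of rows where it is 1
prime_share = df['Prime Video'].mean() * 100

print(year_range, imdb_range, f'{prime_share:.2f}%')
```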

Exploratory Analysis and Visualization

```python
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
```

Analysis of the number of TV shows per year



From the beginning of the 21st century there is a tremendous increase in the number of TV shows.
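The code behind this chart is not shown in the post. One way to get the per-year counts is sketched below, assuming only the `Year` column; the years are made up, and `shows_per_year.plot()` would draw the line chart.

```python
import pandas as pd

# Hypothetical release years standing in for df['Year']
df = pd.DataFrame({'Year': [1990, 2005, 2005, 2018, 2018, 2018, 2020]})

# Number of shows per year, in chronological order
shows_per_year = df['Year'].value_counts().sort_index()
print(shows_per_year)

# Shows released before 2001 versus from 2001 onward
before_2001 = shows_per_year[shows_per_year.index < 2001].sum()
from_2001 = shows_per_year[shows_per_year.index >= 2001].sum()
print(before_2001, from_2001)
```

Sorting by index rather than by count keeps the x-axis in year order when plotting.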

Finding the top and lowest IMDb-rated TV shows

```python
# Ten highest-rated shows
top_imdb_rated = df[['Title', 'IMDb']]
top_imdb_rated = top_imdb_rated.sort_values(by='IMDb', ascending=False)[:10]
print(top_imdb_rated)
sns.barplot(x='IMDb', y='Title', data=top_imdb_rated)

# Ten lowest-rated shows
xtop_imdb_rated = df[['Title', 'IMDb']]
xtop_imdb_rated = xtop_imdb_rated.sort_values(by='IMDb', ascending=True)[:10]
xtop_imdb_rated
```


1) Destiny is the top IMDb-rated TV show, with a rating of 9.6

2) Be with you is the lowest IMDb-rated TV show, with a rating of 1.0
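pandas also has `nlargest` and `nsmallest`, which return the top or bottom n rows by a column in one call. The sketch below is an alternative to the `sort_values(...)[:10]` pattern, shown on invented titles with n = 2.

```python
import pandas as pd

# Hypothetical titles and ratings
df = pd.DataFrame({
    'Title': ['A', 'B', 'C', 'D'],
    'IMDb': [9.6, 1.0, 7.3, 8.0],
})

# Top and bottom n rows by IMDb rating
top = df.nlargest(2, 'IMDb')[['Title', 'IMDb']]
bottom = df.nsmallest(2, 'IMDb')[['Title', 'IMDb']]
print(top)
print(bottom)
```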

Age-wise analysis

```python
# Number of TV shows per age rating, shown as a pie chart
age = df['Age'].value_counts()
print(age)
age.plot.pie(autopct='%1.2f%%')
```

Number of TV shows on each platform

```python
# Count of 0s and 1s in each platform column; after .T each row is a platform
platforms = df[['Netflix', 'Hulu', 'Prime Video', 'Disney+']].apply(pd.Series.value_counts).T
platforms
```


The largest number of TV shows is on Prime Video (as mentioned earlier): 2144 TV shows, to be precise.
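Because each platform column is a 0/1 flag, summing the column gives the number of shows on that platform directly. A small sketch on invented rows; dividing by the number of rows gives the percentage, and a show carried by two platforms counts toward both.

```python
import pandas as pd

platform_cols = ['Netflix', 'Hulu', 'Prime Video', 'Disney+']

# Hypothetical 0/1 availability flags, one row per show
df = pd.DataFrame({
    'Netflix':     [1, 0, 0, 1, 0],
    'Hulu':        [0, 1, 0, 0, 0],
    'Prime Video': [0, 0, 1, 1, 1],
    'Disney+':     [0, 0, 0, 0, 1],
})

# Shows per platform, largest first
counts = df[platform_cols].sum().sort_values(ascending=False)
# Percentage of all shows available on each platform
share = counts / len(df) * 100
print(counts)
print(share.round(2))
```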

Rating of each streaming platform

```python
ratingdic = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    # Sum of IMDb ratings of the shows on platform i divided by their count,
    # i.e. the platform's average IMDb rating
    ratingdic['r_' + i] = df[df[i] == 1].IMDb.sum() / df[df[i] == 1][i].sum()

rating = pd.DataFrame.from_dict(ratingdic, orient='index', columns=['Rating'])
print(rating)
sns.barplot(x='Rating', y=rating.index, data=rating)
rating.plot.line()
```


On average, the TV shows on Prime Video have the highest IMDb rating, followed by Netflix, Hulu and Disney+.
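Summing the ratings of a platform's shows and dividing by how many shows it has is the same as taking their mean, so the loop above can be written with `.mean()`. A sketch on invented ratings and flags:

```python
import pandas as pd

platform_cols = ['Netflix', 'Hulu', 'Prime Video', 'Disney+']

# Hypothetical ratings and 0/1 platform flags
df = pd.DataFrame({
    'IMDb':        [8.0, 6.0, 5.0, 9.0],
    'Netflix':     [1, 1, 0, 0],
    'Hulu':        [0, 1, 1, 0],
    'Prime Video': [1, 0, 0, 1],
    'Disney+':     [0, 0, 1, 0],
})

# Average IMDb rating of the shows on each platform, highest first
rating = pd.Series(
    {p: df.loc[df[p] == 1, 'IMDb'].mean() for p in platform_cols},
    name='Rating',
).sort_values(ascending=False)
print(rating)
```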

Age-wise analysis of platforms

```python
platform_agedict = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    # Number of shows per age rating on platform i
    platform_agedict['a_' + i] = df[df[i] == 1].Age.value_counts()

# One row per platform, one column per age rating
platform_age = pd.DataFrame.from_dict(platform_agedict, orient='index')
platform_age.fillna(0, inplace=True)
```


```python
# Transpose so each platform is a column of age-rating counts
platform_aget = platform_age.T

fig = plt.figure()

ax1 = fig.add_axes([0, 0, 0.5, 0.5], aspect=1)
ax1.set_title('Age of Disney+')
ax1.pie(platform_aget['a_Disney+'], labels=platform_aget.index, autopct='%1.1f%%')

ax2 = fig.add_axes([0.3, 0, 0.5, 0.5], aspect=1)
ax2.set_title('Age of Hulu')
ax2.pie(platform_aget['a_Hulu'], labels=platform_aget.index, autopct='%1.1f%%')

ax3 = fig.add_axes([0.6, 0, 0.5, 0.5], aspect=1)
ax3.set_title('Age of Netflix')
ax3.pie(platform_aget['a_Netflix'], labels=platform_aget.index, autopct='%1.1f%%')

ax4 = fig.add_axes([0.9, 0, 0.5, 0.5], aspect=1)
ax4.set_title('Age of Prime Video')
ax4.pie(platform_aget['a_Prime Video'], labels=platform_aget.index, autopct='%1.1f%%')
```


1) Most TV shows on Disney+ are suitable for all ages, which makes it a good platform for kids

2) Hulu is a good streaming platform for children aged 7 and above

3) Netflix and Prime Video, by contrast, mostly carry adult content
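The pie charts show each platform's age mix as percentages. The same shares can be computed as a table by dividing each row of `platform_age` by its total. A sketch on invented counts; the age labels `all`, `7+` and `16+` are assumptions standing in for the dataset's categories.

```python
import numpy as np
import pandas as pd

# Hypothetical counts shaped like platform_age (rows: platforms, columns: age ratings)
platform_age = pd.DataFrame(
    {'all': [30, 5, 10], '7+': [10, 30, 10], '16+': [10, 15, 80]},
    index=['a_Disney+', 'a_Hulu', 'a_Netflix'],
)

# Divide each row by its total so each platform's shares sum to 1
age_share = platform_age.div(platform_age.sum(axis=1), axis=0)
print(age_share.round(2))

# Most common age rating on each platform
most_common = age_share.idxmax(axis=1)
print(most_common)
```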

Year-wise analysis

```python
yearwise = {}
for i in ['Netflix', 'Hulu', 'Prime Video', 'Disney+']:
    # Number of shows per release year on platform i
    yearwise[i + '_year'] = df[df[i] == 1]['Year'].value_counts()

# Default orientation: one column per platform, one row per year
dfyearwise = pd.DataFrame.from_dict(yearwise)
dfyearwise.fillna(0, inplace=True)

# Largest number of shows released in a single year, per platform
dfyearwise.max()
```


1) Prime Video has the largest number of TV shows released in a single year, followed by Netflix.

2) In terms of new releases, Hulu and Disney+ perform poorly.
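`max()` gives the peak number of releases, and `idxmax()` on the same table gives the year in which that peak happened. A sketch on invented rows with two platforms:

```python
import pandas as pd

# Hypothetical release years and 0/1 platform flags
df = pd.DataFrame({
    'Year':        [2017, 2018, 2018, 2019, 2019, 2019],
    'Netflix':     [1, 1, 1, 0, 0, 1],
    'Prime Video': [0, 1, 0, 1, 1, 1],
})

# Shows per year for each platform (years as rows, platforms as columns)
yearwise = pd.DataFrame(
    {p: df.loc[df[p] == 1, 'Year'].value_counts() for p in ['Netflix', 'Prime Video']}
).fillna(0)

peak_count = yearwise.max()    # most releases in a single year
peak_year = yearwise.idxmax()  # the year of that peak
print(pd.DataFrame({'peak_year': peak_year, 'count': peak_count}))
```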

Inferences and Conclusion

1) The dataset covers TV shows from 1901 to 2020, and it shows a tremendous increase in the number of TV shows from the beginning of the 21st century.

2) Prime Video is the highest-rated streaming platform. It streams mostly adult content (16+, to be precise) and some kids' content too.

3) Netflix is the second highest-rated. Most of its TV shows are also adult content, but a few are suitable for everyone.

4) Hulu is an average-rated streaming platform that streams kids' TV shows.

5) Disney+ is the lowest-rated streaming platform, and more of its content is suitable for everyone.

In short, every streaming platform has pros and cons, but most of the TV shows favored by the public are streamed by Prime Video.

Future Work

1) More rating sources besides IMDb could be included to make the analysis more precise for users.

2) This analysis is based only on age rating, IMDb rating and new releases, but many other factors also matter, such as support for multiple accounts, affordable pricing, etc.

As a last word, many factors go into finding the best streaming platform, and we cannot analyze all of them at once. But we can limit the analysis to a few factors, as we did here.