May 16, 2024

[DSC 4.0] Building a recommendation system for IPTV - Tomislav Hlupic, Iva Soric



Published June 18, 2023, 8:20 p.m. by Courtney


The talk is planned as a mixture of a description of the fast streaming architecture on which the system was built and the IPTV recommender system that Poslovna Inteligencija has developed. An overview of the topic is given in the introduction, followed by a description of content delivery services, the data they produce, and how that data is used to improve the customer experience. Next, an overview of recommender systems is given, together with the architecture of the content analytics system and the implementation (including the algorithm) of the recommendation engine, which is a part of the content analytics system.

We will cover the architecture of the system and the implementation equally, without giving priority to either part. We plan to present the reasons for creating the system, how the architecture evolved based on tests, and the final implementation of the algorithm, together with the differences between various algorithms used for the same or similar purposes. The topic is backed by a paper presented at the MIPRO 2018 conference and is the current highlight of the Innovation & Development department at Poslovna Inteligencija.

This talk was presented by Tomislav Hlupic, Consultant at Poslovna Inteligencija, and Iva Soric, Consultant at Poslovna Inteligencija, during Data Science Conference 4.0, as a part of the Big Data track.

You can find the presentation for this talk at the following link:

https://www.slideshare.net/Insitute_of_Contemporary_Sciences/building-an-recommendation-system-for-iptv-on-a-fast-streaming-architecture-tomislav-hlupic-iva-soric

More info about the Data Science Conference:

Website: http://datasciconference.com

Instagram: https://www.instagram.com/datasciconf/

Facebook: https://www.facebook.com/DataSciConference/

Twitter: https://twitter.com/datasciconf

Flickr: https://www.flickr.com/photos/data-science-conference

To watch more new videos about data science, subscribe to our YouTube channel.




We start off with Tomislav Hlupic, a consultant at Poslovna Inteligencija, a regional leader in BI solutions and implementations, where he has been gaining expertise in banking and retail business intelligence systems, working in the Microsoft BI stack. Tomislav will talk a bit about the fast streaming architecture and the IPTV recommender system that Poslovna Inteligencija has developed. He will be joined by his colleague Iva Soric, who is also a consultant at Poslovna Inteligencija. I would remind the audience to post their questions through our platform. Tomislav, Iva, the floor is yours. Thank you.

[Music]

First of all, we would like to thank the Institute of Contemporary Sciences for having us for the second year in a row. We are really proud to be speakers at this great conference, and we hope to be attendees and speakers again in the future. Today we will present the recommendation system built for IPTV on a fast streaming architecture, which was done as a proof of concept and has already been implemented on several vendors' platforms. I will be talking mostly about the technical and architectural part, while Iva, who developed the recommender system, will talk about the recommender system itself. Maybe the button isn't working.

So before we start with the presentation itself, I would like to present our company. We are the leader in business intelligence for the southeastern Europe region, but our focus is not only on this region but also on projects across Europe and worldwide. We have over 200 realized projects, several ongoing projects, more than 90, I'd say maybe over 100, clients in 20 different countries, and over 110 employees, most of them consultants or implementers of the systems; we have 20 business analysts and 5 project managers. Our offices are situated in Zagreb, Podgorica, Belgrade, Sarajevo, London and, lately, Vienna. Our combined expertise is over 600 man-years, the number of people multiplied by the years they have worked, so you can see how many hours we have invested in implementing solutions and in ourselves as consultants gaining knowledge on different projects. Our services consist of analysis of the client's needs, design of the solution, development of the solution and implementation; after the implementation we provide constant support for the users, and finally our last field of expertise is IT education. We opened an educational center two weeks ago and have developed our own educational track based on our expertise.

So, the agenda for the presentation: in the introduction we will present content delivery services, the different types of delivery services, the difference between IPTV and OTT TV, and the digital solutions used for content streaming. Then we will present how we built the analytics system, what the business requirements were and what the technical solution was. After that, Iva will present the implementation and development of the recommendation system, which is a part of the content analytics system developed by Poslovna Inteligencija, and then we will give the conclusion about the whole system and what we have presented.

So the main idea behind building the recommendation system is that the content provided by the communication operators is given to the users all the time, but we need to tailor it to the users' needs. The operators can gather as much data as they can, even if it's based on a single click you make on your remote, and you need systems built specifically for collecting and analyzing that data, which is growing rapidly. Imagine collecting the data from all the set-top boxes provided by some operator: you get every click, every channel change, every volume change; everything is collected, everything is written into the system, and afterwards it needs to be analyzed in order to provide the recommendation system with the valuable data used to build up recommendations for each and every user. The analysis is done in real time, so we had to develop a solution that is able to consume the data in real time, meaning all those source systems can deliver the data without any delays in processing.

So we had to develop an integration system which can provide both real-time streaming support and batch support. By that, we gave the operators the possibility to maximize revenue and minimize costs. Imagine giving the operator the possibility to minimize the cost of developing a new system, or of buying some TV rights, by knowing what their users will actually watch. Besides that, operators can serve their customers better. I don't know how many of you are using IPTV, but sometimes the channels change, and users, especially the ones interested in sports, are quite affected by it, because different sport channels provide different content. If you lose a channel and you didn't know your users were focused on it and were using it, you can lose those users. So besides minimizing all that, you can minimize the churn of your users, and finally, by having content tailored to the customer, you can reach the highest possible level of customer experience, so they will always be drawn to your system and to the content you provide for them. Therefore, the operators and the content providers need a system which will help them give a recommendation for every user.

The difference between systems, when we're talking about TV and broadcasting, is that some of the systems can give you only real-time content. Those are, let's say, the historical systems; nowadays they're digital, but they are still: terrestrial TV, the one you plug into a set-top box if you don't have the connector on the TV, or plug directly into your TV; digital satellite TV, which was used more in the 90s, though nowadays IPTV again has much more usage; and finally cable TV, which is more or less stable. All of them are digitalized, but they have the downside of offering only real-time streaming, meaning that once the broadcast is done you can't go back and watch some content you wanted to watch, say, 30 minutes ago. Maybe you got stuck in traffic and didn't get home on time; afterwards you can't watch it, you have to catch a rerun somewhere later. The other downside for the recommendation system is that the data can't be bound to a particular customer, or it can only be done by using some sophisticated devices placed in front of the TV which gather the data about the usage.

Streaming services, on the other hand, use the internet and IP protocols to distribute the content. By streaming services we mean IPTV, so the set-top boxes that you receive from your IP provider, but also the OTT (over-the-top) solutions, which have been gaining more and more popularity lately, and mobile TV. Consumers using streaming services can watch content in real time and also go back in history without any hassle, and mostly without having to pay additional costs for it. There is a difference between IPTV and OTT. First of all, IPTV is provided to you by the local telecom; in Croatia there are two telecoms providing it, more or less, and in Serbia probably every telecom has its own set of content. It's bound to the local network, so the router you got from your provider has a dedicated network; you can't plug the IPTV into just any port, the router usually has two ports for the internet and two ports for the IPTV. The receiver is the set-top box you got, and the display device is the TV. OTT services are provided by studios, by channels or by independent services; they can also be provided by the telecoms, but not necessarily. The content is received via the public internet rather than via the local telecom, meaning the content is generated and distributed over the internet, not through the local telecom operator, and the content is purchased by the consumer. The receiver, finally, is the device you actually use, whether it's your computer, your mobile phone, your laptop or a TV with an embedded system, and the display device is the screen provided by you, so it can be the mobile screen, the TV screen, or whichever screen you're using.

The business requirements for the recommender system came from the providers. The analytical system needs to provide the data and to understand the consumers' behavior: which content they consume, on which channels, at what time and on what device, since usage on mobile devices differs from usage at home. The system needs to analyze the performance of the packages, it needs to use the rating information, it needs to segment the customers into different clusters based on their behavior, and finally the system itself should approach the customers with appropriate offers based on that behavior.

The solution we built consists of the Kafka engine, which I will describe later. The storage system is Vertica; for those who don't know, Vertica is a columnar store which works really, really fast. In the past Cassandra was also taken into consideration as the columnar storage, but Cassandra couldn't handle the speed and volume of the data that was coming in. Vertica also offers several different connectors, one of which we have used for connecting to the Spark engine, and for the visualization we have used Tableau. There is also a custom-tailored application developed by our innovation and research department, and all the recommendations are written back into their specific tables in Vertica.

The source systems are the IPTV set-top boxes, video on demand and the OTT providers; the source system could also be the mobile device, meaning the content on the go. Kafka is used for ingestion, and the data can be processed in real time or in batch. The Kafka cluster is put in front of the storage solution. The data is loaded by running a series of COPY statements, each of which loads a small amount of data, so the volume is never at its highest peak. For real-time streaming, the data can also be loaded into Vertica automatically as it streams into the Kafka channel. The Kafka channel receives the data as messages; usually the data is put in JSON or Avro format. JSON is mostly used, also on web platforms; it is semi-structured data, but a custom parser can be built for any source system, so based on the system you are using, the same type of integration can be done. The feed of messages coming into Kafka is grouped to form topics, and those topics are further divided into partitions, which are fed into Vertica target tables that store the data.
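
As a rough illustration of the ingestion path described above (not code from the talk), the sketch below publishes a set-top-box event to a Kafka topic as a JSON message. The topic name, broker address and event fields are assumptions for illustration, and the COPY statement in the closing comment only indicates the general form of a micro-batch load from Kafka into Vertica, not the exact statements used in the project.

```python
# Minimal ingestion sketch with assumed names; requires the kafka-python package.
import json
from datetime import datetime, timezone

from kafka import KafkaProducer

# Hypothetical broker address; events are serialized as JSON.
producer = KafkaProducer(
    bootstrap_servers="kafka01:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# One set-top-box action, e.g. a channel change, as a semi-structured message.
event = {
    "device_id": "stb-000123",      # set-top box identifier (assumed field)
    "account_id": "acct-4567",      # subscriber account (assumed field)
    "action": "channel_change",     # click / pause / volume_change / ...
    "channel": "Sport 1",
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

producer.send("stb-usage-events", value=event)
producer.flush()

# On the Vertica side the topic is drained in small micro-batches with COPY
# statements roughly of this form (illustrative only):
#   COPY usage_events
#     SOURCE KafkaSource(stream='stb-usage-events|0|-2', brokers='kafka01:9092')
#     PARSER KafkaJSONParser();
```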

On the business side, the system provides real-time usage dashboards and reports. They can be based on geolocation, so on the map view you can see the trend analysis; you can drill down using the Tableau dashboards and reports to see the different granularities of the data, and you can calculate different metrics. The system provides the behavior of each user specifically: you can see the behavior across channels, the behavior bound to the content, and finally the behavior and usage based on the device, on the delivery types and on the actions that were taken. Each action, for example every channel change, gives you a different behavior type and can give you a different value, and finally puts the customer into a different segment. The business value is the predictive model for segmentation of the customers, the recommendations for the customers, and finally the cross-sell and the upsell. And now I will give the floor to Iva. Thank you.

Okay, I'll be talking a little more about the recommendations module inside our content analytics platform. First of all, why recommender systems? This topic was already covered at the conference, so this will be a really short introduction. The main reason is the information overload problem, which for IPTV means, for example, that a user sits in front of the TV in the evening and wants to watch a movie, and he doesn't want to spend an additional hour browsing through thousands of channels searching for something to watch, so he needs a little help with information discovery. Companies use recommendation systems to solve the information overload problem, to improve customer experience and, of course, if possible, to increase revenue through potential cross-sell and upsell opportunities. So the idea is to approach consumers with special offers and personalized recommendations. This field is not very new; there have been a lot of advances in recommendation systems through the years, and basically there are two types of approaches that appear most often in these kinds of implementations.

The first is content-based recommender systems, which basically build profiles to characterize both users and items; for movies, for example, those characteristics can be the movie's director, its popularity, the starring actors and so on, so these are really concrete, measurable features. Collaborative filtering algorithms, on the other hand, are more focused on the interaction with the platform, so basically they only take the users' activities into consideration. This can be explicit rating data, for example if a user is asked to rate every movie he watched with a rating from 1 to 10, or, if that information is not available, their activities and past behavioral data like clicks, purchases, views and so on. The idea of collaborative filtering is to make predictions based on what other, similar users liked. These algorithms are in general more accurate in most cases, but one of their disadvantages is the cold start problem, which means that they cannot make predictions for new users and new items, since there is no historical data about them. And there are, of course, hybrid recommender systems that combine these two approaches.
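
To make the distinction concrete, here is a tiny generic illustration, not taken from the talk: a content-based system scores items against a user profile built from item features, while collaborative filtering sees only the user-item interactions. All values and names below are made up.

```python
# Toy illustration of the two kinds of input data (made-up values).

# Content-based: items described by concrete, measurable features.
item_features = {
    "movie_a": {"genre_comedy": 1, "genre_thriller": 0, "director_popularity": 0.8},
    "movie_b": {"genre_comedy": 0, "genre_thriller": 1, "director_popularity": 0.4},
}

# A user profile in the same feature space, e.g. averaged from movies already watched.
user_profile = {"genre_comedy": 0.9, "genre_thriller": 0.1, "director_popularity": 0.7}

def content_score(profile: dict, features: dict) -> float:
    """Dot product of user profile and item features: higher means a better match."""
    return sum(profile[k] * features.get(k, 0.0) for k in profile)

# Collaborative filtering, in contrast, works only from interactions like these,
# inferring similarity between users from overlapping behavior.
interactions = [("user_1", "movie_a", 5), ("user_1", "movie_b", 2), ("user_2", "movie_a", 4)]

print(content_score(user_profile, item_features["movie_a"]))  # movie_a fits this user better
```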

When building a recommender system for a content delivery platform, there are some additional complications that need to be taken into consideration. For example, the absence of explicit ratings, which is the case in our implementation: we do not have explicit ratings from users, since asking for them would probably be annoying, so we only have the information about what they watched, when, on what device, for how long and so on, which is also important information about their preferences. The other very important thing is that typically, for IPTV, one account is used per household, which means that the user data is actually data about a couple of different people. Then there is the fact that prices can vary over time, that some items may not be available at all times, some additional business rules like filtering of adult content, and of course the performance issues, since recommender systems typically deal with large amounts of data and need to serve a large number of customers.

Okay. We built the recommendation system as part of our content analytics platform, which, as Tomislav explained, uses the Vertica platform in its architecture, so the source data for the recommender system is also stored in the Vertica database. Basically, we used the detailed activities of the users, which are records of basically every click on the remote, even if the user paused or changed the audio or something like that. So a lot of pre-processing of the data needed to be done to clean it, prepare it for the algorithms, and keep only the records that actually apply to the watching of movies and shows. The two tables we derived from this detailed data look something like this: the one in the right corner, the user-item interaction table, doesn't have explicit ratings but derived implicit ratings, which I will explain a little later, and the other is the items metadata table, which contains the information we have about the movies. These two tables are refreshed and maintained on a daily basis.

daily daily basis our chosen approach

uses collaborative filtering model based

collaborative filtering where users and

items are represented by a set of Latin

factors or features that are derived

from the patterns in watching in the

users watching the movies so these are

not necessarily humanly interpretable

features like the movie channel there

are just dimensions used to make to

estimate the users unknown preferences

and to learn these factors matrix

factorization techniques are used and

the tools and techniques we are we used

include spark apache spark which is

a big data tool parallel processing

engine that fits in our big data

architecture we used this part machine

learning library in Python so PI SPARC

and since the source data is in vertical

used SPARC vertical connector to

communicate between those two the

connector allows us to in a simple

manner read the data from a vertical

table in SPARC rdd's or data frames and

the other way around to save the

calculated results back from spark in a

vertical table and sparks implementation

of collaborative filtering uses

alternating least squares to learn the

Latin factors and the thing about Sparks

implementation is that since CLS

calculates these features independently

of others it can leverage

parallelization for better performance
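
A minimal sketch of this step is below, using PySpark's ALS on the user-item table with implicit ratings. The table name, column names and hyperparameters are assumptions; in the real pipeline the input would be read from Vertica through the Spark-Vertica connector rather than from an existing Spark table.

```python
# Minimal PySpark ALS sketch (assumed names and hyperparameters).
from pyspark.ml.recommendation import ALS
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iptv-recommender").getOrCreate()

# User-item interactions with derived implicit ratings (see the formula below);
# hypothetical table with integer user_id / item_id columns, as ALS requires numeric IDs.
ratings = spark.table("user_item_ratings")

als = ALS(
    userCol="user_id",
    itemCol="item_id",
    ratingCol="rating",
    rank=20,                    # number of latent factors (assumed)
    maxIter=10,
    regParam=0.1,
    coldStartStrategy="drop",   # skip users/items unseen during training
)
model = als.fit(ratings)
```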

As I mentioned, we don't have explicit ratings, so we calculated a kind of implicit, estimated preference. We basically use the percentage of the show that the user actually watched as an estimate of his preference for that show, and we derived implicit ratings with a simple formula: the highest rating is 5, if the user watched the entire show, and the smallest rating, if he watched it at all, is 2, which means he at least started to watch it and showed some kind of interest in it; and the more he watches, the greater the level of confidence in his estimated preference.
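
The talk gives only the endpoints of the scale (2 for a show that was merely started, 5 for one watched in full), not the exact formula; a simple linear mapping consistent with that description could look like this:

```python
def implicit_rating(watched_fraction: float) -> float:
    """Map the fraction of a show actually watched to an implicit rating.

    This is only a plausible linear interpolation between the endpoints
    mentioned in the talk (2 = started watching, 5 = watched it all),
    not necessarily the exact formula used in the project.
    """
    watched_fraction = max(0.0, min(1.0, watched_fraction))
    return 2.0 + 3.0 * watched_fraction

# Example: watching 60% of a movie yields an implicit rating of 3.8.
print(implicit_rating(0.6))
```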

Okay, and the results: the calculation is done in Spark and scheduled on a daily basis during low-activity periods, like the early morning or night, and the results, the top ten recommendations for every user, are stored back in Vertica. The idea is to integrate the IPTV platform with the results, to serve the customers with their top ten recommendations while they are searching for something to watch. And because of the cold start problem, we also save and maintain a table with the most popular items to show to new users.
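
Continuing the ALS sketch above (same assumed names, not the project's actual code), this final step could look roughly like the following: the top ten items per user come from the trained model, and a simple most-popular table serves as the cold-start fallback; in the real system both results are written back to Vertica tables via the Spark-Vertica connector, whose exact options are omitted here.

```python
# Illustrative continuation of the ALS sketch; `model` and `ratings` as defined above.
import pyspark.sql.functions as F

# Top 10 recommendations per user from the trained ALS model.
top10 = model.recommendForAllUsers(10)

# Popularity fallback for new users: the most watched items overall.
most_popular = (
    ratings.groupBy("item_id")
    .agg(F.count("*").alias("views"))
    .orderBy(F.desc("views"))
    .limit(10)
)

# Both DataFrames would then be saved back into dedicated Vertica tables,
# which the IPTV platform reads when serving recommendations.
```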

So, in conclusion, we realized the importance of recommendation systems and their role in the customer experience for content delivery platforms, and we developed a content analytics system for content delivery services that has two main features: one is the part that Tomislav explained in the first part of the presentation, the processing of structured and unstructured data from the content delivery platform, and the other is the content analytics module, which includes the recommendation system. That would be all; if you have any questions... Thank you.

We received seven questions so far. Question number one: how do you recommend something when multiple users watch using the same set-top box? You mentioned that earlier, so how do you measure it?

We are currently not dealing with that problem, but we are planning to take it into consideration in the future. Basically, the idea is to look at the times: we are assuming that, if it's a family, maybe the parents are watching the TV at night and the child is watching in the morning, so we would look at the time to take that into consideration.

I will go straight to the connected question: how do you handle GDPR consent, having in mind that one receiver is used by multiple users?

How are we handling GDPR? Of course we are; I honestly do not know, I have anonymized data for my algorithm.

Which language did you use for Kafka and Spark, and why Vertica over Cassandra or some other NoSQL technology?

For Spark I used Python.

You mentioned that Cassandra couldn't handle the writes; could you elaborate, since Cassandra is the fastest NoSQL database for writes and every node can handle every operation?

It depends on the scalability at the time of development. Cassandra in theory was the fastest one, but in reality, when the data was stored to Cassandra, there were bottlenecks; even though the data can be placed into the Cassandra cluster, having a small cluster turned out to be the biggest problem. So it can be fast, but you need to scale Cassandra to a really high level in order to compete with Vertica.

Christos asks: what types of programs are the most difficult to predict accurately, and why? On the other hand, what programs are the easy ones to predict?

What programs? Well, I don't know about programs, but I guess the people who are the easiest to predict for are the ones that have specific tastes, like they only watch comedies or thrillers or SF or something like that.

What are the performance differences between Scala and PySpark?

Spark supports both Scala and Python, so it doesn't really matter. I think there are some marginal cases where Scala is faster, but in this case it really doesn't matter; you can use whatever language you know better.

What is the size of the data set, and why do you need Spark?

I mean, we don't necessarily need it; we wanted to use big data tools and have an architecture that is scalable, so in case we get new requirements and the data set grows even bigger, we can scale it, which with Spark you can do by just adding new nodes. The data set includes, I think, a couple of hundred thousand users and their daily data, which is every click, every action they take, so it is big data.

That will be all. Thank you very much; on behalf of the organizers, I would like to give you the certificates. Thank you for contributing to the Data Science Conference.

[Applause]
