Published June 18, 2023, 8:20 p.m. by Courtney
The talk is planned as a mix of two topics: the fast streaming architecture on which the system was built, and the IPTV recommender system that Poslovna inteligencija has developed. The introduction gives an overview of the topic, followed by a description of content delivery services, the data they produce, and how that data is used in the customer experience. Next comes an overview of the recommender system, together with the architecture of the content analytics system and the implementation (including the algorithm) of the recommendation engine, which is part of the content analytics system.
We will cover the architecture of the system and the implementation equally, without giving priority to either part. We plan to present the reasons for creating the system, how the architecture progressed based on tests, and the final implementation of the algorithm, together with the differences between various algorithms used for the same or similar purposes. The content is backed by an article presented at the MIPRO 2018 conference and is the current highlight of the Innovation & Development department at Poslovna inteligencija.
This talk was presented by Mr. Tomislav Hlupic, Consultant at Poslovna inteligencija, and Iva Soric, Consultant at Poslovna inteligencija, during Data Science Conference 4.0, as part of the Big Data track.
You can find the presentation for this talk at the following link:
https://www.slideshare.net/Insitute_of_Contemporary_Sciences/building-an-recommendation-system-for-iptv-on-a-fast-streaming-architecture-tomislav-hlupic-iva-soric
We start off with Tomislav Hlupic, a consultant at Poslovna inteligencija, a regional leader in BI solutions and implementations, where he is gaining expertise in banking and retail business intelligence systems, working in the Microsoft BI stack. Tomislav will talk a bit about fast streaming architecture and the IPTV recommender system that Poslovna inteligencija has developed. Tomislav will be joined by his colleague Iva Soric, who is also a consultant at Poslovna inteligencija. I would remind the audience to post their questions through our platform. Tomislav, Iva, the floor is yours. Thank you.
[Music]
First of all, we would like to thank the Institute of Contemporary Sciences for having us for the second year in a row. We are really proud to be speakers at this great conference, and we hope to be attendees and speakers in the future as well. Today we will present the recommendation system built for IPTV on a fast streaming architecture, which was done as a proof of concept and is already implemented on several vendors' platforms. I will be talking mostly about the technical and architectural side, while Iva, who developed the recommender system, will talk about the recommender system itself.
Maybe the button isn't working.
So before we start with the presentation itself, I would like to present our company. We are the leader in business intelligence for the Southeastern Europe region, but our focus is not only on this region but also on projects in Europe and around the world. We have over 200 completed projects and several ongoing ones, more than 90, I'd say maybe over 100, users in 20 different countries, and over 110 employees. Most of them are consultants or implementers of the systems; we also have 20 business analysts and 5 project managers. Our offices are in Zagreb, Podgorica, Belgrade, Sarajevo and London, and lately in Vienna. Our expertise amounts to over 600 man-years, so multiply that by the man-days in a year and you can see how many hours we have invested in implementing the solutions and in ourselves as consultants, gaining knowledge on different projects.
Our fields consist of analysis of the client's needs, design of the solution, development of the solution, and implementation. After the implementation we provide constant support for the users. Finally, our last field of expertise is education: we opened an educational center two weeks ago, and we have developed our own educational track, also based on our expertise.
So, the agenda for the presentation: in the introduction we will present the content delivery services, the different types of delivery services, and the difference between IPTV and OTT, the digital solutions used for content streaming. Then we will present how we built the analytics system, what the business requirements were and what the technical solution was. After that, Iva will present the implementation and the development of the recommendation system, which is part of the content analytics system developed by Poslovna inteligencija, and she will give the conclusion of the whole system and of what we have presented.
The main idea behind building the recommendation system is that the content provided by telecommunication operators is given to the users all the time, but we need to tailor it to the users' needs. The operators can gather as much data as they can, even if it is based on a single click on your remote, and you need systems built specifically for analyzing and collecting that data, which is growing rapidly.
Imagine collecting the data from all the set-top boxes provided by an operator: you get every click, every channel change, every volume change. Everything is collected, everything is written into the system, and afterwards it needs to be analyzed in order to provide the recommendation system with the valuable data used to build recommendations for each and every user. The analysis is done in real time, so we had to develop a solution that is able to consume the data in real time, meaning all those source systems can deliver the data without delays in processing. So we had to develop an integration system that provides both real-time streaming support and batch support. With that, we gave the operators the possibility to maximize revenue and minimize costs.
Imagine giving the operator the possibility to minimize the cost of developing a new system or buying TV rights by knowing what their users will actually watch. Besides that, operators can serve their customers better. I don't know how many of you are using IPTV, but sometimes the channels change, and users, especially those following sports, are quite affected by it, because different sports channels provide different content. If you drop a channel and you didn't know your users were focused on it and were using it, you can lose those users. So besides minimizing costs, you can minimize the churn of your users. And finally, by having tailored customer content, you can reach the highest possible level of customer experience, so they will always be drawn to your system and to the content you provide for them. Therefore the operators and the content providers need a system that helps them give a recommendation to every user.
The difference between systems, when we are talking about TV and broadcasting, is that some systems can give you only real-time content. Those are, let's say, the historical systems, though nowadays they are digital. There is terrestrial TV, the one you plug into a set-top box if you don't have the connector on the TV, or plug directly into your TV. There is digital satellite TV, which was used more in the 90s, while nowadays IPTV has gained much more usage. And finally there is cable TV, which is more or less stable. All of them are digitalized but have the downside of offering only real-time streaming, meaning that once the broadcast is done, you can't go back and watch content you wanted to watch, say, 30 minutes ago. Maybe you got stuck in traffic and didn't get home on time; afterwards you can't watch anything, or you have to watch a rerun sometime later. The downside for the recommendation system is also that the data can't be bound to a particular customer, or that this can only be done by using sophisticated engines placed in front of the TV, which gather data about the usage.
Streaming services, on the other side, use the Internet, the IP protocols, to distribute the content. By streaming services we mean IPTV, so the set-top boxes you receive from your IP provider, the over-the-top (OTT) solutions, which have been gaining more and more popularity lately, and mobile TV. Consumers using streaming services can both watch the content in real time and go back in history without any hassle and, mostly, without having to pay an additional cost for it. There is a difference between IPTV and OTT.
First of all, IPTV is provided to you by the local telecom. In Croatia there are two telecoms providing it, more or less; in Serbia, probably every telecom has its own set of content. It is bound to the local network: the router you got from your provider has a dedicated network, and you can't plug the IPTV into just any port; it usually has two ports for the Internet and two ports for IPTV. The receiver is the set-top box you got, and the display device is the TV.
OTT services are provided by studios, by channels or by independent services; they can also be provided by the telecoms, but not necessarily. You receive the content via the public Internet and via the local telecom, meaning the content is generated and distributed over the Internet, but it reaches you through the local telecom operator. The content is purchased by the consumer. The receiver, finally, is whatever you actually use, whether it is your computer, your mobile phone, your laptop, or a TV with an embedded system. The display device is the screen provided by you, so it can be the mobile screen, the TV screen, or whichever screen you are using for display.
The business requirements for the recommender system came from the providers. The analytical system needs to provide the data and to understand the consumers' behavior: which content they consume, on which channels, at what time and on what device, since usage differs between mobile devices and at home. The system needs to analyze the performance of the services and of the packages, it needs to use the rating information, and it needs to segment the customers into different clusters based on their behavior. Finally, the system itself should approach the customers with appropriate offers based on their behavior.
The solution we built consists of the Kafka engine, which I will describe later. The storage system is Vertica. For those who don't know, Vertica is a columnar storage that works really, really fast. In the past, Cassandra was also taken into consideration as a columnar storage, but Cassandra couldn't handle the speed and volume of the incoming data. Vertica also offers several different connectors, one of which we used for connecting to the Spark engine. For visualization we used Tableau, and there is also a custom-tailored application developed by our innovation, research and development department. All the recommendations are written back into their specific tables in Vertica.
The source systems are IPTV, so the set-top boxes, video on demand and the OTT providers. The source system could also be a mobile device, meaning content on the go. Kafka is used for ingestion, and the data can be processed in real time or in batch.
So the Kafka cluster is placed in front of the storage solution. The data is loaded by running a series of COPY statements, each of which loads a small amount of data, so the volume never hits its highest peak all at once. For real-time streaming, the data can also be loaded into Vertica automatically as it streams into the Kafka channel. The Kafka channel gets the data as a message. Usually the data is in JSON or Avro format. JSON, which is semi-structured data, is mostly used on web platforms as well, but a custom parser can be built for any source system, so the same type of integration can be done whatever system you are using. The feed of messages coming into Kafka is grouped to form topics, and those topics are then divided into partitions, which are fed into Vertica target tables that store the data.
On the business side, the recommender system provides real-time usage dashboards and reports. They can be based on geolocation, so on the map view you can see the trend analysis; you can drill down using the Tableau dashboards and reports to see different granularities of the data, and you can calculate different metrics. The system provides the behavior of the users, targeted to each user specifically: you can see the behavior across channels, the behavior bound to the content, and finally the behavior and usage based on the device, on the delivery type and on the actions that were taken.
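The per-user views mentioned above (behavior by channel and by device) come down to grouping usage records by different keys. The following is a minimal sketch in plain Python; the event fields (`user`, `channel`, `device`, `seconds`) and the sample rows are illustrative assumptions, not the schema used in the talk.

```python
from collections import defaultdict

# Hypothetical, simplified viewing events; all field names are assumptions.
events = [
    {"user": "u1", "channel": "sports1", "device": "stb", "seconds": 1200},
    {"user": "u1", "channel": "news", "device": "mobile", "seconds": 300},
    {"user": "u1", "channel": "sports1", "device": "stb", "seconds": 600},
    {"user": "u2", "channel": "movies", "device": "stb", "seconds": 5400},
]


def watch_time_by(rows, *keys):
    """Sum watched seconds grouped by the given event fields."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["seconds"]
    return dict(totals)


# The same records viewed along two different dimensions.
by_channel = watch_time_by(events, "user", "channel")
by_device = watch_time_by(events, "user", "device")
```

In a deployment like the one described, this kind of grouping would more likely be done in the database or in the dashboard tool; the sketch only shows the shape of the calculation.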
So each action, for example every change of channel, gives you a different behavior type and can give you a different value, finally putting the customer into different segments. The business value is the predictive model for customer segmentation, the recommendations for the customers and, finally, the cross-sell and upsell. And now I will give the floor to Iva. Thank you.
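The ingestion path described in this part of the talk, where JSON messages are spread over partitions and loaded into the store in small batches, can be sketched without a running broker. This is only an illustration using the Python standard library: the message fields (`device_id`, `action`, `ts`), the CRC32-based partition choice and the batch size are assumptions, and no Kafka or Vertica API is called.

```python
import json
import zlib


def partition_for(key, num_partitions):
    """Pick a stable partition for a message key.

    CRC32 is an illustrative stand-in, not Kafka's own partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def parse_event(raw):
    """Parse one JSON message into a row tuple; return None if malformed."""
    try:
        msg = json.loads(raw)
        return (msg["device_id"], msg["action"], msg["ts"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None


def micro_batches(raw_messages, batch_size):
    """Group parsed rows into small batches, one batch per load step."""
    batch = []
    for raw in raw_messages:
        row = parse_event(raw)
        if row is None:
            continue  # skip messages the parser cannot handle
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


stream = [
    '{"device_id": "stb-1", "action": "channel_change", "ts": 1}',
    "not json",
    '{"device_id": "stb-2", "action": "volume_change", "ts": 2}',
    '{"device_id": "stb-1", "action": "pause", "ts": 3}',
]
batches = list(micro_batches(stream, batch_size=2))
```

Each yielded batch corresponds to one small load into the target table, which mirrors the idea of many small COPY statements rather than one large load.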
Okay, I'll be talking a little more about the recommendations module inside our content analytics platform. First of all, why recommender systems? This topic was already covered at the conference, so it will be a really short introduction. The main reason is the information overload problem, which for IPTV means, for example, that the user sits in front of the TV in the evening and wants to watch a movie, and he doesn't want to spend an additional hour browsing through thousands of channels searching for something to watch. So he needs a little help with information discovery. Companies use recommendation systems to solve the information overload problem, to improve customer experience and, of course, if possible, to increase revenue through potential cross-sell and upsell opportunities. So the idea is to approach consumers with special offers and personalized recommendations.
This field is not very new. There have been a lot of advances in recommendation systems over the years, and basically there are two types of approaches that appear most often in these kinds of implementations. The first is content-based recommender systems, which build profiles to characterize both users and items. For movies, for example, those characteristics can be the movie's director, his popularity, the starring actors and so on, so these are really concrete, measurable features. Collaborative filtering algorithms, in contrast, are more focused on the interaction with the platform, so basically they only take the users' activities into consideration. This can be explicit ratings data, for example if a user is asked to rate every movie he watched with a rating from 1 to 10, or, if that information is not available, their activities and past behavioral data such as clicks, purchases, views and so on. The idea of collaborative filtering is to make predictions based on what other, similar users liked. These algorithms are more accurate in most cases, but one of their big disadvantages is the cold start problem, which means that they cannot make predictions for new users and new items, since there is no historical data about them. And there are, of course, hybrid recommender systems that combine these two approaches.
When building a recommender system for a content delivery platform, there are some additional complications that need to be taken into consideration. One is the absence of explicit ratings, which is the case in our implementation: we do not have explicit ratings from users, since asking for them would probably be annoying. So we only have the information about what they watched, when, on what device, for how long and so on, and that is also important information about their preferences. Another very important thing is that an IPTV account is typically used by one household, which means that the user data is actually data about several different people. Then there is the fact that prices can vary over time and some items may not be available at all times, there are additional business rules such as filtering of adult content, and of course there are performance issues, since recommender systems typically deal with large amounts of data and need to serve a large number of customers.
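Several of the complications listed above (items that are not always available, adult-content filtering, and the cold start problem for new users) are commonly handled as a post-processing step on top of the scored candidates. Below is a minimal sketch of such a step. The function name `top_n`, its parameters and the sample data are hypothetical; the popularity fallback for users without history follows the approach mentioned later in the talk.

```python
def top_n(user_id, scores_by_user, popular, available, adult_items,
          allow_adult=False, n=10):
    """Return up to n item ids for a user after applying business rules.

    Falls back to a popularity list when the user has no scores (cold start).
    """
    scored = scores_by_user.get(user_id)
    if scored:
        # Highest predicted preference first.
        ranked = [item for item, _ in
                  sorted(scored.items(), key=lambda kv: kv[1], reverse=True)]
    else:
        ranked = list(popular)

    result = []
    for item in ranked:
        if item not in available:
            continue  # item is not currently offered
        if item in adult_items and not allow_adult:
            continue  # business rule: filter adult content
        result.append(item)
        if len(result) == n:
            break
    return result


# Illustrative data only.
scores_by_user = {"u1": {"m1": 4.5, "m2": 3.0, "m3": 4.9}}
popular = ["m4", "m1", "m3"]
available = {"m1", "m2", "m4"}
adult_items = {"m4"}

known_user = top_n("u1", scores_by_user, popular, available, adult_items, n=2)
new_user = top_n("u9", scores_by_user, popular, available, adult_items, n=2)
```

Keeping these rules outside the model means they can change (for example when availability changes) without retraining.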
We built the recommendation system as part of our content analytics platform, which, as Tomislav explained, uses the Vertica platform in its architecture, so the source data for the recommender system is also stored in the Vertica database. Basically, we used detailed records of user activity, which means records of basically every click on the remote, even if the user paused or changed the audio or something like that. So a lot of pre-processing was needed to clean the data and prepare it for the algorithms, keeping only the records that actually relate to watching movies and shows. The two tables that we derived from this detailed data look something like this. On the right is the user-item interaction table, which doesn't have explicit ratings but derived implicit ratings that I will explain a little later. The other is the item metadata table, which contains the information we have about the movies. These two tables are refreshed and maintained on a daily basis.
Our chosen approach uses model-based collaborative filtering, where users and items are represented by a set of latent factors, or features, derived from the patterns in the users' viewing. These are not necessarily humanly interpretable features like the movie genre; they are just dimensions used to estimate the users' unknown preferences. To learn these factors, matrix factorization techniques are used.
The tools and techniques we used include Apache Spark, a big data parallel processing engine that fits into our big data architecture. We used Spark's machine learning library from Python, so PySpark, and since the source data is in Vertica, we used the Spark-Vertica connector to communicate between the two. The connector lets us read data from a Vertica table into Spark RDDs or DataFrames in a simple manner and, the other way around, save the calculated results from Spark back into a Vertica table. Spark's implementation of collaborative filtering uses alternating least squares (ALS) to learn the latent factors, and the point of Spark's implementation is that, since ALS calculates these features independently of one another, it can leverage parallelization for better performance on large data sets.
As I mentioned, we don't have explicit ratings, so we calculated implicit, estimated preferences. We basically use the percentage of the show that the user actually watched as an estimate of his preference for that show. So we derived implicit ratings with this simple formula: the highest rating is 5, if the user watched the entire show, and the smallest rating, if he watched it at all, is 2, which means that he at least started to watch it and so showed at least some interest in it. The more he watches, the greater the level of confidence in his estimated preference.
The results are calculated in Spark. The calculation is scheduled on a daily basis during low-activity periods, such as the early morning or the night, and the results, the top ten recommendations for every user, are stored back in Vertica.
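The implicit rating described above can be sketched as a small function. The talk gives only the end points (2 for a show that was barely started, 5 for a show watched in full), so the linear interpolation between them is an assumption made here for illustration, as are the function and parameter names.

```python
def implicit_rating(watched_seconds, duration_seconds, low=2.0, high=5.0):
    """Map the watched fraction of a show to an implicit rating.

    Returns None when the show was not watched at all (no interaction).
    The linear scale between `low` and `high` is an assumption; only the
    end points 2 and 5 come from the talk.
    """
    if duration_seconds <= 0 or watched_seconds <= 0:
        return None
    # Cap at 1.0 so rewinds or re-watches cannot exceed the maximum rating.
    fraction = min(watched_seconds / duration_seconds, 1.0)
    return low + (high - low) * fraction


half_watched = implicit_rating(1800, 3600)   # half of a one-hour show
fully_watched = implicit_rating(3600, 3600)  # the entire show
```

Rows with a `None` rating would simply be left out of the user-item interaction table, since there is no evidence of interest to record.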
And the idea is to integrate the IPTV platform with the results, to serve the customers their top ten recommendations while they are watching something, or while they are searching for something to watch. Because of the cold start problem, we also save and maintain a table with the most popular items to show to new users.
So, in conclusion: we recognized the importance of recommendation systems and their role in customer experience for content delivery platforms, so we developed a content analytics system for content delivery services that has two main features. One is the part that Tomislav explained in the first part of the presentation, the processing of structured and unstructured data from the content delivery platform, and the other is the content analytics module, which includes the recommendation system. So that would be all, if you have any questions. Thank you.
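The matrix factorization step at the core of the recommendation engine, alternating least squares over latent factors, can be illustrated at toy scale in plain Python. This is a sketch of the general technique under stated assumptions, not Spark's implementation and not the code from the talk: the rating data, the factor count `k`, the regularization `lam` and all function names are illustrative.

```python
import random


def solve(a, b):
    """Solve the small system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, ratings_by_row, k, lam):
    """One regularized least-squares solve per row, other side held fixed.

    Each row is solved independently, which is what makes ALS easy to
    parallelize.
    """
    out = {}
    for row, rated in ratings_by_row.items():
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for col, r in rated.items():
            f = fixed[col]
            for i in range(k):
                b[i] += r * f[i]
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        out[row] = solve(a, b)
    return out


def als(ratings, k=2, lam=0.1, iters=10, seed=0):
    """Alternate between solving for user factors and item factors."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    items = {i: [rng.random() for _ in range(k)] for i in by_item}
    users = {}
    for _ in range(iters):
        users = update(items, by_user, k, lam)
        items = update(users, by_item, k, lam)
    return users, items


def predict(users, items, u, i):
    """Predicted preference is the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


def regularized_loss(ratings, users, items, lam):
    """Squared error on observed ratings plus the L2 penalty on all factors."""
    err = sum((r - predict(users, items, u, i)) ** 2
              for (u, i), r in ratings.items())
    reg = sum(x * x for vec in list(users.values()) + list(items.values())
              for x in vec)
    return err + lam * reg


# Illustrative implicit ratings on the 2 to 5 scale.
ratings = {
    ("u1", "m1"): 5.0, ("u1", "m2"): 2.0,
    ("u2", "m1"): 4.5, ("u2", "m3"): 2.5,
    ("u3", "m2"): 5.0, ("u3", "m3"): 4.0,
}
users, items = als(ratings, iters=20)
guess = predict(users, items, "u1", "m3")  # an unseen user-item pair
```

Because each alternating step exactly minimizes the regularized loss for one side, the loss cannot increase from one iteration to the next, which is a useful sanity check on any implementation.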
We have received seven questions so far. Question number one: how do you recommend something when multiple users watch using the same set-top box? You mentioned that earlier, so how do you handle it?
We are currently not dealing with that problem, but we are planning to take it into consideration in the future. Basically, the idea is to look at the times. We are assuming that, if it is a family, maybe the parents are watching TV at night and maybe the child is watching in the morning, so we would look at the time to take that into account.
I will go straight to a connected question: how do you handle GDPR constraints, having in mind that one receiver is used by multiple users?
How are we handling GDPR? Of course we are. I honestly do not know; I have anonymized data for my algorithm.
So, which language did you use for Kafka, Spark and Vertica/Cassandra, or some other NoSQL technology?
For Spark I used Python, so Python both for Vertica and for Spark.
You mentioned that Cassandra couldn't handle the writes. Could you elaborate? Cassandra is the fastest NoSQL database for writes; every node can handle every operation.
It depends on the scalability at the time of development. In theory Cassandra was the fastest one, but in reality, when the data was stored to Cassandra, there were bottlenecks. Even though the data can be placed into a Cassandra cluster, having a small number of clusters turned out to be the biggest problem. So it can be fast, but you need to scale Cassandra to a really high level in order to compete with Vertica.
Christos asks: what types of programs are the most difficult to predict accurately, and why? On the other hand, what programs are the easy ones to predict?
Which programs... well, I don't know about programs, but I guess the people it is easiest to predict for are the ones with specific tastes, who only watch comedies, or thrillers, or SF, or something like that.
What are the performance differences between Scala and Python?
Spark supports Java, Scala and Python, so it doesn't really matter. I think there are some marginal cases where Scala is faster, but in this case it really doesn't matter; you can use whichever language you know better.
What is the size of the data set, and why do you need Spark?
I mean, we don't necessarily need it. We wanted to use big data tools and have an architecture that is scalable, so in case we have that requirement and the data set gets even bigger, we can scale it, which with Spark you can do just by adding new nodes. The data set includes, I think, a couple of hundred thousand users and their daily data, which is every click, every action they take, so it is big data.
That will be all. Thank you very much. On behalf of the organizers I would like to give you the certificates. Thank you for contributing to the Data Science Conference.
[Applause]