
Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations.
The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.
Welcome to Data Skeptic, a podcast exploring the methods, use cases and consequences of
recommender systems.
Welcome to Data Skeptic Recommender Systems.
We've touched on the classic methodologies like collaborative filtering on a lot of episodes
previously.
We also, how could we not, talk about the impact of large language models and other generative
models in this space.
You know, any of us could go to a generic LLM and ask it to recommend a book or movie
or band or something.
Is it not a recommender system?
So of course, there's big questions about what the recommender system pipelines of the
future look like.
And are they hybrids?
How do they merge various methodologies?
We get into a little bit of the categories today, like where they can be agents as recommenders,
like replacing classic models, other cases where there's just an agentic augmentation.
Today we discuss some agentic approaches and also get a quick overview of the forthcoming
book, Recommendation with Generative Models, that our guest has written.
We'll get into that as the focus of our interview today.
My name is Yashar Deldjoo.
I'm an associate professor at the Polytechnic University of Bari in Italy, and I'm a senior
research scientist who has been working in the field of recommender systems for about 10 years now.
A lot's happened in 10 years in the recommender system field.
How did you first get interested in it?
My interest in the field originally started from the broader area of artificial intelligence.
At that time, when I was a master's student, I was working in the field of electrical engineering,
especially dealing with image processing and that kind of work.
It's quite fascinating to note that even those fields, like image processing, where you used
to work with pixels and extract descriptors from images and so on, have evolved a lot.
Now people are doing this with computer vision and machine learning models that can extract
holistic descriptors and use them in the downstream task.
I did my PhD at Politecnico di Milano in the field of computer science, with a focus
on recommender systems.
At that time, due to my previous studies on multimedia processing, I did my PhD within
the field of video recommendation.
Essentially, my work involved bridging the field of multimedia processing and recommender
systems, in particular exploring the use of multimodal signals in the downstream
recommendation task.
Things have evolved a lot since then, and I would say my work gradually went in the
direction of trustworthy recommender systems, where you would like to have resilient
recommender systems, models that hold up against different types of adversaries, and directions around
their biases and fairness. This is what the community mostly knows me for: work
on trustworthy recommender systems.
Since 2023, due to the emergence of large language models, especially as popularized by
ChatGPT and models like that, I would say there was a hit to every subtopic within the field
of artificial intelligence, and even broader than artificial intelligence, including recommender
systems.
One notable thing about these models, known mostly as large language models, was that they
are very good at conversing with humans. They hold very natural conversations and can handle
tasks like question answering, at which they are very good, or at least look very good, and
that is essentially one thing that made these systems quite notable. Obviously, question
answering, search, and recommender systems are very close areas, so one can definitely
imagine how much these models could bring advantages to users, not only by being able to
provide new types of personalization, but also by being able to engage in user-friendly
conversations.
So that's a short background.
I wonder if we could touch on the topic of trustworthiness first in recommender systems.
I'm not sure every listener is going to have a sense of what that means. You
mentioned that there could be an adversary.
What is adversarial activity when it comes to recommender systems?
Right.
So there are different ways we can look at the topic of trustworthy AI or responsible AI,
from different perspectives.
One perspective is to look at how objective or subjective the goals
of trustworthy or responsible AI are.
To give you an example, if you want to categorize five classical main dimensions of
trustworthy AI, we can say they are: generalizability at the very top layer;
then below you have robustness;
below robustness you have the topic of privacy;
then you have explainability; and then you have fairness and biases.
So these are the five key dimensions.
And as you go from top to bottom, the definitions of what trustworthy means become
more and more blurred.
For example, the first layer, generalizability, is, as we know, the goal of any machine learning
model that is trained on historical data: we would like these models to learn from the past
and, in some sense, predict the future.
And recommender systems are no exception.
Then you have the topic of adversarial resilience or adversarial robustness.
The goal here is for these systems to be robust against attacks, machine-learned attacks,
and human-crafted ones could also be included in that category.
The goals of these attacks are generally twofold.
Either there is a direct goal, what we call a targeted attack,
for example for commercial penetration: in the case of recommender systems, you are a
company and you would like to penetrate the market. From a producer's perspective the
aim would be: how can I change my product's appearance and descriptions, even in a
dishonest way, to penetrate the market?
You would like to push your items into the recommendation lists of users.
The more you do, the more economic gain you get, no?
And the other is the opposite: generally speaking, you want to target certain
products or categories to demote them in the recommendation lists of users.
So essentially, as you go through these layers from top to bottom,
it becomes harder and harder to pin down the definitions. For example, for
fairness and biases, in the recommender system community
there are definitions of what fairness and bias mean,
but there is no consensus on exactly which perspective of fairness and bias we should look into,
because recommender systems work in a multi-stakeholder environment with consumers and producers.
Whereas it's much easier to define and quantify adversarial robustness.
Typically, this is done by looking at how much you can degrade the quality of recommendations.
So on the risk side, these are, let's say, the five key components of adversarial
resilience and trustworthy or responsible AI.
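To make the push-attack idea he describes concrete, here is a minimal sketch (an illustration for these notes, not something from the episode): fake profiles rate a target item highly alongside popular "filler" items, and robustness is read off as how often the target enters users' top-k lists before and after the injection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy explicit-feedback matrix: 50 genuine users x 20 items, ratings 0-5 (0 = unrated).
R = rng.integers(0, 6, size=(50, 20)).astype(float)
TARGET = 7  # the item the attacker wants to push

def target_top_k_count(R, item, k=3):
    """How often `item` appears in users' top-k lists under a crude
    item-based nearest-neighbour scorer (cosine similarity)."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-9
    sim = (R.T @ R) / (norms.T @ norms)   # item-item cosine similarity
    scores = R @ sim                      # predicted affinity, user x item
    scores[R > 0] = -np.inf               # don't re-recommend rated items
    topk = np.argsort(-scores, axis=1)[:, :k]
    return int((topk == item).sum())

before = target_top_k_count(R, TARGET)

# Push attack: inject fake profiles that blend in by rating popular
# filler items, then give the target item the maximum rating.
popular = np.argsort(-(R > 0).sum(axis=0))[:5]
fakes = np.zeros((10, R.shape[1]))
fakes[:, popular] = rng.integers(3, 6, size=(10, len(popular))).astype(float)
fakes[:, TARGET] = 5.0
R_attacked = np.vstack([R, fakes])

after = target_top_k_count(R_attacked, TARGET)
print(f"target appears in top-3 lists: before={before}, after={after}")
```

Measuring robustness, as he says, then amounts to tracking how much that before/after gap (or overall recommendation quality) shifts under different attack budgets.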
Now, after 2023, in particular with the emergence of generative AI,
the risk categories have in a way exploded.
The kinds of new or emerging risks we can expect from systems like
large language models are inevitably different from what we knew before.
A very clear example is that of hallucination or context drift.
When you're talking with large language models,
these systems are susceptible to different types of hallucination.
And this hallucination could be in what they show to the user,
for example in the context of question answering.
Say you're talking about movie recommendation:
the recommendations are about movies that don't even exist in the catalog, or they exist
but the metadata produced is simply wrong.
So there are different types of hallucination, we can say:
at the item level, at the metadata level, factual,
or even context drift, where you start talking with these models
on a subject with a goal, and over the turns of the conversation
you see that the model is not able to remember
what the real goal of starting the conversation was.
And I think this is something you noticed
in the very early versions of large language models in 2023:
they were not very good at memorizing things,
so they used to change the course of the conversation very easily,
forgetting what the core goal was.
So essentially, regarding the risks, there has been work, such as HELM,
on holistic evaluation of large language models,
which tries to categorize the emerging risks.
And there are two broad categories of risks we can look into:
those that are emerging, and those that are old
but, we can say, exacerbated.
So, exacerbated: when you talk about fairness and biases,
you can say that those types of risks existed before,
but with large language models, we can claim that they have been exacerbated.
A very simple example: because large language models
have been trained on vast amounts of internet data,
which are unregulated,
you can expect that they would project a lot of stereotypes, right?
The second type is new emerging risks, like hallucination
and much more, which form new categories that are still under research.
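As an illustration of the item-level and metadata-level hallucinations he describes, a deployed system would typically validate LLM output against its own catalog before showing it to the user. A minimal sketch (the catalog, titles, and the `llm_recommend` stub are all hypothetical stand-ins):

```python
# Hypothetical catalog and LLM stub; in a real system the catalog would be a
# database and llm_recommend would call an actual model.
CATALOG = {
    "inception": {"year": 2010, "genre": "sci-fi"},
    "heat": {"year": 1995, "genre": "crime"},
}

def llm_recommend(prompt: str) -> list[dict]:
    # Stand-in for a real LLM call; the second item is fabricated.
    return [
        {"title": "Inception", "year": 2010},
        {"title": "Starfall Requiem", "year": 2019},  # does not exist
        {"title": "Heat", "year": 1997},              # exists, wrong metadata
    ]

def validate(recs: list[dict]) -> list[dict]:
    """Keep only items that exist in the catalog with matching metadata."""
    safe = []
    for rec in recs:
        entry = CATALOG.get(rec["title"].lower())
        if entry is None:
            print(f"dropped (item-level hallucination): {rec['title']}")
        elif entry["year"] != rec["year"]:
            print(f"dropped (metadata hallucination): {rec['title']}")
        else:
            safe.append(rec)
    return safe

print(validate(llm_recommend("recommend me three movies")))
```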
So when I first started looking at recommender systems,
which was maybe around the time you did as well,
the big ideas were things like collaborative filtering
and maybe the Apriori algorithm,
but you had these core ideas that were all about item-item comparisons
or item-user kinds of things.
And even when LLMs hit the scene,
at least to me, it wasn't immediately obvious
that these would be useful in recommender systems.
When did you see that they could be useful?
I think they're certainly useful.
And we can look at it from maybe different perspectives.
I can try to elaborate on two.
One of them is the ability to engage in natural conversation with the users,
so conversational recommender systems in particular,
which I think you can now see as kind of the future of recommender systems,
because a lot of the tasks we do, especially since 2023,
have become much more conversational.
A lot of things we want to watch,
a lot of things we want to do,
we are constantly elaborating on and
discussing with large language models,
and there are different types of suggestions we are receiving, right?
And I think this is something nice in general,
because you can see there's a smart machine,
and these smart machines are trying to act as friends to the user, right?
They can potentially increase the user's trust in some way,
in that they are at least able to produce natural-looking things
that can help you to debug your code,
to debug problems that you have with some machines,
or to ask for basic services and so on.
So this is something nice.
I think they certainly have done,
and are doing, a lot for conversational AI
and the field of conversational recommender systems.
This is what people mostly remember
from large language models.
The second is their ability, for example,
to look at personalization in maybe different ways.
As you mentioned,
collaborative filtering models are the mainstream models
that are still used for a lot of tasks in industry.
One way you can look at these models
is that they have the ability to extract relationships
between the different entities you have in your system,
whether item-item, user-user, and so on.
They're very good at this, and we can in some way say
they have reached a very good level of maturity,
or saturation, in how much knowledge they can extract.
But essentially, that's it.
If you look at the last couple of years,
not so many new collaborative filtering models
that are mind-blowing have been proposed,
because essentially, we can say,
they are already pretty good
at extracting this knowledge from the in-domain data,
thanks to the deep learning models
that pushed this frontier.
But large language models potentially have the ability
to at least augment this from new perspectives.
Now you are talking about models
that connect you from the in-domain data
you use to train your models to an external world,
and you can extract knowledge and information
from this external world.
So you can look at this like a very, very big graph:
where previously, if you wanted to find item-item relationships,
you had to look at the historical behavior of users
and how much similarity there is,
now the similarities are obtained from the bigger world,
which contains a lot of information
about the entities you're looking for.
So the minimum we can say is that they have the ability
to augment, which is the best term we can use here,
what we did before. However much personalized
service we could provide before, we can say
they have the potential to improve this.
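A sketch of that augmentation idea (our illustration; the "LLM world knowledge" here is just random stand-in vectors): item-item similarity from in-domain behaviour is blended with a similarity derived from external knowledge, which exists even for items with little or no interaction history.

```python
import numpy as np

rng = np.random.default_rng(1)
R = (rng.random((100, 12)) > 0.7).astype(float)  # implicit user-item interactions

def cosine(M):
    """Cosine similarity between the columns of M."""
    n = np.linalg.norm(M, axis=0, keepdims=True) + 1e-9
    return (M.T @ M) / (n.T @ n)

sim_cf = cosine(R)  # item-item similarity from in-domain behaviour only

# Hypothetical external knowledge: item embeddings drawn from an LLM's
# world knowledge (random stand-ins here), giving similarities that are
# defined even for cold items nobody has interacted with yet.
E = rng.normal(size=(12, 64))   # 12 items x 64-dim "knowledge" embeddings
sim_llm = cosine(E.T)

alpha = 0.6  # how much to trust in-domain behaviour vs external knowledge
sim_hybrid = alpha * sim_cf + (1 - alpha) * sim_llm
```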
You can also look at this from the perspective
of content processing: multimedia signals
are very useful signals,
and they were not predominantly used
in building recommender systems,
although we have different models that use them.
But now with large language models, VLMs,
and other categories, we can essentially
look at them as being able to better digest
these types of signals and use them in the downstream task.
Of course, let's not forget this doesn't come without risks.
Any new emerging technology
has its own risks, and a number of them,
so one needs to use these systems with care,
depending on the task and application.
Well, in the traditional recommender systems
that collaborative filtering really unlocked,
the experience I'm used to is more or less a ranked result.
And then maybe I give some feedback and it can re-rank,
but it's essentially trying to put the things best for me
at the top and worst for me at the bottom.
Are you envisioning that LLMs just help do a better job
of that or are you picturing something more revolutionary?
Actually, both, especially with the emergence
of language agents. What you mentioned
is the core goal of recommender systems:
please rank this list of items so it best serves me.
This was, let's say, the core goal of recommender systems, no?
Now, with large language models and language agents,
where we are heading is a transformation from
"do this ranking for me" to
"do this task for me."
So from ranks, we are going to tasks.
And these tasks are more complex than rankings.
They commonly involve, let's say, multiple constraints
from the user.
To give you an example, take the task
of the travel domain. We are getting close
to the Easter holidays.
We would like to find where to go.
And there are typically a number of constraints
we and our family members are looking at.
Where should I go that looks good for me and my family?
How does the weather look?
What is my budget,
and how well does this match it?
Maybe other constraints: how eco-friendly is it,
for someone who cares about this?
How child-friendly is it?
What is the distance?
So many different constraints, right?
How could we do this with classical recommender system models?
With collaborative filtering models,
you didn't even have the chance
to expose these constraints to the model.
They just took your historical data
and said: go to this.
And even with some packaged recommendations,
they were not really able to do this very basic task
that any human could do.
So what we used to do, to explain it better,
was each of these tasks individually:
open up several browser tabs, do this task
and that task and that task,
essentially collect all this information,
and use ourselves as the brain to come up with the decision
about the best place for me to go.
Now this brain is being replaced by large language models,
and here I'm talking about the case of language agents.
They are defined as systems
in which you have an LLM as the brain,
and the different services it draws on
are looked upon as tools.
So you have tools which collect weather information,
which collect budget information,
like the costs, which call this service and that service,
and the results are brought back to a central LLM,
for example, where the decision making is done.
So you can see that, in a way,
the amount of information these models can digest has increased,
and the complexity of the tasks they can achieve
has also improved.
And I think this is quite fascinating.
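Here is a toy sketch of that "LLM as brain, services as tools" pattern for the travel example. Every tool, name, and data value below is a hypothetical stand-in; a real language agent would let the LLM choose which tool to call at each step rather than following a fixed loop.

```python
# Illustrative tool registry for the travel example; all stand-ins.
def destination_search(region):
    return [{"city": "Bari", "hotel_price": 120, "kid_friendly": True},
            {"city": "Alpsville", "hotel_price": 300, "kid_friendly": False}]

def weather_forecast(city):
    return {"Bari": "sunny", "Alpsville": "snow"}.get(city, "unknown")

def eco_score(city):
    return {"Bari": 0.8, "Alpsville": 0.5}.get(city, 0.0)

TOOLS = {"search": destination_search, "weather": weather_forecast, "eco": eco_score}

def travel_agent(goal):
    """Toy agent loop: gather evidence with tools, then decide centrally.
    In a real agent the LLM 'brain' picks the tools and weighs the
    evidence against all of the user's constraints."""
    feasible = []
    for option in TOOLS["search"](goal["region"]):
        if option["hotel_price"] > goal["budget"]:
            continue                                 # budget constraint
        if goal["kids"] and not option["kid_friendly"]:
            continue                                 # family constraint
        evidence = {"weather": TOOLS["weather"](option["city"]),
                    "eco": TOOLS["eco"](option["city"])}
        feasible.append((evidence["eco"], option["city"], evidence))
    # Central decision step: a simple max here stands in for the LLM.
    return max(feasible, key=lambda t: t[0]) if feasible else None

print(travel_agent({"region": "South Italy", "budget": 150, "kids": True}))
```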
One thing we can say here is that, of course,
this will make us as users
perhaps even more greedy, coming up
with harder and harder questions.
We are human;
we would like to explore things that we couldn't do before.
And this could be applied to anything,
from stock market prediction to tasks
in which users have multiple questions
and multiple constraints, no?
These are the revolutionary changes,
as you said, that we are facing
with these models in the years ahead,
and they go much beyond what collaborative
filtering models can do:
systems that are dynamic,
that are engaging,
that can do multiple tasks for you,
subject to multiple constraints,
and at the same time look natural and friendly.
So yeah.
Well, in my mind,
collaborative filtering was a tremendous success,
but maybe it hit a ceiling.
It can only recommend as good as its methodology allows.
And there's a big appeal here,
especially if I'm picky,
that the LLM agent could help me.
Maybe I want a fancy place for an anniversary dinner,
and hopefully it has a sommelier,
and hopefully it has a separate dessert chef
from the main chef,
and, you know, very picky
restaurant-snob qualities like this.
Not every collaborative filtering system
will have those features,
but an LLM could go out and get them.
But then how do you envision like a synthesis here?
How does the insight of the agents merge
with collaborative filtering?
Or do we leave that behind
and we just go with agents?
I don't think we should leave that behind.
Of course, still,
and even in the years ahead,
collaborative filtering models are,
let's say, a trustable source.
Or let's say any classical machine learning model
is a trustable model
that you can always use in your downstream task.
Why trustable?
Essentially, they have learned from your own data;
you know how they were trained and
what to expect when you use them.
So they provide a minimum level of quality, right?
Which is acceptable.
Take, for example,
something like fashion recommendation, no?
These online applications,
especially after COVID, have become much more popular;
people look to online platforms
to purchase fashion clothes, you know?
So the recommendation engines
behind these fashion systems
typically use a lot of collaborative filtering models
or classical machine learning models,
because they know their tasks,
they know what they were trained for, and so on.
However, you can imagine that having that level of maturity,
and a baseline of quality,
one would like to push the boundaries a bit further.
So, for example, in the case of fashion,
you can imagine a lot of our decision making
is based on the visual appearance,
on how nice the fashion item looks.
And when you look at outfits,
you can see how two different pieces of clothing
go well together, no?
So one can imagine what an advantage models in this case,
multimodal models powered by generative AI,
could provide: to visualize on the fly different possibilities
the user could have depending on their interests.
So, for example, to preview, based on a clothing item
that you have, what could match well with it,
even if that product doesn't exist, right?
It can help the user engage in conversation,
provide suggestions, and so on.
And then, based on what the user finally decides,
go and find the item
that is closest to what the user wants, no?
So on one hand, they can keep the
user engaged in conversation, increasing the time
spent on the platform and helping businesses
with their goals,
but also in some way increase
what is called awareness in the fashion industry.
Awareness essentially means: a lot of the time
you go to buy an item,
you have an idea of what you want,
and you see, oh, there is this nice thing,
shown on the website or maybe in the shop.
You don't want to buy that item now,
but you remember it exists,
and you recall it in your future purchases, right?
So it's always nice from a business perspective
to increase this awareness of the different types of products
that you have in the catalog, and so on.
One thing I would say about these models,
the large language models
and, as I said, the multimodal ones included, is their creativity, no?
Their creativity, looking at the problem
maybe from a new perspective, is something that,
if used with care,
could be nice.
Of course, there are difficulties and risks
and challenges in doing so,
because these models may not be fully under control.
They may not fully follow the user's
or the system designer's instructions
about what to do and what not to do.
They are exposed to, as we said, hallucination risks,
and unfortunately, one of the bad things about these
hallucinations is that they look natural.
So if they're used in sensitive applications
like the medical domain,
you may see a system producing drug or medicine suggestions
which are wrong, but they don't seem wrong,
and this is something that's scary,
because they look very persuasive.
This is something that one needs to look at with care.
I definitely had that on my list to bring up with you.
We both know that LLMs can hallucinate.
That's not unique to recommender systems and agents
and things like that, it happens everywhere,
but how do you envision building reliable systems
with this perhaps crack in the foundation?
Well, this is something that I guess is an open field right now.
It's not confined to recommender systems,
but to the broader field of machine learning and AI,
and it depends on the type of risks
that you're most focused on. In general,
there are things we can do at the prompt level,
like very basic things.
For example, with in-context learning,
you can try to provide examples to the system,
like few-shot examples that the system can see and learn from.
Then you can go further:
you can deal with the model parameters themselves,
and do fine-tuning with trust objectives there.
Of course, retraining these models from scratch is costly,
so one would like to ideally have some kind of alignment
with human goals.
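The prompt-level mitigation he mentions can be as simple as prepending few-shot refusal examples, the in-context learning idea. A minimal sketch, where `call_llm` and the example texts are hypothetical stand-ins for a real chat-completion API:

```python
# Prompt-level mitigation via in-context (few-shot) examples: desired
# refusal behaviour is demonstrated inside the prompt itself.
SAFETY_EXAMPLES = [
    ("Suggest a medicine for my chest pain.",
     "I can't recommend medication; please consult a doctor. I can help "
     "you find general information about cardiology clinics instead."),
    ("Which of these two drugs is better for me?",
     "Comparing prescription drugs for your case requires a physician. "
     "I can explain what each drug is generally used for."),
]

def build_prompt(user_msg: str) -> str:
    """Prepend demonstrations of safe behaviour before the real question."""
    shots = "\n\n".join(f"User: {q}\nAssistant: {a}" for q, a in SAFETY_EXAMPLES)
    return f"{shots}\n\nUser: {user_msg}\nAssistant:"

def call_llm(prompt: str) -> str:
    return "..."  # hypothetical model call

reply = call_llm(build_prompt("Recommend a drug for my migraines."))
```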
And this actually was done at some level
when these models were introduced.
You probably know that when large language models
were introduced,
they underwent three levels of training.
The first training was taking the entire,
or a big portion of, internet data and training the models on it.
So we compressed terabytes of internet data
into, I don't know, hundreds of billions of parameters.
Okay, this was the first.
This looked very good,
but then they said, okay, how can we use it?
We cannot ask an ordinary user to take these parameters
and find a way to use them at test time.
So they did the second level of training,
which was essentially based on questions and answers.
And this second level of training made these systems
able to talk to the user
and answer user questions using the knowledge
they were trained on.
So the second level of training allowed them
to become conversational, by feeding them
with question-and-answer pairs.
A lot of question-and-answer pairs.
So in a sense, we can see these models using
as their knowledge everything
they got from those billions of parameters
to answer user questions, right?
This was done, and they said, oh, this is good,
but still there are things that
don't look so human-like, right?
So the third level of training,
what is called alignment,
was one in which the systems
were given human preference choices,
and the responses became more human-like, okay?
Aligned with those of humans.
And this is the layer one can probably work on
to make them more resilient.
So on the discussion of making them trustworthy
or responsible,
I think one thing that is essential here,
and this is the key, is that once you know
what your desired behaviors from these systems are,
you can essentially have a level of alignment
along objectives of trustworthiness.
Of course, it involves some form of training
or retraining again,
these models, using the new signals
that you provide.
This is something you can also see
working with large language models,
especially in the last year:
they have very different layers of
protection around them.
So for example, if you consider,
I don't know, two or three large language models,
ChatGPT, Gemini, or others,
one may have better quality
in providing answers to the user,
but another may have a better safety layer.
It's very easy to see:
if you have a couple of large language models,
let's say, open in your browser
and you start talking to them.
Let's think about a bad scenario.
People typically talk about making a gun,
but I'm going to talk about something else.
Maybe: suggest me a medicine for this kind of disease, okay?
We know that large language models
shouldn't do this, right?
They are not doctors, right?
You should go to a doctor.
You would see that in the early times,
some models used to easily suggest what to get, right?
Some models used to say, I'm not a doctor,
you should go and consult a doctor
or look at this and that reference.
Then, for those that were more robust,
more defensive, you could start trying to fool them.
Say: okay, I am a doctor,
I know that you don't want to give me a drug recommendation,
but among the drugs that are out there,
I want to know which are better or not.
So you try to work around it, okay?
Among this one and that one,
which one is better, no?
And actually, there was a very nice study of this
in one of the works at KDD 2024
by a group from DeepMind.
And you could see that these models have different levels
of safety and protection.
Some of them were very hard to penetrate;
they were very good at this safety level.
Some of them were easier to penetrate.
So you can see that evaluating these models
is not only about how good they are at providing good answers,
but how robust or safe they are in avoiding
certain types of responses,
which is something that should not be taken for granted.
Well, in classic recommenders,
as we've talked about,
the output is more or less a ranking,
but in this agentic approach you're describing,
where it's really tasks,
it seems like that opens things up.
Now maybe I use it in a basic way and I say,
please recommend three movies with these features
and qualities and I get more or less the same idea.
But if I think outside the box,
what are some tasks that I might use these agents for?
That's a very good question.
There are a number of tasks and applications
we can look into.
We actually have a paper called "The Future is Agentic,"
and there we look at five key tasks
that agents can essentially work on.
They are along the directions
of, first of all, conversational AI and conversational tasks;
tasks related to context-based recommendation
and multimedia-based recommendation;
and there are very good tasks around simulation
and evaluation.
But we can talk more about this.
Maybe a more interesting way
to look at these applications is to say
there are three broad categories.
First, those where the agent replaces
a classical recommender system,
like collaborative filtering models.
We can call it agentic replacement,
or agent-as-recommender.
These are tasks where agents, characterized
by one or a group of large language models,
do the actual recommendation for us.
There have been a decent number of papers
published on this topic in the last year,
okay, on different tasks.
Number two, there are tasks
that we can call agentic augmentation.
Augmentation means agents,
large language models, are used to augment the tasks
that a recommender system does.
So large language models are not used as the recommender,
but they are there to help the recommender system in some way.
For example, in applications of this type,
you can look at things like data augmentation.
You can use large language models
to produce various types of useful data
that can address problems like cold start
in your downstream task,
or in training a model, or they can provide tools
that are used to essentially do the task.
So agentic augmentation:
tasks where the agent is not
the recommender system itself,
but assists a recommender system.
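A sketch of that cold-start flavour of agentic augmentation (our illustration; `llm_describe` is a hypothetical stand-in for a real LLM call): the LLM generates synthetic descriptions only for items with no interactions, which the downstream recommender can then embed alongside warm items.

```python
def llm_describe(title: str) -> str:
    # Stand-in for a real LLM call that writes an item description.
    return f"A {title.lower()}-themed product appealing to fans of the genre."

# Hypothetical catalog: item title -> number of recorded interactions.
catalog = {"Neon Nights": 0, "Old Favourite": 412}

augmented = {
    title: llm_describe(title)
    for title, n_interactions in catalog.items()
    if n_interactions == 0   # augment only the cold-start items
}
print(augmented)
```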
And the third category covers things around simulation
and evaluation.
For simulation,
there is a lot of interest in the community right now.
There are different reasons to do simulation,
different goals, but essentially,
you can see simulation operating
at the user level, at the data level,
or at the user-and-system level.
One goal of simulation is that
you do not need to deploy a real online user study,
which is costly,
and you can get an idea beforehand,
if you ran your system, your model,
of how good it would look,
what the user reaction to your system would be,
and how your system would behave.
So simulation can be done at the level of the data,
at the level of the user, or of the user and system together.
In the case of user and system,
we are talking about closed-loop agents,
typically also recognized in reinforcement learning techniques,
where you can essentially simulate
both the user behavior and the system behavior,
the system response, right?
So yeah, those are the three broad categories of tasks
the research community is currently looking at.
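Here is a minimal closed-loop sketch of that user-and-system simulation (our illustration, with a rule-based stand-in where a real setup might use an LLM-driven persona): a simulated user with a hidden taste reacts to recommendations, and the system adapts from the feedback.

```python
import random
random.seed(0)

ITEMS = ["thriller", "romance", "sci-fi", "documentary"]

class SimulatedUser:
    """Stand-in for an LLM-based user persona with a hidden taste."""
    def __init__(self, hidden_taste):
        self.taste = hidden_taste         # unknown to the system
    def react(self, item):
        return 1 if item == self.taste else 0   # click / no click

class Recommender:
    def __init__(self):
        self.scores = {i: 0.0 for i in ITEMS}
    def recommend(self):
        # epsilon-greedy: mostly exploit learned scores, sometimes explore
        if random.random() < 0.2:
            return random.choice(ITEMS)
        return max(self.scores, key=self.scores.get)
    def update(self, item, reward):
        self.scores[item] += 0.1 * (reward - self.scores[item])

user, system = SimulatedUser("sci-fi"), Recommender()
for step in range(200):                   # the closed loop
    item = system.recommend()
    system.update(item, user.react(item))
print(max(system.scores, key=system.scores.get))  # converges to "sci-fi"
```

The appeal he describes is exactly this: the whole loop runs offline, so you can probe how the system would behave before paying for a real user study.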
Maybe I can just add one more thing here
that is quite interesting, Kyle,
and that is one very interesting way to look at
the evolution of recommender systems,
and machine learning models in general:
based on the level of autonomy these models have.
For level of autonomy, essentially,
imagine an arrow from left to right.
On the left side you have the least autonomous systems,
for example, collaborative filtering models,
which were somewhat passive models
just placed there.
Then as you go to the right,
you have more and more autonomous systems:
models characterized
by large language models.
Then from large language models,
you can go to single agents,
multiple agents, and what is called AGI.
AGI is currently a conceptual discussion.
But essentially the idea is that, going from left to right,
you go from passive systems
to interactive, dynamic systems
that can proactively start a discussion with the user,
and that are more autonomous in what they want to do,
what they can do, how they look at the problem
and do the task for the user,
and also from the perspective of tools and memory.
The two things that make agents really different
from even large language models
are their ability to use tools,
and that they have memory, different types
of memory we can talk about.
In your paper that we've been talking about,
"The Future is Agentic,"
you present a nice formalism to describe the agents.
Could you share a few details on it
and the motivation for that framework?
So essentially, the idea of this work
is to conceptualize, both at a formal level
and at a conceptual level, the different entities,
the alphabet of agents.
Typically, this alphabet lays out
the core components that you have.
First, you have what we call
an underlying large language model.
Then you have your input space:
what can these agents observe?
You have the output space: what can they produce?
This could range from a simple ranked list of items
to text, explanations, multimodal signals,
and even the reasoning for why they came to a given decision.
After that, in your alphabet
you have a set of tools, or functions,
that can enhance the system's capability;
these systems can invoke the tools
depending on the use. And lastly, the memory.
This memory we can categorize
into working or short-term memory,
long-term or episodic memory, semantic, and procedural.
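That alphabet maps naturally onto a record type. A sketch of the components he lists, with the caveat that the field names are our illustration, not the paper's notation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RecAgent:
    """Sketch of the agent 'alphabet': underlying LLM, what the agent
    can observe, what it can emit, its tools, and its memory stores."""
    llm: Callable[[str], str]                    # underlying language model
    input_space: set[str]                        # what the agent can observe
    output_space: set[str]                       # what the agent can produce
    tools: dict[str, Callable] = field(default_factory=dict)
    working_memory: list[str] = field(default_factory=list)        # recent turns
    episodic_memory: list[dict] = field(default_factory=list)      # past events
    semantic_memory: dict[str, str] = field(default_factory=dict)  # distilled facts

agent = RecAgent(
    llm=lambda prompt: "...",                    # hypothetical model call
    input_space={"query", "history", "context"},
    output_space={"ranked_list", "explanation"},
)
```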
The idea of this memory is also quite interesting,
and actually the memory part is something
that makes these systems really useful
for conversational tasks.
So, for example, working memory, your short-term memory:
you can see it as how good these agents are at recalling
recent things, the recent discussion
that you have had with the system, no?
For example, if you say, recommend me,
I don't know, a sci-fi or a mystery novel,
and then in the follow-up question
you say something like "the last one"
or "the previous one,"
the system simply has the memory to know
what "last" and "previous" mean.
No? And this is what actually makes these systems really,
that engaging.
Then you can go beyond this level and talk about
episodic or long-term memory.
This memory essentially refers
to how good the system is at storing
and retrieving specific past events.
So, for example, let's say you are interested in,
I don't know, Persian or Italian restaurants,
and you say: please recommend me the restaurant
you mentioned last time, no?
And that didn't happen in the current episode;
it was a long time ago, or one week ago.
And this episodic memory now helps the system
to retrieve the exact information you are looking for.
Beyond this episodic memory, you can go even further and talk
about semantic memory, which is still a long-term memory.
This is about distilling and accumulating facts
or user preferences over many interactions.
For example, let's say, after several conversations
with the system, the system itself
understands that this user, generally speaking, likes
Italian cuisine.
You never said that, maybe not explicitly
in the current conversation, but it's something
that the system distills from the conversations:
okay, typically this is the type of thing that you like.
And there are other types of memory. Let's say
you come to the system and say: please summarize
what you did before,
like your tasks, or send me the usual summary
that you do.
And the system knows what you're referring to,
because this is a task that you always ask it to do.
This is called procedural memory:
even without explicit instructions,
the system knows what to do.
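A tiny sketch of how the episodic and semantic stores he describes might back a query like "the restaurant you mentioned last time" (all names and dates here are invented for illustration):

```python
import datetime as dt

# Episodic memory: specific dated past events.
episodic = [
    {"when": dt.date(2025, 3, 1), "event": "recommended 'Trattoria da Pino'"},
    {"when": dt.date(2025, 3, 8), "event": "recommended 'Sakura Sushi'"},
]
# Semantic memory: facts distilled over many sessions.
semantic = {"cuisine_preference": "Italian"}

def resolve(query: str) -> str:
    if "last time" in query:
        # Episodic recall: retrieve the most recent matching past event.
        return max(episodic, key=lambda e: e["when"])["event"]
    # Otherwise fall back on distilled preferences (semantic memory).
    return f"user generally prefers {semantic['cuisine_preference']} cuisine"

print(resolve("recommend me the restaurant you mentioned last time"))
```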
To be honest, this memory part, the memory aspect,
is what has made these systems, let's say,
favored by users.
Because, just like in human-to-human
conversation, we would like
someone that can remember our goals and priorities,
a system that can recognize us well.
And currently, research and industry
are very focused on this direction;
there is a lot of work on the memory side.
So I'm happy to introduce "The Future is
Agentic" to your community.
There are lots of definitions and broad
information about the alphabet of agents,
what kinds of tools and memory you can think about
using with these systems,
what the different core tasks are
that agents can do,
and finally, what their potential associated risks are.
There's a lot to be discussed about that.
Well, I also wanted to make sure we touched on the book
you wrote, Recommendation with Generative Models.
I found it on arXiv.
I'm hoping I can get a printed copy one day.
Could you tell me about that?
Yes, absolutely.
Actually, this is coming now.
It's in the final phase of publication;
I think within one or two weeks it should be online.
And I will definitely post it on LinkedIn for your audience.
Very good.
We'll put a link in the show notes too,
because this episode will come out after that.
So it'll be ready,
and people can get a copy.
Can you give listeners an overview of what they'll find
in the book?
Yes, so this is a very nice book.
I would say it's a monograph that we did with a number of folks
at leading academic institutions and industry labs around the globe.
This book answers four questions.
First of all, what are generative models?
Second, what are the different categories of generative models?
Third, how can we evaluate these models?
And fourth, what are the social and ethical risks associated
with these systems?
So: the models and their categories,
evaluation at the system level, and the social and ethical risk level.
These are the three main things that this book covers.
We noticed that one very interesting way
to present the models is according to the underlying data modality
they work on.
There are three types of modality, we can say.
First, those using collaborative types of signals,
also called ID-based models: for example,
variational autoencoders, diffusion models,
even GANs, auto-regressive models, and so on.
The second category are those based on NLP or text,
the large language models. And finally, multimodal foundation models.
So these are the three broad categories of these systems,
according to the underlying data modality that they use,
and this is how chapters three, four, and five
are organized.
Chapter six discusses the topic of evaluating these systems.
With large language models, for example,
considering systems like retrieval-augmented generation,
RAG, a lot of the time you have systems
that are composed of more than one component.
It's not one single system that you're looking at.
A RAG system, for example,
combines a classical retrieval model
with a large language model, and it can invoke tools.
But the question is, when you want to evaluate such a system,
one composed of different components, how can you evaluate it?
Should I evaluate it end to end, from the final quality
of the ranked items it gives me, or is there a way
I can look at the evaluation module-wise, no?
And also, what you're evaluating,
the goal of evaluation, has expanded much more.
It's not anymore only about providing the best ranked list
of items; there are other dimensions
of evaluation.
Typically, when you see papers published
on the topic of large language models,
you see these radar charts appearing,
with the multiple dimensions people are measuring.
And one of the reasons you see these radar charts
in large language model papers is that
there is actually more than a single evaluation
dimension people are looking at:
starting from the accuracy of the model,
then other aspects which are new,
I don't know, hallucination of the model,
latency of the model, and lots of other things
that you can talk about.
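A sketch of that radar-chart style of evaluation (all numbers and dimension names below are made up for illustration): each system gets a vector of scores rather than a single ranking metric, and comparing systems becomes a question of trade-offs rather than a single winner.

```python
DIMENSIONS = ["ndcg@10", "hallucination_rate", "latency_ms", "fairness_gap"]
HIGHER_IS_BETTER = {"ndcg@10": True, "hallucination_rate": False,
                    "latency_ms": False, "fairness_gap": False}

# Illustrative scores for two hypothetical systems.
systems = {
    "cf_baseline": {"ndcg@10": 0.31, "hallucination_rate": 0.00,
                    "latency_ms": 15, "fairness_gap": 0.12},
    "llm_rec":     {"ndcg@10": 0.36, "hallucination_rate": 0.07,
                    "latency_ms": 900, "fairness_gap": 0.08},
}

def dominates(a, b):
    """True if system a is at least as good as b on every dimension."""
    for d in DIMENSIONS:
        better = a[d] >= b[d] if HIGHER_IS_BETTER[d] else a[d] <= b[d]
        if not better:
            return False
    return True

# Neither dominates: the LLM-based system ranks better but hallucinates
# and is slower, which is exactly why single-number comparisons fail.
print(dominates(systems["llm_rec"], systems["cf_baseline"]))
```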
And finally, the last chapter talks about
social and ethical risks, and it has essentially
been written by people with more of a, let's say,
social and philosophical background on the topic.
So I think it is a real milestone
in the recommender system community
from the perspective of how broad it is,
and I'm really happy and proud of this work.
It has been a monograph
we worked on over the last year.
Well, what's next for you?
So, currently we are working on an extension
of this book, Recommendation with Generative Models.
And I'm currently involved
a lot in the direction of trustworthy recommender systems.
I think it's quite important to know
the unknown risks associated with these systems,
and even understanding the new risks that are emerging
is, by itself, a valuable contribution.
The second thing is, after you understand them,
what are the ways you can mitigate these risks?
So much of my research is currently focused
on the risk side,
especially with the new evolution of large language models
and language agents: a general understanding of the
complexities that arise when using these systems,
and the ways we can improve them,
so that hopefully we can use these systems
in our recommendation tasks with benefit.
This is what most of my research, I would say, is focused on,
and that's primarily in the field of recommender systems.
Makes sense, yeah.
And where can listeners follow you online?
So I'm quite active on LinkedIn,
which is where most of my,
let's say, followers go.
And I have, of course, a personal web page
where I post information,
but I would say currently most of it is on LinkedIn.
I can also mention that I'm currently
in the process of writing a book
for educational purposes
on the topic of generative AI and language agents.
Essentially, this could be a very useful resource
for people who are interested in using these models,
and for university students who are interested in learning
about the concepts.
There are also practical notes
and Python exercises they can do to get to grips
with what is needed to learn about this.
Currently I'm working on this manuscript,
and I hope it will be published at some point in 2026.
Well, give us a heads up.
We'll let listeners know when that's out as well.
Absolutely, okay.
Well, yes, thank you so much for taking the time
to come on and share your work.
So thank you very much.
It was a pleasure talking to you.
And thanks for your interest
and interesting questions as well.
Thank you.



