
Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations.
The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.
Welcome to Data Skeptic, a podcast exploring the methods, use cases and consequences of
recommender systems.
Welcome to Data Skeptic Recommender Systems.
We've touched on the classic methodologies like collaborative filtering on a lot of episodes
previously.
We also, how could we not, talk about the impact of large language models and other generative
models in this space.
You know, any of us could go to a generic LLM and ask it to recommend a book or movie
or band or something.
Is it not a recommender system?
So of course, there's big questions about what the recommender system pipelines of the
future look like.
And are they hybrids?
How do they merge various methodologies?
We get into a little bit of the categories today, like where they can be agents as recommenders,
like replacing classic models, other cases where there's just an agentic augmentation.
Today we discuss some agentic approaches and also get a quick overview of the forthcoming
book, Recommendation with Generative Models, that our guest has written.
We'll get into that as the focus of our interview today.
My name is Yashar Deldjoo.
I'm an associate professor at the Polytechnic University of Bari in Italy, and I'm a senior
research scientist who has been working in the field of recommender systems for about 10 years now.
A lot's happened in 10 years in the recommender system field.
How did you first get interested in it?
My interest in the field originally started from the broader area of artificial intelligence.
At that time, when I was a master's student, I was working in the field of electrical engineering,
especially dealing with image processing and that kind of work.
It's quite fascinating to note that even those fields, like image processing, where you used
to work with pixels and extract descriptors from images and so on, have evolved a lot.
Now people are doing this with computer vision and machine learning models that can extract
holistic descriptors and use them in the downstream task.
I did my PhD at Politecnico di Milano in the field of computer science, with a focus
on recommender systems.
At that time, due to my previous studies on multimedia processing, I did my PhD within
the field of video recommendation.
Essentially, my work involved bridging the field of multimedia processing and recommender
systems, in particular exploring the use of multimodal signals in the downstream
recommendation task.
Things have evolved a lot since then, and I would say my work gradually went in the
direction of trustworthy recommender systems, where you would like to have resilient
recommender systems, models that hold up against different types of adversaries, and directions around
their biases and fairness. This is what the community mostly knows me for: work
on trustworthy recommender systems.
Since 2023, due to the emergence of large language models, especially as popularized by
ChatGPT and models like that, I would say there was a hit to every subtopic within the field
of artificial intelligence, and even broader than artificial intelligence, including recommender
systems.
One notable thing about these models, known mostly as large language models, was that they
are very good at conversing with humans. They hold very natural conversations and can handle
tasks like question answering, at which they are very good, or at least look very good, and
that is essentially one thing that made these systems quite notable. Obviously, question
answering, search, and recommender systems are very close areas, so one can definitely
imagine how much these models could bring advantages to users, not only by being able to
provide new types of personalization, but also by being able to engage in user-friendly
conversations.
So that's a short background.
I wonder if we could touch on the topic of trustworthiness first in recommender systems.
I'm not sure every listener is going to have a sense of what that means. You
mentioned that there could be an adversary.
What is adversarial activity when it comes to recommender systems?
Right.
So there are different ways we can look at the topic of trustworthy AI or responsible AI,
from different perspectives.
One perspective is to look at how objective or subjective the goals
of trustworthy or responsible AI are.
To give you an example, if you want to categorize five classical main dimensions of
trustworthy AI, we can say they are: generalizability at the very top layer;
then below you have robustness;
below robustness you have the topic of privacy;
then you have explainability; and then you have fairness and biases.
So these are the five key dimensions.
And as you go from top to bottom, the definitions of what trustworthy means become
more and more blurred.
For example, the first layer, generalizability, is, as we know, the goal of any machine learning
model that is trained on historical data: we would like these models to learn from the past
and, in some sense, predict the future.
And recommender systems are no exception.
Then you have the topic of adversarial resilience or adversarial robustness.
The goal here is for these systems to be robust against attacks, machine-learned attacks,
and human-crafted ones could also be included in that category.
The goals of these attacks are generally twofold.
Either there is a direct goal, what we call a targeted attack,
for example for commercial penetration: in the case of recommender systems, you are a
company and you would like to penetrate the market. From a producer's perspective the
aim would be: how can I change my product's appearance and descriptions, even in a
dishonest way, to penetrate the market?
You would like to push your items into the recommendation lists of users.
The more you do, the more economic gain you get, no?
And the other is the opposite: generally speaking, you want to target certain
products or categories to demote them in the recommendation lists of users.
So essentially, as you go through these layers from top to bottom,
it becomes harder and harder to pin down the definitions. For example, for
fairness and biases, in the recommender system community
there are definitions of what fairness and bias mean,
but there is no consensus on exactly which perspective of fairness and bias we should look into,
because recommender systems work in a multi-stakeholder environment with consumers and producers.
Whereas it's much easier to define and quantify adversarial robustness.
Typically, this is done by looking at how much you can degrade the quality of recommendations.
So on the risk side, these are, let's say, the five key components of adversarial
resilience and trustworthy or responsible AI.
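To make the push-attack idea he describes concrete, here is a minimal sketch (an illustration for these notes, not something from the episode): fake profiles rate a target item highly alongside popular "filler" items, and robustness is read off as how often the target enters users' top-k lists before and after the injection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy explicit-feedback matrix: 50 genuine users x 20 items, ratings 0-5 (0 = unrated).
R = rng.integers(0, 6, size=(50, 20)).astype(float)
TARGET = 7  # the item the attacker wants to push

def target_top_k_count(R, item, k=3):
    """How often `item` appears in users' top-k lists under a crude
    item-based nearest-neighbour scorer (cosine similarity)."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-9
    sim = (R.T @ R) / (norms.T @ norms)   # item-item cosine similarity
    scores = R @ sim                      # predicted affinity, user x item
    scores[R > 0] = -np.inf               # don't re-recommend rated items
    topk = np.argsort(-scores, axis=1)[:, :k]
    return int((topk == item).sum())

before = target_top_k_count(R, TARGET)

# Push attack: inject fake profiles that blend in by rating popular
# filler items, then give the target item the maximum rating.
popular = np.argsort(-(R > 0).sum(axis=0))[:5]
fakes = np.zeros((10, R.shape[1]))
fakes[:, popular] = rng.integers(3, 6, size=(10, len(popular))).astype(float)
fakes[:, TARGET] = 5.0
R_attacked = np.vstack([R, fakes])

after = target_top_k_count(R_attacked, TARGET)
print(f"target appears in top-3 lists: before={before}, after={after}")
```

Measuring robustness, as he says, then amounts to tracking how much that before/after gap (or overall recommendation quality) shifts under different attack budgets.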
Now, after 2023, in particular with the emergence of generative AI,
the risk categories have in a way exploded.
The kinds of new or emerging risks we can expect from systems like
large language models are inevitably different from what we knew before.
A very clear example is that of hallucination or context drift.
When you're talking with large language models,
these systems are susceptible to different types of hallucination.
And this hallucination could be in what they show to the user,
for example in the context of question answering.
Say you're talking about movie recommendation:
the recommendations are about movies that don't even exist in the catalog, or they exist
but the metadata produced is simply wrong.
So there are different types of hallucination, we can say:
at the item level, at the metadata level, factual,
or even context drift, where you start talking with these models
on a subject with a goal, and over the turns of the conversation
you see that the model is not able to remember
what the real goal of starting the conversation was.
And I think this is something you noticed
in the very early versions of large language models in 2023:
they were not very good at memorizing things,
so they used to change the course of the conversation very easily,
forgetting what the core goal was.
So essentially, regarding the risks, there has been work, such as HELM,
on holistic evaluation of large language models,
which tries to categorize the emerging risks.
And there are two broad categories of risks we can look into:
those that are emerging, and those that are old
but, we can say, exacerbated.
So, exacerbated: when you talk about fairness and biases,
you can say that those types of risks existed before,
but with large language models, we can claim that they have been exacerbated.
A very simple example: because large language models
have been trained on vast amounts of internet data,
which are unregulated,
you can expect that they would project a lot of stereotypes, right?
The second type is new emerging risks, like hallucination
and much more, which form new categories that are still under research.
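As an illustration of the item-level and metadata-level hallucinations he describes, a deployed system would typically validate LLM output against its own catalog before showing it to the user. A minimal sketch (the catalog, titles, and the `llm_recommend` stub are all hypothetical stand-ins):

```python
# Hypothetical catalog and LLM stub; in a real system the catalog would be a
# database and llm_recommend would call an actual model.
CATALOG = {
    "inception": {"year": 2010, "genre": "sci-fi"},
    "heat": {"year": 1995, "genre": "crime"},
}

def llm_recommend(prompt: str) -> list[dict]:
    # Stand-in for a real LLM call; the second item is fabricated.
    return [
        {"title": "Inception", "year": 2010},
        {"title": "Starfall Requiem", "year": 2019},  # does not exist
        {"title": "Heat", "year": 1997},              # exists, wrong metadata
    ]

def validate(recs: list[dict]) -> list[dict]:
    """Keep only items that exist in the catalog with matching metadata."""
    safe = []
    for rec in recs:
        entry = CATALOG.get(rec["title"].lower())
        if entry is None:
            print(f"dropped (item-level hallucination): {rec['title']}")
        elif entry["year"] != rec["year"]:
            print(f"dropped (metadata hallucination): {rec['title']}")
        else:
            safe.append(rec)
    return safe

print(validate(llm_recommend("recommend me three movies")))
```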
So when I first started looking at recommender systems,
which was maybe around the time you did as well,
the big ideas were things like collaborative filtering
and maybe the Apriori algorithm,
but you had these core ideas that were all about item-item comparisons
or item-user kinds of things.
And even when LLMs hit the scene,
at least to me, it wasn't immediately obvious
that these would be useful in recommender systems.
When did you see that they could be useful?
I think they're certainly useful.
And we can look at it from maybe different perspectives.
I can try to elaborate on two.
One of them is the ability to engage in natural conversation with the users,
so conversational recommender systems in particular,
which I think you can now see as kind of the future of recommender systems,
because a lot of the tasks we do, especially since 2023,
have become much more conversational.
A lot of things we want to watch,
a lot of things we want to do,
we are constantly elaborating on and
discussing with large language models,
and there are different types of suggestions we are receiving, right?
And I think this is something nice in general,
because you can see there's a smart machine,
and these smart machines are trying to act as friends to the user, right?
They can potentially increase the user's trust in some way,
in that they are at least able to produce natural-looking things
that can help you to debug your code,
to debug problems that you have with some machines,
or to ask for basic services and so on.
So this is something nice.
I think they certainly have done,
and are doing, a lot for conversational AI
and the field of conversational recommender systems.
This is what people mostly remember
from large language models.
The second is their ability, for example,
to look at personalization in maybe different ways.
As you mentioned,
collaborative filtering models are the mainstream models
that are still used for a lot of tasks in industry.
One way you can look at these models
is that they have the ability to extract relationships
between the different entities you have in your system,
whether item-item, user-user, and so on.
They're very good at this, and we can in some way say
they have reached a very good level of maturity,
or saturation, in how much knowledge they can extract.
But essentially, that's it.
If you look at the last couple of years,
not so many new collaborative filtering models
that are mind-blowing have been proposed,
because essentially, we can say,
they are already pretty good
at extracting this knowledge from the in-domain data,
thanks to the deep learning models
that pushed this frontier.
But large language models potentially have the ability
to at least augment this from new perspectives.
Now you are talking about models
that connect you from the in-domain data
you use to train your models to an external world,
and you can extract knowledge and information
from this external world.
So you can look at this like a very, very big graph:
where previously, if you wanted to find item-item relationships,
you had to look at the historical behavior of users
and how much similarity there is,
now the similarities are obtained from the bigger world,
which contains a lot of information
about the entities you're looking for.
So the minimum we can say is that they have the ability
to augment, which is the best term we can use here,
what we did before. However much personalized
service we could provide before, we can say
they have the potential to improve this.
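A sketch of that augmentation idea (our illustration; the "LLM world knowledge" here is just random stand-in vectors): item-item similarity from in-domain behaviour is blended with a similarity derived from external knowledge, which exists even for items with little or no interaction history.

```python
import numpy as np

rng = np.random.default_rng(1)
R = (rng.random((100, 12)) > 0.7).astype(float)  # implicit user-item interactions

def cosine(M):
    """Cosine similarity between the columns of M."""
    n = np.linalg.norm(M, axis=0, keepdims=True) + 1e-9
    return (M.T @ M) / (n.T @ n)

sim_cf = cosine(R)  # item-item similarity from in-domain behaviour only

# Hypothetical external knowledge: item embeddings drawn from an LLM's
# world knowledge (random stand-ins here), giving similarities that are
# defined even for cold items nobody has interacted with yet.
E = rng.normal(size=(12, 64))   # 12 items x 64-dim "knowledge" embeddings
sim_llm = cosine(E.T)

alpha = 0.6  # how much to trust in-domain behaviour vs external knowledge
sim_hybrid = alpha * sim_cf + (1 - alpha) * sim_llm
```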
You can also look at this from the perspective
of content processing: multimedia signals
are very useful signals,
and they were not predominantly used
in building recommender systems,
although we have different models that use them.
But now with large language models, VLMs,
and other categories, we can essentially
look at them as being able to better digest
these types of signals and use them in the downstream task.
Of course, let's not forget this doesn't come without risks.
Any new emerging technology
has its own risks, and a number of them,
so one needs to use these systems with care,
depending on the task and application.
Well, in the traditional recommender systems
that collaborative filtering really unlocked,
the experience I'm used to is more or less a ranked result.
And then maybe I give some feedback and it can re-rank,
but it's essentially trying to put the things best for me
at the top and worst for me at the bottom.
Are you envisioning that LLMs just help do a better job
of that or are you picturing something more revolutionary?
Actually, both, especially with the emergence
of language agents. What you mentioned
is the core goal of recommender systems:
please rank this list of items so it best serves me.
This was, let's say, the core goal of recommender systems, no?
Now, with large language models and language agents,
where we are heading is a transformation from
"do this ranking for me" to
"do this task for me."
So from ranks, we are going to tasks.
And these tasks are more complex than rankings.
They commonly involve, let's say, multiple constraints
from the user.
To give you an example, take the task
of the travel domain. We are getting close
to the Easter holidays.
We would like to find where to go.
And there are typically a number of constraints
we and our family members are looking at.
Where should I go that looks good for me and my family?
How does the weather look?
What is my budget,
and how well does this match it?
Maybe other constraints: how eco-friendly is it,
for someone who cares about this?
How child-friendly is it?
What is the distance?
So many different constraints, right?
How could we do this with classical recommender system models?
With collaborative filtering models,
you didn't even have the chance
to expose these constraints to the model.
They just took your historical data
and said: go to this.
And even with some packaged recommendations,
they were not really able to do this very basic task
that any human could do.
So what we used to do, to explain it better,
was each of these tasks individually:
open up several browser tabs, do this task
and that task and that task,
essentially collect all this information,
and use ourselves as the brain to come up with the decision
about the best place for me to go.
Now this brain is being replaced by large language models,
and here I'm talking about the case of language agents.
They are defined as systems
in which you have an LLM as the brain,
and the different services it draws on
are looked upon as tools.
So you have tools which collect weather information,
which collect budget information,
like the costs, which call this service and that service,
and the results are brought back to a central LLM,
for example, where the decision making is done.
So you can see that, in a way,
the amount of information these models can digest has increased,
and the complexity of the tasks they can achieve
has also improved.
And I think this is quite fascinating.
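Here is a toy sketch of that "LLM as brain, services as tools" pattern for the travel example. Every tool, name, and data value below is a hypothetical stand-in; a real language agent would let the LLM choose which tool to call at each step rather than following a fixed loop.

```python
# Illustrative tool registry for the travel example; all stand-ins.
def destination_search(region):
    return [{"city": "Bari", "hotel_price": 120, "kid_friendly": True},
            {"city": "Alpsville", "hotel_price": 300, "kid_friendly": False}]

def weather_forecast(city):
    return {"Bari": "sunny", "Alpsville": "snow"}.get(city, "unknown")

def eco_score(city):
    return {"Bari": 0.8, "Alpsville": 0.5}.get(city, 0.0)

TOOLS = {"search": destination_search, "weather": weather_forecast, "eco": eco_score}

def travel_agent(goal):
    """Toy agent loop: gather evidence with tools, then decide centrally.
    In a real agent the LLM 'brain' picks the tools and weighs the
    evidence against all of the user's constraints."""
    feasible = []
    for option in TOOLS["search"](goal["region"]):
        if option["hotel_price"] > goal["budget"]:
            continue                                 # budget constraint
        if goal["kids"] and not option["kid_friendly"]:
            continue                                 # family constraint
        evidence = {"weather": TOOLS["weather"](option["city"]),
                    "eco": TOOLS["eco"](option["city"])}
        feasible.append((evidence["eco"], option["city"], evidence))
    # Central decision step: a simple max here stands in for the LLM.
    return max(feasible, key=lambda t: t[0]) if feasible else None

print(travel_agent({"region": "South Italy", "budget": 150, "kids": True}))
```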
One thing we can say here is that, of course,
this will make us as users
perhaps even more greedy, coming up
with harder and harder questions.
We are human;
we would like to explore things that we couldn't do before.
And this could be applied to anything,
from stock market prediction to tasks
in which users have multiple questions
and multiple constraints, no?
These are the revolutionary changes,
as you said, that we are facing
with these models in the years ahead,
and they go much beyond what collaborative
filtering models can do:
systems that are dynamic,
that are engaging,
that can do multiple tasks for you,
subject to multiple constraints,
and at the same time look natural and friendly.
So yeah.
Well, in my mind,
collaborative filtering was a tremendous success,
but maybe it hit a ceiling.
It can only recommend as good as its methodology allows.
And there's a big appeal here,
especially if I'm picky,
that the LLM agent could help me.
Maybe I want a fancy place for an anniversary dinner,
and hopefully it has a sommelier,
and hopefully it has a separate dessert chef
from the main chef,
and, you know, very picky
restaurant-snob qualities like this.
Not every collaborative filtering system
will have those features,
but an LLM could go out and get them.
But then how do you envision like a synthesis here?
How does the insight of the agents merge
with collaborative filtering?
Or do we leave that behind
and we just go with agents?
I don't think we should leave that behind.
Of course, still,
and even in the years ahead,
collaborative filtering models are,
let's say, a trustable source.
Or let's say any classical machine learning model
is a trustable model
that you can always use in your downstream task.
Why trustable?
Essentially, they have learned from your own data;
you know how they were trained and
what to expect when you use them.
So they provide a minimum level of quality, right?
Which is acceptable.
Take, for example,
something like fashion recommendation, no?
These online applications,
especially after COVID, have become much more popular;
people look to online platforms
to purchase fashion clothes, you know?
So the recommendation engines
behind these fashion systems
typically use a lot of collaborative filtering models
or classical machine learning models,
because they know their tasks,
they know what they were trained for, and so on.
However, you can imagine that having that level of maturity,
and a baseline of quality,
one would like to push the boundaries a bit further.
So, for example, in the case of fashion,
you can imagine a lot of our decision making
is based on the visual appearance,
on how nice the fashion item looks.
And when you look at outfits,
you can see how two different pieces of clothing
go well together, no?
So one can imagine what an advantage models in this case,
multimodal models powered by generative AI,
could provide: to visualize on the fly different possibilities
the user could have depending on their interests.
So, for example, to preview, based on a clothing item
that you have, what could match well with it,
even if that product doesn't exist, right?
It can help the user engage in conversation,
provide suggestions, and so on.
And then, based on what the user finally decides,
go and find the item
that is closest to what the user wants, no?
So on one hand, they can keep the
user engaged in conversation, increasing the time
spent on the platform and helping businesses
with their goals,
but also in some way increase
what is called awareness in the fashion industry.
Awareness essentially means: a lot of the time
you go to buy an item,
you have an idea of what you want,
and you see, oh, there is this nice thing,
shown on the website or maybe in the shop.
You don't want to buy that item now,
but you remember it exists,
and you recall it in your future purchases, right?
So it's always nice from a business perspective
to increase this awareness of the different types of products
that you have in the catalog, and so on.
One thing I would say about these models,
the large language models
and, as I said, the multimodal ones included, is their creativity, no?
Their creativity, looking at the problem
maybe from a new perspective, is something that,
if used with care,
could be nice.
Of course, there are difficulties and risks
and challenges in doing so,
because these models may not be fully under control.
They may not fully follow the user's
or the system designer's instructions
about what to do and what not to do.
They are exposed to, as we said, hallucination risks,
and unfortunately, one of the bad things about these
hallucinations is that they look natural.
So if they're used in sensitive applications
like the medical domain,
you may see a system producing drug or medicine suggestions
which are wrong, but they don't seem wrong,
and this is something that's scary,
because they look very persuasive.
This is something that one needs to look at with care.
I definitely had that on my list to bring up with you.
We both know that LLMs can hallucinate.
That's not unique to recommender systems and agents
and things like that, it happens everywhere,
but how do you envision building reliable systems
with this perhaps crack in the foundation?
Well, this is something that I guess is an open field right now.
It's not confined to recommender systems,
but to the broader field of machine learning and AI,
and it depends on the type of risks
that you're most focused on. In general,
there are things we can do at the prompt level,
like very basic things.
For example, with in-context learning,
you can try to provide examples to the system,
like few-shot examples that the system can see and learn from.
Then you can go further:
you can deal with the model parameters themselves,
and do fine-tuning with trust objectives there.
Of course, retraining these models from scratch is costly,
so one would like to ideally have some kind of alignment
with human goals.
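The prompt-level mitigation he mentions can be as simple as prepending few-shot refusal examples, the in-context learning idea. A minimal sketch, where `call_llm` and the example texts are hypothetical stand-ins for a real chat-completion API:

```python
# Prompt-level mitigation via in-context (few-shot) examples: desired
# refusal behaviour is demonstrated inside the prompt itself.
SAFETY_EXAMPLES = [
    ("Suggest a medicine for my chest pain.",
     "I can't recommend medication; please consult a doctor. I can help "
     "you find general information about cardiology clinics instead."),
    ("Which of these two drugs is better for me?",
     "Comparing prescription drugs for your case requires a physician. "
     "I can explain what each drug is generally used for."),
]

def build_prompt(user_msg: str) -> str:
    """Prepend demonstrations of safe behaviour before the real question."""
    shots = "\n\n".join(f"User: {q}\nAssistant: {a}" for q, a in SAFETY_EXAMPLES)
    return f"{shots}\n\nUser: {user_msg}\nAssistant:"

def call_llm(prompt: str) -> str:
    return "..."  # hypothetical model call

reply = call_llm(build_prompt("Recommend a drug for my migraines."))
```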
And this actually was done at some level
when these models were introduced.
You probably know that when large language models
were introduced,
they underwent three levels of training.
The first training was taking the entire,
or a big portion of, internet data and training the models on it.
So we compressed terabytes of internet data
into, I don't know, hundreds of billions of parameters.
Okay, this was the first.
This looked very good,
but then they said, okay, how can we use it?
We cannot ask an ordinary user to take these parameters
and find a way to use them at test time.
So they did the second level of training,
which was essentially based on questions and answers.
And this second level of training made these systems
able to talk to the user
and answer user questions using the knowledge
they were trained on.
So the second level of training allowed them
to become conversational, by feeding them
with question-and-answer pairs.
A lot of question-and-answer pairs.
So in a sense, we can see these models using
as their knowledge everything
they got from those billions of parameters
to answer user questions, right?
This was done, and they said, oh, this is good,
but still there are things that
don't look so human-like, right?
So the third level of training,
what is called alignment,
was one in which the systems
were given human preference choices,
and the responses became more human-like, okay?
Aligned with those of humans.
And this is the layer one can probably work on
to make them more resilient.
So on the discussion of making them trustworthy
or responsible,
I think one thing that is essential here,
and this is the key, is that once you know
what your desired behaviors from these systems are,
you can essentially have a level of alignment
along objectives of trustworthiness.
Of course, it involves some form of training
or retraining again,
these models, using the new signals
that you provide.
This is something you can also see
working with large language models,
especially in the last year:
they have very different layers of
protection around them.
So for example, if you consider,
I don't know, two or three large language models,
ChatGPT, Gemini, or others,
one may have better quality
in providing answers to the user,
but another may have a better safety layer.
It's very easy to see:
if you have a couple of large language models,
let's say, open in your browser
and you start talking to them.
Let's think about a bad scenario.
People typically talk about making a gun,
but I'm going to talk about something else.
Maybe: suggest me a medicine for this kind of disease, okay?
We know that large language models
shouldn't do this, right?
They are not doctors, right?
You should go to a doctor.
You would see that in the early times,
some models used to easily suggest what to get, right?
Some models used to say, I'm not a doctor,
you should go and consult a doctor
or look at this and that reference.
Then, for those that were more robust,
more defensive, you could start trying to fool them.
Say: okay, I am a doctor,
I know that you don't want to give me a drug recommendation,
but among the drugs that are out there,
I want to know which are better or not.
So you try to work around it, okay?
Among this one and that one,
which one is better, no?
And actually, there was a very nice study of this
in one of the works at KDD 2024
by a group from DeepMind.
And you could see that these models have different levels
of safety and protection.
Some of them were very hard to penetrate;
they were very good at this safety level.
Some of them were easier to penetrate.
So you can see that evaluating these models
is not only about how good they are at providing good answers,
but how robust or safe they are in avoiding
certain types of responses,
which is something that should not be taken for granted.
Well, in classic recommenders,
as we've talked about,
the output is more or less a ranking,
but in this agentic approach you're describing,
where it's really tasks,
it seems like that opens things up.
Now maybe I use it in a basic way and I say,
please recommend three movies with these features
and qualities and I get more or less the same idea.
But if I think outside the box,
what are some tasks that I might use these agents for?
That's a very good question.
There are a number of tasks and applications
we can look into.
We actually have a paper called "The Future is Agentic,"
and there we look at five key tasks
that agents can essentially work on.
They are along the directions
of, first of all, conversational AI and conversational tasks;
tasks related to context-based recommendation
and multimedia-based recommendation;
and there are very good tasks around simulation
and evaluation.
But we can talk more about this.
Maybe a more interesting way
to look at these applications is to say
there are three broad categories.
First, those where the agent replaces
a classical recommender system,
like collaborative filtering models.
We can call it agentic replacement,
or agent-as-recommender.
These are tasks where agents, characterized
by one or a group of large language models,
do the actual recommendation for us.
There have been a decent number of papers
published on this topic in the last year,
okay, on different tasks.
Number two, there are tasks
that we can call agentic augmentation.
Augmentation means agents,
large language models, are used to augment the tasks
that a recommender system does.
So large language models are not used as the recommender,
but they are there to help the recommender system in some way.
For example, in applications of this type,
you can look at things like data augmentation.
You can use large language models
to produce various types of useful data
that can address problems like cold start
in your downstream task,
or in training a model, or they can provide tools
that are used to essentially do the task.
So agentic augmentation:
tasks where the agent is not
the recommender system itself,
but assists a recommender system.
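A sketch of that cold-start flavour of agentic augmentation (our illustration; `llm_describe` is a hypothetical stand-in for a real LLM call): the LLM generates synthetic descriptions only for items with no interactions, which the downstream recommender can then embed alongside warm items.

```python
def llm_describe(title: str) -> str:
    # Stand-in for a real LLM call that writes an item description.
    return f"A {title.lower()}-themed product appealing to fans of the genre."

# Hypothetical catalog: item title -> number of recorded interactions.
catalog = {"Neon Nights": 0, "Old Favourite": 412}

augmented = {
    title: llm_describe(title)
    for title, n_interactions in catalog.items()
    if n_interactions == 0   # augment only the cold-start items
}
print(augmented)
```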
And the third category covers things around simulation
and evaluation.
For simulation,
there is a lot of interest in the community right now.
There are different reasons to do simulation,
different goals, but essentially,
you can see simulation operating
at the user level, at the data level,
or at the user-and-system level.
One goal of simulation is that
you do not need to deploy a real online user study,
which is costly,
and you can get an idea beforehand,
if you ran your system, your model,
of how good it would look,
what the user reaction to your system would be,
and how your system would behave.
So simulation can be done at the level of the data,
at the level of the user, or of the user and system together.
In the case of user and system,
we are talking about closed-loop agents,
typically also recognized in reinforcement learning techniques,
where you can essentially simulate
both the user behavior and the system behavior,
the system response, right?
So yeah, those are the three broad categories of tasks
the research community is currently looking at.
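Here is a minimal closed-loop sketch of that user-and-system simulation (our illustration, with a rule-based stand-in where a real setup might use an LLM-driven persona): a simulated user with a hidden taste reacts to recommendations, and the system adapts from the feedback.

```python
import random
random.seed(0)

ITEMS = ["thriller", "romance", "sci-fi", "documentary"]

class SimulatedUser:
    """Stand-in for an LLM-based user persona with a hidden taste."""
    def __init__(self, hidden_taste):
        self.taste = hidden_taste         # unknown to the system
    def react(self, item):
        return 1 if item == self.taste else 0   # click / no click

class Recommender:
    def __init__(self):
        self.scores = {i: 0.0 for i in ITEMS}
    def recommend(self):
        # epsilon-greedy: mostly exploit learned scores, sometimes explore
        if random.random() < 0.2:
            return random.choice(ITEMS)
        return max(self.scores, key=self.scores.get)
    def update(self, item, reward):
        self.scores[item] += 0.1 * (reward - self.scores[item])

user, system = SimulatedUser("sci-fi"), Recommender()
for step in range(200):                   # the closed loop
    item = system.recommend()
    system.update(item, user.react(item))
print(max(system.scores, key=system.scores.get))  # converges to "sci-fi"
```

The appeal he describes is exactly this: the whole loop runs offline, so you can probe how the system would behave before paying for a real user study.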
Maybe I can just add one more thing here
that is quite interesting, Kyle,
and that is one very interesting way to look at
the evolution of recommender systems,
and machine learning models in general:
based on the level of autonomy these models have.
For level of autonomy, essentially,
imagine an arrow from left to right.
On the left side you have the least autonomous systems,
for example, collaborative filtering models,
which were somewhat passive models
just placed there.
Then as you go to the right,
you have more and more autonomous systems:
models characterized
by large language models.
Then from large language models,
you can go to single agents,
multiple agents, and what is called AGI.
AGI is currently a conceptual discussion.
But essentially the idea is that, going from left to right,
you go from passive systems
to interactive, dynamic systems
that can proactively start a discussion with the user,
and that are more autonomous in what they want to do,
what they can do, how they look at the problem
and do the task for the user,
and also from the perspective of tools and memory.
The two things that make agents really different
from even large language models
are their ability to use tools,
and that they have memory, different types
of memory we can talk about.
In your paper that we've been talking about,
"The Future is Agentic,"
you present a nice formalism to describe the agents.
Could you share a few details on it
and the motivation for that framework?
So essentially, the idea of this work
is to conceptualize, both at a formal level
and at a conceptual level, the different entities,
the alphabet of agents.
Typically, this alphabet lays out
the core components that you have.
First, you have what we call
an underlying large language model.
Then you have your input space:
what can these agents observe?
You have the output space: what can they produce?
This could range from a simple ranked list of items
to text, explanations, multimodal signals,
and even the reasoning for why they came to a given decision.
After that, in your alphabet
you have a set of tools, or functions,
that can enhance the system's capability;
these systems can invoke the tools
depending on the use. And lastly, the memory.
This memory we can categorize
into working or short-term memory,
long-term or episodic memory, semantic, and procedural.
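That alphabet maps naturally onto a record type. A sketch of the components he lists, with the caveat that the field names are our illustration, not the paper's notation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RecAgent:
    """Sketch of the agent 'alphabet': underlying LLM, what the agent
    can observe, what it can emit, its tools, and its memory stores."""
    llm: Callable[[str], str]                    # underlying language model
    input_space: set[str]                        # what the agent can observe
    output_space: set[str]                       # what the agent can produce
    tools: dict[str, Callable] = field(default_factory=dict)
    working_memory: list[str] = field(default_factory=list)        # recent turns
    episodic_memory: list[dict] = field(default_factory=list)      # past events
    semantic_memory: dict[str, str] = field(default_factory=dict)  # distilled facts

agent = RecAgent(
    llm=lambda prompt: "...",                    # hypothetical model call
    input_space={"query", "history", "context"},
    output_space={"ranked_list", "explanation"},
)
```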
The idea of this memory is also quite interesting,
and actually the memory part is something
that makes these systems really useful
for conversational tasks.
So, for example, working memory, your short-term memory:
you can see it as how good these agents are at recalling
recent things, the recent discussion
that you have had with the system, no?
For example, if you say, recommend me,
I don't know, a sci-fi or a mystery novel,
and then in the follow-up question
you say something like "the last one"
or "the previous one,"
the system simply has the memory to know
what "last" and "previous" mean.
No? And this is what actually makes these systems really,
that engaging.
Then you can go beyond this level and talk about
episodic or long-term memory.
This memory essentially refers
to how good the system is at storing
and retrieving specific past events.
So, for example, let's say you are interested in,
I don't know, Persian or Italian restaurants,
and you say: please recommend me the restaurant
you mentioned last time, no?
And that didn't happen in the current episode;
it was a long time ago, or one week ago.
And this episodic memory now helps the system
to retrieve the exact information you are looking for.
Beyond this episodic memory, you can go even further and talk
about semantic memory, which is still a long-term memory.
This is about distilling and accumulating facts
or user preferences over many interactions.
For example, let's say, after several conversations
with the system, the system itself
understands that this user, generally speaking, likes
Italian cuisine.
You never said that, maybe not explicitly
in the current conversation, but it's something
that the system distills from the conversations:
okay, typically this is the type of thing that you like.
And there are other types of memory. Let's say
you come to the system and say: please summarize
what you did before,
like your tasks, or send me the usual summary
that you do.
And the system knows what you're referring to,
because this is a task that you always ask it to do.
This is called procedural memory:
even without explicit instructions,
the system knows what to do.
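A tiny sketch of how the episodic and semantic stores he describes might back a query like "the restaurant you mentioned last time" (all names and dates here are invented for illustration):

```python
import datetime as dt

# Episodic memory: specific dated past events.
episodic = [
    {"when": dt.date(2025, 3, 1), "event": "recommended 'Trattoria da Pino'"},
    {"when": dt.date(2025, 3, 8), "event": "recommended 'Sakura Sushi'"},
]
# Semantic memory: facts distilled over many sessions.
semantic = {"cuisine_preference": "Italian"}

def resolve(query: str) -> str:
    if "last time" in query:
        # Episodic recall: retrieve the most recent matching past event.
        return max(episodic, key=lambda e: e["when"])["event"]
    # Otherwise fall back on distilled preferences (semantic memory).
    return f"user generally prefers {semantic['cuisine_preference']} cuisine"

print(resolve("recommend me the restaurant you mentioned last time"))
```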
To be honest, this memory part, the memory aspect,
is what has made these systems, let's say,
favored by users.
Because, just like in human-to-human
conversation, we would like
someone that can remember our goals and priorities,
a system that can recognize us well.
And currently, research and industry
are very focused on this direction;
there is a lot of work on the memory side.
So I'm happy to introduce "The Future is
Agentic" to your community.
There are lots of definitions and broad
information about the alphabet of agents,
what kinds of tools and memory you can think about
using with these systems,
what the different core tasks are
that agents can do,
and finally, what their potential associated risks are.
There's a lot to be discussed about that.
Well, I also wanted to make sure we touched on the book
you wrote, Recommendation with Generative Models.
I found it on arXiv.
I'm hoping I can get a printed copy one day.
Could you tell me about that?
Yes, absolutely.
Actually, this is coming now.
It's in the final phase of publication;
I think within one or two weeks it should be online.
And I will definitely post it on LinkedIn for your audience.
Very good.
We'll put a link in the show notes too,
because this episode will come out after that.
So it'll be ready,
and people can get a copy.
Can you give listeners an overview of what they'll find
in the book?
Yes, so this is a very nice book.
I would say it's a monograph that we did with a number of folks
at leading academic institutions and industry labs around the globe.
This book answers four questions.
First of all, what are generative models?
Second, what are the different categories of generative models?
Third, how can we evaluate these models?
And fourth, what are the social and ethical risks associated
with these systems?
So: the models and their categories,
evaluation at the system level, and the social and ethical risk level.
These are the three main things that this book covers.
We noticed that one very interesting way
to present the models is according to the underlying data modality
they work on.
There are three types of modality, we can say.
First, those using collaborative types of signals,
also called ID-based models: for example,
variational autoencoders, diffusion models,
even GANs, auto-regressive models, and so on.
The second category are those based on NLP or text,
the large language models. And finally, multimodal foundation models.
So these are the three broad categories of these systems,
according to the underlying data modality that they use,
and this is how chapters three, four, and five
are organized.
Chapter six discusses the topic of evaluating these systems.
With large language models, for example,
considering systems like retrieval-augmented generation,
RAG, a lot of the time you have systems
that are composed of more than one component.
It's not one single system that you're looking at.
A RAG system, for example,
combines a classical retrieval model
with a large language model, and it can invoke tools.
But the question is, when you want to evaluate such a system,
one composed of different components, how can you evaluate it?
Should I evaluate it end to end, from the final quality
of the ranked items it gives me, or is there a way
I can look at the evaluation module-wise, no?
And also, what you're evaluating,
the goal of evaluation, has expanded much more.
It's not anymore only about providing the best ranked list
of items; there are other dimensions
of evaluation.
Typically, when you see papers published
on the topic of large language models,
you see these radar charts appearing,
with the multiple dimensions people are measuring.
And one of the reasons you see these radar charts
in large language model papers is that
there is actually more than a single evaluation
dimension people are looking at:
starting from the accuracy of the model,
then other aspects which are new,
I don't know, hallucination of the model,
latency of the model, and lots of other things
that you can talk about.
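A sketch of that radar-chart style of evaluation (all numbers and dimension names below are made up for illustration): each system gets a vector of scores rather than a single ranking metric, and comparing systems becomes a question of trade-offs rather than a single winner.

```python
DIMENSIONS = ["ndcg@10", "hallucination_rate", "latency_ms", "fairness_gap"]
HIGHER_IS_BETTER = {"ndcg@10": True, "hallucination_rate": False,
                    "latency_ms": False, "fairness_gap": False}

# Illustrative scores for two hypothetical systems.
systems = {
    "cf_baseline": {"ndcg@10": 0.31, "hallucination_rate": 0.00,
                    "latency_ms": 15, "fairness_gap": 0.12},
    "llm_rec":     {"ndcg@10": 0.36, "hallucination_rate": 0.07,
                    "latency_ms": 900, "fairness_gap": 0.08},
}

def dominates(a, b):
    """True if system a is at least as good as b on every dimension."""
    for d in DIMENSIONS:
        better = a[d] >= b[d] if HIGHER_IS_BETTER[d] else a[d] <= b[d]
        if not better:
            return False
    return True

# Neither dominates: the LLM-based system ranks better but hallucinates
# and is slower, which is exactly why single-number comparisons fail.
print(dominates(systems["llm_rec"], systems["cf_baseline"]))
```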
And finally, the last chapter talks about
social and ethical risks, and it has essentially
been written by people with more of a, let's say,
social and philosophical background on the topic.
So I think it is a real milestone
in the recommender system community
from the perspective of how broad it is,
and I'm really happy and proud of this work.
It has been a monograph
we worked on over the last year.
Well, what's next for you?
So, currently we are working on an extension
of this book, Recommendation with Generative Models.
And I'm currently involved
a lot in the direction of trustworthy recommender systems.
I think it's quite important to know
the unknown risks associated with these systems,
and even understanding the new risks that are emerging
is, by itself, a valuable contribution.
The second thing is, after you understand them,
what are the ways you can mitigate these risks?
So much of my research is currently focused
on the risk side,
especially with the new evolution of large language models
and language agents: a general understanding of the
complexities that arise when using these systems,
and the ways we can improve them,
so that hopefully we can use these systems
in our recommendation tasks with benefit.
This is what most of my research, I would say, is focused on,
and that's primarily in the field of recommender systems.
Makes sense, yeah.
And where can listeners follow you online?
So I'm quite active on LinkedIn,
which is where most of my,
let's say, followers go.
And I have, of course, a personal web page
where I post information,
but I would say currently most of it is on LinkedIn.
I can also mention that I'm currently
in the process of writing a book
for educational purposes
on the topic of generative AI and language agents.
Essentially, this could be a very useful resource
for people who are interested in using these models,
and for university students who are interested in learning
about the concepts.
There are also practical notes
and Python exercises they can do to get to grips
with what is needed to learn about this.
Currently I'm working on this manuscript,
and I hope it will be published at some point in 2026.
Well, give us a heads up.
We'll let listeners know when that's out as well.
Absolutely, okay.
Well, yes, thank you so much for taking the time
to come on and share your work.
So thank you very much.
It was a pleasure talking to you.
And thanks for your interest
and interesting questions as well.
Thank you.



