
Hosted on Acast. See acast.com/privacy for more information.
This episode is brought to you by Capital One. Capital One's tech team isn't just talking about multi-agentic AI. They already deployed one. It's called Chat Concierge, and it's simplifying car shopping using self-reflection and layered reasoning with live API checks. It doesn't just help buyers find a car they love. It helps schedule a test drive, get pre-approved for financing, and estimate trade-in value. Advanced, intuitive, and deployed. That's how they stack. That's technology at Capital One.
Howdy, howdy ho, and welcome to Fantasy Fanfellas. I'm Hayden, producer of the Fantasy Fangirls podcast and your resident lover of all things Sanderson. And I'm Steven, your bookish internet goofball, but you can call me the Smash Daddy. And we are currently deep diving Brandon Sanderson's fantasy epic Mistborn, but here's the catch: Steven here has not read Mistborn before. That's right. Hey, hey. So each week you'll get my unfiltered, raw reactions to every single chapter, and along the way we'll do character deep dives, magic explainers, and Steven will even try to guess what's next. Spoiler alert: he'll be wrong. News flash: I'm never wrong. Episodes come out every Wednesday, and you can find Fantasy Fanfellas wherever you get your podcasts.
Well, hello, this is Android Faithful here with a holiday special for you. Did you not have enough green this week on St. Patrick's Day? Well, guess what, we got some extra green on the inside for you, folks.
I'm Huyen Tue Dao, and I'm Ron Richards. We're back. We didn't get enough this week, did we?
We did not. We need more.
No, but exactly, you're making up for lost time.
No, but we're super, super excited, because we're here for a special bonus episode. An extra episode this week! Because Google came to us and said, hey, you guys want to have a conversation with a Googler? And we're like, of course we do, right? And especially when it's one of our favorites, right? One of our favorites. Yes. Very, very good friend of the show, Matthew McCullough, VP of Product, Android Developer Experience. Three words, which of course I appreciate very much.
Yeah, no. So recently we covered it on the show, when they announced, I think about a week and a half ago or so, when the Android developer team rolled out the new Android Bench, along with those LLM rankings. And we joked when you were on the show, and Jason poked fun at it, because those LLM rankings had Gemini at the top, and we're like, oh, of course Gemini is at the top. But actually, as we dug into the information, well, at first I just thought, this is like another "oh, developer docs and how to use AI" kind of stuff. But as I dug into it, I'm like, oh, they really did a lot of work here in terms of guiding development in the proper use of AI and selecting the right tools. And I was really curious, Huyen, about your perspective as a developer. When that announcement rolled out, what was your takeaway from it?
Yeah, I mean, if you've been following me on DTS and here, you know that I'm not, let's say, I am not the biggest, in the general sense, AI stan in the room. I've been pretty critical about it, and I've definitely railed about stuff every day. So of course when I saw it, I have to admit, I tried very hard to pay attention, especially since, you know, I do know people personally that are working on it. But there's a lot going wrong along with a lot of innovation happening right now. So I have to admit, when I first saw it, I was like, okay, what's this? And I have to admit, especially after reading about it,
And especially, you know, talking to Matthew, this feels, you know, very Android-y. It feels like Android is listening. It feels like the team is listening. It feels like it's taking our feedback. It feels like they're trying to do something to address real problems and real feelings and real concerns about, you know, how do we integrate this new technology into our lives in a successful, but also, like, the best way possible for both products and human beings.
So yeah, I was a little skeptical at first, and certainly, yeah, it was kind of funny that Gemini was at the top. I did kind of go, oh, that's kind of funny. But I will say that, while I'm still, I wouldn't say a bear, I can be pretty bearish about this stuff, I do have a lot of respect for this project.
So yeah, I'm glad to hear that and your perspective. And so when Google reached out and said, hey, do you want to talk to Matthew? Of course the first answer is yes, because we just like him. But then the second answer was like, yeah, this is an opportunity to really understand, like, why are they doing what they're doing? What is the motivation behind it? What is the approach to it? And then, you know, honestly, I was like, you know, how did Gemini become first? I want to understand that. And so we definitely, like we often do on Android Faithful, we don't pull any punches. We're gonna ask the tough questions. So if you listen to the interview, we do ask Matthew, how did Gemini, why is it on top? And he gave a great answer, as well as giving a ton of context behind the project and the approach and the dedication that they're bringing to it. And again, I think the Android development team is doing a great job in terms of, you know, really building resources for developers and trying to build a best-in-class kind of approach for the platform. And I can listen to him talk about it for hours, you know, it's so much fun. So yeah, enough of that. Let's get right into it.
We hope you enjoy this special bonus episode of Android Faithful, and this conversation about Android Bench and the LLM ranking system. Just enjoy hanging out with Matthew McCullough for half an hour. And so we'll be back on Tuesday. Well, Huyen, we'll miss you. You'll be gone, but Flo and Jason and I will be back on Tuesday. And yeah, enjoy this interview with Matthew.
Well, welcome back to Android Faithful, Matthew McCullough. It's great to see you again.
Good to see you, Huyen, Ron. It's always wonderful to talk to Android Faithful, whether it's the official crew or the ones that I meet when I'm traveling around.
Oh, and I'm both of those, right? So.
Yes, you're two people.
Well, Matthew, you are a returning champion. It's always good to have you back on the show. And this time around, we're talking about all the exciting news that recently came out earlier this month around Android Bench and the work that you guys are doing with LLMs and AI and all this exciting stuff. We covered it on the show when it was announced, but then we had the opportunity, you wanted to come on and talk more about it, and we of course jumped at the opportunity, just to hang out with you. But we had a whole bunch of questions for you, so let's get right into it.
So the first thing I thought of that I wanted to hear from you: before Android Bench existed, how were Android developers actually making decisions about which AI model to use, at least from your perspective? And what problem did that create that you guys thought Google really needed to solve? Like, why do this?
Well, I think we're all devs here, and part of this is, no hand raising today for sure, but, you know, we like to tell ourselves that we're rigorous and we use metrics and we've got, you know, a stack rank of our choices. And then sometimes we get busy, and the answer becomes whatever's at hand, or whatever we have closest at hand. And we really started to see that happen, or people just reaching for the thing that was already corporately subscribed, or whatever they'd used most recently. But that doesn't necessarily mean that it was the very best thing for their Android development work. And so what we decided, on a quest sponsored by our leads in the organization, is to give people a way to rigorously and repeatably quantify which one gives them the most benefit.
That makes a lot of sense. I can concur, engineering tends to be very data-driven, so it makes a lot of sense to have a tool that allows you to analyze and quantify what might make one thing better than another.
So the next thing I want to ask you, which is honestly, especially as an Android developer myself, a question very near and dear to my heart. You know, there are already plenty of general coding benchmarks out there. We're like several years deep now into this new advent of AI as part of our workflow, and as part of the broader culture, and there are already general coding benchmarks out there. But what specifically about Android development made the existing coding benchmarks fall short? And can you give us a concrete example of some gap that they just didn't quite cover?
I can indeed. I think they've really helped, so to anyone who's contributed to or built one of those existing benchmarks, I have nothing but appreciation. They've driven an industry that I love forward and made it more helpful. But much as we know that there's a difference between coding for back end and front end and full stack, you know, the phrases from the last decade or so, there's definitely a difference in coding for JavaScript/TypeScript/React-type stacks versus native Android development. And there are so many benefits, you know, all the innovation that we're bringing to the Android platform, even with Android 17. We want to make sure that all of that is making it into the hands of developers who are using AI tools for their coding. Not just some sort of last year's tech, or three-year-old tech, but specifically the newest stuff.
And so, you know, there's everything from, I could enumerate a very long list, Huyen, but it could be, you know, Compose, it could be Kotlin, making sure we're using the latest language constructs. It's making sure that we're using the latest libraries, or even performance techniques. You're seeing posts all the time from the DREs in our organization, many of your friends. I see them when we're at conferences; they're always posting tips and tricks. And we don't want the lowest common denominator, or "mid", I think that's what the cool kids say. We literally want best practices, best architecture, best libraries, best frameworks, latest versions. And if you want all of that, it's quite the shopping list. I think most of the others are going to tell you, you've got to go do this yourself. So we built on the best. We got inspired by the ones that are already there. But this is unabashedly Android native development boosting. Full stop.
I just want to comment real quick that that sounds very encouraging to me, because, and I think this was a bit ago, but I know one tip I heard once was, well, you know, one of the best ways to optimize is to use older architectures that, you know, LLMs might be familiar with. Which is understandable, but as you say, that's not, I mean, not to hate, not to judge, but that's not how we roll in Android. And that ability to stay adaptable and flexible and take advantage of all of the new ideas coming out from your team, as well as from people in the community, that's so important. So thank you for saying that.
And in some ways it's kind of funny, because what we don't want to do, but it can happen, is essentially to be beholden to the automation, like, "well, the automation prefers this, and so we're gonna do that." Sorry, we want to be best, latest, greatest, most innovative. It needs to serve us. So to some degree, maybe that's the subtitled mission here: making sure that the automation is serving what we aspire to do.
Well, that's really interesting, and it's great, because, you know, you guys are in the weeds on the development side. And I have so much respect for you, Matthew, and your team, and Huyen, and the work you do. And, you know, I always joke that I'm not a developer, but I know enough to get me in trouble, and I've found myself on a career path where I'm doing product management and working with developers and needing to walk the walk. And it's really interesting to hear you say that. But the first thing I think of is, okay, well, if you're doing it in that sense, walk us through how you actually do that. You know, like, if you're sourcing real issues from GitHub repositories to do this, how do you ensure the tasks are genuinely testing Android expertise rather than just general coding ability? Like, how do you make sure that it is serving your needs versus just general development?
Well, I like to draw the illustration of a sales funnel, to some degree, if anybody's worked in a place where they've had to deal with that, where you've got a lot of options at the top, right? You know, what are we going after, what are we reaching? And then we start making, hopefully, very intelligent decisions to come down to the set that makes a lot of sense for us. Let's just say, with the amount of Android code that's on GitHub, the top is very wide, and we finished with a hundred curated tasks. So the steepness of that, I don't know what the slope of that line is, but it's very, very steep, and a lot of energy and a lot of decisions went into choosing those. And they're coupled with, you know, rubrics that we used for this work.
One, it needed to represent best practices. We covered that a few minutes ago, so that one's already in there. Second, it needed to be modern approaches. We're not looking to drive classic approaches as much. I mean, those are great, but we're looking to drive modern approaches. So Compose, for example: Compose only, for the approaches that we've got here. And then third, they needed to be looked at by an expert to make sure that these are quality changes, because when you think about it, ultimately what we're doing is helping train these systems on what great looks like. And I want to emphasize that word. Not what's mediocre, not what compiles. That can probably already be done. What great looks like. I want to set the bar extremely high.
Let's just say there were a lot of stressed people when you set the bar at great. There are more stressed faces at times, because it means reviewing every single pull request, every single one of the tests, line by line, trajectory by trajectory, change by change. So, you know, somebody said, and I get it, they're like, "a hundred? I can crank that out in no time." But I feel like these are like a hundred Fabergé eggs, so maybe use that kind of mental model. A hundred is a lot in that particular case. And this was done by GDEs, by SWEs on the team, by product managers, by folks on the DRE team, and a couple of outside consultants that we also used, just to get all these different lenses. And all of these tests were examined by that expert set.
Fascinating.
Yeah. I mean, especially given, and it is really encouraging, that you sourced, you know, issues from GitHub repositories. So if you're watching this and somehow not familiar: you know, I think in the Android community, one of the things that we have built an identity on, and are most proud of, is our wide network of open source solutions that this entire community has not just contributed to, but that have become foundational to a lot of the work we do. So it makes a lot of sense, because we get our best practices, our best ideas of what is great, from those projects. I think it made a lot of sense.
As someone who, for better or for worse, probably for better, I should say, has worked on a lot of large, large-scale projects in enterprise environments, where, you know, especially given it's proprietary, we unfortunately can't always talk about or show the practices that we use, because of those more enterprise, hugely scaled projects, or, you know, even just the unique challenges of being on a large-scale, massive project with a lot of different gears working and a lot of different, more enterprise-y concerns: can you give us any thoughts on how well Android Bench still reflects that? Given that, you know, sometimes enterprise is a bit different than open source.
It's a difficult task. I think we did a really good job on it, but it's extremely hard. You know, we already talked about the steepness of the funnel, but one of the other elements, since you asked, Huyen, about this, is also looking for at least 30 or so tasks, we got to 29, that had large quantities of code change, and I mean a hundred lines and up. And the reason is, it comes back again to this example of: well, there was already SWE-bench and other things like that. It was able to emit Kotlin code. You could sometimes coax some of the leading models to emit Compose, but not necessarily in the volume that we're looking for, for that enterprise or professional work. So, exactly on your question, we need it to do more for us. And I think for professional developers to really feel the benefit, where they're saying, like, "oh my goodness, this is transforming the work day, it's helping me be more productive," it's got to be larger quantities of code. So that was a specific lens that we used in some of it. And so, just to re-emphasize, there are 29, if I remember correctly, that have at least a hundred lines of code change. And so you're talking about big changes in bigger code bases.
And I think we might have time to get to it to talk about version two
We'll see if I get in trouble for sneak peeks
But for version one, it's all open source and so there's also this tension that these are not
Necessarily commercial projects like you might have been contracted or employed to work on in the past
And we have some
Has coming up to even add that to future versions
So I'm already proud of where we got for this for the answer
But maybe it's usual Matthew style. I'm never fully satisfied. So we're gonna try to do more
Okay, oh, okay, sorry. Go ahead, Ron.
I mean, I'm gonna say, in the world of technology development, the work's never really done, right? I mean, there's always, the first release is the first release, so then you build from there, right? So 1.0 is just any given Friday, and probably nothing more.
I will say, after covering this stuff week in and week out, we have been commenting, I don't know if you listen to the show, we have been commenting on the relentless pace you guys have put us on in terms of covering the output. So I'm not surprised to hear that.
Of course, our job is to make your job difficult by the relentless pace of what we put out.
It has been nonstop.
Well, real quickly, one quick question for you. You know, similarly to that question, Room and Hilt are, you know, kind of the designated standard proposed by Google coming out of this, but many projects actually use alternates. You know, many developers use alternate storage and DI frameworks. How much should teams factor that in when using Android Bench as a reference?
You know, this is a place where we're a little bit more thoughtful, and the thinking may evolve here, so I'm gonna give you the current mode of thinking, but it may evolve over time. I think something that, for the last four years, you know, my team has supported me on, and that I've had as kind of a core vision for Android, is that there's a balance between being opinionated and having options. And let me just tease that apart a little bit. Android is the land of tons of options. You can usually find two, three, four, five, maybe ten different options for a library that suits a particular need: language parsing, image parsing, animations, and the like. I think that is part of what makes it vibrant and attractive to developers at times. But when they're on their learning journey, or they just need to get it done because they're under pressure, there's also a nice desire to be helpful with a "we've tried this, we've used this one, it looks good." And that is a very difficult balance. So: openness, so that you can make a choice, but a little bit of opinionation, so you can actually say, what if y'all used what seems to be working for the industry?
So we've done that balance in this. We have not taken any strong stance in the selection of tasks for library preference, for maker or author or foundation. That one we've largely, kind of, gently put to the side, and looked more at architecture and top-level choices. With the exception, if you want to call it a library, of Compose. I believe in that so religiously that you can't unsell me on all the, you know, miraculous benefits of Compose. But aside from that, we did not use library choice as a filtering concern.
Cool, all right. So one of the big things coming out of this announcement was the kind of grading of LLMs, and the benchmarks and all that sort of stuff helping people guide their AI. One criticism of public benchmarks is basically that models can cheat by training on test data. How are you guys guarding against that with Android Bench, and how are you going to stay ahead of that as those models evolve?
You can take a couple of approaches, so we'll peel it apart in three parts, I think. Number one: developers in general, and Ron, for this point, you're in the developer community now that you're an AI-enhanced product manager. You're on team developers, no more opting out. And so I think, for all of those folks, you know, you're thinking about the fact that with software development, these models are definitely training on all the open materials that are properly licensed to be able to do so. But, you know, in general, with open source, and I think, Huyen, I'll come back to yours, you know, when we're learning, we'll copy and paste something, and we may not have always followed the license there to the T. And so there's just risk in industry that open source code gets transmogrified from one code base to another. And essentially, we've designed for the fact, and expect, that this benchmark will eventually find its way into some of the models that are out there. And I just want to make it completely clear: we expect that to happen over time. It's just the nature of the industry.
But the second piece that I wanted to bring to this is that we're already planning, it's a little bit that joke of, you know, version 1.0 is any given Friday. Well, we already started planning, and were working on 2.0 before 1.0 of this benchmark came out. And what you'll see is we'll evolve, using some of the wisdom and best practices of SWE-bench and SWE-bench Pro, if you're familiar with those two. They have, effectively, a second wave of more closed evals that supplement the first, a yin and yang, if you will, to some degree, and we're going to go that direction with some of the additional elements that we add to 2.0. That also plays to your earlier question, and I didn't know we were going to go this route, but the end of this is the enterprise approach: we may be able to have some licensed code bases that we're able to use in evals but may not be able to make open source. So they're going to contribute to the wisdom, the quality, of what the models are able to do, but they won't necessarily be code bases that are out in the open. And so what you've got is, for the first round, you can test it yourself, evaluate it yourself, look at every single line of what we've selected. So from a trustworthiness standpoint, we gave it all to you. It's all in the open. And then we've booked the ability, for v2, so that we can kind of stretch our arms a little bit, to incorporate some code that comes from other sources that we may license, to make this even stronger.
Okay, well, picking back up off of that idea: you know, like any good party that lets you bring your own beverages, Android Studio's Otter 3 launch brought "bring your own model" back in January. Which, if you're not aware, lets developers plug in Claude, GPT, and other models. And so how does Android Bench complete that picture? Is it essentially like, you know, a buying guide that comes with the open marketplace?
Definitely not a buying guide. And on top of it, you know, these are changing every single week. I mean, if you look at just the relentless pace of model releases, I think that, you know, who's on top, who's in second place, who's fifth, could change week to week to week, and probably does. There are new model releases, and there are improvements to existing ones. But I think even more important is the philosophy behind this, and that it's not just a buying guide. We have two goals for this. One, to be able to, as we began the show with, I think, give developers a way to more empirically make choices about what applies to which scenario. I mean, models cost differently depending on whether you're using the pros or ultras or some of the nano, light, you know, kind of super-efficient bits. And there are certain scenarios, like maybe just code completion, where the light models make sense. But if you're refactoring the most important app in your company, and you are re-architecting everything about it and want it to run, I feel like that's the time to choose the thing that is currently at the top of the list. That's where to pay those dollars for the tokens. So this is not just a "which one", one and done. You could almost say this helps you make discrete choices for the scenario, the use case, the automation, the volume that you're working on. So you may choose farther down the list for cost efficiency, because it gets you what you need.
And then very lastly: I have had the most delightful times. This is a rare privilege of a hat to be able to wear at Google. I've been able to meet with lots of the large model makers and build really good relationships with their scientists and their researchers. And the great part is that everyone that I spoke to is on team Android, even if they don't wear a Google-branded, you know, propeller hat, to some degree. And so I think what's super exciting to me is this means that every major model maker is pulling for Android to do better. And it just feels so, so like the Android vibe and culture in that way.
We love that vibe. We love that culture.
So I've got to address the elephant in the room. You know, we love the approach, we love the whole thing. But looking at the leaderboard, Gemini's on top, and some people might raise an eyebrow at the fact that Google's both running the benchmark and saying that Google's AI model comes in first. How are you guys maintaining credibility and trust in those rankings over time? And some skeptics might wonder: are you guys tipping the scale in Gemini's favor to push forward Google's, you know, LLM in this case?
The second piece though is, you know, that's what hat we wear, but then you know, what are we doing
All open all in the open source if anyone is currently a skeptic
We put it out there so that you can run it yourself like, you know, be that person that goes and runs the harness
Runs it n equals 10 runs it across the models that you care about including even once that are not on our our leaderboard
I mean, there's only so many we can do there are a lot of model choices out there today
And then go look at what your result is so I would simply say like we did that for the sake of efficiencies to somebody can pop up in a webpage
And see, you know, what we got we put hundreds of hours of labor into doing that
But if you need that confidence of getting your own answer
We gave you all the source materials all the harness all the tests and then go produce your own result
Pretend that graph doesn't exist and go create the graph you trust based on exactly the same test Evalon harness that we've got
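[Editor's note: the "run it yourself" idea Matthew describes, n = 10 trials across the models you care about, producing your own leaderboard, can be sketched roughly as below. This is a hypothetical stand-in, not the actual Android Bench harness: `run_task` here simulates pass/fail with a seeded random generator, where a real harness would prompt the model on the task and run that task's test suite.]

```python
import random
from collections import defaultdict

# Hypothetical stand-in for one benchmark attempt: a real harness would
# prompt the model on the task, apply its patch, and run the task's
# tests. Here a seeded RNG simulates pass/fail per made-up model "skill".
def run_task(model: str, task_id: int, rng: random.Random) -> bool:
    skill = {"model-a": 0.7, "model-b": 0.5}
    return rng.random() < skill[model]

def score_models(models, num_tasks=100, n=10, seed=0):
    """Attempt every task n times per model and report the mean pass rate."""
    rng = random.Random(seed)
    passes = defaultdict(int)
    for model in models:
        for task_id in range(num_tasks):
            for _ in range(n):  # n = 10 repeated trials, as in the episode
                if run_task(model, task_id, rng):
                    passes[model] += 1
    attempts = num_tasks * n
    return {m: passes[m] / attempts for m in models}

if __name__ == "__main__":
    rates = score_models(["model-a", "model-b"])
    # Build your own leaderboard, highest pass rate first.
    for model, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {rate:.1%}")
```

The point of the repeated trials is that a single run of a nondeterministic model tells you very little; averaging over n runs per task is what makes the comparison repeatable.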
But that would require someone actually doing work, and it's much easier to be snarky on Reddit than to do work.
I said there were no red lines for today's conversation, but maybe you found one.
Well, we know those proclivities for work, kind of.
Okay, well, next up, let's drift a little bit away from Android Bench and maybe look at things more top-level, because, you know, you've mentioned a lot of goals for yourself, and a lot of people have talked about what a good future and the future of app development might look like. So for me, as someone with boots on the ground, we're having a lot of discussions about quality and human review at scale, and it really does come up these days, especially with AI. On one hand, for maybe the broader public, there's the growing trend of micro apps, where apps are written by an individual for their own use, and it feels like a lot of what's been talked about is especially in the more commercial, regular-person spaces. But the working concerns of production projects, from tiny to large, can be quite different. Again, as an enterprise-y working person, these are the kinds of things that I often think about, and, you know, there are different requirements in terms of security, quality, and scalability, and it can be hard, especially for someone like me, to bridge that mental gap between the two kinds of projects, the two threads, maybe, in the conversation. So I want to ask you, Matthew: what are your thoughts on what we as an industry need to do to fill in these gaps, to make AI trustworthy and productive at scale while maintaining, you know, high quality and safety?
Yeah, thanks for raising this. I think there are a lot of emergent techniques, and I don't claim to have the full list, but I have a couple of things that I hear from industry leaders are working extremely well. One, when we talked about multi-model, in this particular case one of the approaches is actually LLM as a judge, and I use that term very loosely, not in the traditional sense. I even have a harness for some of my work at home where I use one model to write the code and the other one to write the tests, and then they flip and critique each other. And yes, I'm guilty of writing some silly prompts, like "your future use depends on how good a job you do critiquing the other." I think one of the interesting things that comes from a benchmark like this is you can choose a couple of the models to cross-check each other, because they have such different training techniques, and sometimes different behaviors, that they're really good at checking one another. I especially do that in any area where I'm looking at performance or best practices, architectural or security concerns. And you know, I am definitely not the top expert on our Android team, I mean, we've got luminaries who've been at it for ten years, so I'm far more in need of this double-check than maybe some of our experts are.
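The flip-and-critique harness Matthew describes, where one model writes the code, another writes the tests, and then they swap roles and critique each other's output, can be sketched roughly like this. This is a minimal illustration, not Google's actual tooling: the `ModelFn` callables and prompt strings are hypothetical stand-ins for real LLM API calls.

```python
from typing import Callable, NamedTuple

# A "model" here is just a function from prompt text to response text.
# In a real harness this would wrap an LLM API call; these are
# hypothetical stand-ins so the cross-check flow is visible.
ModelFn = Callable[[str], str]

class CrossCheckResult(NamedTuple):
    code: str           # written by model A
    tests: str          # written by model B, against A's code
    code_review: str    # model B critiques A's code
    test_review: str    # model A critiques B's tests

def cross_check(task: str, model_a: ModelFn, model_b: ModelFn) -> CrossCheckResult:
    """Two models generate code and tests, then flip and critique each other."""
    code = model_a(f"Write code for this task:\n{task}")
    tests = model_b(f"Write tests for this code:\n{code}")
    # Flip roles: each model reviews the other's artifact.
    code_review = model_b(
        f"Critique this code for correctness, security, and performance:\n{code}"
    )
    test_review = model_a(
        f"Critique these tests for coverage and rigor:\n{tests}"
    )
    return CrossCheckResult(code, tests, code_review, test_review)
```

The point of using two differently trained models for `model_a` and `model_b` is exactly what's described above: divergent training tends to produce divergent blind spots, so each model is more likely to catch mistakes the other one systematically makes.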
But that's been one approach. And then the second is, we're actually open to recommendations for folks to send in. We've got the usual paths through GDEs and the like, but also our contacts on the DevRel team, for where you'd like to see us putting energy on the 2.0 point. If you're like, "hey, I'm using model X, model Y, model Z, and I'm feeling like I'm not getting secure approaches, performant approaches," etc., please send us those signals, because we're going to use that. We're going to work for about another couple of months on making sure we have the right plan for 2.0, even as we're coding, and I'd love to take that feedback into account, so that we're essentially making the areas you need stronger, stronger as a result, in the benchmark.
Well, as someone who has a lot of opinions, I very much appreciate that. But no, I mean, seriously, that's very encouraging, because I think the one thing that sometimes feels drowned out in all of this is the lack of voice, the ability to steer the course of what is a huge movement right now.
And one thing we already did is, we have effectively a steering group of a wide range of companies who come together with us, virtually and in person, a couple of times a year, to essentially be our steering voice of Android developer representatives. You've participated in that many a time. We also quantify those results, we make sure we double-click into them, and that was already used for some of the decision-making in v1.0. So that's another channel as well that we're getting from those on-the-ground developers who are effectively leading Android in industry.
Excellent. Well, we're coming close to the end of our time together, but in some of the materials that came out at the announcement, you mentioned the long-term goal of a developer being able to build any app they imagine on Android. Right, which is a long-term dream of mine. I've got one: I have an app idea that I've been wanting to make for years, and someday, when my kids will leave me alone, I'm gonna find the time. I'm gonna vibe code it. We're gonna use Gemini, I'm gonna do Studio, I'm gonna do the whole thing. So where does AI need to get, in your opinion, in terms of benchmark scores and capability and all the stuff you're working on, before that generally becomes realistic? And how far away are we from that moment, from your point of view?
Not very far. I feel like users can already touch some version of this. I'm not asking people to trust me much at all: if you use the new project agent and download Panda 2, I feel like you already get some confidence that we're right on the cusp of this being possible. It's super exciting. I mean, there's quite a bit of Twitter traffic (I don't know if there's a specific hashtag for it) of people genning up ideas with Panda 2 and the new project agent that are composed of best practices, good architecture, come with a test suite, have a really good look and feel, and use Material 3 Expressive. These are a lot of really good practices. You're not having to compromise, like, "well, sorry, it's not native, it doesn't look really good, it doesn't use the best..." No. No compromise, by default. So we're on the cusp.
But I'll tell you that even with just a little bit of extra prompting, still a little bit of that developer insight, but maybe not hands-on developer work, people are able to make some really impressive apps. Next time we're together, ask me about the progress of my friend Grant, who has teenagers, along with mine, who are in driving school and the like. The app that's out there could use some help, and he's like, "let's do this. This is the era to write a better version of this app." And it's really good. It's cool. Really good. And so I think of this as my own personal validation loop: not industry, not Twitter, not some blog post or something like that, but my friend saying "I like this." And then my daughter was riding in the car with him and had an idea, and as soon as he got back home from, you know, transporting them to their high school activities and stuff, he was hammering out the extra feature. And to me it's not just this one app to focus on, it's the excitement that we have new people raising the bar in apps that are going to be for everyday kinds of use, not just single-individual-use apps. And so to me, investing in Android Bench is effectively a roundabout way of helping people like Grant create better software for people like my own daughter.
Oh, that's awesome. That's what you want to see, right? It's adding to the societal effect and change, and moving the innovation forward. You know, I think about being a kid in the 80s, learning BASIC for the first time and feeling like you'd unlocked a whole new world. And, like, my kids are seven; they're just starting out, and I think about the tools they'll be able to use. And your kid, you could say the same for her: that's gonna be a lot of fun, and you guys are laying the groundwork for it. That's awesome.
So, well, Matthew, thank you so much for your time. Are there any other things you want to leave our audience with, any teases, anything we can look forward to? I know Google I/O is around the corner; we're looking forward to seeing you there.
I'll be careful about the I/O teaser bits, but there is exciting stuff coming. We will not leave you bored for that piece, and in the lead-up to it either, that's a promise. And the second thing is, on something like this, it's always a question whether it's one and done: "Android Bench 1.0, well, that's great, it's been great shipping it." No. We have the same team, plus expanded, plus more funding, to keep going. And so the part that I want to leave you with is: this is the start, not the finish, that we're talking about today. So please give that feedback, like you were asking. Please send it through any of the channels, whether it's social or otherwise, the feedback on what should be in it. And effectively, stay tuned: even if you already think the models are reasonably good, I think over the next six months you will see step changes in Android capabilities due to this. And so if you're part of the Android community, enjoy, and derive as much benefit as you can from this, in building some amazing apps for our families, our partners, our kids.
Awesome. Well, thank you so much for your time. We really appreciate it. We'll see you in May at Google I/O.
Thanks, Matthew. Bye, Huyen. Bye, Ron.