Discussion:
"LLR with time"
Johannes Schulte
2017-11-10 23:13:18 UTC
Hi "all",

I am wondering what would be the best way to incorporate event time
information into the calculation of the G-Test.

There is a claim here
https://de.slideshare.net/tdunning/finding-changes-in-real-data

saying "Time aware variant of G-Test is possible"

I remember I experimented with exponentially decayed counts some years ago,
which meant changing the counts to doubles, but I suspect there is some
smarter way. What I don't get is the relation to a data structure like
t-digest when working with a lot of counts / cells for every combination of
items. Keeping a t-digest for every combination seems infeasible.
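
(Roughly the kind of decayed counter I mean; just a sketch, the half-life
value is made up:)

```python
import math

class DecayedCount:
    """Exponentially decayed event count with a configurable half-life (sketch).
    Assumes events are added roughly in time order."""

    def __init__(self, half_life_days=30.0):
        self.decay = math.log(2) / (half_life_days * 86400.0)  # per-second rate
        self.value = 0.0          # decayed count, hence doubles instead of ints
        self.last_update = None   # timestamp (seconds) of the last observed event

    def add(self, event_time, weight=1.0):
        """Decay the running count to event_time, then add the new event."""
        if self.last_update is not None:
            self.value *= math.exp(-self.decay * (event_time - self.last_update))
        self.value += weight
        self.last_update = event_time
```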

How would one incorporate event time into recommendations to detect
"hotness" of certain relations? Glad if someone has an idea...

Cheers,

Johannes
Pat Ferrel
2017-11-11 00:12:30 UTC
So your idea is to find anomalies in event frequencies to detect “hot” items?

Interesting, maybe Ted will chime in.

What I do is take the frequency and its first and second derivatives as measures of popularity, increasing popularity, and increasingly increasing popularity. Put another way: popular, trending, and hot. This is simple to do by taking 1, 2, or 3 time buckets and looking at the number of events, the derivative (difference), and the second derivative. Ranking all items by these values gives various measures of popularity or its increase.
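
A rough sketch of the bucket arithmetic (names are made up; counts are per item, newest bucket first):

```python
def popularity_scores(bucket_counts):
    """bucket_counts: per-item event counts for the 3 most recent time buckets,
    newest first, e.g. {"item1": [120, 80, 70], ...} (sketch, names made up)."""
    scores = {}
    for item, (b0, b1, b2) in bucket_counts.items():
        popular = b0                   # raw event volume in the newest bucket
        trending = b0 - b1             # first difference: change in volume
        hot = (b0 - b1) - (b1 - b2)    # second difference: change in the change
        scores[item] = {"popular": popular, "trending": trending, "hot": hot}
    return scores

# Rank all items by one of these values and index it as a field to query on:
# hottest = sorted(scores, key=lambda i: scores[i]["hot"], reverse=True)
```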

If your use is in a recommender, you can add a ranking field to all items and query for “hot” using the ranking you calculated.

If you want to bias recommendations by hotness, query with user history and boost by your hot field. I suspect the hot field will tend to overwhelm your user history in this case, as it would if you used anomalies, so you’d also have to normalize the hotness to some range closer to the one created by the user-history matching score. I haven’t found a very good way to mix these in a model, so use hot as a method of backfill if you cannot return enough recommendations, or in places where you may want to show just hot items. There are several benefits to this method of using hot to rank all items, including the fact that you can apply business rules to them just as with normal recommendations: you can ask for hot in “electronics” if you know categories, or hot "in-stock" items, or ...

Still, anomaly detection does sound like an interesting approach.


Pat Ferrel
2017-11-11 01:50:51 UTC
BTW you should use time buckets that are relatively free of daily cycles, like 3-day, week, or month buckets, for “hot”. This is to remove cyclical effects from the frequencies as much as possible, since you need 3 buckets to see the change in the change, 2 for the change, and 1 for the event volume.


Johannes Schulte
2017-11-11 08:43:19 UTC
Pat, thanks for your help, especially the insights on how you handle the
system in production and the tips for multiple acyclic buckets.
Combining the signals at query time sounds okay, but as you say, it's
always hard to find the right boosts without setting up some
learning-to-rank system. If there were a way to use the hotness when
calculating the indicators for subpopulations, that would be great,
especially for a cross-recommender.

e.g. people in Greece _now_ are viewing this show/product, whatever

And here the popularity of the recommended item in this subpopulation could
be overlooked when just looking at the overall derivatives of activity.

Maybe one could do multiple G-Tests using sliding windows
* itemA&itemB vs population (classic)
* itemA&itemB(t) vs itemA&itemB(t-1)
..

and derive multiple indicators per item to be indexed.
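
Something like this windowed 2x2 G-test is what I have in mind; just a
sketch, and one possible layout for the second, window-vs-window table:

```python
import math

def _xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def _entropy(*counts):
    # unnormalized Shannon entropy: N * H(counts)
    return _xlogx(sum(counts)) - sum(_xlogx(k) for k in counts)

def llr_2x2(k11, k12, k21, k22):
    """G-test / log-likelihood ratio score for a 2x2 contingency table."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

# Window-vs-window variant: did the A&B rate change from window t-1 to t?
# k11 = count(A & B, window t)      k12 = count(A without B, window t)
# k21 = count(A & B, window t - 1)  k22 = count(A without B, window t - 1)
```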

But this all relies on discretizing time into buckets and not looking at
the distribution of time between events like in the presentation above -
maybe there is something way smarter.

Johannes
Ted Dunning
2017-11-11 12:01:33 UTC
So ... there are a few different threads here.

1) LLR but with time. Quite possible, but not really what Johannes is
talking about, I think. See http://bit.ly/poisson-llr for a quick
discussion.
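
Roughly, the idea is to compare event counts observed over different
exposure times under a Poisson model. A sketch of one such rate test (the
note may formulate it a bit differently):

```python
import math

def poisson_rate_llr(k1, t1, k2, t2):
    """LLR test that counts k1, k2 observed over exposure times t1, t2 share a
    common Poisson rate (sketch of one time-aware flavor of the G-test)."""
    pooled = (k1 + k2) / (t1 + t2)   # common rate under the null hypothesis
    def term(k, t):
        return k * math.log(k / (pooled * t)) if k > 0 else 0.0
    return 2.0 * (term(k1, t1) + term(k2, t2))
```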

2) time varying recommendation. As Johannes notes, this can make use of
windowed counts. The problem is that rarely accessed items should probably
have longer windows so that we use longer term trends when we have less
data.

The good news here is that some part of this is nearly already in the
code. The trick is that the down-sampling used in the system can be adapted
to favor recent events over older ones. That means that if the meaning of
something changes over time, the system will catch on. Likewise, if
something appears out of nowhere, it will quickly train up. This handles
the "popular in Greece right now" problem.
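
A sketch of the adaptation I mean (not the current code): when a user's
history exceeds the interaction cut, keep the newest events instead of a
random subset.

```python
from collections import defaultdict

def downsample_recent(events, max_per_user=500):
    """Cap each user's history at max_per_user interactions, keeping the most
    recent ones rather than a random sample (sketch; events are
    (user, item, timestamp) tuples, max_per_user is a made-up default)."""
    by_user = defaultdict(list)
    for user, item, ts in events:
        by_user[user].append((ts, item))
    kept = []
    for user, history in by_user.items():
        history.sort(reverse=True)                     # newest first
        kept.extend((user, item, ts) for ts, item in history[:max_per_user])
    return kept
```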

But this isn't the whole story of changing recommendations. Another problem
that we commonly face is what I call the Christmas music issue. The idea is
that there are lots of recommendations for music that are highly seasonal.
Thus, Bing Crosby fans want to hear White Christmas
(http://youtu.be/P8Ozdqzjigg) until the day after Christmas,
at which point this becomes a really bad recommendation. To some degree,
this can be partially dealt with by using temporal tags as indicators, but
that doesn't really allow a recommendation to be completely shut down.

The only way that I have seen to deal with this in the past is with a
manually designed kill switch. As much as possible, we would tag the
obviously seasonal content and then add a filter to kill or downgrade that
content the moment it went out of fashion.



Pat Ferrel
2017-11-11 17:31:50 UTC
If Mahout were to use http://bit.ly/poisson-llr it would tend to favor new events in calculating the LLR score for later use in the threshold for whether a co- or cross-occurrence is incorporated in the model. This is very interesting and would be useful in cases where you can keep a lot of data or where recent data is far more important, like news. This is the time-aware G-test you are referencing, as I understand it.

But it doesn’t relate to popularity as I think Ted is saying.

Are you looking for 1) personal recommendations biased by hotness in Greece or 2) things hot in Greece?

1) create a secondary indicator for “watched in some locale”. The locale-id uses a country-code+postal-code maybe, but not lat-lon; something that includes a good number of people/events. Then the query would be user-id and user-locale. This would yield personal recs preferred in the user’s locale, Athens-west-side in this case.
2) split the data into locales and do the hot calc I mention. The query would have no user-id since it is not personalized, but would yield “hot in Greece”.
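
For 2), a rough sketch of the per-locale hot calc (names and bucket edges are made up; it is the same second-difference idea as before, just keyed by locale):

```python
from collections import defaultdict

def hot_by_locale(events, bucket_edges):
    """Score (locale, item) hotness as the second difference of counts over the
    3 most recent time buckets (sketch). events are (locale, item, timestamp)
    tuples; bucket_edges are newest-first cutoffs, e.g. [now-7d, now-14d, now-21d].
    Events older than the last edge are ignored."""
    counts = defaultdict(lambda: [0, 0, 0])        # (locale, item) -> [b0, b1, b2]
    for locale, item, ts in events:
        for i, edge in enumerate(bucket_edges):
            if ts >= edge:
                counts[(locale, item)][i] += 1
                break
    return {key: (b[0] - b[1]) - (b[1] - b[2]) for key, b in counts.items()}
```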

Ted’s “Christmas video” tag is what I was calling a business rule and can be added to either of the above techniques.

Ted Dunning
2017-11-11 18:00:05 UTC
Inline.
Post by Pat Ferrel
If Mahout were to use http://bit.ly/poisson-llr it would tend to favor
new events in calculating the LLR score for later use in the threshold for
whether a co- or cross-occurrence is incorporated in the model.
I don't think that this would actually help for most recommendation
purposes.

It might help to determine that some item or other has broken out of
historical rates. Thus, we might have "hotness" as a detected feature that
could be used as a boost at recommendation time. We might also have "not
hotness" as a negative boost feature.

Since we have a pretty good handle on the "other" counts, I don't think
that the Poisson test would help much with the cooccurrence stuff itself.

Changing the sampling rule could make a difference to temporality and would
be more like what Johannes is asking about.
Post by Pat Ferrel
But it doesn’t relate to popularity as I think Ted is saying.
Are you looking for 1) personal recommendations biased by hotness in
Greece or 2) things hot in Greece?
1) create a secondary indicator for “watched in some locale”. The locale-id
uses a country-code+postal-code maybe, but not lat-lon; something that
includes a good number of people/events. Then the query would be user-id
and user-locale. This would yield personal recs preferred in the user’s
locale, Athens-west-side in this case.
And this works in the current regime. Simply add location tags to the user
histories and do cooccurrence against content. Locations will pop out as
indicators for some content and not for others. Then when somebody appears
in some location, their tags will retrieve localized content.

For localization based on strict geography, say for restaurant search, we
can just add business rules based on geo-search. A very large bank customer
of ours does that, for instance.
Post by Pat Ferrel
2) split the data into locales and do the hot calc I mention. The query
would have no user-id since it is not personalized but would yield “hot in
Greece”
I think that this is a good approach.
Post by Pat Ferrel
Ted’s “Christmas video” tag is what I was calling a business rule and can
be added to either of the above techniques.
But the (not) hotness feature might help with automating this.
Pat Ferrel
2017-11-12 21:34:38 UTC
Part of what Ted is talking about can be seen in the carousels on Netflix or Amazon. Some are not recommendations, like “trending” videos, “new” videos, or “prime” videos (substitute your own promotions here). They have nothing to do with recommender-created items but are presented alongside recommender-based carousels. They are based on analytics or business rules and ideally have some randomness built in. The reason for this is that 1) it works by exposing users to items that they would not see in recommendations and 2) it provides data to build the recommender model from.

A recommender cannot work in an app that displays no non-recommended items, or there will be no unbiased data to create recommendations from. This would lead to crippling overfitting. Most apps have placements like the ones mentioned above and also have search and browse. However you do it, it must be prominent and always available. The moral of this paragraph is: don’t try to make everything a recommendation, it will be self-defeating. In fact, make sure not every video watch comes from a recommendation.

Likewise, think of placements (each reflecting a particular recommender use) as experimentation grounds. Try things like finding a recommended category and then recommending items in that category, all based on user behavior. Or try a placement based on a single thing a user watched, like “because you watched xyz you might like these”. Don’t just show the most popular categories for the user and recommend items in them; this would be a type of overfitting too.

I’m sure we have strayed far from your original question but maybe it’s covered somewhere in here.


On Nov 12, 2017, at 12:11 PM, Johannes Schulte <***@gmail.com> wrote:

I did "second order" recommendations before but more to fight sparsity and
find more significant associations in situations with less traffic, so
recommending categories instead of products. There needs to be some third
order sorting / boosting like you mentioned with "new music", or maybe
popularity or hotness to avoid quasi-random order. For events with limited
lifetime it's probably some mixture of spatial distance and freshness.

We will definitely keep an eye on the generation process of data for new
items. It depends on the domain, but in the time of multi-channel promotion
of videos, shows and products, it also helps that there is traffic driven
from external sources.

Thanks for the detailed hints - now it's time to see what comes out of
this.

Johannes
Events have the natural good quality that having a cold start means that
you will naturally favor recent interactions, simply because there won't be
any old interactions to deal with.
Unfortunately, that also means that you will likely be facing serious
cold-start issues all the time. I have used two strategies to deal with
cold starts, both fairly successfully.
*Method 1: Second order recommendation*
For novel items with no history, you typically do have some kind of
information about the content. For an event, you may know the performer,
the organizer, the venue, and possibly something about the content of the
event as well (especially for a tour event). As such, you can build a
recommender that recommends this secondary information and then do a search
with the recommended secondary information to find events. This actually
works pretty well, at least for the domains where I have used it (music and
videos). For instance, in music, you can easily recommend a new album based
on the artist(s) and track list.
The trick here is to determine when and how to blend in normal
recommendations. One way is query blending, where you combine the
second-order query with a normal recommendation query, but I think that a
fair bit of experimentation is warranted here.
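A toy sketch of the two-step idea; all of the names and data structures here are made up:

```python
def second_order_recommend(recommended_attrs, events, n=20):
    """Score events by the secondary attributes they carry, given attribute-level
    recommendations from a normal recommender (sketch; structures made up).
    recommended_attrs: {"artist:foo": 3.2, "venue:bar": 1.1, ...}
    events: [{"id": "e1", "attrs": ["artist:foo", "venue:bar"]}, ...]"""
    scored = []
    for event in events:
        score = sum(recommended_attrs.get(a, 0.0) for a in event["attrs"])
        if score > 0:
            scored.append((score, event["id"]))
    scored.sort(reverse=True)
    return [event_id for _, event_id in scored[:n]]
```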
*Method 2: What's new and what's trending*
It is always important to provide alternative avenues of information
gathering for recommendation. Especially for the user-generated video case,
there was pretty high interest in the "What's new" and "What's hot" pages.
If you do a decent job of dithering here, you keep reasonably good content
on the what's-new page longer than content that doesn't pull. That
maintains interest in the page. Similarly, you can have a bit of a lower
bar for new content to be classified as hot than for established content.
That way you keep the page fresh (because new stuff appears transiently),
but you also have a fair bit of really good stuff as well. If done well,
these pages will provide enough interactions with new items so that they
don't start entirely cold. You may need to have genre-specific or
location-specific versions of these pages to avoid interesting content
being overwhelmed. You might also be able to spot content that has intense
interest from a sub-population as opposed to diffuse interest from a mass
population.
You can also use novelty and trending boosts for content in the normal
recommendation engine. I have avoided this in the past because I felt it
was better to have specialized pages for what's new and hot, rather than
because I had data saying it was bad to do. I have put a very weak
recommendation effect on the what's-hot pages so that people tend to see
trending material that they like. That doesn't help on what's-new pages for
obvious reasons, unless you use a touch of second-order recommendation.
On Sat, Nov 11, 2017 at 11:00 PM, Johannes Schulte <
Well, the Greece thing was just an example of a thing you don't know
upfront - it could be any of the modeled features on the cross-recommender
input side (user segment, country, city, previous buys), some subpopulation
getting active. So the current approach, probably with sampling that
favours newer events, will be the best here. Luckily a sampling strategy is
a big topic anyway, since we're trying to go for the near-real-time way -
Pat, you talked about it some while ago on this list, and I still have to
look at the Flink talk from Trevor Grant, but I'm really eager to attack
this after years of batch :)
Thanks for your thoughts, I am happy I can rule something out given the
domain (Poisson LLR). Luckily the domain I'm working on is event
recommendations, so there is a natural deterministic item expiry (as
compared to Christmas-like stuff).
Again,
thanks!
Ted Dunning
2017-11-13 02:32:38 UTC
Regarding overfitting, don't forget dithering. That can be the most
important single step you take in building a good recommender.

The dithering can be inversely proportional to the number of exposures so
far if you want to give novel items more exposure.

This doesn't have to be very fancy. I have had very good results by
generating a long list of recommendations, computing a pseudo score based
on rank, adding a bit of noise and resorting. I also scanned down the list
and penalized items that showed insufficient diversity. Then I resorted
again. Typically, the pseudo score was something like exp(-r) where r is
rank.

The noise scale is adjusted to leave a good proportion of originally
recommended items in the first page. It could have easily been scaled by
1/sqrt(exposures) to let the newbies move around more.

The parameters here should be adjusted a bit based on experiments, but a
heuristic first hack works pretty well as a start.
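
A minimal sketch of that rescoring, leaving out the diversity-penalty pass; the parameter values are made up and would need tuning:

```python
import math
import random

def dither(ranked_items, exposures=None, noise_scale=0.1, seed=None):
    """Rescore a ranked list with exp(-rank) plus Gaussian noise and resort
    (sketch). If exposure counts are given, each item's noise is scaled by
    1/sqrt(exposures) so that little-seen items move around more."""
    rng = random.Random(seed)
    rescored = []
    for rank, item in enumerate(ranked_items, start=1):
        scale = noise_scale
        if exposures is not None:
            scale /= math.sqrt(max(exposures.get(item, 1), 1))
        rescored.append((math.exp(-rank) + rng.gauss(0.0, scale), item))
    rescored.sort(reverse=True)
    return [item for _, item in rescored]
```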
Post by Pat Ferrel
Part of what Ted is talking about can be seen in the carousels on Netflix
or Amazon. Some are not recommendations like “trending” videos, or “new”
videos, or “prime” videos (substitute your own promotions here). Nothing to
do with recommender created items but presented along with
recommender-based carousels. They are based on analytics or business rules
and ideally have some randomness built in. The reason for this is 1) it
works by exposing users to items that they would not see in recommendations
and 2) it provides data to build the recommender model from.
A recommender cannot work in an app that has no non-recommended items
displayed or there will be no un-biased data to create recommendations
from. This would lead to crippling overfitting. Most apps have placements
like the ones mentioned above and also have search and browse. However you
do it, it must be prominent and aways available. The moral of this
paragraph is; don’t try to make everything a recommendation, it will be
self-defeating. In fact make sure not every video watch comes from a
recommendation.
Likewise think of placements (reflecting a particular recommender use) as
experimentation grounds. Try things like finding a recommended category and
then recommending items in that category all based on user behavior. Or try
a placement based on a single thing a user watched like “because you
watched xyz you might like these”. Don’t just show the most popular
categories for the user and recommend items in them. This would be a type
of overfitting too.
I’m sure we have strayed far from your original question but maybe it’s
covered somewhere in here.
I did "second order" recommendations before but more to fight sparsity and
find more significant associations in situations with less traffic, so
recommending categories instead of products. There needs to be some third
order sorting / boosting like you mentioned with "new music", or maybe
popularity or hotness to avoid quasi-random order. For events with limited
lifetime it's probably some mixture of spatial distance and freshness.
We will definetely keep an eye on the generation process of data for new
items. It depends on the domain but in the time of multi channel promotion
of videos, shows and products, it's also helps that there is traffic driven
from external sources.
Thanks for the detailed hints - now it's time to see what comes out of
this.
Johannes
Events have the natural good quality that having a cold start means that
you will naturally favor recent interactions simply because there won't
be
any old interactions to deal with.
Unfortunately, that also means that you will likely be facing serious
cold
start issues all the time. I have used two strategies to deal with cold
starts, both fairly successfully.
*Method 1: Second order recommendation*
For novel items with no history, you typically do have some kind of
information about the content. For an event, you may know the performer,
the organizer, the venue, possibly something about the content of the
event
as well (especially for a tour event). As such, you can build a
recommender
that recommends this secondary information and then do a search with
recommended secondary information to find events. This actually works
pretty well, at least for the domains where I have used (music and
videos).
For instance, in music, you can easily recommend a new album based on the
artist (s) and track list.
The trick here is to determine when and how to blend in normal
recommendations. One way is query blending where you combine the second
order query with a normal recommendation query, but I think that a fair
bit
of experimentation is warranted here.
*Method 2: What's new and what's trending*
It is always important to provide alternative avenues of information
gathering for recommendation. Especially for the user generated video
case,
there was pretty high interest in the "What's new" and "What's hot"
pages.
If you do a decent job of dithering here, you keep reasonably good
content
on the what's new page longer than content that doesn't pull. That
maintains interest in the page. Similarly, you can have a bit of a lower
bar for new content to be classified as hot than established content.
That
way you keep the page fresh (because new stuff appears transiently), but
you also have a fair bit of really good stuff as well. If done well,
these
pages will provide enough interactions with new items so that they don't
start entirely cold. You may need to have genre specific or location
specific versions of these pages to avoid interesting content being
overwhelmed. You might also be able to spot content that has intense
interest from a sub-population as opposed to diffuse interest from a mass
population.
You can also use novelty and trending boosts for content in the normal
recommendation engine. I have avoided this in the past because I felt it
was better to have specialized pages for what's new and hot rather than
because I had data saying it was bad to do. I have put a very weak
recommendation effect on the what's hot pages so that people tend to see
trending material that they like. That doesn't help on what's new pages
for
obvious reasons unless you use a touch of second order recommendation.
Johannes Schulte
2017-11-14 20:02:11 UTC
Permalink
✓
Post by Ted Dunning
Regarding overfitting, don't forget dithering. That can be the most
important single step you take in building a good recommender.
Dithering can be inversely proportional to amount of exposures so far if
you like to give novel items more exposure.
This doesn't have to be very fancy. I have had very good results by
generating a long list of recommendations, computing a pseudo score based
on rank, adding a bit of noise and resorting. I also scanned down the list
and penalized items that showed insufficient diversity. Then I resorted
again. Typically, the pseudo score was something like exp(-r) where r is
rank.
The noise scale is adjusted to leave a good proportion of originally
recommended items in the first page. It could have easily been scaled by
1/sqrt(exposures) to let the newbies move around more.
The parameters here should be adjusted a bit based on experiments, but a
heuristic first hack works pretty well as a start.
Post by Pat Ferrel
Part of what Ted is talking about can be seen in the carousels on Netflix
or Amazon. Some are not recommendations, like “trending” videos, or “new”
videos, or “prime” videos (substitute your own promotions here). Nothing to
do with recommender-created items, but presented along with
recommender-based carousels. They are based on analytics or business rules
and ideally have some randomness built in. The reason for this is 1) it
works by exposing users to items that they would not see in
recommendations, and 2) it provides data to build the recommender model from.
A recommender cannot work in an app that has no non-recommended items
displayed or there will be no un-biased data to create recommendations
from. This would lead to crippling overfitting. Most apps have placements
like the ones mentioned above and also have search and browse. However you
do it, it must be prominent and always available. The moral of this
paragraph is: don’t try to make everything a recommendation; it will be
self-defeating. In fact, make sure not every video watch comes from a
recommendation.
Likewise, think of placements (reflecting a particular recommender use) as
experimentation grounds. Try things like finding a recommended category and
then recommending items in that category, all based on user behavior. Or try
a placement based on a single thing a user watched, like “because you
watched xyz you might like these”. Don’t just show the most popular
categories for the user and recommend items in them. This would be a type
of overfitting too.
I’m sure we have strayed far from your original question but maybe it’s
covered somewhere in here.
On Nov 12, 2017, at 12:11 PM, Johannes Schulte <
I did "second order" recommendations before, but more to fight sparsity and
find more significant associations in situations with less traffic, so
recommending categories instead of products. There needs to be some third
order sorting / boosting like you mentioned with "new music", or maybe
popularity or hotness to avoid quasi-random order. For events with limited
lifetime it's probably some mixture of spatial distance and freshness.
We will definitely keep an eye on the generation process of data for new
items. It depends on the domain, but in the time of multi-channel promotion
of videos, shows and products, it also helps that there is traffic driven
from external sources.
Thanks for the detailed hints - now it's time to see what comes out of
this.
Johannes
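A small sketch of the "mixture of spatial distance and freshness" mentioned
above, using exponential decays (reading "freshness" as time until the event
starts; that reading, the scales, and the weights are assumptions that would
need tuning per domain):

import math

def event_boost(distance_km, hours_until_start,
                distance_scale_km=25.0, freshness_half_life_h=48.0,
                w_near=0.5, w_fresh=0.5):
    if hours_until_start < 0:
        return 0.0  # the natural deterministic expiry: past events drop out
    near = math.exp(-distance_km / distance_scale_km)
    fresh = 0.5 ** (hours_until_start / freshness_half_life_h)
    return w_near * near + w_fresh * fresh

# A nearby event tomorrow vs. a far-away event next month.
print(event_boost(distance_km=3.0, hours_until_start=24.0))
print(event_boost(distance_km=120.0, hours_until_start=720.0))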
Eric Link
2017-11-19 20:08:28 UTC
Permalink
unsubscribe
--
Eric Link
214.641.5465