
Data Science in Marketing



Published May 15, 2023, 1:20 a.m. by Arrik Motley


What is marketing science?

Marketing science is the study of how businesses can use data to better understand and serve their customers. It combines aspects of marketing, statistics, and computer science to help companies make more informed decisions about their marketing campaigns and strategies.

Why is marketing science important?

With the advent of big data, marketing science has become increasingly important in recent years. Companies are now able to collect vast amounts of data about their customers, including everything from their demographics to their online behavior. This data can be used to create more targeted and effective marketing campaigns.

What are some of the challenges of marketing science?

One of the biggest challenges of marketing science is dealing with the sheer volume of data that companies now have access to. Another challenge is that data can be difficult to interpret, and it can be hard to know which insights are most important.

What are some of the benefits of marketing science?

Marketing science can help companies save money and increase sales by making more informed decisions about their marketing campaigns. It can also help companies better understand their customers and create more personalized experiences.

What are some of the leading marketing science programs?

Columbia Business School offers a marketing science program that teaches students how to use data to make better marketing decisions. The program is designed for both marketing professionals and those with a background in statistics or computer science.


- Hi, good morning, everyone.

My name is Olivier Toubia.

I'm a faculty member in the marketing division

at the Business School.

I also serve now as the Chair of the Division.

It's my pleasure today to welcome everyone

and to have a chat with my co-author,

now colleague, friend,

Shawndra Hill, who is a Principal Scientist at Facebook.

She just recently joined Facebook,

and she also recently became a part-time senior lecturer

at Columbia Business School.

And so today we're going to talk about data science

in marketing, and Shawndra is a wonderful person

to tell us about this.

She has a very unique and interesting background.

She actually was an electrical engineer by training

and then she received her PhD from NYU

in information systems.

And then she spent a few years at Wharton

on the faculty there.

And after that, she moved

to Microsoft Research here in New York City.

She spent about five years there.

She was a principal researcher

in the computational social science group.

And then just this, in the last few months,

she moved again to Facebook

where she's a principal scientist.

So welcome Shawndra,

thank you for taking the time to speak with everyone

this morning.

So today you're going to tell us about, you know,

your research, your work and the tech industry

in general, and data science and marketing.

But just before I have a quick question for you.

So you were in the computational social science group

at Microsoft Research.

That's a term that is somewhat novel,

so maybe some of the people here

have never heard these terms before.

So would you mind defining for us

what computational social science means?

- Sure. So I'm actually in the computational social science

part of the org at Facebook too.

So I've been in computational social science

for a few years now,

and as you mentioned,

it's a new and burgeoning field,

but at the core, we're doing research

and really asking social science questions,

but using computation and statistics

to help distill information from data,

to be able to answer those questions.

And I think the first time, I could be wrong,

but I think the first time

that a computational social science group was named,

was at Yahoo,

probably about 15 years ago now.

So I mean, that's kind of how new this area is,

and there's now a computational social science conference.

And the point being that the area and the field

is still to some extent being defined.

But you can think of folks that are doing research

in this area as being,

a lot of them, not all,

are computer scientists or statisticians,

but still care about answering why things are happening.

So like not just predicting outcomes for, you know,

sort of marketing for instance,

but trying to understand sort of why people or groups,

or societies behave the way that they do.

- Wonderful. Thank you.

So can you then maybe,

describe for everyone a little bit your position,

what type of work do you do, you know,

at this intersection of theory and practice

and research and industry.

So you know, just what's a typical day for you?

What type of projects do you work on?

- Oh, sure.

So it's interesting.

So I'm now also called a data scientist,

and I've been a data scientist probably my whole career,

but it wasn't called data science when I started.

So in today's terms, I'm a data scientist.

And for the most part, I don't have a typical day,

except for, I am either working on analysis

towards some end, like to solve some business problem,

or scoping out and mapping out new problems.

So I've been pretty lucky over the past few years

in that, I've been able to take on roles

where I can still do basic research,

and what that means for, you know,

people who might not be academics is,

I can do academic style research in companies,

while still being close to data and real problems.

So like the barriers to getting data at least,

are broken down a bit.

And also, you know

getting access to practitioners who have real problems.

The nice thing is I get to still work with academics

like you, and as you mentioned,

I now have an affiliation at Columbia.

So it's very much an academic style position

but within a company, and, you know,

we do have to deliver results

in this role,

but we take a longer term view

of problems just like you would expect an academic to do.

- So what's really exciting for us is that also

you're going to be able to bring this background

into the classroom, starting in the spring.

So I know that you're developing a new course.

Would you like to maybe say a few words

about this course that you're putting together

for the spring?

- Sure. So first of all, I'm really excited about

teaching this course and the spirit of it

is going to be learning

the necessities for managing data scientists,

and data science projects for marketers in particular.

And I taught data mining,

and sort of applied machine learning for a really long time.

So at NYU, when I was a doctoral student,

and also at Wharton for many, many years,

and frankly because I had so many engagements

with industry even then,

like, so with different companies,

I really thought I was pretty well-versed

on sort of how to manage projects in industry.

But once I joined industry and I was on the inside,

my eyes were wide open that I didn't really know

as much as I thought about, like, getting things done.

So let me just kind of take a step back

of what I mean by like I thought I knew.

So in class, what we would teach is, you know

kind of know what your objective is,

know how to measure your success

based on, like, if you're thinking

about a prediction problem and a classification problem,

like the cost of misclassification,

like really understand your objective.

And then what was missing,

I think are a number of things like the fact that,

one, it's not just like that you think

these projects are cool,

like you really have to do a sales job,

and so your job becomes not really that of a scientist

but you have to wear multiple hats,

because usually you don't have a huge team.

So you have to know how to program manage a little bit.

You have to know how to sell your results

and communicate them in a way that a lay person

can understand them.

You need to know how to build prototypes along the way.

So most of the time in my experience,

you know, an idea is not really worth much.

Like you have to show people

what you're telling them is going to work.

And so you have to know how to prototype things

in a way that doesn't suck up your time

on projects that, you know,

sort of may not go to the end.

So really thinking about relationship building

and selling is one aspect.

And then another, maybe more important aspect,

is that, even if you have an amazing idea

and leadership in your organization,

whether it be big or small, wants to buy in,

there might be legal policy, privacy issues

that you can't overcome with your project,

and knowing how to sort of navigate the review process,

not only for the things that are, let's just say illegal,

but also for the things that might be in bad taste

at the moment.

So there are a lot of things we can do

but that maybe we shouldn't do,

if we're checking our moral compass.

So knowing how to navigate those things.

So this did kind of come up

in sort of working with companies before joining industry,

but there's always this tension in my experience

working in industry now, of like doing something right,

versus doing something fast.

And so, you know, with my students,

I would teach them that they want to do it right,

not just fast,

so you have to really manage the expectations

of your major stakeholders up front,

in terms of like what you're willing to bend on,

because almost always, and this may be surprising,

but you're going to have to bend on something.

So it's like, and then how you explain that.

So it's like, is this a heuristic

for what you're trying to do,

or is this something that is just an association,

and we can't make any causal claims or however it is

that you need to explain it,

you almost need a contract,

because people that are trying to deliver to customers,

or you know, internally,

care about meeting a deadline oftentimes.

And so you have to figure out how to navigate that.

And then I think, you know,

thinking about impact more broadly is,

I think the number one thing that maybe I was just lucky

in the projects that I picked in the past,

but, you know if you want to have impact on a company,

it needs to be something that, you know,

if you're talking about users, will impact a large number

of users, or at least impact the bottom line,

or it's something that is so extremely important

that the answer,

that leadership believes like that they should invest,

like just having an idea,

and that it would be cool to do or connect data.

Like people don't really care about those ideas.

So you have to really kind of be able to scope these things

out for yourself in advance,

so you don't waste time on things that people won't support.

So those are the types of things that we'll talk

about in class towards making sure,

if you want to take on a role,

either managing data scientists

or being a data scientist yourself in an industry,

that you kind of almost have a checklist of things

that you need to go through

when you're thinking through working on projects

if they already exist,

or scoping out new projects that you have to sell

and position within the company for success.

- Thank you, Shawndra.

So this was, I think, a wonderful,

like, you know high-level introduction to your work

and your world.

Before we really got into more specifics,

a couple of housekeeping items,

I got a message that some people

have some issues with the audio.

So I tried to mute myself when you were talking this time,

but I don't know if the audio is getting better,

so if participants are getting audio issues,

please feel free to...

Okay, sounds great.

Okay. So apparently things are getting better.

So I will just mute myself when you speak

to make sure there's no echo.

- Can you hear me okay, when I'm talking?

- I think it's fine.

Yes. I think it's fine.

Maybe I know somehow there was some issue earlier.

So another issue is that, of course,

all of the participants who feel free to enter questions

into the Q and A, and you know,

we'll try to address as many of them as possible.

So with that now, let's, you know,

Shawndra, let's try to drill down a little bit deeper,

more specifically, more concretely,

into your work.

So I know that you prepared a few slides about your research

and your work.

So I'm going to give you a chance

maybe to give some examples,

to maybe give people a bit more tangible example

of your work.

So I'm gonna let you share that if you would like.

- Great. Thanks.

So just for the audience, I prepared a few slides

just to kind of walk you through a few examples.

They're not meant to be comprehensive,

but I think they'll help you follow the discussion

that we're going to have.

So--

- And then we'll go back to a higher level after that,

the more, like in general,

how data science is impacting marketing.

But I think it's good now to have some specifics

to set the ideas a little bit.

So go ahead, Shawndra, thank you.

- Can you see my screen?

- I can see your screen.

I think everyone can. Thank you.

- Great. So for most of my career,

whether it be in marketing or in other areas

where I've applied data science techniques,

I've really tried to connect data oftentimes

for the first time, to answer marketing problems

in this context,

or sometimes use data in a way

that people haven't used it before.

So if you could go back in your head to 2004,

back in 2004, it was the first time

I actually got excited about marketing problems.

Like as Olivier mentioned, I got my PhD

in Information Systems.

I actually applied thinking I was going to predict

the stock market. I don't know what I was thinking,

but anyway,

I quickly got interested in marketing problems,

and here's the reason why.

I was doing an internship where I got access

to social network data.

So connections between people,

where those connections were phone calls.

So on this slide,

it's a network where the nodes on these networks

are supposed to represent people,

and the connections between them are phone calls,

but, you know, the idea is that birds of a feather

flock together on these networks, right?

And so these connections in absence of having information

on things like race, religion, age, gender, geography,

can be used as a proxy for knowing that people

are in some way similar.

And so we used this idea in marketing,

and the way that it worked was,

we took existing customers of a service,

and we then looked at all of their friends,

and asked whether these connections

could tell us something about their likelihood

of adopting that service.

And it turned out that they were five times

more likely to purchase this particular product

that we were advertising,

than people both selected at random,

and even after doing a propensity matching,

they were more likely to purchase the product.
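
To make the network-neighbor idea concrete, here is a minimal, hypothetical sketch: flag prospects who are call-graph neighbors of existing customers. The data, names, and scoring are illustrative assumptions, not the original telecom dataset.

```python
# Hypothetical sketch of "network neighbor" targeting: flag prospects who are
# call-graph neighbors of existing customers. Toy data, not the telecom dataset.
from collections import defaultdict

calls = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]  # phone-call edges
customers = {"alice"}                                           # current adopters

graph = defaultdict(set)
for a, b in calls:
    graph[a].add(b)
    graph[b].add(a)

# Prospects directly connected to at least one existing customer.
network_neighbors = {
    person for c in customers for person in graph[c] if person not in customers
}
print(network_neighbors)  # {'bob'}
```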

So believe it or not, this was maybe the first work

that connected social network data

to real business outcomes for marketing.

So again, remember, this is like 2004,

pre everybody being on Facebook,

Facebook existed, but there wasn't a lot of data like this.

So it turned out that companies got wind of this idea.

And there were startups that were even started up

on this basis of, you know,

connections between people having some value

for predicting attributes of users.

And so a lot of people ran with this idea after this work,

maybe they were thinking about it in parallel,

but certainly these companies weren't growing

kind of before this work came out.

And then since then, you know, sort of the rest is history,

like social networks are used to predict a lot of things,

about people for various reasons.

And so one thing that always bothered me about companies

just kind of running with this idea is,

we showed that the connections in the telecom setting

mattered for this one product, right?

So it, at the time it was voice over IP.

Like people paid for it back then, right?

Now we all get voice over IP for free,

but back then, they paid for it and it was this one product,

and it worked very well,

and there could have been reasons why that was the case.

You know, like maybe when people were talking

to their friends, they were talking about it,

like, hey, do you hear my new, you know, phone quality?

Or it could have just been this homophily idea

or something else, but it really bothered me

that like people ran without testing.

So what I did was I collected a lot of social network data.

So this data weren't perfect in the sense

that I collected it externally using the Twitter API,

and Facebook APIs, and getting as much data as I could,

but what I did was I took TV show handles,

and brand and product handles on Twitter.

I got all of their connections.

So all of the people that followed them,

and then all of the people that followed them.

So I got the followers of the brands and TV shows,

then the followers of the followers,

and for all of those followers,

I got their tweets over time.

So it was a lot of, you know,

pinging this Twitter API.

And what I was able to do with that data

is pretty much build a pseudo recommendation engine,

where I say, okay,

like a person or a Twitter user

followed Coca-Cola say, let me take that as information

for my recommendation model,

and then predict what other brands they would follow.

And I did that based on the social network,

I did that based on text features,

and I did that based on more of a traditional sense

of people being similar, based on the products

that they either purchase or have in common.
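
As a rough illustration of the product-based ("followed in common") approach described here, the sketch below builds a toy co-follow recommender; the social-network variant would instead score brands by how many of a user's followees follow them. All handles and data are made up.

```python
# Hypothetical sketch of a product-based recommender over brand follows:
# "people who follow X also follow Y." Toy data only.
from collections import Counter
from itertools import combinations

follows = {                      # user -> set of brand handles they follow
    "u1": {"CocaCola", "ESPN", "Nike"},
    "u2": {"CocaCola", "Nike"},
    "u3": {"CocaCola", "Pepsi"},
}

co = Counter()                   # brand co-follow counts
for brands in follows.values():
    for a, b in combinations(sorted(brands), 2):
        co[(a, b)] += 1
        co[(b, a)] += 1

def recommend(seed_brand, k=2):
    """Rank other brands by how often they are co-followed with the seed."""
    scores = Counter({b: n for (a, b), n in co.items() if a == seed_brand})
    return [b for b, _ in scores.most_common(k)]

print(recommend("CocaCola"))     # e.g. ['Nike', 'ESPN']
```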

And so this image,

let me just explain it to you.

It's a little complicated

because I don't have the whole top.

But on the horizontal axis,

so we built these recommendation engines.

It's the number of recommendations

that I was making to these users.

And then on the vertical axis,

think of it as the difference between the performance

of the social network based recommendations,

and the product-based recommendations,

like the traditional approach of basing this on products.

And so it turns out that for some verticals,

the product network consistently did better,

and for other verticals, the social network did better.

So for things like children's products,

home products, media entertainment, sports,

where you would expect people to have,

expect these advertisers or products

to have a niche audience,

like where homophily would actually matter,

then the social network actually did better.

And then for cases where you would expect everybody

to follow these products, things like household products,

or health related things, the product network did better.

So this was a first step at trying to understand

like when social network data would actually outperform

at least in this setting, product network.

And so we also collected text data

and built recommendation engines based on texts,

where people were similar based on the words

that they used in their tweets.

And we were able to characterize the audiences

of these brands,

based on the tweets that their followers say.

And so we could connect those corpora of tweets

to the attributes of the characteristics of people

who followed these brands.

We got this from a different data source.

And we could figure out like which words

were actually predictive of certain characteristics.

So these are word clouds that show,

on the left side,

words that are said in the tweets for audiences of products that skew female,

versus the types of things that are said in the tweets

for audiences of products that skew male.

And so we did that for a bunch of categories,

and we cross-tabbed these categories,

and we can get these nice sort of word associations

with certain audience types.
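
A minimal sketch of this kind of text analysis, assuming a TF-IDF plus logistic-regression setup (the actual method and features are not specified in the talk): fit a classifier on brand-level tweet text against an audience-skew label and read off the most predictive words. Labels and tweets are made up.

```python
# Hypothetical sketch: which words predict an audience attribute for a brand.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets_per_brand = ["nail art and skincare tips", "fantasy football draft night"]
skews_female     = [1, 0]   # 1 = audience skews female, 0 = skews male

vec = TfidfVectorizer()
X = vec.fit_transform(tweets_per_brand)
clf = LogisticRegression().fit(X, skews_female)

# Largest positive coefficients ~ words associated with a female-skewing audience.
terms = vec.get_feature_names_out()
top = sorted(zip(clf.coef_[0], terms), reverse=True)[:3]
print(top)
```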

And so from a marketing perspective,

like this could help brands and products,

and advertisers, understand who their audiences are

a little bit better.

I will say, with all of this work,

the difference I think in how academics approach the work

and maybe how it's traditionally approached

in product groups is this, right?

So a lot of times,

there are these overall high level metrics or KPIs,

that a company needs to report on,

or a team needs to report on for a particular solution.

And very rarely in my experience,

do people try to figure out

if there's any heterogeneity by user or by brand type,

when you're talking about big systems

like recommendation engines,

that need to make predictions broadly.

And so how I think about my role even now,

is sort of digging into the details,

to try to figure out where heterogeneity lies.

So that was kind of...

Using social media data is something I did

for quite a while, and it was exciting

because it was new data at the time.

And then I joined Microsoft,

and I focused a lot on sponsored search data,

and sponsored search data were exciting,

because it was new to me anyway,

but also because you can do a lot with it,

learning about the customer journey,

like, so for any given customer,

how they search over time

for a particular category of product,

or for a specific product.

Once you're sort of embedded,

you have access to a lot of information about users,

anonymized, obviously, but you can see for instance,

whether somebody already owns a product or not,

and look at how they respond to advertising,

whether they are an existing customer or not.

So this slide is just showing sort of

kind of an experiment,

how it worked, and I'll explain a couple of ways we used it.

So basically, people search for brands on search engines.

So here there's a search for Edmunds.

And usually when it's a product or brand

you'll see first a set of advertisements,

followed then by organic search results.

And so the experiment that was running,

is that the advertisements were getting shuffled randomly

above a certain threshold.

And they were either showing one, well, zero, one, two

three or four ads.

And so what that enabled us to do,

is to see what happens when, for instance,

in this case in step two,

so this is just showing that sometimes in the experiment,

no advertisements are at the top, right?

So this would let us answer,

so what happens if an advertiser doesn't advertise

on sponsored search?

Like how much traffic do they still get

from their organic link? For instance.

We could also ask questions around what happens,

like, because these things are now being shuffled,

the advertisements, what happens when a competitor

is shown above the focal brand,

or a complementary product, for instance, right?

Because normally, what would happen is the brand

that is being searched for, will show up at the top.

So if you're studying without an experiment,

you'd only know sort of what happens

when it's at the top or nothing.

But because these things were being shuffled,

we can ask interesting questions

about competitors,

and complementary products,

and also, as I mentioned, one of the things that we did

was see whether these ordering effects

that we wanted to study, like how much traffic stealing

there is, as the focal brand moves down the page.

Whether that is stronger or weaker

when a person already owns the product.

So these data, again, were maybe in this case,

it wasn't new,

like these data were sitting around,

but asking questions in new ways.
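
A minimal sketch of how one might analyze such a randomized ad layout, assuming a simple log table with the number of ads shown and whether the organic link was clicked; the column names and data are illustrative, not the actual search logs.

```python
# Hypothetical sketch: organic click-through on the focal brand's link,
# compared across the randomized number of sponsored ads shown above it.
import pandas as pd

logs = pd.DataFrame({
    "num_ads_shown":   [0, 0, 1, 2, 3, 4, 2, 0],
    "clicked_organic": [1, 1, 1, 0, 0, 0, 1, 1],
})

organic_ctr = logs.groupby("num_ads_shown")["clicked_organic"].mean()
print(organic_ctr)   # organic CTR by number of sponsored ads above the link
```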

And this is the work that I'm referencing actually,

is with another faculty member,

in the marketing department, Andrey Simonov.

So again, like partnership with academia.

So the search data was exciting to me for another reason.

I was really interested in TV research,

specifically like TV advertising research.

And when I joined Microsoft, it wasn't a TV ad company.

However, one thing that we could look at is,

how TV ads interact with sponsored search ads.

And then it became interesting to both Bing

and advertisers who advertised across TV channels

and sponsored search channels.

So the general idea is that

people are sitting in front of their TV,

and they are responding to what they see online,

or on some device that's not their TV.

And so I did a lot of work looking at how people respond,

who responds and why, trying to get at why.

So the general idea is, people see something on TV,

they go to a search engine like Bing, they search for it,

they get the sponsored search results,

and then they click on something, right?

And maybe they're looking for something related specifically

to what was shown in the commercial,

or maybe they're looking for the brand more broadly,

but those are things that we could try to understand

with this data.

Historically, and in most of the work that I've done,

we treated TV ads as events.

So they had a specific time, in a specific place.

Usually, we looked at ads that were shown nationally

in the US, and we can measure the search spikes immediately

after a TV ad.

So this is a plot from real data,

where the Surface laptop ad in 2017 was aired.

And so we'd see this orange line,

where the ad was aired, and then the search spikes after.

And we weren't the first to show

that there are search spikes,

but what we tried to understand, because we had access

to more data is who these people are,

with respect to their demographics

and which types of devices they were searching from.

And even if there are differences in how the attention

shifted, with respect to the user characteristics.
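
A minimal sketch of the event-style measurement described here: count searches in a short window before and after an ad airing and compare. The timestamps, window length, and column names are assumptions for illustration.

```python
# Hypothetical sketch: search volume in a fixed window before vs. after an ad airing.
import pandas as pd

searches = pd.DataFrame({"ts": pd.to_datetime([
    "2017-05-01 19:55", "2017-05-01 20:01", "2017-05-01 20:02", "2017-05-01 20:03",
])})
ad_airing = pd.Timestamp("2017-05-01 20:00")

window = pd.Timedelta(minutes=5)
pre  = searches[(searches.ts >= ad_airing - window) & (searches.ts < ad_airing)]
post = searches[(searches.ts >= ad_airing) & (searches.ts < ad_airing + window)]

print(len(pre), len(post))                      # baseline vs. post-ad search counts
print("search lift in 5-minute window:", len(post) - len(pre))
```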

So another study that we did,

so this is like a lot to digest, I know, in one slide,

but it's a very exciting project,

and one that I was really excited about,

where we moved from this, right?

So before and most prior work, not all,

really treated TV ads as events,

where you have an ad showed at 8:00 PM,

and you look what happens right in the minutes after.

And the reason why you couldn't really look long-term

for any one advertisement campaign,

is because once you look longer term,

they're just so many confounders

that come into play,

including that the same ad

might be shown a few minutes later.

And so that's when you're looking at aggregate level data.

So what happens in aggregate,

all the people in the US,

and what happens after 8:00 PM on search,

and what happens for, you know,

just knowing that a particular ad was shown at 8:00 PM

in the US. What we tried to do more recently,

was connect users to their TV viewership

in a privacy friendly way,

at the individual household level.

So then we could say, we know that at least,

this box was tuned in to a TV commercial,

what happened for them?

So that enables us to do two things at least.

One is to look longer-term at what happens to people

who see TV ads, with respect to like their search behavior.

And it also enables us to look at advertisers

who advertise all the time.

So for those of you who are marketers,

or, you know, work on advertising campaigns,

you might know this,

but there's some advertisers

who like any minute of the day, are always on.

So for telecom, for things like food and beverage,

at certain times of the day,

you couldn't even do this event type analysis.

So it enables us to be able to look at more advertisers

and measurements.

A project that I worked on with Olivier is related,

but different, where instead of looking at TV ads,

we looked at TV show events,

and we looked at how people searched for information

about TV shows over the course of, for the most part a day,

but like before and after shows.

And what we wanted to understand is those dynamics,

but also could we jointly model the search behavior

and the click action,

so what people click on,

in such a way that would help us to do better

at predicting,

one, what people would likely want to see,

and also as a result, want to click on.

So this is just showing over time,

interest in the Super Bowl in 2016,

so you see this huge spike, when the Super Bowl starts.

It stays high, the interest,

while the Super Bowl is on.

But what's interesting is,

if we looked at what people were searching for

over the course of the 24 hours before, during the game,

and 24 hours after,

we saw a lot of the same searches,

so these are some of the top searches,

but what you see as indicated by these colors,

is that some were searched more before the Super Bowl was on

and some were searched more after.

So we wanted to look at both,

these dynamics of what people were interested in,

with respect to their searches,

but what might not be obvious to you,

is not only were people searching differently

for these topics,

but even for something like the Super Bowl,

they were clicking on different things, right?

So they were searching for Super Bowl,

but maybe before the ad, sorry,

before the TV show aired or the Super Bowl aired,

they were looking at, you know,

what the time was or when it was on,

versus after they're looking who won the MVP.

So they could have searched for that specifically,

but it was also those types of things

were also reflected in the clicks,

even for a generic term like Super Bowl.

So the way that we modeled it was using these dynamics

in the clicks, as well as topic modeling,

topic modeling the snippets,

and the text of the queries,

in such a way that enabled us

to do a better job at prediction

than if we were to not factor in these dynamics

and the end search.

So that was a really fun and exciting project.
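
For readers who want a concrete picture of the topic-modeling piece, here is a minimal sketch using LDA over toy query text; the actual model jointly handled the dynamics and the clicks, which this does not attempt.

```python
# Hypothetical sketch: fit LDA on query/snippet text and inspect per-topic words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "what time does the super bowl start tonight",
    "super bowl kickoff time and channel",
    "who won super bowl mvp",
    "super bowl final score and mvp winner",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}:", top)   # e.g. a "when is it on" topic vs. a "who won" topic
```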

And so finally, I'm going to end really briefly

on something that I worked on.

The motivation was to understand

whether showing diverse characters in TV ads,

led to better outcomes for businesses.

So basically what we were finding

and we know this to be true,

is that, more and more advertisers were including characters

in their ads, that reflected the diversity

of the people in the United States.

And so what we wanted to understand is like, you know,

is that good for business basically, right?

It seems like it was good from a social perspective

and the right thing to do,

but we want to know if it was good for business.

So when we started, we didn't know

how to get at these answers

of which advertisers were more inclusive

in their advertising,

versus not, because that data didn't exist.

So we had to create it.

So the way that we did it,

was we used off the shelf tools

for video extraction and image labeling.

So we took a video, we extracted information

using this video indexer that's available by Microsoft.

We then let the tool automatically label the age

and gender, but not race,

because the tool doesn't label the race of the users anymore.

And also we did a de-duplication of the actors

that were in the show.

So like, you know, somebody might turn to the side,

and then the image would show up twice in our data set

for a particular advertiser.

So basically after we extracted all the images,

in addition to automatically labeling them,

we ran a study on Amazon Mechanical Turk,

where we had people label the images

based on what they thought was,

who they thought was in the image

with respect to their gender, age and race.

And we ended up with about 6,000 videos,

and a bunch of different labels.

We asked for two labels for every image.

So what was interesting is that for gender,

people were able to label the characters pretty clearly

in the sense that the raters,

the two raters agreed most of the time,

but when we asked them about race,

only 69% of them agreed on white,

61% on blacks, 10% on Hispanics, and 15% on Asians.

So that's interesting just as an aside,

because these models are trying to do a better job

of labeling race, but even humans can't do a good job.

And that's something that has to be considered

when thinking about these models,

and actually putting them out into the world.
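
A minimal sketch of the rater-agreement check mentioned above, computing percent agreement and Cohen's kappa between two hypothetical raters; the label values shown are made up.

```python
# Hypothetical sketch: agreement between the two crowd labels per image.
from sklearn.metrics import cohen_kappa_score

rater_1 = ["white", "black", "hispanic", "asian", "white", "hispanic"]
rater_2 = ["white", "black", "white",    "asian", "white", "asian"]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
kappa = cohen_kappa_score(rater_1, rater_2)
print(f"percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```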

So we built a toolkit and I want to be sensitive to time,

so I'm just going to say very quickly,

on top of this data,

we basically came up with a set of inclusivity scores

for an advertiser, for the vertical, for the industry,

and using these scores,

we can measure the diversity in these different groups,

and plot them in such a way that an advertiser could go in,

and ask how they are comparing against other advertisers

in their cohort, whether that be at the product type level

or at the industry level.
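
A minimal, hypothetical sketch of an inclusivity-style score: the share of labeled characters in each group per advertiser, compared against a vertical benchmark. The data and the exact scoring rule are assumptions, not the toolkit's actual formula.

```python
# Hypothetical sketch: per-advertiser representation share vs. the vertical average.
import pandas as pd

labels = pd.DataFrame({
    "advertiser": ["A", "A", "A", "B", "B", "B"],
    "vertical":   ["retail"] * 6,
    "gender":     ["female", "female", "male", "male", "male", "female"],
})

share_female = (
    labels.assign(is_female=labels.gender.eq("female"))
          .groupby("advertiser")["is_female"].mean()
)
vertical_avg = labels.gender.eq("female").mean()
print(share_female)                       # per-advertiser share
print("vertical average:", vertical_avg)  # benchmark for the cohort comparison
```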

So this was the first step,

and I didn't get to connecting this to business outcomes,

but that would be the next step,

but still, it's an example of using computational tools

to understand what's going on in marketing.

So I will just say, there are, like,

some obvious things here, just as a, you know,

a couple of points in terms of the results.

Like we saw things like women were more likely

to show up in retail stores and health and beauty,

but less likely in electronics and communication

and vehicles.

Blacks were shown more for political,

government organizations and education,

but less so in home and real estate.

And seniors were shown more in insurance and pharma,

and less for apparel and footwear.

So there is some face validity here, you know,

to at least make us believe we were going

in the right direction for these labels.

But again, the connections to business outcomes

would be the next step.

So I think that really is it.

So I'm going to stop sharing my screen,

and hopefully the slides helped.

I kind of pulled out like pictures

that I think would make the points,

but yeah,

these are examples of using computation in marketing.

- Thank you, Shawndra.

Thank you, this is a lot of information, very exciting work.

And there's a few questions in the chat.

Some of them are technical, some of them are big picture,

so maybe we can go through them if you don't mind.

So you talked about how you do propensity matching

in the social network.

Someone wants to know whether you used

the neighbors to do that.

I'm guessing it's more

for logistic regression probably, or...

- Yeah. So at the time, it wasn't widely accepted,

and even now, like it's been criticized to use this

for marketing problems,

but at the time we used logistic regression.
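
For readers unfamiliar with the technique, here is a minimal sketch of propensity matching with logistic regression: model the probability of "treatment" (here, being a network neighbor) from covariates, then pair each treated unit with the nearest-scoring control. The covariates and data are illustrative, not the original study's.

```python
# Hypothetical sketch of propensity-score matching with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[25, 1], [40, 0], [31, 1], [52, 0], [28, 1], [45, 0]])  # e.g. age, urban flag
treated = np.array([1, 0, 1, 0, 1, 0])   # 1 = connected to an existing customer

propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

controls = np.where(treated == 0)[0]
for t in np.where(treated == 1)[0]:
    # Match each treated unit to the control with the closest propensity score.
    match = controls[np.argmin(np.abs(propensity[controls] - propensity[t]))]
    print(f"treated unit {t} matched to control {match}")
```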

- Now, someone also has

some pretty technical questions:

in your job, how much of the time do you spend

building and coding the prototypes?

- That's a great question.

So in my prior role at Microsoft,

I did not spend as much time as I do now coding.

So I think, I'm saying that to say

that the answer is, I can't give a general answer,

because I think each team is different

and it depends on the resources.

So I was lucky in my prior role

to have engineers that worked on our team

that could support research and building prototypes,

and I don't have that now.

- Thank you.

Now we go to a much deeper,

or maybe more philosophical question,

which is how do you balance ethics and outputs,

you know, at a time when companies

in the tech world are under scrutiny for using analytics?

- Yeah. So first of all, that's a great question.

And so let me just tell you,

like what happened over the course of working on the things

that I showed you. So back in 2003, 2004,

it was kind of like the wild, wild West.

Like there were no rules really, you know,

and the rule of thumb then,

was to talk to lawyers,

to make sure you're not doing anything

that is violating any laws.

Companies and researchers, and, you know

like even academics, and the IRBs,

like understanding this data a little bit more,

like we've all evolved,

and I rely heavily on whatever processes

are in place in the organizations that I belong to,

because they think a lot deeper about the implications

for policy, for privacy and for ethics.

And so I rely on them,

and you know,

I don't ever want to do anything

that violates anyone's privacy.

And so I think about it often, and I will say that,

because of how things have evolved,

there are a lot of changes in what you can and can't do

for data in companies.

And a lot of it is policy, right?

It's like the agreements that these companies have made

with their users and their consumers,

and less so about what's allowed by law.

And so your question is spot on,

and I think times are different now,

and some of the things that I showed you,

maybe couldn't be done now.

- Thank you, Shawndra.

So we're going back to a much more technical question.

Someone wants to know the format of the data

that you work with, is it CSV, Excel, XML, JSON?

- Yeah. It depends.

So...

these days, it's rarely any of those,

it's usually in some kind of data store,

whether that be a traditional SQL database,

or something that, you know, can handle much more scale.

Historically, like with the social media data,

we would get it from the APIs in JSON,

and it was easy to process that way,

but I would say a mix, but the data are so large now,

it's rarely in a flat file.

- Thank you.

Now we're going back to, you know,

a question that's more I guess higher-level,

so, you know, you've shown a mix of experiments,

observational research, you know,

more correlational versus causal.

So someone wants to know how you think

about using observational methods

versus experimental methods,

in the field or in-market applications

versus in academic research.

So it's the--

- I think that's a great question.

I mean, maybe this is, Olivier is the expert

on the academic research side,

but like I can tell you, like,

in terms of like what can get published.

So I think it's really hard to get work published,

if you can't make causal claims,

whatever method you're using.

Whether it be like observational methods

or running experiments.

- Especially in marketing,

in social science as you said,

there's a big emphasis on the mechanism and the why,

so there's still research that's more predictive,

trying to predict or optimize,

but it's true that there's definitely a big taste

for being able to understand what's driving the results

for sure.

- Yeah. So it's really difficult to get things published

in top journals, if you can't make causal claims.

In practice, it's better.

Like people get more excited about results

where you can show causality.

However, oftentimes prediction is a fine answer.

As long as it's repeatable and reliable over time,

like whatever you're doing.

So those are two things.

And then the final thing is,

there are just some things

where you can't run an experiment.

Like you just can't.

So it's like, even like,

so there are people who have run experiments

on social networks, for example,

like gifting certain products

and like seeing how it shares,

but you can't like make somebody make new friends.

I mean it's really hard.

So there are contexts where you have a wealth of data

on user behavior,

and you want to answer a question that's really important,

but an experiment just isn't possible.

And I'll add something, actually,

if you, so we didn't talk about surveys

maybe cause like that doesn't come up

in the causal context.

But the other thing is sometimes

if you take search, for instance, it can reveal a lot

about what people are thinking or doing,

that they might not reveal.

And not because it's like private or sensitive data,

just because like, if you ask me for instance,

what I think about the political candidate that I voted for,

maybe because of political cheerleading,

I will say something positive,

even if I don't believe it.

But so my behavior might tell you more

about like what I think about the candidate,

if that makes sense.

So I think there's still a lot of value

in using observational techniques and data

that is generated without experiments.

- Thank you.

That's very interesting. There's a couple of questions

that I think in some ways hark back to your course

a little bit.

So someone asks whether in these examples

that you shared with us,

was the person leading the projects knowledgeable

about the technicalities of getting such research done?

And then, you know, maybe relatedly, someone else asks,

you know, how do you relate the course

to your experience at Facebook and Microsoft?

And so, yeah, so it'd be the,

yeah, I think getting these types of projects done

and the management of data science projects.

- Yeah. So I think it's going to be directly related

to answer the second question first.

So it's based on like learning how to do this well now,

after also, you know doing things

in a more of an ad hoc way,

because like I wasn't required to be thoughtful

about timelines.

I mean, maybe I should have been as an academic,

I don't know.

I mean I'm saying this like, but it's different.

So I was forced to get better at this.

And also I learned from teams

that deliver over and over again really well.

So yes, it will be based on my experience,

but also based on, you know, sort of known ways

to lead successful projects.

So it's not just my opinion,

it's gonna also be based on project management skills

and...

And solutions for that.

As far as like project, I don't know exactly

which project the first question is asking about--

- I guess, I think maybe in general,

I think the type of project that you described,

does the person leading the project need to be knowledgeable

about the methodology and the technicalities

behind the research?

Are these projects being managed,

maybe by people on the management side?

- Is the question more like,

can you be a data science manager

if you're not like technical?

- I don't know.

The question is was the person

leading the projects knowledgeable

about the technicalities of getting such research done?

- Yeah. So it depends on who you're talking about.

So like, if you're talking about somebody

who is a technologist,

so of course, like they're going to be experts

in their field.

Like, that's pretty much what they're hired for,

like as data scientists and they'll be good scientists.

But that doesn't mean that you'll be able

to ask the questions right.

And again, like teaching students

how to ask the questions right,

is something that was part of the technical version

of the class.

This is altogether different,

it's not just like, are you asking the question right

so that you can get an answer?

It's also like, who will care?

Is your timeline aligned with, you know

the team that you're working with

who are your stakeholders?

And like, are they going to support you?

And support comes from any number of things.

Like it could be engineering support,

it could be PM support, it could be sales support.

And so let me just answer the question

just in case that was wrapped in it of like,

do you have to have a technical background?

I would say like, you need to understand

at a high level, like what's happening,

but there's nothing more valuable than a great PM

at getting to the bottom line

of what a project should be doing,

what their timelines are

and what they're delivering and to whom.

So I've seen, you know,

amazing PMs that were not computer scientists,

that really can lead in technical spaces.

So you don't have to be,

but you have to be willing to learn.

Like, cause you do have to understand what's going on.

- Yeah. Thank you, Shawndra.

So I think there's a question that's specifically

about the project on the search on TV ads.

Was there any insight into who specifically tended to search

after first seeing the TV ads?

For example,

were the consumers already likely to have interest

in the product, prior to seeing the TV ad

or the consumers who mostly just became aware

of the product in that moment?

I guess maybe some attribution maybe issues there.

- Yeah. So in one of the studies that we did,

it does look like it's newer people

that are searching in those moments after the TV ad.

I will say on a related study of just to connect it

to like things that I do know for sure,

without TV ads, we looked at how people search

for products and brands over time.

And it turned out that,

at least for a couple of technology products

that we looked at,

the makeup of the people searching changed

after the product had been launched for a few months.

So in the beginning

like people are trying to get information

about the new product,

but after, it was largely people

who already owned the product.

And therefore, you know,

it's not clear that those are the people

that you want to advertise to.

They certainly responded differently to advertisements

for the same product.

However, they were much more likely to respond positively

to complementary products.

So knowing that information to the point

of the question asker is like really, really important.

And it's hard to get,

like it's not easy to get that data in these contexts.

- Yeah. Thank you.

I think, you know, back to the privacy issues,

there's a question, how does this function relate

to kids' products, surveys,

and the whole protection of privacy laws.

I guess if you're trying to study products

that are targeted to kids, for example,

are there restrictions,

is it possible to do, or maybe there's a...

- So there are restrictions, I've never touched that.

Like, as you know, like, so I don't exactly know.

There's somebody at Kellogg who studies marketing to kids

on the academic side, but I don't know the answer to that.

I do know that these companies spend a lot of effort

trying to identify who's a kid,

so that they do not market to them.

But as far as like doing studies on them,

my guess is that it's like a big no-no,

like that's my guess, but I don't know.

- Thank you.

So I think going back to the, getting the job done,

so someone asks what challenges you have,

and/or what tools you need to do your job better?

- So...

so two things usually, bandwidth, right?

Because...

you just go further faster when more people

are working on projects.

And I would say the tools, usually it comes down

to more like data than, you know,

sort of having more computational power

or something like that.

It's like you're missing some piece of the data

that would enable you to do things

like understand like who these searchers are,

or something to get out the why.

A lot of times you can find a connection

between two things, even if you run an experiment,

but the why part is oftentimes like really hard to get at,

because you're missing pieces of the puzzle.

So I don't know if that answers the question,

because it's not a tool question,

but it's usually like limitations in the data.

- Thank you.

So we have only a couple of minutes left,

so we touched a little bit,

you touched on social networks, search advertising, TV ads,

so are there any other areas of marketing that,

in which you see data science having an impact,

that you'd like to briefly maybe mention or,

are these the three main buckets in which you've seen

most of the action?

- Yeah. So I'll say one that didn't come up

but that maybe I'll work on, is like customer journey.

I think the data today,

enable you to learn a lot more about customer journey,

and then maybe looking to the future,

like I'm not working in this space yet,

but it's exciting, things like product placement,

and even a product placement in video games

and virtual reality games.

Like I think, you know, that's just a whole nother space

to explore where technology and computational methods

will play or already play a role,

but pretty much like anything and everything,

you know, like we also didn't talk really

about like brand lift, or like really traditional things

like customer lifetime value, like all of those areas.

Like when you have more data on consumers,

like you can do more to understand what's going on.

- Yeah.

So yes. I think, you know, maybe final word.

I know one topic we haven't talked about,

which is one thing that you care about,

the diversity in tech.

And I know just, you know,

what do you see being done

and what more can be done to improve diversity

in the tech industry?

You know, we have a lot of students here,

maybe some younger alums in the panel,

any advice for both job candidates

and also for managers and recruiters on that front?

- Oh no, I have one minute to say all that?

(both laughing)

Well, first of all, let me say it like,

anything I say is my opinion,

because I'm not, like, a D&I, diversity and inclusion, expert.

And also like, even when we talk about diversity

like that can mean a lot of things, right?

Like there are all kinds of dimensions

where people may not be well-represented in groups.

But I will say in terms of what companies do,

I mean at least they're talking about it more,

in light of like a lot of things

that have happened in this country.

I mean, just a few things

that I think companies can do better,

and some of them are already doing this,

is like move these roles to be like more central

in the company,

so that they have a little bit more power,

and make sure, like for the companies

who don't have diversity and inclusion roles,

like have them, you know, start to build them.

Like sometimes in smaller companies,

like the burden falls on people

that are in underrepresented groups

to like make suggestions,

or even get involved to run programs.

So, you know, the first thing is like

just make sure there's somebody for whom it's their job,

and who are experts to think about, you know,

sort of how to improve things.

The other thing, like if we're talking about black women

in particular, part of the problem, I think,

maybe even the biggest problem

is like the lack of data because of the small numbers.

So like you always have a small-n problem.

This shows up in academia too.

It's like, you never know how they're feeling

because like, if you survey two people,

you can't report usually for privacy reasons

like what the answers are for those two people.

And so that makes it really hard to do things.

So the obvious things like recruiting differently,

you know, sort of training.

Some things that we could do,

is make sure that people who want technical roles

are prepared for technical interviews,

because that's oftentimes like the barrier to entry.

And then, you know, for all of these,

technology or otherwise,

like they just have to change the culture.

So that it's welcoming.

But my advice is, you know, do it, there's space.

Like one of the things,

and I don't want to trivialize, like, you know,

people's experiences, because they're valid,

and obviously like if you're in a minority group,

it's hard sometimes,

but I think also in technology, at least in my experience

it's like, you're rewarded for being good at what you do.

And so it's like just work really hard

to be really good,

and build networks that can help you navigate,

those times when it's, you know,

maybe not as friendly as you would like,

but there's space for everybody.

Like there's really space for everybody

who wants to participate in data science in particular.

- Thank you. That's a great note to end.

We're already past 10 o'clock,

but thank you very much Shawndra.

We're getting lots of compliments on the chat,

people want to hear more from you.

So, this is just the beginning.

So have a good day everyone and a good weekend after that,

and a happy Thanksgiving also, while we're there.

And so thank you again, Shawndra very much,

and everyone.

- Okay, bye. - Take care. Bye.
