From Strategy to Reality: Better Decisions with Data


Hi. I'm sure there are people out there, but all I can see are lights, so we'll probably do some questions at the end; if you have questions or comments before then, you'll just have to yell them out. My name is Darren Brisbane. I'm with the database, analytics, machine learning, and blockchain services team, where we have the honour of building these services so that you can, hopefully, use them. Today I'm going to walk through some of our thinking about how you can make better decisions.
So what do we actually mean by that? Well, we sat down at our whiteboard about ten years ago and thought about what it is that we believe matters here. The first piece is one of those things that seems really obvious, but people forget it: the only reason to do analytics is to help people make better decisions. If you're doing analytic work that can't be used to make a better decision, you're wasting time and resources, and we've noticed that a lot of people spend a lot of time doing work that, when you go back and ask how it's helping anyone make a better decision, isn't. The second piece: all data has value. You don't always know what the value of that data is, but somebody does. So how do we make this cheap enough and scalable enough that no data gets thrown away, and all of it becomes available and usable? And the third: this shouldn't be a priesthood with limited access. Everyone in the organization and outside it (citizens, stakeholders, other countries) should have access to that data, subject of course to access rules: we're going to keep private information private, and so on.
That gets us to what we've traditionally seen, and what we now think we need to move away from. Traditionally, analytics consisted of putting some of the data into a data warehouse and running some analytics on it. But those data warehouses are kind of expensive, so you ended up throwing away about 90% of the data, and the analytics tools you used to create all the charts and graphs and forecasts were also really expensive, so only a small number of data professionals ever saw them. What we ended up doing was not only spending a lot of money, but serving only the top: if the minister wants an answer on something, they'll get that answer in a couple of hours, but the minister doesn't make all the decisions, right? It's the person three levels, five levels, seventeen levels down who needs information to make decisions, and they don't get this stuff. If they're lucky, they get a spreadsheet that's six months out of date that might show something they can use.

So what we're trying to do, and where we've seen the real impact when this gets done, goes back to those rules. Think about, one, how am I helping people make decisions? I don't know what everyone is going to do, so how do I create capabilities, not projects, for people to be able to use data to make decisions? Let the projects be owned by the people using the data. And how do I make this cheap enough and easy enough that everyone can use it, not just a small number of people? When you take away all the marketing hype, that's a data lake. I'm sure you've all heard the term; we're in Canada, there are a lot of lakes. What do we really mean by a data lake? One, we mean it's big: it can grow to exabytes.
You probably don't have exabytes right now, but you will. Second, it means I can store and analyze all sorts of data: relational data, non-relational data, emails, structured charts, whatever. And it means I have purpose-built analytics tools that can actually use it, so I have easy ways to make queries using standard languages like SQL. And I want it to be cheap. I'm not supposed to say cheap, but I'm not in marketing; "cost-effective" is what the marketing people like to say. We actually just dropped the price: it's about 2.3 cents per gigabyte per month, actually I think 2.1 now, on the most expensive sort of S3. So if I use the very highest-quality, highest-data-transfer version of that object store, it's $23 a month for a terabyte, or $23,000 a month for a petabyte. But I have other options for data that's less frequently accessed, at as little as one-fiftieth of that cost, so my terabyte cost can drop from $23 all the way down to about 40 cents a month. We can make that really cost-effective, and I can query inexpensively, or do full data warehousing, with managed services that can be turned on and turned off so you're only paying for them as you use them, and give access to everyone.
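To make that arithmetic concrete, here's a quick back-of-the-envelope sketch using the rates quoted above. Actual S3 pricing varies by region and storage class, so treat the numbers as illustrative rather than a price list.

```python
# Back-of-the-envelope S3 storage cost, using the rates quoted in the talk.
# (Illustrative only; real S3 pricing varies by region and storage class.)

HOT_RATE_GB_MONTH = 0.023                     # S3 Standard, $/GB-month, as quoted
COLD_RATE_GB_MONTH = HOT_RATE_GB_MONTH / 50   # "as little as one-fiftieth" for cold tiers

TB = 1_000        # gigabytes per terabyte
PB = 1_000_000    # gigabytes per petabyte

print(f"1 TB hot:  ${TB * HOT_RATE_GB_MONTH:,.2f}/month")    # ~$23
print(f"1 PB hot:  ${PB * HOT_RATE_GB_MONTH:,.2f}/month")    # ~$23,000
print(f"1 TB cold: ${TB * COLD_RATE_GB_MONTH:,.2f}/month")   # ~$0.46, "about 40 cents"
```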
Now, a good question, and I'll bet everyone has pretty much the same answer: what is the most common analytics tool used by your people? Microsoft Excel. Okay, don't laugh. You're going to have to pull that out of the cold, dead hands of your analysts; they spent five years learning how to do pivot tables, so you are not taking away their Excel. But we can hook Excel into the lake so that their Excel gets fresh, current information. And there's still a role for the high-end tools, the MicroStrategy, Cognos, Tableau and so on, for professional data analysts. What Amazon has also introduced is something in between, called QuickSight. QuickSight gives you a little more power than Excel; it lets you do things like machine learning forecasts, predictions, and anomaly detection; it's no harder to use than Excel; and it lets people publish dashboards that can be shared. And again, it's designed to be cost-effective: it costs 30 cents for a session of up to 30 minutes, and the maximum per month is five dollars. So if you're somebody using this all day, it costs five dollars a month; if you use it twice a month, it costs sixty cents. We want to make this cost-effective to use throughout the organization, and that's kind of the key to putting these pieces together. Everything else I'm going to say today is about how you actually do these things, so if you want to fall asleep now, feel free.
I was joking with our people that if we really wanted everyone to get a good nap, we should have served poutine instead of pad thai and done the full Canadian experience. But have you seen this before? A few people have. This is Jeff Bezos's handwriting, from a United Airlines cocktail napkin from 1995. This was essentially the thinking that went into Amazon, and it's how we decide what to do whenever we enter a business or build a service. It's called the flywheel. Think about what this means: what's key is the customer experience. When you give a good customer experience, you get more traffic. Here we're talking e-commerce: if I get more traffic, I have more people who want to sell, so I have more sellers, which means more selection, which is a better customer experience. So I have a virtuous cycle, which creates growth, and growth gives you a lower cost structure, which means you have lower prices, which is also a good customer experience. As shoppers, I'm assuming most of you kind of like having more selection and lower costs. I do; I'm very cheap. This, by the way, was a white shirt that I spilled coffee on once. I couldn't get the coffee out, and my wife pointed out: new shirt, $50; tie-dye, $5.00, and you can't see the coffee stain. We think about this exactly the same way when we do data analytics at Amazon. It's not exactly the same terms, but it's the same concept: if I have a good customer experience for the people who are analyzing and using the data, they're going to share more, which means I'll have more people providing data and putting it into my lake, which means I have a more diverse data set, which is a better experience, and that makes my big data marketplace grow, which creates better operational efficiencies, so I can do faster innovation. This is the key to a successful data lake project.
There are a couple of things buried in here that aren't obvious, so I'll point them out. The customer experience, that's the people using the data: make it easy and fun for them to pull and use and get the information they need. But the data provider experience matters just as much: make it easy and simple for people to put data into the lake, otherwise you just get an empty lake. If I can get all these pieces going together, I get something that grows. Like at the Canadian Pharmacists Association, where they've done a whole bunch of interesting things, and one of the analytics apps they're making available to everyone is mymedications.ca, where you, as a patient or the loved one of a patient, can easily do analytics on things that used to be really hard to get: not just drug information but drug-interaction information and how those things come together, with consistent information about medications, available to everyone, not just physicians. Everyone who uses drugs or cares about someone who uses drugs, which is probably all of us in this room. I shouldn't say drugs; medications.
Or personalized content, like our friends at The Globe and Mail, making sure they're not just feeding readers the news, but feeding them the news they care about. The fun thing is that because this is all based on standardized building blocks, they were able to build out a graph to understand customer, or rather reader, interests in about 60 days, and it increased engagement by about 25%. What is engagement? It's the amount of time people spend on The Globe and Mail site reading and looking at articles. They're successfully figuring out what people are interested in so that they read more. You're all doing a lot of work to get your information out in public where citizens, residents, and others can see it; I hope you also care about that engagement. How do we make your material not just provide vital information, but be interesting enough that people want to read more and learn more about your programs and how they can help? And have you heard of this company?
It's a bookseller, as you can tell. I took this picture last week; it's about Mother's Day, and if you forgot about that, go call your mother right now, because you're in trouble. But we had an interesting problem. We originally built one of those traditional data warehouses about 15 years ago. We built it out of Oracle; it grew to about five petabytes of usable data, about 45 petabytes raw, and it couldn't grow any more, and it was too hard to use. So we moved to a data lake, using things like Redshift and EMR and the other Amazon tools, and in the last year and a half we've gone from 600,000 analytics jobs a day to about three million, while radically reducing the costs and increasing the usability. Let me dive into that a little bit. Everyone here has used amazon.ca or amazon.com at least once, right? Maybe? Okay, a vertical head motion indicates a yes response; a horizontal head motion indicates a no response; most of you that I can see are doing this, which usually indicates that you're using certain herbs that are now legal for recreational use here in Canada, which is fine, but okay.
So, if you've used Amazon, you've probably realized we have a lot of different businesses, just like there are a lot of different public sector groups. Some of those groups are really big, but even the big groups have huge differences. Take amazon.ca retail versus Amazon.in retail in India: if I go to the Canada site and look for phones, they're all going to be smartphones; if I go to the India site and look for phones, about half of them will be flip phones, because that's what that market is looking for. So you've got a lot of those differences, and then we've got some huge operations, warehouses and so on, and some small groups, like Twitch. Anybody here ever heard of Twitch? It's a service of Amazon that lets you watch other people play video games. I remember when this got started; we actually acquired the core and built it out, and I thought, first of all, who wants to watch other people play video games? Well, I was wrong about that: there was a League of Legends championship recently that had something like 40 million live viewers. Okay, I don't get it, but that's fine. The bigger question I had was, how do we make money showing other people play video games? The answer turns out to be that you can comment on the players: reading the comments is free, but writing a comment costs one cent. Can you believe we're getting hundreds of millions of dollars a year by charging people one cent at a time to trash-talk video game players?
But there's some analytics in there, and there are useful things to learn. All of these business units are different sizes, from fairly small units to medium-sized units to giant units, so we needed something that would scale with our ecosystem, just like you do, and we had three main goals. That eye chart on the right is all the tools we were using in 2016. We will distribute these slides; the way this works is that there's a survey you'll get at the end, and if you fill out the survey you get the slides, but only if you rate me above four. No, please, just give us honest feedback. So: we had way too many tools. What we wanted was a SQL-based solution, because that's the language people know to use for queries; we wanted to be able to use analytic approaches like machine learning and programmatic analysis; and, I'll talk more about this in a bit, we wanted to be able to both bring your own cluster and bring your own query.
So we created a data lake and named it Andes, because the source of the Amazon is lakes in the Andes. Get it? The jokes are not going to get any better than this, okay. By the way, if you're going to do a data analytics project, I've found that beyond all the technical keys to success we've been talking about, there's one small key: every successful project I've seen has a good name, a name that relates to your organization, not a name like "enterprise database source" or something that sounds nerdy. It's something that relates to what your organization does.
Anyway: Andes is the place for data at Amazon. Data producers publish their data into it, and data consumers use it. If I tear that up into services, the left side is how people get information into the data lake. Okay, easy enough, but how do you make that easy? Well, one thing we did is we looked around and saw that about 90 percent or so of our data is in one of three repositories: DynamoDB, which is a scalable NoSQL database; Postgres, which we use heavily for things that need geographic information, like shipping data; or a stream coming from some sort of smart device, like the robots in the distribution centers. The rest is something else. The goal here is to make it easy for producers. If I'm a producer and I have a DynamoDB database, I fill out a one-page web form: I give it the endpoint of the database, I provide some security credentials, and I give a few pieces of information that we can't automatically discover, like the email address of the data owner, a short description of the data set, and a 50-character description of the data set that we call the micro-description. So the micro-description might be "toy sales in Canada," and the longer description might be "a collection of sales divided by province and by date," and so on. You also tell us which divisions it applies to; in the toy example, that would be retail, toys, and so on. And that's it. The time to bring a data set in, if you've got a DynamoDB database, is about five minutes. Same thing with Postgres: it's different underneath, but it's pretty much the same form. If it's a stream, give us the stream endpoints and tell us what the stream is. And if it's something else, whether you're using the Neptune graph database, or Oracle, or MySQL, or one of the many other data technologies, or it's external data coming from outside of Amazon, like census information or weather information, just get it into the S3 repository and we'll take it from there. The sketch below shows roughly what that registration might look like.
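This is a hypothetical sketch of the kind of metadata the one-page producer form collects. The field names and values are illustrative; the talk describes the contents of the form, not the actual schema Amazon uses internally.

```python
# Hypothetical registration payload for the one-page producer form.
# Field names are made up for illustration.

registration = {
    "source_type": "dynamodb",                   # or "postgres", "stream", "s3"
    "endpoint": "arn:aws:dynamodb:ca-central-1:123456789012:table/ToySales",
    "credentials_role": "arn:aws:iam::123456789012:role/AndesIngestRole",
    "owner_email": "data-owner@example.com",     # who analysts contact with questions
    "micro_description": "Toy sales in Canada",  # the 50-character summary
    "description": (
        "A collection of toy sales divided by province and by date, "
        "updated daily from the retail order pipeline."
    ),
    "divisions": ["retail", "toys"],             # which divisions it applies to
}

assert len(registration["micro_description"]) <= 50
```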
So we're able to make it very easy for people to put stuff in. Our goal is 99 percent self-service; we're currently at about 97 and a half, so about one time out of 25, somebody tries to do this, it doesn't work, and they have to call IT for help. We're getting there. All of that lands in a raw S3 bucket.
It then gets "cooked" using a tool of ours called Glue. What does it mean to go from the raw bucket to the cooked bucket? It means the data is staged into a compressed, columnar form; we use the Parquet format for that. So I have the raw bucket, which is really just there for lineage reasons, and then the cooked bucket, and now I have the data available for people to use. A minimal sketch of that raw-to-cooked conversion is below.
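Here's a minimal sketch of that raw-to-cooked step written as the kind of PySpark job Glue generates. The bucket names and partition column are hypothetical; a production Glue job would typically use Glue's DynamicFrame APIs, its data catalog, and job bookmarks for incremental loads.

```python
# Minimal raw-to-cooked sketch: read raw JSON, write compressed,
# partitioned Parquet. Bucket names and columns are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("raw-to-cooked").getOrCreate()

# Raw zone: data exactly as it arrived from the producer.
raw = spark.read.json("s3://example-lake-raw/toy_sales/")

# Cooked zone: columnar Parquet, partitioned so queries scan less data.
(raw
 .withColumnRenamed("prov", "province")   # light cleanup/renaming
 .repartition("province")
 .write
 .mode("overwrite")
 .partitionBy("province")
 .option("compression", "snappy")
 .parquet("s3://example-lake-cooked/toy_sales/"))
```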
So what does it mean to use the data? Well, the first thing you have to do, if you're a data analyst, is find your data set. So we set up a search engine. This will look familiar to those of you who may have used Amazon in the past, except it's for data sets. It's built on a standard open-source engine called Elasticsearch, and indeed it's deployed using the Amazon Elasticsearch Service, the service that makes Elasticsearch easy to deploy. You might notice I have names of data sets, and owners, and data set sizes, and reviews and ratings, because we ask data analysts to tell each other how good each data set is. I was at a meeting last December, and somebody walked in a little late, and everyone said, "hey, it's Mr. Two-Star." I feel highly confident that he left that meeting to go clean up his data set. Why is the name of the owner so important? Because let's say an analyst is trying to do something and there's a column they don't understand. I don't want them to call IT; IT doesn't know what that column means, and would just pass the question on to the data owner. So get IT out of the loop: go directly to the data owner and ask, "hey, what does this column mean?" And if the data owner keeps getting the same question, maybe they'll relabel the column or do something to make it easier to understand. In any case, I've got these data sets, and down the left is what's called faceting, which is an automatic element of Elasticsearch, so I can drill in easily and say I just want data about clickstream, or just glance views, or something like that. A sketch of that kind of faceted query is below.
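This is a hypothetical sketch of a faceted data-set search against an Elasticsearch endpoint. The domain URL, index, and field names are made up for illustration, and a real Amazon Elasticsearch Service domain would normally also require request signing or an access policy; the point is that the left-hand facets come from a terms aggregation.

```python
# Hypothetical faceted search against an Elasticsearch index of data sets.
# The "aggs" section produces the facet counts shown down the left of the page.

import json
import requests

query = {
    "query": {"match": {"description": "clickstream"}},
    "aggs": {
        "by_category": {"terms": {"field": "category.keyword"}},
        "by_owner": {"terms": {"field": "owner.keyword"}},
    },
    "size": 10,
}

resp = requests.post(
    "https://search-example-domain.ca-central-1.es.amazonaws.com/datasets/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_source"].get("rating"))
```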
The next screen, which I can't show you because I'd have to redact all the information, drills into one of these data sets: here's the shape of the data, here's how many columns there are, these columns are integers, these are floats, these are strings, here's how many rows there are, and here's some sample data. Then you decide whether to use it. If you do, you've got three choices. One choice is to directly query the data as it sits there, using a tool called Amazon Athena. That's a serverless query tool, actually based on Apache Presto under the hood, but all I have to do is nothing: it's already preset as a table. It's sitting there as an S3 object, but it acts as a table, so I can just run a SQL query on it. My second choice is to say, give me a copy of that data, because I might want to enhance it or change it or combine it or do something with it. It copies the data into my S3 bucket, and once I've done that, I can use whatever tool I want, like EMR and the whole Apache Hadoop family of Hive and HBase and Spark. And my third choice is to copy it into a Redshift data warehouse; if I don't have a data warehouse, I click here and it sets up a Redshift cluster for me. So I have all of these choices for doing the analytics, and doing it easily. A sketch of the first, direct-query option is below.
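Here's a minimal sketch of the "query it where it sits" option with Amazon Athena via boto3. The database, table, and bucket names are hypothetical; Athena treats the S3 objects behind the catalog table as an ordinary SQL table.

```python
# Run a SQL query against S3 data through Athena, via boto3.
# Database, table, and result-bucket names are hypothetical.

import time
import boto3

athena = boto3.client("athena", region_name="ca-central-1")

qid = athena.start_query_execution(
    QueryString="""
        SELECT province, SUM(units_sold) AS units
        FROM toy_sales
        GROUP BY province
        ORDER BY units DESC
    """,
    QueryExecutionContext={"Database": "example_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Athena is asynchronous: poll until the query finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```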
What we often see happen is that data scientists or data analysts will take a data set, enhance it or change it or make it more interesting, and then donate it back into Andes as a new data set. Part of the notes is to show the lineage, to say this data set was built out of these other data sets. And that lets us find interesting correlations we didn't know about. One of the data analysts was showing me a fascinating correlation between Lego sales and weather, so now we're trying to figure out whether that's just a coincidence, like that ten-year period where hemlines and gold prices were correlated, or whether it actually means something. This is where the interesting things begin, because having all this information available to everybody lowers the cost of curiosity. It means if I want to figure out what's happening, I can give it a try; I can do experiments; I can take advantage of the cloud. Here's one of the ways the cloud is so powerful. Let's say I'm going to do a big regression analysis that might take ten days to run on a server. You probably know that at Amazon we charge by the hour; actually, that's not true, we charge by the second. If you look at an Amazon bill and you used something for a minute and ten seconds, you're charged for exactly that fraction of an hour. So running one server for ten days, or ten servers for one day, or a hundred servers for about two and a half hours, or a thousand servers for roughly twelve minutes, or, say, four thousand servers for roughly three minutes, costs exactly the same amount. So why wait ten days when I can get the answer in three minutes? The arithmetic is below.
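Here's the per-second billing arithmetic spelled out: a fixed amount of compute (one server for ten days) costs the same however you slice it. The hourly rate is a made-up figure for illustration.

```python
# Per-second billing: 240 server-hours cost the same however you slice them.
RATE_PER_SERVER_HOUR = 0.10          # hypothetical on-demand price, $/server-hour
TOTAL_SERVER_HOURS = 1 * 10 * 24     # one server for ten days = 240 server-hours

for servers in (1, 10, 100, 1_000, 4_000):
    minutes = TOTAL_SERVER_HOURS / servers * 60
    cost = TOTAL_SERVER_HOURS * RATE_PER_SERVER_HOUR
    print(f"{servers:>5} servers x {minutes:8.1f} minutes = ${cost:.2f}")

# 1,000 servers finish in ~14.4 minutes and 4,000 in ~3.6, close to the
# "twelve minutes" and "three minutes" quoted in the talk.
```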
And this means that the things that used to be very frustrating, where the analyst sets a job to run, goes home, comes in the next morning, and hopes it concluded and didn't just produce an error message, go away: now they can do a hundred of those jobs in a day, actually get real depth into the information, answer questions we didn't know we could answer, or take that information, put it into machine learning, do interesting predictive analytics or interesting correlations, and submit the results back in. And once we learn things, it works back into our marketing. I'll give another Lego example these people found, which I find kind of interesting: people who collect Legos will buy all sorts of Legos, including Batman Legos; people who like Batman stuff will buy Batman Legos and have no interest in other sorts of Legos. Don't try to sell the unicorn set to the Batman people; they're just not going to buy it. Can you see how this helps us make sure our marketing is more relevant? I'm sure you can think of lots of other correlations like that, around citizen behavior and economic behavior, that help you avoid wasting effort on things that will just be wasted. This is really where data analytics becomes interesting: not just finding fraud and abuse, though it does that well too, but finding new opportunities and things we didn't know were there.
From a technical point of view, it's just microservices, a chain of microservices, and that's how we built it out. Looking across the chain: you collect the data, that's the ingest. You store the data; that's mostly about security, because as I put the data into S3, I have to assign appropriate security to it, since, as I said, it's accessible to everyone within the rules. Take that Amazon example I gave: I work for Amazon Web Services, so when I went to look up something on Legos, I didn't get the data; I got a form to fill out that said, you work in Web Services, why do you need data on Legos? I filled out that form, it got passed to the data owner, I said I wanted to use this as an example and wouldn't use any customer information, and they sent back: there's no customer information in here, so you can use this one, within the security parameters. Now I have permission. So you work all those rules into the store step. Discover is that search engine I was showing: how do we make it easy for people to find what's out there? Subscribe is how you connect to data that changes every day: when I make my own copy of a data set to play with, I can say just give me one copy, or subscribe me to the daily changes or the hourly changes that come through. And then of course we have the services that actually deliver, and the analysis tools like Redshift, EMR, and Athena. And it doesn't have to be Amazon tools; we have great partners like Cloudera and Tableau and others who do interesting things. But it's all a collection of services.
Now, how to do this is pretty straightforward, but there are a lot of pieces and a lot of work, so we had customers say, why can't you just automate this? And we said, hey, that's a good idea. We call the result Lake Formation. This is a service that is still technically in preview; you can sign up for the preview right now, which is what we call a public beta, and it will be generally available very soon. It's about building out the kind of data lake I just talked about in an automated way. That means it comes with blueprints to connect to various sources. As I said, in our world those sources are DynamoDB, Postgres, streams, and other; in your world they might also, or instead, be Oracle, or DB2, or mainframe DB2, or in some agencies Informix, or Microsoft SQL Server, or whatever combination makes sense. So we put together those sources with blueprints, to make it easy to pull the data in and then crawl it. (This shows how bad we are at marketing: we use large bugs to indicate how you want to deal with your data. It's supposed to be a crawler.) That's how I get it from the raw to the cooked. Then Lake Formation has simple security management to help me figure out how to do table-level, row-level, and column-level security. IAM is Identity and Access Management, my security rules; KMS is the Key Management Service, which manages the encryption keys, because everything should always be encrypted in today's world, so it's automatically encrypted in motion and at rest, and that's how you manage the keys. We don't keep the master key; only you have the master key. That way, if we're forced to hand over your data by some agency, it's encrypted. We don't want your data; we have enough trouble keeping up with our data. Your data is your business. And then finally, on the right, you have the self-service discovery and combined analytics. The interface actually looks like this: I just have some buttons to click to create a data flow, set up table permissions, search my data catalog, or monitor the activity and see what's happening. Lake Formation is a zero-dollar service; it's free. Why would we do that? Well, the services underneath aren't free. What it's doing is acting as a set of wizards that configure other services, like Glue and CloudWatch, to make it easy for you to set up this kind of data lake. If you've been paying attention to Amazon services, we're doing a lot more of these: for example, Control Tower is a set of wizards that does the same kind of thing with security, and App Mesh is a zero-dollar service that does the same kind of thing with microservice connections. Lake Formation helps you do that with data lakes. A sketch of what the permissions piece looks like in code is below.
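This is a minimal sketch of Lake Formation's table- and column-level permission model via boto3. The role ARN, database, table, and column names are hypothetical.

```python
# Grant an analyst role read access to specific columns of a lake table.
# Names are hypothetical; row- and column-level rules use the same call.

import boto3

lf = boto3.client("lakeformation", region_name="ca-central-1")

lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "example_lake",
            "Name": "toy_sales",
            "ColumnNames": ["province", "sale_date", "units_sold"],  # no private columns
        }
    },
    Permissions=["SELECT"],   # read-only: analysts can query but not change it
)
```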
To mention those services underneath briefly: Glue is both a data catalog and an ETL tool. It writes ETL jobs in Python and executes them with PySpark, so it's a way to move data, cleanse data, and get it from one place to another. EMR is Hadoop: it's 21 open-source projects, Hadoop, Spark, Hive, HBase, Presto, Zeppelin, Livy; I can't rattle off all 21 at once. What's important is that it's really easy to set up and go. If you've ever installed any of these things, it's like the scavenger hunt of your nightmares, trying to pull all these open-source pieces together and get them to work with each other. Here you just pick which ones you want, tick up to 21 checkboxes, hit go, and it sets it all up for you. What I really like about this, though, is that this is how we get cost-effective: I can spin up the cluster in minutes, run my job, spin it down, and only spend money while I'm running the cluster. That makes it really easy and really cheap to do these formerly expensive analytics. A sketch of that transient-cluster pattern is below, and then an example.
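This is a sketch of the spin-up, run, spin-down pattern with a transient EMR cluster via boto3. The names, release label, instance types, and job script are hypothetical; the key setting is that the cluster terminates itself, and stops billing, as soon as the step finishes.

```python
# Transient EMR cluster: spin up, run one Spark job, auto-terminate.
# Names, sizes, and the job script location are hypothetical.

import boto3

emr = boto3.client("emr", region_name="ca-central-1")

response = emr.run_job_flow(
    Name="transient-analytics-run",
    ReleaseLabel="emr-5.29.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
             "InstanceCount": 1, "Market": "ON_DEMAND"},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
             "InstanceCount": 8, "Market": "SPOT"},   # ~80% cheaper, reclaimable
        ],
        "KeepJobFlowAliveWhenNoSteps": False,          # terminate when the step ends
    },
    Steps=[{
        "Name": "run-model",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/model.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster:", response["JobFlowId"])
```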
An example of that: our friends at BC Hydro. They need to model hydroelectric output at a dam, with a lot of inputs that have to do with weather, water temperature, and other things, to figure out what settings to put on the turbines to get the most out of the dam without flooding any communities or messing up the voltage. They used to run these models on local hardware; they had servers at the dams. A run took about 20 hours, and they needed to do at least ten runs a month, and you all know how Murphy's Law works: it breaks in hour 19, and then you have to set it up and do it all again. So we put this on the cloud. What happens when they run the same models on the cloud? Ten minutes to spin up the cluster, an average of twelve minutes to run the model, five minutes to spin down the cluster, and $10 per model. And then they figured out, hey, we can do this using AWS Spot. If you're not familiar with Spot: it's spare capacity, about 80% less expensive, but we can take it back with two minutes' warning. So they run on Spot, and if Spot's not available they spend the $10; when it runs on Spot, it costs them two to three dollars. So we took something that was costing a huge amount of money, and that, by the way, required IT staff to support servers in dams, which is not a great environment (it vibrates a lot and it's wet, so not the best possible place for a server farm), and turned it into something that is cheap, easy, repeatable, and getting cheaper over time. And because this is now so inexpensive, they can do what they couldn't afford to do before, which is to ask: okay, this is what the model predicts, but what if the weather forecast is wrong and this is the actual temperature; what would that change? So they can run multiple models and really dive into how BC Hydro can do this as efficiently as possible. Again, this is the whole idea: have the data available to everybody who needs it, easily accessible (in this case it's all streams, or batches from public sources), and be able to run it quickly and efficiently through systems that can be spun up and spun down, so you only pay for what you're using. It's not like the old days, where you buy a car and hope you predicted what you'll need for the next five years; it's more like riding in a taxi, where you only pay while you're moving, except of course you get to drive.
The other service you saw me talk about here is the Amazon Elasticsearch Service, Elasticsearch being a popular search engine. It's actually a REST engine on top of the Lucene Java library, which most search engines today are built on. Elasticsearch is easy to develop on, has lots of standard templates, is extremely widely used, and is really difficult to manage. So customers came to us and said, could you make it easy to manage, and we said okay. The Elasticsearch service does not change Elasticsearch; it just provides a management tier, so it takes a couple of clicks to set up the cluster, grow it, shrink it, spin it up, spin it down, whatever you want to do with it.
Then there's that query tool I mentioned, Athena. Athena is, instead of setting up the EMR cluster, just running the query. So why doesn't BC Hydro do this with Athena? Their queries are a little more complex, and they need some things from Spark and some other pieces. But if I'm just doing a SQL query, I can do that directly with Athena: I don't have to load anything, I just treat the object on S3 as a table and do a select statement, an insert statement, or an update statement. We find this is a great way to do ad-hoc queries or infrequent queries. If I'm going to be doing the same query over and over, I probably want Redshift, which is that data warehouse.
The last thing I'll drill into a little bit is visualization. As we said, QuickSight is our tool for visualization and simple machine learning, set at a level where everyone can use it. You get dashboards that look like this, and they're interactive: I can drill down on them, pick any of those areas or one of the graphs, and get lower-level information. So this might be a graph at a Canada level, and I can drill down and get it at the provincial level, and drill down on that and get it at the municipal level, and see that information easily. And you can embed it in other things. Here's a fun one: the U.S. National Football League.
Football players today are actually data centers that weigh about 200 kilos and run into each other at high speed. Each football player has something like 15 to 30 sensors on them, measuring things like acceleration and speed and how hard they threw the ball. Last year the NFL decided that every team gets every other team's data, so every team went out and hired a bunch of data scientists, and then they used QuickSight to distribute that data so everybody could use it. The embedding is the point here: they're not looking at QuickSight, they're looking at an NFL stats application, and QuickSight is just embedded inside, which is an easy thing to do programmatically (a sketch is below). So we get something like this: "Super Bowl 52. As time ran down in the first, Nick Foles was about to complete one of the most improbable plays of the game. With each tick of the clock, AI from Amazon Web Services processed thousands of data points to generate real-time insights, proving that a 19% chance was all Foles needed to change the course of history. Welcome to the next generation of football." Okay, we're in public service; our stuff is maybe less popular than sports, but a lot more important, and being able to put up those kinds of live predictions and live information is great.
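Here's a sketch of embedding a QuickSight dashboard in your own application, the way the NFL app does. The account and dashboard IDs are hypothetical; the call returns a short-lived URL that the application drops into an iframe.

```python
# Fetch a short-lived embed URL for a QuickSight dashboard via boto3.
# Account and dashboard IDs are hypothetical.

import boto3

qs = boto3.client("quicksight", region_name="us-east-1")

resp = qs.get_dashboard_embed_url(
    AwsAccountId="123456789012",
    DashboardId="11111111-2222-3333-4444-555555555555",
    IdentityType="IAM",                 # use the caller's IAM identity
    SessionLifetimeInMinutes=600,
)
print(resp["EmbedUrl"])                 # embed this URL in the application
```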
One of the things I love about this is that in QuickSight I can automate anomaly detection. All the text you see on there was automatically generated; QuickSight can, all by itself, look at the chart and say, revenues across Canada were up 3%, but Quebec usually represents 16 percent of revenue and this month it was 19 percent, and then start drilling down to see why those anomalies were there. Or forecasting: here's my chart showing whatever I care about for six months; I just drag it out to the next six months, and there's my forecast, and it's an intelligent forecast that captures things like seasonality and trending, with a good confidence band. And here's an example of those plain-language narratives, currently, sorry, English only. I tried to convince my French translator to translate all of that into French, and she said no, that's too much work; but the automatic generation, which is English now, will add Spanish, French, and some other languages in the coming months. So we're able not just to point the anomalies out to you, but to actually spell them out. The things in green and red, I know it's a little hard to read: the green ones are saying this is above the target, and the red ones are saying this is below the target. So it's easy to pull those things out in plain language, which also becomes really good feedstock when you want to do machine learning, because plain-language narratives are easy material for machine learning to use for analytics and for building machine learning models.
So: we have a lot of Canadian customers who, we are very proud, have chosen Amazon as their partner. We have stories about all of these; more importantly, they have stories about how they're using Amazon Web Services for some sort of analytics to help them do their mission better. It gets back to the point I started with: helping people make better decisions. Helping the operators at BC Hydro better manage the turbine settings and the electrical output of the dams; helping the people at MPAC make better decisions about land management in the province of Ontario; helping the people at ICAO help airlines optimize international route operations; and so on. All of these are stories about what people can do to make better decisions, and to help others make better decisions. With that said, I decided to finish about 10 minutes early, so I have time for some questions. Let's see, I know somewhere out here is our lovely room host with a microphone; there you go. You can all turn the lights back up now, or maybe you can't, okay. If you do have a question, raise your hand or just yell it out, I don't know. Or was my boring monotone enough to put you all to sleep, and you're all just kind of waking up now? Okay, it's the shirt. As one of the people who works with me pointed out, if I get hit crossing the street, all the driver has to say is "I was blinded by the shirt," and maybe no problem. Yes, sir. "Test, one, two. Very nice shirt, by the way." Thank you.
"Can you please elaborate on the blockchain services that you guys are offering?" So, I just gave a whole hour on analytics and you want to hear about blockchain? Okay, I won't ask who's heard of blockchain, because you all have. How many people here think you have an actual use for blockchain? Okay, maybe it's the lights, but I see about three hands. So let me say a few things about blockchain. One: all new technologies kind of get overhyped, then drop off a bit, and then come up into usefulness, but have you ever seen anything that got hyped so much and fell off so steeply? The last time in my career I remember a technology being so overhyped and so under-delivering was probably cold fusion, which turned out to be measurement error and not actually a scientific breakthrough. But I believe we are right now going through the transition of blockchain from an overhyped technology into something that's actually useful for real work. If you're not deeply familiar with blockchain, it's an immutable ledger technology: it keeps records that can't be changed and that are distributed among multiple locations, so that you don't have to trust any given source. As it works out mathematically, at least two-thirds of the people involved have to be mostly honest most of the time, and then you can catch cheating; this comes from what's called the Byzantine generals problem, where you have to assume that at least some of your generals are going to stab you in the back. But this gives us a way to distribute information in a trustable, usable way across a network, so that I can do interesting things. For example, micro-loans: we're working with the government of India to set up blockchains that go across multiple institutions to distribute the tranches for micro-lending, so that there are ways to give out millions of small loans. When I say small loans, I mean $100 or less, but in India you can start a business on that. And we can distribute and manage these without any one organization owning them, because, to be honest, the Indian government has found that it can't always even trust its own departments with money, and the departments don't trust each other either. So if we set up this distributed trust, they keep each other honest.
I'm also seeing it used in interesting ways in Peru, to manage land titles. What we've found is that there are open-source standards that create these systems. I'm not talking about Bitcoin here; I'm talking about things like Hyperledger Fabric, which is an open-source way to have distributed applications. What we're doing with Amazon Managed Blockchain is simply making it simple and easy to deploy those fabrics. If you want to set up a Hyperledger Fabric network, you go into the console and create the fabric; it's about 20 clicks. You have to pick a few things, like how big you want it to be, how many members, and what the voting rules will be (does the creator own it, is it majority-wins, whatever). Once you've set that up, you just issue invitations: you can invite anyone else who has an Amazon account, and if they accept the invitation, they add some servers into the fabric and pay for their own resources. So it's very simple to set up the fabric and manage it, and you can focus on the apps and on running them. A rough sketch of the same setup through the API is below.
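This is a rough sketch of the same network setup through the Managed Blockchain API via boto3 (the console walks you through the identical choices). All names, versions, and the voting threshold here are hypothetical, and the exact parameter shapes may differ from what's shown.

```python
# Rough sketch: create a Hyperledger Fabric network on Managed Blockchain.
# Names, version, edition, and voting threshold are hypothetical.

import boto3

mb = boto3.client("managedblockchain", region_name="us-east-1")

network = mb.create_network(
    Name="example-fabric-network",
    Framework="HYPERLEDGER_FABRIC",
    FrameworkVersion="1.2",
    FrameworkConfiguration={"Fabric": {"Edition": "STARTER"}},
    VotingPolicy={                          # the voting rules picked in the console
        "ApprovalThresholdPolicy": {
            "ThresholdPercentage": 50,      # majority wins
            "ProposalDurationInHours": 24,
            "ThresholdComparator": "GREATER_THAN",
        }
    },
    MemberConfiguration={                   # the founding member
        "Name": "founding-member",
        "FrameworkConfiguration": {
            "Fabric": {"AdminUsername": "admin", "AdminPassword": "ChangeMe123!"}
        },
    },
)
print(network["NetworkId"], network["MemberId"])
```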
In the first generation we're doing Hyperledger Fabric; that's what is live now. We will soon be adding Ethereum, an alternative framework that uses a cryptocurrency (Ether) to manage the resources. It's an interesting area. I don't think it's going to change the world, but I think it gives us another good option in the list of data technologies: relational, key-value, in-memory, graph, and now a blockchain that serves specific needs. I hope that was reasonably clear; I don't have any slides for it in this presentation. We're starting to see a world of the right tool for the right job, where things like key-value databases, document databases, and graph databases make more sense for a lot of jobs than the old table-and-column relational databases, so the key is picking the right technology for what you want to do.
All right, does anyone have an easier question? Yes, sir. "I've got a question about the Amazon Elasticsearch Service. It's been around for a while, and looking at the website, Open Distro for Elasticsearch is coming out with user authentication. Do you know when that will be available?" So let me define a few terms here. Elasticsearch, the open-source Elasticsearch, has been around since 2009. Amazon launched the Amazon Elasticsearch Service in 2015, so we're coming up on the fourth anniversary, and it's been a very fast-growing service with a lot of customers using it, although what I find amusing is that only about a third of our customers use it for search; more of our customers actually use it for analytics, which it's also a very good tool for. The problem we had is that Elasticsearch is open source, but there's a company called Elastic.co that supports and manages it, and Elastic.co stopped putting the innovations into the open-source version. The business model they've decided on is to offer non-open-source software that extends Elasticsearch, and we had a lot of customers come to us and say, I don't like that; I'm using open source, and I want it to stay open source. So, in collaboration with Expedia and JPMorgan Chase and about 50 other customers, we helped put together the idea of an Elasticsearch open distro: a fully open-source distribution of Elasticsearch that adds some of the functionality not currently available in open source, like user authentication, other security features, encryption, and so on. The Open Distro is currently available; user authentication is in there in beta, and I believe the plan is for it to go live either very late in May or very early in June, and we think we'll see more and more innovation on the Elasticsearch Open Distro. So again, the point, as it usually is at Amazon: our customers told us, we want to run this open-source thing, but run it easily, and we're providing the infrastructure around it. That's what's going on with Elasticsearch.
"Hi." I can see you... where are you? Oh, there you are; okay, keep it short. "I think one of the major challenges I can see is how you actually build trust with your partners and clients and customers, so that they share with you the current and relevant data you're looking for." How do you get customers to give you data, especially since they've been taught not to trust anyone, because they gave all this data to places like Facebook and are not happy with how it got used? A lot of that, we find, is transparency. Some of the more interesting things I've seen happen around making data open are the open data movements. The city of Chicago, in my lifetime, has become a leader in transparency with data. They have put the majority of their civic data online, made it publicly available, and people made applications out of it on their own. One of my favorite applications, made by some folks in Chicago, is wheresmysnowplow.com, which lets the citizens of Chicago figure out where the snowplows are. And when they notice that a snowplow has been parked in front of Denny's for about three hours, they have a tendency to call up the Chicago roads authority and say, hey, where's my snowplow, except they use more profanity, because they're in Chicago; that's the way it works there. But this has actually been a way to help citizens better understand what's happening in their city, and similar applications do things with property taxes and sewage treatment and zoning issues, and actually get citizens involved in zoning. I used to serve on the city council in Portland, and I could never get people to care about zoning unless it was their own house being affected, but we all know how it really is: everyone wants to talk about the prime minister, but that local zoning commission affects your life a lot more than whoever is sitting in the prime minister's office. So these kinds of things really matter to people. But I'm not really answering your question about how we get the data. Part of it is thinking about what data you really need to make a better decision, and then what proxies you can use for that data; often, for data we think we need from people, we can find a good proxy that helps us make the decision. And analytic data, this is one of the big keys, doesn't need to be precise; it needs to be accurate. Precise: what's the difference between "my revenue will be 1.81 billion" and "1.82 billion"? Who cares; you're not going to make a different decision based on that. Accurate: my revenue is going to be a bit more than last year, or a lot more than last year, or a little less than last year. That's what you use to make a decision.
I don't need exact information on every citizen to be able to make decisions; I need directional data that helps me understand the changes in people's lives, not each person's life, because the analytics that help you make decisions, unless you're actually delivering social services or medical care, should not be focused on a person; they should be focused on groups and trends. That's the kind of work we can do, and have been able to do, effectively. So I'll throw out a couple of things we can do to help you with this. Whether you know it or not, if you're an AWS customer, you have a Solutions Architect who supports you; if you don't know who that Solutions Architect is, ask your Amazon representative and they'll be happy to hook you up. One of the things a Solutions Architect can do is set you up with what we call an immersion day, where we come to you with a team and give you one, two, or three days of hands-on education on how to use and manage these services, and maybe help you build out a prototype of what you might use. The other option we offer to important customers, and the government entities in Canada certainly count, is what we call a data lab. A data lab means you send four to eight people to us for about a week, we lock you in a room with a whole bunch of Amazon engineers, and every couple of hours we slide a pizza under the door. Developers are machines that turn pizza into code, right? So you've got to provide the pizza, and it can't be deep-dish pizza; it's got to be Neapolitan to fit under the door. But in any case, feed them up, and over a week, build a prototype. These are both programs with no charge to you, except of course for your time and attention, which we know is very expensive, but they're ways we can help you do these things and accelerate, and we would be honored to help you do that. Please talk to your Solutions Architect, and they can help us set those up. Did I almost answer your question? Almost, okay. And I think I'm out of time. "Thank you so much, Darren."
