Michael Nygard
Cloud: any rapidly-provisioned, on-demand computing or
application service.
Grid: large-scale computing services, typically heterogeneous,
distributed, and parallel.
Utility: "pay for what you use"
IaaS/PaaS/SaaS -- even though the names are similar, they're
not the same thing.
Most of this talk will be discussing Cloud + IaaS
Recent on-demand models:
- LHC, due to the amount of data generated by every collision
- Utility computing: just a billing method; variable demand;
ceiling usually capped
- Good for holiday volumes: to handle the 2 peak months of the
year, they're overspending the other 10
Trends leading to clouds:
- Virtualization
- Commoditization
- Horizontally scalable architecture
- Rapid provisioning
Virtualization:
- Be able to factor out devices and drivers
- Applications can be as messy as they like (ISOLATION)
- Take advantage of idle processing
Commoditization:
- 43 Dell M600s for the cost of 1 HP 7410
- Nothing in the former is hot-swappable, and it WILL break
- ...but you'll need to be able to fire up a new server
really quickly
How are virtualization and cloud different? It comes down to
who does what (admin or user) and how quick and automatic
provisioning is (whether it's under programmatic control).
Cloud: specialized hardware is abstracted away -- load balancers,
firewalls, SANs, etc. These get turned into declarative aspects
rather than actual boxes. ("Environment: I need to allow a
network connection from X to Y, and I need N GB of storage...")
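A minimal sketch of what those declarative aspects can look like
in practice, using boto3 (a modern AWS SDK, not something from
the talk); the group name, CIDR, and sizes are placeholder
assumptions:

    # Sketch: firewall rule and storage declared through API calls
    # instead of physical boxes. IDs and names are placeholders.
    import boto3

    ec2 = boto3.client("ec2")

    # "Allow a network connection from X to Y" becomes a security
    # group rule.
    sg = ec2.create_security_group(
        GroupName="web-tier",
        Description="Allow HTTP from the app tier",
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpProtocol="tcp",
        FromPort=80,
        ToPort=80,
        CidrIp="10.0.1.0/24",   # the "from X" network
    )

    # "I need N GB of storage" becomes a block-storage volume.
    volume = ec2.create_volume(Size=100, AvailabilityZone="us-east-1a")
    print(sg["GroupId"], volume["VolumeId"])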
Choosing a provider:
- like corn farming: small margin, large fixed costs, scale
essential
- small providers won't be around long
A cloud near you ("private cloud" or a "fog"); some people
don't call this a cloud, but since it fits a number of the
characteristics above, MN does.
Easiest way to get ready for cloud: do nothing! It'll work, but
you'll miss out on some items.
- Advantages:
- Scalability
- Bundling: be able to paper over operational failures
- Ephemerality
- Risks:
- Availability: there are more moving pieces, which means
risk to availability
- Geography
- Ephemerality
EC2 quirks:
- "clean boot" is REALLY clean
- local storage not persistent
- IP address not predictable
Controlling the flock of EC2 instances: you talk to them via an
API. You can also SSH or rdesktop into a single instance.
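A sketch of what "talking via an API" can look like with boto3
(an assumption, not the tooling named in the talk); the AMI ID
and instance type are placeholders:

    # Sketch: launch an instance and look up the IP address EC2
    # assigned to it, since the address is not predictable.
    import boto3

    ec2 = boto3.resource("ec2")

    instances = ec2.create_instances(
        ImageId="ami-12345678",    # placeholder AMI
        MinCount=1,
        MaxCount=1,
        InstanceType="m1.small",   # placeholder instance type
    )
    instance = instances[0]
    instance.wait_until_running()
    instance.reload()

    # The address is assigned by EC2, not by you, so scripts have
    # to discover it after launch.
    print(instance.id, instance.public_ip_address)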
Bundling: start with an Amazon Machine Image (AMI); do stuff
(install, configure), then store the result as an S3 resource
(see the sketch after this list).
- Can automate this and have your build server churning out
fully bootable images.
- ~half of all failures come within 24 hours of change; this
is because of bad change control, lack of automation
- Plus side of ephemerality -- can fire up instance for user
acceptance testing, then once it's good it's just a matter of
assigning IPs. Once that's working for a while, you can just
blow away the old machines.
- This is like trucking in a bunch of machines with every
upgrade.
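One way to automate the bundling step, sketched with boto3 (the
instance ID and naming scheme are placeholder assumptions):

    # Sketch: a build server churning out fully bootable images by
    # turning a configured instance into a new AMI.
    import datetime

    import boto3

    ec2 = boto3.client("ec2")

    build_stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M")
    response = ec2.create_image(
        InstanceId="i-0abc123def4567890",   # the configured instance
        Name="webapp-" + build_stamp,       # one image per build
        Description="Image produced by the CI build",
    )
    print("New AMI:", response["ImageId"])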
Handling concurrent versions:
- web assets: old version in URLs
- integration points: version all protocols and encodings;
client has to specify which one it's using
- DB: migration automated; split into phases; expand first,
contract later (the book "Refactoring Databases" covers this)
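A hedged sketch of the "client has to specify which one it's
using" idea; the field names and versions are made up for
illustration:

    # Sketch: an integration point that serves two protocol
    # versions at once while clients migrate.
    def handle_v1(payload):
        # old encoding: a single flat "name" field
        return {"greeting": "Hello, " + payload["name"]}

    def handle_v2(payload):
        # new encoding: first and last names split apart
        return {"greeting": "Hello, %s %s"
                % (payload["first_name"], payload["last_name"])}

    HANDLERS = {"1": handle_v1, "2": handle_v2}

    def handle_request(headers, payload):
        version = headers.get("X-Protocol-Version")
        if version not in HANDLERS:
            raise ValueError("caller must name a supported version")
        return HANDLERS[version](payload)

    print(handle_request({"X-Protocol-Version": "1"}, {"name": "Ada"}))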
Scalability: master/worker pattern
- Workers pulling work off a queue; the master watches the
workers and the time to complete jobs, and fires up new EC2
instances if the work is taking too long (GitHub recently talked
about this); see the sketch after this list
- NYTimes: converting 11 million articles, firing up several
hundred servers for a weekend
- animoto.com: were able to go from 50 to 5000 processing
nodes in 1 week when they got popular (Facebook, etc.)
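A rough sketch of the master's scaling decision from the first
bullet above, assuming an SQS work queue and boto3; the queue
URL, AMI, and threshold are placeholders:

    # Sketch: the master watches the backlog and fires up another
    # worker instance when work is taking too long to drain.
    import boto3

    sqs = boto3.client("sqs")
    ec2 = boto3.client("ec2")

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/work"
    WORKER_AMI = "ami-12345678"
    MAX_BACKLOG_PER_WORKER = 100

    def scale_check(current_workers):
        attrs = sqs.get_queue_attributes(
            QueueUrl=QUEUE_URL,
            AttributeNames=["ApproximateNumberOfMessages"],
        )
        backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])
        if backlog > current_workers * MAX_BACKLOG_PER_WORKER:
            # Work is piling up faster than workers can drain it.
            ec2.run_instances(ImageId=WORKER_AMI, MinCount=1,
                              MaxCount=1, InstanceType="m1.small")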
Map/reduce, Hadoop; many workers in parallel; resilience through
re-execution; move the computation to the data. Amazon has
Elastic MapReduce, which shields you from the network topology
requirements.
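A toy illustration of the map/reduce idea in plain Python (not
Hadoop or Elastic MapReduce themselves):

    # Toy map/reduce word count: each mapper could run on its own
    # worker, and a failed mapper's input can simply be re-executed.
    from collections import defaultdict

    def map_phase(document):
        return [(word, 1) for word in document.split()]

    def reduce_phase(pairs):
        counts = defaultdict(int)
        for word, n in pairs:
            counts[word] += n
        return dict(counts)

    documents = ["the cloud is elastic", "the grid is parallel"]
    pairs = [p for doc in documents for p in map_phase(doc)]
    print(reduce_phase(pairs))   # {'the': 2, 'cloud': 1, ...}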
Scalability: horizontal replication -- make read-only snapshots
of the database and let them serve up data that changes very
rarely.
We now have the ability to use declarative load balancing and
automatic scaling. Every load balancer out there requires you to
configure the workers in the pool and restart when that
configuration changes; this is a problem when new nodes can pop
up automatically. Instead, you specify some tolerance of use,
and if you go outside it another node gets added to the
auto-load-balanced group (see the sketch below). It takes about
5 minutes to react, "So if Oprah mentions you on Twitter you're
still dead."
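A sketch of "specify some tolerance of use" with boto3's Auto
Scaling API, a later, programmatic form of what the talk
describes; the group name and target value are placeholders:

    # Sketch: declare a tolerance (target average CPU) and let the
    # platform add or remove nodes in the load-balanced group.
    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-workers",
        PolicyName="keep-cpu-near-60-percent",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": 60.0,   # the "tolerance of use"
        },
    )
    # Reaction is measured in minutes, not seconds, as noted above.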
Natural affinity between cloud computing and open source.
- Cache servers and in-memory data grids
- essential components; they turn the things you like doing
least in the cloud into the things you like doing most (see the
caching sketch below).
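A minimal caching sketch using the python-memcached client (an
assumption; the cache address and the slow lookup are
placeholders):

    # Sketch: read-through caching in front of a slow lookup, the
    # kind of work a cache server or data grid takes off your hands.
    import memcache

    cache = memcache.Client(["cache-node-1:11211"])

    def load_profile_from_database(user_id):
        # stand-in for a slow, expensive read
        return {"id": user_id, "name": "example"}

    def get_profile(user_id):
        key = "profile:%s" % user_id
        profile = cache.get(key)
        if profile is None:
            profile = load_profile_from_database(user_id)
            cache.set(key, profile, time=300)   # cache for 5 minutes
        return profile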
Mitigating the downside of ephemerality: anything can disappear
at any moment.
- Failure of block storage: snapshot data to S3, so a parallel
system can be set up and read from the snapshot.
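A sketch of the snapshot step with boto3 (the volume ID is a
placeholder):

    # Sketch: snapshot a block-storage volume to S3-backed storage
    # so a parallel system can be rebuilt from it if the volume
    # disappears.
    import boto3

    ec2 = boto3.client("ec2")

    snapshot = ec2.create_snapshot(
        VolumeId="vol-0abc123def4567890",
        Description="Periodic snapshot of the data volume",
    )
    print("Snapshot started:", snapshot["SnapshotId"])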
Another pattern:
- Decentralized adoption of cloud computing (he's paying for EC2
instances with the secretary's credit card)
Security:
- Perception of risk is just that: perception. It stems from the
idea that security is either perfect or worthless. Security
professionals don't think this way; they deal in risk data.
WIBHI reaction ("Wouldn't it be horrible if...")
- The Cloud Security Alliance is working on this
- Control plane threats (the set of APIs that determine who can
access what); any managed hosting provider has this issue
- Patent shutdowns: be sure you have strategies for extracting
yourself (VM + data) from your provider
- Lack of risk management info
Facilities to apply:
- Data in motion: transport level encryption
- Data at rest: storage level encryption
- Access control: security groups (~VLANs and firewalls)
- Control plane: not under your control
- Multitenancy concerns: not under your control
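A sketch of the first two facilities in the list above, using
boto3 (the bucket and key are placeholders); the client talks to
S3 over HTTPS, covering data in motion, and the server-side
encryption flag covers data at rest:

    # Sketch: transport-level encryption (HTTPS endpoint) plus
    # storage-level encryption (server-side encryption on the object).
    import boto3

    s3 = boto3.client("s3")   # default endpoint is https://

    s3.put_object(
        Bucket="example-backups",              # placeholder bucket
        Key="reports/monthly.csv",             # placeholder key
        Body=b"order_id,total\n1,19.99\n",
        ServerSideEncryption="AES256",         # encrypt at rest
    )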