INI Properties
Properties files are useful for configuration, but they're still too wordy and the hierarchy or partitioning is left as an exercise to the reader. What if we were to make it more explicit?
Old:
# embedded web server -- don't turn this off
# unless you know what you're doing!
service.jetty.enabled = true
service.jetty.host = *
service.jetty.port = 80
service.jetty.minThreads = 100
service.jetty.maxThreads = 4000
service.jetty.lowThreads = 75
service.jetty.acceptQueueSize = 50
service.jetty.ssl_enabled = false
service.jetty.ssl_port = 8443
service.jetty.ssl_key_password = changeme
service.jetty.ssl_keystore_path = ${path.data}/keystore
service.jetty.ssl_keystore_password = changeme
New:
# embedded web server -- don't turn this off
# unless you know what you're doing!
[service.jetty]
enabled = true
host = *
port = 80
minThreads = 100
maxThreads = 4000
lowThreads = 75
acceptQueueSize = 50
ssl_enabled = false
ssl_port = 8443
ssl_key_password = changeme
ssl_keystore_path = ${path.data}/keystore
ssl_keystore_password = changeme
The data are exactly the same, and the transform is straightforward (prepend the section name plus a '.' to each key). But you get explicit partitioning with the INI section heading. As a result the keys are much easier to read, particularly for operations people who aren't used to parsing structured text with their eyes.
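The transform is small enough to sketch. This is a rough Python illustration, not java.util.Properties or ini4j; the function name `flatten_sections` is made up, and it only understands `[section]` headings, `#` comments, and `key = value` lines.

```python
def flatten_sections(text):
    """Turn sectioned INI-style text into flat dotted keys.

    Hypothetical helper for illustration: a key under [service.jetty]
    becomes service.jetty.<key>; keys before any section stay as-is.
    """
    flat = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignored in this sketch
        key = key.strip()
        full_key = f"{section}.{key}" if section else key
        flat[full_key] = value.strip()
    return flat


sample = """
# embedded web server
[service.jetty]
enabled = true
port = 80
ssl_keystore_path = ${path.data}/keystore
"""

props = flatten_sections(sample)
```

Here `props` ends up keyed by `service.jetty.enabled`, `service.jetty.port`, and so on, which is the same data the old flat file carried. Property replacement such as `${path.data}` is left untouched in this sketch.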
I haven't seen this anywhere else, but it's so straightforward I can't believe it's new. In terms of getting it done, implementing a subclass of java.util.Properties looks... interesting. (It would be a lot easier if it were an interface.) It would be easier if you just needed read-only access, but I do have the need to write the file back out when we programmatically add new properties. I think keeping metadata about which section a particular property belongs to should be pretty straightforward.
The nifty little ini4j library does INI manipulations and has an extension for java.util.Properties that allows property replacement (like the ${path.data} item above), but it doesn't do the partitioned sections. Maybe building on that makes sense...
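For the write-back case, one way to keep the section metadata is to store values grouped by section and only build the dotted key on lookup. A rough Python sketch of that idea follows; the class and method names are hypothetical and have nothing to do with ini4j's actual API.

```python
from collections import OrderedDict


class SectionedProperties:
    """Flat dotted-key lookup that remembers each key's section.

    Hypothetical sketch: not java.util.Properties and not ini4j.
    """

    def __init__(self):
        # section name -> ordered {short key: value}
        self._sections = OrderedDict()

    def set(self, section, key, value):
        self._sections.setdefault(section, OrderedDict())[key] = value

    def get(self, dotted_key):
        # Try the longest section prefix first so 'a.b' wins over 'a'.
        parts = dotted_key.split(".")
        for i in range(len(parts) - 1, 0, -1):
            section, key = ".".join(parts[:i]), ".".join(parts[i:])
            if key in self._sections.get(section, {}):
                return self._sections[section][key]
        raise KeyError(dotted_key)

    def dumps(self):
        # Write every section back out under its own heading.
        lines = []
        for section, items in self._sections.items():
            lines.append(f"[{section}]")
            lines.extend(f"{k} = {v}" for k, v in items.items())
            lines.append("")
        return "\n".join(lines)


props = SectionedProperties()
props.set("service.jetty", "enabled", "true")
props.set("service.jetty", "port", "80")
props.set("service.jetty", "ssl_port", "8443")  # added programmatically
output = props.dumps()
```

Callers still ask for `service.jetty.port`, while `dumps()` puts each property back under its section heading. Comments and blank-line layout are not preserved here, which a real implementation would probably want.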
More things I can do without
(curmudgeon edition)
TL;DR: Do you have to wear your ignorance like a badge of honor? Read the fucking article or go home.
Stand/drive to the right: You are not the only person in the universe. Other people have needs different than yours. Be aware of the world around you, move over jerky.
Driving fucked up: I've done this more times than I'd care to admit (a long, long time ago, complete idiot), but the consequences just aren't worth it.
Put your transformation code in message?
I've been thinking about architecture recently. One of the prevailing paradigms is messaging, since it can minimize coupling, enable scalability, and provide very flexible plumbing for future extensions.
One of the core problems you need to confront is the payload: what's in the message? Is it action-based data -- like commands in CQRS? Data transfer objects? Domain objects?
One scaling strategy is to distribute your load across machines, which also enables you to update these processes independently. Once you get into that realm you have to worry about versioning the data. You can try and fool yourself into thinking you'll get the canonical version of the data correct the first time, but you won't. The world changes too often for this to happen. Some popular serialization frameworks (protobuf, Avro, etc.) make both forward and backward compatibility a priority for this and other reasons, but they may forbid certain types of changes (renamings, merging, removing).
I recently had a random thought about this versioning and compatibility problem: what if you put references to the code necessary to transform between versions in the message itself? It echoes the REST thesis discussion of code-on-demand -- if your clients are incapable of consuming the media types you're serving them, give them the tools to do so at runtime.
The idea is that you can build your client to consume a particular version of a message, and when you get a message of a different version you ask the message for the code necessary to transform itself to the version you expect. That code could be anything -- typically XSLT is used but that presumes the message is XML. It could also be JavaScript -- the ubiquity of JavaScript interpreters would make it easy to embed into any sort of client. And since you wouldn't be sending the code itself, only a reference (URL whose content can be cached) it's not adding a huge amount of overhead.
For example:
Message:
{
version : '1.2',
type : 'Address',
address1 : '555 Main Street',
city : 'Pittsburgh',
region : 'PA',
postal : '15260',
country : 'USA',
transform : {
'1.1' : 'http://.../address/1.1_to_1.2.js',
'1.0' : 'http://.../address/1.0_to_1.2.js'
}
}
- Client receives message, recognizes it as version 1.2. But it expects version 1.1.
- Client looks up the transformation script for version 1.1 from the message.
- Client fetches 'http://.../address/1.1_to_1.2.js' and executes it with the message as an argument. (Since we're using HTTP the server can set its cache headers so that the client doesn't need to do this for every single message.)
- From the transformation the client gets its expected version of the message and continues processing.
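The steps above can be sketched roughly as follows. This is Python rather than JavaScript, and several things are assumptions made up for illustration: the URL, the `FAKE_SERVER` dictionary standing in for an HTTP fetch plus sandboxed script evaluation, and the pretend difference between versions (1.1 calling the `region` field `state`).

```python
from functools import lru_cache

EXPECTED_VERSION = "1.1"  # the version this client was built against


def _address_1_2_to_1_1(message):
    """Made-up migration: pretend 1.1 called the 'region' field 'state'."""
    converted = {k: v for k, v in message.items() if k != "transform"}
    converted["state"] = converted.pop("region")
    converted["version"] = "1.1"
    return converted


# Stand-in for the remote server: URL -> transformation code.
# A real client would do an HTTP GET and evaluate the script in a sandbox.
FAKE_SERVER = {
    "http://example.invalid/address/for_1.1": _address_1_2_to_1_1,
}


@lru_cache(maxsize=None)
def fetch_transform(url):
    """Fetch (and cache) the transformation code behind a URL."""
    return FAKE_SERVER[url]


def normalize(message, expected=EXPECTED_VERSION):
    """Return the message in the version this client expects."""
    if message["version"] == expected:
        return message
    url = message["transform"][expected]  # the message tells us where to look
    return fetch_transform(url)(message)


incoming = {
    "version": "1.2",
    "type": "Address",
    "address1": "555 Main Street",
    "city": "Pittsburgh",
    "region": "PA",
    "postal": "15260",
    "country": "USA",
    "transform": {"1.1": "http://example.invalid/address/for_1.1"},
}

address = normalize(incoming)
```

The `lru_cache` plays the role of the HTTP cache: the second message of the same version reuses the already-fetched transformation instead of going back to the server.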
You could even add another layer for efficiency by specifying a resource you can query to get specific resources:
{
version : '1.2',
type : 'Address',
address1 : '555 Main Street',
city : 'Pittsburgh',
region : 'PA',
postal : '15260',
country : 'USA',
transform : 'http://.../address/transforms.js'
}
Client:
GET http://.../address/transforms.js
Server:
{
'1.1' : 'http://.../address/1.1_to_1.2.js',
'1.0' : 'http://.../address/1.0_to_1.2.js'
}
... proceed as before ...
Another variant is to specify a generic resource and execute it with both the message and the desired version.
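A sketch of what such a generic resource might look like, again in Python and again with invented details: the single-step migrations, the field differences between versions, and the idea of chaining steps are all assumptions, not something the message format above prescribes.

```python
# Hypothetical single-step migrations, keyed by (from_version, to_version).
def _v12_to_v11(msg):
    out = dict(msg, version="1.1")
    out["state"] = out.pop("region")  # made-up difference between 1.2 and 1.1
    return out


def _v11_to_v10(msg):
    out = dict(msg, version="1.0")
    out.pop("country", None)  # made-up: pretend 1.0 had no country field
    return out


STEPS = {("1.2", "1.1"): _v12_to_v11, ("1.1", "1.0"): _v11_to_v10}


def transform(message, target_version):
    """Generic entry point: apply single-step migrations until the target is reached."""
    current = message
    while current["version"] != target_version:
        step = next(
            (fn for (src, _dst), fn in STEPS.items() if src == current["version"]),
            None,
        )
        if step is None:
            raise ValueError(f"no path from {current['version']} to {target_version}")
        current = step(current)
    return current


message = {"version": "1.2", "type": "Address", "region": "PA", "country": "USA"}
oldest = transform(message, "1.0")
```

With this shape the server only has to publish one step per release, and a client that is several versions behind gets there by walking the chain.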
Yet another is to return function references as the object values instead of URLs.
Yet another is to not include any transformation reference in the message at all, relying on the client's knowledge of the server to be able to look it up as needed. While I like standalone messages this isn't as crazy as it sounds, and could rely on work being done in creating an HTTP client workspace so we can really do HATEOAS. (Solomon, among others, has been banging the drum on this a while, and has some blog posts about it).
Of course, this is all part of a trade-off -- you're making it easier to change your messages and for heterogeneous clients to interact with your system, while imposing some costs on them (JavaScript interpreter, more complex client that resolves message diffs at runtime) and on the server (need to distribute a new 'message migration' every time you change a message).
That said, clearly there are issues with this:
- Security risk: you're executing random code off the Internet!
True, but browsers do this all the time and my impression is that we have pretty good sandboxing mechanisms for JavaScript. It could certainly be a showstopper for organizations that have a strict policy about this, though it may be more appropriate for internal/partner use than the internet-at-large.
- Won't scale!
Maybe. While there are a number of places where clients can make things faster (cache the transformation script content, cache the compiled transformation script), this could introduce too much latency for message systems requiring extremely high throughput. However, such systems typically accept deployment inflexibility (ensuring all clients use the same versions) in exchange for performance, which negates the need for this in the first place.
- You're bloating my messages!
True. Though I mentioned one way that would add no data to the message at all, the most common way would. It's a trade-off.
- JavaScript sucks!
I'm sorry you feel that way. Plug in your favorite dynamic language, or even use remotely loaded Java classes instead. (.NET probably has something similar.)
Anyway, I've probably spent too much time over such an idle thought.
Corollary to Shaw's unreasonable man
The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
George Bernard Shaw
Corollary: Sometimes the unreasonable man is just a prick.
Valuable phone context
I know that even voice response systems that are not state of the art can tell you, the phone answerer, how long I've been waiting. So in the interest of treating both of us as human beings, I think it would be useful to say something like, "Sorry for making you wait eleven minutes, how can I help you today?" People who react belligerently to that will do so to anything, so there's no helping them. We don't all have to be cogs.
New features and new markets
I haven't been posting much anywhere recently, just been in very productive heads down mode. (Holidays can be awesome for this.) But I'm coming up for air and said this yesterday: "what if we approached creating new features like we approached entering new markets?"[1]
It was in direct response to this article from Tim Bray, Doing It Wrong. The "it" in this case is enterprise systems, and the piece is worth a few minutes of your time, and the comments are a good indicator of the tensions he's talking about. I think it goes past the typical whinging of a whippersnapper making fun of big systems, boasting he can get one done in a few weekends while belittling its developers as 20th century nimrods.
There can be a lot of infrastructure behind developing features -- documents for requirements, planning, design, roadmapping; meetings to gain consensus on all these; staging meetings along the way that bring different organizational layers together to chime in. (This doesn't scratch the surface at some organizations.)
We assume these are all necessary -- for predictability, to get feedback and input, to best reflect the organization as a whole and benefit from institutional memory, to let the sales organization whet the appetites of customers and channels and allow those groups their own planning cycles.
But what if all we're doing in this process is delaying until the very end the only part that really matters? Putting something in front of customers and figuring out what works by having them use it, and doing that again and again until we nail it.
When you enter a new market you know you're ignorant.[2] So you know you have to work with customers to figure out what it is you're building. You know you have to run lots of ideas past them, and a depressingly large number of those ideas will stink. But you're relying on them to sort the wheat from the chaff. In a sense you're completely dependent on these customers, because the potential is there to wildly oversteer past them and fail entirely.[3]
So, how is that different in kind than developing new features? Are you really that independent from your customers that you can get some ideas from them, go off and develop something, then hand them a 'finished' feature and expect it to work? Is your product team in touch with your customers, or are they relying on bullet points from a slide deck delivered by a biased speaker to just the PMs?
It's certainly different in scope, but I think that getting a sufficient number of features sufficiently wrong over time will kill your product just as dead, even though you can't oversteer as much with a single release.
And before you object that this sort of product development is impossible, that it costs too much and that it has too dramatic an impact on your organization, ask yourself: what are the costs of little failure after little failure? Or what are the costs of your customers being disappointed? Or of your developers working on fix after fix to a 'finished' product because it didn't meet what the customer had in mind?
[1] No, I am not normally that cryptic.
[2] Beware, I am speaking from vast experience, having worked on a team that brought an existing product to a single new market.
[3] I think you can mitigate some of this by bringing domain experts on your team, among other actions. But you have to make sure they're really good or you're sunk, because their experience may be hugely different than your target and your day-to-day exposure to that will permanently impair your view of the domain.
Christmas goodies
- Fahrenheit Fair Enough by Telefon Tel Aviv. Work music, defined as "stuff without words that fades into the background so I can work." I rarely get to hear non-work music anymore, so lists like my cousin Kirk's make me feel a little out of it. I heard a track off this backing a youtube video, but didn't get the CD because it was too expensive ($30 or something). Perfect gift.
- Essential Software Architecture. Super geeky, but I feel like I should be able to describe what our architecture is better, and in a common way.
- Brain Rules. Should provoke one or two blog posts, or at least a tweet or two.
But it's really about Ella, who got tons of stuff, especially since her birthday is two days after Christmas. A new dollhouse with a few rooms full of furniture -- it's detailed at the playmobil level, but made of wood instead. Very cute. (Assembled by me on Christmas Eve night, another notch in the dad-experience belt). An armful of puzzles, which she recently turned on to in a big way, seemingly overnight. A Diego Rescue Center from Grandma, picked up by Barb for $10 at a consignment sale a couple months ago.
She also got loads of Dora and Diego stuff for both Christmas and her birthday -- books, little people ballerina Dora, Dora and Perrito (bigger), pajamas (worn already for 24 hours straight), puzzles, even a Dora sudoku book (I'm not quite sure how that works). This is a downside of having the two celebrations so close -- she loves Dora and Diego now, but what about six months from now? It's a kind of toy bubble, and it's fed by well-meaning relatives who need a hook ("Dora!" "Puzzles!") so they know what to get her for presents. I sympathize; it's not easy buying something for a three year-old unless you have one of your own.
When you're an adult this happens too. If people know you like cats you'll get all sorts of things with cats on them -- mugs, posters, mouse pads, sticky notes, coat hangers, photo frames, aprons, etc. When he was a pilot my dad's call sign was "Chilly" -- "Chilly" "Winters", get it? -- so every year he always got a few things with penguins. But it's mostly junk.
I think the only ways to get around this are to be a jerk: either demand cash/giftcards, or give people a list of what you want and insist they stick to it. But then you have the issue of list coordination, which the Amazon wishlist takes care of nicely. Another way is to avoid it entirely: tell people you don't want presents, and you could even couch it in an environmental/anti-consumerist message.
But the latter isn't an option for a three year-old. So it's Dora Dora Dora the Explorer this year. Next year?
QCon 2009: Wrapup
QCon was a great conference. Loads of learning, hugely smart people. Logistically I thought everything went very smoothly, with the exception of not having at least one or two power strips in all the rooms. Wireless was even a piece of cake.
Here's a not-so-brief overview of my five days at QCon and in SF.
Monday
Attended "Seduced by Scala" led by Dean Wampler. Dean is a great teacher -- very clearly explaining some complex issues, and admitting lack of knowledge on the one or two occasions it happened. Some might see that as an issue (more on something similar later), but it's just reality that we're dealing with sufficiently complicated topics and technology that nobody will know everything.
And even if a teacher did know everything about a topic, it's likely that she'd have little experience with it in the real world. I prefer my teachers to have experience over exhaustive knowledge.
I also liked that Dean seems opinionated, though not in an obnoxious way by any stretch. More like something Dan North mentioned in his talk on Wednesday. He called out one of the characteristics he found in architects he admired: self-belief, as in the courage of your convictions.
I like Scala, but don't yet have any experience using it in anger. It's possible we might be able to develop a subsystem in it, particularly if we need to take advantage of executing work over multiple machines (using something like Akka). But we'll see. I explicitly didn't use any IDE during the class, so I can't say if the IDEA plugin was helpful.
Monday evening I got to hang out with my friend Trudy and her fella. We hadn't seen one another in quite some time, and it was great to catch up.
Tuesday
Attended "REST in Practice" with Jim Webber and Ian Robinson. This felt a little funny because I know REST, at least the basics, okay. But I figured these guys would have some experience using it from an end-to-end project, and possibly guidance about problems, objections people have and whether there are answers. I didn't know when I signed up that they were writing a book on it (coming out from ORA, should be great).
Anyway, the talk was fantastic. Even though they basically just talked for six hours, the time flew by, and they peppered real-world experience into their cogent and unusually clear explanations of the standard concepts (URIs, resources, scalability, caching, universal methods, etc.)
At one point during their discussion of hypermedia a light went off for me, related to embedding hypermedia actions in your resource. (I go on about this more in my notes.) I'd always had the limitation of thinking of the resource representation as a blob of data, but that's only part of the picture. It may be a standard RPC-oriented way of thinking, but it really misses one of the main points of REST.
I had some concerns about using this with resource actions that might vary per user (think workflow actions on JIRA issues). But I talked with Jim Webber on Friday after the conference was over and he confirmed my thinking that this happens infrequently enough that you can either just use the resource + custom actions as-is, forgoing caching benefits, or push it down further into a resource.
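To make the idea concrete, here is a rough Python sketch of a representation that carries its own actions as links. Everything in it (field names, link relations, URLs, the `qa` role) is invented for illustration and is not taken from the tutorial or from JIRA.

```python
def issue_representation(issue, user_roles):
    """Build a representation whose links advertise what can be done next.

    Everything here (field names, link relations, URLs, roles) is invented
    for illustration.
    """
    base = f"/issues/{issue['id']}"
    links = [{"rel": "self", "href": base, "method": "GET"}]

    # Actions that depend only on the resource's state.
    if issue["status"] == "open":
        links.append({"rel": "resolve", "href": f"{base}/resolution", "method": "POST"})
    # Actions that also depend on who is asking; including these makes the
    # representation user-specific, so a shared cache cannot reuse it.
    if issue["status"] == "resolved" and "qa" in user_roles:
        links.append({"rel": "reopen", "href": f"{base}/reopening", "method": "POST"})

    return {"id": issue["id"], "status": issue["status"], "links": links}


doc = issue_representation({"id": 17, "status": "resolved"}, user_roles={"qa"})
rels = [link["rel"] for link in doc["links"]]
```

The client never hard-codes the reopen URL; it only follows the `reopen` link when the server offers it. The per-user branch is exactly the part that costs you caching.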
Anyway, if you have a chance to see them and you're at all interested in REST, you won't be disappointed.
Tuesday night I wandered around a bit, then went for some sushi, making the mistake of ordering a special off the wall that didn't have a price. The bill was very surprising.
Wednesday
The highlight of the day was learning about Domain Driven Design with Eric Evans (Mr. DDD). I'll probably write more about it in the future, but it was one of those talks that wasn't anything earth shattering, just clicking a lot of things into place that make a lot of sense.
Dan North's talk on teams and change was wonderful. One of the bits that agile talks seem to encourage is the human interaction, which seems sorely missing in a lot of software development writing. Alistair Cockburn demonstrates this in the wonderfully titled Characterizing people as non-linear, first-order components in software development.
Continuous deployment was also really interesting. I mentioned to someone going into the talk that I thought I'd come out of it thinking that it's something you should be able to do by default, and only not do it when you've got good and concrete reasons. It makes a smooth deployment model possible.
I didn't write up anything on the Wednesday keynote, but I liked it. It was two VCs discussing some trends they saw, outlining things they looked for in ventures and some trends in opensource. They were open and funny -- it's probably a talk they've given a number of times, but it didn't feel that way.
Thursday
- Better Architecture Management Made Easy
- OpenSocial in the Enterprise
- Software Architecture for Cloud Applications
- Architecting for the cloud: horizontal scalability
- Agile Development to Agile Operations
The first talk was by a vendor, but a technical one. People seem allergic/hostile to these, but I think they can be great. The demo must be given by someone who knows what they're doing, and they need to put the product in context and present it as a means, not an end.
The second talk, on OpenSocial, was also very well done. It carved out a little space for itself and filled it nicely. I like the model and will look into implementing it for our product -- it would be interesting to see other clinical vendors use it as well.
This was the first conference where I'd seen Michael Nygard. I'm a huge fan of Release It! -- as were a lot of other people I talked to, saying that it's "required reading" for people on their teams. He gave a solid, clear overview of clouds and their moving parts, pointed out some trends and injected a few well-earned opinions. Very useful for a cloud newcomer like me.
The other two were also ostensibly related to clouds. I liked Adam Wiggins: he clearly knows his stuff (he probably dreams about it), and what they're doing at Heroku seems spectacular. But the discussion was pretty shallow, touching on a number of pieces without going into any in depth, or fitting them together.
Stu Charlton is another guy who's clearly on top of things. His discussion of the tensions between development and operations was informative, and it seemed like he could talk about it for hours if you wanted. He was a very disciplined speaker though, and sounded like one of those folks who have given so many talks that he had an internal clock telling him where he was in the flow. His identification of trends where things are going to get worse, and some ways out of it, was also great.
The Thursday keynote was by Don Box, and if you'd been following on twitter you probably saw quite a bit of discussion about the crash and burn. I liked that he tried though.
Thursday night a college friend and his wife picked me up and we wandered around a bit, getting some beer and dinner, before winding up at another friend's house. More good times.
Friday
- Sustainable Design for Agile
- Codename ''M'': Language, Data and Modeling
- unibet.com Architecture
- SOA at eBay
The day started off with another Eric Evans talk, and my status as a fanboy was now complete. One of the gratifying aspects was being able to look back at some of the things we did with AccuNurse as really central to its success so far. I mentioned this to Eric and he replied with the common idea behind patterns which Dan North summed up as: if you tell me a pattern and it's something new, it's not a pattern. I wish I'd heard about DDD before we started development on it, but I'm glad I know about it now, and will get the book and keep the topic in my mental "things to follow" list from now on.
Don Box and former Pittsburgher Amanda Laucher presented a Microsoft framework that allows you to create grammars and immediately test them, then instantiate them at runtime and do interesting stuff. It was pretty neat, and a little surprising that this isn't more commonplace. Tom will probably see this and mention that Haskell and other functional languages have done this for years. :-) Anyway, it was a great talk: Don is hugely enthusiastic (he must have said "Awesome" at least once a minute) and knowledgeable. And I liked that he did live demos of the beta product, even if they bombed once or twice. Some people were bitching about this on twitter, which is just mean-spirited. I think it's great to get peeks into this stuff and where they're headed, by someone who not only knows how they work but is aware of their potential and can explain it well.
The talk on unibet.com was fascinating, but the speaker was a typical geek one: dry sense of humor, not a lot of presentation dynamism, and always with the assumption that the things he was discussing were obvious. (I plead guilty to this as well.)
The final talk was on SOA at eBay, which was a little bit of a departure for me. I'd avoided the SOA track (even though Ian Robinson was giving a talk), but this seemed too interesting to pass up. Getting technical people on the inside to talk about their work on a site like eBay is pretty unusual. Sastry knows the system up and down, as well as the reasons all the technical decisions were made. He can also talk about these really, really quickly! Their use of governance was interesting, and cleared up some misconceptions I had about it (figuring it was just a way to put more useless layers in a process).
The Friday keynote was so-so, very Oracle-centric. And again, a lot of people really disliked it. But for me it was interesting to see what this stuff can do, as all the dynamic cloud provisioning stuff is new to me. It's awfully impressive when you step back and see what's going on. That said, David Chappell is someone who has a lot of knowledge about technology and trends and intersections, and having him talk just about Oracle stuff as he did seems like a waste, at least for this crowd.
Talks I wish I'd been able to see
Not in any particular order:
Scaling your cache and caching at scale - Alex Miller. I love Alex's blog, and putting this against the scaling panel was tough.
Java Puzzlers - Bob Lee and Josh Bloch. If only so I could ask what they thought about the Java7 closures announcement.
Project Voldemort - Jay Kreps. Hearing depth about one of the distributed k-v stores would have been cool.
Enculturating Master Craftspeople - Dave West. No idea how this went, but it sounds fun, if not terribly useful for work.
Clojure in the Field - Stuart Halloway. I got the Clojure book at YAPC this year but haven't had a chance to get in. (Other things are taking precedence.) But I'd still like to hear Stuart talk about it; I saw him speak at a NFJS in DC about 5 years ago and really enjoyed it.
Designing a Scalable Twitter - Nati Shalom. Getting an idea of how you design with something like GigaSpaces would stretch my brain, even though Twitter-scale is insane.
Failure comes in flavors - Michael Nygard. As I mentioned Michael is a great speaker: engaging, knowledgeable, authoritative, but still practical. But this sounded a lot like concrete examples from Release It!, and it was competing with a talk by Eric Evans.
Skeptical view of language workbenches - Glenn Vanderburg. There were lots of positives from all the right people on twitter.
Building Languages with MPS - Neal Ford and Nate Schutta. I heard about this product from JetBrains a while ago but haven't really got it yet. Guess that'll wait.
Actors, and the forgotten art of modeling concurrent systems - Kresten Krab Thorup. Concurrency has been on my brain, and finding a different way to think about it would be very useful, even if only as exercise. The one I went to instead (unibet.com) was just okay, and I probably would have been happier here.
Navigating the rapids: real-world lessons in adopting agile - Sam Newman. I heard this was "Stuff Sam will tell you at the pub given enough beer," which sounds like fun.
Other notes
- Were there really two Spring Roo talks? That seems weird. I know everybody loves Spring, but...
- I loved the quick feedback process. They had a bin and a stack of red, yellow, and green pieces of paper outside the room after a talk. To give feedback you'd just pick up a piece of paper corresponding to your mood and drop it in the bin. Because it was so easy everyone seemed to do it, which hopefully provided a good guide to the organizers and speakers.
- The food was surprisingly good, and lots of vegetarian options. Breakfast was a little sparse, but that wasn't a big deal.
- ...the coffee was only so-so much of the time, even just lukewarm once. In fact I think it got worse as the week progressed.
- The Wednesday night out at a local bar (Jillian's) was cool, nicer than hanging in the hotel. Free beer on Friday was also nice, and a lot of the speakers hung around for it.
QCon 2009: LinkedIn: Network updates uncovered
Ruslan Belkin, Sean Dawson
LinkedIn: "the place you go when you're not looking for a job... but really are looking for a job"
Stack:
- 90% Java
- 5% Groovy
- 2% Scala
- 2% Ruby
- 1% C++ (in-memory social graph)
- Fact that everything is in Spring makes it easy to inject objects from other languages
- Containers: Tomcat, Jetty
- Oracle, MySQL, Voldemort, Lucene, Memcached
- Hadoop
- ActiveMQ
Updates: 35 million/week; ~200 requests/sec (?)
- iPhone app: uses same APIs any LI partner uses
- Email digest drives a lot of engagement
Expectations by user:
- Multi views; comments on updates
- Aggregation on noisy updates (CMW: sounds easy, but it's not)
Expectations infrastructure:
- Tenured storage of update history
- Support testing (Black/White, A/B, etc) of new features
Service API: used XML from start, never had any compatibility issues.
Update service:
- data collection: update data store, buffered in memcache
- can collect from internal store, or from third party
- passed to collator (dedupe, relevancy)
- passed to update resolver; eg, resolve member ID to first and last names as preferenced by user; or malicious 3rd party content could be gone so update should be too
Data collection challenges:
- push architecture, inbox; every member has one; N writes per update, but very fast to read (since they're already there)
- tough to scale, but ok for targeted/private notifications; still exists for 3rd party notify
- pull architecture, every member has "activity space"; 1 write per update; need N reads to collect N streams
- how to minimize?
- not all N members have updates
- not all updates need to be displayed
- some members more important than others (use strength of connection)
- multiple areas of update storage:
- transient (L1), tenured (L2); kind of a LRU cache per user
- reads are tougher, but you filter the number of users who are even eligible for querying
- L1: single row, has CLOB + varchar; use varchar as buffer, and when it fills up write to CLOB (saves 90% of expensive CLOB writes)
- L2: accessed less frequently; K/V store; uses Oracle now, will use Voldemort soon
- member filtering
- avoid fetching all N feeds
- filter will never return false negative, only false positive
- easy to measure whether heuristic is working (which members who were in the filter had good data) means tunable process
- how to minimize?
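CMW: Two of the tricks above are easy to sketch: the small buffer in front of the expensive CLOB, and a filter that may say "yes" wrongly but never says "no" wrongly. This is a rough Python illustration, not LinkedIn's code. The class names, buffer size, and the choice of a Bloom-filter-style bit field are all my assumptions; the talk didn't say how their filter is built.

```python
import hashlib


class L1Row:
    """Transient update storage: a small buffer in front of an expensive blob.

    Stand-in for the 'varchar as buffer, CLOB as overflow' row described
    in the notes; sizes and names here are invented.
    """

    def __init__(self, buffer_limit=4000):
        self.buffer_limit = buffer_limit
        self.varchar = ""      # cheap to rewrite
        self.clob = ""         # expensive to rewrite
        self.clob_writes = 0

    def append(self, update):
        entry = update + "\n"
        if self.varchar and len(self.varchar) + len(entry) > self.buffer_limit:
            # Buffer is full: spill everything to the CLOB in one write.
            self.clob += self.varchar
            self.varchar = ""
            self.clob_writes += 1
        self.varchar += entry

    def read(self):
        return (self.clob + self.varchar).splitlines()


class MemberFilter:
    """Bloom-filter-style 'might have updates' check.

    May say yes for a member with nothing new (false positive), but never
    says no for a member that was added (no false negatives).
    """

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.field = bits, hashes, 0

    def _positions(self, member_id):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{member_id}".encode()).hexdigest()
            yield int(digest, 16) % self.bits

    def add(self, member_id):
        for pos in self._positions(member_id):
            self.field |= 1 << pos

    def might_have_updates(self, member_id):
        return all((self.field >> pos) & 1 for pos in self._positions(member_id))


row = L1Row(buffer_limit=50)
for n in range(10):
    row.append(f"update-{n}")   # each entry is 9 characters with the newline

recent = MemberFilter()
recent.add("member-42")
```

Ten appends cost a single CLOB write here because the buffer absorbs the rest. In the pull model you would only fetch a connection's feed when `might_have_updates` returns true for it.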
commenting: users can create discussions around updates; denormalize small amount of data onto discussion so you can show first/last comment and time
Twitter sync, announced last week: bi-directional flow of status updates; authen/z with OAuth
CMW: One of the main things that keeps implicitly being brought up wrt designing scalable systems is putting steps in as many places as you can to exploit common data in a cache.
What else?
- Shard DB, memcache; parallelize everything
- User generated writes are asynchronous
- Profile often, know your numbers
- Pay attention to response time vs transaction rate (heard this multiple times over few days), don't just look at averages; gave example of some network update servers that were misconfigured to use a no-op cache rather than a real one, got a call from CEO/CTO...
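CMW: A tiny illustration of why averages alone mislead. The numbers are made up, and the nearest-rank percentile is just one common definition, not anything LinkedIn described.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceil(n * pct / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds: mostly fast, a few very slow.
response_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(response_ms)
p50_ms = percentile(response_ms, 50)
p99_ms = percentile(response_ms, 99)
```

The mean (119 ms) describes no request anyone actually experienced: the typical request took 20 ms, and one in twenty took two seconds, which only the high percentile shows.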
QCon 2009: SOA @ eBay: How is it a hit
Sastry Malladi, eBay
- Distinguished Engineer; building large systems for ~20 years
SOA is a journey
History:
- one of the first to expose APIs/services
- support REST as well as SOAP (former supported way more)
- lots of feedback, lots of evolution
- early adopters of SOA governance automation
- continuously improving architecture with 3 goals: agility, innovation and operational excellence
Stack:
- mix of optimized, custom SOA framework as well as BoB + open source components
Goals:
- organize enterprise as set of business functions
- reduce cost of developing new features (and reduce cost of failure)
- encourage + enable new business opportunities
Practical standpoint, what is SOA?
- architecture to move from brittle, hardwired application silos
- to shared, reusable services
- that eliminates redundancy and enables agility
SOA: not just technology (tech + process + people)
Common misconceptions:
- SOA is new: no, just a paradigm applied to existing tech
- implies WS + SOAP: actually not, REST with JSON/K-V pairs is equally popular, if not more
- end to itself: no, just a means to enable agility
- services dev from ground up: always leverage existing functionality, but morph into services
- at dev time, consumers + use cases are known: can evolve
Challenges in large-scale deployments
Technical:
- Additional latencies due to multihop
- Debugging/tracing is harder
- Need efficient request/session caching (contextual cache; you don't want to jam it in the message; or make it part of the contract)
- Security/monitoring challenges
- Lots of standards, pick one!
Operational
- Dev adoption + learning curve
- Governance
- Migrating existing apps
- Updates to existing tools + processes
- Deployment + rollout
- Measuring ROI
Further:
- Co-existence of old and new tech during transition
- Supporting internal and external clients that have different protocols/data binding needs for the same deployment
- QoS and SLA management: very low latency needs, huge traffic
- Integration testing: you can do functional testing of your own service, but how do you test in the absence of its dependencies? service virtualization, where you can express your dependencies and they're auto-setup for you in the test env
- High availability and scalability: high volume, low latency
- Decompose existing app and migrate legacy services
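To make the service virtualization point concrete, here is a hand-rolled sketch of the idea: the service under test declares its dependency, and the test environment supplies a stand-in driven by canned responses. All names are hypothetical, and this is not how any particular virtualization product works.

```python
class WatchListService:
    """Service under test; it depends on a separate pricing service."""

    def __init__(self, pricing_client):
        self.pricing_client = pricing_client

    def total_value(self, item_ids):
        return sum(self.pricing_client.get_price(i) for i in item_ids)


class VirtualPricingService:
    """Stand-in for the real dependency, answering from canned data."""

    def __init__(self, canned_prices):
        self.canned_prices = canned_prices
        self.calls = []  # record calls so the test can inspect the interaction

    def get_price(self, item_id):
        self.calls.append(item_id)
        return self.canned_prices[item_id]


# The test wires in the virtual dependency instead of the real one.
virtual = VirtualPricingService({"a": 10, "b": 5})
service = WatchListService(virtual)
result = service.total_value(["a", "b"])
```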
Operational:
- Version and dependency management: esp related to high change velocity
- Impact to existing tools/env
- Time to market pressure: pressure will ALWAYS be there, must factor in design process
- Strong but simple governance, esp w lots of services and high velocity of changes; point is not just bureaucracy, but to help you achieve consistency across org
How is eBay addressing
Technical:
- Light and fast platform (homegrown + commercial + open source components)
- Unified testing f/w and service virtualization (ITKO)
- Model driven service decomposition; did this methodically and yes, it took time, but it paid off
- Support for REST + SOAP from start; how is the interface declared for the REST service? WSDL 2.0 has the notion of bindings, including an HTTP binding; ebay uses WSDL 1.1, but includes a concrete way of binding to the REST service
Operational:
- One of the first to automate governance and lifecycle management
- Incremental service deployment: can deploy different services separately based on dependencies declared (?)
- Strong operational management tools
- Developer training and incentives for being good citizen; includes training for designing good services
- Formal process to measure adoption + process; haven't formalized ROI measurement
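I wasn't clear on the mechanics of incremental deployment, but one plausible reading of "based on dependencies declared" is a topological ordering over the declared dependency graph. A sketch under that assumption, with invented service names, using Python's standard `graphlib` (3.9+):

```python
from graphlib import TopologicalSorter

# Hypothetical declared dependencies: service -> services it depends on.
declared = {
    "checkout": {"pricing", "inventory"},
    "pricing": {"catalog"},
    "inventory": {"catalog"},
    "catalog": set(),
}


def deployment_waves(dependencies):
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = deployment_waves(declared)
```

Services in the same wave have no dependencies on each other, so they could be rolled out separately or in parallel.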
How many services at eBay?
- internal: several hundred
- external: ??
eBay SOA Platform:
- framework: overhead < 5 ms
- monitoring: customizable, for internal people only; using SNMP; service registers events, and aggregators (within the JVM); these get put into OLAP cubes; eventually they get into a dashboard
- security: XACML/WS-Policy based extensible authen/authz
- rate limiting: enforcing capacity, budgeting, traffic control; use XACML to express these
- service registry and repository: governance, lifecycle mgt; bought SOA software Repository Manager
- ESB: for routing, transformations, and mediations (use opensource for this, Apache Synapse)
- orchestration engine: Q how different than ESB? A ESB for matching requester and responder for protocol, message type, whether to use sync/async, etc. Several ESBs mingle in the rules for orchestrating. But generally, orchestration is more of a process (BPEL) -- example of getting watched items, which fetches the watched items, the latest prices for each, transforms the data into only what's necessary, then returns.
- dev tools (eclipse plugin) for service/consumer development; very concerned with making things simple, otherwise things won't go anywhere; never leave the IDE space, even for testing
- ops tools (management, monitoring, alerting)
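The watched-items orchestration example can be sketched as plain function composition. The two fetch functions below are stand-ins for calls to hypothetical services, and the data is invented. A real deployment would express this as a process in the orchestration engine rather than in application code.

```python
def fetch_watched_items(user_id):
    # Stand-in for a call to a (hypothetical) watch-list service.
    return [
        {"item_id": "i1", "title": "Camera", "seller": "s9"},
        {"item_id": "i2", "title": "Lens", "seller": "s3"},
    ]


def fetch_latest_price(item_id):
    # Stand-in for a call to a (hypothetical) pricing service.
    return {"i1": 120.0, "i2": 45.5}[item_id]


def get_watched_items(user_id):
    """Orchestrate: fetch the items, fetch each price, keep only needed fields."""
    items = fetch_watched_items(user_id)
    return [
        {"title": item["title"], "price": fetch_latest_price(item["item_id"])}
        for item in items
    ]


summary = get_watched_items("u1")
```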
Q: Your end user experience is synchronous, but your services and coordination are async? Is that a tension? A: Absolutely. It took several years to get that absolutely right. Not all services are equal, which impacts the routing rules. Sometimes the traffic goes point-to-point and is synchronous.
Highlights of framework:
- Declarative pipeline high performance architecture
- Request and response decoupling
- Protocol and data binding agnostic service
- same service instance can be invoked using multiple protocols and formats -- NOT like JBI which requires normalization; natively serialize and deserialize (more later -- protocol plugins do this)
- no message normalization or conversions
- Pluggable data formats
- OOB support for SOAP, JSON, Binary XML
- Streaming support + attachment support
- WSDL
- Pluggable transports, including local
Use JAXB for pluggable data formats; use streaming XML api + stream readers/writers custom written for JSON, key value pairs, etc. No intermediate format, avoids extra conversion. (Lots of work done on this that I missed a bit)
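A rough sketch of the "same service, multiple formats" idea, assuming a registry of format plugins keyed by content type. The plain dict plays the role of the bound object, and each plugin goes straight between wire format and dict with no canonical message format in between. The names are mine, not eBay's, and this ignores the streaming aspect entirely.

```python
import json
from urllib.parse import parse_qsl, urlencode


class JsonFormat:
    content_type = "application/json"

    def deserialize(self, payload):
        return json.loads(payload)

    def serialize(self, data):
        return json.dumps(data, sort_keys=True)


class KeyValueFormat:
    content_type = "application/x-www-form-urlencoded"

    def deserialize(self, payload):
        return dict(parse_qsl(payload))

    def serialize(self, data):
        return urlencode(sorted(data.items()))


# Registry of pluggable formats; adding a format does not touch the service.
FORMATS = {f.content_type: f for f in (JsonFormat(), KeyValueFormat())}


def find_item(request):
    """The service itself: dict in, dict out, no knowledge of wire formats."""
    return {"item_id": request["item_id"], "status": "active"}


def dispatch(service, payload, request_type, response_type):
    request = FORMATS[request_type].deserialize(payload)
    response = service(request)
    return FORMATS[response_type].serialize(response)


as_json = dispatch(
    find_item, "item_id=42", "application/x-www-form-urlencoded", "application/json"
)
as_kv = dispatch(
    find_item, '{"item_id": "42"}', "application/json", "application/x-www-form-urlencoded"
)
```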
Stuff about SOA Governance: my takeaway is that it's declarative; heard this a lot in the last few days. Enables consistent review, change management, dependency management. Automated tools using XQuery to ensure that WSDL matches what's expected. Also, reconciling what you find at runtime vs what you expect to find from your design.
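To illustrate the "WSDL matches what's expected" check: the talk mentioned XQuery, but this sketch swaps in Python's `xml.etree.ElementTree` with a simple path query. The sample WSDL, the operation names and the `check_operations` helper are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ItemService">
  <portType name="ItemPort">
    <operation name="getItem"/>
    <operation name="findItems"/>
  </portType>
</definitions>"""


def check_operations(wsdl_text, expected_operations):
    """Compare operations declared in the WSDL against the expected set."""
    root = ET.fromstring(wsdl_text)
    declared = {
        op.get("name")
        for op in root.findall("./wsdl:portType/wsdl:operation", WSDL_NS)
    }
    return {
        "missing": sorted(expected_operations - declared),
        "unexpected": sorted(declared - expected_operations),
    }


report = check_operations(SAMPLE_WSDL, {"getItem", "getWatchedItems"})
```

The same comparison could be run against what is discovered at runtime to reconcile it with the design.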
Summary:
- Solving tech part is relatively easy. Easier than solving operational aspects -- must do from beginning.
- Up front design and modeling of contract/interface, including granularity is very important.
- Service layering, dependency + version mgt must be well thought through.
- Invest up front in governance, testing tools, developer training.














