Inside eBay's business intelligence
eBay is the epitome of business intelligence and big data. The company is the second-largest online retailer in the world, and it turns over a lot of data: terabytes every minute. Applying analysis to improve user experience and prevent fraud means crunching tens of petabytes every day.
"Business intelligence is in our DNA," says Tom Fastner, senior member of technical staff and architect at eBay. "eBay doesn't produce or sell anything; we only have data and a platform for buyers and sellers to meet. A key driver is analytics and understanding data, and the value of data."
Fastner will be presenting the e-commerce giant's approach to BI at ITWeb's Business Intelligence Summit, in February 2013.
Big data has been a very real challenge for eBay for many years. "We have the world's largest enterprise data warehouse, and we're the world's third-largest Hadoop user," Fastner says. "We've been doing enterprise data warehousing for 10 to 12 years, experiencing 40% to 100% growth every year."
To eBay, business intelligence means money. "We have a keyword data mart to manage our paid search," Fastner says. "We track 170 million keywords, and figure out how much we're willing to bid at Google, Bing and others to ensure we hit the top-five ranking. We do a daily batch, and deliver a file to Google and other search engines at 9am with updates." Analytics-driven search leads directly and measurably to revenue, Fastner says. "Just that one project has probably paid for everything we bought. And if we miss a batch, it could cost us millions - what if we don't deliver an update on the day the iPhone 6 is delivered?"
Being able to process and analyse petabytes of application data every day has meant some creative engineering just to house the data. eBay has a close working relationship with Teradata, where Fastner was previously employed, and houses many petabytes of information in Teradata systems. eBay has also collaborated with the vendor on product development, pushing the boundaries of performance and cost efficiencies.
"Another major use case is A/B testing with our production workload," Fastner says. "We have 350 million active items and 100 million active users. We know everything that every user did for the last 10 quarters. We know exactly what they saw on the screen - even if you just browse, we know the 50 items that were displayed to you. That leads to quite interesting analytics." The customer experience is not only a battle against competitors; it is hotly contested internally too, Fastner says. "We have 40 product managers competing on real estate on homepage. Every change goes through A/B testing to prove its return, and is analysed on 70 to 80 KPIs."
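As a rough illustration of what "proving a return" through an A/B test can involve, the sketch below compares the conversion rates of two page variants with a standard two-proportion z-test. The article does not say which statistical methods or KPIs eBay actually uses; the function name and the visitor counts here are hypothetical.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z statistic for the difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err

# Hypothetical counts: 10,000 users saw each variant of the page.
z = two_proportion_z(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(round(z, 2))
```

A z value beyond roughly 1.96 in either direction is the conventional threshold for a difference that is significant at the 5% level. With 70 to 80 KPIs per change, a real system would also need to account for multiple comparisons, which this sketch ignores.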
Site search is another major focus area. eBay's Hadoop clusters optimise search, helping the company match its buyers' needs to products, a task that is far more complex than it sounds. "People can describe items how they want, but we have to be able to find it and link it. We don't have a catalogue as such," Fastner says. "We track people looking for stuff, but we also know what they did actually buy. That allows us to link the search queries to products. For example, we might see people searching for 'cowboy hats', but it turns out they're looking for Dallas Cowboys merchandise."
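The idea of linking search queries to what people actually bought can be sketched in a few lines. This is a minimal illustration only, assuming each search session has already been reduced to a (query, purchased category) pair; the function name, the category labels and the sample data are all hypothetical, and eBay's Hadoop-based approach is certainly more involved.

```python
from collections import Counter, defaultdict

def link_queries_to_categories(sessions):
    """Map each search query to the category most often bought after it."""
    counts = defaultdict(Counter)
    for query, bought_category in sessions:
        if bought_category is not None:  # skip searches that led to no purchase
            # Normalise case and whitespace so variants of a query are pooled.
            counts[query.lower().strip()][bought_category] += 1
    return {q: c.most_common(1)[0][0] for q, c in counts.items()}

# Hypothetical sessions: (search query, category of the item eventually bought).
sessions = [
    ("cowboy hats", "NFL merchandise"),
    ("Cowboy Hats", "NFL merchandise"),
    ("cowboy hats", "Western wear"),
    ("cowboy hats", None),
]
links = link_queries_to_categories(sessions)
print(links)
```

In this toy data set, most purchases following a "cowboy hats" search fall under the hypothetical "NFL merchandise" category, so that is the category the query gets linked to.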
The company's data has grown prodigiously over the years, but now the focus is on speed and responsiveness.
"We're working on real-time analytics, closing the gap between our enterprise data warehouse and the speed the site has to operate," Fastner says. "The standard today on analytics is a daily refresh, and there are some projects that can work hourly or every couple of minutes - application logs are in the database 10 minutes after you do something on the Web site, for example. But in the online world, things have to happen in milliseconds. There's a big gap, and we're trying to fill that gap, to come up with a system that allows us to receive massive data close to real time and provide capabilities with very good response times back to applications."
Speed and flexibility are also the name of the game internally - eBay prides itself on facilitating agile development unencumbered by the scale of data involved, Fastner says. "Analysts like to take a copy of the data to a data mart because they want to control the environment. So we created virtual data marts: we give them what they need via a sandbox on the production system where they can create their own marts. That's huge - we're supporting agile development by enabling them to play with data and get results. We can provision new marts in minutes - hundreds of gigabytes, no questions asked. There are probably 400 of these on the enterprise data warehouse and 100 on the Deep system, the largest being 100TB for a single user. This is also interesting from the security perspective because data is not leaving the platform." Agile, but not unmanaged, Fastner stresses.
"The biggest DBA nightmare is CPU and IO. Queries against those objects are the most intensive you can imagine. From the consumption perspective, we have a programme in place that gives every VP and organisation a CPU budget."
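A per-organisation CPU budget of the kind Fastner describes could be checked along the following lines. This is a sketch under stated assumptions: the article gives no detail on how eBay measures or enforces the budgets, so the log format, the unit (CPU seconds), the organisation names and the function name are all hypothetical.

```python
from collections import defaultdict

def over_budget(query_log, budgets):
    """Return {organisation: CPU seconds over budget} for those exceeding it."""
    used = defaultdict(float)
    for org, cpu_seconds in query_log:
        used[org] += cpu_seconds  # total consumption per organisation
    return {
        org: used[org] - limit
        for org, limit in budgets.items()
        if used[org] > limit
    }

# Hypothetical budgets and query log entries, in CPU seconds.
budgets = {"search": 100.0, "marketing": 50.0}
query_log = [
    ("search", 60.0),
    ("search", 30.0),
    ("marketing", 40.0),
    ("marketing", 25.0),
]
overruns = over_budget(query_log, budgets)
print(overruns)
```

Reporting the overrun rather than blocking queries keeps the sandbox marts "agile, but not unmanaged": analysts can still experiment, while each organisation can see what its experiments cost.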
ITWeb Business Intelligence Summit 2013
Tom Fastner will elaborate on eBay's BI strategies and challenges at the 8th Annual ITWeb Business Intelligence Summit and Awards, taking place on 26 and 27 February 2013, with a workshop on 28 February. Themed "Integrated BI for optimised performance", the 2013 summit empowers BI practitioners to derive the maximum value from their BI implementations.