Looking for problems

Interview with Dr Kai Li, professor at Princeton University and one of the founding fathers of de-duplication vendor Data Domain.

Dr Kai Li is a professor at Princeton University and one of the founding fathers of de-duplication vendor Data Domain, which was the subject of an acquisition battle between rivals NetApp and EMC (the latter won). He talked to TREVOR CLARKE about the technology and what it takes to create a successful start-up.

What were the main reasons for starting the Data Domain business?

Dr Kai Li (KL): It was back in 2001, when I was on sabbatical from Princeton University at Stanford. I started thinking about what the opportunities in datacentres were. When I talk about opportunities, I mean the problems people find painful. When I spoke to some datacentre people, the most painful things were backup and disaster recovery. Then I started looking at the big picture. I felt most of the pain came from using tapes, much as we once used music cassette tapes. We didn’t like them, but we do like iPods, and we have never looked back.

In the datacentre, the first thing to go is the backup tapes. But replacing them takes a kind of revolution – like the iPod replacing music cassettes. To make this happen, you have to have a new disruptive technology and build a new line of products capable of doing the replacement. Cost-wise, it has to be about the same as previous solutions, and in other dimensions it has to be substantially better. That’s why I started thinking about building a de-duplication storage system. De-duplication allows us to get a compression ratio of 10:1 to 30:1.

When you build the de-duplication technology into the storage system, and you do it from the inside, the system’s users don’t see any difference from traditional storage systems. But you can put 30 times more data into it. It costs about the same as a traditional storage system and, as a result, the total cost is about the same as a tape library – sometimes better. But what you can do better is recover data very quickly; multiple users can recover their data simultaneously and the storage system can continuously verify data.
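To make the idea concrete, here is a minimal sketch of de-duplication inside a storage layer. It is an illustration only and not Data Domain's design: the fixed 4 KB chunk size, the SHA-256 fingerprints, and the `DedupStore` class with its method names are all assumptions chosen for simplicity. Each unique chunk is stored once, and a file is kept as a list of fingerprints, so callers read and write ordinary bytes and never see the de-duplication.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (an assumption)


class DedupStore:
    """Toy content-addressed store: every unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes
        self.logical_bytes = 0  # total bytes clients have written

    def write(self, data: bytes) -> list:
        """Store data and return its recipe (the ordered list of fingerprints)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # keep only the first copy
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def physical_bytes(self) -> int:
        """Bytes actually stored after de-duplication."""
        return sum(len(c) for c in self.chunks.values())

    def ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        return self.logical_bytes / self.physical_bytes()


# Ten "nightly full backups" of the same 1 MiB of data (256 distinct chunks).
data = b"".join(i.to_bytes(4, "big") * 1024 for i in range(256))
store = DedupStore()
recipes = [store.write(data) for _ in range(10)]
print(store.ratio())  # prints 10.0
```

Ten identical backups give a 10:1 ratio in this toy case because nothing changes between runs. Real backup streams change a little each night, and production systems generally use more elaborate chunking and indexing than this sketch.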

In addition, if you can compress data that much, you can reduce the number of boxes by a factor of 10 to 30, and that saves rack space and power. You can also move the compressed data over a wide area network [WAN] to a remote site for disaster recovery automatically, instead of taking a tape out, loading it into a truck, driving to a destination and then putting it into a vault. This is why de-duplication storage systems have now become the standard for datacentres.

How long did it take you to come to this technology? In the early stages, what were the main challenges?

KL: At the beginning, we were thinking about de-duplication technology and how it would replace tape. There are many ways to go to market and many ways to develop a product, but if you are a start-up, you have very limited resources and you have to bet on the approach that can take you as far as possible. That was the main challenge: how we should build our business plan. We talked to about 30 datacentre customers and asked them about their pain points and, if we built something, what kind of things would help them solve that pain.

One of the co-founders, Brian Biles, who is VP of product management, was described by many people as being able to walk on water. He’s a very good person to have on your founding team. And I am also very eager to talk to customers, because I want to solve problems as opposed to shoving a technology down their throats. At the same time, we built a small software-only prototype and convinced three of the potential 30 customers to install it in their datacentres and run their production data through it early on. Then we observed how much compression we could get. When we built the business plan, we had confidence in the technology. We also had confidence that what we built would solve customer problems. Then we got funding and started designing the real product. Eighteen months later, we won the best product award in Storage Magazine.

Did you find it hard to get the funding?

KL: It was hard because the company formed on October 12, 2001… and the market was really down. I am a university professor and I enjoy teaching, so I told my co-founders we should just talk to a couple of venture capital firms. If they liked our idea, we would do the start-up. I only wanted to do it if we all believed the venture was going to deliver a revolutionary type of technology product and we could take it to market and make a lot of impact.

Do you think you have got to that revolutionary stage now?

KL: I think in many respects, yes. The de-duplication storage system has become, in my opinion, the de facto standard for data protection.

Would you have minded if it was NetApp that acquired the company?

KL: I think the initial agreement with NetApp was agreed to by the board. But I think EMC is a great outcome for shareholders, for Data Domain employees and for EMC. I think it is a win-win situation.

What comes next for de-dupe?

KL: In my opinion, Data Domain de-duplication technology is far ahead of competitors in multiple ways. If you look at the latest product release, the Data Domain DD880, which was announced during the acquisition, it does multiple-stream data de-duplication at a rate of 5.4 terabytes an hour. That is 1.5GB per second, which is basically the wire speed of 10G Ethernet. You can do no more than that. On the same hardware, if you look at the competitors, they are doing de-duplication throughput of less than 1/5 of what we are doing. The reason is that with our software design, from day one we took a bet on taking advantage of multi-core processors. We tried to avoid using more disks because our belief is that if you want to do de-duplication, customers don’t want to buy more disks.
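The 5.4 terabytes an hour quoted above can be checked with simple arithmetic. The short sketch below assumes decimal units (1 terabyte = 10^12 bytes, 1 gigabyte = 10^9 bytes); the function and variable names are illustrative only.

```python
def tb_per_hour_to_gb_per_second(tb_per_hour: float) -> float:
    """Convert terabytes per hour to gigabytes per second (decimal units)."""
    bytes_per_hour = tb_per_hour * 10**12
    return bytes_per_hour / 3600 / 10**9  # 3600 seconds in an hour


dd880_gb_per_s = tb_per_hour_to_gb_per_second(5.4)
dd880_gbit_per_s = dd880_gb_per_s * 8  # 8 bits per byte
ten_gbe_gb_per_s = 10 / 8              # raw 10 Gbit/s link, in gigabytes per second

print(f"{dd880_gb_per_s:.2f} GB/s, {dd880_gbit_per_s:.1f} Gbit/s")
print(f"10G Ethernet raw line rate: {ten_gbe_gb_per_s:.2f} GB/s")
```

Under those assumptions the quoted rate works out to about 1.5 gigabytes per second, or 12 gigabits per second, which is in the same range as the 1.25 gigabytes per second raw line rate of a single 10G Ethernet link.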

What advice do you give your students about setting up a company like you did?

KL: What I have learned from this venture is that it is very important to be customer focused. The tendency in academia is to work on a research project and then think about commercialising your project or results. In other words, you have the technology and are looking for a market. We did it differently. We went to the datacentres and looked for problems – the most painful problems in a particular space: storage. Then we invented a technology to solve that problem.
