Thursday, January 7, 2010

Too much information: you may just have all the data you need

"This was not a failure to collect intelligence, it was a failure to integrate and understand the intelligence that we already had." NYTimes quoting President Obama after his meeting with national security advisers about a terror plot to bring down a commercial jetliner on Christmas Day. (Jan 6th 2010)

Going to the movies with friends from the intelligence community is never a cheerful experience. Spend two hours in a conspiracy movie with people who occasionally look at a (seemingly) absurdly powerful data collection device and say “ah, I know this system”, and you will come out a firm believer in conspiracy theories, or at least a more paranoid individual. But even the most tech-savvy and well-informed of those people talk like Pres. Obama in the quote above: it’s not lack of data, it’s our inability to process it that limits us. Maybe project ECHELON really stores all of our communication, but what supercomputer and what sophisticated algorithms can process and identify all of the world’s pictures, the plethora of dialects in written natural languages, and voice calls? You know what? If you know the answer, I’m not sure I want to know.

Estimates of intelligence units’ capabilities aside, your average merchant or payment service is much more limited (and, to be fair, faces a less complicated, or should I say less critical, problem). Between your transactions, industry blacklists, account history, mailing lists with bad actor data and various tools offered on the open market, there’s a good chance you will lose the ability to reconcile it all without a dedicated, expert team of analysts and developers who understand automation. But being able to automate isn’t the only challenge with data. Trying to know “everything”, you’re bound to trip over some problems.

Common pitfalls in data source acquisition
First, you have to get the data. Many raw data sources out there on the web are pretty hard to acquire; some are not priced correctly for scale, some require data sharing as a prerequisite (growing their database, but giving away your customers’ data), and some just won’t pass legal because they were obtained in shady ways. Because of the above, it is often extremely difficult to justify the purchase of a new data source. It takes very complex analysis to show how a data source can move your revenue dial and that its ROI is worth the risk. All in all, data source bizdev is a potential nightmare unless you are airtight on what you need, when you need it and what it’s worth to you.
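
To make the ROI argument concrete, here is a minimal sketch of the back-of-the-envelope calculation. The field names and every number in it are hypothetical placeholders, not figures from any real vendor; the hard part in practice is producing defensible estimates for the inputs.

```python
from dataclasses import dataclass


@dataclass
class DataSourceEstimate:
    """Hypothetical yearly estimates for one candidate data source."""
    annual_cost: float             # licence + integration + maintenance
    fraud_loss_prevented: float    # losses you expect the source to stop
    good_revenue_recovered: float  # margin on good orders you stop declining
    good_revenue_lost: float       # margin on good orders newly declined


def net_benefit(e: DataSourceEstimate) -> float:
    """Yearly gain minus yearly cost, in the same currency as the inputs."""
    gain = e.fraud_loss_prevented + e.good_revenue_recovered - e.good_revenue_lost
    return gain - e.annual_cost


def roi(e: DataSourceEstimate) -> float:
    """Net benefit per unit of cost; 0.5 means a 50% return on the spend."""
    if e.annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return net_benefit(e) / e.annual_cost


# Illustrative numbers only.
estimate = DataSourceEstimate(
    annual_cost=40_000,
    fraud_loss_prevented=70_000,
    good_revenue_recovered=15_000,
    good_revenue_lost=5_000,
)
print(net_benefit(estimate))  # 40000
print(roi(estimate))          # 1.0
```

Note that the good business a new source makes you decline counts against it just like its price tag does.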

After you get the data, you need to store it somewhere, and storage space and security are yet another challenge. There’s a limit to the volume of data you can save on your servers, and scaling such a system is no simple or cheap business. “So what”, you say, “I’ll put it all in the cloud” (it’s very hip these days to put stuff in the cloud). Wait – isn’t that exactly the type of reckless use of Personally Identifiable Information (PII) that gets you data breaches? To deal with sensitive data in the payments space we have compliance and information security standards. Are you going to be PCI compliant, for example? That is a question you must answer. Right now the answer is no: clouds are public, shared systems that are hard to secure properly against fraudsters and hackers; if you want your cloud-based system to be compliant, you need to give up your PII by receiving payments through a cloud-based payments system – which basically means losing data (having someone else collect your customers’ payment info), not gaining it. Once the field settles, in a couple of years, cloud computing for vast payment data volumes will start to be a possible route.

Finally, once you’ve acquired and stored or can access your data, you have to use it. The challenges here range from database architecture to modeling methodology; if you don’t build the correct architecture and have a proper DS and modeling methodology, new data integration will be a nightmare. Almost no single data source has 100% coverage across all countries, has homogeneous data quality, is 100% available (given that you don’t store it on your system) and adheres to a tight SLA, all at the same time. So, on top of the above, you also need models that can cope with partial, sometimes corrupt data and still make the right decision – far from easy.
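
As a rough illustration of what coping with partial or corrupt data can look like, here is a minimal sketch of a weighted score that skips unusable inputs and refuses to answer when too little data is left. The signal names, weights and coverage threshold are all assumptions made up for the example; a real model would be far more involved.

```python
from typing import Mapping, Optional

# Hypothetical signal names and weights; each signal is a risk value in [0, 1].
WEIGHTS = {"ip_risk": 0.5, "email_risk": 0.3, "device_risk": 0.2}
MIN_COVERAGE = 0.5  # below this share of total weight, don't trust the score


def clean(value: object) -> Optional[float]:
    """Return a usable risk value, or None if it is missing or corrupt."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    if value != value or not 0.0 <= value <= 1.0:  # NaN or out of range
        return None
    return float(value)


def risk_score(signals: Mapping[str, object]) -> Optional[float]:
    """Weighted average over the signals that are present and valid.

    Returns None when too little of the total weight is covered, so the
    caller can route the transaction to a fallback such as manual review.
    """
    total = 0.0
    covered = 0.0
    for name, weight in WEIGHTS.items():
        value = clean(signals.get(name))
        if value is None:
            continue  # source was down, returned garbage, or has no coverage
        total += weight * value
        covered += weight
    if covered < MIN_COVERAGE:
        return None
    return total / covered


print(risk_score({"ip_risk": 0.8, "email_risk": "n/a"}))  # 0.8
print(risk_score({"device_risk": 0.9}))                   # None
```

Renormalizing over the weights that are actually available is only one possible design; the point is that a missing or broken source should degrade the decision gracefully rather than break it.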

So what do I do?
I know what you’re thinking. “I don’t need all of this”, you say, “I bought risk scores and tools from various vendors with a proven track record in risk management. I’m all set”. Let me tell you why I’m not fond of this as a general approach: giving scores as a result instead of raw data obfuscates vital components, and severely reduces your ability to understand why a decision was made or why a specific score was given. When you don’t know the underlying reason, your ability to effectively combine scores, or to simulate changes made to them and their effect on your system and bottom line, is zero. You’re left with a few business rules and a false feeling of control that may result in serious losses or simply lost business.

So what should you do? If you’re a small business without a risk management function, you’re stranded. I suggest that you settle for the few scores and tools that take on at least part of the liability – professionals should be able to put their money where their mouth is (and there are quite a few professionals out there). But being in this situation is not what I’d advise for anyone looking to really grow their business – you need to keep your eye on the ball in risk and fraud. Develop the capability to understand what’s happening in your system, what caused losses and why (easier said than done). If you at least have that, even by mere intuition of being in the business and seeing a lot of fraud, you can start putting a price tag on new scores you’re being offered (I, of course, support hiring and training of domain experts). But what you’re really looking for is creating a data source acquisition methodology.

You need to understand what a new data source does for you. Do not get confused by terminology and flashy names; a common confusion, for example, is between products that verify a person’s identity (i.e. make sure that the name, address etc. belong to a real person in the real world) and products that authenticate it (i.e. prove that the current user is indeed who they claim to be) – those are not the same. Another common mistake is signing pricey SaaS contracts (say, for phone number type) when similar capabilities can be found and acquired with a bit of Google research. Don’t be tempted by big promises – always make sure you properly simulate the performance on your own system, and fully understand the impact you’d expect to get.
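
One simple way to simulate a candidate source on your own system is to replay historical transactions with known outcomes and ask what would have changed. The sketch below assumes you can get the vendor to return its signal for past transactions; the `Txn` fields, the threshold and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Txn:
    amount: float
    was_fraud: bool    # known outcome, e.g. from chargebacks
    approved: bool     # what your current system decided
    new_signal: float  # value the candidate source returns for this txn


def backtest(txns: Iterable[Txn], would_decline: Callable[[Txn], bool]) -> dict:
    """Replay history with the new rule layered on top of today's decisions."""
    fraud_stopped = 0.0  # approved fraud the new rule would have declined
    good_lost = 0.0      # approved good volume the new rule would have declined
    for t in txns:
        if not t.approved or not would_decline(t):
            continue     # no change: already declined, or rule doesn't fire
        if t.was_fraud:
            fraud_stopped += t.amount
        else:
            good_lost += t.amount
    return {"fraud_stopped": fraud_stopped, "good_lost": good_lost}


# Made-up history for illustration.
history = [
    Txn(100.0, True, True, 0.9),
    Txn(50.0, False, True, 0.8),
    Txn(200.0, False, True, 0.1),
    Txn(75.0, True, False, 0.95),  # already declined today, so no added value
]
result = backtest(history, lambda t: t.new_signal >= 0.7)
print(result)  # {'fraud_stopped': 100.0, 'good_lost': 50.0}
```

Only counting transactions your current system approved matters: a source that mostly flags fraud you already catch adds little, no matter how impressive its standalone numbers look.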

Making sense of all of this requires expertise, but it is definitely worth the price. This is not to say, by the way, that there are no effective tools, scores and services out there. On the contrary – there are sometimes too many, and it’s the job of the risk manager in the organization (often the owners themselves) to make sure they are using the best ones for their needs. It’s no simple task.

How do you engage in data source acquisition? Do you think that there’s no such thing as too much data? Comment away!

4 comments:

klancy said...

Very interestingly written. And true, we're suddenly in need of 'risk management' specialists, and even RM departments. Is it really needed? - I think the first answer to that question depends on 'how much are you losing' and 'how much do you have to lose?'

e.g. if you only do small transactions, not much is at stake. If you do BIG volume, it might pay off in the long run to pay for an RM team and some fancy solutions even if it only boosts net profit by 2%.

There's certainly a lot of solutions, but they aren't in a neat little list. You gotta google, and then interview, and test them out only to find that not everything that sounded great, was actually a good fit for your business and business needs.

I think the power of a good old fashioned phone call is grossly underestimated.

Ohad Samet said...

Thanks for the comment. My POV (not necessarily new) is that the need for RM should be measured not only by loss exposure but also by lost opportunity - that's much harder to measure. Secondary markets for virtual goods, something you understand very well, are a good example.

As for phone calls - they do a lot, but take you off the flow. This is why these aren't the optimal solution for some segments.

Anonymous said...

I believe that the old saw - not seeing the forest for the trees - still fits a lot of situations. We can focus on data collection and searching luggage, and overlook that the purpose is not data collection and searching luggage, but identifying fraud or terrorists. My experience has been that the biggest problem in most organizations is poorly communicated goals from senior management.

And if people do not clearly understand the 'big picture' then fraud prevention can frequently collide with customer retention. Say I go on a trip causing my credit card activity to be flagged as out of norms. A fraud specialist notices the activity and freezes the account. I call when a charge is declined and am given a 20 minute grilling about my identity - pissing me off and I cancel the account and bad mouth the company to my friends over drinks. The aggressive prevention policies have lost the company a customer, and I am sure that that was not the intent.

Nor was the intent to be 100% effective in searching luggage at airports while leaving the door unguarded.

Marc J. Miller said...

Two weeks ago I had a job interview on the east coast and somehow my name got munged. Instead of Marc J. Miller I was Marcjmr Miller. I was worried that was going to complicate getting through security or adding my United Airlines FF # to the ticket, but everything went fine. Even the person checking my photo ID in San Francisco didn't mention the mismatch.

Coming back, things got a bit screwed up. I needed to change the return trip, which ended up meaning cancelling it and booking a new trip home from another airport. The same administrative assistant that booked my original flight booked the one-way return, and this time my name was entered correctly. When I tried to check in 24 hours before the flight using my smartphone's web browser, it wouldn't let me, claiming that I needed to speak to an agent at the airport. I called UA, basically got the same information (that someone needed to verify my photo ID before they would let me check in), but managed to add my UA FF number to the reservation.

If you're wondering where this story is going, here's where it gets interesting: Even though the red flag went up before I had added my UA#, TSA had identified that something was amiss based on my previous flight. Though the ticket was booked for Marc Miller and the UA account was assigned to Marc Miller, the automated airport kiosk told me that for security reasons, it needed to verify my name: was I Marcjmr Miller? If not, could I please input my exact full name as it appears on my photo ID? I complied of course and it checked me in with my full name without me needing to show my photo ID to a ticketing agent.

Very cool example of how the airline industry is using available data to mitigate risk. Changed name and changed route, even though my name was common, was enough to flag me as a risk, even before a match was certain.