Anyone can get lucky. But a few years ago, when the owners of a casino looked on as a player at a roulette table cleaned up, they were fairly certain that Lady Luck had little to do with his good fortune. The problem was that there was just too much happening in a short Las Vegas minute for human eyes to catch the scam. That’s where Jeff Jonas came in. A pioneer in the growing field of big-data analytics, the analysis of massive amounts of information, Jonas developed software that could sift through billions of bits of data collected on the casino’s employees and thousands of guests and come up with a theory about what was happening at the roulette table.
“It’s like trying to put together five puzzles at the same time, with all the pieces piled up on the floor,” says Jonas, a scheduled speaker at Sibos in Toronto Sept. 19-23, a conference that is expected to attract more than 8,000 bankers. “You start by pulling out the pieces and putting ones that are similar together in a separate pile, until you start to see relationships.”
Back at the casino, his software, dubbed NORA, short for Non-Obvious Relationship Awareness, sifted through the data collected by the casino and came up with one startling piece of information. The dealer at the roulette table and the player who was repeatedly beating the house had once shared the same telephone number. The dealer provided the number on her application for a job at the casino, and the player gave the same number on his registration form at the casino hotel. “It turned out that they were once roommates,” says Jonas, a pioneer in link analysis whose work in the field included an analysis of the connections between the Sept. 11 terrorists. In 1984 Jonas launched a company, Systems Research & Development, which was acquired by IBM in 2005. SRD’s technologies formed the foundation of the IBM Entity Analytics Group, where he is now chief scientist.
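The core move in this kind of link analysis is simple to sketch: index records from unrelated data sources by a shared identifier and surface any identifier that bridges sources. The sketch below is a minimal illustration of that idea, not NORA itself; the field names and sample records are invented.

```python
# Minimal sketch of link analysis across record sets: flag entities
# that share an identifier (here, a phone number) even though they
# appear in unrelated data sources. Field names are illustrative,
# not drawn from NORA.

def find_shared_links(records):
    """Group records from different sources by a shared identifier.

    records: list of dicts with 'source', 'name', and 'phone' keys.
    Returns identifiers that appear in more than one source.
    """
    by_phone = {}
    for rec in records:
        by_phone.setdefault(rec["phone"], []).append(rec)

    links = {}
    for phone, recs in by_phone.items():
        sources = {r["source"] for r in recs}
        if len(sources) > 1:  # same identifier across different data sets
            links[phone] = [(r["source"], r["name"]) for r in recs]
    return links

records = [
    {"source": "employee_applications", "name": "Dealer A", "phone": "702-555-0101"},
    {"source": "hotel_registrations", "name": "Player B", "phone": "702-555-0101"},
    {"source": "hotel_registrations", "name": "Guest C", "phone": "702-555-0199"},
]

print(find_shared_links(records))
# {'702-555-0101': [('employee_applications', 'Dealer A'), ('hotel_registrations', 'Player B')]}
```

Real systems add fuzzy matching (misspelled names, transposed digits) on top of this exact-match core, which is what makes the problem computationally hard at casino scale.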
With the growth of the Internet, networked commerce, and the spread of social media such as Facebook, companies are increasingly becoming overwhelmed by the data they collect. Experts call it “enterprise amnesia”: a company is so swamped with new data that it “forgets” what is contained in data already stored. There is so much data on hand, compounded by massive new inputs each day, that humans cannot make sense of it all. Yet it is becoming increasingly clear that there is enormous economic value locked away in this data chaos. Companies that are able to make sense of it will be the winners of the networked economy.
One of the hottest trends in big-data mining is the use of data to forecast economic events. Conventional economic indicators rely on a complex and time-consuming data-gathering process that usually produces an accurate picture of economic activity, but one that is already dated when published. Take Germany’s report on GDP for the second quarter of this year. When the data were published in August, they showed that Germany’s economy barely budged in the second quarter, causing panic that Europe’s locomotive was grinding to a halt. Details of the report were only available weeks later, showing that the sudden drop in economic activity was caused by Germany’s decision after the Fukushima accident to shut down seven nuclear power plants. The apparent slowdown in the economy was caused by a sharp drop in electricity output, not manufacturing or services. A real-time snapshot of the economy would have given a much clearer picture.
The folks at SWIFT, a Brussels-based global clearing house for interbank transfers, are thinking about just such a solution. SWIFT is developing a tool to analyze the millions of transactions it manages every day as the basis to predict where the economy is heading.
On average, SWIFT handles about 17 million messages per day, or nearly 5 billion a year, says Francis Martin, the product manager in charge of the development of the SWIFT Index. The index is created by using a subset of transaction messages called MT103, which reflect interbank payments. “These payments are related to activities in the economy, some transaction between one corporation and another,” says Martin. “This is the link between economic activity and those types of transactions.”
By analyzing this huge data set, Martin says, SWIFT is able to produce an index that reflects real economic activity in the current and previous quarters — and predicts activity for the next quarter. Other leading indicators, such as industrial production or industrial orders, also can suggest where the economy is headed, but publication of these figures is delayed by as much as two months. Martin says SWIFT can produce its index within days of the close of any given month. Other indicators are based on anecdotal evidence. For example, the purchasing managers index, or PMI, is based on surveys of purchasing managers’ intended purchases. By contrast, Martin says, “the SWIFT Index is based on facts. There is real traffic behind it.”
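The basic arithmetic behind a traffic-based index can be sketched in a few lines: aggregate payment-message counts by month and express each month relative to a base period. The message counts below are invented, and the real SWIFT Index methodology is certainly more sophisticated (seasonal adjustment, country weighting), but the principle is the same.

```python
# Hedged sketch of a transaction-volume index: monthly MT103-style
# message counts, normalized so the base month equals 100.
# Traffic figures are invented for illustration.

from collections import defaultdict

def volume_index(messages, base_month):
    """messages: list of (month, count) pairs of payment-message traffic.
    Returns {month: index value}, with the base month set to 100."""
    monthly = defaultdict(int)
    for month, count in messages:
        monthly[month] += count
    base = monthly[base_month]
    return {m: round(100 * v / base, 1) for m, v in sorted(monthly.items())}

traffic = [
    ("2011-06", 340_000_000),
    ("2011-07", 347_000_000),
    ("2011-08", 336_000_000),
]
print(volume_index(traffic, "2011-06"))
# {'2011-06': 100.0, '2011-07': 102.1, '2011-08': 98.8}
```

A rising index would suggest growing corporate payment activity in near-real time, weeks ahead of official statistics — which is precisely the edge Martin describes.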
Tools are also being developed that could help corporations in other sectors reap economic benefits from trillions of bits of data. A May report published by consultancy McKinsey says a retailer that uses big-data analysis to better understand consumer behavior could expect to improve its operating margins by 60%. If the U.S. health care industry were to use big data effectively, savings to the tune of $300 billion a year could be achieved, cutting national health care expenditure by 8%, the report says. In Europe, public administrations could save as much as €100 billion a year just by making bureaucracy more efficient. Even more savings could be achieved by using big-data analysis to reduce fraud and administrative errors, and to narrow the gap between potential tax revenue and actual tax collected, a perennial problem for public budget planners. In manufacturing, McKinsey estimates that big-data tools could help companies reduce product development and assembly costs by as much as 50%. And the nascent industry for global personal location data could become a $100 billion business for service providers and provide as much as $700 billion in value to consumers.
That’s not all. With the development of tools that allow financial services companies to mine their mother lode of data also comes new power to prevent scams that can cost millions each year. Consider the popular “fake grandson” scam. In a case study provided by IBM, MoneyGram International was able to spot the scam before one of its customers got stung. The ploy worked like this: a 100-year-old grandmother was contacted by a fraudster who claimed that her grandson was in jail and needed $2,500 for bail. The caller was so convincing that the distraught grandmother was more than happy to wire the money as soon as possible. But when she provided the details to MoneyGram, software working in the background identified several bits of information that in combination raised digital red flags, and classified the transaction as high-risk and suspicious. Once it was identified, MoneyGram staff were able to scrutinize the request and discover that it was fraudulent – before grandma’s money was wired to a con artist.
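The phrase “several bits of information that in combination raised digital red flags” describes a common pattern: individual signals are weak on their own, but their combined weight pushes a transaction over a review threshold. The sketch below shows that pattern with hypothetical rules and weights; it is not MoneyGram’s actual model.

```python
# Illustrative rule-based risk scoring: each signal carries a weight,
# and a transaction is held for review only when the combined score
# crosses a threshold. Rules, weights, and threshold are hypothetical.

RULES = [
    ("first_time_sender", 2),   # sender has no prior transaction history
    ("recipient_flagged", 4),   # recipient seen in earlier fraud reports
    ("urgent_cash_pickup", 3),  # immediate cash pickup requested
    ("elderly_sender", 1),      # demographic frequently targeted by scams
]

def risk_score(txn, threshold=6):
    """Return (score, hold_for_review) for a transaction dict."""
    score = sum(weight for flag, weight in RULES if txn.get(flag))
    return score, score >= threshold

txn = {
    "first_time_sender": True,
    "urgent_cash_pickup": True,
    "elderly_sender": True,
    "amount": 2500,
}
print(risk_score(txn))  # (6, True) -- no single rule fires the alarm alone
```

Note that no individual rule here scores above the threshold; it is the combination that flags the wire, which is exactly why this kind of detection needs software rather than a human checklist.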
In the case of the grandson scam, the ability to detect fraud quickly was crucial. Now consider how important that could be for tracking terrorism finance, money laundering, or some other high-stakes financial crime. As the stakes go up, detection speed is of the essence. “This is really all about taking huge amounts of data and analyzing them in real time,” says Larry Ryan, chief technologist for the financial services industry at Hewlett-Packard.
Like rival IBM, HP sees a huge opportunity in creating the tools banks and other financial services companies need to do this kind of analysis. HP has made a few acquisitions to get big-data analytics capabilities in-house. One was Vertica, a Billerica, Mass.-based company formed in 2005, whose customers include Verizon, social gaming site Zynga, and the equities trading platform Pink OTC Markets Inc.
The challenge of capturing, storing, and analyzing the data, while keeping costs under control, is daunting. Pink OTC amasses millions of new records every day. Traders use a Web-based dashboard to analyze real-time market data, get quotes, and trade more than 8,000 U.S. equities and a growing number of international equities. Pink OTC says it opted for Vertica instead of expanding its conventional database infrastructure, because Vertica is faster than other databases and reduced hardware costs by 50% to 90%.
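Part of the speed advantage Pink OTC describes comes from Vertica’s columnar design: an analytic query that touches one field can scan just that field’s values, while a row-oriented store must walk through every whole record. The toy comparison below illustrates the layout difference only; Vertica’s actual engine involves compression, projections, and much more.

```python
# Toy illustration of row vs. column layout for an analytic query.
# Averaging one column over row-shaped records means touching every
# field of every record; a columnar layout scans one array.
# The trade data below is invented.

rows = [
    {"symbol": "ABCD", "price": 12.50, "size": 100},
    {"symbol": "EFGH", "price": 3.20,  "size": 500},
    {"symbol": "ABCD", "price": 12.55, "size": 200},
]

# Row-oriented: iterate whole records to read a single field.
avg_row = sum(r["price"] for r in rows) / len(rows)

# Column-oriented: the same data stored as one array per column,
# so the query reads only the 'price' array.
columns = {
    "symbol": ["ABCD", "EFGH", "ABCD"],
    "price":  [12.50, 3.20, 12.55],
    "size":   [100, 500, 200],
}
avg_col = sum(columns["price"]) / len(columns["price"])

assert avg_row == avg_col  # same answer, very different I/O profile at scale
print(round(avg_col, 2))   # 9.42
```

At millions of records a day, reading one column instead of every field is the difference between an interactive dashboard and an overnight batch job, which is the trade-off Pink OTC weighed against expanding conventional infrastructure.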
Such tools are likely to be in big demand as companies strive to leverage and profit from big data. Says the McKinsey report: “Our research suggests that we are on the cusp of a tremendous wave of innovation, productivity, and growth, as well as new modes of competition and value capture – all driven by big data as consumers, companies, and economic sectors exploit its potential.”