SEC Enforcement Today and Tomorrow: Big Data, Machine Learning and Artificial Intelligence
SEC enforcement already relies on technology. That reliance will only grow in future years, with the agency’s use of data analysis increasingly married to machine learning and artificial intelligence. Advisers should be aware that actions they take will often turn up in data, and that the SEC can be counted on to use that data in ever-more sophisticated ways.
Machine learning – the ability of computers to learn from data rather than follow explicit instructions – "is now integrated into several risk assessment programs – sometimes in ways we didn’t envision," said SEC Division of Economic and Risk Analysis (DERA) acting director Scott Baugess recently during a speech in New York City. He used the speech to describe what the agency is doing with big data, machine learning and artificial intelligence.
DERA has earned a reputation as the agency’s leading edge in the use of information technology. Indeed, the SEC in recent years has made no secret of its use of data analytics in enforcement initiatives (e.g., Rule 105 violations, cherry-picking), in its risk-based approach to examinations, and in identifying future trends within the industry.
Baugess, in his speech, discussed not only the SEC’s use of data analytics, but also the growing role of machine learning and, most enticingly – and perhaps most frighteningly – artificial intelligence, which he referred to by its commonly used acronym, AI. It was the first time, he said, that he had addressed the emergence of AI in the context of the SEC.
"The speech gives a fascinating view into how the Commission is beginning to use AI, and supervised machine learning in particular, to identify investment advisers to target for closer review," said Willkie Farr partner and former SEC deputy chief of staff James Burns. "Given the pressure on the examinations program to increase both the number and the efficiency of exams of advisers, I expect we will see this kind of analytical tool becoming part of the norm."
"The SEC right now is not your father’s Buick," said Shearman & Sterling partner Nathan Greene. The development of big data has, to some degree, turned the agency on its head, he said. Traditionally, the SEC was staffed primarily by attorneys who would have to train examiners in the basics of securities laws and regulations. With the advent of big data analysis, machines are now increasingly finding patterns of possible fraud, which are then forwarded to the attorneys to review and possibly act on.
"What I’ve been telling clients is that the SEC right now has greater analytic capability than many private firms do," he said. "The agency has a voracious appetite for data and a head start on the industry in terms of what they do with it."
But some are skeptical. "I’m not sure what it all means at the end of the day," said Stern Tannenbaum partner Aegis Frumento. "The SEC is going to experience a tsunami of information coming in when the Consolidated Audit Trail (CAT) (see below) comes into play, and they will have super-computers that will find red flags. But whether anything comes from that remains to be seen, because it is difficult to determine whether the patterns that will be found mean anything. A pattern of words is only indicative of something after an event has occurred that makes investigators look for that pattern."
"I think the SEC is going to cough up so many patterns that the agency won’t know what to do with them," he said. "You can find patterns everywhere."
K&L Gates partner Vincente Martinez agreed that there will be more patterns from more data, and that this development may create difficulty in deciding what to do with it all. However, he said, "the SEC already has models they have worked with for a couple of years that they can use to filter that data, which will allow them to deal with it."
Baugess began by discussing how machines have learned in recent decades to discern human behavior patterns, to the point that today, through the use of computer algorithms, machine learning can predict which retail shoppers are likely to purchase certain consumer goods. Similarly, he said, "regulators can benefit from understanding the likely outcomes of investor behaviors."
Progress has been made. He noted that one of the SEC’s early forays in this area involved analyzing the information in the tips, complaints and referrals (TCRs) received by the agency. "The goal was to learn whether we could classify themes directly from the data itself and in a way that would enable more efficient triaging of TCRs," he said. DERA staff separately examined whether machine learning could digitally identify abnormal disclosures by corporate issuers charged with wrongdoing. They found that firms that became the subject of financial reporting enforcement had less often raised the underlying issue as a topic in certain performance discussions.
"These machine-learning methods are now widely applied across the Commission," he said. "Topic modeling and other cluster analysis techniques are producing groups of ‘like’ documents and disclosures that identify both common and outlier behaviors among market participants."
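Baugess did not describe the Commission’s actual tooling, but the cluster-analysis idea – grouping "like" documents and surfacing outliers – can be illustrated with a minimal sketch. Here filings are reduced to bag-of-words count vectors, and a document whose best cosine similarity to every other document falls below a threshold is treated as an outlier. The sample filings, threshold, and function names are all hypothetical, chosen only to make the mechanic concrete.

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_outliers(docs, threshold=0.3):
    """Indices of documents whose best match among the other
    documents falls below the similarity threshold."""
    vecs = [vectorize(d) for d in docs]
    outliers = []
    for i, v in enumerate(vecs):
        best = max(cosine(v, w) for j, w in enumerate(vecs) if j != i)
        if best < threshold:
            outliers.append(i)
    return outliers

# Hypothetical adviser disclosures; the last one resembles none of the others.
filings = [
    "fund performance fees disclosed to all clients",
    "fund performance fees disclosed to clients annually",
    "performance fees and expenses disclosed to fund clients",
    "guaranteed returns wired offshore no risk",
]
print(flag_outliers(filings))  # → [3]
```

A production system would use far richer representations (topic models, TF-IDF, embeddings), but the grouping-versus-outlier logic is the same.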
The analysis then moves up a level. "Working with our enforcement and examination colleagues, DERA staff is able to leverage knowledge from these collaborations to train the machine-learning algorithms," Baugess said. "This is referred to as ‘supervised’ machine learning. These algorithms incorporate human direction and judgment to help interpret machine learning outputs."
How does that play out in practical terms? Consider: "Human findings from registrant examinations can be used to ‘train’ an algorithm to understand what pattern, trend or language in the underlying examination data may indicate possible fraud or misconduct," Baugess said. "From a fraud detection perspective, these successive algorithms can be applied to new data as it is generated, for example from new SEC filings. When new data arrives, the trained ‘machine’ predicts the current likelihood of possible fraud on the basis of what it learned constituted possible fraud from the past."
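The training loop Baugess describes can be sketched, in heavily simplified form, as a toy supervised scorer: past examination outcomes serve as the labels, and the model learns which words appear disproportionately in filings that examiners flagged. The class name, example filings, and log-ratio scoring method are illustrative assumptions, not the SEC’s actual algorithm.

```python
from collections import Counter
import math

class FraudScorer:
    """Toy supervised learner: estimates, per word, how much more
    often it appears in filings that past exams flagged than in
    filings that were cleared, then scores new filings."""

    def __init__(self):
        self.flagged = Counter()
        self.cleared = Counter()

    def train(self, filings):
        # filings: list of (text, was_flagged_by_examiners)
        for text, flagged in filings:
            (self.flagged if flagged else self.cleared).update(text.lower().split())

    def score(self, text):
        # Sum of log-likelihood ratios with add-one smoothing;
        # positive scores lean "flagged", negative lean "cleared".
        n_f = sum(self.flagged.values()) + 1
        n_c = sum(self.cleared.values()) + 1
        return sum(
            math.log((self.flagged[w] + 1) / n_f)
            - math.log((self.cleared[w] + 1) / n_c)
            for w in text.lower().split()
        )

# Hypothetical past exam findings used as training labels.
past_exams = [
    ("guaranteed returns no risk", True),
    ("returns guaranteed offshore account", True),
    ("fees disclosed in the brochure", False),
    ("performance reported net of fees", False),
]
scorer = FraudScorer()
scorer.train(past_exams)
print(scorer.score("guaranteed high returns"))   # positive: resembles flagged filings
print(scorer.score("fees disclosed quarterly"))  # negative: resembles cleared filings
```

As new filings arrive, the trained scorer ranks them exactly as Baugess describes: on the basis of what constituted possible fraud in the past.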
As an example, he said that "DERA staff currently ingests a large corpus of structured and unstructured data from regulatory filings of investment advisers into a [specific type of] computational cluster. Then DERA’s modeling staff takes over with a two-stage approach. In the first, they apply unsupervised learning algorithms to identify unique or outlier reporting behaviors. . . . The output from the first stage is then combined with past examination outcomes and fed into a second stage [machine learning] algorithm to predict the presence of idiosyncratic risks at each investment adviser."
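Under the purely illustrative assumption that "unique or outlier reporting behaviors" can be reduced to a single numeric metric per adviser, the two-stage design might be sketched as follows: stage one computes unsupervised outlier scores, and stage two fits a small supervised model on past examination outcomes to turn those scores into risk estimates. All data, names, and the choice of a one-variable logistic model are assumptions made for the sketch.

```python
import math

def outlier_scores(values):
    """Stage 1 (unsupervised): absolute z-scores of one reporting
    metric; larger values mark more unusual filings."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [abs(v - mean) / std for v in values]

def train_logistic(scores, flagged, epochs=2000, lr=0.1):
    """Stage 2 (supervised): fit P(flagged | score) with a one-variable
    logistic regression on past examination outcomes, via SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(scores, flagged):
            p = 1 / (1 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return lambda x: 1 / (1 + math.exp(-(w * x + b)))

# Hypothetical history: one metric per adviser, plus whether the exam flagged it.
past_metric = [1.0, 1.1, 0.9, 1.0, 5.0, 4.8]
past_flagged = [0, 0, 0, 0, 1, 1]
model = train_logistic(outlier_scores(past_metric), past_flagged)

# Score a new cohort of filings: the third looks idiosyncratic.
new_scores = outlier_scores([1.0, 1.05, 6.0])
print([round(model(z), 2) for z in new_scores])
```

The real system ingests "a large corpus of structured and unstructured data," so its first stage is far richer than one z-score, but the hand-off – unsupervised anomalies fed into a model trained on past exam outcomes – is the same shape.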
"The results are impressive," Baugess continued. "Back-testing analyses show that the algorithms are five times better than random at identifying language in investment advisory regulatory filings that could merit a referral to enforcement."
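"Five times better than random" is a lift statistic. One common way to compute it – assumed here, since Baugess did not give the formula – divides the hit rate among the model’s top-ranked filings by the base rate across all filings. The scores and labels below are hypothetical, chosen only to show the arithmetic.

```python
def lift_at_k(scores, labels, k):
    """How many times better than random the top-k model picks are:
    precision among the k highest-scored filings divided by the
    base rate of true positives in the whole sample."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_hits = sum(label for _, label in ranked[:k])
    base_rate = sum(labels) / len(labels)
    return (top_hits / k) / base_rate

# Hypothetical back-test: 10 filings, 2 truly referral-worthy (label 1).
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
labels = [1,   1,   0,   0,   0,   0,   0,   0,   0,   0]
print(lift_at_k(scores, labels, k=2))  # → 5.0: (2/2) hit rate vs. a 2/10 base rate
```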
Come this November, the SEC’s use of big data will take another major leap forward when its CAT becomes active (ACA Insight, 12/5/16). Market exchanges will be required to report all of their transactions on CAT, with broker-dealers having to do so over the subsequent two years. "This will result in data about market transactions on an unprecedented scale," he said.
Investment advisers are not immune to CAT requirements and will eventually be asked questions about trades involving best execution, trade timing, and choices made on behalf of some clients and not others, said Burns. "The analytical power that AI will bring to the task of sifting through all that trade data will be extraordinary, and I expect firms will be scrambling to respond to finely tuned inquiries from the agency about trading patterns in the market. A real game changer."
The method is far from foolproof, however. "The results can also generate false positives or, more colloquially, false alarms," Baugess acknowledged. "In particular, identification of a heightened risk of misconduct or SEC rule violation often can be explained by non-nefarious actions and intent."
The agency is aware of this possibility, he said. "Expert staff knows to critically examine and evaluate the output of these models."
Baugess, while seeing an increasing role for AI as it evolves, made the point that today there are limitations. "It is premature to think of AI as our next market regulator," he said. "The science is not yet there. The most advanced machine-learning techniques used today can mimic human behavior in unprecedented ways, but higher-level reasoning by machines remains an elusive hope."
"Machine-learning algorithms may help our examiners by pointing them in the right direction in their identification of possible fraud or misconduct, but machine-learning algorithms can’t then prepare a referral to enforcement," Baugess continued. "And algorithms certainly cannot bring an enforcement action. The likelihood of possible fraud or misconduct identified based on a machine-learning prediction cannot – and should not – be the sole basis of an enforcement action. Corroborative evidence in the form of witness testimony or documentary evidence, for example, is still needed. Put more simply, human interaction is required at all stages of our risk assessment programs."
Yet, just two paragraphs later in his speech, he offered a vision of just how far AI will take enforcement: "I can see the evolving science of AI enabling us to develop systems capable of aggregating data, assessing whether certain federal securities laws or regulations may have been violated, creating detailed reports with justifications supporting the identified market risk, and forwarding the report outlining that possible risk or possible violation to Enforcement or OCIE staff for further evaluation and corroboration."