In Canada, securities regulation is fragmented. Each province oversees its own capital markets, while self-regulatory organizations license and oversee investment dealers and their representatives, such as the person at your local bank who sells mutual funds.
Most other developed countries have some form of national securities regulator, and Canada's patchwork system has consequences. Enforcement policies vary by jurisdiction, and while regulators do their best to communicate with one another and keep track of the people and companies that break securities law, some slip through the cracks.
In late 2016, The Globe and Mail set out to analyze thousands of disciplinary cases kept by the Canadian Securities Administrators, an informal umbrella organization for Canada's many regulators. The CSA publishes and regularly updates a "disciplined list," a database and national clearinghouse for information on sanctions issued by Canadian regulators. Though the CSA makes this data available on its website, regulators don't use it to look for overall trends in the industry.
In our analysis, we discovered that some people appeared again and again in the CSA files, in one case 17 times. Others appeared to have been sanctioned in one province, only to be sanctioned again in another province on a new charge years later. They were repeat offenders. In the end, we found that roughly one in nine people in the CSA files, or 11.1 per cent, matched our statistical profile for repeat offenders. Of those people, 63.3 per cent had records in more than one jurisdiction, and half had records in more than one province.
To run our analysis, we first had to build a scraper that visited each of the CSA's 6,027 disciplinary pages and logged every disciplinary record, effectively reconstructing the CSA's database from its public web pages. We discovered that the database, while detailed, was flawed: it contained many duplicate entries, cases split across different files and inconsistently formatted data. These data-quality problems were surprising, given that regulators use the CSA's disciplined list as a resource during investigations and when determining whether someone has a record of misconduct.
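Many of the duplicate entries can be caught by comparing normalized versions of each record. What follows is a minimal Python sketch, not The Globe's code: the `Order` fields are hypothetical, and it only catches records that become identical once case, punctuation and spacing are normalized. It does not reassemble cases split across files.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Order:
    # Hypothetical fields; the real CSA pages may be structured differently.
    name: str
    regulator: str
    order_date: date
    description: str


def normalize_name(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def deduplicate(orders: list[Order]) -> list[Order]:
    """Drop records repeating the same person, regulator, date and description."""
    seen = set()
    unique = []
    for order in orders:
        key = (
            normalize_name(order.name),
            order.regulator,
            order.order_date,
            normalize_name(order.description),
        )
        if key not in seen:
            seen.add(key)
            unique.append(order)
    return unique
```

Keeping the first copy of each record preserves the original order of the scraped pages, which makes it easier to trace a surviving record back to its source.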
Our analysis focused on individuals, so we created an algorithm to determine whether the case name on any given file belonged to a person or a company. To make sure the algorithm was properly distinguishing between personal and corporate names, we hand-checked a sample of its results. To account for people who used aliases or whose names closely resembled others', we produced lists of similar names and known aliases, then hand-checked the corresponding regulatory records to confirm whether they belonged to the same person or to different people.
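The Globe hasn't published its classification rules. One common heuristic, assumed here, flags a case name as corporate if it contains a marker such as "Inc." or "Ltd."; a second function surfaces pairs of similar names for the kind of hand-checking described above. The marker list and the 0.85 similarity threshold are hypothetical choices, and both functions only produce candidates for human review.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical marker list; a real classifier would need a longer, hand-tuned list.
CORPORATE_MARKERS = {
    "inc", "ltd", "ltée", "corp", "corporation", "limited", "llc", "llp",
    "company", "co", "holdings", "capital", "group", "fund", "trust",
    "partners", "securities", "investments", "financial",
}


def tokens(name: str) -> list[str]:
    """Split a name into lower-case word tokens, dropping punctuation."""
    return re.findall(r"\w+", name.lower())


def is_company(case_name: str) -> bool:
    """Flag a case name as corporate if any token is a corporate marker."""
    return any(token in CORPORATE_MARKERS for token in tokens(case_name))


def similar_name_pairs(names: list[str], threshold: float = 0.85) -> list[tuple[str, str]]:
    """Return pairs of normalized names similar enough to merit a manual check."""
    normalized = sorted({" ".join(tokens(name)) for name in names})
    return [
        (a, b)
        for a, b in combinations(normalized, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]
```

For example, `similar_name_pairs(["John Smith", "Jon Smith", "Mary Jones"])` returns only the Smith pair. Because it compares every pair of names, its cost grows quadratically; larger datasets usually call for blocking, such as comparing only names that share a surname.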
In addition to scraping the CSA's disciplined list, we scraped, requested or manually built databases tracking unpaid fines levied against individuals by the Alberta Securities Commission, the British Columbia Securities Commission, the Investment Industry Regulatory Organization of Canada, the Mutual Fund Dealers Association, the Manitoba Securities Commission, the Nova Scotia Securities Commission and the Ontario Securities Commission.
Ultimately, The Globe's cleaned-up version of the CSA's database contained 5,774 case files, 4,441 (77 per cent) of which concerned individuals. The Globe's unpaid-fine database counted 1,009 different people with partially or fully unpaid fines across the seven regulators.
To identify repeat offenders, we worked from one basic assumption: they would have at least two original sanction records in their case files. Sanctions (which regulators call "orders") are issued in one of two ways: as a sentence for a new set of violations, or "reciprocally." Reciprocal orders are sentences (usually bans on trading or selling securities) that mirror those already imposed by another province or self-regulatory organization. They allow a regulator to pre-emptively ban a company or individual found elsewhere to have broken securities law.
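The two-original-orders assumption amounts to counting each person's non-reciprocal orders. A short sketch, assuming every order has already been tagged as reciprocal or not (the `is_reciprocal` field is hypothetical):

```python
from collections import Counter
from typing import NamedTuple


class Order(NamedTuple):
    person: str          # hypothetical: the individual named in the order
    is_reciprocal: bool  # True if the order mirrors another regulator's sanction


def people_with_multiple_original_orders(orders: list[Order], minimum: int = 2) -> set[str]:
    """Return people with at least `minimum` non-reciprocal orders."""
    counts = Counter(order.person for order in orders if not order.is_reciprocal)
    return {person for person, n in counts.items() if n >= minimum}
```

Someone with one original order and any number of reciprocal copies of it drops out, which is the point: the copies describe one offence, not several.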
We detected repeat offenders by combining two approaches. First, we identified the clear repeat offenders: people found to have been in "breach of order." A breach-of-order violation is issued when an individual violates a previously imposed sanction, such as a trading ban or an order to stop selling securities. For each of these "breachers," we filtered out all reciprocal orders and calculated the number of days between the first and last remaining orders in their file. We then took the median of the bottom decile (the shortest 10 per cent) of these date distances and used it as the minimum date distance by which to filter the rest of our data.
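The minimum-date-distance calculation can be sketched in a few lines of Python. This assumes the breachers' reciprocal orders have already been removed, and it reads "bottom decile" as the shortest 10 per cent of date distances (keeping at least one value); the function names are illustrative.

```python
from datetime import date
from statistics import median


def span_in_days(order_dates: list[date]) -> int:
    """Days between a person's first and last order."""
    return (max(order_dates) - min(order_dates)).days


def minimum_date_distance(breacher_dates: dict[str, list[date]]) -> float:
    """Median of the shortest 10 per cent of breachers' first-to-last spans.

    breacher_dates maps each breacher to the dates of their non-reciprocal orders.
    """
    spans = sorted(span_in_days(dates) for dates in breacher_dates.values())
    decile_size = max(1, len(spans) // 10)  # keep at least one value
    return median(spans[:decile_size])
```

Using the median of the lowest decile, rather than the single shortest span, keeps one unusually quick breach from setting the threshold on its own.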
For our final analysis, we used that median of the bottom tenth of breachers' date distances, but it's worth bearing in mind that the calculated recidivism rate depends heavily on the minimum date distance required between each person's first and last sanction. For example, with a minimum date distance of one year, there would be 397 possible repeat offenders (8.9 per cent); with two years, 321 (7.2 per cent); and with three years, 271 (6.1 per cent). In general, we found that the longer the date distance for a particular individual, the clearer the case of recidivism became.
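Because the result is so sensitive to this threshold, it helps to recompute the count at several cutoffs. A hypothetical sketch, assuming `spans` maps each person with at least two original orders to the number of days between their first and last such order, and approximating a year as 365 days:

```python
def percent(count: int, total: int) -> float:
    """Share of total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


def repeat_offender_share(spans: dict[str, int], min_days: int, total_people: int) -> tuple[int, float]:
    """Count people whose first-to-last original-order span is at least min_days."""
    count = sum(1 for span in spans.values() if span >= min_days)
    return count, percent(count, total_people)
```

Looping `for years in (1, 2, 3)` and calling `repeat_offender_share(spans, 365 * years, 4441)` would produce a table like the one above. As a check on the published figures, `percent(397, 4441)` gives 8.9 and `percent(271, 4441)` gives 6.1.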
Once we had our breacher list and a minimum date distance to work with, we took our full database, filtered it to include only individuals, filtered it further for people who had at least two non-reciprocal orders against them, and finally kept only those whose first and last orders were at least the minimum date distance apart. Combining that list with the original list of breachers gave us our list of possible repeat offenders. In all, the code used to perform this analysis is just 33 lines long.
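Put together, the filtering steps might look like the Python sketch below. It is not The Globe's 33-line script: it assumes a hypothetical record layout with the person, reciprocal and breach-of-order flags already set, and it reads "combining" as a set union, so breachers are included even if they fail the date-distance filter.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Order(NamedTuple):
    # Hypothetical layout for one row of the cleaned database.
    name: str
    is_person: bool
    is_reciprocal: bool
    is_breach: bool
    order_date: date


def possible_repeat_offenders(orders: list[Order], min_days: float) -> set[str]:
    # 1. Keep only individuals.
    people = [order for order in orders if order.is_person]

    # 2. Collect each person's non-reciprocal order dates.
    dates = defaultdict(list)
    for order in people:
        if not order.is_reciprocal:
            dates[order.name].append(order.order_date)

    # 3. At least two original orders, with first and last at least min_days apart.
    by_profile = {
        name
        for name, order_dates in dates.items()
        if len(order_dates) >= 2
        and (max(order_dates) - min(order_dates)).days >= min_days
    }

    # 4. Add everyone with a breach-of-order sanction.
    breachers = {order.name for order in people if order.is_breach}
    return by_profile | breachers
```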
Because our analysis relied on generating a list based on the statistical profile detailed above, our list of possible repeat offenders almost certainly contains false positives – that is, people who were matched by our filters but aren't actually repeat offenders. To confirm that the algorithm was detecting repeat offenders, The Globe spent weeks reading through complete case files on more than 80 individuals. Ultimately, we determined that at least 20 of those were clear cases of recidivism.
For our forthcoming analysis of unpaid fines, we again focused on individuals, using the same classification algorithm to determine whether an unpaid fine had been imposed on a person or a company. We excluded all "joint and several" fines: fines or orders to pay restitution to victims that are shared among several people or companies, each of whom can be held liable for the full amount. We counted a fine as unpaid if it was either fully or partially unpaid.
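These rules can be expressed as a single filter. In this sketch the field names are hypothetical, amounts are plain floats for simplicity, and counting people by name assumes aliases and spelling variants have already been reconciled.

```python
from typing import NamedTuple


class Fine(NamedTuple):
    # Hypothetical fields for one row of an unpaid-fine database.
    debtor: str
    regulator: str
    is_person: bool
    joint_and_several: bool
    amount_owed: float
    amount_paid: float


def unpaid_individual_fines(fines: list[Fine]) -> list[Fine]:
    """Keep fines against individuals, not joint and several, with any balance left."""
    return [
        fine for fine in fines
        if fine.is_person
        and not fine.joint_and_several
        and fine.amount_paid < fine.amount_owed
    ]


def count_people(fines: list[Fine]) -> int:
    """Number of distinct debtors, counting a person once across regulators."""
    return len({fine.debtor for fine in fines})
```

A single `amount_paid < amount_owed` test covers both fully and partially unpaid fines, so no separate category is needed.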
The Globe conducted this analysis using a statistical programming language that allowed analyses to be run, tweaked and rerun repeatedly. The full code for the investigation was run, fact-checked and verified by a data journalist with statistical programming experience who was not involved with the project.