
Another Deceptive NN for Tabular Data — The Wild, Unsubstantiated Claims about Constrained…

These days I get pinged almost every week, and often more frequently than that, about another paper that just came out claiming a new approach to neural networks for tabular data that, among other things, handily beats XGBoost. I’ve now come to consider this entire genre of ML “research” as tantamount to quackery, due to the really poor quality of the considerations that go into most of the work in this category. So I usually don’t even bother to check the claims. Nonetheless, there is this little voice in my head that goes something like “well Bojan, maybe you are overreacting, and maybe you are letting your own biases get the best of you. Why don’t you take a look and see what’s going on.”

So this week I did exactly that. I came across a paper, titled Constrained Monotonic Neural Networks, that, you guessed it, claims not only to be better than XGBoost, but also better than all the other recent attempts at NNs for tabular data. So I did what I always do: skipped all the highfalutin theorizing filler and went straight to the table that shows the main results. Here it is:

Setting aside the common problem of self-serving dataset selection that goes into all such papers, I decided to take a look at the three above-mentioned ones (COMPAS, Blog Feedback, and Loan Default). They all seem like very nice datasets, with seemingly no missing values and only numerical features. In other words, they are easy datasets to work with and a great addition to my growing Tabular Benchmarks repo.

I take a look at Loan Default first. As I usually do, I start with scikit-learn’s HistGradientBoostingClassifier, as it is easy to use and just powerful enough for my needs. And lo and behold, I am already getting an average CV accuracy on the train set of about 65.3%, on par with the best result in the table above, and definitely better than the purported accuracy of XGBoost.
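
For the curious, here is a minimal sketch of what that first check looks like. The file name and the target column below are placeholders I’m using for illustration, not the actual layout of the Loan Default data:

```python
# Minimal sketch of a quick HistGradientBoostingClassifier baseline.
# "loan_default_train.csv" and the "default" column are placeholders.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("loan_default_train.csv")
X = train.drop(columns=["default"])   # all-numerical features, no missing values
y = train["default"]

clf = HistGradientBoostingClassifier(random_state=0)  # out-of-the-box settings
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.4f}")
```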

Next, as a sanity check, I try Logistic Regression. Even a simple LR gets accuracy above the purported XGBoost number, well into the 64% range.
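
A quick sketch of that sanity check, reusing X and y from the snippet above (the scaling step is my own addition, just standard practice for LR, not something the numbers depend on):

```python
# Logistic Regression sanity check on the same features.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
lr_scores = cross_val_score(lr, X, y, cv=5, scoring="accuracy")  # X, y as above
print(f"Mean CV accuracy: {lr_scores.mean():.4f}")
```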

So finally I try XGBoost. And unsurprisingly, I am getting really good CV results, in the upper 65% range, well above both the claimed best XGBoost performance and their claimed best results overall. Finally, I try just a single XGBoost model trained on the entire train set and evaluated on the test set, with no fiddling and no hyperparameter tuning!!! And I get 65.8% accuracy!

You can find the notebook here: https://github.com/tunguz/TabularBenchmarks/blob/main/datasets/loan/scripts/XGB_0.ipynb
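
The linked notebook has the actual run; below is just a rough, self-contained sketch of that final check, again with placeholder file and column names:

```python
# Single untuned XGBoost model: fit on the full train set, score on the test set.
# File names and the "default" column are placeholders for illustration.
import pandas as pd
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

train = pd.read_csv("loan_default_train.csv")
test = pd.read_csv("loan_default_test.csv")
X_train, y_train = train.drop(columns=["default"]), train["default"]
X_test, y_test = test.drop(columns=["default"]), test["default"]

model = XGBClassifier(tree_method="hist", random_state=0)  # otherwise default hyperparameters
model.fit(X_train, y_train)
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.4f}")
```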

So this is when I know for sure that the paper is, for lack of a better word, junk. But just to make sure, I go and check the other datasets. First, Blog Feedback. I go straight to XGBoost, out of the box, no hyperparameter tuning, no nonsense. And I get 0.1505 RMSE, which is better than their reported best score of 0.156, and definitely better than the supposed 0.176 that the best XGBoost gets.

You can find the notebook here: https://github.com/tunguz/TabularBenchmarks/blob/main/datasets/blog/scripts/XGB_0.ipynb
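
Again, the notebook is the real thing; here is a sketch of what an out-of-the-box XGBoost regression check looks like, with placeholder paths and target column:

```python
# Out-of-the-box XGBoost regression scored with RMSE on the test split.
# "blogfeedback_*.csv" and the "target" column are placeholders.
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

train = pd.read_csv("blogfeedback_train.csv")
test = pd.read_csv("blogfeedback_test.csv")
X_train, y_train = train.drop(columns=["target"]), train["target"]
X_test, y_test = test.drop(columns=["target"]), test["target"]

model = XGBRegressor(random_state=0)
model.fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"Test RMSE: {rmse:.4f}")
```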

Finally I take a look at the COMPAS dataset. This is a fairly small dataset, and those are notoriously fickle and hard to draw conclusive takeaways from. Here, simple out-of-the-box XGBoost doesn’t perform that great: I get just 67.4% accuracy, compared to their best of 68.9% and the supposed best of 68.5% for XGBoost. But with just a little simple playing with hyperparameters I can easily get to 69.2% accuracy:

You can find the notebook here: https://github.com/tunguz/TabularBenchmarks/blob/main/datasets/compas/scripts/XGB_0.ipynb
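
The notebook above has the actual numbers; a minimal sketch of that “little playing with hyperparameters” step might look like this, where the file name, the target column, and the grid values are all illustrative placeholders:

```python
# Out-of-the-box XGBoost on COMPAS, then a small grid over a few hyperparameters.
# The file name, the "two_year_recid" target, and the grid values are placeholders.
import pandas as pd
from sklearn.model_selection import GridSearchCV, cross_val_score
from xgboost import XGBClassifier

train = pd.read_csv("compas_train.csv")
X, y = train.drop(columns=["two_year_recid"]), train["two_year_recid"]

base = XGBClassifier(random_state=0)
print(f"Out-of-the-box CV accuracy: {cross_val_score(base, X, y, cv=5).mean():.4f}")

param_grid = {
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1, 0.3],
    "n_estimators": [100, 200],
}
search = GridSearchCV(XGBClassifier(random_state=0), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(f"CV accuracy after light tuning: {search.best_score_:.4f}")
```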

The big takeaway: for all three datasets used in the paper, the reported performance of XGBoost was wildly inaccurate, and the real performance was much better than their best results.

This is really disheartening. It took me less than half an hour to get all these results. All the XGBoost models ran in less than 10 seconds on my Mac Studio CPU. The authors of the paper obviously did not bother checking any of these results.

This lack of careful consideration is unfortunately extremely prevalent in the “NNs for tabular data” research community. One gets the feeling that the whole community has an actual disdain for tabular data, and contempt for doing the important and necessary background work needed to better understand their subject matter.
