Feature engineering is just easier

You can’t deep-learn your way out of everything — or you wouldn’t necessarily want to, even if you could.

Nov 21, 2016


The staggering proliferation of deep learning architectures in the past few years is evidence for the maxim that architecture engineering is the new feature engineering. It used to be that every new advance in machine learning relied on some clever feat of feature engineering — a tweak taking the raw data and exposing its characteristics in some way that a learning algorithm could exploit. Everyone had access to the same few learning algorithms, so feature engineering was the easiest — and often the only — way to differentiate yourself from the pack.

The pendulum has, however, swung sharply away from hand-guided feature engineering. Convolutional neural networks, to take the best-known example, can learn feature transformations for all sorts of computer vision tasks (among others), given only a dataset and an objective function. It turns out that convolutional neural networks are a good fit for images in part because their performance is "translation-invariant": they do the same thing no matter where in the input they are looking. This property has stood them in good stead on problem domains outside of vision as well, leading to some rather bullish declamations about their generality.
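Translation invariance is easy to see in one dimension: slide a feature past a convolutional filter and the filter's response slides with it. A toy sketch in NumPy (the step-edge signal and difference filter are invented for illustration, standing in for a learned convolutional feature detector):

```python
import numpy as np

def edge_response(x):
    # Cross-correlation with a [1, -1] difference filter --
    # a stand-in for a single learned convolutional feature detector.
    return np.correlate(x, np.array([1.0, -1.0]), mode="valid")

# A toy 1-D signal with a step edge at position 10.
x = np.zeros(30)
x[10:] = 1.0

# The same signal with the edge moved to position 15.
x_shifted = np.zeros(30)
x_shifted[15:] = 1.0

# The filter fires at the edge, wherever the edge happens to be:
print(int(np.argmax(np.abs(edge_response(x)))))          # 9
print(int(np.argmax(np.abs(edge_response(x_shifted)))))  # 14
```

Shifting the input by five positions shifts the response by five positions; the detector itself never had to know where the edge would appear.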

Similarly, recurrent neural network architectures have been used to do feature learning for NLP and tackle hard problems like machine translation. These architectures reflect the nature of the problem and elegantly exploit the structure of the data, and this is one reason they work so well. More and more, the “right” way to tackle many machine learning problems is to define a network architecture and objective function that conforms with the problem and the shape of the data.

Building Solutions, With or Without Architectures

Here’s the (not so) dirty secret: this kind of “architecture engineering” is hard — and by hard I mean expensive. Finding the “right” network for the job, especially if one doesn’t exist yet (but make sure to try ResNets first!) is not, strictly speaking, an engineering effort. Fiddling with deep learning architectures is still definitely in the realm of research — there are few established best practices, the risk of failure is high, and personnel with the necessary expertise are rare and costly. Deep learning slays the competition in object detection, image captioning, and machine translation, but small deformations of these more commonly researched problems can make duds of even the dearest of deep learning darlings. And doubly worth noting given all the deep learning hype is that manual feature engineering still provides an edge in image retrieval and tagging, among a plethora of other tasks.

In two of Lab41’s recent endeavors, D*Script and Pythia, we pursued machine learning solutions to real-world problems — identifying writers of handwritten documents and detecting novel documents in large corpora, respectively. In both projects, some combination of “hand-engineered” features rose to the top of the heap, despite a lot of experimentation with deep learning-based approaches. In D*Script, I bet we could have found more competitive solutions using deep learning if we had spent more time looking, or if we had hired a whole team of recent NYU PhDs. But that would have been prohibitively expensive. And when you add in constraints like “not much data available,” it becomes harder to say whether deep learning will ever be able to do the job for you.

Feature engineering, on the other hand, is cheaper than ever. As Anna pointed out in her post on experiment logging with Sacred, firing off thousands of experiments testing different feature configurations is nearly trivial. And though we aren’t half as clever as the many Kaggle winners whose hair-raising feature engineering exploits would make any machine learning enthusiast’s heart skip a beat, calculating a time lag feature or rolling mean takes on the order of minutes, and assessing its usefulness scarcely longer. Devising, training, and testing an end-to-end deep learning framework takes a bit more time than that.
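To make "on the order of minutes" concrete, a lag feature and a rolling mean are each a single line in pandas. The daily `sales` series and column names below are invented for illustration:

```python
import pandas as pd

# Hypothetical daily series; the values and names are illustrative.
df = pd.DataFrame(
    {"sales": [10, 12, 13, 15, 14, 18, 20, 19]},
    index=pd.date_range("2016-11-01", periods=8, freq="D"),
)

# A one-day lag feature and a three-day rolling mean,
# each a one-liner.
df["sales_lag_1"] = df["sales"].shift(1)
df["sales_roll_3"] = df["sales"].rolling(window=3).mean()

print(df.head())
```

The first row of the lag column and the first two rows of the rolling mean are NaN by construction, which most tree-based learners tolerate or which can be dropped before training.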

Coming in Under Budget

“Hello! Are you dead yet? Cause… I’m sure not!”

So what deep learning has introduced isn’t so much the death of hand-tuned features, but instead a richer continuum along the risk-reward axis between feature engineering and feature learning. Hand-tuned features combined with versatile, robust learners like XGBoost are a reasonably low-cost effort that can often yield satisfactory results — and if they don’t, who cares? In the upper stratosphere of academic and industrial machine learning, deep learning has almost entirely taken over, but it’s no accident that the field is dominated by a few large companies, and almost everyone involved has a PhD from one of a handful of programs. It’s still an expert’s game — and these days it does make more sense to have the experts spend their time designing sensible network architectures instead of chasing down the One True Feature.

Architecture engineering is getting cheaper, too. Though there's still a long way to go, efforts such as Keras have done a lot to make deep learning more accessible and tinkerable. And it is only going to get easier. In the meantime, hand-engineering features for your problem isn't necessarily some rearguard action, undertaken on behalf of a desperate ancien régime that doesn't know anything else. Sometimes, feature engineering is just the correct, economical choice.

Lab41 is a Silicon Valley challenge lab where experts from the U.S. Intelligence Community (IC), academia, industry, and In-Q-Tel come together to gain a better understanding of how to work with — and ultimately use — big data.

Learn more at lab41.org and follow us on Twitter: @_lab41
