Machine Learning Times
Fashion Repeats Itself: Generating Tabular Data Via Diffusion and XGBoost

Originally published by Alexia Jolicoeur-Martineau, Sept 19, 2023.

Since AlexNet showed the world the power of deep learning, the field of AI has shifted to focus almost exclusively on it. The main justifications are that 1) neural networks are universal function approximators (UFAs, not UFOs 🛸), 2) deep learning generally works best, and 3) it is highly scalable through SGD and GPUs. However, when you look beneath the surface, you see that 1) simple methods such as decision trees are also UFAs, 2) tree-based methods such as gradient-boosted trees (GBTs) actually work better than deep learning on tabular data, and 3) tabular datasets tend to be small, but GBTs can optionally be trained on GPUs and iterated over small data chunks to scale to large datasets. At least for tabular data, deep learning is not all you need.
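To see the sense in which trees are universal approximators, here is a toy numpy sketch (not from the article): a regression tree over one feature is a piecewise-constant function, so as the number of leaves grows, it can approximate any continuous function on a bounded interval arbitrarily well. The equal-width splits below are a stand-in for what a learned tree of matching depth would do.

```python
import numpy as np

def tree_like_fit(x_train, y_train, n_leaves):
    """Piecewise-constant fit with equal-width bins: a stand-in for a
    depth-log2(n_leaves) regression tree (which would learn its splits)."""
    edges = np.linspace(x_train.min(), x_train.max(), n_leaves + 1)
    leaf_of = np.clip(np.searchsorted(edges, x_train, side="right") - 1,
                      0, n_leaves - 1)
    # Each leaf predicts the mean of the training targets that fall into it.
    leaf_values = np.array([y_train[leaf_of == k].mean()
                            for k in range(n_leaves)])
    return edges, leaf_values

def tree_like_predict(edges, leaf_values, x):
    n_leaves = len(leaf_values)
    leaf_of = np.clip(np.searchsorted(edges, x, side="right") - 1,
                      0, n_leaves - 1)
    return leaf_values[leaf_of]

# Approximate sin(x) on [0, 2*pi]; the worst-case error shrinks as the
# "tree" gets more leaves, illustrating the universal-approximation claim.
x = np.linspace(0.0, 2 * np.pi, 5000)
y = np.sin(x)

err = {}
for n_leaves in (8, 64, 512):
    edges, vals = tree_like_fit(x, y, n_leaves)
    err[n_leaves] = np.max(np.abs(tree_like_predict(edges, vals, x) - y))

assert err[512] < err[64] < err[8]
```

The same argument extends to many features: each leaf of a deep enough tree carves out a small axis-aligned box, and a constant per box can approximate any continuous function.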

In this joint collaboration with Kilian Fatras and Tal Kachman at the Samsung SAIT AI Lab, we show that you can combine the magic of diffusion models (and their deterministic sibling, conditional flow matching (CFM)) with XGBoost, a popular GBT method, to get state-of-the-art tabular data generation and diverse data imputations. To make it accessible to everyone (not just AI researchers but also statisticians, econometricians, physicists, data scientists, etc.), we made the code available as a Python library (on PyPI) and an R package (on CRAN). See our GitHub for more information. [Note: The R code will be released soon.]
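The key observation that makes this combination possible can be sketched in a few lines of numpy (all names here are illustrative, not the library's actual API; see the GitHub repo for the real implementation). Conditional flow matching trains a model to predict the straight-line velocity from a noise sample to a data sample, given a point interpolated between them, and at each fixed timestep that prediction problem is just an ordinary tabular regression, exactly what a GBT like XGBoost is good at:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real tabular dataset (rows x features).
X1 = rng.normal(loc=5.0, scale=2.0, size=(256, 3))
n_t = 10  # number of timesteps / noise levels

training_sets = []
for i in range(1, n_t + 1):
    t = i / n_t
    X0 = rng.standard_normal(X1.shape)   # one noise sample per data row
    Xt = (1 - t) * X0 + t * X1           # straight-line interpolation
    target = X1 - X0                     # CFM velocity target: u_t = x1 - x0
    # (Xt, target) is a plain regression dataset for timestep t.
    training_sets.append((t, Xt, target))
```

Fitting one GBT regressor per timestep on these `(Xt, target)` pairs, then integrating the learned velocity field from fresh noise, yields new synthetic rows; no neural network required.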

To continue reading this article, click here.

15 thoughts on “Fashion Repeats Itself: Generating Tabular Data Via Diffusion and XGBoost”

  1. Great article, Alexia! Your innovative combination of diffusion models with XGBoost for tabular data generation is impressive. The clear explanations and practical accessibility make this a valuable resource for many professionals.

Leave a Reply