Machine Learning Times
Fashion Repeats Itself: Generating Tabular Data Via Diffusion and XGBoost

 
Originally published by Alexia Jolicoeur-Martineau, Sept 19, 2023.

Since AlexNet showed the world the power of deep learning, the field of AI has rapidly shifted to focus almost exclusively on deep learning. Some of the main justifications are that 1) neural networks are Universal Function Approximators (UFAs, not UFOs 🛸), 2) deep learning generally works the best, and 3) it is highly scalable through SGD and GPUs. However, when you look a bit below the surface, you see that 1) simple methods such as Decision Trees are also UFAs, 2) fancy tree-based methods such as Gradient-Boosted Trees (GBTs) actually work better than deep learning on tabular data, and 3) tabular data tend to be small, but GBTs can optionally be trained with GPUs and iterated over small data chunks for scalability to large datasets. At least for the tabular data case, deep learning is not all you need.

In this collaboration with Kilian Fatras and Tal Kachman at the Samsung SAIT AI Lab, we show that you can combine the magic of diffusion models (and their deterministic sibling, conditional flow matching (CFM)) with XGBoost, a popular GBT method, to get state-of-the-art tabular data generation and diverse data imputations. To make it accessible to everyone (not just AI researchers but also statisticians, econometricians, physicists, data scientists, etc.), we made the code available through a Python library (on PyPI) and an R package (on CRAN). See our GitHub for more information. [Note: The R code will be released soon.]
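
To give a feel for how flow matching turns generation into plain regression, here is a minimal sketch in standard-library Python. It is not the library's code. It assumes one common CFM convention (a straight path from Gaussian noise at t = 0 to a data row at t = 1), and the function name `make_cfm_training_set`, the toy data, and the choice of 10 time levels are all hypothetical. The sketch only builds the regression inputs and targets; a GBT regressor such as XGBoost could then be fit on each set.

```python
import random


def make_cfm_training_set(data, t, rng):
    """Build (inputs, targets) for one time level t in [0, 1].

    Assumed convention: x_t = (1 - t) * noise + t * x, and the regression
    target is the constant velocity (x - noise) along that straight path.
    """
    inputs, targets = [], []
    for row in data:
        # Draw one standard Gaussian noise vector per data row.
        noise = [rng.gauss(0.0, 1.0) for _ in row]
        # Point on the straight line between the noise and the data row.
        x_t = [(1.0 - t) * z + t * x for z, x in zip(noise, row)]
        # Velocity that carries the noise sample to the data row.
        velocity = [x - z for z, x in zip(noise, row)]
        inputs.append(x_t)
        targets.append(velocity)
    return inputs, targets


rng = random.Random(0)

# Toy two-column "table" (illustrative values only).
toy_data = [[rng.gauss(2.0, 0.5), rng.gauss(-1.0, 0.3)] for _ in range(200)]

# One regression problem per time level; a tree-based regressor would be
# trained on each (inputs, targets) pair to predict the velocity.
time_levels = [i / 9 for i in range(10)]
training_sets = {t: make_cfm_training_set(toy_data, t, rng) for t in time_levels}
```

Under these assumptions, new rows would be generated by starting from fresh Gaussian noise and stepping through the time levels, moving each sample along the velocity predicted by the regressor for that level.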

