Edition#1: Things I learned in applied ML and other good resources
In this edition: ML, data portraits, and science-inspired art
Welcome to the 1st edition of Cross Sections 🎊 - a newsletter covering personal reflections and curated content on data science, data viz & communication.
🖇 Things I learned in applied machine learning
Applied ML is heavily tied to business metrics and is typically more concerned with goals like profitability or the user journey than with topping state-of-the-art benchmarks. Here are some of the things I learned.
Build MVP and quickly iterate
Creating a minimum viable product with a simple model first has 3 benefits:
Helps determine the feasibility of the project and refine the scope
Gets early feedback from the stakeholders, which can reduce the risk of project failure. There could be many concerns beyond the technicalities, such as complex behaviors from millions of users, and product teams could offer valuable perspectives on them.
Establishes a baseline, which helps gauge the opportunity space to tune up the model.
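To make the baseline idea concrete, here is a minimal sketch of comparing a first simple model against a trivial majority-class baseline. The data, the one-rule model, and all function names are made up for illustration; a real project would plug in its own model and metric.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common_label


def accuracy(predict, examples):
    """Fraction of (features, label) pairs that predict() gets right."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


# Toy, made-up data: (feature, label)
train = [(0.1, 0), (0.4, 0), (0.35, 0), (0.8, 1), (0.9, 1)]
test = [(0.2, 0), (0.3, 0), (0.45, 1), (0.7, 1), (0.95, 1)]

baseline = majority_baseline([label for _, label in train])
simple_model = lambda x: int(x > 0.5)  # hypothetical one-rule MVP model

baseline_acc = accuracy(baseline, test)
model_acc = accuracy(simple_model, test)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"simple model accuracy: {model_acc:.2f}")
print(f"remaining headroom: {1.0 - model_acc:.2f}")
```

The gap between the two numbers shows what the simple model already buys, and the headroom gives a rough sense of how much tuning could still add.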
Turn one knob at a time to perform model diagnosis
During model tuning, one may need to try different components and parameters/hyperparameters to find the optimal combination. Turning one knob at a time helps pinpoint performance bottlenecks, because any change in the result can be attributed to that single change.
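One possible way to organize this is sketched below: keep a baseline configuration, change a single knob per run, and record how the score moves. The knob names, candidate values, and the `toy_evaluate` function are all hypothetical stand-ins for a real train-and-validate run.

```python
def one_at_a_time(baseline_config, candidates, evaluate):
    """Vary a single knob per run, keeping all other knobs at baseline values."""
    baseline_score = evaluate(baseline_config)
    results = []
    for knob, values in candidates.items():
        for value in values:
            if value == baseline_config[knob]:
                continue  # this setting is the baseline itself
            config = {**baseline_config, knob: value}
            delta = evaluate(config) - baseline_score
            results.append((knob, value, delta))
    return baseline_score, results


def toy_evaluate(config):
    """Made-up scoring function standing in for real training + validation."""
    score = 0.70
    if config["learning_rate"] == 0.01:
        score += 0.05
    if config["num_layers"] == 3:
        score += 0.02
    return score


baseline_config = {"learning_rate": 0.1, "num_layers": 2}
candidates = {"learning_rate": [0.1, 0.01], "num_layers": [2, 3]}

baseline_score, results = one_at_a_time(baseline_config, candidates, toy_evaluate)
for knob, value, delta in results:
    print(f"{knob}={value}: {delta:+.3f} vs baseline")
```

This does not explore interactions between knobs, so it is a diagnostic tool rather than a full search; it mainly tells you which single change matters most.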
Choose a sensible metric to start with
Nailing down a primary metric makes the objective clearer. This metric needs to be sensibly chosen so that it reflects the objective specific to the business context. Counter-intuitively, the model with the best specs and offline performance might not provide the best online experience, so it helps if the metric aligns with human assessment too.
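One rough way to check whether a metric aligns with human assessment is to compare how the metric and human raters rank the same set of model variants. The sketch below computes a Spearman rank correlation by hand; the scores are invented, and the simple formula used here assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    d_squared = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up numbers: one entry per model variant
offline_metric = [0.71, 0.75, 0.80, 0.83]
human_rating = [3.1, 3.6, 3.4, 3.9]

rho = spearman(offline_metric, human_rating)
print(f"rank agreement between metric and human raters: {rho:.2f}")
```

A value near 1 suggests the metric orders models the way people do; a low or negative value is a hint that the metric may be optimizing for something users do not feel.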
Plan out code structure
While it’s intuitive to plan out the project structure before coding, I soon realized it’s equally important to plan out the code architecture, since it can save a lot of time on refactoring. A project may have both .py files, which carry the logic and shared components, and .ipynb files, which are for exploration, visualization, interpretation, etc., and which call the .py files. In such a setting, it’s necessary to think in advance about the different purposes of the .py scripts. For example, one thing I learned is that it can add clarity to separate the evaluation code, which tends to be more objective and fixed, from the implementation code, which may change more frequently. This also helps collaboration.
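As a small sketch of that separation, the block below keeps a fixed evaluation function apart from model code that changes between iterations. The file names in the comments, the function names, and the data are all hypothetical; everything is shown in one block only so it can run as-is.

```python
# --- evaluation.py (hypothetical file): stable, agreed on once, rarely edited ---
def evaluate(predict, examples):
    """Compute precision and recall for a binary predictor on (x, y) pairs."""
    tp = fp = fn = 0
    for x, y in examples:
        prediction = predict(x)
        if prediction == 1 and y == 1:
            tp += 1
        elif prediction == 1 and y == 0:
            fp += 1
        elif prediction == 0 and y == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}


# --- model.py (hypothetical file): implementation, changes frequently ---
def predict_v1(x):
    return int(x > 0.5)


def predict_v2(x):
    return int(x > 0.3)


# --- notebook cell: imports both and compares versions on the same holdout ---
holdout = [(0.2, 0), (0.4, 1), (0.6, 1), (0.7, 0), (0.9, 1)]  # made-up data
report_v1 = evaluate(predict_v1, holdout)
report_v2 = evaluate(predict_v2, holdout)
print("v1:", report_v1)
print("v2:", report_v2)
```

Because `evaluate` only needs a callable, every model version is scored by exactly the same code, which makes results comparable across iterations and across collaborators.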
While all of these above are applicable to my context, different contexts might require different practices 😇
📚 Recent Reads
This is a section where I share books, papers, and blogs on data visualization, data science, and communication.
Data Portraits: Visualizing Black America by W. E. B. Du Bois
This book is a data portrait of African-American life from a sociologist and historian’s point of view. It collects the hand-drawn graphics, in bold compositions, that W. E. B. Du Bois designed for the 1900 Paris Exposition. The design choices for displaying large numbers are rather unique, such as the use of a spiral shape.
Approaching (Almost) Any Machine Learning Problem by Abhishek Thakur
Written by a 4x Kaggle Grandmaster, this book is accessible and replete with code. Just don’t expect mathematical derivations or business applications.
Originals: How Non-Conformists Move the World by Adam Grant
This is a book by a Wharton professor specializing in organizational psychology. It talks about the inception of ideas, the benefits of artistic inclinations, and how to campaign for creative new ideas. Many of the cases dissect the psychological reasons behind the actions and the responses.
🧉 Worth viewing
This is a section on free and open resources for endless learning.
Distinct from other courses such as fast.ai or Coursera’s deep learning specialization (which I just finished), this course focuses on the applications in the industry. It covers topics such as tooling, debugging, testing, and deployment.
🔦 Kaleidoscope
This is an ad-hoc section on curious finds.
This week we’ll take a look at science-inspired computational art - art generated with the structure of algorithms, but often with a touch of serendipity and randomness. While there are so many awesome generative artists to list, here I’ll only show a few as examples.
Math-inspired
Geometries, fractals, and parametric equations are just so suited to art-making. The piece below is a math model turned visual by Melbourne-based visualizer Marcus Volz, who has also authored an R package mathArt.
Image source: Marcus Volz Shell #2-1496
Physics-inspired
Physics tends to be closely related to interactive art through its simulation of vectors, particles, forces, and movements. In the installation below, when viewers breathe near a sensor, their breath blows the on-screen dandelions.
Image source: Les Pissenlits by Michel Bret & Edmond Couchot (video)
Biology-inspired
Biology-inspired art tends to have an organic vibe. Floraform is a series of 3D-printed sculptures inspired by growing leaves and blooming flowers, created by simulating a differentially growing surface and made at Nervous System. Mind-blowing.
Image credit: Floraform by Nervous System
🔖 Re-cap
How Cross Sections was started
If you have any suggestions or feedback, please don’t hesitate to send them and I’ll be keen to read them!