Itai Yanai: Here are my 12 guidelines for data exploration and analysis with the right attitude for discovery
Itai Yanai, Professor at the Department of Biochemistry and Molecular Pharmacology at NYU, shared on X/Twitter:
“Here are my 12 guidelines for data exploration and analysis with the right attitude for discovery:
1. You never really finish analyzing a dataset. You just decide to stop and move on at some point, leaving some things undiscovered.
2. Analyzing the data is too important to be left to the standard pipelines. Explore it! Instead of going straight to high-level summaries (averages of averages), plot and visualize each intermediate step and meditate on how things look.
3. Don’t wait for the perfect dataset; you’ll wait forever. Explore right away the good-enough dataset.
4. One of the first things you must do with a new dataset is come to terms with its inherent limitations. The quality is never ideal, and it’s just as important to know what the dataset won’t be able to do as what it might still be able to reveal.
5. If you’re lost in the data – make a map. Have the mindset of an explorer, where at every turn something unexpected might be glimpsed.
6. Look for a new pattern in the data and imagine what may explain it. This is how a new hypothesis has a chance at being born.
7. Datasets don’t come with labels marking what is new and exciting about them. Figuring that out is not simple and cannot be automated. Rather, discovery is an act of self-expression and creativity. Different people will make different discoveries with the same dataset.
8. When analyzing what is present in a dataset, it’s just as important to consider what is absent. What are your expectations, and isn’t it interesting if one of them was violated?
9. There is no straightforward protocol for analyzing data for discovery. In exploratory data analysis, there are only operations and tools, and applying each one can lead to a lot of thought on what the result means and what is the next step.
10. Taking control of the creative process actually means losing control of the initial direction and following the data wherever it leads.
11. Big data is nothing without big thinking.
12. And remember, always plot your data! There could be gorillas hiding in there!”
Source: Itai Yanai/X
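As an illustration of guideline 2, here is a minimal Python sketch of why "averages of averages" can mislead and why looking at each intermediate step helps. The batch names and measurements are made up for this example and do not come from the original thread.

```python
from statistics import mean

# Hypothetical measurements from two batches of very different sizes.
groups = {
    "batch_a": [10.0, 12.0, 11.0, 13.0, 14.0, 12.0],
    "batch_b": [40.0, 44.0],
}

# Intermediate step: inspect each group's size and mean before summarizing further.
group_means = {name: mean(values) for name, values in groups.items()}
for name, values in groups.items():
    print(f"{name}: n={len(values)}, mean={group_means[name]}")

# High-level summary 1: the average of the group averages (ignores group size).
mean_of_means = mean(group_means.values())

# High-level summary 2: the average over all pooled measurements.
pooled_mean = mean(v for values in groups.values() for v in values)

print("average of averages:", mean_of_means)
print("pooled average:", pooled_mean)
```

The two summaries disagree because the smaller batch carries as much weight as the larger one in the average of averages. Printing (or plotting) the per-batch values first makes that imbalance visible before any single number is reported.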