Data Science
Since my first QTL identification ten years ago during my PhD training, done with specialized software, I have been facing more and more data analysis every day. During my first postdoc at USDA-ARS in Columbia, MO, I learned to identify QTL by writing my own code in R and SAS. Thank you, Dr. Mike McMullen and Dr. Sherry Flint-Garcia! In my current lab, I am challenged with a lot of phenotype data from both field and controlled environments, not just genomics data and QTL/GWAS analysis anymore.
As we all know, data science is the combination of statistics, mathematics, programming, problem-solving, and capturing data in smart ways. But a lot of people don't know that agricultural science is a great field for any data scientist to grow in. Because I deal with structured and unstructured data sets of different dimensions every week, I get to practice everything data science comprises, from data cleaning and wrangling to preparation and analysis, again and again. Meanwhile, my ability to extract insights and information from data, and to look at things differently, is reshaped every year.
Machine Learning Certificate from Coursera
- You can also verify my credential at the link here:
- https://www.coursera.org/account/accomplishments/verify/WQ29ND594D8G
Examples of figures I made during my recent research:
1. Genomic selection (ridge regression was run 1,000 times, each time randomly picking 70% of the data points as the training panel and holding out the other 30% for cross-validation testing, to predict some root traits from 3D models of gel-grown corn plants. Left: boxplot)
2. G x E analysis (the data set was collected from a three-year trial at a farm at UIUC. To figure out how stable or flexible the corn plants' performance was across years, a Finlay-Wilkinson model was fitted. The heat map then indicates the stability/adaptability/plasticity of a specific genotype across environments. Middle: heat map)
3. GWAS Manhattan plot (same population, but phenotyped on different platforms, gel vs. field. Right panel. The allele effect size plot is shown below, left panel)
4. Time series analysis in tomato (three different non-linear models were fitted to show how the absolute growth rate differs between two genotypes over time. Right: line chart)
5. OPLS-DA was used to diagnose differences between two groups (blue vs. red) in our tomato data (below, left panel).
6. Linear regressions in our UIUC corn data (above: A, B, and C).
7. Randomly picked cool figures I made; see below.
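
For readers curious about the Finlay-Wilkinson model mentioned in item 2, here is a minimal sketch in Python of the basic idea: each genotype's trait values are regressed on an environmental index (the mean of all genotypes in each environment, centered on the grand mean). It is not the code behind my figures. The trait values, the genotype names, and the function name `finlay_wilkinson_slopes` are all made up for illustration, and it assumes balanced data with no missing values. A slope near 1 suggests average responsiveness, a slope below 1 a more stable genotype, and a slope above 1 a more plastic one.

```python
from statistics import mean

# Hypothetical trait values (NOT real data): genotype -> one value per environment (year).
trait = {
    "G1": [4.0, 6.0, 8.0],
    "G2": [5.0, 6.0, 7.0],
    "G3": [3.0, 6.0, 9.0],
}


def finlay_wilkinson_slopes(trait):
    """Return the regression slope of each genotype on the environmental index.

    Assumes every genotype was measured in every environment (balanced data).
    """
    genotypes = list(trait)
    n_env = len(trait[genotypes[0]])

    # Environmental index: mean over genotypes in each environment,
    # centered on the grand mean.
    env_means = [mean(trait[g][j] for g in genotypes) for j in range(n_env)]
    grand_mean = mean(env_means)
    index = [m - grand_mean for m in env_means]

    # Sum of squares of the index (denominator of the least-squares slope).
    sxx = sum(e * e for e in index)

    slopes = {}
    for g in genotypes:
        g_mean = mean(trait[g])
        # Cross-product between the index and the genotype's centered values.
        sxy = sum(e * (y - g_mean) for e, y in zip(index, trait[g]))
        slopes[g] = sxy / sxx
    return slopes


slopes = finlay_wilkinson_slopes(trait)
for genotype, slope in slopes.items():
    print(genotype, round(slope, 2))
```

With real trial data, the slopes could then be arranged as a genotype-by-environment table and drawn as a heat map; a mixed-model fit in R would be a common alternative to this simple two-step regression.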
