Haiyuan Cao

Redmond, Washington, United States
9K followers 500+ connections


Microsoft

Columbia University in the City of New York

About

Principal SDE in Azure AI. Working on infusing AI into product, including big LLM models,…

Activity

GPT5 in Azure AI Foundry! So excited to see what people can build with the next generation of model. https://xmrrwallet.com/cmx.paka.ms/GPT-5-blog

Liked by Haiyuan Cao

With custom instructions tailored to your repo, agents like GitHub Copilot coding agent can work faster and write higher quality code. But writing…

Liked by Haiyuan Cao

I'm more optimistic than ever that we at OpenAI can eliminate hallucinations. There's still more research to be done, but GPT-5 is solid progress. 🚀

Liked by Haiyuan Cao

Experience

Microsoft

Bellevue, WA

Greater New York City Area

Greater New York City Area

Shanghai

United States

Education

Columbia University in the City of New York

2015 - 2016

Activities and Societies: Columbia Data Science Society

Focus on machine learning, platforms for big data analysis, and developing data-driven products.
2010 - 2015

Activities and Societies: Member of the American Physical Society; referee for the journal 'Nanotechnology' (impact factor 3.821)

Majored in computational simulation, mathematical modelling and data analysis of energy transport in nanostructures and magnetic properties of iron-based superconductors. Proposed an algorithm for calculating the magnetic interaction and co-proposed a global optimization method for searching for the structures of complex grain boundaries.
2006 - 2010

Licenses & Certifications

Model Thinking (Chinese version)

Coursera Verified Certificates

Issued Mar 2015

Credential ID 7BXSL9JT23

An Introduction to Interactive Programming in Python

Coursera Verified Certificates

Issued Nov 2014

Credential ID 6DHAMGBB4Y

Data Analysis and Statistical Inference

Coursera Verified Certificates

Issued Nov 2014

Credential ID HQ7V8CPG7B

Introduction to Data Science

Coursera

Issued Sep 2014

Machine Learning

Coursera

Issued Sep 2014

Regression Models

Coursera Verified Certificates

Issued Sep 2014

Credential ID DUL5EF2LZU

Getting and Cleaning Data

Coursera Verified Certificates

Issued Aug 2014

Credential ID SR756QGREE

R Programming

Coursera Verified Certificates

Issued Aug 2014

Credential ID BPMZL3L3Y3

The Data Scientist’s Toolbox

Coursera Verified Certificates

Issued Jul 2014

Credential ID GW9DZD3T7D

edX Honor Code Certificate for Introduction to Computer Science and Programming Using Python

edX

Volunteer Experience

Team Member of Youth Ambassador Program for Minorities (YAPM)

Technology and Education: Connecting Cultures, Inc. (TECC) - 501c3

Jul 2010 - Present 15 years 2 months

Education

China, home to 55 minority groups, enjoys rich ethnic and cultural diversity. However, minority cultures are at risk of being marginalized by economic modernization and national education. Confronted with the outside influences of Western and Han culture, local young people are unaware of the role they should play in preserving their own culture.
In this project, we went to a remote village in Honghe Hani and Yi Autonomous Prefecture, Yunnan Province. Of all the Hani villages in Honghe, only a small fraction still speak the Hani language. While most programs address minority culture preservation through recording and documentation carried out by outside observers, we actively involved local youth by making them ambassadors of their own culture. Our vision is to awaken a sense of responsibility among local young people and empower them to play a positive role in protecting their own culture and constructively impacting their surroundings.
Project objectives:
1. Raised awareness of quintessential elements of Buyi culture and stressed the importance of cultural preservation among local youth.
2. Taught local young people to use digital cameras, audio recorders and the Internet to record Hani culture. Encouraged local students to communicate with the outside world and provided a platform for global interaction.
3. Devised an organized and effective ethnic culture course to incorporate into local schools’ daily curriculum.
4. Refined a local culture preservation model that other minority groups can adopt.
Referee

Nanoscale

Sep 2011 - Present 14 years

Science and Technology

Serve as a referee for Nanoscale, a leading peer-reviewed journal focused on nanoscience.

Publications

Thermal conductivity of disordered two-dimensional binary alloys

Nanoscale, September 7, 2016

Using advanced statistical simulations, we have studied the effect of disorder on the thermal conductivity of two-dimensional alloys. We find that the thermal conductivity depends not only on the substitution concentration of the different elements but also strongly on the distribution of the disorder.

Giant biquadratic interaction-induced magnetic anisotropy in the iron-based superconductor AxFe2−ySe2

Physical Review B, January 15, 2016

Oxygen Vacancy Induced Flat Phonon Mode at FeSe/SrTiO3 interface

Nature Scientific Reports, June 12, 2015

Antiferromagnetic ground state with pair-checkerboard order in FeSe

Physical Review B, January 26, 2015

Tuning the band structure and superconductivity in single-layer FeSe by interface engineering

Nature Communications, September 26, 2014

Measurement of an Enhanced Superconducting Phase and a Pronounced Anisotropy of the Energy Gap of a Strained FeSe Single Layer in FeSe/Nb:SrTiO3/KTaO3 Heterostructures Using Photoemission Spectroscopy

Physical Review Letters (a top journal in physics), March 11, 2014

What are grain boundary structures in graphene?

Nanoscale, January 31, 2014

Interfacial effects on the spin density wave in FeSe/SrTiO3 thin films

Physical Review B, January 6, 2014

Unexpected large thermal rectification in asymmetric grain boundary of graphene

Solid State Communications, July 21, 2012

Layer and size dependence of thermal conductivity in multilayer graphene nanoribbons

Physics Letters A, January 9, 2012

Courses

Algorithms for Data Science (CSOR 4246)
Bayesian Model in Machine Learning (EECS 6720)
Computer Systems for Data Science (COMS 4121)
Data Mining (STAT 4240)
Exploratory Data Analysis and Visualisation (STAT 4701)
Foundations of Graphical Models (STAT 6701)
Introduction to Databases (COMS 4111)
Natural Language Processing (COMS 4705)
Statistical Machine Learning (STAT 4400)

Projects

(Kaggle Like) Rang-Tech Data Analytics Competition

May 2016
This was a Kaggle-like competition that used transaction data to predict which customers would be active.

i. Understanding, cleaning and subsetting the data.
We were not given any background on the data, so we spent some time examining each variable and looking for patterns. We eventually found correlations between variables, which let us group them and reduce the number of variables.
We did not use the features directly; instead, we engineered each feature carefully. For features with outliers, we removed the outliers. For features with a very wide range of values, we applied a square-root transform. We also found that NAs occurred in all of the food-related records, so we trained separate models on the data containing food NAs and on the data without them. Feature transformation and engineering took a large share of our effort.

ii. Adding new features. Using only the variables provided, we could not get past roughly 68% on the public leaderboard. One teammate found a paper describing features useful for classifying customers from transaction data. Its authors reported that the number of NAs and the number of zeros for each customer were important for predicting active customers, so we added these features, which pushed our result above 69%.

iii. Ensemble methods. A single model plateaued at around 69%. We then combined 13 kinds of models, both parametric and non-parametric, and used 2-layer, 5-fold stacking to ensemble the outputs of the first-layer models.
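The 2-layer, 5-fold stacking in step iii can be sketched in plain Python. This is an illustration under stated assumptions, not the team's code: the two layer-1 learners (a hand-written k-nearest-neighbour model and a gradient-descent logistic regression, standing in for the 13 models actually used), the logistic-regression meta-learner, and the toy blob data in `demo` are all hypothetical. The key idea is that every layer-2 training feature is an out-of-fold prediction, so the meta-learner never sees a layer-1 prediction made on data that model was trained on.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Parametric learner: batch gradient descent on the log loss."""

    def __init__(self, lr=0.1, epochs=300):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        d, n = len(X[0]), len(X)
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = sigmoid(sum(w * v for w, v in zip(self.w, xi)) + self.b) - yi
                gw = [g + err * v for g, v in zip(gw, xi)]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, X):
        return [sigmoid(sum(w * v for w, v in zip(self.w, x)) + self.b) for x in X]


class KNN:
    """Non-parametric learner: share of positive labels among the k nearest points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))[: self.k]
            out.append(sum(self.y[i] for i in nearest) / len(nearest))
        return out


def stack(base_factories, meta_factory, X, y, X_test, k=5, seed=0):
    """Two-layer stacking: layer-1 out-of-fold predictions train the layer-2 model."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    oof = [[0.0] * len(base_factories) for _ in X]
    test_feats = [[0.0] * len(base_factories) for _ in X_test]
    for m, make in enumerate(base_factories):
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            # Each training row is predicted only by a model that never saw it.
            for i, p in zip(fold, model.predict_proba([X[i] for i in fold])):
                oof[i][m] = p
            # Layer-2 test features: average of the k fold models' predictions.
            for j, p in enumerate(model.predict_proba(X_test)):
                test_feats[j][m] += p / k
    meta = meta_factory().fit(oof, y)
    return meta.predict_proba(test_feats)


def demo(seed=1):
    # Hypothetical toy data: two Gaussian blobs standing in for transaction features.
    rng = random.Random(seed)

    def blob(cx, label, n):
        return [([rng.gauss(cx, 1), rng.gauss(cx, 1)], label) for _ in range(n)]

    train = blob(-2, 0, 30) + blob(2, 1, 30)
    test = blob(-2, 0, 20) + blob(2, 1, 20)
    X, y = [r[0] for r in train], [r[1] for r in train]
    probs = stack([KNN, LogisticRegression], LogisticRegression, X, y, [r[0] for r in test])
    # Accuracy of the stacked model on the held-out toy test set.
    return sum((p > 0.5) == bool(t) for p, (_, t) in zip(probs, test)) / len(test)
```

Averaging the fold models' test-set predictions is one common way to build the layer-2 test features; retraining each base model on the full training set is another.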

Entity Resolution Matching between the Foursquare and Locu Datasets

Apr 2016
1. Took two datasets from Foursquare and Locu that describe the same entities and identified which entity in one dataset corresponds to an entity in the other.
2. Constructed features from the input datasets. For the location information (longitude and latitude), we computed haversine distances. The 'name' and 'address' fields were compared using the Jaccard similarity score, computed both over the whole entry and over the individual characters of the entry. The 'phone number' field was evaluated by simple matching. Missing values in 'phone number' and 'address' were also marked with dummy-variable features.
3. In our algorithm, we combined the records in the Locu and Foursquare training datasets, featurized them, and tagged each pair according to whether it appears in the matched list. We then used this training data to train a random forest classifier. The number of trees was chosen by cross-validation, and the number of features per split followed the common "sqrt" rule. We finally chose a random forest with 100 trees based on the cross-validation F1 score.
4. We set a matching threshold of 0.53, chosen by cross-validation. When the random forest matched several items in the test dataset to the same record, we kept the match with the highest probability.
5. Our result had 100% precision, 98.33% recall and a 99.16% F1 score.
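The features in step 2 and the thresholded best-match rule in step 4 can be sketched as follows. The record fields (`id`, `name`, `address`, `phone`, `latitude`, `longitude`) and the `prob` callback, which stands in for the trained random forest, are assumptions; reading "each character in the entry" as a character-set Jaccard similarity is also an interpretation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| of two collections; 0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def featurize(loc, fsq):
    """Pairwise features for one (Locu, Foursquare) candidate pair (hypothetical field names)."""
    feats = {"distance_km": haversine_km(loc["latitude"], loc["longitude"],
                                         fsq["latitude"], fsq["longitude"])}
    for field in ("name", "address"):
        x, y = (loc.get(field) or "").lower(), (fsq.get(field) or "").lower()
        feats[field + "_jaccard_words"] = jaccard(x.split(), y.split())  # whole-entry tokens
        feats[field + "_jaccard_chars"] = jaccard(x, y)                  # individual characters
    for field in ("phone", "address"):
        # Dummy variable: 1 if either side is missing the field.
        feats[field + "_missing"] = int(not loc.get(field) or not fsq.get(field))
    feats["phone_match"] = int(bool(loc.get("phone")) and loc.get("phone") == fsq.get("phone"))
    return feats


def best_matches(locu, foursquare, prob, threshold=0.53):
    """Keep, for each Locu record, the Foursquare candidate with the highest
    match probability, provided it clears the threshold."""
    matches = {}
    for loc in locu:
        scored = [(prob(featurize(loc, f)), f["id"]) for f in foursquare]
        p, fid = max(scored, default=(0.0, None))
        if p >= threshold:
            matches[loc["id"]] = fid
    return matches
```

Scoring every Locu record against every Foursquare record is quadratic; for larger datasets, candidate pairs would normally be pre-filtered, for example by a distance cut-off.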

Using AWS Cloud Platform and Spark Machine Learning to Recommend Music with Last.fm’s Audioscrobbler Data Set

Apr 2016
1. Used the data set published by Audioscrobbler, with 24.2 million records of users' artist plays, to build a music recommender engine.
2. Implemented the alternating least squares (ALS) recommender algorithm with MLlib on Spark to build the music recommender.
3. Preprocessed the raw data set with Python functional programming to correct misspelled or nonstandard artist IDs.
4. Used cross-validation on Spark to select the hyperparameters of the matrix factorization model.
5. Deployed the final model on the AWS platform to handle the large volume of data.
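To make the ALS idea in step 2 concrete, here is a minimal pure-Python sketch of explicit-feedback alternating least squares with plain L2 regularization: with the item factors fixed, each user's factor vector solves a small ridge-regression problem, and vice versa. It is not MLlib's implementation, and play-count data such as Audioscrobbler's is usually modelled with MLlib's implicit-feedback variant (`ALS.trainImplicit` in `pyspark.mllib.recommendation`), which this sketch does not reproduce. All function names are illustrative.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def update(fixed, groups, rank, reg):
    """For each row i, minimise sum_j (r_ij - x_i . y_j)^2 + reg * |x_i|^2
    with the other side's factors y_j held fixed (normal equations)."""
    out = {}
    for i, pairs in groups.items():
        A = [[reg * (a == c) for c in range(rank)] for a in range(rank)]
        b = [0.0] * rank
        for j, r in pairs:
            y = fixed[j]
            for a in range(rank):
                b[a] += r * y[a]
                for c in range(rank):
                    A[a][c] += y[a] * y[c]
        out[i] = solve(A, b)
    return out


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """ratings: list of (user, item, value); iterations must be >= 1."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    for _ in range(iterations):
        users = update(items, by_user, rank, reg)  # items fixed, solve users
        items = update(users, by_item, rank, reg)  # users fixed, solve items
    return users, items


def objective(ratings, users, items, reg):
    """Regularised squared error that each ALS half-step minimises exactly."""
    err = sum((r - sum(a * b for a, b in zip(users[u], items[i]))) ** 2 for u, i, r in ratings)
    norms = sum(x * x for v in users.values() for x in v) + sum(x * x for v in items.values() for x in v)
    return err + reg * norms
```

Because each half-step solves its subproblem exactly, the regularized objective never increases from one iteration to the next, which is a useful sanity check when tuning `rank` and `reg` by cross-validation.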

Using Hadoop Hive and MapReduce to Analyze NASA Server Logs

Apr 2016

1. Worked with a data set of Apache logs gathered by NASA's server from July to October 1995, around 1 GB in size, stored in HDFS.
2. Created a schema for the dataset in Hive, using a regular expression to give a concrete structure covering all the required fields.
3. Plotted the number of requests made per day for every day in October.
4. Wrote a MapReduce job to calculate total bandwidth by summing all the response bytes sent by the NASA web server.
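Steps 2 to 4 hinge on one regular expression for the Apache Common Log Format and a sum-by-key reduction. The sketch below runs the map, shuffle and reduce phases locally in Python; it is a stand-in for the actual Hive and Hadoop job, and the key names (`total_bytes`, `requests:<day>`) are hypothetical. Lines whose byte count is `-` contribute 0 bytes, and lines that do not match the pattern are skipped.

```python
import re
from collections import defaultdict

# host ident authuser [dd/Mon/yyyy:hh:mm:ss zone] "request" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" (\d{3}) (\S+)')


def mapper(line):
    """Emit (key, value) pairs for one log line; malformed lines emit nothing."""
    m = LOG_RE.match(line)
    if not m:
        return
    _, day, _, size = m.groups()
    yield ("total_bytes", 0 if size == "-" else int(size))
    yield ("requests:" + day, 1)


def reducer(key, values):
    return key, sum(values)


def run_mapreduce(lines):
    """Local stand-in for the shuffle phase: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for k, v in mapper(line):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in groups.items())
```

The same pattern is the kind of expression a regex-based Hive SerDe would use to map each raw line onto table columns.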
Zynga Game Payer Prediction and User Pattern Analysis

Apr 2016

1. Processed real user data and metrics from the Zynga platform, with 1 million user records and 247 features.
2. Used Lasso- and ridge-regularized logistic regression and random forests to select the features most important for predicting whether a user would become a payer.
3. Ensembled stochastic gradient descent classifiers (with perceptron, log and hinge loss functions), k-nearest neighbors and decision trees on the selected features to predict payers. Precision, recall and F1 score all reached up to 95%.
4. Used k-means++ on the important features to cluster user patterns on the Zynga platform, choosing the number of clusters with the elbow method. The clustering revealed distinct patterns among paying users, risk-preferring users and mature users.
5. Based on these user patterns, proposed strategies for running campaigns tailored to different user groups to improve user engagement.
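The k-means++ clustering and elbow method in step 4 can be sketched in pure Python. This is a generic illustration, not the project's code; the function names are hypothetical, and in practice the points would be the selected Zynga user features.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is sampled with probability proportional
    to its squared distance from the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a centre
            centres.append(rng.choice(points))
            continue
        r, acc = rng.random() * total, 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centres.append(p)
                break
    return centres


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm from a k-means++ start; returns (centres, inertia)."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[nearest].append(p)
        # New centre = cluster mean; an empty cluster keeps its old centre.
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else list(centres[j])
               for j, cl in enumerate(clusters)]
        if new == [list(c) for c in centres]:
            break
        centres = new
    inertia = sum(min(math.dist(p, c) ** 2 for c in centres) for p in points)
    return centres, inertia


def elbow(points, max_k=6, seed=0):
    """Inertia (within-cluster sum of squares) for k = 1..max_k."""
    return [kmeans(points, k, seed=seed)[1] for k in range(1, max_k + 1)]
```

Plotting the returned inertias against k and picking the point where the decrease flattens out is the elbow method; the choice is a visual judgement rather than a fixed rule.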
Handwriting Recognition by SVM and AdaBoost Supervised Learning with R

Nov 2015

1. Processed JPEG images from the open USPS handwriting dataset into matrices with R.
2. Implemented a non-linear SVM and AdaBoost in R to recognize handwritten digits.
3. Chose the kernel and margin parameters through cross-validation, improving the recognition rate to 90%.
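The AdaBoost half of this project can be illustrated with discrete AdaBoost over one-feature decision stumps. The original work used R and recognized ten digit classes; this Python sketch handles only two classes labelled +1/-1 (a one-vs-rest wrapper would be needed for digits), omits the SVM entirely, and its function names are illustrative.

```python
import math


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w: (error, feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[f] >= t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def adaboost(X, y, rounds=20):
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, sign))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * (sign if x[f] >= t else -sign)) for wi, x, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (s if x[f] >= t else -s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1
```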
Document Text Classification Using Lasso/Ridge Regression and Naïve Bayes

Oct 2015

1. Built an efficient Naïve Bayes classifier to classify papers as written by Hamilton or Madison, with the help of an R natural language processing package.
2. Implemented ridge regression, Lasso and mutual-information selection, respectively, to remove irrelevant features from the text documents and improve the efficiency of the Bayes classifier.
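A minimal multinomial Naive Bayes with Laplace smoothing illustrates step 1. It is written in Python rather than R, uses a naive regex tokenizer, and does not include the Lasso, ridge or mutual-information feature selection from step 2; the class name and the toy training sentences in the test are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc):
        """Unnormalised log P(class | doc): log prior plus Laplace-smoothed word log-likelihoods."""
        V = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.counts[c][tok] + self.alpha) / (self.totals[c] + self.alpha * V))
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied over a long essay.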
Mining the NYPD Open Datasets to Predict the Danger Area for Car Collision in NYC on AWS Cloud Platform

Oct 2015
1. Cleaned, processed and selected a set of features to find correlations between the rate of vehicle collisions and the location, time and weather of the driving route, using R scripts and the NYC open data API.
2. Applied normalization and PCA to the features, then ran unsupervised k-means++ clustering with Spark on the AWS cloud platform to produce a heat map of high-danger areas in NYC for a given input time and driving route.
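The normalization and PCA step in item 2 can be sketched in pure Python: z-score each feature, form the covariance matrix, and extract the leading principal component by power iteration. The original pipeline used R and Spark; this is an illustrative stand-in with hypothetical function names, and it extracts only the first component.

```python
import math


def standardize(rows):
    """Z-score each column (population standard deviation); constant columns become 0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds)] for r in rows]


def covariance(rows):
    """Covariance matrix of rows that are already centred (e.g. standardized)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def top_component(C, iterations=200):
    """Leading eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(C)
    v = [1.0 / (i + 1) for i in range(d)]  # fixed start, assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(rows, component):
    """First principal-component score of each standardized row."""
    return [sum(a * b for a, b in zip(r, component)) for r in rows]
```

Standardizing first keeps features with large units (for example, counts versus temperatures) from dominating the principal components.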

Study the Relation Between Users’ Sentiment and Location Tags in Twitter with SQL and

Sep 2014

1. Processed tweets from the Twitter Streaming API to extract those with location tags, using Python and SQL.
2. Performed sentiment analysis by writing classifiers in Python: naive Bayes, maximum entropy and support vector machines. The NLTK package was used to parse and analyze each tweet.
3. Improved the accuracy of the self-written classifiers by using bigrams, trigrams and word dictionaries, reaching an accuracy of around 80%.
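The gain from bigrams and trigrams in step 3 comes from phrases such as "not good", whose sentiment differs from that of their individual words. A small feature extractor in that spirit is sketched below; the function names are illustrative, and the optional `lexicon` argument is a hypothetical stand-in for the word dictionaries mentioned above. The resulting dicts have the feature-set shape that NLTK's classifiers, such as `nltk.NaiveBayesClassifier.train`, accept as (features, label) pairs.

```python
def ngrams(tokens, n):
    """Contiguous n-word sequences joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tweet_features(text, lexicon=None):
    """Counts of unigrams, bigrams and trigrams, plus an optional lexicon score."""
    tokens = text.lower().split()
    feats = {}
    for n in (1, 2, 3):
        for gram in ngrams(tokens, n):
            feats[gram] = feats.get(gram, 0) + 1
    if lexicon:
        # Sum of per-word polarity scores from a (hypothetical) sentiment dictionary.
        feats["__lexicon_score__"] = sum(lexicon.get(t, 0) for t in tokens)
    return feats
```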
Predicting the SSE Index with the Bouchaud-Sornette Option Pricing Model

May 2011

Developed C code implementing the Bouchaud-Sornette option pricing model to predict the SSE Index.
Computational study of the phase transition in the Hexagonal Ising Model

Nov 2010

Implemented the Wolff cluster Monte Carlo method in C to study the phase transition in the hexagonal Ising model.
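A Wolff cluster update grows a cluster of aligned spins from a random site, adding each aligned neighbour with probability p_add = 1 - exp(-2βJ), and then flips the whole cluster. The project used C; the Python sketch below assumes that "hexagonal" means the honeycomb lattice (three neighbours per site) with periodic boundaries and ferromagnetic coupling J > 0, and all names are illustrative. For reference, the exact critical coupling of the honeycomb Ising model is β_c J = ½ ln(2 + √3) ≈ 0.658.

```python
import math
import random


def honeycomb_neighbours(L):
    """Periodic L x L honeycomb lattice (L >= 2) with two sites (A, B) per unit cell.
    Site index: 2 * (i * L + j) for A and that plus 1 for B; every site has 3 neighbours."""
    nbrs = [[] for _ in range(2 * L * L)]
    for i in range(L):
        for j in range(L):
            a = 2 * (i * L + j)
            for bi, bj in ((i, j), ((i - 1) % L, j), (i, (j - 1) % L)):
                b = 2 * (bi * L + bj) + 1
                nbrs[a].append(b)
                nbrs[b].append(a)
    return nbrs


def wolff_step(spins, nbrs, beta, rng, J=1.0):
    """Grow one cluster from a random site and flip it; returns the cluster size."""
    p_add = 1.0 - math.exp(-2.0 * beta * J)
    start = rng.randrange(len(spins))
    s0 = spins[start]
    cluster, stack = {start}, [start]
    while stack:
        site = stack.pop()
        for n in nbrs[site]:
            # Each bond from the cluster to an aligned outside spin is tried independently.
            if n not in cluster and spins[n] == s0 and rng.random() < p_add:
                cluster.add(n)
                stack.append(n)
    for site in cluster:
        spins[site] = -s0
    return len(cluster)


def simulate(L, beta, steps, seed=0):
    """|Magnetisation per spin| after `steps` cluster updates from a random start."""
    rng = random.Random(seed)
    nbrs = honeycomb_neighbours(L)
    spins = [rng.choice((-1, 1)) for _ in range(2 * L * L)]
    for _ in range(steps):
        wolff_step(spins, nbrs, beta, rng)
    return abs(sum(spins)) / len(spins)
```

Sweeping `beta` across β_c and recording the average of `simulate` (or of its square, for a Binder-cumulant analysis) is how the phase transition would show up; cluster moves avoid most of the critical slowing down that single-spin Metropolis updates suffer near β_c.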
Developing New Global Optimization Algorithm for Material Science with Hadoop

Feb 2013 - Jun 2014

1. Proposed a new global optimization algorithm for functional material searching, based on the differential evolution algorithm and written in Python.
2. Used the new algorithm on Hadoop to find grain boundary structures with lower formation energy.
3. Designed A/B tests to select the components of the algorithm that made the optimization efficient.
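The differential evolution baseline that step 1 builds on can be sketched as the classic DE/rand/1/bin scheme. The project's own modifications, the grain-boundary energy objective and the Hadoop execution are not reproduced; the sphere function in the test is a stand-in objective, and the defaults (F = 0.8, CR = 0.9) are common textbook choices rather than the project's settings.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimise f over a box using DE/rand/1/bin. bounds: list of (low, high) per dimension."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        trials = []
        for i in range(pop_size):
            # Mutation: base vector plus a scaled difference of two others, all distinct from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover; j_rand guarantees at least one coordinate from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d] for d in range(dim)]
            # Clip the trial back into the search box.
            trials.append([min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)])
        # Trial evaluations are independent of one another within a generation.
        trial_costs = [f(t) for t in trials]
        # Greedy selection: a trial replaces its target if it is no worse.
        for i, (t, tc) in enumerate(zip(trials, trial_costs)):
            if tc <= cost[i]:
                pop[i], cost[i] = t, tc
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

Because the trial evaluations of one generation do not depend on each other, that step is the natural candidate for distribution across a cluster when each objective evaluation is expensive.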
Developing Highly Efficient Algorithms in Scientific Computation

Nov 2011 - May 2012

1. Developed an efficient algorithm in Python to accelerate parallel large-scale data analysis on Hadoop.
2. Sped up the calculation by a factor of 10 compared with the previous approach, without significant loss of accuracy.
Computational Study of the Energy Transport in Nanostructures, Fudan University

Sep 2009 - Sep 2011

1. Developed Python code to simulate thermal transport in graphene-based materials.
2. Used multivariate numerical methods in R to analyze datasets obtained from experiments.
3. Designed a new kind of 2D thermal rectifier and published it in a peer-reviewed paper (a top-cited paper in the journal).

Honors & Awards

Rank 91/2070 (top 5%) in Kaggle Two Sigma Financial Model Challenge

Kaggle

Mar 2017

As a team member in the Kaggle Two Sigma Financial Model Challenge, implemented time-series feature engineering and linear/tree regressors to build a model that achieved a top 5% score on the private leaderboard for the test dataset.
Brown Medal in Hackerrank Coding Contest (top 15%)

Hackerrank

Sep 2016

Top 15% in the HackerRank Week of Code 23 contest, which had 10,000+ participants.
Rank No. 1 among 279 teams in the Rang-Technology Data Analytics Competition (Kaggle-like data competition)

Rang-Technology

Jun 2016

https://xmrrwallet.com/cmx.prang.shinyapps.io/Competition/

Ranked No. 1 among 279 teams of master's students from around 50 universities, including CMU, Columbia, Cornell, USC and UIUC, in the Rang-Tech Data Analytics Competition, a Kaggle-like competition that used transaction data to predict active customers.
2015 Web of Science Highly Cited Paper Worldwide (First author)

Thomson Reuters

Mar 2016

My first-author paper "Antiferromagnetic ground state with pair-checkerboard order in FeSe", a theoretical computational study of magnetic materials published in Physical Review B, was selected as a "Highly Cited Paper" for the 2015 period.

This designation placed the paper in the top 1% by citations among roughly 1,170,000 papers published in physics-related subjects, and it is among the most prestigious indicators of research impact in science.
National Scholarship

Ministry of Education of People's Republic of China

Nov 2014

A top national honor recognizing outstanding academic achievement by graduate students in China.
Fellowship for Graduate Student’s Short-term International Visiting (to Lawrence Berkeley National Lab)

Fudan University

Sep 2011

Fellowship for excellent graduate students to visit top-class institutions worldwide.
Distinguished Award for New Graduate Students

Fudan University

Sep 2010

Awarded to outstanding incoming graduate students.

Languages

Mandarin

Native or bilingual proficiency
English

Professional working proficiency

Organizations

American Physical Society

Member

Sep 2011 - Present

Student member of the American Physical Society. Gave two oral talks, at the 2013 and 2014 APS Annual March Meetings.

More activity by Haiyuan

Meet GPT-5 - our smartest, fastest and most useful model. It is a unified system that automatically switches between providing a quick response and…

Liked by Haiyuan Cao

Here at OpenAI we've cracked pretraining, then reasoning, and now we're experimenting with a new set of techniques that maximally leverage their…

Liked by Haiyuan Cao

At Zoom we’re thrilled to be among the first to integrate OpenAI’s GPT-5 into our federated AI architecture. Zoom AI Companion is powered by our…

Liked by Haiyuan Cao

I’m hiring STRONG Engineers for data pipelines and analysis, cluster infrastructure, deep learning infrastructure, GPUs acceleration and utilisation,…

Liked by Haiyuan Cao

Welcome to the era of GPT-5: OpenAI’s most advanced model yet. 🖐️ And it’s rolling out to all paid GitHub Copilot plans, starting today. In our…

Liked by Haiyuan Cao

Today marks a major milestone: GPT-5 is now live in Microsoft 365 Copilot and Copilot Studio. This unlocks a new level of capability for our…

Liked by Haiyuan Cao

https://xmrrwallet.com/cmx.plnkd.in/gu4Ecdin make sure to tune in at 10am PT!!!

Liked by Haiyuan Cao

Career advice: Live modestly, save money Put your family and health first Keep learning - don't live on your laurels No jerks: don't work for 'em…

Liked by Haiyuan Cao

The public preview for BigQuery Advanced Runtime is here! I'm excited to share our blog post that details how we're boosting throughput and reducing…

Liked by Haiyuan Cao

What is it like to be an AI Product Manager? Here are 3 very important things we did to launch a Data Science Agent at Google 👇 There's 2 types of…

Liked by Haiyuan Cao

A new chapter: I am excited to share that I have recently joined Anthropic as a member of technical staff. Anthropic is a unique company with an even…

Liked by Haiyuan Cao

Amazon Nova models were recognized among the top performers in a new research from #Aymara which involved testing 20 leading language models for…

Liked by Haiyuan Cao

We are looking to hire a research engineer on the Universal Knowledge team in Google DeepMind in Zurich. This is an exciting area of research, in…

Liked by Haiyuan Cao

Truly grateful and humbled to receive the award. It's gratifying to see this 13-year old work continues to be useful, and exciting to witness how…

Liked by Haiyuan Cao

CoreAI is hiring a Principal Technical Program Manager to help us build the future of coding with AI. https://xmrrwallet.com/cmx.plnkd.in/gjikmvmk…

Liked by Haiyuan Cao

As students get ready to start classes around the world, we're making our most advanced AI tools available to college students in the US, Japan…

Liked by Haiyuan Cao
