Breast cancer is the most common cancer among women. It accounts for approximately 30% of cancers diagnosed in women, and more than 40,000 women die from breast cancer each year in the USA. To help doctors diagnose breast cancer, machine learning can be applied to predict whether a patient has breast cancer based on ten real-valued features of the cell nucleus.

The data is from and the attribute information as follows:

Nowadays, Python has become a common programming language, and many companies use it to manipulate data. If your company requires you to use a database management system such as MySQL or Oracle but you are not really comfortable with it, you can connect Python to SQL databases.

Python is a very ‘easy to use’ programming language, so we can manipulate the data with Python by connecting to SQL databases.

Example with Python

We can connect to a database and create a cursor. The cursor is a temporary memory area, or temporary workstation. It is allocated by the database server at the…
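
As a minimal sketch of the connect-and-cursor pattern described above, the example below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, since it needs no server. This is an assumption for illustration: for MySQL or Oracle you would use a driver for that database instead, and the `patients` table and its rows are hypothetical.

```python
import sqlite3

# Connect to a throwaway in-memory SQLite database (stand-in for MySQL/Oracle).
conn = sqlite3.connect(":memory:")

# The cursor is the object used to send SQL statements and fetch results.
cur = conn.cursor()

# Hypothetical table and rows, for illustration only.
cur.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
cur.executemany(
    "INSERT INTO patients (name, age) VALUES (?, ?)",
    [("Ann", 34), ("Bob", 51), ("Cara", 47)],
)
conn.commit()

# Query through the cursor and pull the rows back into Python.
cur.execute("SELECT name, age FROM patients WHERE age > ? ORDER BY age", (40,))
rows = cur.fetchall()
print(rows)

# Release the cursor and the connection when done.
cur.close()
conn.close()
```

The `?` placeholders let the driver handle quoting, which is generally safer than building SQL strings by hand.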

Recently I have had some interviews, and one recruiter gave me feedback. Based on the feedback, I realized that he thought I was not skilled at Python.

I have been learning Python for almost two years, from online courses to a data science boot camp. I am confident with common Python libraries and basic built-in functions, and I practice Python on HackerRank and LeetCode every day. I listed all of my Python skills and packages in the technical skills section of my resume.

After the first interview, the recruiter told me I need to practice Python more because I paused so many…

Microsoft released the first version of Excel for the Macintosh on September 30, 1985. Google later released Google Sheets, which offers similar functions to MS Excel. Most companies and their data analysts are used to manipulating and analyzing data in Excel or Google Sheets, and almost every data analyst uses one of them every day. For me, Excel or Google Sheets is good for storing data but not comfortable for cleaning, mining, and visualizing it.

If you feel the same way, there is a way to connect Google Sheets and Python. Then…

The 2020 election finished just last week, and Joe Biden was elected president of the US. First of all, I want to express my admiration for the two candidates. They are the same age as my grandfather, but they have never given up on their dreams and have kept working hard. In the last few days of the election, the two grandpas must have been under unimaginable psychological pressure. In the meantime, one question came to my mind: what influences people’s votes?

Nowadays, it is very common to use big data to affect consumers’ behavior. Some e-commerce companies will collect consumers’ data…

This blog post presents all the steps you need to create a nice GitHub profile.

As you can see, my GitHub profile is basically a README displayed on my homepage. So all you need is to create this README file in your repo!

Step 1. Create a new repository and name it with your GitHub username.

When you use your username as the repository name, a green area shows up and tells you that this is a special repo. My profile repo already exists, which is why it shows the red text.

Data visualization is a very important part of analysis, and it helps business decision-makers understand complex datasets. There are many data visualization tools, such as Tableau, Plotly, QlikView, FusionCharts, etc. Google Data Studio is a free online data visualization tool. It offers various demos and templates that can make your work easier.

Here are two steps to creating your own dashboard that can be shared online.

Step 1- Load the data to Google Data Studio (GDS)

After you log in to GDS, you will see a screen like the one in the image below:

A word cloud is one of the data visualization tools for text data. One of my projects is to analyze the Amazon review data (the project link), and I applied Natural Language Processing and the NLTK toolkit for EDA (Exploratory Data Analysis). In this part, I figured out several ways to create and present a word cloud using different tools: Tableau, Python, and Google Word Cloud Generator.
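
Whatever tool draws the picture, a word cloud is driven by word frequencies: the more often a word appears, the larger it is drawn. As a rough sketch of that counting step in plain Python, the example below uses only the standard library. The sample reviews and the tiny stop-word list are made up for illustration and are not the Amazon review data.

```python
import re
from collections import Counter

# Hypothetical sample reviews, for illustration only.
reviews = [
    "Great product, great price!",
    "The battery life is great.",
    "Poor battery, poor packaging.",
]

# A tiny illustrative stop-word list; a real project might use NLTK's list.
stop_words = {"the", "is", "a", "an", "and"}

def word_frequencies(texts):
    """Count lowercase words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

freqs = word_frequencies(reviews)
print(freqs.most_common(3))
```

A dictionary of counts like `freqs` is the kind of input a word cloud tool can size words from.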

Data and Data Preprocessing

I created this dataset myself as a sample for my hard-coded chatbot project. If you are interested in my chatbot project, please feel free to visit my GitHub repo. Let us take…

Weight loss applications are getting popular. These apps basically track your lifestyle: your calorie budget, your food, and your exercise. Of course, they also offer plans and suggest low-calorie foods. More interestingly, they can forecast when you will reach your goal if you keep using the app, based on your profile and past lifestyle: occupation, gender, current weight, medical history, etc. There are many interesting weight loss apps, such as MyFitnessPal, Lose It!, Fitbit, and Noom.

For example, consider Lose It!, a weight management app. This is a…

Nowadays there are a lot of dating apps that help people find their significant others. Some dating apps are quite popular among young people, and there are also some designed especially for the elderly. The purpose of these apps is to filter potential matches based on users’ personal preferences, such as height, degree, habits, region, and occupation.

Fraud Detection

For dating apps, there is a potential risk: scams! To protect users and provide them with better services, fraud detection is an important part of app development. One part of the job for backend developers (data scientists…

Hua Shi

Data Scientist / Data Analyst / Machine Learning / Data Engineer / MS in Economics
