Posts

Types of data validation

This reading describes the purpose, examples, and limitations of six types of data validation. The first five are validation types associated with the data (type, range, constraint, consistency, and structure), and the sixth focuses on validating the application code used to accept data from user input.

As a junior data analyst, you might not perform all of these validations. But you could ask if and how the data was validated before you begin working with a dataset. Data validation helps to ensure the integrity of data. It also gives you confidence that the data you are using is clean. The following list outlines six types of data validation, with the purpose, examples, and limitations of each.

Purpose: Check that the data matches the data type defined for a field.
Example: Data values for school grades 1-12 must be a numeric data type.
Limitations: The data value 13 would pass the data type validation but would be an unacceptable value.
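To make the limitation above concrete, here is a minimal Python sketch of a data type check for the school grade example. The function names `passes_type_check` and `passes_range_check` are hypothetical, and the range check is added only to show what the type check alone misses; it is an illustration under those assumptions, not the method the reading prescribes.

```python
def passes_type_check(value):
    """Data type validation: the grade field is assumed to hold whole numbers."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def passes_range_check(value):
    """Range validation (illustrative addition): grades run from 1 to 12."""
    return 1 <= value <= 12


for grade in [7, "7", 13]:
    type_ok = passes_type_check(grade)
    # Only run the range check on values that already have the right type.
    range_ok = type_ok and passes_range_check(grade)
    print(repr(grade), "type ok:", type_ok, "range ok:", range_ok)
```

The value 13 passes the type check but fails the range check, which is why data type validation is usually paired with other validation types.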

Hands-on Activity: Introduction to Kaggle (Google Data Analyst Certificate on Coursera)

Activity overview

By now, you've learned a lot about different data types and data structures. In this activity, you will work with datasets from Kaggle, an online community of people passionate about data. To start this activity, you'll create a Kaggle account, set up a profile, and explore Kaggle notebooks. Every data analyst has a data community that they rely on for help, support, and inspiration. Kaggle can help you build your own data community.

Kaggle has millions of users in all stages of their data career, from beginners to data scientists with decades of experience. The Kaggle community brings people together to develop their data analysis skills, share datasets and interactive notebooks, and collaborate on solving real-life data problems.

Check out this brief introductory video to learn more about Kaggle.

By the time you complete this activity, you will be able to use many of Kaggle's key features. This will enable you to create notebooks and browse data

Transforming data

What is data transformation?

In this reading, you will explore how data is transformed and the differences between wide and long data. Data transformation is the process of changing the data's format, structure, or values. As a data analyst, there is a good chance you will need to transform data at some point to make it easier to analyze.

Data transformation usually involves:
- Adding, copying, or replicating data
- Deleting fields or records
- Standardizing the names of variables
- Renaming, moving, or combining columns in a database
- Joining one set of data with another
- Saving a file in a different format, for example, saving a spreadsheet as a comma-separated values (CSV) file

Why transform data?

Goals for data transformation might be:
- Data organization: better organized data is easier to use
- Data compatibility: diffe

Boolean logic example

Imagine you are shopping for shoes and are considering certain preferences:
- You will buy the shoes only if they are pink and grey
- You will buy the shoes if they are entirely pink or entirely grey, or if they are pink and grey
- You will buy the shoes if they are grey, but not if they have any pink

Below are Venn diagrams that illustrate these preferences. AND is the center of the Venn diagram, where two conditions overlap. OR includes either condition. NOT includes only the part of the Venn diagram that doesn't contain the exception.

The AND operator

Your condition is "If the color of the shoe has any combination of grey and pink, you will buy them." The Boolean statement would break down the logic of that statement to filter your results by both colors. It would say "IF (Color="Grey") AND (Color="Pink") then buy them." The AND operator lets you stack multiple conditions.

Below is a simple truth table that outlines the Boolean logic at work in this stateme
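As a rough sketch, the three shoe preferences described in this example can be written as Boolean expressions in Python. The function names and the `has_pink` / `has_grey` flags are hypothetical and chosen only for illustration; the loop prints a small truth table for all four color combinations.

```python
def buy_with_and(has_pink, has_grey):
    """AND: buy only if the shoes contain both pink and grey."""
    return has_pink and has_grey


def buy_with_or(has_pink, has_grey):
    """OR: buy if the shoes contain pink, grey, or both."""
    return has_pink or has_grey


def buy_with_not(has_pink, has_grey):
    """NOT: buy if the shoes are grey but contain no pink."""
    return has_grey and not has_pink


# Print one row per combination of the two conditions.
print("pink  grey  AND   OR    NOT")
for has_pink in (True, False):
    for has_grey in (True, False):
        row = [
            has_pink,
            has_grey,
            buy_with_and(has_pink, has_grey),
            buy_with_or(has_pink, has_grey),
            buy_with_not(has_pink, has_grey),
        ]
        print("  ".join(str(cell).ljust(5) for cell in row).rstrip())
```

Each function returns True for exactly the region of the Venn diagram that its operator covers: the overlap for AND, either circle for OR, and the grey-only region for NOT.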