Introduction to Data Science
The importance of data science is growing day by day. Data science combines various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
Sections we will cover in this article:
- Introduction to Data Science
- Data Science Life Cycle
- Project on IPL Data Set
- Project on Covid-19 Data Set
Importance of Data Science
What is Data?
Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions of things.
Facts that can be recorded and that have implicit meaning are known as ‘data’. Facts can be numbers, and a piece of data can be attached to a fact. The word ‘data’ is the plural form of ‘datum’, which means a single observation or measurement.
For example, “the height of Mt. Everest” is a fact, but “the height of Mt. Everest on 1st January 2019” is a data point. “Zuhi loves pizza” is a fact, but “Zuhi ate 3 slices of pizza on 1st January 2019” is a data point.
Data can be anything like numbers, images, text, video, audio, and so on.
Data 30 years ago
In the late 1980s and early 1990s, data was small and structured. It was stored in a tabular format in relational databases, so it was easy to store, process, and analyze. Storage options were also limited in size: floppy disks (1.44 MB), hard disks (10 MB to 1 GB), and CDs (700 MB).
Storage capacities were small at that time because there was much less data to store.
Structured data means data in a tabular format, for example data in an Excel sheet or in rows and columns. Because there was no social media, data was small and structured.
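To make the idea of rows and columns concrete, here is a minimal Python sketch. The table contents and the helper names `load_rows` and `column_average` are made up for illustration; only the standard `csv` and `io` modules are used.

```python
import csv
import io

# Hypothetical sample data: a tiny table with a header row and three records.
raw_table = """name,height_cm,city
Asha,162,Pune
Ravi,175,Delhi
Meera,158,Chennai
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))

def column_average(rows, column):
    """Average a numeric column. Every row has the same fields,
    so picking out one column is straightforward."""
    values = [float(row[column]) for row in rows]
    return sum(values) / len(values)

rows = load_rows(raw_table)
average_height = column_average(rows, "height_cm")
print(len(rows), "rows; average height:", average_height)
```

Because every row has the same named columns, a question like "what is the average height?" takes only a few lines. This is what makes structured data easy to process and analyze.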
Today’s data
Today, data is huge and unstructured. It is stored in systems like Hadoop, MongoDB, and Cassandra, and it is difficult to store, process, and analyze. To store data we have options like hard disks (1 TB to 4 TB), pen drives (4 GB to 64 GB), and memory cards (4 GB to 64 GB).
We also talk about big data: data sets so large that they are difficult to store, process, and analyze with traditional tools. This is why we need data science.
Unstructured data means data that is not in a tabular format, for example PDFs, images, videos, and audio. Because of social media, data is now huge and unstructured.
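By contrast, unstructured data has no ready-made columns, so some structure has to be extracted before it can be analyzed. The sketch below assumes a few invented social media posts and hypothetical helpers `extract_hashtags` and `count_hashtags`; it is one simple way to pull a countable field out of free text, not a complete text-processing method.

```python
import re
from collections import Counter

# Hypothetical social media posts: free text with no fixed fields.
posts = [
    "Loved the match today! #IPL #cricket",
    "Stuck in traffic again... #Monday",
    "What a finish, last-ball six! #IPL",
]

def extract_hashtags(text):
    """Return all hashtags in a post, lowercased and without the '#'."""
    return [tag.lower() for tag in re.findall(r"#(\w+)", text)]

def count_hashtags(all_posts):
    """Count how often each hashtag appears across all posts."""
    counts = Counter()
    for post in all_posts:
        counts.update(extract_hashtags(post))
    return counts

hashtag_counts = count_hashtags(posts)
print(hashtag_counts.most_common(1))
```

Even this small example needs a pattern-matching step before anything can be counted, which hints at why unstructured data is harder to process than a table.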
A popular saying today is “Data is today’s oil”. We have huge amounts of data, but we are not analyzing it or using it to make business decisions, which is a waste of that data. This is why we need data science.