What is Data Collection? #
Data Collection is the process of gathering raw data from different sources so it can be analyzed and used for decision-making.
Importance of Data Collection #
- Provides the foundation for analysis
- Supports accurate and reliable results
- Helps in better decision-making
- Improves the quality of models and predictions
Types of Data #
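- Structured Data: Organized into fixed rows and columns (tables, databases, Excel spreadsheets)
- Unstructured Data: No predefined format (text, images, videos, social media posts)
- Semi-Structured Data: Tagged or nested, but without a rigid schema (JSON, XML)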
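To make the difference between semi-structured and structured data concrete, the sketch below parses a small JSON record and flattens it into table-like rows. The record and its field names (`user`, `tags`) are made up for illustration, and `flatten` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical semi-structured record: nested fields and a variable-length list.
raw = '{"user": {"name": "alex", "age": 35}, "tags": ["a", "b"]}'

def flatten(record):
    """Turn one nested record into flat rows, one row per tag."""
    user = record["user"]
    return [
        {"name": user["name"], "age": user["age"], "tag": tag}
        for tag in record["tags"]
    ]

record = json.loads(raw)  # semi-structured: nested dicts and lists
rows = flatten(record)    # structured: every row has the same columns
for row in rows:
    print(row)
```

Each resulting row has the same three columns, so it could be written to a CSV file or a database table.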
Sources of Data #
- Databases (MySQL, PostgreSQL)
- APIs (weather, social media, etc.)
- Surveys and forms
- Sensors and IoT devices
- Web scraping (websites)
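As one illustration of collecting from a database source, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a MySQL or PostgreSQL server. The `readings` table, its columns, and the `fetch_readings` helper are assumptions made for this example.

```python
import sqlite3

def fetch_readings(conn, min_value):
    """Return (sensor, value) rows whose value is at least min_value."""
    cursor = conn.execute(
        "SELECT sensor, value FROM readings WHERE value >= ? ORDER BY value",
        (min_value,),
    )
    return cursor.fetchall()

# In-memory database standing in for a real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("t1", 21.5), ("t2", 19.0), ("t3", 23.1)],  # made-up sample rows
)
conn.commit()

rows = fetch_readings(conn, 20.0)
print(rows)
conn.close()
```

The `?` placeholder passes the threshold as a parameter rather than pasting it into the SQL string, which is the usual way to avoid SQL injection.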
Methods of Data Collection #
- Manual Collection: Surveys, interviews
- Automated Collection: APIs, sensors
- Web Scraping: Extracting data from websites
- Internal Systems: Company databases
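Web scraping, listed above, can be sketched with only the standard library. The example below extracts link targets from an HTML string; the HTML content and the `LinkCollector` class are hypothetical, and a real scraper would first download the page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors that have no href attribute
                self.links.append(href)

# Hypothetical page content used in place of a downloaded page.
html = (
    '<ul><li><a href="/a">A</a></li>'
    '<li><a href="/b">B</a></li>'
    "<li><a>no link</a></li></ul>"
)

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```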
Basic Python Example (Simple Data Collection) #
Example: Collect data from an API #
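import requests

# Public demo API that estimates age from a first name
url = "https://api.agify.io?name=alex"
response = requests.get(url, timeout=10)  # avoid waiting forever on a slow server
response.raise_for_status()               # stop on HTTP errors such as 404 or 429
data = response.json()                    # parse the JSON body into a dict
print(data)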
Output Example:
{'name': 'alex', 'age': 35, 'count': 10000}
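When collecting for more than one name, it may help to build the URL safely and check that each response has the expected fields before using it. The sketch below assumes the response shape shown in the example output; `build_url` and `extract_record` are hypothetical helpers, and no network request is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.agify.io"

def build_url(name):
    """Build the request URL, escaping characters that are unsafe in a query."""
    return f"{BASE_URL}?{urlencode({'name': name})}"

def extract_record(payload):
    """Return (name, age, count), or raise if an expected field is missing."""
    missing = {"name", "age", "count"} - payload.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return payload["name"], payload["age"], payload["count"]

# Sample payload copied from the example output, not a live response.
sample = {"name": "alex", "age": 35, "count": 10000}
print(build_url("alex"))
print(extract_record(sample))
```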
Example: Collect data from a CSV file
import pandas as pd
data = pd.read_csv("data.csv")
print(data.head())
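Since `data.csv` is not included here, the following self-contained sketch reads CSV text from memory with the standard `csv` module and separates rows that have a missing value. The column names and contents are made up, and `load_rows` is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV content standing in for data.csv.
csv_text = "name,age\nalex,35\nsam,\nkim,29\n"

def load_rows(text):
    """Parse CSV text into dicts, split into complete rows and rows missing an age."""
    complete, incomplete = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["age"]:
            complete.append(row)
        else:
            incomplete.append(row)
    return complete, incomplete

complete, incomplete = load_rows(csv_text)
print(len(complete), "complete rows,", len(incomplete), "with missing age")
```

Checking for missing values right after loading is a simple way to catch collection problems before analysis.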