Semi-Structured Data

What is Semi-Structured Data? #

Semi-structured data does not conform to a rigid table schema, but it still carries organizational markers, such as tags or keys, that label and separate its elements.

More flexible than structured data
Well suited to storing nested or hierarchical data
Widely used in modern applications (APIs, web data)
Bridges the gap between structured and unstructured data

Examples of Semi-Structured Data #

JSON (JavaScript Object Notation)
XML files
HTML data from websites
Documents stored in NoSQL databases (e.g., MongoDB)
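
To make the first two formats concrete, here is a minimal sketch that reads the same record from JSON and from XML using only Python's standard library. The record and the names `json_text` and `xml_text` are illustrative assumptions, not data from a real source.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, expressed once as JSON and once as XML
json_text = '{"name": "Alex", "age": 25, "city": "New York"}'
xml_text = "<person><name>Alex</name><age>25</age><city>New York</city></person>"

# JSON uses key-value pairs and keeps basic types (25 stays an integer)
from_json = json.loads(json_text)

# XML uses tags; element text is always a string, so convert where needed
root = ET.fromstring(xml_text)
from_xml = {child.tag: child.text for child in root}
from_xml["age"] = int(from_xml["age"])

print(from_json == from_xml)  # True
```

Both formats describe the record with labels (keys or tags) rather than fixed column positions, which is what makes them semi-structured.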

Characteristics of Semi-Structured Data #

No fixed schema (flexible structure)
Uses tags or key-value pairs
Records in the same dataset can contain different fields
Easier to process than unstructured data
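
The characteristics above can be sketched with a small set of records whose fields differ. The records and variable names here are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical records: each one is valid JSON, but the fields differ
raw = '''[
  {"name": "Alex", "age": 25},
  {"name": "Sam", "city": "Boston"},
  {"name": "Kim", "age": 31, "tags": ["admin", "beta"]}
]'''

records = json.loads(raw)

# Collect every key that appears in at least one record
all_keys = sorted({key for record in records for key in record})
print(all_keys)  # ['age', 'city', 'name', 'tags']

# Keys still identify each value; .get() returns None when a key is absent
ages = [record.get("age") for record in records]
print(ages)  # [25, None, 31]
```

No schema had to be declared up front, yet every value can still be located by its key. That is the practical difference from unstructured text.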

Common Tools #

Python (JSON, XML libraries)
MongoDB (NoSQL database)
APIs (REST APIs commonly return JSON)
Spark / Hadoop

Basic Python Example #

Example: Working with JSON Data #

import json

# A JSON document held in a Python string
data = '{"name": "Alex", "age": 25, "city": "New York"}'

# Parse the JSON string into a Python dictionary
parsed_data = json.loads(data)

# Look up a value by its key
print(parsed_data["name"])  # Alex

Example: API Data (JSON)

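import requests

# Send a GET request; the timeout prevents the call from waiting indefinitely
response = requests.get("https://api.agify.io?name=alex", timeout=10)
response.raise_for_status()  # raise an exception on an HTTP error status

# Decode the JSON response body into Python objects
data = response.json()

print(data)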

Best Practices #

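Validate the data structure (expected keys and value types) before using it
Convert to a structured (tabular) format when analysis requires it
Handle missing or inconsistent fields explicitly
Use a dedicated parser (such as Python's json module) instead of manual string handling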
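
These practices can be combined in one small pipeline. The following is a sketch under stated assumptions, not a definitive implementation: `REQUIRED_KEYS`, `COLUMNS`, and `to_row` are hypothetical names, and which fields count as required is a decision that depends on your data.

```python
import csv
import io

REQUIRED_KEYS = {"name"}            # assumed: fields every record must have
COLUMNS = ["name", "age", "city"]   # assumed: columns of the target table

def to_row(record):
    """Validate one record and return a fixed-width row, or raise ValueError."""
    if not isinstance(record, dict):
        raise ValueError("record is not a JSON object")
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    # Optional fields that are absent become None
    return [record.get(column) for column in COLUMNS]

records = [
    {"name": "Alex", "age": 25, "city": "New York"},
    {"name": "Sam"},   # optional fields missing
    {"age": 40},       # required field missing
]

rows, rejected = [], []
for record in records:
    try:
        rows.append(to_row(record))
    except ValueError as error:
        rejected.append((record, str(error)))

# Write the valid rows as CSV text (None is written as an empty cell)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
print(buffer.getvalue())
print(f"rejected {len(rejected)} record(s)")
```

Keeping the rejected records alongside the reason they failed, rather than silently dropping them, makes inconsistent input easier to diagnose later.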
