
Semi-Structured Data

What is Semi-Structured Data? #

Semi-structured data does not fit a rigid table of rows and columns, but it still carries organizational markers, such as tags or keys, that label each value.

  • More flexible than structured data, because records are not bound to a fixed set of columns
  • Represents nested or hierarchical data more naturally than flat tables
  • Widely used in modern applications (APIs, web data)
  • Bridges the gap between structured and unstructured data

Examples of Semi-Structured Data #

  • JSON (JavaScript Object Notation)
  • XML files
  • HTML pages from websites
  • Documents stored in NoSQL databases (for example, MongoDB)

Characteristics of Semi-Structured Data #

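  • No fixed schema (flexible structure)
  • Uses tags or key-value pairs to label values
  • Records in the same dataset can have different fields
  • Easier to process than unstructured data, because the tags and keys mark where each value is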
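
The characteristics above can be sketched with a small, made-up dataset. The records and field names below are illustrative assumptions, not taken from a real source. Two records in the same list carry different keys, yet each one still describes its own structure.

```python
import json

# Hypothetical records: same collection, different fields (no fixed schema).
raw = """
[
  {"name": "Alex", "age": 25, "city": "New York"},
  {"name": "Sam", "email": "sam@example.com", "tags": ["admin", "beta"]}
]
"""

records = json.loads(raw)

# Each record describes its own structure through its keys.
for record in records:
    print(sorted(record.keys()))

# Collect every key seen across the records to see how the structure varies.
all_keys = sorted({key for record in records for key in record})
print(all_keys)
```

Only "name" appears in both records, so code that reads this data cannot assume every field is present.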

Common Tools #

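  • Python (the built-in json and xml.etree.ElementTree modules)
  • MongoDB (a document-oriented NoSQL database)
  • REST APIs (which commonly return JSON)
  • Spark / Hadoop (for processing large volumes of data)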
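
As a minimal sketch of the Python XML tooling listed above, the snippet below parses a small XML string with the standard library's xml.etree.ElementTree module. The document, element names, and attribute names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are illustrative.
xml_text = """
<people>
  <person id="1"><name>Alex</name><city>New York</city></person>
  <person id="2"><name>Sam</name></person>
</people>
""".strip()

root = ET.fromstring(xml_text)

people = []
for person in root.findall("person"):
    people.append({
        "id": person.get("id"),            # attribute value, read as a string
        "name": person.findtext("name"),   # text of the <name> child
        # findtext returns the default when the element is absent.
        "city": person.findtext("city", default=None),
    })

print(people)
```

The second person has no city element, so its "city" value comes back as None instead of raising an error.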

Basic Python Example #

Example: Working with JSON Data #

import json

data = '{"name": "Alex", "age": 25, "city": "New York"}'

# Parse the JSON string into a Python dictionary
parsed_data = json.loads(data)

# Look up a value by its key
print(parsed_data["name"])  # Alex
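
Real JSON is often nested. The sketch below extends the example with a hypothetical record (the "address" and "skills" fields are assumptions added for illustration) to show how nested objects and arrays map to Python dictionaries and lists, and how json.dumps turns the result back into text.

```python
import json

# Hypothetical nested record; field names are illustrative.
data = (
    '{"name": "Alex", '
    '"address": {"city": "New York", "zip": "10001"}, '
    '"skills": ["python", "sql"]}'
)

parsed = json.loads(data)

# Nested objects become dictionaries; arrays become lists.
print(parsed["address"]["city"])
print(parsed["skills"][0])

# Serialize the dictionary back to a formatted JSON string.
text = json.dumps(parsed, indent=2)
print(text)
```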

Example: API Data (JSON)

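import requests

# Query the agify.io API; the timeout keeps the call from hanging indefinitely
response = requests.get("https://api.agify.io", params={"name": "alex"}, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

# Parse the JSON response body into a Python dictionary
data = response.json()

print(data)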

Best Practices #

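  • Validate the data structure (required keys and value types) before using it
  • Convert to a structured format, such as a table, when analysis requires it
  • Handle missing or inconsistent fields explicitly
  • Use a dedicated parser (for example, the json module) rather than manual string handling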
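
The practices above can be combined in a short sketch. The required key, the column list, and the to_row helper are assumptions made for this example. It validates each record, fills missing optional fields with None, and writes the result as CSV rows.

```python
import csv
import io
import json

REQUIRED_KEYS = {"name"}            # assumption: only "name" is mandatory
COLUMNS = ["name", "age", "city"]   # assumption: columns of the target table


def to_row(record):
    """Validate one record and flatten it to a fixed set of columns."""
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"missing required keys: {sorted(missing)}")
    # .get() supplies None for optional fields that are absent.
    return {column: record.get(column) for column in COLUMNS}


raw = '[{"name": "Alex", "age": 25, "city": "New York"}, {"name": "Sam"}]'
rows = [to_row(record) for record in json.loads(raw)]

# Write the structured rows as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Raising an error for a missing required key, while defaulting optional ones, is one possible policy. Depending on the data, skipping or logging bad records may be more appropriate.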
