# Persian Text Processing Setup

This document explains how to set up the Persian text processing pipeline.

## Prerequisites

1. Python 3.8 or higher
2. Node.js 14 or higher
3. MongoDB running locally or accessible via URI

## Installation

1. Install Python dependencies:

```bash
cd src/AI-PipeLine
pip install -r requirements.txt
```

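2. Download the Hazm resources so that the chunker model is available at `resources/chunker.model`, then confirm that Hazm can load it: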

```bash
python -c "import hazm; hazm.Normalizer(); hazm.Chunker(model='resources/chunker.model')"
```

## How It Works

1. When files are uploaded to the system:
   - Each file is analyzed for Persian language content
   - If Persian content is detected, it's automatically processed using Hazm
   - Processing results are stored in MongoDB

2. Persian processing includes:
   - Text normalization
   - Sentence extraction
   - Phrase chunking
   - Bidirectional text handling

3. Results can be accessed through:
   - File processing history (the `processingHistory` array in RawFile documents)
   - Batch processing results (stored in the `processed_persian` collection)
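
The processing steps listed above can be sketched roughly as follows. This is an illustrative outline, not the pipeline's actual code: `PersianResult`, `process_persian`, `to_display_text`, and `build_hazm_steps` are hypothetical names, the POS tagger model path is an assumption (Hazm's chunker expects POS-tagged input, but this document only mentions the chunker model), and wrapping text in Unicode directional isolates is just one simple way to handle bidirectional text.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


@dataclass
class PersianResult:
    """Hypothetical container mirroring the four processing outputs."""
    normalized_text: str
    sentences: List[str]
    phrases: List[str]
    display_text: str


def to_display_text(text: str) -> str:
    """Wrap text in an RTL isolate so it keeps its direction inside LTR context."""
    return f"{RLI}{text}{PDI}"


def process_persian(
    text: str,
    normalize: Callable[[str], str],
    split_sentences: Callable[[str], List[str]],
    chunk_sentence: Callable[[str], str],
) -> PersianResult:
    """Run normalization, sentence extraction, chunking, and bidi wrapping."""
    normalized = normalize(text)
    sentences = split_sentences(normalized)
    phrases = [chunk_sentence(sentence) for sentence in sentences]
    return PersianResult(normalized, sentences, phrases, to_display_text(normalized))


def build_hazm_steps(
    tagger_model: str, chunker_model: str
) -> Tuple[Callable[[str], str], Callable[[str], List[str]], Callable[[str], str]]:
    """Build the three step functions from Hazm (imported lazily).

    Both model paths are supplied by the caller; the tagger path is an
    assumption, since only the chunker model is named in this document.
    """
    import hazm  # third-party; installed via requirements.txt

    normalizer = hazm.Normalizer()
    tagger = hazm.POSTagger(model=tagger_model)
    chunker = hazm.Chunker(model=chunker_model)

    def chunk_sentence(sentence: str) -> str:
        tagged = tagger.tag(hazm.word_tokenize(sentence))
        return hazm.tree2brackets(chunker.parse(tagged))

    return normalizer.normalize, hazm.sent_tokenize, chunk_sentence
```

With Hazm and its models in place, a call might look like `process_persian(text, *build_hazm_steps("resources/postagger.model", "resources/chunker.model"))`. Passing the steps in as functions keeps the orchestration testable without loading any model.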

## File Processing Flow

1. **File Upload**
   - Files are uploaded to the `/api/files/upload` endpoint
   - The system analyzes the content and detects its languages

2. **Automatic Processing**
   - If Persian content is detected, processing starts automatically
   - No manual intervention required

3. **Results**
   - Processing results are included in the upload response
   - Results include:
     - Normalized text
     - Extracted sentences
     - Identified phrases
     - Display-ready text (with proper RTL handling)
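
This document does not describe how the language detection step in the flow above works. As a rough illustration only, one common heuristic is to measure how much of the text is written in Arabic script. The function names and the `0.3` threshold below are assumptions, and the heuristic cannot tell Persian apart from Arabic or Urdu on its own.

```python
def persian_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Arabic block (U+0600-U+06FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic_script = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic_script / len(letters)


def looks_persian(text: str, threshold: float = 0.3) -> bool:
    """Treat text as Persian candidate content when enough letters are Arabic-script.

    The threshold is an arbitrary illustrative value, not one taken from the pipeline.
    """
    return persian_ratio(text) >= threshold
```

A file whose text passes a check like `looks_persian` would then be handed to the Persian processing stage; mixed-language files still qualify as long as the Arabic-script share reaches the threshold.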

## Troubleshooting

1. If Persian processing fails:
   - Check that the Python dependencies are installed correctly
   - Verify that the Hazm resources are downloaded
   - Check the MongoDB connection
   - Review the error logs in the processing history

2. Common issues:
   - Missing Hazm chunker model
   - Incorrect file encoding
   - MongoDB connection issues
   - Python environment issues
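
Several of the checks above can be automated with a small script. The sketch below uses only the standard library; `check_environment` and `is_utf8` are hypothetical helpers, and UTF-8 as the expected file encoding is an assumption, since this document does not state which encoding the pipeline requires.

```python
import importlib.util
import sys
from pathlib import Path
from typing import List


def check_environment(chunker_model: str = "resources/chunker.model") -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    if importlib.util.find_spec("hazm") is None:
        problems.append("hazm is not installed in this Python environment")
    if not Path(chunker_model).is_file():
        problems.append(f"chunker model not found at {chunker_model}")
    return problems


def is_utf8(path: str) -> bool:
    """Report whether the file's bytes decode cleanly as UTF-8 (assumed encoding)."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for problem in check_environment():
        print("PROBLEM:", problem)
```

Run it from the directory that contains `resources/` so the relative model path resolves the same way it does for the pipeline.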

## Monitoring

- Check file processing status in MongoDB:

  ```javascript
  db.rawfiles.find({
    "metadata.languages": "fa",
    "processingHistory.stage": "persian_nlp"
  })
  ```

- View processed results:

  ```javascript
  db.processed_persian.find()
  ```
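
The same status check can be run from Python, for example inside the pipeline itself. This is a minimal sketch assuming `pymongo` is installed; the `mongo_uri` and `db_name` parameters and both function names are hypothetical, while the collection name and filter fields are taken from the shell query above.

```python
from typing import Any, Dict, List


def persian_status_filter(stage: str = "persian_nlp") -> Dict[str, Any]:
    """Build the filter used in the shell query: Persian files with a given stage."""
    return {"metadata.languages": "fa", "processingHistory.stage": stage}


def find_persian_files(mongo_uri: str, db_name: str) -> List[Dict[str, Any]]:
    """Fetch matching RawFile documents; the URI and database name come from the caller."""
    from pymongo import MongoClient  # third-party; imported lazily

    client = MongoClient(mongo_uri)
    try:
        return list(client[db_name]["rawfiles"].find(persian_status_filter()))
    finally:
        client.close()
```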

## Support

For any issues or questions:

1. Check the error logs in the processing history
2. Verify all prerequisites are met
3. Ensure all dependencies are installed correctly
