Frontend Developer Documentation

Introduction

The Web Scraper Application is a React-based web interface that allows users to extract data from websites using various scraping methods, view and download the results, and manage their scraping history. The frontend communicates with a Flask-based backend API to perform scraping operations, manage user authentication, and store scraping history.

🛠️ Project Setup

Prerequisites

Before you begin, ensure you have the following installed:

  • Node.js (v16.x or higher)
  • npm (v8.x or higher) or yarn (v1.22.x or higher)

Installation

  1. Clone the repository:
git clone https://github.com/AliRasikh/data-scraping-application.git
cd data-scraping-application/frontend
  2. Install dependencies:
npm install
# or
yarn install
  3. Create a .env file in the root of the frontend directory with the following content:
VITE_API_BASE_URL=http://localhost:5000
🔹 Replace the URL with your backend server URL if it's different.
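
A minimal sketch of reading this variable defensively. The `resolveBaseUrl` helper and its fallback behavior are assumptions for illustration, not part of the project; in the app the argument would be `import.meta.env`.

```typescript
// Hypothetical helper: resolve the backend base URL from an env-like object.
type EnvLike = Record<string, string | undefined>;

// Same value as the sample .env above.
const DEFAULT_BASE_URL = "http://localhost:5000";

function resolveBaseUrl(env: EnvLike): string {
  const raw = env.VITE_API_BASE_URL?.trim();
  // Fall back to the local backend when the variable is missing or blank.
  const base = raw && raw.length > 0 ? raw : DEFAULT_BASE_URL;
  // Drop trailing slashes so `${base}/scrape` never contains "//".
  return base.replace(/\/+$/, "");
}
```

Normalizing the trailing slash once keeps every later URL concatenation predictable.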

### 🚀 Running the Development Server

To start the development server:

```bash
npm run dev
# or
yarn dev
```

This will start the Vite development server, typically on http://localhost:5173. The application will automatically reload if you make changes to the source files.

📦 Building for Production

To create a production build:

```bash
npm run build
# or
yarn build
```

This will generate optimized files in the dist directory. You can preview the production build locally with:

npm run preview
# or
yarn preview

Project Structure

The frontend codebase is organized as follows:


    frontend/
    ├── public/              # Static assets that don't need processing
    ├── src/                 # Source code
    │   ├── api/             # API communication
    │   ├── components/      # Reusable UI components
    │   ├── const/           # Constants, types, and utilities
    │   ├── pages/           # Page components
    │   ├── routes/          # Routing configuration
    │   ├── App.tsx          # Main application component
    │   └── main.tsx         # Application entry point
    ├── .env                 # Environment variables
    ├── index.html           # HTML template
    ├── package.json         # Project dependencies and scripts
    ├── tsconfig.json        # TypeScript configuration
    ├── vite.config.ts       # Vite configuration
    └── tailwind.config.js   # Tailwind CSS configuration


Technology Stack

The frontend is built with the following technologies:

  • React 19: UI library for building component-based interfaces
  • TypeScript: For type safety and enhanced developer experience
  • Vite: Fast, modern frontend build tool
  • React Router v7: For client-side routing
  • Axios: For HTTP requests to the backend API
  • Tailwind CSS: For styling and responsive design
  • Prettier: Code formatting tool for maintaining consistent code style across the project

Authentication at a glance (see Authentication Flow for details):

| Step | Description |
|------|-------------|
| 1️⃣ Login Process | User submits credentials to the /login API. |
| 2️⃣ Token Storage | JWT token is stored in localStorage. |
| 3️⃣ Authenticated Requests | All protected API requests include Authorization: Bearer <token>. |
| 4️⃣ Logout Process | Token is removed from storage. |

Application Architecture

Routing

The application uses React Router v7 for handling client-side routing. The routes are defined in src/routes/AppRoutes.tsx:

```typescript
// src/routes/AppRoutes.tsx
import { Routes, Route } from "react-router-dom";
import ScrapePage from "../pages/ScrapePage";
import Login from "../pages/Login";
import SignUpPage from "../pages/SignUpPage";
import HistoryPage from "../pages/HistoryPage";

const AppRoutes = () => {
  return (
    <Routes>
      <Route path="/" element={<ScrapePage />} />
      <Route path="/login" element={<Login />} />
      <Route path="/signup" element={<SignUpPage />} />
      <Route path="/history" element={<HistoryPage />} />
    </Routes>
  );
};

export default AppRoutes;
```

State Management

Authentication state is stored in localStorage so that it persists across page refreshes and browser sessions:

  • isAuthenticated: Boolean flag indicating if a user is logged in
  • authToken: JWT token used for authenticated API requests
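
The two keys above could be wrapped in small helpers so that every component reads and writes them the same way. This is a sketch under assumptions: `KeyValueStore`, `saveSession`, `clearSession`, and `isLoggedIn` are hypothetical names, not functions from the project.

```typescript
// Minimal storage contract so the helpers also work outside the browser (e.g. in tests).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical helpers wrapping the two localStorage keys described above.
function saveSession(store: KeyValueStore, token: string): void {
  store.setItem("authToken", token);
  store.setItem("isAuthenticated", "true");
}

function clearSession(store: KeyValueStore): void {
  store.removeItem("authToken");
  store.removeItem("isAuthenticated");
}

function isLoggedIn(store: KeyValueStore): boolean {
  // localStorage only holds strings, so the flag is compared against "true".
  return store.getItem("isAuthenticated") === "true" && store.getItem("authToken") !== null;
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore`, so a call such as `isLoggedIn(localStorage)` would work unchanged.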

Example of state management in components:

```typescript
// Local component state with useState
const [urlInput, setUrlInput] = useState<string | undefined>(undefined);
const [isLoading, setIsLoading] = useState<boolean>(false);
const [error, setError] = useState<string | undefined>(undefined);

// Side effects with useEffect
useEffect(() => {
  // Check authentication on component mount
  const isAuthenticated = localStorage.getItem("isAuthenticated") === "true";
  if (!isAuthenticated) {
    navigate("/login");
    return;
  }

  // Fetch data from API
  fetchData();

  // Cleanup function (runs on component unmount)
  return () => {
    // Cleanup operations if needed
  };
}, [navigate]); // Dependencies array
```

API Communication

The application communicates with the backend API using Axios. The API integration is configured in `src/api/axios.ts` and `src/api/globalvariables.ts`.

#### Base URL Configuration

```typescript
// src/api/globalvariables.ts
export const BASE_URL = import.meta.env.VITE_API_BASE_URL;
```

#### API Requests

```typescript
// src/api/axios.ts
import axios from "axios";

export const sendAxiosRequest = async (url: string, data: object) => {
  try {
    const response = await axios.post(url, data, {
      headers: {
        "Content-Type": "application/json",
      },
    });
    return response.data;
  } catch (error) {
    console.error("Error:", error);
    throw error;
  }
};

// More utility functions for file handling...
```

## Components Overview

### Pages

#### ScrapePage.tsx

The main page where users can input a URL, select a scraping method, and execute scraping operations.

**Key Features:**

- URL input with validation
- Scraping method selection (Requests, Beautiful Soup, Selenium)
- Optional data cleaning (Beautiful Soup) and company name input (Selenium)
- Loading state during scraping
- Preview and download functionality for scraped content
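
The URL validation mentioned above might look like the following. This is a sketch only: `isValidHttpUrl` is a hypothetical name, and restricting input to `http` and `https` is an assumption about what the page accepts.

```typescript
// Hypothetical validator: accept only absolute http(s) URLs.
function isValidHttpUrl(input: string | undefined): boolean {
  if (!input || input.trim() === "") return false;
  try {
    const parsed = new URL(input.trim());
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    // The URL constructor throws on input it cannot parse as an absolute URL.
    return false;
  }
}
```

Relying on the built-in `URL` parser avoids maintaining a hand-written regular expression; note that a bare host such as `example.com` is rejected because it has no scheme.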

**Key State Variables:**

```typescript
const [urlInput, setUrlInput] = useState<string | undefined>(undefined);
const [selectedOption, setSelectedOption] = useState<RadioOption>("requests");
const [cleanData, setCleanData] = useState<boolean>(false);
const [companyName, setCompanyName] = useState<string>("");
const [scrapedPage, setScrapedPage] = useState<string | null>(null);
const [isLoading, setIsLoading] = useState<boolean>(false);
```

HistoryPage.tsx

Displays the user's scraping history with color-coded visualization of different scraping methods.

Key Features:

  • Authentication check to redirect unauthenticated users
  • Fetching and displaying scraping history from the backend
  • Color-coded labels for different scraping methods
  • Preview of scraped content
  • Download functionality for previously scraped content
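
The color coding could be driven by a lookup table. In this minimal sketch the `bs4` identifier and the specific Tailwind classes are assumptions; this document only shows `requests` and `selenium` as method values.

```typescript
// Assumption: the three method identifiers; "bs4" is a guess for Beautiful Soup.
type ScrapingMethod = "requests" | "bs4" | "selenium";

// Hypothetical Tailwind classes, one color per method.
const METHOD_BADGE_CLASSES: Record<ScrapingMethod, string> = {
  requests: "bg-blue-100 text-blue-800",
  bs4: "bg-green-100 text-green-800",
  selenium: "bg-purple-100 text-purple-800",
};

const FALLBACK_BADGE_CLASSES = "bg-gray-100 text-gray-800";

function badgeClassesFor(method: string): string {
  // History entries come from the backend, so unknown values get a neutral style.
  return Object.prototype.hasOwnProperty.call(METHOD_BADGE_CLASSES, method)
    ? METHOD_BADGE_CLASSES[method as ScrapingMethod]
    : FALLBACK_BADGE_CLASSES;
}
```

Typing the table as `Record<ScrapingMethod, string>` makes the compiler flag a missing color whenever a new method is added to the union.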

Implementation:

// Authentication check
useEffect(() => {
  const isAuthenticated = localStorage.getItem("isAuthenticated") === "true";
  if (!isAuthenticated) {
    navigate("/login");
    return;
  }
  fetchHistory();
}, [navigate]);

// Fetching history
const fetchHistory = async () => {
  try {
    // API call to fetch history
    // Processing and displaying results
  } catch (err) {
    // Error handling
  }
};

Login.tsx

Handles user authentication with email and password.

Key Features:

  • Email and password validation
  • Error handling for authentication failures
  • JWT token storage in localStorage
  • Redirection after successful login

Implementation:

const signInWithEmail = async () => {
  // Validation
  // API call to backend authentication endpoint
  // Store token and authentication status
  // Redirect to main page
};

#### SignUpPage.tsx

Manages new user registration with form validation.

**Key Features:**

- Form validation for username, email, and password
- Password confirmation
- Error handling for registration failures
- Success feedback and redirection
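
A sketch of what the form validation could look like. The rules here are assumptions (in particular the minimum password length of 8 and the loose email pattern), and `validateSignUp` is a hypothetical name rather than the page's actual function.

```typescript
interface SignUpForm {
  username: string;
  email: string;
  password: string;
  confirmPassword: string;
}

// Hypothetical rules; returns one message per failed check, or an empty array.
function validateSignUp(form: SignUpForm): string[] {
  const errors: string[] = [];
  if (form.username.trim() === "") errors.push("Username is required.");
  // Deliberately loose check: something@something.tld without spaces.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(form.email)) errors.push("Enter a valid email address.");
  if (form.password.length < 8) errors.push("Password must be at least 8 characters.");
  if (form.password !== form.confirmPassword) errors.push("Passwords do not match.");
  return errors;
}
```

Returning all messages at once lets the page show every problem in a single pass instead of one error per submit.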

### Reusable Components

#### Navbar.tsx

Navigation component that appears on all pages, with conditional rendering based on authentication status.

**Implementation:**

```typescript
// Check authentication status
useEffect(() => {
  const checkAuth = () => {
    const authStatus = localStorage.getItem("isAuthenticated");
    setIsAuthenticated(authStatus === "true");
  };
  // Add event listener for auth changes
  // ...
}, []);

// Render different links based on auth status
return (
  <nav>
    {!isAuthenticated ? (
      <>{/* Links for non-authenticated users */}</>
    ) : (
      <>{/* Links for authenticated users */}</>
    )}
  </nav>
);
```

#### LogoutButton.tsx

Handles user logout by clearing authentication state.

**Implementation:**

```typescript
const handleLogout = () => {
  localStorage.removeItem("authToken");
  localStorage.removeItem("isAuthenticated");
  delete axios.defaults.headers.common["Authorization"];
  navigate("/login");
};
```

RadioButtonsExample.tsx

Provides a customizable radio button group for selecting scraping methods, with additional input fields based on selection.

Props:

type RadioButtonsProps = {
  setter: Dispatch<SetStateAction<RadioOption>>;
  getter: string;
  cleanData: boolean;
  setCleanData: Dispatch<SetStateAction<boolean>>;
  companyName: string;
  setCompanyName: Dispatch<SetStateAction<string>>;
};

Authentication Flow

The application uses JWT-based authentication:

  1. Login Process:
       • User submits email and password to the /login endpoint
       • Backend validates credentials and returns a JWT token
       • Frontend stores the token in localStorage and sets the isAuthenticated flag
       • Axios is configured to include the token in subsequent requests

  2. Authentication Check:
       • Protected pages (like History) verify authentication status on load
       • If not authenticated, redirect to the login page

  3. Logout Process:
       • Remove the token and authentication flag from localStorage
       • Clear authorization headers from Axios
       • Redirect to the login page

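The "configure Axios" and "clear authorization headers" steps can be kept symmetric by routing both through one function. This is a sketch under assumptions: `applyAuthToken` is a hypothetical helper, and `HeaderDefaults` is a simplified stand-in for the shape of `axios.defaults`.

```typescript
// Simplified stand-in for the part of axios.defaults this sketch touches.
interface HeaderDefaults {
  headers: { common: Record<string, string> };
}

// Hypothetical helper: apply or clear the Authorization default in one place.
function applyAuthToken(defaults: HeaderDefaults, token: string | null): void {
  if (token) {
    defaults.headers.common["Authorization"] = `Bearer ${token}`;
  } else {
    // Mirrors the logout handler, which deletes the header.
    delete defaults.headers.common["Authorization"];
  }
}
```

Login would call it with the fresh token and logout with `null`, so the header format lives in a single place.
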
Example Authentication Check:

useEffect(() => {
  const isAuthenticated = localStorage.getItem("isAuthenticated") === "true";
  if (!isAuthenticated) {
    navigate("/login");
  }
}, [navigate]);

**Example API Call with Authentication:**

```typescript
const fetchData = async () => {
  const token = localStorage.getItem("authToken");
  if (!token) throw new Error("No authentication token found");

  const response = await axios({
    method: "get",
    url: `${BASE_URL}/endpoint`,
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
  });

  // Process response...
};
```

## Scraping Functionality

The application offers three scraping methods, each with different capabilities:

1. **Requests**:

   - Simple HTTP requests to retrieve static content
   - Fastest method but limited to static websites

2. **Beautiful Soup (BS4)**:

   - HTML parsing and content extraction
   - Option to clean and format the data
   - Better for more complex static websites

3. **Selenium**:
   - Browser automation for dynamic websites
   - Can interact with JavaScript-driven content
   - Requires a company name for search functionality
   - Slowest but most powerful method

**Scraping Process:**

1. User enters a URL and selects a scraping method
2. Additional options are configured based on the selected method
3. The scraping request is sent to the backend
4. Results are displayed with options to preview and download

**Implementation:**

```typescript
const handleScrape = async () => {
  // Validation
  try {
    setIsLoading(true);

    // Read the JWT stored at login; it is sent as a Bearer token below
    const token = localStorage.getItem("authToken");

    // Prepare request payload based on selected method
    const payload = {
      url: urlInput,
      scraping_method: selectedOption,
      clean_data: cleanData,
      company_name: selectedOption === "selenium" ? companyName : undefined,
    };

    // Send request to backend
    const response = await axios.post(`${BASE_URL}/scrape`, payload, {
      headers: { Authorization: `Bearer ${token}` },
    });

    // Process successful response
    setScrapedPage(response.data.scrape_result);
  } catch (err) {
    // Error handling
  } finally {
    setIsLoading(false);
  }
};
```

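The payload construction above could be factored into a pure function that also covers the "Validation" step. A minimal sketch: `buildScrapePayload` is a hypothetical helper, the `bs4` identifier is an assumption, and the field names follow the payload shown above.

```typescript
type ScrapingMethod = "requests" | "bs4" | "selenium"; // "bs4" is an assumed identifier

interface ScrapePayload {
  url: string;
  scraping_method: ScrapingMethod;
  clean_data: boolean;
  company_name?: string;
}

// Hypothetical builder that validates inputs before anything is sent to the backend.
function buildScrapePayload(
  url: string | undefined,
  method: ScrapingMethod,
  cleanData: boolean,
  companyName: string,
): ScrapePayload {
  if (!url || url.trim() === "") throw new Error("A URL is required.");
  const payload: ScrapePayload = { url: url.trim(), scraping_method: method, clean_data: cleanData };
  if (method === "selenium") {
    // Selenium scraping needs a company name for its search step.
    if (companyName.trim() === "") throw new Error("Selenium requires a company name.");
    payload.company_name = companyName.trim();
  }
  return payload;
}
```

Keeping this logic free of React state makes it easy to unit test without rendering the page.
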
Code Conventions

The project follows these coding conventions:

  1. TypeScript Usage:
       • Define interfaces for all props and state
       • Use type assertions only when necessary
       • Leverage TypeScript's type checking to prevent runtime errors

  2. Component Organization:
       • Pages are stored in src/pages/
       • Reusable components are in src/components/
       • API integration is in src/api/
       • Types and constants are in src/const/

  3. Naming Conventions:
       • PascalCase for component names and types
       • camelCase for variables, functions, and props
       • File names match the component name (e.g., ScrapePage.tsx)

  4. Error Handling:
       • Use try/catch blocks for API calls
       • Set error state for UI feedback
       • Log detailed errors to the console for debugging

  5. Styling:
       • Use Tailwind CSS utility classes for styling
       • Consistent color scheme with blue as the primary color
       • Responsive design using Tailwind's breakpoint utilities

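For the error-handling convention, a small helper can turn whatever a `catch` block receives into a string suitable for error state. This is a sketch; `toErrorMessage` and its default text are hypothetical.

```typescript
// Hypothetical helper: normalize a caught value into a user-facing message.
function toErrorMessage(err: unknown, fallback = "Something went wrong. Please try again."): string {
  if (err instanceof Error && err.message) return err.message;
  if (typeof err === "string" && err.trim() !== "") return err;
  // Anything else (numbers, null, plain objects) gets the generic message.
  return fallback;
}
```

In a component it might be used as `catch (err) { console.error(err); setError(toErrorMessage(err)); }`, which covers both the logging and the UI-feedback conventions.
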
Troubleshooting

Common Issues and Solutions

  1. Backend Connection Issues:
       • Check that the backend server is running
       • Verify the VITE_API_BASE_URL in your .env file is correct
       • Check the browser console for CORS errors

  2. Authentication Problems:
       • Check that the token is being stored correctly in localStorage
       • Verify the token format in API requests (should be Bearer <token>)
       • Check token expiration (backend validates token lifetime)

  3. Build Issues:
       • Run npm clean-install (npm ci) or yarn install --force to reset dependencies
       • Check TypeScript errors in the console
       • Verify that all required environment variables are set

  4. Styling Issues:
       • Make sure Tailwind CSS is properly configured
       • Check for conflicting class names
       • Use browser developer tools to inspect element styles

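When debugging token expiration, it can help to inspect the `exp` claim on the client. The sketch below is a debugging aid only: it does not verify the signature, and it assumes the backend issues standard JWTs with a numeric `exp` claim, which this document does not confirm. Both function names are hypothetical.

```typescript
// Debugging aid only: reads the payload without verifying the signature.
function decodeJwtPayload(token: string): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    // JWT segments are base64url; convert to standard base64 and restore padding.
    let padded = parts[1].replace(/-/g, "+").replace(/_/g, "/");
    while (padded.length % 4 !== 0) padded += "=";
    return JSON.parse(atob(padded));
  } catch {
    return null;
  }
}

function isTokenExpired(token: string, nowMs: number = Date.now()): boolean {
  const payload = decodeJwtPayload(token);
  // Treat a missing or malformed "exp" as expired; "exp" is in seconds since the epoch.
  if (!payload || typeof payload.exp !== "number") return true;
  return payload.exp * 1000 <= nowMs;
}
```

The backend remains the authority on token lifetime; a client-side check like this only explains why a request was rejected.
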
Development Tips

  1. Adding New Routes:
       • Create a new page component in src/pages/
       • Add the route in src/routes/AppRoutes.tsx
       • Add navigation links in Navbar.tsx if needed

  2. Adding New API Endpoints:
       • Update or add functions in src/api/axios.ts
       • Use the existing pattern for API calls
       • Handle authentication headers for protected endpoints

  3. Component Development:
       • Start with defining the props interface
       • Implement the component with appropriate state
       • Add error handling for any asynchronous operations
       • Test the component in isolation before integration
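
As an illustration of the "Adding New API Endpoints" tip, the sketch below builds protected endpoint functions from one factory so that each new endpoint inherits the Bearer header. Everything here is hypothetical: `PostFn` is a simplified stand-in for the Axios call used elsewhere in this document, and `createProtectedPost` is not part of the project.

```typescript
// Simplified stand-in for the HTTP call, so the sketch runs without a network.
type PostFn = (
  url: string,
  body: object,
  config: { headers: Record<string, string> },
) => Promise<{ data: unknown }>;

// Hypothetical factory: every request sent through it carries JSON and the bearer token.
function createProtectedPost(baseUrl: string, getToken: () => string | null, post: PostFn) {
  return async (path: string, body: object): Promise<unknown> => {
    const token = getToken();
    if (!token) throw new Error("No authentication token found");
    const response = await post(`${baseUrl}${path}`, body, {
      headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    });
    return response.data;
  };
}
```

Injecting `post` and `getToken` keeps the function testable in isolation, in line with the last tip above; in the app they would presumably wrap the Axios call and `localStorage.getItem("authToken")`.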