How to Build

1. Clone the Repository

If you don’t already have the project on your local machine, clone the repository from GitHub. Open a terminal and run:

git clone https://github.com/Scibun/Scimon.git

2. Set Up Your Environment

Ensure that Rust and Cargo are installed and added to your system’s PATH. You can check this by running:

rustc --version
cargo --version

If Rust and Cargo are properly installed, these commands should return the version numbers.
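
The same check can be scripted before a build. The following is a minimal sketch, assuming a POSIX shell; the helper names have_tool and check_rust_toolchain are illustrative and not part of Scimon or Cargo.

```shell
# Hypothetical helper: succeeds if the named program is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report any missing part of the Rust toolchain before attempting a build.
check_rust_toolchain() {
    missing=0
    for tool in rustc cargo; do
        if have_tool "$tool"; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_rust_toolchain && cargo build only starts the build when both tools are available.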

3. Navigate to Your Project Directory

Open a terminal and navigate to the root directory of the project. If you just cloned the repository, this is the newly created Scimon directory:

cd Scimon

4. Build the Project

To build the project, simply run:

cargo build

This command compiles your project and places the output binaries in the target/debug directory. You should see output similar to:

Compiling scimon v0.1.0 (/path/to/your/project)
    Finished dev [unoptimized + debuginfo] target(s) in 2.34s

5. Run the Project

To run the project after building it, use:

cargo run

This command builds and runs the project in one step.
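
Arguments meant for the program itself go after a -- separator, which Cargo uses to tell its own options apart from those of the binary it runs. A minimal sketch, assuming you are in the project root; the wrapper name run_dev is illustrative:

```shell
# Hypothetical wrapper: build (if needed) and run the debug binary,
# forwarding every argument after `--` to the program rather than to Cargo.
run_dev() {
    cargo run -- "$@"
}
```

With this wrapper, run_dev -r scimon.mon runs the debug build with the same arguments as scimon -r scimon.mon.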

6. Building for Release

If you want to build an optimized release version of your project, run:

cargo build --release

This command compiles your project with optimizations and places the output binaries in the target/release directory. The output will be similar to:

Compiling scimon v0.1.0 (/path/to/your/project)
    Finished release [optimized] target(s) in 2m 13s
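
Both profiles write the binary to a predictable location under target. The sketch below prints that location for a given profile. It assumes the binary is named scimon (matching the package name in the build output), that the default target directory is in use, and that the platform does not add a file extension; the function name built_binary is illustrative.

```shell
# Print where Cargo places the binary for a profile ("debug" or "release").
# Assumption: the binary is named "scimon", like the package in the build
# output; a custom binary name in Cargo.toml would change this path.
built_binary() {
    profile="${1:-debug}"
    case "$profile" in
        debug|release) echo "target/$profile/scimon" ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}
```

For example, "$(built_binary release)" -r scimon.mon would run the optimized build directly, without going through Cargo.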

Additional Commands

  • Clean the Project: To remove the target directory and clean the project, run:
    cargo clean
    
  • Run Tests: To run tests defined in your project, use:
    cargo test
    

Summary

  1. Clone the repository:
    git clone https://github.com/Scibun/Scimon.git
    cd Scimon
    
  2. Set up your environment:
    rustc --version
    cargo --version
    
  3. Navigate to your project directory (if not already there):
    cd Scimon
    
  4. Build the project:
    cargo build
    
  5. Run the project:
    cargo run
    
  6. For a release build:
    cargo build --release
    

By following these steps, you can successfully clone, build, and run Scimon.

Basic Usage

You can download files using a local or remote list with scimon. Here are the instructions for both methods:

Downloading Files

To download files specified in a local list, use the following command:

scimon -r scimon.mon

Useful Flags for Download List

There are several flags available to customize the download process. Here are some commonly used ones:

Download Without Skipping Any Files

Use the --no-ignore flag to download all files without skipping any:

scimon -r scimon.mon --no-ignore

Skip All Comments

Use the --no-comments flag to skip all comment lines in the list:

scimon -r scimon.mon --no-comments

Skip README File Rendering

Use the --no-readme flag to skip rendering README files during the download process:

scimon -r scimon.mon --no-readme

By using these flags, you can control how scimon handles different parts of the download list, ensuring a customized download process according to your needs.

Flags

Flags and Options

--url / -u

Description: Specifies the URL of the page to scrape.

Usage:

scimon --url http://example.com

--scrape

Description: Selects the scrape mode. When this flag is set, scimon will perform scraping as per the provided URL.

Usage:

scimon --scrape

--run / -r

Description: Executes a list of tasks or runs a specific task defined in the list. You can specify the task file or task name.

Usage:

scimon --run tasks.toml

--no-ignore

Description: Ensures that no PDF files are ignored during the download process.

Usage:

scimon --no-ignore

--no-open_link

Description: Disables the processing of the !open_link directive. This is useful if you do not want scimon to handle external links specified in the list.

Usage:

scimon --no-open_link

--no-readme

Description: Disables the rendering of README files during the download process. This can be useful to skip the processing of README instructions.

Usage:

scimon --no-readme

--options

Description: Specifies additional settings for scimon. You can provide a configuration file or specific options as a string.

Usage:

scimon --options settings.toml

By using these flags, you can customize the behavior of scimon to suit your specific needs, ensuring a more tailored and efficient download and scraping process.

Scrape

Scrape Option

The --scrape option in Scimon enables scraping functionality, allowing you to extract information or data from a specified URL.

Usage:

To initiate a scraping operation on a specific URL, you use the --url flag followed by the URL you want to scrape, along with the --scrape flag.

Example Command:

scimon --url https://scibun.com --scrape

In this command:

  • https://scibun.com should be replaced with the URL you want to perform the scraping on.

Functionality:

When you execute the command with the --url and --scrape flags, Scimon will visit the specified URL and extract relevant information based on predefined scraping rules or patterns.

Result:

The result of the scraping operation depends on the specific implementation and configuration of Scimon. It could involve extracting text, images, links, or other structured data from the webpage.

Additional Notes:

  • Scimon provides flexibility in configuring scraping rules and patterns to suit your specific requirements.
  • Ensure that you have proper permissions or authorization to scrape the content of the specified URL to comply with legal and ethical guidelines.

By utilizing the --scrape option in Scimon, you can automate the process of extracting valuable information from webpages, streamlining tasks such as data collection, content aggregation, and web monitoring.
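
The command above can be wrapped in a small guard that rejects obviously malformed input before Scimon is started. This is only a sketch: the function names is_http_url and scrape_page are illustrative, and limiting input to http and https URLs is an assumption rather than a documented Scimon rule.

```shell
# Hypothetical check: accept only http:// or https:// URLs that have
# at least one character after the scheme.
is_http_url() {
    case "$1" in
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper around the scrape command shown above.
scrape_page() {
    if ! is_http_url "$1"; then
        echo "not an http(s) URL: $1" >&2
        return 2
    fi
    scimon --url "$1" --scrape
}
```

For example, scrape_page https://scibun.com runs the same command as the example, while scrape_page ftp://example.com stops with an error before calling scimon.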

Providers

Providers Compatible with Scimon

Scimon is compatible with various providers, enabling it to access and retrieve data from different sources. These providers include:

  • Arxiv.org: A popular repository for research papers in various fields of science.
  • GitHub: A widely used platform for version control and collaboration on software projects. (Deprecated)
  • NASA: The National Aeronautics and Space Administration, providing information and resources related to space exploration and aeronautics.
  • SciELO: A digital library covering a wide range of scientific journals from Latin America, Spain, Portugal, and South Africa.
  • Sci-Hub (Experimental): An online repository of academic articles and papers, often used to access paywalled content. (Deprecated)
  • Wikipedia: A free online encyclopedia covering a vast array of topics in multiple languages.
  • Wikisource: A Wikimedia project that hosts free-content textbooks, source texts, and other material.

These providers offer diverse content and resources, ranging from research papers and academic articles to general knowledge and reference materials. Scimon leverages its compatibility with these providers to access and utilize information from various sources, enhancing its functionality and versatility for users.

Monset

What is Monset?

Monset is a language designed specifically for downloading files. It offers a streamlined syntax that makes the process of retrieving files from the internet straightforward and efficient. By focusing on simplicity, Monset ensures that users can quickly grasp its fundamentals and start downloading files with minimal effort.

The key strength of Monset lies in its user-friendly design. The syntax is intuitive, reducing the learning curve typically associated with programming languages. This makes it accessible to both beginners and experienced developers, allowing them to integrate file downloading capabilities into their projects seamlessly. Monset abstracts the complexities involved in file transfers, providing a clear and concise way to handle downloads.

Summary

Downloads Block

URL List

You can specify multiple URLs for downloading files. Each URL should be placed on a new line. Optionally, you can append !ignore to a URL to indicate that it should be skipped during the download process.

Example Usage:

downloads {
    https://example.com/file1.pdf !ignore
    https://example.com/file2.pdf
    https://example.com/file3.pdf !ignore
    https://example.com/file4.pdf
    https://example.com/file5.pdf !ignore
    https://example.com/file6.pdf
}

In this example:

  • https://example.com/file1.pdf will be skipped because it is followed by !ignore.
  • https://example.com/file2.pdf will be downloaded.
  • https://example.com/file3.pdf will be skipped because it is followed by !ignore.
  • https://example.com/file4.pdf will be downloaded.
  • https://example.com/file5.pdf will be skipped because it is followed by !ignore.
  • https://example.com/file6.pdf will be downloaded.
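
The selection rule described above can be previewed with a short filter. This sketch is illustrative only and is not Scimon's parser: it assumes the block is formatted as in the example (downloads { on one line, one URL per line, } on its own line), and the function name active_urls is hypothetical.

```shell
# Print the URLs inside a `downloads { ... }` block that are not marked
# with !ignore. Reads a list file on stdin. Pass 1 as the first argument
# to keep ignored URLs too, similar in spirit to the --no-ignore flag.
active_urls() {
    awk -v keep_ignored="${1:-0}" '
        $1 == "downloads" && $2 == "{" { inside = 1; next }  # block opens
        inside && $1 == "}"            { inside = 0; next }  # block closes
        inside && NF > 0 && (keep_ignored == 1 || $NF != "!ignore") { print $1 }
    '
}
```

For example, active_urls < scimon.mon lists only the URLs that are not followed by !ignore.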

Path Configuration

You can specify the directory where the downloaded files should be stored by setting the path variable. This ensures that all files are saved in the specified folder in your file system.

Example Usage:

path "path/to/folder"

In this example:

  • All downloaded files will be stored in the directory path/to/folder.

Ignoring Specific URLs

The !ignore macro allows you to skip specific URLs in your download list. This is useful if you have certain files that you do not want to download during a particular operation.

Example Usage:

https://example.com/file1.pdf !ignore

In this example:

  • The URL https://example.com/file1.pdf will be omitted from the download process because it is followed by the !ignore directive.

Summary

  1. Download URLs: List URLs line by line. Append !ignore to skip specific URLs.

    downloads {
        https://example.com/file1.pdf !ignore
        https://example.com/file2.pdf
    }
    
  2. Set Download Directory: Define where the files should be saved using the path variable.

    path "path/to/folder"
    
  3. Skip Specific URLs: Use !ignore to bypass certain URLs.

    https://example.com/file1.pdf !ignore
    

By following these instructions, you can efficiently manage your download list, specify storage directories, and selectively ignore certain files.
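
A list file with these elements can also be generated from a script. The sketch below writes a downloads block followed by a path line; the function name make_list is illustrative, and placing both statements in one file in this order is an assumption based on the examples above.

```shell
# Write a minimal list file to stdout: a downloads block, then a path line.
# Usage: make_list <folder> <url>...
make_list() {
    folder="$1"
    shift
    printf 'downloads {\n'
    for url in "$@"; do
        printf '    %s\n' "$url"
    done
    printf '}\n\n'
    printf 'path "%s"\n' "$folder"
}
```

For example, make_list path/to/folder https://example.com/file2.pdf > scimon.mon creates a file that can then be passed to scimon -r scimon.mon.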

Readme Block

Variable

The readme{} block in the list file allows for the direct rendering of Markdown content within the list file itself.

Fetching Content:

The URL specified in the readme variable is accessed to retrieve the Markdown content.

Converting to Text:

The retrieved content is then converted to text format. This text content is assumed to be in Markdown format.

Rendering Markdown:

The Markdown content retrieved from the URL is rendered directly within the list file. This means that you can include Markdown snippets within the list file, and Scimon will automatically render them during processing.

Example Usage:

readme "http://example.com/readme.md"

In this example:

  • The readme variable is assigned the URL "http://example.com/readme.md".
  • Scimon will fetch the content from the specified URL and process it as described above.

Block

The readme{} block holds Markdown content written directly in the list file. Scimon renders this content automatically during processing.

Example Usage:

readme {
    # My Project

    This is an example of how you can use the `readme{}` block to include Markdown content directly in the Scimon list file.

    ## Example Section

    Here's an example of Python code:

    ```python
    def hello_world():
        print("Hello, world!")
    ```

    ![Example Image](https://example.com/image.png)
}

In the above example:

  • The content inside the readme{} block is treated as Markdown.
  • It can include titles, paragraphs, code, images, and other Markdown-supported elements.
  • During the processing of the list file, Scimon will render this Markdown content within the context of the list file.

This provides a convenient way to include documentation, code examples, images, and other elements directly within the list file, keeping everything in one place and making it easy to maintain and share the content.

Commands Block

This feature is Experimental

Command Usage Documentation

Purpose

The commands block lists scripts for Scimon to execute. The example below references a Python script named index.py.

Usage

To use this command, follow the syntax:

commands {
    index.py
}

Replace index.py with the actual name of the Python script you want to execute.

Example

Suppose you have a Python script named my_script.py and you want to execute it using this command. Your configuration file would look like this:

commands {
    my_script.py
}

Script file locations by operating system:

System   Location
Linux    /home/<YOUR_USERNAME>/.config/scimon/scripts/
macOS    /Users/<YOUR_USERNAME>/Library/Application Support/scimon/scripts/
Windows  C:\Users\<YOUR_USERNAME>\AppData\Roaming\scimon\scripts\
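
The locations listed above can be resolved from a script. The following sketch maps the system name reported by uname -s to the matching directory. The function name scimon_scripts_dir is illustrative, and the Windows branch assumes a Unix-like shell such as Git Bash, MSYS, or Cygwin, where the home directory is written with forward slashes.

```shell
# Print the Scimon scripts directory for an OS name as reported by `uname -s`.
# Usage: scimon_scripts_dir <os-name> <home-dir>
scimon_scripts_dir() {
    os="$1"
    home="$2"
    case "$os" in
        Linux)  echo "$home/.config/scimon/scripts/" ;;
        Darwin) echo "$home/Library/Application Support/scimon/scripts/" ;;
        MINGW*|MSYS*|CYGWIN*) echo "$home/AppData/Roaming/scimon/scripts/" ;;
        *) echo "unsupported system: $os" >&2; return 1 ;;
    esac
}
```

For example, cp my_script.py "$(scimon_scripts_dir "$(uname -s)" "$HOME")" copies a script into the directory for the current system.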

Notes

  • Ensure that the Python script file (index.py in this case) exists in the current directory or provide the full path to the script.
  • Make sure you have Python installed on your system and it is accessible from the command line.
  • Only Python and JavaScript are supported.

Compress folder

To compress a folder, you can use the variable compress and assign the compressed file name as its value. The folder will be compressed in the root directory of the project.

compress "downloads.zip"

Open links

Open Variable

The open variable specifies a URL that Scimon will open in a web browser after processing the list file. This URL is typically used to provide additional information or resources related to the processed task or project.

Opening URLs:

After Scimon completes processing the list file, it automatically opens the URL specified in the open variable in a web browser.

Usage:

The open variable is useful for directing users to relevant websites, documentation, or resources associated with the tasks or projects being processed.

Example Usage:

open "https://example.com"

In this example:

  • The open variable is assigned the URL "https://example.com".
  • After processing the list file, Scimon will open this URL in a web browser, allowing users to access the page associated with the task or project.

By utilizing the open variable, you can seamlessly provide users with additional information and resources to enhance their understanding and engagement with the processed tasks or projects.

Markdown render

PrimeDown is a markdown rendering engine that enhances HTML content generated by the default Rust crate by injecting JavaScript plugins.

Features

PrimeDown now supports:

  • Headers (h1 to h6)
  • Links
  • Bold, Italic, Strikethrough
  • Images
  • Tables
  • Blockquotes
  • Task lists
  • Unordered and Ordered Lists
  • Inline and Block Code (with syntax highlighting)
  • MathJax formulas
  • Mermaid diagrams
  • HTML tags

Extra Features

  • DocsSources
  • Citations (References)

Learn more about how to use Extras features here.

File Formats Supported by DocsSources:

  • .pdf
  • .doc, .docx
  • .epub, .mobi
  • .rst, .yml, .yaml, .toml, .json
  • .7z, .zip, .rar, .tar, .tar.gz, .gz
  • .bin, .img

Third-party Libraries Loaded on README File Rendered:

Markdown alert flags (!note, !important, !warning…) are not supported yet.

Standard Directory where README.html Files are Generated

System   Location
Linux    /home/<YOUR_USERNAME>/.config/scimon/readme
macOS    /Users/<YOUR_USERNAME>/Library/Application Support/scimon/readme
Windows  C:\Users\<YOUR_USERNAME>\AppData\Roaming\scimon\readme

Style

Simply use the style variable followed by the valid URL of the CSS file, and this will apply the defined style to the document:

style "https://example.com/path/to/custom_style.css"

With this configuration, the CSS file at the specified URL will be used to style the generated PDF. Ensure the URL is accessible and points to a valid CSS file.

Prints

The print variable is used to display messages to the user. It is useful if you want to show a message to the user who is downloading the list.

print "Hello, World!"

Covers

To extract covers, use the following directive:

covers "path/to/covers"

How It Works

The covers directive allows you to specify a directory or path where the covers of the files should be extracted and stored. Simply assign the desired path to the directive.

Configs

Scimon.yml file

This configuration file sets up the tool. If your changes cause problems, restore the default version below.

general:
  default_text_editor: 'notepad' # String (default: 'notepad')
  urlfilter_open: false # Boolean (valid values: 'true' or 'false'; default: 'false')

ui:
  show_header: true # Boolean (valid values: 'true' or 'false'; default: 'true')

render_markdown:
  output_path: '{app_path}' # String (default: '{app_path}')
  overwrite: true # Boolean (valid values: 'true' or 'false'; default: 'true')
  minify_html: true # Boolean (valid values: 'true' or 'false'; default: 'true')

Save this file at the following location:

System   Location
Linux    /home/<YOUR_USERNAME>/.config/scimon/scimon.yml
macOS    /Users/<YOUR_USERNAME>/Library/Application Support/scimon/scimon.yml
Windows  C:\Users\<YOUR_USERNAME>\AppData\Roaming\scimon\scimon.yml
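
Restoring the default configuration can be scripted. The sketch below writes the default values shown above into a directory of your choice and leaves an existing file untouched. The function name write_default_config is illustrative; pass the directory that matches your system.

```shell
# Write the default scimon.yml into the given directory, unless a config
# file is already there. Usage: write_default_config <dir>
write_default_config() {
    dir="$1"
    file="$dir/scimon.yml"
    if [ -e "$file" ]; then
        echo "keeping existing $file" >&2
        return 0
    fi
    mkdir -p "$dir" || return 1
    cat > "$file" <<'EOF'
general:
  default_text_editor: 'notepad'
  urlfilter_open: false

ui:
  show_header: true

render_markdown:
  output_path: '{app_path}'
  overwrite: true
  minify_html: true
EOF
}
```

On Linux, for example, write_default_config "$HOME/.config/scimon" creates the file only if it is missing. To reset a broken configuration, delete or rename the existing file first.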

.env file

Open .env file

To open the .env file, run:

scimon --options open-env

.env file locations by operating system:

System   Location
Linux    /home/<YOUR_USERNAME>/.config/scimon/.env
macOS    /Users/<YOUR_USERNAME>/Library/Application Support/scimon/.env
Windows  C:\Users\<YOUR_USERNAME>\AppData\Roaming\scimon\.env

Environment variables

Name            Description
SCIMON_API_KEY  Your API key for accessing Scimon (under development)
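
A variable can also be set without opening an editor. The sketch below assumes the .env file uses the common KEY=value format, one entry per line, which this document does not confirm. The function name set_env_var is illustrative.

```shell
# Set KEY=value in a dotenv-style file, replacing an existing line for KEY
# or appending a new one. Usage: set_env_var <file> <key> <value>
set_env_var() {
    file="$1"; key="$2"; value="$3"
    touch "$file" || return 1
    tmp="$file.tmp.$$"
    # Copy every line except a previous assignment of the same key.
    # grep exits non-zero when nothing is left to copy, which is fine here.
    grep -v "^$key=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

On Linux, for example, set_env_var "$HOME/.config/scimon/.env" SCIMON_API_KEY your-key stores the key, overwriting any earlier value.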

External Resources Usage

This library accesses the following external resources:

Scibun:

This API provides additional functionality or data support for the library.

Wikipedia and Wikisource:

These domains are used to download wiki pages and documents.

GitHub:

This domain is necessary to download the configuration files.