Laigo GmbH

Model-Background

Introducing smartDocSplit, which automates the task of splitting a large file of joined pages into its individual documents, giving users a more efficient and accurate way to process documents. Our model can handle any document type and returns each extracted document together with the exact location where it begins and ends. This saves time and reduces the risk of errors that can occur with manual processing.

By utilizing our AI solution, you can experience faster and more accurate processing of documents, freeing up valuable time and resources that can be used elsewhere. With our model, we offer a valuable tool for anyone who regularly works with document processing, providing a more streamlined and effective solution for the Page Stream Segmentation problem. Try our smartDocSplit solution for this problem today and see how it can help improve your document processing efficiency and accuracy.

Limitations

Max File Size is 8 MB
Max requests per second: 1
Processing Time: Depends on the number of characters in the image
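
The limits above can be respected on the client side before a request is sent. The following is a minimal sketch, not part of the Laigo API: it assumes "8 MB" means 8 * 1024 * 1024 bytes, and the names `check_file_size` and `Throttle` are hypothetical helpers.

```python
import time

MAX_FILE_BYTES = 8 * 1024 * 1024  # assumption: "8 MB" read as 8 MiB
MIN_INTERVAL_S = 1.0              # from "Max requests per second: 1"


def check_file_size(size_bytes: int) -> None:
    """Raise ValueError if the file exceeds the documented size limit."""
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")


class Throttle:
    """Spaces calls at least MIN_INTERVAL_S apart (single-threaded use)."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, MIN_INTERVAL_S - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Call `check_file_size(len(data))` before uploading and `throttle.wait()` before every request, so that the client never sends more than one request per second.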

Licenses

All rights reserved by Laigo

API Call

Requests to the smartDocSplit API must be authenticated. Requests are authenticated by providing an “Authorization” header with an access token. Access tokens are obtained by calling the endpoint shown below; the required secrets are available in the API tab for your task.

				
https://account.laigo.io/realms/SmartTools/protocol/openid-connect/token

If you are not sure how to get the access token, please visit the “Make your first request” page for that information.
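
As a rough illustration, the token request could be built as follows. This is a sketch under assumptions: the grant type (`client_credentials`), the form field names `client_id` and `client_secret`, and the "Bearer" scheme are not stated on this page, so check them against the “Make your first request” page. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://account.laigo.io/realms/SmartTools/protocol/openid-connect/token"


def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a POST request for an access token."""
    # Assumption: client-credentials grant with form-encoded fields.
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def authorization_header(access_token: str) -> dict:
    """Header to attach to smartDocSplit API requests."""
    # Assumption: the token is sent with the "Bearer" scheme.
    return {"Authorization": f"Bearer {access_token}"}
```

The request can be sent with `urllib.request.urlopen(build_token_request(...))`. OpenID Connect token endpoints normally return a JSON object whose `access_token` field holds the token.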

Upload a file

POST https://use.laigo.io/api/api/FileUpload/v1/Upload/smartDocSplit
			
Parameter    Description
accessToken  A JWT issued to your application by the Laigo identity provider.
file         The file to upload.
email        If you want the result sent by email, set the destination address with this parameter.

Status codes:

Status code  Message
200          OK
201          Created (upload was successful)
202          Accepted
400          Bad Request
401          Unauthorized
404          Not Found

Example Response

The response is the jobid, which you can use later to fetch the result.

"754a9eff-2633-47d6-ab10-f6b18da1da4f"

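The upload call could look like the sketch below. It assumes that `file` and `email` are sent as multipart form fields and that the access token travels in a "Bearer" Authorization header; neither detail is confirmed on this page. `build_upload_request` and `parse_job_id` are hypothetical helper names.

```python
import json
import uuid
from urllib.request import Request

UPLOAD_URL = "https://use.laigo.io/api/api/FileUpload/v1/Upload/smartDocSplit"


def build_upload_request(access_token, file_name, file_bytes, email=None):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    chunks = []
    if email is not None:
        # Optional field: destination address for the result.
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="email"\r\n\r\n'
            f"{email}\r\n"
        ).encode("utf-8"))
    # Assumption: the document goes into a form field named "file".
    chunks.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    headers = {
        "Authorization": f"Bearer {access_token}",  # assumed scheme
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return Request(UPLOAD_URL, data=b"".join(chunks), headers=headers, method="POST")


def parse_job_id(response_body: bytes) -> str:
    """The example response is a JSON string containing the jobid."""
    job_id = json.loads(response_body)
    if not isinstance(job_id, str):
        raise ValueError("expected a JSON string containing the jobid")
    return job_id
```

For the example response above, `parse_job_id` returns the jobid without the surrounding quotes.
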
Get a result as JSON

If you want to receive the extracted text as JSON, you can request the following API route.

GET https://use.laigo.io/api/Result/v1/JobResults/<jobid>/json
			
Parameter    Description
accessToken  A JWT issued to your application by the Laigo identity provider.
fileName     The path to the image file to use as input.

Example Response:

				
{
    "files": [
        {
            "inputFileName": "test.pdf",
            "size": 59472,
            "pageCount": 1,
            "content": "Hello World!"
        }
    ]
}
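
A short sketch of how the result route and the example response above could be handled. It only relies on the fields shown in the example (`files`, `inputFileName`, `pageCount`, `size`); real responses may carry more. The helper names `result_url` and `summarize` are hypothetical.

```python
import json

RESULT_BASE = "https://use.laigo.io/api/Result/v1/JobResults"


def result_url(job_id: str, kind: str) -> str:
    """Build the result route for a job; kind is 'json' or 'zip'."""
    if kind not in ("json", "zip"):
        raise ValueError("kind must be 'json' or 'zip'")
    return f"{RESULT_BASE}/{job_id}/{kind}"


def summarize(result_text: str) -> list:
    """Return (file name, page count, size) for every entry in 'files'."""
    data = json.loads(result_text)
    return [
        (entry["inputFileName"], entry["pageCount"], entry["size"])
        for entry in data["files"]
    ]
```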
				
			

Get a result as ZIP
If you want to get the result as files (searchable PDF + Textfile), you can request the following API route.

GET https://use.laigo.io/api/Result/v1/JobResults/<jobid>/zip
			
Parameter    Description
accessToken  A JWT issued to your application by the Laigo identity provider.
jobID        The jobid you receive in the response after uploading a file.

Example Response (ZIP):
filebytes
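
Since the response body is the raw bytes of a ZIP archive, it can be inspected in memory with Python's standard `zipfile` module. This is a minimal sketch; the names of the files inside the archive are not documented here, and `list_archive` is a hypothetical helper.

```python
import io
import zipfile


def list_archive(zip_bytes: bytes) -> dict:
    """Map each member name in the archive to its uncompressed size."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}
```

To write the files to disk instead, open the archive the same way and call `archive.extractall(target_directory)`.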

Status code  Message
200          OK
202          Accepted
400          Bad Request
401          Unauthorized
404          Not Found
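
A result may not be ready immediately after upload, so a client typically polls the result route. The sketch below assumes that 202 means "still processing" and 200 means "result ready"; this page does not spell that out. `fetch` is a hypothetical callable that performs one GET and returns `(status_code, body)`. The default delay of one second matches the limit of one request per second.

```python
import time


def wait_for_result(fetch, attempts=10, delay_s=1.0, sleep=time.sleep):
    """Call fetch() until it reports 200, then return the body.

    Raises RuntimeError on any status other than 200 or 202, and
    TimeoutError if the result is still pending after all attempts.
    """
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"request failed with status {status}")
        if attempt < attempts - 1:
            sleep(delay_s)  # stay within one request per second
    raise TimeoutError("result not ready after polling")
```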

Error Messages

While using the smartDocSplit API, you may receive the following error messages, which result from validation checks:

Error Message                                                                 Description
Result not available any more.                                                The result is no longer accessible.
Result not yet available, please try again later.                             The result is not available yet; retry later.
No valid input formats chosen for this SmartTool                              No suitable input formats are selected for this SmartTool.
Model does not support Language Hint                                          The model does not support the language hint feature.
You need to accept the newsletter in order to use this API                    You must accept the newsletter to use the API.
You ran out of Laigos...                                                      You have used up your Laigos.
Please send the correct input file format!                                    The file format is not PNG, JPG, or PDF.
Please send a file that doesn’t exceed the size limit!                        The file size exceeds the allowed limit (number).
Please make sure that your file doesn’t contain any suspicious malware data!  The security check found suspicious data in the file.
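
In client code it can help to group these messages by the action they call for. The grouping below is only an illustrative sketch: the category names are our own labels rather than part of the API, and it assumes the messages arrive exactly as listed above.

```python
# Category labels are illustrative, not part of the API.
ERROR_CATEGORIES = {
    "Result not yet available, please try again later.": "retry_later",
    "Result not available any more.": "expired",
    "No valid input formats chosen for this SmartTool": "configuration",
    "Model does not support Language Hint": "configuration",
    "You need to accept the newsletter in order to use this API": "account",
    "You ran out of Laigos...": "account",
    "Please send the correct input file format!": "bad_input",
    "Please send a file that doesn’t exceed the size limit!": "bad_input",
    "Please make sure that your file doesn’t contain any suspicious malware data!": "bad_input",
}


def classify_error(message: str) -> str:
    """Return the category for a known message, else 'unknown'."""
    return ERROR_CATEGORIES.get(message.strip(), "unknown")


def should_retry(message: str) -> bool:
    """Only a 'not yet available' result is worth requesting again."""
    return classify_error(message) == "retry_later"
```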

Example (step by step)

This section briefly explains how smartDocSplit can solve your problem. Suppose you have one large PDF with several joined documents inside, and you do not know how many documents it contains or where each one starts and ends. When you send this document to smartDocSplit, it extracts the contained documents in a fully automated, AI-based process.

1. First, you need a sample document to use with smartDocSplit. It can be any file that contains several joined documents. You can send the PDF to our API and get back the extracted documents, each with its start and end position and its document type.

2. To use our smartDocSplit tool, you need an API key, which can be generated from here.

If you don’t have an account, you first need to set one up before you can generate the API key. Create an account here.

3. After you have generated your API key, you can use smartDocSplit in any popular programming language; you can find the code snippets here.

4. Run your code. You will receive a JSON response with the extracted documents. See expected responses here.
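
To make the idea of page stream segmentation concrete, the sketch below turns per-page "starts a new document" flags into page ranges. It is only an illustration of the concept under our own assumptions, not the smartDocSplit implementation or its response format; `split_ranges` is a hypothetical name.

```python
def split_ranges(starts_new_document: list) -> list:
    """Turn per-page flags into (first_page, last_page) ranges.

    Pages are numbered from 1 and ranges are inclusive. The first page
    always opens a document, whatever its flag says.
    """
    if not starts_new_document:
        return []
    ranges = []
    first = 1
    for page, is_start in enumerate(starts_new_document, start=1):
        if is_start and page != 1:
            ranges.append((first, page - 1))  # close the previous document
            first = page
    ranges.append((first, len(starts_new_document)))  # close the last one
    return ranges
```

For a six-page file with flags `[True, False, True, True, False, False]`, this yields three documents: pages 1 to 2, page 3, and pages 4 to 6.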

Credits (Laigo)

The calculation for processing a page:

Page  Laigos
1     1
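
With one Laigo per page, the cost of a job scales linearly with its page count. A tiny sketch of that arithmetic; the function names are hypothetical, and pricing beyond the single table row above is not covered here.

```python
LAIGOS_PER_PAGE = 1  # from the table: 1 page costs 1 Laigo


def laigos_for_pages(page_count: int) -> int:
    """Credits needed to process the given number of pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return page_count * LAIGOS_PER_PAGE


def can_afford(balance: int, page_count: int) -> bool:
    """True if the balance covers the whole job."""
    return balance >= laigos_for_pages(page_count)
```

For example, a 25-page file costs 25 Laigos under this reading.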

EU-AI Act Reports

TECHNICAL DOCUMENTATION referred to in Article 11(1)

1. A general description of the AI system including:

(a) its intended purpose,

The AI system smartDocSplit splits joined documents in English and German by detecting where each document starts and ends and what type of document it is. smartDocSplit is a general solution and is not limited to any specific document type; it can handle documents such as invoices, receipts, letters, emails, contracts, etc.

the person/s developing the system the date and the version of the system;

The AI system smartDocSplit was developed at

Laigo GmbH
Eckenerstr. 65
88046 Friedrichshafen

and

Laigo DOO
Cevahir Sky City, 16B
Municipality Aerodrom, 1000 Skopje
North Macedonia

Version 1.0 of the AI system was released on 20.12.2022.

b) how the AI system interacts or can be used to interact with hardware or software that is not part of the AI system itself, where applicable;

The AI system is integrated into the webpage https://laigo.ai/ and can be called through the API. It is served on Laigo internal servers.

(c) the versions of relevant software or firmware and any requirement related to version update;

[…]

(d) the description of all forms in which the AI system is placed on the market or put into service;

The AI system is integrated into the webpage https://laigo.ai/ and can be called through the API. It is served on Laigo internal servers.

(e) the description of hardware on which the AI system is intended to run;

The AI system smartDocSplit runs on a Laigo internal server located in Germany.

(f) where the AI system is a component of products, photographs or illustrations showing external features, marking and internal layout of those products;

The AI system smartDocSplit is deployed on Laigo's internal server, located in Germany. It is part of a Software as a Service product.

(g) instructions of use for the user and, where applicable installation instructions;

All instructions for using the product are available on Laigo's homepage at https://laigo.ai/docs.

2. A detailed description of the elements of the AI system and of the process for its development, including:

(a) the methods and steps performed for the development of the AI system, including, where relevant, recourse to pre-trained systems or tools provided by third parties and how these have been used, integrated or modified by the provider;

The AI system smartDocSplit is based on the open-source model Donut (Document Understanding Transformer) from Clova AI Research. At Laigo, we fine-tuned it with our internal and public data, using documents in English and German. The document types we fine-tuned with include invoices, receipts, letters and contracts. After the fine-tuning process, we also added a rule-based approach, which we use to make the final split of the documents.

(b) the design specifications of the system,

No modifications have been made to the original neural network architecture. Only the output classes were changed.

namely the general logic of the AI system and of the algorithms;

The algorithm and approach are described here: https://arxiv.org/abs/1912.13318 (20.12.2022).

the key design choices including the rationale and assumptions made, also with regard to persons or groups of persons on which the system is intended to be used;

Laigo GmbH has made no design choices or modifications beyond the existing system.

the main classification choices;

Laigo GmbH classifies this AI system as “no risk”, since none of the EU requirements are applicable to this use case.

what the system is designed to optimise for and the relevance of the different parameters;

The AI system is designed to perform page stream segmentation on documents automatically, combined with a rule-based approach. With this technology and approach, a higher accuracy can be reached than with classical rule-based approaches.

[…]

(c) the description of the system architecture explaining how software components build on or feed into each other and integrate into the overall processing; the computational resources used to develop, train, test and validate the AI system;

The AI system “Donut” by Clova AI Research is served through a model-serving API. Training, validation and testing of the AI model were done by Clova AI Research.

[…]

[…]

[…]

[…]

3. Detailed information about the monitoring, functioning and control of the AI system, in particular with regard to:

its capabilities and limitations in performance, including the degrees of accuracy for specific persons or groups of persons on which the system is intended to be used and the overall expected level of accuracy in relation to its intended purpose;

In particular, if the image quality and resolution are poor, text can be missed or misread in the wrong language. This also happens if the orientation of the picture/PDF is not aligned with the orientation of the text. As a result, documents may be classified incorrectly.

The overall accuracy is independent of the specific persons or groups of persons on which the system is intended to be used.

the foreseeable unintended outcomes and sources of risks to health and safety, fundamental rights and discrimination in view of the intended purpose of the AI system;

No risk to health and safety is expected. Fundamental rights are respected, and no discrimination against individuals is expected.

[…]

[…]

5. A description of any change made to the system through its lifecycle;

As the AI system is still in its first released version, no changes have been made so far.

[…]

[…]

[…]

EU DECLARATION OF CONFORMITY

1. AI system name and type and any additional unambiguous reference allowing identification and traceability of the AI system;

The AI system is called “smartDocSplit” and is based on the Clova AI Research open-source model “Donut”.

Information about Donut can be found in the research paper.
Changes, adaptations and interaction with Donut in the context of “smartDocSplit” can be found at https://laigo.ai/docs.

2. Name and address of the provider or, where applicable, their authorised representative;

Laigo GmbH
Eckenerstr. 65
88046 Friedrichshafen

Representative: Yvonne Gaissmaier

[…]

[…]

[…]

[…]

7. Place and date of issue of the declaration, name and function of the person who signed it as well as an indication for, and on behalf of whom, that person signed, signature.

Friedrichshafen, 20.12.2022 Yvonne Gaissmaier 

INFORMATION TO BE SUBMITTED UPON THE REGISTRATION OF HIGH-RISK AI SYSTEMS IN ACCORDANCE WITH ARTICLE 51

1. Name, address and contact details of the provider;

Laigo GmbH
Eckenerstr. 65
88046 Friedrichshafen

[…]

3. Name, address and contact details of the authorised representative, where applicable;

Yvonne Gaissmaier
Eckenerstr. 65
88046 Friedrichshafen

4. AI system trade name and any additional unambiguous reference allowing identification and traceability of the AI system;

The AI system trade name is “smartDocSplit”.

5. Description of the intended purpose of the AI system;

The AI system smartDocSplit performs page stream segmentation. It is a general solution for any document type and efficiently combines AI with a rule-based approach.

6. Status of the AI system (on the market, or in service; no longer placed on the market/in service, recalled);

On the market

[…]

[…]

9. Member States in which the AI system is or has been placed on the market, put into service or made available in the Union;

The AI system is placed on the market in Germany (Friedrichshafen).

[…]

11. Electronic instructions for use; this information shall not be provided for high-risk AI systems in the areas of law enforcement and migration, asylum and border control management referred to in Annex III, points 1, 6 and 7.

All instructions for using smartDocSplit can be found at https://laigo.ai/docs.

12. URL for additional information (optional).

https://laigo.ai/docs
