Handwriting Databases----Institute of Automation

Handwriting Databases

Nov 01, 2016Author：

PrintText Size A A

1. Introduction

Writer identification has been an active research topic in recent years. The Institute of Automation, Chinese Academy of Sciences (CASIA) will provide the CASIA online Database freely of charge to writer identification researchers in order to promote research. In the CASIA online handwriting Database there are three datasets: Dataset 1 (Chinese database) , Dataset 2 (English database) and Dataset 3 (Chinese and English database).

2. Brief Descriptions of the Database

The CASIA online handwriting database contains 1074 handwritten texts in online format from 188 writers in two sessions.

Each writer writes seven pages of texts include four pages of Chinese texts and three pages of English texts respectively. In the first session, each writer has written same sentence of about 50 Chinese words in one page respectively and different Chinese and English sentence chosen by writers freely about 50 words in two pages respectively.

Dataset 1 (Chinese database) was created on Sept.20, 2007, including 187 persons in first session and second session on Dec.24, 2007. Handwriting data is collected by Wacom Intuos2 tablet. Each writer has written same sentence of about 50 Chinese words in one page respectively and different Chinese sentence chosen by writers freely about 50 words in two pages. In the second session, 111 of 187 persons write one page about 50 Chinese words. Each handwritten text is stored in a separate text file. The naming convention of the files is the writer name. In each writer file, the signature is represented as a sequence of points. The first line stores a single integer which is the total number of points in the writer. Each of the following lines corresponds to one point characterized by features listed in the following order: x-coordinate, y-coordinate, time stamp, button status, azimuth, altitude, and pressure.

Dataset 2 (English database) was created on Sept.20, 2007, including 134 persons. Handwriting data is collected by Wacom Intuos2 tablet. Each writer has written same sentence of about 50 English words in one page respectively and different English sentence chosen by writers freely about 50 words in two pages. Each handwritten text is stored in a separate text file. The naming convention of the files is the writer name. In each writer file, the signature is represented as a sequence of points. The first line stores a single integer which is the total number of points in the writer. Each of the following lines corresponds to one point characterized by features listed in the following order: x-coordinate, y-coordinate, time stamp, button status, azimuth, altitude, and pressure.

Dataset 3 (Chinese and English database) was created on Sept.20, 2007, including 133 persons. We merged Dataset 1 and Dataset 2 to obtain Dataset 3 ( Chinese and English database)

http://biometrics.idealtest.org/dbDetailForUser.do?id=10

Resources

Scientific Database