File libtextcat.spec of Package libtextcat
#
# spec file for package libtextcat (Version 2.2)
#
# Copyright (c) 2007 SUSE LINUX Products GmbH, Nuernberg, Germany.
# This file and all modifications and additions to the pristine
# package are under the same license as the package itself.
#
# Please submit bugfixes or comments via http://bugs.opensuse.org/
#

# norootforbuild

%define pkgname libtextcat
%define soname  %{pkgname}0

Name:           %{soname}
Version:        2.2
Release:        0
#
License:        BSD
Group:          Development/Languages/C and C++
#
BuildRoot:      %{_tmppath}/%{pkgname}-%{version}-build
#
Url:            http://software.wise-guys.nl/libtextcat/
Source:         http://software.wise-guys.nl/download/libtextcat-%{version}.tar.gz
#
Summary:        Library for text classification

%description
Libtextcat is a library with functions that implement the
classification technique described in Cavnar & Trenkle, "N-Gram-Based
Text Categorization" [1]. It was primarily developed for language
guessing, a task on which it is known to perform with near-perfect
accuracy.

The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this
with the fingerprints of a number of documents of which the categories
are known. The categories of the closest matches are output as the
classification. A fingerprint is a list of the most frequent n-grams
occurring in a document, ordered by frequency. Fingerprints are
compared with a simple out-of-place metric. See the article for more
details.

Considerable effort went into making this implementation fast and
efficient. The language guesser processes over 100 documents/second on
a simple PC, which makes it practical for many uses. It was developed
for use in our webcrawler and search engine software, in which it
handles millions of documents a day.



Authors:
--------
    Frank Scheelen

%package -n %{pkgname}-devel
Group:          Development/Languages/C and C++
Requires:       %{soname} = %{version}
#
Summary:        Development files for libtextcat

%description -n %{pkgname}-devel
Libtextcat is a library with functions that implement the
classification technique described in Cavnar & Trenkle, "N-Gram-Based
Text Categorization" [1]. It was primarily developed for language
guessing, a task on which it is known to perform with near-perfect
accuracy.

The central idea of the Cavnar & Trenkle technique is to calculate a
"fingerprint" of a document with an unknown category, and compare this
with the fingerprints of a number of documents of which the categories
are known. The categories of the closest matches are output as the
classification. A fingerprint is a list of the most frequent n-grams
occurring in a document, ordered by frequency. Fingerprints are
compared with a simple out-of-place metric. See the article for more
details.

Considerable effort went into making this implementation fast and
efficient. The language guesser processes over 100 documents/second on
a simple PC, which makes it practical for many uses. It was developed
for use in our webcrawler and search engine software, in which it
handles millions of documents a day.

This package holds the development files for libtextcat.



Authors:
--------
    Frank Scheelen

%debug_package

%prep
%setup -n %{pkgname}-%{version}
find -type d -name CVS -print0 | xargs -r0 rm -rv

%build
%configure
%{__make}

%install
%makeinstall
# install missing header file
install -D -m 644 src/textcat.h %{buildroot}%{_includedir}/textcat.h
# Configuration and language files
%{__install} -d -m 0755 %{buildroot}%{_datadir}/%{soname}
%{__cp} -av langclass/conf.txt langclass/LM langclass/ShortTexts %{buildroot}%{_datadir}/%{soname}/

%clean
%{__rm} -rf %{buildroot}

%post -p /sbin/ldconfig

%postun -p /sbin/ldconfig

%files
%defattr(-,root,root,-)
%{_bindir}/createfp
%{_libdir}/libtextcat.so.0
%{_libdir}/libtextcat.so.0.0.0
%{_datadir}/%{soname}/
%doc ChangeLog LICENSE README TODO

%files -n %{pkgname}-devel
%defattr(-,root,root,-)
%{_libdir}/libtextcat.a
%{_libdir}/libtextcat.la
%{_libdir}/libtextcat.so
%{_includedir}/textcat.h

%changelog
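
The package description summarizes the Cavnar & Trenkle technique: build a frequency-ranked n-gram "fingerprint" per document and compare fingerprints with an out-of-place metric. The following is a minimal Python sketch of that idea for illustration only. It is not libtextcat's C implementation or API. The function names, the `_` word padding, the n-gram lengths 1 to 5, and the cutoff of 400 n-grams are all assumptions chosen for the example.

```python
from collections import Counter


def fingerprint(text, max_n=5, top=400):
    """Return the `top` most frequent character n-grams, most frequent first.

    Assumptions for this sketch: words are lowercased, padded with '_',
    and n-grams of length 1..max_n are counted.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending count, then alphabetical so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_fp, cat_fp):
    """Sum of rank differences between two fingerprints.

    An n-gram absent from cat_fp is charged a fixed maximum penalty
    (here len(cat_fp), an assumption of this sketch).
    """
    rank = {gram: i for i, gram in enumerate(cat_fp)}
    max_penalty = len(cat_fp)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_fp)
    )


def classify(text, categories):
    """Return the category name whose fingerprint is closest to `text`."""
    doc_fp = fingerprint(text)
    return min(categories, key=lambda name: out_of_place(doc_fp, categories[name]))


if __name__ == "__main__":
    # Tiny hypothetical training texts; real models need far more data.
    categories = {
        "en": fingerprint("the quick brown fox jumps over the lazy dog"),
        "nl": fingerprint("de snelle bruine vos springt over de luie hond"),
    }
    print(classify("the lazy dog jumps", categories))
```

In the packaged library, the per-language fingerprints correspond to the files installed from `langclass/LM`, and `createfp` is the tool shipped in `%{_bindir}` for generating them; the sketch above only mirrors the general scheme.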