Changes of Revision 3
Added
libtextcat.spec
@@ -0,0 +1,120 @@
+#
+# spec file for package libtextcat (Version 2.2)
+#
+# Copyright (c) 2007 SUSE LINUX Products GmbH, Nuernberg, Germany.
+# This file and all modifications and additions to the pristine
+# package are under the same license as the package itself.
+#
+# Please submit bugfixes or comments via http://bugs.opensuse.org/
+#
+
+# norootforbuild
+
+Name: libtextcat
+Version: 2.2
+Release: 0
+#
+License: BSD
+Group: Development/Languages/C and C++
+#
+BuildRoot: %{_tmppath}/%{name}-%{version}-build
+#
+Url: http://software.wise-guys.nl/libtextcat/
+Source: http://software.wise-guys.nl/download/libtextcat-%{version}.tar.gz
+#
+Summary: Library for text classification
+
+%description
+Libtextcat is a library with functions that implement the classification
+technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization"
+[1]. It was primarily developed for language guessing, a task on which it is
+known to perform with near-perfect accuracy.
+
+The central idea of the Cavnar & Trenkle technique is to calculate a
+"fingerprint" of a document with an unknown category, and compare this with the
+fingerprints of a number of documents of which the categories are known. The
+categories of the closest matches are output as the classification. A
+fingerprint is a list of the most frequent n-grams occurring in a document,
+ordered by frequency. Fingerprints are compared with a simple out-of-place
+metric. See the article for more details.
+
+Considerable effort went into making this implementation fast and efficient.
+The language guesser processes over 100 documents/second on a simple PC, which
+makes it practical for many uses. It was developed for use in our webcrawler
+and search engine software, in which it it handles millions of documents a day.
+
+Authors:
+--------
+    Frank Scheelen
+
+
+%package devel
+Group: Development/Languages/C and C++
+Requires: %{name} = %{version}
+#
+Summary: Development files for libtextcat
+
+%description devel
+Libtextcat is a library with functions that implement the classification
+technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization"
+[1]. It was primarily developed for language guessing, a task on which it is
+known to perform with near-perfect accuracy.
+
+The central idea of the Cavnar & Trenkle technique is to calculate a
+"fingerprint" of a document with an unknown category, and compare this with the
+fingerprints of a number of documents of which the categories are known. The
+categories of the closest matches are output as the classification. A
+fingerprint is a list of the most frequent n-grams occurring in a document,
+ordered by frequency. Fingerprints are compared with a simple out-of-place
+metric. See the article for more details.
+
+Considerable effort went into making this implementation fast and efficient.
+The language guesser processes over 100 documents/second on a simple PC, which
+makes it practical for many uses. It was developed for use in our webcrawler
+and search engine software, in which it it handles millions of documents a day.
+
+
+This package holds the development files for libtextcat.
+
+
+Authors:
+--------
+    Frank Scheelen
+
+
+%debug_package
+%prep
+%setup
+find -type d -name CVS -print0 | xargs -r0 rm -rv
+
+%build
+%configure
+%{__make}
+
+%install
+%makeinstall
+# install missing header file
+install -D -m 644 src/textcat.h %{buildroot}%{_includedir}/textcat.h
+# Configuration and language files
+%{__install} -d -m 0755 %{buildroot}%{_datadir}/%{name}
+%{__cp} -av langclass/conf.txt langclass/LM langclass/ShortTexts %{buildroot}%{_datadir}/%{name}/
+
+%clean
+%{__rm} -rf %{buildroot}
+
+%files
+%defattr(-,root,root,-)
+%{_bindir}/createfp
+%{_libdir}/libtextcat.so.0
+%{_libdir}/libtextcat.so.0.0.0
+%{_datadir}/%{name}/
+%doc ChangeLog LICENSE README TODO
+
+%files devel
+%defattr(-,root,root,-)
+%{_libdir}/libtextcat.a
+%{_libdir}/libtextcat.la
+%{_libdir}/libtextcat.so
+%{_includedir}/textcat.h
+
+%changelog
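
Note on the technique named in %description: the package description summarizes the Cavnar & Trenkle approach (rank the most frequent n-grams of a document into a "fingerprint", then compare fingerprints with an out-of-place metric). The following is a minimal Python sketch of that idea only. It is not libtextcat's C code or API; the function names, the "_" word padding, the n-gram length limit of 5 and the fingerprint size of 300 are all assumptions chosen for illustration.

```python
from collections import Counter


def fingerprint(text, max_n=5, top=300):
    """Return the `top` most frequent character n-grams (length 1..max_n), ranked."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumed padding to mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending frequency; ties broken alphabetically so the order is stable.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top]


def out_of_place(doc_fp, category_fp):
    """Sum of rank differences; n-grams absent from the category get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(category_fp)}
    max_penalty = len(category_fp)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_fp)
    )


def classify(text, category_fps):
    """Return the category whose fingerprint is closest to that of `text`."""
    doc_fp = fingerprint(text)
    return min(category_fps, key=lambda name: out_of_place(doc_fp, category_fps[name]))


if __name__ == "__main__":
    # Toy training data, far too small for real language guessing.
    known = {
        "english": fingerprint("the cat sat on the mat"),
        "dutch": fingerprint("de kat zat op de mat"),
    }
    print(classify("the cat sat on the mat", known))
```

In practice the category fingerprints would be built from much larger samples; the language model files installed by this spec under %{_datadir}/%{name}/ presumably play that role for libtextcat itself.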