Word counting

Dragan

Well-known member
Staff member
Joined
Jan 13, 2012
Messages
6,371
Reaction score
65
I'm on TRIOS (Debian oldstable…), which got old a long time ago, so the versions of Haskell and everything else are older too 🙂
View attachment 5485
 

Lazar

Member
Joined
Sep 1, 2013
Messages
550
Reaction score
21
Code:
import sys
import re
from collections import Counter

cnt = Counter()
# \w+ matches runs of letters, digits and underscores
words = re.findall(r'\w+', open(sys.argv[1]).read().lower())
for word in words:
    cnt[word] += 1
# Print the 20 most frequent words, one "word<TAB>count" pair per line
top_words = cnt.most_common(20)
for w, n in top_words:
    print(f'{w}\t{n}')

As a beginner, I think this is a perfectly readable solution for a first attempt.
This problem should also be solved in modern C++.

Go is a very good choice as a language that is simpler than C and faster than Python.
 

Lazar

@Dragan
You can simply add something like this to your script:
tr '[:upper:]' '[:lower:]' < bible.txt | tr -d '[:punct:]' > text.txt
and that way you meet the requirements of the task.
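For comparison, here is a minimal Python sketch of the same preprocessing step (lowercase, then delete punctuation). The name `normalize` is hypothetical, and the sketch assumes ASCII punctuation only, which is roughly what `[:punct:]` covers in the C locale; unlike `tr`, `str.lower()` also lowercases non-ASCII letters.

```python
import string

# Translation table that deletes every ASCII punctuation character,
# roughly what `tr -d '[:punct:]'` does in the C locale.
_DELETE_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase the text and drop ASCII punctuation (hypothetical helper)."""
    return text.lower().translate(_DELETE_PUNCT)
```

Calling `normalize("In the Beginning, God created")` gives `"in the beginning god created"`, which can then be split on whitespace and counted.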
 

Branimir_Maksimovic

Well-known member
Joined
Nov 22, 2018
Messages
928
Reaction score
370
Your solution is good ;p
As for C++, someone will post a solution sooner or later ;p
 

Dragan

it should work with version 7 as well; which errors does it report?
Here are the errors:

Code:
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$ ghc -O2 bbl-t1.hs
[1 of 1] Compiling Main ( bbl-t1.hs, bbl-t1.o )

bbl-t1.hs:15:22:
    No instance for (hashable-1.2.1.0:Data.Hashable.Class.Hashable
                       B8.ByteString)
      arising from a use of `updatefreq'
    Possible fix:
      add an instance declaration for
      (hashable-1.2.1.0:Data.Hashable.Class.Hashable B8.ByteString)
    In the first argument of `mapM_', namely `(updatefreq ht)'
    In a stmt of a 'do' block: mapM_ (updatefreq ht) xs
    In the expression:
      do { ht <- H.new :: IO (HashTable B8.ByteString (IORef Int));
           mapM_ (updatefreq ht) xs;
           lst <- H.toList ht;
           mapM
             (\ (x, y)
                -> do { v <- readIORef y;
                        … })
             lst }
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$

Then again… don't bother, my "programming" isn't good for much anyway 🙂
 

Branimir_Maksimovic

Bah, your hashtable library is older and has no hash implemented for ByteString; in your version that has to be added 😉
Never mind.
 

Dragan

@Branimir Maksimovic
Which version of the hashtable library do you have?
 

Branimir_Maksimovic

By the way, the shell solution is:
time awk '{print tolower($0)}' bible.txt | grep -o '[a-zA-Z]*' | sort | uniq -c | sort -rn | head -n 20
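The stages of that pipeline map fairly directly onto Python. The following is only a sketch under assumptions: `top_words` is a hypothetical name, and, like the `grep` pattern, it treats a word as a run of ASCII letters, so "don't" is split into "don" and "t". Words with equal counts may come out in a different order than from `sort -rn`.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 20) -> list:
    """Return the n most frequent runs of ASCII letters as (word, count) pairs."""
    # awk tolower + grep -o: lowercase everything, then keep letter runs only
    words = re.findall(r"[a-z]+", text.lower())
    # sort | uniq -c | sort -rn | head -n 20: count and take the n largest
    return Counter(words).most_common(n)
```

It could be used as `top_words(open("bible.txt").read())`, assuming the same input file as in the shell command.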
 

Dragan

@Branimir Maksimovic
@hashtable
yes, you have the newest version, which I can't install through cabal even though I enabled the "--allow-newer" option in the cabal config
View attachment 5486

@awk
something is wrong with your shell version, it returns too few words as the result
View attachment 5487
 

Branimir_Maksimovic

Well, that's exactly it: the first 20 words, sorted in descending order of frequency 😉
Second, a Hashable instance for ByteString in Haskell ends up being a dirty implementation, and a naive implementation will be slower than the one with lists 😉
Try compiling the version with lists.
 

Dragan

the first 20 words, sorted in descending order of frequency
Mea culpa… I overlooked that this is part of the task statement in the initial post 🙂
 

Branimir_Maksimovic

Anyway, here is a version that should compile both on my machine and on yours, with a naive hash implementation 😉

Code:
{-# LANGUAGE BangPatterns, DeriveGeneric #-}
import qualified Data.HashTable.IO as H
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.List
import Text.Printf
import Data.Char
import Data.IORef
import Data.Hashable
import GHC.Generics

type HashTable k v = H.BasicHashTable k v

-- Wrapper around ByteString so that a custom Hashable instance can be defined
newtype MyString = MyString B.ByteString
    deriving (Generic, Eq)

-- Naive hash: h = h*33 + byte, starting from 5381.
-- Only `hash` is defined here; hashWithSalt is left to its default.
instance Hashable MyString where
    hash (MyString s) = B.foldl (\x y -> x*33 + fromIntegral y) 5381 s

-- Count occurrences of each word in a mutable hash table
wordFreq :: [B.ByteString] -> IO [(B.ByteString, Int)]
wordFreq xs = do
    ht <- H.new :: IO (HashTable MyString (IORef Int))
    mapM_ (updatefreq ht) xs
    lst <- H.toList ht
    mapM (\(MyString x, y) -> do
        v <- readIORef y
        return (x, v)) lst
  where
    updatefreq ht word = do
        !lu <- H.lookup ht (MyString word)
        case lu of
            Nothing -> do
                ref <- newIORef 1
                H.insert ht (MyString word) ref
            Just x -> modifyIORef' x (+1)
        return ()

main = do
    contents <- B.readFile "bible.txt"
    -- Strip punctuation, lowercase, split into words, then count
    result <- wordFreq . B8.words $
        B8.map toLower $ B8.filter (not . isPunct) contents
    let sorted = reverse . sort $ map (\(x, y) -> (y, x)) $ result
    mapM_ (\(x, y) -> printf "%8d %s\n" x (B8.unpack y)) $ take 20 sorted

isPunct c =
    c == '\'' || c == '.' || c == ';' || c == '(' || c == ')'
    || c == '"' || c == '?' || c == '-' || c == '_' || c == '!'
    || c == ',' || c == ':' || c == '|'
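The "naive hash" in that instance is the familiar multiply-by-33 fold over the bytes of the word. A rough Python sketch of the same arithmetic follows; `naive_hash` is a hypothetical name, and the 64-bit mask is an assumption that only approximates the wrap-around of a fixed-width Haskell `Int` (the Haskell value is signed, this one is not).

```python
def naive_hash(data: bytes) -> int:
    """Fold the bytes with h = h * 33 + byte, starting from 5381."""
    h = 5381
    for byte in data:
        # Keep the value within 64 bits, mimicking machine-word wrap-around
        h = (h * 33 + byte) & 0xFFFFFFFFFFFFFFFF
    return h
```

For example, `naive_hash(b"a")` is `5381 * 33 + 97`, that is `177670`. The function is cheap to compute, but it makes no attempt to mix in a salt or to resist deliberately colliding inputs.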
 

Dragan

@Branimir Maksimovic
The first version, from the initial post, compiles on my machine too… the execution time is a bit over 4.5 seconds, probably because I'm older than you, so my processor is older too 😃
View attachment 5488

I'll now try the newest version you posted as well ^
 

Branimir_Maksimovic

The version with Haskell String takes about that long on my machine too 😜
The version with lists and ByteString takes half of that.
With the hash table and the naive hash implementation it now takes 0.2 seconds on my machine.
 

Dragan

here is a version that should compile both on my machine and on yours, with a naive hash implementation 😉
Nope, it doesn't compile…

Code:
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$ cabal install bytestring
Resolving dependencies...
All the requested packages are already installed:
bytestring-0.10.8.2
Use --reinstall if you want to reinstall anyway.
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$ cabal install hashtables
Resolving dependencies...
All the requested packages are already installed:
hashtables-1.2.1.1
Use --reinstall if you want to reinstall anyway.
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$ ghc -O2 bbl-t3.hs
[1 of 1] Compiling Main ( bbl-t3.hs, bbl-t3.o )

bbl-t3.hs:14:10:
    No instance for (Hashable B8.ByteString)
      arising from a use of `hashable-1.2.1.0:Data.Hashable.Class.$gdmhashWithSalt'
    Possible fix:
      add an instance declaration for (Hashable B8.ByteString)
    In the expression:
      (hashable-1.2.1.0:Data.Hashable.Class.$gdmhashWithSalt)
    In an equation for `hashWithSalt':
        hashWithSalt
          = (hashable-1.2.1.0:Data.Hashable.Class.$gdmhashWithSalt)
    In the instance declaration for `Hashable MyString'
[dragan@trios-eudev][/media/dragan/Hg/TRIOS-SCRIPTS/Playground/TMP]$
The code block shows the versions I am able to install on my machine, as well as the error produced during compilation…
 

Branimir_Maksimovic

Silly me 😉
The Generic derivation implements hashWithSalt by relying on the wrapped type having an implementation, which in your case it doesn't.
Now it should work.

Code:
{-# LANGUAGE BangPatterns #-}
import qualified Data.HashTable.IO as H
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import Data.List
import Text.Printf
import Data.Char
import Data.IORef
import Data.Hashable
import GHC.Generics

type HashTable k v = H.BasicHashTable k v

-- Wrapper around ByteString so that a custom Hashable instance can be defined
newtype MyString = MyString B.ByteString
    deriving (Eq)

-- Naive hash: h = h*33 + byte, starting from 5381.
-- hashWithSalt ignores the salt, so no Hashable instance for ByteString is needed.
instance Hashable MyString where
    hash (MyString s) = B.foldl (\x y -> x*33 + fromIntegral y) 5381 s
    hashWithSalt salt s = hash s

-- Count occurrences of each word in a mutable hash table
wordFreq :: [B.ByteString] -> IO [(B.ByteString, Int)]
wordFreq xs = do
    ht <- H.new :: IO (HashTable MyString (IORef Int))
    mapM_ (updatefreq ht) xs
    lst <- H.toList ht
    mapM (\(MyString x, y) -> do
        v <- readIORef y
        return (x, v)) lst
  where
    updatefreq ht word = do
        !lu <- H.lookup ht (MyString word)
        case lu of
            Nothing -> do
                ref <- newIORef 1
                H.insert ht (MyString word) ref
            Just x -> modifyIORef' x (+1)
        return ()

main = do
    contents <- B.readFile "bible.txt"
    -- Strip punctuation, lowercase, split into words, then count
    result <- wordFreq . B8.words $
        B8.map toLower $ B8.filter (not . isPunct) contents
    let sorted = reverse . sort $ map (\(x, y) -> (y, x)) $ result
    mapM_ (\(x, y) -> printf "%8d %s\n" x (B8.unpack y)) $ take 20 sorted

isPunct c =
    c == '\'' || c == '.' || c == ';' || c == '(' || c == ')'
    || c == '"' || c == '?' || c == '-' || c == '_' || c == '!'
    || c == ',' || c == ':' || c == '|'
Now it runs somewhat slower, as expected.
 

Dragan

@Branimir Maksimovic
That's it, compilation passes now 🆙
Execution time: a little over half a second…

View attachment 5489
 

Prizma

Active member
Joined
Feb 13, 2017
Messages
461
Reaction score
76
I get that I made a mess of this and that there is plenty to optimize here, but 14 seconds? :eek:

Code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace ConsoleAppBible
{
    class Program
    {
        public class Word
        {
            public string WordChars { get; set; }
            public int WordCount { get; set; } = 0;
        }

        static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            // Defined but never called, so the text is never lowercased
            Func<string, string> lowerwriting = fnc => fnc.ToLower();

            // Keeps only the entries that consist entirely of ASCII letters
            List<string> isString(List<string> list, int len)
            {
                List<string> filterd = new List<string>();
                bool isword;
                for (int i = 0; i < len; i++)
                {
                    isword = Regex.IsMatch(list[i], @"^[a-zA-Z]+$");
                    if (isword)
                        filterd.Add(list[i]);
                }
                return filterd;
            }

            // Splits the input into word-like tokens (letters, digits, apostrophes)
            IEnumerable<string> SeparateWords(string input)
            {
                MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
                var words = from m in matches.Cast<Match>()
                            where !string.IsNullOrEmpty(m.Value)
                            select (m.Value);

                return words;
            }

            // Builds one Word entry per distinct token; the Count() - 1 bounds skip the last element
            List<Word> GetUniqueWords(IEnumerable<string> list, out int x)
            {
                List<Word> words = new List<Word>();
                var uWords = list.Distinct().ToList();
                x = uWords.Count() - 1;
                uWords = isString(uWords, x);
                x = uWords.Count() - 1;
                for (int i = 0; i < x; i++)
                {
                    Word word = new Word();
                    word.WordChars = uWords[i];
                    word.WordCount = 0;
                    words.Add(word);
                }
                return words;
            }

            // For each of the first `count` words of the text, scans the whole list of unique words.
            // Note: `count` is the unique-word count minus one, not text.Count.
            void CalcReps(List<string> text, List<Word> words, int count)
            {
                for (int i = 0; i < count; i++)
                {
                    var finalList = words.Where(w => w.WordChars == text[i]).Select(w => { w.WordCount++; return w; }).ToList();
                }

                var sorted = words.ToList().OrderBy(w => w.WordCount).TakeLast(20).Reverse();
                sw.Stop();

                foreach (var item in sorted)
                {
                    Console.ForegroundColor = ConsoleColor.Red;
                    Console.Write($"{item.WordChars} ");
                    Console.ResetColor();
                    Console.Write("appeared ");
                    Console.ForegroundColor = ConsoleColor.Red;
                    Console.Write($"{item.WordCount} ");
                    Console.ResetColor();
                    Console.Write(" times.");
                    Console.WriteLine();
                }
                Console.WriteLine($"it took me {sw.ElapsedMilliseconds} milliseconds to do this");
            }

            string txt = System.IO.File.ReadAllText(@"E:\bible.txt");
            var wholeText = SeparateWords(txt);
            var unique = GetUniqueWords(wholeText, out int y);
            CalcReps(wholeText.ToList(), unique, y);
            Console.ReadLine();
        }
    }
}

Edit:
That's why I was asking for that libicu package, so that I can run C# on Linux. Once I get it working, we'll see the result there 🤘

Editedit:
And the result isn't right either XD I'm rushing off to Ada, I'll get to this tonight
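One plausible contributor to the run time, not measured here, is that CalcReps searches the entire list of unique words for every word it processes, so the work grows with (words processed) × (unique words). The Python sketch below contrasts that pattern with a hash-map count; both function names are hypothetical and the code only illustrates the idea, it is not a port of the C# program.

```python
from collections import Counter


def count_by_scanning(words: list) -> dict:
    """Count words by searching a list of [word, count] entries for each input word."""
    entries = []  # each entry is [word, count]
    for w in words:
        for entry in entries:  # linear search: cost grows with the number of unique words
            if entry[0] == w:
                entry[1] += 1
                break
        else:
            entries.append([w, 1])  # first time this word is seen
    return {word: count for word, count in entries}


def count_by_hashing(words: list) -> dict:
    """Count words with a hash map: one average constant-time update per word."""
    return dict(Counter(words))
```

Both functions return the same counts; on a text with many distinct words the second should be noticeably faster, because each update no longer depends on how many unique words have been seen.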
 