I'm using R for data analytics. I connected it to Elasticsearch and retrieved a dataset of Shakespeare's Complete Works.
library("elastic")
connect()
maxi <- count(index = 'shakespeare')
s <- Search(index = 'shakespeare', size = maxi)
dat <- s$hits$hits[[1]]$`_source`$text_entry
for (i in 2:maxi) {
  dat <- c(dat, s$hits$hits[[i]]$`_source`$text_entry)
}
rm(s)
After that I want to build a tf-idf matrix, but apparently I can't, since it uses too much memory (I have 4 GB of RAM). Here is my code:
library("tm")
myCorpus <- Corpus(VectorSource(dat))
myCorpus <- tm_map(myCorpus, content_transformer(tolower), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumbers), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removePunctuation), lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeWords), stopwords("en"), lazy = TRUE)
myTdm <- TermDocumentMatrix(myCorpus,
                            control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))
myCorpus is around 400 MB.
But then I do:
> m <- as.matrix(myTdm)
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow
You can use the removeSparseTerms() function.
Removes sparse terms from a document-term or term-document matrix.
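The error itself comes from as.matrix() trying to allocate a dense terms-by-documents matrix: the product of the two dimensions overflows R's 32-bit integers once it exceeds .Machine$integer.max, which is why removing sparse terms first helps. A rough sanity check on your own object (myTdm is the matrix from the question; the arithmetic here is just an illustration):

```r
library(tm)

# How big would the dense matrix be? (myTdm from the question)
nr <- nTerms(myTdm)   # number of distinct terms
nc <- nDocs(myTdm)    # number of documents (text lines)

as.numeric(nr) * nc              # if this exceeds .Machine$integer.max, nr * nc gives NA
as.numeric(nr) * nc * 8 / 2^30   # approximate dense size in GiB (8 bytes per double)
```

On a corpus of this size the dense representation easily runs into tens of GiB, far beyond 4 GB of RAM, so the fix is to shrink the sparse matrix before (or instead of) densifying it.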
Something like this:

data("crude")
tdm <- TermDocumentMatrix(crude)
removeSparseTerms(tdm, 0.2)
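Applied to the matrix from the question it would look like the sketch below. The 0.999 threshold is an assumption, not a recommendation: it keeps only terms appearing in at least roughly 0.1% of documents, and you will need to tune it for your corpus.

```r
library(tm)

# Drop terms that are absent from more than 99.9% of documents
# (threshold 0.999 is an assumed starting point; tune it for your corpus).
smallTdm <- removeSparseTerms(myTdm, 0.999)

# With far fewer terms, the dense conversion may now fit in memory.
m <- as.matrix(smallTdm)
```

If even the reduced matrix is too large, consider working directly with the sparse representation (the slam package's simple_triplet_matrix that tm uses internally) rather than calling as.matrix() at all.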