Category: crisis apparition

Exploring Nursing Ghost Stories through Machine Learning: Topic Discovery with Latent Dirichlet Allocation

NOTE: Click to open graphics for an expanded and clearer view of the findings they contain  

As reported in earlier posts, the Allnurses.com web site hosts a long-running moderated discussion thread called “Nursing Ghost Stories” (NGS).  The NGS collection spans over a decade (2005-2017), amounting to 199 pages as of this writing.  As a dataset, NGS contains multiple first- and second-hand accounts of, and commentary on, paranormal-type experiences.

The archive contains classic examples of ghost and haunting phenomena.  Patients were generally the percipients in ghost experiences.  Sometimes the ghosts in question appeared to be former nurses in period dress, former doctors and patients, or former area residents.  However, these kinds of paranormal experiences did not dominate the collection.

In actuality, the NGS archive conveys several varieties of psi and post-mortem survival phenomena.  The archive contains several examples of extrasensory perception and presentiment in particular  

There were also examples of after-death communication (ADC): sensed-presence or apparitional experiences involving deceased family members or friends.  Unlike hauntings, which are place-centered, ADC encounters are person-centered, involving meaningful coincidences (or synchronicities) for the percipients.  ADC encounters are more common among widows and widowers, but are not exclusive to them.

The archive contains several reports of near-death experiences (NDEs). However, the more representative encounters involved nearing death awareness (NDA) type experiences.  In NDA situations, terminally-ill patients experiencing death-bed visions will have perceptions of welcoming apparitions of deceased relatives or loved ones

  • Terminal patients will also appear to hold conversations with persons who are not physically present in their room.  Sometimes nurses described these aspects of NDA experiences as dementia
  • It is also not uncommon for gravely-ill patients to be alert and conversant in their final hours before death, a phenomenon called “terminal lucidity”

Provided below are examples of exchanges regarding NDA situations as characterized by nurses working in long-term care and palliative care settings 

I’ve been a hospice nurse for 5 years. I have been with hundreds of people at the time of their death & I can tell you first hand that if the patient is alert enough to speak, you’ll hear them talking to loved ones that have already passed over

That is so true. I, too am a hospice nurse and when pts. start talking to their dead relatives, you know that they have about a week MAX before they are gone

From experience I’ve learned that when a pt tells you they’re going to die…they usually do…and if they start talking to dead family members…they usually die…it’s like the family members have come to take them…..

As a follow-on to the earlier wordcloud project, we wondered whether unsupervised machine learning, specifically topic generation models, could discover the abovementioned themes in the NGS archive 

  • Generative topic models view documents as having a latent semantic structure of topics that can be inferred from co-occurrences of words in documents

  • For this project, the Latent Dirichlet Allocation (LDA) topic model was employed.  LDA views documents as probability distributions over topics and topics as probability distributions over words
  • All documents share the same collection of topics, but each document contains those topics in different proportions.  The LDA algorithm samples words across topics until it arrives at the topics and word selections that most likely generated the documents (a toy sketch of this generative view follows below)
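
Below is a toy sketch of this generative view.  It is illustrative only: the vocabulary, topic count, document length, and Dirichlet priors are assumptions chosen for readability, not values from this project.

    # Toy sketch of the LDA generative view: topics are distributions over
    # words, documents are distributions over topics.  All values here are
    # illustrative assumptions, not the project's settings.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["nurse", "patient", "ghost", "dream", "family", "room"]
    n_topics, doc_len = 2, 20

    topic_word = rng.dirichlet([0.5] * len(vocab), size=n_topics)  # topics over words
    doc_topic = rng.dirichlet([0.1] * n_topics)                    # one document over topics

    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=doc_topic)        # pick a topic for this word slot
        w = rng.choice(len(vocab), p=topic_word[z])  # pick a word from that topic
        words.append(vocab[w])
    print(" ".join(words))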

Various Python packages and libraries for natural language processing were used, including: the Natural Language Toolkit (NLTK) for processing the data set; scikit-learn to prepare and fit the LDA model; pyLDAvis to display the results; and t-Distributed Stochastic Neighbor Embedding (t-SNE) to map topic distances

The project pipeline involved: data set processing; conversion of words and documents into a document-term matrix and vector space; fitting the LDA models; and displaying the results

Processing. The data set was decomposed into 199 documents from its constituent web pages.  In contrast to the wordcloud project, the set of stopwords was enlarged to find meaningful insights in the NGS archive

  • The core set of stopwords consisted of commonly-used prepositions, conjunctions, and contractions.  Stopwords from the wordcloud application were used as a starting point for this purpose
  • Since the archive consisted of first or second hand accounts, words related to stories and/or storytelling were added to stopwords, along with words related to the maintenance of the thread
  • Since spontaneous experiences can occur at any moment, words conveying times were removed.  Because many experiences were singular events, numeric references involving cardinal (e.g. one, two) and ordinal (e.g. first, second) terms were also removed
  • Titles of persons were removed (e.g. Mr., Mrs., etc.); however, person and gender types (e.g. man, woman, etc.) and interpersonal relationships (e.g. family, friends, or strangers) were preserved
  • Domain-related words relating to patient care or standard procedures were removed (e.g. hospital, unit, shift, staff, work, station, monitor, code); a sketch of the expanded stopword set follows this list
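
The sketch below illustrates, under stated assumptions, how such an expanded stopword set could be built.  The core list is assumed to be NLTK’s English stopwords, and the added words are examples drawn from the categories above rather than the project’s exact list.

    # Assumes NLTK's English stopword list as the core set; the added words
    # are examples from the categories described above, not the exact list used.
    from nltk.corpus import stopwords   # requires: nltk.download("stopwords")

    core = set(stopwords.words("english"))        # prepositions, conjunctions, contractions
    extra = {
        "story", "stories", "thread", "post",      # storytelling / thread maintenance
        "night", "morning", "time", "one", "two",  # temporal and numeric references
        "first", "second", "mr", "mrs",            # rankings and titles of persons
        "hospital", "unit", "shift", "staff",      # patient care / standard procedures
        "work", "station", "monitor", "code",
    }
    ngs_stopwords = core | extra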

Conversion. Vector transformations converted the data set into a document-term matrix for mathematical processing.  The rows of the matrix correspond to documents with columns corresponding to the frequency of a term

  • Count vectorizers count word frequencies.  Term Frequency-Inverse Document Frequency (TF-IDF) vectorizers normalize (divide) word counts by their frequency in the documents
  • Both vectorizers converted words to lower case and removed non-word expressions. The vectorizers were parameterized to look for bigrams (or words that were often used together); a sketch of both vectorizers follows this list
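
A minimal sketch of the two vectorizations, assuming scikit-learn’s CountVectorizer and TfidfVectorizer, is shown below.  The variable names (docs for the 199 page-level documents, ngs_stopwords for the expanded stopword set) are assumptions for illustration.

    # Assumes `docs` is the list of 199 page-level documents and
    # `ngs_stopwords` is the expanded stopword set from the processing step.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

    count_vec = CountVectorizer(
        lowercase=True,                  # convert words to lower case
        ngram_range=(1, 2),              # unigrams and bigrams (words often used together)
        stop_words=list(ngs_stopwords),  # expanded stopword set
    )                                    # default token pattern drops non-word expressions
    tfidf_vec = TfidfVectorizer(
        lowercase=True,
        ngram_range=(1, 2),
        stop_words=list(ngs_stopwords),
    )

    # Document-term matrices: rows are documents, columns are term frequencies
    count_dtm = count_vec.fit_transform(docs)   # raw counts
    tfidf_dtm = tfidf_vec.fit_transform(docs)   # counts normalized by document frequency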

Model Fit/Display. The LDA model was fitted using ten topics.  Words within topics were sorted and ranked with respect to their frequency in and relevance within a topic

  • The LDA model was fitted using both Count and TF-IDF vectorization and run with a maximum of 100 iterations.  LDA model results were displayed using pyLDAvis, with t-SNE to map topic distances (see the sketch after this list)
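
The sketch below shows how this step could look with scikit-learn and pyLDAvis, under stated assumptions: count_dtm and count_vec come from the vectorization sketch above, the random seed and output file name are arbitrary, and the pyLDAvis bridge module is named pyLDAvis.sklearn in older releases (pyLDAvis.lda_model in newer ones).

    # Assumes `count_dtm` and `count_vec` from the vectorization sketch above.
    from sklearn.decomposition import LatentDirichletAllocation
    import pyLDAvis
    import pyLDAvis.sklearn      # renamed pyLDAvis.lda_model in newer pyLDAvis releases

    lda = LatentDirichletAllocation(
        n_components=10,         # ten topics
        max_iter=100,            # maximum of 100 iterations
        random_state=42,         # arbitrary seed, assumed for reproducibility
    )
    lda.fit(count_dtm)

    # Interactive pyLDAvis panel, with t-SNE used to map inter-topic distances
    panel = pyLDAvis.sklearn.prepare(lda, count_dtm, count_vec, mds="tsne")
    pyLDAvis.save_html(panel, "ngs_lda_count.html")   # illustrative output file name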

Results. Although topics produced from the model are unlabeled, words within topics usually can be woven into a coherent theme

The first four pyLDAvis graphs provide the top 30 words and bigrams in Topics 1 through 4 using Count vectorization  

  • Topic 1 is the most representative of the body of stories in the thread and generated around 86% of the content.  Words in Topic 1 included “nurse” and “patient”; both nurses and patients were percipients and sometimes sources of “ghost” experiences.  If apparitions represented unrecognized persons, patients “asked” whom they “saw.”  Many apparitional encounters involved patients who were “heard” “talking” to deceased “family” members or a “friend.”  These telepathic types of apparitions were often described as “sitting” near the bedsides of patients, or transiting their rooms or an adjacent “hall” on their “floor.”  Overall, this could be considered an apparitional experiences topic

  • Topic 2 is derived from user commentary and seems reflective of internal varieties of psi functioning. Words in Topic 2 included “dreams”, “feel(ings)” and a “sense” of awareness or presentiment of events that were happening or about to “happen”, usually in connection with the deaths of family members. In other cases the dreams were possible telepathic connections with lost “loved” ones. Overall, this can be considered an extrasensory perception topic, and it generated around 7% of the content
  • Topic 3 appears reflective of external forms of psi and survival phenomena, including auditory and physical encounters commonly associated with hauntings and poltergeists.  Words in Topic 3 included “haunted”, “voice(s)”, and other imitative sounds such as “music.”  There were also reported instances of anomalous telephone contact, possibly involving “phone” calls from the dead, and “strange” behaviors of televisions, call lights and other electrical appliances.  Overall, this could be considered a hauntings and poltergeists topic, and it generated around 4% of the content
  • Topic 4 is also derived from user commentary and seems reflective of general discussions on paranormal, religious and exceptional experiences.  Discussions included “paranormal” television, “movie” and “radio” entertainment; synchronicities (meaningful coincidences); and “photo” and other evidence from paranormal investigations.  Discussions also involved ghost stories outside a nursing context; some were urban legends and a few were probably larks.  Overall, this could be considered a paranormal discussions topic, and it generated around 3% of the content

The fifth pyLDAvis graph provides the top 30 words in Topic 1 using TF-IDF vectorization.  

  • The findings were close to those encountered for Topic 1 with Count vectorization.  However, it appears to be a combined apparitional experiences and extrasensory perception topic accounting for 94% of the content.  This consolidation arises from the fact that TF-IDF vectorization lowers the contribution weight of commonly used words

This project again demonstrates the usefulness of topic generation models for finding meaningful patterns in masses of unlabeled or unstructured data.

The LDA topic discovery method indicated several varieties of psi and survival experiences that went beyond ghost stories.

  • Many apparitional encounters described in the archive represented the intersection of nearing death awareness (involving death-bed visions of welcoming apparitions) and after-death communication experiences (involving apparitions of deceased family members and friends)
  • Even though the algorithm knows nothing intrinsically about the above experiences, the model was able to infer topics and words corresponding to the most representative kinds of encounters 

Greater insights could be gained by structuring the NGS dataset and labeling the experiential elements within it.  Follow-on research could employ semi-supervised methods to train models to classify types of psi and survival experiences and to find correlates within them  

Specifically, deep learning models could be trained on the semantics around typologies of apparitions with tagged documents.  Parapsychology categorizes apparitions along four lines: living agent; crisis; post-mortem; and haunting  

  • If an apparition is seen within ±12 hours of a person’s death, that represents a crisis apparition 
  • If an apparition is seen 24 hours or more after a person’s death, that apparition is post-mortem
  • If the apparition is of a long-deceased person and has a location affinity, that is a haunting apparition

Nonetheless, the apparitional experiences in NGS appear roughly consistent with survey results elsewhere.  Apparitional experiences rarely occur in the general population, but when they do, the apparitions are likely to represent recognized persons, known to the individuals who are perceiving them

REFERENCES

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

Gauld, A., & Cornell, A. D. (1979). Poltergeists. Routledge & Kegan Paul.

Kircher, P. and Callanan, M. (2017, Dec 14).  NDEs and Nearing Death Awareness in the Terminally Ill. International Association for Near Death Studies (IANDS).

Natural Language Toolkit: NLTK 3.2.5 documentation. (2017, Sep 24). NLTK Project.

Pearson, P. (2014). Opening Heaven’s Door: What the Dying May be Trying to Tell Us about where They’re Going. Random House Canada.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.

Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63-70).

What’s Your Best Nursing Ghost Story? (2017, Oct 30). AllNurses.com

IMAGES

pyLDAvis Graph of Topic 1 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 2 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 3 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 4 (Count Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 1 (TF-IDF Vectorization) from Nursing Ghost Stories Corpus. (2018, Apr 08). © Maryland Paranormal Research ®.  All rights reserved.

Exploring Phantasms of the Living (1886) through Machine Learning: Topic Discovery with Latent Dirichlet Allocation

NOTE: Click to open graphics for an expanded and clearer view of the findings they contain  

Phantasms of the Living, published in 1886 by the Society for Psychical Research (SPR), was a landmark study that presented the case for “telepathy” or thought transference from mind to mind.  The study consisted of 702 cases spanning over 1400 pages that considered several varieties of spontaneous telepathic experiences, collectively referred to as “phantasms of the living”

  • The case collection examined non-sensory and internalized impressions, many of which were presentiment experiences involving dreams, clairvoyance, visions, feelings or an awareness in connection with the deaths of family members or friends.  These experiences often coincided with the approximate time of death
  • Cases also considered sensory and externalized impressions, in particular apparitional representations of living persons who were perceived to be in moments of crisis or danger.  These situations appeared evidential of shock-induced forms of thought transference from a distressed agent to a percipient in the form of telepathic hallucinations

As a follow-on to the earlier wordcloud project, we wondered whether unsupervised machine learning could discover main topics within Phantasms of the Living.  For the project, two varieties of generative topic models were used: Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA)

Both models view documents as having a latent semantic structure of topics that can be inferred from co-occurrences of words in documents. The mathematics underlying both models are beyond the scope of this post, but on an intuitive level there are key differences between the two methods

  • pLSA views topics as probability distributions over words. Each word is generated by a single topic, and topics are seen as conditionally independent across the documents that produced them.  Non-Negative Matrix Factorization (NMF) was used here as the method for finding pLSA-style topic clusters
  • LDA by contrast views documents as probability distributions over topics and topics as probability distributions over words.  All documents share the same collection of topics, but each document contains those topics in different proportions 

The project used various Python packages and libraries for natural language processing, including: the Natural Language Toolkit (NLTK) for processing the data set; scikit-learn to prepare and fit the LDA and NMF models; pyLDAvis to display the results; and t-Distributed Stochastic Neighbor Embedding (t-SNE) to map topic distances

The end-to-end project pipeline involved: data set processing; conversion of words and documents into matrix and vector space; fitting the LDA and NMF models; and then displaying the results

Processing. The book was decomposed into several documents from its constituent sections, chapters and volumes.  Stopwords, such as common prepositions and conjunctions, were removed, starting from the stopword set used in the earlier wordcloud project

  • Since telepathic experiences are spontaneous and can occur at any time or place, words conveying times and locations were removed as well as ordinal and cardinal types of rankings
  • Nouns or titles representing persons were removed (e.g. man, woman, Mr., Mrs., etc.); however, interpersonal relationships were preserved (i.e. family, friends, acquaintances or strangers)  

Conversion. Vector transformations converted the data set into a document-term matrix for mathematical processing  

  • The rows of the matrix correspond to documents with columns corresponding to the frequency of a term.  Count vectorizers count word frequencies.  Term Frequency-Inverse Document Frequency (TF-IDF) vectorizers normalize (divide) word counts by their frequency in the documents 
  • Both vectorizers converted words to lower case and removed non-word expressions. The vectorizers were also instructed to look for bigrams (or words that were often used together) such as “thought-transference” and “telepathic hallucination”

Model Fit/Display. The LDA and NMF models were fitted using ten topics.  Words within topics were sorted and ranked with respect to their frequency in and relevance within a topic

  • The LDA model was fitted using both Count and TF-IDF vectorization and run with a maximum of 10 iterations.  LDA model results were displayed using pyLDAvis, with t-SNE to map topic distances
  • The NMF model was fitted with TF-IDF vectorization only and run with a maximum of 200 iterations. NMF model results were displayed via spreadsheet (see the sketch after this list)
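
The sketch below shows, under stated assumptions, how the NMF fit and the spreadsheet-style display of top words could look with scikit-learn.  Here tfidf_dtm and tfidf_vec are assumed to come from a TF-IDF vectorization of the Phantasms documents, analogous to the vectorizer sketch in the earlier post; the seed and file name are illustrative.

    # Assumes `tfidf_dtm` and `tfidf_vec` from a TF-IDF vectorization step.
    import csv
    from sklearn.decomposition import NMF

    nmf = NMF(n_components=10, max_iter=200, random_state=42)  # ten topics, 200 iterations
    nmf.fit(tfidf_dtm)

    terms = tfidf_vec.get_feature_names_out()   # get_feature_names() in older scikit-learn
    with open("phantasms_nmf_top_words.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for topic_idx, weights in enumerate(nmf.components_):
            top = weights.argsort()[::-1][:10]  # indices of the ten highest-weighted terms
            writer.writerow([f"Topic {topic_idx}"] + [terms[i] for i in top])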

In unsupervised machine learning the data are unlabeled, hence the topics produced by both models were also unlabeled.  However, words within topics can often be woven into a coherent theme

The first two pyLDAvis graphs provide the top 30 words and bigrams in Topics 1 and 2 using Count vectorization  

  • Words in Topic 1 include: “dreams”; “visions”; “impressions”; and “experiences” in connection with the “death”(s) of family members and friends.  This can be considered a “presentiment” topic.  This topic contained 67% of the top terms.  This mirrors results from the prior wordcloud project  
  • Words and bigrams in Topic 2 include: “thought-transference”, “hallucination(s)”, “phantasms”, “mind(s)”, “percipients”, “agent” and “telepathy.”  This can be considered a “crisis apparitions” topic.  This topic contained 27% of the top terms 

The third pyLDAvis graph provides the top 30 words in Topic 1 using TF-IDF vectorization

  • Topic 1 combines all the aforementioned words into one topic.  This can be considered a “presentiment and crisis apparitions” topic.  This topic contained 95% of the top terms, rendering all other topics comparatively insignificant in generating the documents

The spreadsheets compare the LDA and NMF model runs using TF-IDF vectorization, with results limited to the top 10 words per topic.  Although topic weights and distances are not available, some topics appear more meaningful and cohesive, and are likely more impactful than others

  • There is considerable overlap between topics 6 and 7 in the LDA model and together they form the presentiment and crisis apparitions topic. Topics 0 and 1 in the NMF model respectively appear to correspond to presentiment and crisis apparitions topics
  • The bigram “thought-transference” arises in both the LDA and NMF results and is associated with the “Society” for “Psychical” Research and the late F.W.H. “Myers”, who coined the term “telepathy”

This project had an extended preparation and production pipeline.  However, results clearly show that unsupervised machine learning using LDA and NMF effectively and comprehensively summarized topical content in Phantasms of the Living.  Moreover, the topics approximately corresponded to the types of internalized and externalized telepathic experiences described in the book  

This project demonstrates the usefulness of topic generation models for finding meaningful patterns in masses of unlabeled or unstructured data.  Moreover, visualization and graphing tools are essential for fully comprehending these patterns. Elsewhere in parapsychology LDA or NMF could also be applied to survey data, case collections, web or social media content of interest.  

REFERENCES

Anaya, L. A. (2011). Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers. University of North Texas.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

Christou, D. (2016). Feature extraction using Latent Dirichlet Allocation and Neural Networks: A case study on movie synopses. arXiv preprint arXiv:1604.01272.

Deerwester, S. (1988). Improving information retrieval with latent semantic indexing.

Gurney, E., Myers, F. W., & Podmore, F. (1886). Phantasms of the Living (2 vols.). London: Trübner.  Reprinted at the Esalen Center.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825-2830.

Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp. 63-70).

IMAGES:

pyLDAvis Graph of Topic 1 (Count Vectorization) from Phantasms of the Living Corpus. (2018, Mar 24). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 2 (Count Vectorization) from Phantasms of the Living Corpus. (2018, Mar 24). © Maryland Paranormal Research ®.  All rights reserved.

pyLDAvis Graph of Topic 1 (TF-IDF Vectorization) from Phantasms of the Living Corpus. (2018, Mar 24). © Maryland Paranormal Research ®.  All rights reserved.

Top Words List: Latent Dirichlet Allocation (TF-IDF Vectorization) from Phantasms of the Living Corpus. (2018, Mar 24). © Maryland Paranormal Research ®.  All rights reserved.

Top Words List: Non-Negative Matrix Factorization (Frobenius) from Phantasms of the Living Corpus. (2018, Mar 24). © Maryland Paranormal Research ®.  All rights reserved.

Presentiment and Crisis Apparitions in Victorian-era Case Collections:  Phantasms of the Living (1886) as seen through Natural Language Processing

Presentiment in the classical sense is a form of precognition involving a feeling, perception or premonition that something will happen or is about to happen.  In experimental parapsychology, the term now refers to an effect involving capacities to feel or intuit the future.  This modern sense of presentiment (feeling) sets it apart from precognition (knowing)

Phantasms of the Living, published in 1886 by the Society for Psychical Research (SPR), contains a collection of 702 Victorian-era cases involving classical presentiment phenomena occurring in various forms, including dreams and premonitions, clairvoyance, and crisis apparitional experiences

Phantasms was also a pathfinding study on extrasensory perception, as it introduced the term “telepathy.”  The authors believed that presentiment experiences involving living persons in moments of crisis or danger, including crisis apparitions, were evidence of “shock-induced” forms of thought transference

As a project, we wondered how machine learning might make sense of classical presentiment experiences in the Phantasms case collection

Natural Language Processing (NLP) is a field of computing that enables computers to analyze, understand and communicate human language.  

The Natural Language Toolkit (NLTK) is a platform of libraries and programs for natural language processing written in the Python programming language

Word clouds are the most basic and familiar NLP products.  Word clouds do not support fine-grained analysis, but instead provide a visualization of key words and phrases, where the sizing of words reflects their prominence within the text

Phantasms is available in various Internet archives in a variety of digital formats. For this project, a corpus (body) of plain text was created from a reprint of Phantasms hosted at the Esalen Center 

  • To derive meaningful insights the corpus was processed to remove stop words, such as commonly used prepositions.  Since the work represents a case collection, the set of stop words was expanded to remove references to “case(s)” and “fact(s)” and how they were documented (i.e. words like “said” and “told”).  Formal salutations (“Mr”, “Mrs”, “Miss”) were also added to the stop words
  • Since presentiment experiences can occur at any time, words having a temporal character were excluded, including “day”, “night”, “morning”, “evening”, “hour”, and “time.”  Many presentiment experiences were singular events, hence ordinal and cardinal numeric references such as “first”, “second”, “one” and “two” were also removed
  • Finally, some words often associated with precognition, such as “will” and “may”, were also added to the set of stop words to allow greater emphasis on the features of presentiment experiences (a sketch of the word cloud generation follows this list)
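
A minimal sketch of the word cloud generation is shown below, assuming the wordcloud package.  The variable corpus_text stands for the plain-text Phantasms corpus, and the added stop words are examples taken from the bullets above rather than the project’s exact list.

    # Assumes `corpus_text` holds the plain-text Phantasms corpus; added stop
    # words are examples from the categories above, not the exact list used.
    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt

    stop_words = set(STOPWORDS) | {
        "case", "cases", "fact", "facts", "said", "told",      # case documentation terms
        "mr", "mrs", "miss",                                    # formal salutations
        "day", "night", "morning", "evening", "hour", "time",   # temporal words
        "first", "second", "one", "two",                        # ordinal/cardinal references
        "will", "may",                                          # precognition-leaning modals
    }

    wc = WordCloud(width=1200, height=800, background_color="white",
                   stopwords=stop_words, collocations=True)     # keep frequent bigrams
    wc.generate(corpus_text)

    plt.imshow(wc, interpolation="bilinear")   # word size reflects prominence in the text
    plt.axis("off")
    plt.show()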

The resulting word cloud presents a case pattern of presentiment experiences involving families and friends.  Mothers featured prominently in case accounts often as the percipient, and in other instances as the agent (or source) of the presentiment experience  

Many presentiment experiences were in the form of dreams or premonitions that were coincident with the death of loved ones. However, the case collection also included other forms of extrasensory perception (e.g. telepathy, clairvoyance and precognition) as well as crisis apparitional experiences

The fact that mothers featured so prominently in these presentiment cases suggests a wider question, which arises elsewhere in parapsychology, of whether there is a natural (in the sense of evolutionary) explanation for presentiment phenomena and for psi functioning writ large. Toward this end, one of the authors of Phantasms concludes:

“If the natural system includes telepathy, Nature has certainly not exhausted herself in our few hundreds of instances: that these facts should be genuine would be almost inconceivable if she had not plenty more like them in reserve”

REFERENCES

Bem, D. J. (2011). Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. Journal of personality and social psychology, 100(3), 407.  Hosted on Semantic Scholar

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the Natural Language Toolkit. O’Reilly Media, Inc.

Gurney, E., Myers, F. W. H., & Podmore, F. (1886). Phantasms of the living (Vols. 1-2). London: Trübner & Co. Reprint by the Esalen Center, Carmel CA

Radin, D. (2016). Presentiment. Psi Encyclopedia, Society for Psychical Research

IMAGE

Wordcloud from Phantasms of the Living Corpus. (2018, Feb 24). © Maryland Paranormal Research ®.  All rights reserved.

“Hello From Heaven!” on ABC’s 20/20 (April 12, 1996). Documentary on After-Death Communication

Archival documentary footage from ABC’s 20/20 on After-Death Communication (ADC) experiences. Explores personal stories of ADC encounters and the incidence of ADC experiences in the general population

Why ghosts are good for you: Patricia Pearson at TEDx Tucson 2012

Award-winning journalist Patricia Pearson discusses her research on after-death communication and nearing death awareness experiences in advance of her book Opening Heaven’s Door: What the Dying May be Trying to Tell Us about where They’re Going (Random House Canada, 2014).

REFERENCE:

Pearson, P. (2014). Opening Heaven’s Door: What the Dying May be Trying to Tell Us about where They’re Going. Random House Canada.

Opening Heaven’s Door: What the Dying Are Trying to Say About Where They’re Going

Memoirs and research by award-winning journalist Patricia Pearson on after-death communication and nearing death awareness experiences, including crisis, grief and welcoming apparitions; terminal lucidity; and sensed presences.

The book explores the literature on death and dying in depth and explains shortcomings in understanding these experiences in clear terms. 

The book was also recently presented by the author in a Parapsychology Foundation book seminar.  See also the TEDx Talks presentation, Why ghosts are good for you, for her perspectives and insights on these phenomena.

REFERENCE:

Pearson, P. (2014). Opening Heaven’s Door: What the Dying May be Trying to Tell Us about where They’re Going. Random House Canada.