BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Berkeley Graduate Division - ECPv6.15.18//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Berkeley Graduate Division
X-ORIGINAL-URL:https://grad.berkeley.edu
X-WR-CALDESC:Events for Berkeley Graduate Division
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20220313T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20221106T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20230312T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20231105T020000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T020000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T020000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20230927T140000
DTEND;TZID=America/Los_Angeles:20230927T170000
DTSTAMP:20260409T164336Z
CREATED:20230919T235215Z
LAST-MODIFIED:20230919T235215Z
UID:41347-1695823200-1695834000@grad.berkeley.edu
SUMMARY:Python Text Analysis Fundamentals: Part 2
DESCRIPTION:This two-part workshop series will prepare participants to move forward with research that uses text analysis\, with a special focus on humanities and social science applications.\n\nPart 1: Preprocessing Text. How do we standardize and clean text documents? Text data is noisy\, and we often need to develop a pipeline to standardize the data and better facilitate computational modeling. In the first part of this workshop\, we walk through possible steps in this pipeline using tools from basic Python\, NLTK\, and spaCy to preprocess and tokenize our data.\nPart 2: Bag-of-words Representations. How do we convert text into a representation that we can operate on computationally? This requires developing a numerical representation of the text. In this part of the workshop\, we study one of the foundational numerical representations of text data: the bag-of-words model. This model relies heavily on word frequencies to characterize text corpora. We build bag-of-words models and their variations (e.g.\, TF-IDF)\, and use these representations to perform classification on text.\n\nTo continue with Text Analysis\, sign up for Topic Modeling or Word Embeddings.\n\nPart 3: Topic Modeling. How do we identify topics within a corpus of documents? In this part\, we study unsupervised learning of text data. Specifically\, we use topic models such as Latent Dirichlet Allocation and Non-negative Matrix Factorization to construct “topics” in text from the statistical regularities in the data.\nPart 4: Word Embeddings. How can we use neural networks to create meaningful representations of words? The bag-of-words model is limited in its ability to characterize text because it does not utilize word context. In this part\, we study word embeddings\, which were among the first attempts to use neural networks to develop numerical representations of text that incorporate context. We
 learn how to use the package gensim to construct and explore word embeddings of text.\n\nThe first two parts are taught as a joint series. Parts 3 and 4 can be attended “à la carte”; however\, prior knowledge of Parts 1 and 2 is assumed.\nPrerequisites: D-Lab’s Python Fundamentals introductory series or equivalent knowledge.\nWorkshop Materials: https://github.com/dlab-berkeley/Python-Text-Analysis\nSoftware Requirements: Installation Instructions for Python Anaconda
URL:https://grad.berkeley.edu/event/python-text-analysis-fundamentals-part-2/
LOCATION:Online via Zoom
END:VEVENT
END:VCALENDAR