JSTOR Labs is developing a new text mining platform for JSTOR, its sister organization Portico, and other corpora. While text mining has the potential to revolutionize research across disciplines, it requires coding skills and statistical knowledge that can take years to learn. JSTOR Labs has sought to address this problem with a new platform for creating, visualizing, and linking datasets within a hosted JupyterHub environment that includes popular code packages for topic modeling, sentiment analysis, and more. The platform lets users start text mining without the hassle of configuring an environment. It also provides common infrastructure for teaching text mining: the platform will feature a library of open educational resources (Jupyter notebooks with accompanying lesson plans) that will make text mining easier to teach and learn without hiding its complexity or nuance.
2. At ITHAKA, our passion drives us to make the world smarter. Our mission comes to life in four service areas.
JSTOR: One of the world's leading academic databases, JSTOR powers the research and learning of 6 million users each month.
Ithaka S+R: Ithaka S+R provides research and strategic guidance to help the academic and cultural communities serve the public good and navigate economic, technological, and demographic change.
Portico: Portico, a community-supported digital archive, preserves over 1 million e-books and e-journals for future scholars.
Artstor: Artstor provides 2+ million high-quality images and digital asset management software to enhance scholarship and teaching.
3. JSTOR Labs works with partner publishers, libraries, and labs to create tools for researchers, teachers, and students that are immediately useful – and a little bit magical.
9. Other kinds of infrastructure…
We are working with NISO to initiate a group to define a standard for non-consumptive text analytics datasets
We received an NEH grant to launch:
The Text Analytics Pedagogy Institute
Spring 2021 at U of Virginia
Summer 2022 at Arizona U
As a quick reminder, Portico and JSTOR are members of a family of services offered by ITHAKA. I suspect JSTOR is well known to all of you. Portico is less well known; it is a dark archive. We calculate that we are preserving about half of all published scholarly research, or at least a little over half of the content registered with Crossref.
I should remind everyone that JSTOR has a free text mining site, called Data for Research, that has been available for the past decade. The site is used quite a bit, but it provides only the basics, and its corpus is limited to JSTOR content.
We have learned a lot from the text and data mining requests we receive, and they have informed the next iteration of what we are building.