20140106 qu seminar
1. Digital Libraries: History, Technology, R&D
Edward A. Fox
Professor, Computer Science, Virginia Tech
Blacksburg, VA 24061 USA
fox@vt.edu
http://fox.cs.vt.edu
6 Jan. 2014
2. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
3. Sponsored by Qatar University & Qatar National Library
HTTP://qnl.qa
HTTP://WWW.QU.EDU.QA/
Funding provided through the ELISQ project:
Electronic Library Institute - SeerQ
HTTP://WWW.VT.EDU/
HTTP://WWW.PSU.EDU/
HTTP://WWW.TAMU.EDU/
4. ELISQ Project Team
Qatar University, Qatar:
Mohammed Samaka (Ph.D., Co-Lead PI)
Sumaya Ali S A Al-Maadeed (Ph.D., PI)
Myrna Tabet
Asad Nafees
Tahseena Moideen
Qatar National Library, Qatar:
Claudia Lux (PI)
Krishna Roy Chowdhury
Postdoc - TBA
Virginia Tech, USA:
Edward Fox (Ph.D., Lead-PI)
Tarek Kanan
Penn. State University, USA:
C. Lee Giles (Ph.D., PI)
Sagnik Ray Choudhury
Texas A&M, USA:
Richard Furuta (Ph.D., PI)
Hamed Alhoori
Consultants:
John Impagliazzo (Ph.D., Key Investigator)
Susan Lukesh (Ph.D.)
Carole Thompson
This project was made possible by NPRP Grant # 4-029-1-007 from the Qatar National Research Fund (a member of Qatar Foundation).
5. Acknowledgements
Dr. Mazen Hasna, VP and Chief Academic Officer, Qatar University
Dr. Rashid Alammari, Dean, College of Engineering, Qatar University
Dr. Moumen Hasnah, Director of Academic Research, Qatar University
Dr. Claudia Lux, Qatar National Library Director
Dr. Imad Bachir, Qatar University Library Director
Dr. Munir Tag, Acting Director Technical, ICT Program Manager (QNRF)
Ms. Krishna Roy Chowdhury, Associate Director for Library IT, Qatar National Library
Prof. Sebti Foufou, Head of Department of Computer Science and Engineering, Qatar University
6. Additional Thanks
Qscience - providing collection:
Christopher J. Leonard, Editorial Director
Paul Coyne, CTO
US National Science Foundation (recent and current grants to Fox):
IIS-1319578
IIS-0916733
DUE-0840719
OCI-1032677
plus those to PSU, TAMU
7. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
8. Introduction
Reasons to be here
o Interested
o Find what to do with your content
o Find how to help your user community
http://www.morganclaypool.com/toc/icr/1/1
  1. DL Introduction, 5S framework (2012)
  2. DL Quality, Integration (2013)
  3. DL Technologies (in press)
  4. DL Applications (in press)
18. Informal 5S DL Definitions
DLs are complex systems that:
o help satisfy info needs of users (societies)
o provide info services (scenarios)
o organize info in usable ways (structures)
o present info in usable ways (spaces)
o communicate info with users (streams)
19. Information Life Cycle
[Figure: information life cycle diagram with stages Authoring, Modifying, Using, Creating, Organizing, Indexing, Retention / Mining, Storing, Retrieving, Accessing, Filtering, Distributing, Networking]
21. SeerSuite is Not Google
Metadata (as in library catalogs) as well as content
Sets of collections, rather than the Web as a whole
o Provided by a curator (e.g., publisher, museum)
o Provided by user submissions
o Or collected by focused ‘crawling’
Tailored services, rather than the same for everyone
o Browsing using categories, preserving, adding value
o Based on studying user requirements, e.g., chemists
Working with entities, rather than just words
o Citations, tables, figures, names, chemical formulae
o Using knowledge bases, machine learning, artificial intelligence
22. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
23. History Overview
Since 1991, esp. from Information Retrieval
Connecting computer, library, and information science communities
NSF DL Initiative 1 in 1994 included funding for Stanford, where Google was prototyped
International conferences in the Americas (JCDL), as well as Europe (TPDL, by DELOS), Asia (ICADL)
Publishers: ACM, …
DOIs, (Institutional) Repositories
Spinoffs: content & courseware management systems
Recently including (linked) data
26. Institutional Repositories
“Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
www.arl.org/sparc/IR/IR_Guide_v1.pdf
27. NDLTD: www.ndltd.org
Networked Digital Library of Theses and Dissertations (NDLTD)
Vision: Every thesis and dissertation in the world is:
o Devised to take advantage of the most helpful electronic publishing methods
o Shared globally and easily found
o Supported by a suite of digital library services to aid authors, researchers, learners, universities
o Preserved and migrated permanently
28. Crisis, Tragedy, and Recovery (CTR) Network / Integrated Digital Event Archive & Library (IDEAL)
Human tragedies that result from man-made and natural events affect humans and communities significantly.
During and after a tragic event, there is a series of needs that have to be addressed.
o Compounded by communication failures and a confusing plethora of data and information
30. CTRnet (Crisis, Tragedy & Recovery Net)
Word Clouds of Japan Earthquake and Libya Revolution (using tweets)
Updated every 10 minutes
[Figure: word clouds titled "Japan Earthquake, Tsunami Disaster" and "Libya Revolution"]
33. CINET: Network Science Middleware
Netviz: Course project aims to develop a visualization component for CINET, which contains large network graphs.
The visualization service will get networks from CINET, convert them from Galib to GEXF format, then visualize the graphs using Gephi.
[Figure: CINET network displayed using Gephi]
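The conversion step in the Netviz pipeline can be sketched roughly as follows. The slide does not describe the Galib input format, so this sketch assumes a plain edge list (one "source target" pair per line) as a stand-in; the function name `edge_list_to_gexf` is hypothetical. It writes a minimal GEXF 1.2 document of the kind Gephi can open.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def edge_list_to_gexf(edge_lines):
    """Convert 'source target' lines into a minimal GEXF 1.2 XML string.

    Assumption: the edge-list input stands in for the Galib format,
    which the slide does not specify.
    """
    edges = [tuple(line.split()) for line in edge_lines if line.strip()]

    # Assign numeric node ids in order of first appearance.
    node_ids = {}
    for src, dst in edges:
        for name in (src, dst):
            if name not in node_ids:
                node_ids[name] = str(len(node_ids))

    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "undirected"})

    nodes_el = ET.SubElement(graph, "nodes")
    for name, node_id in node_ids.items():
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": name})

    edges_el = ET.SubElement(graph, "edges")
    for i, (src, dst) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(i), "source": node_ids[src], "target": node_ids[dst]},
        )
    return ET.tostring(gexf, encoding="unicode")

# Toy three-node network (hypothetical data).
xml_text = edge_list_to_gexf(["a b", "b c"])
```

Saving `xml_text` to a file with a `.gexf` extension gives something that can be loaded for visualization; layout and styling are left to the viewer.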
34. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
35. Web Archiving
Introduction: Web archiving is the process of gathering up data recorded on the World Wide Web, storing it, ensuring the data is preserved in an archive, and making the collected data available for future research.
The Internet Archive and several national libraries initiated Web archiving practices in 1996.
36. Crawler (Heritrix) (for search engines & Web archives)
A Web crawler starts with a list of URLs to visit, called the seeds.
On those pages, it:
o identifies all the hyperlinks
o adds them to the list of URLs to visit
o recursively visits pages pointed to according to a set of policies.
Prioritizes its downloads: some pages change often.
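The crawl loop described above can be sketched in a few lines. This is a minimal illustration, not Heritrix itself: the `fetch` callback, the `max_pages` limit (standing in for the "set of policies"), and the toy `PAGES` dictionary are all assumptions made so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from the seeds.

    fetch(url) returns the page HTML, or None if the page is unavailable.
    max_pages is a simple stand-in for a crawl policy.
    """
    frontier = deque(seeds)      # list of URLs still to visit
    seen = set(seeds)            # URLs already queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Toy in-memory "Web" (hypothetical pages).
PAGES = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a>',
    "http://example.org/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
order = crawl(["http://example.org/"], PAGES.get)
```

A production crawler would also honor robots.txt, rate-limit requests per host, and revisit frequently changing pages more often; none of that is shown here.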
37. Focused Crawlers
For a particular topic or event, build a Web collection focused in that area
Start with URLs of interest, viewed as seeds to grow from
Expand in a ‘smart’ way to get all and only what is relevant
Use information retrieval / artificial intelligence / machine learning
o Require ‘knowledge bases’ and/or human training examples
Nevertheless, there is a tradeoff between the resulting
o Recall (i.e., coverage of what is out there)
o Precision (i.e., freedom from noise in what is collected)
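The recall/precision tradeoff can be illustrated with a small best-first crawl. This is a sketch under stated assumptions: the link graph `LINKS`, the set `RELEVANT`, and the keyword-based `score` function are hypothetical stand-ins for real fetching and for a trained relevance classifier.

```python
import heapq

def focused_crawl(seeds, links, score, budget):
    """Best-first crawl: always expand the highest-scoring known URL next.

    links: dict mapping url -> list of outgoing urls (stands in for fetching)
    score: function url -> relevance estimate in [0, 1]
    budget: maximum number of pages to collect
    """
    heap = [(-score(u), u) for u in seeds]   # negate: heapq is a min-heap
    heapq.heapify(heap)
    seen = set(seeds)
    collected = []
    while heap and len(collected) < budget:
        _, url = heapq.heappop(heap)
        collected.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (-score(nxt), nxt))
    return collected

def precision_recall(collected, relevant):
    """Precision = hits / collected; recall = hits / relevant."""
    collected, relevant = set(collected), set(relevant)
    hits = len(collected & relevant)
    precision = hits / len(collected) if collected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical link graph and ground truth for an earthquake collection.
LINKS = {
    "seed": ["quake-1", "sports-1"],
    "quake-1": ["quake-2", "ads-1"],
    "sports-1": ["sports-2"],
    "quake-2": ["quake-3"],
}
RELEVANT = {"seed", "quake-1", "quake-2", "quake-3"}

def score(url):
    return 1.0 if url == "seed" or "quake" in url else 0.1

small = precision_recall(focused_crawl(["seed"], LINKS, score, budget=3), RELEVANT)
large = precision_recall(focused_crawl(["seed"], LINKS, score, budget=6), RELEVANT)
```

With the small budget the collection is clean but incomplete; with the larger budget every relevant page is found, but off-topic pages are collected as well.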
38. SeerSuite Instantiations
CiteSeerX
http://citeseerx.ist.psu.edu
A scientific literature digital library and search engine
ChemXSeer
http://chemxseer.ist.psu.edu
Portal for researchers in environmental chemistry
integrating the scientific literature with experimental,
analytical, and simulation results and tools
ArchSeer
http://archseer.ist.psu.edu/
Archeology literature
TableSeer
ANY fields with tables
39. CiteSeerX
http://citeseerx.ist.psu.edu
CiteSeerX crawls researcher homepages on the web for scholarly papers, formerly in computer science
Converts PDF to text
Automatically extracts OAI metadata and other data
Automatic citation indexing, links to cited documents, creation of document page, author disambiguation
Software is open source and can be used to build other such tools
3 M documents
Millions of files
60 M citations
3 to 6 M authors
2 to 4 M hits per day
100K documents added monthly
800K individual users
several Tbytes
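The idea behind citation indexing, linking each document to the documents that cite it, can be sketched as an inverted mapping. This is only an illustration: the paper identifiers are hypothetical, and the hard parts of a real system (parsing citation strings out of PDF text and matching them to documents) are assumed to have been done already.

```python
from collections import defaultdict

def build_citation_index(references):
    """Invert a reference list into a "cited by" index.

    references: dict mapping paper_id -> list of cited paper_ids
    returns: dict mapping cited paper_id -> sorted list of citing paper_ids
    """
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return {cited: sorted(citing) for cited, citing in cited_by.items()}

# Hypothetical reference lists extracted from three papers.
REFS = {
    "p1": ["p2", "p3"],
    "p2": ["p3"],
    "p4": ["p3", "p2"],
}
index = build_citation_index(REFS)
citation_counts = {paper: len(citers) for paper, citers in index.items()}
```

The resulting index supports both "links to cited documents" (forward, from `REFS`) and "cited by" lookups (backward, from `index`), and the counts give a simple citation ranking.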
42. SeerSuite
Toolkit used to build search engines and digital libraries
o CiteSeerX, MyCiteSeerX, ChemXSeer, ArchSeer, AlgoSeer, AckSeer, BizSeer, CSSeer, CollabSeer, RefSeer, GrantSeer, SeerSeer, YouSeer, etc.
Built on commercial grade open source tools (Solr/Lucene)
Penn State expertise: automated specialized metadata extraction
Supports research in
o Indexing and search
o Data mining & structures
o Information and knowledge extraction
o Social networks: Name/entity disambiguation
o Scientometrics/informetrics
o Systems engineering
o User interface design (HCI = human-computer interaction)
o Software engineering and management
43. ChemXSeer Highlights
Portal for academic researchers in chemistry which integrates the scientific
literature with experimental, analytical and simulation results and tools
Provides unique metadata extraction, indexing and searching pertinent to the
chemical literature by using heuristics combined with machine learning
Chemical formulae and names
Tables
Figures
Publication functions as in CiteSeerX
Expert and expertise search.
After extraction, data is stored as API-accessible XML for users.
Hybrid repository: Serves as a federated information interoperational system
Scientific papers crawled and indexed from the web
User-submitted papers and datasets (e.g., Excel worksheets, Gaussian and CHARMM toolkit outputs)
Scientific documents and metadata from publishers, web or archives.
Access control for proprietary provided content and user-submitted
experiment data
Takes advantage of in-house open source projects such as CiteSeerX/SeerSuite.
47. Infrastructure - PSU
Computers, software, launching infrastructure at:
o QU: powerful server, now crawling + ready to help any group interested in curating a collection
o VT, QNL (postdoc), QCRI (Prof. Mitra), …
Adapt to disciplines, interesting parts of documents
Adapt to each collection
o Develop knowledge base and heuristics for the collection
o Change document parser
o Change database to match what occurs
o Change extractors: document -> database
48. Arabic - VT
Handle Arabic text documents
o Obtain a suitable category/classification system
o Have people provide ‘training set’
o Use machine learning to automatically classify future Arabic text documents
Support cross-language information retrieval
o Arabic question against English documents
o English question against Arabic documents
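The classification steps above (categories, a human-labeled training set, then automatic classification) can be sketched with a small multinomial naive Bayes classifier. This is an illustration only, not the project's method: the two categories, the four training snippets, and whitespace tokenization are assumptions, and real Arabic text would typically need normalization and stemming first.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label) pairs provided by human annotators."""
    word_counts = defaultdict(Counter)   # label -> token frequencies
    label_counts = Counter()             # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = text.split()            # assumption: whitespace tokenization
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.split():
            # Laplace (add-one) smoothing handles unseen tokens.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny hypothetical training set: sports vs. economy.
TRAINING = [
    ("فريق كرة القدم فاز في مباراة", "sports"),
    ("مباراة كرة السلة فريق", "sports"),
    ("أسعار النفط في سوق", "economy"),
    ("بنك سوق أسعار", "economy"),
]
model = train(TRAINING)
label_a = classify(model, "مباراة فريق")
label_b = classify(model, "أسعار بنك")
```

A realistic training set would need many documents per category; the point here is only the flow from labeled examples to predictions on new text.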
49. Arabic Handwriting - QU
Images of historic documents
o Arabic text extracted
o Mapping from a part of the text to the corresponding part of the image
Special tools for
o Those processing the original documents
o Those doing research with the collection
Will allow work on non-textual collections too, e.g., museum images, set of photos for teaching architecture
50. Accessible Collections in Qatar - QNL
What collections have the highest priority?
What special handling is needed for each class, for each subclass of collection type?
How do DLs best fit into the activities of the National Library?
Can .qa be fully archived for Wayback Machine use?
51. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
52. DL Curriculum Framework
[Figure: curriculum diagram with columns COURSE STRUCTURE, CORE DL TOPICS, RELATED TOPICS]
COURSE STRUCTURE
Semester 1: DL collections: development/creation
Semester 2: DL services and sustainability
Topics shown in the diagram:
o Digitization, Storage, Interchange
o Metadata, Cataloging, Author submission
o Digital objects, Composites, Packages
o Architectures (agents, buses, wrappers/mediators), Interoperability
o Spaces (conceptual, geographic, 2/3D, VR)
o Documents, E-publishing, Markup
o Multimedia streams/structures, Capture/representation, Compression/coding
o Bibliographic information, Bibliometrics, Citations
o Content-based analysis, Multimedia indexing
o Naming, Repositories, Archives
o Services (searching, linking, browsing, etc.)
o Archiving and preservation, Integrity
o Thesauri, Ontologies, Classification, Categorization
o Info. Needs, Relevance, Evaluation, Effectiveness
o Intellectual property rights mgmt., Privacy, Protection (watermarking)
o Routing, Filtering, Community filtering
o Search & search strategy, Info seeking behavior, User modeling, Feedback
o Multimedia presentation, rendering
o Info summarization, Visualization
55. ELISQ Audience (Users)
http://elisq.qu.edu.qa/
Primary:
o Librarians and libraries in Qatar
o Researchers and academics
o Government organizations
o Non-Governmental organizations (such as http://www.fsd.org.qa/)
Secondary:
o University / School Students
o Teachers / Faculty
o Managers
o Qatari citizens
o Other stakeholders
56. ELISQ Project (1 of 2)
Project Objectives/Aims
A. Research and prototype digital library systems and infrastructure for Qatar, focusing initially on Qatari information related to government and scholarly activities.
Leverage the crawling engine from Penn State's SeerSuite software infrastructure, and extend it beyond its current focus on English to support Arabic-English collections, and to cover a broad range of scholarly disciplines, and all types of government information.
57. ELISQ Project (2 of 2)
Project Objectives/Aims (continued)
B. Research and build the digital library community in Qatar, supporting digital library use, services, collection development, tailored systems, and advancing toward a Knowledge Society.
Study scholarly activities, and engage in community building in Qatar, so DLs can be tailored to specific domains and to the unique needs of Qatar.
Through workshops, a consulting center at the proposed Institute, and collaborative efforts with libraries and museums in Qatar, we will identify particular needs and uses, and tailor collections, systems, and services, to lead toward the Qatari Knowledge Society.
58. Significance to Librarians, Corporations, and Governmental Agencies
The need to preserve cultural and historical heritage =>
o Collections of fragile and precious artifacts =>
o Libraries, museums, and archives developing digital collections =>
o Users from all over the world accessing and studying
A one stop search of:
o Information about Qatar
o Information to preserve the culture of Qatar
Deep indexing, analysis, and retrieval of:
o Resources, reports, statistics, and other types of information
o Information in the Arabic language as well as in English
59. ELISQ Content
Metadata, data, and many types of documents (including full text)
Qatari resources that first appeared in digital form - ‘born’ digital
At a later stage the project will include:
o Digital versions of material already existing in print
o Multimedia (image, audio, video) forms
Free and open as well as content with limited access
60. ELISQ Focus
Community in Qatar
o Identify interested stakeholders, to tailor to needs
o Train next generation of digital librarians, archivists, and curators
o Partners helping with additional collection development
Advanced Technology for Enhanced Access
o “Low hanging fruit” by crawling Qatar-related Web
o Improved analysis (citations, tables, chemicals, …)
o Support for both Arabic and English
61. Outline
Acknowledgments
Introduction
History
Technology
Research
Development
Summary and Discussion
62. Summary (some highlights)
Introduction to digital libraries: 5S, any content
History: since 1991, Google, repositories
Technology: SeerSuite, Heritrix, Solr, HCI
Initial collections: Qscience, news, …
Research: extend SeerSuite; Arabic
o Adapt other tools for handwriting collection, non-text collections
Development: consulting center (addressing needs)
63. Questions for You
What communities should be served?
What collections should be made accessible?
What services are required?
What are the priorities in the above?
Can you help us find suitable partners, content owners, curators, user groups?
64. Questions for Us?
http://elisq.qu.edu.qa/
fox@vt.edu
http://fox.cs.vt.edu