Television News Search and Analysis with Lucene/Solr

Kai Chan <kai@ssc.ucla.edu>
Social Sciences Computing
University of California, Los Angeles

Lucene Revolution, May 10, 2012

Communication Studies Archive
Background (1)

•  Continuation of analog recording of TV news
   –  Thousands of tapes since Watergate/1970s
   –  Hard to look for a particular news program or topic

Communication Studies Archive
Background (2)

•  Digital recording since 2005
•  Capture news programs on computers
   –  Video: can be streamed over the Web
   –  Closed captioning (“subtitle text”): indexed and searchable
   –  Image snapshots
   –  Search engine and analysis tools

Communication Studies Archive
Background (3)

•  Also download transcripts and web-streamed news programs
•  100 news programs and 600,000 words added each day

Communication Studies Archive
Background (4)

•  January 2005 to present
   –  28 networks
   –  1,600 shows
   –  130,000 hours
   –  160,000 news programs
   –  50,000,000 images
   –  880,000,000 words

Why This is Important (1)

•  Researchers
   –  Large and unique collection of communication
   –  Many modalities
       •  Speech, facial expression, body gesture, etc.
   –  Different conditions/settings
   –  Different networks and communities
   –  Allows study of TV news + communication in general in ways impossible before

Why This is Important (2)

•  Non-researchers
   –  TV news about presentation and persuasion
       •  Which happen in daily life also
   –  TV main source of news for many/most
   –  Greatly affects the public’s decisions
   –  Learn about what we watch

Application in Research

•  Communication Studies
   –  Amount of coverage for events over time
•  Linguistics
   –  Speech and language patterns
•  Computer Science
   –  Object identification
   –  Identify news anchors, public figures
   –  Story segmentation

Application in Teaching (1)

•  Chicano Studies: Representations of Latinos on the Television News
   –  May 1, 2007 immigration march
   –  MacArthur Park, Los Angeles, CA
   –  2 days (May 1 & 2, 2007)
   –  Framing, stereotyping, metaphor, silencing
   –  Reports with screenshots and links to news stories

Application in Teaching (2)

•  Communication Studies: Presidential Communication
   –  2008 presidential primary
   –  6 weeks (Dec 2007 to Feb 2008)
   –  Coverage of sound bites
        •  Amount of time given to candidate/party
        •  Types of response (positive, neutral, negative)
   –  Students created their own political ad

Work flow (1)
Capture/conversion machines

•  2 groups, 2 machines per group
    –  Keep the best recording
    –  6 TV tuners per machine
•  Capture video and CC to separate files in real-time
    –  MPEG-TS (~7 GB/hr)
    –  Timestamp every 2-3 seconds
•  Generate image snapshots
•  Convert videos
    –  MP4/H.264 (VGA, ~240 MB/hr)

[Diagram: capture/conversion machines, storage/control server, backup storage server, image server, search server, video streaming server]

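As a rough back-of-the-envelope check on the rates quoted above, the sketch below computes how much smaller the converted MP4 is than the MPEG-TS capture, and how much space the 130,000 archived hours would take at the MP4 rate. It assumes decimal units (1 GB = 1000 MB) and that every archived hour is stored at about the quoted MP4 rate; the slides state neither, so treat the figures as estimates only.

```python
# Rates taken from the slides; the unit convention is an assumption.
MPEG_TS_GB_PER_HOUR = 7.0    # captured transport stream
MP4_MB_PER_HOUR = 240.0      # converted H.264 video at VGA resolution
MB_PER_GB = 1000.0           # assumption: decimal units
ARCHIVE_HOURS = 130_000      # hours of video reported for the archive


def compression_ratio() -> float:
    """How many times smaller the MP4 is than the MPEG-TS capture."""
    return MPEG_TS_GB_PER_HOUR * MB_PER_GB / MP4_MB_PER_HOUR


def storage_tb(hours: float, mb_per_hour: float) -> float:
    """Storage in TB for the given number of hours at the given rate."""
    return hours * mb_per_hour / MB_PER_GB / 1000.0


ratio = compression_ratio()
archive_mp4_tb = storage_tb(ARCHIVE_HOURS, MP4_MB_PER_HOUR)
print(f"MP4 is about {ratio:.0f}x smaller; archive needs about {archive_mp4_tb:.1f} TB as MP4")
```
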
Work flow (2)
Storage/static file servers

•  Control server
    –  Download TV schedules
    –  Download web-streamed news programs
    –  Collect and check recordings
    –  Push files to their destination servers
•  Video streaming server
•  Backup storage server
•  Image server

[Diagram: capture/conversion machines, storage/control server, backup storage server, image server, search server, video streaming server]

Work flow (3)
Search server

•  Lucene index updated daily
   –  Main text field tokenized
   –  Separate fields for date, network, show, etc.
   –  Binary fields for segment and time data
•  Hosts search engine

[Diagram: capture/conversion machines, storage/control server, backup storage server, image server, search server, video streaming server]

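To make the field layout above more concrete, here is a minimal sketch of what one indexed news program might look like, with segment boundaries packed into a binary value. The field names, the sample values, and the encoding (pairs of big-endian 32-bit integers) are all assumptions for illustration; the slides do not say how the binary fields are actually encoded.

```python
import struct
from typing import List, Tuple

Segment = Tuple[int, int]  # (start position, end position), in token positions


def pack_segments(segments: List[Segment]) -> bytes:
    """Encode segments as a flat run of big-endian unsigned 32-bit integers."""
    flat = [pos for seg in segments for pos in seg]
    return struct.pack(f">{len(flat)}I", *flat)


def unpack_segments(blob: bytes) -> List[Segment]:
    """Decode the binary value back into (start, end) pairs."""
    count = len(blob) // 4
    flat = struct.unpack(f">{count}I", blob)
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical document for one news program; every name and value is illustrative.
doc = {
    "text": "good evening here is the news",      # tokenized main text field
    "date": "2012-05-10",                         # separate, untokenized fields
    "network": "EXAMPLE-NET",
    "show": "Example Evening News",
    "segments": pack_segments([(0, 2), (2, 6)]),  # binary field
}
```

Time data could be packed the same way, for example as pairs of token position and seconds from the start of the program.
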
The search process

•  User
    –  Watches videos (video server)
    –  Retrieves thumbnails and montages (image server)
    –  Performs searches (search server)
•  Video server
    –  Video files
    –  Video streaming server (Wowza)
•  Image server
    –  Web server (Apache)
    –  Thumbnails & montages
•  Search server
    –  Front end: web server, custom code (PHP)
    –  Bridge: PHP-Java Bridge or Solr
    –  Back end: custom code (Java), Lucene, MySQL database, Lucene index

Custom query type
Segment-enclosed query (1)

•  Problem 1: search for “X near Z”
•  Lucene: search for “X within Y words of Z”
    –  How to pick Y?
    –  Hard to pick a fixed number

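The difficulty of picking Y can be seen in a small sketch. It works on a plain list of tokens rather than a Lucene index, and it measures distance as the difference between token positions, which is a simplification of how Lucene counts slop. The sample sentence is made up.

```python
from typing import List


def positions(tokens: List[str], term: str) -> List[int]:
    """All token positions at which the term occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(tokens: List[str], x: str, z: str, y: int) -> bool:
    """True if some occurrence of x is at most y positions away from some occurrence of z."""
    return any(
        abs(px - pz) <= y
        for px in positions(tokens, x)
        for pz in positions(tokens, z)
    )


tokens = "obama made a surprise visit to troops stationed in afghanistan".split()

# The two words are 9 positions apart, so the answer depends entirely on Y.
too_small = near(tokens, "obama", "afghanistan", 5)
large_enough = near(tokens, "obama", "afghanistan", 10)
```

A small Y misses a sentence that is clearly about both words, while a large Y starts matching words that merely happen to fall within the window.
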
Custom query type
Segment-enclosed query (2)

•  Problem 2: the matched search words might not all be about the same story
   –  E.g. “Obama AND visit AND Afghanistan”
   –  Might match a news program about Obama’s visit to Canada + violence in Afghanistan

Custom query type
Segment-enclosed query (3)

•  A news program can contain several stories
    –  E.g. local, national, world, weather, sports

Custom query type
Segment-enclosed query (4)

local story 1
local story 2
commercials
national story 1
national story 2
weather 1
commercials
world story 1
world story 2
weather 2
commercials
health
entertainment
sports

Custom query type
Segment-enclosed query (5)

•  One solution: search for “X and Z within same story segment”
    –  Possible with Lucene + story segment info
•  Bonus: enables searching/filtering for a particular story type
    –  E.g. politics

Custom query type
Segment-enclosed query (6)

•  How to mark segments
   –  Automated
       •  Computer Science researchers working on them
       •  Word frequency
       •  Scene change
       •  Black frame and silence
   –  Manual segmentation
       •  Watch the video
       •  Decide where a story starts and ends
       •  Mark positions in semi-automated system

Custom query type
Segment-enclosed query (7)

seg. 1              seg. 1      seg. 2      seg. 2   seg. 3   seg. 3
begin                end        begin        end     begin     end

         span 1

                  span 2

                       span 3

                                         span 4

                                   span 5

Custom query type
Segment-enclosed query (8)

•  Idea
    –  Get spans from SpanNearQuery
    –  Filter and keep those fully within segments
•  In production: segment info in stored fields
    –  As a list of <start position, end position>
    –  Simple to implement
    –  Reasonably fast searching
•  Alternative: store segment info as terms
    –  Possible to find segments by themselves
    –  Appears to run much faster

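The filtering step in the idea above can be sketched as follows. This is plain Python over lists of positions, not the production Java code: it assumes the spans have already been collected (for example from a SpanNearQuery), that segments are sorted and do not overlap, and that end positions are exclusive. The sample positions are invented.

```python
from bisect import bisect_right
from typing import List, Tuple

Span = Tuple[int, int]     # (start, end) token positions, end exclusive
Segment = Tuple[int, int]  # (start, end) token positions, end exclusive


def enclosed_spans(spans: List[Span], segments: List[Segment]) -> List[Span]:
    """Keep only the spans that lie fully inside a single segment.

    Assumes segments are sorted by start position and do not overlap.
    """
    starts = [start for start, _ in segments]
    kept = []
    for start, end in spans:
        # Index of the last segment that begins at or before the span start.
        i = bisect_right(starts, start) - 1
        if i >= 0 and end <= segments[i][1]:
            kept.append((start, end))
    return kept


segments = [(0, 100), (100, 250), (250, 400)]
spans = [(10, 20), (95, 105), (120, 130), (240, 260), (300, 310)]
kept = enclosed_spans(spans, segments)  # spans crossing a boundary are dropped
```

The binary search keeps the per-span cost logarithmic in the number of segments, which matters little for a single program but helps when many spans are checked.
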
Custom query type
Time-enclosed query

          20 s    25 s      30 s    35 s    40 s     45 s   50 s   55 s   60 s

<= 20 s                   span 1

<= 15 s                  span 2

<= 10 s                            span 3

<= 35 s                                      span 4

<= 25 s                                            span 5

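One reading of the diagram above is that a span is kept only if the time it covers stays within a limit. The sketch below follows that reading. It assumes each caption timestamp is stored as a pair of token position and seconds from the start of the program, and that a position takes the time of the latest timestamp at or before it; neither detail comes from the slides, and the sample values are invented.

```python
from bisect import bisect_right
from typing import List, Tuple

Span = Tuple[int, int]         # (start, end) token positions, end exclusive
Timestamp = Tuple[int, float]  # (token position, seconds from program start)


def time_at(position: int, timestamps: List[Timestamp]) -> float:
    """Time of the latest timestamp at or before the given position."""
    marks = [pos for pos, _ in timestamps]
    i = bisect_right(marks, position) - 1
    return timestamps[max(i, 0)][1]


def time_enclosed(spans: List[Span], timestamps: List[Timestamp],
                  max_seconds: float) -> List[Span]:
    """Keep spans whose first and last tokens are at most max_seconds apart."""
    kept = []
    for start, end in spans:
        duration = time_at(end - 1, timestamps) - time_at(start, timestamps)
        if duration <= max_seconds:
            kept.append((start, end))
    return kept


# Roughly one timestamp every 2-3 seconds, then a long gap.
timestamps = [(0, 20.0), (8, 22.5), (15, 25.0), (24, 27.5), (30, 30.0), (80, 45.0)]
spans = [(2, 20), (10, 85)]
```
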
Custom query type
Multi-term regular expression (1)

•  “here is _ _ _ with the (news|story|details|report)”
•  Apply RegEx to a phrase or sentence
    –  Not just individual words
•  Lucene core has regular expression query support
    –  Good starting point
    –  Not a complete solution for us

Custom query type
Multi-term regular expression (2)

•  Problems
   –  Some analyzers do not work with RegEx
   –  Lucene’s RegEx query classes only apply RegEx to individual terms
        •  Want to match a pattern against a phrase/sentence
        •  Want placeholders for whole words (not just characters)
   –  Term(fieldName, “.*”) matches all terms, and all documents, and all positions in the index
        •  Very slow
        •  Takes lots of memory

Custom query type
Multi-term regular expression (3)

•  What we did
    –  Parse and translate multi-term RegEx into Lucene built-in queries (SpanNearQuery, RegexQuery)
         •  E.g. “here is _ _ _ with the” = “here is” followed by “with the” (with exactly 3 terms in between)
    –  Leading and trailing placeholders
         •  E.g. “_ _ is the _ _ _”
         •  Preserve for correctness
         •  Store word count for each document
         •  Expand each span on both sides
         •  Bounds checking

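The matching behaviour described above can be illustrated with a brute-force sketch over a list of tokens. It is not the translation into SpanNearQuery and RegexQuery that the slides describe; it only shows the intended semantics: each non-placeholder word is a regular expression applied to one whole term, "_" stands for exactly one word, and leading or trailing placeholders widen the match only when the document has enough words on that side. The function names and sample sentences are made up.

```python
import re
from typing import List, Tuple


def parse(pattern: str):
    """Split a pattern into leading placeholder count, core parts, trailing count."""
    parts = pattern.split()
    lead = 0
    while lead < len(parts) and parts[lead] == "_":
        lead += 1
    trail = 0
    while trail < len(parts) - lead and parts[-1 - trail] == "_":
        trail += 1
    core = parts[lead:len(parts) - trail]
    # None marks a whole-word placeholder inside the core.
    compiled = [None if part == "_" else re.compile(part) for part in core]
    return lead, compiled, trail


def find_matches(pattern: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges matching the pattern, end exclusive."""
    lead, core, trail = parse(pattern)
    matches = []
    if not core:
        return matches
    for start in range(len(tokens) - len(core) + 1):
        window = tokens[start:start + len(core)]
        if all(rx is None or rx.fullmatch(tok) for rx, tok in zip(core, window)):
            # Expand on both sides for leading/trailing placeholders,
            # then check the expanded range against the document word count.
            lo, hi = start - lead, start + len(core) + trail
            if lo >= 0 and hi <= len(tokens):
                matches.append((lo, hi))
    return matches


tokens = "and here is our reporter john with the story tonight".split()
hits = find_matches("here is _ _ _ with the (news|story|details|report)", tokens)
edge = find_matches("_ _ is the _ _ _", "this is the end".split())
inner = find_matches("_ _ is the _ _ _", "so this is the top story tonight folks".split())
```

The `edge` case shows why bounds checking is needed: "is the" occurs in the text, but there are not two words before it, so the pattern as a whole does not match.
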
Custom query type
Multi-term regular expression (4)

•  Regular expression libraries differ in
    –  Syntax (e.g. Perl 5-compatible)
    –  Capabilities (e.g. back-references)
    –  Speed
•  Memory usage
    –  Proportional to number of terms matched
    –  Increasing available memory might help

Custom result format
Occurrence count

date \ word   crisis   crash   meltdown                tsunami
...
9/14/08
9/15/08                        X docs, Y occurrences
9/16/08
...

To fill a cell (e.g. meltdown on 9/15/08): go through every span generated by SpanTermQuery(meltdown), filtered by date 9/15/08.

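The "X docs, Y occurrences" cells can be sketched with plain Python over tokenized documents. This stands in for walking the spans of a SpanTermQuery with a date filter; the input format (a list of date and token list pairs) and the sample sentences are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def occurrence_table(docs: List[Tuple[str, List[str]]],
                     words: List[str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Map (date, word) to (number of documents, number of occurrences).

    Cells with no occurrences are left out of the result.
    """
    table = defaultdict(lambda: [0, 0])
    for date, tokens in docs:
        for word in words:
            n = tokens.count(word)
            if n:
                cell = table[(date, word)]
                cell[0] += 1   # one more document containing the word
                cell[1] += n   # every occurrence within that document
    return {key: tuple(value) for key, value in table.items()}


docs = [
    ("9/15/08", "the meltdown on wall street fuels meltdown fears".split()),
    ("9/15/08", "financial crisis and meltdown".split()),
    ("9/16/08", "the crisis deepens".split()),
]
table = occurrence_table(docs, ["crisis", "crash", "meltdown", "tsunami"])
```
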
Future work
Job queue (1)

•  Research front moving towards analysis of whole database
   –  Want full search result set
   –  Queries are intensive and take a long time
•  Solution will be beyond increasing timeout
   –  Users might close their browsers
   –  We might restart the search back-end

Future work
Job queue (2)

•  Features
   –  Query runs in background
   –  Notification when finished/failed
   –  Restart queries with recoverable errors
   –  Check and cancel jobs
   –  Downloadable result
   –  Schedule recurring queries
   –  Manage job priority and quota

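Since the job queue is listed as future work, the following is only a sketch of how a few of the features above (priority, retry on recoverable errors, cancellation, notification) might fit together. Every name in it is hypothetical, and it runs jobs one at a time in the calling thread rather than in the background, to keep the example short.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Optional


class RecoverableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. back end restarted)."""


@dataclass
class Job:
    run: Callable[[], Any]
    priority: int = 10       # lower numbers run first
    max_attempts: int = 3
    status: str = "queued"   # queued -> running -> finished / failed / cancelled
    result: Any = None
    attempts: int = 0


class JobQueue:
    def __init__(self, notify: Callable[[Job], None] = print):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within a priority
        self._notify = notify

    def submit(self, job: Job) -> Job:
        heapq.heappush(self._heap, (job.priority, next(self._order), job))
        return job

    def cancel(self, job: Job) -> None:
        if job.status == "queued":
            job.status = "cancelled"

    def run_next(self) -> Optional[Job]:
        """Run the highest-priority queued job; return it, or None if the queue is empty."""
        while self._heap:
            _, _, job = heapq.heappop(self._heap)
            if job.status == "cancelled":
                continue
            job.status = "running"
            job.attempts += 1
            try:
                job.result = job.run()
                job.status = "finished"
            except RecoverableError:
                if job.attempts < job.max_attempts:
                    job.status = "queued"
                    self.submit(job)     # retry later, no notification yet
                    return job
                job.status = "failed"
            except Exception:
                job.status = "failed"
            self._notify(job)            # finished or failed
            return job
        return None


done = []
queue = JobQueue(notify=done.append)
calls = {"n": 0}


def flaky_query():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RecoverableError("back end restarted")
    return 42


low = queue.submit(Job(run=lambda: "slow", priority=20))
high = queue.submit(Job(run=flaky_query, priority=1))
skipped = queue.submit(Job(run=lambda: "never", priority=5))
queue.cancel(skipped)
while queue.run_next() is not None:
    pass
```
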
Future work
Multiple sources and languages (1)

•  Multilingual news programs
   –  E.g. some have English + Spanish CC
•  Multiple text and timestamp sources
   –  E.g. CNN transcript available from website
   –  Applying speech-to-text to videos
   –  Manual correction of text and timestamps
•  Multiple markets
   –  E.g. capture TV programs in Denmark and Norway

Future work
Multiple sources and languages (2)

•  Need language detection
   –  Libraries exist
•  Search for specific channel
   –  Search by language more useful
   –  But no fixed channel -> language mapping
•  What will proximity search and occurrence counting mean when there are multiple channels/languages?

Future work
Metadata

•  Types of metadata
   –  Segment boundary, type and topic
   –  Headline and description (from transcripts)
   –  Website links
   –  Syntactic tags (e.g. part of speech)
   –  Generated annotation (e.g. object identification)
   –  User annotation (e.g. scene description)
   –  Screen text
•  Eventually: want them to be searchable

Thank you for coming!

•  Any questions?
•  My e-mail: kai@ssc.ucla.edu
•  Slides available: http://ucla.in/IDJq2u

Adaptive Media Streaming over Emerging Protocols
 
Bluetube
BluetubeBluetube
Bluetube
 
Tutorial adaptive-streaming
Tutorial adaptive-streamingTutorial adaptive-streaming
Tutorial adaptive-streaming
 
Bryan Johns - Presentation at Emerging Communications Conference & Awards (eC...
Bryan Johns - Presentation at Emerging Communications Conference & Awards (eC...Bryan Johns - Presentation at Emerging Communications Conference & Awards (eC...
Bryan Johns - Presentation at Emerging Communications Conference & Awards (eC...
 
Video Communications and Video Streaming
Video Communications and Video StreamingVideo Communications and Video Streaming
Video Communications and Video Streaming
 
Software curation as a digital preservation service
Software curation as a digital preservation serviceSoftware curation as a digital preservation service
Software curation as a digital preservation service
 
Changing the Way Viacom Looks at Video Performance with Mark Cohen and Michae...
Changing the Way Viacom Looks at Video Performance with Mark Cohen and Michae...Changing the Way Viacom Looks at Video Performance with Mark Cohen and Michae...
Changing the Way Viacom Looks at Video Performance with Mark Cohen and Michae...
 
Use the SAP Content Server for Your Document Imaging and Archiving Needs!
Use the SAP Content Server for Your Document Imaging and Archiving Needs!Use the SAP Content Server for Your Document Imaging and Archiving Needs!
Use the SAP Content Server for Your Document Imaging and Archiving Needs!
 
Linuxcon secureefficientcontainerimagemanagementharbor
Linuxcon secureefficientcontainerimagemanagementharborLinuxcon secureefficientcontainerimagemanagementharbor
Linuxcon secureefficientcontainerimagemanagementharbor
 
SQL 2012 AlwaysOn Availability Groups (AOAGs) for SharePoint Farms - Norcall ...
SQL 2012 AlwaysOn Availability Groups (AOAGs) for SharePoint Farms - Norcall ...SQL 2012 AlwaysOn Availability Groups (AOAGs) for SharePoint Farms - Norcall ...
SQL 2012 AlwaysOn Availability Groups (AOAGs) for SharePoint Farms - Norcall ...
 

Television News Search and Analysis with Lucene/Solr

  • 1. Television News Search and Analysis with Lucene/Solr
      Kai Chan <kai@ssc.ucla.edu>
      Social Sciences Computing
      University of California, Los Angeles
      Lucene Revolution, May 10, 2012
  • 2. Communication Studies Archive: Background (1)
      •  Continuation of analog recording of TV news
         –  Thousands of tapes since Watergate/1970s
         –  Hard to look for a particular news program or topic
  • 3. Communication Studies Archive: Background (2)
      •  Digital recording since 2005
      •  Capture news programs on computers
         –  Video: can be streamed over the Web
         –  Closed captioning (“subtitle text”): indexed and searchable
         –  Image snapshots
         –  Search engine and analysis tools
  • 4. Communication Studies Archive: Background (3)
      •  Also download transcripts and web-streamed news programs
      •  100 news programs and 600,000 words added each day
  • 5. Communication Studies Archive: Background (4)
      •  January 2005 to present
         –  28 networks
         –  1,600 shows
         –  130,000 hours
         –  160,000 news programs
         –  50,000,000 images
         –  880,000,000 words
  • 6. Why This is Important (1)
      •  Researchers
         –  Large and unique collection of communication
         –  Many modalities
            •  Speech, facial expression, body gesture, etc.
         –  Different conditions/settings
         –  Different networks and communities
         –  Allows study of TV news + communication in general in ways impossible before
  • 7. Why This is Important (2)
      •  Non-researchers
         –  TV news about presentation and persuasion
            •  Which happen in daily life also
         –  TV main source of news for many/most
         –  Greatly affects the public’s decisions
         –  Learn about what we watch
  • 15. Application in Research
      •  Communication Studies
         –  Amount of coverage for events over time
      •  Linguistics
         –  Speech and language patterns
      •  Computer Science
         –  Object identification
         –  Identify news anchors, public figures
         –  Story segmentation
  • 16. Application in Teaching (1)
      •  Chicano Studies: Representations of Latinos on the Television News
         –  May 1, 2007 immigration march
         –  MacArthur Park, Los Angeles, CA
         –  2 days (May 1 & 2, 2007)
         –  Framing, stereotyping, metaphor, silencing
         –  Reports with screenshots and links to news stories
  • 17. Application in Teaching (2)
      •  Communication Studies: Presidential Communication
         –  2008 presidential primary
         –  6 weeks (Dec 2007 to Feb 2008)
         –  Coverage of sound bites
            •  Amount of time given to candidate/party
            •  Types of response (positive, neutral, negative)
         –  Students created their own political ad.
  • 18. Work flow (1): Capture/conversion machines
      •  2 groups, 2 machines per group
         –  Keep the best recording
         –  6 TV tuners per machine
      •  Capture video and CC to separate files in real-time
         –  MPEG-TS (~7 GB/hr)
         –  Timestamp every 2-3 seconds
      •  Generate image snapshots
      •  Convert videos
         –  MP4/H.264 (VGA, ~240 MB/hr)
      •  Diagram: capture/conversion machines, backup storage server, storage/control server, image server, video streaming server, search server
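The per-hour figures on this slide can be turned into rough storage estimates. The sketch below is only arithmetic on the quoted numbers, assuming decimal units and the 130,000 recorded hours quoted in Background (4); the slides do not say whether the raw MPEG-TS captures are all retained, so the last figure is hypothetical.

```python
# Rough storage arithmetic from the approximate per-hour figures on the slide.
# Assumption: decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB).
MPEG_TS_GB_PER_HOUR = 7.0   # raw capture, approximate
MP4_MB_PER_HOUR = 240.0     # converted MP4/H.264, approximate


def compression_ratio() -> float:
    """How many times smaller the MP4 is than the MPEG-TS capture."""
    return MPEG_TS_GB_PER_HOUR * 1000 / MP4_MB_PER_HOUR


def storage_tb(hours: float, mb_per_hour: float) -> float:
    """Total storage in terabytes for a given number of recorded hours."""
    return hours * mb_per_hour / 1_000_000


if __name__ == "__main__":
    hours = 130_000  # archive size quoted in Background (4)
    print(f"MP4 is roughly {compression_ratio():.0f}x smaller than MPEG-TS")
    print(f"MP4 archive: roughly {storage_tb(hours, MP4_MB_PER_HOUR):.1f} TB")
    # Hypothetical: only applies if every raw capture were kept.
    print(f"MPEG-TS archive: roughly {storage_tb(hours, MPEG_TS_GB_PER_HOUR * 1000):.0f} TB")
```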
  • 19. Work flow (2): Storage/static file servers
      •  Control server
         –  Download TV schedules
         –  Download web-streamed news programs
         –  Collect and check recordings
         –  Pushes files to places
      •  Video streaming server
      •  Backup storage server
      •  Image server
  • 20. Work flow (3): Search server
      •  Lucene index updated daily
         –  Main text field tokenized
         –  Separate fields for date, network, show, etc.
         –  Binary fields for segment and time data
      •  Hosts search engine
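One way to picture the "binary fields for segment and time data" is a compact byte encoding of position pairs stored alongside the text. The sketch below is plain Python rather than Lucene code, and the byte layout (a count followed by big-endian 32-bit pairs) and the field names in the example record are assumptions made for illustration, not the archive's actual format.

```python
import struct
from typing import List, Tuple

Segment = Tuple[int, int]  # (start position, end position) as token offsets


def pack_segments(segments: List[Segment]) -> bytes:
    """Encode segments as a count followed by big-endian unsigned 32-bit pairs."""
    flat = [n for seg in segments for n in seg]
    return struct.pack(f">I{len(flat)}I", len(segments), *flat)


def unpack_segments(blob: bytes) -> List[Segment]:
    """Decode the bytes produced by pack_segments back into position pairs."""
    (count,) = struct.unpack_from(">I", blob, 0)
    flat = struct.unpack_from(f">{2 * count}I", blob, 4)
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


if __name__ == "__main__":
    # Hypothetical record: field names and values are illustrative only.
    doc = {
        "text": "closed captioning text goes here",
        "date": "2008-09-15",
        "network": "ExampleNet",
        "show": "Example Evening News",
        "segments": pack_segments([(0, 120), (121, 300)]),
    }
    print(len(doc["segments"]), unpack_segments(doc["segments"]))
```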
  • 21. The search process (diagram)
      •  User actions: perform searches, retrieve thumbnails, watch videos and montages
      •  Video server: video files; video streaming server (Wowza)
      •  Image server: web server (Apache); thumbnails & montages
      •  Search server: web server; custom code (PHP) front end; PHP-Java Bridge or Solr bridge; custom code (Java) and Lucene back end; MySQL database; Lucene index
  • 22. Custom query type: Segment-enclosed query (1)
      •  Problem 1: search for “X near Z”
      •  Lucene: search for “X within Y words of Z”
         –  How to pick Y?
         –  Hard to pick a fixed number
  • 23. Custom query type: Segment-enclosed query (2)
      •  Problem 2: all matched search words might not be talking about same story
         –  E.g. “Obama AND visit AND Afghanistan”
         –  Might match a news program about Obama’s visit to Canada + violence in Afghanistan
  • 24. Custom query type: Segment-enclosed query (3)
      •  A news program can contain several stories
         –  E.g. Local, national, world, weather, sports
  • 25. Custom query type: Segment-enclosed query (4)
      •  Diagram of one news program as a sequence of segments: local story 1, local story 2, commercials, national story 1, national story 2, weather 1, commercials, world story 1, world story 2, weather 2, commercials, health, entertainment, sports
  • 26. Custom query type: Segment-enclosed query (5)
      •  One solution: search for “X and Z within same story segment”
         –  Possible with Lucene + story segment info
      •  Bonus: enables searching/filtering for a particular story type
         –  E.g. Politics
  • 27. Custom query type: Segment-enclosed query (6)
      •  How to mark segments
         –  Automated
            •  Computer Science researchers working on them
            •  Word frequency
            •  Scene change
            •  Black frame and silence
         –  Manual segmentation
            •  Watch the video
            •  Decide where a story starts and ends
            •  Mark positions in semi-automated system
  • 28. Custom query type: Segment-enclosed query (7)
      •  Diagram: segments 1 to 3, each with begin and end markers, with spans 1 to 5 overlaid on the same timeline
  • 29. Custom query type: Segment-enclosed query (8)
      •  Idea
         –  Get spans from SpanNearQuery
         –  Filter and keep those fully within segments
      •  In production: segment info in stored fields
         –  As a list of <start position, end position>
         –  Simple to implement
         –  Reasonably fast searching
      •  Alternative: store segment info as terms
         –  Possible to find segments by themselves
         –  Appears to run much faster
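The idea on this slide (generate proximity spans, then keep only those fully inside one story segment) can be sketched without Lucene. The code below is a plain-Python illustration, not the production implementation: `near_spans` is a hypothetical stand-in for the spans a SpanNearQuery would produce, its `max_gap` parameter is not claimed to match Lucene's slop semantics, and the segment list plays the role of the stored <start position, end position> pairs.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]     # (start, end) token positions, end inclusive
Segment = Tuple[int, int]  # (start, end) token positions, end inclusive


def positions(tokens: List[str]) -> Dict[str, List[int]]:
    """Map each lower-cased token to the positions where it occurs."""
    index: Dict[str, List[int]] = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok.lower(), []).append(pos)
    return index


def near_spans(index: Dict[str, List[int]], x: str, z: str, max_gap: int) -> List[Span]:
    """Unordered proximity: spans where x and z have at most max_gap tokens between them."""
    spans = []
    for px in index.get(x, []):
        for pz in index.get(z, []):
            if px != pz and abs(px - pz) - 1 <= max_gap:
                spans.append((min(px, pz), max(px, pz)))
    return sorted(spans)


def within_segments(spans: List[Span], segments: List[Segment]) -> List[Span]:
    """Keep only spans that lie entirely inside a single segment."""
    return [s for s in spans if any(b <= s[0] and s[1] <= e for b, e in segments)]


if __name__ == "__main__":
    tokens = "obama will visit canada next week . violence continues in afghanistan today".split()
    segments = [(0, 6), (7, 11)]  # two stories in one program
    index = positions(tokens)
    # Crosses the story boundary, so the filter drops it.
    print(within_segments(near_spans(index, "obama", "afghanistan", 20), segments))
    # Both words are in the first story, so the span survives.
    print(within_segments(near_spans(index, "obama", "canada", 20), segments))
```

With a segment filter in place, the proximity limit can be generous, since the story boundary rather than a fixed word count decides whether two terms belong together.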
  • 30. Custom query type: Time-enclosed query
      •  Diagram: timeline from 20 s to 60 s in 5 s steps, with five spans labelled by duration limit
         –  Span 1: <= 20 s
         –  Span 2: <= 15 s
         –  Span 3: <= 10 s
         –  Span 4: <= 35 s
         –  Span 5: <= 25 s
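The slide gives only a diagram, so the following is one possible reading: a time-enclosed query keeps spans whose elapsed time, rather than word count, is within a limit. The sketch assumes each caption timestamp (recorded every 2-3 seconds, per the capture slide) is attached to a token position; the function names and the lookup rule (use the latest timestamp at or before a position) are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token positions, end inclusive


def time_at(position: int, stamp_positions: List[int], stamp_seconds: List[float]) -> float:
    """Time of the latest timestamp at or before the given token position."""
    i = bisect_right(stamp_positions, position) - 1
    if i < 0:
        raise ValueError("position precedes the first timestamp")
    return stamp_seconds[i]


def within_seconds(spans: List[Span], stamp_positions: List[int],
                   stamp_seconds: List[float], max_seconds: float) -> List[Span]:
    """Keep spans whose start and end are at most max_seconds apart."""
    kept = []
    for start, end in spans:
        duration = (time_at(end, stamp_positions, stamp_seconds)
                    - time_at(start, stamp_positions, stamp_seconds))
        if duration <= max_seconds:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    stamp_positions = [0, 8, 15, 22, 30]           # token positions carrying a timestamp
    stamp_seconds = [20.0, 22.5, 25.0, 27.5, 30.0]  # seconds into the recording
    spans = [(2, 10), (5, 31)]
    print(within_seconds(spans, stamp_positions, stamp_seconds, 5.0))
```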
  • 31. Custom query type: Multi-term regular expression (1)
      •  “here is _ _ _ with the (news|story|details|report)”
      •  Apply RegEx to a phrase or sentence
         –  Not just individual words
      •  Lucene core has regular expression query support
         –  Good starting point
         –  Not a complete solution for us
  • 32. Custom query type: Multi-term regular expression (2)
      •  Problems
         –  Some analyzers do not work with RegEx
         –  Lucene’s RegEx query classes only apply RegEx to individual terms
            •  Want to match a pattern against a phrase/sentence
            •  Want placeholders for whole words (not just characters)
         –  Term(fieldName, “.*”) matches all terms, and all documents, and all positions in the index
            •  Very slow
            •  Takes lots of memory
  • 33. Custom query type: Multi-term regular expression (3)
      •  What we did
         –  Parse and translate multi-term RegEx into Lucene built-in queries (SpanNearQuery, RegexQuery)
            •  E.g. “here is _ _ _ with the” = “here is” followed by “with the” (with exactly 3 terms in between)
         –  Leading and trailing placeholders
            •  E.g. “_ _ is the _ _ _”
            •  Preserve for correctness
            •  Store word count for each document
            •  Expand each span on both sides
            •  Bounds checking
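The translation step can be illustrated in plain Python. This is a sketch under stated assumptions, not the talk's implementation: `translate` splits a pattern into leading placeholders, runs of per-term regexes, the exact gaps between them, and trailing placeholders, which is the information a SpanNearQuery-based translation would need. `find` then swaps in a simple sliding window over a token list instead of Lucene queries, and uses the token count for the bounds check on leading and trailing placeholders. Both function names are hypothetical.

```python
import re
from typing import List, Tuple

PLACEHOLDER = "_"  # stands for exactly one whole word


def translate(pattern: str) -> Tuple[int, List[List[str]], List[int], int]:
    """Split a pattern into (leading, chunks, gaps, trailing).

    chunks are runs of consecutive per-term regexes; gaps[i] is the exact
    number of placeholder words between chunks[i] and chunks[i + 1].
    """
    terms = pattern.split()
    leading = 0
    while leading < len(terms) and terms[leading] == PLACEHOLDER:
        leading += 1
    trailing = 0
    while trailing < len(terms) - leading and terms[-1 - trailing] == PLACEHOLDER:
        trailing += 1
    chunks: List[List[str]] = []
    gaps: List[int] = []
    gap = 0
    for term in terms[leading:len(terms) - trailing]:
        if term == PLACEHOLDER:
            gap += 1
            continue
        if gap or not chunks:
            if chunks:
                gaps.append(gap)
            chunks.append([])
            gap = 0
        chunks[-1].append(term)
    return leading, chunks, gaps, trailing


def find(pattern: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans, end inclusive, where the pattern matches."""
    leading, chunks, gaps, trailing = translate(pattern)
    slots = []  # one entry per word; None means "any single word"
    for i, chunk in enumerate(chunks):
        slots.extend(re.compile(t, re.IGNORECASE) for t in chunk)
        if i < len(gaps):
            slots.extend([None] * gaps[i])
    matches = []
    # Bounds check: leave room for leading and trailing placeholder words.
    for start in range(leading, len(tokens) - len(slots) - trailing + 1):
        window = tokens[start:start + len(slots)]
        if all(s is None or s.fullmatch(w) for s, w in zip(slots, window)):
            matches.append((start - leading, start + len(slots) + trailing - 1))
    return matches


if __name__ == "__main__":
    tokens = "and here is our anchor jane with the story tonight".split()
    print(translate("here is _ _ _ with the"))
    print(find("here is _ _ _ with the (news|story|details|report)", tokens))
    # "is the" at the very start has no two words before it, so nothing matches.
    print(find("_ _ is the _ _ _", "is the best show on tv".split()))
```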
  • 34. Custom query type: Multi-term regular expression (4)
      •  Regular expression libraries differ in
         –  Syntax (e.g. Perl 5-compatible)
         –  Capabilities (e.g. back-references)
         –  Speed
      •  Memory usage
         –  Proportional to number of terms matched
         –  Increasing available memory might help
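  • 35. Custom result format: Occurrence count
      •  Table layout: one row per date (..., 9/14/08, 9/15/08, 9/16/08, ...), one column per word (crisis, crash, meltdown, tsunami)
      •  Each cell holds “X docs, Y occurrences”
      •  To fill a cell, e.g. (9/15/08, meltdown): go through every span generated by SpanTermQuery(meltdown) filtered by date 9/15/08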
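The date-by-word table can be sketched as a small aggregation. The code below is a plain-Python illustration that counts over in-memory token lists; the production system walks Lucene spans instead, and the `occurrence_table` name and the sample documents are made up for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def occurrence_table(docs: Iterable[Tuple[str, List[str]]],
                     words: Iterable[str]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(date, word): (number of docs, number of occurrences)}."""
    wanted = {w.lower() for w in words}
    table = defaultdict(lambda: [0, 0])
    for date, tokens in docs:
        counts = Counter(t.lower() for t in tokens if t.lower() in wanted)
        for word, n in counts.items():
            table[(date, word)][0] += 1  # this document contains the word
            table[(date, word)][1] += n  # every occurrence in this document
    return {key: (d, o) for key, (d, o) in table.items()}


if __name__ == "__main__":
    # Illustrative documents only; not real archive data.
    docs = [
        ("2008-09-15", "the crisis deepens as the crisis spreads".split()),
        ("2008-09-15", "market crash and crisis".split()),
        ("2008-09-16", "no meltdown yet".split()),
    ]
    table = occurrence_table(docs, ["crisis", "crash", "meltdown", "tsunami"])
    for (date, word), (n_docs, n_occ) in sorted(table.items()):
        print(f"{date} {word}: {n_docs} docs, {n_occ} occurrences")
```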
  • 36. Future work: Job queue (1)
      •  Research front moving towards analysis of whole database
         –  Want full search result set
         –  Queries are intensive and take a long time
      •  Solution will be beyond increasing timeout
         –  Users might close their browsers
         –  We might restart the search back-end
  • 37. Future work: Job queue (2)
      •  Features
         –  Query runs in background
         –  Notification when finished/failed
         –  Restart queries with recoverable errors
         –  Check and cancel jobs
         –  Downloadable result
         –  Schedule recurring queries
         –  Manage job priority and quota
  • 38. Future work: Multiple sources and languages (1)
      •  Multilingual news programs
         –  E.g. some have English + Spanish CC
      •  Multiple text and timestamp sources
         –  E.g. CNN transcript available from website
         –  Applying speech-to-text to videos
         –  Manual correction of text and timestamps
      •  Multiple markets
         –  E.g. Capture TV programs in Denmark and Norway
  • 39. Future work: Multiple sources and languages (2)
      •  Need language detection
         –  Libraries exist
      •  Search for specific channel
         –  Search by language more useful
         –  But no fixed channel -> language mapping
      •  What will proximity search and occurrence counting mean when there are multiple channels/languages?
  • 40. Future work: Metadata
      •  Types of metadata
         –  Segment boundary, type and topic
         –  Headline and description (from transcripts)
         –  Website links
         –  Syntactic tags (e.g. part of speech)
         –  Generated annotation (e.g. object identification)
         –  User annotation (e.g. scene description)
         –  Screen text
      •  Eventually: want them to be searchable
  • 41. Thank you for coming!
      •  Any questions?
      •  My e-mail: kai@ssc.ucla.edu
      •  Slides available: http://ucla.in/IDJq2u