At STM Week in London, technologists working in publishing consider what it takes to make content ready for computer readers.
Recorded at STM Week 2019, London
While the public may marvel at the machine-generated answers that Siri and Alexa give to questions about the world, publishers are beginning to understand that producing the input that helps such machines form their answers is an attractive, forward-thinking opportunity.
Computers, however, do not read the way humans do. Savvy publishers must recognize the types of adjustments that will cater to this new machine reader and then make systematic changes across their repertoire, or perhaps in a specific subject area, to maximize results.
As part of STM Week in London in December 2019, CCC’s Chris Kenneally explored with a panel of technologists working in publishing what it takes to make content ready for computer readers.
“A very basic thing about machine readability is the availability of structured data for all of your content,” Springer Nature’s Andy Halliday said. “That’s not always there, so we’ve embarked upon a program to start provisioning full-text structured data for all of our content so we can then widen the information that goes into any kind of summarization for a book or journal content.
“Having good ontologies to describe the content well and having it consistently tagged [means that] machines who read that content can make linkages between things that could be [published] as far apart as 10 or 15 years and across various different geographical boundaries,” explained Halliday, who is senior product manager at Springer Nature, based in London. His work is focused on the development of content services and the future of content. This work includes looking at how new technologies can solve user problems by improving efficacy and usability of scientific content.
Other panelists include:
- Lucie Kaffee, a Ph.D. researcher in the Web and Internet Science Research Group at the University of Southampton. She is currently a research intern at Bloomberg in London and a Newspeak Fellow. Her research focuses on multilinguality in structured data, particularly in supporting low-resourced languages in Wikipedia. Kaffee previously worked on the Wikidata team at Wikimedia Deutschland and remains involved in Wikimedia projects.
- Tom Morris, who spent more than 20 years in IT as a principal architect in publishing systems integrations projects, before becoming CTO at Ixxus and subsequently senior director, engineering, at Copyright Clearance Center, which acquired the London-based startup in 2016. In this role, he has led infrastructure and development teams, but has given special attention to content and knowledge organization systems.
- Sadia Shahid, director of strategy, growth, and partnerships at Wizdom.ai, an AI-powered startup from the Oxford University Software Incubator that was acquired in 2017 by Informa. Shahid is part of the founding team at Wizdom.ai, which interconnects billions of data points about the global research ecosystem using artificial intelligence, machine learning, and natural language processing.