Text Analyzer - February, March 2018

Feedback, Comments, Mentions, Questions on Twitter, Email, and Facebook

  1. "Just wanted to check in real quick and make sure I get you connected with Dennis for the text analyzer API for us to start playing around with our network logs. I've cc'd him here."

    --email from scholarly forum director in MI to Alex, Feb 1 15:33

  2. "Dear Alex,

    It was lovely to meet you yesterday at the Center for Science and Society at Columbia and talk for a little bit about my dissertation research in the history of nutrition science.
    I tried to use both Jstor Data For Research and Jstor text analysis and had some challenges with both tools, perhaps you could direct me in the right direction for solution:
    1. I am interested in a specific scientific theory called the "Nem". Nem was an alternative to a calorie as a basic food unit. I tried to use DFR to see who referred to Nem between 1910 and 1930, but because it's a short word, and also apparently a word in Hungarian and Vietnamese, I ended up with many irrelevant items and the corpus was not useful for analysis.
    2. I have all of the reports of the British Association for the Advancement of Science. These are huge files, and I am looking for a way to read them from distance. They are all on a PDF format (see example here). I was trying to use Jstor text analyze for that, but it could not use the PDF. Do you know what I need to change in the format for it to work on jstor text analysis?

    Thank you so much again for coming to the event Yesterday and for spending some time explaining Jstor's recent tools to me. I look forward to learn more. "

    --email from PhD history student in NY to Alex, Feb 1 16:09
  3. "Thanks for emailing and it was great to meet you as well yesterday. This sounds like a fun project!

    Regarding the BDH PDF – it looks, based on the sample you linked to, like the PDF doesn’t have the text embedded in it – it’s “just” a flat picture, with an image of the text but without containing the text digitally. I’ll ask around about some simple online OCR services that you could run these PDFs through to extract the text, and then you could upload that text into Text Analyzer. Have you tried Text Analyzer with another kind of document (say, a webpage or Word doc) to see if what it provides (topics, relevant articles) is what you’re looking for?

    I’ll also ask the dfr folks if there’s a way to be more targeted with creating a dataset for DFR. The online tool doesn’t have filtering by language, but that should theoretically be possible… I’ll let you know what I find."

    --reply to NY history PhD from Alex, Feb 1 16:24
  4. "Interesting. I think that it is actually a searchable text, not just a flat picture. If you download one of the reports as pdf from the link you can see that a simple cntrl + F leads you to key words. I wonder why it does not work. Maybe something about the way I downloaded the file "

    --reply from NY history PhD to Alex, Feb 1 22:48
  5. "Dear Ron

    I wonder whether you might consider my project for beta testing of the text analyzer.

    I'm trying to map an online database of pdfs at ombudsman-decisions.org.uk.

    The Financial Ombudsman Service is the UK's equivalent of the CFPB, crossed with a States' Attorney General.

    Their user interface is terrible. You need to know the term you want to search for before you look for it. Or the legal entity company name. So you'll never find any new terms coming through. Or any new trends.

    Your analyzer has excellent named entity recognition, and I'd really like to test it on product names in the sample.

    There are circa 150k pdf, each with a unique URL. I'm only interested in 100k of these (payment protection insurance covers the 50k I'm not interested in).

    Please drop me a line on whether I could use your tool for such a bulky request."

    --email from compliance officer in UK to Ron, Feb 5 6:52
  6. "Thanks for your interest. I’d be happy to add you to our Text Analyzer API beta program. Please send me your MyJSTOR userid and I’ll add it to our beta users list. MyJSTOR accounts are available to anyone and if you haven’t done so already you’ll need to register for one at jstor.org/register to use the service. After I receive your MyJSTOR userid I’ll send you an email confirming access.

    At this point we’re not providing access to the entity recognition features, only to the topic inferencer and text extractor. Documentation for the API is available at labs.jstor.org/api/docs/. There are 2 separate endpoints associated with the Text Analyzer service. The Topic Inferencer endpoint is the primary endpoint for the service and is used to identify topics associated with a submitted text. The second Text Extractor endpoint is provided as a convenience for extracting text from web URLs and other documents including PDF, and MS-Word files.

    I’d be happy to share more info about what we’re doing in our named entity recognition back-end if you’re interested in trying to replicate this on your own. In a nutshell, I’m using multiple entity recognition packages and services (including Stanford NER, Apache OpenNLP, OpenCalais and Alchemy) to do the heavy lifting and then doing some aggregation and filtering of the results generated by the individual NER engines.

    If you’re interested in the topic inferencing you can give the beta API a try to see if it would be of value in what you desire. If it is and if you do plan to bulk process 100k docs with this please let me know beforehand as we’d need to do some capacity adjustment on our end to accommodate that volume."

    --reply to UK compliance officer from Ron, Feb 5 14:18
  7. "Gosh, that's very kind of you. Thank you for getting back to me so quickly.

    My JSTOR account name is XXXXXXXX.

    As a possible workaround to the loss of NER functionality (which produced some of the most relevant results in the individual documents I fed it) if I send a specialist list of terms, is that something you could add to your global topic cluster list?

    I should add that I only have a layman's awareness of machine learning/API integration, so apologies in advance if I use slightly the wrong terms. "

    --reply from UK compliance officer to Ron, Feb 6 7:28
  8. 📢 EPQ Cohort | JSTOR #TopTip 4 - as deadline draws near, this easy to use tool will be a HUGE help towards #citing #sources accurately:  https://twitter.com/jstor/status/882593294873767937 
  9. 📢 EPQ Cohort | JSTOR #TopTip 2 - upload your own doc and find articles based on your own text !  https://twitter.com/jstor/status/922903086494093313 
  10. "Hi, thanks for the followup. You’re right, the text is in fact there – I had been looking at the page image, not the pdf, silly me.

    The Labs team is looking closer into it, but our working hypothesis is that the problem is the size of this file. At 28M, it’s pretty large, and we think that may be causing problems. We’ll let you know if we discover any more detail, or find a way to process such a large document."

    --reply to NY history PhD from Alex, Feb 6 9:27
  11. Just learned: JSTOR has a Text Analyzer. Simply upload your own text (German works too!) and you get English-language content from JSTOR on similar topics!  http://www.jstor.org/analyze/ 
  12. JSTOR's Text Analyzer: Text Analyzer, the tool that lets you use your own documents to search...  https://fb.me/9pL7yS1B0 
  13. "Uploaded a word97/2000/XP doc file, and it said 'no text found in the document'. Worked once converted to rtf."

    --bug report from UK math PhD student, Feb 10 4:20
  14. "Thanks for the feedback!

    You helped us identify a regression in the service— it should absolutely be able to process DOC files, and we believe the problem is now resolved.

    Please let us know if you have any more questions, or if you have any additional feedback on the Text Analyzer."

    --reply to UK math PhD student from Matt, Feb 12 9:24
  15. Great survey of resources for digital scholarship @britishlibrary - I love the JSTOR Text Analyser, which enables you to drop your own article into the analyser, get a list of prioritised terms, change their weights, then get other JSTOR recommendations  https://twitter.com/mia_out/status/963104533999226881 
  16. @margoline @univerlag @hirmeos Let us know what you think of it! Be sure to check out our Text Analyzer too:  http://www.jstor.org/analyze . There's even a (beta) api for that.
  17. @Hollybooklover Hmm, have you tried text analyzer? It might help you get through this research roadblock!  https://www.jstor.org/analyze 
  18. Stuck in a research rut? Try uploading a doc, having the analyzer scan it for search terms, and seeing articles related to it! Learn more:  http://jstor.info/MIlJ30ijjST  pic.twitter.com/4jxcJ5UIy6
  19. The nice people that works at @JSTOR Labs made this machine to find journal articles with conceptual affinity to an text or article file  http://www.jstor.org/analyze/?url= 
  20. Text Analyzer - Fantastic tool from @JSTOR Labs. You upload a text or image, the machine analyzes it and presents you with options for articles with related concepts.  https://twitter.com/JSTOR/status/948981188168962048 
  21. @ce_s @JSTOR Plus, you can upload a doc in Portuguese and get back relevant articles in English...
  22. Why, thank you. We *do* try our very best to be nice people (and also to build cool tools).  https://twitter.com/ce_s/status/964088729399975936 
  23. Text Analyzer - Fantastic tool from @JSTOR Labs. You upload a text or image, the machine analyzes it and presents you with options for articles with related concepts.  https://twitter.com/JSTOR/status/948981188168962048 
  24. "I work as a Digital Scholarship Librarian at XXXXXX University Library and I am writing to express our interest in becoming a beta tester for the JSTOR TextAnalyzer tool/s.

    I look forward to finding out more details from you and remain,"

    --email from Illinois university librarian, Feb 21 10:24
  25. "Thanks for this note. If you’d like to get access to our Text Analyzer API all we’ll need is your MyJSTOR user id. MyJSTOR accounts are available to anyone and if you haven’t done so already you’ll just need to register for one at jstor.org/register to use the service. After I receive your MyJSTOR user id I’ll send you an email notifying you when API access has been enabled.

    Documentation for the API is available at labs.jstor.org/api/docs/. There are 2 separate endpoints associated with the Text Analyzer service. The Topic Inferencer endpoint is the primary endpoint for the service and is used to identify topics associated with a submitted text. The second Text Extractor endpoint is provided as a convenience for extracting text from web URLs and other documents including PDF, and MS-Word files.

    Please let me know if you have any additional questions."

    --reply to IL university librarian from Ron, Feb 21 11:19
  26. Research tool: @JSTOR Text Analyzer. You drop in a text, even in Portuguese, and receive bibliographic suggestions  http://www.jstor.org/analyze/?url= 
  27. "Great news on Text Analyzer becoming COUNTER compliant, and that it may be moving out of beta soon. I demo TA during my meetings with libraries, and the feedback has been great. They do ask from time to time if it will be moving out of Beta and become one of the standard search options, and if so, when and what features will be included. I didn’t realize that there were certain benchmarks that needed to be met before moving a platform feature out of beta.

    Thanks, Alex for the update."

    --email from ITHAKA employee to Alex, Mar 2 9:25
  28. "Great to hear the feedback has been positive. At this point, we don’t think there’s any more functionality required to remove the beta label (COUNTER compliance was the last of these) – our efforts are instead focused primarily on reliability and stability. There are still some edge-cases when TA doesn’t perform as well as we’d like and can throw errors to users (for instance, we just released a fix for people who were timing out because they were trying to process super-large PDFs).

    Note: even when we do remove the beta label on the general tool, the multilingual feature will still be considered beta and experimental. "

    --reply to ITHAKA employee from Alex, Mar 2 9:32
  29. "This lab is a wonderful idea, and I am very excited about it and how it is helping me with my research, but I'd also like to be able to say that I do not want certain terms to appear in my search. When I used the search, I got quite a few articles on Japanese armor, which is not one of the topics of my paper, and I would like to sort it out of my results."

    --email from Ashley, Mar 2 9:32
  30. JSTOR Labs has recently rolled out a beta version of a JSTOR Text Analyzer. The purpose of the Text Analyzer is...  https://fb.me/M2pHV5gh 
  31. The Beta Text Analyzer from @JSTOR is proving a fantastic resource for finding research gaps. I highly recommend! #researching #writing #revisions
  32. "Cannot get this to work! Dragging suggested document "Under Victorian Microscopes" does not seem to do anything."

    --email from librarian in UK, Mar 6 8:25
  33. "Hello, we're sorry that Text Analyzer is not working for you. I tried clicking-and-dragging the suggested document as well and noticed that it only worked if I clicked on the bottom half of the image. Whereas clicking on the top half seems to just select the title of the document. I've notified the JSTOR Labs team of this UI inconsistency.

    However, to make sure another issue isn't affecting your use of Text Analyzer, could you let us know what web browser and operating system you're using to access Text Analyzer?

    Thank you for taking the time to help us make Text Analyzer better!"

    --reply to UK librarian from Jarod, Mar 6 8:38
  34. Have you tried locating related articles by dragging and dropping into JSTOR Text Analyzer? Now you can do so in 14 new languages to find related English materials. Visit  http://www.jstor.org/analyze/  to try it out and see the full list of supported languages. pic.twitter.com/YZ9ddsXcd6
  35. "Hi, using Chrome & Windows 10. I tried doing as you suggested and selecting the bottom half of the image, but that doesn't seem to make any difference! Hope you can help, thanks"

    --reply from UK librarian to Jarod, Mar 7 4:23
  36. @abhumphreys @JSTOR @Miller_DG @MissCollege @MC_Writing Exciting! #SpeedLibrary's Reference Team will be looking for this expansion and will share the news with our students and faculty! @JSTOR Labs' research tools are simply great. We're about to share a blog post on Topicgraph and Text Analyzer.
  37. "About 6 months ago the searches seemed better. Since that time, I am getting random suggestions that have nothing to do with the topic. Did you change something in the Text Analyzer to suggest topics that would seem to pair with the topic? It was a better match about 6 months ago. Still love it, and so do my students. Just a little harder to use now, and not as relevant in suggestions."

    --email from HS librarian in Illinois, Mar 7 12:11
  38. @abhumphreys Thank you! We think our students and faculty will love @JSTOR Labs' research tools!
  39. "Hello, thank you for the extra info, that really helps us narrow down where Text Analyzer needs improvements. If you have a minute, could you try the drag-and-drop of the example doc on Text Analyzer again? We just pushed out an update that changes the ‘click target’ area. jstor.org/analyze/ It seems to work ok on our end.

    Have you tried Text Analyzer with other documents, text, or URLs? Do those work as expected?"

    --reply to UK librarian from Jarod, Mar 7 14:20
  40. Do you need, to find:
    - more sources for a research paper, or
    - a specific topic in an article/book quickly?

    If so, read our new post at  http://speedblogs.weebly.com  to learn about 2 tools that make the research process "a little bit magical" - @JSTOR Labs' Topicgraph & Text Analyzer pic.twitter.com/7eXIHnjZSz
  41. "Thank you for this feedback. I'm glad to hear that you're still using the tool and are finding value in it but I'd really like to figure out why your results are poorer than what you'd seen previously. We made significant updates in the topic inferencing back-end last Fall and while I think the overall inferencing has been improved there are subject areas in which our topic model may still be a little weak and the inferencing may have regressed some in those areas. Do you have any specific examples you could share with us? If so, that would be super helpful in getting to the bottom of this. If not, or if it would be easier, would you be available for a short call to discuss?

    Thanks again for the feedback. It is very much appreciated."

    --reply to IL HS librarian from Ron, Mar 7 16:06
  42. "Hi Jarod, unfortunately no, it didn't. It doesn't seem to "talk" to any of the documents I have tried to insert. I've tried using Internet Explorer, that doesn't work either.
    I've cleared browser & cookies - no success.
    Is there anything else you can suggest?"

    --reply from UK librarian to Jarod, Mar 9 4:15
  43. "This is a bit of a mystery to us. If you’re willing, we would love to do a Skype (or similar) call with screensharing to see what’s going on.
    If not, it would also be helpful for you to press Ctrl+Shift+J in Chrome after you’ve tried to get it to talk to a document and either copy and paste what shows up in that console or send us a picture of it.

    Sorry it’s not working,"

    --reply to UK librarian from Jessica, Mar 13 11:07
  44. "It's OK thanks, sorry I didn't get back to you. The matter has been resolved, it was due to the firewall restrictions at our college."

    --reply from UK librarian to Jessica, Mar 13 11:17
  45. @abhumphreys @JSTOR Our bachelors degrees are 3-years and I'm in my second programme now. So 7 years in total, but 5 years actually spent studying full time. One sabbatical year and one year in total on sick leave. P.S. I love the text analyzer!
  46. "Every document I tried (both PDF and doc) didn't work."

    --email from education professor in NY, Mar 13 13:10
  47. "Hello, we're sorry Text Analyzer didn't work with your documents! We’d love to work with you to figure out the problem. Can you tell us more about the files you were trying to upload? Would it be possible to share some of the documents with us? This would help us figure out any issues on our end with parsing the document.

    Thank you for taking the time to help us make Text Analyzer better. "

    --reply to NY ed professor from Jarod, Mar 13 15:13
  48. "Sure, they were PDF articles that had been downloaded from library searches.
    I also tried personal doc files of my own as well as a link from a NYT article. Either it gets stuck at “making recommendations” screen and I have to try again (locked my computer up in one case) or it says there’s no text to analyze.

    I’ll attach a document I tried to use:"

    --reply from NY ed professor to Jarod, Mar 13 13:27
  49. "Thank you for sharing the document! Somewhat unfortunately, Text Analyzer worked for me with the document you sent. I tried in Firefox, Chrome, Edge, and Safari.

    Do you know what operating system and web browser you are using? Have you had success using Text Analyzer on other computers? Are you able to click-and-drag the example document on the Text Analyzer page and get results?"

    --reply to NY ed professor from Jarod, Mar 13 13:49
  50. "The document I wanted analyzed was scanned and it said that results were generated, but nothing showed up! I do not know how I am supposed to view the results once they are generated. Please make it more clear for other people as well as myself who have not used it before."

    --email from Canadian student, Mar 15 19:24
  51. "Thanks for the note and we're sorry that Text Analyzer didn't show you any results. What *should* happen is that after you upload a document it just shows you the results. Would you be willing to share the document you were using with us so we can try it on our side? Also, can I ask what browser and operating system you're using?

    Thanks. We'll get to the bottom of this."

    --reply to Canadian student from Alex, Mar 16 10:10
  52. "I don't need to send you the document I used because the analyzer did not work even with the sample that was provided to test it. The OS I am using is Windows 10 Home and the browser is Mozilla Firefox version 58.0.2 (64-bit).

    Hopefully the issue can be solved."

    --reply from Canadian student to Alex, Mar 16 19:39
  53. "Thanks, [redacted]. Hm. Ok. We’ll look into that browser/OS combo and if we have any questions for you I’ll let you know. "

    --reply to Canadian student from Alex, Mar 16 20:05
  54. This is quite a nifty thing from @JSTOR for developing extra search terms and topics in your research. Just drag&drop a PDF you have been reading:  http://www.jstor.org/analyze/?cid=soc_tw_JSTOR 
  55. "Hello, Jaden, somewhat unfortunately, we are unable to replicate the issue you are having. Have you tried using Text Analyzer with other web browsers or computers with any success? Does pasting in some random text or a URL into Text Analyzer work?

    To help us dig deeper, would you be comfortable sending us a screenshot of the results, or lack thereof, you get from Text Analyzer? In Windows, you can press the ‘print screen’ button then ctrl+v to paste the image into an email or document.

    For comparison, the results page should look something like this:"

    --reply to Canadian student from Jarod, Mar 19 13:51
  56. "I have no other browser installed on my computer, but I am willing to send screenshots of what I am seeing on my end although I think a video (if I had the software to record it) would be more useful.

    Anyways, the images attached show the progression of the only two screens I see when I paste any text into the analyzer. I hope they help."

    --reply from Canadian student to Jarod, Mar 19 19:53
  57. "Hi, I seem to be having more difficulties with Text Analyzer. When I try to access it now, I get an error message and the page will not load.
    I've asked our IT Support if it's likely to be anything related to the previous problem (our firewall), but they are saying not, and that it seems to be coming from your side.
    Can you investigate this for me please?"

    --reply from UK librarian to Jessica, Mar 20 9:38
  58. "Thanks for getting in touch with us. We’re having some issues with Text Analyzer on our side this morning. We’ll let you know when they have been resolved. Sorry for the inconvenience. "

    --reply to UK librarian from Alex, Mar 20 9:42
  59. "No worries, thanks for the update! "

    --reply from UK librarian to Alex, Mar 20 10:03
  60. "This outage was caused by issues at Amazon Web Services, and those seem to be improving. You should be able to use Text Analyzer again. Again, we apologize for the inconvenience"

    --reply to UK librarian from Alex, Mar 20 16:18
  61. Stuck in a citation rut? Try uploading your own doc to Text Analyzer (beta) and see what articles are related to the terms in your work!  http://jstor.info/prbO30j2gNZ  pic.twitter.com/KGfo9Z9ePn
  62. Stuck in a citation rut? Try uploading your own doc to Text Analyzer (beta) and see what articles are related to the terms in your work!  http://jstor.info/sI1T30j2gJH  pic.twitter.com/pavXWPJ20g
  63. "Have you developed a function to analyze bibliographies or footnotes rather than just bodies of text?"

    --email from student in Nebraska, Mar 23 2:56
  64. "Oo, I like that idea. Sadly, we've not developed something to look specifically at bibliographies or footnotes, but it's an interesting idea and we love interesting ideas. I wonder, what would you want it to do specifically with that material? "

    --reply to NE student from Alex, Mar 23 10:04
  65. "For example, when I uploaded a paper I just assumed it would scan the authors referenced in the paper, then put those authors' papers up as suggestions."

    --reply from NE student to Alex, Mar 23 11:25
  66. "Ah, yes, that would be helpful. Thanks for the suggestion! We'll add it to the "great ideas to look into" list..."

    --reply to NE student from Alex, Mar 23 12:13
  67. "i loved that this came up with very helpful documents and was exactly what i needed for my research"

    --email from athletic training major in Vermont, Mar 27 14:50
  68. "Thanks for the note! It's great to hear it helped you out. Out of curiosity, what was the subject area of the documents you were looking for?"

    --reply to VT athletic major from Jessica, Mar 27 14:52
  69. "I was looking at public health, and factory safety in the early 20th century"

    --reply from VT athletic major to Jessica, Mar 27 14:55
  70. Stuck in a citation rut? Try uploading your own doc to Text Analyzer (beta) and see what articles are related to the terms in your work!  http://jstor.info/sI1T30j2gJH  pic.twitter.com/cGq9iw7QSW