Show HN: ReadToMe (iOS) turns paper books into audio https://ift.tt/9MO7VbW
Show HN: ReadToMe (iOS) turns paper books into audio I'm launching something that started as a side project publicly today: ReadToMe, which is an iPhone app that turns paper books and other printed text into audio. Originally this was a Christmas present for my fiancĂ©e, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along with the paper book, but some books aren't available in audiobook or even e-book form, and all of the existing apps we tried were surprisingly bad at scanning paper books into audio — they make lots of mistakes, include footnotes and page numbers, etc., in a way that really degrades the experience. Being an AI-oriented engineer by training, I had a crack at solving the problem myself, and was pleasantly surprised at how well the proof of concept worked. I then had some time free while shutting down my previous company (Mezli, YC W21), during which I polished up the app to the point you see it at now. The way it works: On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mostly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player. The back end is more complex — book photos are first sent to an OCR API, then some custom code I wrote does a first pass at stitching together and correcting the results. Then, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching together, and finally to a text-to-speech API for conversion to audio. The hardest part of this process was actually getting the GPT calls right — I ended up writing a custom LLM eval framework for making sure the LLM wasn't making edits relative to the true text of the book. A few issues remain, which I'll work on fixing if the app gets a significant amount of traction, including: 1) It can take multiple minutes to get audio back from a scan, especially if it's on the longer side (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back-end. 2) The LLM sometimes does TOO good of a job at correcting "mistakes" in book text. This issue crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue. The app is priced at $9.99/month for up to 250 pages/month right now, which I estimate will just about cover the costs of API calls. I'll be bringing the price point down as the pricing of the required AI APIs comes down. There's also a 3-day free trial if you want to try it out. If you do find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests. https://ift.tt/RqeNokd February 5, 2024 at 12:56AM
No comments