Hacker News | past | comments | ask | show | jobs | submit | denfromufa's comments

My wife hit a wall trying to upload a hefty PDF - every “shrink” tool we tried barely reduced the file size, and some even made it larger! Frustrated by the state of PDF compressors (looking at you, Adobe), I turned to LLMs - Claude, Deepseek, and Gemini came up short, but OpenAI’s o4-mini saved the day with a perfect solution. That inspired me to build pdfmini: a tiny, open-source, client-side HTML app that crushes PDF sizes right in your browser! No installs, no fees, zero privacy worries - all your data stays on your machine.

Try pdfmini now:

https://den-run-ai.github.io/pdfmini/

Source code for pdfmini:

https://github.com/den-run-ai/pdfmini


This gave me an idea. You seem to be the right person to talk to.

Here is my workflow. Have a bunch of PDFs and images I need to combine.

I go to tools.pdf24.org, merge the PDFs, then compress them, then compress them again because of size limits, then add or remove pages. Then add page numbers.

These are multiple steps.

Could we have a way of defining these steps up front, either textual or no-code-like, where we could specify something like:

Take input, merge > compress with greyscale, Max size 1MB, add page numbers on bottom right

Or

Convert input to jpg with image size 8cm by 8cm
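A sketch of what those recipes could compile down to under the hood, assuming common open-source tools (pdfunite, Ghostscript) as the backends - the tool names, flags, and the build_pipeline function are my assumptions, not part of any existing product:

```python
def build_pipeline(steps, out_pdf):
    """Translate a list of (action, options) steps into shell command lines."""
    cmds, cur = [], None
    for i, (action, opts) in enumerate(steps):
        # Intermediate stages write to temp files; the last step writes out_pdf.
        nxt = out_pdf if i == len(steps) - 1 else f"stage{i}.pdf"
        if action == "merge":
            cmds.append(["pdfunite", *opts["files"], nxt])
        elif action == "compress":
            cmd = ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH",
                   "-dQUIET", f"-sOutputFile={nxt}"]
            if opts.get("greyscale"):
                # Standard Ghostscript options for converting color to grayscale
                cmd += ["-sColorConversionStrategy=Gray",
                        "-sProcessColorModel=DeviceGray"]
            cmd.append(cur)
            cmds.append(cmd)
        cur = nxt
    return cmds
```

Each command list could then be run with subprocess.run; max-size and page-number steps would extend the same pattern, and a no-code flow builder would just be a front end producing this list.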

I know many people who simply fail at such stuff. They just throw their hands up in defeat.

I'm not saying we should have LLMs do the job, but if we could chain multiple actions, people could tell the software what they have in mind.

People don't just compress PDFs; they often merge and then compress.

I recently saw pdfux.com, but it is not as featureful as PDF24, and PDF24 crashes a lot.


#!/bin/bash

# Convert images to PDF
img2pdf *.jpg -o images.pdf

# Merge PDFs
pdfunite file1.pdf file2.pdf images.pdf merged.pdf

# Compress
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf merged.pdf

# Remove unwanted pages (e.g., page 3)
pdftk compressed.pdf cat 1-2 4-end output final.pdf

# Add page numbers (an empty --pagecommand restores LaTeX's plain
# page style, which prints a number at the bottom of each page)
pdfjam final.pdf --outfile final_numbered.pdf --pagecommand '{}'


You know what. I will share my script in the morning.

I used ScanTailor to scan a book. That gave me TIFF files.

So I built a script to convert them to JPEG, then merge into a PDF. Then OCR and add the text layer to the PDF. Then compress.
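Not the author's script, but a sketch of what such a pipeline might look like, assuming ImageMagick for the TIFF-to-JPEG step, img2pdf for the merge, and ocrmypdf for the OCR text layer and compression (all assumptions on my part):

```python
def scan_pipeline(tif_files, out_pdf):
    """Return the commands to run, in order, for a scan -> searchable PDF flow."""
    cmds, jpgs = [], []
    for t in tif_files:
        jpg = t.rsplit(".", 1)[0] + ".jpg"
        # ImageMagick: re-encode each scanned TIFF page as a quality-85 JPEG
        cmds.append(["convert", t, "-quality", "85", jpg])
        jpgs.append(jpg)
    # img2pdf losslessly wraps the JPEGs into a single PDF
    cmds.append(["img2pdf", *jpgs, "-o", "merged.pdf"])
    # ocrmypdf adds a searchable text layer and optimizes the output size
    cmds.append(["ocrmypdf", "--optimize", "3", "merged.pdf", out_pdf])
    return cmds
```

Each list is ready to pass to subprocess.run; keeping the function pure (no file I/O) makes the step ordering easy to inspect.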

I know this is a niche automation. On the web, OTOH, where normies reside and are scared of the terminal, it won't work.

I've been using pdftk for years now, but I'm the only person in my office who can use it.


I'll be adding compression support to BreezePDF, so this can be done in one click.


Merge / compress with max size / color or greyscale / remove pages / multi-format import (PDF and images as input) / export options / export into multiple files if the file size exceeds a certain limit.

And, as in my earlier comment, a way to define these steps in a flow, so that people can run multiple steps on a single file without having to learn the command line.
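The "export into multiple files if the size exceeds a limit" step is basically a greedy grouping over per-page sizes. A minimal sketch - the byte counts would come from whatever PDF library does the export, and the function name is mine:

```python
def split_by_size(page_sizes, max_bytes):
    """Group consecutive page indices so each chunk stays under max_bytes.

    A single page larger than max_bytes still gets its own chunk, since
    it can't be split further at this level.
    """
    chunks, current, total = [], [], 0
    for i, size in enumerate(page_sizes):
        if current and total + size > max_bytes:
            chunks.append(current)
            current, total = [], 0
        current.append(i)
        total += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk of page indices would then be handed to the merge/export step as one output file.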


This is very cool, are all these command-line tools open-source?


Yes


If you can define this as a feature request for pdfmini, please submit it on GitHub, e.g. a drag-and-drop flow builder.


Well, so I glanced at what that project does.

Congratulations, you've managed to "compress" PDF files by rasterizing every page to JPEG, while destroying all the vector and textual information in it.

The resulting PDF is nothing like the input -- it's just a bunch of blurry JPEG images wrapped in a PDF format.

You can't search or copy the text, and trying to print it will just make a blurry mess of the text.
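For contrast, Ghostscript can shrink a PDF by downsampling only the embedded images, leaving the text and vectors searchable and crisp. A hedged sketch: the flags are standard Ghostscript options, but the wrapper function is my own illustration:

```python
def gs_downsample_cmd(inp, out, dpi=150):
    """Build a Ghostscript command that downsamples images, not text."""
    return ["gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET",
            # Re-encode embedded raster images at a lower resolution,
            # while text and vector content pass through untouched.
            "-dDownsampleColorImages=true",
            f"-dColorImageResolution={dpi}",
            f"-sOutputFile={out}", inp]
```

Running the returned command with subprocess.run would produce a smaller PDF whose text still selects, searches, and prints cleanly.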


Nailed it. I requested 50% compression for a 200MB PDF file that contained pictures, and the tool made it an illegible mess. I can't imagine using this tool for anything serious, like tax returns, that requires a machine-readable file.


I would appreciate it if Stack Overflow integrated something like a REPL or Replit into their Q&A to reproduce examples easily (maybe even CI?). For Python it would actually be very easy with backends such as Google Colab or even the built-in ChatGPT Code Interpreter.


The highlight of this event was running with Jeff at Rice University before his talk:

https://x.com/JeffDean/status/1756319820482592838?s=20


Can you please tell us more about your ML stack?


Can you please tell us more about the ML stack?


When are you expecting the applied scientist position to open?


Does Rosetta still work in virtualized macOS when using Apple’s virtualization framework?


Yes: Just now I checked TextEdit's "Open with Rosetta" box in Get Info, launched it, and saw it come up as an Intel process in Activity Monitor.

This was in a Ventura beta 2 VM run with Apple's virtualization sample project: https://developer.apple.com/documentation/virtualization/run...


One limitation I have observed is that this VM can't host another VM itself.


Reportedly M2 supports nested virtualization though.


But why would you ever want to?


Docker Desktop for macOS requires a Linux VM on the macOS host, so nested virtualization is required if you want to use Docker Desktop inside a macOS guest.

Other tools like Multipass, kind, and minikube on the guest will not work either.


It’s helpful when spinning up labs or testing infrastructure deployments.


Works in Linux VMs too (on Ventura).



This looks like a better open-source option:

https://github.com/KhaosT/MacVM


UTM should support most of the same features (aside from ease of use when installing all macOS versions). It also now supports paravirtualisation using the hypervisor framework.

https://github.com/utmapp/UTM


> This looks like a better open-source option

Comparing the code between the two, VirtualBuddy seems like the better option to me (albeit not by a lot). They are both lightweight wrappers around macOS’s built-in hypervisor, so I’m really not sure what you’re going on about.


Since you have an opinion about this, can you explain why one option is better than the other?


VirtualBuddy is still experimental


> VirtualBuddy is still experimental

Really? You’ve gotta be trolling, right?


Sounds like you're the one trolling when the site says this upfront.

WARNING: This project is experimental. Things might break or not work as expected.


Look at the code for the one he’s suggesting. It’s essentially the same. One is just more upfront about giving its visitors a clear understanding. Taken out of context I can maybe see what you mean, but come on, it’s not that hard to keep up, is it?


Anyone using Parallels to virtualize macOS on M1 Macs?


Yes, I am. It's quite smooth. It has its problems, but for the most part it's alright. What did you want to explore further about this?

Edit: many people download the default Parallels build; you need to download the one from this page https://www.parallels.com/blogs/parallels-desktop-apple-sili... to have access to M1 virtualization.


Does Rosetta still work in virtualized macOS in Parallels?


Yes.


Parallels runs a similar thin wrapper on top of the OS-provided VM API, which looks somewhat like: vm = createVM([device list]); vmWindow = createVMWindow(vm); vm.run();


Yes, and it's terrible.

You can't sign into iCloud, and you can't maximize a VM to 4K resolution. It's usable, but for $100 they could do much, much better.


Those are well-known limitations of Apple virtualized OSes. Solving those issues would require using a different virtualization framework and a lot of reverse engineering.


Yes, I use macOS as a development VM on a maxed-out 16-inch M1 MacBook Pro. It all works as expected, except you don’t have any VM settings (e.g., how much RAM/CPU you want to give the VM) and Docker doesn’t run inside the VM.


You can change some of the settings by editing an ini file.

https://kb.parallels.com/en/128842


Oh, I didn't know that. Thanks a lot!


I’ve been using it on Monterey. It’s not nearly as optimized as virtualized Windows or Linux on the same hardware (most Parallels features, like auto-scaling, aren't available yet), but I think the situation should improve with Ventura.


It works! I can even run x86 binaries for Windows in a VM. Don't ask me how that works, though.


Microsoft has their own x86 emulator.


One of the most important things not mentioned in the article is sleep. Not enough sleep and your diet will change. Not enough sleep and your body will not finish recovering overnight. Read the book “Why we sleep”. It will change your life.

P.S. I do marathon training, and storing glycogen as well as activating fat burning are essential!

