[announce] pdfjs-viewer: A better PDF viewer for Atom

I made a package:

It does basically the same thing as pdf-viewer and pdf-viewer-plus, but it is better than both. :wink:

Than pdf-viewer because it has a fully featured UI.
Than pdf-viewer-plus because it is more polished, most obviously by a UI that blends into Atom.

Please, if you are at all interested, check it out and tell me whether it works for you.

2 Likes

It’s missing synctex capabilities, which both pdf-view and pdf-view-plus provide (indirectly, through an interaction service with pdf-view-plus).

Unfair! Your package description says “some features, such as synctex … have not been implemented yet” – so I thought I could skimp on that, too. Alright, coming up. :wink:

And I can’t actually get it to work with your package.

I’ve got it in an init script, but the service (exposing clicks, etc in terms of PDF coordinates) is available. In my opinion, the PDF viewer should be generic, but open to customisation through this service. So synctex shouldn’t be built in, because not all PDFs are for TeX, but a provider can be registered to handle them when appropriate. I just haven’t polished the script and how I’m going to provide this provider. I recommend the same approach, and would like to work with you to get a standardised API. Part of the point of the service is so that users can swap out PDF viewers easily, and that means they all need respect the same service.

As for not modifying PDFJS, which you are apparently quite proud of, that was intentional. In the interest of not making upgrading a hassle, all the modifications are in a separate CSS file, with minimal direct changes. It’s all well and good for the authors to want changes, but I don’t have any idea what is stable or not, so I’m not going to make any non style & config changes that might be broken in future. The separate file also makes keeping track of changes much easier.

2 Likes

OK, there is a new version that supports SyncTeX reverse lookup, bound to right-click.
It probably needs some polishing, but for me it already works well.
Just like for the whole thing, I’d be interested in any testing results.

Sure. I haven’t worked with services yet, but just let me know what you have in mind. For now it is builtin, about 70 lines of code.

By the way, there is a .syntex.gz parser in JavaScript: https://github.com/tdurieux/synctex-js
I dislike the dependence on the external binary, so maybe this is something to build on.

Sure I am, it was a lot of work. And I think the default styling is quite ugly, especially within a nicely designed editor like Atom. I like my stuff smooth and pretty. :slight_smile:

But more importantly, the PDF.js developers explicitly request it:
https://mozilla.github.io/pdf.js/getting_started/

I thought of that, too, but my solution was to keep the changes completely localized. It will still be some work to upgrade, but the relevant locations are easily identifiable through a diff.

I guess it should be possible to move my changes to an additional style sheet that overwrites the declarations in the standard CSS. But that only works for upgrades if the PDF.js people don’t decide to completely restructure it: https://github.com/mozilla/pdf.js/issues/11372#issuecomment-570058415

Yes, I saw the request too, but they don’t give any API stability notes on the display layer, so it will probably change with newer versions. Their request is just not practical for me, because I don’t want to spend time maintaining the package. I’ve seen too many abandoned packages to think I’ll be able to keep up PDFJS versions if I did. I’m OK with the style changes breaking if they do, until I or someone else fixes them.


Regarding services, the idea is to expose a hook into your package that others can register for. So the autocomplete-plus package consumes the autocomplete.provider service, and packages like atocomplete-html provide it. This is how packages add completions to the popup list. Similarly, to actually expand the snippet, autocomplete-plus consumes the snippets service, which the snippets package provides (though other packages can provide it too). Services are also versioned to allow updates.

Mine currently provides a pdfview service, which returns a PdfEvents object to the consumer. The consumer can then use the onDidOpenPdf method of this object to subscribe a callback, which is run on each PDF when opened. This callback is passed PdfEditor, which represents the open PDF, and it allows registering callbacks with the onDidClick, onDidDoubleClick, and scrollToPosition methods. All coordinates are in PDF points, to keep it representation agnostic.

The exact objects being passed around are not important, only the public interface is. I can also change just about everything, but I need this functionality as a minimum. I also plan to add support for flashing a region, which the atom-latex package uses to show the synctex match location.

Personally, I expect the user to have synctex installed, and dislike direct dependencies when not necessary, but that’s neither here nor there. The only thing our packages should agree on is the interaction service, the rest is up to the end user to consider.

That’s a good issue, thanks! Hopefully they make things a lot easier to style.

@allefeld I’ve documented the service here. I hope we can develop a useful API.

Also cc @ashthespy, seeing as the atom-latex package is a large part of why this could be a useful service.

I had a look, here are some comments so far:

The discussion started with your comment

So synctex shouldn’t be built in, because not all PDFs are for TeX, but a provider can be registered to handle them when appropriate.

If I understand correctly, the PDF viewer provides a service that alerts the consumer if a specific mouse event occurred on (a page in) the PDF, correct? So I guess it would be the consuming package that calls SyncTeX to translate the coordinates into line numbers? But SyncTeX needs page number and page-relative coordinates in bp, so the part that figures that out would be implemented in the provider?

In my implementation that would be the first 40 lines of handleSynctex, approximately 50% of the SyncTeX-related code.

This figuring out is based on the viewer’s HTML structure, PDFViewerApplication.pdfViewer properties pagesRotation and currentScale etc., so it has to happen in the pdf view package. – I see that that’s what PdfPosition provides.

On the other hand, the only context I know in which such information (page number and page-relative coordinates in bp) is useful is SyncTeX, which makes me wonder if we wouldn’t still have “built in synctex”, just omitting the actual call.


A possible addition to the service:

I have the problem that pdf-view's autoreload doesn’t seem to work. As a workaround, I call that package’s PdfEditorView.updatePdf method (in another, yet unpublished package Pandoc/PDF). A PDF viewer package might also have an option to disable autoreload, making such a call necessary, too. I therefore propose that the service provides a hook that can be called for that purpose.


Regarding the events, I have chosen to bind SyncTeX to right-click, which is provided by the contextmenu event. So I would either propose to add that to the list, or let the consumer decide which event(s) they want to listen to.

And a few minor comments:

  • I don’t like the name PdfEditor. It would in principle possible to create a PDF-editing package for Atom, but none of ours do that. You write that it is “similar to an Atom TextEditor”, yes, but only insofar as it lives in an Atom Pane. I propose to call it PdfView or PdfPane or something like that.

  • Are PdfClick, PdfPosition, PdfPositionWithDimen meant as separate classes? If yes, I feel that is a bit overkill. For example, the Atom API contains many points in which options objects are passed, with documentation of the fields, but without making them instances of a class.

  • Ok, PdfClick is modelled on JavaScript events which are classes. But then I would call it PdfEvent or so. And I would attach the originating JavaScript event, which doesn’t necessarily have to be a MousEvent, does it – depends on the implementation?

  • “bottom left corner”: that makes sense from the PDF specification. On the other hand, both SyncTeX and HTML/CSS/JavaScript use a top left corner-based coordinate system, which means this implies additional, technically unnecessary conversions. But still, the specification…

That’s it for now. :slight_smile:

So I would either propose to add that to the list, or let the consumer decide which event(s) they want to listen to.

Or make it completely generic, onUserSelected or something, and let the viewer decide under which circumstances this is triggered.

And finally, shouldn’t we include @izuzak in this discussion?

I’ll preface this with: conceptually, it should be possible for an external PDF viewer to implement this protocol. There is no need for the provider to be a PDF.js iframe.

That’s part of it, yes. It also allows the consumer to control other things, like scrolling to a position (and potentially more; rotation, zoom, etc. could all be added).

Yes, the consumer is the only one that needs to know about SyncTeX. I’ve uploaded my basic reverse SyncTeX consumer here. It uses the location of the click to know the page number and location, which is calculated by the provider.

  • It’s the most natural representation agnostic way I know of to represent a location.
  • There may be more than one package that wants to provide SyncTeX capabilities. This is about giving options, so that the user isn’t stuck with the provider’s implementation (e.g., a dedicated LaTeX package may be able to integrate it more naturally).

Yup, my plan too. The interface might be something like

  • PdfEditor.setAutoReload(enabled: boolean) <- we haven’t specified that a provider must auto reload files, but this can just be a no-op if they don’t.
  • PdfEditor.reload(uri: string) <- argument could be optional to represent same URI as before

this works much better with packages that regenerate the PDFs, as they will know when the changes have finished and it can be reloaded.

The consumer already can decide, but the event does need to be exposed. I’ll add it to the list.


Sure, PdfView works too.

How you represent it is up to the provider, the spec only requires that the listed properties and methods exist, and do the expected thing. They don’t need to be ‘classes’, just regular objects are fine. You could even skip PdfPosition and always use a PdfPositionWithDimen; this wouldn’t break the spec.

It’s exactly like the options objects in Atom. I’ve just given them names that can be referred to.

That was specifically meant for a click (action on a specific location of the PDF), but it could perhaps be generalised.

I don’t want to attach the actual event, as any information should be serialisable to something like JSON. But we could add more context to it.

This is why PdfPositionWithDimen exists, so conversions are at least trivial. I’m OK with either, the important thing is to be consistent.


This is reasonable, but consumers may want to determine what “selected” means for themselves. It should be possible to add both though, and providers can always default to a single left click if they don’t want to add configuration for it.


sorry @izuzak. If we get this stable, would you be interested in a PR to implement it?

OK, updated the spec with the suggestions and added some additional tweaks. I avoided using onUserSelected as a name, as that implies a text selection. I’ve started naming the versions in the revision history, so changes can be compared. I won’t make an actual changelog until it’s stable.

Thanks for updating. Further comments:

Offer way to directly open / retrieve specific PDF?

Yes, definitely.

Keeping SyncTeX in mind as the application, for a PDF-generating package it doesn’t really make sense to keep track of all open / newly opened PDFs via observePdfViews / onDidOpenPdfView, but only of those it has generated. The user can open other PDFs in Atom, which may or may not have been generated by TeX, and for which source may or may not be available. I therefore think it is necessary to have a way to directly open a PDF, returning a PdfView.

Right now the way I do it (and I suspect you, too) is that the viewer registers an opener which responds to file paths ending in .pdf, and the PDF-generating package uses atom.workspace.open, which returns a Promise which resolves into a view object. Actually, I don’t really see a way around registering an opener, because that is the way to create a view in Atom.

I would therefore propose the following for the specification: The provider has to provide a way to directly open a PDF, specifically it has to do so by registering an opener for file paths ending in .pdf (and maybe other extensions, I see you added .ai). The returned Promise has to resolve to a PdfView object.

getUrl()

What URL schema are you thinking of here? file://?

onDidInteract

That’s your renaming of what I proposed as the generic onUserSelected? Ok. But if you keep the event-specific listeners onDidClick, onDidDoubleClick, I’d propose to add onRightClick or onContextMenu, too.

reload(uri?: string)

I propose to remove the uri parameter. PDF.js certainly supports loading another PDF in the existing viewer, but in the Atom context I find it better that one tab / pane item is dedicated to one file. If another PDF needs to be loaded, that’s what openPdf would be for.

PDF points

Just for absolute clarity, I would explicitly say in the specification that a PDF point is 1/72 inches.

Btw. do you know for certain which resolution PDFViewerApplication.pdfViewer.currentScale is relative to? Or to put it differently, at what resolution the PDF is rendered when “100%” is selected? I assumed it is the “standard browser resolution” of 96 dpi, but I’m not sure whether that holds generally (Retina?).

Pages start at 0.

If we’re keeping in line with the way SyncTeX does things, pages should start at 1.

PS: I’ve seen that izuzak has made his last post here in 2014, and was last seen last May. We should contact him via an issue on his pdf-view repository.

Similar for ashthespy, last seen end of October.

It does, because the user may open the PDF directly. The consumer does not need to subscribe to any PDF events though. This is why I added the getPath and getUrl methods, so the consumer gets some context about the opened PDF.

I don’t think this is necessary; regardless of how the viewer opens the PDF, the callbacks are expected to work. Similarly, what counts as a PDF (.pdf, .ai, etc.) can be left to the viewer. It should still try to open any URI passed to it though (see this for a way to open specific files reliably with a custom protocol).

Yup, though it should note that the promise may be rejected if the file could not be found or opened as a PDF.

That was to prevent the service being limited to PDFs on disk. So actual files would have the file:// protocol. The viewer can always reject the open if it doesn’t recognise the URI.

:sweat_smile:. The problem with onUserSelected to me was that if we did add something for selecting text (or a provider wanted to extend its capabilities to offer this), that’s the name I would have picked.

Right click is covered by the button property of PdfMouseEvent. I don’t know about context menu; do you have a use case in mind? In general it’s not good to let things interfere with the context menu.

Makes sense. I was thinking it may be useful to support renaming a PDF, but opening a new PDF should work fine.

I don’t use that. See this for how I get the position (note this is from bottom left). It appears to work for all scales and rotations.

That wasn’t quite the intention, but it’s fine either way.


Just noting that defining what corner the locations are relative to may not be ideal. See this. I’ll try to experiment with how PDFjs handles different origin locations, but I feel it will be best to go back to using the PDF origin and directions when working with locations. This is the natural PDF coordinate system, PDFjs supports it nicely, and consumers can translate it as needed. It means the width and height properties might need some clarification / adjustment too.

RE: PDF locations

The crop box is the rectangle that viewers like PDF.js show. From the PDF spec 14.11.2

The crop box defines the region to which the contents of the page shall be clipped (cropped) when
displayed or printed.

For synctex, it follows the TeX convention (TeXbyTopic 26.1) of using the top left of the “page”. This is not a direct PDF concept, and has no relation to the crop box. From web2c/synctexdir/synctex.c:

… The origin of the coordinates is at
the top left corner of the page. For pdf mode, it is straightforward, but
for dvi mode, we’ll have to record the 1in offset in both directions,
eventually modified by the magnification.

Synctex does not handle a custom crop box such as one made by

\pdfpageattr{
/MediaBox [-100 -100 1000 1000]
}

(CropBox defaults to MediaBox if not defined).

I haven’t tried different unit lengths yet, though I’m not sure if they can be applied to the page boxes. I think it’s best to make the Xpoint and Ypoint values be from the PDFs page box coordinate system

  • It doesn’t invent a new arbitrary coordinate system
  • A different box (media, trim, bleed, or art) could potentially be displayed instead of the crop box. This would break the meaning of “top left”
  • Synctex has no relation to the top left of any page box, it just looks like that with default settings
  • PDF.js has pageView.viewport.convertToPdfPoint(relX, relY), which takes the position of the mouse relative to the top left of the box and returns the PDF coordinates, properly handling zoom and rotation.

The only issue is how to define width and height; they can still be taken as the crop box dimensions, but then other properties might be necessary to identify where on the page was clicked (as the origin can be in a completely arbitrary location).

Maybe we misunderstand each other here. Yes, the callbacks should work regardless how the PDF was opened.

But for a PDF-generating package there should be one unified way to open the generated PDF and get a PdfView object in the process. Otherwise it would have to adapt to different PDF-viewer packages’ different opening methods.

Of course the specification could say that there is one unified custom URL scheme, maybe pdf-view://. But I would prefer to keep it simple, and as far as I can tell all three currently existing viewers register an opener not for a custom URL scheme, but for a plain pathname ending in .pdf (or .ai).

I propose for the specification to say that such an opener responding to pathnames ending in .pdf has to be provided by the PDF-viewer package. That doesn’t prohibit it from additionally supporting a custom URL scheme, or alternative file extensions – it would be the minimum supported way to open a PDF.

It’s not really about the context menu. It’s that I wasn’t able to get right-click mouse events through a standard click listener on the iframe’s document, but I was able to get right-clicks through a contextmenu. I agree that feels like the wrong thing to do, so if you know how to do it through click?

Interfering with the context menu: practically not an issue because PDF.js doesn’t define one, and Atom’s doesn’t work within an iframe. But I get your point.

On the other hand, listening to left clicks might also interfere with whatever the viewer chooses to do with left clicks normally. Which in my mind speaks for only supporting a generic event (onDidInteract), and leaving it up to the viewer how that event is triggered.

Agreed, I wasn’t sure about the change anyway.

Didn’t know, yes, using that is much better than fiddling around with internals. Is this method documented anywhere?

I looked whether there is a standard PDF way to index pages (0-based or 1-based) but couldn’t find anything. If there is, that one should be used for the specification, too.