Repo-Lookout: Fix repository leaks
Repo information publicly available
One fine day I got an email from Repo Lookout. It said that one of my repositories was accessible from the internet. That is a potential security risk, since a repository may contain secret source files, hidden functions or even passwords.
At first I thought this “your repo is not secure” warning was a phishing attempt. But simply opening the links it contained made it clear that Repo Lookout was right: my Lectures repo was not only public, as I had intended, but its version control metadata was also openly available on the web server.
Is that bad?
Normally, the internal structure behind a repository (commit history, branches, configuration, etc.) should not be public, especially not if the repository is set up as private. But even with public repos, no one should be able to access the structure behind them.
Hence the mission of Repo-Lookout:
“Find source code repositories that have been inadvertently exposed to the public and report them to the domain’s technical contact.” - Repo Lookout /about (Crissy Field GmbH)
In my case, this was not critical, but it was unexpected and unpleasant.
It doesn’t do me any harm, because my web server only holds the files for retrieval: even if someone tampered with them, that would have no effect on my repository. I had also stored all secrets in dedicated files, as described in this article, so that they no longer appear in the configuration files. In addition, the repository itself lives separately on the Gitea server instance. Nevertheless, only what I consciously want to make public should be exposed to the internet.
The screenshot from Gitea shows the same commit as the Repo Lookout warning.
How it came about
In the deploy pipeline for my subdomain lectures.schallbert.de and my landing page schallbert.de, I check out directly from the Gitea instance to the Caddy web server. The pipeline starts automatically as soon as the respective main branch receives an update. The runner starts a checkout action, which copies the repository to the corresponding directory on the web server.
Checkout action also copies the .git folder
The .git folder is created along with everything else. This is where the version control software stores all the data it needs to manage the state of the repository. I cannot see this copying on Gitea, because the hidden .git folder does not even appear in the directory listing there. That makes sense, since Gitea’s entire view of the repository is built from this folder’s content.
So the unwanted behavior flew under my radar, and according to Repo Lookout I am by no means the only one with this problem.
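To get a feel for what a published .git folder gives away, here is a small sketch that builds a throwaway repository and prints some of the files an outsider could request over HTTP, such as .git/logs/HEAD. It assumes Git is installed; the author name, e-mail address and file names are made up for the demo.

```shell
#!/bin/sh
# Sketch: what a published .git folder reveals.
# Assumptions: git is installed; repository, author and file names are
# invented for this demo.
set -eu

repo="$(mktemp -d)"
git -C "$repo" init --quiet
echo '<h1>Lectures</h1>' > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" -c user.name='Jane Doe' -c user.email='jane@example.com' \
  commit --quiet -m 'Add landing page'

# Points to the currently checked-out branch.
cat "$repo/.git/HEAD"
# One line per update of HEAD: old and new hash, committer name and
# e-mail, timestamp and commit message.
cat "$repo/.git/logs/HEAD"
# zlib-compressed objects holding every committed file, tree and commit.
find "$repo/.git/objects" -type f
```

On a web server that serves the folder, each of these files is a single HTTP request away.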
Option 1: Fix on the web server
The most obvious solution is to block access to the folder on the server side. This costs few resources and is easy to set up.
This forum entry shows how to do it: hide-entire-folder-caddyfile. For the folders I want to protect, I add the following entries to the Caddyfile:
# /caddy2/Caddyfile
# [...]
# Inside the site block: answer every request below /.git/ and /.gitea/ with 403
respond /.git/* "Access denied" 403
respond /.gitea/* "Access denied" 403
This tells Caddy to answer any request for a file (/*) in the folder /.git (or /.gitea) with status code 403 “Forbidden”. The wildcard (*) is essential: without it, only the folder path itself would be blocked, not the files it contains.
To verify this, I request the Git logs:
# Terminal
curl https://lectures.schallbert.de/.git/logs/HEAD
Access denied
Works perfectly!
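Testing a single URL by hand works, but a few more paths are worth probing. The following is a minimal sketch of such a check, assuming curl is available. The list of probe paths is my own choice of typical files and not exhaustive, and the function names are hypothetical. Both 403 (blocked) and 404 (not present) count as safe.

```shell
#!/bin/sh
# Sketch: probe a site for exposed Git/Gitea metadata.
# Assumptions: curl is installed; the probe paths are typical examples only.

# Print the HTTP status code for a URL ("000" if the request fails).
http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Succeed only if every probe path is blocked (403) or absent (404).
check_no_leak() {
  leak_found=0
  for probe in /.git/HEAD /.git/config /.git/logs/HEAD \
               /.gitea/workflows/deploy-lectures.yml; do
    code="$(http_status "$1$probe")"
    case "$code" in
      403|404) echo "blocked $code $probe" ;;
      *)       echo "EXPOSED $code $probe"; leak_found=1 ;;
    esac
  done
  [ "$leak_found" -eq 0 ]
}

# Usage: check_no_leak https://lectures.schallbert.de
```

Because the function returns a non-zero exit code as soon as one probe is served, it could also run as a final step in a deploy pipeline.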
Option 2: Fix in the checkout action
There is, however, a more elegant solution: ensure beforehand in the pipeline that the folder does not appear on the server at all.
Option 2a: Using sparse-checkout
sparse-checkout lets you select the folders and files that should be part of the checkout. All other files in the repository remain untouched and do not appear in the working tree. This saves time and storage space, especially with large repositories. Of course, it only makes sense if you know in advance that you do not need all files.
Negative list for sparse-checkout
In my case, I want the checkout to copy everything to the server except the folders mentioned above. How do I do that? By using negated patterns in no-cone mode.
“The user has explicitly said ‘I want these directories and not those directories.’” - Derrick Stolee, Microsoft, on GitHub
The checkout action’s documentation states that sparse-checkout is also supported when the action is executed by the runner.
Now I write the workflow with the help of the GitHub Actions Cheat Sheet:
# /.gitea/workflows/deploy-lectures.yml
# [...]
steps:
  - name: --- CHECKOUT ---
    uses: actions/checkout@v4
    with:
      path: ./tmp
      sparse-checkout: |
        /*
        !.git
        !.gitea
      sparse-checkout-cone-mode: false
# [...]
To clarify: the workflow that publishes to my web server uses the checkout action with its sparse-checkout option. It checks out all files from the repository root downward into the folder tmp, except .git and .gitea.
What is no-cone mode?
By default, sparse-checkout expects a list of folders to include in the checkout. In no-cone mode, it expects a list of patterns instead. All operators that can be used in a .gitignore file to specify files, folders, exclusions, etc. are allowed here. This lets me exclude certain folders, but it has some significant disadvantages: the pattern syntax is much more complex and therefore error-prone, and evaluating it is significantly more computationally expensive for larger repositories. For these reasons, no-cone mode is not recommended and is listed as “deprecated” in the documentation. Nevertheless, the proof is in the pudding!
Test with sparse-checkout
Now I push the workflow to my Gitea instance and let my runner execute it once. Then I log in to the web server and check whether the .git folder was created:
lectures# ls -la
[...]
drwxr-xr-x 8 root root 4096 Dec 20 10:41 .git
[...]
Damn, the folder is still there. I look at the action’s logs on my Gitea instance:
[...]
hint: git branch -m <name>
Initialized empty Git repository in /workspace/schallbert/lectures/tmp/.git/
[...]
::group::Setting up sparse checkout
[command]/usr/bin/git config core.sparseCheckout true
So it’s not the fault of sparse-checkout. It comes from the way checkout works: sparse-checkout patterns only decide which tracked files appear in the working tree, and .git is not a tracked file but the repository itself. The .git folder is always created so that the repository can be set up properly. So the only option I have is to delete it automatically after the checkout.
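The same behavior can be reproduced locally with plain Git. This sketch applies the workflow’s patterns with `git sparse-checkout set --no-cone`. I assume this roughly matches what the action configures internally, although the log above only shows `core.sparseCheckout` being enabled. It also assumes a recent Git that accepts `--no-cone`. The repository layout and names are made up.

```shell
#!/bin/sh
# Sketch: reproduce the workflow's no-cone patterns locally.
# Assumptions: a recent git that accepts `sparse-checkout set --no-cone`;
# repository layout and names are made up.
set -eu

work="$(mktemp -d)"
git -C "$work" init --quiet source
mkdir -p "$work/source/.gitea/workflows"
echo 'on: push' > "$work/source/.gitea/workflows/deploy-lectures.yml"
echo 'start'    > "$work/source/index.html"
git -C "$work/source" add .
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m 'initial'

git clone --quiet "$work/source" "$work/tmp"
# Same patterns as in the workflow: everything except .git and .gitea.
git -C "$work/tmp" sparse-checkout set --no-cone '/*' '!.git' '!.gitea'

# .git is still listed; the .gitea workflow file is gone.
ls -A "$work/tmp"
```

The patterns are single-quoted so that the shell does not expand `/*` into a listing of the root directory before Git sees it.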
Option 2b: rm -rf
So I try brute force:
# /.gitea/workflows/deploy-lectures.yml
# [...]
steps:
  - name: --- CHECKOUT ---
    uses: actions/checkout@v4
    with:
      path: ./tmp
  - name: --- REMOVE TEMPORARY FILES ---
    run: |
      # Delete the Git metadata and the workflow folder before deployment
      rm -rfv ./tmp/.git ./tmp/.gitea
# [...]
And finally the .git folder no longer appears on my web server, and my “repo leak” is patched. Thanks again to Repo Lookout!
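To make sure the leak does not return unnoticed, for instance after someone edits the workflow, a guard could run right after the cleanup step and fail the pipeline if any VCS metadata is left. This is a minimal POSIX shell sketch. The function name `assert_no_vcs_metadata` is hypothetical, and the deploy folder ./tmp is taken from the workflow above.

```shell
#!/bin/sh
# Sketch of an extra pipeline check: refuse to deploy while VCS metadata
# is still present. Assumptions: deploy folder ./tmp as in the workflow;
# the function name assert_no_vcs_metadata is hypothetical.
set -eu

assert_no_vcs_metadata() {
  # -prune stops find from descending into a match. No -type filter is
  # used, because .git can also be a plain file (e.g. in submodules).
  found="$(find "$1" \( -name .git -o -name .gitea \) -prune -print)"
  if [ -n "$found" ]; then
    echo "VCS metadata found, refusing to deploy:" >&2
    echo "$found" >&2
    return 1
  fi
  echo "no VCS metadata below $1"
}

# Usage in a workflow step, right after the rm -rfv cleanup:
#   assert_no_vcs_metadata ./tmp
```

Searching at any depth also catches nested .git entries, such as those in submodules should you ever check them out. The `rm -rfv` step only removes the top-level folders.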
Conclusion
I had the problem that the hidden .git folder, where the configuration and structure of a repository are stored, was published on my web server unintentionally and without my knowledge.
I have presented two working options for solving the problem:
- Set up an access ban on the web server
- Modify the pipeline so that it automatically deletes the .git folder after the checkout
The second option is a bit more complex to implement, but it tackles the root of the problem instead of just treating the symptoms. It also follows the first principle of data protection: data minimization takes precedence over protective measures.
“What doesn’t exist cannot be lost” - Schallbert