How much money does Lemmy.ml need to temporarily boost their servers?

OsrsNeedsF2P@lemmy.ml · 1 year ago

How much money does Lemmy.ml need to temporarily boost their servers?

nutomic@lemmy.ml · 1 year ago

The site currently runs on the biggest VPS available from OVH. Upgrading further would probably require migrating to a dedicated server, which would mean some downtime. I'm not sure if it's worth the trouble; the site will go down sooner or later anyway if millions of Reddit users try to join.

Pisck@lemmy.ml · 1 year ago

There will either be an hour of downtime to migrate and grow or days of downtime to fizzle.

I love that there’s an influx of volunteers, including SQL experts, to mitigate scaling issues for the entire fediverse but those improvements won’t be ready in time. Things are overloading already and there’s less than a week before things increase 1,000-fold, maybe more.

OsrsNeedsF2P@lemmy.ml · 1 year ago

8 vCore
32 GB RAM

😬

2 follow-ups:

Can we replace Lemmy.ml with Join-lemmy.org when Lemmy.ml is overloaded/down?
Does LemmyNet have any plans on being Kubernetes (or similar horizontal scaling techniques) compatible?

makingStuffForFun@lemmy.ml · 1 year ago

We need the self-hosting and networking folks to represent. It would be amazing to see some community support in scaling Lemmy up.

poVoq@slrpnk.net · 1 year ago

Some DNS failover pointing lemmy.ml to join-lemmy.org might be cool indeed 🤔

tmpod@lemmy.pt · 1 year ago

Yeah, I was thinking of a DNS-based solution as well. Probably the easiest and most effective way to do it?

nutomic@lemmy.ml · 1 year ago

Can we replace Lemmy.ml with Join-lemmy.org when Lemmy.ml is overloaded/down?

I don't think so; when the site is overloaded, clients can't reach it at all.

Does LemmyNet have any plans on being Kubernetes (or similar horizontal scaling techniques) compatible?

It should be compatible if someone sets it up.

Leigh@lemmy.ml · 1 year ago

You could configure something like a Cloudflare Worker to show a page directing users elsewhere whenever health checks fail.

nutomic@lemmy.ml · 1 year ago

Then Cloudflare would be able to spy on all the traffic, so that's not an option.

Leigh@lemmy.ml · 1 year ago

spy on all the traffic

That’s…not how things work. Everyone has their philosophical opinions so I won’t attempt to argue the point, but if you want to handle scale and distribution, you’re going to have to start thinking differently, otherwise you’re going to fail when load starts to really increase.

Cadendee [they/them]@lemmygrad.ml · 1 year ago

Cloudflare does have the ability to spy on traffic though, they hold SSL keys.

Cadendee [they/them]@lemmygrad.ml · edit-2 1 year ago

A better option for a simple use case like that is something from your DNS provider. Depending on who you use, they may have a health check service, with no access to user data, that simply pings a URL and, if it fails hard enough, starts redirecting traffic to join-lemmy.org.

I think Constellix has it, though I'm not necessarily recommending them specifically.
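The "if it fails hard enough" part is usually a consecutive-failure threshold with some hysteresis, so a single slow probe doesn't flip DNS back and forth. A minimal Python sketch of that decision logic, assuming hypothetical thresholds (3 failures to fail over, 2 successes to fail back); a DNS provider's health-check service would run the equivalent on its side, and DNS TTLs still delay how quickly clients see the switch:

```python
from dataclasses import dataclass, field

@dataclass
class FailoverMonitor:
    """Tracks consecutive probe results and picks which hostname to serve.

    The hostnames come from the thread; both thresholds are hypothetical.
    """
    primary: str = "lemmy.ml"
    fallback: str = "join-lemmy.org"
    fail_threshold: int = 3     # consecutive failures before failing over
    recover_threshold: int = 2  # consecutive successes before failing back
    failed_over: bool = False
    _fails: int = field(default=0, init=False, repr=False)
    _oks: int = field(default=0, init=False, repr=False)

    def record(self, healthy: bool) -> str:
        """Record one health-check result and return where traffic should go."""
        if healthy:
            self._oks += 1
            self._fails = 0
            if self.failed_over and self._oks >= self.recover_threshold:
                self.failed_over = False
        else:
            self._fails += 1
            self._oks = 0
            if not self.failed_over and self._fails >= self.fail_threshold:
                self.failed_over = True
        return self.fallback if self.failed_over else self.primary

mon = FailoverMonitor()
print([mon.record(ok) for ok in [True, False, False, False, True, True]])
# ['lemmy.ml', 'lemmy.ml', 'lemmy.ml', 'join-lemmy.org', 'join-lemmy.org', 'lemmy.ml']
```

Requiring more than one success before failing back avoids flapping while the site is only partly recovered.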

wagesof@links.wageoffsite.com · 1 year ago

You could run an interstitial proxy yourself with a little health checking. The server itself doesn't die, just the web app/DB. nginx could be put in front (if it isn't already) with a temporary redirect if the site is timing out.

Sam BOT@slrpnk.net · 1 year ago

How about https://deflect.ca/? They could still spy, but it would probably be less bad?

Lobstronomosity@lemmy.ml · edit-2 1 year ago

I’m sure you know this, but getting progressively larger servers is not the only way, why not scale horizontally?

I say this as someone with next to no idea how Lemmy works.

nutomic@lemmy.ml · 1 year ago

It's better to optimize the code so that all instances benefit.

Lobstronomosity@lemmy.ml · edit-2 1 year ago

Is it possible to make Lemmy (the system as a whole) able to be compatible with horizontally scaling instances? I don’t see why an instance has to be confined to one server, and this would allow for very large instances that can scale to meet demand.

Edit: just saw your other comment https://lemmy.ml/comment/453391

nutomic@lemmy.ml · 1 year ago

It should be easy once WebSocket support is removed: sharded Postgres and multiple instances of the frontend/backend. Though I don't have any experience with this myself.
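One common way to route requests in a sharded setup is to pick the shard from a stable hash of some key. A minimal Python sketch, assuming (hypothetically) that data is partitioned by community id and that the shard connection strings below exist; this is not how Lemmy is structured today:

```python
import hashlib
from typing import Sequence

# Hypothetical shard connection strings; a real setup would load these from config.
SHARDS = (
    "postgres://db-shard-0/lemmy",
    "postgres://db-shard-1/lemmy",
    "postgres://db-shard-2/lemmy",
)

def shard_for(community_id: int, shards: Sequence[str] = SHARDS) -> str:
    """Pick the shard that holds a community's rows.

    A content hash (rather than Python's built-in hash(), which is salted per
    process for strings) keeps the mapping identical on every backend instance.
    """
    digest = hashlib.sha256(str(community_id).encode()).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

print(shard_for(42) == shard_for(42))  # True: same community, same shard
```

Plain modulo remaps most keys whenever a shard is added; consistent hashing or an explicit lookup table avoids that. Queries spanning many communities (a subscribed front page, for example) would also have to fan out across shards.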

wiki_me@lemmy.ml · 1 year ago

I think that is unavoidable. Look at the most popular subreddits: they can get something like 80 million upvotes and 66K comments per day. Do you think a single server can handle that?

Splitting communities just so that it will be easier technically is not good UX.
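To put those daily figures in per-second terms (averages only; real traffic peaks well above the mean), a quick Python calculation using the numbers quoted above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day: float) -> float:
    """Average events per second for a daily total (ignores peaks)."""
    return per_day / SECONDS_PER_DAY

upvotes_per_s = per_second(80_000_000)
comments_per_s = per_second(66_000)
print(f"{upvotes_per_s:.0f} upvotes/s, {comments_per_s:.2f} comments/s")
# 926 upvotes/s, 0.76 comments/s
```

Each vote is at least one database write, so the vote path alone implies a sustained write rate in the hundreds per second, before counting any reads.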

Bob/Paul@fosstodon.org · 1 year ago

@nutomic @Lobstronomosity In one of the comments, I thought I saw that the biggest CPU load was due to image resizing.

I think it might be easier to split the image resizer off into its own worker that can run independently on one (or more) external instances. Example: the client uses the API to get a temporary access token for the upload, uploads to one of many image resizers instead of the main API, and the image resizer sends the output back to the main API.

Then your main instance never sees the original image.

ccunix@lemmy.ml · 1 year ago

There is already a docker image so that should not be too hard. I’d be happy to set something up, but (as others have said) the DB will hit a bottleneck relatively quickly.

I like the idea of splitting off the image processing.

nutomic@lemmy.ml · 1 year ago

Image processing isn't causing any noticeable CPU load.

ccunix@lemmy.ml · 1 year ago

I saw someone say it was; obviously I have no access to the data.

nutomic@lemmy.ml · 1 year ago

Maybe on another instance, but not on lemmy.ml.

pe1uca@lemmy.one · 1 year ago

What’s the current bottleneck?

Dessalines@lemmy.ml · 1 year ago

SQL. We desperately need SQL experts. It's been just me for years, and my SQL skills are pretty terrible.

Valmond@lemmy.ml · 1 year ago

Put the whole DB in RAM :-)

Reminds me of optimizing my old MySQL multiplayer game server: lots of EXPLAIN and JOIN pain, lol. A shame I'm not an expert …

poVoq@slrpnk.net · 1 year ago

There are some SQL database optimisations being discussed right now and apparently the picture resizing on upload can be quite CPU heavy.

itsmikeyd@lemmy.ml · 1 year ago

SQL dev here. Happy to help if you can point me to that conversation. My expertise is more in ETL processes for building data warehouses (DWs) and migrating systems, but maybe I can help?

nutomic@lemmy.ml · 1 year ago

https://github.com/LemmyNet/lemmy/issues/2877

poVoq@slrpnk.net · 1 year ago

This seems to be the relevant issue: https://github.com/LemmyNet/lemmy/issues/2877

veroxii@lemmy.world · 1 year ago

I've been helping on the SQL GitHub issue, and I think the biggest performance boost would be to separate the application and PostgreSQL onto different servers. Maybe even use a hosted PostgreSQL temporarily, so you can scale the DB at the press of a button. The app itself appears to need negligible resources (except the picture resizing, which can also be offloaded).

But running a dedicated DB on a dedicated server, as close to bare metal as possible, gives by far the best performance, and you can scale it up as you need more connections. Our production database at my data analytics startup is a PostgreSQL instance on an i9 server with 16 cores, 128 GB RAM, and a fast SSD. We have 50 connections set up, and then run pgbouncer to let up to 500 client connections share those 50. It seamlessly runs heavy reporting and dashboards for more than 500 business customers with billions of rows of data, and it costs us less than US$200 per month at https://www.tailormadeservers.com/.
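pgbouncer does its pooling at the PostgreSQL protocol level, in front of the database. The underlying idea (many clients taking turns on a small, fixed set of server connections) can be sketched in-process in Python; FakeConnection is a hypothetical stand-in for a real database connection, and the 50/500 numbers simply mirror the comment above:

```python
import queue
import threading
import time
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for a real PostgreSQL connection."""

    def __init__(self, n: int):
        self.n = n

    def execute(self, sql: str) -> str:
        time.sleep(0.001)  # pretend the query takes a moment
        return f"conn{self.n}: {sql}"

class Pool:
    """Fixed-size pool: callers block until one of `size` connections is free."""

    def __init__(self, size: int):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(FakeConnection(i))
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    @contextmanager
    def connection(self):
        conn = self._free.get()  # blocks while every connection is busy
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            yield conn
        finally:
            with self._lock:
                self.in_use -= 1
            self._free.put(conn)  # hand it to the next waiting caller

pool = Pool(size=50)

def client(i: int) -> None:
    with pool.connection() as conn:
        conn.execute(f"SELECT {i}")

# 500 concurrent clients share the 50 pooled connections.
threads = [threading.Thread(target=client, args=(i,)) for i in range(500)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak connections in use:", pool.peak_in_use)  # never more than 50
```

The database only ever sees 50 sessions; the other clients wait in the pool's queue instead of each holding an idle backend process.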

Cadendee [they/them]@lemmygrad.ml · 1 year ago

And I think the biggest performance boost would be to separate the application and postgresql onto different servers.

I think hexbear.net (an older site running a Lemmy fork) is working on this in conjunction with moving back to a modern Lemmy version.

Mike@lemmy.ml · edit-2 1 year ago

apparently the picture resizing on upload can be quite CPU heavy

This suggestion probably won't help on a hosted VPS, but the nvJPEG library pushes crazy theoretical numbers for image resizing.

Maybe this could be worth investigating?

poVoq@slrpnk.net · edit-2 1 year ago

Probably not, but it does mention a more general CUDA-based solution that might be interesting to add to Pictrs. I could, for example, move my Pictrs instance onto a server that has an older Nvidia GPU to accelerate things (which I'd also use for LibreTranslate and some other less demanding ML stuff).

Edit: OK, it looks like resizing is only supported on Pictrs 0.4.x anyway, which most Lemmy instances are not using yet. However, it seems to use regular ImageMagick in the background, so chances are quite high that it can be made to work with OpenCL: https://imagemagick.org/script/opencl.php

esturniolo@lemmy.ml · 1 year ago

And maybe bandwidth, too. Serving thousands and thousands of users needs at least 1 Gbps.

nutomic@lemmy.ml · 1 year ago

It's mostly text, so bandwidth shouldn't be a problem.
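A back-of-envelope check of that: average egress is requests per second times response size. A small Python helper with hypothetical inputs (1,000 requests/s at roughly 20 kB per text response); images, if served from the same box, would dominate and need their own estimate:

```python
def required_mbps(requests_per_s: float, avg_response_kb: float) -> float:
    """Average egress in megabits per second (decimal units: 1 kB = 1000 bytes)."""
    bits_per_s = requests_per_s * avg_response_kb * 1000 * 8
    return bits_per_s / 1_000_000

# Hypothetical load: 1,000 requests/s with ~20 kB text/JSON responses.
print(required_mbps(1_000, 20))  # 160.0
```

Under those assumptions, text traffic stays well below a 1 Gbps link.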

Ashwag@lemmy.ca · 1 year ago

So if I'm reading this correctly, it's currently a hosting bill of 30 euros a month?

Milan@discuss.tchncs.de · edit-2 1 year ago

No, that's the 8 GB memory option… If it's the biggest, it should be around 112 €. Meanwhile, I keep wondering whether I should let Lemmy stay on the current KVM (which is similarly specced but with dedicated cores and stuff) or whether it's better to move it to one of my dedis just in case… well… we'll see xD

nutomic@lemmy.ml · 1 year ago

It's the one for 30 euros; I'm not seeing any VPS for 112. Maybe that's a different type of VPS?

Milan@discuss.tchncs.de · 1 year ago

In vServers, it depends on the memory… and the storage option, for the one starting at 30…

nutomic@lemmy.ml · 1 year ago

It currently has 8 GB and only uses 6 GB or so. CPU is the only limitation.

Milan@discuss.tchncs.de · 1 year ago

It doesn't sound like OVH's vServers offer dedicated cores, and shared cores commonly become a bottleneck with VPS offerings across hosts. With the initial Mastodon hype, for example, I had to learn that shared-hardware lesson the hard way. For the price you are currently paying, maybe something like a used dedicated server (or one of the fancy AMD ones) at Hetzner is of interest: https://www.hetzner.com/sb

nutomic@lemmy.ml · 1 year ago

Hetzner is great, but they are very strict about piracy, so it's not an option for lemmy.ml. For now the load has gone down, so I will leave it like this, but a dedicated OVH server might be an option if load increases again.

Leigh@lemmy.ml · 1 year ago

You should use this relatively quiet time to migrate to a larger server, because when the time comes where you need to do it, you’re going to be in for a world of hurt. This is the calm before the storm–take advantage of it.

Ultimately, you need to scale horizontally. You need to shard your database and separate out your different functions (database, front end, whatever back end applications you use, etc) onto different servers, all fronted by load balancers. That’s going to be the only way to even begin to handle increasing load. If you don’t have a small team of experienced engineers with a deep understanding of how to build for scale, and you get a sudden mass exodus of users from Reddit, you’re fucked. So if I were you, here’s what I’d do:

Scale up to the largest instance type you can. If possible, switch (at least temporarily) to AWS and use something in the c6i instance family, such as the c6id.32xlarge. Billing for AWS instances is done by the hour, so you wouldn’t need to pay for an entire month up front if you only need that extra horsepower for a few days (such as when the blackouts are planned from the 12th through 14th).
Because the above will do nothing but buy you time until you crash–and if you get a huge spike of users, without horizontal scaling, you WILL crash–migrate your DNS to something like Cloudflare. From there, configure workers to respond when health checks to your site fail, so that users attempting to access the site can be shown a static page directing them to something like http://join-lemmy.org or someplace, instead of simply getting 5xx errors.
Once the hug of death is over, evaluate where you stand. Reduce your instance size, if you can, and start investigating what it’s going to take to scale horizontally.

I’m not a SQL expert, but I am a principal network architect, and my day job for the last 15 years has been working on scale and automation for the world’s largest companies, including 7 years spent at AWS. In my world, websites like Reddit, as large as they are, are still considered to be of ‘average’ size. I can’t help you with database, but I’m happy to provide guidance around networking, DNS, scale, automation, security, etc.

sysgen@lemmy.ml · 1 year ago

Hexbear ran (runs?) on Hetzner, and I don't recall them ever having an issue.

Sam BOT@slrpnk.net · 1 year ago

I'm relatively new to https://elest.io/pricing, but it seems an easy way to scale Dockerised stuff up (and down again): just upgrade the plan to the next tier when needed, pay by the hour, and downgrade again later.

There’s also a bunch of load balancer options I haven’t even begun to explore yet.

If you select Hetzner, it's EU-based and powered by green energy.

Sam BOT@slrpnk.net · 1 year ago

With Elestio you can choose from a range of cloud providers.

epical@lemmy.ml · 1 year ago

Nowadays doesn't even make any sense to use servers. Everyone already have decent computers and/or smartphones able to host their own content (text) and their friends content. It doesn't require much. Why not create something better? We already have decentralized finance, but we are still using centralized social networks. How's this possible?!

nutomic@lemmy.ml · 1 year ago

Then users would have to deal with key pairs. By using websites, we get the domain system, which users are already familiar with. It also supports normal password login, which is impossible in P2P.

roho@lemmy.ml · edit-2 1 year ago

Nowadays doesn’t even make any sense to use servers. … Why not create something better?

I think you might underestimate the problem.

Jami.net (a decentralized messaging app) works P2P. It uses a torrent-like distributed hash table (DHT) to locate peers at any moment. (The main usability issue for nontechnical users is that devices on an internal IP address aren't addressable from outside. This requires a TURN server, which is a single point of failure and a privacy concern.)
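For context on the "torrent-like" part: BitTorrent's DHT is based on Kademlia, where node IDs and keys share one ID space and the distance between two IDs is their bitwise XOR. The Python sketch below illustrates that metric generically; it is not Jami's code, and the peer names are made up:

```python
import hashlib
from typing import List

def node_id(name: str) -> int:
    """160-bit ID from SHA-1, the ID size used by Kademlia and BitTorrent's DHT."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two IDs."""
    return a ^ b

def closest_nodes(key: int, known: List[int], k: int = 3) -> List[int]:
    """The k known node IDs closest to `key` under the XOR metric."""
    return sorted(known, key=lambda n: xor_distance(n, key))[:k]

# Hypothetical peers and lookup key.
peers = [node_id(f"peer-{i}") for i in range(20)]
target = node_id("alice")
print([hex(n)[:10] for n in closest_nodes(target, peers)])
```

A real lookup repeats this step, asking the closest known nodes for nodes even closer to the key until no closer ones turn up; it does nothing about the NAT reachability problem mentioned above.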

They started incorporating Git for merging chats because any set of peers (in a group chat) can be out of reach of another set of peers, i.e. the chat continues on different branches and needs to be merged again later (this happens in the client app, because there is no central server). Jami is aiming at double-digit group sizes… that's nowhere near the scale Lemmy is handling.
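To show why a merge step is needed at all, here is a deliberately simplified Python stand-in: two peers share a history, diverge while partitioned, and later merge by taking the union of both logs in a deterministic order. Jami uses Git, which tracks causality in a commit graph; this sketch instead orders by sender timestamp, and the Message fields are hypothetical:

```python
from typing import List, NamedTuple

class Message(NamedTuple):
    timestamp: int  # sender's clock; assumed roughly synchronized
    msg_id: str     # globally unique id, also used to break timestamp ties
    text: str

def merge_branches(ours: List[Message], theirs: List[Message]) -> List[Message]:
    """Merge two diverged copies of a chat log into one history.

    Messages present on both sides (the shared prefix) are kept once, and the
    result is ordered by (timestamp, msg_id) so every peer computes the same
    history regardless of which branch it treats as "ours".
    """
    by_id = {m.msg_id: m for m in ours}
    by_id.update({m.msg_id: m for m in theirs})
    return sorted(by_id.values(), key=lambda m: (m.timestamp, m.msg_id))

shared = [Message(1, "a", "hi all")]
peer_a = shared + [Message(3, "c", "sent while partitioned (A)")]
peer_b = shared + [Message(2, "b", "sent while partitioned (B)")]
print([m.msg_id for m in merge_branches(peer_a, peer_b)])  # ['a', 'b', 'c']
```

The key property is that the merge is order-independent; every extra group member multiplies the branches that may need reconciling.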

ch1cken@discuss.tchncs.de · 1 year ago

Everyone already have decent computers and/or smartphones able to host their own content (text) and their friends content

What if there were something like Lemmy, but P2P, similar to how PeerTube works? And for dead content, it could fall back to a server?

elouboub@kbin.social · 1 year ago

Is it running in a single Docker container, or is it spread out across multiple containers? Maybe with docker-machine or Kubernetes with horizontal scaling, it could absorb users without issue (well, except maybe cost). OVH has managed Kubernetes.
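For reference, Kubernetes' Horizontal Pod Autoscaler sizes a deployment with desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and skips changes while the ratio is within a tolerance (0.1 by default). A Python rendering of that rule, with hypothetical min/max bounds. Note that it only helps stateless frontend/backend pods, not the PostgreSQL database that Dessalines identified as the bottleneck:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count following the Kubernetes HPA scaling rule.

    min_replicas/max_replicas are hypothetical bounds; 0.1 mirrors the
    HPA's default tolerance.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 90% CPU against a 60% target.
print(desired_replicas(3, current_metric=90, target_metric=60))  # 5
```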

Divided by Zer0@lemmy.ml · 1 year ago

Do you have the frontend and DB served from the same VPS? If so, now would be a great time to split them. Likewise, if your DB is running on a VPS, you're likely suffering from significant CPU steal from the hypervisor, so you would benefit from switching to a dedicated box. My API calls saw a 10x speedup just from moving the DB from a VPS to a dedicated box.
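CPU steal is easy to check before deciding. On Linux, top and vmstat show it as "st", and it can also be computed from two samples of the aggregate cpu line in /proc/stat. A minimal Python sketch of that calculation, assuming a modern kernel where the steal field exists; the sample numbers in the usage are made up:

```python
from typing import List

def read_cpu_times(stat_text: str) -> List[int]:
    """Parse the aggregate 'cpu' line of Linux /proc/stat into integers (clock ticks)."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # "cpu0", "cpu1", ... are per-core lines
            return [int(x) for x in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")

def steal_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    guest and guest_nice are already included in user and nice, so only the
    first eight fields count toward the total.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0

# Made-up samples; on a real box, read /proc/stat twice a few seconds apart.
t0 = read_cpu_times("cpu  100 0 50 800 10 0 0 40 0 0\ncpu0 50 0 25 400 5 0 0 20 0 0")
t1 = read_cpu_times("cpu  150 0 70 900 10 0 0 100 0 0")
print(f"steal: {steal_percent(t0, t1):.1f}%")  # steal: 26.1%
```

Sustained steal in the double digits under load is a strong hint that a dedicated machine would help.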

I just checked OVH's VPS offers and they're shit! Even at 70 EUR for a dedicated server on Hetzner, you would get more than double those resources without steal. I would recommend switching your DB ASAP for immediate massive gains.

If you're wondering why you should listen to me, I built and run https://aihorde.net, which is currently handling about 5K concurrent connections.

nutomic@lemmy.ml · edit-2 1 year ago

Hetzner is very strict about piracy, so that's not an option. And it's almost the weekend, so I won't have time for a migration. Anyway, there are plenty of other instances in case lemmy.ml goes down.

Edit: I also wouldn't know which size of dedicated server to choose. No matter what I pick, it will get overloaded again after a week or two.

Divided by Zer0@lemmy.ml · 1 year ago

Even if you choose Hetzner, it won't know it has anything to do with piracy, because it will just be hosting the DB, and nobody will know where your DB is. That fear is overblown.

And believe me, a dedicated server is night and day compared to a VPS.