Networking (Storage) Is Hard

So in a way, the crazy redundant remote storage for remote storage is back to being what I’m doing…but I’ve learned a LOT along the way, and one lesson stands out: it doesn’t matter if storage is involved or not; networking is hard.

Call It A StorageNetwork?

So let me start out by saying that all of this crap began a month ago with an attempt to solve a bottleneck in my storage situation, which was largely the fact that I was relying on network attached storage. I couldn’t even get a dedicated network connection to the hardware I was using as a server. This wasn’t a concern for the first two years, when my internet connection was only 75Mbps symmetrical; pulling enough data from a network mount to sustain that much bandwidth on a single GigE interface wasn’t any problem.

Now make that internet connection go from 75Mbps symmetrical…to 1Gbps symmetrical. Suddenly, trying to serve data from the network storage means I’m getting about 500Mbps out at best; and that was with the best-case layout of the server and the storage on the same switch, so I wasn’t dealing with additional switch saturation elsewhere. My original plan to solve this was to pick up a GigE PCExpress card for that HP laptop, plug the Seagate PersonalCloud into it, and bridge the network interfaces.

So when I moved the PersonalCloud down here, it was originally just to get it out of where it was. The switch to Debian was caused by two things: some issues that started with it before I got my Dell R610, and getting my R610. The R610, if you don’t know, has four gigabit network interfaces plus a 100Mbps port strictly for its onboard system management module (iDRAC; yes, it’s datacenter/enterprise equipment). It seemed like the perfect way to build a poor-man’s SAN (Storage Area Network): toss Debian on the PersonalCloud, plug it into one of the network ports (hey…does anyone remember the days when you needed a special crossover cable, and not miss them?), and then bridge it onto a virtualized internal LAN. In fact, originally I was going to do iSCSI over it. The performance sucked. I was told it was because my shitty drive wasn’t suited for that type of usage. I thought it was maybe the device not being powerful enough. I think it was just the drive taking a shit.

But what I had in the brief period before it totally failed was basically an NFS appliance bridged to the same internal LAN, with one VM sharing that NFS mount over Samba. It worked…OK. But the drive was failing…so…there is that.
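
For reference, the VM side of that arrangement was nothing exotic: mount the NFS export, point Samba at the mount point. A minimal sketch of the idea, with the NAS address, export path, and mount point all made up for illustration:

# on the VM that re-shares the storage
sudo apt install nfs-common
sudo mkdir -p /mnt/nas
sudo mount -t nfs 10.1.1.5:/shares/data /mnt/nas

# or, to survive reboots, one line in /etc/fstab:
# 10.1.1.5:/shares/data  /mnt/nas  nfs  defaults,_netdev  0  0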

Why Does This Shit Need A GUI?

For long enough to put drives in the thing, let it do its setup once, trash its initial storage pool, then redo it as a JBOD with zero data protection (no, I don’t learn), I wasn’t even aware the thing had SSH access. So I was pretty convinced I would need to keep browser access to it, meaning I was going to have to either dump it on my main LAN or figure out how to route between the LANs. It was only within the last couple of weeks that I learned a lot more about how kernel routing works…so I figured it would be easy.

It was…but it wasn’t, for reasons. So while I worked on those, I just bridged its network interface to the main LAN. The overall performance was actually really good; I pushed large ISOs over SMB at around 100MB/s from my Windows machine, which is itself bridged through a VM to the main LAN. The VMs were writing over NFS at about the same speed, and read speeds were good too. I was pretty happy, since I was hitting the limit of what I could do over gigabit LAN.
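
For anyone who wants the recipe: on Debian that kind of bridging is either a one-off brctl command or a few lines in /etc/network/interfaces (with bridge-utils installed). The bridge and interface names here are assumptions; yours will differ:

# one-off: add the NAS's physical port to the existing LAN bridge
sudo brctl addif br0 eth2

# or persistently, in /etc/network/interfaces:
auto br0
iface br0 inet dhcp
    bridge_ports eth0 eth2    # main-LAN uplink plus the port the NAS is plugged into
    bridge_stp off
    bridge_fd 0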

But…that routing thing…it bugged me. I wanted to figure it out. I also realized that while I may have avoided bottlenecks in network switches, there was still the pesky issue of the VMs operating over a single interface. Maybe full-duplex somehow helped? I don’t know. I didn’t fully test before I started working on that problem again.

All Of Your Routers Need To Know The Route

So here’s the basic issue I was having: I could talk to/ping the machine I configured as my gateway to 10.1.1.0/24, and I could even ping its 10. IP with no issue. I could ping everything on 192.168.1.0/24 just fine from the gateway, but I wasn’t getting packets to travel anywhere else. I fought this as a firewall rules issue; I fought it every way I could think of.

Long story short…it came down to a couple of ip routes and a screwed-up networking state on the VM. I isolated a VM to just the 10.1.1.0/24 interface and booted it up; the VM got the IPv6 prefix and DNS. That’s fine, except that information isn’t supposed to be advertised on the 10.x interface of the router VM. I also couldn’t ping its 10. IP from the VMs at all. Restarting the networking service failed, so I verified the configs were what I should have needed and rebooted the VM. While waiting for it to come back up, I made sure both my test VM and my laptop had the proper routes through the networking/router VM. It worked: I could ping the 10. test VM from my laptop. What I couldn’t do was ping anything else on my 192 network. Like, nothing.
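
The routes themselves are one-liners with iproute2. A sketch of what “the proper routes” amount to, assuming the router VM sits at 192.168.1.5 on the main LAN and 10.1.1.1 on the 10. side (both addresses made up):

# on the laptop (main LAN side): reach 10.1.1.0/24 through the router VM
sudo ip route add 10.1.1.0/24 via 192.168.1.5

# on the 10. test VM: reach the main LAN through the router VM
sudo ip route add 192.168.1.0/24 via 10.1.1.1

# and on the router VM itself, forwarding has to actually be on
sudo sysctl -w net.ipv4.ip_forward=1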

It was then I remembered I’d forgotten about a pretty critical piece of equipment, my other router: the ISP’s router that’s doing all the IPv4 on the 192 network. The router VM was only advertising IPv6 routes, so of course only the lone machine that had the route would get to it. I thought about setting up something to advertise the route to the main LAN, but it was just a whole lot easier to set a static route in the ISP’s router/gateway. Hamburger! I pinged 192.168.1.1 from the 10. network; packets were making it to the internet and back.

So at this point I decided to install a DHCP server on the router VM for the same reason I did it with the Seagate: it seemed easier. I wouldn’t have to set a static IP on the Synology and kick myself should I need it on a different network later. It’s the only device using DHCP on that network, so it gets a static assignment by pure virtue of being given a one-address IP range. I unplugged the Synology from the LAN to ensure a network reset, took down all the network interfaces, brought them up with the new configuration, started up DHCPv4, checked that all the services came back up, and plugged the Synology’s cable back in.
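
The “one device, one-address pool” trick is just a subnet block in dhcpd.conf (the box is running isc-dhcp-server, going by the dhcp-lease-list output below). A sketch, with the router and DNS values assumed:

# /etc/dhcp/dhcpd.conf (fragment)
subnet 10.1.1.0 netmask 255.255.255.0 {
    range 10.1.1.2 10.1.1.2;        # one-address pool: the Synology always lands on .2
    option routers 10.1.1.1;        # router VM's address on this network (assumed)
    option domain-name-servers 10.1.1.1;
    default-lease-time 86400;
    max-lease-time 86400;
}

With that in place, the lease table showed the Synology had picked up its address: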

dewdude@ipv6:~$ dhcp-lease-list
To get manufacturer names please download http://standards.ieee.org/regauth/oui/oui.txt to /usr/local/etc/oui.txt
Reading leases from /var/lib/dhcp/dhcpd.leases
MAC                IP              hostname       valid until         manufacturer
===============================================================================================
ff:ff:b0:0b:ee:ee  10.1.1.2        synology       2020-06-01 01:16:52 -NA-

Results from my laptop were just as pleasing: the Synology’s interface opened in the browser, and its SMB share loaded at its new IP.

It’s Still Complicated And Stupid

So at this point I had the Synology connected to a physical Ethernet port, bridged to an internal virtual LAN for the Xen VMs. But this still didn’t solve two last issues: Samba and redundant bridging. The Samba issue I probably could have solved by really screwing with the Samba conf on the Syno, but I decided I’d rather just install Samba in a fresh VM that can serve as my guest/public access. This way I don’t have to keep banging my head against whatever Synology does with their Samba; the new install is largely just set up for a single read-only share that requires no authentication. I also moved my SFTP service over to this VM.
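
That single read-only, guest-access share is only a handful of smb.conf lines. A sketch, with the share name and path made up:

# /etc/samba/smb.conf (fragment)
[global]
    map to guest = Bad User    # unknown usernames fall through to guest access

[public]
    path = /mnt/nas            # the NFS mount from the NAS
    read only = yes
    guest ok = yes
    browseable = yes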

So what about the redundant bridging?

So technically there was some bridging-within-bridging happening on the router VM to bridge “network 2” to the internal LAN for the NAS, because “network 2” is itself just a network bridge on the second physical interface. So I was bridging a bridged interface into a virtual interface that was itself a bridge. Bridges inside bridges inside bridges. Bridgeception. Yo dawg, I heard you like network bridges.

So why not just set all the devices to use the existing “network 2” bridge Xen provides? That’s basically been my last tweak. The Synology is connected to the third network port, and all the VMs have an interface connected to “network 2”.
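
Hooking a VM to that bridge is one entry in its xl config. A sketch, assuming dom0 named the bridges xenbr0 for the main LAN and xenbr1 for “network 2” (the actual names depend on how dom0 networking was set up):

# /etc/xen/somevm.cfg (fragment)
vif = [ 'bridge=xenbr0',    # main LAN
        'bridge=xenbr1' ]   # "network 2" / storage side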

Whether or not it makes any difference, I don’t know; it stands to reason that having storage on a dedicated network interface means the server can push data out to the main LAN as quickly as it can suck it off the NAS. Or does it? I’m thinking it probably does, but I will say I’ve noticed a difference in speed between accessing my VM’s SMB re-share of an NFS mount versus accessing the NAS’s SMB share directly, with the VM just routing it over. The double sharing costs around 20MB/s.

Of course that’s just fine for the people I’m making it for.