Sr. Site Reliability Engineer (DevOps) Resume
Atlanta, GA
SUMMARY:
- Worked closely with Developers, DBAs, Security, Network Engineers & Solution Architects
- Designed, architected & implemented technology solutions
- Served as the Lead Engineer on Linux projects
- Served as the Technical Project Manager on Linux projects
- Training in DevOps and tools (Puppet, Chef, Git, GitHub, Jenkins, Python, Ruby, AWS, Vagrant, Docker, etc.)
- Architected Tripwire Security Solution
- Deployed Splunk company wide
- Redesigned and built the FICO Falcon POC environment for QA
- Hands-on build-out of Red Hat servers
- Installed and configured Apache web servers
- Installed and configured Oracle on Red Hat servers
- Member of the project team that upgraded TLS & SHA certs
- Installed SSL certs
- Deployed web applications on JBoss and Tomcat
- Experience with F5 BIG-IP load balancers
- Strong knowledge of SQL commands and statements
- Worked in data centers to maintain server and network uptime
- Experience monitoring production, DR & test environments
- Experience with server monitoring tools such as Nagios, GroundWork, Webmetrics & Foglight
- Experience using Mercury SiteScope and SolarWinds to monitor for server, router and circuit failures
- Well-rounded IT knowledge, including web hosting, application support, server support, coding, DNS, SQL, email and engineering solutions
- Experience writing Bash scripts for monitoring
- Experience with Linux/Unix (Red Hat, CentOS, FreeBSD) and Windows servers
- Experience troubleshooting production applications and servers
EXPERIENCE:
Confidential, Atlanta, Ga
Sr. Site Reliability Engineer (DevOps)
Responsibilities:
- Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
- Works with the product team to ensure user stories are developer-ready, easy to understand and testable
- Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
- Writes custom code or scripts to do "destructive testing" to ensure adequate resiliency in production
- Configures commercial off-the-shelf solutions to align with evolving business needs
- Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
- 20% - Support & Enablement:
- Fields questions from other product teams or support teams
- Monitors tools and participates in conversations to encourage collaboration across product teams
- Provides application support for software running in production
- Proactively monitors production Service Level Objectives for products
- Proactively reviews the Performance and Capacity of all aspects of production: code, infrastructure, data, and message processing
- Participates in learning activities around modern software design and development core practices (communities of practice)
- Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
- Daily team stand-ups: give updates on assigned stories in Pivotal Tracker
- Floor stand-ups: managers or leads of all development teams give project updates, recognize outstanding work by team members and introduce new team members
- Sprint retros & deliverables meetings covering all stories/tasks assigned from the project
- Work on Pivotal Tracker stories (current iteration, backlog and icebox) and assign point values to stories based on complexity
- Use AppDynamics (APM) to gather application baseline stats and use them on a custom SLO dashboard
- Monitor application performance to ensure it meets SLOs (service level objectives), tracking volume, availability, latency, errors and tickets (VALET)
- Perform application reliability reviews with developers and other stakeholders
- Write scripts to monitor application performance
- Configure dashboards in Graphite/Grafana using collectd
- Exclude known errors in AppD
- Use ExtraHop to analyze the network for issues that could be affecting applications
- Use ExtraHop to monitor the number of requests to web service endpoints (focusing monitoring on high-usage endpoints)
- Pull monitoring metrics from AppDynamics via its REST API (JSON) and feed them into dashboards
- Convert Word .docx files to Markdown and publish them to GitHub
- Set up monitoring on web service endpoints (e.g., VALET stats on ComOrder)
- Set up monitoring for HTTP error codes in AppD
- Deploy code (apps) from GitHub to Cloud Foundry using the PCF CLI
- Create topics on Apache Kafka distributed messaging system
- Create Splunk queries and set up alerts
- Create alerts in AppDynamics
- Test web service endpoints by calling their REST APIs with Postman
- Complete an Application Reliability Review of a Cassandra database
- Add, Delete & Update records in MySQL database
- Clone, pull & push dashboard repositories using Git commands
- Work in a fast-paced, evolving, growing, dynamic Agile environment
- Create RPMs from tar and zip files using FPM
- Upload RPMs into Spacewalk to be picked up by Puppet
- Wrote a Python script to send messages/metrics from AppD to Google Cloud Platform (a Google Pub/Sub topic) and from Pub/Sub to the SLO dashboard
- Develop monitoring strategies for asynchronous (queue) & synchronous (real-time) communications
- Write scripts (curl) to stress-test applications/servers
- Perform destructive testing in QA and QP environments
- Perform version control with Git tagging
- Create alerts to monitor business transactions (Order Creates, Order Recall, etc.) and feed them to the SLO dashboard
- Export monitoring data from RHEL Kafka servers to AppD using AppD machine agents and a monitor.xml config file
- Create Kafka dashboards and alerts in AppDynamics using JMX metrics
- Use JConsole to monitor/view Kafka MBeans and JMX metrics
- Monitor key Kafka metrics such as UnderReplicatedPartitions, ActiveControllerCount, OfflinePartitionsCount & UncleanLeaderElectionsPerSec
- Install the AppD Java agent for JMX metrics and the AppD machine agent for host-level metrics
- Configure a YAML config file to export Kafka JMX/MBean metrics to AppD for time-series dashboarding
- Evaluate Prometheus as a monitoring tool
- Use Vagrant to configure VirtualBox with CentOS and Puppet
- Write Puppet modules to automate deployments and enforce state configurations
- Involved in the Kafka DR design process
- Use Docker to test applications such as Prometheus and Grafana locally on my laptop
- Install the AppDynamics Java agent & machine agent on Elasticsearch servers
- Puppetize the AppDynamics agent installation for Elasticsearch servers
- Monitor key Elasticsearch metrics such as shards
- Deploy Puppet modules from Git using a Jenkins job
- Verify Puppet module deployments via the Puppet dashboard
- Respond to alerts via PagerDuty during on-call
Confidential, Alpharetta, GA
Systems Engineer
Responsibilities:
- Worked with the project manager and project team to gather project requirements and define project scope
- Architected, built, deployed and configured Tripwire across multiple data centers
- Deployed Splunk company wide
- Identified and remediated PCI DSS-related problems on Red Hat servers
- Built Red Hat servers (RHEL 5, 6 & 7)
- Installed and configured Oracle on Red Hat servers via HPSA & PXE booting
- Installed and configured Couchbase database
- Patched Red Hat servers using Red Hat Satellite server
- Upgraded TLS and SHA certs
- Installed and configured Apache and SSL certs
- Troubleshot firewall, IP routing, NAT & trunking issues
- LVM management: created filesystems and mounted them
- Used single-user mode and rescue/recovery mode to rescue and troubleshoot servers
- Used iDRAC to remotely manage physical servers
- Used VMware vSphere to manage VM infrastructure
- Installed and configured WebSphere on RHEL servers
- Created SSH keys for passwordless server access
- Created DNS records (A, CNAME, MX, PTR)
Confidential, Alpharetta, GA
Systems Administrator
Responsibilities:
- Added servers to BladeJump/Oasis and updated/upgraded servers to fix vulnerabilities shown in McAfee port scans
- Deployed applications on Unix & Windows servers
- Responded to server-down events and outages
- Approved or denied privilege requests in Oasis; communicated via Q messenger and Confidential & Confidential Connect
- Created and implemented ichanges
Confidential, Atlanta, GA
Senior Applications Engineer
Responsibilities:
- Troubleshoot application and web servers such as Apache, Tomcat, JBoss, IIS and ColdFusion on Linux & Windows servers
- Investigates, analyzes, and resolves operational issues with production applications.
- Works proactively to improve efficiency and application reliability
- Analyzes, develops, tests and implements Python, Perl and Bash shell scripts to prevent and/or resolve problems with production applications.
- Analyzes application failures and provides input on how to prevent recurrences.
- Identifies and implements automation of routine tasks using scripting or other tools.
- Consults with users to identify current operating procedures and to clarify program objectives.
- Works with the development team and external business partners to implement and support production applications.
- Serve as an escalation point for NOC engineers
- Provides mentorship for junior team members
- Communicates with all levels of staff and management, both internal to Confidential and external with business partners.
- Works with the development team to develop operational procedures for new products
- Deploys applications to test, DR and production environments
- Conducts performance testing on applications prior to release
- Produce and run SQL queries
- Provide Data Analysis
- Evaluate new tools and applications
NOC Engineer
Confidential
Responsibilities:
- Performed the following day-to-day duties:
- Ran SELECT statements in PL/SQL to verify that scripts added columns/rows (records) to database tables
- Added/updated columns, rows and records in database tables
- Reset session pools in JBoss from the JMX console
- Start/stop the communicator in JBoss from the JMX console
- Restart JBoss and all dependent servers after application deployments on JBoss servers
- Log into the JMX console to verify that JBoss is running properly after a JBoss restart
- Used GroundWork commands, service and host templates, and profiles to create server alerts (i.e., added GroundWork monitoring services to hosts)
- Deploy new builds of application code to servers (JBoss & ColdFusion)
- Review Event Viewer application logs and send runtime errors to developers
- Configured F5 BIG-IP to use different load-balancing methods for pool members, such as round robin and least connections
- Configured F5 BIG-IP with different health monitors for pool members, such as ICMP and ColdFusion
- Take servers out of the F5 load balancer pool, restart services on them and put them back into the F5 pool
- Run scripts to add text file content to databases
- Write scripts to automate routine tasks
- Troubleshoot issues as they arise
- Move databases to different servers (update data sources for ColdFusion sites)
- Move websites to different servers (update directory mappings for ColdFusion sites)
- Update ColdFusion mappings and data sources in ColdFusion Administrator
- Review log files while troubleshooting applications
- Study cache server hit/miss reports
- Monitor alerts in GroundWork, Nagios, Webmetrics, etc.
- Initiating and leading crisis calls
- Initiating and leading QACC and PCC calls
- Responding to DoS and DDoS attacks
- Create virtual directories in IIS
- Stop JBoss via the command line, download the code build (EAR file or RPM) via FTP, deploy the code to JBoss via the command line, start JBoss via the command line, reset JBoss session pools via the JMX console and test site URLs
- Start and stop JBoss using its built-in run and shutdown commands
- Monitor heap memory (old gen) via the JBoss JMX console and resolve heap memory problems
- Check garbage collection on JBoss via JBoss server logs
- Look at the JBoss PostOffice for information on the server cluster
- Look at the JBoss ServerPeer for information on message count & queue depth
- Add users to groups in Active Directory
- Perform memory dumps using jmap after receiving memory error alerts
- Perform thread dumps via ServerInfo in the JMX console after receiving too-many-active-threads alerts
- Use the svn checkout (co) command to download development repositories
- Nightly server reboots
- Run macros to generate weekly reports
- Provide monthly SLA reports to clients
- Oracle SQL*Plus
- Work with vendors such as Pythian
- Used Oracle Warehouse Builder 11g Design Center to perform ETL staging runs
- Used PL/SQL developer to generate competitive intelligence reports for clients
- Added servers and port numbers to Tunnelier tool for access within server farm
- Run queries on databases to see how fast queries are processing
- Look at the date & time of entries in the reservations application log file to verify that reservations are processing, and troubleshoot if there are problems
- Look at the Databridge application log file to view the date and time of entries, verify that it is processing and resolve any issues
- Create FTP accounts
- Look at JBoss server logs for errors such as too many open connections when receiving too-many-open-files alerts
- Research why services stopped running (checking Event Viewer application logs, Google research, etc.)
- Start and stop JMS bridge
- Troubleshoot messages not sending from JMSBridge server
- After restarting JMSBridge, check for exceptions in JMS STDout.log
- Verify that reservation messages are processing in JMS STDout.log
- Verify via ServerPeer in the JMX console that JMS queues have initialized after restarting JBoss and before restarting JMSBridge
- Generate Monthly Reports using Oracle Reports and saving output as HTML
- Write and run scripts that replicate Oracle updates across multiple clusters
- Work with QA team to resolve issues after a major deployment
- Use Terminal Server Manager to logoff inactive users
- Compress log files on servers using the zip and gzip commands
- Give new users/employees administrator rights on Windows/Linux servers & sudo rights on Linux servers
- Verifying that Type A and Type B traffic is going out to and being received from Pegasus
- Creating macros for repetitive tasks
- Configuring & Troubleshooting GFI Faxmaker server software
- Change the open-files limit for JBoss
- Correct application deployment mistakes, for example by moving a backup folder containing application EAR and JAR files out of the deploy folder
- Use the free command to check memory on servers
- Use the top command to check load on servers
- Executing scheduled or unscheduled tasks
- Add servers to mRemote
Confidential
Freelance Web Developer
Responsibilities:
- Systems engineering (installed operating systems, set up web servers, application servers, etc.)
- Internet marketing (helped clients with SEO, Google AdWords, affiliate marketing, etc.)
- Created flowcharts and pseudocode for applications
- Gathered client requirements and took applications from conception to creation
- Installed SSL certs
- Installed sites on web servers with Confidential
- Installed and set up sites on WordPress, Joomla and Drupal
- Installed Fantastico Confidential
Confidential, Atlanta, GA
Operations Engineer
Responsibilities:
- Troubleshot & tested the payplus & cashplus banking applications
- Monitored availability of different servers
- Monitored SolarWinds for server and router failure alerts
- Maintained an environment of 3 clusters, 11 ESX host servers and 411 virtual machines
- Used Cisco VPN client to log onto Confidential VPN
- Used resource pools to manage resources of virtual machines
- Used clusters to manage the resources of the hosts
- Set up SNMP traps for troubleshooting
- Worked with SQL server 2000 and SQL server 2005
- Troubleshoot remote fax servers
- Added virtual machines to hosts
- Added virtual machine hosts to clusters
- Monitored Foglight for errors on the bank servers (Windows 2003)
- Started and stopped Foglight agents on the Foglight console
- Started the Foglight service on the Windows bank servers
- Monitor CPU utilization, find and resolve causes of high CPU usage
- Cleared resolved issues from the Foglight alerts monitor through the Foglight web console
- Took temperature readings of the many RTU zones in the data center
- Performed scheduled server checks, such as disk space and the last antivirus software update
- Monitored/Killed server processes
- Restarted services on servers
- Start/Stop applications and components
- Troubleshoot wirehouse application
- Performed daily checks of certain services to make sure they are running
- Performed daily operational checks
- Verify telnet connectivity to production and test servers
- Verify that scheduled backups completed
- Performed tape backup duties using the Legato storage solution
- Verify all bank servers were backed up to off site backup location
- Performed nightly scheduled server reboots
- Performed Windows server 2003 and application troubleshooting
- Verify that certain files were loaded on the server
- Processed Change Control requests
- Attended CAB (change approval board) meetings
- Attended Tech Review meetings
- Performed data center walk-throughs to make sure all server cabinets were closed and wires were neat, checked for amber lights on machines and checked for leaks from the ceiling
- Used crash carts to connect to malfunctioning servers where a remote connection wasn't possible
- Installed guest operating systems on virtual machines
- Supported cash management, payment, ACH, cashplus and payplus applications
- Unlocking accounts and resetting passwords in Active Directory on the primary domain controller(PDC) for terminal server users
- Take escalations from level 1 help desk
- Train, mentor level 1 help desk
- Worked closely with Network Engineers, Database Administrators and Development team to solve complex system and network issues
- Add servers to mRemote so that servers could be accessed remotely
- Checked and updated data center inventory(servers, routers) and added to database
- Open, update, track and close tickets in Onyx
- Labeled servers
- Document shift turnover logs
Confidential, Atlanta, GA
Sr. Application Support Analyst
Responsibilities:
- Assisted customers with MySQL and phpMyAdmin
- Managed DNS and zone files
- Went through server logs to resolve issues with disk quota
- Assisted customers with POP and SMTP email
- Assisted customers with websites, HTML and FrontPage
- Administered web hosting servers, both Windows and Linux
- Modified IIS and Apache web servers
- Wrote Bourne shell scripts for system administrators