Sr. Big Data Infrastructure Engineer Resume
Washington, DC
SUMMARY
- 7+ years of intensive experience in development environments within cross-platform systems
- Extensive experience implementing and maintaining open-source distributed systems, Big Data, and search engine solutions
- Diverse experience includes developing reporting and procedural standards
- Experienced working with HL7 and NHIN in the healthcare industry
- Diagnosing and resolving complex technical problems in cross-platform systems
- Engineering and management, as well as performance monitoring and capacity planning, on distributed systems
- Extensive knowledge of the Software Development Life Cycle (SDLC)
- Strong technical knowledge of RDBMS, NoSQL, and object-oriented system design and implementation
TECHNICAL SKILLS
Languages: Bash/Shell Scripting, SQL, Java, JSON, CSS, XML/XSLT, XPath, XQuery
Platforms: UNIX, Linux, Windows, VMware, VirtualBox, Hyper-V
Database & Data Warehousing Tools: MS SQL, MySQL, HBase, Postgres, Redshift
Distributed Systems: Apache Hadoop ecosystem (HDFS, YARN, HBase, ZooKeeper, Pig, Hive, Sqoop, Flume, Spark, Falcon, Ambari), Apache Solr/Lucene (SolrCloud), Splunk
IT Infrastructure Frameworks & Monitoring: Cloudera Manager, AWS cloud infrastructure (EMR, EC2, S3), Lucidworks Fusion, Hortonworks HDP & HDF, Cloudera Distribution for Hadoop (CDH4, CDH3), Nagios, Kibana/Banana, Ambari
Web & Application Servers: Apache Web Server, Jetty, Tomcat, JBoss
Networking: TCP/IP, DNS, LAN/WAN, NAT, LDAP/AD
Analysis/Design Methodologies: Scrum Agile, UML, REST, J2EE
Others: MS Visio, Subversion, Selenium, Puppet, Kerberos, Knox, Ranger
PROFESSIONAL EXPERIENCE
Confidential, Washington DC
Sr. Big Data Infrastructure Engineer
Responsibilities:
- Designed, installed, and configured HDP 2.6.3 clusters, including Hadoop YARN, HDFS, ZooKeeper, Solr, Oozie, Flume, Hive, HBase, Kafka, and Spark
- Set up and configured Ambari 2.6 for provisioning and monitoring
- Set up an external SolrCloud 7.1 cluster alongside Ambari and configured backup and recovery
- Set up firewalld on Red Hat 7
- Set up Nagios to monitor 27 Hadoop clusters with over 217 hosts across virtual-machine and bare-metal Red Hat Linux 6 and 7 servers
- Set up and configured Puppet for automation
- Conducted knowledge transfer with team members
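The firewalld work above typically comes down to opening the cluster's service ports. A minimal sketch for RHEL 7, assuming the stock HDP default ports (the actual port layout would depend on the cluster):

```shell
# Open common HDP service ports with firewalld on RHEL 7.
# Ports shown are stock defaults (an assumption); adjust to the real cluster layout.
firewall-cmd --permanent --add-port=8020/tcp   # HDFS NameNode RPC
firewall-cmd --permanent --add-port=50070/tcp  # HDFS NameNode web UI (Hadoop 2.x)
firewall-cmd --permanent --add-port=8088/tcp   # YARN ResourceManager web UI
firewall-cmd --permanent --add-port=2181/tcp   # ZooKeeper client port
firewall-cmd --reload                          # apply the permanent rules
```

`--permanent` writes the rule to disk without applying it, which is why the final `--reload` is needed.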
Confidential, VA
Sr. Software Engineer
Responsibilities:
- Designed, provisioned, maintained, and upgraded the Lucidworks Fusion (SolrCloud) search application in AWS GovCloud on Red Hat Linux servers
- Set up and configured monitoring tools New Relic and Zabbix
- Provided data management and indexing solutions for governmental agencies
- Leveraged daily Scrum/Agile meetings to communicate and clarify issues and solutions
- Took the lead on migration of the application from GovCloud to a new AWS environment
- Designed, installed and configured HDP-Hadoop cluster in AWS on Linux hosts
- Leveraged Splunk for log aggregation and stack troubleshooting
Confidential, NYC
Big Data & Solr Engineer
Responsibilities:
- Designed and installed an HDP 2.2 cluster of 20 nodes in a Red Hat Linux environment
- Set up and configured HDP cluster high availability
- Set up Ambari 2.1 for provisioning, managing, and monitoring the Hadoop cluster
- Secured Hadoop clusters with Kerberos, LDAP, Ranger, and Knox
- Set up and auto-configured the Hadoop cluster in a Puppet environment
- Installed, configured, and administered the HDP stack, including the HBase, Solr, Hive, YARN, Oozie, Flume, Kafka, Spark, and HDFS services
- Set up and configured Nagios and Ambari for system health checks, metrics collection, and the alert framework
- Set up a backup system and HDFS snapshot data replication using Falcon
- Set up and configured a SolrCloud cluster with an ensemble ZooKeeper cluster and Kibana/Banana data visualization
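Standing up SolrCloud against an external ZooKeeper ensemble, as described above, can be sketched as follows; the hostnames, chroot path, and collection layout are illustrative assumptions:

```shell
# Point Solr at a 3-node ZooKeeper ensemble (hostnames are placeholders).
ZK="zk1:2181,zk2:2181,zk3:2181/solr"

# Start a Solr node in cloud mode, registering with the ensemble.
bin/solr start -cloud -z "$ZK" -p 8983

# Create a collection spread across the cluster (names/counts illustrative).
bin/solr create -c logs -shards 3 -replicationFactor 2
```

The `/solr` chroot keeps Solr's znodes separate from anything else stored in the same ZooKeeper ensemble.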
Confidential, Hamilton, MD
Software Search Engineer
Responsibilities:
- Designed and initiated a prototype for “FDA Drug Labels Indexing Solutions”
- Performed requirement gathering and data analysis
- Set up and configured Solr search engines in “Standalone” and “SolrCloud” modes
- Set up ETL tools, parsers, and transformers (DIH, Tika, XPath) to extract, transform, and load data into Solr nodes
- Defined customized Solr Schema for indexing different data formats (XML, HTML, PDF, JPEG, metadata…)
- Investigated integrating software, frameworks, and libraries to enhance system performance, including “OpenNLP” (machine-learning-based toolkit), “Logstash” (loading log files into Solr), and “Kibana/Banana” (data visualization)
- Set up and configured data visualization framework“Kibana/Banana”to visualize time-series and non-time-series data, indexed in Solr nodes
- Reconfigured and customized search engine and performed functional testing (both indexing and searching) for quality assurance
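Indexing mixed formats (PDF and similar) through Tika, as above, usually goes through Solr's extracting request handler. A minimal sketch, with the core name and file as placeholder assumptions:

```shell
# Push a PDF through Solr's extracting request handler (Tika).
# Core name "labels" and the file name are illustrative, not the original setup.
curl "http://localhost:8983/solr/labels/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@drug-label.pdf"
```

`literal.id` supplies the document's unique key, and `commit=true` makes the new document searchable immediately.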
Confidential, VA
Solr Search Engineer
Responsibilities:
- Set up and configured a SolrCloud search solution running on JBoss, comprising 3 clusters, 9 shards, and 9 replicas on Linux servers, to process TBs of data and provide near-real-time search
- Configured quorum ZooKeeper running as a central configuration for Solr clusters
- Installed and configured JBoss clusters
- Performed testing and debugging of different search components for high availability and performance enhancement
- Provided comprehensive documentation
Confidential, Washington DC
Healthcare Business Analyst & Application Engineer
Responsibilities:
- Set up and installed a Solr/Lucene search solution in a Tomcat/JBoss environment on Azure Cloud
- Provided replication configuration for Solr/Lucene indexes
- Defined data source connectivity from Solr to MS SQL & S3 on AWS
- Setup & installed SSL connectivity between Solr nodes
- Installed and configured SolrCloud cluster on Unix servers
- Performed requirements analysis
- Daily administration and management of multiple Linux servers
- Installed and configured Hadoop nodes/Cluster on Amazon Web Services
- Automated start-up script for pulling data from MySQL and ingesting into Hadoop Distributed File System (HDFS)
- Wrote shell scripts for day-to-day log-rolling processes
- Imported and exported data into HDFS and Hive using Sqoop
- Manipulated and managed data with Hive and Pig
- Managed the day-to-day operations of the cluster for backup and support
- Performed performance monitoring and capacity planning
- Performed reliability testing to reveal potential problems arising from extended runs
- Performed Solr performance testing to determine the response time of a given request
- Met expectations and general performance standards as set forth by the company
- Oversaw and directed the quality assurance review of monthly activity reports, including validation of results
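The log-rolling shell scripts mentioned above can be sketched roughly as follows; the function name, paths, and retention period are illustrative assumptions, not the original script:

```shell
#!/bin/sh
# Hypothetical log-rolling helper (names and retention window are assumptions).
rotate_log() {
  log="$1"
  keep_days="${2:-7}"
  [ -f "$log" ] || return 0                  # nothing to rotate
  stamp=$(date +%Y%m%d%H%M%S)
  mv "$log" "${log}.${stamp}"                # move the live log aside
  gzip "${log}.${stamp}"                     # compress the rotated copy
  : > "$log"                                 # start a fresh, empty log
  # prune compressed rotations older than the retention window
  find "$(dirname "$log")" -name "$(basename "$log").*.gz" \
    -mtime +"$keep_days" -delete
}
```

Run from cron once a day, this keeps the live log small and caps disk usage with the `find -mtime +N -delete` pruning step.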
