Sr. Big Data Infrastructure Engineer Resume

Washington, DC


  • 7+ years of intensive experience in development environments within cross-platform systems
  • Extensive experience implementing and maintaining open-source distributed system, Big Data and search engine solutions
  • Diverse experience includes developing reporting and procedural standards
  • Experienced working with HL7 and NHIN in the healthcare industry
  • Diagnosing and resolving complex technical problems in cross-platform systems
  • Engineering and management as well as performance monitoring and capacity planning on distributed systems
  • Extensive knowledge of the Software Development Life Cycle (SDLC)
  • Strong technical knowledge of RDBMS, NoSQL, and object-oriented system design and implementation


Languages: Bash/Shell Scripting, SQL, Java, JSON, CSS, XML/XSLT, XPath, XQuery

Platform: UNIX, Linux, Windows, VMware, VirtualBox, Hyper-V

Database & Data Warehousing Tools: MS SQL, MySQL, HBase, Postgres, Redshift

Distributed System: Apache Hadoop Ecosystem, Hadoop HDFS, YARN, HBase, ZooKeeper, Pig, Hive, Sqoop, Flume, Spark, Falcon, Ambari, Apache Solr/Lucene (SolrCloud), Splunk

IT infrastructure frameworks & monitoring: Cloudera Manager, Cloud AWS infrastructure (EMR, EC2, S3), Lucidworks Fusion, Hortonworks HDP & HDF, Cloudera Distribution for Hadoop (CDH4, CDH3), Nagios, Kibana/Banana, Ambari

Web & Application Servers: Apache Web Server, Jetty, Tomcat, JBoss


Analysis/Design Methodologies: Scrum Agile, UML, REST, J2EE

Others: MS Visio, Subversion, Selenium, Puppet, Kerberos, Knox, Ranger


Confidential, Washington DC

Sr. Big Data Infrastructure Engineer


  • Designed, installed and configured HDP 2.6.3 clusters including Hadoop, YARN, HDFS, ZooKeeper, Solr, Oozie, Flume, Hive, HBase, Kafka and Spark
  • Set up and configured Ambari 2.6 for provisioning and monitoring
  • Set up an external SolrCloud 7.1 cluster on Ambari and configured backup and recovery
  • Set up firewalld on Red Hat 7
  • Set up Nagios for monitoring 27 Hadoop clusters with over 217 hosts, both virtual machine and bare metal Red Hat Linux 6 and 7 servers
  • Set up and configured Puppet for automation
  • Conducted knowledge transfer with team members

Confidential, VA

Sr. Software Engineer


  • Designed, provisioned, maintained and upgraded the search application Lucidworks Fusion (SolrCloud) in AWS GovCloud on Red Hat Linux servers
  • Set up and configured the monitoring tools New Relic and Zabbix
  • Provided data management and indexing solutions for governmental agencies
  • Leveraged daily Scrum/Agile meetings to communicate and clarify issues and solutions
  • Took the lead on migration of the application from GovCloud to a new AWS environment
  • Designed, installed and configured an HDP Hadoop cluster in AWS on Linux hosts
  • Leveraged Splunk for log aggregation and stack troubleshooting

Confidential, NYC

Big Data & Solr Engineer


  • Designed and installed a 20-node HDP 2.2 cluster in a Red Hat Linux environment
  • Set up and configured HDP cluster high availability
  • Set up Ambari 2.1 for provisioning, managing and monitoring the Hadoop cluster
  • Secured Hadoop clusters with Kerberos, LDAP, Ranger and Knox
  • Set up and auto-configured the Hadoop cluster in a Puppet environment
  • Installed, configured, and administered the HDP stack, including the services HBase, Solr, Hive, YARN, Oozie, Flume, Kafka, Spark and HDFS
  • Set up and configured Nagios and Ambari to provide system health checks, metrics collection and an alert framework
  • Set up a backup system and snapshot data replication for HDFS using Falcon
  • Set up and configured a SolrCloud cluster with an ensemble ZooKeeper cluster and Kibana/Banana data visualization

Confidential, Hamilton, MD

Software Search Engineer


  • Designed and initiated a prototype for “FDA-Drug-Labels Indexing Solutions”
  • Performed requirement gathering and data analysis
  • Set up and configured Solr search engines in “Standalone” and “SolrCloud” modes
  • Set up ETL tools, parsers, and transformers (DIH, Tika, XPath) to extract, transform, and load data to Solr nodes
  • Defined customized Solr Schema for indexing different data formats (XML, HTML, PDF, JPEG, metadata…)
  • Investigated the possibility of integrating software, frameworks, and libraries to enhance the performance of the system, including “OpenNLP” (machine-learning-based toolkit), “Logstash” (loading log files to Solr), and “Kibana/Banana” (data visualization)
  • Set up and configured the data visualization framework “Kibana/Banana” to visualize time-series and non-time-series data indexed in Solr nodes
  • Reconfigured and customized search engine and performed functional testing (both indexing and searching) for quality assurance

Confidential, VA

Solr Search Engineer


  • Set up and configured a SolrCloud search solution running on JBoss, comprising 3 clusters, 9 shards and 9 replicas on Linux servers, to process TBs of data and provide near-real-time search
  • Configured a ZooKeeper quorum running as central configuration for the Solr clusters
  • Installed and configured JBoss clusters
  • Performed testing and debugging of different search components for high availability and performance enhancement
  • Provided comprehensive documentation

Confidential, Washington DC

Healthcare Business Analyst & Application Engineer


  • Set up and installed a Solr/Lucene search solution under a Tomcat-JBoss environment on Azure Cloud
  • Provided replication configuration for Solr/Lucene indexes
  • Defined data source connectivity from Solr to MS SQL & S3 on AWS
  • Set up and installed SSL connectivity between Solr nodes
  • Installed and configured SolrCloud cluster on Unix servers
  • Performed requirements analysis
  • Performed daily administration and management of multiple Linux servers
  • Installed and configured Hadoop nodes/Cluster on Amazon Web Services
  • Automated start-up script for pulling data from MySQL and ingesting into Hadoop Distributed File System (HDFS)
  • Wrote shell scripts for day-to-day log-rolling processes
  • Imported and exported data to and from HDFS and Hive using Sqoop
  • Manipulated and managed data with Hive and Pig
  • Managed the day-to-day operations of the cluster for backup and support
  • Performed performance monitoring and capacity planning
  • Conducted reliability testing to reveal potential problems arising from extended runs
  • Conducted Solr performance testing to determine the response time of a given request
  • Met expectations and general performance standards as set forth by the company
  • Oversaw and directed the quality assurance review of monthly activity reports, including validation of results