Big Data/Hadoop Engineer Resume
Raleigh, NC
SUMMARY
- Experienced software developer with 6+ years of IT experience in the development, system design, enhancement, maintenance, support, re-engineering, and debugging of mission-critical, complex applications.
- Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, Hadoop, StreamSets, MySQL, Cloudera (CDH 5.4), UNIX shell scripting, Pig scripting, Hive, Flume, ZooKeeper, Sqoop, Oozie, Python, Spark, Git, and a variety of RDBMSs in UNIX and Windows environments, using Agile methodology.
- 2+ years of experience in Hadoop and Big Data technologies as a developer and administrator.
- Hands on experience with Cloudera and Apache Hadoop distributions.
- Followed test-driven development within Agile methodology to produce high-quality software.
- Worked with Data ingestion, querying, processing and analysis of big data.
- Experience includes Requirements Gathering, Design, Development, Integration, Documentation, Testing and Build.
- Experienced in using Ansible scripts to deploy Cloudera CDH 5.4.1 to set up the Hadoop cluster.
- Performed dry runs of Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
- Good knowledge of Tableau.
- Experience using benchmarking tools such as TeraSort, TestDFSIO, and HiBench.
- Set up I/O- and memory-bound benchmark tests and compiled the results.
- Experience in loading data from the Linux file system to HDFS using Sqoop.
- Experienced in working with the OpenStack platform and its components, such as Compute, Orchestration, and Swift.
- Worked with services provided by OpenStack such as Neutron, Nova, and Cinder.
- Experience in Web Content Management.
- Knowledge of web technologies such as HTML, CSS, WordPress, and Joomla.
- Ability to solve problems using advanced Excel skills such as pivot tables, regression, and VBA.
- Ability to work effectively in cross-functional team environments.
- Ability to learn new technologies and to deliver outputs in short deadlines.
- Research-oriented, motivated, proactive, self-starter with strong technical, analytical and interpersonal skills.
- Hands-on experience with the Spark architecture and its components, such as Spark SQL and DataFrames.
- Replaced existing MapReduce jobs with Spark Streaming and Spark data transformations for more efficient data processing.
- Hands-on experience with real-time streaming from Kafka through Spark into HDFS.
- Ability to spin up OpenStack instances using Heat orchestration (CloudFormation-style) templates.
- Experience with Cloud Infrastructure like Cisco Cloud Services.
- Hands-on experience with NoSQL databases such as HBase for analytical operations.
- Hands-on experience writing queries, stored procedures, functions, and triggers.
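The Kafka-and-Spark streaming work mentioned above boils down to windowed micro-batch aggregation. A minimal pure-Python sketch of that logic (the Kafka consumer and HDFS sink are stubbed out; all names and event shapes are hypothetical, not from the actual project):

```python
from collections import defaultdict

def micro_batch_counts(events, window_sec=10):
    """Group (timestamp, key) events into tumbling windows and count keys.

    Mirrors the windowed-count pattern of Spark Streaming; in the real
    pipeline events would arrive from a Kafka topic and the per-window
    counts would be written to HDFS.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = ts - (ts % window_sec)  # tumbling-window bucket
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(1, "login"), (3, "click"), (12, "click"), (14, "click")]
print(micro_batch_counts(events))
# {0: {'login': 1, 'click': 1}, 10: {'click': 2}}
```

In a production job the same grouping is expressed declaratively (e.g., a window on the event timestamp) rather than by hand, but the bucketing arithmetic is identical.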
TECHNICAL SKILLS
Hadoop Ecosystem/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Impala, ZooKeeper, Kafka, Spark SQL, Spark Streaming, and Cloudera Manager
Cloud Infrastructure: Cisco Cloud Services
Programming Languages: SQL, C, C++, Core Java, Scala, Shell, Pig Latin, Hive-QL
Databases: Oracle, Teradata, MySQL, SQL Server, HBase, Cassandra
Web Technologies: HTML, DHTML, CSS, XML, XSLT, JavaScript
Version Control: SVN, CVS and Git
IDEs & Command-Line Tools: Eclipse, NetBeans, WinSCP
PROFESSIONAL EXPERIENCE
Big Data/Hadoop Engineer
Confidential, Raleigh, NC
Responsibilities:
- Created StreamSets pipelines for collecting logs, alerts, and metrics from customer Vpods.
- Created automated Python scripts to validate the data flow through Elasticsearch.
- Worked with InfluxDB to store the metrics data collected from each customer Vpod.
- Configured StreamSets to attach the VpodID to each record flowing through and to create corresponding topics in Kafka.
- Set up projects/tenants with Keystone user roles.
- Created networks, routers, and subnets.
- Created instances in OpenStack to set up the environment.
- Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Active member of the team developing a POC for streaming data using Apache Kafka and Spark Streaming.
- Troubleshot Nova and Glance issues in OpenStack, as well as the Kafka and RabbitMQ buses.
- Performance-tested the environment by creating Python scripts to generate I/O and CPU load.
- Experience with the OpenStack cloud platform.
- Good understanding of Spark.
- Experienced in provisioning hosts with flavors: GP (general purpose), SO (storage optimized), MO (memory optimized), and CO (compute optimized).
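The StreamSets step of tagging each record with its VpodID and routing it to a per-Vpod Kafka topic can be sketched in plain Python; record shapes and the topic-naming scheme below are illustrative assumptions, not the project's actual configuration:

```python
def tag_and_route(records, vpod_id):
    """Attach the VpodID to each metric record and derive the Kafka topic
    it should be routed to. Topic naming ("metrics-<vpod_id>") is a
    hypothetical convention used only for illustration.
    """
    topic = f"metrics-{vpod_id}"
    tagged = [{**rec, "vpod_id": vpod_id} for rec in records]
    return topic, tagged

topic, tagged = tag_and_route([{"cpu": 0.42}, {"cpu": 0.91}], "vpod-17")
print(topic)      # metrics-vpod-17
print(tagged[0])  # {'cpu': 0.42, 'vpod_id': 'vpod-17'}
```

In StreamSets itself this is an expression-evaluator stage plus a Kafka producer destination; the helper just makes the enrichment-and-routing logic explicit.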
Big Data/Hadoop Engineer
Confidential, Raleigh, NC
Responsibilities:
- Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time.
- Designed and implemented data imports to HDFS using Sqoop from different RDBMS servers.
- Participated in requirements gathering and documented the business requirements.
- Experienced in working with the OpenStack platform and its components, such as Horizon, Keystone, and Heat.
- Experienced in working with Cisco Cloud Services, which is built on top of OpenStack's Havana and Icehouse releases.
- Experienced in using Ansible scripts to deploy Cloudera CDH 5.4.1 to set up the Hadoop cluster.
- Experienced in working with cloud computing services and networking between different tenants.
- Responsible for performance benchmarking tests for Hadoop and analyzing the results against bare-metal servers.
- Installed and worked with Hive, Pig, and Sqoop on the Hadoop cluster.
- Developed Hive queries to analyze the data imported to HDFS.
- Worked with Sqoop commands to import the data from different databases.
- Experience with the OpenStack cloud platform.
- Performed dry runs of Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
- Experience using benchmarking tools such as TeraSort, TestDFSIO, and HiBench.
- Set up I/O- and memory-bound benchmark tests and compiled the results.
Environment: OpenStack, Hadoop, Hive, Pig, MapReduce, Python, Ansible, RHEL 7, Cloudera, Sqoop, Oozie, Impala.
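The Hive partitioning and bucketing optimizations described above come down to table DDL of the following shape. A small Python helper that emits the HiveQL, with the table and column names as hypothetical placeholders (the PARTITIONED BY / CLUSTERED BY clauses are standard Hive DDL):

```python
def partitioned_bucketed_ddl(table, columns, partition_col, bucket_col, buckets):
    """Emit HiveQL for a partitioned, bucketed table.

    `columns` is a list of (name, hive_type) pairs; the partition column
    is modeled as STRING for simplicity. All identifiers are placeholders.
    """
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"PARTITIONED BY ({partition_col} STRING)\n"
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS\n"
        f"STORED AS ORC;"
    )

print(partitioned_bucketed_ddl(
    "web_events", [("user_id", "BIGINT"), ("url", "STRING")],
    "event_date", "user_id", 32))
```

Partitioning prunes whole directories at query time, while bucketing on the join key is what enables the map-side (bucket map) joins mentioned above.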
Software Engineer
Confidential
Responsibilities:
- Loaded data using Sqoop from RDBMS servers such as Teradata and Netezza into the Hadoop HDFS cluster.
- Performed daily Sqoop incremental imports scheduled with Oozie.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive queries using map-side joins, dynamic partitioning, and bucketing.
- Responsible for executing Hive queries using the Hive command line, the HUE web GUI, and Impala to read, write, and query data in HBase.
- Created Hive generic UDFs and UDAFs to analyze the partitioned and bucketed data, compute various metrics for dashboard reporting, and store them in summary tables.
- Used Pig as an ETL tool for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Monitored and managed the Hadoop cluster using the Cloudera Manager web interface.
- Coordinated the Pig and Hive scripts using Oozie workflow.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data, including Avro, sequence, and XML files.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Shell Scripting, Oozie, Pig, Hive, Impala, HBase, Spark, Linux, Java, Eclipse.
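A daily Sqoop incremental import like the one scheduled above reduces to a command line of roughly this shape. The helper below builds it in Python; the JDBC URL, table, and paths are hypothetical placeholders, while the flags themselves are standard Sqoop options:

```python
def sqoop_incremental_import(jdbc_url, table, check_column, last_value,
                             target_dir, mappers=4):
    """Build a Sqoop incremental-append import command as an argv list.

    `check_column` / `last_value` tell Sqoop to import only rows newer
    than the last run; an Oozie coordinator would supply last_value daily.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
        "-m", str(mappers),  # number of parallel map tasks
    ]

cmd = sqoop_incremental_import(
    "jdbc:mysql://db-host/sales", "orders", "order_id", 1000, "/data/orders")
print(" ".join(cmd))
```

Building the argv as a list (rather than one shell string) keeps it safe to pass to a process runner and easy for a workflow engine to template.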
JAVA Developer/.NET Developer
Confidential
Responsibilities:
- Developed an e-commerce website using ASP.NET.
- Used Oracle as the database and Toad for query execution; involved in writing SQL scripts and PL/SQL code for procedures and functions.
- Used CVS and Perforce as configuration management tools for code versioning.
- Developed applications using Eclipse and used Maven as build and deploy tool.
- Used Log4J to print the logging, debugging, warning, info on the server console.
- Involved in writing Spring configuration XML files containing bean declarations and declarations of other dependent objects.
- Used the Tomcat web server for development purposes.
- Involved in creation of Test Cases for Unit Testing.
- Developed Java Server Pages for the Dynamic front end content that use Servlets and EJBs.
- Developed HTML pages with CSS for static content and JavaScript for validations.
Environment: Visual Studio, MySQL, Windows.
JAVA/C++ Developer
Confidential
Responsibilities:
- Utilized strong C++ and Java programming and communication skills in a team environment.
- Developed a Type 3 application, a disaster recovery module, and console commands.
- Debugged and fixed bugs for the MMC project using GDB and stack traces.
- Solved memory allocation problems caused by big-endian and little-endian memory conversions.
- Contributed to support and maintenance of software applications built with the GCC and ICC compilers.
- Worked with CMS and CCMS, proprietary ALU configuration management tools, and used FlexeLint for lint checking.
- Followed Agile methodology, interacted directly with the client on features, and implemented optimal solutions and applications tailored to customer needs.
- Involved in design and implementation of web tier using Servlets and JSP.
Environment: Java, J2EE, JSP, Spring, Struts, Hibernate, Eclipse, SOA, WebLogic, Oracle, HTML, CSS, Web Services, JUnit, SVN, Windows, UNIX
JAVA Developer
Confidential
Responsibilities:
- Involved in the software development life cycle (SDLC) and created UML diagrams, including use case, class, and sequence diagrams, to represent the detailed design phase.
- Extensively developed web pages using JSP, HTML5, JavaScript, and CSS on the front end.
- Developed business-layer components using Spring and Hibernate, and the GUI using JSP.
- Developed web components using JSP, Servlets and JDBC.
- Worked on authentication modules to provide controlled access to users on various modules
- Developed stored procedures and triggers using PL/SQL to calculate and update tables implementing business logic.
- Developed complex SQL queries, including stored procedures and triggers, for efficient data retrieval.
- Designed and Implemented tables and indexes using SQL Server.
Environment: Eclipse, Java/J2EE, JSP, Oracle, HTML, CSS, Servlets, Struts, PL/SQL, XML, SQL.