
Lead Big Data Engineer Resume


Charlotte, NC

SUMMARY

  • Ten-plus (10+) years' experience in software development, building scalable, high-performance Big Data applications, with specialization in the Apache Hadoop stack, NoSQL databases, and distributed computing, along with analytical experience in data validation, data cleansing/updating, testing, and business-driven data analysis.
  • Expertise across all phases of the SDLC, viz. requirements gathering, system design, development, enhancement, maintenance, testing, deployment, production support, and documentation.
  • Experience with Apache Hadoop technologies, viz. Hadoop Distributed File System (HDFS), MapReduce, YARN, Pig, Hive, Impala, HCatalog, Sqoop, Spark, Kafka, NIFI, Storm, Spark SQL, Spark Streaming, and Hadoop Streaming.
  • Certified Hortonworks Hadoop Developer.
  • In-depth understanding and usage of the Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, Hive, HDFS, HBase, Sqoop, Pig, Flume, Oozie, ZooKeeper, Kerberos, Ambari, Cloudera, and Cassandra.
  • Working experience in creating complex data ingestion pipelines, data transformations, data management and data governance in a centralized enterprise data hub.
  • Experience in developing custom UDFs in Java/Python to extend Hive and Pig Latin functionality (see the Hive UDF sketch after this list).
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Experience working on NoSQL databases including HBase, Cassandra, MongoDB
  • Experience in AWS environment to develop and deploy custom Big Data applications.
  • Experience importing and exporting data with Sqoop between HDFS and relational databases.
  • Experience using Spark Streaming to ingest data from multiple data sources into HDFS.
  • Hands-on experience with Spark/Scala programming and good knowledge of the Spark architecture and its in-memory processing.
  • Good experience using Apache Spark, Storm, and Kafka.
  • Experience in successfully leading and managing teams.
  • Experience in Data Scrubbing/Cleansing, Data Quality, Data Mapping, Data Profiling, Data Validation.
  • Experience using the NumPy and pandas packages in Python, as well as R programming, for conventional, statistical, and graphical data analysis.
  • Experience with the D3.js JavaScript data visualization framework for creating chart views.
  • Experienced in Java application development and client/server and Internet/intranet applications using Core Java, J2EE patterns, RESTful web services, Oracle, SQL Server, and DB2.
  • Extensive experience in SQL, PL/SQL and T-SQL to create Packages, Stored Procedures, Functions, Triggers and Views to retrieve, manipulate and migrate complex data and implement database Normalization and Exception Handling.
  • Ability to adapt to evolving technology, strong sense of responsibility and accomplishment
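
A minimal sketch of the kind of custom Hive UDF referenced above, assuming a hypothetical MaskAccountNumber function with class and package names chosen purely for illustration; Hive resolves the evaluate method by reflection once the class extends org.apache.hadoop.hive.ql.exec.UDF.

```java
package com.example;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF that masks all but the last four characters of an account number.
// It would be registered in Hive with, for example:
//   ADD JAR mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'com.example.MaskAccountNumber';
public class MaskAccountNumber extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;                      // pass NULLs through unchanged
        }
        String value = input.toString();
        if (value.length() <= 4) {
            return new Text(value);           // too short to mask
        }
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < value.length() - 4; i++) {
            masked.append('*');
        }
        masked.append(value.substring(value.length() - 4));
        return new Text(masked.toString());
    }
}
```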

TECHNICAL SKILLS

  • HDFS
  • Map Reduce
  • Pig
  • Hive
  • Impala
  • Sqoop
  • Flume
  • NIFI
  • Kafka
  • Storm
  • ElasticSearch
  • Spark
  • Spark SQL
  • Zookeeper
  • Oozie Workflow
  • Cloudera Manager
  • Hortonworks
  • Pig Latin
  • HCatalog
  • Spark Streaming
  • Oracle 11g/10g/9i/8i
  • MySQL
  • DB2
  • Microsoft SQL Server
  • HBase
  • SQL
  • XML
  • Java
  • Scala
  • PL/SQL
  • HTML
  • JavaServer Pages (JSP)
  • Unix Shell
  • Python Scripts
  • Groovy
  • R
  • Eclipse
  • Tableau
  • Kibana
  • D3
  • Crystal Reports

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Lead Big Data Engineer

Responsibilities:

  • The primary goal of this project was to build a data pipeline, with several pieces of business logic, for FpML messages sent through message queues using the Java Spring Framework, covering different kinds of messages from option and non-option trades, pushing the results into an HBase database, and building XML based on the approved XSD for MiFID regulatory reporting (see the HBase write sketch after this list).
  • The project also used the existing ingestion framework to build Hive and NoSQL tables for the complete de-normalized data.
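
A minimal sketch, using the standard HBase client API, of how a parsed trade message might be written to HBase as described above; the table name (trade_messages), column family (cf), and row-key choice are assumptions for illustration, since the actual schema is not given in the resume.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TradeMessageWriter {

    // Writes one parsed FpML message into an assumed 'trade_messages' table.
    public static void writeTrade(String tradeId, String fpmlXml) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("trade_messages"))) {

            Put put = new Put(Bytes.toBytes(tradeId));       // trade id as row key (assumption)
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("fpml"), Bytes.toBytes(fpmlXml));
            table.put(put);
        }
    }
}
```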

Environment: Hadoop, MapReduce, Hive, HDFS, Sqoop, Solace, IBM MQ, Java, J2EE, Spring, Spark, MapR, HBase, Oracle, ETL, NoSQL, Unix/Linux, Autosys, JIL, udeploy, Jenkins

Confidential, Herndon, VA

Software Engineer

Responsibilities:

  • Responsible for developing a data pipeline using NIFI, Kafka, Storm, ElasticSearch, and Hive to extract data from log messages and other sources, store it in HDFS, and visualize it in Kibana.
  • Built NIFI applications to parse log messages using built-in processors or Groovy/Java scripts and ingest them into various Kafka topics.
  • Created Kafka topics to act as a buffer for millions of messages and route them to the Storm topology.
  • Developed Storm topologies to parse the messages using grok patterns, Java, etc., and enriched them into JSON documents with geolocation data (see the bolt sketch after this list).
  • Developed ElasticSearch mappings to route data from Storm into indices, and worked with Lucene queries to develop several visualizations in Kibana.
  • Managed 2 to 3 developers as project lead.
  • Responsible for developing a data pipeline using NIFI, Sqoop, HBase, and Hive to extract data from log messages and other sources, store it in HDFS, and create tables.
  • Retrieved data from web services and parsed the XML messages in NIFI using built-in processors and Java scripts.
  • Retrieved tables from an RDBMS into HDFS using Sqoop.
  • Developed several Hive queries to create, partition, and merge data from several sources and perform analytics.
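
A minimal sketch of a Storm bolt along the lines described above, assuming a hypothetical tuple with a single "line" field holding the raw log line; the real topology, grok patterns, and geolocation enrichment are not shown in the resume.

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Hypothetical bolt that splits a raw log line into timestamp and message fields.
public class LogParseBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String line = input.getStringByField("line");
        // Naive split on the first space; a real topology would apply grok patterns here.
        String[] parts = line.split(" ", 2);
        if (parts.length == 2) {
            collector.emit(input, new Values(parts[0], parts[1]));   // anchored emit
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("timestamp", "message"));
    }
}
```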

Environment: Hadoop, MapReduce, Hive, HDFS, Sqoop, NIFI, Kafka, Storm, Java, Groovy, ElasticSearch, Kibana, Lucene, ZooKeeper, Hortonworks, HBase, Oracle, ETL, NoSQL, Unix/Linux

Confidential, Dallas, TX

Big Data Developer

Responsibilities:

  • Responsible for developing a data pipeline using Flume, Kafka, Sqoop, Pig, and Hive to extract data from weblogs and other sources and store it in HDFS.
  • Expertise in data modeling and data warehouse design and development; used Hive partitioning and bucketing on the data by date on a daily, weekly, and monthly basis.
  • Explored Spark to improve the performance and optimization of the existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Acted as team lead for certain applications.
  • Streamed data in real time using Spark with Kafka (see the Spark Streaming sketch after this list).
  • Wrote the data from the Kafka consumer into HBase and Hive targets.
  • Loaded data into HBase using both bulk load and non-bulk load approaches.
  • Set up cron jobs to schedule and automate data workflows.
  • Experience in writing custom UDFs to extend Pig and Hive functionality.
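
A minimal sketch, assuming the spark-streaming-kafka-0-10 integration and a hypothetical "weblogs" topic and broker address, of the Spark-with-Kafka streaming mentioned above; the actual topics, parsing logic, and HBase/Hive sinks are not specified in the resume.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class WeblogStreamJob {

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("WeblogStreamJob");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");        // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-stream");                // assumed consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Arrays.asList("weblogs");        // hypothetical topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Downstream, each record's value would be parsed and written to HBase/Hive targets.
        stream.map(ConsumerRecord::value).print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```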

Environment: Hadoop, MapReduce, HiveQL, HDFS, PIG, Sqoop, Kafka, Storm, Spark, Java, Scala, Flume, ZooKeeper, Ambari, HBase, Oracle, ETL, NoSQL, Unix/Linux

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Installed Apache NIFI to activate a server-log simulator and transport data into HDFS.
  • Imported the data flow, generated server log data, and populated data files in HDFS.
  • Used Hive to create and format tables to build a relational view of data.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Involved in scheduling Oozie workflow to automatically update the firewall.
  • Experience in using ElasticSearch for high-level visualization of data.
  • Worked on D3 for front end data visualization and analytics.
  • Worked with application team to install/maintain Hadoop updates, patches, version upgrades as required.

Environment: Hadoop, MapReduce, NIFI, Elastic Search, D3, Pig, Hive, Impala, Java, HBase, Oozie, HDFS, Sqoop, Flume, CDH4, ETL, NoSQL, DB2, UNIX

Confidential, Philadelphia, PA

Hadoop Developer

Responsibilities:

  • Developed multiple MapReduce jobs to preprocess large amounts of customer behavioral data, obtaining business insight on TV and Internet end users (see the mapper sketch after this list).
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS, Impala, and Hive using Sqoop and Flume.
  • Developed Python scripts for Hadoop Streaming jobs to process XML, JSON, and CSV data.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
  • Worked on Talend ETL scripts to pull data from TSV files and an Oracle database into HDFS.
  • Prepared a Tez build from the source code and ran the Hive query jobs on the Tez execution engine rather than MapReduce jobs, for better performance.
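
A minimal sketch of the kind of MapReduce preprocessing job mentioned above, assuming a hypothetical tab-separated log format with the customer id in the first field; the real record layout and downstream aggregation are not given in the resume.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that emits (customerId, 1) for each behavioral event line.
public class BehaviorEventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text customerId = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length > 0 && !fields[0].isEmpty()) {
            customerId.set(fields[0]);          // assumed: customer id in the first column
            context.write(customerId, ONE);     // a reducer would sum events per customer
        }
    }
}
```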

Environment: Hadoop, Talend, MapReduce, HDFS, Hive, Impala, Pig, Spark, Java, Python, R, SQL, Sqoop, Flume, Oozie, Eclipse, NoSQL, Linux

Confidential, NY

Data Analyst

Responsibilities:

  • Participated in all phases including Analysis, Design, Coding, Testing and Documentation.
  • Reverse-engineered the existing stored procedures and wrote mapping documents for them.
  • Designed/developed PL/SQL, Python, R, and shell scripts for data import/export, data cleansing, and data analysis.
  • Created mappings, mapplets, and workflows using Informatica to replace the existing stored procedures.
  • Pre-populated the static tables in the data warehouse using PL/SQL procedures and SQL*Loader.
  • Designed the procedures for moving data from all source systems into the data warehousing system.
  • Extensively used ETL to load data from flat files (Excel/Access) into the Oracle database.
  • Extensively worked on documentation of the data model, mapping transformations, and scheduling jobs.
  • Worked extensively with BO report developers to solve critical issues in defining hierarchies and loops.
  • Designed Mapping Documents and Mapping Templates for Data Integrator ETL developer.

Environment: PL/SQL, Python, R, Oracle 10g, Talend, SAP BO, Erwin, MS Excel, MS Visio

Confidential, Middleton, WI

Data Analyst

Responsibilities:

  • Created test case scenarios, executed test cases and maintained defects in internal bug tracking systems.
  • Managed multiple OLAP and ETL projects for various testing needs.
  • Debugging the SQL-Statements and stored procedures for various business scenarios.
  • Developed advanced SQL queries to extract, manipulate, and/or calculate information to fulfill data and reporting requirements including identifying the tables and columns from which data is extracted.
  • Executed UNIX shell scripts that invoked SQL*Loader to load data into tables.
  • Loaded flat-file data into Teradata tables using UNIX shell scripts.
  • Tested several BO reports for a range of business needs, including dashboards, drill-down, master-detail, aggregated, KPI, grouped-list, cascade, and web reports.

Environment: PL/SQL, Oracle 10g, SAP BO, Crystal Reports, Erwin, Shell Scripts, Informatica

Confidential, Houston, TX

Programmer Analyst

Responsibilities:

  • Analyzed the Business requirements of the project by studying the Business Requirement Specification documents.
  • Performed and documented Impact Analysis for the Market Changes.
  • Assisted in the development of various PL/SQL modules such as packages, functions, procedures, triggers, records, and collections to implement business requirements.
  • Wrote shell scripts to automate loading files into the database using crontab.
  • Developed XSLT stylesheets to process the XML files and load the data into the database (see the XSLT sketch after this list).
  • Involved in Unit testing, Integration Testing and writing the Test plans.
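
A minimal sketch, using the standard JAXP transformation API with hypothetical file names (market-feed.xml, load-format.xsl, market-feed.dat), of applying an XSLT stylesheet to an incoming XML feed before the database load; the actual stylesheets and load step are not part of the resume.

```java
import java.io.File;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlFeedTransformer {

    // Applies the (hypothetical) load-format.xsl stylesheet to an incoming XML feed,
    // producing a flat file that a loader script could then push into the database.
    public static void main(String[] args) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new File("load-format.xsl")));
        transformer.transform(new StreamSource(new File("market-feed.xml")),
                              new StreamResult(new File("market-feed.dat")));
    }
}
```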

Environment: Oracle 10g/11g R2, VLDB, PL/SQL Developer, ERwin, Sun Solaris Unix, Shell scripts, XML, XSL, Eclipse, MS Excel

Confidential, Dallas, TX

Java Developer

Responsibilities:

  • Translated customer requirements into formal requirements and design documents.
  • Implemented a new functional module using J2EE and a customized framework (OA).
  • Developed new screens using JSP and Servlets (see the servlet sketch after this list).
  • Customized business models using EJB and Java.
  • Extensively involved in SCRUM meetings for bug fixes found in different phases of testing and production.
  • Supported the production environment by troubleshooting real-time issues.
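
A minimal sketch of the servlet-plus-JSP pattern mentioned above, using the standard javax.servlet API with hypothetical names (AccountSummaryServlet, accountSummary.jsp); the actual screens and framework wiring are not described in the resume.

```java
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical controller servlet that gathers data and forwards to a JSP view.
public class AccountSummaryServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // In the real application this value would come from the EJB/business layer.
        request.setAttribute("accountId", request.getParameter("id"));
        request.getRequestDispatcher("/WEB-INF/jsp/accountSummary.jsp")
               .forward(request, response);
    }
}
```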

Environment: Java, J2EE Framework, Ant, Maven, GIT, HTML, JavaScript, JSP, Unix, SQL, Shell Scripting, WebSphere
