Sr Big Data Engineer (Consultant) Resume
Wilmington, DE
SUMMARY:
- Overall 11 years of experience in the IT industry as a software developer, including 4 years of design and development experience using Hadoop ecosystem tools.
- Very good experience in application development and maintenance across the SDLC using programming languages such as C, Core Java, Scala, and Python.
- Experience with Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, NiFi, and Control-M.
- Experience in developing ETL applications on large volumes of data using tools such as MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig.
- Well versed in both the Cloudera and Hortonworks platforms.
- Conceptual understanding of big data architecture on the AWS cloud, i.e., EC2, S3, EMR, and Redshift.
- Well versed in the MapReduce programming model for analyzing data stored in HDFS, with experience writing MapReduce code in Java per business requirements.
- Experience in importing and exporting data between RDBMS and HDFS/Hive using Sqoop.
- Expert in creating Hive UDFs in Java to analyze data sets with complex aggregate requirements.
- Used HBase for real-time, low-latency reads and writes in multiple applications.
- Well versed in developing complex SQL queries using Hive and Spark SQL.
- Experienced in preparing and executing unit test plans and unit test cases during software development.
- Strong understanding of object-oriented programming concepts and their implementation.
- Experience in providing training and guidance to new team members on projects.
- Experience in detailed system design using use-case analysis, functional analysis, and modeling with class, sequence, activity, and state diagrams in UML and Rational Rose.
- Very good experience in customer specification study, requirements gathering, and system architectural design, and in turning requirements into the final product or service.
- Experience in interacting with customers and working at client locations for real time field testing of products and services.
- Ability to communicate and work effectively with associates at all levels within the organization.
- Strong background in mathematics with very good analytical and problem-solving skills.
TECHNICAL SKILLS:
Programming: Core Java, Scala, Python, C and SQL.
Big Data Ecosystem: HDFS, YARN, MapReduce, Spark Core, Spark Streaming, Spark SQL, Impala, Hive, Pig, Kafka, Sqoop, HBase, NiFi, and Control-M
Scripting Languages: UNIX Shell scripting and Python scripting.
DBMS / RDBMS: Oracle 11g, SQL Server.
Version Control, CI/CD: Git/Bitbucket and Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, Wilmington, DE
Sr Big Data Engineer (Consultant)
Responsibilities:
- Extensively used Spark Core (RDDs, DataFrames) and Spark SQL to develop multiple applications in both Python and Scala.
- Built multiple data pipelines using Pig scripts to process data for specific applications.
- Used different file formats such as Parquet, Avro, and ORC for storing and retrieving data in Hadoop.
- Used Spark Streaming to consume event-based data from Kafka and joined this stream with existing Hive table data to generate performance indicators for an application (see the streaming join sketch after this list).
- Developed analytical queries on different tables using Spark SQL to surface insights, and built data pipelines for data scientists to consume when applying ML models.
- Tuned Spark performance using several techniques: choosing optimum parallelism, selecting the serialization format used while shuffling data, using broadcast variables and broadcast joins, tuning aggregations, and managing memory (a broadcast-join sketch follows this list).
- Wrote multiple custom Sqoop import scripts to load data from Oracle into HDFS directories and Hive tables.
- Used NiFi to automate and manage data flows between multiple systems.
- Applied compression codecs such as Snappy and Gzip when storing data into Hive tables to improve performance (see the compressed-write sketch after this list).
- Used Impala for faster querying in a time-critical reporting application.
- Used HBase for OLTP workloads in an application requiring high scalability on Hadoop.
- Wrote Sqoop export scripts to write data from HDFS into the Oracle database.
- Used Control-M to simplify and automate different batch workload applications.
- Worked closely with multiple data science and machine learning teams to build a data ecosystem supporting AI.
- Developed a Java-based application to automate most of the manual work involved in onboarding a tenant to a multi-tenant environment, saving around 4 to 5 hours of manual work per tenant per person every day.
- Applied various job-tuning techniques while processing data with Hive and Spark to improve job performance.
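The following is a minimal sketch of the Kafka-to-Hive streaming join described above, written with Spark Structured Streaming in Scala. The broker address, topic name, event schema, and Hive table names are illustrative assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
// Sketch: consume JSON events from Kafka and join them with an existing
// Hive table to derive performance indicators. Names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object EventKpiStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-kpi-stream")
      .enableHiveSupport()                 // needed to read the existing Hive table
      .getOrCreate()
    import spark.implicits._

    // Assumed schema of the Kafka event payload
    val eventSchema = new StructType()
      .add("appId", StringType)
      .add("latencyMs", LongType)
      .add("eventTime", TimestampType)

    // Stream of events from Kafka (hypothetical broker and topic)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "app-events")
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Static reference data from Hive (hypothetical table: appId, slaMs)
    val baseline = spark.table("metrics.baseline")

    // Stream-static join, then simple performance indicators per app
    val kpis = events.join(baseline, Seq("appId"))
      .withColumn("withinSla", $"latencyMs" <= $"slaMs")
      .groupBy(window($"eventTime", "5 minutes"), $"appId")
      .agg(avg($"latencyMs").as("avgLatencyMs"),
           avg($"withinSla".cast("double")).as("slaHitRate"))

    kpis.writeStream
      .outputMode("update")
      .format("console")                   // sink kept simple for the sketch
      .start()
      .awaitTermination()
  }
}
```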
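A short sketch of the broadcast-join and shuffle-serialization tuning mentioned above: the small dimension table is shipped to every executor so the large fact table is joined without a shuffle. Table names, the join key, and configuration values are assumptions, not actual production settings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-example")
      .config("spark.sql.shuffle.partitions", "400")          // parallelism tuned per cluster (assumed value)
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // faster shuffle serialization
      .enableHiveSupport()
      .getOrCreate()

    val transactions = spark.table("warehouse.transactions")  // large fact table (assumed)
    val merchants    = spark.table("warehouse.merchants")     // small dimension table (assumed)

    // Explicit broadcast hint: the small side is sent to all executors,
    // avoiding a shuffle of the large table.
    val enriched = transactions.join(broadcast(merchants), Seq("merchant_id"))

    enriched.groupBy("merchant_category")
      .count()
      .show(20)
  }
}
```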
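A minimal sketch of storing data as Snappy-compressed Parquet in a Hive table, as referenced in the compression bullet; the staging path and target table name are assumptions.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CompressedHiveWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compressed-hive-write")
      .config("spark.sql.parquet.compression.codec", "snappy") // "gzip" is the other codec used
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.read.parquet("/data/staging/events")        // assumed staging location

    df.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .saveAsTable("analytics.events_compressed")              // Hive table name is illustrative
  }
}
```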
Environment: Spark Core, Spark Streaming, Scala, Python, NiFi, Hive, Kafka, Impala, HBase, Sqoop, Kerberos (security), LDAP, and Control-M.
Confidential, Piscataway, NJ
Data Engineer (Consultant)
Responsibilities:
- Performed ETL operations using multiple tools, primarily Spark, Pig, MapReduce, and Hive.
- Stored the processed data in Hive tables for faster querying.
- Developed a customized backend application in Java to schedule jobs via Oozie workflows and coordinators (see the Oozie client sketch after this list).
- Wrote scripts to automate the Oozie workflows.
- Developed multiple Spark-Scala ETL applications to encrypt and decrypt specific column values based on configuration (a column-encryption sketch follows this list).
- Worked on different input file formats such as XML, JSON, and text files.
- Used Avro, Parquet, and ORC file formats with suitable compression techniques to optimize reading and writing data to/from HDFS.
- Created custom keys and custom values when handling data in mappers and reducers, based on the input data and software requirements.
- Used different input formats such as text input format, combine file input format, multi-input format, and Avro input format for different applications.
- Regularly served high-priority ad hoc analysis requests for day-to-day customer business needs.
- Regularly developed aggregate jobs and KPI computation jobs (a KPI aggregation sketch follows this list).
- Used both the Spark and MapReduce frameworks on development clusters.
- Migrated MapReduce jobs to the Spark framework using Scala, rewriting most of the existing MapReduce jobs in Spark-Scala for better performance.
- Applied various performance-tuning techniques to improve the performance of existing jobs.
- Collaborated with data architects and data scientists to build data pipelines for consumption by other teams.
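A minimal sketch of submitting an Oozie workflow from the backend scheduler described above. The original application was Java; the snippet calls the same Oozie Java client API from Scala to stay consistent with the other sketches, and the Oozie URL, HDFS paths, and job properties are assumptions.

```scala
import org.apache.oozie.client.OozieClient

object OozieJobLauncher {
  def main(args: Array[String]): Unit = {
    val client = new OozieClient("http://oozie-host:11000/oozie")   // assumed Oozie server URL

    // Job properties that would normally come from the application's config
    val conf = client.createConfiguration()
    conf.setProperty(OozieClient.APP_PATH, "hdfs://nameservice1/apps/etl/workflow.xml") // assumed path
    conf.setProperty("nameNode", "hdfs://nameservice1")
    conf.setProperty("queueName", "default")

    // Submit and start the workflow, then report its initial status
    val jobId = client.run(conf)
    println(s"Submitted workflow $jobId, status: ${client.getJobInfo(jobId).getStatus}")
  }
}
```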
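A minimal sketch of the configuration-driven column encryption described above. The key handling, cipher mode, column list, and table names are simplified assumptions; a production job would pull the key from a secured store and use an authenticated cipher mode rather than the ECB mode shown here.

```scala
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ColumnEncryptionJob {
  // In the real job the key would come from a secured store, not a literal.
  private val key = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES")

  private def encrypt(value: String): String = {
    if (value == null) return null
    val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding") // mode simplified for the sketch
    cipher.init(Cipher.ENCRYPT_MODE, key)
    Base64.getEncoder.encodeToString(cipher.doFinal(value.getBytes("UTF-8")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-encryption-etl")
      .enableHiveSupport()
      .getOrCreate()

    // Columns to encrypt would normally be read from a configuration file.
    val columnsToEncrypt = Seq("ssn", "account_number")
    val encryptUdf = udf(encrypt _)

    val source = spark.table("staging.customers")            // assumed source table
    val secured = columnsToEncrypt.foldLeft(source) { (df, c) =>
      df.withColumn(c, encryptUdf(col(c)))                   // encrypt only the configured columns
    }

    secured.write.mode("overwrite").saveAsTable("secure.customers") // assumed target table
  }
}
```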
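A minimal sketch of a daily aggregate/KPI job of the kind described above, in the Spark-Scala style the MapReduce jobs were migrated to; the table, columns, KPIs, and output location are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyKpiJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-kpi-job")
      .enableHiveSupport()
      .getOrCreate()

    val runDate = args.headOption.getOrElse("2019-01-01")    // partition/date to process

    val orders = spark.table("warehouse.orders")             // assumed source table
      .where(col("order_date") === runDate)

    // A few representative KPIs per region
    val kpis = orders.groupBy("region")
      .agg(
        count("*").as("order_count"),
        sum("amount").as("total_revenue"),
        countDistinct("customer_id").as("unique_customers")
      )
      .withColumn("kpi_date", lit(runDate))

    kpis.write
      .mode("overwrite")
      .parquet(s"/data/analytics/daily_kpis/dt=$runDate")    // assumed output location
  }
}
```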
Environment: MapReduce, Spark, HDFS, Pig, Hive, Oozie, Java (JDK 1.6), Eclipse, Scala, XML, JSON, Unix/Shell Scripting, Oracle DB.
Confidential, Sunnyvale, CA
Big Data Developer (Consultant)
Responsibilities:
- Performed feasibility testing in iCloud test environments to analyze and understand the business and reporting requirements using big data.
- Worked with large datasets on HDFS using Hadoop MapReduce.
- Designed and developed big data MapReduce solutions using Pig Latin and Java.
- Tested and validated Hadoop jobs using Pig scripts and/or UNIX commands.
- Analyzed Hadoop job logs and counters to fine-tune job performance.
- Improved the performance of Hadoop jobs using the following techniques:
- Used combiners wherever applicable.
- Fine-tuned the split size to optimize the number of mappers consumed by the job.
- Fine-tuned the number of reducers based on the load on each reducer and the input data size.
- Explicitly set the number of reducers for different operations in Pig for good performance.
- Used Avro serialization for data transfers across the distributed network.
- Used Snappy compression to compress map output data.
- Minimized map-side disk spills through configuration.
- Filtered records in the mapper phase (rather than the reducer) and used a custom partitioner to load-balance data across reducers.
- Wrote custom record readers for handling small files.
- Used a Bloom filter to merge a large data set with a small daily data set, which significantly improved job performance (see the Bloom filter sketch after this list).
- Analyzed sudden changes in the final KPIs computed on a daily basis.
- Used PigUnit and MRUnit for unit testing jobs before testing on the actual cluster.
- Designed job architecture and flow before developing the actual software.
- Interacted with clients and translated customer needs into feasible solutions for projects.
- Coordinated with the offshore team to deliver big data solutions on time.
- Helped new team members gain knowledge of in-house customized frameworks at Confidential.
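The Bloom-filter merge above was implemented on MapReduce/Pig; purely as an illustration of the same idea, the sketch below re-expresses it with Spark's built-in Bloom filter in Scala (to keep one language across the sketches): build a filter over the small daily keys and use it to discard non-matching records from the large data set before the join. Table names, the join key, and the sizing parameters are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object BloomFilterJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bloom-filter-join")
      .enableHiveSupport()
      .getOrCreate()

    val daily   = spark.table("staging.daily_updates")    // small daily data set (assumed)
    val history = spark.table("warehouse.history")        // large data set (assumed)

    // Build a Bloom filter over the small side's join keys and broadcast it.
    val filter   = daily.stat.bloomFilter("record_id", 1000000L, 0.01)
    val bcFilter = spark.sparkContext.broadcast(filter)

    // Drop history rows whose key cannot be in the daily set (no false negatives),
    // so only a small candidate set reaches the shuffle join.
    val mightMatch = udf((id: String) => id != null && bcFilter.value.mightContain(id))
    val candidates = history.where(mightMatch(col("record_id")))

    val merged = candidates.join(daily, Seq("record_id"))
    merged.write.mode("overwrite").saveAsTable("warehouse.history_merged") // assumed target
  }
}
```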
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Core Java, Eclipse, Oracle DB, and UNIX scripting.
Confidential
Senior Software Engineer
Responsibilities:
- Studied the system requirements specification and performed a feasibility study.
- Mathematical modeling using MATLAB.
- Applied OOAD principles for the analysis and design of the system.
- Did extensive C programming as part of building different software products.
- Developed software drivers and communication protocols for the Serial Communication Interface (SCI), Inter-Integrated Circuit (IIC), Universal Serial Bus (USB), and Dual-Port Random Access Memory (DPRAM).
- Developed software for test jigs to thoroughly validate the complete system before delivering it to the client.
- Performed thorough unit testing and integration testing of the system.
- Interacted with customers and vendors during development of the system.
- Contributed to technical discussions on architectural modifications, integration with other systems, and fixing issues in the system.
Environment: System programming using C, RTOS, Unix/Linux, CAN, DPRAM, IIC, SPI, MATLAB, and Shell Scripting.