
Sr Big Data Engineer (Consultant) Resume


Wilmington, DE

SUMMARY:

  • Overall 11 years of experience in the IT industry as a software developer, including 4 years of experience in design and development using Hadoop ecosystem tools.
  • Very good experience in application development and maintenance across the SDLC using programming languages such as C, Core Java, Scala, and Python.
  • Experience using Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, and NiFi, along with Control-M for scheduling.
  • Experience developing ETL applications on large volumes of data using MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig.
  • Well versed in both the Cloudera and Hortonworks platforms.
  • Conceptual understanding of big data architecture on AWS, i.e., EC2, S3, EMR, and Redshift.
  • Well versed in the MapReduce programming model for analyzing data stored in HDFS, with experience writing MapReduce code in Java per business requirements.
  • Experience importing data from RDBMS into HDFS and Hive, and exporting it back, using Sqoop.
  • Expert in creating Hive UDFs in Java to analyze data sets with complex aggregate requirements.
  • Used HBase for real-time, low-latency reads and writes in multiple applications.
  • Well versed in developing complex SQL queries using Hive and Spark SQL (a short sketch follows this list).
  • Experienced in preparing and executing unit test plans and test cases during software development.
  • Strong understanding of object-oriented programming concepts and their implementation.
  • Experience providing training and guidance to new team members on projects.
  • Experience in detailed system design using use-case analysis, functional analysis, and program modeling with class, sequence, activity, and state diagrams in UML and Rational Rose.
  • Very good experience in customer specification study, requirements gathering, and system architectural design, and in turning requirements into a final product or service.
  • Experience interacting with customers and working at client locations for real-time field testing of products and services.
  • Ability to communicate and work effectively with associates at all levels within the organization.
  • Strong background in mathematics with very good analytical and problem-solving skills.
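
A minimal Spark-Scala sketch of the kind of ETL and Spark SQL work summarized above: read Parquet data from HDFS, compute an aggregate, and save the result as a Hive table. The paths, database, table, and column names are illustrative assumptions, not details from a specific engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregates {
  def main(args: Array[String]): Unit = {
    // Hive support so the result can be saved directly as a Hive table
    val spark = SparkSession.builder()
      .appName("DailyAggregates")
      .enableHiveSupport()
      .getOrCreate()

    // Illustrative source: Parquet files landed on HDFS (e.g. by a Sqoop import)
    val txns = spark.read.parquet("hdfs:///data/raw/transactions")

    // Aggregate expressed with Spark SQL / DataFrame functions
    val daily = txns
      .groupBy(col("customer_id"), to_date(col("txn_ts")).as("txn_date"))
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("txn_count"))

    // Write back as a partitioned Hive table for downstream consumers
    daily.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("analytics.daily_customer_aggregates")

    spark.stop()
  }
}
```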

TECHNICAL SKILLS:

Programming: Core Java, Scala, Python, C and SQL.

Big Data Ecosystem: HDFS, YARN, MapReduce, Spark Core, Spark Streaming, Spark SQL, Impala, Hive, Pig, Kafka, Sqoop, HBase, NiFi, and Control-M

Scripting Languages: UNIX Shell scripting and Python scripting.

DBMS / RDBMS: Oracle 11g, SQL Server.

Version Control / CI-CD: Git/Bitbucket and Jenkins

PROFESSIONAL EXPERIENCE:

Confidential, Wilmington, DE

Sr Big Data Engineer (Consultant)

Responsibilities:

  • Extensively used Spark Core (RDDs, DataFrames) and Spark SQL while developing multiple applications in both Python and Scala.
  • Built multiple data pipelines using Pig scripts to process data for specific applications.
  • Used different file formats such as Parquet, Avro, and ORC for storing and retrieving data in Hadoop.
  • Used Spark Streaming to consume event-based data from Kafka and joined this stream with existing Hive table data to generate performance indicators for an application (see the sketch after this list).
  • Developed analytical queries on different tables using Spark SQL to find insights, and built data pipelines that data scientists consume to apply ML models.
  • Tuned Spark performance by choosing optimum parallelism, selecting the serialization format used while shuffling data, using broadcast variables and broadcast joins, optimizing aggregations, and managing memory.
  • Wrote multiple custom Sqoop import scripts to load data from Oracle into HDFS directories and Hive tables.
  • Used NiFi to automate and manage data flows between multiple systems.
  • Used compression techniques such as Snappy and Gzip while storing data in Hive tables to improve performance.
  • Used Impala for faster querying in a time-critical reporting application.
  • Used HBase for OLTP workloads in an application requiring high scalability on Hadoop.
  • Wrote Sqoop export scripts to write data from HDFS into an Oracle database.
  • Used Control-M to simplify and automate different batch workload applications.
  • Worked closely with multiple data science and machine learning teams to build a data ecosystem supporting AI.
  • Developed a Java-based application to automate most of the manual work of onboarding a tenant to a multi-tenant environment, saving around 4 to 5 hours of manual work per tenant per person every day.
  • Applied job-tuning techniques while processing data with the Hive and Spark frameworks to improve job performance.
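
A minimal sketch of the Kafka-to-Hive join described above, written here against Spark's Structured Streaming API (the original work used Spark Streaming) with the static Hive side broadcast to avoid shuffling the stream. The broker address, topic, table, and column names are illustrative assumptions, and the job needs the spark-sql-kafka connector on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaKpiStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaKpiStream")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Event stream from Kafka; broker list and topic are illustrative
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "app-events")
      .load()
      .selectExpr("CAST(key AS STRING) AS device_id", "CAST(value AS STRING) AS payload")

    // Existing Hive reference data, broadcast so the stream-static join avoids a shuffle
    val refData = broadcast(spark.table("analytics.device_reference"))

    // Join the stream with the Hive table and derive a simple performance indicator
    val kpis = events
      .join(refData, Seq("device_id"))
      .groupBy($"region")
      .count()

    val query = kpis.writeStream
      .outputMode("complete")
      .format("console")   // sink is illustrative; production would target a table or topic
      .start()

    query.awaitTermination()
  }
}
```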

Environment: Spark Core, Spark Streaming, Scala, Python, NiFi, Hive, Kafka, Impala, HBase, Sqoop, Kerberos (security), LDAP, and Control-M.

Confidential, Piscataway, NJ

Data Engineer (Consultant)

Responsibilities:

  • Performed ETL operations primarily using Spark, Pig, MapReduce, and Hive.
  • Stored the processed data in Hive tables for faster querying.
  • Developed a customized backend application in Java to schedule jobs using Oozie workflows and coordinators.
  • Wrote scripts to automate the Oozie workflows.
  • Developed multiple Spark-Scala ETL applications to encrypt and decrypt specific column values based on configuration (see the sketch after this list).
  • Worked with different input file formats such as XML, JSON, and plain text.
  • Used Avro, Parquet, and ORC file formats, along with suitable compression techniques, to optimize reads from and writes to HDFS.
  • Created custom keys and custom values while handling data in mappers and reducers, based on the input data and software requirements.
  • Used different input formats, such as text, combine-file, multi-input, and Avro input formats, for different applications.
  • Regularly served high-priority ad hoc analysis requests for day-to-day customer business needs.
  • Regularly developed aggregate jobs and KPI computation jobs.
  • Used both the Spark and MapReduce frameworks on development clusters.
  • Migrated MapReduce jobs to the Spark framework, rewriting most of the existing MapReduce jobs in Spark-Scala for better performance.
  • Applied performance-tuning techniques to improve the performance of existing jobs.
  • Collaborated with data architects and data scientists to build data pipelines for consumption by other teams.
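
A minimal Spark-Scala sketch of configuration-driven column encryption and decryption as described above. The AES/ECB key handling, column list, and table names are deliberately simplified, illustrative assumptions; a real implementation would pull keys from a secure key store and use an authenticated cipher mode.

```scala
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object ColumnEncryptor extends Serializable {
  // Illustrative only: a real job would load the key from a secure key store
  private val keyBytes = "0123456789abcdef".getBytes("UTF-8")

  // ECB is used here only to keep the sketch short; prefer an authenticated mode with IVs
  private def aes(mode: Int, data: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding")
    cipher.init(mode, new SecretKeySpec(keyBytes, "AES"))
    cipher.doFinal(data)
  }

  val encryptUdf = udf((s: String) =>
    if (s == null) null
    else Base64.getEncoder.encodeToString(aes(Cipher.ENCRYPT_MODE, s.getBytes("UTF-8"))))

  val decryptUdf = udf((s: String) =>
    if (s == null) null
    else new String(aes(Cipher.DECRYPT_MODE, Base64.getDecoder.decode(s)), "UTF-8"))

  // Apply the encrypt UDF to every column named in the configuration
  def encryptColumns(df: DataFrame, configuredCols: Seq[String]): DataFrame =
    configuredCols.foldLeft(df)((d, c) => d.withColumn(c, encryptUdf(col(c))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ColumnEncryptor").enableHiveSupport().getOrCreate()
    // The column list would normally come from a config file; hard-coded here for illustration
    val sensitiveCols = Seq("ssn", "email")
    val in = spark.table("staging.customers")
    encryptColumns(in, sensitiveCols).write.mode("overwrite").saveAsTable("secure.customers")
    spark.stop()
  }
}
```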

Environment: MapReduce, Spark, HDFS, Pig, Hive, Oozie, Java (JDK 1.6), Eclipse, Scala, XML, JSON, Unix/Shell Scripting, Oracle DB.

Confidential, Sunnyvale, CA

Big Data Developer (Consultant)

Responsibilities:

  • Performed feasibility testing in iCloud test environments to analyze and understand the business and reporting requirements for big data.
  • Worked with large datasets on HDFS using Hadoop MapReduce.
  • Designed and developed big data MapReduce solutions using Pig Latin and Java.
  • Tested and validated Hadoop jobs using Pig scripts and/or UNIX commands.
  • Analyzed Hadoop job logs and counters to fine-tune job performance.
  • Improved the performance of Hadoop jobs using the following techniques (a configuration sketch follows this list):
  • Used a combiner wherever applicable.
  • Fine-tuned the split size to optimize the number of mappers consumed by a job.
  • Fine-tuned the number of reducers based on the load on each reducer and the input data size.
  • Explicitly set the number of reducers for different operations in Pig for good performance.
  • Used Avro serialization for data transfers across the distributed network.
  • Used Snappy compression to compress map output data.
  • Minimized map-side disk spill through configuration.
  • Filtered records in the mapper phase (not the reducer) and used a custom partitioner to load-balance data across reducers.
  • Wrote custom record readers to handle small files.
  • Used a Bloom filter to merge a large data set with a small daily data set, which significantly improved job performance.
  • Analyzed sudden changes in final KPIs computed on a daily basis.
  • Used PigUnit and MRUnit for unit testing of jobs before testing on the actual cluster.
  • Experienced in designing job architecture and flow before developing the actual software.
  • Good client interaction skills, capable of translating customer needs into feasible project solutions.
  • Coordinated with the offshore team to deliver big data solutions on time.
  • Helped new team members gain knowledge of in-house customized frameworks at Confidential.
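
A minimal Scala sketch of the kind of MapReduce driver configuration behind several of the techniques above: a combiner, Snappy compression of map output, split-size control, and an explicit reducer count. The KpiMapper/KpiReducer classes, paths, and numbers are illustrative placeholders rather than the original Confidential code.

```scala
import java.lang.{Iterable => JIterable}

import scala.collection.JavaConverters._

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Placeholder mapper: filters/shapes records in the map phase and emits (event type, 1)
class KpiMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split(",")
    if (fields.nonEmpty && fields(0).nonEmpty) {
      outKey.set(fields(0))
      context.write(outKey, one)
    }
  }
}

// Placeholder reducer, also registered as the combiner: sums counts per key
class KpiReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: JIterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.map(_.get).sum
    context.write(key, new IntWritable(total))
  }
}

object TunedJobRunner {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Compress intermediate map output with Snappy to cut shuffle I/O
    conf.setBoolean("mapreduce.map.output.compress", true)
    conf.set("mapreduce.map.output.compress.codec",
             "org.apache.hadoop.io.compress.SnappyCodec")

    // Cap the split size (bytes) to control how many mappers the job consumes
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 256L * 1024 * 1024)

    val job = Job.getInstance(conf, "kpi-aggregation")
    job.setJarByClass(classOf[KpiMapper])
    job.setMapperClass(classOf[KpiMapper])
    job.setCombinerClass(classOf[KpiReducer])   // combiner where applicable
    job.setReducerClass(classOf[KpiReducer])
    job.setNumReduceTasks(20)                   // sized from per-reducer load and input volume
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])

    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```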

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Core Java, Eclipse, Oracle DB, and UNIX scripting.

Confidential

Senior Software Engineer

Responsibilities:

  • Studied the system requirements specification and performed a feasibility study.
  • Mathematical modeling using MATLAB.
  • Applied OOAD principles for the analysis and design of the system.
  • Performed extensive C programming as part of building different software products.
  • Developed software drivers and communication protocols for the Serial Communication Interface (SCI), Inter-Integrated Circuit (IIC), Universal Serial Bus (USB), and Dual-Port Random Access Memory (DPRAM).
  • Developed software for test jigs to thoroughly validate the complete system before delivering it to the client.
  • Performed thorough unit testing and integration testing of the system.
  • Interacted with customers and vendors during development of the system.
  • Contributed to technical discussions on architectural modifications, integration with other systems, and fixing issues in the system.

Environment: System programming using C, RTOS, Unix/Linux, CAN, DPRAM, IIC, SPI, MATLAB, and Shell Scripting.
