Sr Big Data Engineer (Consultant) Resume
Wilmington, DE
SUMMARY:
- Overall 11 years of experience in the IT industry as a software developer, including 4 years of design and development experience using Hadoop ecosystem tools.
- Very good experience in application development and maintenance across the SDLC using programming languages such as C, Core Java, Scala, and Python.
- Experience with Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase, Kafka, NiFi, and Control-M.
- Experience in developing ETL applications on large volumes of data using tools such as MapReduce, Spark (Scala), PySpark, Spark SQL, and Pig.
- Well versed in both the Cloudera and Hortonworks platforms.
- Conceptual understanding of big data architecture on the AWS cloud, i.e., EC2, S3, EMR, and Redshift.
- Well versed in the MapReduce programming model for analyzing data stored in HDFS, with experience writing MapReduce code in Java per business requirements.
- Experience in importing and exporting data between RDBMS and HDFS/Hive using Sqoop.
- Expert in creating Hive UDFs in Java to analyze data sets with complex aggregate requirements.
- Used HBase for real-time, low-latency reads and writes in multiple applications.
- Well versed in developing complex SQL queries using Hive and Spark SQL.
- Experienced in preparing and executing unit test plans and unit test cases during software development.
- Strong understanding of object-oriented programming concepts and their implementation.
- Experience in providing training and guidance to new team members on projects.
- Experience in detailed system design using use-case analysis, functional analysis, and modeling with class, sequence, activity, and state diagrams in UML and Rational Rose.
- Very good experience in customer specification study, requirements gathering, and system architectural design, and in turning requirements into the final product or service.
- Experience in interacting with customers and working at client locations for real time field testing of products and services.
- Ability to communicate and work effectively with associates at all levels within the organization.
- Strong background in mathematics with very good analytical and problem-solving skills.
TECHNICAL SKILLS:
Programming: Core Java, Scala, Python, C and SQL.
Big Data Ecosystem: HDFS, YARN, MapReduce, Spark Core, Spark Streaming, Spark SQL, Impala, Hive, Pig, Kafka, Sqoop, HBase, NiFi, and Control-M
Scripting Languages: UNIX Shell scripting and Python scripting.
DBMS / RDBMS: Oracle 11g, SQL Server.
Version Control, CI/CD: Git/Bitbucket and Jenkins
PROFESSIONAL EXPERIENCE:
Confidential, Wilmington, DE
Sr Big Data Engineer (Consultant)
Responsibilities:
- Extensively used Spark Core (RDDs, DataFrames) and Spark SQL to develop multiple applications in both Python and Scala.
- Built multiple data pipelines using Pig scripts to process data for specific applications.
- Used different file formats such as Parquet, Avro, and ORC for storing and retrieving data in Hadoop.
- Used Spark Streaming to consume event-based data from Kafka and joined this stream with existing Hive table data to generate performance indicators for an application (see the streaming join sketch after this list).
- Developed analytical queries on different tables using Spark SQL to surface insights, and built data pipelines for data scientists to consume when applying ML models.
- Tuned Spark performance using several techniques: choosing optimum parallelism, selecting the serialization format used while shuffling data, using broadcast variables and broadcast joins, tuning aggregations, and managing memory (a broadcast-join sketch follows this list).
- Wrote multiple custom Sqoop import scripts to load data from Oracle into HDFS directories and Hive tables.
- Used NiFi to automate and manage data flows between multiple systems.
- Applied compression codecs such as Snappy and Gzip when storing data into Hive tables to improve performance (see the compressed-write sketch after this list).
- Used Impala for faster querying in a time-critical reporting application.
- Used HBase for OLTP workloads in an application requiring high scalability on Hadoop.
- Wrote Sqoop export scripts to write data from HDFS into the Oracle database.
- Used Control-M to simplify and automate different batch workload applications.
- Worked closely with multiple data science and machine learning teams to build a data ecosystem supporting AI.
- Developed a Java-based application to automate most of the manual work involved in onboarding a tenant to a multi-tenant environment, saving around 4 to 5 hours of manual work per tenant per person every day.
- Applied various job-tuning techniques while processing data with Hive and Spark to improve job performance.
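The following is a minimal sketch of the Kafka-to-Hive streaming join described above, written with Spark Structured Streaming in Scala. The broker address, topic name, event schema, and Hive table names are illustrative assumptions, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
// Sketch: consume JSON events from Kafka and join them with an existing
// Hive table to derive performance indicators. Names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object EventKpiStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-kpi-stream")
      .enableHiveSupport()                 // needed to read the existing Hive table
      .getOrCreate()
    import spark.implicits._

    // Assumed schema of the Kafka event payload
    val eventSchema = new StructType()
      .add("appId", StringType)
      .add("latencyMs", LongType)
      .add("eventTime", TimestampType)

    // Stream of events from Kafka (hypothetical broker and topic)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "app-events")
      .load()
      .select(from_json($"value".cast("string"), eventSchema).as("e"))
      .select("e.*")

    // Static reference data from Hive (hypothetical table: appId, slaMs)
    val baseline = spark.table("metrics.baseline")

    // Stream-static join, then simple performance indicators per app
    val kpis = events.join(baseline, Seq("appId"))
      .withColumn("withinSla", $"latencyMs" <= $"slaMs")
      .groupBy(window($"eventTime", "5 minutes"), $"appId")
      .agg(avg($"latencyMs").as("avgLatencyMs"),
           avg($"withinSla".cast("double")).as("slaHitRate"))

    kpis.writeStream
      .outputMode("update")
      .format("console")                   // sink kept simple for the sketch
      .start()
      .awaitTermination()
  }
}
```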
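A short sketch of the broadcast-join and shuffle-serialization tuning mentioned above: the small dimension table is shipped to every executor so the large fact table is joined without a shuffle. Table names, the join key, and configuration values are assumptions, not actual production settings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-join-example")
      .config("spark.sql.shuffle.partitions", "400")          // parallelism tuned per cluster (assumed value)
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") // faster shuffle serialization
      .enableHiveSupport()
      .getOrCreate()

    val transactions = spark.table("warehouse.transactions")  // large fact table (assumed)
    val merchants    = spark.table("warehouse.merchants")     // small dimension table (assumed)

    // Explicit broadcast hint: the small side is sent to all executors,
    // avoiding a shuffle of the large table.
    val enriched = transactions.join(broadcast(merchants), Seq("merchant_id"))

    enriched.groupBy("merchant_category")
      .count()
      .show(20)
  }
}
```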
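A minimal sketch of storing data as Snappy-compressed Parquet in a Hive table, as referenced in the compression bullet; the staging path and target table name are assumptions.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object CompressedHiveWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compressed-hive-write")
      .config("spark.sql.parquet.compression.codec", "snappy") // "gzip" is the other codec used
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.read.parquet("/data/staging/events")        // assumed staging location

    df.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .saveAsTable("analytics.events_compressed")              // Hive table name is illustrative
  }
}
```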
Environment: Spark Core, Spark Streaming, Scala, Python, NiFi, Hive, Kafka, Impala, HBase, Sqoop, Kerberos (security), LDAP, and Control-M.
Confidential, Piscataway, NJ
Data Engineer (Consultant)
Responsibilities:
- Performed ETL operations using multiple tools, primarily Spark, Pig, MapReduce, and Hive.
- Stored the processed data in Hive tables for faster querying.
- Developed a customized backend application in Java to schedule jobs via Oozie workflows and coordinators (see the Oozie client sketch after this list).
- Wrote scripts to automate the Oozie workflows.
- Developed multiple Spark-Scala ETL applications to encrypt and decrypt specific column values based on configuration (a column-encryption sketch follows this list).
- Worked on different input file formats such as XML, JSON, and text files.
- Used Avro, Parquet, and ORC file formats with suitable compression techniques to optimize reading and writing data to/from HDFS.
- Created custom keys and custom values when handling data in mappers and reducers, based on the input data and software requirements.
- Used different input formats such as text input format, combine file input format, multi-input format, and Avro input format for different applications.
- Regularly served high-priority ad hoc analysis requests for day-to-day customer business needs.
- Regularly developed aggregate jobs and KPI computation jobs (a KPI aggregation sketch follows this list).
- Used both the Spark and MapReduce frameworks on development clusters.
- Migrated MapReduce jobs to the Spark framework using Scala, rewriting most of the existing MapReduce jobs in Spark-Scala for better performance.
- Applied various performance-tuning techniques to improve the performance of existing jobs.
- Collaborated with data architects and data scientists to build data pipelines for consumption by other teams.
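A minimal sketch of submitting an Oozie workflow from the backend scheduler described above. The original application was Java; the snippet calls the same Oozie Java client API from Scala to stay consistent with the other sketches, and the Oozie URL, HDFS paths, and job properties are assumptions.

```scala
import org.apache.oozie.client.OozieClient

object OozieJobLauncher {
  def main(args: Array[String]): Unit = {
    val client = new OozieClient("http://oozie-host:11000/oozie")   // assumed Oozie server URL

    // Job properties that would normally come from the application's config
    val conf = client.createConfiguration()
    conf.setProperty(OozieClient.APP_PATH, "hdfs://nameservice1/apps/etl/workflow.xml") // assumed path
    conf.setProperty("nameNode", "hdfs://nameservice1")
    conf.setProperty("queueName", "default")

    // Submit and start the workflow, then report its initial status
    val jobId = client.run(conf)
    println(s"Submitted workflow $jobId, status: ${client.getJobInfo(jobId).getStatus}")
  }
}
```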
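A minimal sketch of the configuration-driven column encryption described above. The key handling, cipher mode, column list, and table names are simplified assumptions; a production job would pull the key from a secured store and use an authenticated cipher mode rather than the ECB mode shown here.

```scala
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object ColumnEncryptionJob {
  // In the real job the key would come from a secured store, not a literal.
  private val key = new SecretKeySpec("0123456789abcdef".getBytes("UTF-8"), "AES")

  private def encrypt(value: String): String = {
    if (value == null) return null
    val cipher = Cipher.getInstance("AES/ECB/PKCS5Padding") // mode simplified for the sketch
    cipher.init(Cipher.ENCRYPT_MODE, key)
    Base64.getEncoder.encodeToString(cipher.doFinal(value.getBytes("UTF-8")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-encryption-etl")
      .enableHiveSupport()
      .getOrCreate()

    // Columns to encrypt would normally be read from a configuration file.
    val columnsToEncrypt = Seq("ssn", "account_number")
    val encryptUdf = udf(encrypt _)

    val source = spark.table("staging.customers")            // assumed source table
    val secured = columnsToEncrypt.foldLeft(source) { (df, c) =>
      df.withColumn(c, encryptUdf(col(c)))                   // encrypt only the configured columns
    }

    secured.write.mode("overwrite").saveAsTable("secure.customers") // assumed target table
  }
}
```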
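A minimal sketch of a daily aggregate/KPI job of the kind described above, in the Spark-Scala style the MapReduce jobs were migrated to; the table, columns, KPIs, and output location are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyKpiJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-kpi-job")
      .enableHiveSupport()
      .getOrCreate()

    val runDate = args.headOption.getOrElse("2019-01-01")    // partition/date to process

    val orders = spark.table("warehouse.orders")             // assumed source table
      .where(col("order_date") === runDate)

    // A few representative KPIs per region
    val kpis = orders.groupBy("region")
      .agg(
        count("*").as("order_count"),
        sum("amount").as("total_revenue"),
        countDistinct("customer_id").as("unique_customers")
      )
      .withColumn("kpi_date", lit(runDate))

    kpis.write
      .mode("overwrite")
      .parquet(s"/data/analytics/daily_kpis/dt=$runDate")    // assumed output location
  }
}
```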
Environment: MapReduce, Spark, HDFS, Pig, Hive, Oozie, Java (JDK 1.6), Eclipse, Scala, XML, JSON, Unix/Shell Scripting, Oracle DB.
Confidential, Sunnyvale, CA
Big Data Developer (Consultant)
Responsibilities:
- Performed feasibility testing in iCloud test environments to analyze and understand the business and reporting requirements using big data.
- Worked with large datasets on HDFS using Hadoop MapReduce.
- Designed and developed big data MapReduce solutions using Pig Latin and Java.
- Tested and validated Hadoop jobs using Pig scripts and/or UNIX commands.
- Analyzed Hadoop job logs and counters to fine-tune job performance.
- Improved the performance of Hadoop jobs using the following techniques:
- Used combiners wherever applicable.
- Fine-tuned the split size to optimize the number of mappers consumed by the job.
- Fine-tuned the number of reducers based on the load on each reducer and the input data size.
- Explicitly set the number of reducers for different operations in Pig for good performance.
- Used Avro serialization for data transfers across the distributed network.
- Used Snappy compression to compress map output data.
- Minimized map-side disk spills through configuration.
- Filtered records in the mapper phase (rather than the reducer) and used a custom partitioner to load-balance data across reducers.
- Wrote custom record readers for handling small files.
- Used a Bloom filter to merge a large data set with a small daily data set, which significantly improved job performance (see the Bloom filter sketch after this list).
- Analyzed sudden changes in the final KPIs computed on a daily basis.
- Used PigUnit and MRUnit for unit testing jobs before testing on the actual cluster.
- Designed job architecture and flow before developing the actual software.
- Interacted with clients and translated customer needs into feasible solutions for projects.
- Coordinated with the offshore team to deliver big data solutions on time.
- Helped new team members gain knowledge of in-house customized frameworks at Confidential.
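The Bloom-filter merge above was implemented on MapReduce/Pig; purely as an illustration of the same idea, the sketch below re-expresses it with Spark's built-in Bloom filter in Scala (to keep one language across the sketches): build a filter over the small daily keys and use it to discard non-matching records from the large data set before the join. Table names, the join key, and the sizing parameters are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object BloomFilterJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bloom-filter-join")
      .enableHiveSupport()
      .getOrCreate()

    val daily   = spark.table("staging.daily_updates")    // small daily data set (assumed)
    val history = spark.table("warehouse.history")        // large data set (assumed)

    // Build a Bloom filter over the small side's join keys and broadcast it.
    val filter   = daily.stat.bloomFilter("record_id", 1000000L, 0.01)
    val bcFilter = spark.sparkContext.broadcast(filter)

    // Drop history rows whose key cannot be in the daily set (no false negatives),
    // so only a small candidate set reaches the shuffle join.
    val mightMatch = udf((id: String) => id != null && bcFilter.value.mightContain(id))
    val candidates = history.where(mightMatch(col("record_id")))

    val merged = candidates.join(daily, Seq("record_id"))
    merged.write.mode("overwrite").saveAsTable("warehouse.history_merged") // assumed target
  }
}
```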
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Core Java, Eclipse, Oracle DB, and UNIX scripting.
Confidential
Senior Software Engineer
Responsibilities:
- Studied the system requirements specification and performed a feasibility study.
- Mathematical modeling using MATLAB.
- Applied OOAD principles for the analysis and design of the system.
- Did extensive C programming as part of building different software products.
- Developed software drivers and communication protocols for the Serial Communication Interface (SCI), Inter-Integrated Circuit (IIC), Universal Serial Bus (USB), and Dual-Port Random Access Memory (DPRAM).
- Developed software for test jigs to thoroughly validate the complete system before delivering it to the client.
- Performed thorough unit testing and integration testing of the system.
- Interacted with customers and vendors during development of the system.
- Contributed to technical discussions on architectural modifications, integration with other systems, and fixing issues in the system.
Environment: System programming using C, RTOS, Unix/Linux, CAN, DPRAM, IIC, SPI, MATLAB, and Shell Scripting.