
Sr. R&D Engineer Resume


Phoenix, Arizona

SUMMARY:

Detail-oriented programmer and architect with around 7 years of success devising innovative, tailored solutions to meet ever-changing business requirements across diverse industries including banking, EDA (Electronic Design Automation), telecom, and finance. Advanced skill with leading-edge programming tools, complemented by a proven ability to assimilate and rapidly apply emerging Confidential.

TECHNICAL SKILLS:

Programming Languages: R, Python, SQL, HiveQL, Shell Scripting, Java

Development: Statistical Modeling and Machine Learning, Big Data Architecture, Hadoop Development and Operations, Streaming data pipelines, DevOps, Agile Development, Project Management, Technical Documentation, Quality Assurance, Object-oriented Design

Big Data skills: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, Apache Spark, Cassandra, Kafka, Ambari

Statistics and Modeling skills: Supervised Learning, Regression, Classification Models, Linear Regression, Logistic Regression, Discriminant Analysis, Unsupervised Learning

Hadoop Distributions: Amazon EMR, Hortonworks (HDP 2.6)

Automation: Ansible, HDP Blueprints, Python

Services in AWS: S3, EMR (HDFS, Hive, Hue, Oozie, Sqoop), Redshift, Lambda, DynamoDB, EMRFS, RDS (Relational Database Service), Athena

Databases: MySQL, Redis, Cassandra, HBase

Analytics tools: Business Objects, Datameer

Build and dependency management: Ant, Maven, SBT

Methodologies: UML, Design Patterns

Versioning: Git, SVN, Perforce

PROFESSIONAL EXPERIENCE:

Sr. R&D Engineer

Confidential

Skills used: Hortonworks DataFlow and Data Platform, Hadoop, HDFS, Python, Anaconda, Cassandra, HBase, Kafka, Ansible, Spark, Shell Scripting, Avro schemas, MySQL, Hive, Apache NiFi, R, Supervised Learning, Regression, Recommendations.

Responsibilities:

  • Responsible for the design and development of analytic models, applications, a stable and scalable data platform, and supporting tools that enable data scientists to create algorithms and models in a big data ecosystem.
  • Built use-case-driven solutions and architecture.
  • Built regression models for resource monitoring and usage forecasting from current usage data using supervised learning.
  • Evaluated component monitoring and configuration recommendations using statistical models.
  • Performed functional and technical design and code reviews; managed code delivery and deployment, resource planning, and mentorship.
  • Aggregated logs from a large number of sources using Fluentd.
  • Monitored multiple Hadoop cluster environments using Ambari: tracked workload, job performance, and cluster metrics, and defined alerts with threshold values for each metric.
  • Architected a common API for users to access data from the platform, which includes Hadoop and other components such as MySQL, Redis, and Cassandra.
  • Developed multi-threaded and multi-process applications in Python for data ingestion and aggregation.
  • Automated application-level solutions using Python.
  • Provided deployment architecture, hardware sizing, and performance expectation guidelines for deploying Hortonworks Data Platform and designing real-time data flows.
  • Designed and estimated the hardware configuration for the big data platform, including HDFS partitioning and capacity planning.
  • Data included real-time streaming data, stable data, and structured and unstructured log data.
  • Schema assignment: built schemas across the platform in Python for various data formats, including raw log files.
  • Developed a data flow to pull data from the aggregation layer using Apache NiFi (dynamic schema generation) with context configuration enabled, and used NiFi to provide real-time control of data movement between source and destination.
  • Handled streaming applications in Kafka, writing Avro schemas using the schema registry (see the producer sketch after this list), and assigned HBase schemas.
  • Designed Kafka pipelines by configuring producers and consumers downstream of log aggregation with Fluentd.
  • Automated provisioning of the data platform as part of product development using Python and HDP Blueprints.
  • Designed the schema for, configured, and deployed HBase for optimal storage and fast retrieval of data, and used Spark DataFrames and Spark SQL for analytics.
  • Developed applications on Cassandra using the Python driver API (see the Cassandra sketch after this list).
  • Built backup and restore policies for data on HDFS, HBase, and Cassandra.
  • Streamlined multiple applications and handled crashes across the platform.
  • Built automated benchmarking and testing across all services on the platform to ensure reliability.
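The Kafka/Avro work above can be illustrated with a minimal producer sketch using the confluent-kafka Python client and its Schema Registry integration; the broker address, registry URL, topic name, and record schema below are assumptions for illustration, not taken from the actual project.

```python
# Minimal sketch: publish Avro-encoded records to Kafka via the Schema Registry.
# Broker/registry URLs, topic name, and the record schema are assumed.
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

value_schema = avro.loads("""
{
  "type": "record",
  "name": "ResourceMetric",
  "fields": [
    {"name": "host", "type": "string"},
    {"name": "cpu",  "type": "double"},
    {"name": "ts",   "type": "long"}
  ]
}
""")

producer = AvroProducer(
    {
        "bootstrap.servers": "broker1:9092",            # assumed broker
        "schema.registry.url": "http://registry:8081",  # assumed registry
    },
    default_value_schema=value_schema,
)

# Each produce() call serializes the dict against the schema registered above.
producer.produce(topic="resource-metrics",
                 value={"host": "node01", "cpu": 0.73, "ts": 1700000000})
producer.flush()
```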
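Similarly, a small sketch of application access to Cassandra with the DataStax Python driver (the "cassandra-python API" noted above); contact points, keyspace, and table/column names are assumptions for illustration.

```python
# Sketch: read and write Cassandra data using the DataStax Python driver.
# Contact points, keyspace, and table/column names are assumed.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra1", "cassandra2"])
session = cluster.connect("platform")  # assumed keyspace

insert = session.prepare(
    "INSERT INTO component_usage (component, day, cpu_pct) VALUES (?, ?, ?)"
)
session.execute(insert, ("hdfs-datanode", "2018-06-01", 41.5))

for row in session.execute("SELECT component, cpu_pct FROM component_usage LIMIT 10"):
    print(row.component, row.cpu_pct)

cluster.shutdown()
```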

Hadoop Developer

Confidential

Skills Used: Hadoop, AWS (EMR, S3, Redshift, RDS, DynamoDB), Hive, Sqoop, Oozie, Denodo, SAP BusinessObjects, Splunk

Responsibilities:

  • Architected the ETL system to reduce complexity by 25%, managing 32 TB of data per day using AWS S3, EMR, and Hive.
  • Built analytical models using a rule-based mechanism on distributed cloud infrastructure.
  • Designed cloud computing infrastructure and architecture.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Provided process improvement and architectural direction, and reduced ETL processing time by 30% using Hive, Java, AWS, and HDFS.
  • Built highly scalable, performance-driven big data applications using Hive, Oozie, and Redshift.
  • Performed functional and technical design and code reviews; managed code delivery and deployment, resource planning, and mentorship.
  • Visualized HDFS data for customers in a BI tool connected through HiveServer2.
  • Created a prototype for large-data-set analytics using Hive and Amazon Web Services to improve business efficiency.
  • Created partitions and buckets based on State for further processing with bucket-based Hive joins (see the DDL sketch after this list).
  • Worked extensively on performance tuning in Redshift, implementing Workload Management.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
  • Increased efficiency by 30%; owned the release cycle and cross-platform migration, and mentored junior developers.
  • Worked in a DevOps environment using Chef and CI/CD tools such as Jenkins and GitHub.
  • Indexed logs in Splunk by connecting HDFS directly to the Splunk server using Splunk Hadoop Connect.
  • Authored Jenkins jobs/pipelines for AWS stack creation and updates in the CI/CD lifecycle.
  • Developed shell scripts to orchestrate execution of other scripts and move data files into and out of S3 and HDFS.
  • Designed highly scalable and resilient environments for enterprise-wide applications.
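A hedged sketch of the partitioned, bucketed Hive layout referenced above, issued here through the PyHive client; the host, database, and table/column names are assumptions for illustration.

```python
# Sketch: create a state-partitioned, bucketed Hive table and run a
# bucket-aware join. Host, database, and table/column names are assumed.
from pyhive import hive

conn = hive.Connection(host="emr-master", port=10000, database="analytics")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS customer_txn (
        customer_id BIGINT,
        amount      DOUBLE,
        txn_date    STRING
    )
    PARTITIONED BY (state STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Enable bucket map joins for the session before joining on the bucketed key.
cur.execute("SET hive.optimize.bucketmapjoin=true")
cur.execute("""
    SELECT t.state, SUM(t.amount) AS total_amount
    FROM customer_txn t
    JOIN customer_dim d ON t.customer_id = d.customer_id
    GROUP BY t.state
""")
print(cur.fetchall())
```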

Technology Lead

Confidential, Phoenix, Arizona

Skills Used: Datameer (big data analytics), Hadoop, MapReduce, Java EE, Eclipse, Maven, SVN, Hive, MySQL

Responsibilities:

  • Created control flows for data ingestion from various data sources using the Hive Java API and the Datameer REST API.
  • Implemented algorithms for defining the best data ingestion flow for a given source.
  • Computed results based on the analytical models built.
  • Worked on data ingestion for partitioned and unpartitioned data from Hive tables.
  • Handled key data patterns at Amex (Amex Parquet files with cloaked and uncloaked data).
  • Scheduled jobs with crontab, created operating instructions, and monitored jobs running on the production cluster.
  • Developed data pipelines using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Created control flows for data ingestion and created data links for data in MySQL tables to be analyzed in Datameer (a big data analytics tool).
  • Built automated processes for scheduling and polling jobs, and created triggers so downstream jobs could poll the export jobs (see the polling sketch after this list).
  • Wrote custom plugins in Java EE to integrate the tool's functionality with the Hadoop cluster.
  • Worked on the JSON objects Datameer creates for workbooks and export jobs, using the Datameer REST APIs to build custom functions.
  • Built custom functions in Datameer for friendlier interaction with end users.
  • Performed unit testing using JUnit.
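The export-job polling described above might look like the following sketch using Python requests; the Datameer host, REST endpoint paths, credentials, and JSON fields here are illustrative assumptions, not the documented Datameer API.

```python
# Sketch of polling an export job's status over REST before triggering a
# downstream job. Host, endpoint paths, credentials, and JSON field names
# are illustrative assumptions rather than the documented Datameer API.
import time
import requests

BASE = "https://datameer.example.com/rest"   # hypothetical host and base path
AUTH = ("svc_user", "secret")                # hypothetical credentials

def wait_for_job(job_id, poll_seconds=60):
    """Poll a job until it reaches a terminal state, then return that state."""
    while True:
        resp = requests.get(f"{BASE}/job-execution/job-status/{job_id}", auth=AUTH)
        resp.raise_for_status()
        status = resp.json().get("jobStatus", "UNKNOWN")
        if status in ("COMPLETED", "ERROR", "ABORTED"):
            return status
        time.sleep(poll_seconds)

if wait_for_job(12345) == "COMPLETED":
    # Kick off the downstream export job (endpoint likewise hypothetical).
    requests.post(f"{BASE}/job-execution/start/67890", auth=AUTH)
```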

Hadoop Developer

Confidential

Skills used: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Maven, SVN, Eclipse

Responsibilities:

  • Built data pipelines from raw data ingestion through to reporting using data warehousing.
  • Designed workflows by scheduling processes for log file data (raw text), which is streamed into HDFS using Flume.
  • Performed batch processing of logs from various data sources using MapReduce (see the streaming sketch after this list).
  • Used various input and output formats such as TextInputFormat (default) and KeyValueTextInputFormat, and binary/serialization formats such as SequenceFileInputFormat, RCFile, Avro, and Parquet.
  • Optimized jobs to improve performance, for example through task profiling.
  • Tuned MapReduce jobs and set environment variables for faster processing.
  • Defined Java UDFs for Pig and Hive to serve client requests.
  • Implemented Pig UDFs such as filter and eval functions, and used loaders such as PigStorage, TextLoader, and HBaseStorage to get data into Pig.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Responsible for the design and creation of Hive tables, and worked on performance optimizations such as partitioning and bucketing in Hive.
  • Handled incremental data loads from RDBMS into HDFS using Sqoop.
  • Worked on streaming logs by implementing tasks on NoSQL databases such as HBase.
  • Worked on bulk loading and distribution of data in HBase.
  • Created HBase tables (via the Java API) to load large sets of structured, semi-structured, and unstructured data from various sources.
  • Served online queries from HBase based on client requests.
  • Unit tested MapReduce code using MRUnit and JUnit.
  • Set up the environment for optimal job performance on the cluster and monitored the Hadoop log files.
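The batch log processing above was implemented as Java MapReduce; purely as an illustration, the sketch below shows the same kind of aggregation (event counts per log level) as a Hadoop Streaming job in Python, assuming the log level is the third whitespace-separated field of each line.

```python
#!/usr/bin/env python
# streaming_logcount.py -- illustrative Hadoop Streaming equivalent of the
# Java MapReduce log aggregation described above. The log layout (level as
# the third whitespace-separated field) is an assumption.
import sys

def mapper():
    # Emit "<level>\t1" for every well-formed log line on stdin.
    for line in sys.stdin:
        parts = line.split()
        if len(parts) >= 3:
            print(f"{parts[2]}\t1")

def reducer():
    # Sum counts per key; input arrives sorted by key from the shuffle.
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{count}")
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

Such a script would be submitted with the standard hadoop-streaming JAR, passing "streaming_logcount.py map" as the mapper command and "streaming_logcount.py reduce" as the reducer command.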

Hadoop/Java Developer

Confidential

Skills Used: MapReduce, Pig, Hive, Flume, Oozie, Java, Eclipse, Maven

Responsibilities:

  • Developed MapReduce programs in Java using the Hadoop Java API to parse raw data, populate staging tables, and store the refined data in partitioned Hive tables.
  • Much of the work involved studying and analyzing data algorithms for the various ingestion processes.
  • Used the Oozie workflow engine (defining XML flows) to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, and Sqoop, as well as system jobs.
  • Built data pipelines from streaming data into HDFS, analyzed the data, and passed it on for reporting.
  • Worked on structured data and ran analytics using Spark SQL integrated with Hive (see the sketch after this list).
  • Shared responsibility for administration and optimization of Hadoop, Hive, and Pig.
  • Installed and configured MapReduce, Hive, and HDFS.
  • Assisted with performance tuning, monitoring, and unit testing of jobs.
  • Analyzed data using HiveQL to generate per-payer reports from payment summaries for transmission to payers.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
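A small sketch of the Spark SQL-on-Hive analytics mentioned above, using PySpark with Hive support enabled; the database, table, and column names are assumptions for illustration.

```python
# Sketch: run Spark SQL against Hive-managed tables and write the result
# back as a Hive table. Database, table, and column names are assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("payer-report")
    .enableHiveSupport()   # lets Spark SQL use the Hive metastore
    .getOrCreate()
)

report = spark.sql("""
    SELECT payer_id, SUM(amount) AS total_paid
    FROM claims.payment_summaries
    GROUP BY payer_id
""")
report.write.mode("overwrite").saveAsTable("claims.payer_totals")

spark.stop()
```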

Assistant Systems Engineer (Java Developer)

Confidential

Skills Used: Java EE (6, 7), Oracle, PL/SQL, JUnit

Responsibilities:

  • Interacted with business users to understand their requirements and provided solutions to match them.
  • Experience in Core Java with in-depth knowledge of object-oriented programming, the Collections Framework, threading concepts, and design patterns and analysis.
  • Deployed applications on WebLogic, Apache Tomcat, and IBM WebSphere application servers.
  • Extensively used relational database systems such as Oracle and MySQL.
  • Solid testing experience in unit, integration, and system testing.
  • Strong exposure to version control tools such as Git.
  • Involved in promoting application wave releases from lower environments.
  • Troubleshooting any issues that arise during the release or deployment process.
