
Senior Big Data/cloud Engineer Resume


Indianapolis, IN

SUMMARY

  • 8+ years of professional IT experience in analyzing requirements, designing and testing highly distributed mission critical applications.
  • Strong knowledge of Hive and Pig core functionality, including custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregating Functions (UDAFs) for Hive.
  • Experienced in moving data in and out of Hadoop to and from RDBMS, NoSQL, and UNIX systems using Sqoop and other traditional data movement technologies.
  • Strong working experience in delivering Data Analytics solutions that drive business decisions, using AWS Cloud, Big Data, and Hadoop ecosystems.
  • Successfully designed and implemented production-grade dynamic programming algorithms to analyze inbound workloads and deploy transient (hybrid cloud) clusters, achieving maximum parallelism and cost effectiveness in AWS Cloud.
  • Successfully designed and implemented cloud migration solutions, adopting serverless architecture, for on-premises data pipelines that perform data-intensive ETL jobs.
  • Effectively performed Spark job tuning and performance optimizations that reduced runtimes on terabyte-scale volumes from several hours to minutes, meeting business-level SLAs (see the tuning sketch after this list).
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Independently designed, developed, and implemented end-to-end automated data pipelines to perform ETL jobs on heavy volume loads in a shared, persistent on-premises Cloudera cluster as well as in AWS Cloud.
  • Expertise in using various technologies in the Hadoop ecosystem, including Apache Spark, HBase, Hive, AWS EMR, Cloudera Altus, AWS Redshift, AWS Athena, AWS Lambda, AWS CloudFormation, and AWS Step Functions.
  • Extensive experience in delivering enterprise-level, multi-platform Big Data solutions involving multiple data sources and data formats (Parquet, Avro, CSV/TSV, and RCFile).
  • Good knowledge of working with a range of source control systems and CI/CD tools: GitHub, Bitbucket, AWS CodeCommit, AWS CodeDeploy, SonarQube, Jenkins, Maven, and GitLab.
  • Can code and deliver high-quality, efficient scripts and application programs in Core Java, Scala, Bash, C, C++, and Python.
  • Experienced in developing MapReduce-style jobs using Scala in the Spark shell.
  • Good knowledge of Spark architecture and real-time streaming using Spark.
  • Good understanding of NoSQL databases and hands-on experience writing applications against NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Good experience in using hosted version control services; extensive knowledge of Git and GitHub.
  • Excellent understanding of Software Development Life Cycle (SDLC) and strong knowledge on various project implementation methodologies including Waterfall and Agile/Scrum.
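
As context for the Spark tuning work mentioned above, the sketch below shows the kind of knobs typically involved. It is a minimal PySpark illustration only; every name and value in it (table names, partition count, partition column) is a placeholder, not an actual production setting.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Illustrative session config; real values depend on cluster size and SLAs.
    spark = (SparkSession.builder
             .appName("etl-tuning-sketch")
             .config("spark.sql.shuffle.partitions", "400")  # match shuffle width to data volume
             .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical tables; broadcast the small dimension side to avoid a shuffle join.
    facts = spark.table("staging.transactions")
    dims = spark.table("staging.merchants")
    joined = facts.join(F.broadcast(dims), "merchant_id")

    # Cache only what is reused, and partition the output so later scans can prune.
    joined.cache()
    joined.write.mode("overwrite").partitionBy("load_date").saveAsTable("curated.transactions")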

TECHNICAL SKILLS

Big Data Technologies: YARN, Pig, Hive, Sqoop, Oozie, HBase, Hadoop, MapReduce, HDFS, Spark, ZooKeeper, Flume

Programming Languages: Scala, Python, Core Java, C, C++, Unix Shell Scripting, Pig Latin

NoSQL Databases: MongoDB, HBase, DynamoDB, Oracle NoSQL Database

Databases: MySQL, Oracle, Microsoft SQL Server, Microsoft Azure SQL, PostgreSQL

Methodologies: Waterfall, Agile, Scrum

Operating Systems: Windows, Linux, MacOS

Cloud Services: Amazon Web Services, Microsoft Azure

Integration Tools: Jenkins and Hudson

Build Tools: Ant, Maven, Gradle

Version Control: SVN, TortoiseGit, GitHub, TFS

IDEs: Eclipse, NetBeans, IntelliJ IDEA, Notepad++, Visual Studio

Network Protocols: TCP/IP, HTTP, HTTPS, UDP, DNS, FTP

PROFESSIONAL EXPERIENCE

Confidential, Indianapolis, IN

Senior Big Data/Cloud Engineer

Responsibilities:

  • Developed a detailed understanding of business and high-level requirements, provided solutions by applying Big Data concepts, and explained them to business and management personnel.
  • Actively participated in Agile/Scrum events, worked with Business Analysts on requirements gathering, and analyzed, designed, developed, and tested Big Data pipelines implemented on both AWS Cloud and on-premises platforms.
  • Independently designed, developed, and implemented an end-to-end data pipeline using technologies such as Spark, Hive, HBase, Oracle, MapReduce, HDFS, and YARN.
  • Independently developed complex post-job data extraction systems that scan and extract data and perform business-related transformations using Spark, Hive, Python, Bash, and SMTP.
  • Performed unit tests and validations by developing scripts and custom test automation systems in HQL, PySpark, and Bash scripting.
  • Scripted effective and highly performant programs in Java, Python, PySpark, Scala, SQL, HiveQL, and Bash.
  • Configured and automated data pipelines with event-based triggers to kick off ETL jobs using Control-M and Oozie.
  • Implemented a comprehensive reporting system that provides the results of complex data validation and cleansing operations performed in Apache Spark.
  • Successfully enhanced the performance of ETL jobs that run in a shared YARN cluster by applying optimization techniques (code-level and architecture-level).
  • Worked on developing a Hadoop/Java application that interfaces HBase and Spark and performs real-time, parallel transactions at scale.
  • Automated jobs and data pipelines using AWS Step Functions, AWS Lambda, and AWS CloudTrail (a minimal trigger sketch follows this list).
  • Developed and deployed a secure authentication system for AWS Lambda to deploy and submit jobs on transient and hybrid cloud clusters in Cloudera Altus.
  • Successfully developed a multi-platform logging framework to normalize application logging across the data pipeline using Py4J-based logging (see the sketch after the Environment line below).
  • Designed, developed, and implemented dynamic programming algorithms to handle huge workload volumes and deploy CDH Altus clusters at scale, achieving maximum parallelism with high cost effectiveness.
  • Successfully managed environment migrations and deployments using Bitbucket, AWS CodeDeploy, AWS CodeCommit, and SonarQube, following CI/CD practices.
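
A minimal sketch of the kind of event-driven trigger referred to above, assuming a Lambda function that kicks off a Step Functions state machine for each inbound batch event; the state machine ARN, region, and payload field names are placeholders, not the actual production configuration.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    def lambda_handler(event, context):
        # Start a Step Functions execution for the inbound batch event.
        # The ARN and payload fields below are illustrative placeholders.
        response = sfn.start_execution(
            stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:etl-pipeline",
            input=json.dumps({
                "batch_id": event.get("batch_id"),
                "s3_prefix": event.get("s3_prefix"),
            }),
        )
        return {"executionArn": response["executionArn"]}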

Environment: Cloudera Distribution 6.x, Apache Spark 2.x, YARN, HDFS, StreamSets, Hive, HBase, Impala, Oracle, MySQL, SQL Server, RHEL, Control-M, ZooKeeper, Oozie, SFTP, Kerberos, AWS S3, AWS Lambda, AWS Step Functions, AWS CloudFormation, AWS SFTP, AWS CodeCommit, AWS Storage Gateway, AWS CloudWatch.
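
One common way to realize the Py4J-based logging mentioned above is to route Python-side messages through the JVM's log4j logger so PySpark driver logs share the same format and appenders as the Spark logs. This is a minimal sketch with a hypothetical logger name, not the framework itself.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("logging-sketch").getOrCreate()

    # Reach the JVM's log4j classes through the Py4J gateway exposed by the SparkContext.
    log4j = spark.sparkContext._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("etl.pipeline")  # hypothetical logger name

    logger.info("Starting validation step")
    logger.warn("Row count below expected threshold")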

Confidential, Libertyville, IL

Senior Hadoop Developer

Responsibilities:

  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded data into Spark RDDs and performed in-memory computation for faster response, and implemented Spark SQL queries on data formats such as text, CSV, and XML files (see the sketch after this list).
  • Involved in loading data from the Linux file system into HDFS.
  • Implemented Spark jobs using Scala, utilizing DataFrames, the Spark SQL API, and pair RDDs for faster processing of data.
  • Involved in designing the row key in HBase to store text data.
  • Integrated Maven builds and designed workflows to automate the build and deploy process.
  • Collected and aggregated large amounts of policy data from different sources such as web servers in the form of XML using Kafka, intercepted the received data using Apache Flume, and stored it in HDFS for analysis.
  • Used Hue for running Hive queries. Created daily partitions in Hive to improve performance.
  • Used Kafka for log aggregation, gathering physical log files off servers and placing them in a central location such as HDFS for processing.
  • Used Scala libraries to process XML data stored in HDFS; the processed data was stored back in HDFS.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge data volumes.
  • Built a Redshift cluster for querying data in S3 using SQL; monitored the cluster and managed nodes in Redshift.
  • Developed UDFs to pre-process the data and compute various metrics for reporting in both Pig and Hive.
  • Responsible for gathering requirements, process workflow, data modeling, architecture, and design, and led application development using Scrum.
  • Created directories in HDFS according to the date using Scala code.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model (a streaming sketch follows the Environment line below).
  • Created Sqoop jobs to handle incremental loads from RDBMS into Hive and applied Spark transformations and actions on the loaded data.
  • Developed Python code to validate input XML files, separating bad data before ingestion into Hive and the data lake.
  • Wrote Spark applications using Scala to interact with the MySQL database via SQLContext and accessed Hive tables using HiveContext.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
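
A minimal sketch of the CSV-to-Hive pattern referenced above; the original work used Scala, so this PySpark equivalent is illustrative only, and the input path, column names, and target table are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-to-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical input path; infer types from the CSV header for brevity.
    policies = (spark.read
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/raw/policies/*.csv"))

    policies.createOrReplaceTempView("policies_raw")

    # Run Spark SQL over the raw view, then write a daily-partitioned Hive table.
    daily = spark.sql("""
        SELECT policy_id, region, premium, load_date
        FROM policies_raw
        WHERE premium IS NOT NULL
    """)
    daily.write.mode("append").partitionBy("load_date").saveAsTable("analytics.policies_daily")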

Environment: Cloudera, HaaS (Hadoop as a Service), Apache Kafka, AWS, HDFS, Hive, Pig, Sqoop, PuTTY, Spark, Spark SQL, Maven, Java, Scala, SQL, Linux, YARN.
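
The Kafka-based near-real-time flow mentioned above could look roughly like the following. This is a PySpark Structured Streaming sketch (the original work may well have used the Scala DStream API); the broker address and topic name are placeholders, and it assumes the spark-sql-kafka connector is on the classpath (e.g. via --packages).

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Subscribe to a placeholder topic on a placeholder broker.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "learner-events")
              .load())

    # Kafka delivers key/value as bytes; cast the value and count events per minute.
    counts = (events
              .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
              .groupBy(F.window("timestamp", "1 minute"))
              .count())

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()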

Confidential, Rochester, MN

Hadoop Developer

Responsibilities:

  • Performed SDLC activities (requirements gathering, analysis, design, development, and testing of the application) using the Agile/Scrum methodology.
  • Developed a detailed understanding of the existing build system and related tools for information on various products, releases, and test results.
  • Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
  • Consumed web services for transferring data between different applications using RESTful APIs.
  • Worked on the Spark-Cassandra Connector to load data to and from Cassandra.
  • Involved in Test Driven Development (TDD) and Acceptance Test Driven Development (ATDD).
  • Managed and deployed Amazon Web Services Elastic MapReduce (AWS EMR) clusters.
  • Built cloud-native applications using Amazon Web Services, specifically Elastic MapReduce (EMR), Lambda, DynamoDB, and Elastic Beanstalk.
  • Managed data schema versions across various microservices.
  • Developed and tested the enterprise application with JUnit.
  • Wrote custom Writable classes for Hadoop serialization and deserialization of time-series tuples.
  • Implemented a custom file loader for Pig to query directly on large data files such as build logs.
  • Used Python for pattern matching in build logs to format errors and warnings (see the sketch after this list).
  • Developed Pig Latin scripts and shell scripts for validating the different query modes in Historian.
  • Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top.
  • Implemented a prototype to integrate PDF documents into a web application using GitHub.
  • Actively participated in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation within Scrum.
  • Developed rich interactive visualizations integrating various reporting components from multiple data sources.
  • Used shell scripting for Jenkins job automation with Talend.
  • Built a custom calculation engine that can be programmed according to user needs.
  • Ingested data into Hadoop using shell scripting and Sqoop, and applied data transformations using Pig and Hive.
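
A minimal sketch of the build-log pattern matching described above, using Python's standard library; the regular expressions, log format, and file path are hypothetical, since real build logs would need project-specific patterns.

    import re
    from collections import Counter

    # Hypothetical patterns; adjust to the actual build-log format.
    ERROR_RE = re.compile(r"^\[?ERROR\]?\s+(.*)", re.IGNORECASE)
    WARN_RE = re.compile(r"^\[?WARN(?:ING)?\]?\s+(.*)", re.IGNORECASE)

    def summarize_log(path):
        """Scan a build log and return formatted error/warning lines plus counts."""
        counts = Counter()
        findings = []
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                for level, pattern in (("ERROR", ERROR_RE), ("WARNING", WARN_RE)):
                    match = pattern.match(line.strip())
                    if match:
                        counts[level] += 1
                        findings.append("%s (line %d): %s" % (level, lineno, match.group(1)))
        return findings, counts

    if __name__ == "__main__":
        findings, counts = summarize_log("build.log")  # placeholder path
        print("\n".join(findings))
        print(dict(counts))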

Environment: Python, Maven, Git, Jenkins, UNIX, MySQL, Eclipse, Oozie, Sqoop, Flume, Oracle, JDK 1.8/1.7, Agile and Scrum development process, NoSQL, JBoss, Flink, JavaScript, Apache Hadoop, Hive, Scala, Pig, HDFS, Akka, Cloudera, Java MapReduce, Cassandra 2.2, and Mockito

Confidential, Monroe, MI

Hadoop Developer

Responsibilities:

  • Developed Hive (version 0.12.0) scripts to analyze data; mobile numbers were categorized into different segments and promotions were offered to customers based on those segments.
  • Worked on sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Wrote MapReduce jobs to generate reports on the number of activities created per day during data dumps from multiple sources; the output was written back to HDFS.
  • Prepared an ETL pipeline with the help of Sqoop, Pig, and Hive to frequently bring in data from the source and make it available for consumption.
  • Processed data into HDFS and analyzed it using MapReduce, Pig, and Hive to produce summary results from Hadoop for downstream systems.
  • Involved in using ORC, Avro, Parquet, RCFile, and JSON file formats; developed UDFs in Hive and Pig and used Sqoop to import files into Hadoop.
  • Used Spark SQL to load JSON data, create SchemaRDDs, and load them into Hive tables, and handled structured data using Spark SQL (see the sketch after this list).
  • Expert knowledge on MongoDB NoSQL data modeling, tuning and backup.
  • Used ZooKeeper to manage coordination among the clusters.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop. Responsible for building scalable distributed data solutions using Hadoop.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Developed ETL scripts based on technical specifications/Data design documents.
  • Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Reported the data to analysts for further tracking of trends across various consumers.
  • Developed custom InputFormat, RecordReader, Mapper, Reducer, and Partitioner classes as part of developing end-to-end Hadoop applications.
  • Followed Agile-Scrum project development methodology for implementation of projects, part of the daily scrum meetings and sprint meetings.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data.
  • Assisted in Cluster maintenance, Cluster Monitoring and Troubleshooting, Manage and review data backups and log files.
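
A minimal sketch of the Spark SQL JSON-to-Hive step referenced above; the original used the Spark 1.x SchemaRDD API, so this DataFrame-based PySpark version is illustrative only, and the input path, column names, and target table are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("json-to-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical input path; each line is expected to be one JSON record.
    raw = spark.read.json("hdfs:///data/raw/events/*.json")
    raw.createOrReplaceTempView("events_raw")

    # Keep only records with a usable key, then persist as a Hive table
    # so downstream Hive/Impala queries can read the structured data.
    structured = spark.sql("""
        SELECT event_id, event_type, event_ts
        FROM events_raw
        WHERE event_id IS NOT NULL
    """)
    structured.write.mode("overwrite").saveAsTable("analytics.events")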

Environment: Spark, Spark SQL, YARN, Java, Maven, Git, Cloudera, MongoDB, HDFS, MapReduce, Pig, Mesos, AWS, Hive, Sqoop, Scala, Flume, Mahout, HBase, Eclipse, and Shell Scripting.

Confidential

Java Developer

Responsibilities:

  • Developed the DAO objects using JDBC.
  • Developed business services using Servlets and Java.
  • Involved in analysis, design and development of Expense Processing systems.
  • Created user interfaces using JSP.
  • Used Spring 2.0 Framework for Dependency injection and integrated with the Struts Framework and Hibernate.
  • Used Hibernate 3.0 in data access layer to access and update information in the database.
  • Experience in SOA (Service Oriented Architecture) by creating the web services with SOAP and WSDL.
  • Developed the web interface using Servlets, JavaServer Pages, HTML, and CSS.
  • Designed and developed user interfaces and menus using HTML5, JSP, and JavaScript, with client-side and server-side validations.
  • Developed the GUI using JSP and the Struts framework.
  • Involved in developing the presentation layer using Spring MVC, AngularJS, and jQuery, and designing the user interfaces using the Struts Tiles Framework.
  • Developed JUnit test cases for all the developed modules.
  • Used RESTFUL Services to interact with the Client by providing the RESTFUL URL mapping.
  • Used CVS for version control across common source code used by developers.
  • Used Ant scripts to build the application and deployed it on Oracle WebLogic Server 10.0.

Environment: SOAP, WSDL, JDBC, JavaScript, HTML, CVS, Log4j, JUnit, Struts 1.2, Hibernate 3.0, Spring 2.5, JSP, Servlets, XML, WebLogic application server, Eclipse, Oracle, RESTful.
