Hadoop Developer Resume
Bentonville, AR
PROFESSIONAL SUMMARY:
- Around 6+ years of experience in IT, including 3 years in Hadoop.
- Proficient in Big Data technologies and the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Oozie, Flume, Sqoop, Impala, and Spark on Big Data platforms.
- Experience working with BI teams and translating Big Data requirements into Hadoop-centric solutions.
- Skilled in extracting source data from sequential files, CSV files, and text files, then transforming and loading it into the target data warehouse.
- Strong exposure to Spark (DataFrames, Datasets, RDDs, and the data source API). Proficient in writing Pig Latin and HiveQL scripts and extending their functionality with User Defined Functions (UDFs).
- Working knowledge of the Java Virtual Machine (JVM) and multi-threaded processing, as well as tuning Hadoop clusters by gathering and analyzing the existing infrastructure.
- Hands-on experience with AWS cloud services (VPC, EC2, S3, Data Pipeline, EMR, DynamoDB, Lambda, Kinesis, SNS, SQS) and Snowflake.
- Implemented custom Airflow operators such as SSHOperator, BashOperator, SlackOperator, and operators for other Hadoop-related components (Spark, Hive, Pig, Sqoop, and Athena), as well as an EMR operator to spin up clusters on demand (see the sketch after this list).
- Experienced in writing shell scripts and in using IDEs such as Eclipse and IntelliJ IDEA for development activities.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems/DB2, and using Apache Flume to collect, aggregate, and move large volumes of data from application servers.
- Hands-on experience developing Oozie workflows for coordinating the cluster and scheduling workflows; strong in Scala functional and object-oriented programming.
- Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
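As a minimal illustration of the on-demand EMR pattern mentioned above, the sketch below uses Airflow's Amazon provider operators to create a cluster, run a Spark step, and terminate the cluster. The DAG id, job-flow overrides, S3 path, and step definition are hypothetical placeholders, and the import paths assume a recent Amazon provider package.

```python
# Illustrative sketch only: spin up an EMR cluster on demand, run a Spark step,
# wait for it to finish, then terminate the cluster. All names/paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import (
    EmrAddStepsOperator,
    EmrCreateJobFlowOperator,
    EmrTerminateJobFlowOperator,
)
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "daily-sales-aggregation",          # hypothetical step name
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://my-bucket/jobs/sales_agg.py"],  # placeholder path
    },
}]

with DAG(
    dag_id="emr_on_demand_example",             # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    create_cluster = EmrCreateJobFlowOperator(
        task_id="create_emr_cluster",
        job_flow_overrides={"Name": "on-demand-cluster"},  # placeholder overrides
    )
    add_step = EmrAddStepsOperator(
        task_id="add_spark_step",
        job_flow_id=create_cluster.output,      # job flow id pushed via XCom
        steps=SPARK_STEP,
    )
    wait_for_step = EmrStepSensor(
        task_id="wait_for_spark_step",
        job_flow_id=create_cluster.output,
        step_id="{{ task_instance.xcom_pull(task_ids='add_spark_step')[0] }}",
    )
    terminate_cluster = EmrTerminateJobFlowOperator(
        task_id="terminate_emr_cluster",
        job_flow_id=create_cluster.output,
        trigger_rule="all_done",                # always tear down, even on failure
    )
    create_cluster >> add_step >> wait_for_step >> terminate_cluster
```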
TECHNICAL SKILLS:
Big Data Technologies: MapReduce, HDFS, Hive, Pig, Impala, Hue, Sqoop, Kafka, Oozie, YARN, Spark, Spark SQL (DataFrames and Datasets), Spark Streaming.
Cloud Infrastructure: AWS CloudFormation, S3, EC2-Classic and EC2-VPC.
Scripting/Programming Languages: SQL, C, C++, Java (Core Java), Python, Scala, Shell scripting.
Databases/Web Technologies: Oracle, MySQL, JavaScript, CSS, XML, HTML and JSP.
Operating Systems/Tools: Windows, UNIX/Linux and Mac OS; Eclipse, Toad, WinSCP, Apache Maven and Simple Build Tool (SBT).
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential - Bentonville, AR
Responsibilities:
- Developed complex transformations in Spark and Hive to calculate product sales and returns for forecasting customer demand.
- Sourced data using the Spark RDD and DataFrame APIs and optimized the code to reduce the number of shuffles, improving performance and lowering latency.
- Developed a general-purpose Spark utility for moving data both into and out of HDFS from RDBMS systems.
- Developed a Spark pipeline to streamline the existing processing engine.
- Worked with the Spark SQL context to create DataFrames and filter input data for model execution.
- Extensively worked on Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in the design and development of a common architecture for retail data across geographies.
- Designed and developed Spark scripts for parsing JSON files and storing them in Parquet format on EMR (see the sketch after this list).
- Improved the performance of Spark jobs by tuning job-level and cluster-level settings.
- Implemented Big Data analytical solutions using HiveQL, Impala, and Spark.
- Designed a merge and flatten view of multiple layouts across different data sources.
- Automated and scheduled jobs through Oozie.
- Migrated the existing on-premises Hadoop environment to an AWS EMR cluster.
- Currently migrating AWS EMR jobs to Airflow for automation and scheduling.
- Implemented a generic template to spin up different AWS instances, including EC2-Classic and EC2-VPC.
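A minimal PySpark sketch of the JSON-to-Parquet flow described above (the production jobs may equally have been written in Scala); the paths, column names, and tuning values are assumed placeholders rather than actual settings.

```python
# Illustrative PySpark sketch: read raw JSON, derive a partition column, and
# write partitioned Parquet for downstream Hive/Impala queries. Placeholders only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("json_to_parquet_example")             # hypothetical app name
    .config("spark.sql.shuffle.partitions", "200")  # example shuffle tuning value
    .getOrCreate()
)

# Read raw JSON landed in S3 (placeholder path) into a DataFrame.
raw_df = spark.read.json("s3://my-bucket/raw/sales/")

# Example transformation: derive a partition column and drop bad records.
clean_df = (
    raw_df
    .withColumn("sale_date", F.to_date("sale_timestamp"))  # assumed source column
    .filter(F.col("store_id").isNotNull())                 # assumed source column
)

# Repartition by the partition column to avoid many small output files,
# then write partitioned Parquet back to S3.
(
    clean_df
    .repartition("sale_date")
    .write
    .mode("overwrite")
    .partitionBy("sale_date")
    .parquet("s3://my-bucket/curated/sales_parquet/")
)
```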
Hadoop Developer
Confidential - CA
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Good knowledge of AWS Data Pipeline and EMR (Hive/Spark).
- Automated and scheduled jobs through Airflow.
- Developed a framework to execute Hadoop jobs on EMR on demand and terminate the cluster afterward.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Involved in joining and data aggregation using Apache Crunch.
- Worked on Oozie workflow engine for job scheduling.
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
- Developed Pig scripts and UDFs per the business logic.
- Designed and developed a read-lock capability in HDFS.
- Developed Hive scripts implementing control-table logic in HDFS.
- End-to-end implementation with Avro and Snappy.
- Developed various processes to source data from SAS Grid and RDBMS systems.
- Created sample views, validated metadata, and analyzed sample queries on Impala on high-memory machines as the number of users increased.
- Involved in creating AutoSys JIL scripts for scheduling production jobs.
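A minimal sketch of the partitioned Hive table pattern described above, expressed through Spark SQL with Hive support; the database, table, columns, HDFS location, and partition value are assumed placeholders.

```python
# Illustrative sketch: define an external, partitioned Hive table over log files
# landed in HDFS, register a partition, and compute a simple reporting metric.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_partitioned_logs_example")  # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS logs_db")  # placeholder database

# External table over tab-delimited log files already in HDFS (placeholder location).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS logs_db.app_logs (
        host    STRING,
        level   STRING,
        message STRING
    )
    PARTITIONED BY (log_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
    LOCATION 'hdfs:///data/app_logs'
""")

# Register a newly landed partition (assumes a log_date=YYYY-MM-DD directory layout).
spark.sql("ALTER TABLE logs_db.app_logs ADD IF NOT EXISTS PARTITION (log_date='2024-01-01')")

# Example reporting query against the partitioned data.
error_counts = spark.sql("""
    SELECT host, COUNT(*) AS error_count
    FROM logs_db.app_logs
    WHERE log_date = '2024-01-01' AND level = 'ERROR'
    GROUP BY host
""")
error_counts.show()
```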
Business Analyst
Confidential
Responsibilities:
- Developed an agent-facing application with auto-answer queries and a user-friendly interface by capturing business requirements from stakeholders.
- Interpreted business requirements, identified gaps in different processes, and implemented process improvement initiatives across the business improvement model. Resolved the key issue of handling numerous customers at a time by gathering requirements from stakeholders, helping developers understand the specifications, and assisting them in laying out a plan.
- Worked with a group of 5 developers and delivered a fix for the real-time issue in less time than the committed duration.
- Conducted a gap analysis of the existing applications and suggested possible solutions for their limitations.
- Implemented several of the proposed solutions with help from stakeholders and developers.
Quality Analyst
Confidential
Responsibilities:
- Discussed the business requirements and worked with the development team to understand the functional and non-functional requirements of the internal application used by Amazon CS; formulated the test plan during the initial phase of the project, prepared test scripts, created the release note in Application Lifecycle Management (ALM), set up and linked the Test Lab with the release cycle, executed the test cases, and recorded the results in ALM.
- Prepared meeting notes on ongoing execution and project highlights and sent reports to management.
- Interacted with developers and business analysts to perform various types of testing throughout the application testing life cycle.
- Mainly tested the application using black-box testing techniques, reporting errors and bugs found using VSTS (Visual Studio Team System).
Quality Analyst
Confidential
Responsibilities:
- Developed and maintained Quality Assurance procedure documentation.
- Implemented test scripts and recorded results.
- Created use cases for the algorithms for which 100% automated testing is not possible.
- Created test cases to ensure that the algorithm designed to track utility poles is error-free and works in conformance with the specified functional requirements.
- Ensured the test suites and cases were thoroughly consistent with the expected results.
- Documented the errors found after test-case execution, handed the documentation over to the development team, and worked closely with them to resolve each error or at least find its root cause.
- Once errors were fixed, re-executed the failing test cases to verify that they passed.