Lead Big Data Developer Resume
New York
SUMMARY
- Over 8 years of IT experience in analysis, design, development, and maintenance of Big Data technologies, with over 4 years in the Hadoop and Spark frameworks.
- Strong knowledge of and experience with Hadoop architecture and its components, such as HDFS, YARN, Pig, Hive, Sqoop, Oozie, Flume, Spark, and the MapReduce programming paradigm.
- Hands-on experience in importing/exporting data using the Hadoop data-management tool Sqoop.
- Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin, and in developing custom UDFs for Pig and Hive.
- Experience in ingesting data from various application servers/sources into HDFS, Hive, and HBase.
- Performed a POC on Spark streaming with Kafka to achieve real-time data analytics (a minimal sketch follows this summary).
- Experience in designing both time-driven and data-driven workflows with Oozie.
- Extensive knowledge in working with J2EE technologies such as Servlets, Struts, JSP, JDBC, EJB, JNDI, and JMS.
- Strong experience with SQL (DDL & DML), implementing and developing stored procedures, triggers, nested queries, joins, cursors, views, user-defined functions, indexes, and relational database models.
- Proficient in designing Insight schemas and developing ETL jobs and transformations in Pentaho Kettle.
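A minimal sketch of the kind of Spark-streaming-with-Kafka POC referenced above, written with the Structured Streaming API; the broker address, topic name, and output paths are illustrative placeholders, and the original POC may well have used the older DStream API.

```python
# Hypothetical sketch only: broker, topic, and paths are placeholders.
# Requires the spark-sql-kafka package on the Spark classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-streaming-poc").getOrCreate()

# Read a stream of events from a Kafka topic as raw key/value bytes.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events-topic")
          .load())

# Cast the payload to string and write micro-batches out for downstream analytics.
query = (events.select(col("value").cast("string").alias("payload"))
         .writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streaming/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())

query.awaitTermination()
```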
TECHNICAL SKILLS
Hadoop Ecosystem: MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Spark SQL, Oozie, Scala, Shell Scripting, Java, J2EE, AWS S3, AWS SNS, AWS KMS, AWS Glue, EMR, DynamoDB, AWS Lambda, Python
J2EE Technologies: Servlets, Struts, JSP, EJB, JDBC, JNDI
Frameworks: MVC, Jakarta Struts, Spring, Hibernate
NoSQL Database: HBase
Database: MySQL 4.x/5.x, Teradata
Messaging: JMS
SCM/Version Control Tools: CVS, Subversion
Build and Continuous Integration: Maven, Jenkins
ETL Tool: Pentaho Kettle
IDEs: Eclipse, RAD
PROFESSIONAL EXPERIENCE
Lead Big Data Developer
Confidential, New York
Responsibilities:
- Involved in the design and development of a PySpark data loader to automate data ingestion from on-premises databases to AWS S3 (a minimal sketch follows this list).
- Used AWS KMS to encrypt/decrypt on-premises Oracle database passwords.
- Involved in the development of a cloud pipeline framework to operationalize datasets that feed Tableau.
- Involved in the development of a recurring-dataset framework to automate the conversion of PAL or fixed-width files to delimited files.
- Performed data validation of datasets imported from on-premises or cloud sources into AWS S3 by creating external Hive tables.
- Involved in the EMR upgrade process.
- Performed a POC on AWS Glue to automate data ingestion from Oracle/SQL Server and run analytics on the ingested data.
- Evaluated Spark JDBC vs. Sqoop for data ingestion from on-premises/cloud databases to AWS S3.
- Worked with the DynamoDB NoSQL database to store recipe details, metadata, and log information.
- Created shell scripts to invoke PySpark jobs.
- Created AWS Lambda functions to create, terminate, and monitor AWS EMR clusters (also sketched after this list).
- Used AWS Step Functions to invoke the Lambda functions and monitor EMR cluster status.
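A minimal sketch of the PySpark loader flow described above (KMS-decrypted credentials, Spark JDBC read, Parquet write to S3); the JDBC URL, table, bucket, and ciphertext source are illustrative placeholders rather than actual project values.

```python
# Hypothetical sketch: decrypt a stored Oracle password with AWS KMS, pull a
# table over Spark JDBC, and land it on S3 as Parquet. All names are placeholders.
import base64
import os

import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("onprem-to-s3-loader").getOrCreate()

# Decrypt the base64-encoded ciphertext produced by an earlier KMS encrypt call
# (here read from an environment variable purely for illustration).
kms = boto3.client("kms")
encrypted_password_b64 = os.environ["ORACLE_PASSWORD_CIPHERTEXT"]
password = kms.decrypt(
    CiphertextBlob=base64.b64decode(encrypted_password_b64)
)["Plaintext"].decode("utf-8")

# Read the source table through Spark's JDBC data source
# (requires the Oracle JDBC driver on the Spark classpath).
source_df = (spark.read.format("jdbc")
             .option("url", "jdbc:oracle:thin:@//onprem-host:1521/ORCLPDB")
             .option("dbtable", "SALES.ORDERS")
             .option("user", "etl_user")
             .option("password", password)
             .option("fetchsize", 10000)
             .load())

# Write to S3 as Parquet for downstream Hive/Glue tables.
source_df.write.mode("overwrite").parquet(
    "s3://example-datalake-bucket/raw/sales/orders/")
```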
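A minimal sketch of the Lambda handlers for creating, monitoring, and terminating EMR clusters via boto3, as driven by Step Functions; the release label, instance types, IAM roles, and log bucket are illustrative placeholders.

```python
# Hypothetical sketch of Lambda handlers orchestrated by Step Functions;
# cluster settings below are placeholders, not actual project values.
import boto3

emr = boto3.client("emr")

def create_cluster(event, context):
    """Launch a transient EMR cluster and return its cluster id."""
    response = emr.run_job_flow(
        Name="pyspark-loader-cluster",
        ReleaseLabel="emr-5.29.0",
        Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
        Instances={
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        LogUri="s3://example-datalake-bucket/emr-logs/",
    )
    return {"ClusterId": response["JobFlowId"]}

def monitor_cluster(event, context):
    """Return the cluster state so a Step Functions choice state can branch on it."""
    cluster_id = event["ClusterId"]
    state = emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
    return {"ClusterId": cluster_id, "State": state}

def terminate_cluster(event, context):
    """Terminate the cluster once the pipeline run completes or fails."""
    emr.terminate_job_flows(JobFlowIds=[event["ClusterId"]])
    return {"ClusterId": event["ClusterId"], "State": "TERMINATING"}
```

A Step Functions state machine can chain these handlers with a Wait/Choice loop on the returned State value.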
Environment: Spark, Scala, Java, Spark SQL, AWS S3, AWS SNS, AWS KMS, AWS Glue, EMR, DynamoDB, AWS Lambda, Python, Eclipse, Bitbucket, Confluence, Hive 1.1, Sqoop 1.4.6, Parquet, JSON, Tidal Scheduler, Shell Scripting, Oracle, SQL Server
Hadoop Developer
Confidential
Responsibilities:
- Supported and monitored MapReduce programs running on the cluster.
- Worked on a data-quality framework to generate reports for the business on the quality of data processed in Hadoop.
- Created MapReduce programs to process unstructured data from upstream applications.
- Wrote Hive UDFs to extract data from staging tables.
- Created Hive tables and views to load and transform the data.
- Involved in writing Pig scripts.
- Set up a UAT Hadoop environment so QA testers could perform user acceptance testing for monthly releases.
- Worked with Avro and Parquet data formats and created Impala views on Parquet tables.
- Used the Oozie scheduler to create workflows and schedule jobs on the Hadoop cluster.
- Created Spark RDDs for all the data files and applied transformations to them.
- Aggregated and curated the filtered RDDs based on the business rules, converted them into DataFrames, and saved them as temporary Hive tables for intermediate processing.
- Applied further transformations and actions to the RDDs and DataFrames and stored the results in HDFS as Parquet files (a minimal sketch follows this list).
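A minimal sketch of the RDD-to-DataFrame flow described above, written against the current SparkSession API; on a CDH 5.5 cluster this would have used HiveContext and registerTempTable, and the file path, record layout, and business-rule filter are illustrative placeholders.

```python
# Hypothetical sketch: parse raw files into an RDD, curate per a business rule,
# register a temporary view, and persist the aggregate as Parquet on HDFS.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("rdd-curation").enableHiveSupport().getOrCreate()
sc = spark.sparkContext

# Create an RDD from the raw data files and split each delimited record.
raw_rdd = sc.textFile("hdfs:///data/inbound/transactions/*.txt")
parsed_rdd = raw_rdd.map(lambda line: line.split("|"))

# Apply a placeholder business-rule filter, then convert curated records to a DataFrame.
curated_rdd = parsed_rdd.filter(lambda fields: fields[2] == "ACTIVE")
curated_df = curated_rdd.map(
    lambda fields: Row(account_id=fields[0], amount=float(fields[1]), status=fields[2])
).toDF()

# Register a temporary view for intermediate SQL processing, then write Parquet to HDFS.
curated_df.createOrReplaceTempView("curated_transactions")
aggregated_df = spark.sql(
    "SELECT account_id, SUM(amount) AS total_amount "
    "FROM curated_transactions GROUP BY account_id"
)
aggregated_df.write.mode("overwrite").parquet("hdfs:///data/curated/transactions/")
```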
Environment: Java, Eclipse, Hadoop, Pig 0.12, Hive 1.1, MapReduce, HDFS, MySQL, Sqoop 1.4.6, CDH 5.5.2, Oozie, Avro, Parquet, Toad, Shell Scripting, Teradata, Impala, J2EE, Spark, Scala, Spark SQL
Hadoop Developer
Confidential
Responsibilities:
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed Pig Latin scripts to extract data from the FTP server and load it into HDFS.
- Created partitioned external Hive tables used as staging tables for data loaded from the inbound feeds (a minimal sketch follows this list).
- Designed the data model for storing the data.
- Designed and developed Hive scripts to load the data into the Hive database.
- Designed the pipeline and scheduled it using the Oozie scheduler.
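A minimal sketch of the staging pattern described above: a partitioned external Hive table over the inbound HDFS feed, loaded into a managed table per feed date. The actual loads were Hive scripts scheduled through Oozie; this Python wrapper around the hive CLI, and all table, column, and path names, are illustrative only.

```python
# Hypothetical sketch: register an external staging partition and load it into
# the warehouse table. Database, table, column, and path names are placeholders.
import subprocess

FEED_DATE = "2015-06-01"  # would normally come from the Oozie coordinator

hiveql = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS staging.inbound_feed (
    customer_id STRING,
    txn_amount  DOUBLE,
    txn_type    STRING
)
PARTITIONED BY (feed_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/inbound/feed';

ALTER TABLE staging.inbound_feed ADD IF NOT EXISTS
    PARTITION (feed_date='{FEED_DATE}')
    LOCATION '/data/inbound/feed/{FEED_DATE}';

INSERT OVERWRITE TABLE warehouse.transactions PARTITION (feed_date='{FEED_DATE}')
SELECT customer_id, txn_amount, txn_type
FROM staging.inbound_feed
WHERE feed_date = '{FEED_DATE}';
"""

# Run the generated HiveQL through the hive CLI.
subprocess.run(["hive", "-e", hiveql], check=True)
```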
Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, HiveQL, Sqoop, Java (jdk 1.6), Eclipse, Maven.
Hadoop Developer
Confidential
Responsibilities:
- Performed top-level analysis for dynamic report generation.
- Analyzed and translated business requirements into technical requirements for change requests.
- Developed a user-interface POC based on MVC frameworks such as JSF and Spring MVC.
- Generated information links and converted them into graphical reports.
- Developed the user interface using JSP, Ajax, JavaScript, and Ext JS for rich UI development.
- Extensively used Spring JDBC to develop the DAO layer, which performs all DDL and DML operations for the services.
- Wrote controller and utility classes for both the service and DAO layers.
- Wrote SQL procedures and queries to fetch data.
- Involved in Bug Fixing.
- Prepared test case document and performed unit testing and system testing.
- Developed exception-handling code.
- Implemented logging using Log4j.
- Generated Reports using Fusion Charts for rich UI.
- Involved in writing reusable code for generating data in excel sheets.
Environment: Java, JSP, Servlets, Spring, Hibernate, EJB, JMS, PL/SQL, ExtJs, Ajax, log4j, Fusion Charts, Jboss, MySQL 4.1, Maven, ANT, SubVersion and Bugzilla.
Hadoop Developer
Confidential
Responsibilities:
- Extensively involved in designing the Insight schema and developing ETL.
- Involved in writing Jobs and Transformations using Pentaho Kettle.
- Set up schedulers for the ETL jobs.
- Involved in performance tuning of queries for faster execution of reports.
- Wrote complex procedures and user-defined functions in PL/SQL to generate reports for end users.
- Wrote SQL queries to retrieve data for report generation.
- Understood and interpreted the customer's business and functional requirements for each change request.
- Involved in Bug Fixing and unit testing of reports.
Environment & Technologies: Pentaho Kettle, MySQL 4.1, Toad for MySQL, Subversion, Bugzilla.