Senior Hadoop Developer/Lead Resume
Chicago, IL
PROFESSIONAL SUMMARY:
- Senior Software Engineer with 8+ years of IT experience, including 4+ years developing applications that perform large-scale distributed data processing using Big Data ecosystem tools such as Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Kafka, Mahout, Oozie, ZooKeeper, Flume, and YARN.
- Well-versed in Big Data implementations across business domains such as Banking, Healthcare, Insurance, Entertainment, and Travel.
- Hands-on experience using various Hadoop distributions (Cloudera, Hortonworks).
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Hands-on experience performing ad-hoc queries on structured data using HiveQL, applying partitioning, bucketing, and join techniques in Hive for faster data access.
- Real-time exposure to Amazon Web Services, including the AWS Command Line Interface, AWS Data Pipeline, EMR, and S3.
- Architect, design, construct, test, tune, and deploy Talend ETL infrastructure built on Hadoop ecosystem technologies.
- Knowledge and experience spanning the life cycle from ELT into the Data Lake to ETL for the data servicing layer.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- End-to-end handling of customer transaction data, from ingestion into HDFS to making it available for analytics and downstream processes.
- Developed a framework to process data loads from internal systems and external systems such as Salesforce and SAP HANA Studio, covering file formats including XML, JSON, CSV, Avro, Parquet, ORC, Sequence, and Text received from the application end on the incoming layer; designed a sustainable data-ingestion solution that maintains data lineage for long-term viability.
- Solid experience in consolidating, integrating, and migrating enterprise data to a cloud-based Data Lake.
- Experience in SOA (SOAP), microservices (APIs), and serverless Lambda architecture environments.
- Experience developing SOAP and RESTful web services and microservices using CXF, Axis, Spring Boot, JAX-RS, RESTEasy, JAXB, XML, LDAP, and WSDL.
- Experience in building enterprise Applications and Distributed Systems using technologies such as Core Java, J2EE (Spring, Hibernate, Struts, Servlets, JSP, JSF, JDBC, JMS) and XML.
- Good understanding of Data Streaming, Kafka and Machine Learning techniques.
- Experience with cloud technologies: AWS, Spring Cloud, and Salesforce.
- Strengths include being a strong team player with excellent communication, interpersonal, and analytical skills, flexibility in working with new technologies, and the ability to work effectively in a fast-paced, high-volume, deadline-driven environment.
TECHNICAL SKILLS:
Languages: Java, Scala
Big Data: Hadoop, Hive, Spark, HBase, Cassandra, Pig, Avro, Kafka, Atlas, Kerberos, Phoenix, Datameer
Web Technologies: Spring MVC, Hibernate, JSP, JSTL, JSF, Servlets, D3, HTML, JavaScript, Ajax, jQuery, CSS, AngularJS, Node.js
Web Services: Microservices, RESTful, SOAP
Cloud Technologies: AWS - EC2, EMR, S3, Redshift, Salesforce
Scripting: Python, Perl, Shell and Batch, AutoIt, Ant, Maven
IDE: Eclipse, STS, IntelliJ IDEA
Source Control: GitHub, Subversion, CVS, VSS
Database: HBase, Teradata, SQL Server, Oracle, MS Access, MongoDB, SQLite
Application Servers: Tomcat 5/6/7, WebLogic, JBoss 4.1, IIS 7
PROFESSIONAL EXPERIENCE:
Confidential - Chicago, IL
Senior Hadoop developer/Lead
Responsibilities:
- Promoted a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor, drawing on an understanding of SQL, Talend ETL, and data warehousing technologies.
- Wrote generic Sqoop queries and bash scripts to provision the external Hive tables that are part of the Data Lake.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs in Spark 2.2 for data aggregation, queries, and writing data back into the OLTP system through Sqoop (see the sketch after this list).
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Experienced in using Spark SQL and Hive Query Language for data analytics; experienced in job workflow scheduling and monitoring with Zena.
- Moved data between DB2, Oracle, MySQL, Teradata, and HDFS using Sqoop.
- Performed ad-hoc queries on structured data using HiveQL, applying partitioning, bucketing, and join techniques in Hive for faster data access.
- Designed and developed jobs to validate data post-migration, such as reconciling reporting fields between the source and destination systems, using Spark SQL DataFrames.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Good experience with Talend Open Studio for designing ETL jobs for data processing.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Used GitHub for source code maintenance and for version control.
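A minimal sketch of the kind of Spark 2.x aggregation job described above, shown with the Java Dataset API for illustration; the table name (datalake.transactions), the column names, and the currency-normalizing UDF are hypothetical, and the final Sqoop export to the OLTP system is assumed to run as a separate step.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

import static org.apache.spark.sql.functions.*;

public class TransactionAggregation {
    public static void main(String[] args) {
        // Hive-enabled session so the external Data Lake tables are visible.
        SparkSession spark = SparkSession.builder()
                .appName("TransactionAggregation")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical UDF that normalizes free-form currency codes.
        spark.udf().register("normalizeCurrency",
                (UDF1<String, String>) code -> code == null ? "USD" : code.trim().toUpperCase(),
                DataTypes.StringType);

        // Hypothetical external Hive table populated earlier via Sqoop.
        Dataset<Row> txns = spark.table("datalake.transactions");

        Dataset<Row> daily = txns
                .withColumn("currency", callUDF("normalizeCurrency", col("currency")))
                .groupBy(col("account_id"), col("txn_date"), col("currency"))
                .agg(sum("amount").alias("total_amount"), count(lit(1)).alias("txn_count"));

        // Stage the output as Parquet; a separate Sqoop export job would push it to the OLTP system.
        daily.write().mode("overwrite").parquet("/datalake/curated/daily_txn_summary");

        spark.stop();
    }
}
```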
Environment: Hadoop (HDFS, Hive, Pig, HBase), Salesforce, Scala/Java, Bulk API, XML, CSV, JSON, Oracle 11g, DB2, Teradata, SQL Server, Jenkins, Maven, Agile/Scrum, Windows & Unix.
Confidential - Detroit, MI
Data engineer & Integration /Lead
Responsibilities:
- Wrote generic Sqoop queries and bash scripts to provision the external Hive tables that are part of the Data Lake.
- Ingested raw data from disparate business systems into HDFS, covering core banking, wealth management, risk, trade & position, customer account, transaction, wire, payment, event, and mortgage data.
- Created Spark jobs to apply transformations and generate model output data, which the model then uses to detect fraud.
- Used Spark jobs to analyze financial data, extracting data sets of meaningful information such as account info, risk, dealer, fraud detail, and SARs (see the sketch after this list).
- Created a Spark microservice to create cases/alerts in the Salesforce system from the Data Lake.
- Promoted a full-cycle approach including request analysis, creating/pulling datasets, report creation and implementation, and providing final analysis to the requestor, drawing on an understanding of SQL, ETL, and data warehousing technologies.
- Expertise in Spark ecosystem components such as RDD transformations, the Data Source API, the DataFrame API, Spark SQL, and the DataFrame DSL with Scala, along with Zeppelin and Tableau visualization tools for distributed computing and analytics.
- Experienced in using Hive Query Language for data analytics; experienced with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Moved data between DB2, Oracle, SAP HANA Studio, Salesforce, and HDFS using Sqoop.
- Developed MapReduce programs as required for joins, parsing, and loading information into HDFS.
- Performed ad-hoc queries on structured data using HiveQL, applying partitioning, bucketing, and join techniques in Hive for faster data access.
- Designed and developed jobs to validate data post-migration, such as reconciling reporting fields between the source and destination systems, using Spark SQL DataFrames.
- Loaded very large data sets, a common task for implementations, integrations, and migrations, using the Bulk API, SOAP, and SOA.
- Used SVN for source code maintenance and for version control.
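A minimal sketch of the kind of Spark SQL extraction described above, using the Java API for illustration; the Data Lake tables (datalake.accounts, datalake.wire_transfers), the column names, and the simple amount threshold are hypothetical stand-ins for the actual schemas and detection rules.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SuspiciousActivityExtract {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SuspiciousActivityExtract")
                .enableHiveSupport()
                .getOrCreate();

        // Hypothetical Data Lake tables populated by the ingestion layer.
        spark.table("datalake.accounts").createOrReplaceTempView("accounts");
        spark.table("datalake.wire_transfers").createOrReplaceTempView("wire_transfers");

        // Simple rule-style extract: large wire transfers joined back to account info.
        Dataset<Row> flagged = spark.sql(
                "SELECT a.account_id, a.customer_name, w.wire_id, w.amount, w.txn_date " +
                "FROM wire_transfers w " +
                "JOIN accounts a ON w.account_id = a.account_id " +
                "WHERE w.amount > 10000");

        // Downstream, a microservice could turn each row into a Salesforce case/alert.
        flagged.write().mode("overwrite").json("/datalake/alerts/large_wire_transfers");

        spark.stop();
    }
}
```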
Environment: Hadoop (SparkContext, Spark SQL, DataFrames, pair RDDs, Spark on YARN), Salesforce, Scala/Java, Bulk API, SOAP, WSDL, JAXB, XML, CSV, JSON, Oracle 11g, Git, SQL Server, Jenkins, JUnit, Log4j, SVN, Maven, Agile/Scrum, Windows & Unix.
Confidential, Detroit, MI
Hadoop Developer
Responsibilities:
- Involved in the project from analysis to production implementation, with emphasis on identifying sources and validating source data, developing logic and transformations per the requirements, and creating mappings and loading the data into target tables.
- Involved in loading data from RDBMS into HDFS using Sqoop queries.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in creating Pig tables, loading them with data, and writing Pig Latin queries that run internally as MapReduce jobs.
- Documented system processes and procedures for future reference.
- Provided a batch-processing solution for large volumes of unstructured data using the Hadoop MapReduce framework.
- Wrote Unix/Linux shell scripts for scheduling jobs and for running Pig scripts and HiveQL.
- Developed scripts that automated data management end to end and kept all clusters in sync.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Assisted in performing unit testing of MapReduce jobs using MRUnit.
- Used the Zena scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
- Used ZooKeeper to provide coordination services to the cluster.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
- Worked with the Hue GUI for easy job scheduling, file browsing, job browsing, and Metastore management.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
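A minimal sketch of a map-only data-cleaning step of the kind described above; the pipe-delimited layout and expected field count are hypothetical, and the driver/job configuration is omitted.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Map-only cleaning step: keeps well-formed, pipe-delimited records and
 * drops the rest, counting how many were rejected.
 */
public class RecordCleanerMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5; // hypothetical record layout

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        String[] fields = line.split("\\|", -1);

        // Reject empty lines and records with the wrong number of fields.
        if (line.isEmpty() || fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "rejected_records").increment(1);
            return;
        }
        context.write(NullWritable.get(), new Text(line));
    }
}
```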
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, YARN, Java, HQL, Sqoop, Zena, Oozie.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed the requirements, designed class and sequence diagrams using UML, and prepared high-level technical documents.
- Developed the ‘hammurabi’ rule engine to compare source and destination XML and their parent-child relationships; rules are defined at the configuration level and designed in the most generic way.
- Developed components and code based on a test-driven development (TDD) approach.
- Implemented the persistence layer using Hibernate and configured Hibernate with Spring to interact with the database from the DAO layer (see the sketch after this list).
- Used the Java Persistence API (JPA) and Hibernate for performing database transactions.
- Processed JSON data from RESTful web services, using Ajax to fetch resources from the database and populate the data on the client side.
- Created the domains, application servers, and load balancers using Tomcat.
- Wrote JUnit test cases and performed integration testing of the system.
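A minimal sketch of the Spring-managed Hibernate DAO pattern described above; the Customer entity and its HQL query are hypothetical, and the entity class plus the SessionFactory/transaction-manager configuration are assumed to be defined elsewhere.

```java
import java.util.List;

import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

/**
 * DAO for a hypothetical Customer entity; the SessionFactory is wired in
 * through Spring's Hibernate integration.
 */
@Repository
@Transactional
public class CustomerDao {

    @Autowired
    private SessionFactory sessionFactory;

    public void save(Customer customer) {
        sessionFactory.getCurrentSession().saveOrUpdate(customer);
    }

    public Customer findById(Long id) {
        return (Customer) sessionFactory.getCurrentSession().get(Customer.class, id);
    }

    @SuppressWarnings("unchecked")
    public List<Customer> findAll() {
        return sessionFactory.getCurrentSession()
                .createQuery("from Customer")
                .list();
    }
}
```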
Environment: Java 6, J2EE, Struts 2.0, Eclipse, JSF 2.1, JPA, Hibernate 3.0, Apache CXF, Jenkins, JAX-WS, XML, XSLT, JSP, JavaScript, jQuery, HTML, CSS, JUnit, Maven.
Confidential
Java Developer
Responsibilities:
- Involved in the design and implementation of statistics report generation using jQuery and JSON data injection with different visualization options.
- Used Hibernate, an Object-Relational Mapping (ORM) solution, to map data representations from the MVC model to the Oracle relational data model with a SQL-based schema.
- Wrote Ant scripts to build the application and deploy it to different servers, both local and on the Amazon cloud.
- Developed test cases, test scenarios, and test scripts for unit testing and black-box testing.
- Created UNIX shell scripts to automate the build process, to perform regular jobs like file transfers between different hosts.
- Tested web services using Postman and SoapUI.
- Implemented logging using Log4j for monitoring and debugging the application.
- Monitored error logs using Log4J and fixed the problems.
- Worked with the JUnit framework for test-driven development (see the sketch after this list).
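A minimal test-first sketch in the JUnit 4 style mentioned above; the ReportNameBuilder class and its naming rules are hypothetical and exist only to illustrate writing the test before the implementation.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

/**
 * Test-first sketch for a hypothetical report-name builder: the tests were
 * written before the minimal implementation nested below them.
 */
public class ReportNameBuilderTest {

    @Test
    public void buildsNameFromReportTypeAndDate() {
        assertEquals("sales_2014-01-31.csv",
                ReportNameBuilder.build("sales", "2014-01-31"));
    }

    @Test
    public void fallsBackToUnknownWhenTypeIsMissing() {
        assertTrue(ReportNameBuilder.build(null, "2014-01-31").startsWith("unknown_"));
    }

    /** Minimal implementation that makes the tests above pass. */
    static class ReportNameBuilder {
        static String build(String reportType, String date) {
            String type = (reportType == null || reportType.isEmpty()) ? "unknown" : reportType;
            return type + "_" + date + ".csv";
        }
    }
}
```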
Environment: Java, Struts, JSF, UML, JSP, Servlets, Ant, XML, JIRA, Oracle, Apache Tomcat, Python, Perl, C, C++, SQL, EJB, jQuery, MySQL, WSDL
Confidential
Java Developer
Responsibilities:
- Developed the GUI using HTML, JSP, and the Spring framework.
- Involved in the design and implementation of a mailbox for every user.
- Involved in the design and implementation of statistics report generation using jQuery and JSON data injection.
- Coded Entity Beans for data persistence within Data Layer.
- Developed user interface components for Deal, Activity modules along with business components.
- Developed a Spring MVC application connecting to the database.
- Adopted RESTful web services (see the sketch after this list).
- Wrote SQL, PL/SQL, and stored procedures as part of database interaction.
- Extensively used JUnit for unit testing.
- Responsible for merging code modules using ClearCase.
- Responsible for change requests and maintenance during development of the project.
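A minimal sketch of a RESTful Spring MVC endpoint of the kind described above; the /deals mapping, the Deal DTO, and the in-line sample data are hypothetical, and a real controller would delegate to the service/DAO layer.

```java
import java.util.Arrays;
import java.util.List;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

/**
 * RESTful Spring MVC controller for a hypothetical Deal module; responses are
 * serialized to JSON by the framework's message converters.
 */
@Controller
@RequestMapping("/deals")
public class DealController {

    @RequestMapping(value = "/{dealId}", method = RequestMethod.GET)
    @ResponseBody
    public Deal getDeal(@PathVariable("dealId") long dealId) {
        // Hypothetical lookup; a real implementation would call a service/DAO.
        return new Deal(dealId, "Sample deal");
    }

    @RequestMapping(method = RequestMethod.GET)
    @ResponseBody
    public List<Deal> listDeals() {
        return Arrays.asList(new Deal(1L, "Sample deal"), new Deal(2L, "Another deal"));
    }

    /** Simple DTO serialized to JSON. */
    public static class Deal {
        private final long id;
        private final String name;

        public Deal(long id, String name) {
            this.id = id;
            this.name = name;
        }

        public long getId() { return id; }
        public String getName() { return name; }
    }
}
```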
Environment: Java, J2EE, Servlets, Spring, Custom Tags, Java Beans, RESTful web services, Ajax, JUnit, JSP, Log4j, XML