Sr. Big Data/ Hadoop Developer Resume
Oak Brook, IL
PROFESSIONAL SUMMARY:
- 8+ years of IT experience in Analysis, Design and Development in Big Data, Scala, Spark, Hadoop and HDFS environments, with experience in JAVA/J2EE.
- Experienced in developing and implementing MapReduce programs using Hadoop to work with Big Data as per requirements.
- Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, MapReduce, frameworks such as Lift and Play, and RDDs (Resilient Distributed Datasets).
- Extensive ETL testing experience using Informatica 8.1/7.1/6.2 (PowerCenter/PowerMart: Designer, Workflow Manager, Workflow Monitor and Server Manager).
- Developed ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Worked on Data Warehouse and Business Intelligence projects with teams using Informatica, Talend (ETL), Cognos 10, Impromptu and PowerPlay.
- Excellent experience with Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, Maven, HBase, Pig, Kafka, ZooKeeper, Scala, Flume, Storm and Oozie.
- Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MRv1 and MRv2 (YARN).
- Experienced in developing MapReduce jobs in Java for data cleansing, transformation, pre-processing and analysis; implemented multiple mappers to handle data from multiple sources.
- Experienced with Spark and Scala, including Spark SQL, Spark Streaming, Spark GraphX and Spark MLlib.
- Experienced in installing, configuring and administering Hadoop clusters on the major Hadoop distributions (Hortonworks, Cloudera).
- Experienced with Hadoop daemon functionality, resource utilization and dynamic tuning to keep clusters available and efficient.
- Expertise in writing custom UDFs to extend Hive and Pig core functionality.
- Experienced in setting up data gathering tools such as Flume and Sqoop.
- Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
- Excellent knowledge of NoSQL databases such as Cassandra, MongoDB and HBase.
- Experienced in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as RegexSerDe, JSON and Avro.
- Experienced in Scripting using UNIX shell script.
- Experienced in analyzing, designing and developing ETL strategies and processes, writing ETL specifications, Informatica development.
- Extensively worked on ETL mappings, analysis and documentation of OLAP report requirements; good understanding of OLAP concepts, particularly with large data sets.
- Experienced in Dimensional Data Modeling using star and snowflake schema.
- Extensively worked on Talend Designer Components - Data Quality (DQ), Data Integration (DI) and Master Data Management (MDM).
- Good knowledge on Data Mining and Machine Learning techniques.
- Proficient in Oracle … SQL and PL/SQL.
- Experienced in integration of various data sources such as Oracle, DB2, Sybase, SQL Server and MS Access, as well as non-relational sources such as flat files, into a staging area.
- Experienced in large cross-platform applications using JAVA/J2EE, with experience in Java core concepts such as OOP, multithreading, Collections and I/O.
- Experienced on applications using Java, RDBMS, and Linux shell scripting.
- Good interpersonal, communication and problem-solving skills; a motivated team player.
- Able to make a valuable contribution to the company.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, MapReduce, Sqoop, Hive, Oozie, Pig, HDFS, ZooKeeper, Flume, HBase, Impala, Spark, Storm; distributions: Cloudera, Hortonworks and Pivotal
NoSQL Databases: HBase, Cassandra, MongoDB
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, NetBeans, Eclipse
Languages: Java, SAS, Scala and Apache Spark, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Databases: Oracle … MySQL, DB2, MS SQL Server
Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Web Services: WSDL, SOAP, REST
Methodologies: Agile, Scrum
WORK EXPERIENCE
Confidential, Oak Brook, IL
Sr. Big Data/ Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and for migrating legacy retail ETL applications to Hadoop.
- Wrote Spark code in Scala to connect to HBase and read/write data to HBase tables (a sketch of this pattern appears after this list).
- Extracted data from different databases and copied it into HDFS using Sqoop, with expertise in using compression techniques to optimize data storage.
- Developed the technical strategy of using Apache Spark on Apache Mesos as a next generation, Big Data and “Fast Data” (Streaming) platform.
- Implemented ETL code to load data from multiple sources into HDFS using pig scripts.
- Worked on Talend RTX ETL tool, develop jobs and scheduled jobs in Talend integration suite. Extensively used the concepts of ETL to load data from AS400, flat files to Salesforce.
- Extensively involved in ETL Data warehousing using Informatica PowerCenter Designer tools like Source Analyzer, Target Designer, Mapping Designer, Mapplets Designer, Transformation Developer, Workflow Manager and Workflow Monitor.
- Implemented Flume, Spark framework for real time data processing.
- Developed simple to complex Map Reduce jobs using Hive and Pig for analyzing the data.
- Used different SerDes to convert JSON data into pipe-separated data.
- Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Created Spark Streaming code to take the source files as input.
- Used Oozie workflow to automate all the jobs.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Developed Spark programs using Scala, created Spark SQL queries and developed Oozie workflows for Spark jobs.
- Built analytics for structured and unstructured data and managing large data ingestion by using Avro, Flume, Thrift, Kafka and Sqoop.
- Worked on scalable distributed computing systems, software architecture, data structures and algorithms using Hadoop, Apache Spark, Apache Storm etc.
- Ingested streaming data into Hadoop using Spark, Storm Framework and Scala.
- Exported the patterns analyzed back to Teradata using Sqoop.
- Developed a banker's rounding UDF for Hive/Pig to reproduce Teradata rounding behavior (see the UDF sketch after this list).
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Hands on experience in Tableau for Data Visualization and analysis on large data sets, drawing various conclusions.
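Below is an illustrative Scala sketch of the Spark-to-HBase read/write pattern referenced above; it is not the original project code. The table name (retail_txn), column family (d) and qualifier are hypothetical placeholders, and it assumes Spark 1.x with the HBase 1.x client and TableInputFormat on the classpath.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseReadWrite {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-read-write"))

    // Read: scan the (hypothetical) retail_txn table as an RDD of HBase Results.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "retail_txn")
    val rows = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Write: mark each row as processed, opening one HBase connection per partition.
    rows.map { case (_, result) => Bytes.toString(result.getRow) }
      .foreachPartition { keys =>
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("retail_txn"))
        keys.foreach { key =>
          val put = new Put(Bytes.toBytes(key))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("processed"), Bytes.toBytes("true"))
          table.put(put)
        }
        table.close()
        conn.close()
      }

    sc.stop()
  }
}
```

Opening one connection per partition, rather than per record, keeps connection overhead low when writing from Spark executors.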
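A minimal sketch of the banker's-rounding (round-half-to-even) Hive UDF mentioned above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API (hive-exec dependency assumed); the class name and parameters are illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Hypothetical UDF: rounds a value half-to-even (banker's rounding) at the given
// scale, mirroring Teradata's rounding behaviour for decimal values.
class BankersRoundUDF extends UDF {
  def evaluate(value: java.lang.Double, scale: java.lang.Integer): java.lang.Double = {
    if (value == null || scale == null) null
    else JBigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_EVEN).doubleValue()
  }
}
```

Once packaged into a JAR, the class would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL.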
Environment: Hadoop, MapReduce, Cloudera Manager, HDFS, Hive, Pig, Spark, Storm, Flume, Thrift, Kafka, Sqoop, Oozie, Impala, SQL, Scala, Java (JDK 1.6), Hadoop (Cloudera), Tableau, Eclipse and Informatica.
Confidential, Sunnyvale, CA
Sr. Big Data/ Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and for migrating legacy retail ETL applications to Hadoop.
- Wrote SQL queries to process the data using Spark SQL (a minimal sketch appears after this list).
- Extracted data from different databases and copied it into HDFS using Sqoop.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Created MapReduce scripts using Python.
- Wrote Scala classes to interact with the database and Scala test cases to test the Scala code.
- Modified reports and Talend ETL jobs based on the feedback from QA testers and Users in development and staging environments.
- Responsible for managing data coming from source systems (RDBMS) and involved in HDFS maintenance and loading of structured data.
- Created Hadoop clusters and HBase tables in Hue; expertise in Hive, Pig, HBase and Sqoop within Hue.
- Optimized several Map Reduce algorithms in Java according to the client requirement for big data analytics.
- Responsible for importing data from MySQL to HDFS and provide the query capabilities using HIVE.
- Wrote entities in Scala and Java along with named queries to interact with database.
- Created tables in the RDBMS, inserted data, and then loaded the same tables into HDFS and Hive using Sqoop.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
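An illustrative Scala sketch of the Spark SQL usage described above, assuming Spark 1.x; the Order case class, the HDFS path /data/orders/ and the comma-delimited layout are assumptions standing in for the actual Sqoop-imported data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical layout of the Sqoop-imported, comma-delimited order files.
case class Order(orderId: String, storeId: String, amount: Double)

object OrderTotals {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-sql-orders"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Parse the files Sqoop landed in HDFS (path is an assumption) into a DataFrame.
    val orders = sc.textFile("/data/orders/")
      .map(_.split(","))
      .map(f => Order(f(0), f(1), f(2).toDouble))
      .toDF()
    orders.registerTempTable("orders")

    // Plain SQL over the temporary table, as described in the bullet above.
    sqlContext.sql("SELECT storeId, SUM(amount) AS total FROM orders GROUP BY storeId").show()

    sc.stop()
  }
}
```

Registering the DataFrame as a temporary table lets aggregation logic originally written for the RDBMS be reused as plain SQL with little change.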
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java (JDK 1.6), Hadoop (Hortonworks), Eclipse, Talend Studio and Informatica.
Confidential, Raleigh, NC
Hadoop Developer
Responsibilities:
- Gathered business requirements from the Business Partners and subject matter experts and prepared Business Requirement document.
- Developed simple to complex Map/Reduce jobs using Hive and Pig.
- Handled importing of data from various data sources performed transformations using Hive, MapReduce, and loaded data into HDFS and extracted the data from MySQL into HDFS using sqoop.
- Wrote entities in Scala and Java along with named queries to interact with database.
- Analyzed the data by performing Hive queries and Pig scripts to study customer behavior.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Installed and configured Cognos 8.4/10 and Talend ETL on single- and multi-server environments.
- Involved in Spark Streaming, which collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a NoSQL store (HBase); a minimal sketch of this pipeline appears after this list.
- Worked on different file formats like Sequence files, XML files and Map files using Map Reduce Programs.
- Used UDF's to implement business logic in Hadoop.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Used the NoSQL database Cassandra for information retrieval.
- Implemented regression analysis using MapReduce.
- Developed scripts and Batch Jobs to schedule various Hadoop programs using Oozie.
- Imported/exported data from RDBMS to HDFS using Data Ingestion tools like Sqoop.
- Tested MapReduce code using JUnit.
- Used Cloudera Manager to monitor the health of the jobs running on the cluster.
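A minimal sketch of the Kafka-to-Spark-Streaming ingestion described above, using the receiver-based KafkaUtils.createStream API from spark-streaming-kafka (Spark 1.x); the ZooKeeper quorum, consumer group, topic name and 10-second batch interval are assumptions, and the final write to HBase is only indicated in a comment.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("learner-event-stream")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based Kafka stream; ZooKeeper quorum, group and topic are placeholders.
    val lines = KafkaUtils.createStream(
      ssc, "zk1:2181", "learner-group", Map("learner-events" -> 1)).map(_._2)

    // Count events per learner id (first comma-separated field) in each 10s batch.
    val counts = lines.map(line => (line.split(",")(0), 1)).reduceByKey(_ + _)

    // A real job would persist these aggregates to HBase here (e.g. via foreachRDD);
    // printing keeps the sketch self-contained.
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```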
Environment: Java (JDK 1.6), Hadoop, MapReduce, Pig, Hive, Scala, Spark, Cassandra, Sqoop, Oozie, HDFS, Hadoop (Cloudera), MySQL, Eclipse, Oracle.
Confidential, Milwaukee, WI
JAVA Developer
Responsibilities:
- Prepared high-level and low-level design documents implementing applicable design patterns, with UML diagrams depicting components and class-level details.
- Interacted with system analysts and business users for design and requirement clarification.
- Developed web services using SOAP, SOA, WSDL and Spring MVC, and developed DTDs and XSD schemas for XML (parsing, processing and design) to communicate with the Active Directory application using a RESTful API.
- Developed JSPs according to requirement.
- Created complex mappings in Talend 5.x
- Created Talend mappings to populate the data into Staging, Dimension and Fact tables.
- Excellent knowledge of NoSQL with MongoDB and Cassandra.
- Developed integration services using SOA, Mule ESB, Web Services, SOAP, and WSDL.
- Designed, developed and maintained the data layer using the ORM framework in Hibernate.
- Involved in Analysis, Design, Development, and Production of the Application and develop UML diagrams.
- Presented top level design documentation to the transition of various groups.
- Used the Spring Framework's JMS support for writing to JMS queues and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Wrote AngularJS controllers, views and services.
- Used Ant for builds; the application is deployed on the JBoss application server.
- Developed HTML reports for various modules as per the requirement.
- Analyzed known information into concrete concepts and technical solutions.
- Assisted in writing the SQL scripts to create and maintain the database, roles, users, tables in SQL Server.
Environment: Java, JDBC, Spring, JSP, JBoss, Servlets, Maven, Jenkins, Flex, HTML, AngularJS, MongoDB, Hibernate, JavaScript, Eclipse, Struts, SQL Server 2000.
Confidential
Jr. JAVA Developer
Responsibilities:
- Analyzed Object Oriented Design and presented with UML Sequence, Class Diagrams.
- Performed data manipulations using Talend.
- Used Talend reusable components context variable and globalMap variables.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and performed validation checks using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Developed components using Java multithreading concept.
- Developed various EJBs (session and entity beans) for handling business logic and data manipulations from database.
- Involved in design of JSP's and Servlets for navigation among the modules.
- Designed cascading style sheets, XSLT and XML for the Order Entry and Product Search modules, and did client-side validations with JavaScript.
- Hosted the application on WebSphere.
Environment: J2EE, Java/JDK, PL/SQL, JDBC, JSP, Servlets, JavaScript, EJB, JavaBeans, UML, XML, XSLT, Oracle 9i, HTML/DHTML.