Hadoop Developer Resume
Hartford, CT
SUMMARY
- 5+ years of IT experience in architecture, analysis, design, development, implementation, maintenance, and support, with experience in developing strategic methods for deploying big data technologies to efficiently solve Big Data processing requirements.
- Around 3 years of experience in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
- Experience in data analysis using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in training people on Big Data and cloud technologies.
- Experience in writing custom UDFs in Java for Hive and Pig to extend their functionality (see the UDF sketch after this summary).
- Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
- Excellent understanding/knowledge of Hadoop (Gen-1 and Gen-2) and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager (YARN).
- Experience in managing and reviewing Hadoop log files.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
- Experience with other big data technologies like Apache Kafka, Spark, and Storm.
- Hands-on experience with message brokers such as Apache Kafka, IBM WebSphere, and RabbitMQ.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data warehouses.
- Implemented Hadoop-based data warehouses and integrated Hadoop with Enterprise Data Warehouse systems.
- Built real-time Big Data solutions using HBASE handling billions of records.
- Good experience in importing data from Teradata sources to HDFS and vice versa.
- Good experience working with the Hortonworks and Cloudera distributions.
- Experience in Object-Oriented Analysis and Design (OOAD) and development of software using UML methodology; good knowledge of J2EE design patterns and Core Java design patterns.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experience in writing UNIX shell scripts.
- Experience working with Java, J2EE, JDBC, ODBC, JSP, Java Eclipse, Java Beans, EJB, Servlets, and MS SQL Server.
- Experience in all stages of SDLC (Agile, Waterfall), writing Technical Design document, Development, Testing and Implementation of Enterprise level Data mart and Data warehouses.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
- Experience in J2EE technologies like Struts, JSP/Servlets, and Spring.
- Good exposure to scripting languages like JavaScript, AngularJS, jQuery, and XML.
- Ability to work in high-pressure environments, delivering to and managing stakeholder expectations.
- Application of structured methods to project scoping and planning, risks, issues, schedules, and deliverables.
- Strong analytical and problem-solving skills.
- Good interpersonal skills and ability to work as part of a team. Exceptional ability to learn and master new technologies and to deliver outputs on short deadlines.
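A minimal sketch of the kind of custom Hive UDF referenced in this summary, assuming a simple string-cleansing use case; the class name and behavior are illustrative rather than taken from any specific project:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical cleansing UDF: trims whitespace and lower-cases a string column.
    // Registered in Hive with CREATE TEMPORARY FUNCTION after adding the jar.
    public class TrimLowerUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                     // pass nulls through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }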
TECHNICAL SKILLS
Technology: Hadoop Ecosystem / J2SE / J2EE / Databases
Operating Systems: Windows Vista/XP/NT/2000, Linux (Ubuntu, CentOS), UNIX
DBMS/Databases: DB2, MySQL, PL/SQL
Programming Languages: C, C++, Core Java, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web Services
Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, Kafka, Storm, Apache Falcon, HBase
Methodologies: Agile, Waterfall
NoSQL Databases: Cassandra, MongoDB, HBase
Version Control Tools: SVN, CVS, VSS, PVCS
ETL Tools: IBM DataStage 8.1, Informatica
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential, Hartford, CT
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop Ecosystem.
- Responsible for writing MapReduce jobs to handle files in multiple formats (JSON, text, XML, etc.).
- Developed Pig UDFs to perform data cleansing and transformation for ETL activities.
- Developed Hive UDFs, UDAFs, and UDTFs for data analysis and Hive table loads.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest data into HDFS for analysis
- Worked extensively with combiners, partitioning, and the distributed cache to improve the performance of MapReduce jobs.
- Created MapReduce jobs to parse raw web log data into delimited records.
- Used Pig to do data transformations, event joins and some pre-aggregations before storing the data on the HDFS.
- Developed Sqoop scripts to import and export data from and to relational sources by handling incremental data loading on the customer transaction data by date.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Migrated data from Teradata Sources to HDFS
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for creating complex tables using Hive.
- Created partitioned tables in Hive for best performance and faster querying.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Performed extensive data analysis using Hive and Pig.
- Successfully loaded files to Hive and HDFS from Cassandra. Processed the source data into structured data and stored it in the NoSQL database Cassandra. Created alter, insert, and delete queries involving lists, sets, and maps in DataStax Cassandra. Worked in a language-agnostic environment with exposure to multiple web platforms such as AWS and databases like Cassandra.
- Responsible for managing data coming from different sources.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet, JSON, and CSV formats.
- Successfully converted Avro data into Parquet format in Impala for faster query processing.
- Real-time data streaming: designed and developed a solution for ESB pump real-time data ingestion using Kafka, Storm, and HBase. This involved Kafka and Storm cluster design, installation, and configuration.
- Developed a Kafka producer to produce 1 million messages per second (see the producer sketch after this list).
- Developed a Kafka spout and HBase and enrichment bolts for data ingestion into HBase.
- Configured the Kafka and Storm clusters to handle the load and optimized them to achieve the desired throughput.
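A minimal sketch of a Kafka producer along the lines described above, assuming the newer Java client API and String payloads; the class name, broker address, and topic name are placeholders, and real throughput tuning (batch size, acks, partition count) would depend on the cluster:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EsbEventProducer {                          // hypothetical class and topic names
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // placeholder broker list
            props.put("acks", "1");                          // leader-only acks favor throughput
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<String, String>(props)) {
                for (int i = 0; i < 1000000; i++) {
                    // send() is asynchronous; the client batches records for throughput
                    producer.send(new ProducerRecord<String, String>(
                            "esb-events", Integer.toString(i), "payload-" + i));
                }
            }                                                // close() flushes any buffered records
        }
    }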
Environment: Hadoop Framework, MapReduce, Hive, Sqoop, Pig, HBase, Flume, Oozie, Java (JDK 1.6), UNIX Shell Scripting, Oracle 11g/12c, Windows NT, IBM DataStage 8.1, TOAD 9.6, Teradata, Kafka, Storm
Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Participated in development/implementation of Cloudera’s Hadoop environment.
- Installed the Oozie workflow engine to run multiple MapReduce programs that run independently based on time and data.
- Performed Data scrubbing and processing wif Oozie.
- Responsible for managing data coming from different sources.
- Gained good experience with NoSQL databases.
- Worked with Flume to load log data from multiple sources directly into HDFS.
- Maintained and supported MapReduce programs running on the cluster.
- Involved in loading data from UNIX file system to HDFS.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume
- Developed MapReduce code to transform unstructured data into structured data, which was later pushed from HDFS into HBase.
- Created Impala tables for faster operations. Designed ETL jobs to identify and remove duplicate records using the sort and remove-duplicates stages, and generated keys for unique records using the Key Generator stage.
- Integrated Hive and HBase for effective usage and performed MRUnit testing of the MapReduce jobs (see the MRUnit sketch after this list).
- Involved in transforming data from Mainframe tables to HDFS, and HBase tables using Sqoop and Pentaho kettle.
- Designed and developed workflows by writing simple to complex MapReduce jobs as per requirements.
- Transported data to HBase using Pig.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most popular service on the website.
- Worked on installing the cluster, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Created Hive Avro tables to read and write data with the appropriate compression.
- Good knowledge of Agile methodology and the Scrum process.
- Hands-on experience in developing web applications using Python on Linux and UNIX platforms.
- Experience in automation testing and the Software Development Life Cycle (SDLC) using the Waterfall model, and a good understanding of Agile methodology.
- Maintained project quality as per the standards.
- Implemented the secure authentication for the Hadoop Cluster using Kerberos Authentication protocol.
- Successfully carried out a POC to extract and transform data effectively and efficiently from legacy mainframe to open source with the help of Syncsort DMX-h (ETL tool).
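A minimal sketch of the kind of MRUnit test mentioned above, assuming a hypothetical LogMapper that splits a raw web-log line into an IP key and a pipe-delimited value; the mapper, input, and expected output are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class LogMapperTest {

        // Hypothetical mapper that splits "ip method path status" into (ip, "method|path|status").
        public static class LogMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] parts = value.toString().split(" ");
                if (parts.length == 4) {
                    context.write(new Text(parts[0]),
                            new Text(parts[1] + "|" + parts[2] + "|" + parts[3]));
                }
            }
        }

        private MapDriver<LongWritable, Text, Text, Text> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new LogMapper());
        }

        @Test
        public void parsesRawLogLineIntoDelimitedRecord() throws IOException {
            mapDriver
                .withInput(new LongWritable(0), new Text("10.0.0.1 GET /index.html 200"))
                .withOutput(new Text("10.0.0.1"), new Text("GET|/index.html|200"))
                .runTest();                      // fails the test if the actual output differs
        }
    }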
Environment: Apache Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.6), SQL, Pig, ZooKeeper, Cassandra, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie, HBase.
Hadoop Developer
Confidential, Irvine, CA
Responsibilities:
- Experience in defining, designing, and developing Java applications, especially using Hadoop MapReduce by leveraging frameworks such as Cascading and Hive.
- Experience in architecting and building Turn's multi-petabyte-scale big data Hadoop infrastructure.
- Developed workflow using Oozie for running Map Reduce jobs and Hive Queries.
- Worked on loading log data directly into HDFS using Flume.
- Involved in loading data from LINUX file system to HDFS.
- Responsible for managing data from multiple sources.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts
- Experience in working with various kinds of data sources such as MongoDB, Solr, and Oracle.
- Successfully loaded files to Hive and HDFS from MongoDB and Solr.
- Familiarity with NoSQL data stores such as MongoDB and Solr.
- Experience in managing the CVS and migrating into Subversion.
- Experience in managing development time, bug tracking, project releases, development velocity, release forecasting, scheduling, and more. Used a custom framework of Node.js and MongoDB to handle back-end calls at very high speed. Used intensive object-oriented JavaScript, jQuery, and plug-ins to build dynamic user interfaces.
- Partnered with Hadoop developers in building best practices for warehouse and analytics environments.
- Extracted data from MySQL through Sqoop, placed it in HDFS, and processed it.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Created HBase tables to store various formats of PII data coming from different portfolios (see the table-creation sketch after this list).
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Experience working on processing unstructured data using Pig and Hive.
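A minimal sketch of creating an HBase table through the Java client API, of the kind used for the PII tables above; the table and column-family names are placeholders, and the pre-1.0 HBaseAdmin API is assumed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreatePiiTable {                                      // hypothetical table and family names
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();          // reads hbase-site.xml from the classpath
            HBaseAdmin admin = new HBaseAdmin(conf);
            try {
                HTableDescriptor table = new HTableDescriptor(TableName.valueOf("pii_records"));
                HColumnDescriptor family = new HColumnDescriptor("d"); // single family for the record payload
                family.setMaxVersions(1);                              // keep only the latest cell version
                table.addFamily(family);
                if (!admin.tableExists(table.getTableName())) {
                    admin.createTable(table);
                }
            } finally {
                admin.close();
            }
        }
    }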
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Maven, Hudson/Jenkins, Ubuntu, Red Hat Linux, MongoDB, Hive, Java (JDK 1.6), Cloudera Hadoop Distribution, MapReduce, PL/SQL, UNIX Shell Scripting.
Java Developer
Confidential, Hartford, CT
Responsibilities:
- Designed and developed web services using Java/J2EE in a WebLogic environment. Developed web pages using Java Servlets, JSP, CSS, JavaScript, DHTML, and HTML. Added extensive Struts validation. Wrote Ant scripts to build and deploy the application.
- Involved in the analysis, design, development, and unit testing of business requirements.
- Developed business logic in JAVA/J2EE technology.
- Implemented business logic and generated WSDL for those web services using SOAP.
- Worked on developing JSP pages.
- Implemented Struts Framework.
- Developed Business Logic using Java/J2EE.
- Modified Stored Procedures in Oracle Database.
- Developed the application using Spring Web MVC framework.
- Worked with Spring configuration files to add new content to the website.
- Worked on the Spring DAO module and ORM using Hibernate. Used HibernateTemplate and HibernateDaoSupport for Spring-Hibernate communication (see the DAO sketch after this list).
- Configured association mappings such as one-to-one and one-to-many in Hibernate.
- Worked with JavaScript calls, as the search is triggered through JS calls when a search key is entered in the search window.
- Worked on analyzing other Search engines to make use of best practices.
- Collaborated with the business team to fix defects.
- Worked on XML, XSL and XHTML files.
- Used Ivy for dependency management.
- As part of the team developing and maintaining an advanced search engine, gained expertise in a variety of new software technologies.
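A minimal sketch of the HibernateTemplate/HibernateDaoSupport pattern mentioned above, assuming Spring 3.x with the hibernate3 support classes and a hypothetical Customer mapped entity; the DAO, entity, and query are illustrative:

    import java.util.List;
    import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

    // Hypothetical DAO for a mapped Customer entity; the Hibernate SessionFactory
    // is injected through the Spring XML configuration (sessionFactory property).
    public class CustomerDao extends HibernateDaoSupport {

        public Customer findById(Long id) {
            // get() returns null when no row exists for the given id
            return (Customer) getHibernateTemplate().get(Customer.class, id);
        }

        @SuppressWarnings("unchecked")
        public List<Customer> findByLastName(String lastName) {
            // positional HQL parameter bound by HibernateTemplate
            return (List<Customer>) getHibernateTemplate()
                    .find("from Customer c where c.lastName = ?", lastName);
        }

        public void save(Customer customer) {
            getHibernateTemplate().saveOrUpdate(customer);
        }
    }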
Environment: Java 1.6, J2EE, Eclipse SDK 3.3.2, Spring 3.x, jQuery, Oracle 10g, Hibernate, JPA, JSON, Apache Ivy, SQL, stored procedures, shell scripting, XML, HTML, JUnit, TFS, Ant, Visual Studio.