Hadoop Developer Resume
New York, NY
SUMMARY
- 8+ years of professional experience in software development with Linux and Hadoop/Big Data technologies.
- Experience with the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Flume, Sqoop, Impala, ZooKeeper, Hue, Oozie, and HBase.
- Experience implementing big data projects using Cloudera.
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Hands-on experience designing and implementing solutions using Apache Hadoop 2.4.0, HDFS 2.7, MapReduce2, HBase 1.1, Hive 1.2, Oozie 4.2.0, Tez 0.7.0, YARN 2.7.0, Sqoop 1.4.6, and MongoDB.
- Experience implementing Kafka producers and consumer groups to read messages from multiple partitions in parallel.
- Set up and integrated Hadoop ecosystem tools such as HBase, Hive, Pig, and Sqoop.
- Hands-on experience loading data into Spark RDDs and performing in-memory computation.
- Hands-on experience installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Storm, Spark, Kafka, and Flume.
- Strong understanding of data modeling and experience with data cleansing, data profiling, and data analysis.
- Configured Hadoop clusters on OpenStack and Amazon Web Services (AWS).
- Experience in ETL (DataStage) analysis, design, development, testing, and implementation, including performance tuning and query optimization of databases.
- Experience extracting source data from sequential files, XML files, and Excel files, and transforming and loading it into the target data warehouse.
- Strong experience with Java/J2EE technologies such as Core Java, JDBC, JSP, JSTL, HTML, JavaScript, and JSON.
- Experience deploying and managing multi-node development and production Hadoop clusters with different Hadoop components (Hive, Pig, Sqoop, Oozie, Flume, HCatalog, HBase, ZooKeeper) using Hortonworks Ambari.
- Gained optimal performance in HBase through data compression, region splits, and manually managed compactions.
- Upgraded clusters from HDP 2.1 to HDP 2.2 and then to HDP 2.3.
- Working experience with the MapReduce programming model and the Hadoop Distributed File System (HDFS).
- In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
- Hands-on experience in Unix/Linux environments, including software installations and upgrades, shell scripting for job automation, and other maintenance activities.
- Thorough knowledge and experience in SQL and PL/SQL concepts.
- Expertise in setting up standards and processes for Hadoop based application design and implementation.
TECHNICAL SKILLS
Hadoop Ecosystem: Spark Core, Spark SQL, Kafka, HDFS, YARN, Sqoop, Pig, Hive, Oozie, Flume, MapReduce, Storm
Development and Build Tools: Eclipse, NetBeans, IntelliJ, Ant, Maven, Ivy, TOAD, SQL Developer
Databases: HBase, Cassandra, Oracle, SQL Server 2008 R2/2012, MySQL, ODI
Languages: Java (JDK 1.4/1.5/1.6), C/C++, SQL, PL/SQL, Scala, Python
Operating Systems: Windows Server 2000/2003/2008, Windows XP/Vista, Mac OS, UNIX, Linux
Java Technologies: Spring 3.0, Struts 2.2.1, Hibernate 3.0, Spring-WS, Apache Kafka
Frameworks: JUnit and Jest
IDEs & Utilities: Eclipse, Maven, NetBeans
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS)
Web Technologies: ASP.NET, HTML/HTML5, XML, CSS3, JavaScript/jQuery
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Developed ETL data pipelines using Spark, Spark Streaming, and Scala.
- Loaded data from RDBMS into Hadoop using Sqoop.
- Worked collaboratively to manage build-outs of large data clusters and real-time streaming with Spark.
- Responsible for building data pipelines from web servers using Sqoop, Kafka, and the Spark Streaming API.
- Developed Kafka producers, broker partitioning schemes, and consumer groups (see the Kafka/Spark Streaming sketch below).
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for high data volumes.
- Developed batch jobs to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
- Processed data using MapReduce and YARN; worked on Kafka as a proof of concept for log processing.
- Monitored the Hive metastore and cluster nodes using Hue.
- Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
- Created AWS EC2 instances and used JIT servers.
- Handled data-integrity checks using Hive queries, Hadoop, and Spark.
- Performed transformations and actions on RDDs and Spark Streaming data with Scala.
- Implemented machine learning algorithms using Spark with Python.
- Defined job flows and developed simple to complex MapReduce jobs per requirements.
- Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Developed Pig UDFs to manipulate data according to business requirements and also developed custom Pig loaders.
- Responsible for handling streaming data from web server console logs.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Created Hive tables, loaded them with data, and wrote Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Installed and configured Apache Hadoop to test the maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed ETL processes (DataStage Open Studio) to load data from multiple data sources into HDFS using Flume and Sqoop, and performed structural modifications using MapReduce and Hive.
- Involved in NoSQL database design, integration, and implementation.
- Loaded data into the NoSQL database HBase.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the DDL sketch below).
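The table design mentioned in the last bullet can be sketched as follows. This is a minimal, hypothetical example issued over the HiveServer2 JDBC driver; the host, database, table, column, and HDFS path names are illustrative only, not details from the actual project.

```scala
import java.sql.DriverManager

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    // HiveServer2 JDBC connection; host, port, database, and user are placeholders
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/default", "etl_user", "")
    val stmt = conn.createStatement()

    // Managed table: partitioned by load date and bucketed by customer id
    stmt.execute(
      """CREATE TABLE IF NOT EXISTS sales_managed (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // External table over raw files already sitting on HDFS; dropping it leaves the data in place
    stmt.execute(
      """CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/raw/sales'""".stripMargin)

    // Dynamic-partition load from the external table into the managed one
    stmt.execute("SET hive.exec.dynamic.partition=true")
    stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
    stmt.execute("SET hive.enforce.bucketing=true")
    stmt.execute(
      """INSERT OVERWRITE TABLE sales_managed PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date FROM sales_raw""".stripMargin)

    stmt.close()
    conn.close()
  }
}
```

The external table only points at files that already live on HDFS, which is why raw landing data is modeled that way, while the managed, partitioned, and bucketed table is tuned for query performance.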
Environment: Spark, Spark Streaming, Apache Kafka, Hive, AWS, ETL, Pig, UNIX, Linux, Tableau, Teradata, Sqoop, Hue, Oozie, Java, Scala, Python, Git
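A condensed sketch of the producer-plus-Spark-Streaming pattern described in this section, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and record semantics are placeholders rather than details from the actual engagement.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickstreamPipeline {

  // Producer side: broker and topic are placeholders; keying by userId keeps a user's events in one partition
  def publish(events: Seq[(String, String)]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    events.foreach { case (userId, json) =>
      producer.send(new ProducerRecord[String, String]("web-clicks", userId, json))
    }
    producer.close()
  }

  // Consumer side: a Spark Streaming job reading the same topic as part of a consumer group
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-pipeline")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-broker:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-etl",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("web-clicks"), kafkaParams))

    // Simple transformation/action chain: drop empty records, count clicks per user per batch
    stream.map(record => (record.key, record.value))
      .filter { case (_, v) => v != null && v.nonEmpty }
      .mapValues(_ => 1L)
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keying producer records by user ID means the consumer-group side can scale out across partitions while still seeing per-user ordering within each partition.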
Confidential - Eden Prairie, MN
Hadoop (Big Data) Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Pig, the HBase database, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a nine-node CDH3 Hadoop cluster on CentOS.
- Implemented the Apache Crunch library on top of MapReduce and Spark for data aggregation.
- Involved in loading data from the Linux file system into HDFS.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Implemented a script to transfer data from Oracle to HBase using Sqoop.
- Implemented best-income logic using Pig scripts and UDFs (see the Pig UDF sketch below).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Applied design patterns and OO design concepts to improve the existing Java/J2EE-based code base.
- Developed JAX-WS web services.
- Handled Type 1 and Type 2 slowly changing dimensions.
- Imported and exported data between HDFS and databases using Sqoop.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying.
- Involved in the design, implementation, and maintenance of data warehouses.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Implemented custom Flume interceptors to filter data per requirements.
- Used Hive and Pig to analyze data in HDFS to identify issues and behavioral patterns.
- Created internal and external Hive tables and defined static and dynamic partitions for optimized performance.
- Wrote Pig Latin scripts for running advanced analytics on the data collected.
- Configured daily workflow for extraction, processing and analysis of data using Oozie Scheduler.
- Proactively involved in ongoing maintenance, support, and improvements in the Hadoop cluster.
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, CDH3, CentOS, Oozie, UNIX, T-SQL
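A small illustration of the kind of Pig UDF mentioned above, written against Pig's EvalFunc API; the class name and the cleansing rule are hypothetical, not the actual project logic.

```scala
import java.io.IOException

import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

/**
 * Pig EvalFunc that normalizes a free-text field: trims whitespace, lower-cases,
 * and returns null for empty input so a downstream FILTER can drop the record.
 * The normalization rule here is illustrative only.
 */
class NormalizeText extends EvalFunc[String] {
  @throws[IOException]
  override def exec(input: Tuple): String = {
    if (input == null || input.size() == 0 || input.get(0) == null) return null
    val raw = input.get(0).toString.trim
    if (raw.isEmpty) null else raw.toLowerCase
  }
}
```

In a Pig script the compiled jar would be REGISTERed and the function invoked inside a `FOREACH ... GENERATE`, followed by a `FILTER` to discard the null results.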
Confidential, San Mateo, CA
Hadoop Developer
Responsibilities:
- Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume, and Pig Latin.
- Wrote Spark applications for data validation, cleansing, transformations, and custom aggregations (see the RDD sketch below).
- Imported data from different sources into Spark RDDs for processing.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Responsible for managing data coming from different sources.
- Imported and exported data into HDFS using Flume.
- Experienced in analyzing data with Hive and Pig.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Set up a Hadoop cluster on Amazon EC2 using Whirr for a POC.
- Developed applications with Hadoop big data technologies: Pig, Hive, MapReduce, Oozie, Flume, and Kafka.
- Experienced in managing and reviewing Hadoop log files.
- Helped with big data technologies for the integration of Hive with HBase and Sqoop with HBase.
- Analyzed data with Hive, Pig and Hadoop Streaming.
- Involved in moving relational database legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
- Involved in cluster coordination services through ZooKeeper and in adding new nodes to an existing cluster.
- Moved data from traditional databases such as MySQL, MS SQL Server, and Oracle into Hadoop.
- Worked on Integrating Talend and SSIS with Hadoop and performed ETL operations.
- Installed Hive, Pig, Flume, Sqoop and Oozie on the Hadoop cluster.
- Used Flume to collect, aggregate, and push log data from different log servers.
Environment: Hadoop, Hortonworks, Linux, HDFS, Hive, Sqoop, Flume, Zookeeper and HBase
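A minimal sketch of the RDD-based validation, cleansing, and aggregation work listed above; the input path, record layout (orderId, customerId, amount), and output location are assumptions for illustration, and an equivalent Spark SQL aggregate would follow the same shape.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OrderValidation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("order-validation"))

    // Raw CSV lines imported into an RDD; path and column layout are illustrative only
    val raw = sc.textFile("hdfs:///data/raw/orders/*.csv")

    // Validation/cleansing: keep only rows with three fields and a parseable, non-negative amount
    val parsed = raw.map(_.split(",", -1))
      .filter(_.length == 3)
      .flatMap { case Array(orderId, customerId, amount) =>
        scala.util.Try(amount.trim.toDouble).toOption
          .filter(_ >= 0.0)
          .map(a => (customerId.trim, a))
      }

    // Custom aggregation: running total and count per customer via aggregateByKey
    val totals = parsed.aggregateByKey((0.0, 0L))(
      (acc, amt) => (acc._1 + amt, acc._2 + 1L),
      (a, b) => (a._1 + b._1, a._2 + b._2))

    totals.map { case (cust, (sum, n)) => s"$cust,$sum,$n" }
      .saveAsTextFile("hdfs:///data/curated/order_totals")

    sc.stop()
  }
}
```

Malformed rows are dropped during parsing rather than failing the job, and aggregateByKey combines values map-side before the shuffle, keeping the per-customer totals in memory.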
Confidential
Java Developer
Responsibilities:
- Developed the business logic using Java Beans and Session Beans.
- Developed a system to access the legacy system database using JDBC.
- Implemented validation rules using the Struts framework.
- Developed the user interface using JSP, HTML, and Velocity templates.
- Encapsulated persistence-layer operations in Data Access Objects (DAOs) and used Hibernate for data retrieval from the database.
- Developed a web services component using XML, WSDL, and SOAP with a DOM parser to transfer and transform data between applications.
- Exposed various capabilities as web services using SOAP/WSDL.
- Used SoapUI to test the web services by sending SOAP requests.
- Used an AJAX framework for server communication and a seamless user experience.
- Created a test framework on Selenium and executed web testing in Chrome, IE, and Mozilla through WebDriver.
- Used client-side JavaScript (jQuery) for designing tabs and dialog boxes.
- Created UNIX shell scripts to automate the build process and to perform regular jobs such as file transfers between hosts.
- Designed, built, tested, and deployed enhanced web services.
- Involved in system design, coding, testing, installation, documentation and post-deployment audits, all performed in accordance with the established standards.
- Developed RESTful web services using Spring and Apache CXF.
- Created Java servlets and other classes deployed as an EAR file, connecting to an Oracle database using JDBC.
Environment: Hibernate, MVC, JavaScript, CSS, Maven, Java 1.6, XML, JUnit, SQL, PL/SQL, Eclipse, WebSphere
Confidential
Java/J2EE Consultant
Responsibilities:
- Modified application flows and the existing UML diagrams.
- Involved in preparing the change request technical solution document and implementation plan.
- Followed MVC architecture using Struts.
- Worked on the Struts framework and developed Action and Form classes for the user interface.
- Mapped event classes, HTML files, and JavaBean classes using XML.
- Used J2EE design patterns like Singleton, DAO and DTO.
- Developed the UI using HTML, JavaScript, and JSP, and developed business logic and interfacing components using business objects, XML, and JDBC.
- Designed the user interface and performed validation checks using JavaScript.
- Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
- Developed various EJBs for handling business logic and data manipulation from the database.
- Involved in the design of JSPs and servlets for navigation among the modules.
- Designed cascading style sheets and the XML portion of the Order Entry and Product Search modules, and performed client-side validations with JavaScript.
- Developed customized interfaces for various clients using CSS and JavaScript.
- Performed code reviews for peers and maintained the code repositories using Git.
- Enhanced the mechanism of logging and tracing with Log4j.
- Generated web service clients from WSDL files.
- Involved in the development of the presentation layer using Struts and custom tag libraries.
- Performed integration testing, supported the project, and tracked the Confidential with the help of JIRA.
- Acted as the first point of contact for business queries during the development and testing phases.
- Worked closely with clients and the QA team to resolve critical issues and bugs.
Environment: EcommCore, JavaScript, CSS, Ivy, Java 1.6, YUI 2.8, Web Services, XML, XML Parsers (SAX/JAXB), JUnit, DAO/DTO, Blue Zone, Eclipse, Apache Tomcat, Git, Jenkins, Arthur