Hadoop Developer Resume
Columbia, MD
SUMMARY:
- Overall 8 years of experience in IT, including 3+ years of experience using Apache Hadoop and Apache Spark for analyzing Big Data as per requirements.
- In-depth knowledge of the Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, MapReduce programs and the YARN paradigm.
- Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
- Good exposure to Apache Hadoop MapReduce programming, Hive, Pig scripting and HDFS.
- Hands-on experience installing, configuring, monitoring and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, Flume, Kafka, Oozie, Elasticsearch, Apache Spark, Impala, R and QlikView.
- Hands-on experience in the testing and implementation phases of big data technologies.
- Strong experience writing MapReduce programs for data analysis, with hands-on experience writing custom partitioners for MapReduce (a representative sketch follows this summary).
- Experience in working with Cloudera Hadoop distribution.
- Refactored legacy code to Java 8, applying new Java 8 features such as lambda expressions and parallel streams for code optimization.
- Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
- Installed and configured Tableau Desktop to connect to the Hortonworks Hive framework (database), which contains the bandwidth data from the locomotive, through the Hortonworks ODBC connector for further analytics of the data.
- Good experience with Big Data and the Hadoop ecosystem, including the Hadoop Distributed File System (HDFS), Hive and Pig, and installation of Hortonworks.
- Extensive experience creating Tableau scorecards and dashboards using stacked bars, bar graphs and geographical maps.
- Experience working in Agile and with the Rally tool.
- Responsible for remodeling the existing business logic into new Netezza models.
- Developed ETL processes to load data from multiple data sources to HDFS using Flume and Sqoop, performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
- Implemented ETL Informatica designs and processes for loading data from the sources to the target warehouse.
- Used AWS Lambda to develop APIs to manage servers and run code in AWS.
- Involved with Apache Flink as an alternative to MapReduce.
- Hands-on experience in application development and database management using Java, RDBMS, Linux/Unix shell scripting and Linux internals.
- Good knowledge of Kafka as a real-time data feed platform.
- Good working experience with Azure ARM templates, Operations Management Suite, PowerShell scripting and creating websites.
- Performed backup and recovery of Netezza databases using nzbackup and nzrestore.
- Proficient with the ETL tool Informatica PowerCenter 9.x/8.x for developing data warehouse loads, with work experience focused on data acquisition and data integration.
- Experience in Integration software like Talend.
- Experience with Apache NiFi in Hortonworks DataFlow.
- Solid understanding of open source monitoring tools: Apache Ambari, Cloudera Manager.
- Experience with ETL tools such as Informatica and Talend.
- Developed Scala programs to perform data scrubbing for unstructured data.
- Excellent understanding and knowledge of NOSQL databases like MongoDB, Hbase and Cassandra.
- Good knowledge of creating custom SerDes in Hive.
- Experience developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and used UDFs from the Piggybank UDF repository.
- Good knowledge of Akka.
- Capable of building Hive (HQL), Pig and MapReduce scripts, and of adapting to and learning new tools, techniques and approaches.
- Very good knowledge and hands-on experience with Cassandra and Spark (on YARN).
- Hands-on experience with the Apache Ranger and Knox security tools.
- Familiar with the Akka and Play frameworks.
- Evaluated ETL and OLAP tools and recommended the most suitable solutions based on business needs.
- Designed and configured Azure Virtual Networks (VNets), subnets, Azure network settings, DHCP address blocks, DNS settings, security policies and routing.
- Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts such as collections, multithreading, data structures, algorithms, exception handling and polymorphism, as well as data mining tools including Eclipse, Weka, R and NetBeans.
- Good knowledge and experience in Core Java, JSP, Servlets, multithreading, JDBC and HTML.
- Experience working in 24x7 support, used to meeting deadlines and adaptable to ever-changing priorities.
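For illustration, a minimal sketch of the kind of custom MapReduce partitioner referenced above (the routing scheme and class name are hypothetical, not taken from any specific project):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative custom partitioner: routes records to reducers by the first
// character of the key so that related keys land in the same partition.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key == null || key.getLength() == 0 || numPartitions == 0) {
            return 0;
        }
        char first = Character.toUpperCase(key.toString().charAt(0));
        // Spread A-Z across the available reducers; everything else goes to partition 0.
        return (first < 'A' || first > 'Z') ? 0 : (first - 'A') % numPartitions;
    }
}
```

Such a class would be registered in the job driver with job.setPartitionerClass(FirstLetterPartitioner.class).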
TECHNICAL SKILLS:
Frameworks: Spring, Hibernate, Struts.
Big Data Technologies: Hive, MapReduce, HDFS, Sqoop, R, Flume, Spark, Apache Kafka, HBase, Pig, Elasticsearch, AWS, Oozie, ZooKeeper, Apache Hue, Apache Tez, YARN, Talend, Storm, Impala, Tableau and QlikView.
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, MySQL Workbench, Tableau.
Web/Application servers: Apache Tomcat, WebLogic.
Web Technologies: HTML, CSS, JSP, Web Services, XML, JavaScript.
Scripting Languages: UNIX shell scripting, SQL and PL/SQL, JavaScript.
Cluster Monitoring Tools: Apache Ambari, Cloudera Manager.
Methodologies: Agile and Waterfall.
Webservices: AWS.
Languages: SQL, T-SQL, C, C++, Java, J2EE, Scala, Python, Pig Latin, HiveQL
IDEs: Eclipse, NetBeans, MS Office, Microsoft Visual Studio
Web Technologies: JDK 1.4/1.5/1.6, HTML, XML, DHTML, MSXML, ASPX, Eclipse.
Operating Systems: Windows and different distributions of Linux/Unix/Ubuntu.
Database Systems: SQL, MySQL, HBase, MongoDB, Cassandra.
Domain Experience: Banking and financial services, Manufacturing.
PROFESSIONAL EXPERIENCE:
Confidential, Columbia, MD
Hadoop Developer
Responsibilities:
- Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
- Created UDFs and Oozie workflows to Sqoop the data from the source to HDFS and then into the target tables.
- Implemented custom data types, InputFormat, RecordReader, OutputFormat and RecordWriter classes for MapReduce computations.
- Developed Spark scripts in Scala, writing custom RDD transformations and performing actions on RDDs.
- Worked with the Apache Ranger and Knox security tools.
- Lambda Architecture: planned and helped execute real-time streaming and sentiment analysis of Twitter data; plugged into the Twitter API to follow certain keywords and calculate sentiment.
- Implemented an MVC architecture application using Spring and created Lambda functions in AWS using the Spring Framework.
- Wrote AWS Lambda functions in Python, integrated with Kinesis and Elasticsearch, which invoke Python scripts to perform various transformations and analytics on large data sets in Amazon EMR clusters.
- Migrated applications from Java 1.7 to Java 1.8 using lambdas and parallel streams (a representative sketch follows this section).
- Accessed HBase data with Apache Phoenix, which is much easier than using the HBase API directly.
- Developed Pig UDFs to pre-process the data for analysis.
- Implemented the physical data model in the Netezza database.
- Developed a data flow to pull data from a REST API using Apache NiFi.
- Worked on Azure Blob storage for storing documents, media files and cloud objects.
- Complete understanding of delivering Microsoft Azure products with Agile methodology.
- Used Pig Latin scripts to extract the data from the output files, process it and load it into HDFS.
- Worked on a 100-node multi-cluster environment on the Hortonworks platform.
- Worked with NoSQL data stores HBase, Accumulo, Cassandra and MongoDB.
- Extensively involved in the entire QA process and defect management life cycle.
- Used Hortonworks versions 2.5 and 2.7.
- Migrated various Hive UDFs and queries to Spark SQL for faster execution.
- Prepared custom shell scripts for connecting to Teradata and pulling the data from Teradata tables to HDFS.
- Used Kafka in combination with Apache Storm and HBase for real-time analysis of streaming data.
- Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream data to HDFS.
- Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Used HCatalog for reading and writing different file formats.
- Stored the code in a Git repository.
- Managed and reviewed Hadoop log files using Flume and Kafka, and developed Pig and Hive UDFs to pre-process the data for analysis.
- Used data formats such as Avro and Parquet.
- Worked on integration of big data and cloud platforms using Talend.
- Implemented Flume to import streaming log data and aggregate it into HDFS.
- Involved in analyzing data coming from various sources and creating meta-files and control files to ingest the data into the data lake.
- Involved in configuring batch jobs to ingest the source files into the data lake.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading and storing data.
- Worked with different teams to install operating system, Hadoop updates, patches, version upgrades of Hortonworks as required.
- Worked with testing team to finalize the test schedule and test cases.
- Interacted with Hortonworks to resolve Hive and Pig connectivity issues.
- Implemented Nagios and integrated it with Puppet for automatic monitoring of servers known to Puppet.
- Used JUnit for unit testing and Continuum for integration testing.
- Used the Hive schema to create relations in Pig via HCatalog.
- Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing of big data.
- Developed Hive queries to process the data and generate the results in a tabular format.
- Handled importing of data from multiple data sources using Sqoop, performed transformations using Hive, MapReduce and loaded data into HDFS.
- Performed refreshes of the Netezza development appliance using nzrestore.
- Involved in requirement analysis, ETL design and development for extracting data from source systems such as SQL Server, mainframe, DB2, Sybase, Oracle and flat files, and loading it into Netezza.
- Utilized Agile Scrum methodology.
- Wrote Hive queries for data analysis to meet the business requirements.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, SQL, Cloudera Manager, Sqoop, ZooKeeper, Oozie, Java, Eclipse, Weka, R, Flume, Tableau, Apache Kafka, Hortonworks, Phoenix, Kerberos, Talend, Python, Spark, PuTTY, Lambda.
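As referenced above, a minimal sketch of the Java 1.7-to-1.8 refactoring style using lambdas and parallel streams (the record layout and class name are hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ClaimStats {

    // Java 8 style: filter, map and aggregate with lambdas on a parallel stream,
    // replacing the nested for-loops of the original Java 1.7 version.
    public static Map<String, Long> countByState(List<String> claimRecords) {
        return claimRecords.parallelStream()
                .filter(line -> line != null && !line.isEmpty())
                .map(line -> line.split(",")[0])   // state code assumed to be the first CSV field
                .collect(Collectors.groupingBy(state -> state, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("MD,100", "PA,250", "MD,75");
        System.out.println(countByState(sample));  // e.g. {MD=2, PA=1}
    }
}
```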
Confidential, Harrisburg, PA
Hadoop Developer
Responsibilities:
- Worked on evaluation and analysis of the Hadoop cluster and different big data analytic tools such as HBase and Sqoop.
- Developed MapReduce programs to perform data filtering on unstructured data (a representative sketch follows this section).
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Impala.
- Successfully loaded files into Hive and HDFS from MongoDB, Cassandra and HBase.
- Worked on classic and YARN versions of Hadoop, such as Apache Hadoop 2.0.0, Cloudera CDH4 and CDH5.
- Created and altered HBase tables on top of data residing in the data lake.
- Used the ES-Hadoop native interface to index and query data in Elasticsearch.
- Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
- Used Node.js code in AWS Lambda functions.
- Experienced with AWS Elastic Beanstalk for application deployments and worked on AWS Lambda with Amazon Kinesis.
- Installed and configured Tableau Desktop on one of the three nodes to connect to the Hortonworks Hive framework (database) through the Hortonworks ODBC connector for further analytics of the cluster. Installed, configured and troubleshot Red Hat Enterprise Linux.
- Implemented search on HDFS using operational metadata stored in ElasticSearch.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Used Spark for transforming data in storage and for fast data processing.
- Used Amazon Redshift to solve challenging database computing problems in the cloud.
- Queried and analyzed data from Cassandra for quick searching, sorting and grouping through CQL.
- Utilized Ansible with AWS Lambda, Kinesis, ElastiCache and CloudWatch Logs to automate the creation of a log aggregation pipeline on the Elasticsearch, Logstash and Kibana (ELK) stack that takes the team's logs from CloudWatch, processes them and ships them to Elasticsearch.
- Built a full service catalog system with a complete workflow using Elasticsearch, Logstash, Kibana, Kinesis and CloudWatch. Created AWS S3 buckets and restricted access to buckets and directories to specific IAM users.
- Developed and maintained HiveQL, Pig Latin scripts, Scala and MapReduce code.
- Performed virtual machine backup and recovery from a Recovery Services vault using Azure PowerShell and the Azure portal.
- Experience with core distributed computing and data mining libraries using Apache Spark.
- Used Tableau and QlikView for data visualization and report generation.
- Responsible for developing efficient MapReduce programs on the AWS cloud, such as processing claim data to detect and separate fraudulent claims.
- Extensively worked in the performance tuning of the programs, ETL Procedures and processes.
- Worked on integration of big data and cloud platforms using Talend.
- Created databases on Azure SQL Server.
- Configured the Azure SQL Server firewall for performance tuning.
- Responsible for identifying and fixing bottlenecks through performance tuning of the Netezza database.
- Responsible for managing data coming from multiple different sources.
- Familiar with Hadoop data modeling, data mining, machine learning and advanced data processing; experience optimizing ETL workflows.
- Used Sqoop to migrate data to and from HDFS and MySQL, and deployed Hive and HBase integration to perform OLAP operations on HBase data.
- Created HBase tables for random reads/writes by the MapReduce programs.
- Worked on CSV files when taking input from the MySQL database, and worked with CSV files in Weka and R code.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, SQL, Cloudera Manager, Sqoop, Oozie, Java, Eclipse, Weka, R, Apache Kafka, Tableau, Hortonworks, Amazon Kinesis.
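A minimal sketch of the kind of map-only filtering job referenced above (the ERROR marker and class name are hypothetical):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only filtering job: keeps raw log lines that contain an ERROR marker
// and drops everything else before the data is analyzed.
public class ErrorLineFilterMapper
        extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        if (line.toString().contains("ERROR")) {
            context.write(line, NullWritable.get());
        }
    }
}
```

The driver would disable the reduce phase with job.setNumReduceTasks(0) so the filtered lines are written straight back to HDFS.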
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Involved in Design, Architecture and Installation of Big Data and Hadoop ecosystem components.
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig and Hive.
- Involved in loading data from edge node to HDFS.
- Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Created Custom Hive Queries as per Business requirements to perform analysis on Marketing and Sales Data.
- Designed, documented and implemented ETL standards for developers to follow to maintain consistency in the code, and performed code reviews in an HA environment with multiple available nodes.
- Performed complex data set processing and multi-dataset operations with Pig and Hive.
- Used indexing and bucketing to improve Hive query performance.
- Used shell and Python scripts to automate daily jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Used SQL queries, stored procedures, user-defined functions (UDFs) and database triggers, with tools such as SQL Profiler and Database Tuning Advisor (DTA).
- Leveraged robust image-processing libraries written in C and C++.
- Performed joins, group-by and other operations in MapReduce using Java and Pig (a representative group-by sketch follows this section).
- Assisted in managing and reviewing Hadoop log files.
- Assisted in loading large sets of data (structured, semi-structured and unstructured).
- Managed Hadoop clusters, including adding and removing cluster nodes for maintenance and capacity needs.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Involved in writing Pig Scripts for Cleansing the data and implemented Hive tables for the processed data in tabular format.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java, SQL, Cloudera Manager, Sqoop, Eclipse, Weka, R, Apache Kafka, Storm, Web Services.
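A minimal sketch of a MapReduce group-by as referenced above (the CSV layout and class names are hypothetical):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Group-by in MapReduce: the mapper emits (region, amount) pairs and the
// reducer sums the amounts per region, mirroring a SQL GROUP BY.
public class SalesByRegion {

    public static class RegionMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed CSV layout with no header row: region,product,amount
            String[] fields = line.toString().split(",");
            if (fields.length == 3) {
                context.write(new Text(fields[0]),
                              new LongWritable(Long.parseLong(fields[2].trim())));
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text region, Iterable<LongWritable> amounts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable amount : amounts) {
                total += amount.get();
            }
            context.write(region, new LongWritable(total));
        }
    }
}
```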
Confidential, Ridgeland
Hadoop Engineer / Java Developer
Responsibilities:
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements.
- Used Sqoop to import and export data to and from MySQL.
- Involved in processing unstructured health care records using Pig.
- Integrated health care entities including nursing facilities and hospitals.
- Experience in importing and exporting terabytes of data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization and multithreading (a representative sketch follows this section).
- Implemented JavaScript for client-side validations.
- Designed and developed user interface static and dynamic web pages using JSP, HTML and CSS.
- Involved in generating screens and reports in JSP, Servlets, HTML, and JavaScript for the business users.
- Wrote Hive queries for joining multiple tables based on business requirements.
- Used complex data types like bags, tuples and maps in Pig for handling data.
- Developed multiple MapReduce Jobs in java for data cleaning and pre-processing.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Experience in implementing data transformation and processing solutions (ETL) using Hive.
- Created Oozie workflow jobs for MapReduce, Hive and Sqoop actions.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Loaded files into HDFS and wrote Hive queries to process the required data.
- Loaded data into Hive tables and wrote queries to process it.
- Involved in loading data from the Linux file system into HDFS.
- Developed Java MapReduce jobs.
- Good knowledge of NoSQL databases, particularly HBase.
- Proficient in adapting to the new Work Environment and Technologies.
- Experience in managing and reviewing Hadoop log files.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Java/J2EE, SQL, Cloudera Manager, Sqoop, Eclipse, Weka, R.
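A minimal sketch of a core Java business component using collections and multithreading as referenced above (the validation rule and class name are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative business component: validates health care record IDs in parallel
// using a fixed thread pool and collects results in a thread-safe map.
public class RecordValidator {

    private final Map<String, Boolean> results = new ConcurrentHashMap<>();

    public Map<String, Boolean> validateAll(List<String> recordIds) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String id : recordIds) {
            pool.submit(() -> results.put(id, isValid(id)));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        return results;
    }

    // Placeholder rule: a real implementation would check required fields, codes, etc.
    private boolean isValid(String recordId) {
        return recordId != null && !recordId.trim().isEmpty();
    }
}
```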
Confidential
Hadoop Developer
Responsibilities:
- Installation, configuration and upgrade of the Solaris and Linux operating systems.
- Experience in Extraction, Transformation, and Loading (ETL) of data from multiple sources like Flat files, XML files, and Databases.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Involved in creating Hive tables and loading and analyzing data using hive queries.
- Developed Simple to complex MapReduce Jobs using Hive and Pig.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Installed and configured Hive and wrote Hive user-defined functions (a representative UDF sketch follows this section).
- Loaded and transformed large sets of structured, semi-structured and unstructured data using MapReduce programming.
- Used Sqoop import and export functionality to handle large data set transfers between the DB2 database and HDFS.
- Wrote Hive JOIN queries.
- Used Flume to stream the data and load it into the Hadoop cluster.
- Created MapReduce programs to process the data.
- Used Sqoop to move structured data from MySQL to HDFS, Hive, Pig and HBase.
- Used Pig predefined functions to convert fixed-width files to delimited files.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Linux, Core Java, Scala, MySQL, Teradata.
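A minimal sketch of a Hive user-defined function as referenced above (the function name and normalization rule are hypothetical):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that normalizes free-text codes to upper case with
// surrounding whitespace removed.
public class NormalizeCode extends UDF {

    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

It would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode'; and then called from HiveQL like any built-in function.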
Confidential
Java Developer
Responsibilities:
- Competent in using XML web services via SOAP to transfer data to supply chain and domain monitoring systems.
- Used the Maven build tool for building JAR files.
- Used the Hibernate framework (ORM) to interact with the database.
- Knowledge of the Struts Tiles framework for layout management.
- Worked on the design, analysis, development and testing phases of the application.
- Developed named HQL queries and Criteria queries for use in the application (a representative sketch follows this section).
- Developed user interface using JSP and HTML.
- Used JDBC for the Database connectivity.
- Involved in projects utilizing Java, Java EE web applications in the creation of fully-integrated client management systems.
- Consistently met deadlines as well as requirements for all production work orders.
- Executed SQL statements for searching contractors based on criteria.
- Developed and integrated the application using the Eclipse IDE.
- Developed JUnit tests for server-side code.
- Involved in multi-tiered J2EE design utilizing Spring (IoC) architecture and Hibernate.
- Involved in the development of front-end screens using technologies like JSP, HTML, AJAX and JavaScript.
- Configured Spring-managed beans.
- Used the Spring Security API to configure security.
- Investigated, debugged and fixed potential bugs in the implementation code.
Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, Java Script, PL/SQL, Junit, AJAX, HQL, JSP, HTML, JDBC, Maven, Eclipse.
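A minimal sketch of a named HQL query as referenced above, assuming Hibernate 5 with JPA annotations (the entity, fields and query name are hypothetical):

```java
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedQuery;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

// Named HQL query declared on the entity and executed through a Hibernate Session.
@Entity
@NamedQuery(name = "Contractor.findByRegion",
            query = "from Contractor c where c.region = :region order by c.name")
public class Contractor {

    @Id
    private Long id;
    private String name;
    private String region;

    // getters and setters omitted for brevity

    public static List<Contractor> findByRegion(SessionFactory factory, String region) {
        Session session = factory.openSession();
        try {
            return session.createNamedQuery("Contractor.findByRegion", Contractor.class)
                          .setParameter("region", region)
                          .list();
        } finally {
            session.close();
        }
    }
}
```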