Hadoop Developer Resume
Bentonville, AR
EXPERIENCE SUMMARY:
- 6+ years of total IT experience.
- 3.5+ years of hands-on experience with the Hadoop ecosystem and Big Data components, including Apache Spark, Python 2.7, shell scripting, HDFS, YARN, Sqoop, Hive, MapReduce, Kafka, and HBase.
- Excellent experience in Oracle SQL, PL/SQL, and data modeling in data mart creation, database creation, and migration projects.
- Actively worked in data warehouse environments.
- Excellent experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, and Sqoop.
- Created a tool to calculate data statistics using Apache Spark 2.0; the tool read Parquet files and generated output in JSON format (a sketch follows this summary).
- Performance-tuned the Spark 2.0 data profiler by adjusting its configuration parameters.
- Experience with Cloudera distributions (5.6, 5.7, 5.8) and the Hortonworks platform.
- Developed analytical components using Spark and Spark Streaming.
- Hands-on experience using Spark RDDs, DataFrames, Datasets, Spark SQL, and pair RDDs.
- Experience with Structured Streaming in Apache Spark and near-real-time streaming using Kafka (see the streaming sketch after this summary).
- Hands-on experience importing and exporting data with Sqoop, from HDFS to relational database systems (RDBMS) and from RDBMS to HDFS.
- Hands-on experience benchmarking and performance-tuning Hive queries using partitioning, bucketing, and map-side joins (see the HiveQL sketch after this summary).
- Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
- Worked as a data modeler for data mart design in a data warehousing environment.
- Worked as a data modeler on database creation for data migration.
- Expertise in working with Apache Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Strong experience analyzing large volumes of data using HiveQL.
- Large-scale batch and stream processing using Apache Spark.
- Handled partitioning of large tables in Hive and optimized query performance using Tez, vectorization, and bucketing.
- Hands-on experience setting up workflows with the Apache Oozie workflow engine to manage and schedule Hadoop jobs; also used other job schedulers such as Autosys and cron.
- Built real-time data solutions using HBase to handle large data volumes.
- Expertise in handling file formats (SequenceFile, RCFile, ORC, text/CSV, Avro, Parquet) and analyzing them using HiveQL.
- Used the cron scheduler along with Autosys to schedule and monitor Spark jobs.
- Ingested clickstream log data from multiple sources using Kafka, applying transformations before landing it in HDFS and HBase.
- Recursively copied data from S3 buckets.
- Experienced with the Java API and REST interface for accessing HBase data.
- Experience in object-oriented analysis and design (OOAD) and software development using UML methodology; good knowledge of J2EE design patterns.
- Good understanding of XML methodologies (XML, XSL, XSD) including Web Services and SOAP.
- Experience with build tools such as Maven within Eclipse.
- Hands-on experience in UNIX shell scripting.
- Expertise in writing efficient SQL and PL/SQL procedures, functions, packages, triggers, and collections, plus advanced PL/SQL, dynamic SQL, Oracle analytic functions, and performance tuning.
- Experience in effort estimation, scheduling, project planning, execution, management, and closure.
- Strong leadership, conflict resolution, communication and facilitation skills.
- Strong process orientation and client interaction capabilities.
- Extensive exposure to all stages of software development, having worked in both Waterfall and Agile models.
- Sound understanding of continuous integration & continuous deployment environments.
- Strong exposure to Data Management, Governance and Controls functions.
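A minimal PySpark sketch of the kind of data profiler described above, assuming Spark 2.x; the input/output paths and the choice of statistics are illustrative, not the original tool's internals:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("data-profiler").getOrCreate()

df = spark.read.parquet("/data/input/events")  # hypothetical input path

# One row of per-column statistics: non-null count, null count, distinct count.
exprs = []
for c in df.columns:
    exprs.append(F.count(F.col(c)).alias(c + "_non_null"))
    exprs.append(F.sum(F.col(c).isNull().cast("long")).alias(c + "_nulls"))
    exprs.append(F.countDistinct(F.col(c)).alias(c + "_distinct"))

stats = df.agg(*exprs)

# Emit the single summary row in JSON format (hypothetical output path).
stats.coalesce(1).write.mode("overwrite").json("/data/output/profile")
```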
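A minimal Structured Streaming sketch of the Kafka-to-HDFS ingestion pattern mentioned above, assuming the spark-sql-kafka connector is on the classpath; broker, topic, and paths are placeholders:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

# Read the clickstream topic from Kafka (placeholder broker and topic).
clicks = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()
          # Light transformation: decode the message payload to a string.
          .select(F.col("value").cast("string").alias("raw"),
                  F.col("timestamp")))

# Land the records on HDFS as Parquet (placeholder paths).
query = (clicks.writeStream
         .format("parquet")
         .option("path", "/data/clickstream")
         .option("checkpointLocation", "/data/clickstream_ckpt")
         .start())
query.awaitTermination()
```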
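A small HiveQL illustration of the partitioning and bucketing techniques listed above; the table and column names are hypothetical:

```sql
-- Hypothetical table illustrating partitioning and bucketing in Hive.
CREATE TABLE sales (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(12,2)
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Partition pruning: only the named partition is scanned.
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date = '2016-01-01'
GROUP BY customer_id;
```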
TECHNOLOGY EXPERIENCE:
Technology: Apache Spark, Hadoop Ecosystem, Oracle, Teradata, Visual Basic 6.0
Programming Languages: Oracle PL/SQL, DB2 SQL, Python, Java, C, C++, Java SE, XML, JSP/Servlets, HTML
Hadoop Platforms: Cloudera (5.6, 5.7, 5.8)
Databases: Oracle, DB2, Teradata, AWS RDS, Postgres
Big Data Ecosystem: Spark, HDFS, MapReduce, Hive, Sqoop, Kafka, HBase, Python pandas
Cluster Management Tools: Cloudera Manager, Hadoop Security Tools, Hortonworks
Scripting Languages: HTML, XML, Python, Shell
Operating Systems: Windows Vista/XP/NT/2000, UNIX, Mac OS
Methodologies: Agile, Waterfall, Lean, Edge
Software Applications: Application Software, E-Commerce Software, Database Systems, Web Portal Software, Data Warehousing
Version Control Tools: GitHub, VSS, Perforce, SVN, TFS
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Bentonville, AR
Responsibilities:
- Handled importing data from sources such as Teradata, Oracle, DB2, SQL Server, and Greenplum using DataStage, Sqoop, and TPT; performed transformations using Hive and loaded the data into HDFS (a Sqoop sketch follows this role).
- Handled exporting data from HDFS back to databases such as Teradata and Greenplum using Sqoop and TPT.
- Created Hive tables and wrote multiple Hive queries to load them for analysis of market data coming from distinct sources.
- Participated in requirements gathering and documented business requirements by conducting workshops and meetings with various business users.
- Prepared sprint plans (Agile methodology) for each implementation task.
- Created extensive SQL queries for data extraction to test data against databases such as Oracle, Teradata, and DB2.
- Prepared design flows for DataStage objects to pull data from upstream applications, apply the required transformations, and load it into downstream applications.
- Collaborated with business analysts to clarify application requirements.
- Followed procedures and standards set by the project.
- Performed structured application code reviews and walkthroughs.
- Identified the technical cause and potential impact of errors and implemented coding or configuration changes.
- Created and updated application documentation as code changes were applied.
- Participated in pre- and post-implementation support activities.
Environment: UNIX shell scripting, Python, Hadoop, HDFS, Hive, Sqoop, Oracle, Teradata, DB2, CA7
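A minimal sketch of the Sqoop import/export pattern used in this role, assuming the Teradata JDBC driver is available to Sqoop; host, schema, table, and user names are placeholders, and the DataStage and TPT pieces are not shown:

```sh
# Import a Teradata table into HDFS (placeholder host and credentials).
sqoop import \
  --connect jdbc:teradata://td-host/database=sales_db \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /data/staging/orders \
  --num-mappers 8

# Export the transformed HDFS data back to Teradata.
sqoop export \
  --connect jdbc:teradata://td-host/database=sales_db \
  --driver com.teradata.jdbc.TeraDriver \
  --username etl_user -P \
  --table ORDERS_SUMMARY \
  --export-dir /data/curated/orders_summary
```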
Hadoop Developer
Confidential, Conway, AR
Responsibilities:
- Installed multiple cluster nodes on the Cloudera platform with the help of the admin team.
- Ingested data received from various database providers onto HDFS using Sqoop for analysis and data processing.
- Ingested data from various file systems into HDFS using UNIX command-line utilities.
- Worked with Hive, the NoSQL database HBase, and Sqoop for analyzing big data on the Hadoop cluster.
- Built data import and export jobs to copy data to and from HDFS using Sqoop.
- Defined job flows on EC2 servers to load and transform large sets of structured, semi-structured, and unstructured data.
- Implemented NoSQL databases, first Cassandra and later HBase, and monitored the management of other tools and processes running on YARN.
- Wrote and implemented Apache Pig scripts to load data from, and store data into, Hive.
- Wrote Hive UDFs to extract data from staging tables and analyzed web log data using HiveQL.
- Created Hive tables, loaded data, and wrote Hive queries (which run as MapReduce jobs on the backend), applying partitioning and bucketing where required.
- Used UML for data-flow design for testing and filtering data.
- Used ZooKeeper for various types of centralized configuration.
- Tested the data coming from the source before processing.
- Tested and resolved critical problems faced in the project.
- Managed Hadoop log files using Kafka.
- Designed Oozie jobs for automated processing of similar data (a skeletal workflow sketch follows this role).
- Provided design recommendations and thought leadership to stakeholders that improved review processes and resolved technical problems.
- Debugged technical issues and resolved errors.
Environment: Java 8, Eclipse, Hadoop, Hive, HBase, Cassandra, Linux, MapReduce, HDFS, Oozie, Shell Scripting, MySQL.
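A skeletal Oozie workflow of the sort described above, running a single Hive step; the workflow name, script, and property names are illustrative:

```xml
<!-- Skeletal Oozie workflow: run one Hive script, then finish.
     Names and paths are illustrative. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="daily-processing">
  <start to="hive-step"/>
  <action name="hive-step">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>process_logs.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive step failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```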
Confidential
Java Developer
Responsibilities:
- As a developer, did most of the work in core Java, writing algorithms for problem solving.
- Involved in design, development, object-oriented analysis, and testing of the application.
- Used CloudFront to deliver data from AWS edge locations to users, further reducing load on front-end servers.
- Developed ActionServlet, ActionForm, Action, and JavaBeans classes using the Struts framework.
- Used JavaScript for client-side validations in the JSP and HTML pages.
- Enhanced debugging and troubleshooting skills.
- Used IBM RAD 7 as the IDE to develop the application and JIRA for bug and issue tracking.
- Used Subversion for software configuration management and version control.
- Worked in a team of 8 people, delivering tasks and monitoring team progress through JIRA.
- Mainly developed applications using Java and J2EE, applying mostly Factory, Singleton, and Prototype patterns in the solutions.
- Used SOA (Spring WS) to implement third-party services.
- Created servlets to redirect to the appropriate JSPs in the application, as part of the MVC pattern.
- Deployed the application on WebSphere Application Server.
- Prepared manual test cases to test the application against requirements and specifications.
- Conducted UAT for the time-collection software with the team during the release.
Environment: Java, JSP, HTML, CSS, XML, Subversion, AWS, Servlets, EJB, Maven, WebSphere Application Server 6.1, Web services, JIRA, JUnit, RAD 7.