
Hadoop Developer Resume


CO

SUMMARY

  • 8+ years of experience in the IT industry, including 5 years in Big Data Analytics, Hadoop, Java, database administration, and software development.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Involved in designing and deploying a multitude of applications utilizing the entire AWS stack (including EC2, RDS, VPC, and IAM), focusing on high availability, fault tolerance, and auto-scaling.
  • Experienced in real-time Big Data solutions using HBase, handling billions of records.
  • Hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, ZooKeeper, Couchbase, Storm, Solr, Oozie, Spark, Scala, Flume, and Kafka.
  • Skilled in developing applications in Python language for multiple platforms.
  • Experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java and Scala.
  • Integrated different data sources and performed data wrangling: cleaning, transforming, merging, and reshaping data sets by writing Python scripts.
  • Hands-on experience in installing and configuring Cloudera's Apache Hadoop ecosystem components such as Flume-ng, HBase, ZooKeeper, Oozie, Hive, Spark, Storm, Sqoop, Kafka, Hue, and Pig on CDH3 and CDH4 clusters.
  • Architected, designed, and maintained high-performing ELT/ETL processes.
  • Experienced in loading data into Hive partitions and creating buckets in Hive (see the sketch after this list).
  • Skilled in managing and reviewing Hadoop log files.
  • Experienced in configuring Flume to stream data into HDFS.
  • Familiarity with distributed coordination system Zookeeper.
  • Good knowledge of building Apache Spark applications using Scala.
  • Experience in developing and designing POCs using Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
  • Experienced in all phases of the SDLC (Analysis, Design, Development, Integration, and Testing) in diversified areas of client-server/enterprise applications using Java and J2EE technologies.
  • Experienced in Administration, installing, upgrading, and managing distributions of Cassandra.
  • Excellent understanding of relational databases such as MySQL and Oracle, and of NoSQL databases (HBase, MongoDB, Couchbase, and Cassandra).
  • Good working experience with Java, JDBC, Servlets, and JSP.
  • Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Struts, Spring, Hibernate, JAXB, JSON, XML, XSLT, XSD, JMS, WSDL, WADL, REST, SOAP web services, CXF, Groovy, Grails, Jersey, Gradle, and EclipseLink.
  • Good knowledge of integrating various data sources such as RDBMS, spreadsheets, text files, JSON, and XML files.
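
As a hedged illustration of the Hive partitioning and bucketing experience above, the sketch below creates and loads a partitioned, bucketed Hive table through Spark SQL in Scala. It is a minimal example, not code from any project on this resume; the table and column names are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HivePartitionDemo {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport lets Spark SQL create and load tables in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("HivePartitionDemo")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical table: partitioned by state, bucketed by customer_id.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS customers (
            customer_id BIGINT,
            name        STRING
          )
          PARTITIONED BY (state STRING)
          CLUSTERED BY (customer_id) INTO 16 BUCKETS
          STORED AS ORC
        """)

        // Dynamic partitioning routes each row to its partition by the value
        // of the trailing state column instead of a hard-coded literal.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql("""
          INSERT INTO TABLE customers PARTITION (state)
          SELECT customer_id, name, state FROM staging_customers
        """)

        spark.stop()
      }
    }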

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop 2.1, HDFS, MapReduce, Pig 0.8, Hive 0.13, HBase 0.94, Sqoop 1.4.4, Zookeeper 3.4.5, Storm, YARN, Spark Streaming, Spark SQL, Kafka, Scala, Cloudera CDH3, CDH4, Hortonworks, Oozie, Flume, Impala, Talend, Tableau/QlikView

Hadoop Management & Security: Hortonworks Ambari, Cloudera Manager, Kafka

NoSQL Databases: MongoDB, HBase, Redis, Couchbase and Cassandra

Web Technologies: DHTML, HTML, XHTML, XML, XSL (XSLT, XPATH), XSD, CSS, JavaScript, Servlets, SOAP, Amazon AWS

Server-Side Scripting: UNIX Shell Scripting

Database: Oracle 11g/10g/9i/8i, MS SQL Server 2012/2008, DB2 v8.1, MySQL, Teradata

Programming Languages: Java, J2EE, JSTL, JDBC 3.0/2.1, JSP 1.2/1.1

Scripting Languages: Python, Perl, Shell Scripting, JavaScript, Scala

OS/Platforms: Windows 7/2008/Vista/2003/XP/2000/NT, Macintosh, Linux (all major distributions, mainly CentOS and Ubuntu), Unix

Client side: JavaScript, CSS, HTML, jQuery

Build tools: Maven and Ant

Methodologies: Agile, UML, Design Patterns, SDLC

Tools: FileZilla, PuTTY, MySQL Workbench, ETL, DWH, JUnit, Oracle SQL Developer

Office Tools: MS Office - Excel, Word, PowerPoint

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Responsibilities:

  • Ensured performance and reliability of data processes.
  • Defined and implemented data stores based on system and consumer requirements.
  • Documented and tested data processes, including performing thorough data validation and verification.
  • Collaborated with cross-functional teams to resolve data quality and operational issues and ensure timely delivery of products.
  • Used Flume to collect, aggregate, and store dynamic web log data from different sources like web servers, mobile devices and pushed to HDFS.
  • Stored data in HBase for fast updates and provided key-based access to specific data.
  • Installed, configured, monitored, and maintained Hadoop cluster on Big Data platform.
  • Configured ZooKeeper and worked on Hadoop High Availability with the ZooKeeper failover controller, adding support for a scalable, fault-tolerant data solution.
  • Developed SQL queries to extract data for analysis and model construction.
  • Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and compressed file formats.
  • Used Pig UDFs to do data manipulation, transformations, joins and some pre-aggregations.
  • Created multiple Hive tables, implemented partitioning, dynamic partitioning, and buckets in Hive for efficient data access.
  • Extracted files from Cassandra and MongoDB through Sqoop and placed in HDFS and processed.
  • Configured Spark to optimize data processing.
  • Applied Spark MLlib to build statistical models for classification and prediction (see the sketch after this list).
  • Worked on Oozie workflow engine for job scheduling.
  • Created HDFS Snapshots to do data backup, protection against user errors and disaster recovery.
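
A minimal, hedged sketch of the MLlib work above: the Scala program below trains a logistic regression classifier with Spark's DataFrame-based ML pipeline API (a newer API than the Spark 1.1.1 listed in this environment). The input path and column names are hypothetical, not taken from the actual project.

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.sql.SparkSession

    object ChurnClassifier {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChurnClassifier").getOrCreate()

        // Hypothetical training data: numeric feature columns plus a 0/1 label column.
        val df = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/churn/training.csv")

        // MLlib expects the features packed into a single vector column.
        val assembler = new VectorAssembler()
          .setInputCols(Array("age", "tenure", "monthly_usage"))
          .setOutputCol("features")
        val prepared = assembler.transform(df)

        val Array(train, test) = prepared.randomSplit(Array(0.8, 0.2), seed = 42L)

        // Fit a logistic regression model, then predict on the held-out split.
        val model = new LogisticRegression()
          .setLabelCol("label")
          .setFeaturesCol("features")
          .fit(train)

        model.transform(test).select("label", "prediction").show(10)
        spark.stop()
      }
    }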

Environment: Hadoop 2.4.x, HDFS, MapReduce 2.4.0, YARN 2.6.2, Pig 0.14.0, Hive 0.13.0, HBase 0.94.0, Sqoop 1.99.2, Flume 1.5.0, Oozie 4.0.0, Zookeeper 3.4.2, Cassandra, MongoDB, Spark 1.1.1, Kafka 0.8.1.

Confidential, CO

Hadoop Developer

Responsibilities:

  • Hadoop development and implementation (environment: HDFS, HBase, Spark, Kafka, Oozie, Sqoop, Flume, Kerberos, Oracle ASO, MySQL).
  • Used Oozie scheduler to submit workflows.
  • Extracted BSON files from MongoDB, placed them in HDFS, and processed them.
  • Loaded data from disparate data sets using the Hadoop stack of ingestion and workflow tools.
  • Performed pre-processing using Hive and Pig.
  • Responsible for using Oozie to control workflow.
  • Involved in loading data from Teradata, Oracle database into HDFS using Sqoop queries.
  • Worked on setting up Kafka for streaming data and monitoring the Kafka cluster (see the sketch after this list).
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Worked on shell scripting in Linux on the cluster; used shell scripts to run Hive queries from Beeline.
  • Developed scripts and automated end-to-end data management and synchronization between all the clusters.
  • Worked with the Hue GUI to schedule jobs and to perform file browsing, job browsing, and metastore management.
  • Used Flume to collect log data from different sources and transferred it to Hive tables using different SerDes, storing it in JSON, XML, and SequenceFile formats.
  • Loaded files into Hive and HDFS from MongoDB and Solr.
  • Developed Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
  • Created partitions and buckets based on state to enable further processing with bucket-based Hive joins.
  • Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews, test development, test automation.
  • Used Zookeeper for providing coordinating services to the cluster.
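
To illustrate the Kafka streaming setup above, here is a minimal Scala sketch using the receiver-based KafkaUtils.createStream connector from the Kafka 0.8-era Spark Streaming integration. The ZooKeeper quorum, consumer group, topic, and output path are placeholders, not values from the actual cluster.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaLogStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaLogStream")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        // Placeholder ZooKeeper quorum, consumer group, and topic map
        // (topic name -> number of receiver threads).
        val lines = KafkaUtils
          .createStream(ssc, "zk1:2181,zk2:2181", "log-consumers", Map("weblogs" -> 1))
          .map(_._2) // keep the message value, drop the key

        // Count events per batch and persist the raw lines to HDFS.
        lines.count().print()
        lines.saveAsTextFiles("hdfs:///streams/weblogs/batch")

        ssc.start()
        ssc.awaitTermination()
      }
    }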

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Spark, Sqoop, Kafka, Oozie, Big Data, Python, Java (JDK 1.6), DataStax, flat files, MySQL, Windows NT, Linux, Avro files, SQL, ETL, DWH, Cloudera Manager, Talend, Scala, MongoDB.

Confidential

Business Analyst/Big Data Analyst

Responsibilities:

  • Gathered requirements that were to be incorporated into the system.
  • Involved in data analysis for data conversion, including data mapping from source to target database schemas.
  • Responsible for preparing the use cases and for designing and developing object models, class diagrams, use case and activity diagrams with UML Specifications.
  • Developed business flow diagrams, activity/state diagrams, and sequence diagrams using MS Visio so that developers and other stakeholders could understand the business process.
  • Importing and exporting data in HDFS and Hive using Sqoop.
  • Used Oozie scheduler to submit workflows.
  • Extracted BSON files from MongoDB and placed in HDFS and processed.
  • Wrote Hive UDFs to extract data from staging tables (see the sketch after this list).
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, such as HiveQL.
  • Hands-on experience writing MapReduce code to convert unstructured data into structured data.
  • Experience in creating integrations between Hive and HBase.
  • Involved in creating Hive tables and loading them with data.
  • Designed and developed MapReduce jobs to process data coming in BSON format.
  • Worked on the POC to bring data to HDFS and Hive.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
  • Reviewed QA test cases with the QA team.
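
A hedged sketch of the Hive UDF bullet above: a simple UDF can be written in Scala by extending Hive's classic UDF base class (matching the Hive era of this project's environment). The class name and behavior are illustrative, not the actual staging-table UDFs.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF: normalizes free-text codes pulled from a staging table
    // (trim whitespace, uppercase). Hive locates evaluate() by reflection.
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

    // Registered in Hive after packaging the class into a JAR:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    //   SELECT normalize_code(raw_code) FROM staging_table;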

Environment: Hadoop 1.2.1, Java JDK 1.6, MapReduce 1.x, HBase 0.70, MySQL, MongoDB, Oozie 3.x

Confidential

Business Analyst

Responsibilities:

  • Performed analysis of business issues, provided recommendations for possible solutions, and worked with business users and IT project teams to drive decision making and define requirements for application development, ensuring that business needs were met.
  • Conducted Scope Analysis, Stakeholder Analysis, GAP Analysis, and Traceability Matrix, and used MS Visio for Business Process Modeling adopting UML standards.
  • Responsible for writing every possible scenario, called Product Backlog Indexes/Items (PBIs), in the company-standard Gherkin language, and then explaining and estimating them with the developers and the QA team.
  • Produced and presented various Use Cases, Sequence Diagrams, Flow Charts, and Activity Diagrams.
  • Created Screen Mockups and wireframes using Axure RP tool.
  • Responsible for organizing and conducting requirements elicitation meetings with various businesses and documenting them in JIRA. Collected functional requirements from clients for technology enhancement and initiatives.
  • Performed Gap Analysis by doing an in-depth study of the current process, planned implementation actions and Change Control Process.
  • Facilitated Agile Daily Standup Meetings and Product Backlog sprint meetings.
  • Responsible for verifying back-end data using SQL queries across distinct batch processes, front-end .NET applications, and logic validations.
  • Responsible for advising Project Managers on navigating risk and for anticipating business needs and solutions in advance.
  • Tracked and triaged defects and provided severity assessments to delivery leads and Release Management to have them fixed in a release.
  • Worked with the Product Owner to review the working demo of the Potentially Shippable Product Increment (PSPI) in the review meeting and helped decide which features were done and satisfied the acceptance criteria.
  • Reviewed existing stored procedures, supported application developers by tuning SQL queries for better performance.
  • Maintained project milestones, schedules, and project progress using MS Project.
  • Collaborated with the Testers for designing the Scenario, writing and execution of test cases.
  • Conducted User Acceptance Testing and verified performance, reliability, and fault tolerance.

Environment: MS Visio, JIRA, Microsoft Word, Excel, Power Point, Axure RP, Microsoft Project.
