Hadoop Developer Resume
Bentonville, AR
PROFESSIONAL SUMMARY:
- 6+ years of professional IT experience, including over 4 years of Hadoop/Spark experience in ingestion, storage, querying, processing, and analysis of big data.
- Proficient in installation, configuration, and the migration and upgrading of data across Hadoop MapReduce, Hive, HDFS, HBase, Sqoop, Pig, Cloudera, and YARN.
- Exposure to the design and development of database-driven systems.
- Good knowledge of Hadoop architectural components such as the Hadoop Distributed File System, NameNode, DataNode, TaskTracker, JobTracker, and MapReduce programming.
- Experience in developing and deploying applications using Hadoop-based components such as Hadoop MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, HBase, Flume, Sqoop, Spark (Streaming, Spark SQL, Spark ML), Storm, Kafka, Oozie, ZooKeeper, and Avro.
- Involved in migration projects moving data from Oracle/DB2 data warehouses to Teradata.
- Exposure to Big Data technologies and the Hadoop ecosystem, with an in-depth understanding of MapReduce and Hadoop infrastructure.
- Experience in writing MapReduce jobs using Python, Pig, Hive for data processing.
- Hands on experience in importing and exporting data into HDFS and Hive using Sqoop.
- Exposure to column-oriented NoSQL databases such as HBase and Cassandra.
- Extensive experience working with structured, semi-structured, and unstructured data by implementing complex MapReduce programs using design patterns.
- Excellent knowledge of multiple platforms such as Cloudera, Hortonworks, MapR etc.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing.
- Hands-on experience with major Big Data components: Apache Kafka, Apache Spark, ZooKeeper, and Avro.
- Experienced in implementing unified data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.
- Experience in architecting real-time streaming applications and batch-style large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Kafka, Flume, MapReduce, and Hive.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, MapR etc) to fully implement and leverage new Hadoop features.
- Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and a focus on business improvement.
- Experienced in the complete SDLC, including requirements gathering, design, development, testing, and production deployment.
TECHNICAL SKILLS:
Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Spark, Kafka, Apache ZooKeeper, Cloudera Manager
Data warehousing: ETL, Informatica PowerExchange, Metadata, Data Mining, SQL, OLAP, OLTP, Workflow Manager and Workflow Monitor
Real Time/Streaming Processing: Apache Storm, Apache Spark
Programming languages & Scripting: Python, Java, Linux shell scripts
Databases: MS SQL Server, Hadoop, Oracle
NoSQL Databases: HBase, Cassandra, MongoDB
Web Servers: Apache Tomcat, AWS
Web Technologies: HTML, XML, JavaScript
Monitoring Tools: Puppet, Chef, Ganglia
Operating Systems: Linux, Unix, Windows 10, 8, 7, Windows Server 2008/2003
PROFESSIONAL EXPERIENCE:
Confidential, Bentonville, AR
Hadoop Developer
Responsibilities:
- Used Sqoop jobs for importing and exporting data into HDFS and Hive.
- Used the Fair Scheduler to allocate resources in YARN.
- Responsible for managing data coming from different sources.
- Involved in creating Hive tables, loading them with data, and writing Hive queries.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked on the CI/CD pipeline, integrating code changes into the Git repository and building with Jenkins.
- Read ORC files and created DataFrames for use in Spark, as in the sketch below.
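A minimal PySpark sketch of this ORC-to-DataFrame step; the paths, app name, and column names are illustrative assumptions rather than details from the actual project.

    # Illustrative sketch only: reads ORC files from HDFS into a Spark DataFrame.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orc-to-dataframe").getOrCreate()

    # Load ORC files produced upstream into a DataFrame (assumed path)
    orders_df = spark.read.orc("hdfs:///data/landing/orders_orc/")

    # Expose the DataFrame to Spark SQL and run a simple aggregation
    orders_df.createOrReplaceTempView("orders")
    daily_totals = spark.sql(
        "SELECT order_date, SUM(amount) AS total_amount "
        "FROM orders GROUP BY order_date"
    )

    # Persist the result back to HDFS as ORC
    daily_totals.write.mode("overwrite").orc("hdfs:///data/curated/daily_totals_orc/")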
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Oozie, Jenkins, YARN, CI/CD.
Confidential
Big Data Developer
Responsibilities:
- Involved in installing Hadoop Ecosystem components (Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Flume, Zookeeper and HBase).
- Installed and configured a three-node cluster in fully distributed and pseudo-distributed modes.
- Imported and exported data (MySQL, CSV, and text files) from the local/external file system and MySQL to HDFS on a regular basis.
- Worked with structured, semi-structured, and unstructured data automated through the BigBench tool.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS (see the Spark Streaming sketch after this list).
- Worked with Spark to create structured data from the pool of unstructured data received.
- Created HBase and Cassandra tables to load huge amounts of data.
- Analyzed the data and proposed NoSQL database solutions to meet requirements.
- Installed EC2 server clusters.
- Installed and integrated Redshift with Hadoop to meet business requirements.
- Installed and maintained a Hortonworks Hadoop cluster on EC2 servers.
- Configured the Hadoop stack on EC2 servers and transferred data between S3 and EC2 instances.
- Involved in developing machine learning libraries for data analysis and data visualization.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing (a Hadoop Streaming mapper sketch follows this list).
- Assisted with data capacity planning and node forecasting.
- Involved in continuous monitoring and managing the Hadoop cluster using Hortonworks.
- Used StreamSets in the data pipeline.
- Developed optimal strategies for distributing the web log data over the cluster; importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Used Tableau for data visualization.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Loaded data from various data sources and legacy systems into the Teradata production and development warehouses using BTEQ, FastExport, MultiLoad, FastLoad, and Informatica.
- Involved in migration projects moving data from Oracle/DB2 data warehouses to Teradata.
- Developed and maintained ETL (data extraction, transformation, and loading) mappings using Informatica Designer.
- Explored prebuilt ETL metadata, mappings, and DAC metadata, and developed and maintained SQL code as needed for the SQL Server database.
- Responsible for configuring the Hadoop cluster and troubleshooting common cluster problems.
- Involved in handling issues related to cluster startup and node failures.
- Performed cluster configuration and data transfer (distcp and hftp), including inter- and intra-cluster data transfer.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
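A hedged sketch of the Kafka-to-HDFS ingestion mentioned above, written with Spark Structured Streaming (the older DStream API follows a similar pattern); broker addresses, topic name, and paths are assumptions for illustration only.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Subscribe to a Kafka topic (requires the spark-sql-kafka connector on the classpath)
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # assumed brokers
        .option("subscribe", "events")                                   # assumed topic
        .load()
    )

    # Kafka delivers the payload as bytes; cast it to a string column
    payload = events.select(col("value").cast("string").alias("raw_event"))

    # Continuously append the stream to HDFS, with checkpointing for fault tolerance
    query = (
        payload.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/streaming/events/")
        .option("checkpointLocation", "hdfs:///checkpoints/events/")
        .start()
    )
    query.awaitTermination()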
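One way the Python MapReduce cleaning jobs noted above can be structured is as a Hadoop Streaming mapper submitted with the standard hadoop-streaming jar; the tab-delimited layout and validation rules here are illustrative assumptions, not the original job's logic.

    #!/usr/bin/env python
    # Illustrative Hadoop Streaming mapper for data cleaning (assumed record format).
    import sys

    def clean(line):
        """Trim whitespace, lower-case fields, and drop malformed records."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[0]:
            return None  # skip incomplete records
        return "\t".join(f.strip().lower() for f in fields)

    if __name__ == "__main__":
        for raw in sys.stdin:
            cleaned = clean(raw)
            if cleaned is not None:
                print(cleaned)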
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Oozie, Cassandra, Python, Maven, Shell Scripting.
Confidential
Hadoop/SQL Developer
Responsibilities:
- Imported MasterCard, Base II, and Visa log files from mainframes using GoldenGate software and ingested these log files into Hive by creating Hive external tables for each type of log file.
- Wrote complex Hive and Spark SQL queries for data analysis to meet business requirements.
- Created Hive external tables to store the GoldenGate output and worked on them for data analysis to meet business requirements.
- Created HBase tables to load huge amounts of structured, semi-structured, and unstructured data coming from NoSQL and Tandem systems.
- Used ESP scheduled jobs to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Involved in Hive performance optimizations such as partitioning and bucketing, performed several types of joins on Hive tables, and implemented Hive SerDes such as JSON and Avro.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Worked with the Parquet and Avro data serialization systems to handle all data formats.
- Implemented several types of scripts (shell, Python, and HQL) to meet business requirements.
- Performed technical analysis, ETL design, development, and deployment as per business requirements.
- Involved in performing data manipulations using various Talend components.
- Developed Spark Streaming applications for real-time processing.
- Experienced in managing and reviewing Hadoop log files.
- Used the StreamSets engine to stream data in real time.
- Experienced in working with different scripting technologies such as Python and Unix shell scripts.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Designed and developed a corporate intranet used in daily workflow.
- Applied various transformations and actions in Spark SQL, such as joins and collect (see the sketch after this list).
- Drove and led solution design services, including requirements analysis, functional and technical design leadership, and documentation/review with business and IT constituents.
- Worked in Agile development environments using continuous integration and deployment.
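A minimal sketch of the Spark SQL join-and-collect pattern referenced above; the table and column names are placeholders for illustration, not the actual schema.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("spark-sql-joins")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Load two Hive tables as DataFrames (assumed table names)
    transactions = spark.table("default.transactions")
    merchants = spark.table("default.merchants")

    # Transformations: join, filter, and aggregate
    joined = (
        transactions.join(merchants, on="merchant_id", how="inner")
        .filter(transactions.amount > 0)
        .groupBy("merchant_name")
        .sum("amount")
    )

    # Action: collect pulls the aggregated result back to the driver
    for row in joined.collect():
        print(row["merchant_name"], row["sum(amount)"])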
Environment: Hadoop, HDFS, Hive, Spark, MapReduce, Cloudera, Avro, CDH, Shell script, HBase, Eclipse, Python, MySQL.
Confidential
Associate QA Engineer
Responsibilities:
- Organized material and completed editing assignments according to set standards for order, clarity, conciseness, style, and terminology.
- Systematically analyzed business processes, procedures and activities with the goal of highlighting organizational problems and recommending solutions.
- Accountable for timely escalation of process/technical issues and challenges. Communicated and liaised effectively with the US and HGC teams to establish good working relationships and improve the process.
- Designed the layout and structure of documents by screening and selecting appropriate items, identifying the company's client groups, and devising a communication plan for them.
Confidential
Content Reviewer, Quality Assurance & Bug logger
Responsibilities:
- Organized material and completed editing assignments according to set standards for order, clarity, conciseness, style, and terminology.
- Systematically analyzed business processes, procedures and activities with the goal of highlighting organizational problems and recommending solutions.
- Accountable for timely escalation of process/technical issues and challenges. Communicated and liaised effectively with the US and HGC teams to establish good working relationships and improve the process.
- Designed the layout and structure of documents by screening and selecting appropriate items, identifying the company's client groups, and devising a communication plan for them.
Confidential
Associate QA Engineer
Responsibilities:
- Understood client requirements and analyzed the SRS and use case documents.
- Wrote test cases based on the assigned user stories.
- Involved in user acceptance testing and regression testing.
- Coordinated with the development team to discuss defects and get them resolved quickly.
- Actively managed and communicated the test status to onsite and offshore leads.