Sr. Hadoop Developer Resume
Atlanta, GA
SUMMARY
- Over 8 years of experience in the IT industry, including 5+ years developing large-scale applications using Hadoop and other Big Data tools.
- Expertise in HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, and other Hadoop ecosystem components.
- Experience analyzing the different types of data that flow from data lakes into Hadoop clusters.
- Hands-on experience with Hadoop core and ecosystem components (HDFS, MR1, YARN, Hive, Impala, Beeline, Sqoop, Flume, Oozie, ZooKeeper, and Pig).
- In-depth knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode.
- Experience organizing large datasets according to business requirements using Hive indexing, partitioning, and bucketing.
- Experience working with Cloudera Distribution of Hadoop.
- Experience transforming data across HDFS, Hive, Pig, and MySQL.
- Experience creating UDFs and UDAFs for Hive and Pig.
- Streamed log files with low latency using Flume and managed the downstream flow of data into the Hadoop ecosystem and its analysis segments.
- Experience in converting MapReduce applications to Spark.
- Experience presenting analyzed data in visual formats using Tableau.
- Good working experience using Sqoop to import data from RDBMS into HDFS and to export it back.
- Good knowledge of job scheduling and workflow design tools such as Oozie.
- Good experience building real-time data streaming solutions using Apache Spark, Spark Streaming, Apache Storm, Kafka, and Flume.
- Experience developing Spark programs in Python to compare the performance of Spark with Hive and SQL/Oracle.
- Skilled at building informative dashboards for Business Intelligence teams.
- Experience with design patterns, Java, Servlets, JSP, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, XML, WebLogic, JBoss 4.2.3, SQL, PL/SQL, JUnit, Apache Tomcat, and Linux.
- Expertise in relational databases such as Oracle, MySQL, and SQL Server.
- Experience with Agile methodologies.
- Strong communication skills, with the ability to lead a team and keep it motivated.
- Extensive experience with Java-compliant IDEs such as Eclipse.
- Adept at guiding the team through difficult situations and delivering quality output.
- Highly motivated and versatile team player with the ability to work independently and adapt quickly to emerging technologies.
TECHNICAL SKILLS
Programming Languages: C++, Java, Python, Scala
Hadoop/Big Data Stack: Hadoop, HDFS, MapReduce, Hive, Pig, Spark Streaming, Scala, Kafka, Storm, ZooKeeper, YARN, Spark, Sqoop, Flume
Hadoop Distributions: Cloudera, Hortonworks
Query Languages: HiveQL, SQL, PL/SQL, Pig Latin
Web Technologies: Java, J2EE, Struts, Spring, JSP, Servlet, JDBC, EJB, JavaScript
Frameworks: MVC, Struts, Spring, Hibernate
IDEs: Eclipse, IntelliJ IDEA
Build Tools: Ant, Maven, SBT, DataStage 7.5, QualityStage 7.5
Databases: Oracle, MySQL, MS Access, DB2, Teradata
NoSQL: HBase, Cassandra
Operating Systems: Windows, Linux, Unix, CentOS
Scripting Languages: Shell scripting
Version Control Systems: SVN, Git, CVS
PROFESSIONAL EXPERIENCE
Confidential, Atlanta, GA
Sr. Hadoop Developer
Responsibilities:
- Imported data from various sources, performed transformations using Spark, and loaded the results into Hive.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in data acquisition, data pre-processing, and data exploration for a retail project.
- As part of data acquisition, used Sqoop and Flume to ingest data from servers into Hadoop, using Sqoop incremental imports.
- Involved in the complete Big Data flow of the application, from ingesting upstream data into HDFS to processing and analyzing it in HDFS.
- In the pre-processing phase, used Spark to remove records with missing data and to transform data to create new features.
- In the data exploration stage, used Hive and Impala to gain insights into the customer data.
- Used Flume, Sqoop, Hadoop, Spark, and Oozie to build the data pipeline.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules.
- Used Scala to write the code for all Spark use cases.
- Used Slick to query and store data in the database in idiomatic Scala, leveraging the Scala collections framework.
- Used the Scala collections framework to store and process complex consumer information. Based on the offers set up for each client, requests were post-processed and matched to offers.
- Processed requests using Scala collections and persisted them to the database synchronously.
- Explored various Spark modules and worked with DataFrames, RDDs, and SparkContext.
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
- Performed map-side joins on RDDs.
Environment: Hadoop, Big Data, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, Eclipse, Cloudera Distribution of Hadoop, Windows, and UNIX Shell Scripting.
Confidential, Chevy Chase, MD
Hadoop Developer
Responsibilities:
- Involved in gathering and analyzing business requirements and designing the Hadoop stack to meet them.
- Imported and exported data into and out of HDFS and Hive using Sqoop and Kafka.
- Developed system components, such as Hadoop processes involving MapReduce and Hive.
- Developed an interface for validating incoming data in HDFS before kicking off the Hadoop process.
- Wrote optimized Hive queries using user-defined functions and customized Hadoop shuffle and sort parameters.
- Transformed data using Pig scripts.
- Worked on various Hive optimization and performance tuning techniques.
- Worked on ingesting logs into Hadoop using Flume and Kafka.
- Processed logs using Spark Streaming and loaded them into Hive tables.
- Created shell scripts to ingest files from the edge node into HDFS.
- Created MapReduce jobs for processing the data.
- Worked extensively with Hive, Sqoop, MapReduce, shell scripting, Pig, and Python.
- Used Sqoop to move structured data from MySQL into HDFS, Hive, and Pig.
- Used Hive SerDes to read and write data in different formats.
- Created, dropped, and altered Hive tables at run time without blocking updates and queries.
- Involved in the design, architecture, and installation of Big Data and Hadoop ecosystem components.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Automated Hadoop jobs using the Oozie scheduler.
- Wrote aggregation logic in various combinations to perform complex data analytics for business needs.
Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, Sqoop, Kafka, Spark 1.6
Confidential, Overland Park, KS
Hadoop Developer
Responsibilities:
- Understood business requirements and helped prepare design documents according to client requirements.
- Analyzed Teradata procedures to document the information for each individual query.
- Developed Hive queries according to business requirements.
- Developed Hive UDFs for functionality not covered by Hive's built-in functions.
- Developed a UDF for converting data from Hive tables to JSON format as per client requirements.
- Implemented dynamic partitioning and bucketing in Hive as part of performance tuning.
- Implemented workflow and coordinator files using the Oozie framework to automate tasks.
- Involved in unit, integration, and system testing.
- Prepared unit test case documents and flow diagrams for all scripts used in the project.
- Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Transformed unstructured data into structured data using Pig.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Designed and developed Pig Latin scripts to process data in batches for trend analysis.
- Cross-checked data loaded into Hive tables against the source data in Oracle.
- Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
- Stored, processed, and analyzed large datasets to derive valuable insights.
Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, Eclipse, PL/SQL, and UNIX Shell Scripting.
Confidential
DataStage Developer
Responsibilities:
- Worked with different partitioning methods (Round Robin, Entire, Hash by Field, Modulus, and Range) and collection methods (Round Robin, Ordered, and Sort Merge) in processing sequential data.
- Involved in production automation and maintenance, and responsible for troubleshooting issues that arose while upgrading DataStage.
- Developed jobs in DataStage Designer to extract data from various operational sources such as flat files, CSV files, and delimited files, and performed business operations on the data such as cleansing, transforming, and loading (initial/incremental) into the target data warehouse.
- Extensively used the import and export utilities in DataStage Manager to import metadata and to create new categories and data elements.
- Involved in creating dimensional data models using the data modeling tool Erwin.
- Designed the target schema definition and the Extraction, Transformation and Loading (ETL) processes using DataStage.
- Mapped data items from source systems to the target system.
- Used DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data mart database.
- Stages used include Data Set, Sort, Join, Merge, Lookup, Filter, Modify, Aggregator, Column Generator, ODBC, and Oracle stages.
- Worked on programs for scheduling data loading and transformations using DataStage from legacy systems and the data warehouse to Oracle 10g.
- Created parallel jobs.
- Used DataStage Director and its run-time engine to schedule the solution, test and debug its components, and monitor the resulting executables.
Environment: DataStage 7.5, QualityStage 7.5, Designer, Director, Manager, Oracle 9i, PL/SQL, AIX, Toad, QTP 7.0, WinRunner, TestDirector 7.5, XML, XSLT, SQL, Test Cases, Test Scripts, Test Plan, Traceability Matrix.
Confidential
Jr. Java Developer
Responsibilities:
- Involved in the design, coding, and testing phases of the Software Development Life Cycle (SDLC).
- Involved in analysis, design, and coding in a Java environment.
- Coordinated requirements gathering, change requests, and feature prioritization.
- Designed application components using Java Collections and provided concurrent database access using multithreading.
- Provided XML and JSON response formats to support various service clients.
- Implemented design patterns such as MVC and Singleton under the J2EE architecture.
- Followed coding best practices such as removing unnecessary casting, using generics with HashMap, caching, Collections utilities (sort, Comparator, List, Set), and design patterns.
- Prepared test plans and test cases.
- Developed the UI using HTML5, CSS, JavaScript, and AngularJS.
- Used connection pooling to obtain JDBC connections and access database procedures.
- Implemented design patterns such as Filter, Cache Manager, and Singleton to improve application performance.
- Implemented the application's reports module using JasperReports to display dynamically generated reports for business intelligence.
- Deployed the application at the client's location on a Tomcat server.
Environment: Java 1.4, JDBC, Oracle 9i (SQL and PL/SQL), JavaBeans, EJB (Session Beans, Entity Beans, and JMS), Windows, UNIX.
