Sr Hadoop Developer Resume
Utah
SUMMARY:
- Overall 7+ years of IT experience, including 5+ years in Big Data using the Hadoop framework and related technologies such as HDFS, HBase, MapReduce, Hive, Pig, Flume, MongoDB, Oozie, Sqoop, and ZooKeeper.
- Experience analyzing data using Hive, Pig Latin, HBase, and custom MapReduce programs in Java.
- Expertise in working with the Cloudera Hadoop distribution.
- Experience in building data pipelines and defining data flow across large systems.
- Deep understanding of data import and export between relational databases and Hadoop clusters.
- Experience in handling data load from Flume to HDFS.
- Experience in handling data import from NoSQL solutions like MongoDB to HDFS.
- Experience in data extraction and transformation using MapReduce jobs.
- Experience in Big Data Analytics using Cassandra, MapReduce and relational databases.
- Hands on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Zookeeper, Oozie, Hive, Sqoop, Pig, and Flume.
- Involved in Data Migration between Teradata and DB2 (Platinum).
- Experience in working with Mainframe files, COBOL files, XML, and Flat Files.
- Strong experience in Teradata development and indexes (PI, SI, partitioning, join indexes).
- Experience in data management and implementation of Big Data applications using Hadoop frameworks.
- Excellent understanding and knowledge of job workflow scheduling and locking tools/services like Oozie and Zookeeper.
- Experience in writing Map Reduce jobs using Java.
- Built real-time Big Data solutions using HBase, handling billions of records.
- Extensive experience in GUI design using JSP, JSF, the HMVC pattern, and MVC architecture, substantially reducing development time and effort.
- Experience building enterprise applications and distributed systems using J2EE, EJB 2.1/3.0, OpenEJB, RMI, JPA, IBM MQ Series, ActiveMQ, OpenJPA, JDBC, JSP, Struts, Servlets, JMS, EMS, XML, and JavaScript.
- Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling workflows with Oozie.
- Worked on classic and YARN distributions of Hadoop, including Apache Hadoop 2.0.0, Cloudera CDH4, and CDH5.
- Used IDEs and development tools such as Eclipse, NetBeans, Sun ONE Studio, WebSphere Studio 7.0/8.0, JBuilder, WebGain Business Designer/Structure Builder, and Elixir CASE, along with Visual SourceSafe for version control and Erwin for database schema design.
- Sound understanding of RDBMS concepts; worked extensively with Oracle 8i/9i/10g/11g, DB2, SQL Server 8.0/9.0/10.0/10.5/11.0, MySQL, MS Access, and Toad.
- Experienced in writing PL/SQL procedures and triggers in Oracle, and stored procedures in DB2 and MySQL.
- Generating and analyzing Teradata PDCR Capacity and Performance reports.
- Workload analysis using Teradata Priority Scheduler, TDWM, and Teradata Active System Management (TASM).
- Good knowledge of Teradata tools such as Teradata SQL Assistant, PMON, Teradata Administrator, Viewpoint, and NetVault.
- Creating Teradata roles according to user groups and granting access to the production databases.
- Creating and maintaining Teradata profiles.
- Working with Talend components such as tJavaRow, tDie, tAggregateRow, tWarn, tLogCatcher, tMysqlSCD, tFilter, tGlobalMap, etc.
- Hands on experience in in-memory data processing with Apache Spark.
- Configured and developed complex dashboards and reports in Splunk.
- Strong experience with the Hortonworks and Cloudera Hadoop distributions.
- Expert in coding Teradata SQL, Teradata stored procedures, macros, and triggers.
- Extracted data from various data sources, including OLE DB, Excel, flat files, and XML.
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra and Talend
Programming Languages: Java, C/C++, eVB, Assembly Language (8085/8086)
Scripting Languages: JSP & Servlets, PHP, JavaScript, XML, HTML, Python and Bash
Databases: Teradata, NoSQL, Oracle, MySQL
UNIX Tools: Apache, Yum, RPM
Tools: Eclipse, JDeveloper, JProbe, CVS, Ant, MS Visual Studio
Platforms: Windows (2000/XP), Linux, Solaris, AIX, HP-UX
Application Servers: Apache Tomcat 5.x/6.0, JBoss 4.0
Testing Tools: NetBeans, Eclipse, WSAD, RAD
Methodologies: Agile, UML, Design Patterns
Web Services: REST, SOAP, WebLogic
Cloud: AWS
Version Control: GIT, SVN
PROFESSIONAL EXPERIENCE:
Confidential, Utah
Sr Hadoop Developer
Responsibilities:
- Good understanding of and hands-on experience with Hadoop stack internals, Hive, Pig, and MapReduce; experience setting up clusters using Cloudera Manager.
- Migrated 160 tables from Oracle to Cassandra using Apache Spark (see the first sketch at the end of this role).
- Built the front end using Spray, an actor-based framework, which proved an excellent choice for building a RESTful, lightweight, asynchronous web service.
- Implemented various routes for the application using Spray.
- Worked on the Core, Spark SQL and Spark Streaming modules of Spark extensively.
- Used Scala to write code for all Spark use cases.
- Assigned names to the columns using Scala case classes.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms, using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
- Performed performance tuning for Spark Streaming, e.g., setting the right batch interval, the correct level of parallelism, appropriate serialization, and memory tuning.
- Involved in Spark-Cassandra data modeling.
- Performed manual and automated installation of Cloudera's distribution of Apache Hadoop (CDH3 and CDH4) environments.
- Deep understanding of schedulers, workload management, availability, scalability and distributed data platforms.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce jobs to discover trends in data usage by users.
- Involved in managing and reviewing Hadoop log files.
- Involved in running Hadoop streaming jobs to process terabytes of text data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Wrote Pig UDFs.
- Designed and developed various analytical reports from multiple data sources by blending data on a single worksheet in Tableau Desktop.
- Developed Hive queries for the analysts.
- Analyze business requirements and data sources from Excel, Oracle, SQL Server for design, development, testing, and production rollover of reporting and analysis projects within Tableau.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the second sketch at the end of this role).
- Exported result sets from Hive to MySQL using shell scripts.
- Used Zookeeper for various types of centralized configurations.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Java 1.6, UNIX Shell Scripting
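Illustrative sketch 1 (not project code): a minimal outline of the Oracle-to-Cassandra migration and the case-class column naming described above, assuming Spark 2.x with the DataStax spark-cassandra-connector and the Oracle JDBC driver on the classpath. Host names, credentials, the SALES.ORDERS table, the sales keyspace, the Transaction case class, and the HDFS path are all placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder case class, used only to show naming columns via a Scala case class.
case class Transaction(acctId: String, amount: Double, ts: Long)

object OracleToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-to-cassandra")
      .config("spark.cassandra.connection.host", "cassandra-node1") // placeholder host
      .getOrCreate()
    import spark.implicits._

    // Read one Oracle table over JDBC (all connection details are placeholders).
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", sys.env("ORACLE_USER"))
      .option("password", sys.env("ORACLE_PASSWORD"))
      .load()

    // Write the DataFrame to Cassandra through the spark-cassandra-connector.
    orders.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "sales", "table" -> "orders")) // placeholder keyspace/table
      .mode("append")
      .save()

    // Column naming via a case class: raw text records become a DataFrame
    // whose column names are taken from the case-class fields.
    val txDf = spark.sparkContext
      .textFile("/data/raw/transactions") // placeholder HDFS path
      .map(_.split(","))
      .map(f => Transaction(f(0), f(1).toDouble, f(2).toLong))
      .toDF()
    txDf.printSchema() // columns: acctId, amount, ts

    spark.stop()
  }
}
```

Such a job would typically be submitted with spark-submit, with the connector supplied via --packages or bundled into an assembly JAR.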
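Illustrative sketch 2 (not project code): dynamic-partition loading of a Hive table, expressed here through Spark's Hive support so it stays in the same Scala as the other sketches; the embedded HiveQL could equally be run from the Hive CLI or Beeline. The analytics database, the events_by_day and raw_events tables, and all column names are assumptions.

```scala
import org.apache.spark.sql.SparkSession

object EventsByDayLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("events-by-day-load")
      .enableHiveSupport() // assumes a Hive metastore is reachable from the cluster
      .getOrCreate()

    // Allow Hive to derive partition values from the SELECT instead of hard-coding them.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("""CREATE TABLE IF NOT EXISTS analytics.events_by_day (
                   user_id STRING, event_type STRING, payload STRING)
                 PARTITIONED BY (event_date STRING)
                 STORED AS ORC""")

    // Each distinct event_date value in the source becomes its own partition.
    spark.sql("""INSERT OVERWRITE TABLE analytics.events_by_day PARTITION (event_date)
                 SELECT user_id, event_type, payload,
                        CAST(to_date(event_ts) AS STRING) AS event_date
                 FROM analytics.raw_events""")

    spark.stop()
  }
}
```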
Confidential, Atlanta
Hadoop Developer
Responsibilities:
- Designed applications, contributing to all phases of the SDLC, including requirements analysis, design, coding, testing, and requirement specification documentation.
- Followed Agile methodology and brainstormed with the team to come up with various solutions.
- Used a Git repository for code merging.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Performed data ingestion from multiple internal clients using Apache Kafka.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Hands-on experience in AWS provisioning and good knowledge of AWS services such as EC2, S3, EMR, ELB, RDS, and Lambda.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Worked on Spark Streaming with Apache Kafka for real-time data processing (see the sketch at the end of this role); experienced in designing and implementing distributed data processing pipelines using Spark.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Shared responsibility for administration of Hadoop, Hive and Pig.
Environment: Apache Spark 1.6, Scala, DataFrames, Apache Kafka, SBT, Cloudera (CDH 5.8), Cloudera Manager, Hue, Apache Sqoop, ZooKeeper, HDFS, GitHub, Maven.
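A minimal sketch (not project code) of the Kafka-to-Spark-Streaming pipeline referenced above, written against the Spark 1.6 / Kafka 0.8 direct-stream API that matches the environment listed for this role. Broker addresses, the clickstream topic, the tab-delimited record layout, and the output path are all placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ClickstreamIngest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-ingest")
    val ssc  = new StreamingContext(conf, Seconds(30)) // batch interval tuned per workload

    val kafkaParams = Map("metadata.broker.list" -> "kafka-1:9092,kafka-2:9092") // placeholder brokers
    val topics      = Set("clickstream")                                         // placeholder topic

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Count events per page in each batch and persist the aggregates to HDFS.
    stream.map { case (_, line) => (line.split("\t")(1), 1L) } // assumes tab-delimited log lines
      .reduceByKey(_ + _)
      .saveAsTextFiles("/data/streaming/page_counts")          // placeholder output prefix

    ssc.start()
    ssc.awaitTermination()
  }
}
```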
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Experience upgrading Cloudera Hadoop clusters from CDH 5.3.8 to 5.8.0 and from 5.8.0 to 5.8.2.
- Hands-on experience with all Hadoop ecosystem components (HDFS, YARN, MapReduce, Hive, Spark, Flume, Oozie, ZooKeeper, Impala, HBase, and Sqoop) through Cloudera Manager.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala, initially done using Python (PySpark); the first sketch at the end of this role illustrates the pattern.
- Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
- Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
- Worked on implementing the Spark Framework, a Java-based web framework.
- Worked with Apache Solr to implement indexing and wrote custom Solr query segments to optimize search.
- Wrote Java code to format XML documents and upload them to the Solr server for indexing (see the second sketch at the end of this role).
- Worked on ad hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
- Processed web server logs by developing multi-hop Flume agents with the Avro sink, loading the results into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
- Expert knowledge of MongoDB and NoSQL data modeling, tuning, and disaster recovery backups; used it for distributed storage and processing via CRUD operations.
- Extracted and restructured data into MongoDB using the import and export command-line utilities.
- Experience setting up fan-out workflows in Flume, designing a V-shaped architecture to take data from many sources and ingest it into a single sink.
- Implemented custom serializers and interceptors in Flume to mask confidential data and filter unwanted records from the event payload.
- Experience creating, dropping, and altering tables at run time without blocking updates and queries, using HBase and Hive.
- Experience working with different join patterns; implemented both map-side and reduce-side joins.
- Wrote Flume configuration files for importing streaming log data into HBase with Flume.
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS; used Flume with a spooling directory source to load data from the local file system (LFS) into HDFS.
- Installed and configured Pig; wrote Pig Latin scripts to convert data from text files to Avro format.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Loaded data into HBase using both bulk and non-bulk loads.
- Installed and configured Talend ETL in single- and multi-server environments.
Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, Solr, Impala, MySQL, Oracle, Sqoop, Kafka, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, SBT, Akka, Linux (Ubuntu), Scala, Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile Methodology.
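First sketch for this role (illustrative only): the MapReduce-to-Spark migration pattern mentioned above, with a mapper/reducer-style log aggregation rewritten as RDD transformations in Scala. The weblog path, the field layout, and the output location are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StatusCodeCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("status-code-counts"))

    sc.textFile("/data/weblogs/raw")                 // placeholder HDFS input path
      .map(_.split(" "))
      .filter(_.length > 8)                          // drop malformed lines (the old mapper's cleanup)
      .map(fields => (fields(8), 1L))                // HTTP status code assumed at index 8
      .reduceByKey(_ + _)                            // replaces the reducer
      .saveAsTextFile("/data/weblogs/status_counts") // placeholder output path

    sc.stop()
  }
}
```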
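Second sketch for this role (illustrative only): indexing a document into Solr with SolrJ. The résumé describes doing this in Java; it is shown here in Scala for consistency with the other sketches, assuming SolrJ 6+ for the HttpSolrClient.Builder API. The Solr URL, collection, and field names are placeholders.

```scala
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.common.SolrInputDocument

object SolrIndexer {
  def main(args: Array[String]): Unit = {
    // Placeholder Solr endpoint and collection name.
    val client = new HttpSolrClient.Builder("http://solr-host:8983/solr/products").build()

    val doc = new SolrInputDocument()
    doc.addField("id", "SKU-1001")           // placeholder fields
    doc.addField("name", "example product")
    doc.addField("category", "demo")

    client.add(doc)
    client.commit() // make the new document searchable
    client.close()
  }
}
```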
Confidential
ETL / Teradata Developer
Responsibilities:
- Involved in the complete software development lifecycle (SDLC), from business analysis to development, testing, deployment, and documentation.
- Used the Teradata utilities FastLoad, MultiLoad, and TPump to load data.
- Wrote BTEQ scripts to transform data.
- Wrote Fast export scripts to export data.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
- Constructed Korn shell driver routines (wrote, tested, and implemented UNIX scripts).
- Wrote views based on user and/or reporting requirements.
- Wrote Teradata Macros and used various Teradata analytic functions.
- Involved in migration projects moving data from Oracle and DB2 data warehouses to Teradata.
- Performance tuned and optimized various complex SQL queries.
- Wrote many UNIX scripts.
- Good knowledge of TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
- Gathered system design requirements and designed and wrote system specifications.
- Excellent knowledge of ETL tools such as Informatica and SAP BODS, making various connections to load and extract data to and from Teradata efficiently (a rough JDBC sketch follows this role's environment line).
- Agile team interaction.
- Worked on data warehouses with sizes from 30-50 Terabytes.
- Coordinated with the business analysts and developers to discuss issues in interpreting the requirements.
Environment: Teradata 12/13, Teradata SQL Assistant, SQL, VSS, Outlook, PuTTY, MultiLoad, TPump, FastLoad, FastExport, TDWM, PMON, DBQL.
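Rough illustration only: the Teradata work above was done with BTEQ and the load utilities (FastLoad, MultiLoad, TPump) rather than application code, but a plain JDBC extract is sketched below in the same Scala used by the other samples to show one way of connecting to Teradata programmatically. The host, database, table, query, and credentials are hypothetical, and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object TeradataExtract {
  def main(args: Array[String]): Unit = {
    Class.forName("com.teradata.jdbc.TeraDriver") // Teradata JDBC driver (assumed available)
    val conn = DriverManager.getConnection(
      "jdbc:teradata://tdprod/DATABASE=sales_dw", // placeholder host and database
      sys.env("TD_USER"), sys.env("TD_PASSWORD"))

    val rs = conn.createStatement()
      .executeQuery("SELECT region, SUM(amount) FROM sales_dw.orders GROUP BY region") // placeholder SQL
    while (rs.next()) println(s"${rs.getString(1)}\t${rs.getBigDecimal(2)}")

    conn.close()
  }
}
```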