
Sr. Data Engineer/ Sr. Hadoop Developer Resume


San Francisco, CA

SUMMARY

  • 9+ years of IT experience in software development, with 5+ years' work experience as a Big Data/Hadoop Developer and good knowledge of the Hadoop framework.
  • Expertise in Hadoop architecture and various components such as HDFS, YARN, High Availability, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce programming paradigm.
  • Experience with all aspects of development from initial implementation and requirement discovery, through release, enhancement, and support (SDLC & Agile techniques).
  • Experience in Design, Development, Data Migration, Testing, Support and Maintenance using Redshift Databases.
  • Experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, and Python.
  • Experience in AWS cloud solution development using Lambda, SQS, SNS, Dynamo DB, Athena, S3, EMR, EC2, Redshift, Glue, and CloudFormation.
  • Experience in using Microsoft Azure SQL database, Data Lake, Azure ML, Azure data factory, Functions, Databricks and HDInsight.
  • Working experience in Big Data on the cloud using AWS EC2 and Microsoft Azure, and handled Redshift and DynamoDB databases holding roughly 300 TB of data.
  • Extensive experience in migrating on premise Hadoop platforms to cloud solutions using AWS and Azure.
  • Good experience with Snowflake utility SnowSQL.
  • Experience in writing Python-based ETL frameworks and PySpark jobs to process large volumes of data daily (a minimal sketch appears after this list).
  • Strong experience in implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Experienced with Python frameworks like webapp2 and Flask.
  • Created multiple report dashboards, visualizations, and heat maps using Tableau, QlikView, and Qlik Sense reporting tools.
  • Experienced in developing web-based applications using Python, Django, Qt, C++, XML, CSS, JSON, HTML, DHTML, JavaScript, and jQuery.
  • Strong experience in extracting and loading data with complex business logic using Hive from different data sources, and built ETL pipelines to process terabytes of data daily.
  • Experienced in transporting and processing real time event streaming using Kafka and Spark Streaming.
  • Hands on experience with importing and exporting data from Relational databases to HDFS, Hive and HBase using Sqoop.
  • Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors, and implemented stream processing with Kinesis, landing the data in an S3 data lake.
  • Experience in implementing multitenant models for the Hadoop 2.0 ecosystem using various big data technologies.
  • Designed and developed Spark pipelines to ingest real-time, event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch jobs into the Hive data warehouse.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Excellent working experience in Scrum / Agile framework, Iterative and Waterfall project execution methodologies.
  • Designed data models for both OLAP and OLTP applications using Erwin and used both star and snowflake schemas in the implementations.
  • Capable of organizing, coordinating, and managing multiple tasks simultaneously.
  • Excellent communication and interpersonal skills; self-motivated, organized, and detail-oriented; able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
  • Worked on dimensional data modelling in star and snowflake schemas and Slowly Changing Dimensions (SCD).
  • Strong analytical skills with the ability to quickly understand a client's business needs. Involved in meetings to gather information and requirements from the clients.
  • Experience in change implementation, monitoring and troubleshooting of AWS Snowflake databases and cluster related issues.
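
A minimal PySpark ETL sketch of the daily batch pattern described above. All paths, schema fields, and table names (for example s3://example-bucket/raw/orders and analytics.daily_orders) are illustrative placeholders, not actual project objects.

```python
# Minimal PySpark ETL sketch (illustrative only; paths and table names are placeholders).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-etl")
    .enableHiveSupport()
    .getOrCreate()
)

# Extract: read raw JSON landed in the data lake.
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: apply basic business rules and derive a partition column.
cleaned = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
)

# Load: write into a partitioned Hive table for downstream consumers.
(
    cleaned.write
    .mode("append")
    .partitionBy("order_date")
    .saveAsTable("analytics.daily_orders")
)
```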

TECHNICAL SKILLS

Operating System: Linux, UNIX, iOS, TinyOS, Sun Solaris, HP-UX, Windows 8, Windows 7, CentOS, Ubuntu.

Hadoop/Big Data: Apache Spark, HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Oozie, Apache Cassandra, Scala, Flume, Apache Ignite, Avro, AWS.

Languages: Scala, Java (JDK 1.4/1.5/1.6), C/C++, SQL, HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.

Data Warehousing & BI: Informatica PowerCenter 9.x/8.x/7.x, PowerExchange, IDQ, Ambari Views, consumption framework

ETL Tools: IBM InfoSphere DataStage 11.5, MSBI (SSIS), Sqoop, TDCH, manual, etc.

Database: Oracle 11g, AWS Redshift, AWS Athena, IBM Netezza, HBase, Apache Phoenix, SQL Server, MySQL, MongoDB, Cassandra.

Debugging tools: Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy

Version Control: TortoiseHg, Microsoft TFS, SVN, Git, CVS; Teradata utilities: TPump, MultiLoad, FastExport.

GUI Editors: IntelliJ IDEA Community Edition, DataGrip, DbVisualizer, DBeaver

PROFESSIONAL EXPERIENCE

Confidential, San Francisco, CA

Sr. Data engineer/ Sr. Hadoop Developer

Responsibilities:

  • Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
  • Created streams using Spark, processed real-time data into RDDs and DataFrames, and built analytics using Spark SQL (see the streaming sketch after this list).
  • Designed Redshift based data delivery layer for business intelligence tools to operate directly on AWS S3.
  • Implemented Kinesis data streams to read real-time data and loaded it into S3 for downstream processing.
  • Involved in building the database model, APIs, and views using Python in order to build an interactive web-based solution.
  • Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
  • Bulk loaded and unloaded data into Snowflake tables using the COPY command (a minimal sketch follows this list).
  • Set up AWS infrastructure on EC2 and implemented the S3 API for accessing S3 bucket data files.
  • Designed "Data Services" to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Ensured ETL/ELT jobs succeeded and loaded data successfully into the Snowflake DB.
  • Created user controls and simple animations using JavaScript and Python.
  • Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
  • Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift and Hive.
  • Created databases and tables in Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data in AWS S3 clusters.
  • Developed a multi-threaded standalone app in Python, PHP, and C++ to view circuit parameters and performance.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Performed real time analytics on transactional data using python to create statistical model for predictive and reverse product analysis.
  • Participated in client meetings, explained the views, and supported requirements gathering.
  • Worked in an Agile methodology, understanding the requirements of the user stories.
  • Prepared high-level design documentation for approval.
  • Used data visualization software such as Tableau, QuickSight, and Kibana to bring new insights from the extracted data and represent it more clearly.
  • Designed data models for dynamic and real-time data intended to be used by various applications with OLAP and OLTP needs.
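
A hedged sketch of the Spark Structured Streaming pattern referenced above: reading real-time events from Kafka and landing them in S3 for downstream processing. The broker address, topic, schema, and S3 paths are assumptions for illustration, and the job would need the spark-sql-kafka connector package available on the cluster.

```python
# Illustrative Kafka-to-S3 streaming sketch; broker, topic, schema, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-events").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw Kafka stream; the value arrives as bytes and is parsed from JSON.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Land the parsed stream in S3 as Parquet for downstream batch processing.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/landing/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```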
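
And a minimal sketch of the Snowflake bulk load and unload via the COPY command, shown here through the snowflake-connector-python client rather than the SnowSQL CLI. The account, credentials, stage, and table names are placeholders.

```python
# Hedged Snowflake COPY sketch; connection details, stage, and tables are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)

try:
    cur = conn.cursor()
    # Bulk load staged files into the target table with COPY INTO.
    cur.execute(
        """
        COPY INTO analytics.public.orders
        FROM @orders_stage/2023/
        FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
        """
    )
    # Unload query results back to the stage, also via COPY INTO.
    cur.execute(
        """
        COPY INTO @orders_stage/exports/daily/
        FROM (SELECT * FROM analytics.public.daily_summary)
        FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
        OVERWRITE = TRUE
        """
    )
finally:
    conn.close()
```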

Environment: Hadoop 2.x, Spark, Spark Streaming, Apache Kafka, Hive, Tez, AWS, ETL, Pig, UNIX, Linux, Tableau, Teradata, Sqoop, HDFS, MapReduce, Flume, NoSQL, Python, Eclipse, Maven, Java, Agile methodologies, Elasticsearch.

Confidential, San Francisco CA

Sr. Hadoop Developer

Responsibilities:

  • Create a complete solution by integrating a variety of programming languages and tools together with data models to reduce system complexity, increase efficiency, and reduce cost by setting and achieving individual as well as team goals.
  • Introduce new data management tools and technologies into the existing system to make it more efficient.
  • Responsible for developing solutions to increase profitability and minimize cost by resolving technical deficiencies.
  • Offer technical expertise and develop software design proposals for different components.
  • Demonstrate sound project management capabilities by using agile methodologies and scrum-based strategies in a python-based working environment.
  • Craft well-defined strategies with various types of Statistical modeling, multivariate analysis, model testing, problem analysis, among others.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Assisted in creating new DataStage server jobs for the extraction, transformation, and loading (ETL) process of the data warehouse by understanding technical specifications.
  • Develop complex jobs using various stages like Lookup, Join, Transformer, Dataset.
  • Establish relationships and effectively respond to competing priorities with key customers, account management, and other stakeholders.
  • Rewrote an existing Java application as a Python module to deliver a certain format of data.
  • Used the Couchbase Python SDK to build applications that use Couchbase Server.
  • Wrote Python code working with JSON and XML to produce HTTP GET requests and parse HTML data from websites (see the sketch after this list).
  • Coordinate with business analysts and modelers to better understand subject areas and modified specifications to reflect precise user needs.
  • Used several Python libraries such as wxPython, NumPy, Jython, and matplotlib.
  • Develop outcome-driven queries to compare datasets between two databases to ensure that accurate user needs are fulfilled.
  • Analyze the necessary transforms to map out the source and target databases by studying and understanding the specifications.
  • Built DataStage parallel jobs for the extraction, transformation, and loading process.
  • Implement testing of the application in development using DataStage Director.
  • Perform operations such as database reorgs, speeding up data warehouse operations, and sorting and aggregating bulk files using the CoSort utility.
  • Implement change data capture techniques with slowly growing targets, simple pass-through mapping, and Slowly Changing Dimension (SCD) Type 1 and Type 2.
  • Transform data to provide KPIs for lead-gen reports, executive dashboards, call trends reports, cancel/drop reports, and many more.
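
A small sketch of the HTTP GET and parsing pattern mentioned above, using the requests library and the standard-library HTMLParser. The URLs and the extracted field are placeholders.

```python
# Illustrative HTTP GET + JSON/HTML parsing sketch; URLs are placeholders.
import json
from html.parser import HTMLParser

import requests


def fetch_json(url: str) -> dict:
    """GET a JSON endpoint and return the decoded payload."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()


class TitleParser(HTMLParser):
    """Tiny HTML parser that captures the page <title> text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


if __name__ == "__main__":
    payload = fetch_json("https://example.com/api/items")
    print(json.dumps(payload, indent=2)[:200])

    page = requests.get("https://example.com/", timeout=30)
    parser = TitleParser()
    parser.feed(page.text)
    print("Page title:", parser.title)
```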

Environment: Hadoop, Hive, MapReduce, Sqoop, Spark, Eclipse, Maven, Java, Agile methodologies, AWS, Tableau, Pig, Java Collections, MySQL, Apache Avro, Zookeeper, SVN, Jenkins, Windows AD, Windows KDC.

Confidential, Nashville TN

Big Data Engineer/Hadoop Developer

Responsibilities:

  • Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
  • Conferring with data scientists and other qlikstream developers to obtain information on limitations or capabilities for data processing projects
  • Designed and developed automation test scripts using Python
  • Creating Data Pipelines using Azure Data Factory.
  • Automating the jobs using Python.
  • Creating tables and loading data in Azure MySQL Database
  • Creating Azure Functions and Logic Apps for automating the data pipelines using Blob triggers.
  • Analyze SQL scripts and design the solution to implement using PySpark.
  • Developed Spark code using Python (PySpark) for faster processing and testing of data.
  • Used the Spark API to perform analytics on data in Hive.
  • Optimizing and tuning Hive and Spark queries using data layout techniques such as partitioning and bucketing (see the sketch after this list).
  • Data Cleansing, Integration and Transformation using PIG
  • Implemented a one-time migration of multi-state data from SQL Server to Snowflake using Python and SnowSQL.
  • Involved in exporting and importing data from local file system and RDBMS to HDFS
  • Designing and coding the pattern for inserting data into the data lake.
  • Moving the data from on-prem HDP clusters to Azure.
  • Building, installing, upgrading, or migrating petabyte size big data systems
  • Fixing Data related issues
  • Loading data to a DB2 database using DataStage.
  • Worked on dimensional data modelling in star and snowflake schemas and Slowly Changing Dimensions (SCD).
  • Staged API or Kafka data (in JSON format) into the Snowflake DB by FLATTENing it for different functional services.
  • Monitoring the functioning of big data and messaging systems like Hadoop, Kafka, and Kafka MirrorMaker to ensure they operate at peak performance at all times.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Communicating regularly with the business teams to ensure that any gaps between business requirements and technical requirements are resolved.
  • Reading and translating data models, querying data, identifying data anomalies, and providing root cause analysis.
  • Support "Qlik Sense" reporting, to gauge performance of various KPIs/facets to assist top management in decision-making.
  • Engage in project planning and delivering to commitments.
  • Ran POCs on new technologies (Snowflake) available in the market to determine the best fit for the organization's needs.
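
A hedged PySpark sketch of the partitioning/bucketing layout technique referenced above. Database, table, and column names are placeholders; bucketing is written through saveAsTable so the bucket metadata lands in the metastore.

```python
# Illustrative partition/bucket layout sketch; table and column names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("layout-tuning")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("staging.web_events")

# Partition by event_date so date-filtered queries prune whole directories,
# and bucket by user_id so joins/aggregations on user_id avoid a full shuffle.
(
    df.write
    .mode("overwrite")
    .partitionBy("event_date")
    .bucketBy(64, "user_id")
    .sortBy("user_id")
    .saveAsTable("analytics.web_events_bucketed")
)

# A query that benefits from partition pruning:
spark.sql(
    "SELECT user_id, count(*) AS events "
    "FROM analytics.web_events_bucketed "
    "WHERE event_date = '2023-01-15' "
    "GROUP BY user_id"
).show()
```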

Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell scripting, Jenkins, Windows AD, Windows KDC, Hortonworks distribution of Hadoop 2.3, Oozie, Storm, MongoDB, CDH3, CentOS, UNIX, T-SQL.

Confidential, Dayton OH

Hadoop/Spark Developer

Responsibilities:

  • Provided suggestions on converting to Hadoop using MapReduce, Hive, Sqoop, Flume, and Pig Latin.
  • Experience in writing Spark applications for data validation, cleansing, transformations, and custom aggregations (a sketch follows this list).
  • Imported data from different sources into Spark RDDs for processing.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
  • Responsible for managing data coming from different sources.
  • Imported and exported data into HDFS using Flume.
  • Experienced in analyzing data with Hive and Pig.
  • Involved in creating Hive tables and loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Setup Hadoop cluster on Amazon EC2 using whirr for POC.
  • Worked on developing applications in Hadoop Big Data Technologies-Pig, Hive, Map-Reduce, Oozie, Flume, and Kafka.
  • Experienced in managing and reviewing Hadoop log files.
  • Helped with Big Data technologies for integration of Hive with HBASE and Sqoop with HBase.
  • Analyzed data with Hive, Pig and Hadoop Streaming.
  • Involved in transferring the relational database and legacy tables to HDFS and HBase tables using Sqoop, and vice versa.
  • Involved in cluster coordination services through Zookeeper and adding new nodes to an existing cluster.
  • Moved the data from traditional databases like MySQL, MS SQL Server, and Oracle into Hadoop.
  • Worked on integrating Talend and SSIS with Hadoop and performed ETL operations.
  • Installed Hive, Pig, Flume, Sqoop, and Oozie on the Hadoop cluster.
  • Used Flume to collect, aggregate and push log data from different log servers.
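
A minimal sketch of the Spark validation, cleansing, and aggregation pattern referenced above. Paths, columns, and the validation rules are illustrative assumptions.

```python
# Illustrative validate/cleanse/aggregate sketch; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("validate-cleanse").getOrCreate()

raw = spark.read.option("header", True).csv("hdfs:///data/raw/transactions/")

# Validation: split records into good and bad based on simple rules.
is_valid = (
    F.col("txn_id").isNotNull()
    & F.col("amount").cast("double").isNotNull()
    & (F.col("amount").cast("double") >= 0)
)
good = raw.filter(is_valid)
bad = raw.filter(~is_valid)
bad.write.mode("overwrite").csv("hdfs:///data/rejects/transactions/")

# Cleansing: normalize types and trim string fields.
cleaned = (
    good.withColumn("amount", F.col("amount").cast("double"))
        .withColumn("merchant", F.trim(F.col("merchant")))
)

# Interactive querying / aggregation through Spark SQL.
cleaned.createOrReplaceTempView("transactions")
spark.sql(
    "SELECT merchant, round(sum(amount), 2) AS total_amount "
    "FROM transactions GROUP BY merchant ORDER BY total_amount DESC"
).show(20)
```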

Environment: Pig, Hive, Oozie, Sqoop, Flume, HBase, Java, Maven, Avro, Cloudera Hadoop, Hortonworks, Eclipse, Shell scripting, Linux, HDFS, MapReduce, Oracle, SQL Server, and the Oozie scheduler.

Confidential, Parsippany, NJ

Hadoop Developer

Responsibilities:

  • Participated in SDLC Requirements gathering, Analysis, Design, Development and Testing of application developed using AGILE methodology.
  • Developing managed, external, and partitioned tables as per the requirement.
  • Ingested structured data into appropriate schemas and tables to support the rules and analytics.
  • Developing custom User Defined Functions (UDFs) in Hive to transform large volumes of data with respect to business requirements (a Python streaming alternative is sketched after this list).
  • Developing Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from edge node to HDFS using shell scripting
  • Implemented scripts for loading data from UNIX file system to HDFS.
  • Load and transform large sets of structured, semi structured, and unstructured data.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Actively participated in object-oriented analysis and design sessions of the project, which is based on MVC architecture using the Spring Framework.
  • Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
  • Adopted J2EE design patterns like DTO, DAO, Command and Singleton.
  • Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
  • Generated POJO classes to map to the database tables.
  • Configured Hibernate's second-level cache using EhCache to reduce the number of hits to the configuration table data.
  • Used the ORM tool Hibernate to represent entities and define fetching strategies for optimization.
  • Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
  • Wrote SQL queries and stored procedures for the application to communicate with the database.
  • Used the JUnit framework for unit testing of the application.
  • Used Maven to build and deploy the application.
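
Since Hive UDFs are typically written in Java, the sketch below shows the Hive TRANSFORM/streaming alternative in Python for consistency with the other examples; the column layout, file name, and business rule are hypothetical.

```python
#!/usr/bin/env python3
# upper_region.py -- a streaming script usable via Hive's TRANSFORM clause,
# shown as a Python alternative to a Java UDF (column layout is a placeholder).
# Hive pipes rows in on stdin, tab-delimited, and reads transformed rows from stdout.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        continue  # skip malformed rows
    customer_id, region, amount = fields[0], fields[1], fields[2]
    # Hypothetical business rule: normalize region codes and bucket the amount.
    region = region.strip().upper()
    try:
        bucket = "HIGH" if float(amount) >= 1000 else "LOW"
    except ValueError:
        bucket = "UNKNOWN"
    print("\t".join([customer_id, region, bucket]))
```

It would be invoked from HiveQL with ADD FILE upper_region.py; followed by SELECT TRANSFORM(customer_id, region, amount) USING 'python3 upper_region.py' AS (customer_id, region, amount_bucket) FROM sales; where the table and columns are likewise placeholders.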

Environment: HDFS, Cloudera Hadoop, Linux, MapReduce, Oracle, SQL Server, JUnit, SQL, PL/SQL, Eclipse, WebSphere.

Confidential

Hadoop Developer/SQL Developer

Responsibilities:

  • Set up and built AWS infrastructure with various services by writing CloudFormation templates (CFT) in JSON and YAML.
  • Developed CloudFormation scripts to build EC2 instances on demand.
  • With the help of IAM, created roles, users, and groups and attached policies to provide minimum access to the resources.
  • Updated the bucket policy with an IAM role to restrict access to users.
  • Configured AWS Identity Access Management (IAM) Group and users for improved login authentication.
  • Created topics in SNS to send notifications to subscribers as per the requirement.
  • Involved in the full life cycle of the project, from design, analysis, and logical and physical architecture modeling to development, implementation, and testing.
  • Moving data from Oracle to HDFS using Sqoop
  • Data profiling on critical tables from time to time to check for abnormalities.
  • Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
  • Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
  • Created and worked Sqoop jobs with incremental load to populate Hive External tables
  • Scripts were written for distribution of query for performance test jobs in Amazon Data Lake.
  • Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Apache Hadoop installation and configuration of multiple nodes on AWS EC2.
  • Developed Pig Latin scripts replacing the existing legacy process with Hadoop, with the data fed to AWS S3.
  • Working on CDC (Change Data Capture) tables using a Spark application to load data into dynamic-partition-enabled Hive tables (see the sketch after this list).
  • Designed and developed automation test scripts using Python
  • Integrated Apache Storm with Kafka to perform web analytics and to perform click stream data from Kafka to HDFS.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Responsible for developing a data pipeline on Amazon AWS to extract the data from weblogs and store it in HDFS.
  • Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
  • Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud; performed export and import of data into S3.
  • Involved in designing the row key in HBase to store text and JSON as key values in the HBase table, and designed the row key in such a way as to get/scan it in sorted order.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
  • Creating Hive tables and working on them using Hive QL.
  • Designed and implemented static and dynamic partitioning and bucketing in Hive.
  • Developed multiple POCs using PySpark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL.
  • Developed syllabus/Curriculum data pipelines from Syllabus/Curriculum Web Services to HBASE and Hive tables.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in build applications using Maven and integrated with CI servers like Jenkins to build jobs.
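
A hedged sketch of the Spark CDC load into a dynamic-partition-enabled Hive table referenced above. The table names, business key, and load_date partition column are illustrative assumptions.

```python
# Illustrative CDC load into a dynamic-partitioned Hive table; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = (
    SparkSession.builder
    .appName("cdc-load")
    .config("hive.exec.dynamic.partition", "true")
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .enableHiveSupport()
    .getOrCreate()
)

# CDC feed: one row per change event, with op_ts used to order the changes.
cdc = spark.table("staging.customer_cdc")

# Keep only the latest change per business key.
latest = (
    cdc.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("op_ts").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Insert into the partitioned target; Hive resolves the load_date partition
# value per row because dynamic partitioning is enabled.
latest.createOrReplaceTempView("cdc_latest")
spark.sql(
    """
    INSERT OVERWRITE TABLE warehouse.customers PARTITION (load_date)
    SELECT customer_id, name, email, op_ts, to_date(op_ts) AS load_date
    FROM cdc_latest
    """
)
```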

Environment: JSP, Servlets, Struts, Hibernate, HTML, CSS, JavaScript, JSON, REST, JUnit, XML, SASS, DOM, WebLogic (Oracle App Server), Web Services, Eclipse, Agile.

Confidential

Java Developer

Responsibilities:

  • Coordinated software system installation and monitor equipment functioning to ensure specifications are met.
  • Actively involved in project development and bug fixing for the project. Worked closely with students and helped deepen their understanding of concepts.
  • Involved in requirement analysis and client interaction; responsible for writing Hibernate mapping XML files and HQL.
  • Worked closely with business analysts, project managers and project leaders to analyze business requirements.
  • Used J2EE design patterns like Factory and Singleton. Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, modeling, analysis, architecture design, and development.
  • Worked in a fast-paced environment and met all the requirements.
  • Created applications involving JSP, JavaScript, jQuery and HTML. Extensively used various collection classes like Array List, Hash Map, Hash Table, and Hash Set.
  • Created technical specifications, coding, and unit and system integration testing for the enhancements, and conducted reviews with end users.
  • Created a new database connection for MySQL.
  • Developed the application using the Spring MVC Framework by implementing controller and backend service classes.
  • Followed Java coding standards while developing the application.

Environment: Eclipse, MySQL Server, JSP, JavaScript, jQuery, Java, C, HTML5 and CSS3.
