Big Data Developer Resume
Richardson, TX
SUMMARY
- Experience in developing and leading end-to-end implementations of Big Data projects, with comprehensive experience across the Hadoop ecosystem: MapReduce, HDFS, Hive, Pig, Python, YARN, Oozie, Flume, Hue, and Spark.
- Imported and exported data between databases such as Oracle and Teradata and HDFS/Hive using Sqoop.
- Worked on Talend Data Mapper (TDM) for Avro schema creation and for converting XML and CSV files to Avro.
- Created complex mappings in Talend 6.4.1/6.3.1 using tMap, tDie, tJoin, tReplicate, tFilterRow, tParallelize, tFixedFlowInput, tAggregateRow, tIterateToFlow, etc.
- Created Context Groups, Generic Schemas, and context variables to run jobs against different environments such as Dev, Test, and Prod.
- Experience in converting Hive/SQL queries into Spark transformations using Talend and Java (see the sketch following this summary).
- Experience in collecting and storing streaming data, such as log data, in HDFS using Apache Flume.
- Wrote Hive and Pig queries for data analysis to meet the business requirements.
- Created tables, implemented partitioning and bucketing, and wrote UDFs in Hive.
- Strong troubleshooting and problem-solving skills with a logical and pragmatic attitude.
- Team player with strong oral and interpersonal skills.
- Worked with business stakeholders to gather requirements and define data quality solutions for data profiling, standardization, and cleansing.
- Defined and contributed to the development of standards, guidelines, design patterns, and common development frameworks and components.
- Experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Strong technical background, excellent analytical ability, and a goal-oriented, team-player mindset with a commitment to excellence.
- Highly organized with the ability to manage multiple projects and meet deadlines.
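Illustrative sketch (not project code): a minimal PySpark outline of the Hive-to-Spark conversion described above, assuming a hypothetical edw.sales table and columns.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original HiveQL (illustrative):
    #   SELECT region, SUM(amount) AS total
    #   FROM edw.sales
    #   WHERE sale_date >= '2020-01-01'
    #   GROUP BY region
    sales = spark.table("edw.sales")  # hypothetical Hive table
    totals = (sales
              .filter(F.col("sale_date") >= "2020-01-01")
              .groupBy("region")
              .agg(F.sum("amount").alias("total")))
    totals.show()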
TECHNICAL SKILLS
Hadoop ecosystem: MapReduce, Sqoop, Hive, Impala, Oozie, Hue, Pig, HBase, HDFS, Zookeeper, Flume, Spark
ETL: Talend Big Data Studio 7.3.1, 6.4.1, 6.3.1
Databases: Teradata (13.0/14.10), Oracle 11g (SQL, PL/SQL Basics), SQL Server 2005, DB2, HBase
Tools & Utilities: Toad, Microsoft Visio, WinSCP, Appworx, Control-M, Remedy, AutoSys
Languages: Java, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting, Python, Scala
Operating Systems: Windows 10/7, UNIX, Linux
Domain Knowledge: Finance, Banking, Telecom, Healthcare, Insurance, Manufacturing
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Richardson, TX
Big Data Developer
Responsibilities:
- Involved in loading data from the UNIX file system to HDFS using commands and scripts.
- Attended daily meetings with the customer to clarify requirements and proposed technical solutions to meet them.
- Gathered and analyzed business and technical requirements.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Evaluated the performance of Apache Spark in analyzing genomic data.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
- Prepared unit test plans and designs (with direction from the customer).
- Prepared HLD, LLD, UTC, and technical design documents.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Migrated data from relational databases (Oracle, Teradata) and external sources to HDFS using Sqoop, Flume, and Spark.
- Used the most common Talend components (tMap, tDie, tConvertType, tFlowMeter, tLogCatcher, tRowGenerator, tSetGlobalVar, tHashInput, tHashOutput, and many more).
- Utilized Big Data components such as tHDFSInput, tHDFSOutput, tPigLoad, tPigFilterRow, tPigFilterColumn, tPigStoreResult, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport, and tSqoopExport.
- Worked on converting Talend Big Data jobs into Azure notebooks.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Designed both Managed and External tables in Hive to optimize performance.
- Regularly monitored the Hadoop cluster to ensure installed applications were free from errors and warnings.
- Experience optimizing Hive queries using partitioning and bucketing techniques.
- Migrated the on-premises data lake to Azure Synapse.
- Used Docker containers for Jenkins builds.
- Involved in running Hadoop streaming jobs to process terabytes of text data using Flume.
- Implemented Hive UDFs for evaluating, filtering, loading, and storing data.
- Developed Hive queries for the analysts.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch following this list).
- Teamed diligently with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
- Monitored system health and logs and responded to any warning or failure conditions.
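Illustrative sketch (not project code) of the external-table, partitioning, and dynamic-partition patterns above, issued through Spark SQL; the database, table, column names, and HDFS path are hypothetical placeholders, and the staging table is assumed to exist.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table over files already landed in HDFS, partitioned by load date.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS edw.txn_refined (
            txn_id  STRING,
            cust_id STRING,
            amount  DOUBLE
        )
        PARTITIONED BY (load_dt STRING)
        STORED AS ORC
        LOCATION '/data/edw/txn_refined'
    """)

    # Dynamic-partition insert from a staging table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE edw.txn_refined PARTITION (load_dt)
        SELECT txn_id, cust_id, amount, load_dt
        FROM edw.txn_staging
    """)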
Environment: Hadoop, HDFS, MapReduce, Scala, Python, Azure Synapse, Hive, ASG Zena, Pig, Sqoop, Spark, DB2, Oracle, HBase, Teradata, SQL, Eclipse, UNIX, Talend Real-Time Big Data Platform
Confidential, Lewisville, Texas
Senior Big Data developer
Responsibilities:
- Worked on developing ETL applications.
- Created mappings, sessions, and workflows per business requirements to implement the logic.
- Created a framework that copies data from any Linux-based file system to MongoDB.
- Worked with tuples, dictionaries, and object-oriented concepts such as inheritance to implement algorithms.
- Queried MongoDB from Python using PyMongo to retrieve information (see the sketch following this list).
- Designed reports, dashboards, and data visualizations using Matplotlib.
- Successfully interpreted data to draw conclusions for managerial action and strategy.
- Used PySpark DataFrames to analyze larger datasets.
- Modified queries, functions, cursors, triggers, and stored procedures for MongoDB to improve performance while processing data.
- Troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a primary data source for both customers and the internal customer service team.
- Used Pandas to organize the data in time-series and tabular formats for easy timestamp-based manipulation and retrieval.
- Cleaned and processed third-party spending data into manageable deliverables in specific formats using Python libraries. Used the TDD (test-driven development) methodology.
- An individual with excellent interpersonal and communication skills, strong business acumen, creative problem-solving skills, technical competency, team-player spirit, and leadership skills.
- Strong oral and written communication, initiative, interpersonal, learning, and organizing skills, matched with the ability to manage time and people effectively.
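Illustrative sketch (not project code) of the PyMongo querying and Pandas time-series handling described above; the connection URI, database, collection, and field names are hypothetical placeholders.

    from datetime import datetime

    import pandas as pd
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    orders = client["analytics"]["orders"]             # placeholder db/collection

    # Standard PyMongo find() with a filter and projection.
    cursor = orders.find(
        {"created_at": {"$gte": datetime(2020, 1, 1)}},
        {"_id": 0, "created_at": 1, "amount": 1},
    )

    # Index on the timestamp so the data can be sliced and resampled as a time series.
    df = pd.DataFrame(list(cursor)).set_index("created_at").sort_index()
    daily_totals = df["amount"].resample("D").sum()
    print(daily_totals.head())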
Environment: Python, MongoDB, SFTP, Linux, MapReduce, HDFS, Hive, PyCharm, Jupyter Notebook, UNIX Shell Scripting.
Confidential, Rochester, New York
Big Data / AWS developer
Responsibilities:
- Worked on big data infrastructure build-out for batch, transactional, and real-time processing.
- Developed systems on Hadoop and Amazon Web Services.
- Brought the AWS-based system to production in 9 months.
- Built the AWS system with Lambda, API Gateway, DynamoDB, and S3.
- Automated big-data processing using EMR and Spark.
- Developed deployment pipelines with CodePipeline, CodeBuild, CloudFormation, and Terraform.
- Developed components for the normalization REST service using Scala.
- Developed Spark scripts using Scala shell commands as per requirements.
- Good working knowledge of the AWS IAM service: IAM policies, roles, users, groups, AWS access keys, and multi-factor authentication. Migrated applications to the AWS Cloud.
- Experience with the AWS Command Line Interface and PowerShell for automating administrative tasks. Defined AWS security groups, which acted as virtual firewalls controlling the traffic reaching AWS EC2 and Lambda instances.
- Hands-on experience implementing, building, and deploying CI/CD pipelines using Jenkins, including tracking multiple deployments across pipeline stages (Dev, Test/QA, Staging, and Production).
- Worked on transferring data from on-premises systems to DynamoDB (see the sketch following this list).
- Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and DynamoDB.
- Hands-on experience in Oozie job scheduling.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Agile methodology was used for development using XP Practices (TDD, Continuous Integration).
- Exposure to burn-up, burn-down charts, dashboards, velocity reporting of sprint and release progress.
- An individual with excellent interpersonal and communication skills, strong business acumen, creative problem-solving skills, technical competency, team-player spirit, and leadership skills.
- Strong oral and written communication, initiative, interpersonal, learning, and organizing skills, matched with the ability to manage time and people effectively.
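Illustrative sketch (not project code) of loading records into DynamoDB with boto3, in the spirit of the on-premises-to-DynamoDB transfer noted above; the region, table name, and item shape are hypothetical placeholders.

    import boto3

    dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
    table = dynamodb.Table("customer_profile")  # placeholder table

    records = [
        {"customer_id": "C-1001", "segment": "retail", "balance": "2450.75"},
        {"customer_id": "C-1002", "segment": "commercial", "balance": "18200.00"},
    ]

    # batch_writer() buffers writes and retries unprocessed items automatically.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=record)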
Environment: MapReduce, HDFS, AWS, Hive, Hue, Oozie, Big Data, Core Java, Eclipse, HBase, Spark, Scala, Kafka, Jenkins, Cloudera Manager, LINUX, Puppet, IDMS, U
Confidential, Sunnyvale, California
Big Data Developer
Responsibilities:
- Worked on big data infrastructure build-out for batch processing as well as real-time processing.
- Developed, installed, and configured Hive, Hadoop, Big Data, Hue, Oozie, Pig, Sqoop, Kafka, Elasticsearch, Java, J2EE, HDFS, XML, PHP, and ZooKeeper on the Hadoop cluster.
- Created Hive tables and loaded retail transactional data from Teradata using Sqoop.
- Managed thousands of Hive databases totaling 250+ TBs.
- Developed enhancements to Hive architecture to improve performance and scalability.
- Collaborated with development teams to define and apply best practices for using Hive.
- Worked on Hadoop, Hive, Oozie, and MySQL customization for batch data platform setup.
- Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
- Implemented a data export application to fetch processed data from these platforms to consuming application databases in a scalable manner.
- Extended Hive functionality by writing custom UDFs and UDTFs.
- Developed Hive UDTFs and executed them using Spark.
- Involved in loading data from Linux file system to HDFS.
- Experience in setting up Salt formulas for centralized configuration management.
- Monitored the cluster using various tools to track how the nodes were performing.
- Experience with Oozie workflow scheduling.
- Developed Spark scripts using Scala shell commands as per requirements.
- Expertise in cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Transferred data from Hive tables to HBase via staging tables using Pig, and used Impala for interactive querying of HBase tables.
- Implemented data auditing for accounting by capturing various logs such as HDFS audit logs and YARN audit logs.
- Worked on a proof of concept to implement a Kafka- and Storm-based data pipeline.
- Configured job scheduling in Linux using shell scripts.
- Worked with the Machine Learning Library (MLlib) for clustering and underlying optimization primitives.
- Created custom Solr Query components to enable optimum search matching.
- Utilized the Solr API to develop custom search jobs and GUI-based search applications.
- Implemented multiple output formats in the same program to match the use cases.
- Developed Hadoop streaming MapReduce jobs using Python.
- Installed Apache Spark on YARN and managed master and worker nodes.
- Benchmarked the NoSQL databases Cassandra and HBase.
- Created a data model for structuring and storing the data efficiently. Implemented partitioning and bucketing of tables in Cassandra.
- Implemented test scripts to support test driven development and continuous integration.
- Clear understanding of Cloudera Manager Enterprise Edition.
- Designed and developed microsites, aggregators, and microservices.
- Good experience in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes.
- Worked on POCs and the implementation and integration of Cloudera for multiple clients.
- Good knowledge of creating ETL jobs to load Twitter JSON data into MongoDB and jobs to load data from MongoDB into the data warehouse.
- Designed and implemented various ETL projects using Informatica and DataStage as data integration tools.
- Exported the analyzed data to HBase using Sqoop and generated reports for the BI team.
- Carried out POC work using Spark and Kafka for real-time processing.
- Developed Python scripts to monitor the health of Mongo databases and perform ad hoc backups using mongodump and mongorestore.
- Deployed the project in a Linux environment.
- Automated the Apache installation and its components using Salt.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (see the producer sketch following this list).
- Worked with NoSQL databases like Cassandra and Mongo DB for POC purpose.
- Implemented a POC with Hadoop; extracted data into HDFS with Spark.
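Illustrative sketch (not project code) of a Kafka producer feeding a log-collection pipeline like the ones described above, using the kafka-python client; the broker address, topic, and payload are hypothetical placeholders.

    import json

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092"],  # placeholder broker
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Send a structured log event to a placeholder topic.
    event = {"app": "order-service", "level": "ERROR", "message": "timeout calling inventory"}
    producer.send("application-logs", value=event)
    producer.flush()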
Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Solr, Big Data, Core Java, Python, Eclipse, HBase, Flume, Spark, Scala, Kafka, Cloudera Manager, Impala, UNIX RHEL, Cassandra, LINUX
Confidential
Java Developer
Responsibilities:
- Developed new pages for personals.
- Implemented the MVC design pattern for the application.
- Used the content management tool Dynapub for publishing data.
- Implemented AJAX to present data in a user-friendly and efficient manner.
- Developed Action classes.
- Used JMeter for load testing the application and captured its response times.
- Created a simple user interface for the application's configuration system using MVC design patterns and the Swing framework.
- Implemented Log4j for logging and debugging.
- Implemented a form-based approach for the ease of the programming team.
Environment: Core Java, Java Swing, Struts, J2EE (JSP/Servlets), XML, AJAX, DB2, MySQL, Tomcat, JMeter.