Big Data/Hadoop Developer Resume
Round Rock, Texas
SUMMARY
- Around six years of comprehensive IT experience in the Big Data domain with Hadoop, Hive, and other open source tools and technologies.
- Experience in web-based languages such as HTML, CSS, PHP, and XML, and in web technologies including Web Services and SOAP.
- Extensive experience in all the phases of the software development lifecycle (SDLC).
- Experience in deploying applications on heterogeneous application servers such as Tomcat, WebLogic, and Oracle Application Server.
- Expertise in Informatica 9.6, Informatica Cloud, Cast Iron, Force.com, Jitterbit, Apex Data Loader, Oracle 11g, SQL, PL/SQL, SOQL, SQL Server 2008, Business Objects XI R2, Autosys 11, Erwin 4.1, and UNIX.
- Extensive knowledge of NoSQL databases such as HBase.
- Worked in a multi-clustered environment and set up the Cloudera Hadoop ecosystem.
- Substantial experience writing MapReduce jobs in Java and working with Pig, Hive, Flume, ZooKeeper, and Storm.
- Experience in developing Big Data projects using open source tools and technologies such as Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce.
- Created aggregate tables, control tables, and staging tables in Teradata 13.10.
- Used the DB EE stage to load large volumes of data into Teradata using Teradata FastExport and MultiLoad.
- Extensive knowledge of automation tools such as Puppet and Chef.
- Experience working with Java, C++, and C.
- Implemented the system using Informatica with the warehouse in Teradata; FastLoad, MultiLoad, BTEQ, and FastExport were the Teradata utilities used.
- Hands-on experience productionizing Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
- Designed and developed Informatica mappings to load data from legacy systems to Salesforce.com for the User, Account, Contact, Opportunity, Product, Service Request, Territory, Activity, Case, and Notes objects.
- Installation and configuration of Informatica MDM Hub, Cleanse and Match Server, and Informatica PowerCenter.
- Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, HBase, Avro, ZooKeeper, Oozie, Hive, HDP, Cassandra, Sqoop, Pig, and Flume.
- Extensive experience in SQL and NoSQL development.
- Integrated data into the data warehousing system using Teradata.
- Took responsibility for implementing the Informatica module of the project.
- In-depth understanding of data structures and algorithms.
- Background with traditional databases such as Oracle, Teradata, Netezza, and SQL Server, as well as ETL tools/processes and data warehousing architectures.
- Extensive experience in designing analytical/OLAP and transactional/OLTP databases.
- Proficient in using Erwin to design backend data models and entity-relationship diagrams (ERDs) for star schemas, snowflake dimensions, and fact tables.
- Familiar with the SDLC (Software Development Life Cycle) phases of requirements, analysis, design, testing, and deployment for Informatica PowerCenter.
- Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.
TECHNICAL SKILLS
- Data Warehouse/Business Intelligence (BI)
- Control M
- Vertis
- Datastage
- Hadoop
- HIVE
- HDP
- PIG
- Sqoop
- Flume
- MapReduce
- Splunk
- HDFS
- Zookeeper
- Storm
- Shell
- Python
- AVRO
- AIX 5.1
- Red Hat Linux
- CentOS
- Lucene and Solr
- Apache Contributor
- Puppet and Chef
- JIRA
- SDLC
- MongoDB
- Talend Open Studio
- Tableau
- QlikView
- Giraph
- IBM DB2
- Teradata
- MySQL
- NoSQL
- AWS (Amazon Web Services)
- EMR
- Data Pipeline and Redshift
PROFESSIONAL EXPERIENCE
Confidential, Round Rock, Texas
Big Data/Hadoop Developer
Responsibilities:
- Working on moving data using Sqoop from HDFS to relational database systems and vice versa.
- Working with GE on loading files into Hive and HDFS from MongoDB.
- Founded and developed an environmental search engine using PHP5, Java, Lucene/Solr, Apache, and MySQL.
- Created scripts (BTEQ, FastLoad, and MultiLoad) and wrote queries to move data from source to destination.
- Worked on data extraction, transformation, and loading from source to target systems using BTEQ, FastLoad, and MultiLoad.
- Designed and developed complex Informatica mappings, mapplets, reusable transformations, tasks, sessions, and workflows for daily, weekly, and monthly processes to load heterogeneous data into the Oracle data warehouse, to external vendors, to data marts, and then to downstream systems.
- Led the evaluation of Big Data software such as Splunk and Hadoop for augmenting the warehouse, identified use cases, and led Big Data analytics solution development for the Customer Insights and Customer Engagement teams.
- Experienced in implementing, maintaining, and enhancing an enterprise-level data warehouse using Informatica PowerCenter.
- Designed techniques and wrote effective programs in Java and Linux shell scripting to push large volumes of data, including text and byte data, into NoSQL stores using various data parser techniques in addition to MapReduce jobs.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Worked on TOAD for data analysis and on ETL/Informatica for data mapping and transformation between the source and target databases.
- Extensively worked with Teradata utilities such as BTEQ, FastExport, FastLoad, and MultiLoad to export and load data to/from different source systems, including flat files.
- Hands-on experience using UNIX/shell scripting, MS SQL Server, Teradata 14.10, Netezza, DWH appliances, Oracle 11g, PL/SQL, Business Objects, Cognos, and Informatica data integration tools.
- Working on Hive/HBase versus RDBMS trade-offs; imported data into Hive and HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
- Developing a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis (a sketch of such a cleaning job appears after this list).
- Currently working on XML parsing using Pig, Hive, HDP, and Redshift.
- Working on architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Wrote custom BTEQ scripts to transform data after loading.
- Extensively worked with Informatica PowerCenter tools: Designer, Workflow Manager, and Workflow Monitor.
- Used Splunk to detect malicious activity against web servers.
- Tuned and monitored the Hadoop clusters for memory management and MapReduce jobs to enable healthy operation of the jobs pushing data from SQL to NoSQL stores.
- Built, stood up, and delivered a Hadoop cluster in pseudo-distributed mode with the NameNode, Secondary NameNode, JobTracker, and TaskTracker running successfully, ZooKeeper installed and configured, and Apache Accumulo (a NoSQL store modeled on Google's Bigtable) stood up in a single-VM environment.
- Worked on distributed/cloud computing (MapReduce/Hadoop, Pig, HBase, Avro, ZooKeeper, etc.), Amazon Web Services (S3, EC2, EMR, etc.), Oracle SQL performance tuning and ETL, Java 2 Enterprise, web development, mobile application development (Objective-C, Java native mobile apps, mobile web apps), Agile software development, team building and leadership, engineering management, and Internet of Things (amateur sensor networks, embedded systems, and electrical engineering).
- Working as a lead on Big Data integration and analytics based on Hadoop, Solr, and webMethods technologies.
- Good knowledge of Teradata Manager, TDWM, PMON, DBQL, SQL Assistant, and BTEQ.
- Working on implementing Hadoop on AWS EC2, using a few instances to gather and analyze data log files.
- Wrote, tested, and implemented Teradata FastLoad, MultiLoad, and BTEQ scripts, DML, and DDL.
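The following is a minimal sketch of the kind of map-only Java MapReduce cleaning job referenced in the data pipeline bullet above, written against the standard org.apache.hadoop.mapreduce API. The class names, delimiter, expected field count, and paths are illustrative assumptions, not the actual project code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical map-only cleaning pass: drops blank or malformed rows before analysis.
public class CleanBehavioralRecords {

    public static class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5; // assumed record width, adjust per feed

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            // Keep only non-empty, well-formed comma-delimited records.
            if (!line.isEmpty() && line.split(",", -1).length == EXPECTED_FIELDS) {
                context.write(NullWritable.get(), new Text(line));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-behavioral-records");
        job.setJarByClass(CleanBehavioralRecords.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // map-only: no aggregation needed for a filtering pass
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw HDFS landing directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // cleaned output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Setting the number of reducers to zero keeps the pass map-only; the job would be submitted with the usual hadoop jar command against the HDFS input and output paths.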
Confidential
Big Data/Hadoop Developer
Responsibilities:
- Responsible for complete SDLC management using different methodologies like Agile, Incremental, Waterfall, etc.
- Responsible for managing data coming from different sources.
- Supported MapReduce programs running on the cluster.
- Managed jobs using the Fair Scheduler.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting, and regionalization with the Solr search engine.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system into HDFS.
- Installed and configured Hive and wrote Hive UDFs (see the sketch after this list).
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Worked on setting up Pig, Hive, Redshift, and HBase on multiple nodes and developed with Pig, Hive, HBase, MapReduce, and Storm.
- Worked with Teradata as the database, extracting data from the backend using Python and Django and reflecting it on the front-end web page.
- Designed and implemented data processing using AWS Data Pipeline.
- Drove a holistic technology transformation to a Big Data platform for UnitedHealthcare: created the strategy, defined the blueprint, designed the roadmap, built the end-to-end stack, evaluated leading technology options, benchmarked selected products, migrated products, reconstructed the information architecture, introduced metadata management, leveraged machine learning, and productionized a consolidated data store on Hadoop, MapReduce, Hive, and HDP.
- Designed new database tables to meet business information needs and designed the mapping document, which serves as a guideline for ETL coding.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked on automating the monitoring and optimization of large-volume data transfer processes between Hadoop clusters and AWS.
- Worked on SDLC tasks such as creating requirement descriptions, allocating tasks, creating test cases, and capturing test results.
- Strong data warehousing experience using Informatica PowerCenter.
- Configured and managed Splunk forwarders, indexers, and search heads.
- Automated all the jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
- Installed and configured Hadoop in the cloud through Amazon Web Services.
- Designed, planned, and delivered a proof of concept and a business function/division-based implementation of the Big Data roadmap and strategy project (Apache Hadoop stack with Tableau) at UnitedHealthcare.
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Used Bash shell scripting, Sqoop, Avro, Hive, HDP, Redshift, Pig, and Java MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract data from web logs and store it in HDFS.
- Worked on NoSQL databases including HBase and MongoDB.
- Worked on classifying data for different healthcare boards using Mahout.
- Gathered, analyzed, and understood business requirement documents written in JIRA.
- Used data stores including Accumulo/Hadoop and a graph database.
- Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
- Worked on Business Intelligence (BI)/data analytics, data visualization, Big Data with Hadoop and Cloudera-based projects, SAS/R, data warehouse architecture design, and MDM/data governance.
- Created a new dimensional data model for the IT team based on business and functional requirements using the Ralph Kimball data warehouse design methodology.
- Deployed technologies exclusively off-site using the Amazon infrastructure and ecosystem (EMR, Redshift, Hive, DynamoDB).
- Worked on loading all tables from the reference source database schema through Sqoop.
- Designed, coded, and configured server-side J2EE components such as JSP, along with AWS and Java.
- Used Confidential Crowbar as a wrapper around Chef to deploy Hadoop.
- Developed a scalable Big Data architecture that processes terabytes of semi-structured data to extract business insights.
- Supported Linux engineers in the use of the company's Puppet infrastructure.
- Collected data from different databases (i.e., Teradata, Oracle, MySQL) into Hadoop.
- Used Oozie and ZooKeeper for workflow scheduling and monitoring.
- Designed and developed ETL workflows in Java for processing data in HDFS/HBase using Oozie.
- Experienced in managing and reviewing Hadoop log files.
- Responsible for coding a .NET data ingestion tool for Solr 4.1 and for integrating/adapting the current IIS/.NET/Microsoft solution exclusively into Solr.
- Created a design approach to lift and shift the existing mappings to Netezza.
- Created HBase tables to store various formats of PII data coming from different portfolios.
- Conducted vulnerability analyses, reviewing, analyzing, and correlating threat data from available sources such as Splunk.
- Working on extracting files from MongoDB through Sqoop, placing them in HDFS, and processing them.
- Worked with different file formats (Avro, RCFile).
- Worked on data architecture, data modeling, ETL, data migration, and performance tuning and optimization.
- Worked with testing team and helped in testing the Informatica mappings, workflows and scripts.
- Worked on Hadoop installation & configuration of multiple nodes on AWS EC2 system.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
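A minimal sketch of a simple Hive UDF of the kind mentioned in the bullet on Hive UDFs above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and normalization logic are illustrative, not the actual UDFs written on the project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes free-text columns by trimming and lower-casing them.
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // pass NULLs straight through
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once the jar is added to the Hive session (ADD JAR), the function is registered with CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText' and can be used like any built-in function in queries.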
Confidential
JAVA Developer
Responsibilities:
- Executed regression and test cases and tracked defects in JIRA.
- Built scalable, fault-tolerant, distributed data solutions with a cloud computing stack including Hadoop, HBase, and Accumulo.
- Worked on the entire data pipeline, automating it with Flume and scheduling jobs periodically using Oozie.
- Expertise in writing shell scripts and Oozie workflows.
- Worked on developing and extending serialization frameworks such as Avro.
- Developed and maintained Splunk dashboards based on the requirements.
- Worked on configuring AWS EMR (Elastic MapReduce).
- Wrote Apache Pig scripts to process HDFS data.
- Reviewed the existing Hadoop environment, made recommendations on new features that may be available, and performed tuning with other tools such as Hive, Pig, MapReduce, Storm, and Flume.
- Worked on Monitoring, Replication and Sharding Techniques in MongoDB.
- Performed Manual Testing, reported defects in JIRA and was responsible to keep track of them.
- Used a BI solution as a custom application built with OBIEE as the front end and ODI and PL/SQL for ETL.
- Worked on moving Hadoop Avro files to a network file system for recording audit data.
- Worked on the installation, configuration, and management of a Hadoop cluster spanning multiple racks using automation tools such as Puppet and Chef.
- Performed LDM/PDM using Erwin for the landing zone (source image area) with Teradata temporal and partition features.
- Worked on providing support for AWS Data Pipeline.
- Worked on automating the jobs using Oozie in the project.
- Used SequenceFile and Avro file formats with Snappy compression while storing data in HDFS (see the sketch after this list).
- Developed MapReduce applications using Hadoop, Redshift, MapReduce programming, and HBase.
- Monitored Hadoop scripts that take input from HDFS and load the data into Hive.
- Worked on Python scripts that, at the push of a button, deploy the entire site, including chef-client runs on multiple servers.
- Worked with the Cloudera Hadoop (CDH) Big Data platform and with different modules within Hadoop, including Hive and Pig, to generate queries against the data.
- Created several AWS instances in the Amazon cloud for some interim solutions.
- Used ETL, reporting, RDBMS, and hardware to support the data warehouse architecture.
- Designed parameters for the extraction, cleansing, validation, and transformation of data from various source systems into the data warehouse.
- Analyzed the Hadoop cluster and different Big Data analytics tools, including Pig, the HBase database, and Sqoop.
- Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
- Worked on Mahout and Python's statsmodels to run statistical analyses similar to those done with SAS.
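A minimal sketch of writing Avro records with Snappy compression, as referenced in the bullets on Avro serialization and file formats above; the schema, field names, and output file are illustrative assumptions, not the project's actual schemas, and the target in practice would be HDFS rather than the local filesystem.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroSnappyWriter {
    // Hypothetical two-field schema for illustration only.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"AuditEvent\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"message\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        GenericRecord event = new GenericData.Record(schema);
        event.put("id", 1L);
        event.put("message", "sample audit event");

        // Snappy-compressed Avro container file; the codec is set before create().
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.setCodec(CodecFactory.snappyCodec());
            writer.create(schema, new File("audit-events.avro"));
            writer.append(event);
        }
    }
}
```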