Hadoop/Big Data Developer Resume
Indianapolis, IN
SUMMARY:
- Over 8 years of strong experience in the IT industry, including 4 years as a Hadoop and Spark Developer in domains such as financial services and healthcare. Maintained positive communications and working relationships at all levels. An enthusiastic and goal-oriented team player with excellent communication and interpersonal skills and a strong work ethic.
- Expertise in Hadoop ecosystem components HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Flume and Hive for scalable, distributed, high-performance computing.
- Experience in using Hive Query Language and Spark for data analytics.
- Experienced in installing, configuring and maintaining Hadoop clusters.
- Expertise in using Hadoop infrastructure components such as MapReduce, Pig, Hive, ZooKeeper, HBase, Sqoop, Oozie, Flume and Spark for data storage and analysis.
- Experience in developing custom UDFs for Pig and Hive to incorporate Python/Java methods and functionality into Pig Latin and HiveQL, and used UDFs from the Piggybank repository (see the sketch after this list).
- Strong knowledge of creating and monitoring Hadoop clusters on VMs, Hortonworks Data Platform 2.1 & 2.2, CDH5 with Cloudera Manager, and HDP on Linux (Ubuntu, etc.).
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Supported WebSphere Application Server (WPS) and IBM HTTP/Apache web servers in a Linux environment for various projects.
- Good working knowledge of the MapReduce framework.
- Strong knowledge of NoSQL column-oriented databases such as HBase and their integration with the Hadoop cluster.
- Used build tools such as Maven to build projects.
- Good knowledge of Kafka, ActiveMQ and Spark Streaming for handling streaming data.
- Experienced in job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Analyze data, interpret results and convey findings in a concise and professional manner.
- Partner with Data Infrastructure team and business owners to implement new data sources and ensure consistent definitions are used in reporting and analytics
- Promote full cycle approach including request analysis, creating/pulling dataset, report creation and implementation and providing final analysis to the requestor
- Good exposure to data modelling, data profiling, data analysis, validation and metadata management.
- Flexible with Unix/Linux and Windows environments, working with operating systems such as CentOS 5/6 and Ubuntu 13/14.
- Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing of big data.
- Worked with different file formats such as JSON, XML, CSV, XLS, etc.
- Used Amazon AWS EMR and EC2 for cloud big data processing.
- Experience in version control tools such as GitHub.
- Good experience in generating statistics and reports from Hadoop.
- Sound knowledge of designing ETL applications using tools such as Talend.
- Experience in working with job scheduler like Oozie.
- Strong in databases like MySQL, Teradata, Oracle, MS SQL.
- Strong understanding of Agile and Waterfall SDLC methodologies.
- Strong communication, collaboration & team building skills with proficiency at grasping new Technical concepts quickly and utilizing them in a productive manner.
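Illustrative sketch for the custom-UDF bullet above: a minimal Python script used through Hive's TRANSFORM streaming interface, one common way of incorporating Python logic into HiveQL. The table and column names (events, user_id, raw_ts) and the timestamp format are hypothetical placeholders.

```python
#!/usr/bin/env python
# normalize_ts.py -- hypothetical Hive streaming "UDF" used via TRANSFORM.
# Hive pipes each input row to stdin as tab-separated text and reads
# tab-separated rows back from stdout.
#
# Example HiveQL (illustrative only):
#   ADD FILE normalize_ts.py;
#   SELECT TRANSFORM (user_id, raw_ts)
#          USING 'python normalize_ts.py'
#          AS (user_id STRING, event_date STRING)
#   FROM events;
import sys
from datetime import datetime

for line in sys.stdin:
    user_id, raw_ts = line.rstrip("\n").split("\t")
    try:
        # Normalize e.g. "03/21/2016 14:05:11" to "2016-03-21".
        event_date = datetime.strptime(raw_ts, "%m/%d/%Y %H:%M:%S").strftime("%Y-%m-%d")
    except ValueError:
        event_date = "\\N"  # emit Hive's NULL marker for unparseable values
    print("\t".join([user_id, event_date]))
```

Pig can reuse the same script through its STREAM operator; for compiled UDFs, the Piggybank jar is registered with REGISTER and the functions are referenced by class name.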
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Hive, Pig, Impala, Sqoop, Flume, Oozie, Spark, Spark SQL, ZooKeeper, AWS, Cloudera, Hortonworks, Kafka, Avro, and BigQuery.
Languages: Core Java, XML, HTML and HiveQL.
J2EE Technologies: Servlets, JSP, JMS, JSTL, AJAX, DOJO, JSON and Blaze DS.
Frameworks: Spring 2, Struts 2 and Hibernate 3.
XML Processing: JAXB
Reporting Tools: BIRT 2.2.
Application & Web Services: WebSphere 6.0, JBoss 4.X and Tomcat 5.
Scripting Languages: JavaScript, AngularJS, Pig Latin, Python 2.7 and Scala.
Database (SQL/NoSQL): Oracle 9i, SQL Server 2005, MySQL, HBase and MongoDB 2.2
IDE: Eclipse and EditPlus.
PM Tools: MS MPP, Risk Management, ESA.
Other Tools: SVN, Apache Ant, JUnit, StarUML, TOAD, PL/SQL Developer, Perforce, JIRA, Bugzilla, Visual Source, QC, Agile Methodology.
EAI Tools: TIBCO 5.6.
Bug Tracking/ Ticketing: Mercury Quality Center and Service Now.
Operating System: Windows 98/2000, Linux/Unix and Mac.
PROFESSIONAL EXPERIENCE:
Confidential, Indianapolis, IN
Hadoop/Big Data Developer
Responsibilities:
- Worked on the Hadoop stack, ETL tools such as Talend, reporting tools such as Tableau, security with Kerberos, user provisioning with LDAP, and many other big data technologies for multiple use cases.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning data nodes, troubleshooting, cluster planning, managing and reviewing data backups, and managing and reviewing log files.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Installed 5 Hadoop clusters for different teams and developed a Data Lake that serves as a base layer for storage and analytics. Provided services to developers: installed their custom software, upgraded Hadoop components, resolved their issues, and helped them troubleshoot long-running jobs. Acted as L3 and L4 support for the Data Lake and managed clusters for other teams.
- Built automation frameworks for data ingestion and processing in Python and Scala against NoSQL and SQL databases, using Chef, Puppet, Kibana, Elasticsearch, Tableau, GoCD and Red Hat infrastructure for ingestion, processing, and storage.
- Served in a combined DevOps and Hadoop administrator role: worked on L3 issues, installed new components as requirements came in, automated as much as possible, and implemented a CI/CD model.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working with the operations team to move the non-secured cluster to a secured cluster.
- Responsible for upgrading Hortonworks Hadoop HDP 2.2.0 and MapReduce 2.0 with YARN in a multi-clustered node environment. Handled importing of data from various data sources, performed transformations using Hive, MapReduce and Spark, and loaded data into HDFS (see the sketch after this list). Set up Hadoop security using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
- Migrated services from a managed hosting environment to AWS including: service design, network layout, data migration, automation, monitoring, deployments and cutover, documentation, overall plan, cost analysis, and timeline.
- Used R for effective data handling and storage.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as Chef, Ansible, Puppet, or custom-built tooling. Designed cloud-hosted solutions with specific AWS product suite experience.
- Performed a major upgrade in the production environment from HDP 1.3 to HDP 2.2. As an admin, followed standard backup policies to ensure high availability of the cluster.
- Monitored multiple Hadoop cluster environments using Ganglia and Nagios. Monitored workload, job performance and capacity planning using Ambari. Installed and configured Hortonworks and Cloudera distributions on single-node clusters for POCs.
- Created Teradata database macros for application developers to assist them in conducting performance, space and object dependency analysis on the Teradata database platforms.
- Implemented a Continuous Delivery framework using Jenkins, Puppet, Maven, and Nexus in a Linux environment. Integrated Maven/Nexus, Jenkins, Urban Code Deploy with Patterns/Release, Git, Confluence, Jira and Cloud Foundry.
- Defined Chef Server and workstation to manage and configure nodes.
- Experience in setting up the Chef repo, Chef workstations and Chef nodes.
- Involved in running Hadoop jobs for processing millions of records of text data. Troubleshot build issues during the Jenkins build process. Implemented Docker to create containers for Tomcat servers and Jenkins.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
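A minimal PySpark sketch of the import/transform/load flow referenced above. The JDBC connection details, source table, and HDFS path are hypothetical placeholders, not actual project values.

```python
# ingest_to_hdfs.py -- sketch of ingest/transform/load into HDFS with Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("ingest-to-hdfs")
         .enableHiveSupport()
         .getOrCreate())

# Pull a source table over JDBC (Oracle shown; the driver jar must be on the classpath).
src = (spark.read.format("jdbc")
       .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")   # hypothetical
       .option("dbtable", "SALES.TRANSACTIONS")                  # hypothetical
       .option("user", "etl_user")
       .option("password", "********")
       .load())

# Light transformation: standardize column names and add a load-date partition column.
cleaned = (src
           .select([F.col(c).alias(c.lower()) for c in src.columns])
           .withColumn("load_date", F.current_date()))

# Land the result in HDFS as partitioned Parquet for downstream Hive/Spark jobs.
(cleaned.write
 .mode("append")
 .partitionBy("load_date")
 .parquet("hdfs:///data/raw/transactions"))
```

Sqoop covers the same ingest step for plain table copies; the Spark route is shown here because it keeps the transformation and the HDFS load in a single job.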
Environment: Hortonworks Hadoop, Cassandra, Flat files, Oracle 11g/10g, MySQL, Toad 9.6, Windows NT, Sqoop, Hive, Oozie, Ambari, SAS, SPSS, Unix Shell Scripts, ZooKeeper, SQL, MapReduce, Pig.
Confidential
Hadoop/Big Data Developer
Responsibilities:
- Drove digital products in the bank, including IoT for the campaigning system and blockchain for payments and trading.
- Defined architecture standards, big data principles and PADS across the program, and used VP for modelling.
- Developed Pig scripts to transform data and load it into HBase tables.
- Developed Hive scripts for implementing dynamic partitions.
- Created Hive snapshot tables and Hive ORC tables from Hive tables.
- In the data processing layer, data is finally stored in Hive tables in ORC file format using Spark SQL; in this layer, the logic for maintaining SCD Type 2 is implemented for non-transactional incremental feeds (see the sketch after this list).
- Developed a rule engine that adds columns to existing data based on business rules specified by reference data provided by the business.
- Optimized Hive joins for large tables and developed MapReduce code for the full outer join of two large tables.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables using map classes.
- Experience in using HDFS and MySQL; deployed HBase integration to perform OLAP operations on HBase data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Talend Big Data Open Studio 5.6.2 to create a framework for executing the extract framework.
- Monitored workload, job performance, and capacity planning.
- Implemented partitioning and bucketing techniques in Hive
- Used different big data components in Talend such as throw, tHiveInput, tHDFSCopy, tHDFSPut, tHDFSGet, tMap, tDenormalize and tFlowToIterate.
- Scheduled different Talend jobs using TAC (Talend Administration Center).
- Worked on evaluation and analysis of Hadoop cluster and different big data analytic tools like HBase. Developed MapReduce programs to perform data filtering for unstructured data.
- Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions.
- Developed code to pre-process large sets of various file formats such as Text, Avro, SequenceFile, XML, JSON, and Parquet.
- Created multi-stage Map-Reduce jobs in Java for ad-hoc purposes
- Involved in Analyzing system failures, identifying root causes, and recommended course of actions
- Adding/installation of new components and removal of them through Ambari.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
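A condensed Spark SQL sketch of the SCD Type 2 logic referenced above. The dimension and staging table names, the business key (customer_id) and the single tracked attribute (address) are hypothetical; the real feed carried more attributes and a generalized comparison.

```python
# scd2_merge.py -- minimal SCD Type 2 sketch in Spark SQL (hypothetical names).
# Assumes the dimension has the staging columns plus effective_date, end_date
# and is_current.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("scd2-merge")
         .enableHiveSupport()
         .getOrCreate())

dim = spark.table("warehouse.dim_customer")    # ORC-backed Hive dimension
stg = spark.table("staging.stg_customer")      # non-transactional incremental feed
current = dim.filter(F.col("is_current") == 1)

# Feed rows whose tracked attribute differs from the current dimension version.
changed = (stg.alias("s")
           .join(current.alias("d"),
                 F.col("s.customer_id") == F.col("d.customer_id"))
           .filter(F.col("s.address") != F.col("d.address"))
           .select("s.*"))

# Feed rows for business keys the dimension has never seen.
brand_new = stg.join(current, "customer_id", "left_anti")

# Expire the current version of every changed key; pass everything else through.
flagged = (dim.join(changed.select("customer_id").withColumn("chg", F.lit(1)),
                    "customer_id", "left")
           .withColumn("chg", F.coalesce(F.col("chg"), F.lit(0))))
to_expire = (F.col("is_current") == 1) & (F.col("chg") == 1)
expired = (flagged.filter(to_expire)
           .withColumn("end_date", F.current_date())
           .withColumn("is_current", F.lit(0))
           .drop("chg"))
untouched = flagged.filter(~to_expire).drop("chg")

# Open new current versions for changed and brand-new keys.
opened = (changed.unionByName(brand_new)
          .withColumn("effective_date", F.current_date())
          .withColumn("end_date", F.lit(None).cast("date"))
          .withColumn("is_current", F.lit(1)))

result = untouched.unionByName(expired).unionByName(opened)

# Persist in ORC to a staging table; a production job would validate and swap.
(result.write.mode("overwrite").format("orc")
 .saveAsTable("warehouse.dim_customer_scd2_out"))
```

On newer platforms the same merge can be expressed with Hive ACID MERGE; the join-and-rewrite form above matches the non-transactional feeds described in the bullet.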
Environment: Hadoop, MapReduce, TAC, HDFS, HBase, Hortonworks HDP, Sqoop, Spark SQL, Hive ORC, Data Processing Layer, HUE, Azure, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron, JSON, XML, Parquet.
Confidential, Dallas, TX.
Hadoop/Big Data Developer
Responsibilities:
- Worked with data governance, data quality, data lineage and data architects to design various models and processes.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools using R, Mahout, Hadoop, and MongoDB.
- Involved in test data preparation using black-box testing techniques (such as BVA and ECP).
- Gathering all the data that is required from multiple data sources and creating datasets that will be used in the analysis.
- Performed exploratory data analysis and data visualizations using R and Tableau.
- Performed proper EDA with univariate and bivariate analysis to understand intrinsic and combined effects.
- Developed, implemented and maintained conceptual, logical and physical data models using Erwin for forward/reverse-engineered databases.
- Independently coded new programs and designed tables to load and test the programs effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Coded R functions to interface with the Caffe deep learning framework.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
- Take up ad-hoc requests based on different departments and locations
- Used Hive to store the data and performed data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Worked in an Amazon Web Services cloud computing environment.
- Wrote complex HiveQL queries and Pig scripts to pull data as per the requirements and perform data validation against report output.
- Developed data load and security strategies and workflow designs with the help of administrators and other developers.
- Used Tableau to automatically generate reports.
- Worked with partially adjudicated insurance flat files, internal records, 3rd-party data sources, JSON, XML and more.
- Developed automated workflow to schedule the jobs using Oozie
- Developed a technique to incrementally update Hive tables (a feature not supported by Hive at the time); see the sketch after this list.
- Created metrics and executed unit tests on input, output and intermediate data
- Led the testing team and meetings with onshore teams for requirement gathering.
- Assisted the team in creating documents detailing the cluster setup process.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of the other members of the emerging solutions team.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
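A sketch of the incremental Hive table update technique mentioned above, using the common reconcile-and-overwrite pattern (Hive had no MERGE at the time). The table names (base.orders, staging.orders_delta), the business key and the last_modified column are hypothetical.

```python
# hive_incremental_refresh.py -- reconcile a base Hive table with a delta feed
# and overwrite the base with the latest version of each record.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-incremental-refresh")
         .enableHiveSupport()
         .getOrCreate())

base = spark.table("base.orders")            # full history currently in Hive
delta = spark.table("staging.orders_delta")  # new and changed rows (e.g. from Sqoop)

# Reconcile: for each business key keep only the most recent version,
# preferring the delta row when timestamps tie.
merged = (base.withColumn("src", F.lit(0))
          .unionByName(delta.withColumn("src", F.lit(1))))
w = Window.partitionBy("order_id").orderBy(F.col("last_modified").desc(),
                                           F.col("src").desc())
latest = (merged
          .withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn", "src"))

# Materialize the reconciled snapshot first (a table cannot be overwritten
# while it is also being read in the same plan), then overwrite the base.
(latest.write.mode("overwrite").format("orc")
 .saveAsTable("staging.orders_reconciled"))
spark.sql("INSERT OVERWRITE TABLE base.orders "
          "SELECT * FROM staging.orders_reconciled")
```

The trade-off is a full rewrite of the base table on every cycle; partitioning the base table and overwriting only the affected partitions keeps the cost proportional to the size of the delta.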
Environment: Hadoop, MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron.
Confidential, Burlington, MA.
Hadoop/Big Data Developer.
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Ran MapReduce programs on the cluster.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Configured the Hadoop cluster with the NameNode and slaves and formatted HDFS.
- Imported and exported data between Oracle and HDFS/Hive using Sqoop.
- Performed source data ingestion, cleansing, and transformation in Hadoop.
- Supported Map-Reduce Programs running on the cluster.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Developed Hive queries for analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API. Parsed JSON-formatted Twitter data and uploaded it to the database.
- Created HBase tables to store various data formats of data coming from different portfolios.
- Worked on improving the performance of existing Pig and Hive Queries.
- Involved in developing Hive UDFs and reusing them for other requirements. Worked on performing join operations.
- Developed SerDe classes.
- Developed histograms using R.
- Developed fingerprinting rules in Hive that help in uniquely identifying a driver profile.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances with respect to specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used Hive to partition and bucket data (see the sketch below).
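A small illustration of the partitioning and bucketing mentioned in the last bullet, driven from Python through beeline. The HiveServer2 URL, table names and columns are hypothetical placeholders.

```python
# partition_bucket_load.py -- create a partitioned, bucketed ORC table and
# load it with dynamic partitioning via beeline (assumed to be on PATH).
import os
import subprocess
import tempfile

HIVE_URL = "jdbc:hive2://hiveserver:10000/default"   # hypothetical HiveServer2

HQL = """
-- Partition by day and bucket by customer so joins and point lookups prune well.
CREATE TABLE IF NOT EXISTS sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 16 BUCKETS
STORED AS ORC;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;

-- Dynamic-partition load from the raw staging table.
INSERT OVERWRITE TABLE sales_orc PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM sales_staging;
"""

# Write the script to a temp file and submit it with beeline -f.
with tempfile.NamedTemporaryFile("w", suffix=".hql", delete=False) as f:
    f.write(HQL)
    script = f.name
try:
    subprocess.check_call(["beeline", "-u", HIVE_URL, "-f", script])
finally:
    os.remove(script)
```

Partitioning by order_date prunes whole directories at query time, while bucketing by customer_id gives a stable hash distribution that Hive can exploit for bucketed map joins and sampling.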
Environment: Hadoop, MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, Lily HBase, Cron.
Confidential
Data Analyst.
Responsibilities:
- Developed code to pre-process large sets of various types of file formats such as Text, Avro, SequenceFiles, XML, JSON, and Parquet.
- Extracted data from Oracle, flat file and Excel sources and performed complex Joiner, Expression, Aggregate, Lookup, Stored Procedure, Filter, Router and Update Strategy transformations to extract and load data into the target systems.
- Identified data source system integration issues and proposed feasible integration solutions.
- Partnered with business users and DW designers to understand the development methodology processes and then implemented the ideas in development accordingly.
- Worked with the data modeler in developing star schemas and snowflake schemas.
- Created reusable mailing alerts, events, tasks, sessions, reusable worklets and workflows in Workflow Manager.
- Created Oracle PL/SQL queries and stored procedures, packages, triggers, cursors and backup/recovery for the various tables.
- Extensively used TOAD for source and target database activities
- Generated simple reports from the data marts using BusinessObjects.
- Identified and tracked Slowly Changing Dimensions (SCD).
- Read from Flume and involved in pushing batches of data to HDFS and HBase for real-time processing of the files.
- Loaded data from the UNIX file system to HDFS (see the sketch after this list) and wrote Hive user-defined functions.
- Created multi-stage Map-Reduce jobs in Java for ad-hoc purposes.
- Scheduled the workflows at a specified frequency according to the business requirements and monitored them using Workflow Monitor.
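A minimal sketch of the UNIX-filesystem-to-HDFS load referenced above, using the standard hdfs CLI from a small Python driver; the local and HDFS paths are hypothetical.

```python
# load_to_hdfs.py -- copy files landed on the local (UNIX) filesystem into HDFS.
# Paths are hypothetical; assumes the hdfs client is on PATH and authenticated.
import glob
import subprocess

LOCAL_DIR = "/data/exports/claims/2015-06-01"              # upstream drop directory
HDFS_DIR = "/warehouse/raw/claims/load_date=2015-06-01"    # target HDFS directory

files = glob.glob(LOCAL_DIR + "/*.csv")
if not files:
    raise SystemExit("nothing to load from " + LOCAL_DIR)

# Create the target directory (idempotent), then copy the files, overwriting
# any partial files left by an earlier failed run.
subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR])
subprocess.check_call(["hdfs", "dfs", "-put", "-f"] + files + [HDFS_DIR])

# A Hive external table (or a LOAD DATA statement) can then expose the
# directory to HiveQL, e.g.:
#   CREATE EXTERNAL TABLE IF NOT EXISTS raw_claims (...)
#   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
#   LOCATION '/warehouse/raw/claims';
```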
Environment: MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, PL/SQL, TOAD, Java.
Confidential
Data Analyst.
Responsibilities:
- Involved in writing detailed design documents with UML specifications.
- Designed and developed a Struts-like MVC2 web framework using the front-controller design pattern, which is used successfully in many production systems.
- Maintained records in Excel spreadsheets and exported data into a SQL Server database using SQL Server Integration Services (SSIS).
- Experience in providing logging and error handling using event handlers and custom logging for SSIS packages.
- Resolved product complications at customer sites and funneled the insights to the development and deployment teams to adopt long term product development strategy with minimal roadblocks.
- Developed an XML parser for file parsing (see the sketch after this list).
- Spearheaded the “QuickWins” project by working very closely with the business and end users to improve the current website’s ranking from being 23rd to 6th in just 3 months.
- Normalized the Oracle database, conforming to design concepts and best practices.
- Involved in unit testing and system testing; also responsible for preparing test scripts for system testing.
- Responsible for packaging and deploying components into WebSphere.
- Developed backend components, DB Scripts for the backend communication.
- Responsible for performance tuning of the product and eliminating memory leakages in the product.
- Applied design patterns and OO design concepts to improve the existing Java/JEE based code base.
- Identified and fixed transactional issues due to incorrect exception handling and concurrency issues due to an unsynchronized block of code.
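For the XML parser bullet above, a minimal sketch of the file-parsing approach; the original component was part of the Java stack, so this Python version is only illustrative, and the order record layout is hypothetical.

```python
# parse_orders.py -- minimal XML file parsing sketch (hypothetical layout).
import xml.etree.ElementTree as ET

def parse_orders(path):
    """Yield one dict per <order> element in the file."""
    tree = ET.parse(path)
    for order in tree.getroot().iter("order"):
        yield {
            "id": order.get("id"),
            "customer": order.findtext("customer"),
            "total": float(order.findtext("total", default="0")),
        }

if __name__ == "__main__":
    for record in parse_orders("orders.xml"):
        print(record)
```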
Environment: Sqoop, Data Processing Layer, HUE, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, Solr Cloud, PL/SQL, TOAD, Java.