Sr. Hadoop/Big Data Developer Resume
Chicago, IL
SUMMARY:
- Over 8 years of IT experience in analysis, design, and development using Big Data/Hadoop technologies such as AWS, Cloudera, Hortonworks, Hive, Impala, Sqoop, Oozie, Kafka, HBase, Avro, MapReduce, data processing and data lake architecture, Java/J2EE, and SQL, as well as Apache Hadoop components such as Pig, Scala, Spark, Impala, Oozie, Flume, and HCatalog.
- Proficient in Oracle packages, procedures, functions, triggers, views, SQL*Loader, performance tuning, UNIX shell scripting, and data architecture.
- Proficient in project management, technical project delivery, estimation and planning, coding, execution, and status reporting activities.
- Proficient in requirement analysis, including AS-IS and TO-BE process mapping and documentation of all business processes, technical requirements, and the project as a whole.
- Used Flume to collect data and populate Hadoop.
- Architected, designed, and developed Big Data solutions for various implementations.
- Worked on data modeling using various machine learning algorithms in R and Python (GraphLab); worked with programming languages such as Core Java and Scala.
- Worked on HBase to perform quick lookups, updates, inserts, and deletes in Hadoop.
- Experience in data modeling, complex data structures, data processing, data quality, and the data lifecycle.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components, including Hive and Pig.
- Very good experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss.
- Strong experience with SQL, PL/SQL, and database concepts.
- Experience with NoSQL databases such as HBase and Cassandra.
- A very good understanding of job workflow scheduling and monitoring tools such as Oozie and Control-M.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyze data using visualization/reporting tools.
- Good knowledge of the YARN (MRv2) architecture.
- Experience in running MapReduce and Spark jobs on YARN.
- Hands-on experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Participated in design reviews, code reviews, unit testing and integration testing.
- Worked on HDFS, NameNode, JobTracker, DataNode, TaskTracker, and MapReduce concepts.
- Experience in writing UNIX shell scripts.
- Worked on analyzing data using HiveQL, Pig Latin, and MapReduce programs in Java.
- Worked on writing custom UDFs to extend Hive and Pig core functionality (see the sketch at the end of this summary).
- Worked on managing and reviewing Hadoop log files.
- Worked on Sqoop to move data from relational databases into Hadoop.
- Hands-on experience and good knowledge of the real-time data feeding platform Kafka and integration software such as Talend.
- Good knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience in using Sqoop to import data from RDBMS into HDFS and export it back.
- Solid understanding of the cluster monitoring tool Cloudera Manager.
- Experience in using Pig, Hive, and HBase, including Hive built-in functions, partitioning, bucketing, and different types of joins on Hive tables, as well as HDFS design, daemons, and HDFS high availability (HA).
- Familiar with data warehousing and ETL tools like Informatica.
- Familiar with Core Java, with a strong understanding and working knowledge of object-oriented concepts such as collections, multithreading, data structures, algorithms, JSP, Servlets, JDBC, and HTML.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent, and result-oriented, with strong problem-solving and leadership skills and the ability to work well with people and maintain good relationships within the organization.
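As one hedged illustration of the custom Hive UDFs noted above, the sketch below shows a minimal Java UDF; the class name and normalization logic are assumptions for illustration only.

```java
// Minimal sketch of a custom Hive UDF (illustrative; not code from a specific project).
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NormalizeString extends UDF {
    // Hive calls evaluate() once per row; return trimmed, lower-cased text, null-safe.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Such a UDF would typically be packaged in a JAR, added with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before use in HiveQL.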
PROFESSIONAL EXPERIENCE:
Sr. Hadoop/Big Data Developer
Confidential - Chicago, IL
Responsibilities:
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, SPLIT to extract data from data files to load into HDFS.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (illustrated in the sketch below).
- Installed and configured Hortonworks HDP 2.x and Cloudera (CDH 5.5.1) clusters in development and production environments.
- Performed volumetric analysis for 43 feeds (current approximate data size of 70 TB), based on which the size of the production cluster was decided.
- Wrote multiple Spark jobs to perform data quality checks on data before files were moved to the data processing layer.
- Worked on capacity planning for the production cluster.
- Installed HUE Browser.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Installed Hortonworks HDP 2.1 on Azure Linux servers.
- Worked on Hadoop cluster upgrades from HDP 2.1 to HDP 2.3.
- Responsible for implementation and ongoing administration of Hadoop infrastructure
- Managed and reviewed Hadoop log files.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Worked on indexing HBase tables using Solr, including indexing JSON and nested data.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning DataNodes, troubleshooting, and managing and reviewing data backups and log files.
- Day-to-day responsibilities included resolving developer issues, handling deployments (moving code from one environment to another), providing access to new users, and providing prompt solutions to reduce impact, documenting them, and preventing future issues.
- Added/installed new components and removed them through Ambari.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
- Monitored workload, job performance, and capacity planning
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Created and deployed corresponding SolrCloud collections.
- Created collections and configurations and registered a Lily HBase Indexer configuration with the Lily HBase Indexer Service.
Environment: Hadoop, MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Cloudera, Lily HBase, Cron.
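A hedged sketch of the kind of MapReduce data-cleaning job described above; the field count, delimiter, and validity rule are assumptions for illustration.

```java
// Illustrative mapper that drops malformed records and counts the rejects.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleansingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        // Keep records with the expected column count and a non-empty id in column 0
        // (both assumptions); reject and count everything else.
        if (fields.length == 12 && !fields[0].trim().isEmpty()) {
            context.write(value, NullWritable.get());
        } else {
            context.getCounter("CLEANSING", "REJECTED_RECORDS").increment(1);
        }
    }
}
```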
Hadoop Developer
Confidential
Responsibilities:
- Drove digital products in the bank, including IoT for the campaigning system and blockchain for payments and trading.
- Defined architecture standards, big data principles, and PADS across the program, and used VP for modeling.
- Developed Pig scripts to transform data and load it into HBase tables.
- Developed Hive scripts for implementing dynamic partitions
- Created Hive snapshot tables and Hive ORC tables from existing Hive tables.
- In the data processing layer, data is finally stored in Hive tables in ORC file format using Spark SQL; logic for maintaining SCD Type 2 is implemented in this layer for non-transactional incremental feeds (see the sketch below).
- Developed a rule engine that adds columns to existing data based on business rules specified by reference data provided by the business.
- Optimized Hive joins for large tables and developed MapReduce code for the full outer join of two large tables.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables using map classes.
- Used HDFS and MySQL, and deployed HBase integration to perform OLAP operations on HBase data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Talend Big Data Open Studio 5.6.2 to create a framework for executing extracts.
- Monitored workload, job performance, and capacity planning.
- Implemented partitioning and bucketing techniques in Hive
- Used various Talend big data components such as throw, tHiveInput, tHDFSCopy, tHDFSPut, tHDFSGet, tMap, tDenormalize, and tFlowToIterate.
- Scheduled different Talend jobs using TAC (Talend Administration Center).
- Worked on evaluation and analysis of the Hadoop cluster and different big data analytics tools, including HBase; developed MapReduce programs to perform data filtering on unstructured data.
- Loaded data from the UNIX file system to HDFS and wrote Hive user-defined functions.
- Developed code to pre-process large sets of various file formats such as Text, Avro, SequenceFile, XML, JSON, and Parquet.
- Created multi-stage Map-Reduce jobs in Java for ad-hoc purposes
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Added/installed new components and removed them through Ambari.
- Collaborated with application teams to install operating system and Hadoop updates, patches, and version upgrades.
Environment: Hadoop, MapReduce, TAC, HDFS, HBase, Hortonworks HDP, Sqoop, Spark SQL, Hive ORC, Data Processing Layer, Hue, Azure, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron, JSON, XML, Parquet.
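A minimal, hedged sketch (in the Spark Java API) of landing processed data into a partitioned Hive ORC table, as described above; the paths, database, table, and column names are illustrative assumptions.

```java
// Illustrative Spark job that stages data and inserts it into a Hive ORC table.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class OrcLoadJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orc-load")
                .enableHiveSupport()   // lets Spark SQL read and write Hive tables
                .getOrCreate();

        // Read the staged feed (location assumed) and expose it to SQL.
        Dataset<Row> staged = spark.read().parquet("/data/staging/feed");
        staged.createOrReplaceTempView("staged_feed");

        // Insert into a Hive table stored as ORC, using dynamic partitions.
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
        spark.sql("INSERT INTO TABLE curated.feed_orc PARTITION (load_date) "
                + "SELECT id, amount, load_date FROM staged_feed");

        spark.stop();
    }
}
```

The target table is assumed to already exist as a partitioned Hive table stored as ORC.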
Hadoop Developer
Confidential - Atlanta, GA
Responsibilities:
- Worked with data governance, data quality, data lineage, and data architects to design various models and processes.
- Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, spacetime.
- Implemented end-to-end systems for data analytics and data automation, integrated with custom visualization tools, using R, Mahout, Hadoop, and MongoDB.
- Involved in test data preparation using black-box testing techniques (such as BVA and ECP).
- Gathered all required data from multiple data sources and created datasets used in the analysis.
- Performed exploratory data analysis and data visualization using R and Tableau.
- Performed proper EDA, including univariate and bivariate analysis, to understand intrinsic and combined effects.
- Developed, implemented, and maintained conceptual, logical, and physical data models using Erwin for forward- and reverse-engineered databases.
- Independently coded new programs and designed tables to load and test them effectively for the given POCs using Big Data/Hadoop.
- Designed data models and data flow diagrams using Erwin and MS Visio.
- As an architect, implemented an MDM hub to provide clean, consistent data for an SOA implementation.
- Coded R functions to interface with the Caffe deep learning framework.
- Performed data cleaning and imputation of missing values using R.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
- Took up ad-hoc requests from different departments and locations.
- Used Hive to store data and perform data cleaning steps for huge datasets.
- Created dashboards and visualizations on a regular basis using ggplot2 and Tableau.
- Created customized business reports and shared insights with management.
- Worked with BTEQ to submit SQL statements, import and export data, and generate reports in Teradata.
- Worked in the Amazon Web Services cloud computing environment.
- Wrote complex HiveQL queries and Pig scripts to pull data per requirements and to perform data validation against report output.
- Came up with data load and security strategies and workflow designs with the help of administrators and other developers.
- Used Tableau to automatically generate reports.
- Worked with partially adjudicated insurance flat files, internal records, third-party data sources, JSON, XML, and more.
- Developed automated workflows to schedule jobs using Oozie.
- Developed a technique to incrementally update Hive tables, a feature not natively supported by Hive at the time (see the sketch below).
- Created metrics and executed unit tests on input, output and intermediate data
- Led the testing team and meetings with the onshore team for requirements gathering.
- Assisted the team in creating documents detailing the cluster setup process.
- Established data architecture strategy, best practices, standards, and roadmaps.
- Led the development and presentation of a data analytics data-hub prototype with the help of other members of the emerging solutions team.
- Interacted with other departments to understand and identify data needs and requirements, and worked with other members of the IT organization to deliver data visualization and reporting solutions to address those needs.
Environment: Hadoop, MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron.
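A hedged sketch of one common way to implement the incremental Hive table update mentioned above (reconcile a base table with new increments and keep only the latest row per key); the connection details, table names, and columns are assumptions for illustration, not the exact technique used.

```java
// Illustrative reconciliation of a base Hive table with an incremental feed over Hive JDBC.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class IncrementalHiveRefresh {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Union base and increment, keep the newest version of each key,
            // and overwrite the reconciled table.
            stmt.execute(
                "INSERT OVERWRITE TABLE customer_reconciled "
              + "SELECT id, name, updated_at FROM ("
              + "  SELECT merged.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn "
              + "  FROM (SELECT * FROM customer_base UNION ALL SELECT * FROM customer_incremental) merged"
              + ") ranked WHERE rn = 1");
        }
    }
}
```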
Hadoop Developer
Confidential - Elgin, IL
Responsibilities:
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked on MapReduce programs running on the cluster.
- Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Configured Hadoop cluster with Namenode and slaves and formatted HDFS.
- Imported and exported data between Oracle and HDFS/Hive using Sqoop.
- Performed source data ingestion, cleansing, and transformation in Hadoop.
- Supported Map-Reduce Programs running on the cluster.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive queries for Analysis across different banners.
- Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
- Created HBase tables to store data in various formats coming from different portfolios (see the sketch below).
- Worked on improving the performance of existing Pig and Hive Queries.
- Involved in developing Hive UDFs that were reused for other requirements; worked on performing join operations.
- Developed SerDe classes.
- Developed histograms using R.
- Developed fingerprinting rules in Hive that help uniquely identify a driver profile.
- Evaluated business requirements and prepared detailed specifications, following project guidelines, for the programs to be developed.
- Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
- Exported the result set from Hive to MySQL using Sqoop after processing the data.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Used Hive to partition and bucket data.
Environment: Hadoop, MapReduce, HDFS, HBase, Hortonworks HDP, Sqoop, Data Processing Layer, Hue, Azure, Erwin, MS Visio, Tableau, SQL, MongoDB, Oozie, UNIX, MySQL, RDBMS, Ambari, SolrCloud, Lily HBase, Cron.
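A hedged sketch of writing parsed records into an HBase table, as described above; the table name, column family, and row-key scheme are assumptions for illustration.

```java
// Illustrative write of a parsed tweet into an HBase table via the Java client API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TweetWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("tweets"))) {

            // Row key: user id plus reversed timestamp (assumed scheme) to avoid hot-spotting.
            String rowKey = "user123_" + (Long.MAX_VALUE - System.currentTimeMillis());
            Put put = new Put(Bytes.toBytes(rowKey));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("text"), Bytes.toBytes("sample tweet body"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("lang"), Bytes.toBytes("en"));
            table.put(put);
        }
    }
}
```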
JAVA Developer
Confidential
Responsibilities:
- Worked with Java, JDBC, MySQL, JSP, JavaScript, and AJAX (see the JDBC sketch below).
- Involved in design (UML), development, coding, validation & verification of Purchase Order subsystem.
- Involved in developing the Delivery Challan and Sales Invoice module.
- Involved in designing Admin front page
- Designed the login screen of the front page.
- Developed JUnit tests and performed testing.
Environment: Java, Servlets, JDBC, JSP, JavaScript, MySQL, Tomcat, Windows NT.
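A minimal, hedged sketch of the JDBC access pattern used in modules such as the Purchase Order subsystem mentioned above; the connection settings, table, and column names are assumptions for illustration.

```java
// Illustrative DAO method that computes an order total over JDBC with a prepared statement.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PurchaseOrderDao {
    public double totalForOrder(int orderId) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/erp", "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                "SELECT SUM(quantity * unit_price) FROM purchase_order_items WHERE order_id = ?")) {
            ps.setInt(1, orderId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble(1) : 0.0;
            }
        }
    }
}
```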
JAVA Developer
Confidential
Responsibilities:
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared the high- and low-level design documents and worked on digital signature generation.
- Created use cases, class diagrams, and sequence diagrams for the analysis and design of the application.
- Developed logic and code for the registration and validation of enrolling customers (see the sketch below).
- Developed web-based user interfaces using the Struts framework.
- Handled client-side validations using JavaScript.
- Involved in the integration of various Struts actions in the framework.
- Used Validation Framework for Server-side Validations
- Created test cases for the Unit and Integration testing.
- Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
Environment: Java, Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, JDBC, Oracle, SQL.
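A hedged sketch of server-side validation for the customer enrollment flow described above; the servlet name, field names, JSP paths, and validation rules are illustrative assumptions (the project itself used the Struts Validation Framework for server-side validations).

```java
// Illustrative servlet performing basic server-side validation before registration.
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class EnrollmentServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String customerId = request.getParameter("customerId");
        String email = request.getParameter("email");

        // Reject the request when mandatory fields are missing or malformed (assumed rules).
        if (customerId == null || customerId.trim().isEmpty()
                || email == null || !email.contains("@")) {
            request.setAttribute("error", "Customer id and a valid email are required.");
            request.getRequestDispatcher("/enroll.jsp").forward(request, response);
            return;
        }

        // Otherwise continue with registration (persistence omitted in this sketch).
        request.getRequestDispatcher("/confirmation.jsp").forward(request, response);
    }
}
```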