Sr. Big Data Engineer Resume
Texas
SUMMARY:
- Big Data Engineer with 8+ years of overall IT experience, including over 4 years in Big Data technologies, designing and implementing large-scale data warehousing solutions on the Hadoop and Spark cluster-computing frameworks.
- Experience with distributed systems, large-scale relational data stores, MapReduce systems, data modeling and big data systems.
- Experience in Spark SQL and Spark DataFrames, and in AWS S3 storage (a short sketch follows this summary).
- Experience in creating and managing Hive external and managed tables.
- Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Experience in UNIX shell scripting, writing driver scripts that invoke Hive, Pig and HBase jobs.
- Experience in writing MapReduce programs in Java and Python.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
- Created HBase tables to load large sets of structured data. Experience in HBase table splits and storing data in split regions, and in manipulating HBase table data through Pig scripts.
- Created Pig script jobs with attention to query optimization.
- Managed and reviewed Hadoop logs and generated reports on the data.
- Experience in optimal configuration of Hive, Pig, HBase and the Hadoop Streaming jar.
- Experience working with Oracle, DB2, Snowflake and MySQL databases.
- Hands-on experience using the MapReduce programming model for batch processing of data stored on HDFS.
- Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners. Experience in handling file formats such as CSV, Sequence, Avro, JSON and Parquet, and compression techniques including Gzip, Bzip2 and Snappy.
- Experience in Hadoop distributions including Cloudera and IBM BigInsights.
- Experience in developing end-to-end ETL/ELT frameworks in Java and Spark to handle large volumes of data.
- Experience in advanced, complex reporting in Business Objects and in data warehousing concepts.
- Built intuitive reports leveraging Business Objects reporting tools such as WebI, Crystal Reports, Universes, Dashboards and IDT.
- Experience in RDBMS concepts and constructs, along with creation of database objects such as tables, user-defined data types, indexes, stored procedures, views and user-defined functions.
- Self-motivated quick learner and excellent team player with the ability to work under pressure.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review. Experience working on Agile projects.
- Excellent client-facing, negotiation and conflict-resolution skills; a highly motivated self-starter and team player who interacts effectively with stakeholders to translate business requirements into IT deliverables.
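The following is a minimal sketch of the Spark SQL and DataFrame usage summarized above, assuming Parquet data landed in S3; the bucket, path, view and column names are hypothetical placeholders, not details from a specific project.

    from pyspark.sql import SparkSession

    # Start a Spark session; on an EMR/Hadoop cluster the S3 connector
    # (s3a) is typically provided by the platform.
    spark = SparkSession.builder.appName("s3-dataframe-demo").getOrCreate()

    # Read Parquet files from S3 into a DataFrame (bucket/path are hypothetical).
    df = spark.read.parquet("s3a://example-bucket/loans/")

    # Register a temporary view and query it with Spark SQL.
    df.createOrReplaceTempView("loans")
    spark.sql("""
        SELECT status, COUNT(*) AS loan_count
        FROM loans
        GROUP BY status
    """).show()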
TECHNICAL SKILLS:
Big Data Tech: Hadoop, YARN, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Hadoop Streaming, Spark Dataframes, Spark SQL, Spark Streaming, Kafka
Hadoop distribution: Cloudera CDH 4, IBM BigInsights
Cloud Technologies: Amazon Web Services S3
Scripting: UNIX shell, Pig Latin, Python
Language: Java, J2EE
Database: MySQL, MS SQL, Oracle, DB2, Netezza, MariaDB
NoSQL Database: HBase
Data Warehousing tools: IBM DataStage, SAP Business Objects
Support tools: Putty, WinSCP, IBM Rational Team Concert
Web Technologies: HTML, CSS, JavaScript, AngularJS
Web Servers: Apache Tomcat
Development Tools: Eclipse, IntelliJ, PyCharm, SQL Management Studio, MySQL Workbench, Notepad++, HUE
Versioning Tool: Git
Operating System: Windows, Mac, Linux
SDLC: Waterfall, Test-driven development, Agile
PROFESSIONAL EXPERIENCE:
Confidential, Texas
Sr. Big Data Engineer
Responsibilities:
- Involved in requirements gathering and in designing the high- and low-level architecture of a big data system that supports decision making for the loan servicing finance team.
- Used technologies such as Spark, Java, Scala and Amazon Web Services (AWS) S3 to create a framework that quality-checks the source data, applies slowly changing dimension (SCD) logic, and loads the results to AWS S3 cloud storage.
- Used shell scripting, Python and the Snowflake database to create a framework that loads data from AWS S3 into Snowflake, along with a utility that schedules the end-to-end jobs synchronously (see the load sketch after this section).
- Used Spark SQL to process large volumes of structured data. Used the DistCp utility extensively to copy files from HDFS to AWS S3.
- Performed unit testing throughout the development phase, as well as end-to-end integration and system testing of the big data system. Built business intelligence reports and dashboards on the Snowflake database using Tableau.
- Served as big data technology mentor and coach for project team members.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
- Supported production migration activities and user acceptance testing.
Technologies: Spark SQL and DataFrames, AWS S3, Java, Python, UNIX, Snowflake, Elastic MapReduce, Oracle, HDFS, Eclipse
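A minimal sketch of the S3-to-Snowflake load step described above, assuming the snowflake-connector-python package and a pre-created external stage; the account, credentials, stage and table names are hypothetical placeholders.

    import snowflake.connector

    # Connect to Snowflake (all connection values here are hypothetical).
    conn = snowflake.connector.connect(
        user="LOAD_USER",
        password="***",
        account="example_account",
        warehouse="LOAD_WH",
        database="FINANCE",
        schema="SERVICING",
    )
    try:
        cur = conn.cursor()
        # COPY INTO pulls staged S3 files into the target table; the external
        # stage @S3_LOAN_STAGE is assumed to exist and point at the S3 path.
        cur.execute("""
            COPY INTO LOAN_FACT
            FROM @S3_LOAN_STAGE/loans/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
    finally:
        conn.close()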
Confidential, New York
Sr. Hadoop Developer
Responsibilities:
- Worked with the business analyst team to gather requirements.
- Designed and deployed a Hadoop cluster and related big data tools, including Hive, Pig, HBase and Sqoop.
- Imported and exported data using Sqoop between HDFS and relational databases (Oracle, DB2 and MS SQL).
- Built Hive external and managed tables to store historical data.
- Fine-tuned Hive table performance by overriding MapReduce properties.
- Created, maintained and optimized Hive tables through partitioning and bucketing (see the sketch following this role).
- Developed Pig scripts to analyze and process the data stored in Hive.
- Analyzed large data sets using Hive queries and Pig scripts.
- Fine-tuned Pig scripts by overriding MapReduce properties.
- Created MapReduce programs for requirements that were not met by scripting.
- Involved in developing Java APIs and Spark SQL for analysis of data in Spark.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and persistence.
- Worked in UNIX, writing shell driver scripts that invoke Hive queries, Pig Latin scripts and the Hadoop streaming jar.
Technologies: Hive, Pig, Sqoop, HBase, Spark, Python, UNIX, Java, Spring MVC, MariaDB
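A minimal sketch of the partitioned Hive table work described above, using Spark SQL with Hive support; the database, table, columns and HDFS location are hypothetical placeholders.

    from pyspark.sql import SparkSession

    # Hive support lets Spark SQL create and query tables in the Hive metastore.
    spark = (SparkSession.builder
             .appName("hive-partition-demo")
             .enableHiveSupport()
             .getOrCreate())

    # External table over existing HDFS data, partitioned by load date.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS hist.customer_txn (
            txn_id     BIGINT,
            account_id STRING,
            amount     DOUBLE
        )
        PARTITIONED BY (load_dt STRING)
        STORED AS PARQUET
        LOCATION '/data/hist/customer_txn'
    """)

    # Register a new partition after its files land in the directory.
    spark.sql(
        "ALTER TABLE hist.customer_txn "
        "ADD IF NOT EXISTS PARTITION (load_dt='2016-01-01')"
    )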
Confidential, Texas
Sr. Hadoop Developer
Responsibilities:
- Imported and exported data using Sqoop between HDFS and relational databases (Oracle, DB2 and MS SQL).
- Built Hive external and managed tables to store historical data.
- Developed Pig scripts to analyze and process the data stored in Hive.
- Developed complex Hive queries to join multiple tables.
- Analyzed large data sets using Hive queries and Pig scripts. Used the Hadoop streaming jar for parallel processing of XML files and stored the results in HDFS (a streaming sketch follows this section).
- Set MapReduce configurations through the Hadoop streaming jar, Pig and Hive.
- Staged large files into the current working directory where the MapReduce program runs.
- Wrote user-defined functions in Java and used them in Pig Latin scripts.
- Worked in UNIX, writing shell driver scripts that invoke Hive queries, Pig Latin scripts and the Hadoop streaming jar.
- Handled XML and JSON formatted files. Scheduled the driver code in the BMC Control-M scheduler.
Technologies: Hive, Pig, Sqoop, Python, UNIX, Hadoop streaming jar
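A minimal sketch of a Hadoop Streaming job of the kind described above: a Python mapper and reducer that count records per key. The pipe-delimited input layout and the paths in the submit command are hypothetical placeholders.

    #!/usr/bin/env python
    # mapper.py -- emits one (key, 1) pair per input record; assumes the key
    # is the first field of a pipe-delimited line (layout is hypothetical).
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("|")
        if fields and fields[0]:
            print("%s\t1" % fields[0])

    #!/usr/bin/env python
    # reducer.py -- sums the counts per key; Hadoop Streaming hands the
    # reducer mapper output sorted by key, so a running total suffices.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key and current_key is not None:
            print("%s\t%d" % (current_key, count))
            count = 0
        current_key = key
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

    # Submitted roughly as:
    #   hadoop jar /path/to/hadoop-streaming.jar \
    #     -input /data/vendor/in -output /data/vendor/out \
    #     -mapper mapper.py -reducer reducer.py \
    #     -file mapper.py -file reducer.py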
Confidential, Texas
Hadoop Developer
Responsibilities:
- Involved in creating and configuring HBase tables (a Python client sketch follows this section).
- Experience in HBase region splits and in managing data across the split regions.
- Wrote Pig scripts to extract, transform and load data into HBase tables.
- Wrote Python mapper and reducer scripts to load run-control tables for files sent by vendors.
- Used the Hadoop streaming jar for parallel processing of large vendor files.
- Imported and exported data using Sqoop between HDFS and relational databases (Oracle, DB2 and MS SQL). Built Hive external and managed tables to store historical data.
- Developed Pig scripts to analyze and process the data stored in Hive. Analyzed large data sets using Hive queries and Pig scripts. Developed complex Hive queries to join multiple tables.
- Used the Hadoop streaming jar for parallel processing of XML files and stored the results in HDFS.
- Wrote user-defined functions in Java and used them in Pig Latin scripts.
Technologies: HBase, UNIX, Python, Hadoop Streaming jar
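A minimal sketch of HBase table access of the kind described above, using the happybase Python client over the HBase Thrift server; the host, table, row keys and column names are hypothetical placeholders.

    import happybase

    # Connect to the HBase Thrift server (host is hypothetical).
    connection = happybase.Connection("hbase-thrift-host")

    # One column family; pre-split regions are usually defined in the HBase
    # shell (SPLITS) rather than through this client API.
    connection.create_table("vendor_run_control", {"cf": dict()})
    table = connection.table("vendor_run_control")

    # Write a row keyed by vendor and file date (schema is hypothetical).
    table.put(b"acme|2016-07-01",
              {b"cf:status": b"LOADED", b"cf:records": b"10542"})

    # Scan all rows for one vendor via a row-key prefix.
    for key, data in table.scan(row_prefix=b"acme|"):
        print(key, data)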
Confidential
Sr. BI Developer
Responsibilities:
- Designed universes; created contexts and aliases to resolve loops in the universes.
- Created complex reports in Web Intelligence using advanced functionality such as report-level prompts, scope of analysis and ranking.
- Used Xcelsius to create interactive dashboards with different types of selectors, dynamic visibility and alert features, and created hyperlinks to explore in-depth financial analysis.
- Created dashboards with components such as charts, alerts, gauges and tab sets to show trends over time (weekly/monthly/quarterly/6-month/yearly) using Xcelsius.
- Created Crystal Reports by connecting to the existing BO universes and SQL databases.
Tools: SAP BO Universe, WebI, Dashboards, Crystal reports